What is the best setup for minimizing the effect when a grid node is down

When a grid node is down (crash, maintenance), what is the best strategy to minimize its effect on the clients (and avoid useless connection establishment attempts)? I should mention that we would like to distribute the load across the replicated nodes as much as possible, and would therefore prefer not to cache the connection on the proxy and to use the random selection strategy (as described in the manual, section 37.3). I am aware that we can configure a frequent registry lookup to refresh the endpoints, but I wonder how quickly the registry becomes aware that the node is unavailable and stops including its endpoints. I also wonder whether there is a concept of "bad endpoints" that are temporarily not used after consecutive failures. Any other suggestions?
Thanks,
Arie.
Ice doesn't have any concept of "bad endpoints". If you have a commercial interest in such a feature, please contact us at [email protected].
To minimize connection attempts to invalid endpoints, you'll need to configure the client to look up the server endpoints with the registry often, and configure the registry's node session timeout to a suitable value so that shut-down nodes are detected in a timely manner.
Cheers,
Benoit.
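For illustration, the two settings mentioned in the reply above might look roughly like the following configuration fragments. This is a sketch under assumptions, not a recommended setup: the property names are the standard Ice 3.x ones (check them against the property reference for your Ice version), the timeout values are placeholders, and the locator identity and host are hypothetical.

```
# Client configuration (sketch; values are illustrative assumptions)
# Hypothetical locator proxy: replace with your registry's identity and endpoint.
Ice.Default.Locator=DemoIceGrid/Locator:tcp -h registry.example.com -p 4061
# Treat cached endpoints as stale after 10 seconds, so the client asks the registry again.
Ice.Default.LocatorCacheTimeout=10
# Choose among the available endpoints at random when establishing a connection.
Ice.Default.EndpointSelection=Random

# Registry configuration (sketch)
# Number of seconds without a session refresh after which a node's session is destroyed.
IceGrid.Registry.NodeSessionTimeout=15
```

Shorter timeouts narrow the window in which clients may still receive endpoints of a dead node, at the cost of more traffic to the registry. Connection caching is a per-proxy setting and can be turned off on the proxy itself, for example with `ice_connectionCached(false)`.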
I assume once the node is up again the registration (with the registry) is immediate (provided icegridnode is configured to run upon machine reboot)?