What is the best setup for minimizing the effect when a grid node is down

When a grid node is down (crash, maintenance), what is the best strategy to minimize the effect on clients and avoid useless connection attempts? I should mention that we would like to distribute the load across the replicated nodes as much as possible, so we would prefer not to cache the connection on the proxy and to use the random endpoint selection strategy (as described in section 37.3 of the manual). I am aware that we can configure frequent registry lookups to refresh the endpoints, but I wonder how quickly the registry becomes aware that the node is unavailable and stops including its endpoints. I also wonder whether there is a concept of "bad endpoints" that are temporarily skipped after consecutive failures. Any other suggestions?
Thanks,
Arie.

Comments

  • benoit
    benoit Rennes, France
    How quickly the registry detects that a node is down depends on the IceGrid.Registry.NodeSessionTimeout property; see the Ice manual for more information on this property. Once the registry detects that a node is down, it no longer returns endpoints for servers deployed on that node.

    Ice doesn't have any concept of "bad endpoints". If you have a commercial interest in such a feature, please contact us at info@zeroc.com.

    To minimize connection attempts to invalid endpoints, you'll need to configure the client to look up server endpoints from the registry frequently, and set the registry's node session timeout to a value low enough to detect nodes that have shut down in a timely manner.

    Cheers,
    Benoit.
  • Thanks!
    I assume that once the node is up again, it registers with the registry immediately (provided icegridnode is configured to start when the machine reboots)?
  • benoit
    benoit Rennes, France
    Yes, the node establishes a session with the registry on startup.

    Cheers,
    Benoit.
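
For reference, the settings discussed in this thread could be sketched as the property fragments below. This is only an illustration, not a recommended configuration: the numeric values are arbitrary, and both timeouts are assumed to be expressed in seconds as described in the Ice manual. The random endpoint selection and the disabled connection caching correspond to the load-distribution setup described in the question.

```
# Registry configuration (illustrative value).
# A lower timeout lets the registry notice a dead node sooner,
# at the cost of more frequent keep-alive traffic from the nodes.
IceGrid.Registry.NodeSessionTimeout=10

# Client configuration (illustrative values).
# Locator cache entries older than this are ignored, so the client
# asks the registry for fresh endpoints at least this often.
Ice.Default.LocatorCacheTimeout=5

# Pick endpoints at random instead of in order.
Ice.Default.EndpointSelection=Random
```

Connection caching is a per-proxy setting rather than a default. It can be turned off in code with ice_connectionCached(false), or through the ConnectionCached proxy property when the proxy is created from properties. With these values, a client could in the worst case still be handed the endpoints of a dead node for roughly the node session timeout plus the locator cache timeout, so the two values are best tuned together.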