Bug in round-robin load balancing

Hello,

We have seen that the round-robin policy does not work as expected when any of the nodes is down. It works fine when all the nodes in the replica group are up and running, but if any of them is unavailable (either disabled or simply down), all of its load goes to the next server in the replica group, effectively doubling the load on that server.

[n-replicas is not set, so only one node endpoint should be returned]
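The skew described above can be reproduced with a small simulation. This is only a sketch of the reported behaviour, not IceGrid's actual implementation: the function `naive_round_robin`, the server names, and the "cursor advances one slot per request" model are all assumptions made for illustration.

```python
from collections import Counter

def naive_round_robin(servers, is_up, requests):
    """Hypothetical model: the cursor advances one slot per request, and a
    down server's turn falls through to the next available server."""
    counts = Counter()
    cursor = 0
    n = len(servers)
    for _ in range(requests):
        # Probe forward from the cursor until an available server is found.
        for step in range(n):
            candidate = servers[(cursor + step) % n]
            if is_up[candidate]:
                counts[candidate] += 1
                break
        # The cursor moves by exactly one slot, whichever server answered.
        cursor = (cursor + 1) % n
    return counts

# Four servers in one replica group; B is down (assumed names).
servers = ["A", "B", "C", "D"]
is_up = {"A": True, "B": False, "C": True, "D": True}
counts = naive_round_robin(servers, is_up, 400)
print(dict(counts))  # {'A': 100, 'C': 200, 'D': 100}
```

Under this model, C (the server right after the unavailable one) handles its own turn plus B's turn, so it receives twice the load of A or D.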

Comments

  • benoit (Rennes, France)
    Hi,

    Yes, the round-robin policy might indeed select the same server twice in a row if the previous server in the replica group is down but its IceGrid node is still up (if the node itself is down or unreachable, this shouldn't occur). We'll look into fixing this, thanks for the report.

    Cheers,
    Benoit.
  • benoit wrote: »
    Yes, the round-robin policy might indeed select the same server twice in a row if the previous server in the replica group is down but its IceGrid node is still up (if the node itself is down or unreachable, this shouldn't occur). We'll look into fixing this, thanks for the report.

    It does occur when the icegridnode process is down. We have not tested with an unreachable node, but I guess it is the same behaviour.
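For comparison, here is a minimal sketch of one way a round-robin selector could avoid sending a down server's share to its successor: the cursor is moved to just past the server that was actually chosen, rather than by a fixed single slot. This is an illustration only, not the fix adopted in IceGrid; `skip_aware_round_robin` and the server names are hypothetical.

```python
from collections import Counter

def skip_aware_round_robin(servers, is_up, requests):
    """Hypothetical variant: after each pick, resume from the slot that
    follows the server actually chosen, so skipped servers cost no turn."""
    counts = Counter()
    cursor = 0
    n = len(servers)
    for _ in range(requests):
        for step in range(n):
            index = (cursor + step) % n
            if is_up[servers[index]]:
                counts[servers[index]] += 1
                # Resume after the chosen server, not after the old cursor.
                cursor = (index + 1) % n
                break
        else:
            # No server is available; stop handing out endpoints.
            break
    return counts

servers = ["A", "B", "C", "D"]
is_up = {"A": True, "B": False, "C": True, "D": True}
even_counts = skip_aware_round_robin(servers, is_up, 300)
print(dict(even_counts))  # {'A': 100, 'C': 100, 'D': 100}
```

With B unavailable, the three remaining servers each receive the same number of requests in this model.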