IceGrid load balancing policy

We are building a large service cluster using Ice as the communication mechanism.

Unlike a typical service cluster, ours has two distinctive characteristics:

- A single client request may take 3-20 seconds.
- A single request may use almost all of the server's resources.

I read the documentation on the IceGrid "adaptive" load balancing policy and saw that
it uses the load average of the Linux server.

Given the characteristics described above, a node can handle at most one request at a time.
Thus, unlike in normal services, the load in our cluster is discrete (e.g. 0 for idle, 1 for busy).

AFAIK, the Linux load average used by default is the 1-minute average as of the query time. Since our service nodes have a discrete load (0 or 1), relying on the Linux load average might not be a good idea.

For example, suppose we have 5 nodes (i.e. the cluster can handle at most 5 concurrent connections):

1. At T seconds: node#1 through node#4 are busy.
2. At T+1 seconds: node#5, which was busy, has just become idle.
3. At T+2 seconds: a new client request arrives.
4. The IceGrid "adaptive" load balancing policy may not know that node#5 is idle: since it became idle only a second or two earlier, its load average is still almost the same as that of the other nodes.
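
To make the lag concrete, here is a rough Python sketch of how a 1-minute load average reacts when a node goes from busy to idle. It assumes a simplified model in which the kernel samples the run queue every 5 seconds and applies exponential damping with a 60-second time constant. The function names and the 0/1 load values are illustrative only, and a real load average also counts tasks in uninterruptible sleep.

```python
import math

SAMPLE_INTERVAL = 5.0  # assumed seconds between load samples
WINDOW = 60.0          # time constant of the 1-minute average
DECAY = math.exp(-SAMPLE_INTERVAL / WINDOW)


def update(load_avg, runnable):
    """Apply one sampling step of an exponentially damped average."""
    return load_avg * DECAY + runnable * (1.0 - DECAY)


def simulate(busy_seconds, idle_seconds, start=0.0):
    """Load average after a busy period (load 1) followed by an idle one (load 0)."""
    load = start
    for _ in range(int(busy_seconds // SAMPLE_INTERVAL)):
        load = update(load, 1)
    for _ in range(int(idle_seconds // SAMPLE_INTERVAL)):
        load = update(load, 0)
    return load


busy = simulate(busy_seconds=300, idle_seconds=0)        # still serving a request
just_idle = simulate(busy_seconds=300, idle_seconds=2)   # idle for 2 seconds
idle_30s = simulate(busy_seconds=300, idle_seconds=30)   # idle for 30 seconds

print(f"busy node:      {busy:.3f}")
print(f"idle for 2 s:   {just_idle:.3f}")
print(f"idle for 30 s:  {idle_30s:.3f}")
```

Under this model, a node that has been idle for 2 seconds reports the same value as a busy one because no new sample has been taken yet, and even after 30 seconds its average has only dropped to roughly 0.6.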


Unfortunately, I'm not an expert on Ice. I couldn't find a good solution or examples for this kind of service cluster (one node can handle only one connection, so the monitoring/load balancing metrics should be discrete).

Q1. If there is another load balancing policy that uses the value at query time rather than an average, it would be great for our purpose.

Q2. If there is another load balancing policy that supports custom metrics (especially if the IceGrid servants can send a custom metric value to the registry), it would be perfect for our purpose.

Q3. Can you provide any other advice on this, beyond the previous questions?

Comments

  • benoit
    Rennes, France
    cinsk wrote: »
    Q1. If there is another load balancing policy that uses the value at query time rather than an average, it would be great for our purpose.

    See the load balancing types described in the manual for the options you can set on a replica group.

    There is no load balancing type that checks the CPU usage of each node at the time of the locator query in order to return the endpoints of the server on the least-used node. This is something we could consider adding.

    Q2. If there is another load balancing policy that supports custom metrics (especially if the IceGrid servants can send a custom metric value to the registry), it would be perfect for our purpose.

    We do not yet provide custom load balancing. It is on our TODO list, however; please contact us at info@zeroc.com if you would like to sponsor this feature and see it implemented sooner rather than later.

    Q3. Can you provide any other advice on this, beyond the previous questions?

    It isn't clear to me that load balancing is actually what you need. The problem with load balancing is that it still won't be totally accurate. For example, if 2 clients query the IceGrid locator for the same replica group ID at the same time and only one server is idle, the locator will return the endpoints of this server to both, and the 2 clients will end up calling this same server at the same time. Won't this be an issue in your scenario?
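
The race described above can be illustrated with a small Python sketch. `ToyLocator` is a hypothetical stand-in, not the IceGrid locator: it simply returns the server with the lowest last-known load, and that load is only updated once a request has actually started.

```python
class ToyLocator:
    """Hypothetical locator that picks the server with the lowest known load."""

    def __init__(self, loads):
        self.loads = dict(loads)  # server id -> last reported load (0 or 1)

    def least_loaded(self):
        return min(self.loads, key=self.loads.get)

    def report_busy(self, server_id):
        self.loads[server_id] = 1


locator = ToyLocator({"server1": 1, "server2": 1, "server3": 0})

# Two clients resolve the replica group before either has started its request.
choice_a = locator.least_loaded()
choice_b = locator.least_loaded()

# Only now do the requests start and the load information catch up.
locator.report_busy(choice_a)
locator.report_busy(choice_b)

print(choice_a, choice_b)  # both clients were sent to the same server
```

Both lookups see the same snapshot, so both clients are directed to `server3` even though it can serve only one of them.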

    Wouldn't a resource sharing mechanism be more appropriate for your use case? Such a mechanism would ensure that at most one client is using a given server at a time. IceGrid provides one: clients can allocate Ice objects and servers for the purpose of resource sharing. You'll find more information on this API in the Ice manual. We also provide a demo in the C++ demo/IceGrid/allocate directory. Did you consider using this facility instead?
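
As a rough illustration of the allocate/release idea, here is a minimal in-process Python model. It does not use the IceGrid API; `AllocationPool` and its methods are hypothetical names chosen for this sketch, and the real facility works across processes through an IceGrid session.

```python
import threading


class AllocationPool:
    """Toy model of exclusive allocation: each server has at most one owner."""

    def __init__(self, server_ids):
        self._free = set(server_ids)
        self._cond = threading.Condition()

    def allocate(self, timeout=None):
        """Block until a server is free, then hand it out exclusively."""
        with self._cond:
            if not self._cond.wait_for(lambda: self._free, timeout=timeout):
                raise TimeoutError("no server became free in time")
            return self._free.pop()

    def release(self, server_id):
        """Return a server to the pool and wake one waiting client."""
        with self._cond:
            self._free.add(server_id)
            self._cond.notify()


pool = AllocationPool(["server1"])

first = pool.allocate()  # the first client gets the only server

try:
    pool.allocate(timeout=0.1)  # a second client must wait, then gives up
    second_got_server = True
except TimeoutError:
    second_got_server = False

pool.release(first)  # the first client finishes its request
third = pool.allocate(timeout=0.1)  # now the server can be handed out again

print(first, second_got_server, third)
```

Unlike the load-based lookup, a second client cannot obtain a server that is already in use; it either waits until the server is released or times out.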

    Cheers,
    Benoit.
  • benoit wrote: »
    Wouldn't a resource sharing mechanism be more appropriate for your use case? Such a mechanism would ensure that at most one client is using a given server at a time. IceGrid provides one: clients can allocate Ice objects and servers for the purpose of resource sharing. You'll find more information on this API in the Ice manual. We also provide a demo in the C++ demo/IceGrid/allocate directory. Did you consider using this facility instead?

    Yes, I guess so. The allocation scheme (aka session) looks like what I need.
    Thank you for the information.