IceGrid round-robin load balancing wrong behaviour?

javiroman (Javier Roman), Madrid (Spain), Member. Organization: www.keedio.com. Project: Open Source in Enterprise
Hello!

I'm getting weird behaviour with the round-robin scheduling policy in IceGrid (3.4.0). I have to say I don't know whether this is incorrect behaviour (a bug) or normal behaviour from an IceGrid round-robin point of view.

In a grid of nodes, with one replica group per node, when the registry schedules tasks in round-robin fashion and runs into a node that is down (for example, the node has a network failure or has been shut down), the round-robin scheduler jumps to the next replica group in the list of nodes (the registry keeps the nodes sorted in alphabetical order) and assigns both the failed node's task and the next task to this new node. That node therefore ends up running two tasks: the failed node's task and its own.

The worst scenario is when the number of down nodes is large. For example, if we have 50 nodes and the first 25 nodes in alphabetical order are down, the registry sends those 25 failed tasks to the next node that is up, so the first up node in the alphabetical list receives a huge number of tasks. This behaviour breaks the load balancing of the grid.
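To make the scenario above concrete, here is a minimal Python model of the fall-through behaviour described (a sketch only, not IceGrid's actual implementation; the node names and task count are illustrative):

```python
def round_robin(num_tasks, nodes, down):
    """Model the reported behaviour: the round-robin pointer advances one
    slot per task through the alphabetical node list, but a down node's
    slot falls through to the next node in list order that is up."""
    order = sorted(nodes)
    counts = {n: 0 for n in order}
    ptr = 0
    for _ in range(num_tasks):
        j = ptr
        while order[j] in down:           # fall through past down nodes
            j = (j + 1) % len(order)
        counts[order[j]] += 1
        ptr = (ptr + 1) % len(order)      # pointer still advances one slot
    return counts

# 50 nodes, the first 25 down (alphabetically), 500 tasks launched
nodes = [f"Node{i:02d}" for i in range(1, 51)]
down = set(nodes[:25])
counts = round_robin(500, nodes, down)
print(counts["Node26"])   # → 260 (first up node absorbs every failed slot)
print(counts["Node27"])   # → 10  (the other up nodes get a fair share)
```

In every full pass over the 50 slots, Node26 receives the 25 failed tasks plus its own, while the remaining 24 up nodes receive one task each, which is the imbalance described above.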

Please, is this a bug, or is it the normal behaviour?

Many thanks.

- Javi

Comments

  • mes (Mark Spruiell), California, Administrators, ZeroC Staff. Organization: ZeroC, Inc. Project: Ice
    Hi Javi,

    Thanks for the report. We'll investigate this further and reply when we have more information for you.

    Best regards,
    Mark
  • javiroman (Javier Roman)
    The following is an example of this behaviour, for your information.

    With this classic application descriptor:
    <icegrid>
        <application name="HelloGridApplication">
            <replica-group id="HelloWorldReplicaGroup">
                <load-balancing type="round-robin"/>
                <object identity="ObjectIdentity1"/>
            </replica-group>

            <server-template id="HelloWorldTemplate">
                <parameter name="index"/>
                <server id="HelloServer-${index}"
                        exe="./server.py"
                        activation="on-demand">
                    <adapter name="HelloAdapter"
                             endpoints="default"
                             replica-group="HelloWorldReplicaGroup"/>
                </server>
            </server-template>

            <node name="Node4">
                <server-instance template="HelloWorldTemplate" index="0"/>
            </node>

            <node name="Node2">
                <server-instance template="HelloWorldTemplate" index="1"/>
            </node>

            <node name="Node1">
                <server-instance template="HelloWorldTemplate" index="2"/>
            </node>

            <node name="Node3">
                <server-instance template="HelloWorldTemplate" index="3"/>
            </node>

            <node name="Node5">
                <server-instance template="HelloWorldTemplate" index="4"/>
            </node>
        </application>
    </icegrid>
    

    If you launch 200 tasks:

    1. When the whole set of nodes is up, this is the normal scheduling:

    Node1 up -> 40 tasks
    Node2 up -> 40 tasks
    Node3 up -> 40 tasks
    Node4 up -> 40 tasks
    Node5 up -> 40 tasks

    2. Nevertheless, if (for example) the first three nodes in alphabetical order are down, the scheduling looks like this:

    Node1 down -> 0 tasks
    Node2 down -> 0 tasks
    Node3 down -> 0 tasks
    Node4 up -> 160 tasks
    Node5 up -> 40 tasks

    This scenario is critical if we are talking about thousands of tasks and hundreds of nodes.
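The 160/40 split above can be reproduced with a small Python model of the fall-through round robin (a sketch of the reported behaviour, not IceGrid's implementation; the node names match the descriptor):

```python
def round_robin(num_tasks, nodes, down):
    """The pointer advances one slot per task; a down node's slot falls
    through to the next node in alphabetical order that is up."""
    order = sorted(nodes)
    counts = {n: 0 for n in order}
    ptr = 0
    for _ in range(num_tasks):
        j = ptr
        while order[j] in down:           # fall through past down nodes
            j = (j + 1) % len(order)
        counts[order[j]] += 1
        ptr = (ptr + 1) % len(order)      # pointer still advances one slot
    return counts

nodes = ["Node1", "Node2", "Node3", "Node4", "Node5"]

# All nodes up: an even 40/40/40/40/40 split of 200 tasks
print(round_robin(200, nodes, set())["Node1"])          # → 40

# Node1-Node3 down: Node4 absorbs their slots plus its own
counts = round_robin(200, nodes, {"Node1", "Node2", "Node3"})
print(counts["Node4"], counts["Node5"])                 # → 160 40
```

With three of five slots falling through to Node4 in every pass, Node4 receives four out of every five tasks, giving the 160/40 distribution shown above.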

    Many thanks.

    - Javi
  • benoit (Benoit Foucher), Rennes (France), Administrators, ZeroC Staff. Organization: ZeroC, Inc. Project: Ice
    Hi,

    Thanks for the bug report and the instructions; we were able to reproduce the issue. It's not the same problem as the one mentioned in the other thread (that problem was fixed a while ago), but it's similar. A fix for this bug will be included in the next Ice release!

    Cheers,
    Benoit.
  • javiroman (Javier Roman)
    Great!

    Is there any information (a roadmap) about the Ice release development cycle? Patch releases or something like that?

    Many thanks.

    - Javi