IceGrid round-robin load balancing wrong behaviour?

javiroman (Javier Roman), Madrid (Spain), Member. Organization: www.keedio.com. Project: Open Source in Enterprise
Hello!

I'm getting weird behaviour with the round-robin scheduling policy in IceGrid (3.4.0). I have to say I don't know whether this is incorrect behaviour (a bug) or normal behaviour from an IceGrid round-robin point of view.

In a grid of nodes, with one replica group per node, when the registry schedules tasks in round-robin fashion and runs into a node that is down (for example, the node has a network failure or has been shut down), the round-robin scheduler jumps to the next replica group in the list of nodes (the registry keeps the nodes sorted in alphabetical order) and assigns both the failed node's task and the next task to this new node. That node therefore ends up running two tasks: the failed node's task and its own.

The worst scenario is when the number of down nodes is large. For example, if we have 50 nodes and the first 25 nodes in alphabetical order are down, the registry sends those 25 failed tasks to the next node that is up, so the first up node in the alphabetical list receives a huge number of tasks. This behaviour breaks the load balancing of the grid.
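To make the scenario above concrete, here is a minimal Python model of the fall-through behaviour described (a sketch only, not IceGrid's actual implementation; the node names and task count are illustrative):

```python
def round_robin(num_tasks, nodes, down):
    """Model the reported behaviour: the round-robin pointer advances one
    slot per task through the alphabetical node list, but a down node's
    slot falls through to the next node in list order that is up."""
    order = sorted(nodes)
    counts = {n: 0 for n in order}
    ptr = 0
    for _ in range(num_tasks):
        j = ptr
        while order[j] in down:           # fall through past down nodes
            j = (j + 1) % len(order)
        counts[order[j]] += 1
        ptr = (ptr + 1) % len(order)      # pointer still advances one slot
    return counts

# 50 nodes, the first 25 down (alphabetically), 500 tasks launched
nodes = [f"Node{i:02d}" for i in range(1, 51)]
down = set(nodes[:25])
counts = round_robin(500, nodes, down)
print(counts["Node26"])   # → 260 (first up node absorbs every failed slot)
print(counts["Node27"])   # → 10  (the other up nodes get a fair share)
```

In every full pass over the 50 slots, Node26 receives the 25 failed tasks plus its own, while the remaining 24 up nodes receive one task each, which is the imbalance described above.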

Please, is this a bug, or is it the normal behaviour?

Many thanks.

- Javi

Comments

  • mes (Mark Spruiell), California, Administrators, ZeroC Staff. Organization: ZeroC, Inc. Project: Ice
    Hi Javi,

    Thanks for the report. We'll investigate this further and reply when we have more information for you.

    Best regards,
    Mark
  • javiroman (Javier Roman)
    The following is an example of this behaviour, for your information.

    With this classic application descriptor:
    <icegrid>
        <application name="HelloGridApplication">
            <replica-group id="HelloWorldReplicaGroup">
                <load-balancing type="round-robin"/>
                <object identity="ObjectIdentity1"/>
            </replica-group>

            <server-template id="HelloWorldTemplate">
                <parameter name="index"/>
                <server id="HelloServer-${index}"
                        exe="./server.py"
                        activation="on-demand">
                    <adapter name="HelloAdapter"
                             endpoints="default"
                             replica-group="HelloWorldReplicaGroup"/>
                </server>
            </server-template>

            <node name="Node4">
                <server-instance template="HelloWorldTemplate" index="0"/>
            </node>

            <node name="Node2">
                <server-instance template="HelloWorldTemplate" index="1"/>
            </node>

            <node name="Node1">
                <server-instance template="HelloWorldTemplate" index="2"/>
            </node>

            <node name="Node3">
                <server-instance template="HelloWorldTemplate" index="3"/>
            </node>

            <node name="Node5">
                <server-instance template="HelloWorldTemplate" index="4"/>
            </node>
        </application>
    </icegrid>
    

    If you launch 200 tasks:

    1. When the whole set of nodes is up, this is the normal scheduling:

    Node1 up -> 40 tasks
    Node2 up -> 40 tasks
    Node3 up -> 40 tasks
    Node4 up -> 40 tasks
    Node5 up -> 40 tasks

    2. Nevertheless, if (for example) the first three nodes in alphabetical order are down, the scheduling looks like this:

    Node1 down -> 0 tasks
    Node2 down -> 0 tasks
    Node3 down -> 0 tasks
    Node4 up -> 160 tasks
    Node5 up -> 40 tasks

    This scenario is critical if we are talking about thousands of tasks and hundreds of nodes.
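The 160/40 split above can be reproduced with a small Python model of the fall-through round robin (a sketch of the reported behaviour, not IceGrid's implementation; the node names match the descriptor):

```python
def round_robin(num_tasks, nodes, down):
    """The pointer advances one slot per task; a down node's slot falls
    through to the next node in alphabetical order that is up."""
    order = sorted(nodes)
    counts = {n: 0 for n in order}
    ptr = 0
    for _ in range(num_tasks):
        j = ptr
        while order[j] in down:           # fall through past down nodes
            j = (j + 1) % len(order)
        counts[order[j]] += 1
        ptr = (ptr + 1) % len(order)      # pointer still advances one slot
    return counts

nodes = ["Node1", "Node2", "Node3", "Node4", "Node5"]

# All nodes up: an even 40/40/40/40/40 split of 200 tasks
print(round_robin(200, nodes, set())["Node1"])          # → 40

# Node1-Node3 down: Node4 absorbs their slots plus its own
counts = round_robin(200, nodes, {"Node1", "Node2", "Node3"})
print(counts["Node4"], counts["Node5"])                 # → 160 40
```

With three of five slots falling through to Node4 in every pass, Node4 receives four out of every five tasks, giving the 160/40 distribution shown above.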

    Many thanks.

    - Javi
  • benoit (Benoit Foucher), Rennes (France), Administrators, ZeroC Staff. Organization: ZeroC, Inc. Project: Ice
    Hi,

    Thanks for the bug report and the instructions; we were able to reproduce the issue. It's not the same problem as the one mentioned in the other thread (that problem was fixed a while ago), but it's similar. A fix for this bug will be included in the next Ice release!

    Cheers,
    Benoit.
  • javiroman (Javier Roman)
    Great!

    Is there any information (a roadmap) about the Ice release development cycle? Patch releases or something like that?

    Many thanks.

    - Javi