Client starvation with Linux server

Hi, I'm investigating a problem that looks like client starvation, using Ice 3.3.1 on CentOS 5.3 Linux (kernel 2.6.18).

The scenario is as follows: I have ~320 clients connecting to a server. The server thread pool size is set to 4. The time to process one client request is consistently under 2ms (measured from entering to exiting the servant method). Nevertheless, from time to time one of the clients hits a 1-minute timeout.

Assuming a generous 8ms of Ice/networking/OS-scheduling overhead per request, the server should be able to process at least 100 requests per second even single-threaded (10ms per request). If clients were served in strict round-robin fashion, the worst-case delay for 320 clients would be on the order of 3-4 seconds. Hitting a 1-minute timeout therefore indicates a problem.
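For reference, a back-of-the-envelope check of these figures (the 8ms overhead is my assumption, not a measurement):

// Worst-case throughput/latency estimate for the numbers above.
const double msPerRequest   = 2.0 + 8.0;              // service time + assumed overhead
const double requestsPerSec = 1000.0 / msPerRequest;  // = 100 req/s on a single thread
const double worstDelaySec  = 320.0 / requestsPerSec; // = 3.2s round-robin delay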

With this background, I'm looking at some suspicious code in Selector.h and ThreadPool.cpp:

The following code in Selector::getNextSelected()

if(_nSelectedReturned == _nSelected)
{
    if(_count != _events.size())
    {
        _events.resize(_count);
    }
    return 0;
}

is never executed, because ThreadPool calls getNextSelected() only once per call to Selector::select(), and select() only returns when _nSelected != 0 (ignoring some exceptional conditions).

Thus, _events.size() is always 32, as set initially in the constructor.
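To make the call pattern concrete, here is a rough reconstruction of how ThreadPool drives the Selector (simplified, with made-up names; not the actual ThreadPool.cpp code):

// One select() is followed by exactly one getNextSelected(), so
// _nSelectedReturned never catches up with _nSelected and the
// resize branch above is dead code.
while(true)
{
    _selector.select();                                   // blocks until _nSelected > 0
    EventHandler* handler = _selector.getNextSelected();  // called once per select()
    if(handler)
    {
        dispatch(handler);                                // then back to select()
    }
}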

Now, the way the Linux kernel works, when a handle is returned from epoll but remains ready, it is moved to the end of the ready queue (see fs/eventpoll.c, function ep_reinject_items). One can certainly construct artificial scenarios where this causes indefinite starvation. Say we have exactly 64 clients and they are always "ready" (i.e., they send new requests as fast as the server can handle them). The kernel's ready queue then cycles as follows:

1-32, 33-64 -> epoll -> 33-64, 1-32 -> epoll -> 1-32, 33-64 -> and so on.

The only FDs that will be serviced in this scenario are 33 and 1. Read the loop in getNextSelected() below the "Round robin for the filedescriptors" comment to understand why.
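The rotation is easy to reproduce outside Ice with a small test program (my own sketch, scaled down to 4 FDs and maxevents=2 for readability; error checking omitted):

#include <sys/epoll.h>
#include <unistd.h>
#include <cstdio>

int main()
{
    const int numPipes  = 4;  // stands in for the 64 always-ready clients
    const int maxEvents = 2;  // stands in for the fixed _events.size() of 32

    int epfd = epoll_create(numPipes);
    for(int i = 0; i < numPipes; ++i)
    {
        int p[2];
        pipe(p);
        write(p[1], "x", 1);  // make the read end permanently ready
        epoll_event ev = {};
        ev.events = EPOLLIN; // level-triggered, so it stays on the ready list
        ev.data.fd = p[0];
        epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev);
    }

    epoll_event events[maxEvents];
    for(int round = 0; round < 4; ++round)
    {
        int n = epoll_wait(epfd, events, maxEvents, 0);
        printf("round %d:", round);
        for(int i = 0; i < n; ++i)
        {
            printf(" fd=%d", events[i].data.fd);
        }
        printf("\n");
    }
    return 0;
}

On the 2.6 kernels described above, the two halves of the ready list alternate between rounds, exactly as in the 1-32/33-64 diagram.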

As a possible fix: given how ThreadPool uses the Selector, I don't see why we should ever bother getting more than one ready FD from epoll. That would fix the starvation, AND reduce the work the kernel spends shuffling its linked lists around.
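For illustration, the change I have in mind is essentially this (a sketch with made-up member names, not a tested patch):

// Inside the selector, ask the kernel for a single ready FD per call.
epoll_event ev;
int n = epoll_wait(_epollFd, &ev, 1, -1);   // maxevents = 1
// The kernel returns the head of the ready list and, because the FD is
// level-triggered and still ready, reinjects it at the tail, so every
// ready FD is eventually serviced.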

Comments

  • benoit (Rennes, France)
    Hi Igor,

    Thanks for the bug report and detailed analysis. This problem will be fixed in the upcoming Ice 3.4.0 release, where we changed this code significantly: the Ice thread pool now gets a fixed number of FDs from epoll and processes them all before calling epoll again to get a new set of FDs.
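    Roughly, the new dispatch loop behaves like the following sketch (a simplification with illustrative names, not the actual 3.4.0 source):

    epoll_event events[32];
    while(true)
    {
        int n = epoll_wait(epollFd, events, 32, -1);
        for(int i = 0; i < n; ++i)
        {
            dispatch(events[i]);   // process every handler in the batch
        }
        // only now do we call epoll_wait() again for a new set of FDs,
        // so no FD within a batch can be starved
    }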

    Cheers,
    Benoit.
  • Great, thanks.

    Any hint on when 3.4.0 might be out? Also, is there public access to your under-development source tree, or do you only publish the source of released versions?

    Best regards!
  • benoit (Rennes, France)
    Hi Igor,

    Yes, we only publish the source of released versions. We don't know exactly when Ice 3.4.0 will be released yet; most likely not for at least another month.

    Cheers,
    Benoit.