Archived

This forum has been archived. Please start a new discussion on GitHub.

Very busy IceStorm service

With 40 users (which I assume isn't very many) broadcasting to each other about once a second, our IceStorm service seems extraordinarily busy, using roughly 100% CPU time on a 3 GHz dual-core Xeon machine, despite producing a network load of less than 1.5 Mbit/s, which is certainly no strain on the network it is running on.

The IceStorm service is running on an IceBox, and is currently configured as follows:
Ice.Default.Host=192.168.0.201
#
#Basic IceStorm box configuration
#
Ice.Default.Locator=IceGrid/Locator:tcp -p 5000
IceBox.Service.IceStorm=IceStormService,30:create
Freeze.DbEnv.IceStorm.DbHome=db/storm
IceStorm.TopicManager.Endpoints=tcp -p 9999
IceStorm.TopicManager.AdapterId=IceStorm
IceStorm.TopicManager.Proxy=IceStorm:tcp -p 9999
IceStorm.Publish.Endpoints=tcp
#Disables ACM (Active Connection Management) to avoid timeout on oneway connections
Ice.ACM.Client=0
Ice.ACM.Server=0
Ice.MonitorConnections=60
Ice.RetryIntervals=-1
#Overrides to avoid hanging on connections to clients with firewalls - In milliseconds
#Ice.Override.ConnectTimeout=250 #Value is an arbitrary guess - NHB
#Ice.Override.Timeout=250 #Value is an arbitrary guess - NHB

#
#Thread setup
#
Ice.ThreadPool.Client.Size=5
Ice.ThreadPool.Client.SizeWarn=800
Ice.ThreadPool.Client.SizeMax=1000

Ice.ThreadPool.Server.Size=5
Ice.ThreadPool.Server.SizeWarn=800
Ice.ThreadPool.Server.SizeMax=1000

#Ice.GC.Interval=300
#Ice.Trace.GC=2

#
#Tracing for debugging purposes
#
IceStorm.Trace.Topic=2
IceStorm.Trace.Subscriber=1
Ice.Trace.Network=2
Ice.Trace.Retry=2
Ice.Trace.Protocol=1
#Ice.Warn.Connections=1
Ice.MessageSizeMax=4096

So it is currently doing a lot of tracing (because we have been trying to track down a possible 'clogging' of certain IceStorm topics, somewhat similar to what is described in http://www.zeroc.com/vbulletin/showthread.php?t=2171), and if that would explain the load we could certainly disable the tracing, at least for now. The IceStorm service may also be seeing a high number of dead subscribers that haven't properly unsubscribed, as we currently have some client crash issues.

It might also be a perfectly reasonable CPU load; it just seemed quite busy compared to the network load, so I figured I would ask. We can handle the current load, but in case it is a sign of things to come when we want to have 1000 clients, I wanted to know if there was something obvious I had overlooked.

Comments

  • matthew
    matthew NL, Canada
    What reliability QoS are you using? Assuming you are not using batch, you are probably doing 1600 RPCs a second (40 messages published once a second to 40 subscribers), so I'm not terribly surprised that the load is quite high :) You should probably enable batch processing and set the flush interval to a reasonable value for your system. If you, for example, selected a 2-second flush interval, you could probably handle 3200 connected clients with about the same load.

    Also why have you configured so many threads for the client/server thread pool?
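Matthew's arithmetic can be sketched as a quick back-of-the-envelope calculation, using the illustrative numbers from the thread:

```python
# Back-of-the-envelope load estimate for IceStorm fan-out.
# 40 clients each publish once per second; with batching off, every
# message is forwarded to all 40 subscribers as an individual RPC.
publishers = 40
subscribers = 40
publish_rate = 1  # messages per publisher per second

rpcs_per_second = publishers * publish_rate * subscribers
print(rpcs_per_second)  # 1600 forwarded requests per second

# With "reliability = batch" and a 2-second flush interval, forwarding
# collapses to one batch request per subscriber connection per flush:
flush_interval = 2  # seconds
batch_rpcs_per_second = subscribers / flush_interval
print(batch_rpcs_per_second)  # 20 batch requests per second
```

The batched figure counts requests, not messages; each batch carries all the updates accumulated for that connection since the last flush.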
  • matthew wrote:
    What reliability QoS are you using?

    Nothing explicitly configured, but batching would probably be quite acceptable. All subscription and publishing happens via one-way proxies if that makes any difference.
    matthew wrote:
    Assuming you are not using batch, you are probably doing 1600 RPCs a second (40 messages published once a second to 40 subscribers), so I'm not terribly surprised that the load is quite high :) You should probably enable batch processing and set the flush interval to a reasonable value for your system. If you, for example, selected a 2-second flush interval, you could probably handle 3200 connected clients with about the same load.

    It sounds like there was something obvious I overlooked, so I'll certainly look at that. Does this batching have to be added for each subscriber, or can it be configured globally for the whole service, or from the publisher end?

    Also, we are currently transmitting on a number of different channels, as that was the best model from an OO point of view, but with batching I assume it might be more efficient to gather as much communication as possible in one subscriber interface.
    matthew wrote:
    Also why have you configured so many threads for the client/server thread pool?

    Historical reasons - at some point it seemed to improve performance quite handsomely and only cost some extra memory, which was one resource we had plenty of. I've actually just forgotten to adjust it back to a more reasonable level since.
  • benoit
    benoit Rennes, France
    Yes, batching would probably be useful in your case to increase the throughput. You need to change the way your clients subscribe to use batching. They should specify the "reliability = batch" QoS (see the Ice manual for more information on how to use IceStorm QoS). You should also configure the IceStorm.Flush.Timeout property (or <IceStorm service name>.Flush.Timeout property).

    You can still use as many topics as you want; it won't have any effect on the batching. The batching is done at the connection level -- if your subscriber subscribes to 10 different topics, all the updates will be received over the same connection in batches, regardless of the topic.

    I'm assuming that IceStorm sends the updates to your subscribers through Glacier2, correct? If that's the case, you shouldn't need so many threads, as your IceStorm service will only be sending oneway requests to Glacier2, and sending these requests should never block. Too many threads might cause some additional overhead. I would recommend using a fixed-size thread pool with 2-3 threads for the server-side thread pool and the default configuration for the Ice client thread pool.

    Cheers,
    Benoit.
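A minimal sketch of the client-side change Benoit describes, assuming Ice for Python; the topic name and proxy variables are hypothetical, and the subscribe call itself is commented out because it needs a live IceStorm service:

```python
# IceStorm QoS is a plain string-to-string map; selecting batch
# delivery per the Ice manual's IceStorm QoS section.
qos = {"reliability": "batch"}

# Hypothetical subscription against a running IceStorm (names are
# illustrative, not taken from the thread's actual interfaces):
#   topic = topic_manager.retrieve("SomeTopic")
#   topic.subscribe(qos, subscriber_proxy)

# Service side: the flush interval is a configuration property, e.g.
#   IceStorm.Flush.Timeout=1000
print(qos["reliability"])  # batch
```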
  • benoit wrote:
    Yes, batching would probably be useful in your case to increase the throughput. You need to change the way your clients subscribe to use batching. They should specify the "reliability = batch" QoS (see the Ice manual for more information on how to use IceStorm QoS). You should also configure the IceStorm.Flush.Timeout property (or <IceStorm service name>.Flush.Timeout property).

    Yup, I found those things. I assume there is no way to control the Flush.Timeout on a per-topic or per-connection basis - something more fine-grained than one timeout for the whole service? Not that it is that important; it's just that since you specify the QoS on a per-subscriber basis, it seemed logical that you might want to do the same for the timeout.
    benoit wrote:
    You can still use as many topics as you want; it won't have any effect on the batching. The batching is done at the connection level -- if your subscriber subscribes to 10 different topics, all the updates will be received over the same connection in batches, regardless of the topic.

    Excuse me for being a bit dim here, but what exactly constitutes a connection in this context? For each client I'm subscribing several different proxies, tied to several different objects, to different topics. However, all the objects (and thus all the proxies) are registered with the same adapter. So is that one connection or several?
    benoit wrote:
    I'm assuming that IceStorm sends the updates to your subscribers through Glacier2, correct? If that's the case, you shouldn't need so many threads, as your IceStorm service will only be sending oneway requests to Glacier2, and sending these requests should never block. Too many threads might cause some additional overhead. I would recommend using a fixed-size thread pool with 2-3 threads for the server-side thread pool and the default configuration for the Ice client thread pool.

    Ok - I'll get right on cleaning that up then.

    Best regards,

    Nis
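Benoit's thread-pool suggestion would translate into properties along these lines (a sketch; the sizes are the ballpark he suggests, not tested values):

```
#Fixed-size server-side pool with 2-3 threads
Ice.ThreadPool.Server.Size=3
Ice.ThreadPool.Server.SizeMax=3
#Client side: drop the Ice.ThreadPool.Client.* overrides entirely to fall back to the Ice defaults
```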
  • benoit
    benoit Rennes, France
    Excuse me for being a bit dim here, but what exactly constitutes a connection in this context? For each client I'm subscribing several different proxies, tied to several different objects, to different topics. However, all the objects (and thus all the proxies) are registered with the same adapter. So is that one connection or several?

    That's one connection if the objects are registered with the same object adapter.

    Actually, if your IceStorm service sends updates through a Glacier2 router to your clients, the updates will be sent over the connection between IceStorm and Glacier2. Glacier2 will then forward the updates to each client.

    Cheers,
    Benoit.
  • benoit
    benoit Rennes, France
    Btw, it's not possible to specify the flush timeout on a per-subscriber basis, because the batching is done at the connection level.

    Cheers,
    Benoit.
  • benoit wrote:
    Btw, it's not possible to specify the flush timeout on a per-subscriber basis, because the batching is done at the connection level.

    That was what I expected, but I just wanted to be sure.
  • benoit wrote:
    That's one connection if the objects are registered with the same object adapter.

    Actually, if your IceStorm service sends updates through a Glacier2 router to your clients, the updates will be sent over the connection between IceStorm and Glacier2. Glacier2 will then forward the updates to each client.

    Cheers,
    Benoit.

    Great - Saves me modifying my interfaces to get full use out of the batching.
  • benoit wrote:
    I'm assuming that IceStorm sends the updates to your subscribers through Glacier2, correct? If that's the case, you shouldn't need so many threads, as your IceStorm service will only be sending oneway requests to Glacier2, and sending these requests should never block. Too many threads might cause some additional overhead. I would recommend using a fixed-size thread pool with 2-3 threads for the server-side thread pool and the default configuration for the Ice client thread pool.

    We've now introduced batching (with a 1-second timer) on the subscribers, and I've tried reducing the thread pools to
    Ice.ThreadPool.Client.Size=5
    Ice.ThreadPool.Client.SizeWarn=5
    Ice.ThreadPool.Client.SizeMax=5
    
    Ice.ThreadPool.Server.Size=5
    Ice.ThreadPool.Server.SizeWarn=5
    Ice.ThreadPool.Server.SizeMax=5
    

    And now we are seeing a lot of thread pool warnings on the server thread pool, so it seems like it could easily use a much larger thread pool. It especially seems to have problems coping with high numbers of 'subscribe' requests.
  • marc
    marc Florida
    Disable the warnings. All they show is that all threads in the pool are being used, which is expected for a high-throughput service. Adding more threads won't help performance. On the contrary, it would degrade performance, because there would be more task switching.
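If the warnings themselves are the only concern, the SizeWarn thresholds can be adjusted; setting SizeWarn to 0 should disable the warning entirely (per the Ice property reference):

```
#Keep the pools at 5 threads but suppress the in-use warning
Ice.ThreadPool.Server.SizeWarn=0
Ice.ThreadPool.Client.SizeWarn=0
```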
  • marc wrote:
    Disable the warnings. All they show is that all threads in the pool are being used, which is expected for a high-throughput service. Adding more threads won't help performance. On the contrary, it would degrade performance, because there would be more task switching.

    Fair enough. I'll do that first thing tomorrow.