Throttling of messages in Icestorm

mangrish · October 2009

Hi,

We are using IceStorm v3.3.1 and we are experiencing some strange throttling behaviour.

The system design involves a Java process with 25 threads, managed by a Java executor, publishing onto an IceStorm topic in ice_oneway mode. Our system generates approximately 10k messages per second, each approximately 40 bytes long. During peak we can afford delays of up to 100ms. We have a subscriber running on the same box which picks up these messages and processes them. We also have 'lighter' subscribers on different computers that typically just view the messages on the topic.

When no subscriber is attached our CPU usage is quite small (about 20%). When we attach the local subscriber, the CPU usage of our icebox process and the consumer process goes through the roof. When we attach a remote subscriber, the icebox process's memory usage goes ridiculously high and CPU is again remarkably high. Typically our colocated Ice consumer and the icebox process together use far more CPU than the server process generating the messages!

Some questions:
* As CPU gets closer to 100%, the ability to put messages onto IceStorm slows down and the transport time to the consumer increases. This is unacceptable for us, so we need to find a way to improve it. We aren't sure why the icebox process needs so much CPU. What could be causing this? If there is no way around it, do we have to assume that IceStorm will consume more system resources than other processes and make sure that we don't load our system any higher than 80%?
* We are running IceStorm in its default mode. We have tried increasing the number of client threads but this seems to have no effect. What configuration settings can we use to improve performance?
* Is there something that IceStorm clients should be doing to reduce their CPU overhead? If our clients consume too slowly, what can we do to optimise our processing? Our 'light' viewer process usually consumes 20% CPU on a server-class machine... this seems wrong!

I have attached some of our files:
config.service:

#
# The IceStorm service instance name.
#
IceStorm.InstanceName=DemoIceStorm

#
# This property defines the endpoints on which the IceStorm
# TopicManager listens.
#
IceStorm.TopicManager.Endpoints=default -p 10000

#
# This property defines the endpoints on which the topic
# publisher objects listen. If you want to federate
# IceStorm instances this must run on a fixed port (or use
# IceGrid).
#
IceStorm.Publish.Endpoints=tcp -p 10001:udp -p 10001

#
# TopicManager Tracing
#
# 0 = no tracing
# 1 = trace topic creation, subscription, unsubscription
# 2 = like 1, but with more detailed subscription information
#
IceStorm.Trace.TopicManager=0

#
# Topic Tracing
#
# 0 = no tracing
# 1 = trace unsubscription diagnostics
#
IceStorm.Trace.Topic=0

#
# Subscriber Tracing
#
# 0 = no tracing
# 1 = subscriber diagnostics (subscription, unsubscription, event
#     propagation failures)
#
IceStorm.Trace.Subscriber=0

#
# Amount of time in milliseconds between flushes for batch mode
# transfer. The minimum allowable value is 100ms.
#
IceStorm.Flush.Timeout=100

#
# Network Tracing
#
# 0 = no network tracing
# 1 = trace connection establishment and closure
# 2 = like 1, but more detailed
# 3 = like 2, but also trace data transfer
#
#Ice.Trace.Network=0

#
# This property defines the home directory of the Freeze 
# database environment for the IceStorm service.
#
Freeze.DbEnv.IceStorm.DbHome=C:\icestation\icestorm\db


Ice.ThreadPool.Client.Size=25
Ice.ThreadPool.Client.SizeMax=35
Ice.ThreadPool.Client.SizeWarn=30
Ice.ThreadPool.Server.Size=4
Ice.ThreadPool.Server.SizeMax=16
Ice.Default.CollocationOptimized=1

config.icebox:

#
# The IceBox server endpoint configuration
#
IceBox.ServiceManager.Endpoints=tcp -p 9998

#
# The IceStorm service
#
IceBox.Service.IceStorm=IceStormService,33:createIceStorm --Ice.Config=C:\icestation\icestorm\config\config.service

#
# Warn about connection exceptions
#
#Ice.Warn.Connections=1

#
# Network Tracing
#
# 0 = no network tracing
# 1 = trace connection establishment and closure
# 2 = like 1, but more detailed
# 3 = like 2, but also trace data transfer
#
#Ice.Trace.Network=1

#
# Protocol Tracing
#
# 0 = no protocol tracing
# 1 = trace protocol messages
#
#Ice.Trace.Protocol=1

Thanks for your help!

::mark

benoit · October 2009

Hi Mark,

What you describe is typical of an IceStorm service that can't keep up with the distribution of the events, or of subscribers that are too slow to consume the incoming events. IceStorm consuming 100% CPU for just one subscriber seems a bit too much but it's hard to say without more information whether or not this is expected and what could be the reason.

Could you describe your deployment in a bit more detail? The following information would help us understand what can be changed to improve the situation:

operating system
whether or not you built Ice yourself (if this is the case, are you using a debug or optimized version?)
number of expected subscribers
the kind of network used by your subscribers to connect to IceStorm
the QoS used by the subscribers to subscribe (oneway, twoway, etc?)
the type of the events you are typically sending (whether or not they contain strings, Slice classes, etc)

You shouldn't have to tweak the thread pool size for IceStorm. Too many threads can actually have a negative effect on IceStorm performance because of additional thread context switches. So I recommend removing the thread pool property settings.

Cheers,
Benoit.

mangrish · October 2009

benoit wrote: »

IceStorm consuming 100% CPU for just one subscriber seems a bit too much but it's hard to say without more information whether or not this is expected and what could be the reason.

IceStorm itself doesn't get to 100%. Our publishing process runs at around 30% CPU, but then IceStorm runs at around 20% and our colocated subscriber consumes around 20%. When there is a spike of activity and 100% CPU is reached, IceStorm seems to lock up, stops accepting any more requests to publish, and takes a long time to send data to the colocated subscriber (and anyone else attached, for that matter).

benoit wrote: »

operating system

whether or not you built Ice yourself (if this is the case, are you using a debug or optimized version?)

number of expected subscribers

the kind of network used by your subscribers to connect to IceStorm

the QoS used by the subscribers to subscribe (oneway, twoway, etc?)

the type of the events you are typically sending (whether or not they contain strings, Slice classes, etc)

* Production: Windows Server 2003 R2 x64. Development: Windows XP SP3.
* We didn't build Ice... we just used the distro that comes bundled with Visual Studio.
* 1 subscriber colocated. This is a special subscriber that really cares about receiving every message as soon as the server generates it. Is there something special we can configure for this since it's colocated? Or will IceStorm be clever enough to work this out and skip the extra overhead of a network call? Plus 1-10 subscribers on other computers, mostly just monitoring applications.
* Not sure what you are after in terms of network... it's just an internal network, with no firewalls or allowable access to external connections.
* At the moment we are using ice_oneway(), but would there be an advantage to batching?
* Here is the Slice:

    sequence<byte> ByteList;

    class ProxiedInstrumentPrice
    {
        string ric;
        double unadjustedPrice;
        double adjustedPrice;
        ByteList reutersPersmission;
    };

    interface MarketData
    {
        void publish(ProxiedInstrumentPrice proxiedInstrumentPrice);
        void heartBeat();
    };

benoit wrote: »

You shouldn't have to tweak the thread pool size for IceStorm. Too many threads can actually have a negative effect on IceStorm performance because of additional thread context switches. So I recommend removing the thread pool property settings.

So in terms of application development:
* Is placing messages onto IceStorm with 25 threads sharing one ObjectPrx implementation from our publisher the correct mechanism to use? Is there perhaps some sort of Ice thread that would be better to use?
* With JMS I would normally have x threads consuming off the topic and executing logic. I understand that when I create an implementation of ObjectImpl (a _Disp object), IceStorm's threading model will call this code. How can I control the threading? How can I tune it to find an optimal setting?

benoit · October 2009

Which language mapping do you use for the subscriber and publisher, is it Java or C++?

If the subscriber is running on the same machine as the IceStorm service, it will use the TCP/IP loopback interface; there are no special optimizations that would allow it to bypass the TCP/IP stack. Setting Ice.Default.CollocationOptimized=1 for the IceStorm service won't have any effect, because IceStorm can't use the collocation optimization.

I was wondering what kind of internal network your application will be running on, a gigabit Ethernet network I guess?

Subscribing using a oneway proxy is fine. Using batching would be fine too, it would probably increase the throughput but then it would be at the expense of latency (i.e.: the messages wouldn't arrive as soon as possible but only at regular intervals).
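To put rough numbers on that latency trade-off, here is a back-of-the-envelope sketch in Java. It assumes batches are flushed on a fixed timer, as the comment next to IceStorm.Flush.Timeout in the posted config suggests, and it uses the 10k messages per second figure from the first post. The class and method names are invented for illustration and nothing here calls Ice.

```java
// Rough estimate only: models a timer-driven batch flush, not IceStorm internals.
final class BatchLatencyEstimate {
    /** Messages that pile up between two flushes at a steady publish rate. */
    static long messagesPerBatch(long messagesPerSecond, long flushIntervalMs) {
        return messagesPerSecond * flushIntervalMs / 1000;
    }

    /** A message queued right after a flush waits one full interval. */
    static long worstCaseAddedDelayMs(long flushIntervalMs) {
        return flushIntervalMs;
    }

    public static void main(String[] args) {
        long rate = 10_000;   // messages per second, from the first post
        long flushMs = 100;   // IceStorm.Flush.Timeout value in the posted config
        System.out.println("messages per batch: " + messagesPerBatch(rate, flushMs));
        System.out.println("worst-case added delay (ms): " + worstCaseAddedDelayMs(flushMs));
    }
}
```

Under these assumptions a 100 ms flush interval can by itself use up the whole 100 ms delay budget mentioned in the first post, before any network or processing time is counted.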

It is fine to call a proxy from 25 threads. The sending of the requests will be serialized on the connection between the publisher and IceStorm which is ok too because the messages are queued in the same queue in the end.

The Ice threading model is explained in the Ice manual. In short, by default the Ice server thread pool has a single thread. Increasing the number of threads can help in circumstances where the dispatch of an invocation can potentially block (because it's making a synchronous Ice call on another service, trying to acquire some resource, or executing an SQL query, for example). In this case, another thread can dispatch another request while this thread is sleeping. However, if the dispatched operations never block (i.e. they're always runnable), there isn't much point in increasing the number of threads beyond the number of cores of the machine. What are the specifications of your machine (how many cores, frequency)? In any case, for IceStorm, increasing the number of threads isn't useful because its internal logic never requires blocking.

I'm afraid in your case, it looks like your machine simply can't keep up with the number of invocations. 10k messages per second is not a small number, and with 10 subscribers you're looking at IceStorm forwarding 100k invocations per second... Can your subscriber actually keep up with this number of invocations? Did you try to send the messages directly to the subscriber from the publisher without IceStorm to see how it performed? It would also be interesting to try to run the subscriber on a separate machine to see if it helps (I suspect it will since your problem seems to be lack of CPU).
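The fan-out arithmetic behind that 100k figure can be written down as a small Java sketch. It only multiplies the numbers given in this thread (10k messages per second, roughly 40 bytes each, up to 10 subscribers); it counts payload bytes only and ignores Ice protocol headers, and the class and method names are made up for the example.

```java
// Back-of-the-envelope fan-out arithmetic; payload only, protocol overhead is not modelled.
final class FanOutEstimate {
    /** Invocations per second IceStorm must forward: one per event per subscriber. */
    static long forwardedPerSecond(long publishedPerSecond, int subscribers) {
        return publishedPerSecond * subscribers;
    }

    /** Payload bytes per second leaving IceStorm, ignoring Ice message headers. */
    static long payloadBytesPerSecond(long forwardedPerSecond, int payloadBytes) {
        return forwardedPerSecond * payloadBytes;
    }

    public static void main(String[] args) {
        long published = 10_000;  // messages per second, from the first post
        int payload = 40;         // approximate bytes per message, from the first post
        for (int subscribers : new int[] {1, 5, 10}) {
            long forwarded = forwardedPerSecond(published, subscribers);
            System.out.println(subscribers + " subscriber(s): " + forwarded
                    + " invocations/s, " + payloadBytesPerSecond(forwarded, payload)
                    + " payload bytes/s");
        }
    }
}
```

At 10 subscribers the payload is only about 4 MB per second, which suggests that the per-invocation work, rather than raw bandwidth, is the thing to watch.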

In any case, if the publisher is producing more messages than the subscriber can handle, the messages will start building up in IceStorm. The IceStorm service will start consuming more memory to queue all these messages, which will in turn eventually cause memory swapping which will slow things down even more... So since there's no flow-control mechanism to slow down the publisher if the subscriber can't keep up, it's important to ensure there's always enough CPU/network bandwidth on the subscriber side to make sure the subscriber will always be able to consume the incoming events in a timely manner, especially if the publisher publishes messages at a fixed rate.

Perhaps you can describe a little more why your publisher needs to publish that many events per second? Can you perhaps aggregate some of the messages? Does your subscriber need to process all the messages or can some be dropped?

Cheers,
Benoit.

mangrish · October 2009

benoit wrote: »

Which language mapping do you use for the subscriber and publisher, is it Java or C++?

Java

benoit wrote: »

I was wondering what kind of internal network your application will be running on, a gigabit Ethernet network I guess?

Ah... yep... it's a gigabit Ethernet network.

benoit wrote: »

However, if the dispatched operations never block (i.e. they're always runnable), there isn't much point in increasing the number of threads beyond the number of cores of the machine. What are the specifications of your machine (how many cores, frequency)?

Production is currently 8 cores (Intel Xeon 5130 @2GHz, Windows Server 2003), but we are moving to a 32 core Solaris 10 box next month. How can we improve our IceStorm configuration to take advantage of this?

benoit wrote: »

I'm afraid in your case, it looks like your machine simply can't keep up with the number of invocations. 10k messages per second is not a small number, and with 10 subscribers you're looking at IceStorm forwarding 100k invocations per second... Can your subscriber actually keep up with this number of invocations? Did you try to send the messages directly to the subscriber from the publisher without IceStorm to see how it performed? It would also be interesting to try to run the subscriber on a separate machine to see if it helps (I suspect it will since your problem seems to be lack of CPU).

Indeed the subscriber seems to keep up. We do see a dramatic increase in memory and CPU when another subscriber is attached from a different machine though. I mostly have a queue/topic background (Tibco EMS, MQ Series etc.), so I was just surprised to see our icebox.exe process take a majority of the CPU, compared against our server, with just 1 colocated subscriber attached. Our colocated subscriber's CPU usage also seemed remarkably high for such a simple processor of these messages. I will run the two processes as one and I'll let you know how that goes. With no subscribers, our production application runs at about 15% CPU on my developer box (half the spec of production), and icebox.exe at about 5%. When we attach our colocated subscriber, icebox goes to about 20% and our consumer to about 25% (not accounting for spikes which make the box hit 100%, causing our subscriber to stop sending processed data downstream... blips, if you will).

benoit wrote: »

In any case, if the publisher is producing more messages than the subscriber can handle, the messages will start building up in IceStorm. The IceStorm service will start consuming more memory to queue all these messages, which will in turn eventually cause memory swapping which will slow things down even more... So since there's no flow-control mechanism to slow down the publisher if the subscriber can't keep up, it's important to ensure there's always enough CPU/network bandwidth on the subscriber side to make sure the subscriber will always be able to consume the incoming events in a timely manner, especially if the publisher publishes messages at a fixed rate.

Unless we add remote subscribers, we don't see an increase in memory consumption.

benoit wrote: »

Perhaps you can describe a little more why your publisher needs to publish that many events per second? Can you perhaps aggregate some of the messages? Does your subscriber need to process all the messages or can some be dropped?

We are working on a 24x7 pricing project (banking). We can't really afford blips (periods where the box hits 100% and the icebox/colocated subscriber stops publishing to our Reuters network). Since the server only runs at 25% max CPU... I have to work out where the other 75% of the box goes! We are going to change our colocated subscriber to use a map-backed queue (retaining processing order, but refreshing a message with a newer price if it has not been processed yet).
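A minimal sketch of such a map-backed queue in Java, assuming prices are keyed by the ric string from the Slice definition earlier in the thread. The class name, the poll-based API and the sample RIC values are invented for illustration and are not taken from the poster's code.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * One pending slot per key, served in the order keys first arrived.
 * A newer value overwrites an older one that has not been processed yet.
 */
final class ConflatingQueue<K, V> {
    // Insertion-ordered map: re-putting an existing key keeps its original position.
    private final LinkedHashMap<K, V> pending = new LinkedHashMap<>();

    /** Called from the dispatch thread; only does a map update. */
    synchronized void offer(K key, V value) {
        pending.put(key, value);
    }

    /** Called from the processing thread; returns null when nothing is pending. */
    synchronized V poll() {
        Iterator<Map.Entry<K, V>> it = pending.entrySet().iterator();
        if (!it.hasNext()) {
            return null;
        }
        V value = it.next().getValue();
        it.remove();
        return value;
    }

    synchronized int size() {
        return pending.size();
    }

    public static void main(String[] args) {
        ConflatingQueue<String, Double> queue = new ConflatingQueue<>();
        queue.offer("RIC.A", 1.00);   // hypothetical instrument codes
        queue.offer("RIC.B", 2.00);
        queue.offer("RIC.A", 1.05);   // replaces the unprocessed 1.00, keeps A's place
        System.out.println("pending: " + queue.size());
        System.out.println("first:  " + queue.poll());  // latest price for RIC.A
        System.out.println("second: " + queue.poll());  // price for RIC.B
    }
}
```

The queue can never hold more entries than there are distinct keys, so a slow consumer falls behind on freshness rather than on memory.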

benoit · October 2009

Hi Mark,

mangrish wrote: »

Production is currently 8 cores (Intel Xeon 5130 @2GHz, Windows Server 2003), but we are moving to a 32 core Solaris 10 box next month. How can we improve our IceStorm configuration to take advantage of this?

With that many cores, it could be worth increasing the number of threads in the publisher thread pool using the IceStorm.Publish.ThreadPool.Size property. You could for example test with 2 or 4 threads. This might increase IceStorm throughput.

However, increasing IceStorm throughput might actually make things worse if your subscribers can't keep up! So you first need to ensure that subscribers are capable of consuming the events at the rate the publisher sends them.

Otherwise, if for example the publisher publishes 10000 events/s and the subscriber can only process 8000 events/s, 2000 events/s will accumulate in the IceStorm service and this will quickly lead to IceStorm consuming lots of memory/CPU...
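That accumulation can be put into numbers with a small Java sketch. It assumes constant publish and consume rates and no flow control, which is a simplification, and the class and method names are illustrative only.

```java
// Illustrative arithmetic only: assumes constant rates and no flow control.
final class BacklogEstimate {
    /** Events queued after the given time when publishing outpaces consumption. */
    static long backlogAfter(long publishPerSecond, long consumePerSecond, long seconds) {
        long surplus = Math.max(0, publishPerSecond - consumePerSecond);
        return surplus * seconds;
    }

    public static void main(String[] args) {
        long publish = 10_000; // events per second sent by the publisher
        long consume = 8_000;  // events per second the subscriber can process
        for (long seconds : new long[] {1, 60, 600}) {
            System.out.println(seconds + " s: " + backlogAfter(publish, consume, seconds)
                    + " events queued");
        }
    }
}
```

With these rates the backlog reaches 120,000 events after one minute and keeps growing for as long as the subscriber stays behind.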

So it could also be interesting to increase the number of threads for your local subscriber if it's capable of processing incoming events from IceStorm concurrently as this would allow it to process more events per second. To increase the number of threads for your subscriber you can set Ice.ThreadPool.Server.Size (assuming it didn't create a per object adapter thread pool).
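As a rough illustration, the subscriber-side setting might look like the fragment below. The values are placeholders to experiment with, not recommendations, and the fragment assumes the subscriber loads a standard Ice properties file and dispatches through the default server thread pool.

```
#
# Subscriber configuration (illustrative values only)
#
Ice.ThreadPool.Server.Size=4
Ice.ThreadPool.Server.SizeMax=4
```

With more than one thread in the pool, events can be dispatched concurrently, so the subscriber's servant has to be thread-safe and should not rely on events being processed strictly in arrival order.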

mangrish wrote: »

Indeed the subscriber seems to keep up. We do see a dramatic increase in memory and CPU when another subscriber is attached from a different machine though.

Yes, you need to figure out why this other subscriber can't deal with the flow of incoming events. Is it the network latency which is too high? Is it too slow to process the messages?

To figure this out, you can try sending the events directly from the publisher using synchronous (i.e.: non-AMI) oneway calls and measure how many events per second the publisher is able to push to the subscriber.
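A minimal harness for that kind of measurement might look like the following. It is plain Java and does not call Ice: the Consumer<byte[]> sink is a placeholder for the synchronous oneway invocation on the subscriber's proxy, and the class and method names are invented for this sketch.

```java
import java.util.function.Consumer;

/**
 * Measures how many events per second a caller can push into a sink.
 * The sink is a stand-in: in the real test it would wrap the synchronous
 * oneway call on the subscriber's proxy.
 */
final class ThroughputProbe {
    static double eventsPerSecond(Consumer<byte[]> sink, int eventCount, int payloadBytes) {
        byte[] payload = new byte[payloadBytes];
        long start = System.nanoTime();
        for (int i = 0; i < eventCount; i++) {
            sink.accept(payload);
        }
        // Guard against a zero elapsed time on very fast runs.
        long elapsedNanos = Math.max(1, System.nanoTime() - start);
        return eventCount * 1_000_000_000.0 / elapsedNanos;
    }

    public static void main(String[] args) {
        long[] received = {0};
        // Placeholder sink that only counts; replace with the real proxy call.
        Consumer<byte[]> countingSink = bytes -> received[0]++;
        double rate = eventsPerSecond(countingSink, 1_000_000, 40);
        System.out.printf("pushed %d events at %.0f events/s%n", received[0], rate);
    }
}
```

Running the same loop against the local and the remote subscriber would show whether either falls below the 10k events per second the publisher produces.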

mangrish wrote: »

I mostly have a queue/topic background (Tibco EMS, MQ Series etc.), so I was just surprised to see our icebox.exe process take a majority of the CPU, compared against our server, with just 1 colocated subscriber attached. Our colocated subscriber's CPU usage also seemed remarkably high for such a simple processor of these messages. I will run the two processes as one and I'll let you know how that goes. With no subscribers, our production application runs at about 15% CPU on my developer box (half the spec of production), and icebox.exe at about 5%. When we attach our colocated subscriber, icebox goes to about 20% and our consumer to about 25% (not accounting for spikes which make the box hit 100%, causing our subscriber to stop sending processed data downstream... blips, if you will).

I'm afraid it's difficult to say whether or not this CPU usage is abnormal. Perhaps you could create a small test case using the demo/IceStorm/clock as a basis to try to reproduce your application environment and typical load? I'll be happy to take a look and check whether or not it works as expected.

mangrish wrote: »

Unless we add remote subscribers, we don't see an increase in memory consumption.

We are working on a 24x7 pricing project (banking). We can't really afford blips (periods where the box hits 100% and the icebox/colocated subscriber stops publishing to our Reuters network). Since the server only runs at 25% max CPU... I have to work out where the other 75% of the box goes! We are going to change our colocated subscriber to use a map-backed queue (retaining processing order, but refreshing a message with a newer price if it has not been processed yet).

Did you consider using a more distributed setup where for example the publisher publishes events on different IceStorm service instances running on different machines?

Btw note that with Java subscribers, it will be important to correctly tune the garbage collection to use a low pause collector. Otherwise, the GC could cause long pauses of the Java process to perform garbage collections and this would in turn cause IceStorm to start quickly accumulating events.
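One way to check whether garbage collection is a factor is to log the JVM's own GC counters from inside the subscriber. The sketch below uses the standard java.lang.management API; the class name is made up, and the reported figure is cumulative collection time, which is only a rough proxy for pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Prints cumulative GC counts and times for the current JVM. */
final class GcReport {
    /** Total milliseconds spent in collections so far, across all collectors. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long time = gc.getCollectionTime(); // -1 if the collector does not report it
            if (time > 0) {
                total += time;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms");
        }
        System.out.println("total GC time (ms): " + totalGcMillis());
    }
}
```

Sampling totalGcMillis() periodically and comparing it with the times at which the blips occur would give a first indication of whether the two are related.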

Cheers,
Benoit.
