Minor IceStorm enhancements

Hi ZeroC,

1. Are there any design or performance issues behind IceStorm supporting only one level of topic linking? I can't see any, and I'd like to be able to use n-level linking to support geographical IceStorm "distributors" without needing to unmarshal/remarshal the event in some propagating stub code.

From a (quick) review of LinkSubscriber.cpp, I would change line 83 from:
    if(event->forwarded || (_cost > 0 && event->cost > _cost))
to
    if(_cost > 0 && event->cost > _cost)

I'd need to be aware of the potential for cyclic subscriptions, etc.

2. Do you think a pool of publisher threads would perform any faster than a serial loop iteration when dealing with slow subscribers? I.e., assign n threads so that thread k dispatches to subscribers k, k+n, k+2n, and so on. The case I'm thinking of is serving a public internet audience, 10-20% of whom may still be on 56K connections, distributed anywhere on the globe, downloading MP3s at the same time. In the case of large or high-volume messages, wouldn't these subscribers penalise the rest if the publish is done serially? I'm no networking expert, so the real cost may be negligible...

TopicI.cpp line 228
    for(SubscriberList::iterator p = copy.begin(); p != copy.end(); ++p)
    {
        (*p)->publish(event);
    }

If you think this is a reasonable improvement, I'm happy to take a stab at the code changes, maybe also including copy-on-write for Event, or converting Event references to EventPtr references to save data copying as well (per your notes in Event.h).

Thanks.

Comments

  • matthew
    matthew NL, Canada
    I'm not sure why you want to support N-level linking directly in your application. The idea of the existing topic linking is to support this through an external application. You describe the topic graph, the cost of each segment, and a maximum cost. The application then computes the connectivity and links everything up directly with the appropriate cost.

    For example, if you have three nodes A, B, C, where A links to B with a cost of 5 and B links to C with a cost of 3, the tool will create a link from A to B with a cost of 5, a link from A to C with a cost of 8, and a link from B to C with a cost of 3.

    This means that IceStorm itself doesn't have to deal with cycles or keep track of the cost as the event moves through the system.

    The only disadvantage with this model that I'm aware of is that it assumes that all nodes are reachable from all other nodes.

    With respect to event delivery:

    If you have very slow & very fast subscribers on the same IceStorm instance, then with the current implementation I think you can see the effect you are describing. You can make all of them fully independent with a thread-per-subscriber model; however, this has a high cost in the number of threads. If you use a thread pool model then you'll run into the problem of determining how big to make the pool: too small and the threads can all be consumed by slow subscribers again.

    If you are generating large amounts of traffic and have slow subscribers that cannot keep up, eventually you'll start to consume large amounts of memory. With this type of model you'll likely want to start limiting the size of the queue -- which means some sort of discard policy. This starts to get quite complex :)
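    A minimal sketch of such a bounded per-subscriber queue (the class and the discard-oldest policy are just one illustrative choice, not anything IceStorm actually does): once a slow subscriber lets the queue fill up, the oldest events are discarded rather than letting memory grow without bound.

```cpp
#include <cstddef>
#include <deque>
#include <string>

class BoundedEventQueue
{
public:
    explicit BoundedEventQueue(std::size_t max) : _max(max), _discarded(0) {}

    void enqueue(const std::string& event)
    {
        if(_events.size() >= _max)
        {
            _events.pop_front(); // discard oldest; dropping the new event is another option
            ++_discarded;
        }
        _events.push_back(event);
    }

    std::size_t size() const { return _events.size(); }
    std::size_t discarded() const { return _discarded; }

private:
    std::deque<std::string> _events;
    std::size_t _max;
    std::size_t _discarded;
};
```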

    Note that you can simulate slow links using some of the linux networking tools (see the "tc" command for details).
  • Thanks Matthew :)
    matthew wrote:
    I'm not sure why you want to support N-level linking directly in your application.
    Essentially to support chains of pub/sub. E.g., stock quotes are generated centrally and pushed to hundreds or thousands of users distributed globally. The load on a single central distributor may be impractical, and rather than have several central distributors, I'd like to have several decentralised distributors, one per key geography (e.g., North America, Asia, Europe, Pacific). I'm not concerned at all with link or message cost since each message must be delivered. Is there a good reason to only allow one level of linking, or would it be an improvement to make this configurable?
    matthew wrote:
    If you use a thread pool model then you'll run into problems in determining how big to make the pool... too small and then they'll all be potentially consumed by slow subscribers again.
    I'm not sure this is a problem per se, any more than sizing client or server thread pools is. Plus, however small the pool, it would be an improvement over the current size of 1. A crude way of supporting this out of the box might be to have a "session" object consume the messages on the server end (thread pooled) which then makes a client call to distribute:
    (a) IceStorm publish ---> (b) Session subscribe --> (c) Client invoke

    I'd want to do some genuine before/after throughput tests to be sure that the average case on the client side had improved, but if (a) and (b) were co-located then the added latency cost might be trivial and you'd achieve basic parallelism...
    matthew wrote:
    If you are generating large amounts of traffic and have slow subscribers that cannot keep up, eventually you'll start to consume large amounts of memory. With this type of model you'll likely want to start limiting the size of the queue -- which means some sort of discard policy. This starts to get quite complex :)
    The only other option I'd consider (if it became a true bottleneck) would be to implement a heartbeat to determine client response times and redistribute load for problem clients to a dedicated "slow subscriber" IceStorm instance. All other instances would have higher QoS (fast delivery, no lost messages) whereas the slow box would drop messages as needed, with no delivery timeliness, etc. With a session facade on the client side this would be relatively transparent...
  • matthew
    matthew NL, Canada
    If the link level is more than 1 then it becomes more complex to manage the events internally. That's the key difference.
    I'm not sure this is a problem per se, any more than sizing client or server thread pools is. Plus, however small the pool, it would be an improvement over the current size of 1. A crude way of supporting this out of the box might be to have a "session" object consume the messages on the server end (thread pooled) which then makes a client call to distribute:
    (a) IceStorm publish ---> (b) Session subscribe --> (c) Client invoke
    ...

    This is a typical solution to your problem. If you require each of your clients to have a session, then this session is typically collocated on the same machine as the Glacier2 server, and hence inside the internal network. You would want to be careful to ensure that your sessions share connections, otherwise the connection-concentrator advantage that Glacier2 provides would be lost :)

    You then push from the IceStorm server to the session server, and then send this to the client. This ensures that all internal communications are fast, and misbehaving clients do not cause the IceStorm service to hang. I think this will scale much better, and will be much more fault tolerant and performant than doing all of this inside the IceStorm service itself.

    You have to be careful not to block the thread that consumes the events that arrive in the session from IceStorm -- since that would eventually block other sessions as well.
  • Cheers Matthew :)

    I'll be using Session objects in any case and it sounds like there's a good argument for using them as IceStorm consumers/distributors as well.

    Joe