Archived

This forum has been archived. Please start a new discussion on GitHub.

IceStorm filtering

I'm evaluating the possibility of using Ice in a project strictly focused on stock quote broadcasting and massive processing, and I would like to understand whether IceStorm can help me.

The application is composed of different processing modules (PMs) that
need real-time information about stock prices coming from different
stock exchanges, which is why I'm considering a message bus
like IceStorm.

Each PM has to process a different amount of stock information depending
on the end user's needs, independently of the other PMs. So potentially each
PM could process the same group of stocks, or each could process its
own group of stock information (and every intermediate possibility
is valid).

Some questions:

- Is it possible to install something like a filter on the IceStorm server
so that different PMs subscribed to the same topic receive only
subsets of the information? It isn't feasible for a single PM to filter the
information itself, because it could potentially spend more time discarding
unwanted data than processing the useful data.

- I could implement a "per-PM topic" policy, but in that case, is it possible
to publish a message bound to multiple topics to the IceStorm message bus in
a single operation?

And finally, if the answer to all my previous questions is negative, what
is the best way to minimize the network traffic and the work done
by clients and servers to publish/receive information?

Thanks

Comments

  • matthew
    matthew NL, Canada
    Currently IceStorm does not support any form of filtering, so you'll have to use one of your latter two approaches. To publish to multiple topics in one RPC (which I assume is your goal), you could use batching. Namely, something like this:
    // t1, t2, t3 refer to three topics in the same IceStorm
    t1 = MyTopicPrx::uncheckedCast(t1->ice_batchOneway());
    t2 = MyTopicPrx::uncheckedCast(t2->ice_batchOneway());
    t3 = MyTopicPrx::uncheckedCast(t3->ice_batchOneway());
    
    t1->req();
    t2->req();
    t3->req();
    t1->ice_getConnection()->flushBatchRequests();
    

    Note that it's only necessary to flush a single connection: for this solution to work, each topic must share the same connection -- which will be the case as long as you use the same timeout settings for all the topic proxies.

    Finally, since filtering is the best solution, as you've already worked out, you might be interested in sponsoring the development of such a feature. If you are interested please contact us at sales@zeroc.com.
  • I have pretty much the same problem, with massive real-time quotes (up to 200k different quotes). The subscribers could not physically receive all the market feeds (even with 10 Gb of network bandwidth).

    Does the IceStorm version shipped since 2006 provide any new features for filtering by a subset?

    Thierry
  • bernard
    bernard Jupiter, FL
    Hi Thierry,

    This feature has not been a priority so far, and the latest IceStorm does not provide a filtering mechanism.

    IceStorm is very efficient because it transmits messages without unmarshaling and remarshaling them--messages just go through IceStorm as marshaled bytes.

    So a filtering mechanism could not look inside messages; it would have to rely on one or more Ice contexts. Do you think an Ice-context-based filtering would work better for you than setting up multiple topics?

    Thanks,
    Bernard
  • I don't really understand your proposal:

    As I understand it, contexts are just additional parameters passed along with an operation from the client to the server. So you propose that the filtering would be done server-side, depending on the context set by the client? In that case, how would IceStorm route the right packet to the right client?


    The only way I can see is to create, on the server side, as many topics as I have quotes (up to 200,000), with each client subscribing to the topics it needs. Is that realistic?

    Thierry
  • bernard
    bernard Jupiter, FL
    Hi Thierry,

    The general idea with contexts is that publishers would pass one or more Ice contexts with the messages they send to IceStorm, and IceStorm would use the information in these contexts, together with registration information provided by the subscribers, to decide whether or not a subscriber gets the message.

    In your scenario - stock quotes going through IceStorm - we could have, for example:
    - a single topic, StockQuote
    - a StockTicker context with the stock ticker as its value (provided by the publisher: AAPL, GOOG, etc.)
    - each subscriber would provide the list of tickers it's interested in
    - IceStorm would filter the messages and, for each subscriber, deliver only the quotes where there is a context match
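    The matching step in this hypothetical context-based scheme could be sketched in plain C++ (no Ice dependency; the subscriber names and the StockTicker semantics are illustrative assumptions, not an existing IceStorm API):

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical filter table: each subscriber registers the set of tickers it
// wants. A message tagged with a StockTicker context value is delivered only
// to the subscribers whose set contains that ticker.
typedef std::map<std::string, std::set<std::string> > SubscriberFilters;

std::vector<std::string>
matchSubscribers(const SubscriberFilters& filters, const std::string& ticker)
{
    std::vector<std::string> matched;
    for(SubscriberFilters::const_iterator p = filters.begin(); p != filters.end(); ++p)
    {
        if(p->second.count(ticker) > 0) // one string-set lookup per subscriber
        {
            matched.push_back(p->first);
        }
    }
    return matched;
}
```

    This sketch also shows where the cost comes from: every published quote triggers a lookup against every subscriber's ticker set.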

    The above is actually not a very good use-case for filtering: you're sending more data, and IceStorm would need to do a lot of string comparisons.

    A better approach here would be to create a topic per ticker, and as a result a very large number of topics (as you outlined). I expect IceStorm will work fine with 200,000 topics, although it's not optimized for such usage. In particular, IceStorm will create a servant for each topic and keep it in memory, which is not optimal when you have a very large number of topics.

    In terms of connections and network traffic, everything would be exactly like with a single topic "filtered" by IceStorm.

    We could rework the IceStorm implementation to use less memory and perform better with 200,000 topics ... I believe this would make more sense for this stock-quote-distribution use-case.

    Best regards,
    Bernard
  • Thanks Bernard, it's clearer now.

    I just tried to create more than 100,000 topics, like this:
    while(..)
    {
          .....
          QString topic = "fair" + product.getProductID();
    
          IceStorm::TopicPrx topicFair = topicManagerProxy_->create(topic.toStdString());
          Ice::ObjectPrx publisherFair = topicFair->getPublisher();
          publisherFair = publisherFair->ice_oneway();
    
          fairProxyMap_.insert(productKey, Safir::FairPrx::uncheckedCast(publisherFair));
    }
    


    but as you mentioned, my process stops responding after about 2,000 topics...

    Thierry
  • bernard
    bernard Jupiter, FL
    Hi Thierry,

    I don't see why it would fail after just 2,000 topics ... we'll investigate. Which platform did you use for this test?

    Thanks,
    Bernard
  • Hi Bernard,

    My config is:
    - Mac OS X 10.6 (latest version)
    - Ice 3.4.1
    - Qt 4.7 (latest version)

    Thierry
  • I get the following elapsed times on Mac OS X 10.6:
    topicManagerProxy_->create(....)
    
    takes 66 ms per call
    Ice::ObjectPrx publisherFair = topicFair->getPublisher();
    publisherFair = publisherFair->ice_oneway();
    fairProxyMap_.insert(productKey, Safir::FairPrx::uncheckedCast(publisherFair));
    
    takes less than 1 ms

    So it seems that the create() topic call is by far the biggest time consumer, and is not compatible with 200k calls.
  • I suggest first adding to the TopicManager interface in IceStorm.ice new create() and retrieve() operations that take a vector-of-strings argument, to save on network cost.

    But it seems that the internal IceStorm create() function involves database access. Maybe this part could be sped up too...

    Could I disable IceStorm's persistent state and just use a memory cache?
  • bernard
    bernard Jupiter, FL
    Hi Thierry,

    The topic creation takes a long time because each create commits a database transaction. You can speed this up by switching to a purely transient IceStorm; see "IceStorm Properties" in the Ice 3.4 documentation on the ZeroC site.
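    For reference, a minimal configuration fragment for this mode; the property name below assumes the IceBox service is named IceStorm (verify it against the property reference mentioned above):

```
# Run IceStorm without a database: topic creation no longer commits a
# transaction, but all topics are lost when the service restarts.
IceStorm.Transient=1
```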

    Jose was able to create 200,000 topics on a Linux system in this mode ... but it took a little over 2 hours. It's unfortunately not a reasonable solution.

    Ideally, a future version of IceStorm would support this use-case. If we keep a separate topic for each stock in this improved IceStorm, you would be able to create all these topics with a single create call, and likewise, you could register a subscriber with multiple topics with just one call.

    An alternative would be to write your own stock quote distribution service, without IceStorm. One of the first design choices for such a service is how to find the stock ticker. The service could unmarshal the request parameters (to find the stock ticker) and then remarshal them to forward the data to the subscribers. Or it could do what IceStorm does: keep the parameters in marshaled form, and use another method to identify the stock, such as the object identity or a request context.
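    A minimal sketch of the "keep the parameters marshaled" variant, in plain C++ (no Ice dependency; in a real service the key would come from the object identity or a request context, and the forward would be a oneway invocation rather than a vector append):

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// The quote payload stays as opaque marshaled bytes; the service never
// unmarshals it, mirroring how IceStorm forwards messages.
typedef std::vector<unsigned char> Payload;

struct Subscriber
{
    std::vector<Payload> received; // stand-in for a oneway forward to a client
};

class QuoteForwarder
{
public:
    void subscribe(const std::string& ticker, Subscriber* s)
    {
        _subscribers[ticker].push_back(s);
    }

    // key identifies the stock (object identity or context value);
    // bytes is the marshaled quote, forwarded as-is.
    void forward(const std::string& key, const Payload& bytes)
    {
        std::vector<Subscriber*>& subs = _subscribers[key];
        for(std::size_t i = 0; i < subs.size(); ++i)
        {
            subs[i]->received.push_back(bytes); // no unmarshal/remarshal
        }
    }

private:
    std::map<std::string, std::vector<Subscriber*> > _subscribers;
};
```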

    Best regards,
    Bernard
  • Hi Bernard,

    My proof of concept with 200k topics gives me the same result as yours. Thanks for the alternative solution, but it will depend on how long ZeroC takes to support massive topic creation.

    Another piece of functionality I have not found in Ice is the ability to always provide the 'freshest' data to the client.

    This specific communicator mode would be: always replace 'old' queued data not yet delivered to/consumed by the client with the newest incoming data.

    For example, for my stock quote distribution service, if the market suddenly shifts within 2 seconds, it could be more important for clients to get the freshest value of a quote than to go through all the old values queued in the communicator.
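    The behavior described here is usually called conflation: at most one pending update per key, with a newer value overwriting an undelivered older one. A minimal sketch in plain C++ (the names and the double-valued quote are illustrative):

```cpp
#include <cassert>
#include <map>
#include <string>

// Conflating queue: at most one pending quote per ticker. A newer quote
// replaces an undelivered older one instead of queueing behind it, so a
// slow consumer always sees the freshest value.
class ConflatingQueue
{
public:
    void push(const std::string& ticker, double price)
    {
        _pending[ticker] = price; // overwrite any stale, undelivered value
    }

    // Returns false when nothing is pending for this ticker.
    bool pop(const std::string& ticker, double& price)
    {
        std::map<std::string, double>::iterator p = _pending.find(ticker);
        if(p == _pending.end())
        {
            return false;
        }
        price = p->second;
        _pending.erase(p);
        return true;
    }

private:
    std::map<std::string, double> _pending;
};
```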


    Regards,
    Thierry
  • bernard
    bernard Jupiter, FL
    Hi Thierry,

    The Glacier2 service provides this feature, with the _ovrd context.

    It would make sense to incorporate a similar feature in a future version of IceStorm.

    Thanks,
    Bernard