Some questions re IceStorm

tomcwalker · July 2006

Hello there,

I'm taking a look at IceStorm to see how well it fits our needs.

I have a couple of questions after reading the documentation.

1) I'd like a kind of failover facility that I can use with IceStorm, so that if the IceStorm server goes down, I can automatically fail over to a different box.

So far as I can work out, to emulate this sort of behaviour I'd need my publishers to check which IceStorm servers are available and publish to the first one that is currently up. Subscribers would then receive messages from whichever server the publishers were using at the time. Is this a sensible way to handle the problem? If so, could you suggest a good way for clients to detect which server is up?

2) Can you clarify what happens when delivery of oneway and twoway messages to subscribers via IceStorm fails? As I interpret the manual, if a oneway message fails, no-one will be any the wiser, and the client remains subscribed to the topic. If a twoway message fails, the IceStorm server will become aware of the failure and unsubscribe the client. Other than that, how do the semantics differ? Also, if a twoway message fails, is there any way to detect this and hook into it so as to react to it programmatically?

It's possible I've missed something in the documentation here; apologies if so. If that's the case, please just point me to the relevant sections.

Many thanks.

bernard · July 2006

Hi Tom,

With respect to IceStorm fail-over, I see two options:
- use IceGrid to monitor your IceStorm servers
- have your clients (publishers) "monitor" the IceStorm servers by sending events with twoway calls (that's the requests between publishers and IceStorm, not to be confused with the 'twoway' QoS which is for requests between IceStorm and the subscribers).

In either case, I'd recommend building a set of identical IceStorm replicas, assuming your topic/link structure is static. When a publisher publishes an event, the Ice runtime will automatically select a running IceStorm service. You would also register your subscribers with every single topic replica (don't use the "replicated" topic proxy in this case); this way, the subscriber will receive events regardless of the replica chosen by the publishers.
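To make that topology concrete, here is a minimal in-process C++ sketch, assuming three identical replicas. `TopicReplica`, `subscribeToAll` and `publishToAnyRunning` are hypothetical names, not IceStorm API. For determinism the sketch simply picks the first running replica; the Ice runtime's actual endpoint selection on a replicated proxy may differ.

```cpp
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-process stand-in for one IceStorm replica hosting a topic.
// This is NOT the IceStorm API; it only illustrates the replica topology.
struct TopicReplica {
    std::string host;
    bool running = true;
    std::vector<std::function<void(const std::string&)>> subscribers;

    void subscribe(std::function<void(const std::string&)> cb) {
        subscribers.push_back(std::move(cb));
    }

    // Forward an event to every subscriber registered with this replica.
    void publish(const std::string& event) {
        for (auto& s : subscribers) s(event);
    }
};

// The subscriber registers with every replica individually (no replicated
// topic proxy), so it hears events whichever replica a publisher uses.
void subscribeToAll(std::vector<TopicReplica>& replicas,
                    const std::function<void(const std::string&)>& cb) {
    for (auto& r : replicas) r.subscribe(cb);
}

// Simplified stand-in for endpoint selection on a replicated publisher
// proxy: send to the first replica that is up and return its host.
std::string publishToAnyRunning(std::vector<TopicReplica>& replicas,
                                const std::string& event) {
    for (auto& r : replicas) {
        if (r.running) {
            r.publish(event);
            return r.host;
        }
    }
    throw std::runtime_error("no IceStorm replica is running");
}
```

If host1 goes down, the next publish lands on host2, and the subscriber still receives it because it is registered there too.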

The Ice runtime will provide transparent fail-over for publishers in most circumstances; the only tricky situation is when a publish fails and the Ice runtime cannot safely retry--i.e. the operation is not idempotent/nonmutating, and the Ice runtime cannot tell whether the request was processed or not by IceStorm. Your application needs to handle this situation.
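The retry rule in the previous paragraph can be written down as a tiny decision function. This is a simplified model of the behavior described here, not Ice code; `FailurePoint` and `safeToRetry` are hypothetical names.

```cpp
// Hypothetical model of when a failed publish can be retried safely.
enum class FailurePoint {
    BeforeSend,  // e.g. the connection could not be established
    AfterSend    // e.g. the connection dropped while awaiting the reply
};

// True if a failed publish can be retried (possibly on another replica)
// without risking the same event being processed twice by IceStorm.
bool safeToRetry(bool idempotent, FailurePoint where) {
    if (where == FailurePoint::BeforeSend) return true;  // IceStorm never saw it
    return idempotent;  // re-executing is harmless only for idempotent ops
}
```

The one combination that returns false, a non-idempotent publish that failed after it was sent, is the situation the application has to handle itself.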

2) Can you clarify what happens when there is delivery failure on oneway and twoway messages to subscribers via IceStorm. As I interpret the manual, if a oneway message fails, no-one will be any the wiser, and the client remains subscribed to the topic. If a twoway message fails, the IceStorm server will become aware of the failure, and unsubscribe the client. Other than that, how do the semantics differ?

You're right, there is not much difference between the 'oneway' and 'twoway' QoS; I was actually discussing this very topic with Matthew yesterday.

With oneways, some events can be lost without IceStorm noticing their loss (see http://www.zeroc.com/faq/onewaysLost.html). This does not mean IceStorm never detects subscriber problems with oneways, just that some problems can go undetected. For example, if a subscriber disappears, IceStorm won't be able to establish or reestablish a connection to this subscriber and will remove it.
With the 'twoway' QoS, events cannot be lost without IceStorm noticing, and Ice/IceStorm will retry automatically on some exceptions.

Also, if a twoway message fails, is there any way to detect this and hook into it so as to react to it programmatically?

When IceStorm detects a problem with a subscriber, it just removes this subscriber and logs a message (if logging is enabled).

What kind of notification/hook would you like to have?

Best regards,
Bernard

bernard · July 2006

On 'oneway' vs 'twoway': the main reason to choose 'twoway' over 'oneway' is if you need to enable ACM (Active Connection Management) in the server hosting your subscribers.

Naturally, you're not going to have so many IceStorm servers that ACM is needed for these clients; however, the server hosting your subscribers may well serve many other clients.

Best regards,
Bernard

tomcwalker · August 2006

Thanks a lot

Bernard,

Thanks very much for the helpful and thorough replies.

As mentioned in another thread, we won't be using IceGrid for replication, as we can't live with the single point of failure in the registry, so we have decided to go with a DB holding a list of direct proxy details.

So we'll go with the second option you suggest - twoway calls between publisher and IceStorm servers, to allow proper handling of IceStorm server problems.

Now, a little clarification here would be great. You said:
"The Ice runtime will provide transparent fail-over for publishers in most circumstances; the only tricky situation is when a publish fails and the Ice runtime cannot safely retry--i.e. the operation is not idempotent/nonmutating, and the Ice runtime cannot tell whether the request was processed or not by IceStorm. Your application needs to handle this situation."

As it happens, we will only use IceStorm for idempotent/nonmutating operations, so that bit is easy. Cross-referencing your reply and the documentation (around p1380 of the 3.0 docs, chapter 42.5), it looks like the Topic Manager will be aware of the existing IceStorm servers and, when getPublisher() is called, will return a proxy to any one of the running IceStorm servers capable of publishing the relevant topic (hence providing failover if one of the servers dies). Have I got this right, more or less? Or do I need to do more work client side to ensure failover works?

Moving on to the topic of detecting problems between subscriber and publisher:
"What kind of notification/hook would you like to have?"
I guess I'd like the publisher to be able to retry sending the message a couple of times to a subscriber that failed. However, I haven't got enough experience of Ice to know if this is necessary or useful. It may be that the system is generally so reliable that you only get these sorts of failures when there is some kind of serious connection problem, which is fair enough.

The case I'm concerned about is that a message delivery fails, say, because of super heavy network traffic or some temporary network glitch, or the subscriber being overloaded for a minute or so, and as a result a subscriber gets disconnected from the publisher when it could have recovered.

I notice you say
"For example if a subscriber disappears, IceStorm won't be able to establish or reestablish a connection to this subscriber and will remove it." with reference to oneway comms between IceStorm and the subscribers. This implies that IceStorm does try to reestablish lost connections when using two way comms. On the other hand, in the manual it says this:
"42.3.7 Subscriber Errors
If IceStorm encounters a failure while attempting to deliver a message to a subscriber, the subscriber is immediately unsubscribed from the topic on which the message was published."

That's a little confusing. Would IceStorm ever attempt to reestablish a lost connection with a missing subscriber? If so, under what circumstances? Or does it just immediately dump the subscriber the first time a message goes undelivered, as the manual implies?

Thanks again for the help

bernard · August 2006

Hi Tom,

I am happy to help; we'll try to provide a replicated IceStorm demo with the next Ice release.

As mentioned in another thread, we won't be using IceGrid for replication, as we can't live with the single point of failure in the registry, so we have decided to go with a DB holding a list of direct proxy details.

Please note that replicated IceGrid really is coming in the next Ice release. However, there is no release date at this point.

Cross-referencing your reply and the documentation (around p1380 of the 3.0 docs, chapter 42.5), it looks like the Topic Manager will be aware of the existing IceStorm servers and, when getPublisher() is called, will return a proxy to any one of the running IceStorm servers capable of publishing the relevant topic (hence providing failover if one of the servers dies). Have I got this right, more or less? Or do I need to do more work client side to ensure failover works?

For fail-over to work in your publishers, you want to give them "publisher proxies" (the proxies returned by Topic::getPublisher) that are replicated, i.e. that look like:

MyTopic/publish:tcp -h host1 -p 10000:tcp -h host2 -p 10000:tcp -h host3 -p 12000

When you call Topic::getPublisher on a topic, you'll get a single "object-adapter" proxy, e.g.

MyTopic/publish:tcp -h host2 -p 10000

so you will need to somehow manufacture a fully replicated proxy from all these proxies. I'd use the ice_endpoints/ice_getEndpoints functions on ObjectPrx. You could do that in your publishers (by contacting all running IceStorm servers to build this proxy) or in your own service.
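A string-level sketch of that manufacturing step follows, assuming all publisher proxies share the identity shown in the example and that the identity contains no `:`. `makeReplicatedProxy` is a hypothetical helper; with the Ice API itself you would collect each proxy's `ice_getEndpoints()` and pass the combined sequence to `ice_endpoints()`, as suggested above.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Combine single-endpoint publisher proxies such as
//   "MyTopic/publish:tcp -h host2 -p 10000"
// into one replicated proxy string. Assumes every proxy uses the same
// identity and that the identity itself contains no ':'.
std::string makeReplicatedProxy(const std::vector<std::string>& proxies) {
    if (proxies.empty()) throw std::invalid_argument("no proxies");
    std::string identity;
    std::string endpoints;
    for (const auto& p : proxies) {
        auto colon = p.find(':');
        if (colon == std::string::npos)
            throw std::invalid_argument("proxy has no endpoint: " + p);
        std::string id = p.substr(0, colon);
        if (identity.empty())
            identity = id;
        else if (id != identity)
            throw std::invalid_argument("identity mismatch: " + id);
        endpoints += p.substr(colon);  // keeps the leading ':' separator
    }
    return identity + endpoints;
}
```

Fed the three single-endpoint proxies for host1, host2 and host3, this yields the replicated proxy string shown earlier in this post.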

The above assumes you use direct proxies. With IceGrid or your own Ice::Locator implementation, you would configure the Publish object adapter in IceStorm to use a replica-group, and the locator would take care of this proxy manufacturing.

Moving on to the topic of detecting problems between subscriber and publisher:
"What kind of notification/hook would you like to have?"
I guess I'd like the publisher to be able to retry sending the message a couple of times to a subscriber that failed. However, I haven't got enough experience of Ice to know if this is necessary or useful. It may be that the system is generally so reliable that you only get these sorts of failures when there is some kind of serious connection problem, which is fair enough.

The case I'm concerned about is that a message delivery fails, say, because of super heavy network traffic or some temporary network glitch, or the subscriber being overloaded for a minute or so, and as a result a subscriber gets disconnected from the publisher when it could have recovered.

I think the confusion comes from the distinction between Ice and IceStorm: Ice takes care of all the retrying, and IceStorm does not add any retry logic of its own beyond what Ice does under the covers. See "34.3 Connection Establishment" in the Ice 3.1 manual.

You configure the IceStorm retry behavior (for messages from IceStorm to its subscribers) by configuring the Ice retry behavior. A message from IceStorm to a subscriber will only fail (from IceStorm's point of view) after Ice has exhausted all its automatic retries.
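As a rough illustration, assume the Ice retry configuration is a list of delays in the style of the `Ice.RetryIntervals` property (for example `0 1000 5000`: retry immediately, then after 1 s, then after 5 s; check the property reference for your Ice version). The sketch below shows why IceStorm only sees a failure once every retry has failed. `deliverWithRetries` and `DeliveryResult` are hypothetical names, and delays are recorded rather than slept so the example runs instantly.

```cpp
#include <functional>
#include <vector>

// Hypothetical model of one delivery from IceStorm to a subscriber under a
// list of retry delays. Not the real Ice implementation.
struct DeliveryResult {
    bool delivered = false;
    int attempts = 0;
    std::vector<int> waitedMs;  // delay applied before each retry
};

DeliveryResult deliverWithRetries(const std::function<bool()>& trySend,
                                  const std::vector<int>& retryIntervalsMs) {
    DeliveryResult r;
    r.attempts = 1;
    if (trySend()) { r.delivered = true; return r; }
    // One retry per configured delay; an empty list means no retries.
    for (int delay : retryIntervalsMs) {
        r.waitedMs.push_back(delay);  // a real runtime would wait `delay` ms
        ++r.attempts;
        if (trySend()) { r.delivered = true; return r; }
    }
    // Only at this point does the failure surface to IceStorm.
    return r;
}
```

A longer list of delays lets a subscriber ride out a longer glitch, at the cost of IceStorm taking longer to give up on a subscriber that really is gone.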

I notice you say
"For example if a subscriber disappears, IceStorm won't be able to establish or reestablish a connection to this subscriber and will remove it." with reference to oneway comms between IceStorm and the subscribers. This implies that IceStorm does try to reestablish lost connections when using two way comms.

When Ice encounters a closed connection, it always tries to (re-)establish it. That's true for oneway as well as twoway requests.

In this example, I meant a subscriber that has disappeared for good, e.g. one that was shut down or crashed. Oneways are not completely "fire and forget": when you send a oneway request, Ice creates or reuses a connection and writes the parameters to that connection. If this fails (e.g. Ice can't (re-)establish a connection), your oneway request fails and you get an exception.
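The difference between the two modes can be summarized with a deliberately coarse model of which failures the sender (here, IceStorm) can observe. It is a sketch of the explanation above, not Ice code; `Mode`, `Outcome` and `senderSeesFailure` are hypothetical names.

```cpp
// Hypothetical, coarse model of failure visibility for the sender.
enum class Mode { Oneway, Twoway };

struct Outcome {
    bool connected;  // a connection could be created, reused or re-established
    bool written;    // the request was written to that connection
    bool replied;    // a successful reply came back (meaningful for twoway only)
};

// Returns true if the sender gets an exception for this invocation.
bool senderSeesFailure(Mode mode, const Outcome& o) {
    // Both modes fail visibly if the request could not be sent at all,
    // e.g. because the subscriber has gone away for good.
    if (!o.connected || !o.written) return true;
    // A twoway waits for the reply, so a loss after writing is also noticed.
    if (mode == Mode::Twoway) return !o.replied;
    // A oneway lost after it was written goes unnoticed.
    return false;
}
```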

On the other hand, in the manual it says this:
"42.3.7 Subscriber Errors
If IceStorm encounters a failure while attempting to deliver a message to a subscriber, the subscriber is immediately unsubscribed from the topic on which the message was published."

That's a little confusing. Would IceStorm ever attempt to reestablish a lost connection with a missing subscriber? If so, under what circumstances? Or does it just immediately dump the subscriber the first time a message goes undelivered, as the manual implies?

As mentioned above, Ice does the retrying automatically; how much retrying depends on the Ice retry configuration. If an error still reaches IceStorm after Ice's automatic retries, IceStorm drops the subscriber immediately: it does not keep the subscriber proxy around just in case it works again in five minutes (which could happen if a network cable is accidentally disconnected and later reconnected).
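The policy described above (no grace period once Ice's retries are exhausted) can be sketched as a small topic model. `TopicModel` and its methods are hypothetical names; each `Sender` stands for one Ice invocation including all of its automatic retries.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical model of the subscriber-removal policy; not IceStorm code.
class TopicModel {
public:
    // Returns true if delivery succeeded after Ice's automatic retries.
    using Sender = std::function<bool(const std::string& event)>;

    void subscribe(const std::string& id, Sender send) {
        subs_[id] = std::move(send);
    }

    void publish(const std::string& event) {
        for (auto it = subs_.begin(); it != subs_.end();) {
            if (it->second(event))
                ++it;
            else
                it = subs_.erase(it);  // dropped at once, never re-checked
        }
    }

    bool isSubscribed(const std::string& id) const {
        return subs_.count(id) != 0;
    }

private:
    std::map<std::string, Sender> subs_;
};
```

Reconnecting the "cable" later does not bring a dropped subscriber back; it has to subscribe again.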

Hope this helps!

Bernard

tomcwalker · August 2006

Thanks again!

Thanks very much indeed.

That last reply really cleared things up, especially with regard to retry/connection failure policy and failover. It looks like IceStorm will do the job very nicely given the above. We'll give it a go and let you know how we get on!
