Weird one-second delay in IceStorm message delivery?

marian.cilka · July 2012

I am evaluating Ice and I have noticed that some of my tests have uneven response times. I have investigated it and it looks like IceStorm inserts 1-second delay between reception of message from publisher and delivery of the message to the subscriber. I am using two-way subscriber proxies. The delay appears only when sending the first message. Subsequent messages are delivered immediately. There's a direct LAN connection between all components.

I know that occasional delay doesn't matter for many applications, but I have a specific real-time application in mind where any delay is a big problem.

I have a couple of questions:
1) Is this a bug or a feature?
2) Why is the delay there? What is its purpose?
3) When does it happen? Can it happen when ACM reestablishes previously closed connection?
4) Is there a way to turn it off?

Thanks in advance,
Marian

bernard · July 2012

Hi Marian,

Welcome to our forums.

IceStorm does not introduce a 1 second delay from time to time. This delay is most likely due to a problem with the endpoint configuration of your subscriber's object adapter.

See Why does it take so long to establish a connection? - ZeroC Documentation - ZeroC.

Best regards,
Bernard

marian.cilka · July 2012

Hi Bernard,

Thanks for your reply. The problem sure isn't in the subscriber. The delay occurs before IceStorm even attempts to contact IceGrid locator to find out subscriber's address. Connecting to locator service isn't the problem either since IceStorm caches locator connection that was established earlier. Locator is specified using single IP address on the same LAN.

See an abbreviated log from IceStorm in my test environment. Notice the almost exact 1-second delay.

-- 07/04/12 10:40:11.027 MyGrid.IceStorm-MyGrid.IceStorm: Protocol: received request
identity = MyGrid.IceStorm/MyTopic.publish
operation = MyMethod
-- 07/04/12 10:40:11.027 MyGrid.IceStorm-MyGrid.IceStorm: Protocol: sending reply
reply status = 0 (ok)
-- 07/04/12 10:40:12.024 MyGrid.IceStorm-MyGrid.IceStorm: Protocol: sending asynchronous request
identity = MyGrid/Locator
operation = findAdapterById
-- 07/04/12 10:40:12.024 MyGrid.IceStorm-MyGrid.IceStorm: Protocol: received reply
reply status = 0 (ok)
-- 07/04/12 10:40:12.024 MyGrid.IceStorm-MyGrid.IceStorm: Network: tcp connection established
local address = 172.16.0.1:64994
remote address = 172.16.0.1:64989
-- 07/04/12 10:40:12.026 MyGrid.IceStorm-MyGrid.IceStorm: Protocol: received validate connection
-- 07/04/12 10:40:12.026 MyGrid.IceStorm-MyGrid.IceStorm: Protocol: sending asynchronous request
identity = b8a1acb9-414e-48d1-8535-814e916cdca7
operation = MyMethod
-- 07/04/12 10:40:12.034 MyGrid.IceStorm-MyGrid.IceStorm: Protocol: received reply
reply status = 0 (ok)

Thanks in advance for all your help,
Marian

bernard · July 2012

Hi Marian,

Could you post the IceStorm log with Ice.Trace.Network set to 2?

One possible explanation here is a delay in establishing a connection with the IceGrid registry. With the default configuration, this connection is closed when idle for 60 seconds; see Active Connection Management - Ice 3.4 - ZeroC for details.

Best regards,
Bernard

marian.cilka · July 2012

Hi Bernard,

Setting higher logging level is a good suggestion. I've done that and the log is attached below.

Apparently IceStorm is trying to open a connection somewhere and that connection attempt is being refused. Interestingly, it's a localhost connection that should get refused immediately by the operating system. IceStorm for some reason fails to notice the refused connection and instead waits one second before it eventually realizes the connection is indeed dead.

Now the question is, where is it trying to connect? And why does this connection block all other connections? Perhaps it's some old subscriber that wasn't unsubscribed? It's my understanding that IceStorm contacts all subscribers in parallel, so why would an old unresponsive subscriber delay delivery to all the healthy subscribers? IceStorm.SubscriberPool.Size is set to 10 to ensure there are always enough threads to contact all the subscribers in parallel. My test subscriber is always created anew with GUID identity and with default QoS settings, i.e. subscribers are not recycled and old subscribers should die after one failed connection attempt. So what is it waiting for?

While one second might not be much, it's a killer in applications that require message delivery with (reasonably) low latency. Since persistence of the connection is not guaranteed, I will likely have to renew it periodically, which could easily result in a lot of dead old connections, making this scenario quite common.

-- 07/06/12 14:42:18.684 MyGrid.IceStorm-MyGrid.IceStorm: Protocol: received request
identity = MyGrid.IceStorm/MyTopic.publish
operation = MyMethod
-- 07/06/12 14:42:18.684 MyGrid.IceStorm-MyGrid.IceStorm: Network: trying to establish tcp connection to 172.16.0.1:49310
-- 07/06/12 14:42:18.684 MyGrid.IceStorm-MyGrid.IceStorm: Protocol: sending reply
reply status = 0 (ok)
-- 07/06/12 14:42:19.681 MyGrid.IceStorm-MyGrid.IceStorm: Network: failed to establish tcp connection
local address: 0.0.0.0:49355
remote address: 172.16.0.1:49310
Network.cpp:1387: Ice::ConnectionRefusedException:
connection refused: The remote computer refused the network connection.
-- 07/06/12 14:42:19.681 MyGrid.IceStorm-MyGrid.IceStorm: Protocol: sending asynchronous request
identity = MyGrid/Locator
operation = findAdapterById
-- 07/06/12 14:42:19.681 MyGrid.IceStorm-MyGrid.IceStorm: Protocol: received reply
reply status = 0 (ok)
-- 07/06/12 14:42:19.681 MyGrid.IceStorm-MyGrid.IceStorm: Network: trying to establish tcp connection to 172.16.0.1:49351
-- 07/06/12 14:42:19.682 MyGrid.IceStorm-MyGrid.IceStorm: Network: tcp connection established
local address = 172.16.0.1:49356
remote address = 172.16.0.1:49351
-- 07/06/12 14:42:19.684 MyGrid.IceStorm-MyGrid.IceStorm: Protocol: received validate connection
-- 07/06/12 14:42:19.684 MyGrid.IceStorm-MyGrid.IceStorm: Protocol: sending asynchronous request
identity = a6948759-9ebb-46bd-9e9b-eab519bc1623
operation = MyMethod
-- 07/06/12 14:42:19.691 MyGrid.IceStorm-MyGrid.IceStorm: Protocol: received reply
reply status = 0 (ok)

Kind Regards,
Marian

bernard · July 2012

Hi Marian,

The IceStorm.SubscriberPool.xxx property is a very old property, removed as of Ice 3.3. Please confirm you are using Ice 3.4.2.

With Ice 3.4, you don't need to configure anything. IceStorm sends messages to its subscribers using non-blocking AMI calls. If a topic has a dead subscriber, the sending thread won't block until the connection attempt to this subscriber fails; it will immediately send to the next subscriber.

Separately, if you have a firewall on your system, you could try to shut it down to see if subscriber death (i.e. attempting to connect to a dead subscriber) is detected immediately. You could also try to unsubscribe your subscribers more cleanly, by calling "unsubscribe".

Best regards,
Bernard

marian.cilka · July 2012

Hi Bernard,

It's good to hear that there is sound technical design in IceStorm, but apparently it doesn't work in practice. I have Ice 3.4.2 on development machine as well as on my test server. I tried turning off the firewall, but it had no effect. My logs show that the test subscriber does call unsubscribe method. Are you sure the IceStorm code doesn't accidentally hang on some timeout? Does it notice dead subscribers immediately in your tests?

I've done a little more log analysis and it seems that IceStorm tries to reconnect to an old subscriber that did call unsubscribe. There are two connections per subscriber: one for topic manager and one for subscriber object itself. Interestingly, the subscriber object itself disconnects before unsubscribe is called on the topic manager connection (not sure why, perhaps timing issue in my test or in Ice). Perhaps IceStorm ignores unsubscribe when connection to the subscriber is closed at the moment?

Kind Regards,
Marian

benoit · July 2012

Hi Marian,

Are your subscribers subscribing with "indirect proxies"? IceStorm (like any Ice client) caches endpoint information for indirect proxies (proxies containing an adapter ID instead of endpoints). If the endpoint information of a given object adapter changes (the process is re-started for example), the Ice client only figures it out after failing to establish a connection to the object adapter. If it is the case, the client asks again the IceGrid locator the endpoints of the adapter ID and then tries again with the new endpoints. Could it be what is happening here? (i.e.: IceStorm doesn't try connecting to an old subscriber, it tries to connect to some old cached endpoints which are no longer valid). You could confirm it by enabling Ice.Trace.Locator=2 on the IceStorm server.

On the one second delay to notice the connection establishment failure... I don't find it so surprising, on some operating systems it takes time for the OS to return the failure. It is often the case on Windows for example. Which OS do you use? In any case, why do you think this is an issue? It doesn't mean that IceStorm will hang or come to a halt during one second: the connection establishment is asynchronous so IceStorm will perform other activities such as forwarding the event to other publishers.

If you do see a one second hang in IceStorm while it's trying to establish the connection to the bogus endpoints (i.e.: IceStorm doesn't perform anything else during the time it tries to establish the connection), it would be a bug and in this case it would be good if you could send us a test case.

Cheers,
Benoit.

marian.cilka · July 2012

benoit wrote: »

On the one second delay to notice the connection establishment failure... I don't find it so surprising, on some operating systems it takes time for the OS to return the failure. It is often the case on Windows for example. Which OS do you use?

I am on Windows. I've looked it up and this (unbelievable) behavior of Windows indeed completely explains the delay. Thanks for pointing it out. Now the question is why IceStorm fails to contact multiple subscribers in parallel.

benoit wrote: »

Are your subscribers subscribing with "indirect proxies"? IceStorm (like any Ice client) caches endpoint information for indirect proxies (proxies containing an adapter ID instead of endpoints). If the endpoint information of a given object adapter changes (the process is re-started for example), the Ice client only figures it out after failing to establish a connection to the object adapter.

My subscriber object is created with ObjectAdapter.addWithUUID method. Subscriber's proxy is taken from return value of the same method. This means that the subscriber has unique identity while its object adapter is named. Documentation doesn't mention anything about indirect proxy being returned from ObjectAdapter.addWithUUID. I am wondering what happens during restart of the subscriber process. Will IceStorm try to reuse the old object adapter even for the new subscriber? How do I prevent this from happening? Should I generate object adapter with UUID as its name?

Kind Regards,
Marian

bernard · July 2012

Hi Marian,

addWithUUID can return a direct proxy or an indirect proxy. It depends on the configuration of your object adapter: if it has an AdapterID, it's an indirect proxy, otherwise, it's a direct proxy.

This configuration is generated by IceGrid if you manage your subscribers with IceGrid.

> Will IceStorm try to reuse the old object adapter even for the new subscriber?

(Assuming you are using indirect proxies, which may not be the best choice)

The Ice runtime reuses the resolution of your adapter-ID in indirect proxies.

Say you register a subscriber "foo@myID". IceStorm delivers a message to this subscriber, so the Ice runtime in IceStorm resolves "@myID" and caches the result (1.2.4.5 -p 10000). Then you unregister foo@myID, and register a new subscriber, "foo2@myID". The Ice runtime still has the @myID resolution in its cache: when IceStorm sends a message to foo2, it first tries to connect to the cached endpoint (1.2.4.5 -p 10000) ... which fails, since the new foo2 is using another dynamically allocated port.

> How do I prevent this from happening? Should I generate object adapter with UUID as its name?

The adapter name does not matter, it's purely local. The solution is to either switch to direct proxies (probably the best) or use different adapter-IDs for each instance (probably more difficult).

Best regards,
Bernard

benoit · July 2012

Hi,

Another solution is to not declare the object adapter in the IceGrid descriptor if not needed, your subscriber object adapter will therefore not be configured with an AdapterId and will produce "direct proxies" containing the endpoints of the object adapter.

Regarding IceStorm failing to contact multiple subscribers in parallel, I tried with the IceStorm clock demo and I'm unable to reproduce. To try it, I ran IceStorm, a subscriber and a publisher. Then I modified the subscriber code to not activate the object adapter (which in effect makes the connection establishment from IceStorm to the subscriber hang because the OA doesn't accept new connections until its activated). If I start 5 of those modified subscribers, the running subscriber continues to receive updates and IceStorm tries to establish the connection to the 5 subscribers at the same time and they also fail at about the same time indicating that the connection establishment is indeed made in parallel for the 5 subscribers.

If this isn't what you're seeing, you would need to provide us a test case so that we can try it out.

Cheers,
Benoit.

marian.cilka · July 2012

Hi Benoit and Bernard,

Thanks for all your help, I now have a clear idea of what's happening. It's now obvious that after subscriber restart, IceStorm tries to reconnect to an old object adapter with matching identity and encounters one-second delay imposed by Windows API before noticing the subscriber is down. It's now also clear that the same problem will be encountered by all Ice clients, not just IceStorm.

Is there some general way to solve this issue? Will Ice clear its cache after first failed connection to the object adapter, i.e. will periodic pinging of all proxies reliably flush stale endpoint cache? I can control most Ice clients, but is there a way to make IceStorm ping its subscribers? Is there some easy way to convert indirect proxy into direct proxy?

Kind Regards,
Marian

benoit · July 2012

The easiest is to create a direct proxy with the Ice::ObjectAdapter::createDirectProxy method and use this proxy to subscribe with the topic. For your locator cache questions, check out the Ice manual here, it explains in details how the locator cache works.

Cheers,
Benoit.

Archived

Weird one-second delay in IceStorm message delivery?

Comments

Categories