RPC hangs when host gets disconnected from network

timruijs · September 2011

A little background:
In our system we have an event dispatcher that distributes events to subscribers (both local and remote). A dispatcher thread retrieves an event from the event queue, invokes a method on the subscriber to deliver the event, returns, and continues monitoring the queue for further events.

Consider the following scenario:
The publisher host runs two event dispatchers, one per event type. A subscriber on a remote host subscribes to both event types, and each type is generated about ten times a second. When we disconnect the subscriber host from the network, one of the event handlers appears to block (in TCP, we think), while the other gets an exception from Ice. We would rather have both event threads throw an exception and return control. As it stands, the queue of the blocked dispatcher keeps filling with events. What is going on?

More info can be provided if required. Thanks.

marc · September 2011

It is not always possible for the TCP/IP stack to detect a lost connection. For example, if you just "cut the cable", the TCP/IP stack may not be able to detect the connection loss until the TCP/IP timeout kicks in, which can take a very long time.

The solution is to set a timeout in Ice. This way, if a request blocks, Ice will raise an exception, even if the TCP/IP stack cannot detect a connection loss.
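To illustrate what the caller sees, here is a minimal sketch of a call with a deadline that turns "blocked" into "exception". It is not Ice's API and not how Ice implements timeouts; `CallTimedOut` and `callWithTimeout` are hypothetical names, and the sketch uses only the C++ standard library.

```cpp
#include <chrono>
#include <functional>
#include <future>
#include <memory>
#include <stdexcept>
#include <thread>
#include <utility>

// Hypothetical exception type, standing in for whatever the middleware throws on a timeout.
struct CallTimedOut : std::runtime_error {
    CallTimedOut() : std::runtime_error("call timed out") {}
};

// Runs `call` on a helper thread and waits at most `timeout` for it to finish.
// Throws CallTimedOut on expiry; rethrows any exception raised by `call`.
inline void callWithTimeout(std::function<void()> call, std::chrono::milliseconds timeout) {
    auto task = std::make_shared<std::packaged_task<void()>>(std::move(call));
    std::future<void> done = task->get_future();

    // Detached so that a call that never returns cannot block the caller.
    std::thread([task] { (*task)(); }).detach();

    if (done.wait_for(timeout) == std::future_status::timeout) {
        throw CallTimedOut();
    }
    done.get();  // propagates an exception from `call`, if any
}
```

The caller gets control back after the deadline, but the helper thread stays blocked until the underlying call returns, so this only demonstrates the behavior and is not a substitute for a timeout enforced by the transport.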

timruijs · September 2011

We use Ice 3.2.0.

Thanks for your quick response.
We set the timeout on the proxy as shown below.

// Apply the delivery timeout (in milliseconds) to the subscriber proxy.
iSubscriberPrx tmpPrx = iSubscriberPrx::uncheckedCast(
    publishCallbackPtr->Prx->ice_timeout(publishCallbackPtr->TopicProperties.mDeliveryTimeOut));

if (tmpPrx)
{
    tmpPrx->OnTopic(publishCallbackPtr->Topic, publishCallbackPtr->PublisherTag);
}

mDeliveryTimeOut = 15000 (milliseconds)

Agreed that TCP/IP cannot always properly detect connection loss; that is exactly why we use the timeout.
The strange thing is that we do get an exception in one of the threads, but the other blocks, even with the timeout set. When we reconnect the network, the pending call of the blocked thread completes and the thread continues happily.

marc · September 2011

If you set a timeout, the call should not block. Note, however, that a timeout can sometimes take longer to fire than the configured value. See this FAQ.

In any case, I'm afraid without a concrete example that demonstrates the problem, it's not possible to tell what the problem is.

Note that we only provide limited free support for the latest version of Ice here in these forums. While I'm not aware of any specific problem in Ice 3.2 with respect to timeouts, I still recommend upgrading to the latest version of Ice.

timruijs · September 2011

Vialis is a long-term licensee of Ice.
Unfortunately we cannot simply update to a newer version of Ice (risk, validation/certification, etc.).

The problem with a concrete example is that this concerns our event framework, which is quite complex. We do use our own locator; could that possibly interfere with the timeout mechanism?

Our locator registers local objects (on the same host where the locator runs). Indirect proxies, which contain the id of the locator on the remote host, are used to access remote objects.

