hang up of IceStorm

Hi,

Has anyone encountered IceStorm hanging? My project uses IceStorm as its message transmission center, so it is very important that IceStorm can run for a long time without hanging. Recent test results have not been good, however. There are 2 subscribers and 1 publisher in my system. To prevent a subscriber from losing its subscription, each subscriber periodically unsubscribes and resubscribes. Typically, within one or two days, IceStorm hangs and the publisher fails to publish messages. It can connect to the TopicManager and retrieve the ObjectPrx.

My project is developed on Windows with Ice version 3.2. In fact, there are two questions. One is whether IceStorm can run for a long time. The other is how a subscriber can know that its subscription is still active.

Regards,

Feng Xuebin

Comments

  • matthew
    matthew NL, Canada
    Yes, IceStorm is expected to work for a long time. I'm afraid that you haven't given enough information to help solve your problem. What do you mean by hangup? Do you mean your subscriber stops receiving data? If that is the case most likely there was a communications problem between IceStorm and the subscriber. This could either be network related or could be caused by a bug in the subscriber itself causing a timeout exception. To find out more information I recommend enabling more tracing -- see the Ice manual for details.

    Regarding your second question: since subscribers are not persistent, they will be kicked whenever an error occurs. If this happens the subscriber is not notified (how can it be? it's booted because of an error :). Since this is the case there is no generic way you can know -- instead, if this is important to you, you can do this in some application-specific way. For example, have the subscriber ping the subscriber-specific object (see subscribeAndGetPublisher) on a regular basis. Another approach is to send heartbeat messages through IceStorm. If no message arrives within a certain time limit, you know there is a problem.

    A better solution is to add persistent subscribers. However, without commercial interest this is not a high priority. If you are interested in sponsoring such a development please contact us at sales@zeroc.com.
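
The heartbeat approach mentioned above could be sketched roughly as follows. This is only an illustration under stated assumptions: `HeartbeatWatchdog` is a hypothetical helper class, not part of Ice or IceStorm, and it assumes a C++11 compiler. The subscriber servant would call `heartbeatReceived()` each time a heartbeat message arrives through the topic, and a timer in the subscriber would call `isStale()` to decide whether to resubscribe.

```cpp
#include <chrono>
#include <mutex>

// Hypothetical helper (not an Ice/IceStorm class): remembers when the last
// heartbeat message arrived and reports whether the subscription looks dead.
class HeartbeatWatchdog {
public:
    using Clock = std::chrono::steady_clock;

    // timeout: longest silence tolerated before the subscription is
    // considered lost. The clock starts at construction time.
    explicit HeartbeatWatchdog(Clock::duration timeout,
                               Clock::time_point now = Clock::now())
        : timeout_(timeout), lastSeen_(now) {}

    // Call this from the subscriber servant whenever a heartbeat arrives.
    void heartbeatReceived(Clock::time_point now = Clock::now()) {
        std::lock_guard<std::mutex> lock(mutex_);
        lastSeen_ = now;
    }

    // Returns true if no heartbeat has arrived for longer than the timeout.
    bool isStale(Clock::time_point now = Clock::now()) const {
        std::lock_guard<std::mutex> lock(mutex_);
        return now - lastSeen_ > timeout_;
    }

private:
    const Clock::duration timeout_;
    Clock::time_point lastSeen_;
    mutable std::mutex mutex_;  // servant and timer may run on different threads
};
```

When `isStale()` returns true, the subscriber could unsubscribe and subscribe again instead of doing so blindly on a fixed schedule. The timeout should probably be a few multiples of the publisher's heartbeat interval so that a single delayed message does not trigger a resubscription.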
  • I'm sorry that I didn't describe the problem clearly. The strange thing is that the publisher can't publish messages when IceStorm hangs. I debugged IceStorm and found that it is blocked in ThreadPool.cpp:
    bool IceInternal::ThreadPool::run()
    {
    ...
    ret = ::select(static_cast<int>(_maxFd + 1), &fdSet, 0, 0, 0);
    ...

    The publisher, subscriber and IceStorm all reside on the same host. It is very strange.
  • benoit
    benoit Rennes, France
    Hi,

    A thread from the thread pool blocked in the select() call is expected. This simply means that the thread pool waits for additional data to be received over established TCP/IP connections.

    What kind of invocations do you use to publish the messages to the topic, twoway or oneway? Can you get the stack trace of each thread from the publisher process and the IceStorm process when you get the hang and post them here?

    Cheers,
    Benoit.
  • When the IceStorm process hangs, the publisher process catches a connect timeout exception. At the same time, IceStorm produces no trace output.
  • benoit
    benoit Rennes, France
    Hi,

    The connect timeout exception indicates that the publisher wasn't able to establish a network connection with the IceStorm service in a timely manner. This could be caused by a network issue (unlikely if you run everything on the same machine) or because IceStorm is for some reason unable to answer the connection request.

    The easiest would be to provide us a small self-contained example that we can use to duplicate your problem.

    Cheers,
    Benoit.
  • Hi,

    The attachment is an example that includes a publisher and a subscriber. It might take several days to reproduce the problem.
  • matthew
    matthew NL, Canada
    Sorry, there is no attachment... could you try again please?
  • Hi,

    The problem recurred this morning. I found that the publisher thread was blocked in the following call:
    Subscriber::queue(bool, const EventDataSeq& events)

    I suspect it is caused by a subscription and unsubscription happening at the same time the publisher tries to publish a message.
  • Hi,

    The problem recurred this morning. I found that the publisher thread was blocked at the following statement in TopicI.cpp:
    Subscriber::QueueState state = (*p)->queue(forwarded, events);

    I suspect it is caused by a subscription and unsubscription happening at the same time the publisher tries to publish a message.
  • Hi,

    The attachment is the trace output from when IceStorm was blocked.
  • marc
    marc Florida
    I sent you an email and a private message regarding ongoing support. If you didn't get these, can you please get back to us at info@zeroc.com?