This forum has been archived. Please start a new discussion on GitHub.

Subscriber info for IceStorm topics

Is there any way to get information about the subscribers to a given IceStorm topic (such as a list of their proxy endpoints, etc.), either through the TopicManager or through icestormadministrator?

The interfaces don't seem to support this, but the information must be stored somewhere.

My basic problem is that I have some topics that seem 'clogged' somehow. If a topic has been used and I unsubscribe and then resubscribe, I no longer get any messages. This is far from the most accurate description of the problem, but it would be nice to have some more diagnostic tools to investigate the matter with, which is why I would like to be able to inspect the topic.

mvh

NHB

Comments

  • benoit
    benoit Rennes, France
    Sorry, it's currently not possible to get this information through the IceStorm interfaces. Did you try to enable tracing on the service to see if it could give you some clues? The following tracing properties should give you information on the topic subscribers:
    • IceStorm.Trace.Topic=2
    • IceStorm.Trace.Subscriber=1

    Of course, if you could provide a small example that demonstrates the problem it would be much easier to investigate, but I understand that it's not always easy :). In any case, let us know if tracing doesn't give you more clues about what the problem could be...

    Benoit.
  • benoit wrote:
    Sorry, it's currently not possible to get this information through the IceStorm interfaces. Did you try to enable tracing on the service to see if it could give you some clues? The following tracing properties should give you information on the topic subscribers:
    • IceStorm.Trace.Topic=2
    • IceStorm.Trace.Subscriber=1

    I have added more tracing, and I will add the above traces to my files as well. I think I'm already on the right track.

    mvh

    Nis
  • Now with the extra tracing I've been able to track my problem.

    I sometimes (often but not always) get a ConnectionTimedOut error. Do I have to request a publisher object each time I want to publish something on a topic? Currently I retrieve it once, and apparently it stops working when I haven't used it for a while.

    mvh

    NHB
  • benoit
    benoit Rennes, France
    So the problem is with publishing updates to the topic and not with subscribers not receiving updates, correct?

    You don't have to retrieve the publisher object each time you want to publish an update. You should be able to retrieve it once and use its proxy as long as you need it in your publisher application.

    Note that active connection management (ACM) is enabled by default in Ice. The underlying TCP/IP connection associated with the IceStorm::Topic proxy will be closed after 60 seconds if it's not used. Ice will try to re-open this connection automatically when you make a request on the proxy. If at that point the host where IceStorm runs can't be reached, or the connection establishment takes too long, you might get an Ice::ConnectTimeoutException. Perhaps this is what is happening?
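    To make the ACM behaviour described above concrete, here is a small simulation in C++. The class and its methods are hypothetical and do not correspond to any Ice API; only the 60-second default and the fact that setting Ice.ConnectionIdleTime to 0 disables ACM come from this thread. Time is passed in explicitly so no real clock is involved.

```cpp
#include <cstdint>

// Hypothetical simulation of ACM (not Ice code): a connection unused for
// at least idleTimeoutSec is closed by the monitor, and the next request
// transparently re-establishes it.
class SimulatedAcmConnection {
public:
    explicit SimulatedAcmConnection(std::int64_t idleTimeoutSec)
        : idleTimeoutSec_(idleTimeoutSec) {}

    // Called periodically by the simulated ACM monitor.
    void monitor(std::int64_t nowSec) {
        // An idle timeout of 0 disables ACM, mirroring Ice.ConnectionIdleTime=0.
        if (open_ && idleTimeoutSec_ > 0 && nowSec - lastUsedSec_ >= idleTimeoutSec_) {
            open_ = false;
        }
    }

    // Sends a request, first re-opening the connection if ACM closed it.
    void sendRequest(std::int64_t nowSec) {
        if (!open_) {
            open_ = true;
            ++connectCount_;
        }
        lastUsedSec_ = nowSec;
    }

    bool isOpen() const { return open_; }
    int connectCount() const { return connectCount_; }

private:
    std::int64_t idleTimeoutSec_;
    std::int64_t lastUsedSec_ = 0;
    bool open_ = false;
    int connectCount_ = 0;
};
```

    For example, a connection used at t=0 and then left idle is closed by the monitor at t=61, and the next request at t=62 opens a second connection. The reconnect step is where a slow or unreachable host would surface as a connect timeout.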

    Do you configure timeouts for your publisher or the IceStorm service? Could you post the tracing of your publisher when this happens and configuration files of the IceStorm service and your publisher application? We might be able to find more clues if you didn't already find what the problem was ;)

    Benoit.
  • benoit wrote:
    So the problem is with publishing updates to the topic and not with subscribers not receiving updates, correct?

    I must admit that I'm not really sure.

    I know for certain that nothing happens on the subscriber end, but exactly where the call gets lost is still a bit unclear to me.
    benoit wrote:
    You don't have to retrieve the publisher object each time you want to publish an update. You should be able to retrieve it once and use its proxy as long as you need it in your publisher application.

    Good, I assumed as much, but these ConnectionTimedOut exceptions made me unsure.
    benoit wrote:
    Note that active connection management (ACM) is enabled by default in Ice. The underlying TCP/IP connection associated with the IceStorm::Topic proxy will be closed after 60 seconds if it's not used. Ice will try to re-open this connection automatically when you make a request on the proxy. If at that point the host where IceStorm runs can't be reached, or the connection establishment takes too long, you might get an Ice::ConnectTimeoutException. Perhaps this is what is happening?

    In the current setup the IceStorm service runs on the same machine as the program trying to contact it, and everything seems to be running fine.
    benoit wrote:
    Do you configure timeouts for your publisher or the IceStorm service? Could you post the tracing of your publisher when this happens and configuration files of the IceStorm service and your publisher application? We might be able to find more clues if you didn't already find what the problem was ;)

    Benoit.

    I haven't set any explicit timeouts for publisher, IceStorm service or subscriber, so everything should be at default settings.


    Some traces. This is what happens at the publisher with the problematic call:
    [ Tube1: Network: tcp connection established
      local address = 192.168.0.201:34912
      remote address = 192.168.0.201:34899 ]
    [ Tube1: Protocol: received validate connection
      message type = 3 (validate connection)
      compression status = 0 (not compressed; do not compress response, if any)
      message size = 14 ]
    [ Tube1: Protocol: sending request
      message type = 0 (request)
      compression status = 0 (not compressed; do not compress response, if any)
      message size = 736
      request id = 0 (oneway)
      identity = Anne_EventObserver/publish
      facet = 
      operation = receiveInfolecule
      mode = 0 (normal)
      context =  ]
    
    And at the IceStorm server I get
    [ icebox-IceStorm: Protocol: received request
      message type = 0 (request)
      compression status = 0 (not compressed; do not compress response, if any)
      message size = 736
      request id = 0 (oneway)
      identity = Anne_EventObserver/publish
      facet = 
      operation = receiveInfolecule
      mode = 0 (normal)
      context =  ]
    [ icebox-IceStorm: Network: tcp connection established
      local address = 192.168.0.201:34913
      remote address = 192.168.0.99:1423 ]
    [ icebox-IceStorm: Protocol: received validate connection
      message type = 3 (validate connection)
      compression status = 0 (not compressed; do not compress response, if any)
      message size = 14 ]
    [ icebox-IceStorm: Protocol: sending request
      message type = 0 (request)
      compression status = 0 (not compressed; do not compress response, if any)
      message size = 758
      request id = 0 (oneway)
      identity = GameClient_091f9331-63f7-4ad7-b796-b06abb765c03
      facet = 
      operation = receiveInfolecule
      mode = 2 (idempotent)
      context =  ]
    
    When I then wait for everything to time out, I get Ice::Exception - ConnectionI.cpp:431: Ice::ConnectionTimeoutException: on the publisher side.

    If I handle this exception by trying to get a new publisher object for the same topic, I get
    [ Tube1: Network: tcp connection established
      local address = 192.168.0.201:34915
      remote address = 192.168.0.201:9999 ]
    [ Tube1: Protocol: received validate connection
      message type = 3 (validate connection)
      compression status = 0 (not compressed; do not compress response, if any)
      message size = 14 ]
    [ Tube1: Protocol: sending request
      message type = 0 (request)
      compression status = 0 (not compressed; do not compress response, if any)
      message size = 60
      request id = 1
      identity = Anne_EventObserver
      facet = 
      operation = getPublisher
      mode = 1 (nonmutating)
      context =  ]
    [ Tube1: Protocol: received reply
      message type = 2 (reply)
      compression status = 0 (not compressed; do not compress response, if any)
      message size = 87
      request id = 1
      reply status = 0 (ok) ]
    Contacting new event observer
    [ Tube1: Network: tcp connection established
      local address = 192.168.0.201:34916
      remote address = 192.168.0.201:34899 ]
    [ Tube1: Protocol: received validate connection
      message type = 3 (validate connection)
      compression status = 0 (not compressed; do not compress response, if any)
      message size = 14 ]
    [ Tube1: Protocol: sending request
      message type = 0 (request)
      compression status = 0 (not compressed; do not compress response, if any)
      message size = 738
      request id = 0 (oneway)
      identity = Anne_EventObserver/publish
      facet = 
      operation = receiveInfolecule
      mode = 0 (normal)
      context =  ]
    
    on the publisher side and
    [ icebox-IceStorm: Protocol: received request
      message type = 0 (request)
      compression status = 0 (not compressed; do not compress response, if any)
      message size = 60
      request id = 1
      identity = Anne_EventObserver
      facet = 
      operation = getPublisher
      mode = 1 (nonmutating)
      context =  ]
    [ icebox-IceStorm: Protocol: sending reply
      message type = 2 (reply)
      compression status = 0 (not compressed; do not compress response, if any)
      message size = 87
      request id = 1
      reply status = 0 (ok) ]
    [ icebox-IceStorm: Network: accepted tcp connection
      local address = 192.168.0.201:34899
      remote address = 192.168.0.201:34916 ]
    [ icebox-IceStorm: Protocol: sending validate connection
      message type = 3 (validate connection)
      compression status = 0 (not compressed; do not compress response, if any)
      message size = 14 ]
    [ icebox-IceStorm: Protocol: received request
      message type = 0 (request)
      compression status = 0 (not compressed; do not compress response, if any)
      message size = 738
      request id = 0 (oneway)
      identity = Anne_EventObserver/publish
      facet = 
      operation = receiveInfolecule
      mode = 0 (normal)
      context =  ]
    [ icebox-IceStorm: Subscriber: GameClient_091f9331-63f7-4ad7-b796-b06abb765c03: publish failed: ConnectionI.cpp:1993: Ice::CloseConnectionException:
      protocol error: connection closed ]
    
    on the IceStorm side

    So in that case I get through to the IceStorm service, but I still don't get all the way to the client.

    If I don't let anything time out, I can keep up the communication as long as I like. However, once communication is down, I can't get it back up except by reconnecting on the server side (via the exception handling) and then resubscribing the client.

    Apart from these errors, all programs (server (publisher), IceStorm service and client (subscriber)) seem to be running just fine.
  • benoit
    benoit Rennes, France
    Are you using Ice 2.1.0 by any chance?

    This would explain this exception. The following change was made in Ice 2.1.0 (see the CHANGES file in your Ice distribution for the list of all the changes):
    We do not retry oneway or batch oneway requests anymore, except if
    there are problems during connection establishment. If we retry a
    oneway or batch oneway, previous oneways from the same batch, or
    previous oneways that are buffered by the IP stack implementation,
    are silently thrown away. This can lead to a situation where the
    latest oneway succeeds due to retry, but former oneways are
    discarded.

    In your case, the connection was closed by ACM (indicated by the Ice::ConnectionTimeoutException). So when you call the oneway proxy again, you get an exception indicating that the connection was closed. Unlike in previous Ice versions, the call isn't retried (for the reason given in the paragraph quoted above).
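    The rule quoted above can be summarized as a small decision function. This is a hypothetical C++ model written for this explanation, not code from the Ice runtime, and it reduces the real retry logic to the two cases discussed in this thread.

```cpp
// Hypothetical model of the Ice 2.1.0 retry rule (not Ice source code).
enum class InvocationMode { Twoway, Oneway, BatchOneway };
enum class FailurePhase { ConnectionEstablishment, AfterConnectionEstablished };

bool mayRetry(InvocationMode mode, FailurePhase phase) {
    if (mode == InvocationMode::Twoway) {
        // The caller waits for a reply, so a failure is never silent and
        // the request can be re-sent on a fresh connection.
        return true;
    }
    // Retrying a oneway on a new connection could silently drop earlier
    // oneways still buffered on the old one, so only retry if the failure
    // happened before anything could have been sent.
    return phase == FailurePhase::ConnectionEstablishment;
}
```

    In the situation above, the ACM-closed connection falls into the "after connection established" case, so the oneway call is reported as a failure rather than retried.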

    You have several options to solve this problem:
    • If you want to continue using oneways to publish and deliver updates to subscribers (note that oneways are unreliable, so that's perhaps not the best choice for your application!), you can disable active connection management by setting Ice.ConnectionIdleTime to 0 in your publisher and subscriber programs and in the IceStorm service configuration. This way the connection will only get closed if there are problems with the underlying network connection.
    • Use twoway requests to publish and deliver updates to your subscribers. Twoways are reliable and will be retried if the connection is closed by ACM. Note that since Ice 2.1.0 IceStorm supports twoway delivery only as an undocumented feature; this will be documented in the next version. To use twoway delivery, your subscriber just needs to subscribe with the "reliability" quality of service set to "twoway".
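    For the second option, the quality of service given at subscription time is a plain string-to-string map. The sketch below builds it with a local stand-in type; the function name makeSubscriberQoS is hypothetical, and the actual subscribe call is only described in a comment because its exact signature depends on the Ice version in use.

```cpp
#include <map>
#include <string>

// Stand-in for IceStorm::QoS, a string-to-string dictionary in Slice that
// maps to std::map in C++. In the Ice 2.x API this map is passed, together
// with the subscriber proxy, to the topic's subscribe operation; check the
// IceStorm Slice files of your release for the exact signature.
using QoS = std::map<std::string, std::string>;

QoS makeSubscriberQoS(bool twowayDelivery) {
    QoS qos;
    if (twowayDelivery) {
        // Value taken from the thread: undocumented in Ice 2.1.0.
        qos["reliability"] = "twoway";
    }
    // Leaving "reliability" unset keeps the service's default (oneway) delivery.
    return qos;
}
```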

    Please let us know if you have any questions related to this change!

    Benoit.
  • benoit wrote:
    Are you using Ice 2.1.0 by any chance?

    Yes, I am.

    I tried changing to twoway quality of service, and preliminary testing gives the same results (nothing gets through if I wait a long time before using the connection and let it time out), but I'll look into it some more tomorrow and come back with a more extensive report. I might have forgotten some setup somewhere.
  • benoit
    benoit Rennes, France
    Ok, let us know when you have more information! Note that you also need to ensure that the publisher proxy you retrieve from IceStorm is a twoway proxy... I would still expect this exception if you're using a oneway proxy to publish updates on the topic.

    Benoit.
    • If you want to continue using oneways to publish and deliver updates to subscribers (note that oneways are unreliable, so that's perhaps not the best choice for your application!), you can disable active connection management by setting Ice.ConnectionIdleTime to 0 in your publisher and subscriber programs and in the IceStorm service configuration. This way the connection will only get closed if there are problems with the underlying network connection.

    I've now tried disabling active connection management, and then I don't have any problems. However, I would like to know if this will cause any serious overhead. I expect the connections to be used quite a lot under normal conditions, so my assumption would be that it wouldn't be much of a problem.

    I'm using twoway reliability by the way.
    benoit wrote:
    Note that you also need to ensure that the publisher proxy you retrieve from IceStorm is a twoway proxy... I would still expect this exception if you're using a oneway proxy to publish updates on the topic.

    I no longer explicitly cast my publisher to a oneway proxy, and I assume that it is then a twoway proxy by default. Otherwise that may be the cause of the problem. I'm using 'default' endpoint protocols (which I assume means tcp?), if that means anything. Do I need to set a reliability parameter on the publisher end as well?

    mvh

    NHB
  • benoit
    benoit Rennes, France
    Active connection management only closes idle connections (by default the timeout is 60s). So you're right -- it won't make much of a difference if your connections are always used.

    I don't understand, however, why you have to disable ACM if you're using the twoway delivery mode and a twoway proxy to publish your updates. It should work without disabling ACM. Could you perhaps try explicitly casting the proxy to a twoway proxy (with obj->ice_twoway())? The default should be twoway though, so this shouldn't be needed.

    There are no QoS settings to set on the publisher side.

    Benoit.
  • More curious behaviour.

    The IceStorm server receives the following request:

    [ icebox-IceStorm: Protocol: received request
    message type = 0 (request)
    compression status = 0 (not compressed; do not compress response, if any)
    message size = 736
    request id = 5
    identity = Anne_EventObserver/publish
    facet =
    operation = receiveInfolecule
    mode = 0 (normal)
    context = ]

    It takes about 1 minute, during which apparently nothing happens, until it finally proceeds with the following:

    [ icebox-IceStorm: Network: tcp connection established
    local address = 192.168.0.201:34518
    remote address = 192.168.0.51:2321 ]
    [ icebox-IceStorm: Protocol: received validate connection
    message type = 3 (validate connection)
    compression status = 0 (not compressed; do not compress response, if any)
    message size = 14 ]
    [ icebox-IceStorm: Protocol: sending asynchronous request
    message type = 0 (request)
    compression status = 0 (not compressed; do not compress response, if any)
    message size = 758
    request id = 1
    identity = GameClient_792d0020-c4f7-4d16-bb83-9450e1063dd4
    facet =
    operation = receiveInfolecule
    mode = 2 (idempotent)
    context = ]
    [ icebox-IceStorm: Protocol: sending reply
    message type = 2 (reply)
    compression status = 0 (not compressed; do not compress response, if any)
    message size = 25
    request id = 5
    reply status = 0 (ok) ]
    [ icebox-IceStorm: Protocol: received reply
    message type = 2 (reply)
    compression status = 0 (not compressed; do not compress response, if any)
    message size = 25
    request id = 1
    reply status = 0 (ok) ]

    After that all other traffic on that topic comes through immediately. The publisher that made the request seems to be tied up for the same period.

    We are still running without ACM and with a twoway subscription:

    [ icebox-IceStorm: Protocol: received request
    message type = 0 (request)
    compression status = 0 (not compressed; do not compress response, if any)
    message size = 160
    request id = 36
    identity = Anne_EventObserver
    facet =
    operation = subscribe
    mode = 0 (normal)
    context = ]
    [ icebox-IceStorm: Topic: Subscribe: GameClient_792d0020-c4f7-4d16-bb83-9450e1063dd4 QoS: [reliability,twoway] ]
    [ icebox-IceStorm: Protocol: sending reply
    message type = 2 (reply)
    compression status = 0 (not compressed; do not compress response, if any)
    message size = 25
    request id = 36
    reply status = 0 (ok) ]

    Do you have any idea what might make it take so long to complete the first transmission?
  • Addendum to the problem above.

    The topic may have other old subscribers registered that are no longer working. These are kicked immediately, so it doesn't seem to be processing those that takes time. The proxies registered for subscription are direct proxies created with a default endpoint, so no locator service or the like should be needed.
  • benoit
    benoit Rennes, France
    Can you try to run the service with --Ice.Trace.Network=2 and --Ice.Trace.Retry=2? This might give us some clues about what IceStorm is doing during this 60s period.

    Benoit.
  • I'll try running it with the extra tracing early tomorrow.

    However, further testing seemed to indicate (contrary to what I said earlier) that if we were careful about unsubscribing all the old subscribers, we did not encounter the problem. This isn't really a useful solution, though, as crashes etc. make it impossible for us to ensure that subscribers will always be unsubscribed.

    But more about this tomorrow. And thanks for the quick reply.

    mvh

    Nis
  • We have finally tracked the problem.

    When we have a dead subscriber proxy that points to a machine running a firewall (in this case the built-in firewall in Windows XP SP2), the IceStorm service hangs when trying to establish the TCP connection. Presumably this is because the firewall is deliberately slow about refusing the connection, to avoid DoS attacks or the like.

    Some of these problems can be solved by being better at unsubscribing, controlling which ports our clients create proxies on, etc. However, as clients will crash and will be running on machines with firewalls, it isn't really acceptable that the IceStorm server hangs for minutes at a time just because of clever firewalls. So are there any configuration parameters that can be used to control this?

    mvh

    Nis
  • benoit
    benoit Rennes, France
    Hi Nis,

    Yes, this is a tricky problem :).

    First of all, regarding your IceStorm publisher blocking while IceStorm tries to establish the connection to your dead subscriber: we are aware of this problem, and we are thinking of providing, in the near future, an additional configuration that makes it possible to have IceStorm operate in "buffered" mode. That is, the update will be buffered by the thread dispatching the publisher invocation and will be sent to your subscribers from another dedicated thread. This way, the publishers and the subscribers are decoupled, and you can be sure that the publisher won't block if something's wrong with the subscribers.
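    The buffered mode described above is essentially a producer/consumer queue. The following C++ sketch illustrates the idea with a hypothetical BufferedForwarder class (not an IceStorm API): publish() only enqueues the update and returns, while a dedicated thread performs the potentially blocking deliveries.

```cpp
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

// Hypothetical sketch of publisher/subscriber decoupling (not IceStorm code).
class BufferedForwarder {
public:
    using Deliver = std::function<void(const std::string&)>;

    explicit BufferedForwarder(Deliver deliver)
        : deliver_(std::move(deliver)), sender_([this] { run(); }) {}

    ~BufferedForwarder() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_one();
        sender_.join();  // returns once the queue has been drained
    }

    // Called by the thread dispatching the publisher invocation; it only
    // takes the queue lock and never waits for a subscriber.
    void publish(std::string update) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push_back(std::move(update));
        }
        cv_.notify_one();
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            cv_.wait(lock, [this] { return stopping_ || !queue_.empty(); });
            if (queue_.empty()) {
                return;  // stopping and nothing left to send
            }
            std::string update = std::move(queue_.front());
            queue_.pop_front();
            lock.unlock();
            deliver_(update);  // may block on a slow subscriber
            lock.lock();
        }
    }

    Deliver deliver_;
    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<std::string> queue_;
    bool stopping_ = false;
    std::thread sender_;  // declared last so it starts after the other members
};
```

    Destroying the forwarder joins the sender thread after the queue has drained, so every published update is delivered once, in order. A dead subscriber still stalls the sender thread itself; the queue only protects the publisher.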

    Of course, this might not totally solve your issue if you want all your updates to be delivered in a timely manner. Indeed, if a subscriber is dead, the IceStorm sender thread will block for some time trying to establish the connection to this dead subscriber, and it won't be able to send updates to other subscribers while it's waiting. Short timeouts for connection establishment might solve this issue to some degree; see the documentation for the property Ice.Override.ConnectTimeout.
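    A rough model shows why a single sender thread turns one dead subscriber into everyone's problem: the delay before the last subscriber in line receives an update is the sum of the per-subscriber costs, and each unreachable subscriber contributes a full connect timeout. The types and function below are hypothetical, use milliseconds, and ignore retries and parallel delivery.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical back-of-the-envelope model (not IceStorm code).
struct SubscriberState {
    bool reachable;
    std::int64_t sendTimeMs;  // cost of a successful delivery
};

// Time until the update reaches the last subscriber, when a single thread
// delivers to each subscriber in turn and an unreachable subscriber holds
// that thread until the connect attempt gives up.
std::int64_t sequentialDeliveryTimeMs(const std::vector<SubscriberState>& subs,
                                      std::int64_t connectTimeoutMs) {
    std::int64_t total = 0;
    for (const SubscriberState& s : subs) {
        total += s.reachable ? s.sendTimeMs : connectTimeoutMs;
    }
    return total;
}
```

    With one unreachable subscriber between two healthy ones (2 ms each), a 60-second connect attempt delays the last subscriber by 60004 ms, comparable to the one-minute stall reported in this thread, whereas a 500 ms timeout reduces that to 504 ms. The cost still grows with every dead subscriber in the list.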

    Another solution, if you are in an environment where you don't have any control over the clients, would be to use Glacier2. The Glacier2 router not only acts as a firewall to prevent misbehaving clients from accessing your back-end servers, it also provides functionality to protect your back-ends against misbehaving clients. In this particular case, Glacier2 would ensure that all the requests from IceStorm to the client are buffered (you wouldn't even need a "buffered" mode for IceStorm anymore). If a client is misbehaving, you can be sure that IceStorm won't block. I recommend taking a closer look at the Glacier2 documentation for more information.

    Hope this helps!

    Benoit.
  • benoit wrote:
    Are you using Ice 2.1.0 by any chance?

    This would explain this exception. The following change was made in Ice 2.1.0 (see the CHANGES file in your Ice distribution for the list of all the changes):



    In your case, the connection was closed by ACM (indicated by the Ice::ConnectionTimeoutException). So when you call the oneway proxy again, you get an exception indicating that the connection was closed. Unlike in previous Ice versions, the call isn't retried (for the reason given in the paragraph quoted above).


    Please let us know if you have any questions related to this change!

    Hmm.

    The general policy of not retrying oneway requests seems well-founded (although I must admit I don't know enough about IP stacks to evaluate how much of a problem batched data would be).

    However, it seems to me that a disconnection caused by ACM is of a different nature than one caused by communication failures, as ACM disconnects deliberately. If that is correct, then even a oneway connection should be able to tell that it has been put in an idle state rather than simply knowing that it is disconnected, and thus a oneway connection should be able to recover from an ACM disconnect. But there might be issues I'm unaware of.
  • benoit wrote:
    Hi Nis,

    Yes, this is a tricky problem :).

    Glad you agree.
    benoit wrote:
    First of all, regarding your IceStorm publisher blocking while IceStorm tries to establish the connection to your dead subscriber: we are aware of this problem, and we are thinking of providing, in the near future, an additional configuration that makes it possible to have IceStorm operate in "buffered" mode. That is, the update will be buffered by the thread dispatching the publisher invocation and will be sent to your subscribers from another dedicated thread. This way, the publishers and the subscribers are decoupled, and you can be sure that the publisher won't block if something's wrong with the subscribers.

    It seems to me that this decoupling of subscribers and publishers can be achieved simply by using oneway communication over the IceStorm service. Or is there something I'm missing?
    benoit wrote:
    Of course, this might not totally solve your issue if you want all your updates to be delivered in a timely manner.

    Well, that would be preferable. At least it would be preferable if one bad subscriber did not affect communication with all the other subscribers.
    benoit wrote:
    Indeed, if a subscriber is dead, the IceStorm sender thread will block for some time trying to establish the connection to this dead subscriber, and it won't be able to send updates to other subscribers while it's waiting. Short timeouts for connection establishment might solve this issue to some degree; see the documentation for the property Ice.Override.ConnectTimeout.

    It seems to me that practically any timeout will be too long :(

    Maybe we are going about it all wrong. We are designing an MMORPG, where quite a lot of information has to be broadcast to a number of clients, and that seemed to be what IceStorm was made for. But with that number of clients, there will practically always be some of them that have network problems. Of course we can't do anything for the individual client, but if that problem also adversely affects all the other subscribers to that topic, it seems a bit useless. Some sort of tiered solution, where a connection can be skipped and buffered for later if it doesn't respond quickly enough, would seem better than discarding connections outright.
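    The tiered idea above can be sketched with one backlog per subscriber. The PerSubscriberBacklog class is hypothetical (IceStorm does not provide it), and the sketch assumes that a failed delivery attempt returns quickly, for example thanks to a short connect timeout; making failures fast is exactly the hard part in practice.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch (not an IceStorm feature): each subscriber keeps its
// own backlog, so a failing subscriber only delays its own updates.
class PerSubscriberBacklog {
public:
    // Returns true if the subscriber accepted the update.
    using TrySend = std::function<bool(const std::string&)>;

    std::size_t addSubscriber(TrySend trySend) {
        subscribers_.push_back({std::move(trySend), {}});
        return subscribers_.size() - 1;
    }

    void publish(const std::string& update) {
        for (Subscriber& s : subscribers_) {
            s.backlog.push_back(update);
        }
    }

    // Delivers as much of each backlog as possible. Delivery to a given
    // subscriber stops at its first failure so its updates stay in order;
    // the remaining subscribers are served regardless.
    void flush() {
        for (Subscriber& s : subscribers_) {
            while (!s.backlog.empty() && s.trySend(s.backlog.front())) {
                s.backlog.pop_front();
            }
        }
    }

    std::size_t backlogSize(std::size_t id) const {
        return subscribers_.at(id).backlog.size();
    }

private:
    struct Subscriber {
        TrySend trySend;
        std::deque<std::string> backlog;
    };
    std::vector<Subscriber> subscribers_;
};
```

    A real implementation would also need to bound each backlog and eventually drop subscribers that never recover.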
    benoit wrote:
    Another solution, if you are in an environment where you don't have any control over the clients, would be to use Glacier2. The Glacier2 router not only acts as a firewall to prevent misbehaving clients from accessing your back-end servers, it also provides functionality to protect your back-ends against misbehaving clients. In this particular case, Glacier2 would ensure that all the requests from IceStorm to the client are buffered (you wouldn't even need a "buffered" mode for IceStorm anymore). If a client is misbehaving, you can be sure that IceStorm won't block. I recommend taking a closer look at the Glacier2 documentation for more information.

    It isn't so much a case of worrying about misbehaving clients accessing stuff they shouldn't, but more the potential for random crashes and other uncontrollable problems leading to unclean disconnects. However, it does sound as if Glacier2 might be worth looking at.
  • benoit
    benoit Rennes, France
    We've been discussing this issue of ACM and oneways not working well together internally. Without going too much into detail: connections that are used for sending oneways, as well as bidirectional connections, won't be closed by ACM anymore if it's not safe to do so.

    I strongly recommend that you take a look at Glacier2. It is perfectly suited for the kind of application you are developing. It will ensure that your back-end services (such as IceStorm in this case) won't be affected by any sort of communication problems with your clients (client crashing, client not answering, etc.). It also provides many other features that will be very useful for your application, session management for example.

    Benoit.