Archived

This forum has been archived. Please start a new discussion on GitHub.

ACM Problem?

Hello,

I'm having some problems keeping my IceTouch client connected to the Glacier2 router. I randomly get SessionRefreshException. I have a two minute timeout on the router, and the client pings the router every minute.

Here's the trace on the Glacier:
[ 03/10/09 13:22:49.151 glacier2router: Network: accepted tcp connection
  local address = 74.63.67.68:4502
  remote address = 24.62.76.106:50519 ]
[ 03/10/09 13:22:49.260 glacier2router: Network: trying to establish tcp connect
ion to 127.0.0.1:10001 ]
[ 03/10/09 13:22:49.292 glacier2router: Network: tcp connection established
  local address = 127.0.0.1:1566
  remote address = 127.0.0.1:10001 ]
[ 03/10/09 13:22:49.620 glacier2router: Network: accepted tcp connection
  local address = 127.0.0.1:1065
  remote address = 127.0.0.1:1569 ]
[ 03/10/09 13:23:07.495 glacier2router: Network: accepted tcp connection
  local address = 127.0.0.1:1065
  remote address = 127.0.0.1:1574 ]
[ 03/10/09 13:27:58.323 glacier2router: Network: closing tcp connection
  local address = 127.0.0.1:1065
  remote address = 127.0.0.1:1574 ]
[ 03/10/09 13:27:59.073 glacier2router: Network: closing tcp connection
  local address = 74.63.67.68:4502
  remote address = 24.62.76.106:50519 ]
[ 03/10/09 13:27:59.088 glacier2router: Network: accepted tcp connection
  local address = 127.0.0.1:1065
  remote address = 127.0.0.1:1576 ]
[ 03/10/09 13:27:59.120 glacier2router: Network: closing tcp connection
  local address = 127.0.0.1:1065
  remote address = 127.0.0.1:1576 ]
[ 03/10/09 13:27:59.151 glacier2router: Network: accepted tcp connection
  local address = 127.0.0.1:1065
  remote address = 127.0.0.1:1577 ]
[ 03/10/09 13:27:59.167 glacier2router: Network: closing tcp connection
  local address = 127.0.0.1:1065
  remote address = 127.0.0.1:1577 ]
[ 03/10/09 13:28:26.260 glacier2router: Network: accepted tcp connection
  local address = 74.63.67.68:4502
  remote address = 24.62.76.106:50520 ]
[ 03/10/09 13:28:26.323 glacier2router: Glacier2: rejecting request. no session
is associated with the connection.
  identity: Glacier2/router ]
[ 03/10/09 13:28:26.354 glacier2router: Network: closing tcp connection
  local address = 74.63.67.68:4502
  remote address = 24.62.76.106:50520 ]

The first accept is the IceTouch client, next is my server, and then comes IceStorm. Everything works perfectly until the first close message. It's closing a connection to IceStorm. Immediately after my client is disconnected from the router in this case; however, many cases it takes some time after the close to IceStorm for my client to get disconnected. Any ideas for me?

Thanks,
Pete

Comments

  • matthew
    matthew NL, Canada
    Do you actually have ACM enabled? You must disable ACM and retry with Glacier2 clients.
    #
    # No active connection management is permitted with Glacier2.
    # Connections must remain established.
    #
    Ice.ACM.Client=0
    
    #
    # Connection retry is not possible with Glacier2. Connections must
    # remain established.
    #
    Ice.RetryIntervals=-1
    

    You should probably enable more tracing in your glacier2 server. Try setting:
    Glacier2.Trace.Session=1
    

    If Glacier2 is expiring the session you'll see "expiring session" in the glacier2 logs.
  • Hi Matthew,

    I do have both those properties disabled on the client. I think it may be a problem with ACM between the servers. After enabling session tracing, I learn that the session does not expire.

    I did see this in the IceStorm trace however:
    [ 03/10/09 15:19:49.682 icebox: Subscriber: 031100486_gC5;eC.V}JL$Bb1!4h/C70A8135-9162-4509-BF5F-65C2EA75333B: subscriber errored out: ConnectionI.cpp:1313: Ic
    e::ConnectTimeoutException:timeout while establishing a connection retry: 0/0 ]
    

    At nearly the same time, I see these in the Glacier2 trace:
    [ 03/10/09 15:19:30.401 glacier2router: Network: closing tcp connection
      local address = 127.0.0.1:1700
      remote address = 127.0.0.1:1707 ]
    [ 03/10/09 15:19:49.682 glacier2router: Network: accepted tcp connection
      local address = 127.0.0.1:1700
      remote address = 127.0.0.1:1710 ]
    [ 03/10/09 15:19:49.698 glacier2router: Network: closing tcp connection
      local address = 127.0.0.1:1700
      remote address = 127.0.0.1:1710 ]
    [ 03/10/09 15:19:49.729 glacier2router: Network: accepted tcp connection
      local address = 127.0.0.1:1700
      remote address = 127.0.0.1:1711 ]
    [ 03/10/09 15:19:49.745 glacier2router: Network: closing tcp connection
      local address = 127.0.0.1:1700
      remote address = 127.0.0.1:1711 ]
    

    Pete
  • matthew
    matthew NL, Canada
    You should be able to use ACM on the server back end, although its probably not that useful as there will likely be very few connections internally.

    The client is not disconnected from Glacier2? Perhaps you should enable network tracing to see what endpoints icestorm is trying to connect with when it gets a timeout?
  • Thanks Matthew,

    Yes it seems like I'm able to make requests on the session proxy without interruption. The observer object on the client is what stops getting the messages.

    Looking at the following trace, it seems that the timeout exists between IceStorm and Glacier2:
    [ 03/10/09 16:39:11.057 icebox: Network: trying to establish tcp connection to 1
    27.0.0.1:1924 ]
    [ 03/10/09 16:39:11.073 icebox: Network: tcp connection established
      local address = 127.0.0.1:1943
      remote address = 127.0.0.1:1924 ]
    [ 03/10/09 16:39:11.104 icebox: Subscriber: 022C9DA8 ]uBU|?<{inl&E,lj9F\'?/46321
    E83-7FE0-4950-8BDE-B553BF16B3D1: subscriber errored out: OutgoingAsync.cpp:638:
    Ice::TimeoutException:
      timeout while sending or receiving data retry: 0/0 ]
    [ 03/10/09 16:39:11.151 icebox: Network: closing tcp connection
      local address = 127.0.0.1:1943
      remote address = 127.0.0.1:1924 ]
    
    [ 03/10/09 16:39:11.073 glacier2router: Network: accepted tcp connection
      local address = 127.0.0.1:1924
      remote address = 127.0.0.1:1943 ]
    [ 03/10/09 16:39:11.167 glacier2router: Network: closing tcp connection
      local address = 127.0.0.1:1924
      remote address = 127.0.0.1:1943 ]
    

    What's confusing to me here is that it looks like IceStorm was able to connect and transfer the data to Glacier2, but yet it errors out preventing any more updates to the subscriber.

    I want to add that's very random too. It'll work well for a while, and then this will happen suddenly.

    Pete
  • Another thing I'll throw on the table is this:
    [ 03/10/09 16:52:30.948 icebox: Topic: test: unsubscribe: ]uBU|?<{inl&E,lj9F\'?/
    46321E83-7FE0-4950-8BDE-B553BF16B3D1 ]
    [ 03/10/09 16:52:30.963 icebox: Topic: test: destroy ]
    

    This happened in the above trace when I disconnected the Glacier2 session as it usually should. However, this subscriber had already been errored out previously.

    Pete
  • Yes, so even though I get subscriber errored out in the trace, the iPhone actually received that RPC. This must mean that all communications were successful. Any clues why Ice throws the TimeoutException?

    This leads IceStorm to believe the subscriber is no longer valid, and the subscriber receives nothing more from that topic.

    Thanks,
    Pete
  • benoit
    benoit Rennes, France
    Are you using Ice 3.3.0 for IceStorm and Glacier2?

    The timeout used by IceStorm 3.3.0 to forward messages to subscribers is the one defined with the IceStorm <service>.Send.Timeout property. Are you perhaps setting it to a value which is too low?

    Can you post your IceStorm and Glacier2 configuration files? Also, which QoS settings does your subscriber use?

    Cheers,
    Benoit.
  • Hi Benoit,

    Yes, they are all running on the same machine with Ice-3.3.0.

    I do not set the timeout property you mention, so I'm assuming it's using the default.

    IceStorm config:
    Freeze.DbEnv.BAGSIceStorm.DbHome=../db
    BAGSIceStorm.TopicManager.Endpoints=tcp -p 10002
    BAGSIceStorm.Publish.Endpoints=tcp -p 10003
    Ice.ACM.Client=0
    IceStorm.ACM.Client=0
    IceStorm.Trace.TopicManager=2
    BAGSIceStorm.Trace.Topic=1
    BAGSIceStorm.Trace.Subscriber=1
    Ice.Trace.Network=2
    

    Glacier2 config:
    Glacier2.InstanceName=Glacier2
    Glacier2.Client.Endpoints=tcp -p 4502 -t 10000
    Glacier2.Server.Endpoints=tcp -h 127.0.0.1
    Glacier2.SessionManager=SessionManager:tcp -h 127.0.0.1 -p 10001
    Glacier2.PermissionsVerifier=Glacier2/NullPermissionsVerifier
    Glacier2.SessionTimeout=120
    Ice.ACM.Client=0
    Glacier2.Server.Buffered=0
    Glacier2.Client.Buffered=0
    Glacier2.Client.Trace.Reject=1
    Glacier2.Trace.Session=1
    Ice.Trace.Network=2
    

    Subscriber QoS:
    _topic->subscribeAndGetPublisher(IceStorm::QoS(), player->getObserver());
    

    I'm thinking this is ACM because when I see the connection opening and closing, that's when I usually get the subscriber errors. When the connection stays open between IceStorm and the Glacier, I've never had this problem. I can't seem to get ACM to shut off completely though.

    Thanks,
    Pete
  • matthew
    matthew NL, Canada
    [ 03/10/09 16:39:11.057 icebox: Network: trying to establish tcp connection to 1
    27.0.0.1:1924 ]
    [ 03/10/09 16:39:11.073 icebox: Network: tcp connection established
      local address = 127.0.0.1:1943
      remote address = 127.0.0.1:1924 ]
    [ 03/10/09 16:39:11.104 icebox: Subscriber: 022C9DA8 ]uBU|?<{inl&E,lj9F\'?/46321
    E83-7FE0-4950-8BDE-B553BF16B3D1: subscriber errored out: OutgoingAsync.cpp:638:
    Ice::TimeoutException:
      timeout while sending or receiving data retry: 0/0 ]
    

    Given that there is less than 50ms between the establishment and the timeout it certainly looks like you've overridden the default value of <value>.Send.Timeout (the default is 60 seconds). What is your icebox configuration?

    You also appear to have a mix of property prefixes
    BAGSIceStorm.Publish.Endpoints=tcp -p 10003
    Ice.ACM.Client=0
    IceStorm.ACM.Client=0
    

    One is using IceStorm and the other is BAGSIceStorm.
    IceStorm.ACM.Client=0
    

    This is a invalid config param.

    With respect to ACM, while there is likely little point in having it configured on the server back end, I don't think you should have to disable ACM. It shouldn't cause you any issues.
  • Hi Matthew,

    The mix up properties happened on my desperate plea for Ice to keep the connection between Glacier2 and IceStorm open.

    My IceBox config is rather simple:
    IceBox.Service.BAGSIceStorm=IceStormService,33:createIceStorm --Ice.Config=./config.icestorm.bags
    

    I'm certain that I don't override the timeout property, so why it's timing out so quickly I'm clueless.

    Pete
  • Well, I'm afraid after all of this, the solution was to reboot the Windows Server. Windows had been running for 22 days, and after a reboot all of these problems vanished. I've had my iPhone connected for over 30 mins now. I was lucky to get 5 before.

    I'm not too concerned by this since I plan on running this setup on CentOS when it goes into production. Hopefully, it won't happen there.

    Thanks a lot to both of you,
    Pete
  • benoit
    benoit Rennes, France
    Hi Pete,

    On which hardware is your Windows Server running? Are you running it under VMware by any chance? Anyway, it's good to hear it works again, let us know if this occurs again however.

    Cheers,
    Benoit.
  • Hi Benoit,

    Yes, it is a virtual machine. It's running on a Xen host as a fully virtualized machine. Have you experienced these issues before while running as a VM?

    Thanks,
    Pete
  • benoit
    benoit Rennes, France
    Hi,

    We did indeed see some issues with VMware in the past. But this was with different hardware (dual-AMD Athlon). It could also be that QueryPerformanceCounters (on which Ice timeouts rely) returns incorrect values on your platform (see here).

    If the problem shows up again, you could try re-compiling Ice to not use performance counters and see if this solves the problem, you can disable the use of performance counters by commenting out the following line in cpp/src/IceUtil/Time.cpp:
    // static InitializeFrequency frequencyInitializer;
    

    Cheers,
    Benoit.
  • Hi Benoit,

    I am seeing this problem again. It appears when IceStorm is running for more than a day or so. The host is a dual quad-core Xeon machine. I think that I'll give your suggestion a try.

    Thanks,
    Pete
  • For anyone interested, I moved to a CentOS development environment. The problems are gone. Benoit was probably correct about the performance counter issue. Otherwise, it must have been some other Windows quirk.

    Pete