Archived

This forum has been archived. Please start a new discussion on GitHub.

Glacier2 session expiration

Previously (http://www.zeroc.com/vbulletin/showthread.php?t=2083) we had some problems with an unresponsive Glacier2 router refusing to create sessions; those seem to have been resolved by the patch we got at the time (http://www.zeroc.com/vbulletin/showthread.php?t=2165).

However, now I'm seeing sort of the reverse problem, namely a Glacier2 router where sessions are not timing out. Using netstat, we monitor the number of network connections seen by the operating system, and when our server setup gets strained this number starts diverging from the number of active sessions maintained by our session manager.

Going by our logs, the session manager has a number of inactive sessions for which it never receives session timeouts, because the Glacier2 router never sends them. When this starts happening, the Glacier2 router has typically been experiencing a lot of exceptions of the following sort:
glacier2router: warning: dispatch exception: ConnectionI.cpp:268: Ice::ForcedCloseConnectionException:
protocol error: connection forcefully closed
identity: >.~B?$=Js66_fp]9@E8B/GameClient_e77da067-3158-4965-9dd5-a4567283d335
facet: 
operation: receiveInfolecule
...
glacier2router: warning: dispatch exception: ConnectionI.cpp:2040: Ice::CloseConnectionException:
protocol error: connection closed
identity: :iK7>**oO}K-y_(ge}f=/GameClient_9d18b4c0-d047-4438-9e77-5f48cdc87892
facet: 
operation: destroyGameObject
glacier2router: warning: dispatch exception: ConnectionI.cpp:2040: Ice::CloseConnectionException:
protocol error: connection closed
identity: :iK7>**oO}K-y_(ge}f=/GameClient_9d18b4c0-d047-4438-9e77-5f48cdc87892
facet: 
operation: destroyGameObject
glacier2router: warning: dispatch exception: ConnectionI.cpp:2040: Ice::CloseConnectionException:
protocol error: connection closed
identity: QGp7y$fJGq\/IS7@IJvlN/GameClient_d4223607-ea42-4ae9-bb6a-56c0c372042d
facet: 
operation: destroyGameObject
glacier2router: warning: dispatch exception: ConnectionI.cpp:2040: Ice::CloseConnectionException:
protocol error: connection closed
identity: :iK7>**oO}K-y_(ge}f=/GameClient_9d18b4c0-d047-4438-9e77-5f48cdc87892
facet: 
operation: destroyGameObject
glacier2router: warning: dispatch exception: ConnectionI.cpp:2040: Ice::CloseConnectionException:
protocol error: connection closed
identity: QGp7y$fJGq\/IS7@IJvlN/GameClient_d4223607-ea42-4ae9-bb6a-56c0c372042d
facet: 
operation: destroyGameObject
...
glacier2router: warning: dispatch exception: TcpTransceiver.cpp:285: Ice::ConnectionLostException:
connection lost: Connection reset by peer
identity: R=2My\'\\\'EJ0_kuG^Cvq3/elithrill_b262f778-c70a-4ca1-aa28-629d2c29ae6d
facet: 
operation: setTargetObject
glacier2router: warning: dispatch exception: TcpTransceiver.cpp:285: Ice::ConnectionLostException:
connection lost: Connection reset by peer
identity: R=2My\'\\\'EJ0_kuG^Cvq3/samael_2c920bc1-6bf4-432d-865c-94a6c169f9b9
facet: 
operation: setActorTarget

Glacier2 is reverse-routing server updates broadcast via IceStorm, and the errors are presumably caused by crashed clients. Once this happens, the sessions belonging to these crashed clients don't seem to be timed out. We are currently only running protocol (not network) tracing on the Glacier2 router, and no protocol messages are associated with these exceptions.

The problem may very well be somewhere else in the system, but going by the logs the Glacier2 router was the only service experiencing unhandled exceptions, so it seemed like a reasonable place to start.

The whole setup is a patched Ice 3.0.1 (we also got an extra garbage collector patch, IIRC) running on Fedora Core 4 (32-bit) Linux.

Comments

  • Figured I might as well include the Glacier2 configuration file:
    #Ice.Default.Host=192.168.0.31
    Ice.Default.Locator=IceGrid/Locator:tcp -p 5000
    
    # We must set the stack size of new threads created by Glacier2. The
    # default on Linux is typically in the 10MB range, which is way too
    # high.
    #
    # Since Glacier2 always uses thread-per-connection mode, we must use
    # the property below to set the thread stack size. Internal Glacier2
    # threads also use this property value.
    
    Ice.ThreadPerConnection.StackSize=262144
    
    # The client-visible endpoint of Glacier2. This should be an endpoint
    # visible from the public Internet, and it should be secure.
    #should be ssl - but that is for later
    #Glacier2.Client.PublishedEndpoints=tcp -h 83.91.134.45 -p 10005
    Glacier2.Client.Endpoints=tcp -p 10005
    
    
    # The server-visible endpoint of Glacier2. This endpoint is only
    # required if callbacks are needed (leave empty otherwise). This
    # should be an endpoint on an internal network (like 192.168.x.x), or
    # on the loopback, so that the server is not directly accessible from
    # the Internet.
    
    Glacier2.Server.Endpoints=tcp 
    
    # This configures the session manager. If no external session manager
    # is used, sessions are handled only internally by Glacier2.
    Glacier2.SessionManager=AccountManager:default -p 6500
    
    # For this demo, we use a dummy permissions verifier that is
    # collocated with the session server process. This dummy permissions
    # verifier allows any user-id / password combination.
    
    Glacier2.PermissionsVerifier=AccountManager:default -p 6500
    
    # The timeout for inactive sessions. If any client session is inactive
    # for longer than this value, the session expires and is removed. The
    # unit is seconds.
    #
    #Glacier2.SessionTimeout=20000
    Glacier2.SessionTimeout=60
    
    
    # Glacier can forward requests buffered or unbuffered. Unbuffered
    # means a lower resource consumption, as buffering requires one
    # additional thread per connected client or server. However, without
    # buffering, messages cannot be batched and message overriding doesn't
    # work either. Also, with unbuffered request forwarding, the caller
    # thread blocks for twoway requests.
    #
    Glacier2.Client.Buffered=1
    Glacier2.Server.Buffered=1
    
    
    # These two lines instruct Glacier2 to forward contexts both for
    # regular routing, as well as for callbacks (reverse routing).
    #
    Glacier2.Client.ForwardContext=1
    Glacier2.Server.ForwardContext=1
    
    # To prevent Glacier2 from being flooded with requests from or to one
    # particular client, Glacier2 can be configured to sleep for a certain
    # period after all current requests for this client have been
    # forwarded. During this sleep period, new requests for the client are
    # queued. These requests are then all sent once the sleep period is
    # over. The unit is milliseconds.
    
    Glacier2.Client.SleepTime=100
    Glacier2.Server.SleepTime=100
    
    # With the two settings below, Glacier2 can be instructed to always
    # batch oneways, even if they are sent with a _fwd/o instead of a
    # _fwd/O context.
    #
    Glacier2.Client.AlwaysBatch=0
    Glacier2.Server.AlwaysBatch=0
    
    
    # Glacier2 always disables active connection management so there is no
    # need to configure this manually. Connection retry does not need to
    # be disabled, as it's safe for Glacier2 to retry outgoing connections
    # to servers. Retry for incoming connections from clients must be
    # disabled in the clients.
    #
    
    
    # Various settings to trace requests, overrides, etc.
    #
    #Glacier2.Client.Trace.Request=1
    #Glacier2.Server.Trace.Request=1
    #Glacier2.Client.Trace.Override=1
    #Glacier2.Server.Trace.Override=1
    #Glacier2.Client.Trace.Reject=1
    Glacier2.Trace.Session=1
    Glacier2.Trace.RoutingTable=1
    
    
    # Other settings.
    Ice.Logger.Timestamp=1
    Ice.Trace.Network=2
    Ice.Trace.Protocol=1
    Ice.Warn.Connections=1
    Ice.MessageSizeMax=4096
    
    #Ice.Plugin.IceSSL=IceSSL:create
    #IceSSL.Client.CertPath=../../../certs
    #IceSSL.Client.Config=sslconfig.xml
    #IceSSL.Server.CertPath=../../../certs
    #IceSSL.Server.Config=sslconfig.xml
    #IceSSL.Trace.Security=1
    
    
    
  • benoit (Rennes, France)
    Hi Nis,

    These exceptions are expected if some client dies. For example, the following warning means that the routing of the receiveInfolecule request failed because the connection with the client was closed (forcefully in this case, which indicates that Glacier2 explicitly closed the connection).
    glacier2router: warning: dispatch exception: ConnectionI.cpp:268: Ice::ForcedCloseConnectionException:
    protocol error: connection forcefully closed
    identity: >.~B?$=Js66_fp]9@E8B/GameClient_e77da067-3158-4965-9dd5-a4567283d335
    facet: 
    operation: receiveInfolecule
    

    I suppose these exceptions are the result of the backend server invocations (IceStorm in your case, right?) to your clients, so I'm surprised you don't see protocol messages for these exceptions. You should see "received request" protocol traces for "receiveInfolecule", for example, shortly before the warning.

    Otherwise, we're not aware of any problems with session timeouts. The Glacier2 router should invoke destroy() on a session if it didn't receive any requests from the client for the duration configured with Glacier2.SessionTimeout. How do you figure out which sessions are inactive in your session manager? You should see traces when the router destroys a session (with Glacier2.Trace.Session set to 1). Are you seeing these traces for the sessions that you consider inactive?

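    To make the timeout rule concrete, here is a minimal C++ sketch. It assumes a hypothetical IdleSessionTracker class that records the time of the last client request per session and hands back every session idle for longer than the timeout. It only models the rule as described above; it is not how Glacier2 implements it.

    ```cpp
    #include <cassert>
    #include <chrono>
    #include <cstddef>
    #include <map>
    #include <string>
    #include <vector>

    // Hypothetical model of the idle-timeout rule: a session that has not seen a
    // client request for longer than the timeout expires. Not Glacier2's code.
    class IdleSessionTracker {
    public:
        using Clock = std::chrono::steady_clock;

        explicit IdleSessionTracker(std::chrono::seconds timeout) : timeout_(timeout) {}

        // Record activity: called whenever a client request is routed for the session.
        void touch(const std::string& sessionId, Clock::time_point now) {
            lastActivity_[sessionId] = now;
        }

        // Remove and return every session idle for strictly longer than the timeout.
        // The caller is assumed to invoke destroy() on each returned session.
        std::vector<std::string> reap(Clock::time_point now) {
            std::vector<std::string> expired;
            for (auto it = lastActivity_.begin(); it != lastActivity_.end();) {
                if (now - it->second > timeout_) {
                    expired.push_back(it->first);
                    it = lastActivity_.erase(it);
                } else {
                    ++it;
                }
            }
            return expired;
        }

        std::size_t size() const { return lastActivity_.size(); }

    private:
        std::chrono::seconds timeout_;
        std::map<std::string, Clock::time_point> lastActivity_;
    };
    ```

    With the posted configuration the tracker would be built with std::chrono::seconds(60), matching Glacier2.SessionTimeout=60.
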
    Also, could you try to set the timeout on the Glacier2 client endpoint to see if this makes a difference? For example: Glacier2.Client.Endpoints=tcp -p 10005 -t 60000

    Cheers,
    Benoit.
  • benoit wrote:
    These exceptions are expected if some client dies. For example, the following warning means that the routing of the receiveInfolecule request failed because the connection with the client was closed (forcefully in this case, which indicates that Glacier2 explicitly closed the connection).

    I assumed as much.
    benoit wrote:
    Otherwise, we're not aware of any problems with session timeouts. The Glacier2 router should invoke destroy() on a session if it didn't receive any requests from the client for the duration configured with Glacier2.SessionTimeout. How do you figure out which sessions are inactive in your session manager?

    Basically, we have a list of sessions in the session manager, and for debugging purposes we also monitor their activity level and try to match them to the network connections found via 'netstat'. Comparing these seemed to indicate that the sessions which were never expired corresponded to the sessions experiencing the repeated exceptions.
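
    One way to do this cross-check is a set difference in both directions, sketched here in C++. It assumes each live connection from netstat has already been mapped to a session id (that mapping is not shown); the struct and function names are hypothetical.

    ```cpp
    #include <algorithm>
    #include <cassert>
    #include <iterator>
    #include <set>
    #include <string>
    #include <vector>

    // Hypothetical debugging helper: mismatches between the session manager's
    // view and the sessions that can be matched to a live TCP connection.
    struct Mismatch {
        std::vector<std::string> sessionsWithoutConnection;
        std::vector<std::string> connectionsWithoutSession;
    };

    Mismatch compareSessions(const std::set<std::string>& managerSessions,
                             const std::set<std::string>& connectedSessions) {
        Mismatch m;
        // Sessions the manager still holds but no connection maps to.
        std::set_difference(managerSessions.begin(), managerSessions.end(),
                            connectedSessions.begin(), connectedSessions.end(),
                            std::back_inserter(m.sessionsWithoutConnection));
        // Connections that map to no session known to the manager.
        std::set_difference(connectedSessions.begin(), connectedSessions.end(),
                            managerSessions.begin(), managerSessions.end(),
                            std::back_inserter(m.connectionsWithoutSession));
        return m;
    }
    ```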
    benoit wrote:
    You should see traces when the router destroys a session (with Glacier2.Trace.Session set to 1). Are you seeing these traces for the sessions that you consider inactive?

    Not as far as I've been able to find, but I must admit that we have quite a few megabytes of logs and I may have missed it. But it was exactly the lack of session destruction messages that made me think there might be a more serious problem.

    I should also mention that clients that don't crash don't seem to have any problem expiring their sessions etc.
    benoit wrote:
    Also, could you try to set the timeout on the Glacier2 client endpoint to see if this makes a difference? For example: Glacier2.Client.Endpoints=tcp -p 10005 -t 60000

    Could certainly try that.

    I'll also try to clean up the session expiration procedure in our session manager and see if there are some hidden problems there. I have a sneaking suspicion it may be throwing an unhandled exception or something.

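    A minimal sketch of an exception-safe expiration loop, assuming a hypothetical SessionRegistry in which every session carries a cleanup callback. Each session is removed from the registry before its cleanup runs, and exceptions are caught and logged per session, so one failing cleanup cannot abort the loop and leave the remaining sessions un-expired.

    ```cpp
    #include <cassert>
    #include <exception>
    #include <functional>
    #include <iostream>
    #include <map>
    #include <stdexcept>
    #include <string>
    #include <utility>
    #include <vector>

    // Hypothetical session-manager registry; names are illustrative only.
    class SessionRegistry {
    public:
        void add(const std::string& id, std::function<void()> cleanup) {
            sessions_[id] = std::move(cleanup);
        }

        // Expire the given sessions; returns the ids whose cleanup threw.
        std::vector<std::string> expire(const std::vector<std::string>& ids) {
            std::vector<std::string> failed;
            for (const auto& id : ids) {
                auto it = sessions_.find(id);
                if (it == sessions_.end()) continue;  // unknown or already expired
                auto cleanup = std::move(it->second);
                sessions_.erase(it);  // remove first so a throw cannot leave it dangling
                try {
                    cleanup();
                } catch (const std::exception& ex) {
                    std::cerr << "cleanup of " << id << " failed: " << ex.what() << '\n';
                    failed.push_back(id);
                } catch (...) {
                    std::cerr << "cleanup of " << id << " failed: unknown exception\n";
                    failed.push_back(id);
                }
            }
            return failed;
        }

        bool contains(const std::string& id) const { return sessions_.count(id) != 0; }

    private:
        std::map<std::string, std::function<void()>> sessions_;
    };
    ```
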
    Best regards,

    Nis
  • marc (Florida)
    You must set a timeout on the client endpoint as Benoit suggested. If a client requests session destruction, Glacier2 will gracefully close the connection to the client. However, if the client doesn't pick up the response for the graceful closure (due to a network problem, or because it crashes right at that moment), the connection will never be closed if no timeout is set, and destroy() will never be called on the session object.

    It is in general a good idea to always use timeouts in unreliable networks, such as the public Internet.
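
    The reasoning above can be illustrated with a toy C++ model, a simplification under stated assumptions and not Ice's connection code: after the router starts a graceful close, the connection closes when the client acknowledges or when the endpoint timeout fires, whichever comes first, and only then would destroy() run on the session. With no acknowledgement and no timeout, it never closes.

    ```cpp
    #include <cassert>
    #include <optional>

    // Toy model (hypothetical names). Times are in ms from the start of the close.
    //   ackAtMs:   when the client's acknowledgement arrives (nullopt = never, e.g. crashed)
    //   timeoutMs: the endpoint timeout (nullopt = no -t option on the endpoint)
    // Returns when the connection closes (and destroy() could run), or nullopt if never.
    std::optional<long> closeTimeMs(std::optional<long> ackAtMs, std::optional<long> timeoutMs) {
        if (ackAtMs && (!timeoutMs || *ackAtMs <= *timeoutMs)) {
            return *ackAtMs;      // graceful close completes normally
        }
        if (timeoutMs) {
            return *timeoutMs;    // timeout fires; the connection is closed forcefully
        }
        return std::nullopt;      // no acknowledgement and no timeout: hangs forever
    }
    ```

    The timeout parameter stands in for the -t value on Glacier2.Client.Endpoints, which is in milliseconds (for example -t 60000).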
  • benoit (Rennes, France)
    Hi Nis,

    We found an issue with Glacier2 session destruction which could cause the problem you're describing here. You might want to try the fix I've posted in this thread (http://www.zeroc.com/vbulletin/showthread.php?t=2212) and see if it helps. Note that this fix requires changes to the client code.

    Cheers,
    Benoit.
  • Thanks, we will get right on applying that fix and let you know how it works out.
  • BTW, I was wondering how one would go about shutting down a session from the server end. I can obviously destroy the session object created by my session manager by calling destroy() on it, but I was uncertain whether that also destroys the session held by the router. The router->destroySession method doesn't seem appropriate either, since the server wasn't the one that created the session.

    The reason I might want to do this is that sometimes I'd like to forcefully clean up from the server end, such as when people get kicked off the server.

    Best regards,

    Nis Haller Baggesen
  • marc (Florida)
    At present you can really only do this in your application logic. You can't explicitly destroy sessions in Glacier2 from the server. See also this post:

    http://www.zeroc.com/vbulletin/showthread.php?p=6910
  • marc wrote:
    At present you can really only do this in your application logic. You can't explicitly destroy sessions in Glacier2 from the server. See also this post:

    http://www.zeroc.com/vbulletin/showthread.php?p=6910

    Ok. Well, so far there is no major problem, as I can remove the objects the client uses for communication, so it was simply an effort to clean up a dangling session.
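
    A minimal sketch of that application-level approach, assuming the server keeps its own hypothetical map from session id to the identities of the objects it created for that client. Kicking a session removes all of those objects, so later requests from that client find no target; the Glacier2 session itself would still linger until it times out.

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <map>
    #include <set>
    #include <string>

    // Hypothetical application-level "kick". Per this thread, the server cannot
    // destroy the Glacier2 session, so it removes the session's objects instead.
    class KickableSessions {
    public:
        void registerObject(const std::string& sessionId, const std::string& identity) {
            objectsBySession_[sessionId].insert(identity);
            active_.insert(identity);
        }

        // Remove every object registered for the session; returns how many were removed.
        std::size_t kick(const std::string& sessionId) {
            auto it = objectsBySession_.find(sessionId);
            if (it == objectsBySession_.end()) return 0;
            for (const auto& identity : it->second) active_.erase(identity);
            std::size_t n = it->second.size();
            objectsBySession_.erase(it);
            return n;
        }

        // Stand-in for dispatch: a request succeeds only if its target still exists.
        bool dispatch(const std::string& identity) const { return active_.count(identity) != 0; }

    private:
        std::map<std::string, std::set<std::string>> objectsBySession_;
        std::set<std::string> active_;
    };
    ```

    In a real Ice server the removal would go through the object adapter (Ice::ObjectAdapter::remove), and requests for a removed identity would then fail with Ice::ObjectNotExistException; the std::set here only stands in for the adapter.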
  • marc (Florida)
    Nis wrote:
    Ok. Well, so far there is no major problem, as I can remove the objects the client uses for communication, so it was simply an effort to clean up a dangling session.

    I agree, we have to add a method that allows the server to destroy a session. It's on our todo list :)