Glacier2 session expiration

Previously (http://www.zeroc.com/vbulletin/showthread.php?t=2083) we had some problems with an unresponsive Glacier2 router refusing to create sessions, and those seem to have been resolved with the patch we got at the time (http://www.zeroc.com/vbulletin/showthread.php?t=2165).
However, now I'm seeing more or less the reverse problem, namely a Glacier2 router where sessions are not timing out. We use netstat to monitor the number of network connections seen by the operating system, and when our server setup gets strained this number starts diverging from the number of active sessions maintained by our session manager.
Going by our logs, the session manager has a number of inactive sessions that it never receives session timeouts for, because the Glacier2 router never sends them. When this starts happening, the router has typically been experiencing a lot of exceptions of the following sort:
glacier2router: warning: dispatch exception: ConnectionI.cpp:268: Ice::ForcedCloseConnectionException: protocol error: connection forcefully closed identity: >.~B?$=Js66_fp][email protected]/GameClient_e77da067-3158-4965-9dd5-a4567283d335 facet: operation: receiveInfolecule
...
glacier2router: warning: dispatch exception: ConnectionI.cpp:2040: Ice::CloseConnectionException: protocol error: connection closed identity: :iK7>**oO}K-y_(ge}f=/GameClient_9d18b4c0-d047-4438-9e77-5f48cdc87892 facet: operation: destroyGameObject
glacier2router: warning: dispatch exception: ConnectionI.cpp:2040: Ice::CloseConnectionException: protocol error: connection closed identity: :iK7>**oO}K-y_(ge}f=/GameClient_9d18b4c0-d047-4438-9e77-5f48cdc87892 facet: operation: destroyGameObject
glacier2router: warning: dispatch exception: ConnectionI.cpp:2040: Ice::CloseConnectionException: protocol error: connection closed identity: QGp7y$fJGq\/[email protected]/GameClient_d4223607-ea42-4ae9-bb6a-56c0c372042d facet: operation: destroyGameObject
glacier2router: warning: dispatch exception: ConnectionI.cpp:2040: Ice::CloseConnectionException: protocol error: connection closed identity: :iK7>**oO}K-y_(ge}f=/GameClient_9d18b4c0-d047-4438-9e77-5f48cdc87892 facet: operation: destroyGameObject
glacier2router: warning: dispatch exception: ConnectionI.cpp:2040: Ice::CloseConnectionException: protocol error: connection closed identity: QGp7y$fJGq\/[email protected]/GameClient_d4223607-ea42-4ae9-bb6a-56c0c372042d facet: operation: destroyGameObject
...
glacier2router: warning: dispatch exception: TcpTransceiver.cpp:285: Ice::ConnectionLostException: connection lost: Connection reset by peer identity: R=2My\'\\\'EJ0_kuG^Cvq3/elithrill_b262f778-c70a-4ca1-aa28-629d2c29ae6d facet: operation: setTargetObject
glacier2router: warning: dispatch exception: TcpTransceiver.cpp:285: Ice::ConnectionLostException: connection lost: Connection reset by peer identity: R=2My\'\\\'EJ0_kuG^Cvq3/samael_2c920bc1-6bf4-432d-865c-94a6c169f9b9 facet: operation: setActorTarget
Glacier2 is reverse-routing server updates that are broadcast via IceStorm, and the errors are presumably caused by crashed clients. Once this happens, the sessions belonging to these crashed clients don't seem to be timed out. We are currently only running protocol (not network) tracing on the Glacier2 router, and no protocol messages are associated with these exceptions.
The problem may very well be somewhere else in the system, but going by the logs the Glacier2 router was the only service experiencing unhandled exceptions, so it seemed like a reasonable place to start.
The whole setup is a patched Ice 3.0.1 (we got an extra garbage collector patch, IIRC) running on Fedora Core 4 (32-bit) Linux.
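The netstat comparison described above could be scripted roughly as follows. This is only a sketch: the function names and the sample netstat text are made up for illustration, the port 10005 is borrowed from the endpoint example later in this thread, and the column layout assumed is that of Linux `netstat -tn`.

```python
def count_established(netstat_output: str, port: int) -> int:
    """Count ESTABLISHED TCP connections whose local address ends in :port."""
    count = 0
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed row layout: proto, recv-q, send-q, local, foreign, state
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue  # skips the two header lines and non-TCP rows
        local, state = fields[3], fields[5]
        if state == "ESTABLISHED" and local.rsplit(":", 1)[-1] == str(port):
            count += 1
    return count


def divergence(netstat_output: str, port: int, active_sessions: int) -> int:
    """Sessions minus connections; a positive value hints at stale sessions."""
    return active_sessions - count_established(netstat_output, port)


# Made-up sample output, not taken from the system discussed here.
SAMPLE = """\
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 10.0.0.5:10005          192.0.2.10:51234        ESTABLISHED
tcp        0      0 10.0.0.5:10005          192.0.2.11:40022        ESTABLISHED
tcp        0      0 10.0.0.5:10005          192.0.2.12:40100        TIME_WAIT
tcp        0      0 10.0.0.5:22             192.0.2.20:50000        ESTABLISHED
"""

if __name__ == "__main__":
    # The session count (5) would come from the session manager in practice.
    print(divergence(SAMPLE, 10005, active_sessions=5))
```

In a live setup the text would come from running netstat periodically, and the session count from whatever the session manager exposes; both are outside the scope of this sketch.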
Comments
These exceptions are expected if a client dies. For example, the first warning (Ice::ForcedCloseConnectionException) means that the routing of the receiveInfolecule request failed because the connection with the client was closed (forcefully in this case, which indicates that Glacier2 explicitly closed the connection).
I suppose these exceptions are the result of the backend server invocations (IceStorm in your case, right?) to your clients so I'm surprised you don't see protocol messages for these exceptions. You should see "received request" protocol traces for "receiveInfolecule" for example (shortly before the warning).
Otherwise, we're not aware of any problems with session timeouts. The Glacier2 router should invoke destroy() on a session if it didn't receive any requests from the client for the duration configured with Glacier2.SessionTimeout. How do you figure out which sessions are inactive in your session manager? You should see traces when the router destroys a session (with Glacier2.Trace.Session set to 1), are you seeing these traces for the sessions that you consider inactive?
Also, could you try to set the timeout on the Glacier2 client endpoint to see if this makes a difference? For example:
Glacier2.Client.Endpoints=tcp -p 10005 -t 60000
Cheers,
Benoit.
I assumed as much.
Basically we have a list of sessions in the session manager, and for debugging purposes we also monitor their activity level and try to match them to the network connections found via 'netstat'. Comparing these things seemed to indicate that the sessions which were never expired corresponded to the sessions experiencing the multiple exceptions.
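One way to make that comparison less manual is to tally the dispatch-exception warnings per client and compare the result with the list of sessions that never expired. The sketch below assumes the warning layout shown in the first post and assumes that the part of the identity after the last "/" names the client; the regular expression, the function name and the sample identities are illustrative only.

```python
import re
from collections import Counter

# Matches one "dispatch exception" warning, whether its fields are run
# together on one line or spread over several lines.
WARNING = re.compile(
    r"dispatch exception:\s*\S+:\s*(?P<exc>Ice::\w+):"
    r".*?identity:\s*(?P<identity>\S+)"
    r"\s+facet:.*?operation:\s*(?P<op>\w+)",
    re.DOTALL,
)


def failures_per_client(log_text: str) -> Counter:
    """Count warnings per client name (identity text after the last '/')."""
    counts = Counter()
    for match in WARNING.finditer(log_text):
        name = match.group("identity").rsplit("/", 1)[-1]
        counts[name] += 1
    return counts


# Made-up log excerpt with hypothetical identities.
SAMPLE_LOG = (
    "glacier2router: warning: dispatch exception: ConnectionI.cpp:2040: "
    "Ice::CloseConnectionException: protocol error: connection closed "
    "identity: catA/GameClient_a facet: operation: destroyGameObject\n"
    "glacier2router: warning: dispatch exception: TcpTransceiver.cpp:285: "
    "Ice::ConnectionLostException: connection lost: Connection reset by peer "
    "identity: catB/GameClient_b facet: operation: setTargetObject\n"
    "glacier2router: warning: dispatch exception: ConnectionI.cpp:2040: "
    "Ice::CloseConnectionException: protocol error: connection closed "
    "identity: catA/GameClient_a facet: operation: destroyGameObject\n"
)

if __name__ == "__main__":
    for client, n in failures_per_client(SAMPLE_LOG).most_common():
        print(client, n)
```

Clients that show up here but have no matching session-destruction message would be the ones worth a closer look.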
Not as far as I've been able to find, but I must admit that we have quite a few megabytes of log and I may have missed it. But it was exactly the lack of session destruction messages that made me think there might be a more serious problem.
I should also mention that clients that don't crash don't seem to have any problem expiring their sessions etc.
Could certainly try that.
I'll also try to see if I can clean up the session expiration procedure in our session manager and see if there are some hidden problems there. I have a sneaking suspicion it may be throwing an unhandled exception or something.
mvh
Nis
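As an illustration of the suspicion above, here is a minimal sketch of an expiration path that cannot leave a session registered when its cleanup throws. The SessionManager class and its methods are hypothetical and are not part of the Ice or Glacier2 API; in the real system this logic would live in whatever implements destroy() for the session.

```python
import logging

log = logging.getLogger("session_manager")


class SessionManager:
    """Hypothetical registry mapping session ids to cleanup callables."""

    def __init__(self):
        self._sessions = {}

    def add(self, session_id, cleanup):
        self._sessions[session_id] = cleanup

    def destroy(self, session_id):
        # Unregister first, so a failing cleanup cannot leave the session
        # behind in the registry.
        cleanup = self._sessions.pop(session_id, None)
        if cleanup is None:
            return False  # unknown or already destroyed
        try:
            cleanup()
        except Exception:
            # Record the failure instead of letting it escape unnoticed.
            log.exception("cleanup failed for session %s", session_id)
        return True

    def active(self):
        return sorted(self._sessions)


def failing_cleanup():
    raise RuntimeError("simulated cleanup failure")


if __name__ == "__main__":
    manager = SessionManager()
    manager.add("crashed-client", failing_cleanup)
    manager.destroy("crashed-client")
    print(manager.active())
```

The design choice is simply to remove the bookkeeping entry before running code that might throw, and to log the exception so that it shows up in the session manager's own log.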
It is in general a good idea to always use timeouts in unreliable networks, such as the public Internet.
We found an issue with Glacier2 session destruction which could cause the problem you're mentioning here. You might want to try the fix I've posted in thread 2212 (http://www.zeroc.com/vbulletin/showthread.php?t=2212) and see if it helps. Note that this fix requires changing the client code.
Cheers,
Benoit.
The reason I might want to do this is that sometimes I'd like to forcefully clean up from the server end, such as when people get kicked off the server.
mvh
Nis Haller Baggesen
http://www.zeroc.com/vbulletin/showthread.php?p=6910
Ok. Well, so far there is no major problem, as I can remove the objects the client uses for communication, so it was simply an effort to clean up a dangling session.
I agree, we have to add a method that allows the server to destroy a session. It's on our todo list.