Why do some services restart frequently?

I tested some servers on a local area network, and they worked well. But when I deployed them on a wide area network, some of the services restarted frequently. I noticed that the internet connection is sometimes lost for very short periods.

The node logs messages like this:

[ icegridnode: Activator: sent SIGKILL to server `CN_60.190.167.139' (pid = 16640) ]
[ icegridnode: Activator: detected termination of server `CN_60.190.167.139'
[ icegridnode: Network: accepted tcp connection
[ icegridnode: Activator: activating server `CN_60.190.167.139'
args = /opt/mysee/bin/cachenode --Ice.Config=/opt/mysee/var/icegridnode/servers/CN_60.190.167.139/config/config --Ice.Default.Locator=MyGrid/Locator:tcp -h 211.160.17.100 -p 22000 --Ice.ServerId=CN_60.190.167.139 ]
[ icegridnode: Activator: activated server `CN_60.190.167.139' (pid = 16695) ]

Comments

  • benoit
    benoit Rennes, France
    Hi,

    The IceGrid node doesn't kill servers unless you deactivate the server and the deactivation takes too much time. The deactivation timeout can be configured on a per-server basis with the deactivation-timeout server attribute (see the IceGrid XML reference in the manual for more information) or, if it's not set, it defaults to the value of the IceGrid.Node.WaitTime property (which defaults to 60s).

    So, you should first figure out why the servers are being shut down (do your clients explicitly shut down the servers? are you using the Ice.ServerIdleTime property?) and then, possibly, why the deactivation takes too long. Could you set the following properties on the IceGrid node and send us the output of the node tracing? This will give us more information on the deactivation of the server.
    IceGrid.Node.Trace.Server=3
    IceGrid.Node.Trace.Activator=2
    

    Cheers,
    Benoit.
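
As a rough illustration of the per-server timeout mentioned in the post above, the fragment below sets deactivation-timeout on a server descriptor. The server id and executable path are taken from the log in the first post; the value 120 (seconds) is only an assumed example, and the rest of the descriptor is left out.

```xml
<!-- Sketch of a server element in an IceGrid application descriptor. -->
<!-- deactivation-timeout="120" is an assumed example value, in seconds. -->
<server id="CN_60.190.167.139"
        exe="/opt/mysee/bin/cachenode"
        deactivation-timeout="120">
    <!-- adapters, properties, etc. unchanged -->
</server>
```

If the attribute is not set, the node-wide IceGrid.Node.WaitTime property applies instead, so setting IceGrid.Node.WaitTime=120 in the node configuration would be the equivalent node-level change.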
  • It may result from this:

    The network is occasionally disconnected. IceGridNode loses contact with the registry and throws an exception. When the connection is re-established, IceGridNode terminates the server.

    This may cause big trouble for my project. First, IceGridNode cannot shut down my server: it takes one minute and then kills the process. Second, even if IceGridNode loses contact with the registry, we still want the server to keep working.

    Are there any suggestions? Thanks a lot!
  • benoit
    benoit Rennes, France
    Hi,

    For the first point (the node kills the server if it doesn't shut down in a timely manner), that's most likely a problem in your server implementation if it's not supposed to take that long to shut down. The best way to figure this out is to attach the debugger while the server is being shut down and check the stack traces of the threads to see where it might hang.

    For the second point, I'm afraid I don't see why the node would restart the server when the IceGrid registry is available again after the disconnection; this shouldn't happen. However, it's difficult to say without more information why this is happening for you. If you post the IceGrid node traces with the properties set as requested in my previous post, I'd be happy to look at the traces and see if this can give us some clues. Perhaps you can also try to reproduce this issue using the IceGrid demo from your Ice distribution (in demo/IceGrid/simple)?

    Btw, the name (and, if possible, the URL) of the project you're working on is missing from your signature. Could you please set this information? (See also thread 1697 for more details on how to set your signature.) Thanks!

    Cheers,
    Benoit.
  • shutdown:

    Our application needs to deal with some other issues, so it must first finish its other transactions. We wrote a shutdown function that calls Ice::Service::shutdown() at the end.
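
A minimal sketch of one way to keep such a shutdown function within the node's deactivation timeout: wait for in-flight work only up to a deadline, then stop regardless. DrainGate and shutdownWithDeadline are hypothetical names, not part of Ice, and the stop callback stands in for the call to Ice::Service::shutdown() described above.

```cpp
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>

// Hypothetical helper: counts in-flight transactions so that shutdown can
// wait for them, but only for a bounded time.
class DrainGate {
public:
    // Call when a transaction starts.
    void begin() {
        std::lock_guard<std::mutex> lock(mutex_);
        ++pending_;
    }

    // Call when a transaction finishes.
    void end() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (pending_ > 0 && --pending_ == 0) {
            cv_.notify_all();
        }
    }

    // Returns true if no transactions are pending before `limit` elapses.
    bool waitFor(std::chrono::milliseconds limit) {
        std::unique_lock<std::mutex> lock(mutex_);
        return cv_.wait_for(lock, limit, [this] { return pending_ == 0; });
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    int pending_ = 0;
};

// Waits for pending work for at most `limit`, then always calls `stop`.
// In the server from this thread, `stop` would wrap Ice::Service::shutdown().
// Returns whether the work drained before the deadline.
bool shutdownWithDeadline(DrainGate& gate,
                          std::chrono::milliseconds limit,
                          const std::function<void()>& stop) {
    bool drained = gate.waitFor(limit);
    stop();
    return drained;
}
```

The limit would need to be shorter than the deactivation timeout (60s by default, per the earlier reply) so that the process exits on its own before the node sends SIGKILL.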

    project: the servers behind mysee.com. We want to implement an administration system based on Ice.

    restart:

    The following is part of the traces:

    [ icegridnode: Server: changed server `CN_60.190.167.139' state to `Active' ]
    [ icegridnode: Network: shutting down tcp connection for writing
    local address = 60.190.167.145:36575
    remote address = 211.160.17.100:33347 ]
    [ icegridnode: Network: closing tcp connection
    local address = 60.190.167.145:36575
    remote address = 211.160.17.100:33347 ]
    [ icegridnode: Network: shutting down tcp connection for writing
    local address = 60.190.167.145:36575
    remote address = 211.160.17.100:33346 ]
    [ icegridnode: Network: closing tcp connection
    local address = 60.190.167.145:36575
    remote address = 211.160.17.100:33346 ]
    [ icegridnode: Network: accepted tcp connection
    local address = 60.190.167.139:36574
    remote address = 211.160.17.100:32940 ]
    [ icegridnode: Server: changed server `CN_60.190.167.139' state to `Deactivating' ]
    [ icegridnode: Activator: deactivating `CN_60.190.167.139' using process proxy ]
    [ icegridnode: Network: trying to establish tcp connection to 60.190.167.139:20000 ]
    [ icegridnode: Network: tcp connection established
    local address = 60.190.167.139:32967
    remote address = 60.190.167.139:20000 ]
    [ icegridnode: Network: shutting down tcp connection for writing
    local address = 60.190.167.139:32967
    remote address = 60.190.167.139:20000 ]
    [ icegridnode: Network: closing tcp connection
    local address = 60.190.167.139:32967
    remote address = 60.190.167.139:20000 ]
    [ icegridnode: Activator: sent SIGKILL to server `CN_60.190.167.139' (pid = 19180) ]
    [ icegridnode: Activator: detected termination of server `CN_60.190.167.139'
    signal = SIGKILL ]
    [ icegridnode: Server: changed server `CN_60.190.167.139' state to `Inactive' ]
    [ icegridnode: Network: accepted tcp connection
    local address = 60.190.167.139:36574
    remote address = 211.160.17.100:52502 ]
    [ icegridnode: Server: changed server `CN_60.190.167.139' state to `Activating' ]
    [ icegridnode: Activator: activating server `CN_60.190.167.139' ]
    [ icegridnode: Activator: activated server `CN_60.190.167.139' (pid = 19216) ]
    [ icegridnode: Server: changed server `CN_60.190.167.139' state to `WaitForActivation' ]
    Startup.
    CodePage:C.
    [ icegridnode: Server: changed server `CN_60.190.167.139' state to `Active' ]
    [ icegridnode: Network: shutting down tcp connection for writing
    local address = 60.190.167.139:36574
    remote address = 211.160.17.100:52502 ]
    [ icegridnode: Network: closing tcp connection
    local address = 60.190.167.139:36574
    remote address = 211.160.17.100:52502 ]
    [ icegridnode: Network: shutting down tcp connection for writing
    local address = 60.190.167.139:36574
    remote address = 211.160.17.100:32940 ]
    [ icegridnode: Network: closing tcp connection
    local address = 60.190.167.139:36574
    remote address = 211.160.17.100:32940 ]
    [ icegridnode: Network: accepted tcp connection
    local address = 60.190.167.139:36574
    remote address = 211.160.17.100:36856 ]
    [ icegridnode: Network: shutting down tcp connection for writing
    local address = 60.190.167.139:36574
    remote address = 211.160.17.100:36856 ]
    [ icegridnode: Network: closing tcp connection
    local address = 60.190.167.139:36574
    remote address = 211.160.17.100:36856 ]
    [ icegridnode: Network: accepted tcp connection
    local address = 60.190.167.139:36574
    remote address = 211.160.17.100:39709 ]
    [ icegridnode: Server: changed server `CN_60.190.167.139' state to `Deactivating' ]
    [ icegridnode: Activator: deactivating `CN_60.190.167.139' using process proxy ]
    [ icegridnode: Network: trying to establish tcp connection to 60.190.167.145:20000 ]
    [ icegridnode: Network: tcp connection established
    local address = 60.190.167.145:60307
    remote address = 60.190.167.145:20000 ]
    [ icegridnode: Network: shutting down tcp connection for writing
    local address = 60.190.167.145:60307
    remote address = 60.190.167.145:20000 ]
    [ icegridnode: Network: closing tcp connection
    local address = 60.190.167.145:60307
    remote address = 60.190.167.145:20000 ]
    [ icegridnode: Activator: sent SIGKILL to server `CN_60.190.167.139' (pid = 19216) ]
    [ icegridnode: Activator: detected termination of server `CN_60.190.167.139'
    signal = SIGKILL ]
    [ icegridnode: Server: changed server `CN_60.190.167.139' state to `Inactive' ]
    [ icegridnode: Network: accepted tcp connection
    local address = 60.190.167.139:36574
    remote address = 211.160.17.100:45943 ]
    [ icegridnode: Server: changed server `CN_60.190.167.139' state to `Activating' ]
    [ icegridnode: Activator: activating server `CN_60.190.167.139' ]
    [ icegridnode: Activator: activated server `CN_60.190.167.139' (pid = 19343) ]
    [ icegridnode: Server: changed server `CN_60.190.167.139' state to `WaitForActivation' ]
    Startup.
    CodePage:C.
    [ icegridnode: Server: changed server `CN_60.190.167.139' state to `Active' ]
    [ icegridnode: Network: shutting down tcp connection for writing
    local address = 60.190.167.139:36574
    remote address = 211.160.17.100:45943 ]
    [ icegridnode: Network: closing tcp connection
    local address = 60.190.167.139:36574
    remote address = 211.160.17.100:45943 ]
  • benoit
    benoit Rennes, France
    Hi,

    Could you please set the name and information about your project in your signature? (Check thread 1697 for information on how to edit your signature.)

    Sorry, I'm afraid I still don't really understand what is expected and what is not :). Can you clarify the following points?
    • Is the shutdown of the server expected?
    • If it's expected, what's wrong with the deactivation of the server by the node?

    From the traces, it looks like the server is being shut down explicitly (either with the icegridadmin tool's stop command or with the IceGrid::Admin interface) and, since the server shutdown takes too long, the IceGrid node eventually kills it.

    Cheers,
    Benoit.
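
To make this reading of the traces easier to check on a longer log, here is a small sketch that extracts server state changes and counts forced kills. It assumes the line format shown in the traces posted above; Transition, parseTransitions and countKills are hypothetical names, not part of IceGrid.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One "changed server `ID' state to `STATE'" event from the node trace.
struct Transition {
    std::string server;
    std::string state;
};

// Extracts state-change events from node trace lines.
// Lines that do not match the assumed format are skipped.
std::vector<Transition> parseTransitions(const std::vector<std::string>& lines) {
    const std::string serverTag = "changed server `";
    const std::string stateTag = "' state to `";
    std::vector<Transition> result;
    for (const std::string& line : lines) {
        std::size_t idStart = line.find(serverTag);
        if (idStart == std::string::npos) continue;
        idStart += serverTag.size();
        std::size_t idEnd = line.find(stateTag, idStart);
        if (idEnd == std::string::npos) continue;
        std::size_t stateStart = idEnd + stateTag.size();
        std::size_t stateEnd = line.find('\'', stateStart);
        if (stateEnd == std::string::npos) continue;
        result.push_back({line.substr(idStart, idEnd - idStart),
                          line.substr(stateStart, stateEnd - stateStart)});
    }
    return result;
}

// Counts how many times the node reported sending SIGKILL to `server`.
int countKills(const std::vector<std::string>& lines, const std::string& server) {
    const std::string needle = "sent SIGKILL to server `" + server + "'";
    int count = 0;
    for (const std::string& line : lines) {
        if (line.find(needle) != std::string::npos) ++count;
    }
    return count;
}
```

A `Deactivating` transition followed by a SIGKILL line for the same server is the pattern described in the reply above: a deactivation that ran past its timeout.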
  • I did some experiments. I changed demoj/icegrid/simple a little so that the servers run on my machine but need to register with another machine.

    While the network is OK, everything is fine. I disconnected the network and then reconnected it; it was still OK, with no restart. But if I disconnected the network and used the "t" command of the Client, the Client program of course threw some exceptions. Then, after reconnecting, I sent "t" again and icegridnode restarted the server.

    Actually, in my project, the servers have their own duties, and Ice only plays an administrative role. We don't want IceGrid to restart them, which may break some services. Do I make myself clear?
  • benoit
    benoit Rennes, France
    Hi,

    I was able to reproduce the problem with the simple demo and the steps from your post! I know what the problem is: servers are indeed restarted under certain circumstances when the registry is available again. This issue has already been fixed on our mainline, so the fix will be available in the upcoming Ice 3.1 release.

    Cheers,
    Benoit.
  • Can you give me some suggestions to solve this problem? Or maybe you can send me the 3.1 version early? This problem blocks the project and may kill it.
  • bernard
    bernard Jupiter, FL
    If you can't wait for 3.1.0, I recommend subscribing to our priority support:
    http://www.zeroc.com/support.html

    In this case, we could prepare a 3.0.1 patch for you.

    Best regards,
    Bernard