Inadvertent icegridnode shutdown

alverson (Dennis Alverson), Member. Organization: Braxton Technologies. Project: Orbital Analysis
I am seeing some strange behavior when my IceGrid application runs for a long time. I have 3 IceGrid nodes running, with various servers running inside those nodes. At some point, all processes (icegridnode.exe and icebox.exe) go away. When I look at the stderr files for each of the IceGrid nodes, it appears each one is shutting down each of the servers started on that node, effectively killing the entire application. It's not that one of the servers (e.g. icebox.exe) is crashing, because then only that process would go away.

Under what conditions, other than explicitly telling each node to shut down, would all nodes be instructed to shut down? Any clues would help. I'm at a loss, since I do not control how the nodes operate :(

Comments

  • alverson (Dennis Alverson), Member. Organization: Braxton Technologies. Project: Orbital Analysis
    One of those nodes hosts HA IceStorm, so it has 3 icebox processes running.
  • benoit (Benoit Foucher), Rennes, France. Administrators, ZeroC Staff. Organization: ZeroC, Inc. Project: Ice
    Hi,

    On which operating system is the node running? Did you check the system log for clues? Other than being explicitly shut down with one of the IceGrid administrative utilities, the node can also shut down if it is terminated with a SIGTERM signal. If you enable protocol tracing with --Ice.Trace.Protocol, you'll see all the requests sent to the node by the registry. If the node is being shut down remotely by an administrative tool, you should see a shutdown request. If you don't see one, this would indicate that the node caught a SIGTERM signal.

    Cheers,
    Benoit.
  • alverson (Dennis Alverson), Member. Organization: Braxton Technologies. Project: Orbital Analysis
    They are all running on Windows. I have enabled most of the tracing, but I'll check the property you identified. Which log file should I look at? I have separate stdout and stderr logs for the nodes and for each server.
  • alverson (Dennis Alverson), Member. Organization: Braxton Technologies. Project: Orbital Analysis
    Do you have a suggestion for the quickest way to find out who is shutting down the nodes? Are you able to determine where the client proxy call occurred inside the NodeI::shutdown method?
  • alverson (Dennis Alverson), Member. Organization: Braxton Technologies. Project: Orbital Analysis
    I am not explicitly sending a SIGTERM or using the IceGrid admin services to shut down the nodes. They are all just suddenly shutting down, out of the blue. I'm having a lot of trouble figuring out how to trap this so I can find out who is doing it. Any suggestions would be much appreciated.
  • benoit (Benoit Foucher), Rennes, France. Administrators, ZeroC Staff. Organization: ZeroC, Inc. Project: Ice
    Hi,

    Can you tell us a little more about your environment? Which Ice and Windows versions are you using? Are you running the IceGrid node as a Windows service? If so, under which account, and how did you install the service? It sounds as if the node is being shut down randomly by the system. Is this something that only started recently, or has it been happening ever since you deployed the IceGrid nodes?

    Regarding the protocol tracing on the node, you should look at the icegridnode log file. If you run the node as a Windows service, the logging probably goes to the Windows event log instead. Did you check the system logs with the Event Viewer to see if there was an event that could explain the service shutdown?

    The protocol tracing will tell us whether or not the node receives a shutdown request from another Ice process, but not which process sent it (unless you also set Ice.Trace.Network=3 to see who sent the network packets). In other words, it tells us whether the shutdown is initiated remotely or by the system.

    Cheers,
    Benoit.
  • alverson (Dennis Alverson), Member. Organization: Braxton Technologies. Project: Orbital Analysis
    I am running Ice 3.6 on Windows 7. I start each of my nodes from the command line and write a separate log file for each node. I've turned on protocol tracing as you suggested and can see that the shutdown was sent, but I can't tell from where. The problem is very intermittent. I'm trying to figure out how to trap whoever is sending the shutdown command, and I thought you might have some insight into how to pinpoint exactly where it comes from when it happens. I've now reached the point of building Ice from source and trying to break in the NodeI::shutdown method.

    I've attached the logs from two of the nodes for a run where it occurred, in case you can glean something quickly. I set Ice.Trace.Network=3 on this run. I'm in the middle of several high-visibility demos while trying to track down this issue, so I'm a little scattered at the moment.

    Thanks very much for your assistance.
  • benoit (Benoit Foucher), Rennes, France. Administrators, ZeroC Staff. Organization: ZeroC, Inc. Project: Ice
    From the traces, I believe the shutdown request is being sent to a server as a result of the IceGrid node deactivation; the node itself isn't receiving a shutdown request from another process.

    Rather than breaking in NodeI::shutdown, you should set the breakpoint in Activator::shutdown (IceGrid/Activator.cpp). The stack at the breakpoint will tell us what is causing the deactivation of the node. If the shutdown request isn't received over the wire, the only other possibility is that the shutdown comes from the Ctrl-C handler.

    Does the node spawn processes that might generate a Ctrl-C on the console?

    Cheers,
    Benoit.
  • alverson (Dennis Alverson), Member. Organization: Braxton Technologies. Project: Orbital Analysis
    I'm not sending Ctrl-C from the command prompt window, because the nodes run in the background: I launch them from a command prompt using a Python script, without giving each process its own command window. Note that in production I will launch them as Windows services, but I'm not doing that yet. I'm currently adding the following code at the top of each method of interest to capture a stack trace of whoever is calling it:

        // Needs #include <fstream>; replace "method name here" with the enclosing method's name.
        try {
            throw ::Ice::UnknownException(__FILE__, __LINE__, "method name here");
        } catch (const ::Ice::UnknownException& ex) {
            // Append a marker and the captured stack trace; the file closes when outfile goes out of scope.
            std::ofstream outfile("c:\\temp\\ice\\icedebug.txt", std::ios::out | std::ios::app);
            outfile << "************** " << ex.unknown << std::endl;
            outfile << "stack trace = " << std::endl << ex.ice_stackTrace() << std::endl;
        }

    I have added this code to the following methods:

    Activator::shutdown
    Activator::sendSignal
    NodeI::shutdown

    Where does the IceGrid node deactivation get initiated from?
  • benoit (Benoit Foucher), Rennes, France. Administrators, ZeroC Staff. Organization: ZeroC, Inc. Project: Ice
    The call to Activator::shutdown initiates the shutdown of the node. It can be called from NodeI::shutdown (NodeI.cpp, shutdown initiated by the IceGrid GUI), ProcessI::shutdown (IceGridNode.cpp, shutdown initiated through the Ice.Admin facility), or NodeService::shutdown (IceGridNode.cpp, shutdown initiated by the Ctrl-C handler).

    Since we don't see any shutdown invocation being received in the node tracing, the shutdown is likely initiated by the Ctrl-C handler (IceUtil/CtrlHandler.cpp). On Windows, we use SetConsoleCtrlHandler for the Ctrl-C handler implementation.

    Have you checked whether the same problem occurs when you launch the IceGrid nodes directly from a command prompt?

    Cheers,
    Benoit.
  • alverson (Dennis Alverson), Member. Organization: Braxton Technologies. Project: Orbital Analysis
    I will launch each IceGrid node in its own command window and see whether the problem still occurs. I will also trigger the Ctrl-C handler on purpose, just to get a feel for that path.

    I'll keep you posted if I can find anything else useful out. Thanks for your help thus far. Hopefully, I can trap it to determine exactly how it is being initiated.
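Benoit's first diagnostic, checking whether a node's protocol trace shows an incoming shutdown request, can be automated once the logs grow large. The following is a minimal C++ sketch, not part of Ice, and the function name countReceivedRequests is hypothetical. It assumes each protocol trace entry begins with a line containing "Protocol:" and the direction ("received request" or "sending request"), followed by indented lines such as "operation = shutdown"; check that against a real icegridnode log before relying on it. Separating received from sent entries matters here because, as Benoit points out, a deactivating node itself sends shutdown requests to its servers.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Counts incoming requests for `operation` in an Ice protocol trace.
// ASSUMED log shape (verify against a real icegridnode log):
//   -- <timestamp> icegridnode: Protocol: received request
//      ...
//      operation = shutdown
int countReceivedRequests(std::istream& log, const std::string& operation)
{
    const std::string key = "operation = ";
    int count = 0;
    bool inReceivedRequest = false;
    std::string line;
    while (std::getline(log, line))
    {
        if (line.find("Protocol:") != std::string::npos)
        {
            // A new trace entry starts; only incoming requests are counted,
            // since a deactivating node *sends* shutdown requests to its servers.
            inReceivedRequest = line.find("received request") != std::string::npos;
        }
        else if (inReceivedRequest)
        {
            std::string::size_type pos = line.find(key);
            if (pos != std::string::npos)
            {
                std::string op = line.substr(pos + key.size());
                op.erase(op.find_last_not_of(" \t\r") + 1); // trim trailing blanks/CR
                if (op == operation)
                {
                    ++count;
                }
                inReceivedRequest = false; // one operation line per request
            }
        }
    }
    return count;
}
```

For example, wrap a log file in a std::ifstream (or a string in a std::istringstream) and call countReceivedRequests(log, "shutdown"). A count of zero would point away from a remote administrative request and toward a local trigger such as a signal or the Ctrl-C handler, which is the reasoning Benoit applies in the thread.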
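Benoit also suggests setting Ice.Trace.Network=3 to see who sent the network packets. A hypothetical helper in the same spirit, remoteHosts, collects the distinct peers from such a trace. It assumes connection traces contain a line of the form "remote address = <host>:<port>"; the exact wording may differ between Ice versions, so treat this as a sketch to adapt rather than a parser for a documented format.

```cpp
#include <cassert>
#include <istream>
#include <set>
#include <sstream>
#include <string>

// Collects the distinct remote hosts seen in an Ice network trace.
// ASSUMPTION: each connection trace has a line "remote address = <host>:<port>".
// Only the host part is kept, so repeated connections from one machine collapse.
std::set<std::string> remoteHosts(std::istream& log)
{
    const std::string key = "remote address = ";
    std::set<std::string> hosts;
    std::string line;
    while (std::getline(log, line))
    {
        std::string::size_type pos = line.find(key);
        if (pos == std::string::npos)
        {
            continue;
        }
        std::string address = line.substr(pos + key.size());
        address.erase(address.find_last_not_of(" \t\r") + 1); // trim trailing blanks/CR
        // The port is assumed to follow the last ':'.
        std::string::size_type colon = address.rfind(':');
        hosts.insert(colon == std::string::npos ? address : address.substr(0, colon));
    }
    return hosts;
}
```

Any host in the result that is not a registry, an admin tool, or one of the nodes would be worth a closer look.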
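If no shutdown request shows up on the wire, Benoit's remaining explanation is a signal or the Ctrl-C handler. On POSIX systems, one portable way to learn which signal triggered a shutdown is to install a handler that only records the signal number, then log that number from the normal shutdown path. This is a generic std::signal sketch, not IceUtil's Ctrl-C handler, and installShutdownSignalRecorder and lastShutdownSignal are hypothetical names. On Windows, Ice's handler is built on SetConsoleCtrlHandler, and the Microsoft C runtime documents that SIGTERM is not generated by Windows itself, so console control events are the more relevant path there.

```cpp
#include <cassert>
#include <csignal>

// Last signal number seen by the handler (0 = none). Writing a volatile
// std::sig_atomic_t is the only work the handler does, which keeps it
// async-signal-safe.
static volatile std::sig_atomic_t lastSignal = 0;

extern "C" void recordSignal(int sig)
{
    lastSignal = sig;
}

// Installs recordSignal for SIGINT (Ctrl-C) and SIGTERM.
// Returns false if either std::signal call fails.
bool installShutdownSignalRecorder()
{
    return std::signal(SIGINT, recordSignal) != SIG_ERR
        && std::signal(SIGTERM, recordSignal) != SIG_ERR;
}

// Meant to be read from the normal shutdown path and written to the log.
int lastShutdownSignal()
{
    return lastSignal;
}
```

Note that the handler replaces the default action, so the process no longer exits on these signals by itself; a real program would poll lastShutdownSignal() (or be woken by other means) and then start its own orderly shutdown, logging the recorded number first.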
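Benoit lists three callers of Activator::shutdown: NodeI::shutdown, ProcessI::shutdown and NodeService::shutdown. In the spirit of the logging Dennis is already adding to a source build, one way to instrument those callers is to record which one fired first and, for the Ctrl-C path on Windows, which console control event arrived. Everything below is hypothetical: ShutdownOrigin, recordShutdownOrigin and consoleCtrlEventName are illustrative names, not Ice APIs. The numeric values mirror the CTRL_*_EVENT constants from <wincon.h> so the sketch compiles on any platform; real Windows code should use the macros themselves.

```cpp
#include <atomic>
#include <cassert>
#include <string>

// Hypothetical labels for the three callers of Activator::shutdown.
enum class ShutdownOrigin { None, NodeShutdown, ProcessShutdown, CtrlHandler };

static std::atomic<ShutdownOrigin> firstOrigin{ShutdownOrigin::None};

// Records `origin` only if nothing has been recorded yet, so a second
// trigger arriving during the shutdown cannot overwrite the first one.
// Returns true if this call's origin was stored.
bool recordShutdownOrigin(ShutdownOrigin origin)
{
    ShutdownOrigin expected = ShutdownOrigin::None;
    return firstOrigin.compare_exchange_strong(expected, origin);
}

ShutdownOrigin shutdownOrigin()
{
    return firstOrigin.load();
}

// Maps the control type passed to a SetConsoleCtrlHandler routine to a name.
// Numeric values mirror CTRL_*_EVENT from <wincon.h>.
std::string consoleCtrlEventName(unsigned long ctrlType)
{
    switch (ctrlType)
    {
    case 0: return "CTRL_C_EVENT";
    case 1: return "CTRL_BREAK_EVENT";
    case 2: return "CTRL_CLOSE_EVENT";
    case 5: return "CTRL_LOGOFF_EVENT";
    case 6: return "CTRL_SHUTDOWN_EVENT";
    default: return "unknown (" + std::to_string(ctrlType) + ")";
    }
}
```

Distinguishing CTRL_CLOSE_EVENT, CTRL_LOGOFF_EVENT and CTRL_SHUTDOWN_EVENT from CTRL_C_EVENT or CTRL_BREAK_EVENT would show whether the nodes are reacting to a keystroke, to a console window being closed, or to a logoff or system shutdown.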