IceGrid fails to destroy process

Hi,

Configuration: Ice 3.3.1, Windows XP SP3, language C#

I am using IceGrid.Admin to start and stop servers programmatically. On rare occasions I have seen IceGrid report a server as inactive while the process is still visible in the OS Task Manager. Attaching the VS2008 debugger, I observed a number of Ice-related threads still in existence. With Activator tracing enabled, the messages indicated that IceGrid had detected a terminated process, which perhaps explains why the process was not forcibly killed after the 2 sec deactivation timeout.

Is it possible for this scenario to occur, e.g. because of a failure to destroy the communicator?
Does IceGrid always check the process table to determine the state of a server process?

It might be worth mentioning that the process calls shutdown on the communicator inside a servant method called PowerOff and then proceeds to abort other local threads. The main thread, which is blocked in communicator.waitForShutdown(), unblocks and calls communicator.destroy(). At that point PowerOff() may still be active, shutting down its local threads.

Is it correct that Destroy should only be called when all Ice invocations are complete?

Perhaps I should remove the communicator.shutdown() call from PowerOff() and let the default Process facet call shutdown?

Cheers John

Comments

  • bernard
    bernard Jupiter, FL
    Hi John,

    If this server is started by IceGrid (an IceGrid node), the IceGrid node will monitor the process and detect its termination when the process is gone; as a result, what you describe should never occur, unless the IceGrid node itself has died/been killed.

    Then, regarding destroy, you don't have to wait until all invocations have completed to call destroy. See "Why does my program hang when I destroy a communicator?"

    Best regards,
    Bernard
  • I can reliably recreate the problem

    Hi,

    I have modified my app to reliably fail.

    The code below is a brief outline of my server and client apps.
    If I remove the Ice shutdown call from the servant, IceGrid kills the server process after the specified deactivation timeout (2 sec). If I leave the call in, IceGrid fails to destroy the process.

    With the shutdown call included, IceGrid reports the server state as Deactivating before I ask IceGrid to stop the server. However, the admin stopServer() method then fails with a NodeUnreachableException.

    Without the shutdown call included, IceGrid reports the server state to be Active prior to commanding IceGrid to stop the server.

    // Server main thread
    
    static void Main()
    {
       // create and activate communicator
       // create power control servant
       // activate adapter
    
       // Blocks until communicator.shutdown() is called
       communicator.waitForShutdown();
       communicator.destroy();
    
       // Never returns: keeps the process alive after the communicator is destroyed
       while (true)
       {
          Thread.Sleep(1000);
          Console.WriteLine("Tick...");
       }
    
    }
    
    // Servant
    
    class PowerControlServant : Slice.PowerControl.PowerDisp__
    {
       public override void PowerOff(Ice.Current current__)
       {
           communicator.shutdown();
       }
    
    }
    
    // Client app
    
    main()
    {
       // Start all servers using IceGrid Admin session
       // Create proxy to PowerControlServant
       // Issue PowerOff command on proxy
       // Stop all servers using IceGrid Admin session
    }
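    
    For anyone trying to reproduce this, the client outline above could look roughly like the following in C#. This is only a sketch under assumptions: the server id, the credentials, and the "IceGrid/Registry" identity (which assumes the default IceGrid instance name) are placeholders, Ice.Default.Locator is assumed to be set in the client configuration, and the PowerOff call is left as a comment because it depends on the application's own Slice definitions.

    ```csharp
    using System;

    public class AdminClient
    {
        public static int Main(string[] args)
        {
            // Hypothetical server id; use the id from your own deployment descriptor.
            const string serverId = "PowerControlServer";
            Ice.Communicator communicator = null;
            try
            {
                // Assumes Ice.Default.Locator is set in the client's configuration.
                communicator = Ice.Util.initialize(ref args);

                // "IceGrid/Registry" assumes the default IceGrid instance name.
                IceGrid.RegistryPrx registry = IceGrid.RegistryPrxHelper.checkedCast(
                    communicator.stringToProxy("IceGrid/Registry"));

                // Placeholder credentials.
                IceGrid.AdminSessionPrx session = registry.createAdminSession("user", "password");
                IceGrid.AdminPrx admin = session.getAdmin();

                admin.startServer(serverId);

                // ... create the PowerControl proxy and invoke PowerOff() here ...

                Console.WriteLine("State before stop: " + admin.getServerState(serverId));
                try
                {
                    admin.stopServer(serverId);
                }
                catch (IceGrid.NodeUnreachableException ex)
                {
                    // The failure reported in this thread surfaces here.
                    Console.Error.WriteLine("Node " + ex.name + " unreachable: " + ex.reason);
                }
                Console.WriteLine("State after stop: " + admin.getServerState(serverId));

                session.destroy();
                return 0;
            }
            catch (Exception ex)
            {
                Console.Error.WriteLine(ex);
                return 1;
            }
            finally
            {
                if (communicator != null)
                {
                    communicator.destroy();
                }
            }
        }
    }
    ```

    Printing the server state before and after the stop call makes it easy to compare the Active and Deactivating cases described above.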
    
  • bernard
    bernard Jupiter, FL
    Hi John,

    The overall communications here are:

    <your app> -- IceGrid registry -- IceGrid node -- the server

    and NodeUnreachableException means the IceGrid registry can't reach the IceGrid node.

    Can you check your IceGrid node? Is it still running and responding? Does it log any error message?

    If you're currently running this IceGrid node as a Windows service, you may want to temporarily switch to a command-line IceGrid node as this would be easier to troubleshoot.

    Thanks,
    Bernard
  • hi,

    My config is indeed <client> -- <registry> -- <node> -- <server>

    I am running the registry and the node as console applications.

    After the admin call to stop the server has thrown an exception, I manually kill the server process using Task Manager. When the process is killed, the Node.Activator trace output indicates that the node has detected termination of the server, and the Node.Server trace output indicates that the server state has changed to inactive.

    It doesn't appear that Node has died as I can repeat commands on the client to start the server and call servant methods. I also have my own equivalent of IceGridAdmin HMI that displays active Nodes. If IceGrid Node had died it would have shown up on my utility.

    Cheers JH
  • benoit
    benoit Rennes, France
    Hi John,

    Calling shutdown on the communicator from a servant invocation is fine; it should cause the waitForShutdown call on the communicator to return. Calling destroy on the communicator will interrupt any pending two-way calls: they will return and throw an Ice::CommunicatorDestroyedException.

    When stopping a server with the IceGrid::Admin::stopServer() method, IceGrid tries to shut down the server communicator and then waits for the deactivation timeout configured for the server (60s by default). If the server has not terminated after the deactivation timeout, IceGrid kills it.
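
    For reference, the deactivation timeout is set per server in the IceGrid deployment descriptor. The fragment below is a minimal sketch rather than anyone's actual configuration: the application, node, server, and adapter names and the executable are hypothetical, and the 2 second value simply matches the timeout mentioned earlier in this thread.

    ```xml
    <icegrid>
      <application name="PowerControlApp">
        <node name="node1">
          <!-- deactivation-timeout is in seconds; when omitted, the node
               falls back to IceGrid.Node.WaitTime (60 by default) -->
          <server id="PowerControlServer" exe="PowerControlServer.exe"
                  activation="manual" deactivation-timeout="2">
            <adapter name="PowerControlAdapter" endpoints="tcp"/>
          </server>
        </node>
      </application>
    </icegrid>
    ```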

    From your last email it sounds like IceGrid is correctly detecting the death of the process when you kill it with the task manager. The NodeUnreachableException shouldn't occur however. It looks like the stop call on the node is timing out. Could you tell us a bit more about the configuration of your server and the IceGrid registry/node? Or perhaps you could try reproducing the problem using one of our IceGrid demos?

    Cheers,
    Benoit.
  • Hi,

    I have attached a modified version of the IceGrid simple demo. The README file contains instructions to reveal the problem.

    Cheers John
  • benoit
    benoit Rennes, France
    Hi John,

    Thanks for the test case. I was able to reproduce the problem and it turns out to be a bug in IceGrid. If the server is not deactivated by IceGrid (but is deactivated as a result of shutting down the communicator through means other than IceGrid), the deactivation timeout is ignored, and if the server hangs on shutdown it doesn't get killed by the node. I will prepare a patch that fixes this problem: IceGrid will start the deactivation timeout as soon as it detects that the server is being deactivated, and will kill the server once that timeout expires.

    Cheers,
    Benoit.
  • Hi,

    The workaround for my application is quite simple, i.e. I will not explicitly call communicator.shutdown() inside a servant method invocation. Rather, I will let the Process facet do the work.

    FYI, the shutdown() call inside a servant method was a holdover from when automatic activation of servers was used. All of the servers in my application are now started manually via IceGrid admin, so a separate shutdown() call is redundant.

    Cheers John