Adapter activate hangs

Hi,

I'm running an Ice service inside an IceBox server that is managed by IceGrid. It's the only service in the IceBox. The activation mode is set to on-demand, so when the first call is made on a proxy for the service, I would expect the IceBox to be started and the call then serviced. That usually happens, but sometimes it doesn't. It fails most often when IceGrid is started just after a reboot.

Sometimes the activation of the IceBox times out. I pinned it down to the adapter->activate() call, which sometimes hangs. What are possible causes for the adapter's activate call to hang? The fact that it doesn't happen every time seems to indicate a timing issue, but I can't figure out what it is.

I'm running Ice 3.4.1 on Windows 7. The service is a C++ IceBox service, built using VS 2010.

Thanks
Budyanto

Comments

  • mes
    mes California
    Hi Budyanto,

    When a server or service is configured with a locator (as yours is), activating an object adapter internally causes Ice to attempt to register the adapter's endpoints with the locator registry. Consequently, the call to adapter->activate() does not return until this internal synchronous communication with the registry completes or fails.

    You can confirm that your service is blocked in this step by enabling the following properties in your service's configuration:
    • Ice.Trace.Network
    • Ice.Trace.Protocol
    • Ice.Trace.Locator
    Feel free to post the log output here and we'll be happy to take a look at it.
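
    As a rough sketch, the service's configuration could enable these properties with values such as the following. The levels shown are assumptions chosen for fairly verbose output; check the property reference for your Ice version before relying on them.

    ```
    # Trace connection establishment and closure in detail
    Ice.Trace.Network=2
    # Trace protocol messages
    Ice.Trace.Protocol=1
    # Trace locator and locator registry requests
    Ice.Trace.Locator=2
    ```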

    Now, the next question is: assuming your service is blocked here, what could cause it to hang?

    We can assume that the IceGrid registry is already active; if it wasn't active, the internal communication would fail with an error such as ConnectionRefusedException. Furthermore, the only way that a service could be activated on demand is if the registry had received an inquiry and instructed the node to start it.

    A possible reason for a lengthy delay is if there is a network issue in the locator configuration, such as an invalid hostname or a misconfigured DNS. You should review the value of Ice.Default.Locator that the node configures for the service to see if it contains a reasonable hostname or IP address. The log output from Ice.Trace.Network will also show the address that Ice is attempting to use while contacting the registry. If the host on which your service runs has multiple network adapters, or the registry hostname resolves to multiple addresses, a network issue could explain the apparent randomness of the hangs.
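
    For illustration only, a locator setting typically looks like the line below. The instance name IceGrid, the host, and the port 4061 are assumed placeholder values, not taken from this deployment.

    ```
    # Hypothetical values: instance name "IceGrid", registry on the local host, port 4061
    Ice.Default.Locator=IceGrid/Locator:tcp -h 127.0.0.1 -p 4061
    ```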

    Best regards,
    Mark
  • Hi Mark,

    Thanks for your response.

    So I did turn on the trace for Network, Protocol, and Locator on the service.

    Unfortunately, when the activation fails, there's nothing in the stderr or stdout for that IceBox. This seems strange in itself, because there are calls to the communicator's logger and also direct std::cout calls before the call to adapter->activate(). Shouldn't we be able to see that output?

    Initially I had the node configured to redirect stderr to stdout, and then I turned that off. There is still nothing on the stderr or stdout of the IceBox.

    What led me to suspect the activate() call is that we don't see the hang when it is commented out. When I uncomment it, the hangs start again intermittently, mostly after a reboot.

    I also checked the Ice.Default.Locator value in the config folder of the server. It looks correct: it is the same as what's specified in the node configuration, and it uses the 127.0.0.1 address (which is what it's supposed to be).

    Thanks
  • I turned the traces on for the node.

    The log is attached.
    node.txt 180.2K
  • mes
    mes California
    Hi,

    I don't see anything unusual in the node output.

    I'm not sure why you're not seeing the log output for the IceBox service. I just experimented with our demo/IceGrid/icebox example. I modified the IceBox service configuration to enable the tracing properties and ran a collocated registry/node process on the command line. The output appeared in the command prompt window as expected.

    When a hang occurs, have you tried attaching to the IceBox process with a debugger? It might be helpful for us to see the stack traces of all threads in this process.

    Regards,
    Mark
  • No, I haven't tried attaching with a debugger yet. Maybe I'll try that next.

    Is there a way to configure thread pools for IceGrid? Could it be that when the service is trying to call activate(), IceGrid runs out of threads to process it?
  • mes
    mes California
    Is there a way to configure thread pools for IceGrid? Could it be that when the service is trying to call activate(), IceGrid runs out of threads to process it?
    That's a good question to ask, but I think it's unlikely in this case. The registry's default configuration for its object adapters allows them to grow beyond a single thread when required. It's usually not necessary to change the thread settings of the registry or node.
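
    If you did want to experiment anyway, thread pools are configured per object adapter through properties. The fragment below is only a sketch: it assumes the registry adapter name IceGrid.Registry.Server, and the sizes are arbitrary example values, not recommendations.

    ```
    # Assumed adapter name and example sizes; placed in the registry's configuration
    IceGrid.Registry.Server.ThreadPool.Size=2
    IceGrid.Registry.Server.ThreadPool.SizeMax=10
    ```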

    Regards,
    Mark
  • Since I'm not seeing the output from the logs, I'm starting to think it might not be the activate call. So, just to make sure, I repeated the experiment I did before and commented out the activate call in the start method again.

    This time around I saw that it still times out sometimes, even though the node output shows that the node tries to activate the service.

    What could cause the start method of the service to not get called sometimes?
  • mes
    mes California
    You're saying that the node has started icebox.exe, but the service's start method is not being called? In this case, we would really need to see the stack traces for all threads in icebox.exe to get a better idea of what's happening.

    Regards,
    Mark