Archived

This forum has been archived. Please start a new discussion on GitHub.

IceGrid: uncaught exception during process proxy registration

I have an IceGrid Node/Registry and a bunch of IceGrid Nodes.
Linux, Ice 3.3.0

After a few days of testing one of the Nodes reported that a server segfaulted on startup, apparently after throwing an uncaught exception:

Outgoing.cpp:422: Ice::ObjectNotExistException:
object does not exist:
identity: `IceGrid/LocatorRegistry-Master'
facet:
operation: setServerProcessProxy

The system is still in that state and if I try to run that (or any other) server manually, using the config file generated by IceGrid, I get the same exception about half the time.

If I comment out the Ice.Admin.ServerId property and run the server manually, then about half the time my server will catch the following exception while trying to call adapter.activate()

Outgoing.cpp:422: Ice::ObjectNotExistException:
object does not exist:
identity: `IceGrid/LocatorRegistry-Master'
facet:
operation: setAdapterDirectProxy

Otherwise, the Master node/registry appears to be fine, i.e. I can connect to it with icegridadmin.

I guess there are 2 questions:
1. Is there anything else I can do to debug the current registry state?
2. Can anything be done about the uncaught exception caused by the process proxy registration? Does it result from a particular call to the Ice public API or is it part of the ObjectAdapter initialization?

Thanks a lot,
Alex

Comments

  • Quick follow up.

    I've tried to catch exceptions around communicator()->createObjectAdapter() without success, the exception is coming from somewhere else.
    My guess is that it's thrown when creating the Admin adapter somewhere inside Ice::Application.

    With the Admin adapter disabled, I checked if adapter activation succeeds eventually. Apparently, it does not: 50% of the time it succeeds on the first try, but when it does not (throws ObjectNotExistException) it will never succeed no matter how many times it tries.
  • benoit
    benoit Rennes, France
    Hi Alex,

    I don't see how this could occur or why this happens only half of the time. It sounds like the server might be sometime sending the request to a bogus endpoint... Could you perhaps enable network and protocol tracing on the server and try to see if this exception only occurs for a specific endpoint/IP address? The server should be trying to connect to the endpoints defined by the IceGrid.Registry.Server.Endpoints property of your IceGrid registry when making these requests.

    The admin adapter is created on communicator initialization. You can delay its creating by setting Ice.Admin.DelayCreation=1. The admin adapter will then be created when you call explicitly Ice::Communicator::getAdmin().

    Could you also detail a little more your deployment, do you deploy multiple IceGrid registries for instance?

    Cheers,
    Benoit.
  • Hi Benoit,
    thanks for looking into this.

    You are right, it has something to do with alternative between 2 endopoints: one correct, one incorrect.

    There's one Regisry (collocated with a node) running on a host base (which resolves to 10.0.1.1).

    The Registry's config:
    IceGrid.Registry.Client.Endpoints=default -h localhost -p 12000:default -h base -p 12000
    IceGrid.Registry.Server.Endpoints=default -h localhost:default -h base
    IceGrid.Registry.Internal.Endpoints=default -h localhost:default -h base

    The other host has IP 10.0.1.103
    (The configuration of the second Node is not important, since I can reproduce the error by running the server manually).

    all servers look for the Registry here:
    Ice.Default.Locator=IceGrid/Locator: default -h base -p 12000

    Traces with Network=2, with the Admin interface enabled, when everything worked out:

    [ 05/23/09 18:18:34.182 myserver: Network: accepting tcp connections at 0.0.0.0:57819 ]
    [ 05/23/09 18:18:34.185 myserver: Network: tcp connection established
    local address = 10.0.1.103:32897
    remote address = 10.0.1.1:12000 ]
    [ 05/23/09 18:18:34.187 myserver: Network: tcp connection established
    local address = 127.0.0.1:53917
    remote address = 127.0.0.1:60034 ]
    [ 05/23/09 18:18:34.188 myserver: Network: stopping to accept tcp connections at 0.0.0.0:57819 ]
    [ 05/23/09 18:18:34.188 myserver: Network: closing tcp connection
    local address = 10.0.1.103:32897
    remote address = 10.0.1.1:12000 ]
    [ 05/23/09 18:18:34.188 myserver: Network: closing tcp connection
    local address = 127.0.0.1:53917

    Same thing, not working:

    [ 05/23/09 18:18:34.182 myserver: Network: accepting tcp connections at 0.0.0.0:57819 ]
    [ 05/23/09 18:18:34.185 myserver: Network: tcp connection established
    local address = 10.0.1.103:32897
    remote address = 10.0.1.1:12000 ]
    [ 05/23/09 18:18:34.187 myserver: Network: tcp connection established
    local address = 127.0.0.1:53917
    remote address = 127.0.0.1:60034 ]
    [ 05/23/09 18:18:34.188 myserver: Network: stopping to accept tcp connections at 0.0.0.0:57819 ]
    [ 05/23/09 18:18:34.188 myserver: Network: closing tcp connection
    local address = 10.0.1.103:32897
    remote address = 10.0.1.1:12000 ]
    [ 05/23/09 18:18:34.188 myserver: Network: closing tcp connection
    local address = 127.0.0.1:53917
    remote address = 127.0.0.1:60034 ]
    myserver: Outgoing.cpp:422: Ice::ObjectNotExistException:
    object does not exist:
    identity: `IceGrid/LocatorRegistry-Master'
    facet:
    operation: setServerProcessProxy


    127.0.0.1:60034 appears to be the 2nd endpoint of the Registry which of course is not reachable from the outside.

    Based on what you've said, setting IceGrid.Registry.Server.Endpoints to 'localhost' is not going to work for servers on the other host.

    I have no idea how it managed to work before, the setup has not changed in a while.

    Thanks,
    Alex
  • benoit
    benoit Rennes, France
    Hi Alex,

    It looks like you have copied the same traces twice. This only shows up now because most likely you have now a local server listening on the port of the registry server endpoints. This server accepts the connection but of course it can't find the locator registry object so you end-up with an Ice::ObjectNotExistException in your server. If no Ice servers were running on this port, the connection establishment would fail and always be re-tried on the accessible endpoint.

    Cheers,
    Benoit.
  • It looks like you have copied the same traces twice.

    Hmm. I think the traces are different: it works when it talks to the external endpoint:
    local address = 10.0.1.103:32897
    remote address = 10.0.1.1:12000 ]
    and fails when it tries to talk to the local one:
    local address = 127.0.0.1:53917
    remote address = 127.0.0.1:60034 ]
    This only shows up now because most likely you have now a local server listening on the port of the registry server endpoints. This server accepts the connection but of course it can't find the locator registry object so you end-up with an Ice::ObjectNotExistException in your server. If no Ice servers were running on this port, the connection establishment would fail and always be re-tried on the accessible endpoint.

    You were absolutely right. This was a very rare event: one of my own servers on the same host happened to have the same port (60034) as the one used by the Registry on its own host. The root cause is a misconfiguration of the Registry endpoints.

    Thanks for your help in tracking this down,
    Alex