Recovery after router failure

Hi,

I'm trying to figure out how a client should recover after a failure of the Glacier2 router. It looks like a router failure causes a client-side deadlock under some circumstances. Here is my setup:

0. Everything is Ice 3.3.0 under Linux. The client is Java.

1. The client sets up a communicator with a default router, then creates a session and a routed proxy to some remote object.

2. The client calls a method on that proxy to make sure it works.

3. For whatever reason, the router is restarted.

4. A separate thread on the client pings the session proxy once in a while to make sure the router is alive. When the router returns after the failure, that thread automatically creates a new session. It DOES NOT re-create any routed proxies that may exist on the client.

5. The client tries to use the old proxy again. That call blocks forever. If Ice.Trace.Retry is enabled on the client, I can see an endless stream of messages like this:

[ 2/26/09 13:54:05:035 Retry: retrying operation call to add proxy to router
Ice.ObjectNotExistException
id.name = "TimedCounter"
id.category = ""
facet = ""
operation = "ice_add_proxy" ]

I think I understand what's happening: the router doesn't have that TimedCounter in the routing table and refuses to route the call to it. My questions are:

1. Is it possible to reuse existing routed proxies on the client after a failure of the router?

2. If it's not possible, how can I avoid deadlocks when the client accidentally calls a method on such a disabled proxy?

3. In general, what would you suggest doing on the client to recover from a router failure?

Thanks,

~ Andrey

Comments

  • benoit (Rennes, France)
    Hi Andrey,

    This is a bug in the Ice runtime: the routing table cached by the client-side Ice runtime isn't cleared when the session is re-created (the communicator currently has no way to figure out when the session is destroyed or re-created).

    As a workaround, try registering an object adapter with the router in your client, and destroy this object adapter before re-creating the session. This will ensure that the cached routing table is cleared. For example:
    Glacier2::SessionPrx session = router->createSession("dummy", "dummy");
    Ice::ObjectAdapterPtr adapter = communicator()->createObjectAdapterWithRouter("RouterAdapter", router);
    ..
    try
    {
         router->destroySession();
    }
    catch(const Ice::ConnectionLostException&)
    {
    }
    adapter->destroy(); // This clears the routing table associated with the object adapter's router.
    // Now you should be able to re-create the session.
    

    We'll look into fixing this bug in an upcoming Ice release!

    Cheers,
    Benoit.
  • Hi Benoit,

    I'm sorry, it doesn't seem to be working. Even with an adapter, which I destroy and re-create (with a different name) after the router failure, any attempt to use proxies created with the old router causes an endless stream of messages:

    [ 4/15/09 14:55:01:100 Retry: retrying operation call to add proxy to router
    Ice.ObjectNotExistException
    id.name = "timedcounter"
    id.category = ""
    facet = ""
    operation = "ice_add_proxy" ]

    Here is a part of my code:
    Communicator comm = Util.initialize( INIT );
    
    ObjectPrx routerBase = comm.stringToProxy( ROUTER ).ice_timeout( TIMEOUT );
    RouterPrx routerPrx = RouterPrxHelper.uncheckedCast( routerBase );
    comm.setDefaultRouter( routerPrx );
    
    SessionPrx session = routerPrx.createSession( "foo", "bar" );
    
    ObjectAdapter adapter = comm.createObjectAdapterWithRouter( "adapter0", routerPrx );
    adapter.activate();
    
    ObjectPrx itemBase = comm.stringToProxy( "timedcounter @ AUX" );
    DataItemPrx item = DataItemPrxHelper.checkedCast( itemBase );
    
    // Router restart is detected
    
    try {
        session.destroy();
    } catch (Throwable ex) {
        ex.printStackTrace();
    }
    try {
        adapter.destroy();
    } catch (Throwable ex) {
        ex.printStackTrace();
    }
    
    session = routerPrx.createSession( "foo", "bar" );
    adapter = comm.createObjectAdapterWithRouter( "adapter1", routerPrx );
    adapter.activate();
    

    I also tried re-creating routerPrx and setting a new default router proxy on the communicator and on the item. Nothing worked.

    Any idea on how to clear the routing table?

    Thanks,

    ~ Andrey
  • benoit (Rennes, France)
    Hi Andrey,

    OK, yes, this won't work unless you also re-create the proxy, so that it is associated with the new (and empty) routing table. For now, the best approach is to destroy and re-create the communicator and proxies for each new Glacier2 session. We are looking into improving this for the next major Ice release.

    Cheers,
    Benoit.
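
The advice above (destroy and re-create the communicator and proxies for each new Glacier2 session) could be organized roughly as in the sketch below. It makes no Ice calls and is not taken from the thread: `SessionScope`, `get()`, `invalidate()`, `generation()`, and the resource type `R` are all hypothetical names. The assumption is that `R` would wrap the communicator, the session, and every routed proxy, and that its `close()` would destroy the communicator.

```java
import java.util.function.Supplier;

/**
 * Hypothetical holder for everything that belongs to one Glacier2 session.
 * Assumption: R bundles the communicator, the session and all routed proxies,
 * and R.close() destroys the communicator. None of this is an Ice API.
 */
final class SessionScope<R extends AutoCloseable> {
    // Builds a fresh bundle (communicator + session + proxies) on demand.
    private final Supplier<R> factory;
    private R current;
    private int generation;

    SessionScope(Supplier<R> factory) {
        this.factory = factory;
    }

    /** Returns the live bundle, building a new one if none exists. */
    synchronized R get() {
        if (current == null) {
            current = factory.get();
            generation++;
        }
        return current;
    }

    /** Drops the current bundle, e.g. when the ping thread sees a router restart. */
    synchronized void invalidate() {
        R old = current;
        current = null;
        if (old != null) {
            try {
                old.close();
            } catch (Exception ex) {
                // The router is probably gone already; nothing useful to do here.
            }
        }
    }

    /** Number of bundles built so far. */
    synchronized int generation() {
        return generation;
    }
}
```

With something like this, client code would fetch proxies through `get()` on every use instead of keeping them in long-lived fields, and the thread that pings the session would call `invalidate()` when it detects a restart. A proxy tied to the old routing table is then never reused.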