IceGrid Registry failing? ObjectNotExistException in QueryPrx.checkedCast

dstn · February 2009

Hi,

I am getting an intermittent problem in my IceGrid application. This is Ice 3.3.0, Python client and C++ server, for what it's worth.

The relevant client code looks like this:

	def find_all_workers(self):
		q = self.ice.stringToProxy('MyIceGrid/Query')
		q = IceGrid.QueryPrx.checkedCast(q)
		workers = q.findAllObjectsByType('::MyIce::Worker')

and I am getting an exception in the checkedCast() call:

ObjectNotExistException: exception ::Ice::ObjectNotExistException
{
    id =
    {
        name = Query
        category = MyIceGrid
    }
    facet =
    operation = ice_add_proxy
}

This seems to be a problem in the icegridregistry ... can anyone suggest what I can do to help diagnose this problem better? I'm not seeing any error messages in the registry or glacier2 logs. The icegridregistry is not dying.

I am using a Glacier2 router with its internal session manager. The client checks whether the session is live and re-opens it if it has expired. This doesn't seem to be related to whether or not this checkedCast exception occurs.

The client is multi-threaded, but I sometimes get this problem even if it has just one thread running.

Thanks for your help,

dustin.

benoit · February 2009

Hi,

Which Ice version do you use? Can you tell us a bit more about the configuration of your Glacier2 router and Python Ice client? Do you, for instance, set Glacier2.RoutingTable.MaxSize or disable retries in the client (with Ice.RetryIntervals=-1)? Is your client invoking on many different proxies?

Thanks,

Cheers,
Benoit.

dstn · February 2009

benoit wrote: »

Which Ice version do you use?

3.3.0

Glacier2 config:

Glacier2.InstanceName=Glacier2
Ice.Default.Locator=SolverIceGrid/Locator:tcp -h hydra.local -p 4061
Glacier2.Client.Endpoints=tcp -h hydra.local -p 9998
Glacier2.Server.Endpoints=tcp -h hydra.local -p 9999
Glacier2.SessionTimeout=60
Glacier2.Client.ForwardContext=1
Glacier2.Server.ForwardContext=1
Glacier2.PermissionsVerifier=Glacier2/NullPermissionsVerifier

Client config:

Ice.Default.Router=Glacier2/router:tcp -h localhost -p 4063
Ice.ACM.Client=0
Ice.RetryIntervals=-1
Ice.Default.Locator=SolverIceGrid/Locator:default -p 4061 -h hydra.local
Callback.Client.Endpoints=tcp
Callback.Router=Glacier2/router:tcp -h localhost -p 4063

I do have RetryIntervals=-1, but for a good reason: the grid computer I am using is on the other side of a firewall from my client, so I have to tunnel Glacier2 over an ssh connection. Thus the client is making connections to localhost, and these connections get tunneled over ssh, then on the other side ssh makes a localhost connection to Glacier2. How could such a connection fail?

Thanks for the hint, though, I will try setting RetryIntervals and see if it helps.

benoit wrote: »

Is your client invoking on many different proxies?

I have one icegridregistry, one Glacier2, one client, and 20 compute servers which provide about 5 different replicated services. The client will end up making requests to all 20 servers.

Thanks,
dustin.

dstn · March 2009

I set Ice.RetryIntervals in the client, but it just keeps failing:

[ 03/01/09 12:36:13.723 Retry: retrying operation call in 1000ms because of exception
  Outgoing.cpp:422: Ice::ObjectNotExistException:
  object does not exist:
  identity: `SolverIceGrid/Query'
  facet: 
  operation: ice_add_proxy ]
[ 03/01/09 12:36:14.741 Retry: retrying operation call in 2000ms because of exception
  Outgoing.cpp:422: Ice::ObjectNotExistException:
  object does not exist:
  identity: `SolverIceGrid/Query'
  facet: 
  operation: ice_add_proxy ]
[ 03/01/09 12:36:16.762 Retry: retrying operation call in 3000ms because of exception
  Outgoing.cpp:422: Ice::ObjectNotExistException:
  object does not exist:
  identity: `SolverIceGrid/Query'
  facet: 
  operation: ice_add_proxy ]
[ 03/01/09 12:36:19.782 Retry: retrying operation call in 4000ms because of exception
  Outgoing.cpp:422: Ice::ObjectNotExistException:
  object does not exist:
  identity: `SolverIceGrid/Query'
  facet: 
  operation: ice_add_proxy ]
[ 03/01/09 12:36:23.801 Retry: retrying operation call in 5000ms because of exception
  Outgoing.cpp:422: Ice::ObjectNotExistException:
  object does not exist:
  identity: `SolverIceGrid/Query'
  facet: 
  operation: ice_add_proxy ]
[ 03/01/09 12:36:28.819 Retry: cannot retry operation call because retry limit has been exceeded
  Outgoing.cpp:422: Ice::ObjectNotExistException:
  object does not exist:
  identity: `SolverIceGrid/Query'
  facet: 
  operation: ice_add_proxy ]

Is there any other logging or tracing I can turn on to help track this down?

cheers,
dustin.

benoit · March 2009

Hi Dustin,

You could be running into the same issue as the one mentioned on this thread.

Do you re-create Glacier2 sessions with the same communicator in your client and do you re-create the Callback object adapter for each new session?

Cheers,
Benoit.

dstn · March 2009

benoit wrote: »

Do you re-create Glacier2 sessions with the same communicator in your client and do you re-create the Callback object adapter for each new session?

Yes; no. Ugh! I will try that.

I would like to see a chapter in the manual about how to write a practical robust client. It seems to be much more complicated than any of the example clients. I would think that my situation is somewhat common: I have a compute cluster, and I want to connect it to a web service. The web service is the Ice client, so it is not acceptable for it to just print an error message and fail: it should retry.

Thanks for the hint.

cheers,
dustin.

matthew · March 2009

I think this is really out of scope of Ice. It is very much application dependent what must be done to make a client robust.

dstn · March 2009

matthew wrote: »

I think this is really out of scope of Ice. It is very much application dependent what must be done to make a client robust.

I disagree: I'm talking about how a client should recover from failure of different Ice components:

-how to detect that Glacier2 has expired my session
-what to do when Glacier2 expires my session
-how to detect and recover from a registry failure
-how to detect and recover from a node failure
-how to use pings to keep a callback channel open

This thread is a perfect example. Nowhere in the manual does it say that when Glacier2 expires my session that I have to recreate all my object adapters.

cheers,
dstn

matthew · March 2009

dstn wrote: »

I disagree: I'm talking about how a client should recover from failure of different Ice components:

-how to detect that Glacier2 has expired my session

This is the same as for any other Ice object. If you get an ObjectNotExistException, then the session is gone.

-what to do when Glacier2 expires my session

Re-create the session. If you have any callback objects, you also need to create or re-create the object adapter and re-register the callback objects. Note that re-registering the callback objects is a direct consequence of re-establishing the session, as the callback object identity is based on the session.
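
A minimal sketch of that recovery sequence in Python, with the Ice calls replaced by stand-ins so that the control flow can be read and run on its own. Everything named here is an assumption for illustration: SessionGone stands in for Ice.ObjectNotExistException, and open_session and open_adapter are hypothetical factories that a real client would presumably implement around the router's createSession call and communicator.createObjectAdapter("Callback").

```python
class SessionGone(Exception):
    """Stand-in for Ice.ObjectNotExistException (assumption for this sketch)."""


class CallbackSession:
    """Owns one Glacier2 session plus the adapter and callbacks tied to it."""

    def __init__(self, open_session, open_adapter):
        # open_session() returns a session handle; open_adapter(session)
        # returns an adapter. Both are hypothetical, application-supplied.
        self._open_session = open_session
        self._open_adapter = open_adapter
        self._callbacks = {}  # name -> servant, remembered across reconnects
        self._session = None
        self._adapter = None

    def register(self, name, servant):
        """Remember a callback and add it to the current adapter, if any."""
        self._callbacks[name] = servant
        if self._adapter is not None:
            self._adapter.add(name, servant)

    def connect(self):
        """Open a fresh session, a fresh adapter, and re-register callbacks."""
        # The old adapter belonged to the expired session, so discard it.
        if self._adapter is not None:
            self._adapter.destroy()
        self._session = self._open_session()
        self._adapter = self._open_adapter(self._session)
        # Callback identities depend on the session, so add them all again.
        for name, servant in self._callbacks.items():
            self._adapter.add(name, servant)

    def call(self, operation):
        """Run operation(session); reconnect once if the session is gone."""
        if self._session is None:
            self.connect()
        try:
            return operation(self._session)
        except SessionGone:
            self.connect()
            return operation(self._session)
```

The wrapper retries only once on purpose: if the second attempt also fails, the exception propagates and the application decides what to do next.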

-how to detect and recover from a registry failure
-how to detect and recover from a node failure

There is no specific advice to give here, as it depends on your application and what you want to do. In both cases, if the registry or the node is no longer active you may or may not have a failure when you try to make an invocation depending on the state of your locator cache.

-how to use pings to keep a callback channel open

This is described in section 39.3.7 of the Ice manual.
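
For readers without the manual at hand, here is a rough Python sketch of the general keep-alive idea, not the manual's code: a background thread invokes something through the router more often than the session timeout (Glacier2.SessionTimeout=60 in the configuration posted above). The ping and on_failure callables are hypothetical and supplied by the application, and pinging at half the timeout is an assumption chosen for illustration.

```python
import threading


class KeepAlive(threading.Thread):
    """Calls ping() periodically so the router sees session activity."""

    def __init__(self, ping, session_timeout, on_failure):
        super().__init__(daemon=True)
        self._ping = ping                       # hypothetical: any call via the router
        self._interval = session_timeout / 2.0  # assumption: half the timeout
        self._on_failure = on_failure           # called once if a ping raises
        self._stop_event = threading.Event()

    def run(self):
        # Event.wait() returns False on timeout and True after stop().
        while not self._stop_event.wait(self._interval):
            try:
                self._ping()
            except Exception as exc:
                # The session is probably gone; let the application recover.
                self._on_failure(exc)
                return

    def stop(self):
        """Ask the thread to exit at its next wake-up."""
        self._stop_event.set()
```

A client would start one such thread after creating the session and stop it before destroying the session or the communicator.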

This thread is a perfect example. Nowhere in the manual does it say that when Glacier2 expires my session that I have to recreate all my object adapters.

Yes, this is a deficiency, we'll address that.

dstn · March 2009

This is probably all deeply obvious to someone as familiar with Ice as you are, but speaking as a user -- who does not read the manual cover-to-cover but skims around looking for the relevant parts for the particular application at hand -- it is surprising how much is required to make a client robust. I think a chapter or section on the topic would help. That's just my opinion, take it or leave it.

cheers,
dstn.

tjorven · May 2009

+1

I agree. This is not about blameshifting, it's about how to communicate how to use the product I.C.E. in a proper way. Sort of an "ICE Cookbook" if you will, would be ideal.

regards,

matthew · May 2009

In all fairness, we've devoted significant time to producing materials to help you understand how to use Ice. While a cookbook would be nice, the newsletter archive is already a cookbook of sorts, although some of the articles are now somewhat dated.

The Ice Manual
The newsletter articles (see the article index at http://www.zeroc.com/newsletter/article_index.html for details).
The chat demo (http://www.zeroc.com/chat/index.html)
White papers & articles (http://www.zeroc.com/articles/index.html)
Screencasts (http://www.zeroc.com/doc/screencasts.html). Not many yet, but we're producing more.
FAQs (http://www.zeroc.com/faq/index.html)

If you cannot find the answer in the above material, there is this forum where we offer limited free support during your evaluation process. On top of that, we also offer commercial support, consulting, and training.

That being said, if you have specific article ideas, or topics that you would like to see covered in more detail please make concrete suggestions.

tjorven · May 2009

Community and mindshare

These are intertwined, so although off-topic, I bring it up here.

I think a more open process would be beneficial for all, developers and ZeroC shareholders alike. For instance, if I wanted to contribute a Perl "port" of ICE, where would I start?

Likewise, if I write an FDL or Creative Commons licensed chapter for a cookbook, where do I submit it to you?

I recognize that maintaining a community site would cost you a significant amount, but I also believe ICE is on the verge of becoming a de facto standard, from which ZeroC can only benefit. I see it all the time now in all sorts of projects.

Just some food for thought. Not a specific suggestion for documentation, but a more abstract kind of feedback...

best regards,
Jakob
