
Using pinned connection with Random EndPoint selection strategy

If I disable the locator cache and use the Random endpoint selection strategy, can I still get a reference to a "pinned" proxy (one that will delegate any further calls to the same endpoint that was recently picked)?
I am using IceGrid with a service that is bound to a replicated adapter (and runs on multiple instances/nodes). Clients use indirect proxy lookups (and refresh the lookup every minute). Most of the service calls are stateless except one, a cancel method, which is supposed to cancel a previous call and therefore should be directed to the same instance executing the original call. Any suggestions?

BTW, are there any plans to extend the current endpoint selection strategies (e.g. by using callbacks and letting the client/caller pick)?

Thanks,
Arie.

Comments

  • In addition to my question above I wondered about the following:
    1. In a case where I don't use a named connection, a proxy with a timeout, or connection-per-thread, will Ice maintain only one connection per endpoint (on the client side) and multiplex any requests over the same connection (when that endpoint is picked)?
    2. How is the connection pool per endpoint maintained in the case of connection-per-thread (are the connections re-used and does the pool shrink)?
    3. When creating a named proxy or a proxy with a timeout, I assume Ice establishes a new connection for the proxy. Is that proxy "pinned" to the new connection regardless of the endpoint selection strategy (random, round-robin, ...) when connection caching is disabled? Also, what is the caching policy for such a connection (assuming the same named proxy, or a proxy with the same timeout settings, is created later on)?
    4. When refreshing the set of endpoints from the grid registry, are old connections to endpoints that exist in both the old and the new set (the intersection of old and new) preserved?

    Thanks
    Arie.
  • benoit
    benoit Rennes, France
    Hi,

    To answer your first post, by default connections are cached ("pinned") with the proxy unless you turn off connection caching with the ice_connectionCached(false) proxy method. Note however that even if the connection is cached with the proxy, the Ice runtime might silently change this connection under a number of circumstances (if client-side ACM closes the connection, if server-side ACM closes the connection, if the network connection is interrupted for some reason, etc.).

    For more information on Ice connection management see Chapter 33 in the Ice manual. I also recommend reading Matthew's article from the newsletter issue #24.

    Regarding your cancel operation, I would recommend designing your Slice interfaces in such a way that you don't have to rely on Ice connection management to make sure the request is sent to the correct replica. For example, you could do something like:
        // Slice
        interface CancelServiceRequest
        {
            void cancel();
        };

        interface MyService
        {
            CancelServiceRequest* doWork();
        };

    The proxy to the CancelServiceRequest interface returned by the doWork method would be a proxy that points to the replica which executed the doWork invocation. Your client would use this proxy to invoke the cancel operation.
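
    For example, the client-side flow could look roughly like this (a minimal Java sketch; the proxy types follow the standard Java mapping of the Slice above, and "service" is assumed to be a MyServicePrx obtained through the replica group):
        // Java
        // doWork is dispatched on whichever replica the replicated proxy selects.
        CancelServiceRequestPrx cancelPrx = service.doWork();

        // The returned proxy points to that same replica, so cancel() reaches the
        // instance that is executing the work.
        cancelPrx.cancel();
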
    aozarov wrote: »
    1. In a case where I don't use a named connection, a proxy with a timeout, or connection-per-thread, will Ice maintain only one connection per endpoint (on the client side) and multiplex any requests over the same connection (when that endpoint is picked)?

    Correct. Note however that even if you use timeouts/thread-per-connection, the connection might be shared between proxies (if the proxies are using the same timeout for example).
    2. How is the connection pool per endpoint maintained in the case of connection-per-thread (are the connections re-used and does the pool shrink)?

    I'm not sure what you mean by the pool shrinking. If you use thread-per-connection, the connection might be re-used by proxies which have a compatible endpoint and specified the thread-per-connection setting.

    Btw, note that thread-per-connection has been removed in Ice 3.3b.
    3. When creating a named proxy or a proxy with a timeout, I assume Ice establishes a new connection for the proxy. Is that proxy "pinned" to the new connection regardless of the endpoint selection strategy (random, round-robin, ...) when connection caching is disabled? Also, what is the caching policy for such a connection (assuming the same named proxy, or a proxy with the same timeout settings, is created later on)?

    The connection is cached ("pinned") with the proxy only if connection caching is enabled (the default). Whether a connection is cached with a proxy or not is not related to the endpoint selection strategy.
    4. When refreshing the set of endpoints from the grid registry, are old connections to endpoints that exist in both the old and the new set (the intersection of old and new) preserved?

    Yes, connections remain open and might be re-used by proxies until ACM closes them.

    Again, I recommend checking out the connection management chapter in the manual and Matthew's newsletter article. They should answer all your questions above in more detail. If you still have any questions after reading them, don't hesitate to ask!

    Cheers,
    Benoit.
  • Thanks benoit.

    It is good to know that when a replicated service returns a callback proxy, that proxy is pinned to the replica that handled the call, but I think I am going to avoid such a solution for now (unless I have to) for the following reasons:

    1. The current implementation is delegated asynchronously via AMI, which I think will be more efficient (resources/speed) than passing a client callback to get the results (and emulate AMI) plus a service callback to enable cancel.

    2. I would like to avoid managing the returned callbacks from the servant on the server side (and releasing them if clients have not done so after a certain timeout - or maybe use IceFreeze).


    Also, I did read both references (thanks) but still have the following question:

    I think I understand how proxies with timeouts or with a connection ID use their connections when the connection is cached, but I am not sure what happens when the connection is not cached by the proxy. Will a new connection (associated with a connection ID or timeout) be created for every picked endpoint if one does not already exist?


    The main problem that I have with cached connections is that AFAIK clients only consider the currently available services/endpoints and will not consider services that were added after they cached the connection (and will not re-distribute the load, unless their current connection is broken or was closed by ACM).

    Is there a way for me to force endpoint selection after the endpoint set has changed? Should/can I do it myself by polling the proxy for its set of endpoints (calling ice_getEndpoints) and, upon a change (with the locator cache timeout set to X minutes), calling ice_connectionCached(false).ice_connectionCached(true)?
    Are these methods (or any other ObjectPrx methods) thread-safe?

    Are there any plans to delegate endpoint selection to the user (via callbacks)? Doing that would enable request-based partitioning and even more selective endpoint selection (e.g. filtering out endpoints that represent an older version of a service, assuming more binding-related information is exposed via the endpoint).

    Thanks,
    Arie.
  • benoit
    benoit Rennes, France
    Hi,
    aozarov wrote: »
    Thanks benoit.

    It is good to know that when a replicated service returns a callback proxy, that proxy is pinned to the replica that handled the call, but I think I am going to avoid such a solution for now (unless I have to) for the following reasons:

    1. The current implementation is delegated asynchronously via AMI, which I think will be more efficient (resources/speed) than passing a client callback to get the results (and emulate AMI) plus a service callback to enable cancel.

    Hmm, I didn't suggest using a client callback. I suggested splitting your service interface into two distinct interfaces, one with replicated operations and another with non-replicated operations. For example, if your interface was:
       interface Service
       {
             void doWork();
             void cancelWork();
       };
    

    You could replace it with:
       interface Service
       {
             void doWork();
             void cancelWork();
       };
    
       interface ReplicatedService
       {
             Service* getService();
       };
    

    Your service implements both interfaces. Your client first obtains the service interface by invoking getService on a replicated proxy and it can then invoke the doWork/cancelWork methods on the returned proxy and be sure that requests will be sent to the same server instance.

    If the doWork/cancelWork invocations fail, your client can obtain a new service object by invoking getService again.
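
    A rough Java sketch of this client-side flow (the proxy type names follow the Java mapping of the Slice above; replicatedService is assumed to be the indirect proxy to the replica group):
        // Java
        ServicePrx service = replicatedService.getService();
        try
        {
            service.doWork();
            // ... later, to abort the work on the same replica:
            service.cancelWork();
        }
        catch(Ice.LocalException ex)
        {
            // The replica became unreachable; obtain a proxy to another replica
            // and start over.
            service = replicatedService.getService();
            service.doWork();
        }
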
    2. I would like to avoid managing the returned callbacks from the servant on the server side (and releasing them if clients have not done so after a certain timeout - or maybe use IceFreeze).


    Also, I did read both references (thanks) but still have the following question:

    I think I understand how proxies with timeouts or with a connection ID use their connections when the connection is cached, but I am not sure what happens when the connection is not cached by the proxy. Will a new connection (associated with a connection ID or timeout) be created for every picked endpoint if one does not already exist?

    If the endpoint selection type is "Random", yes, a connection will eventually be created for each endpoint if it doesn't already exist. If the endpoint selection type is "Ordered", only a connection for the first endpoint will be created if it doesn't already exist (and for the second endpoint if connecting to the first endpoint fails, etc).
    The main problem that I have with cached connections is that AFAIK clients only consider the currently available services/endpoints and will not consider services that were added after they cached the connection (and will not re-distribute the load, unless their current connection is broken or was closed by ACM).

    Correct. There's currently no way to specify for how long the connection is cached with the proxy.
    Is there a way for me to force endpoint selection after the endpoint set has changed? Should/can I do it myself by polling the proxy for its set of endpoints (calling ice_getEndpoints) and, upon a change (with the locator cache timeout set to X minutes), calling ice_connectionCached(false).ice_connectionCached(true)?
    Are these methods (or any other ObjectPrx methods) thread-safe?

    Yes, these methods are thread-safe, see also this FAQ. You can indeed create a new proxy to clear the cached connection and force Ice to pick a new connection and consider the new set of endpoints retrieved from the locator.
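
    In Java, clearing the cached connection could look like this (a sketch; ServicePrxHelper stands for the helper class generated for your proxy type, and the uncheckedCast is needed because the ice_* factory methods return Ice.ObjectPrx):
        // Java
        // Toggling the setting creates a new proxy instance with no cached
        // connection; the next invocation performs endpoint selection again.
        service = ServicePrxHelper.uncheckedCast(
            service.ice_connectionCached(false).ice_connectionCached(true));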

    I recommend considering changing your interfaces though. With the two interfaces mentioned above, you could simply configure the proxy to the ReplicatedService object to not cache the connection and to use a given locator cache timeout, to ensure that invocations on the replicated service are distributed over all the replicas.

    For the proxy returned by the getService method, you can just use the default proxy (it doesn't really matter whether or not you cache the connection; this proxy points to a single service instance).

    With this design, you don't have to rely on Ice connection management to make sure the doWork/cancelWork invocations are sent to the same service instance.
    Are there any plans to delegate endpoint selection to the user (via callbacks)? Doing that would enable request-based partitioning and even more selective endpoint selection (e.g. filtering out endpoints that represent an older version of a service, assuming more binding-related information is exposed via the endpoint).

    No, there are no plans to add this at this point. We could consider it if you have a commercial interest in such a feature.

    Cheers,
    Benoit.
  • Thanks again!

    I really liked your suggestion above (it is cleaner & more flexible) but have 3 questions:

    1. How can my servant implement the two interfaces using Java (unless one interface extends the other)? Isn't a class (which my servant needs to subclass) generated for each?

    2. I assume that the call to ice_getEndpoints applies to both approaches and should reflect any change in the endpoints after a lookup refresh?

    3. Do you see any other issues with the "ice_connectionCached(false).ice_connectionCached(true)" approach apart from it not being as flexible or as clean as your suggestion? One advantage it has is that I don't need to care about failed connections, as that should be handled transparently by the Ice framework, no? If I pick your solution, which exception should I treat as an indicator of a bad connection?

    Thanks a lot,
    Arie.
  • benoit
    benoit Rennes, France
    aozarov wrote: »
    Thanks again!

    I really liked your suggestion above (it is cleaner & more flexible) but have 3 questions:

    1. How can my servant implement the two interfaces using Java (unless one interface extends the other)? Isn't a class (which my servant needs to subclass) generated for each?

    You can use delegation for this purpose: two separate servant classes that delegate to the implementation class.

    You can either implement these servant classes yourself or use Java tie classes. The "Master–Slave Replication with Ice" article from the newsletter issue #23 shows how to do this with tie classes. See also here in the Ice manual for more information on Java tie classes.
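
    A minimal Java sketch of the delegation approach (the _ServiceDisp and _ReplicatedServiceDisp base classes follow the standard Java mapping of the Slice above; the ServiceImpl class and the way the Service proxy is obtained are assumptions):
        // Java
        // Shared implementation holding the actual work/cancel logic.
        class ServiceImpl
        {
            void doWork() { /* ... */ }
            void cancelWork() { /* ... */ }
        }

        // Servant for the Service interface; delegates to the shared implementation.
        class ServiceI extends _ServiceDisp
        {
            ServiceI(ServiceImpl impl) { _impl = impl; }
            public void doWork(Ice.Current current) { _impl.doWork(); }
            public void cancelWork(Ice.Current current) { _impl.cancelWork(); }
            private final ServiceImpl _impl;
        }

        // Servant for the ReplicatedService interface; hands out a proxy to this
        // replica's Service servant.
        class ReplicatedServiceI extends _ReplicatedServiceDisp
        {
            ReplicatedServiceI(ServicePrx servicePrx) { _servicePrx = servicePrx; }
            public ServicePrx getService(Ice.Current current) { return _servicePrx; }
            private final ServicePrx _servicePrx;
        }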

    2. I assume that the call to ice_getEndpoints applies to both approaches and should reflect any change in the endpoints after a lookup refresh?

    The ice_getEndpoints proxy method only returns endpoints for a direct proxy. Indirect proxies do not have endpoints so an empty list of endpoints is returned. There's currently no way to retrieve the endpoints obtained from the Ice locator for an indirect proxy.
    3. Do you see any other issues with the "ice_connectionCached(false).ice_connectionCached(true)" approach apart from it not being as flexible or as clean as your suggestion? One advantage it has is that I don't need to care about failed connections, as that should be handled transparently by the Ice framework, no? If I pick your solution, which exception should I treat as an indicator of a bad connection?

    Whether or not the retry is done transparently by the Ice runtime depends on whether or not the operations are idempotent. See the section about "At-Most-Once" semantics here.

    And the fact that the Ice runtime retries automatically when it's safe to do so doesn't mean that you don't need to catch exceptions :). Your application might still get exceptions, for example if the retry limit has been reached.

    Also, would it really be OK for the cancelWork operation to be automatically retried by the Ice runtime on a replica other than the one used to invoke doWork? With your original approach, I don't believe you can prevent this if, for example, the connection is closed shortly before the cancelWork call.

    You can just catch Ice::LocalException exceptions. See 4.10.4 Ice Run-Time Exceptions in the Ice manual for more information on local exceptions which might be raised by the client.

    Cheers,
    Benoit.
  • In either approach I thought I would change the proxy/connection only when I detect new endpoints (to better distribute the load), unless the connection is broken.
    If I can't get the list of an indirect proxy's endpoints, is there any way for me to be notified upon a change? Should I interact directly with the IceGrid locator proxy and periodically call QueryPrx#findAllReplicas to detect changes? Will that work for me? Any better way? Is there a way for me to know which replica a proxy is pointing at, as well as to check whether two proxies are connected to the same service instance?

    BTW, I was not considering retrying an invocation, but rather relying on Ice's transparency in detecting connection failures and associating the proxy with a different valid connection, if one is available, on the next call. I understand I can do that myself upon catching Ice::LocalException, but I was not sure whether that is too wide a catch-all (it covers more than connection failures), whether I should really discard the connection in such a case, or whether I should consider a more specific "connection failure" local exception.

    Yes, in my case it would be OK to call cancel on another instance (though not desirable, it is harmless; the action will be discarded).

    Thanks!
    Arie.
  • benoit
    benoit Rennes, France
    aozarov wrote: »
    In either approach I thought I would change the proxy/connection only when I detect new endpoints (to better distribute the load), unless the connection is broken.
    If I can't get the list of an indirect proxy's endpoints, is there any way for me to be notified upon a change? Should I interact directly with the IceGrid locator proxy and periodically call QueryPrx#findAllReplicas to detect changes? Will that work for me? Any better way? Is there a way for me to know which replica a proxy is pointing at, as well as to check whether two proxies are connected to the same service instance?

    Assuming you use the interfaces mentioned above, you can compare the adapter IDs of the proxies returned by the getService method of the ReplicatedService interface to figure out if the proxies point to the same service.
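
    For example (a Java sketch; it assumes getService returns adapter-id based indirect proxies, so that ice_getAdapterId returns a non-empty id):
        // Java
        ServicePrx a = replicatedService.getService();
        ServicePrx b = replicatedService.getService();

        // Two proxies point at the same replica if their adapter IDs match.
        boolean sameReplica = a.ice_getAdapterId().equals(b.ice_getAdapterId());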

    You could call findAllReplicas to detect new replicas but I don't see why you would need to do this. To make sure that the getService method eventually returns proxies for newly deployed replicas, you should configure the proxy of the ReplicatedService interface as follows:
        // C++
        ReplicatedServicePrx proxy = ...
        proxy = proxy->ice_connectionCached(false)->ice_locatorCacheTimeout(60);
    

    With these settings, the Ice runtime will:
    • Use per-request load balancing for the getService method. That is, each getService invocation will be sent to a random replica endpoint.
    • Retrieve new endpoints from the locator if the previously retrieved endpoints are older than 60s. This makes sure that your client will eventually detect and use new replicas (and won't keep trying to use old replicas).

    Cheers,
    Benoit.
  • Thanks Benoit,

    Yes, I am aware of the standard option of disabling connection caching and setting a locator cache timeout, but what I am looking for is to "pin" requests using the same proxy to the same/initially picked service until a new replica is detected or the connection is broken. As I understand it, I can't apply that suggestion to the proxy returned from ReplicatedService#getService(), right?
  • benoit
    benoit Rennes, France
    Hi,

    Yes, that's correct. But why don't you simply call getService() for each doWork() invocation? This would be much simpler and guarantee that you'll always eventually use new replicas. This also allows changing the load balancing type in IceGrid. Are you concerned about the extra getService() invocation? It seems to me that this extra call will probably be insignificant compared to the doWork() call.

    Cheers,
    Benoit.
  • Thanks Benoit, your suggestions are very helpful!

    Actually, I am a little concerned about the overhead of a network call for each request (we plan to have 100K requests per second with an avg req/res time of 10ms). Also, this solves only one part of the problem (the ability to cancel requests); the other part is avoiding the penalty of repeatedly sending requests that fail on connect timeout for 1/N of the requests when a server is down (that is, until it is detected by the registry and the endpoint set is refreshed by the clients). If I manage to "pin/cache" a connection to a service (with the default random selection), I assume the connect-timeout cost will be incurred only once.

    Thanks,
    Arie.
  • benoit
    benoit Rennes, France
    Hi,

    The locator cache entry for an adapter or replica group is refreshed when the locator cache timeout expires or when there's a connection failure to one of the endpoints. So your client won't try to connect over and over to the server which is down for the duration of the locator cache timeout: it will try to connect, get the connection failure and refresh the endpoints as a result of the failure. If the IceGrid node which is managing this server detected that it's down, the refreshed endpoints won't contain its endpoints.

    As an aside, it's not clear to me why you need this cancel request if your servers need to process so many invocations and the average response time is 10ms. I would have expected the ability to cancel a request to be mostly useful for long running server operations.

    Cheers,
    Benoit.
  • Oh, I wasn't aware that any connection failure will trigger an Ice registry lookup (is it mentioned anywhere?).
    Is it done in the background by a separate thread (and a different endpoint picked for this request), or is it done by the caller thread?
    Is that true regardless of the connection caching strategy or the endpoint selection strategy (random/round-robin)?
    I assume that will not be the case for direct proxies (such as the one I get by calling proxy.getService()), right?

    We have a facade layer that delegates multiple requests on behalf of a client until they complete, time out, or are explicitly cancelled. We would like to be able to act upon a cancel or time-out to reduce load on the back-end resources (DB, cache, ...). Also, though 10ms is an average, some requests may get stuck/take much longer and we would like to be able to abort them.

    Thanks!
    Arie.
  • benoit
    benoit Rennes, France
    Hi,
    aozarov wrote: »
    Oh, I wasn't aware that any connection failure will trigger an Ice registry lookup (is it mentioned anywhere?).

    It's mentioned in the "Locator Cache" section from 28.17.2 Client Semantics.
    Is it done in the background by a separate thread (and a different endpoint picked for this request), or is it done by the caller thread?

    With Ice <= 3.2.x, it's done by the caller thread. With Ice 3.3, it's done by the caller thread for regular synchronous calls or in the background for AMI calls (the locator call is done using an AMI request).
    Is that true regardless of the connection caching strategy or the endpoint selection strategy (random/round-robin)?
    I assume that will not be the case for direct proxies (such as the one I get by calling proxy.getService()), right?

    Yes, it's true regardless of the connection caching strategy. There's no locator request for direct proxies since such proxies contain the endpoints of the server. Assuming the object adapter of the "ReplicatedService" servant is configured with both an adapter id and a replica group id, the implementation of your getService method can either return a direct proxy (created with ObjectAdapter::createDirectProxy) or an indirect proxy that contains the adapter id instead of the replica group id (created with ObjectAdapter::createIndirectProxy). See "28.4.7 Creating Proxies" in the Ice manual here.

    Which kind of proxy to return to the client depends on what the client does with the proxy. For example, if the client saves the proxy in a database it's better to return an indirect proxy. If the client just invokes a request on the proxy after obtaining it, a direct proxy might be fine.
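
    A Java sketch of such a getService implementation (the servant identity "service" and the use of the dispatching adapter are assumptions; the adapter is assumed to be configured with an adapter id and a replica group id as described above):
        // Java
        public ServicePrx getService(Ice.Current current)
        {
            Ice.Identity id = new Ice.Identity("service", "");

            // Direct proxy: embeds the adapter's endpoints, so the client needs
            // no locator lookup to use it.
            Ice.ObjectPrx prx = current.adapter.createDirectProxy(id);

            // Indirect proxy alternative: contains the adapter id and is resolved
            // through the locator.
            // Ice.ObjectPrx prx = current.adapter.createIndirectProxy(id);

            return ServicePrxHelper.uncheckedCast(prx);
        }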

    We have a facade layer that delegates multiple requests on behalf of a client until they complete, time out, or are explicitly cancelled. We would like to be able to act upon a cancel or time-out to reduce load on the back-end resources (DB, cache, ...). Also, though 10ms is an average, some requests may get stuck/take much longer and we would like to be able to abort them.

    Thanks!
    Arie.

    Cheers,
    Benoit.