AMI/AMD Chaining woes...

Hi,

I'm experiencing massive deadlocks when using AMI/AMD chaining (for a description, see the excellent article by Matthew Newhook in the Connections newsletter from July 2005: http://www.zeroc.com/newsletter/issue4.pdf).

Our system consists of several peer-to-peer servers, plus some clients connecting to this set of servers. A request reaches a server, which performs some actions on it and then forwards it to another server for more processing. A request will sometimes visit only one server, but sometimes it visits several servers in a row (sometimes the same server multiple times).

I've managed to reproduce the problem in a fairly tiny testcase, attached at the bottom. The problem appears to be that the asynchronous calls have a synchronous component to them. Here's what seems to be happening:

- a server receives an AMD call
- a Server pool thread processes this call and dispatches it to user code
- the user code invokes an AMI method on a different server
- this call is handled by a Server pool thread in the second server
(etc.)

Unfortunately, the AMI call seems to have a synchronous component, where it waits for a thread in the second server's Server thread pool to free up and receive the call. Only then does it return control to the user function, which promptly returns control to Ice (as per the AMI/AMD chaining technique).

In our peer-to-peer architecture, the second server is at the same time making its own asynchronous calls to other servers, including the initial server in our example. This causes a deadlock: server 1 is waiting for a Server pool thread to free up in server 2, and server 2 is waiting for a Server pool thread to free up in server 1.

I've tried increasing the number of threads in the Server thread pool, but that only postpones the problem and doesn't fix it. The Client thread pool doesn't seem to have any effect on this problem.
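
For reference, these pool sizes are controlled with properties of this form, on the command line or in a configuration file (the values are placeholders):

--Ice.ThreadPool.Server.Size=<n> --Ice.ThreadPool.Server.SizeMax=<n> --Ice.ThreadPool.Client.Size=<n>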

As far as I can tell, the only way to fix the problem is to never use AMI/AMD chaining, and instead queue up the requests from the Server pool threads to my own thread pool, from which I can make further AMI invocations. The problem with this approach is that under heavy load my queue grows without bound, and there's nothing to slow down my callers (any push-back mechanism will run into the same deadlock problem).
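
In outline, that workaround looks like the sketch below. WorkQueue is just an illustrative name, and C++11 threading is used for brevity (our code targets Ice 3.1, where an IceUtil::Thread would play the same role); f_async would enqueue a job that captures cb and val, and the worker thread would make the g_async call.

// Minimal sketch of the "own thread pool" workaround described above.
// WorkQueue is a hypothetical helper, not part of Ice.
#include <deque>
#include <functional>
#include <mutex>
#include <condition_variable>

class WorkQueue {
public:
    // Called from an Ice Server pool thread: queue the job and return
    // immediately, so the dispatch thread is released right away.
    void enqueue(const std::function<void()>& job) {
        std::lock_guard<std::mutex> lock(mutex_);
        jobs_.push_back(job);
        cond_.notify_one();
    }

    // Runs on a dedicated worker thread; all further AMI invocations
    // happen here, never on an Ice dispatch thread.
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                while (jobs_.empty())
                    cond_.wait(lock);
                job = jobs_.front();
                jobs_.pop_front();
            }
            job(); // e.g. performs the g_async() invocation
        }
    }

private:
    std::deque<std::function<void()> > jobs_; // unbounded: the drawback noted above
    std::mutex mutex_;
    std::condition_variable cond_;
};

A thread started in main() would call run(); note that nothing limits the size of jobs_, which is exactly the flow-control problem described above.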

We're still using Ice 3.1, so please let me know if this problem is resolved in Ice 3.2. The move to Ice 3.2 isn't trivial for us: it requires some code changes, plus a few logistical issues.

Thanks,
Vanco

Testcase:
The easiest way to reproduce this problem is with a single server that has one Server pool thread and calls itself. This deadlocks instantly.

Slice file:
module Demo {
    ["ami"] interface DemoServer {
        // returns g('val')
        ["amd"] int f(int val);
        // returns 'val'+1
        ["amd"] int g(int val);
    };
};

Server code:
#include <Ice/Ice.h>
#include <Interface.h>

#include <iostream>
#include <cstdio>
#include <cstdlib>

using namespace std;
using namespace Demo;

class FCallBack : public AMI_DemoServer_g {
    const AMD_DemoServer_fPtr cb_;
public:
    FCallBack(const AMD_DemoServer_fPtr& cb) : cb_(cb) { }
    virtual void ice_response(int result) {
        cb_->ice_response(result);
    }
    virtual void ice_exception(const ::Ice::Exception& ex) {
        // Forward the failure; otherwise the caller of f() never gets an answer.
        cb_->ice_exception(ex);
    }
};


class DemoServerI : public DemoServer {
private: // Data
    Ice::CommunicatorPtr ic_;
    const char* otherServerProxy_;
public: // C-tor
    DemoServerI(Ice::CommunicatorPtr ic, const char* otherServerProxy) :
        ic_(ic), otherServerProxy_(otherServerProxy)
    { }
public: // Interface
    virtual void f_async(const AMD_DemoServer_fPtr& cb, int val,
                         const Ice::Current&)
    {
        try {
            Ice::ObjectPrx baseL = ic_->stringToProxy(otherServerProxy_);
            // Note: checkedCast performs a synchronous remote invocation
            // (ice_isA) from the dispatch thread.
            DemoServerPrx demoPrx = DemoServerPrx::checkedCast(baseL);
            if (!demoPrx)
                throw "Unable to reach other server";
            FCallBack* fcb = new FCallBack(cb);
            demoPrx->g_async(fcb, val);
        } catch (const Ice::Exception& e) {
            cb->ice_exception(e);
        } catch (const char* msg) {
            // Ice::Exception cannot carry an arbitrary message directly;
            // wrap it in an UnknownException instead.
            Ice::UnknownException ex(__FILE__, __LINE__);
            ex.unknown = msg;
            cb->ice_exception(ex);
        }
    }
    virtual void g_async(const AMD_DemoServer_gPtr& cb, int val,
                         const Ice::Current&)
    {
        cb->ice_response(val + 1);
    }
};

void usage() {
    cout << "Usage: ./server <serverName> <portNum> <otherServerProxy>"
         << endl;
}

int
main(int argc, char* argv[])
{
    int status = 0;
    Ice::CommunicatorPtr ic;
    try {
        ic = Ice::initialize(argc, argv);
        if (argc != 4) {
            usage();
            return -1;
        }
        char* serverName = argv[1];
        int portNum = atoi(argv[2]);
        char* otherServerProxy = argv[3];
        cout << "Params: Server " << serverName
             << " on port " << portNum
             << ", talking to server " << otherServerProxy << endl;

        char endpointString[200];
        sprintf(endpointString, "default -p %d", portNum);
        cout << "endpoint string: " << endpointString << endl;

        Ice::ObjectAdapterPtr adapter
            = ic->createObjectAdapterWithEndpoints(
                "SimpleAdapter", endpointString);
        Ice::ObjectPtr svrObj = new DemoServerI(ic, otherServerProxy);
        adapter->add(svrObj, ic->stringToIdentity(serverName));
        adapter->activate();
        ic->waitForShutdown();
    } catch (const Ice::Exception& e) {
        cerr << e << endl;
        status = 1;
    } catch (const char* msg) {
        cerr << msg << endl;
        status = 1;
    }
    if (ic) {
        try {
            ic->destroy();
        } catch (const Ice::Exception& e) {
            cerr << e << endl;
            status = 1;
        }
    }
    return status;
}

Server usage:
./server --Ice.ThreadPool.Server.Size=1 --Ice.ThreadPool.Client.Size=1 server0 2345 "server0:default -p 2345"
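
To reproduce the cross-server variant described above instead of the self-call, the same binary can be started twice with each instance pointing at the other (the ports here are arbitrary); with a single Server pool thread each, two clients invoking f on the two servers at the same time should leave each server waiting on the other's dispatch thread:

./server --Ice.ThreadPool.Server.Size=1 --Ice.ThreadPool.Client.Size=1 server0 2345 "server1:default -p 2346"
./server --Ice.ThreadPool.Server.Size=1 --Ice.ThreadPool.Client.Size=1 server1 2346 "server0:default -p 2345"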

Client code:
#include <Ice/Ice.h>
#include <Interface.h>

#include <iostream>
#include <cstdlib>

using namespace std;
using namespace Demo;

void usage()
{
    cout << "Usage: ./client <serverProxy> <numReps>"
         << endl;
}

int
main(int argc, char* argv[])
{
    int status = 0;
    Ice::CommunicatorPtr ic;
    try {
        ic = Ice::initialize(argc, argv);
        if (argc != 3) {
            usage();
            return -1;
        }
        char* serverProxy = argv[1];
        int numReps = atoi(argv[2]);
        cout << "Params: " << numReps << " reps to server "
             << serverProxy << endl;

        Ice::ObjectPrx baseL = ic->stringToProxy(serverProxy);
        DemoServerPrx demoPrx = DemoServerPrx::checkedCast(baseL);
        if (!demoPrx)
            throw "Invalid server proxy";

        for (int i = 0; i < numReps; ++i) {
            int j = demoPrx->f(i);
            if (j != i + 1) {
                cout << "Invalid result in iteration " << i << endl;
                cout << "Got " << j << ", expected " << i + 1 << endl;
                throw "Computation error";
            }
        }
        cout << "Done!" << endl;
    } catch (const Ice::Exception& ex) {
        cerr << ex << endl;
        status = 1;
    } catch (const char* msg) {
        cerr << msg << endl;
        status = 1;
    }
    if (ic)
        ic->destroy();
    return status;
}

Client usage:
client "server0:default -p 2345" 1

Compilation instructions:
slice2cpp Interface.ice
g++ -I. -I$ICE_HOME/include -c Interface.cpp server.cpp && g++ -o server Interface.o server.o -L$ICE_HOME/lib -lIce -lIceUtil
g++ -I. -I$ICE_HOME/include -c Interface.cpp client.cpp && g++ -o client Interface.o client.o -L$ICE_HOME/lib -lIce -lIceUtil

Comments

  • More input on this case

    I managed to get this example working by:
    • Avoiding collocation in the otherServerProxy (using ice_collocationOptimized).
    • Removing synchronous invocations (checkedCast invokes ice_isA).
    • Adding a first invocation on the server side.

    I do not know what changes when the server invokes a method using a non-collocated proxy, and I haven't had time to look into it, but here is the simplest case I made to show a working/non-working version.
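
    For reference, the first two changes look roughly like this in f_async. This is only a sketch based on the bullets above, and it assumes an Ice release that provides the ice_collocationOptimized proxy method:

    // f_async without the blocking checkedCast, and with collocation
    // optimization disabled on the proxy (sketch, untested).
    virtual void f_async(const AMD_DemoServer_fPtr& cb, int val,
                         const Ice::Current&)
    {
        try {
            Ice::ObjectPrx baseL = ic_->stringToProxy(otherServerProxy_);
            // uncheckedCast makes no remote invocation, so the dispatch
            // thread never blocks waiting for the other server.
            DemoServerPrx demoPrx = DemoServerPrx::uncheckedCast(
                baseL->ice_collocationOptimized(false));
            demoPrx->g_async(new FCallBack(cb), val);
        } catch (const Ice::Exception& e) {
            cb->ice_exception(e);
        }
    }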