Python Deadlock Problem

I have a servant implemented in Python. It makes a call to another servant through a proxy; the target servant runs on the same machine (in the same Python interpreter, in fact). The invocation reaches the remote function, but the call then never returns.

I theorized that this was due to the Python GIL and found this thread and the note about multithreading in Python in the FAQ.

I increased the thread pool size to 2 and it stopped deadlocking. But the FAQ seems to imply that Ice should have released the GIL when the servant invoked the proxy in the first place. Am I doing this correctly? Will I always have to run the server with a thread pool size greater than 1?

Note: I also have an implementation in C# that makes this same call with no problems. I could switch the Python implementation to C++ (since it only requires a C library that I'm using a Python wrapper for), but I would prefer not to. The single-threaded nature of Python isn't a problem, since the server will only be servicing one client at a time anyway. I mostly chose Ice for the RPC benefits and the multiple language support.

Comments

  • mes (California)
    Hi,

    Welcome to the forum.

    The hang you were experiencing (before you increased the size of the thread pool) was not due to the GIL but rather to a lack of threads. Ice's "collocation optimization" feature is not available in Python, which means all invocations (whether on a remote or local object) are marshaled, sent over the "wire" (which may only be the localhost interface), and unmarshaled. Let's consider your scenario with the default thread pool configuration of one thread:
    • You're dispatching an invocation, which means you're currently using the one and only server-side thread.
    • The servant makes a nested invocation on another local object. This request is marshaled and sent over a local connection.
    • No more server-side threads are available to read and dispatch the nested invocation, so the server deadlocks.
    Increasing the size of the server-side thread pool works around the issue, but it's not the cleanest solution because you haven't eliminated the possibility of a deadlock, only made it less likely. It will happen again if the number of concurrent invocations ever reaches whatever arbitrary size you've chosen for the thread pool.
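
    For reference, the workaround you applied is usually configured through the Ice.ThreadPool.Server.Size property. Here's a minimal sketch; the property name is standard Ice, but the surrounding setup is just illustrative:

    import sys
    import Ice

    # Allow a second server-side thread so a nested invocation can be
    # read and dispatched while the first invocation is still executing.
    props = Ice.createProperties(sys.argv)
    props.setProperty("Ice.ThreadPool.Server.Size", "2")

    initData = Ice.InitializationData()
    initData.properties = props
    communicator = Ice.initialize(sys.argv, initData)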

    A more robust solution is to use asynchronous invocation and dispatch. Admittedly, it's also a bit more complex to implement, but it's not too bad (especially in Python). We call the idiom AMI/AMD chaining, and we use it a lot in our own code. Here's the basic idea:
    • A servant operation is implemented using AMD. It receives an AMD callback object that it must invoke when the operation completes.
    • The servant makes a nested invocation using AMI. It creates an AMI callback object that holds the AMD callback. When the AMI call completes, the AMI callback invokes the AMD callback.
    With this strategy, the initial servant invocation no longer ties up a dispatch thread while it waits for its nested invocation to complete.

    This doesn't happen in a language like C++ or C# because there the nested invocation is performed using collocation optimization: it becomes a direct method call on the other servant and therefore doesn't require an additional thread pool thread to unmarshal and dispatch the request.

    Hope that helps,
    Mark
  • Thanks.

    So the servant that calls the second servant depends on the return value of the nested call in order to compute its own return value.

    Say I have the following:
    interface Person {
        ...
    };

    sequence<Person*> People;

    interface Registry {
        People getPeople();
    };

    interface World {
        bool hasPeople(Registry* reg);
    };

    Then if the implementation of World that I want to do is:
    class WorldImpl(World):
        def hasPeople(self, reg, current=None):
            return len(reg.getPeople()) > 0
    

    How would I do this using asynchronous calls?
  • mes (California)
    First you need to use metadata to enable AMD for hasPeople:

    ["amd"] bool hasPeople(Registry* reg);

    The implementation of hasPeople goes something like this:
    class WorldI(World):
        def hasPeople_async(self, cb, reg, current=None):
            # AMI callback that holds the AMD callback: when the nested
            # getPeople call completes, it completes the pending hasPeople
            # dispatch by invoking the AMD callback.
            class getPeopleCallback(object):
                def __init__(self, amdCallback):
                    self.amdCallback = amdCallback

                def response(self, people):
                    self.amdCallback.ice_response(len(people) > 0)

                def exception(self, ex):
                    self.amdCallback.ice_exception(ex)

            amiCallback = getPeopleCallback(cb)
            # Start the nested invocation asynchronously; the dispatch
            # thread is released immediately instead of blocking on the reply.
            reg.begin_getPeople(amiCallback.response, amiCallback.exception)
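
    From the client's side nothing changes: hasPeople is still invoked as an ordinary synchronous call, so the AMD implementation is transparent to callers. A hypothetical usage sketch (the endpoint strings are made up, and since the Slice above declares no module, WorldPrx and RegistryPrx appear as top-level names purely for illustration):

    import sys
    import Ice

    communicator = Ice.initialize(sys.argv)
    try:
        # Both proxy endpoints below are illustrative.
        world = WorldPrx.checkedCast(
            communicator.stringToProxy("world:default -p 10000"))
        registry = RegistryPrx.checkedCast(
            communicator.stringToProxy("registry:default -p 10000"))
        print(world.hasPeople(registry))
    finally:
        communicator.destroy()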
    
    Regards,
    Mark