
thread pools and deadlocks

Quoting 29.2 in the documentation:
For example, if a server calls back into the client from
within an operation implementation, the client-side receiver thread can
process the request from the server even though the client is waiting for a
reply for its request from the same server.

When it says "process a request from the server", does it mean a full synchronous method invocation from the server back to the client? Said differently, if I have two objects A and B using different communicators, and A::foo() calls B::bar(), then B::bar() calls A::bar() which does nothing but return, should this avoid a deadlock if both communicators have Client and Server thread pool sizes of 1?

The reason I ask is that I am getting different behavior. Both communicators use the default thread pool sizes of 1 (no configuration properties set) and are in different processes. In addition to synchronous method invocations, ice_ping() and checkedCast() deadlock as well. As expected, raising the ThreadPool.SizeMax values fixes the problem.
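
For reference, neither process sets any thread pool properties, so as far as I can tell both communicators are running with the documented defaults, i.e. something like:

    Ice.ThreadPool.Client.Size=1
    Ice.ThreadPool.Server.Size=1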

Comments

  • benoit (Rennes, France)
    andhow wrote:
    Quoting 29.2 in the documentation:


    When it says "process a request from the server", does it mean a full synchronous method invocation from the server back to the client?

    Yes.
    andhow wrote:
    Said differently, if I have two objects A and B using different communicators, and A::foo() calls B::bar(), then B::bar() calls A::bar() which does nothing but return, should this avoid a deadlock if both communicators have Client and Server thread pool sizes of 1?

    I'm not sure I understand the question: what should prevent the deadlock? The scenario you describe looks like demo/Ice/callback, where the client calls Callback::initiateCallback() on the server and the server calls back CallbackReceiver::callback() on the client. The thread pool sizes are set to 1 in that example and there are no deadlocks.
    andhow wrote:
    The reason I ask is that I am getting different behavior. Both communicators use the default thread pool sizes of 1 (no configuration properties set) and are in different processes. In addition to synchronous method invocations, ice_ping() and checkedCast() deadlock as well. As expected, raising the ThreadPool.SizeMax values fixes the problem.

    If you have only one thread in the server thread pool to dispatch incoming requests and this thread is currently busy waiting for a response from another invocation, ice_ping() or checkedCast() invocations on this server will hang. There are simply no more threads to process them.

    We would need to know a bit more about your application to be able to figure out why you're getting a deadlock. Is it similar to the scenario you described, where A::foo() calls B::bar() and B::bar() then calls back A::bar()? Are you locking any mutexes when doing these invocations?

    The best way to quickly solve these problems is to get thread dumps of both processes and analyze them. You can easily get a thread dump of a Java JVM by sending a SIGQUIT signal to the JVM process. For C++ processes, you need to attach to the process with your favorite debugger. If you send me the thread dumps of your client and server, I'll be happy to look at them and try to see where the deadlock is coming from.
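
    For example, on a Unix-like system (adjust for your platform and debugger):

        # Java: dump all thread stacks to the JVM's console output
        kill -QUIT <pid>

        # C++: attach gdb to the process and dump all thread stacks
        gdb -p <pid>
        (gdb) thread apply all bt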

    Benoit.
  • Ah ha, I think I confused myself and over-complicated things. Your response was extremely helpful.

    Reading 29.2 I thought it was saying that a Communicator actually owned two separate pools (client-side/server-side) with different purposes, but I see now the intended meaning: by default there is only one thread pool per Communicator. [right?] Based on this, I oversimplified the example I presented: A::foo() itself is called by a third and separate function, thereby using the one free thread and causing the deadlock.

    If I may ask a general question:
    This seems like other resource/deadlock problems where there is no magic solution. I am new to distributed/multithreaded programming, and I know other forum members are quite the opposite. Are there any well-established algorithms/practices/mechanisms that apply specifically to this situation? I am just looking for keywords or names of papers to read up on; I don't want to take too much of anyone's time.

    Again, many thanks.
  • benoit (Rennes, France)
    andhow wrote:
    Ah ha, I think I confused myself and over-complicated things. Your response was extremely helpful.

    Reading 29.2 I thought it was saying that a Communicator actually owned two separate pools (client-side/server-side) with different purposes, but I see now the intended meaning: by default there is only one thread pool per Communicator. [right?] Based on this, I oversimplified the example I presented: A::foo() itself is called by a third and separate function, thereby using the one free thread and causing the deadlock.

    You were right in the first place ;). The communicator does indeed have two different thread pools. However, unless you're using AMI, I would ignore the client thread pool (the client thread pool only calls user code for AMI callback objects). What is important in the callback demo and in your application is the server thread pool, i.e., the thread pool used to dispatch requests from your client.
    andhow wrote:
    If I may ask a general question:
    This seems like other resource/deadlock problems where there is no magic solution. I am new to distributed/multithreaded programming, and I know other forum members are quite the opposite. Are there any well-established algorithms/practices/mechanisms that apply specifically to this situation? I am just looking for keywords or names of papers to read up on; I don't want to take too much of anyone's time.

    Again, many thanks.

    There are indeed no magic solutions; the key is to understand which threads are involved in the execution of your code.

    If A::foo() is being executed from a thread of the server thread pool, this explains why the call to A::bar() hangs. There are no more threads in the server thread pool to dispatch A::bar(), since the only thread of the pool is busy dispatching A::foo(), and A::foo() is waiting for B::bar() to return.
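
    To make the mechanism concrete, here is a small, self-contained C++ sketch (it does not use Ice at all; the SingleThreadPool class is made up for illustration). The "pool" has exactly one worker thread; foo() runs on that worker and synchronously waits for bar(), and bar() can never be dispatched because the only worker is still busy running foo(). The program intentionally hangs, which is exactly the starvation described above:

      #include <condition_variable>
      #include <functional>
      #include <future>
      #include <iostream>
      #include <mutex>
      #include <queue>
      #include <thread>

      // A toy dispatcher with a single worker thread, standing in for a server
      // thread pool of size 1. (Shutdown handling is omitted; the demo deadlocks
      // before the pool would ever be destroyed.)
      class SingleThreadPool
      {
      public:
          SingleThreadPool() : worker_([this] { run(); }) {}

          void post(std::function<void()> task)
          {
              std::lock_guard<std::mutex> lock(mutex_);
              tasks_.push(std::move(task));
              cond_.notify_one();
          }

      private:
          void run()
          {
              for(;;)
              {
                  std::function<void()> task;
                  {
                      std::unique_lock<std::mutex> lock(mutex_);
                      cond_.wait(lock, [this] { return !tasks_.empty(); });
                      task = std::move(tasks_.front());
                      tasks_.pop();
                  }
                  task(); // only one task can run at a time
              }
          }

          std::mutex mutex_;
          std::condition_variable cond_;
          std::queue<std::function<void()>> tasks_;
          std::thread worker_; // the single "dispatch" thread
      };

      int main()
      {
          SingleThreadPool pool;
          std::promise<void> fooDone;

          pool.post([&] {                      // plays the role of A::foo()
              std::cout << "foo: dispatched, now calling bar()" << std::endl;
              std::promise<void> barDone;
              pool.post([&] {                  // plays the role of A::bar()
                  barDone.set_value();
              });
              barDone.get_future().wait();     // hangs: no free thread to run bar()
              fooDone.set_value();
          });

          fooDone.get_future().wait();         // so this never returns either
          std::cout << "never reached" << std::endl;
          return 0;
      }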

    This problem is a classic thread starvation problem (you could do a search on Google for "thread starvation"). There are several ways to avoid thread starvation; here are some:
    • Ice provides dynamic thread pools: a thread pool can grow dynamically to handle new requests when no more threads are available. See Appendix C.6 for how to configure dynamic thread pools; there is also a small configuration sketch after this list.
    • Avoid a design where the callback nesting level is too high in the first place. Try to see if you really need these callbacks; getting rid of them will make your application simpler, less prone to thread starvation, and more efficient.
    • Use AMD/AMI so that a thread from the server thread pool is not tied up while a remote invocation on another object is in progress.
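
    As a sketch of the first point, a dynamic server thread pool could be configured with properties along these lines (the numbers are purely illustrative; see Appendix C.6 for the exact semantics and defaults):

      # Start with one dispatch thread, but let the server thread pool grow
      # up to 10 threads when all of them are busy.
      Ice.ThreadPool.Server.Size=1
      Ice.ThreadPool.Server.SizeMax=10
      # Optionally warn in the log once 8 threads are in use.
      Ice.ThreadPool.Server.SizeWarn=8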

    Benoit.
  • You were right in the first place ;)
    haha 5am, thank you
  • matthew (NL, Canada)
    benoit wrote:
    Avoid a design where the callback nesting level is too high in the first place. Try to see if you really need these callbacks; getting rid of them will make your application simpler, less prone to thread starvation, and more efficient...

    Benoit.

    I just want to expand on Benoit's excellent post.

    Callbacks in both distributed and non-distributed systems are typically bad news. If you can arrange your systems such that all calls flow in one direction, then you cannot have deadlocks. Deadlocks occur when you have bi-directional flow -- and they can be particularly hard to diagnose and fix when the flow involves lots of different cooperating objects.

    There are many different strategies to avoid callbacks in your code. Here are a couple.
    • Whenever possible, arrange to pass in the data that is required to process the request, rather than requiring the callee to ask the caller for the data (there is a small sketch of this after the list).
    • In a typical manager/managee structure, the managee quite often has to let the manager know about various state changes (for example, a destroy call). In this case you can eliminate the callback with a form of lazy polling. For example, in an event distribution system the manager has to call the managees on a frequent basis to give them new data to process; the managee can then indicate that it has been destroyed in its reply to one of those regular calls, instead of calling back into the manager.
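
    As a rough sketch of the first strategy (the interface names below are made up for illustration; they are not from Ice or any demo), compare a design where the server has to call back into the client for its input with one where the client passes the input along with the request:

      #include <vector>

      // Callback style: while dispatching process(), the server calls
      // provider.getData() back on the client, so calls flow in both directions
      // and a starved thread pool can deadlock.
      struct DataProvider                   // implemented by the client
      {
          virtual std::vector<int> getData() = 0;
          virtual ~DataProvider() = default;
      };

      struct Processor                      // implemented by the server
      {
          virtual int process(DataProvider& provider) = 0;
          virtual ~Processor() = default;
      };

      // One-directional style: the client passes everything the server needs up
      // front, so all calls flow from client to server and no callback is needed.
      struct ProcessorNoCallback            // implemented by the server
      {
          virtual int process(const std::vector<int>& data) = 0;
          virtual ~ProcessorNoCallback() = default;
      };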

    Sometimes callbacks are unavoidable... in this case you must carefully analyse the object call graphs and locks to ensure that you do not cause deadlocks.

    Regards, Matthew
  • I see. I am starting to see the differences between programming in this environment and the single-process GUI apps and socket-based FPS games I have worked on. I appreciate all this advice.
    Sometimes callbacks are unavoidable... in this case you must carefully analyse the object call graphs and locks to ensure that you do not cause deadlocks.
    Is there a commonly used formal method for analyzing thread-pool deadlocks? I recall things like the Banker's algorithm being used for generic resource deadlock prevention. Of course, using a one-directional flow of control simplifies things...
  • benoit (Rennes, France)
    I'm not aware of any specific formal methods for analyzing thread pool deadlocks. Threads from the thread pool are resources, so the Banker's algorithm could probably be used here; however, it would probably be very difficult to implement efficiently in such a distributed environment :).

    Benoit.