
possible thread starvation issue, proxy hangs

In my application there is an MFC UI and a Linux server app. Normally the server just updates the UI via a proxy, but occasionally the UI thread needs to call something on the server because the user changed something. All callbacks are protected by mutexes, and 99% of the time everything works fine, but once in a while, when many things are happening at once, a call on one of the proxies will throw an exception, or the call executes on the server but the proxy just hangs. The mutex handling in the callbacks has been double- and triple-checked and looks fine.

I was under the impression that with the default configuration all calls are basically serialized, but it seems that when many things happen at once Ice has problems. By the way, there are no nested callbacks anywhere. Would just raising Ice.ThreadPool.Server.SizeMax or Ice.ThreadPool.Client.SizeMax have a chance of solving this? What reason could there be for a proxy to hang in the UI after the call has seemingly executed successfully on the server side? If too many requests come in too short a time, will the proxies ever throw exceptions, or will they just wait? What is this thread starvation/deadlock risk? Is that even possible without nested callbacks? Any help much appreciated.

thanks,
peter

Comments

  • benoit (Rennes, France)
    Hi,

    Which Ice version do you use?

    Increasing the number of thread pool threads could be a solution if the problem is caused by thread starvation but before doing this you should ensure this is the case.

    If the client request still hangs after it was dispatched by the server and the server sent the response, this usually indicates that the client thread pool thread is busy doing something else instead of reading the server's response from the outgoing connection. This can occur if you're using bidirectional connections or AMI. Is this the case?

    The best way to investigate deadlock or hang issues is to attach to the process with the debugger and check the stack trace of each thread. If you post the traces here, we'll be happy to take a look.
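
    For example, on the Linux server side you can attach gdb and dump every thread (replace <pid> with the actual process id; on the Windows side, the Visual Studio "Threads" window gives you the same information):

       gdb --pid <pid>
       (gdb) thread apply all bt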

    Cheers,
    Benoit.
  • I use Ice 3.2.0 and unfortunately don't have the stack traces right now. I am not using bidirectional connections or AMI. What happens is that several proxies try to contact the UI at the same time, possibly coinciding with a call to ice_ping on a separate thread. There seems to be a pattern where, if too many calls are made on different proxies at once, the UI side hangs. When the UI is closed, all the backed-up calls are executed at once on the server side. For some reason the server seems to handle this gracefully once the UI exits, but the UI hangs. Is there any way that too many simultaneous calls on proxies can cause Ice to hang?

    There is a possibly related issue where, on startup, many proxies are registered at once by the UI. Each proxy has its own thread in the UI. Rarely, only a few will register and then Ice will hang. If thread starvation is a possible cause, which should I change, the Ice.ThreadPool.Client or Ice.ThreadPool.Server properties?

    Thanks,
    Peter
  • matthew (NL, Canada)
    I'm sorry, Peter, but I'm afraid I don't understand the above explanation because the terminology is a bit mixed up. Clients call on Ice objects hosted in servers using a proxy.
    There is a possibly related issue where, on startup, many proxies are registered at once by the UI.

    What do you mean by that? Do you mean that the UI calls on the server to register its callback objects?
    Each proxy has its own thread in the UI.

    I'm afraid I don't understand what you mean here either. Is this a thread that you allocate? What does this thread do, and why do you want to devote a thread to a proxy?
    Rarely, only a few will register and then Ice will hang. If thread starvation is a possible cause, which should I change, the Ice.ThreadPool.Client or Ice.ThreadPool.Server properties?

    It sounds like perhaps what is occurring is that you sometimes get a callback on one of the previously registered objects before all the callbacks are registered. That is, typically the client does:
    foreach callback proxy:
       server->register(callback)
    

    Typically this completes with no callback being made before the entire group is registered; however, if a callback does arrive during the registration process, it causes a hang.

    Is the object adapter activated in the UI at the point that you are making these calls? If not, the calls from the server will hang, blocking the calling thread. If the callbacks are made from threads allocated from the server-side thread pool, and there is only a single thread in that pool, then no further invocations can be handled and all calls on the server will block.
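
    In other words, the UI side should do something along these lines before any callbacks can arrive. This is a sketch only: the adapter name, the servant variable, and the registration operation are illustrative, and it assumes an initialized communicator with UICallbacks.Endpoints set in the configuration.

       // Create and activate the callback adapter *before* registering
       // any callback proxy with the server.
       Ice::ObjectAdapterPtr adapter = communicator->createObjectAdapter("UICallbacks");
       Ice::ObjectPrx obj = adapter->add(servant, communicator->stringToIdentity("callback"));
       adapter->activate();
       // Only now is it safe to hand the proxy to the server
       // (registerCallback and CallbackPrx are illustrative names).
       server->registerCallback(CallbackPrx::uncheckedCast(obj));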

    The solution here is to either:
    - increase the size of the server-side thread pool (set Ice.ThreadPool.Server.Size to some number > 1; see the sketch below), or
    - make the callbacks to the UI using some thread other than a thread from the server-side thread pool. You can do this with a work queue -- see demo/IceUtil/workqueue for an example.
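
    For example, in the server's configuration (the numbers are only illustrative; size the pool for the concurrency you actually expect):

       # Several dispatch threads, so one blocked callback cannot
       # starve all the other invocations.
       Ice.ThreadPool.Server.Size=4
       Ice.ThreadPool.Server.SizeMax=10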

    You might also want to review your UI code to ensure that you are not updating the UI directly from callbacks. This is, in general, not safe! I wrote a series of four articles on integrating UIs with Ice, starting in issue 12 of Connections - http://www.zeroc.com/newsletter/issue12.pdf. You might also want to look at our bundled MFC demo - demo/Ice/MFC.
  • Thanks for the response; here are some answers to your questions:

    What do you mean by that? Do you mean that the UI calls on the server to register its callback objects?
    Yes, a typical interface looks like this:

    // Kicker

    interface IKickerInstanceClient {
        void updateParameters(XMLKickerElement parameters, bool playSound);
    };

    interface IKickerInstance {
        void registerClient(IKickerInstanceClient* client);
        void updateParameters(XMLKickerElement parameters);
        XMLKickerElement getParameters();
    };

    sequence<Object*> KickerInstanceSeq;

    The server side implements IKickerInstance and the client implements IKickerInstanceClient.

    I'm afraid I don't understand what you mean here either. Is this a thread that you allocate? What does this thread do, and why do you want to devote a thread to a proxy?

    Each dialog is modal and contained in a class. There is a thread that runs the modal dialog and terminates on exit. This way dialogs can be created and destroyed dynamically by non-MFC code. Each UI has a corresponding thread and a proxy to its server.

    It sounds like perhaps what is occurring is that you sometimes get a callback on one of the previously registered objects before all the callbacks are registered. That is, typically the client does:

    Hmm, quite possibly. This still doesn't explain the random times Ice hangs after things have been working correctly for hours.

    You might also want to review your UI code to ensure that you are not updating the UI directly from callbacks.

    I only "invalidate", in MFC terms, on callbacks; this returns immediately and just tells Windows to repaint the next time it loops around. I think the problem lies in the fact that there are mutexes for basically every callback, and in some other parts of the UI code, to prevent the parameters being passed back and forth from being corrupted. I can't see any way of taking the mutexes out of the UI code without the risk that Ice and the UI will try to edit the parameters at the same time. Once again, the mutex code has been double-checked and works properly 99% of the time.
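
    Roughly, a client-side callback looks like this (simplified; KickerInstanceClientI, mutex_, parameters_ and dialog_ are illustrative names, and this assumes XMLKickerElement maps to a C++ struct):

       // Simplified servant method; the UI thread takes the same mutex_
       // whenever it reads or writes parameters_.
       void
       KickerInstanceClientI::updateParameters(const XMLKickerElement& parameters,
                                               bool /*playSound*/, const Ice::Current&)
       {
           IceUtil::Mutex::Lock lock(mutex_);
           parameters_ = parameters; // copy the shared data under the lock
           dialog_->Invalidate();    // returns immediately; Windows repaints
                                     // on the next pass of the message loop
       }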

    Possibly there are just too many threads trying to use Ice at once, which causes Ice to hang on rare occasions? Some of the callbacks do a little computation, and all of them take mutexes, so they aren't lightning fast. Either way, I have raised the SizeMax of the thread pools on both the UI and the server without any issues...
  • Also, another thing I forgot to mention explicitly is that each server instance calls ice_ping once per second. This means that every second, at basically the same time, ~20 ice_pings are attempted (one per proxy), each by a different thread. The function that calls the ping is protected by a mutex, which I'm now realizing is probably unnecessary because the proxy is thread-safe. Just to clarify, there is only one mutex per server and one mutex per client instance, but it is used to protect almost every function because the data is shared within the object. Does this seem like a likely candidate for causing the rare Ice hangs?
  • matthew (NL, Canada)
    pdb1013 wrote:
    You might also want to review your UI code to ensure that you are not updating the UI directly from callbacks.

    I only "invalidate", in MFC terms, on callbacks; this returns immediately and just tells Windows to repaint the next time it loops around. I think the problem lies in the fact that there are mutexes for basically every callback, and in some other parts of the UI code, to prevent the parameters being passed back and forth from being corrupted. I can't see any way of taking the mutexes out of the UI code without the risk that Ice and the UI will try to edit the parameters at the same time. Once again, the mutex code has been double-checked and works properly 99% of the time.

    Possibly there are just too many threads trying to use Ice at once, which causes Ice to hang on rare occasions? Some of the callbacks do a little computation, and all of them take mutexes, so they aren't lightning fast. Either way, I have raised the SizeMax of the thread pools on both the UI and the server without any issues...

    Too many threads using Ice will not cause random hangs. Hangs are most typically caused by deadlocks in your code (thread A locking mutex M1 and then trying to acquire M2, while thread B has locked M2 and is trying to acquire M1).
    Also, another thing I forgot to mention explicitly is that each server instance calls ice_ping once per second. This means that every second, at basically the same time, ~20 ice_pings are attempted (one per proxy), each by a different thread. The function that calls the ping is protected by a mutex, which I'm now realizing is probably unnecessary because the proxy is thread-safe.

    What is this ping for? Is it so the server can detect the client going away? Since you are sending callbacks, why do you need that? You'll know the client has disappeared when a callback fails. If the ping is there for the client to detect the server going away, you should probably ping from the client side.

    At any rate, 20 pings a second is certainly excessive :) If this ping from the server is really necessary, you should probably move to a session model, where the session is responsible for pinging. Look at demo/Ice/session for an example.
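
    The basic shape of the session approach is something like this (illustrative Slice only; see the demo for the complete pattern):

       interface Session {
           // The client calls refresh() periodically; if no refresh arrives
           // within the timeout, the server destroys the session and stops
           // using the associated callbacks.
           void refresh();
           void destroy();
       };

       interface SessionFactory {
           Session* create();
       };
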
    Just to clarify, there is only one mutex per server and one mutex per client instance, but it is used to protect almost every function because the data is shared within the object. Does this seem like a likely candidate for causing the rare Ice hangs?

    No, it doesn't sound likely. The best way to find the reason for the hang is to break into your application with a debugger when the hang occurs; then you will see exactly what is going on.
  • Too many threads using Ice will not cause random hangs. Hangs are most typically caused by deadlocks in your code (thread A locking mutex M1 and then trying to acquire M2, while thread B has locked M2 and is trying to acquire M1).

    What I am going to do to address this is change everything involving the UI mutex to use trylock instead of lock, and just return safely if the lock isn't acquired. This way only the server mutex is blocking, and if there is ever a deadlock it will have to be on the server. What is the proper way to use the trylock helper object? If I just use "IceUtil::Mutex::TryLock lock(uimutex_);", how am I to tell whether the lock was acquired? Is there no exception-safe implementation of trylock?


    What is this ping for? Is it so the server can detect the client going away? Since you are sending callbacks, why do you need that? You'll know the client has disappeared when a callback fails. If the ping is there for the client to detect the server going away, you should probably ping from the client side.

    At any rate, 20 pings a second is certainly excessive :) If this ping from the server is really necessary, you should probably move to a session model, where the session is responsible for pinging. Look at demo/Ice/session for an example.


    The ping is so the server can detect the client going away. The computers running the server and the client are in separate locations, and if the connection goes down the server needs to know immediately. I will definitely look into the session model as an alternative. Is there any possibility of a ping and a proxy call occurring at the same time causing problems? Thanks again.
  • matthew (NL, Canada)
    pdb1013 wrote:
    ...
    What I am going to do to address this is change everything involving the UI mutex to use trylock instead of lock, and just return safely if the lock isn't acquired. This way only the server mutex is blocking, and if there is ever a deadlock it will have to be on the server. What is the proper way to use the trylock helper object? If I just use "IceUtil::Mutex::TryLock lock(uimutex_);", how am I to tell whether the lock was acquired? Is there no exception-safe implementation of trylock?

    You should call acquired() on the TryLock object to find out whether the lock was obtained. The helper is exception-safe: if the lock was acquired, it is released when the helper goes out of scope.
    void
    dosomething()
    {
        IceUtil::Mutex::TryLock lock(_mut);
        if(!lock.acquired())
        {
            // The lock was not acquired; another thread holds the mutex.
            return;
        }
        // The lock was acquired and is released automatically when 'lock'
        // goes out of scope, even if an exception is thrown.
    }
    

    However, this does not sound like a very good solution. Surely you don't want to lose updates from the server? If I were you, I would figure out why you are really getting unexpected deadlocks and fix the source of the problem.
    The ping is so the server can detect the client going away. The computers running the server and the client are in separate locations, and if the connection goes down the server needs to know immediately. I will definitely look into the session model as an alternative.

    Typically you would ping from the client to the server, and use a timeout on the server side to detect the client disappearing. See demo/Ice/session for an example.
    Is there any possibility of a ping and a proxy call occurring at the same time causing problems? Thanks again.

    Ice has no problems with concurrent calls.
  • I am making the proxy call return a boolean to inform the server whether the update was successful. If not, it can deal with it in a sensible way (reset the flag that says the client needs to be updated and try again in a bit). I feel problems like this are inherently very difficult to reproduce and track down, especially when things get mixed in with UI/Windows code. The proxy hang/deadlock occurs about once a week in code that runs 24/7 and makes several calls on proxies per second. If you can use a design pattern from the start where you know deadlock is impossible, isn't that a very good solution? Either way, this has been a big help -- keep up the good work.
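
    In Slice terms, the callback operation just becomes (roughly):

       interface IKickerInstanceClient {
           bool updateParameters(XMLKickerElement parameters, bool playSound);
       };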