Home Help Center

using Monitor segfaults

timruijstimruijs Member Tim RuijsOrganization: VialisProject: Publish/Subscribe Framework
Could someone please verify the following use of Monitor, wait and notify?
Mind the lock scope and iteration order.

myClass::run
{
{ // lock scope
IceUtil::Monitor<IceUtil::Mutex>::Lock lock( mResourceShared );
while ( Quit == false )
{
// wait for command from other thread
mResourceShared.wait();

// check shared resource for commands to execute

// notify other thread command has executed
mResourceShared.notify();
}
} // end lock scope
}

other threads invoke:
myClass::Set( <command_to_execute> )
{
IceUtil::Monitor<IceUtil::Mutex>::Lock lock( mResourceShared );

// store <command_to_execute> in shared memory

// notify command handler
mResourceShared.notify();

// wait for confirm from handler
mResourceShared.wait();
}



Purify detects Fatal core dump
__pthread_alt_unlock [spinlock.c]
pthread_mutex_unlock [libpthread.so.0]
pthread_cond_wait
void IceUtil::Cond::waitImpl<IceUtil::Mutex>(IceUtil::Mutex const &) [Config.h:169]
IceUtil::Monitor<IceUtil::Mutex>::wait(void) [Config.h:169]
myClass::run(void)

:confused:

Comments

  • michimichi Member Michi HenningOrganization: Triodia TechnologiesProject: I have a passing interest in Ice :-) ✭✭✭
    Hmmm... It is unlikely that there a bug in the mutex implementation. (Ice itself would most likely have uncovered any such bug long ago because mutexes are used extensively in the internals of Ice.)

    Can you reproduce this in a stand-alone simple example? You may have stomped on memory elsewhere in your code and caused the problem there, only to have it manifest itself later.

    Your code contains a potential livelock, by the way.

    Assume that the run() thread is asleep in its call to wait(), so the critical region is unlocked. Now two other threads more or less simultaneously call Set().

    The first thread deposits a command, calls notify() and then wait(), unlocking the critical region. There is no guarantee which thread will be scheduled next, so assume that the second thread that called Set() gets to run next. That thread also acquires the mutex, deposits a command, calls notify() and goes to sleep in wait().

    At this point, two calls to notify() have been made, both of which may have marked the run() thread as runnable. Now the run() thread gets the CPU, executes both commands, calls notify() and then wait(), unlocking the critical region. At this point, only one of the two Set() threads that called wait() will wake up; the second thread sleeps forever because one of the two calls to notify() was lost.

    Another scenario: as before, two threads more or less simultaneously call Set(). The first thread runs, notifies, and goes to sleep in wait(). Now the second thread enters Set(), deposits a command, calls notify() and then wait(). However, the call to notify() may wake up the first thread that is still asleep in Set() instead of the thread that is asleep in run().

    So, your code will work correctly only if all calls to Set() are made from a single thread. More commonly, the kind of thing you are trying to do is done by having the Set() thread put a condition variable into the work queue together with the command, and to wait on that condition variable. Every time the run() thread finishes a command, it notifies on the condition variable it finds with the command. That way, each thread that sleeps in Set() waits on a separate condition variable, so the run() thread can selectively wake up the threads that have deposited a command.

    And the run thread should sleep on a separate condition variable that counts the number of commands, with the Set() threads incrementing the counter whenever they deposit a command, and the run thread decrementing it.

    Cheers,

    Michi.
  • timruijstimruijs Member Tim RuijsOrganization: VialisProject: Publish/Subscribe Framework
    Thanx for your quick response!

    My assumption was that the wait() would (e.g. be fifo based and) maintain the invocation order. This proved not to be the case causing the relation between caller and callee to break, and therefore I can recognise the potential livelock, point taken. ;)

    Currently I am implementing your suggestion to accompany the command with a condition variable. keep you posted
Sign In or Register to comment.