LocalObject vs Object for "no-Ice" application?

I'll admit, for starters, that I'm not sure exactly what the pathology of this is, but I have a case where an application -- which uses Ice facilities, but not the communicator -- works if I inherit from LocalObject, but not if I inherit from Object.

Specifically, it seems that the presence of an IceUtil::Handle hangs my test app -- in what I had previously believed was a totally unrelated place -- if the template parameter is a class that derives from Ice::Object.

I tried to narrow this down some more, but wasn't able to reproduce it with a trivial use of handles and threads (though I was able to trigger an assertion, as long as I wasn't running under the debugger).

The sample in question is less than 200 lines, though, so I figure I'll throw it up here anyway. I'm running on Fedora Core 1 with all the latest glibc goodies.

If you change
class TimedEventService : virtual public Ice::LocalObject {
to
class TimedEventService : virtual public Ice::Object {
you should see the hang. I certainly do. I don't understand it at all, to be honest.

Mike

Comments

  • Re: LocalObject vs Object for "no-Ice" application?
    Originally posted by shaver
    I'll admit, for starters, that I'm not sure exactly what the pathology of this is, but I have a case where an application -- which uses Ice facilities, but not the communicator -- works if I inherit from LocalObject, but not if I inherit from Object.

    I just tried your example with RH8 (sorry, I don't have RH9 available right now) and gcc 3.2. With that, it works as expected (prints "TimedEventService started" and "TimeEventService stopped").

    Specifically, it seems that the presence of an IceUtil::Handle hangs my test app -- in what I had previously believed was a totally unrelated place -- if the template parameter is a class that derives from Ice::Object.

    I can't reproduce the problem, so what follows is largely speculation...

    Ice::Object is derived from IceUtil::GCShared. In turn, GCShared hooks into the data structures for the garbage collector. The collector is initialized the first time you call Ice::initialize(). If you do not create any communicator in your process, the collector's data structures are not initialized.

    I suspect what is happening is that, due to some difference in static initialization order between RH8 and RH9 (or due to differences in the way you linked your code to the way I linked it), something isn't initialized, causing the problem you are seeing. One way to test this would be to simply add a call to create a communicator to your code (and, of course, properly destroy that communicator again). If that makes the problem go away, we have at least identified the cause.

    Having said all this, it seems a bit strange to want to use Ice::Object without also creating at least one communicator. Ice::Object is the base of all non-local interfaces and classes and, as such, contains fundamental hooks into the Ice run time, such as keeping track of objects for garbage collection and marshaling of objects. If you don't create a communicator, you cannot possibly marshal anything, which raises the question of why you would want to derive something from Ice::Object without also creating a communicator ;)

    Or, to put it differently, the first call to Ice::initialize() initializes the Ice run time and, without having made such a call, you cannot expect things to work that interface with the Ice run time (such as using a remotable object).

    Could you let me know what happens if you instantiate a communicator please? Also, taking a core dump of the hung process and then getting a stack trace of all active threads would be useful. Once we know what causes the problem, I'll certainly try and see if we can come up with a way to avoid the problem you are seeing.

    Cheers,

    Michi.
  • Re: Re: LocalObject vs Object for "no-Ice" application?
    Originally posted by michi
    Ice::Object is derived from IceUtil::GCShared. In turn, GCShared hooks into the data structures for the garbage collector. The collector is initialized the first time you call Ice::initialize(). If you do not create any communicator in your process, the collector's data structures are not initialized.
    Yeah, that's sort of where Vlad and I were leaning as well, which is what led us to experiment (successfully, it turns out) with LocalObject.
    Having said all this, it seems a bit strange to want to use Ice::Object without also creating at least one communicator.
    In this case, I was writing a unit test for something that is used as part of a larger, communicator-having application. I wanted to use the IceUtil::Handle parts, which require that their parameter classes derive from Object or LocalObject, and the docs made it sound like I should prefer Object for everything but servant locators.
    Could you let me know what happens if you instantiate a communicator please? Also, taking a core dump of the hung process and then getting a stack trace of all active threads would be useful. Once we know what causes the problem, I'll certainly try and see if we can come up with a way to avoid the problem you are seeing.
    I'll try to poke at it some more today. gdb doesn't give me very good stack traces of this problem, sadly -- there may be stack corruption as a result of whatever else is going wrong -- but I'll try adding some skidmarks. Thanks!

    Mike
  • Can you please also try running with the environment variable LD_ASSUME_KERNEL set to 2.4.1?
    LD_ASSUME_KERNEL=2.4.1
    export LD_ASSUME_KERNEL
    

    The problem seems to be related to either initialization order (in which case LD_ASSUME_KERNEL should make no difference), or it is related to threading/mutex semantics (in which case it might make a difference).

    Cheers,

    Michi.
  • Originally posted by michi
    Can you please also try running with the environment variable LD_ASSUME_KERNEL set to 2.4.1?
    I actually tried that initially, remembering Bernard's post about nptl threading issues, to no avail. (I have all the glibc updates, too.) Sorry for not mentioning that earlier.

    Mike
  • OK, looks like it's related to threading or mutex semantics then. Hmmm... I don't have a Fedora system available (I guess this is something we need to address...), so I can't track this down. If you can find the time, it would be great to get some more info. Single-stepping with a debugger might help to get you to the point where the hang happens. Once we know which mutex (I'm pretty sure the hang happens on a mutex lock) is responsible, I should be able to figure out what is happening.

    Cheers,

    Michi.
  • mes
    Re: Re: Re: LocalObject vs Object for "no-Ice" application?

    Hi Mike,
    Originally posted by shaver
    I wanted to use the IceUtil::Handle parts, which require that their parameter classes derive from Object or LocalObject, and the docs made it sound like I should prefer Object for everything but servant locators.
    Just a couple of clarifications.

    First, the IceUtil::Handle template technically doesn't require Ice::Object; it's just that Ice::Object implements the interface that IceUtil::Handle expects. If you want to use IceUtil::Handle in your code without using Ice classes, you can derive your class from IceUtil::Shared or IceUtil::SimpleShared.

    Second, we don't really recommend manually deriving a class from Ice::Object. This should generally be left to the Slice compiler.

    Finally, I too built your example and it worked as Michi described. Admittedly, I was using RH9 and our development sources.

    Good luck,
    - Mark
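
    To illustrate the first point, here is a minimal self-contained sketch of what "implements the interface that the handle expects" means. Shared and Handle below are hypothetical stand-ins written for this example, not the IceUtil classes; the real ones use names such as __decRef(), while the stand-ins use incRef()/decRef() because identifiers with a double underscore are reserved in ordinary C++ code. The only assumption is that the handle needs nothing from its parameter class beyond an intrusive reference count.

    ```cpp
    #include <atomic>
    #include <iostream>
    #include <utility>

    // Hypothetical stand-in for IceUtil::Shared: an intrusive reference count.
    class Shared
    {
    public:
        Shared() : _ref(0) {}
        Shared(const Shared&) = delete;
        Shared& operator=(const Shared&) = delete;
        virtual ~Shared() {}

        void incRef() { _ref.fetch_add(1); }
        void decRef()
        {
            // fetch_sub returns the previous value; 1 means the count is now 0.
            if (_ref.fetch_sub(1) == 1)
            {
                delete this;
            }
        }
        int refCount() const { return _ref.load(); }

    private:
        std::atomic<int> _ref;
    };

    // Hypothetical stand-in for IceUtil::Handle: it only calls incRef()/decRef().
    template <typename T>
    class Handle
    {
    public:
        explicit Handle(T* p = nullptr) : _ptr(p) { if (_ptr) _ptr->incRef(); }
        Handle(const Handle& other) : _ptr(other._ptr) { if (_ptr) _ptr->incRef(); }
        Handle& operator=(Handle other) { std::swap(_ptr, other._ptr); return *this; }
        ~Handle() { if (_ptr) _ptr->decRef(); }
        T* operator->() const { return _ptr; }

    private:
        T* _ptr;
    };

    static int liveServices = 0; // counts objects that have not been destroyed yet

    // Derives from the plain reference-counted base, with no collector involved.
    class TimedEventService : public Shared
    {
    public:
        TimedEventService() { ++liveServices; }
        ~TimedEventService() override { --liveServices; }
    };

    int main()
    {
        {
            Handle<TimedEventService> a(new TimedEventService);
            Handle<TimedEventService> b = a; // second handle to the same object
            std::cout << "references: " << a->refCount() << '\n';
        }
        std::cout << "live services after scope: " << liveServices << '\n';
        return 0;
    }
    ```

    With the real library the equivalent step would be to derive TimedEventService from IceUtil::Shared, as described above, rather than from Ice::Object.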
  • Hi Mike,

    thanks for providing me with the account on your machine to work this out.

    I think I have identified the problem. In your code, the destructor of evsvc is called by the main thread at the end of the nested block in main(). Because evsvc is a Handle to a TimedEventService, and TimedEventService is derived from Ice::Object, this causes a call to __decRef() on the IceUtil::GCShared base, which calls delete because the reference count of the object has now dropped to zero. The call to delete is made with a lock held on gcRecMutex. In turn, the call to delete results in a call to ~TimedEventService(), which tries to join with the thread spawned previously.

    Meanwhile, the spawned thread is in its run() method and has just been woken up from its call to wait(). _stopping is now true and run() returns, which results in the destruction of the local variable nowEvent. nowEvent is a handle to a TimedEventService, which is derived from Ice::Object, and the destructor results in a call to the IceUtil::GCShared base as well. As a result, the thread about to terminate is trying to acquire the same lock that is currently held by the main thread in the destructor of TimedEventService.
    The spawned thread can't terminate until the lock is unlocked, but the main thread holds the same lock while it calls join(), so we get a deadlock.

    Looking at the code, the problem appears to be in IceUtil::GCShared::__decRef(). The last few lines read:
    if(doDelete)
    {
        delete this;
    }
    gcRecMutex._m->unlock();
    

    So, ultimately, the problem appears to be that delete is being called while the lock on gcRecMutex is still held. Changing the code to move the unlock() call to before the delete call should fix this:
    gcRecMutex._m->unlock();
    if(doDelete)
    {
        delete this;
    }
    

    I suspect that the problem never showed up so far because of the way threads are scheduled -- presumably, the scheduler or other details of the timing are different under Fedora, making the problem visible.

    Could you please rebuild Ice with the above change (in src/IceUtil/GCShared.cpp) and try again?

    Thanks,

    Michi.
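
    The lock ordering described in this post can be modelled without Ice. The sketch below is only an illustration using standard C++ threads: gcLock, workerReleasesHandle() and releaseLastReference() are hypothetical names invented for the example, a plain std::mutex stands in for gcRecMutex, and joining a thread stands in for the destructor. The worker polls with a deadline so that the example reports a would-be deadlock instead of hanging. It shows the ordering problem only and is not presented as a complete fix for the collector.

    ```cpp
    #include <chrono>
    #include <iostream>
    #include <mutex>
    #include <thread>

    // Hypothetical stand-in for the collector's global lock (gcRecMutex in the post).
    static std::mutex gcLock;

    // Stand-in for the spawned thread destroying its last handle on the way out
    // of run(): it needs gcLock. It polls until a deadline instead of blocking,
    // so a would-be deadlock shows up as "false" rather than as a hang.
    static bool workerReleasesHandle()
    {
        const auto deadline =
            std::chrono::steady_clock::now() + std::chrono::milliseconds(200);
        while (std::chrono::steady_clock::now() < deadline)
        {
            if (gcLock.try_lock())
            {
                gcLock.unlock();
                return true;
            }
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
        return false;
    }

    // Stand-in for the main thread dropping the last reference.
    // Returns true if the worker managed to take gcLock before it was joined.
    static bool releaseLastReference(bool holdLockDuringDelete)
    {
        bool workerGotLock = false;

        gcLock.lock();
        if (!holdLockDuringDelete)
        {
            gcLock.unlock(); // reordered variant: unlock before "delete this"
        }

        // "delete this" runs the destructor, which joins the worker thread.
        // The worker is started here (not earlier) to keep the outcome deterministic.
        std::thread worker([&workerGotLock] { workerGotLock = workerReleasesHandle(); });
        worker.join();

        if (holdLockDuringDelete)
        {
            gcLock.unlock(); // original variant: unlock only after "delete this"
        }
        return workerGotLock;
    }

    int main()
    {
        const bool original = releaseLastReference(true);
        const bool reordered = releaseLastReference(false);
        std::cout << std::boolalpha
                  << "lock held during delete, worker got lock: " << original << '\n'
                  << "lock released before delete, worker got lock: " << reordered << '\n';
        return 0;
    }
    ```

    In the first variant the worker can never obtain the lock while the main thread is waiting in join(), which is the situation that would block forever with a plain lock() call.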
  • Originally posted by michi
    Could you please rebuild Ice with the above change (in src/IceUtil/GCShared.cpp) and try again?
    Worked like a charm. Thanks a ton!

    Mike
  • Yes, doesn't surprise me. Unfortunately, on further examination, making this change opens up another potential race condition with the garbage collector, so it's not really a fix. I need to think about this a bit more -- I'll post a proper fix in the next few days.

    Cheers,

    Michi.
  • OK, I have a correct fix now. Attached are new versions of GCShared.h and GCShared.cpp. The fix will of course be included in the next release of Ice.

    Cheers,

    Michi.