Ice 3.5.0: slice2cpp generated code relies on static initialization order => crash

grembogrembo Member Michael GmelinOrganization: Grem Equity GmbHProject: E-Commerce platform
Hi,

While testing Ice on FreeBSD-CURRENT using clang 3.3 (LLVM version numbering, so the current release), a classic rely on static initialization order problem surfaced that will sooner or later hit one of your supported platforms as well.

slice2cpp generates an initialization classes and one constant global instance each per non-local exception and/or class. Those initialization classes in turn depend on factoryTable, which is declared and statically instantiated by including cpp/include/Ice/FactoryTableInit.h. Due to this dependency and the fact that there are no guarantees about initialization and destruction order this can (and apparently does) lead to disaster, specifically when exiting the program and factoryTable being deleted before its last use by generated stub classes (segmentation fault, bus error etc.).

Generated example code (based on cpp/test/Slice/keyword/Key.ice:):
class __F__and__return__Init
{
public:

    __F__and__return__Init()
    {
        ::IceInternal::factoryTable->addExceptionFactory("::and::return", new __F__and__return);
    }

    ~__F__and__return__Init()
    {
        ::IceInternal::factoryTable->removeExceptionFactory("::and::return");
    }
};

const __F__and__return__Init __F__and__return__i;

The attached patch fixes the problem by making sure that the lifetime of factoryTable exceeds the lifetime of const __F__and__return__Init __F__and__return__i / generated classes.

After applying the patch, slice2cpp generates the following code, that allows all unit tests to finish successfully:
class __F__and__return__Init
{
    IceInternal::FactoryTableInit* _ftableinit;
public:

    __F__and__return__Init(): _ftableinit(new IceInternal::FactoryTableInit)
    {
        ::IceInternal::factoryTable->addExceptionFactory("::and::return", new __F__and__return);
    }

    ~__F__and__return__Init()
    {
        ::IceInternal::factoryTable->removeExceptionFactory("::and::return");
        delete _ftableinit;
    }
};

const __F__and__return__Init __F__and__return__i;

Patch (Attachment not found.):
--- cpp/src/slice2cpp/Gen.cpp.orig      2013-06-23 01:25:10.126254943 +0000
+++ cpp/src/slice2cpp/Gen.cpp   2013-06-23 02:15:02.159048012 +0000
@@ -756,10 +756,11 @@
 
         C << sp << nl << "class " << factoryName << "__Init";
         C << sb;
+        C << nl << "IceInternal::FactoryTableInit* _ftableinit;";
         C.dec();
         C << nl << "public:";
         C.inc();
-        C << sp << nl << factoryName << "__Init()";
+        C << sp << nl << factoryName << "__Init(): _ftableinit(new IceInternal::FactoryTableInit)";
         C << sb;
         C << nl << "::IceInternal::factoryTable->addExceptionFactory(\"" << p->scoped() << "\", new "
           << factoryName << ");";
@@ -767,6 +768,7 @@
         C << sp << nl << "~" << factoryName << "__Init()";
         C << sb;
         C << nl << "::IceInternal::factoryTable->removeExceptionFactory(\"" << p->scoped() << "\");";
+        C << nl << "delete _ftableinit;";
         C << eb;
         C << eb << ';';
 
@@ -3976,10 +3978,11 @@
             C << sp;
             C << nl << "class " << factoryName << "__Init";
             C << sb;
+            C << nl << "IceInternal::FactoryTableInit* _ftableinit;";
             C.dec();
             C << nl << "public:";
             C.inc();
-            C << sp << nl << factoryName << "__Init()";
+            C << sp << nl << factoryName << "__Init(): _ftableinit(new IceInternal::FactoryTableInit)";
             C << sb;
             if(!p->isAbstract())
             {
@@ -4002,6 +4005,7 @@
             {
                 C << nl << "::IceInternal::factoryTable->removeTypeId(" << p->compactId() << ");";
             }
+            C << nl << "delete _ftableinit;";
             C << eb;
             C << eb << ';';

Thanks to Dimitry Andric <[email protected]> who analyzed the problem and tracked down the static initialization problem (see also Are ports supposed to build and run on 10-CURRENT?).

Cheers,
Michael

Comments

  • benoitbenoit Rennes, FranceAdministrators, ZeroC Staff Benoit FoucherOrganization: ZeroC, Inc.Project: Ice ZeroC Staff
    Hi Michael,

    Hmm, this shouldn't be necessary, static initialization order should be guaranteed already thanks to the inclusion of <Ice/FactoryTableInit.h> in Key.h. We use a Schwarz counter to ensure that the factory table global is initialized first (FactoryTableInit.h declares "static FactoryTableInit factoryTableInitializer"). It's not clear to me why this wouldn't work with clang 3.3, what is the stack trace of the crash?

    Cheers,
    Benoit.
  • grembogrembo Member Michael GmelinOrganization: Grem Equity GmbHProject: E-Commerce platform
    Hi Benoit,

    The problem is not initialization (you are right that it's always initialized before it's accessed first), but destruction. On exit, the last instance of static FactoryTableInit factoryTableInitializer, which was constructed by inclusion of FactoryTableInit.h, gets destructed before the last instance of const __F_*__Init __F__*__i. Therefore the reference count drops to zero and factoryTable gets deleted in ~FactoryTableInit before the destructor of __F_*__Init __F__*__Init tries to access factoryTable (calling removeTypeId or removeExceptionFactory) and the result is a bus error when trying to acquire the lock on the member mutex _m, which already had been destroyed.
    Core was generated by `client'.
    Program terminated with signal 10, Bus error.
    ...
    #0  0x0000000801f0a467 in pthread_mutex_destroy () from /lib/libthr.so.3
    [New Thread 803006400 (LWP 101068/client)]
    (gdb) bt
    #0  0x0000000801f0a467 in pthread_mutex_destroy () from /lib/libthr.so.3
    #1  0x0000000800bded05 in IceUtil::Mutex::lock (this=0x80302a020) at Mutex.h:294
    #2  0x0000000800bdecd7 in LockT (this=0x7fffffffd700, [email protected]) at Lock.h:49
    #3  0x0000000800bddb0d in LockT (this=0x7fffffffd700, [email protected]) at Lock.h:51
    #4  0x0000000800c72614 in IceInternal::FactoryTable::removeExceptionFactory (this=0x80302a020, 
        [email protected]) at FactoryTable.cpp:53
    #5  0x0000000000428030 in ~__F__and__sizeof__Init (this=0x654861) at Key.cpp:170
    #6  0x000000080221f83b in __cxa_finalize () from /lib/libc.so.7
    #7  0x00000008021c6ccc in exit () from /lib/libc.so.7
    #8  0x0000000000415a26 in _start ()
    #9  0x0000000800664000 in ?? ()
    #10 0x0000000000000000 in ?? ()
    
  • benoitbenoit Rennes, FranceAdministrators, ZeroC Staff Benoit FoucherOrganization: ZeroC, Inc.Project: Ice ZeroC Staff
    Static destruction is supposed to occur in the exact inverse order so the factory table should be destroyed last when the last FactoryTableInit object is destructed. It sounds like this isn't the case here with clang?

    Cheers,
    Benoit.
  • grembogrembo Member Michael GmelinOrganization: Grem Equity GmbHProject: E-Commerce platform
    I'm not an authority in compilers/linkers by any means, so as far as I know the linker could just get it wrong.

    Since gdb output is not that helpful I will add debug output all over the place to get a better understanding where things go wrong and maybe deduce a smaller test case based on that.
  • grembogrembo Member Michael GmelinOrganization: Grem Equity GmbHProject: E-Commerce platform
    Ok, I did some more debugging based on the Slice/keyword unit test.

    It's pretty clear, that within the translation unit in question (compiling the generated code, Key.cpp in this example) the initialization order of factoryTable, __F__and_return_i and __F_and_sizeof_i is not defined.

    As it turns out, the order of events in this case is:
    1. Construct static factoryTableInitializer in each translation unit that comes in from a shared library - incrementing the refcount every time. As a side-effect factoryTable is initialized on the first construction of factoryTableInitializer.
    2. Construct __F__and_return_i
    3. Construct __F__and_sizeof_i
    4. Construct factoryTableInitializer in this translation unit (Key.cpp) (increment refcount)
    5. Destruct factoryTableInitializer in this translation unit (Key.cpp) (decrement refcount)
    6. Destruct factoryTableInitialzer in each translation unit that came in from shared libraries - decrementing the reference counter every time
    7. Reference counter hits 0, factoryTable get deleted
    8. Destruct _F__and_sizeof_i (bus error)
    9. (Destruct _F__and_return_i - won't get there)

    The only reason the code doesn't crash in step 2 already is that in step 1 shared libraries initialized factoryTable, so the call to addExceptionFactory succeeds. Step 2 and step 3 happen before the translation unit local factoryTableInitializer has been created, therefore it gets destructed first.

    Now the interesting question is: It's clear, that static variables from shared libraries get destructed here before the local statics (_F__and_sizeof_i and _F__and_return_i) are. Is this a violation of the standard? And if so, could you maybe point me to the exact location of it were it states that, not just in respect of a specific translation unit and maybe also in relation to shared libraries.

    Regardless of the question if this exact order is violating the standard or not, given the current code structure the following order of events would be plausible as well:
    1. Construct __F__and_return_i: Calls addExceptionFactory on not yet initialized factoryTable, crashes and burns in flames, end of story
    2. (Construct __F__and_sizeof_i: same problem)
    3. (Construct factoryTableInitializer in local translation unit)
    4. (Construct factoryTableInitializer in shared libraries)
    5. (...)

    Afaik there's nothing in the standard preventing this from happening whatsoever, that's why I think my patch does exactly the right thing, making sure that there is always a FactoryTableInit object that exceeds the lifetime of __F__*__Init objects within the same translation unit to make sure the refcount stays above 0 until they're done.

    Cheers,
    Michael

    p.s. - Here's the debug output I used for analyzing this - the interesting parts are happing at column 0, everything else is indented.
        Constructing FactoryTableInit at 34378903488
        + InitCount over zero, creating global factoryTable
            + New Initcount is 1
            Constructing FactoryTableInit at 34378903105
            + New Initcount is 2 
            Constructing FactoryTableInit at 34378902984
            + New Initcount is 3 
            Constructing FactoryTableInit at 34378884424
            + New Initcount is 4 
            Constructing FactoryTableInit at 34378883912
            + New Initcount is 5
            Constructing FactoryTableInit at 34378883736
            + New Initcount is 6
            Constructing FactoryTableInit at 34378883473
            + New Initcount is 7
            Constructing FactoryTableInit at 34378882929
            + New Initcount is 8 
            Constructing __F__IceMX__UnknownMetricsView__Init at 34378883088
            ---> addExceptionFactory ::IceMX::UnknownMetricsView called
            Constructing __F__IceMX__Metrics__Init at 34378883128
            ---> addObjectFactory ::IceMX::Metrics called
            Constructing __F__IceMX__ThreadMetrics__Init at 34378883264
            ---> addObjectFactory ::IceMX::ThreadMetrics called
            Constructing __F__IceMX__DispatchMetrics__Init at 34378883312
            ---> addObjectFactory ::IceMX::DispatchMetrics called
            Constructing __F__IceMX__RemoteMetrics__Init at 34378883360
            ---> addObjectFactory ::IceMX::RemoteMetrics called
            Constructing __F__IceMX__InvocationMetrics__Init at 34378883408
            ---> addObjectFactory ::IceMX::InvocationMetrics called
            Constructing __F__IceMX__ConnectionMetrics__Init at 34378883456
            ---> addObjectFactory ::IceMX::ConnectionMetrics called
            Constructing FactoryTableInit at 34378882641
            + New Initcount is 9
            Constructing __F__Ice__AdapterNotFoundException__Init at 34378882736
            ---> addExceptionFactory ::Ice::AdapterNotFoundException called   
            Constructing __F__Ice__InvalidReplicaGroupIdException__Init at 34378882737
            ---> addExceptionFactory ::Ice::InvalidReplicaGroupIdException called
            Constructing __F__Ice__AdapterAlreadyActiveException__Init at 34378882738
            ---> addExceptionFactory ::Ice::AdapterAlreadyActiveException called
            Constructing __F__Ice__ObjectNotFoundException__Init at 34378882739
            ---> addExceptionFactory ::Ice::ObjectNotFoundException called
            Constructing __F__Ice__ServerNotFoundException__Init at 34378882740
            ---> addExceptionFactory ::Ice::ServerNotFoundException called
    
            Constructing FactoryTableInit at 34378882593
            + New Initcount is 10
            Constructing FactoryTableInit at 34378882313
            + New Initcount is 11
            Constructing FactoryTableInit at 34378882041
            + New Initcount is 12
            Constructing Init at 34378882042
            Constructing FactoryTableInit at 34378881449
            + New Initcount is 13
            Constructing FactoryTableInit at 34378881256
            + New Initcount is 14
            Constructing FactoryTableInit at 6637601
            + New Initcount is 15
    
    Constructing __F__and__return__Init at 6637680
    ---> addExceptionFactory ::and::return called
    Constructing __F__and__sizeof__Init at 6637681
    ---> addExceptionFactory ::and::sizeof called
    
            Constructing FactoryTableInit at 6637400
            + New Initcount is 16
            Testing operation name... ok
            Destructing FactoryTableInit at 6637400
    
            + New Initcount is 15
            Destructing FactoryTableInit at 6637601
            + New Initcount is 14
            Destructing FactoryTableInit at 34378881256
            + New Initcount is 13
            Destructing FactoryTableInit at 34378881449
            + New Initcount is 12
            Destructing Init at 34378882042
            Destructing FactoryTableInit at 34378882041
            + New Initcount is 11
            Destructing FactoryTableInit at 34378882313 
            + New Initcount is 10
            Destructing FactoryTableInit at 34378882593
            + New Initcount is 9
            Destructing __F__Ice__ServerNotFoundException__Init at 34378882740
            ---> removeExceptionFactory ::Ice::ServerNotFoundException called
            Destructing __F__Ice__ObjectNotFoundException__Init at 34378882739
            ---> removeExceptionFactory ::Ice::ObjectNotFoundException called
            Destructing __F__Ice__AdapterAlreadyActiveException__Init at 34378882738
            ---> removeExceptionFactory ::Ice::AdapterAlreadyActiveException called
            Destructing __F__Ice__InvalidReplicaGroupIdException__Init at 34378882737
            ---> removeExceptionFactory ::Ice::InvalidReplicaGroupIdException called
            Destructing __F__Ice__AdapterNotFoundException__Init at 34378882736
            ---> removeExceptionFactory ::Ice::AdapterNotFoundException called
            Destructing FactoryTableInit at 34378882641
            + New Initcount is 8
            Destructing __F__IceMX__ConnectionMetrics__Init at 34378883456
            ---> removeObjectFactory ::IceMX::ConnectionMetrics called
            Destructing __F__IceMX__InvocationMetrics__Init at 34378883408
            ---> removeObjectFactory ::IceMX::InvocationMetrics called
            Destructing __F__IceMX__RemoteMetrics__Init at 34378883360
            ---> removeObjectFactory ::IceMX::RemoteMetrics called
            Destructing __F__IceMX__DispatchMetrics__Init at 34378883312
            ---> removeObjectFactory ::IceMX::DispatchMetrics called
            Destructing __F__IceMX__ThreadMetrics__Init at 34378883264
            ---> removeObjectFactory ::IceMX::ThreadMetrics called
            Destructing __F__IceMX__Metrics__Init at 34378883128
            ---> removeObjectFactory ::IceMX::Metrics called
            Destructing __F__IceMX__UnknownMetricsView__Init at 34378883088
            ---> removeExceptionFactory ::IceMX::UnknownMetricsView called
            Destructing FactoryTableInit at 34378882929
            + New Initcount is 7
            Destructing FactoryTableInit at 34378883473
            + New Initcount is 6
            Destructing FactoryTableInit at 34378883736
            + New Initcount is 5
            Destructing FactoryTableInit at 34378883912
            + New Initcount is 4
            Destructing FactoryTableInit at 34378884424
            + New Initcount is 3
            Destructing FactoryTableInit at 34378902984
            + New Initcount is 2
            Destructing FactoryTableInit at 34378903105
            + New Initcount is 1
      Destructing FactoryTableInit at 34378903488
      + InitCount dropped to zero, deleting global factoryTable
      + New Initcount is 0
    Destructing __F__and__sizeof__Init at 6637681
    ---> removeExceptionFactory ::and::sizeof called
    
    
    
  • benoitbenoit Rennes, FranceAdministrators, ZeroC Staff Benoit FoucherOrganization: ZeroC, Inc.Project: Ice ZeroC Staff
    Static initialization order is undefined between translation units however within the same translation unit the order is defined. The standard guarantees that statics are initialized in the order of their definition (3.6.2 Initialization of non-local objects).

    So in theory (4.) should occur before (2.) and (3.) since the factoryTableInitializer static is defined before any other Xxx_Init static object in the generated code. As a result the destruction of the factoryTableInitializer should always occur last in a same translation unit.

    From the output of your tracing, it looks like the factoryTableInitializer object is initialized before Xxx_Init objects for some translation units (src/Ice/Metrics.o) but not for others (src/Ice/Locator.o, Key.o) which is unexpected.

    Cheers,
    Benoit.
  • grembogrembo Member Michael GmelinOrganization: Grem Equity GmbHProject: E-Commerce platform
    That is a very valid point (preprocessor output of Key.cpp clearly shows that they should get constructed in the expected order). I'll try to reproduce the problem at a smaller scale, this smells like a compiler/toolchain problem.
  • grembogrembo Member Michael GmelinOrganization: Grem Equity GmbHProject: E-Commerce platform
    Actually looking at the trace again:
            Constructing FactoryTableInit at 6637601
            + New Initcount is 15
    
    Constructing __F__and__return__Init at 6637680
    ---> addExceptionFactory ::and::return called
    Constructing __F__and__sizeof__Init at 6637681
    ---> addExceptionFactory ::and::sizeof called
    
            Constructing FactoryTableInit at 6637400
            + New Initcount is 16
            Testing operation name... ok
            Destructing FactoryTableInit at 6637400
    
            + New Initcount is 15
            Destructing FactoryTableInit at 6637601
    

    FactoryTableInit 6637400 is from Client.cpp an 6637601 is from Key.cpp, so they are constructed in exactly the right order, the real problem is order of destruction.

    I noticed that this only happens when the executable is built using -fPIC, so this *must* be a tool chain issue. I will try to figure out what's causing this and update you once I found the problem, but it's pretty clear that it's not Ice's fault, thanks for helping me to get this right.

    In the meantime the patch above does fix the issue, so it's a vital workaround until it has been resolved.
  • grembogrembo Member Michael GmelinOrganization: Grem Equity GmbHProject: E-Commerce platform
    It turned out to be a subtle bug in handling shared libraries on the OS level that had been introduced more than a year ago and that hopefully will be fixed soon.

    Thanks for helping me to track this one down.
Sign In or Register to comment.