Ice 3.5.0: slice2cpp generated code relies on static initialization order => crash

in Bug Reports
Hi,
While testing Ice on FreeBSD-CURRENT using clang 3.3 (LLVM version numbering, so the current release), a classic rely on static initialization order problem surfaced that will sooner or later hit one of your supported platforms as well.
slice2cpp generates an initialization classes and one constant global instance each per non-local exception and/or class. Those initialization classes in turn depend on factoryTable, which is declared and statically instantiated by including cpp/include/Ice/FactoryTableInit.h. Due to this dependency and the fact that there are no guarantees about initialization and destruction order this can (and apparently does) lead to disaster, specifically when exiting the program and factoryTable being deleted before its last use by generated stub classes (segmentation fault, bus error etc.).
Generated example code (based on cpp/test/Slice/keyword/Key.ice:):
The attached patch fixes the problem by making sure that the lifetime of factoryTable exceeds the lifetime of const __F__and__return__Init __F__and__return__i / generated classes.
After applying the patch, slice2cpp generates the following code, that allows all unit tests to finish successfully:
Patch (Attachment not found.):
Thanks to Dimitry Andric <[email protected]> who analyzed the problem and tracked down the static initialization problem (see also Are ports supposed to build and run on 10-CURRENT?).
Cheers,
Michael
While testing Ice on FreeBSD-CURRENT using clang 3.3 (LLVM version numbering, so the current release), a classic rely on static initialization order problem surfaced that will sooner or later hit one of your supported platforms as well.
slice2cpp generates an initialization classes and one constant global instance each per non-local exception and/or class. Those initialization classes in turn depend on factoryTable, which is declared and statically instantiated by including cpp/include/Ice/FactoryTableInit.h. Due to this dependency and the fact that there are no guarantees about initialization and destruction order this can (and apparently does) lead to disaster, specifically when exiting the program and factoryTable being deleted before its last use by generated stub classes (segmentation fault, bus error etc.).
Generated example code (based on cpp/test/Slice/keyword/Key.ice:):
class __F__and__return__Init { public: __F__and__return__Init() { ::IceInternal::factoryTable->addExceptionFactory("::and::return", new __F__and__return); } ~__F__and__return__Init() { ::IceInternal::factoryTable->removeExceptionFactory("::and::return"); } }; const __F__and__return__Init __F__and__return__i;
The attached patch fixes the problem by making sure that the lifetime of factoryTable exceeds the lifetime of const __F__and__return__Init __F__and__return__i / generated classes.
After applying the patch, slice2cpp generates the following code, that allows all unit tests to finish successfully:
class __F__and__return__Init { IceInternal::FactoryTableInit* _ftableinit; public: __F__and__return__Init(): _ftableinit(new IceInternal::FactoryTableInit) { ::IceInternal::factoryTable->addExceptionFactory("::and::return", new __F__and__return); } ~__F__and__return__Init() { ::IceInternal::factoryTable->removeExceptionFactory("::and::return"); delete _ftableinit; } }; const __F__and__return__Init __F__and__return__i;
Patch (Attachment not found.):
--- cpp/src/slice2cpp/Gen.cpp.orig 2013-06-23 01:25:10.126254943 +0000 +++ cpp/src/slice2cpp/Gen.cpp 2013-06-23 02:15:02.159048012 +0000 @@ -756,10 +756,11 @@ C << sp << nl << "class " << factoryName << "__Init"; C << sb; + C << nl << "IceInternal::FactoryTableInit* _ftableinit;"; C.dec(); C << nl << "public:"; C.inc(); - C << sp << nl << factoryName << "__Init()"; + C << sp << nl << factoryName << "__Init(): _ftableinit(new IceInternal::FactoryTableInit)"; C << sb; C << nl << "::IceInternal::factoryTable->addExceptionFactory(\"" << p->scoped() << "\", new " << factoryName << ");"; @@ -767,6 +768,7 @@ C << sp << nl << "~" << factoryName << "__Init()"; C << sb; C << nl << "::IceInternal::factoryTable->removeExceptionFactory(\"" << p->scoped() << "\");"; + C << nl << "delete _ftableinit;"; C << eb; C << eb << ';'; @@ -3976,10 +3978,11 @@ C << sp; C << nl << "class " << factoryName << "__Init"; C << sb; + C << nl << "IceInternal::FactoryTableInit* _ftableinit;"; C.dec(); C << nl << "public:"; C.inc(); - C << sp << nl << factoryName << "__Init()"; + C << sp << nl << factoryName << "__Init(): _ftableinit(new IceInternal::FactoryTableInit)"; C << sb; if(!p->isAbstract()) { @@ -4002,6 +4005,7 @@ { C << nl << "::IceInternal::factoryTable->removeTypeId(" << p->compactId() << ");"; } + C << nl << "delete _ftableinit;"; C << eb; C << eb << ';';
Thanks to Dimitry Andric <[email protected]> who analyzed the problem and tracked down the static initialization problem (see also Are ports supposed to build and run on 10-CURRENT?).
Cheers,
Michael
0
Comments
Hmm, this shouldn't be necessary, static initialization order should be guaranteed already thanks to the inclusion of <Ice/FactoryTableInit.h> in Key.h. We use a Schwarz counter to ensure that the factory table global is initialized first (FactoryTableInit.h declares "static FactoryTableInit factoryTableInitializer"). It's not clear to me why this wouldn't work with clang 3.3, what is the stack trace of the crash?
Cheers,
Benoit.
The problem is not initialization (you are right that it's always initialized before it's accessed first), but destruction. On exit, the last instance of static FactoryTableInit factoryTableInitializer, which was constructed by inclusion of FactoryTableInit.h, gets destructed before the last instance of const __F_*__Init __F__*__i. Therefore the reference count drops to zero and factoryTable gets deleted in ~FactoryTableInit before the destructor of __F_*__Init __F__*__Init tries to access factoryTable (calling removeTypeId or removeExceptionFactory) and the result is a bus error when trying to acquire the lock on the member mutex _m, which already had been destroyed.
Cheers,
Benoit.
Since gdb output is not that helpful I will add debug output all over the place to get a better understanding where things go wrong and maybe deduce a smaller test case based on that.
It's pretty clear, that within the translation unit in question (compiling the generated code, Key.cpp in this example) the initialization order of factoryTable, __F__and_return_i and __F_and_sizeof_i is not defined.
As it turns out, the order of events in this case is:
The only reason the code doesn't crash in step 2 already is that in step 1 shared libraries initialized factoryTable, so the call to addExceptionFactory succeeds. Step 2 and step 3 happen before the translation unit local factoryTableInitializer has been created, therefore it gets destructed first.
Now the interesting question is: It's clear, that static variables from shared libraries get destructed here before the local statics (_F__and_sizeof_i and _F__and_return_i) are. Is this a violation of the standard? And if so, could you maybe point me to the exact location of it were it states that, not just in respect of a specific translation unit and maybe also in relation to shared libraries.
Regardless of the question if this exact order is violating the standard or not, given the current code structure the following order of events would be plausible as well:
Afaik there's nothing in the standard preventing this from happening whatsoever, that's why I think my patch does exactly the right thing, making sure that there is always a FactoryTableInit object that exceeds the lifetime of __F__*__Init objects within the same translation unit to make sure the refcount stays above 0 until they're done.
Cheers,
Michael
p.s. - Here's the debug output I used for analyzing this - the interesting parts are happing at column 0, everything else is indented.
So in theory (4.) should occur before (2.) and (3.) since the factoryTableInitializer static is defined before any other Xxx_Init static object in the generated code. As a result the destruction of the factoryTableInitializer should always occur last in a same translation unit.
From the output of your tracing, it looks like the factoryTableInitializer object is initialized before Xxx_Init objects for some translation units (src/Ice/Metrics.o) but not for others (src/Ice/Locator.o, Key.o) which is unexpected.
Cheers,
Benoit.
FactoryTableInit 6637400 is from Client.cpp an 6637601 is from Key.cpp, so they are constructed in exactly the right order, the real problem is order of destruction.
I noticed that this only happens when the executable is built using -fPIC, so this *must* be a tool chain issue. I will try to figure out what's causing this and update you once I found the problem, but it's pretty clear that it's not Ice's fault, thanks for helping me to get this right.
In the meantime the patch above does fix the issue, so it's a vital workaround until it has been resolved.
Thanks for helping me to track this one down.