icegridnode error message and crash

I'm running Ice 3.1.1 and have been up and running for a number of months. Looking back over my logs (which are piped to syslog and formatted by syslog-ng), I see numerous messages like this:
2007-10-30T12:20:23-0700 err /usr/bin/icegridnode[14502]: a node with the same name is already registered and active
2007-10-30T12:20:28-0700 err /usr/bin/icegridnode[14502]: a node with the same name is already registered and active
2007-10-30T12:20:33-0700 err /usr/bin/icegridnode[14502]: a node with the same name is already registered and active
These messages are posted every five seconds. There does not appear to be any additional information. Up until today, I have not noticed any problems caused by this.
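One way to confirm the interval and the time span of a repeated message is to summarize the log with a small script. The following is a minimal sketch in Python, assuming the syslog-ng line layout shown above (ISO timestamp, priority, program[pid], message). The names `LINE_RE`, `SAMPLE`, and `summarize` are hypothetical and not part of Ice.

```python
import re
from datetime import datetime

# Assumed layout: "<timestamp> <priority> <program>[<pid>]: <message>"
LINE_RE = re.compile(r"^(\S+) (\w+) ([^\[\s]+)\[(\d+)\]: (.*)$")

# Sample lines copied from the log excerpt above.
SAMPLE = [
    "2007-10-30T12:20:23-0700 err /usr/bin/icegridnode[14502]: a node with the same name is already registered and active",
    "2007-10-30T12:20:28-0700 err /usr/bin/icegridnode[14502]: a node with the same name is already registered and active",
    "2007-10-30T12:20:33-0700 err /usr/bin/icegridnode[14502]: a node with the same name is already registered and active",
]

def summarize(lines):
    """Return {message: (first_seen, last_seen, count)} for each distinct message."""
    summary = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip lines that do not follow the assumed layout
        stamp = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S%z")
        message = match.group(5)
        first, _, count = summary.get(message, (stamp, stamp, 0))
        summary[message] = (first, stamp, count + 1)
    return summary

if __name__ == "__main__":
    for message, (first, last, count) in summarize(SAMPLE).items():
        print(count, first.isoformat(), last.isoformat(), message)
```

Running it over the full log file instead of `SAMPLE` would show when each distinct message first and last appeared, which helps line up the node errors with the earlier warnings.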
Looking over the log, I notice this in the time just before this started:
2007-09-28T01:34:51-0700 warning /usr/bin/icegridnode[14502]: unexpected exception while reaping node session: TcpTransceiver.cpp:291: Ice::SocketException: socket exception: Connection timed out
2007-09-28T01:34:59-0700 warning /usr/bin/icegridnode[14502]: couldn't contact the IceGrid registry: Network.cpp:841: Ice::DNSException: DNS error: Name or service not known host: cl2
These errors repeated for a number of hours, after which time, the node errors started to appear. Interestingly, about 7 hours later, I started to see the same DNS errors on an independent IceGridRegistry. Both cleared up at about the same time and then I started seeing the messages about "a node with the same name".
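Since the warning reports that host cl2 could not be resolved, a quick check from the affected machine can show whether name resolution is working at a given moment. This is a minimal sketch using Python's standard `socket` module; the helper name `resolve_host` is hypothetical, and it only tests the system resolver, not anything inside Ice.

```python
import socket

def resolve_host(host):
    """Return the sorted, de-duplicated addresses for host, or [] if the lookup fails."""
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        # Same class of failure that the log reports as "Name or service not known".
        return []
    return sorted({info[4][0] for info in infos})

if __name__ == "__main__":
    addresses = resolve_host("cl2")
    print(addresses if addresses else "cl2 does not resolve")
```

Running this periodically (for example from cron) during an outage would help tell a resolver problem on the host apart from a problem in icegridnode itself.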
I've now restarted icegridnode in both locations and the messages have stopped.
In one of the two places where I was running an IceGrid registry, icegridnode crashed. This is what prompted me to notice the above.
Here is the backtrace. Does it appear to be related to the messages above, or would you suspect something completely unrelated?
*** glibc detected *** /usr/bin/icegridnode: malloc(): memory corruption: 0x00000000008007c1 ***
======= Backtrace: =========
/lib/libc.so.6[0x2b3cf1107b0d]
/lib/libc.so.6[0x2b3cf11099a6]
/lib/libc.so.6(malloc+0x7d)[0x2b3cf110b4fd]
/usr/lib/gcc/x86_64-pc-linux-gnu/4.1.2/libstdc++.so.6(_Znwm+0x1d)[0x2b3cf0cf6a3d]
/usr/bin/icegridnode[0x50c151]
/usr/bin/icegridnode[0x50cfd0]
/usr/bin/icegridnode[0x50daa9]
/usr/lib/libIce.so.31(_ZNK3Ice7Locator18___findAdapterByIdERN11IceInternal8IncomingERKNS_7CurrentE+0x126)[0x2b3cf0803936]
/usr/lib/libIce.so.31(_ZN11IceInternal8Incoming6invokeERKNS_6HandleINS_14ServantManagerEEE+0xc80)[0x2b3cf07db510]
/usr/lib/libIce.so.31(_ZN3Ice11ConnectionI9invokeAllERN11IceInternal11BasicStreamEiihRKNS1_6HandleINS1_14ServantManagerEEERKNS4_INS_13ObjectAdapterEEE+0x17e)[0x2b3cf07b2c6e]
/usr/lib/libIce.so.31(_ZN3Ice11ConnectionI7messageERN11IceInternal11BasicStreamERKNS1_6HandleINS1_10ThreadPoolEEE+0x137)[0x2b3cf07b9a47]
/usr/lib/libIce.so.31(_ZN11IceInternal10ThreadPool3runEv+0x997)[0x2b3cf08817d7]
/usr/lib/libIce.so.31(_ZN11IceInternal10ThreadPool18EventHandlerThread3runEv+0x62)[0x2b3cf0882d22]
/usr/lib/libIceUtil.so.31[0x2b3cf0a16456]
/lib/libpthread.so.0[0x2b3cf0b28135]
/lib/libc.so.6(__clone+0x6d)[0x2b3cf115b62d]
Comments
The forum software complained that my post was too long. Here's a portion of the memory map section of the dump:
Can you upgrade to Ice 3.2.1? I'm confident that the issue where the registry warns about "a node with the same name is already registered and active" is fixed in the latest IceGrid version.
Cheers,
Benoit.
How about the crash? It turns out that a co-worker has seen multiple icegridnode crashes with the same stack. Is that a known problem? Is it fixed in 3.2.1?
A co-worker has looked over the upgrade FAQ and concluded that an upgrade would require some amount of re-coding because of API changes. At this point, we have a fair amount of Ice code, so we are holding off until we have some project downtime (or until we find a need that forces our hand).
Related question: can 3.1.1 and 3.2.1 interoperate? That is, if we upgrade one registry and its collection of nodes to 3.2.1 and leave another at 3.1.1, will they be able to communicate?
Thanks
Since it seems that you are using Ice for critical parts of your business operations, you should consider a commercial support agreement. Please contact us at [email protected] if you are interested.
I've applied the patch posted here:
http://www.zeroc.com/forums/patches/2745-patch-1-ice-3-1-1-fixes-icegrid-locator-memory-corruption.html
This seems to have corrected the crash.