Archived

This forum has been archived. Please start a new discussion on GitHub.

IceGridRegistry stops accepting connections

I run IceGridRegistry 3.0.0 on a Linux server and use its Locate and Admin interfaces. Periodically, about once a week, the registry stops accepting connections.

If I look at the process list on the server I can see a bunch of processes created in the last several hours. If I restart the registry everything works again for another week.

Has anyone else noticed this? Is there any other way to debug it?

thanks, alex

$ ps -ef | grep icegridregistry
makara 22681 1 0 Feb04 ? 00:00:00 icegridregistry --Ice.Config=/home/makara/icegridregistry.cfg --daemon
makara 22682 22681 0 Feb04 ? 00:00:02 icegridregistry --Ice.Config=/home/makara/icegridregistry.cfg --daemon
makara 22683 22682 0 Feb04 ? 00:00:00 icegridregistry --Ice.Config=/home/makara/icegridregistry.cfg --daemon
makara 22684 22682 0 Feb04 ? 00:00:00 icegridregistry --Ice.Config=/home/makara/icegridregistry.cfg --daemon
makara 22685 22682 0 Feb04 ? 00:00:00 icegridregistry --Ice.Config=/home/makara/icegridregistry.cfg --daemon
makara 22688 22682 0 Feb04 ? 00:04:06 icegridregistry --Ice.Config=/home/makara/icegridregistry.cfg --daemon
makara 22689 22682 0 Feb04 ? 00:04:08 icegridregistry --Ice.Config=/home/makara/icegridregistry.cfg --daemon
makara 22690 22682 0 Feb04 ? 00:04:09 icegridregistry --Ice.Config=/home/makara/icegridregistry.cfg --daemon
makara 22691 22682 0 Feb04 ? 00:04:08 icegridregistry --Ice.Config=/home/makara/icegridregistry.cfg --daemon
makara 22692 22682 0 Feb04 ? 00:00:00 icegridregistry --Ice.Config=/home/makara/icegridregistry.cfg --daemon
makara 22693 22682 0 Feb04 ? 00:00:00 icegridregistry --Ice.Config=/home/makara/icegridregistry.cfg --daemon
makara 22694 22682 0 Feb04 ? 00:00:00 icegridregistry --Ice.Config=/home/makara/icegridregistry.cfg --daemon
makara 22695 22682 0 Feb04 ? 00:07:54 icegridregistry --Ice.Config=/home/makara/icegridregistry.cfg --daemon
makara 22700 22682 0 Feb04 ? 00:00:00 icegridregistry --Ice.Config=/home/makara/icegridregistry.cfg --daemon
makara 1114 22682 0 15:22 ? 00:00:00 icegridregistry --Ice.Config=/home/makara/icegridregistry.cfg --daemon
makara 1308 22682 0 18:58 ? 00:00:01 icegridregistry --Ice.Config=/home/makara/icegridregistry.cfg --daemon
makara 1587 22682 0 20:16 ? 00:00:00 icegridregistry --Ice.Config=/home/makara/icegridregistry.cfg --daemon
makara 1608 22682 90 20:31 ? 00:11:35 icegridregistry --Ice.Config=/home/makara/icegridregistry.cfg --daemon
makara 1611 22682 43 20:31 ? 00:05:19 icegridregistry --Ice.Config=/home/makara/icegridregistry.cfg --daemon
makara 1613 22682 0 20:32 ? 00:00:00 icegridregistry --Ice.Config=/home/makara/icegridregistry.cfg --daemon
makara 1614 22682 0 20:32 ? 00:00:00 icegridregistry --Ice.Config=/home/makara/icegridregistry.cfg --daemon
makara 1616 22682 0 20:32 ? 00:00:00 icegridregistry --Ice.Config=/home/makara/icegridregistry.cfg --daemon
makara 1800 22682 0 20:38 ? 00:00:00 icegridregistry --Ice.Config=/home/makara/icegridregistry.cfg --daemon
makara 1801 22682 0 20:38 ? 00:00:00 icegridregistry --Ice.Config=/home/makara/icegridregistry.cfg --daemon
makara 1802 22682 0 20:38 ? 00:00:00 icegridregistry --Ice.Config=/home/makara/icegridregistry.cfg --daemon
makara 1803 22682 0 20:38 ? 00:00:00 icegridregistry --Ice.Config=/home/makara/icegridregistry.cfg --daemon
makara 1804 22682 0 20:38 ? 00:00:00 icegridregistry --Ice.Config=/home/makara/icegridregistry.cfg --daemon
makara 1805 22682 0 20:38 ? 00:00:00 icegridregistry --Ice.Config=/home/makara/icegridregistry.cfg --daemon
makara 1806 22682 0 20:38 ? 00:00:00 icegridregistry --Ice.Config=/home/makara/icegridregistry.cfg --daemon
makara 1807 22682 0 20:38 ? 00:00:00 icegridregistry --Ice.Config=/home/makara/icegridregistry.cfg --daemon
makara 1808 22682 0 20:39 ? 00:00:00 icegridregistry --Ice.Config=/home/makara/icegridregistry.cfg --daemon
makara 1809 22682 0 20:39 ? 00:00:00 icegridregistry --Ice.Config=/home/makara/icegridregistry.cfg --daemon
makara 1810 22682 0 20:39 ? 00:00:00 icegridregistry --Ice.Config=/home/makara/icegridregistry.cfg --daemon
makara 1811 22682 0 20:39 ? 00:00:00 icegridregistry --Ice.Config=/home/makara/icegridregistry.cfg --daemon
makara 1812 22682 0 20:40 ? 00:00:00 icegridregistry --Ice.Config=/home/makara/icegridregistry.cfg --daemon

More details:

$ icegridregistry --version
3.0.0
$ uname -a
Linux bamboo 2.4.27-1-686-smp #1 SMP Wed Dec 1 19:50:17 JST 2004 i686 GNU/Linux

$ more icegridregistry.cfg
IceGrid.Registry.Client.Endpoints=default -h bamboo -p 12000
IceGrid.Registry.Server.Endpoints=default -h bamboo
IceGrid.Registry.Admin.Endpoints=default -h bamboo

IceGrid.Registry.Internal.Endpoints=default -p 13000
IceGrid.Registry.Data=/home/makara/sys/icereg/regdb
IceGrid.Registry.DynamicRegistration=1

Ice.Trace.Network=0
Ice.Warn.Connections=0
Ice.PrintProcessId=1
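
A note on the ps listing above: apart from the first two entries, every icegridregistry line has PPID 22682, and 22682 is itself a child of 22681. That is the pattern LinuxThreads (the threading library glibc used with 2.4 kernels) typically produces. Each thread gets its own PID and shows up in ps as a separate "process", and the worker threads are all children of a thread-manager entry. A minimal sketch for spotting this, assuming a POSIX shell with awk and the `ps -ef` column layout shown above (UID PID PPID C STIME TTY TIME CMD). The function name is hypothetical:

```shell
# Hypothetical helper: read `ps -ef` output on stdin and, for every
# icegridregistry entry, count how many entries share each parent PID.
# Columns of `ps -ef`: $1=UID $2=PID $3=PPID $4=C $5=STIME $6=TTY $7=TIME $8...=CMD
count_by_parent() {
    awk '$8 == "icegridregistry" { n[$3]++ }
         END { for (p in n) print p, n[p] }' |
        LC_ALL=C sort -k2,2nr
}

# Usage: ps -ef | count_by_parent
```

On the first listing in this thread, this would report 33 entries under PPID 22682. That is one "parent" with many children, which points to threads rather than independently started processes.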

Comments

  • benoit (Rennes, France)
    Hi Alex,

    Which Linux distribution do you use? I don't think you're seeing multiple processes here; those are probably the threads of the icegridregistry process.

    You could try adding network tracing (Ice.Trace.Network=2) to the IceGrid registry and check the traces when it stops accepting new connections. But the best approach would be to attach a debugger to the IceGrid registry when it stops responding and get a stack trace of all the IceGrid registry threads (let us know if you need more information on how to do this). If you post the thread dump here on the forum, we'll take a look at it and check whether anything is wrong. Of course, a small example demonstrating the problem would also help, but I suspect it might not be easy to reproduce!

    Thanks,

    Benoit.
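
The network-tracing suggestion amounts to a one-line change in the registry's configuration file. A minimal sketch, assuming the file is a plain key=value property file like the one posted above and that a POSIX shell with sed is available. The function name is hypothetical. It replaces an existing Ice.Trace.Network line or appends one if none exists; the registry reads its configuration at startup, so it has to be restarted afterwards:

```shell
# Hypothetical helper: set Ice.Trace.Network=<level> in an Ice config file.
# Replaces an existing Ice.Trace.Network line, or appends one if absent.
set_network_trace() {
    cfg=$1
    level=${2:-2}
    if grep -q '^Ice\.Trace\.Network=' "$cfg"; then
        # Rewrite through a temporary file instead of relying on GNU sed -i.
        sed "s/^Ice\.Trace\.Network=.*/Ice.Trace.Network=$level/" "$cfg" > "$cfg.tmp" &&
            mv "$cfg.tmp" "$cfg"
    else
        printf 'Ice.Trace.Network=%s\n' "$level" >> "$cfg"
    fi
}

# Usage (then restart the registry):
#   set_network_trace /home/makara/icegridregistry.cfg 2
```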
  • Hi Benoit, sorry for the delay in replying.

    We use Debian 'testing'.

    > But the best would be to attach to the IceGrid registry with the debugger when it
    > stops responding and get a stack trace dump of all the IceGrid registry threads
    > (let us know if you need more information on how to do this).

    OK, I can do it if you tell me how.

    thanks, alex
  • matthew (NL, Canada)
    You need to find the pid of the registry (any of them will do). Then run

    gdb icegridregistry <pid>

    and at the (gdb) prompt enter

    thread apply all bt

    Cut and paste the output. Alternatively, run script first, then run the gdb commands, quit gdb and type exit to end the script session; the output will be saved in a file called typescript.
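
For a daemon that hangs unpredictably, it can be handy to capture the same dump non-interactively. A minimal sketch, assuming a gdb recent enough to support the -batch, -p and -ex options (older releases may lack some of them) and permission to attach to the process. The function name and default output file name are hypothetical:

```shell
# Hypothetical helper: write the output of "thread apply all bt" for a
# running process to a file, without an interactive gdb session.
dump_threads() {
    pid=$1
    out=${2:-threads.$1.txt}
    # Refuse anything that is not a plain decimal PID.
    case $pid in
        ''|*[!0-9]*) echo "usage: dump_threads PID [OUTFILE]" >&2; return 2 ;;
    esac
    if ! command -v gdb >/dev/null 2>&1; then
        echo "gdb not found" >&2
        return 1
    fi
    # -batch makes gdb exit after the -ex commands; on exit it detaches
    # from the attached process instead of killing it.
    gdb -batch -p "$pid" -ex 'thread apply all bt' > "$out" 2>&1
}

# Usage:
#   dump_threads 14779 /tmp/icegridregistry-threads.txt
```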
  • thanks matthew, will wait now for the condition to reoccur.
    alex
    So here it is. The registry ran exactly 7 days before going silent. First there's a list of all the threads, and then the backtrace from gdb.

    thanks for checking it out, alex

    ================================
    $ ps -ef | grep icegridregistry
    makara 29062 29042 2 17:43 pts/1 00:00:02 gdb icegridregistry 14779
    makara 14779 14766 0 Feb10 ? 01:22:41 icegridregistry --Ice.Config=/home/makara/sys/icereg/icegridregistry.cfg --daemon
    makara 14765 1 0 Feb10 ? 00:00:00 icegridregistry --Ice.Config=/home/makara/sys/icereg/icegridregistry.cfg --daemon
    makara 14766 14765 0 Feb10 ? 00:00:07 icegridregistry --Ice.Config=/home/makara/sys/icereg/icegridregistry.cfg --daemon
    makara 14767 14766 0 Feb10 ? 00:00:00 icegridregistry --Ice.Config=/home/makara/sys/icereg/icegridregistry.cfg --daemon
    makara 14768 14766 0 Feb10 ? 00:00:00 icegridregistry --Ice.Config=/home/makara/sys/icereg/icegridregistry.cfg --daemon
    makara 25316 14766 0 Feb17 ? 00:00:25 icegridregistry --Ice.Config=/home/makara/sys/icereg/icegridregistry.cfg --daemon
    makara 25298 14766 0 Feb17 ? 00:00:26 icegridregistry --Ice.Config=/home/makara/sys/icereg/icegridregistry.cfg --daemon
    makara 14772 14766 0 Feb10 ? 00:18:41 icegridregistry --Ice.Config=/home/makara/sys/icereg/icegridregistry.cfg --daemon
    makara 14773 14766 0 Feb10 ? 00:18:40 icegridregistry --Ice.Config=/home/makara/sys/icereg/icegridregistry.cfg --daemon
    makara 14774 14766 0 Feb10 ? 00:50:30 icegridregistry --Ice.Config=/home/makara/sys/icereg/icegridregistry.cfg --daemon
    makara 14775 14766 0 Feb10 ? 00:18:37 icegridregistry --Ice.Config=/home/makara/sys/icereg/icegridregistry.cfg --daemon
    makara 14776 14766 0 Feb10 ? 00:00:00 icegridregistry --Ice.Config=/home/makara/sys/icereg/icegridregistry.cfg --daemon
    makara 14777 14766 0 Feb10 ? 00:00:00 icegridregistry --Ice.Config=/home/makara/sys/icereg/icegridregistry.cfg --daemon
    makara 14778 14766 0 Feb10 ? 00:00:00 icegridregistry --Ice.Config=/home/makara/sys/icereg/icegridregistry.cfg --daemon
    makara 28970 14766 82 16:47 ? 00:48:12 icegridregistry --Ice.Config=/home/makara/sys/icereg/icegridregistry.cfg --daemon
    makara 28961 14766 0 16:46 ? 00:00:00 icegridregistry --Ice.Config=/home/makara/sys/icereg/icegridregistry.cfg --daemon
    makara 28855 14766 0 16:33 ? 00:00:00 icegridregistry --Ice.Config=/home/makara/sys/icereg/icegridregistry.cfg --daemon
    makara 16019 14766 1 Feb15 ? 00:47:58 icegridregistry --Ice.Config=/home/makara/sys/icereg/icegridregistry.cfg --daemon

    ================================

    $ gdb icegridregistry 14779
    (gdb) thread apply all bt

    Thread 18 (Thread 451821587 (LWP 16019)):
    #0 0x408bfb65 in __deregister_frame () from /lib/libgcc_s.so.1
    #1 0x409ae7f5 in dl_iterate_phdr () from /lib/libc.so.6
    #2 0x408c0646 in _Unwind_Find_FDE () from /lib/libgcc_s.so.1
    #3 0x408bd397 in _Unwind_DeleteException () from /lib/libgcc_s.so.1
    #4 0x408be81b in _Unwind_RaiseException () from /lib/libgcc_s.so.1
    #5 0x40864b7c in __cxa_throw () from /usr/lib/libstdc++.so.6
    #6 0x4059aaf0 in IceInternal::TcpTransceiver::read (this=0x40c00658, buf=@0x40c01a44, timeout=0) at TcpTransceiver.cpp:219
    #7 0x404c6321 in Ice::ConnectionI::read (this=0x40c01a38, stream=@0x40c01a44) at ConnectionI.cpp:1214
    #8 0x4059db51 in IceInternal::ThreadPool::read (this=0x8211ef8, handler=@0xbd5ff8b4) at ThreadPool.cpp:756
    #9 0x405a15f1 in IceInternal::ThreadPool::run (this=0x8211ef8) at ThreadPool.cpp:593
    #10 0x405a229a in IceInternal::ThreadPool::EventHandlerThread::run (this=0x40b008e0) at ThreadPool.cpp:852
    #11 0x406539e9 in startHook (arg=0x40b008e0) at Thread.cpp:482
    #12 0x40674f4c in pthread_start_thread () from /lib/libpthread.so.0
    #13 0x4097d92a in clone () from /lib/libc.so.6

    Thread 17 (Thread 626311185 (LWP 28855)):
    #0 0x40678184 in __pthread_sigsuspend () from /lib/libpthread.so.0
    #1 0x40676f59 in __pthread_wait_for_restart_signal () from /lib/libpthread.so.0
    #2 0x4067457c in pthread_cond_wait@GLIBC_2.0 () from /lib/libpthread.so.0
    #3 0x0813b272 in IceUtil::Cond::waitImpl<IceUtil::Mutex> (this=0x8211f04, mutex=@0x8211f34) at Cond.h:203
    #4 0x0813b33a in IceUtil::Monitor<IceUtil::Mutex>::wait (this=0x8211f04) at Monitor.h:152
    #5 0x405a213d in IceInternal::ThreadPool::run (this=0x8211ef8) at ThreadPool.cpp:735
    #6 0x405a229a in IceInternal::ThreadPool::EventHandlerThread::run (this=0x827a2e8) at ThreadPool.cpp:852
    #7 0x406539e9 in startHook (arg=0x827a2e8) at Thread.cpp:482
    #8 0x40674f4c in pthread_start_thread () from /lib/libpthread.so.0
    #9 0x4097d92a in clone () from /lib/libc.so.6

    Thread 16 (Thread 628031504 (LWP 28961)):
    #0 0x40678184 in __pthread_sigsuspend () from /lib/libpthread.so.0
    #1 0x40676f59 in __pthread_wait_for_restart_signal () from /lib/libpthread.so.0
    #2 0x4067457c in pthread_cond_wait@GLIBC_2.0 () from /lib/libpthread.so.0
    #3 0x0813b272 in IceUtil::Cond::waitImpl<IceUtil::Mutex> (this=0x821242c, mutex=@0x821245c) at Cond.h:203
    #4 0x0813b33a in IceUtil::Monitor<IceUtil::Mutex>::wait (this=0x821242c) at Monitor.h:152
    #5 0x405a213d in IceInternal::ThreadPool::run (this=0x8212420) at ThreadPool.cpp:735
    #6 0x405a229a in IceInternal::ThreadPool::EventHandlerThread::run (this=0x820fbe8) at ThreadPool.cpp:852
    #7 0x406539e9 in startHook (arg=0x820fbe8) at Thread.cpp:482
    #8 0x40674f4c in pthread_start_thread () from /lib/libpthread.so.0
    #9 0x4097d92a in clone () from /lib/libc.so.6

    Thread 15 (Thread 628178959 (LWP 28970)):
    #0 0x408bfb63 in __deregister_frame () from /lib/libgcc_s.so.1
    #1 0x409ae7f5 in dl_iterate_phdr () from /lib/libc.so.6
    #2 0x408c0646 in _Unwind_Find_FDE () from /lib/libgcc_s.so.1
    #3 0x408bd397 in _Unwind_DeleteException () from /lib/libgcc_s.so.1
    #4 0x408be81b in _Unwind_RaiseException () from /lib/libgcc_s.so.1
    #5 0x40864b7c in __cxa_throw () from /usr/lib/libstdc++.so.6
    #6 0x4059aaf0 in IceInternal::TcpTransceiver::read (this=0x8279320, buf=@0x827b32c, timeout=0) at TcpTransceiver.cpp:219
    #7 0x404c6321 in Ice::ConnectionI::read (this=0x827b320, stream=@0x827b32c) at ConnectionI.cpp:1214
    #8 0x4059db51 in IceInternal::ThreadPool::read (this=0x8212420, handler=@0xbddff8b4) at ThreadPool.cpp:756
    #9 0x405a15f1 in IceInternal::ThreadPool::run (this=0x8212420) at ThreadPool.cpp:593
    #10 0x405a229a in IceInternal::ThreadPool::EventHandlerThread::run (this=0x40b019d8) at ThreadPool.cpp:852
    #11 0x406539e9 in startHook (arg=0x40b019d8) at Thread.cpp:482
    #12 0x40674f4c in pthread_start_thread () from /lib/libpthread.so.0
    #13 0x4097d92a in clone () from /lib/libc.so.6

    Thread 14 (Thread 213006 (LWP 14779)):
    #0 0x408bfb63 in __deregister_frame () from /lib/libgcc_s.so.1
    #1 0x409ae7f5 in dl_iterate_phdr () from /lib/libc.so.6
    #2 0x408c0646 in _Unwind_Find_FDE () from /lib/libgcc_s.so.1
    #3 0x408bd397 in _Unwind_DeleteException () from /lib/libgcc_s.so.1
    #4 0x408be81b in _Unwind_RaiseException () from /lib/libgcc_s.so.1
    #5 0x40864b7c in __cxa_throw () from /usr/lib/libstdc++.so.6
    #6 0x4059aaf0 in IceInternal::TcpTransceiver::read (this=0x40b011d8, buf=@0x40b01434, timeout=0) at TcpTransceiver.cpp:219
    #7 0x404c6321 in Ice::ConnectionI::read (this=0x40b01428, stream=@0x40b01434) at ConnectionI.cpp:1214
    #8 0x4059db51 in IceInternal::ThreadPool::read (this=0x82789f0, handler=@0xbdfff8b4) at ThreadPool.cpp:756
    #9 0x405a15f1 in IceInternal::ThreadPool::run (this=0x82789f0) at ThreadPool.cpp:593
    #10 0x405a229a in IceInternal::ThreadPool::EventHandlerThread::run (this=0x8278cd0) at ThreadPool.cpp:852
    #11 0x406539e9 in startHook (arg=0x8278cd0) at Thread.cpp:482
    #12 0x40674f4c in pthread_start_thread () from /lib/libpthread.so.0
    #13 0x4097d92a in clone () from /lib/libc.so.6

    Thread 13 (Thread 196621 (LWP 14778)):
    #0 0x40678184 in __pthread_sigsuspend () from /lib/libpthread.so.0
    #1 0x40676f59 in __pthread_wait_for_restart_signal () from /lib/libpthread.so.0
    #2 0x4067457c in pthread_cond_wait@GLIBC_2.0 () from /lib/libpthread.so.0
    #3 0x0813b272 in IceUtil::Cond::waitImpl<IceUtil::Mutex> (this=0x826b654, mutex=@0x826b684) at Cond.h:203
    #4 0x0813b33a in IceUtil::Monitor<IceUtil::Mutex>::wait (this=0x826b654) at Monitor.h:152
    #5 0x40222bce in IceStorm::FlusherThread::run (this=0x826b630) at Flusher.cpp:56
    #6 0x406539e9 in startHook (arg=0x826b630) at Thread.cpp:482
    #7 0x40674f4c in pthread_start_thread () from /lib/libpthread.so.0
    #8 0x4097d92a in clone () from /lib/libc.so.6

    Thread 12 (Thread 180236 (LWP 14777)):
    #0 0x4067cb96 in nanosleep () from /lib/libpthread.so.0
    #1 0x00000000 in ?? ()

    Thread 11 (Thread 163851 (LWP 14776)):
    #0 0x4067cb96 in nanosleep () from /lib/libpthread.so.0
    #1 0x00000000 in ?? ()

    Thread 10 (Thread 147466 (LWP 14775)):
    #0 0x40678184 in __pthread_sigsuspend () from /lib/libpthread.so.0
    #1 0x40676f59 in __pthread_wait_for_restart_signal () from /lib/libpthread.so.0
    #2 0x4067457c in pthread_cond_wait@GLIBC_2.0 () from /lib/libpthread.so.0
    #3 0x0813b272 in IceUtil::Cond::waitImpl<IceUtil::Mutex> (this=0x821313c, mutex=@0x821316c) at Cond.h:203
    #4 0x0813b33a in IceUtil::Monitor<IceUtil::Mutex>::wait (this=0x821313c) at Monitor.h:152
    #5 0x405a213d in IceInternal::ThreadPool::run (this=0x8213130) at ThreadPool.cpp:735
    #6 0x405a229a in IceInternal::ThreadPool::EventHandlerThread::run (this=0x8213388) at ThreadPool.cpp:852
    #7 0x406539e9 in startHook (arg=0x8213388) at Thread.cpp:482
    #8 0x40674f4c in pthread_start_thread () from /lib/libpthread.so.0
    #9 0x4097d92a in clone () from /lib/libc.so.6
    ======= continues in the next post ================
  • ============ continues from the previous post ============
    Thread 9 (Thread 131081 (LWP 14774)):
    #0 0x408bfb51 in __deregister_frame () from /lib/libgcc_s.so.1
    #1 0x409ae7f5 in dl_iterate_phdr () from /lib/libc.so.6
    #2 0x408c0646 in _Unwind_Find_FDE () from /lib/libgcc_s.so.1
    #3 0x408bd397 in _Unwind_DeleteException () from /lib/libgcc_s.so.1
    #4 0x408be81b in _Unwind_RaiseException () from /lib/libgcc_s.so.1
    #5 0x40864b7c in __cxa_throw () from /usr/lib/libstdc++.so.6
    #6 0x4059aaf0 in IceInternal::TcpTransceiver::read (this=0x40c016d0, buf=@0x40c01804, timeout=0) at TcpTransceiver.cpp:219
    #7 0x404c6321 in Ice::ConnectionI::read (this=0x40c017f8, stream=@0x40c01804) at ConnectionI.cpp:1214
    #8 0x4059db51 in IceInternal::ThreadPool::read (this=0x8213130, handler=@0xbe9ff8b4) at ThreadPool.cpp:756
    #9 0x405a15f1 in IceInternal::ThreadPool::run (this=0x8213130) at ThreadPool.cpp:593
    #10 0x405a229a in IceInternal::ThreadPool::EventHandlerThread::run (this=0x82132d8) at ThreadPool.cpp:852
    #11 0x406539e9 in startHook (arg=0x82132d8) at Thread.cpp:482
    #12 0x40674f4c in pthread_start_thread () from /lib/libpthread.so.0
    #13 0x4097d92a in clone () from /lib/libc.so.6

    Thread 8 (Thread 114696 (LWP 14773)):
    #0 0x40678184 in __pthread_sigsuspend () from /lib/libpthread.so.0
    #1 0x40676f59 in __pthread_wait_for_restart_signal () from /lib/libpthread.so.0
    #2 0x4067457c in pthread_cond_wait@GLIBC_2.0 () from /lib/libpthread.so.0
    #3 0x0813b272 in IceUtil::Cond::waitImpl<IceUtil::Mutex> (this=0x821313c, mutex=@0x821316c) at Cond.h:203
    #4 0x0813b33a in IceUtil::Monitor<IceUtil::Mutex>::wait (this=0x821313c) at Monitor.h:152
    #5 0x405a213d in IceInternal::ThreadPool::run (this=0x8213130) at ThreadPool.cpp:735
    #6 0x405a229a in IceInternal::ThreadPool::EventHandlerThread::run (this=0x82132a0) at ThreadPool.cpp:852
    #7 0x406539e9 in startHook (arg=0x82132a0) at Thread.cpp:482
    #8 0x40674f4c in pthread_start_thread () from /lib/libpthread.so.0
    #9 0x4097d92a in clone () from /lib/libc.so.6

    Thread 7 (Thread 98311 (LWP 14772)):
    #0 0x40678184 in __pthread_sigsuspend () from /lib/libpthread.so.0
    #1 0x40676f59 in __pthread_wait_for_restart_signal () from /lib/libpthread.so.0
    #2 0x4067457c in pthread_cond_wait@GLIBC_2.0 () from /lib/libpthread.so.0
    #3 0x0813b272 in IceUtil::Cond::waitImpl<IceUtil::Mutex> (this=0x821313c, mutex=@0x821316c) at Cond.h:203
    #4 0x0813b33a in IceUtil::Monitor<IceUtil::Mutex>::wait (this=0x821313c) at Monitor.h:152
    #5 0x405a213d in IceInternal::ThreadPool::run (this=0x8213130) at ThreadPool.cpp:735
    #6 0x405a229a in IceInternal::ThreadPool::EventHandlerThread::run (this=0x8212fa8) at ThreadPool.cpp:852
    #7 0x406539e9 in startHook (arg=0x8212fa8) at Thread.cpp:482
    #8 0x40674f4c in pthread_start_thread () from /lib/libpthread.so.0
    #9 0x4097d92a in clone () from /lib/libc.so.6

    Thread 6 (Thread 573472774 (LWP 25298)):
    #0 0x40678184 in __pthread_sigsuspend () from /lib/libpthread.so.0
    #1 0x40676f59 in __pthread_wait_for_restart_signal () from /lib/libpthread.so.0
    #2 0x4067457c in pthread_cond_wait@GLIBC_2.0 () from /lib/libpthread.so.0
    #3 0x0813b272 in IceUtil::Cond::waitImpl<IceUtil::Mutex> (this=0x8212944, mutex=@0x8212974) at Cond.h:203
    #4 0x0813b33a in IceUtil::Monitor<IceUtil::Mutex>::wait (this=0x8212944) at Monitor.h:152
    #5 0x405a213d in IceInternal::ThreadPool::run (this=0x8212938) at ThreadPool.cpp:735
    #6 0x405a229a in IceInternal::ThreadPool::EventHandlerThread::run (this=0x820fdc8) at ThreadPool.cpp:852
    #7 0x406539e9 in startHook (arg=0x820fdc8) at Thread.cpp:482
    #8 0x40674f4c in pthread_start_thread () from /lib/libpthread.so.0
    #9 0x4097d92a in clone () from /lib/libc.so.6

    Thread 5 (Thread 573751301 (LWP 25316)):
    #0 0x40976c21 in select () from /lib/libc.so.6
    #1 0x4060fba4 in ?? () from /opt/Ice-3.0.0/lib/libIce.so.30
    #2 0x40c01cba in ?? ()
    #3 0x00000000 in ?? ()

    Thread 4 (Thread 32771 (LWP 14768)):
    #0 0x4067cb96 in nanosleep () from /lib/libpthread.so.0
    #1 0x00000000 in ?? ()

    Thread 3 (Thread 16386 (LWP 14767)):
    #0 0x408ec917 in sigsuspend () from /lib/libc.so.6
    #1 0x4067862e in sigwait () from /lib/libpthread.so.0
    #2 0x4063c29c in sigwaitThread () at CtrlCHandler.cpp:124
    #3 0x40674f4c in pthread_start_thread () from /lib/libpthread.so.0
    #4 0x4097d92a in clone () from /lib/libc.so.6

    Thread 2 (Thread 32769 (LWP 14766)):
    #0 0x409744a6 in poll () from /lib/libc.so.6
    #1 0x40675514 in __pthread_manager () from /lib/libpthread.so.0
    #2 0x4097d92a in clone () from /lib/libc.so.6

    Thread 1 (Thread 16384 (LWP 14765)):
    #0 0x40678184 in __pthread_sigsuspend () from /lib/libpthread.so.0
    #1 0x40676f59 in __pthread_wait_for_restart_signal () from /lib/libpthread.so.0
    #2 0x4067457c in pthread_cond_wait@GLIBC_2.0 () from /lib/libpthread.so.0
    #3 0x40537a6e in IceUtil::Cond::waitImpl<IceUtil::RecMutex> (this=0x820f96c, mutex=@0x820f99c) at Cond.h:203
    #4 0x40537b36 in IceUtil::Monitor<IceUtil::RecMutex>::wait (this=0x820f96c) at Monitor.h:152
    #5 0x40536313 in IceInternal::ObjectAdapterFactory::waitForShutdown (this=0x820f960) at ObjectAdapterFactory.cpp:66
    #6 0x404b2138 in Ice::CommunicatorI::waitForShutdown (this=0x8211da8) at CommunicatorI.cpp:119
    #7 0x4058c0e8 in Ice::Service::waitForShutdown (this=0xbfffe444) at Service.cpp:906
    #8 0x4058db56 in Ice::Service::runDaemon (this=0xbfffe444, argc=1, argv=0xbfffe4c4) at Service.cpp:1735
    #9 0x4058dec8 in Ice::Service::run (this=0xbfffe444, argc=@0xbfffe470, argv=0xbfffe4c4) at Service.cpp:502
    #10 0x4058ead7 in Ice::Service::main (this=0xbfffe444, argc=@0xbfffe470, argv=0xbfffe4c4) at Service.cpp:451
    #11 0x081a2cde in main (argc=2, argv=0xbfffe4c4) at IceGridRegistry.cpp:169
    ================================
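
The ps listing and the gdb dump can be cross-referenced through the LWP numbers. Sorting the icegridregistry entries by the cumulative CPU TIME column puts 14779, 14774, 28970 and 16019 on top. These are exactly the four LWPs (threads 14, 9, 15 and 18 in the dump) whose backtraces sit in __deregister_frame after a __cxa_throw. A minimal sketch of that sort, assuming the `ps -ef` column layout shown above and TIME values under one day. ps may prefix a day count beyond that, and this plain string sort does not handle it. The function name is hypothetical:

```shell
# Hypothetical helper: read `ps -ef` output on stdin and print "TIME PID"
# for icegridregistry entries, highest cumulative CPU time first.
# Assumes TIME ($7) is HH:MM:SS; a DD-HH:MM:SS value would mis-sort here.
busiest_threads() {
    awk '$8 == "icegridregistry" { print $7, $2 }' |
        LC_ALL=C sort -r |
        head -n "${1:-5}"
}

# Usage: ps -ef | busiest_threads 4
```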
  • benoit (Rennes, France)
    Hi Alex,

    All the threads of the Ice server thread pool are stuck in calls to the C library and GCC library (following an exception raised by the TcpTransceiver). This really looks like a compiler/libc problem to me. Which libc and compiler versions do you use? Can you make sure that you have all the patches for your distribution applied?

    Cheers,
    Benoit.
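
This reading of the dump can be reproduced mechanically. Split the output of "thread apply all bt" into per-thread blocks and report the LWPs whose stack contains __cxa_throw, the C++ runtime's throw entry point. A minimal sketch, assuming the "Thread N (Thread X (LWP Y)):" header format this gdb printed; other gdb versions format thread headers differently. The function name is hypothetical:

```shell
# Hypothetical helper: read gdb "thread apply all bt" output on stdin and
# print the LWP of every thread whose backtrace contains __cxa_throw,
# i.e. threads caught in the middle of throwing a C++ exception.
threads_in_throw() {
    awk '
        # Thread header, e.g. "Thread 18 (Thread 451821587 (LWP 16019)):"
        /^[ \t]*Thread [0-9]+ \(/ {
            lwp = ""
            if (match($0, /LWP [0-9]+/))
                lwp = substr($0, RSTART + 4, RLENGTH - 4)
            next
        }
        # Report each thread at most once.
        /__cxa_throw/ && lwp != "" && !(lwp in seen) {
            seen[lwp] = 1
            print lwp
        }
    '
}

# Usage: threads_in_throw < typescript
```

A single dump is only a snapshot, and a healthy thread can be caught mid-throw. If the same LWPs show up again in a second dump taken a minute later, they are more likely to be genuinely stuck.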
  • the compiler is from Debian 'testing', upgraded in the last couple of days:

    gcc (GCC) 4.0.3 20060128 (prerelease) (Debian 4.0.2-8)

    does ldd give reliable information on libc?

    $ ldd /opt/Ice-3.0.1/bin/icegridregistry
    linux-gate.so.1 => (0xffffe000)
    libIceGrid.so.30 => /opt/Ice-3.0.1/lib/libIceGrid.so.30 (0xb7d5c000)
    libIceStormService.so.30 => /opt/Ice-3.0.1/lib/libIceStormService.so.30 (0xb7caa000)
    libGlacier2.so.30 => /opt/Ice-3.0.1/lib/libGlacier2.so.30 (0xb7c68000)
    libFreeze.so.30 => /opt/Ice-3.0.1/lib/libFreeze.so.30 (0xb7bae000)
    libIceBox.so.30 => /opt/Ice-3.0.1/lib/libIceBox.so.30 (0xb7b95000)
    libIceXML.so.30 => /opt/Ice-3.0.1/lib/libIceXML.so.30 (0xb7b82000)
    libIce.so.30 => /opt/Ice-3.0.1/lib/libIce.so.30 (0xb78f9000)
    libIceUtil.so.30 => /opt/Ice-3.0.1/lib/libIceUtil.so.30 (0xb78b8000)
    libpthread.so.0 => /lib/tls/libpthread.so.0 (0xb7890000)
    libdb_cxx-4.3.so => /usr/lib/libdb_cxx-4.3.so (0xb7794000)
    libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0xb76b6000)
    libm.so.6 => /lib/tls/libm.so.6 (0xb7690000)
    libgcc_s.so.1 => /lib/libgcc_s.so.1 (0xb7685000)
    libc.so.6 => /lib/tls/libc.so.6 (0xb754e000)
    libexpat.so.1 => /usr/lib/libexpat.so.1 (0xb752e000)
    libbz2.so.1.0 => /lib/libbz2.so.1.0 (0xb751c000)
    libdl.so.2 => /lib/tls/libdl.so.

    do you mean Ice and IceGrid patches? I haven't applied any. I can try.

    alex
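
On the ldd question: ldd reports how the dynamic loader would resolve the binary's libraries if it were started now, in the current environment. That can differ from what an already running process has mapped, for example after a library upgrade or with a different LD_LIBRARY_PATH. On Linux, /proc/PID/maps lists the files a running process actually has mapped. A minimal sketch, assuming a Linux /proc filesystem. The function name is hypothetical:

```shell
# Hypothetical helper: list the shared objects mapped into a running
# process, as recorded in /proc/PID/maps (Linux only).
mapped_libs() {
    pid=$1
    if [ ! -r "/proc/$pid/maps" ]; then
        echo "cannot read /proc/$pid/maps" >&2
        return 1
    fi
    # The 6th field of a maps line is the backing file path, when there is one.
    awk '$6 ~ /\.so/ { print $6 }' "/proc/$pid/maps" | LC_ALL=C sort -u
}

# Usage: mapped_libs 14765 | grep -E 'libc|libgcc_s|libpthread|libstdc'
```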
  • Sorry Benoit, I got a bit confused between different machines. Disregard the last email. I run a mix of Ice 3.0.0 and 3.0.1 between the server and the clients.

    The server runs IceGrid registry 3.0.0 and ldd reports this:

    $ ldd /opt/Ice-3.0.0/bin/icegridregistry
    libIceGrid.so.30 => /opt/Ice-3.0.0/lib/libIceGrid.so.30 (0x40017000)
    libIceStormService.so.30 => /opt/Ice-3.0.0/lib/libIceStormService.so.30 (0x401bc000)
    libGlacier2.so.30 => /opt/Ice-3.0.0/lib/libGlacier2.so.30 (0x4026e000)
    libFreeze.so.30 => /opt/Ice-3.0.0/lib/libFreeze.so.30 (0x402b0000)
    libIceBox.so.30 => /opt/Ice-3.0.0/lib/libIceBox.so.30 (0x4036a000)
    libIceXML.so.30 => /opt/Ice-3.0.0/lib/libIceXML.so.30 (0x40382000)
    libIce.so.30 => /opt/Ice-3.0.0/lib/libIce.so.30 (0x40395000)
    libIceUtil.so.30 => /opt/Ice-3.0.0/lib/libIceUtil.so.30 (0x4061f000)
    libpthread.so.0 => /lib/libpthread.so.0 (0x4066f000)
    libdb_cxx-4.3.so => /usr/lib/libdb_cxx-4.3.so (0x406c2000)
    libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x407b7000)
    libm.so.6 => /lib/libm.so.6 (0x40893000)
    libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x408b8000)
    libc.so.6 => /lib/libc.so.6 (0x408c4000)
    libexpat.so.1 => /usr/lib/libexpat.so.1 (0x409dd000)
    libbz2.so.1.0 => /lib/libbz2.so.1.0 (0x409fd000)
    libdl.so.2 => /lib/libdl.so.2 (0x40a0e000)
    /lib/ld-linux.so.2 (0x40000000)

    the compiler is slightly older, but should be new enough:

    gcc (GCC) 4.0.3 20051201 (prerelease) (Debian 4.0.2-5)
  • Benoit,

    It turns out the server was running on kernel 2.4.x-SMP. That might have led to some library confusion.

    I upgraded to kernel 2.6.15-SMP and compiled Ice-3.0.1 with gcc 4.0.3. Hopefully the problem will go away; I'll let you know.

    alex
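
After an upgrade like this, it is worth confirming which threading implementation the C library actually uses. The server's ldd output shows /lib/libpthread.so.0, and the dump has a __pthread_manager thread with one PID per thread in ps. Both are characteristic of LinuxThreads. The other machine's ldd output resolves libc and libpthread from /lib/tls/, where Debian of that era typically installed the NPTL build, which needs a 2.6 kernel. Under glibc, `getconf GNU_LIBPTHREAD_VERSION` prints a string starting with either "linuxthreads" or "NPTL". A minimal sketch, assuming glibc's getconf; on other C libraries the variable may be unsupported, in which case the function falls back to "unknown". The function name is hypothetical:

```shell
# Hypothetical helper: report the kernel release and the pthread
# implementation reported by glibc's getconf.
thread_impl() {
    kernel=$(uname -r)
    # GNU_LIBPTHREAD_VERSION is a glibc extension; fall back if unsupported.
    impl=$(getconf GNU_LIBPTHREAD_VERSION 2>/dev/null) || impl=unknown
    [ -n "$impl" ] || impl=unknown
    printf 'kernel=%s pthread=%s\n' "$kernel" "$impl"
}

# Usage: thread_impl
```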
  • Just to close this issue... After upgrading to the new kernel and everything that comes with it, the registry has been running continuously for over a month now with no problems.

    thanks for your help, alex
  • benoit (Rennes, France)
    Hi Alex,

    Thanks for the info, I'm glad it works fine now!

    Cheers,
    Benoit.