Freeze failure on x86_64

spongebob (Member) Simon J. Benson | Organization: Centre for Magnetic Resonance | Project: Resonanz [www.resonanz.org.au]
Hello,

I have compiled Berkeley DB (db-4.3.27.NC) on an x86_64 platform using gcc (3.4.1) without any problems, making sure to use all the flags set out in your documentation.

I did discover a small issue when building Ice: it looks for the Berkeley DB libraries in $DB_HOME/lib64, whereas Berkeley DB installs them to $DB_HOME/lib.
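
One way to bridge the lib/lib64 mismatch is to make $DB_HOME/lib64 an alias for $DB_HOME/lib. The sketch below is an assumption for illustration, not what was actually done in this thread; the helper name `ensure_lib64_alias` is hypothetical, and the demo runs on a throwaway directory rather than a real DB_HOME.

```python
import os
import tempfile

def ensure_lib64_alias(db_home):
    """Create db_home/lib64 as a symlink to db_home/lib if lib64 is missing.

    Returns True when a link was created, False when lib64 already existed.
    """
    lib = os.path.join(db_home, "lib")
    lib64 = os.path.join(db_home, "lib64")
    if not os.path.isdir(lib):
        raise FileNotFoundError(f"{lib} does not exist")
    if os.path.lexists(lib64):
        return False
    # A relative target keeps the link valid if db_home is moved.
    os.symlink("lib", lib64)
    return True

if __name__ == "__main__":
    # Demonstrate on a temporary directory instead of a real installation.
    with tempfile.TemporaryDirectory() as db_home:
        os.mkdir(os.path.join(db_home, "lib"))
        print(ensure_lib64_alias(db_home))  # first call creates the link
        print(ensure_lib64_alias(db_home))  # second call finds it in place
```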

Just checking ....

ldd $DB_HOME/lib64/libdb_cxx-4.3.so
libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x0000002a95758000)
libm.so.6 => /lib64/tls/libm.so.6 (0x0000002a9594a000)
libc.so.6 => /lib64/tls/libc.so.6 (0x0000002a95aa3000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x0000002a95ccf000)
/lib64/ld-linux-x86-64.so.2 => /lib64/ld-linux-x86-64.so.2 (0x000000552aaaa000)

... so it did build as 64-bit.
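
The same check can be made without ldd by reading the ELF header directly: the fifth byte (EI_CLASS) is 1 for 32-bit objects and 2 for 64-bit objects. A minimal sketch follows; the function name `elf_class` is made up for this example.

```python
import sys

ELF_MAGIC = b"\x7fELF"
# EI_CLASS values from the ELF specification: ELFCLASS32 = 1, ELFCLASS64 = 2.
ELF_CLASSES = {1: "32-bit", 2: "64-bit"}

def elf_class(path):
    """Return '32-bit' or '64-bit' for an ELF file, or None if it is not ELF."""
    with open(path, "rb") as f:
        header = f.read(5)
    if len(header) < 5 or header[:4] != ELF_MAGIC:
        return None
    return ELF_CLASSES.get(header[4])

if __name__ == "__main__":
    # Usage: python elf_class.py $DB_HOME/lib64/libdb_cxx-4.3.so
    for path in sys.argv[1:]:
        print(path, elf_class(path))
```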

Ice then built fine, but when running the tests I see this :-

*** running tests in ./test/Freeze/dbmap
starting client... ok
testing populate... ok
testing map::find... ok
testing erase... ok
testing map::find (again)... ok
testing iterators... ok
testing iterator.set... ok
testing algorithms... ok
testing index ... ok
testing concurrent access... test in ./test/Freeze/dbmap failed with exit status 256
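
A note on "exit status 256": assuming the test driver prints the raw wait status it gets back from something like os.system (an assumption about the script, not confirmed in this thread), the value packs the exit code into the high byte. On Linux, 256 then decodes to a plain exit code of 1 rather than a signal. The helper name below is hypothetical.

```python
import os

def describe_wait_status(status):
    """Decode a raw POSIX wait status, such as the value os.system returns on Unix."""
    if os.WIFSIGNALED(status):
        return f"killed by signal {os.WTERMSIG(status)}"
    if os.WIFEXITED(status):
        return f"exited with code {os.WEXITSTATUS(status)}"
    return f"unrecognized status {status}"

if __name__ == "__main__":
    print(describe_wait_status(256))
```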


Running test/Freeze/complex/run.py results in no errors, but test/Freeze/evictor/run.py gives the following :-

starting server... ok
starting client... ok
testing Freeze Evictor... ../../../test/Freeze/evictor/client: warning: connection exception:
SslTransceiver.cpp:288: Ice::ConnectionLostException:
connection lost: recv() returned zero
local address = 127.0.0.1:35897
remote address = 127.0.0.1:12345
../../../test/Freeze/evictor/client: warning: connection exception:
SslTransceiver.cpp:269: Ice::ConnectionLostException:
connection lost: Connection reset by peer
local address = 127.0.0.1:35899
remote address = 127.0.0.1:35898
Network.cpp:557: Ice::ConnectionRefusedException:
connection refused: Connection refused

Any thoughts?
Bob

Comments

  • bernard (Jupiter, FL; Administrators, ZeroC Staff) Bernard Normier | Organization: ZeroC, Inc. | Project: Ice
    Hi Bob,

    Since you get an error in the threaded part of the Freeze test, I'd first double-check your Berkeley DB build. Did db's configure select POSIX mutexes? If not, you could try to reconfigure and rebuild with --enable-posixmutexes (and --enable-cxx).

    Which x86_64 distribution do you use? If it comes with a binary Berkeley DB package (4.2.52 or later), do the Freeze tests also fail with this package?

    Cheers,
    Bernard
  • spongebob (Member) Simon J. Benson | Organization: Centre for Magnetic Resonance | Project: Resonanz [www.resonanz.org.au]
    Hi Bernard,

    The x86_64 distribution is Mandrake 10.1 (2.6.8.1-20mdksmp). Its packaged Berkeley DB (4.2-4.2.52-6mdk) gives a core dump when running Ice.

    Recompiling Berkeley DB with "--enable-posixmutexes --enable-cxx", then compiling Ice and running the tests, gives the same result:

    testing concurrent access... test in ./test/Freeze/dbmap failed with exit status 256

    I have also tested the Mandrake 10.2 x86_64 distribution and things look far worse. Using Berkeley DB (db-4.3.27.NC) and the same compiler flags as above, then compiling Ice and running the tests gives

    *** running tests in ./test/Ice/operations
    tests with regular server.
    starting server... ok
    starting client... ok
    testing stringToProxy... ok
    testing checked cast... SslClientTransceiver.cpp:322: IceSSL::ProtocolException:
    encountered a violation of the ssl protocol during handshake
    1 - Thread ID: 46912512655360
    1 - Error: 336031996
    1 - Message: error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol
    1 - Location: s23_clnt.c, 478

    test in ./test/Ice/operations failed with exit status 256


    This is using lib64openssl0.9.7-0.9.7e-5mdk, which is the version you support.

    I'm aware that you don't officially support Mandrake (or Mandriva, as it's now called). Do you support Fedora Core 3 for x86_64, or is this just for x86? I will start Fedora Core 3 & 4 x86_64 testing soon, but any heads-up on what to expect would be appreciated.

    I have done some testing of Ice and find it a breeze to use over CORBA, but now we are ready to scale on a number of clusters (Sunfire) and shared memory (Altix) boxes with a combination of approximately 400+ processors in preparation for the IceGrid. I also found that you don't support IA64 so that rules out the Altix boxes. I will be needing Glacier to penetrate some firewalls to access these supercomputers.

    Many thanks,
    Bob

  • bernard (Jupiter, FL; Administrators, ZeroC Staff) Bernard Normier | Organization: ZeroC, Inc. | Project: Ice
    Hi Bob,

    I just ran the entire 2.1.2 test suite on SuSE 9.1 x86_64 (Linux 2.6.5) without any problem. I used the db-4.2.52 package that comes with this distribution (it installs the db libraries in /usr/lib64). It's surprising you have all this trouble with Mandrake x86_64.

    In the next Ice release, we will probably support FC4 x86 and x86_64, so maybe you could try FC4 x86_64 (+ the GCC 4.0.1 update)?

    Regarding "testing concurrent access... test in ./test/Freeze/dbmap failed with exit status 256": additional info, such as a stack trace, may give us some clue.

    For the Ice/operations failure, please try to run the test suite with ssl disabled, i.e. edit $ICE_HOME/config/TestUtil.py and comment out the line
    protocol = "ssl"
    This would show if this failure is ssl-specific or not.
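
The edit described above can also be scripted. The sketch below only shows the text transformation; the sample file contents and the helper name `comment_out_ssl` are assumptions for illustration, not the actual layout of TestUtil.py.

```python
import re

# Matches an active (uncommented) assignment of protocol to "ssl".
SSL_LINE = re.compile(r'^([ \t]*)(protocol[ \t]*=[ \t]*"ssl")', re.MULTILINE)

def comment_out_ssl(source):
    """Prefix every active `protocol = "ssl"` line with '#', leaving other lines alone."""
    return SSL_LINE.sub(r"\1#\2", source)

if __name__ == "__main__":
    # Hypothetical file contents, for demonstration only.
    sample = 'protocol = ""\nprotocol = "ssl"\n'
    print(comment_out_ssl(sample), end="")
```

Lines that are already commented out are left untouched, so running the helper twice has no further effect.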

    At present, we don't plan to support Linux on IA64. I suspect the port itself would be easy, but ongoing maintenance and testing would be time consuming. If you'd like to sponsor this effort, please contact us at [email protected].

    Thanks,
    Bernard
  • spongebob (Member) Simon J. Benson | Organization: Centre for Magnetic Resonance | Project: Resonanz [www.resonanz.org.au]
    Bernard,

    I am seeing the same problem on Fedora Core 3 (x86_64) with the Freeze/evictor test and the Freeze/dbmap test.

    evictor trace (server) :-


    ./server --Ice.Plugin.IceSSL=IceSSL:create --Ice.Default.Protocol=ssl --IceSSL.Server.CertPath=../../../certs --IceSSL.Server.Config=server_sslconfig.xml --Ice.Override.Compress --Ice.Default.Host=127.0.0.1 --Ice.PrintProcessId --Ice.PrintAdapterReady --Ice.NullHandleAbort --Ice.Warn.Connections --Ice.ServerIdleTime=30 --Ice.ThreadPool.Server.Size=1 --Ice.ThreadPool.Server.SizeMax=3 --Ice.ThreadPool.Server.SizeWarn=0 --Freeze.DbEnv.db.DbHome=./db --Ice.Config=./config


    ./client --Ice.Plugin.IceSSL=IceSSL:create --Ice.Default.Protocol=ssl --IceSSL.Client.CertPath=../../../certs --IceSSL.Client.Config=client_sslconfig.xml --Ice.Override.Compress --Ice.Default.Host=127.0.0.1 --Ice.NullHandleAbort --Ice.Warn.Connections --Freeze.DbEnv.db.DbHome=./db --Ice.Config=./config

    Program received signal SIGPIPE, Broken pipe.
    [Switching to Thread 1105197408 (LWP 4733)]
    0x0000003c51a0afbf in __write_nocancel () from /lib64/tls/libpthread.so.0
    (gdb) bt
    #0 0x0000003c51a0afbf in __write_nocancel () from /lib64/tls/libpthread.so.0
    #1 0x0000003c5488e867 in BIO_sock_should_retry () from /lib64/libcrypto.so.4
    #2 0x0000003c5488caf8 in BIO_write () from /lib64/libcrypto.so.4
    #3 0x0000003c5641b085 in ssl3_alert_code () from /lib64/libssl.so.4
    #4 0x0000003c5641b133 in ssl3_dispatch_alert () from /lib64/libssl.so.4
    #5 0x0000003c5641994d in ssl3_shutdown () from /lib64/libssl.so.4
    #6 0x0000002a95e7e984 in IceSSL::SslTransceiver::internalShutdownWrite (this=0x2a96100c50, timeout=0) at SslTransceiver.cpp:521
    #7 0x0000002a95e7d094 in IceSSL::SslTransceiver::close (this=0x2a96100c50) at SslTransceiver.cpp:71
    #8 0x0000002a95a41a0a in Ice::ConnectionI::finished (this=0x2a9611a010, threadPool=@0x41dfef10) at ConnectionI.cpp:1311
    #9 0x0000002a95aee01f in IceInternal::ThreadPool::run (this=0x5bc240) at ThreadPool.cpp:566
    #10 0x0000002a95aef321 in IceInternal::ThreadPool::EventHandlerThread::run (this=0x5b6530) at ThreadPool.cpp:836
    #11 0x0000002a95cc5208 in startHook (arg=0x5b6530) at Thread.cpp:491
    #12 0x0000003c51a05f81 in start_thread () from /lib64/tls/libpthread.so.0
    #13 0x0000003c50fc3af3 in thread_start () from /lib64/tls/libc.so.6
    #14 0x0000000000000000 in ?? ()


    dbmap trace (client):-


    ./client --Ice.Plugin.IceSSL=IceSSL:create --Ice.Default.Protocol=ssl --IceSSL.Client.CertPath=../../../certs --IceSSL.Client.Config=client_sslconfig.xml --Ice.Override.Compress --Ice.Default.Host=127.0.0.1 --Ice.NullHandleAbort --Ice.Warn.Connections

    Program received signal SIGSEGV, Segmentation fault.
    [Switching to Thread 1178597728 (LWP 4694)]
    0x0000002a9574f9fd in __bam_adj_log () from /home/bob/installs/Berkeley-4.3/lib64/libdb_cxx-4.3.so
    (gdb) bt
    #0 0x0000002a9574f9fd in __bam_adj_log () from /home/bob/installs/Berkeley-4.3/lib64/libdb_cxx-4.3.so
    #1 0x0000002a95741416 in __bam_adjindx () from /home/bob/installs/Berkeley-4.3/lib64/libdb_cxx-4.3.so
    #2 0x0000002a95744c95 in __bam_iitem () from /home/bob/installs/Berkeley-4.3/lib64/libdb_cxx-4.3.so
    #3 0x0000002a9573fc69 in __bam_c_put () from /home/bob/installs/Berkeley-4.3/lib64/libdb_cxx-4.3.so
    #4 0x0000002a957897a7 in __db_c_put () from /home/bob/installs/Berkeley-4.3/lib64/libdb_cxx-4.3.so
    #5 0x0000002a95789ac5 in __db_c_put () from /home/bob/installs/Berkeley-4.3/lib64/libdb_cxx-4.3.so
    #6 0x0000002a9578fb46 in __db_c_put_pp () from /home/bob/installs/Berkeley-4.3/lib64/libdb_cxx-4.3.so
    #7 0x0000002a95733e71 in Dbc::put () from /home/bob/installs/Berkeley-4.3/lib64/libdb_cxx-4.3.so
    #8 0x0000002a955d0fc2 in Freeze::IteratorHelperI::set (this=0x2a96100df0, value=@0x463ff070) at MapI.cpp:479
    #9 0x000000000040d86c in Freeze::Iterator<unsigned char, int, Test::ByteIntMapKeyCodec, Test::ByteIntMapValueCodec>::set (this=0x463ff130, value=@0x463ff10c) at Map.h:313
    #10 0x0000000000412cbd in WriteThread::run (this=0x620aa0) at Client.cpp:167
    #11 0x0000002a95cc5208 in startHook (arg=0x620aa0) at Thread.cpp:491
    #12 0x0000003c51a05f81 in start_thread () from /lib64/tls/libpthread.so.0
    #13 0x0000003c50fc3af3 in thread_start () from /lib64/tls/libc.so.6
    #14 0x0000000000000000 in ?? ()


    Is this a PEBKAC error?
    Bob
  • bernard (Jupiter, FL; Administrators, ZeroC Staff) Bernard Normier | Organization: ZeroC, Inc. | Project: Ice
    Hi Bob,

    Two interesting stack traces :)

    For the first one, it's surprising to see a SIGPIPE: when you create your first communicator, Ice changes the disposition of SIGPIPE to SIG_IGN (ignore), and it's only when you destroy your last communicator that Ice changes SIGPIPE back to its default disposition (exit with this message). Maybe we do this too early?
    Could you comment out the "restore SIGPIPE to default disposition" code in src/Ice/Instance.cpp (lines 658-662) to see if this bug goes away?
    It would also be interesting to run this test without OpenSSL (see my previous post).
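
To illustrate the two SIGPIPE dispositions discussed here, the following sketch writes to a pipe whose read end is closed. It is not Ice's code; it only shows the general POSIX behaviour, and all function names are made up for the example. With SIG_IGN the write fails with EPIPE and the process carries on; with the default disposition the writing process is terminated by the signal, so that case is run in a child process.

```python
import os
import signal
import subprocess
import sys

def write_to_closed_pipe():
    """Write to a pipe whose read end is already closed."""
    read_end, write_end = os.pipe()
    os.close(read_end)
    try:
        os.write(write_end, b"x")
        return "written"
    except BrokenPipeError:
        return "EPIPE"
    finally:
        os.close(write_end)

def with_sigpipe_ignored():
    """SIG_IGN: the write fails with EPIPE and the process keeps running."""
    previous = signal.signal(signal.SIGPIPE, signal.SIG_IGN)
    try:
        return write_to_closed_pipe()
    finally:
        signal.signal(signal.SIGPIPE, previous)

CHILD_SOURCE = """
import os, signal
signal.signal(signal.SIGPIPE, signal.SIG_DFL)  # default disposition
read_end, write_end = os.pipe()
os.close(read_end)
os.write(write_end, b"x")  # the kernel delivers SIGPIPE here
"""

def with_sigpipe_default():
    """SIG_DFL: run the write in a child process, which is killed by SIGPIPE."""
    child = subprocess.run([sys.executable, "-c", CHILD_SOURCE])
    return child.returncode  # negative signal number when killed by a signal

if __name__ == "__main__":
    print(with_sigpipe_ignored())
    print(with_sigpipe_default() == -signal.SIGPIPE)
```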

    For the dbmap stack trace, it looks like a Berkeley DB bug; I'll contact Sleepycat. Could you post or e-mail me the complete stack traces (thread apply all bt)? And two more pieces of info:
    - do you see this crash for every test run?
    - what kind of server do you use? (dual Xeon /EM64T, quad Opteron, ... ?)
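
Collecting the stack of every thread can be done non-interactively. The sketch below builds a gdb command line using the -batch, -ex and --args options, assuming a gdb recent enough to support -ex; the helper name is hypothetical and the program arguments are just an example taken from the client command line earlier in this thread.

```python
import shlex

def gdb_all_threads_command(program, program_args=()):
    """Build a gdb command line that runs a program and dumps every thread's stack."""
    return [
        "gdb", "-batch",
        "-ex", "run",                  # start the program under gdb
        "-ex", "thread apply all bt",  # after it stops, print all backtraces
        "--args", program, *program_args,
    ]

if __name__ == "__main__":
    cmd = gdb_all_threads_command("./client", ["--Ice.NullHandleAbort"])
    # Print a copy-and-paste friendly shell command.
    print(" ".join(shlex.quote(part) for part in cmd))
```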

    Thanks,
    Bernard
    [email protected]
  • spongebob (Member) Simon J. Benson | Organization: Centre for Magnetic Resonance | Project: Resonanz [www.resonanz.org.au]
    Hi Bernard,

    test/Freeze/evictor
    Running this test with the protocol="" results in :-

    starting server... ok
    starting client... ok
    testing Freeze Evictor... ../../../test/Freeze/evictor/client: warning: connection exception:
    TcpTransceiver.cpp:285: Ice::ConnectionLostException:
    connection lost: Connection reset by peer
    local address = 127.0.0.1:32842
    remote address = 127.0.0.1:12345
    failed!
    Client.cpp:271: assertion `false' failed
    failed!
    Client.cpp:271: assertion `false' failed
    sh: line 1: 4557 Segmentation fault ../../../test/Freeze/evictor/server --Ice.Override.Compress --Ice.Default.Host=127.0.0.1 --Ice.PrintProcessId --Ice.PrintAdapterReady --Ice.NullHandleAbort --Ice.Warn.Connections --Ice.ServerIdleTime=30 --Ice.ThreadPool.Server.Size=1 --Ice.ThreadPool.Server.SizeMax=3 --Ice.ThreadPool.Server.SizeWarn=0 --Freeze.DbEnv.db.DbHome=../../../test/Freeze/evictor/db --Ice.Config=../../../test/Freeze/evictor/config 2>&1
    sh: line 1: 4561 Aborted ../../../test/Freeze/evictor/client --Ice.Override.Compress --Ice.Default.Host=127.0.0.1 --Ice.NullHandleAbort --Ice.Warn.Connections --Freeze.DbEnv.db.DbHome=../../../test/Freeze/evictor/db --Ice.Config=../../../test/Freeze/evictor/config 2>&1


    Run again and it looks a little different :-

    starting server... ok
    starting client... ok
    testing Freeze Evictor... ../../../test/Frsh: line 1: 4592 Segmentation fault ../../../test/Freeze/evictor/server --Ice.Override.Compress --Ice.Default.Host=127.0.0.1 --Ice.PrintProcessId --Ice.PrintAdapterReady --Ice.NullHandleAbort --Ice.Warn.Connections --Ice.ServerIdleTime=30 --Ice.ThreadPool.Server.Size=1 --Ice.ThreadPool.Server.SizeMax=3 --Ice.ThreadPool.Server.SizeWarn=0 --Freeze.DbEnv.db.DbHome=../../../test/Freeze/evictor/db --Ice.Config=../../../test/Freeze/evictor/config 2>&1
    eeze/evictor/client: warning: connection exception:
    TcpTransceiver.cpp:217: Ice::ConnectionLostException:
    connection lost: recv() returned zero
    local address = 127.0.0.1:32847
    remote address = 127.0.0.1:12345
    TcpTransceiver.cpp:217: Ice::ConnectionLostException:
    connection lost: recv() returned zero


    Commenting out src/Ice/Instance.cpp:658-662 .... compiling and running again :-

    starting server... ok
    starting client... ok
    testing Freeze Evictor... sh: line 1: 6160 Segmentation fault ../../../test/Freeze/evictor/server --Ice.Override.Compress --Ice.Default.Host=127.0.0.1 --Ice.PrintProcessId --Ice.PrintAdapterReady --Ice.NullHandleAbort --Ice.Warn.Connections --Ice.ServerIdleTime=30 --Ice.ThreadPool.Server.Size=1 --Ice.ThreadPool.Server.SizeMax=3 --Ice.ThreadPool.Server.SizeWarn=0 --Freeze.DbEnv.db.DbHome=../../../test/Freeze/evictor/db --Ice.Config=../../../test/Freeze/evictor/config 2>&1
    ../../../test/Freeze/evictor/client: warning: connection exception:
    TcpTransceiver.cpp:217: Ice::ConnectionLostException:
    connection lost: recv() returned zero
    local address = 127.0.0.1:32868
    remote address = 127.0.0.1:12345
    ../../../test/Freeze/evictor/client: warning: connection exception:
    TcpTransceiver.cpp:285: Ice::ConnectionLostException:
    connection lost: Connection reset by peer
    local address = 127.0.0.1:32870
    remote address = 127.0.0.1:32869
    Network.cpp:557: Ice::ConnectionRefusedException:
    connection refused: Connection refused


    And then again with the protocol="ssl"

    starting server... ok
    starting client... ok
    testing Freeze Evictor... ../../../test/Freeze/evictor/client: sh: line 1: 6148 Segmentation fault ../../../test/Freeze/evictor/server --Ice.Plugin.IceSSL=IceSSL:create --Ice.Default.Protocol=ssl --IceSSL.Server.CertPath=../../../certs --IceSSL.Server.Config=server_sslconfig.xml --Ice.Override.Compress --Ice.Default.Host=127.0.0.1 --Ice.PrintProcessId --Ice.PrintAdapterReady --Ice.NullHandleAbort --Ice.Warn.Connections --Ice.ServerIdleTime=30 --Ice.ThreadPool.Server.Size=1 --Ice.ThreadPool.Server.SizeMax=3 --Ice.ThreadPool.Server.SizeWarn=0 --Freeze.DbEnv.db.DbHome=../../../test/Freeze/evictor/db --Ice.Config=../../../test/Freeze/evictor/config 2>&1
    warning: connection exception:
    SslTransceiver.cpp:288: Ice::ConnectionLostException:
    connection lost: recv() returned zero
    local address = 127.0.0.1:32864
    remote address = 127.0.0.1:12345
    ../../../test/Freeze/evictor/client: warning: connection exception:
    SslTransceiver.cpp:288: Ice::ConnectionLostException:
    connection lost: recv() returned zero
    local address = 127.0.0.1:32866
    remote address = 127.0.0.1:32865
    Network.cpp:557: Ice::ConnectionRefusedException:
    connection refused: Connection refused



    test/Freeze/dbmap
    This problem is intermittent. The complete stack traces are in the attached dbmap.txt

    All these tests are with Fedora Core 3 (x86_64) [2.6.9-1.667smp] on a Sun Sunfire V20Z (Dual AMD Opteron 250) node using Ice 2.1.2

    Hope this helps, many thanks for your support.
    Cheers,
    Bob
  • bernard (Jupiter, FL; Administrators, ZeroC Staff) Bernard Normier | Organization: ZeroC, Inc. | Project: Ice
    Hi Bob,

    The test/evictor failures actually all look the same: the client fails (connection lost) because the server crashed (segfault).
    The best approach would be to find out why the server crashes: instead of using ./run.py to run the test, you can start the client and server "by hand":
    ./server --Ice.Config=config
    ./client
    (with $ICE_HOME/lib in your LD_LIBRARY_PATH)
    You could also run the server in gdb.
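
If the manual run is scripted, the environment needs $ICE_HOME/lib on LD_LIBRARY_PATH as noted above. A minimal sketch of building such an environment follows; the helper name `env_with_ice_libs` and the fallback path /opt/Ice are placeholders, not values from this thread.

```python
import os

def env_with_ice_libs(ice_home, base_env=None):
    """Return a copy of the environment with ice_home/lib first in LD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    lib_dir = os.path.join(ice_home, "lib")
    existing = env.get("LD_LIBRARY_PATH", "")
    env["LD_LIBRARY_PATH"] = lib_dir + (os.pathsep + existing if existing else "")
    return env

if __name__ == "__main__":
    # The fallback path is a placeholder for when ICE_HOME is not set.
    env = env_with_ice_libs(os.environ.get("ICE_HOME", "/opt/Ice"))
    print(env["LD_LIBRARY_PATH"])
    # From the test directory, the server could then be started with:
    # subprocess.Popen(["./server", "--Ice.Config=config"], env=env)
```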

    For the dbmap crash, I contacted Sleepycat support; can you reproduce this crash with a debug version of Berkeley DB (built with -g instead of -O to get line numbers)?

    Thanks,
    Bernard
  • spongebob (Member) Simon J. Benson | Organization: Centre for Magnetic Resonance | Project: Resonanz [www.resonanz.org.au]
    Bernard,

    I built Berkeley DB with debugging enabled; the gdb results are attached. If you want any more variations on the evictor server options, I'd be happy to oblige.

    Cheers,
    Bob.
  • bernard (Jupiter, FL; Administrators, ZeroC Staff) Bernard Normier | Organization: ZeroC, Inc. | Project: Ice
    Hi Bob,

    I've just posted a patch for Berkeley DB 4.3 in the Patches forum. Please apply this patch and try again!

    Thanks,
    Bernard
  • spongebob (Member) Simon J. Benson | Organization: Centre for Magnetic Resonance | Project: Resonanz [www.resonanz.org.au]
    Bernard,

    You're a champion, the patch works fine! allTests.py results in an ok for all tests on both Fedora Core 3 (x86_64) and Mandrake 10.1 (x86_64).

    Many thanks again!

    Cheers,
    Bob
  • bernard (Jupiter, FL; Administrators, ZeroC Staff) Bernard Normier | Organization: ZeroC, Inc. | Project: Ice
    Glad I could help! It will be easier to enjoy Ice now :)

    Cheers,
    Bernard