timeout and connection lost problem

Our program is under load test. During the test we found that the following error messages were sometimes generated. The first 11 lines are logs generated by our application program, and the last line is an error message generated by icegridnode:
2007-06-04 09:02:59 Ice::TimeoutExceptionOutgoing.cpp
2007-06-04 09:02:59 Ice::TimeoutExceptionOutgoing.cpp
2007-06-04 09:02:59 Ice::TimeoutExceptionOutgoing.cpp
2007-06-04 09:02:59 Ice::TimeoutExceptionOutgoing.cpp
2007-06-04 09:02:59 Ice::TimeoutExceptionOutgoing.cpp
2007-06-04 09:02:59 Ice::TimeoutExceptionOutgoing.cpp
2007-06-04 09:02:59 Ice::TimeoutExceptionOutgoing.cpp
2007-06-04 09:03:05 Ice::ConnectionLostExceptionTcpTransceiver.cpp
2007-06-04 09:03:05 Ice::ConnectionLostExceptionTcpTransceiver.cpp
2007-06-04 09:03:05 Ice::ConnectionLostExceptionTcpTransceiver.cpp
2007-06-04 09:03:05 Ice::ConnectionLostExceptionTcpTransceiver.cpp

06/04/07 09:03:06.125 a node with the same name is already active with the replica `Master'

What leads to these errors? A network error or something else?

When is the message "a node with the same name is already active with the replica `Master'" generated?

Comments

  • benoit
    benoit Rennes, France
    Hi,

    This usually shouldn't happen if there's no network connectivity issue between the registry and node. The message indicates that the registry rejected the node registration because it thinks it's already registered.

    You can enable the following properties in the node and registry configuration files to figure out why this occurs:
    IceGrid.Node.Trace.Replica=2 # In the node config file
    IceGrid.Registry.Trace.Node=2 # In the registry config file

    Cheers,
    Benoit.
  • Thanks for your reply, Benoit.

    Judging from the error messages, I think there must have been some network connectivity issues during our test. At first, TimeoutException and ConnectionLostException were caught for requests, and at the same time, I think, contact between the registry and the node was lost too; then the node tried to contact the registry and register itself again.
    But I still have a question. In the stderr.txt of the registry, I find:
    [ 06/04/07 08:57:47.607 Node: node `CSESrvModule1' down ]
    [ 06/04/07 08:57:47.607 Node: node `CSESrvModule1' up ]
    (The clock of the computer on which the registry runs is several minutes later than that of the computer on which the node runs.)

    The above shows that the registry detected that the node went down, so why did the registry reject the node registration?
  • benoit
    benoit Rennes, France
    I would need more information to be able to answer your question. Do you have the traces of the node where it indicates that it lost connectivity with the registry? The best would be to reproduce the problem with the traces I've indicated above and post the relevant traces here. You should also set up the clocks correctly on both machines so that it's easier to read the traces.

    Cheers,
    Benoit.
  • Thanks again, Benoit.

    I am very sorry, but at the moment I don't have any node traces that indicate it lost connectivity with the registry. Our test is running right now and will last several days, so I can't change the node's configuration and restart it. I will do some tests in a few days; if the problem still exists, I will collect the traces you mentioned above and post them here.
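
Reading node and registry traces side by side is harder when the two machines' clocks disagree, as noted in this thread. Until the clocks are synchronized, one option is to shift the registry timestamps onto the node's timeline before comparing. The following is a minimal Python sketch of that idea, not something provided by Ice: the names `parse_line` and `merge_traces` are hypothetical, the two timestamp layouts are inferred from the log lines quoted above, and the clock offset is an assumption that has to be measured on the actual machines.

```python
from datetime import datetime, timedelta

# Timestamp layouts inferred from the lines quoted in this thread (assumption):
#   application log:  2007-06-04 09:02:59 ...
#   IceGrid trace:    06/04/07 09:03:06.125 ...
FORMATS = (
    ("%Y-%m-%d %H:%M:%S", 19),
    ("%m/%d/%y %H:%M:%S.%f", 21),
)


def parse_line(line):
    """Split a log line into (timestamp, message), trying both layouts."""
    # Registry traces in the thread are wrapped in "[ ... ]"; drop the brackets.
    text = line.strip().strip("[]").strip()
    for fmt, width in FORMATS:
        try:
            stamp = datetime.strptime(text[:width], fmt)
        except ValueError:
            continue
        return stamp, text[width:].strip()
    raise ValueError("unrecognized timestamp in line: %r" % line)


def merge_traces(node_lines, registry_lines, registry_offset):
    """Merge both logs into one list sorted on the node's clock.

    registry_offset is the timedelta to ADD to registry timestamps so that
    they match the node's clock. Its value and sign are assumptions here and
    must be measured on the real machines.
    """
    events = []
    for line in node_lines:
        stamp, message = parse_line(line)
        events.append((stamp, "node", message))
    for line in registry_lines:
        stamp, message = parse_line(line)
        events.append((stamp + registry_offset, "registry", message))
    # Sort by adjusted timestamp only, keeping input order for ties.
    events.sort(key=lambda event: event[0])
    return events
```

Calling `merge_traces(node_lines, registry_lines, timedelta(minutes=5))` with lines read from the two log files returns `(timestamp, source, message)` tuples in a single timeline; the five-minute offset is only a placeholder.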