Archived

This forum has been archived. Please start a new discussion on GitHub.

Connection delayed related Ice.Override.ConnectTimeout setting

I deploy an icegrid application in which there is a server named writeDataAgentServer-1, it provides a proxy interface. Then a client accesses this interface and something strange happens: If client is not on server's machine, the calling to service is sometimes delayed but completes successfully, this happens randomly but frequently (Maybe 1 of 3 chances). But if client is on same machine, it always works fine.

Could anybody please tell me how to inspect this problem further?

There are some hints I already found:
1. Delayed time always equals to Ice.Override.ConnectTimeout setting of client. Whenever it is changed, delayed time changed accordingly.
2. By capturing network packages, I found interaction with icegridnode is fine, the delay happens before client launchs TCP connection to my writeDataAgent proxy. There is no message from server side to trigger the TCP connection launching, it seems client just wait for timeout then do the job.
3. It's not likely because of network quality. All my machines are in the same LAN. I deployed 2 icegrid environment, both have same issue.

Below are some environment info:
============================

Ice version: 3.5.1

Server OS: Red Hat Enterprise Linux Server release 5.4 (Tikanga)

Client source code:
//I use Ice::Application for client programming
//_writeServant is write-dataagent servant name, TransactionPrx is pointer to slice class.
Ice::ObjectPrx base = _communicator->stringToProxy(_writeServant);
TransactionPrx transaction = TransactionPrx::checkedCast(base);
if(transaction)
{
return (ReturnCode)(transaction->execute(sql, proto, inParamStream));
}
else
{
return DSE::dataAgent::IceError;
}

Client config file:
Ice.Default.Locator=DbGrid/Locator:tcp -h 10.1.4.248 -p 4455
Ice.Override.ConnectTimeout=50000
Ice.Override.Timeout=10000

icegrid config file:
#
# The IceGrid instance name.
#
IceGrid.InstanceName=DbGrid

#
# The IceGrid locator proxy.
#
Ice.Default.Locator=DbGrid/Locator:tcp -h 10.1.4.248 -p 4455

#
# IceGrid registry configuration.
#
IceGrid.Registry.Client.Endpoints=tcp -h 10.1.4.248 -p 4455
IceGrid.Registry.Server.Endpoints=default
IceGrid.Registry.Internal.Endpoints=default
IceGrid.Registry.Data=db/registry
IceGrid.Registry.PermissionsVerifier=DbGrid/NullPermissionsVerifier
IceGrid.Registry.AdminPermissionsVerifier=DbGrid/NullPermissionsVerifier

#
# IceGrid node configuration.
#
IceGrid.Node.Name=node1
IceGrid.Node.Endpoints=default
IceGrid.Node.Data=db/node
IceGrid.Node.CollocateRegistry=1

#
# Trace properties.
#
IceGrid.Node.Trace.Activator=1

#
# Dummy username and password for icegridadmin.
#
IceGridAdmin.Username=dbgrid
IceGridAdmin.Password=dbgrid

application xml:
[HTML]
...
<server-template id="DataAgentServer">
<parameter name="index"/>
<parameter name="type"/>
<parameter name="db_server"/>
<parameter name="db_user"/>
<parameter name="db_pwd"/>
<server id="${type}DataAgentServer-${index}" exe="dataagent" activation="always">
<adapter name="DataAgent" endpoints="tcp" replica-group="${type}DataAgentReplicaGroup">
<property name="Identity" value="${type}DataAgent"/>
<property name="DBConnectionPool.server" value="${db_server}"/>
<property name="DBConnectionPool.user" value="${db_user}"/>
<property name="DBConnectionPool.passwd" value="${db_pwd}"/>
<property name="DBConnectionPool.sizeMin" value="1"/>
<property name="DBConnectionPool.sizeMax" value="50"/>
<property name="DBConnectionPool.idleTime" value="600"/>
<property name="DBConnectionPool.waitTime" value="5"/>
<property name="Ice.ThreadPool.Server.Size" value="20"/>
<property name="Ice.ThreadPool.Server.SizeMax" value="20"/>
<property name="Ice.ThreadPool.Client.Size" value="20"/>
<property name="Ice.ThreadPool.Client.SizeMax" value="20"/>
<property name="Ice.MessageSizeMax" value="5120"/>
<property name="Ice.ACM.Client" value="60"/>
<property name="Ice.ACM.Server" value="60"/>
<property name="Ice.Default.EndpointSelection" value="Random"/>
<property name="Ice.Default.LocatorCacheTimeout" value="-1"/>
<property name="Ice.Override.ConnectTimeout" value="5000"/>
<property name="Ice.Override.Timeout" value="60000"/>
<!--property name="Ice.Plugin.DSELogger.cpp" value="DSELoggerPlugin:createLogger"/-->
<property name="Ice.Plugin.DBPlugin.cpp" value="mysqldbplugin:createDBPlugin"/>
<!--property name="Ice.Plugin.DBPlugin.cpp" value="oracledbplugin:createDBPlugin"/-->
</adapter>
</server>
</server-template>
...
<replica-group id="readDataAgentReplicaGroup">
<load-balancing type="ordered" n-replicas="1"/>
<object identity="readDataAgent" type="::DSE::dataAgent::Transaction"/>
</replica-group>
<replica-group id="writeDataAgentReplicaGroup">
<load-balancing type="ordered" n-replicas="1"/>
<object identity="writeDataAgent" type="::DSE::dataAgent::Transaction"/>
</replica-group>
...
<node name="node1">
<server-instance template="DataAgentServer" index="1" type="read" db_server="10.1.4.248:3306@PayDB" db_user="payuser" db_pwd="xxx"/>
<server-instance template="DataAgentServer" index="1" type="write" db_server="10.1.4.248:3306@PayDB" db_user="payuser" db_pwd="xxx"/>
</node>
[/HTML]

Stack of client while operation is hanging:
(gdb) thread apply all bt

Thread 5 (Thread 0x426ab940 (LWP 6192)):
#0  0x000000387fe30838 in do_sigwait () from /lib64/libc.so.6
#1  0x000000387fe308dd in sigwait () from /lib64/libc.so.6
#2  0x00002ac63f0d0880 in sigwaitThread () at CtrlCHandler.cpp:166
#3  0x0000003880a0673d in start_thread () from /lib64/libpthread.so.0
#4  0x000000387fed3d1d in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x430ac940 (LWP 6193)):
#0  0x0000003880a0b150 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00002ac63df7d934 in IceUtil::Cond::timedWaitImpl<IceUtil::Mutex> (this=0x1d313628, mutex=..., timeout=...)
    at ../../include/IceUtil/Cond.h:301
#2  0x00002ac63df7da24 in IceUtil::Monitor<IceUtil::Mutex>::timedWait (this=0x1d313628, timeout=...)
    at ../../include/IceUtil/Monitor.h:180
#3  0x00002ac63f0ec984 in IceUtil::Timer::run (this=0x1d313620) at Timer.cpp:195
#4  0x00002ac63f0e9e82 in startHook (arg=0x1d313708) at Thread.cpp:644
#5  0x0000003880a0673d in start_thread () from /lib64/libpthread.so.0
#6  0x000000387fed3d1d in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x43aad940 (LWP 6194)):
#0  0x0000003880a0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00002ac63def7365 in IceUtil::Cond::waitImpl<IceUtil::Mutex> (this=0x1d3138f8, mutex=...) at ../../include/IceUtil/Cond.h:266
#2  0x00002ac63df2a3e4 in IceUtil::Monitor<IceUtil::Mutex>::wait (this=0x1d3138f8) at ../../include/IceUtil/Monitor.h:152
#3  0x00002ac63df6a871 in IceInternal::EndpointHostResolver::run (this=0x1d3138b0) at EndpointI.cpp:238
#4  0x00002ac63f0e9e82 in startHook (arg=0x1d3138b0) at Thread.cpp:644
#5  0x0000003880a0673d in start_thread () from /lib64/libpthread.so.0
#6  0x000000387fed3d1d in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x444ae940 (LWP 6195)):
#0  0x000000387fed4108 in epoll_wait () from /lib64/libc.so.6
#1  0x00002ac63e10e763 in IceInternal::Selector::select (this=0x1d313df0, handlers=..., timeout=0) at Selector.cpp:447
#2  0x00002ac63e13788c in IceInternal::ThreadPool::run (this=0x1d313d60, thread=...) at ThreadPool.cpp:665
#3  0x00002ac63e138171 in IceInternal::ThreadPool::EventHandlerThread::run (this=0x1d313fc0) at ThreadPool.cpp:1175
#4  0x00002ac63f0e9e82 in startHook (arg=0x1d313fc0) at Thread.cpp:644
#5  0x0000003880a0673d in start_thread () from /lib64/libpthread.so.0
#6  0x000000387fed3d1d in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x2ac6407e8a60 (LWP 6191)):
#0  0x0000003880a0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00002ac63def7365 in IceUtil::Cond::waitImpl<IceUtil::Mutex> (this=0x1d317cc8, mutex=...) at ../../include/IceUtil/Cond.h:266
#2  0x00002ac63df2a3e4 in IceUtil::Monitor<IceUtil::Mutex>::wait (this=0x1d317cc8) at ../../include/IceUtil/Monitor.h:152
#3  0x00002ac63df2587c in IceInternal::ConnectRequestHandler::getConnection (this=0x1d317ca0, waitInit=true)
    at ConnectRequestHandler.cpp:280
#4  0x00002ac63df25e2e in IceInternal::ConnectRequestHandler::sendRequest (this=0x1d317ca0, out=0x7fff7a083bd0)
    at ConnectRequestHandler.cpp:219
#5  0x00002ac63e0af0e4 in IceInternal::Outgoing::invoke (this=0x7fff7a083bd0) at Outgoing.cpp:181
#6  0x00002ac63e0d84d4 in IceDelegateM::Ice::Object::ice_isA (this=0x1d31b330, __id=..., context=0x0, observer=...)
    at Proxy.cpp:1557
#7  0x00002ac63e0db8d5 in IceProxy::Ice::Object::ice_isA (this=0x1d321d00, typeId=..., context=0x0) at Proxy.cpp:147
#8  0x00002ac63e0e3b3a in IceProxy::Ice::Object::ice_isA (this=0x1d321d00, typeId=...) at ../../include/Ice/Proxy.h:244
#9  0x00002ac63ec645b6 in IceInternal::checkedCastImpl<IceInternal::ProxyHandle<IceProxy::DSE::dataAgent::Transaction> > (b=...,
    context=0x0) at /opt/aspire/product/compiler/common/ice/Ice-3.5.1/include/Ice/Proxy.h:1449
#10 0x00002ac63ec646b4 in IceInternal::checkedCastHelper<IceProxy::DSE::dataAgent::Transaction, IceProxy::Ice::Object> (b=...,
    ctx=0x0) at /opt/aspire/product/compiler/common/ice/Ice-3.5.1/include/Ice/ProxyHandle.h:90
#11 0x00002ac63ec646f2 in IceInternal::ProxyHandle<IceProxy::DSE::dataAgent::Transaction>::checkedCast<IceProxy::Ice::Object> (
    r=...) at /opt/aspire/product/compiler/common/ice/Ice-3.5.1/include/Ice/ProxyHandle.h:256
#12 0x00002ac63ec5e7c2 in DSE::dataAgent::SqlMapClient::executeSql (this=0x1d313ef0, sql=..., parameter=0x0, parameterNum=0)
    at SqlMapClient.cpp:139
#13 0x00002ac63ec60262 in DSE::dataAgent::SqlMapClient::executeSql (this=0x1d313ef0, id=..., operatorCode=Insert)
    at SqlMapClient.cpp:340
#14 0x00002ac63ec60308 in DSE::dataAgent::SqlMapClient::insertData (this=0x1d313ef0, id=...) at SqlMapClient.cpp:403
#15 0x00000000004070a8 in TestClient::testExecuteWithNoParam() ()
#16 0x00000000004087b8 in TestClient::run(int, char**) ()
#17 0x00002ac63def2602 in Ice::Application::doMain (this=0x7fff7a084db0, argc=1, argv=0x7fff7a084ec8, initData=...)
    at Application.cpp:673
#18 0x00002ac63def44a1 in Ice::Application::main (this=0x7fff7a084db0, argc=1, argv=0x7fff7a084ec8, initializationData=...)
    at Application.cpp:423
#19 0x00002ac63def4def in Ice::Application::main (this=0x7fff7a084db0, argc=1, argv=0x7fff7a084ec8,
    configFile=0x40b186 "config.client") at Application.cpp:343
#20 0x0000000000408a1a in main ()
(gdb)

Comments

  • benoit
    benoit Rennes, France
    Hi,

    This is most likely because your server is running on a machine with multiple network interfaces and one of them is inaccessible to the client.

    You can enable Ice.Trace.Network=2 in the client configuration to check to which IP address the client tries to connect to and see which IP fails (once the connect timeout elapses).

    Assuming the server runs on the machine with the IP address 10.1.4.28, adding the -h 10.1.4.248 option in the tcp endpoint of the server's object adapter should fix this problem:

    < adapter name="DataAgent" endpoints="tcp -h 10.1.4.28" replica-group="${type}DataAgentReplicaGroup" >

    Cheers,
    Benoit.
  • Hi Benoit,

    Thank you very much. It's exactly because of multiple IP address. After adding -h to endpoints property, everything works fine. You save my day!

    Best Regards,
    Alvan.