Archived

This forum has been archived. Please start a new discussion on GitHub.

ICE hanging on RHEL 5

Hello,

we are using ICE for application development. We use IceGridRegistry, with about 12 components connected to it, running on the same machine. Recently the system suddenly started hanging on our RHEL 5 machine. Other distributions (e.g., Scientific Linux 6) do not appear to have the same problem.

The ice version is
ice-3.4.2-1.rhel5

What I observe is this: my code issues the standard "activate" call, which hangs. I tried to do a network trace, but I cannot derive anything useful from it.

The code fragment in question (in Java) is:

boolean success = true;
try {
IceSimpleLogger.log("Activating LoggerServer");
m_adapter.activate();
}
catch (ConnectionRefusedException e) {
IceSimpleLogger.log("Cannot activate LoggerServer: Ice Connection Refused");
e.printStackTrace();
success = false;

} catch (Exception e) {
IceSimpleLogger.log("Cannot activate LoggerServer");
success = false;
e.printStackTrace();
}
if (success) {
IceSimpleLogger.log("Activated LoggerServer");
}
}

The "log" function is a simple print statement, so nothing to hang there.

What I see on my trace printout is that the first message ("Activating logger server") is printed out fine, but the second message ("Activated logger server") is never printed, and no exceptions are printed, either.

I have the network trace below. What could be causing this?

Thanks,

Myrosia

Client network trace
[iceInterfaces.servers.LoggerServer -1 <init>] Ice Initialized in LoggerServer
-- 1/20/12 19:00:39:272 Network: attempting to bind to tcp socket 0.0.0.0:0
-- 1/20/12 19:00:39:273 Network: accepting tcp connections at 0:0:0:0:0:0:0:0:41657
local interfaces: 192.168.122.1, 172.20.48.219, 129.215.91.94, 127.0.0.1
-- 1/20/12 19:00:39:276 Network: published endpoints for object adapter `BeegleBeetleLogger':
tcp -h 192.168.122.1 -p 41657:tcp -h 172.20.48.219 -p 41657:tcp -h 129.215.91.94 -p 41657
[iceInterfaces.servers.LoggerServer -1 <init>] Establishing object beetleLogger@BeegleBeetleLogger
[iceInterfaces.servers.LoggerServer -1 start] Activating LoggerServer
-- 1/20/12 19:00:39:325 Network: trying to establish tcp connection to 127.0.0.1:4061
-- 1/20/12 19:00:39:370 Network: tcp connection established
local address = 127.0.0.1:35280
remote address = 127.0.0.1:4061
-- 1/20/12 19:00:39:370 Network: received 14 of 14 bytes via tcp
local address = 127.0.0.1:35280
remote address = 127.0.0.1:4061
-- 1/20/12 19:00:39:377 Network: sent 55 of 55 bytes via tcp
local address = 127.0.0.1:35280
remote address = 127.0.0.1:4061
-- 1/20/12 19:00:39:380 Network: received 14 of 14 bytes via tcp
local address = 127.0.0.1:35280
remote address = 127.0.0.1:4061
-- 1/20/12 19:00:39:380 Network: received 139 of 139 bytes via tcp
local address = 127.0.0.1:35280
remote address = 127.0.0.1:4061
-- 1/20/12 19:00:39:399 Network: trying to establish tcp connection to 192.168.122.1:43928
-- 1/20/12 19:00:39:402 Network: tcp connection established
local address = 192.168.122.1:59144
remote address = 192.168.122.1:43928
-- 1/20/12 19:00:39:403 Network: received 14 of 14 bytes via tcp
local address = 192.168.122.1:59144
remote address = 192.168.122.1:43928
-- 1/20/12 19:00:39:405 Network: sent 203 of 203 bytes via tcp
local address = 192.168.122.1:59144
remote address = 192.168.122.1:43928
Nothing beyond that point is ever printed, the process hangs



IceGrid network trace
-! 01/20/12 18:58:37.679 warning: deprecated property: Ice.Trace.Location
-! 01/20/12 18:58:39.899 icegridregistry: warning: setting `Ice.ThreadPool.Serve
r.Size' is not useful, you should set individual adapter thread pools instead.
-- 01/20/12 18:58:39.904 icegridregistry: Network: trying to establish tcp conne
ction to 127.0.0.1:4061
-- 01/20/12 18:58:40.016 icegridregistry: Network: failed to establish tcp conne
ction
local address: 127.0.0.1:35240
remote address: 127.0.0.1:4061
Network.cpp:1243: Ice::ConnectionRefusedException:
connection refused: Connection refused
-- 01/20/12 18:58:40.042 icegridregistry: Network: trying to establish tcp conne
ction to 127.0.0.1:4061
-- 01/20/12 18:58:40.043 icegridregistry: Network: failed to establish tcp conne
ction
local address: 127.0.0.1:35241
remote address: 127.0.0.1:4061
Network.cpp:1243: Ice::ConnectionRefusedException:
connection refused: Connection refused
-- 01/20/12 18:58:40.084 icegridregistry: Network: attempting to bind to tcp soc
ket 0.0.0.0:0
-- 01/20/12 18:58:40.086 icegridregistry: Network: accepting tcp connections at
0.0.0.0:60252
local interfaces: 129.215.91.94, 172.20.60.167, 192.168.122.1, 127.0.0.1
-- 01/20/12 18:58:40.113 icegridregistry: Network: published endpoints for objec
t adapter `IceGrid.Registry.Internal':
tcp -h 129.215.91.94 -p 60252:tcp -h 172.20.60.167 -p 60252:tcp -h 192.168.12
2.1 -p 60252
-- 01/20/12 18:58:40.156 icegridregistry: Network: trying to establish tcp conne
ction to 172.20.60.167:60252
-- 01/20/12 18:58:40.158 icegridregistry: Network: accepted tcp connection
local address = 172.20.60.167:60252
remote address = 172.20.60.167:44631
-- 01/20/12 18:58:40.158 icegridregistry: Network: tcp connection established
local address = 172.20.60.167:44631
remote address = 172.20.60.167:60252
-- 01/20/12 18:58:40.159 icegridregistry: Network: sent 14 of 14 bytes via tcp
local address = 172.20.60.167:60252
remote address = 172.20.60.167:44631
-- 01/20/12 18:58:40.159 icegridregistry: Network: received 14 of 14 bytes via t
cp
local address = 172.20.60.167:44631
remote address = 172.20.60.167:60252
-- 01/20/12 18:58:40.159 icegridregistry: Network: sent 84 of 84 bytes via tcp
local address = 172.20.60.167:44631
remote address = 172.20.60.167:60252
-- 01/20/12 18:58:40.159 icegridregistry: Network: received 14 of 14 bytes via tcp
local address = 127.0.0.1:4061
remote address = 127.0.0.1:35280
-- 01/20/12 19:00:39.377 icegridregistry: Network: received 41 of 41 bytes via t
cp
local address = 127.0.0.1:4061
remote address = 127.0.0.1:35280
-- 01/20/12 19:00:39.377 icegridregistry: Network: sent 153 of 153 bytes via tcp
local address = 127.0.0.1:4061
remote address = 127.0.0.1:35280
-- 01/20/12 19:00:39.400 icegridregistry: Network: accepted tcp connection
local address = 192.168.122.1:43928
remote address = 192.168.122.1:59144
-- 01/20/12 19:00:39.400 icegridregistry: Network: sent 14 of 14 bytes via tcp
local address = 192.168.122.1:43928
remote address = 192.168.122.1:59144
-- 01/20/12 19:00:39.405 icegridregistry: Network: received 14 of 14 bytes via t
cp
local address = 192.168.122.1:43928
remote address = 192.168.122.1:59144
-- 01/20/12 19:00:39.405 icegridregistry: Network: received 189 of 189 bytes via
tcp
local address = 192.168.122.1:43928
remote address = 192.168.122.1:59144
-- 01/20/12 19:02:41.505 icegridregistry: Network: accepted tcp connection
local address = 129.215.91.94:43928
remote address = 129.215.91.94:42830
-- 01/20/12 19:02:41.506 icegridregistry: Network: sent 14 of 14 bytes via tcp
local address = 129.215.91.94:43928
remote address = 129.215.91.94:42830
-- 01/20/12 19:02:41.506 icegridregistry: Network: received 14 of 14 bytes via tcp
local address = 129.215.91.94:43928
remote address = 129.215.91.94:42830
-- 01/20/12 19:02:41.506 icegridregistry: Network: received 174 of 174 bytes via tcp
local address = 129.215.91.94:43928
remote address = 129.215.91.94:42830
-- 01/20/12 19:03:09.975 icegridregistry: Network: accepted tcp connection
local address = 192.168.122.1:43928
remote address = 192.168.122.1:59147
-- 01/20/12 19:03:09.976 icegridregistry: Network: sent 14 of 14 bytes via tcp
local address = 192.168.122.1:43928
remote address = 192.168.122.1:59147
-- 01/20/12 19:03:09.976 icegridregistry: Network: received 14 of 14 bytes via tcp
local address = 192.168.122.1:43928
remote address = 192.168.122.1:59147
-- 01/20/12 19:03:09.976 icegridregistry: Network: received 183 of 183 bytes via tcp
local address = 192.168.122.1:43928
remote address = 192.168.122.1:59147



Comments

  • Update: I just saw my application get unstuck after sitting there for 10 minutes (while I was composing the previous message and copying and pasting the printouts). It promptly got stuck again on activating the second server (there are two separate ICE servers in that process). The second time was a little faster - about 3 minutes (though of course that is much too slow). But then the first call from the client hang as well, at the point of invoking the proxy.

    This really looks to me like some iceregistry problem. I'm happy to provide debug information for that as necessary. The RHEL 5 is what our project partners are required to use, so I'd really appreciate help figuring this out.

    Myrosia
  • mes
    mes California
    Hi Myrosia,

    One log entry that stands out to me is this:
    -- 01/20/12 18:58:40.113 icegridregistry: Network: published endpoints for object adapter `IceGrid.Registry.Internal':
    tcp -h 129.215.91.94 -p 60252:tcp -h 172.20.60.167 -p 60252:tcp -h 192.168.122.1 -p 60252

    Your server is clearly configured with multiple network interfaces. If not handled carefully, this configuration can often lead to unexpected problems. What happens is that proxies published by an object adapter contain all of these "published endpoints", while only some of them are actually reachable by the client. If the Ice run time in the client is trying to establish a new connection and happens to select one of these unreachable endpoints, the client can appear to hang.

    I'm not convinced that you're experiencing a similar issue, but I thought I should mention it so that you take this into consideration. Pay close attention to the addresses shown in the network tracing logs so that you are sure it's not a contributing factor.

    If all the components of your application (client, IceGrid registry/node, servers) are running on the same host, one test you can run is to define Ice.Default.Host=127.0.0.1 for every component. This property will force all object adapters to listen only on the localhost interface, and consequently only publish that address in their proxies. If the lengthy delays disappear, that's a very good indication that a network configuration issue is to blame.

    One other suggestion I can make is to enable Ice.Trace.Protocol=1. This will include details of each operation in the log, making it easier to associate the network activity with the application activity.

    If you're still having trouble, please post updated logs that include the protocol tracing.

    Regards,
    Mark
  • Thanks. Setting Ice.Default.Host = 127.0.0.1 resolved the problem.

    However, the "multiple interface" problem is inherent to RHEL 5. After some more investigation, I traced the issue to the "virbr0" interface. This is standard RHEL setup, with this interface automatically set up by the kernel and used for virtualization support. Other than turning off virtualization entirely (which I cannot do for other reasons), I cannot shut it off, so there will always be multiple interfaces. Is there a better way to get around it, other than maintaining separate ICE configuration for RHEL machines?


    Myrosia
  • mes
    mes California
    Hi,

    I'm glad to hear you figured out the issue.

    Ice doesn't provide a way to selectively exclude certain network interfaces. For a given endpoint, Ice either listens on INADDR_ANY (i.e., all available interfaces), or it listens on the interface corresponding to a specific host name or IP address (specified by Ice.Default.Host or the -h option in the endpoint).

    If all your communication occurs on the same host, setting Ice.Default.Host to the localhost interface is the simplest solution.

    Regards,
    Mark