Archived

This forum has been archived. Please start a new discussion on GitHub.

Problem with .checkedCast

Hi, I'm working with Ice and have noticed the following problem when the primary registry's interface is down. When its interface is up, a task takes 0.10 seconds.
When the interface is down, the task generally takes 3.10 seconds. We think this is because of the extra time needed to connect to the secondary registry. We have noticed that the error occurs when calling ExecServantPrx.checkedCast(communicator.stringToProxy());
In the forum we have seen that other people have had similar issues, but in their case the cause was the stringToProxy call. We have verified that this is not our case: the actual delay is in the checkedCast.
Any help would be greatly appreciated.
Thank you

Comments

  • bernard
    bernard Jupiter, FL
    Hi Jose,

    Welcome to our forums!

    stringToProxy is a local call: it just manufactures a local proxy from the provided string. As a result, it's very fast.

    So most of the work (in particular the indirect-proxy resolution) occurs during the first remote invocation - in your case, the checkedCast call.
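    To make the distinction concrete, here is a toy model in Python (this is not the Ice API; the class and helper names are invented for illustration). The one accurate detail is that checkedCast is built on the ice_isA operation, which asks the server whether it implements the given interface:

```python
# Toy model of proxy creation and casting; NOT the Ice API.
class ToyProxy:
    def __init__(self, proxy_string):
        # stringToProxy only parses the string -- no network activity.
        self.proxy_string = proxy_string
        self.remote_calls = 0

    def ice_isA(self, type_id):
        # The first remote invocation on a proxy is where the locator
        # lookup and connection establishment actually happen.
        self.remote_calls += 1
        return True  # pretend the server implements the interface

def string_to_proxy(s):
    return ToyProxy(s)  # purely local, hence fast

def checked_cast(proxy, type_id):
    # checkedCast verifies the type with the server via ice_isA,
    # so it triggers the first (expensive) remote invocation.
    return proxy if proxy.ice_isA(type_id) else None

def unchecked_cast(proxy, type_id):
    # uncheckedCast skips the remote type check entirely.
    return proxy

p = string_to_proxy("ExecServant:tcp -h primary -p 4061")
print(p.remote_calls)   # 0: nothing remote has happened yet
checked_cast(p, "::ExecServant")
print(p.remote_calls)   # 1: the cast went over the network
```

    In the real API, uncheckedCast is the way to defer that first remote invocation if you do not need the server-side type check at cast time.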

    You should be able to find more information on what's happening in your client by enabling tracing. For example:
    Ice.Trace.Network=2
    Ice.Trace.Locator=2
    Ice.Trace.Retry=1

    (See http://www.zeroc.com/doc/Ice-3.4.1-IceTouch/manual/PropRef.50.2.html for details).

    Best regards,
    Bernard
  • Hi Bernard, thank you for answering.

    First of all I would like to quickly explain the situation:

    I have a Primary and Secondary registry.
    And two clients (clientA -> Primary, clientB -> Secondary), one on the primary registry node and the other on the secondary, for HA.

    For some unknown reason, the primary registry randomly stops its network interfaces, making any connection impossible.
    My original configuration for both clients used the following endpoints:
    Ice.Default.Locator=Grid/Locator:tcp -h primary -p 4061:tcp -h secondary -p 4061

    The problem was that clientA would obviously not be able to connect because its network was down.
    And clientB would connect, but since it tries endpoints in random order by default, it would
    sometimes spend unnecessary time trying to connect to the primary registry, and only after a random delay (generally 3 seconds, although sometimes much more) would it end
    up connecting to the secondary registry.

    The solution I have used was to set the clientB locator to:

    Ice.Default.Locator=Grid/Locator:tcp -h secondary -p 4061:tcp -h primary -p 4061
    &
    Ice.Default.EndpointSelection=Ordered

    With this configuration, when everything is up, the load is spread over the primary and secondary registries (clientA talks to the primary, clientB to the secondary).
    If clientA is down, as is the primary registry because of the network interface failure on that node,
    clientB connects to the secondary registry. There is no point in balancing load when there is only one node :D
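    The effect of Ordered versus the default Random endpoint selection can be sketched with a small, self-contained simulation (pure Python, not Ice; the 3-second penalty per dead endpoint is an assumption based on the timings above):

```python
import random

def connect_time(endpoints, alive, ordered, penalty=3.0, connect=0.1, rng=None):
    """Simulate the time to reach the first live endpoint.

    endpoints: list of host names in configuration order.
    alive:     set of hosts that accept connections.
    ordered:   True  -> try endpoints in configuration order (Ordered)
               False -> try them in random order (Ice's default, Random)
    Each dead endpoint costs `penalty` seconds before the next is tried.
    """
    order = list(endpoints)
    if not ordered:
        (rng or random).shuffle(order)
    total = 0.0
    for host in order:
        if host in alive:
            return total + connect
        total += penalty  # connection attempt times out
    raise RuntimeError("no endpoint reachable")

# Primary registry's interface is down; only 'secondary' answers.
alive = {"secondary"}

# clientB with secondary listed first and Ordered selection:
print(connect_time(["secondary", "primary"], alive, ordered=True))   # 0.1

# With Random selection, roughly half the attempts hit the dead primary first:
rng = random.Random(1)
times = [connect_time(["primary", "secondary"], alive, False, rng=rng)
         for _ in range(1000)]
print(max(times))   # 3.1 whenever the dead primary is tried first
```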

    So far so good, but I have noticed the following problem when executing from clientB (which should only connect to the secondary registry):
    when I stop the network interface on the primary registry, after approximately one minute (the TCP timeout on Linux is 60 seconds), the node that executes takes a LONG
    time to respond:

    -- 11/03/10 14:09:21.133 Locator: searching for object by id
    object = HPCBanestoV2Servant
    -- 11/03/10 14:09:21.137 Network: trying to establish tcp connection to 192.10.0.230:4061
    -- 11/03/10 14:09:21.137 Network: tcp connection established
    local address = 192.10.0.230:52030
    remote address = 192.10.0.230:4061
    -- 11/03/10 14:09:21.137 Network: received 14 of 14 bytes via tcp
    local address = 192.10.0.230:52030
    remote address = 192.10.0.230:4061
    -- 11/03/10 14:09:21.137 Network: sent 86 of 86 bytes via tcp
    local address = 192.10.0.230:52030
    remote address = 192.10.0.230:4061
    -- 11/03/10 14:09:21.137 Network: received 14 of 14 bytes via tcp
    local address = 192.10.0.230:52030
    remote address = 192.10.0.230:4061
    -- 11/03/10 14:09:21.138 Network: received 64 of 64 bytes via tcp
    local address = 192.10.0.230:52030
    remote address = 192.10.0.230:4061
    -- 11/03/10 14:09:21.138 Locator: retrieved endpoints from locator, adding to locator table
    object = HPCBanestoV2Servant
    endpoints = tcp -h 192.10.0.6 -p 35731
    -- 11/03/10 14:09:21.138 Network: trying to establish tcp connection to 192.10.0.6:35731
    -- 11/03/10 14:09:21.138 Network: tcp connection established
    local address = 192.10.0.230:49816
    remote address = 192.10.0.6:35731
    -- 11/03/10 14:09:21.138 Network: received 14 of 14 bytes via tcp
    local address = 192.10.0.230:49816
    remote address = 192.10.0.6:35731
    -- 11/03/10 14:09:21.138 Network: sent 84 of 84 bytes via tcp
    local address = 192.10.0.230:49816
    remote address = 192.10.0.6:35731
    -- 11/03/10 14:09:21.138 Network: received 14 of 14 bytes via tcp
    local address = 192.10.0.230:49816
    remote address = 192.10.0.6:35731
    -- 11/03/10 14:09:21.138 Network: received 12 of 12 bytes via tcp
    local address = 192.10.0.230:49816
    remote address = 192.10.0.6:35731
    -- 11/03/10 14:09:21.139 Network: sent 1413 of 1413 bytes via tcp
    local address = 192.10.0.230:49816
    remote address = 192.10.0.6:35731

    Exactly one minute after (I think it's the 60-second TCP timeout on Linux):
  • bernard
    bernard Jupiter, FL
    Hi Jose,

    I don't see any issue with the trace you posted:
    192.10.0.230 = secondary registry?
    192.10.0.6 = target object adapter
    and your entire trace covers just 6 ms (14:09:21.133 to 14:09:21.139)

    Could you clarify where you see a 60s delay?

    Best regards,
    Bernard
  • I am sorry, the message seems to have been cut off.
    I have pasted the rest of the message:


    Exactly one minute after (I think it's the 60-second TCP timeout on Linux):

    -- 11/03/10 14:10:21.192 Network: sent 14 of 14 bytes via tcp
    local address = 192.10.0.230:52030
    remote address = 192.10.0.230:4061
    -- 11/03/10 14:10:21.193 Network: closing tcp connection
    local address = 192.10.0.230:52030
    remote address = 192.10.0.230:4061

    This is the LONG time I was talking about!!!

    -- 11/03/10 14:24:42.250 Network: received 14 of 14 bytes via tcp
    local address = 192.10.0.230:49816
    remote address = 192.10.0.6:35731
    -- 11/03/10 14:24:42.250 Network: received 406 of 406 bytes via tcp
    local address = 192.10.0.230:49816
    remote address = 192.10.0.6:35731

    The end result is that a task that should take 0.13 seconds takes 921 seconds.
    After this, everything works perfectly.
    Any help would be greatly appreciated.
  • bernard
    bernard Jupiter, FL
    Hi Jose,
    -- 11/03/10 14:10:21.192 Network: sent 14 of 14 bytes via tcp
    local address = 192.10.0.230:52030
    remote address = 192.10.0.230:4061
    -- 11/03/10 14:10:21.193 Network: closing tcp connection
    local address = 192.10.0.230:52030
    remote address = 192.10.0.230:4061

    This just indicates that the Ice client runtime is closing the connection to the registry, most likely because it has been idle for over a minute. That's the default behavior; while you troubleshoot these issues, you may want to disable client-side active connection management (ACM): see http://www.zeroc.com/doc/Ice-3.4.1-IceTouch/manual/PropRef.50.11.html
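    For example, disabling client-side ACM in clientB's configuration could look like this (Ice.ACM.Client takes a value in seconds; 0 disables it):
    Ice.ACM.Client=0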

    The rest of your trace is 14 minutes later. Perhaps another pasting issue? It may be easier to attach your trace log than paste it inline.

    The first thing you want to find out here is what is taking so much time. Is it the connection establishment to the registry? To the target server? Or is it a request sent to the server that takes unusually long? (In that case, you'll want to look more closely at the implementation of this operation).
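    If the operating system's TCP timeout turns out to be the culprit, you can also cap connection establishment and invocations at the Ice level; the values below are in milliseconds and are only example figures to tune for your environment:
    Ice.Override.ConnectTimeout=5000
    Ice.Override.Timeout=10000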

    Tracing and possibly attaching with a debugger are your best tools here. You can enable protocol tracing to get information about the requests sent/received by Ice:
    Ice.Trace.Protocol=1

    Best regards,
    Bernard