Archived

This forum has been archived. Please start a new discussion on GitHub.

Slow grid start - node seems to be unreachable

Hi Support Team

we currently test ICE 3.2.0 with a configuration that uses different applications running on two ice grid nodes "appserver" and "mediaserver". Following questions came up during the tests

  1. After starting both icegridnodes it takes >3min until mediaserver print its up message, Even if we have no apps deployed. As we have a load <0.3 on the servers and both are connected via GBit ethernet there seems no reason to be so slow. Do you have any ideas on how to improve this behaviour?

  2. During debugging behaviour 1 another point came up, ice does not create a log even if we set the IceGrid.Node.Output property to an existing writable folder and the icegrid is runs as root.

  3. Finally we need to start an apache that is started as root but switch to apache user after initialization. If ice grid starts this process (running as root) it prints a message "server is not allowed to run as root". While this is a nice security feature - is there a way to overwrite this behaviour?

We run Ice 3.2.0, at log level 2 we got the following output
At appserver (hosts registry, no applications deployed)
[root@int_pia1 fridge]# ./run.sh
Start IceGrid!
[ 06/13/07 14:22:19.195 Node: node 'appserver' up ]
[ 06/13/07 14:25:57.484 Node: node 'mediaserver' up ]
]

At mediaserver (no applications deployed)
[root@int_pia2 fridge]# ./run.sh
Start IceGrid!
[ 06/13/07 14:22:48.265 Replica: trying to establish session with replica Master' ]
[ 06/13/07 14:25:57.390 Replica: established session with replica Master' ]

Our appserver has the following config:
# File config.icegrid
# Registry configuration
IceGrid.Registry.Client.Endpoints=tcp -p 12000
IceGrid.Registry.Server.Endpoints=tcp
IceGrid.Registry.Internal.Endpoints=tcp
IceGrid.Registry.Data=db/registry
IceGrid.Registry.AdminPermissionsVerifier=IceGrid/NullPermissionsVerifier
IceGrid.Registry.AdminCryptPasswords=/conf/authentications.txt

IceGrid.Registry.Trace.Node=2
IceGrid.Registry.Trace.Locator=2
IceGrid.Node.RedirectErrToOut=1
IceGrid.Node.Output=/var/log/ice-applications/fridge

# Only required if you want servers to register
# themselves without explicit deployment. If all
# servers are deployed explicitly, this property
# can be left unset.
IceGrid.Registry.DynamicRegistration=1

# Node configuration
IceGrid.Node.CollocateRegistry=1
IceGrid.Node.Name=appserver
IceGrid.Node.Endpoints=tcp
IceGrid.Node.Data=db/node

# Set the default locator so the node and admin
# tools can find the registry.
Ice.Default.Locator=IceGrid/Locator:tcp -h int1.ourdomain.de -p 12000

The mediaserver uses this config.
# File config.icegrid

# Only required if you want servers to register
# themselves without explicit deployment. If all
# servers are deployed explicitly, this property
# can be left unset.
IceGrid.Registry.DynamicRegistration=1

# Node configuration
IceGrid.Node.CollocateRegistry=0
IceGrid.Node.Name=mediaserver
IceGrid.Node.Endpoints=tcp
IceGrid.Node.Data=db/node

IceGrid.Node.Trace.Replica=2
IceGrid.Registry.Trace.Locator=2
IceGrid.Node.RedirectErrToOut=1
IceGrid.Node.Output=/var/log/ice-applications/fridge

# Set the default locator so the node and admin
# tools can find the registry.
Ice.Default.Locator=IceGrid/Locator:tcp -h int1.ourdomain.de -p 12000


Thanks in advance

Wilko Kempa
sd&m AG
Common Service Plattform

Comments

  • benoit
    benoit Rennes, France
    Hi,

    Here's the answers to your three questions:
    1. You should enable network tracing on the mediaserver node to figure out why it's taking so long for the node to connect to the registry. You can add the Ice.Trace.Network=2 property to the node configuration file to see the network tracing. Most likely, the node is trying to connect to unreachable interfaces because of the configuration of your registry endpoints:
      IceGrid.Registry.Client.Endpoints=tcp -p 12000
      IceGrid.Registry.Server.Endpoints=tcp
      IceGrid.Registry.Internal.Endpoints=tcp
      Here, you don't specify any -h option in the endpoints so the registry will listen on all the local interfaces of your machine and will publish proxies with endpoints containing all these local interfaces. If this is the issue, you should specify explicitly on which interface the registry should listen. For example, to only listen on the interface with the IP address 192.168.1.2:
      IceGrid.Registry.Client.Endpoints=tcp -p 12000 -h 192.168.1.2
      IceGrid.Registry.Server.Endpoints=tcp -h 192.168.1.2
      IceGrid.Registry.Internal.Endpoints=tcp -h 192.168.1.2
    2. The IceGrid.Node.Output property is only to redirect the stderr/stdout of servers hosted and managed by the node. To redirect the stderr/stdout of the icegridnode process itself, you should use the Ice.StdErr and Ice.StdOut properties. See here for more information on these 2 properties.
    3. I'm afraid there's currently no way to override this behavior and to start a process as root. I agree that this is a bit restrictive. I'll add this on our TODO list for the next release.

    Cheers,
    Benoit.
  • benoit
    benoit Rennes, France
    Btw, unless you're running the IceGrid node in a secure network environment, I strongly recommend you to secure access to the IceGrid node endpoint with IceSSL to ensure that only the IceGrid registry can talk to the node. This is especially important if you run the IceGrid node as root.

    For more information on securing IceGrid, you can to take a look at Matthew's newsletter article from the Connections issue #17.

    Cheers,
    Benoit.
  • Problem solved thx

    Hi Benoit,

    you will be pleased to hear your were right.
    1. appserver (registry) tried to bind to all existing network interfaces, which failed for some. Adding -h .. to the registry properties fixed this problem. appserver and mediaserver become active within 200ms.
    2. After adding Ice.Trace.StdOut and Ice.Trace.StrErr the log files are written as expected. As a side effect we figure out that using the same config to start icegrid and icegridadmin is a bad idea as icegridadmin is locked if logging is activated.
    3. thanks for the patch

    Best Regards
    Wilko

    Wilko Kempa
    sd&m AG
    Common Service Plattform
  • I having a similar long load time issue. However, I added the -h fix and I am still getting the issue. I have confirmed that the issue is related to a network address. Here are my config files.

    Icebox:
    #
    # The IceBox server endpoint configuration
    #
    IceBox.ServiceManager.Endpoints=tcp -p 9998 -h 10.110.3.60
    
    #
    # The IceStorm service
    #
    IceBox.Service.IceStorm=IceStormService,32:createIceStorm --Ice.Config=config.service
    
    #
    # Warn about connection exceptions
    #
    #Ice.Warn.Connections=1
    
    #
    # Network Tracing
    #
    # 0 = no network tracing
    # 1 = trace connection establishment and closure
    # 2 = like 1, but more detailed
    # 3 = like 2, but also trace data transfer
    #
    #Ice.Trace.Network=1
    
    #
    # Protocol Tracing
    #
    # 0 = no protocol tracing
    # 1 = trace protocol messages
    #
    #Ice.Trace.Protocol=1
    

    Grid:
    IceGrid.InstanceName=BDR
    
    #
    # The IceGrid locator proxy.
    #
    Ice.Default.Locator=BDR/Locator:default -p 12000
    
    #
    # IceGrid registry configuration.
    #
    IceGrid.Registry.Client.Endpoints=default -p 12000 -h 10.110.3.60
    IceGrid.Registry.Server.Endpoints=default -h 10.110.3.60
    IceGrid.Registry.Internal.Endpoints=default -h 10.110.3.60
    IceGrid.Registry.Data=/usr/bright/apps/bdr/db/registry
    IceGrid.Registry.PermissionsVerifier=BDR/NullPermissionsVerifier
    IceGrid.Registry.AdminPermissionsVerifier=BDR/NullPermissionsVerifier
    IceGrid.Registry.SSLPermissionsVerifier=BDR/NullSSLPermissionsVerifier
    IceGrid.Registry.AdminSSLPermissionsVerifier=BDR/NullSSLPermissionsVerifier
    
    #
    # IceGrid node configuration.
    #
    IceGrid.Node.Name=localhost
    IceGrid.Node.Endpoints=default
    IceGrid.Node.Data=/usr/bright/apps/bdr/db/node
    IceGrid.Node.CollocateRegistry=1
    #IceGrid.Node.Output=db
    #IceGrid.Node.RedirectErrToOut=1
    
    #
    # Trace properties.
    #
    IceGrid.Node.Trace.Activator=1
    IceGrid.Node.Trace.Patch=1
    #IceGrid.Node.Trace.Adapter=2
    #IceGrid.Node.Trace.Server=3
    
    # Allow root access
    IceGrid.Node.AllowRunningServersAsRoot=1
    
    #
    # Dummy username and password for icegridadmin.
    #
    IceGridAdmin.Username=foo
    IceGridAdmin.Password=bar
    
    # Events
    IceStorm.TopicManager.Proxy=IceStorm/TopicManager
    
    IceGrid.Registry.DefaultTemplates=/usr/share/Ice-3.2.1/templates.xml
    
    # The devices to ignore on the server
    BDR.IGNORED_DEVICES=sda,sdb,vsd
    
    # The full path to the various executables
    Path.vsdcfg=/usr/bright/drive/vsdcfg
    Path.vsdinfo=/usr/bright/drive/vsdinfo
    Path.lmdd=/usr/bright/apps/lmdd
    Path.bdrdd=/usr/bright/apps/bdrdd
    Path.scum=/usr/bright/apps/scum
    Path.bdm=/usr/bright/apps/bdm
    
    # Number of seconds to wait until timing out waiting for a Semaphore
    Semaphore.waittime=15
    

    Service:
    #
    # This property is used by the administrative client to connect to IceStorm.
    #
    # IceStorm.TopicManager.Proxy=BDR/TopicManager:default -p 20000
    
    #
    # This property defines the endpoints on which the IceStorm
    # TopicManager listens.
    #
    IceStorm.TopicManager.Endpoints=default -p 20000 -h 10.110.0.60
    
    #
    # The IceStorm service instance name.
    #
    IceStorm.InstanceName=BDR
    
    #
    # This property defines the endpoints on which the topic
    # publisher objects listen. If you want to federate
    # IceStorm instances this must run on a fixed port (or use
    # IceGrid).
    #
    IceStorm.Publish.Endpoints=default -p 20001:udp -p 20001
    
    #
    # TopicManager Tracing
    #
    # 0 = no tracing
    # 1 = trace topic creation, subscription, unsubscription
    # 2 = like 1, but with more detailed subscription information
    #
    IceStorm.Trace.TopicManager=2
    
    Ice.Default.Locator=BDR/Locator:default -p 12000 -h 10.110.0.60
    
    #
    # Topic Tracing
    #
    # 0 = no tracing
    # 1 = trace unsubscription diagnostics
    #
    IceStorm.Trace.Topic=1
    
    #
    # Subscriber Tracing
    #
    # 0 = no tracing
    # 1 = subscriber diagnostics (subscription, unsubscription, event
    #     propagation failures)
    #
    IceStorm.Trace.Subscriber=1
    
    #
    # Flush Tracing (for batch mode transfer flushing)
    #
    # 0 = no tracing
    # 1 = trace activity of flusher thread
    #
    IceStorm.Trace.Flush=1
    
    #
    # Amount of time in milliseconds between flushes for batch mode
    # transfer. The minimum allowable value is 100ms.
    #
    IceStorm.Flush.Timeout=2000
    
    #
    # Network Tracing
    #
    # 0 = no network tracing
    # 1 = trace connection establishment and closure
    # 2 = like 1, but more detailed
    # 3 = like 2, but also trace data transfer
    #
    #Ice.Trace.Network=1
    
    #
    # This property defines the home directory of the Freeze 
    # database environment for the IceStorm service.
    #
    Freeze.DbEnv.IceStorm.DbHome=/usr/bright/apps/bdr/db/icestorm
    
  • benoit
    benoit Rennes, France
    Hi,

    I can't see anything wrong in the IceBox and IceGrid configuration files. In the service configuration file, you don't specify endpoints for the IceStorm.Publish.Endpoints property and the endpoints from Ice.Default.Locator don't match the ones you define for IceGrid (10.110.3.60 vs. 10.110.0.60).

    I'm afraid without more information it's difficult to figure out what the problem could be. Could you detail when the timing issue occurs and for which servers? Did you run the servers with Ice.Trace.Network=2 to see if they aren't still trying to connect to some unreachable IP addresses?

    Could you also detail your deployment? It sounds like you're using an IceGrid node with a collocated registry and an IceBox server with the IceStorm service but the IceBox server isn't managed by the IceGrid node. Is this correct?


    Thanks,

    Cheers,
    Benoit.
  • I changed the IP addresses to be the same. That didn't help.

    The issue seems to be related to Linux networking alias. Eth0 is given two IP addresses and is divided into 2 devices, eth0 and eth0:0. Either, the eth0:0 device is giving the problems or the IP address given to eth0:0 is giving the problems. Do you know of any binding issues with aliases?

    You are correct about the deployment. I will post the trace in a bit.
  • benoit
    benoit Rennes, France
    I'm not aware of any issues with aliases. Let us know if you need more help with this.

    Cheers,
    Benoit.
  • Sorry, I was sidetracked. Here is the output with Ice.Trace.Network=2.

    Here is the client output:
    [ 3/5/08 18:41:29:173 Network: attempting to bind to tcp socket 10.110.3.10:0 ]
    [ 3/5/08 18:41:29:175 Network: accepting tcp connections at 10.110.3.10:52553 ]
    [ 3/5/08 18:41:29:176 Network: attempting to bind to tcp socket 127.0.0.1:0 ]
    [ 3/5/08 18:41:29:176 Network: accepting tcp connections at 127.0.0.1:40687 ]
    [ 3/5/08 18:41:29:182 Network: attempting to bind to udp socket 10.110.3.10:0 ]
    [ 3/5/08 18:41:29:183 Network: starting to receive udp packets
      local address = 10.110.3.10:32930
      remote address = <not connected> ]
    [ 3/5/08 18:41:29:191 Network: attempting to bind to udp socket 127.0.0.1:0 ]
    [ 3/5/08 18:41:29:191 Network: starting to receive udp packets
      local address = 127.0.0.1:32932
      remote address = <not connected> ]
    [ 3/5/08 18:41:29:194 Network: attempting to bind to ssl socket 10.110.3.10:0 ]
    [ 3/5/08 18:41:29:466 Network: accepting ssl connections at 10.110.3.10:42473 ]
    [ 3/5/08 18:41:29:466 Network: attempting to bind to ssl socket 127.0.0.1:0 ]
    [ 3/5/08 18:41:29:467 Network: accepting ssl connections at 127.0.0.1:35930 ]
    18:41:29,474 [main] DEBUG com.brightsystemsinc.util.ClientAdapter  - Object adapter set
    [ 3/5/08 18:41:30:908 Network: trying to establish tcp connection to 10.110.3.60:12000 ]
    [ 3/5/08 18:41:30:910 Network: tcp connection established
      local address = 10.110.3.10:48942
      remote address = 10.110.3.60:12000 ]
    18:41:39,924 [main] ERROR com.brightsystemsinc.bdrgui.BDRGUI  - There was an error
    Ice.NoEndpointException
        proxy = "BDRDriveInfoService -t"
    	at IceInternal.RoutableReference.createConnection(RoutableReference.java:418)
    	at IceInternal.IndirectReference.getConnection(IndirectReference.java:181)
    	at Ice._ObjectDelM.setup(_ObjectDelM.java:258)
    	at Ice.ObjectPrxHelperBase.__getDelegate(ObjectPrxHelperBase.java:1050)
    	at Ice.ObjectPrxHelperBase.ice_isA(ObjectPrxHelperBase.java:85)
    	at Ice.ObjectPrxHelperBase.ice_isA(ObjectPrxHelperBase.java:61)
    	at com.brightsystemsinc.bdrinterface.BDRInterface.BDRDriveInfo.BDRDriveInfoServicePrxHelper.checkedCast(BDRDriveInfoServicePrxHelper.java:297)
    	at com.brightsystemsinc.display.CartridgeSetTabDisplay.init(CartridgeSetTabDisplay.java:134)
    	at com.brightsystemsinc.display.CartridgeSetTabDisplay.<init>(CartridgeSetTabDisplay.java:106)
    	at com.brightsystemsinc.bdrgui.BDRGUI.initFrame(BDRGUI.java:171)
    	at com.brightsystemsinc.bdrgui.BDRGUI.run(BDRGUI.java:104)
    	at Ice.Application.main(Application.java:114)
    	at Ice.Application.main(Application.java:57)
    	at com.brightsystemsinc.bdrgui.BDRGUI.main(BDRGUI.java:402)
    

    Server Output:
    [ 03/05/08 10:42:04.981 Network: trying to establish tcp connection to 127.0.0.1:12000 ]
    [ 03/05/08 10:42:04.982 Network: trying to establish tcp connection to 127.0.0.1:12000 ]
    icegridadmin: could not contact the default locator:
    Network.cpp:664: Ice::ConnectionRefusedException:
    connection refused: Connection refused
    

    Icebox output:
    [ 03/05/08 10:42:48.847 Network: attempting to bind to tcp socket 10.110.3.60:9998 ]
    [ 03/05/08 10:42:48.847 Network: accepting tcp connections at 10.110.3.60:9998 ]
    [ 03/05/08 10:42:48.861 Network: attempting to bind to tcp socket 10.110.3.60:20000 ]
    [ 03/05/08 10:42:48.861 Network: accepting tcp connections at 10.110.3.60:20000 ]
    [ 03/05/08 10:42:48.862 Network: attempting to bind to tcp socket 127.0.0.1:20001 ]
    [ 03/05/08 10:42:48.862 Network: accepting tcp connections at 127.0.0.1:20001 ]
    [ 03/05/08 10:42:48.862 Network: attempting to bind to tcp socket 10.110.3.60:20001 ]
    [ 03/05/08 10:42:48.862 Network: accepting tcp connections at 10.110.3.60:20001 ]
    [ 03/05/08 10:42:48.862 Network: attempting to bind to udp socket 127.0.0.1:20001 ]
    [ 03/05/08 10:42:48.862 Network: starting to receive udp packets
    local address = 127.0.0.1:20001
    remote address = <not connected> ]
    [ 03/05/08 10:42:48.862 Network: attempting to bind to udp socket 10.110.3.60:20001 ]
    [ 03/05/08 10:42:48.862 Network: starting to receive udp packets
    local address = 10.110.3.60:20001
    remote address = <not connected> ]

    Thanks for your help.
  • Nevermind, I forgot to add the IP for IceGrid.
  • I'm back again. Somehow the 2nd IP address got lost. 1.53.1.8 is an IP on the server's network, but is not part of the client's network. 10.110.3.60 is part of the client's network and is configured for that IP address. Here is the output from the client:
    [ 3/5/08 21:41:21:297 Network: attempting to bind to tcp socket 10.110.3.10:0 ]
    [ 3/5/08 21:41:21:299 Network: accepting tcp connections at 10.110.3.10:52033 ]
    [ 3/5/08 21:41:21:299 Network: attempting to bind to tcp socket 127.0.0.1:0 ]
    [ 3/5/08 21:41:21:300 Network: accepting tcp connections at 127.0.0.1:55538 ]
    [ 3/5/08 21:41:21:304 Network: attempting to bind to udp socket 10.110.3.10:0 ]
    [ 3/5/08 21:41:21:304 Network: starting to receive udp packets
      local address = 10.110.3.10:32892
      remote address = <not connected> ]
    [ 3/5/08 21:41:21:312 Network: attempting to bind to udp socket 127.0.0.1:0 ]
    [ 3/5/08 21:41:21:313 Network: starting to receive udp packets
      local address = 127.0.0.1:32893
      remote address = <not connected> ]
    [ 3/5/08 21:41:21:316 Network: attempting to bind to ssl socket 10.110.3.10:0 ]
    [ 3/5/08 21:41:21:581 Network: accepting ssl connections at 10.110.3.10:45847 ]
    [ 3/5/08 21:41:21:582 Network: attempting to bind to ssl socket 127.0.0.1:0 ]
    [ 3/5/08 21:41:21:583 Network: accepting ssl connections at 127.0.0.1:35470 ]
    21:41:21,590 [main] DEBUG com.brightsystemsinc.util.ClientAdapter  - Object adapter set
    [ 3/5/08 21:41:23:001 Network: trying to establish tcp connection to 10.110.3.60:12000 ]
    [ 3/5/08 21:41:23:003 Network: tcp connection established
      local address = 10.110.3.10:45268
      remote address = 10.110.3.60:12000 ]
    [ 3/5/08 21:41:23:012 Network: trying to establish tcp connection to 1.53.1.8:32985 ]
    [ 3/5/08 21:43:21:301 Network: shutting down tcp connection for reading and writing
      local address = 10.110.3.10:45268
      remote address = 10.110.3.60:12000 ]
    [ 3/5/08 21:43:21:301 Network: closing tcp connection
      local address = 10.110.3.10:45268
      remote address = 10.110.3.60:12000 ]
    
  • Here is the output once the program starts.
    [ 3/5/08 21:41:21:297 Network: attempting to bind to tcp socket 10.110.3.10:0 ]
    [ 3/5/08 21:41:21:299 Network: accepting tcp connections at 10.110.3.10:52033 ]
    [ 3/5/08 21:41:21:299 Network: attempting to bind to tcp socket 127.0.0.1:0 ]
    [ 3/5/08 21:41:21:300 Network: accepting tcp connections at 127.0.0.1:55538 ]
    [ 3/5/08 21:41:21:304 Network: attempting to bind to udp socket 10.110.3.10:0 ]
    [ 3/5/08 21:41:21:304 Network: starting to receive udp packets
      local address = 10.110.3.10:32892
      remote address = <not connected> ]
    [ 3/5/08 21:41:21:312 Network: attempting to bind to udp socket 127.0.0.1:0 ]
    [ 3/5/08 21:41:21:313 Network: starting to receive udp packets
      local address = 127.0.0.1:32893
      remote address = <not connected> ]
    [ 3/5/08 21:41:21:316 Network: attempting to bind to ssl socket 10.110.3.10:0 ]
    [ 3/5/08 21:41:21:581 Network: accepting ssl connections at 10.110.3.10:45847 ]
    [ 3/5/08 21:41:21:582 Network: attempting to bind to ssl socket 127.0.0.1:0 ]
    [ 3/5/08 21:41:21:583 Network: accepting ssl connections at 127.0.0.1:35470 ]
    21:41:21,590 [main] DEBUG com.brightsystemsinc.util.ClientAdapter  - Object adapter set
    [ 3/5/08 21:41:23:001 Network: trying to establish tcp connection to 10.110.3.60:12000 ]
    [ 3/5/08 21:41:23:003 Network: tcp connection established
      local address = 10.110.3.10:45268
      remote address = 10.110.3.60:12000 ]
    [ 3/5/08 21:41:23:012 Network: trying to establish tcp connection to 1.53.1.8:32985 ]
    [ 3/5/08 21:43:21:301 Network: shutting down tcp connection for reading and writing
      local address = 10.110.3.10:45268
      remote address = 10.110.3.60:12000 ]
    [ 3/5/08 21:43:21:301 Network: closing tcp connection
      local address = 10.110.3.10:45268
      remote address = 10.110.3.60:12000 ]
    [ 3/5/08 21:44:32:024 Network: trying to establish tcp connection to 10.110.3.60:32986 ]
    [ 3/5/08 21:44:32:025 Network: tcp connection established
      local address = 10.110.3.10:39440
      remote address = 10.110.3.60:32986 ]
    [ 3/5/08 21:44:32:046 Network: trying to establish tcp connection to 10.110.3.60:12000 ]
    [ 3/5/08 21:44:32:047 Network: tcp connection established
      local address = 10.110.3.10:42143
      remote address = 10.110.3.60:12000 ]
    [ 3/5/08 21:44:32:054 Network: trying to establish tcp connection to 1.53.1.8:32967 ]
    [ 3/5/08 21:46:21:318 Network: shutting down tcp connection for reading and writing
      local address = 10.110.3.10:42143
      remote address = 10.110.3.60:12000 ]
    [ 3/5/08 21:46:21:318 Network: closing tcp connection
      local address = 10.110.3.10:42143
      remote address = 10.110.3.60:12000 ]
    [ 3/5/08 21:46:21:319 Network: shutting down tcp connection for reading and writing
      local address = 10.110.3.10:39440
      remote address = 10.110.3.60:32986 ]
    [ 3/5/08 21:46:21:319 Network: closing tcp connection
      local address = 10.110.3.10:39440
      remote address = 10.110.3.60:32986 ]
    [ 3/5/08 21:47:41:066 Network: trying to establish tcp connection to 10.110.3.60:32966 ]
    [ 3/5/08 21:47:41:067 Network: tcp connection established
      local address = 10.110.3.10:34129
      remote address = 10.110.3.60:32966 ]
    [ 3/5/08 21:47:41:074 Network: trying to establish tcp connection to 10.110.3.60:12000 ]
    [ 3/5/08 21:47:41:075 Network: tcp connection established
      local address = 10.110.3.10:42189
      remote address = 10.110.3.60:12000 ]
    21:47:41,078 [main] DEBUG com.brightsystemsinc.util.EventHelper  - Subscribed to bdrcartridgesetstatuschange
    [ 3/5/08 21:47:41:081 Network: trying to establish tcp connection to 10.110.3.60:32986 ]
    [ 3/5/08 21:47:41:082 Network: tcp connection established
      local address = 10.110.3.10:39488
      remote address = 10.110.3.60:32986 ]
    21:47:41,080 [main] DEBUG com.brightsystemsinc.util.EventHelper  - Subscribed to bdrcartridgestatuschange
    21:47:41,081 [main] DEBUG com.brightsystemsinc.util.EventHelper  - Subscribed to bdrfilesystem
    21:47:41,219 [main] DEBUG com.brightsystemsinc.display.CartridgeSetTabDisplay  - Found tab for CartridgeSet ICartridgeSet[id=0803AY0F,numberInPack=1,raidDescription=Max Speed,raidLevel=raid0,raidType=Linux Software,statusDescription=,storageVersion=SNFS 3.0.2 Build 43,status=MountingCC,format=InitializedCartridge] at tab index 0
    21:47:41,219 [main] DEBUG com.brightsystemsinc.display.CartridgeSetTabDisplay  - Added CartridgeSet tab 0803AY0F
    [ 3/5/08 21:47:41:240 Network: trying to establish tcp connection to 1.53.1.8:32982 ]
    [ 3/5/08 21:49:21:334 Network: shutting down tcp connection for reading and writing
      local address = 10.110.3.10:42189
      remote address = 10.110.3.60:12000 ]
    [ 3/5/08 21:49:21:334 Network: closing tcp connection
      local address = 10.110.3.10:42189
      remote address = 10.110.3.60:12000 ]
    [ 3/5/08 21:49:21:334 Network: shutting down tcp connection for reading and writing
      local address = 10.110.3.10:39488
      remote address = 10.110.3.60:32986 ]
    [ 3/5/08 21:49:21:335 Network: closing tcp connection
      local address = 10.110.3.10:39488
      remote address = 10.110.3.60:32986 ]
    [ 3/5/08 21:49:21:335 Network: shutting down tcp connection for reading and writing
      local address = 10.110.3.10:34129
      remote address = 10.110.3.60:32966 ]
    [ 3/5/08 21:49:21:336 Network: closing tcp connection
      local address = 10.110.3.10:34129
      remote address = 10.110.3.60:32966 ]
    [ 3/5/08 21:50:50:253 Network: trying to establish tcp connection to 10.110.3.60:32983 ]
    [ 3/5/08 21:50:50:254 Network: tcp connection established
      local address = 10.110.3.10:57581
      remote address = 10.110.3.60:32983 ]
    [ 3/5/08 21:50:50:289 Network: trying to establish tcp connection to 10.110.3.60:32966 ]
    [ 3/5/08 21:50:50:289 Network: tcp connection established
      local address = 10.110.3.10:60699
      remote address = 10.110.3.60:32966 ]
    21:50:50,294 [Thread-18] DEBUG com.brightsystemsinc.util.EventHelper  - Subscribed to bdrcustomerinfo
    21:50:50,296 [Thread-18] DEBUG com.brightsystemsinc.display.ContentsDescriptionTable  - Response to ContentsDescription.list() came for 0803AMNF
    21:50:50,349 [main] DEBUG com.brightsystemsinc.display.CartridgeSetTabDisplay  - Added CartridgeSet tab 0803AMNF
    [ 3/5/08 21:50:50:358 Network: trying to establish tcp connection to 10.110.3.60:32986 ]
    [ 3/5/08 21:50:50:359 Network: tcp connection established
      local address = 10.110.3.10:35602
      remote address = 10.110.3.60:32986 ]
    21:50:50,382 [main] DEBUG com.brightsystemsinc.util.EventHelper  - Subscribed to bdrfilesystem
    21:50:50,400 [main] DEBUG com.brightsystemsinc.util.EventHelper  - Subscribed to bdrcartridgesetstatuschange
    21:50:50,401 [main] DEBUG com.brightsystemsinc.util.EventHelper  - Subscribed to bdrcartridgestatuschange
    21:50:50,402 [main] DEBUG com.brightsystemsinc.util.EventHelper  - Subscribed to bdrmount
    21:50:50,403 [main] DEBUG com.brightsystemsinc.util.EventHelper  - Subscribed to bdrunmount
    21:50:50,405 [main] DEBUG com.brightsystemsinc.util.EventHelper  - Subscribed to bdrcustomerinit
    [ 3/5/08 21:52:21:357 Network: shutting down tcp connection for reading and writing
      local address = 10.110.3.10:60699
      remote address = 10.110.3.60:32966 ]
    [ 3/5/08 21:52:21:357 Network: closing tcp connection
      local address = 10.110.3.10:60699
      remote address = 10.110.3.60:32966 ]
    [ 3/5/08 21:52:21:358 Network: shutting down tcp connection for reading and writing
      local address = 10.110.3.10:35602
      remote address = 10.110.3.60:32986 ]
    [ 3/5/08 21:52:21:359 Network: closing tcp connection
      local address = 10.110.3.10:35602
      remote address = 10.110.3.60:32986 ]
    [ 3/5/08 21:52:21:359 Network: shutting down tcp connection for reading and writing
      local address = 10.110.3.10:57581
      remote address = 10.110.3.60:32983 ]
    [ 3/5/08 21:52:21:359 Network: closing tcp connection
      local address = 10.110.3.10:57581
      remote address = 10.110.3.60:32983 ]
    
  • benoit
    benoit Rennes, France
    Hi Jesse,

    Could you please detail a little more what is your exact problem?

    Both traces show that an attempt is made to connect to an unreachable address (1.53.1.8) and that the connection attempt hangs for about 2 minutes. This is expected if you don't use timeouts. You can set the Ice.Override.ConnectTimeout property if you want the connection attempt to timeout sooner.

    However, the best is to avoid connection attempts to invalid addresses. You should find out which proxy contains this IP address and see if the server configuration can be changed so that it doesn't publish proxies with unreachable addresses.

    Cheers,
    Benoit.
  • The issue is that I believe I am specifying the correct IP address for the server and client to use (10.110.3.60). However, the client is still looking for the server at an invalid IP address (1.53.1.8 and 127.0.0.1).

    Here are my configs:

    Client:
    # Location of the IceGrid
    Ice.Default.Locator=BDR/Locator:default -h 10.110.3.60 -p 12000
     
    Callback.Client.Endpoints=tcp:udp:ssl
    
    # Events
    IceStorm.TopicManager.Proxy=IceStorm/TopicManager
    
    BDR.datawrangler.path=/usr/bright/bin/dwGUI
    BDR.terminal.path=gnome-terminal
    
    Ice.Default.Package=com.brightsystemsinc.datamodel
    #
    # SSL Configuration
    #
    Ice.Plugin.IceSSL=IceSSL.PluginFactory
    Ice.ThreadPerConnection=1
    IceSSL.DefaultDir=certs
    IceSSL.Keystore=client.jks
    IceSSL.Password=password
    IceSSL.Truststore=certs.jks
    
    Ice.Trace.Network=2
    

    Icebox:
    #
    # The IceBox server endpoint configuration
    #
    IceBox.ServiceManager.Endpoints=tcp -p 9998 -h 10.110.3.60
    
    #
    # The IceStorm service
    #
    IceBox.Service.IceStorm=IceStormService,32:createIceStorm --Ice.Config=config.service
    
    #
    # Warn about connection exceptions
    #
    #Ice.Warn.Connections=1
    
    #
    # Network Tracing
    #
    # 0 = no network tracing
    # 1 = trace connection establishment and closure
    # 2 = like 1, but more detailed
    # 3 = like 2, but also trace data transfer
    #
    # Ice.Trace.Network=2
    
    #
    # Protocol Tracing
    #
    # 0 = no protocol tracing
    # 1 = trace protocol messages
    #
    Ice.Trace.Protocol=1
    

    Grid:
    IceGrid.InstanceName=BDR
    
    #
    # The IceGrid locator proxy.
    #
    Ice.Default.Locator=BDR/Locator:default -p 12000 -h 10.110.3.60
    
    #
    # IceGrid registry configuration.
    #
    IceGrid.Registry.Client.Endpoints=default -p 12000 -h 10.110.3.60
    IceGrid.Registry.Server.Endpoints=default -h 10.110.3.60
    IceGrid.Registry.Internal.Endpoints=default -h 10.110.3.60
    IceGrid.Registry.Data=/usr/bright/apps/bdr/db/registry
    IceGrid.Registry.PermissionsVerifier=BDR/NullPermissionsVerifier
    IceGrid.Registry.AdminPermissionsVerifier=BDR/NullPermissionsVerifier
    IceGrid.Registry.SSLPermissionsVerifier=BDR/NullSSLPermissionsVerifier
    IceGrid.Registry.AdminSSLPermissionsVerifier=BDR/NullSSLPermissionsVerifier
    
    #
    # IceGrid node configuration.
    #
    IceGrid.Node.Name=localhost
    IceGrid.Node.Endpoints=default
    IceGrid.Node.Data=/usr/bright/apps/bdr/db/node
    IceGrid.Node.CollocateRegistry=1
    #IceGrid.Node.Output=db
    #IceGrid.Node.RedirectErrToOut=1
    
    #
    # Trace properties.
    #
    IceGrid.Node.Trace.Activator=1
    IceGrid.Node.Trace.Patch=1
    #IceGrid.Node.Trace.Adapter=2
    #IceGrid.Node.Trace.Server=3
    
    # Ice.Trace.Network=2
    
    # Allow root access
    IceGrid.Node.AllowRunningServersAsRoot=1
    
    #
    # Dummy username and password for icegridadmin.
    #
    IceGridAdmin.Username=foo
    IceGridAdmin.Password=bar
    
    # Events
    IceStorm.TopicManager.Proxy=IceStorm/TopicManager
    
    IceGrid.Registry.DefaultTemplates=/usr/share/Ice-3.2.1/templates.xml
    
    # The devices to ignore on the server
    BDR.IGNORED_DEVICES=sda,vsd
    
    # The full path to the various executables
    Path.vsdcfg=/usr/bright/drive/vsdcfg
    Path.vsdinfo=/usr/bright/drive/vsdinfo
    Path.lmdd=/usr/bright/apps/lmdd
    Path.bdrdd=/usr/bright/apps/bdrdd
    Path.scum=/usr/bright/apps/scum
    Path.bdm=/usr/bright/apps/bdm
    
    # Number of seconds to wait until timing out waiting for a Semaphore
    Semaphore.waittime=15
    

    Icestorm:
    #
    # This property is used by the administrative client to connect to IceStorm.
    #
    # IceStorm.TopicManager.Proxy=BDR/TopicManager:default -p 20000
    
    #
    # This property defines the endpoints on which the IceStorm
    # TopicManager listens.
    #
    IceStorm.TopicManager.Endpoints=default -p 20000 -h 10.110.3.60
    
    #
    # The IceStorm service instance name.
    #
    IceStorm.InstanceName=BDR
    
    #
    # This property defines the endpoints on which the topic
    # publisher objects listen. If you want to federate
    # IceStorm instances this must run on a fixed port (or use
    # IceGrid).
    #
    IceStorm.Publish.Endpoints=default -p 20001:udp -p 20001
    
    #
    # TopicManager Tracing
    #
    # 0 = no tracing
    # 1 = trace topic creation, subscription, unsubscription
    # 2 = like 1, but with more detailed subscription information
    #
    IceStorm.Trace.TopicManager=2
    
    Ice.Default.Locator=BDR/Locator:default -p 12000 -h 10.110.3.60
    
    #
    # Topic Tracing
    #
    # 0 = no tracing
    # 1 = trace unsubscription diagnostics
    #
    IceStorm.Trace.Topic=1
    
    #
    # Subscriber Tracing
    #
    # 0 = no tracing
    # 1 = subscriber diagnostics (subscription, unsubscription, event
    #     propagation failures)
    #
    IceStorm.Trace.Subscriber=1
    
    #
    # Flush Tracing (for batch mode transfer flushing)
    #
    # 0 = no tracing
    # 1 = trace activity of flusher thread
    #
    IceStorm.Trace.Flush=1
    
    #
    # Amount of time in milliseconds between flushes for batch mode
    # transfer. The minimum allowable value is 100ms.
    #
    IceStorm.Flush.Timeout=2000
    
    #
    # Network Tracing
    #
    # 0 = no network tracing
    # 1 = trace connection establishment and closure
    # 2 = like 1, but more detailed
    # 3 = like 2, but also trace data transfer
    #
    # Ice.Trace.Network=2
    
    #
    # This property defines the home directory of the Freeze 
    # database environment for the IceStorm service.
    #
    Freeze.DbEnv.IceStorm.DbHome=/usr/bright/apps/bdr/db/icestorm
    
  • benoit
    benoit Rennes, France
    Hi Jesse,

    As mentioned previously, you don't specify the host for the IceStorm.Publish.Endpoints property. Could this perhaps be the problem?

    So if I understand it correctly, it's the client which is hanging when it tries to connect to IceStorm? Please correct me if I'm mistaken.

    Did you figure out which Ice invocation was hanging by attaching with the debugger to the process while it hangs or by adding some tracing? Once you figure out where it hangs, you'll know which proxy is causing the hang and then you can check where the proxy is coming from and see if the configuration of the application that creates the proxy is correct.

    Cheers,
    Benoit.
  • I added the proper host endpoint to the Icestorm config. Here is the new one:
    #
    # This property is used by the administrative client to connect to IceStorm.
    #
    # IceStorm.TopicManager.Proxy=BDR/TopicManager:default -p 20000
    
    #
    # This property defines the endpoints on which the IceStorm
    # TopicManager listens.
    #
    IceStorm.TopicManager.Endpoints=default -p 20000 -h 10.110.3.60
    
    #
    # The IceStorm service instance name.
    #
    IceStorm.InstanceName=BDR
    
    #
    # This property defines the endpoints on which the topic
    # publisher objects listen. If you want to federate
    # IceStorm instances this must run on a fixed port (or use
    # IceGrid).
    #
    IceStorm.Publish.Endpoints=default -p 20001:udp -p 20001 -h 10.110.3.60
    
    #
    # TopicManager Tracing
    #
    # 0 = no tracing
    # 1 = trace topic creation, subscription, unsubscription
    # 2 = like 1, but with more detailed subscription information
    #
    IceStorm.Trace.TopicManager=2
    
    Ice.Default.Locator=BDR/Locator:default -p 12000 -h 10.110.3.60
    
    #
    # Topic Tracing
    #
    # 0 = no tracing
    # 1 = trace unsubscription diagnostics
    #
    IceStorm.Trace.Topic=1
    
    #
    # Subscriber Tracing
    #
    # 0 = no tracing
    # 1 = subscriber diagnostics (subscription, unsubscription, event
    #     propagation failures)
    #
    IceStorm.Trace.Subscriber=1
    
    #
    # Flush Tracing (for batch mode transfer flushing)
    #
    # 0 = no tracing
    # 1 = trace activity of flusher thread
    #
    IceStorm.Trace.Flush=1
    
    #
    # Amount of time in milliseconds between flushes for batch mode
    # transfer. The minimum allowable value is 100ms.
    #
    IceStorm.Flush.Timeout=2000
    
    #
    # Network Tracing
    #
    # 0 = no network tracing
    # 1 = trace connection establishment and closure
    # 2 = like 1, but more detailed
    # 3 = like 2, but also trace data transfer
    #
    # Ice.Trace.Network=2
    
    #
    # This property defines the home directory of the Freeze 
    # database environment for the IceStorm service.
    #
    Freeze.DbEnv.IceStorm.DbHome=/usr/bright/apps/bdr/db/icestorm
    

    The client is hanging on the following line:
    IceStorm.TopicManagerPrx manager = IceStorm.TopicManagerPrxHelper.checkedCast( Ice.Application.communicator()
    				.propertyToProxy( "IceStorm.TopicManager.Proxy" ) );
    
  • benoit
    benoit Rennes, France
    Ok, so the proxy causing the hang is "IceStorm/TopicManager" (the value of the IceStorm.TopicManager.Proxy property in the client configuration file).

    This is a well-known proxy. How did you register this well-known proxy with the IceGrid registry? I suppose you used the icegridadmin "object add" command or the IceGrid GUI since IceStorm isn't managed by an IceGrid node, is this correct?

    If this is the case, most likely, the -h option for the well-known proxy hasn't been specified when it was added to the IceGrid registry. You can check the endpoints of this well-known proxy with the "object describe" command or the IceGrid administrative GUI. Updating the endpoints of this well-known proxy to include the -h option should fix the hang.

    Cheers,
    Benoit.
  • I am starting the IceStorm server in my application.xml.

    Here is the line:
    <server-instance template="IceStorm" instance-name="IceStorm">
          </server-instance>
    

    I tried adding the endpoint property. It didn't help.
    <server-instance template="IceStorm" instance-name="IceStorm">
          	<properties>
        		<property name="IceStorm.Publish.Endpoints" value="default -p 20001:udp -p 20001 -h 10.110.3.60"/>
    		</properties>
          </server-instance>
    
  • benoit
    benoit Rennes, France
    Hi Jesse,

    I'm afraid I don't understand your deployment. I thought you were not deploying IceStorm on an IceGrid node with a deployment descriptor but instead was manually starting IceStorm from the command line using a configuration file. If you're indeed using a deployment descriptor to deploy IceStorm on an IceGrid node, you need to update the XML file for configuration changes (and run the icegridadmin "application update" command after each update). Could post your application.xml file? I suspect the problem is coming from the definition of the "endpoints" attribute for the TopicManager adapter (missing -h option).

    Cheers,
    Benoit.
  • I am starting IceStorm automatically via IceGrid. Here is my application.xml:
    <icegrid>
      <application name="BDR" import-default-templates="true">
        <node name="localhost">
          <server id="BDRServer" exe="/usr/bright/apps/bdr_server_start.sh" activation="always" user="root">
    		<adapter name="CpService" endpoints="tcp -h 10.110.3.60" register-process="true">
    		  <object identity="BDRCpService" type="::BDRCp::BDRCpService" property="Identity"/>
    		</adapter>
    		<adapter name="CustomerInfoService" endpoints="tcp -h 10.110.3.60" register-process="true">
    		  <object identity="BDRCustomerInfoService" type="::BDRCustomerInfo::BDRCustomerInfoService" property="Identity"/>
    		</adapter>
    		<adapter name="DriveInfoService" endpoints="tcp -h 10.110.3.60" register-process="true">
    		  <object identity="BDRDriveInfoService" type="::BDRDriveInfo::BDRDriveInfoService" property="Identity"/>
    		</adapter>
    		<adapter name="CustomerInitService" endpoints="tcp -h 10.110.3.60" register-process="true">
    		  <object identity="BDRCustomerInitService" type="::BDRCustomerInit::BDRCustomerInitService" property="Identity"/>
    		</adapter>
    		<adapter name="MountService" endpoints="tcp -h 10.110.3.60" register-process="true">
    		  <object identity="BDRMountService" type="::BDRMount::BDRMountService" property="Identity"/>
    		</adapter>
    		<adapter name="UnmountService" endpoints="tcp -h 10.110.3.60" register-process="true">
    		  <object identity="BDRUnmountService" type="::BDRUnmount::BDRUnmountService" property="Identity"/>
    		</adapter>
          </server>
          
          <server-instance template="IceStorm" instance-name="IceStorm">
          	<properties>
        		<property name="IceStorm.Publish.Endpoints" value="default -p 20001:udp -p 20001 -h 10.110.3.60"/>
        		<property name="IceStorm.TopicManager.Endpoints" value="default -p 20000 -h 10.110.3.60"/>
    		</properties>
          </server-instance>
        </node>
      </application>
    </icegrid>
    

    I have already ran the "appliation update" after updating my application.xml.
  • benoit
    benoit Rennes, France
    The properties defined in the <server-instance> element are properties that will be defined in the IceBox server configuration file not the IceStorm service configuration file and will therefore have no effect on the IceStorm service.

    You can simply use the following instead:
          <server-instance template="IceStorm" instance-name="IceStorm" topic-manager-endpoints="default -p 20000 -h 10.110.3.60" publish-endpoints="default -p 20001:udp -p 20001 -h 10.110.3.60"/>
    

    Or:
         <server-instance template="IceStorm" instance-name="IceStorm">
           	 <properties service="IceStorm">
        		<property name="IceStorm.Publish.Endpoints" value="default -p 20001:udp -p 20001 -h 10.110.3.60"/>
        		<property name="IceStorm.TopicManager.Endpoints" value="default -p 20000 -h 10.110.3.60"/>
             </properties>
          </server-instance>
    

    Make sure to run "application update" on the updated descriptor to push the changes to the IceGrid registry.

    You can also use the "adapter endpoints IceStorm.TopicManager" command to check the runtime endpoints of the topic manager adapter once the IceBox server is started. In theory, it should only contain the 10.110.3.60 IP address.

    Cheers,
    Benoit.
  • It looks like that was the issue. Thanks for your help.