[ICEBOX] Problem registering more than 400 servers

I have an Ice 3.3.0 grid composed of 71 eight-core machines running Windows Server 2003.

I cannot register more than 400 servers with IceBox. Beyond that, servers trying to register get an exception:
TcpTransceiver.cpp:158: Ice::ConnectionLostException: connection lost: recv() returned zero

Nothing appears in the IceBox logs.

With ~250 servers, it works perfectly.

I tried increasing the thread pool size, but it did not change anything.
Once IceBox reaches a certain point, new servers can't register. After that, I have to unregister a server to register another one. Is there some kind of property like 'IceBox.MaximumServers' that I missed in the documentation?


FYI, here is the IceBox XML configuration I use:

<node name="spw-grid-118">
<server-instance template="IcePatch2"
directory="C:\Documents and Settings\LocalService\Local Settings\Application Data\ZeroC\icepatch2"/>


<icebox id="IceBoxServer"
exe="C:\Program Files\ZeroC\Ice-3.3.0\bin_stlport\icebox.exe"
activation="always">
<env>PATH=%PATH%;${application.distrib};C:\Program Files\ZeroC\Ice-3.3.0\bin_stlport</env>

<!-- TODO: use the template -->
<service name="IceStorm" entry="IceStormService,33:createIceStorm">
<adapter name="IceStorm.TopicManager" endpoints="tcp -p 5080">
<object identity="IceStorm/TopicManager" type="::IceStorm::TopicManager"/>
</adapter>

<adapter name="${service}.Publish" endpoints="tcp -p 5081"/>

<property name="IceStorm.InstanceName" value="IceStorm"/>
<property name="IceStorm.TopicManager.Proxy" value="IceStorm/TopicManager"/>
<property name="IceStorm.TopicManager.Endpoints" value="default -p 5080"/>
<property name="IceStorm.Publish.Endpoints" value="tcp -p 5081"/>
<property name="IceStorm.Trace.TopicManager" value="2"/>
<property name="IceStorm.Trace.Topic" value="1"/>
<property name="IceStorm.Trace.Subscriber" value="1"/>
<property name="IceStorm.Flush.Timeout" value="2000"/>
<property name="Freeze.DbEnv.IceStorm.DbHome" value="C:\Documents and Settings\LocalService\Local Settings\Application Data\ZeroC\Icestorm"/>
</service>

<properties>
<properties refid="ThreadPool"/>
<property name="Ice.RetryIntervals" value="0 50 100 250 500 1000 2500 5000"/>
<property name="Ice.MessageSizeMax" value="256000"/>
<property name="Ice.Override.Compress" value="0"/>
<property name="Ice.Warn.Connections" value="1"/>
<property name="Ice.Warn.Dispatch" value="1"/>
<property name="Ice.StdErr" value="C:\Documents and settings\LocalService\Application Data\Castor\StdErr_IceBox.txt"/>
<property name="Ice.StdOut" value="C:\Documents and settings\LocalService\Application Data\Castor\StdOut_IceBox.txt"/>
</properties>
</icebox>

</node>
</node>

I am happy to provide any additional information about my problem.

Thank you for this middleware, guys; it is as handy as it is powerful!
Regards

Comments

  • benoit (Rennes, France)
    Hi Tristan,

    I assume that you meant IceGrid instead of IceBox (IceBox is a container for services, you deploy servers with IceGrid).

    Could you provide more information about this exception? Which process is throwing it? Is this perhaps the icegridadmin tool when you try to update the IceGrid application to add a new server?

    Cheers,
    Benoit.
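
    One way to narrow down which process is dropping the connection is to turn on connection-level tracing in each process involved (registry, node, and the servers themselves). The fragment below is only a sketch, assuming the standard Ice tracing properties; the trace levels chosen are illustrative, not a recommendation from this thread:

    ```
    # Assumed diagnostic settings; add to the configuration of each process under suspicion.
    # Log connection establishment and closure, with details.
    Ice.Trace.Network=2
    # Log operation retries, which often accompany lost connections.
    Ice.Trace.Retry=1
    # Warn about unexpected connection errors (already set in the IceBox descriptor above).
    Ice.Warn.Connections=1
    ```

    The resulting log lines should show which side closed the connection and on which endpoint, which helps tell a registry problem apart from a node or client problem.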
  • issue seems resolved

    You are right: the IceBox errors I was referring to came from IceStorm events:

    [ 11/26/08 11:45:15.628 IceBoxServer: Subscriber: 0x02d28ba8 431e2477-1b63-4bc0-ac10-ebfd91e0a500: subscriber errored out: TcpTransceiver.cpp:191: Ice::ConnectionLostException: connection lost: WSAECONNRESET retry: 0/0 ]

    I also got connection-lost errors from icegridadmin when it tried to connect to the locator, and from my services when they tried to register with the grid. I guess the exception is thrown by the registry.

    I read a post about IceGrid registry threads going crazy. I stopped the node hosting the registry and started the service again. It now works perfectly! All servers are properly registered and no more lost-connection errors show up.

    The grid is now working: the CPU of the node hosting the registry is at 13%, and the other nodes are at 99.9%! I will take a look at the logs in a few hours and let you know.

    Thanks Benoît


    FYI, here is the icegridregistry configuration I use; maybe there is something wrong with it:

    #
    # Sample configuration file for the IceGrid registry daemon
    #

    #
    # The IceGrid instance name; must be unique, to distinguish several
    # IceGrid deployments
    #
    IceGrid.InstanceName=IceGrid

    #
    # Client object adapter: listens on all interfaces
    # (add -h <name | IP address> to listen on just one interface)
    # IANA-registered TCP ports for the IceGrid registry:
    # - 4061 (insecure)
    # - 4062 (secure, using SSL)
    #
    IceGrid.Registry.Client.Endpoints=tcp -p 4061

    #
    # Server and Internal object adapters: listens on all interfaces
    # using an OS-assigned port number.
    #
    IceGrid.Registry.Server.Endpoints=tcp
    IceGrid.Registry.Internal.Endpoints=tcp

    #
    # The registry DB home; must exist when icegridregistry starts
    #
    # Under Vista we recommend using:
    #
    # C:\Windows\ServiceProfiles\LocalService\AppData\Local\ZeroC\icegrid\registry
    #
    IceGrid.Registry.Data=C:\Documents and Settings\LocalService\Local Settings\Application Data\ZeroC\registry

    #
    # Authentication/authorization
    # With NullPermissionsVerifier, any password is accepted (not recommended
    # for production)
    #
    IceGrid.Registry.PermissionsVerifier=IceGrid/NullPermissionsVerifier
    IceGrid.Registry.AdminPermissionsVerifier=IceGrid/NullPermissionsVerifier

    #
    # Default templates
    #
    IceGrid.Registry.DefaultTemplates=C:\Program Files\ZeroC\Ice-3.3.0\config\templates.xml

    #
    # Trace properties.
    #
    IceGrid.Registry.Trace.Node=1
    IceGrid.Registry.Trace.Replica=1

    IceGrid.Registry.Client.ThreadPool.Size=1
    IceGrid.Registry.Client.ThreadPool.SizeMax=8

    IceGrid.Registry.Server.ThreadPool.Size=1
    IceGrid.Registry.Server.ThreadPool.SizeMax=8
  • benoit (Rennes, France)
    Hi Tristan,

    Please make sure to apply the IceGrid patches for 3.3.0 posted here. You should also set up timeouts on the IceGrid registry and node endpoints. For an example of how to set these timeouts, see the configuration files of the demo/IceGrid/replication demo in your Ice distribution.

    Cheers,
    Benoit.
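
    As a rough illustration of the timeout advice, the fragment below adds the -t endpoint option (a timeout in milliseconds) to the registry endpoints from the configuration posted earlier in this thread. This is only a sketch: the 10000 ms value, the node-side lines, and the <registry-host> placeholder are assumptions made for illustration, not settings taken from this thread, so check them against the demo configuration files before use:

    ```
    # Registry side (icegridregistry configuration); timeout value is an assumption
    IceGrid.Registry.Client.Endpoints=tcp -p 4061 -t 10000
    IceGrid.Registry.Server.Endpoints=tcp -t 10000
    IceGrid.Registry.Internal.Endpoints=tcp -t 10000

    # Node side (icegridnode configuration); <registry-host> is a placeholder
    IceGrid.Node.Endpoints=tcp -t 10000
    Ice.Default.Locator=IceGrid/Locator:tcp -h <registry-host> -p 4061 -t 10000
    ```

    With timeouts set, a call to an unresponsive peer should fail after the timeout rather than tie up a thread-pool thread indefinitely, which matters when the pools are as small as the ones configured above.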