Archived

This forum has been archived. Please start a new discussion on GitHub.

Ice.FileException in createProperties

Hi,

I'm getting the following intermittent problem. I'm fairly new to ICE so I hope it is not just a misconfiguration error on my part...

It seems that when I start many icegridnode servers at once on different machines (one icegridnode per machine), I get:
Traceback (most recent call last):
  File "SolverServer.py", line 176, in ?
    sys.exit(app.main(sys.argv, 'config.grid'))
  File "/data1/dstn/software/ice-3.3.0-hydra/python/Ice.py", line 755, in main
    initData.properties = createProperties(args, initData.properties)
  File "/data1/dstn/software/ice-3.3.0-hydra/python/Ice.py", line 552, in createProperties
    properties = IcePy.createProperties(args, defaults)
Ice.FileException: exception ::Ice::FileException
{
    error = 2
    path = /nfs/hydra.local/data1/dstn/astrometry/net/ice/db/node/servers/Server-neuron-0-0-scale10/config/config
}

My config file includes:
IceGrid.Node.Data=db/node

If I look in that directory, I see only the directory of the last server to start:
> ls -1 db/node/servers/
.
..
Server-neuron-0-11-scale2

Could this be a race condition, or some sort of NFS problem?

I don't know what other information to provide. I am launching the icegridnodes with "cluster-fork", a thin wrapper over ssh, and it launches each one in series and takes a few seconds for each one.

Please let me know if there are any more diagnostics I can collect.

Thanks,
dustin.

PS, this is with Ice 3.3.0, SSH "OpenSSH_4.3p2, OpenSSL 0.9.8b 04 May 2006", on "CentOS release 5.2 (Final)" (rockscluster.org) platform.

Comments

  • benoit
    benoit Rennes, France
    Hi Dustin,

    The directory for the server is created by the IceGrid node before the server is started. It's not clear to me why it doesn't show up at the time the server is started. You could enable server tracing on the IceGrid node with IceGrid.Node.Trace.Server=3. This should show when the server directory is created (it occurs in the "Loading" state).

    How many servers do you deploy per-node? If I understand it correctly, the only server directory present in the node servers directory is the directory of the last started server, is this correct? It sounds like some problem with NFS, can you perhaps try on a local file system instead first?

    Cheers,
    Benoit.
  • benoit wrote: »
    The directory for the server is created by the IceGrid node before the server is started. It's not clear to me why it doesn't show up at the time the server is started. You could enable server tracing on the IceGrid node with IceGrid.Node.Trace.Server=3. This should show when the server directory is created (it occurs in the "Loading" state).

    Log below.
    benoit wrote: »
    How many servers do you deploy per-node?

    One.
    benoit wrote: »
    If I understand it correctly, the only server directory present in the node servers directory is the directory of the last started server, is this correct?

    Yes.
    benoit wrote: »
    It sounds like some problem with NFS, can you perhaps try on a local file system instead first?

    Using a local filesystem directory (/tmp/node) fixed this problem. Thanks for this tip. This should *really* be in the manual or FAQ - I would imagine that many grid machines will use NFS.

    Thanks,
    dustin.

    [ 12/01/08 13:20:46.206 icegridnode: Server: changed server `Server-neuron-0-0-scale10' state to `Loading' ]
    [ 12/01/08 13:20:46.240 icegridnode: Server: changed server `Server-neuron-0-0-scale10' state to `Inactive' ]
    [ 12/01/08 13:20:46.245 icegridnode: Server: changed server `Server-neuron-0-0-scale5' state to `Loading' ]
    [ 12/01/08 13:20:46.276 icegridnode: Server: changed server `Server-neuron-0-0-scale5' state to `Inactive' ]
    [ 12/01/08 13:20:46.743 icegridnode: Server: changed server `Server-neuron-0-0-scale10' state to `Activating' ]
    [ 12/01/08 13:20:46.753 icegridnode: Activator: activating server `Server-neuron-0-0-scale10'
      path = python
      pwd = /nfs/hydra.local/data1/dstn/astrometry/net/ice
      uid/gid = 1907/30026
      args = python SolverServer.py 10 --Ice.Config=/nfs/hydra.local/data1/dstn/astrometry/net/ice/db/node/servers/Server-neuron-0-0-sc
    ale10/config/config ]
    [ 12/01/08 13:20:46.755 icegridnode: Server: changed server `Server-neuron-0-0-scale10' state to `WaitForActivation' ]
    [ 12/01/08 13:20:46.778 icegridnode: Server: changed server `Server-neuron-0-0-scale5' state to `Activating' ]
    [ 12/01/08 13:20:46.778 icegridnode: Activator: activating server `Server-neuron-0-0-scale5'
      path = python
      pwd = /nfs/hydra.local/data1/dstn/astrometry/net/ice
      uid/gid = 1907/30026
      args = python SolverServer.py 5 --Ice.Config=/nfs/hydra.local/data1/dstn/astrometry/net/ice/db/node/servers/Server-neuron-0-0-sca
    le5/config/config ]
    [ 12/01/08 13:20:46.781 icegridnode: Server: changed server `Server-neuron-0-0-scale5' state to `WaitForActivation' ]
    SolverServer.py args: ['SolverServer.py', '10', '--Ice.Config=/nfs/hydra.local/data1/dstn/astrometry/net/ice/db/node/servers/Server
    -neuron-0-0-scale10/config/config']
    scale 10
    Traceback (most recent call last):
      File "SolverServer.py", line 176, in ?
        sys.exit(app.main(sys.argv, 'config.grid'))
      File "/data1/dstn/software/ice-3.3.0-hydra/python/Ice.py", line 755, in main
        initData.properties = createProperties(args, initData.properties)
      File "/data1/dstn/software/ice-3.3.0-hydra/python/Ice.py", line 552, in createProperties
        properties = IcePy.createProperties(args, defaults)
    Ice.FileException: exception ::Ice::FileException
    {
        error = 2
        path = /nfs/hydra.local/data1/dstn/astrometry/net/ice/db/node/servers/Server-neuron-0-0-scale10/config/config
    }
    [ 12/01/08 13:20:49.945 icegridnode: Activator: detected termination of server `Server-neuron-0-0-scale10'
      exit code = 1 ]
    [ 12/01/08 13:20:49.948 icegridnode: Server: changed server `Server-neuron-0-0-scale10' state to `Deactivating' ]
    [ 12/01/08 13:20:49.950 icegridnode: Server: changed server `Server-neuron-0-0-scale10' state to `Inactive' ]
    
  • bernard
    bernard Jupiter, FL
    Hi Dustin,

    You can use a NFS or other shared drive to store your config files. Of course, each server needs its own private config-file.

    Here, it sounds like a mounting problem, a permission problem, or a problem with the way icegridnode transforms a relative path into an absolute path.

    When you log onto this machine, can you read /nfs/hydra.local/data1/dstn/astrometry/net/ice/db/node/servers/Server-neuron-0-0-scale10/config/config? Can the icegridnode user read this file?

    Best regards,
    Bernard
  • bernard wrote: »
    You can use a NFS or other shared drive to store your config files. Of course, each server needs its own private config-file.

    I am using IceGrid, so my config file includes:
    IceGrid.Node.Data=db/node
    

    And when my servers are activated, they get:
    [ 12/01/08 13:20:46.753 icegridnode: Activator: activating server `Server-neuron-0-0-scale10'
      path = python
      pwd = /nfs/hydra.local/data1/dstn/astrometry/net/ice
      uid/gid = 1907/30026
      args = python SolverServer.py 10 --Ice.Config=/nfs/hydra.local/data/dstn/astrometry/net/ice/db/node/servers/Server-neuron-0-0-scale10/config/config ]
    

    Since my servers each have unique names, IceGrid seems to create unique config files for them.
    bernard wrote: »
    Here, it sounds like a mounting problem, a permission problem, or a problem with the way icegridnode transforms a relative path into an absolute path.

    When you log onto this machine, can you read /nfs/hydra.local/data1/dstn/astrometry/net/ice/db/node/servers/Server-neuron-0-0-scale10/config/config? Can the icegridnode user read this file?

    After I launch 12 icegridnodes, the db/node directory contains this:
    > find db/node/
    db/node/
    db/node/servers
    db/node/servers/Server-neuron-0-6-scale5
    db/node/servers/Server-neuron-0-6-scale5/config
    db/node/servers/Server-neuron-0-6-scale5/config/config
    db/node/servers/Server-neuron-0-6-scale5/dbs
    db/node/servers/Server-neuron-0-6-scale5/distrib
    db/node/servers/Server-neuron-0-6-scale5/revision
    db/node/tmp
    db/node/distrib
    

    And the rest of the my servers have died because their config files didn't exist:
    Ice.FileException: exception ::Ice::FileException
    {
        error = 2
        path = /nfs/hydra.local/data1/dstn/astrometry/net/ice/db/node/servers/Server-neuron-0-0-scale10/config/config
    }
    

    It's not a permissions problem: all icegridnodes run as 'dstn', a normal user, because I don't have root access on this cluster. The directories simply don't exist at the time my server runs.

    I ran the following experiment:
    ./startallnodes & for ((;;)); do echo "### `date` ###" >> ls; find db/node >> ls; sleep 0.01; done
    

    What I see is that each node's directory gets created and populated, but then disappears:
    > grep "config/config\|###" ls
    ### 27:00.840826000 ###
    db/node/servers/Server-neuron-0-11-scale5/config/config
    ### 27:00.949344000 ###
    db/node/servers/Server-neuron-0-0-scale10/config/config
    ### 27:01.067322000 ###
    db/node/servers/Server-neuron-0-0-scale10/config/config
    ### 27:01.176296000 ###
    db/node/servers/Server-neuron-0-0-scale10/config/config
    ### 27:01.285155000 ###
    db/node/servers/Server-neuron-0-1-scale5/config/config
    ### 27:01.395943000 ###
    db/node/servers/Server-neuron-0-1-scale5/config/config
    ### 27:01.506709000 ###
    db/node/servers/Server-neuron-0-1-scale5/config/config
    ### 27:01.621234000 ###
    ### 27:01.730697000 ###
    db/node/servers/Server-neuron-0-2-scale4/config/config
    ### 27:01.842905000 ###
    db/node/servers/Server-neuron-0-2-scale4/config/config
    ### 27:01.955905000 ###
    db/node/servers/Server-neuron-0-3-scale3/config/config
    ### 27:02.066291000 ###
    db/node/servers/Server-neuron-0-3-scale3/config/config
    ### 27:02.177472000 ###
    db/node/servers/Server-neuron-0-3-scale3/config/config
    ### 27:02.286882000 ###
    db/node/servers/Server-neuron-0-4-scale2/config/config
    ### 27:02.402262000 ###
    db/node/servers/Server-neuron-0-4-scale2/config/config
    ### 27:02.523075000 ###
    db/node/servers/Server-neuron-0-4-scale2/config/config
    ### 27:02.646938000 ###
    db/node/servers/Server-neuron-0-5-scale10/config/config
    ### 27:02.757144000 ###
    db/node/servers/Server-neuron-0-5-scale10/config/config
    ### 27:02.874255000 ###
    db/node/servers/Server-neuron-0-5-scale10/config/config
    ### 27:02.984534000 ###
    db/node/servers/Server-neuron-0-6-scale5/config/config
    ### 27:03.097887000 ###
    db/node/servers/Server-neuron-0-6-scale5/config/config
    ### 27:03.240119000 ###
    db/node/servers/Server-neuron-0-6-scale5/config/config
    ### 27:03.357368000 ###
    db/node/servers/Server-neuron-0-7-scale4/config/config
    ### 27:03.469632000 ###
    db/node/servers/Server-neuron-0-7-scale4/config/config
    ### 27:03.582018000 ###
    db/node/servers/Server-neuron-0-7-scale4/config/config
    ### 27:03.694741000 ###
    ### 27:03.804037000 ###
    db/node/servers/Server-neuron-0-8-scale3/config/config
    ### 27:03.936422000 ###
    db/node/servers/Server-neuron-0-8-scale3/config/config
    ### 27:04.066373000 ###
    db/node/servers/Server-neuron-0-8-scale3/config/config
    ### 27:04.182170000 ###
    db/node/servers/Server-neuron-0-9-scale2/config/config
    ### 27:04.318162000 ###
    db/node/servers/Server-neuron-0-9-scale2/config/config
    ### 27:04.449006000 ###
    ### 27:04.580340000 ###
    db/node/servers/Server-neuron-0-10-scale10/config/config
    ### 27:04.709145000 ###
    db/node/servers/Server-neuron-0-10-scale10/config/config
    ### 27:04.821542000 ###
    ### 27:04.943917000 ###
    db/node/servers/Server-neuron-0-11-scale5/config/config
    ### 27:05.071327000 ###
    db/node/servers/Server-neuron-0-11-scale5/config/config
    

    I get this error message from find:
    find: WARNING: Hard link count is wrong for db/node/servers: this may be a bug in your filesystem driver.  Automatically turning on find's -noleaf option.  Earlier results may have failed to include directories that should have been searched.
    

    So this may by an NFS directory-creation race condition bug.

    I suggest a warning be added to the manual!

    Cheers,
    dustin.
  • bernard
    bernard Jupiter, FL
    Hi Dustin,

    Ok, I think the issue is that each icegridnode needs its own private IceGrid.Node.Data directory. You can't share this directory. The manual should make this clear.

    You can however put this data directory on a shared drive. For example, you could use:
    IceGrid.Node.Data=db/${node}
    

    Best regards,
    Bernard
  • bernard wrote: »
    For example, you could use:
    IceGrid.Node.Data=db/${node}
    

    When I put that in my icegridnode config file, I get:
    12/02/08 11:14:07.518 icegridnode: error: property `IceGrid.Node.Data' is set to an invalid path:
    IceGridNode.cpp:369: Ice::FileException:
    file exception: No such file or directory
    path: db/${node}
    icegridnode: failure occurred in daemon
    
  • benoit
    benoit Rennes, France
    Hi,

    Yes, I don't think Bernard really meant to use "${node}" as is (such variables are only substituted in XML descriptors). Instead, you can use the node name, for example:
    • In config.node1 file:
          IceGrid.Node.Name=node1
          IceGrid.Node.Data=db/node1
          ...
      
    • In config.node2 file:
          IceGrid.Node.Name=node2
          IceGrid.Node.Data=db/node2
          ...
      

    If you want to use a single configuration file for all the nodes, you can specify the node name and node data directory properties on the command line.

    Cheers,
    Benoit.
  • benoit wrote: »
    If you want to use a single configuration file for all the nodes, you can specify the node name and node data directory properties on the command line.

    Yes, thanks, I did that.

    It would be convenient if icegridnode would create its data directory if it doesn't already exist.