Archived
This forum has been archived. Please start a new discussion on GitHub.
Ice.FileException in createProperties
in Bug Reports
Hi,
I'm getting the following intermittent problem. I'm fairly new to ICE so I hope it is not just a misconfiguration error on my part...
It seems that when I start many icegridnode servers at once on different machines (one icegridnode per machine), I get:
My config file includes:
If I look in that directory, I see only the directory of the last server to start:
Could this be a race condition, or some sort of NFS problem?
I don't know what other information to provide. I am launching the icegridnodes with "cluster-fork", a thin wrapper over ssh, and it launches each one in series and takes a few seconds for each one.
Please let me know if there are any more diagnostics I can collect.
Thanks,
dustin.
PS, this is with Ice 3.3.0, SSH "OpenSSH_4.3p2, OpenSSL 0.9.8b 04 May 2006", on "CentOS release 5.2 (Final)" (rockscluster.org) platform.
I'm getting the following intermittent problem. I'm fairly new to ICE so I hope it is not just a misconfiguration error on my part...
It seems that when I start many icegridnode servers at once on different machines (one icegridnode per machine), I get:
Traceback (most recent call last): File "SolverServer.py", line 176, in ? sys.exit(app.main(sys.argv, 'config.grid')) File "/data1/dstn/software/ice-3.3.0-hydra/python/Ice.py", line 755, in main initData.properties = createProperties(args, initData.properties) File "/data1/dstn/software/ice-3.3.0-hydra/python/Ice.py", line 552, in createProperties properties = IcePy.createProperties(args, defaults) Ice.FileException: exception ::Ice::FileException { error = 2 path = /nfs/hydra.local/data1/dstn/astrometry/net/ice/db/node/servers/Server-neuron-0-0-scale10/config/config }
My config file includes:
IceGrid.Node.Data=db/node
If I look in that directory, I see only the directory of the last server to start:
> ls -1 db/node/servers/ . .. Server-neuron-0-11-scale2
Could this be a race condition, or some sort of NFS problem?
I don't know what other information to provide. I am launching the icegridnodes with "cluster-fork", a thin wrapper over ssh, and it launches each one in series and takes a few seconds for each one.
Please let me know if there are any more diagnostics I can collect.
Thanks,
dustin.
PS, this is with Ice 3.3.0, SSH "OpenSSH_4.3p2, OpenSSL 0.9.8b 04 May 2006", on "CentOS release 5.2 (Final)" (rockscluster.org) platform.
0
Comments
-
Hi Dustin,
The directory for the server is created by the IceGrid node before the server is started. It's not clear to me why it doesn't show up at the time the server is started. You could enable server tracing on the IceGrid node with IceGrid.Node.Trace.Server=3. This should show when the server directory is created (it occurs in the "Loading" state).
How many servers do you deploy per-node? If I understand it correctly, the only server directory present in the node servers directory is the directory of the last started server, is this correct? It sounds like some problem with NFS, can you perhaps try on a local file system instead first?
Cheers,
Benoit.0 -
The directory for the server is created by the IceGrid node before the server is started. It's not clear to me why it doesn't show up at the time the server is started. You could enable server tracing on the IceGrid node with IceGrid.Node.Trace.Server=3. This should show when the server directory is created (it occurs in the "Loading" state).
Log below.How many servers do you deploy per-node?
One.If I understand it correctly, the only server directory present in the node servers directory is the directory of the last started server, is this correct?
Yes.It sounds like some problem with NFS, can you perhaps try on a local file system instead first?
Using a local filesystem directory (/tmp/node) fixed this problem. Thanks for this tip. This should *really* be in the manual or FAQ - I would imagine that many grid machines will use NFS.
Thanks,
dustin.[ 12/01/08 13:20:46.206 icegridnode: Server: changed server `Server-neuron-0-0-scale10' state to `Loading' ] [ 12/01/08 13:20:46.240 icegridnode: Server: changed server `Server-neuron-0-0-scale10' state to `Inactive' ] [ 12/01/08 13:20:46.245 icegridnode: Server: changed server `Server-neuron-0-0-scale5' state to `Loading' ] [ 12/01/08 13:20:46.276 icegridnode: Server: changed server `Server-neuron-0-0-scale5' state to `Inactive' ] [ 12/01/08 13:20:46.743 icegridnode: Server: changed server `Server-neuron-0-0-scale10' state to `Activating' ] [ 12/01/08 13:20:46.753 icegridnode: Activator: activating server `Server-neuron-0-0-scale10' path = python pwd = /nfs/hydra.local/data1/dstn/astrometry/net/ice uid/gid = 1907/30026 args = python SolverServer.py 10 --Ice.Config=/nfs/hydra.local/data1/dstn/astrometry/net/ice/db/node/servers/Server-neuron-0-0-sc ale10/config/config ] [ 12/01/08 13:20:46.755 icegridnode: Server: changed server `Server-neuron-0-0-scale10' state to `WaitForActivation' ] [ 12/01/08 13:20:46.778 icegridnode: Server: changed server `Server-neuron-0-0-scale5' state to `Activating' ] [ 12/01/08 13:20:46.778 icegridnode: Activator: activating server `Server-neuron-0-0-scale5' path = python pwd = /nfs/hydra.local/data1/dstn/astrometry/net/ice uid/gid = 1907/30026 args = python SolverServer.py 5 --Ice.Config=/nfs/hydra.local/data1/dstn/astrometry/net/ice/db/node/servers/Server-neuron-0-0-sca le5/config/config ] [ 12/01/08 13:20:46.781 icegridnode: Server: changed server `Server-neuron-0-0-scale5' state to `WaitForActivation' ] SolverServer.py args: ['SolverServer.py', '10', '--Ice.Config=/nfs/hydra.local/data1/dstn/astrometry/net/ice/db/node/servers/Server -neuron-0-0-scale10/config/config'] scale 10 Traceback (most recent call last): File "SolverServer.py", line 176, in ? sys.exit(app.main(sys.argv, 'config.grid')) File "/data1/dstn/software/ice-3.3.0-hydra/python/Ice.py", line 755, in main initData.properties = createProperties(args, initData.properties) File "/data1/dstn/software/ice-3.3.0-hydra/python/Ice.py", line 552, in createProperties properties = IcePy.createProperties(args, defaults) Ice.FileException: exception ::Ice::FileException { error = 2 path = /nfs/hydra.local/data1/dstn/astrometry/net/ice/db/node/servers/Server-neuron-0-0-scale10/config/config } [ 12/01/08 13:20:49.945 icegridnode: Activator: detected termination of server `Server-neuron-0-0-scale10' exit code = 1 ] [ 12/01/08 13:20:49.948 icegridnode: Server: changed server `Server-neuron-0-0-scale10' state to `Deactivating' ] [ 12/01/08 13:20:49.950 icegridnode: Server: changed server `Server-neuron-0-0-scale10' state to `Inactive' ]
0 -
Hi Dustin,
You can use a NFS or other shared drive to store your config files. Of course, each server needs its own private config-file.
Here, it sounds like a mounting problem, a permission problem, or a problem with the way icegridnode transforms a relative path into an absolute path.
When you log onto this machine, can you read /nfs/hydra.local/data1/dstn/astrometry/net/ice/db/node/servers/Server-neuron-0-0-scale10/config/config? Can the icegridnode user read this file?
Best regards,
Bernard0 -
You can use a NFS or other shared drive to store your config files. Of course, each server needs its own private config-file.
I am using IceGrid, so my config file includes:IceGrid.Node.Data=db/node
And when my servers are activated, they get:[ 12/01/08 13:20:46.753 icegridnode: Activator: activating server `Server-neuron-0-0-scale10' path = python pwd = /nfs/hydra.local/data1/dstn/astrometry/net/ice uid/gid = 1907/30026 args = python SolverServer.py 10 --Ice.Config=/nfs/hydra.local/data/dstn/astrometry/net/ice/db/node/servers/Server-neuron-0-0-scale10/config/config ]
Since my servers each have unique names, IceGrid seems to create unique config files for them.Here, it sounds like a mounting problem, a permission problem, or a problem with the way icegridnode transforms a relative path into an absolute path.
When you log onto this machine, can you read /nfs/hydra.local/data1/dstn/astrometry/net/ice/db/node/servers/Server-neuron-0-0-scale10/config/config? Can the icegridnode user read this file?
After I launch 12 icegridnodes, the db/node directory contains this:> find db/node/ db/node/ db/node/servers db/node/servers/Server-neuron-0-6-scale5 db/node/servers/Server-neuron-0-6-scale5/config db/node/servers/Server-neuron-0-6-scale5/config/config db/node/servers/Server-neuron-0-6-scale5/dbs db/node/servers/Server-neuron-0-6-scale5/distrib db/node/servers/Server-neuron-0-6-scale5/revision db/node/tmp db/node/distrib
And the rest of the my servers have died because their config files didn't exist:Ice.FileException: exception ::Ice::FileException { error = 2 path = /nfs/hydra.local/data1/dstn/astrometry/net/ice/db/node/servers/Server-neuron-0-0-scale10/config/config }
It's not a permissions problem: all icegridnodes run as 'dstn', a normal user, because I don't have root access on this cluster. The directories simply don't exist at the time my server runs.
I ran the following experiment:./startallnodes & for ((;;)); do echo "### `date` ###" >> ls; find db/node >> ls; sleep 0.01; done
What I see is that each node's directory gets created and populated, but then disappears:> grep "config/config\|###" ls ### 27:00.840826000 ### db/node/servers/Server-neuron-0-11-scale5/config/config ### 27:00.949344000 ### db/node/servers/Server-neuron-0-0-scale10/config/config ### 27:01.067322000 ### db/node/servers/Server-neuron-0-0-scale10/config/config ### 27:01.176296000 ### db/node/servers/Server-neuron-0-0-scale10/config/config ### 27:01.285155000 ### db/node/servers/Server-neuron-0-1-scale5/config/config ### 27:01.395943000 ### db/node/servers/Server-neuron-0-1-scale5/config/config ### 27:01.506709000 ### db/node/servers/Server-neuron-0-1-scale5/config/config ### 27:01.621234000 ### ### 27:01.730697000 ### db/node/servers/Server-neuron-0-2-scale4/config/config ### 27:01.842905000 ### db/node/servers/Server-neuron-0-2-scale4/config/config ### 27:01.955905000 ### db/node/servers/Server-neuron-0-3-scale3/config/config ### 27:02.066291000 ### db/node/servers/Server-neuron-0-3-scale3/config/config ### 27:02.177472000 ### db/node/servers/Server-neuron-0-3-scale3/config/config ### 27:02.286882000 ### db/node/servers/Server-neuron-0-4-scale2/config/config ### 27:02.402262000 ### db/node/servers/Server-neuron-0-4-scale2/config/config ### 27:02.523075000 ### db/node/servers/Server-neuron-0-4-scale2/config/config ### 27:02.646938000 ### db/node/servers/Server-neuron-0-5-scale10/config/config ### 27:02.757144000 ### db/node/servers/Server-neuron-0-5-scale10/config/config ### 27:02.874255000 ### db/node/servers/Server-neuron-0-5-scale10/config/config ### 27:02.984534000 ### db/node/servers/Server-neuron-0-6-scale5/config/config ### 27:03.097887000 ### db/node/servers/Server-neuron-0-6-scale5/config/config ### 27:03.240119000 ### db/node/servers/Server-neuron-0-6-scale5/config/config ### 27:03.357368000 ### db/node/servers/Server-neuron-0-7-scale4/config/config ### 27:03.469632000 ### db/node/servers/Server-neuron-0-7-scale4/config/config ### 27:03.582018000 ### db/node/servers/Server-neuron-0-7-scale4/config/config ### 27:03.694741000 ### ### 27:03.804037000 ### db/node/servers/Server-neuron-0-8-scale3/config/config ### 27:03.936422000 ### db/node/servers/Server-neuron-0-8-scale3/config/config ### 27:04.066373000 ### db/node/servers/Server-neuron-0-8-scale3/config/config ### 27:04.182170000 ### db/node/servers/Server-neuron-0-9-scale2/config/config ### 27:04.318162000 ### db/node/servers/Server-neuron-0-9-scale2/config/config ### 27:04.449006000 ### ### 27:04.580340000 ### db/node/servers/Server-neuron-0-10-scale10/config/config ### 27:04.709145000 ### db/node/servers/Server-neuron-0-10-scale10/config/config ### 27:04.821542000 ### ### 27:04.943917000 ### db/node/servers/Server-neuron-0-11-scale5/config/config ### 27:05.071327000 ### db/node/servers/Server-neuron-0-11-scale5/config/config
I get this error message from find:find: WARNING: Hard link count is wrong for db/node/servers: this may be a bug in your filesystem driver. Automatically turning on find's -noleaf option. Earlier results may have failed to include directories that should have been searched.
So this may by an NFS directory-creation race condition bug.
I suggest a warning be added to the manual!
Cheers,
dustin.0 -
Hi Dustin,
Ok, I think the issue is that each icegridnode needs its own private IceGrid.Node.Data directory. You can't share this directory. The manual should make this clear.
You can however put this data directory on a shared drive. For example, you could use:IceGrid.Node.Data=db/${node}
Best regards,
Bernard0 -
For example, you could use:
IceGrid.Node.Data=db/${node}
When I put that in my icegridnode config file, I get:12/02/08 11:14:07.518 icegridnode: error: property `IceGrid.Node.Data' is set to an invalid path: IceGridNode.cpp:369: Ice::FileException: file exception: No such file or directory path: db/${node} icegridnode: failure occurred in daemon
0 -
Hi,
Yes, I don't think Bernard really meant to use "${node}" as is (such variables are only substituted in XML descriptors). Instead, you can use the node name, for example:- In config.node1 file:
IceGrid.Node.Name=node1 IceGrid.Node.Data=db/node1 ...
- In config.node2 file:
IceGrid.Node.Name=node2 IceGrid.Node.Data=db/node2 ...
If you want to use a single configuration file for all the nodes, you can specify the node name and node data directory properties on the command line.
Cheers,
Benoit.0 - In config.node1 file: