Observers do not work properly

ignacio.diez · February 2010

Hello everybody,
I'm having trouble with using node observers in my project. I have a server (we'll call it HydraManager) which needs to be aware of nodes coming up and down. I set the observer via the setObserver method, and I have a thread pingin the session in order not to let it expire.

Everything works fine if the node comes up shortly after the HydraManager was started, but after some time I get no more notifications on node changes. I am using python and Ice 3.3.1-12 (also tried 3.3.1-10) in Debian. Here are some pieces of code which might be relevant:

This is the run method from HydraManger

def run(self, args):
        sys.stdout.flush()

        c  = self.communicator()

        locprx   = IceGrid.LocatorPrx.checkedCast(c.getDefaultLocator())
        registry = locprx.getLocalRegistry()
        self.session = registry.createAdminSession("manager", "passwd")
        self.admin = self.session.getAdmin()

        [COLOR="DarkRed"]# thread to keep the session alive[/COLOR]
        self.alive = SessionThread(self.session, registry.getSessionTimeout(), "ManagerServer")
        self.alive.start()

        [COLOR="DarkRed"]# add the node observer[/COLOR]
        self.observersAdapter = c.createObjectAdapter('ObserversAdapter')
        self.observersAdapter.activate()

        nodeobprx = self.observersAdapter.add(HydraNodeObserver(c), Ice.Identity("NodeObserver"))
        nodeobserver = IceGrid.NodeObserverPrx.checkedCast(nodeobprx)

        try:
            self.session.setObservers(
                None,
                nodeobserver,
                None,
                None,
                None)

        except IceGrid.ObserverAlreadyRegisteredException, err:
            logger.warn(err)


        [COLOR="DarkRed"]# add the manager[/COLOR]
        self.mgri = ManagerI()

        mgrAdap = c.createObjectAdapter('ManagerAdapter')
        mgrprx = mgrAdap.add(self.mgri, Ice.Identity('Manager'))
        mgrAdap.activate()

        logger.info("Manager ready!")

        self.callbackOnInterrupt()
        c.waitForShutdown()
        self.interruptCallback(2)
        logger.info("Exiting...")
        return 0

HydraNodeObserver inherits from IceGrid.NodeObserver and implements its methods (nodeUp, nodeDown, etc). And this is the thread used to ping the session:

class SessionThread(Thread):

    def __init__(self, session, timeout, name):
        super(SessionThread, self).__init__()
        self.session = session
        self.finished = Event()
        self.timeout = timeout-5
        self.name = name


    def run(self):
        while True:
            self.finished.wait(self.timeout)
            if self.finished.isSet(): break
            try:
                self.session.keepAlive()
            except Exception, err:
                logger.error("%s\n%s" % (self.name, err))

    def stop(self):
        self.finished.set()

I can't figure out what is going on, so any help would be appreciated.

benoit · February 2010

Hi,

You should enable session tracing on the registry to see if the session if being closed by the registry. Set the IceGrid.Registry.Trace.Session property to 2 in your registry configuration file to enable this tracing. If this doesn't help, you can then try to enable networking tracing on the registry and on your python client with Ice.Trace.Network=2 to see if there are any issues with establishing the network connection from the registry to your client.

Cheers,
Benoit.

ignacio.diez · February 2010

Hello Benoit, thanks for your reply.

I tried what you said. First I only traced the registry, and this is what I got.

[ 02/17/10 11:46:50.369 icegridnode: Node: node `Hydra-Manager' up ] [COLOR="DarkRed"]//this is the registry node[/COLOR]
[ 02/17/10 11:46:51.573 icegridnode: Session: admin session `Manager.run' created ] [COLOR="DarkRed"]//this is the session holding the observers[/COLOR]
[ 02/17/10 11:46:51.734 icegridnode: Session: admin session `Core.unpublish' created ]
[ 02/17/10 11:46:51.741 icegridnode: Session: admin session `Core.publish' created ]
[ 02/17/10 11:46:51.743 icegridnode: Session: admin session `Core.publish' keep alive ]
[ 02/17/10 11:46:51.937 icegridnode: Session: admin session `Core.publish' keep alive ]
[ 02/17/10 11:46:51.938 icegridnode: Session: admin session `Core.publish' keep alive ]
[ 02/17/10 11:46:51.939 icegridnode: Session: admin session `Core.publish' destroyed ]
[ 02/17/10 11:46:52.107 icegridnode: Session: admin session `Core.unpublish' created ]
[ 02/17/10 11:46:52.117 icegridnode: Session: admin session `Core.publish' created ]
[ 02/17/10 11:46:52.118 icegridnode: Session: admin session `Core.publish' keep alive ]
[ 02/17/10 11:46:52.133 icegridnode: Session: admin session `Core.publish' keep alive ]
[ 02/17/10 11:46:52.134 icegridnode: Session: admin session `Core.publish' keep alive ]
[ 02/17/10 11:46:52.135 icegridnode: Session: admin session `Core.publish' destroyed ]
[ 02/17/10 11:46:56.551 icegridnode: Session: admin session `icegrid-gui' created ]
[ 02/17/10 11:46:56.558 icegridnode: Session: admin session `icegrid-gui' keep alive ]
[ 02/17/10 11:47:06.575 icegridnode: Session: admin session `Manager.run' keep alive ]
[ 02/17/10 11:47:11.560 icegridnode: Session: admin session `icegrid-gui' keep alive ]
[ 02/17/10 11:47:21.576 icegridnode: Session: admin session `Manager.run' keep alive ]
[ 02/17/10 11:47:26.561 icegridnode: Session: admin session `icegrid-gui' keep alive ]
[ 02/17/10 11:47:36.576 icegridnode: Session: admin session `Manager.run' keep alive ]
[COLOR="DarkRed"]
/**
 * This is a second node (a "Delegate"). So far, so good. The section in green
 * shows the activity of the Delegate (along with the keepalives), which
 * must be performed before the "pxe-" nodes come up
 */
[/COLOR]
[COLOR="DarkGreen"][ 02/17/10 11:47:40.292 icegridnode: Node: node `del-hydra' up ]
[ 02/17/10 11:47:40.300 icegridnode: Session: admin session `Deployer.run' created ]
[ 02/17/10 11:47:40.305 icegridnode: Session: admin session `Deployer.run' destroyed ]
[ 02/17/10 11:47:40.306 icegridnode: Session: admin session `Deployer.deploy' created ]
[ 02/17/10 11:47:41.338 icegridnode: Session: admin session `Deployer.deploy' keep alive ]
[ 02/17/10 11:47:41.563 icegridnode: Session: admin session `icegrid-gui' keep alive ]
[ 02/17/10 11:47:42.655 icegridnode: Session: admin session `Deployer.deploy' keep alive ]
[ 02/17/10 11:47:43.658 icegridnode: Session: admin session `Deployer.deploy' destroyed ]
[ 02/17/10 11:47:45.705 icegridnode: Session: admin session `Deployer.deploy' created ]
[ 02/17/10 11:47:46.727 icegridnode: Session: admin session `Deployer.deploy' keep alive ]
[ 02/17/10 11:47:47.858 icegridnode: Session: admin session `Deployer.deploy' keep alive ]
[ 02/17/10 11:47:48.882 icegridnode: Session: admin session `Deployer.deploy' destroyed ]
[ 02/17/10 11:47:50.371 icegridnode: Session: admin session `Core.unpublish' destroyed ]
[ 02/17/10 11:47:50.371 icegridnode: Session: admin session `Core.unpublish' destroyed ]
[ 02/17/10 11:47:51.577 icegridnode: Session: admin session `Manager.run' keep alive ]
[ 02/17/10 11:47:56.564 icegridnode: Session: admin session `icegrid-gui' keep alive ]
[ 02/17/10 11:48:06.578 icegridnode: Session: admin session `Manager.run' keep alive ]
[ 02/17/10 11:48:11.565 icegridnode: Session: admin session `icegrid-gui' keep alive ]
[ 02/17/10 11:48:21.578 icegridnode: Session: admin session `Manager.run' keep alive ]
[ 02/17/10 11:48:26.566 icegridnode: Session: admin session `icegrid-gui' keep alive ]
[ 02/17/10 11:48:36.579 icegridnode: Session: admin session `Manager.run' keep alive ]
[ 02/17/10 11:48:41.566 icegridnode: Session: admin session `icegrid-gui' keep alive ]
[ 02/17/10 11:48:50.680 icegridnode: Session: admin session `Core.unpublish' created ]
[ 02/17/10 11:48:50.686 icegridnode: Session: admin session `Core.publish' created ]
[ 02/17/10 11:48:50.801 icegridnode: Session: admin session `Core.publish' keep alive ]
[ 02/17/10 11:48:50.802 icegridnode: Session: admin session `Core.publish' keep alive ]
[ 02/17/10 11:48:50.807 icegridnode: Session: admin session `Core.unpublish' created ]
[ 02/17/10 11:48:50.814 icegridnode: Session: admin session `Core.publish' created ]
[ 02/17/10 11:48:50.866 icegridnode: Session: admin session `Core.publish' keep alive ]
[ 02/17/10 11:48:50.869 icegridnode: Session: admin session `Core.publish' keep alive ]
[ 02/17/10 11:48:51.580 icegridnode: Session: admin session `Manager.run' keep alive ]
[ 02/17/10 11:48:56.567 icegridnode: Session: admin session `icegrid-gui' keep alive ]
[ 02/17/10 11:49:06.580 icegridnode: Session: admin session `Manager.run' keep alive ]
[ 02/17/10 11:49:11.568 icegridnode: Session: admin session `icegrid-gui' keep alive ]
[ 02/17/10 11:49:21.581 icegridnode: Session: admin session `Manager.run' keep alive ]
[ 02/17/10 11:49:26.569 icegridnode: Session: admin session `icegrid-gui' keep alive ]
[ 02/17/10 11:49:36.582 icegridnode: Session: admin session `Manager.run' keep alive ]
[ 02/17/10 11:49:41.570 icegridnode: Session: admin session `icegrid-gui' keep alive ]
[ 02/17/10 11:49:50.372 icegridnode: Session: admin session `Core.unpublish' destroyed ]
[ 02/17/10 11:49:50.373 icegridnode: Session: admin session `Core.publish' destroyed ]
[ 02/17/10 11:49:50.373 icegridnode: Session: admin session `Core.unpublish' destroyed ]
[ 02/17/10 11:49:50.373 icegridnode: Session: admin session `Core.publish' destroyed ]
[ 02/17/10 11:49:51.582 icegridnode: Session: admin session `Manager.run' keep alive ]
[ 02/17/10 11:49:56.571 icegridnode: Session: admin session `icegrid-gui' keep alive ]
[ 02/17/10 11:50:06.583 icegridnode: Session: admin session `Manager.run' keep alive ]
[ 02/17/10 11:50:11.574 icegridnode: Session: admin session `icegrid-gui' keep alive ]
[ 02/17/10 11:50:21.584 icegridnode: Session: admin session `Manager.run' keep alive ]
[ 02/17/10 11:50:26.576 icegridnode: Session: admin session `icegrid-gui' keep alive ][/COLOR]

[COLOR="DarkRed"]
/**
 * Now, a "pxe-" node comes up, as detected by icegrid-gui, but my registry 
 * doesn't notice
 */
[/COLOR]
[COLOR="Blue"][ 02/17/10 11:50:26.623 icegridnode: Node: node `pxe-000102fc3088' up ][/COLOR]
[ 02/17/10 11:50:36.585 icegridnode: Session: admin session `Manager.run' keep alive ]
[ 02/17/10 11:50:41.577 icegridnode: Session: admin session `icegrid-gui' keep alive ]
[ 02/17/10 11:50:51.585 icegridnode: Session: admin session `Manager.run' keep alive ]
[ 02/17/10 11:50:56.578 icegridnode: Session: admin session `icegrid-gui' keep alive ]
[ 02/17/10 11:51:01.108 icegridnode: Node: node `pxe-000102fc3088' down ]
[ 02/17/10 11:51:05.671 icegridnode: Session: admin session `Manager.run' destroyed ]
[ 02/17/10 11:51:05.717 icegridnode: Session: admin session `Core.unpublish' created ]
[ 02/17/10 11:51:06.752 icegridnode: Session: admin session `Core.unpublish' destroyed ]
[ 02/17/10 11:51:06.836 icegridnode: Session: admin session `Core.unpublish' created ]
[ 02/17/10 11:51:07.874 icegridnode: Session: admin session `Core.unpublish' destroyed ]
[ 02/17/10 11:51:10.492 icegridnode: Node: node `Hydra-Manager' down ]
[ 02/17/10 11:51:10.495 icegridnode: Session: admin session `icegrid-gui' destroyed ]
[ 02/17/10 11:51:10.501 icegridnode: Node: node `del-hydra' down ]

Everything looked normal, as I couldn't see the Manger.run session being destroyed nor anything strange. So I tried tracing the network; the result is in the attatched file (it was too long for a post).

I don't see anything unusual, but I'm not sure with this tracing. Perhaps you can spot some error.

Thanks again,
Nacho.

benoit · February 2010

Hi,

Did you try enabling protocol tracing with Ice.Trace.Protocol=2 on your client? Can you try attaching to your client process after you notice that it doesn't receive an event to verify that the Ice server thread pool threads aren't hung on some locks (if all the threads are busy, your client won't be able to dispatch further events)?

Cheers,
Benoit.

ignacio.diez · February 2010

Hi,

I don't know what you mean by 'attatch to your client process'. I am using python, so I cannot attatch to a process the way you do it with a C debugger.

When trying to set Ice.Trace.Protocol and Ice.Trace.Network my program won't start (without rendering any kind of message, which is most frustrating). It enters __main__, but when it hangs on the m.main(sys.argv) invocation (see below):

if __name__=="__main__":
    m = ManagerServer()
    vlevel = logging.INFO
    logger.setLevel(vlevel)
    sys.exit(m.main(sys.argv))

ManagerServer is the class containing the run() method I posted on my first post.

Regards,
Nacho

benoit · February 2010

Which platform and python version do you use? Does it work when you try to run the hello client from the Ice/hello directory in your Ice distribution? Here's what I get when running the client with --Ice.Trace.Network=2:

 $ ./Server.py &
 $ ./Client.py --Ice.Trace.Network=2
[ 2010-02-17 20:39:07.248571 ./Client.py: Network: trying to establish tcp connection to 127.0.0.1:10000 ]
[ 2010-02-17 20:39:07.249269 ./Client.py: Network: tcp connection established
  local address = 127.0.0.1:54346
  remote address = 127.0.0.1:10000 ]

You should see something similar. If this works, you can try to find out what is different in your application.

Cheers,
Benoit.

ignacio.diez · February 2010

I think I solved it

I think I have found the sources of my problem. One of them was my having several administrative sessions, one for every class that performed administrative tasks. That is solved.

The other one is some error being raised somewhere, which freezes my manager, therfore making it unable to receive events. I'm still working on this.

Thanks for your replies.

Archived

Observers do not work properly

Comments

Categories