How to improve performance of thread-per-connection?

I'm embarrassed to say that I'm not entirely sure exactly what the right question to ask is... :o So, for now, what's the best/correct way to improve performance of systems using the thread-per-connection model? Here's a description of our architecture and the problems we're seeing...

We have a peer-to-peer system, partially based on the architecture that Mark described in his (excellent!) article on Dynamic Ice in Connections #11. The peers are always of two types: one is a human using a GUI client (written in Java) and the other is a robot (written in C++, using Ice-E). Here's the typical use case:
  • Human turns the robot on, and the robot auto-connects to our relay server (very similar to the "registry" in Mark's article). Auto-connection includes login and registering a proxy to its servant with the relay.
  • Human fires up the GUI client and logs in to the relay. Login includes registering a proxy to its servant with the relay.
  • Human uses the GUI client to view the list of robots currently connected to the relay, selects one, and clicks "Connect" (which sends a connection request to the relay)
  • Upon receipt of the connection request from the human, the relay sends a client proxy to the robot and a robot proxy to the GUI client (so they can make method calls on each other--the relay merely shuffles messages back and forth using Dynamic Ice as Mark described in the article).
  • Once the connection is established (i.e. the client gets the robot's proxy), the client calls a startVideoStream() method on the robot. This causes the robot to start pushing JPEG images (obtained from its camera) to the client. Pushing is performed in a separate thread.
  • The human can drive the robot by using the GUI client's interface. Commands are, at the moment, very simple (e.g. "drive forward", "tilt camera up", etc.). See the slice code below for more detail.

The problem we're having is that if the human bangs away on the keyboard, sending, say, multiple drive commands in quick succession, the video stream it receives slows to a crawl (20 fps --> 1 fps or worse). The problem appears to result from the robot's use of the thread-per-connection model (it can't use thread pools since it's running under Ice-E). It seems as though thread-per-connection limits the robot to handling only one request at a time, in either direction (sending or receiving). Indeed, I've tested this using a fake robot (just a simple Java simulator, running under plain ol' Ice, not Ice-E) and switching to the thread pool model makes all video stream degradation disappear. Switching the Java robot simulator back to thread-per-connection results in the exact behavior we're witnessing with the real robot.
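To convince myself this really was a threading effect, I found it helpful to reproduce it outside of Ice entirely. Here's a minimal plain-Java sketch (no Ice involved; the single-thread executor just stands in for the connection's one dispatch thread, and all the names are made up for illustration):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ThreadPerConnectionDemo
   {
   // Submits three "drive" commands followed by one video "frame" to the given
   // dispatcher and returns the order in which the tasks completed.
   public static List<String> run(final ExecutorService dispatcher) throws InterruptedException
      {
      final List<String> completed = Collections.synchronizedList(new ArrayList<String>());
      for (int i = 1; i <= 3; i++)
         {
         final int n = i;
         dispatcher.execute(new Runnable()
            {
            public void run()
               {
               try { Thread.sleep(20); } catch (InterruptedException ignored) { }   // fake servo work
               completed.add("command" + n);
               }
            });
         }
      dispatcher.execute(new Runnable()
         {
         public void run() { completed.add("frame"); }   // the video pusher wants this out now
         });
      dispatcher.shutdown();
      dispatcher.awaitTermination(5, TimeUnit.SECONDS);
      return completed;
      }

   public static void main(final String[] args) throws InterruptedException
      {
      // one thread (like thread-per-connection): the frame waits behind every queued command
      System.out.println(run(Executors.newSingleThreadExecutor()));   // [command1, command2, command3, frame]
      // a pool: the frame goes out immediately while the commands grind away
      System.out.println(run(Executors.newFixedThreadPool(4)));
      }
   }
```

Swapping the executor is, as far as I can tell, exactly the thread-per-connection vs. thread pool difference we're seeing.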

I read something in the Ice docs about possibly opening up another connection, and tried it in the robot simulator, but didn't notice any improvement. Maybe I didn't do it right though--all I did was have the robot code call ice_connectionId("video") on the client proxy it receives from the relay and use that new proxy solely for pushing the video frames. I was assuming video would use that new connection, while commands from the client to the robot would use the original connection that the robot made when it first connected to the relay (and registered its proxy, etc.)

Is that enough detail to ask for advice on how to fix things so that the robot can send/receive more than one Ice command at a time?

Here's the (relevant) Slice code and all the config files we're using (yes, I know "GenericError" in the slice code is totally lame--it's just a placeholder until we better identify our exceptional conditions ;) ):

Slice code for methods the robot and client use to register themselves with the relay (as well as some the relay uses to notify the client/robot of connections and disconnections):
#ifndef MRPL_PEER_ICE
#define MRPL_PEER_ICE

#include <Glacier2/Session.ice>

[["java:package:edu.cmu.ri.mrpl"]]
module peer
   {
   enum PeerAccessLevel {AccessLevelOwner,
                         AccessLevelOwnerRestricted,
                         AccessLevelNormalEnhanced,
                         AccessLevelNormal,
                         AccessLevelNormalRestricted,
                         AccessLevelGuestEnhanced,
                         AccessLevelGuest,
                         AccessLevelGuestRestricted,
                         AccessLevelNone};

   struct PeerIdentifier
      {
      string userId;
      string firstName;
      string lastName;
      };

   exception PeerException
      {
      string reason;
      };

   exception PeerAccessException extends PeerException { };

   exception PeerUnavailableException extends PeerException { };

   exception PeerConnectionFailedException extends PeerException { };

   exception DuplicateConnectionException extends PeerException { };

   exception AuthenticationRequiredException extends PeerException { };

   exception RegistrationException extends PeerException { };

   interface UserConnectionEventHandler
      {
      ["ami"] void forcedLogoutNotification();
      };

   interface PeerConnectionEventHandler
      {
      ["ami"] void peerConnected(string peerUserId, PeerAccessLevel accessLevel, Object* peerProxy);

      ["ami"] void peerConnectedNoProxy(string peerUserId, PeerAccessLevel accessLevel);

      ["ami"] void peerDisconnected(string peerUserId);
      };

   interface ConnectionEventHandler extends UserConnectionEventHandler, PeerConnectionEventHandler
      {
      };

   interface PeerRegistrationHandler
      {
      void registerCallbacks(Object* selfCallbackProxy, ConnectionEventHandler* connectionEventHandlerProxy) throws RegistrationException;
      };

   ["java:type:java.util.HashSet<PeerIdentifier>"] sequence<PeerIdentifier> PeerIdentifierSet;

   interface UserSession extends Glacier2::Session, PeerRegistrationHandler
      {
      PeerIdentifierSet getMyAvailablePeers() throws PeerException;

      Object* connectToPeer(string peerUserId) throws PeerAccessException, PeerUnavailableException, PeerConnectionFailedException, DuplicateConnectionException, AuthenticationRequiredException;

      PeerIdentifierSet getConnectedPeers() throws PeerException;

      void disconnectFromPeer(string peerUserId);

      void disconnectFromPeers();
      };
   };

#endif

Slice code common to the GUI client (a.k.a. TerkClient) and robot (a.k.a. Qwerk):
#ifndef TERK_PEER_COMMON_ICE
#define TERK_PEER_COMMON_ICE

#include <peer/MRPLPeer.ice>

[["java:package:edu.cmu.ri.mrpl"]]
module TeRK
   {
   enum ImageFormat {IMAGEJPEG, IMAGERGB24, IMAGERGB32, IMAGEGRAY8, IMAGEYUV420P, IMAGEUNKNOWN};

   sequence<byte> ByteArray;

   struct Image
      {
      int height;
      int width;
      int frameNum;
      ImageFormat format;
      ByteArray data;
      };

   exception GenericError
      {
      string reason;
      };

   interface VideoStreamerClient
      {
      int newFrame(Image frame) throws GenericError;
      };

   interface VideoStreamerServer
      {
      idempotent int startCamera() throws GenericError;
      idempotent int stopCamera() throws GenericError;
      idempotent int startVideoStream() throws GenericError;
      idempotent int stopVideoStream() throws GenericError;
      };

   interface TerkClient extends peer::ConnectionEventHandler, VideoStreamerClient
      {
      };

   interface Qwerk extends peer::ConnectionEventHandler, VideoStreamerServer
      {
      void cameraTiltUp();
      void cameraTiltDown();
      void cameraPanLeft();
      void cameraPanRight();
      void driveForward();
      void driveBack();
      void spinLeft();
      void spinRight();
      void stop();
      };

   };

#endif

The client's Ice config file (don't worry about the "@glacier.host@" stuff...the real, correct hostname gets inserted by our build tool depending on the build target):
Ice.ProgramName=DiffDriveTerkClient
Ice.Package.peer=edu.cmu.ri.mrpl
Ice.Package.TeRK=edu.cmu.ri.mrpl
Ice.Default.Package=edu.cmu.ri.mrpl
Ice.Default.Router=TerkGlacier/router:tcp -h @glacier.host@ -p 10004
Teleop.Client.Router=TerkGlacier/router:tcp -h @glacier.host@ -p 10004
Teleop.Client.Endpoints=
Ice.ACM.Client=0
Ice.ACM.Server=0
Ice.MonitorConnections=10
Ice.Warn.Connections=1
Ice.Logger.Timestamp=1
Ice.ThreadPool.Client.Size=5
Ice.ThreadPool.Client.SizeMax=20
Ice.ThreadPool.Server.Size=5
Ice.ThreadPool.Server.SizeMax=20

The robot's Ice config file:
Ice.ProgramName=RobotClient
Ice.Package.peer=edu.cmu.ri.mrpl
Ice.Package.TeRK=edu.cmu.ri.mrpl
Ice.Default.Package=edu.cmu.ri.mrpl
Ice.Default.Router=TerkGlacier/router:tcp -h @glacier.host@ -p 10004
Robot.Client.Router=TerkGlacier/router:tcp -h @glacier.host@ -p 10004
Robot.Client.Endpoints=
Ice.ACM.Client=0
Ice.ACM.Server=0
Ice.MonitorConnections=60
Ice.Warn.Connections=1
Ice.Trace.Network=0
Ice.Trace.Protocol=0

I've reached the forum post max length (sorry!), so I'll post the Glacier2 and relay config files as attachments.

Thanks heaps!

chris

Comments

  • bernard (Jupiter, FL)
    Hi Chris,

    I don't quite understand your setup:
    - are your Java GUI and robot talking directly to each other, or only through the relay (like in Mark's article)
    - are you using bi-dir connections between the relay and Java GUI, and between the relay and Ice-E program?
    - what are you using Glacier2 for? Are any calls routed through a Glacier2 router?

    If you can simplify your setup, it would be much easier to pinpoint the issue.

    From your description, I don't see how it could be a connection problem: even if you use a bi-dir connection, small traffic from your GUI to your robot would not affect traffic in the other direction.
    If you can't eliminate the relay, it would be useful to post its implementation ... or even better, post a simple demo that shows your issue.

    Cheers,
    Bernard
  • are your Java GUI and robot talking directly to each other, or only through the relay (like in Mark's article)

    Only through the relay, as in Mark's article.
    what are you using Glacier2 for? Are any calls routed through a Glacier2 router?

    Our relay server sits behind Glacier2. We're using Glacier2 for authentication and session management. So, yes, all calls are routed through Glacier2.
    are you using bi-dir connections between the relay and Java GUI, and between the relay and Ice-E program?

    Yes, since both the GUI and the robot communicate with the relay server through Glacier2.
    If you can't eliminate the relay, it would be useful to post its implementation

    We can't eliminate the relay. The benefit of the relay is that it allows both the client and the robot to be behind firewalls and still talk to one another (the relay isn't firewalled, other than sitting behind Glacier2). E.g., the relay is our solution for enabling someone to have a robot at their home (behind their home firewall) and control it from, say, work (behind their corporate firewall). Here are some portions of the relay code that seem most relevant...

    Here's the main() method for the relay server. You'll see that it extends IceApplication--this class is essentially the same as ZeroC's Ice.Application, except that I've tweaked it to allow loading the config file from a jar file. The ConnectionManager class is where all the proxy registration and peer connection stuff is implemented. It's really the heart of the relay--it keeps some stuff in memory only (e.g. registered proxy objects) while persisting some stuff to the database (e.g. which peers are connected). I should mention that the database is only updated upon connection and disconnection--that is, it's not used at all for calls the robot and client make on each other once the peers are connected.
    public final class RelayServer extends IceApplication
       {
       public int run(final String[] args)
          {
          // fetch the context map keys from the properties
          final String contextMapKeyPeerIdentity = communicator().getProperties().getProperty("contextMapKeyPeerIdentity");
          final String contextMapKeyPeerUserid = communicator().getProperties().getProperty("contextMapKeyPeerUserid");
    
          // configure and start up the relay server
          final ObjectAdapter adapter = communicator().createObjectAdapter("RelayServer");
          final ConnectionManager connectionManager = new ConnectionManager(adapter, contextMapKeyPeerIdentity, contextMapKeyPeerUserid);
          adapter.add(new TerkPermissionsVerifierServant(), Util.stringToIdentity("TerkPermissionsVerifier"));
          adapter.add(new TerkSessionManagerServant(connectionManager), Util.stringToIdentity("TerkSessionManager"));
          adapter.addServantLocator(new SingletonServantLocator(new AsynchronousBlobjectServant(connectionManager, connectionManager)), "");
          adapter.activate();
    
          communicator().waitForShutdown();
    
          return 0;
          }
    
       public static void main(final String[] args)
          {
          // force Hibernate to initialize now (so there's no lag later, e.g. when the first user logs in)
          HibernateUtil.getSessionFactory();
    
          // configure and start up the relay server
          final RelayServer app = new RelayServer();
          final int status = app.main("RelayServer", args, "/edu/cmu/ri/mrpl/TeRK/relay/RelayServer.properties");
          System.exit(status);
          }
       }
    

    Here's the AsynchronousBlobjectServant. You can see from above that the ConnectionManager is passed in to the AsynchronousBlobjectServant's constructor since the ConnectionManager is responsible for mapping Ice identities to object proxies (it implements the IdentityToObjectProxyMapper interface). The implementation of that mapping is trivial (so I won't show it here)--simply use the identity as a key into a map which maps identities to object proxies:
    public final class AsynchronousBlobjectServant extends BlobjectAsync
       {
       private static final Log LOG = LogFactory.getLog(AsynchronousBlobjectServant.class);
    
       private final IdentityToObjectProxyMapper identityToObjectProxyMapper;
       private final ContextMapEntrySetter contextMapEntrySetter;
    
       public AsynchronousBlobjectServant(final IdentityToObjectProxyMapper identityToObjectProxyMapper)
          {
          this(identityToObjectProxyMapper, null);
          }
    
       public AsynchronousBlobjectServant(final IdentityToObjectProxyMapper identityToObjectProxyMapper, final ContextMapEntrySetter contextMapEntrySetter)
          {
          this.identityToObjectProxyMapper = identityToObjectProxyMapper;
          this.contextMapEntrySetter = contextMapEntrySetter;
          }
    
       /**
        * Retrieves the target proxy by calling {@link IdentityToObjectProxyMapper#getObjectProxyForIdentity(Identity)} and,
        * if not <code>null</code>, wraps the given {@link AMD_Object_ice_invoke} in an {@link AMI_Object_ice_invoke} and
        * passes it to the proxy's {@link ObjectPrx#ice_invoke_async ice_invoke_async()} method.  Throws an
        * {@link ObjectNotExistException} if the proxy returned by
        * {@link IdentityToObjectProxyMapper#getObjectProxyForIdentity getObjectProxyForIdentity()} is <code>null</code>.
    * Subclasses can customize the context map passed to the proxy by providing a
        * {@link ContextMapEntrySetter} to this class's constructor.
        *
        * @see IdentityToObjectProxyMapper
        * @see ContextMapEntrySetter
        */
       public void ice_invoke_async(final AMD_Object_ice_invoke amdCallback, final byte[] inParams, final Current current)
          {
          if (LOG.isDebugEnabled())
             {
             LOG.debug("AsynchronousBlobjectServant.ice_invoke_async()");
             LOG.debug(IceUtil.dumpCurrentToString(current));
             }
          ObjectPrx proxy = identityToObjectProxyMapper.getObjectProxyForIdentity(current.id);
          if (proxy != null)
             {
             if (current.facet.length() > 0)
                {
                proxy = proxy.ice_newFacet(current.facet);
                }
             final AMI_Object_ice_invoke amiCallback = new AsynchronousCallback(amdCallback);
             if (contextMapEntrySetter != null)
                {
                contextMapEntrySetter.setCustomContextMapEntries(current);
                }
             proxy.ice_invoke_async(amiCallback, current.operation, current.mode, inParams, current.ctx);
             return;
             }
    
          LOG.info("Proxy returned by getObjectProxyForIdentity() was null.  Throwing ObjectNotExistException.");
          throw new ObjectNotExistException(current.id, current.facet, current.operation);
          }
       }
    

    Finally, here's what the robot (simulator) does to connect to the relay. The client code is nearly identical. I've omitted the code which reads in the id and password:
    final Ice.RouterPrx defaultRouter = communicator().getDefaultRouter();
    if (defaultRouter == null)
       {
       LOG.error("no default router set");
       return 1;
       }
    
    final RouterPrx router = RouterPrxHelper.checkedCast(defaultRouter);
    if (router == null)
       {
       LOG.error("configured router is not a Glacier2 router");
       return 1;
       }
    
    // read in the id and password... (code omitted)
    
// create a session (the opening of this retry loop was elided above, along with
// the code that re-reads the id and password on failure)
while (true)
   {
   try
      {
      userSessionPrx = UserSessionPrxHelper.uncheckedCast(router.createSession(id, pw));
      break;
      }
   catch (PermissionDeniedException ex)
      {
      LOG.error("permission denied:\n" + ex.reason);
      }
   catch (CannotCreateSessionException ex)
      {
      LOG.error("cannot create session:\n" + ex.reason);
      }
   }
    
    // start up the session pinger
    final IceSessionPinger iceSessionPinger = new IceSessionPinger(50, userSessionPrx);
    iceSessionPinger.start();
    
    final Identity callbackReceiverIdent = new Identity();
    callbackReceiverIdent.name = "robotCallbackReceiver";
    callbackReceiverIdent.category = router.getServerProxy().ice_getIdentity().category;
    
    // register callback
    final ObjectAdapter adapter = communicator().createObjectAdapter("Robot.Client");
    final QwerkServant qwerkServant = new QwerkServant();
    final QwerkPrx qwerkServantPrx = QwerkPrxHelper.uncheckedCast(adapter.add(qwerkServant, callbackReceiverIdent));
    adapter.activate();
    
    // register my callbacks with the relay
    try
       {
       userSessionPrx.registerCallbacks(qwerkServantPrx, qwerkServantPrx);
       }
    catch (RegistrationException e)
       {
       LOG.error("RegistrationException while trying to register the callbacks", e);
       }
    
    From your description, I don't see how it could be a connection problem: even if you use a bi-dir connection, small traffic from your GUI to your robot would not affect traffic in the other direction.

    Hmmm. That's what I hoped and expected, but definitely not what we're seeing. Commands from the client to the robot appear to be getting queued up (since the robot apparently can either send a video frame or respond to a command, but never do both at the same time). If you bang away enough on the keyboard in the GUI client, so many commands get queued up (presumably since Ice is transparently delaying commands, as it does in the thread pool model when no more threads are available in the pool?) that the relay starts throwing UnknownLocalExceptions (which wrap TimeoutException) since the relay is set to a 2-second timeout (which is arbitrary at the moment and something we need to tweak later).

    Does any of this second post help identify the issue? Thanks,

    chris
  • bernard (Jupiter, FL)
    I'd suggest trying it without Glacier2 and the relay, not because it makes sense for your application, but in order to simplify your environment when trying to isolate this issue.

    Of course, if you don't see any problem when the robot and client talk directly to each other, then there is something wrong with your Relay/Glacier2 setup.

    Also, is this issue just a slowdown or a deadlock? You may want to disable timeouts altogether to see if you get a deadlock.
    For simplicity, I'd recommend using your Java emulator in thread-per-connection mode instead of the actual Ice-E robot ... if you get a deadlock, dumping the stack traces will be straightforward and potentially very useful.

    Cheers,
    Bernard
  • I'd suggest trying it without Glacier2 and the relay, not because it makes sense for your application, but in order to simplify your environment when trying to isolate this issue.

    Of course, if you don't see any problem when the robot and client talk directly to each other, then there is something wrong with your Relay/Glacier2 setup.

    Doesn't the fact that everything suddenly works fine when I change the simulated robot from thread-per-connection to thread pool suggest that it's not a relay/Glacier2 issue? That is, if tweaking that one part of the robot's config file makes the difference, then isn't it most likely that it's an issue with how the robot is coded/configured?

    I'm hesitant to try it without the relay just because it's an altogether different architecture. That is, the robot would need to be a server instead of a client. It just feels too different to be a useful test at the moment. I'd prefer something a little more incrementally different than our current architecture. Anyway, I'll try to find a little time to hack up a limited demo of the problem and post it here.
    Also, is this issue just a slowdown or a deadlock? You may want to disable timeouts altogether to see if you get a deadlock.

    Definitely not a deadlock, just a slowdown. The messages do eventually get delivered to the robot and processed (including the ones that the relay times out on and sends exceptions back to the client!). After the queue of messages calms down, everything comes back to life--that is, the video displayed in the client resumes at its normal speed (as long as you're not sending commands to the robot) and further commands can still be sent to the robot (though with the same slow-down behavior).

    I should mention that it's not just the video that slows to a crawl. Even the session pinger (which runs in its own thread) can't get its pings through--so, if the human bangs away on the keyboard sending a barrage of commands to the robot, the robot gets so swamped responding to commands that it can't get its video frames through to the client nor can it even ping Glacier2. So, it'll eventually get auto-expired by Glacier2 which causes the relay to break the peer association.
    For simplicity, I'd recommend using your Java emulator in thread-per-connection mode instead of the actual Ice-E robot ... if you get a deadlock, dumping the stack traces will be straightforward and potentially very useful.

    No deadlock, just a slowdown. Exactly the same behavior we see with the real robot.

    Perhaps you could help clarify this part of the Ice manual (p 779, section 30.8.2):
    The thread-per-connection concurrency model creates a separate thread for each incoming and outgoing connection. The thread for an incoming connection dispatches requests and, if the connection is bidirectional, handles replies to outgoing bidirectional requests. The thread for an outgoing connection processes replies and, if the connection is bidirectional, dispatches incoming requests.

    It's correct to say that connections made to Glacier2 are always bidirectional, right? If so, then it sure sounds like there's only a single thread for handling all incoming and outgoing traffic. That would definitely explain the behavior we're seeing. Am I just totally misunderstanding the docs?

    thanks,

    chris
  • marc (Florida)
    bartley wrote:
    It's correct to say that connections made to Glacier2 are always bidirectional, right? If so, then it sure sounds like there's only a single thread for handling all incoming and outgoing traffic. That would definitely explain the behavior we're seeing. Am I just totally misunderstanding the docs?

    Connections from the client to Glacier2 are always bi-directional. Also, there is only one single connection from the client to Glacier2. Connections from Glacier2 to the backend are just regular connections, i.e., these are not bi-directional.

    Whether or not this has anything to do with the problem you are experiencing, I'm afraid I don't know, because quite frankly, I'm having trouble following what you are exactly doing. If you can condense your problem description into a smaller and simpler example, I will be happy to look at it.
  • Chris,

    A couple of thoughts, FWIW.

    1. From the Ice documentation (3.0.1 p779) thread-per-connection: "the thread for an outgoing connection processes replies and, if the connection is bidirectional, dispatches incoming requests". Your bidir connection to G2 and the relay is therefore a single thread. If I haven't misunderstood this, then long-running operations may be part of the problem... e.g. cameraTiltUp or driveForward may be measured in the hundreds of ms due to synchronous commands to servo motors and so prevent outbound video invocations on the same thread. To prove this, don't do any work in the Qwerk command functions - simply return immediately - and see if this reduces the severity of the problem.

    From what you've said, unless you have nested calls between commands and video streaming on the client or server side, then it's not really a deadlock issue but purely a thread contention issue.

    2. Have you tried using two communicators - one dedicated to video client invocations and the other to command servant invocations? My understanding is that two communicators would force two connections to the default router, whereas two sessions created on the same router with the same communicator may not... A similar workaround is to create two adapters on the relay (diff ports, etc.) to force two connections on the client side - one for video, one for command invocations - to ensure two threads of execution.

    3. Your VideoStreamerClient invocations are twoway, and unless you're using AMI, you will block until you get an int result and/or exceptions back. You might consider using AMI or making these oneway if that doesn't break semantics, since that single thread on the client side is being blocked, unable to process Qwerk commands in the meantime... Potential scalability issue (?)

    HTH.
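    One note on the oneway idea: as far as I can tell, an operation can only be invoked oneway if it has a void return type and no out-parameters, so newFrame's int return would have to go first. A hypothetical oneway-friendly variant of the Slice (just a sketch, not what's currently running):

```
interface VideoStreamerClient
   {
   // oneway-invocable: void return, no out-parameters; note that any
   // exception raised here would never reach the robot anyway
   void newFrame(Image frame);
   };
```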
  • marc (Florida)
    bartley wrote:
    I read something in the Ice docs about possibly opening up another connection, and tried it in the robot simulator, but didn't notice any improvement. Maybe I didn't do it right though--all I did was have the robot code call ice_connectionId("video") on the client proxy it receives from the relay and use that new proxy solely for pushing the video frames. I was assuming video would use that new connection, while commands from the client to the robot would use the original connection that the robot made when it first connected to the relay (and registered its proxy, etc.)

    If I understand correctly, your robot calls back to the client (indirectly, using both the relay and glacier), using the client proxy. Do you use bi-directional connections from your relay to the robot? If not, then the callback will use a separate connection already, so requesting a new connection explicitly makes no difference.

    I recommend running your robot code with both request tracing and connection tracing to see what connections are established, and what requests are sent, and when they are replied to. Use Ice.Trace.Protocol=1 and Ice.Trace.Network=3 (which will give you a lot of information to analyze). You could do the same for your Java robot simulator, which uses the thread pool, and which, as you say, works fine. From the differences between the logs you should be able to find out where the problem lies.
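    In config-file terms, these are the same two properties the robot's config currently sets to zero:

```
Ice.Trace.Network=3
Ice.Trace.Protocol=1
```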
  • Marc wrote:
    Connections from the client to Glacier2 are always bi-directional. Also, there is only one single connection from the client to Glacier2.

    Ok, good, that's what I thought. Thanks.

    Joe wrote:
    From the Ice documentation (3.0.1 p779) thread-per-connection: "the thread for an outgoing connection processes replies and, if the connection is bidirectional, dispatches incoming requests". Your bidir connection to G2 and the relay is therefore a single thread.

    Right. This is what I was attempting to get at in my post above. I just wasn't sure if it was true.

    Joe wrote:
    If I haven't misunderstood this, then long-running operations may be part of the problem... e.g. cameraTiltUp or driveForward may be measured in the hundreds of ms due to synchronous commands to servo motors and so prevent outbound video invocations on the same thread. To prove this, don't do any work in the Qwerk command functions - simply return immediately - and see if this reduces the severity of the problem.

    Long-running operations are definitely a problem, but even lots of relatively quick operations bog things down. I proved it in the opposite way, actually--we noticed the slowdown with the real robot, but not with the simulated one because although the simulated robot sends (fake) video, the response to servo commands was initially nothing more than a log statement. Putting a sleep in there of even 10 ms is enough to slow things down so that the video becomes (slightly) choppy.
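    A back-of-envelope model (my own made-up numbers, purely illustrative) of why even 10 ms per command matters when one thread does everything:

```java
public class FrameBudget
   {
   // Nominal frame rate scaled by the fraction of each second the single
   // dispatch thread is NOT busy servicing commands.
   public static double effectiveFps(final double nominalFps, final double commandCostMs, final int commandsPerSecond)
      {
      final double busyMs = commandCostMs * commandsPerSecond;
      final double idleFraction = Math.max(0, 1000 - busyMs) / 1000;
      return nominalFps * idleFraction;
      }

   public static void main(final String[] args)
      {
      // 20 fps nominal, 10 ms per command, 15 commands/sec from a fast typist
      System.out.printf("%.1f%n", effectiveFps(20, 10, 15));   // 17.0: slightly choppy
      // long-running servo commands (say 100 ms) are far worse
      System.out.printf("%.1f%n", effectiveFps(20, 100, 9));   // 2.0: video crawls
      }
   }
```

    And this model is optimistic: it ignores queueing, so a sustained burst of commands can starve the video (and the session pings) completely, which matches what we see.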

    Joe wrote:
    From what you've said, unless you have nested calls between commands and video streaming on client or server side, then its not really a deadlock issue and purely a thread contention issue.

    Yes, exactly.

    Joe wrote:
    Have you tried using two communicators - one dedicated to video client invocations and the other to command servant invocations? My understanding is that two communicators would force two connections to the default router, whereas two sessions created on the same router with the same communicator may not... A similar workaround is to create two adapters on the relay (diff ports, etc.) to force two connections on the client side - one for video, one for command invocations - to ensure two threads of execution.

    No, I haven't tried either of those options yet. Thanks for the suggestions! I was kinda waiting for the opinions of the folks on the forum--I figured I'm just doing something wrong/stupid (and maybe I am).

    Joe wrote:
    Your VideoStreamerClient invocations are twoway, and unless you're using AMI, you will block until you get an int result and/or exceptions back. You might consider using AMI...

    Alas, AMI isn't supported in Ice-E, so that's not an option. I've wondered about possibly compiling Ice for our robot, but a couple teammates said they've tried without success (something about our limited environment not supporting Unicode characters, I think?). The robot runs Linux on an ARM9 chip, and we only have about 3.5 megs of free space for our executable. I have no idea whether it's worth trying harder, or if the 3.5 megs is too small to even bother. Those things are probably best left for another thread, though.

    Marc wrote:
    If I understand correctly, your robot calls back to the client (indirectly, using both the relay and glacier), using the client proxy.

    Yes. I should clarify, though, that neither the client nor the robot ever communicates directly with the relay--they always go through Glacier2.

    Marc wrote:
    Do you use bi-directional connections from your relay to the robot?

    Yes, since Glacier2 always sits between the relay and the robot. The architecture is like this:
    |-------|               |-----------|               |--------|
    |       |               |  Glacier  |               |        |
    | Robot | <-----------> | |-------| | <-----------> | Client |
    |       |               | |       | |               |        |
    |-------|               | | Relay | |               |--------|
                            | |       | |
                            | |-------| |
                            |           |
                            |-----------|
    

    The relay just blindly passes messages back and forth between the client and robot using Dynamic Ice (as Mark described in his Connections article).

    Marc wrote:
    I recommend to run your robot code with both request tracing and connection tracing to see what connections are established, and what requests are sent, and when they are replied to.

    I'll give that a try, thanks.

    So maybe the question I should be asking (the one I couldn't seem to come up with when I started this thread) is, "Is a client using the thread-per-connection model talking to a server through Glacier2 always limited to simplex behavior? If not, what's the best/most appropriate way to make the communication duplex?"

    thanks for all your help (from all of you),
    best,

    chris
  • marc
    marc Florida
    Can you clarify your diagram? You show the relay within Glacier2. However, it can only be either behind or before Glacier2. Where exactly is it? Also, is it the robot that establishes a session with Glacier2? In this case, from a Glacier2 perspective, the robot is the client, and it would use bi-directional connections. Otherwise, if it does not establish the connection, then it's on Glacier2's server side, and there are no bi-directional connections.

    Client <--a--> A <--b--> B <--c--> Robot

    Is A the Relay and B Glacier2 or vice versa?

    Does the Robot establish the session with Glacier2, or does the Client?
  • Can you clarify your diagram? You show the relay within Glacier2. However, it can only be either behind or before Glacier2. Where exactly is it?

    Ha, I was afraid of that when I drew it. By putting it inside, all I was attempting to show is that there's no way to talk directly with the relay. That is, ALL communication goes through Glacier2. So, yes, the relay sits behind Glacier2.
    Also, is it the robot that establishes a session with Glacier2?

    Yes.
    In this case, from a Glacier2 perspective, the robot is the client, and it would use bi-directional connections.

    Agreed.
    Client <--a--> A <--b--> B <--c--> Robot

    Is A the Relay and B Glacier2 or vice versa?

    Neither. See the drawing below.
    Does the Robot establish the session with Glacier2, or does the Client?

    They both do. I described this in the bulleted list in the original post, but I can see that it's not very obvious.

    This drawing might be clearer...
    |-------|               |-----------|               |--------|
    |       |               |           |               |        |
    | Robot | <-----------> |           | <-----------> | Client |
    |       |               |  Glacier  |               |        |
    |-------|               |           |               |--------|
                            |           |
                            |-----------|
                                ^   |
                                |   |
                                |   |
                                |   v
                              |-------|
                              |       |
                              | Relay |
                              |       |
                              |-------|
    

    Note: I drew two arrows between Glacier2 and the relay (instead of a single, double-ended one) simply because that's how it's shown in the Ice manual in section 40.4.1 "Bidirectional Connections", p. 1263

    thanks,

    chris
  • marc
    marc Florida
    OK, I think I understand it better now. So both your Robot and your "Client" are clients from a Glacier2 perspective. Your Relay is the server from a Glacier2 perspective. This means that the connections from both your Robot and your Client are bi-directional, and the connections from/to your Relay are just regular connections. Correct?

    Further, your Robot sends video streams to the Relay (using a separate thread, which you create), which then sends them to the Client. These are interrupted, or slowed down, if your Robot gets some other requests to process. Again, correct me if I'm not understanding the scenario correctly.

    If the above is correct, tell me more about these other requests, and about how you send the video. Do you use oneway or twoway calls for video? Are these other requests oneway or twoway? Do they take long to process?
  • OK, I think I understand it better now. So both your Robot and your "Client" are clients from a Glacier2 perspective. Your Relay is the server from a Glacier2 perspective. This means that the connections from both your Robot and your Client are bi-directional, and the connections from/to your Relay are just regular connections. Correct?

    Yes, exactly.
    Further, your Robot sends video streams to the Relay (using a separate thread, which you create), which then sends them to the Client.

    Yes. Well, I guess. :o That is, to send the video, the robot's video thread is merely calling a method (the newFrame() method in the VideoStreamerClient interface -- see the slice code I included in the original post) on the client proxy object that it (the robot) received from the relay. The robot received the client proxy from the relay (via the peerConnected() method defined in the PeerConnectionEventHandler interface) when the client initiated the peer connection (by calling the connectToPeer() method defined in the UserSession interface).

    So, once it has the client proxy, the robot doesn't really know that there's a relay involved (the same is true for the client). As far as it's concerned, it magically gets a proxy to the client and then starts calling newFrame() on it (after the client tells it to start doing so by calling startVideoStream()).

    I just want to make it clear that there's no special video code or anything in the relay. The relay is totally unaware of what messages are being sent back and forth between robot and client. That's the beauty of Dynamic Ice here--I can change the interface common to the robot and client without ever needing to change anything in the relay (btw, thanks for that...that's an awesome feature and one of the many reasons we love Ice).
    These are interrupted, or slowed down, if your Robot gets some other requests to process.

    Yes, exactly.
    If the above is correct, tell me more about these other requests, and about how you send the video.

    Hopefully my description above of the video is sufficient. Oh, I should probably add that the video is merely a series of JPEGs. We're getting up to 20 frames/sec here in our lab, so it's fast enough that it appears to be video.

    The other commands, those from the client to the robot, are, for now, just simple drive commands and camera movement commands (again, see the slice code--look in particular at the 9 methods in the Qwerk interface). Don't get too caught up in what the other commands actually do, though. They can really be just about anything (again, due to the freedom that Dynamic Ice buys us--we can change the interface in the Slice definition without the relay knowing or caring). It's probably sufficient to know that they cause the robot to perform some action, and that action may take a few milliseconds to execute. They'll block while the robot is executing them (we're not calling them with AMI...we haven't identified too many cases where it's useful for the client to invoke commands on the robot with AMI).

    I should add that, from the relay's perspective, there's nothing different about commands that the client is calling on the robot vs the newFrame() method that the robot uses to deliver video frames to the client. They're all just commands which the relay shuffles back and forth with Dynamic Ice. The newFrame() method just happens to be called more often (up to 20 times per second).
    Do you use oneway or twoway calls for video?

    All calls are two-way. This is because I didn't want the relay to impose any restrictions on what kinds of calls the robot and client could make on each other. That is, I didn't want to enforce that the robot could only ever make one-way calls on the client. I thought about some way to determine on-the-fly whether a call should be one- or two-way, but Mark convinced me that it isn't possible (http://www.zeroc.com/vbulletin/showthread.php?p=9057#post9057).

    I enforce the "all calls are two way" rule when the client or robot registers its servant proxy with the relay (via the registerCallbacks() method defined in the PeerRegistrationHandler interface). The relay calls ice_newContext() on the proxy (passing in a map with the "_fwd" key set to "z") and then caches the new proxy returned by ice_newContext(). This proxy is cached by the relay in a map (which maps identities to proxies) and used by AsynchronousBlobjectServant (see code in my 2nd post) to get the proper target proxy for the Dynamic Ice stuff (I believe this is similar to how Mark did things in his Connections article).
    Are these other requests oneway or twoway? Do they take long to process?

    I probably already answered these in this post, but, again, all requests are twoway. As for how long they take to process, it's hard to say--partially because I don't know the actual execution time (10 ms or so?), but also because network latency plays a role since the calls block until completion. What's more, since it's possible that one of our users may decide to add methods to the client or robot slice definitions (which is fine since the relay doesn't care--all that matters is that the robot and client are coded against the same slice interfaces), they may end up doing something stupid and accidentally writing a method that takes seconds to return. We'd like our architecture to handle that as gracefully as possible--just because a call from, say, the client to the robot may take a long time shouldn't mean that the robot isn't still able to send video or make calls on the client.

    thanks again for your help (and patience),

    chris
  • marc
    marc Florida
    I believe the problem is that your video calls are twoway. With bi-directional connections and thread-per-connection, there is only one thread to both dispatch the requests that your robot receives, and to handle the replies for the calls that send the video. This means that your video calls get blocked by the other requests. Try to make them oneway. Note that this is only relevant for how the robot sends the video to Glacier2. You can still have Glacier2 forward them as twoway if you like (although I don't see any advantage in doing so).
    I believe the problem is that your video calls are twoway. With bi-directional connections and thread-per-connection, there is only one thread to both dispatch the requests that your robot receives, and to handle the replies for the calls that send the video. This means that your video calls get blocked by the other requests. Try to make them oneway.

    That fixes it. That is, the video never appears delayed anymore in the client. But, if you bang away on the keys in the client (sending lots of drive commands in a short amount of time), enough commands queue up (presumably since Ice is transparently delaying commands, as it does in the thread pool model when no more threads are available in the pool?) that the relay starts throwing timeout exceptions back to the client (though the commands do eventually get executed on the robot).

    So, that's definitely an improvement, thanks. However, it's not really an ideal, long-term solution for our architecture, since it's conceivable that the robot might need to make repeating, two-way calls on the client at regular intervals (i.e. where it really does need a return value).

    Is there no other way to make a thread-per-connection client talk to Glacier2 in a duplex kinda way? Would Joe's suggestion of two communicators work? Maybe one for incoming traffic (calls from the GUI client and responses from it) and one for outgoing traffic (calls to the GUI client and responses to it)? Any other possibilities?

    thanks,

    chris
  • marc
    marc Florida
    bartley wrote:
    That fixes it. That is, the video never appears delayed anymore in the client. But, if you bang away on the keys in the client (sending lots of drive commands in a short amount of time), enough commands queue up (presumably since Ice is transparently delaying commands, as it does in the thread pool model when no more threads are available in the pool?) that the relay starts throwing timeout exceptions back to the client (though the commands do eventually get executed on the robot).

    I'm not sure what you mean with "transparently delaying commands". Do you mean that the only dispatch thread is busy? If so, then this is expected for the thread-per-connection model, since your server can only dispatch one request at a time (since there is only one thread with one connection).

    You can also configure Glacier2 to delay requests, but this is purely a configuration choice to avoid having rogue clients flood your server (and to allow for oneway batching--see the Ice manual for details).
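
    In case it helps other readers: that buffering is controlled by Glacier2 configuration properties. A minimal fragment (the values here are purely illustrative, not recommendations -- see the Glacier2 chapter of the Ice manual for the details and defaults):

```
# Forward client->server requests from an internal queue serviced by
# a separate Glacier2 thread, instead of forwarding synchronously.
Glacier2.Client.Buffered=1

# Sleep this many milliseconds between passes over the queue, so
# queued oneway requests can accumulate and be flushed together.
Glacier2.Client.SleepTime=50
```
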
    bartley wrote:
    So, that's definitely an improvement, thanks. However, it's not really an ideal, long-term solution for our architecture, since it's conceivable that the robot might need to make repeating, two-way calls on the client at regular intervals (i.e. where it really does need a return value).

    Is there no other way to make a thread-per-connection client talk to Glacier2 in a duplex kinda way? Would Joe's suggestion of two communicators work? Maybe one for incoming traffic (calls from the GUI client and responses from it) and one for outgoing traffic (calls to the GUI client and responses to it)? Any other possibilities?

    No, Glacier2 requires bi-directional connections, otherwise it wouldn't be any good for firewall traversal.

    Multiple communicators with multiple sessions (one per communicator) would work. You could also establish multiple sessions with a single communicator, but the setup for this is probably more complex than just using multiple communicators.
  • I'm not sure what you mean with "transparently delaying commands". Do you mean that the only dispatch thread is busy? If so, then this is expected for the thread-per-connection model, since your server can only dispatch one request at a time (since there is only one thread with one connection).

    Yeah, sorry, that's what I meant. By questioning it, I was just trying to show that it's likely I don't know what the heck I'm talking about and wasn't entirely sure that that's where the buffering is occurring. :o I stole the "transparently delaying commands" phrase from the Ice manual's discussion of the thread pool model (p. 776):
    "If a thread pool is exhausted because all threads are currently dispatching a request, additional incoming requests are transparently delayed until a request completes and relinquishes its thread"

    I was just wondering if the buffering is happening for the same reason. Sounds like it is, thanks.
    Multiple communicators with multiple sessions (one per communicator) would work. You could also establish multiple sessions with a single communicator, but the setup for this is probably more complex than just using multiple communicators.

    Ok, thanks. Both sound more complicated than what we need for now. But good to know that it's an option.

    Thanks again for all your help. And sorry if I didn't sound appropriately appreciative in my last post. Your one-way proxy solution for the video really is a huge improvement over what we had! Many thanks! :D

    best,

    chris