Archived

This forum has been archived. Please start a new discussion on GitHub.

Response thread

Just wanted to confirm.

Without AMI and/or AMD, do responses to requests always go back to the same thread that issued the request?

Thanks
Budyanto

Comments

  • The reason I'm asking is I seem to have hit a deadlock issue that I can't figure out. It happens intermittently.

    Here's the log on client and server with Trace.Protocol and Trace.Network turned on.
    Client:
    ==========================
        [junit] 2010-05-02 16:31:20,040 INFO [main] ETicketTest - =================== testPlayOneRound_forward ====================
        [junit] -- 5/2/10 16:31:20:043 Protocol: sending request
        [junit]    message type = 0 (request)
        [junit]    compression status = 0 (not compressed; do not compress response, if any)
        [junit]    message size = 166
        [junit]    request id = 2
        [junit]    identity = lucketicket
        [junit]    facet =
        [junit]    operation = register
        [junit]    mode = 0 (normal)
        [junit]    context =
        [junit] -- 5/2/10 16:31:20:151 Protocol: received reply
        [junit]    message type = 2 (reply)
        [junit]    compression status = 0 (not compressed; do not compress response, if any)
        [junit]    message size = 118
        [junit]    request id = 2
        [junit]    reply status = 0 (ok)
        [junit] -- 5/2/10 16:31:20:169 Protocol: sending request
        [junit]    message type = 0 (request)
        [junit]    compression status = 0 (not compressed; do not compress response, if any)
        [junit]    message size = 120
        [junit]    request id = 3
        [junit]    identity = lucketicket/machine178f2d8f5-7207-4fcd-b2d3-c1a14681749f
        [junit]    facet =
        [junit]    operation = clientToServer
        [junit]    mode = 0 (normal)
        [junit]    context =
        [junit] -- 5/2/10 16:31:20:173 Protocol: received reply
        [junit]    message type = 2 (reply)
        [junit]    compression status = 0 (not compressed; do not compress response, if any)
        [junit]    message size = 25
        [junit]    request id = 3
        [junit]    reply status = 0 (ok)
        [junit] -- 5/2/10 16:31:20:200 Network: accepted tcp connection
        [junit]    local address = 10.11.22.165:35264
        [junit]    remote address = 10.11.22.165:35298
        [junit] -- 5/2/10 16:31:20:201 Protocol: sending validate connection
        [junit]    message type = 3 (validate connection)
        [junit]    compression status = 0 (not compressed; do not compress response, if any)
        [junit]    message size = 14
        [junit] -- 5/2/10 16:31:20:202 Protocol: received request
        [junit]    message type = 0 (request)
        [junit]    compression status = 0 (not compressed; do not compress response, if any)
        [junit]    message size = 141
        [junit]    request id = 1
        [junit]    identity = 087841f3-1a69-4620-adb1-a8dcb2ab84db
        [junit]    facet =
        [junit]    operation = serverToClient
        [junit]    mode = 0 (normal)
        [junit]    context =
        [junit] 2010-05-02 16:31:20,209 INFO [Ice.ThreadPool.Server-0] ETicketTest$GameCallback - Callback from server: {"method":"s2c_reset","numbers":[50,50,50,100,25,25,100,50]}
        [junit] 2010-05-02 16:31:20,212 INFO [Ice.ThreadPool.Server-0] ETicketTest$GameCallback - received reset method
        [junit] -- 5/2/10 16:31:20:213 Protocol: sending request
        [junit]    message type = 0 (request)
        [junit]    compression status = 0 (not compressed; do not compress response, if any)
        [junit]    message size = 125
        [junit]    request id = 4
        [junit]    identity = lucketicket/machine178f2d8f5-7207-4fcd-b2d3-c1a14681749f
        [junit]    facet =
        [junit]    operation = clientToServer
        [junit]    mode = 0 (normal)
        [junit]    context =
    
    
    
    
    Server:
    ========================================================
    2010-05-02 16:31:20,198 DEBUG [JavaGameServer-lucketicket-1-GameService-lucketicket-Ice.ThreadPool.Server-0] com.wms.services.servant.GameServiceI-lucketicket - callback proxy: 087841f3-1a69-4620-adb1-a8dcb2ab84db -t:tcp -h 10.11.22.165 -p 35264
    -- 5/2/10 16:31:20:198 JavaGameServer-lucketicket-1-GameService-lucketicket: Network: trying to establish tcp connection to 10.11.22.165:35264
    -- 5/2/10 16:31:20:199 JavaGameServer-lucketicket-1-GameService-lucketicket: Network: tcp connection established
       local address = 10.11.22.165:35298
       remote address = 10.11.22.165:35264
    -- 5/2/10 16:31:20:202 JavaGameServer-lucketicket-1-GameService-lucketicket: Protocol: received validate connection
       message type = 3 (validate connection)
       compression status = 0 (not compressed; do not compress response, if any)
       message size = 14
    -- 5/2/10 16:31:20:202 JavaGameServer-lucketicket-1-GameService-lucketicket: Protocol: sending request
       message type = 0 (request)
       compression status = 0 (not compressed; do not compress response, if any)
       message size = 141
       request id = 1
       identity = 087841f3-1a69-4620-adb1-a8dcb2ab84db
       facet =
       operation = serverToClient
       mode = 0 (normal)
       context =
    -- 5/2/10 16:32:42:507 JavaGameServer-lucketicket-1-GameService-lucketicket: Network: closing tcp connection
       local address = 10.11.22.165:35298
       remote address = 10.11.22.165:35264
    -! 5/2/10 16:32:42:517 JavaGameServer-lucketicket-1-GameService-lucketicket: warning: JavaGameServer-lucketicket-1-GameService-lucketicket-Ice.ThreadPool.Server-0: dispatch exception:
       identity: lucketicket/machine178f2d8f5-7207-4fcd-b2d3-c1a14681749f
       facet:
       operation: clientToServer
       IceInternal/Outgoing.java:147:in `invoke': Ice.ConnectionLostException: null (NativeException)
            from com/wms/services/contract/_GameSessionCallbackDelM.java:41:in `serverToClient'
            from com/wms/services/contract/GameSessionCallbackPrxHelper.java:52:in `serverToClient'
            from com/wms/services/contract/GameSessionCallbackPrxHelper.java:28:in `serverToClient'
            from com/wms/services/servant/GameServiceI.java:130:in `serverToClient'
            from file:/sandboxes/webplatform/build/ice/deployment/hanode1/servers/JavaGameServer-lucketicket-1/distrib/jars/jrubygems.jar!/gems/json_pure-1.2.0/lib/json/pure/generator.rb:339:in `register_app_server'
    


    This is what happens when the hang occurs.

    The server is calling serverToClient on the client's callback proxy. The client gets the message on its server thread pool and then calls clientToServer on the server proxy. I can see in the log that the clientToServer request was sent out, but somehow it never reached the server.

    The server has a client thread pool size of 5 and a server thread pool size of 10.
    The client has 1 client thread and 2 server threads.

    I started seeing this problem when I moved the server to a VMware image running CentOS 5.4 (32-bit). The host is CentOS 5.4 (64-bit).

    Thanks
    Budyanto
  • benoit
    benoit Rennes, France
    Hi,

    Note that the client thread pool is only used to dispatch callbacks from AMI requests or requests over bi-directional connections. If you don't use bi-directional connections or AMI, you don't need to increase the Ice client thread pool size; a single thread is enough.

    To answer your question: if you don't use AMI, the thread invoking on a twoway proxy blocks until the response is received. The response is read by a thread from the Ice client thread pool, which then notifies the blocked thread that the response has arrived.

    I suspect the deadlock you are observing is caused by nested calls combined with a limited server thread pool size in the client or the server. The best way to figure this out is to dump the stack traces of each process and see whether the Ice thread pool threads are all waiting for invocations to return.

    For example, if the 5 threads of your server are all invoking callbacks on clients, and these client callbacks in turn invoke on the server, it will deadlock because no threads remain in the server thread pool to dispatch the client invocations. There are several options to solve such deadlocks: use AMD/AMI, or increase the server thread pool size (unless you can determine the maximum number of concurrent nested invocations, you will need to set a high SizeMax to be sure deadlocks can never occur in practice).
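    For reference, the second option above can be sketched as an Ice properties fragment. The property names are standard; the values are illustrative only and would need tuning for the application's actual maximum nesting depth:

    ```
    # Illustrative sizing -- tune SizeMax to your maximum nesting depth.
    Ice.ThreadPool.Server.Size=5        # initial/minimum number of threads
    Ice.ThreadPool.Server.SizeMax=50    # pool may grow up to 50 threads
    Ice.ThreadPool.Server.SizeWarn=40   # warn when 40 threads are in use
    ```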

    I recommend checking out the section about threading in the Ice manual for more information on how the thread pool works, see here. The Ice/nested demo also demonstrates how nesting works and how it's affected by the server thread pool size.

    Cheers,
    Benoit.
  • I tried upping the server thread pool on both client and server to 20. It still gets stuck at exactly the same point.

    If it were indeed an issue with the server thread pool running out of threads, wouldn't there be some sort of warning in the logs saying that the thread pool is running low on threads? I remember seeing those before.
  • benoit
    benoit Rennes, France
    Yes, there should be some warnings indicating that the thread pool is running low on threads if thread pool warnings are enabled.

    Thread pool warnings are now disabled by default with Ice 3.4.0. With older versions, warnings were enabled by default. Which version do you use? If you use 3.4.0, you can enable thread pool warnings with Ice.ThreadPool.Server.SizeWarn=15 for example.

    In any case, the best way to investigate deadlocks is to take a look at the thread dump of the JVM, so I recommend examining it when the deadlock occurs. You can also post the dumps here on the forums if you want; I'll be happy to take a look at them.

    Cheers,
    Benoit.
  • I'm using Ice 3.4.0.

    I'll try to get a dump. How do I get a dump on the server side? The server is an IceBox service. I tried sending kill -3 to the IceBox server, but I don't get anything in stdout or stderr for the IceBox.

    Budyanto
  • Here's the client dump (attached).
  • benoit
    benoit Rennes, France
    Hmm, the kill -3 <pid> (or kill -SIGQUIT <pid>) command should in theory dump the JVM threads on stderr or stdout (on Unix at least). How do you start the IceBox server? Which OS and JDK are you using? Do you by any chance redirect the IceBox server output to files using the Ice.StdErr/Ice.StdOut configuration properties?

    Cheers,
    Benoit.
  • IceBox is started from IceGrid.

    I'm on CentOS 5.4 with Java(TM) SE Runtime Environment (build 1.6.0_18-b07).
  • benoit
    benoit Rennes, France
    The client thread dump indicates that the main thread is waiting on a monitor at ETicketTest.java line 125 and that there's one thread from the Ice server thread pool waiting for the response of a remote call: the clientToServer call at ETicketTest.java line 154.

    I guess the server is calling on the client here and in turn the client calls back again on the server, but for some reason the call on the server hangs. The stack traces of the server JVM should give us more information on the possible reasons for this hang.

    If IceBox is started through IceGrid, the IceBox server output should go to the console where IceGrid is started. If you start IceGrid as a Unix daemon using an init.d script, do you configure any logging in the IceGrid node configuration file?

    If not, you will need to if you want to see the servers' stderr/stdout output. You have two options. The first is to redirect the IceGrid node stderr/stdout to files; that's also where the servers started by IceGrid will print their stdout/stderr output. For this option, add the following properties to your IceGrid node configuration file:
    #
    # Redirect IceGrid stderr/stdout to files.
    # 
    Ice.StdErr=/home/game/logs/stderr.txt
    Ice.StdOut=/home/game/logs/stdout.txt
    

    The second option is to redirect each server output to separate files by adding the following properties into the IceGrid node configuration file:
    #
    # Redirect the servers' stdout and stderr to files in this directory:
    #
    IceGrid.Node.Output=/home/game/logs
    #IceGrid.Node.RedirectErrToOut=1
    

    Let me know if you need more information on this. Once you redirect the server or node output to a file, sending the SIGQUIT signal to the IceBox Java server should dump the threads into the file. You should also be able to access the content of these files with the IceGridGUI or icegridadmin tools (using either the "node show" or "server show" commands).

    Cheers,
    Benoit.
  • Hmmm,

    I'm already using the second approach you described. I can see the output from setting the Ice Trace properties. But kill -3 on the pid of the Java IceBox server is not giving me the dump (not in the err or out file).
  • OK, I can get the server dump using jstack. I'm just trying to repro the hang at the moment.
  • Here are both the client and server dump, from the same run (attached).
  • So what should really be happening is as follows.
    Client                                      Server
    ------                                      ------
    clientToServer()  (main thread)             clientToServer_async()  (this is AMD)
                                                ice_response()
                                                serverToClient()
    serverToClient()  (server thread)           HANG...
    clientToServer()
       HANG...
    

    There are definitely server threads available for the server to process the second clientToServer.

    If I set the server thread pool of the IceBox service to, say, 10, but the IceBox itself only has 1, what happens in this case?
  • benoit
    benoit Rennes, France
    Hi,

    I suspect you are not configuring the right thread pool: the dump shows only one thread for the server thread pool of your IceBox service (the thread pool named "JavaGameServer-lucketicket-1-GameService-lucketicket-Ice.ThreadPool.Server").

    An IceBox server has its own communicator for administrative purposes. The IceBox server also creates a communicator for each service and this communicator is configured with the properties defined at the <service> scope in the IceGrid XML descriptor.

    Do you set the thread pool properties within the <service> XML element or within the <icebox> server element? You should configure them within the <service> element to configure the Ice communicator of your service.
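    As an illustration (the service name, entry point, and values below are placeholders, not taken from your deployment), the properties would go inside the <service> element of the IceGrid XML descriptor:

    ```xml
    <icebox id="JavaGameServer" exe="java" activation="on-demand">
      <service name="GameService" entry="com.example.GameServiceFactory">
        <!-- These properties apply to the service's own communicator. -->
        <property name="Ice.ThreadPool.Server.Size" value="5"/>
        <property name="Ice.ThreadPool.Server.SizeMax" value="10"/>
      </service>
    </icebox>
    ```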

    See here for more information on IceBox services and here for information on IceBox integration with IceGrid.

    Note that you can also use the IceGrid GUI or icegridadmin command line tool (with the "service properties" or "server properties" commands) to check the properties used by a server or IceBox service at runtime.

    Cheers,
    Benoit.
  • I set it for the service.

    Here's a screenshot of IceGrid GUI (attached).

    Interestingly enough, for the cases when it didn't hang, I do see multiple "JavaGameServer-lucketicket-1-GameService-lucketicket-Ice.ThreadPool.Server" threads in the dump. So it seems like sometimes the communicator doesn't get set up with the right pool size, even though the IceGrid GUI shows that it does?

    Could this be because it is running on a VMware virtual machine?
  • I have another server dump that shows that two threads are in the server pool (attached). In this case the system still hangs.

    What I'm seeing is that even though I set the size to, say, 5, I don't consistently get 5 threads in the dump.
  • benoit
    benoit Rennes, France
    This latest thread dump is definitely suspicious. Your configuration looks fine. The thread dump shows 2 threads for your service server thread pool: one is waiting on the serverToClient remote call and the other is idle waiting on the thread pool monitor.

    I don't see any threads waiting on the thread pool selector (which monitors Ice connections and eventually reads data from them). This is wrong -- there should always be such a thread unless no more threads are available in the thread pool (which isn't the case here).

    So something must have happened to this thread.

    Do you see any errors in the stderr output of the server? You should see thread dumps of the JVM when sending the SIGQUIT signal; perhaps something is not working with the standard error output? Do you use any logging facilities such as log4j which could interfere with the Ice stderr/stdout redirection (there used to be some issues with log4j, see this thread for example; I'm not sure this is still relevant though)?

    As a first step, you could try running the icegridnode from the command line and disabling the stderr/stdout file redirection. The stderr/stdout output of servers should then go to the console and you should see thread dumps of Java servers when you send them the SIGQUIT signal. If this works, you could try to reproduce the deadlock with this configuration and see if any errors show up.

    Cheers,
    Benoit.
  • I am using log4j. Once I turn off redirection, I can get the dump by sending kill -3 to the pid.

    I have full output from the node now. I don't really see any errors, but I do see messages like the following, showing the pool size shrinking until it's at 1:
    -- 5/3/10 16:59:23:246 JavaGameServer-lucketicket-1-GameService-lucketicket: ThreadPool: shrinking Ice.ThreadPool.Server: Size=4
    
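    (For context, and assuming default settings: with Ice.ThreadPool.Server.ThreadIdleTime > 0, the thread pool reaps idle threads over time, but it should never shrink below the configured Ice.ThreadPool.Server.Size -- so shrinking all the way to 1 would suggest the effective Size is 1. A sketch of the relevant properties, with illustrative values:)

    ```
    # Illustrative values; property names from the Ice manual.
    Ice.ThreadPool.Server.Size=5            # pool should not shrink below this
    Ice.ThreadPool.Server.SizeMax=10        # and never grows above this
    Ice.ThreadPool.Server.ThreadIdleTime=60 # reap idle threads after 60 seconds
    ```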

    I can send you the full log via email. Is there an email I can send it to?

    Thanks
    Budyanto
  • benoit
    benoit Rennes, France
    You can send it to support@zeroc.com.

    Cheers,
    Benoit.