Archived

This forum has been archived. Please start a new discussion on GitHub.

Icegridnode: long waiting problem

Hi
Now I am testing my server program (it processes alarm records) under the icegridnode scheme. Four clients invoke its service; each client has 500 threads, and each thread sends only one alarm message to the server. I set ice_timeout(60000) for each invocation on the client side. I also set the following properties on the server side:
Ice.ACM.Server=10
Ice.ACM.Client=10
Ice.ThreadPool.Server.Size=100

Then I run my server and clients, and the server prints the number of records it has processed successfully. They work normally, but after the server has processed a little more than 1000 records (usually around 1016 or 1020), it stops processing incoming records and the clients wait as well. After a long time, about one minute or more, the server resumes and proceeds. In the end the server can process all the records, but only with this long wait once it has passed roughly 1000 of them. :o
I need to improve the server's performance. How can I do that? I need your advice; thank you in advance! :)

Comments

  • benoit
    benoit Rennes, France
    Hello,

    Before we can help you, could you please set your signature? See [thread=1697]this thread[/thread] for more information on how to set your signature.

    Thanks!

    Benoit.
  • I have set my signature. :)
  • benoit
    benoit Rennes, France
    You set 100 threads for the server thread pool -- why do you need so many threads? Did you try with just one thread (the default configuration)?

    The best way to figure out what the server is doing when it hangs would be to attach to the process with the debugger and get a stack trace of all the threads (I would reduce the number of server threads first, though). Btw, are you using Ice for C++, Java or C#?

    Cheers,
    Benoit.
  • I am using Ice for C++. I have tried with just one thread in the server's thread pool, but the result is the same; I still need to wait one minute or more.

    Btw, the 4 clients invoke the server's service synchronously; in other words, about 4 * 500 = 2000 threads invoke the server's service synchronously (of course, each thread sleeps a few seconds to relieve the pressure on the server), so I thought I needed more threads in the server. Maybe my perspective is wrong, since it doesn't solve the problem.
  • benoit
    benoit Rennes, France
    Whether or not you need many threads in the server thread pool really depends on what the server is doing. If the code executed by the server is always runnable (it's doing some computation, for example), then extra threads are not useful. Configuring more threads than the number of CPUs will only result in overhead, since the OS has to spend more time scheduling all these runnable threads. However, if the code eventually has to acquire shared resources and wait, it can be useful to configure more threads: while one thread is waiting to acquire a resource, another thread can dispatch another request.
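    As a sketch of how this tuning looks in a server configuration file (the values here are illustrative, not a recommendation):

```
# Server configuration sketch. The default thread pool size is 1,
# which is enough when the dispatched code is purely CPU-bound.
Ice.ThreadPool.Server.Size=1

# If servants block on shared resources (a database connection, a lock),
# a handful of threads lets other requests be dispatched in the meantime:
# Ice.ThreadPool.Server.Size=4
```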

    As for your problem where the server is hanging, I'm afraid it's difficult to say what the problem could be without more information. As I explained in my first post, the best would be to attach to the server process while it's hanging and check the stack traces of each thread. You could also try to write a small self-compilable test case that demonstrates the problem so that we can look at it.

    Cheers,
    Benoit.
  • Hello, Benoit, thank you for your advice!
    I ran my server program as a normal server, without using the icegrid scheme; it works very well and the problem I mentioned above doesn't exist.

    But after switching to the icegrid scheme, the problem occurred, and the server application also throws an exception during the long wait. So I don't think there is anything wrong in my server program itself.

    The exception is as follows:
    [PHP]E:\buffer\clustclient\Release\server1>icegridnode --Ice.Config=config --warn
    07/04/06 21:31:36.843 gyl: .\AlarmSrvI.cpp(154) Connect DB Success
    07/04/06 21:32:21.875 gyl: .\StatusThread.cpp(40) Query num:0
    07/04/06 21:32:27.156 gyl: .\StatusThread.cpp(40) Query num:0
    07/04/06 21:32:32.156 gyl: .\StatusThread.cpp(40) Query num:0
    07/04/06 21:32:37.156 gyl: .\StatusThread.cpp(40) Query num:0
    07/04/06 21:32:39.828 gyl: .\AlarmSrvI.cpp(259) Start
    07/04/06 21:32:39.843 gyl: .\AlarmSrvI.cpp(259) Start
    07/04/06 21:32:39.843 gyl: .\AlarmSrvI.cpp(259) Start
    07/04/06 21:32:39.875 gyl: .\AlarmSrvI.cpp(259) Start
    07/04/06 21:32:39.890 gyl: .\AlarmSrvI.cpp(259) Start
    07/04/06 21:32:42.187 gyl: .\StatusThread.cpp(40) Query num:66
    07/04/06 21:32:47.218 gyl: .\StatusThread.cpp(40) Query num:209
    07/04/06 21:32:52.250 gyl: .\StatusThread.cpp(40) Query num:347
    07/04/06 21:32:57.281 gyl: .\StatusThread.cpp(40) Query num:509
    07/04/06 21:33:02.296 gyl: .\StatusThread.cpp(40) Query num:628
    07/04/06 21:33:07.328 gyl: .\StatusThread.cpp(40) Query num:745
    07/04/06 21:33:12.359 gyl: .\StatusThread.cpp(40) Query num:886
    07/04/06 21:33:17.359 gyl: .\StatusThread.cpp(40) Query num:1000
    07/04/06 21:33:22.421 gyl: .\StatusThread.cpp(40) Query num:1011
    07/04/06 21:33:27.421 gyl: .\StatusThread.cpp(40) Query num:1011
    07/04/06 21:33:32.421 gyl: .\StatusThread.cpp(40) Query num:1013
    07/04/06 21:33:37.421 gyl: .\StatusThread.cpp(40) Query num:1013
    07/04/06 21:33:42.437 gyl: .\StatusThread.cpp(40) Query num:1013
    icegridnode: warning: unexpected observer exception:
    .\Outgoing.cpp:415: Ice::UnknownLocalException:
    unknown local exception:
    .\Network.cpp:705: Ice::SocketException:
    socket exception: WSAENOBUFS
    07/04/06 21:33:47.437 gyl: .\StatusThread.cpp(40) Query num:1013
    07/04/06 21:33:52.437 gyl: .\StatusThread.cpp(40) Query num:1013
    07/04/06 21:33:57.437 gyl: .\StatusThread.cpp(40) Query num:1013
    07/04/06 21:34:02.437 gyl: .\StatusThread.cpp(40) Query num:1013
    07/04/06 21:34:07.453 gyl: .\StatusThread.cpp(40) Query num:1013
    07/04/06 21:34:12.453 gyl: .\StatusThread.cpp(40) Query num:1013
    07/04/06 21:34:17.453 gyl: .\StatusThread.cpp(40) Query num:1013
    07/04/06 21:34:22.453 gyl: .\StatusThread.cpp(40) Query num:1013
    07/04/06 21:34:27.453 gyl: .\StatusThread.cpp(40) Query num:1013
    07/04/06 21:34:32.453 gyl: .\StatusThread.cpp(40) Query num:1013
    07/04/06 21:34:37.453 gyl: .\StatusThread.cpp(40) Query num:1013
    07/04/06 21:34:42.453 gyl: .\StatusThread.cpp(40) Query num:1013
    07/04/06 21:34:50.109 gyl: .\AlarmSrvI.cpp(154) Connect DB Success
    07/04/06 21:34:50.125 gyl: .\AlarmSrvI.cpp(259) Start
    07/04/06 21:34:50.125 gyl: .\AlarmSrvI.cpp(259) Start
    07/04/06 21:34:50.125 gyl: .\AlarmSrvI.cpp(259) Start
    07/04/06 21:34:50.125 gyl: .\AlarmSrvI.cpp(259) Start
    07/04/06 21:34:50.125 gyl: .\AlarmSrvI.cpp(259) Start
    07/04/06 21:34:50.265 gyl: .\AlarmSrvI.cpp(259) Start
    07/04/06 21:34:50.281 gyl: .\AlarmSrvI.cpp(259) Start
    07/04/06 21:34:50.296 gyl: .\AlarmSrvI.cpp(259) Start
    07/04/06 21:34:50.296 gyl: .\AlarmSrvI.cpp(259) Start
    07/04/06 21:34:50.296 gyl: .\AlarmSrvI.cpp(259) Start
    07/04/06 21:34:55.265 gyl: .\StatusThread.cpp(40) Query num:133
    07/04/06 21:35:00.265 gyl: .\StatusThread.cpp(40) Query num:250
    07/04/06 21:35:05.281 gyl: .\StatusThread.cpp(40) Query num:368
    07/04/06 21:35:10.281 gyl: .\StatusThread.cpp(40) Query num:486
    07/04/06 21:35:15.281 gyl: .\StatusThread.cpp(40) Query num:617
    07/04/06 21:35:20.296 gyl: .\StatusThread.cpp(40) Query num:737
    07/04/06 21:35:25.312 gyl: .\StatusThread.cpp(40) Query num:851
    07/04/06 21:35:30.312 gyl: .\StatusThread.cpp(40) Query num:980
    07/04/06 21:35:35.312 gyl: .\StatusThread.cpp(40) Query num:987[/PHP]

    The quoted section shows my problem clearly. In addition, I set the server's thread pool size to 10, so it may print several "Start" lines concurrently.
  • Sorry, I made a small mistake: the quoted section isn't PHP code, it's just the trace messages I print to the screen; the program is written in C++. :o :)
  • benoit
    benoit Rennes, France
    The exception below indicates that your system is out of resources. See here for an explanation.
    icegridnode: warning: unexpected observer exception:
    .\Outgoing.cpp:415: Ice::UnknownLocalException:
    unknown local exception:
    .\Network.cpp:705: Ice::SocketException:
    socket exception: WSAENOBUFS
    

    Are you running the client, server and icegridnode on the same machine? Could it be that your system is running out of memory? Which Windows version do you use? I would recommend closely watching the system resources (available memory, sockets, etc.) when running your test.

    Cheers,
    Benoit.
  • new clue

    Hello, Benoit
    Now I am running the client, server and icegridnode on different machines, but the problem still exists.

    Someone told me to make my server code simple and stupid to isolate the problem, and I followed his advice: now my server just receives alarm messages and increments a counter for each one it receives. The trace messages still print out how many messages it has received so far. The code is so simple that it actually does nothing but count the messages. But... :( the problem still exists.

    As I mentioned in post #6 above, if I don't use the IceGrid scheme my server works very well (same property settings, same machine, same clients, same OS; of course, I set the corresponding properties in the XML file rather than in the "config" file as usual). :)

    Right now, I have found a clue.
    If I don't set ice_timeout(60000) on the client side, then after the server has processed about 1020 records it stops processing and waits, and it never resumes.
    I hope this gives you more information about my problem. :) Thank you!
  • benoit
    benoit Rennes, France
    Hi,

    Sorry, but it's still impossible to say why your server is hanging without more information. Could you send us a small self-compilable test case which demonstrates the issue?

    Cheers,
    Benoit.
  • important clue

    Hello, I think I have found the code that causes the problem I mentioned above.

    I did many tests, and finally I found that the server doesn't actually hang during the wait; it's the client's fault. I found the code that causes the client to hang.
    The code below causes the waiting. In fact, the server and clients work very well until the server has processed a little more than 1000 records:
    // So many threads, so loop until this thread obtains a proxy successfully.
    while(true)
    {
        try
        {
            Sleep(10000); // Sleep for a moment to relieve the pressure.

            m_proxy = AlarmSrvPrx::checkedCast(
                pCommunicator->stringToProxy("alarmproxy")->ice_timeout(60000));
            break;
        }
        catch(const IceUtil::Exception&)
        {
            failNum++; // A counter for the number of failures.
        }
    }

    I don't think I can solve this problem by myself; can you give me some suggestions? Thank you very much. :)
  • benoit
    benoit Rennes, France
    Hi,

    This code could hang for many reasons:
    • The server hangs and can't process the checkedCast() call from the client.
    • The server is deactivated and the activation hangs (assuming the server is started on-demand).
    • For some reason, the IceGrid registry can't process any more requests.

    Again, I don't think we can solve your problem with so little information. The best would be to provide us with a small self-compilable example demonstrating the issue. If that's too complicated, you could first send us dumps of the stack traces of each component involved in the hang: the server, the IceGrid registry and the node.

    Also, are you still getting the Ice::SocketException (socket exception: WSAENOBUFS) exception from the IceGrid node? Did you figure out where the resource exhaustion is coming from?

    You could also try to enable some tracing on the client (with Ice.Trace.Network=2, Ice.Trace.Retry=1, Ice.Trace.Location=1, Ice.Trace.Protocol=1) to see what it's doing before the hang. Enabling some tracing on the IceGrid node and registry might also give us some clues (set the IceGrid.Node.Trace.Activator=2, IceGrid.Node.Trace.Server=2, IceGrid.Node.Trace.Adapter=2 properties on the node and IceGrid.Registry.Trace.Server=2 and IceGrid.Registry.Trace.Adapter=2 on the registry).
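    Collected into configuration files, the tracing suggested above would look roughly like this (one section per component; the property names are those listed in this post):

```
# Client config
Ice.Trace.Network=2
Ice.Trace.Retry=1
Ice.Trace.Location=1
Ice.Trace.Protocol=1

# IceGrid node config
IceGrid.Node.Trace.Activator=2
IceGrid.Node.Trace.Server=2
IceGrid.Node.Trace.Adapter=2

# IceGrid registry config
IceGrid.Registry.Trace.Server=2
IceGrid.Registry.Trace.Adapter=2
```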

    But again, to expedite this matter, the best would be to send us a small self-compilable example demonstrating the problem or some stack traces.

    Cheers,
    Benoit.
  • Hello, thank you for your useful advice.

    Though I haven't found the reason why the problem occurs, I have solved it.
    My clients inherit from the class Ice::Application, which has a default communicator that can be obtained by calling communicator(). Because each client needs 500 threads to send alarm messages, I created a communicator of my own, using the following code:
    int argc = 0;
    properties = Ice::createProperties();
    properties->load("config");
    pCommunicator = Ice::initializeWithProperties(argc, 0, properties);

    Now I have edited my code to use the default communicator, and the problem has disappeared. I am very happy, but I don't know why. :confused:

    In addition, I am embarrassed that I can't find a way to get a stack trace of all the threads, because there are so many of them. Can you recommend some useful tools or methods for this? :o I know this is outside the scope of Ice, but I really want to know.

    Thank you very much!
  • benoit
    benoit Rennes, France
    If you create one communicator per thread, each thread will have its own connection to the IceGrid registry and server. So if you run 4 clients, you'll need 4 * 500 = 2000 connections to the registry and 2000 connections to the server.

    The Ice server thread pool can't handle more than 1024 connections simultaneously. I suspect you're getting the hang when you hit this limit.

    By using a single communicator per client, each client establishes only one connection to the registry and one connection to the server, so you don't have the problem anymore.

    Examining 500 threads with the debugger would of course be a bit tedious. But in your case, the best would have been to examine the threads of the IceGrid registry and the server, there shouldn't be as many threads for these processes. To get the stack trace of each thread, we simply use the Microsoft Visual Studio debugger (on Windows) or gdb (on Linux) and attach to the running process.

    Cheers,
    Benoit.