Archived

This forum has been archived. Please start a new discussion on GitHub.

Help: AMI

Hi,

I am trying to create a benchmark like NetPIPE for MPI:
send a message x times, then increase the message size.
With this in mind, I want to compare synchronous, oneway, and asynchronous (AMI) invocations, using the throughput demo with some modifications.

But I need your help to create an equivalent using asynchronous (AMI) invocation, because the Ice documentation is not very helpful on this point.
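Roughly, the measurement loop I have in mind looks like this (just a sketch; `send` stands in for the actual Ice invocation, and the NetPIPE-style doubling range is my assumption):

```cpp
#include <chrono>
#include <cstdio>
#include <vector>

// NetPIPE-style driver: for each message size, send the payload
// `repetitions` times and report the average time per message.
// SendFn is a placeholder for the real invocation (e.g. sendByteSeq).
template <typename SendFn>
void runBenchmark(SendFn send, int repetitions)
{
    for (long size = 2; size <= 1048576; size *= 2) {
        std::vector<char> payload(size);
        auto start = std::chrono::steady_clock::now();
        for (int i = 0; i < repetitions; ++i) {
            send(payload);
        }
        auto stop = std::chrono::steady_clock::now();
        long micros = std::chrono::duration_cast<std::chrono::microseconds>(
                          stop - start).count() / repetitions;
        std::printf("%ld micro-secondes pour %ld octets\n", micros, size);
    }
}
```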

Thank you very much for your help. :o

Comments

  • benoit
    benoit Rennes, France
    Hi Mikael,

    Could you detail a little more what exactly you're trying to measure with AMI and the throughput test?

    For instance, do you want to send the "sendByteSeq" requests as fast as possible without waiting for the answer of each request? Or do you want to wait for the answer of each request before sending the next one?

    If that's the former, you just need to create an AMI callback object for each AMI invocation:
    	  throughput->sendByteSeq_async(new AMI_Throughput_sendByteSeqI(), byteSeq);
    

    Your implementation of the AMI callback is also not quite correct: sendByteSeq doesn't return anything, so ice_response shouldn't have any parameters. The signature of ice_response should be:
      virtual void ice_response()
      {
      }
    

    Cheers,
    Benoit.
  • I want to send requests as fast as possible without waiting for an answer.

    Your solution runs, but I expected better performance.
    It's better than synchronous invocation, but oneway invocation is faster than your solution, and I don't know why.

    My server and my client are running on the same machine; maybe that's why performance is not as good as expected.

    Can you give an explanation, please?

    Thanks a lot Benoit
    Cheers
    Mykael
  • benoit
    benoit Rennes, France
    An AMI invocation is a bit slower than a oneway invocation because of the following:
    • it requires creating a callback object on the heap
    • a response needs to be sent by the server
    • the response needs to be read by the client and passed to the AMI callback object.

    However, I wouldn't expect a big difference between oneway and AMI, especially when transferring large amounts of data (where most of the time is spent writing/reading the data on the network connection). Did you build Ice and your benchmark with optimization enabled? I tried it (with Ice 3.0.1) and I'm getting the following results:
    With AMI:
    ----------
    thores:~/src_ice$ ./client 
    89 micro-secondes pour 2 octets -> 0 Mo/s 
    99 micro-secondes pour 4 octets -> 0 Mo/s 
    98 micro-secondes pour 8 octets -> 0 Mo/s 
    99 micro-secondes pour 16 octets -> 0 Mo/s 
    98 micro-secondes pour 32 octets -> 0 Mo/s 
    99 micro-secondes pour 64 octets -> 0 Mo/s 
    99 micro-secondes pour 128 octets -> 1 Mo/s 
    102 micro-secondes pour 256 octets -> 2 Mo/s 
    103 micro-secondes pour 512 octets -> 4 Mo/s 
    104 micro-secondes pour 1024 octets -> 9 Mo/s 
    106 micro-secondes pour 2048 octets -> 19 Mo/s 
    109 micro-secondes pour 4096 octets -> 37 Mo/s 
    121 micro-secondes pour 8192 octets -> 67 Mo/s 
    165 micro-secondes pour 16384 octets -> 99 Mo/s 
    257 micro-secondes pour 32768 octets -> 127 Mo/s 
    551 micro-secondes pour 65536 octets -> 118 Mo/s 
    1616 micro-secondes pour 131072 octets -> 81 Mo/s 
    3339 micro-secondes pour 262144 octets -> 78 Mo/s 
    6308 micro-secondes pour 524288 octets -> 83 Mo/s 
    12085 micro-secondes pour 1048576 octets -> 86 Mo/s 
    
    With oneway:
    --------------
    thores:~/src_ice$ ./client 
    20 micro-secondes pour 2 octets -> 0 Mo/s 
    23 micro-secondes pour 4 octets -> 0 Mo/s 
    22 micro-secondes pour 8 octets -> 0 Mo/s 
    23 micro-secondes pour 16 octets -> 0 Mo/s 
    23 micro-secondes pour 32 octets -> 1 Mo/s 
    24 micro-secondes pour 64 octets -> 2 Mo/s 
    23 micro-secondes pour 128 octets -> 5 Mo/s 
    26 micro-secondes pour 256 octets -> 9 Mo/s 
    27 micro-secondes pour 512 octets -> 18 Mo/s 
    30 micro-secondes pour 1024 octets -> 34 Mo/s 
    35 micro-secondes pour 2048 octets -> 58 Mo/s 
    44 micro-secondes pour 4096 octets -> 93 Mo/s 
    62 micro-secondes pour 8192 octets -> 132 Mo/s 
    102 micro-secondes pour 16384 octets -> 160 Mo/s 
    190 micro-secondes pour 32768 octets -> 172 Mo/s 
    521 micro-secondes pour 65536 octets -> 125 Mo/s 
    1478 micro-secondes pour 131072 octets -> 88 Mo/s 
    3154 micro-secondes pour 262144 octets -> 83 Mo/s 
    5936 micro-secondes pour 524288 octets -> 88 Mo/s 
    11257 micro-secondes pour 1048576 octets -> 93 Mo/s 
    

    AMI is a bit slower but not by much. How do AMI invocations perform on your machine compared to oneway invocations?

    Cheers,
    Benoit.
  • Optimization?
    I built Ice 3.0.1 and my benchmark with the default options.

    I am getting the following results:
    With oneway:
    -------------
    34 micro-secondes pour 2 octets -> 0 Mo/s
    41 micro-secondes pour 4 octets -> 0 Mo/s
    42 micro-secondes pour 8 octets -> 0 Mo/s
    43 micro-secondes pour 16 octets -> 0 Mo/s
    43 micro-secondes pour 32 octets -> 0 Mo/s
    44 micro-secondes pour 64 octets -> 1 Mo/s
    44 micro-secondes pour 128 octets -> 2 Mo/s
    44 micro-secondes pour 256 octets -> 5 Mo/s
    45 micro-secondes pour 512 octets -> 11 Mo/s
    46 micro-secondes pour 1024 octets -> 22 Mo/s
    48 micro-secondes pour 2048 octets -> 42 Mo/s
    53 micro-secondes pour 4096 octets -> 77 Mo/s
    64 micro-secondes pour 8192 octets -> 128 Mo/s
    88 micro-secondes pour 16384 octets -> 186 Mo/s
    144 micro-secondes pour 32768 octets -> 227 Mo/s
    394 micro-secondes pour 65536 octets -> 166 Mo/s
    821 micro-secondes pour 131072 octets -> 159 Mo/s
    2016 micro-secondes pour 262144 octets -> 130 Mo/s
    3811 micro-secondes pour 524288 octets -> 137 Mo/s
    7226 micro-secondes pour 1048576 octets -> 145 Mo/s
    
    With AMI:
    ---------
    
    74 micro-secondes pour 2 octets -> 0 Mo/s
    95 micro-secondes pour 4 octets -> 0 Mo/s
    195 micro-secondes pour 8 octets -> 0 Mo/s
    225 micro-secondes pour 16 octets -> 0 Mo/s
    259 micro-secondes pour 32 octets -> 0 Mo/s
    341 micro-secondes pour 64 octets -> 0 Mo/s
    472 micro-secondes pour 128 octets -> 0 Mo/s
    790 micro-secondes pour 256 octets -> 0 Mo/s
    183 micro-secondes pour 512 octets -> 2 Mo/s
    123 micro-secondes pour 1024 octets -> 8 Mo/s
    132 micro-secondes pour 2048 octets -> 15 Mo/s
    145 micro-secondes pour 4096 octets -> 28 Mo/s
    165 micro-secondes pour 8192 octets -> 49 Mo/s
    212 micro-secondes pour 16384 octets -> 77 Mo/s
    292 micro-secondes pour 32768 octets -> 112 Mo/s
    652 micro-secondes pour 65536 octets -> 100 Mo/s
    1140 micro-secondes pour 131072 octets -> 114 Mo/s
    2164 micro-secondes pour 262144 octets -> 121 Mo/s
    3978 micro-secondes pour 524288 octets -> 131 Mo/s
    7351 micro-secondes pour 1048576 octets -> 142 Mo/s
    

    So performance with oneway invocation is better. But is there a way to get better performance, or is this the best I can get?
  • benoit
    benoit Rennes, France
    You should build Ice with optimization -- check the value of the OPTIMIZE variable in the Ice-3.0.1/config/Make.rules file of your Ice source distribution; it should be set to "yes". If you built Ice in debug mode, you should rebuild it (and your benchmark) with optimization; this should lead to better results.

    Your performance figures look fine. Let me try to explain the figures:
    • The optimal speed on your machine seems to be 227 MB/s for oneway requests of size 32KB -- that's even more than the ~125 MB/s you'd get on a gigabit ethernet network!
    • For smaller requests, oneway is faster than AMI. That's expected: relative to the payload size, the Ice runtime spends more time reading the responses of AMI requests. Since no responses are sent for oneway requests, there's no such overhead.
    • For bigger size requests, AMI and oneway have about the same performance.
    • Throughput for oneway or AMI requests with a large request size isn't optimal. That's because the client can't continuously feed the server: it has to block and wait for the server to read the data.

    Note that Ice 3.1 will support a new array mapping for byte sequences. This will lead to increased performance by reducing the number of memory copies to receive a byte sequence. Ice-E already has this optimization, you might also want to check it out if performance is important for your project.

    Cheers,
    Benoit.
  • Thank you for your help and your explanations. It is very helpful.
    I hope that Ice 3.1 will be released soon.
    Unfortunately, some features (concurrency models and asynchronous invocation) are not in Ice-E.
  • matthew
    matthew NL, Canada
    Note that if you have a commercial need for the missing features then you should contact info@zeroc.com.