Eval Questions

Hi, I'm evaluating the performance of modern ORB technologies for upcoming commercial use in a re-architecture of one of our products, which must access very large data sets very quickly.

I have a couple of questions (hopefully I've got the sig file right :) )

Question 1

Currently in our application we do data access via classes like the following (C++):
class Foo
{
public:
    std::vector<DataType> getData(...);

private:
    std::vector<DataType> _data;
};

where, in general, the _data array is very large (say 4-100 GB <yes, that's not a typo>). The getData(...) method slices and dices and serves up chunks of data that are typically < 1 GB, reformatted into a single contiguous array.
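
A minimal sketch of what that chunk extraction might look like; DataType and the offset/count arguments are placeholders for the real element type and slicing parameters:

#include <algorithm>
#include <cstddef>
#include <vector>

using DataType = double;   // placeholder element type

// Copy the half-open range [offset, offset + count) out of the large
// in-memory array into one contiguous chunk (typically < 1 GB).
std::vector<DataType> getChunk(const std::vector<DataType>& data,
                               std::size_t offset, std::size_t count)
{
    offset = std::min(offset, data.size());
    count  = std::min(count, data.size() - offset);
    return std::vector<DataType>(data.begin() + offset,
                                 data.begin() + offset + count);
}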

I've set up a basic benchmarking app using Ice to test the latency and throughput of an Ice client-server pair with an interface, servant, and proxy class implementation. I am looking for suggestions for setting up a config to squeeze maximum performance from an intra-host connection (meaning that this doesn't actually have to go out across a TCP/IP connection). I'm working straight off the Ice demos, so a config file that works with them should be fine. The objective is to measure the lowest possible overhead that Ice introduces over an intra-host connection vs. direct in-memory/shared-memory access, and to determine whether this is acceptable for our needs.
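
For the measurement itself, I'm using a simple wall-clock harness around the proxy call, along these lines; this is generic C++, and the fetchChunk callable, chunk size, and iteration count are placeholders rather than anything from the Ice demos:

#include <chrono>
#include <cstddef>
#include <iostream>

// Time `iterations` calls of `fetchChunk`, which is assumed to pull
// `chunkBytes` of data per call (e.g. one proxy invocation in the benchmark).
template <typename Fn>
void measure(Fn fetchChunk, std::size_t chunkBytes, int iterations)
{
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
    {
        fetchChunk();
    }
    const double elapsed = std::chrono::duration<double>(
        std::chrono::steady_clock::now() - start).count();

    std::cout << "latency:    " << elapsed / iterations * 1e6 << " us/call\n"
              << "throughput: " << iterations * chunkBytes / elapsed / 1e6
              << " MB/s\n";
}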

Question 2

In many cases, we'd prefer to keep the data localized to a server object and simply iterate through it at maximal speed. Assuming that the data structures we are trying to iterate over are standard C++ STL containers, what is the best way to implement an Ice client-server relationship to maximize throughput? One may consider this to be the case where we call getData(...) above with arguments that return one data value. These requests can be batched in groups of up to approximately 1000 requests.

Thank you all for your help in my evaluation.

cheers,

sean

Comments

  • 2321fdaf wrote:
    The getData(...) method slices and dices and serves up chunks of data that are typically < 1 GB, reformatted into a single contiguous array.

    Right. It's obviously impossible to get 100 GB across in a single RPC, so you have to chunk the data. BTW, if you do a benchmark, you will probably find that sending chunks of more than a megabyte or so at a time makes little difference compared with sending even larger chunks: once the chunks are large enough that the transmission time is dominated by bandwidth, the call dispatch delay disappears below the noise floor.
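
    To put rough numbers on that: assuming, for illustration, something on the order of 100 µs of per-call dispatch overhead and the roughly 800 Mbps (about 100 MB/s) loopback throughput mentioned further down, a 1 MB chunk spends about 10 ms on the wire, so the dispatch overhead amounts to roughly 1% and shrinks further as the chunks grow.
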
    I've set up a basic benchmarking app using Ice to test the latency and throughput of an Ice client-server pair with an interface, servant, and proxy class implementation.

    The throughput demo in the Ice distribution might serve as a good starting point. It's easy to extend it to add new data types for benchmarking.
    I am looking for suggestions for setting up a config to squeeze maximum performance from an intra-host connection (meaning that this doesn't actually have to go out across a TCP/IP connection). I'm working straight off the Ice demos, so a config file that works with them should be fine.

    The most efficient way would be to specify the IP address as 127.0.0.1, so data is sent over the loopback interface. (Depending on the implementation of your TCP/IP stack, that may or may not be faster than using the real IP address.)
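
    As a rough illustration, pinning both the server's object adapter and the client's proxy to the loopback interface looks something like the following. The property names (Throughput.Endpoints, Throughput.Proxy) are placeholders that depend on which demo you start from; the endpoint string itself is the part that matters:

        # server side: listen only on the loopback interface
        Throughput.Endpoints=tcp -h 127.0.0.1 -p 10000

        # client side: connect to the same loopback endpoint
        Throughput.Proxy=throughput:tcp -h 127.0.0.1 -p 10000
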
    The objective is to measure the lowest possible overhead that Ice introduces over an intra-host connection vs. direct in-memory/shared-memory access, and to determine whether this is acceptable for our needs.

    It's difficult to give specifics without knowing more about what kind of data you are transmitting. In general, classes and strings are somewhat more expensive to marshal than other data types; proxies also cost more. If you want the absolute best raw performance, you can transmit data as a byte sequence. However, whether this makes sense depends on your data and how expensive it is to transform to/from the byte sequence into the proper data types inside the application.

    Note that V3.1 of Ice will have further speed improvements for sending sequences.
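
    To gauge the transformation cost mentioned above in isolation, something like the following can be timed on its own before committing to a byte-sequence interface; this is plain C++, with DataType standing in for a trivially copyable element type:

        #include <cstddef>
        #include <cstring>
        #include <vector>

        using DataType = double;   // placeholder, must be trivially copyable

        // Pack a chunk of typed data into the byte buffer that would go on the wire.
        std::vector<unsigned char> toBytes(const std::vector<DataType>& chunk)
        {
            std::vector<unsigned char> bytes(chunk.size() * sizeof(DataType));
            if (!bytes.empty())
                std::memcpy(bytes.data(), chunk.data(), bytes.size());
            return bytes;
        }

        // Rebuild the typed chunk on the receiving side.
        std::vector<DataType> fromBytes(const std::vector<unsigned char>& bytes)
        {
            std::vector<DataType> chunk(bytes.size() / sizeof(DataType));
            if (!bytes.empty())
                std::memcpy(chunk.data(), bytes.data(), bytes.size());
            return chunk;
        }
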
    In many cases, we'd prefer to keep the data localized to a server object and simply iterate through it at maximal speed. Assuming that the data structures we are trying to iterate over are standard C++ STL containers, what is the best way to implement an Ice client-server relationship to maximize throughput? One may consider this to be the case where we call getData(...) above with arguments that return one data value. These requests can be batched in groups of up to approximately 1000 requests.

    I'm not sure I fully understand your scenario. Are you asking how to improve performance if clients want to retrieve small amounts of data in separate RPC calls? If so, performance will be limited by latency, so the way to get better performance is to avoid lots of synchronous RPCs for small data. For example, clients could let the server know what they want using batched oneways, and the server could then send the whole pile of data back to the client at once, in a single RPC call, via a callback object.

    Or, alternatively, clients could remember what data they need, package identifiers for the data in a big request to the server, and then get all the data back in a single RPC call.
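
    In C++ terms, the second approach amounts to something like the sketch below; BatchedReader, Key, and the bulk fetchAll operation are illustrative names, not an existing Ice interface:

        #include <cstddef>
        #include <functional>
        #include <utility>
        #include <vector>

        using Key = std::size_t;     // placeholder identifier type
        using DataType = double;     // placeholder element type

        // Collect the keys a client needs and issue one bulk request instead
        // of ~1000 single-value round trips. `fetchAll` stands in for the
        // single RPC (e.g. a hypothetical getValues(keys) operation).
        class BatchedReader
        {
        public:
            explicit BatchedReader(
                std::function<std::vector<DataType>(const std::vector<Key>&)> fetchAll)
                : _fetchAll(std::move(fetchAll)) {}

            void want(Key k) { _pending.push_back(k); }

            std::vector<DataType> flush()
            {
                std::vector<DataType> values = _fetchAll(_pending); // one round trip
                _pending.clear();
                return values;
            }

        private:
            std::function<std::vector<DataType>(const std::vector<Key>&)> _fetchAll;
            std::vector<Key> _pending;
        };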

    In general, though, you will find that Ice provides good throughput, close to the theoretical maximum of TCP/IP. For example, we easily get 800 Mbps or so over the loopback interface with average hardware.

    Cheers,

    Michi.