On Ice Performances

I found a performance evaluation of Java middleware on the jacorb mailing list:
http://lists.spline.inf.fu-berlin.de/pipermail/jacorb-developer/2004-September/006752.html

Ice is included, but the "winner" is... ORBacus 4.1.

Guido.

Comments

  • Funny, I was the main architect of ORBacus and of Ice :)

    Anyway, I wouldn't go so far as to say that this is a "performance evaluation". It's a latency evaluation. Furthermore, it kind of compares apples with oranges, because different thread models are used (thread pool, thread per connection, etc.).
  • Originally posted by marc
    Funny, I was the main architect of ORBacus and of Ice :)

    Yes, I know :):):) I have been a happy ORBacus user.
    (Well, it is the main reason why I posted the link.)

    Originally posted by marc
    Anyway, I wouldn't go so far as to say that this is a "performance evaluation". It's a latency evaluation. Furthermore, it kind of compares apples with oranges, because different thread models are used (thread pool, thread per connection, etc.).

    Yes, absolutely true!
  • I read this paper, and I'm quite disappointed by it. In particular, the measurements do not support the conclusions:
    Comparing various Java-based CORBA implementations has shown that ORBacus 4.1.0 is between 2 and 5 times more efficient than JacORB 2.1/2.2, OpenORB 1.3.1/1.4, and Java IDL 1.4.2. Moreover, Java IDL is the worst of the benchmarked CORBA platforms.

    From that, one is easily led to believe that ORBacus is generally 2 to 5 times more efficient than Ice. That is most certainly not the case: the test only measures latency and completely ignores all other aspects of performance, so it cannot draw conclusions about anything but latency.
    Comparing CORBA/IIOP versus other ORB protocols has shown that GIOP/IIOP implemented in an efficient way (i.e. in ORBacus) provides better roundtrip latency results than other benchmarked ad hoc protocols as provided by Java RMI/JRMP, Ice, and Fractal RMI platforms.

    To draw conclusions about the efficiency of a protocol from a latency test is invalid. That is because, whether you use the Ice protocol or IIOP, the information that is exchanged is essentially the same: a request sends a simple request header, and a reply returns a simple reply header. (The headers for the Ice protocol and IIOP are very similar.)

    What the test measures first and foremost is the threading model that is in use (and, to a lesser extent, the efficiency of the code that passes and retrieves the headers to and from the kernel).
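
    To make this concrete, here is a purely illustrative Java sketch (not the actual Ice or GIOP wire format; the field names and values are made up) showing roughly how little fixed information a parameterless twoway invocation has to put on the wire. With headers this small, a ping-style benchmark is dominated by thread scheduling and kernel round trips rather than by the protocol encoding.

        import java.io.ByteArrayOutputStream;
        import java.io.DataOutputStream;
        import java.io.IOException;

        public class HeaderSizeSketch {
            public static void main(String[] args) throws IOException {
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                DataOutputStream out = new DataOutputStream(buf);

                // Hypothetical request header, loosely modeled on the kind of
                // fields both protocols carry: magic, version, message type,
                // size, request id, target identity, and operation name.
                out.writeBytes("XXXX");     // protocol magic (placeholder)
                out.writeByte(1);           // protocol version
                out.writeByte(0);           // message type: request
                out.writeInt(0);            // message size (patched later by a real ORB)
                out.writeInt(42);           // request id
                out.writeUTF("someObject"); // target identity
                out.writeUTF("ping");       // operation name

                System.out.println("request header bytes: " + buf.size());
            }
        }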

    Ice uses a single threading model (thread pool) to dispatch on the server side, whereas ORBacus offers a whole raft of threading models. By default, I think ORBacus uses a blocking threading model (or a reactive one--I can't remember because it's been too long).

    At any rate, reactive (and especially blocking) threading models are faster than a leader-follower thread pool, which explains ORBacus's better showing in the benchmark.
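
    For illustration, here is a minimal Java sketch (not Ice or ORBacus code; the names are made up) of the difference: in a blocking model, the thread that reads a request also runs the servant, whereas a thread pool adds a queue hand-off and usually a context switch per request, which is exactly what a latency micro-benchmark measures.

        import java.util.concurrent.ExecutorService;
        import java.util.concurrent.Executors;
        import java.util.concurrent.Future;

        public class DispatchModels {
            // Stand-in for the remote operation being dispatched.
            static int servant(int request) {
                return request + 1;
            }

            // Blocking model: the reader thread dispatches the request itself.
            static int dispatchBlocking(int request) {
                return servant(request);
            }

            // Thread-pool model: the reader hands the request to a pool and
            // waits for the result; the hand-off is the extra per-call cost.
            static int dispatchViaPool(ExecutorService pool, int request) throws Exception {
                Future<Integer> reply = pool.submit(() -> servant(request));
                return reply.get();
            }

            public static void main(String[] args) throws Exception {
                ExecutorService pool = Executors.newFixedThreadPool(4);
                System.out.println(dispatchBlocking(1));
                System.out.println(dispatchViaPool(pool, 1));
                pool.shutdown();
            }
        }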

    So, the question is then: if blocking and reactive threading models are so much faster than a thread pool, why doesn't Ice offer them? Technically, Ice could offer those threading models easily. (The technical issues are trivial.)

    We agonized for quite a while over what to do with respect to threading models and, in the end, decided to settle for a thread pool model for the following reasons:
    1. For the majority of applications, a difference in call dispatch rate of 30% is completely irrelevant. If an application is truly bound by call dispatch rate, it either has unusual requirements, or it is poorly designed because it exchanges too little information with each remote call. As soon as request sizes reach about a kilobyte, call dispatch rate becomes irrelevant anyway because the marshaling cost dominates the overall throughput.
    2. Experience with many years of supporting ORBacus showed us that the plethora of threading models was a hindrance rather than a help. Customers had endless problems with the selection (or rather, non-selection) of the correct threading model, and we spent considerable support capacity on dealing with threading model problems. (And, of course, to look good on benchmarks, ORBacus picked the most efficient threading model as the default.)
    3. Blocking and reactive threading models have deadlock issues with nested callbacks (a toy illustration follows this list). This problem occurs a lot more in practice than one might expect, and we know from bitter past experience that customers kept getting hurt by this time and time again. A thread pool model does not have these problems and is therefore a more suitable and realistic choice for the majority of applications: those applications that need it appreciate it, and those that do not need it are not hurt unduly.
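
    The deadlock in point 3 can be shown with a toy model of a blocking dispatcher (a single-thread executor standing in for the one dispatch thread; this is not Ice or ORBacus code). The servant makes a nested synchronous call that needs the same dispatch thread, which is already occupied, so the nested call can never run. A timeout is used so the example terminates instead of hanging; with a pool of two or more threads, the nested call would simply be dispatched on another thread.

        import java.util.concurrent.ExecutionException;
        import java.util.concurrent.ExecutorService;
        import java.util.concurrent.Executors;
        import java.util.concurrent.Future;
        import java.util.concurrent.TimeUnit;

        public class NestedCallbackDeadlock {
            public static void main(String[] args) throws Exception {
                ExecutorService dispatcher = Executors.newSingleThreadExecutor();

                Future<String> outer = dispatcher.submit(() -> {
                    // The servant makes a nested call that also needs the
                    // dispatcher...
                    Future<String> nested = dispatcher.submit(() -> "nested reply");
                    // ...and blocks waiting for it. The only dispatch thread is
                    // running this very task, so the nested call never starts.
                    return nested.get(2, TimeUnit.SECONDS);
                });

                try {
                    System.out.println(outer.get());
                } catch (ExecutionException e) {
                    System.out.println("nested call never completed: " + e.getCause());
                }
                dispatcher.shutdownNow();
            }
        }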

    Choosing a middleware platform is intrinsically difficult, and a lot of considerations enter into the choice, most of which are difficult to quantify. That is why benchmarks are so popular: they distill one aspect of the platform (run-time performance) down to a simple set of numbers that are easily compared. However, the relevance of the benchmark to real application development is often rather remote. What usually matters much more than performance (within reasonable limits, of course) are things such as the quality of the APIs, ease of use of language mappings, architectural quality, footprint at run time, number of supported OS and compiler platforms, which languages are supported, the simplicity and elegance of the object model, features such as compression and encryption, etc.

    All these things are much harder to quantify than a performance benchmark, and the relative importance of each item varies greatly from project to project. So, given the immense complexity of the topic, people latch onto benchmarks because they nicely simplify things (even though they may simplify an entirely inappropriate part of the problem space).

    Obviously, performance is important for some applications, and it shouldn't be neglected. But it is only one aspect of a much wider range of issues.

    So, to me, the benchmark has little relevance, for two reasons:
    1. Real applications very rarely invoke operations that do not accept any parameters and do not return any results. Instead, applications send data or receive data (or both) with each operation invocation. A lot of applications send highly structured data, such as big structures, complex object graphs, or lots of proxies. Marshaling performance for those scenarios is much more important than raw latency (a rough sketch follows this list).
    2. Comparing latency across different threading models is comparing apples and oranges. The results are meaningless unless the threading models are identical.
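
    As a rough illustration of point 1 (this is not the paper's benchmark; it uses plain Java serialization and made-up classes), compare the cost of marshaling an empty payload with that of a modest object graph. Once the payload has real structure, marshaling dominates the per-call cost, and differences in raw dispatch latency fade into the noise.

        import java.io.ByteArrayOutputStream;
        import java.io.IOException;
        import java.io.ObjectOutputStream;
        import java.io.Serializable;
        import java.util.ArrayList;
        import java.util.List;

        public class MarshalingSketch {
            // A hypothetical record with a little bit of structure.
            static class Record implements Serializable {
                String name;
                int[] samples = new int[32];
                Record(String name) { this.name = name; }
            }

            // Time how long it takes to serialize a payload, in nanoseconds.
            static long marshal(Object payload) throws IOException {
                long start = System.nanoTime();
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
                    out.writeObject(payload);
                }
                return System.nanoTime() - start;
            }

            public static void main(String[] args) throws IOException {
                List<Record> graph = new ArrayList<>();
                for (int i = 0; i < 100; i++) {
                    graph.add(new Record("record-" + i));
                }

                // Warm up the JIT, then compare the two payload shapes.
                for (int i = 0; i < 1000; i++) {
                    marshal(new ArrayList<Record>());
                    marshal(graph);
                }
                System.out.println("empty payload:      " + marshal(new ArrayList<Record>()) + " ns");
                System.out.println("structured payload: " + marshal(graph) + " ns");
            }
        }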

    I have not checked this, so I'm going out on a limb here. But my expectation is that, if the benchmarks were re-run for ORBacus and Ice using a thread pool model for both, performance would come out very similar. And, for complex data, such as proxies (vs IORs) and graphs of objects (vs objects by value), my expectation would be for Ice to be faster than ORBacus.

    Cheers,

    Michi.