Latency: structs vs. classes

We are looking to use Ice in robotic applications. The typical usage is to send small amounts of data at regular intervals. Classes offer inheritance, but the manual warns that using classes leads to reduced performance and higher memory usage. We tried to quantify this penalty and concentrated on performance (latencies) because that's more important to us. If someone has experience in this area we'd like to hear about it.

We recorded round-trip times for objects of 4 different sizes. The table below shows the average round-trip times in milliseconds:

Object size [bytes]             32      3273    307224    921648

intra-host, structs         0.0772    0.2313    3.1041   20.2786
intra-host, classes         0.1883    0.3437    3.3447   14.7283
host-to-host, structs       0.4833    2.0631   58.8670  178.2058
host-to-host, classes       0.9046    2.4465   57.6246  171.1470

So it seems that for small object sizes (our typical case), sending classes is ~2 times slower than sending structures. For bigger objects the difference is insignificant (or even slightly reversed?!). Does this sound reasonable?

cheers, alex


implementation (for reference)

Debian Linux, Ice 2.1.2, C++, gcc-3.3, 2GHz laptop, 10Mb Ethernet.

101 objects sent one at a time (preload=1), with a 0.25 s interval after each ping (see the timing sketch below).

// Slice
class BaseObject {};
class ClassObject1 extends BaseObject { /* data */ };
struct StructObject1 { /* data */ };
// ClassObject2..4 and StructObject2..4 are defined the same way, with
// payloads matching the four object sizes above.

interface Replier
{
    // pings using classes (one operation handles all sizes via the base class)
    BaseObject ping(BaseObject obj);

    // pings using structures (one operation per struct type)
    StructObject1 ping1(StructObject1 obj);
    StructObject2 ping2(StructObject2 obj);
    StructObject3 ping3(StructObject3 obj);
    StructObject4 ping4(StructObject4 obj);
};

// C++ servant: each operation simply echoes its argument back to the caller
BaseObjectPtr ReplierI::ping(const BaseObjectPtr& obj, const ::Ice::Current&)
{
    return obj;   // returns the smart pointer; the object data is not copied
}

StructObject1 ReplierI::ping1(const StructObject1& obj, const ::Ice::Current&)
{
    return obj;   // returns the struct by value, copying its contents
}
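
A minimal sketch of the measurement loop on the pinger side (not the actual benchmark code; communicator and proxyString come from the usual application setup, omitted here, and preload=1 is taken to mean one untimed warm-up call):

// c++ (pinger, sketch)
ReplierPrx replier =
    ReplierPrx::checkedCast(communicator->stringToProxy(proxyString));
StructObject1 payload;          // filled to the desired size elsewhere

replier->ping1(payload);        // warm-up call, not timed

IceUtil::Int64 totalUs = 0;
for(int i = 0; i < 100; ++i)
{
    IceUtil::Time start = IceUtil::Time::now();
    replier->ping1(payload);
    totalUs += (IceUtil::Time::now() - start).toMicroSeconds();

    IceUtil::ThreadControl::sleep(IceUtil::Time::milliSeconds(250));
}
std::cout << "avg round trip: " << totalUs / 1000.0 / 100 << " ms" << std::endl;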

Comments

  • marc (Florida)
    A factor of two for small objects sounds about right. For large objects, the additional overhead for classes will become insignificant.

    I have no idea why you see a lower latency for large classes compared to large structs. In theory, structs should never be slower than classes.

    Note that all latency tests should be performed with an optimized (-O2 or -O3, plus -DNDEBUG) build of Ice. Otherwise you measure debug-mode overhead, which is particularly significant for the STL.
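
    For reference, a rough sketch of what an optimized build means in practice; the OPTIMIZE variable is the one in Ice's C++ Make.rules (check the version you have), and the benchmark file names below are made up:

    # build Ice itself with optimization
    make OPTIMIZE=yes

    # and compile the benchmark with the same settings
    g++ -O3 -DNDEBUG -c pinger.cpp replier.cpp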
  • marc (Florida)
    marc wrote:
    I have no idea why you see a lower latency for large classes compared to large structs. In theory, structs should never be slower than classes.

    Actually, the difference might be due to the passing of structs by value compared to passing of class instances by smart pointer for the return type.
  • marc wrote:
    Actually, the difference might be due to the passing of structs by value compared to passing of class instances by smart pointer for the return type.
    Do you mean the local passing of data around (different function signatures in c++)? Or different mechanisms of passing data over the network?

    Regarding optimization, our code is compiled with -O3. Ice was compiled as shipped, so I assume it's optimized.

    alex
  • marc (Florida)
    n2503v wrote:
    Do you mean the local passing of data around (different function signatures in c++)? Or different mechanisms of passing data over the network?

    I mean the local passing of the return value in C++. For structs, it's by value, meaning that a copy is made (except for some cases where the optimizer can avoid this copy). For class instances, it's a smart pointer, so no copy is made.

    Note, however, that if you plan to make your application multi-threaded, then you must be careful when passing a return value by smart pointer. For more details, please see the article "Thread-Safe Marshaling" in issue 2 of our Ice newsletter (http://www.zeroc.com/newsletter/).
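
    As a rough sketch of the hazard the article describes (a hypothetical servant, not the benchmark code above): the returned class instance is marshaled by the Ice run time only after the operation has returned, i.e. after any lock taken inside the operation has been released, so another thread could modify the instance while it is being marshaled.

    // hypothetical servant; ping1..ping4 omitted for brevity
    class StatefulReplierI : public Replier
    {
    public:
        virtual BaseObjectPtr ping(const BaseObjectPtr&, const Ice::Current&)
        {
            IceUtil::Mutex::Lock lock(_mutex);
            // Unsafe if other threads modify _state: the instance is
            // marshaled after this operation (and the lock) is gone.
            // One fix is to return a copy of the instance instead of a
            // smart pointer to shared state.
            return _state;
        }

    private:
        BaseObjectPtr _state;   // shared servant state
        IceUtil::Mutex _mutex;
    };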
  • marc wrote:
    I mean the local passing of the return value in C++. For structs, it's by value, meaning that a copy is made (except for some cases where the optimizer can avoid this copy). For class instances, it's a smart pointer, so no copy is made.

    Understood. I've just ported the pinger/replier to IceBox services and the difference between structs/classes shows up even more dramatically. I'm including a graph of latencies in all three cases (IceBox, same host, two hosts) vs. two representations (structs, classes).
    marc wrote:
    Note, however, that if you plan to make your application multi-threaded, then you must be careful when passing a return value by smart pointer. For more details, please see the article "Thread-Safe Marshaling" in issue 2 of our Ice newsletter (http://www.zeroc.com/newsletter/).

    Thanks for the tip, I read the article. We'll probably be affected, and it would've been tough to track down. Passing smart pointers inside IceBox is also an issue that has to be dealt with.
  • n2503v wrote:
    I've just ported the pinger/replier to IceBox services and the difference between structs/classes shows up even more dramatically.

    That's not really a surprise. Structs are passed by value, so if you pass structs around that are 100kB in size, you copy an extra 100kB of data at least once for each structure. In contrast, for classes, all that is copied is a smart pointer (or reference, for Java and C#), the cost of which is constant and very small.

    For host-to-host communication, I suspect that you are seeing essentially the same performance for large structures and large objects because the bottleneck here is network bandwidth rather than memory bandwidth, so the cost of the extra in-memory copy eventually drowns in the cost of going on the wire.

    I'm not 100% sure why the intra-host figures are essentially the same as the host-to-host figures, but I suspect that the bottleneck here is also the network (or backplane) bandwidth. Are you using 127.0.0.1 for this or the real IP addresses? Depending on your kernel implementation, changing between the real IP and the loopback interfaces may make a difference.

    Cheers,

    Michi.
  • michi wrote:
    I'm not 100% sure why the intra-host figures are essentially the same as the host-to-host figures, but I suspect that the bottleneck here is also the network (or backplane) bandwidth. Are you using 127.0.0.1 for this or the real IP addresses? Depending on your kernel implementation, changing between the real IP and the loopback interfaces may make a difference.

    Yes, I configure the endpoint as 'tcp' so it listens on 127.0.0.1:xxxx. I haven't checked if using the real IP address makes a difference.
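
    For what it's worth, pinning the server to a particular interface is just a matter of the -h option in the endpoint string. A sketch with a made-up adapter name (only the endpoint syntax matters):

    // bind the adapter to the loopback interface only
    Ice::ObjectAdapterPtr adapter =
        communicator->createObjectAdapterWithEndpoints("ReplierAdapter", "tcp -h 127.0.0.1 -p 10001");
    // or to the machine's real address instead:
    // communicator->createObjectAdapterWithEndpoints("ReplierAdapter", "tcp -h 192.168.0.2 -p 10001");

    The client's stringified proxy then has to use the matching address, e.g. "replier:tcp -h 192.168.0.2 -p 10001".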