handling complex numbers

sidney · March 2010

Hi all,

We are currently looking into ICE as a possible middleware layer for our project. One of the things we need is transferring arrays of complex numbers over the network.

Since we use C++, our complex numbers are usually of type complex<double>, or sometimes complex<float>.

However, Slice only supports doubles and floats.

What would be the best approach to tackle this? I can think of a few approaches.

1) Annotate the Slice spec for our interface to use complex<double> or complex<float> as a base type for a sequence. We'd be willing to give up language interoperability for this one. Unfortunately(?), this doesn't seem to be possible in Slice; while we can change the container type using a cpp annotation, we cannot force a C++ base type in a sequence.

2) Define a struct ComplexDouble with a double 're' and 'im' field. Unfortunately, it would be quite inconvenient that these data items would be typed as ComplexDouble rather than complex<double> in our C++ code. We could do a lot of reinterpret_cast'ing, but this would obviously make our code less nice than necessary. Furthermore (somewhat surprisingly), early experiments indicate that this solution has a rather big overhead in terms of achievable bandwidth.

3) Send the complex number data as opaque, unsigned bytes. This allows us to use zero-copy semantics, but it will be a bit messy with much casting, etc.

4) Send the data as even-length double sequences, and just use it as complex numbers.

5) Implement support for complex numbers in Slice (or pay ZeroC to do it). Most languages that Slice supports implement complex numbers natively or in their standard libraries, so this could surely be done.
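
For illustration, option 4 could look roughly like the sketch below. It assumes the Slice sequence<double> maps to std::vector<double> (the default C++ mapping); the helper names packInterleaved and unpackInterleaved are made up for this example and are not part of Ice. Note that this version copies the data, so it trades some speed for portability.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper: flatten complex values as [re0, im0, re1, im1, ...].
std::vector<double>
packInterleaved(const std::vector<std::complex<double> >& values)
{
    std::vector<double> out;
    out.reserve(values.size() * 2);
    for (std::size_t i = 0; i < values.size(); ++i)
    {
        out.push_back(values[i].real());
        out.push_back(values[i].imag());
    }
    return out;
}

// Hypothetical helper: rebuild complex values from an even-length sequence.
std::vector<std::complex<double> >
unpackInterleaved(const std::vector<double>& flat)
{
    assert(flat.size() % 2 == 0); // an odd length means a malformed message
    std::vector<std::complex<double> > out;
    out.reserve(flat.size() / 2);
    for (std::size_t i = 0; i + 1 < flat.size(); i += 2)
    {
        out.push_back(std::complex<double>(flat[i], flat[i + 1]));
    }
    return out;
}
```

The sender would call packInterleaved before the invocation and the receiver would call unpackInterleaved on the incoming sequence.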

My question to you guys (being much more experienced than I am): how would you handle this? Is there perhaps an option that I missed?

Best regards, Sidney

bernard · March 2010

Hi Sidney,

Welcome to our forums!

Like you, I see a number of options, each with pros and cons:

(a) Write helper functions

You'd write helper functions or functors to convert from the Slice type to your native types, as efficiently as possible, for example:

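// Reinterprets the vector in place; no copy is made. Only valid if
// complex<double> and SliceComplexD have the same memory layout.
const vector<SliceComplexD>&
toSlice(const vector<complex<double> >& vc)
{
    return reinterpret_cast<const vector<SliceComplexD>&>(vc);
}

// The reverse direction, with the same layout assumption.
vector<complex<double> >&
fromSlice(vector<SliceComplexD>& vc)
{
    return reinterpret_cast<vector<complex<double> >&>(vc);
}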

Pros:
- you can do it right now
- there should be no overhead
- full interop with other languages

Cons:
- may not work (!) (depends on the representation of complex<double> with your compiler)
- need to add calls to toSlice and fromSlice in your code

You could also write a slower but safer and more portable version of toSlice/fromSlice, that uses copies instead of casts (note: the signature of these functions would change also).
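
A minimal sketch of what such a copying version might look like is shown here. The SliceComplexD struct is a hand-written stand-in for whatever slice2cpp would generate from a struct with two double fields (an assumption made for this example); both functions now return by value.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Stand-in for the generated Slice struct (assumption for this sketch).
struct SliceComplexD
{
    double real;
    double imag;
};

// Copying conversion: no assumption about the layout of complex<double>.
std::vector<SliceComplexD>
toSlice(const std::vector<std::complex<double> >& vc)
{
    std::vector<SliceComplexD> out(vc.size());
    for (std::size_t i = 0; i < vc.size(); ++i)
    {
        out[i].real = vc[i].real();
        out[i].imag = vc[i].imag();
    }
    return out;
}

// The reverse direction, also by copy.
std::vector<std::complex<double> >
fromSlice(const std::vector<SliceComplexD>& vs)
{
    std::vector<std::complex<double> > out;
    out.reserve(vs.size());
    for (std::size_t i = 0; i < vs.size(); ++i)
    {
        out.push_back(std::complex<double>(vs[i].real, vs[i].imag));
    }
    return out;
}
```

The cost is one extra pass over the data in each direction.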

(b) Add support in Slice for a special "cpp:complex" metadata, something like:

// Slice
["cpp:type:std::complex<double>"] struct ComplexD
{
     double real;
     double imag;
};

This way, the Slice type ComplexD would be mapped by the Slice compiler to std::complex<double>.

Pros:
- you could use complex<double> in any kind of Slice types (sequences, structs, classes ...)
- Interop with other languages would be maintained (they just get this struct)

Cons:
- need to update Ice (mostly slice2cpp)
- quite a bit of work just to support std::complex in C++

(c) Add support in Slice for serialization of C++ types, similar to http://www.zeroc.com/doc/Ice-3.4.0/manual/Slice.5.18.html

It could be something like:

//Slice
["cpp:serializable:SliceComplexDVectorHandler"]
sequence<byte> ComplexDVector;

where you'd provide the class "SliceComplexDVectorHandler" that serializes and deserializes a vector<complex<double> > to and from a byte buffer.

Pros:
- would be generally useful, not just for std::complex types

Cons:
- need to update Ice (mostly slice2cpp)
- no interop with other languages
- need to write a "handler" for every native type transmitted through Ice
- adds small overhead to each native type (the length of the byte sequence)

With respect to zero-copy, keep in mind that there is no zero copy on the client side: the parameters you provide are always marshaled into the "send buffer".

On the server/receiving side, for some types and on some platforms, you can read the data directly from the "request buffer". For example, you could read an array of doubles directly (on Intel x86). There is, however, no alignment guarantee: the doubles may not be aligned, and as a result access to these doubles could be quite slow. Saving this copy could actually slow down your application.
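
To illustrate the alignment point, here is a small sketch of the copying alternative: it pulls doubles out of a byte buffer at an arbitrary offset with memcpy, so the result is always properly aligned. The function name readDoubles and the raw byte-buffer interface are invented for this example; this is not how the Ice run time exposes its buffers, and it assumes the bytes are already in host byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: copy `count` doubles that start at `offset` bytes
// into `buffer`. The offset need not be a multiple of sizeof(double).
std::vector<double>
readDoubles(const unsigned char* buffer, std::size_t offset, std::size_t count)
{
    std::vector<double> out(count);
    if (count > 0)
    {
        // memcpy has no alignment requirement on its source.
        std::memcpy(&out[0], buffer + offset, count * sizeof(double));
    }
    return out;
}
```

After the copy, all further accesses go through an aligned std::vector<double>, which is the trade-off described above: one extra copy in exchange for fast access afterwards.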

Best regards,
Bernard

sidney · March 2010

Hi Bernard,

Thanks for the detailed answer, I'll have to think a bit about the best way to go on this issue...

In the meantime, I'm seeing a significant bottleneck when I do this:

module Test {

    struct ComplexDouble {
        double re;
        double im;
    };

    sequence<ComplexDouble> ComplexDoubleArray;
    sequence<double> DoubleArray;

    interface Responder {
        int testComplexDoubleArray(ComplexDoubleArray values);
        int testDoubleArray(DoubleArray values);
    };

};

For some reason, transferring parameters of type ComplexDoubleArray is quite slow compared to e.g. transferring arrays of doubles.

Locally (two processes on the same machine), transferring a ComplexDoubleArray tops out at about 97 MByte/sec, while remote transfer (over 1 Gbit network switch) tops out at +/- 51 MByte/sec.

A similar test with a sequence<double> type shows a performance of +/- 840 MByte/sec (local process-to-process) and 96 MByte/sec over the network (close to the 1 Gbit/sec limit).

I am trying to understand why we see this sharp decrease in effective bandwidth between sequences of ComplexDouble and sequences of plain doubles. Before diving into the bowels of Ice, do you perhaps have an idea why this is so?

Best regards,

Sidney

mes · March 2010

Hi Sidney,

Ice uses some optimizations that help to improve the marshaling performance of primitive sequence types. The Ice protocol uses a little-endian encoding, so when your machine also uses a little-endian architecture, Ice can simply copy the bytes used by the sequence directly into its marshaling buffer with one call to memcpy. On the other hand, for a sequence of ComplexDouble, Ice copies each byte of each double individually into its marshaling buffer. A similar situation exists on the receiving side. Consequently, using a ComplexDoubleArray will certainly result in lower throughput than a DoubleArray.
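
The difference between the two paths can be pictured with the simplified sketch below. This is not the Ice source code: the Buffer typedef and the function names are invented for illustration, the size prefix that precedes a sequence on the wire is left out, and everything is written in host byte order (which matches the little-endian encoding only on a little-endian machine).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

struct ComplexDouble
{
    double re;
    double im;
};

typedef std::vector<unsigned char> Buffer;

// Bulk path for a primitive sequence: one memcpy for all elements.
void marshalBulk(const std::vector<double>& values, Buffer& buf)
{
    if (values.empty())
    {
        return;
    }
    const std::size_t old = buf.size();
    buf.resize(old + values.size() * sizeof(double));
    std::memcpy(&buf[old], &values[0], values.size() * sizeof(double));
}

// Element path: every field is appended to the buffer on its own.
void marshalDouble(double v, Buffer& buf)
{
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&v);
    buf.insert(buf.end(), p, p + sizeof(double));
}

void marshalPerElement(const std::vector<ComplexDouble>& values, Buffer& buf)
{
    for (std::size_t i = 0; i < values.size(); ++i)
    {
        marshalDouble(values[i].re, buf);
        marshalDouble(values[i].im, buf);
    }
}
```

Both functions produce the same bytes for the same numbers; the second simply does far more small operations to get there.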

I performed some experiments using Ice 3.4 and your Slice definitions on a RHEL 5.4 x64 dual core machine with gigabit LAN. Over the loopback interface, I was getting about 2150Mbps using a DoubleArray of 50K elements, compared to 1420Mbps for a ComplexDoubleArray of 50K elements.

I also tested over the LAN with a Windows XP client and the RHEL server. In this case I got 390Mbps for DoubleArray and 360Mbps for ComplexDoubleArray.

My results don't surprise me too much. Over the faster loopback interface, the extra effort required to marshal ComplexDoubleArray has more of an impact on throughput, whereas over the LAN this isn't as noticeable.

There's a much greater difference in your results, but it's difficult to say what might be causing it without knowing more details about your environment (Ice version, operating systems, compilers, etc.). It's also important to compile Ice and your test programs with optimization.

If you can post an archive of your test case, I'd be happy to take a look at it.

Regards,
Mark

sidney · March 2010

Hi Mark,

We're using modern Dell systems with dual-core Xeon processors, running Debian stable AMD64 with a 2.6.26 kernel and Ice 3.2 (this is the version packaged with Debian stable; I haven't gotten round to installing v3.4 yet). The differences you report are much more in line with expectations: I'd expect to take a small hit in throughput moving from doubles to ComplexDoubles, but not the factor of 8 or so we're seeing.

If you could try my performance test, that would be nice. It's here:

http://www.jigsaw.nl/SpeedTest-0.1.tar.gz

I am curious whether you can reproduce the slowdown I'm seeing.

Cheers, Sidney

PS: To run the tests to completion, Ice.MessageSizeMax must be set to something big via ICE_CONFIG; the program doesn't do that by itself.

mes · March 2010

Hi,

I used your example on the same hardware I mentioned earlier. Here are my results for the local case on the RHEL 5.4 x64 machine using Ice 3.4:

Test                 Peak (Mbyte/s)
------------------   --------------
ByteArray            395
DoubleArray          331
ComplexDoubleArray   211

And in the remote case with a Windows client:

Test                 Peak (Mbyte/s)
------------------   --------------
ByteArray            50
DoubleArray          50
ComplexDoubleArray   45

My systems aren't using the "latest and greatest" hardware, but that should give you a general idea.

Note also that raw throughput is only one piece of the puzzle. Applications usually do more than just ship the data around, in which case the time spent transferring the data is often insignificant compared to the time spent by the application in processing the data. Ice is designed to be "fast enough" for most use cases, and provides a lot of flexibility for applications with special requirements.

Hope that helps,
Mark

sidney · March 2010

Mark,

Thanks for trying my program; again, your results look quite reasonable. I have installed v3.4 now and I see no real change in raw throughput compared to my earlier results.

As you say, raw bandwidth isn't the only factor to be considered. In terms of managing the complexity of a distributed application, and in overall quality, I am impressed with Ice so far, and apart from the ComplexDouble sequences the performance results also look good.

Our application involves a lot of movement of complexly structured data between a couple of CPUs, and we will need to have a good understanding of the performance implications of our data model. Tens of percents would be quite acceptable, but a factor of eight for moving doubles vs very simple structs cannot be ignored by us; it needs an explanation.

So, I will look into this issue some more and report findings if they are of general interest. Thanks for your help so far, it is appreciated.
