Passing Java Objects using Slice

brian · April 2004

Hi Again,

We are currently trying to convert our JNI layer to using ICE. One of our JNI routines is fairly generic in that it can take an array of the Java Object type with associated information on each object's type -- the type can be an int, double, or Map. The C++ JNI layer then processes each value in the array according to its type. The array of Object values can contain data of any of the 3 types mentioned.

Since there is no direct support for the Java Object type in Slice, I was wondering if anybody had any suggestions on how to port this to ICE? The only thing we can think of at this point is to pass all data as strings and convert them on our C++ server side, but this would be a non-optimal solution.

Thanks,

Brian

marc · April 2004

You could create a Slice abstraction using classes, that covers all Java data types you are using. For example:

class AnyType { };

sequence<AnyType> AnyTypeSeq;

dictionary<string, AnyType> AnyTypeDict; // Assuming your map key is a string

class IntType extends AnyType { int value; };

class DoubleType extends AnyType { double value; };

class MapType extends AnyType { AnyTypeDict value; };

You can then use operations like:

void foo(AnyTypeSeq values);

Note, however, that this code is not as efficient as having a separate method for each type, i.e., classes have some overhead. Please see the Ice manual for more details.
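
For a feel of what the receiving side would do with such a sequence, here is a rough sketch. It uses plain C++ stand-ins (std::shared_ptr and std::dynamic_pointer_cast) rather than the types the Slice compiler would generate, so the type definitions and the describe() helper are assumptions for illustration only.

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical plain-C++ stand-ins for the Slice classes above.
// These are NOT the types the Slice compiler would generate.
struct AnyType { virtual ~AnyType() = default; };
using AnyTypePtr = std::shared_ptr<AnyType>;
using AnyTypeSeq = std::vector<AnyTypePtr>;
using AnyTypeDict = std::map<std::string, AnyTypePtr>;

struct IntType : AnyType { int value = 0; };
struct DoubleType : AnyType { double value = 0.0; };
struct MapType : AnyType { AnyTypeDict value; };

// Returns a short description of one value by probing its dynamic type.
std::string describe(const AnyTypePtr& v) {
    if (auto i = std::dynamic_pointer_cast<IntType>(v)) {
        return "int:" + std::to_string(i->value);
    }
    if (auto d = std::dynamic_pointer_cast<DoubleType>(v)) {
        return "double:" + std::to_string(d->value);
    }
    if (auto m = std::dynamic_pointer_cast<MapType>(v)) {
        return "map:" + std::to_string(m->value.size());
    }
    return "unknown";  // null pointer or a subclass this code does not know
}
```

Each element carries its own type, so no separate type array is needed; the price is one heap-allocated object and one runtime type check per value.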

If you want to encode/decode the values by hand, then you should use a sequence<byte>, not a string.

brian · April 2004

Thanks for the reply Marc.

I agree that I would like to avoid the overhead of using classes in this situation.

Why do you say that we should pass these as bytes instead of strings?

Is there any direct support in Ice for encoding and decoding sequences of bytes? (The only thing I found in the manual is the IceUtil::Base64 class.) We would have to be able to pass different data types between Windows and Unix, C++ and Java.

Brian

mes · April 2004

Hi Brian,

Marc recommended a byte sequence over a string because a byte sequence is more compact and easier to encode and decode.

You can encode to a byte sequence, but you'll have to use an undocumented class (IceInternal::BasicStream). You can find examples of this in the Freeze implementation.

Of course, the decoder will need some way of determining the next type to be decoded from the stream. In other words, each byte sequence must have a fixed format, or you must embed some type identifier in the sequence in order to signal the decoder about the next type.
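
A minimal sketch of the embedded type identifier idea, written against the standard library only. It does not use BasicStream; the tag values, the ByteSeq alias, and the helper names are all hypothetical, and the fixed little-endian byte order is an assumption chosen so that both ends agree regardless of platform.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

using ByteSeq = std::vector<std::uint8_t>;

// Hypothetical one-byte tags; the numeric values are arbitrary.
enum Tag : std::uint8_t { TagInt = 0, TagDouble = 1 };

static_assert(sizeof(double) == 8, "sketch assumes 64-bit doubles");

// Appends the low n bytes of bits, least significant byte first.
void putLE(ByteSeq& out, std::uint64_t bits, int n) {
    for (int k = 0; k < n; ++k) {
        out.push_back(static_cast<std::uint8_t>(bits >> (8 * k)));
    }
}

// Reads n bytes starting at pos, least significant byte first; advances pos.
std::uint64_t getLE(const ByteSeq& in, std::size_t& pos, int n) {
    if (pos > in.size() || in.size() - pos < static_cast<std::size_t>(n)) {
        throw std::runtime_error("truncated input");
    }
    std::uint64_t bits = 0;
    for (int k = 0; k < n; ++k) {
        bits |= static_cast<std::uint64_t>(in[pos++]) << (8 * k);
    }
    return bits;
}

void writeInt(ByteSeq& out, std::int32_t v) {
    out.push_back(TagInt);
    putLE(out, static_cast<std::uint32_t>(v), 4);
}

void writeDouble(ByteSeq& out, double v) {
    std::uint64_t bits;
    std::memcpy(&bits, &v, sizeof bits);  // copy the raw bit pattern
    out.push_back(TagDouble);
    putLE(out, bits, 8);
}

// Decodes every tagged value and returns the sum, so there is
// something observable to check.
double sumAll(const ByteSeq& in) {
    double total = 0.0;
    std::size_t pos = 0;
    while (pos < in.size()) {
        switch (in[pos++]) {  // the tag tells the decoder what follows
        case TagInt: {
            auto raw = static_cast<std::uint32_t>(getLE(in, pos, 4));
            total += static_cast<std::int32_t>(raw);
            break;
        }
        case TagDouble: {
            std::uint64_t bits = getLE(in, pos, 8);
            double d;
            std::memcpy(&d, &bits, sizeof d);
            total += d;
            break;
        }
        default:
            throw std::runtime_error("unknown tag");
        }
    }
    return total;
}
```

Writing an int and then a double this way yields 14 bytes: two tag bytes plus 4 and 8 bytes of payload.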

Are you so concerned about overhead that you're willing to go to these lengths? Perhaps you should try the technique Marc described first?

Take care,
- Mark

brian · May 2004

Thanks for the reply, Mark.

I do not want to use classes for the reason that Marc pointed out -- overhead.

I understand what you say about not using strings for compactness reasons, but if I am converting doubles and ints to strings and then converting them back, it seems like that "encoding and decoding" is far more straightforward than trying to use BasicStream, n'est-ce pas? Although maybe that is a naive question: when converting between different architectures like Windows and Unix, do you need to swap bytes or something when converting a string to a double or int when "unmarshalling" it?

The other type we would pass is Map, which would just be key/value pairs that are always strings.

Anyway, I'd love to get your feedback on a couple of ways I was thinking of doing this. (I've looked at the source for BasicStream, Buffer, and EvictorI.cpp in Freeze and tried to glom as much as possible without having documentation... )

Method 1: (passing bytes -- more efficient?)

in Slice:

enum DataType { Integer, Double, Map };

struct TypedBytes
{
    DataType dataType;
    Ice::ByteSeq dataBytes;
};

In C++ Server:

Ice::CommunicatorPtr communicator = Ice::Current.adapter.getCommunicator();
IceInternal::InstancePtr instancePtr = IceInternal::getInstance( communicator );
IceInternal::BasicStream basicStream( instancePtr.get() );
TypedBytes typedBytes;

....

basicStream.b.clear();

switch ( native type )

case double:
double value = ...
typedBytes.dataType = DataType.Double;
basicStream.write( value );
typedBytes.dataBytes = basicStream.b;
.....

In Java Client:

TypedBytesHolder typedBytes = ....
switch ( typedBytes.DataType )

case Double:
basicStream.read( (double)typedBytes.dataBytes.value )
double value = (double)typedBytes.dataBytes.value;

I have a feeling I am way off base on usage of BasicStream here, but I gave it my best shot.

Method 2: (passing strings -- easier implementation?)

in Slice:

enum DataType { Integer, Double, Map };

struct TypedBytes
{
    DataType dataType;
    Ice::StringSeq dataStrings; // no type conversion needed for generating key/value Map entries
};

In C++ Server:

case Double:
double value = ...
std::ostringstream os;
os << value;
typedBytes.dataStrings.push_back( os.str() );

In Java Client:

case Double:
Double.valueOf( typedBytes.dataStrings[0] )

Don't want you to write code for me, but just wondering what you thought of these implementations and if they are even on the right path.

Thanks again,

Brian

mes · May 2004

Hi Brian,

Ah, now I understand what you're thinking. I was under the impression, and probably so was Marc, that you wanted to send multiple values in a single string or byte sequence, in which case the byte sequence has several advantages.

However, in the example you provided, it does seem like your second solution is easier, and also avoids reliance on undocumented Ice internals. As long as you can live with the potential loss of precision when transmitting doubles, this looks like the simpler approach.

Another approach that avoids the string conversion is to use a struct:

struct Value
{
    DataType type;
    int i;
    double d;
    StringMap m;
};
sequence<Value> ValueSeq;

The type member indicates which data member is used. Of course, there is still some overhead:

9 bytes wasted for an int
5 bytes wasted for a double
12 bytes wasted for a map

Hope that helps,
Mark

brian · May 2004

Sorry for the confusion, Mark. I should have articulated the problem better.

Why is there a loss of precision when encoding a double into a string and then decoding back into a double? (We are doing stock price manipulation, not rocket trajectories so we are probably okay with 5 or 6 digits to the right of the radix. We actually might go to regular floats. )

Brian

brian · May 2004

Hi Mark,

Thinking more about your suggestion of a mixed type struct...

Are you suggesting that because you think the cost of encoding/decoding doubles and int in/out of strings is more than the cost of sending the extra bytes in your solution? I know it depends on a number of factors, but just generally speaking.

Thanks again,

Brian

P.S. There is another reply I sent previous to this one.

mes · May 2004

Originally posted by brian
Why is there a loss of precision when encoding a double into a string and then decoding back into a double? (We are doing stock price manipulation, not rocket trajectories so we are probably okay with 5 or 6 digits to the right of the radix. We actually might go to regular floats. )

If you have a process that supplies enough precision for your needs, then that's fine with me. Personally I prefer the binary approach, just to know that all the bits are getting there.
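
To illustrate where the loss can come from on the C++ side: a default-constructed std::ostringstream formats a double with 6 significant digits, so anything beyond that is dropped before the string is ever sent. The sketch below contrasts that with asking for max_digits10 digits, which is enough for a double to survive a round trip through text. The helper names are made up for this example.

```cpp
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>

// Formats with the stream's default precision (6 significant digits).
std::string toStringDefault(double v) {
    std::ostringstream os;
    os << v;
    return os.str();
}

// Formats with max_digits10 significant digits (17 for an IEEE 754 double),
// enough for the text to parse back to the identical value.
std::string toStringExact(double v) {
    std::ostringstream os;
    os << std::setprecision(std::numeric_limits<double>::max_digits10) << v;
    return os.str();
}

// Parses the text form back into a double.
double parse(const std::string& s) {
    return std::stod(s);
}
```

With a price such as 1234.56789, toStringDefault yields "1234.57", so a requirement of 5 or 6 digits to the right of the radix is not met unless the precision is raised explicitly.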

Originally posted by brian
Thinking more about your suggestion of a mixed type struct...

Are you suggesting that because you think the cost of encoding/decoding doubles and int in/out of strings is more than the cost of sending the extra bytes in your solution? I know it depends on a number of factors, but just generally speaking.

I doubt that the string conversion would be much of a burden, unless the volume was large.

Here's a similar strategy that reduces the wasted space of my previous example:

sequence<int> IntSeq;
sequence<double> DoubleSeq;

struct Value
{
    DataType type;
    IntSeq i;
    DoubleSeq d;
    StringMap m;
};
sequence<Value> ValueSeq;

If you can live with the use of a sequence when you only need to transfer a single integer or double, then this strategy results in the following overhead:

3 bytes wasted for an int
3 bytes wasted for a double
2 bytes wasted for a map
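
One way to arrive at these figures, and at the ones for the earlier struct, is sketched below. It assumes the usual encoded sizes (4 bytes for an int, 8 for a double, and a single byte for the element count of a small sequence or dictionary) and does not count the DataType member as waste. That reading reproduces the numbers quoted in this thread, but the constant names and the breakdown are an interpretation, not something taken from the Ice sources.

```cpp
#include <cstddef>

// Assumed encoded sizes in bytes.
constexpr std::size_t kInt = 4;
constexpr std::size_t kDouble = 8;
constexpr std::size_t kCount = 1;  // element count of a small sequence or map

// First layout: DataType + int + double + StringMap.
constexpr std::size_t wasteIntV1 = kDouble + kCount;  // unused double, empty map
constexpr std::size_t wasteDoubleV1 = kInt + kCount;  // unused int, empty map
constexpr std::size_t wasteMapV1 = kInt + kDouble;    // unused int and double

// Second layout: DataType + IntSeq + DoubleSeq + StringMap.
// A used sequence pays for its own count byte; each unused member
// costs one count byte for being empty.
constexpr std::size_t wasteIntV2 = kCount + 2 * kCount;
constexpr std::size_t wasteDoubleV2 = kCount + 2 * kCount;
constexpr std::size_t wasteMapV2 = 2 * kCount;  // the map needs its count anyway

static_assert(wasteIntV1 == 9 && wasteDoubleV1 == 5 && wasteMapV1 == 12,
              "first layout");
static_assert(wasteIntV2 == 3 && wasteDoubleV2 == 3 && wasteMapV2 == 2,
              "second layout");
```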

Mark

mes · May 2004

Actually, you could also eliminate the DataType enumeration from my last example, because the selected member is implied by the non-empty sequence or map. That saves another byte in each Value instance.
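
A small sketch of what the receiver would then do, using a plain C++ stand-in for the struct with the DataType member removed. The Kind enumeration and the kindOf() helper are hypothetical names for this example.

```cpp
#include <map>
#include <string>
#include <vector>

using StringMap = std::map<std::string, std::string>;

// Plain C++ stand-in for the Slice struct without the DataType member.
struct Value {
    std::vector<int> i;
    std::vector<double> d;
    StringMap m;
};

enum class Kind { Integer, Double, Map };

// Infers which member is in use from which container is non-empty.
Kind kindOf(const Value& v) {
    if (!v.i.empty()) return Kind::Integer;
    if (!v.d.empty()) return Kind::Double;
    return Kind::Map;  // also reached when the map itself is empty
}
```

Note the fall-through case: a Value holding an empty map cannot be told apart from a Value holding nothing at all, which is part of what makes this variant less obvious.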

Now it may not be critical that you save every last byte, in which case you'd have to decide whether the savings are worth a less "obvious" solution.

Take care,
- Mark

michi · May 2004

I would strongly recommend going with the class-based solution that Marc recommended. That solution is clean, with strong typing, and expresses the problem domain appropriately. If you go with some trick to encode the values into byte sequences (or other sequences), at the very least, that obscures the issue.

The overhead added by the class-based solution will be unnoticeable in all but the most performance-critical situations. I would convince myself first that my application really is that close to my performance limits before considering other approaches.

The encoding into byte sequences is unlikely to gain you anything whatsoever because, in effect, that bypasses the marshaling/unmarshaling code in Ice, but your application code is highly unlikely to get the marshaling/unmarshaling done any quicker than Ice.

Cheers,

Michi.

Archived
