Is serialization zero copy?

Hi everyone,

In the manual I read that the size limit of an ICE message is 1MB, because a copy of the objects is created in the process of serializing the SLICE objects and sending them over the network.

My question is whether the copy happens during serialization or in the network layer.

This matters to me because I presume there is a way to serialize any SLICE object and write it to a file, and my object is so big that I don't have enough memory for a copy.

All replies are welcome.

Best regards,
Markus

Comments

  • marc (Florida)
    You can configure the size limit to whatever value you wish. See this FAQ:

    http://www.zeroc.com/faq/requestSizeLimit.html

    Ice is not zero-copy. Doing so is difficult and in many cases impossible. (Ice-E has some limited zero-copy functionality, which will also be added to Ice 3.1.)

    If you have to transfer a lot of data, the recommendation is not to transfer the data with a single call. Please see this FAQ for more details:

    http://www.zeroc.com/faq/fileTransfer.html
  • Hi marc,

    thanks for your answer, but I think I have to provide some more details.

    - I found the two FAQs earlier, and they were the reason for my question.

    - I understand that there is a size limit for objects sent over the network in one call and why

    - I understand that zero copy with network transfer is hard to implement and not really needed (at least by me)

    - The reason I came up with this question is that I don't want two different approaches to object serialization in my server (one for objects sent over the network via ICE and one for objects written to a flat file), so I want to (mis)use the ICE serialization capabilities to write an object to a file.

    - I know that ICE provides Freeze, which does almost exactly this, but I think the overhead of storing one of my objects in a Berkeley DB would kill performance, because my objects are large (100MB to 10GB per object) and contain only a few sequences of longs.

    Example:

    module My {

    sequence<long> SequenceLong;

    class MyObject {
    SequenceLong index;
    };

    };

    - Now I create an object of this class and put one billion longs into the sequence. The memory usage should be slightly over 8GB. This object will never be sent over the network anywhere.

    - Then I want to save this object to a flat file. To do that, I would like to use the ICE capabilities to serialize the object to a FileOutputStream (BTW: I hope I'm right that this is possible).

    - And here, finally, is my question: does serializing an ICE object to a FileOutputStream create a copy of the object? This would be a problem, because I can't spare the additional 8GB of memory.

    I hope this helps you understand my problem.

    Once again, all replies are welcome.

    Best regards,
    Markus
  • Hi Markus,

    The answer to your last bullet point (above) is yes: serialization of sequence<long> ultimately occurs in IceInternal::BasicStream::write(const vector<Long>& v) [ICE/C++], which copies directly from the vector's buffer into an internal buffer class (Container), as follows:
    void
    IceInternal::BasicStream::write(const vector<Long>& v)
    {
        Int sz = static_cast<Int>(v.size());
        writeSize(sz);
        if(sz > 0)
        {
            Container::size_type pos = b.size();
            resize(pos + sz * sizeof(Long));
    #ifdef ICE_BIG_ENDIAN
            const Byte* src = reinterpret_cast<const Byte*>(&v[0]) + sizeof(Long) - 1;
            Byte* dest = &(*(b.begin() + pos));
            for(int j = 0 ; j < sz ; ++j)
            {
                *dest++ = *src--;
                *dest++ = *src--;
                *dest++ = *src--;
                *dest++ = *src--;
                *dest++ = *src--;
                *dest++ = *src--;
                *dest++ = *src--;
                *dest++ = *src--;
                src += 2 * sizeof(Long);
            }
    #else
            memcpy(&b[pos], reinterpret_cast<const Byte*>(&v[0]), sz * sizeof(Long));
    #endif
        }
    }
    

    Just offering my opinion here... I'm not sure why you want to use ICE to persist an 8GB internal vector of longs. I'd only model this in Slice if I wanted to transport the vector via a middleware call or wanted Freeze to persist it; there's no other reason to use ICE for this. Since your serialization is so basic (iterating over a vector of longs), you could get zero-copy by hand-rolling your own in this case. I'd be sure to flush often to ensure that all 8GB is committed properly, though. I have a similar need, and I store a 64-bit base value plus a vector of 8- to 16-bit offsets from it, which reduces disk space by more than 80% on average. That is perhaps another reason hand-rolling your own might be beneficial in your case.
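
    To make the hand-rolled idea concrete, here is a minimal sketch of writing a vector of longs straight from its own buffer, a chunk at a time, flushing after each chunk. It is only an illustration under stated assumptions: writeLongs, readLongs, roundTripDemo and the chunkElems parameter are made-up names, not part of ICE; the file format (a 64-bit count followed by raw elements in host byte order) is invented for this example; and the stream may still copy small pieces into its own internal buffer, so "no second full-size copy" is the accurate claim rather than strict zero-copy.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical helper, not part of ICE: writes an element count followed by
// the raw elements, in host byte order, directly from the vector's buffer.
// At most chunkElems elements are handed to the stream per call, and the
// stream is flushed after every chunk.
bool writeLongs(std::ostream& out, const std::vector<std::int64_t>& v,
                std::size_t chunkElems = 1 << 20)
{
    assert(chunkElems > 0);
    const std::uint64_t count = v.size();
    out.write(reinterpret_cast<const char*>(&count), sizeof(count));
    for (std::size_t pos = 0; pos < v.size() && out; pos += chunkElems)
    {
        const std::size_t n = std::min(chunkElems, v.size() - pos);
        out.write(reinterpret_cast<const char*>(v.data() + pos),
                  static_cast<std::streamsize>(n * sizeof(std::int64_t)));
        out.flush();
    }
    return static_cast<bool>(out);
}

// Matching reader (also hypothetical): reads straight into the destination
// vector, so no intermediate buffer is allocated here either.
bool readLongs(std::istream& in, std::vector<std::int64_t>& v)
{
    std::uint64_t count = 0;
    if (!in.read(reinterpret_cast<char*>(&count), sizeof(count)))
        return false;
    v.resize(static_cast<std::size_t>(count));
    if (count == 0)
        return true;
    in.read(reinterpret_cast<char*>(v.data()),
            static_cast<std::streamsize>(count * sizeof(std::int64_t)));
    return static_cast<bool>(in);
}

// Usage sketch: round-trip a small vector through an in-memory stream.
bool roundTripDemo()
{
    const std::vector<std::int64_t> original{1, -2, 3, 4000000000LL, -5};
    std::stringstream buffer(std::ios::in | std::ios::out | std::ios::binary);
    std::vector<std::int64_t> restored;
    return writeLongs(buffer, original, 2) && readLongs(buffer, restored) &&
           restored == original;
}
```

    For a real file you would pass a std::ofstream opened with std::ios::binary instead of the string stream. Note that flush() only pushes the data to the operating system; whether it has reached the disk is a separate question.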

    HTH
  • matthew (NL, Canada)
    Not to mention the cost of byte swapping if you happen to be on a big endian machine.

    Ultimately it comes down to what you want to do with this data and what kind of guarantees you want as to the reliability and availability of the data.
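
    On the byte-swapping cost mentioned above: a hand-rolled writer that wants a fixed on-disk byte order has to touch every byte of every element, but it can do so through a small scratch buffer rather than a full-size copy. The sketch below is an assumption-laden illustration, not ICE code: writeLongsLE, encodeLE and chunkElems are hypothetical names, and little-endian is chosen as the file order simply because it matches the non-swapping branch of the code quoted earlier in the thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: writes v as little-endian 64-bit values regardless of
// the host's byte order. Extra memory is bounded by chunkElems * 8 bytes.
bool writeLongsLE(std::ostream& out, const std::vector<std::int64_t>& v,
                  std::size_t chunkElems = 4096)
{
    assert(chunkElems > 0);
    std::vector<unsigned char> scratch(chunkElems * 8);
    for (std::size_t pos = 0; pos < v.size() && out; pos += chunkElems)
    {
        const std::size_t n = std::min(chunkElems, v.size() - pos);
        unsigned char* dest = scratch.data();
        for (std::size_t i = 0; i < n; ++i)
        {
            // Shifting extracts bytes by value, so this loop behaves the
            // same on little-endian and big-endian hosts.
            const std::uint64_t x = static_cast<std::uint64_t>(v[pos + i]);
            for (int shift = 0; shift < 64; shift += 8)
                *dest++ = static_cast<unsigned char>(x >> shift);
        }
        out.write(reinterpret_cast<const char*>(scratch.data()),
                  static_cast<std::streamsize>(n * 8));
    }
    return static_cast<bool>(out);
}

// Usage sketch: encode a small vector into an in-memory byte string.
std::string encodeLE(const std::vector<std::int64_t>& v)
{
    std::ostringstream out(std::ios::out | std::ios::binary);
    writeLongsLE(out, v, 2);
    return out.str();
}
```

    On a little-endian host this does more work than a plain write from the vector's buffer, which is the trade-off: portability of the file against per-element cost.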