About Ice.MessageSizeMax...

Hello all,

I have a question about Ice.MessageSizeMax.

If I set the limit to, say, 1 GiB (Ice.MessageSizeMax=1048576 in ice.config, since the property is expressed in KiB), when a server is about to send a message, does it internally allocate a buffer sized to the actual message (up to the 1 GiB limit), or does it always allocate a full 1 GiB buffer?

Reading the source code, I've seen that the upper limit is hard-coded to 2 GiB. Is that correct?

Comments

  • bernard (Jupiter, FL)
    Hi Hanz,

    Ice.MessageSizeMax is just a safety mechanism, a way to prevent a rogue client or server from sending your program a message that requires a huge memory allocation. If this doesn't matter in your environment, you can disable this feature by setting Ice.MessageSizeMax to a large value.

    Ice always allocates marshaling buffers as needed, and the value of Ice.MessageSizeMax doesn't affect how these buffers are allocated (until the MessageSizeMax is reached).

    In the Ice protocol, the size of each message is sent as a Slice int (a signed 32-bit integer), so the upper limit of MessageSizeMax is 2^31 bytes, i.e. 2 GiB.
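
    If it helps, here is a minimal sketch of setting the property programmatically (it can equally go in your ice.config file):

    #include <Ice/Ice.h>

    int main(int argc, char* argv[])
    {
        Ice::InitializationData initData;
        initData.properties = Ice::createProperties(argc, argv);
        // The property is expressed in KiB: 1048576 KiB == 1 GiB.
        initData.properties->setProperty("Ice.MessageSizeMax", "1048576");
        Ice::CommunicatorPtr communicator = Ice::initialize(argc, argv, initData);
        // ...
        communicator->destroy();
        return 0;
    }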

    Cheers,
    Bernard
  • Thanks a lot for your answer, which is very accurate.

    I am still playing with Ice as I want to understand how the memory usage of Ice works.

    I've seen in the documentation that there is a garbage collector. What does it "collect"? I work in C++ and most of the memory seems to be smart-pointer managed, ergo I would think that only identities are collected.

    Which brings me to my second point. I've made a little test program (well, actually three). I send a map filled with longs and strings, or just longs (for example std::map<long, std::string> or std::map<long, long>), filled with pseudo-random data.

    I send these maps to a second program, through Ice. Let's call it the "data server". It hosts an Ice object with methods to receive the maps.

    Basically you create your object proxy and fill the maps through various methods.

    a->setmap1(map1);
    a->setmap2(map2);

    etc.

    Then I spawn "stressers" to query the content of the maps. They simply "get" the Ice object and perform a "getmap" on it.

    In this scenario we have several executables performing a very large number of Ice queries on the same object. Each query results in a map being created, filled with the values, and transmitted over the network.
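
    The Slice for it looks essentially like this (simplified; names are illustrative, and Slice long is 64-bit):

    dictionary<long, string> LongStringMap;
    dictionary<long, long> LongLongMap;

    interface DataServer
    {
        void setmap1(LongStringMap m);
        void setmap2(LongLongMap m);
        LongStringMap getmap1();
        LongLongMap getmap2();
    };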

    Everything works fine. (So why don't I just shut up? :) ) My only concern is that I have witnessed very high virtual memory usage with Sysinternals Process Explorer (the regular Task Manager shows nothing). By very high I mean 1.5+ GiB when the whole object is only 100 MiB. If I hit the 2 GiB limit, Ice dies saying it could not allocate memory in Buffer.cpp (which makes sense: on 32-bit Windows you cannot have more than 2 GiB of user address space per process).

    The memory usage does not look like a leak on my side. The virtual memory usage "jumps" from 520 MiB to 1 GiB at some point during the execution.

    The working set remains within very reasonable values (around 300 MiB). That means there is a large amount of virtual memory that seems to be uncommitted.

    Is there some kind of mechanism within Ice that would explain such high virtual memory usage?
  • > I've seen in the documentation that there is a garbage collector. What does it "collect"? I work in C++ and most of the memory seems to be smart-pointer managed, ergo I would think that only identities are collected.

    The garbage collector specifically deals only with classes that have cyclic references. It ensures that, if Ice marshals a graph of class instances that contains one or more cycles, the cycles are eventually reclaimed when there are no more references to the graph from the application code.

    Other than for such graphs, all memory is reclaimed via reference counting.
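
    For instance, given a Slice class along the lines of "class Node { Node next; };" (hypothetical), the generated C++ handles allow a cycle that reference counting alone can never reclaim:

    NodePtr a = new Node;
    NodePtr b = new Node;
    a->next = b;
    b->next = a; // cycle: the two reference counts never drop to zero
    a = 0;
    b = 0;       // the graph is now unreachable; only the collector can reclaim it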

    You can read about the garbage collector in more detail in the article "Who's Counting?" in Issue 25 of Connections.

    > My only concern is that I have witnessed very high virtual memory usage with Sysinternals Process Explorer (the regular Task Manager shows nothing). By very high I mean 1.5+ GiB when the whole object is only 100 MiB.

    So, each map instance contains 100 MB of data?

    During marshaling and unmarshaling, Ice temporarily requires twice the memory contained in the map. For example, when marshaling (or unmarshaling) a 100 MB map, the map is copied into marshaling buffers for transmission/receipt. Once a request has been sent/received, that memory is reclaimed immediately.
    > Is there some kind of mechanism within Ice that would explain such high virtual memory usage?

    There is nothing in Ice that would specifically cause high memory consumption. It is difficult to say more without seeing the test program you used. If you could post the code and let us know exactly how you ran client(s) and server(s) and what property settings you used, we can look into this further.

    Cheers,

    Michi.
  • Thanks a lot for your input, these are great answers.

    I think I know what's going on.

    If I have 16 clients that each require, say, 100 MiB of data, and Ice is able to run 4 server threads, then at any given time up to 4 requests are in flight, each holding about 200 MiB (the map itself plus its marshaling buffer) => 800 MiB. The heap manager heuristics of Windows will grow the heap to a very large size and AFAIK never shrink it. In ice.config the buffer size is 50 MiB.

    This is clearly a problem. I'm going to run more tests and post my demo code.

    Solution: write your own allocator within Ice that calls VirtualAlloc instead of operator new (which ends up calling HeapAlloc).
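
    A minimal sketch of that idea (error handling omitted; VirtualAlloc works on whole pages and returns them to the OS on free, so it cannot fragment the process heap):

    #include <windows.h>

    // Hypothetical buffer allocator that bypasses the process heap entirely.
    void* allocBuffer(size_t n)
    {
        return ::VirtualAlloc(0, n, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    }

    void freeBuffer(void* p)
    {
        ::VirtualFree(p, 0, MEM_RELEASE); // pages go straight back to the OS
    }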

    I've read in a Connections issue that Ice does zero-copy on IA32. When is it active?
  • dwayne (St. John's, Newfoundland)
    Hi,

    In order to have server-side zero-copy you need to use the array or range custom sequence mapping for your sequences. For example:
    sequence<byte> ByteSeq;

    interface foo
    {
        void bar(["cpp:array"] ByteSeq seq);
    };


    See Section 6.7.4 of the manual for more info.
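
    With that metadata the servant receives the sequence as a pair of pointers into Ice's receive buffer instead of a std::vector. Roughly (a sketch; details depend on your Slice):

    // Servant-side signature produced by ["cpp:array"]; the pair points
    // directly into Ice's unmarshaling buffer, so no copy is made.
    virtual void bar(const std::pair<const Ice::Byte*, const Ice::Byte*>& seq,
                     const Ice::Current&)
    {
        size_t numBytes = seq.second - seq.first;
        // Use the data here; the buffer is only valid during the dispatch.
    }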

    Dwayne
  • If you are measuring your process size by looking at the memory reported by Task Manager, you are going to get misleading figures. To get an accurate measurement of the working set size, you may find this FAQ useful.

    Cheers,

    Michi.
  • Thanks guys, it's great to see you are actually reading this and giving great feedback.

    I use Sysinternals Process Explorer, which gives accurate results. I wrote several drivers for Windows NT; don't worry, I know Task Manager is a liar. ;)

    But when the trace from Ice says "cannot allocate memory in Buffer.cpp", I think it's quite self-explanatory...

    I've read the source code of Ice. If I'm correct, Ice only uses "malloc". I think it would be better either to write your own allocator using VirtualAlloc, or to create a heap for each thread with HeapCreate (with the low-fragmentation bit, which is disabled by default). Granted, this would require some #ifdefs here and there.

    If all the threads use the same heap, the OS might decide to grow this heap too much (due to fragmentation or intensive requests), leaving too little memory for other purposes.

    In my test I have 16 clients asking the server for a lot of data, and the server runs on a quad core.

    I'll do my best to create a test program; I do these Ice experiments in my free time, which is dwindling due to lots of requests from my customers. ;)
  • I've done some more tests.

    This is a Windows/Ice bug. On one hand you can say it's a Windows bug, on the other hand you can say it's an Ice bug. :)

    On a 32-bit platform, frequent calls to ::HeapAlloc may result in a very large heap, and the virtual address space may be exhausted. ::HeapAlloc is what ends up being called when you use ::malloc or new.

    When serializing/deserializing a very big structure (or performing many small serializations/deserializations), the many resulting new/delete calls make the heap grow. You can see this because Process Explorer shows large virtual memory usage but a relatively low amount of committed memory.

    There is a thread here about a program exhibiting the same issues.

    Solutions
    • Rewrite parts of the code to avoid "many small news".
    • Write your own allocator
    • Microsoft's solution: a temporary heap that is trashed once the work is done (the simplest one, I think). For example, when you deserialize your structure, you create a heap from which you allocate everything you need; once done, you trash it (a sketch follows this list).
    • For Vista there is, if I'm correct, a flag that prevents the problem from happening.
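
    A sketch of that temporary-heap idea (hypothetical; checks omitted):

    #include <windows.h>

    // "Scratch heap" for one deserialization pass: every allocation goes
    // to a private heap that is destroyed wholesale at the end, so nothing
    // can fragment the main process heap.
    HANDLE scratch = ::HeapCreate(0, 0, 0);        // growable private heap

    void* p = ::HeapAlloc(scratch, 0, 100 * 1024); // allocate as needed
    // ... deserialize into buffers allocated from 'scratch' ...

    ::HeapDestroy(scratch);                        // frees everything at once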

    On 64-bit Windows the problem is not present because the address space is much larger than the memory you actually use.

    This problem might also be present on other operating systems.
  • Thanks for the update. Looking into this issue is on our to-do list for the next release.

    Cheers,

    Michi.
  • That's great to hear!

    I had another idea: I'm trying to switch all the heaps to the "low fragmentation heap". I'll let you know whether this improves the situation. In theory memory usage will be a bit higher at the beginning, but it should grow more slowly.

    The code for switching to the low-fragmentation heap might look like this (basic checks included; add more as needed):
    #include <windows.h>
    #include <vector>

    // Switch every heap in the process to the Low Fragmentation Heap (LFH).
    void enableLowFragmentationHeaps()
    {
        // A first call with no buffer returns the number of process heaps.
        DWORD count = ::GetProcessHeaps(0, 0);
        std::vector<HANDLE> heaps(count);
        count = ::GetProcessHeaps(static_cast<DWORD>(heaps.size()), &heaps[0]);

        ULONG heapFragValue = 2; // 2 == enable the LFH
        for(DWORD i = 0; i < count && i < heaps.size(); ++i)
        {
            // May fail, e.g. under a debugger or for non-serialized heaps.
            ::HeapSetInformation(heaps[i], HeapCompatibilityInformation,
                                 &heapFragValue, sizeof(heapFragValue));
        }
    }

  • xdm (La Coruña, Spain)
    Hi Hanz,

    Maybe you are interested in these malloc replacements; they don't require changes to your code or to the Ice code:

    The Hoard Memory Allocator
    google-perftools (Google Performance Tools)
  • Thanks a lot. I've had a look at Hoard and it might indeed solve the problem. It's quite an interesting piece of software; I'll give it a shot if I have time.
  • I am happy to inform you that using the Low Fragmentation Heap (LFH) solves the problem.

    The code I posted (or an equivalent) has to be called once all the heaps of the calling process have been created. The reduced fragmentation will lower the pressure on virtual memory.

    If your memory usage remains dangerously close to 2 GiB, you might try another allocator such as Hoard, as suggested by xdm. Otherwise the conclusion is simply that you need to switch to 64-bit... (I really don't recommend using the /3GB switch.)

    From a middleware point of view, I think the best course of action remains to write a custom allocator, for example based on a short-lived heap (for serialization/deserialization).

    Hope this helps.