
IceStorm questions

Hello!
I have been tasked with evaluating the features offered by IceStorm. ( :confused: )

My working environment is the following: one (or more) publishers must send a sequence of control messages to a number of subscribers. The sequence must be received in the same order in which it was sent.

Looking at the results of the first tests, it seems that the behaviour of IceStorm is characterized by the following points:

1) If the parameter "Ice.ThreadPool.Server.SizeMax" is equal to one, messages are delivered to subscribers in the right (publishing) order, but the maximum length of the server's internal "queue" seems to be really short and only a few hundred messages can be kept in memory: this blocks the publisher if the subscriber is a slow receiver.

Otherwise, if we set "Ice.ThreadPool.Server.SizeMax" to a value greater than one, there are no problems related to the queue size, but the received messages are no longer in publishing order.

2) It seems that under certain conditions, if some of the tasks connected to an IceStorm server die, the server itself "hangs" and stops doing its job. So an external process that wants to monitor such a task cannot tell whether it is working fine or not.

3) When we kill the publisher and subscriber processes during their execution, the resident memory footprint of the IceStorm server does not seem to decrease: it keeps up to hundreds of megabytes of OS memory without releasing it.

I would like to know whether we have understood the behaviour of IceStorm correctly, and whether it is possible to make its internal queue bigger in the single-threaded case: that way we could guarantee that ordering is preserved without needing a multithreaded environment.



Best regards and thanks in advance,
Mandi, Fabio

Comments

  • benoit
    benoit Rennes, France
    Welcome to the forums!

    Regarding 1), the default delivery mode for IceStorm is to use oneway messages. Oneway messages are not guaranteed to be delivered in order unless you set the thread pool size to 1 for the IceStorm service and for your subscribers. We've recently added two additional delivery modes: "twoway" and "twoway ordered". You should use "twoway ordered" when subscribing; you can specify this with the "reliability" QoS:
        IceStorm::QoS qos;
        qos["reliability"] = "twoway ordered";
    

    Note that this feature isn't documented yet. It will be documented in the next release (we might also change the name of the reliability QoS and the default delivery mode).

    For 2), if for some reason a subscriber hangs or is too slow to dispatch incoming messages from IceStorm, the IceStorm publisher will eventually block waiting for this subscriber. You can use timeouts to ensure that the service won't block for too long when a subscriber hangs or crashes. For example, you can set this timeout (in milliseconds) in the service configuration with: Ice.Override.Timeout=5000
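The blocking behaviour described here, and the role of a timeout in bounding it, can be pictured with a small bounded buffer between a fast producer and a slow consumer. This is a minimal sketch under stated assumptions, not IceStorm's internal design: the thread does not say where the buffering actually happens, and `BoundedQueue` and `pushFor` are hypothetical names. Ice's own timeout (Ice.Override.Timeout, in milliseconds) applies to network invocations rather than to a queue like this one.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>

// Toy model (not IceStorm internals): a bounded FIFO between a publisher and
// a slow subscriber. When the buffer is full the publisher has to wait; a
// timeout bounds that wait instead of blocking forever.
template <typename T>
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) { assert(capacity > 0); }

    // Waits at most `timeout` for free space; returns false if none appeared.
    bool pushFor(T value, std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        if (!notFull_.wait_for(lock, timeout, [this] { return items_.size() < capacity_; }))
            return false; // consumer too slow or hung: give up instead of blocking
        items_.push_back(std::move(value));
        notEmpty_.notify_one();
        return true;
    }

    // Blocks until an element is available, then removes it in FIFO order.
    T pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        notEmpty_.wait(lock, [this] { return !items_.empty(); });
        T value = std::move(items_.front());
        items_.pop_front();
        notFull_.notify_one();
        return value;
    }

private:
    const std::size_t capacity_;
    std::deque<T> items_;
    std::mutex mutex_;
    std::condition_variable notFull_;
    std::condition_variable notEmpty_;
};
```

With a capacity of 1, a second `pushFor` fails once its timeout expires and succeeds again after the consumer pops an element; FIFO order is preserved throughout.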

    For 3), can you check whether this still happens after setting the timeout in the service configuration? The service shouldn't leak, of course, and if it does we'll investigate!

    Hope this answers your question ;)

    Benoit.
  • Still does not work ;P

    Hello Benoit.
    It seems that using the "twoway ordered" reliability mode isn't enough: we have tested it on two systems and things still don't seem to work.


    Here is some additional information about our tests (I'll try not to include unnecessary details), with the help of some code:

    a) I defined the following interface in Slice:
    module sMsgPublisher {

        struct smsg {
            string Data;
            int seqId;
            // some other string stuff here
        };

        interface blackBoard {
            void messageInBlackBoard(smsg mymsg);
        };

    };


    b) I wrote a publisher and a subscriber, both derived from the Ice::Application class.

    c) In the implementation of my subscriber class, blackBoardI, I added the following code:
    void messageInBlackBoard(..., smsg& inMsg) {
        static int check = 0;
        if (check != inMsg.seqNum) cerr << "Bad seq order" << endl;
    }


    Obviously, this code only works correctly when the subscriber is started before the publisher begins publishing and there is only one publisher.

    When the subscriber is slower than the publisher and many messages (thousands) are involved in the transmission, there are ordering problems on the subscriber side: for example, with 1 kB messages, 0.5% of them arrive out of order.
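For a check like this to detect reordering over a whole run, the expected sequence number has to advance after every message; in the snippet above, `check` stays at 0. A minimal sketch of such a checker follows, assuming the sequence number is the `seqId` field declared in the Slice struct (the snippet reads `seqNum`); `SequenceChecker` is a hypothetical helper. Taking the first received number as the starting point also copes with a subscriber that joins after publishing has begun.

```cpp
#include <cassert>

// Hypothetical helper: tracks the next expected sequence number and counts
// messages that arrive out of order. Unlike a comparison against a constant,
// `expected_` advances after every message.
class SequenceChecker {
public:
    // Returns true if seqId is the number we expected next.
    bool accept(int seqId) {
        bool inOrder = first_ || seqId == expected_;
        if (!inOrder) ++errors_;
        first_ = false;
        expected_ = seqId + 1; // resynchronize on whatever actually arrived
        return inOrder;
    }

    int errors() const { return errors_; }

private:
    bool first_ = true; // the first message defines the starting point
    int expected_ = 0;
    int errors_ = 0;
};
```

In a servant, `accept(inMsg.seqId)` would be called from `messageInBlackBoard`. If the server dispatches with more than one thread, those calls can run concurrently, so the checker would also need a mutex.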


    Is this the behaviour you would expect?
    I hope I have explained the problem clearly. Thanks for your quick answer. ;)

    Bye, F.&M.
  • benoit
    benoit Rennes, France
    It's not clear to me what your test is doing. Does it check that every received message has its "seqNum" attribute equal to 0? This will probably not be the case if your subscriber subscribes after the publisher starts (IceStorm doesn't store any messages; your subscriber will receive only messages that arrive after it has subscribed).

    In any case, you shouldn't get out-of-order messages with the "twoway ordered" reliability QoS. If you have a small test case that reproduces the problem, I'd be happy to look into it!

    Benoit.
  • benoit
    benoit Rennes, France
    By the way, we have a test suite for the different reliability QoS settings and for the ordering of received messages with the "twoway ordered" QoS. It's located in the Ice-2.1.0/test/IceStorm/single directory. I've just run it after adding a small delay in the SingleI::event method to simulate a slow subscriber, and it appears to work just fine.

    Also, note that you need Ice-2.1.0 for "twoway ordered" to work (it wasn't implemented in earlier versions).

    Benoit.
  • you said:
    benoit: It's not clear to me what your test is doing. It ensures that the received message always have "seqNum" attribute equal to 0? This will probably not be the case if your subscriber subscribes after the publisher starts (IceStorm doesn't store any messages, your subscriber will receive only messages which arrive after it subscribed).
    Right, sorry: I should have said exactly the opposite. First the subscriber starts and then the publisher goes. :)
    Anyway, I will send you my full code in a few minutes, and I will look at the example you mentioned :)
    Bye Fabio
  • Hello Benoit.
    Here you can find the code which causes problems:

    http://mio.discoremoto.virgilio.it/forice/

    Start the publisher first so that the topic is created in the database.

    Then kill the publisher (restart IceBox if you like) and start the subscriber.

    Notice that even after you disconnect both the publisher and the subscriber, the IceStorm process still uses a lot of memory (I guess this is due to garbage collection).

    If you want to try with different speeds for the subscriber, simply change the usleep value in the implementation blackBoardI.cpp.

    We really hope we have made some mistake in the code, because the ordering problems we get are not deterministic and seem to depend on the system we test on (things seem to be better on a Celeron with Fedora Core 3 than on a Xeon with RH AS 4.0).


    Regards, F & M
  • benoit
    benoit Rennes, France
    Your publisher is publishing events using a oneway proxy. A server (IceStorm in this case) can dispatch oneway requests out of order (see the Ice manual for details). Remove the following code from your publisher:
        if (!pub->ice_isDatagram())
          pub = pub->ice_oneway();
    

    to use a twoway publisher proxy instead. This should hopefully fix the problem!
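Why a oneway publisher proxy can lead to reordering, while a twoway proxy or a single dispatch thread does not, can be shown with a small deterministic model. This is a toy sketch, not Ice code: `completionOrder` is a hypothetical function, each message is assumed to take effect when its dispatch finishes, and each dispatch takes a fixed, known time. With oneways all messages are queued at once, so a long first dispatch can be overtaken by shorter ones on another thread; with twoways the next message is only sent after the previous reply arrives.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Toy model (not Ice code): messages 0..n-1 are sent in order to a server with
// `threads` dispatch threads; message i takes durations[i] time units.
// Returns the ids in the order their dispatches complete.
//   oneway: the sender does not wait, so every message is queued at time 0.
//   twoway: the sender waits for each reply, so message i+1 arrives when i is done.
std::vector<int> completionOrder(const std::vector<int>& durations, int threads, bool twoway) {
    assert(threads > 0);
    // Min-heap of the times at which each dispatch thread becomes free.
    std::priority_queue<int, std::vector<int>, std::greater<int>> freeAt;
    for (int t = 0; t < threads; ++t) freeAt.push(0);

    std::vector<std::pair<int, int>> finished; // (finish time, message id)
    int arrival = 0;                           // time at which the next message arrives
    for (int i = 0; i < static_cast<int>(durations.size()); ++i) {
        int start = std::max(freeAt.top(), arrival); // earliest free thread, once the message is there
        freeAt.pop();
        int end = start + durations[i];
        freeAt.push(end);
        finished.emplace_back(end, i);
        if (twoway) arrival = end; // reply comes back before the next request is sent
    }
    std::sort(finished.begin(), finished.end()); // by finish time, ties by id
    std::vector<int> order;
    for (const auto& f : finished) order.push_back(f.second);
    return order;
}
```

For durations {4, 1, 1, 1}, two dispatch threads with oneways give the completion order 1, 2, 3, 0, whereas one thread with oneways, or two threads with twoways, give 0, 1, 2, 3.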

    Benoit.
    OK, I will try to fix the problem that way, but it seems to contradict the manual, page 1166 (41.5.2).

    On that page, the manual says it is recommended for a publisher to use a oneway proxy because (it says) IceStorm tries to send every published message as fast as possible.

    (I quote the manual: "Therefore, a publisher gains nothing by making a twoway invocation on the publisher object ....")

    Anyway, I will try that fix tomorrow morning.
    You are really fast! Thank you again ;)
    F.
  • marc
    marc Florida
    We will look into the issue with the manual, and make any necessary corrections.

    By the way, you can also run IceStorm in thread-per-connection mode. Then oneways cannot be dispatched out of order, because IceStorm assigns one dedicated thread to each client (= publisher). This assumes that there is only one connection from each publisher to IceStorm, which is usually the case, except if you publish events with different timeouts or different protocols.

    Put this into your IceStorm configuration for testing:

    Ice.ThreadPerConnection=1

    Note that thread-per-connection is not documented yet. This is on our todo list :)
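The guarantee behind thread-per-connection can be modeled with one dedicated thread per connection that handles that connection's messages strictly in sequence. This is a hypothetical sketch (`dispatchPerConnection` is not an Ice API) that assumes one connection per publisher, as the post does: order is preserved per publisher, while messages from different publishers may still interleave.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <thread>
#include <vector>

// Toy model of thread-per-connection dispatch (not Ice code): each publisher
// connection gets one dedicated thread that handles that connection's messages
// one after another, so each publisher's messages are applied in send order,
// even though different publishers are served concurrently.
std::map<int, std::vector<int>> dispatchPerConnection(const std::map<int, std::vector<int>>& sentByPublisher) {
    std::map<int, std::vector<int>> received;
    std::mutex mutex; // guards `received`, which all connection threads share
    std::vector<std::thread> threads;
    for (const auto& entry : sentByPublisher) {
        threads.emplace_back([&, pub = entry.first] {
            for (int seq : sentByPublisher.at(pub)) { // sequential within one connection
                std::lock_guard<std::mutex> lock(mutex);
                received[pub].push_back(seq);
            }
        });
    }
    for (auto& t : threads) t.join();
    return received;
}
```

The price is one thread per connection, so the thread count grows with the number of publishers and subscribers.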
  • better now

    I've tried the last suggestion and it seems that working with twoway is REALLY better (for now I cannot be really sure, but after 12000 messages t). Tomorrow I will set up some intensive tests for both the twoway communication method and the thread-per-connection mode.
    :) Thanks again , Fabio :D
  • marc
    marc Florida
    One drawback of the thread-per-connection method is that IceStorm will allocate one thread per subscriber and one thread per publisher. This can cause scalability issues.
  • Good (but how does the Garbage Collection Work?)

    I've tried both twoway publishing and Ice.ThreadPerConnection on 2 different machines! Now things are really good :) !
    Sending up to 100k messages of 2 kB from a fast publisher to a slow subscriber, no sequence errors were reported ;)

    The only thing that seems a bit strange is the behaviour of the IceBox process hosting the IceStorm service: when both the publisher and the subscriber were disconnected, the memory footprint remained really big (100-300 MB) :eek: .

    This seems OK while they (publisher and subscriber) are connected, but once they no longer exist the process size decreases really slowly, about 200 MB/hour.

    Thanks for the quick and useful advice,
    F. and M.
  • benoit
    benoit Rennes, France
    I'm glad it's working as you expect now, I will look into this memory issue... stay tuned!

    Benoit.
  • benoit
    benoit Rennes, France
    Did you set the timeout for the IceStorm service? You should use timeouts to ensure that Ice will eventually detect the death of some of your subscribers.

    After some extensive testing, I'm pretty sure there are no memory leaks in IceStorm. However, the memory usage pattern of the service under Linux is a bit difficult to explain; I think many factors influence it: IceStorm doesn't immediately release the memory when a subscriber fails, the C++ allocator keeps its own memory pool, the OS manages process memory in its own way, etc. If you try your program on Windows, for example, you'll see that the memory decreases much faster than on Linux.

    In any case, you should see the IceStorm process memory eventually decrease or stabilize at some point (assuming the subscribers can keep up with the flow of incoming messages).

    Benoit.
  • Too Slow ...

    Yes, I've tried different timeout values (from 5 to 1000 seconds).

    In my opinion, the problem is the speed at which IceStorm releases memory after both the subscriber and the publisher have crashed or unsubscribed.

    As I said, my tests were based on a "flood" of messages from a fast publisher to a relatively slow subscriber (the former publishes lots of messages and the latter reads them relatively slowly), and after some time I usually stop the publisher...

    I trust that there are no memory leaks, but in my work environment the IceStorm server should respond faster. Unfortunately, I must use Linux as the host OS and I cannot test it on Windows :|

    However, I am quite sure about this behaviour, because it happens both with the examples I sent you yesterday and with a modified version of the clock example included with Ice (if you want I can send you more sources, but I guess you have observed the phenomenon).

    (Of course, memory management depends on the host OS, because I notice some relevant differences between my 512 MB laptop and the 1 GB Xeon I used for the tests, but both seem slow to deallocate unneeded memory.)

    I apologize for my terrible English ;)
  • benoit
    benoit Rennes, France
    Yes, I've observed this behavior as well.

    I'll see if IceStorm can be a bit more aggressive in reaping dead subscribers, but even if this changes I don't think you'll see the memory decrease instantaneously. I think this is simply out of our hands and depends on the implementation of the C/C++ libraries and on how the OS manages process memory.

    So even if IceStorm deallocates the memory a bit sooner, this doesn't mean that the memory the OS has attributed to the process will be given back immediately (unless perhaps you explicitly call malloc_trim(); you could also add calls to malloc_stats() to convince yourself that the memory is properly freed by the code). I think you have to trust Linux to do the right thing here with respect to process memory management ;).
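Benoit's suggestion can be tried in isolation with a small glibc-specific sketch: allocate many small blocks, free them, print the allocator statistics with `malloc_stats()`, and ask glibc to give free heap memory back to the OS with `malloc_trim(0)`. `freeAndTrim` is a hypothetical helper; on non-glibc systems the calls are compiled out and it returns -1.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>
#if defined(__GLIBC__)
#include <malloc.h> // malloc_stats(), malloc_trim(): glibc extensions
#endif

// Allocates `blocks` blocks of `blockSize` bytes, frees them all, then asks the
// allocator to return free heap memory to the OS.
// Returns malloc_trim's result (1 = memory released, 0 = none), or -1 without glibc.
int freeAndTrim(std::size_t blocks, std::size_t blockSize) {
    std::vector<void*> ptrs;
    ptrs.reserve(blocks);
    for (std::size_t i = 0; i < blocks; ++i) ptrs.push_back(std::malloc(blockSize));
    for (void* p : ptrs) std::free(p); // freed, but possibly still held by the allocator
#if defined(__GLIBC__)
    malloc_stats();        // prints arena statistics to stderr
    return malloc_trim(0); // pad = 0: keep no extra slack at the top of the heap
#else
    return -1;
#endif
}
```

`malloc_trim` returns 1 when some memory was actually released to the system and 0 otherwise; comparing the process size before and after the call shows how much of the retained memory was only held by the allocator.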

    Benoit.
  • xdm
    xdm La Coruña, Spain
    FAQ Linux Memory Management

    I agree with Benoit that this is an issue with Linux memory management.


    This is an overview from the Linux memory management FAQ:

    Overview of memory management

    Traditional Unix tools like 'top' often report a surprisingly small amount of free memory after a system has been running for a while. For instance, after about 3 hours of uptime, the machine I'm writing this on reports under 60 MB of free memory, even though I have 512 MB of RAM on the system. Where does it all go?

    The biggest place it's being used is in the disk cache, which is currently over 290 MB. This is reported by top as "cached". Cached memory is essentially free, in that it can be replaced quickly if a running (or newly starting) program needs the memory.

    The reason Linux uses so much memory for disk cache is because the RAM is wasted if it isn't used. Keeping the cache means that if something needs the same data again, there's a good chance it will still be in the cache in memory. Fetching the information from there is around 1,000 times quicker than getting it from the hard disk. If it's not found in the cache, the hard disk needs to be read anyway, but in that case nothing has been lost in time.

    You can read the whole article here: http://gentoo-wiki.com/FAQ_Linux_Memory_Management
  • I don't agree with xdm's theory

    I know that the Linux kernel decides when it is best to reclaim the memory given to each process. That depends on its internal memory-handling policies, which can be more or less aggressive, but it should not be related to the behaviour we are observing.

    For example, if my own server written in C holds 200 MB of RAM and then releases 100 MB, there is a short transitory period, but soon I see 100 MB as free, even if the OS decides to keep them in its cache.

    I'm really sure about this, because many of my programs behave this way.

    This does not depend on the process. User space is not kernel space.
  • A general-purpose memory manager cannot, in general, return memory to the OS even if the application frees the memory. That is because processes get memory under UNIX by making a brk() system call, which adjusts the end of the data segment. (sbrk() is a C library function that calls brk().) So, if any memory is given back to the OS, it can be given back only if that memory is at the end of the data segment. However, due to memory fragmentation, freed memory is usually not at the end of the data segment, but somewhere in the middle. As long as any used memory follows unused memory, none of the memory can be removed from a process's virtual address space. In addition, at least some older kernels implement brk() as a no-op if it is used to reduce the size of the data segment, and adjust a process's segment register only if brk() is called to increase the size of the data segment. (SVR4 on MIPS processors used to do this.)

    Regardless, the net effect is that, if you look at the process size with ps or top or a similar tool, you will generally see the process size increase, but not decrease over time.
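The effect of fragmentation on a brk()-based heap can be captured in a toy model: blocks sit in address order, and only the free blocks at the very end of the data segment can be returned by lowering the break. `Block` and `releasableByBrk` are hypothetical; real allocators such as glibc's also serve large requests with mmap(), and such mappings can be unmapped independently of the break.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of a brk()-managed heap (hypothetical, not a real allocator):
// blocks are listed in address order, from the start of the heap to the break.
struct Block {
    std::size_t size;
    bool inUse;
};

// Bytes that could be handed back by lowering the break: only the run of free
// blocks at the very end of the data segment qualifies.
std::size_t releasableByBrk(const std::vector<Block>& heap) {
    std::size_t bytes = 0;
    for (auto it = heap.rbegin(); it != heap.rend() && !it->inUse; ++it)
        bytes += it->size; // trailing free block: can be trimmed
    return bytes;          // the last used block stops the trim
}
```

A single live block after a large free region is enough to keep all of that region in the process's address space.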

    However, with demand-paged operating systems, physical memory will be reused by other processes as soon as an unused portion of memory spans an entire page. So, if you free up, say, 40 kbytes of memory, your process will stop using that memory and, if there is any demand for physical memory by other processes, your 40 kbytes worth of memory will be given to other processes quite quickly.
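The page-granularity point can be made concrete with a little arithmetic: only pages that lie entirely inside a freed range stop holding live data. The sketch assumes a 4096-byte page (common on x86 Linux), and `wholePagesFreed` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>

// Counts the whole pages inside the freed byte range [start, start + length).
// Partial pages at either end still share a page with live data, so they keep
// occupying physical memory. The default page size of 4096 bytes is an assumption.
std::size_t wholePagesFreed(std::size_t start, std::size_t length, std::size_t pageSize = 4096) {
    assert(pageSize > 0);
    std::size_t firstBoundary = (start + pageSize - 1) / pageSize * pageSize; // round start up
    std::size_t lastBoundary = (start + length) / pageSize * pageSize;        // round end down
    return lastBoundary > firstBoundary ? (lastBoundary - firstBoundary) / pageSize : 0;
}
```

Freeing 40 KiB (40960 bytes) that starts on a page boundary frees 10 whole pages; if the same range starts 100 bytes into a page, only 9 are whole, and a range smaller than a page may free none.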

    This means that, on a demand-paged OS, such as Linux, the working set size of a process (the amount of memory it has used recently) is more important than the size of the process's virtual memory. And, more importantly, the size of virtual memory of a process is a poor indicator of the process's actual memory use because much of the virtual memory may never have been faulted in and consume no physical memory whatsoever (such as when a process allocates a large sparse array). And, even though virtual memory, once freed, tends to stay within a process's address space, it may also not consume physical memory.
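On Linux, the two sizes contrasted here can be read from /proc/self/statm, whose first two fields are the total program size and the resident set size, both in pages. This is a minimal Linux-only sketch with hypothetical names (`MemUsage`, `readStatm`); a large allocation that is never touched typically raises the first number much more than the second.

```cpp
#include <cassert>
#include <fstream>

// Sizes of the current process, in pages, as reported by /proc/self/statm.
struct MemUsage {
    long virtualPages = 0;  // field 1: total program size (virtual)
    long residentPages = 0; // field 2: resident set size (physical pages in use)
};

// Returns false where /proc is unavailable (e.g. not Linux).
bool readStatm(MemUsage& usage) {
    std::ifstream statm("/proc/self/statm");
    return static_cast<bool>(statm >> usage.virtualPages >> usage.residentPages);
}
```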

    The main implication of seeing a large virtual memory size for a process is that the process will consume that much swap space (which rarely is a problem).

    Cheers,

    Michi.