Archived

This forum has been archived. Please start a new discussion on GitHub.

Documentation bug (C++ argument calculation order && exception safety)

Hi.

I found a really strange thing in the documentation:
It is also worth noting that the preceding code does not contain a memory
leak. It is OK to pass the pointer returned by new as an argument to add:
CommunicatorPtr comm = c.adapter->getCommunicator();
c.adapter->add(new PhoneEntryI(name, phNum),
comm->stringToIdentity(name));
This does not cause a problem because add’s formal parameter type for the
servant is const Ice::ObjectPtr&, and ObjectPtr has a single-argument
constructor that accepts an Object*

But surely, this code contains a potential memory leak. C++ doesn't restrict argument evaluation order, so the call can be executed in this sequence:

1. new PhoneEntryI(name, phNum)
2. comm->stringToIdentity(name)
3. ObjectPtr(PhoneEntryI *)

If step (2) throws an exception (I looked at the stringToIdentity code and saw that this is really possible), we get a memory leak.

Please fix it, at least in the documentation :)

Comments

  • Hi Andrew,

    I think that, unfortunately, you are right; this could indeed happen :mad:

    I'll fix the documentation. Sadly, the undefined order of evaluation of function arguments is a constant source of subtle and hideous bugs. And, as far as I can see, defining the order of evaluation as left-to-right would not close any significant optimization opportunities, but would make life for programmers a lot easier.

    Quite a lot of people have complained about this many times in the past. See this comp.lang.c++.moderated post for one example.

    Thanks for pointing out this bug!

    Cheers,

    Michi.
  • This is OT but, as Hyman himself points out, there _are_ cases where it isn't so easy to determine which way the order of evaluation must proceed:
    Order of evaluation in new expression - comp.lang.c++.moderated | Google Groups

    I guess this is one of those endless C++ quirks that will never die...
  • michi wrote: »
    I'll fix the documentation. Sadly, the undefined order of evaluation of function arguments is a constant source of subtle and hideous bugs. And, as far as I can see, defining the order of evaluation as left-to-right would not close any significant optimization opportunities, but would make life for programmers a lot easier.

    Thanks for the answer.

    I think this feature can really help the compiler. Keep in mind that with aggressive inline expansion you can get a real performance benefit from instruction mixing (e.g., instruction pairing on modern CPUs). So I think this is a really important feature.

    BTW, about performance... We compared proxy invocation latency between Ice and omniORB. In most cases, omniORB beats Ice by about 3 times :( Moreover, Ice generates slightly more traffic than the ORB... So I think you should look into optimizations; surely, proxy invocation can be much faster ;)
  • benoit
    benoit Rennes, France
    Hi Andrew,
    Andrew S wrote: »
    BTW, about performance... We compared proxy invocation latency between Ice and omniORB. In most cases, omniORB beats Ice by about 3 times :( Moreover, Ice generates slightly more traffic than the ORB... So I think you should look into optimizations; surely, proxy invocation can be much faster ;)

    Performance is one of our top priorities! However, comparing Ice and omniORB is a bit like comparing apples and oranges :).

    Ice provides many more features (e.g., AMI, AMD, background I/O) than omniORB, and this comes at the price of a more elaborate concurrency model. omniORB's concurrency model is much simpler than Ice's, and it surely allows greater performance. You should compare omniORB with Ice-E instead, which has a similar concurrency model. I expect Ice-E to perform as well as omniORB, if not better.

    Btw, under which circumstances did Ice generate more traffic? On which platform did you run these tests and which Ice version did you use? I'm a bit surprised by Ice being 3 times slower. Could you elaborate on the tests you used to measure this?

    Cheers,
    Benoit.
  • benoit wrote: »

    Performance is one of our top priorities! However, comparing Ice and omniORB is a bit like comparing apples and oranges :).

    Ice provides many more features (e.g., AMI, AMD, background I/O) than omniORB, and this comes at the price of a more elaborate concurrency model.

    Benoit, thanks for the answer.

    Yes, I understand that support for AMI etc. has to slow down invocation in some cases. But surely a threading model based on select is not the best solution on Windows, and again, I'm sure you know it :)

    And keep in mind that an empty twoway invocation should have comparable performance in all brokers, so I think the other bottleneck in this pattern (the first, I think, being the "thick" proxy itself) is how effectively the network layer is used: packing the call into the protocol, and the efficiency of the protocol itself. As for the proxy, you can easily compare the quality of a proxy implementation with a simple test: run the client and server parts of the latency test on the same host. You should see a great difference in CPU load per call...
    benoit wrote: »
    omniORB's concurrency model is much simpler than Ice's, and it surely allows greater performance. You should compare omniORB with Ice-E instead, which has a similar concurrency model. I expect Ice-E to perform as well as omniORB, if not better.

    Maybe you're right. But in our tests the throughput and invocation latency of Ice 3.2 (native C++, VC 9.0) are comparable with WCF... I really expected much better performance from Ice, especially in the simple (twoway invocation) cases...
    benoit wrote: »
    Btw, under which circumstances did Ice generate more traffic? On which platform did you run these tests and which Ice version did you use? I'm a bit surprised by Ice being 3 times slower. Could you elaborate on the tests you used to measure this?

    I have already described one of our tests (throughput) in this topic:
    http://www.zeroc.com/forums/help-center/3603-thread-pool-performance-500-clients.html

    The second test measured the latency of an empty / one-generic-parameter twoway call from a server to a number of clients. With only one client connected, the results were identical to the Ice latency test.

    If you want exact numbers, I can post them, but with the throughput pattern Ice produces about 5-6% greater network load than CORBA or WCF.
    We did not measure traffic in the latency tests / empty calls, but I think the difference will be at least the same. In the latency test omniORB outperforms Ice by about 3 times, and it has a much better latency degradation trend as the number of simultaneous connections grows.
  • Andrew S wrote: »
    Thanks for the answer.

    I think this feature can really help the compiler. Keep in mind that with aggressive inline expansion you can get a real performance benefit from instruction mixing (e.g., instruction pairing on modern CPUs). So I think this is a really important feature.

    On this point, I would beg to differ. The optimization opportunities through reordering of function arguments are minimal and will rarely, if ever, make any difference. But, even if there were some optimization opportunities, it would still be better to evaluate function arguments in a defined order. That's because no optimization opportunity is worth the cost if it regularly introduces latent and very hard-to-find bugs. After all, a programming language is created for programmers to use, not for compilers to compile...

    The reason for the undefined evaluation order is simply historical and dates back to the dim distant past of C (thirty-five years now), and there was even less reason for undefined evaluation order back then than there is now. Historically, the undefined order did not come about to preserve optimization opportunities (that was a justification after the fact). Instead, original C never specified it and, by the time the standard was written, different compilers did it in different orders. Seeing that the job of a standard is to codify existing practice, the standard did not mandate an evaluation order.

    There is no shortage of experts who agree the undefined order is harmful. Here is what Bjarne has to say about it:

    The value of j is unspecified to allow compilers to produce optimal code. It is claimed that the difference between what can be produced giving the compiler this freedom and requiring "ordinary left-to-right evaluation" can be significant. I'm unconvinced, but with innumerable compilers "out there" taking advantage of the freedom and some people passionately defending that freedom, a change would be difficult and could take decades to penetrate to the distant corners of the C and C++ worlds.

    Cheers,

    Michi.
  • michi wrote: »
    There is no shortage of experts who agree the undefined order is harmful.

    And, on the other side (in opposition), there are a number of C++ compiler makers, who have to think not about developers but about the quality and size of the resulting code :) Moreover, the main paradigm of C++ is 'don't pay for what you don't use'. And when I look at code generated with global optimizations, I understand that this freedom is really necessary in C++. That's my opinion, but it doesn't apply to my initial message; bugs, surely, are independent of our opinions ;)
  • Andrew S wrote: »
    And, on the other side (in opposition), there are a number of C++ compiler makers, who have to think not about developers but about the quality and size of the resulting code :)

    The question really is whether the optimization is significant enough to make it worth adding the increased bug risk for developers. In this particular case, I would argue no, because the potential for damage is far greater than the doubtful benefits of optimization, in particular when I consider that this particular problem routinely has bitten the unwary for the past three decades.

    Consider the following:
    void foo(const TPtr&);
    

    I can call this like so:
    foo(new T);
    

    and everything will be well.

    Now consider:
    void foo(const TPtr&, int);
    

    I can call this like so:
    foo(new T, 99); // No problem
    

    But, if I call it almost the same way, I have a potential problem:
    foo(new T, bar()); // Might leak memory
    

    This is truly hideous for a number of reasons.

    For one, it is completely counter-intuitive that the following two code fragments do not do the same thing:
    TPtr t = new T;
    foo(t, bar());
    

    This is not the same as:
    foo(new T, bar());
    

    That in itself is extremely ugly because most programmers looking at the two code fragments will conclude "They do the same thing."

    But the real issue is the hidden maintenance problems. Consider the following code, which may have been working flawlessly for many years:
    foo(new T, bar());
    

    Now someone changes the implementation of bar such that it can throw an exception. All of a sudden, I have a potential memory leak. Worse, that leak will show up only if I happen to use a compiler that reorders the code in just the right way. And, because such code reordering typically only happens when optimization is enabled, but not for debug builds, even tools such as Purify are of limited help in tracking down the bug.

    And we haven't even considered the difficulty of making bar throw during testing yet (which can be very difficult for some functions).

    But, it's not just the implementation of bar that can cause this problem. Instead, it can be caused by any function that bar calls recursively. In other words, if I add a throw statement to the implementation of bar or any of the functions called by bar, I have to examine the entire call graph to see whether there might be a memory allocation and a call to bar (or any of the functions called by bar that now can throw an exception) without an intervening sequence point.

    Doing this is next to impossible for large projects. And we haven't even considered yet that bar, or one of the functions called by bar, might be implemented in a third-party library that can change in arbitrary ways.

    Even a statement such as the following is not entirely safe:
    foo(new T);
    

    That's because, in the header file, I might have:
    void foo(auto_ptr<T>, int = bar());
    

    The undefined evaluation order of function arguments is also in conflict with exception safety. The whole point of exceptions is that I can throw them without having to worry about memory leaks, but the undefined evaluation order subverts that, even though I'm carefully using smart pointers.

    Searching the web for relevant articles reveals that this issue (and the related issue of undefined evaluation of expressions that do not contain a sequence point) has bitten countless thousands of programmers time and time again. I wrote the broken code you pointed out myself, and I have been programming in C++ for twenty years now. I'm not a guru, but I consider myself quite experienced. Yet, I still wrote the bug (despite having known about the undefined evaluation order for more than two decades).

    If it is that easy to write broken code, the problem is with the language, in my opinion. (In particular because the actual bug is highly unlikely to show up, except in rare and exotic circumstances.) The evaluation order is a carefully hidden trap that the language puts in the programmer's path so he can impale himself on the poisoned spikes at the bottom...
    Moreover, the main paradigm of C++ is 'don't pay for what you don't use'.

    The question is whether I really would pay in that case. As Bjarne says, he remains unconvinced, and so remain a lot of other people.
    That's my opinion, but it doesn't apply to my initial message; bugs, surely, are independent of our opinions ;)

    Absolutely! That's why it'll be fixed in Ice 3.3 :)

    And thanks again for reporting the bug!

    Cheers,

    Michi.
  • michi wrote: »
    The question really is whether the optimization is significant enough to make it worth adding the increased bug risk for developers. In this particular case, I would argue no, because the potential for damage is far greater than the doubtful benefits of optimization, in particular when I consider that this particular problem routinely has bitten the unwary for the past three decades.

    Consider the following:

    ...

    If it is that easy to write broken code, the problem is with the language, in my opinion. (In particular because the actual bug is highly unlikely to show up, except in rare and exotic circumstances.) The evaluation order is a carefully hidden trap that the language puts in the programmer's path so he can impale himself on the poisened spikes at the bottom...

    Michi, yes, I agree that this behaviour can cause some problems, especially for newbie developers. But in my opinion it can easily be handled by defining corporate coding standards, like many other C++ "dark corners"...

    For example, this simple code can cause very strange behaviour:
    #include <string>
    
    std::string generate_part_one(); // can cause exception
    std::string generate_part_two(); 
    
    int main()
    {
        buffered_file f ("1.txt"); // buffers some data internally, dtor flushes all buffered data
        f << generate_part_one(); 
        f << generate_part_two();
    }
    
    If you try this code with different compilers, you will see that the destructor of buffered_file is or is not called, depending on the compiler... And this behaviour strictly conforms to the ISO standard :)

    Btw, are there any plans to optimize ice thread pool under windows? IOCP etc?
  • For example, this simple code can cause very strange behaviour:
    #include <string>
    
    std::string generate_part_one(); // can cause exception
    std::string generate_part_two(); 
    
    int main()
    {
        buffered_file f ("1.txt"); // buffers some data internally, dtor flushes all buffered data
        f << generate_part_one(); 
        f << generate_part_two();
    }
    
    If you try this code with different compilers, you will see that the destructor of buffered_file is or is not called, depending on the compiler... And this behaviour strictly conforms to the ISO standard :)

    Now you have me curious :) Care to enlighten me why the destructor may not run?
    Btw, are there any plans to optimize ice thread pool under windows? IOCP etc?

    It's something we will look at eventually, but no promises as to the time frame.

    Cheers,

    Michi.
  • benoit
    benoit Rennes, France
    Hi Andrew,
    Andrew S wrote: »
    Benoit, thanks for answer.

    Yes, I understand that support for AMI etc. has to slow down invocation in some cases. But surely a threading model based on select is not the best solution on Windows, and again, I'm sure you know it :)

    Yes, replacing the select-based model with IOCP is on our TODO list. IOCP will improve scalability, but I don't think it will improve latency much, and most likely it will still be slower than blocking sockets with a thread-per-connection concurrency model (like Ice-E and omniORB).
    And keep in mind that an empty twoway invocation should have comparable performance in all brokers, so I think the other bottleneck in this pattern (the first, I think, being the "thick" proxy itself) is how effectively the network layer is used: packing the call into the protocol, and the efficiency of the protocol itself. As for the proxy, you can easily compare the quality of a proxy implementation with a simple test: run the client and server parts of the latency test on the same host. You should see a great difference in CPU load per call...

    Sorry, it's not clear to me what you mean here.
    Maybe you're right. But in our tests the throughput and invocation latency of Ice 3.2 (native C++, VC 9.0) are comparable with WCF... I really expected much better performance from Ice, especially in the simple (twoway invocation) cases...

    I have already described one of our tests (throughput) in this topic:
    http://www.zeroc.com/forums/help-center/3603-thread-pool-performance-500-clients.html

    The second test measured the latency of an empty / one-generic-parameter twoway call from a server to a number of clients. With only one client connected, the results were identical to the Ice latency test.

    If you want exact numbers, I can post them, but with the throughput pattern Ice produces about 5-6% greater network load than CORBA or WCF.
    We did not measure traffic in the latency tests / empty calls, but I think the difference will be at least the same. In the latency test omniORB outperforms Ice by about 3 times, and it has a much better latency degradation trend as the number of simultaneous connections grows.

    How do you measure the network load? Which concurrency model and configuration did you use for Ice to do these measurements? Which omniORB version?

    In any case, I'd be happy to look at the numbers. Some time ago we compared the performance of Ice, Ice-E and omniORB. At the time, Ice-E latency was on par with omniORB, and Ice latency was about 1.2 times slower. There's one test where omniORB performed better by a larger factor: sending large byte sequences. omniORB implements zero-copy on both the client and server side, where Ice/Ice-E only implement it on the server side. In this test, omniORB was up to 2 times faster than Ice.

    So I'm still surprised that Ice performed 3 times slower for you... It would be great if you could provide more details about your tests. The best would be to post a small, self-contained, compilable test case that demonstrates this (I would open another thread, as this doesn't really have much to do with the original subject anymore :)).

    Cheers,
    Benoit.
  • michi wrote: »
    Now you have me curious :) Care to enlighten me why the destructor may not run?

    :) From ISO C++ (1998), 15.3/9:

    If no matching handler is found in a program, the function terminate() is called; whether or not the
    stack is unwound before this call to terminate() is implementation-defined.

    So, in that sample, you must write your main, for example, like this:
    int main()
    {
      try
      {
        // impl
      }
      catch (...)
      {
        throw;
      }
    }
    

    Really dumb, heh?
  • benoit wrote: »
    How do you measure the network load? Which concurrency model and configuration did you use for Ice to do these measurements? Which omniORB version?
    So I'm still surprised that Ice performed 3 times slower for you... It would be great if you could detail some more your tests. The best would be to post a small self-compilable test case that demonstrates this (I would open another thread as this doesn't really have much to do anymore with the original subject :)).

    I'll post the exact numbers in my old thread (http://www.zeroc.com/forums/help-center/3603-thread-pool-performance-500-clients.html), OK?