"Convenience Over Correctness" - The Case Against RPC

I have just read an interesting article by Steve Vinoski called Convenience Over Correctness. Steve talks about how he believes the RPC methodology to be outmoded and not well suited to meet the demands of distributed systems. He says that RPC is used mostly because it's convenient for programmers. This is in contrast to approaches like REST and message passing, which he regards as more correct but underused because of their relative unfamiliarity compared to the local-call style of RPC.

He doesn't mention ICE, but he specifically goes after CORBA and IDL-based solutions because of the least common denominator issues between languages.

I am curious if ZeroC has any articles (perhaps a newsletter) that talk about the relative advantages of distributed systems built with ICE compared to REST or pure message-passing ones (for example, Erlang)?

Comments

  • kwaclaw (Oshawa, Canada)
    Game_Ender wrote: »
    I am curious if ZeroC has any articles (perhaps a newsletter) that talk about the relative advantages of distributed systems built with ICE compared to REST or pure message-passing ones (for example, Erlang)?

    Vinoski's article mentions REST and AMQP as the "modern" alternatives to RPC.
    Now, if one could prove that some subset of Ice (maybe dynamic Ice + one-way calling + IceStorm) is an equivalent to AMQP, one would have proven that Ice is not the kind of RPC that Vinoski argues against.

    Karl
  • I disagree with much of what Steve says in that article.

    He says that RPC is about convenience. Well, duh! Of course RPC is about convenience--I mean, if I want inconvenience, I can use sockets. The whole point of middleware is to provide a higher-level abstraction, so we get rid of all the sockets drudgery. To say that RPC focuses on convenience is stating the obvious: after all, if RPC weren't convenient, I very much doubt that anyone would use it.

    The entire argument that RPC is "fundamentally flawed" doesn't fly either. Steve cites an impedance mismatch between IDL and programming languages. He doesn't substantiate this other than by pointing to a few problems with CORBA's by-value objects. These truly are a mess--a typical example of CORBA's design-by-committee problems.

    But OBVs are only the tip of the iceberg. Steve could have mentioned many other problems with CORBA IDL. Here are just a few:
    • CORBA IDL's lack of exception inheritance makes structured error handling impossible.
    • CORBA IDL's lack of a dictionary type (one of the most frequently used types for application programming) forces awkward work-arounds, such as modelling a dictionary as a sequence of pairs.
    • CORBA IDL has three separate sequence abstractions (arrays, variable-length sequences, and length-limited sequences) even though variable-length sequences are all that is needed.
    • CORBA IDL has in-parameters, out-parameters, and inout-parameters. Because programming languages typically support only pass-by-value (or pass-by-value and pass-by-reference), this causes awkward mappings for inout-parameters.
    • CORBA IDL provides unsigned integer types that do not have natural mappings to Java and C# and can cause interoperability problems if a Java or C# program receives an unsigned integer over the wire that exceeds the range of a signed integer.
    • CORBA's split between narrow and wide chars and strings complicates APIs and causes serious problems at the protocol level because it makes the client-server interactions stateful.

    Steve then argues that language mappings are leaky abstractions and that CORBA code neither looks natural within the programming language itself nor looks exactly like the IDL.

    The reason that CORBA code doesn't look like the IDL is that IDL has an inappropriate feature set at the wrong abstraction level. (For example, the IDL Any type requires a complex and arcane API; the same is true for inappropriate types such as unions and fixed-point types.) The reason that code does not look natural within the programming language itself is the poor APIs provided by the language mappings. For example, CORBA's C++ mapping is well-known for its overly complex memory management, lack of exception and thread safety, lack of integration with the STL, difficult-to-use dynamic invocation interface, and numerous other problems.

    So, on the one hand, we have an IDL that has an inappropriate feature set and inappropriate abstraction level and, on the other hand, we have a language mapping that is unergonomic and error-prone. Little wonder then that the combined effects of the two result in application code that looks like a dog's breakfast.

    If you look at the list of languages supported by Ice, you can see that they are quite different from each other. The list includes compiled languages, interpreted languages, and languages that are halfway in between, using just-in-time compilation. Some of these languages use explicit memory management (C++ and Objective-C) while the others use garbage collection. We have strict static typing (C++), static typing with introspection (Java, C#), loose typing (PHP), duck typing (Python and Ruby), and a mix of all of these (Objective-C). In fact, the languages supported by Ice represent pretty much the full design spectrum of modern programming languages. In terms of philosophy and idioms, these languages could hardly differ more from each other, yet, their Slice mappings do not suffer from any impedance mismatch whatsoever!

    Steve cites one example (CORBA IDL) to show that RPC is "fundamentally flawed". To support that argument, he chose the worst-designed IDL and language mappings in the history of middleware. He fails to point out that there are far better ways to do RPC that suffer from none of the problems he cites. Steve's argument is disingenuous because it commits the fallacies of observational selection and suppressed evidence.

    In fact, his argument is formally invalid:
    • Premise: CORBA IDL is poorly-designed.
    • Premise: CORBA language mappings are poorly designed.
    • Conclusion: RPC is fundamentally flawed.

    The premises do not imply the conclusion. Just because CORBA got it wrong does not mean that RPC is flawed. Talk about throwing out the baby with the bath water...

    The problem is not RPC, the problem is not IDLs, and the problem is not language mappings. The problem is poorly designed IDLs and language mappings. Ice suffers from none of these problems. It is convenient and correct.

    Cheers,

    Michi.
  • Game_Ender wrote: »
    He doesn't mention ICE, but he specifically goes after CORBA and IDL-based solutions because of the least common denominator issues between languages.

    Another thought specifically on this point...

    The idea of distributed computing is to allow communication among heterogeneous systems--different networking technologies, different operating systems, different programming languages, and so on. Now, for the moment, let's put aside any thought of any specific technology, such as Ice or REST.

    At the most fundamental level, what does it take for two computers on a network to communicate with each other? First and foremost, it requires that the two parties can understand each other. In turn, that means they must use a common vocabulary. As long as they both agree on the vocabulary (syntax) and what they should do in response to each sentence they receive (semantics), things work.

    Now, you are reading this message of mine right now and you don't have a problem understanding what you are reading. That's because I'm using words on whose meaning we agree. If I suddenly use a word you don't know, such as "sesquipedalianism", communication is compromised: I've said something that you don't understand. (Please forgive me for assuming that you do not know the meaning of "sesquipedalianism". If you do know, imagine I had used a word you do not know.)

    What does this example show? It shows that, without a-priori agreement, we can't understand each other. In turn, this shows that, for us to have a meaningful communication, we must agree on what you called "a least common denominator".

    Here is the fallacy:

    Premise: RPC and IDL require a least common denominator.
    Conclusion: RPC is fundamentally flawed.

    Whether we use Ice, CORBA, REST, WCF, SOAP, Erlang, or whatever, all approaches require this least common denominator because, without it, we can't communicate. This is a truism that is completely independent of any specific technology. The "least common denominator" is not a disadvantage. Instead, it is a value-neutral fact of communication, and a value-neutral fact of distributed computing.

    To say that IDL is bad because we must agree on a set of types that are available and then map those types to APIs in each programming language misses the point completely: we must agree on a common set of types whether we use IDL or not, and we must provide APIs that allow us to manipulate those types no matter what. Whether the API is generated or hand-crafted is completely irrelevant.

    What really is going on here is a confusion between the type system that is shared among the components of a distributed system, and the APIs that are presented to developers in each programming language. Because things such as REST and WS do not specify APIs, people assume that those problems don't exist. That is not true. The API issue is there all along. It's just that REST and WS ignore it.

    Cheers,

    Michi.
  • kwaclaw (Oshawa, Canada)
    michi wrote: »
    Premise: RPC and IDL require a least common denominator.
    Conclusion: RPC is fundamentally flawed.

    Whether we use Ice, CORBA, REST, WCF, SOAP, Erlang, or whatever, all approaches require this least common denominator because, without it, we can't communicate. This is a truism that is completely independent of any specific technology.
    This argument is pretty clear for most to understand, though it gets confusing when the "loose coupling" mantra is thrown into the discussion.

    What piqued my interest in Vinoski's article was rather this:
    For example, distributed systems typically require intermediaries to perform caching, filtering, monitoring, logging, and handling fan-in and fan-out scenarios. In large-scale systems, these intermediation services are “must haves” that ensure that the system will operate and perform as required. Unfortunately, RPC-oriented calls lack the metadata required to support intermediation because it’s simply not a concern for normal local invocations.
    However, I think this does not apply to Ice.

    Btw, the one thing I like about AMQP is that it clearly separates transport/messaging concerns from the type system, simply because it does not have one. It should be possible to layer any useful type system on top of AMQP.

    Karl
  • kwaclaw wrote: »
    This argument is pretty clear for most to understand, though it gets confusing when the "loose coupling" mantra is thrown into the discussion.

    The "loose coupling" mantra is a red herring. XML is no more loosely coupled than the Ice encoding. People endlessly confuse syntax with semantics and assume that, just because I can pull arbitrary XML off the wire, things are more loosely coupled.

    This simply is incorrect. The ability to parse a message without a-priori knowledge of its type does not couple things more loosely. For example, suppose I currently send an XML message that looks like this:
    <address>
        <housenumber>25</housenumber>
        <street>Smith St</street>
        <!-- ... -->
    </address>
    

    From looking at this, you will immediately recognize that this is some sort of address. Now, because XML has a predefined syntax, it's possible to pull this message off the wire and build the corresponding tree representation in memory. But, so what? All that means is that I now have the tree in memory, no more, no less.

    Now, let's make a change to the message for a new version of the system:
    <address>
        <number>25</number>
        <street>Smith St</street>
        <!-- ... -->
    </address>
    

    I've renamed the tag from "housenumber" to "number". That's the equivalent of renaming a Slice structure member. What do I have to do in order to deal with the new message? Well, I have to change the code. Where it used to look for "housenumber", it now has to look for "number" instead.

    Whether I use XML or Ice, either way, I have to change the code to accommodate the change. XML doesn't couple things more loosely than Slice.
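
    To make this concrete, here is a minimal sketch (Python, using the standard xml.etree module; the messages are just the two examples above, inlined) of the client-side code that the rename breaks:

    import xml.etree.ElementTree as ET

    old_msg = "<address><housenumber>25</housenumber><street>Smith St</street></address>"
    new_msg = "<address><number>25</number><street>Smith St</street></address>"

    def house_number(xml_text):
        # The tag name the client looks for is a-priori knowledge, hard-wired here.
        root = ET.fromstring(xml_text)
        return root.findtext("housenumber")

    print(house_number(old_msg))   # "25"
    print(house_number(new_msg))   # None -- the code must be changed to look for "number"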

    There is also the old argument that "I can version the system by adding new elements, and everything will be fine because old versions of client and server can just ignore the bits they don't understand".

    This argument is so flawed, it isn't funny. For one, most versioning problems cannot be solved by just adding new bits. Real-life versioning is far more complex and, more often than not, requires making incompatible changes. Moreover, what I don't know can be just as important as what I do know. For example, if someone sends me an XML purchase order that contains elements that I do not understand, would I act on it? I very much doubt it. For all I know, the unknown elements might state that the purchaser expects to get an 80% discount on my advertised price.

    Yet another flawed argument is that XML is self-describing. Again, this is simply nonsense. XML is not self-describing, and never will be. Here is why:
    <Addresse>
        <Hausnummer>25</Hausnummer>
        <Strasse>Smith St</Strasse>
        <!-- ... -->
    </Addresse>
    

    This is the exact same message as the earlier one, with identical semantics. Except that the tags are now in German instead of English. If you happen to speak German, the message appears to be self-describing. But, if you do not, the message is gibberish. In other words, there is nothing self-describing in XML other than syntax. The only reason client and server can make sense of such messages is that they have an a-priori agreement on the semantics of the tags.

    To drive this point home, here is the same message once more:
    <x>
        <y>25</y>
        <z>Smith St</z>
        <!-- ... -->
    </x>
    

    I've simply renamed the tags. If you want to understand this message, you must know, a priori, that "y" means "house number", and "z" means "street name". The message itself simply does not contain this knowledge.

    I cannot even infer the type information. For example, looking at the message, you might conclude that the house number is an integer. Seems like a fair-enough assumption, until we stumble across
    <housenumber>25a</housenumber>
    

    Oops, the house number is a string after all...
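
    In code (the same minimal Python sketch as above), the receiver's guess breaks the moment such a value arrives:

    import xml.etree.ElementTree as ET

    msg = "<address><housenumber>25a</housenumber><street>Smith St</street></address>"
    value = ET.fromstring(msg).findtext("housenumber")   # off the wire, it is always just text
    number = int(value)                                   # ValueError -- the integer guess was wrong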

    Again, the understanding whether house numbers are strings or integers depends on an a-priori agreement. Whether that agreement is formally specified by something like IDL, or WSDL, or by simple good will does not matter. The agreement has to be in place a priori. If I change the type of the element's value without telling everyone else, I break the system. We are no more loosely coupled than we are with IDL or Slice.
    What piqued my interest in Vinoski's article was rather this:

    "For example, distributed systems typically require intermediaries to perform caching, filtering, monitoring, logging, and handling fan-in and fan-out scenarios. In large-scale systems, these intermediation services are “must haves” that ensure that the system will operate and perform as required. Unfortunately, RPC-oriented calls lack the metadata required to support intermediation because it’s simply not a concern for normal local invocations."

    Hmmm... Distributed systems "require intermediaries" that are "must haves"?

    As far as I can see, that is an assertion without substantiation. Last time I looked, there were tens of thousands of successful distributed systems around that got by without any such intermediaries.

    Now, I don't doubt that there are situations where intermediaries might be useful. I can come up with use cases where it's nice to be able to "cook up" the data somehow while it is in transit from one place to another. But does XML actually provide that ability?

    Let's think about this... Suppose I have some sort of message switch that performs caching, or otherwise does some sort of transformation on the XML that passes through it. Going back to the address example once more, suppose that there is a <country> element. If an address comes past that lacks this element, the intermediary can add a default that sets the country to "USA".

    So, what does the intermediary have to do? Well, it needs to inspect every XML message that comes past, look to see whether it contains an address element without a country element and, if that country element is missing, add the default. Easy.
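
    As a minimal sketch (Python once more; the element names are simply the ones from this address example), the intermediary's job looks something like this--note how much type knowledge is baked into it:

    import xml.etree.ElementTree as ET

    def add_default_country(xml_text):
        root = ET.fromstring(xml_text)
        # A-priori knowledge: there are <address> elements, they may carry a
        # <country> child, and a missing country should default to "USA".
        for addr in root.iter("address"):
            if addr.find("country") is None:
                ET.SubElement(addr, "country").text = "USA"
        return ET.tostring(root, encoding="unicode")

    print(add_default_country("<address><street>Smith St</street></address>"))
    # prints: <address><street>Smith St</street><country>USA</country></address>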

    Does XML help with this? Does it mean that things are any more loosely coupled? Hardly. In order to transform the message, the intermediary must have type knowledge. It must know that there are such things as addresses, that they have a specific structure, that there is a country element that might need adding, and so on. In other words, to do its job, the message switch requires type knowledge. Or, in Steve's words, it requires metadata.

    Where does that metadata come from? Certainly not from the XML message itself, because it doesn't have any metadata. Instead, the knowledge must come from an external source, such as WSDL, or the knowledge might be hard-coded into the program. Regardless of where the knowledge comes from, it again constitutes an a-priori agreement as to the semantics of the message. That is no different from establishing a client-server contract with Slice. In other words, we are just as tightly coupled as ever.

    The intermediary argument confuses syntax with semantics. When Steve says that "RPC-oriented calls lack the metadata required to support intermediation", he is right: the metadata isn't inside the message, but comes from somewhere else. What he fails to see is that the XML does not contain that metadata either. In fact, whether the message is encoded as XML or using the Ice encoding is neither here nor there. In order to do anything useful with the message, I don't need to know just its syntax, I need to know its semantics. And neither XML nor Ice-encoded messages contain these semantics.

    The same is true for the logging argument. True, an intermediary can log XML messages. Just as an Ice intermediary can log Ice messages. But is either activity actually useful?

    If the XML intermediary does not have type knowledge, all it can do is log the raw XML message. But that's not very useful because, for logging to be useful, the data has to be cooked up in some way. But that is not possible without the metadata that the XML message does not contain.

    Whether to encode things in XML or binary is largely a matter of efficiency and bandwidth. As far as the semantics are concerned, the two are exactly equivalent.

    Now, clearly, intermediaries can be useful in distributed systems. No argument there. But XML does not make the job of creating such intermediaries any easier than a system using a binary protocol.

    Cheers,

    Michi.
  • kwaclaw (Oshawa, Canada)
    michi wrote: »
    The "loose coupling" mantra is a red herring...
    I agree with everything - didn't mean to imply that I disagreed.
    michi wrote: »
    Now, clearly, intermediaries can be useful in distributed systems. No argument there. But XML does not make the job of creating such intermediaries any easier than a system using a binary protocol.
    I think Vinoski implies that the metadata already present in the HTTP protocol makes this easier. But Ice can attach metadata to a message as well.
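
    For instance (a minimal sketch; hello stands for an already-obtained proxy of a hypothetical Slice-generated HelloPrx type), every proxy invocation in the Python mapping takes an optional trailing context dictionary, and that string metadata is marshaled with the request and is visible to the server via Ice.Current.ctx:

    # "hello" is assumed to be a proxy that was obtained in the usual way.
    # The trailing dictionary is the per-invocation context: metadata that
    # travels with the request alongside the operation's own parameters.
    hello.sayHello({"requestId": "42", "source": "forum-example"})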

    You should compile all your thoughts into a complete reply, if you haven't already. It seems Vinoski's article is largely quoted with little or no critique.

    Karl
  • Wow, thanks for the detailed response michi. It was the kind of thing I was hoping for.

    Another thing he mentioned is developers coding against a remote interface as if it were a local one, and then building their distributed application the same way they would build a monolithic, single-process application. Vinoski seems to argue that by using different strategies, like message passing, you force the developer to think about the effects that distributing his application really has, leading to a properly distributed application.
  • I don't think message passing changes the picture in any substantial way: pass messages that are too fine-grained, and the problem is the exact same one.

    Cheers,

    Michi.
  • kwaclaw (Oshawa, Canada)
    michi wrote: »
    I don't think message passing changes the picture in any substantial way: pass messages that are too fine-grained, and the problem is the exact same one.
    Michi.

    The way I understood it was that the different syntactical constructs of a messaging API make it clear to the programmer that he is not dealing with normal function calls, but with a different thing altogether. However, I would not be too surprised if developers using a messaging API would try to hide it behind some wrapper class.

    For Ice, in order to obtain a proxy you also have to go through a few extra steps that you don't need for normal method calls, and that should make it clear that these calls are not local.
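
    Roughly, as a sketch in the Python mapping (with a hypothetical Slice-generated Demo module and a made-up endpoint), those steps look like this:

    import sys
    import Ice
    import Demo   # hypothetical module generated from a Slice definition

    communicator = Ice.initialize(sys.argv)
    try:
        # Two extra steps that no local call ever needs:
        base = communicator.stringToProxy("hello:default -p 10000")   # locate the object
        hello = Demo.HelloPrx.checkedCast(base)                       # narrow it to its type
        hello.sayHello()   # only now does the invocation look like an ordinary method call
    finally:
        communicator.destroy()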

    Karl
  • matthew (NL, Canada)
    I don't buy this argument. If the developer is poor or inexperienced, they will make mistakes and bad design decisions. Just because it is less convenient to make remote invocations, or because the invocations somehow look different, doesn't mean the developer is more likely to create less fine-grained interfaces.

    If you use AMI and AMD with Ice you get exactly this differentiation. A local call looks nothing like a remote invocation. However, I would hardly promote this differentiation for the reason Vinoski states.

    Simply put, as a programmer one has to be aware at all times of the semantics, requirements, and performance characteristics of whatever object you invoke. For example, if you are calling a local object, there may be locking requirements that you have to take into account, and if you don't, bad things (such as deadlocks) will occur. Whatever the form of the invocation, this requirement does not change.
  • kwaclaw wrote: »
    The way I understood it was that the different syntactical constructs of a messaging API make it clear to the programmer that he is not dealing with normal function calls, but with a different thing altogether.

    I wrote about this in one of my blog posts.
    However, I would not be too surprised if developers using a messaging API would try to hide it behind some wrapper class.

    This is the prime reason why Waldo's argument that remote and local calls should look different is flawed: the remoteness of calls is transitive so, before you know it, almost everything would have to look like a remote call, again removing any (dubious) usefulness of a syntactic marker.
    For Ice, in order to obtain a proxy you also have to go through a few extra steps that you don't need for normal method calls, and that should make it clear that these calls are not local.

    The argument that, just because a call is remote, it has to be treated differently from a local one stands on rather shaky legs. The reason is that local calls have many of the same failure scenarios as remote ones. It's just that local calls don't fail as frequently. But, when they do fail, not dealing with the error is just as catastrophic as not dealing with failure of a remote call.

    The fact is that, no matter what kind of operation I call, I must always be aware of the performance and failure characteristics of the call, regardless of whether the call is local or remote.

    Cheers,

    Michi.
  • Hmm,
    I could not agree more with Michi.
    His simplicity and clarity are really great.
    I am (well, was) a frequent visitor of theserverside.com, and you can find a lot of misconceptions there about loose coupling, messaging, asynchronous calls, etc.
    At the risk of being politically incorrect - but I like to say what I think - I guess Vinoski was simply preparing the terrain for the next generation of (old) *ix products, with new clothes, suitable for jumping on the next bandwagon (REST).

    Guido
  • In 2005 or so, I looked around for a definition of "service" and a definition of "loose coupling". I couldn't find anything that defined either term (other than general hand waving).

    What worries me is that we have seen the entire industry jump on the "services" and "loose coupling" bandwagon without even knowing what these things are. After all, if I'm supposed to build a "service-oriented architecture", how am I going to know that this is what I've actually achieved without a definition of the meaning of the term and, therefore, without any tests to establish whether I'm actually "service-oriented" or "loosely coupled"?

    I have yet to see any proper (and testable) definition of what makes a system service-oriented or loosely coupled. I have also yet to see any definition that would allow me to (testably) distinguish a "message-oriented" system from an RPC-based system.

    It seems that the industry still believes that, by repackaging old ideas with new terminology, we can somehow solve our distributed computing problems.

    As I said elsewhere in the past, the reason we ended up with web services and XML on the wire is not that doing things this way is better. Instead, the reason is that, at the time, CORBA was winning the distributed computing race, and Microsoft wasn't going to cede the market to a competitor. In the early 2000's, the web and XML were the fad of the moment, and it was easy to sell an RPC story based on the web and XML, which allowed MS to keep a foot in the door. And, of course, this provided a great opportunity for vendors to reinvent an old wheel, reinvent it poorly, and to sell lots of new products that, technically, were worse than the existing ones.

    Just because the web is good for distributed computing with a human at one end of the connection does not mean that it also is good for distributed computing with computers at both ends of the connection. As far as I can see, no-one has ever really examined why the web is as successful as it is, and how that might affect distributed computing between computers, instead of between humans and computers...

    Cheers,

    Michi.
  • michi wrote: »
    Just because the web is good for distributed computing with a human at one end of the connection does not mean that it also is good for distributed computing with computers at both ends of the connection. As far as I can see, no-one has ever really examined why the web is as successful as it is, and how that might affect distributed computing between computers, instead of between humans and computers...

    Cheers,

    Michi.
    Well, the saddest thing is that SOAP, XML, REST, and other HTTP-based paraphernalia are largely used even in intranet environments.
    Even if I largely agree about CORBA's unnecessary complications and faulty specs, I have to admit that, used with a grain of salt, it was the only viable solution for distributed computing.
    And I still think it is the only alternative to other "cool" frameworks ;)
    Don't forget that, in the end, you wrote a wonderful book on that topic. With Steve Vinoski ;)
    As usual, the problem is not the gun but the man behind it.
    I bet that it is possible to find a lot of people out there that can make disasters even with ICE.

    Guido
  • ganzuoni wrote: »
    And I still think it is the only alternative to other "cool" frameworks ;)
    Don't forget that, in the end, you wrote a wonderful book on that topic. With Steve Vinoski ;)

    Right, I haven't forgotten that. CORBA was the best thing going at the time. But CORBA got worse over time instead of better, and it took years of experience and learning to realize how all of the CORBA problems relentlessly added up to something that was very hard to use. (The run-away standards process didn't help either. I wrote about this at length in The Rise And Fall Of CORBA.)
    As usual, the problem is not the gun but the man behind it.
    I bet that it is possible to find a lot of people out there that can make disasters even with ICE.

    That is no doubt correct. Ice is a powerful tool and, as with any powerful tool, there are ways to use it incorrectly. Ice makes things as easy and efficient as possible, but that doesn't mean that it guarantees a successful distributed system. That still takes experience, insight, good design, and good implementation.

    As the old saying goes, "a good programmer can write FORTRAN in any language..."

    Cheers,

    Michi.
  • michi wrote: »
    Right, I haven't forgotten that. CORBA was the best thing going at the time. But CORBA got worse over time instead of better, and it took years of experience and learning to realize how all of the CORBA problems relentlessly added up to something that was very hard to use. (The run-away standards process didn't help either. I wrote about this at length in The Rise And Fall Of CORBA.)
    Got it !!
    I still remember when I was a happy Orbacus user asking for a means to let the application "register" and "deregister" objects in the ORB (it was in the 2.0 old days), and Matthew suggested reading a "new" proposal from Douglas Schmidt.....
    Apart from the well-described complications, the POA concept is a cornerstone of what a DOC framework should offer on the server side.
    The client-side portable layer is another bright example of the good things CORBA has put on the table.
    These concepts are totally absent in SOAP-based specs and the marvelous J2EE.
    And no one seems to care....

    Guido.
  • kwaclaw (Oshawa, Canada)
    michi wrote: »
    Just because the web is good for distributed computing with a human at one end of the connection does not mean that it also is good for distributed computing with computers at both ends of the connection. As far as I can see, no-one has ever really examined why the web is as successful as it is, and how that might affect distributed computing between computers, instead of between humans and computers...

    I think that is a crucial observation.
    The web/REST architecture might work if the computers on both ends had some "artificial intelligence" that took all kinds of context into account (situational, cultural, ...) in order to interpret the incoming message unambiguously.

    The error in the REST thinking is to assume that the bar for that AI is low enough that it is practical to achieve today. In addition, the processing overhead would mean that such messages should be very high-level and of coarse granularity (just like human communication), which they are not always.

    I do actually think that at some point we may have such a style of computer-to-computer communication, but it will in part be used to dynamically establish and delegate to the low-level, strongly typed, and efficient communications that, for instance, ICE provides.

    Karl