Ice 3.5.1 Java: Fix connection handling on (unsupported) POSIX compliant platforms

Ice 3.5.1 improved the way the Java mapping handles connections asynchronously. As part of this change, it removed the handling of connections that succeed synchronously, which was in the code up to and including 3.5.0.

Ice 3.5.1 always assumes that non-blocking TCP connections are opened asynchronously and therefore always relies on select (java.nio.channels.Selector, which, depending on the platform, may use poll, epoll or kqueue) to complete the connection process.

POSIX allows a non-blocking connect to succeed immediately; see connect:
If the connection cannot be established immediately and O_NONBLOCK is set for the file descriptor for the socket, connect() shall fail and set errno to [EINPROGRESS].

The Java API documentation for java.nio.channels.SocketChannel also states that this is compliant behavior; see SocketChannel (Java Platform SE 7):
If this channel is in non-blocking mode then an invocation of this method initiates a non-blocking connection operation. If the connection is established immediately, as can happen with a local connection, then this method returns true. Otherwise this method returns false and the connection operation must later be completed by invoking the finishConnect method.
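To make the two outcomes described in the API documentation concrete, here is a minimal sketch of a non-blocking connect that handles both of them. It is not Ice code and not the attached patch; the class NonBlockingConnect and the helper open are hypothetical names, and the loopback listener in main exists only so the example can run on its own.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class NonBlockingConnect {
    // Hypothetical helper (not part of Ice): connects in non-blocking mode
    // and handles both results that SocketChannel.connect() may report.
    static SocketChannel open(InetSocketAddress addr) throws IOException {
        SocketChannel ch = SocketChannel.open();
        try {
            ch.configureBlocking(false);
            if (ch.connect(addr)) {
                // Established immediately: nothing is left to wait for,
                // so the channel is not registered for OP_CONNECT at all.
                return ch;
            }
            // Connection pending: wait for OP_CONNECT, then complete it.
            try (Selector selector = Selector.open()) {
                ch.register(selector, SelectionKey.OP_CONNECT);
                while (!ch.finishConnect()) {
                    selector.select();
                    selector.selectedKeys().clear();
                }
            }
            return ch;
        } catch (IOException e) {
            ch.close();
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption for the demo: a throwaway listener on the loopback
        // interface with an ephemeral port.
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            InetSocketAddress addr = (InetSocketAddress) server.getLocalAddress();
            try (SocketChannel ch = open(addr)) {
                System.out.println("connected: " + ch.isConnected());
            }
        }
    }
}
```

The point of the sketch is only the branch on the return value of connect: the selector is involved solely when the connection is still pending.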

The behavior introduced in Ice 3.5.1 results in hanging connections whenever the OS makes a connection succeed immediately (on FreeBSD, the platform I discovered this on, this only happens for connections to local interfaces and depends on the resources available at the time). Ice is then stuck in select indefinitely, emitting "spurious selector wake up" warnings.

The attached patch aims to correct this behavior so that these immediately established non-blocking connections work as expected. With the patch applied, Ice Java passes all unit tests; not much testing has been done beyond that.

To apply the source patch (ice351javaconnect.patch.txt) to a fresh Ice 3.5.1 source distribution:
cd Ice-3.5.1
patch -p1 < ice351javaconnect.patch.txt

- Michael

Comments

  • benoit
    benoit Rennes, France
    Hi Michael,

    Thanks for the information. We refactored this code in Ice 3.6b, and it now also checks the return value of the connect method.

    Cheers,
    Benoit.
  • Other POSIX platforms

    Dear Benoit,

    Are other POSIX platforms affected? The Mac OS X tcp(4) manpage indicates that it also uses the FreeBSD implementation, which would imply that this can be an issue on Darwin. Is Linux also affected?


    Thanks,
    Roger
  • benoit
    benoit Rennes, France
    Hi,

    The platforms officially supported by Ice 3.5.1 are not affected (https://www.zeroc.com/platforms_3_5_1.html). I believe this is more an issue with the selector implementation of the JVM. Most JVM implementations handle the registration of a socket for OP_CONNECT fine, even if the connection was established without blocking.

    Michael, did you get this issue on any platforms other than FreeBSD?

    Cheers,
    Benoit.
  • Hi Benoit,

    As far as I can tell, this is not about the selector implementation. I looked at this in detail in the OpenJDK source code, and the implementation is sound. I tried to reproduce the issue on Mac OS X using the Oracle JDK and on Linux (even though I didn't try too hard) and couldn't. The difference between these systems was always and only what the connect call returned.

    On Mac OS X, connect always returns 'connection pending', no matter what [caveat: my tests weren't completely thorough, but I wasn't able to provoke a single non-pending/direct connection], so select always has a state change to report and therefore won't get stuck. On FreeBSD, connect returns 'connection pending' for local connections as long as there are sufficient CPU resources (it happens less often with higher ncpu) and starts returning direct connections once the system gets slightly busy (basically one process per CPU), which is in line with both the POSIX and JDK specifications. It's those direct connections that derail Ice 3.5.1 (all versions up to and including 3.5.0 are ok), as it's waiting for the selector to report a state change that has already happened.

    Cheers,
    Michael
  • benoit
    benoit Rennes, France
    Hi Michael,

    You're right, it probably depends mostly on the system as well. In any case, this will be fixed in Ice 3.6.0: if the socket connects immediately, we will no longer register it with the selector.

    Cheers,
    Benoit.
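
For anyone who wants to check how a given platform behaves, the difference discussed in this thread (what connect returns for a local connection) can be observed with a small probe along the following lines. This is only a sketch under stated assumptions: the class ConnectProbe and the method countImmediate are hypothetical names, the listener is a throwaway socket on the loopback interface, and the result depends on the OS and on system load, so a count of zero does not prove that a platform never establishes connections immediately.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.SocketAddress;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class ConnectProbe {
    // Hypothetical helper: returns how many of `attempts` non-blocking
    // loopback connects were reported as established immediately.
    static int countImmediate(int attempts) throws IOException {
        int immediate = 0;
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            SocketAddress addr = server.getLocalAddress();
            for (int i = 0; i < attempts; i++) {
                try (SocketChannel client = SocketChannel.open()) {
                    client.configureBlocking(false);
                    if (client.connect(addr)) {
                        immediate++; // connect() returned true: no pending state
                    }
                    // The server channel is in blocking mode, so accept()
                    // waits for this connection and keeps the backlog empty.
                    try (SocketChannel accepted = server.accept()) {
                        // nothing to do with the accepted side
                    }
                }
            }
        }
        return immediate;
    }

    public static void main(String[] args) throws IOException {
        int attempts = 1000;
        int immediate = countImmediate(attempts);
        System.out.println(immediate + " of " + attempts
                + " local connects were established immediately");
    }
}
```

Running it once on an idle machine and once under load (roughly one busy process per CPU, as described above) should show whether immediate connections occur on that system.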