possible file-handle leaks?

Over the past year or so, one of our production Ice 3.2.1 deployment servers (Red Hat, kernel 2.6.9-67.0.7.ELsmp) has intermittently started leaking file descriptors. This happens rarely, but once the leak starts it continues at a rate of about 800 descriptors per 24 hours, eventually forcing us to bounce the services.

I've spent many hours looking into this leak, hoping to find the offender in our code, to no avail: no orphaned JDBC connections, files, etc. We do no Runtime.exec() or java.nio work, the traditional culprits behind orphaned file descriptors. The lsof command should make such issues easier to track down, but its output does not seem to help. A sample of the orphaned file descriptors (from lsof) is included below. Ice makes extensive use of java.nio.channels.Selector and associated constructs, so I'm wondering whether you've ever run into an intermittent bug in your Java networking stack that would orphan file descriptors like the ones shown below. Note that netstat does not show many socket connections; we just have several thousand file descriptors like these.

Thanks

Dirk

java 20678 iceuser 320u sock 0,4 15174164 can't identify protocol
java 20678 iceuser 321r FIFO 0,7 15174167 pipe
java 20678 iceuser 322w FIFO 0,7 15174167 pipe
java 20678 iceuser 323r 0000 0,8 0 15174168 eventpoll
java 20678 iceuser 324w FIFO 0,7 15180537 pipe
java 20678 iceuser 325r 0000 0,8 0 15180538 eventpoll
java 20678 iceuser 326w FIFO 0,7 15377407 pipe
java 20678 iceuser 327u sock 0,4 15218695 can't identify protocol
java 20678 iceuser 328r FIFO 0,7 15218698 pipe
java 20678 iceuser 329u sock 0,4 15196493 can't identify protocol
java 20678 iceuser 330r FIFO 0,7 15196496 pipe
java 20678 iceuser 331w FIFO 0,7 15196496 pipe
java 20678 iceuser 332r 0000 0,8 0 15196497 eventpoll
java 20678 iceuser 333u sock 0,4 15207312 can't identify protocol
java 20678 iceuser 334r FIFO 0,7 15207315 pipe
java 20678 iceuser 335u sock 0,4 15183104 can't identify protocol
java 20678 iceuser 336r FIFO 0,7 15183107 pipe
java 20678 iceuser 337w FIFO 0,7 15183107 pipe
java 20678 iceuser 338r 0000 0,8 0 15183108 eventpoll
java 20678 iceuser 339w FIFO 0,7 15207315 pipe
java 20678 iceuser 340r 0000 0,8 0 15207316 eventpoll
java 20678 iceuser 341w FIFO 0,7 15218698 pipe
java 20678 iceuser 342r 0000 0,8 0 15218699 eventpoll
java 20678 iceuser 343w FIFO 0,7 15292396 pipe
java 20678 iceuser 344r 0000 0,8 0 15292397 eventpoll
java 20678 iceuser 345u sock 0,4 15048874 can't identify protocol
java 20678 iceuser 346r 0000 0,8 0 15377408 eventpoll
java 20678 iceuser 347r FIFO 0,7 15048877 pipe
java 20678 iceuser 348w FIFO 0,7 15048877 pipe
java 20678 iceuser 349r 0000 0,8 0 15048878 eventpoll
java 20678 iceuser 350u sock 0,4 15291771 can't identify protocol
java 20678 iceuser 351r FIFO 0,7 15291774 pipe
java 20678 iceuser 352u sock 0,4 15216746 can't identify protocol
java 20678 iceuser 353u sock 0,4 15205816 can't identify protocol
java 20678 iceuser 354r FIFO 0,7 15205819 pipe
java 20678 iceuser 355r FIFO 0,7 15201056 pipe
java 20678 iceuser 356w FIFO 0,7 15201056 pipe
java 20678 iceuser 357r 0000 0,8 0 15201057 eventpoll
java 20678 iceuser 358w FIFO 0,7 15205819 pipe
java 20678 iceuser 359r 0000 0,8 0 15205820 eventpoll
java 20678 iceuser 360r FIFO 0,7 15216749 pipe
java 20678 iceuser 361w FIFO 0,7 15216749 pipe
java 20678 iceuser 362r 0000 0,8 0 15216750 eventpoll
java 20678 iceuser 363w FIFO 0,7 15291774 pipe
java 20678 iceuser 364r 0000 0,8 0 15291775 eventpoll
java 20678 iceuser 365u sock 0,4 15378259 can't identify protocol
java 20678 iceuser 366u sock 0,4 15389044 can't identify protocol
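
With several thousand entries like these, it can help to tally the lsof lines by kind to see whether sockets, pipes and eventpoll descriptors grow together. The following is a minimal sketch, not part of the original post: the class name LsofTally and the category labels are hypothetical, and it only looks at the trailing description on each line, in the layout of the sample above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LsofTally {
    /** Classifies one lsof line by its trailing description (labels are illustrative). */
    static String kind(String line) {
        String t = line.trim();
        if (t.endsWith("eventpoll")) return "eventpoll";
        if (t.endsWith("pipe")) return "pipe";
        if (t.endsWith("can't identify protocol")) return "unidentified socket";
        return "other";
    }

    /** Counts non-blank lines per category, sorted by category name. */
    static Map<String, Integer> tally(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) continue;
            counts.merge(kind(line), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Four lines copied from the sample above; in practice, read the full lsof dump.
        List<String> sample = Arrays.asList(
            "java 20678 iceuser 320u sock 0,4 15174164 can't identify protocol",
            "java 20678 iceuser 321r FIFO 0,7 15174167 pipe",
            "java 20678 iceuser 322w FIFO 0,7 15174167 pipe",
            "java 20678 iceuser 323r 0000 0,8 0 15174168 eventpoll");
        System.out.println(tally(sample));
    }
}
```

If the pipe count stays at roughly twice the eventpoll count as the leak grows, that would point at whole selector-like objects being orphaned rather than individual files.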

Comments

  • benoit (Rennes, France)
    Hi Dirk,

    Ice for Java 3.2.1 does indeed extensively use java.nio.channels.Selector objects. However, if connections are correctly closed (which appears to be the case for you as netstat only reports few connections), these selectors should be correctly destroyed as well.

    I did a bit of research on this and found two interesting links:

    Basically, it looks like the JVM creates per-thread selectors to handle blocking I/O. If the threads are destroyed (which could be happening if you use Ice dynamic thread pools or make invocations from short-lived threads), these per-thread selectors "leak" until a full GC occurs (which under normal circumstances should be rare).

    Could this be what is occurring in your case? Do you use dynamic thread pools or make invocations from short-lived threads? If you use dynamic thread pools, you could try increasing the initial size of the thread pool to reduce the number of threads that Ice creates dynamically.

    I realize that's probably not something you can do in the short term, but upgrading to Ice 3.3.0 would most likely help here. It no longer uses any blocking I/O and creates only a few java.nio.channels.Selector objects (now just one per thread pool).

    Cheers,
    Benoit.
  • Thanks for the response.

    We do indeed make many short-lived requests, and the thread pool hosting these requests is not large enough to accommodate normal request fluctuations, so the Ice runtime ends up creating and reaping many threads. We will raise the minimum size of the thread pool at the next possible opportunity and see if the problem resolves itself.

    I had seen the Java bug, but dismissed it because we are running 1.6. But if it persisted from 1.4 through 1.5, it could certainly still be present in 1.6. The profile and lsof output certainly match our case.

    Thanks

    Dirk
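
To check whether raising the thread pool size actually stops the growth, the descriptor count can be watched from inside the JVM. The sketch below is not from the thread: the class name SelectorFdFootprint is hypothetical, and it assumes a JVM on a Unix platform that exposes com.sun.management.UnixOperatingSystemMXBean (otherwise openFds() returns -1). It does not reproduce the per-thread selector leak itself; it only shows that every open java.nio.channels.Selector holds descriptors until it is closed, which on Linux typically appear in lsof as an eventpoll entry plus a pipe pair, as in the sample above.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.nio.channels.Selector;
import java.util.ArrayList;
import java.util.List;

public class SelectorFdFootprint {
    /** Returns this process's open descriptor count, or -1 if the JVM does not expose it. */
    static long openFds() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            return ((com.sun.management.UnixOperatingSystemMXBean) os)
                    .getOpenFileDescriptorCount();
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        int n = 10; // arbitrary number of selectors, chosen for illustration
        long before = openFds();

        List<Selector> selectors = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            selectors.add(Selector.open());
        }
        long held = openFds();

        // Closing a selector releases its descriptors; an orphaned one keeps them
        // until it is closed or reclaimed.
        for (Selector s : selectors) {
            s.close();
        }
        long after = openFds();

        System.out.printf("before=%d, with %d selectors=%d, after close=%d%n",
                before, n, held, after);
    }
}
```

Logging openFds() periodically in the server would give a trend line to compare against the roughly 800 descriptors per day seen during a leak, without having to run lsof by hand.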