IceInternal::ThreadPool::destroy assertion failure

We are debugging a crash in an application that uses Ice 2.1.0 for Windows XP. When in debug mode, we get the following stack:

msvcr71d.dll!_NMSG_WRITE(int rterrnum=10) Line 195 C
msvcr71d.dll!abort() Line 44 + 0x7 C
msvcr71d.dll!_assert(const char * expr=0x1013eb44, const char * filename=0x1013eb30, unsigned int lineno=152) Line 306 C
ice21d.dll!IceInternal::ThreadPool::destroy() Line 152 + 0x2c C++
> ice21d.dll!Ice::ObjectAdapterI::waitForDeactivate() Line 282 C++
rgsender.exe!`HprCore::SessionIce::removeAdapter'::`2'::AdapterDeactivationThread::run() Line 397 + 0x31 C++
iceutil21d.dll!startHook(void * arg=0x011d41f0) Line 203 + 0x1d C++
msvcr71d.dll!_threadstartex(void * ptd=0x011b3ab8) Line 241 + 0xd C
kernel32.dll!77e7d28e()
ntdll.dll!77f589f2()

Also, in the same file (ThreadPool.cpp), we sometimes see an infinite loop where an exception is printed out at line 352 - this seems to be the product of ::select repeatedly failing. This occurs when we shut down sockets outside of the program or unplug network cables.

Do you have any suggestions/insights/ideas as to why this might be happening?

Comments

  • Could these errors be related to bad socket state? In both cases, the errors are related to ungraceful connection loss. Is the socket in such a bad state that the EventHandler is unable to process events and the repeated ::select results in error?
  • mes
    mes California
    Hi Roland,

    What do you mean when you say "This occurs when we shut down sockets outside of the program"?

    Is your application manipulating file descriptors directly?

    Also, it would be useful for us to see the exception printed by line 352.

    Take care,
    - Mark
  • For defect 2 mentioned above (the infinite loop), here is the output.

    .\ThreadPool.cpp:352: Ice::SocketException:
    socket exception: WSAENOTSOCK
    error: exception in `afe516b0-aa9a-4474-a851-d0adfb6a2039.ThreadPool':
    .\ThreadPool.cpp:352: Ice::SocketException:
    socket exception: WSAENOTSOCK
    error: exception in `fd057a0a-0f6a-442c-92b8-a4f885eecb81.ThreadPool':
    .\ThreadPool.cpp:352: Ice::SocketException:
    socket exception: WSAENOTSOCK
    error: exception in `8c18d909-36e1-4b0e-8fb1-c962cf843f99.ThreadPool':
    .\ThreadPool.cpp:352: Ice::SocketException:
    socket exception: WSAENOTSOCK
    error: exception in `afe516b0-aa9a-4474-a851-d0adfb6a2039.ThreadPool':
    .\ThreadPool.cpp:352: Ice::SocketException:
    socket exception: WSAENOTSOCK
    error: exception in `fd057a0a-0f6a-442c-92b8-a4f885eecb81.ThreadPool':
    .\ThreadPool.cpp:352: Ice::SocketException:
    socket exception: WSAENOTSOCK
    error: exception in `8c18d909-36e1-4b0e-8fb1-c962cf843f99.ThreadPool':
    .\ThreadPool.cpp:352: Ice::SocketException:
    socket exception: WSAENOTSOCK
    error: exception in `afe516b0-aa9a-4474-a851-d0adfb6a2039.ThreadPool':
    .\ThreadPool.cpp:352: Ice::SocketException:
    socket exception: WSAENOTSOCK
    error: exception in `fd057a0a-0f6a-442c-92b8-a4f885eecb81.ThreadPool':
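    A minimal POSIX sketch, not Ice's code, of one way a select loop can end up in this state: while an invalid descriptor stays in the set passed to select, every call fails immediately, so a loop that only logs and retries never makes progress. The log above is from Windows, where the error is WSAENOTSOCK; the POSIX counterpart used here is EBADF. The function name countSelectFailures is hypothetical.

    ```cpp
    #include <cassert>
    #include <cerrno>
    #include <sys/select.h>
    #include <unistd.h>

    // Calls select() `attempts` times on a descriptor that has already been
    // closed and returns how many of those calls failed with EBADF.
    // Returns -1 if the test descriptor could not be created.
    int countSelectFailures(int attempts)
    {
        assert(attempts >= 0);

        int fds[2];
        if (pipe(fds) != 0)
        {
            return -1;
        }
        const int staleFd = fds[0];
        close(fds[0]); // staleFd no longer refers to an open descriptor
        close(fds[1]);

        int failures = 0;
        for (int i = 0; i < attempts; ++i)
        {
            fd_set readSet;
            FD_ZERO(&readSet);
            FD_SET(staleFd, &readSet); // the stale descriptor is still registered

            timeval timeout = {0, 0};  // poll; do not block
            const int rc = select(staleFd + 1, &readSet, nullptr, nullptr, &timeout);
            if (rc == -1 && errno == EBADF)
            {
                ++failures; // retrying without removing staleFd cannot succeed
            }
        }
        return failures;
    }
    ```

    In this sketch the only way out of the failure is to drop the stale descriptor from the set before calling select again.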
  • Hi Mes, we unplug network cables or use an application called TCPView to close down specific connections. The behavior appears to be random. Sometimes we don't see it for a long time.
  • mes
    mes California
    Thanks Roland, we'll look into it. As always, if you think of anything else that might help us reproduce or resolve the problem, please let us know.

    - Mark
  • Hi Mes, I just got a debuggable Ice build going and have a breakpoint set on line 347. It appears that we're stuck in an infinite loop in ThreadPool::run, as select always comes back with a return value of -1 and an error of WSAENOTSOCK.

    We're using TCPView to close down the connection from another system. Not sure this is relevant, but here is what I'm doing.

    This is a bi-dir connection that shouldn't be used at the time of the disconnect.

    Prior to deactivating the object adapter, hold, waitForHold, and deactivate had been invoked on the object adapter.

    The object adapter has a per object adapter thread pool of size 1.

    This is a tcp connection.

    Active connection management is disabled, which is the reason I'm invoking waitForDeactivate. I don't think that the internal sockets were being cleaned up unless waitForDeactivate was invoked when ACM is disabled.

    deactivate and waitForDeactivate are actually invoked from a small Ice thread that only has these commands as its body. The reason is that when a catastrophic failure occurs, like a socket being closed or a network cable being pulled, adapter->waitForDeactivate took a little while to return or time out. I didn't want my program to be waiting on these commands, which I used to invoke synchronously. So now I invoke hold and waitForHold, but then spin deactivate and waitForDeactivate in a separate thread to allow my program to continue.

    Regards --Roland
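
    The sequence described in this post can be sketched with the standard library alone. AdapterOps below is a hypothetical stand-in for the four object adapter calls, not the Ice API, and shutdownInBackground is an assumed name; the point is only the split between the synchronous part and the part handed to another thread.

    ```cpp
    #include <cassert>
    #include <functional>
    #include <future>

    // Hypothetical stand-in for the four object adapter calls named in the post.
    // Each member is just a callable supplied by the caller.
    struct AdapterOps
    {
        std::function<void()> hold;
        std::function<void()> waitForHold;
        std::function<void()> deactivate;
        std::function<void()> waitForDeactivate;
    };

    // Stops dispatching synchronously, then runs the potentially slow
    // deactivate/waitForDeactivate pair on another thread. The returned future
    // becomes ready when the background work has finished.
    std::future<void> shutdownInBackground(const AdapterOps& ops)
    {
        assert(ops.hold && ops.waitForHold && ops.deactivate && ops.waitForDeactivate);

        ops.hold();
        ops.waitForHold(); // synchronous part: the caller continues only after this

        return std::async(std::launch::async, [ops]() {
            ops.deactivate();
            ops.waitForDeactivate(); // may take a while after a connection failure
        });
    }
    ```

    The caller should keep the returned future: a future obtained from std::async blocks in its destructor until the task completes, so discarding it would make the call synchronous again.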
  • marc
    marc Florida
    I don't fully understand why you call hold and waitForHold, and why there is a need to spin if you have a deactivate / waitForDeactivate combination. After calling deactivate, you call waitForDeactivate exactly once, and there is no need for any loop.

    However, I also have to admit that hold and waitForHold are probably the two most untested functions in the Ice OA. In fact, we were thinking about removing them from future versions, because so far nobody used them, and they are tricky to implement and test.
  • Hi Marc, it turned out that we didn't invoke deactivate and waitForDeactivate until recently either. We noticed that if we invoked deactivate and waitForDeactivate after a catastrophic failure had occurred, we could end up waiting a while for the adapter to close. I never went to root cause on this, but just assumed that since waitForDeactivate needed to shut down the endpoints and the socket had undergone a failure, closing down the adapter was taking a little while. So what we ended up doing was spinning the waitForDeactivate off in a separate thread.

    However, I needed to ensure that no more invocations were being dispatched by the object adapter before continuing, since these invocations could refer to other objects that I'm going to delete next. So I couldn't just invoke waitForDeactivate in a separate thread; I needed to stop the adapter first, so I used hold and waitForHold. This all appeared to work very well. hold and waitForHold would be invoked synchronously, but would not touch the socket, so they would return immediately. Then I would spin a thread off to invoke waitForDeactivate.

    We're in the process of putting waitForDeactivate back in and seeing what this does again.

    Regards --Roland
  • Hi Marc & Mes, what we've done is replace hold and waitForHold with removing the objects from the object adapter, while still using a separate thread to invoke deactivate and waitForDeactivate.

    What we're wondering is whether removing an object from the object adapter executes synchronously. That is, after adapter->remove returns, are all invocations on the object complete, and are pending invocations to the object discarded? If so, this could replace our use of hold and waitForHold, since we're just going to deactivate the object adapter anyway. We've already made this change in our code, and with our limited testing so far we don't seem to be running into the problems reported above. So it is looking like hold and waitForHold are possibly causing the issue, but it is still a little early.

    We did go back and re-verify that deactivating an object adapter when there was a network failure does wait around a while. Using hold and waitForHold does not wait.

    If adapter->remove does what we want then we're probably in good shape.

    Regards --Roland
  • marc
    marc Florida
    If you remove Ice objects from the ASM, then this only affects new requests, i.e., requests that arrive after the call to remove(). However, requests that have been started before might still be in progress when remove() returns.

    You are right, waitForDeactivate() might take some time, depending on how the connection failed. However, it should not take longer than the maximum timeout configured for the object adapter's endpoints.

    I would like to better understand your application in order to give advice. First, how do you detect a catastrophic failure? Usually many clients connect to an OA, and the connection from every client could fail. How do you detect this? And once you detect this, why do you shut down the OA at all? Won't this kill your communications with all other clients? Or do you have something like an OA-per-client setup? If so, why?
  • Hi Marc, we have multiple object adapters per client. We detect when a connection has failed using two methods:

    1. Iterating over the list of all object adapters and doing an ice_ping on a oneway proxy at regular intervals in a thread.

    2. Using a variant of the method that Michi Henning presented in the June issue of Connections.

    In many cases, since an object adapter has a one-to-one association with a connection, we want to close down the object adapter when a failure is detected. The reason for creating an object adapter per connection is that we have a media processing application, and we want each media stream to operate completely in parallel with the other media streams, but serialized with respect to itself. So our object adapters are created with individual thread pools of size one. We basically don't want one media stream, like audio, to be backed up behind graphics. There were a couple of alternatives we considered:

    1. Create a single object adapter and then keep track of a sequence id for each media type, to ensure serialization wrt media streams, and put many threads in the object adapter. At the top of each function ensure that the method invocation is dispatched.

    2. Use a single object adapter and create our own internal dispatch mechanism that would dispatch to an internal queue with its own internal thread pool. This just seemed like re-creating the part that Ice provides, on top of being less efficient.

    3. Do nothing. Don't try to process media streams in parallel.

    One thing we just learned is that bi-dir connections on the client side use a single client thread pool, so in fact our design is currently not working. I guess we would have to use a per-connection communicator in addition to the object adapter to give each object adapter its own client thread pool in this case.

    It would be really cool if we could create thread pools per object, which I guess would be most similar to option 2 above.

    What we're thinking of doing temporarily to work around what appears to be an issue with hold and waitForHold is to create a synchronous remove, using a monitor that is notified from the destructor of the object held by the object adapter, as follows:
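
    For comparison, option 2 can be sketched as one small work queue per media stream, each with its own thread: tasks on the same queue run in order, and different queues run in parallel. This is a standard-library illustration under assumed names (SerialQueue, post); it is not an Ice facility and leaves out how requests would be handed over from the dispatch thread.

    ```cpp
    #include <cassert>
    #include <condition_variable>
    #include <deque>
    #include <functional>
    #include <mutex>
    #include <thread>
    #include <utility>

    // Hypothetical per-stream work queue: tasks posted to one SerialQueue run in
    // posting order on that queue's own thread.
    class SerialQueue
    {
    public:
        SerialQueue() : m_stopping(false), m_worker(&SerialQueue::run, this) {}

        // Runs any tasks still queued, then stops the worker thread.
        ~SerialQueue()
        {
            {
                std::lock_guard<std::mutex> lock(m_mutex);
                m_stopping = true;
            }
            m_ready.notify_one();
            m_worker.join();
        }

        void post(std::function<void()> task)
        {
            assert(task);
            {
                std::lock_guard<std::mutex> lock(m_mutex);
                m_tasks.push_back(std::move(task));
            }
            m_ready.notify_one();
        }

    private:
        void run()
        {
            for (;;)
            {
                std::function<void()> task;
                {
                    std::unique_lock<std::mutex> lock(m_mutex);
                    m_ready.wait(lock, [this]() { return m_stopping || !m_tasks.empty(); });
                    if (m_tasks.empty())
                    {
                        return; // stopping and nothing left to do
                    }
                    task = std::move(m_tasks.front());
                    m_tasks.pop_front();
                }
                task(); // run outside the lock so post() never waits on a task
            }
        }

        std::mutex m_mutex;
        std::condition_variable m_ready;
        std::deque<std::function<void()>> m_tasks;
        bool m_stopping;
        std::thread m_worker; // declared last so the other members exist before it starts
    };
    ```

    With one SerialQueue for audio and another for graphics, a slow graphics task would delay only later graphics tasks.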

    void Session::removeSync()
    {
        // A simple thread class that removes an object from an object adapter.
        class ServantDeactivationThread : public IceUtil::Thread
        {
        public:
            ServantDeactivationThread(const Ice::ObjectAdapterPtr& adapter,
                                      const Ice::Identity& objectId) :
                m_adapter(adapter), m_objectId(objectId)
            {
            }

            virtual void run()
            {
                m_adapter->remove(m_objectId);
            }

        private:
            const Ice::ObjectAdapterPtr m_adapter;
            const Ice::Identity m_objectId;
        };

        // Lock the object monitor before starting the thread, so that the
        // notification cannot be issued before we are waiting for it.
        IceUtil::Monitor<IceUtil::RecMutex>::Lock lock(m_objectMonitor);

        // Create a servant deactivation thread. m_adapter and m_objectId are
        // assumed to be members of Session.
        IceUtil::ThreadPtr servantDeactivationThread =
            new ServantDeactivationThread(m_adapter, m_objectId);

        // Start the thread.
        servantDeactivationThread->start();

        // Wait for the object to be deleted. The monitor will be notified
        // when the object's reference count is 0 and the Ice runtime deletes
        // the object. The destructor for the object invokes notifyAll on the
        // monitor.
        m_objectMonitor.wait();
    }

    void Session::notifyObjectDeleted()
    {
        IceUtil::Monitor<IceUtil::RecMutex>::Lock lock(m_objectMonitor);
        m_objectMonitor.notifyAll();
    }

    SomeObject::~SomeObject()
    {
        m_session.notifyObjectDeleted();
    }

    Using this approach (I haven't quite tried the code yet), when removeSync() is invoked by the main application it will lock a monitor (m_objectMonitor), launch a thread to remove the object, and then wait on the monitor. remove() will go ahead and remove the object from the object adapter asynchronously. When the object's reference count reaches 0, the object will be deleted. In our case the object is only added to one object adapter.

    Our preliminary testing (we actually beat on it quite hard) using just remove worked out well, but I realize there is a hole with this that I'm trying to plug by creating a synchronous remove, which replaces the hold/waitForHold or deactivate/waitForDeactivate invocations.

    I hope this works since I'm running out of ideas. Please let me know if you have any better ones.

    Regards --Roland
  • Please disregard my last post on using the ServantDeactivationThread. I'm not exactly sure why, but this didn't work. I knew I should have tried it first. Actually, I end up in an infinite wait. I suspect that the object isn't deleted right away when its reference count reaches 0. This is probably left for the garbage collector or for when waitForDeactivate is invoked. I'm not sure which, but the destructor for the object was never invoked, so notifyAll is never invoked.

    Regards --Roland
  • bernard
    bernard Jupiter, FL
    Hi Roland,

    In C++ a servant is deleted immediately when its reference count reaches 0. Obviously the ASM holds a reference while the servant is registered, and the object adapter holds another reference while a request is being dispatched.

    Coming back to hold/waitForHold vs deactivate/waitForDeactivate, have you considered using a servant locator instead?

    With a servant locator, you could easily:
    - ensure that no new request can be dispatched to your servants
    - wait until outstanding requests have completed (i.e. finished was called on the servant locator)

    Cheers,
    Bernard
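
    The two points above come down to a small piece of bookkeeping, sketched here with the standard library only. DispatchGate and its member names are hypothetical, not part of Ice; the assumption is that a servant locator would call tryEnter() from locate() and leave() from finished().

    ```cpp
    #include <cassert>
    #include <condition_variable>
    #include <mutex>

    // Hypothetical bookkeeping a servant locator could delegate to.
    class DispatchGate
    {
    public:
        DispatchGate() : m_closed(false), m_inFlight(0) {}

        // Returns false once the gate is closed, so no new request is dispatched.
        bool tryEnter()
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            if (m_closed)
            {
                return false;
            }
            ++m_inFlight;
            return true;
        }

        // Marks one previously admitted request as completed.
        void leave()
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            assert(m_inFlight > 0);
            if (--m_inFlight == 0)
            {
                m_idle.notify_all();
            }
        }

        // Refuses new requests, then blocks until outstanding ones have completed.
        void closeAndWait()
        {
            std::unique_lock<std::mutex> lock(m_mutex);
            m_closed = true;
            m_idle.wait(lock, [this]() { return m_inFlight == 0; });
        }

        // Number of requests admitted but not yet completed.
        int inFlight() const
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            return m_inFlight;
        }

    private:
        mutable std::mutex m_mutex;
        std::condition_variable m_idle;
        bool m_closed;
        int m_inFlight;
    };
    ```

    closeAndWait() must not be called from inside a request that was admitted through the same gate, since that request would then be waiting for itself to finish.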
  • Hi Bernard, Thanks for the feedback. Actually, the long posting about using the monitor in the destructor is working for us now. I had a reference to the object which was preventing the destructor from being invoked, even though adapter->remove had been invoked. Once my reference was removed the destructor ran and everything appears to be working, almost exactly as listed above. So basically what I have is a way of doing a synchronous remove. I guess this design is dependent on a 1-to-1 relationship between the object and servant, which I have right now.

    I've eliminated hold and waitForHold and I'm not seeing any more Ice issues.

    Thanks for the pointer on using servant locators. I've just started to read up on this, so it will take me a while to work through the details and understand it better, but it looks promising and could replace using the monitor in the destructor.

    Regards --Roland
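
    The effect of the leftover reference can be reproduced in miniature. In this sketch std::shared_ptr stands in for Ice's reference-counted servant pointers, and Servant and Registry are hypothetical names: removing the servant from the registry drops only the registry's reference, so the destructor runs only once the application's own reference is gone as well.

    ```cpp
    #include <cassert>
    #include <map>
    #include <memory>
    #include <string>

    // Hypothetical servant whose destructor reports its own destruction, in the
    // same spirit as the notifyObjectDeleted() call earlier in the thread.
    struct Servant
    {
        explicit Servant(bool& destroyed) : m_destroyed(destroyed) {}
        ~Servant() { m_destroyed = true; }
        bool& m_destroyed;
    };

    // Hypothetical stand-in for the adapter's servant map: it holds one
    // reference for each registered servant.
    class Registry
    {
    public:
        void add(const std::string& id, const std::shared_ptr<Servant>& servant)
        {
            assert(servant);
            m_servants[id] = servant;
        }

        // Drops the registry's reference; other holders keep the servant alive.
        void remove(const std::string& id) { m_servants.erase(id); }

    private:
        std::map<std::string, std::shared_ptr<Servant>> m_servants;
    };
    ```

    If the application adds a servant, keeps its own shared_ptr to it, and then calls remove(), the destroyed flag stays false until that shared_ptr is reset. Any wait that depends on the destructor would block for exactly that long.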