[Bug 61183] New: Deadlock occurs while sending to a closing session

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[Bug 61183] New: Deadlock occurs while sending to a closing session

Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61183

            Bug ID: 61183
           Summary: Deadlock occurs while sending to a closing session
           Product: Tomcat 8
           Version: 8.0.33
          Hardware: PC
            Status: NEW
          Severity: regression
          Priority: P2
         Component: WebSocket
          Assignee: [hidden email]
          Reporter: [hidden email]
  Target Milestone: ----

Hi,
We have recently ran into a deadlock situation in a customer site while using
websockets.
One of the locks is in our code while the other is the Tomcat "stateLock"
defined in "class WsSession".

I have marked this issue as a regression because one of the 2 locations that
Tomcat takes the stateLock in 8.0.33 did not exist in 8.0.29. Here's a quick
look at the differences before examining the deadlocked call stacks:

In 8.0.33 there's an expansion of the stateLock usage:
    protected void registerFuture(FutureToSendHandler f2sh) {
        boolean fail = false;
        synchronized (stateLock) {
                        // leaving out the details...
                }
        }

While in 8.0.29 the entire method looks like:
    protected void registerFuture(FutureToSendHandler f2sh) {
        futures.put(f2sh, f2sh);
    }

I can understand why the stateLock usage has been expanded. But I think there
is a problem with this. Previously we'd only enter stateLock in Tomcat ->
Application calls. Now with the new registerFuture we also can enter stateLock
in Application -> Tomcat calls.
So a recent change seems to have introduced a key ingredient for deadlocks. If
my application has its own lock there is the potential for 2 locks to be
acquired in reverse order by 2 simultaneous threads.

We have experienced a real-world example of this:

In the 1st thread Tomcat is closing the session after an abrupt disconnect from
the client. The close action bubbles up to the Application onClose handler:

"http-nio-8445-exec-76" daemon prio=5 tid=249 BLOCKED
                at
com.myc.common.domain.websocketutil.WsOutbound.setSocketStatus(WsOutbound.java:152)
--> Waits on the Application local lock
                   Local Variable:
com.myc.common.domain.websocketutil.WsOutbound$SocketStatus#2
                at
com.myc.customer.web.socket.WebSocketEndpoint.onClose(WebSocketEndpoint.java:224)
 --> Application onClose handler called which has to do some tidy up
                   Local Variable: javax.websocket.CloseReason#2
                   Local Variable:
com.myc.customer.web.socket.WebSocketEndpoint#4
                   Local Variable: java.lang.Object#9184
                at sun.reflect.GeneratedMethodAccessor250.invoke(<unknown
string>)
                at sun.reflect.DelegatingMethodAccessorImpl.invoke(<unknown
string>)
                at java.lang.reflect.Method.invoke(<unknown string>)
                at
org.apache.tomcat.websocket.pojo.PojoEndpointBase.onClose(PojoEndpointBase.java:107)
                   Local Variable:
org.apache.tomcat.websocket.pojo.PojoEndpointServer#6
                at
org.apache.tomcat.websocket.WsSession.fireEndpointOnClose(WsSession.java:541)
                   Local Variable:
org.apache.catalina.loader.WebappClassLoader#2
                   Local Variable:
org.apache.catalina.core.DefaultInstanceManager#3
                at
org.apache.tomcat.websocket.WsSession.doClose(WsSession.java:490)              
     --> Acquires a “state lock” on Tomcats call to close the session
                at
org.apache.tomcat.websocket.server.WsHttpUpgradeHandler.onError(WsHttpUpgradeHandler.java:150)
                   Local Variable:
org.apache.tomcat.websocket.server.WsHttpUpgradeHandler#6
                at
org.apache.tomcat.websocket.server.WsHttpUpgradeHandler.access$300(WsHttpUpgradeHandler.java:48)


In the 2nd thread the Application is simply sending a normal message to Tomcat
(unaware the session is closing simultaneously):

"defaultEventExecutorGroup-3-1" prio=5 tid=94 BLOCKED
                at
org.apache.tomcat.websocket.WsSession.registerFuture(WsSession.java:650)      
   --> Waits on the Tomcat “stateLock”
                   Local Variable: org.apache.tomcat.websocket.WsSession#6
                at
org.apache.tomcat.websocket.FutureToSendHandler.get(FutureToSendHandler.java:104)
                   Local Variable:
org.apache.tomcat.websocket.FutureToSendHandler#247
                   Local Variable: java.util.concurrent.TimeUnit$4#1
                at
org.apache.tomcat.websocket.FutureToSendHandler.get(FutureToSendHandler.java:31)
                at
com.myc.common.domain.websocketutil.WsOutbound.await(WsOutbound.java:255)
                at
com.myc.common.domain.websocketutil.WsOutbound.sendObj(WsOutbound.java:234)
                at
com.myc.common.domain.websocketutil.WsOutbound.send(WsOutbound.java:215)      
 --> Acquires the Application local lock while sending (to guarantee message
order when sending multiple messages)
                   Local Variable:
com.myc.customer.model.json.notification.NewMessageNotification#3

Our interface to Tomcat (wsOutbound.java) takes the Application lock using
class "synchronized" as follows:

    public synchronized void setSocketStatus(SocketStatus status) {
                // Does some state change stuff. Can be invoked from onClose()
        }

        public synchronized boolean send(Object obj) {
                // Sends one or more messages to Tomcat
        }

So we had a simple locking strategy. Now I have to come up with a message
queuing strategy so that I can send multiple messages, in order, without
holding an Application lock.

To my mind the root problem is in the existing Tomcat code for doClose(). In
other deadlock situations I've been involved with the onus has been on the
thread that invokes a callback/listener to exit any locks it holds first:

    private void doClose(CloseReason closeReasonMessage,
            CloseReason closeReasonLocal) {
        // Double-checked locking. OK because state is volatile
        if (state != State.OPEN) {
            return;
        }

        synchronized (stateLock) {
            if (state != State.OPEN) {
                return;
            }

            if (log.isDebugEnabled()) {
                log.debug(sm.getString("wsSession.doClose", id));
            }
            try {
                wsRemoteEndpoint.setBatchingAllowed(false);
            } catch (IOException e) {
                log.warn(sm.getString("wsSession.flushFailOnClose"), e);
                                                                               
        <-- Exit stateLock first?
                fireEndpointOnError(e);
            }

            state = State.CLOSING;

            sendCloseMessage(closeReasonMessage);
                                                                               
        <-- Exit stateLock first?
            fireEndpointOnClose(closeReasonLocal);
                                                                               
        <-- Reenter stateLock?
            state = State.CLOSED;
        }

I took a quick look at the latest 8.0.x (8.0.44) code and it appears the same
issue can happen.
I'd appreciate any comment on my analysis or improvements in coding our
application.

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[Bug 61183] Deadlock occurs while sending to a closing session

Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61183

--- Comment #1 from Lorcan O'Toole <[hidden email]> ---
Created attachment 35050
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=35050&action=edit
Description in txt

Call stacks and comments might be easier to read with .txt

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[Bug 61183] Deadlock occurs while sending to a closing session

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61183

Mark Thomas <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 OS|                            |All

--- Comment #2 from Mark Thomas <[hidden email]> ---
The analysis looks to be spot on. The tricky part is likely to be fixing this
without re-introducing the bugs those locks were put in place to fix.

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[Bug 61183] Deadlock occurs while sending to a closing session

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61183

Mark Thomas <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #3 from Mark Thomas <[hidden email]> ---
I spent a lot of time trying to avoid calling the callbacks with the lock held.
In the end, I opted to refactor registerFuture().

Fixed in:
- trunk for 9.0.0.M22 onwards
- 8.5.x for 8.5.16 onwards
- 8.0.x for 8.0.45 onwards
- 7.0.x for 7.0.79 onwards

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[Bug 61183] Deadlock occurs while sending to a closing session

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61183

--- Comment #4 from Lorcan O'Toole <[hidden email]> ---
Thanks Mark,
I assume the refactoring of registerFuture involved avoiding stateLock. I would
imagine this problem should be reproducible with the right test harness.

Have you any view to when these patches will be made available?
regards,
Lorcan

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[Bug 61183] Deadlock occurs while sending to a closing session

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61183

--- Comment #5 from Mark Thomas <[hidden email]> ---
I'm building the 9.0.0.M22 release now. 8.5.x will follow shortly. Releases
take 4-5 days, mainly because of the minimum 72 hour voting period and allowing
24 hours for the mirrors to sync.

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Loading...