[Bug 61831] New: NIO2 connector becomes intermittently unresponsive after some period of time

Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=61831

            Bug ID: 61831
           Summary: NIO2 connector becomes intermittently unresponsive
                    after some period of time
           Product: Tomcat 8
           Version: 8.0.47
          Hardware: All
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Connectors
          Assignee: [hidden email]
          Reporter: [hidden email]
  Target Milestone: ----

Created attachment 35564
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=35564&action=edit
jstack thread dump

We are observing a scenario in which the NIO2 connector on Tomcat becomes
unresponsive after some period of time, while the NIO connector running on the
same host still processes the same requests and serves traffic. Only a server
restart helps in this case.
The issue is intermittent. In our current infrastructure we have a few nodes
behind a load balancer, and it happens from time to time (about once per week)
on each node, so it does not appear to be node- or hardware-specific.
Below is our server.xml:

    <Executor name="tomcatThreadPool" namePrefix="catalina-exec-"
              maxThreads="800" minSpareThreads="100"/>


      <Connector executor="tomcatServiceThreadPool"
                 port="8080"
                 protocol="org.apache.coyote.http11.Http11Nio2Protocol"
                 connectionTimeout="1000"
                 enableLookups="false"
                 acceptorThreadCount="1"
                 processorCache="800"
                 socket.tcpNoDelay="true"
                 socket.soKeepAlive="true"
                 socket.soLingerOn="false"
                 compression="256"
                 compressableMimeType="text/html,text/xml,text/plain,application/x-protobuf,application/json,application/javascript"
                 URIEncoding="UTF-8" />

      <!-- The load balancer terminates SSL connections and
           then forwards them to the following connector as
           normal HTTP (non-secure) requests
       -->
      <Connector executor="tomcatServiceThreadPool"
                 port="8443"
                 protocol="org.apache.coyote.http11.Http11NioProtocol"
                 connectionTimeout="1000"
                 enableLookups="false"
                 connectionLinger="-1"
                 acceptorThreadCount="20"
                 processorCache="800"
                 socket.tcpNoDelay="true"
                 socket.soKeepAlive="true"
                 socket.soLingerOn="false"
                 compression="256"
                 compressableMimeType="text/html,text/xml,text/plain,application/x-protobuf,application/json,application/javascript"
                 URIEncoding="UTF-8" />


      <!-- Define an AJP 1.3 Connector on port 8009 -->
      <Connector port="8009" protocol="AJP/1.3" redirectPort="8443" />

Also below is an example of the behavior we observe:

curl -verbose 'http://localhost:8080/rs?id=nio2issue'
* About to connect() to localhost port 8080 (#0)
*   Trying 127.0.0.1... connected
* Connected to localhost (127.0.0.1) port 8080 (#0)
> GET /rs?id=nio2issue HTTP/1.1
> User-Agent: curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.15.3 zlib/1.2.3 libidn/1.18 libssh2/1.4.2
> Host: localhost:8080
> Accept: */*
> Referer: rbose
>
* Closing connection #0
* Failure when receiving data from the peer
curl: (56) Failure when receiving data from the peer

At the same time, the NIO connector on port 8443 responds:

curl -i 'http://localhost:8443/rs?id=nio2issue'
HTTP/1.1 302 Found
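
For anyone who wants to run the same check from a monitoring job rather than by hand, here is a minimal Java sketch of the comparison above. It is only an illustration: the class name `ConnectorProbe`, the `Outcome` categories and the timeout values are assumptions made for this example, and it does not reproduce curl's exact error classification. It sends a bare GET and reports whether any response bytes came back.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;

public class ConnectorProbe {

    /** Outcome categories; the names are illustrative, not part of Tomcat or curl. */
    public enum Outcome { RESPONDED, CLOSED_WITHOUT_DATA, TIMED_OUT, IO_ERROR }

    /** Sends a minimal GET and reports whether any response bytes came back. */
    public static Outcome probe(String host, int port, String path, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            socket.setSoTimeout(timeoutMillis); // applies to the read below

            String request = "GET " + path + " HTTP/1.1\r\n"
                    + "Host: " + host + ":" + port + "\r\n"
                    + "Connection: close\r\n\r\n";
            OutputStream out = socket.getOutputStream();
            out.write(request.getBytes(StandardCharsets.US_ASCII));
            out.flush();

            InputStream in = socket.getInputStream();
            int first = in.read(); // -1 means the peer closed without sending anything
            return first == -1 ? Outcome.CLOSED_WITHOUT_DATA : Outcome.RESPONDED;
        } catch (SocketTimeoutException e) {
            return Outcome.TIMED_OUT;   // connect or read exceeded the timeout
        } catch (IOException e) {
            return Outcome.IO_ERROR;    // refused, reset, or another socket failure
        }
    }

    public static void main(String[] args) {
        // Ports taken from the server.xml above: 8080 is NIO2, 8443 is NIO.
        for (int port : new int[] {8080, 8443}) {
            System.out.println(port + " -> " + probe("localhost", port, "/rs?id=nio2issue", 5000));
        }
    }
}
```

On a node in the bad state one would expect the two ports to report different outcomes, which gives a simple signal for taking the node out of the load balancer.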

No unusual errors are logged to catalina.out at the time of the incident.
A thread dump from the server is attached.
We observed the same behavior on Tomcat 8.0.18 and upgraded to 8.0.47, the
latest version in the same release line, but it didn't help.

Please let me know what else might be helpful. For now we are keeping one of
the servers in this state so that we can gather data, since the issue is
intermittent and we were not able to reproduce it with a simple load test.

Regards,
Oleg.

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Oleg <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[hidden email]


Remy Maucherat <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |WORKSFORME

--- Comment #1 from Remy Maucherat <[hidden email]> ---
The thread dump looks perfect: the acceptor thread is blocking on accept, and
all threads are idle and ready to execute something. Please investigate on the
users list to get at least some idea of how to reproduce it.
If possible, avoid using a custom executor; it makes things more complex and
the benefit is usually not obvious.
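
As a rough illustration of the "all threads idle" reading of a thread dump, the Java sketch below counts threads by state for a given name prefix. The class name `ThreadStateSummary` is hypothetical, and the code only sees threads of the JVM it runs in, so it would have to be invoked from inside the Tomcat process (for example from a diagnostic servlet); it is not a replacement for jstack. The default prefix `catalina-exec-` is taken from the Executor definition in the report.

```java
import java.util.EnumMap;
import java.util.Map;

public class ThreadStateSummary {

    /** Counts live threads whose name starts with the prefix, grouped by state. */
    public static Map<Thread.State, Integer> countByState(String namePrefix) {
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.getName().startsWith(namePrefix)) {
                counts.merge(t.getState(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prefix from the report's Executor; pass another prefix as the first argument.
        String prefix = args.length > 0 ? args[0] : "catalina-exec-";
        System.out.println(prefix + " -> " + countByState(prefix));
    }
}
```

A pool in which every worker is WAITING or TIMED_WAITING while clients get no response suggests that requests never reach the workers, which matches the symptom described in this report.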

Oleg <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|WORKSFORME                  |---
             Status|RESOLVED                    |REOPENED

--- Comment #2 from Oleg <[hidden email]> ---
Hi,

I realize that the thread dump might look fine, and this is the most confusing
part: even a simple curl command from the same host receives no response from
this connector, and it works fine again after a Tomcat restart. At the same
time, Tomcat overall looks healthy and the other connector works fine. Since
this happens from time to time on different servers, it doesn't look like an OS
or hardware issue but something specific to the Tomcat NIO2 connector.
When we send any request to the NIO2 connector in this bad state, no thread is
triggered in Tomcat. Looking at the Tomcat source code, it appears that a
CountDownLatch was simply not updated and the service hangs because of this,
but the root cause is still not clear.

What additional information can we provide to help investigate this issue
together with the Apache Tomcat dev team?

Also, I'm not sure about your remark about a custom executor: we don't use a
custom one, we just configure the one from Tomcat.

Regards,
Oleg.
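
To picture the "no thread is triggered" symptom: with the JDK's asynchronous channels, a connection reaches the application only when the pending accept operation is completed. The Java sketch below is not Tomcat's code; the class `Nio2AcceptSketch` and the method `acceptOnce` are hypothetical names used for illustration. It waits on the `Future` returned by `AsynchronousServerSocketChannel.accept()` with a timeout, so a caller can tell "accept completed" apart from "accept still pending".

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.channels.AsynchronousServerSocketChannel;
import java.nio.channels.AsynchronousSocketChannel;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class Nio2AcceptSketch {

    /**
     * Waits for one connection. Returns true if the accept completed in time,
     * false if the Future was still pending when the timeout expired.
     * The accept is deliberately left pending on timeout, not cancelled.
     */
    public static boolean acceptOnce(AsynchronousServerSocketChannel server, long timeoutMillis)
            throws IOException, InterruptedException, ExecutionException {
        Future<AsynchronousSocketChannel> pending = server.accept();
        try (AsynchronousSocketChannel client = pending.get(timeoutMillis, TimeUnit.MILLISECONDS)) {
            return true;  // the accepted channel is closed on exit from this block
        } catch (TimeoutException e) {
            return false; // nothing completed the accept within the timeout
        }
    }

    public static void main(String[] args) throws Exception {
        // Case 1: a client connects first, so the accept can complete.
        try (AsynchronousServerSocketChannel server = AsynchronousServerSocketChannel.open()
                .bind(new InetSocketAddress("127.0.0.1", 0))) {
            int port = ((InetSocketAddress) server.getLocalAddress()).getPort();
            try (Socket client = new Socket("127.0.0.1", port)) {
                System.out.println("with client: accepted=" + acceptOnce(server, 1000));
            }
        }
        // Case 2: nobody connects, so the accept stays pending.
        try (AsynchronousServerSocketChannel idle = AsynchronousServerSocketChannel.open()
                .bind(new InetSocketAddress("127.0.0.1", 0))) {
            System.out.println("no client: accepted=" + acceptOnce(idle, 200));
        }
    }
}
```

If the accept were never completed even though clients connect, a thread waiting on it without a timeout would look exactly like an idle acceptor in a thread dump, while clients would see no response.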

Remy Maucherat <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |NEEDINFO


Piotr <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |RESOLVED
         Resolution|---                         |INVALID

--- Comment #3 from Piotr <[hidden email]> ---
We think we have traced this to a Java bug in the asynchronous server socket
implementation.

Please see the following bug report, which seems to exhibit a similar issue:
https://bugs.openjdk.java.net/browse/JDK-8172750

--- Comment #4 from Remy Maucherat <[hidden email]> ---
OK, maybe. Let us know if you find any evidence demonstrating an issue in
Tomcat.
