Tomcat 9.0.29 - HTTPS threads age, max connections reached, Tomcat not responding on 8443


Tomcat 9.0.29 - HTTPS threads age, max connections reached, Tomcat not responding on 8443

David
In the last two weeks I've had two occurrences where a single CentOS 7
production server hosting a public webpage has become unresponsive. The
first time, all 300 available "https-jsse-nio-8443" threads were consumed,
with the max age being around 45 minutes, and all in an "S" status. This
time all 300 were consumed in "S" status, with the oldest being around
16 minutes. A restart of Tomcat on both occasions freed these threads and
the website became responsive again. The connections are POST/GET requests
which shouldn't take very long at all.

CPU/MEM/JVM all appear to be within normal operating limits. I've not had
much luck finding articles on this behavior, nor remedies for it. As far
as I can tell, the default timeout values are used both in Tomcat and in
the applications that run within it. Hopefully someone will have some
insight into why this behavior could be occurring: why isn't Tomcat
killing the connections? Even in a RST/ACK state, shouldn't Tomcat
terminate the connection without an ACK from the client after the default
timeout?

Is there a graceful way to script the termination of threads in case
Tomcat isn't able to for whatever reason? My research into killing threads
turns up system threads or application threads, not Tomcat Connector
connection threads, so I'm not sure this is even viable. I'm also looking
into ways to terminate these aged sessions via the F5. At this time I'm
open to any suggestions that would automate a resolution and keep the
system from experiencing downtime, or any insight on where to look for a
root cause. Thanks in advance for any guidance you can lend.

Thanks, David

Re: Tomcat 9.0.29 - HTTPS threads age, max connections reached, Tomcat not responding on 8443

Christopher Schultz-2

David,

On 8/27/20 10:48, David wrote:

> In the last two weeks I've had two occurrences where a single
> CentOS 7 production server hosting a public webpage has become
> unresponsive. The first time, all 300 available
> "https-jsse-nio-8443" threads were consumed, with the max age being
> around 45minutes, and all in a "S" status. This time all 300 were
> consumed in "S" status with the oldest being around ~16minutes. A
> restart of Tomcat on both occasions freed these threads and the
> website became responsive again. The connections are post/get
> methods which shouldn't take very long at all.
>
> CPU/MEM/JVM all appear to be within normal operating limits. I've
> not had much luck searching for articles for this behavior nor
> finding remedies. The default timeout values are used in both
> Tomcat and in the applications that run within as far as I can
> tell. Hopefully someone will have some insight on why the behavior
> could be occurring, why isn't Tomcat killing the connections? Even
> in a RST/ACK status, shouldn't Tomcat terminate the connection
> without an ACK from the client after the default timeout?

Can you please post:

1. Complete Tomcat version
2. Connector configuration (possibly redacted)

> Is there a graceful way to script the termination of threads in
> case Tomcat isn't able to for whatever reason?

Not really.

> My research for killing threads results in system threads or
> application threads, not Tomcat Connector connection threads, so
> I'm not sure if this is even viable. I'm also looking into ways to
> terminate these aged sessions via the F5. At this time I'm open to
>  any suggestions that would be able to automate a resolution to
> keep the system from experiencing downtime, or for any insight on
> where to look for a root cause. Thanks in advance for any guidance
> you can lend.
It might actually be the F5 itself, especially if it opens up a large
number of connections to Tomcat and then tries to open additional ones
for some reason. If it opens 300 connections (which are then e.g.
leaked by the F5 internally) but the 301st is refused, then your
server is essentially inert from that point forward.

NIO connectors default to max 10k connections, so that's not likely the
actual problem here, but it could be for some configurations.

Do you have a single F5 or a group of them?

-chris

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: Tomcat 9.0.29 - HTTPS threads age, max connections reached, Tomcat not responding on 8443

David
On Thu, Aug 27, 2020 at 12:35 PM Christopher Schultz
<[hidden email]> wrote:

> David,
>
> On 8/27/20 10:48, David wrote:
> > In the last two weeks I've had two occurrences where a single
> > CentOS 7 production server hosting a public webpage has become
> > unresponsive. The first time, all 300 available
> > "https-jsse-nio-8443" threads were consumed, with the max age being
> > around 45minutes, and all in a "S" status. This time all 300 were
> > consumed in "S" status with the oldest being around ~16minutes. A
> > restart of Tomcat on both occasions freed these threads and the
> > website became responsive again. The connections are post/get
> > methods which shouldn't take very long at all.
> >
> > CPU/MEM/JVM all appear to be within normal operating limits. I've
> > not had much luck searching for articles for this behavior nor
> > finding remedies. The default timeout values are used in both
> > Tomcat and in the applications that run within as far as I can
> > tell. Hopefully someone will have some insight on why the behavior
> > could be occurring, why isn't Tomcat killing the connections? Even
> > in a RST/ACK status, shouldn't Tomcat terminate the connection
> > without an ACK from the client after the default timeout?
>
> Can you please post:
>
> 1. Complete Tomcat version
I can't find anything more granular than 9.0.29; is there a command to
show a sub-patch level?
> 2. Connector configuration (possibly redacted)
This is the 8443 section of the server.xml *8080 is available during
the outage and I'm able to curl the management page to see the 300
used threads, their status, and age*
  <Service name="Catalina">

    <!--The connectors can use a shared executor, you can define one
or more named thread pools-->
    <!--
    <Executor name="tomcatThreadPool" namePrefix="catalina-exec-"
        maxThreads="150" minSpareThreads="4"/>
    -->


    <!-- A "Connector" represents an endpoint by which requests are received
         and responses are returned. Documentation at :
         Java HTTP Connector: /docs/config/http.html
         Java AJP  Connector: /docs/config/ajp.html
         APR (HTTP/AJP) Connector: /docs/apr.html
         Define a non-SSL/TLS HTTP/1.1 Connector on port 8080
    -->
    <Connector port="8080" protocol="HTTP/1.1"
               connectionTimeout="20000"
               redirectPort="8443" />
    <!-- A "Connector" using the shared thread pool-->
    <!--
    <Connector executor="tomcatThreadPool"
               port="8080" protocol="HTTP/1.1"
               connectionTimeout="20000"
               redirectPort="8443" />
    -->
    <!-- Define an SSL/TLS HTTP/1.1 Connector on port 8443
         This connector uses the NIO implementation. The default
         SSLImplementation will depend on the presence of the APR/native
         library and the useOpenSSL attribute of the
         AprLifecycleListener.
         Either JSSE or OpenSSL style configuration may be used regardless of
         the SSLImplementation selected. JSSE style configuration is used below.
    -->
    <Connector port="8443" protocol="org.apache.coyote.http11.Http11NioProtocol"
        maxThreads="300" SSLEnabled="true" >
        <SSLHostConfig>
            <Certificate
certificateKeystoreFile="/opt/apache-tomcat-9.0.29/redacted.jks"
            certificateKeystorePassword="redacted"
            type="RSA" />
        </SSLHostConfig>
    </Connector>
    <!-- Define an SSL/TLS HTTP/1.1 Connector on port 8443 with HTTP/2
         This connector uses the APR/native implementation which always uses
         OpenSSL for TLS.
         Either JSSE or OpenSSL style configuration may be used. OpenSSL style
         configuration is used below.
    -->
    <Connector port="8443" protocol="org.apache.coyote.http11.Http11NioProtocol"
        maxThreads="300" SSLEnabled="true" >
        <SSLHostConfig protocols="TLSv1.2">
            <Certificate
certificateKeystoreFile="/opt/apache-tomcat-9.0.29/redacted.jks"
            certificateKeystorePassword="redacted"
            type="RSA" />
        </SSLHostConfig>
    </Connector>

>
> > Is there a graceful way to script the termination of threads in
> > case Tomcat isn't able to for whatever reason?
>
> Not really.
>
> > My research for killing threads results in system threads or
> > application threads, not Tomcat Connector connection threads, so
> > I'm not sure if this is even viable. I'm also looking into ways to
> > terminate these aged sessions via the F5. At this time I'm open to
> >  any suggestions that would be able to automate a resolution to
> > keep the system from experiencing downtime, or for any insight on
> > where to look for a root cause. Thanks in advance for any guidance
> > you can lend.
> It might actually be the F5 itself, especially if it opens up a large
> number of connections to Tomcat and then tries to open additional ones
> for some reason. If it opens 300 connections (which are then e.g.
> leaked by the F5 internally) but the 301st is refused, then your
> server is essentially inert from that point forward.
>
> NIO connectors default to max 10k connections so that's not likely the
> actual problem, here, but it could be for some configurations.
>
> Do you have a single F5 or a group of them?
A group of them, several HA pairs depending on internal or external
and application.  This server is behind one HA pair and is a single
server.
>
> -chris
Thank you Chris!
David



Re: Tomcat 9.0.29 - HTTPS threads age, max connections reached, Tomcat not responding on 8443

markt
On 27/08/2020 18:57, David wrote:
> On Thu, Aug 27, 2020 at 12:35 PM Christopher Schultz
> <[hidden email]> wrote:
>>>> Is there a graceful way to script the termination of threads in
>>>> case Tomcat isn't able to for whatever reason?
>
> Not really.

What you can do is take a thread dump when this happens so you can see
what the threads are doing. That should provide some insight to where
the problem is.
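A minimal sketch of taking such a dump with jstack; the pgrep pattern is
an assumption and should be adjusted to however Tomcat is launched on
your box:

```shell
# Take a thread dump of the running Tomcat JVM with jstack.
# The pgrep pattern is an assumption; adjust to how your Tomcat starts.
PID=$(pgrep -f org.apache.catalina.startup.Bootstrap | head -n 1)
if [ -z "$PID" ]; then
  echo "no Tomcat process found"
  exit 0
fi
jstack "$PID" > "/tmp/tomcat-threads-$(date +%Y%m%d-%H%M%S).txt"
echo "dump written"
```

Alternatively, `kill -3 $PID` writes the same dump to catalina.out.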

Mark


Re: Tomcat 9.0.29 - HTTPS threads age, max connections reached, Tomcat not responding on 8443

Felix Schumacher
In reply to this post by Christopher Schultz-2

Am 27.08.20 um 19:35 schrieb Christopher Schultz:

> David,
>
> On 8/27/20 10:48, David wrote:
> > In the last two weeks I've had two occurrences where a single
> > CentOS 7 production server hosting a public webpage has become
> > unresponsive. The first time, all 300 available
> > "https-jsse-nio-8443" threads were consumed, with the max age being
> > around 45minutes, and all in a "S" status. This time all 300 were
> > consumed in "S" status with the oldest being around ~16minutes. A
> > restart of Tomcat on both occasions freed these threads and the
> > website became responsive again. The connections are post/get
> > methods which shouldn't take very long at all.
>
> > CPU/MEM/JVM all appear to be within normal operating limits. I've
> > not had much luck searching for articles for this behavior nor
> > finding remedies. The default timeout values are used in both
> > Tomcat and in the applications that run within as far as I can
> > tell. Hopefully someone will have some insight on why the behavior
> > could be occurring, why isn't Tomcat killing the connections? Even
> > in a RST/ACK status, shouldn't Tomcat terminate the connection
> > without an ACK from the client after the default timeout?
>
> Can you please post:
>
> 1. Complete Tomcat version
> 2. Connector configuration (possibly redacted)
>
> > Is there a graceful way to script the termination of threads in
> > case Tomcat isn't able to for whatever reason?
>
> Not really.

(First look at Marks response on determining the root cause)

Well, there might be a way (if it is sane, I don't know). You can
configure a valve to look for seemingly stuck threads and try to
interrupt them:

http://tomcat.apache.org/tomcat-9.0-doc/config/valve.html#Stuck_Thread_Detection_Valve

There are a few caveats there. First, it only works when both
conditions are true:

 * the servlets are synchronous
 * the stuck thread can be "freed" with an Interrupt

But really, if your threads are stuck for more than 15 minutes, you have
ample time to take a thread dump and hopefully find the root cause,
so that you don't need this valve.
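Configured, the valve looks something like this sketch (it goes inside a
Host or Context element; the threshold values are illustrative, not
recommendations):

```xml
<!-- Illustrative: warn about threads stuck >10 min,
     attempt an interrupt after 15 min -->
<Valve className="org.apache.catalina.valves.StuckThreadDetectionValve"
       threshold="600"
       interruptThreadThreshold="900" />
```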

Felix

>
> > My research for killing threads results in system threads or
> > application threads, not Tomcat Connector connection threads, so
> > I'm not sure if this is even viable. I'm also looking into ways to
> > terminate these aged sessions via the F5. At this time I'm open to
> >  any suggestions that would be able to automate a resolution to
> > keep the system from experiencing downtime, or for any insight on
> > where to look for a root cause. Thanks in advance for any guidance
> > you can lend.
> It might actually be the F5 itself, especially if it opens up a large
> number of connections to Tomcat and then tries to open additional ones
> for some reason. If it opens 300 connections (which are then e.g.
> leaked by the F5 internally) but the 301st is refused, then your
> server is essentially inert from that point forward.
>
> NIO connectors default to max 10k connections so that's not likely the
> actual problem, here, but it could be for some configurations.
>
> Do you have a single F5 or a group of them?
>
> -chris
>

Re: Tomcat 9.0.29 - HTTPS threads age, max connections reached, Tomcat not responding on 8443

Christopher Schultz-2
In reply to this post by David

David,

On 8/27/20 13:57, David wrote:

> On Thu, Aug 27, 2020 at 12:35 PM Christopher Schultz
> <[hidden email]> wrote:
>>
> David,
>
> On 8/27/20 10:48, David wrote:
>>>> In the last two weeks I've had two occurrences where a
>>>> single CentOS 7 production server hosting a public webpage
>>>> has become unresponsive. The first time, all 300 available
>>>> "https-jsse-nio-8443" threads were consumed, with the max age
>>>> being around 45minutes, and all in a "S" status. This time
>>>> all 300 were consumed in "S" status with the oldest being
>>>> around ~16minutes. A restart of Tomcat on both occasions
>>>> freed these threads and the website became responsive again.
>>>> The connections are post/get methods which shouldn't take
>>>> very long at all.
>>>>
>>>> CPU/MEM/JVM all appear to be within normal operating limits.
>>>> I've not had much luck searching for articles for this
>>>> behavior nor finding remedies. The default timeout values are
>>>> used in both Tomcat and in the applications that run within
>>>> as far as I can tell. Hopefully someone will have some
>>>> insight on why the behavior could be occurring, why isn't
>>>> Tomcat killing the connections? Even in a RST/ACK status,
>>>> shouldn't Tomcat terminate the connection without an ACK from
>>>> the client after the default timeout?
>
> Can you please post:
>
> 1. Complete Tomcat version
>> I can't find anything more granular than 9.0.29, is there a
>> command to show a sub patch level?

9.0.29 is the patch-level, so that's fine. You are about 10 versions
out of date (~1 year). Any chance for an upgrade?
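As an aside, you can print the exact build string with the ServerInfo
utility shipped in catalina.jar; the install path below is assumed from
the config you posted:

```shell
# Print the exact Tomcat version; CATALINA_HOME assumed from the post.
CATALINA_HOME=/opt/apache-tomcat-9.0.29
if [ ! -f "$CATALINA_HOME/lib/catalina.jar" ]; then
  echo "catalina.jar not found under $CATALINA_HOME"
  exit 0
fi
java -cp "$CATALINA_HOME/lib/catalina.jar" org.apache.catalina.util.ServerInfo
```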

> 2. Connector configuration (possibly redacted)
>> This is the 8443 section of the server.xml *8080 is available
>> during the outage and I'm able to curl the management page to see
>> the 300 used threads, their status, and age* <Service
>> name="Catalina">
>>
>> [snip]
>>
>> <Connector port="8080" protocol="HTTP/1.1"
>> connectionTimeout="20000" redirectPort="8443" /> [snip]
>> <Connector port="8443"
>> protocol="org.apache.coyote.http11.Http11NioProtocol"
>> maxThreads="300" SSLEnabled="true" > <SSLHostConfig>
>> <Certificate
>> certificateKeystoreFile="/opt/apache-tomcat-9.0.29/redacted.jks"
>> certificateKeystorePassword="redacted" type="RSA" />
>> </SSLHostConfig> </Connector> [snip] <Connector port="8443"
>> protocol="org.apache.coyote.http11.Http11NioProtocol"
>> maxThreads="300" SSLEnabled="true" > <SSLHostConfig
>> protocols="TLSv1.2"> <Certificate
>> certificateKeystoreFile="/opt/apache-tomcat-9.0.29/redacted.jks"
>> certificateKeystorePassword="redacted" type="RSA" />
>> </SSLHostConfig> </Connector>

What, two connectors on one port? Do you get errors when starting?

I don't see anything obviously problematic in the above configuration
(other than the double-definition of the 8443 connector).
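If the duplication is accidental, a single merged definition would be a
sketch like this (keeping the TLSv1.2 restriction from the second copy;
keystore values redacted as in your post):

```xml
<Connector port="8443" protocol="org.apache.coyote.http11.Http11NioProtocol"
           maxThreads="300" SSLEnabled="true">
    <SSLHostConfig protocols="TLSv1.2">
        <Certificate
            certificateKeystoreFile="/opt/apache-tomcat-9.0.29/redacted.jks"
            certificateKeystorePassword="redacted"
            type="RSA" />
    </SSLHostConfig>
</Connector>
```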

300 tied-up connections (from your initial report) sounds like a
significant number: probably the thread count.

Mark (as is often the case) is right: take some thread dumps next time
everything locks up and see what all those threads are doing. Often,
it's something like everything is awaiting on a db connection and the
db pool has been exhausted or something. Relatively simple quick-fixes
are available for that, and better, longer-term fixes as well.

> Do you have a single F5 or a group of them?
>> A group of them, several HA pairs depending on internal or
>> external and application.  This server is behind one HA pair and
>> is a single server.

Okay. Just remember that each F5 can make some large number of
connections to Tomcat, so you need to make sure you can handle them.

This was a much bigger deal back in the BIO days when thread limit =
connection limit, and the thread limit was usually something like 250
to 300. NIO is much better, and the default connection limit is 10k,
which "ought to be enough for anyone"[1].

-chris

[1] With apologies to Bill Gates, who apparently never said anything
of the sort.

Re: Tomcat 9.0.29 - HTTPS threads age, max connections reached, Tomcat not responding on 8443

Christopher Schultz-2
In reply to this post by Felix Schumacher

Felix,

On 8/27/20 16:09, Felix Schumacher wrote:

>
> Am 27.08.20 um 19:35 schrieb Christopher Schultz:
>> David,
>>
>> On 8/27/20 10:48, David wrote:
>>> In the last two weeks I've had two occurrences where a single
>>> CentOS 7 production server hosting a public webpage has become
>>> unresponsive. The first time, all 300 available
>>> "https-jsse-nio-8443" threads were consumed, with the max age
>>> being around 45minutes, and all in a "S" status. This time all
>>> 300 were consumed in "S" status with the oldest being around
>>> ~16minutes. A restart of Tomcat on both occasions freed these
>>> threads and the website became responsive again. The
>>> connections are post/get methods which shouldn't take very long
>>> at all.
>>
>>> CPU/MEM/JVM all appear to be within normal operating limits.
>>> I've not had much luck searching for articles for this behavior
>>> nor finding remedies. The default timeout values are used in
>>> both Tomcat and in the applications that run within as far as I
>>> can tell. Hopefully someone will have some insight on why the
>>> behavior could be occurring, why isn't Tomcat killing the
>>> connections? Even in a RST/ACK status, shouldn't Tomcat
>>> terminate the connection without an ACK from the client after
>>> the default timeout?
>>
>> Can you please post:
>>
>> 1. Complete Tomcat version 2. Connector configuration (possibly
>> redacted)
>>
>>> Is there a graceful way to script the termination of threads
>>> in case Tomcat isn't able to for whatever reason?
>>
>> Not really.
>
> (First look at Marks response on determining the root cause)
>
> Well, there might be a way (if it is sane, I don't know). You can
> configure a valve to look for seemingly stuck threads and try to
> interrupt them:
>
> http://tomcat.apache.org/tomcat-9.0-doc/config/valve.html#Stuck_Thread_Detection_Valve

>
> There are a few caveats there. First, it only works when
> both conditions are true:
>
> * the servlets are synchronous
> * the stuck thread can be "freed" with an Interrupt
>
> But really, if your threads are stuck for more than 15 minutes, you
> have ample time to take a thread dump and hopefully find the
> root cause, so that you don't need this valve.

This is a good idea as a band-aid, but the reality is that if you need
the StuckThreadDetectionValve then your application is probably broken
and needs to be fixed.

Here are things that can be broken which might cause thread exhaustion:

1. Poor resource management. Things like db connection pools which
can leak and/or not be refilled by the application. Everything stops
when the db pool dries up.

2. Failure to set proper IO timeouts. Guess what the default read
timeout is on a socket? Forever! If you read from a socket you might
never hear back. Sounds like a problem. Set your read timeouts, kids.
You might need to do this on your HTTP connections (and pools, and
factories, and connection-wrappers like Apache http-client), your
database config (usually in the config URL), and any remote-API
libraries you are using (which use e.g. HTTP under the hood).
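To illustrate point 2, here is a self-contained sketch: a client reading
from a peer that never responds blocks forever unless a read timeout is
set (the 500 ms value and the class name are arbitrary choices for the
demo):

```java
import java.io.InputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// A local peer that accepts the connection but never writes simulates a
// hung remote service; without setSoTimeout() the read() blocks forever.
public class ReadTimeoutDemo {
    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(0); // ephemeral port
             Socket client = new Socket("localhost", server.getLocalPort())) {
            client.setSoTimeout(500); // fail the read after 500 ms
            InputStream in = client.getInputStream();
            try {
                in.read(); // no data ever arrives
                System.out.println("unexpected data");
            } catch (SocketTimeoutException e) {
                System.out.println("read timed out after 500 ms");
            }
        }
    }
}
```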

-chris

Re: Tomcat 9.0.29 - HTTPS threads age, max connections reached, Tomcat not responding on 8443

David
In reply to this post by Christopher Schultz-2
Thank you all for the replies!

On Thu, Aug 27, 2020 at 3:53 PM Christopher Schultz
<[hidden email]> wrote:

>
>
> David,
>
> On 8/27/20 13:57, David wrote:
> > On Thu, Aug 27, 2020 at 12:35 PM Christopher Schultz
> > <[hidden email]> wrote:
> >>
> > David,
> >
> > On 8/27/20 10:48, David wrote:
> >>>> In the last two weeks I've had two occurrences where a
> >>>> single CentOS 7 production server hosting a public webpage
> >>>> has become unresponsive. The first time, all 300 available
> >>>> "https-jsse-nio-8443" threads were consumed, with the max age
> >>>> being around 45minutes, and all in a "S" status. This time
> >>>> all 300 were consumed in "S" status with the oldest being
> >>>> around ~16minutes. A restart of Tomcat on both occasions
> >>>> freed these threads and the website became responsive again.
> >>>> The connections are post/get methods which shouldn't take
> >>>> very long at all.
> >>>>
> >>>> CPU/MEM/JVM all appear to be within normal operating limits.
> >>>> I've not had much luck searching for articles for this
> >>>> behavior nor finding remedies. The default timeout values are
> >>>> used in both Tomcat and in the applications that run within
> >>>> as far as I can tell. Hopefully someone will have some
> >>>> insight on why the behavior could be occurring, why isn't
> >>>> Tomcat killing the connections? Even in a RST/ACK status,
> >>>> shouldn't Tomcat terminate the connection without an ACK from
> >>>> the client after the default timeout?
> >
> > Can you please post:
> >
> > 1. Complete Tomcat version
> >> I can't find anything more granular than 9.0.29, is there a
> >> command to show a sub patch level?
>
> 9.0.29 is the patch-level, so that's fine. You are about 10 versions
> out of date (~1 year). Any chance for an upgrade?

They had to re-dev many apps last year when we upgraded from, I want to
say, 1 or 3 or something equally horrific. Hopefully they are
forward-compatible with the newer releases, and if not it should surely
be tackled now rather than later. I will certainly bring this to the table!

>
> > 2. Connector configuration (possibly redacted)
> >> This is the 8443 section of the server.xml *8080 is available
> >> during the outage and I'm able to curl the management page to see
> >> the 300 used threads, their status, and age* <Service
> >> name="Catalina">
> >>
> >> [snip]
> >>
> >> <Connector port="8080" protocol="HTTP/1.1"
> >> connectionTimeout="20000" redirectPort="8443" /> [snip]
> >> <Connector port="8443"
> >> protocol="org.apache.coyote.http11.Http11NioProtocol"
> >> maxThreads="300" SSLEnabled="true" > <SSLHostConfig>
> >> <Certificate
> >> certificateKeystoreFile="/opt/apache-tomcat-9.0.29/redacted.jks"
> >> certificateKeystorePassword="redacted" type="RSA" />
> >> </SSLHostConfig> </Connector> [snip] <Connector port="8443"
> >> protocol="org.apache.coyote.http11.Http11NioProtocol"
> >> maxThreads="300" SSLEnabled="true" > <SSLHostConfig
> >> protocols="TLSv1.2"> <Certificate
> >> certificateKeystoreFile="/opt/apache-tomcat-9.0.29/redacted.jks"
> >> certificateKeystorePassword="redacted" type="RSA" />
> >> </SSLHostConfig> </Connector>
>
> What, two connectors on one port? Do you get errors when starting?
No errors. One is the one commented "with HTTP/2"; should I delete the former?
>
> I don't see anything obviously problematic in the above configuration
> (other than the double-definition of the 8443 connector).
>
> 300 tied-up connections (from your initial report) sounds like a
> significant number: probably the thread count.
Yes sir, that's the NIO thread count for the 8443 connector.
>
> Mark (as is often the case) is right: take some thread dumps next time
> everything locks up and see what all those threads are doing. Often,
> it's something like everything is awaiting on a db connection and the
> db pool has been exhausted or something. Relatively simple quick-fixes
> are available for that, and better, longer-term fixes as well.
>
Mark/Chris, is there a way to dump the connector threads specifically?
Or is it all contained in one machine/process thread dump? Sorry, I'm
not really a Linux guy.
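For what it's worth, the connector threads are ordinary JVM threads named
after the connector, so filtering a full jstack dump by name isolates
them; a sketch (the pgrep pattern is an assumption, and the 25-line
window per thread is arbitrary):

```shell
# Show only the 8443 connector's threads from a full jstack dump.
# The pgrep pattern is an assumption; adjust to how your Tomcat starts.
PID=$(pgrep -f org.apache.catalina.startup.Bootstrap | head -n 1)
if [ -z "$PID" ]; then
  echo "no Tomcat process found"
  exit 0
fi
# Print each matching thread header plus the next 25 lines of its stack.
jstack "$PID" | awk '/"https-jsse-nio-8443/{show=25} show-- > 0'
```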

> > Do you have a single F5 or a group of them?
> >> A group of them, several HA pairs depending on internal or
> >> external and application.  This server is behind one HA pair and
> >> is a single server.
>
> Okay. Just remember that each F5 can make some large number of
> connections to Tomcat, so you need to make sure you can handle them.
>
> This was a much bigger deal back in the BIO days when thread limit =
> connection limit, and the thread limit was usually something like 250
> - - 300. NIO is much better, and the default connection limit is 10k
> which "ought to be enough for anyone"[1].
(lol)

I'm more used to the 1-to-1 of the BIO style, which kinda confused me
when I asked the F5 to truncate >X connections and alert me, and there
were 600+ connections while Tomcat manager stated ~30. Then I read
what the non-interrupt was about.
>
>
>
> [1] With apologies to Bill gates, who apparently never said anything
> of the sort.

Thanks again,
David

> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>



Re: Tomcat 9.0.29 - HTTPS threads age, max connections reached, Tomcat not responding on 8443

Christopher Schultz-2

David,

On 8/27/20 17:14, David wrote:

> Thank you all for the replies!
>
> On Thu, Aug 27, 2020 at 3:53 PM Christopher Schultz
> <[hidden email]> wrote:
>>
> David,
>
> On 8/27/20 13:57, David wrote:
>>>> On Thu, Aug 27, 2020 at 12:35 PM Christopher Schultz
>>>> <[hidden email]> wrote:
>>>>>
>>>> David,
>>>>
>>>> On 8/27/20 10:48, David wrote:
>>>>>>> In the last two weeks I've had two occurrences where a
>>>>>>> single CentOS 7 production server hosting a public
>>>>>>> webpage has become unresponsive. The first time, all
>>>>>>> 300 available "https-jsse-nio-8443" threads were
>>>>>>> consumed, with the max age being around 45minutes, and
>>>>>>> all in a "S" status. This time all 300 were consumed in
>>>>>>> "S" status with the oldest being around ~16minutes. A
>>>>>>> restart of Tomcat on both occasions freed these threads
>>>>>>> and the website became responsive again. The
>>>>>>> connections are post/get methods which shouldn't take
>>>>>>> very long at all.
>>>>>>>
>>>>>>> CPU/MEM/JVM all appear to be within normal operating
>>>>>>> limits. I've not had much luck searching for articles
>>>>>>> for this behavior nor finding remedies. The default
>>>>>>> timeout values are used in both Tomcat and in the
>>>>>>> applications that run within as far as I can tell.
>>>>>>> Hopefully someone will have some insight on why the
>>>>>>> behavior could be occurring, why isn't Tomcat killing
>>>>>>> the connections? Even in a RST/ACK status, shouldn't
>>>>>>> Tomcat terminate the connection without an ACK from the
>>>>>>> client after the default timeout?
>>>>
>>>> Can you please post:
>>>>
>>>> 1. Complete Tomcat version
>>>>> I can't find anything more granular than 9.0.29, is there
>>>>> a command to show a sub patch level?
>
> 9.0.29 is the patch-level, so that's fine. You are about 10
> versions out of date (~1 year). Any chance for an upgrade?
>
>> They had to re-dev many apps last year when we upgraded from I
>> want to say 1 or 3 or something equally as horrific.  Hopefully
>> they are forward compatible with the newer releases and if not
>> should surely be tackled now before later, I will certainly bring
>> this to the table!

I've rarely been bitten by an upgrade from foo.bar.x to foo.bar.y.
There is a recent caveat if you are using the AJP connector, but you
are not so it's not an issue for you.

>>>> 2. Connector configuration (possibly redacted)
>>>>> This is the 8443 section of the server.xml *8080 is
>>>>> available during the outage and I'm able to curl the
>>>>> management page to see the 300 used threads, their status,
>>>>> and age* <Service name="Catalina">
>>>>>
>>>>> [snip]
>>>>>
>>>>> <Connector port="8080" protocol="HTTP/1.1"
>>>>> connectionTimeout="20000" redirectPort="8443" /> [snip]
>>>>> <Connector port="8443"
>>>>> protocol="org.apache.coyote.http11.Http11NioProtocol"
>>>>> maxThreads="300" SSLEnabled="true" > <SSLHostConfig>
>>>>> <Certificate
>>>>> certificateKeystoreFile="/opt/apache-tomcat-9.0.29/redacted.jks"
>>>>>
>>>>>
>>>>> certificateKeystorePassword="redacted" type="RSA" />
>>>>> </SSLHostConfig> </Connector> [snip] <Connector
>>>>> port="8443"
>>>>> protocol="org.apache.coyote.http11.Http11NioProtocol"
>>>>> maxThreads="300" SSLEnabled="true" > <SSLHostConfig
>>>>> protocols="TLSv1.2"> <Certificate
>>>>> certificateKeystoreFile="/opt/apache-tomcat-9.0.29/redacted.jks"
>>>>>
>>>>>
>>>>> certificateKeystorePassword="redacted" type="RSA" />
>>>>> </SSLHostConfig> </Connector>
>
> What, two connectors on one port? Do you get errors when starting?
>> No errors, one is "with HTTP2" should I delete the other former?

Well, one of them will succeed in starting and the other one should
fail. Did you copy/paste your config without modification? It's weird
that you don't see any errors; usually you'll get an IOException or
similar when binding to the port twice.
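If the two definitions were merged into one connector, it might look
something like this (a sketch only: the keystore path, password, and
maxThreads come from the redacted config above, and the UpgradeProtocol
element is an assumption about what the "with HTTP2" copy contained):

```xml
<!-- Sketch: one 8443 connector instead of two. The UpgradeProtocol
     line is an assumption about what the "with HTTP2" variant added. -->
<Connector port="8443"
           protocol="org.apache.coyote.http11.Http11NioProtocol"
           maxThreads="300" SSLEnabled="true">
    <UpgradeProtocol className="org.apache.coyote.http2.Http2Protocol" />
    <SSLHostConfig protocols="TLSv1.2">
        <Certificate certificateKeystoreFile="/opt/apache-tomcat-9.0.29/redacted.jks"
                     certificateKeystorePassword="redacted"
                     type="RSA" />
    </SSLHostConfig>
</Connector>
```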

> I don't see anything obviously problematic in the above
> configuration (other than the double-definition of the 8443
> connector).
>
> 300 tied-up connections (from your initial report) sounds like a
> significant number: probably the thread count.
>> Yes sir, that's the NIO thread count for the 8443 connector.
>
> Mark (as is often the case) is right: take some thread dumps next
> time everything locks up and see what all those threads are doing.
> Often, it's something like everything is awaiting on a db
> connection and the db pool has been exhausted or something.
> Relatively simple quick-fixes are available for that, and better,
> longer-term fixes as well.
>
>> Mark/Chris  is there a way to dump the connector threads
>> specifically? Or simply is it all contained as a machine/process
>> thread?  Sorry I'm not really a Linux guy.

Most of the threads in the server will be connector threads. They will
have names like https-nio-[port]-exec-[number].

If you get a thread dump[1], you'll get a stack trace from every thread.

Rainer wrote a great presentation about them in the context of Tomcat.
Feel free to give it a read:
http://home.apache.org/~rjung/presentations/2018-06-13-ApacheRoadShow-JavaThreadDumps.pdf
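For example, once you have a dump, the connector threads can be picked
out by name (a sketch: it assumes a JDK install with jstack available,
and uses an inline sample in place of a live dump so the filtering step
is concrete):

```shell
# On the real server you'd capture a dump first, e.g.:
#   jstack "$(pgrep -f catalina)" > /tmp/dump.txt
# (or send kill -3 to the PID and read catalina.out).
# Below, an inline sample stands in for a real dump.
cat > /tmp/dump.txt <<'EOF'
"https-jsse-nio-8443-exec-1" #25 daemon prio=5 ... waiting on condition
"https-jsse-nio-8443-exec-2" #26 daemon prio=5 ... runnable
"main" #1 prio=5 ... runnable
EOF

# Count connector threads; their names carry the port and "exec":
grep -c 'nio-8443-exec' /tmp/dump.txt
```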

>>>> Do you have a single F5 or a group of them?
>>>>> A group of them, several HA pairs depending on internal or
>>>>> external and application.  This server is behind one HA
>>>>> pair and is a single server.
>
> Okay. Just remember that each F5 can make some large number of
> connections to Tomcat, so you need to make sure you can handle
> them.
>
> This was a much bigger deal back in the BIO days when thread limit
> = connection limit, and the thread limit was usually something like
> 250 - 300. NIO is much better, and the default connection limit is
> 10k which "ought to be enough for anyone"[1].
>> (lol)
>
>> I'm more used to the 1-1 of the BIO style, which kinda confused
>> me when I asked the F5 to truncate >X connections and alert me
>> and there were 600+ connections while Tomcat manager stated ~30.
>> Then I read what the non-interrupt was about.

Yeah, NIO allows Tomcat to accept a large number of connections and
have a small number of threads process the work they represent. It's
not totally swarm-style processing because (a) the servlet spec makes
some guarantees about which thread processes your request and (b) Java
doesn't really have the ability to pause execution in one thread and
let another thread take it over.

If you really want totally asynchronous processing, your application
must opt into it using a special API. So if you have a bog-standard
read, process, write style application, then 300 simultaneous requests
will be all you can handle. (Unless you raise that limit, of course.)

You said something interesting earlier and I want to make sure I
understood you correctly. You said that the application locked up but
you were able to use curl to observe something. Can you be really
specific about that? Most requests come through port 8443. Which port
did you connect to with curl? If it's 8443, that's suspicious. If it's
8080, it makes more sense, as there is a different thread pool for
each of those connectors.

- -chris

[1]
https://cwiki.apache.org/confluence/display/TOMCAT/HowTo#HowTo-HowdoIobtainathreaddumpofmyrunningwebapp?




Re: Tomcat 9.0.29 - HTTPS threads age, max connections reached, Tomcat not responding on 8443

David
On Thu, Aug 27, 2020 at 4:30 PM Christopher Schultz
<[hidden email]> wrote:

>
>
> David,
>
> On 8/27/20 17:14, David wrote:
> > Thank you all for the replies!
> >
> > On Thu, Aug 27, 2020 at 3:53 PM Christopher Schultz
> > <[hidden email]> wrote:
> >>
> > David,
> >
> > On 8/27/20 13:57, David wrote:
> >>>> On Thu, Aug 27, 2020 at 12:35 PM Christopher Schultz
> >>>> <[hidden email]> wrote:
> >>>>>
> >>>> David,
> >>>>
> >>>> On 8/27/20 10:48, David wrote:
> >>>>>>> In the last two weeks I've had two occurrences where a
> >>>>>>> single CentOS 7 production server hosting a public
> >>>>>>> webpage has become unresponsive. The first time, all
> >>>>>>> 300 available "https-jsse-nio-8443" threads were
> >>>>>>> consumed, with the max age being around 45minutes, and
> >>>>>>> all in a "S" status. This time all 300 were consumed in
> >>>>>>> "S" status with the oldest being around ~16minutes. A
> >>>>>>> restart of Tomcat on both occasions freed these threads
> >>>>>>> and the website became responsive again. The
> >>>>>>> connections are post/get methods which shouldn't take
> >>>>>>> very long at all.
> >>>>>>>
> >>>>>>> CPU/MEM/JVM all appear to be within normal operating
> >>>>>>> limits. I've not had much luck searching for articles
> >>>>>>> for this behavior nor finding remedies. The default
> >>>>>>> timeout values are used in both Tomcat and in the
> >>>>>>> applications that run within as far as I can tell.
> >>>>>>> Hopefully someone will have some insight on why the
> >>>>>>> behavior could be occurring, why isn't Tomcat killing
> >>>>>>> the connections? Even in a RST/ACK status, shouldn't
> >>>>>>> Tomcat terminate the connection without an ACK from the
> >>>>>>> client after the default timeout?
> >>>>
> >>>> Can you please post:
> >>>>
> >>>> 1. Complete Tomcat version
> >>>>> I can't find anything more granular than 9.0.29, is there
> >>>>> a command to show a sub patch level?
> >
> > 9.0.29 is the patch-level, so that's fine. You are about 10
> > versions out of date (~1 year). Any chance for an upgrade?
> >
> >> They had to re-dev many apps last year when we upgraded from I
> >> want to say 1 or 3 or something equally as horrific.  Hopefully
> >> they are forward compatible with the newer releases and if not
> >> should surely be tackled now before later, I will certainly bring
> >> this to the table!
>
> I've rarely been bitten by an upgrade from foo.bar.x to foo.bar.y.
> There is a recent caveat if you are using the AJP connector, but you
> are not so it's not an issue for you.
>
> >>>> 2. Connector configuration (possibly redacted)
> >>>>> This is the 8443 section of the server.xml *8080 is
> >>>>> available during the outage and I'm able to curl the
> >>>>> management page to see the 300 used threads, their status,
> >>>>> and age* <Service name="Catalina">
> >>>>>
> >>>>> [snip]
> >>>>>
> >>>>> <Connector port="8080" protocol="HTTP/1.1"
> >>>>> connectionTimeout="20000" redirectPort="8443" /> [snip]
> >>>>> <Connector port="8443"
> >>>>> protocol="org.apache.coyote.http11.Http11NioProtocol"
> >>>>> maxThreads="300" SSLEnabled="true" > <SSLHostConfig>
> >>>>> <Certificate
> >>>>> certificateKeystoreFile="/opt/apache-tomcat-9.0.29/redacted.jks"
> >>>>>
> >>>>>
> >>>>> certificateKeystorePassword="redacted" type="RSA" />
> >>>>> </SSLHostConfig> </Connector> [snip] <Connector
> >>>>> port="8443"
> >>>>> protocol="org.apache.coyote.http11.Http11NioProtocol"
> >>>>> maxThreads="300" SSLEnabled="true" > <SSLHostConfig
> >>>>> protocols="TLSv1.2"> <Certificate
> >>>>> certificateKeystoreFile="/opt/apache-tomcat-9.0.29/redacted.jks"
> >>>>>
> >>>>>
> >>>>> certificateKeystorePassword="redacted" type="RSA" />
> >>>>> </SSLHostConfig> </Connector>
> >
> > What, two connectors on one port? Do you get errors when starting?
> >> No errors, one is "with HTTP2" should I delete the other former?
>
> Well, one of them will succeed in starting the and other one should
> fail. Did you copy/paste your config without modification? Weird you
> don't have any errors. Usually you'll get an IOException or whatever
> binding to the port twice.

I do recall IOExceptions and "port already in use" errors that kept
Tomcat from starting, but I think those were related to syntax errors
when defining catalina variables for my JVM sizing.  I'll take another
look at catalina.out to make sure I don't still see them, and I'll
likely remove the non-"with HTTP2" connector from the config
regardless. The only edits to the supplied XML section were the .jks
store name and password.

>
> > I don't see anything obviously problematic in the above
> > configuration (other than the double-definition of the 8443
> > connector).
> >
> > 300 tied-up connections (from your initial report) sounds like a
> > significant number: probably the thread count.
> >> Yes sir, that's the NIO thread count for the 8443 connector.
> >
> > Mark (as is often the case) is right: take some thread dumps next
> > time everything locks up and see what all those threads are doing.
> > Often, it's something like everything is awaiting on a db
> > connection and the db pool has been exhausted or something.
> > Relatively simple quick-fixes are available for that, and better,
> > longer-term fixes as well.
> >
> >> Mark/Chris  is there a way to dump the connector threads
> >> specifically? Or simply is it all contained as a machine/process
> >> thread?  Sorry I'm not really a Linux guy.
>
> Most of the threads in the server will be connector threads. They will
> have names like https-nio-[port]-exec-[number].
>
> If you get a thread dump[1], you'll get a stack trace from every thread.
>
> Rainer wrote a great presentation about them in the context of Tomcat.
> Feel free to give it a read:
> http://home.apache.org/~rjung/presentations/2018-06-13-ApacheRoadShow-JavaThreadDumps.pdf

Awesome!!  Thank you for that, I will certainly read it!

>
> >>>> Do you have a single F5 or a group of them?
> >>>>> A group of them, several HA pairs depending on internal or
> >>>>> external and application.  This server is behind one HA
> >>>>> pair and is a single server.
> >
> > Okay. Just remember that each F5 can make some large number of
> > connections to Tomcat, so you need to make sure you can handle
> > them.
> >
> > This was a much bigger deal back in the BIO days when thread limit
> > = connection limit, and the thread limit was usually something like
> > 250 - 300. NIO is much better, and the default connection limit is
> > 10k which "ought to be enough for anyone"[1].
> >> (lol)
> >
> >> I'm more used to the 1-1 of the BIO style, which kinda confused
> >> me when I asked the F5 to truncate >X connections and alert me
> >> and there were 600+ connections while Tomcat manager stated ~30.
> >> Then I read what the non-interrupt was about.
>
> Yeah, NIO allows Tomcat to accept a large number of connections and
> have a small number of threads process the work they represent. It's
> not totally swarm-style processing because (a) the servlet spec makes
> some guarantees about which thread processes your request and (b) Java
> doesn't really have the ability to pause execution in one thread and
> let another thread take it over.
>
> If you really want totally asynchronous processing, your application
> must opt-into it using a special API. So if you have a bog-standard
> read, process, write style application, then 300 simultaneous requests
> will be all you can handle. (Unless you raise that limit, of course.)
>
> You said something interesting earlier and I want to make sure I
> understood you correctly. You said that the application locked-up but
> you were able to use curl to observe something. Can you be really
> specific about that? Most requests come through port 8443. Which port
> did you connect to in order to call curl? If it's 8443 than that's
> suspicious. If it's 8080 then it makes more sense, as there will be a
> different thread-pool used for each of those connectors.
>
That is correct: I used HTTP on 8080 to read the Tomcat manager
stats.   I originally had issues with the JVM being too small:
running out of memory, CPU spiking, threads maxing out, and
whole-system instability.  Getting more machine memory and raising
the JVM allocation has remedied all of that except, apparently, the
thread issue.   I'm unsure whether the threads were aging at that
time since I couldn't get into anything, but with no room for GC to
take place it would make sense that the threads would not be released.

My intention was to restart Tomcat nightly to lessen the chance of an
occurrence until I could find a way to restart Tomcat based on the
thread count and script a thread dump at the same time (likely
through Solarwinds).  Now that you've explained that the NIO threads
are part of the process's threads, I may be able to script something
like that directly on the system: a crontab job that checks the
count and, if more than 295 threads contain "NIO", takes a thread
dump and then does a systemctl stop/start of Tomcat.
That seems a viable way to get the data I need without posing much
impact to users.   Your explanation of threads leads me to believe
that the nightly restart may be rather moot, as it could be
exhaustion downstream causing the backup on the front end.  I didn't
see these connected in this way and assumed they were asynchronous,
independent processes.  There are timeouts configured for all the DB2
backend connections, and I was in the mindset that the smallest
timeout would kill the connections upstream/downstream by presenting
the application a "forcibly closed by remote host" or a timeout.
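A minimal sketch of that cron check (the threshold, the service name,
and where the count comes from are all assumptions, and the restart
itself is only echoed here):

```shell
# Hypothetical sketch of the cron job described above: dump threads and
# restart Tomcat when the busy NIO thread count crosses a threshold.
# THRESHOLD and the service name "tomcat" are assumptions.
THRESHOLD=295

should_restart() {   # $1 = current NIO connector thread count
  [ "$1" -gt "$THRESHOLD" ]
}

count=298   # in practice: grep -c 'nio-8443-exec' on a fresh jstack dump
if should_restart "$count"; then
  echo "would run: jstack dump, then systemctl restart tomcat"
fi
```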

I greatly appreciate the assistance.  In looking through various
articles, none of this was really discussed: either everyone already
knows it, or it was discussed at a level I couldn't follow.  There
certainly don't seem to be other reports of connections staying open
for 18-45 minutes, or if there are, it isn't an issue for them.
During a normal glance at the manager page there are no connections,
maybe five empty lines in a "Ready" stage; even if I spam the
server's logon landing page I never see a persistent connection.  So
it baffled me how connections could hang and build up, and I'm
thinking something was perhaps messed up with the backend.  The
webapp names/URLs for the oldest connections didn't coincide between
the two outages, so I had brushed it off as application-specific,
though it may still be.

I need it to occur again and get some dumps!

>
>
> [1]
> https://cwiki.apache.org/confluence/display/TOMCAT/HowTo#HowTo-HowdoIobt
> ainathreaddumpofmyrunningwebapp?
>
>
>



Re: Tomcat 9.0.29 - HTTPS threads age, max connections reached, Tomcat not responding on 8443

Christopher Schultz-2

David,

On 8/27/20 18:14, David wrote:
>> I used the http to 8080 in order to read the Tomcat webmanager
>> stats.   I originally had issues with the JVM being too small,
>> running out of memory, CPU spiking, threads maxing out, and
>> whole system instability.  Getting more machine memory and upping
>> the JVM allocation has remedied all of that except for apparently
>> the thread issue.
What is the memory size of the server and of the JVM?

>> I'm unsure that they were aging at that time as I couldn't get
>> into anything, but with no room for GC to take place it would
>> make sense that the threads would not be released.

That's not usually an issue, unless the application is using a
significant amount of memory during a request and then releasing it
after the request has completed.

>> My intention was to restart Tomcat nightly to lessen the chance
>> of an occurrence until I could find a way to restart Tomcat based
>> on the thread count and script a thread dump at the same time,
>> (likely through Solarwinds).  Now that you've explained that the
>> NIO threads are a part of the system threads, I may be able to
>> script something like that directly on the system, with a
>> chrontab to check count, if
> 295 contains NIO dump thread to / systemctl stop-start tomcat.

I wouldn't do that. Just because the threads exist does not mean they
are stuck. They may be doing useful work or otherwise running just
fine. I would look for other ways to detect problems.

>> That's very warming as it seems a viable way to get the data I
>> need without posing much impact to users.   Your explanation of
>> threads leads me to believe that the nightly restart may be
>> rather moot as it could likely be exhaustion on the downstream
>> causing the backup on the front end.  I didn't see these
>> connected in this way and assumed they were asynchronous and
>> independent processes.  There are timeouts configured for all the
>> DB2 backend connections, and I was in the mindset of the least
>> timeout would kill all connections upstream/downstream by
>> presenting the application a forcibly closed by remote host or a
>> timeout.

If you can suffer through a few more incidents, you can probably get a
LOT more information about the root problem and maybe even get it
solved, instead of just trying to stop the bleeding.

>> I greatly appreciate the assistance, In looking through various
>> articles none of this was really discussed because either
>> everyone knows it, or maybe it was discussed on a level where I
>> couldn't understand it, there certainly doesn't seem to be any
>> other instances of connections being open for 18-45minutes or if
>> there is it's not an issue for them.

If you have a load-balancer (which you do), then I'd expect HTTP
keep-alives to keep those connections open literally all day long,
only expiring when you have configured them to expire "just in case"
or after some period of inactivity. For an lb environment, I'd want
those keep-alive timeouts to be fairly high so you don't waste any
time re-constructing sockets between the lb and the app server.

When an lb is NOT in the mix, you generally want /low/ keep-alive
timeouts because you can't rely on clients sticking around for very
long and you want to get them off your doorstep ASAP.
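In Tomcat terms, that tuning lives on the connector. A sketch with
placeholder values (the attribute names are standard HTTP connector
attributes; the numbers are assumptions, not recommendations from
this thread):

```xml
<!-- Sketch: behind an lb, let keep-alive connections linger longer.
     keepAliveTimeout is in milliseconds; -1 maxKeepAliveRequests
     means unlimited requests per connection. Values are placeholders. -->
<Connector port="8443"
           protocol="org.apache.coyote.http11.Http11NioProtocol"
           maxThreads="300" SSLEnabled="true"
           connectionTimeout="20000"
           keepAliveTimeout="300000"
           maxKeepAliveRequests="-1">
    ...
</Connector>
```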

>> During a normal glance at the manager page, there are no
>> connections and maybe like 5 empty lines in a "Ready" stage,
>> even if I spam the server's logon landing page I can never see a
>> persistent connection, so it baffled me as to how connections
>> could hang and build up, so I'm thinking something was perhaps
>> messed up with the backend.

If by "backend" you mean like databasse, etc. then that is probably
the issue. The login page is (realtively) static, so it's very
difficult to put Tomcat under such load that it's hosing just giving
you that same page over and over again.

I don't know what your "spamming" strategy is, but you might want to
use a real load-generating tool like ApacheBench (ab) or, even better,
JMeter which can actually swarm among several machines to basically
DDoS your internal servers, which can be useful sometimes for
stress-testing. But your tests really do have to comprise a realistic
scenario, not just hammering on the login page all day.
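For instance, an ApacheBench invocation along these lines (a sketch;
the URL, request count, and concurrency are placeholders, not anything
from this thread):

```shell
# Sketch: an ab run that mimics the F5's keep-alive behavior (-k):
# 10000 requests from 100 concurrent clients. Placeholders throughout.
AB_CMD="ab -n 10000 -c 100 -k https://example.com:8443/app/login"
echo "$AB_CMD"   # inspect, then run against a test instance, not production
```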

>> The webapp names /URL's for the oldest connections didn't
>> coincide between the two outages, so I kind of brushed it off as
>> being application specific, however it may still be.
>
>> I need it to occur again and get some dumps!

Unfortunately, yes.

- -chris
