Weirdest Tomcat Behavior Ever?


Weirdest Tomcat Behavior Ever?

Eric Robinson
Has anyone ever seen a situation where tomcat occasionally fails to send responses but still logs them?

On a CentOS 7.5 server running in Azure with tomcat 7.0.76 and java 1.0.0_221, everything runs fine 99.99% of the time, but that last hundredth of a percent is a bitch. Intermittently, the server receives a request and then just terminates the connection without responding. But the localhost_access log shows that it DID respond.

Let me say that again.

The localhost_access log shows a request received and an HTTP 200 response sent, as follows...

10.51.14.133 [15/Oct/2020:12:52:45 -0400] 57 GET /app/code.jsp?gizmoid=64438&weekday=5&aptdate=2020-10-15&multiResFacFilterId=0&sessionDID=0&GzUserId=71340&zztusrlogtblid=321072&zztappprocessid=40696&rnd2=0.0715816&timestamp=15102020125245.789063 HTTP/1.0 ?gizmoid=64438&weekday=5&aptdate=2020-10-15&multiResFacFilterId=0&sessionDID=0&GzUserId=71340&zztusrlogtblid=321072&zztappprocessid=40696&rnd2=0.0715816&timestamp=15102020125245.789063 200

But WireShark shows what really happened. The server received the GET request, and then it sent a FIN to terminate the connection. So if tomcat sent an HTTP response, it did not make it out the Ethernet card.

Is this the weirdest thing or what? Ideas would sure be appreciated!

-Eric


Re: Weirdest Tomcat Behavior Ever?

markt
On 16/10/2020 00:27, Eric Robinson wrote:

<snip/>

> The localhost_access log shows a request received and an HTTP 200 response sent, as follows...
>
> 10.51.14.133 [15/Oct/2020:12:52:45 -0400] 57 GET /app/code.jsp?gizmoid=64438&weekday=5&aptdate=2020-10-15&multiResFacFilterId=0&sessionDID=0&GzUserId=71340&zztusrlogtblid=321072&zztappprocessid=40696&rnd2=0.0715816&timestamp=15102020125245.789063 HTTP/1.0 ?gizmoid=64438&weekday=5&aptdate=2020-10-15&multiResFacFilterId=0&sessionDID=0&GzUserId=71340&zztusrlogtblid=321072&zztappprocessid=40696&rnd2=0.0715816&timestamp=15102020125245.789063 200
>
> But WireShark shows what really happened. The server received the GET request, and then it sent a FIN to terminate the connection. So if tomcat sent an HTTP response, it did not make it out the Ethernet card.
>
> Is this the weirdest thing or what? Ideas would sure be appreciated!

I am assuming there is a typo in your Java version and you are using Java 8.

That Tomcat version is over 3.5 years old (and Tomcat 7 is EOL in less
than 6 months). If you aren't already planning to upgrade (I'd suggest
to 9.0.x) then you might want to start thinking about it.

I have a few ideas about what might be going on but rather than fire out
random theories I have some questions that might help narrow things down.

1. If this request was successful, how big is the response?

2. If this request was successful, how long would it typically take to
complete?

3. Looking at the Wireshark trace for a failed request, how long after
the last byte of the request is sent by the client does Tomcat send the FIN?

4. Looking at the Wireshark trace for a failed request, is the request
fully sent (including terminating CRLF etc)?

5. Are there any proxies, firewalls etc between the user agent and Tomcat?

6. What timeouts are configured for the Connector?

7. Is this HTTP/1.1, HTTP/2, AJP, with or without TLS?

8. Where are you running Wireshark? User agent? Tomcat? Somewhere else?

Mark


RE: Weirdest Tomcat Behavior Ever?

Eric Robinson
Hi Mark --

Those are great questions. See answers below.


> -----Original Message-----
> From: Mark Thomas <[hidden email]>
> Sent: Friday, October 16, 2020 2:20 AM
> To: [hidden email]
> Subject: Re: Weirdest Tomcat Behavior Ever?
>
> On 16/10/2020 00:27, Eric Robinson wrote:
>
> <snip/>
>
> > The localhost_access log shows a request received and an HTTP 200
> response sent, as follows...
> >
> > 10.51.14.133 [15/Oct/2020:12:52:45 -0400] 57 GET
> > /app/code.jsp?gizmoid=64438&weekday=5&aptdate=2020-10-
> 15&multiResFacFi
> >
> lterId=0&sessionDID=0&GzUserId=71340&zztusrlogtblid=321072&zztappproc
> e
> > ssid=40696&rnd2=0.0715816&timestamp=15102020125245.789063 HTTP/1.0
> > ?gizmoid=64438&weekday=5&aptdate=2020-10-
> 15&multiResFacFilterId=0&sess
> >
> ionDID=0&GzUserId=71340&zztusrlogtblid=321072&zztappprocessid=40696&
> rn
> > d2=0.0715816&timestamp=15102020125245.789063 200
> >
> > But WireShark shows what really happened. The server received the GET
> request, and then it sent a FIN to terminate the connection. So if tomcat sent
> an HTTP response, it did not make it out the Ethernet card.
> >
> > Is this the weirdest thing or what? Ideas would sure be appreciated!
>
> I am assuming there is a typo in your Java version and you are using Java 8.
>

Yes, Java 8.

> That Tomcat version is over 3.5 years old (and Tomcat 7 is EOL in less than 6
> months). If you aren't already planning to upgrade (I'd suggest to 9.0.x) then
> you might want to start thinking about it.
>

Vendor constraint. It's a canned application published by a national software company, and they have not officially approved tomcat 8 for use on Linux yet.

> I have a few ideas about what might be going on but rather than fire out
> random theories I have some questions that might help narrow things down.
>
> 1. If this request was successful, how big is the response?
>

1035 bytes.

> 2. If this request was successful, how long would it typically take to
> complete?
>

Under 60 ms.

> 3. Looking at the Wireshark trace for a failed request, how long after the last
> byte of the request is sent by the client does Tomcat send the FIN?
>

Maybe 100 microseconds.

> 4. Looking at the Wireshark trace for a failed request, is the request fully sent
> (including terminating CRLF etc)?
>

Yes, the request as seen by the tomcat server is complete and is terminated by 0D 0A.

> 5. Are there any proxies, firewalls etc between the user agent and Tomcat?
>

User agent -> firewall -> nginx plus -> upstream tomcat servers

> 6. What timeouts are configured for the Connector?
>

Sorry, which connector are you referring to?

> 7. Is this HTTP/1.1, HTTP/2, AJP, with or without TLS?
>

HTTP/1.1

> 8. Where are you running Wireshark? User agent? Tomcat? Somewhere
> else?

On the nginx proxy and both upstream tomcat servers. (On the user agent, too, but that doesn't help us in this case.)

If you would like to see a screen shot showing all 4 captures side-by-side, I can send you a secure link. It will verify my answers above. It shows 4 separate WireShark captures taken simultaneously:

(a) the request going from the nginx proxy to tomcat 1
(b) tomcat 1 receiving the request and terminating the connection
(c) nginx sending the request to tomcat 2
(d) tomcat 2 replying to the request (but the reply does not help the user because the tomcat server does not recognize the user agent's JSESSIONID cookie, so it responds "invalid session.")

>
> Mark
>


RE: Weirdest Tomcat Behavior Ever?

Eric Robinson
> > 6. What timeouts are configured for the Connector?
> >
>
> Sorry, which connector are you referring to?
>

Sorry, I'm a dummy. Obviously you mean the tomcat connector.

connectionTimeout="20000"

-Eric

Re: Weirdest Tomcat Behavior Ever?

markt
In reply to this post by Eric Robinson
On 16/10/2020 10:05, Eric Robinson wrote:

> <snip/>
>
> If you would like to see a screen shot showing all 4 captures side-by-side, I can send you a secure link. It will verify my answers above. It shows 4 separate WireShark captures taken simultaneously:
>
> (a) the request going from the nginx proxy to tomcat 1
> (b) tomcat 1 receiving the request and terminating the connection
> (c) nginx sending the request to tomcat 2
> (d) tomcat 2 replying to the request (but the reply does not help the user because the tomcat server does not recognize the user agent's JSESSIONID cookie, so it responds "invalid session.")

Hmm.

That rules out most of my ideas.

I'd like to see those screen shots please. Better still would be access
to the captures themselves (just the relevant connections not the whole
thing). I believe what you are telling us but long experience tells me
it is best to double check the original data as well.

I have observed something similar-ish in the CI systems. In that case it
is the requests that disappear. Client side logging shows the request
was made but there is no sign of it ever being received by Tomcat. I
don't have network traces for that (yet) so I'm not sure where the data
is going missing.

I am beginning to suspect there is a hard to trigger Tomcat or JVM bug
here. I think a Tomcat bug is more likely although I have been over the
code several times and I don't see anything.

A few more questions:

Which HTTP connector are you using? BIO, NIO or APR/Native?

Is the issue reproducible if you switch to a different connector?

How easy is it for you to reproduce this issue?

How are you linking the request you see in the access log with the
request you see in Wireshark?

How comfortable are you running a patched version of Tomcat (drop class
files provided by me into $CATALINA_BASE/lib in the right directory
structure and restart Tomcat)? Just thinking ahead about collecting
additional debug information.

Thanks,

Mark


Re: Weirdest Tomcat Behavior Ever?

Paul Carter-Brown-2
Hi Eric,

These weird situations are sometimes best looked at by confirming what the
OS is seeing from user-space.

Can you run: sudo strace -r -f -e trace=network -p <tomcat pid>

You can then log that output to a file (strace's -o option) and correlate the
timestamps to see if the kernel is in fact being asked to send the response.

It's very insightful to see what is actually going on between the JVM and
the kernel.

Paul

On Fri, Oct 16, 2020 at 12:16 PM Mark Thomas <[hidden email]> wrote:

> <snip/>

RE: Weirdest Tomcat Behavior Ever?

Eric Robinson
In reply to this post by markt
> -----Original Message-----
> From: Mark Thomas <[hidden email]>
> Sent: Friday, October 16, 2020 5:17 AM
> To: [hidden email]
> Subject: Re: Weirdest Tomcat Behavior Ever?
>
> On 16/10/2020 10:05, Eric Robinson wrote:
> > <snip/>
>
> Hmm.
>
> That rules out most of my ideas.
>
> I'd like to see those screen shots please. Better still would be access to the
> captures themselves (just the relevant connections not the whole thing). I
> believe what you are telling us but long experience tells me it is best to
> double check the original data as well.
>

I'll send you a link to the screen shot first, then I'll package up the captures and send a link to that in a bit. As the files may contain somewhat sensitive information, I'll send a secure mail direct to your inbox.


> I have observed something similar ish in the CI systems. In that case it is the
> requests that disappear. Client side logging shows the request was made but
> there is no sign of it ever being received by Tomcat. I don't have network
> traces for that (yet) so I'm not sure where the data is going missing.
>
> I am beginning to suspect there is a hard to trigger Tomcat or JVM bug here. I
> think a Tomcat bug is more likely although I have been over the code several
> times and I don't see anything.
>

I'm thinking a bug of some kind, too, but I've been hosting about 1800 instances of tomcat for 15 years and I have never seen this behavior before.

> A few more questions:
>

This is where I will begin to struggle a bit.

> Which HTTP connector are you using? BIO, NIO or APR/Native?
>

I believe BIO is the default? server.xml just says...

    <Connector port="3016" protocol="HTTP/1.1"
               connectionTimeout="20000"
               redirectPort="8443" />


> Is the issue reproducible if you switch to a different connector?
>

In 15 years of using tomcat in production, we've never tried switching the connector type. (Probably because the app vendor never suggested it.) I did a little research and I'm beginning to think about the pros/cons.

> How easy is it for you to reproduce this issue?
>

It's not reproducible at will but it happens frequently enough that we don't have to wait long for it to happen. I have wireshark capturing to disk continuously and rotating the files at 10 minute intervals to keep them smallish. Then I just tail the logs and wait.

> How are you linking the request you see in the access log with the request
> you see in Wireshark?
>

Aside from the timestamp of the packets and the timestamp of the tomcat log messages, each HTTP request also contains a high-resolution timestamp and a unique random number. That way, even if the same request occurs multiple times in rapid succession, we can still isolate the exact one that failed.

> How comfortable are you running a patched version of Tomcat (drop class
> files provided by me into $CATALINA_BASE/lib in the right directory structure
> and restart Tomcat)? Just thinking ahead about collecting additional debug
> information.
>

That would be tricky in our production environment, but the users are getting desperate enough that we'd be willing to explore that approach.

> Thanks,
>
> Mark
>


Re: Weirdest Tomcat Behavior Ever?

Daniel Skiles
In reply to this post by Eric Robinson
This is a bit of a long shot, but are you running on VMWare ESX by chance?
A few years ago, I had a problem where Tomcat would send data down the
socket, but about half the message wouldn't actually arrive at the client.

On Fri, Oct 16, 2020 at 5:05 AM Eric Robinson <[hidden email]>
wrote:

> <snip/>

Re: Weirdest Tomcat Behavior Ever?

markt
In reply to this post by Eric Robinson
On 16/10/2020 12:37, Eric Robinson wrote:
>> From: Mark Thomas <[hidden email]>

<snip/>

>> I'd like to see those screen shots please. Better still would be access to the
>> captures themselves (just the relevant connections not the whole thing). I
>> believe what you are telling us but long experience tells me it is best to
>> double check the original data as well.
>>
>
> I'll send you a link to the screen shot first, then I'll package up the captures and send a link to that in a bit. As the files may contain somewhat sensitive information, I'll send a secure mail direct to your inbox.

Thanks. The screenshots didn't shed any light on this so far.

>> I have observed something similar ish in the CI systems. In that case it is the
>> requests that disappear. Client side logging shows the request was made but
>> there is no sign of it ever being received by Tomcat. I don't have network
>> traces for that (yet) so I'm not sure where the data is going missing.
>>
>> I am beginning to suspect there is a hard to trigger Tomcat or JVM bug here. I
>> think a Tomcat bug is more likely although I have been over the code several
>> times and I don't see anything.
>>
>
> I'm thinking a bug of some kind, too, but I've been hosting about 1800 instances of tomcat for 15 years and I have never seen this behavior before.
>
>> A few more questions:
>>
>
> This is where I will begin to struggle bit.
>
>> Which HTTP connector are you using? BIO, NIO or APR/Native?
>>
>
> I believe BIO is the default? server.xml just says...
>
>     <Connector port="3016" protocol="HTTP/1.1"
>                connectionTimeout="20000"
>                redirectPort="8443" />

That will be BIO or APR/Native depending on whether you have Tomcat
Native installed. If you look at the logs for when Tomcat starts you
should see something like:

INFO: Initializing ProtocolHandler ["http-bio-3016"]
or
INFO: Initializing ProtocolHandler ["http-apr-3016"]

What do you see between the square brackets?

>> Is the issue reproducible if you switch to a different connector?
>>
>
> In 15 years of using tomcat in production, we've never tried switching the connector type. (Probably because the app vendor never suggested it.) I did a little research and I'm beginning to think about the pros/cons.

If you wanted to try this, I'd recommend:

protocol="org.apache.coyote.http11.Http11NioProtocol"
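
For example (a sketch only, reusing the port and timeout values from the
Connector you posted; only the protocol attribute changes):

    <!-- same Connector as before, but explicitly requesting the NIO implementation -->
    <Connector port="3016"
               protocol="org.apache.coyote.http11.Http11NioProtocol"
               connectionTimeout="20000"
               redirectPort="8443" />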

>> How easy is it for you to reproduce this issue?
>>
>
> It's not reproducible at will but it happens frequently enough that we don't have to wait long for it to happen. I have wireshark capturing to disk continuously and rotating the files at 10 minute intervals to keep them smallish. Then I just tail the logs and wait.

Ack.

>> How are you linking the request you see in the access log with the request
>> you see in Wireshark?
>
> Aside from the timestamp of the packets and the timestamp of the tomcat log messages, each HTTP request also contains a high-resolution timestamp and a unique random number. That way, even if the same request occurs multiple times in rapid succession, we can still isolate the exact one that failed.

Excellent.

>> How comfortable are you running a patched version of Tomcat (drop class
>> files provided by me into $CATALINA_BASE/lib in the right directory structure
>> and restart Tomcat)? Just thinking ahead about collecting additional debug
>> information.
>
> That would be a tricky in our production environment, but the users are getting desperate enough that we'd be willing to explore that approach.

Understood.

Some other questions that have come to mind:

- Has this app always had this problem?

- If not, when did it start and what changed at that point (JVM version,
Tomcat version etc)

- I notice that the requests in the screenshots aren't using HTTP
keep-alive. Do you know why?

- How heavily loaded is the system (typical requests per second etc)?

- I'm assuming there is nothing in the log files when the failed request
happens. Is this correct?

Mark


RE: Weirdest Tomcat Behavior Ever?

Eric Robinson
> -----Original Message-----
> From: Mark Thomas <[hidden email]>
> Sent: Friday, October 16, 2020 8:02 AM
> To: [hidden email]
> Subject: Re: Weirdest Tomcat Behavior Ever?
>
> On 16/10/2020 12:37, Eric Robinson wrote:
> >> From: Mark Thomas <[hidden email]>
>
> <snip/>
>
> >> I'd like to see those screen shots please. Better still would be
> >> access to the captures themselves (just the relevant connections not
> >> the whole thing). I believe what you are telling us but long
> >> experience tells me it is best to double check the original data as well.
> >>
> >
> > I'll send you a link to the screen shot first, then I'll package up the captures
> and send a link to that in a bit. As the files may contain somewhat sensitive
> information, I'll send a secure mail direct to your inbox.
>
> Thanks. The screenshots didn't shed any light on this so far.
>
> >> I have observed something similar ish in the CI systems. In that case
> >> it is the requests that disappear. Client side logging shows the
> >> request was made but there is no sign of it ever being received by
> >> Tomcat. I don't have network traces for that (yet) so I'm not sure where
> the data is going missing.
> >>
> >> I am beginning to suspect there is a hard to trigger Tomcat or JVM
> >> bug here. I think a Tomcat bug is more likely although I have been
> >> over the code several times and I don't see anything.
> >>
> >
> > I'm thinking a bug of some kind, too, but I've been hosting about 1800
> instances of tomcat for 15 years and I have never seen this behavior before.
> >
> >> A few more questions:
> >>
> >
> > This is where I will begin to struggle bit.
> >
> >> Which HTTP connector are you using? BIO, NIO or APR/Native?
> >>
> >
> > I believe BIO is the default? server.xml just says...
> >
> >     <Connector port="3016" protocol="HTTP/1.1"
> >                connectionTimeout="20000"
> >                redirectPort="8443" />
>
> That will be BIO or APR/Native depending on whether you have Tomcat
> Native installed. If you look at the logs for when Tomcat starts you should see
> something like:
>
> INFO: Initializing ProtocolHandler ["http-bio-3016"] or
> INFO: Initializing ProtocolHandler ["http-apr-3016"]
>
> What do you see between the square brackets?

["http-bio-3016"]

>
> >> Is the issue reproducible if you switch to a different connector?
> >>
> >
> > In 15 years of using tomcat in production, we've never tried switching the
> connector type. (Probably because the app vendor never suggested it.) I did
> a little research and I'm beginning to think about the pros/cons.
>
> If you wanted to try this, I'd recommend:
>
> protocol="org.apache.coyote.http11.Http11NioProtocol"
>

We're in the middle of a production day so I want to avoid restarting tomcat if I can, but I'll plan to change that tonight.

> >> How easy is it for you to reproduce this issue?
> >>
> >
> > It's not reproducible at will but it happens frequently enough that we don't
> have to wait long for it to happen. I have wireshark capturing to disk
> continuously and rotating the files at 10 minute intervals to keep them
> smallish. Then I just tail the logs and wait.
>
> Ack.
>
> >> How are you linking the request you see in the access log with the
> >> request you see in Wireshark?
> >
> > Aside from the timestamp of the packets and the timestamp of the tomcat
> log messages, each HTTP request also contains a high-resolution timestamp
> and a unique random number. That way, even if the same request occurs
> multiple times in rapid succession, we can still isolate the exact one that
> failed.
>
> Excellent.
>
> >> How comfortable are you running a patched version of Tomcat (drop
> >> class files provided by me into $CATALINA_BASE/lib in the right
> >> directory structure and restart Tomcat)? Just thinking ahead about
> >> collecting additional debug information.
> >
> > That would be a tricky in our production environment, but the users are
> getting desperate enough that we'd be willing to explore that approach.
>
> Understood.
>
> Some other questions that have come to mind:
>
> - Has this app always had this problem?
>

No, it's been running fine in this environment since October 2018.

> - If not, when did it start and what changed at that point (JVM version,
> Tomcat version etc)
>

This is a new thing in the past month or so, but we can't think of what might have changed. There are 2 Linux tomcat servers, each running 17 instances of tomcat on different ports. The symptom seems to appear in many of the instances intermittently, but for 2 of them it is painfully frequent. The users are getting kicked out multiple times per day. The tomcat and java versions have not changed since at least July 2019.

> - I notice the the requests in the screenshots aren't using HTTP keep-alive.
> Do you know why?
>

That's a good question. I checked the captures from the client side and they do have the HTTP keep-alive header. It must be something related to the proxy.

> - How heavily loaded is the system (typical requests per second etc)?
>

There are 17 instances of tomcat. Most of them get less than 10 requests/second. The instance I'm focusing on is the busiest, with 20-60 requests/second. From a workload standpoint, the servers are pretty bored.

> - I'm assuming there is nothing in the log files when the failed request
> happens. Is this correct?
>

If you mean the Linux system logs, I don't see anything remotely suspicious there. The localhost_access logs do show the failed requests with an HTTP 200 response. However, according to WireShark the response does not go out over the wire.

> Mark
>


Re: Weirdest Tomcat Behavior Ever?

markt
Hi Eric (and those following along),

Eric sent me some network captures off-list from multiple points in the
network from the system where this is happening. The following is the
summary so far:

Overview:
A small number of requests are receiving a completely empty (no headers,
no body) response.


Information Gathering:
Successful requests that are similar to the failed request:
- Take 7ms to 13ms to complete
- Have relatively small responses (~1k)
- Use HTTP/1.0
- Do not use keep-alive (request has Connection: close header)
- The request target is a JSP

The failed request:
- Completes in ~6ms
- Has no response headers or body
- Records a successful entry in the access log

System:
Tomcat 7.0.76, BIO HTTP connector
Java 1.8.0_221,
CentOS 7.5 server running in Azure

Requests are received from an nginx reverse proxy. It looks like nginx
is using HTTP/1.0 without keep-alive to proxy requests to Tomcat. This
actually makes things a little easier as we have one TCP/IP connection
per request.

After switching to NIO, the issue is still observable (info received
off-list along with access to network traces etc.).

The issue appeared ~1 month ago after running without error since
October 2018. No known changes were made ~1 month ago.

The TCP sequence numbers show that, as far as the network stack is
concerned, Tomcat did not write any data before closing the connection
cleanly.

There is no other activity on the client port associated with the failed
request in the provided trace.

The failed request does not appear to violate the HTTP specification.

Neither the request nor the response are compressed.

No WebSocket or other HTTP upgrade requests present in the network traces.

No obviously relevant bugs fixed since 7.0.76.


Follow-up questions:
- JVM
  - Vendor?
  - OS package or direct from Vendor?

- Tomcat
  - OS package, 3rd-party package or direct from ASF?

- Config
  - Any changes at all around the time the problems started? I'm
    thinking OS updates, VM restarted etc?


Conclusions:
- It isn't timeout related. The request is completing in orders of
  magnitude less time than the timeout.

- Based on the timings it looks like the JSP is processing the request
  and generating the response.

- It happens with BIO so sendfile isn't a factor.

- No compression so GZIP issues aren't a factor.

- Given that the issue occurs with both BIO and NIO that rules out a bug
  in any BIO or NIO specific code. Note that while 7.0.x has largely
  separate code for BIO and NIO there are still significant sections of
  code that are essentially identical so it isn't quite as simple as
  just ruling out all the code in the BIO and NIO specific classes.
  It also makes a JVM issue seem less likely at this point.


Current thinking:
- I can think of various ways this might be happening but they all seem
  pretty unlikely. The next steps will be to enable existing logging
  (and potentially add some custom logging) to try and narrow down where
  the response data is disappearing.

- Having reviewed the BIO code, there is a mercifully simple way to see
  how many bytes Tomcat thinks it has written to the response. The "%b"
  pattern in the access log valve will show how many bytes from the
  response body that Tomcat has written to the network socket without an
  IOException. I'd prefer something that recorded header bytes as well
  but this is a good first start and doesn't require custom patches.


Recommendations:
- Switch back to the BIO connector if you haven't already. It has fewer
  moving parts than NIO so it is simpler to debug.
- Add "%b" to the access log pattern for Tomcat's access log valve to
  record the number of body (excluding headers) bytes Tomcat believes it
  has written to the response.
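
  For example (a minimal sketch only; the directory, prefix, suffix and the
  rest of the pattern below are illustrative placeholders rather than your
  actual settings, with %b appended at the end):

    <!-- %b (last field) logs the response body bytes Tomcat believes it wrote -->
    <Valve className="org.apache.catalina.valves.AccessLogValve"
           directory="logs" prefix="localhost_access." suffix=".log"
           pattern="%h %l %u %t %r %s %b" />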


Next steps:
- Wait for the issue to re-occur after the recommended changes above and
  depending on what is recorded in the access log for %b for a failed
  request, shift the focus accordingly.
- Answers to the additional questions would be nice but the access log
  %b value for a failed request is the key piece of information required
  at this point.


HTH,

Mark

(I'm trimming the previous discussion on this thread as I think all the
relevant info is summarized above but do point out if I have missed
something.)


RE: Weirdest Tomcat Behavior Ever?

Eric Robinson
Hi Mark --

Thanks tons for digging into this. See my answers below.

> -----Original Message-----
> From: Mark Thomas <[hidden email]>
> Sent: Friday, October 23, 2020 5:09 PM
> To: [hidden email]
> Subject: Re: Weirdest Tomcat Behavior Ever?
>
> Hi Eric (and those following along),
>
> Eric sent me some network captures off-list from multiple points in the
> network from the system where this is happening. The following is the
> summary so far:
>
> Overview:
> A small number of requests are receiving a completely empty (no headers,
> no body) response.
>
>
> Information Gathering:
> Successful requests that are similar to the failed request:
> - Take 7ms to 13ms to complete
> - Have relatively small responses (~1k)
> - Use HTTP/1.0
> - Do not use keep-alive (request has Connection: close header)
> - The request target is a JSP
>
> The failed request:
> - Completes in ~6ms

I think we've seen the failed requests take as much as 50ms.

> - Has no response headers or body
> - Records a successful entry in the access log
>
> System:
> Tomcat 7.0.76, BIO HTTP connector
> Java 1.8.0_221,
> CentOS 7.5 server running in Azure
>
> Requests are received from an nginx reverse proxy. It looks like nginx is using
> HTTP/1.0 without keep-alive to proxy requests to Tomcat. This actually
> makes things a little easier as we have one TCP/IP connection per request.
>
> After switching to NIO, the issue is still observable (info received off-list along
> with access to network traces etc.).
>
> The issue appeared ~1 months ago after running without error since October
> 2018. No known changes were made ~1 month ago.
>
> The TCP sequence numbers show that, as far as the network stack is
> concerned, Tomcat did not write any data before closing the connection
> cleanly.
>
> There is no other activity on the client port associated with the failed request
> in the provided trace.
>
> The failed request does not appear to violate the HTTP specification.
>
> Neither the request nor the response are compressed.
>
> No WebSocket or other HTTP upgrade requests present in the network
> traces.
>
> No obviously relevant bugs fixed since 7.0.76.
>
>
> Follow-up questions:
> - JVM
>   - Vendor?
>   - OS package or direct from Vendor?
>

jdk-8u221-linux-x64.tar.gz downloaded from the Oracle web site.


> - Tomcat
>   - OS package, 3rd-party package or direct from ASF?
>

tomcat.noarch  7.0.76-6.el7 from CentOS base repository

> - Config
>   - Any changes at all around the time the problems started? I'm
>     thinking OS updates, VM restarted etc?
>

server.xml has not changed since 4/20/20, which was well before the problem manifested, and all the other files in the conf folder are even older than that. We're seeing this symptom on both production servers. One of them was rebooted a week ago, but the other has been up continuously for 258 days.

>
> Conclusions:
> - It isn't timeout related. The request is completing in orders of
>   magnitude less time than the timeout.
>
> - Based on the timings it looks like the JSP is processing the request
>   and generating the response.
>
> - It happens with BIO so sendfile isn't a factor.
>
> - No compression so GZIP issues aren't a factor.
>
> - Given that the issue occurs with both BIO and NIO that rules out a bug
>   in any BIO or NIO specific code. Note that while 7.0.x has largely
>   separate code for BIO and NIO there are still significant sections of
>   code that are essentially identical so it isn't quite as simple as
>   just ruling out all the code in the BIO and NIO specific classes.
>   It also makes a JVM issue seem less likely at this point.
>
>
> Current thinking:
> - I can think of various ways this might be happening but they all seem
>   pretty unlikely. The next steps will be to enable existing logging
>   (and potentially add some custom logging) to try and narrow down where
>   the response data is disappearing.
>
> - Having reviewed the BIO code, there is a mercifully simple way to see
>   how many bytes Tomcat thinks it has written to the response. The "%b"
>   pattern in the access log valve will show how many bytes from the
>   response body that Tomcat has written to the network socket without an
>   IOException. I'd prefer something that recorded header bytes as well
>   but this is a good first start and doesn't require custom patches.
>
>
> Recommendations:
> - Switch back to the BIO connector if you haven't already. It has fewer
>   moving parts than NIO so it is simpler to debug.
> - Add "%b" to the access log pattern for Tomcat's access log valve to
>   record the number of body (excluding headers) bytes Tomcat believes it
>   has written to the response.
>
>
> Next steps:
> - Wait for the issue to re-occur after the recommended changes above and
>   depending on what is recorded in the access log for %b for a failed
>   request, shift the focus accordingly.
> - Answers to the additional questions would be nice but the access log
>   %b value for a failed request is the key piece of information required
>   at this point.
>

Good news! I enabled that parameter a few days ago and we have already caught some instances of the problem occurring. Here is the logging format...

<Valve className="org.apache.catalina.valves.AccessLogValve" directory="logs"
               prefix="localhost_access." suffix=".log" pattern="%h %l %D %u %t %{JSESSIONID}c %{cookie}i %r %s %b %S %q" />

Due to some sensitive content in the HTTP requests below, I have globally replaced certain words and addresses with random-ish strings, but I don't think I've done anything to impact the issue.

Following is an example from Wednesday.

This is the nginx error logged for a request sent from the proxy to the first of 2 upstream servers, 10.51.14.46

2020/10/21 15:51:22 [error] 39268#39268: *842342531 upstream prematurely closed connection while reading response header from upstream, client: 99.88.77.66, server: redacted.domain.com, request: "GET /sandhut/jsp/catalog/xml/getWidgets.jsp?eDate=2020-10-21&statueId=64438&facilityId=0&DeptId=0&CompressXML=Yes&sessionDID=0&GzUserId=75064&gizusrlogtblid=322095&gizappprocessid=8568&rnd2=0.5650846&timestamp=21102020155122.472656 HTTP/1.1", upstream: "http://10.51.14.46:3016/sandhut/jsp/catalog/xml/getWidgets.jsp?eDate=2020-10-21&statueId=64438&facilityId=0&DeptId=0&CompressXML=Yes&sessionDID=0&GzUserId=75064&gizusrlogtblid=322095&gizappprocessid=8568&rnd2=0.5650846&timestamp=21102020155122.472656", host: "redacted.domain.com"

Here is the matching localhost_access log entry from that server….

10.51.14.133 - 144 - [21/Oct/2020:15:51:22 -0400] F405E25E49E3DCB81A36A87DED1FE573 JSESSIONID=F405E25E49E3DCB81A36A87DED1FE573; srv_id=dea8d61a7d725e980a6093cb78d8ec73; JSESSIONID=F405E25E49E3DCB81A36A87DED1FE573; srv_id=dea8d61a7d725e980a6093cb78d8ec73 GET /sandhut/jsp/catalog/xml/getWidgets.jsp?eDate=2020-10-21&statueId=64438&facilityId=0&DeptId=0&CompressXML=Yes&sessionDID=0&GzUserId=75064&gizusrlogtblid=322095&gizappprocessid=8568&rnd2=0.5650846&timestamp=21102020155122.472656 HTTP/1.0 200 40423 F405E25E49E3DCB81A36A87DED1FE573 ?eDate=2020-10-21&statueId=64438&facilityId=0&DeptId=0&CompressXML=Yes&sessionDID=0&GzUserId=75064&gizusrlogtblid=322095&gizappprocessid=8568&rnd2=0.5650846&timestamp=21102020155122.472656

Tomcat appears to think it sent 40423 bytes. However, even though it shows an HTTP 200 response, WireShark shows the response did not actually go out on the wire.

The proxy never got a response, so it sent the request to the next upstream, 10.51.14.57.

Here is the matching localhost_access log entry from that server…

10.51.14.133 - 1 - [21/Oct/2020:15:51:22 -0400] F405E25E49E3DCB81A36A87DED1FE573 JSESSIONID=F405E25E49E3DCB81A36A87DED1FE573; srv_id=dea8d61a7d725e980a6093cb78d8ec73; JSESSIONID=F405E25E49E3DCB81A36A87DED1FE573; srv_id=dea8d61a7d725e980a6093cb78d8ec73 GET /sandhut/jsp/catalog/xml/getWidgets.jsp?eDate=2020-10-21&statueId=64438&facilityId=0&DeptId=0&CompressXML=Yes&sessionDID=0&GzUserId=75064&gizusrlogtblid=322095&gizappprocessid=8568&rnd2=0.5650846&timestamp=21102020155122.472656 HTTP/1.0 200 663 - ?eDate=2020-10-21&statueId=64438&facilityId=0&DeptId=0&CompressXML=Yes&sessionDID=0&GzUserId=75064&gizusrlogtblid=322095&gizappprocessid=8568&rnd2=0.5650846&timestamp=21102020155122.472656

Tomcat on that server did not recognize the JSESSIONID cookie, so it sent back a 663 byte message stating, "The session is invalid."

The client was then kicked out of the software.
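
For anyone following along who wants to check the same thing against their own captures, a tshark filter along these lines will show whether any HTTP traffic for the upstream actually hit the wire (the capture file name, address, and port here are just placeholders for ours):

    tshark -r upstream-capture.pcap -Y "ip.addr == 10.51.14.46 && tcp.port == 3016 && http"

In a failing case like the one above, you would expect it to show the request arriving but no response packets leaving.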

>
> HTH,
>
> Mark
>
> (I'm trimming the previous discussion on this thread as I think all the relevant
> info is summarized above but do point out if I have missed
> something.)
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]
Reply | Threaded
Open this post in threaded view
|

RE: Weirdest Tomcat Behavior Ever?

Eric Robinson
> -----Original Message-----
> From: Eric Robinson <[hidden email]>
> Sent: Friday, October 23, 2020 7:09 PM
> To: Tomcat Users List <[hidden email]>
> Subject: RE: Weirdest Tomcat Behavior Ever?
>
> Hi Mark --
>
> Thanks tons for digging into this. See my answers below.
>
> > -----Original Message-----
> > From: Mark Thomas <[hidden email]>
> > Sent: Friday, October 23, 2020 5:09 PM
> > To: [hidden email]
> > Subject: Re: Weirdest Tomcat Behavior Ever?
> >
> > Hi Eric (and those following along),
> >
> > Eric sent me some network captures off-list from multiple points in
> > the network from the system where this is happening. The following is
> > the summary so far:
> >
> > Overview:
> > A small number of requests are receiving a completely empty (no
> > headers, no body) response.
> >
> >
> > Information Gathering:
> > Successful requests that are similar to the failed request:
> > - Take 7ms to 13ms to complete
> > - Have relatively small responses (~1k)
> > - Use HTTP/1.0
> > - Do not use keep-alive (request has Connection: close header)
> > - The request target is a JSP
> >
> > The failed request:
> > - Completes in ~6ms
>
> I think we've seen the failed requests take as much as 50ms.
>
> > - Has no response headers or body
> > - Records a successful entry in the access log
> >
> > System:
> > Tomcat 7.0.76, BIO HTTP connector
> > Java 1.8.0_221,
> > CentOS 7.5 server running in Azure
> >
> > Requests are received from an nginx reverse proxy. It looks like nginx
> > is using
> > HTTP/1.0 without keep-alive to proxy requests to Tomcat. This actually
> > makes things a little easier as we have one TCP/IP connection per request.
> >
> > After switching to NIO, the issue is still observable (info received
> > off-list along with access to network traces etc.).
> >
> > The issue appeared ~1 month ago after running without error since
> > October 2018. No known changes were made ~1 month ago.
> >
> > The TCP sequence numbers show that, as far as the network stack is
> > concerned, Tomcat did not write any data before closing the connection
> > cleanly.
> >
> > There is no other activity on the client port associated with the
> > failed request in the provided trace.
> >
> > The failed request does not appear to violate the HTTP specification.
> >
> > Neither the request nor the response are compressed.
> >
> > No WebSocket or other HTTP upgrade requests present in the network
> > traces.
> >
> > No obviously relevant bugs fixed since 7.0.76.
> >
> >
> > Follow-up questions:
> > - JVM
> >   - Vendor?
> >   - OS package or direct from Vendor?
> >
>
> jdk-8u221-linux-x64.tar.gz downloaded from the Oracle web site.
>
>
> > - Tomcat
> >   - OS package, 3rd-party package or direct from ASF?
> >
>
> tomcat.noarch  7.0.76-6.el7 from CentOS base repository
>

Drat, slight correction. I now recall that although we initially installed 7.0.76 from the CentOS repo, the application vendor made us lower the version to 7.0.72, and I DO NOT know where we got that. However, it has not changed since October-ish of 2018.

> > - Config
> >   - Any changes at all around the time the problems started? I'm
> >     thinking OS updates, VM restarted etc?
> >
>
> server.xml has not changed since 4/20/20, which was well before the
> problem manifested, and all the other files in the conf folder are even older
> than that. We're seeing this symptom on both production servers. One of
> them was rebooted a week ago, but the other has been up continuously for
> 258 days.
>
> >
> > Conclusions:
> > - It isn't timeout related. The request is completing in orders of
> >   magnitude less time than the timeout.
> >
> > - Based on the timings it looks like the JSP is processing the request
> >   and generating the response.
> >
> > - It happens with BIO so sendfile isn't a factor.
> >
> > - No compression so GZIP issues aren't a factor.
> >
> > - Given that the issue occurs with both BIO and NIO that rules out a bug
> >   in any BIO or NIO specific code. Note that while 7.0.x has largely
> >   separate code for BIO and NIO there are still significant sections of
> >   code that are essentially identical so it isn't quite as simple as
> >   just ruling out all the code in the BIO and NIO specific classes.
> >   It also makes a JVM issue seem less likely at this point.
> >
> >
> > Current thinking:
> > - I can think of various ways this might be happening but they all seem
> >   pretty unlikely. The next steps will be to enable existing logging
> >   (and potentially add some custom logging) to try and narrow down where
> >   the response data is disappearing.
> >
> > - Having reviewed the BIO code, there is a mercifully simple way to see
> >   how many bytes Tomcat thinks it has written to the response. The "%b"
> >   pattern in the access log valve will show how many bytes of the
> >   response body Tomcat has written to the network socket without an
> >   IOException. I'd prefer something that recorded header bytes as well
> >   but this is a good first step and doesn't require custom patches.
> >
> >
> > Recommendations:
> > - Switch back to the BIO connector if you haven't already. It has fewer
> >   moving parts than NIO so it is simpler to debug.
> > - Add "%b" to the access log pattern for Tomcat's access log valve to
> >   record the number of body (excluding headers) bytes Tomcat believes it
> >   has written to the response.
> >
> >
> > Next steps:
> > - Wait for the issue to re-occur after the recommended changes above and
> >   depending on what is recorded in the access log for %b for a failed
> >   request, shift the focus accordingly.
> > - Answers to the additional questions would be nice but the access log
> >   %b value for a failed request is the key piece of information required
> >   at this point.
> >
>
> Good news! I enabled that parameter a few days ago and we have already
> caught some instances of the problem occurring. Here is the logging format...
>
> <Valve className="org.apache.catalina.valves.AccessLogValve"
> directory="logs"
>                prefix="localhost_access." suffix=".log" pattern="%h %l %D %u %t
> %{JSESSIONID}c %{cookie}i %r %s %b %S %q" />
>
> Due to some sensitive content in the HTTP requests below, I have globally
> replaced certain words and addresses with random-ish strings, but I don't
> think I've done anything to impact the issue.
>
> Following is an example from Wednesday.
>
> This is a request being sent from the nginx proxy to the first of 2 upstream
> servers, 10.51.14.46
>
> 2020/10/21 15:51:22 [error] 39268#39268: *842342531 upstream prematurely
> closed connection while reading response header from upstream, client:
> 99.88.77.66, server: redacted.domain.com, request: "GET
> /sandhut/jsp/catalog/xml/getWidgets.jsp?eDate=2020-10-
> 21&statueId=64438&facilityId=0&DeptId=0&CompressXML=Yes&sessionDID
> =0&GzUserId=75064&gizusrlogtblid=322095&gizappprocessid=8568&rnd2=0.
> 5650846&timestamp=21102020155122.472656 HTTP/1.1", upstream:
> "http://10.51.14.46:3016/sandhut/jsp/catalog/xml/getWidgets.jsp?eDate=20
> 20-10-
> 21&statueId=64438&facilityId=0&DeptId=0&CompressXML=Yes&sessionDID
> =0&GzUserId=75064&gizusrlogtblid=322095&gizappprocessid=8568&rnd2=0.
> 5650846&timestamp=21102020155122.472656", host:
> "redacted.domain.com"
>
> Here is the matching localhost_access log entry from that server….
>
> 10.51.14.133 - 144 - [21/Oct/2020:15:51:22 -0400]
> F405E25E49E3DCB81A36A87DED1FE573
> JSESSIONID=F405E25E49E3DCB81A36A87DED1FE573;
> srv_id=dea8d61a7d725e980a6093cb78d8ec73;
> JSESSIONID=F405E25E49E3DCB81A36A87DED1FE573;
> srv_id=dea8d61a7d725e980a6093cb78d8ec73 GET
> /sandhut/jsp/catalog/xml/getWidgets.jsp?eDate=2020-10-
> 21&statueId=64438&facilityId=0&DeptId=0&CompressXML=Yes&sessionDID
> =0&GzUserId=75064&gizusrlogtblid=322095&gizappprocessid=8568&rnd2=0.
> 5650846&timestamp=21102020155122.472656 HTTP/1.0 200 40423
> F405E25E49E3DCB81A36A87DED1FE573 ?eDate=2020-10-
> 21&statueId=64438&facilityId=0&DeptId=0&CompressXML=Yes&sessionDID
> =0&GzUserId=75064&gizusrlogtblid=322095&gizappprocessid=8568&rnd2=0.
> 5650846&timestamp=21102020155122.472656
>
> Tomcat appears to think it sent 40423 bytes. However, even though it shows
> an HTTP 200 response, WireShark shows the response did not actually go out
> on the wire.
>
> The proxy never got a response, so it sent the request to the next upstream,
> 10.51.14.57.
>
> Here is the matching localhost_access log entry from that server…
>
> 10.51.14.133 - 1 - [21/Oct/2020:15:51:22 -0400]
> F405E25E49E3DCB81A36A87DED1FE573
> JSESSIONID=F405E25E49E3DCB81A36A87DED1FE573;
> srv_id=dea8d61a7d725e980a6093cb78d8ec73;
> JSESSIONID=F405E25E49E3DCB81A36A87DED1FE573;
> srv_id=dea8d61a7d725e980a6093cb78d8ec73 GET
> /sandhut/jsp/catalog/xml/getWidgets.jsp?eDate=2020-10-
> 21&statueId=64438&facilityId=0&DeptId=0&CompressXML=Yes&sessionDID
> =0&GzUserId=75064&gizusrlogtblid=322095&gizappprocessid=8568&rnd2=0.
> 5650846&timestamp=21102020155122.472656 HTTP/1.0 200 663 -
> ?eDate=2020-10-
> 21&statueId=64438&facilityId=0&DeptId=0&CompressXML=Yes&sessionDID
> =0&GzUserId=75064&gizusrlogtblid=322095&gizappprocessid=8568&rnd2=0.
> 5650846&timestamp=21102020155122.472656
>
> Tomcat on that server did not recognize the JSESSIONID cookie, so it sent
> back a 663 byte message stating, "The session is invalid."
>
> The client was then kicked out of the software.
>
> >
> > HTH,
> >
> > Mark
> >
> > (I'm trimming the previous discussion on this thread as I think all
> > the relevant info is summarized above but do point out if I have
> > missed
> > something.)
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [hidden email]
> > For additional commands, e-mail: [hidden email]
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]
Reply | Threaded
Open this post in threaded view
|

Re: Weirdest Tomcat Behavior Ever?

Konstantin Kolinko
In reply to this post by Eric Robinson
Sat, 24 Oct 2020 at 03:08, Eric Robinson <[hidden email]>:

>
> > - Answers to the additional questions would be nice but the access log
> >   %b value for a failed request is the key piece of information required
> >   at this point.
> >
>
> Good news! I enabled that parameter a few days ago and we have already caught some instances of the problem occurring. Here is the logging format...
>
> <Valve className="org.apache.catalina.valves.AccessLogValve" directory="logs"
>                prefix="localhost_access." suffix=".log" pattern="%h %l %D %u %t %{JSESSIONID}c %{cookie}i %r %s %b %S %q" />

I usually also have the following pattern in my access logs:
[%{org.apache.catalina.parameter_parse_failed}r
%{org.apache.catalina.parameter_parse_failed_reason}r]

It captures whether a request.getParameter() call completed abnormally and why.
That is unlikely to be your cause. There may be other request
attributes of interest as well, but nothing comes to mind at the
moment.
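
Folded into the pattern you already have, that would look something like this (the two request-attribute codes at the end are the only addition):

<Valve className="org.apache.catalina.valves.AccessLogValve" directory="logs"
               prefix="localhost_access." suffix=".log" pattern="%h %l %D %u %t %{JSESSIONID}c %{cookie}i %r %s %b %S %q [%{org.apache.catalina.parameter_parse_failed}r %{org.apache.catalina.parameter_parse_failed_reason}r]" />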

> Due to some sensitive content in the HTTP requests below, I have globally replaced certain words and addresses with random-ish strings, but I don't think I've done anything to impact the issue.
>
> Following is an example from Wednesday.
>
> This is a request being sent from the nginx proxy to the first of 2 upstream servers, 10.51.14.46
>
> 2020/10/21 15:51:22 [error] 39268#39268: *842342531 upstream prematurely closed connection while reading response header from upstream, client: 99.88.77.66, server: redacted.domain.com, request: "GET /sandhut/jsp/catalog/xml/getWidgets.jsp?eDate=2020-10-21&statueId=64438&facilityId=0&DeptId=0&CompressXML=Yes&sessionDID=0&GzUserId=75064&gizusrlogtblid=322095&gizappprocessid=8568&rnd2=0.5650846&timestamp=21102020155122.472656 HTTP/1.1", upstream: "http://10.51.14.46:3016/sandhut/jsp/catalog/xml/getWidgets.jsp?eDate=2020-10-21&statueId=64438&facilityId=0&DeptId=0&CompressXML=Yes&sessionDID=0&GzUserId=75064&gizusrlogtblid=322095&gizappprocessid=8568&rnd2=0.5650846&timestamp=21102020155122.472656", host: "redacted.domain.com"

I wonder what "CompressXML=Yes" does.


I also recommend to add the following system property to the configuration:
org.apache.catalina.connector.RECYCLE_FACADES=true

https://cwiki.apache.org/confluence/display/TOMCAT/Troubleshooting+and+Diagnostics#TroubleshootingandDiagnostics-TroubleshootingunexpectedResponsestateproblems

> Drat, slight correction. I now recall that although we initially installed 7.0.76 from the CentOS repo, the application vendor made us lower the version to 7.0.72, and I DO NOT know where we got that.

It is possible to compare the files with the pristine version from the
ASF archive dist site,
https://archive.apache.org/dist/tomcat/tomcat-7/

If it was installed as an RPM package, `rpm -qf` should be able to
answer what installed package those files belong to.
https://stackoverflow.com/questions/1133495/how-do-i-find-which-rpm-package-supplies-a-file-im-looking-for
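
For example (the path below is just an illustration; point it at wherever your catalina.jar actually lives). `rpm -V` will additionally flag any installed files that differ from what the package shipped:

    rpm -qf /usr/share/java/tomcat/catalina.jar
    rpm -V tomcat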

Best regards,
Konstantin Kolinko

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Weirdest Tomcat Behavior Ever?

markt
In reply to this post by Eric Robinson
On 24/10/2020 01:32, Eric Robinson wrote:

<snip/>

>>> -----Original Message-----
>>> From: Mark Thomas <[hidden email]>

<snip/>

>>> The failed request:
>>> - Completes in ~6ms
>>
>> I think we've seen the failed requests take as much as 50ms.

Ack. That is still orders of magnitude smaller than the timeout and
consistent with the generation time of some of the larger responses.

I wouldn't say it confirms any of my previous conclusions but it doesn't
invalidate them either.

>>> Follow-up questions:
>>> - JVM
>>>   - Vendor?
>>>   - OS package or direct from Vendor?
>>>
>>
>> jdk-8u221-linux-x64.tar.gz downloaded from the Oracle web site.

OK. That is post Java 8u202 so it should be a paid for, commercially
supported version of Java 8.

The latest Java 8 release from Oracle is 8u271.

The latest Java 8 release from AdoptOpenJDK is 8u272.

I don't think we are quite at this point yet but what is your view on
updating to the latest Java 8 JDK (from either Oracle or AdoptOpenJDK).

>>> - Tomcat
>>>   - OS package, 3rd-party package or direct from ASF?
>>>
>>
>> tomcat.noarch  7.0.76-6.el7 from CentOS base repository
>>
>
> Drat, slight correction. I now recall that although we initially installed 7.0.76 from the CentOS repo, the application vendor made us lower the version to 7.0.72, and I DO NOT know where we got that. However, it has not changed since October-ish of 2018.

I've reviewed the 7.0.72 to 7.0.76 changelog and I don't see any
relevant changes.

>>> - Config
>>>   - Any changes at all around the time the problems started? I'm
>>>     thinking OS updates, VM restarted etc?
>>>
>>
>> server.xml has not changed since 4/20/20, which was well before the
>> problem manifested, and all the other files in the conf folder are even older
>> than that. We're seeing this symptom on both production servers. One of
>> them was rebooted a week ago, but the other has been up continuously for
>> 258 days.

OK. That rules a few things out which is good but it does make the
trigger for this issue even more mysterious.

Any changes in the Nginx configuration in the relevant timescale?

Any updates to the application in the relevant timescale?

Any features users started using that hadn't been used before in that
timescale?

<snip/>

>>> Recommendations:
>>> - Switch back to the BIO connector if you haven't already. It has fewer
>>>   moving parts than NIO so it is simpler to debug.
>>> - Add "%b" to the access log pattern for Tomcat's access log valve to
>>>   record the number of body (excluding headers) bytes Tomcat believes it
>>>   has written to the response.
>>>
>>>
>>> Next steps:
>>> - Wait for the issue to re-occur after the recommended changes above and
>>>   depending on what is recorded in the access log for %b for a failed
>>>   request, shift the focus accordingly.
>>> - Answers to the additional questions would be nice but the access log
>>>   %b value for a failed request is the key piece of information required
>>>   at this point.
>>>
>>
>> Good news! I enabled that parameter a few days ago and we have already
>> caught some instances of the problem occurring.

Excellent!

>> Here is the logging format...
>>
>> <Valve className="org.apache.catalina.valves.AccessLogValve"
>> directory="logs"
>>                prefix="localhost_access." suffix=".log" pattern="%h %l %D %u %t
>> %{JSESSIONID}c %{cookie}i %r %s %b %S %q" />
>>
>> Due to some sensitive content in the HTTP requests below, I have globally
>> replaced certain words and addresses with random-ish strings, but I don't
>> think I've done anything to impact the issue.
>>
>> Following is an example from Wednesday.
>>
>> This is a request being sent from the nginx proxy to the first of 2 upstream
>> servers, 10.51.14.46
>>
>> 2020/10/21 15:51:22 [error] 39268#39268: *842342531 upstream prematurely
>> closed connection while reading response header from upstream, client:
>> 99.88.77.66, server: redacted.domain.com, request: "GET
>> /sandhut/jsp/catalog/xml/getWidgets.jsp?eDate=2020-10-
>> 21&statueId=64438&facilityId=0&DeptId=0&CompressXML=Yes&sessionDID
>> =0&GzUserId=75064&gizusrlogtblid=322095&gizappprocessid=8568&rnd2=0.
>> 5650846&timestamp=21102020155122.472656 HTTP/1.1", upstream:
>> "http://10.51.14.46:3016/sandhut/jsp/catalog/xml/getWidgets.jsp?eDate=20
>> 20-10-
>> 21&statueId=64438&facilityId=0&DeptId=0&CompressXML=Yes&sessionDID
>> =0&GzUserId=75064&gizusrlogtblid=322095&gizappprocessid=8568&rnd2=0.
>> 5650846&timestamp=21102020155122.472656", host:
>> "redacted.domain.com"
>>
>> Here is the matching localhost_access log entry from that server….
>>
>> 10.51.14.133 - 144 - [21/Oct/2020:15:51:22 -0400]
>> F405E25E49E3DCB81A36A87DED1FE573
>> JSESSIONID=F405E25E49E3DCB81A36A87DED1FE573;
>> srv_id=dea8d61a7d725e980a6093cb78d8ec73;
>> JSESSIONID=F405E25E49E3DCB81A36A87DED1FE573;
>> srv_id=dea8d61a7d725e980a6093cb78d8ec73 GET
>> /sandhut/jsp/catalog/xml/getWidgets.jsp?eDate=2020-10-
>> 21&statueId=64438&facilityId=0&DeptId=0&CompressXML=Yes&sessionDID
>> =0&GzUserId=75064&gizusrlogtblid=322095&gizappprocessid=8568&rnd2=0.
>> 5650846&timestamp=21102020155122.472656 HTTP/1.0 200 40423
>> F405E25E49E3DCB81A36A87DED1FE573 ?eDate=2020-10-
>> 21&statueId=64438&facilityId=0&DeptId=0&CompressXML=Yes&sessionDID
>> =0&GzUserId=75064&gizusrlogtblid=322095&gizappprocessid=8568&rnd2=0.
>> 5650846&timestamp=21102020155122.472656
>>
>> Tomcat appears to think it sent 40423 bytes. However, even though it shows
>> an HTTP 200 response, WireShark shows the response did not actually go out
>> on the wire.

That is very interesting. That narrows down where things might be going
wrong.

It confirms that the application is generating the response so we can
rule out a bug in that process.

By default there is some buffering here. The buffer size is 9000 bytes.
So with a response of 40423 I'd expect at least some of those bytes to
be written to network. However, Wireshark is reporting no bytes written.


At the moment, I see the following options:

1. Tomcat bug causing the writes to be made to the wrong response.

2. Application bug (retaining references to request/response/writer
   etc.) causing the writes to be made to the wrong response.

3. JVM bug causing network writes to be dropped.

4. OS bug causing network writes to be dropped before they reach the
   point where they are observable by Wireshark.

To be frank, none of those options seem particularly likely.

At this point I'd echo Konstantin's recommendation to add the following
system property:
org.apache.catalina.connector.RECYCLE_FACADES=true

You'd normally do this in $CATALINA_HOME/bin/setenv.sh (creating that
file if necessary) with a line like:

CATALINA_OPTS="$CATALINA_OPTS
-Dorg.apache.catalina.connector.RECYCLE_FACADES=true"

You can confirm that the setting has been applied by looking in the log
for the start-up. You should see something like:

Oct 26, 2020 10:18:45 AM
org.apache.catalina.startup.VersionLoggerListener log
INFO: Command line argument:
-Dorg.apache.catalina.connector.RECYCLE_FACADES=true


That option reduces the re-use of request, response and related objects
between requests and, if an application is retaining references it
shouldn't, you usually see a bunch of NullPointerExceptions in the logs
when the application tries to re-use those objects.
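
For illustration only, here is a minimal, entirely hypothetical servlet showing the kind of retained reference that setting tends to expose (the class and field names are invented for this sketch, not taken from your application):

import java.io.IOException;
import java.io.PrintWriter;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class LeakyServlet extends HttpServlet {

    // Anti-pattern: the previous response's writer is kept past the end
    // of its request.
    private static PrintWriter cachedWriter;

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        if (cachedWriter != null) {
            // With default settings this write can end up in whatever
            // response the recycled objects now belong to; with
            // RECYCLE_FACADES=true it will typically fail fast with a
            // NullPointerException instead.
            cachedWriter.println("stale data from an earlier request");
        }
        cachedWriter = resp.getWriter();
        cachedWriter.println("response for the current request");
    }
}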

Meanwhile, I'm going to work on a custom patch for 7.0.72 to add some
additional logging around the network writes.

Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Weirdest Tomcat Behavior Ever?

markt
On 26/10/2020 10:26, Mark Thomas wrote:

> On 24/10/2020 01:32, Eric Robinson wrote:
>
> <snip/>
>
>>>> -----Original Message-----
>>>> From: Mark Thomas <[hidden email]>
>
> <snip/>
>
>>>> The failed request:
>>>> - Completes in ~6ms
>>>
>>> I think we've seen the failed requests take as much as 50ms.
>
> Ack. That is still orders of magnitude smaller than the timeout and
> consistent with the generation time of some of the larger responses.
>
> I wouldn't say it confirms any of my previous conclusions but it doesn't
> invalidate them either.
>
>>>> Follow-up questions:
>>>> - JVM
>>>>   - Vendor?
>>>>   - OS package or direct from Vendor?
>>>>
>>>
>>> jdk-8u221-linux-x64.tar.gz downloaded from the Oracle web site.
>
> OK. That is post Java 8u202 so it should be a paid for, commercially
> supported version of Java 8.
>
> The latest Java 8 release from Oracle is 8u271.
>
> The latest Java 8 release from AdoptOpenJDK is 8u272.
>
> I don't think we are quite at this point yet but what is your view on
> updating to the latest Java 8 JDK (from either Oracle or AdoptOpenJDK).
>
>>>> - Tomcat
>>>>   - OS package, 3rd-party package or direct from ASF?
>>>>
>>>
>>> tomcat.noarch  7.0.76-6.el7 from CentOS base repository
>>>
>>
>> Drat, slight correction. I now recall that although we initially installed 7.0.76 from the CentOS repo, the application vendor made us lower the version to 7.0.72, and I DO NOT know where we got that. However, it has not changed since October-ish of 2018.
>
> I've reviewed the 7.0.72 to 7.0.76 changelog and I don't see any
> relevant changes.
>
>>>> - Config
>>>>   - Any changes at all around the time the problems started? I'm
>>>>     thinking OS updates, VM restarted etc?
>>>>
>>>
>>> server.xml has not changed since 4/20/20, which was well before the
>>> problem manifested, and all the other files in the conf folder are even older
>>> than that. We're seeing this symptom on both production servers. One of
>>> them was rebooted a week ago, but the other has been up continuously for
>>> 258 days.
>
> OK. That rules a few things out which is good but it does make the
> trigger for this issue even more mysterious.
>
> Any changes in the Nginx configuration in the relevant timescale?
>
> Any updates to the application in the relevant timescale?
>
> Any features users started using that hadn't been used before in that
> timescale?
>
> <snip/>
>
>>>> Recommendations:
>>>> - Switch back to the BIO connector if you haven't already. It has fewer
>>>>   moving parts than NIO so it is simpler debug.
>>>> - Add "%b" to the access log pattern for Tomcat's access log valve to
>>>>   record the number of body (excluding headers) bytes Tomcat believes it
>>>>   has written to the response.
>>>>
>>>>
>>>> Next steps:
>>>> - Wait for the issue to re-occur after the recommended changes above and
>>>>   depending on what is recorded in the access log for %b for a failed
>>>>   request, shift the focus accordingly.
>>>> - Answers to the additional questions would be nice but the access log
>>>>   %b value for a failed request is the key piece of information required
>>>>   at this point.
>>>>
>>>
>>> Good news! I enabled that parameter a few days ago and we have already
>>> caught some instances of the problem occurring.
>
> Excellent!
>
>>> Here is the logging format...
>>>
>>> <Valve className="org.apache.catalina.valves.AccessLogValve"
>>> directory="logs"
>>>                prefix="localhost_access." suffix=".log" pattern="%h %l %D %u %t
>>> %{JSESSIONID}c %{cookie}i %r %s %b %S %q" />
>>>
>>> Due to some sensitive content in the HTTP requests below, I have globally
>>> replaced certain words and addresses with random-ish strings, but I don't
>>> think I've done anything to impact the issue.
>>>
>>> Following is an example from Wednesday.
>>>
>>> This is a request being sent from the nginx proxy to the first of 2 upstream
>>> servers, 10.51.14.46
>>>
>>> 2020/10/21 15:51:22 [error] 39268#39268: *842342531 upstream prematurely
>>> closed connection while reading response header from upstream, client:
>>> 99.88.77.66, server: redacted.domain.com, request: "GET
>>> /sandhut/jsp/catalog/xml/getWidgets.jsp?eDate=2020-10-
>>> 21&statueId=64438&facilityId=0&DeptId=0&CompressXML=Yes&sessionDID
>>> =0&GzUserId=75064&gizusrlogtblid=322095&gizappprocessid=8568&rnd2=0.
>>> 5650846&timestamp=21102020155122.472656 HTTP/1.1", upstream:
>>> "http://10.51.14.46:3016/sandhut/jsp/catalog/xml/getWidgets.jsp?eDate=20
>>> 20-10-
>>> 21&statueId=64438&facilityId=0&DeptId=0&CompressXML=Yes&sessionDID
>>> =0&GzUserId=75064&gizusrlogtblid=322095&gizappprocessid=8568&rnd2=0.
>>> 5650846&timestamp=21102020155122.472656", host:
>>> "redacted.domain.com"
>>>
>>> Here is the matching localhost_access log entry from that server….
>>>
>>> 10.51.14.133 - 144 - [21/Oct/2020:15:51:22 -0400]
>>> F405E25E49E3DCB81A36A87DED1FE573
>>> JSESSIONID=F405E25E49E3DCB81A36A87DED1FE573;
>>> srv_id=dea8d61a7d725e980a6093cb78d8ec73;
>>> JSESSIONID=F405E25E49E3DCB81A36A87DED1FE573;
>>> srv_id=dea8d61a7d725e980a6093cb78d8ec73 GET
>>> /sandhut/jsp/catalog/xml/getWidgets.jsp?eDate=2020-10-
>>> 21&statueId=64438&facilityId=0&DeptId=0&CompressXML=Yes&sessionDID
>>> =0&GzUserId=75064&gizusrlogtblid=322095&gizappprocessid=8568&rnd2=0.
>>> 5650846&timestamp=21102020155122.472656 HTTP/1.0 200 40423
>>> F405E25E49E3DCB81A36A87DED1FE573 ?eDate=2020-10-
>>> 21&statueId=64438&facilityId=0&DeptId=0&CompressXML=Yes&sessionDID
>>> =0&GzUserId=75064&gizusrlogtblid=322095&gizappprocessid=8568&rnd2=0.
>>> 5650846&timestamp=21102020155122.472656
>>>
>>> Tomcat appears to think it sent 40423 bytes. However, even though it shows
>>> an HTTP 200 response, WireShark shows the response did not actually go out
>>> on the wire.
>
> That is very interesting. That narrows down where things might be going
> wrong.
>
> It confirms that the application is generating the response so we can
> rule out a bug in that process.
>
> By default there is some buffering here. The buffer size is 9000 bytes.
> So with a response of 40423 I'd expect at least some of those bytes to
> be written to network. However, Wireshark is reporting no bytes written.
>
>
> At the moment, I see the following options:
>
> 1. Tomcat bug causing the writes to be made to the wrong response.
>
> 2. Application bug (retaining references to request/response/writer
>    etc.) causing the writes to be made to the wrong response.
>
> 3. JVM bug causing network writes to be dropped.
>
> 4. OS bug causing network writes to be dropped before they reach the
>    point where they are observable by Wireshark.
>
> To be frank, none of those options seem particularly likely.
>
> At this point I'd echo Konstantin's recommendation to add the following
> system property:
> org.apache.catalina.connector.RECYCLE_FACADES=true
>
> You'd normally do this in $CATALINA_HOME/bin/setenv.sh (creating that
> file if necessary) with a line like:
>
> CATALINA_OPTS="$CATALINA_OPTS
> -Dorg.apache.catalina.connector.RECYCLE_FACADES=true"
>
> You can confirm that the setting has been applied by looking in the log
> for the start-up. You should see something like:
>
> Oct 26, 2020 10:18:45 AM
> org.apache.catalina.startup.VersionLoggerListener log
> INFO: Command line argument:
> -Dorg.apache.catalina.connector.RECYCLE_FACADES=true
>
>
> That option reduces the re-use of request, response and related objects
> between requests and, if an application is retaining references it
> shouldn't, you usually see a bunch of NullPointerExceptions in the logs
> when the application tries to re-use those objects.
>
> Meanwhile, I'm going to work on a custom patch for 7.0.72 to add some
> additional logging around the network writes.

Patch files and instructions for use:

http://home.apache.org/~markt/dev/v7.0.72-custom-patch-v1/

Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

RE: Weirdest Tomcat Behavior Ever?

Eric Robinson
> On 26/10/2020 10:26, Mark Thomas wrote:
> > On 24/10/2020 01:32, Eric Robinson wrote:
> >
> > <snip/>
> >
> >>>> -----Original Message-----
> >>>> From: Mark Thomas <[hidden email]>
> >
> > <snip/>
> >
> >>>> The failed request:
> >>>> - Completes in ~6ms
> >>>
> >>> I think we've seen the failed requests take as much as 50ms.
> >
> > Ack. That is still orders of magnitude smaller than the timeout and
> > consistent with the generation time of some of the larger responses.
> >
> > I wouldn't say it confirms any of my previous conclusions but it
> > doesn't invalidate them either.
> >
> >>>> Follow-up questions:
> >>>> - JVM
> >>>>   - Vendor?
> >>>>   - OS package or direct from Vendor?
> >>>>
> >>>
> >>> jdk-8u221-linux-x64.tar.gz downloaded from the Oracle web site.
> >
> > OK. That is post Java 8u202 so it should be a paid for, commercially
> > supported version of Java 8.
> >
> > The latest Java 8 release from Oracle is 8u271.
> >
> > The latest Java 8 release from AdoptOpenJDK is 8u272.
> >
> > I don't think we are quite at this point yet but what is your view on
> > updating to the latest Java 8 JDK (from either Oracle or AdoptOpenJDK).
> >
> >>>> - Tomcat
> >>>>   - OS package, 3rd-party package or direct from ASF?
> >>>>
> >>>
> >>> tomcat.noarch  7.0.76-6.el7 from CentOS base repository
> >>>
> >>
> >> Drat, slight correction. I now recall that although we initially installed 7.0.76
> from the CentOS repo, the application vendor made us lower the version to
> 7.0.72, and I DO NOT know where we got that. However, it has not changed
> since October-ish of 2018.
> >
> > I've reviewed the 7.0.72 to 7.0.76 changelog and I don't see any
> > relevant changes.
> >
> >>>> - Config
> >>>>   - Any changes at all around the time the problems started? I'm
> >>>>     thinking OS updates, VM restarted etc?
> >>>>
> >>>
> >>> server.xml has not changed since 4/20/20, which was well before the
> >>> problem manifested, and all the other files in the conf folder are
> >>> even older than that. We're seeing this symptom on both production
> >>> servers. One of them was rebooted a week ago, but the other has been
> >>> up continuously for
> >>> 258 days.
> >
> > OK. That rules a few things out which is good but it does make the
> > trigger for this issue even more mysterious.
> >
> > Any changes in the Nginx configuration in the relevant timescale?
> >

The last change to the nginx config files was on 8/21. The first report of problems from the users in question was on 9/16. There is another set of users on a different tomcat instance who reported issues around 8/26, 5 days after the nginx config change. It seems unlikely to be related. Also, I can't imagine what nginx could be sending that would induce the upstream tomcat to behave this way.

> > Any updates to the application in the relevant timescale?
> >

Their application was patched to a newer version on 6/5.

> > Any features users started using that hadn't been used before in that
> > timescale?

That one I couldn't answer, as we are only the hosting facility and we are not in the loop when it comes to the users' workflow, but it seems unlikely given the nature of their business.

> >
> > <snip/>
> >
> >>>> Recommendations:
> >>>> - Switch back to the BIO connector if you haven't already. It has fewer
> >>>>   moving parts than NIO so it is simpler to debug.
> >>>> - Add "%b" to the access log pattern for Tomcat's access log valve to
> >>>>   record the number of body (excluding headers) bytes Tomcat believes
> it
> >>>>   has written to the response.
> >>>>
> >>>>
> >>>> Next steps:
> >>>> - Wait for the issue to re-occur after the recommended changes above
> and
> >>>>   depending on what is recorded in the access log for %b for a failed
> >>>>   request, shift the focus accordingly.
> >>>> - Answers to the additional questions would be nice but the access log
> >>>>   %b value for a failed request is the key piece of information required
> >>>>   at this point.
> >>>>
> >>>
> >>> Good news! I enabled that parameter a few days ago and we have
> >>> already caught some instances of the problem occurring.
> >
> > Excellent!
> >
> >>> Here is the logging format...
> >>>
> >>> <Valve className="org.apache.catalina.valves.AccessLogValve"
> >>> directory="logs"
> >>>                prefix="localhost_access." suffix=".log" pattern="%h
> >>> %l %D %u %t %{JSESSIONID}c %{cookie}i %r %s %b %S %q" />
> >>>
> >>> Due to some sensitive content in the HTTP requests below, I have
> >>> globally replaced certain words and addresses with random-ish
> >>> strings, but I don't think I've done anything to impact the issue.
> >>>
> >>> Following is an example from Wednesday.
> >>>
> >>> This is a request being sent from the nginx proxy to the first of 2
> >>> upstream servers, 10.51.14.46
> >>>
> >>> 2020/10/21 15:51:22 [error] 39268#39268: *842342531 upstream
> >>> prematurely closed connection while reading response header from
> upstream, client:
> >>> 99.88.77.66, server: redacted.domain.com, request: "GET
> >>> /sandhut/jsp/catalog/xml/getWidgets.jsp?eDate=2020-10-
> >>>
> 21&statueId=64438&facilityId=0&DeptId=0&CompressXML=Yes&sessionDID
> >>>
> =0&GzUserId=75064&gizusrlogtblid=322095&gizappprocessid=8568&rnd2=0.
> >>> 5650846&timestamp=21102020155122.472656 HTTP/1.1", upstream:
> >>> "http://10.51.14.46:3016/sandhut/jsp/catalog/xml/getWidgets.jsp?eDat
> >>> e=20
> >>> 20-10-
> >>>
> 21&statueId=64438&facilityId=0&DeptId=0&CompressXML=Yes&sessionDID
> >>>
> =0&GzUserId=75064&gizusrlogtblid=322095&gizappprocessid=8568&rnd2=0.
> >>> 5650846&timestamp=21102020155122.472656", host:
> >>> "redacted.domain.com"
> >>>
> >>> Here is the matching localhost_access log entry from that server….
> >>>
> >>> 10.51.14.133 - 144 - [21/Oct/2020:15:51:22 -0400]
> >>> F405E25E49E3DCB81A36A87DED1FE573
> >>> JSESSIONID=F405E25E49E3DCB81A36A87DED1FE573;
> >>> srv_id=dea8d61a7d725e980a6093cb78d8ec73;
> >>> JSESSIONID=F405E25E49E3DCB81A36A87DED1FE573;
> >>> srv_id=dea8d61a7d725e980a6093cb78d8ec73 GET
> >>> /sandhut/jsp/catalog/xml/getWidgets.jsp?eDate=2020-10-
> >>>
> 21&statueId=64438&facilityId=0&DeptId=0&CompressXML=Yes&sessionDID
> >>>
> =0&GzUserId=75064&gizusrlogtblid=322095&gizappprocessid=8568&rnd2=0.
> >>> 5650846&timestamp=21102020155122.472656 HTTP/1.0 200 40423
> >>> F405E25E49E3DCB81A36A87DED1FE573 ?eDate=2020-10-
> >>>
> 21&statueId=64438&facilityId=0&DeptId=0&CompressXML=Yes&sessionDID
> >>>
> =0&GzUserId=75064&gizusrlogtblid=322095&gizappprocessid=8568&rnd2=0.
> >>> 5650846&timestamp=21102020155122.472656
> >>>
> >>> Tomcat appears to think it sent 40423 bytes. However, even though it
> >>> shows an HTTP 200 response, WireShark shows the response did not
> >>> actually go out on the wire.
> >
> > That is very interesting. That narrows down where things might be
> > going wrong.
> >
> > It confirms that the application is generating the response so we can
> > rule out a bug in that process.
> >
> > By default there is some buffering here. The buffer size is 9000 bytes.
> > So with a response of 40423 I'd expect at least some of those bytes to
> > be written to network. However, Wireshark is reporting no bytes written.
> >
> >
> > At the moment, I see the following options:
> >
> > 1. Tomcat bug causing the writes to be made to the wrong response.
> >
> > 2. Application bug (retaining references to request/response/writer
> >    etc.) causing the writes to be made to the wrong response.
> >
> > 3. JVM bug causing network writes to be dropped.
> >
> > 4. OS bug causing network writes to be dropped before they reach the
> >    point where they are observable by Wireshark.
> >
> > To be frank, none of those options seem particularly likely.
> >
> > At this point I'd echo Konstantin's recommendation to add the
> > following system property:
> > org.apache.catalina.connector.RECYCLE_FACADES=true
> >
> > You'd normally do this in $CATALINA_HOME/bin/setenv.sh (creating that
> > file if necessary) with a line like:
> >
> > CATALINA_OPTS="$CATALINA_OPTS
> > -Dorg.apache.catalina.connector.RECYCLE_FACADES=true"
> >
> > You can confirm that the setting has been applied by looking in the
> > log for the start-up. You should see something like:
> >
> > Oct 26, 2020 10:18:45 AM
> > org.apache.catalina.startup.VersionLoggerListener log
> > INFO: Command line argument:
> > -Dorg.apache.catalina.connector.RECYCLE_FACADES=true
> >
> >
> > That option reduces the re-use of request, response and related
> > objects between requests and, if an application is retaining
> > references it shouldn't, you usually see a bunch of
> > NullPointerExceptions in the logs when the application tries to re-use those
> objects.
> >
> > Meanwhile, I'm going to work on a custom patch for 7.0.72 to add some
> > additional logging around the network writes.
>
> Patch files and instructions for use:
>
> http://home.apache.org/~markt/dev/v7.0.72-custom-patch-v1/
>
> Mark

Hi Mark,

A couple of questions.

1. Now that you have provided this patch, should I still enable RECYCLE_FACADES=true?

2. The servers in question are multi-tenanted. There are 17 instances of tomcat, each running on a different set of ports and out of a separate directory tree, but they are all started with essentially the same init script, which exports certain site-specific path variables and executes tomcat from the specified folder structure. Can you think of any potential issues where making this change for one instance could have a negative effect on any of the other instances? Probably not, but just being careful. I will have these changes implemented during our nightly maintenance window and will begin to gather relevant logs first thing tomorrow!

--Eric

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]
Reply | Threaded
Open this post in threaded view
|

Re: Weirdest Tomcat Behavior Ever?

Konstantin Kolinko
Tue, 27 Oct 2020 at 00:07, Eric Robinson <[hidden email]>:

>
> > On 26/10/2020 10:26, Mark Thomas wrote:
> > > On 24/10/2020 01:32, Eric Robinson wrote:
> > >
> > > At this point I'd echo Konstantin's recommendation to add the
> > > following system property:
> > > org.apache.catalina.connector.RECYCLE_FACADES=true
> > >
> > > You'd normally do this in $CATALINA_HOME/bin/setenv.sh (creating that
> > > file if necessary) with a line like:
> > >
> > > CATALINA_OPTS="$CATALINA_OPTS
> > > -Dorg.apache.catalina.connector.RECYCLE_FACADES=true"
> > >
> > > You can confirm that the setting has been applied by looking in the
> > > log for the start-up. You should see something like:
> > >
> > > Oct 26, 2020 10:18:45 AM
> > > org.apache.catalina.startup.VersionLoggerListener log
> > > INFO: Command line argument:
> > > -Dorg.apache.catalina.connector.RECYCLE_FACADES=true
> > >
> > >
> > > That option reduces the re-use of request, response and related
> > > objects between requests and, if an application is retaining
> > > references it shouldn't, you usually see a bunch of
> > > NullPointerExceptions in the logs when the application tries to re-use those
> > objects.
> > >
> > > Meanwhile, I'm going to work on a custom patch for 7.0.72 to add some
> > > additional logging around the network writes.
> >
> > Patch files and instructions for use:
> >
> > http://home.apache.org/~markt/dev/v7.0.72-custom-patch-v1/
> >
> > Mark
>
> Hi Mark,
>
> A couple of questions.
>
> 1. Now that you have provided this patch, should I still enable RECYCLE_FACADES=true?

Regarding the patch,
there is no source code for it, but I think that it adds debug
logging, nothing more.


RECYCLE_FACADES makes your configuration safer, protecting Tomcat
from misbehaving web applications. I have that property set on all
Tomcat installations that I care about. Thus I think that you should
set it anyway.

I usually add that property into the conf/catalina.properties file.
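
For example, one extra line at the end of conf/catalina.properties (Tomcat loads entries from that file as system properties at startup, so it has the same effect as the -D option):

org.apache.catalina.connector.RECYCLE_FACADES=true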

See the wiki for a more detailed answer.
https://cwiki.apache.org/confluence/display/TOMCAT/Troubleshooting+and+Diagnostics#TroubleshootingandDiagnostics-TroubleshootingunexpectedResponsestateproblems

My thought is that your case could be caused by something like the "Java
ImageIO" issue mentioned there: something in the web application
produces dangling references to a java.io.OutputStream, which are then
closed during garbage collection, corrupting Tomcat internals.
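
A contrived sketch of that pattern (purely hypothetical, not taken from your application) would be a helper that keeps hold of a response stream and closes it from a finalizer, so the close happens on the GC thread at some arbitrary later time, possibly after the connection has been recycled for a different request:

import java.io.OutputStream;

public class FinalizingStreamHolder {

    // Dangling reference to some response's OutputStream.
    private final OutputStream out;

    public FinalizingStreamHolder(OutputStream out) {
        this.out = out;
    }

    @Override
    protected void finalize() throws Throwable {
        try {
            // Runs whenever the GC gets around to it, not when the request
            // that owned the stream actually finished.
            out.close();
        } finally {
            super.finalize();
        }
    }
}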

> 2. [...] Can you think of any potential issues where making this change for one instance could have a negative effect on any of the other instances? Probably not, but just being careful.

I hope that you can cope with the amount of logging that this generates.

Best regards,
Konstantin Kolinko

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

RE: Weirdest Tomcat Behavior Ever?

Eric Robinson
In reply to this post by Eric Robinson
> > On 26/10/2020 10:26, Mark Thomas wrote:
> > > On 24/10/2020 01:32, Eric Robinson wrote:
> > >
> > > <snip/>
> > >
> > >>>> -----Original Message-----
> > >>>> From: Mark Thomas <[hidden email]>
> > >
> > > <snip/>
> > >
> > >>>> The failed request:
> > >>>> - Completes in ~6ms
> > >>>
> > >>> I think we've seen the failed requests take as much as 50ms.
> > >
> > > Ack. That is still orders of magnitude smaller than the timeout and
> > > consistent with the generation time of some of the larger responses.
> > >
> > > I wouldn't say it confirms any of my previous conclusions but it
> > > doesn't invalidate them either.
> > >
> > >>>> Follow-up questions:
> > >>>> - JVM
> > >>>>   - Vendor?
> > >>>>   - OS package or direct from Vendor?
> > >>>>
> > >>>
> > >>> jdk-8u221-linux-x64.tar.gz downloaded from the Oracle web site.
> > >
> > > OK. That is post Java 8u202 so it should be a paid for, commercially
> > > supported version of Java 8.
> > >
> > > The latest Java 8 release from Oracle is 8u271.
> > >
> > > The latest Java 8 release from AdoptOpenJDK is 8u272.
> > >
> > > I don't think we are quite at this point yet but what is your view
> > > on updating to the latest Java 8 JDK (from either Oracle or
> AdoptOpenJDK).
> > >
> > >>>> - Tomcat
> > >>>>   - OS package, 3rd-party package or direct from ASF?
> > >>>>
> > >>>
> > >>> tomcat.noarch  7.0.76-6.el7 from CentOS base repository
> > >>>
> > >>
> > >> Drat, slight correction. I now recall that although we initially
> > >> installed 7.0.76
> > from the CentOS repo, the application vendor made us lower the version
> > to 7.0.72, and I DO NOT know where we got that. However, it has not
> > changed since October-ish of 2018.
> > >
> > > I've reviewed the 7.0.72 to 7.0.76 changelog and I don't see any
> > > relevant changes.
> > >
> > >>>> - Config
> > >>>>   - Any changes at all around the time the problems started? I'm
> > >>>>     thinking OS updates, VM restarted etc?
> > >>>>
> > >>>
> > >>> server.xml has not changed since 4/20/20, which was well before
> > >>> the problem manifested, and all the other files in the conf folder
> > >>> are even older than that. We're seeing this symptom on both
> > >>> production servers. One of them was rebooted a week ago, but the
> > >>> other has been up continuously for
> > >>> 258 days.
> > >
> > > OK. That rules a few things out which is good but it does make the
> > > trigger for this issue even more mysterious.
> > >
> > > Any changes in the Nginx configuration in the relevant timescale?
> > >
>
> The last change to the nginx config files was on 8/21. The first report of
> problems from the users in question was on 9/16. There is another set of
> users on a different tomcat instance who reported issues around 8/26, 5 days
> after nginx config change. It seems unlikely to be related. Also, I can't
> imagine what nginx could be sending that would induce the upstream tomcat
> to behave this way.
>
> > > Any updates to the application in the relevant timescale?
> > >
>
> Their application was patched to a newer version on 6/5.
>
> > > Any features users started using that hadn't been used before in
> > > that timescale?
>
> That one I couldn't answer, as we are only the hosting facility and we are not
> in the loop when it comes to the users' workflow, but it seems unlikely given
> the nature of their business.
>
> > >
> > > <snip/>
> > >
> > >>>> Recommendations:
> > >>>> - Switch back to the BIO connector if you haven't already. It has fewer
> > >>>>   moving parts than NIO so it is simpler to debug.
> > >>>> - Add "%b" to the access log pattern for Tomcat's access log valve to
> > >>>>   record the number of body (excluding headers) bytes Tomcat
> > >>>> believes
> > it
> > >>>>   has written to the response.
> > >>>>
> > >>>>
> > >>>> Next steps:
> > >>>> - Wait for the issue to re-occur after the recommended changes
> > >>>> above
> > and
> > >>>>   depending on what is recorded in the access log for %b for a failed
> > >>>>   request, shift the focus accordingly.
> > >>>> - Answers to the additional questions would be nice but the access
> log
> > >>>>   %b value for a failed request is the key piece of information required
> > >>>>   at this point.
> > >>>>
> > >>>
> > >>> Good news! I enabled that parameter a few days ago and we have
> > >>> already caught some instances of the problem occurring.
> > >
> > > Excellent!
> > >
> > >>> Here is the logging format...
> > >>>
> > >>> <Valve className="org.apache.catalina.valves.AccessLogValve"
> > >>> directory="logs"
> > >>>                prefix="localhost_access." suffix=".log"
> > >>> pattern="%h %l %D %u %t %{JSESSIONID}c %{cookie}i %r %s %b %S
> %q"
> > >>> />
> > >>>
> > >>> Due to some sensitive content in the HTTP requests below, I have
> > >>> globally replaced certain words and addresses with random-ish
> > >>> strings, but I don't think I've done anything to impact the issue.
> > >>>
> > >>> Following is an example from Wednesday.
> > >>>
> > >>> This is a request being sent from the nginx proxy to the first of
> > >>> 2 upstream servers, 10.51.14.46
> > >>>
> > >>> 2020/10/21 15:51:22 [error] 39268#39268: *842342531 upstream
> > >>> prematurely closed connection while reading response header from
> > upstream, client:
> > >>> 99.88.77.66, server: redacted.domain.com, request: "GET
> > >>> /sandhut/jsp/catalog/xml/getWidgets.jsp?eDate=2020-10-
> > >>>
> >
> 21&statueId=64438&facilityId=0&DeptId=0&CompressXML=Yes&sessionDID
> > >>>
> >
> =0&GzUserId=75064&gizusrlogtblid=322095&gizappprocessid=8568&rnd2=0.
> > >>> 5650846&timestamp=21102020155122.472656 HTTP/1.1", upstream:
> > >>> "http://10.51.14.46:3016/sandhut/jsp/catalog/xml/getWidgets.jsp?eD
> > >>> at
> > >>> e=20
> > >>> 20-10-
> > >>>
> >
> 21&statueId=64438&facilityId=0&DeptId=0&CompressXML=Yes&sessionDID
> > >>>
> >
> =0&GzUserId=75064&gizusrlogtblid=322095&gizappprocessid=8568&rnd2=0.
> > >>> 5650846&timestamp=21102020155122.472656", host:
> > >>> "redacted.domain.com"
> > >>>
> > >>> Here is the matching localhost_access log entry from that server….
> > >>>
> > >>> 10.51.14.133 - 144 - [21/Oct/2020:15:51:22 -0400]
> > >>> F405E25E49E3DCB81A36A87DED1FE573
> > >>> JSESSIONID=F405E25E49E3DCB81A36A87DED1FE573;
> > >>> srv_id=dea8d61a7d725e980a6093cb78d8ec73;
> > >>> JSESSIONID=F405E25E49E3DCB81A36A87DED1FE573;
> > >>> srv_id=dea8d61a7d725e980a6093cb78d8ec73 GET
> > >>> /sandhut/jsp/catalog/xml/getWidgets.jsp?eDate=2020-10-
> > >>>
> >
> 21&statueId=64438&facilityId=0&DeptId=0&CompressXML=Yes&sessionDID
> > >>>
> >
> =0&GzUserId=75064&gizusrlogtblid=322095&gizappprocessid=8568&rnd2=0.
> > >>> 5650846&timestamp=21102020155122.472656 HTTP/1.0 200 40423
> > >>> F405E25E49E3DCB81A36A87DED1FE573 ?eDate=2020-10-
> > >>>
> >
> 21&statueId=64438&facilityId=0&DeptId=0&CompressXML=Yes&sessionDID
> > >>>
> >
> =0&GzUserId=75064&gizusrlogtblid=322095&gizappprocessid=8568&rnd2=0.
> > >>> 5650846&timestamp=21102020155122.472656
> > >>>
> > >>> Tomcat appears to think it sent 40423 bytes. However, even though
> > >>> it shows an HTTP 200 response, WireShark shows the response did
> > >>> not actually go out on the wire.
> > >
> > > That is very interesting. That narrows down where things might be
> > > going wrong.
> > >
> > > It confirms that the application is generating the response so we
> > > can rule out a bug in that process.
> > >
> > > By default there is some buffering here. The buffer size is 9000 bytes.
> > > So with a response of 40423 I'd expect at least some of those bytes
> > > to be written to network. However, Wireshark is reporting no bytes
> written.
> > >
> > >
> > > At the moment, I see the following options:
> > >
> > > 1. Tomcat bug causing the writes to be made to the wrong response.
> > >
> > > 2. Application bug (retaining references to request/response/writer
> > >    etc.) causing the writes to be made to the wrong response.
> > >
> > > 3. JVM bug causing network writes to be dropped.
> > >
> > > 4. OS bug causing network writes to be dropped before they reach the
> > >    point where they are observable by Wireshark.
> > >
> > > To be frank, none of those options seem particularly likely.
> > >
> > > At this point I'd echo Konstantin's recommendation to add the
> > > following system property:
> > > org.apache.catalina.connector.RECYCLE_FACADES=true
> > >
> > > You'd normally do this in $CATALINA_HOME/bin/setenv.sh (creating
> > > that file if necessary) with a line like:
> > >
> > > CATALINA_OPTS="$CATALINA_OPTS
> > > -Dorg.apache.catalina.connector.RECYCLE_FACADES=true"
> > >
> > > You can confirm that the setting has been applied by looking in the
> > > log for the start-up. You should see something like:
> > >
> > > Oct 26, 2020 10:18:45 AM
> > > org.apache.catalina.startup.VersionLoggerListener log
> > > INFO: Command line argument:
> > > -Dorg.apache.catalina.connector.RECYCLE_FACADES=true
> > >
> > >
> > > That option reduces the re-use of request, response and related
> > > objects between requests and, if an application is retaining
> > > references it shouldn't, you usually see a bunch of
> > > NullPointerExceptions in the logs when the application tries to
> > > re-use those
> > objects.
> > >
> > > Meanwhile, I'm going to work on a custom patch for 7.0.72 to add
> > > some additional logging around the network writes.
> >
> > Patch files and instructions for use:
> >
> > http://home.apache.org/~markt/dev/v7.0.72-custom-patch-v1/
> >
> > Mark
>
> Hi Mark,
>
> A couple of questions.
>
> 1. Now that you have provided this patch, should I still enable
> RECYCLE_FACADES=true?
>
> 2. The servers in question are multi-tenanted. There are 17 instances of
> tomcat, each running on a different set of ports and out of a separate
> directory tree, but they are all started with essentially the same init script,
> which exports certain site-specific path variables and executes tomcat from
> the specified folder structure. Can you think of any potential issues where
> making this change for one instance could have a negative effect on any of
> the other instances? Probably not, but just being careful. I will have these
> changes implemented during our nightly maintenance window and will begin to
> gather relevant logs first thing tomorrow!
>
> --Eric

Mark, the changes have been made per your instructions and tomcat has been restarted. The debug0.log and debug0.log.lck files were created in the directory, but they currently both have 0 bytes.

--Eric




Reply | Threaded
Open this post in threaded view
|

RE: Weirdest Tomcat Behavior Ever?

Eric Robinson
> -----Original Message-----
> From: Eric Robinson <[hidden email]>
> Sent: Monday, October 26, 2020 11:37 PM
> To: Tomcat Users List <[hidden email]>
> Subject: RE: Weirdest Tomcat Behavior Ever?
>
> > > On 26/10/2020 10:26, Mark Thomas wrote:
> > > > On 24/10/2020 01:32, Eric Robinson wrote:
> > > >
> > > > <snip/>
> > > >
> > > >>>> -----Original Message-----
> > > >>>> From: Mark Thomas <[hidden email]>
> > > >
> > > > <snip/>
> > > >
> > > >>>> The failed request:
> > > >>>> - Completes in ~6ms
> > > >>>
> > > >>> I think we've seen the failed requests take as much as 50ms.
> > > >
> > > > Ack. That is still orders of magnitude smaller than the timeout
> > > > and consistent with the generation time of some of the larger responses.
> > > >
> > > > I wouldn't say it confirms any of my previous conclusions but it
> > > > doesn't invalidate them either.
> > > >
> > > >>>> Follow-up questions:
> > > >>>> - JVM
> > > >>>>   - Vendor?
> > > >>>>   - OS package or direct from Vendor?
> > > >>>>
> > > >>>
> > > >>> jdk-8u221-linux-x64.tar.gz downloaded from the Oracle web site.
> > > >
> > > > OK. That is post Java 8u202 so it should be a paid-for,
> > > > commercially supported version of Java 8.
> > > >
> > > > The latest Java 8 release from Oracle is 8u271.
> > > >
> > > > The latest Java 8 release from AdoptOpenJDK is 8u272.
> > > >
> > > > I don't think we are quite at this point yet but what is your view
> > > > on updating to the latest Java 8 JDK (from either Oracle or
> > > > AdoptOpenJDK)?
> > > >
> > > >>>> - Tomcat
> > > >>>>   - OS package, 3rd-party package or direct from ASF?
> > > >>>>
> > > >>>
> > > >>> tomcat.noarch  7.0.76-6.el7 from CentOS base repository
> > > >>>
> > > >>
> > > >> Drat, slight correction. I now recall that although we initially
> > > >> installed 7.0.76
> > > from the CentOS repo, the application vendor made us lower the
> > > version to 7.0.72, and I DO NOT know where we got that. However, it
> > > has not changed since October-ish of 2018.
> > > >
> > > > I've reviewed the 7.0.72 to 7.0.76 changelog and I don't see any
> > > > relevant changes.
> > > >
> > > >>>> - Config
> > > >>>>   - Any changes at all around the time the problems started? I'm
> > > >>>>     thinking OS updates, VM restarted etc?
> > > >>>>
> > > >>>
> > > >>> server.xml has not changed since 4/20/20, which was well before
> > > >>> the problem manifested, and all the other files in the conf
> > > >>> folder are even older than that. We're seeing this symptom on
> > > >>> both production servers. One of them was rebooted a week ago,
> > > >>> but the other has been up continuously for
> > > >>> 258 days.
> > > >
> > > > OK. That rules a few things out which is good but it does make the
> > > > trigger for this issue even more mysterious.
> > > >
> > > > Any changes in the Nginx configuration in the relevant timescale?
> > > >
> >
> > The last change to the nginx config files was on 8/21. The first
> > report of problems from the users in question was on 9/16. There is
> > another set of users on a different tomcat instance who reported
> > issues around 8/26, 5 days after the nginx config change. It seems
> > unlikely to be related. Also, I can't imagine what nginx could be
> > sending that would induce the upstream tomcat to behave this way.
> >
> > > > Any updates to the application in the relevant timescale?
> > > >
> >
> > Their application was patched to a newer version on 6/5.
> >
> > > > Any features users started using that hadn't been used before in
> > > > that timescale?
> >
> > That one I couldn't answer, as we are only the hosting facility and we
> > are not in the loop when it comes to the users' workflow, but it seems
> > unlikely given the nature of their business.
> >
> > > >
> > > > <snip/>
> > > >
> > > >>>> Recommendations:
> > > >>>> - Switch back to the BIO connector if you haven't already. It has fewer
> > > >>>>   moving parts than NIO so it is simpler to debug.
> > > >>>> - Add "%b" to the access log pattern for Tomcat's access log valve to
> > > >>>>   record the number of body (excluding headers) bytes Tomcat believes
> > > >>>>   it has written to the response.
> > > >>>>
> > > >>>>
> > > >>>> Next steps:
> > > >>>> - Wait for the issue to re-occur after the recommended changes above
> > > >>>>   and, depending on what is recorded in the access log for %b for a
> > > >>>>   failed request, shift the focus accordingly.
> > > >>>> - Answers to the additional questions would be nice but the access log
> > > >>>>   %b value for a failed request is the key piece of information required
> > > >>>>   at this point.
> > > >>>>
> > > >>>
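For reference, on Tomcat 7 the switch back to BIO is just the protocol attribute on the Connector in server.xml, along these lines (the port, timeout and thread values here are placeholders, not our actual settings):

    <!-- Blocking (BIO) HTTP/1.1 connector; the NIO equivalent would be
         protocol="org.apache.coyote.http11.Http11NioProtocol" -->
    <Connector port="3016"
               protocol="org.apache.coyote.http11.Http11Protocol"
               connectionTimeout="20000"
               maxThreads="150" />

The %b addition shows up in the access log valve pattern quoted just below.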
> > > >>> Good news! I enabled that parameter a few days ago and we have
> > > >>> already caught some instances of the problem occurring.
> > > >
> > > > Excellent!
> > > >
> > > >>> Here is the logging format...
> > > >>>
> > > >>> <Valve className="org.apache.catalina.valves.AccessLogValve"
> > > >>>        directory="logs" prefix="localhost_access." suffix=".log"
> > > >>>        pattern="%h %l %D %u %t %{JSESSIONID}c %{cookie}i %r %s %b %S %q" />
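For anyone reading the raw log lines below, the fields in that pattern are the standard AccessLogValve codes, roughly:

    %h  remote host (IP)          %l  remote logical username (usually "-")
    %D  time taken, in ms         %u  remote authenticated user (usually "-")
    %t  request timestamp         %{JSESSIONID}c  JSESSIONID cookie value
    %{cookie}i  Cookie header     %r  first line of the request
    %s  HTTP status               %b  body bytes written (excluding headers)
    %S  session ID                %q  query string

So in the entry further down, the "144" after the first "-" is the processing time in milliseconds, and the "40423" after the "200" is the number of body bytes Tomcat believes it wrote.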
> > > >>>
> > > >>> Due to some sensitive content in the HTTP requests below, I have
> > > >>> globally replaced certain words and addresses with random-ish
> > > >>> strings, but I don't think I've done anything to impact the issue.
> > > >>>
> > > >>> Following is an example from Wednesday.
> > > >>>
> > > >>> This is a request being sent from the nginx proxy to the first of
> > > >>> 2 upstream servers, 10.51.14.46
> > > >>>
> > > >>> 2020/10/21 15:51:22 [error] 39268#39268: *842342531 upstream
> > > >>> prematurely closed connection while reading response header from
> > > >>> upstream, client: 99.88.77.66, server: redacted.domain.com, request:
> > > >>> "GET /sandhut/jsp/catalog/xml/getWidgets.jsp?eDate=2020-10-21&statueId=64438&facilityId=0&DeptId=0&CompressXML=Yes&sessionDID=0&GzUserId=75064&gizusrlogtblid=322095&gizappprocessid=8568&rnd2=0.5650846&timestamp=21102020155122.472656 HTTP/1.1",
> > > >>> upstream: "http://10.51.14.46:3016/sandhut/jsp/catalog/xml/getWidgets.jsp?eDate=2020-10-21&statueId=64438&facilityId=0&DeptId=0&CompressXML=Yes&sessionDID=0&GzUserId=75064&gizusrlogtblid=322095&gizappprocessid=8568&rnd2=0.5650846&timestamp=21102020155122.472656",
> > > >>> host: "redacted.domain.com"
> > > >>>
> > > >>> Here is the matching localhost_access log entry from that server...
> > > >>>
> > > >>> 10.51.14.133 - 144 - [21/Oct/2020:15:51:22 -0400]
> > > >>> F405E25E49E3DCB81A36A87DED1FE573
> > > >>> JSESSIONID=F405E25E49E3DCB81A36A87DED1FE573; srv_id=dea8d61a7d725e980a6093cb78d8ec73;
> > > >>> JSESSIONID=F405E25E49E3DCB81A36A87DED1FE573; srv_id=dea8d61a7d725e980a6093cb78d8ec73
> > > >>> GET /sandhut/jsp/catalog/xml/getWidgets.jsp?eDate=2020-10-21&statueId=64438&facilityId=0&DeptId=0&CompressXML=Yes&sessionDID=0&GzUserId=75064&gizusrlogtblid=322095&gizappprocessid=8568&rnd2=0.5650846&timestamp=21102020155122.472656 HTTP/1.0
> > > >>> 200 40423 F405E25E49E3DCB81A36A87DED1FE573
> > > >>> ?eDate=2020-10-21&statueId=64438&facilityId=0&DeptId=0&CompressXML=Yes&sessionDID=0&GzUserId=75064&gizusrlogtblid=322095&gizappprocessid=8568&rnd2=0.5650846&timestamp=21102020155122.472656
> > > >>>
> > > >>> Tomcat appears to think it sent 40423 bytes. However, even
> > > >>> though it shows an HTTP 200 response, WireShark shows the
> > > >>> response did not actually go out on the wire.
> > > >
> > > > That is very interesting. That narrows down where things might be
> > > > going wrong.
> > > >
> > > > It confirms that the application is generating the response so we
> > > > can rule out a bug in that process.
> > > >
> > > > By default there is some buffering here. The buffer size is 9000 bytes.
> > > > So with a response of 40423 I'd expect at least some of those
> > > > bytes to be written to network. However, Wireshark is reporting no
> > > > bytes
> > written.
> > > >
> > > >
> > > > At the moment, I see the following options:
> > > >
> > > > 1. Tomcat bug causing the writes to be made to the wrong response.
> > > >
> > > > 2. Application bug (retaining references to request/response/writer
> > > >    etc.) causing the writes to be made to the wrong response.
> > > >
> > > > 3. JVM bug causing network writes to be dropped.
> > > >
> > > > 4. OS bug causing network writes to be dropped before they reach the
> > > >    point where they are observable by Wireshark.
> > > >
> > > > To be frank, none of those options seem particularly likely.
> > > >
> > > > At this point I'd echo Konstantin's recommendation to add the
> > > > following system property:
> > > > org.apache.catalina.connector.RECYCLE_FACADES=true
> > > >
> > > > You'd normally do this in $CATALINA_HOME/bin/setenv.sh (creating
> > > > that file if necessary) with a line like:
> > > >
> > > > CATALINA_OPTS="$CATALINA_OPTS
> > > > -Dorg.apache.catalina.connector.RECYCLE_FACADES=true"
> > > >
> > > > You can confirm that the setting has been applied by looking in
> > > > the log for the start-up. You should see something like:
> > > >
> > > > Oct 26, 2020 10:18:45 AM
> > > > org.apache.catalina.startup.VersionLoggerListener log
> > > > INFO: Command line argument:
> > > > -Dorg.apache.catalina.connector.RECYCLE_FACADES=true
> > > >
> > > >
> > > > That option reduces the re-use of request, response and related
> > > > objects between requests and, if an application is retaining
> > > > references it shouldn't, you usually see a bunch of
> > > > NullPointerExceptions in the logs when the application tries to
> > > > re-use those
> > > objects.
> > > >
> > > > Meanwhile, I'm going to work on a custom patch for 7.0.72 to add
> > > > some additional logging around the network writes.
> > >
> > > Patch files and instructions for use:
> > >
> > > http://home.apache.org/~markt/dev/v7.0.72-custom-patch-v1/
> > >
> > > Mark
> >
> > Hi Mark,
> >
> > A couple of questions.
> >
> > 1. Now that you have provided this patch, should I still enable
> > RECYCLE_FACADES=true?
> >
> > 2. The servers in question are multi-tenanted. There are 17 instances
> > of tomcat, each running on a different set of ports and out of a
> > separate directory tree, but they are all started with essentially the
> > same init script, which exports certain site-specific path variables
> > and executes tomcat from the specified folder structure. Can you think
> > of any potential issues where making this change for one instance
> > could have a negative effect on any of the other instances? Probably
> > not, but just being careful. I will have these changes implemented
> > during our nightly maintenance window and will begin to gather relevant logs
> > first thing tomorrow!
> >
> > --Eric
>
> Mark, the changes have been made per your instructions and tomcat has
> been restarted. The debug0.log and debug0.log.lck files were created in the
> directory, but they currently both have 0 bytes.
>
> --Eric
>

Also, RECYCLE_FACADES has been enabled and I confirmed that it is referenced in the logs as follows...

INFO: Command line argument: -Dorg.apache.catalina.connector.RECYCLE_FACADES=true


