[Bug 63859] New: AJP cping/cpong mode failing on Tomcat 9.x

classic Classic list List threaded Threaded
18 messages Options
Reply | Threaded
Open this post in threaded view
|

[Bug 63859] New: AJP cping/cpong mode failing on Tomcat 9.x

Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=63859

            Bug ID: 63859
           Summary: AJP cping/cpong mode failing on Tomcat 9.x
           Product: Tomcat 9
           Version: 9.0.26
          Hardware: PC
            Status: NEW
          Severity: regression
          Priority: P2
         Component: Connectors
          Assignee: [hidden email]
          Reporter: [hidden email]
  Target Milestone: -----

Hello,

We have a common infratructure for all our Tomcat instances (7,8&9) based on
httpd + mod_jk / ajp.

We implemented a while ago cping/pong om httpd with mode A, and it works
perfectly with Tomcat 7&8.

[info] ajp_handle_cping_cpong::jk_ajp_common.c (976): (tomcat9) timeout in
reply cpong after 10000 ms. Socket = 31 (event=110)
[info] ajp_maintain::jk_ajp_common.c (3464): (tomcat9) failed sending request,
socket -1 keepalive cping/cpong failure (errno=110)
Leading progressively to
[info] ajp_service::jk_ajp_common.c (2775): (tomcat9) sending request to tomcat
failed (recoverable), because of error during request sending (attempt=2)
[error] ajp_service::jk_ajp_common.c (2796): (tomcat9) connecting to tomcat
failed (rc=-3, errors=1, client_errors=0).
[info] service::jk_lb_worker.c (1601): service failed, worker tomcat9 is in
error state

We use the latest mod_jk (mod_jk/1.2.46)

I tried using Tomcat native protocol (which we never used so far), same error.

I tried to look into tomcat 9 logs, nothing shows, and Tomcat is still up and
running (Inspecction via JMX shows everything ok, Probe console and manager
console shows no limit...).
I even tried to run a massload using Jmeter on AJP port while the error was
showing in httpd logs : no error, all the GET ends in success. But httpd still
sees a cping/cpong error.

In the end I may have fixed the issue by changing the mod_jk parameter and
disabling the "cping cpong" mode :
worker.tomcat9.ping_mode=C
instead of previously worker.tomcat9.ping_mode=A
(which does the cping/cpong on a regular basis)

=> Since I changed this I don't have any errors in httpd.

So I suspect a regression in Tomcat 9 somewhere with AJP protocol, even though
I couldn't find any error in Tomcat logs ...

Not sure if I can help more, just wanted to highlight this issue but if you
need more detail or debug log don't hesitate.

The settings of my connectors (7 8 & 9 ) are the same :
<Connector port="xxxx" protocol="AJP/1.3" maxThreads="400"
connectionTimeout="600000"/>

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 63859] AJP cping/cpong mode failing on Tomcat 9.x

Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=63859

Aurelien Pernoud <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 OS|                            |All

--- Comment #1 from Aurelien Pernoud <[hidden email]> ---
I missed a sentence...

We implemented a while ago cping/pong om httpd with mode A, and it works
perfectly with Tomcat 7&8.
+"But with Tomcat 9 we inconsistently have those errors of cping/cpong in logs"

And then the logs...

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 63859] AJP cping/cpong mode failing on Tomcat 9.x

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=63859

Aurelien Pernoud <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[hidden email]

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 63859] AJP cping/cpong mode failing on Tomcat 9.x

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=63859

--- Comment #2 from Rainer Jung <[hidden email]> ---
Any info on usage characteristics during times this happens? High load (how
many requests per second), with or without load balancing, workers.properties
config etc.?

How frequently does it happen (always, sporadicaly for NNN % of requests, in
spikes, ...)?

Can you easily reproduce?

Do you have root privileges, so could you sniff network traffic to the AJP
port? Are Apache and Tomcat on diferent machines? Any active components
(Firewalls, routers) in between?

Thanks and regards,

Rainer

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 63859] AJP cping/cpong mode failing on Tomcat 9.x

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=63859

--- Comment #3 from Aurelien Pernoud <[hidden email]> ---
Hi Rainer,

Any info on usage characteristics during times this happens? High load (how
many requests per second), with or without load balancing, workers.properties
config etc.?

=> There is no "rule", it even happen with only a few users on my test
instance.
Workers.properties is the same than for my tomcat 7&8 instances, e.g. I use a
servertemplate which I apply to node and a lbtemplate too.

Here is an extract :
#Create one common template for all workers nodes
worker.servertemplate.type=ajp13
# factor of each worker is the same
worker.servertemplate.lbfactor=1
# ping_mode A is the most complete
worker.servertemplate.ping_mode=C
# socket_timeout in seconds
worker.servertemplate.socket_timeout=30
# connection_pool_timeout in seconds
worker.servertemplate.connection_pool_timeout=600
# reply_timeout in milliseconds : 10 min
worker.servertemplate.reply_timeout=600000
# recovery_options : 3 : don't retry on error after request was sent
worker.servertemplate.recovery_options=3

# Create one common template for all LB nodes
worker.balancertemplate.type=lb
# If we have 10 replies timing out in the same minute => worker goes in error
state
worker.balancertemplate.max_reply_timeouts=10

Then based on this I apply those settings to nodes / clusters.

How frequently does it happen (always, sporadicaly for NNN % of requests, in
spikes, ...)?

=> I couldn't find any "rule" sorry.... but it happened at least 10 times / day
and since I changed the cping cpong to C it doesn't show anymore in the logs
even though I have activity

Can you easily reproduce?

=> Yes, I have a test environment which is in use and if I push back
cping/cpong mode to A I'm sure the error will show up again.

Do you have root privileges, so could you sniff network traffic to the AJP
port? Are Apache and Tomcat on diferent machines? Any active components
(Firewalls, routers) in between?

=> I'm not root unfortunately, but my config is that I run 2 linux servers,
which both hosts httpd and tomcat instances (both servers runs same versions of
httpd, tomcat), and it works fine with the ajp port of tomcat 7&8.
.
There is no firewall between them (hard or soft), and the error occurs even
when the httpd is connecting to "himself" (even though I don't use "localhost"
but the hostname)

I've ran this setup for more than a year on Tomcat 7&8 (even with upgrades),
and only met the issue with 9. It failes with 9.0.20 so I tried to upgrade last
week to 9.0.26 but still here.
Since yesterday and switching the ping_mode to "C" it stopped failing, so for
now I will go like that in production but might be good to investigate.

Let me know if I can be of any help.

FYI I tried :
- putting mod_jk in debug => way to verbose (I run 9 instances of tomcat in the
end on the same server), but at the beginning the behaviour looks correct (I
saw the ping pong working with my tomcat 9 instances every minute)
- adding debug in tomcat "AJP" : couldn't find exactly what I could put in
debug... if I put the full org.apache logger in DEBUG it's nightmare :)

Let me know if I can help, and thanks for replying so quickly

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 63859] AJP cping/cpong mode failing on Tomcat 9.x

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=63859

--- Comment #4 from Aurelien Pernoud <[hidden email]> ---
Also I even checked while the error was showing the "total connections" from my
httpd to tomcat 9 ajp port (netstat) : it was less than 10.
And when I stress test the ajp port, it worked. I'm lost :D

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 63859] AJP cping/cpong mode failing on Tomcat 9.x

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=63859

--- Comment #5 from Christopher Schultz <[hidden email]> ---
(In reply to Aurelien Pernoud from comment #3)
> Tomcat 7&8

Is this Tomcat 8.0.x or Tomcat 8.5.x?

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 63859] AJP cping/cpong mode failing on Tomcat 9.x

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=63859

--- Comment #6 from Aurelien Pernoud <[hidden email]> ---
Tomcat 8.0.x

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 63859] AJP cping/cpong mode failing on Tomcat 9.x

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=63859

Mark Thomas <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO

--- Comment #7 from Mark Thomas <[hidden email]> ---
I know it is verbose but what we really need is the mod_jk debug log from when
this error occurs. Based on the information provided, that looks like the best
option for further investigation.

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 63859] AJP cping/cpong mode failing on Tomcat 9.x

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=63859

--- Comment #8 from Aurelien Pernoud <[hidden email]> ---
Hi Mark,

I have the debug but can't attach it due to limit of 1MB :-(

I managed to reproduce the issue, I have 2 httpd with the exact same
configuration, pointing to same tomcat.

In the logs of "http node 1" I see for ex :

[Mon Nov 04 09:15:03 2019] [13694:140683904751360] [info]
ajp_handle_cping_cpong::jk_ajp_common.c (976): (tom_tst01_srv09) timeout in
reply cpong after 10000 ms. Socket = 25 (event=110)
[Mon Nov 04 09:15:13 2019] [13694:140683904751360] [info]
ajp_handle_cping_cpong::jk_ajp_common.c (976): (tom_tst01_srv09) timeout in
reply cpong after 10000 ms. Socket = 14 (event=110)
[Mon Nov 04 09:15:24 2019] [13694:140683904751360] [info]
ajp_handle_cping_cpong::jk_ajp_common.c (976): (tom_tst01_srv09) timeout in
reply cpong after 10000 ms. Socket = 14 (event=110)

leading progressively to :

[Mon Nov 04 09:15:24 2019] [13694:140683904751360] [error]
ajp_send_request::jk_ajp_common.c (1725): (tom_tst01_srv09) connecting to
backend failed. Tomcat is probably not started or is listening on the wrong
port (errno=110)

(Here I only searched for the error)

In the logs of "http node 2", at the same timing, the same server replies
correctly to ping (at least that's how I interpret the below lines) :

[Mon Nov 04 09:14:52 2019] [25023:139691817195264] [debug]
ajp_maintain::jk_ajp_common.c (3441): (tom_tst01_srv09) reached pool min size
13 from 25 cache slots
[Mon Nov 04 09:14:52 2019] [25023:139691817195264] [debug]
ajp_maintain::jk_ajp_common.c (3493): (tom_tst01_srv09) pinged 1 and recycled 0
sockets in 0 seconds from 25 pool slots
[Mon Nov 04 09:15:32 2019] [25022:139691817195264] [debug]
ajp_maintain::jk_ajp_common.c (3441): (tom_tst01_srv09) reached pool min size
13 from 25 cache slots
[Mon Nov 04 09:15:32 2019] [25022:139691817195264] [debug]
ajp_maintain::jk_ajp_common.c (3493): (tom_tst01_srv09) pinged 1 and recycled 0
sockets in 0 seconds from 25 pool slots
[Mon Nov 04 09:15:32 2019] [25024:139691817195264] [debug]
ajp_maintain::jk_ajp_common.c (3441): (tom_tst01_srv09) reached pool min size
13 from 25 cache slots
[Mon Nov 04 09:15:32 2019] [25024:139691817195264] [debug]
ajp_maintain::jk_ajp_common.c (3493): (tom_tst01_srv09) pinged 2 and recycled 0
sockets in 0 seconds from 25 pool slots
[Mon Nov 04 09:15:50 2019] [26242:139691817195264] [debug]
ajp_maintain::jk_ajp_common.c (3441): (tom_tst01_srv09) reached pool min size
13 from 25 cache slots
[Mon Nov 04 09:15:50 2019] [26242:139691817195264] [debug]
ajp_maintain::jk_ajp_common.c (3493): (tom_tst01_srv09) pinged 2 and recycled 0
sockets in 0 seconds from 25 pool slots

Both server times are synecd of course.

I can send the logs, once zipped both nodes we are talking about a file of
30MB, let me know how ?

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 63859] AJP cping/cpong mode failing on Tomcat 9.x

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=63859

Aurelien Pernoud <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |NEW

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 63859] AJP cping/cpong mode failing on Tomcat 9.x

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=63859

--- Comment #9 from Aurelien Pernoud <[hidden email]> ---
I also collected tomcat logs even though I can't find anything in it, except
that "luckily" I also see for ex that our monitoring tool "pinged" (using HTTP
protocol) the server at  [04/Nov/2019:09:15:02 +0000] using "Probe PSI
quickcheck", and this call worked like a charm...
There was very limited activity too according to access_log.
I can also provide this if needed.

I don't know if I can provide any more details, let me know, for now I'm
switching back to mode C to avoid this error, I just double checked again my
production environment which has much more activity and I don't have this
"timeout in reply cpong after 10000 ms" showing at all.

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 63859] AJP cping/cpong mode failing on Tomcat 9.x

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=63859

--- Comment #10 from Mark Thomas <[hidden email]> ---
I'll send you a Google Drive link privately where you can upload the files.

Please upload the mod_jk debug logs, the Tomcat logs, the access log, the
mod_jk configuration files and the server.xml from the Tomcat instances (with
any passwords replaced by ***).

Thanks.

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 63859] AJP cping/cpong mode failing on Tomcat 9.x

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=63859

--- Comment #11 from Aurelien Pernoud <[hidden email]> ---
Thanks Mark, just uploaded everything.

in "logs" : mod_jk debug log from both httpd instances limited to the period
where the error occured
tom_tst01_srv09 is the full tomcat log

in "conf" : server.xml anonymized, I also provide the setenv as I use
environment variables as you will see
workers.properties also provided, except of course that here I switched it back
to "C" mode, the error showed only with "A" mode.

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 63859] AJP cping/cpong mode failing on Tomcat 9.x

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=63859

--- Comment #12 from Mark Thomas <[hidden email]> ---
Thanks. I can't see anything obviously wrong in those files. The CPING is sent
and the CPONG is never seen.

If any Tomcat committer wants access to the logs, let me know and I'll send you
a link.

Debug logging on the Tomcat side might be useful. Adding the following to
logging.properties:
org.apache.coyote.ajp.level = FINE
org.apache.tomcat.util.net.level = FINE

If that generates too much data, try starting with just the first of the two
lines.

Meanwhile, I am going to try and reproduce this.

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 63859] AJP cping/cpong mode failing on Tomcat 9.x

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=63859

--- Comment #13 from Aurelien Pernoud <[hidden email]> ---
Thanks Mark,

appreciate all the time spent, I also agree this looks a complex one.

Reading mod_jk logs it seems the cping / cpong is not coming back but as you
also see the second httpd pointing to the same tomcat gets replies...

So to me the issue is definitely in the cping cpong (mode A), but why does it
fail sometimes, no idea ... Also as you can see in the logs the only server
failing is the one running tomcat 9, all the others works fine, so to me the
bug is on tomcat too.

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 63859] AJP cping/cpong mode failing on Tomcat 9.x

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=63859

Mark Thomas <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO

--- Comment #14 from Mark Thomas <[hidden email]> ---
I am unable to reproduce this.

CPing/CPong mode "A" is equivalent to mode "CPI". Clearly, mode "C" is not an
issue but it would be useful to know if it is enabling P or I that triggers the
issue. You can use mode "CP" and "CI" to test this.

I've been looking through the mod_jk logs in more detail and it is a P test
that fails and takes Tomcat off-line but I see "I" test failures as well.

It is interesting that - in the logs we have - it is only srv09 that is seeing
these errors. Is there something about those two instances that is unique?

I have also been looking at the Tomcat code. I do see something that isn't
quite right (after a CPing/CPong Tomcat always does a blocking read) but I
don't see how that could be triggering the error you see. I'll fix this anyway
for 9.0.x and 8.5.x.

The debug logging (as previously mentioned for Tomcat) may help. If not, we'll
need to ad more logging on the Tomcat side to try and get to the bottom of
this.

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 63859] AJP cping/cpong mode failing on Tomcat 9.x

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=63859

--- Comment #15 from Aurelien Pernoud <[hidden email]> ---
Hi Mark,

thanks for all the testing.

"It is interesting that - in the logs we have - it is only srv09 that is seeing
these errors. Is there something about those two instances that is unique?"

=> Yes, it's the only instance runnning Tomcat 9.0.x. Others (srv01->srv08) are
running Tomcat 7.0.x and 8.0.x.
I have no 8.5.x instances, so if the "bug" you found is due to a change in
Tomca > 8.5.x it might be it :)

Anyway, I'll make tests with : CI / CP only (see which one triggers the rror)
and also activate debug on Tomcat 9 as required and will try to provide
everything back when possible.

Appreciate your help !

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]