NGINX + tomcat 8.0.35 (110: Connection timed out)

Ayub Khan
Under a high load of 16k requests per minute, we see the error below in the log.

 [error] 2437#2437: *13335389 upstream timed out (110: Connection timed
out) while reading response header from upstream,  server: jahez.net,
request: "GET /serviceContext/ServiceName?callback= HTTP/1.1", upstream: "
http://127.0.0.1:8080/serviceContext/ServiceName

Below is the flow of requests:

cloudflare-->AWS ALB--> NGINX--> Tomcat-->Elastic-search

In NGINX we have the below config

location /serviceContext/ServiceName {

    proxy_pass          http://localhost:8080/serviceContext/ServiceName;
    proxy_http_version  1.1;
    proxy_set_header    Connection          $connection_upgrade;
    proxy_set_header    Upgrade             $http_upgrade;
    proxy_set_header    Host                $host;
    proxy_set_header    X-Real-IP           $remote_addr;
    proxy_set_header    X-Forwarded-For     $proxy_add_x_forwarded_for;

    proxy_buffers       16 16k;
    proxy_buffer_size   32k;
}

Below is the Tomcat connector config:

<Connector port="8080"
               protocol="org.apache.coyote.http11.Http11NioProtocol"
               connectionTimeout="200" maxThreads="50000"
               URIEncoding="UTF-8"
               redirectPort="8443" />


We monitor the open files using

watch "sudo ls /proc/`cat /var/run/tomcat8.pid`/fd/ | wc -l"

The number of open files for the Tomcat process keeps increasing, which
slows the responses. The only way to recover from this is to restart Tomcat.
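
The open-file check described above can be wrapped in a small shell function so it is easy to reuse in scripts. This is only a sketch, assuming a Linux /proc filesystem; the name count_fds is made up for illustration and is not part of Tomcat or nginx.

```sh
# Count the open file descriptors of one process via /proc/<pid>/fd.
# count_fds is a hypothetical helper name.
count_fds() {
    # Each entry in /proc/<pid>/fd is one open descriptor.
    # Prints 0 if the directory cannot be read (wrong pid, no permission).
    ls "/proc/$1/fd" 2>/dev/null | wc -l
}

# Example: descriptors held by the current shell.
count_fds "$$"
```

For the Tomcat process in this thread the call would be count_fds "$(cat /var/run/tomcat8.pid)", run as a user that is allowed to read that process (root, or the Tomcat user).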

Re: NGINX + tomcat 8.0.35 (110: Connection timed out)

Christopher Schultz-2
Ayub,

On 10/28/20 23:28, Ayub Khan wrote:

> During high load of 16k requests per minute, we notice below error in log.
>
>   [error] 2437#2437: *13335389 upstream timed out (110: Connection timed
> out) while reading response header from upstream,  server: jahez.net,
> request: "GET /serviceContext/ServiceName?callback= HTTP/1.1", upstream: "
> http://127.0.0.1:8080/serviceContext/ServiceName
>
> Below is the flow of requests:
>
> cloudflare-->AWS ALB--> NGINX--> Tomcat-->Elastic-search

I'm curious about why you are using all of cloudflare and ALB and nginx.
Seems like any one of those could provide what you are getting from all
3 of them.

> In NGINX we have the below config
>
> location /serviceContext/ServiceName{
>
>      proxy_pass          http://localhost:8080/serviceContext/ServiceName;
>     proxy_http_version  1.1;
>      proxy_set_header    Connection          $connection_upgrade;
>      proxy_set_header    Upgrade             $http_upgrade;
>      proxy_set_header    Host                      $host;
>      proxy_set_header    X-Real-IP              $remote_addr;
>      proxy_set_header    X-Forwarded-For     $proxy_add_x_forwarded_for;
>
>
>          proxy_buffers 16 16k;
>          proxy_buffer_size 32k;
> }

What is the maximum number of simultaneous requests that one nginx
instance will accept? What is the maximum number of simultaneous proxied
requests one nginx instance will make to a back-end Tomcat node? How
many nginx nodes do you have? How many Tomcat nodes?

> below is tomcat connector config
>
> <Connector port="8080"
>                 protocol="org.apache.coyote.http11.Http11NioProtocol"
>                 connectionTimeout="200" maxThreads="50000"
>                 URIEncoding="UTF-8"
>                 redirectPort="8443" />

50,000 threads is a LOT of threads.

> We monitor the open file using *watch "sudo ls /proc/`cat
> /var/run/tomcat8.pid`/fd/ | wc -l" *the number of tomcat open files keeps
> increasing slowing the responses. the only option to recover from this is
> to restart tomcat.

So this looks like Linux (/proc filesystem). Linux caps the pid space:
the default limit is 32768, although it can be raised (up to 2^22 on
64-bit kernels). The limit actually in effect is to be found here:

$ cat /proc/sys/kernel/pid_max
32768

(on my Debian Linux system, 4.9.0-era kernel)

Each thread takes a pid. 50k threads means more than that default
maximum. So unless the limit has been raised, you will eventually hit
some kind of serious problem with that many threads.
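
To compare the two numbers on a given machine, something like the following could be used. It is a sketch that assumes a Linux /proc filesystem; pid_limit and count_threads are hypothetical helper names.

```sh
# Both helper names below are hypothetical.

# Kernel-wide pid limit; thread ids are allocated from the same space.
pid_limit() {
    cat /proc/sys/kernel/pid_max
}

# Number of threads in a process: one directory per thread under task/.
count_threads() {
    ls "/proc/$1/task" 2>/dev/null | wc -l
}

echo "pid_max: $(pid_limit)"
echo "threads in this shell: $(count_threads "$$")"
```

Passing the Tomcat pid instead of "$$" shows how close the JVM's thread count is to the limit.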

How many fds do you get in the process before Tomcat grinds to a halt?
What does the CPU usage look like? The process I/O? Disk usage? What
does a thread dump look like (if you have the disk space to dump it!)?

Why do you need that many threads?

-chris

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: NGINX + tomcat 8.0.35 (110: Connection timed out)

Ayub Khan
Chris,



*I'm curious about why you are using all of cloudflare and ALB and
nginx. Seems like any one of those could provide what you are getting from
all 3 of them.*

Cloudflare is doing just the DNS and nginx is doing SSL termination.

*What is the maximum number of simultaneous requests that one nginx instance
will accept? What is the maximum number of simultaneous proxied requests one
nginx instance will make to a back-end Tomcat node? How many nginx nodes do
you have? How many Tomcat nodes?*

We have 4 VMs, each running both nginx and Tomcat, with nginx in front of
Tomcat to proxy the requests. So it's one nginx proxying to a dedicated
Tomcat on the same VM.

Below is the Tomcat connector configuration:

<Connector port="8080"
               connectionTimeout="60000" maxThreads="2000"
               protocol="org.apache.coyote.http11.Http11NioProtocol"
               URIEncoding="UTF-8"
               redirectPort="8443" />

When I run a load test with 2000 concurrent users, I see the open files
increase to 10,320, and when I take a thread dump I see the threads are in a
waiting state. Slowly, as the requests complete, the open files come back
down to normal levels.
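
To put numbers on "the threads are in a waiting state", the thread dump can be summarized by state. The sketch below assumes a HotSpot-style dump (for example from jstack) in which every thread has a "java.lang.Thread.State:" line; summarize_thread_states is a hypothetical helper name and the three sample lines are made up.

```sh
# Count threads per state in a JVM thread dump read from stdin.
# summarize_thread_states is a hypothetical helper name.
summarize_thread_states() {
    grep -o 'java.lang.Thread.State: [A-Z_]*' |
        awk '{print $2}' |
        sort | uniq -c | sort -rn
}

# Made-up sample input, only to show the shape of the output.
summarize_thread_states <<'EOF'
   java.lang.Thread.State: WAITING (parking)
   java.lang.Thread.State: RUNNABLE
   java.lang.Thread.State: WAITING (on object monitor)
EOF
```

With a real dump saved to a file, the call would be summarize_thread_states < dump.txt.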

The output of the below command is
sudo cat /proc/sys/kernel/pid_max
131072

I am testing this on a c4.8xlarge VM in AWS.

below is the config I changed in nginx.conf file

events {
        worker_connections 50000;
        # multi_accept on;
}

worker_rlimit_nofile 30000;

What would be the ideal config for Tomcat and nginx so that this setup on a
c4.8xlarge VM could serve at least 5k or 10k requests simultaneously
without causing the open files to spike to 10k?




--
--------------------------------------------------------------------
Sun Certified Enterprise Architect 1.5
Sun Certified Java Programmer 1.4
Microsoft Certified Systems Engineer 2000
http://in.linkedin.com/pub/ayub-khan/a/811/b81
mobile:+966-502674604
----------------------------------------------------------------------
It is proved that Hard Work and knowledge will get you close but attitude
will get you there. However, it's the Love
of God that will put you over the top!!

Re: NGINX + tomcat 8.0.35 (110: Connection timed out)

Christopher Schultz-2
Ayub,

On 11/3/20 10:56, Ayub Khan wrote:
> *I'm curious about why you are using all of cloudflare and ALB and
> nginx. Seems like any one of those could provide what you are getting from
> all 3 of them.*
>
> Cloudflare is doing just the DNS and nginx is doing ssl termination

What do you mean by "Cloudflare is doing just the DNS"?

So what is ALB doing, then?

> *What is the maximum number of simultaneous requests that one nginx instance
> will accept? What is the maximum number of simultaneous proxied requests one
> nginx instance will make to a back-end Tomcat node? How many nginx nodes do
> you have? How many Tomcat nodes?*
>
> We have 4 vms each having nginx and tomcat running on them and each tomcat
> has nginx in front of them to proxy the requests. So it's one Nginx
> proxying to a dedicated tomcat on the same VM.

Okay.

> below is the tomcat connector configuration
>
> <Connector port="8080"
>                 connectionTimeout="60000" maxThreads="2000"
>                 protocol="org.apache.coyote.http11.Http11NioProtocol"
>                 URIEncoding="UTF-8"
>                 redirectPort="8443" />

60 seconds is a *long* time for a connection timeout.

Do you actually need 2000 threads? That's a lot, though not insane. 2000
threads means you expect to handle 2000 concurrent (non-async,
non-WebSocket) requests. Do you need that (per node)? Are you expecting
8000 concurrent requests? Does your load-balancer understand the
topology and current load on any given node?

> When I am doing a load test of 2000 concurrent users I see the open files
> increase to 10,320 and when I take thread dump I see the threads are in a
> waiting state. Slowly as the requests are completed I see the open files
> come down to normal levels.

Are you performing your load-test against the CF/ALB/nginx/Tomcat stack,
or just hitting Tomcat (or nginx) directly?

Are you using HTTP keepalive in your load-test (from the client to
whichever server is being contacted)?

> The output of the below command is
> sudo cat /proc/sys/kernel/pid_max
> 131072
>
> I am testing this on a c4.8xlarge VM in AWS.
>
> below is the config I changed in nginx.conf file
>
> events {
>          worker_connections 50000;
>          # multi_accept on;
> }

This will allow 50k incoming connections. Tomcat's NIO connector, in
turn, will accept connections up to its maxConnections setting (10,000
by default), which is far more than maxThreads. So limiting your threads
to 2000 only means that the work of those connections will be done in
groups of 2000.

> worker_rlimit_nofile 30000;

I'm not sure how many connections are handled by a single nginx worker.
If you accept 50k connections and only allow 30k file handles, you may
have a problem if that's all being done by a single worker.
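
According to the nginx documentation, worker_connections applies to each worker process and counts proxied (upstream) connections as well as client connections, and it cannot exceed the open-file limit that worker_rlimit_nofile sets. A rough sanity check of the two values could look like the sketch below; check_nginx_limits is a hypothetical helper and the numbers are the ones posted in this thread.

```sh
# usage: check_nginx_limits <worker_connections> <worker_rlimit_nofile>
# Hypothetical helper: warns when the fd limit cannot cover the
# configured number of connections for one worker process.
check_nginx_limits() {
    conns="$1"
    nofile="$2"
    if [ "$nofile" -ge "$conns" ]; then
        echo "ok: fd limit $nofile covers $conns connections per worker"
    else
        echo "warning: fd limit $nofile is below worker_connections $conns"
        return 1
    fi
}

# Values from the nginx.conf quoted above.
check_nginx_limits 50000 30000 || true
```

This only compares the two directives; it does not account for the other descriptors a worker needs (log files, listening sockets and so on).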

> What would be the ideal config for tomcat and Nginx so this setup on
> c4.8xlarge vm could serve at least 5k or 10k requests simultaneously
> without causing the open files to spike to 10K.

You will never be able to serve 10k simultaneous requests without having
10k open files on the server. If you mean 10k requests across the whole
4-node environment, then I'd expect 10k requests to open (roughly) 2500
open files on each server. And of course, you need all kinds of other
files open as well, from JAR files to DB connections or other network
connections.

But each connection needs a file descriptor, full stop. If you need to
handle 10k connections, then you will need to make it possible to open
10k file handles /just for incoming network connections/ for that
process. There is no way around it.
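
The arithmetic above, as a small sketch. It assumes the load is spread evenly across nodes and counts one descriptor per incoming connection only; fds_per_node is a hypothetical helper name.

```sh
# usage: fds_per_node <total_concurrent_connections> <nodes>
# Hypothetical helper: descriptors needed per node just for incoming
# connections, rounded up, assuming an even spread across nodes.
fds_per_node() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

fds_per_node 10000 4   # prints 2500
```

The result is a floor for the per-process limit, since JAR files, log files and outgoing connections come on top of it. The soft limit of the current shell can be seen with ulimit -n.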

Are you trying to hit a performance target or are you actively getting
errors with a particular configuration? Your subject says "Connection
Timed Out". Is it nginx that is reporting the connection timeout? Have
you checked on the Tomcat side what is happening with those requests?

-chris



Re: NGINX + tomcat 8.0.35 (110: Connection timed out)

Ayub Khan
Chris,

I was load testing against the EC2 load balancer DNS name. I have increased
the connector timeout to 6000 and also gave 32 GB to the Tomcat JVM. I am
no longer seeing connection timeouts in the nginx logs. There are no errors
in kernel.log, and I am not seeing any errors in Tomcat's catalina.out.

During regular operations, when the request count is between 4k and 6k
requests per minute, the open files count for the Tomcat process is between
200 and 350, and responses from Tomcat arrive within 5 seconds.
If the request count goes beyond 6.5k, the open files slowly climb to
2300-3000 and the responses from Tomcat become slow.

I am not concerned about the high open files count as such, since I do not
see any errors related to open files. The only side effect of open files
going above 700 is that the response from Tomcat is slow. I checked whether
this is caused by Elasticsearch; AWS CloudWatch shows Elasticsearch
responding within 5 milliseconds.

What might be the reason that, when the open files count goes beyond 600,
the response time of Tomcat slows down? I tried with Tomcat 9 and the
behavior is the same.
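
One way to relate requests per minute to open connections is Little's law: requests in flight are roughly the arrival rate multiplied by the average response time. The sketch below is only a back-of-the-envelope estimate. It treats the 5-second figure from this message as the average response time, which is an assumption (it was given as an upper bound), and in_flight is a hypothetical helper name.

```sh
# usage: in_flight <requests_per_minute> <avg_response_seconds>
# Hypothetical helper applying Little's law with integer arithmetic.
in_flight() {
    echo $(( $1 * $2 / 60 ))
}

in_flight 6500 5    # prints 541
in_flight 16000 5   # prints 1333
```

Because each in-flight request holds at least one connection, and so one descriptor, slower responses raise the open-file count by themselves. A rising count can therefore be a symptom of slow responses rather than their cause.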








Re: NGINX + tomcat 8.0.35 (110: Connection timed out)

mgrigorov
On Wed, Nov 11, 2020 at 11:17 PM Ayub Khan <[hidden email]> wrote:

> Chris,
>
> I was load testing using the ec2 load balancer dns. I have increased the
> connector timeout to 6000 and also gave 32gig to the JVM of tomcat. I am
> not seeing connection timeout in nginx logs now. No errors in kernel.log I
> am not seeing any errors in tomcat catalina.out.
> During regular operations when the request count is between 4 to 6k
> requests per minute the open files count for the tomcat process is between
> 200 to 350. Responses from tomcat are within 5 seconds.
> If the requests count goes beyond 6.5 k open files slowly move up  to 2300
> to 3000 and the request responses from tomcat become slow.
>
> I am not concerned about high open files as I do not see any errors related
> to open files. Only side effect of  open files going above 700 is the
> response from tomcat is slow. I checked if this is caused from elastic
> search, aws cloud watch shows elastic search response is within 5
> milliseconds.
>
> what might be the reason that when the open files goes beyond 600, it slows
> down the response time for tomcat. I tried with tomcat 9 and it's the same
> behavior
>

Do you know what kind of files are being opened?
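
One possible way to answer that on Linux is to group the descriptors of the process by what they point to. This is a sketch that assumes a /proc filesystem and the readlink utility; classify_fd_targets and fd_targets are hypothetical helper names.

```sh
# Reduce /proc fd link targets (one per line on stdin) to coarse types
# and count them. Helper names are hypothetical.
classify_fd_targets() {
    sed -e 's/^socket:.*/socket/' \
        -e 's/^pipe:.*/pipe/' \
        -e 's/^anon_inode:.*/anon_inode/' \
        -e 's|^/.*|path|' |
        sort | uniq -c | sort -rn
}

# Print the link target of every descriptor of a process.
fd_targets() {
    for fd in "/proc/$1/fd"/*; do
        readlink "$fd"
    done 2>/dev/null
}

# Example: breakdown for the current shell.
fd_targets "$$" | classify_fd_targets
```

Run against the Tomcat pid, a count dominated by "socket" would point at network connections (clients or Elasticsearch), while "path" entries are files, JARs and devices on disk.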


>
>
>
>
>
>
> On Tue, Nov 3, 2020 at 9:40 PM Christopher Schultz <
> [hidden email]> wrote:
>
> > Ayub,
> >
> > On 11/3/20 10:56, Ayub Khan wrote:
> > > *I'm curious about why you are using all of cloudflare and ALB and
> > > nginx.Seems like any one of those could provide what you are getting
> from
> > > all3 of them. *
> > >
> > > Cloudflare is doing just the DNS and nginx is doing ssl termination
> >
> > What do you mean "Cloudflare is doing just the DNS?"
> >
> > So what is ALB doing, then?
> >
> > > *What is the maximum number of simultaneous requests that one
> > nginxinstance
> > > will accept? What is the maximum number of simultaneous proxiedrequests
> > one
> > > nginx instance will make to a back-end Tomcat node? Howmany nginx nodes
> > do
> > > you have? How many Tomcat nodes?  *
> > >
> > > We have 4 vms each having nginx and tomcat running on them and each
> > tomcat
> > > has nginx in front of them to proxy the requests. So it's one Nginx
> > > proxying to a dedicated tomcat on the same VM.
> >
> > Okay.
> >
> > > below is the tomcat connector configuration
> > >
> > > <Connector port="8080"
> > >                 connectionTimeout="60000" maxThreads="2000"
> > >                 protocol="org.apache.coyote.http11.Http11NioProtocol"
> > >                 URIEncoding="UTF-8"
> > >                 redirectPort="8443" />
> >
> > 60 seconds is a *long* time for a connection timeout.
> >
> > Do you actually need 2000 threads? That's a lot, though not insane. 2000
> > threads means you expect to handle 2000 concurrent (non-async,
> > non-Wewbsocket) requests. Do you need that (per node)? Are you expecting
> > 8000 concurrent requests? Does your load-balancer understand the
> > topography and current-load on any given node?
> >
> > > When I am doing a load test of 2000 concurrent users I see the open
> files
> > > increase to 10,320 and when I take thread dump I see the threads are
> in a
> > > waiting state.Slowly as the requests are completed I see the open files
> > > come down to normal levels.
> >
> > Are you performing your load-test against the CF/ALB/nginx/Tomcat stack,
> > or just hitting Tomcat (or nginx) directly?
> >
> > Are you using HTTP keepalive in your load-test (from the client to
> > whichever server is being contacted)?
> >
> > > The output of the below command is
> > > sudo cat /proc/sys/kernel/pid_max
> > > 131072
> > >
> > > I am testing this on a c4.8xlarge VM in AWS.
> > >
> > > below is the config I changed in nginx.conf file
> > >
> > > events {
> > >          worker_connections 50000;
> > >          # multi_accept on;
> > > }
> >
> > This will allow 50k incoming connections, and Tomcat will accept an
> > unbounded number of connections (for NIO connector). So limiting your
> > threads to 2000 only means that the work of each request will be done in
> > groups of 2000.
> >
> > > worker_rlimit_nofile 30000;
> >
> > I'm not sure how many connections are handled by a single nginx worker.
> > If you accept 50k connections and only allow 30k file handles, you may
> > have a problem if that's all being done by a single worker.
> >
> > > What would be the ideal config for tomcat and Nginx so this setup on
> > > c4.8xlarge vm could serve at least 5k or 10k requests simultaneously
> > > without causing the open files to spike to 10K.
> >
> > You will never be able to serve 10k simultaneous requests without having
> > 10k open files on the server. If you mean 10k requests across the whole
> > 4-node environment, then I'd expect 10k requests to open (roughly) 2500
> > open files on each server. And of course, you need all kinds of other
> > files open as well, from JAR files to DB connections or other network
> > connections.
> >
> > But each connection needs a file descriptor, full stop. If you need to
> > handle 10k connections, then you will need to make it possible to open
> > 10k file handles /just for incoming network connections/ for that
> > process. There is no way around it.
> >
> > Are you trying to hit a performance target or are you actively getting
> > errors with a particular configuration? Your subject says "Connection
> > Timed Out". Is it nginx that is reporting the connection timeout? Have
> > you checked on the Tomcat side what is happening with those requests?
> >
> > -chris
> >
> > > On Thu, Oct 29, 2020 at 10:29 PM Christopher Schultz <
> > > [hidden email]> wrote:
> > >
> > >> Ayub,
> > >>
> > >> On 10/28/20 23:28, Ayub Khan wrote:
> > >>> During high load of 16k requests per minute, we notice below error in
> > >> log.
> > >>>
> > >>>    [error] 2437#2437: *13335389 upstream timed out (110: Connection
> > timed
> > >>> out) while reading response header from upstream,  server: jahez.net
> ,
> > >>> request: "GET /serviceContext/ServiceName?callback= HTTP/1.1",
> > upstream:
> > >> "
> > >>> http://127.0.0.1:8080/serviceContext/ServiceName
> > >>>
> > >>> Below is the flow of requests:
> > >>>
> > >>> cloudflare-->AWS ALB--> NGINX--> Tomcat-->Elastic-search
> > >>
> > >> I'm curious about why you are using all of cloudflare and ALB and
> nginx.
> > >> Seems like any one of those could provide what you are getting from
> all
> > >> 3 of them.
> > >>
> > >>> In NGINX we have the below config
> > >>>
> > >>> location /serviceContext/ServiceName{
> > >>>
> > >>>       proxy_pass
> > >> http://localhost:8080/serviceContext/ServiceName;
> > >>>      proxy_http_version  1.1;
> > >>>       proxy_set_header    Connection          $connection_upgrade;
> > >>>       proxy_set_header    Upgrade             $http_upgrade;
> > >>>       proxy_set_header    Host                      $host;
> > >>>       proxy_set_header    X-Real-IP              $remote_addr;
> > >>>       proxy_set_header    X-Forwarded-For
> >  $proxy_add_x_forwarded_for;
> > >>>
> > >>>
> > >>>           proxy_buffers 16 16k;
> > >>>           proxy_buffer_size 32k;
> > >>> }
> > >>
> > >> What is the maximum number of simultaneous requests that one nginx
> > >> instance will accept? What is the maximum number of simultaneous
> proxied
> > >> requests one nginx instance will make to a back-end Tomcat node? How
> > >> many nginx nodes do you have? How many Tomcat nodes?
> > >>
> > >>> below is tomcat connector config
> > >>>
> > >>> <Connector port="8080"
> > >>>
> protocol="org.apache.coyote.http11.Http11NioProtocol"
> > >>>                  connectionTimeout="200" maxThreads="50000"
> > >>>                  URIEncoding="UTF-8"
> > >>>                  redirectPort="8443" />
> > >>
> > >> 50,000 threads is a LOT of threads.
> > >>
> > >>> We monitor the open file using *watch "sudo ls /proc/`cat
> > >>> /var/run/tomcat8.pid`/fd/ | wc -l" *the number of tomcat open files
> > keeps
> > >>> increasing slowing the responses. the only option to recover from
> this
> > is
> > >>> to restart tomcat.
> > >>
> > >> So this looks like Linux (/proc filesystem). Linux kernels have a
> 16-bit
> > >> pid space which means a theoretical max pid of 65535. In practice, the
> > >> max pid is actually to be found here:
> > >>
> > >> $ cat /proc/sys/kernel/pid_max
> > >> 32768
> > >>
> > >> (on my Debian Linux system, 4.9.0-era kernel)
> > >>
> > >> Each thread takes a pid. 50k threads means more than the maximum
> allowed
> > >> on the OS. So you will eventually hit some kind of serious problem
> with
> > >> that many threads.
> > >>
> > >> How many fds do you get in the process before Tomcat grinds to a halt?
> > >> What does the CPU usage look like? The process I/O? Disk usage? What
> > >> does a thread dump look like (if you have the disk space to dump it!)?
> > >>
> > >> Why do you need that many threads?
> > >>
> > >> -chris
> > >>

Re: NGINX + tomcat 8.0.35 (110: Connection timed out)

Ayub Khan
Martin,

These are file descriptors. Some belong to the JAR files included in the
web application, some to the sockets from nginx to Tomcat, and some to
database connections. I use the command below to count the open file
descriptors:

watch "sudo ls /proc/`cat /var/run/tomcat8.pid`/fd/ | wc -l"
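
To see which kind of descriptor is growing, rather than only the total, the
link targets under /proc/<pid>/fd can be grouped by type. This is a minimal
sketch, assuming a Linux /proc filesystem and a shell running as root or as
the Tomcat user. The function names fd_targets and classify_fds are made up
for illustration, and the buckets are only a rough guess at what matters here.

```shell
# Print the target of every file descriptor held by process $1.
# Requires permission to read /proc/$1/fd (root or the process owner).
fd_targets() {
    for fd in /proc/"$1"/fd/*; do
        readlink "$fd" 2>/dev/null || true
    done
}

# Group fd targets (one per line on stdin) into rough buckets and count them.
classify_fds() {
    awk '
        /^socket:/     { n["socket"]++;     next }
        /^pipe:/       { n["pipe"]++;       next }
        /^anon_inode:/ { n["anon_inode"]++; next }
        /\.jar$/       { n["jar"]++;        next }
                       { n["other"]++ }
        END { for (k in n) printf "%s %d\n", k, n[k] }
    ' | sort
}

# Demo on the current shell, which needs no extra privileges.
fd_targets "$$" | classify_fds
```

On the Tomcat host the same pipeline would presumably be run from a root
shell as fd_targets "$(cat /var/run/tomcat8.pid)" | classify_fds. Comparing
the output at roughly 300 and roughly 3000 open files should show which
bucket accounts for the increase.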





--
--------------------------------------------------------------------
Sun Certified Enterprise Architect 1.5
Sun Certified Java Programmer 1.4
Microsoft Certified Systems Engineer 2000
http://in.linkedin.com/pub/ayub-khan/a/811/b81
mobile:+966-502674604
----------------------------------------------------------------------
It is proved that Hard Work and knowledge will get you close but attitude
will get you there. However, it's the Love
of God that will put you over the top!!

Re: NGINX + tomcat 8.0.35 (110: Connection timed out)

mgrigorov
On Thu, Nov 12, 2020 at 10:37 AM Ayub Khan <[hidden email]> wrote:

> Martin,
>
> These are file descriptors, some are related to the jar files which are
> included in the web application and some are related to the sockets from
> nginx to tomcat and some are related to database connections. I use the
> below command to count the open file descriptors
>

Which type of connection is increasing?
The sockets? The DB ones?


>
> watch "sudo ls /proc/`cat /var/run/tomcat8.pid`/fd/ | wc -l"
>

You can also use the lsof command.
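
For example, lsof can restrict its output to the TCP sockets of one process,
and the connection state it prints at the end of each line can be tallied.
This is a rough sketch, assuming lsof is installed. The helper name
tcp_states is made up for illustration.

```shell
# Tally TCP connection states from `lsof -i TCP` output read on stdin.
# lsof prints the state in parentheses at the end of the NAME column,
# e.g. "localhost:8080->localhost:51234 (ESTABLISHED)".
tcp_states() {
    awk '
        $NF ~ /^\(.*\)$/ {
            state = $NF
            gsub(/[()]/, "", state)   # strip the surrounding parentheses
            n[state]++
        }
        END { for (s in n) printf "%s %d\n", s, n[s] }
    ' | sort
}

# Intended use on the Tomcat host (run as root or as the Tomcat user):
#   lsof -a -n -P -p "$(cat /var/run/tomcat8.pid)" -i TCP | tcp_states
```

A growing CLOSE_WAIT count would usually mean the application is not closing
its sockets, while a large ESTABLISHED count would point more towards slow
requests or connections being kept open.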



Re: NGINX + tomcat 8.0.35 (110: Connection timed out)

Ayub Khan
Martin,

Could you give me a command to run? I can then send you the results, which
might help you debug this issue.


On Thu, Nov 12, 2020 at 1:36 PM Martin Grigorov <[hidden email]>
wrote:

> On Thu, Nov 12, 2020 at 10:37 AM Ayub Khan <[hidden email]> wrote:
>
> > Martin,
> >
> > These are file descriptors, some are related to the jar files which are
> > included in the web application and some are related to the sockets from
> > nginx to tomcat and some are related to database connections. I use the
> > below command to count the open file descriptors
> >
>
> which type of connections increase ?
> the sockets ? the DB ones ?
>
>
> >
> > watch "sudo ls /proc/`cat /var/run/tomcat8.pid`/fd/ | wc -l"
> >
>
> you can also use lsof command
>
>
> >
> >
> >
> > On Thu, Nov 12, 2020 at 10:56 AM Martin Grigorov <[hidden email]>
> > wrote:
> >
> > > On Wed, Nov 11, 2020 at 11:17 PM Ayub Khan <[hidden email]> wrote:
> > >
> > > > Chris,
> > > >
> > > > I was load testing using the ec2 load balancer dns. I have increased
> > the
> > > > connector timeout to 6000 and also gave 32gig to the JVM of tomcat. I
> > am
> > > > not seeing connection timeout in nginx logs now. No errors in
> > kernel.log
> > > I
> > > > am not seeing any errors in tomcat catalina.out.
> > > > During regular operations when the request count is between 4 to 6k
> > > > requests per minute the open files count for the tomcat process is
> > > between
> > > > 200 to 350. Responses from tomcat are within 5 seconds.
> > > > If the requests count goes beyond 6.5 k open files slowly move up  to
> > > 2300
> > > > to 3000 and the request responses from tomcat become slow.
> > > >
> > > > I am not concerned about high open files as I do not see any errors
> > > related
> > > > to open files. Only side effect of  open files going above 700 is the
> > > > response from tomcat is slow. I checked if this is caused from
> elastic
> > > > search, aws cloud watch shows elastic search response is within 5
> > > > milliseconds.
> > > >
> > > > what might be the reason that when the open files goes beyond 600, it
> > > slows
> > > > down the response time for tomcat. I tried with tomcat 9 and it's the
> > > same
> > > > behavior
> > > >
> > >
> > > Do you know what kind of files are being opened ?
> > >
> > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On Tue, Nov 3, 2020 at 9:40 PM Christopher Schultz <
> > > > [hidden email]> wrote:
> > > >
> > > > > Ayub,
> > > > >
> > > > > On 11/3/20 10:56, Ayub Khan wrote:
> > > > > > *I'm curious about why you are using all of cloudflare and ALB
> and
> > > > > > nginx.Seems like any one of those could provide what you are
> > getting
> > > > from
> > > > > > all3 of them. *
> > > > > >
> > > > > > Cloudflare is doing just the DNS and nginx is doing ssl
> termination
> > > > >
> > > > > What do you mean "Cloudflare is doing just the DNS?"
> > > > >
> > > > > So what is ALB doing, then?
> > > > >
> > > > > > *What is the maximum number of simultaneous requests that one
> > > > > nginxinstance
> > > > > > will accept? What is the maximum number of simultaneous
> > > proxiedrequests
> > > > > one
> > > > > > nginx instance will make to a back-end Tomcat node? Howmany nginx
> > > nodes
> > > > > do
> > > > > > you have? How many Tomcat nodes?  *
> > > > > >
> > > > > > We have 4 vms each having nginx and tomcat running on them and
> each
> > > > > tomcat
> > > > > > has nginx in front of them to proxy the requests. So it's one
> Nginx
> > > > > > proxying to a dedicated tomcat on the same VM.
> > > > >
> > > > > Okay.
> > > > >
> > > > > > below is the tomcat connector configuration
> > > > > >
> > > > > > <Connector port="8080"
> > > > > >                 connectionTimeout="60000" maxThreads="2000"
> > > > > >
> >  protocol="org.apache.coyote.http11.Http11NioProtocol"
> > > > > >                 URIEncoding="UTF-8"
> > > > > >                 redirectPort="8443" />
> > > > >
> > > > > 60 seconds is a *long* time for a connection timeout.
> > > > >
> > > > > Do you actually need 2000 threads? That's a lot, though not insane.
> > > 2000
> > > > > threads means you expect to handle 2000 concurrent (non-async,
> > > > > > non-WebSocket) requests. Do you need that (per node)? Are you
> > > expecting
> > > > > 8000 concurrent requests? Does your load-balancer understand the
> > > > > topography and current-load on any given node?
> > > > >
> > > > > > > When I am doing a load test of 2000 concurrent users I see the open
> > > > > > > files increase to 10,320, and when I take a thread dump I see the
> > > > > > > threads are in a waiting state. Slowly, as the requests are completed,
> > > > > > > I see the open files come down to normal levels.
> > > > >
> > > > > Are you performing your load-test against the CF/ALB/nginx/Tomcat
> > > stack,
> > > > > or just hitting Tomcat (or nginx) directly?
> > > > >
> > > > > Are you using HTTP keepalive in your load-test (from the client to
> > > > > whichever server is being contacted)?
> > > > >
> > > > > > The output of the below command is
> > > > > > sudo cat /proc/sys/kernel/pid_max
> > > > > > 131072
> > > > > >
> > > > > > I am testing this on a c4.8xlarge VM in AWS.
> > > > > >
> > > > > > below is the config I changed in nginx.conf file
> > > > > >
> > > > > > events {
> > > > > >          worker_connections 50000;
> > > > > >          # multi_accept on;
> > > > > > }
> > > > >
> > > > > This will allow 50k incoming connections, and Tomcat will accept an
> > > > > unbounded number of connections (for NIO connector). So limiting
> your
> > > > > threads to 2000 only means that the work of each request will be
> done
> > > in
> > > > > groups of 2000.
> > > > >
> > > > > > worker_rlimit_nofile 30000;
> > > > >
> > > > > I'm not sure how many connections are handled by a single nginx
> > worker.
> > > > > If you accept 50k connections and only allow 30k file handles, you
> > may
> > > > > have a problem if that's all being done by a single worker.
> > > > >
> > > > > > What would be the ideal config for tomcat and Nginx so this setup
> > on
> > > > > > c4.8xlarge vm could serve at least 5k or 10k requests
> > simultaneously
> > > > > > without causing the open files to spike to 10K.
> > > > >
> > > > > You will never be able to serve 10k simultaneous requests without
> > > having
> > > > > 10k open files on the server. If you mean 10k requests across the
> > whole
> > > > > 4-node environment, then I'd expect 10k requests to open (roughly)
> > 2500
> > > > > open files on each server. And of course, you need all kinds of
> other
> > > > > files open as well, from JAR files to DB connections or other
> network
> > > > > connections.
> > > > >
> > > > > But each connection needs a file descriptor, full stop. If you need
> > to
> > > > > handle 10k connections, then you will need to make it possible to
> > open
> > > > > 10k file handles /just for incoming network connections/ for that
> > > > > process. There is no way around it.
> > > > >
> > > > > Are you trying to hit a performance target or are you actively
> > getting
> > > > > errors with a particular configuration? Your subject says
> "Connection
> > > > > Timed Out". Is it nginx that is reporting the connection timeout?
> > Have
> > > > > you checked on the Tomcat side what is happening with those
> requests?
> > > > >
> > > > > -chris
> > > > >
> > > > > > On Thu, Oct 29, 2020 at 10:29 PM Christopher Schultz <
> > > > > > [hidden email]> wrote:
> > > > > >
> > > > > >> Ayub,
> > > > > >>
> > > > > >> On 10/28/20 23:28, Ayub Khan wrote:
> > > > > >>> During high load of 16k requests per minute, we notice below
> > error
> > > in
> > > > > >> log.
> > > > > >>>
> > > > > >>>    [error] 2437#2437: *13335389 upstream timed out (110:
> > Connection
> > > > > timed
> > > > > >>> out) while reading response header from upstream,  server:
> > > jahez.net
> > > > ,
> > > > > >>> request: "GET /serviceContext/ServiceName?callback= HTTP/1.1",
> > > > > upstream:
> > > > > >> "
> > > > > >>> http://127.0.0.1:8080/serviceContext/ServiceName
> > > > > >>>
> > > > > >>> Below is the flow of requests:
> > > > > >>>
> > > > > >>> cloudflare-->AWS ALB--> NGINX--> Tomcat-->Elastic-search
> > > > > >>
> > > > > >> I'm curious about why you are using all of cloudflare and ALB
> and
> > > > nginx.
> > > > > >> Seems like any one of those could provide what you are getting
> > from
> > > > all
> > > > > >> 3 of them.
> > > > > >>
> > > > > >>> In NGINX we have the below config
> > > > > >>>
> > > > > >>> location /serviceContext/ServiceName{
> > > > > >>>
> > > > > >>>       proxy_pass
> > > > > >> http://localhost:8080/serviceContext/ServiceName;
> > > > > >>>      proxy_http_version  1.1;
> > > > > >>>       proxy_set_header    Connection
> > $connection_upgrade;
> > > > > >>>       proxy_set_header    Upgrade             $http_upgrade;
> > > > > >>>       proxy_set_header    Host                      $host;
> > > > > >>>       proxy_set_header    X-Real-IP              $remote_addr;
> > > > > >>>       proxy_set_header    X-Forwarded-For
> > > > >  $proxy_add_x_forwarded_for;
> > > > > >>>
> > > > > >>>
> > > > > >>>           proxy_buffers 16 16k;
> > > > > >>>           proxy_buffer_size 32k;
> > > > > >>> }
> > > > > >>
> > > > > >> What is the maximum number of simultaneous requests that one
> nginx
> > > > > >> instance will accept? What is the maximum number of simultaneous
> > > > proxied
> > > > > >> requests one nginx instance will make to a back-end Tomcat node?
> > How
> > > > > >> many nginx nodes do you have? How many Tomcat nodes?
> > > > > >>
> > > > > >>> below is tomcat connector config
> > > > > >>>
> > > > > >>> <Connector port="8080"
> > > > > >>>
> > > > protocol="org.apache.coyote.http11.Http11NioProtocol"
> > > > > >>>                  connectionTimeout="200" maxThreads="50000"
> > > > > >>>                  URIEncoding="UTF-8"
> > > > > >>>                  redirectPort="8443" />
> > > > > >>
> > > > > >> 50,000 threads is a LOT of threads.
> > > > > >>
> > > > > >>> We monitor the open file using *watch "sudo ls /proc/`cat
> > > > > >>> /var/run/tomcat8.pid`/fd/ | wc -l" *the number of tomcat open
> > files
> > > > > keeps
> > > > > >>> increasing slowing the responses. the only option to recover
> from
> > > > this
> > > > > is
> > > > > >>> to restart tomcat.
> > > > > >>
> > > > > >> So this looks like Linux (/proc filesystem). Linux kernels have
> a
> > > > 16-bit
> > > > > >> pid space which means a theoretical max pid of 65535. In
> practice,
> > > the
> > > > > >> max pid is actually to be found here:
> > > > > >>
> > > > > >> $ cat /proc/sys/kernel/pid_max
> > > > > >> 32768
> > > > > >>
> > > > > >> (on my Debian Linux system, 4.9.0-era kernel)
> > > > > >>
> > > > > >> Each thread takes a pid. 50k threads means more than the maximum
> > > > allowed
> > > > > >> on the OS. So you will eventually hit some kind of serious
> problem
> > > > with
> > > > > >> that many threads.
> > > > > >>
> > > > > >> How many fds do you get in the process before Tomcat grinds to a
> > > halt?
> > > > > >> What does the CPU usage look like? The process I/O? Disk usage?
> > What
> > > > > >> does a thread dump look like (if you have the disk space to dump
> > > it!)?
> > > > > >>
> > > > > >> Why do you need that many threads?
> > > > > >>
> > > > > >> -chris
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> >
> >
>


--
--------------------------------------------------------------------
Sun Certified Enterprise Architect 1.5
Sun Certified Java Programmer 1.4
Microsoft Certified Systems Engineer 2000
http://in.linkedin.com/pub/ayub-khan/a/811/b81
mobile:+966-502674604
----------------------------------------------------------------------
It is proved that Hard Work and knowledge will get you close but attitude
will get you there. However, it's the Love
of God that will put you over the top!!

Re: NGINX + tomcat 8.0.35 (110: Connection timed out)

mgrigorov
On Thu, Nov 12, 2020 at 2:40 PM Ayub Khan <[hidden email]> wrote:

> Martin,
>
> Could you provide me a command which you want me to run and provide you the
> results which might help you to debug this issue ?
>

1) Start your app and click around so that the usual FDs get opened
2) lsof -p `cat /var/run/tomcat8.pid` > after_start.txt
3) Put your app under load
4) lsof -p `cat /var/run/tomcat8.pid` > after_load.txt

You can analyze the differences between the two files yourself before
sending them to us :-)
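
One possible way to compare the two snapshots is to count which descriptor types show up only in after_load.txt. The following is a minimal Python sketch and not part of the original advice. It assumes lsof's default nine-column layout (COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME); rows where lsof leaves a column blank would be mis-split. The function names are hypothetical.

```python
from collections import Counter


def parse_lsof(lines):
    """Return (TYPE, NAME) pairs from `lsof -p` output, skipping the header row.

    Assumption: default nine-column layout with no blank columns.
    """
    entries = []
    for line in lines:
        fields = line.split()
        if len(fields) < 9 or fields[0] == "COMMAND":
            continue
        # NAME may contain spaces, e.g. "host:port->host:port (ESTABLISHED)".
        entries.append((fields[4], " ".join(fields[8:])))
    return entries


def new_entries_by_type(before_lines, after_lines):
    """Count descriptors present after load but not after start, grouped by TYPE."""
    before = Counter(parse_lsof(before_lines))
    after = Counter(parse_lsof(after_lines))
    added = after - before  # Counter subtraction keeps only positive counts
    by_type = Counter()
    for (fd_type, _name), count in added.items():
        by_type[fd_type] += count
    return by_type


def report(start_path, load_path):
    """Print one line per descriptor type that grew between the two snapshots."""
    with open(start_path) as f_start, open(load_path) as f_load:
        growth = new_entries_by_type(f_start, f_load)
    for fd_type, count in growth.most_common():
        print(f"{fd_type}\t{count}")
```

Calling report("after_start.txt", "after_load.txt") would then show whether the growth is in sockets (IPv4/IPv6 rows) or in regular files (REG rows), which is the question asked earlier in the thread.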


>
>
> On Thu, Nov 12, 2020 at 1:36 PM Martin Grigorov <[hidden email]>
> wrote:
>
> > On Thu, Nov 12, 2020 at 10:37 AM Ayub Khan <[hidden email]> wrote:
> >
> > > Martin,
> > >
> > > These are file descriptors, some are related to the jar files which are
> > > included in the web application and some are related to the sockets
> from
> > > nginx to tomcat and some are related to database connections. I use the
> > > below command to count the open file descriptors
> > >
> >
> > which type of connections increase ?
> > the sockets ? the DB ones ?
> >
> >
> > >
> > > watch "sudo ls /proc/`cat /var/run/tomcat8.pid`/fd/ | wc -l"
> > >
> >
> > you can also use lsof command
> >
> >
> > >
> > >
> > >
> > > On Thu, Nov 12, 2020 at 10:56 AM Martin Grigorov <[hidden email]
> >
> > > wrote:
> > >
> > > > On Wed, Nov 11, 2020 at 11:17 PM Ayub Khan <[hidden email]>
> wrote:
> > > >
> > > > > Chris,
> > > > >
> > > > > I was load testing using the ec2 load balancer dns. I have
> increased
> > > the
> > > > > connector timeout to 6000 and also gave 32gig to the JVM of
> tomcat. I
> > > am
> > > > > not seeing connection timeout in nginx logs now. No errors in
> > > kernel.log
> > > > I
> > > > > am not seeing any errors in tomcat catalina.out.
> > > > > During regular operations when the request count is between 4 to 6k
> > > > > requests per minute the open files count for the tomcat process is
> > > > between
> > > > > 200 to 350. Responses from tomcat are within 5 seconds.
> > > > > If the requests count goes beyond 6.5 k open files slowly move up
> to
> > > > 2300
> > > > > to 3000 and the request responses from tomcat become slow.
> > > > >
> > > > > I am not concerned about high open files as I do not see any errors
> > > > related
> > > > > to open files. Only side effect of  open files going above 700 is
> the
> > > > > response from tomcat is slow. I checked if this is caused from
> > elastic
> > > > > search, aws cloud watch shows elastic search response is within 5
> > > > > milliseconds.
> > > > >
> > > > > what might be the reason that when the open files goes beyond 600,
> it
> > > > slows
> > > > > down the response time for tomcat. I tried with tomcat 9 and it's
> the
> > > > same
> > > > > behavior
> > > > >
> > > >
> > > > Do you know what kind of files are being opened ?
> > > >
> > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Tue, Nov 3, 2020 at 9:40 PM Christopher Schultz <
> > > > > [hidden email]> wrote:
> > > > >
> > > > > > Ayub,
> > > > > >
> > > > > > On 11/3/20 10:56, Ayub Khan wrote:
> > > > > > > > *I'm curious about why you are using all of cloudflare and ALB and
> > > > > > > > nginx. Seems like any one of those could provide what you are getting
> > > > > > > > from all 3 of them.*
> > > > > > >
> > > > > > > Cloudflare is doing just the DNS and nginx is doing ssl
> > termination
> > > > > >
> > > > > > What do you mean "Cloudflare is doing just the DNS?"
> > > > > >
> > > > > > So what is ALB doing, then?
> > > > > >
> > > > > > > > *What is the maximum number of simultaneous requests that one nginx
> > > > > > > > instance will accept? What is the maximum number of simultaneous
> > > > > > > > proxied requests one nginx instance will make to a back-end Tomcat
> > > > > > > > node? How many nginx nodes do you have? How many Tomcat nodes?*
> > > > > > >
> > > > > > > We have 4 vms each having nginx and tomcat running on them and
> > each
> > > > > > tomcat
> > > > > > > has nginx in front of them to proxy the requests. So it's one
> > Nginx
> > > > > > > proxying to a dedicated tomcat on the same VM.
> > > > > >
> > > > > > Okay.
> > > > > >
> > > > > > > below is the tomcat connector configuration
> > > > > > >
> > > > > > > <Connector port="8080"
> > > > > > >                 connectionTimeout="60000" maxThreads="2000"
> > > > > > >
> > >  protocol="org.apache.coyote.http11.Http11NioProtocol"
> > > > > > >                 URIEncoding="UTF-8"
> > > > > > >                 redirectPort="8443" />
> > > > > >
> > > > > > 60 seconds is a *long* time for a connection timeout.
> > > > > >
> > > > > > Do you actually need 2000 threads? That's a lot, though not
> insane.
> > > > 2000
> > > > > > threads means you expect to handle 2000 concurrent (non-async,
> > > > > > > non-WebSocket) requests. Do you need that (per node)? Are you
> > > > expecting
> > > > > > 8000 concurrent requests? Does your load-balancer understand the
> > > > > > topography and current-load on any given node?
> > > > > >
> > > > > > > > When I am doing a load test of 2000 concurrent users I see the open
> > > > > > > > files increase to 10,320, and when I take a thread dump I see the
> > > > > > > > threads are in a waiting state. Slowly, as the requests are completed,
> > > > > > > > I see the open files come down to normal levels.
> > > > > >
> > > > > > Are you performing your load-test against the CF/ALB/nginx/Tomcat
> > > > stack,
> > > > > > or just hitting Tomcat (or nginx) directly?
> > > > > >
> > > > > > Are you using HTTP keepalive in your load-test (from the client
> to
> > > > > > whichever server is being contacted)?
> > > > > >
> > > > > > > The output of the below command is
> > > > > > > sudo cat /proc/sys/kernel/pid_max
> > > > > > > 131072
> > > > > > >
> > > > > > > I am testing this on a c4.8xlarge VM in AWS.
> > > > > > >
> > > > > > > below is the config I changed in nginx.conf file
> > > > > > >
> > > > > > > events {
> > > > > > >          worker_connections 50000;
> > > > > > >          # multi_accept on;
> > > > > > > }
> > > > > >
> > > > > > This will allow 50k incoming connections, and Tomcat will accept
> an
> > > > > > unbounded number of connections (for NIO connector). So limiting
> > your
> > > > > > threads to 2000 only means that the work of each request will be
> > done
> > > > in
> > > > > > groups of 2000.
> > > > > >
> > > > > > > worker_rlimit_nofile 30000;
> > > > > >
> > > > > > I'm not sure how many connections are handled by a single nginx
> > > worker.
> > > > > > If you accept 50k connections and only allow 30k file handles,
> you
> > > may
> > > > > > have a problem if that's all being done by a single worker.
> > > > > >
> > > > > > > What would be the ideal config for tomcat and Nginx so this
> setup
> > > on
> > > > > > > c4.8xlarge vm could serve at least 5k or 10k requests
> > > simultaneously
> > > > > > > without causing the open files to spike to 10K.
> > > > > >
> > > > > > You will never be able to serve 10k simultaneous requests without
> > > > having
> > > > > > 10k open files on the server. If you mean 10k requests across the
> > > whole
> > > > > > 4-node environment, then I'd expect 10k requests to open
> (roughly)
> > > 2500
> > > > > > open files on each server. And of course, you need all kinds of
> > other
> > > > > > files open as well, from JAR files to DB connections or other
> > network
> > > > > > connections.
> > > > > >
> > > > > > But each connection needs a file descriptor, full stop. If you
> need
> > > to
> > > > > > handle 10k connections, then you will need to make it possible to
> > > open
> > > > > > 10k file handles /just for incoming network connections/ for that
> > > > > > process. There is no way around it.
> > > > > >
> > > > > > Are you trying to hit a performance target or are you actively
> > > getting
> > > > > > errors with a particular configuration? Your subject says
> > "Connection
> > > > > > Timed Out". Is it nginx that is reporting the connection timeout?
> > > Have
> > > > > > you checked on the Tomcat side what is happening with those
> > requests?
> > > > > >
> > > > > > -chris
> > > > > >
> > > > > > > On Thu, Oct 29, 2020 at 10:29 PM Christopher Schultz <
> > > > > > > [hidden email]> wrote:
> > > > > > >
> > > > > > >> Ayub,
> > > > > > >>
> > > > > > >> On 10/28/20 23:28, Ayub Khan wrote:
> > > > > > >>> During high load of 16k requests per minute, we notice below
> > > error
> > > > in
> > > > > > >> log.
> > > > > > >>>
> > > > > > >>>    [error] 2437#2437: *13335389 upstream timed out (110:
> > > Connection
> > > > > > timed
> > > > > > >>> out) while reading response header from upstream,  server:
> > > > jahez.net
> > > > > ,
> > > > > > >>> request: "GET /serviceContext/ServiceName?callback=
> HTTP/1.1",
> > > > > > upstream:
> > > > > > >> "
> > > > > > >>> http://127.0.0.1:8080/serviceContext/ServiceName
> > > > > > >>>
> > > > > > >>> Below is the flow of requests:
> > > > > > >>>
> > > > > > >>> cloudflare-->AWS ALB--> NGINX--> Tomcat-->Elastic-search
> > > > > > >>
> > > > > > >> I'm curious about why you are using all of cloudflare and ALB
> > and
> > > > > nginx.
> > > > > > >> Seems like any one of those could provide what you are getting
> > > from
> > > > > all
> > > > > > >> 3 of them.
> > > > > > >>
> > > > > > >>> In NGINX we have the below config
> > > > > > >>>
> > > > > > >>> location /serviceContext/ServiceName{
> > > > > > >>>
> > > > > > >>>       proxy_pass
> > > > > > >> http://localhost:8080/serviceContext/ServiceName;
> > > > > > >>>      proxy_http_version  1.1;
> > > > > > >>>       proxy_set_header    Connection
> > > $connection_upgrade;
> > > > > > >>>       proxy_set_header    Upgrade             $http_upgrade;
> > > > > > >>>       proxy_set_header    Host                      $host;
> > > > > > >>>       proxy_set_header    X-Real-IP
> $remote_addr;
> > > > > > >>>       proxy_set_header    X-Forwarded-For
> > > > > >  $proxy_add_x_forwarded_for;
> > > > > > >>>
> > > > > > >>>
> > > > > > >>>           proxy_buffers 16 16k;
> > > > > > >>>           proxy_buffer_size 32k;
> > > > > > >>> }
> > > > > > >>
> > > > > > >> What is the maximum number of simultaneous requests that one
> > nginx
> > > > > > >> instance will accept? What is the maximum number of
> simultaneous
> > > > > proxied
> > > > > > >> requests one nginx instance will make to a back-end Tomcat
> node?
> > > How
> > > > > > >> many nginx nodes do you have? How many Tomcat nodes?
> > > > > > >>
> > > > > > >>> below is tomcat connector config
> > > > > > >>>
> > > > > > >>> <Connector port="8080"
> > > > > > >>>
> > > > > protocol="org.apache.coyote.http11.Http11NioProtocol"
> > > > > > >>>                  connectionTimeout="200" maxThreads="50000"
> > > > > > >>>                  URIEncoding="UTF-8"
> > > > > > >>>                  redirectPort="8443" />
> > > > > > >>
> > > > > > >> 50,000 threads is a LOT of threads.
> > > > > > >>
> > > > > > >>> We monitor the open file using *watch "sudo ls /proc/`cat
> > > > > > >>> /var/run/tomcat8.pid`/fd/ | wc -l" *the number of tomcat open
> > > files
> > > > > > keeps
> > > > > > >>> increasing slowing the responses. the only option to recover
> > from
> > > > > this
> > > > > > is
> > > > > > >>> to restart tomcat.
> > > > > > >>
> > > > > > >> So this looks like Linux (/proc filesystem). Linux kernels
> have
> > a
> > > > > 16-bit
> > > > > > >> pid space which means a theoretical max pid of 65535. In
> > practice,
> > > > the
> > > > > > >> max pid is actually to be found here:
> > > > > > >>
> > > > > > >> $ cat /proc/sys/kernel/pid_max
> > > > > > >> 32768
> > > > > > >>
> > > > > > >> (on my Debian Linux system, 4.9.0-era kernel)
> > > > > > >>
> > > > > > >> Each thread takes a pid. 50k threads means more than the
> maximum
> > > > > allowed
> > > > > > >> on the OS. So you will eventually hit some kind of serious
> > problem
> > > > > with
> > > > > > >> that many threads.
> > > > > > >>
> > > > > > >> How many fds do you get in the process before Tomcat grinds
> to a
> > > > halt?
> > > > > > >> What does the CPU usage look like? The process I/O? Disk
> usage?
> > > What
> > > > > > >> does a thread dump look like (if you have the disk space to
> dump
> > > > it!)?
> > > > > > >>
> > > > > > >> Why do you need that many threads?
> > > > > > >>
> > > > > > >> -chris
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > > --
> > > > >
> --------------------------------------------------------------------
> > > > > Sun Certified Enterprise Architect 1.5
> > > > > Sun Certified Java Programmer 1.4
> > > > > Microsoft Certified Systems Engineer 2000
> > > > > http://in.linkedin.com/pub/ayub-khan/a/811/b81
> > > > > mobile:+966-502674604
> > > > >
> > ----------------------------------------------------------------------
> > > > > It is proved that Hard Work and kowledge will get you close but
> > > attitude
> > > > > will get you there. However, it's the Love
> > > > > of God that will put you over the top!!
> > > > >
> > > >
> > >
> > >
> > > --
> > > --------------------------------------------------------------------
> > > Sun Certified Enterprise Architect 1.5
> > > Sun Certified Java Programmer 1.4
> > > Microsoft Certified Systems Engineer 2000
> > > http://in.linkedin.com/pub/ayub-khan/a/811/b81
> > > mobile:+966-502674604
> > > ----------------------------------------------------------------------
> > > It is proved that Hard Work and kowledge will get you close but
> attitude
> > > will get you there. However, it's the Love
> > > of God that will put you over the top!!
> > >
> >
>
>
>

Re: NGINX + tomcat 8.0.35 (110: Connection timed out)

Ayub Khan
Mark,

The difference between after_start and after_load is the set of sockets
below. This is just a sample from a long list of similar entries; the ports
are random. How can I find out what these connections are related to?

java 5021 tomcat8 3162u IPv6 98361 0t0 TCP localhost:http-alt->localhost:51746 (ESTABLISHED)
java 5021 tomcat8 3163u IPv6 98362 0t0 TCP localhost:http-alt->localhost:51748 (ESTABLISHED)
java 5021 tomcat8 3164u IPv6 98363 0t0 TCP localhost:http-alt->localhost:51750 (ESTABLISHED)
java 5021 tomcat8 3165u IPv6 98364 0t0 TCP localhost:http-alt->localhost:51752 (ESTABLISHED)
java 5021 tomcat8 3166u IPv6 25334 0t0 TCP localhost:http-alt->localhost:51754 (ESTABLISHED)
java 5021 tomcat8 3167u IPv6 25335 0t0 TCP localhost:http-alt->localhost:51756 (ESTABLISHED)
java 5021 tomcat8 3168u IPv6 25336 0t0 TCP localhost:http-alt->localhost:51758 (ESTABLISHED)
java 5021 tomcat8 3169u IPv6 25337 0t0 TCP localhost:http-alt->localhost:51760 (ESTABLISHED)
java 5021 tomcat8 3170u IPv6 25338 0t0 TCP localhost:http-alt->localhost:51762 (ESTABLISHED)
java 5021 tomcat8 3171u IPv6 25339 0t0 TCP localhost:http-alt->localhost:51764 (ESTABLISHED)
java 5021 tomcat8 3172u IPv6 25340 0t0 TCP localhost:http-alt->localhost:51766 (ESTABLISHED)
java 5021 tomcat8 3173u IPv6 25341 0t0 TCP localhost:http-alt->localhost:51768 (ESTABLISHED)
java 5021 tomcat8 3174u IPv6 25342 0t0 TCP localhost:http-alt->localhost:51770 (ESTABLISHED)
java 5021 tomcat8 3175u IPv6 25343 0t0 TCP localhost:http-alt->localhost:51772 (ESTABLISHED)
java 5021 tomcat8 3176u IPv6 25344 0t0 TCP localhost:http-alt->localhost:51774 (ESTABLISHED)
java 5021 tomcat8 3177u IPv6 25345 0t0 TCP localhost:http-alt->localhost:51776 (ESTABLISHED)
java 5021 tomcat8 3178u IPv6 25346 0t0 TCP localhost:http-alt->localhost:51778 (ESTABLISHED)
java 5021 tomcat8 3179u IPv6 25347 0t0 TCP localhost:http-alt->localhost:51780 (ESTABLISHED)
java 5021 tomcat8 3180u IPv6 25348 0t0 TCP localhost:http-alt->localhost:51782 (ESTABLISHED)
java 5021 tomcat8 3181u IPv6 25349 0t0 TCP localhost:http-alt->localhost:51784 (ESTABLISHED)
java 5021 tomcat8 3182u IPv6 25350 0t0 TCP localhost:http-alt->localhost:51786 (ESTABLISHED)
java 5021 tomcat8 3183u IPv6 25351 0t0 TCP localhost:http-alt->localhost:51788 (ESTABLISHED)
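
For what it is worth, lsof prints the local endpoint first, and http-alt is the /etc/services name for TCP port 8080, so each row above is a connection accepted on the Tomcat connector port from a client on the same host. In the setup described in this thread that client is presumably nginx, though the listing alone does not prove it. To see how a long listing breaks down, a rough Python sketch like the one below could group the sockets by local port, remote host and state. The regular expression and function name are illustrative assumptions, and the pattern does not handle bracketed IPv6 literals.

```python
import re
from collections import Counter

# Matches lsof NAME fields such as
# "localhost:http-alt->localhost:51746 (ESTABLISHED)".
# Assumption: host names or IPv4 addresses only, no "[::1]" style literals.
SOCKET_RE = re.compile(r"(\S+):(\S+)->(\S+):(\S+)\s+\((\w+)\)")


def summarize_sockets(lines):
    """Count TCP sockets by (local port, remote host, state).

    Works on full lsof rows and also on rows wrapped by a mail client,
    because only the NAME part of each row is inspected.
    """
    summary = Counter()
    for line in lines:
        match = SOCKET_RE.search(line)
        if match is None:
            continue
        _local_host, local_port, remote_host, _remote_port, state = match.groups()
        summary[(local_port, remote_host, state)] += 1
    return summary
```

A large count for ("http-alt", "localhost", "ESTABLISHED") would point at the proxy-to-Tomcat hop rather than at Elasticsearch or database connections; sockets stuck in states such as CLOSE_WAIT would show up as separate keys.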

On Thu, Nov 12, 2020 at 4:05 PM Martin Grigorov <[hidden email]>
wrote:

> On Thu, Nov 12, 2020 at 2:40 PM Ayub Khan <[hidden email]> wrote:
>
> > Martin,
> >
> > Could you provide me a command which you want me to run and provide you
> the
> > results which might help you to debug this issue ?
> >
>
> 1) start your app and click around to load the usual FDs
> 2) lsof -p `cat /var/run/tomcat8.pid` > after_start.txt
> 3) load your app
> 4) lsof -p `cat /var/run/tomcat8.pid` > after_load.txt
>
> you can analyze the differences in the files yourself before sending them
> to us :-)
>
>
> >
> >
> > On Thu, Nov 12, 2020 at 1:36 PM Martin Grigorov <[hidden email]>
> > wrote:
> >
> > > On Thu, Nov 12, 2020 at 10:37 AM Ayub Khan <[hidden email]> wrote:
> > >
> > > > Martin,
> > > >
> > > > These are file descriptors, some are related to the jar files which
> are
> > > > included in the web application and some are related to the sockets
> > from
> > > > nginx to tomcat and some are related to database connections. I use
> the
> > > > below command to count the open file descriptors
> > > >
> > >
> > > which type of connections increase ?
> > > the sockets ? the DB ones ?
> > >
> > >
> > > >
> > > > watch "sudo ls /proc/`cat /var/run/tomcat8.pid`/fd/ | wc -l"
> > > >
> > >
> > > you can also use lsof command
> > >
> > >
> > > >
> > > >
> > > >
> > > > On Thu, Nov 12, 2020 at 10:56 AM Martin Grigorov <
> [hidden email]
> > >
> > > > wrote:
> > > >
> > > > > On Wed, Nov 11, 2020 at 11:17 PM Ayub Khan <[hidden email]>
> > wrote:
> > > > >
> > > > > > Chris,
> > > > > >
> > > > > > I was load testing using the ec2 load balancer dns. I have
> > increased
> > > > the
> > > > > > connector timeout to 6000 and also gave 32gig to the JVM of
> > tomcat. I
> > > > am
> > > > > > not seeing connection timeout in nginx logs now. No errors in
> > > > kernel.log
> > > > > I
> > > > > > am not seeing any errors in tomcat catalina.out.
> > > > > > During regular operations when the request count is between 4 to
> 6k
> > > > > > requests per minute the open files count for the tomcat process
> is
> > > > > between
> > > > > > 200 to 350. Responses from tomcat are within 5 seconds.
> > > > > > If the requests count goes beyond 6.5 k open files slowly move up
> > to
> > > > > 2300
> > > > > > to 3000 and the request responses from tomcat become slow.
> > > > > >
> > > > > > I am not concerned about high open files as I do not see any
> errors
> > > > > related
> > > > > > to open files. Only side effect of  open files going above 700 is
> > the
> > > > > > response from tomcat is slow. I checked if this is caused from
> > > elastic
> > > > > > search, aws cloud watch shows elastic search response is within 5
> > > > > > milliseconds.
> > > > > >
> > > > > > what might be the reason that when the open files goes beyond
> 600,
> > it
> > > > > slows
> > > > > > down the response time for tomcat. I tried with tomcat 9 and it's
> > the
> > > > > same
> > > > > > behavior
> > > > > >
> > > > >
> > > > > Do you know what kind of files are being opened ?
> > > > >
> > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Tue, Nov 3, 2020 at 9:40 PM Christopher Schultz <
> > > > > > [hidden email]> wrote:
> > > > > >
> > > > > > > Ayub,
> > > > > > >
> > > > > > > On 11/3/20 10:56, Ayub Khan wrote:
> > > > > > > > > *I'm curious about why you are using all of cloudflare and ALB and
> > > > > > > > > nginx. Seems like any one of those could provide what you are getting
> > > > > > > > > from all 3 of them.*
> > > > > > > >
> > > > > > > > Cloudflare is doing just the DNS and nginx is doing ssl
> > > termination
> > > > > > >
> > > > > > > What do you mean "Cloudflare is doing just the DNS?"
> > > > > > >
> > > > > > > So what is ALB doing, then?
> > > > > > >
> > > > > > > > > *What is the maximum number of simultaneous requests that one nginx
> > > > > > > > > instance will accept? What is the maximum number of simultaneous
> > > > > > > > > proxied requests one nginx instance will make to a back-end Tomcat
> > > > > > > > > node? How many nginx nodes do you have? How many Tomcat nodes?*
> > > > > > > >
> > > > > > > > We have 4 vms each having nginx and tomcat running on them
> and
> > > each
> > > > > > > tomcat
> > > > > > > > has nginx in front of them to proxy the requests. So it's one
> > > Nginx
> > > > > > > > proxying to a dedicated tomcat on the same VM.
> > > > > > >
> > > > > > > Okay.
> > > > > > >
> > > > > > > > below is the tomcat connector configuration
> > > > > > > >
> > > > > > > > <Connector port="8080"
> > > > > > > >                 connectionTimeout="60000" maxThreads="2000"
> > > > > > > >
> > > >  protocol="org.apache.coyote.http11.Http11NioProtocol"
> > > > > > > >                 URIEncoding="UTF-8"
> > > > > > > >                 redirectPort="8443" />
> > > > > > >
> > > > > > > 60 seconds is a *long* time for a connection timeout.
> > > > > > >
> > > > > > > Do you actually need 2000 threads? That's a lot, though not
> > insane.
> > > > > 2000
> > > > > > > threads means you expect to handle 2000 concurrent (non-async,
> > > > > > > non-Wewbsocket) requests. Do you need that (per node)? Are you
> > > > > expecting
> > > > > > > 8000 concurrent requests? Does your load-balancer understand
> the
> > > > > > > topography and current-load on any given node?
> > > > > > >
> > > > > > > > When I am doing a load test of 2000 concurrent users I see
> the
> > > open
> > > > > > files
> > > > > > > > increase to 10,320 and when I take thread dump I see the
> > threads
> > > > are
> > > > > > in a
> > > > > > > > waiting state.Slowly as the requests are completed I see the
> > open
> > > > > files
> > > > > > > > come down to normal levels.
> > > > > > >
> > > > > > > Are you performing your load-test against the
> CF/ALB/nginx/Tomcat
> > > > > stack,
> > > > > > > or just hitting Tomcat (or nginx) directly?
> > > > > > >
> > > > > > > Are you using HTTP keepalive in your load-test (from the client
> > to
> > > > > > > whichever server is being contacted)?
> > > > > > >
> > > > > > > > The output of the below command is
> > > > > > > > sudo cat /proc/sys/kernel/pid_max
> > > > > > > > 131072
> > > > > > > >
> > > > > > > > I am testing this on a c4.8xlarge VM in AWS.
> > > > > > > >
> > > > > > > > below is the config I changed in nginx.conf file
> > > > > > > >
> > > > > > > > events {
> > > > > > > >          worker_connections 50000;
> > > > > > > >          # multi_accept on;
> > > > > > > > }
> > > > > > >
> > > > > > > This will allow 50k incoming connections, and Tomcat will
> accept
> > an
> > > > > > > unbounded number of connections (for NIO connector). So
> limiting
> > > your
> > > > > > > threads to 2000 only means that the work of each request will
> be
> > > done
> > > > > in
> > > > > > > groups of 2000.
> > > > > > >
> > > > > > > > worker_rlimit_nofile 30000;
> > > > > > >
> > > > > > > I'm not sure how many connections are handled by a single nginx
> > > > worker.
> > > > > > > If you accept 50k connections and only allow 30k file handles,
> > you
> > > > may
> > > > > > > have a problem if that's all being done by a single worker.
> > > > > > >
> > > > > > > > What would be the ideal config for tomcat and Nginx so this
> > setup
> > > > on
> > > > > > > > c4.8xlarge vm could serve at least 5k or 10k requests
> > > > simultaneously
> > > > > > > > without causing the open files to spike to 10K.
> > > > > > >
> > > > > > > You will never be able to serve 10k simultaneous requests
> without
> > > > > having
> > > > > > > 10k open files on the server. If you mean 10k requests across
> the
> > > > whole
> > > > > > > 4-node environment, then I'd expect 10k requests to open
> > (roughly)
> > > > 2500
> > > > > > > open files on each server. And of course, you need all kinds of
> > > other
> > > > > > > files open as well, from JAR files to DB connections or other
> > > network
> > > > > > > connections.
> > > > > > >
> > > > > > > But each connection needs a file descriptor, full stop. If you
> > need
> > > > to
> > > > > > > handle 10k connections, then you will need to make it possible
> to
> > > > open
> > > > > > > 10k file handles /just for incoming network connections/ for
> that
> > > > > > > process. There is no way around it.
> > > > > > >
> > > > > > > Are you trying to hit a performance target or are you actively
> > > > getting
> > > > > > > errors with a particular configuration? Your subject says
> > > "Connection
> > > > > > > Timed Out". Is it nginx that is reporting the connection
> timeout?
> > > > Have
> > > > > > > you checked on the Tomcat side what is happening with those
> > > requests?
> > > > > > >
> > > > > > > -chris
> > > > > > >
> > > > > > > > On Thu, Oct 29, 2020 at 10:29 PM Christopher Schultz <
> > > > > > > > [hidden email]> wrote:
> > > > > > > >
> > > > > > > >> Ayub,
> > > > > > > >>
> > > > > > > >> On 10/28/20 23:28, Ayub Khan wrote:
> > > > > > > >>> During high load of 16k requests per minute, we notice
> below
> > > > error
> > > > > in
> > > > > > > >> log.
> > > > > > > >>>
> > > > > > > >>>    [error] 2437#2437: *13335389 upstream timed out (110:
> > > > Connection
> > > > > > > timed
> > > > > > > >>> out) while reading response header from upstream,  server:
> > > > > jahez.net
> > > > > > ,
> > > > > > > >>> request: "GET /serviceContext/ServiceName?callback=
> > HTTP/1.1",
> > > > > > > upstream:
> > > > > > > >> "
> > > > > > > >>> http://127.0.0.1:8080/serviceContext/ServiceName
> > > > > > > >>>
> > > > > > > >>> Below is the flow of requests:
> > > > > > > >>>
> > > > > > > >>> cloudflare-->AWS ALB--> NGINX--> Tomcat-->Elastic-search
> > > > > > > >>
> > > > > > > >> I'm curious about why you are using all of cloudflare and
> ALB
> > > and
> > > > > > nginx.
> > > > > > > >> Seems like any one of those could provide what you are
> getting
> > > > from
> > > > > > all
> > > > > > > >> 3 of them.
> > > > > > > >>
> > > > > > > >>> In NGINX we have the below config
> > > > > > > >>>
> > > > > > > >>> location /serviceContext/ServiceName{
> > > > > > > >>>
> > > > > > > >>>       proxy_pass
> > > > > > > >> http://localhost:8080/serviceContext/ServiceName;
> > > > > > > >>>      proxy_http_version  1.1;
> > > > > > > >>>       proxy_set_header    Connection
> > > > $connection_upgrade;
> > > > > > > >>>       proxy_set_header    Upgrade
>  $http_upgrade;
> > > > > > > >>>       proxy_set_header    Host                      $host;
> > > > > > > >>>       proxy_set_header    X-Real-IP
> > $remote_addr;
> > > > > > > >>>       proxy_set_header    X-Forwarded-For
> > > > > > >  $proxy_add_x_forwarded_for;
> > > > > > > >>>
> > > > > > > >>>
> > > > > > > >>>           proxy_buffers 16 16k;
> > > > > > > >>>           proxy_buffer_size 32k;
> > > > > > > >>> }
> > > > > > > >>
> > > > > > > >> What is the maximum number of simultaneous requests that one
> > > nginx
> > > > > > > >> instance will accept? What is the maximum number of
> > simultaneous
> > > > > > proxied
> > > > > > > >> requests one nginx instance will make to a back-end Tomcat
> > node?
> > > > How
> > > > > > > >> many nginx nodes do you have? How many Tomcat nodes?
> > > > > > > >>
> > > > > > > >>> below is tomcat connector config
> > > > > > > >>>
> > > > > > > >>> <Connector port="8080"
> > > > > > > >>>
> > > > > > protocol="org.apache.coyote.http11.Http11NioProtocol"
> > > > > > > >>>                  connectionTimeout="200" maxThreads="50000"
> > > > > > > >>>                  URIEncoding="UTF-8"
> > > > > > > >>>                  redirectPort="8443" />
> > > > > > > >>
> > > > > > > >> 50,000 threads is a LOT of threads.
> > > > > > > >>
> > > > > > > >>> We monitor the open file using *watch "sudo ls /proc/`cat
> > > > > > > >>> /var/run/tomcat8.pid`/fd/ | wc -l" *the number of tomcat
> open
> > > > files
> > > > > > > keeps
> > > > > > > >>> increasing slowing the responses. the only option to
> recover
> > > from
> > > > > > this
> > > > > > > is
> > > > > > > >>> to restart tomcat.
> > > > > > > >>
> > > > > > > >> So this looks like Linux (/proc filesystem). Linux kernels
> > have
> > > a
> > > > > > 16-bit
> > > > > > > >> pid space which means a theoretical max pid of 65535. In
> > > practice,
> > > > > the
> > > > > > > >> max pid is actually to be found here:
> > > > > > > >>
> > > > > > > >> $ cat /proc/sys/kernel/pid_max
> > > > > > > >> 32768
> > > > > > > >>
> > > > > > > >> (on my Debian Linux system, 4.9.0-era kernel)
> > > > > > > >>
> > > > > > > >> Each thread takes a pid. 50k threads means more than the
> > maximum
> > > > > > allowed
> > > > > > > >> on the OS. So you will eventually hit some kind of serious
> > > problem
> > > > > > with
> > > > > > > >> that many threads.
> > > > > > > >>
> > > > > > > >> How many fds do you get in the process before Tomcat grinds
> > to a
> > > > > halt?
> > > > > > > >> What does the CPU usage look like? The process I/O? Disk
> > usage?
> > > > What
> > > > > > > >> does a thread dump look like (if you have the disk space to
> > dump
> > > > > it!)?
> > > > > > > >>
> > > > > > > >> Why do you need that many threads?
> > > > > > > >>
> > > > > > > >> -chris
> > > > > > > >>
> > > > > > > >>
> > > > >
> ---------------------------------------------------------------------
> > > > > > > >> To unsubscribe, e-mail: [hidden email]
> > > > > > > >> For additional commands, e-mail:
> [hidden email]
> > > > > > > >>
> > > > > > > >>
> > > > > > > >
> > > > > > >
> > > > > > >
> > > ---------------------------------------------------------------------
> > > > > > > To unsubscribe, e-mail: [hidden email]
> > > > > > > For additional commands, e-mail: [hidden email]
> > > > > > >
> > > > > > >
> > > > > >
> > > > > > --
> > > > > >
> > --------------------------------------------------------------------
> > > > > > Sun Certified Enterprise Architect 1.5
> > > > > > Sun Certified Java Programmer 1.4
> > > > > > Microsoft Certified Systems Engineer 2000
> > > > > > http://in.linkedin.com/pub/ayub-khan/a/811/b81
> > > > > > mobile:+966-502674604
> > > > > >
> > > ----------------------------------------------------------------------
> > > > > > It is proved that Hard Work and kowledge will get you close but
> > > > attitude
> > > > > > will get you there. However, it's the Love
> > > > > > of God that will put you over the top!!
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > --------------------------------------------------------------------
> > > > Sun Certified Enterprise Architect 1.5
> > > > Sun Certified Java Programmer 1.4
> > > > Microsoft Certified Systems Engineer 2000
> > > > http://in.linkedin.com/pub/ayub-khan/a/811/b81
> > > > mobile:+966-502674604
> > > >
> ----------------------------------------------------------------------
> > > > It is proved that Hard Work and kowledge will get you close but
> > attitude
> > > > will get you there. However, it's the Love
> > > > of God that will put you over the top!!
> > > >
> > >
> >
> >
> > --
> > --------------------------------------------------------------------
> > Sun Certified Enterprise Architect 1.5
> > Sun Certified Java Programmer 1.4
> > Microsoft Certified Systems Engineer 2000
> > http://in.linkedin.com/pub/ayub-khan/a/811/b81
> > mobile:+966-502674604
> > ----------------------------------------------------------------------
> > It is proved that Hard Work and kowledge will get you close but attitude
> > will get you there. However, it's the Love
> > of God that will put you over the top!!
> >
>


--
--------------------------------------------------------------------
Sun Certified Enterprise Architect 1.5
Sun Certified Java Programmer 1.4
Microsoft Certified Systems Engineer 2000
http://in.linkedin.com/pub/ayub-khan/a/811/b81
mobile:+966-502674604
----------------------------------------------------------------------
It is proved that Hard Work and kowledge will get you close but attitude
will get you there. However, it's the Love
of God that will put you over the top!!

Re: NGINX + tomcat 8.0.35 (110: Connection timed out)

Christopher Schultz-2
In reply to this post by Ayub Khan
Ayub,

On 11/11/20 16:16, Ayub Khan wrote:
> I was load testing using the ec2 load balancer dns. I have increased the
> connector timeout to 6000 and also gave 32gig to the JVM of tomcat. I am
> not seeing connection timeout in nginx logs now. No errors in kernel.log I
> am not seeing any errors in tomcat catalina.out.

The timeouts are most likely related to the connection timeout (and
therefore keepalive) setting. If you are proxying connections from nginx
and they should be staying open, you should really never be experiencing
a timeout between nginx and Tomcat.

> During regular operations when the request count is between 4 to 6k
> requests per minute the open files count for the tomcat process is between
> 200 to 350. Responses from tomcat are within 5 seconds.

Good.

> If the requests count goes beyond 6.5 k open files slowly move up  to 2300
> to 3000 and the request responses from tomcat become slow.

This is pretty important, here. You are measuring two things:

1. Rise in file descriptor count
2. Application slowness

You are assuming that #1 is causing #2. It's entirely possible that #2
is causing #1.
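
To make the "#2 causes #1" direction concrete, here is a rough back-of-the-envelope sketch using Little's law (average requests in flight = arrival rate x average response time). The 6.5k requests per minute figure is from your message; the two response times are illustrative assumptions, not measurements from your system.

```python
def in_flight(requests_per_minute: float, avg_response_seconds: float) -> float:
    """Little's law: average concurrent requests = arrival rate * time in system."""
    return requests_per_minute / 60.0 * avg_response_seconds


if __name__ == "__main__":
    # Same arrival rate, two assumed response times (fast vs. slow application).
    for latency in (0.5, 5.0):
        print(f"{latency} s per request -> {in_flight(6500, latency):.0f} requests in flight")
```

Every in-flight request holds at least one socket on the Tomcat side, so a tenfold rise in response time at a constant request rate would by itself produce roughly a tenfold rise in open descriptors.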

The real question is "why is the application slowing down". Do you see
CPU spikes? If not, check your db connections.

If your db connection pool is fully utilized (no connections available),
then you may have lots of request-processing threads sitting there waiting
on db connections. You'd see a rise in incoming connections (waiting) which
aren't making any progress, and the application seems to "slow down".
There is a snowball effect where more requests mean more waiting, and
therefore more slowness. This would manifest as slow response times
without any CPU spike.

You could also have a slow database and/or some other resource such as a
downstream web service.

I would investigate those options before trying to prove that fds don't
scale on JVM or Linux (because they likely DO scale quite well).
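
If it helps to see what the descriptors actually are, something like the following sketch could break the count down by type. It assumes a Linux /proc filesystem and enough privileges to read the Tomcat process's fd directory; the category names and the `fd_breakdown` helper are my own invention for illustration.

```python
import os
from collections import Counter


def classify(target: str) -> str:
    """Map a /proc/<pid>/fd symlink target to a coarse category."""
    if target.startswith("socket:"):
        return "socket"        # network connections (nginx, db, Elasticsearch, ...)
    if target.startswith("pipe:"):
        return "pipe"
    if target.startswith("anon_inode:"):
        return "anon_inode"    # e.g. epoll instances used by NIO selectors
    if target.endswith(".jar"):
        return "jar"
    return "file"


def fd_breakdown(pid: int) -> Counter:
    """Count the open descriptors of a process by category."""
    fd_dir = f"/proc/{pid}/fd"
    counts = Counter()
    for name in os.listdir(fd_dir):
        try:
            counts[classify(os.readlink(os.path.join(fd_dir, name)))] += 1
        except OSError:
            continue  # descriptor was closed between listdir and readlink
    return counts
```

Calling `fd_breakdown(pid)` with the pid from your tomcat8.pid file a few times during a load test should show whether the growth is in sockets (connections piling up) or in something else.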

> I am not concerned about high open files as I do not see any errors related
> to open files. Only side effect of  open files going above 700 is the
> response from tomcat is slow. I checked if this is caused from elastic
> search, aws cloud watch shows elastic search response is within 5
> milliseconds.
>
> what might be the reason that when the open files goes beyond 600, it slows
> down the response time for tomcat. I tried with tomcat 9 and it's the same
> behavior

You might want to add some debug logging to your application when
getting ready to contact e.g. a database or remote service. Something like:

[timestamp] [thread-id] DEBUG Making call to X
[timestamp] [thread-id] DEBUG Completed call to X

or

[timestamp] [thread-id] DEBUG Call to X took [duration]ms

Then have a look at all those logs when the application slows down and
see if you can observe a significant jump in the time-to-complete those
operations.
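
If you go with the two-line Making/Completed form, a small script can pair the lines up and compute the durations for you. This is only a sketch: it assumes the timestamp is ISO-8601 (e.g. 2020-11-12T16:59:01.123) and the thread id is in square brackets, so adjust the pattern to whatever your logger actually writes.

```python
import re
from datetime import datetime

# Assumed line layout: <iso-timestamp> [<thread-id>] DEBUG Making|Completed call to <target>
LINE = re.compile(
    r"^(?P<ts>\S+) \[(?P<thread>[^\]]+)\] DEBUG "
    r"(?P<event>Making|Completed) call to (?P<target>.+)$"
)


def call_durations(lines):
    """Yield (target, thread, milliseconds) for each Making/Completed pair."""
    started = {}  # (thread, target) -> timestamp of the "Making call" line
    for line in lines:
        m = LINE.match(line.strip())
        if not m:
            continue  # not one of our debug lines
        key = (m["thread"], m["target"])
        ts = datetime.fromisoformat(m["ts"])
        if m["event"] == "Making":
            started[key] = ts
        elif key in started:
            delta = ts - started.pop(key)
            yield m["target"], m["thread"], delta.total_seconds() * 1000.0
```

Feeding it the log file (for example `call_durations(open("app.log"))`, where app.log is a placeholder name) and sorting by the last field should show quickly which downstream call, if any, gets slower under load.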

Hope that helps,
-chris

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: NGINX + tomcat 8.0.35 (110: Connection timed out)

Ayub Khan
Chris,

I am using HikariCP connection pooling and the maximum pool size is set to
100, without specifying minimum idle connections. Even during high load I
see that more than 80 connections are idle.

I have set up debug statements to print the total time taken to complete
the request. The response time of a completed call under load is around 5
seconds; without load it is around 400 to 500 milliseconds.

During the load I cannot even access a static HTML page.








Re: NGINX + tomcat 8.0.35 (110: Connection timed out)

Ayub Khan
Chris,

I am using HikariCP connection pooling and the maximum pool size is set to
100, without specifying minimum idle connections. Even during high load I
see that more than 80 connections are idle.

I have set up debug statements to print the total time taken to complete
the request. The response time of a completed call under load is around 5
seconds; without load it is around 400 to 500 milliseconds.

During the load I cannot even access a static HTML page.

Using JMeter, I sent 1500 requests to the AWS Elastic Load Balancer, which
had only one VM instance behind it (nginx --> Tomcat on the same VM).
Tomcat consumed a total of 30 GB of memory and CPU was at 28% t




Re: NGINX + tomcat 8.0.35 (110: Connection timed out)

Christopher Schultz-2
In reply to this post by Ayub Khan
Ayub,

On 11/12/20 10:47, Ayub Khan wrote:
> Chris,
>
> I am using hikaricp connection pooling and the maximum pool size is set to
> 100, without specifying minimum idle connections. Even during high load I
> see there are more than 80 connections in idle state.
>
> I have setup debug statements to print the total time taken to complete the
> request. The response time of completed call during load is around 5
> seconds, the response time without load is around 400 to 500 milliseconds

That's a significant difference. Is your database server showing high
CPU usage or more I/O usage during those high-load times?

> During the load I cannot even access static html page

Now *that* is an interesting data point.

You are sure that the "static" request doesn't hit any other resources?
No filter is doing anything? No logging to an external service or
double-checking any security constraints in the db before serving the page?

(And the static page is being returned by Tomcat, not nginx, right?)

-chris




Re: NGINX + tomcat 8.0.35 (110: Connection timed out)

Ayub Khan
Chris,

That's correct, it's just a plain static hello world page I created to
verify Tomcat. It is served by Tomcat. I have bundled this page in the same
context where the service is running. When I create load on the service and
then try to access the static hello world page, the browser keeps waiting
and does not return the page.

I checked the database dashboard and the monitoring charts are normal, with
no spikes in CPU or any other database resource. The delay is noticeable
when there are more than 1000 concurrent requests from each of 4 different
JMeter test instances.

Why does Tomcat not even serve the HTML page?




On Thu, Nov 12, 2020 at 7:01 PM Christopher Schultz <
[hidden email]> wrote:

> Ayub,
>
> On 11/12/20 10:47, Ayub Khan wrote:
> > Chris,
> >
> > I am using hikaricp connection pooling and the maximum pool size is set
> to
> > 100, without specifying minimum idle connections. Even during high load I
> > see there are more than 80 connections in idle state.
> >
> > I have setup debug statements to print the total time taken to complete
> the
> > request. The response time of completed call during load is around 5
> > seconds, the response time without load is around 400 to 500 milliseconds
>
> That's a significant difference. Is your database server showing high
> CPU usage or more I/O usage during those high-load times?
>
> > During the load I cannot even access static html page
>
> Now *that* is an interesting data point.
>
> You are sure that the "static" request doesn't hit any other resources?
> No filter is doing anything? No logging to an external service or
> double-checking any security constraints in the db before serving the page?
>
> (And the static page is being returned by Tomcat, not nginx, right?)
>
> -chris
>
> > On Thu, Nov 12, 2020 at 4:59 PM Christopher Schultz <
> > [hidden email]> wrote:
> >
> >> Ayub,
> >>
> >> On 11/11/20 16:16, Ayub Khan wrote:
> >>> I was load testing using the ec2 load balancer dns. I have increased
> the
> >>> connector timeout to 6000 and also gave 32gig to the JVM of tomcat. I
> am
> >>> not seeing connection timeout in nginx logs now. No errors in
> kernel.log
> >> I
> >>> am not seeing any errors in tomcat catalina.out.
> >>
> >> The timeouts are most likely related to the connection timeout (and
> >> therefore keepalive) setting. If you are proxying connections from nginx
> >> and they should be staying open, you should really never be experiencing
> >> a timeout between nginx and Tomcat.
> >>
> >>> During regular operations when the request count is between 4 to 6k
> >>> requests per minute the open files count for the tomcat process is between
> >>> 200 to 350. Responses from tomcat are within 5 seconds.
> >>
> >> Good.
> >>
> >>> If the request count goes beyond 6.5k, open files slowly move up to 2300
> >>> to 3000 and the request responses from tomcat become slow.
> >>
> >> This is pretty important, here. You are measuring two things:
> >>
> >> 1. Rise in file descriptor count
> >> 2. Application slowness
> >>
> >> You are assuming that #1 is causing #2. It's entirely possible that #2
> >> is causing #1.
> >>
> >> The real question is "why is the application slowing down". Do you see
> >> CPU spikes? If not, check your db connections.
> >>
> >> If your db connection pool is fully-utilized (no more available), then
> >> you may have lots of request processing threads sitting there waiting on
> >> db connections. You'd see a rise in incoming connections (waiting) which
> >> aren't making any progress, and the application seems to "slow down",
> >> and there is a snowball effect where more requests means more waiting,
> >> and therefore more slowness. This would manifest as slow response times
> >> without any CPU spike.
> >>
> >> You could also have a slow database and/or some other resource such as a
> >> downstream web service.
> >>
> >> I would investigate those options before trying to prove that fds don't
> >> scale on JVM or Linux (because they likely DO scale quite well).
> >>
> >>> I am not concerned about high open files as I do not see any errors related
> >>> to open files. Only side effect of open files going above 700 is the
> >>> response from tomcat is slow. I checked if this is caused from elastic
> >>> search, aws cloud watch shows elastic search response is within 5
> >>> milliseconds.
> >>>
> >>> what might be the reason that when the open files goes beyond 600, it slows
> >>> down the response time for tomcat. I tried with tomcat 9 and it's the same
> >>> behavior
> >>
> >> You might want to add some debug logging to your application when
> >> getting ready to contact e.g. a database or remote service. Something like:
> >>
> >> [timestamp] [thread-id] DEBUG Making call to X
> >> [timestamp] [thread-id] DEBUG Completed call to X
> >>
> >> or
> >>
> >> [timestamp] [thread-id] DEBUG Call to X took [duration]ms
> >>
> >> Then have a look at all those logs when the application slows down and
> >> see if you can observe a significant jump in the time-to-complete those
> >> operations.
> >>
> >> Hope that helps,
> >> -chris
> >>
>
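
The per-call timing lines suggested in the quoted reply ("Call to X took [duration]ms") could be produced with a small wrapper around each outbound call. This is only a minimal sketch, assuming plain System.out logging rather than whatever logging framework the application really uses; the Timed class, its call method and the "fake-search" label are hypothetical names, not something from this thread.

```java
import java.util.function.Supplier;

public class Timed {
    /** Runs the supplied work, logs how long it took, and returns its result. */
    public static <T> T call(String name, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            // Logged even when the call throws, so slow failures show up too.
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.printf("[%d] [%s] DEBUG Call to %s took %dms%n",
                    System.currentTimeMillis(),
                    Thread.currentThread().getName(), name, ms);
        }
    }

    public static void main(String[] args) {
        // Stand-in for a database or Elasticsearch call.
        String result = call("fake-search", () -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "ok";
        });
        System.out.println(result);
    }
}
```

Wrapping the pool checkout and the downstream call separately would show whether the time goes into waiting for a connection or into the call itself.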

--
--------------------------------------------------------------------
Sun Certified Enterprise Architect 1.5
Sun Certified Java Programmer 1.4
Microsoft Certified Systems Engineer 2000
http://in.linkedin.com/pub/ayub-khan/a/811/b81
mobile:+966-502674604
----------------------------------------------------------------------
It is proved that Hard Work and knowledge will get you close but attitude
will get you there. However, it's the Love
of God that will put you over the top!!

Re: NGINX + tomcat 8.0.35 (110: Connection timed out)

Christopher Schultz-2
Ayub,

On 11/12/20 11:20, Ayub Khan wrote:

> Chris,
>
> That's correct, it's just a plain static hello world page I created to
> verify tomcat. It is served by tomcat. I have bundled this page in the same
> context where the service is running. When I create load on the service and
> then try to access the static hello world page, the browser keeps waiting and
> does not return the page.
>
> I checked the database dashboard and the monitoring charts are normal, with
> no spikes in CPU or any other database resource. The delay is noticeable
> when there are more than 1000 concurrent requests from each of 4 different
> JMeter test instances.

That's 4000 concurrent requests. Your <Connector> only has 2000 threads,
so only 2000 requests can be processed simultaneously.

You have a keepalive timeout of 6 seconds (6000ms) and I'm guessing your
load test doesn't actually use KeepAlive.

> Why does Tomcat not even serve the HTML page?

I think the keepalive timeout explains what you are seeing.

Are you instructing JMeter to re-use connections and also use KeepAlive?

What happens if you set the KeepAlive timeout to 1 second instead of 6?
Does that improve things?

-chris
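
The arithmetic behind that observation can be sketched in a few lines. This is only a rough illustration, assuming the 2000-thread connector and the 4 x 1000 JMeter load mentioned above, plus the 16k requests per minute and the 5 second versus roughly 450 millisecond response times reported earlier in the thread. The class and method names are hypothetical, and the model ignores keep-alive and queueing details.

```java
public class ConnectorCapacity {
    /** Little's law: average requests in flight = arrival rate (req/s) * response time (s). */
    public static double inFlight(double requestsPerSecond, double responseSeconds) {
        return requestsPerSecond * responseSeconds;
    }

    /** Upper bound on throughput (req/s) when every worker thread is busy. */
    public static double maxThroughput(int maxThreads, double responseSeconds) {
        return maxThreads / responseSeconds;
    }

    public static void main(String[] args) {
        int maxThreads = 2000;   // thread count quoted in this reply
        int offered = 4 * 1000;  // 4 JMeter instances x 1000 concurrent requests

        System.out.println("requests left waiting for a thread: "
                + Math.max(0, offered - maxThreads));
        System.out.printf("throughput ceiling at 5 s per response: %.0f req/s%n",
                maxThroughput(maxThreads, 5.0));
        System.out.printf("throughput ceiling at 0.45 s per response: %.0f req/s%n",
                maxThroughput(maxThreads, 0.45));
        System.out.printf("requests in flight at 16k req/min and 5 s: %.0f%n",
                inFlight(16000 / 60.0, 5.0));
    }
}
```

With a fixed thread pool the throughput ceiling is inversely proportional to response time, so a slow downstream call lowers capacity even when the CPU is idle.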



Re: NGINX + tomcat 8.0.35 (110: Connection timed out)

Ayub Khan
Chris,

I have changed the connector config as shown below, and it has improved
performance. I want to use this config to support at least 20k concurrent
requests. I tested this config and noticed a delay in the response that
comes from Elasticsearch. I am trying to increase the number of
Elasticsearch replicas to improve performance. Could you please verify
whether the connector config below is good enough, leaving Elasticsearch
tuning aside?

<Connector port="8080"
               protocol="org.apache.coyote.http11.Http11NioProtocol"
               connectionTimeout="1000" maxConnections="40000"
               maxThreads="40000" processorCache="2000" minSpareThreads="4000"
               maxKeepAliveRequests="4000"
               URIEncoding="UTF-8"
               redirectPort="8443" />
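
One cost of a pool this large that is easy to estimate is the address space reserved for thread stacks. The sketch below is a rough illustration only, assuming a 1 MiB stack per thread (a common HotSpot default on 64-bit Linux, adjustable with -Xss). The actual stack size on this server is not stated in the thread, and reserved stack space is not the same as memory actually in use. The class name is hypothetical.

```java
public class ThreadStackEstimate {
    /** Address space reserved for thread stacks, in MiB. */
    public static long reservedStackMiB(int threads, long stackKiB) {
        return threads * stackKiB / 1024;
    }

    public static void main(String[] args) {
        long stackKiB = 1024; // assumption: equivalent to -Xss1m

        // Values taken from the connector attributes above.
        System.out.println("minSpareThreads=4000 reserves about "
                + reservedStackMiB(4000, stackKiB) + " MiB");
        System.out.println("maxThreads=40000 reserves about "
                + reservedStackMiB(40000, stackKiB) + " MiB");
    }
}
```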


