Tomcat Clustering Support


Tomcat Clustering Support

Scott Evans
Hi,

Our system is on Apache Tomcat Version 8.0.47.
OS is Windows Server 2012 R2 Datacenter.

We are looking for someone that may be interested in paid contract work to
assist with troubleshooting and resolving a Tomcat clustering issue in our
system.

The system is composed of multiple Java PrimeFaces applications running in
a clustered Tomcat environment which is experiencing occasional
deadlocking issues from an unknown source requiring the Nodes to be cycled
in order to resolve.  The issue is only occurring in our Production
environment and we've determined that the issues are occurring at random
with the replication threads.

We would need someone to help investigate our configuration and determine
if there are any further changes that can be made to our system to catch
these deadlock issues before they occur (requiring a Node cycle).

Please let me know if you or someone you know may be interested or if you
have further questions I can help answer.

Thanks,
Scott

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: Tomcat Clustering Support

markt
On 15/08/18 20:43, Scott Evans wrote:

> Hi,
>
> Our system is on Apache Tomcat Version 8.0.47.
> OS is Windows Server 2012 R2 Datacenter.
>
> We are looking for someone that may be interested in paid contract work to
> assist with troubleshooting and resolving a Tomcat clustering issue in our
> system.
>
> The system is composed of multiple Java PrimeFaces applications running in
> a clustered Tomcat environment which is experiencing occasional
> deadlocking issues from an unknown source requiring the Nodes to be cycled
> in order to resolve.  The issue is only occurring in our Production
> environment and we've determined that the issues are occurring at random
> with the replication threads.
>
> We would need someone to help investigate our configuration and determine
> if there are any further changes that can be made to our system to catch
> these deadlock issues before they occur (requiring a Node cycle).
>
> Please let me know if you or someone you know may be interested or if you
> have further questions I can help answer.

If you can provide a thread dump of the deadlock when it occurs we can
probably help you here for free.
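
For anyone following along, one way to capture only the deadlocked threads from inside the JVM is the standard java.lang.management API. The sketch below is illustrative only: the class name DeadlockReport is made up, and it has to run inside the Tomcat JVM (for example from a small diagnostic servlet, which is an assumption about how it would be deployed). Running jstack from the JDK against the Tomcat process is an alternative that needs no code.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not part of Tomcat.
class DeadlockReport {

    /** Describes threads the JVM considers deadlocked; empty string if none. */
    static String report() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Returns null when no monitor or ownable-synchronizer deadlock exists.
        long[] ids = mx.findDeadlockedThreads();
        if (ids == null) {
            return "";
        }
        StringBuilder sb = new StringBuilder();
        // true, true: also collect locked monitors and locked synchronizers.
        for (ThreadInfo info : mx.getThreadInfo(ids, true, true)) {
            if (info == null) {
                continue; // the thread ended between the two calls
            }
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState())
              .append(" waiting on ").append(info.getLockName())
              .append(" held by ").append(info.getLockOwnerName())
              .append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("\tat ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String r = report();
        System.out.println(r.isEmpty() ? "no deadlocked threads found" : r);
    }
}
```

Note that this only reports true lock cycles. Threads that are merely stuck waiting on network I/O or a timeout will not show up here, so a full thread dump is still the more complete picture.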

Mark


Re: [OT] Tomcat Clustering Support

Christopher Schultz-2

Scott,

This list is typically for non-paid support requests, but there really
isn't a great forum for finding Tomcat consultants... other than
Google I suppose.

So I'm marking this Off-Topic and replying below.

On 8/15/18 3:43 PM, Scott Evans wrote:

> Our system is on Apache Tomcat Version 8.0.47. OS is Windows Server
> 2012 R2 Datacenter.
>
> We are looking for someone that may be interested in paid contract
> work to assist with troubleshooting and resolving a Tomcat
> clustering issue in our system.
>
> The system is composed of multiple Java PrimeFaces applications
> running in a clustered Tomcat environment which is experiencing
> occasional deadlocking issues from an unknown source requiring the
> Nodes to be cycled in order to resolve.  The issue is only
> occurring in our Production environment and we've determined that
> the issues are occurring at random with the replication threads.

Is it feasible to disable clustering (session replication) completely
in your environment to determine whether the replication itself is the
problem, or perhaps the problem is elsewhere? Disabling clustering
should only degrade your service if (a) you aren't using
session-stickiness (which pretty much everyone should be using) and
(b) you expect a lot of fail-over and (c) it is unacceptable to have a
failed-over user lose their session information.

> We would need someone to help investigate our configuration and
> determine if there are any further changes that can be made to our
> system to catch these deadlock issues before they occur (requiring
> a Node cycle).

Can you post your cluster configuration from conf/server.xml, minus
any secrets that may be in there? If all configs are the same on each
node except for e.g. IP lists, feel free to just post a single one and
mention that the IPs are different.

It would be good to know how many nodes are clustered and the
replication strategy (e.g. delta versus backup). The strategy will be
obvious from the configuration. You'll need to tell us the size of the
cluster... and perhaps the deployment model such as "all servers in
the same DC" or "50/50 split between primary DC and secondary DC", etc.

> Please let me know if you or someone you know may be interested or
> if you have further questions I can help answer.

-chris

RE: [OT] Tomcat Clustering Support

Scott Evans
Scott Evans, PMP
Senior Manager
[hidden email]

GUILFORD GROUP
business driven software solutions
P 317.814.1035 ext. 222 F 317.814.1044
615 West Carmel Drive, Suite 130
Carmel, IN 46032

-----Original Message-----
From: Christopher Schultz <[hidden email]>
Sent: Wednesday, August 15, 2018 3:55 PM
To: [hidden email]
Subject: Re: [OT] Tomcat Clustering Support


Please see server.xml contents below.

--We have 4 nodes in the clustered environment
--All nodes are in the same DC

---start server.xml---
<?xml version='1.0' encoding='utf-8'?>
<!--
  Licensed to the Apache Software Foundation (ASF) under one or more
  contributor license agreements.  See the NOTICE file distributed with
  this work for additional information regarding copyright ownership.
  The ASF licenses this file to You under the Apache License, Version 2.0
  (the "License"); you may not use this file except in compliance with
  the License.  You may obtain a copy of the License at

      http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
-->
<!-- Note:  A "Server" is not itself a "Container", so you may not
     define subcomponents such as "Valves" at this level.
     Documentation at /docs/config/server.html
 -->
<Server port="8005" shutdown="SHUTDOWN">
  <Listener className="org.apache.catalina.startup.VersionLoggerListener" />
  <!-- Security listener. Documentation at /docs/config/listeners.html
  <Listener className="org.apache.catalina.security.SecurityListener" />
  -->
  <!--APR library loader. Documentation at /docs/apr.html -->
  <Listener className="org.apache.catalina.core.AprLifecycleListener" SSLEngine="on" />
  <!-- Prevent memory leaks due to use of particular java/javax APIs-->
  <Listener className="org.apache.catalina.core.JreMemoryLeakPreventionListener" />
  <Listener className="org.apache.catalina.mbeans.GlobalResourcesLifecycleListener" />
  <Listener className="org.apache.catalina.core.ThreadLocalLeakPreventionListener" />

  <!-- Global JNDI resources
       Documentation at /docs/jndi-resources-howto.html
  -->
  <GlobalNamingResources>
    <!-- Editable user database that can also be used by
         UserDatabaseRealm to authenticate users
    -->
    <Resource name="UserDatabase" auth="Container"
              type="org.apache.catalina.UserDatabase"
              description="User database that can be updated and saved"
              factory="org.apache.catalina.users.MemoryUserDatabaseFactory"
              pathname="conf/tomcat-users.xml" />




<Resource
        type="javax.sql.DataSource"
        name="jdbc/M2_datasource"
        factory="org.apache.tomcat.jdbc.pool.DataSourceFactory"
        driverClassName="com.ibm.as400.access.AS400JDBCDriver"

        url="jdbc:as400://AS400A;naming=system;libraries="
        username=""
        password=""

        validationQuery="SELECT 1 from SYSIBM.SYSDUMMY1"
        validationInterval="30000"

        testWhileIdle="true"
        testOnBorrow="false"
        testOnReturn="false"

        maxActive="400"
        minIdle="0"
        maxIdle="5"
        maxWait="30000"
        initialSize="0"

        removeAbandoned="true"
        removeAbandonedTimeout="60"

        minEvictableIdleTimeMillis="300000"

        logAbandoned="true"
/>


<Resource
        type="javax.sql.DataSource"
        name="jdbc/M1_M3_X1_datasource"
        factory="org.apache.tomcat.jdbc.pool.DataSourceFactory"
        driverClassName="com.ibm.as400.access.AS400JDBCDriver"

        url="jdbc:as400://AS400A;naming=system;"
        username=""
        password=""

        validationQuery="SELECT 1 from SYSIBM.SYSDUMMY1"
        validationInterval="30000"

        testWhileIdle="true"
        testOnBorrow="false"
        testOnReturn="false"

        maxActive="400"
        minIdle="0"
        maxIdle="5"
        maxWait="30000"
        initialSize="0"

        removeAbandoned="true"
        removeAbandonedTimeout="60"

        minEvictableIdleTimeMillis="300000"

        logAbandoned="true"
/>


<Resource
        type="javax.sql.DataSource"
        name="jdbc/keycloak_datasource"
        factory="org.apache.tomcat.jdbc.pool.DataSourceFactory"
        driverClassName="com.mysql.jdbc.Driver"

        url="jdbc:mysql://t"
        username=""
        password=""

        validationQuery="/* ping */ SELECT 1"
        validationInterval="30000"

        testWhileIdle="true"
        testOnBorrow="false"
        testOnReturn="false"

        maxActive="10"
        minIdle="0"
        maxIdle="1"
        maxWait="30000"
        initialSize="0"

        removeAbandoned="true"
        removeAbandonedTimeout="60"

        minEvictableIdleTimeMillis="300000"

        logAbandoned="true"
/>




  </GlobalNamingResources>

  <!-- A "Service" is a collection of one or more "Connectors" that share
       a single "Container" Note:  A "Service" is not itself a "Container",
       so you may not define subcomponents such as "Valves" at this level.
       Documentation at /docs/config/service.html
   -->
  <Service name="Catalina">

    <!--The connectors can use a shared executor, you can define one or more
        named thread pools-->
    <!--
    <Executor name="tomcatThreadPool" namePrefix="catalina-exec-"
        maxThreads="150" minSpareThreads="4"/>
    -->


    <!-- A "Connector" represents an endpoint by which requests are received
         and responses are returned. Documentation at :
         Java HTTP Connector: /docs/config/http.html (blocking & non-blocking)
         Java AJP  Connector: /docs/config/ajp.html
         APR (HTTP/AJP) Connector: /docs/apr.html
         Define a non-SSL/TLS HTTP/1.1 Connector on port 8080
    -->
    <Connector port="8080" protocol="HTTP/1.1"
               connectionTimeout="20000"
               redirectPort="8443" />
    <!-- A "Connector" using the shared thread pool-->
    <!--
    <Connector executor="tomcatThreadPool"
               port="8080" protocol="HTTP/1.1"
               connectionTimeout="20000"
               redirectPort="8443" />
    -->
    <!-- Define a SSL/TLS HTTP/1.1 Connector on port 8443
         This connector uses the NIO implementation that requires the JSSE
         style configuration. When using the APR/native implementation, the
         OpenSSL style configuration is required as described in the APR/native
         documentation -->
    <!--
    <Connector port="8443" protocol="org.apache.coyote.http11.Http11NioProtocol"
               maxThreads="150" SSLEnabled="true" scheme="https" secure="true"
               clientAuth="false" sslProtocol="TLS" />
    -->

    <!-- Define an AJP 1.3 Connector on port 8009 -->
    <Connector port="8009"
               protocol="AJP/1.3"
               redirectPort="8443"
               maxPostSize="16777216"
               maxSavePostSize="-1"
               connectionTimeout="120000"
               keepAliveTimeout="120000"
               acceptorThreadCount="2" maxThreads="1024" acceptCount="200"
               minSpareThreads="20" />


    <!-- An Engine represents the entry point (within Catalina) that processes
         every request.  The Engine implementation for Tomcat stand alone
         analyzes the HTTP headers included with the request, and passes them
         on to the appropriate Host (virtual host).
         Documentation at /docs/config/engine.html -->

    <!-- You should set jvmRoute to support load-balancing via AJP ie :
    <Engine name="Catalina2" defaultHost="localhost" jvmRoute="jvm1">
    -->
    <Engine name="Catalina" defaultHost="localhost" jvmRoute="tomcatnode3">

      <Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster"
               channelSendOptions="10" channelStartOptions="3">

        <Manager className="org.apache.catalina.ha.session.BackupManager"
                 expireSessionsOnShutdown="false"
                 notifyListenersOnReplication="true"
                 mapSendOptions="10" />

        <Channel className="org.apache.catalina.tribes.group.GroupChannel">
          <Membership className="org.apache.catalina.tribes.membership.McastService"
                      address="228.1.0.13"
                      port="45522"
                      frequency="500"
                      dropTime="15000"
                      soTimeout="10000"
                      domain="mercer" />
          <Receiver className="org.apache.catalina.tribes.transport.nio.NioReceiver"
                    address="10.255.250.34"
                    port="4003"
                    selectorTimeout="8000"
                    maxThreads="25" />

          <Sender className="org.apache.catalina.tribes.transport.ReplicationTransmitter">
            <Transport className="org.apache.catalina.tribes.transport.nio.PooledParallelSender"
                       timeout="8000" poolSize="25" />
          </Sender>
          <Interceptor className="org.apache.catalina.tribes.group.interceptors.TcpPingInterceptor" />
          <Interceptor className="org.apache.catalina.tribes.group.interceptors.TcpFailureDetector" />
          <Interceptor className="org.apache.catalina.tribes.group.interceptors.MessageDispatch15Interceptor" />

          <Interceptor className="org.apache.catalina.tribes.group.interceptors.StaticMembershipInterceptor">

            <LocalMember className="org.apache.catalina.tribes.membership.StaticMember"
                         domain="mercer"
                         uniqueId="{5,6,7,0,1,2,3,4,0,0,0,0,0,0,1,0}" />

            <Member className="org.apache.catalina.tribes.membership.StaticMember"
                    port="4001"
                    host="10.255.250.35"
                    domain="mercer"
                    uniqueId="{5,6,7,0,1,2,3,4,0,0,0,0,0,0,0,0}" />

            <Member className="org.apache.catalina.tribes.membership.StaticMember"
                    port="4002"
                    host="10.255.250.35"
                    domain="mercer"
                    uniqueId="{5,6,7,0,1,2,3,4,0,0,0,0,0,0,0,1}" />

            <Member className="org.apache.catalina.tribes.membership.StaticMember"
                    port="4004"
                    host="10.255.250.34"
                    domain="mercer"
                    uniqueId="{5,6,7,0,1,2,3,4,0,0,0,0,0,0,1,1}" />

          </Interceptor>
        </Channel>

        <Valve className="org.apache.catalina.ha.tcp.ForceReplicationValve" />
        <Valve className="org.apache.catalina.ha.tcp.ReplicationValve"
               filter=".*javax\.faces\.resource.*" />
        <Valve className="org.apache.catalina.ha.session.JvmRouteBinderValve" />

        <Deployer className="org.apache.catalina.ha.deploy.FarmWarDeployer"
                  tempDir="/tmp/war-temp/"
                  deployDir="/tmp/war-deploy/"
                  watchDir="/tmp/war-listen/"
                  watchEnabled="false" />

        <!--<ClusterListener className="org.apache.catalina.ha.session.ClusterSessionListener"/>-->
      </Cluster>


      <!-- Use the LockOutRealm to prevent attempts to guess user passwords
           via a brute-force attack -->
      <Realm className="org.apache.catalina.realm.LockOutRealm">
        <!-- This Realm uses the UserDatabase configured in the global JNDI
             resources under the key "UserDatabase".  Any edits
             that are performed against this UserDatabase are immediately
             available for use by the Realm.  -->
        <Realm className="org.apache.catalina.realm.UserDatabaseRealm"
               resourceName="UserDatabase"/>
      </Realm>

      <Host name="localhost"  appBase="webapps"
            unpackWARs="true" autoDeploy="true">

        <!-- SingleSignOn valve, share authentication between web applications
             Documentation at: /docs/config/valve.html -->
        <!--
        <Valve className="org.apache.catalina.authenticator.SingleSignOn" />
        -->

        <!-- Access log processes all example.
             Documentation at: /docs/config/valve.html
             Note: The pattern used is equivalent to using pattern="common" -->
        <Valve className="org.apache.catalina.valves.AccessLogValve" directory="logs"
               prefix="localhost_access_log" suffix=".txt"
               pattern="%h %l %u %t &quot;%r&quot; %s %b" />
        <Context docBase="C:/tomcatnode3/logs" path="/tomcatnode3_logs"
                 crossContext="false" debug="0" reloadable="true" privileged="true" />
      </Host>
    </Engine>
  </Service>
</Server>
-----end server.xml-------


Re: [OT] Tomcat Clustering Support

Christopher Schultz-2

Scott,

I'm no Tomcat-clustering expert, but...

On 8/28/18 13:59, Scott Evans wrote:

> <Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster"
>          channelSendOptions="10" channelStartOptions="3">
>
>   <Manager className="org.apache.catalina.ha.session.BackupManager"
>            expireSessionsOnShutdown="false"
>            notifyListenersOnReplication="true" mapSendOptions="10" />
>
>   <Channel className="org.apache.catalina.tribes.group.GroupChannel">
>     <Membership className="org.apache.catalina.tribes.membership.McastService"
>                 address="228.1.0.13" port="45522" frequency="500"
>                 dropTime="15000" soTimeout="10000" domain="mercer"/>
>     <Receiver className="org.apache.catalina.tribes.transport.nio.NioReceiver"
>               address="10.255.250.34" port="4003" selectorTimeout="8000"
>               maxThreads="25" />
>
>     <Sender className="org.apache.catalina.tribes.transport.ReplicationTransmitter">
>       <Transport className="org.apache.catalina.tribes.transport.nio.PooledParallelSender"
>                  timeout="8000" poolSize="25" />
>     </Sender>
>     <Interceptor className="org.apache.catalina.tribes.group.interceptors.TcpPingInterceptor"/>
>     <Interceptor className="org.apache.catalina.tribes.group.interceptors.TcpFailureDetector"/>
>     <Interceptor className="org.apache.catalina.tribes.group.interceptors.MessageDispatch15Interceptor"/>
>
>     <Interceptor className="org.apache.catalina.tribes.group.interceptors.StaticMembershipInterceptor">
>
>       <LocalMember className="org.apache.catalina.tribes.membership.StaticMember"
>                    domain="mercer" uniqueId="{5,6,7,0,1,2,3,4,0,0,0,0,0,0,1,0}"/>
>
>       <Member className="org.apache.catalina.tribes.membership.StaticMember"
>               port="4001" host="10.255.250.35" domain="mercer"
>               uniqueId="{5,6,7,0,1,2,3,4,0,0,0,0,0,0,0,0}" />
>
>       <Member className="org.apache.catalina.tribes.membership.StaticMember"
>               port="4002" host="10.255.250.35" domain="mercer"
>               uniqueId="{5,6,7,0,1,2,3,4,0,0,0,0,0,0,0,1}" />
>
>       <Member className="org.apache.catalina.tribes.membership.StaticMember"
>               port="4004" host="10.255.250.34" domain="mercer"
>               uniqueId="{5,6,7,0,1,2,3,4,0,0,0,0,0,0,1,1}" />
>
>     </Interceptor>
>   </Channel>
>
>   <Valve className="org.apache.catalina.ha.tcp.ForceReplicationValve" />
>   <Valve className="org.apache.catalina.ha.tcp.ReplicationValve"
>          filter=".*javax\.faces\.resource.*"/>
>   <Valve className="org.apache.catalina.ha.session.JvmRouteBinderValve"/>
>
>   <Deployer className="org.apache.catalina.ha.deploy.FarmWarDeployer"
>             tempDir="/tmp/war-temp/" deployDir="/tmp/war-deploy/"
>             watchDir="/tmp/war-listen/" watchEnabled="false"/>
>
>   <!--<ClusterListener className="org.apache.catalina.ha.session.ClusterSessionListener"/>-->
> </Cluster>

It looks like you have both multicast AND static membership enabled.

Keiichi's presentation on Clustering at ApacheCon Miami (2017) has a
slide (it's slide #38 here:
https://events.static.linuxfound.org/sites/events/files/slides/TomcatCluster_3.pdf)
that says that using static-membership requires that you disable
multicast.
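
As a side note on the quoted attributes, channelSendOptions and channelStartOptions are bit masks, so it can help to decode them before drawing conclusions. The sketch below is only illustrative: the class ChannelOptionDecoder is made up, and the constant values are copied by hand from what I understand org.apache.catalina.tribes.Channel to define, so please check them against the Javadoc for your Tomcat version. Flags not listed here are reported as UNKNOWN.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; the flag values are assumptions to verify against
// the org.apache.catalina.tribes.Channel Javadoc.
class ChannelOptionDecoder {

    static final Map<String, Integer> SEND = new LinkedHashMap<>();
    static final Map<String, Integer> START = new LinkedHashMap<>();

    static {
        SEND.put("SEND_OPTIONS_BYTE_MESSAGE", 0x0001);
        SEND.put("SEND_OPTIONS_USE_ACK", 0x0002);
        SEND.put("SEND_OPTIONS_SYNCHRONIZED_ACK", 0x0004);
        SEND.put("SEND_OPTIONS_ASYNCHRONOUS", 0x0008);

        START.put("SND_RX_SEQ", 1); // message receiver
        START.put("SND_TX_SEQ", 2); // message sender
        START.put("MBR_RX_SEQ", 4); // membership listener
        START.put("MBR_TX_SEQ", 8); // membership broadcaster
    }

    /** Lists the names of the flags set in value, in declaration order. */
    static List<String> decode(int value, Map<String, Integer> flags) {
        List<String> names = new ArrayList<>();
        int rest = value;
        for (Map.Entry<String, Integer> e : flags.entrySet()) {
            if ((value & e.getValue()) != 0) {
                names.add(e.getKey());
                rest &= ~e.getValue();
            }
        }
        if (rest != 0) {
            names.add("UNKNOWN(0x" + Integer.toHexString(rest) + ")");
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("channelSendOptions=10  -> " + decode(10, SEND));
        System.out.println("channelStartOptions=3 -> " + decode(3, START));
    }
}
```

Under those assumed values, 10 decodes to USE_ACK plus ASYNCHRONOUS, and 3 decodes to SND_RX_SEQ plus SND_TX_SEQ with neither membership flag set. If that holds for 8.0.x, the McastService element may not actually be started in this configuration, which would be worth confirming in the startup logs.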

Also, just confirming that you have two Tomcat nodes on one IP address
(10.255.250.35, ports 4001 and 4002).
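
Since static membership depends on every node listing the others consistently, a quick sanity check of the member list can rule out simple mistakes. This is a rough sketch, not anything Tomcat provides: StaticMemberCheck and its methods are hypothetical, the member data is typed in by hand from the quoted config (with the local member's host and port taken from the Receiver element), and the 16-byte rule for uniqueId is my reading of the StaticMember documentation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for checking a hand-copied static member list.
class StaticMemberCheck {

    /** Parses a uniqueId such as "{5,6,7,0}" into its byte values. */
    static byte[] parseUniqueId(String s) {
        String t = s.trim();
        if (!t.startsWith("{") || !t.endsWith("}")) {
            throw new IllegalArgumentException("expected {..}: " + s);
        }
        String[] parts = t.substring(1, t.length() - 1).split(",");
        byte[] out = new byte[parts.length];
        for (int i = 0; i < parts.length; i++) {
            out[i] = Byte.parseByte(parts[i].trim());
        }
        return out;
    }

    /** Each member is {host, port, uniqueId}; returns a list of problems found. */
    static List<String> findProblems(String[][] members) {
        List<String> problems = new ArrayList<>();
        Set<String> endpoints = new HashSet<>();
        Set<String> ids = new HashSet<>();
        for (String[] m : members) {
            String endpoint = m[0] + ":" + m[1];
            if (!endpoints.add(endpoint)) {
                problems.add("duplicate endpoint " + endpoint);
            }
            byte[] id;
            try {
                id = parseUniqueId(m[2]);
            } catch (IllegalArgumentException e) {
                problems.add("unparseable uniqueId at " + endpoint);
                continue;
            }
            if (id.length != 16) {
                problems.add("uniqueId at " + endpoint + " has " + id.length
                        + " bytes, expected 16");
            }
            if (!ids.add(Arrays.toString(id))) {
                problems.add("duplicate uniqueId at " + endpoint);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        String[][] members = {
            {"10.255.250.34", "4003", "{5,6,7,0,1,2,3,4,0,0,0,0,0,0,1,0}"},
            {"10.255.250.35", "4001", "{5,6,7,0,1,2,3,4,0,0,0,0,0,0,0,0}"},
            {"10.255.250.35", "4002", "{5,6,7,0,1,2,3,4,0,0,0,0,0,0,0,1}"},
            {"10.255.250.34", "4004", "{5,6,7,0,1,2,3,4,0,0,0,0,0,0,1,1}"},
        };
        System.out.println(findProblems(members));
    }
}
```

The same check would need to be repeated with each node's own server.xml, since only one node's file was posted.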

Can you post a thread dump of a deadlock situation? Only the
deadlocked threads should really be necessary to post. Can you
replicate the deadlock without using your own full application? That
is, can you create a simple application that can be used to reproduce
this on a similarly-configured test instance (cluster) of Tomcat nodes?

-chris

Re: [OT] Tomcat Clustering Support

jeffery.scott.crump
Never mind. It's visible again.


RE: [OT] Tomcat Clustering Support

Scott Evans
In reply to this post by Christopher Schultz-2
-----Original Message-----
From: Christopher Schultz <[hidden email]>
Sent: Tuesday, August 28, 2018 3:35 PM
To: [hidden email]
Subject: Re: [OT] Tomcat Clustering Support

Scott,

I'm no Tomcat-clustering expert, but...

On 8/28/18 13:59, Scott Evans wrote:

> <Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster"
> channelSendOptions="10" channelStartOptions="3">
>
> <Manager className="org.apache.catalina.ha.session.BackupManager"
> expireSessionsOnShutdown="false"
> notifyListenersOnReplication="true" mapSendOptions="10" />
>
> <Channel
> className="org.apache.catalina.tribes.group.GroupChannel">
> <Membership
> className="org.apache.catalina.tribes.membership.McastService"
> address="228.1.0.13" port="45522" frequency="500" dropTime="15000"
> soTimeout="10000" domain="mercer"/>
>
> <Receiver
> className="org.apache.catalina.tribes.transport.nio.NioReceiver"
> address="10.255.250.34" port="4003" selectorTimeout="8000"
> maxThreads="25" />
>
> <Sender
> className="org.apache.catalina.tribes.transport.ReplicationTransmitter">
> <Transport
> className="org.apache.catalina.tribes.transport.nio.PooledParallelSender"
> timeout="8000" poolSize="25" />
> </Sender>
>
> <Interceptor
> className="org.apache.catalina.tribes.group.interceptors.TcpPingInterceptor"/>
>
> <Interceptor
> className="org.apache.catalina.tribes.group.interceptors.TcpFailureDetector"/>
>
> <Interceptor
> className="org.apache.catalina.tribes.group.interceptors.MessageDispatch15Interceptor"/>
>
> <Interceptor
> className="org.apache.catalina.tribes.group.interceptors.StaticMembershipInterceptor">
>
>  <LocalMember
> className="org.apache.catalina.tribes.membership.StaticMember"
> domain="mercer" uniqueId="{5,6,7,0,1,2,3,4,0,0,0,0,0,0,1,0}"/>
>
> <Member
> className="org.apache.catalina.tribes.membership.StaticMember"
> port="4001" host="10.255.250.35" domain="mercer"
> uniqueId="{5,6,7,0,1,2,3,4,0,0,0,0,0,0,0,0}" />
>
> <Member
> className="org.apache.catalina.tribes.membership.StaticMember"
> port="4002" host="10.255.250.35" domain="mercer"
> uniqueId="{5,6,7,0,1,2,3,4,0,0,0,0,0,0,0,1}" />
>
> <Member
> className="org.apache.catalina.tribes.membership.StaticMember"
> port="4004" host="10.255.250.34" domain="mercer"
> uniqueId="{5,6,7,0,1,2,3,4,0,0,0,0,0,0,1,1}" />
>
> </Interceptor>
>
>
> </Channel>
>
> <Valve className="org.apache.catalina.ha.tcp.ForceReplicationValve"
> /> <Valve className="org.apache.catalina.ha.tcp.ReplicationValve"
> filter=".*javax\.faces\.resource.*"/> <Valve
> className="org.apache.catalina.ha.session.JvmRouteBinderValve"/>
>
> <Deployer
> className="org.apache.catalina.ha.deploy.FarmWarDeployer"
> tempDir="/tmp/war-temp/" deployDir="/tmp/war-deploy/"
> watchDir="/tmp/war-listen/" watchEnabled="false"/>
>
> <!--<ClusterListener
> className="org.apache.catalina.ha.session.ClusterSessionListener"/>-->
>
>
> </Cluster>

It looks like you have both multicast AND static membership enabled.

Keichi's presentation on Clustering at ApacheCon Miami (2017) has a slide
(it's slide #38 here:
https://events.static.linuxfound.org/sites/events/files/slides/TomcatCluster_3.pdf)
that says that using static-membership requires that you disable multicast.

Also, just confirming that you have two Tomcat nodes on one IP address
(10.255.250.35, ports 4001 and 4002).

Can you post a thread dump of a deadlock situation? Only the deadlocked
threads should really be necessary to post. Can you replicate the deadlock
without using your own full application? That is, can you create a simple
application that can be used to reproduce this on a similarly-configured
test instance (cluster) of Tomcat nodes?
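
For readers who want only the deadlocked threads rather than a full dump, here is a minimal sketch using the JDK's standard java.lang.management API. The class name DeadlockReport is hypothetical, and the code is assumed to run inside the affected JVM (for example from a diagnostic servlet); it is not something Tomcat provides.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class DeadlockReport {

    /** Returns a description of the deadlocked threads in this JVM, if any. */
    static String describe() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Covers both monitor (synchronized) and java.util.concurrent locks;
        // returns null when no threads are deadlocked.
        long[] ids = mx.findDeadlockedThreads();
        if (ids == null) {
            return "no deadlocked threads";
        }
        StringBuilder report = new StringBuilder();
        // Request locked monitors and synchronizers so lock owners are shown.
        for (ThreadInfo info : mx.getThreadInfo(ids, true, true)) {
            if (info != null) {
                report.append(info);
            }
        }
        return report.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

The output names only the threads involved and the locks they hold or wait for, which may make it easier to sanitise before sharing.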

- -chris

Hi Chris,

Just getting back to this.  Thanks for the info on static membership and
multicast, we may give that a try.  We've also finally been able to get a
thread dump of a recent deadlock situation that occurred which is good news.

To answer your other questions, you are correct that two Nodes are on one IP
and we do not have a way to create a simple program to reproduce the issue.
It is just randomly occurring, though very infrequently.

For next steps, due to the sensitive nature and voluminous amount of the
data contained in the thread dump, we would ask that we somehow take this
offline so it's not shared on the forums.  Please let me know what our
options would be for that.

Thanks


RE: [OT] Tomcat Clustering Support

Scott Evans
In reply to this post by Christopher Schultz-2
-----Original Message-----
From: Scott Evans <[hidden email]>
Sent: Monday, September 17, 2018 10:57 AM
To: 'Tomcat Users List' <[hidden email]>; '[hidden email]'
<[hidden email]>
Subject: RE: [OT] Tomcat Clustering Support


Hi Chris,

Just getting back to this.  Thanks for the info on static membership and
multicast, we may give that a try.  We've also finally been able to get a
thread dump of a recent deadlock situation that occurred which is good news.

To answer your other questions, you are correct that two Nodes are on one IP
and we do not have a way to create a simple program to reproduce the issue.
It is just randomly occurring, though very infrequently.

For next steps, due to the sensitive nature and voluminous amount of the
data contained in the thread dump, we would ask that we somehow take this
offline so it's not shared on the forums.  Please let me know what our
options would be for that.

Thanks

--------------------------

Hi, just checking back in.  Is there anyone who is willing to look at our
thread dump on an individual basis to see what may be causing the deadlocks?
We would rather not share it here since it contains sensitive information,
thanks.


RE: [OT] Tomcat Clustering Support

markt
On September 20, 2018 2:26:36 PM UTC, Scott Evans <[hidden email]> wrote:

>Hi, just checking back in.  Is there anyone who is willing to look at our
>thread dump on an individual basis to see what may be causing the deadlocks?
>We would rather not share it here since it contains sensitive information,
>thanks.

You can mail it direct to me if you like.

Mark


Re: Tomcat Clustering Support

markt
In reply to this post by markt
On 15/08/18 20:52, Mark Thomas wrote:

> On 15/08/18 20:43, Scott Evans wrote:
>> Hi,
>>
>> Our system is on Apache Tomcat Version 8.0.47.
>> OS is Windows Server 2012 R2 Datacenter.
>>
>> We are looking for someone that may be interested in paid contract work to
>> assist with troubleshooting and resolving a Tomcat clustering issue in our
>> system.
>>
>> The system is composed of multiple Java PrimeFaces applications running in
>> a clustered Tomcat environment which is experiencing occasional
>> deadlocking issues from an unknown source requiring the Nodes to be cycled
>> in order to resolve.  The issue is only occurring in our Production
>> environment and we've determined that the issues are occurring at random
>> with the replication threads.
>>
>> We would need someone to help investigate our configuration and determine
>> if there are any further changes that can be made to our system to catch
>> these deadlock issues before they occur (requiring a Node cycle).
>>
>> Please let me know if you or someone you know may be interested or if you
>> have further questions I can help answer.
>
> If you can provide a thread dump of the deadlock when it occurs we can
> probably help you here for free.

Scott provided me with a sanitised copy of the thread-dump off-line. I'm
sharing my analysis with the list (with Scott's permission) as I think
the root cause is likely to be of wider interest.

There was, indeed, a deadlock.

The issue was as follows.

The application is using JSF. Specifically, the Mojarra implementation
from Oracle.

There are multiple concurrent requests for the same session.

Each request is processed by a dedicated thread (this is mandated by the
Servlet spec although it may not be expressed that way).

The threads in question are:

A. ajp-apr-8009-exec-9005
B. ajp-apr-8009-exec-9000

Thread A is in the middle of processing a request. It is evaluating some
EL which requires access to the view map which in turn causes the
ViewMap to update the session.
com.sun.faces.application.view.ViewScopeManager.processEvent locks the
ViewMap object. It then tries to update the session. To do this it
requires the session lock. Thread A is waiting for this lock.

Thread B is at the end of a request. The session has been updated and it
is attempting to write the updated session attributes to the cluster.
The session lock has been obtained. The individual attributes are being
written. The code has reached the ViewMap object. In order to write this
object, the ViewMap object must be locked. Thread B is waiting for this
lock.

So, thread A holds the lock that thread B wants and is waiting for the
lock thread B is holding. Thread B holds the lock the thread A wants and
is waiting for the lock thread A is holding. Deadlock.
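
The lock-order inversion described above can be reproduced in isolation. The following is only a sketch: the two ReentrantLock objects are hypothetical stand-ins for the ViewMap monitor and the session lock, and tryLock with a timeout is used purely so the demo terminates. The real code uses blocking locks, which is why the threads hang instead of giving up.

```java
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class LockOrderDemo {
    // Hypothetical stand-ins for the two locks seen in the thread dump.
    static final ReentrantLock VIEW_MAP = new ReentrantLock();
    static final ReentrantLock SESSION = new ReentrantLock();

    /** Holds 'first', then tries 'second' while the other thread holds it. */
    static boolean holdThenTry(ReentrantLock first, ReentrantLock second,
                               CyclicBarrier barrier) throws Exception {
        first.lock();
        try {
            barrier.await(); // both threads now hold their first lock
            boolean acquired = second.tryLock(100, TimeUnit.MILLISECONDS);
            if (acquired) {
                second.unlock();
            }
            barrier.await(); // keep holding 'first' until both attempts finish
            return acquired;
        } finally {
            first.unlock();
        }
    }

    private static boolean attempt(ReentrantLock first, ReentrantLock second,
                                   CyclicBarrier barrier) {
        try {
            return holdThenTry(first, second, barrier);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Runs both threads; index 0 is thread A's result, index 1 is thread B's. */
    static boolean[] run() throws InterruptedException {
        CyclicBarrier barrier = new CyclicBarrier(2);
        boolean[] acquired = new boolean[2];
        // Thread A: ViewMap first, then session (request processing).
        Thread a = new Thread(
                () -> acquired[0] = attempt(VIEW_MAP, SESSION, barrier), "thread-A");
        // Thread B: session first, then ViewMap (replication at end of request).
        Thread b = new Thread(
                () -> acquired[1] = attempt(SESSION, VIEW_MAP, barrier), "thread-B");
        a.start();
        b.start();
        a.join();
        b.join();
        return acquired;
    }

    public static void main(String[] args) throws InterruptedException {
        boolean[] r = run();
        System.out.println("A acquired session lock: " + r[0]);
        System.out.println("B acquired ViewMap lock: " + r[1]);
    }
}
```

Neither thread obtains its second lock, because each one's second lock is the other's first. Taking both locks in the same order in both code paths would remove the cycle.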

This is, in essence, caused by a combination of how Tomcat's clustering
is designed and how Mojarra is implemented.

The application is using the BackupManager. I assume with sticky
sessions. Therefore, I would expect session failover between nodes to be
a rare event.

My recommendation is to investigate excluding the ViewMap from
replication via sessionAttributeNameFilter. You'd need a regular
expression that matches anything except
"com.sun.faces.application.view.activeViewMaps".
I don't know how integral this object is to Mojarra. Mojarra may simply
recreate this object if required. If not, you may need to trigger
recreation after failover. I don't know how feasible this solution is.
This will require some testing and possibly code changes.
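
As a starting point for that testing, one candidate expression uses a negative lookahead. This is a sketch only: the class and method names are hypothetical, and it assumes the filter is applied as a full match against each attribute name, which should be checked against the Tomcat version in use.

```java
import java.util.regex.Pattern;

public class ViewMapFilterCheck {
    // Candidate value for sessionAttributeNameFilter: matches every attribute
    // name except the exact name of Mojarra's active view map attribute.
    static final String FILTER =
            "^(?!com\\.sun\\.faces\\.application\\.view\\.activeViewMaps$).*";

    private static final Pattern PATTERN = Pattern.compile(FILTER);

    /** True if an attribute with this name would pass the filter. */
    static boolean wouldReplicate(String attributeName) {
        return PATTERN.matcher(attributeName).matches();
    }

    public static void main(String[] args) {
        System.out.println(wouldReplicate("userProfile"));
        System.out.println(
                wouldReplicate("com.sun.faces.application.view.activeViewMaps"));
    }
}
```

In server.xml the backslashes are written singly (\. rather than \\.), and the value would presumably go on the Manager element as the sessionAttributeNameFilter attribute.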

Has anyone on the users list come across this problem before? If so, how
have you solved it? Suggestions for alternative solutions also welcome.

Mark


RE: Tomcat Clustering Support

Caldarale, Charles R
> From: Mark Thomas [mailto:[hidden email]]
> Subject: Re: Tomcat Clustering Support

> Thread A is in the middle of processing a request. It is evaluating some
> EL which requires access to the view map which in turn causes the
> ViewMap to update the session.
> com.sun.faces.application.view.ViewScopeManager.processEvent locks the
> ViewMap object. It then tries to update the session. To do this it
> requires the session lock. Thread A is waiting for this lock.

Assuming the ViewMap is used by multiple sessions, this locking order goes
against the usual protocol of more local before more global.  Might be
possible to file a bug report with Mojarra, but given that the code appears
to be in a com.sun class, that might not get anywhere.

> Thread B is at the end of a request. The session has been updated and it
> is attempting to write the updated session attributes to the cluster.
> The session lock has been obtained. The individual attributes are being
> written. The code has reached the ViewMap object. In order to write this
> object, the ViewMap object must be locked. Thread B is waiting for this
> lock.

This is generally the more desirable order.

> Has anyone on the users list come across this problem before? If so, how
> have you solved it? Suggestions for alternative solutions also welcome.

Can the thread doing the session synchronization lock the session, get a
shallow copy of the attributes, unlock the session, then process the
attributes?  Not sure if that would maintain sufficient coherency.

  - Chuck



Re: Tomcat Clustering Support

markt
On 10/10/18 23:04, Caldarale, Charles R wrote:

>> From: Mark Thomas [mailto:[hidden email]]
>> Subject: Re: Tomcat Clustering Support
>
>> Thread A is in the middle of processing a request. It is evaluating some
>> EL which requires access to the view map which in turn causes the
>> ViewMap to update the session.
>> com.sun.faces.application.view.ViewScopeManager.processEvent locks the
>> ViewMap object. It then tries to update the session. To do this it
>> requires the session lock. Thread A is waiting for this lock.
>
> Assuming the ViewMap is used by multiple sessions, this locking order goes
> against the usual protocol of more local before more global.  Might be
> possible to file a bug report with Mojarra, but given that the code appears
> to be in a com.sun class, that might not get anywhere.
>
>> Thread B is at the end of a request. The session has been updated and it
>> is attempting to write the updated session attributes to the cluster.
>> The session lock has been obtained. The individual attributes are being
>> written. The code has reached the ViewMap object. In order to write this
>> object, the ViewMap object must be locked. Thread B is waiting for this
>> lock.
>
> This is generally the more desirable order.

I think ViewMap is per session but I haven't looked that closely at the
code.

>> Has anyone on the users list come across this problem before? If so, how
>> have you solved it? Suggestions for alternative solutions also welcome.
>
> Can the thread doing the session synchronization lock the session, get a
> shallow copy of the attributes, unlock the session, then process the
> attributes?  Not sure if that would maintain sufficient coherency.

A variation of that might work but at the possible expense of generating
rather more garbage. The changes to the session are stored in a
DeltaRequest. Currently the sequence is:
- lock session
- serialize DeltaRequest to message
- recycle DeltaRequest
- unlock session
- send message

Change that to:
- lock session
- keep reference to populated DeltaRequest
- provide session with new DeltaRequest object
- unlock session
- serialize populated DeltaRequest to message
- send message

and this deadlock should be resolved. To avoid the expense of creating a
new DeltaRequest each time, a pool of them could be used which should
minimise the garbage.
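
The proposed ordering can be sketched as follows. Delta and Session here are hypothetical, simplified stand-ins, not Tomcat's DeltaRequest or session classes, and pooling is left out. The point is only that serialization happens after the session lock is released.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class SwapUnderLockSketch {

    /** Hypothetical stand-in for the recorded session changes. */
    static final class Delta {
        final List<String> changes = new ArrayList<>();
    }

    /** Hypothetical, simplified session holding pending changes. */
    static final class Session {
        private final Object lock = new Object();
        private Delta pending = new Delta();

        void setAttribute(String name) {
            synchronized (lock) {
                pending.changes.add(name);
            }
        }

        /** Swaps the delta under the lock, serializes it outside the lock. */
        byte[] buildReplicationMessage() {
            Delta populated;
            synchronized (lock) {
                populated = pending;   // keep reference to populated delta
                pending = new Delta(); // give the session a fresh delta
            }
            // No session lock is held here, so any locks taken while
            // serializing attributes cannot form a cycle with it.
            return serialize(populated);
        }
    }

    static byte[] serialize(Delta delta) {
        return String.join(",", delta.changes).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Session session = new Session();
        session.setAttribute("a");
        session.setAttribute("b");
        byte[] message = session.buildReplicationMessage();
        System.out.println(new String(message, StandardCharsets.UTF_8));
    }
}
```

Changes made after the swap simply land in the fresh delta and go out with the next message.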

Looking at the sequence of events, I don't think this does much that is
likely to harm coherence.

If folks think this looks reasonable, I can create a BZ enhancement
request to implement it.

Mark


Re: Tomcat Clustering Support

markt
On 11/10/18 10:12, Mark Thomas wrote:

<snip/>

> If folks think this looks reasonable, I can create a BZ enhancement
> request to implement it.

https://bz.apache.org/bugzilla/show_bug.cgi?id=62841
