JVM Crash in tcnative due to concurrency/timing in HTTP/2

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

JVM Crash in tcnative due to concurrency/timing in HTTP/2

Kreuser, Peter
Hi all,

Sorry for the long text. I hope somebody can help me track down the problem I'm facing with Tomcat (8.5.15), tcnative (1.2.12), openssl (1.1.0e) and HTTP/2. JVM is zulu-8.21.0.1 (1.8.0_131-b11)

I've added (already for a long time) the logging of the ssl_cipher to the accesslog. For some time now (min. Tomcat 8.5.14), I get JVM crashes when testing my site with curl and the default HTTP/2.

server.xml

        <Valve className="org.apache.catalina.valves.AccessLogValve" directory="logs"
               prefix="localhost_access_log" suffix=".txt"
               pattern="%h %l %u %t &quot;%r&quot; %s %b %{javax.servlet.request.cipher_suite}r" />

The problem goes away if I do remove the Request-Attribute-Logging. It doesn't happen from FireFox or Safari and not with HTTP/1.1 (see below). I reverted to an older jvm in docker or the oracle jvm on macos, but that does not change it.

I figured out the relevant code and the apparently concurrent shutdown of the OpenSSLEngine in a separate thread before the RequestAttributes are processed:


a)  org.apache.tomcat.util.net.openssl.OpenSSLEngine:

   OpenSSLEngine(long sslCtx, String fallbackApplicationProtocol,
            boolean clientMode, OpenSSLSessionContext sessionContext, boolean alpn) {
        if (sslCtx == 0) {
            throw new IllegalArgumentException(sm.getString("engine.noSSLContext"));
        }
        session = new OpenSSLSession();
        destroyed = true;
        ssl = SSL.newSSL(sslCtx, !clientMode);  <<< set
        networkBIO = SSL.makeNetworkBIO(ssl);
        destroyed = false;
        this.fallbackApplicationProtocol = fallbackApplicationProtocol;
        this.clientMode = clientMode;
        this.sessionContext = sessionContext;
        this.alpn = alpn;
    }


b)  org.apache.tomcat.util.net.openssl.OpenSSLEngine:

    public synchronized void shutdown() {
        if (!destroyed) {
            destroyed = true;
            SSL.freeBIO(networkBIO);
            SSL.freeSSL(ssl);
            ssl = networkBIO = 0;

            // internal errors can cause shutdown without marking the engine closed
            isInboundDone = isOutboundDone = engineClosed = true;
        }
    }


c)  org.apache.tomcat.util.net.openssl.OpenSSLEngine$OpenSSLSession:

        public String getCipherSuite() {
            if (!handshakeFinished) {
                return INVALID_CIPHER;
            }
            if (cipher == null) {
                String c = OpenSSLCipherConfigurationParser.openSSLToJsse(SSL.getCipherForSSL(ssl));
                if (c != null) {
                    cipher = c;
                }
            }
            return cipher;
        }

HTTP/2: JNI to tcnative crashes JVM:

curl -Ik --http2  https://172.17.0.3:8843/


1.       https-openssl-nio2-8843-exec-2

-          OpenSSLEngine.<init> << initializes ssl = SSL.newSSL(sslCtx, !clientMode);


2.       https-openssl-nio2-8843-exec-4

-          StreamProcessor(AbstractProcessor).populateSslRequestAttributes()


3.       https-openssl-nio2-8843-exec-5 (sometime in between)

-          OpenSSLEngine.shutdown() << sets ssl to 0


4.       https-openssl-nio2-8843-exec-4

-          JSSESupport. getCipherSuite()

-          OpenSSLSession.getCipherSuite()
OpenSSLCipherConfigurationParser.openSSLToJsse(SSL.getCipherForSSL(ssl)) << ssl is 0
>> SIGSEGV in JNI tcnative/openssl

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f7d33b99680, pid=1, tid=0x00007f7d318fd700
#
# JRE version: OpenJDK Runtime Environment (8.0_131-b11) (build 1.8.0_131-b11)
# Java VM: OpenJDK 64-Bit Server VM (25.131-b11 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [libssl.so.1.1+0x32680]  SSL_get_current_cipher+0x0
#
# Core dump written. Default location: /usr/local/tomcat/core or core.1
#
# Can not save log file, dump to screen..
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f7d33b99680, pid=1, tid=0x00007f7d318fd700
#
# JRE version: OpenJDK Runtime Environment (8.0_131-b11) (build 1.8.0_131-b11)
# Java VM: OpenJDK 64-Bit Server VM (25.131-b11 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [libssl.so.1.1+0x32680]  SSL_get_current_cipher+0x0
#
# Core dump written. Default location: /usr/local/tomcat/core or core.1
#
# If you would like to submit a bug report, please visit:
#   http://www.azulsystems.com/support/
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#

---------------  T H R E A D  ---------------

Current thread (0x00007f7d8464f800):  JavaThread "https-openssl-nio2-8843-exec-4" daemon [_thread_in_native, id=45, stack(0x00007f7d317fd000,0x00007f7d318fe000)]

siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x0000000000000178

Registers:
RAX=0x00007f7d8a739620, RBX=0x00007f7d8464f9e0, RCX=0x00007f7d3c0cf090, RDX=0x0000000000000000
RSP=0x00007f7d318fbff8, RBP=0x00007f7d89ecba60, RSI=0x00007f7d318fc080, RDI=0x0000000000000000
R8 =0x0000000000000005, R9 =0x00007f7d3c0cf070, R10=0x000000000000032a, R11=0x00007f7d33b99680
R12=0x0000000000000000, R13=0x00007f7d32eb4c68, R14=0x00007f7d318fc098, R15=0x00007f7d8464f800
RIP=0x00007f7d33b99680, EFLAGS=0x0000000000010202, CSGSFS=0x0000000000000033, ERR=0x0000000000000004
  TRAPNO=0x000000000000000e

Top of Stack: (sp=0x00007f7d318fbff8)
0x00007f7d318fbff8:   00007f7d33def9eb 00007f7d318fc070
0x00007f7d318fc008:   00007f7d32eb4c68 00007f7d318fc070
0x00007f7d318fc018:   00007f7d7501f444 00007f7d8464f800
0x00007f7d318fc028:   00007f7d7501eec2 00007f7d318fc030
0x00007f7d318fc038:   00007f7d32eb4c68 00007f7d318fc098
0x00007f7d318fc048:   00007f7d32eb5748 0000000000000000
0x00007f7d318fc058:   00007f7d32eb4c68 0000000000000000
0x00007f7d318fc068:   00007f7d318fc090 00007f7d318fc0e0
0x00007f7d318fc078:   00007f7d75007e00 00000000cce29290
0x00007f7d318fc088:   00007f7d750161bb 0000000000000000
0x00007f7d318fc098:   00000000c35560c0 00007f7d318fc0a0
0x00007f7d318fc0a8:   00007f7d2fa298f6 00007f7d318fc0f8
0x00007f7d318fc0b8:   00007f7d2fa29f70 0000000000000000
0x00007f7d318fc0c8:   00007f7d2fa29938 00007f7d318fc090
0x00007f7d318fc0d8:   00007f7d318fc0f8 00007f7d318fc140
0x00007f7d318fc0e8:   00007f7d75007ed4 0000000000000000
0x00007f7d318fc0f8:   00000000c35560a0 00007f7d318fc100
0x00007f7d318fc108:   00007f7d32fb9c65 00007f7d318fc150
0x00007f7d318fc118:   00007f7d32fba598 0000000000000000
0x00007f7d318fc128:   00007f7d32fb9c88 00007f7d318fc0f8
0x00007f7d318fc138:   00007f7d318fc150 00007f7d318fc198
0x00007f7d318fc148:   00007f7d75007ed4 00000000c3548d88
0x00007f7d318fc158:   00007f7d318fc158 00007f7d2fa95a83
0x00007f7d318fc168:   00007f7d318fc1b0 00007f7d2fa96788
0x00007f7d318fc178:   0000000000000000 00007f7d2fa95b68
0x00007f7d318fc188:   00007f7d318fc150 00007f7d318fc1b0
0x00007f7d318fc198:   00007f7d318fc1f8 00007f7d7500807d
0x00007f7d318fc1a8:   0000000000000000 00000000c3548c28
0x00007f7d318fc1b8:   00007f7d318fc1b8 00007f7d2fa94cbc
0x00007f7d318fc1c8:   00007f7d318fc228 00007f7d2fa96788
0x00007f7d318fc1d8:   0000000000000000 00007f7d2fa94f70
0x00007f7d318fc1e8:   00007f7d318fc1b0 00007f7d318fc218

Instructions: (pc=0x00007f7d33b99680)
0x00007f7d33b99660:   48 8b 87 20 01 00 00 48 85 c0 74 0c 48 8b 00 48
0x00007f7d33b99670:   8b 40 08 c3 0f 1f 40 00 31 c0 c3 0f 1f 44 00 00
0x00007f7d33b99680:   48 8b 87 78 01 00 00 48 85 c0 74 0c 48 8b 80 d8
0x00007f7d33b99690:   00 00 00 c3 0f 1f 40 00 31 c0 c3 0f 1f 44 00 00

Register to memory mapping:

RAX=0x00007f7d8a739620: <offset 0xef4620> in /usr/lib/jvm/zulu-8-amd64/jre/lib/amd64/server/libjvm.so at 0x00007f7d89845000
RBX=0x00007f7d8464f9e0 is an unknown value
RCX=0x00007f7d3c0cf090 is an unknown value
RDX=0x0000000000000000 is an unknown value
RSP=0x00007f7d318fbff8 is pointing into the stack for thread: 0x00007f7d8464f800
RBP=0x00007f7d89ecba60: <offset 0x686a60> in /usr/lib/jvm/zulu-8-amd64/jre/lib/amd64/server/libjvm.so at 0x00007f7d89845000
RSI=0x00007f7d318fc080 is pointing into the stack for thread: 0x00007f7d8464f800
RDI=0x0000000000000000 is an unknown value
R8 =0x0000000000000005 is an unknown value
R9 =0x00007f7d3c0cf070 is an unknown value
R10=0x000000000000032a is an unknown value
R11=0x00007f7d33b99680: SSL_get_current_cipher+0 in /usr/local/lib/libssl.so.1.1 at 0x00007f7d33b67000
R12=0x0000000000000000 is an unknown value
R13={method} {0x00007f7d32eb4c68} 'getCipherForSSL' '(J)Ljava/lang/String;' in 'org/apache/tomcat/jni/SSL'
R14=0x00007f7d318fc098 is pointing into the stack for thread: 0x00007f7d8464f800
R15=0x00007f7d8464f800 is a thread


Stack: [0x00007f7d317fd000,0x00007f7d318fe000],  sp=0x00007f7d318fbff8,  free space=1019k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [libssl.so.1.1+0x32680]  SSL_get_current_cipher+0x0

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j  org.apache.tomcat.jni.SSL.getCipherForSSL(J)Ljava/lang/String;+0
j  org.apache.tomcat.util.net.openssl.OpenSSLEngine$OpenSSLSession.getCipherSuite()Ljava/lang/String;+30
j  org.apache.tomcat.util.net.jsse.JSSESupport.getCipherSuite()Ljava/lang/String;+13
j  org.apache.coyote.AbstractProcessor.populateSslRequestAttributes()V+11
j  org.apache.coyote.AbstractProcessor.action(Lorg/apache/coyote/ActionCode;Ljava/lang/Object;)V+572
j  org.apache.coyote.Request.action(Lorg/apache/coyote/ActionCode;Ljava/lang/Object;)V+31
j  org.apache.catalina.connector.Request.getAttribute(Ljava/lang/String;)Ljava/lang/Object;+85
j  org.apache.catalina.valves.AbstractAccessLogValve$RequestAttributeElement.addElement(Ljava/io/CharArrayWriter;Ljava/util/Date;Lorg/apache/catalina/connector/Request;Lorg/apache/catalina/connector/Response;J)V+12
j  org.apache.catalina.valves.AbstractAccessLogValve.log(Lorg/apache/catalina/connector/Request;Lorg/apache/catalina/connector/Response;J)V+147
j  org.apache.catalina.core.AccessLogAdapter.log(Lorg/apache/catalina/connector/Request;Lorg/apache/catalina/connector/Response;J)V+33
j  org.apache.catalina.core.ContainerBase.logAccess(Lorg/apache/catalina/connector/Request;Lorg/apache/catalina/connector/Response;JZ)V+17
j  org.apache.catalina.core.ContainerBase.logAccess(Lorg/apache/catalina/connector/Request;Lorg/apache/catalina/connector/Response;JZ)V+54
j  org.apache.catalina.connector.CoyoteAdapter.service(Lorg/apache/coyote/Request;Lorg/apache/coyote/Response;)V+421
j  org.apache.coyote.http2.StreamProcessor.service(Lorg/apache/tomcat/util/net/SocketWrapperBase;)Lorg/apache/tomcat/util/net/AbstractEndpoint$Handler$SocketState;+12
j  org.apache.coyote.AbstractProcessorLight.process(Lorg/apache/tomcat/util/net/SocketWrapperBase;Lorg/apache/tomcat/util/net/SocketEvent;)Lorg/apache/tomcat/util/net/AbstractEndpoint$Handler$SocketState;+113
j  org.apache.coyote.http2.StreamProcessor.process(Lorg/apache/tomcat/util/net/SocketEvent;)V+17
j  org.apache.coyote.http2.StreamRunnable.run()V+8
j  java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V+95
j  java.util.concurrent.ThreadPoolExecutor$Worker.run()V+5
j  org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run()V+4
j  java.lang.Thread.run()V+11
v  ~StubRoutines::call_stub


In HTTP/1.1 all is well:

curl -Ik --http1.1  https://172.17.0.3:8843/

https-openssl-nio2-8843-exec-2

-          OpenSSLEngine.<init>

https-openssl-nio2-8843-exec-3   (in one thread)

-          Http11Processor(AbstractProcessor).populateSslRequestAttributes()

-          JSSESupport. getCipherSuite()

-          OpenSSLSession.getCipherSuite()
OpenSSLCipherConfigurationParser.openSSLToJsse(SSL.getCipherForSSL(ssl)) << ssl is non-0

-          OpenSSLEngine.shutdown()



Connector conf:

  <Connector port="8843" protocol="org.apache.coyote.http11.Http11Nio2Protocol"
             sslImplementationName="org.apache.tomcat.util.net.openssl.OpenSSLImplementation"
             server="Apache Tomcat"
             allowTrace="false"
             maxThreads="150" SSLEnabled="true">
    <UpgradeProtocol className="org.apache.coyote.http2.Http2Protocol" />
    <SSLHostConfig honorCipherOrder="true" insecureRenegotiation="false"
                   protocols="TLSv1.1+TLSv1.2"
                   certificateVerification="false"
                   disableCompression="true"
                   disableSessionTickets="true"
                   ciphers="ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA:ECDHE-RSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:DHE-RSA-AES256-SHA256:DHE-RSA-AES256-SHA:AES128-GCM-SHA256:AES256-GCM-SHA384:AES128-SHA256:AES256-SHA256:AES128-SHA:AES256-SHA:!DSS" >
      <Certificate certificateKeyFile="${catalina.base}/conf/ssl/my.key"
                   certificateFile="${catalina.base}/conf/ssl/my.pem"
                   type="RSA" />
    </SSLHostConfig>
  </Connector>

Tomcat version and command line params:

09-Jun-2017 15:34:27.292 FINE [main] sun.rmi.transport.tcp.TCPEndpoint.<clinit> main: localHostKnown = true, localHost = 172.17.0.3
09-Jun-2017 15:34:27.305 FINE [main] sun.rmi.transport.tcp.TCPTransport.<init> main: Version = 2, ep = [172.17.0.3:0,sun.management.jmxremote.LocalRMIServerSocketFactory@5bcab519]
09-Jun-2017 15:34:27.306 FINE [main] sun.rmi.transport.tcp.TCPEndpoint.getLocalEndpoint main: created local endpoint for socket factory sun.management.jmxremote.LocalRMIServerSocketFactory@5bcab519 on port 0
09-Jun-2017 15:34:27.514 FINE [main] sun.rmi.transport.tcp.TCPTransport.<init> main: Version = 2, ep = [172.17.0.3:0]
09-Jun-2017 15:34:27.515 FINE [main] sun.rmi.transport.tcp.TCPEndpoint.getLocalEndpoint main: created local endpoint for socket factory null on port 0
09-Jun-2017 15:34:27.521 FINE [main] sun.rmi.transport.tcp.TCPTransport.listen main: (port 0) create server socket
09-Jun-2017 15:34:27.526 FINE [main] sun.rmi.transport.tcp.TCPEndpoint.setDefaultPort main: default port for server socket factory sun.management.jmxremote.LocalRMIServerSocketFactory@5bcab519 and client socket factory null set to 38314
09-Jun-2017 15:34:27.529 FINE [RMI TCP Accept-0] sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop RMI TCP Accept-0: listening on port 38314
09-Jun-2017 15:34:29.043 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Server version:        Apache Tomcat/8.5.15
09-Jun-2017 15:34:29.046 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Server built:          May 5 2017 11:03:04 UTC
09-Jun-2017 15:34:29.047 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Server number:         8.5.15.0
09-Jun-2017 15:34:29.048 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log OS Name:               Linux
09-Jun-2017 15:34:29.049 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log OS Version:            4.5.0-kali1-amd64
09-Jun-2017 15:34:29.049 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Architecture:          amd64
09-Jun-2017 15:34:29.051 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Java Home:             /usr/lib/jvm/zulu-8-amd64/jre
09-Jun-2017 15:34:29.051 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log JVM Version:           1.8.0_131-b11
09-Jun-2017 15:34:29.052 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log JVM Vendor:            Azul Systems, Inc.
09-Jun-2017 15:34:29.053 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log CATALINA_BASE:         /opt/apache-tomcat.base
09-Jun-2017 15:34:29.053 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log CATALINA_HOME:         /usr/local/tomcat
09-Jun-2017 15:34:29.055 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Djava.util.logging.config.file=/opt/apache-tomcat.base/conf/logging.properties
09-Jun-2017 15:34:29.056 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
09-Jun-2017 15:34:29.056 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Djdk.tls.ephemeralDHKeySize=2048
09-Jun-2017 15:34:29.056 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Djava.protocol.handler.pkgs=org.apache.catalina.webresources
09-Jun-2017 15:34:29.056 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -agentlib:jdwp=transport=dt_socket,address=localhost:8000,server=y,suspend=n
09-Jun-2017 15:34:29.057 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Djava.awt.headless=true
09-Jun-2017 15:34:29.057 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -XX:+UseConcMarkSweepGC
09-Jun-2017 15:34:29.058 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Xloggc:/opt/apache-tomcat.base/logs/gc.log
09-Jun-2017 15:34:29.058 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Djava.security.egd=file:/dev/urandom
09-Jun-2017 15:34:29.058 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Dsun.net.inetaddr.ttl=60
09-Jun-2017 15:34:29.058 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Djava.library.path=/usr/lib/jni
09-Jun-2017 15:34:29.059 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Djdk.tls.ephemeralDHKeySize=2048
09-Jun-2017 15:34:29.060 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Djdk.tls.rejectClientInitiatedRenegotiation=true
09-Jun-2017 15:34:29.061 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Dcom.sun.management.jmxremote
09-Jun-2017 15:34:29.061 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Dcom.sun.management.jmxremote.ssl=false
09-Jun-2017 15:34:29.061 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Dcom.sun.management.jmxremote.authenticate=true
09-Jun-2017 15:34:29.062 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Dcom.sun.management.jmxremote.access.file=/opt/apache-tomcat.base/conf/jmxremote.access
09-Jun-2017 15:34:29.062 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Dcom.sun.management.jmxremote.password.file=/opt/apache-tomcat.base/conf/jmxremote.password
09-Jun-2017 15:34:29.062 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Dcatalina.base=/opt/apache-tomcat.base
09-Jun-2017 15:34:29.063 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Dcatalina.home=/usr/local/tomcat
09-Jun-2017 15:34:29.063 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Djava.io.tmpdir=/opt/apache-tomcat.base/temp
09-Jun-2017 15:34:29.063 INFO [main] org.apache.catalina.core.AprLifecycleListener.lifecycleEvent Loaded APR based Apache Tomcat Native library [1.2.12] using APR version [1.5.2].
09-Jun-2017 15:34:29.064 INFO [main] org.apache.catalina.core.AprLifecycleListener.lifecycleEvent APR capabilities: IPv6 [true], sendfile [true], accept filters [false], random [true].
09-Jun-2017 15:34:29.064 INFO [main] org.apache.catalina.core.AprLifecycleListener.lifecycleEvent APR/OpenSSL configuration: useAprConnector [false], useOpenSSL [true]
09-Jun-2017 15:34:29.101 INFO [main] org.apache.catalina.core.AprLifecycleListener.initializeSSL OpenSSL successfully initialized [OpenSSL 1.1.0e  16 Feb 2017]
09-Jun-2017 15:34:29.268 INFO [main] org.apache.coyote.AbstractProtocol.init Initializing ProtocolHandler ["ajp-nio-8009"]
09-Jun-2017 15:34:29.293 INFO [main] org.apache.tomcat.util.net.NioSelectorPool.getSharedSelector Using a shared selector for servlet write/read
09-Jun-2017 15:34:29.362 SEVERE [main] org.apache.coyote.http11.AbstractHttp11Protocol.configureUpgradeProtocol The upgrade handler [org.apache.coyote.http2.Http2Protocol] for [h2] only supports upgrade via ALPN but has been configured for the ["https-jsse-nio2-8443"] connector that does not support ALPN.
09-Jun-2017 15:34:29.370 INFO [main] org.apache.coyote.AbstractProtocol.init Initializing ProtocolHandler ["https-jsse-nio2-8443"]
09-Jun-2017 15:34:29.826 INFO [main] org.apache.coyote.http11.AbstractHttp11Protocol.configureUpgradeProtocol The ["https-openssl-nio2-8843"] connector has been configured to support negotiation to [h2] via ALPN
09-Jun-2017 15:34:29.827 INFO [main] org.apache.coyote.AbstractProtocol.init Initializing ProtocolHandler ["https-openssl-nio2-8843"]
09-Jun-2017 15:34:29.920 INFO [main] org.apache.catalina.startup.Catalina.load Initialization processed in 2217 ms
09-Jun-2017 15:34:29.969 FINE [main] sun.rmi.transport.tcp.TCPTransport.<init> main: Version = 2, ep = [172.17.0.3:10001]
09-Jun-2017 15:34:29.970 FINE [main] sun.rmi.transport.tcp.TCPEndpoint.getLocalEndpoint main: created local endpoint for socket factory null on port 10001
09-Jun-2017 15:34:29.980 FINE [main] sun.rmi.transport.tcp.TCPTransport.listen main: (port 10001) create server socket
09-Jun-2017 15:34:29.982 FINE [RMI TCP Accept-10001] sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop RMI TCP Accept-10001: listening on port 10001
09-Jun-2017 15:34:29.996 FINE [main] sun.rmi.transport.tcp.TCPTransport.<init> main: Version = 2, ep = [172.17.0.3:10002]
09-Jun-2017 15:34:29.997 FINE [main] sun.rmi.transport.tcp.TCPEndpoint.getLocalEndpoint main: created local endpoint for socket factory null on port 10002
09-Jun-2017 15:34:29.999 FINE [main] sun.rmi.transport.tcp.TCPTransport.listen main: (port 10002) create server socket
09-Jun-2017 15:34:30.001 FINE [RMI TCP Accept-10002] sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop RMI TCP Accept-10002: listening on port 10002
09-Jun-2017 15:34:30.081 INFO [main] org.apache.catalina.mbeans.JmxRemoteLifecycleListener.createServer The JMX Remote Listener has configured the registry on port [10001] and the server on port [10002] for the [Platform] server
09-Jun-2017 15:34:30.161 INFO [main] org.apache.catalina.core.StandardService.startInternal Starting service [Catalina]
09-Jun-2017 15:34:30.163 INFO [main] org.apache.catalina.core.StandardEngine.startInternal Starting Servlet Engine: Apache Tomcat/8.5.15
09-Jun-2017 15:34:30.188 INFO [localhost-startStop-1] org.apache.catalina.startup.HostConfig.deployDescriptor Deploying configuration descriptor [/opt/apache-tomcat.base/conf/Catalina/localhost/manager.xml]
09-Jun-2017 15:34:31.013 INFO [localhost-startStop-1] org.apache.jasper.servlet.TldScanner.scanJars At least one JAR was scanned for TLDs yet contained no TLDs. Enable debug logging for this logger for a complete list of JARs that were scanned but no TLDs were found in them. Skipping unneeded JARs during scanning can improve startup time and JSP compilation time.
09-Jun-2017 15:34:31.229 INFO [localhost-startStop-1] org.apache.catalina.startup.HostConfig.deployDescriptor Deployment of configuration descriptor [/opt/apache-tomcat.base/conf/Catalina/localhost/manager.xml] has finished in [1,028] ms
09-Jun-2017 15:34:31.234 INFO [main] org.apache.coyote.AbstractProtocol.start Starting ProtocolHandler ["ajp-nio-8009"]
09-Jun-2017 15:34:31.272 INFO [main] org.apache.coyote.AbstractProtocol.start Starting ProtocolHandler ["https-jsse-nio2-8443"]
09-Jun-2017 15:34:31.275 INFO [main] org.apache.coyote.AbstractProtocol.start Starting ProtocolHandler ["https-openssl-nio2-8843"]
09-Jun-2017 15:34:31.277 INFO [main] org.apache.catalina.startup.Catalina.start Server startup in 1340 ms

Would be nice if somebody here could tell me if that is a bug or if I have some misconfiguration.

Thank You and Best regards

Peter

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: JVM Crash in tcnative due to concurrency/timing in HTTP/2

Mark Thomas-2
On 09/06/17 16:02, Kreuser, Peter wrote:
> Hi all,
>
> Sorry for the long text. I hope somebody can help me track down the problem I'm facing with Tomcat (8.5.15), tcnative (1.2.12), openssl (1.1.0e) and HTTP/2. JVM is zulu-8.21.0.1 (1.8.0_131-b11)

No need to apologise. There was enough information in this report for me
to be 95% sure of what was going on before I tested things locally.

<snip/>

> Would be nice if somebody here could tell me if that is a bug or if I have some misconfiguration.

It is a bug.

The short explanation is that with NIO2, the style of the API (async
read/write with CompletionHandlers) means that a TLS connection can get
closed down (by a CompletionHandler) before the access log entry is
written. This leads to passing a NULL reference to some native code
which - unsurprisingly - falls over (Tomcat is meant to make sure it
doesn't pass NULL references).

There is an easy fix but the side-effect is that sometimes the cipher
suite name won't appear in the access log. Depending on timing on your
system that could be very frequent to almost never. I'd roughly expect
it to happen as frequently as the crashes.

The harder fix is to note that we want the info and to ensure we cache
it before the TLS connection is closed down. In theory, we could do this
in the access log but it means parsing the request attribute config.
Performance wise, there will be an impact on users since the lookup to
populate the cache will occur during request processing rather than
afterwards. However, overall throughput should be unaffected.

An alternative fix is for the application to trigger the caching of the
TLS info by requesting one of the TLD attributes.

Switching to NIO should also fix this. Note, I would not expect NIO and
NIO2 to be much different performance wise.

Would the easy fix be sufficient?

Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: JVM Crash in tcnative due to concurrency/timing in HTTP/2

remm
2017-06-12 17:07 GMT-05:00 Mark Thomas <[hidden email]>:

> On 09/06/17 16:02, Kreuser, Peter wrote:
> > Hi all,
> >
> > Sorry for the long text. I hope somebody can help me track down the
> problem I'm facing with Tomcat (8.5.15), tcnative (1.2.12), openssl
> (1.1.0e) and HTTP/2. JVM is zulu-8.21.0.1 (1.8.0_131-b11)
>
> No need to apologise. There was enough information in this report for me
> to be 95% sure of what was going on before I tested things locally.
>
> <snip/>
>
> > Would be nice if somebody here could tell me if that is a bug or if I
> have some misconfiguration.
>
> It is a bug.
>
> The short explanation is that with NIO2, the style of the API (async
> read/write with CompletionHandlers) means that a TLS connection can get
> closed down (by a CompletionHandler) before the access log entry is
> written. This leads to passing a NULL reference to some native code
> which - unsurprisingly - falls over (Tomcat is meant to make sure it
> doesn't pass NULL references).
>
> There is an easy fix but the side-effect is that sometimes the cipher
> suite name won't appear in the access log. Depending on timing on your
> system that could be very frequent to almost never. I'd roughly expect
> it to happen as frequently as the crashes.
>
> The harder fix is to note that we want the info and to ensure we cache
> it before the TLS connection is closed down. In theory, we could do this
> in the access log but it means parsing the request attribute config.
> Performance wise, there will be an impact on users since the lookup to
> populate the cache will occur during request processing rather than
> afterwards. However, overall throughput should be unaffected.
>
> An alternative fix is for the application to trigger the caching of the
> TLS info by requesting one of the TLD attributes.
>
> Switching to NIO should also fix this. Note, I would not expect NIO and
> NIO2 to be much different performance wise.
>
> Would the easy fix be sufficient?
>
> Are you sure it's not simply still this bug:
https://bz.apache.org/bugzilla/show_bug.cgi?id=60461 ?
I agree the fix is "hard" since, although there's no actual async close (I
don't see any completion handler closing things abruptly personally - on
read I failed to do anything async for now, and on write the completion
handler simply sets an error field to be processed synchronously later),
closing can occur after parsing any frame. IMO with HTTP/2 everything is
async to the application, so it's the same for all connectors.

Rémy
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

AW: JVM Crash in tcnative due to concurrency/timing in HTTP/2

Kreuser, Peter
In reply to this post by Mark Thomas-2
Mark,

> On 09/06/17 16:02, Kreuser, Peter wrote:
> > Hi all,
> >
> > Sorry for the long text. I hope somebody can help me track down the problem I'm facing with Tomcat (8.5.15), tcnative (1.2.12), openssl (1.1.0e) and HTTP/2. JVM is zulu-8.21.0.1 (1.8.0_131-b11)
>
> No need to apologise. There was enough information in this report for me
> to be 95% sure of what was going on before I tested things locally.
>
> <snip/>
>
> > Would be nice if somebody here could tell me if that is a bug or if I have some misconfiguration.
>
> It is a bug.
>
> The short explanation is that with NIO2, the style of the API (async
> read/write with CompletionHandlers) means that a TLS connection can get
> closed down (by a CompletionHandler) before the access log entry is
> written. This leads to passing a NULL reference to some native code
> which - unsurprisingly - falls over (Tomcat is meant to make sure it
> doesn't pass NULL references).
>
> There is an easy fix but the side-effect is that sometimes the cipher
> suite name won't appear in the access log. Depending on timing on your
> system that could be very frequent to almost never. I'd roughly expect
> it to happen as frequently as the crashes.
>
> The harder fix is to note that we want the info and to ensure we cache
> it before the TLS connection is closed down. In theory, we could do this
> in the access log but it means parsing the request attribute config.
> Performance wise, there will be an impact on users since the lookup to
> populate the cache will occur during request processing rather than
> afterwards. However, overall throughput should be unaffected.
>
> An alternative fix is for the application to trigger the caching of the
> TLS info by requesting one of the TLD attributes.
>
> Switching to NIO should also fix this. Note, I would not expect NIO and
> NIO2 to be much different performance wise.
>
> Would the easy fix be sufficient?
>
> Mark

For a start the easy fix would be sufficient. Apparently this is only a real problem when I call the site via curl, as it does not happen in the various browsers (at least not on every call).

We should at least make users aware that certain information can be missing in the accesslog and the cause of it (in the AccessLogValve documentation?). And are there any other implications of the asynchronicity? I guess the behavior of NIO2 and HTTP/2 processing is not obvious?

Your alternative fix would be to copy the attributes during request processing (in a filter) and then log them from the filter request attributes instead of the predefined ssl attributes? Correct?

Thank you for your support and the bug fix ;-)! Have you filed the bug internally or should I?

Best regards

Peter


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: AW: JVM Crash in tcnative due to concurrency/timing in HTTP/2

Mark Thomas-2
On 13/06/17 10:13, Kreuser, Peter wrote:

> Mark,
>
>> On 09/06/17 16:02, Kreuser, Peter wrote:
>>> Hi all,
>>>
>>> Sorry for the long text. I hope somebody can help me track down the problem I'm facing with Tomcat (8.5.15), tcnative (1.2.12), openssl (1.1.0e) and HTTP/2. JVM is zulu-8.21.0.1 (1.8.0_131-b11)
>>
>> No need to apologise. There was enough information in this report for me
>> to be 95% sure of what was going on before I tested things locally.
>>
>> <snip/>
>>
>>> Would be nice if somebody here could tell me if that is a bug or if I have some misconfiguration.
>>
>> It is a bug.
>>
>> The short explanation is that with NIO2, the style of the API (async
>> read/write with CompletionHandlers) means that a TLS connection can get
>> closed down (by a CompletionHandler) before the access log entry is
>> written. This leads to passing a NULL reference to some native code
>> which - unsurprisingly - falls over (Tomcat is meant to make sure it
>> doesn't pass NULL references).
>>
>> There is an easy fix but the side-effect is that sometimes the cipher
>> suite name won't appear in the access log. Depending on timing on your
>> system that could be very frequent to almost never. I'd roughly expect
>> it to happen as frequently as the crashes.
>>
>> The harder fix is to note that we want the info and to ensure we cache
>> it before the TLS connection is closed down. In theory, we could do this
>> in the access log but it means parsing the request attribute config.
>> Performance wise, there will be an impact on users since the lookup to
>> populate the cache will occur during request processing rather than
>> afterwards. However, overall throughput should be unaffected.
>>
>> An alternative fix is for the application to trigger the caching of the
>> TLS info by requesting one of the TLD attributes.
>>
>> Switching to NIO should also fix this. Note, I would not expect NIO and
>> NIO2 to be much different performance wise.
>>
>> Would the easy fix be sufficient?
>>
>> Mark
>
> For a start the easy fix would be sufficient. Apparently this is only a real problem when I call the site via curl, as it does not happen in the various browsers (at least not on every call).
>
> We should at least make users aware that certain information can be missing in the accesslog and the cause of it (in the AccessLogValve documentation?). And are there any other implications of the asynchronicity? I guess the behavior of NIO2 and HTTP/2 processing is not obvious?
>
> Your alternative fix would be to copy the attributes during request processing (in a filter) and then log them from the filter request attributes instead of the predefined ssl attributes? Correct?
>
> Thank you for your support and the bug fix ;-)! Have you filed the bug internally or should I?

I need to look into this some more first. Remy's comments suggest that
my first analysis might have been wrong.

Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: AW: JVM Crash in tcnative due to concurrency/timing in HTTP/2

Mark Thomas-2
On 13/06/17 11:57, Mark Thomas wrote:

> On 13/06/17 10:13, Kreuser, Peter wrote:
>> Mark,
>>
>>> On 09/06/17 16:02, Kreuser, Peter wrote:
>>>> Hi all,
>>>>
>>>> Sorry for the long text. I hope somebody can help me track down the problem I'm facing with Tomcat (8.5.15), tcnative (1.2.12), openssl (1.1.0e) and HTTP/2. JVM is zulu-8.21.0.1 (1.8.0_131-b11)
>>>
>>> No need to apologise. There was enough information in this report for me
>>> to be 95% sure of what was going on before I tested things locally.
>>>
>>> <snip/>
>>>
>>>> Would be nice if somebody here could tell me if that is a bug or if I have some misconfiguration.
>>>
>>> It is a bug.
>>>
>>> The short explanation is that with NIO2, the style of the API (async
>>> read/write with CompletionHandlers) means that a TLS connection can get
>>> closed down (by a CompletionHandler) before the access log entry is
>>> written. This leads to passing a NULL reference to some native code
>>> which - unsurprisingly - falls over (Tomcat is meant to make sure it
>>> doesn't pass NULL references).
>>>
>>> There is an easy fix but the side-effect is that sometimes the cipher
>>> suite name won't appear in the access log. Depending on timing on your
>>> system that could be very frequent to almost never. I'd roughly expect
>>> it to happen as frequently as the crashes.
>>>
>>> The harder fix is to note that we want the info and to ensure we cache
>>> it before the TLS connection is closed down. In theory, we could do this
>>> in the access log but it means parsing the request attribute config.
>>> Performance wise, there will be an impact on users since the lookup to
>>> populate the cache will occur during request processing rather than
>>> afterwards. However, overall throughput should be unaffected.
>>>
>>> An alternative fix is for the application to trigger the caching of the
>>> TLS info by requesting one of the TLD attributes.
>>>
>>> Switching to NIO should also fix this. Note, I would not expect NIO and
>>> NIO2 to be much different performance wise.
>>>
>>> Would the easy fix be sufficient?
>>>
>>> Mark
>>
>> For a start the easy fix would be sufficient. Apparently this is only a real problem when I call the site via curl, as it does not happen in the various browsers (at least not on every call).
>>
>> We should at least make users aware that certain information can be missing in the accesslog and the cause of it (in the AccessLogValve documentation?). And are there any other implications of the asynchronicity? I guess the behavior of NIO2 and HTTP/2 processing is not obvious?
>>
>> Your alternative fix would be to copy the attributes during request processing (in a filter) and then log them from the filter request attributes instead of the predefined ssl attributes? Correct?
>>
>> Thank you for your support and the bug fix ;-)! Have you filed the bug internally or should I?
>
> I need to look into this some more first. Remy's comments suggest that
> my first analysis might have been wrong.

I've been looking into this this afternoon. Rémy has made some changes
to trunk and 8.5.x that prevent the crash.

It does appear that a timing issue is involved. I'm still looking into
the details.

At the moment, the SSL_NULL_WITH_NULL_NULL cipher suite is recorded
rather than the crash when the TLS information is not available when the
access log needs it.

Rémy's changes are in the 9.0.x and 8.5.x releases being voted on at the
moment.

Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: JVM Crash in tcnative due to concurrency/timing in HTTP/2

Peter Kreuser
Mark,

>
> Am 23.06.2017 um 15:17 schrieb Mark Thomas <[hidden email]>:
>
> On 13/06/17 11:57, Mark Thomas wrote:
>> On 13/06/17 10:13, Kreuser, Peter wrote:
>>> Mark,
>>>
>>>> On 09/06/17 16:02, Kreuser, Peter wrote:
>>>>> Hi all,
>>>>>
>>>>> Sorry for the long text. I hope somebody can help me track down the problem I'm facing with Tomcat (8.5.15), tcnative (1.2.12), openssl (1.1.0e) and HTTP/2. JVM is zulu-8.21.0.1 (1.8.0_131-b11)
>>>>
>>>> No need to apologise. There was enough information in this report for me
>>>> to be 95% sure of what was going on before I tested things locally.
>>>>
>>>> <snip/>
>>>>
>>>>> Would be nice if somebody here could tell me if that is a bug or if I have some misconfiguration.
>>>>
>>>> It is a bug.
>>>>
>>>> The short explanation is that with NIO2, the style of the API (async
>>>> read/write with CompletionHandlers) means that a TLS connection can get
>>>> closed down (by a CompletionHandler) before the access log entry is
>>>> written. This leads to passing a NULL reference to some native code
>>>> which - unsurprisingly - falls over (Tomcat is meant to make sure it
>>>> doesn't pass NULL references).
>>>>
>>>> There is an easy fix but the side-effect is that sometimes the cipher
>>>> suite name won't appear in the access log. Depending on timing on your
>>>> system that could be very frequent to almost never. I'd roughly expect
>>>> it to happen as frequently as the crashes.
>>>>
>>>> The harder fix is to note that we want the info and to ensure we cache
>>>> it before the TLS connection is closed down. In theory, we could do this
>>>> in the access log but it means parsing the request attribute config.
>>>> Performance wise, there will be an impact on users since the lookup to
>>>> populate the cache will occur during request processing rather than
>>>> afterwards. However, overall throughput should be unaffected.
>>>>
>>>> An alternative fix is for the application to trigger the caching of the
>>>> TLS info by requesting one of the TLD attributes.
>>>>
>>>> Switching to NIO should also fix this. Note, I would not expect NIO and
>>>> NIO2 to be much different performance wise.
>>>>
>>>> Would the easy fix be sufficient?
>>>>
>>>> Mark
>>>
>>> For a start the easy fix would be sufficient. Apparently this is only a real problem when I call the site via curl, as it does not happen in the various browsers (at least not on every call).
>>>
>>> We should at least make users aware that certain information can be missing in the accesslog and the cause of it (in the AccessLogValve documentation?). And are there any other implications of the asynchronicity? I guess the behavior of NIO2 and HTTP/2 processing is not obvious?
>>>
>>> Your alternative fix would be to copy the attributes during request processing (in a filter) and then log them from the filter request attributes instead of the predefined ssl attributes? Correct?
>>>
>>> Thank you for your support and the bug fix ;-)! Have you filed the bug internally or should I?
>>
>> I need to look into this some more first. Remy's comments suggest that
>> my first analysis might have been wrong.
>
> I've been looking into this this afternoon. Rémy has made some changes
> to trunk and 8.5.x that prevent the crash.
>
> It does appear that a timing issue is involved. I'm still looking into
> the details.
>
> At the moment, the SSL_NULL_WITH_NULL_NULL cipher suite is recorded
> rather than the crash when the TLS information is not available when the
> access log needs it.
>
> Rémy's changes are in the 9.0.x and 8.5.x releases being voted on at the
> moment.
>
> Mark
>
> ---------------------------------------------------------------------

That seems to help. I get

192.168.1.10 - - [23/Jun/2017:16:53:22 +0200] 'GET / HTTP/2.0' 200 16 5 8843 '-' 'curl/7.54.0' - SSL_NULL_WITH_NULL_NULL

plus on the console:

23-Jun-2017 16:53:22.045 WARNING [https-openssl-nio2-8843-exec-3] org.apache.coyote.AbstractProcessor.populateSslRequestAttributes Exception getting SSL attributes
 java.lang.IllegalStateException: SSL session ID not available
        at org.apache.tomcat.util.net.openssl.OpenSSLEngine$OpenSSLSession.getId(OpenSSLEngine.java:1047)
        at org.apache.tomcat.util.net.jsse.JSSESupport.getSessionId(JSSESupport.java:156)
        at org.apache.coyote.AbstractProcessor.populateSslRequestAttributes(AbstractProcessor.java:608)
        at org.apache.coyote.AbstractProcessor.action(AbstractProcessor.java:361)
        at org.apache.coyote.Request.action(Request.java:426)
        at org.apache.catalina.connector.Request.getAttribute(Request.java:903)
        at org.apache.catalina.valves.AbstractAccessLogValve$RequestAttributeElement.addElement(AbstractAccessLogValve.java:1439)
        at org.apache.catalina.valves.AbstractAccessLogValve.log(AbstractAccessLogValve.java:653)
        at org.apache.catalina.core.AccessLogAdapter.log(AccessLogAdapter.java:48)
        at org.apache.catalina.core.ContainerBase.logAccess(ContainerBase.java:1065)
        at org.apache.catalina.core.ContainerBase.logAccess(ContainerBase.java:1072)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:401)
        at org.apache.coyote.http2.StreamProcessor.service(StreamProcessor.java:245)
        at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66)
        at org.apache.coyote.http2.StreamProcessor.process(StreamProcessor.java:65)
        at org.apache.coyote.http2.StreamRunnable.run(StreamRunnable.java:35)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
        at java.lang.Thread.run(Thread.java:748)

However I still see a timing problem, as it is not always happening. Maybe 5 out of 7 are NULL. If I send more requests immediately after each other NULL is more frequent.

Best regards

Peter
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: JVM Crash in tcnative due to concurrency/timing in HTTP/2

Mark Thomas-2
On 23/06/17 16:02, [hidden email] wrote:

> Mark,
>>
>> Am 23.06.2017 um 15:17 schrieb Mark Thomas <[hidden email]>:
>>
>> On 13/06/17 11:57, Mark Thomas wrote:
>>> On 13/06/17 10:13, Kreuser, Peter wrote:
>>>> Mark,
>>>>
>>>>> On 09/06/17 16:02, Kreuser, Peter wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> Sorry for the long text. I hope somebody can help me track down the problem I'm facing with Tomcat (8.5.15), tcnative (1.2.12), openssl (1.1.0e) and HTTP/2. JVM is zulu-8.21.0.1 (1.8.0_131-b11)
>>>>>
>>>>> No need to apologise. There was enough information in this report for me
>>>>> to be 95% sure of what was going on before I tested things locally.
>>>>>
>>>>> <snip/>
>>>>>
>>>>>> Would be nice if somebody here could tell me if that is a bug or if I have some misconfiguration.
>>>>>
>>>>> It is a bug.
>>>>>
>>>>> The short explanation is that with NIO2, the style of the API (async
>>>>> read/write with CompletionHandlers) means that a TLS connection can get
>>>>> closed down (by a CompletionHandler) before the access log entry is
>>>>> written. This leads to passing a NULL reference to some native code
>>>>> which - unsurprisingly - falls over (Tomcat is meant to make sure it
>>>>> doesn't pass NULL references).
>>>>>
>>>>> There is an easy fix but the side-effect is that sometimes the cipher
>>>>> suite name won't appear in the access log. Depending on timing on your
>>>>> system that could be very frequent to almost never. I'd roughly expect
>>>>> it to happen as frequently as the crashes.
>>>>>
>>>>> The harder fix is to note that we want the info and to ensure we cache
>>>>> it before the TLS connection is closed down. In theory, we could do this
>>>>> in the access log but it means parsing the request attribute config.
>>>>> Performance wise, there will be an impact on users since the lookup to
>>>>> populate the cache will occur during request processing rather than
>>>>> afterwards. However, overall throughput should be unaffected.
>>>>>
>>>>> An alternative fix is for the application to trigger the caching of the
>>>>> TLS info by requesting one of the TLD attributes.
>>>>>
>>>>> Switching to NIO should also fix this. Note, I would not expect NIO and
>>>>> NIO2 to be much different performance wise.
>>>>>
>>>>> Would the easy fix be sufficient?
>>>>>
>>>>> Mark
>>>>
>>>> For a start the easy fix would be sufficient. Apparently this is only a real problem when I call the site via curl, as it does not happen in the various browsers (at least not on every call).
>>>>
>>>> We should at least make users aware that certain information can be missing in the accesslog and the cause of it (in the AccessLogValve documentation?). And are there any other implications of the asynchronicity? I guess the behavior of NIO2 and HTTP/2 processing is not obvious?
>>>>
>>>> Your alternative fix would be to copy the attributes during request processing (in a filter) and then log them from the filter request attributes instead of the predefined ssl attributes? Correct?
>>>>
>>>> Thank you for your support and the bug fix ;-)! Have you filed the bug internally or should I?
>>>
>>> I need to look into this some more first. Remy's comments suggest that
>>> my first analysis might have been wrong.
>>
>> I've been looking into this this afternoon. Rémy has made some changes
>> to trunk and 8.5.x that prevent the crash.
>>
>> It does appear that a timing issue is involved. I'm still looking into
>> the details.
>>
>> At the moment, the SSL_NULL_WITH_NULL_NULL cipher suite is recorded
>> rather than the crash when the TLS information is not available when the
>> access log needs it.
>>
>> Rémy's changes are in the 9.0.x and 8.5.x releases being voted on at the
>> moment.
>>
>> Mark
>>
>> ---------------------------------------------------------------------
>
> That seems to help. I get
>
> 192.168.1.10 - - [23/Jun/2017:16:53:22 +0200] 'GET / HTTP/2.0' 200 16 5 8843 '-' 'curl/7.54.0' - SSL_NULL_WITH_NULL_NULL
>
> plus on the console:
>
> 23-Jun-2017 16:53:22.045 WARNING [https-openssl-nio2-8843-exec-3] org.apache.coyote.AbstractProcessor.populateSslRequestAttributes Exception getting SSL attributes
>  java.lang.IllegalStateException: SSL session ID not available
> at org.apache.tomcat.util.net.openssl.OpenSSLEngine$OpenSSLSession.getId(OpenSSLEngine.java:1047)
> at org.apache.tomcat.util.net.jsse.JSSESupport.getSessionId(JSSESupport.java:156)
> at org.apache.coyote.AbstractProcessor.populateSslRequestAttributes(AbstractProcessor.java:608)
> at org.apache.coyote.AbstractProcessor.action(AbstractProcessor.java:361)
> at org.apache.coyote.Request.action(Request.java:426)
> at org.apache.catalina.connector.Request.getAttribute(Request.java:903)
> at org.apache.catalina.valves.AbstractAccessLogValve$RequestAttributeElement.addElement(AbstractAccessLogValve.java:1439)
> at org.apache.catalina.valves.AbstractAccessLogValve.log(AbstractAccessLogValve.java:653)
> at org.apache.catalina.core.AccessLogAdapter.log(AccessLogAdapter.java:48)
> at org.apache.catalina.core.ContainerBase.logAccess(ContainerBase.java:1065)
> at org.apache.catalina.core.ContainerBase.logAccess(ContainerBase.java:1072)
> at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:401)
> at org.apache.coyote.http2.StreamProcessor.service(StreamProcessor.java:245)
> at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66)
> at org.apache.coyote.http2.StreamProcessor.process(StreamProcessor.java:65)
> at org.apache.coyote.http2.StreamRunnable.run(StreamRunnable.java:35)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
> at java.lang.Thread.run(Thread.java:748)
>
> However I still see a timing problem, as it is not always happening. Maybe 5 out of 7 are NULL. If I send more requests immediately after each other NULL is more frequent.

Thanks for the confirmation. The important thing is that the crashes
have stopped. Note I'm fairly certain this is issue only affects NIO2
with OpenSSL for TLS.

I'm currently looking at a way to pre-populate the TLS info if the
AccessLogValve requires this to avoid the error messages and the NULL
cipher suite entries.

Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: JVM Crash in tcnative due to concurrency/timing in HTTP/2

Peter Kreuser
Mark,


> Am 23.06.2017 um 17:32 schrieb Mark Thomas <[hidden email]>:
>
> On 23/06/17 16:02, [hidden email] wrote:
>> Mark,
>>>
>>> Am 23.06.2017 um 15:17 schrieb Mark Thomas <[hidden email]>:
>>>
>>> On 13/06/17 11:57, Mark Thomas wrote:
>>>> On 13/06/17 10:13, Kreuser, Peter wrote:
>>>>> Mark,
>>>>>
>>>>>> On 09/06/17 16:02, Kreuser, Peter wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> Sorry for the long text. I hope somebody can help me track down the problem I'm facing with Tomcat (8.5.15), tcnative (1.2.12), openssl (1.1.0e) and HTTP/2. JVM is zulu-8.21.0.1 (1.8.0_131-b11)
>>>>>>
>>>>>> No need to apologise. There was enough information in this report for me
>>>>>> to be 95% sure of what was going on before I tested things locally.
>>>>>>
>>>>>> <snip/>
>>>>>>
>>>>>>> Would be nice if somebody here could tell me if that is a bug or if I have some misconfiguration.
>>>>>>
>>>>>> It is a bug.
>>>>>>
>>>>>> The short explanation is that with NIO2, the style of the API (async
>>>>>> read/write with CompletionHandlers) means that a TLS connection can get
>>>>>> closed down (by a CompletionHandler) before the access log entry is
>>>>>> written. This leads to passing a NULL reference to some native code
>>>>>> which - unsurprisingly - falls over (Tomcat is meant to make sure it
>>>>>> doesn't pass NULL references).
>>>>>>
>>>>>> There is an easy fix but the side-effect is that sometimes the cipher
>>>>>> suite name won't appear in the access log. Depending on timing on your
>>>>>> system that could be very frequent to almost never. I'd roughly expect
>>>>>> it to happen as frequently as the crashes.
>>>>>>
>>>>>> The harder fix is to note that we want the info and to ensure we cache
>>>>>> it before the TLS connection is closed down. In theory, we could do this
>>>>>> in the access log but it means parsing the request attribute config.
>>>>>> Performance wise, there will be an impact on users since the lookup to
>>>>>> populate the cache will occur during request processing rather than
>>>>>> afterwards. However, overall throughput should be unaffected.
>>>>>>
>>>>>> An alternative fix is for the application to trigger the caching of the
>>>>>> TLS info by requesting one of the TLD attributes.
>>>>>>
>>>>>> Switching to NIO should also fix this. Note, I would not expect NIO and
>>>>>> NIO2 to be much different performance wise.
>>>>>>
>>>>>> Would the easy fix be sufficient?
>>>>>>
>>>>>> Mark
>>>>>
>>>>> For a start the easy fix would be sufficient. Apparently this is only a real problem when I call the site via curl, as it does not happen in the various browsers (at least not on every call).
>>>>>
>>>>> We should at least make users aware that certain information can be missing in the accesslog and the cause of it (in the AccessLogValve documentation?). And are there any other implications of the asynchronicity? I guess the behavior of NIO2 and HTTP/2 processing is not obvious?
>>>>>
>>>>> Your alternative fix would be to copy the attributes during request processing (in a filter) and then log them from the filter request attributes instead of the predefined ssl attributes? Correct?
>>>>>
>>>>> Thank you for your support and the bug fix ;-)! Have you filed the bug internally or should I?
>>>>
>>>> I need to look into this some more first. Remy's comments suggest that
>>>> my first analysis might have been wrong.
>>>
>>> I've been looking into this this afternoon. Rémy has made some changes
>>> to trunk and 8.5.x that prevent the crash.
>>>
>>> It does appear that a timing issue is involved. I'm still looking into
>>> the details.
>>>
>>> At the moment, the SSL_NULL_WITH_NULL_NULL cipher suite is recorded
>>> rather than the crash when the TLS information is not available when the
>>> access log needs it.
>>>
>>> Rémy's changes are in the 9.0.x and 8.5.x releases being voted on at the
>>> moment.
>>>
>>> Mark
>>>
>>> ---------------------------------------------------------------------
>>
>> That seems to help. I get
>>
>> 192.168.1.10 - - [23/Jun/2017:16:53:22 +0200] 'GET / HTTP/2.0' 200 16 5 8843 '-' 'curl/7.54.0' - SSL_NULL_WITH_NULL_NULL
>>
>> plus on the console:
>>
>> 23-Jun-2017 16:53:22.045 WARNING [https-openssl-nio2-8843-exec-3] org.apache.coyote.AbstractProcessor.populateSslRequestAttributes Exception getting SSL attributes
>> java.lang.IllegalStateException: SSL session ID not available
>> at org.apache.tomcat.util.net.openssl.OpenSSLEngine$OpenSSLSession.getId(OpenSSLEngine.java:1047)
>> at org.apache.tomcat.util.net.jsse.JSSESupport.getSessionId(JSSESupport.java:156)
>> at org.apache.coyote.AbstractProcessor.populateSslRequestAttributes(AbstractProcessor.java:608)
>> at org.apache.coyote.AbstractProcessor.action(AbstractProcessor.java:361)
>> at org.apache.coyote.Request.action(Request.java:426)
>> at org.apache.catalina.connector.Request.getAttribute(Request.java:903)
>> at org.apache.catalina.valves.AbstractAccessLogValve$RequestAttributeElement.addElement(AbstractAccessLogValve.java:1439)
>> at org.apache.catalina.valves.AbstractAccessLogValve.log(AbstractAccessLogValve.java:653)
>> at org.apache.catalina.core.AccessLogAdapter.log(AccessLogAdapter.java:48)
>> at org.apache.catalina.core.ContainerBase.logAccess(ContainerBase.java:1065)
>> at org.apache.catalina.core.ContainerBase.logAccess(ContainerBase.java:1072)
>> at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:401)
>> at org.apache.coyote.http2.StreamProcessor.service(StreamProcessor.java:245)
>> at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66)
>> at org.apache.coyote.http2.StreamProcessor.process(StreamProcessor.java:65)
>> at org.apache.coyote.http2.StreamRunnable.run(StreamRunnable.java:35)
>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
>> at java.lang.Thread.run(Thread.java:748)
>>
>> However I still see a timing problem, as it is not always happening. Maybe 5 out of 7 are NULL. If I send more requests immediately after each other NULL is more frequent.
>
> Thanks for the confirmation. The important thing is that the crashes
> have stopped. Note I'm fairly certain this is issue only affects NIO2
> with OpenSSL for TLS.
>
> I'm currently looking at a way to pre-populate the TLS info if the
> AccessLogValve requires this to avoid the error messages and the NULL
> cipher suite entries.
>
> Mark
>

I agree with that - no more crashes is key!

Thanks. Have a great weekend.

Peter


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: JVM Crash in tcnative due to concurrency/timing in HTTP/2

Mark Thomas-2
On 23/06/17 17:52, [hidden email] wrote:
>> Am 23.06.2017 um 17:32 schrieb Mark Thomas <[hidden email]>:

<snip/>

>>> However I still see a timing problem, as it is not always happening. Maybe 5 out of 7 are NULL. If I send more requests immediately after each other NULL is more frequent.
>>
>> Thanks for the confirmation. The important thing is that the crashes
>> have stopped. Note I'm fairly certain this is issue only affects NIO2
>> with OpenSSL for TLS.
>>
>> I'm currently looking at a way to pre-populate the TLS info if the
>> AccessLogValve requires this to avoid the error messages and the NULL
>> cipher suite entries.
>>
>> Mark
>>
>
> I agree with that - no more crashes is key!
>
> Thanks. Have a great weekend.

The ISEs are fixed. Actually the test has been removed. It is possible
to see null session IDs with OpenSSL if session tickets are enabled.

I've also added some code to the AccessLogValve that pre-populates the
TLS attributes if the access log is configured to log them.

So, as far as I can tell, everything should now be fixed.

The complete set of fixes will be in 9.0.0.M23 onwards and 8.5.17 onwards.

Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: JVM Crash in tcnative due to concurrency/timing in HTTP/2

remm
2017-06-23 16:20 GMT-05:00 Mark Thomas <[hidden email]>:

> On 23/06/17 17:52, [hidden email] wrote:
> >> Am 23.06.2017 um 17:32 schrieb Mark Thomas <[hidden email]>:
>
> <snip/>
>
> >>> However I still see a timing problem, as it is not always happening.
> Maybe 5 out of 7 are NULL. If I send more requests immediately after each
> other NULL is more frequent.
> >>
> >> Thanks for the confirmation. The important thing is that the crashes
> >> have stopped. Note I'm fairly certain this is issue only affects NIO2
> >> with OpenSSL for TLS.
> >>
> >> I'm currently looking at a way to pre-populate the TLS info if the
> >> AccessLogValve requires this to avoid the error messages and the NULL
> >> cipher suite entries.
> >>
> >> Mark
> >>
> >
> > I agree with that - no more crashes is key!
> >
> > Thanks. Have a great weekend.
>
> The ISEs are fixed. Actually the test has been removed. It is possible
> to see null session IDs with OpenSSL if session tickets are enabled.
>
> I've also added some code to the AccessLogValve that pre-populates the
> TLS attributes if the access log is configured to log them.
>
> So, as far as I can tell, everything should now be fixed.
>
> The complete set of fixes will be in 9.0.0.M23 onwards and 8.5.17 onwards.
>
> Ok, so I'll add my patch for APR to fix 60461 then ?

IMO, until the connection close waits for all streams to complete, which
may not be the best idea, the sync is going to be needed for all connectors.

Rémy
Loading...