[GitHub] [tomcat] kamnani opened a new pull request #352: Optimizing Resource Lookup using Bloom Filter

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
32 messages Options
12
Reply | Threaded
Open this post in threaded view
|

[GitHub] [tomcat] kamnani opened a new pull request #352: Optimizing Resource Lookup using Bloom Filter

GitBox

kamnani opened a new pull request #352:
URL: https://github.com/apache/tomcat/pull/352


   This is a redo of Previous PR: https://github.com/apache/tomcat/pull/348
   
   The following changes have been made based on the suggestions earlier:
   1) Flag can be passed through Host Configuration. By default it remains false and Tomcat's default behavior will be untouched. See example below.
   2) Jar Contents will be refreshed every 60 seconds if corresponding jar file has been modified.  
   3) PR is against master.
   
   Apologies in case something is missed.
   
   Example on Bloom Filter : https://llimllib.github.io/bloomfilter-tutorial/#:~:text=A%20Bloom%20filter%20is%20a,may%20be%20in%20the%20set.
   
   ```
   <Host name="localhost"  appBase="webapps"
               unpackWARs="true" autoDeploy="true" fastClasspathScanning="true">


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[GitHub] [tomcat] martin-g commented on a change in pull request #352: Optimizing Resource Lookup using Bloom Filter

GitBox

martin-g commented on a change in pull request #352:
URL: https://github.com/apache/tomcat/pull/352#discussion_r482996760



##########
File path: java/org/apache/catalina/webresources/JarContents.java
##########
@@ -0,0 +1,144 @@
+package org.apache.catalina.webresources;
+
+import java.util.BitSet;
+import java.util.Enumeration;
+import java.util.jar.JarEntry;
+import java.util.jar.JarFile;
+
+/**
+ * This class represents the contents of a jar by determining whether a given
+ * resource <b>might</b> be in the cache, based on a bloom filter. This is not a
+ * general-purpose bloom filter because it contains logic to strip out
+ * characters from the beginning of the key.
+ *
+ * The hash methods are simple but good enough for this purpose.
+ */
+public final class JarContents {
+    private final BitSet bits1;
+    private final BitSet bits2;
+    /**
+     * Constant used by a typical hashing method.
+     */
+    private static final int HASH_PRIME_1 = 31;
+
+    /**
+     * Constant used by a typical hashing method.
+     */
+    private static final int HASH_PRIME_2 = 17;
+
+    /**
+     * Size of the fixed-length bit table. Larger reduces false positives,
+     * smaller saves memory.
+     */
+    private static final int TABLE_SIZE = 2048;
+
+    /**
+     * Parses the passed-in jar and populates the bit array.
+     *
+     * @param jar
+     */
+    public JarContents(JarFile jar) {
+        Enumeration<JarEntry> entries = jar.entries();
+        bits1 = new BitSet(TABLE_SIZE);
+        bits2 = new BitSet(TABLE_SIZE);
+
+        // Enumerations. When will they update this API?!
+        while (entries.hasMoreElements()) {
+            JarEntry entry = entries.nextElement();
+            String name = entry.getName();
+            int startPos = 0;
+
+            // If the path starts with a slash, that's not useful information.
+            // Skipping it increases the significance of our key by
+            // removing an insignificant character.
+            boolean precedingSlash = name.charAt(0) == '/';
+            if (precedingSlash) {
+                startPos = 1;
+            }
+
+            // Find the correct table slot
+            int pathHash1 = hashcode1(name, startPos);
+            int pathHash2 = hashcode2(name, startPos);
+
+            bits1.set(pathHash1 % TABLE_SIZE);
+            bits2.set(pathHash2 % TABLE_SIZE);
+        }
+    }
+
+    /**
+     * Simple hashcode of a portion of the string. Typically we would use
+     * substring, but memory and runtime speed are critical.
+     *
+     * @param content
+     *            Wrapping String.
+     * @param startPos
+     *            First character in the range.
+     * @return hashcode of the range.
+     */
+    private int hashcode1(String content, int startPos) {
+        int h = 7;
+        int contentLength = content.length();
+        for (int i = startPos; i < contentLength; i++) {
+            h = HASH_PRIME_1 * h + content.charAt(i);
+        }
+
+        if (h < 0) {
+            h = h * -1;
+        }
+        return h;
+    }
+
+    /**
+     * Simple hashcode of a portion of the string. Typically we would use
+     * substring, but memory and runtime speed are critical.
+     *
+     * @param content
+     *            Wrapping String.
+     * @param startPos
+     *            First character in the range.
+     * @return hashcode of the range.
+     */
+    private int hashcode2(String content, int startPos) {
+        int h = 7;

Review comment:
       The method body looks the same as `hashcode1`. Why don't you add an additional parameter for the `HASH_PRIME` and remove one of them

##########
File path: java/org/apache/catalina/webresources/AbstractArchiveResourceSet.java
##########
@@ -211,6 +216,43 @@ public final WebResource getResource(String path) {
         String webAppMount = getWebAppMount();
         WebResourceRoot root = getRoot();
 
+
+        /*
+         * This initializes (when necessary) and checks the jarContents, which
+         * is a highly efficient index of the files stored in the jar. If
+         * jarContents reports that this resource definitely does not contain
+         * the path, we can end this method and move on to the next jar.
+         *
+         * Note: the initialization is thread-safe because multiple simultaneous
+         * threads will create a complete and valid copy, then set the shared
+         * pointer. This guarantees the shared pointer will always go to a
+         * valid object. The cost of multiple copies is small since only one of
+         * them will be long-lived.
+         */
+        try {
+            if ((root.getContext() != null) && (root.getContext().getParent()) != null &&
+                    (((Host) root.getContext().getParent()).getFastClasspathScanning())) {
+
+                if (jarContents == null ||
+                        ((System.currentTimeMillis() - prevJarOpenTime > jarRefreshTime) &&
+                            !openJarFile().equals(jar))) {
+                    try {
+                        jar = openJarFile();

Review comment:
       You call `openJarFile()` twice here. This will increase `org.apache.catalina.webresources.AbstractArchiveResourceSet#archiveUseCount` twice and later it won't be GC-ed by `org.apache.catalina.webresources.AbstractArchiveResourceSet#gc`.
   I think you need to call `org.apache.catalina.webresources.AbstractArchiveResourceSet#closeJarFile` after creating the JarContents




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[GitHub] [tomcat] kamnani commented on a change in pull request #352: Optimizing Resource Lookup using Bloom Filter

GitBox
In reply to this post by GitBox

kamnani commented on a change in pull request #352:
URL: https://github.com/apache/tomcat/pull/352#discussion_r483173883



##########
File path: java/org/apache/catalina/webresources/JarContents.java
##########
@@ -0,0 +1,144 @@
+package org.apache.catalina.webresources;
+
+import java.util.BitSet;
+import java.util.Enumeration;
+import java.util.jar.JarEntry;
+import java.util.jar.JarFile;
+
+/**
+ * This class represents the contents of a jar by determining whether a given
+ * resource <b>might</b> be in the cache, based on a bloom filter. This is not a
+ * general-purpose bloom filter because it contains logic to strip out
+ * characters from the beginning of the key.
+ *
+ * The hash methods are simple but good enough for this purpose.
+ */
+public final class JarContents {
+    private final BitSet bits1;
+    private final BitSet bits2;
+    /**
+     * Constant used by a typical hashing method.
+     */
+    private static final int HASH_PRIME_1 = 31;
+
+    /**
+     * Constant used by a typical hashing method.
+     */
+    private static final int HASH_PRIME_2 = 17;
+
+    /**
+     * Size of the fixed-length bit table. Larger reduces false positives,
+     * smaller saves memory.
+     */
+    private static final int TABLE_SIZE = 2048;
+
+    /**
+     * Parses the passed-in jar and populates the bit array.
+     *
+     * @param jar
+     */
+    public JarContents(JarFile jar) {
+        Enumeration<JarEntry> entries = jar.entries();
+        bits1 = new BitSet(TABLE_SIZE);
+        bits2 = new BitSet(TABLE_SIZE);
+
+        // Enumerations. When will they update this API?!
+        while (entries.hasMoreElements()) {
+            JarEntry entry = entries.nextElement();
+            String name = entry.getName();
+            int startPos = 0;
+
+            // If the path starts with a slash, that's not useful information.
+            // Skipping it increases the significance of our key by
+            // removing an insignificant character.
+            boolean precedingSlash = name.charAt(0) == '/';
+            if (precedingSlash) {
+                startPos = 1;
+            }
+
+            // Find the correct table slot
+            int pathHash1 = hashcode1(name, startPos);
+            int pathHash2 = hashcode2(name, startPos);
+
+            bits1.set(pathHash1 % TABLE_SIZE);
+            bits2.set(pathHash2 % TABLE_SIZE);
+        }
+    }
+
+    /**
+     * Simple hashcode of a portion of the string. Typically we would use
+     * substring, but memory and runtime speed are critical.
+     *
+     * @param content
+     *            Wrapping String.
+     * @param startPos
+     *            First character in the range.
+     * @return hashcode of the range.
+     */
+    private int hashcode1(String content, int startPos) {
+        int h = 7;
+        int contentLength = content.length();
+        for (int i = startPos; i < contentLength; i++) {
+            h = HASH_PRIME_1 * h + content.charAt(i);
+        }
+
+        if (h < 0) {
+            h = h * -1;
+        }
+        return h;
+    }
+
+    /**
+     * Simple hashcode of a portion of the string. Typically we would use
+     * substring, but memory and runtime speed are critical.
+     *
+     * @param content
+     *            Wrapping String.
+     * @param startPos
+     *            First character in the range.
+     * @return hashcode of the range.
+     */
+    private int hashcode2(String content, int startPos) {
+        int h = 7;

Review comment:
       Resolved with latest commit

##########
File path: java/org/apache/catalina/webresources/AbstractArchiveResourceSet.java
##########
@@ -211,6 +216,43 @@ public final WebResource getResource(String path) {
         String webAppMount = getWebAppMount();
         WebResourceRoot root = getRoot();
 
+
+        /*
+         * This initializes (when necessary) and checks the jarContents, which
+         * is a highly efficient index of the files stored in the jar. If
+         * jarContents reports that this resource definitely does not contain
+         * the path, we can end this method and move on to the next jar.
+         *
+         * Note: the initialization is thread-safe because multiple simultaneous
+         * threads will create a complete and valid copy, then set the shared
+         * pointer. This guarantees the shared pointer will always go to a
+         * valid object. The cost of multiple copies is small since only one of
+         * them will be long-lived.
+         */
+        try {
+            if ((root.getContext() != null) && (root.getContext().getParent()) != null &&
+                    (((Host) root.getContext().getParent()).getFastClasspathScanning())) {
+
+                if (jarContents == null ||
+                        ((System.currentTimeMillis() - prevJarOpenTime > jarRefreshTime) &&
+                            !openJarFile().equals(jar))) {
+                    try {
+                        jar = openJarFile();

Review comment:
       Resolved with latest commit




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[GitHub] [tomcat] kamnani commented on pull request #352: Optimizing Resource Lookup using Bloom Filter

GitBox
In reply to this post by GitBox

kamnani commented on pull request #352:
URL: https://github.com/apache/tomcat/pull/352#issuecomment-689825942


   @martin-g   Thanks for your time on this : )
   I have marked all of the conversations as resolved after the latest commits. Do we have any more feedback on this PR?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[GitHub] [tomcat] martin-g commented on a change in pull request #352: Optimizing Resource Lookup using Bloom Filter

GitBox
In reply to this post by GitBox

martin-g commented on a change in pull request #352:
URL: https://github.com/apache/tomcat/pull/352#discussion_r486991509



##########
File path: java/org/apache/catalina/webresources/AbstractArchiveResourceSet.java
##########
@@ -211,6 +216,46 @@ public final WebResource getResource(String path) {
         String webAppMount = getWebAppMount();
         WebResourceRoot root = getRoot();
 
+
+        /*
+         * This initializes (when necessary) and checks the jarContents, which
+         * is a highly efficient index of the files stored in the jar. If
+         * jarContents reports that this resource definitely does not contain
+         * the path, we can end this method and move on to the next jar.
+         *
+         * Note: the initialization is thread-safe because multiple simultaneous
+         * threads will create a complete and valid copy, then set the shared
+         * pointer. This guarantees the shared pointer will always go to a
+         * valid object. The cost of multiple copies is small since only one of
+         * them will be long-lived.
+         */
+        try {
+            if ((root.getContext() != null) && (root.getContext().getParent()) != null &&
+                    (((Host) root.getContext().getParent()).getFastClasspathScanning())) {
+
+                if (jarContents == null ||
+                        ((System.currentTimeMillis() - prevJarOpenTime > jarRefreshTime) &&
+                            !openJarFile().equals(jar))) {

Review comment:
       This will still leak an `opening` of the same jar. If the `equals()` returns `true` then it won't enter the body of the `if` and `closeJarFile()` won't be called.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[GitHub] [tomcat] martin-g commented on a change in pull request #352: Optimizing Resource Lookup using Bloom Filter

GitBox
In reply to this post by GitBox

martin-g commented on a change in pull request #352:
URL: https://github.com/apache/tomcat/pull/352#discussion_r486993681



##########
File path: java/org/apache/catalina/webresources/JarContents.java
##########
@@ -0,0 +1,122 @@
+package org.apache.catalina.webresources;
+
+import java.util.BitSet;
+import java.util.Enumeration;
+import java.util.jar.JarEntry;
+import java.util.jar.JarFile;
+
+/**
+ * This class represents the contents of a jar by determining whether a given
+ * resource <b>might</b> be in the cache, based on a bloom filter. This is not a
+ * general-purpose bloom filter because it contains logic to strip out
+ * characters from the beginning of the key.
+ *
+ * The hash methods are simple but good enough for this purpose.
+ */
+public final class JarContents {
+    private final BitSet bits1;
+    private final BitSet bits2;
+    /**
+     * Constant used by a typical hashing method.
+     */
+    private static final int HASH_PRIME_1 = 31;
+
+    /**
+     * Constant used by a typical hashing method.
+     */
+    private static final int HASH_PRIME_2 = 17;
+
+    /**
+     * Size of the fixed-length bit table. Larger reduces false positives,
+     * smaller saves memory.
+     */
+    private static final int TABLE_SIZE = 2048;
+
+    /**
+     * Parses the passed-in jar and populates the bit array.
+     *
+     * @param jar
+     */
+    public JarContents(JarFile jar) {
+        Enumeration<JarEntry> entries = jar.entries();
+        bits1 = new BitSet(TABLE_SIZE);
+        bits2 = new BitSet(TABLE_SIZE);
+
+        // Enumerations. When will they update this API?!
+        while (entries.hasMoreElements()) {
+            JarEntry entry = entries.nextElement();
+            String name = entry.getName();
+            int startPos = 0;
+
+            // If the path starts with a slash, that's not useful information.
+            // Skipping it increases the significance of our key by
+            // removing an insignificant character.
+            boolean precedingSlash = name.charAt(0) == '/';
+            if (precedingSlash) {
+                startPos = 1;
+            }
+
+            // Find the correct table slot
+            int pathHash1 = hashcode(name, startPos,HASH_PRIME_1);

Review comment:
       missing space before the third argument




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[GitHub] [tomcat] martin-g commented on a change in pull request #352: Optimizing Resource Lookup using Bloom Filter

GitBox
In reply to this post by GitBox

martin-g commented on a change in pull request #352:
URL: https://github.com/apache/tomcat/pull/352#discussion_r486991509



##########
File path: java/org/apache/catalina/webresources/AbstractArchiveResourceSet.java
##########
@@ -211,6 +216,46 @@ public final WebResource getResource(String path) {
         String webAppMount = getWebAppMount();
         WebResourceRoot root = getRoot();
 
+
+        /*
+         * This initializes (when necessary) and checks the jarContents, which
+         * is a highly efficient index of the files stored in the jar. If
+         * jarContents reports that this resource definitely does not contain
+         * the path, we can end this method and move on to the next jar.
+         *
+         * Note: the initialization is thread-safe because multiple simultaneous
+         * threads will create a complete and valid copy, then set the shared
+         * pointer. This guarantees the shared pointer will always go to a
+         * valid object. The cost of multiple copies is small since only one of
+         * them will be long-lived.
+         */
+        try {
+            if ((root.getContext() != null) && (root.getContext().getParent()) != null &&
+                    (((Host) root.getContext().getParent()).getFastClasspathScanning())) {
+
+                if (jarContents == null ||
+                        ((System.currentTimeMillis() - prevJarOpenTime > jarRefreshTime) &&
+                            !openJarFile().equals(jar))) {

Review comment:
       This will still leak an `opening` of the same jar. If the `equals()` returns `true` then it won't enter the body of the `if` and `closeJarFile()` won't be called.

##########
File path: java/org/apache/catalina/webresources/JarContents.java
##########
@@ -0,0 +1,122 @@
+package org.apache.catalina.webresources;
+
+import java.util.BitSet;
+import java.util.Enumeration;
+import java.util.jar.JarEntry;
+import java.util.jar.JarFile;
+
+/**
+ * This class represents the contents of a jar by determining whether a given
+ * resource <b>might</b> be in the cache, based on a bloom filter. This is not a
+ * general-purpose bloom filter because it contains logic to strip out
+ * characters from the beginning of the key.
+ *
+ * The hash methods are simple but good enough for this purpose.
+ */
+public final class JarContents {
+    private final BitSet bits1;
+    private final BitSet bits2;
+    /**
+     * Constant used by a typical hashing method.
+     */
+    private static final int HASH_PRIME_1 = 31;
+
+    /**
+     * Constant used by a typical hashing method.
+     */
+    private static final int HASH_PRIME_2 = 17;
+
+    /**
+     * Size of the fixed-length bit table. Larger reduces false positives,
+     * smaller saves memory.
+     */
+    private static final int TABLE_SIZE = 2048;
+
+    /**
+     * Parses the passed-in jar and populates the bit array.
+     *
+     * @param jar
+     */
+    public JarContents(JarFile jar) {
+        Enumeration<JarEntry> entries = jar.entries();
+        bits1 = new BitSet(TABLE_SIZE);
+        bits2 = new BitSet(TABLE_SIZE);
+
+        // Enumerations. When will they update this API?!
+        while (entries.hasMoreElements()) {
+            JarEntry entry = entries.nextElement();
+            String name = entry.getName();
+            int startPos = 0;
+
+            // If the path starts with a slash, that's not useful information.
+            // Skipping it increases the significance of our key by
+            // removing an insignificant character.
+            boolean precedingSlash = name.charAt(0) == '/';
+            if (precedingSlash) {
+                startPos = 1;
+            }
+
+            // Find the correct table slot
+            int pathHash1 = hashcode(name, startPos,HASH_PRIME_1);

Review comment:
       missing space before the third argument

##########
File path: java/org/apache/catalina/webresources/AbstractArchiveResourceSet.java
##########
@@ -211,6 +216,46 @@ public final WebResource getResource(String path) {
         String webAppMount = getWebAppMount();
         WebResourceRoot root = getRoot();
 
+
+        /*
+         * This initializes (when necessary) and checks the jarContents, which
+         * is a highly efficient index of the files stored in the jar. If
+         * jarContents reports that this resource definitely does not contain
+         * the path, we can end this method and move on to the next jar.
+         *
+         * Note: the initialization is thread-safe because multiple simultaneous
+         * threads will create a complete and valid copy, then set the shared
+         * pointer. This guarantees the shared pointer will always go to a
+         * valid object. The cost of multiple copies is small since only one of
+         * them will be long-lived.
+         */
+        try {
+            if ((root.getContext() != null) && (root.getContext().getParent()) != null &&
+                    (((Host) root.getContext().getParent()).getFastClasspathScanning())) {
+
+                if (jarContents == null ||
+                        ((System.currentTimeMillis() - prevJarOpenTime > jarRefreshTime) &&
+                            !openJarFile().equals(jar))) {

Review comment:
       This will still leak an `opening` of the same jar. If the `equals()` returns `true` then it won't enter the body of the `if` and `closeJarFile()` won't be called.

##########
File path: java/org/apache/catalina/webresources/JarContents.java
##########
@@ -0,0 +1,122 @@
+package org.apache.catalina.webresources;
+
+import java.util.BitSet;
+import java.util.Enumeration;
+import java.util.jar.JarEntry;
+import java.util.jar.JarFile;
+
+/**
+ * This class represents the contents of a jar by determining whether a given
+ * resource <b>might</b> be in the cache, based on a bloom filter. This is not a
+ * general-purpose bloom filter because it contains logic to strip out
+ * characters from the beginning of the key.
+ *
+ * The hash methods are simple but good enough for this purpose.
+ */
+public final class JarContents {
+    private final BitSet bits1;
+    private final BitSet bits2;
+    /**
+     * Constant used by a typical hashing method.
+     */
+    private static final int HASH_PRIME_1 = 31;
+
+    /**
+     * Constant used by a typical hashing method.
+     */
+    private static final int HASH_PRIME_2 = 17;
+
+    /**
+     * Size of the fixed-length bit table. Larger reduces false positives,
+     * smaller saves memory.
+     */
+    private static final int TABLE_SIZE = 2048;
+
+    /**
+     * Parses the passed-in jar and populates the bit array.
+     *
+     * @param jar
+     */
+    public JarContents(JarFile jar) {
+        Enumeration<JarEntry> entries = jar.entries();
+        bits1 = new BitSet(TABLE_SIZE);
+        bits2 = new BitSet(TABLE_SIZE);
+
+        // Enumerations. When will they update this API?!
+        while (entries.hasMoreElements()) {
+            JarEntry entry = entries.nextElement();
+            String name = entry.getName();
+            int startPos = 0;
+
+            // If the path starts with a slash, that's not useful information.
+            // Skipping it increases the significance of our key by
+            // removing an insignificant character.
+            boolean precedingSlash = name.charAt(0) == '/';
+            if (precedingSlash) {
+                startPos = 1;
+            }
+
+            // Find the correct table slot
+            int pathHash1 = hashcode(name, startPos,HASH_PRIME_1);

Review comment:
       missing space before the third argument




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[GitHub] [tomcat] martin-g commented on a change in pull request #352: Optimizing Resource Lookup using Bloom Filter

GitBox
In reply to this post by GitBox

martin-g commented on a change in pull request #352:
URL: https://github.com/apache/tomcat/pull/352#discussion_r486991509



##########
File path: java/org/apache/catalina/webresources/AbstractArchiveResourceSet.java
##########
@@ -211,6 +216,46 @@ public final WebResource getResource(String path) {
         String webAppMount = getWebAppMount();
         WebResourceRoot root = getRoot();
 
+
+        /*
+         * This initializes (when necessary) and checks the jarContents, which
+         * is a highly efficient index of the files stored in the jar. If
+         * jarContents reports that this resource definitely does not contain
+         * the path, we can end this method and move on to the next jar.
+         *
+         * Note: the initialization is thread-safe because multiple simultaneous
+         * threads will create a complete and valid copy, then set the shared
+         * pointer. This guarantees the shared pointer will always go to a
+         * valid object. The cost of multiple copies is small since only one of
+         * them will be long-lived.
+         */
+        try {
+            if ((root.getContext() != null) && (root.getContext().getParent()) != null &&
+                    (((Host) root.getContext().getParent()).getFastClasspathScanning())) {
+
+                if (jarContents == null ||
+                        ((System.currentTimeMillis() - prevJarOpenTime > jarRefreshTime) &&
+                            !openJarFile().equals(jar))) {

Review comment:
       This will still leak an `opening` of the same jar. If the `equals()` returns `true` then it won't enter the body of the `if` and `closeJarFile()` won't be called.

##########
File path: java/org/apache/catalina/webresources/JarContents.java
##########
@@ -0,0 +1,122 @@
+package org.apache.catalina.webresources;
+
+import java.util.BitSet;
+import java.util.Enumeration;
+import java.util.jar.JarEntry;
+import java.util.jar.JarFile;
+
+/**
+ * This class represents the contents of a jar by determining whether a given
+ * resource <b>might</b> be in the cache, based on a bloom filter. This is not a
+ * general-purpose bloom filter because it contains logic to strip out
+ * characters from the beginning of the key.
+ *
+ * The hash methods are simple but good enough for this purpose.
+ */
+public final class JarContents {
+    private final BitSet bits1;
+    private final BitSet bits2;
+    /**
+     * Constant used by a typical hashing method.
+     */
+    private static final int HASH_PRIME_1 = 31;
+
+    /**
+     * Constant used by a typical hashing method.
+     */
+    private static final int HASH_PRIME_2 = 17;
+
+    /**
+     * Size of the fixed-length bit table. Larger reduces false positives,
+     * smaller saves memory.
+     */
+    private static final int TABLE_SIZE = 2048;
+
+    /**
+     * Parses the passed-in jar and populates the bit array.
+     *
+     * @param jar
+     */
+    public JarContents(JarFile jar) {
+        Enumeration<JarEntry> entries = jar.entries();
+        bits1 = new BitSet(TABLE_SIZE);
+        bits2 = new BitSet(TABLE_SIZE);
+
+        // Enumerations. When will they update this API?!
+        while (entries.hasMoreElements()) {
+            JarEntry entry = entries.nextElement();
+            String name = entry.getName();
+            int startPos = 0;
+
+            // If the path starts with a slash, that's not useful information.
+            // Skipping it increases the significance of our key by
+            // removing an insignificant character.
+            boolean precedingSlash = name.charAt(0) == '/';
+            if (precedingSlash) {
+                startPos = 1;
+            }
+
+            // Find the correct table slot
+            int pathHash1 = hashcode(name, startPos,HASH_PRIME_1);

Review comment:
       missing space before the third argument

##########
File path: java/org/apache/catalina/webresources/AbstractArchiveResourceSet.java
##########
@@ -211,6 +216,46 @@ public final WebResource getResource(String path) {
         String webAppMount = getWebAppMount();
         WebResourceRoot root = getRoot();
 
+
+        /*
+         * This initializes (when necessary) and checks the jarContents, which
+         * is a highly efficient index of the files stored in the jar. If
+         * jarContents reports that this resource definitely does not contain
+         * the path, we can end this method and move on to the next jar.
+         *
+         * Note: the initialization is thread-safe because multiple simultaneous
+         * threads will create a complete and valid copy, then set the shared
+         * pointer. This guarantees the shared pointer will always go to a
+         * valid object. The cost of multiple copies is small since only one of
+         * them will be long-lived.
+         */
+        try {
+            if ((root.getContext() != null) && (root.getContext().getParent()) != null &&
+                    (((Host) root.getContext().getParent()).getFastClasspathScanning())) {
+
+                if (jarContents == null ||
+                        ((System.currentTimeMillis() - prevJarOpenTime > jarRefreshTime) &&
+                            !openJarFile().equals(jar))) {

Review comment:
       This will still leak an `opening` of the same jar. If the `equals()` returns `true` then it won't enter the body of the `if` and `closeJarFile()` won't be called.

##########
File path: java/org/apache/catalina/webresources/JarContents.java
##########
@@ -0,0 +1,122 @@
+package org.apache.catalina.webresources;
+
+import java.util.BitSet;
+import java.util.Enumeration;
+import java.util.jar.JarEntry;
+import java.util.jar.JarFile;
+
+/**
+ * This class represents the contents of a jar by determining whether a given
+ * resource <b>might</b> be in the cache, based on a bloom filter. This is not a
+ * general-purpose bloom filter because it contains logic to strip out
+ * characters from the beginning of the key.
+ *
+ * The hash methods are simple but good enough for this purpose.
+ */
+public final class JarContents {
+    private final BitSet bits1;
+    private final BitSet bits2;
+    /**
+     * Constant used by a typical hashing method.
+     */
+    private static final int HASH_PRIME_1 = 31;
+
+    /**
+     * Constant used by a typical hashing method.
+     */
+    private static final int HASH_PRIME_2 = 17;
+
+    /**
+     * Size of the fixed-length bit table. Larger reduces false positives,
+     * smaller saves memory.
+     */
+    private static final int TABLE_SIZE = 2048;
+
+    /**
+     * Parses the passed-in jar and populates the bit array.
+     *
+     * @param jar
+     */
+    public JarContents(JarFile jar) {
+        Enumeration<JarEntry> entries = jar.entries();
+        bits1 = new BitSet(TABLE_SIZE);
+        bits2 = new BitSet(TABLE_SIZE);
+
+        // Enumerations. When will they update this API?!
+        while (entries.hasMoreElements()) {
+            JarEntry entry = entries.nextElement();
+            String name = entry.getName();
+            int startPos = 0;
+
+            // If the path starts with a slash, that's not useful information.
+            // Skipping it increases the significance of our key by
+            // removing an insignificant character.
+            boolean precedingSlash = name.charAt(0) == '/';
+            if (precedingSlash) {
+                startPos = 1;
+            }
+
+            // Find the correct table slot
+            int pathHash1 = hashcode(name, startPos,HASH_PRIME_1);

Review comment:
       missing space before the third argument

##########
File path: java/org/apache/catalina/webresources/AbstractArchiveResourceSet.java
##########
@@ -211,6 +216,46 @@ public final WebResource getResource(String path) {
         String webAppMount = getWebAppMount();
         WebResourceRoot root = getRoot();
 
+
+        /*
+         * This initializes (when necessary) and checks the jarContents, which
+         * is a highly efficient index of the files stored in the jar. If
+         * jarContents reports that this resource definitely does not contain
+         * the path, we can end this method and move on to the next jar.
+         *
+         * Note: the initialization is thread-safe because multiple simultaneous
+         * threads will create a complete and valid copy, then set the shared
+         * pointer. This guarantees the shared pointer will always go to a
+         * valid object. The cost of multiple copies is small since only one of
+         * them will be long-lived.
+         */
+        try {
+            if ((root.getContext() != null) && (root.getContext().getParent()) != null &&
+                    (((Host) root.getContext().getParent()).getFastClasspathScanning())) {
+
+                if (jarContents == null ||
+                        ((System.currentTimeMillis() - prevJarOpenTime > jarRefreshTime) &&
+                            !openJarFile().equals(jar))) {

Review comment:
       This will still leak an `opening` of the same jar. If the `equals()` returns `true` then it won't enter the body of the `if` and `closeJarFile()` won't be called.

##########
File path: java/org/apache/catalina/webresources/JarContents.java
##########
@@ -0,0 +1,122 @@
+package org.apache.catalina.webresources;
+
+import java.util.BitSet;
+import java.util.Enumeration;
+import java.util.jar.JarEntry;
+import java.util.jar.JarFile;
+
+/**
+ * This class represents the contents of a jar by determining whether a given
+ * resource <b>might</b> be in the cache, based on a bloom filter. This is not a
+ * general-purpose bloom filter because it contains logic to strip out
+ * characters from the beginning of the key.
+ *
+ * The hash methods are simple but good enough for this purpose.
+ */
+public final class JarContents {
+    private final BitSet bits1;
+    private final BitSet bits2;
+    /**
+     * Constant used by a typical hashing method.
+     */
+    private static final int HASH_PRIME_1 = 31;
+
+    /**
+     * Constant used by a typical hashing method.
+     */
+    private static final int HASH_PRIME_2 = 17;
+
+    /**
+     * Size of the fixed-length bit table. Larger reduces false positives,
+     * smaller saves memory.
+     */
+    private static final int TABLE_SIZE = 2048;
+
+    /**
+     * Parses the passed-in jar and populates the bit array.
+     *
+     * @param jar
+     */
+    public JarContents(JarFile jar) {
+        Enumeration<JarEntry> entries = jar.entries();
+        bits1 = new BitSet(TABLE_SIZE);
+        bits2 = new BitSet(TABLE_SIZE);
+
+        // Enumerations. When will they update this API?!
+        while (entries.hasMoreElements()) {
+            JarEntry entry = entries.nextElement();
+            String name = entry.getName();
+            int startPos = 0;
+
+            // If the path starts with a slash, that's not useful information.
+            // Skipping it increases the significance of our key by
+            // removing an insignificant character.
+            boolean precedingSlash = name.charAt(0) == '/';
+            if (precedingSlash) {
+                startPos = 1;
+            }
+
+            // Find the correct table slot
+            int pathHash1 = hashcode(name, startPos,HASH_PRIME_1);

Review comment:
       missing space before the third argument

##########
File path: java/org/apache/catalina/webresources/AbstractArchiveResourceSet.java
##########
@@ -211,6 +216,46 @@ public final WebResource getResource(String path) {
         String webAppMount = getWebAppMount();
         WebResourceRoot root = getRoot();
 
+
+        /*
+         * This initializes (when necessary) and checks the jarContents, which
+         * is a highly efficient index of the files stored in the jar. If
+         * jarContents reports that this resource definitely does not contain
+         * the path, we can end this method and move on to the next jar.
+         *
+         * Note: the initialization is thread-safe because multiple simultaneous
+         * threads will create a complete and valid copy, then set the shared
+         * pointer. This guarantees the shared pointer will always go to a
+         * valid object. The cost of multiple copies is small since only one of
+         * them will be long-lived.
+         */
+        try {
+            if ((root.getContext() != null) && (root.getContext().getParent()) != null &&
+                    (((Host) root.getContext().getParent()).getFastClasspathScanning())) {
+
+                if (jarContents == null ||
+                        ((System.currentTimeMillis() - prevJarOpenTime > jarRefreshTime) &&
+                            !openJarFile().equals(jar))) {

Review comment:
       This will still leak an `opening` of the same jar. If the `equals()` returns `true` then it won't enter the body of the `if` and `closeJarFile()` won't be called.

##########
File path: java/org/apache/catalina/webresources/JarContents.java
##########
@@ -0,0 +1,122 @@
+package org.apache.catalina.webresources;
+
+import java.util.BitSet;
+import java.util.Enumeration;
+import java.util.jar.JarEntry;
+import java.util.jar.JarFile;
+
+/**
+ * This class represents the contents of a jar by determining whether a given
+ * resource <b>might</b> be in the cache, based on a bloom filter. This is not a
+ * general-purpose bloom filter because it contains logic to strip out
+ * characters from the beginning of the key.
+ *
+ * The hash methods are simple but good enough for this purpose.
+ */
+public final class JarContents {
+    private final BitSet bits1;
+    private final BitSet bits2;
+    /**
+     * Constant used by a typical hashing method.
+     */
+    private static final int HASH_PRIME_1 = 31;
+
+    /**
+     * Constant used by a typical hashing method.
+     */
+    private static final int HASH_PRIME_2 = 17;
+
+    /**
+     * Size of the fixed-length bit table. Larger reduces false positives,
+     * smaller saves memory.
+     */
+    private static final int TABLE_SIZE = 2048;
+
+    /**
+     * Parses the passed-in jar and populates the bit array.
+     *
+     * @param jar
+     */
+    public JarContents(JarFile jar) {
+        Enumeration<JarEntry> entries = jar.entries();
+        bits1 = new BitSet(TABLE_SIZE);
+        bits2 = new BitSet(TABLE_SIZE);
+
+        // Enumerations. When will they update this API?!
+        while (entries.hasMoreElements()) {
+            JarEntry entry = entries.nextElement();
+            String name = entry.getName();
+            int startPos = 0;
+
+            // If the path starts with a slash, that's not useful information.
+            // Skipping it increases the significance of our key by
+            // removing an insignificant character.
+            boolean precedingSlash = name.charAt(0) == '/';
+            if (precedingSlash) {
+                startPos = 1;
+            }
+
+            // Find the correct table slot
+            int pathHash1 = hashcode(name, startPos,HASH_PRIME_1);

Review comment:
       missing space before the third argument

##########
File path: java/org/apache/catalina/webresources/AbstractArchiveResourceSet.java
##########
@@ -211,6 +216,46 @@ public final WebResource getResource(String path) {
         String webAppMount = getWebAppMount();
         WebResourceRoot root = getRoot();
 
+
+        /*
+         * This initializes (when necessary) and checks the jarContents, which
+         * is a highly efficient index of the files stored in the jar. If
+         * jarContents reports that this resource definitely does not contain
+         * the path, we can end this method and move on to the next jar.
+         *
+         * Note: the initialization is thread-safe because multiple simultaneous
+         * threads will create a complete and valid copy, then set the shared
+         * pointer. This guarantees the shared pointer will always go to a
+         * valid object. The cost of multiple copies is small since only one of
+         * them will be long-lived.
+         */
+        try {
+            if ((root.getContext() != null) && (root.getContext().getParent()) != null &&
+                    (((Host) root.getContext().getParent()).getFastClasspathScanning())) {
+
+                if (jarContents == null ||
+                        ((System.currentTimeMillis() - prevJarOpenTime > jarRefreshTime) &&
+                            !openJarFile().equals(jar))) {

Review comment:
       This will still leak an `opening` of the same jar. If the `equals()` returns `true` then it won't enter the body of the `if` and `closeJarFile()` won't be called.

##########
File path: java/org/apache/catalina/webresources/JarContents.java
##########
@@ -0,0 +1,122 @@
+package org.apache.catalina.webresources;
+
+import java.util.BitSet;
+import java.util.Enumeration;
+import java.util.jar.JarEntry;
+import java.util.jar.JarFile;
+
+/**
+ * This class represents the contents of a jar by determining whether a given
+ * resource <b>might</b> be in the cache, based on a bloom filter. This is not a
+ * general-purpose bloom filter because it contains logic to strip out
+ * characters from the beginning of the key.
+ *
+ * The hash methods are simple but good enough for this purpose.
+ */
+public final class JarContents {
+    private final BitSet bits1;
+    private final BitSet bits2;
+    /**
+     * Constant used by a typical hashing method.
+     */
+    private static final int HASH_PRIME_1 = 31;
+
+    /**
+     * Constant used by a typical hashing method.
+     */
+    private static final int HASH_PRIME_2 = 17;
+
+    /**
+     * Size of the fixed-length bit table. Larger reduces false positives,
+     * smaller saves memory.
+     */
+    private static final int TABLE_SIZE = 2048;
+
+    /**
+     * Parses the passed-in jar and populates the bit array.
+     *
+     * @param jar
+     */
+    public JarContents(JarFile jar) {
+        Enumeration<JarEntry> entries = jar.entries();
+        bits1 = new BitSet(TABLE_SIZE);
+        bits2 = new BitSet(TABLE_SIZE);
+
+        // Enumerations. When will they update this API?!
+        while (entries.hasMoreElements()) {
+            JarEntry entry = entries.nextElement();
+            String name = entry.getName();
+            int startPos = 0;
+
+            // If the path starts with a slash, that's not useful information.
+            // Skipping it increases the significance of our key by
+            // removing an insignificant character.
+            boolean precedingSlash = name.charAt(0) == '/';
+            if (precedingSlash) {
+                startPos = 1;
+            }
+
+            // Find the correct table slot
+            int pathHash1 = hashcode(name, startPos,HASH_PRIME_1);

Review comment:
       missing space before the third argument

##########
File path: java/org/apache/catalina/webresources/AbstractArchiveResourceSet.java
##########
@@ -211,6 +216,46 @@ public final WebResource getResource(String path) {
         String webAppMount = getWebAppMount();
         WebResourceRoot root = getRoot();
 
+
+        /*
+         * This initializes (when necessary) and checks the jarContents, which
+         * is a highly efficient index of the files stored in the jar. If
+         * jarContents reports that this resource definitely does not contain
+         * the path, we can end this method and move on to the next jar.
+         *
+         * Note: the initialization is thread-safe because multiple simultaneous
+         * threads will create a complete and valid copy, then set the shared
+         * pointer. This guarantees the shared pointer will always go to a
+         * valid object. The cost of multiple copies is small since only one of
+         * them will be long-lived.
+         */
+        try {
+            if ((root.getContext() != null) && (root.getContext().getParent()) != null &&
+                    (((Host) root.getContext().getParent()).getFastClasspathScanning())) {
+
+                if (jarContents == null ||
+                        ((System.currentTimeMillis() - prevJarOpenTime > jarRefreshTime) &&
+                            !openJarFile().equals(jar))) {

Review comment:
       This will still leak an `opening` of the same jar. If the `equals()` returns `true` then it won't enter the body of the `if` and `closeJarFile()` won't be called.

##########
File path: java/org/apache/catalina/webresources/JarContents.java
##########
@@ -0,0 +1,122 @@
+package org.apache.catalina.webresources;
+
+import java.util.BitSet;
+import java.util.Enumeration;
+import java.util.jar.JarEntry;
+import java.util.jar.JarFile;
+
+/**
+ * This class represents the contents of a jar by determining whether a given
+ * resource <b>might</b> be in the cache, based on a bloom filter. This is not a
+ * general-purpose bloom filter because it contains logic to strip out
+ * characters from the beginning of the key.
+ *
+ * The hash methods are simple but good enough for this purpose.
+ */
+public final class JarContents {
+    private final BitSet bits1;
+    private final BitSet bits2;
+    /**
+     * Constant used by a typical hashing method.
+     */
+    private static final int HASH_PRIME_1 = 31;
+
+    /**
+     * Constant used by a typical hashing method.
+     */
+    private static final int HASH_PRIME_2 = 17;
+
+    /**
+     * Size of the fixed-length bit table. Larger reduces false positives,
+     * smaller saves memory.
+     */
+    private static final int TABLE_SIZE = 2048;
+
+    /**
+     * Parses the passed-in jar and populates the bit array.
+     *
+     * @param jar
+     */
+    public JarContents(JarFile jar) {
+        Enumeration<JarEntry> entries = jar.entries();
+        bits1 = new BitSet(TABLE_SIZE);
+        bits2 = new BitSet(TABLE_SIZE);
+
+        // Enumerations. When will they update this API?!
+        while (entries.hasMoreElements()) {
+            JarEntry entry = entries.nextElement();
+            String name = entry.getName();
+            int startPos = 0;
+
+            // If the path starts with a slash, that's not useful information.
+            // Skipping it increases the significance of our key by
+            // removing an insignificant character.
+            boolean precedingSlash = name.charAt(0) == '/';
+            if (precedingSlash) {
+                startPos = 1;
+            }
+
+            // Find the correct table slot
+            int pathHash1 = hashcode(name, startPos,HASH_PRIME_1);

Review comment:
       missing space before the third argument

##########
File path: java/org/apache/catalina/webresources/AbstractArchiveResourceSet.java
##########
@@ -211,6 +216,46 @@ public final WebResource getResource(String path) {
         String webAppMount = getWebAppMount();
         WebResourceRoot root = getRoot();
 
+
+        /*
+         * This initializes (when necessary) and checks the jarContents, which
+         * is a highly efficient index of the files stored in the jar. If
+         * jarContents reports that this resource definitely does not contain
+         * the path, we can end this method and move on to the next jar.
+         *
+         * Note: the initialization is thread-safe because multiple simultaneous
+         * threads will create a complete and valid copy, then set the shared
+         * pointer. This guarantees the shared pointer will always go to a
+         * valid object. The cost of multiple copies is small since only one of
+         * them will be long-lived.
+         */
+        try {
+            if ((root.getContext() != null) && (root.getContext().getParent()) != null &&
+                    (((Host) root.getContext().getParent()).getFastClasspathScanning())) {
+
+                if (jarContents == null ||
+                        ((System.currentTimeMillis() - prevJarOpenTime > jarRefreshTime) &&
+                            !openJarFile().equals(jar))) {

Review comment:
       This will still leak an `opening` of the same jar. If the `equals()` returns `true` then it won't enter the body of the `if` and `closeJarFile()` won't be called.

##########
File path: java/org/apache/catalina/webresources/JarContents.java
##########
@@ -0,0 +1,122 @@
+package org.apache.catalina.webresources;
+
+import java.util.BitSet;
+import java.util.Enumeration;
+import java.util.jar.JarEntry;
+import java.util.jar.JarFile;
+
+/**
+ * This class represents the contents of a jar by determining whether a given
+ * resource <b>might</b> be in the cache, based on a bloom filter. This is not a
+ * general-purpose bloom filter because it contains logic to strip out
+ * characters from the beginning of the key.
+ *
+ * The hash methods are simple but good enough for this purpose.
+ */
+public final class JarContents {
+    private final BitSet bits1;
+    private final BitSet bits2;
+    /**
+     * Constant used by a typical hashing method.
+     */
+    private static final int HASH_PRIME_1 = 31;
+
+    /**
+     * Constant used by a typical hashing method.
+     */
+    private static final int HASH_PRIME_2 = 17;
+
+    /**
+     * Size of the fixed-length bit table. Larger reduces false positives,
+     * smaller saves memory.
+     */
+    private static final int TABLE_SIZE = 2048;
+
+    /**
+     * Parses the passed-in jar and populates the bit array.
+     *
+     * @param jar
+     */
+    public JarContents(JarFile jar) {
+        Enumeration<JarEntry> entries = jar.entries();
+        bits1 = new BitSet(TABLE_SIZE);
+        bits2 = new BitSet(TABLE_SIZE);
+
+        // Enumerations. When will they update this API?!
+        while (entries.hasMoreElements()) {
+            JarEntry entry = entries.nextElement();
+            String name = entry.getName();
+            int startPos = 0;
+
+            // If the path starts with a slash, that's not useful information.
+            // Skipping it increases the significance of our key by
+            // removing an insignificant character.
+            boolean precedingSlash = name.charAt(0) == '/';
+            if (precedingSlash) {
+                startPos = 1;
+            }
+
+            // Find the correct table slot
+            int pathHash1 = hashcode(name, startPos,HASH_PRIME_1);

Review comment:
       missing space before the third argument




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[GitHub] [tomcat] martin-g commented on a change in pull request #352: Optimizing Resource Lookup using Bloom Filter

GitBox
In reply to this post by GitBox

martin-g commented on a change in pull request #352:
URL: https://github.com/apache/tomcat/pull/352#discussion_r489294641



##########
File path: java/org/apache/catalina/webresources/AbstractArchiveResourceSet.java
##########
@@ -211,6 +216,50 @@ public final WebResource getResource(String path) {
         String webAppMount = getWebAppMount();
         WebResourceRoot root = getRoot();
 
+
+        /*
+         * This initializes (when necessary) and checks the jarContents, which
+         * is a highly efficient index of the files stored in the jar. If
+         * jarContents reports that this resource definitely does not contain
+         * the path, we can end this method and move on to the next jar.
+         *
+         * Note: the initialization is thread-safe because multiple simultaneous
+         * threads will create a complete and valid copy, then set the shared
+         * pointer. This guarantees the shared pointer will always go to a
+         * valid object. The cost of multiple copies is small since only one of
+         * them will be long-lived.
+         */
+        try {
+            if ((root.getContext() != null) && (root.getContext().getParent()) != null &&
+                    (((Host) root.getContext().getParent()).getFastClasspathScanning())) {
+
+                if (jarContents == null ||
+                        ((System.currentTimeMillis() - prevJarOpenTime > jarRefreshTime))) {
+
+                    if (!openJarFile().equals(jar)) {
+                        try {
+                            jar = openJarFile();
+                            jarContents = new JarContents(jar);
+                            prevJarOpenTime = System.currentTimeMillis();
+                            closeJarFile();
+                        } catch (IOException ioe) {
+                            throw new RuntimeException("Unable to parse contents of JAR", ioe);
+                        } finally {
+                            closeJarFile();
+                        }
+                    } else {
+                        closeJarFile();
+                    }
+                }
+                if (!jarContents.mightContainResource(path, webAppMount)) {
+                    return new EmptyResource(root, path);
+                }
+            }
+        } catch (Exception e) {
+            // Do nothing

Review comment:
       why nothing ?
   IMO it should be logged as WARNING, or at least DEBUG




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[GitHub] [tomcat] martin-g commented on a change in pull request #352: Optimizing Resource Lookup using Bloom Filter

GitBox
In reply to this post by GitBox

martin-g commented on a change in pull request #352:
URL: https://github.com/apache/tomcat/pull/352#discussion_r489297103



##########
File path: test/org/apache/catalina/webresources/TestJarContents.java
##########
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.catalina.webresources;
+
+import java.io.File;
+
+import org.junit.Assert;
+import org.junit.Test;
+import java.util.jar.JarFile;
+
+import org.apache.catalina.WebResourceSet;
+
+/**
+ * @author Kamnani, Jatin
+ */
+public class TestJarContents {
+
+    @Test
+    public void testMightContainResource() {
+        try {
+            File empty = new File("test/webresources/dir3");
+            File jar = new File("test/webresources/dir1.jar");
+
+            TesterWebResourceRoot root = new TesterWebResourceRoot();
+
+            // Use empty dir for root of web app.
+            WebResourceSet webResourceSet = new DirResourceSet(root, "/", empty.getAbsolutePath(), "/");
+            root.setMainResources(webResourceSet);
+
+            // If this JAR was in a web application, this is equivalent to how it
+            // would be added
+            JarResourceSet test =
+                    new JarResourceSet(root, "/", jar.getAbsolutePath(), "/META-INF/resources");
+            test.setStaticOnly(true);
+            root.addJarResources(test);
+
+            JarContents testJarContentsObject = new JarContents(new JarFile("test/webresources/dir1.jar"));
+            Assert.assertTrue(testJarContentsObject.mightContainResource(
+                    "/d1/d1-f1.txt", jar.getAbsolutePath()));

Review comment:
       Please also add test assertions for:
   - `null`
   - "" (i.e. empty path)
   - "/", i.e. the root
   - path without leading `/`
   - any other possible weird value




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[GitHub] [tomcat] martin-g commented on a change in pull request #352: Optimizing Resource Lookup using Bloom Filter

GitBox
In reply to this post by GitBox

martin-g commented on a change in pull request #352:
URL: https://github.com/apache/tomcat/pull/352#discussion_r489297614



##########
File path: test/org/apache/catalina/webresources/TestJarContents.java
##########
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.catalina.webresources;
+
+import java.io.File;
+
+import org.junit.Assert;
+import org.junit.Test;
+import java.util.jar.JarFile;
+
+import org.apache.catalina.WebResourceSet;
+
+/**
+ * @author Kamnani, Jatin
+ */
+public class TestJarContents {
+
+    @Test
+    public void testMightContainResource() {
+        try {
+            File empty = new File("test/webresources/dir3");
+            File jar = new File("test/webresources/dir1.jar");
+
+            TesterWebResourceRoot root = new TesterWebResourceRoot();
+
+            // Use empty dir for root of web app.
+            WebResourceSet webResourceSet = new DirResourceSet(root, "/", empty.getAbsolutePath(), "/");
+            root.setMainResources(webResourceSet);
+
+            // If this JAR was in a web application, this is equivalent to how it
+            // would be added
+            JarResourceSet test =
+                    new JarResourceSet(root, "/", jar.getAbsolutePath(), "/META-INF/resources");
+            test.setStaticOnly(true);
+            root.addJarResources(test);
+
+            JarContents testJarContentsObject = new JarContents(new JarFile("test/webresources/dir1.jar"));
+            Assert.assertTrue(testJarContentsObject.mightContainResource(
+                    "/d1/d1-f1.txt", jar.getAbsolutePath()));
+
+            Assert.assertFalse(testJarContentsObject.mightContainResource(
+                    "/d7/d1-f1.txt", jar.getAbsolutePath()));
+        } catch (Exception e) {
+            Assert.fail("Error Happened while Testing JarContents, " + e);

Review comment:
       No need of capital letters for `H`appened and `T`esting.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[GitHub] [tomcat] martin-g commented on a change in pull request #352: Optimizing Resource Lookup using Bloom Filter

GitBox
In reply to this post by GitBox

martin-g commented on a change in pull request #352:
URL: https://github.com/apache/tomcat/pull/352#discussion_r489297865



##########
File path: test/org/apache/catalina/webresources/TestJarContents.java
##########
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.catalina.webresources;
+
+import java.io.File;
+
+import org.junit.Assert;
+import org.junit.Test;
+import java.util.jar.JarFile;
+
+import org.apache.catalina.WebResourceSet;
+
+/**
+ * @author Kamnani, Jatin
+ */
+public class TestJarContents {
+
+    @Test
+    public void testMightContainResource() {
+        try {
+            File empty = new File("test/webresources/dir3");
+            File jar = new File("test/webresources/dir1.jar");
+
+            TesterWebResourceRoot root = new TesterWebResourceRoot();
+
+            // Use empty dir for root of web app.
+            WebResourceSet webResourceSet = new DirResourceSet(root, "/", empty.getAbsolutePath(), "/");
+            root.setMainResources(webResourceSet);
+
+            // If this JAR was in a web application, this is equivalent to how it
+            // would be added
+            JarResourceSet test =
+                    new JarResourceSet(root, "/", jar.getAbsolutePath(), "/META-INF/resources");
+            test.setStaticOnly(true);
+            root.addJarResources(test);
+
+            JarContents testJarContentsObject = new JarContents(new JarFile("test/webresources/dir1.jar"));
+            Assert.assertTrue(testJarContentsObject.mightContainResource(
+                    "/d1/d1-f1.txt", jar.getAbsolutePath()));
+
+            Assert.assertFalse(testJarContentsObject.mightContainResource(
+                    "/d7/d1-f1.txt", jar.getAbsolutePath()));
+        } catch (Exception e) {
+            Assert.fail("Error Happened while Testing JarContents, " + e);

Review comment:
       Use `e.getMessage()` instead of `e.toString()` (implicitly)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[GitHub] [tomcat] kamnani commented on a change in pull request #352: Optimizing Resource Lookup using Bloom Filter

GitBox
In reply to this post by GitBox

kamnani commented on a change in pull request #352:
URL: https://github.com/apache/tomcat/pull/352#discussion_r489609047



##########
File path: test/org/apache/catalina/webresources/TestJarContents.java
##########
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.catalina.webresources;
+
+import java.io.File;
+
+import org.junit.Assert;
+import org.junit.Test;
+import java.util.jar.JarFile;
+
+import org.apache.catalina.WebResourceSet;
+
+/**
+ * @author Kamnani, Jatin
+ */
+public class TestJarContents {
+
+    @Test
+    public void testMightContainResource() {
+        try {
+            File empty = new File("test/webresources/dir3");
+            File jar = new File("test/webresources/dir1.jar");
+
+            TesterWebResourceRoot root = new TesterWebResourceRoot();
+
+            // Use empty dir for root of web app.
+            WebResourceSet webResourceSet = new DirResourceSet(root, "/", empty.getAbsolutePath(), "/");
+            root.setMainResources(webResourceSet);
+
+            // If this JAR was in a web application, this is equivalent to how it
+            // would be added
+            JarResourceSet test =
+                    new JarResourceSet(root, "/", jar.getAbsolutePath(), "/META-INF/resources");
+            test.setStaticOnly(true);
+            root.addJarResources(test);
+
+            JarContents testJarContentsObject = new JarContents(new JarFile("test/webresources/dir1.jar"));
+            Assert.assertTrue(testJarContentsObject.mightContainResource(
+                    "/d1/d1-f1.txt", jar.getAbsolutePath()));
+
+            Assert.assertFalse(testJarContentsObject.mightContainResource(
+                    "/d7/d1-f1.txt", jar.getAbsolutePath()));
+        } catch (Exception e) {
+            Assert.fail("Error Happened while Testing JarContents, " + e);

Review comment:
       Thanks for the suggestions :)
   Resolved with the latest commit.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[GitHub] [tomcat] kamnani commented on a change in pull request #352: Optimizing Resource Lookup using Bloom Filter

GitBox
In reply to this post by GitBox

kamnani commented on a change in pull request #352:
URL: https://github.com/apache/tomcat/pull/352#discussion_r489609047



##########
File path: test/org/apache/catalina/webresources/TestJarContents.java
##########
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.catalina.webresources;
+
+import java.io.File;
+
+import org.junit.Assert;
+import org.junit.Test;
+import java.util.jar.JarFile;
+
+import org.apache.catalina.WebResourceSet;
+
+/**
+ * @author Kamnani, Jatin
+ */
+public class TestJarContents {
+
+    @Test
+    public void testMightContainResource() {
+        try {
+            File empty = new File("test/webresources/dir3");
+            File jar = new File("test/webresources/dir1.jar");
+
+            TesterWebResourceRoot root = new TesterWebResourceRoot();
+
+            // Use empty dir for root of web app.
+            WebResourceSet webResourceSet = new DirResourceSet(root, "/", empty.getAbsolutePath(), "/");
+            root.setMainResources(webResourceSet);
+
+            // If this JAR was in a web application, this is equivalent to how it
+            // would be added
+            JarResourceSet test =
+                    new JarResourceSet(root, "/", jar.getAbsolutePath(), "/META-INF/resources");
+            test.setStaticOnly(true);
+            root.addJarResources(test);
+
+            JarContents testJarContentsObject = new JarContents(new JarFile("test/webresources/dir1.jar"));
+            Assert.assertTrue(testJarContentsObject.mightContainResource(
+                    "/d1/d1-f1.txt", jar.getAbsolutePath()));
+
+            Assert.assertFalse(testJarContentsObject.mightContainResource(
+                    "/d7/d1-f1.txt", jar.getAbsolutePath()));
+        } catch (Exception e) {
+            Assert.fail("Error Happened while Testing JarContents, " + e);

Review comment:
       Thanks for the suggestions @martin-g  :)
   Resolved with the latest commit.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[GitHub] [tomcat] kamnani commented on a change in pull request #352: Optimizing Resource Lookup using Bloom Filter

GitBox
In reply to this post by GitBox

kamnani commented on a change in pull request #352:
URL: https://github.com/apache/tomcat/pull/352#discussion_r489609423



##########
File path: test/org/apache/catalina/webresources/TestJarContents.java
##########
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.catalina.webresources;
+
+import java.io.File;
+
+import org.junit.Assert;
+import org.junit.Test;
+import java.util.jar.JarFile;
+
+import org.apache.catalina.WebResourceSet;
+
+/**
+ * @author Kamnani, Jatin
+ */
+public class TestJarContents {
+
+    @Test
+    public void testMightContainResource() {
+        try {
+            File empty = new File("test/webresources/dir3");
+            File jar = new File("test/webresources/dir1.jar");
+
+            TesterWebResourceRoot root = new TesterWebResourceRoot();
+
+            // Use empty dir for root of web app.
+            WebResourceSet webResourceSet = new DirResourceSet(root, "/", empty.getAbsolutePath(), "/");
+            root.setMainResources(webResourceSet);
+
+            // If this JAR was in a web application, this is equivalent to how it
+            // would be added
+            JarResourceSet test =
+                    new JarResourceSet(root, "/", jar.getAbsolutePath(), "/META-INF/resources");
+            test.setStaticOnly(true);
+            root.addJarResources(test);
+
+            JarContents testJarContentsObject = new JarContents(new JarFile("test/webresources/dir1.jar"));
+            Assert.assertTrue(testJarContentsObject.mightContainResource(
+                    "/d1/d1-f1.txt", jar.getAbsolutePath()));

Review comment:
       Added more test cases in the latest commit.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[GitHub] [tomcat] kamnani commented on a change in pull request #352: Optimizing Resource Lookup using Bloom Filter

GitBox
In reply to this post by GitBox

kamnani commented on a change in pull request #352:
URL: https://github.com/apache/tomcat/pull/352#discussion_r489609780



##########
File path: java/org/apache/catalina/webresources/AbstractArchiveResourceSet.java
##########
@@ -211,6 +216,50 @@ public final WebResource getResource(String path) {
         String webAppMount = getWebAppMount();
         WebResourceRoot root = getRoot();
 
+
+        /*
+         * This initializes (when necessary) and checks the jarContents, which
+         * is a highly efficient index of the files stored in the jar. If
+         * jarContents reports that this resource definitely does not contain
+         * the path, we can end this method and move on to the next jar.
+         *
+         * Note: the initialization is thread-safe because multiple simultaneous
+         * threads will create a complete and valid copy, then set the shared
+         * pointer. This guarantees the shared pointer will always go to a
+         * valid object. The cost of multiple copies is small since only one of
+         * them will be long-lived.
+         */
+        try {
+            if ((root.getContext() != null) && (root.getContext().getParent()) != null &&
+                    (((Host) root.getContext().getParent()).getFastClasspathScanning())) {
+
+                if (jarContents == null ||
+                        ((System.currentTimeMillis() - prevJarOpenTime > jarRefreshTime))) {
+
+                    if (!openJarFile().equals(jar)) {
+                        try {
+                            jar = openJarFile();
+                            jarContents = new JarContents(jar);
+                            prevJarOpenTime = System.currentTimeMillis();
+                            closeJarFile();
+                        } catch (IOException ioe) {
+                            throw new RuntimeException("Unable to parse contents of JAR", ioe);
+                        } finally {
+                            closeJarFile();
+                        }
+                    } else {
+                        closeJarFile();
+                    }
+                }
+                if (!jarContents.mightContainResource(path, webAppMount)) {
+                    return new EmptyResource(root, path);
+                }
+            }
+        } catch (Exception e) {
+            // Do nothing

Review comment:
       Added a logger message to this. Thanks for the suggestions :)

##########
File path: java/org/apache/catalina/webresources/JarContents.java
##########
@@ -0,0 +1,122 @@
+package org.apache.catalina.webresources;
+
+import java.util.BitSet;
+import java.util.Enumeration;
+import java.util.jar.JarEntry;
+import java.util.jar.JarFile;
+
+/**
+ * This class represents the contents of a jar by determining whether a given
+ * resource <b>might</b> be in the cache, based on a bloom filter. This is not a
+ * general-purpose bloom filter because it contains logic to strip out
+ * characters from the beginning of the key.
+ *
+ * The hash methods are simple but good enough for this purpose.
+ */
+public final class JarContents {
+    private final BitSet bits1;
+    private final BitSet bits2;
+    /**
+     * Constant used by a typical hashing method.
+     */
+    private static final int HASH_PRIME_1 = 31;
+
+    /**
+     * Constant used by a typical hashing method.
+     */
+    private static final int HASH_PRIME_2 = 17;
+
+    /**
+     * Size of the fixed-length bit table. Larger reduces false positives,
+     * smaller saves memory.
+     */
+    private static final int TABLE_SIZE = 2048;
+
+    /**
+     * Parses the passed-in jar and populates the bit array.
+     *
+     * @param jar
+     */
+    public JarContents(JarFile jar) {
+        Enumeration<JarEntry> entries = jar.entries();
+        bits1 = new BitSet(TABLE_SIZE);
+        bits2 = new BitSet(TABLE_SIZE);
+
+        // Enumerations. When will they update this API?!
+        while (entries.hasMoreElements()) {
+            JarEntry entry = entries.nextElement();
+            String name = entry.getName();
+            int startPos = 0;
+
+            // If the path starts with a slash, that's not useful information.
+            // Skipping it increases the significance of our key by
+            // removing an insignificant character.
+            boolean precedingSlash = name.charAt(0) == '/';
+            if (precedingSlash) {
+                startPos = 1;
+            }
+
+            // Find the correct table slot
+            int pathHash1 = hashcode(name, startPos,HASH_PRIME_1);

Review comment:
       Resolved with the latest commit




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[GitHub] [tomcat] kamnani commented on a change in pull request #352: Optimizing Resource Lookup using Bloom Filter

GitBox
In reply to this post by GitBox

kamnani commented on a change in pull request #352:
URL: https://github.com/apache/tomcat/pull/352#discussion_r489610039



##########
File path: java/org/apache/catalina/webresources/AbstractArchiveResourceSet.java
##########
@@ -211,6 +216,46 @@ public final WebResource getResource(String path) {
         String webAppMount = getWebAppMount();
         WebResourceRoot root = getRoot();
 
+
+        /*
+         * This initializes (when necessary) and checks the jarContents, which
+         * is a highly efficient index of the files stored in the jar. If
+         * jarContents reports that this resource definitely does not contain
+         * the path, we can end this method and move on to the next jar.
+         *
+         * Note: the initialization is thread-safe because multiple simultaneous
+         * threads will create a complete and valid copy, then set the shared
+         * pointer. This guarantees the shared pointer will always go to a
+         * valid object. The cost of multiple copies is small since only one of
+         * them will be long-lived.
+         */
+        try {
+            if ((root.getContext() != null) && (root.getContext().getParent()) != null &&
+                    (((Host) root.getContext().getParent()).getFastClasspathScanning())) {
+
+                if (jarContents == null ||
+                        ((System.currentTimeMillis() - prevJarOpenTime > jarRefreshTime) &&
+                            !openJarFile().equals(jar))) {

Review comment:
       Resolved with the latest commit.
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[GitHub] [tomcat] FSchumacher commented on a change in pull request #352: Optimizing Resource Lookup using Bloom Filter

GitBox
In reply to this post by GitBox

FSchumacher commented on a change in pull request #352:
URL: https://github.com/apache/tomcat/pull/352#discussion_r489610355



##########
File path: java/org/apache/catalina/webresources/JarContents.java
##########
@@ -0,0 +1,122 @@
+package org.apache.catalina.webresources;
+
+import java.util.BitSet;
+import java.util.Enumeration;
+import java.util.jar.JarEntry;
+import java.util.jar.JarFile;
+
+/**
+ * This class represents the contents of a jar by determining whether a given
+ * resource <b>might</b> be in the cache, based on a bloom filter. This is not a
+ * general-purpose bloom filter because it contains logic to strip out
+ * characters from the beginning of the key.
+ *
+ * The hash methods are simple but good enough for this purpose.
+ */
+public final class JarContents {
+    private final BitSet bits1;
+    private final BitSet bits2;
+    /**
+     * Constant used by a typical hashing method.
+     */
+    private static final int HASH_PRIME_1 = 31;
+
+    /**
+     * Constant used by a typical hashing method.
+     */
+    private static final int HASH_PRIME_2 = 17;
+
+    /**
+     * Size of the fixed-length bit table. Larger reduces false positives,
+     * smaller saves memory.
+     */
+    private static final int TABLE_SIZE = 2048;
+
+    /**
+     * Parses the passed-in jar and populates the bit array.
+     *
+     * @param jar
+     */
+    public JarContents(JarFile jar) {
+        Enumeration<JarEntry> entries = jar.entries();
+        bits1 = new BitSet(TABLE_SIZE);
+        bits2 = new BitSet(TABLE_SIZE);
+
+        // Enumerations. When will they update this API?!
+        while (entries.hasMoreElements()) {
+            JarEntry entry = entries.nextElement();
+            String name = entry.getName();
+            int startPos = 0;
+
+            // If the path starts with a slash, that's not useful information.
+            // Skipping it increases the significance of our key by
+            // removing an insignificant character.
+            boolean precedingSlash = name.charAt(0) == '/';
+            if (precedingSlash) {
+                startPos = 1;
+            }
+
+            // Find the correct table slot
+            int pathHash1 = hashcode(name, startPos, HASH_PRIME_1);
+            int pathHash2 = hashcode(name, startPos, HASH_PRIME_2);
+
+            bits1.set(pathHash1 % TABLE_SIZE);
+            bits2.set(pathHash2 % TABLE_SIZE);
+        }
+    }
+
+    /**
+     * Simple hashcode of a portion of the string. Typically we would use
+     * substring, but memory and runtime speed are critical.
+     *
+     * @param content
+     *            Wrapping String.
+     * @param startPos
+     *            First character in the range.
+     * @return hashcode of the range.
+     */
+    private int hashcode(String content, int startPos, int hashPrime) {
+        int h = hashPrime/2;
+        int contentLength = content.length();
+        for (int i = startPos; i < contentLength; i++) {
+            h = hashPrime * h + content.charAt(i);
+        }
+
+        if (h < 0) {
+            h = h * -1;
+        }
+        return h;
+    }
+
+
+    /**
+     * Method that identifies whether a given path <b>MIGHT</b> be in this jar.
+     * Uses the Bloom filter mechanism.
+     *
+     * @param path
+     *            Requested path. Sometimes starts with "/WEB-INF/classes".
+     * @param webappRoot
+     *            The value of the webapp location, which can be stripped from
+     *            the path. Typically is "/WEB-INF/classes".
+     * @return Whether the prefix of the path is known to be in this jar.
+     */
+    public final boolean mightContainResource(String path, String webappRoot) {
+        int startPos = 0;
+        if (path.startsWith(webappRoot)) {
+            startPos = webappRoot.length();
+        }
+
+        if (path.charAt(startPos) == '/') {
+            // ignore leading slash
+            startPos++;
+        }
+
+        // Find the correct table slot
+        int pathHash1 = hashcode(path, startPos, HASH_PRIME_1);
+        int pathHash2 = hashcode(path, startPos, HASH_PRIME_2);
+        boolean probablyPresent = (bits1.get(pathHash1 % TABLE_SIZE) && bits2.get(pathHash2 % TABLE_SIZE));

Review comment:
       Wouldn't it be even faster, when you calculate pathHash2 lazyly, i.e. inlining the two int variables and use the fact, that the `&&` operator is lazy?
   The parenthesis seem to be not needed here, too.
   And last, why not return the result directly?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[GitHub] [tomcat] kamnani commented on a change in pull request #352: Optimizing Resource Lookup using Bloom Filter

GitBox
In reply to this post by GitBox

kamnani commented on a change in pull request #352:
URL: https://github.com/apache/tomcat/pull/352#discussion_r489619352



##########
File path: java/org/apache/catalina/webresources/JarContents.java
##########
@@ -0,0 +1,122 @@
+package org.apache.catalina.webresources;
+
+import java.util.BitSet;
+import java.util.Enumeration;
+import java.util.jar.JarEntry;
+import java.util.jar.JarFile;
+
+/**
+ * This class represents the contents of a jar by determining whether a given
+ * resource <b>might</b> be in the cache, based on a bloom filter. This is not a
+ * general-purpose bloom filter because it contains logic to strip out
+ * characters from the beginning of the key.
+ *
+ * The hash methods are simple but good enough for this purpose.
+ */
+public final class JarContents {
+    private final BitSet bits1;
+    private final BitSet bits2;
+    /**
+     * Constant used by a typical hashing method.
+     */
+    private static final int HASH_PRIME_1 = 31;
+
+    /**
+     * Constant used by a typical hashing method.
+     */
+    private static final int HASH_PRIME_2 = 17;
+
+    /**
+     * Size of the fixed-length bit table. Larger reduces false positives,
+     * smaller saves memory.
+     */
+    private static final int TABLE_SIZE = 2048;
+
+    /**
+     * Parses the passed-in jar and populates the bit array.
+     *
+     * @param jar
+     */
+    public JarContents(JarFile jar) {
+        Enumeration<JarEntry> entries = jar.entries();
+        bits1 = new BitSet(TABLE_SIZE);
+        bits2 = new BitSet(TABLE_SIZE);
+
+        // Enumerations. When will they update this API?!
+        while (entries.hasMoreElements()) {
+            JarEntry entry = entries.nextElement();
+            String name = entry.getName();
+            int startPos = 0;
+
+            // If the path starts with a slash, that's not useful information.
+            // Skipping it increases the significance of our key by
+            // removing an insignificant character.
+            boolean precedingSlash = name.charAt(0) == '/';
+            if (precedingSlash) {
+                startPos = 1;
+            }
+
+            // Find the correct table slot
+            int pathHash1 = hashcode(name, startPos, HASH_PRIME_1);
+            int pathHash2 = hashcode(name, startPos, HASH_PRIME_2);
+
+            bits1.set(pathHash1 % TABLE_SIZE);
+            bits2.set(pathHash2 % TABLE_SIZE);
+        }
+    }
+
+    /**
+     * Simple hashcode of a portion of the string. Typically we would use
+     * substring, but memory and runtime speed are critical.
+     *
+     * @param content
+     *            Wrapping String.
+     * @param startPos
+     *            First character in the range.
+     * @return hashcode of the range.
+     */
+    private int hashcode(String content, int startPos, int hashPrime) {
+        int h = hashPrime/2;
+        int contentLength = content.length();
+        for (int i = startPos; i < contentLength; i++) {
+            h = hashPrime * h + content.charAt(i);
+        }
+
+        if (h < 0) {
+            h = h * -1;
+        }
+        return h;
+    }
+
+
+    /**
+     * Method that identifies whether a given path <b>MIGHT</b> be in this jar.
+     * Uses the Bloom filter mechanism.
+     *
+     * @param path
+     *            Requested path. Sometimes starts with "/WEB-INF/classes".
+     * @param webappRoot
+     *            The value of the webapp location, which can be stripped from
+     *            the path. Typically is "/WEB-INF/classes".
+     * @return Whether the prefix of the path is known to be in this jar.
+     */
+    public final boolean mightContainResource(String path, String webappRoot) {
+        int startPos = 0;
+        if (path.startsWith(webappRoot)) {
+            startPos = webappRoot.length();
+        }
+
+        if (path.charAt(startPos) == '/') {
+            // ignore leading slash
+            startPos++;
+        }
+
+        // Find the correct table slot
+        int pathHash1 = hashcode(path, startPos, HASH_PRIME_1);
+        int pathHash2 = hashcode(path, startPos, HASH_PRIME_2);
+        boolean probablyPresent = (bits1.get(pathHash1 % TABLE_SIZE) && bits2.get(pathHash2 % TABLE_SIZE));

Review comment:
       Thanks for the suggestion @FSchumacher 👍
   I have made this change in the latest commit.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[GitHub] [tomcat] kamnani commented on pull request #352: Optimizing Resource Lookup using Bloom Filter

GitBox
In reply to this post by GitBox

kamnani commented on pull request #352:
URL: https://github.com/apache/tomcat/pull/352#issuecomment-694940080


   @martin-g Thanks for your valuable feedback.
   I have added the test cases and marked the conversations as resolved after the latest commits. Do we have anything else on this PR?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[hidden email]



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

12