Greetings,
First, apologies for a non-Tomcat question. I have seen that there is enough expertise here to provide hints, and hints are what I am looking for to solve the problem; the question is generic enough. I have tried researching the problem to the best of my abilities.

It all happens on Ubuntu 20.04 and JDK 15.

We have a Java program that regularly throws a "java.lang.OutOfMemoryError: Java heap space" exception. The puzzling point is that it happens on only one VM. We have a set of two VMs/boxes spawned from the same AWS image. The machine class and region are exactly the same, and since they are from the same image, they should be mostly identical except for things like host name, IP address, etc.

The number of tasks performed by the VMs is comparable, with no significant difference. Yet one VM never runs out of memory and the other one does. Sometimes it happens as soon as half an hour after restarting the process, while on the other box the process runs for days with no issues.

I took memory dumps from both VMs and they look similar. The program is started with the -Xmx1g flag, and we have taken regular memory dumps. In many cases Eclipse MAT reports that total memory usage was less than 100MB when the program crashed with the out-of-memory exception.

Has anyone seen anything similar to this? Identical bits of code behaving differently? What else should I be looking for?

Regards,
Niranjan

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]
On 2/18/2021 11:36 AM, Niranjan Rao wrote:
> First apologies for non tomcat question. I have seen that there is
> enough expertise here to provide hints and hints are what I am looking
> for to solve the problem and question is generic enough. I have tried
> researching problem to best of my abilities.

I believe you're right to think this isn't a tomcat question. There are a lot of things it could be. Tomcat is a *possible* source, though I think the chance of that is low. Without a LOT of info that I would probably be useless at interpreting or asking for, it's impossible to say for sure.

With problems like this, it is normally the application running inside Tomcat that has a problem, not Tomcat itself. You're likely to get a lot more useful information if you go to the people responsible for those applications.

> We have a java program that regularly throws
> "java.lang.OutOfMemoryError: Java heap space" exception. Puzzling point
> is it happens only on one VM. We have a set of two VMs/boxes spawned
> from same AWS image. Machine class/region is exactly same and since they
> are from same image, they should be mostly identical except stuff like
> host name, ip address etc.
>
> Number of tasks performed by VMs are comparable and not a significant
> difference. Yet, one VM never runs of out of memory and other one does.
> Sometimes it's as soon as half an hour after restarting the process
> while on the other box process is running for days and no issues.

"Comparable" isn't "identical".

Are they running the same apps? Which apps are involved? Is the one that's throwing OOME handling substantially similar requests when compared to one that doesn't? Is the request rate nearly the same, or is the problematic one handling a lot more? Another applicable question, also off topic for this mailing list: Are the apps in both cases configured identically?

> I took memory dumps from both VMs and they look similar. Program is
> started with -Xmx1g flag and we have taken regular memory dumps. In many
> cases eclipse MAT reports total memory usage was less than 100MB when
> program crashed with out of memory exception.

That's extremely odd, unless the application requested a REALLY big chunk of memory such that the 100MB existing plus the new allocation would be larger than the max heap size of 1GB.

Do you have enough free memory that you could increase the max heap to 2GB or beyond and see what happens?

> Has anyone seen anything similar to this? Identical bits of code
> behaving differently? What else should I be looking for?

Earlier you said "comparable" and now you're saying "identical". So I have to ask ... which is it? Remember that differences in configurations, types of requests, and request load can lead to very different requirements, even if the apps running inside Tomcat are the same.

Most of my experience in the Java world comes from Solr. Apache Solr is a servlet application, and ships with Jetty. Tomcat is not usually involved. I joined this mailing list because I was responsible for Tomcat servers running apps developed in-house, and every once in a while, I needed to ask something tomcat-specific.

Thanks,
Shawn
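To illustrate the "REALLY big chunk" scenario mentioned above, here is a minimal sketch of how a single oversized request can fail while the heap is otherwise almost empty. The class name BigAllocationDemo and the method tryAllocate are made up for this illustration, and nothing in the thread confirms that this is what the program in question does.

```java
public class BigAllocationDemo {

    /**
     * Tries to allocate a single long[] of roughly the given size in bytes.
     * Returns true on success, false if the request cannot fit in one array
     * or the JVM throws OutOfMemoryError.
     */
    public static boolean tryAllocate(long bytes) {
        if (bytes < 0) {
            return false;
        }
        long elements = bytes / Long.BYTES;
        if (elements > Integer.MAX_VALUE - 8) {
            return false; // larger than a single Java array can be
        }
        try {
            long[] block = new long[(int) elements];
            return block.length == elements;
        } catch (OutOfMemoryError e) {
            // The heap may be nearly empty here; one oversized request is enough.
            System.out.println("OutOfMemoryError: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        System.out.println("max heap bytes:  " + rt.maxMemory());
        System.out.println("used heap bytes: " + used);
        System.out.println("small request ok: " + tryAllocate(1024));
        // Ask for the whole max heap in one array; normally this cannot be satisfied.
        System.out.println("oversized request ok: " + tryAllocate(rt.maxMemory()));
    }
}
```

Run with -Xmx1g, the last call asks for about 1GB in one array and would normally fail, even though the used heap printed just before it is tiny. A heap dump taken at that moment would look as small as the ones described in the original post.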
Hi Shawn
Thank you for the response. This is not a web application, but a standalone Java program. Hence I said it's not a Tomcat question but a generic JVM question. I have been researching this a lot, and based on many mails on this list, a lot of people here know the internal behavior of the JVM and the specs a lot better than I do.

Both boxes are spawned from the same AWS image, which we build. There is no other difference. Both receive tasks over MQ. Tasks could be slightly different (for different users, a different number of entities held by the user, etc.), but they should not be too different, or should at least average out in the long run. We have examined the data for the tasks and nothing unusual has come up so far.

Regards,
Niranjan
Have you tried enabling heap dumps on OOM exceptions (HeapDumpOnOutOfMemoryError; see https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/clopts001.html) and then looking at the heap dump? It should help you identify where the allocated heap is going and give you some ideas of where to look next.
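For reference, the option is passed on the command line as -XX:+HeapDumpOnOutOfMemoryError, optionally with -XX:HeapDumpPath=<dir> to choose where the .hprof file is written. The sketch below is one way to confirm from inside the process that the flag took effect. It assumes a HotSpot-based JVM (the com.sun.management API is not part of the Java SE specification), and the class name HeapDumpCheck is invented for this example.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class HeapDumpCheck {

    /** Reads a HotSpot VM option by name, e.g. "HeapDumpOnOutOfMemoryError". */
    public static String vmOption(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return bean.getVMOption(name).getValue();
    }

    public static void main(String[] args) {
        // Prints "true" when the JVM was started with -XX:+HeapDumpOnOutOfMemoryError.
        System.out.println("HeapDumpOnOutOfMemoryError = "
                + vmOption("HeapDumpOnOutOfMemoryError"));
        // Empty unless -XX:HeapDumpPath=... was given.
        System.out.println("HeapDumpPath = " + vmOption("HeapDumpPath"));
    }
}
```

Logging these two values at startup on both boxes is a cheap way to verify that they really run with the same dump settings.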
In reply to this post by Niranjan Rao
You need to monitor the JVM through something like VisualVM or JConsole. Monitor the heap space. You're going to have to modify your code to help you understand where the memory leak is occurring. The stack trace should give you an idea of where in your code it is trying to allocate memory.

--
Thanks,
Brian Wolfe
https://www.linkedin.com/in/brian-wolfe-3136425a/
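Where attaching VisualVM or JConsole to a remote box is inconvenient, the same heap figures can be logged from inside the process. This is a minimal sketch using the standard java.lang.management API; the class name HeapLogger and the one-second interval are arbitrary choices for the illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class HeapLogger {

    /** Formats the heap figures currently reported by the JVM, in megabytes. */
    public static String snapshot() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long mb = 1024L * 1024L;
        // getInit() and getMax() return -1 when undefined; that prints as 0MB here.
        return "heap init=" + heap.getInit() / mb + "MB"
                + " used=" + heap.getUsed() / mb + "MB"
                + " committed=" + heap.getCommitted() / mb + "MB"
                + " max=" + heap.getMax() / mb + "MB";
    }

    public static void main(String[] args) throws InterruptedException {
        // Print a few samples; a real service would log these on a schedule.
        for (int i = 0; i < 3; i++) {
            System.out.println(snapshot());
            Thread.sleep(1000);
        }
    }
}
```

Comparing these lines between the healthy and the failing box over time shows whether used heap climbs steadily (a leak) or jumps just before the failure (one large request).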
In reply to this post by Niranjan Rao
On 2/18/2021 12:11 PM, Niranjan Rao wrote:
> Thank you the response. This is not a web application, but a standalone
> java program. Hence I said it's not a tomcat question, but a generic JVM
> question. I have been researching about this a lot and based on many
> mails on this list, lot of people here know about internal behavior of
> JVM and specs lot better than I do.

Apologies for getting that wrong.

Is it a custom app or something that you downloaded and installed? Talk to whoever wrote it. They will hopefully know what information is needed to troubleshoot further.

Is Java 15 required for the application to function? If you can successfully use Java 11 or even Java 8, you'll be dealing with a far more stable platform. Major show-stopper bugs in Java are rare, but they do happen. I will warn you that although I do recommend downgrading Java for stability purposes, I do not hold out a lot of hope that it will solve this problem.

Which garbage collector are you using? I would recommend one of the really stable collectors, like G1. I wrote this wiki page a long time ago that includes garbage collection information for Solr ... I think it would apply well to any application where latency is more important than throughput:

https://cwiki.apache.org/confluence/display/SOLR/ShawnHeisey

Thanks,
Shawn
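If it is unclear which collector a running process actually uses, the JVM can report it. The following is a small sketch with the standard management API; the class name GcInfo is invented here. G1 has been the default collector since JDK 9 and can be requested explicitly with -XX:+UseG1GC, but the JVM may pick a different default on very small machines, so checking is worthwhile.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class GcInfo {

    /** Returns the names of the collectors active in this JVM. */
    public static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Counts and times are cumulative since JVM start.
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
    }
}
```

Printing this on both boxes would also reveal whether the two "identical" VMs ended up with different collectors.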
On 2/18/21 12:53 PM, Shawn Heisey wrote:
> Is it a custom app or something that you downloaded and installed?
> Talk to whoever wrote it. They will hopefully know what information
> is needed to troubleshoot further.

Talking with self does not help much with new ideas ;) We added a lot of logging and wrote a simple throwaway tool to analyze the logs. Even though task counts are similar, there were some timeout errors that could be causing the leaks. A patch is currently deployed, and we are waiting to see whether it has made any impact.

The interesting question is why one machine is getting the brunt of the bad things. Maybe we will drop the box and spawn another VM, on the assumption that the host could be heavily loaded or that something similar, not easily visible, is going on.

Your blog entry is very informative. Thank you.

Regards,
Niranjan
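The thread does not say how the timeouts led to leaks, so the following is purely a hypothetical illustration of one common pattern: per-task state is registered before the work starts and is only released on the success path. The class PendingTasks, its fields, and its methods are all invented for this sketch; the point is the finally block, which releases the state on every exit, including a timeout.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class PendingTasks {

    // Hypothetical per-task state that stays reachable while a task is "pending".
    private final Map<String, byte[]> pending = new ConcurrentHashMap<>();

    // Daemon worker thread so a stuck task cannot keep the JVM alive.
    private final ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "task-worker");
        t.setDaemon(true);
        return t;
    });

    /** Runs the work with a timeout; returns true only if it finished in time. */
    public boolean run(String taskId, byte[] payload, Callable<Void> work, long timeoutMs) {
        pending.put(taskId, payload);
        Future<Void> future = pool.submit(work);
        try {
            future.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker instead of abandoning it
            return false;
        } catch (ExecutionException e) {
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            // Without this, every timed-out task would leave its payload in the map.
            pending.remove(taskId);
        }
    }

    public int pendingCount() {
        return pending.size();
    }

    public void shutdown() {
        pool.shutdownNow();
    }
}
```

If one box sees more timeouts than the other (for example because of a slower network path to a dependency), a leak of this shape would explain why only that box runs out of memory even though both run the same code.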
In reply to this post by Niranjan Rao
Niranjan,
On 2/18/21 13:36, Niranjan Rao wrote:
> Has anyone seen anything similar to this? Identical bits of code
> behaving differently? What else should I be looking for?

What is the load profile of each application/server? You said you aren't running Tomcat, but is load on each of the applications balanced in any way similar to how a web-application load-balancer would work?

Sometimes, the answer is simply that one server is doing more work than the other. We have two application servers which are "identical" except that only one of them handles our email queue.

Maybe, though the "types" of tasks are the same for each server, one of them is getting unlucky and is handling a "big" task that fails each time?

Do you have any logging which would indicate which task, or what kinds of tasks, are failing?

Do you have a stack trace of the OOME? Do you have a bunch of them (from many separate events)? Do they all look the same?

The AWS images are the same, but have you upgraded the OS on either one after initial launch, or do you always start fresh with the same image and no "apt-get update" on them? Same JVM and everything on each of them?

If you start with -Xmx1G then you should consider also using -Xms1G. If you have a long-running process which you expect may take up 1G of heap space, go ahead and allocate it all at once instead of wasting time re-sizing the heap a bunch of times on your way up to 1G.

-chris
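As a rough sketch of the task-level logging asked about above: wrapping each task so that an OutOfMemoryError is recorded together with the task's identifier and its stack trace before being rethrown. The class TaskRunner and the task-id scheme are assumptions made for this illustration, and logging after an OOME is best effort, since the logging itself needs some memory.

```java
import java.util.ArrayList;
import java.util.List;

public class TaskRunner {

    private final List<String> failedTaskIds = new ArrayList<>();

    /** Runs one task; on OutOfMemoryError, records which task failed and rethrows. */
    public void run(String taskId, Runnable task) {
        try {
            task.run();
        } catch (OutOfMemoryError e) {
            Runtime rt = Runtime.getRuntime();
            long usedMb = (rt.totalMemory() - rt.freeMemory()) / (1024L * 1024L);
            failedTaskIds.add(taskId);
            System.err.println("OOME in task " + taskId + ", used heap ~" + usedMb + "MB");
            e.printStackTrace(); // shows where the failing allocation was requested
            throw e;
        }
    }

    public List<String> failedTaskIds() {
        return failedTaskIds;
    }

    public static void main(String[] args) {
        TaskRunner runner = new TaskRunner();
        runner.run("task-1", () -> System.out.println("ok"));
        System.out.println("failed so far: " + runner.failedTaskIds());
    }
}
```

Collecting these records from many separate events makes it possible to answer whether the failures share a task type and whether the stack traces all look the same.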