JVM runtime analysis
At the Java Virtual Machine (JVM) layer, the following parameters and JVM utilization factors can drastically affect the performance of a web application:
JVM PermSize setting
The PermSize section of the JVM heap is reserved for the permanent generation and holds all of the reflective data for the JVM (such as class and method definitions and associated data) and constants (such as interned strings). This allocation is completely separate from, and independent of, the JVM heap size setting.
To optimize the performance of applications that dynamically load and unload many classes, such as the mid tier, increase the size from the default maximum allocation of 83MB. (This default applies to Sun’s 64-bit JVM running on Windows in server mode and is 30% larger than the 64MB default of the 32-bit JVM. Other modes, operating systems, and JVM vendors might default to a different value.)
For more information, see http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html.
The following JVM arguments are related to the PermSize setting:
-XX:PermSize sets the initial PermSize. If this value is less than the default maximum, the JVM allocation starts with this size and grows to the default maximum as needed. If this value is larger than the default maximum and -XX:MaxPermSize is not set, both are set to the given value.
-XX:MaxPermSize sets the maximum PermSize.
BMC recommends setting the PermSize to 256MB for a 64-bit JVM that runs the Tomcat instance hosting the mid tier. This is the default value that the mid-tier installer sets. This size is sufficiently large for a mid tier connecting to an AR System server with the complete BMC Remedy ITSM Suite deployed plus additional custom Data Visualization Modules, if any. However, if you have many custom Data Visualization Modules deployed or if your mid tier is connected to multiple AR System servers, you might need to adjust your PermSize value based on your JVM monitored data.
The following graph shows the JVM PermSize usage of a production mid tier connecting to an AR System server with the full BMC Remedy ITSM 7.6.04 suite deployed while under a load of 300 concurrent users over 24 hours. At the default 256 MB for the PermSize, there is sufficient space to accommodate additional custom Data Visualization Modules and additional BMC Remedy AR System application deployment. Although the graph shows an increasing usage pattern for the permanent generation, garbage collection (GC) of that space is on by default. Loaded class definitions that have no associated objects on the JVM heap are collected, and the PermSize usage drops, as shown in parts of the graph.
PermSize Usage under full 300 user load
The following example sets the PermSize to the recommended 256MB.
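As a sketch, the two arguments from the list above can be combined as follows. The CATALINA_OPTS variable and the setenv.sh location are assumptions about a Tomcat installation on Linux or UNIX with a HotSpot JVM that still has a permanent generation; on a Windows service installation, the same two -XX arguments would go into the service's Java options instead.

```shell
# Assumed location: <tomcat>/bin/setenv.sh (adjust for your installation).
# Setting the initial and maximum values to the same size fixes the PermSize.
CATALINA_OPTS="${CATALINA_OPTS:-} -XX:PermSize=256m -XX:MaxPermSize=256m"
export CATALINA_OPTS
```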
This allocates the PermSize to a fixed 256MB at JVM startup because 256MB is larger than the default maximum.
JVM Garbage Collector selections
This section provides a brief overview of the JVM Garbage Collector (GC). For a primer on how Java GC works, see http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html. For a summary of all the aspects of Java GC, see http://www.petefreitag.com/articles/gctuning/.
The JVM heap, not including the permanent generations, is partitioned into the following regions, each with its own independent GC:
- Young generations, which contain short-lived objects
- Old generations, which contain long-lived objects
Garbage collection of young generations occurs frequently whenever the eden space is full and is called a minor GC cycle. When a minor GC cycle cannot reclaim sufficient heap space as configured by various GC parameters, a major GC cycle starts. This involves the GC of the old generations (as well as moving objects from the young generations into the old generations). Whenever a major GC cycle occurs, application threads stop during that interval. This reduces the application’s performance and throughput, and it introduces additional delay to the web application’s service time.
Aim to configure the JVM with GCs that are most suitable to the hardware and web application usage to minimize the delay of servicing HTTP requests.
With the exception of the incremental GC (-Xincgc) and older GCs that are not suitable for web applications, the following table summarizes the different GCs for the old, young, and permanent generations in terms of low pause, throughput, and number of CPUs.
Garbage collection summary for various generations
For permanent generations, the GC is on by default and is not configurable, but you can disable it through -Xnoclassgc. The advantage of disabling the permanent generations GC is that class definitions in the permanent generations space that have no associated live objects are not unloaded and collected. As a result, a future object instantiation of one of those classes is much faster because its class definition is already in memory. However, if you disable this GC, monitor the PermSize, because it grows faster than with the GC on.
The most generic recommendations regarding GC choice are to use:
- Default GC for a single CPU.
- Low-pause GC for multiple CPUs if the focus is on HTTP service time.
- Throughput GC for multiple CPUs if the web application creates many objects such as a mid tier set up for reporting.
These recommendations might not be optimal for your hardware and user load. GC tuning is a highly specialized topic. Monitor and collect JVM data to analyze what works best for your hardware and load.
In the BMC testing environment, when the CPU clock speed is over 2.5GHz, the throughput GCs worked better than the low-pause GCs.
If you mix different GCs, be aware that not all versions of the JVM accept every combination of old and young generation collectors. See http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html. If you have a different JVM vendor, refer to the vendor’s documentation.
If you select an invalid combination, the JVM either does not start or starts but ignores the GC selection that does not make sense.
The following example sets both young and old generation collectors to the low-pause GC. Add the following arguments to your JVM arguments:
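A minimal sketch of such arguments, assuming a Sun/Oracle HotSpot JVM of the generation this topic covers, where the low-pause collectors are the parallel new collector (young generation) and the concurrent mark-sweep collector (old generation). The CATALINA_OPTS variable is an assumption about how your Tomcat receives JVM arguments, and later JDK releases no longer accept these flags.

```shell
# Young generation: ParNew; old generation: concurrent mark-sweep (CMS).
# Assumes an older HotSpot JVM; check your JVM version before using.
CATALINA_OPTS="${CATALINA_OPTS:-} -XX:+UseParNewGC -XX:+UseConcMarkSweepGC"
export CATALINA_OPTS
```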
This changes the young and the old generation collectors from the default to the low-pause GC. If you have other GCs already set, remove those arguments.
The following figures show a low-pause GC versus a throughput GC on the same hardware: two CPUs with 4 GB RAM running Windows 2008. For a one-hour load test, the throughput GC had one major GC cycle versus three major GC cycles with the low-pause GC. The reason is that, with the throughput GC, the minor GC cycles reclaimed significantly more heap space than with the low-pause GC, so fewer major GC cycles were required. However, the average response time over the entire load test was similar for both GC choices.
GC behavior for throughput GC
GC behavior for low-pause GC
For most cases, these generic recommendations are sufficient. However, if your web stack still experiences pauses and you suspect that it is the GC, use the following article as a starting point: http://java.sun.com/docs/hotspot/gc1.4.2/example.html.
JVM heap size setting
A key item in making the transition from 32-bit JVM to 64-bit JVM is understanding that the 64-bit platform has associated performance overhead due to the addressing. For more information, see http://www.oracle.com/technetwork/java/hotspotfaq-138619.html#64bit_performance.
In BMC internal performance stress tests, the 32-bit JVM outperformed the 64-bit JVM by approximately 45% in CPU utilization. Both JVM versions were installed on the same box and received an identical load that drove CPU utilization to at least 50%, as measured by aggregated OS-level monitoring. In addition, jvisualvm monitoring of the JVM process showed that the heap usage of the 64-bit JVM was about 25% higher than that of the 32-bit JVM.
Oracle provides a hybrid mode for the 64-bit JVM that reclaims some of the performance overhead in both CPU and heap usage. In BMC lab tests, this mode reclaimed roughly 50% of the performance loss. Use this mode if your memory requirement is less than 32 GB, which is the maximum heap size possible in hybrid mode. To activate hybrid mode, add the following argument to your JVM startup arguments:
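On the HotSpot JVM, this mode corresponds to compressed ordinary object pointers (the subject of the CompressedOops page referenced in this topic). A minimal sketch, assuming a 64-bit HotSpot JVM and a Tomcat that reads CATALINA_OPTS:

```shell
# Enables compressed object pointers on a 64-bit HotSpot JVM.
# CATALINA_OPTS is an assumption about how your Tomcat receives JVM arguments.
CATALINA_OPTS="${CATALINA_OPTS:-} -XX:+UseCompressedOops"
export CATALINA_OPTS
```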
For information about the hybrid mode, see https://wikis.oracle.com/display/HotSpotInternals/CompressedOops.
Regarding the heap size allocation for the JVM (hosting the Tomcat running the mid tier), there is no one-size-fits-all setting. The size of the JVM heap depends on the following:
- The mid-tier version — Later versions use more memory to support additional features.
- The web user load — The rate of the HTTP requests that users generate.
- The number of concurrent users logged on to the mid tier — The number of HTTP session objects alive in the mid tier.
- The use cases supported — A use case that retrieves more data from the AR System server (such as reporting) creates more heap usage on the mid tier.
Another key item in JVM heap allocation is to always allocate the JVM heap up front (deterministic JVM heap allocation) so that none of the allocated memory is virtual, as it is with a range JVM heap allocation. This makes system behavior more predictable, because virtual allocation may cause the OS to swap.
Though a range allocation for the JVM heap is more flexible, when a range is allocated (by having the start value less than the maximum value), the JVM adheres to the following process:
- Allocates the start value at startup time
- Allocates the delta as virtual
- Requests system allocation when necessary as the heap usage grows
This works reasonably well if the OS has sufficient resources. In practice, however, when system resources are constrained, a call from the JVM to acquire more heap space may send the system into swap, which is not recommended. For more information, see http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html.
To allocate the entire JVM heap up front as recommended, set the start and maximum value to the same number. The following example allocates 2GB to the JVM heap. Add the following arguments to your JVM arguments:
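A minimal sketch of the two heap arguments, with the start value (-Xms) and the maximum value (-Xmx) both set to 2 GB. The CATALINA_OPTS variable is an assumption about how your Tomcat receives JVM arguments.

```shell
# Start value and maximum value are identical, so the whole heap
# is allocated up front rather than as a range.
CATALINA_OPTS="${CATALINA_OPTS:-} -Xms2048m -Xmx2048m"
export CATALINA_OPTS
```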
Use the following rules when allocating JVM heap size. (With the exception of SSL, the rest of the rules are specific to the mid-tier web application and are not usable for other web application deployment.)
- BMC Remedy Mid Tier 7.6.04 SP2 requires a minimum of 1.5 GB to support a workload of 300 concurrent web users in typical ITSM use cases. It requires a minimum of 2 GB for a workload of 300 to 500 concurrent users in typical ITSM use cases with the following AR System server property limit:
Different use-case execution has different JVM heap usage. Typically, the more data a BMC Remedy AR System API call returns, the more JVM heap the mid tier needs to handle the data. For details on the workload, see the BSM case study.
- If Tomcat is handling SSL, allocate more memory than without SSL because SSL handling requires extra JVM heap allocation. The rule of thumb is 20% extra. (This depends on the number of concurrent users and the rate of simultaneous HTTP requests generated.)
- In general, to support more concurrent users per mid-tier instance, add 1 MB per user. This is based on empirical approximation; the exact value depends on which use case is being executed. See JVM CPU utilization and JVM heap utilization below.
- To allocate more memory to the ehcache in the mid tier, increase the value of arsystem.ehcache.referenceMaxElementsInMemory in <midTierInstallationFolder>/WEB-INF/classes/config.properties by 1250 for each additional 500 MB. For example, suppose that the default maximum heap size is 1024 MB and arsystem.ehcache.referenceMaxElementsInMemory is set to 1250. If you have additional memory available and can allocate an additional 500 MB of heap (1524 MB total), you can change the value of referenceMaxElementsInMemory to 2500 (1250 + 1250) to get the benefit of the mid-tier cache.
- Do not modify other ehcache properties unless you have the expertise to do so.
- The larger the JVM heap, the more important the GC tuning is. (BMC recommends fine-grained GC tuning for 4 GB heap or larger. A good starting point is http://java.sun.com/docs/hotspot/gc1.4.2/example.html.)
When allocating and accounting for system memory usage, add the PermSize to the JVM heap size to get the total heap allocation to the JVM.
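The rules above can be turned into a rough calculation. The following sketch uses hypothetical helper names, and the way the individual rules are combined (for example, applying the 20% SSL allowance after the per-user increment) is an assumption rather than BMC guidance. Treat the results as starting points to validate against your own JVM monitored data.

```shell
#!/bin/sh
# estimate_heap_mb USERS SSL
#   USERS: concurrent users per mid-tier instance
#   SSL:   1 if Tomcat handles SSL, 0 otherwise
estimate_heap_mb() {
    users=$1
    ssl=$2
    if [ "$users" -le 300 ]; then
        heap=1536                       # 1.5 GB minimum for up to 300 users
    elif [ "$users" -le 500 ]; then
        heap=2048                       # 2 GB minimum for 300 to 500 users
    else
        heap=$((2048 + users - 500))    # about 1 MB per additional user
    fi
    if [ "$ssl" -eq 1 ]; then
        heap=$((heap + heap / 5))       # rule of thumb: 20% extra for SSL
    fi
    echo "$heap"
}

# total_jvm_mb USERS SSL [PERM_MB]
# Heap plus PermSize (256 MB unless a third argument is given).
total_jvm_mb() {
    perm=${3:-256}
    echo $(( $(estimate_heap_mb "$1" "$2") + perm ))
}

# ehcache_max_elements HEAP_MB
# 1250 elements at the 1024 MB default, plus 1250 per additional 500 MB.
ehcache_max_elements() {
    extra=$(( $1 - 1024 ))
    if [ "$extra" -lt 0 ]; then
        extra=0
    fi
    echo $(( 1250 + (extra / 500) * 1250 ))
}

estimate_heap_mb 700 1        # prints 2697
total_jvm_mb 700 1            # prints 2953
ehcache_max_elements 1524     # prints 2500
```

The integer arithmetic rounds down, so round the result up to a convenient size when you set the start and maximum heap values.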
The following figure shows the heap usage during a 1-hour, 300-user load test executing typical ITSM 7.6 use cases. The JVM was allocated 2GB for heap, 256 MB for PermSize, with the low-pause GC. The hardware was two CPUs @2GHz with 8 GB RAM. The graph shows that the JVM CPU is underutilized and can handle a higher HTTP request rate while the heap usage pattern is typical. Look for the following key points:
- A low percentage of CPU is spent on GC (blue graph of the first chart).
- The heap is fully utilized.
- Full GC cycles (there are two) did not use excessive CPU. (There was no jump in the blue graph of the first chart during major GC cycles.)
Heap usage during load testing
To install the Visual GC plug-in for jvisualvm to observe finer-grained details of heap space usage, select Tools > Plugins, select Visual GC under the Available Plugins tab, and then let the plug-in self-install.
The interpretation of the Visual GC graphs is beyond the scope of this topic.
JVM CPU utilization
For JVM CPU utilization (based on your JVM monitored data for the interval of interest), CPU usage should not be above 95% for extended periods. In a real deployment and under normal load, the JVM CPU should be underutilized so that it has sufficient headroom for surges, such as users logging in at the start of a workday.
You can determine the maximum number of simultaneous HTTP requests during your monitored interval to see if this matches your load planning. If this exceeds your anticipated load, adjust the hardware scaling appropriately for your load. If your typical load has wide fluctuations and you have a more lenient service time model, increase the size of your HTTP request buffer queue. (See Tomcat container workload configuration to learn how to determine the maximum number of simultaneous HTTP requests during the JVM monitored interval and how to adjust the HTTP request buffer queue.)
The following shows various graphs of JVM CPU utilization, including categorization of behavior.
Ideal nominal load: Light usage with headroom for load surges
Ideal normal load: Some usage with headroom for load surges
Normal load with heavy constant usage: Can handle some load surges
Heavy load with heavy constant usage: Cannot handle load surges
Typically, your system should behave between the two extremes: ideal normal load and normal load with heavy constant usage.
The last graph of Heavy Load with Heavy Constant Usage indicates that the system is at its maximum handling capacity and will need vertical or horizontal scaling.
JVM heap utilization
For JVM heap utilization (based on your JVM monitored data for the interval of interest) at nominal load, your heap usage should be at about 25% of total allocated heap or less. During service, your heap usage will increase. (This is normal because Java has garbage collection instead of programmer-controlled memory allocation.) Usage normally increases until it reaches about 90% of heap size. Then, a major GC cycle starts, provided that you did not adjust the default GC heap ratios. If you need the major GC to start before 90% is reached (to have more space for use cases such as running very large reports), resize the heap ratios. For more information, see http://java.sun.com/docs/hotspot/gc1.4.2/example.html.
To measure the heap percentage at nominal load, first invoke GC explicitly from the jvisualvm console. Otherwise, the graph shows a heap usage ratio that includes objects that the GC still needs to collect.
The following table shows graphs of JVM heap usage under load with different GCs over a 20-minute monitoring interval.
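If you prefer the command line to the jvisualvm console, a similar check can be sketched with the JDK tools jcmd and jstat. The function names are hypothetical, and the sketch assumes a JDK that ships jcmd (JDK 7 or later) and that you run the tools as the same OS user as the Tomcat JVM.

```shell
# force_gc PID: ask the target JVM to run a GC cycle.
force_gc() {
    jcmd "$1" GC.run
}

# sample_heap PID: print generation utilization percentages
# every 5000 ms, 12 times (about one minute of samples).
sample_heap() {
    jstat -gcutil "$1" 5000 12
}
```

For example, with the JVM process ID in a variable such as MIDTIER_PID (hypothetical), run force_gc "$MIDTIER_PID" and then sample_heap "$MIDTIER_PID". In the jstat output, the E and O columns report eden and old generation utilization as percentages.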
Normal heap usage under load with low-pause GC
Normal heap usage under load with throughput GC
Other JVM settings
Set the following parameter if you want to create a heap dump whenever you encounter an out-of-memory error:
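On the HotSpot JVM, the flag that matches this description is -XX:+HeapDumpOnOutOfMemoryError. A minimal sketch, assuming a Tomcat that reads CATALINA_OPTS:

```shell
# Writes a heap dump when the JVM throws java.lang.OutOfMemoryError.
CATALINA_OPTS="${CATALINA_OPTS:-} -XX:+HeapDumpOnOutOfMemoryError"
export CATALINA_OPTS
```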
Set the following parameter to specify the path of the heap dump directory:
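On the HotSpot JVM, the flag that matches this description is -XX:HeapDumpPath. In the following sketch, the directory is a hypothetical example; choose a location with enough free space to hold a file roughly the size of the heap.

```shell
# /var/tmp/midtier-dumps is a hypothetical directory; replace it with your own.
CATALINA_OPTS="${CATALINA_OPTS:-} -XX:HeapDumpPath=/var/tmp/midtier-dumps"
export CATALINA_OPTS
```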