
Performance benchmarks


BMC conducted extensive tests to help size a BMC IT Business Management Suite deployment more accurately for the best possible performance and scalability. BMC used the following process:

  1. Tested the BMC Demand and Resource Management, BMC Financial Planning and Budgeting, and BMC Service Cost Management modules and the time carding feature individually to understand the performance and scalability characteristics of each module.
  2. Tested all the modules together.

This section describes the test results of the multi-module mixed workload.

Lab environment used in the performance tests

The following lab environment was used in the performance tests:

Hardware

For the multi-module mixed workload tests, the following hardware systems were used:

  • JBoss Server: A Xeon 5620-based physical system (2x4 cores with HT @ 2.4 GHz) with 32 GB RAM, running 64-bit Windows 2008 R2. The JBoss version was 4.3 with a 64-bit JVM 1.6.0 update 21.
  • Oracle 11g R2: A Xeon 5320-based physical system (2x4 cores without HT @ 1.86 GHz) with 16 GB RAM, running Red Hat Enterprise Linux Server release 6.0 (Santiago). A local RAID 0 array was used for data storage.
  • AR System: A VM with 4 vCPUs based on Xeon 7330 @ 2.4 GHz and 8 GB RAM, running 64-bit Windows 2008 R2. The ESX server was a 16-CPU/16-GB-RAM system running VMware vSphere 4.1.0.
  • All of the above systems were located on a dedicated performance test subnet on the same LAN.

Workload

The mixed workload tests covered three modules: BMC Demand and Resource Management, BMC Financial Planning and Budgeting, and BMC Service Cost Management. BMC used Silk Performer 2010 SP1 load test software to drive the tests. Five scripts per module were developed to cover five different use cases for each module, for a total of 15 Silk scripts in the mixed workload. This report presents the test results with 500 concurrent users (180 for BMC Demand and Resource Management, 200 for BMC Financial Planning and Budgeting, and 120 for BMC Service Cost Management), because testing showed this to be a typical use scenario that this setup could support.

System configurations

For the best possible performance and scalability with the given hardware and workload, the following settings were applied to each component:

  • JBoss Server:
    • JVM min/max heap size: 2048 MB
    • JVM maxPermSize: 512 MB
    • Oracle min/max connection pool size: 20/1020
  • Oracle 11g R2:
    • CURSOR_SHARING: FORCE
    • Automatic Memory Management: memory_target = 6.71 GB, with a typical dynamic division of 4.16 GB for the SGA and about 400 MB for the PGA.
    • All other Oracle settings: default
  • AR System:
    • # of fast threads: 20 (5 per CPU)
    • # of list threads: 12 (3 per CPU)
    • Next ID Block Size: 100
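As a concrete illustration, the JVM portion of these settings would typically be applied through the JBoss launch configuration. The file name and variable below are assumptions for a typical JBoss 4.3 installation, not taken from this document; only the heap and perm size values come from the settings above.

```shell
# Hypothetical excerpt from a JBoss run.conf (file name and JAVA_OPTS
# variable assumed for a typical JBoss 4.3 install).
# Values match the JVM settings listed above: 2048 MB min/max heap,
# 512 MB max perm size.
JAVA_OPTS="$JAVA_OPTS -Xms2048m -Xmx2048m -XX:MaxPermSize=512m"
```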

Test Results

The 500 user multi-module mixed workload test was set to run for a period of one hour with a steady queuing model.

The following figure shows how the number of concurrent users evolved over time during the 500-user test. Concurrent users are users who log in to the application, use some parts of it, and then log out during the one-hour period.

Note

The Silk software counts a user only while that user is logged in to the application. Therefore, the following figure shows about 120–140 active users on average.

Hybrid 500 concurrent user test (concurrency measured by the number of active users with time during the test duration)
worddav19936aae2df25fd1686adace592ff27f.png
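The gap between the 500 scripted users and the 120–140 simultaneously active users follows from each user being logged in for only part of the hour. As a rough sketch using Little's law (the ~15-minute average session length below is an assumption for illustration; the document does not state it):

```shell
# Rough check using Little's law:
#   average active users = total users x (avg session length / test window)
# The 15-minute session length is an assumed value, not from the document.
total_users=500
session_min=15
window_min=60
echo $(( total_users * session_min / window_min ))   # prints 125
```

A 125-user average is consistent with the 120–140 active users observed, which suggests the scripted sessions averaged roughly a quarter of the test window.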

The next set of figures shows the average server response times, in seconds, for some typical actions performed by users of the BMC Demand and Resource Management, BMC Financial Planning and Budgeting, and BMC Service Cost Management modules, respectively. These numbers may vary in an actual production environment, depending on many factors such as hardware and system configurations, actual use scenarios, and so on.

500 user mixed workload test (average server response times with some typical actions from the BMC Demand and Resource Management use case)

mixed_workload_drm.png

500 user mixed workload test (average server response times with some typical actions from the BMC Financial Planning and Budgeting use case)

mixed_workload_fpb.png

500 user mixed workload test (average server response times with some typical actions from the BMC Service Cost Management use case)

mixed_workload_scm.png

Resource utilization

In this section, information on resource utilization is provided for both the 500 user mixed workload test and for individual module tests.

The following figure shows the average total CPU utilization on the JBoss server during the 500 user mixed workload test. Although the average usage was only 39%, as indicated at the bottom of the chart, after the mid-point it rose above 60% and occasionally reached 100%. System performance starts to degrade whenever CPU usage goes above 70%; this is how BMC determined the capacity of 500 users for this mixed workload scenario in the given environment.

Average total CPU utilization on the JBoss server during the one hour duration for the 500 user mixed workload test scenario
worddavdbe2549691f45b98a6d05629832e0ebf.png

The following figure shows the average total CPU utilization on the AR System server during the 500 user mixed workload test. Because JBoss processes most of the application logic in this test and the AR System is used only for user authentication, very low CPU activity on the AR System server is expected.

Average total CPU utilization on the AR System server during the one hour duration for the mixed workload 500 user test scenario
worddavb5cdc110b7f66f598689c0dd7343a2f1.png

The following figure shows the average total CPU utilization on the Oracle database server during the 500 user mixed workload test. The average CPU usage over the one-hour period was only 12%, and for most of the period it stayed between 10% and 20%. This helps verify that CPU on the Oracle database server is not a significant concern, in general, as long as a mid-range physical server is used.

Average total CPU utilization on the Oracle database server during the one hour duration for the mixed workload 500 user test scenario
mixed_workload_cpu_db.png

The following figure shows the memory usage on each server during the one hour duration for the 500 user mixed workload test scenario.

Memory usage on each server during the one hour duration for the mixed workload 500 user test scenario
mixed_workload_memory.png

To help size the JBoss server for different allocations of users to modules, the following figure shows the average CPU usage when each module was tested alone, prior to the mixed workload 500 user test. In those tests, 300 users were used for BMC Demand and Resource Management, 200 for BMC Financial Planning and Budgeting, and 120 for BMC Service Cost Management. As shown, the BMC Demand and Resource Management module uses more CPU than the other two modules. Dividing the CPU usage by the number of users yields a rule of thumb: the BMC Demand and Resource Management module uses about 10% CPU per 100 users, whereas the BMC Financial Planning and Budgeting and BMC Service Cost Management modules each use about 5% CPU per 100 users. However, these per-100-user CPU numbers will vary with different hardware and CPUs (model and power) on the JBoss server.
Average CPU usage on the JBoss server when each module was tested alone
mixed_workload_cpu_jboss.png
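The per-100-user rule of thumb above can be turned into a quick sizing sketch. The helper below is illustrative only: the rates are specific to the Xeon 5620 JBoss host used in these tests, and the estimate ignores fixed overhead, which is why the result for the 500-user mix (34%) comes in somewhat below the measured 39% average.

```shell
# Estimate JBoss server CPU % from the rule of thumb above:
# DRM ~10% per 100 users; FPB and SCM ~5% per 100 users each.
# Illustrative helper; rates are hardware-specific and exclude overhead.
estimate_jboss_cpu() {
  local drm=$1 fpb=$2 scm=$3
  echo $(( drm * 10 / 100 + fpb * 5 / 100 + scm * 5 / 100 ))
}

# The 500-user mix from this report: 180 DRM + 200 FPB + 120 SCM
estimate_jboss_cpu 180 200 120   # prints 34
```

Keeping the estimate comfortably under the 70% degradation threshold noted earlier leaves headroom for the load spikes seen in the CPU charts.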

Further JBoss JVM tunings

A few additional runs were conducted to investigate whether non-default JBoss server JVM settings, such as -XX:+UseCompressedOops and -XX:+UseConcMarkSweepGC, could reduce JBoss server CPU utilization and improve overall application performance. (According to page 559 of the book Java Performance, published in 2012, -XX:+UseParNewGC is enabled automatically when -XX:+UseConcMarkSweepGC is specified.) The results are shown in the table below, which reports, for each test case, the response time averaged over all Silk custom timers and the JBoss server total CPU usage averaged across all 16 CPUs (100% is the maximum). Note the following observations:

  • With neither newGC nor oops applied, increasing the JVM max heap size from 2 GB to 3 GB did not result in noticeable improvements in average response times or JBoss server CPU usage.
  • With both newGC and oops applied, the average response time degraded by 8%, while the average CPU usage on the JBoss server increased by 13%.
  • The oops tuning alone, without newGC, did not noticeably improve or degrade application performance or JBoss CPU usage.
  • The newGC tuning alone, without oops, significantly degraded response times and increased JBoss CPU utilization.
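For reference, the flag combinations exercised in these runs would be passed to the JBoss JVM roughly as follows. The JAVA_OPTS variable is an assumption for a typical JBoss 4.3 setup; only the -XX flags themselves are taken from this document.

```shell
# Hypothetical JVM option lines for the tuning runs (JAVA_OPTS assumed).
# Baseline for the 3 GB runs (no newGC, no oops):
JAVA_OPTS="-Xms3072m -Xmx3072m -XX:MaxPermSize=512m"

# "oops" tuning: compressed ordinary object pointers
JAVA_OPTS="$JAVA_OPTS -XX:+UseCompressedOops"

# "newGC" tuning: CMS collector; on this JVM, -XX:+UseParNewGC is
# enabled automatically when CMS is selected
JAVA_OPTS="$JAVA_OPTS -XX:+UseConcMarkSweepGC"
```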

To help evaluate the effects of the newGC tuning, the figures below show how the CPU patterns on the JBoss server evolved with the different tunings corresponding to the runs described in the table below (the 2 GB baseline CPU chart is not repeated here, as it was shown earlier).
Average response times (seconds) and average total CPU utilization on the JBoss server with newGC and oops JVM tunings (500 mixed workload concurrent users; 512 MB perm size for the JBoss JVM)

JVM tuning (max heap/newGC/oops)    | Response time (s) | CPU usage (%)
------------------------------------|-------------------|--------------
2 GB/no newGC/no oops               | 4.54              | 39
3 GB/no newGC/no oops               | 4.36              | 40
3 GB/newGC + oops                   | 4.73              | 45
3 GB/oops only                      | 4.71              | 39
3 GB/newGC only w/o oops (Run #1)   | 17.21             | 44
3 GB/newGC only w/o oops (Run #2)   | 9.17              | 56

The following figures show JBoss server CPU utilization (average, with a max of 100%) with the various JVM settings corresponding to rows 2–6 of the table above.

Figure a: 3 GB baseline w/o new GC & oops
worddav0bae70e0ef13fb56b27e3f0e546ae7d3.png

Figure b: 3 GB + newGC + oops
worddavae46b333e614ba7ff2aabdaf2af34a70.png

Figure c: 3 GB + oops only (no newGC)
worddav7f25b4a8eeeb6c48d9449bc4dca2bb29.png

Figure d: 3 GB + newGC only (no oops, run #1)
worddav42de8c213cf2a78ab5155e5f70bb2c5b.png

Figure e: 3 GB + newGC only (no oops, run #2)
worddavf71ddfa0270fd0126ac1b28ee784772c.png

The newGC algorithm performs young generation garbage collection in a multi-threaded, stop-the-world fashion, which appears to have caused the obvious instability exhibited as the large gaps and spikes in Figures (d) and (e). The second run with the newGC tuning, shown in Figure (e), was conducted to confirm the behavior observed in the first run, shown in Figure (d).

 


BMC IT Business Management Suite 8.0