Monitoring Remedy performance and capacity

Initial triage of performance problems involves observing resource consumption across the configurations. Resource profiles can point to the root cause of a problem, or they can help focus the search for a cause.

For each tier of your system, analyze the consumption of the following resources:

Central processing units (CPUs) — Include system time, user time, and I/O wait time. For benchmarks simulating online users, get CPU usage data for the computers driving the load.
Memory

Monitoring CPU consumption

CPU usage substantially affects system response time, usually in a nonlinear way. The following figure shows a typical response-time curve. Response time starts at a baseline and grows slowly while CPU usage is low. As CPU usage increases, the response-time slope becomes steeper until it is almost vertical.

Figure: Response-time curve

Response time grows gradually until it reaches a point where it increases sharply. That point varies, but it typically occurs when CPU usage on at least one computer in the system reaches 60% or more, and sometimes not until usage is as high as 90%.
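The shape of this curve can be illustrated with a simple single-server queueing approximation, R = S / (1 - U), where S is the baseline response time and U is CPU utilization. This model is an assumption introduced here for illustration only. It is not derived from Remedy measurements, and the baseline value in the sketch is hypothetical.

```python
def response_time(service_time_s: float, utilization: float) -> float:
    """Estimate response time with the approximation R = S / (1 - U)."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_time_s / (1.0 - utilization)


baseline = 0.5  # hypothetical baseline response time, in seconds
for pct in (10, 30, 50, 60, 70, 80, 90, 95):
    estimate = response_time(baseline, pct / 100)
    print(f"{pct:3d}% CPU -> {estimate:.2f} s")
```

Under this assumption, the estimate roughly doubles between 0% and 50% CPU usage but grows tenfold by 90%, which matches the slow-then-steep behavior described above.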

A single server running at high load in a multi-server configuration can substantially degrade response time. Response time is not necessarily favorable just because other servers in the configuration are largely idle.

If the configuration's response-time curve shows a sharp increase like the one in the figure, but the increase occurs while CPU usage on all systems is low, consumption of a non-CPU resource is creating the system bottleneck.

CPU usage has several components:

User time — Time consumed by applications that are performing useful activity on the system. Generally, user time should constitute the majority of CPU usage.
System time — Time consumed by the OS kernel for core processing. If system time is consistently higher than about 5%, an OS-level resource or locking problem might exist. Memory management is a typical cause.
I/O wait time — Time consumed waiting for a request to the I/O subsystem to return. If I/O wait time is substantial, the I/O subsystem cannot keep up with the CPU application processing. In this case, the system is likely to have poor response time unless the I/O subsystem bandwidth is improved or the application’s data requirements are tuned. Generally, I/O concerns are relevant only on the database system of a multi-tiered configuration.
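As a minimal sketch of how these components can be separated, the following example computes user, system, and I/O wait percentages from two samples of the aggregate cpu line in /proc/stat. It assumes a Linux system. The sample values are hypothetical, and the grouping of the nice, irq, and softirq counters is a simplification chosen for this example.

```python
# Field order of the aggregate "cpu" line in Linux /proc/stat.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line: str) -> dict:
    """Parse the aggregate 'cpu' line of /proc/stat into named tick counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    return dict(zip(FIELDS, values))


def cpu_breakdown(before: dict, after: dict) -> dict:
    """Return user, system, I/O wait, and idle percentages between two samples."""
    delta = {name: after[name] - before[name] for name in before}
    total = sum(delta.values())
    if total <= 0:
        raise ValueError("samples must be taken at different times")

    def pct(ticks: int) -> float:
        return 100.0 * ticks / total

    return {
        "user": pct(delta["user"] + delta["nice"]),
        "system": pct(delta["system"] + delta["irq"] + delta["softirq"]),
        "iowait": pct(delta["iowait"]),
        "idle": pct(delta["idle"]),
    }


# Two hypothetical samples taken a few seconds apart.
sample_1 = parse_cpu_line("cpu 1000 0 200 8000 300 0 0 0")
sample_2 = parse_cpu_line("cpu 1600 0 250 8200 450 0 0 0")
print(cpu_breakdown(sample_1, sample_2))
```

On a live Linux system, the two input lines would come from reading the first line of /proc/stat at two different times. Steal time is counted in the total but not reported, so the four percentages can sum to less than 100 on virtualized hosts.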

Typical tools used to track CPU usage are the perfmon command for Microsoft Windows and the sar command for UNIX and Linux. In both cases, you should store the collected data for later review.

For UNIX systems, the top command can identify high-load processes and show relative CPU usage among running processes, but its data is harder to store for review. The ps command might be a useful alternative.
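One way to keep ps data for later review is to parse its output and append the busiest processes to a CSV file. The following sketch assumes a ps implementation that accepts the POSIX format specifiers pid, pcpu, vsz, and comm. The function names and the sample process names are hypothetical.

```python
import csv
import subprocess
import time


def parse_ps(output: str) -> list:
    """Parse `ps -eo pid,pcpu,vsz,comm` output, sorted by CPU usage (highest first)."""
    rows = []
    for line in output.strip().splitlines()[1:]:  # skip the header line
        pid, pcpu, vsz, comm = line.split(None, 3)
        rows.append({"pid": int(pid), "pcpu": float(pcpu),
                     "vsz_kb": int(vsz), "comm": comm})
    return sorted(rows, key=lambda row: row["pcpu"], reverse=True)


def snapshot(csv_file, top_n: int = 5) -> None:
    """Append the top_n CPU consumers, with a timestamp, to an open CSV file."""
    result = subprocess.run(["ps", "-eo", "pid,pcpu,vsz,comm"],
                            capture_output=True, text=True, check=True)
    writer = csv.writer(csv_file)
    now = time.strftime("%Y-%m-%d %H:%M:%S")
    for row in parse_ps(result.stdout)[:top_n]:
        writer.writerow([now, row["pid"], row["pcpu"], row["vsz_kb"], row["comm"]])


# Hypothetical ps output, used here so the parser can be tried without running ps.
sample = """  PID %CPU    VSZ COMMAND
    1  0.0 168000 init
 4321 87.5 912000 appserver
 4400  3.2 204800 java
"""
for row in parse_ps(sample):
    print(row)
```

Calling snapshot at a fixed interval with a file opened in append mode builds a history that can be compared against response-time measurements.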

Monitoring memory consumption

Tracking memory consumption can be challenging because of the following situations:

Some systems assign unused memory to swap space or to other system resources.
Memory allocation algorithms vary depending on the OS.

Despite these difficulties, it is important to understand approximately how much of a configuration’s physical memory is in use during application execution.

A modern OS always provides a virtual memory space that far exceeds the system’s physical memory. If an application consumes nearly all the system’s physical memory, processes in shared memory are written to disk to free up space for new memory requirements. This practice is informally called swapping. Although swapping can be useful, it increases CPU usage and degrades response time.

Note

Do not confuse process swapping with simple memory paging, which can be normal behavior and does not necessarily consume extra resources.

The primary goal of evaluating memory consumption is to ensure that:

The OS is not swapping
A running application can still allocate memory when needed
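A minimal sketch of the first check, assuming a Linux system: the cumulative pswpin and pswpout counters in /proc/vmstat record pages swapped in and out, separately from the ordinary paging counters pgpgin and pgpgout. If the swap counters increase between two samples, the OS swapped during that interval. The function names are hypothetical.

```python
def swap_counters(vmstat_text: str) -> dict:
    """Extract the cumulative swap-in and swap-out page counters from /proc/vmstat text."""
    counters = {}
    for line in vmstat_text.splitlines():
        name, _, value = line.partition(" ")
        if name in ("pswpin", "pswpout"):
            counters[name] = int(value)
    return counters


def is_swapping(before: dict, after: dict) -> bool:
    """Report whether any pages were swapped in or out between two samples."""
    return any(after[name] > before[name] for name in ("pswpin", "pswpout"))


def read_swap_counters(path: str = "/proc/vmstat") -> dict:
    """Read the current swap counters (Linux only)."""
    with open(path) as vmstat:
        return swap_counters(vmstat.read())


# Hypothetical samples: paging activity increased, and 5 pages were swapped out.
earlier = swap_counters("pgpgin 100\npswpin 10\npswpout 20\n")
later = swap_counters("pgpgin 900\npswpin 10\npswpout 25\n")
print(is_swapping(earlier, later))
```

Other UNIX systems expose swap activity through different interfaces, so this approach does not carry over unchanged.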

Because swapping increases CPU usage, the first symptom of running out of memory might be that CPU resources are fully used. Noticing how memory consumption grows over time is often useful. A process whose memory consumption keeps growing quickly might have a memory leak, which is likely to lead to swapping or failure.
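Growth over time can be quantified by fitting a straight line to periodic memory samples for a process. The following sketch uses an ordinary least-squares slope. The sample data and the threshold are hypothetical, and a steady positive slope is only a hint of a leak, not proof, because caches and startup activity can also cause growth.

```python
def growth_rate(samples: list) -> float:
    """Least-squares slope of (time_s, memory_kb) samples, in KB per second."""
    count = len(samples)
    if count < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / count
    mean_m = sum(m for _, m in samples) / count
    numerator = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    denominator = sum((t - mean_t) ** 2 for t, _ in samples)
    if denominator == 0:
        raise ValueError("samples must span more than one point in time")
    return numerator / denominator


def looks_like_leak(samples: list, threshold_kb_per_s: float) -> bool:
    """Flag sustained growth above a caller-chosen threshold."""
    return growth_rate(samples) > threshold_kb_per_s


# Hypothetical memory samples for one process, taken once a minute.
history = [(0, 100000), (60, 100600), (120, 101200), (180, 101800)]
print(f"{growth_rate(history):.1f} KB/s")
print(looks_like_leak(history, threshold_kb_per_s=5.0))
```

A suitable threshold depends on the application, so it is best chosen by comparing against a period of known healthy behavior.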

On UNIX and Linux systems, application memory management (including how shared memory is configured) differs from vendor to vendor. To learn how to use your system’s virtual memory and shared memory management features to achieve the best performance for BMC applications, see your product documentation or consult your system administrator.
