Monitoring the performance of the Solaris operating system
Operating system monitoring means tracking the overall health of your operating system. This includes checking the availability of the servers, determining whether disks are heavily used or running out of space, monitoring system-wide CPU usage, monitoring memory utilization, and so on. The metrics listed here help you determine the health of the Solaris server.
Monitoring CPU utilization
CPU utilization is one of the key performance indicators. Regular monitoring of CPU helps you understand specific patterns of CPU usage and enables you to get an accurate picture of your system’s capabilities.
Utilization
This attribute displays the percentage of CPU utilization, calculated by adding user time and system time. High CPU utilization is not necessarily bad: it is good if all your work is being done as expected. Low CPU utilization is normal when the CPU has a light load.
This attribute is more useful when looked at in combination with the Load attribute. A high load average along with low CPU utilization can indicate problems (most likely in the I/O subsystem). A low load average with high CPU utilization, however, is common and simply means that the CPU is being used efficiently. High CPU utilization can also be an indication of excessive paging activity.
When the value of this attribute reaches the warning or alarm range, PATROL annotates the attribute's value with a report that displays the 10 processes using the most CPU. Annotations are denoted by an asterisk (or user-specified symbol) in the graph.
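The relationship between utilization and load described above can be sketched in a few lines of Python. This is only an illustration, not PATROL's logic: the function names, the 90% utilization cutoff, and the "load per CPU above 1.0" cutoff are all assumptions chosen for the example.

```python
def cpu_utilization(user_pct: float, system_pct: float) -> float:
    """CPU utilization as the sum of user time and system time (percent)."""
    return user_pct + system_pct


def interpret(utilization_pct: float, load_average: float, cpu_count: int,
              high_util_pct: float = 90.0, high_load_per_cpu: float = 1.0) -> str:
    """Combine utilization with load average.

    The two cutoffs are hypothetical defaults for this sketch; tune them
    to the characteristics of your own system.
    """
    high_util = utilization_pct >= high_util_pct
    high_load = (load_average / cpu_count) > high_load_per_cpu

    if high_load and not high_util:
        # High load with low utilization: work is waiting on something
        # other than the CPU, most likely the I/O subsystem.
        return "possible I/O bottleneck"
    if high_util and not high_load:
        # Low load with high utilization: the CPU is being used efficiently.
        return "CPU used efficiently"
    if high_util and high_load:
        return "high load and high utilization, investigate further"
    return "lightly loaded"
```

For example, `interpret(cpu_utilization(15.0, 5.0), load_average=8.0, cpu_count=4)` returns `"possible I/O bottleneck"`, because the load is high while the CPU is mostly idle.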
A Warning at two consecutive hits in the 90% to 95% range and an Alarm at two consecutive hits in the 95% to 100% range are reasonable for a typical system. However, knowing the characteristics of your system and its workload will help you determine whether to change the defaults.
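To make the "two consecutive hits" rule concrete, here is a minimal sketch of how such a threshold could be evaluated over a series of samples. It is not PATROL's implementation. The function name is hypothetical, and the sketch assumes that a value of exactly 95% counts toward the alarm range, since the two ranges share that boundary.

```python
from typing import Iterable, List


def classify_samples(samples: Iterable[float],
                     warn_low: float = 90.0,
                     alarm_low: float = 95.0,
                     hits_required: int = 2) -> List[str]:
    """Return the state ("OK", "WARNING", or "ALARM") after each sample."""
    states = []
    warn_run = 0   # consecutive samples in the warning range
    alarm_run = 0  # consecutive samples in the alarm range
    for value in samples:
        in_alarm = value >= alarm_low
        in_warn = warn_low <= value < alarm_low  # assumption: 95 goes to alarm
        alarm_run = alarm_run + 1 if in_alarm else 0
        warn_run = warn_run + 1 if in_warn else 0

        if alarm_run >= hits_required:
            states.append("ALARM")
        elif warn_run >= hits_required:
            states.append("WARNING")
        else:
            states.append("OK")
    return states
```

With the samples `[85, 91, 93, 96, 97, 92]`, the sketch yields `["OK", "OK", "WARNING", "OK", "ALARM", "OK"]`: a single sample inside a range never changes the state, which is what filters out short spikes.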
As stated previously, CPU utilization is much more meaningful when combined with other attributes. However, if your system always seems to be running above 95% utilization and you have done everything you can to improve your system's performance, then you may want to alter or deactivate the default range thresholds to avoid false alarms.
Suggested action
If your CPU utilization is high and a particular process that is important to you is not executing quickly enough, you can use the renice command to increase the priority of that process. If you have batch processes running, you can schedule some of those jobs to run during non-peak times.
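As a rough illustration of the renice suggestion, the sketch below only builds the command line and does not run it. The helper name and the example PID are hypothetical. It relies on the standard `renice -n increment -p pid` form, where a negative increment raises priority (which normally requires superuser privileges) and a positive increment lowers it.

```python
from typing import List


def renice_command(pid: int, increment: int) -> List[str]:
    """Build the argument list for renice without executing it.

    A negative increment lowers the nice value and so raises priority;
    a positive increment does the opposite.
    """
    return ["renice", "-n", str(increment), "-p", str(pid)]


# Hypothetical example: raise the priority of process 1234 by 5 nice levels.
command = renice_command(1234, -5)
```

The resulting list can be passed to `subprocess.run` once you have confirmed the PID and the increment are what you intend.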
IdleTime
This attribute displays the percentage of CPU time that is spent idle. A high percentage of idle time indicates that the server is not using its CPU to its fullest capacity. Consider setting up an alert if the idle time percentage stays high for a considerable amount of time.
Monitoring memory utilization
While monitoring the performance of the operating system, it is important to track how much memory is free and how much is in use.
Real memory available for use by applications (ActualUsed)
This attribute reports the percentage of logical free memory available for use by applications on the system, rather than the real free memory on the system. It includes the real memory consumed by file system buffers and the page cache. A low percentage indicates that memory utilization is high and that the system might soon run out of memory, resulting in critical performance issues.
Set a threshold on this attribute to get notified about the real memory available on the host, in percent. Consider setting up an alert to trigger if the real memory available is less than 15 percent.
Free Memory (Free)
This attribute displays the number of 1 KB pages of memory available. The amount of available free memory is critical for the operating system. If the amount is low and many applications are running, the operating system may start swapping information from main memory to secondary memory.
Set a threshold on this attribute to get notified about the free memory available on the host in MB. Consider setting up an alert to trigger if the free memory is less than 15 percent.
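Because this section mentions 1 KB pages, MB, and a percentage, a small conversion sketch may help relate the three. It assumes the raw value is a count of 1 KB pages, as stated above, and that total memory is known in KB. The function and parameter names are hypothetical.

```python
def free_memory_summary(free_pages_1kb: int, total_memory_kb: int,
                        alert_below_pct: float = 15.0) -> dict:
    """Convert a count of free 1 KB pages to MB and to a percentage of total."""
    if total_memory_kb <= 0:
        raise ValueError("total_memory_kb must be positive")
    free_mb = free_pages_1kb / 1024                       # 1024 KB per MB
    free_pct = free_pages_1kb * 100.0 / total_memory_kb   # share of total memory
    return {
        "free_mb": free_mb,
        "free_pct": free_pct,
        "alert": free_pct < alert_below_pct,
    }
```

For a host with 4 GB of memory (4194304 KB) and 524288 free 1 KB pages, this gives 512 MB free, or 12.5 percent, which falls below the suggested 15 percent threshold.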
Monitoring file system utilization
You can use the following metrics to monitor the file system statistics of the server and ensure that enough storage capacity is available.
Utilization
This attribute displays the percentage of file system storage that is currently in use. It is calculated by dividing the number of blocks used by the total number of blocks and then multiplying the result by 100. High utilization indicates that the file system might soon run out of space, which can cause performance issues.
Set a threshold on this attribute to get notified about the file system utilization, in percent. Consider setting up an alert to trigger if the utilization is more than 90 percent.
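The calculation described above is simple enough to show directly. The sketch below is an illustration with hypothetical function names; the block counts would come from whatever tool you use to inspect the file system.

```python
def filesystem_utilization(used_blocks: int, total_blocks: int) -> float:
    """Percentage of file system storage in use: used / total * 100."""
    if total_blocks <= 0:
        raise ValueError("total_blocks must be positive")
    return used_blocks * 100.0 / total_blocks


def utilization_alert(used_blocks: int, total_blocks: int,
                      threshold_pct: float = 90.0) -> bool:
    """True when utilization is more than the threshold (90 percent here)."""
    return filesystem_utilization(used_blocks, total_blocks) > threshold_pct
```

A file system with 460 of 500 blocks in use is at 92 percent and would trigger the alert, while one at exactly 90 percent would not, because the suggested rule is "more than 90 percent".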
MountStatus
This attribute displays the mount status of the file system. The status can be 0 (OK), 1 (Unmounted), or 2 (Unknown). This attribute helps you monitor the availability of important file systems. Consider setting up an alert to trigger if the status of any of your key file systems is Unmounted or Unknown.
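The three status codes map naturally onto an enumeration. The following sketch shows one way to turn them into an alert decision for key file systems. The class and function names, and the idea of passing a set of key mount points, are assumptions made for the example.

```python
from enum import IntEnum
from typing import Set


class MountStatus(IntEnum):
    OK = 0
    UNMOUNTED = 1
    UNKNOWN = 2


def needs_alert(mount_point: str, status_code: int,
                key_filesystems: Set[str]) -> bool:
    """Alert only for key file systems whose status is not OK."""
    status = MountStatus(status_code)  # raises ValueError for other codes
    return mount_point in key_filesystems and status is not MountStatus.OK
```

For example, with `key_filesystems = {"/", "/var"}`, a status of 1 on `/var` returns `True`, while the same status on a scratch mount that is not in the set returns `False`.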
DatasetspaceUtilization (ZFS)
This attribute displays the space used by a ZFS dataset in GB. This space is freed if the dataset is destroyed.
Monitoring process utilization
You can use the following metric to monitor system-wide process statistics and custom process statistics.
CPUUsage (KIS_Process)
This attribute displays the percentage of CPU used by the selected process. This percentage is calculated on the total number of active CPUs in the system. If the CPU utilization is continuously high for a specific process, you might want to investigate that particular process.
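One plausible reading of "calculated on the total number of active CPUs" is that the CPU time consumed by the process is divided by the total CPU time available across all active CPUs in the sampling interval. The sketch below follows that reading. It is an assumption about the calculation, not a documented formula, and the names are hypothetical.

```python
def process_cpu_usage(cpu_seconds_used: float, interval_seconds: float,
                      active_cpus: int) -> float:
    """Process CPU usage as a percentage of all active CPUs.

    Assumption: the available CPU time in the interval is
    interval_seconds * active_cpus.
    """
    if interval_seconds <= 0 or active_cpus <= 0:
        raise ValueError("interval_seconds and active_cpus must be positive")
    return cpu_seconds_used * 100.0 / (interval_seconds * active_cpus)
```

Under this reading, a process that consumes 6 seconds of CPU time during a 10-second interval on a 4-CPU system reports 15 percent, even though it kept one CPU 60 percent busy.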