Collecting additional metrics using the Sysdig agent
You can use the Sysdig agent to collect additional metrics from your Linux and Windows virtual servers. These metrics provide operational visibility into the performance and health of your applications, services, and platforms. The Sysdig agent collects these metrics and sends them to the Sysdig instance. When you run the IBM Cloud API ETL, these metrics are imported into the BMC Helix Continuous Optimization database.
Collecting Sysdig performance metrics from a Linux virtual server
- Log in to the virtual server by using your public IP address and root user name.
- Provision an instance of IBM Cloud Monitoring.
One Sysdig service instance must be provisioned for each region. The user who creates the Sysdig instance must have "IBM Cloud Monitoring" privileges.
Collecting Sysdig performance metrics from a Windows virtual server
The Prometheus WMI exporter runs as a Windows service. You can configure the metrics that you want to monitor by enabling the collectors.
The following collectors are supported by IBM:
- CPU
- Computer system metrics (cs)
- Disk metrics
- Network interface metrics
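As a rough aid for deciding which collectors to enable, the following sketch maps a wmi_* metric name to the collector it appears to come from. It assumes that metric names follow the pattern wmi_<collector>_<name>, and that the collector identifiers are cpu, cs, logical_disk, net, os, and system; these identifiers are inferred from the metric prefixes in the Windows metrics table in this topic, not taken from an official list. The function names are hypothetical.

```python
# Assumption: collector identifiers inferred from metric name prefixes.
KNOWN_COLLECTORS = ("logical_disk", "cpu", "cs", "net", "os", "system")


def collector_for(metric_name):
    """Return the collector a wmi_* metric name appears to belong to, or None."""
    for collector in KNOWN_COLLECTORS:
        if metric_name.startswith("wmi_" + collector + "_"):
            return collector
    return None


def collectors_needed(metric_names):
    """Return the sorted, de-duplicated collectors for a list of metric names."""
    found = {collector_for(name) for name in metric_names}
    found.discard(None)  # drop names that match no known prefix
    return sorted(found)
```

For example, passing the metric names that you plan to import to collectors_needed() returns the collector names to check against your exporter configuration.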
To monitor Windows system metrics, use the default dashboard Windows Node Overview. This dashboard is located in the Hosts and Containers section.
(Optional) Verify the uptime for Windows with the Prometheus Blackbox exporter. For details, see Verifying uptime for Windows with Prometheus Blackbox exporter.
Metrics provided for Linux systems
BMC Helix Continuous Optimization metric | IBM Cloud metric | IBM Cloud metric label key | Time aggregation type | Group aggregation type | Formula | Description |
---|---|---|---|---|---|---|
NET_IN_BIT_RATE | net.bytes.in | | timeAvg | sum | net.bytes.in*8 | This metric displays the inbound network bytes. |
NET_OUT_BIT_RATE | net.bytes.out | | timeAvg | sum | net.bytes.out*8 | This metric displays the outbound network bytes. |
NET_BIT_RATE | net.bytes.total | | timeAvg | sum | net.bytes.total*8 | This metric displays the total network bytes. |
NET_CONNECTION_RATE | net.connection.count.total | | timeAvg | sum | | This metric displays the number of currently established connections. |
NET_IN_ERROR_RATE | net.error.count | | timeAvg | sum | | This metric displays the number of network errors. |
CPU_USED_NUM | cpu.cores.used | | timeAvg | avg | | This metric displays the CPU core usage of each container, obtained from cgroups. It is equal to the number of cores used by the container. |
CPU_UTIL_IDLE | cpu.idle.percent | | avg | avg | cpu.idle.percent/100 | This metric displays the percentage of time that the CPUs were idle and the system did not have an outstanding disk I/O request. |
CPU_UTIL_WAIO | cpu.iowait.percent | | avg | avg | cpu.iowait.percent/100 | This metric displays the percentage of time that the CPUs were idle while the system had an outstanding disk I/O request. |
CPU_UTIL_WAIT | cpu.stolen.percent | | avg | avg | cpu.stolen.percent/100 | This metric measures the percentage of time that a virtual machine's CPU is in a state of involuntary wait because the physical CPU is shared among virtual machines. |
CPU_UTIL_NICE | cpu.nice.percent | | avg | avg | cpu.nice.percent/100 | This metric displays the percentage of CPU utilization that occurred while executing at the user level with nice priority. |
CPU_UTIL_SYSTEM | cpu.system.percent | | avg | avg | cpu.system.percent/100 | This metric displays the percentage of CPU utilization that occurred while executing at the system level (kernel). |
CPU_UTIL | cpu.used.percent | | avg | avg | cpu.used.percent/100 | This metric displays the CPU usage for each host, obtained from /proc and measured as the sum of the CPU usage of all cores, normalized by dividing by the number of cores. |
CPU_UTIL_USER | cpu.user.percent | | avg | avg | cpu.user.percent/100 | This metric displays the percentage of CPU utilization that occurred while executing at the user level (application). |
BYFS_FREE | fs.bytes.free | fs.mountDir | | avg | | This metric displays the available filesystem space. |
BYFS_SIZE | fs.bytes.total | fs.mountDir | timeAvg | avg | | This metric displays the total filesystem size. |
BYFS_USED | fs.bytes.used | fs.mountDir | timeAvg | avg | | This metric displays the used filesystem space. |
BYFS_USED_SPACE_PCT | fs.bytes.used fs.bytes.total | fs.mountDir fs.mountDir | timeAvg timeAvg | avg avg | fs.bytes.used / fs.bytes.total | This metric displays the percentage of used disk space of a specific filesystem. |
TOTAL_FS_UTIL | fs.used.percent | | avg | avg | | This metric displays the amount of space written by a single container instance. |
TOTAL_FS_FREE | fs.bytes.free | | timeAvg | avg | | This metric displays the amount of free disk space on all filesystems, in bytes. |
TOTAL_FS_SIZE | fs.bytes.total | | timeAvg | avg | | This metric displays the total size of all the filesystems, in bytes. |
TOTAL_FS_USED | fs.bytes.used | | timeAvg | avg | | This metric displays the amount of used disk space on all filesystems, in bytes. |
DISK_USED_INODES_PCT | fs.inodes.used.percent | | avg | avg | | |
BYFS_TOTAL_INODES | fs.inodes.total.count | fs.mountDir | timeAvg | avg | | |
BYFS_USED_INODES | fs.inodes.used.count | fs.mountDir | timeAvg | avg | | |
BYFS_FREE_INODES | fs.inodes.total.count fs.inodes.used.count | fs.mountDir | timeAvg | avg | fs.inodes.total.count - fs.inodes.used.count | |
MEM_FREE | memory.bytes.available | | timeAvg | avg | | This metric displays the amount of available memory. |
MEM_USED | memory.bytes.used | | timeAvg | avg | | This metric displays the amount of physical memory currently in use. |
MEM_VIRTUAL_TOTAL | memory.bytes.virtual | | timeAvg | avg | | This metric displays the virtual memory size of the process, in bytes. |
DISK_PAGING_IO_RATE | memory.pageFault.major | | timeAvg | sum | | This metric displays the number of major page faults, which occur when a program accesses a memory page that is mapped in the virtual address space but not loaded in physical memory. |
TOTAL_REAL_MEM | memory.bytes.total | | timeAvg | avg | | This metric displays the total memory of a host, in bytes. |
SWAP_SPACE_FREE | memory.swap.bytes.available | | timeAvg | avg | | This metric displays the swap memory available. |
SWAP_SPACE_TOT | memory.swap.bytes.total | | timeAvg | avg | | This metric displays the total amount of swap memory. |
SWAP_SPACE_USED | memory.swap.bytes.used | | timeAvg | avg | | This metric displays the amount of swap memory used. |
SWAP_SPACE_UTIL | memory.swap.used.percent | | avg | avg | | This metric displays the percentage of swap memory used. |
MEM_UTIL | memory.used.percent | | avg | avg | | This metric displays the percentage of physical memory in use. |
UPTIME | uptime | | timeAvg | avg | (1-uptime)*3600 | This metric displays the percentage of time the selected entity or entities were down over the defined time window. |
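The Formula column can be read as a simple arithmetic step applied to the aggregated Sysdig value. The following is a minimal sketch of a few of those conversions, not the ETL's actual implementation; the function names are hypothetical, and returning 0.0 for a zero-sized filesystem is an assumption made only to keep the example safe to run.

```python
def bytes_to_bits(byte_value):
    """NET_IN_BIT_RATE, NET_OUT_BIT_RATE, NET_BIT_RATE: net.bytes.* multiplied by 8."""
    return byte_value * 8


def percent_to_fraction(percent):
    """CPU_UTIL and related rows: cpu.*.percent divided by 100."""
    return percent / 100


def used_space_fraction(used_bytes, total_bytes):
    """BYFS_USED_SPACE_PCT: fs.bytes.used / fs.bytes.total."""
    if total_bytes == 0:
        return 0.0  # assumption: treat an unreported or empty filesystem as 0
    return used_bytes / total_bytes


def free_inodes(total_count, used_count):
    """BYFS_FREE_INODES: fs.inodes.total.count - fs.inodes.used.count."""
    return total_count - used_count
```

For instance, a net.bytes.in value of 1500 corresponds to a NET_IN_BIT_RATE value of 12000, and a cpu.used.percent value of 25 corresponds to a CPU_UTIL value of 0.25.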
Metrics provided for Windows systems
BMC Helix Continuous Optimization metric | IBM Cloud metric | IBM Cloud metric label key | Time aggregation type | Group aggregation type | Formula | Description |
---|---|---|---|---|---|---|
CPU_MHZ | wmi_cpu_core_frequency_mhz | | avg | avg | | This metric displays the core frequency. |
BYLDISK_IO_READ_RATE | wmi_logical_disk_reads_total | volume | timeAvg | avg | | |
DISK_IO_READ_RATE | wmi_logical_disk_reads_total | | timeAvg | sum | | |
BYLDISK_IO_WRITE_RATE | wmi_logical_disk_writes_total | volume | timeAvg | avg | | |
DISK_IO_WRITE_RATE | wmi_logical_disk_writes_total | | timeAvg | sum | | |
BYLDISK_SIZE | wmi_logical_disk_size_bytes | volume | avg | avg | | |
BYLDISK_FREE_SPACE | wmi_logical_disk_free_bytes | volume | avg | avg | | |
BYLDISK_USED_SPACE | wmi_logical_disk_size_bytes wmi_logical_disk_free_bytes | volume | avg | avg | wmi_logical_disk_size_bytes - wmi_logical_disk_free_bytes | |
TOTAL_LDISK_SIZE | wmi_logical_disk_size_bytes | | avg | sum | | |
LDISK_FREE | wmi_logical_disk_free_bytes | | avg | sum | | |
TOTAL_LDISK_USED | wmi_logical_disk_size_bytes wmi_logical_disk_free_bytes | | avg | sum | wmi_logical_disk_size_bytes - wmi_logical_disk_free_bytes | |
BYLDISK_READ_RESPONSE_TIME | wmi_logical_disk_read_seconds_total | volume | avg | avg | | |
DISK_READ_RESPONSE_TIME | wmi_logical_disk_read_seconds_total | | avg | sum | | |
BYLDISK_WRITE_RESPONSE_TIME | wmi_logical_disk_write_seconds_total | volume | avg | avg | | |
DISK_WRITE_RESPONSE_TIME | wmi_logical_disk_write_seconds_total | | avg | sum | | |
DISK_IO_RATE | wmi_logical_disk_reads_total wmi_logical_disk_writes_total | volume | timeAvg timeAvg | sum sum | wmi_logical_disk_reads_total + wmi_logical_disk_writes_total | This metric displays the average disk I/O rate aggregated by the host. |
NET_OUT_BIT_RATE | wmi_net_bytes_sent_total | | avg | sum | wmi_net_bytes_sent_total*8 | This metric displays the total bytes transmitted by interface. |
NET_IN_BIT_RATE | wmi_net_bytes_received_total | | avg | sum | wmi_net_bytes_received_total*8 | This metric displays the total bytes received by interface. |
NET_BIT_RATE | wmi_net_bytes_total | | avg | sum | wmi_net_bytes_total*8 | This metric displays the total bytes received and transmitted by interface. |
NET_OUT_PKT_ERROR_RATE | wmi_net_packets_outbound_errors | | avg | sum | | This metric displays the total packets that could not be transmitted due to errors. |
NET_IN_PKT_ERROR_RATE | wmi_net_packets_received_errors | | timeAvg | sum | | This metric displays the total packets that could not be received due to errors. |
NET_IN_PKT_RATE | wmi_net_packets_received_total | | avg | sum | | This metric displays the total packets received by interface. |
NET_OUT_PKT_RATE | wmi_net_packets_sent_total | | avg | sum | | This metric displays the total packets transmitted by interface. |
NET_PKT_RATE | wmi_net_packets_total | | avg | sum | | This metric displays the total packets received and transmitted by interface. |
NET_BANDWIDTH | wmi_net_current_bandwidth | | avg | sum | | This metric displays the estimate of the interface's current bandwidth. |
MEM_VIRTUAL_FREE | wmi_os_virtual_memory_free_bytes | | avg | sum | | This metric displays the bytes of virtual memory currently unused and available. |
TOTAL_REAL_MEM | wmi_cs_physical_memory_bytes | | timeAvg | sum | | This metric displays the total installed physical memory. |
MEM_FREE | wmi_os_physical_memory_free_bytes | | avg | sum | | This metric displays the bytes of physical memory currently unused and available. |
MEM_USED | wmi_os_visible_memory_bytes wmi_os_physical_memory_free_bytes | | avg | sum | wmi_os_visible_memory_bytes - wmi_os_physical_memory_free_bytes | This metric displays the total used memory, in bytes. |
MEM_UTIL | wmi_os_visible_memory_bytes wmi_os_physical_memory_free_bytes | | avg | sum | (wmi_os_visible_memory_bytes - wmi_os_physical_memory_free_bytes) / wmi_os_visible_memory_bytes | This metric displays the percentage of physical memory in use during the interval. |
MEM_VIRTUAL_TOTAL | wmi_os_virtual_memory_bytes | | avg | sum | | This metric displays the bytes of virtual memory. |
PROCESS_NUM_RUNNING | wmi_os_processes | | avg | sum | | This metric displays the number of process contexts currently loaded or running on the operating system. |
MEM_COMMIT_LIMIT | wmi_os_paging_limit_bytes | | avg | sum | | This metric displays the total number of bytes that can be stored in the operating system paging files. |
REQ_QUEUED | wmi_system_processor_queue_length | | avg | avg | | This metric displays the number of threads in the processor queue. |
UPTIME | wmi_system_system_up_time | | avg | avg | | This metric displays the time of the last system boot. |
CPU_UTIL_IDLE | wmi_cpu_time_total | Label key: mode; label value: idle | timeAvg | | wmi_cpu_time_total | |
CPU_UTIL_USER | wmi_cpu_time_total | Label key: mode; label value: user | timeAvg | | wmi_cpu_time_total | |
CPU_UTIL_INTERRUPT_HANDLING | wmi_cpu_time_total | Label key: mode; label value: interrupt | timeAvg | | wmi_cpu_time_total | |
CPU_UTIL_SYSTEM | wmi_cpu_time_total | Label key: mode; label value: privileged | timeAvg | | wmi_cpu_time_total | |
CPU_UTIL | wmi_cpu_time_total_privileged wmi_cpu_time_total_user | wmi_cpu_time_total_privileged wmi_cpu_time_total_user | timeAvg | | wmi_cpu_time_total_privileged + wmi_cpu_time_total_user | |
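Several Windows rows combine two source metrics in one formula. The sketch below shows how such rows could be evaluated from a single sample of aggregated values. It is illustrative only: the function name and the dictionary layout are hypothetical, and it assumes that every required metric is present in the sample and that wmi_os_visible_memory_bytes is nonzero.

```python
def derive_windows_metrics(sample):
    """Apply the multi-metric formulas from the Windows table to one sample.

    `sample` maps IBM Cloud metric names to already-aggregated numeric values.
    """
    visible = sample["wmi_os_visible_memory_bytes"]
    mem_free = sample["wmi_os_physical_memory_free_bytes"]
    disk_size = sample["wmi_logical_disk_size_bytes"]
    disk_free = sample["wmi_logical_disk_free_bytes"]
    reads = sample["wmi_logical_disk_reads_total"]
    writes = sample["wmi_logical_disk_writes_total"]

    return {
        "MEM_USED": visible - mem_free,
        "MEM_UTIL": (visible - mem_free) / visible,
        "TOTAL_LDISK_USED": disk_size - disk_free,
        "DISK_IO_RATE": reads + writes,
        # Bytes are converted to bits, as in the NET_BIT_RATE row.
        "NET_BIT_RATE": sample["wmi_net_bytes_total"] * 8,
    }
```

Keeping the inputs keyed by the source metric name makes it easy to compare each line of the function with the corresponding Formula cell.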
Troubleshooting Sysdig installation failure in Linux
If the Sysdig installation fails, perform the following tasks based on your operating system:
Type of error | Troubleshooting steps |
---|---|
Kernel headers are not available | Install the kernel headers manually, following the steps for your distribution: Debian or Ubuntu Linux distributions, or RHEL, CentOS, and Fedora Linux distributions. |
sysdig-probe kernel module is not installed on the kernel | |
Installation fails because the dkms_autoinstaller service is stopped | |
The kernel packages are not available | |
If the Sysdig agent installation still fails, check the logs at /opt/draios/etc/draios.log and raise a support case with the IBM Support team.
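Before raising a support case, it can help to pull the most recent error lines out of the agent log. The following is a minimal sketch that makes no assumption about the log format beyond a case-insensitive keyword match; the function name, the default keyword, and the line limit are hypothetical choices.

```python
from pathlib import Path


def find_error_lines(log_path, keyword="error", limit=20):
    """Return up to `limit` of the most recent lines that contain `keyword`.

    The match is case-insensitive. A missing file yields an empty list.
    """
    path = Path(log_path)
    if limit <= 0 or not path.is_file():
        return []
    lines = path.read_text(errors="replace").splitlines()
    needle = keyword.lower()
    matches = [line for line in lines if needle in line.lower()]
    return matches[-limit:]
```

For example, find_error_lines("/opt/draios/etc/draios.log") returns the matching lines from the log location given above, which you can attach to the support case.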