Collecting additional metrics using the Sysdig agent

You can use the Sysdig agent to collect additional metrics from your Linux and Windows virtual servers. These metrics are useful for gaining operational visibility into the performance and health of your applications, services, and platforms. The Sysdig agent collects these metrics and sends them to the Sysdig instance. When you run the IBM Cloud API ETL, these metrics are imported into the BMC Helix Capacity Optimization database.

Collecting Sysdig performance metrics from a Linux virtual server

  1. Log in to the virtual server by using your public IP address and root user name.
  2. Provision an instance of IBM Cloud Monitoring.
    1. Provision one Sysdig service instance for each region. The user who creates the Sysdig instance must have IBM Cloud Monitoring privileges. To assign these privileges, complete the following steps:

      1. Log in to the IBM Cloud console.
      2. In the IBM Cloud Console header, click Manage > Access (IAM).
      3. From the left navigation page, select Users.
      4. In the Account users table, identify the user to whom you want to assign the access. From the Actions menu of that user, click Assign access.
      5. Select Assign access within a resource group.
      6. Select a resource group.
      7. If the user does not have a role already granted for the selected resource group, choose a role for the Assign access to a resource group field.
        Depending on the role that you select, the user can view the resource group on their dashboard, edit the resource group name, or manage user access to the group. Select No access if you want the user to have access only to IBM Cloud Monitoring in the resource group.

      8. Select IBM Cloud Monitoring.

      9. Select the platform role Administrator.
      10. Click Assign.
    2. To add monitoring features with IBM Cloud Monitoring in IBM Cloud, provision an instance of the IBM Cloud Monitoring service. You provision an instance within the context of a resource group, which lets you organize your services for access control and billing purposes. You can provision the IBM Cloud Monitoring with Sysdig instance in the default resource group or in a custom resource group. When you provision an instance, you automatically get an ingestion key, known as the Sysdig access key.

      1. Log in to the IBM Cloud console.

      2. From the IBM Cloud dashboard, navigate to the menu > Observability to access the Observability dashboard.

      3. Select Monitoring > Options > Create.

      4. Select the region.

      5. Select a service plan. By default, the Trial plan is set. For more information about the service plans, see Service plans.

      6. Enter a service name.

      7. Select a resource group. By default, the Default resource group is set.

      8. Turn on automatic collection of platform metrics by clicking Enable.

      9. Click Create to provision an instance.
        The service UI is displayed.

      To provision an instance of Sysdig by using the CLI, see Provisioning a Sysdig instance by using the CLI.

    3. To configure your Linux host (Ubuntu server) to send metrics to your IBM Cloud Monitoring instance, install a Sysdig agent.

      Complete the following steps from the command line:

      1. Open the terminal.

      2. Run the following command to log in to the IBM Cloud:

        ibmcloud login -a cloud.ibm.com

        Select the account where the IBM Cloud Monitoring instance is available.

      3. Obtain the Sysdig access key.

        1. Log in to the IBM Cloud console.

        2. From the left navigation page, select Observability.
        3. Select Monitoring. The IBM Cloud Monitoring dashboard is displayed with a list of the monitoring instances that are available on IBM Cloud.

        4. Identify the instance for which you want to get the access key. Select Actions, and then click View Key. A pop-up window opens with the key information.

        5. Click the eye icon to view the access key.

          To obtain the access key by using the CLI, see Getting the access key by using the CLI.

      4. Obtain the IBM region list. For information, see Regions and endpoints.

      5. Obtain the region-specific public collector endpoint. For information, see Public collector endpoints.
      6. Run the following command to deploy the Sysdig agent on the virtual server:
        curl -s https://s3.amazonaws.com/download.draios.com/stable/install-agent | sudo bash -s -- --access_key <SYSDIG_ACCESS_KEY> --collector <COLLECTOR_ENDPOINT> --collector_port 6443 --secure true --check_certificate false --tags <TAG_DATA> --additional_conf 'sysdig_capture_enabled: false'

        Example command for the Frankfurt region:
        curl -s https://s3.amazonaws.com/download.draios.com/stable/install-agent | sudo bash -s -- --access_key 2cefff44-4cba-4c8d-afc0-a8563ee8049a --collector ingest.eu-de.monitoring.cloud.ibm.com --collector_port 6443 --secure true --check_certificate false --tags type:sysdig-agent,location:frankfurt,sourceType:virtualserver --additional_conf 'sysdig_capture_enabled: false'

      7. Verify the status of the dragent service by running the following command:

        systemctl status dragent.service
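
      The deployment command in step 6 is long, so it can help to assemble it from its parts. The following is a minimal sketch, not an official tool: the helper name build_install_command is hypothetical, the installer URL and flags are copied from the command on this page, and the access key and tags passed in are placeholders that you must replace.

```python
import shlex

# Installer URL as given in the deployment command on this page.
INSTALLER_URL = "https://s3.amazonaws.com/download.draios.com/stable/install-agent"


def build_install_command(access_key, collector, tags):
    """Return the Sysdig agent install command as a single shell string.

    build_install_command is a hypothetical helper; it only reproduces
    the flags shown in the documented command.
    """
    # The documented example passes tags as comma-separated key:value pairs.
    tag_data = ",".join(f"{key}:{value}" for key, value in tags.items())
    installer_args = [
        "--access_key", access_key,
        "--collector", collector,
        "--collector_port", "6443",
        "--secure", "true",
        "--check_certificate", "false",
        "--tags", tag_data,
        "--additional_conf", "sysdig_capture_enabled: false",
    ]
    # Quote each argument so spaces and special characters survive the shell.
    quoted = " ".join(shlex.quote(arg) for arg in installer_args)
    return f"curl -s {INSTALLER_URL} | sudo bash -s -- {quoted}"


if __name__ == "__main__":
    print(build_install_command(
        "<SYSDIG_ACCESS_KEY>",
        "ingest.eu-de.monitoring.cloud.ibm.com",
        {"type": "sysdig-agent", "location": "frankfurt", "sourceType": "virtualserver"},
    ))
```

      The sketch only prints the command; review the output before you run it on the virtual server.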

      After the installation is done, check the contents of the /opt/draios/etc/dragent.yaml file. The values of ssl, ssl_verify_certificate, and sysdig_capture_enabled properties must be set to the following:

      • ssl: true 

      • ssl_verify_certificate: false

      • sysdig_capture_enabled: false

      If these values are not correct in the dragent.yaml file, set these properties manually and save the file.
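
      The check of the three properties can be scripted. The following is a minimal sketch that assumes the properties appear as top-level "key: value" lines in dragent.yaml; the function name find_mismatches and the sample text are hypothetical, and the parser is deliberately simple rather than a full YAML reader.

```python
# Required values, as listed on this page.
REQUIRED = {
    "ssl": "true",
    "ssl_verify_certificate": "false",
    "sysdig_capture_enabled": "false",
}


def find_mismatches(yaml_text):
    """Return {property: found_value_or_None} for required properties that differ."""
    found = {}
    for line in yaml_text.splitlines():
        # Assumption: only top-level "key: value" lines matter here, so
        # indented lines, comment lines, and lines without a colon are skipped.
        if line.startswith((" ", "\t", "#")) or ":" not in line:
            continue
        key, _, value = line.partition(":")
        # Drop a trailing comment and normalize case before comparing.
        found[key.strip()] = value.split("#", 1)[0].strip().lower()
    return {
        key: found.get(key)
        for key, expected in REQUIRED.items()
        if found.get(key) != expected
    }


if __name__ == "__main__":
    # Hypothetical file content used only to demonstrate the check.
    sample = "collector_port: 6443\nssl: true\nssl_verify_certificate: true\n"
    for key, value in find_mismatches(sample).items():
        print(f"{key}: found {value!r}, expected {REQUIRED[key]!r}")
```

      To check a real host, pass the contents of /opt/draios/etc/dragent.yaml to find_mismatches instead of the sample text.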

      (Optional) To filter the metrics, add the metrics_filter property in the dragent.yaml file. For details, see Including and excluding metrics.

      To view the metrics in the IBM Cloud Sysdig UI, launch the Sysdig Web UI. For details, see Launch the Web UI. In the Hosts and Containers section, you can find the entry for your Ubuntu server.


Collecting Sysdig performance metrics from a Windows virtual server

The Prometheus WMI exporter runs as a Windows service. You can configure the metrics that you want to monitor by enabling the collectors.

The following collectors are supported by IBM:

  • CPU
  • Computer system metrics (cs)
  • Disk metrics
  • Network interface metrics
    1. Log in to your Windows computer.

    2. Download the Prometheus exporter.
      BMC Helix Capacity Optimization does not support v0.13.0 and later versions of the Prometheus exporter.
    3. Identify the collectors that contain the information for the metric data that you want to send to the Sysdig agent.

    4. Run the wmi_exporter and configure the collectors that you want to enable.
      .\wmi_exporter-0.12.0-amd64.exe --collectors.enabled <COLLECTORS>

      where <COLLECTORS> is the comma-separated list of collectors that you want to enable
      Example: To collect computer system (cs), operating system (os), CPU, logical disk, network interface, and system metrics, use the following command:

      .\wmi_exporter-0.12.0-amd64.exe --collectors.enabled "cs,os,cpu,logical_disk,net,system"

    Note: The ETL does not support the latest version of the wmi exporter. Ensure that you download the 0.12.0 version (known as wmi_exporter and not the windows_exporter) of the exporter.

    5. Configure the Windows firewall to allow access to wmi_exporter-0.12.0-amd64.exe.

    6. (Optional) Update the VPC rules. If you use private endpoints, add an inbound rule to the security group for port 9182, with the source type set to Security Group, and choose the security group for the Windows system.
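
    After the exporter is running, you can confirm that each enabled collector actually produces metrics. The following is a minimal sketch that assumes every collector exposes metrics named wmi_<collector>_..., which matches the metric names listed later on this page; the function name collectors_present and the sample text are hypothetical.

```python
def collectors_present(metrics_text, collectors):
    """Map each collector name to True if a wmi_<collector>_ metric appears."""
    present = {name: False for name in collectors}
    for line in metrics_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comment lines
        # The metric name ends at the first "{" (labels) or space (value).
        metric_name = line.split("{", 1)[0].split(" ", 1)[0]
        for name in collectors:
            if metric_name.startswith(f"wmi_{name}_"):
                present[name] = True
    return present


if __name__ == "__main__":
    # Hypothetical excerpt of exporter output in Prometheus text format.
    sample = (
        "# TYPE wmi_cpu_time_total counter\n"
        'wmi_cpu_time_total{core="0,0",mode="idle"} 12.5\n'
        "wmi_os_processes 87\n"
    )
    print(collectors_present(sample, ["cs", "os", "cpu", "logical_disk", "net", "system"]))
```

    In practice, the text would come from the exporter itself, which listens on port 9182 and serves its metrics at the /metrics path.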

  1. Use the Prometheus remote-write capabilities to push the metrics from the Windows system by running Prometheus as a client collector on Windows.

    1. Download the Prometheus monitoring system and time series database: the prometheus-2.15.2.windows-amd64.tar.gz file.

    2. Extract the prometheus-2.15.2.windows-amd64.tar.gz file.

    3. Edit the prometheus.yml file in a text editor.

    4. Configure the scrape_configs section of the prometheus.yml configuration file as follows so that Prometheus scrapes the Windows wmi_exporter.

       scrape_configs:
         # The job name is added as a label `job=<job_name>` to any timeseries scraped from this configuration.
         - job_name: 'wmi_exporter'

           static_configs:
             - targets: ['localhost:9182']

               labels:
                 region: us-east
                 instance: <HOSTNAME>
                 job: <JOBNAME>

      where,

      • <HOSTNAME> is the name of the Windows system

      • <JOBNAME> is a custom attribute that you can set to identify the role of the node that you are scraping, and that you can also use to scope the data in Sysdig

    5. Add the remote_write configuration at the end of the prometheus.yml file to configure the target Sysdig instance that will receive the metrics.

       remote_write:
         - url: "ENDPOINT/api/prometheus/write"
      
           bearer_token_file: C:\Users\Administrator\prom\sysdig-apikey
      
           write_relabel_configs:
             # Drop forwarding the metrics generated by the exporter that are not supported
             - source_labels: ["__name__"]
               regex: "^wmi_(.*)"
               action: keep
      
             - regex: "(__name__)|(job)|(region)|(instance)|(status)|(core)|(name)|(start_mode)|(nic)|(volume)|(state)|(version)|(mode)|(branch)|(timezone)|(goversion)|(collector)|(revision)"
               action: labelkeep
      
      
      

      where,

      • ENDPOINT is the Sysdig collector endpoint. For the list of endpoints, see Sysdig Collector endpoints.

      • sysdig-apikey is the file that contains the Sysdig Monitor API Token. The file name does not have an extension.
        For information about how to get the API token, see Getting the Sysdig API token.

      Example: Completed version of the prometheus.yml file

         # my global config
         global:
           scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
           evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
           # scrape_timeout is set to the global default (10s).
        
         # Alertmanager configuration
         alerting:
           alertmanagers:
           - static_configs:
             - targets:
               # - alertmanager:9093
        
         # Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
         rule_files:
           # - "first_rules.yml"
           # - "second_rules.yml"
        
         # A scrape configuration containing exactly one endpoint to scrape:
         # Here it's the Windows wmi_exporter.
         scrape_configs:
           # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
           - job_name: 'wmi_exporter'
        
             static_configs:
             - targets: ['localhost:9182']
        
               labels:
                 instance: "my-windows-hostname"
                 region: "us-south"
        
         # Connection to Sysdig
         remote_write:
           - url: "https://ingest.eu-gb.monitoring.cloud.ibm.com/api/prometheus/write"

             bearer_token_file: C:\Users\Administrator\prom\sysdig-apikey
        
             write_relabel_configs:
               - source_labels: ["__name__"]
                 regex: "^wmi_(.*)"
                 action: keep
        
               - regex: "(__name__)|(job)|(region)|(instance)|(status)|(core)|(name)|(start_mode)|(nic)|(volume)|(state)|(version)|(mode)|(branch)|(timezone)|(goversion)|(collector)|(revision)"
                 action: labelkeep
    6. Start the Prometheus executable from the location containing the prometheus.yml file. Run .\prometheus.exe.

  2. To view the Windows metrics, use the default Windows Node Overview dashboard, which is located in the Hosts and Containers section.

  3. (Optional) Verify the uptime for Windows with the Prometheus Blackbox exporter. For details, see Verifying uptime for Windows with Prometheus Blackbox exporter.
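
The two write_relabel_configs rules in the remote_write section decide which series and which labels reach Sysdig. The following is a minimal sketch of their effect, not Prometheus code: the function name relabel is hypothetical, and re.fullmatch stands in for the way Prometheus anchors relabeling regular expressions at both ends.

```python
import re

# Regular expressions copied from the write_relabel_configs section.
KEEP_METRIC = re.compile(r"^wmi_(.*)")
KEEP_LABELS = re.compile(
    r"(__name__)|(job)|(region)|(instance)|(status)|(core)|(name)|(start_mode)"
    r"|(nic)|(volume)|(state)|(version)|(mode)|(branch)|(timezone)|(goversion)"
    r"|(collector)|(revision)"
)


def relabel(series):
    """Apply both rules to one series, given as a dict of label name to value.

    Returns the filtered labels, or None when the series is dropped.
    """
    # action: keep -> forward only series whose metric name matches wmi_*.
    if not KEEP_METRIC.fullmatch(series.get("__name__", "")):
        return None
    # action: labelkeep -> remove every label whose name does not match.
    return {name: value for name, value in series.items()
            if KEEP_LABELS.fullmatch(name)}


if __name__ == "__main__":
    print(relabel({"__name__": "go_goroutines", "job": "wmi_exporter"}))
    print(relabel({"__name__": "wmi_os_processes", "job": "wmi_exporter",
                   "instance": "my-windows-hostname", "extra": "x"}))
```

With these rules, metrics that the exporter or Prometheus generates outside the wmi_ namespace are not forwarded, and labels that are not in the list are stripped before the data is written.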

Metrics provided for Linux systems

| BMC Helix Capacity Optimization metric | IBM Cloud metric | IBM Cloud metric label key | Time Aggregation type | Group Aggregation type | Formula | Description |
|---|---|---|---|---|---|---|
| NET_IN_BIT_RATE | net.bytes.in | | timeAvg | sum | net.bytes.in*8 | This metric displays the inbound network bytes. |
| NET_OUT_BIT_RATE | net.bytes.out | | timeAvg | sum | net.bytes.out*8 | This metric displays the outbound network bytes. |
| NET_BIT_RATE | net.bytes.total | | timeAvg | sum | net.bytes.total*8 | This metric displays the total network bytes. |
| NET_CONNECTION_RATE | net.connection.count.total | | timeAvg | sum | | This metric displays the number of currently established connections. |
| NET_IN_ERROR_RATE | net.error.count | | timeAvg | sum | | This metric displays the number of network errors. |
| CPU_USED_NUM | cpu.cores.used | | timeAvg | avg | | This metric displays the CPU core usage of each container obtained from cgroups, and is equal to the number of cores used by the container. |
| CPU_UTIL_IDLE | cpu.idle.percent | | avg | avg | cpu.idle.percent/100 | This metric displays the percentage of time that the CPUs were idle and the system did not have an outstanding disk I/O request. |
| CPU_UTIL_WAIO | cpu.iowait.percent | | avg | avg | cpu.iowait.percent/100 | This metric displays the percentage of time that the CPUs were idle during which the system had an outstanding disk I/O request. |
| CPU_UTIL_WAIT | cpu.stolen.percent | | avg | avg | cpu.stolen.percent/100 | This metric measures the percentage of time that a virtual machine's CPU is in a state of involuntary wait because the physical CPU is shared among virtual machines. |
| CPU_UTIL_NICE | cpu.nice.percent | | avg | avg | cpu.nice.percent/100 | This metric displays the percentage of CPU utilization that occurred while executing at the user level with Nice priority. |
| CPU_UTIL_SYSTEM | cpu.system.percent | | avg | avg | cpu.system.percent/100 | This metric displays the percentage of CPU utilization that occurred while executing at the system level (kernel). |
| CPU_UTIL | cpu.used.percent | | avg | avg | cpu.used.percent/100 | This metric displays the CPU usage for each host, obtained from /proc and measured as the sum of the CPU usage of all cores, normalized by dividing by the number of cores. |
| CPU_UTIL_USER | cpu.user.percent | | avg | avg | cpu.user.percent/100 | This metric displays the percentage of CPU utilization that occurred while executing at the user level (application). |
| BYFS_FREE | fs.bytes.free | fs.mountDir | | avg | | This metric displays the available filesystem space. |
| BYFS_SIZE | fs.bytes.total | fs.mountDir | timeAvg | avg | | This metric displays the total filesystem size. |
| BYFS_USED | fs.bytes.used | fs.mountDir | timeAvg | avg | | This metric displays the used filesystem space. |
| BYFS_USED_SPACE_PCT | fs.bytes.used, fs.bytes.total | fs.mountDir | timeAvg | avg | fs.bytes.used / fs.bytes.total | This metric displays the percentage of used disk space of a specific filesystem. |
| TOTAL_FS_UTIL | fs.used.percent | | avg | avg | | This metric displays the amount of space written by a single container instance. |
| TOTAL_FS_FREE | fs.bytes.free | | timeAvg | avg | | This metric displays the amount of free disk space on all filesystems, in bytes. |
| TOTAL_FS_SIZE | fs.bytes.total | | timeAvg | avg | | This metric displays the total size of all the filesystems, in bytes. |
| TOTAL_FS_USED | fs.bytes.used | | timeAvg | avg | | This metric displays the amount of used disk space on all filesystems, in bytes. |
| DISK_USED_INODES_PCT | fs.inodes.used.percent | | avg | avg | | |
| BYFS_TOTAL_INODES | fs.inodes.total.count | fs.mountDir | timeAvg | avg | | |
| BYFS_USED_INODES | fs.inodes.used.count | fs.mountDir | timeAvg | avg | | |
| BYFS_FREE_INODES | fs.inodes.total.count, fs.inodes.used.count | fs.mountDir | timeAvg | avg | fs.inodes.total.count - fs.inodes.used.count | |
| MEM_FREE | memory.bytes.available | | timeAvg | avg | | This metric displays the amount of available memory. |
| MEM_USED | memory.bytes.used | | timeAvg | avg | | This metric displays the amount of physical memory currently in use. |
| MEM_VIRTUAL_TOTAL | memory.bytes.virtual | | timeAvg | avg | | This metric displays the virtual memory size of the process, in bytes. |
| DISK_PAGING_IO_RATE | memory.pageFault.major | | timeAvg | sum | | This metric displays the count of the condition that occurs when a program accesses a memory page that is mapped in the virtual address space, but not loaded in physical memory. |
| TOTAL_REAL_MEM | memory.bytes.total | | timeAvg | avg | | This metric displays the total memory of a host, in bytes. |
| SWAP_SPACE_FREE | memory.swap.bytes.available | | timeAvg | avg | | This metric displays the swap memory available. |
| SWAP_SPACE_TOT | memory.swap.bytes.total | | timeAvg | avg | | This metric displays the total amount of swap memory. |
| SWAP_SPACE_USED | memory.swap.bytes.used | | timeAvg | avg | | This metric displays the amount of swap memory used. |
| SWAP_SPACE_UTIL | memory.swap.used.percent | | avg | avg | | This metric displays the percentage of swap memory used. |
| MEM_UTIL | memory.used.percent | | avg | avg | | This metric displays the percentage of physical memory in use. |
| UPTIME | uptime | | timeAvg | avg | (1-uptime)*3600 | This metric displays the percentage of time the selected entity or entities were down over the defined time window. |
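
The Formula column describes simple arithmetic on the IBM Cloud metric values. The following is a minimal sketch of four of those formulas applied to one sample; the function name apply_linux_formulas and the sample values are hypothetical, and the sketch is not the ETL's implementation.

```python
def apply_linux_formulas(sample):
    """Convert a dict of IBM Cloud metric values into BMC metric values."""
    return {
        # Bytes to bits: net.bytes.in*8
        "NET_IN_BIT_RATE": sample["net.bytes.in"] * 8,
        # Percent to fraction: cpu.used.percent/100
        "CPU_UTIL": sample["cpu.used.percent"] / 100,
        # Ratio of two metrics: fs.bytes.used / fs.bytes.total
        "BYFS_USED_SPACE_PCT": sample["fs.bytes.used"] / sample["fs.bytes.total"],
        # Difference of two metrics: total inodes minus used inodes
        "BYFS_FREE_INODES": (
            sample["fs.inodes.total.count"] - sample["fs.inodes.used.count"]
        ),
    }


if __name__ == "__main__":
    # Hypothetical values for one host and one mount point.
    print(apply_linux_formulas({
        "net.bytes.in": 1000,
        "cpu.used.percent": 25,
        "fs.bytes.used": 50,
        "fs.bytes.total": 200,
        "fs.inodes.total.count": 1000,
        "fs.inodes.used.count": 400,
    }))
```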

Metrics provided for Windows systems

| BMC Helix Capacity Optimization metric | IBM Cloud metric | IBM Cloud metric label key | Time Aggregation type | Group Aggregation type | Formula | Description |
|---|---|---|---|---|---|---|
| CPU_MHZ | wmi_cpu_core_frequency_mhz | | avg | avg | | This metric displays the core frequency. |
| BYLDISK_IO_READ_RATE | wmi_logical_disk_reads_total | volume | timeAvg | avg | | |
| DISK_IO_READ_RATE | wmi_logical_disk_reads_total | | timeAvg | sum | | |
| BYLDISK_IO_WRITE_RATE | wmi_logical_disk_writes_total | volume | timeAvg | avg | | |
| DISK_IO_WRITE_RATE | wmi_logical_disk_writes_total | | timeAvg | sum | | |
| BYLDISK_SIZE | wmi_logical_disk_size_bytes | volume | avg | avg | | |
| BYLDISK_FREE_SPACE | wmi_logical_disk_free_bytes | volume | avg | avg | | |
| BYLDISK_USED_SPACE | wmi_logical_disk_size_bytes, wmi_logical_disk_free_bytes | volume | avg | avg | wmi_logical_disk_size_bytes - wmi_logical_disk_free_bytes | |
| TOTAL_LDISK_SIZE | wmi_logical_disk_size_bytes | | avg | sum | | |
| LDISK_FREE | wmi_logical_disk_free_bytes | | avg | sum | | |
| TOTAL_LDISK_USED | wmi_logical_disk_size_bytes, wmi_logical_disk_free_bytes | | avg | sum | wmi_logical_disk_size_bytes - wmi_logical_disk_free_bytes | |
| BYLDISK_READ_RESPONSE_TIME | wmi_logical_disk_read_seconds_total | volume | avg | avg | | |
| DISK_READ_RESPONSE_TIME | wmi_logical_disk_read_seconds_total | | avg | sum | | |
| BYLDISK_WRITE_RESPONSE_TIME | wmi_logical_disk_write_seconds_total | volume | avg | avg | | |
| DISK_WRITE_RESPONSE_TIME | wmi_logical_disk_write_seconds_total | | avg | sum | | |
| DISK_IO_RATE | wmi_logical_disk_reads_total, wmi_logical_disk_writes_total | volume | timeAvg | sum | wmi_logical_disk_reads_total + wmi_logical_disk_writes_total | This metric displays the disk average I/O rate aggregated by the host. |
| NET_OUT_BIT_RATE | wmi_net_bytes_sent_total | | avg | sum | wmi_net_bytes_sent_total*8 | This metric displays the total bytes transmitted by interface. |
| NET_IN_BIT_RATE | wmi_net_bytes_received_total | | avg | sum | wmi_net_bytes_received_total*8 | This metric displays the total bytes received by interface. |
| NET_BIT_RATE | wmi_net_bytes_total | | avg | sum | wmi_net_bytes_total*8 | This metric displays the total bytes received and transmitted by interface. |
| NET_OUT_PKT_ERROR_RATE | wmi_net_packets_outbound_errors | | avg | sum | | This metric displays the total packets that could not be transmitted due to errors. |
| NET_IN_PKT_ERROR_RATE | wmi_net_packets_received_errors | | timeAvg | sum | | This metric displays the total packets that could not be received due to errors. |
| NET_IN_PKT_RATE | wmi_net_packets_received_total | | avg | sum | | This metric displays the total packets received by interface. |
| NET_OUT_PKT_RATE | wmi_net_packets_sent_total | | avg | sum | | This metric displays the total packets transmitted by interface. |
| NET_PKT_RATE | wmi_net_packets_total | | avg | sum | | This metric displays the total packets received and transmitted by interface. |
| NET_BANDWIDTH | wmi_net_current_bandwidth | | avg | sum | | This metric displays the estimate of the interface's current bandwidth. |
| MEM_VIRTUAL_FREE | wmi_os_virtual_memory_free_bytes | | avg | sum | | This metric displays the bytes of virtual memory currently unused and available. |
| TOTAL_REAL_MEM | wmi_cs_physical_memory_bytes | | timeAvg | sum | | This metric displays the total installed physical memory. |
| MEM_FREE | wmi_os_physical_memory_free_bytes | | avg | sum | | This metric displays the bytes of physical memory currently unused and available. |
| MEM_USED | wmi_os_visible_memory_bytes, wmi_os_physical_memory_free_bytes | | avg | sum | wmi_os_visible_memory_bytes - wmi_os_physical_memory_free_bytes | This metric displays the total used memory, in bytes. |
| MEM_UTIL | wmi_os_visible_memory_bytes, wmi_os_physical_memory_free_bytes | | avg | sum | (wmi_os_visible_memory_bytes - wmi_os_physical_memory_free_bytes) / wmi_os_visible_memory_bytes | This metric displays the percentage of physical memory in use during the interval. |
| MEM_VIRTUAL_TOTAL | wmi_os_virtual_memory_bytes | | avg | sum | | This metric displays the bytes of virtual memory. |
| PROCESS_NUM_RUNNING | wmi_os_processes | | avg | sum | | This metric displays the number of process contexts currently loaded or running on the operating system. |
| MEM_COMMIT_LIMIT | wmi_os_paging_limit_bytes | | avg | sum | | This metric displays the total number of bytes that can be stored in the operating system paging files. |
| REQ_QUEUED | wmi_system_processor_queue_length | | avg | avg | | This metric displays the number of threads in the processor queue. |
| UPTIME | wmi_system_system_up_time | | avg | avg | | This metric displays the time of the last system boot. |
| CPU_UTIL_IDLE | wmi_cpu_time_total | Label key: mode; Label value: idle | timeAvg | | wmi_cpu_time_total | |
| CPU_UTIL_USER | wmi_cpu_time_total | Label key: mode; Label value: user | timeAvg | | wmi_cpu_time_total | |
| CPU_UTIL_INTERRUPT_HANDLING | wmi_cpu_time_total | Label key: mode; Label value: interrupt | timeAvg | | wmi_cpu_time_total | |
| CPU_UTIL_SYSTEM | wmi_cpu_time_total | Label key: mode; Label value: privileged | timeAvg | | wmi_cpu_time_total | |
| CPU_UTIL | wmi_cpu_time_total_privileged, wmi_cpu_time_total_user | wmi_cpu_time_total_privileged: Label key: mode, Label value: privileged; wmi_cpu_time_total_user: Label key: mode, Label value: user | timeAvg | | wmi_cpu_time_total_privileged + wmi_cpu_time_total_user | |
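
For the Windows CPU metrics, the label key and label value select which wmi_cpu_time_total series feed each metric. The following is a minimal sketch of that selection and of the memory formulas; the function names and sample values are hypothetical, and the input values are assumed to be already time-aggregated.

```python
def cpu_time_by_mode(series, mode):
    """Sum wmi_cpu_time_total values whose 'mode' label equals mode.

    series is a list of (labels, value) pairs, one per core and mode.
    """
    return sum(value for labels, value in series if labels.get("mode") == mode)


def cpu_util(series):
    """CPU_UTIL: privileged time plus user time, as in the Formula column."""
    return cpu_time_by_mode(series, "privileged") + cpu_time_by_mode(series, "user")


def memory_metrics(visible_bytes, free_bytes):
    """MEM_USED and MEM_UTIL from wmi_os_visible_memory_bytes and
    wmi_os_physical_memory_free_bytes."""
    used = visible_bytes - free_bytes
    return {"MEM_USED": used, "MEM_UTIL": used / visible_bytes}


if __name__ == "__main__":
    # Hypothetical, already time-aggregated values for a single core.
    cpu_series = [
        ({"core": "0,0", "mode": "idle"}, 0.5),
        ({"core": "0,0", "mode": "user"}, 0.25),
        ({"core": "0,0", "mode": "privileged"}, 0.125),
    ]
    print(cpu_util(cpu_series))
    print(memory_metrics(8000, 2000))
```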

Troubleshooting Sysdig installation failure in Linux

If the Sysdig agent installation fails, perform the troubleshooting steps for the type of error that you encounter:

Type of error: Kernel headers are not available

Troubleshooting steps: Install the kernel headers manually.

For Debian or Ubuntu Linux distributions

  1. Identify your distribution by running cat /etc/os-release.
  2. Run the following command for the selected distribution:
    apt-get -y install linux-headers-$(uname -r)
  3. If the error still persists, run the following command:
    yum install kernel kernel-headers
  4. Deploy the Sysdig agent.

For RHEL, CentOS, and Fedora Linux distributions

  1. Identify your distribution by running cat /etc/os-release.
  2. Run the following command for the selected distribution:
    yum -y install kernel-devel-$(uname -r)
  3. If the error still persists, run the following command:
    yum install kernel kernel-headers
  4. Deploy the Sysdig agent.

Type of error: The sysdig-probe kernel module is not installed on the kernel

  1. Install the kernel module by using the following command:
    yum install kernel kernel-headers
  2. Deploy the Sysdig agent.

Type of error: Installation fails because the dkms_autoinstaller service is stopped

  1. Use the following commands to start the service:
    sudo yum -y install kernel-devel-$(uname -r)
    sudo /usr/lib/dkms/dkms_autoinstaller start
  2. Deploy the Sysdig agent.

Type of error: The kernel packages are not available

  1. Run the following command. The error that it generates lists the names of the packages that are not available:
    yum -y install kernel-devel-$(uname -r)
  2. Download the package from https://rpmfind.net/linux/rpm2html/search.php?query=kernel-devel-x86_64 by using the wget command.
  3. Install the missing package:
    sudo yum localinstall <RPM file name>

  4. Install the kernel headers:
    sudo yum -y install kernel-devel-$(uname -r)
    yum install kernel kernel-headers

If the Sysdig agent installation still fails, check the logs at /opt/draios/logs/draios.log and raise a support case with the IBM Support team.
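
The choice of kernel header command depends on the distribution reported by /etc/os-release. The following is a minimal sketch of that decision; the function names are hypothetical, the commands are the ones listed in the troubleshooting steps, and the mapping relies on the standard ID and ID_LIKE fields of os-release.

```python
APT_COMMAND = "apt-get -y install linux-headers-$(uname -r)"
YUM_COMMAND = "yum -y install kernel-devel-$(uname -r)"

# Distribution IDs mapped to the command from the troubleshooting steps.
HEADER_COMMANDS = {
    "debian": APT_COMMAND,
    "ubuntu": APT_COMMAND,
    "rhel": YUM_COMMAND,
    "centos": YUM_COMMAND,
    "fedora": YUM_COMMAND,
}


def parse_os_release(text):
    """Parse KEY=value lines from os-release content into a dict."""
    info = {}
    for line in text.splitlines():
        key, separator, value = line.partition("=")
        if separator:
            info[key.strip()] = value.strip().strip('"')
    return info


def kernel_header_command(os_release_text):
    """Return the header install command for this distribution, or None."""
    info = parse_os_release(os_release_text)
    # Check ID first, then fall back to the related distributions in ID_LIKE.
    candidates = [info.get("ID", "")] + info.get("ID_LIKE", "").split()
    for distro in candidates:
        if distro in HEADER_COMMANDS:
            return HEADER_COMMANDS[distro]
    return None


if __name__ == "__main__":
    # Hypothetical os-release content for an Ubuntu server.
    print(kernel_header_command('NAME="Ubuntu"\nID=ubuntu\nID_LIKE=debian\n'))
```

The sketch only selects a command; it does not run it, and a distribution that is not in the mapping returns None so that you can decide manually.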
