Collecting GPU data using NVIDIA dcgm-exporter scripts
GPU metric collection and import rely on the following components:
- A Process Runner system task that executes the dcgm-exporter collection and aggregation scripts.
- A Generic – CSV Parser ETL that imports the generated CSV files into BHCO.
Before you begin
Before you start collecting GPU metrics, ensure that the following requirements are met:
- dcgm-exporter is installed and running on all GPU hosts.
- The Remote ETL Engine (REE) must have network access to the GPU endpoints.
- The REE must have bash, curl, and jq installed.
A text file that lists all dcgm-exporter endpoints and their corresponding polling intervals is created. Example:
http://gpu-host1:9400/metrics,30
http://gpu-host2:9400/metrics,30
http://gpu-host3:9400/metrics,10
Scripts and usage
Use the following scripts to collect, aggregate, and prepare GPU metrics for import into BHCO:
| Script | Purpose | When to use | How to run |
|---|---|---|---|
| dcgm-integration.sh | Wrapper that automates collection and aggregation | For scheduled, repeatable execution | $ ./dcgm-integration.sh -i <path_to_endpoint_file> |
| dcgm-collector.sh | Polls GPU endpoints and stores raw JSON | Always as the first step | $ ./dcgm-collector.sh -l <dcgm-exporter-endpoint> -p <dcgm-exporter polling interval> -d <duration in seconds> |
| dcgm-aggregator.sh | Aggregates JSON files into CSV | After collection, before ETL import | $ ./dcgm-aggregator.sh |
| include/metrics_mapping.sh | Maps dcgm metrics to BHCO metrics | Always used by the integration script | Auto-included |
| include/tools.sh | Checks and installs jq if missing | Always used by the integration script | Auto-included |
To collect and process GPU metrics using the dcgm-exporter scripts
The dcgm-integration.sh script reads the endpoint file and launches dcgm-collector.sh for each GPU host.
- The dcgm-collector.sh script polls each dcgm-exporter endpoint at the configured interval and stores the raw JSON files in the <BCO_HOME>/etl/scripts/nvidia-dcgm/dcgm-data/<GPU host>_<port> directory.
- dcgm-aggregator.sh processes the raw JSON files, generates CSV files in the <BCO_HOME>/etl/scripts/nvidia-dcgm/metrics-etl directory, and removes the processed JSON files. For details on available GPU metrics and their statistics, see GPU Metrics in NVIDIA Summary Datamart.
- All log files are stored in <BCO_HOME>/etl/scripts/nvidia-dcgm/ for review and troubleshooting.
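For troubleshooting, the same flow can be exercised manually by running the individual scripts on the REE. The following is a minimal sketch based on the options listed in the table above; the 300-second duration and the single endpoint are examples only.

```bash
# Example manual run from the script directory on the REE.
cd <BCO_HOME>/etl/scripts/nvidia-dcgm

# Poll one dcgm-exporter endpoint every 30 seconds for a 300-second window
# (duration value is an example only). Raw JSON is stored under
# dcgm-data/<GPU host>_<port>.
./dcgm-collector.sh -l http://gpu-host1:9400/metrics -p 30 -d 300

# Aggregate the collected JSON into CSV files under metrics-etl and
# remove the processed JSON files.
./dcgm-aggregator.sh
```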
How the GPU collection scripts work
The GPU collection scripts run only on the Remote ETL Engine (REE). They do not execute any commands on the GPU hosts.
The REE collects GPU metrics by sending HTTP requests to the dcgm-exporter /metrics endpoint exposed on each GPU-enabled host. The scripts use curl to retrieve the metrics and process the response locally on the REE.
- No remote login is required.
- No username or password is used.
- No SSH access or key exchange is needed.
The only requirement is network connectivity from the REE to the dcgm-exporter endpoint (for example, port 9400 over HTTP).
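Conceptually, each poll is nothing more than an HTTP GET issued from the REE and processed locally. The following sketch illustrates that model only; it is not the actual logic of dcgm-collector.sh, and the endpoint and output directory are placeholders.

```bash
#!/usr/bin/env bash
# Illustration of the agentless pull model: fetch the Prometheus-format metrics
# over plain HTTP and store them locally on the REE. No SSH, no credentials.
ENDPOINT="http://gpu-host1:9400/metrics"      # example dcgm-exporter endpoint
OUTDIR="./dcgm-data/gpu-host1_9400"           # placeholder output directory
mkdir -p "$OUTDIR"
curl -sf "$ENDPOINT" > "$OUTDIR/metrics_$(date +%Y%m%d%H%M%S).txt" \
  || echo "WARN: $ENDPOINT is not reachable" >&2
```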
How to collect and import GPU metrics using NVIDIA dcgm-exporter
Follow these steps to collect GPU metrics from NVIDIA GPU–enabled hosts and import them into BMC Helix Continuous Optimization:
Ensure prerequisites are met
- dcgm-exporter is installed and running on all GPU-enabled hosts.
- The Remote ETL Engine (REE) has network access to the GPU endpoints.
- Bash, curl, and jq are installed on the REE.
Create an endpoint file on the REE
List all dcgm-exporter endpoints along with their polling intervals in a text file. Example:
http://gpu-host1:9400/metrics,30
http://gpu-host2:9400/metrics,30
http://gpu-host3:9400/metrics,10
Create a process runner script on the REE
The script should execute the dcgm-integration.sh script using the endpoint file:
./dcgm-integration.sh -i <path_to_endpoint_file>
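A minimal Process Runner wrapper might look like the following sketch; the installation path and the endpoint file name are examples and must be adapted to your environment.

```bash
#!/usr/bin/env bash
# Process Runner wrapper: run the dcgm integration script with the endpoint file.
# Both paths below are examples; replace them with your actual locations.
SCRIPT_DIR="<BCO_HOME>/etl/scripts/nvidia-dcgm"
ENDPOINT_FILE="${SCRIPT_DIR}/endpoints.txt"

cd "$SCRIPT_DIR" || exit 1
./dcgm-integration.sh -i "$ENDPOINT_FILE"
```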
Schedule the process runner script as a System Task
- Configure the task to run at the desired frequency (every 5, 15, or 60 minutes), preferably starting at the top of the hour.
- This ensures metrics are collected consistently for accurate reporting.
Create and schedule a Generic – CSV Parser ETL
Configure the ETL to import the generated CSV files from:
<BCO_HOME>/etl/scripts/nvidia-dcgm/metrics-etl
- Enable the option to append a .done suffix once a CSV file is processed.
- Schedule the ETL to run a few minutes after the System Task to ensure CSV files are ready for import.
To test a GPU endpoint manually
- Open a terminal on a system that has network access to the GPU host.
- Run the following command:
$ curl -s http://<gpu-host>:9400/metrics
- Replace <gpu-host> with the hostname or IP address of your GPU server.
The command returns the raw metrics exposed by dcgm-exporter, which helps verify that the endpoint is reachable and exporting data correctly.
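To reduce that output to a quick pass/fail check, you can count the returned metric samples; dcgm-exporter metric names start with the DCGM_ prefix, so a count of zero points to a connectivity or exporter problem.

```bash
# Count DCGM metric samples returned by the endpoint; 0 indicates a problem.
curl -sf http://<gpu-host>:9400/metrics | grep -c '^DCGM_'
```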
To create and schedule a system task
- Select Administration > ETL & System Tasks > System Tasks.
- On the System Tasks page, select Add > Add process runner task.
  The Add task page displays the configuration properties.
- Create the Process Runner task on the Remote ETL Engine (REE) and configure the following settings:
| Property | Description |
|---|---|
| Name | The name of the system task or ETL job. Helps identify the task in the scheduler or task list. |
| Description | Optional field to provide details about the task’s purpose or functionality. |
| Maximum execution time before warning | The maximum time the task is allowed to run before a warning is triggered. Example: 4 hours. |
| Frequency | Determines how often the task runs. Can be Predefined (for example, Each Day) or Custom (a specific number of days or hours). |
| Predefined frequency | If you use a predefined option, select from choices such as Each Day or Each Week. |
| Start timestamp | The exact time the task starts: Hour (0–23), Minute (0–59), Week day (if applicable), and Month day (if applicable). |
| Custom frequency | If you use a custom schedule, define the interval; for example, every 1 day. |
| Custom start timestamp | The date and time when the task should first run; for example, 09/10/2025 08:47. |
| Running on scheduler | Specifies the Remote ETL Engine (REE) or scheduler node where the task runs. |
- Click Save and schedule the task.
For more information, see Maintaining System tasks.
ETL setup
To import GPU metrics collected by the dcgm-exporter scripts, you need to configure an ETL. For detailed step-by-step instructions on creating a Generic CSV Parser ETL, see Generic - CSV file parser.
GPU-specific ETL settings:
| Property | Description |
|---|---|
| Type | Select Generic – CSV Parser. |
| File location | <BCO_HOME>/etl/scripts/nvidia-dcgm/metrics-etl – the directory where the scripts generate CSV files. |
| Append suffix to parsed file | Enable the property to avoid overwriting previously imported CSV files. |
| Entity catalog | Select or create a catalog for GPU entities. |
| Collection level | Define data granularity (per GPU, per host, or per cluster). |
| Object relationships | Map GPU metrics to their parent systems or clusters. |
| Percentage format | Set to 0–100 for GPU metrics expressed as percentages. |

To verify GPU metrics in BMC Helix Continuous Optimization
- Check ETL logs – Ensure that CSV files were parsed and imported successfully.
- Verify GPU metrics – Navigate to Workspace > Systems to confirm that GPU metrics are visible.
- Confirm updates – Make sure that the metrics refresh according to the scheduled frequency.
For a visual representation and analysis of the imported data, see Servers GPU Views.
For detailed metric definitions, see GPU Metrics in NVIDIA Summary Datamart.
Best practices
- Always run dcgm-integration.sh via a System Task rather than manually to ensure consistent data collection.
- Update /etc/dcgm-exporter/default-counters.csv if you need to collect additional GPU metrics.
- Keep your endpoint file current whenever GPU hosts are added or removed.
- Regularly monitor logs to troubleshoot any issues such as network errors, missing tools (like jq), or misconfigured endpoints.
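For example, a quick scan of the collection logs for problems might look like the following; the *.log file name pattern is an assumption about how the scripts name their log files.

```bash
# Scan the dcgm script logs on the REE for recent problems (log file naming is assumed).
grep -iE 'error|fail|warn' <BCO_HOME>/etl/scripts/nvidia-dcgm/*.log | tail -n 50
```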