GPU Metrics In NVIDIA Summary Datamart
The NVIDIA Summary Datamart stores aggregated GPU metrics collected through the dcgm-exporter integration. The metrics belong to the SYSGPU dataset and support monitoring, performance analysis, and capacity planning. They are imported from CSV files generated by the scripts described in Collecting GPU Data Using NVIDIA dcgm-exporter.
Metrics List
| Metric | Statistic |
|---|---|
| BYGPU_CORRECTABLE_REMAPPED_ROWS | SUM |
| BYGPU_GPU_DECODER_UTIL | PCT95 |
| BYGPU_GPU_ENCODER_UTIL | PCT95 |
| BYGPU_GPU_MAX_OP_TEMP | LAST_VALUE |
| BYGPU_GPU_TEMP | MAX |
| BYGPU_GPU_UTIL | PCT95 |
| BYGPU_INDEX | VALUE |
| BYGPU_MEM_CLOCK_MHZ | PCT95 |
| BYGPU_MEM_COPY_UTIL | PCT95 |
| BYGPU_MEM_FREE | MIN |
| BYGPU_MEM_MAX_OP_TEMP | MAX |
| BYGPU_MEM_RESERVED | PCT95 |
| BYGPU_MEM_TEMP | MAX |
| BYGPU_MEM_USED | PCT95 |
| BYGPU_MEM_UTIL | PCT95 |
| BYGPU_MODEL | VALUE |
| BYGPU_NAME | None |
| BYGPU_PCIE_RETRIES | SUM |
| BYGPU_PWR_UTIL | AVG |
| BYGPU_ROW_REMAP_FAILURE | AVG |
| BYGPU_SHUTDOWN_TEMP | LAST_VALUE |
| BYGPU_SLOWDOWN_TEMP | LAST_VALUE |
| BYGPU_SM_CLOCK_MHZ | PCT95 |
| BYGPU_TOTAL_NVLINK_BANDWIDTH | SUM |
| BYGPU_TOTAL_REAL_MEM | LAST_VALUE |
| BYGPU_UNCORRECTABLE_REMAPPED_ROWS | SUM |
| BYGPU_UUID | VALUE |
| BYGPU_VGPU_LICENSE_STATUS | AVG |
| BYGPU_XID_ERROR | AVG |
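
To illustrate how the statistics in the table reduce a series of raw samples to a single summarized value, the following Python sketch applies each numeric statistic to a list of samples. It is an illustration only, not the datamart's implementation: the sample values are hypothetical, the `summarize` and `pct95` helpers are invented for this example, and the nearest-rank percentile method is an assumption, because the source does not state how PCT95 is computed. The VALUE and None entries in the table are not covered.

```python
def pct95(samples):
    """Return the 95th percentile using the nearest-rank method.

    Assumption: the datamart may use a different percentile definition.
    """
    ordered = sorted(samples)
    # Nearest rank = ceil(0.95 * n), computed with integers to avoid float error.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Statistic names come from the table; the implementations are illustrative.
STATISTICS = {
    "SUM": sum,
    "AVG": lambda s: sum(s) / len(s),
    "MIN": min,
    "MAX": max,
    "PCT95": pct95,
    "LAST_VALUE": lambda s: s[-1],  # samples must be in time order
}


def summarize(statistic, samples):
    """Apply one of the table's numeric statistics to a list of samples."""
    if not samples:
        raise ValueError("no samples to summarize")
    return STATISTICS[statistic](samples)


# Hypothetical GPU utilization samples (percent), in time order.
gpu_util = [12, 35, 48, 97, 60, 55, 41, 38, 22, 15,
            18, 64, 71, 80, 93, 88, 76, 52, 33, 27]

if __name__ == "__main__":
    for name in ("PCT95", "MAX", "MIN", "LAST_VALUE"):
        print(name, summarize(name, gpu_util))
```

With these samples, PCT95 ignores the single 97% spike that MAX reports, which is why a percentile is a common choice for utilization metrics while MAX suits temperature metrics, where the peak itself is the value of interest.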