Servers GPU views
The Servers GPU table lists all available GPUs in the selected server or domain. Each row corresponds to an individual GPU and shows the following information. By default, some columns are hidden. You can show or hide columns by using the action menu that is located next to the GPU table title. You can sort table columns alphabetically or by size. For more information, see Managing charts and tables in Views.
To view the details of a specific GPU, click the name of the GPU in the GPU Metrics table. The following details are displayed on the GPU Metrics Details page:
Column | Description | Statistical Type | Metric |
---|---|---|---|
Server | Name of the physical server hosting the GPU. | Last value | NAME |
GPU | Name or identifier of the GPU (for example, nvidia0 or nvidia1). | Last value | BYGPU_NAME |
GPU Utilization [%] | Percentage of GPU utilization, indicating how much of the GPU’s processing capacity is used. | 95th Percentile | BYGPU_GPU_UTIL |
Memory Utilization [%] | Percentage of total GPU memory currently used. | 95th Percentile | BYGPU_MEM_UTIL |
Power Utilization [kW] | Average power consumption of the GPU during operation. | 95th Percentile | BYGPU_PWR_UTIL |
GPU Temperature [°C] | Average core temperature of the GPU in degrees Celsius. | 95th Percentile | BYGPU_GPU_TEMP |
GPU Encoder Utilization [%] | Percentage of GPU encoder engine usage during media or video processing tasks. | 95th Percentile | BYGPU_GPU_ENCODER_UTIL |
GPU Decoder Utilization [%] | Percentage of GPU decoder engine usage during media or video playback tasks. | 95th Percentile | BYGPU_GPU_DECODER_UTIL |
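
Most columns in this table use the 95th Percentile statistical type, which reports the value that 95% of the samples in the period do not exceed. The following Python sketch illustrates the idea with the nearest-rank method. The percentile method that the product uses is not stated here, so the method, the function name `percentile_nearest_rank`, and the sample values are assumptions for illustration only.

```python
import math


def percentile_nearest_rank(samples, pct):
    """Return the pct-th percentile of samples (nearest-rank method).

    Assumption: nearest-rank is used only for illustration; the product
    may compute percentiles differently (for example, with interpolation).
    """
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples)
    # 1-based rank of the smallest value that covers pct percent of samples.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical BYGPU_GPU_UTIL samples (percent) for one GPU.
gpu_util_samples = [12, 15, 18, 20, 22, 25, 30, 35, 40, 97]
p95 = percentile_nearest_rank(gpu_util_samples, 95)
print(f"95th percentile GPU utilization: {p95}%")  # 97 for this sample set
```

With only ten samples, a single spike determines the result. Over longer periods with more samples, the 95th percentile filters out short-lived peaks that a maximum would report.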
Detailed information about a metric
To view the details of a specific GPU, click the name of the GPU in the GPU Metrics table. The metrics are categorized by profile, and each profile is displayed on a separate tab. Click a tab to view the charts for that profile. The following details are displayed on the GPU Details page:
Overview Profile
Chart title | Displayed values | Associated metric |
---|---|---|
GPU Utilization (%) | Percentage of GPU utilization per GPU | BYGPU_GPU_UTIL |
Memory Utilization (%) | GPU memory utilization percentage | BYGPU_MEM_UTIL |
Power Utilization (kW) | Power consumption of each GPU | BYGPU_PWR_UTIL |
GPU Temperature (°C) | Temperature of GPU cores | BYGPU_GPU_TEMP |
GPU Encoder Utilization (%) | Utilization percentage of the GPU encoder engine | BYGPU_GPU_ENCODER_UTIL |
GPU Decoder Utilization (%) | Utilization percentage of the GPU decoder engine | BYGPU_GPU_DECODER_UTIL |
Notes:
- Each chart displays utilization trends for all available GPUs (e.g., nvidia0, nvidia1, nvidia2, nvidia3).
- Hover over data points in the chart to view exact values for specific time intervals.
Memory Profile
Chart title | Displayed values | Associated metric |
---|---|---|
Total Real Memory (bytes) | Total physical memory available on each GPU | BYGPU_TOTAL_REAL_MEM |
Memory Free (bytes) | Unused GPU memory capacity | BYGPU_MEM_FREE |
Memory Reserved (bytes) | Memory reserved for GPU operations | BYGPU_MEM_RESERVED |
Memory Utilization (%) | Percentage of GPU memory currently used | BYGPU_MEM_UTIL |
Memory Used (bytes) | Total GPU memory in use | BYGPU_MEM_USED |
Notes:
- Memory charts display per-GPU data, allowing comparison across GPUs.
- When charts are stacked, the sum of Memory Free, Memory Reserved, and Memory Used equals Total Real Memory.
- Trend lines can be viewed for individual GPUs such as nvidia0, nvidia1, nvidia2, and nvidia3.
- Data can be displayed using fixed or custom time periods for detailed trend analysis.
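
The stacking relationship in the notes above can be checked against exported values. The following Python sketch assumes that the four memory metrics are available as byte counts for the same GPU and timestamp. The class name `GpuMemorySample`, its field names, and the sample values are hypothetical and are not part of the product.

```python
from dataclasses import dataclass

GIB = 1024 ** 3  # bytes in one gibibyte


@dataclass
class GpuMemorySample:
    """One hypothetical per-GPU memory sample, with all values in bytes."""
    total_real_mem: int  # BYGPU_TOTAL_REAL_MEM
    mem_free: int        # BYGPU_MEM_FREE
    mem_reserved: int    # BYGPU_MEM_RESERVED
    mem_used: int        # BYGPU_MEM_USED

    def is_consistent(self, tolerance_bytes=0):
        """Check that free + reserved + used accounts for the total."""
        accounted = self.mem_free + self.mem_reserved + self.mem_used
        return abs(self.total_real_mem - accounted) <= tolerance_bytes


# Illustrative values for a GPU with 16 GiB of memory.
sample = GpuMemorySample(
    total_real_mem=16 * GIB,
    mem_free=10 * GIB,
    mem_reserved=1 * GIB,
    mem_used=5 * GIB,
)
print(sample.is_consistent())  # True for these values
```

The `tolerance_bytes` parameter is an assumption that allows for small differences if the metrics are sampled at slightly different moments.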
Temperature Profile
Chart title | Displayed values | Associated metric |
---|---|---|
GPU Temperature (°C) | Average temperature per GPU | BYGPU_GPU_TEMP |
Memory Temperature (°C) | Average temperature of GPU memory modules | BYGPU_MEM_TEMP |
Temperature Thresholds
Column | Description | Associated metric |
---|---|---|
GPU | Name or identifier of the GPU (for example, nvidia0 or nvidia1). | BYGPU_NAME |
Slowdown temperature (°C) | The temperature at which the GPU starts to throttle or reduce performance to prevent overheating. | BYGPU_SLOWDOWN_TEMP |
Shutdown temperature (°C) | The maximum temperature limit that triggers an automatic shutdown to protect the hardware from damage. | BYGPU_SHUTDOWN_TEMP |
Maximum operating temperature (°C) | The highest recommended core temperature for continuous and stable GPU operation. | BYGPU_GPU_MAX_OP_TEMP |
Maximum memory operating temperature (°C) | The highest safe operating temperature for the GPU’s memory modules. | BYGPU_MEM_MAX_OP_TEMP |
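
To show how the thresholds in this table relate to a measured GPU temperature, the following Python sketch classifies a reading against them. It assumes that the maximum operating temperature is below the slowdown temperature, which is below the shutdown temperature, and that the slowdown and shutdown thresholds are inclusive. The function name `classify_gpu_temperature` and all threshold values are hypothetical and are not vendor specifications.

```python
def classify_gpu_temperature(temp_c, max_op_c, slowdown_c, shutdown_c):
    """Classify a GPU core temperature (°C) against its thresholds.

    Assumptions: max_op_c < slowdown_c < shutdown_c, and a reading equal
    to the slowdown or shutdown threshold counts as reaching it.
    """
    if not max_op_c < slowdown_c < shutdown_c:
        raise ValueError("expected max_op_c < slowdown_c < shutdown_c")
    if temp_c >= shutdown_c:
        return "shutdown"
    if temp_c >= slowdown_c:
        return "slowdown"
    if temp_c > max_op_c:
        return "above maximum operating"
    return "normal"


# Illustrative thresholds only; read the real values from
# BYGPU_GPU_MAX_OP_TEMP, BYGPU_SLOWDOWN_TEMP, and BYGPU_SHUTDOWN_TEMP.
thresholds = {"max_op_c": 83, "slowdown_c": 90, "shutdown_c": 95}

for reading in (70, 85, 91, 95):
    print(reading, classify_gpu_temperature(reading, **thresholds))
```

The same comparison can be applied to BYGPU_MEM_TEMP and BYGPU_MEM_MAX_OP_TEMP for GPU memory modules.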
For information about how GPU metrics are collected and imported into BMC Helix Continuous Optimization, see Collecting GPU Data Using NVIDIA dcgm-exporter Scripts.