Servers GPU views


The Servers GPU view displays GPU-specific performance metrics for fixed and custom time periods.

These views enable you to analyze GPU resource utilization across key metrics, including GPU utilization, memory usage, power consumption, temperature, and encoder/decoder utilization.

To access the Servers GPU view in the Views tab, select Views > Servers > GPU in the navigation panel.

Information

The NVIDIA GPU Summary view package is included out of the box (OOTB), but it is not installed by default. You must install it manually to access the GPU views.

 

The Servers GPU table lists all available GPUs in the selected server or domain. Each row corresponds to an individual GPU and shows the following information. By default, some columns are hidden. To show or hide columns, use the action menu that is located next to the GPU table title. You can sort table columns alphabetically or by size. For more information, see Managing charts and tables in Views.

To view details of the required GPU, click the name of the GPU in the GPU Metrics table. The following details are displayed on the GPU Metrics Details page:

| Column | Description | Statistical Type | Metric |
| --- | --- | --- | --- |
| Server | Name of the physical server hosting the GPU. | Last value | NAME |
| GPU | Name or identifier of the GPU (for example, nvidia0, nvidia1, etc.). | Last value | BYGPU_NAME |
| GPU Utilization [%] | Percentage of GPU utilization, indicating how much of the GPU’s processing capacity is used. | 95th Percentile | BYGPU_GPU_UTIL |
| Memory Utilization [%] | Percentage of total GPU memory currently used. | 95th Percentile | BYGPU_MEM_UTIL |
| Power Utilization [kW] | Average power consumption of the GPU during operation. | 95th Percentile | BYGPU_PWR_UTIL |
| GPU Temperature [°C] | Average core temperature of the GPU in degrees Celsius. | 95th Percentile | BYGPU_GPU_TEMP |
| GPU Encoder Utilization [%] | Percentage of GPU encoder engine usage during media or video processing tasks. | 95th Percentile | BYGPU_GPU_ENCODER_UTIL |
| GPU Decoder Utilization [%] | Percentage of GPU decoder engine usage during media or video playback tasks. | 95th Percentile | BYGPU_GPU_DECODER_UTIL |
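
Most columns in the table report a 95th percentile over the selected time period. As a minimal sketch of what that statistic means, the following Python computes a 95th percentile with the nearest-rank method. The function name, the sample values, and the nearest-rank choice are assumptions for illustration only; this page does not state which percentile method the product applies.

```python
import math


def percentile_nearest_rank(samples, pct):
    """Return the pct-th percentile of samples using the nearest-rank method.

    Assumption: nearest-rank is used here only for illustration; the
    product's actual percentile method is not documented on this page.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples)
    # 1-based rank of the smallest value that covers pct percent of samples.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical GPU utilization samples (%) for a single GPU.
gpu_util_samples = [12.0, 35.5, 40.0, 41.2, 38.9, 97.0, 44.1, 39.8, 42.3, 43.0]
p95 = percentile_nearest_rank(gpu_util_samples, 95)
print(f"95th percentile GPU utilization: {p95}%")
```

A percentile is less sensitive to isolated readings than a maximum once the period contains enough samples, which is one plausible reason for reporting it in a capacity view.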

Detailed information about a metric

To view details of the required GPU, click the name of the GPU in the GPU metric table. The metrics are categorized by profile, and each profile is shown on a separate tab. Click a tab to view the charts for that profile. The following details are displayed on the GPU Details page:

Overview Profile

| Chart title | Displayed values | Associated metric |
| --- | --- | --- |
| GPU Utilization (%) | Percentage of GPU utilization per GPU | BYGPU_GPU_UTIL |
| Memory Utilization (%) | GPU memory utilization percentage | BYGPU_MEM_UTIL |
| Power Utilization (kW) | Power consumption of each GPU | BYGPU_PWR_UTIL |
| GPU Temperature (°C) | Temperature of GPU cores | BYGPU_GPU_TEMP |
| GPU Encoder Utilization (%) | Utilization percentage of the GPU encoder engine | BYGPU_GPU_ENCODER_UTIL |
| GPU Decoder Utilization (%) | Utilization percentage of the GPU decoder engine | BYGPU_GPU_DECODER_UTIL |

Notes:

  • Each chart displays utilization trends for all available GPUs (e.g., nvidia0, nvidia1, nvidia2, nvidia3).
  • Hover over data points in the chart to view exact values for specific time intervals.

Memory Profile

| Chart title | Displayed values | Associated metric |
| --- | --- | --- |
| Total Real Memory (bytes) | Total physical memory available on each GPU | BYGPU_TOTAL_REAL_MEM |
| Memory Free (bytes) | Unused GPU memory capacity | BYGPU_MEM_FREE |
| Memory Reserved (bytes) | Memory reserved for GPU operations | BYGPU_MEM_RESERVED |
| Memory Utilization (%) | Percentage of GPU memory currently used | BYGPU_MEM_UTIL |
| Memory Used (bytes) | Total GPU memory in use | BYGPU_MEM_USED |

Notes:

  • Memory charts display per-GPU data, allowing comparison across GPUs.
  • When charts are stacked, Free Memory + Reserved Memory + Used Memory equals Total Real Memory.
  • Trend lines can be viewed for individual GPUs such as nvidia0, nvidia1, nvidia2, and nvidia3.
  • Data can be displayed using fixed or custom time periods for detailed trend analysis.
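
The stacking note above amounts to simple arithmetic on the memory metrics. The following Python is a minimal sketch that checks the identity for one GPU and derives a utilization percentage. The function names and sample byte counts are hypothetical, and computing utilization as used memory divided by total memory is an assumption based on the chart description, not a documented product formula.

```python
GIB = 1024 ** 3  # bytes in one gibibyte


def stacked_memory_matches_total(free_bytes, reserved_bytes, used_bytes, total_bytes):
    """Check that Free + Reserved + Used equals Total Real Memory."""
    return free_bytes + reserved_bytes + used_bytes == total_bytes


def memory_utilization_pct(used_bytes, total_bytes):
    """Assumed formula: used memory as a percentage of total memory."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * used_bytes / total_bytes


# Hypothetical sample for one GPU (for example, nvidia0), in bytes.
sample = {
    "BYGPU_TOTAL_REAL_MEM": 16 * GIB,
    "BYGPU_MEM_FREE": 11 * GIB,
    "BYGPU_MEM_RESERVED": 1 * GIB,
    "BYGPU_MEM_USED": 4 * GIB,
}

consistent = stacked_memory_matches_total(
    sample["BYGPU_MEM_FREE"],
    sample["BYGPU_MEM_RESERVED"],
    sample["BYGPU_MEM_USED"],
    sample["BYGPU_TOTAL_REAL_MEM"],
)
utilization = memory_utilization_pct(
    sample["BYGPU_MEM_USED"], sample["BYGPU_TOTAL_REAL_MEM"]
)
print(f"stack consistent: {consistent}, utilization: {utilization:.1f}%")
```

If the check fails for exported data, the samples for the three components were probably taken at slightly different times, so a small tolerance may be more practical than strict equality.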

Temperature Profile

| Chart title | Displayed values | Associated metric |
| --- | --- | --- |
| GPU Temperature (°C) | Average temperature per GPU | BYGPU_GPU_TEMP |
| Memory Temperature (°C) | Average temperature of GPU memory modules | BYGPU_MEM_TEMP |

Temperature Thresholds

| Column | Description | Associated metric |
| --- | --- | --- |
| GPU | Name or identifier of the GPU (for example, nvidia0, nvidia1, etc.). | BYGPU_NAME |
| Slowdown temperature (°C) | The temperature at which the GPU starts to throttle or reduce performance to prevent overheating. | BYGPU_SLOWDOWN_TEMP |
| Shutdown temperature (°C) | The maximum temperature limit that triggers an automatic shutdown to protect the hardware from damage. | BYGPU_SHUTDOWN_TEMP |
| Maximum operating temperature (°C) | The highest recommended core temperature for continuous and stable GPU operation. | BYGPU_GPU_MAX_OP_TEMP |
| Maximum memory operating temperature (°C) | The highest safe operating temperature for the GPU’s memory modules. | BYGPU_MEM_MAX_OP_TEMP |
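
To illustrate how the thresholds relate to a GPU temperature reading, the following Python is a minimal sketch that classifies a core temperature against the three core thresholds. The class, function, and status labels are hypothetical, the threshold values are invented sample numbers, and the exact comparison rules (for example, whether a reading equal to a threshold counts as reaching it) are assumptions rather than documented product behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TemperatureThresholds:
    """Core temperature thresholds for one GPU, in degrees Celsius."""
    max_operating_c: float  # corresponds to BYGPU_GPU_MAX_OP_TEMP
    slowdown_c: float       # corresponds to BYGPU_SLOWDOWN_TEMP
    shutdown_c: float       # corresponds to BYGPU_SHUTDOWN_TEMP


def classify_gpu_temperature(temp_c, thresholds):
    """Return a hypothetical status label for a core temperature reading."""
    # Check the most severe threshold first.
    if temp_c >= thresholds.shutdown_c:
        return "shutdown"
    if temp_c >= thresholds.slowdown_c:
        return "slowdown"
    if temp_c > thresholds.max_operating_c:
        return "above maximum operating"
    return "normal"


# Invented sample thresholds; real values come from the Temperature Thresholds table.
limits = TemperatureThresholds(max_operating_c=83.0, slowdown_c=90.0, shutdown_c=95.0)

for reading in (65.0, 86.0, 91.5, 96.0):
    print(f"{reading} °C -> {classify_gpu_temperature(reading, limits)}")
```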

For information about how GPU metrics are collected and imported into BMC Helix Continuous Optimization, see Collecting GPU Data Using NVIDIA dcgm-exporter Scripts.

 


BMC Helix Continuous Optimization 25.4