LLM Usage and Cost dashboard


Large language models (LLMs) are a key component of AI-powered applications, so understanding the costs associated with their usage is important: it lets you optimize resources and manage budgets efficiently. Most LLMs use token-based pricing, where the cost depends on the number of tokens exchanged between the model and the user (prompt and completion tokens).
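As an illustration, token-based cost is simply tokens multiplied by a per-token rate. The sketch below uses hypothetical per-1,000-token rates, not actual provider pricing:

```python
# Illustrative token-based cost calculation.
# The per-1,000-token rates are hypothetical examples,
# not actual provider pricing.

def usage_cost(prompt_tokens: int, completion_tokens: int,
               prompt_rate_per_1k: float, completion_rate_per_1k: float) -> float:
    """Return the cost of one LLM call in dollars."""
    return ((prompt_tokens / 1000) * prompt_rate_per_1k
            + (completion_tokens / 1000) * completion_rate_per_1k)

# Example: 1,200 prompt tokens and 300 completion tokens
# at $0.01 / $0.03 per 1,000 tokens.
cost = usage_cost(1200, 300, 0.01, 0.03)
print(f"${cost:.4f}")  # $0.0210
```

Prompt and completion tokens are priced separately here because many providers charge different rates for input and output.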

The LLM Usage and Cost dashboard helps you track the usage and cost of LLM models. It provides metrics to analyze the following parameters:

  • Token usage and cost
  • Retrieval Augmented Generation (RAG) latency and relevance score
  • Graphics processing unit (GPU) usage
  • Tabular representation of the top five agents with token usage and associated cost

Before you begin

Make sure that your LLM applications are instrumented with OpenTelemetry or OpenLLMetry so that they emit the trace and metric data required for analysis.
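For orientation, an instrumented LLM call typically records the model name and token counts as attributes on each span, which is where dashboard panels such as Total Tokens get their data. The attribute names below follow the OpenTelemetry GenAI semantic conventions; the record itself and its values are hypothetical, shown as a plain dict rather than an SDK call:

```python
# Hypothetical example of the attributes an instrumented LLM span
# might carry, following OpenTelemetry GenAI semantic conventions.
# This is a plain dict for illustration, not an SDK call.

span_attributes = {
    "gen_ai.request.model": "example-model",  # assumed model name
    "gen_ai.usage.input_tokens": 1200,        # prompt tokens
    "gen_ai.usage.output_tokens": 300,        # completion tokens
}

# A "Total Tokens" style panel aggregates input plus output tokens.
total_tokens = (span_attributes["gen_ai.usage.input_tokens"]
                + span_attributes["gen_ai.usage.output_tokens"])
print(total_tokens)  # 1500
```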


To view the dashboard

  1. From the navigation menu, click Dashboards.
  2. Search for the AIOps Observability folder and double-click it.
  3. Click LLM Usage and Cost.
    The dashboard is displayed.

    [Image: LLM usage_cost_dashboard.png]

Metrics in the LLM Usage and Cost dashboard

The dashboard provides the following metrics:

LLM Usage and Cost 

Monitor the following metrics to analyze token usage, token cost, and RAG performance.

  • Total Tokens: Displays the total number of tokens processed by the LLM during a given operation.
  • Cost Per Token: Displays the cost incurred per token for using the LLM during a given operation.
  • Total Cost: Displays the total cost incurred for using the LLM during a given operation.
  • LLM Latency: Displays the time the LLM takes to process a request and return a response.
  • Rag Documents Retrieved: Displays the number of documents retrieved by the RAG system while using the LLM.
  • Rag Latency: Displays the latency (response time) of the RAG system while using the LLM.
  • Rag Relevance Score: Displays a score that indicates how relevant the retrieved information is to the query in the RAG system.
  • Top 5 GenAI Models by Token Usage: Displays a bar chart of the top five models by token usage.
  • Latency Trend: Displays the trend, over a selected period, of the time the LLM takes to process a request and return a response.
  • Avg Token Consumption vs Avg Usage Cost: Displays a comparison of the average number of tokens consumed and the average cost of that token usage.
  • Rag Latency Trend: Displays the latency trend of the RAG system over the selected interval.
  • Total Token Usage: Displays how total token consumption varies across the selected time period.
  • Total Cost: Displays how the total cost varies across the selected time range.
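To make the averaged panels concrete, the sketch below computes average token consumption and average usage cost from a handful of call records. The numbers are invented for illustration only:

```python
# Hypothetical LLM call records: (tokens used, cost in dollars).
# Values are invented for illustration only.
calls = [(1500, 0.021), (800, 0.012), (2200, 0.035)]

total_tokens = sum(tokens for tokens, _ in calls)  # Total Token Usage
total_cost = sum(cost for _, cost in calls)        # Total Cost

avg_tokens = total_tokens / len(calls)  # Avg Token Consumption
avg_cost = total_cost / len(calls)      # Avg Usage Cost

print(avg_tokens, round(avg_cost, 6))  # 1500.0 0.022667
```

On the dashboard these aggregations are computed over the spans emitted by your instrumented applications, bucketed by the selected time range.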

LLM GPU Usage

Monitor the following metrics to analyze graphics processing unit (GPU) usage.

  • GPU Power Usage: Displays the power usage (in watts) of the GPU at a given moment.
  • GPU Temperature: Displays the temperature (in degrees Celsius) of the GPU.
  • GPU Memory Used: Displays the GPU memory (in MB) that is currently in use.
  • CPU Memory Utilization: Displays the percentage of CPU memory used for data transfers.
  • GPU Utilization: Displays the percentage of the GPU in use at a given moment. This metric indicates how much of the GPU's compute resources (cores and processing units) are being utilized for tasks such as computation, rendering, or machine learning operations.
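On NVIDIA hardware, metrics like these are typically collected with `nvidia-smi`, for example via a query such as `nvidia-smi --query-gpu=power.draw,temperature.gpu,memory.used,utilization.gpu --format=csv,noheader,nounits`. The sketch below parses one line of such output; the sample line is hypothetical, not captured from a real GPU:

```python
# Parse one line of nvidia-smi CSV output into the GPU metrics
# shown on the dashboard. The sample line is hypothetical.
sample = "87.50, 65, 4096, 72"  # power (W), temp (C), memory (MB), util (%)

power_w, temp_c, mem_mb, util_pct = (float(v) for v in sample.split(", "))

print(power_w, temp_c, mem_mb, util_pct)  # 87.5 65.0 4096.0 72.0
```

In practice, a collector or exporter runs this kind of query periodically and ships the values as metrics rather than parsing them by hand.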

Agents by Usage

Top 5 Agents by Token Usage

Displays the top five agents with the highest token usage and associated cost.
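The ranking behind this panel is a sum-and-sort over per-agent usage. A minimal sketch with invented agent names and numbers:

```python
from collections import defaultdict

# Hypothetical per-call usage records: (agent, tokens, cost).
records = [
    ("agent-a", 1200, 0.018), ("agent-b", 900, 0.014),
    ("agent-a", 600, 0.009),  ("agent-c", 2000, 0.030),
    ("agent-d", 300, 0.005),  ("agent-e", 150, 0.002),
    ("agent-f", 100, 0.001),
]

# Sum tokens and cost per agent.
totals = defaultdict(lambda: [0, 0.0])
for agent, tokens, cost in records:
    totals[agent][0] += tokens
    totals[agent][1] += cost

# Top five agents by total token usage, with associated cost.
top5 = sorted(totals.items(), key=lambda kv: kv[1][0], reverse=True)[:5]
for agent, (tokens, cost) in top5:
    print(agent, tokens, round(cost, 3))
```

Note that agent-f drops out of the result: only the five highest token consumers are shown, matching the dashboard's tabular panel.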

 


BMC Helix Dashboards 26.1