LLM Usage and Cost dashboard
Large language models (LLMs) are a key component of AI-powered applications, so understanding the costs associated with their usage is important: it lets you optimize resources and manage budgets efficiently. Most LLMs use token-based pricing, where the cost depends on the number of tokens exchanged between the model and the user.
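To illustrate how token-based pricing works, the sketch below computes the cost of a single request from its input and output token counts. The per-token rates are hypothetical placeholders, not actual provider prices; substitute the published rates for your model.

```python
# Sketch of token-based LLM pricing: total cost = input tokens * input rate
# + output tokens * output rate. The rates below are hypothetical
# placeholders; real providers publish their own per-token prices.

INPUT_RATE_PER_TOKEN = 0.03 / 1000    # hypothetical: $0.03 per 1K input tokens
OUTPUT_RATE_PER_TOKEN = 0.06 / 1000   # hypothetical: $0.06 per 1K output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the cost in dollars for a single LLM request."""
    return (input_tokens * INPUT_RATE_PER_TOKEN
            + output_tokens * OUTPUT_RATE_PER_TOKEN)

# A request with 1,200 prompt tokens and 350 completion tokens:
print(f"${request_cost(1200, 350):.4f}")
```

Dashboards such as this one aggregate exactly these per-request costs into totals and averages.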
The LLM Usage and Cost dashboard helps you track the usage and cost of LLM models. It provides metrics to analyze the following parameters:
- Token usage and cost
- Retrieval-Augmented Generation (RAG) latency and relevance score
- Graphics Processing Unit (GPU) usage
Before you begin
Make sure that your LLM applications are instrumented with OpenTelemetry or OpenLLMetry so that they send trace and metric data for analysis.
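As one possible setup, an instrumented application can export its telemetry to an OpenTelemetry collector through the standard OpenTelemetry environment variables. The service name and collector endpoint below are placeholders; replace them with your own values.

```shell
# Standard OpenTelemetry exporter configuration for an instrumented app.
# The service name and collector endpoint are placeholders.
export OTEL_SERVICE_NAME="my-llm-app"
export OTEL_EXPORTER_OTLP_ENDPOINT="http://otel-collector:4317"
```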
To view the dashboard
- From the navigation menu, click Dashboards.
- Search for the AIOps Observability folder and double-click it.
- Click LLM Usage and Cost.
The dashboard is displayed.
Metrics in the LLM Usage and Cost dashboard
The dashboard provides the following metrics:
LLM Usage and Cost
Monitor the following metrics to analyze token usage and cost, and RAG performance.
| Panel | Description |
|-------|-------------|
| Total Tokens | Displays the total number of tokens processed by the LLM during a given operation. |
| Cost Per Token | Displays the cost incurred per token for using the LLM during a given operation. |
| Total Cost | Displays the total cost incurred for using the LLM during a given operation. |
| Latency | Displays the time required by the LLM to process a request and return a response. |
| Rag Documents Retrieved | Displays the number of documents retrieved by the RAG system while using the LLM. |
| Rag Latency | Displays the latency (response time) of the RAG system while using the LLM. |
| Rag Relevance Score | Displays a relevance score that indicates how relevant the retrieved information is to the query in the RAG system. |
| Top 5 GenAI Models by Token Usage | Displays a bar chart of the top five models by token usage. |
| Latency Trend | Displays the latency trend of the LLM to process a request and return a response over the selected period. |
| Avg Token Consumption vs Avg Usage Cost | Displays a comparison of the average number of tokens consumed and the average cost of token usage. |
| Rag Latency Trend | Displays the latency trend of the RAG system while using the LLM over the selected interval. |
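The Avg Token Consumption vs Avg Usage Cost panel compares two averages computed over the same set of requests. A minimal sketch of that aggregation follows; the sample request data and the flat per-token rate are hypothetical, for illustration only.

```python
# Sketch of the aggregation behind "Avg Token Consumption vs Avg Usage Cost":
# average tokens per request vs. average dollar cost per request.
# The sample data and per-token rate are made up for illustration.

COST_PER_TOKEN = 0.00005  # hypothetical flat rate in dollars

requests_tokens = [1200, 800, 1500, 950]  # tokens processed per request

avg_tokens = sum(requests_tokens) / len(requests_tokens)
avg_cost = avg_tokens * COST_PER_TOKEN

print(f"Average tokens per request: {avg_tokens:.0f}")
print(f"Average cost per request: ${avg_cost:.4f}")
```

In practice the dashboard computes these averages from the token counts reported in the instrumented traces rather than from a static list.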
LLM GPU Usage
Monitor the following metrics to analyze the usage of the Graphics Processing Unit (GPU).
| Panel | Description |
|-------|-------------|
| GPU Power Usage | Displays the power usage (in watts) of the GPU at a given moment. |
| GPU Temperature | Displays the temperature (in degrees Celsius) of the GPU. |
| GPU Memory Used | Displays the GPU memory (in MB) that is currently in use. |
| CPU Memory Utilization | Displays the percentage of CPU memory that is used for data transfers. |
| GPU Utilization | Displays the percentage of GPU capacity in use at a given moment. This metric indicates how much of the GPU compute resources (cores and processing units) are being used for tasks such as computations, rendering, or machine learning operations. |
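The GPU panels report metrics of the kind exposed by NVIDIA's `nvidia-smi` tool. As a sketch, the function below parses one CSV line in the format produced by `nvidia-smi --query-gpu=power.draw,temperature.gpu,memory.used,utilization.gpu --format=csv,noheader,nounits`; the sample line is made up, and your collector may gather these values differently.

```python
# Parse one CSV line from:
#   nvidia-smi --query-gpu=power.draw,temperature.gpu,memory.used,utilization.gpu \
#              --format=csv,noheader,nounits
# Fields: power draw (W), temperature (C), memory used (MB), utilization (%).
# The sample line below is made up for illustration.

def parse_gpu_line(line: str) -> dict:
    power, temp, mem, util = (field.strip() for field in line.split(","))
    return {
        "power_watts": float(power),
        "temperature_c": int(temp),
        "memory_used_mb": int(mem),
        "utilization_pct": int(util),
    }

sample = "142.35, 61, 10240, 87"
print(parse_gpu_line(sample))
```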