LLM Usage and Cost dashboard
Large language models (LLMs) are a key component of AI-powered applications, so understanding the costs associated with their usage is important: it lets you optimize resources and manage budgets efficiently. Most LLMs use token-based pricing, where the cost depends on the number of tokens exchanged between the model and the user.
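To illustrate how token-based pricing works, the sketch below computes the cost of a single request from its input and output token counts. The per-token rates are hypothetical placeholders, not actual provider prices; substitute the published rates for your model.

```python
# Sketch of token-based LLM pricing: total cost = input tokens * input rate
# + output tokens * output rate. The rates below are hypothetical
# placeholders; real providers publish their own per-token prices.

INPUT_RATE_PER_TOKEN = 0.03 / 1000    # hypothetical: $0.03 per 1K input tokens
OUTPUT_RATE_PER_TOKEN = 0.06 / 1000   # hypothetical: $0.06 per 1K output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the cost in dollars for a single LLM request."""
    return (input_tokens * INPUT_RATE_PER_TOKEN
            + output_tokens * OUTPUT_RATE_PER_TOKEN)

# A request with 1,200 prompt tokens and 350 completion tokens:
print(f"${request_cost(1200, 350):.4f}")
```

Dashboards such as this one aggregate exactly these per-request costs into totals and averages.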
The LLM Usage and Cost dashboard helps you track the usage and cost of LLM models. It provides metrics to analyze the following parameters:
- Token usage and cost
- Retrieval-Augmented Generation (RAG) latency and relevance score
- Graphics Processing Unit (GPU) usage
Before you begin
Make sure that your LLM applications are instrumented with OpenTelemetry or OpenLLMetry so that they send trace and metric data for analysis.
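As one possible setup, an instrumented application can export its telemetry to an OpenTelemetry collector through the standard OpenTelemetry environment variables. The service name and collector endpoint below are placeholders; replace them with your own values.

```shell
# Standard OpenTelemetry exporter configuration for an instrumented app.
# The service name and collector endpoint are placeholders.
export OTEL_SERVICE_NAME="my-llm-app"
export OTEL_EXPORTER_OTLP_ENDPOINT="http://otel-collector:4317"
```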
To view the dashboard
- From the navigation menu, click Dashboards.
- Search for the AIOps Observability folder and double-click it.
- Click LLM Usage and Cost.
The dashboard is displayed.
Metrics in the LLM Usage and Cost dashboard
The dashboard provides the following metrics:
LLM Usage and Cost
Monitor the following metrics to analyze token usage and cost, and RAG performance.
| Panel | Description |
|-------|-------------|
| Total Tokens | Displays the total number of tokens processed by the LLM during a given operation. |
| Cost Per Token | Displays the cost incurred per token for using the LLM during a given operation. |
| Total Cost | Displays the total cost incurred for using the LLM during a given operation. |
| Latency | Displays the time required by the LLM to process a request and return a response. |
| Rag Documents Retrieved | Displays the number of documents retrieved by the RAG system while using the LLM. |
| Rag Latency | Displays the latency (response time) of the RAG system while using the LLM. |
| Rag Relevance Score | Displays a relevance score that indicates how relevant the retrieved information is to the query in the RAG system. |
| Top 5 GenAI Models by Token Usage | Displays a bar chart of the top five models by token usage. |
| Latency Trend | Displays the latency trend of the LLM to process a request and return a response over the selected period. |
| Avg Token Consumption vs Avg Usage Cost | Displays a comparison of the average number of tokens consumed and the average cost of token usage. |
| Rag Latency Trend | Displays the latency trend of the RAG system while using the LLM over the selected interval. |
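The Avg Token Consumption vs Avg Usage Cost panel compares two averages computed over the same set of requests. A minimal sketch of that aggregation follows; the sample request data and the flat per-token rate are hypothetical, for illustration only.

```python
# Sketch of the aggregation behind "Avg Token Consumption vs Avg Usage Cost":
# average tokens per request vs. average dollar cost per request.
# The sample data and per-token rate are made up for illustration.

COST_PER_TOKEN = 0.00005  # hypothetical flat rate in dollars

requests_tokens = [1200, 800, 1500, 950]  # tokens processed per request

avg_tokens = sum(requests_tokens) / len(requests_tokens)
avg_cost = avg_tokens * COST_PER_TOKEN

print(f"Average tokens per request: {avg_tokens:.0f}")
print(f"Average cost per request: ${avg_cost:.4f}")
```

In practice the dashboard computes these averages from the token counts reported in the instrumented traces rather than from a static list.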
LLM GPU Usage
Monitor the following metrics to analyze the usage of the Graphics Processing Unit (GPU).
| Panel | Description |
|-------|-------------|
| GPU Power Usage | Displays the power usage (in watts) of the GPU at a given moment. |
| GPU Temperature | Displays the temperature (in degrees Celsius) of the GPU. |
| GPU Memory Used | Displays the GPU memory (in MB) that is currently in use. |
| CPU Memory Utilization | Displays the percentage of CPU memory that is used for data transfers. |
| GPU Utilization | Displays the percentage of GPU capacity in use at a given moment. This metric indicates how much of the GPU compute resources (cores and processing units) are being used for tasks such as computations, rendering, or machine learning operations. |
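The GPU panels report metrics of the kind exposed by NVIDIA's `nvidia-smi` tool. As a sketch, the function below parses one CSV line in the format produced by `nvidia-smi --query-gpu=power.draw,temperature.gpu,memory.used,utilization.gpu --format=csv,noheader,nounits`; the sample line is made up, and your collector may gather these values differently.

```python
# Parse one CSV line from:
#   nvidia-smi --query-gpu=power.draw,temperature.gpu,memory.used,utilization.gpu \
#              --format=csv,noheader,nounits
# Fields: power draw (W), temperature (C), memory used (MB), utilization (%).
# The sample line below is made up for illustration.

def parse_gpu_line(line: str) -> dict:
    power, temp, mem, util = (field.strip() for field in line.split(","))
    return {
        "power_watts": float(power),
        "temperature_c": int(temp),
        "memory_used_mb": int(mem),
        "utilization_pct": int(util),
    }

sample = "142.35, 61, 10240, 87"
print(parse_gpu_line(sample))
```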