OTel Service Overview dashboard


OpenTelemetry is a vendor-neutral, open-source observability framework for instrumenting, generating, collecting, and exporting telemetry data, such as traces. A trace describes the journey of a request through a distributed system and helps you identify the operations that cause issues, such as errors and high latency.

Use the OTel Service Overview dashboard to get an overview of the operations, requests, latency, and errors that occurred during operations in an application service. You can use this information to start investigating and troubleshooting issues that are reported in the system.

Important

The dashboard is available only if the BMC Helix OpenTelemetry service is enabled for your tenant. To enable this service, contact BMC Support.

The dashboard provides the following information about the traces generated from the constituent services of an instrumented application:

  • Top-level and total operations
  • Average rate of requests and latency
  • Request rate and latency trends
  • Errors and their trend
  • User satisfaction scores
  • Duration and status

Scenario

Jim is an operator at Apex Global. He is responsible for monitoring the health of applications and constituent services in a microservices-based environment. He needs to ensure that the services are up and running without any issues or bottlenecks. To get an overview of the operations, requests, latency, errors, and traces in an application's services, he regularly monitors the OTel Service Overview dashboard.

For instance, Jim notices that some of the service traces show the ERROR status in the Traces section and the service shows a long tail latency in the Latency Trend section. He clicks the ERROR link against an erroneous trace to open the trace details on the OTel Trace Details dashboard. He analyzes these trace details to investigate issues and follows the same approach for other erroneous traces.

To view the dashboard

  1. From the navigation menu, click Dashboards.
  2. Search for the Helix OpenTelemetry folder and select it.
  3. Click OTel Service Overview.
    The dashboard is displayed.

  4. From the Business Service list, select a business service if not selected already.
    The list shows the business services for which OpenTelemetry is enabled.
  5. From the OTel Namespace list, select a namespace if not selected already.
  6. From the OTel Service list, select an application service if not selected already.
  7. (Optional) Change the date range for the data displayed in the dashboard; the default is three hours.
  8. Review the telemetry data of the service in the dashboard panels.

Tip: Quick access from the Home page

To quickly open the dashboard from the Home page, mark it as a favorite by using the star icon. Additionally, after you open a dashboard, it is available under Recently viewed dashboards on the Home page.

Panels in the OTel Service Overview dashboard

The following entries list each panel name, followed by its description.

Top Level Operations

Displays the number of top-level operations that are initiated for the selected application service. A top-level operation is the first operation that is started within a trace.

Total Operations

Displays the total number of operations that occurred for the selected application service.
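
As a rough illustration of how the two counts differ, the following sketch treats each operation as a span record and counts a span without a parent as a top-level operation. The field names (`trace_id`, `span_id`, `parent_id`, `name`) and the sample data are hypothetical and do not represent the BMC Helix data model; the dashboard may apply its own per-service rules.

```python
# Hypothetical span records; the field names are illustrative assumptions.
spans = [
    {"trace_id": "t1", "span_id": "a", "parent_id": None, "name": "GET /checkout"},
    {"trace_id": "t1", "span_id": "b", "parent_id": "a", "name": "SELECT cart"},
    {"trace_id": "t1", "span_id": "c", "parent_id": "a", "name": "POST /payment"},
    {"trace_id": "t2", "span_id": "d", "parent_id": None, "name": "GET /health"},
]


def count_operations(span_records):
    """Return (top_level, total) operation counts for a list of span records."""
    # A span with no parent is treated as the first operation started in its trace.
    top_level = sum(1 for s in span_records if s["parent_id"] is None)
    return top_level, len(span_records)


top_level, total = count_operations(spans)
print(f"top-level: {top_level}, total: {total}")
```

In this sample, two traces each start with one parentless span, so the top-level count is lower than the total count.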

Largest Latency

Displays the latency value (in ms) of the trace that took the longest time.

Shortest Latency

Displays the latency value (in ms) of the trace that took the shortest time.

Total Errors

Displays the number of errors that occurred during the operations in the selected period.

User Satisfaction (Apdex) Scores < 1

Displays bar charts showing the user satisfaction scores that are less than 1. Hover over a bar to view the exact score.

The scores are calculated by using the Application Performance Index (Apdex) method. This method converts measurements into one number on a uniform scale of 0 to 1. The Apdex score measures the extent to which the measured performance meets user expectations. Any score close to one indicates a good satisfaction level. Any score close to zero indicates poor satisfaction.
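
To make the 0 to 1 scale concrete, here is a minimal sketch of the standard Apdex formula: samples at or below a target threshold T count as satisfied, samples between T and 4T count as tolerating (at half weight), and the rest count as frustrated. The threshold that the dashboard uses is not stated here, so the 500 ms value and the sample latencies are assumptions for illustration only.

```python
def apdex(latencies_ms, threshold_ms):
    """Standard Apdex: (satisfied + tolerating / 2) / total samples."""
    if not latencies_ms:
        raise ValueError("at least one latency sample is required")
    # Satisfied: at or below the target threshold T.
    satisfied = sum(1 for v in latencies_ms if v <= threshold_ms)
    # Tolerating: above T but at or below 4T.
    tolerating = sum(
        1 for v in latencies_ms if threshold_ms < v <= 4 * threshold_ms
    )
    # Everything slower than 4T is frustrated and contributes nothing.
    return (satisfied + tolerating / 2) / len(latencies_ms)


# Assumed sample latencies (ms) and an assumed threshold of 500 ms.
samples = [120, 300, 450, 900, 2500]
score = apdex(samples, threshold_ms=500)
print(f"Apdex: {score:.2f}")
```

With these samples, three requests are satisfied, one is tolerating, and one is frustrated, which gives a score below 1.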

Average Rate

Displays the average number of requests that are sent per minute.

Rate Trend

Displays the trend of requests that are sent per minute in the selected period.
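
The relationship between the average rate and the rate trend can be sketched as follows: the trend is a per-minute count, and the average is the total count divided by the length of the window. The timestamps (epoch seconds) and function names are hypothetical; how the dashboard actually aggregates requests is not described here.

```python
from collections import Counter


def requests_per_minute(timestamps_s):
    """Bucket request timestamps (seconds) by the start of their minute."""
    return Counter(int(ts // 60) * 60 for ts in timestamps_s)


def average_rate(timestamps_s, window_minutes):
    """Average number of requests per minute over the whole window."""
    if window_minutes <= 0:
        raise ValueError("window_minutes must be positive")
    return len(timestamps_s) / window_minutes


# Assumed request arrival times within a three-minute window.
timestamps = [0, 10, 59, 60, 125, 130, 170]
trend = requests_per_minute(timestamps)
for minute_start in sorted(trend):
    print(minute_start, trend[minute_start])
print(f"average: {average_rate(timestamps, window_minutes=3):.2f} requests/min")
```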

Traces for <service_name>

Displays the traces along with their duration and status for the selected OpenTelemetry service. 

  • Filter traces according to duration and status.
  • Observe traces with the Error status. This status indicates that one or more operations in the trace failed. Click the status link to view more details on the OTel Trace Details dashboard for further investigation and troubleshooting.
  • Observe traces for which the duration is shown in red. A red duration indicates that some operations took more time than expected. Click the duration of such a trace to navigate to the OTel Trace Details dashboard. On this dashboard, identify the operations that caused the delay.

Error Percentage

Displays the percentage of errors that occurred while running the operations for the selected service.
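
As a simple illustration, an error percentage is the share of operations that ended with an error status. The sketch below assumes each operation carries a status string and that failed operations are marked `ERROR`; the sample data and the function name are hypothetical.

```python
def error_percentage(statuses):
    """Percentage of operations whose status is ERROR (0.0 for no data)."""
    if not statuses:
        return 0.0
    errors = sum(1 for s in statuses if s == "ERROR")
    return 100 * errors / len(statuses)


# Assumed operation statuses for the selected service.
operation_statuses = ["OK", "OK", "ERROR", "OK"]
print(f"{error_percentage(operation_statuses):.1f}%")
```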

Error Trend

Displays a line chart that shows the trend of errors that occurred while running the operations in the selected period.

Average Latency

Displays the average latency (in seconds) of traces during the selected period.

Latency Trend

Displays a line chart that shows the latency trend of traces in the selected period.

  • The spikes on the chart denote high latency values, which can indicate network or other issues.
  • The latency trend is shown for the 95th, 99th, and 99.9th percentile values. For each percentile, the min, mean, and max values are displayed.
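
For readers unfamiliar with percentile latencies, the following sketch computes them with the nearest-rank method: the p-th percentile is the smallest sample that has at least p percent of all samples at or below it. Reading the 999 series as the 99.9th percentile is an assumption, and the dashboard may use a different estimation method. The sample data is synthetic.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile for 0 < p <= 100."""
    if not values:
        raise ValueError("at least one value is required")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(values)
    # Rank of the smallest value that covers at least p percent of samples.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Synthetic latencies: 1 ms through 1000 ms, one sample each.
latencies_ms = list(range(1, 1001))
for p in (95, 99, 99.9):
    print(f"p{p}: {percentile(latencies_ms, p)} ms")
```

High percentiles describe the slowest requests, which is why a service can show a long tail in this panel even when its average latency looks healthy.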

User Satisfaction (Apdex)

Displays a line chart that shows the trend of user satisfaction scores for the selected period. The min, average, and max score values are displayed below the chart. 

The scores are calculated by using the Application Performance Index (Apdex) method. This method converts measurements into one number on a uniform scale of 0 to 1. The Apdex score measures the extent to which the measured performance meets user expectations. Any score close to one indicates a good satisfaction level. Any score close to zero indicates poor satisfaction.

 
