OTel Service Overview dashboard


OpenTelemetry is a vendor-neutral, open-source observability framework for instrumenting, generating, collecting, and exporting telemetry data, such as traces. A trace describes the journey of a request through a distributed system and identifies the operations that cause issues, such as errors and latencies.

Use the OTel Service Overview dashboard to get an overview of the operations, requests, latency, and errors in an application service. You can use this information to start investigating and troubleshooting issues that are reported in the system.

Important

The dashboard is available only if the BMC Helix OpenTelemetry service is enabled for your tenant. To enable this service, contact BMC Support.

The dashboard provides the following information about the traces generated from the constituent services of an instrumented application:

  • Top-level and total operations
  • Average rate of requests and latency
  • Request rate and latency trends
  • Errors and their trend
  • User satisfaction scores
  • Duration and status

Scenario

Jim is an operator at Apex Global. He is responsible for monitoring the health of applications and their constituent services in a microservices-based environment. He needs to ensure that the services are up and running without any issues or bottlenecks. To get an overview of the operations, requests, latency, errors, and traces in an application's services, he regularly monitors the OTel Service Overview dashboard.

For instance, Jim notices that some of the service traces show the ERROR status in the Traces section and that the service shows long-tail latency in the Latency Trend section. He clicks the ERROR link for a failed trace to open the trace details on the OTel Trace Details dashboard. He analyzes these trace details to investigate the issue and follows the same approach for the other failed traces.

To view the dashboard

  1. From the navigation menu, click Dashboards.
  2. Search for the Helix OpenTelemetry folder and select it.
  3. Click OTel Service Overview.
    The dashboard is displayed.

  4. From the Business Service list, select a business service if not selected already.
    The list shows the business services for which OpenTelemetry is enabled.
  5. From the OTel Namespace list, select a namespace if not selected already.
  6. From the OTel Service list, select an application service if not selected already.
  7. From the Status Filter list, select one of the following status codes that indicate whether a trace operation succeeded or failed:
    • Status_Code_Unset—Select to view all the trace operations.
    • Status_Code_Error—Select to view the trace operations that failed due to issues, such as timeouts, exceptions, or failed API calls.
  8. (Optional) Change the date range for the data displayed in the dashboard; the default is three hours.
  9. Review the telemetry data of the service in the dashboard panels.
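The Status Filter choices in step 7 can be pictured as a simple filter over trace operations. The following Python sketch is illustrative only: TraceOperation and apply_status_filter are hypothetical names, not part of BMC Helix or any OpenTelemetry API, and the behavior simply mirrors the descriptions of the two filter values given above.

```python
from dataclasses import dataclass


@dataclass
class TraceOperation:
    """Hypothetical stand-in for one trace operation and its status."""
    name: str
    status_code: str  # "Status_Code_Unset" or "Status_Code_Error"


def apply_status_filter(operations, selected):
    """Return the operations that a given Status Filter value would show.

    Assumption: Status_Code_Unset shows every operation and
    Status_Code_Error shows only failed ones, as described in step 7.
    """
    if selected == "Status_Code_Unset":
        return list(operations)
    if selected == "Status_Code_Error":
        return [op for op in operations if op.status_code == "Status_Code_Error"]
    raise ValueError(f"unknown filter value: {selected}")


# Made-up sample data for illustration.
sample = [
    TraceOperation("GET /cart", "Status_Code_Unset"),
    TraceOperation("POST /checkout", "Status_Code_Error"),
    TraceOperation("GET /catalog", "Status_Code_Unset"),
]

failed = apply_status_filter(sample, "Status_Code_Error")
print([op.name for op in failed])
```

With the made-up sample above, only the checkout operation is returned for Status_Code_Error, while Status_Code_Unset returns all three operations.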
     

Tip: Quick access from the Home page

To quickly open the dashboard from the Home page, mark it as a favorite by using the star icon. Additionally, after you open a dashboard, it is available under Recently viewed dashboards on the Home page.

Panels in the OTel Service Overview dashboard

Panel

Description

User Satisfaction (Apdex)

Displays a line chart that shows the trend of user satisfaction scores for the selected period. The minimum, average, and maximum score values are displayed below the chart. 

The scores are calculated by using the Application Performance Index (Apdex) method. This method converts measurements into one number on a uniform scale of 0 to 1. The Apdex score measures the extent to which the measured performance meets user expectations. Any score close to one indicates a good satisfaction level. Any score close to zero indicates poor satisfaction.

Rate Trend

Displays the trend of requests that are sent per minute in the selected period.

Error Trend

Displays a line chart that shows the trend of errors that occurred while running the operations during the selected period.

Latency Trend

Displays a line chart that shows the latency trend of traces in the selected period.

  • The spikes on the chart denote high latency values, which can indicate network or other issues.
  • The latency trend is shown for the 95th, 99th, and 99.9th percentiles. For each percentile, the minimum, mean, and maximum values are displayed.

Average Latency

Displays the average latency (in seconds) of traces during the selected period.

Top Level Operations

Displays the number of top-level operations that are initiated for the selected application service. A top-level operation is the first operation that is started within a trace.

Calls by Operation Type

Displays the details of calls that occurred during operations. The details include the type of calls, their count, and average call duration.

Average Rate

Displays the average number of requests that are sent per minute.

Total Errors

Displays the number of errors that occurred during the operations in the selected period.

Traces for <service_name>

Displays the traces along with their duration and status for the selected OpenTelemetry service. 

  • Filter traces according to duration and status.
  • Look for traces with the Error status, which indicates that one or more operations in the trace failed. Click the status link to open the OTel Trace Details dashboard and investigate further.
  • Look for traces whose duration is shown in red, which indicates that some operations took longer than expected. Click the duration to open the OTel Trace Details dashboard and identify the operations that caused the delay.
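
The User Satisfaction (Apdex) and Latency Trend panels both summarize trace durations. As a rough illustration of the arithmetic behind such panels, the following Python sketch computes an Apdex score and latency percentiles from a list of durations. It rests on several assumptions: the Apdex calculation uses the standard definition (samples at or below a target threshold T are satisfied, samples up to 4T are tolerating), the 0.5-second threshold and the sample durations are made up, and the percentiles use the nearest-rank method. The source does not state which threshold or percentile method the dashboard itself uses.

```python
import math


def apdex(durations_s, threshold_s):
    """Standard Apdex score: (satisfied + tolerating / 2) / total samples."""
    if not durations_s:
        raise ValueError("need at least one duration")
    satisfied = sum(1 for d in durations_s if d <= threshold_s)
    tolerating = sum(1 for d in durations_s if threshold_s < d <= 4 * threshold_s)
    return (satisfied + tolerating / 2) / len(durations_s)


def percentile(durations_s, pct):
    """Nearest-rank percentile: the value at rank ceil(pct / 100 * n)."""
    if not durations_s:
        raise ValueError("need at least one duration")
    ordered = sorted(durations_s)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Made-up trace durations in seconds (assumption, for illustration only).
durations = [0.12, 0.20, 0.31, 0.45, 0.48, 0.62, 0.90, 1.40, 2.30, 6.00]

score = apdex(durations, threshold_s=0.5)  # hypothetical threshold T = 0.5 s
print(f"Apdex: {score:.2f}")
for pct in (95, 99, 99.9):
    print(f"p{pct}: {percentile(durations, pct)} s")
```

In this sample, five durations are satisfied, three are tolerating, and two are frustrated, so the score is (5 + 3/2) / 10 = 0.65. With only ten samples, the single 6-second outlier determines all three high percentiles, which is the kind of long-tail effect that shows up as a spike in the Latency Trend panel.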

 
