Using OpenTelemetry to identify application issues


BMC Helix AIOps connects with OpenTelemetry (also known as OTel) to help operators and site reliability engineers (SREs) correlate application issues with infrastructure problems. For example, if end users experience checkout failures in a retail application, the root cause might be a failing database node.

OpenTelemetry is a vendor-neutral, open-source observability framework to instrument, generate, collect, and export telemetry data (traces and metrics). The OpenTelemetry Collector collects telemetry data from applications and infrastructure, and sends it to observability backends such as BMC Helix applications (for example, BMC Helix Operations Management and BMC Helix Dashboards). After the BMC Helix applications ingest telemetry data, operators or SREs can monitor the instrumented applications through the business services in BMC Helix AIOps.

Important
  • The integration with OpenTelemetry is supported only for the SaaS deployments of BMC Helix applications, such as BMC Helix AIOps and BMC Helix Operations Management.
  • Telemetry data exported by OpenTelemetry is sent to BMC Helix applications through the BMC Helix OpenTelemetry service. To enable this service, contact BMC Helix Support. 

Telemetry data

OpenTelemetry can help you generate the following types of telemetry data:

  • Traces: A trace shows the journey of a request across a distributed system. The trace is made of spans, where each span represents a specific unit of work or operation (for example, a database query or an HTTP request). By viewing traces, you can track the complete execution path of the request and quickly identify which part of the application is causing issues, such as errors and high latency. A span can also contain span events, which are structured log messages that denote a meaningful, singular point in time during the span's duration.
  • Metrics: A metric is a measurement of a service, captured at runtime, that indicates the state of the system. Typically, when a problem occurs, the values of one or more metrics fall below or rise above their recommended range. You can use the ingested metrics to generate events in BMC Helix Operations Management by defining alarm generation criteria. For example, you can configure a policy to generate an event when any API call takes longer than 5 ms. You can also use these metrics to create dashboards for easy monitoring.
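To make the idea of spans more concrete, the following is a minimal sketch in plain Python (no OpenTelemetry SDK) of how a trace's spans can point to the operation behind a slow or failing request. The `Span` class, the `self_time` and `find_suspects` helpers, and the sample checkout trace are hypothetical illustrations, not part of OpenTelemetry or BMC Helix. The sketch assumes that child spans run sequentially.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """Hypothetical, simplified span; real OpenTelemetry spans carry many more fields."""
    span_id: str
    parent_id: Optional[str]  # None for the root span of the trace
    name: str
    start_ms: float
    end_ms: float
    error: bool = False
    events: list = field(default_factory=list)  # span events, e.g. "connection timeout"

    @property
    def duration_ms(self) -> float:
        return self.end_ms - self.start_ms


def self_time(spans):
    """Time spent in each span itself, excluding its direct children.

    Assumes children run sequentially; overlapping (parallel) children are
    clamped so that self time never drops below zero.
    """
    child_total = {s.span_id: 0.0 for s in spans}
    for s in spans:
        if s.parent_id is not None:
            child_total[s.parent_id] += s.duration_ms
    return {s.span_id: max(0.0, s.duration_ms - child_total[s.span_id]) for s in spans}


def find_suspects(spans):
    """Return the name of the span with the most self time and the names of failed spans."""
    own = self_time(spans)
    slowest = max(spans, key=lambda s: own[s.span_id])
    failed = [s.name for s in spans if s.error]
    return slowest.name, failed


# Hypothetical checkout trace: the database query is both slow and failing.
checkout_trace = [
    Span("root", None, "POST /checkout", 0, 120),
    Span("cart", "root", "GET /cart", 5, 20),
    Span("db", "root", "SELECT orders", 25, 110, error=True, events=["connection timeout"]),
]

slowest, failed = find_suspects(checkout_trace)
print(slowest, failed)  # SELECT orders ['SELECT orders']
```

Subtracting child time matters here: the root span `POST /checkout` has the longest total duration, but most of that time is spent in the database query, which is where the investigation should start.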
Warning

BMC Helix does not automatically detect or mask sensitive data in OpenTelemetry payloads. All traces and metrics sent through OpenTelemetry are ingested and displayed in BMC Helix applications as received. Before exporting any data, configure your OpenTelemetry Collector to filter, remove, or mask sensitive data in accordance with your organizational policies. For recommended approaches and examples of handling sensitive data, see the OpenTelemetry documentation.
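In practice, this filtering is configured in the Collector itself, for example with processors such as the attributes or redaction processors from the OpenTelemetry Collector Contrib distribution. The following plain-Python sketch only illustrates the logic such a rule applies to span attributes: drop attributes whose keys are on a deny list and mask card-like numbers inside string values. The `BLOCKED_KEYS` set, the `CARD_PATTERN` regular expression, and the `sanitize_attributes` function are hypothetical and are not a substitute for a reviewed Collector configuration.

```python
import re

# Hypothetical deny list: attribute keys that must never leave your network.
BLOCKED_KEYS = {"user.password", "http.request.header.authorization"}

# Illustrative pattern for 13 to 16 digit card-like numbers,
# optionally separated by spaces or dashes.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")


def sanitize_attributes(attributes: dict) -> dict:
    """Return a copy of the attributes with blocked keys removed and card-like numbers masked."""
    clean = {}
    for key, value in attributes.items():
        if key in BLOCKED_KEYS:
            continue  # drop the attribute entirely
        if isinstance(value, str):
            value = CARD_PATTERN.sub("****", value)  # mask, but keep the rest of the text
        clean[key] = value
    return clean


span_attrs = {
    "http.route": "/checkout",
    "user.password": "hunter2",
    "payment.note": "card 4111-1111-1111-1111 declined",
    "http.status_code": 500,
}
print(sanitize_attributes(span_attrs))
# {'http.route': '/checkout', 'payment.note': 'card **** declined', 'http.status_code': 500}
```

Masking in place keeps the surrounding text useful for troubleshooting, while dropping the whole attribute is the safer choice for values, such as passwords, that have no diagnostic use.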

In BMC Helix AIOps, you create a business service to represent the application topology. When an application is impacted, the values of various metrics change. If policies are defined against those metrics in BMC Helix Operations Management, events are generated. Those events impact the health of the business service, helping you identify the impacted application service. From there, you can launch dashboards to analyze traces and correlate the traces and metrics to pinpoint the exact operation causing the issue in the application.
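The chain from metrics to policies to events to service health can be modeled in a few lines. The following plain-Python sketch is a conceptual illustration only: it assumes that a policy is a simple "value above threshold" rule and that a business service takes the worst severity among its components. The `Policy` class, the `evaluate` and `service_health` functions, the severity labels, the metric names, and the component names are all hypothetical and do not reflect the BMC Helix Operations Management policy engine or its API.

```python
from dataclasses import dataclass

# Illustrative severity labels, ordered from healthy to worst.
SEVERITY_RANK = {"OK": 0, "MINOR": 1, "MAJOR": 2, "CRITICAL": 3}


@dataclass
class Policy:
    """Hypothetical threshold policy: raise an event when `metric` exceeds `threshold`."""
    metric: str
    threshold: float
    severity: str


def evaluate(samples, policies):
    """samples: (component, metric, value) tuples; returns the generated events."""
    events = []
    for component, metric, value in samples:
        for policy in policies:
            if policy.metric == metric and value > policy.threshold:
                events.append((component, metric, value, policy.severity))
    return events


def service_health(components, events):
    """Business service health = worst event severity among its components."""
    worst = "OK"
    for component, _metric, _value, severity in events:
        if component in components and SEVERITY_RANK[severity] > SEVERITY_RANK[worst]:
            worst = severity
    return worst


policies = [
    Policy("http.server.duration_ms", 5.0, "MAJOR"),  # API call slower than 5 ms
    Policy("db.client.errors", 0, "CRITICAL"),        # any database error
]
samples = [
    ("checkout-api", "http.server.duration_ms", 7.2),
    ("orders-db", "db.client.errors", 3),
    ("catalog-api", "http.server.duration_ms", 2.0),
]

events = evaluate(samples, policies)
checkout_service = {"checkout-api", "orders-db"}
print(service_health(checkout_service, events))  # CRITICAL
```

In this model, the failing database node drives the checkout service to the worst severity even though the API's latency event alone would only be major, which mirrors the retail checkout example at the start of this topic.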

Watch the following video (2:36) to get a quick look at how BMC Helix applications and OpenTelemetry work together to help you identify application issues quickly:

Watch the YouTube video on how to use BMC Helix AIOps with OpenTelemetry's tracing capabilities to identify application issues quickly.

Supported languages

BMC Helix AIOps supports monitoring of all the applications that are developed in the languages supported by OpenTelemetry. For the list of languages that are supported by OpenTelemetry, see the OpenTelemetry documentation.

Supported OpenTelemetry Collector versions

BMC Helix AIOps supports versions 0.128.0 and later of OpenTelemetry Collector.
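If you manage several Collector deployments, a small check can flag instances older than the minimum supported version. The following Python sketch is a hypothetical helper, not a BMC Helix or OpenTelemetry tool. It assumes that you already have the version string reported by each Collector build in a form such as `0.128.0` or `v0.130.1`. For simplicity, it ignores pre-release suffixes such as `-dev`, which strict semantic versioning would rank below the corresponding release.

```python
MINIMUM_COLLECTOR_VERSION = (0, 128, 0)


def parse_version(text: str) -> tuple:
    """Parse 'v0.130.1' or '0.128.0-dev' into (major, minor, patch).

    Simplification: any pre-release suffix after '-' is discarded.
    """
    core = text.strip().lstrip("v").split("-")[0]
    major, minor, patch = (int(part) for part in core.split("."))
    return major, minor, patch


def is_supported(version: str) -> bool:
    """True if the Collector version is 0.128.0 or later."""
    return parse_version(version) >= MINIMUM_COLLECTOR_VERSION


for candidate in ["0.127.0", "0.128.0", "v0.130.1", "1.0.0"]:
    print(candidate, is_supported(candidate))
```

Comparing integer tuples rather than raw strings avoids the common mistake of treating `0.99.0` as newer than `0.128.0`.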

Supported metrics

BMC Helix applications can ingest the following metric types from OpenTelemetry:

  • Rate, Errors, and Duration (RED) metrics: Represent user-centric performance indicators (request rate, error count, and request duration) that are derived from the underlying traces.
  • Counter: Represents a value that accumulates or increases over time, for example, the number of disk reads or the total number of requests received.
  • Gauge: Represents the current value at the time it is read, for example, the number of active connections on an application or the memory used.
  • Histogram: Represents a client-side aggregation of values, for example, request latency and request duration.
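To show how RED metrics relate to traces and how a histogram aggregates values into buckets, the following plain-Python sketch derives rate, error count, and a duration histogram from the spans of one operation. It does not reproduce how BMC Helix or any Collector component computes these metrics. The bucket boundaries, the `(duration_ms, is_error)` input format, and the `red_metrics` function are hypothetical. As in OpenTelemetry explicit-bucket histograms, each bucket counts values up to and including its upper bound.

```python
from bisect import bisect_left

# Hypothetical explicit bucket boundaries in milliseconds. Each bucket counts
# values up to and including its upper bound; one extra bucket counts the rest.
BOUNDS_MS = [5, 10, 25, 50, 100]


def red_metrics(spans, window_s):
    """Derive Rate, Errors, and Duration for one operation.

    spans: (duration_ms, is_error) tuples observed during a window of window_s seconds.
    """
    buckets = [0] * (len(BOUNDS_MS) + 1)
    for duration_ms, _is_error in spans:
        # bisect_left returns the first bound >= duration, so a value equal
        # to a bound lands in that bound's bucket (upper bound inclusive).
        buckets[bisect_left(BOUNDS_MS, duration_ms)] += 1
    return {
        "rate_per_s": len(spans) / window_s,           # Rate
        "errors": sum(1 for _d, err in spans if err),  # Errors (a counter-like total)
        "duration_buckets": buckets,                   # Duration (histogram)
    }


checkout_spans = [(3.0, False), (7.5, False), (12.0, True), (240.0, True)]
print(red_metrics(checkout_spans, window_s=2))
# {'rate_per_s': 2.0, 'errors': 2, 'duration_buckets': [1, 1, 1, 0, 0, 1]}
```

The histogram keeps only bucket counts, not individual durations, which is why it is cheap to export; the trade-off is that percentiles can only be estimated within bucket resolution.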

Process for using OpenTelemetry to identify application issues

The following table lists the tasks that you need to perform to identify application issues by using OpenTelemetry and BMC Helix applications: 

| Task | Product | Role | Action |
| --- | --- | --- | --- |
| 1. Ingest telemetry data into BMC Helix applications. | OpenTelemetry | Tenant administrator | 1) Instrument the application from which you want to ingest telemetry data. 2) Install and configure the OpenTelemetry Collector to export telemetry data to the BMC Helix applications. |
| | BMC Helix Discovery | Service designer | Make sure that the topology elements for the application that is instrumented by OpenTelemetry are ingested into BMC Helix Discovery. |
| | BMC Helix Dashboards | Operator | Make sure that the traces and metrics for the application that is instrumented by OpenTelemetry are ingested into BMC Helix Dashboards. |
| | BMC Helix AIOps | Service designer | Create a service model that represents the topology of the application that is instrumented by OpenTelemetry. You can create a service model by using a service blueprint: either use an out-of-the-box blueprint or create your own. |
| 2. Monitor the service health in BMC Helix AIOps and analyze traces in BMC Helix Dashboards to identify application issues. | BMC Helix AIOps, BMC Helix Dashboards | SRE/Operator | Monitor the service health in BMC Helix AIOps. Analyze traces in BMC Helix Dashboards to identify the issue. |


BMC Helix AIOps 26.1