Monitoring service insights

Service insights are analytics, such as trends, patterns, behavior, daily averages of different events, and metric data, that describe the performance of a service. Operators or site reliability engineers (SREs) can use service insights to spot service degradation, investigate and identify the root cause, and restore system availability as quickly as possible.

BMC Helix AIOps uses AI/ML algorithms to analyze events and metric data collected from the service environment over a period of time. Insights are presented as textual summaries with graphs. Insights are available only for services that have been available for at least two days.

As operators or SREs, you can view the following information:

Health insights that identify the precise time of service degradation in a day or week.
Health insights into events and incidents, shown as trends and behavior over the last 15 days.

Insights available for monitoring services

The following table provides an overview of the different insights and the information that you can infer from each insight summary:

Insights	Description	Summary
Health score	This insight is derived from the health score of a service. The health score is calculated by using AI/ML technology on the events associated with various service entities, such as nodes, clusters, applications, devices, and child services. The health score ranges from 0 to 100 and correlates directly with service health: the higher the score, the healthier the service. Take corrective measures if the health score degrades. You can see this insight only for a decreasing trend in the health score. This insight is not displayed if there is an increasing trend or no trend in the health score for the given period. For more information, see Insight into health score.	Overall trend of your service health; latest percentage degradation of the health score between two consecutive dates
Severity pattern	This insight shows the pattern derived from daily occurrences of Critical or Major severity. If a service is repeatedly affected by Critical or Major severities at specific times every day, the service is not healthy. Take corrective measures if you see a pattern in severity occurrences over a period. For more information, see Insight into severity pattern.	Duration of repeated occurrences of a Critical or Major severity
Events (Major, Critical)	This insight is derived from Major and Critical events; that is, notifications about any change in the state of an application or device that you are monitoring. Event occurrences correlate inversely with service health: the fewer the events, the healthier the service. Take corrective measures if Major and Critical events are increasing with an alarming daily average over a period. You can see this insight only for an increasing trend in Major or Critical events. This insight is not displayed if there is a decreasing trend or no trend in Major or Critical event occurrences for the given period. For more information, see Insight into Major or Critical events.	Trend of Critical or Major events over a period; average number of events; latest percentage increase in Critical or Major events between two consecutive dates
Incidents	This insight is derived from incidents; that is, events that are not part of the standard operation of a service and that cause an interruption or quality degradation of the service. Incident-related insights in BMC Helix AIOps are derived from incidents generated from events, either through a notification policy or by a right-click on the event, which have resulted in an Incident Info (or INCIDENT_INFO) class event. Take corrective measures if incident occurrences are increasing with an alarming daily average over a period. You can see this insight only for an increasing trend in incidents. This insight is not displayed if there is a decreasing trend or no trend in incident occurrences for the given period. For more information, see Insight into incidents.	Trend of incidents over a period; average number of incidents; latest percentage increase in incidents between two consecutive dates

To view insights for a service

On the Services page, click the service name.
Scroll down and expand Analyze Service Insights.
You can see insights based on the availability of metric data collected from the IT network. Insights are displayed as soon as they become available, for up to the last 15 days.
Click the summary text to view the corresponding graph.
Insights are available for health score, severity pattern, Major or Critical events, and incidents.

Are insights available if the service is impacted for less than 15 days?

Yes. Insights are displayed as soon as they are available. For example, if the service health score has degraded for the last 5 days, the trend and pattern are displayed on the graph.

Insight into health score

The insight summary shows the degradation of a service health score over a period, and the latest degradation as a percentage change in the daily average health score between the last two consecutive dates. In the graph, the daily average health score is measured vertically (along the Y-axis) and date-time is measured horizontally (along the X-axis). The highlighted zone shows the latest percentage degradation of the health score.

Example:

Let us take an example of an insight into health score for which the summary shows a decreasing trend and a 100% degradation of the average service health. The highlighted zone in the graph represents the recent decrease in the average health score.

Now, let us understand how the insight is derived from the average health score, as described in the following table. The recent decrease in the average health score and the corresponding dates are highlighted in the table. You can correlate these values with the highlighted zone in the graph.

Date	Avg. health score	% change in average health score = [(H2 - H1)/H1] x 100, where H2 = average score on a date and H1 = average score on the previous date
12/29/2022	16	-
12/30/2022	0	[(0 - 16)/16] x 100 = -100% (a 100% degradation)
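
The percentage change in the table header can be reproduced with a few lines of Python. This is only a minimal sketch of the arithmetic, not the product's implementation; the function name `percent_change` is hypothetical. A negative result corresponds to a degradation, so the -100% computed here is the 100% degradation reported in the summary.

```python
def percent_change(previous: float, current: float) -> float:
    """Percentage change from the previous average (H1) to the current one (H2)."""
    if previous == 0:
        # The formula divides by H1, so it is undefined when H1 is 0.
        raise ValueError("percentage change is undefined when the previous average is 0")
    return (current - previous) / previous * 100


# Daily average health scores from the table above.
h1 = 16  # 12/29/2022
h2 = 0   # 12/30/2022

change = percent_change(h1, h2)
print(f"{change:.0f}%")  # -100%
```

The guard for a previous average of 0 is an assumption of this sketch; the source does not say how that case is handled.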

Insight into severity pattern

The insight summary shows a pattern for Critical or Major severity based on its daily occurrences during specific times. In the corresponding graph, the daily average health score is measured vertically (along the Y-axis) and date-time is measured horizontally (along the X-axis). Multiple highlighted zones show the pattern of daily occurrences of Critical or Major severities. The summary is generated from the date range and duration of the severity in the server's time zone, and can be viewed in the user's local time zone.

Example:

Let us take an example of an insight into the severity pattern for a service. In the graph, the highlighted line represents the pattern for Critical severity.

Now, let us understand how the severity pattern is derived using the daily occurrences of Critical severity, as described in the table given below. You can correlate the severity duration and the corresponding date range with the highlighted line in the graph.

Date range	Severity	Duration of severity
12/29/2022 to 01/12/2023	Critical	05:30 hr to 05:30 hr

Insight into Major or Critical events

The insight summary shows an increasing trend of Critical or Major events with the daily average of event occurrences over a period, and the latest percentage increase in those events with the comparison dates. In the corresponding graph, the number of Major or Critical events is measured vertically (along the Y-axis) and date-time is measured horizontally (along the X-axis). The highlighted zone shows the recent percentage increase in events.

Example:

Let us take an example of insights into Critical events for which the summary shows an increasing trend of Critical events with the daily average of 11 Critical events, and the recent increase in Critical events from 15 to 18, with the comparison dates. The highlighted zone in the graph represents the recent increase in the occurrences of Critical events.

Now, let us understand how the insights are derived using the daily occurrences of Critical events data, as described in the table given below. The recent increase in Critical events and their dates are highlighted in the table. You can correlate these values with the highlighted zone in the graph.

Date	Critical events	Average Critical events = (Sum of Critical events)/(Number of days)
12/28/2022	10	(10+2+4+4+5+5+6+15+16+15+15+10+15+18+18)/15 = 10.53; daily average = 11 Critical events
12/29/2022	2
12/30/2022	4
12/31/2022	4
01/01/2023	5
01/02/2023	5
01/03/2023	6
01/04/2023	15
01/05/2023	16
01/06/2023	15
01/07/2023	15
01/08/2023	10
01/09/2023	15
01/10/2023	18
01/11/2023	18
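
The figures in this example can be checked with a short Python sketch. It is illustrative only: the helper names are hypothetical, and the rule used to pick the comparison dates (the most recent pair of consecutive days on which the count rose) is an assumption, since the source does not state how BMC Helix AIOps selects them.

```python
from datetime import date, timedelta

# Daily Critical event counts from the table above, starting 12/28/2022.
start = date(2022, 12, 28)
critical_events = [10, 2, 4, 4, 5, 5, 6, 15, 16, 15, 15, 10, 15, 18, 18]


def daily_average(counts):
    """Sum of the counts divided by the number of days."""
    return sum(counts) / len(counts)


def latest_increase(counts):
    """Return (index, previous, current) for the most recent day-over-day rise.

    Assumption: the comparison dates are the latest two consecutive days
    on which the count went up. Returns None if the count never rose.
    """
    for i in range(len(counts) - 1, 0, -1):
        if counts[i] > counts[i - 1]:
            return i, counts[i - 1], counts[i]
    return None


average = daily_average(critical_events)
print(f"average = {average:.2f}, shown as {round(average)}")  # average = 10.53, shown as 11

index, previous, current = latest_increase(critical_events)
increase_pct = (current - previous) * 100 / previous
day = start + timedelta(days=index)
print(f"{previous} -> {current} on {day:%m/%d/%Y}: +{increase_pct:.0f}%")  # 15 -> 18 on 01/10/2023: +20%
```

Under this assumption, the increase from 15 to 18 falls between 01/09/2023 and 01/10/2023, which matches the values quoted in the example summary.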

Insight into incidents

The insight summary shows an increasing trend of incidents with the daily average of incident occurrences over a period, and the latest percentage increase in incidents with the comparison dates. In the corresponding graph, the number of incidents is measured vertically (along the Y-axis) and date-time is measured horizontally (along the X-axis). The highlighted zone shows the latest increase in incident count. Whenever an incident is created against a service, an information event is logged in BMC Helix Operations Management. BMC Helix AIOps then uses these information events to derive incident-related insights for the respective service.

Example:

Let us take an example of insights into incidents for which the summary shows an increasing trend with the daily average of 8 incidents, and the recent increase in incidents from 13 to 15, with the comparison dates. The highlighted zone in the graph represents the recent increase in the occurrences of incidents.

Now, let us understand how the insights are derived using the daily occurrences of incidents, as described in the table given below. The recent increase in incidents and their dates are highlighted in the table. You can correlate these values with the highlighted zone in the graph.

Date	Incidents	Average incidents = (Sum of incidents)/(Number of days)
12/28/2022	1	(1+2+3+4+5+7+6+7+9+9+10+11+12+13+15)/15 = 7.6; daily average = 8 incidents
12/29/2022	2
12/30/2022	3
12/31/2022	4
01/01/2023	5
01/02/2023	7
01/03/2023	6
01/04/2023	7
01/05/2023	9
01/06/2023	9
01/07/2023	10
01/08/2023	11
01/09/2023	12
01/10/2023	13
01/11/2023	15
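
The incidents insight is displayed only for an increasing trend. The source does not describe how BMC Helix AIOps detects a trend, so the following Python sketch substitutes a plain least-squares slope as a stand-in. The function names and the `tolerance` parameter are hypothetical, and the product's AI/ML logic may differ.

```python
def trend_slope(counts):
    """Least-squares slope of the counts against the day index (0, 1, 2, ...)."""
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two days to estimate a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(counts) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(counts))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator


def classify_trend(counts, tolerance=0.0):
    """Label a series; tolerance is a hypothetical dead band around zero."""
    slope = trend_slope(counts)
    if slope > tolerance:
        return "increasing"
    if slope < -tolerance:
        return "decreasing"
    return "no trend"


# Daily incident counts from the table above (12/28/2022 to 01/11/2023).
incidents = [1, 2, 3, 4, 5, 7, 6, 7, 9, 9, 10, 11, 12, 13, 15]

trend = classify_trend(incidents)
# The incidents insight appears only for an increasing trend.
show_insight = trend == "increasing"
print(trend, show_insight)  # increasing True
```

A slope-based check is only one possible way to label a trend; it is used here because it is easy to verify by hand against the table.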

Where to go from here

Based on the health of and impact on a service, you can perform any of the following tasks:

View CI topology for impacted services. For more information, see Identifying-the-impacted-CI-nodes-from-CI-topology-view.
Investigate impacting events, incidents, and changes for nodes in service hierarchy. For more information, see Investigating-the-service-nodes-from-service-hierarchy-view.
View health indicators for an impacted service. For more information, see Monitoring-service-health-indicators.
View the causal analysis for the impact. For more information, see Performing-causal-analysis-of-impacted-services.

BMC Helix AIOps 25.1
