Monitoring service insights
The following video (1:53) shows a high-level overview of the service insights feature in BMC Helix AIOps:
Watch the YouTube video about Overview of service insights in BMC Helix AIOps.
Insights available for monitoring services
The following table provides an overview of the available insights and the information you can infer from each insight summary:
Insight | Description |
---|---|
Health score | This insight is derived from the health score of a service. The health score is calculated by applying AI/ML technology to the events associated with various service entities, such as nodes, clusters, applications, devices, and child services. The health score ranges from 0 to 100 and correlates directly with service health: the higher the score, the healthier the service. Take corrective measures if the health score degrades. This insight is displayed only for a decreasing trend in the health score. It is not displayed if there is an increasing trend or no trend in the health score for the given period. For more information, see Insight into health score. |
Severity pattern | This insight shows the pattern derived from daily occurrences of Critical or Major severity. If a service is repeatedly affected by Critical or Major severity every day during specific times, the service is not healthy. Take corrective measures if you see a pattern in severity occurrences over a period. For more information, see Insight into severity pattern. |
Events (Major, Critical) | This insight is derived from Major and Critical events, that is, notifications about any change in the state of an application or device that you are monitoring. Event occurrences correlate inversely with service health: the fewer the events, the healthier the service. Take corrective measures if Major and Critical events increase over a period with an alarming daily average. This insight is displayed only for an increasing trend in Major or Critical events. It is not displayed if there is a decreasing trend or no trend in Major or Critical event occurrences for the given period. For more information, see Insight into Major or Critical events. |
Incidents | This insight is derived from incidents, that is, events that are not part of the standard operation of a service and that cause an interruption or quality degradation of the service. Incident-related insights in BMC Helix AIOps are derived from incidents that were generated from events, either through a notification policy or by right-clicking the event, and that resulted in an Incident Info (INCIDENT_INFO) class event. Take corrective measures if incident occurrences increase over a period with an alarming daily average. This insight is displayed only for an increasing trend in incidents. It is not displayed if there is a decreasing trend or no trend in incident occurrences for the given period. For more information, see Insight into incidents. |
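The display rules in the table above (health score insights only for a decreasing trend, event and incident insights only for an increasing trend) can be sketched as follows. This documentation does not say how BMC Helix AIOps detects a trend, so this minimal sketch assumes the sign of a least-squares slope over the daily values; `trend_slope`, `should_show`, and `SHOW_WHEN` are hypothetical names, and the severity pattern insight is left out because it is not trend-based.

```python
from statistics import mean

# Hypothetical mapping of insight type to the trend direction that makes it visible,
# following the table above: -1 = decreasing, +1 = increasing.
SHOW_WHEN = {"health_score": -1, "events": 1, "incidents": 1}


def trend_slope(values: list[float]) -> float:
    """Least-squares slope of the daily values against their day index 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        return 0.0
    x_bar = (n - 1) / 2
    y_bar = mean(values)
    num = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(values))
    den = sum((x - x_bar) ** 2 for x in range(n))
    return num / den


def should_show(insight: str, daily_values: list[float], tolerance: float = 1e-9) -> bool:
    """Return True if the trend direction matches the one that displays this insight."""
    slope = trend_slope(daily_values)
    if abs(slope) <= tolerance:
        return False  # no trend: the insight is not displayed
    direction = 1 if slope > 0 else -1
    return direction == SHOW_WHEN[insight]


print(should_show("health_score", [80, 70, 60]))  # True: decreasing health score
print(should_show("events", [5, 5, 5]))           # False: no trend
```

A slope-based test is only one possible choice; a production system may use a different trend test or a minimum significance threshold.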
To view insights for a service
- On the Services page, click the service name.
- Scroll down and expand Analyze Service Insights.
Insights depend on the availability of metric data collected from the IT network. They are displayed as soon as they become available, for up to the last 15 days.
- Click the summary text to view the corresponding graph.
Insights are available for health score, severity pattern, Major or Critical events, and incidents.
Insight into health score
The insight summary shows the degradation of a service's health score over a period, and the latest degradation as a percentage change in the daily average health score between the last two consecutive dates. In the graph, the daily average health score is plotted along the Y-axis and date-time along the X-axis. The highlighted zone shows the latest percentage degradation of the health score.
Example:
Let us take an example of a health score insight for which the summary shows a decreasing trend and a 100% degradation of the average service health. The highlighted zone in the graph represents the recent decrease in the average health score.
Now, let us understand how the insight is derived from the average health score, as described in the following table. The recent decrease in the average health score and the corresponding dates are highlighted in the table. You can correlate these values with the highlighted zone in the graph.
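Date | Avg. health score | % change in average health score |
---|---|---|
12/29/2022 | 16 | - |
12/30/2022 | 0 | [(0 - 16)/16] x 100 = -100% (a 100% degradation) |
In the table, % change in average health score = [(H2 - H1)/H1] x 100, where H2 is the average score on a date and H1 is the average score on the previous date. A negative value indicates degradation.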
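The percentage-change formula from the table can be checked with a few lines of Python. This is a minimal sketch that takes the daily averages from the table as given; `percent_change` is a hypothetical helper, not a BMC Helix AIOps API, and computing the daily averages from raw health score samples is out of scope.

```python
def percent_change(h1: float, h2: float) -> float:
    """[(H2 - H1)/H1] x 100, where H1 is the previous day's average and H2 the current day's."""
    if h1 == 0:
        raise ValueError("percentage change is undefined when the previous average is 0")
    return (h2 - h1) / h1 * 100


# Daily average health scores taken from the table above.
daily_avg = {"12/29/2022": 16, "12/30/2022": 0}
change = percent_change(daily_avg["12/29/2022"], daily_avg["12/30/2022"])
print(f"{change:.0f}% change, i.e. {abs(change):.0f}% degradation")  # -100% change, i.e. 100% degradation
```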
Insight into severity pattern
The insight summary shows a pattern of Critical or Major severity based on its daily occurrences during specific times. In the corresponding graph, the daily average health score is plotted along the Y-axis and date-time along the X-axis. Multiple highlighted zones show the pattern of daily occurrences of Critical or Major severity. The summary is generated from the date range and duration of the severity in the server's time zone, and can be viewed in the user's local time zone.
Example:
Let us take an example of a severity pattern insight for a service. In the graph, the highlighted line represents the pattern for Critical severity.
Now, let us understand how the severity pattern is derived using the daily occurrences of Critical severity, as described in the table given below. You can correlate the severity duration and the corresponding date range with the highlighted line in the graph.
Date range | Severity | Duration of severity |
---|---|---|
12/29/2022 to 01/12/2023 | Critical | 05:30 hr to 05:30 hr |
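The idea of a daily severity pattern can be illustrated with a small sketch. The algorithm BMC Helix AIOps uses is not described here, so this sketch assumes hourly granularity and treats an hour of the day as part of the pattern only if Critical severity was active during that hour on every day of the date range. The function name `recurring_hours` and the interval data are hypothetical; timestamps are assumed to be in the server's time zone, as the summary is.

```python
from datetime import date, datetime, timedelta


def recurring_hours(intervals, first_day: date, last_day: date) -> list[int]:
    """Hours of day (0-23) in which the severity was active on every day from first_day to last_day.

    intervals: list of (start, end) datetimes, assumed to be in the server's time zone.
    """
    days_by_hour: dict[int, set[date]] = {h: set() for h in range(24)}
    for start, end in intervals:
        # Step through the interval one clock hour at a time and record each (day, hour) it touches.
        t = start.replace(minute=0, second=0, microsecond=0)
        while t < end:
            days_by_hour[t.hour].add(t.date())
            t += timedelta(hours=1)
    n_days = (last_day - first_day).days + 1
    every_day = {first_day + timedelta(days=i) for i in range(n_days)}
    # An hour belongs to the pattern only if it is covered on every day of the range.
    return [h for h, days in days_by_hour.items() if every_day <= days]


# Hypothetical data: Critical from 05:00 to 06:30 on three consecutive days,
# plus a one-off Critical hour on the second day.
critical = [(datetime(2023, 1, d, 5, 0), datetime(2023, 1, d, 6, 30)) for d in (1, 2, 3)]
critical.append((datetime(2023, 1, 2, 14, 0), datetime(2023, 1, 2, 15, 0)))
print(recurring_hours(critical, date(2023, 1, 1), date(2023, 1, 3)))  # [5, 6]
```

The one-off 14:00 occurrence is excluded because it does not recur every day, which matches the idea that only repeated daily occurrences form a pattern.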
Insight into Major or Critical events
The insight summary shows an increasing trend of Critical or Major events with the daily average of event occurrences over a period, and the latest percentage increase in those events with the comparison dates. In the corresponding graph, the number of Major or Critical events is plotted along the Y-axis and date-time along the X-axis. The highlighted zone shows the recent percentage increase in events.
Example:
Let us take an example of a Critical events insight for which the summary shows an increasing trend with a daily average of 11 Critical events, and a recent increase from 15 to 18 Critical events, with the comparison dates. The highlighted zone in the graph represents the recent increase in the occurrences of Critical events.
Now, let us understand how the insights are derived using the daily occurrences of Critical events data, as described in the table given below. The recent increase in Critical events and their dates are highlighted in the table. You can correlate these values with the highlighted zone in the graph.
Date | Critical events |
---|---|
12/28/2022 | 10 |
12/29/2022 | 2 |
12/30/2022 | 4 |
12/31/2022 | 4 |
01/01/2023 | 5 |
01/02/2023 | 5 |
01/03/2023 | 6 |
01/04/2023 | 15 |
01/05/2023 | 16 |
01/06/2023 | 15 |
01/07/2023 | 15 |
01/08/2023 | 10 |
01/09/2023 | 15 |
01/10/2023 | 18 |
01/11/2023 | 18 |
Average Critical events = (Sum of Critical events)/(Number of days) = (10+2+4+4+5+5+6+15+16+15+15+10+15+18+18)/15 = 158/15 = 10.53, that is, a daily average of 11 Critical events.
The latest increase is from 15 Critical events on 01/09/2023 to 18 on 01/10/2023, that is, [(18 - 15)/15] x 100 = 20%.
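The daily average and the latest increase for the Critical events example can be reproduced from the table data. This is a minimal sketch: `daily_average` and `latest_increase` are hypothetical helpers, "latest increase" is assumed to mean the most recent pair of consecutive days in which the count rose, and the whole-number daily average is assumed to use round-half-up, since the product's rounding rule is not documented.

```python
# Daily Critical event counts copied from the table above.
critical_events = [
    ("12/28/2022", 10), ("12/29/2022", 2), ("12/30/2022", 4), ("12/31/2022", 4),
    ("01/01/2023", 5), ("01/02/2023", 5), ("01/03/2023", 6), ("01/04/2023", 15),
    ("01/05/2023", 16), ("01/06/2023", 15), ("01/07/2023", 15), ("01/08/2023", 10),
    ("01/09/2023", 15), ("01/10/2023", 18), ("01/11/2023", 18),
]


def daily_average(counts: list[int]) -> float:
    """(Sum of events) / (Number of days)."""
    return sum(counts) / len(counts)


def latest_increase(daily: list[tuple[str, int]]):
    """Most recent pair of consecutive days in which the count rose, with the % increase."""
    for i in range(len(daily) - 1, 0, -1):
        (d1, c1), (d2, c2) = daily[i - 1], daily[i]
        if c2 > c1:
            pct = (c2 - c1) / c1 * 100 if c1 else float("inf")
            return d1, c1, d2, c2, pct
    return None  # no day-over-day increase in the period


avg = daily_average([c for _, c in critical_events])
# Round half up for the displayed whole-number average (assumed rounding rule).
print(f"{avg:.2f} -> daily average = {int(avg + 0.5)} events")  # 10.53 -> daily average = 11 events
print(latest_increase(critical_events))
```

Scanning backward skips the unchanged pair (18 to 18 on the last two days) and stops at the 15 to 18 rise, which is the increase the example summary reports.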
Insight into incidents
The insight summary shows an increasing trend of incidents with the daily average of incident occurrences over a period, and the latest percentage increase in incidents with the comparison dates. In the corresponding graph, the number of incidents is plotted along the Y-axis and date-time along the X-axis. The highlighted zone shows the latest increase in the incident count. In BMC Helix IT Service Management, whenever an incident is created against a service, an information event is logged in BMC Helix Operations Management. BMC Helix AIOps uses these information events to derive incident-related insights for the respective service.
Example:
Let us take an example of an incidents insight for which the summary shows an increasing trend with a daily average of 8 incidents, and a recent increase from 13 to 15 incidents, with the comparison dates. The highlighted zone in the graph represents the recent increase in the occurrences of incidents.
Now, let us understand how the insights are derived using the daily occurrences of incidents, as described in the table given below. The recent increase in incidents and their dates are highlighted in the table. You can correlate these values with the highlighted zone in the graph.
Date | Incidents |
---|---|
12/28/2022 | 1 |
12/29/2022 | 2 |
12/30/2022 | 3 |
12/31/2022 | 4 |
01/01/2023 | 5 |
01/02/2023 | 7 |
01/03/2023 | 6 |
01/04/2023 | 7 |
01/05/2023 | 9 |
01/06/2023 | 9 |
01/07/2023 | 10 |
01/08/2023 | 11 |
01/09/2023 | 12 |
01/10/2023 | 13 |
01/11/2023 | 15 |
Average incidents = (Sum of incidents)/(Number of days) = (1+2+3+4+5+7+6+7+9+9+10+11+12+13+15)/15 = 114/15 = 7.6, that is, a daily average of 8 incidents.
The latest increase is from 13 incidents on 01/10/2023 to 15 on 01/11/2023, that is, [(15 - 13)/13] x 100 = approximately 15.4%.
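Because incident insights are derived from Incident Info (INCIDENT_INFO) class events, the daily incident counts can be thought of as a per-day count of those events for a service. This is a minimal sketch under stated assumptions: the `Event` record, its fields, the service names `svc-a` and `svc-b`, and `daily_incident_counts` are all hypothetical and do not reflect the actual BMC Helix event schema. The event stream is rebuilt from the table above so that the result can be compared with it.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical, simplified event record (not the actual BMC Helix event schema)."""
    service: str
    event_class: str
    day: str  # MM/DD/YYYY


def daily_incident_counts(events: list[Event], service: str, days: list[str]) -> list[int]:
    """Count INCIDENT_INFO class events per day for one service, in the order of `days`."""
    per_day = Counter(
        e.day for e in events if e.service == service and e.event_class == "INCIDENT_INFO"
    )
    return [per_day.get(d, 0) for d in days]


# Rebuild a hypothetical event stream that matches the incident table above.
days = ["12/28/2022", "12/29/2022", "12/30/2022", "12/31/2022", "01/01/2023",
        "01/02/2023", "01/03/2023", "01/04/2023", "01/05/2023", "01/06/2023",
        "01/07/2023", "01/08/2023", "01/09/2023", "01/10/2023", "01/11/2023"]
table_counts = [1, 2, 3, 4, 5, 7, 6, 7, 9, 9, 10, 11, 12, 13, 15]
events = [Event("svc-a", "INCIDENT_INFO", d) for d, n in zip(days, table_counts) for _ in range(n)]
# Events of other classes or other services are ignored by the count.
events += [Event("svc-a", "EVENT", days[0]), Event("svc-b", "INCIDENT_INFO", days[0])]

counts = daily_incident_counts(events, "svc-a", days)
avg = sum(counts) / len(counts)
print(f"average = {avg:.1f}, daily average = {int(avg + 0.5)} incidents")  # average = 7.6, daily average = 8 incidents

# Most recent day-over-day increase, scanning backward from the last day.
i = next((i for i in range(len(counts) - 1, 0, -1) if counts[i] > counts[i - 1]), None)
if i is not None:
    print(days[i - 1], counts[i - 1], "->", days[i], counts[i])  # 01/10/2023 13 -> 01/11/2023 15
```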
Where to go from here
Based on the health of and impact on a service, you can perform any of the following tasks:
- View CI topology for impacted services. For more information, see Identifying the impacted CI nodes from CI topology view.
- Investigate impacting events, incidents, and changes for nodes in the service hierarchy. For more information, see Investigating the service nodes from service hierarchy view.
- View health indicators for an impacted service. For more information, see Monitoring service health indicators.
- View the causal analysis for the impact. For more information, see Performing causal analysis of impacted services.