Investigating events



When you log in to BMC AMI Ops Insight, the Active Events screen is displayed.

The Active Events screen provides tools for diagnosing problems in your system. Problems are indicated as a Low or High deviation from normal.

Based on your historical data, the product builds models that reflect what is normal in your environment. Real-time or playback data is compared against these models, and a z-score is calculated for each KPI Group. A z-score measures how far a raw value is from its modeled mean, in units of standard deviation. The High deviation limit is twice as far from the mean as the Low deviation limit. For example, if the Low deviation limit is defined as ±1.5 standard deviations from the mean and the High deviation limit as ±3.0 standard deviations from the mean, then the Low deviation range is between 1.5 and 3 standard deviations from the mean, and the High deviation range is more than 3 standard deviations from the mean. The model sensitivity dictates the normal range and the ranges of the deviation levels.
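The z-score calculation and the example limits above can be sketched as follows. This is only an illustration, not the product's implementation: the function names are hypothetical, the limits are the example values of 1.5 and 3.0, and treating a z-score of exactly 1.5 as Low is an assumption.

```python
def z_score(value, mean, std_dev):
    """Distance of a raw value from the modeled mean, in standard deviation units."""
    if std_dev <= 0:
        raise ValueError("std_dev must be positive")
    return (value - mean) / std_dev


def deviation_state(z, low_limit=1.5, high_limit=3.0):
    """Map a z-score to Normal, Low, or High using the example limits.

    Assumption: the limits are symmetric around the mean, so only the
    absolute distance matters, and the Low range includes its lower bound.
    """
    distance = abs(z)
    if distance > high_limit:
        return "High"
    if distance >= low_limit:
        return "Low"
    return "Normal"
```

With a modeled mean of 100 and a standard deviation of 10, a raw value of 80 gives a z-score of -2.0, which falls in the Low deviation range under these example limits.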

You can change the sensitivity of the product so that a z-score must deviate further from the modeled mean before the state changes. See Setting the sensitivity level for more information.

If BMC AMI Ops Insight has not detected any anomalies, a summary of the monitored KPIs and KPI groups is displayed. 

Stable_State.png

If anomalies have been detected, a summary of the events is displayed at the top of the page:

Active_Events_summary.png

A tile is displayed for each sharing group and each subsystem that is in an anomaly state. The following information is displayed for each sharing group or subsystem:

Active_Event_Tile.png

1

Name of sharing group or subsystem

2

Indication whether the anomaly is in a sharing group or a subsystem

3

If the anomaly is in a subsystem, LPAR and sharing group of the subsystem

4

Time the event started

5

Elapsed time since the event started

To view events by LPAR or sharing group

Click Group by, and select LPAR or Sharing Group.

The tiles are grouped by LPAR or sharing group according to your selection. 

To investigate an event

Click Investigate in the tile of the event that you want to investigate.

The following tabs are available:

  • Probable Cause Analysis—Information that helps you identify the source of the event
  • Event Progression—Details of the event over time


The Probable Cause Analysis tab displays the following:
Probabl_Cause_Analysis_tab.png

1

Status of the Categories at the start of the event

If you select a sharing group event, only the categories that are relevant to sharing groups are displayed: Contention, I/O, and Workload.

2

KPI Groups and Exceptions that were in anomaly state at the start of the event

3

Categories of the KPI Groups

4

Type of anomaly

5

MainView view where you can see more details

Click Copy_MV_View_button.png to copy the MainView command for accessing the view.

6

Button for viewing the Event Progression tab.

7

Active events that are in the same sharing group or LPAR as this event

The Event Progression tab shows how the event developed over time. It is divided into two sections:

  • Time sequence—Sequence of KPI Groups going into anomaly and Exceptions occurring
  • Timeline—Graphic display of the performance of the KPI Groups and Exceptions

To filter the Event Progression tab

You can filter the Event Progression tab as follows:

  1. Click Filter.
  2. Select the Categories and KPI Groups that you want to see.
    If you select or deselect a Category, all of the KPI Groups in the Category are selected or deselected. If you select or deselect a KPI Group, the relevant Category is selected but the other KPI Groups in the Category are not selected.
  3. Click Apply.
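One possible reading of the selection behavior in step 2 is sketched below. All function and KPI Group names are hypothetical, and the rule that a Category counts as selected while any of its KPI Groups is selected is an assumption rather than documented behavior.

```python
def select_category(selection, categories, category, selected):
    """Selecting or deselecting a Category applies to every KPI Group in it."""
    for group in categories[category]:
        selection[group] = selected


def select_kpi_group(selection, group, selected):
    """Selecting or deselecting one KPI Group leaves the other groups untouched."""
    selection[group] = selected


def category_is_selected(selection, categories, category):
    """Assumption: a Category shows as selected while any of its KPI Groups is."""
    return any(selection.get(group, False) for group in categories[category])


# Hypothetical KPI Group names under two of the Categories named in this topic.
categories = {"Workload": ["group_a", "group_b"], "I/O": ["group_c"]}
selection = {"group_a": False, "group_b": False, "group_c": False}

select_kpi_group(selection, "group_a", True)
# Workload now counts as selected, but group_b stays deselected.
```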

Time sequence

The time sequence shows a sequence of KPI Groups going into and out of anomaly state, and Exceptions occurring.

Progression_Time_Sequence.png

1

KPI Groups going into anomaly state

KPI Groups are listed here when they go into anomaly state; they are not listed again while they stay in anomaly state. KPI Groups that stabilize and then go back into anomaly state are relisted.

Exceptions are listed every time they occur. Exceptions are indicated by a red dot.

2

Time of the anomalies

Click on the time to see that time reflected on the timeline. The selected time is indicated on the Time Sequence by a box: selected_time.png. The scroller on the timeline moves to the selected time.

3

KPI Groups stabilizing
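The listing rule for the time sequence can be pictured as recording only state changes. The sketch below assumes a hypothetical input of time-ordered samples, each giving a KPI Group and whether it is in anomaly state; it is not the product's data model, and Exceptions (which are listed every time they occur) are left out.

```python
def time_sequence(samples):
    """Return the entries that a time sequence would list.

    samples: iterable of (time, kpi_group, in_anomaly) tuples ordered by time.
    A KPI Group is listed when it enters anomaly state and when it stabilizes,
    but not while it simply remains in the same state.
    """
    previous = {}  # last known state per KPI Group; unseen groups count as stable
    entries = []
    for time, group, in_anomaly in samples:
        was_anomaly = previous.get(group, False)
        if in_anomaly and not was_anomaly:
            entries.append((time, group, "anomaly"))
        elif was_anomaly and not in_anomaly:
            entries.append((time, group, "stabilized"))
        previous[group] = in_anomaly
    return entries
```

A KPI Group that is in anomaly state at two consecutive samples produces one entry, while a group that stabilizes and later returns to anomaly state produces a second anomaly entry.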

Timeline

The timeline shows the performance of each KPI Group over time during the event.

Progression_timeline.png

1

Names of the KPI Groups and Exceptions

2

Filter for the timeline

By default, only KPI Groups that are in anomaly state are displayed. To display all KPI Groups, click the filter and select Anomaly & Stable state.

3

Timeline showing performance for the KPI Groups and Exception KPIs

  • A plain green line indicates that the KPI Group was stable.
  • A yellow or orange rectangle indicates that the KPI Group was in an anomaly state. The orange rectangles are larger to indicate a stronger deviation from normal.
  • A red dot indicates that an Exception occurred.

Note

Click the legend_tab.png tab on the right to see the legend describing what you see on the timeline.

4

Indicators of the current trend for each KPI Group

  • A green up arrow indicates that the metric for the KPI Group is rising.
  • A red down arrow indicates that the metric for the KPI Group is falling.
  • A grey dash indicates that the metric for the KPI Group is holding steady.

5

Draggable scroller to see values at a specific time on the timeline

6

Projected status for the future based on the current trends

 
