Overview of BMC AMI Ops Insight

BMC AMI Ops Insight is a forward-looking tool that helps you detect anomalies in your environment. It ingests your historical data and uses machine learning to learn the normal behavior of your systems. It then uses multivariate analysis to detect anomalies in real-time data. This approach minimizes detection time, maximizes lead time to remediation, and reduces false positives. Built-in domain knowledge and data science expertise lower the learning curve for your employees: the product tracks carefully selected key metrics and connects the dots, so you don’t have to spend time configuring the solution and navigating through trial and error.

Goals

The product's goals are:

To reduce the mean time to detection for problems that occur in your mainframe environment and maximize the lead time to remediate them
To help you move from a reactive mode to a proactive mode by notifying you about an impending problem before it happens
To optimize the cost of detecting problems by monitoring only key metrics, which are identified using our data science and domain expertise so that the heavy lifting is done for you

To achieve these goals, the product seeks to detect anomalies as soon as they manifest, so that it can predict their behavior and project their impact.

Process

The following is the overall process:

We build algorithms using our domain expertise to identify key performance indicators (KPIs) and groups of connected KPIs that can indicate problems. This minimizes the cost for you because we monitor only the KPIs that are relevant.
You bring your historical SMF 100 data so the product can train models that identify normal levels for these KPIs in your environment. This means that the models are not generic out-of-the-box models that may or may not be relevant to you. The models you use are customized to your environment.
The product then uses multivariate analysis to score your real-time data, comparing it with the data in the "normal" models to detect exceptions.
When the product detects anomalies, it looks for trends so it can report that you are experiencing a problem or project that you are about to experience one. Reporting trends rather than individual anomalies maximizes accuracy and minimizes false positives.
Note

In some cases, for especially sensitive KPIs, individual exceptions are called out as soon as they show an anomaly.
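
The combination of multivariate scoring and trend reporting described above can be illustrated with a minimal sketch. The product's actual algorithms are not described here, so this example substitutes a common technique, the squared Mahalanobis distance, applied to two hypothetical KPIs. The function names, sample values, threshold, and run length are all assumptions made for illustration only.

```python
from statistics import mean


def fit_baseline(history):
    """Learn a 'normal' model (means and 2x2 covariance) from historical
    (kpi_a, kpi_b) pairs. Hypothetical stand-in for model training."""
    xs = [p[0] for p in history]
    ys = [p[1] for p in history]
    mx, my = mean(xs), mean(ys)
    n = len(history)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in history) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def score(point, baseline):
    """Squared Mahalanobis distance of one interval from the baseline.
    Larger values mean the KPI combination is less like normal behavior."""
    (mx, my), (sxx, sxy, syy) = baseline
    dx, dy = point[0] - mx, point[1] - my
    det = sxx * syy - sxy * sxy  # determinant of the covariance matrix
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det


def detect_trend(scores, threshold, run_length):
    """Report a problem only when run_length consecutive intervals exceed
    the threshold, rather than on a single anomalous interval."""
    run = 0
    for s in scores:
        run = run + 1 if s > threshold else 0
        if run >= run_length:
            return True
    return False


# Illustrative data: two KPIs that normally rise and fall together.
history = [(10, 20), (12, 25), (14, 27), (16, 33), (18, 35)]
baseline = fit_baseline(history)

# Hypothetical real-time intervals; threshold and run length are assumed values.
realtime = [(16, 32), (18, 20), (18, 21), (17, 20)]
scores = [score(p, baseline) for p in realtime]
print(detect_trend(scores, threshold=9.0, run_length=3))
```

In this sample, the point (18, 20) lies within the range seen for each KPI individually, but the combination departs from the learned relationship between the two, which is the kind of exception a per-KPI check would miss.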

Data source

The product uses SMF records as source data (currently only SMF 100 records). SMF 100 records contain hundreds of Db2 system-level statistics cut at one-minute intervals. These statistics are overall system-level indicators based on the resource usage on the system during the last interval, which makes them a meaningful measure of system-level activity. The product uses a selected subset of these statistics (KPIs), grouped by area of measurement, including CPU time, storage, contention, and throughput.

Components and flow

The data flow is similar for the training and scoring processes.

This diagram illustrates how data proceeds through the product's components:


Training

Historical data is ingested into the data preparation address space.
After the data is prepared, it is passed to AMI Manager via the REST interface.
AMI Manager generates a set of models based on the historical data.
The models are stored in the database and are available via the browser-based UI for scoring.

Scoring

Real-time data is collected by BMC AMI Defender (a separate license for BMC AMI Defender is not required).
The data is then ingested into the data preparation address space.
After the data is prepared, it is passed to AMI Manager via the REST interface.
AMI Manager evaluates the data against a current model.
The results are stored in the database and displayed on the browser-based UI.
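
To show how the training and scoring flows share the same preparation step and differ only in what AMI Manager does with the prepared data, here is a rough sketch. It uses in-memory stand-ins only: every function name, the record shape, and the trivial min/max "model" are hypothetical and do not reflect the product's REST interface, database, or modeling method.

```python
# Hypothetical in-memory stand-in for the product database.
database = {"models": [], "results": []}


def prepare(records):
    """Stand-in for the data preparation address space: here it simply
    drops records with no value (an assumed form of preparation)."""
    return [r for r in records if r.get("value") is not None]


def manager_train(prepared):
    """Stand-in for AMI Manager generating a model from historical data.
    The 'model' is just the observed min/max range, for illustration."""
    values = [r["value"] for r in prepared]
    model = {"low": min(values), "high": max(values)}
    database["models"].append(model)  # models are stored in the database
    return model


def manager_score(prepared):
    """Stand-in for AMI Manager evaluating data against the current model."""
    model = database["models"][-1]
    results = [
        {"value": r["value"],
         "anomaly": not (model["low"] <= r["value"] <= model["high"])}
        for r in prepared
    ]
    database["results"].extend(results)  # results are stored in the database
    return results


def run_training(historical):
    # Training: ingest -> prepare -> hand off to the manager -> store model
    return manager_train(prepare(historical))


def run_scoring(realtime):
    # Scoring: collect -> prepare -> hand off to the manager -> store results
    return manager_score(prepare(realtime))


run_training([{"value": 1}, {"value": 5}, {"value": None}])
print(run_scoring([{"value": 3}, {"value": 9}]))
```

Both entry points call the same `prepare` function, mirroring the shared data preparation step in the two flows above.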