Guidelines to configure BMC ProactiveNet to perform Probable Cause Analysis


To maximize the capabilities of BMC ProactiveNet Probable Cause Analysis, configure your BMC ProactiveNet system as recommended by the following guidelines.

Managing KPIs

BMC strongly recommends that you thoroughly review your deployment to set appropriate Key Performance Indicators (KPIs) for your environment.

Because baselines and abnormalities are generated automatically for KPIs, choosing the correct set of KPIs directly improves the results of Probable Cause Analysis. For information about KPIs, see Viewing-the-Details-notebook.

Configuring complete monitor coverage

You must configure complete monitor coverage for your entire infrastructure, because complete coverage is key to accurate Probable Cause Analysis. Complete monitor coverage includes monitors at the system, application, and network levels. If coverage is insufficient, Probable Cause Analysis reports that no events are correlated. If monitors do not cover all dependencies in your infrastructure, it is almost impossible to drill down to the granular cause of a problem.
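A quick way to audit coverage is to check two things: whether each of the three levels has at least one monitor, and whether every CI has at least one monitor. The following is a minimal sketch under stated assumptions: the CI names, monitor names, and the (monitor, CI, level) tuple layout are hypothetical and are not a BMC ProactiveNet export format.

```python
# Illustrative coverage check. CI names, monitor names, and the
# (monitor, ci, level) tuple layout are hypothetical, not a
# BMC ProactiveNet export format.
REQUIRED_LEVELS = {"system", "application", "network"}


def coverage_report(cis, monitors):
    """Report monitoring levels with no monitors and CIs with no monitors.

    cis: iterable of CI names that make up the infrastructure.
    monitors: iterable of (monitor_name, ci_name, level) tuples.
    """
    monitors = list(monitors)
    covered_levels = {level for _, _, level in monitors}
    monitored_cis = {ci for _, ci, _ in monitors}
    return {
        "missing_levels": REQUIRED_LEVELS - covered_levels,
        "unmonitored_cis": set(cis) - monitored_cis,
    }


example_cis = ["web-01", "db-01", "switch-01"]
example_monitors = [
    ("cpu-web-01", "web-01", "system"),
    ("http-web-01", "web-01", "application"),
    ("cpu-db-01", "db-01", "system"),
]
report = coverage_report(example_cis, example_monitors)
print("Levels with no monitors:", sorted(report["missing_levels"]))
print("CIs with no monitors:", sorted(report["unmonitored_cis"]))
```

In this example, the missing network-level monitor and the unmonitored switch are the kind of gap that would leave a network problem uncorrelated.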

Creating service models for interdependent CIs

You must create service models for CIs that have dependencies on one another. You can create service models using the BMC ProactiveNet Administration Console or BMC Impact Model Designer. For information, see Working-with-BMC-ProactiveNet-Infrastructure-Management and Building-a-service-model-in-BMC-Impact-Model-Designer.
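A service model can be thought of as a directed dependency graph. The sketch below assumes a hypothetical in-memory model with made-up CI names and walks the graph breadth-first to list every CI that a given CI depends on, directly or transitively; those upstream CIs are where a cause for its problem would typically be sought. It only illustrates why the relationships matter and is not how BMC ProactiveNet stores or traverses service models.

```python
from collections import deque

# Hypothetical service model: each CI maps to the CIs it depends on.
# In BMC ProactiveNet the relationships come from the service model you
# build in the Administration Console or BMC Impact Model Designer.
SERVICE_MODEL = {
    "online-store": ["web-01", "app-01"],
    "web-01": ["switch-01"],
    "app-01": ["db-01", "switch-01"],
    "db-01": ["storage-01"],
    "switch-01": [],
    "storage-01": [],
}


def upstream_cis(model, ci):
    """Return every CI that `ci` depends on, directly or transitively (BFS)."""
    seen = set()
    queue = deque(model.get(ci, []))
    while queue:
        dep = queue.popleft()
        if dep not in seen:
            seen.add(dep)
            queue.extend(model.get(dep, []))
    return seen


print(sorted(upstream_cis(SERVICE_MODEL, "app-01")))
# ['db-01', 'storage-01', 'switch-01']
```

Without the relationship from app-01 to db-01, a storage problem two hops away would never be reachable from the application CI.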

Associating monitors with the proper CI

When you build a service model and use the Associate Monitors feature in the Administration Console, ensure that you associate each monitor with the proper CI. For information about associating monitors with CIs, see Associate-monitors-with-CIs-through-the-BMC-ProactiveNet-Administration-Console.

Setting a polling interval of 5 minutes or less

BMC recommends using the default polling frequency. However, if you decide to change the polling frequency, be aware that the polling frequency affects Probable Cause Analysis.

Probable Cause Analysis does not display events that fall outside the specified time frame before or after the result event. For example, if the default time correlation filter (one hour before the event and 30 minutes after it) is applied and the polling interval is 45 minutes, the 90-minute window contains only about two data points, which is too few to reliably pinpoint the probable cause.

A greater number of data points increases the likelihood that the Probable Cause Analysis process succeeds. Therefore, it is best to configure the smallest polling interval that does not degrade the performance of the monitored device.
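The arithmetic behind the 45-minute example can be sketched as follows. The function name is hypothetical; it assumes polls occur at a fixed interval, in which case any window of W minutes contains at least floor(W / interval) polls. The default window lengths are taken from the example above.

```python
DEFAULT_BEFORE_MIN = 60  # default filter: one hour before the event
DEFAULT_AFTER_MIN = 30   # default filter: 30 minutes after the event


def min_samples_in_window(poll_interval_min,
                          before_min=DEFAULT_BEFORE_MIN,
                          after_min=DEFAULT_AFTER_MIN):
    """Guaranteed minimum number of polls inside the correlation window,
    assuming polls occur every poll_interval_min minutes."""
    if poll_interval_min <= 0:
        raise ValueError("polling interval must be positive")
    return (before_min + after_min) // poll_interval_min


for interval in (5, 15, 45):
    print(f"{interval:>2}-minute polling: at least "
          f"{min_samples_in_window(interval)} data points")
# 5 -> 18, 15 -> 6, 45 -> 2
```

Moving from a 45-minute to a 5-minute interval raises the guaranteed sample count in the default window from 2 to 18.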

Establishing a reliable baseline

To ensure a reliable baseline for a monitor, the monitor must have been collecting data consistently for at least a week. The longer the monitor has been collecting data, the more reliable the Probable Cause Analysis process becomes.
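The one-week guideline can be expressed as a simple readiness check. This is a minimal sketch, not a BMC ProactiveNet feature: the function is hypothetical, and the `min_ratio` tolerance for missed polls is an assumed value used only to model "consistently".

```python
from datetime import datetime, timedelta

# "At least a week" of consistent collection, per the guideline above.
MIN_HISTORY = timedelta(days=7)


def baseline_ready(first_sample, now, samples_collected, poll_interval,
                   min_ratio=0.9):
    """Return True if a monitor has a week of history and few missed polls.

    min_ratio is a hypothetical tolerance for missed polls; it is not a
    documented BMC ProactiveNet setting.
    """
    history = now - first_sample
    if history < MIN_HISTORY:
        return False
    expected = history // poll_interval  # timedelta // timedelta -> int
    return samples_collected >= min_ratio * expected


start = datetime(2024, 1, 1)
five_min = timedelta(minutes=5)
print(timedelta(days=7) // five_min)  # 2016 polls expected in one week
print(baseline_ready(start, start + timedelta(days=7), 2000, five_min))  # True
print(baseline_ready(start, start + timedelta(days=4), 1150, five_min))  # False: under a week
print(baseline_ready(start, start + timedelta(days=7), 1000, five_min))  # False: too many gaps
```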

The following types of threshold settings impact the Probable Cause Analysis process:

  • Absolute/Signature event thresholds
    The duration of absolute and signature event thresholds is critical to correlating events that occur close in time to the original event. An event is not generated until the condition has persisted for the duration specified in the threshold definition. If this duration is too long and you perform Probable Cause Analysis as soon as the problem event is created, the threshold event might not exist yet and therefore is not displayed.

    For example, suppose you create a threshold that generates a critical event when server CPU usage is above 90% (condition) for 1 hour (duration). If CPU usage rises above 90% at 10:00 A.M., an event is generated at 11:00 A.M. only if CPU usage stays above 90% until 11:00 A.M.

    If another event is created at 10:15 A.M. because of the high CPU consumption and you perform Probable Cause Analysis on that 10:15 A.M. event, the CPU usage problem is not displayed as a probable cause event, because the 1-hour duration has not yet elapsed and the CPU threshold event has not been generated.
  • Abnormality event thresholds
    BMC recommends that you retain the default settings for these thresholds.
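The duration behavior of absolute thresholds can be modeled with a short simulation. This is a minimal sketch that assumes the threshold fires on the first poll at which the value has stayed above the limit for the full duration, and that CPU is polled every 15 minutes; neither the function nor the polling schedule comes from BMC ProactiveNet. It reproduces the CPU example: the event appears at 11:00 A.M., and at 10:15 A.M. no event exists yet.

```python
from datetime import datetime, timedelta


def threshold_event_time(samples, limit, duration):
    """Return when an absolute threshold event would fire, or None.

    samples: (timestamp, value) pairs in time order.
    The event fires once the value has stayed above `limit` continuously
    for at least `duration`; any sample at or below the limit resets it.
    """
    breach_start = None
    for ts, value in samples:
        if value > limit:
            if breach_start is None:
                breach_start = ts
            if ts - breach_start >= duration:
                return ts
        else:
            breach_start = None
    return None


start = datetime(2024, 1, 1, 10, 0)  # illustrative date, 10:00 A.M.
samples = [(start + timedelta(minutes=15 * i), 95.0) for i in range(5)]  # 10:00-11:00
one_hour = timedelta(hours=1)

print(threshold_event_time(samples, 90.0, one_hour))  # 2024-01-01 11:00:00

# Analysis performed at 10:15 A.M. only sees samples up to that time.
analysis_time = start + timedelta(minutes=15)
seen = [s for s in samples if s[0] <= analysis_time]
print(threshold_event_time(seen, 90.0, one_hour))     # None
```

Shortening the duration makes the threshold event available sooner, which is why long durations can hide a cause from an early analysis.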

Ensuring that subcategories are defined for external events

For external events to be analyzed based on global relationships, set the mc_event_subcategory slot for each external event. For information about the mc_event_subcategory slot, see MC_EVENT_SUBCATEGORY-enumeration.
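Before forwarding external events, it can help to flag any that lack the slot. The sketch below represents events as hypothetical slot-name-to-value dictionaries; only `mc_event_subcategory` is taken from this guideline, the other slot values are illustrative, and `EXAMPLE_SUBCATEGORY` is a placeholder rather than a real member of the MC_EVENT_SUBCATEGORY enumeration.

```python
# Hypothetical external events as slot-name -> value mappings.
# "EXAMPLE_SUBCATEGORY" is a placeholder; use a value from the
# MC_EVENT_SUBCATEGORY enumeration instead.
external_events = [
    {"mc_host": "db-01", "msg": "Disk latency high",
     "mc_event_subcategory": "EXAMPLE_SUBCATEGORY"},
    {"mc_host": "web-01", "msg": "HTTP 500 rate high"},
    {"mc_host": "app-01", "msg": "Heap usage high",
     "mc_event_subcategory": ""},
]


def events_missing_subcategory(events):
    """Return the events whose mc_event_subcategory slot is absent or empty."""
    return [event for event in events if not event.get("mc_event_subcategory")]


for event in events_missing_subcategory(external_events):
    print(f"{event['mc_host']}: '{event['msg']}' has no mc_event_subcategory")
```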

Saving recurring conditions as a known probable cause

When you perform Probable Cause Analysis on an event and find a pattern that you want to reuse in the future, capture it by creating a knowledge pattern. Once a knowledge pattern is available, BMC ProactiveNet immediately applies this pattern to similar conditions and does not perform Probable Cause Analysis all over again.
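Conceptually, a knowledge pattern works like a cache: analyze a condition once, save the result, and reuse it for later matching conditions. The class below is a hypothetical sketch of that idea only; its name, methods, and the tuple used as a condition signature are invented for illustration and do not reflect how BMC ProactiveNet stores or matches knowledge patterns.

```python
from typing import Callable, Dict, Hashable


class KnowledgePatternStore:
    """Hypothetical sketch: reuse a saved cause instead of re-analyzing."""

    def __init__(self, analyze: Callable[[Hashable], str]):
        self._analyze = analyze          # stands in for a full analysis run
        self._patterns: Dict[Hashable, str] = {}
        self.analyses_run = 0

    def save_pattern(self, signature: Hashable, cause: str) -> None:
        """Capture a known cause for a recurring condition."""
        self._patterns[signature] = cause

    def probable_cause(self, signature: Hashable) -> str:
        if signature in self._patterns:
            return self._patterns[signature]  # known pattern: no new analysis
        self.analyses_run += 1
        return self._analyze(signature)


store = KnowledgePatternStore(lambda sig: f"analyzed cause for {sig}")
condition = ("app-01", "response time high")
store.probable_cause(condition)               # runs the analysis once
store.save_pattern(condition, "db-01 disk latency")
print(store.probable_cause(condition))        # db-01 disk latency
print(store.analyses_run)                     # 1
```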

 


BMC ProactiveNet 9.6