Best practice approach to infrastructure monitoring administration


As a Solution Administrator or Tenant Administrator, you need to configure and administer the monitoring of your infrastructure by setting up data collection and monitoring for IT operations.

To set up monitoring, create monitors incrementally in a development or testing environment (Phase I) before moving them to a production environment (Phase II). You can set up your environments as described in the following illustration.

Figure: Recommended environment setup (ima_bestpractice_approach.png)

Note

If you have separate environments for development and testing, you must configure the environments in three phases:

Phase I: In development environment

Phase II: In testing environment

Phase III: In production environment

Refer to the following Phase I and Phase II sections for steps to transition from one phase to the next.

Before you begin

Guidelines

Remember the following guidelines while performing the Phase I and II workflows:

  • Implement the configuration settings in the order outlined in this topic according to the process workflow. You can deviate from this best practice if you want; however, it is easier to understand the implementation and stay organized if you follow the process provided, especially during the initial implementation.
  • Create, test, and validate before moving to production. Do not edit in production unless you find a problem in production that requires editing.
  • Start with a small number of Agents in production to minimize risk.
  • Monitoring is not applied to policy-managed Agents until the policies in production are enabled. This is the point where you “go live” in production with monitoring. Backing out before this step is easy. Backing out after this step can be difficult, depending upon the situation. 

Phase I: In development or testing environment

The following process diagram describes the recommended workflow in a development or testing environment:

1. Define staging policies

Create separate staging policies for the Infrastructure Management development, test, and production environments that are assigned to the appropriate Integration Service instances.

Develop a clear strategy for assigning the PATROL Agents to each Integration Service. The Infrastructure Management server does not auto-balance the load between PATROL Agents and Integration Services, so the initial assignment is important.

At least one Integration Service must exist per network. To avoid overloading any one Integration Service, define an assignment convention based on name or function, or simply assign Agents round-robin within a network.
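
The round-robin option can be sketched as follows. This is a minimal illustration only, not a product API: the function name, the network labels, and the host names are all hypothetical, and the result would still have to be applied through the staging policies described above.

```python
from itertools import cycle


def assign_round_robin(agents, services_by_network):
    """Assign each Agent to an Integration Service in its own network.

    agents: list of (agent_name, network) tuples (hypothetical inventory).
    services_by_network: dict mapping a network label to the list of
    Integration Service names available in that network.
    """
    # One independent rotation per network, so Agents never cross networks.
    rotations = {
        network: cycle(services)
        for network, services in services_by_network.items()
        if services
    }
    assignment = {}
    for agent, network in agents:
        if network not in rotations:
            # At least one Integration Service must exist per network.
            raise ValueError(f"No Integration Service defined for network {network!r}")
        assignment[agent] = next(rotations[network])
    return assignment


agents = [("a1", "dmz"), ("a2", "dmz"), ("a3", "dmz"), ("b1", "lan")]
services = {"dmz": ["is-dmz-1", "is-dmz-2"], "lan": ["is-lan-1"]}
plan = assign_round_robin(agents, services)
```

Because the server does not rebalance the load later, a deterministic plan like this is worth recording before any Agent is connected.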

 

2. Create and deploy packages to PATROL Agents

You can deploy PATROL Agents either manually or by using your existing distribution tool.

Best practices
  • Connect a subset of the PATROL Agents to the respective Integration Services.
    • Agents must be connected to the Integration Service by setting the configuration variables on the Agent when you add the Agent package. This enables the Agents to automatically initiate the connection to the Integration Service.
    • Starting with a subset of Agents is important to avoid overloading the Infrastructure Management server or Integration Service.
  • BMC recommends that you deploy between 50 and 100 PATROL Agents at one time. Repeat this until the maximum recommended number of Agents is added to the Integration Service (per the scaling guidelines).
  1. Import the Infrastructure Management PATROL Repository and create a deployable package for PATROL Agents and monitoring solutions or Knowledge Modules.

  2. Deploy and install the package on the PATROL Agents.

    Note

    If previous versions of PATROL Agents already exist, configure the packages to install into the same directory as the existing PATROL Agent.

  3. Verify that the PATROL package installations are successful and validate and test the PATROL Agents.
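
The batching guidance in the best practices above (50 to 100 Agents at a time, up to the limit for one Integration Service) can be planned with a short script. This is a sketch under stated assumptions: `plan_batches` is a hypothetical helper, and `max_per_service` is a value you take from the scaling guidelines, not a figure defined in this topic.

```python
def plan_batches(agents, batch_size, max_per_service):
    """Split Agents into deployment batches for one Integration Service.

    Returns (batches, deferred). Agents beyond max_per_service are deferred
    so that they can be assigned to a different Integration Service.
    """
    if not 50 <= batch_size <= 100:
        raise ValueError("batch_size should be between 50 and 100")
    if max_per_service < 0:
        raise ValueError("max_per_service must not be negative")
    eligible = agents[:max_per_service]
    deferred = agents[max_per_service:]
    batches = [
        eligible[start:start + batch_size]
        for start in range(0, len(eligible), batch_size)
    ]
    return batches, deferred


# Hypothetical inventory of 230 Agents and an assumed limit of 200.
agents = [f"agent-{n:03d}" for n in range(230)]
batches, deferred = plan_batches(agents, batch_size=100, max_per_service=200)
```

Deploy one batch, verify the installations, and only then move on to the next batch.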

 

3. Configure global thresholds

Configure global server thresholds for the monitoring solutions.

Best practice

Identify and configure the thresholds that need to be set at the PATROL Agent level and at the monitoring solution level. For more information, see Configuring global thresholds.

 

 

4. Define monitoring policies

Best practices
  • Ensure that only the required monitor instances are discovered. Each monitoring solution might offer different options for controlling this discovery.
  • Disable discovery of instances that are short-lived (for example, instances that are created and then deleted within the span of one to two days).
  • Ensure that the monitoring solutions that are used for data collection are preloaded.
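
To find candidates for the short-lived rule above, you can review an inventory of discovered instances offline. The following sketch assumes a hypothetical export in which each instance has a creation time and, if it no longer exists, a deletion time. The field names are illustrative and do not correspond to a product data format.

```python
from datetime import datetime, timedelta


def short_lived_instances(instances, max_lifetime=timedelta(days=2)):
    """Return names of instances that were deleted within max_lifetime.

    instances: list of dicts with 'name', 'created', and 'deleted' keys;
    'deleted' is None for instances that still exist.
    """
    return [
        inst["name"]
        for inst in instances
        if inst["deleted"] is not None
        and inst["deleted"] - inst["created"] <= max_lifetime
    ]


inventory = [
    {"name": "tmp_fs_01", "created": datetime(2024, 1, 1, 8), "deleted": datetime(2024, 1, 2, 8)},
    {"name": "data_fs", "created": datetime(2024, 1, 1, 8), "deleted": None},
    {"name": "old_fs", "created": datetime(2024, 1, 1, 8), "deleted": datetime(2024, 2, 1, 8)},
]
candidates = short_lived_instances(inventory)
```

Instances that the script reports are candidates for exclusion through the discovery options of the relevant monitoring solution.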

 

5. (Optional) Define time frames and blackout policies

If you are creating blackout policies, first create time frames for the monitoring solutions, and then create the blackout policies that use those time frames.
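
The relationship between time frames and blackout policies can be modeled as follows. This is a conceptual sketch only: the class names are hypothetical, and it assumes weekly windows that do not cross midnight, which is a simplification and not a statement about how the product stores time frames.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass(frozen=True)
class TimeFrame:
    name: str
    weekdays: frozenset  # 0 = Monday ... 6 = Sunday
    start: time          # inclusive
    end: time            # exclusive; assumed to be later than start

    def contains(self, moment: datetime) -> bool:
        return (
            moment.weekday() in self.weekdays
            and self.start <= moment.time() < self.end
        )


@dataclass(frozen=True)
class BlackoutPolicy:
    name: str
    time_frames: tuple  # time frames are defined first, then referenced here

    def is_active(self, moment: datetime) -> bool:
        return any(frame.contains(moment) for frame in self.time_frames)


weekend_window = TimeFrame("weekend-maintenance", frozenset({5, 6}), time(1, 0), time(3, 0))
policy = BlackoutPolicy("db-maintenance-blackout", (weekend_window,))

saturday_night = datetime(2024, 1, 6, 2, 0)  # a Saturday
monday_night = datetime(2024, 1, 8, 2, 0)    # a Monday
```

Here `policy.is_active(saturday_night)` is true and `policy.is_active(monday_night)` is false, which mirrors why the time frame has to exist before the policy that refers to it.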


7. Test and validate the collected data

Test and validate that the data is collected according to the policies that you defined. Resolve issues, if any.

You can view the status of the applied policies on the PATROL Agents as shown in the following image:

bpa_mdpage.png

You can also view the performance data, events, and devices from the TrueSight console.

 

Phase II: In production environment

After you have validated and tested the collected data, you can move to the production environment. The following process diagram describes the recommended workflow in a production environment:

1. Move policies to production servers

Move the validated policies from test to production by using the export and import utility. This utility can be used only for blackout and monitoring policies. You must manually define the staging policies on the production server.

 

2. Deploy a subset of packages

Deploy and install a subset of the deployable packages on the production servers. 

  1. Import the Infrastructure Management PATROL Repository and create a deployable package for PATROL Agents and monitoring solutions or Knowledge Modules.
  2. Deploy and install the package on the PATROL Agents on the production managed servers.

 

4. Enable the policies

Enable the policies in the production environment. Validate the PATROL Agents and data collection in production. Resolve any issues.


5. Configure global thresholds

Configure global server thresholds for the monitoring solutions. Global thresholds are not automatically moved or migrated to the production environment.

Best practice

Identify and configure the thresholds that need to be set at the PATROL Agent level and at the monitoring solution level. For more information, see Configuring global thresholds.

 

 

7. Monitor and tune the performance

After each batch of PATROL Agents and Integration Services is deployed and configured, verify that the Infrastructure Management server and Integration Services are performing well and can still manage the load before you deploy the next batch.

Ensure that the scalability limitations of the Integration Services are not exceeded.
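
A simple check between batches is to compare the number of connected Agents per Integration Service with the limit from the scaling guidelines. The sketch below assumes that you already have these counts from your own inventory. The function name, the service names, and the limit of 500 are hypothetical and used for illustration only.

```python
def services_over_limit(agent_counts, max_agents_per_service):
    """Return {service: excess} for services above the allowed Agent count.

    agent_counts: dict mapping an Integration Service name to the number
    of PATROL Agents currently connected to it.
    """
    return {
        service: count - max_agents_per_service
        for service, count in agent_counts.items()
        if count > max_agents_per_service
    }


# Hypothetical counts and an assumed limit; use the value from the scaling guidelines.
counts = {"is-dmz-1": 480, "is-dmz-2": 520, "is-lan-1": 500}
excess = services_over_limit(counts, max_agents_per_service=500)
```

An empty result means that no Integration Service exceeds the assumed limit. Any service that is reported should be relieved before the next batch is deployed.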

 

 

Additional monitoring sources and capabilities

Define manual application models based on groups and devices, or implement BMC TrueSight App Visibility Manager to enable automatic application models. With these models, you can monitor the performance and health of active or synthetic applications, perform diagnostics, and trace application transactions.


Use third-party adapters to provide a mechanism for external applications to funnel data into Infrastructure Management. Data adapters facilitate the synchronization of performance data collected by specific monitoring solutions into Infrastructure Management for further analysis.

Define impact service models to monitor when and how higher-level entities (such as applications, technical services, business services, and organizations) are impacted by conditions that affect lower-level IT infrastructure entities (such as servers, network devices, and application systems).

See also Integrating.