Best practice approach to infrastructure monitoring administration
As a Solution Administrator or Tenant Administrator, you need to configure and administer the monitoring of your infrastructure by setting up data collection and monitoring for IT operations.
To set up the monitors, create them incrementally in a development or testing environment (Phase I) before you move them to a production environment (Phase II). You can set up your environments as described in the following illustration.
Note
If you have separate environments for development and testing, you must configure the environments in three phases:
Phase I: In development environment
Phase II: In testing environment
Phase III: In production environment
Refer to the following Phase I and Phase II sections for steps to transition from one phase to the next.
Before you begin
Guidelines
Remember the following guidelines while performing the Phase I and II workflows:
- Implement the configuration settings in the order outlined in this topic according to the process workflow. You can deviate from this best practice if you want; however, it is easier to understand the implementation and stay organized if you follow the process provided, especially during the initial implementation.
- Create, test, and validate before moving to production. Do not edit in production unless you find a problem in production that requires editing.
- Start with a small number of Agents in production to minimize risk.
- Monitoring is not applied to policy-managed Agents until the policies in production are enabled. This is the point where you “go live” in production with monitoring. Backing out before this step is easy. Backing out after this step can be difficult, depending upon the situation.
Phase I: In development or testing environment
The following process diagram describes the recommended workflow in a development or testing environment:
1. Define staging policies
Create separate staging policies for the Infrastructure Management development, test, and production environments that are assigned to the appropriate Integration Service instances.
Develop a clear strategy for assigning the PATROL Agents to each Integration Service. The Infrastructure Management server does not auto-balance the load between PATROL Agents and Integration Services, so the initial assignment is important.
At least one Integration Service must exist per network. However, to avoid overloading any one Integration Service, you can define a convention based on name or function, or simply use round-robin assignment within a network.
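The round-robin convention mentioned above can be sketched as follows. This is a minimal illustration and not a BMC utility: the function name `assign_round_robin`, the host names, and the Integration Service names are all hypothetical, and the sketch assumes that you already know which network each Agent belongs to.

```python
from collections import defaultdict
from itertools import cycle


def assign_round_robin(agents, services_by_network):
    """Assign each Agent to an Integration Service in its own network.

    agents: list of (agent_name, network) tuples (hypothetical inventory format).
    services_by_network: dict mapping network -> list of Integration Service names.
    Returns a dict mapping Integration Service name -> list of Agent names.
    """
    assignment = defaultdict(list)
    # One independent rotation per network, so Agents never cross networks.
    rotations = {
        net: cycle(svcs) for net, svcs in services_by_network.items() if svcs
    }
    for agent_name, network in agents:
        if network not in rotations:
            raise ValueError(f"No Integration Service defined for network {network!r}")
        assignment[next(rotations[network])].append(agent_name)
    return dict(assignment)


# Hypothetical inventory: three Agents in one network, one in another.
example = assign_round_robin(
    [("host-a", "dmz"), ("host-b", "dmz"), ("host-c", "dmz"), ("host-d", "lan")],
    {"dmz": ["is-dmz-1", "is-dmz-2"], "lan": ["is-lan-1"]},
)
```

A name-based or function-based convention would replace only the rotation step; the per-network grouping stays the same.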
2. Create and deploy packages to PATROL Agents
You can deploy PATROL Agents either manually or with the distribution tool that you already use.
Best practices
- Connect a subset of the PATROL Agents to the respective Integration Services.
- Agents must be connected to the Integration Service by setting the configuration variables on the Agent when you add the Agent package. This enables the Agents to automatically initiate the connection to the Integration Service.
- Starting with a subset of Agents is important to avoid overloading the Infrastructure Management server or Integration Service.
- BMC recommends that you deploy 50 to 100 PATROL Agents at one time. Repeat this until the maximum recommended number of Agents is added to the Integration Service (per the scaling guidelines).
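The batching guidance above can be sketched as a small planning helper. This is an illustrative sketch only: `plan_batches` is a hypothetical name, and the per-service cap is a parameter that you would take from the scaling guidelines, since no specific limit is stated here.

```python
def plan_batches(agents, batch_size=50, max_agents_per_service=None):
    """Split a list of Agent names into deployment batches.

    batch_size: Agents per batch; the guidance above suggests 50 to 100.
    max_agents_per_service: optional cap taken from your scaling guidelines
        (no default is assumed); Agents beyond the cap are returned as leftover.
    Returns (batches, leftover).
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    if max_agents_per_service is None:
        planned, leftover = list(agents), []
    else:
        planned = list(agents[:max_agents_per_service])
        leftover = list(agents[max_agents_per_service:])
    batches = [planned[i:i + batch_size] for i in range(0, len(planned), batch_size)]
    return batches, leftover


# Hypothetical example: 230 Agents, batches of 100, cap of 200 for this service.
agent_names = [f"agent-{i}" for i in range(230)]
batches, leftover = plan_batches(agent_names, batch_size=100, max_agents_per_service=200)
```

The leftover Agents are candidates for a different Integration Service rather than a further batch on the same one.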
Import the Infrastructure Management PATROL Repository and create a deployable package for PATROL Agents and monitoring solutions or Knowledge Modules.
Deploy and install the package on the PATROL Agents.
Note
If previous versions of PATROL Agents already exist, the packages must be configured to be installed into the same directory as the existing PATROL Agent.
Verify that the PATROL package installations are successful, and then validate and test the PATROL Agents.
3. Configure global thresholds
Configure global server thresholds for the monitoring solutions.
Best practice
Identify and configure the thresholds that need to be set at the PATROL Agent level and at the monitoring solution level. For more information, see Configuring global thresholds.
4. Define monitoring policies
Create monitoring policies to be tested in the development environment.
Best practices
- Ensure that only the required monitor instances are discovered. Each monitoring solution may offer different options for controlling this discovery.
- Disable discovery of instances that are short-lived (for example, instances that are created and then deleted within the span of one to two days).
- Ensure that the monitoring solutions that are used for data collection are preloaded.
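To decide which instances count as short-lived, you could review an inventory of discovered instances before you write the discovery filters. The following is a minimal sketch under stated assumptions: the record format (`name`, `created`, `deleted`) and the function name are hypothetical, and the two-day threshold simply mirrors the example in the guidance above.

```python
from datetime import datetime, timedelta


def short_lived_instances(instances, max_lifetime=timedelta(days=2)):
    """Return names of instances whose observed lifetime is within max_lifetime.

    instances: list of dicts with 'name', 'created', and 'deleted' keys
        (hypothetical inventory format); 'deleted' is None if still present.
    """
    flagged = []
    for inst in instances:
        if inst["deleted"] is None:
            continue  # still present, so not known to be short-lived
        if inst["deleted"] - inst["created"] <= max_lifetime:
            flagged.append(inst["name"])
    return flagged


# Hypothetical inventory records.
inventory = [
    {"name": "tmp-job-1", "created": datetime(2024, 1, 1), "deleted": datetime(2024, 1, 2)},
    {"name": "db-main", "created": datetime(2024, 1, 1), "deleted": None},
    {"name": "old-fs", "created": datetime(2024, 1, 1), "deleted": datetime(2024, 3, 1)},
]
candidates = short_lived_instances(inventory)
```

The flagged names are only candidates; how you exclude them from discovery depends on the options that each monitoring solution provides.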
5. (Optional) Define time frames and blackout policies
If you are creating blackout policies, first create time frames for the monitoring solutions, and then create the blackout policies that will use those time frames.
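The relationship between a time frame and a blackout policy can be pictured with a small model. This sketch is an assumption for illustration only and does not reflect the product's schema: the `TimeFrame` class, its fields, and `is_blacked_out` are hypothetical, and the model covers only a recurring daily window.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass(frozen=True)
class TimeFrame:
    """A recurring daily window (hypothetical model, not the product's schema)."""
    name: str
    start: time
    end: time

    def contains(self, moment: datetime) -> bool:
        t = moment.time()
        if self.start <= self.end:
            return self.start <= t < self.end
        # Window wraps past midnight, for example 22:00 to 02:00.
        return t >= self.start or t < self.end


def is_blacked_out(moment, time_frames):
    """True if any time frame used by a blackout policy covers the moment."""
    return any(tf.contains(moment) for tf in time_frames)


# Hypothetical nightly maintenance window.
nightly = TimeFrame("nightly-maintenance", time(22, 0), time(2, 0))
```

Defining the time frame first and referencing it from the policy, as the step describes, lets several blackout policies share one window.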
6. Enable the policies
Enable the policies for the monitoring solutions.
7. Test and validate the collected data
Test and validate that the data is collected according to the policies that you defined. Resolve issues, if any.
You can view the status of the applied policies on the PATROL Agents as shown in the following image:
You can also view the performance data, events, and devices from the TrueSight console.
Phase II: In production environment
After you have validated and tested the collected data, you can move to the production environment. The following process diagram describes the recommended workflow in a production environment:
1. Move policies to production servers
Move the validated policies from test to production by using the export and import utility. This utility can be used only for blackout and monitoring policies. You must manually define the staging policies on the production server.
2. Deploy a subset of packages
Deploy and install a subset of the deployable packages on the production servers.
Import the Infrastructure Management PATROL Repository and create a deployable package for PATROL Agents and monitoring solutions or Knowledge Modules.
Deploy and install the package on the PATROL Agents on the production managed servers.
3. Test and validate the deployment
Validate that the PATROL deployable package installations are successful.
4. Enable the policies
Enable the policies in the production environment. Validate the PATROL Agents and data collection in production. Resolve any issues.
5. Configure global thresholds
Configure global server thresholds for the monitoring solutions. Global thresholds are not automatically moved or migrated to the production environment.
Best practice
Identify and configure the thresholds that need to be set at the PATROL Agent level and at the monitoring solution level. For more information, see Configuring global thresholds.
6. Deploy remaining packages
Deploy the remaining Agents in batches on the production servers.
7. Monitor and tune the performance
Between each batch of PATROL Agents and Integration Services that are deployed and configured, ensure that the Infrastructure Management server and Integration Services are performing well and can still manage the load.
Ensure that the scalability limitations of the Integration Services are not exceeded.
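The check-between-batches pattern described above can be sketched as a simple gated loop. This is an illustration under stated assumptions, not a product feature: `staged_rollout` is a hypothetical name, and both `deploy` and `load_ok` are callables that you would supply yourself, for example a wrapper around your distribution tool and a check against your own health metrics.

```python
def staged_rollout(batches, deploy, load_ok):
    """Deploy batches one at a time, stopping when the load check fails.

    batches: list of lists of Agent names.
    deploy: callable that deploys one batch (hypothetical; supply your own).
    load_ok: callable returning True while the Infrastructure Management
        server and Integration Services are within their limits (hypothetical).
    Returns the number of batches deployed.
    """
    deployed = 0
    for batch in batches:
        deploy(batch)
        deployed += 1
        if not load_ok():
            break  # stop and tune before adding more Agents
    return deployed


# Hypothetical dry run: the load check fails after the second batch.
seen = []
checks = iter([True, False, True])
count = staged_rollout([["a"], ["b"], ["c"]], seen.append, lambda: next(checks))
```

Stopping after the failing batch, rather than before it, matches the guidance to check performance between batches; the remaining batches wait until tuning is complete.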
Additional monitoring sources and capabilities
Define manual application models based on groups and devices, or implement BMC TrueSight App Visibility Manager to enable automatic application models from which you can monitor the performance and health of active or synthetic applications, perform diagnostics, and trace application transactions.
Use third-party adapters to provide a mechanism for external applications to funnel data into Infrastructure Management. Data adapters facilitate the synchronization of performance data collected by specific monitoring solutions into Infrastructure Management for further analysis.
Define impact service models to monitor when and how higher-level entities, such as applications, technical services, business services, and organizations, are impacted when lower-level IT infrastructure entities, such as servers, network devices, and application systems, are affected by some condition.
See also Integrating.