The following topics describe the flow of event and performance data in Infrastructure Management and provide guidance on deploying, configuring, and using those components to achieve a scalable environment.
Integration Service architecture and functions
The Integration Service can consume and forward both performance data and events. The following diagram illustrates how the Integration Service nodes fit into the Infrastructure Management architecture.
The Integration Service accepts streaming of PATROL data and events using a common connection port. The default port is 3183. This includes all the data points and events from PATROL for parameters that you select. After the events arrive at the Integration Service, they are separated and follow a unique path to one of the following based on configuration:
- The Integration Service local cell (default behavior)
- A named event cell
- The BMC TrueSight Infrastructure Management Server associated with the Integration Service
For additional details about the ports used, see Network port schematics for Infrastructure Management.
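The choice among the three event destinations described above can be pictured as a small lookup. The following is a minimal sketch only: the configuration keys `event_destination` and `cell_name` and the function name are hypothetical and do not correspond to actual Integration Service settings.

```python
DEFAULT_PORT = 3183  # default Integration Service port, as stated in the text

# Hypothetical labels for the three destinations listed above.
LOCAL_CELL = "local_cell"   # default behavior
NAMED_CELL = "named_cell"
IM_SERVER = "im_server"


def resolve_event_destination(config):
    """Return (destination, cell_name) for events from a hypothetical config dict."""
    destination = config.get("event_destination", LOCAL_CELL)
    if destination not in (LOCAL_CELL, NAMED_CELL, IM_SERVER):
        raise ValueError(f"unknown event destination: {destination!r}")
    if destination == NAMED_CELL:
        cell_name = config.get("cell_name")
        if not cell_name:
            raise ValueError("a named event cell requires 'cell_name'")
        return destination, cell_name
    # The local cell and the server need no extra name in this sketch.
    return destination, None
```

With an empty configuration, the sketch falls back to the local cell, which mirrors the default behavior noted in the list.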
Integration Service components
You can optionally install the following components and configure them on the Integration Service host depending on whether or not they are required in the environment. Before installing any of these additional components, consider scalability and additional resources that you might require.
- Event management cell: The event management process that is installed locally on the same server as the Integration Service. BMC recommends that you install the event management cell on all Integration Service host computers. However, the cell is not required for forwarding events to the BMC TrueSight Infrastructure Management Server, so it does not have to be installed with the Integration Service.
- BMC Event Adapters: Work with the event management cell to consume non-PATROL events, for example, SNMP traps. BMC recommends that significant non-PATROL event collection be dedicated to other event management cells. The default event adapter classes, rules, and files are installed with the cell that is installed with the Integration Service.
- PATROL Agent and Knowledge Module (KM): Monitor the Integration Service host processes.
- BMC Impact Integration Web Services
Buffering and recovery of data and events by the Integration Service
The Infrastructure Management architecture supports buffering of BMC PATROL performance data and events at the PATROL Agents in case there is a network connectivity issue or if the Integration Service cannot be reached. When the PATROL Agent reconnects to an Integration Service process, the buffered data is sent. This capability is not intended to support buffering for very large amounts of data. It is intended to support a few minutes of lost connectivity, not hours or days. Testing has shown that the process can support up to 30 minutes of data collected by the PATROL Agents across 1000 managed servers.
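The buffer-and-resend behavior can be illustrated with a small time-bounded queue. This is a conceptual sketch, not the PATROL Agent implementation: the class name, the eviction rule (drop anything older than a maximum age), and the 30-minute default are assumptions chosen to echo the tested limit mentioned above.

```python
from collections import deque


class AgentBuffer:
    """Hypothetical agent-side buffer that keeps only recent data."""

    def __init__(self, max_age_seconds=30 * 60):
        self.max_age_seconds = max_age_seconds
        self._items = deque()  # (timestamp, payload), oldest first

    def add(self, timestamp, payload):
        """Buffer a payload collected while the Integration Service is unreachable."""
        self._items.append((timestamp, payload))
        self._evict(timestamp)

    def _evict(self, now):
        # Drop entries older than the retention window (assumed policy).
        while self._items and now - self._items[0][0] > self.max_age_seconds:
            self._items.popleft()

    def flush(self, send):
        """On reconnect, send buffered payloads oldest first; return the count sent."""
        sent = 0
        while self._items:
            _, payload = self._items.popleft()
            send(payload)
            sent += 1
        return sent
```

A bounded buffer like this matches the stated intent: it covers short outages and deliberately does not try to retain hours or days of data.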
Data collection and filtering by the Integration Service
The Integration Service processes are generally stateless, meaning the following:
- The Integration Service streams data directly to the BMC TrueSight Infrastructure Management Server.
- There are no adapters associated with the PATROL data collection.
Event and performance data flow processing by the Integration Service
BMC PATROL Agents collect performance data and generate events for availability metrics. Assuming version 9.6 or higher of the Infrastructure Management Server and the Integration Service are in use, both performance data and events from BMC PATROL are streamed through the Integration Service hosts as follows:
- Performance data and events are sent to the Integration Service from BMC PATROL Agents over the same TCP communication path (for details, see Network port schematics for Infrastructure Management).
- The Integration Service then forwards events to the event management cell that is running locally on the same host with the Integration Service.
- The event management cell further processes the events (filtering, enrichment, correlation, and so on) and forwards them to the Infrastructure Management Server.
- The Integration Service streams the performance data to the Infrastructure Management Server.
BMC PATROL streams raw performance data, including all of the data points that you decide to send, to the Infrastructure Management Server. The data is not summarized (as in previous versions).
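The flow described above (one incoming stream, events routed through the local cell, performance data streamed straight through) can be sketched as follows. The record layout with a `kind` field and the `cell_process` callback are hypothetical and stand in for the real wire format and cell rules.

```python
def split_stream(records):
    """Separate a mixed PATROL stream into events and performance data points."""
    events, performance = [], []
    for record in records:
        if record["kind"] == "event":
            events.append(record)
        elif record["kind"] == "performance":
            performance.append(record)
        else:
            raise ValueError(f"unknown record kind: {record['kind']!r}")
    return events, performance


def run_flow(records, cell_process):
    """Model the Integration Service hop.

    cell_process stands in for the local event management cell: it returns
    the (possibly enriched) event to forward, or None to filter it out.
    """
    events, performance = split_stream(records)
    processed = (cell_process(event) for event in events)
    return {
        "events_to_server": [e for e in processed if e is not None],
        # Raw data points pass through unsummarized.
        "performance_to_server": performance,
    }
```

In this sketch only events pass through the cell callback; performance data is forwarded unchanged, which reflects the raw, unsummarized streaming described in the text.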
Best practices for deploying the Integration Services with respect to event and data flow processing
- At least one remote Integration Service host must be deployed for all environments.
- Install the Integration Service and event management cell in pairs so that each Integration Service process has a corresponding event management cell installed on the same host computer. In this configuration, events are propagated from the Integration Service to the event management cell running on the same host. The option to install an event management cell is available when you install the Integration Service.
- Maintain the event flow path so that all events from any PATROL Agent are always processed through the same event management cells (including cell HA pairs). This ensures event processing continuity where automated processing of one event is dependent on one or more other events from the same agent. An example of this type of processing is the automated closure of critical events that is triggered by “OK” events for the same object that was in a state of critical alarm. If you do not maintain the same event flow path per agent through the same event management cells, correlation of all events from the same agent is not possible because the necessary events are not received and available in the same cells.
- Some environments might require more than two Integration Service hosts in a cluster or more than two Integration Service hosts defined for each agent that sends the data (events and performance) through a third party load balancer to the Integration Service hosts. This is acceptable as long as all events from any one agent always flow through the same high availability (HA) cell pair and the event processing continuity is maintained. For example, if four Integration Service nodes are clustered, then each node in the cluster must not have a cell configured on it. Instead, the cell must be on other systems (in an HA pair) so that the event path remains the same for all events coming from the agents that the cluster handles. For further information about HA deployments, see High-availability deployment and best practices for Infrastructure Management.
- Dedicate significant non-PATROL event collection to other event management cells as recommended in previous Infrastructure Management versions. For most environments, propagate events from the Integration Service to a lower tier event management cell. This is especially important in environments that meet any of the following conditions:
  - Involve more than a few thousand events in the system at any one time
  - Include multiple event sources other than PATROL
  - Support more than a few users
  - Are medium or large environments with more than 100 managed servers
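The reason for keeping each agent on a fixed event flow path can be shown with a toy model of the auto-closure example given above. The `EventCell` class, the event fields, and the `cell_for_agent` helper are hypothetical illustrations, not the product's rule engine.

```python
class EventCell:
    """Toy stand-in for an event management cell (not the real product logic)."""

    def __init__(self, name):
        self.name = name
        self.open_critical = {}  # (agent, monitored object) -> open critical event

    def process(self, event):
        key = (event["agent"], event["object"])
        if event["severity"] == "CRITICAL":
            self.open_critical[key] = event
            return "opened"
        if event["severity"] == "OK" and key in self.open_critical:
            # The matching critical event is in this cell, so it can be closed.
            del self.open_critical[key]
            return "closed"
        return "no_match"


def cell_for_agent(agent, assignments):
    """Look up the single cell (or HA pair) that handles all events from an agent."""
    return assignments[agent]


cell_a, cell_b = EventCell("cell_a"), EventCell("cell_b")
assignments = {"agent1": cell_a}  # fixed path: agent1 always goes to cell_a

critical = {"agent": "agent1", "object": "/disk/C", "severity": "CRITICAL"}
ok = {"agent": "agent1", "object": "/disk/C", "severity": "OK"}

cell_for_agent("agent1", assignments).process(critical)  # "opened" in cell_a
wrong_path = cell_b.process(ok)                          # "no_match": cell_b never saw the critical event
right_path = cell_for_agent("agent1", assignments).process(ok)  # "closed"
```

The OK event can close the critical event only when both arrive at the same cell, which is why the event path per agent must stay constant.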
Limiting event propagation to the Infrastructure Management Server
The event management cells allow you to further process events (event enrichment, filtering, correlation, deduplication, auto closure, and so on) before sending them on to the Infrastructure Management Server. This type of event processing must be avoided on the Infrastructure Management Server as much as possible. Event processing in the Infrastructure Management Servers must be controlled and limited to the following:
- Event presentation of actionable events only
- Collection of events for Probable Cause Analysis
- Events used in service modeling
Events sent to the Infrastructure Management Servers must be closely controlled and limited for the following reasons:
- Event presentation in the Infrastructure Management Server must not be cluttered with unactionable events that distract or otherwise reduce the efficiency of end users.
- The capability to view PATROL performance data in Infrastructure Management without having to forward and store the data in the database is likely to decrease the number of parameters that are trended in the Infrastructure Management Server for most environments. This might increase the number of events propagated from PATROL for parameters that do not require baselines but do require static thresholds, which in turn adds load to the event management cell in the Infrastructure Management Server.
- PATROL events are approximately twice the size, in bytes, of events generated in the Infrastructure Management Server. A larger volume of PATROL events increases the memory consumption of the event management cell on the Infrastructure Management Server and also increases the Infrastructure Management Server startup time. The overall startup time for an Infrastructure Management Server at full capacity ranges from 15 to 20 minutes.
- Automated association of events with monitors slightly increases the load on the event management cell that is embedded in the Infrastructure Management Server.
Event processing with BMC product integrations
For event processing, integrate Infrastructure Management Servers with BMC Remedy IT Service Management Suite and other BMC products such as BMC Atrium Orchestrator. These integrations must not be configured in the Central Server. For exceptions, see Exceptions requiring a Central Server deployment.
Additional Integration Service deployment best practices
The following are additional best practices for deployment and configuration of Integration Services, event management cells, and event and performance data collection.
Integration Service and Integration Service host deployment
- Install Integration Service hosts close to the data sources for which they process data. Deploy by geography, department, business, or applications, especially if multiple Integration Services are required from a single source.
- Install an Integration Service for each major network subnet.
- Use dedicated Integration Service hosts for large domain data collection, for example, VMware vSphere, remote operating system monitoring, and other large sources of data.
- Limit the usage of HTTPS traffic between the Integration Service nodes and the Infrastructure Management Servers. HTTPS is not as scalable as HTTP and requires more administration.
Event management cell deployment and configuration
- The number and placement of event management cells must be based on the number of events, event source domains (secure zones, geography, and so on), and major event sources. Always deploy multiple event management cells in the following scenarios:
  - Large environments
  - Geographically distributed managed infrastructure
  - Large numbers of events
  - Event sources that require different event management rules, for example, large numbers of SNMP traps compared to events from BMC PATROL
  - Significantly different event management operations that are divided among teams
- Install dedicated event processing cells to manage large volumes of events from common sources such as SNMP traps, SCOM, and other significant sources of events.
- Install the event management cell on all Integration Service nodes. Do not install additional event management cells on the Infrastructure Management Server. Install them on Integration Service hosts and remote hosts as needed.
- Deploy event management cells close to or on the same node as event sources for third party sources.
- Do not try to use the event management cells as a high volume SNMP trap forwarding mechanism.
- Configure the display of remote event management cells in the Infrastructure Management Server when necessary.
- Do not configure integrations for notifications or other global event forwarding integrations on the lower tier event processing cells. Global event forwarding integrations must be configured on the Infrastructure Management Server.
Event collection and processing
- Filter, enrich, normalize, deduplicate, and correlate events at the lowest tier event management cells as much as possible before propagating them to the next level in the event flow path.
- Do not collect unnecessary events. Limit event messages sent from data sources to messages that require action or analysis.
- Do not send raw events directly to the BMC TrueSight Infrastructure Management Server. Every environment must have at least one lower tier event management cell.
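As an illustration of filtering and deduplicating at the lowest tier before propagation, consider the following sketch. The event fields (`actionable`, `source`, `object`, `message`) and the deduplication key are assumptions made for the example; real cells apply these steps through their own rules.

```python
def process_at_lower_tier(events):
    """Return only the events worth propagating to the next tier.

    Drops events that are not marked actionable, then drops exact repeats
    of an event already seen (same source, object, and message).
    """
    seen = set()
    propagated = []
    for event in events:
        if not event.get("actionable", False):
            continue  # filter: nothing to act on or analyze
        key = (event["source"], event["object"], event["message"])
        if key in seen:
            continue  # deduplicate: an identical event was already kept
        seen.add(key)
        propagated.append(event)
    return propagated
```

Reducing the stream this way at the lower tier keeps the Infrastructure Management Server limited to events that users need to see.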
Performance data collection
- Do not collect excessive or unnecessary performance data. Review the need for lower polling intervals considering server performance and database size.
- Do not collect trends for availability metrics.
- Limit the streaming of performance data to the Infrastructure Management Server for the following purposes only:
  - Parameters designated as Key Performance Indicators (KPIs) to support baselines, abnormality detection, and predictive alarming.
  - Parameter data required for performance reports in BMC TrueSight Operations Management Reporting. This must be limited to KPI parameters but can be extended.
  - Parameters that are necessary or required for probable cause analysis leveraging baselines and abnormalities.
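The selection rule in the list above can be expressed as a simple predicate. This is a hedged sketch: the flag names (`is_kpi`, `needed_for_reports`, `needed_for_pca`, `is_availability`) are hypothetical labels for the purposes listed, not fields of an actual policy.

```python
# Hypothetical flags, one per purpose that justifies streaming a parameter.
STREAMING_REASONS = ("is_kpi", "needed_for_reports", "needed_for_pca")


def parameters_to_stream(parameters):
    """Return the names of parameters whose data should be streamed to the server."""
    selected = []
    for param in parameters:
        if param.get("is_availability", False):
            continue  # availability metrics are not trended
        if any(param.get(reason, False) for reason in STREAMING_REASONS):
            selected.append(param["name"])
    return selected
```

Any parameter that matches none of the listed purposes is left out of the stream in this sketch, which keeps database growth and server load in check.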
PATROL Agent configuration and assignment to an Integration Service
Configuration of the performance and event data that is sent from the PATROL Agents to the BMC TrueSight Infrastructure Management Server is defined in policies, which are automatically applied to the required PATROL Agents. The PATROL Agent assignment is defined in each policy based on selection criteria. For details about agent selection criteria per policy, see Staging Integration Service host deployment and policy management for development, test, and production best practices. BMC PATROL events and performance data are completely controlled at the PATROL Agent based on these policies. For each parameter, you can send data only, events only, both data and events, or neither. You can change these configuration settings at any time without rebuilding any configurations or restarting any processes.
How configuration is applied to a PATROL Agent
First, a PATROL Agent reads the tag information from the pconfig variable /AgentSetup/Identification/Tags/Tag/tagName, where tagName is the name of the tag. The PATROL Agent then sends the information to the Integration Service, which passes the information to the Presentation Server. The Presentation Server evaluates which policies match the tags or the agent properties, determines the final configuration to be applied, and sends the configuration information to the agent.
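The tag lookup and policy matching described above might be pictured as follows. This sketch models pconfig as a plain dictionary keyed by variable path and gives each policy a single `tag` field; both are simplifying assumptions, and matching on agent properties and the merge into a final configuration are not modeled.

```python
TAG_PREFIX = "/AgentSetup/Identification/Tags/Tag/"


def read_tags(pconfig):
    """Collect tag names from pconfig variables under the tag branch."""
    return {
        key[len(TAG_PREFIX):]
        for key in pconfig
        if key.startswith(TAG_PREFIX) and len(key) > len(TAG_PREFIX)
    }


def matching_policies(tags, policies):
    """Return the names of policies whose (hypothetical) tag is set on the agent."""
    return [policy["name"] for policy in policies if policy["tag"] in tags]
```

If no policy matches, the result is an empty list, which corresponds to an agent that receives no configuration until a matching policy exists.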
When configuration is applied to a PATROL Agent
A PATROL Agent initiates a configuration request after certain events, such as agent installation, agent restart, agent auto-connection with Integration Service, or changing a tag on the agent. If no policy matches the agent conditions, the agent does not receive configuration information. The agent does not receive the configuration until a matching policy is created.
If a policy is created or updated, changes are pushed from the Presentation Server, via the Integration Service, to PATROL Agents.
Where PATROL Agent configuration is stored
The monitoring solutions configuration is stored under the /ConfigData pconfig branch. The pconfig variables that the PATROL Agent receives from the Presentation Server are applied with the REPLACE request. For the configuration under /ConfigData, only the difference between the configuration received and the configuration that the agent already contains is applied. If configuration is not received for a particular class, it is considered to be deleted and is removed from /ConfigData. Configuration under /AgentSetup is applied directly.
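The difference-based update under /ConfigData can be sketched as a comparison of two dictionaries. This is an illustrative model only: it treats each configuration as a flat mapping of variable path to value, whereas the text describes deletion per class, and the function name is hypothetical.

```python
def diff_config_data(current, received):
    """Compute the changes needed to turn the current /ConfigData into the received one.

    Returns (to_set, to_delete): variables to add or update, and variables
    to remove because they are absent from the received configuration.
    """
    to_set = {
        key: value
        for key, value in received.items()
        if key not in current or current[key] != value
    }
    to_delete = sorted(key for key in current if key not in received)
    return to_set, to_delete
```

Variables that are unchanged appear in neither result, so only the difference is applied, and anything missing from the received set is scheduled for deletion.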
For /AgentSetup configurations, the variables under the /ConfigData pconfig branch take precedence if there are conflicts.
You should not manually update any variables and values under /ConfigData. The variables and values are only for internal use.
Solution Administrators configure BMC PATROL Agents in the TrueSight console. For information, see Setting up agents and monitoring policies for event management.
Deployment use cases and best practices for Infrastructure Management
Best practices webinars