Integration Service host deployment and best practices for event processing and propagation
The sections in this topic describe the flow of event and performance data in Infrastructure Management and provide guidance in the deployment, configuration, and use of those components to achieve a scalable environment.
Integration Service architecture and functions
The Integration Service can consume and forward both performance data and events. The following diagram illustrates how the Integration Service nodes fit into the Infrastructure Management architecture.
The Integration Service accepts streaming of PATROL data and events using a common connection port. The default port is 3183. This includes all the data points and events from PATROL for parameters that you select. After the events arrive at the Integration Service, they are separated and follow a unique path to one of the following based on configuration:
- The Integration Service local cell (default behavior)
- A named event cell
- The TrueSight Infrastructure Management Server associated with the Integration Service
For additional details about the ports used, see Network port schematics for Infrastructure Management.
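The three event destinations listed above can be pictured as a single configuration choice. The following Python sketch is illustrative only: `EventTarget`, `EventRouting`, and `cell_name` are hypothetical names invented for this example and do not correspond to actual Integration Service settings.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class EventTarget(Enum):
    LOCAL_CELL = "local cell"                        # default behavior
    NAMED_CELL = "named cell"
    IM_SERVER = "Infrastructure Management Server"


@dataclass
class EventRouting:
    """Hypothetical routing setting; not a real product configuration object."""
    target: EventTarget = EventTarget.LOCAL_CELL
    cell_name: Optional[str] = None

    def destination(self) -> str:
        """Return the name of the place events are sent to."""
        if self.target is EventTarget.NAMED_CELL:
            if not self.cell_name:
                raise ValueError("a named cell target needs a cell name")
            return self.cell_name
        return self.target.value
```

With no arguments, `EventRouting().destination()` resolves to the local cell, which mirrors the default behavior described above.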
Integration Service components
You can optionally install the following components and configure them on the Integration Service host depending on whether or not they are required in the environment. Before installing any of these additional components, consider scalability and additional resources that you might require.
Event management cell: The event management cell is the event management process installed locally on the same server as the Integration Service. BMC recommends that you install the event management cell on all Integration Service host computers.
The cell is not required for forwarding events to the TrueSight Infrastructure Management Server; therefore, the cell does not have to be installed with the Integration Service.
Event adapters: BMC Event Adapters work with the event management cell to consume non-PATROL events; for example, SNMP traps. BMC recommends that significant non-PATROL event collection be dedicated to other event management cells. The default event adapter classes, rules, and files are installed with the cell that is installed with the Integration Service.
PATROL Agent and Knowledge Module (KM): The PATROL Agent and Knowledge Module (KM) monitor the Integration Service host processes.
Buffering and recovery of data and events by the Integration Service
The Infrastructure Management architecture supports buffering of PATROL performance data and events at the PATROL Agents in case there is a network connectivity issue or if the Integration Service cannot be reached. When the PATROL Agent reconnects to an Integration Service process, the buffered data is sent. This capability is not intended to support buffering for very large amounts of data. It is intended to support a few minutes of lost connectivity, not hours or days. Testing has shown that the process can support up to 30 minutes of data collected by the PATROL Agents across 1000 managed servers.
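The idea of a short-lived, bounded buffer that is drained on reconnection can be sketched as follows. This is a simplified model, not the PATROL Agent implementation: the class name `AgentBuffer`, the `max_items` limit, and the choice to drop the oldest entries when the buffer is full are all assumptions made for illustration.

```python
from collections import deque


class AgentBuffer:
    """Toy bounded buffer for data collected while disconnected."""

    def __init__(self, max_items):
        # Assumption: when full, the oldest entry is dropped.
        # deque(maxlen=...) does this automatically on append.
        self._items = deque(maxlen=max_items)

    def record(self, item, connected, send):
        """Send the item now, or hold it until the connection returns."""
        if connected:
            self.flush(send)     # buffered items go first, in order
            send(item)
        else:
            self._items.append(item)

    def flush(self, send):
        """Drain everything that was buffered during the outage."""
        while self._items:
            send(self._items.popleft())
```

Because the buffer is bounded, a long outage loses older data points, which is consistent with the guidance that buffering covers minutes of lost connectivity rather than hours or days.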
Data collection and filtering by the Integration Service
The Integration Service processes are generally stateless, meaning the following:
- The Integration Service streams data directly to the TrueSight Infrastructure Management Server.
- There are no adapters associated with the PATROL data collection.
- All filtering of performance data is handled by the PATROL Agents.
- All filtering of events is handled by the PATROL Agents and if necessary in the event management cells.
The Integration Service acts as a proxy to receive and forward both data and events that are sent to it from the PATROL Agents. It also receives PATROL Agent and Knowledge Module (KM) configuration data from the Presentation Server and passes that data to the PATROL Agents.
Event and performance data flow processing by the Integration Service
PATROL Agents collect performance data and generate events for availability metrics. Assuming that version 9.6 or later of both the Infrastructure Management Server and the Integration Service is in use, performance data and events from PATROL are streamed through the Integration Service hosts as follows:
- Performance data and events are sent to the Integration Service from PATROL Agents over the same TCP communication path (for details, see Network port schematics for Infrastructure Management).
- The Integration Service then forwards events to the event management cell that is running locally on the same host as the Integration Service.
- The event management cell further processes the events (filtering, enrichment, correlation, and so on) and forwards them to the Infrastructure Management Server.
- The Integration Service streams the performance data to the Infrastructure Management Server.
PATROL streams raw performance data, including all of the data points that you decide to send, to the Infrastructure Management Server. The data is not summarized (as in previous versions).
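The flow described in this section can be modeled in a few lines. The sketch below is a toy model under stated assumptions: the message dictionaries, the `local_cell` rule that drops INFO events, and the `enriched` flag are invented for illustration and do not represent the actual cell rule language or wire format.

```python
def local_cell(event):
    """Stand-in for cell processing (assumed rule: drop INFO, tag the rest)."""
    if event["severity"] == "INFO":
        return None                      # filtered out at the cell
    return {**event, "enriched": True}   # placeholder for enrichment


def forward(messages):
    """Route a mixed PATROL stream and return what reaches the server."""
    server = {"events": [], "performance": []}
    for message in messages:
        if message["kind"] == "event":
            processed = local_cell(message)
            if processed is not None:
                server["events"].append(processed)
        else:
            # Performance data is streamed as raw data points, not summarized.
            server["performance"].append(message)
    return server
```

Events take the extra hop through the cell, while performance data points pass straight through unchanged.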
Best practices for deploying the Integration Services with respect to event and data flow processing
- At least one remote Integration Service host must be deployed for all environments.
- Install the Integration Service and event management cell in pairs so that each Integration Service process has a corresponding event management cell installed on the same host computer. In this configuration, events are propagated from the Integration Service to the event management cell running on the same host. The option to install an event management cell is available when you install the Integration Service.
- Maintain the event flow path so that all events from any PATROL Agent are always processed through the same event management cells (including cell HA pairs). This ensures event processing continuity where automated processing of one event is dependent on one or more other events from the same agent. An example of this type of processing is the automated closure of critical events that is triggered by “OK” events for the same object that was in a state of critical alarm. If you do not maintain the same event flow path per agent through the same event management cells, correlation of all events from the same agent is not possible because the necessary events are not received and available in the same cells.
- Some environments might require more than two Integration Service hosts in a cluster, or more than two Integration Service hosts defined for each agent that sends data (events and performance) through a third-party load balancer to the Integration Service hosts. This is acceptable as long as all events from any one agent always flow through the same high availability (HA) cell pair and event processing continuity is maintained. For example, if four Integration Service nodes are clustered, none of the nodes in the cluster can have a cell configured on it. Instead, the cell must be on other systems (in an HA pair) so that the event path remains the same for all events coming from the agents that the cluster handles. For further information about HA deployments, see High-availability deployment and best practices for Infrastructure Management.
- Dedicate significant non-PATROL event collection to other event management cells as recommended in previous Infrastructure Management versions. For most environments, propagate events from the Integration Service to a lower tier event management cell. This is especially important in environments that meet any of the following conditions:
- Involve more than a few thousand events in the system at any one time
- Include multiple event sources other than PATROL
- Support more than a few users
- Are medium or large environments with more than 100 managed servers
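The recommendation to keep every agent on the same event flow path can be illustrated with a toy correlation rule. In the sketch below, the `Cell` class, the cell names, and the event tuples are hypothetical; the only behavior modeled is the automated closure of a critical event by a later "OK" event for the same object.

```python
class Cell:
    """Toy event cell that closes an open CRITICAL event on a matching OK."""

    def __init__(self):
        self.open_critical = set()   # (agent, object) pairs in critical alarm

    def receive(self, agent, obj, status):
        key = (agent, obj)
        if status == "CRITICAL":
            self.open_critical.add(key)
        elif status == "OK":
            # Closure only works if the CRITICAL event is in this same cell.
            self.open_critical.discard(key)


def run(events, route):
    """Deliver events using the given routing function; count open alarms."""
    cells = {"cell_a": Cell(), "cell_b": Cell()}
    for agent, obj, status in events:
        cells[route(agent, status)].receive(agent, obj, status)
    return sum(len(cell.open_critical) for cell in cells.values())


events = [("agent1", "disk_c", "CRITICAL"), ("agent1", "disk_c", "OK")]

# Same path for every event from the agent: the OK closes the CRITICAL.
pinned = run(events, lambda agent, status: "cell_a")

# Path varies per event: the OK lands in a cell that never saw the CRITICAL.
split = run(events, lambda agent, status: "cell_a" if status == "CRITICAL" else "cell_b")
```

In this model, the pinned route leaves no critical event open, while the split route leaves one open indefinitely because the closing event never reaches the cell that holds it.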
Limiting event propagation to the Infrastructure Management Server
The event management cells allow you to further process events (event enrichment, filtering, correlation, deduplication, auto closure, and so on) before sending them on to the Infrastructure Management Server. This type of event processing must be avoided on the Infrastructure Management Server as much as possible. Event processing in the Infrastructure Management Servers must be controlled and limited to the following:
- Event presentation of actionable events only
- Collection of events for Probable Cause Analysis
- Events used in service modeling
Events sent to the Infrastructure Management Servers must be closely controlled and limited for the following reasons:
- Event presentation in the Infrastructure Management Server must not be cluttered with unactionable events that distract or otherwise reduce the efficiency of end users.
- The capability to view PATROL performance data in Infrastructure Management without having to forward and store the data in the database is likely to decrease the number of parameters that are trended in the Infrastructure Management Server for most environments. This might increase the number of events propagated from PATROL for parameters that do not require baselines but do require static thresholds, which in turn adds load on the event management cell in the Infrastructure Management Server.
- PATROL events are approximately twice the size in bytes compared to events generated in the Infrastructure Management Server. A larger volume of PATROL events increases the memory consumption of the event management cell on the Infrastructure Management Server and also increases the Infrastructure Management Server startup time. The overall startup time for an Infrastructure Management Server at full capacity ranges from 15 to 20 minutes.
- Automated event-to-monitor association slightly increases the load on the event management cell that is embedded in the Infrastructure Management Server.
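The three purposes for which events belong on the Infrastructure Management Server (presentation of actionable events, Probable Cause Analysis, and service modeling) can be read as a propagation filter applied in the lower-tier cell. The sketch below assumes hypothetical boolean event attributes; actual cells express such logic in their own rules and policies, not in Python.

```python
def should_propagate(event):
    """Return True only for events the server actually needs.

    The three keys are hypothetical flags invented for this example.
    """
    return bool(
        event.get("actionable")
        or event.get("for_probable_cause_analysis")
        or event.get("used_in_service_model")
    )


def propagate(events):
    """Keep only the events that pass the filter, preserving order."""
    return [event for event in events if should_propagate(event)]
```

Everything that fails the filter stays in the lower-tier cell, which keeps the server's event list free of unactionable events and reduces the memory held by its embedded cell.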
Event processing with BMC product integrations
Additional Integration Service deployment best practices
The following are additional best practices for deployment and configuration of Integration Services, event management cells, and event and performance data collection.
- Integration Service and Integration Service host deployment
- Event management cell deployment and configuration
- Event collection and processing
- Performance data collection
PATROL Agent configuration and assignment to an Integration Service
The configuration of the performance and event data that is sent from the PATROL Agents to the TrueSight Infrastructure Management Server is defined in policies, which are automatically applied to the required PATROL Agents. The PATROL Agent assignment is defined in each policy based on selection criteria. The details of agent selection criteria per policy are discussed in Staging Integration Service host deployment and policy management for development, test, and production best practices. PATROL events and performance data are completely controlled at the PATROL Agent based on these policies. This means that, for each parameter, you control whether the agent sends data only, events only, both data and events, or neither. You can change these configuration settings at any time without having to rebuild any configurations or restart any processes.
How configuration is applied to a PATROL Agent
First, a PATROL Agent reads the tag information from the pconfig variable /AgentSetup/Identification/Tags/Tag/tagName, where tagName is the name of the tag. The PATROL Agent then sends the information to the Integration Service, which passes the information to the Presentation Server. The Presentation Server evaluates which policies match the tags or the agent properties, determines the final configuration to be applied, and sends the configuration information to the agent.
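The tag lookup and policy evaluation described above can be sketched as follows. Only the /AgentSetup/Identification/Tags/Tag/ prefix comes from this topic; the policy dictionaries, the `precedence` field, and the rule that a lower precedence number wins on conflict are assumptions made for illustration, because this topic does not describe how the Presentation Server merges matching policies.

```python
TAG_PREFIX = "/AgentSetup/Identification/Tags/Tag/"


def tags_from_pconfig(pconfig):
    """Collect tag names from pconfig variable names under the tag branch."""
    return {name[len(TAG_PREFIX):] for name in pconfig if name.startswith(TAG_PREFIX)}


def resolve_configuration(agent_tags, policies):
    """Merge the settings of all policies whose tag matches the agent."""
    matching = [policy for policy in policies if policy["tag"] in agent_tags]
    final = {}
    # Assumption: a lower precedence number wins, so apply it last.
    for policy in sorted(matching, key=lambda p: p["precedence"], reverse=True):
        final.update(policy["settings"])
    return final


pconfig = {
    TAG_PREFIX + "linux": "Linux hosts",
    "/AgentSetup/exampleVariable": "placeholder",   # hypothetical variable
}
policies = [
    {"tag": "linux", "precedence": 200, "settings": {"poll_interval": 60, "send_events": True}},
    {"tag": "linux", "precedence": 100, "settings": {"poll_interval": 30}},
    {"tag": "windows", "precedence": 50, "settings": {"poll_interval": 10}},
]
configuration = resolve_configuration(tags_from_pconfig(pconfig), policies)
```

When no policy matches, the function returns an empty dictionary, which corresponds to an agent that receives no configuration.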
When configuration is applied to a PATROL Agent
A PATROL Agent initiates a configuration request after certain events, such as agent installation, agent restart, agent auto-connection with Integration Service, or changing a tag on the agent. If no policy matches the agent conditions, the agent does not receive configuration information. The agent does not receive the configuration until a matching policy is created.
If a policy is created or updated, changes are pushed from the Presentation Server, via the Integration Service, to PATROL Agents.
Where PATROL Agent configuration is stored
The monitoring solution configuration is stored under the /ConfigData pconfig branch. The pconfig variables that the PATROL Agent receives from the Presentation Server are applied with the REPLACE request. For the configuration under /ConfigData, only the difference between the configuration received and the configuration that the agent already contains is applied. If no configuration is received for a particular class, that configuration is considered deleted and is removed from /ConfigData. The configuration under /AgentSetup is applied directly.
For /AgentSetup configurations, the variables under the /ConfigData pconfig branch take precedence if there are conflicts.
You should not manually update any variables and values under /ConfigData. The variables and values are only for internal use.
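The difference-based update of the /ConfigData branch can be modeled as a comparison of two dictionaries. This is a simplified sketch, not the agent's actual algorithm: it works per variable rather than per monitoring class, and the function name and the variable names used with it are hypothetical.

```python
def apply_config_data(current, received):
    """Return (new_config, changes) for a difference-based update.

    current  -- variables the agent already holds under /ConfigData
    received -- variables sent by the Presentation Server
    """
    changes = {"set": {}, "deleted": []}
    for name, value in received.items():
        if current.get(name) != value:
            changes["set"][name] = value      # new or modified variable
    for name in current:
        if name not in received:
            changes["deleted"].append(name)   # not received, so removed
    return dict(received), changes
```

For example, if the agent holds variables A and B and receives A (unchanged) and C, only C is written and B is deleted; A is left untouched.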
Solution Administrators configure PATROL Agents in the TrueSight console.