Use case: Automatic incident management
Scenario
Consider a large-scale organization in which one of the servers in the IT infrastructure stops responding. The outage affects the Payroll application and slows down the organization’s Consumer Banking Service.
Before users begin calling in to report a problem, the server failure must be discovered, communicated to the service desk, diagnosed, and resolved.
By establishing Proactive Service Resolution (PSR), the organization can act on the incident and initiate a change process to solve the issue before a more serious problem occurs. Whether the issue is an outage, a slowdown, or a memory shortage, PSR optimizes how the event-generation and incident-creation processes work together to manage IT.
PSR integration also saves considerable time that would otherwise be spent answering the following questions:
- What applications or services may be affected by the issue?
- Who supports the affected applications or services?
- Who owns the affected applications or services?
- Is the affected configuration item (CI) in a maintenance window?
- Is the affected CI in a scheduled change window?
- Is there an open incident associated with this event and the affected CI? Is it already in progress?
- Who is responsible for investigating and fixing the system associated with the event?
- Is there a workflow or an automation to remediate the issue?
- When can the incident be closed?
Workflow
The PSR workflow depends on the service resolution level and on the deployment scenario planned for the organization. For more information about service resolution levels and a high-level overview of the products involved in PSR, see Proactive Service Resolution for automatic incident management with BMC Helix Integration Service.
For instructions on setting up the required workflow, see Setting up Proactive Service Resolution to enable automatic incident management.
Workflow for event-based service resolution
The following workflow diagram illustrates the process of automatic incident management for event-based service resolution (Level 1).
- The Server Is Down event is created in TrueSight Operations Management or BMC Helix Operations Management.
- An active PSR flow in BMC Helix Integration Service is automatically triggered to create a corresponding incident with the event information in BMC Helix ITSM.
- The incident is updated in BMC Helix ITSM.
- An active PSR flow in BMC Helix Integration Service is automatically triggered to update the event in TrueSight Operations Management or BMC Helix Operations Management.
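The Level 1 loop can be pictured as two handlers, one for each active PSR flow: one creates an incident from the event, and the other writes incident updates back to the event. This is a minimal in-memory sketch, not the BMC Helix Integration Service API. `PsrFlowSketch`, `Event`, `Incident`, the `INC` identifier format, and the status strings are all hypothetical. In a real deployment, the flows run in BMC Helix Integration Service between the products listed above.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, List, Optional


@dataclass
class Event:
    """Hypothetical event record (for example, 'Server Is Down')."""
    event_id: str
    message: str
    severity: str
    incident_id: Optional[str] = None  # link set by the first flow
    notes: List[str] = field(default_factory=list)


@dataclass
class Incident:
    """Hypothetical incident record created from an event."""
    incident_id: str
    summary: str
    event_id: str
    status: str = "New"


class PsrFlowSketch:
    """Stand-in for the two active PSR flows; not a BMC API."""

    def __init__(self) -> None:
        self.events: Dict[str, Event] = {}
        self.incidents: Dict[str, Incident] = {}
        self._next_id = count(1)

    def on_event_created(self, event: Event) -> Incident:
        # Flow 1: create an incident that carries the event information.
        incident = Incident(
            incident_id=f"INC{next(self._next_id):06d}",
            summary=f"[{event.severity}] {event.message}",
            event_id=event.event_id,
        )
        self.events[event.event_id] = event
        self.incidents[incident.incident_id] = incident
        event.incident_id = incident.incident_id
        return incident

    def on_incident_updated(self, incident_id: str, status: str) -> Event:
        # Flow 2: write the incident update back to the originating event.
        incident = self.incidents[incident_id]
        incident.status = status
        event = self.events[incident.event_id]
        event.notes.append(f"{incident_id} status changed to {status}")
        return event


flow = PsrFlowSketch()
server_down = Event("EV1", "Server Is Down", "CRITICAL")
incident = flow.on_event_created(server_down)
flow.on_incident_updated(incident.incident_id, "In Progress")
print(server_down.notes)  # ['INC000001 status changed to In Progress']
```

The event and the incident store references to each other. This bidirectional link lets the second flow find the event to update.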
Workflow for infrastructure-based service resolution
The following workflow diagram illustrates the process of automatic incident management for infrastructure-based service resolution (Level 2).
- BMC Discovery scans a server and creates a corresponding CI in the CMDB.
- The Server Is Down event is created in TrueSight Operations Management or BMC Helix Operations Management.
- An active PSR flow in BMC Helix Integration Service is automatically triggered to create a corresponding incident with the event information in BMC Helix ITSM.
- The incident is updated in BMC Helix ITSM.
- An active PSR flow in BMC Helix Integration Service is automatically triggered to update the event in TrueSight Operations Management or BMC Helix Operations Management.
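The only step that Level 2 adds is the discovery of the server as a CI in the CMDB. A discovered CI is what can answer several of the questions listed earlier: affected services, support group, owner, and maintenance window. The sketch below shows how such a CI could enrich the incident. It assumes a simple dictionary keyed by hostname. `ConfigurationItem`, `enrich_incident`, the field names, and the sample data are hypothetical and do not reflect actual CMDB class attributes or BMC Helix ITSM incident fields.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional, Tuple


@dataclass
class ConfigurationItem:
    """Hypothetical CMDB record for a server discovered by BMC Discovery."""
    hostname: str
    owner: str
    support_group: str
    impacted_services: List[str]
    maintenance_windows: List[Tuple[datetime, datetime]] = field(default_factory=list)


def in_maintenance_window(ci: ConfigurationItem, at: datetime) -> bool:
    # Each window is treated as half-open: start <= at < end.
    return any(start <= at < end for start, end in ci.maintenance_windows)


def enrich_incident(
    hostname: str, at: datetime, cmdb: Dict[str, ConfigurationItem]
) -> Optional[Dict[str, object]]:
    """Return CI-derived incident fields, or None if no CI matches."""
    ci = cmdb.get(hostname)
    if ci is None:
        # No discovered CI: only the event data is available, as in Level 1.
        return None
    return {
        "ci": ci.hostname,
        "impacted_services": list(ci.impacted_services),
        "assigned_group": ci.support_group,
        "owner": ci.owner,
        "in_maintenance_window": in_maintenance_window(ci, at),
    }


cmdb = {
    "payroll-srv-01": ConfigurationItem(
        hostname="payroll-srv-01",
        owner="Finance IT",
        support_group="Server Operations",
        impacted_services=["Payroll", "Consumer Banking Service"],
        maintenance_windows=[(datetime(2024, 1, 6, 22), datetime(2024, 1, 7, 2))],
    )
}
enriched = enrich_incident("payroll-srv-01", datetime(2024, 1, 8, 9, 30), cmdb)
print(enriched["impacted_services"], enriched["in_maintenance_window"])
```

If no CI matches, the sketch returns `None` instead of guessing. This mirrors the fact that Level 1 works from event data alone.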
Workflow for triage and remediation
The following workflow diagram illustrates the process of automatic incident management for the triage and remediation use case.
- BMC Discovery automatically tracks a service outage, and a corresponding CI is created in the CMDB.
- The Service Is Down event is created in TrueSight Operations Management or BMC Helix Operations Management.
- An active PSR flow in BMC Helix Integration Service is automatically triggered to convert the event into a corresponding incident in BMC Helix ITSM.
- If the incident is updated in BMC Helix ITSM, an active PSR flow automatically updates the event in TrueSight Operations Management or BMC Helix Operations Management.
- TrueSight Operations Management or BMC Helix Operations Management connects to TrueSight Orchestration to remediate the service issue.
- TrueSight Orchestration validates and restarts the service.
- The incident in BMC Helix ITSM and the event in TrueSight Operations Management or BMC Helix Operations Management are updated with the remediation details.
- The event is closed in TrueSight Operations Management or BMC Helix Operations Management.
- An active PSR flow is automatically triggered to resolve the incident in BMC Helix ITSM.
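The closed loop of the triage and remediation workflow can be sketched as a single function. It checks that the service is down, restarts it, writes the outcome to both records, and closes the event and resolves the incident only if the restart worked. This is a hypothetical sketch. `Record`, `triage_and_remediate`, `FakeService`, and the status values are illustrative. The two callables stand in for the validation and restart steps that TrueSight Orchestration performs.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Record:
    """Hypothetical event or incident record with a work log."""
    record_id: str
    status: str
    worklog: List[str] = field(default_factory=list)


def triage_and_remediate(
    event: Record,
    incident: Record,
    is_running: Callable[[], bool],
    restart: Callable[[], None],
) -> bool:
    """Validate, restart, record the outcome, and close on success."""
    if is_running():
        # Validation: the service is up, so no restart is attempted.
        note = "Validation: service is running; no restart performed"
        ok = False
    else:
        restart()
        ok = is_running()  # re-check to confirm that the restart worked
        note = f"Remediation: restart {'succeeded' if ok else 'failed'}"
    # Both records receive the remediation details.
    event.worklog.append(note)
    incident.worklog.append(note)
    if ok:
        event.status = "CLOSED"
        # Stands in for the PSR flow that resolves the incident
        # after the event is closed.
        incident.status = "Resolved"
    return ok


class FakeService:
    """Hypothetical service that comes back up after a restart."""

    def __init__(self) -> None:
        self.running = False

    def is_running(self) -> bool:
        return self.running

    def restart(self) -> None:
        self.running = True


svc = FakeService()
event = Record("EV2", "OPEN")
incident = Record("INC000002", "In Progress")
print(triage_and_remediate(event, incident, svc.is_running, svc.restart))  # True
print(event.status, incident.status)  # CLOSED Resolved
```

This sketch leaves both records open when validation or the restart fails. That choice lets the assigned support group continue the investigation instead of having the automation close an unresolved issue.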
Results
As a result of an established PSR integration, an event is converted into an incident that automatically goes through the following incident management stages:
- Identification
- Registration
- Categorization
- Prioritization
- Assessment
- Escalation
- Investigation and diagnosis
- Resolution and recovery
- Incident closure
Benefits
By establishing PSR, organizations can achieve high reliability, maintainability, and availability of key business services. The following table lists the benefits related to each workflow:
Workflow | Purpose | Benefits |
---|---|---|
Event-based service resolution | Suits organizations at an early stage of service management and operations management maturity. | |
Infrastructure-based service resolution | Suits organizations with average service management, operations management, and CMDB maturity. | |
Triage and remediation | Suits organizations with a high level of operations management maturity. | |