Use case: Automatic incident management


This use case describes how an organization can save time and effort by automating the management of IT infrastructure incidents through Proactive Service Resolution (PSR) in BMC Helix Integration Service.

For more information about PSR, see Proactive Service Resolution for automatic incident management with BMC Helix Integration Service.


Scenario

Consider a large organization in which one of the servers in the IT infrastructure stops responding. This outage affects the Payroll application and slows down the organization's Consumer Banking Service.

Before users begin calling in to report a problem, the server failure must be discovered, communicated to the service desk, diagnosed, and resolved.

By establishing PSR, the organization can act on the incident and initiate a change process to solve the issue before a more serious problem occurs. Whether it’s an outage, a slowdown, or a memory shortage, the PSR solution optimizes the way event-generation and incident-creation processes operate together to manage IT.

Additionally, PSR integration saves a considerable amount of time that would otherwise be spent answering the following questions:

  • What applications or services may be affected by the issue?
  • Who supports the affected applications or services?
  • Who owns the affected applications or services?
  • Is the affected CI in a maintenance window?
  • Is the affected CI in a scheduled change window?
  • Is there an open incident associated with this event and the affected CI? Is it already in progress?
  • Who is responsible for investigating and fixing the system associated with the event?
  • Is there a workflow or an automation to remediate the issue?
  • When can the incident be closed?

Workflow

The PSR workflow depends on the service resolution level and on the deployment scenario planned for an organization. For more information about service resolution levels and a high-level overview of the products involved in PSR, see Proactive Service Resolution for automatic incident management with BMC Helix Integration Service.

For instructions on setting up the required workflow, see Setting up Proactive Service Resolution to enable automatic incident management.

Workflow for event-based service resolution

The following workflow diagram illustrates the process of automatic incident management for the event-based service resolution (Level 1).

[Workflow diagram: event-based service resolution (Level 1)]

  1. The Server Is Down event is created in TrueSight Operations Management or BMC Helix Operations Management.
  2. An active PSR flow in BMC Helix Integration Service is automatically triggered to create a corresponding incident with the event information in BMC Helix ITSM.
  3. The incident is updated in BMC Helix ITSM.
  4. An active PSR flow in BMC Helix Integration Service is automatically triggered to update the event in TrueSight Operations Management or BMC Helix Operations Management.
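The Level 1 steps above amount to a two-way link between an event and an incident: one flow creates the incident from the event, and a second flow writes incident updates back to the event. The following Python sketch models that round trip in memory, assuming that each record stores the other's identifier. The classes, fields, and the INC numbering scheme are hypothetical and do not reflect the actual data models or APIs of BMC Helix Integration Service, BMC Helix ITSM, TrueSight Operations Management, or BMC Helix Operations Management.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, List, Optional


# Hypothetical stand-ins for an operations event and an ITSM incident.
# Field names are illustrative only, not the products' real data models.
@dataclass
class Event:
    event_id: str
    message: str
    severity: str
    incident_id: Optional[str] = None
    notes: List[str] = field(default_factory=list)


@dataclass
class Incident:
    incident_id: str
    summary: str
    status: str = "New"
    source_event_id: Optional[str] = None


class Level1Flow:
    """Hypothetical model of the two Level 1 PSR flows (steps 2 and 4)."""

    def __init__(self) -> None:
        self.incidents: Dict[str, Incident] = {}
        self.events: Dict[str, Event] = {}
        self._ids = itertools.count(1)  # hypothetical incident numbering

    def on_event_created(self, event: Event) -> Incident:
        # Step 2: create an incident that carries the event information.
        incident = Incident(
            incident_id=f"INC{next(self._ids):06d}",
            summary=f"[{event.severity}] {event.message}",
            source_event_id=event.event_id,
        )
        self.incidents[incident.incident_id] = incident
        event.incident_id = incident.incident_id  # link event -> incident
        self.events[event.event_id] = event
        return incident

    def on_incident_updated(self, incident_id: str, new_status: str) -> Event:
        # Steps 3-4: an update to the incident is written back to its event.
        incident = self.incidents[incident_id]
        incident.status = new_status
        event = self.events[incident.source_event_id]
        event.notes.append(f"{incident_id} status: {new_status}")
        return event
```

For example, creating an event named "Server Is Down" produces incident INC000001, and moving that incident to "In Progress" appends "INC000001 status: In Progress" to the event's notes. Storing the cross-reference on both records is what lets either side find its counterpart without a search.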

Workflow for infrastructure-based service resolution

The following workflow diagram illustrates the process of automatic incident management for the infrastructure-based service resolution (Level 2).

[Workflow diagram: infrastructure-based service resolution (Level 2)]

  1. BMC Discovery scans a server and creates a corresponding CI in the CMDB.
  2. The Server Is Down event is created in TrueSight Operations Management or BMC Helix Operations Management.
  3. An active PSR flow in BMC Helix Integration Service is automatically triggered to create a corresponding incident with the event information in BMC Helix ITSM.
  4. The incident is updated in BMC Helix ITSM.
  5. An active PSR flow in BMC Helix Integration Service is automatically triggered to update the event in TrueSight Operations Management or BMC Helix Operations Management.
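The difference between Level 1 and Level 2 is step 1: because BMC Discovery has already created a CI for the server, the incident can reference the causal CI. The following Python sketch illustrates that lookup, assuming the event identifies the affected server by hostname and the CMDB can be searched by hostname. The `Cmdb` class, the `ComputerSystem` class name, the `CI-n` identifiers, and the hostname `payroll-srv-01` are hypothetical illustrations, not the real CMDB or BMC Discovery APIs.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ConfigurationItem:
    ci_id: str
    hostname: str
    ci_class: str


@dataclass
class Incident:
    summary: str
    causal_ci: Optional[ConfigurationItem] = None


class Cmdb:
    """Hypothetical in-memory CMDB indexed by hostname."""

    def __init__(self) -> None:
        self._by_host: Dict[str, ConfigurationItem] = {}

    def register(self, hostname: str, ci_class: str = "ComputerSystem") -> ConfigurationItem:
        # Step 1: stand-in for BMC Discovery creating a CI for a scanned server.
        # Re-scanning the same host returns the existing CI instead of a duplicate.
        if hostname not in self._by_host:
            ci_id = f"CI-{len(self._by_host) + 1}"
            self._by_host[hostname] = ConfigurationItem(ci_id, hostname, ci_class)
        return self._by_host[hostname]

    def find_by_host(self, hostname: str) -> Optional[ConfigurationItem]:
        return self._by_host.get(hostname)


def create_incident_from_event(cmdb: Cmdb, message: str, hostname: str) -> Incident:
    # Step 3: same as Level 1, plus a CMDB lookup for the causal CI.
    # If no CI exists for the host, the incident is still created without one.
    return Incident(summary=f"{message} on {hostname}",
                    causal_ci=cmdb.find_by_host(hostname))
```

With a registered host, the incident's `causal_ci` points at the discovered CI, which is what gives the service desk causal CI awareness. For an unknown host, the incident falls back to Level 1 behavior with no CI attached.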

Workflow for triage and remediation

The following workflow diagram illustrates the process of automatic incident management for the triage and remediation use case.

[Workflow diagram: triage and remediation]

  1. BMC Discovery automatically tracks a service outage; a corresponding CI is created in the CMDB.
  2. The Service Is Down event is created in TrueSight Operations Management or BMC Helix Operations Management.
  3. An active PSR flow in BMC Helix Integration Service is automatically triggered to convert the event into a corresponding incident in BMC Helix ITSM.
  4. If the incident is updated in BMC Helix ITSM, an active PSR flow automatically updates the event in TrueSight Operations Management or BMC Helix Operations Management.
  5. TrueSight Operations Management or BMC Helix Operations Management connects to TrueSight Orchestration to remediate the service issue.
  6. TrueSight Orchestration validates and restarts the service.
  7. The incident in BMC Helix ITSM and the event in TrueSight Operations Management or BMC Helix Operations Management are updated with the remediation details.
  8. The event is closed in TrueSight Operations Management or BMC Helix Operations Management.
  9. An active PSR flow is automatically triggered to resolve the incident in BMC Helix ITSM. 
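The key point of steps 5 to 9 is that the incident is resolved as a consequence of the event closing after a successful remediation; if remediation fails, both records stay open. The following Python sketch models that ordering, assuming remediation reports a simple success flag. The `remediate` callable is a hypothetical stand-in for TrueSight Orchestration, and the status strings and work-log text are illustrative, not values used by the actual products. Step 4 (syncing intermediate incident updates back to the event) is omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Event:
    name: str
    status: str = "OPEN"
    notes: List[str] = field(default_factory=list)


@dataclass
class Incident:
    summary: str
    status: str = "Assigned"
    work_log: List[str] = field(default_factory=list)


def triage_and_remediate(event: Event, remediate: Callable[[], bool]) -> Incident:
    # Step 3: convert the event into an incident.
    incident = Incident(summary=event.name)

    # Steps 5-6: hand off to a remediation routine (hypothetical stand-in
    # for TrueSight Orchestration validating and restarting the service).
    succeeded = remediate()
    detail = "Service restarted" if succeeded else "Remediation failed"

    # Step 7: record the remediation details on both records.
    incident.work_log.append(detail)
    event.notes.append(detail)

    # Step 8: close the event only if remediation succeeded.
    if succeeded:
        event.status = "CLOSED"

    # Step 9: the PSR flow reacts to the closed event by resolving the incident.
    if event.status == "CLOSED":
        incident.status = "Resolved"
    return incident
```

Driving step 9 from the event status, rather than from the remediation result directly, mirrors the described flow: the incident follows the event, so a failed remediation leaves the incident open for manual handling.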

Results

As a result of an established PSR integration, an event is converted into an incident that automatically goes through the following incident management stages: 

  • Identification
  • Registration
  • Categorization
  • Prioritization
  • Assessment
  • Escalation
  • Investigation and diagnosis
  • Resolution and recovery
  • Incident closure

Benefits

By establishing PSR, organizations can achieve high reliability, maintainability, and availability of key business services. The following list summarizes the purpose and benefits of each workflow:

Event-based service resolution

Purpose: Suits organizations in the early stages of service management and operations management maturity.

Benefits:
  • Operations has insight into the state of the incidents raised from the corresponding events.
  • No CMDB is required.

Infrastructure-based service resolution

Purpose: Suits organizations with moderate service management, operations management, and CMDB maturity.

Benefits:
  • Operations has insight into the state of the incidents raised from the corresponding events.
  • The service desk gains causal CI awareness for the incidents.
  • CMDB service modeling is optional.
  • BMC Discovery application modeling is optional.

Triage and remediation

Purpose: Suits organizations with a high level of operations management maturity.

Benefits:
  • Enables operations to take remediation actions for events.
  • Introduces policies to automate remediation for known events.
  • Automates the change process when performing remediation.
  • Automates incident updates and closure upon successful remediation.
