Incident governance

Incident governance establishes which process (Event monitoring or Incident management) maintains control over the state and contents of an infrastructure event incident during key stages of its life cycle. Governance is determined by a global setting that takes one of two values, Manage or Update. The default is Manage.

When the global governance is set to Manage, the life cycle of the incident is managed by the event management system. Any values that the service desk agent sets for the status, urgency, and severity fields are overwritten by the event management system.

However, if the global governance is set to Update, then after the incident is moved out of the Assigned state, all subsequent information about the event is recorded in the work details of the incident instead of changing the state of the incident.
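The difference between the two settings can be sketched in a few lines of Python. This is only an illustration of the behavior described above, not product code: the `Incident` class, the `apply_event_update` function, and the field names are hypothetical, and the sketch assumes that under Update the event management system keeps control while the incident is still in the Assigned state.

```python
from dataclasses import dataclass, field

# Fields that the event management system may overwrite (per the text above).
GOVERNED_FIELDS = ("status", "urgency", "severity")


@dataclass
class Incident:
    """Hypothetical, simplified incident record used only for illustration."""
    status: str = "Assigned"
    urgency: str = "Medium"
    severity: str = "Minor"
    work_details: list = field(default_factory=list)


def apply_event_update(incident, governance, event_fields):
    """Apply an event update according to the global governance setting."""
    if governance not in ("Manage", "Update"):
        raise ValueError(f"unknown governance setting: {governance!r}")

    event_controls = governance == "Manage" or incident.status == "Assigned"
    if event_controls:
        # Event management governs: its values overwrite the incident fields.
        for name in GOVERNED_FIELDS:
            if name in event_fields:
                setattr(incident, name, event_fields[name])
    else:
        # The technician governs: the update only enriches the work details.
        incident.work_details.append(dict(event_fields))
    return incident
```

For example, with the setting at Update and an incident already moved to a hypothetical "In Progress" status, an event update of `{"status": "Resolved"}` leaves the status untouched and adds one entry to `work_details`. With the setting at Manage, the same update sets the status to "Resolved".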

The following table describes the characteristics of incident governance under the event management system and under the Incident Management system.

Event Monitoring governance (Global setting: "Incident Governance = Manage") | Incident management governance (Global setting: "Incident Governance = Update")

Incidents generated from an automated event monitoring (event management) system are often initiated and cleared before any human intervention is required. In such scenarios, the event management system must retain governance, or control, over the life cycle of the incident so that resolved events also resolve the incidents that recorded them.

However, note the following:

  • Events captured by the event management system are mainly indicators of issues that may exist in the infrastructure environment. In fact, there can be cases where false alarms occur as the monitoring trigger thresholds are refined.
  • Not all events point directly to the root cause of the issue. These alerts are important, but they do not fully identify the root cause. They are an early warning of an issue that may exist and that requires further investigation.
  • In many cases, well-defined event management systems generate events that identify the exact CI with the issue and also provide information about related critical business or technical services impacted by the CI. However, the issue may also be much broader than the one CI that triggered the event.
  • Not all event management systems send an update to indicate that the CI is back up and running. Most send only the original message that the CI is down, with no subsequent updates.

After an incident is generated, the break-fix process of Incident Management is engaged.

During the investigation of an incident, Incident Management allows you to address other issues in the environment related to the CI that generated the original incident. In such scenarios, the governance, or control, of the incident's life cycle is appropriately managed by the service desk technician who is now working on the broader issue.

Any updates from the event management system about the state of the event are still relevant and important. However, these updates enrich the related incident and do not determine the state of the incident, because the event is only one aspect of the larger issue that is now being worked on.

BMC Service Resolution enables more flexibility in determining the governance of an incident generated by an automated event management system. Incident governance can be managed as follows:

  • For efficiency, if an alert is generated and cleared before the Incident Management break-fix process is engaged, governance of the incident lies with the automated event management system that generated the incident.
  • After a service desk technician takes ownership of an incident by changing the status from Assigned to any other status, governance of the incident can be shifted to the service desk technician. All subsequent updates to the original event are then reflected in the related incident as updated information instead of changing the state of the incident.

The following diagram illustrates how governance of the Incident is shifted from the event management system to Incident Management and the assigned service desk technician.

  1. An alert is generated on one of the core servers because the temp space has reached 75% capacity utilization.
  2. The event management system creates the event.
  3. The service desk interface creates an incident for the event.
  4. Incident management routes and assigns that incident to the service desk technician.
  5. The service desk technician decides to govern the status of the incident.
    For more information, see Governing incidents by changing the status.
  6. Rather than working only on the alert, the service desk technician decides to assess the temp space on other servers as well. The technician finds that the temp space on two other servers has reached 69% and 70% capacity. No alert was generated for these servers because the monitoring threshold is set at 75%. The technician expands the scope of the incident to include both of these servers along with the reported server. The incident is resolved only after the technician clears the temp space on all three servers.
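The arithmetic behind step 6 can be sketched as follows. This is a minimal illustration under stated assumptions, not part of the product: the server names and the 10-point `margin` used to flag servers that are close to the threshold are hypothetical, and only the 75% threshold and the 75%, 69%, and 70% utilization figures come from the scenario.

```python
ALERT_THRESHOLD = 75  # percent, the monitoring threshold from the scenario

# Utilization values from the scenario; the server names are hypothetical.
temp_space_used = {"server-a": 75, "server-b": 69, "server-c": 70}


def alerting_servers(usage, threshold=ALERT_THRESHOLD):
    """Servers whose utilization has reached the threshold and raise an alert."""
    return sorted(name for name, pct in usage.items() if pct >= threshold)


def near_threshold_servers(usage, threshold=ALERT_THRESHOLD, margin=10):
    """Servers below the threshold but within `margin` points of it (assumed margin)."""
    return sorted(
        name for name, pct in usage.items()
        if threshold - margin <= pct < threshold
    )


# The incident scope the technician ends up with: the alerting server
# plus the servers found during the wider assessment.
incident_scope = alerting_servers(temp_space_used) + near_threshold_servers(temp_space_used)
```

Only one server triggers an event, but the technician's scope covers all three. This is why, in this scenario, a clear message for the original event should not by itself resolve the incident.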

Related topics

Governing incidents by changing the status

Key concepts


Comments

  1. Tim Bocardo

    Why is "Manage" the default?  It's clearly not a best practice.

    Jul 06, 2017 03:43