Consider the following best practices before implementing or using BMC Service Resolution.
Assess your current level of capability
Start by assessing your level of capability, which is based on the maturity of your ticketing system. You might be using event-based service resolution, infrastructure-based service resolution, or an impact-model-based ticketing system to manage your incident life cycle. In addition to the level of ticketing and the software, hardware, and products (and their versions) that your system uses, consider your organizational structure and its people when assessing your level of capability.
The organizational structure largely determines who gets engaged and who gets notified when an event occurs. It also determines how incidents are routed to a specific group based on the CI, which in turn depends on the operational categories populated in the CI. Plan, coordinate, and configure the right routing rules for your organizational structure.
Evolve! Don't skip.
A common mistake that organizations make when attempting to implement Level 3 is trying to jump directly from Level 0 to Level 3. An evolution needs to occur so that the required components at each level function effectively and efficiently in their respective areas. As your organization matures, you expand the coverage of event types and enrich the content and context of the incidents generated for these events.
You start service resolution in your organization by getting connected at the event level. Next, you expand to the infrastructure level, where you take the organization's CIs into consideration. You then expand further to leverage the business model, which helps establish a line of sight from issues in the infrastructure to the business services or applications impacted by those issues.
- Level 1 is an effective starting point for BMC Service Resolution, even if you are already doing some base-level service resolution.
Starting with Level 1 does not mean that you stay at the same level and never move. It means that how long you stay at this level depends entirely on the maturity and capability of your organization. At this level, the communication channel between event monitoring and incident management is established, which ensures that when an event occurs, an incident is created for that event. If you do not have this channel established, jumping to Level 2 and Level 3 can be more complicated.
- After Level 1 is established, you can then easily and readily move to Level 2.
However, moving to Level 2 requires a CMDB that is populated by automated discovery, which is a discipline in itself. If this discipline or capability is not in place, not managed, or not supported, you must first perform an additional set of activities to establish it. You can then leverage the CMDB to achieve Level 2 results by defining CIs and publishing them or otherwise making them available to your event correlation system. The purpose of Level 2 is to enrich incidents with better and richer content.
- Level 3 is the next logical step after you have established reliable, consistent, and good processes for supporting Level 1 and Level 2 type incidents.
Level 3 is about leveraging the context or the model. Reaching Level 3 requires a further level of maturity, because you start working with service models. You cannot achieve Level 3 if you are not on Level 2. Similarly, you cannot achieve Level 2 if you are not on Level 1.
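The dependency between levels can be expressed as a simple readiness check. The following is a minimal sketch, not part of BMC Service Resolution: the `Capabilities` class, its field names, and the `highest_achievable_level` function are hypothetical and only illustrate that a missing prerequisite at a lower level blocks every level above it.

```python
from dataclasses import dataclass


@dataclass
class Capabilities:
    """Hypothetical self-assessment flags; the field names are illustrative only."""
    event_to_incident_channel: bool  # Level 1: events create incidents
    cmdb_with_discovery: bool        # Level 2: CMDB populated by automated discovery
    service_model: bool              # Level 3: service or impact model available


def highest_achievable_level(caps: Capabilities) -> int:
    """Return the highest level whose prerequisites, and all lower ones, are met."""
    level = 0
    for met in (caps.event_to_incident_channel,
                caps.cmdb_with_discovery,
                caps.service_model):
        if not met:
            break  # a missing lower level blocks every level above it
        level += 1
    return level
```

For example, an organization that has a service model but no discovery-populated CMDB would still be assessed at Level 1 by this check.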
Establish baseline metrics
Before you start any activity for BMC Service Resolution, review your existing environment and identify the metrics that are important to you. You must establish baseline metrics so that you can demonstrate progress and improvement or, if there is none, identify and understand the failure points. Metrics help you assess your system's performance. As a customer, you determine which metrics are most relevant to track, for example, mean time to resolution and mean time between failures.
Other things to consider:
- Monitor your incidents: how many are reported by customers versus how many are auto-detected by the monitoring system
- Monitor the incident-to-event ratio
- Identify your Severity 1 incidents, measure them, and create a graph or chart that captures the reported time and the final resolution time
- Find out how long it took, on average, to resolve the Severity 1 incidents over the past months
Always gather this information before you start the service resolution activity, because you need the baseline as a reference for later use. The baseline helps you confirm that you are getting the improvement in results that you were looking for. After a certain period (for example, six months), revisit your baseline metrics and establish new baselines. This helps you further improve your system's processes and efficiency.
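As an illustration, the baseline figures listed above could be computed from exported incident records along the following lines. This is a minimal sketch under stated assumptions: the record fields (`severity`, `source`, `reported`, `resolved`), the sample values, and the `baseline` function are hypothetical and do not reflect a BMC schema or API.

```python
from datetime import datetime
from statistics import mean

# Hypothetical sample records; field names and values are illustrative only.
incidents = [
    {"severity": 1, "source": "monitoring",
     "reported": datetime(2024, 1, 3, 9, 0), "resolved": datetime(2024, 1, 3, 13, 0)},
    {"severity": 1, "source": "customer",
     "reported": datetime(2024, 1, 10, 8, 0), "resolved": datetime(2024, 1, 10, 10, 0)},
    {"severity": 3, "source": "monitoring",
     "reported": datetime(2024, 1, 12, 14, 0), "resolved": datetime(2024, 1, 13, 14, 0)},
]
event_count = 12  # hypothetical number of events in the same period


def baseline(incidents, event_count):
    """Summarize the baseline metrics for one reporting period."""
    # Hours from reported time to resolution time for Severity 1 incidents.
    sev1_hours = [
        (i["resolved"] - i["reported"]).total_seconds() / 3600
        for i in incidents
        if i["severity"] == 1
    ]
    auto_detected = sum(1 for i in incidents if i["source"] == "monitoring")
    return {
        "sev1_mean_hours_to_resolve": mean(sev1_hours) if sev1_hours else None,
        "auto_detected": auto_detected,
        "customer_reported": len(incidents) - auto_detected,
        "incidents_per_event": len(incidents) / event_count if event_count else None,
    }


print(baseline(incidents, event_count))
```

Storing the returned dictionary with a date makes it straightforward to compare against a new baseline when you revisit the metrics later.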
Stay in your respective swim lane of responsibility
BMC recommends that when you use BMC Service Resolution, the event monitoring and incident management applications stay true to and focused on their roles in the solution. Although you might be tempted to make customizations that assign a group in the event management application, this is not the function of event management. The incident routing activity should primarily occur in the Incident Management application. Event enrichment and preprocessing activities should occur in the event management application.
Refer to the following table that lists the activities and role of each application in the solution.
|Activity|Application|
|---|---|
|Configuring routing rules|Incident Management|
|Group assignment|Incident Management|
|Product Categorization|Incident Management|
|Event correlation and enrichment|Event Monitoring|
|Generating a Causal Incident|Event Monitoring|
|Service modeling and data configuration|CMDB|
Always establish the default assignment group for routing
You must always define a default assignment group for routing. If you have missed defining routing rules and no default assignment group is defined either, the incident is not created, because incident creation requires either a routing rule or a default group to route the incident. An incident should be routed to the default assignment group only in the absence of routing rules; routing every incident to the default group should not be the norm. Use the default group as a safety net that guarantees incident creation. See Incident routing.
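The fallback behavior described above can be sketched as follows. This is an illustrative model only, not the product's routing implementation: the `ROUTING_RULES` table, the group names, the choice to key rules by operational category, and the `resolve_assignment_group` function are all assumptions made for the example.

```python
from typing import Optional

# Hypothetical routing rules keyed by a CI's operational category.
ROUTING_RULES = {
    "Network": "Network Operations",
    "Database": "DBA Team",
}


def resolve_assignment_group(category: Optional[str],
                             default_group: Optional[str]) -> Optional[str]:
    """Pick the assignment group for an incident.

    A matching routing rule wins. The default group is used only when no
    rule matches. A return value of None models the case in which the
    incident cannot be created because neither is available.
    """
    group = ROUTING_RULES.get(category)
    if group is not None:
        return group
    return default_group


print(resolve_assignment_group("Network", "Service Desk"))
print(resolve_assignment_group("Storage", "Service Desk"))
print(resolve_assignment_group("Storage", None))
```

In this sketch, the third call returns `None`, which corresponds to the situation the default assignment group is meant to prevent.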
Reconcile customizations, processes, and data
It is important that you review, assess, and reconcile your processes and data before implementing BMC Service Resolution, because the way you performed certain processes in earlier versions might be affected by what you do in the new release of BMC Service Resolution.
Assess your customizations and ensure that customizations made in an earlier version of a product or solution are reconciled in advance, so that upgrading to a newer version is smooth and does not result in loss of data. For example, starting with BMC Service Resolution 3.0, the INT:Staging form is no longer used. You should reconcile customizations made on the INT:Staging form before implementing BMC Service Resolution 3.0. You should also plan to deprecate customizations that are no longer needed.
Consider a scenario for reconciling your processes. In earlier releases of BMC Service Resolution, you could not consolidate related events in one incident automatically; it required manual intervention. You might have had a process that required a person to analyze related events and add them to the incident work log. This manual process is no longer needed in BMC Service Resolution 3.0, where related events are consolidated in one incident by the system. See Consolidating events.
Consider a scenario for reconciling data. You might have configured the Supported By group as the assignment group for a CI. Prior to BMC Intelligent Ticketing 2.0, this assignment was used only for documentation. The system did not route the incident to the Supported By group by looking at the details on the assignment form. However, in BMC Service Resolution 3.0, if you specify a Supported By group for a CI and an event with that CI results in an incident, the incident is routed to the Supported By group specified in the CI. Before implementing BMC Service Resolution 3.0, you need to assess and reconcile your data if you need this data for the automation to work as expected.
Incident management processes vs Event monitoring processes
When you design the impact model for an incident, do not simply replicate the model that event monitoring uses for the CI infrastructure. Event monitoring and incident management differ fundamentally in how they look at the model, assess the impact, generate incidents, and route them to the assigned group for resolution. Be aware of these differences between incident processes and event processes, and do not customize your system to align the two models with each other. The goal is to enable collaboration between the event monitoring process and the incident management process, not to create a one-to-one mapping between the two processes to convert an event into an incident.