Configuring incident correlation to detect similar incident clusters

After BMC Helix ITSM Insights is activated, Service Desk managers can use the Real-time incident correlation workspace to detect similar incident clusters and view emerging hotspots.

The system uses a set of default fields and settings for the Real-time incident correlation workspace. As a tenant administrator, you can change the incident correlation configuration based on your requirements.

If your admin has updated your permissions in the hierarchical groups in BMC Helix ITSM, you must update the Real-time incident correlation configuration settings to view the clusters relevant to you.

If you have set up custom priority values, you must update the Real-time incident correlation configuration settings (except Similarity threshold) to view the updated custom priority details in the Real-time incident correlation dashboard. The algorithm takes at least six hours to display the newly added custom priority values in the Real-time incident correlation workspace.

Warning

Updating the similarity threshold value or the stop word file triggers the deletion of existing clusters.

Best practice for configuring changes

When you change the incident correlation configuration, all existing clusters are removed from the dashboard and the system performs the analysis again. This action might impact the analysis being carried out by any Service Desk managers or agents who are using the Real-time incident correlation dashboard for analysis.

We recommend making configuration changes for incident correlation during off-hours to minimize the impact.

Out-of-the-box configuration for incident correlation

BMC Helix ITSM Insights uses a set of default fields and settings to display the clusters in the Real-time incident correlation dashboard.

The following table describes this out-of-the-box configuration for incident correlation:

Fields	Default value
Default fields used by the system for incident correlation	Assignee; Assignee - Company (Assigned Support Company); Assignee Support group (Assigned Group); CI (HPD_CI); Calculated priority (Priority); City; Closed Date; Communication coordinator - Company (SV_ComCoord_SupportCompany); Communication coordinator - Support group (SV_ComCoordSGP); Company; Customer site (Site); HPD_CI_ReconID; Impact; Incident Number; Incident type (Service Type); InstanceId; Last Resolved Date; Major incident manager - Company (SV_MIM_Company); Major incident manager - Support group (SV_MIM_SGP); Operational category 1 (Categorization Tier 1); Operational category 2 (Categorization Tier 2); Operational category 3 (Categorization Tier 3); Product Name; Product category 1 (Product Categorization Tier 1); Product category 2 (Product Categorization Tier 2); Product category 3 (Product Categorization Tier 3); Region; Reported Date; Service (Service CI); ServiceCI_ReconID; Site group (Site Group); Status; Status_Reason_Hidden (Status_Reason); Submit Date; Submitter; Summary (Description); Total Time Spent; Urgency
The maximum number of days a cluster can stay open	7 days
Similarity threshold	7
Minimum number of incidents that a cluster should have to be visible in the dashboard	5

Warning

A high volume of data in the Description (Detailed Description) field might cause performance issues while generating clusters in ITSM Insights.

Starting with version 23.3.00, the Description (Detailed Description) field is no longer a mandatory field. If you are already using this field to generate clusters, you can exclude it from the dataset manually.

Best practice
We recommend excluding the Description (Detailed Description) field from the configuration to improve the performance and turnaround time of generating clusters.

To update the configuration

In BMC Helix ITSM Insights, click the Settings icon.
The Settings page is displayed.
Select Real-time incident correlation > Configure.
The Real-time incident correlation configuration page is displayed.
In the Data set section, you can view the data fields being used by the system for the configuration. The fields that you select here appear as filter criteria in the Real-time incident correlation dashboard filter.
Tip
The fields that appear in BMC Helix ITSM display their field labels, system names (in brackets), and often an additional description (in English only) in the data set. Therefore, when you choose among similar fields in the data set for creating clusters, we recommend that you select the field that displays its label, system name, and description. For example, when choosing between CI and CI (HPD_CI), we recommend that you select CI (HPD_CI) because it displays the CI label, the HPD_CI system name, and its description.
In the Create clusters section, specify the parameters based on which incident data is grouped.
See Create clusters for more details.
In the Advanced cluster settings (Machine Learning) section, specify the details for generating clusters.
See Advanced ML for more details.
Upload the stop word file.
See Stop words for more details.
(Optional) Remove personally identifiable information from incident details during clustering.
See Personally identifiable information for more details.
In the Trending and major incident configuration section, specify the criteria for detecting major incidents in the clusters.
See Major incident configuration for more details.
In the Notification & email section, enable the notification feature, and specify the recipient and criteria to receive notifications.
See Notification for more details.
Click Save.

To configure the cluster groups

For the first level of grouping, select up to two fields to group the incidents at the top level for clustering. Only categorization fields, such as service, CI, and company, are available for selection.
Best practice
We recommend grouping by Assignee - Company (Assigned Support Company), Assignee Support group (Assigned Group) and Company to create clusters with all incidents related to your company and assigned group. Availability of all incidents of your assigned group and company helps you manage their relationships effectively.
Select up to five additional field names for matching incidents to be grouped into a cluster. Only text fields are available for selection.
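
The first level of grouping described above can be pictured with a short Python sketch. It is only an illustration of bucketing incidents by up to two categorization fields, not the product's implementation; the dictionary keys (`Company`, `Service`), the sample values, and the function name are hypothetical.

```python
from collections import defaultdict

def group_first_level(incidents, fields):
    """Bucket incidents by the values of one or two categorization fields."""
    if not 1 <= len(fields) <= 2:
        raise ValueError("select one or two first-level grouping fields")
    groups = defaultdict(list)
    for incident in incidents:
        # The grouping key is the tuple of the selected field values.
        key = tuple(incident.get(field) for field in fields)
        groups[key].append(incident)
    return dict(groups)

# Hypothetical sample data for illustration only.
incidents = [
    {"Company": "Acme", "Service": "VPN", "Summary": "VPN timeout"},
    {"Company": "Acme", "Service": "VPN", "Summary": "Cannot reach VPN"},
    {"Company": "Acme", "Service": "Email", "Summary": "Mailbox full"},
]
groups = group_first_level(incidents, ["Company", "Service"])
```

Within each bucket, the additional text fields would then be compared to decide which incidents belong to the same cluster.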

To configure advanced machine learning

Specify the maximum period that a cluster would stay open from the time an incident is last updated.
This window can range from hours to days. The default value is 7 days, which means that clusters that are more than seven days old are automatically deleted. You can set this value to a maximum of 30 days.

Specify the similarity threshold in the slider.
The similarity threshold determines how similar an incident description must be to the description of the original incident, which is the first incident of a cluster. The similarity threshold can be a value between 1 and 10; the default value is 7. The higher the value, the more stringent the similarity test, and therefore the clusters formed are smaller and more cohesive.

View example of similarity threshold

Similarity threshold value	Observation
	A lower similarity threshold value applies a lenient test to match the similarity of incidents for clustering.
	A higher similarity threshold value applies a stringent test to match the similarity of incidents for clustering.

In most cases, the number of incidents in a cluster decreases as you select a higher similarity threshold value.

Best practice
We recommend keeping the similarity threshold at its default value of 7 to generate optimal results.

Specify the minimum number of incidents that a cluster should have, to be shown in the dashboard.
This criterion is used only when creating new clusters. Once created, clusters stay open as long as they have at least one open incident.
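
To see why a higher similarity threshold tends to produce smaller clusters, consider the following Python sketch. It is not the algorithm used by BMC Helix ITSM Insights: the Jaccard word-overlap measure, the mapping of the 1 to 10 slider value to a cutoff of `threshold / 10`, and all function names are assumptions made purely for illustration. The only behavior taken from this topic is that each incident is compared with the first (original) incident of a cluster.

```python
def jaccard(a, b):
    """Word-overlap similarity between two descriptions (assumed measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

def cluster(descriptions, threshold):
    """Greedy clustering: compare each description with the first incident
    of every existing cluster and join the first one that is similar enough."""
    cutoff = threshold / 10  # assumed mapping of the slider value to [0.1, 1.0]
    clusters = []
    for description in descriptions:
        for members in clusters:
            if jaccard(members[0], description) >= cutoff:
                members.append(description)
                break
        else:
            clusters.append([description])
    return clusters

descriptions = [
    "vpn connection timeout error",
    "vpn connection timeout",
    "vpn timeout on laptop",
    "printer out of toner",
]
largest = {t: max(len(c) for c in cluster(descriptions, t)) for t in (3, 7, 10)}
```

In this toy data set, the largest cluster shrinks as the threshold rises, which mirrors the observation in the table above.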

To configure stop words

You can use a regular expression to define stop word patterns, such as a combination of words and sentences, which the algorithm can either remove or extract based on your preference while clustering.

In version 23.3.04 and later, you must upload stop word files in YAML (YAML Ain't Markup Language) format; you can no longer upload them in TXT format. However, existing stop word files in TXT format from previous versions are still supported.

You can download the sample .YAML stop word file and include the following details in it:

List of stop words
Prefix and postfix notations by using wildcards
Patterns of stop words by using regular expressions based on your use case.

The following template examples show how you can define stop word patterns by using regular expressions in a YAML file.

Tip

If incidents contain template-based details, we recommend using a template-based stop word file that includes regular expressions for removing or extracting stop words, as shown in the examples.
However, for incidents that contain simple stop words without any template, such as other, then, and if, you may define the words in the stop_word section of the YAML stop word file for extraction or removal.

Example 1: Using regular expression to remove words and sentences from getting clustered

While generating clusters in the Real-time incident correlation dashboard, you can define patterns by using regular expressions to remove words and sentences from incident details.
This example shows how you can remove words and sentences from a template-based incident.

Template-based Incident details

Reported by: John Smith
Address: 123 Main Street, New York, NY 10001
Email: joe@example.com
Phone: (555) 123-4567
Date of Birth: 07/15/1988
Social Security Number: 123-45-6789
Problem Summary: User unable to connect to the corporate VPN using IP address 192.168.1.101. The VPN access page https://vpn.example.com shows a timeout error after entering credentials.

Template-based Stop word file in YAML

The following stop word file is used to remove the irrelevant details from the incident details while generating clusters:

# Regex section contains regular expressions used for matching patterns in text.
# These can be used for tasks like text extraction and removal.
regex:
  removal:
    # Match email addresses
    - '\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}\b'
    # Match URLs
    - '\b((https?|ftp):\/\/[^\s\/$.?#].[^\s]*)\b'
    # Match phone numbers (US format)
    - '\b(?:\+1)?\s?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b'
    # Match phone number (Indian format)
    - '\b(?:91[-.\s]?)?\d{5}[-.\s]?\d{5}\b'
    # Match IP addresses
    - '\b((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b'
    # Match dates of birth (MM/DD/YYYY)
    - '\b(0[1-9]|1[0-2])\/(0[1-9]|[12][0-9]|3[01])\/\d{4}\b'
    # Match US Social Security numbers (NNN-NN-NNNN)
    - '\b\d{3}-\d{2}-\d{4}\b'

# Wildcards section contains patterns with wildcard characters for flexible matching.
# These are used in scenarios where exact matching is not required, allowing for variability.
wildcards:
  # Match any word that ends with .com
  - '%.com'
# Note: All values in each section must be in between single quotes
# (If any of the stop words, regex, or wildcards contain single quotes; enclose such strings within double quotes)

Output

The email addresses, URLs, phone numbers (US and Indian format), IP addresses, dates of birth, and Social Security numbers are removed from the incident details before clusters are generated.
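
If you want to preview what removal patterns like these do before uploading a file, a short script can help. The following Python sketch applies the removal patterns from this example to the sample incident by using Python's `re` module. It is an assumption that the product's regular expression engine behaves like Python's, and the US phone pattern here writes its optional parentheses as `\(?` and `\)?`, which is an assumption about the intended pattern. The function name is hypothetical.

```python
import re

# Removal patterns from the example stop word file, as Python raw strings.
REMOVAL_PATTERNS = [
    r"\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}\b",      # email addresses
    r"\b((https?|ftp):\/\/[^\s\/$.?#].[^\s]*)\b",                # URLs
    r"\b(?:\+1)?\s?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b",      # phone (US)
    r"\b(?:91[-.\s]?)?\d{5}[-.\s]?\d{5}\b",                      # phone (Indian)
    r"\b((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}"
    r"(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b",                 # IP addresses
    r"\b(0[1-9]|1[0-2])\/(0[1-9]|[12][0-9]|3[01])\/\d{4}\b",     # MM/DD/YYYY
    r"\b\d{3}-\d{2}-\d{4}\b",                                    # SSN
]

INCIDENT = """Reported by: John Smith
Address: 123 Main Street, New York, NY 10001
Email: joe@example.com
Phone: (555) 123-4567
Date of Birth: 07/15/1988
Social Security Number: 123-45-6789
Problem Summary: User unable to connect to the corporate VPN using IP address 192.168.1.101. The VPN access page https://vpn.example.com shows a timeout error after entering credentials."""

def remove_patterns(text, patterns=REMOVAL_PATTERNS):
    """Delete every match of each pattern, in the order listed."""
    for pattern in patterns:
        text = re.sub(pattern, "", text)
    return text

cleaned = remove_patterns(INCIDENT)
print(cleaned)
```

Because the phone pattern starts with `\b`, the opening parenthesis of `(555)` is left behind in this sketch; testing patterns against real incident text in this way can reveal such edge cases early.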

Example 2: Using regular expression to extract specific word and sentences pattern for clustering

While generating clusters in the Real-time incident correlation dashboard, you can define patterns using regular expressions to extract certain words and sentences from incident details.
This example shows how you can extract the details after "My Requests" in the DESCRIPTION section of the template-based incident.

Template-based incident details

Customer Info:
ID: m756871
Name: ABC (ABC-DEMO.COM)
Email: ABC@xyz.com
Business: CACA
Business Group: CACA Asia Pacific
Enterprise: Agricultural Supply Chain
Phone: 011-2689 6767
Manager: QWERRTY
Region: Asia Pacific
Country:
City:

Form Name: TCE - Requests

DESCRIPTION
Create a personalized description to help you locate this ticket in “My Requests”:: Matrix Execution required by 12/6

REQUEST INFORMATION
What do you need help with today?: I need to make a request related to Master Data

Stop word file in YAML

# Stopwords section contains common words that should be excluded from text processing.
# These words are typically considered insignificant for the purpose of analysis.
stop_words:
  - 'Pacific'
  - 'Impact'

# Regex section contains regular expressions used for matching patterns in text.
# These can be used for tasks like validation, searching, or text extraction.
regex:
  removal:
  extraction:
    - '(?<=::).*$'

# Wildcards section contains patterns with wildcard characters for flexible matching.
# These are used in scenarios where exact matching is not required, allowing for variability.
wildcards:
  # Match any word that starts with Parameter
  - 'Parameter%'
  # Match any word that has prod in between
  - '%prod%'
  # Match any word that ends with .com
  - '%.com'

Output

The Matrix Execution required by 12/6 value from the incident is used for clustering.
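
The extraction pattern in this example can be tried out with a few lines of Python. This is a sketch under assumptions: it uses Python's `re` module, it enables `re.MULTILINE` so that `$` matches at the end of each line (whether the product applies the pattern line by line is not stated in this topic), and it trims surrounding whitespace from the result. The function name is hypothetical, and the sample text uses straight quotes in place of the curly quotes in the incident.

```python
import re

EXTRACTION_PATTERN = r"(?<=::).*$"

INCIDENT = """Form Name: TCE - Requests

DESCRIPTION
Create a personalized description to help you locate this ticket in "My Requests":: Matrix Execution required by 12/6

REQUEST INFORMATION
What do you need help with today?: I need to make a request related to Master Data"""

def extract_for_clustering(text, pattern=EXTRACTION_PATTERN):
    """Return the first text matched by the extraction pattern, or None."""
    match = re.search(pattern, text, flags=re.MULTILINE)
    return match.group(0).strip() if match else None

extracted = extract_for_clustering(INCIDENT)
print(extracted)
```

The lookbehind `(?<=::)` anchors the match immediately after the double colon, so the single colon in the REQUEST INFORMATION line does not trigger a match.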

Download the sample .YAML stop word file for reference.
Review the sample stop word file to understand the specified format and create the stop word file for uploading.
Define stop words and their patterns using regular expressions in a .YAML file, and validate the file by using any YAML validator.
Important
While creating a stop word file, you must adhere to the YAML format shown in the examples. See Stop word file examples for creating stop word patterns by using regular expressions.
You can validate your regular expressions at https://regex101.com/ and your YAML file at https://yamlchecker.com/.
Upload the .YAML file that contains your stop words for the recurrent job.

Important

Every time you upload a new stop word file, it overrides the old file and removes the existing clusters. The last updated YAML file is used for creating clusters.
While generating cluster labels in the dashboard from the relevant incidents, the algorithm compares the incident description words with the stop word library. Therefore, the cluster labels do not contain words mentioned in the stop word library.

View the use of % in stop words

The following table describes the usage of % in stop words:

Incident summary	Stop word	Description
ITSMInsights is running low on memory	ITSM%	Removes the stop word ITSM and the characters following it. In this case, ITSMInsights is removed from the resulting cluster label.
ITSMInsights is running low on memory	%Ins%	Removes the stop word Ins and the characters preceding and following it. In this case, ITSMInsights is removed from the resulting cluster label.
ITSMInsights is running low on memory	%Insights	Removes the stop word Insights and the characters preceding it. In this case, ITSMInsights is removed from the resulting cluster label.
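
The behavior in the table can be approximated in Python by translating `%` into "any run of characters" and testing each word of the summary. This is an illustrative sketch only: splitting the summary on whitespace, matching case-sensitively, and the function names are assumptions and do not describe the product's internal logic.

```python
import re

def wildcard_to_regex(stop_word):
    """Translate a stop word with % wildcards into a compiled regex."""
    # Escape the literal parts and let each % match any run of characters.
    return re.compile(".*".join(re.escape(part) for part in stop_word.split("%")))

def remove_wildcard_words(summary, stop_word):
    """Drop every whitespace-separated word that fully matches the stop word."""
    pattern = wildcard_to_regex(stop_word)
    return " ".join(w for w in summary.split() if not pattern.fullmatch(w))

summary = "ITSMInsights is running low on memory"
results = {sw: remove_wildcard_words(summary, sw)
           for sw in ("ITSM%", "%Ins%", "%Insights")}
```

For all three stop words in the table, this sketch drops ITSMInsights and keeps the remaining words of the summary.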

To configure Personally identifiable information

You can enable the Remove Personally Identifiable Information (PII) toggle to remove personally identifiable information from the incident details before clustering.

The following types of personally identifiable information (PII) are removed from incidents:

Name
Phone number
Email
City name
Credit card details
IP address
Address
US Passport number
Social security number
US driver license number

Important

The algorithm works best at removing English-language PII and might fail to detect and remove PII in other languages because other languages are not supported.
When you enable the Remove Personally Identifiable Information (PII) toggle, the existing clusters are removed, and new clusters are generated.

To configure trend and major incident settings

Enter the following details to configure the trend and major incidents in clusters:

Measure trend over last hour(s): Specify the number of hours for which the trend must be calculated. By default, the trend is calculated for the last two hours.
# of incidents in cluster reaches: Set the incident threshold for a cluster. If incidents reach this limit, irrespective of the timeline, the cluster is flagged as containing at least one possible major incident.
Example:
Value in the field 10.
Day1: Cluster is created with 8 incidents.
Day2: 2 more incidents are added to the cluster.
On day 2, the cluster is flagged as containing at least one possible major incident.
# of incidents in trend window increases by: Set the incident threshold for a cluster within a time range (Trend Window). If incidents reach this limit within the defined time range (Trend Window), the cluster is flagged as containing at least one possible major incident.
Example:
Value in the field 30.
Value in Trend window is 1 hour.
The number of incidents in the cluster is 30 within 1 hour. The cluster is flagged as containing at least one possible major incident.
When the number of incidents reduces below 30, the cluster is no longer flagged.
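
The two flagging rules and their examples can be expressed as a small Python sketch. It is a simplified reading of the settings above, not the product's implementation: treating the two rules as alternatives (either one flags the cluster), counting incidents by timestamp, and all names are assumptions.

```python
from datetime import datetime, timedelta

def is_possible_major_incident(incident_times, now, total_threshold,
                               trend_threshold, trend_window_hours=2):
    """Flag a cluster when either threshold is reached.

    incident_times: timestamps of the incidents in the cluster.
    total_threshold: '# of incidents in cluster reaches'.
    trend_threshold: '# of incidents in trend window increases by'.
    trend_window_hours: 'Measure trend over last hour(s)' (default 2).
    """
    window_start = now - timedelta(hours=trend_window_hours)
    in_window = sum(1 for t in incident_times if window_start <= t <= now)
    return len(incident_times) >= total_threshold or in_window >= trend_threshold

now = datetime(2024, 1, 2, 12, 0)

# First example: 8 incidents on day 1, 2 more on day 2, threshold of 10.
day1 = [now - timedelta(days=1, minutes=i) for i in range(8)]
day2 = day1 + [now - timedelta(hours=5), now - timedelta(hours=4)]
flag_day1 = is_possible_major_incident(day1, now, 10, 30)
flag_day2 = is_possible_major_incident(day2, now, 10, 30)

# Second example: 30 incidents within a 1-hour trend window, threshold of 30.
burst = [now - timedelta(minutes=i) for i in range(30)]
flag_burst = is_possible_major_incident(burst, now, 100, 30, trend_window_hours=1)
```

In the second example, removing a single incident from the burst drops the count in the trend window to 29, and the cluster is no longer flagged.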

To configure notification and email settings

Early notification helps major incident managers to assess the impact of the incidents on the overall business even when they are not actively monitoring the dashboard. You can set up notifications for emerging, potential major incidents in Real-time incident correlation clusters. Based on business requirement, you can add other users (all incident assignees, major incident manager of the incident cluster, or any other user) as recipients of the notification.

On the Real-time incident correlation configuration page, in the Notification & email section, turn on the Enable notification for possible major incidents toggle key.

Perform the following actions based on your requirement:

Field	Description
Notify affected incidents assignees	Select this field for notifying all unique assignees of the affected incidents present in the cluster. For example, if a cluster contains 100 incidents, the algorithm finds the unique assignees of the 100 incidents, and sends them a notification.
Notify major incident managers of Affected support companies	Select this field for notifying the major incident managers of the affected support group companies. Every incident in the cluster is associated with a company (and a contact company, if applicable) that is mapped to multiple support groups. Every support group has multiple major incident managers. The algorithm finds the major incident managers for all incidents present in the cluster based on the support groups of the incidents' company. The algorithm then sends the notification to those major incident managers.
Notify major incident managers of Affected incident support groups	Select this field for notifying the potential major incident managers of the support group associated with each incident present in the cluster.
Add recipient	Enter the user name (and support group, if applicable) of the recipients who should receive the notification.
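
The "unique assignees" idea behind the first option can be illustrated with a few lines of Python. The incident structure and the `assignee` key are hypothetical, and the sketch does not reflect how the product stores or resolves assignees; it only shows that each assignee is notified once, however many incidents in the cluster are assigned to them.

```python
def unique_assignees(incidents):
    """Return each assignee once, in order of first appearance."""
    names = (i.get("assignee") for i in incidents)
    # dict.fromkeys removes duplicates while preserving insertion order.
    return list(dict.fromkeys(name for name in names if name))

# Hypothetical cluster of 100 incidents shared among three assignees.
cluster_incidents = [{"assignee": f"agent{i % 3}"} for i in range(100)]
recipients = unique_assignees(cluster_incidents)
```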

Important

You must select or enter a value in at least one of the 3 fields before you can save your changes.

Click Save.

Recipients can select their preferred locale and mode of receiving notification on the CTM:People form. For more details, see Configuring notifications for people records.

BMC Helix ITSM Insights 25.2
