- Click Manage Job.
- On the Proactive problem management settings page, click configure one-time job.
- In the General section, enter the job name and select the language for the incident text processing.
Even if your incident text contains multiple languages, preprocessing is based on the selected language.

- In the Data Set section, you can specify the data fields on which you can create clusters and filter data. Select the fields on which you want to create clusters.

See Data set filters for more details.
- In the Data range section, specify the date range that the job should use to search incident data.
See Data range for more details.
- In the Create clusters section, specify the parameters by which the data is grouped.
See Create clusters for more details.
- In the Advanced machine learning section, specify the number of clusters displayed in the dashboard.
See Advanced machine learning for more details.
- Upload stop word file.
See Stop words for more details.
- (Optional) Remove personally identifiable information from incident details during clustering.
See Personally identifiable information for more details.
- In the Resolution insights section, enter the parameters of Resolution insights.
See Resolution insights for more details.
- Click Run now.
A job might take several minutes to complete, depending on the incident data to be processed. Refresh the jobs table to check if the job is completed.
When the job run is completed, the Jobs table displays the job status. If the job run was not successful, the Job message column displays the reason why the job failed. After the job runs successfully, you can select it from the Jobs list in the dashboard to view the clusters.
Important
- Depending on the number of records or incidents and the kind of incident data, the number of clusters on the dashboard could be fewer than the number you have specified.
- Running a job multiple times with the same parameters may generate a different number of clusters. You may notice a slight difference in the number of clusters generated under the following conditions:
  - When you use Group by fields and let the system generate the ideal number of clusters.
  - When you use Group by fields and provide a cluster value.
- The algorithm groups incidents that have similar words in their description and avoids generating one-word cluster labels in the dashboard. Therefore, if an incident has a one-word description, the algorithm finds an existing cluster that has incidents with that word in its description sentence and adds the incident to that cluster.
To view the required system fields, select the Show fields required by the system check box. Except for Submitter, system fields cannot be removed. Some fields may be hidden from the data set by your admin to comply with privacy regulations.
Tip
The fields that appear in BMC Helix ITSM display their field labels, system names (in brackets), and often display their additional description (in English only) in the data set. Therefore, when you choose amongst similar fields in the data set for creating clusters, we recommend you select the field that displays its label, system name and description. For example, while choosing between CI and CI(HPD_CI), we recommend you select CI(HPD_CI) because it displays the CI label, HPD_CI system name and its description as
.
You can specify filters on the fields to further refine your data set.
Click the required filter category and select one of the following options:
| |
---|
| Select this option to include values in the filter. |
| Select this option to exclude values from the filter. |
- Search and select the field value that you want to include or exclude.
Click Apply filters.
Note
- Searching for a string without a wildcard (%) is not supported in a filter that has a text field. We recommend using a wildcard (%) for a search in such filters.
- The Equals to and Not equals to options appear only in fields with a character menu, such as Service CI.
In the Data range date field, enter the date range within which you want to search for incidents.
Best practice
Define your date range based on the problem management process and review cycles for problem identification in your organization. Typically, the date range is the previous 1-4 weeks of data.
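As a rough illustration of the best practice above, the following Python sketch computes a start and end date that cover the previous N weeks. The function name `previous_weeks_range` is hypothetical and is not part of the product. It only shows the date arithmetic you might do before entering the values in the Data range section.

```python
from datetime import date, timedelta


def previous_weeks_range(weeks, today=None):
    """Return (start, end) dates covering the previous `weeks` weeks.

    Hypothetical helper for illustration only; the product UI takes the
    two dates directly.
    """
    if weeks < 1:
        raise ValueError("weeks must be at least 1")
    end = today or date.today()
    start = end - timedelta(weeks=weeks)
    return start, end


# Example: a four-week window ending on a fixed review date.
start, end = previous_weeks_range(4, today=date(2024, 3, 29))
print(start.isoformat(), end.isoformat())
```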

- For the first level of grouping, select up to three fields in Group by (max 3) to group the incidents at the top level for clustering.
You can select only the categorization fields that are selected in the Data set section, such as Status and Priority.
- For matching incidents to be grouped into a cluster, select up to five additional field names in Inputs for machine learning.
You can select only the text fields that are selected in the Data set section. If no field is selected in Group by (max 3), it is mandatory to select at least one field in Inputs for machine learning for clustering.
Summary is the default text field used for clustering. If more than one text field is provided, these fields are concatenated into one field.

View how existing group by configurations appear after the 23.3.03 update
| Group by configuration in existing job | Group by configuration after 23.3.03 update |
|---|---|
| Only Machine learning is selected in group by (level 1). | The Machine learning option is removed from Group by (level 1). The Inputs for machine learning field appears by default. |
| Categorical field is selected in level 1. Text field is selected as input for machine learning in level 2. | Group by (level 2) is removed. The Inputs for machine learning field appears by default. |
| Categorical field is selected in level 1. Categorical field is selected in level 2. | Fields in the Group by (level 2) appear in Group by (max 3). You can select up to 3 fields in Group by (max 3). Summary (Description) is selected by default for machine learning. |
Important
If you do not select any text field in Inputs for machine learning, the algorithm groups clusters based on categorical fields.
If you select text fields in Inputs for machine learning, the algorithm uses machine learning to group clusters based on text fields.
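The selection rules in this section can be sketched in Python as follows. This is a minimal illustration, not the product's implementation. The function `prepare_groups`, the dictionary-based incident records, and the single-space separator used for concatenation are all assumptions. The sketch enforces the limits described above (up to three Group by fields, up to five text inputs, at least one text input when no Group by field is selected), groups incidents by the categorical fields, and concatenates the selected text fields into one string per incident.

```python
from collections import defaultdict


def prepare_groups(incidents, group_by, ml_inputs):
    """Group incidents by categorical fields and concatenate text inputs.

    Hypothetical sketch: field limits follow the documentation, the
    space separator for concatenation is an assumption.
    """
    if len(group_by) > 3:
        raise ValueError("Group by accepts at most 3 fields")
    if len(ml_inputs) > 5:
        raise ValueError("Inputs for machine learning accepts at most 5 fields")
    if not group_by and not ml_inputs:
        raise ValueError("Select at least one text field when Group by is empty")

    groups = defaultdict(list)
    for incident in incidents:
        # The top-level key is the tuple of categorical values.
        key = tuple(incident.get(field, "") for field in group_by)
        # Multiple text fields are concatenated into one field.
        text = " ".join(str(incident[f]) for f in ml_inputs if incident.get(f))
        groups[key].append(text)
    return dict(groups)


incidents = [
    {"Status": "Open", "Priority": "High", "Summary": "VPN timeout", "Notes": "after login"},
    {"Status": "Open", "Priority": "High", "Summary": "VPN down"},
    {"Status": "Closed", "Priority": "Low", "Summary": "Printer jam"},
]
groups = prepare_groups(incidents, ["Status", "Priority"], ["Summary", "Notes"])
print(groups)
```

In a real job, the concatenated text of each group would then be passed to the clustering step. That step is not shown because its details are not described here.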
To allow the system to set the number of clusters, click the Let the system set the number of clusters check box.
Best practice
When selecting the number of clusters, we recommend selecting the Let the system find no. of clusters check box rather than setting the number of clusters yourself. The system automatically selects an optimal number of clusters. However, when you know the number of clusters from prior execution runs or domain knowledge, you can specify a value to improve the response time. We recommend setting 20-30 clusters for optimal incident monitoring.

You can use regular expressions to define stop word patterns, such as a combination of words and sentences, which the algorithm can either remove or extract based on your preference while clustering.
In version 23.3.04 and later, new jobs support stop word files only in YAML (YAML Ain't Markup Language) format. However, jobs created in earlier releases still support stop word files in TXT format.
You can download the sample .YAML stop word file and include the following details in it:
- List of stop words
- Prefix and postfix notations by using wildcards
- Patterns of stop words by using regular expressions based on your use case.

The following examples show how you can define stop word patterns using regular expressions in a YAML file:
Tip
If incidents contain template-based details, we recommend using a template-based stop word file that includes regular expressions for removing or extracting stop words, as shown in the examples.
However, for incidents that contain simple stop words without any template, such as other, then, and if, you may define the words in the stop_word section of the YAML stop word file for extraction or removal.
Example 1: Using regular expressions to exclude words and sentences from clustering
While generating clusters in the Proactive problem management dashboard, you can define patterns by using regular expressions to remove words and sentences from incident details.
This example displays how you can remove words and sentences from the template-based incident.
Template-based Incident details
Reported by: John Smith
Address: 123 Main Street, New York, NY 10001
Email: joe@example.com
Phone: (555) 123-4567
Date of Birth: 07/15/1988
Social Security Number: 123-45-6789
Problem Summary: User unable to connect to the corporate VPN using IP address 192.168.1.101. The VPN access page https://vpn.example.com shows a timeout error after entering credentials.
Template-based Stop word file in YAML
The following stop word file is used to remove the irrelevant details from the incident details while generating clusters:
# Regex section contains regular expressions used for matching patterns in text.
# These can be used for tasks like text extraction and removal.
regex:
  removal:
    # Match email addresses
    - '\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}\b'
    # Match URLs
    - '\b((https?|ftp):\/\/[^\s\/$.?#].[^\s]*)\b'
    # Match phone numbers (US format)
    - '\b(?:\+1)?\s?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b'
    # Match phone number (Indian format)
    - '\b(?:91[-.\s]?)?\d{5}[-.\s]?\d{5}\b'
    # Match IP addresses
    - '\b((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b'
    # Match dates of birth (MM/DD/YYYY)
    - '\b(0[1-9]|1[0-2])\/(0[1-9]|[12][0-9]|3[01])\/\d{4}\b'
    # Match Social Security numbers (NNN-NN-NNNN)
    - '\b\d{3}-\d{2}-\d{4}\b'
# Wildcards section contains patterns with wildcard characters for flexible matching.
# These are used in scenarios where exact matching is not required, allowing for variability.
wildcards:
  # Match any word that ends with .com
  - '%.com'
# Note: All values in each section must be in between single quotes
# (unless any of the stop words, regex, or wildcards contain single quotes; enclose such strings within double quotes)
Output
The email addresses, URLs, phone numbers (US and Indian format), IP addresses, dates of birth, and Social Security numbers are removed from incident details before generating clusters.
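To check removal patterns before you upload a stop word file, you can try them locally. The following Python sketch applies the removal expressions from Example 1 to the sample incident by using the standard `re` module. It is only an approximation: how the product applies the patterns internally (order, flags, regex engine) is not documented here, and the `remove_patterns` function is a hypothetical helper.

```python
import re

# Patterns copied from the removal section of the Example 1 stop word file.
REMOVAL_PATTERNS = [
    r'\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}\b',    # email addresses
    r'\b((https?|ftp):\/\/[^\s\/$.?#].[^\s]*)\b',              # URLs
    r'\b(?:\+1)?\s?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b',     # US phone numbers
    r'\b(?:91[-.\s]?)?\d{5}[-.\s]?\d{5}\b',                    # Indian phone numbers
    r'\b((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}'
    r'(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b',               # IP addresses
    r'\b(0[1-9]|1[0-2])\/(0[1-9]|[12][0-9]|3[01])\/\d{4}\b',   # dates (MM/DD/YYYY)
    r'\b\d{3}-\d{2}-\d{4}\b',                                  # Social Security numbers
]

INCIDENT = """Reported by: John Smith
Address: 123 Main Street, New York, NY 10001
Email: joe@example.com
Phone: (555) 123-4567
Date of Birth: 07/15/1988
Social Security Number: 123-45-6789
Problem Summary: User unable to connect to the corporate VPN using IP address 192.168.1.101. The VPN access page https://vpn.example.com shows a timeout error after entering credentials."""


def remove_patterns(text, patterns=REMOVAL_PATTERNS):
    """Delete every match of each pattern, applied in list order (assumed)."""
    for pattern in patterns:
        text = re.sub(pattern, '', text)
    return text


cleaned = remove_patterns(INCIDENT)
print(cleaned)
```

In this sketch, the name and street address remain in the text because none of the listed expressions targets them. The field labels, such as Email: and Phone:, also remain because the patterns match only the values.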
Example 2: Using regular expressions to extract specific word and sentence patterns for clustering
While generating clusters in the Proactive problem management dashboard, you can define patterns by using regular expressions to extract certain words and sentences from incident details.
This example displays how you can extract the details after "My requests" in the DESCRIPTION section of the template-based incident.
Template-based incident details
Customer Info:
ID: m756871
Name: ABC (ABC-DEMO.COM)
Email: ABC@xyz.com
Business: CACA
Business Group: CACA Asia Pacific
Enterprise: Agricultural Supply Chain
Phone: 011-2689 6767
Manager: QWERRTY
Region: Asia Pacific
Country:
City:
Form Name: TCE - Requests
DESCRIPTION
Create a personalized description to help you locate this ticket in “My Requests”:: Matrix Execution required by 12/6
REQUEST INFORMATION
What do you need help with today?: I need to make a request related to Master Data
Stop word file in YAML
# Stopwords section contains common words that should be excluded from text processing.
# These words are typically considered insignificant for the purpose of analysis.
stop_words:
  - 'Pacific'
  - 'Impact'
# Regex section contains regular expressions used for matching patterns in text.
# These can be used for tasks like validation, searching, or text extraction.
regex:
  removal:
  extraction:
    - '(?<=::).*$'
# Wildcards section contains patterns with wildcard characters for flexible matching.
# These are used in scenarios where exact matching is not required, allowing for variability.
wildcards:
  # Match any word that starts with Parameter
  - 'Parameter%'
  # Match any word that has prod in between
  - '%prod%'
  # Match any word that ends with .com
  - '%.com'
Output
The Matrix Execution required by 12/6 value from the incident is used for clustering.
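The extraction pattern from Example 2 can be tried locally in the same way. The following Python sketch is a minimal illustration that assumes the pattern is evaluated one line at a time. That is an assumption: in Python, `$` without the `re.MULTILINE` flag anchors only at the end of the whole string, so the pattern would not match inside a multi-line incident. The `extract_lines` function is a hypothetical helper.

```python
import re

# Extraction pattern copied from the Example 2 stop word file:
# everything that follows a double colon, up to the end of the line.
EXTRACTION_PATTERN = r'(?<=::).*$'

INCIDENT = '''DESCRIPTION
Create a personalized description to help you locate this ticket in “My Requests”:: Matrix Execution required by 12/6
REQUEST INFORMATION
What do you need help with today?: I need to make a request related to Master Data'''


def extract_lines(text, pattern=EXTRACTION_PATTERN):
    """Return the stripped match from every line that contains one."""
    matches = []
    for line in text.splitlines():
        found = re.search(pattern, line)
        if found:
            matches.append(found.group(0).strip())
    return matches


extracted = extract_lines(INCIDENT)
print(extracted)  # ['Matrix Execution required by 12/6']
```

The last line of the sample contains a single colon only, so the lookbehind for a double colon does not match and the line is skipped.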
- Download the sample .YAML stop word file for reference.
Review the sample stop word file to understand the specified format and create the stop word file for uploading.

Define stop words and their patterns using regular expressions in a .YAML file, and validate the file using any YAML validator.
- Upload the .YAML file that contains your stop words for the recurrent job.
Important
- You can continue using the stop word file that was uploaded in .TXT format in the previous releases. However, in version 23.3.04 and later, you must use a YAML file to define stop words and their patterns.
- Every time you upload a new stop word file, it overrides the old file. The last updated YAML file is used for creating clusters.
- While generating cluster labels in the dashboard from the relevant incidents, the algorithm compares the incident description words with the stop word library. Therefore, the cluster labels do not contain words mentioned in the stop word library.
Proactive problem management has a built-in library of stop words for every supported language. The algorithm refers to the library and your preferred stop words while processing incident information.
View the use of % in stop words
The following table describes the usage of % in stop words:
| Incident description | Stop word | Result |
|---|---|---|
| ITSMInsights is running low on memory | ITSM% | Removes the stop word ITSM and the characters following it. In this case, ITSMInsights is removed from the resulting cluster label. |
| ITSMInsights is running low on memory | %Ins% | Removes the stop word Ins and the characters preceding and following it. In this case, ITSMInsights is removed from the resulting cluster label. |
| ITSMInsights is running low on memory | %Insights | Removes the stop word Insights and the characters preceding it. In this case, ITSMInsights is removed from the resulting cluster label. |
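The behavior of % described above can be approximated with a short Python sketch that translates a stop word with a leading or trailing % into a regular expression and removes the whole matching word. The functions `wildcard_to_regex` and `remove_stop_words` are hypothetical, and the sketch assumes case-sensitive matching on whitespace-delimited words. The product's actual matching rules may differ.

```python
import re


def wildcard_to_regex(stop_word):
    """Translate a stop word with optional leading/trailing % to a regex.

    A leading % allows any characters before the stop word within the
    same word; a trailing % allows any characters after it.
    """
    prefix = r'\S*' if stop_word.startswith('%') else r'(?<!\S)'
    suffix = r'\S*' if stop_word.endswith('%') else r'(?!\S)'
    core = re.escape(stop_word.strip('%'))
    return re.compile(prefix + core + suffix)


def remove_stop_words(text, stop_words):
    """Remove every word matched by a stop word and normalize spacing."""
    for stop_word in stop_words:
        text = wildcard_to_regex(stop_word).sub('', text)
    return ' '.join(text.split())


description = "ITSMInsights is running low on memory"
for stop_word in ["ITSM%", "%Ins%", "%Insights"]:
    print(stop_word, "->", remove_stop_words(description, [stop_word]))
```

Each of the three stop words removes ITSMInsights from the description. A stop word such as Ins% would leave the description unchanged in this sketch, because no word starts with Ins.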
You can enable the Remove Personally Identifiable Information (PII) toggle to remove personally identifiable information from the incident details before they are clustered.
The following personally identifiable information (PII) is removed from incidents:
- Name
- Phone number
- Email
- City name
- Credit card details
- IP address
- Address
- US Passport number
- Social security number
- US driver license number
Important
- The algorithm works best at removing PII from English text. It may fail to detect and remove PII in other languages because other languages are not supported.
- When you enable the Remove Personally Identifiable Information (PII) toggle, the existing clusters are removed, and new clusters are generated.
- Click the Enable toggle key, and select the following parameters to derive accurate resolution notes for incidents.
- Select the source field name of the incident in ITSM from which the algorithm derives the resolution note.
By default, Resolution note (Resolution) is selected.
Tip
If you want to select a custom textual field as the source field, you must first add it to the data set.
- Select the minimum number of incidents required in a cluster for the algorithm to generate the resolution insight details.
By default, at least five incidents must be present in a cluster to generate the resolution insights details.
- Select the maximum number of resolution insights clusters to be displayed in the drill-down view.
A maximum of 25 resolution insights clusters can be displayed in the drill-down view.

Important
When you run a job with resolution insights enabled, the job runs in two iterations. The first iteration of the job run generates Proactive problem management clusters in the dashboard and the second iteration generates the resolution insights clusters. We recommend you wait until the second job run iteration is complete before viewing the Grouped by resolution insights tab.
To edit a recurrent job, click the edit icon in the Actions column. Make the necessary changes to the job.
Your changes take effect in the next job run.
To delete a recurrent job, click the delete icon in the Actions column.
When you delete a job, the job definition (the data fields and the filters applied) and all job runs associated with that job are also deleted.