How incidents are clustered and resolution insights are derived in BMC Helix ITSM Insights
How are the clusters created
For proactive problem management in BMC Helix ITSM Insights, a clustering algorithm is applied to incidents to identify clusters that are candidates for problems. BMC Helix ITSM Insights uses a patented algorithm based on K-means to perform clustering. This machine learning algorithm groups incidents by the similarity of the text in their Description or Summary field, even if the text does not match exactly. You can select the fields on which clustering is performed on the job configuration page.
The information from the incident text fields is extracted to find similarities.
The following techniques are applied to extract and pre-process incident text data:
- Tokenization: The phrases or sentences are split into smaller units, such as individual words or terms. Each of these smaller units is called a token.
- Stop word removal: A stop word is a commonly used word (such as “the”, “a”, “an”, “in”) that you would like to ignore when running machine learning algorithms. When you select the language, the system automatically adds a stop word list for that language. You can also upload a list of custom stop words to be filtered out before text processing.
- Stemming: The words are reduced to their root forms by removing any affixes. Example: For words such as failing, fails, failed, and failure, the root form is fail.
- Lemmatization: Inflectional and related forms of a word are reduced to a common base form.
- NLP for vectorization: Words or phrases are mapped to a corresponding vector of real numbers.
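The pre-processing techniques above can be illustrated with a minimal Python sketch. It is not the product's implementation: the stop word list, the suffix list, and the function names are assumptions made for illustration, the stemmer is a toy suffix stripper, and vectorization is a plain word count. Lemmatization is left out because it needs a dictionary of word forms.

```python
import re
from collections import Counter

# Hypothetical stop word list; the product supplies a list per language.
STOP_WORDS = {"the", "a", "an", "in", "on", "my", "to", "be", "is", "please"}

# Toy suffix list for illustration only; real stemmers apply many more rules.
SUFFIXES = ("ing", "ure", "ed", "s")

def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def remove_stop_words(tokens, extra_stop_words=()):
    """Drop built-in and custom stop words."""
    stop = STOP_WORDS | set(extra_stop_words)
    return [t for t in tokens if t not in stop]

def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text, extra_stop_words=()):
    """Tokenize, remove stop words, then stem."""
    return [stem(t) for t in remove_stop_words(tokenize(text), extra_stop_words)]

def vectorize(tokens, vocabulary):
    """Map tokens to a vector of term counts over a fixed vocabulary."""
    counts = Counter(tokens)
    return [float(counts[term]) for term in vocabulary]
```

With this sketch, preprocess("Please reset my password on VPN") keeps only the tokens reset, password, and vpn, and failing, fails, failed, and failure all stem to fail.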
For example, incidents with descriptions such as Please reset my password on VPN, VPN password needs to be reset, and VPN password reset issue are clustered together into a cluster called VPN-password-reset, as shown in the following diagram.
After the algorithm is run, clusters appear on the dashboard.
For example, you might have clusters named VPN-connectivity-fail or VPN-password-reset and so on with each cluster having a set of incidents that are closely related to each other.
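To make the idea of grouping similar incidents concrete, here is a minimal sketch of textbook K-means on word-count vectors. The product uses a patented algorithm based on K-means, so this plain version only illustrates the general principle. The token lists, the farthest-point initialization, and all function names are assumptions chosen to keep the example small and deterministic.

```python
import math

def distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(points):
    """Component-wise mean of a list of vectors."""
    return [sum(column) / len(points) for column in zip(*points)]

def init_centroids(points, k):
    """Start from the first point, then repeatedly add the farthest point."""
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(distance(p, c) for c in centroids))
        centroids.append(farthest)
    return centroids

def kmeans(points, k, max_iter=100):
    """Return one cluster label per point."""
    centroids = init_centroids(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda i: distance(p, centroids[i])) for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                centroids[i] = mean(members)
    return labels

# Already pre-processed incident texts (hypothetical data).
incident_tokens = [
    ["reset", "password", "vpn"],
    ["vpn", "password", "need", "reset"],
    ["vpn", "password", "reset", "issue"],
    ["vpn", "connect", "fail"],
    ["connect", "vpn", "connect", "fail"],
]
vocabulary = sorted({t for tokens in incident_tokens for t in tokens})
vectors = [[float(tokens.count(term)) for term in vocabulary] for tokens in incident_tokens]
labels = kmeans(vectors, k=2)
```

In this example the three password-reset incidents receive one label and the two connectivity incidents receive the other, even though no two texts match exactly.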
How are the clusters named
A machine learning technique called topic modeling is used to automatically determine the name of each cluster. The algorithm finds the most important and most frequent words in the text of all the incidents in a cluster and derives a three-part name for the cluster, such as platform-restart-container or VPN-password-reset.
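As a rough illustration of the naming step, the following sketch builds a three-part name from the most frequent tokens in a cluster. It is a simple frequency count, not a topic modeling algorithm, and the function name, tie-breaking rule, and sample data are assumptions. The product's actual word selection and ordering can differ.

```python
from collections import Counter

def name_cluster(token_lists, parts=3):
    """Join the most frequent tokens of a cluster into a hyphenated name."""
    counts = Counter(token for tokens in token_lists for token in tokens)
    # Highest frequency first; ties are broken alphabetically for stable output.
    ranked = sorted(counts, key=lambda token: (-counts[token], token))
    return "-".join(ranked[:parts])

# Pre-processed tokens of the incidents in one cluster (hypothetical data).
cluster = [
    ["reset", "password", "vpn"],
    ["vpn", "password", "need", "reset"],
    ["vpn", "password", "reset", "issue"],
]
cluster_name = name_cluster(cluster)  # "password-reset-vpn"
```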
What is the impact of group-by fields in clustering
The group-by field sets hard boundaries for clusters before the text-based clustering takes place. For example, if you select service as the first-level group-by field, clusters are formed within each service, as shown in the following diagram. If a group-by field is not specified, the clusters on the left side of the diagram can mix incidents from different services.
Commonly used group-by fields are services, product names, tenant companies, product categories, and operational categories. You can also group by custom fields.
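The effect of group-by fields can be sketched as a partitioning step that runs before any text clustering. In the sketch below, the incident dictionaries, the field names, and the placeholder clustering function are assumptions for illustration. Any clustering routine could be passed in as cluster_fn.

```python
from collections import defaultdict

def group_incidents(incidents, group_by):
    """Partition incidents by the values of the group-by fields."""
    groups = defaultdict(list)
    for incident in incidents:
        key = tuple(incident.get(field) for field in group_by)
        groups[key].append(incident)
    return dict(groups)

def cluster_within_groups(incidents, group_by, cluster_fn):
    """Run cluster_fn on each group separately so clusters never cross groups."""
    groups = group_incidents(incidents, group_by)
    return {key: cluster_fn(members) for key, members in groups.items()}

incidents = [
    {"id": "INC1", "service": "VPN", "summary": "VPN password reset issue"},
    {"id": "INC2", "service": "Email", "summary": "Email password reset issue"},
    {"id": "INC3", "service": "VPN", "summary": "Please reset my password on VPN"},
]

def one_cluster(members):
    """Placeholder clustering step: one cluster per group."""
    return [[m["id"] for m in members]]

result = cluster_within_groups(incidents, ["service"], one_cluster)
```

Although all three summaries mention a password reset, INC2 stays apart from INC1 and INC3 because the service boundary is applied first.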
How resolution insights summary is derived from incidents
The algorithm uses resolution details from incidents in a cluster. Every incident in the cluster must include resolution details for the algorithm to work properly. The resolution details are stored in the incident cluster for further processing.
To derive the resolution insights summary, BMC Helix ITSM Insights performs the following actions on the incident cluster:
- Template-based resolution statements are extracted.
- Semantically similar resolution statements are clustered, and the representative resolution statement from each cluster is selected as a candidate for resolution insights generation.
- Based on configuration, the weighted sentence ranking algorithm is applied to derive a fixed number of useful resolution insights statements from the incident cluster.
- Noisy phrases, such as The ticket is closed or Auto resolved, are removed from the candidate resolution insights statements. These noisy phrases can be updated by the Problem Config user.
- An anonymization algorithm is applied to remove personal information of users from the candidate resolution statements.
For more details about natural language processing for resolution insights, see Configuring additional stop words, number of jobs, and default number of clusters.
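The resolution insights steps can be approximated by the following minimal sketch. It uses simplified stand-ins rather than the product's algorithms: exact matching after normalization replaces semantic clustering, group size replaces the weighted sentence ranking, and a single email pattern replaces the anonymization algorithm. The order of the steps also differs slightly. All names and sample statements are assumptions.

```python
import re
from collections import Counter

# Noisy phrases taken from the examples in the text; the list is configurable.
NOISY_PHRASES = ["the ticket is closed", "auto resolved"]

# Simplified pattern for one kind of personal information (email addresses).
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def anonymize(statement):
    """Replace email addresses with a placeholder."""
    return EMAIL.sub("<email>", statement)

def is_noisy(statement, noisy_phrases=NOISY_PHRASES):
    """True if the statement contains a configured noisy phrase."""
    lowered = statement.lower()
    return any(phrase in lowered for phrase in noisy_phrases)

def normalize(statement):
    """Lowercase and drop punctuation so near-identical statements match."""
    return " ".join(re.findall(r"[a-z0-9<>]+", statement.lower()))

def resolution_insights(statements, top_n=2):
    """Return up to top_n representative resolution statements."""
    cleaned = [anonymize(s) for s in statements if not is_noisy(s)]
    group_sizes = Counter(normalize(s) for s in cleaned)
    representative = {}
    for s in cleaned:
        representative.setdefault(normalize(s), s)  # first statement seen per group
    # Larger groups rank first; ties are broken alphabetically.
    ranked = sorted(group_sizes, key=lambda key: (-group_sizes[key], key))
    return [representative[key] for key in ranked[:top_n]]

statements = [
    "Reset the VPN password for user jane.doe@example.com",
    "reset the VPN password for user john@example.com",
    "Auto resolved",
    "Reinstalled the VPN client",
    "The ticket is closed",
]
insights = resolution_insights(statements)
```

Here the two password-reset statements collapse into one anonymized insight that ranks first, the reinstall statement ranks second, and the two noisy statements are dropped.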