How BMC Helix ITSM Insights clusters incidents

BMC Helix ITSM Insights uses various machine learning techniques and algorithms to categorize incident data into clusters.

How are the clusters created

For proactive problem management in BMC Helix ITSM Insights, a clustering algorithm is applied to incidents to identify clusters that are candidates for problems. BMC Helix ITSM Insights uses an industry-standard, open-source k-means algorithm to perform the clustering. This machine learning algorithm groups incidents based on the similarity of the text in the Description or Summary field, even if the text does not match exactly. You can select the fields to cluster on in the job configuration page.

First, information is extracted from the incident text fields to find similarities. The following techniques are applied to extract and preprocess incident text data:

Technique	Description
Tokenization	The phrases or sentences are split into smaller units, such as individual words or terms. Each of these smaller units is called a token.
Stop word removal	A stop word is a commonly used word (such as “the”, “a”, “an”, “in”) that you would like to ignore when running machine learning algorithms. On selecting the language, the system automatically adds a stop words list for that language. You can also upload a list of custom stop words to be filtered out before text processing.
Stemming	The words are reduced to their root forms by removing any affixes. Example: For words such as failing, fails, failed, failure, the root form is fail.
Lemmatization	Inflectional and related forms of a word are reduced to a common base form.
NLP for vectorization	Words or phrases are mapped to a corresponding vector of real numbers.
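
The techniques in the table can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the product's implementation: the stop word list, the suffix rules, and all function names are hypothetical, the stemmer is a toy suffix stripper, lemmatization is omitted, and vectorization is shown as simple bag-of-words counts.

```python
import re
from collections import Counter

# Hypothetical stop word list; the product loads a list per language.
STOP_WORDS = {"the", "a", "an", "in", "on", "my", "to", "be", "please", "needs"}

# Toy suffix list; a real stemmer (for example, Porter) has many more rules.
SUFFIXES = ("ing", "ure", "ed", "s")


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, remove stop words, and stem."""
    return [stem(t) for t in remove_stop_words(tokenize(text))]


def vectorize(docs):
    """Map each token list to a bag-of-words count vector."""
    vocab = sorted({t for doc in docs for t in doc})
    vectors = [[float(Counter(doc)[t]) for t in vocab] for doc in docs]
    return vocab, vectors


docs = [preprocess(s) for s in (
    "Please reset my password on VPN",
    "VPN password needs to be reset",
)]
vocab, vectors = vectorize(docs)
print(vocab)    # ['password', 'reset', 'vpn']
print(vectors)  # [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
```

The two differently worded descriptions end up with identical vectors, which is why they can be grouped even though their text does not match exactly.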

If you have incidents with descriptions such as “Please reset my password on VPN”, “VPN password needs to be reset”, and “VPN password reset issue”, these incidents are grouped into a single cluster called VPN-password-reset, as shown in the following diagram.

After the algorithm runs, you can view the resulting clusters on the dashboard. For example, you might have clusters named VPN-connectivity-fail and VPN-password-reset, each containing a set of closely related incidents.
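
To make the grouping step concrete, the following is a minimal k-means sketch in plain Python. It is not the product's code: the vocabulary and the hand-made count vectors are hypothetical, and the deterministic farthest-point seeding is an assumption chosen so that the result is repeatable (k-means implementations commonly use random or k-means++ seeding instead).

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def kmeans(vectors, k, max_iter=100):
    """Return (labels, centroids) after assigning each vector to one of k clusters."""
    # Assumption: deterministic farthest-point seeding for repeatable output.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(
            vectors,
            key=lambda v: min(squared_distance(v, c) for c in centroids),
        )
        centroids.append(list(farthest))

    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda i: squared_distance(v, centroids[i]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for i in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == i]
            if members:
                centroids[i] = mean(members)
    return labels, centroids


# Hypothetical vocabulary: connect, fail, issue, password, reset, vpn
incident_vectors = [
    [0, 0, 0, 1, 1, 1],  # "Please reset my password on VPN"
    [0, 0, 0, 1, 1, 1],  # "VPN password needs to be reset"
    [0, 0, 1, 1, 1, 1],  # "VPN password reset issue"
    [1, 1, 0, 0, 0, 1],  # hypothetical: "VPN connection failing"
    [1, 1, 0, 0, 0, 1],  # hypothetical: "VPN connect fails"
]
labels, _ = kmeans(incident_vectors, k=2)
print(labels)  # [0, 0, 0, 1, 1]
```

The three password-reset incidents share one label and the two connectivity incidents share the other, even though the third description contains an extra word.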

How are the clusters named

A machine learning technique called topic modeling is used to automatically determine the name of each cluster. The algorithm finds the most important and most frequent words in the text of all the incidents in a cluster and derives a three-part name for the cluster, such as platform-restart-container or VPN-password-reset.
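
The idea of building a three-part name from a cluster's most frequent words can be illustrated with a simple frequency count. This sketch is a stand-in, not topic modeling and not the product's algorithm: the function name `name_cluster` is hypothetical, terms are ranked only by the number of incidents that contain them, and ties are broken alphabetically, so the word order can differ from the names shown on the dashboard.

```python
from collections import Counter


def name_cluster(token_lists, parts=3):
    """Join the most widely used terms of a cluster into a hyphenated name."""
    # Count in how many incidents each term appears (document frequency).
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))
    # Rank by frequency, then alphabetically so the result is deterministic.
    ranked = sorted(doc_freq.items(), key=lambda item: (-item[1], item[0]))
    return "-".join(term for term, _ in ranked[:parts])


cluster_tokens = [
    ["reset", "password", "vpn"],
    ["vpn", "password", "reset"],
    ["vpn", "password", "reset", "issue"],
]
print(name_cluster(cluster_tokens))  # password-reset-vpn
```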

Important

Sometimes multiple clusters with similar or identical names appear on the dashboard. This situation occurs when:

Similar clusters are formed in different groups (groups are defined by group-by fields such as company, department, or operational categories). For example, if incidents are grouped by company and each company has clusters related to virtual machine issues, several clusters might be named virtual-machine-unreachable.
You allow the system to determine the optimal number of clusters. This setting might produce separate clusters, some of which have the same or similar names.

What is the impact of group-by fields in clustering

A group-by field sets hard boundaries for clusters before text-based clustering takes place. For example, if you select service as the first-level group-by field, clusters are formed within each service, as shown in the following diagram. If no group-by field is specified, a single cluster can mix incidents from different services, as in the clusters on the left side of the diagram.

Figure: Group by in clustering

Commonly used group-by fields are services, product names, tenant companies, product categories, and operational categories. You can also group by custom fields.
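
The effect of a group-by field can be sketched as a partitioning step that runs before any text clustering. The record layout (`id`, `service`, `summary`), the function names, and the keyword-based `keyword_stub` are all hypothetical; `keyword_stub` is only a stand-in for the real text-based clustering.

```python
from collections import defaultdict


def partition_by(incidents, field):
    """Split incidents into groups keyed by the value of the group-by field."""
    groups = defaultdict(list)
    for incident in incidents:
        groups[incident[field]].append(incident)
    return dict(groups)


def cluster_within_groups(incidents, field, cluster_fn):
    """Run cluster_fn separately on each group so clusters never cross groups."""
    result = {}
    for group, members in partition_by(incidents, field).items():
        labels = cluster_fn([m["summary"] for m in members])
        for member, label in zip(members, labels):
            result.setdefault((group, label), []).append(member["id"])
    return result


def keyword_stub(texts):
    """Stand-in for text clustering: label 0 if 'password' appears, else 1."""
    return [0 if "password" in t.lower() else 1 for t in texts]


incidents = [
    {"id": "INC1", "service": "VPN", "summary": "VPN password reset issue"},
    {"id": "INC2", "service": "Email", "summary": "Email password reset"},
    {"id": "INC3", "service": "VPN", "summary": "VPN connectivity fails"},
    {"id": "INC4", "service": "VPN", "summary": "Please reset my password on VPN"},
]
for key, ids in cluster_within_groups(incidents, "service", keyword_stub).items():
    print(key, ids)
```

In this sketch the Email password incident stays in its own cluster because the service boundary is applied first. Without the partitioning step, it would share a label with the VPN password incidents.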
