Phased rollout

This version is currently available to SaaS customers only. It will be available to on-premises customers soon.

How BMC Helix ITSM Insights clusters incidents


BMC Helix ITSM Insights uses various machine learning techniques and algorithms to categorize incident data into clusters.

How are the clusters created

For proactive problem management in BMC Helix ITSM Insights, a clustering algorithm is applied to incidents to identify clusters that are candidates for problems. BMC Helix ITSM Insights uses an industry-standard, open source k-means algorithm to perform clustering. This machine learning algorithm groups incidents based on the similarity of their Description or Summary field, even when their text does not match exactly. You can select the fields to cluster on in the job configuration page.
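To make the grouping step concrete, the following is a minimal pure-Python sketch of the standard k-means (Lloyd's) loop. It assumes incidents have already been converted into numeric vectors; the `kmeans` function, its random initialization, and the toy 2-D points are hypothetical illustrations, not BMC Helix ITSM Insights code or settings. The product's open source implementation may differ in initialization, distance measure, and stopping rules.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's k-means: alternate between assigning each point to its nearest
    centroid and moving each centroid to the mean of its assigned points."""
    rng = random.Random(seed)
    # Initialise centroids with k distinct input points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: index of the closest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Hypothetical 2-D "incident vectors": two well-separated groups of three.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
labels, centroids = kmeans(points, k=2)
print(labels)
```

With two well-separated groups, the first three points receive one label and the last three another; which numeric label each group gets depends on the random initialization.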

First, information is extracted from the incident text fields so that similarity can be measured. The following techniques are applied to extract and preprocess incident text data:

  • Tokenization: Phrases or sentences are split into smaller units, such as individual words or terms. Each of these units is called a token.
  • Stop word removal: A stop word is a commonly used word (such as “the”, “a”, “an”, or “in”) that is ignored when machine learning algorithms are run. When you select a language, the system automatically adds the stop word list for that language. You can also upload a list of custom stop words to be filtered out before text processing.
  • Stemming: Words are reduced to their root forms by removing affixes. For example, the root form of failing, fails, failed, and failure is fail.
  • Lemmatization: Inflectional and related forms of a word are reduced to a common base form. Unlike stemming, lemmatization uses vocabulary and morphological analysis, so that, for example, ran is reduced to run.
  • NLP for vectorization: Words or phrases are mapped to corresponding vectors of real numbers.
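The preprocessing techniques described above can be sketched as a small pipeline. This is an illustration under stated assumptions, not the product's implementation: the stop word list, the suffix-stripping stemmer (a toy stand-in for a real stemmer such as Porter or Snowball), and the plain term-count vectors are all hypothetical choices. Lemmatization is omitted because it needs a dictionary or morphological analyzer.

```python
import re
from collections import Counter

# Hypothetical English stop words; the product supplies its own
# per-language list and lets you upload custom stop words.
DEFAULT_STOP_WORDS = {"the", "a", "an", "in", "on", "my", "to", "is", "be"}

# Toy suffix list for stemming (assumption, not a real stemmer).
SUFFIXES = ("ing", "ure", "ed", "es", "s")


def tokenize(text):
    """Tokenization: lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def stem(word):
    """Stemming: strip the first matching suffix, keeping at least 3 characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text, custom_stop_words=()):
    """Tokenize, drop default and custom stop words, then stem what remains."""
    stop_words = DEFAULT_STOP_WORDS | set(custom_stop_words)
    return [stem(tok) for tok in tokenize(text) if tok not in stop_words]


def vectorize(token_lists):
    """Vectorization: map each token list to term counts over a shared vocabulary."""
    vocabulary = sorted({tok for tokens in token_lists for tok in tokens})
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append([float(counts[term]) for term in vocabulary])
    return vocabulary, vectors


print([stem(w) for w in ["failing", "fails", "failed", "failure"]])
docs = [preprocess("VPN password reset"), preprocess("The VPN login fails")]
print(vectorize(docs))
```

The first `print` shows the four forms of fail reduced to the same stem. The second maps two incidents to count vectors over the shared vocabulary fail, login, password, reset, vpn, where the only overlapping term is vpn.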

For example, incidents with descriptions such as “Please reset my password on VPN”, “VPN password needs to be reset”, and “VPN password reset issue” are grouped into a cluster named VPN-password-reset, as shown in the following diagram.

After the algorithm runs, you can view the resulting clusters on the dashboard. For example, you might see clusters named VPN-connectivity-fail and VPN-password-reset, each containing a set of closely related incidents.

[Diagram: Clustering]
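The point that related incidents are grouped together even when their wording differs can be illustrated by comparing the example descriptions as word-count vectors. This sketch uses cosine similarity on raw counts with a hypothetical stop word list; it is an illustrative stand-in and does not reproduce the vectors or distance measure that BMC Helix ITSM Insights actually uses. The fourth description is a hypothetical connectivity incident added for contrast.

```python
import math
from collections import Counter

# Hypothetical stop word list (assumption, not the product's list).
STOP_WORDS = {"please", "my", "on", "to", "be", "the", "a", "an", "in", "is"}


def bag_of_words(text):
    """Lower-case, split on whitespace, drop stop words, and count the rest."""
    return Counter(w for w in text.lower().split() if w not in STOP_WORDS)


def cosine(a, b):
    """Cosine similarity of two term-count vectors (1.0 means same direction)."""
    dot = sum(a[t] * b[t] for t in a)  # missing terms count as 0 in a Counter
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


incidents = [
    "Please reset my password on VPN",
    "VPN password needs to be reset",
    "VPN password reset issue",
    "VPN connection fails",  # hypothetical contrasting incident
]
bags = [bag_of_words(t) for t in incidents]
for i in range(len(bags)):
    for j in range(i + 1, len(bags)):
        print(f"{i} vs {j}: {cosine(bags[i], bags[j]):.2f}")
```

The three password-reset phrasings score 0.75 or higher against one another, while each scores about 0.3 against the connectivity incident, so a clustering algorithm working on such vectors would tend to place them together.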

How are the clusters named

A machine learning technique called topic modeling is used to name each cluster automatically. The algorithm finds the most important and most frequent words in the text of all the incidents in a cluster and derives a three-part title from them, such as platform-restart-container or VPN-password-reset.
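The naming step can be approximated with simple term counting. Note that this is a simplified stand-in, not topic modeling: the sketch just joins the three most frequent non-stop words across a cluster's incidents. The `name_cluster` function and its stop word list are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stop words (assumption); the product uses per-language lists.
STOP_WORDS = {"please", "my", "on", "to", "be", "needs", "the", "a", "an", "in", "is"}


def name_cluster(texts, parts=3):
    """Join the `parts` most frequent non-stop-word terms across a cluster's
    texts. Frequency counting only; this is NOT a topic model. Ties keep
    first-seen order, as documented for Counter.most_common."""
    counts = Counter(
        word
        for text in texts
        for word in re.findall(r"[a-z0-9]+", text.lower())
        if word not in STOP_WORDS
    )
    return "-".join(term for term, _ in counts.most_common(parts))


cluster = [
    "VPN password needs to be reset",
    "VPN password reset issue",
    "Please reset my password on VPN",
]
print(name_cluster(cluster))  # vpn-password-reset
```

Because the product uses topic modeling rather than raw counts, the names it produces can differ from this sketch.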

Important

Sometimes the dashboard shows multiple clusters with the same or similar names. This can happen when:

  • Similar clusters form in different groups (groups defined by group-by fields such as company, department, or operational categories). For example, if incidents are grouped by company and several companies have clusters related to virtual machine issues, there might be several clusters named virtual-machine-unreachable.
  • You allow the system to set the optimal number of clusters. This setting can produce separate clusters that have the same or similar names.


What is the impact of group-by fields in clustering

A group-by field sets hard boundaries for clusters before text-based clustering takes place. For example, if you select service as the first-level group-by field, clusters are formed within each service, as shown in the following diagram. If no group-by field is specified, the clusters on the left side of this diagram can mix incidents from different services.

[Diagram: Group by in clustering]

Commonly used group-by fields are services, product names, tenant companies, product categories, and operational categories. You can also group by custom fields.
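The effect of group-by fields can be sketched as partitioning incidents by their field values before any text clustering runs, so that clustering (not shown) happens separately within each partition. The dictionary keys `service` and `summary` and the `partition_by` helper are hypothetical and do not correspond to actual product field identifiers. Passing several fields gives a multi-level grouping key.

```python
from collections import defaultdict


def partition_by(incidents, *fields):
    """Split incidents into groups keyed by their group-by field values.

    Text clustering would then run separately inside each group, so no
    cluster can mix incidents whose group-by values differ."""
    groups = defaultdict(list)
    for incident in incidents:
        key = tuple(incident.get(field) for field in fields)
        groups[key].append(incident)
    return dict(groups)


# Hypothetical incidents; "service" acts as the first-level group-by field.
incidents = [
    {"service": "VPN", "summary": "VPN password reset"},
    {"service": "Email", "summary": "Email password reset"},
    {"service": "VPN", "summary": "Reset my VPN password"},
]
for key, members in partition_by(incidents, "service").items():
    print(key, [m["summary"] for m in members])
```

Both partitions contain a password-reset incident, so after clustering each group could produce its own similarly named cluster, which is one way the duplicate names described in the earlier note can arise.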

 
