This documentation supports the 19.08 version of BMC Helix Platform. 
To view an earlier version, select 19.05 from the Product version menu.

Leveraging machine learning metrics to improve cognitive service data sets

BMC Helix Platform Cognitive Service provides auto-categorization, auto-assignment, and chatbot capabilities for an application. To use these capabilities, you must train IBM Watson Assistant to work with your data by creating data sets. You can create a CSV file of the data set or select application data, and then test the accuracy of the data sets to ensure that the cognitive service and chatbot application correctly auto-categorize service requests raised by end users.

After you test the Cognitive Service data sets, the test results pinpoint the exact problem areas where the data sets do not predict the appropriate categories, so that you can rectify the data sets. These tests are particularly important when you are implementing a new training data set or when you have made major changes to an existing data set.

Benefits of testing the cognitive service

BMC Helix Platform provides a tool to test the cognitive service. Testing the Cognitive Service and chatbot has the following benefits:

  • Requires no prior knowledge of data science.
  • Evaluates the cognitive service on the basis of standard machine learning metrics.
  • Identifies the exact problem areas so that you can rectify the data sets and improve the performance of the cognitive service.
  • Provides a history of the test results.

Test metrics derived after testing the cognitive service data sets

You can test the BMC Helix Platform Cognitive Service to get the following test metrics:

  • Accuracy—Accuracy is the ratio of the number of correct predictions to the total number of input samples.
    For example, if the test results indicate that 9 out of 10 variations of an increase RAM request are correctly predicted, the accuracy is 9/10 = 0.9.
  • Recall—Recall is the number of correct positive results divided by the total number of relevant samples (all samples that should have been identified as positive).
    For example, if 30 instances in the test data are related to increase RAM and the system correctly identifies 8 of them, the recall is 8/30 ≈ 0.27.
    Higher recall indicates higher viability of the data sets.
  • Precision—Precision is the number of correct positive results divided by the total number of positive results predicted by the cognitive service.
    For example, for a search query that contains increase RAM, the system returns 10 results that contain both increase and RAM, and 8 of those results include the phrase increase RAM. In this case, the precision is 8/10 = 0.8.
    Higher precision indicates higher viability of the data sets.

  • F-score—F-score is the harmonic average of precision and recall. F-score reaches its best value at 1 (indicating perfect precision and recall) and worst at 0. 
    Traditionally, F-score is calculated as F = 2 × (Precision × Recall) / (Precision + Recall)

Higher precision and recall indicate higher viability of the data sets. For more information about how test metrics are calculated, see FAQs and additional resources.
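The four metrics above can be sketched in Python. This is an illustrative example only, not the platform's internal implementation; the function name, category labels, and sample data are hypothetical:

```python
def classification_metrics(expected, predicted, positive_label):
    """Compute overall accuracy, plus precision, recall, and F-score
    for one category (positive_label)."""
    # Accuracy: correct predictions over all input samples.
    correct = sum(e == p for e, p in zip(expected, predicted))
    accuracy = correct / len(expected)

    # Counts for the chosen category.
    tp = sum(e == p == positive_label for e, p in zip(expected, predicted))
    predicted_pos = sum(p == positive_label for p in predicted)  # predicted positives
    actual_pos = sum(e == positive_label for e in expected)      # relevant samples

    # Precision: correct positives over all predicted positives.
    precision = tp / predicted_pos if predicted_pos else 0.0
    # Recall: correct positives over all relevant samples.
    recall = tp / actual_pos if actual_pos else 0.0
    # F-score: harmonic average of precision and recall.
    f_score = (2 * precision * recall / (precision + recall)
               if (precision + recall) else 0.0)
    return accuracy, precision, recall, f_score

# 9 of 10 "increase RAM" variations predicted as the Memory category:
expected = ["Memory"] * 10
predicted = ["Memory"] * 9 + ["Remote Access Server"]
print(classification_metrics(expected, predicted, "Memory"))
# accuracy 0.9, precision 1.0, recall 0.9, F-score ≈ 0.947
```

Note how the example matches the accuracy bullet above: 9 correct predictions out of 10 samples gives 0.9.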

Additional information about machine learning metrics

The following blogs provide more information about machine learning metrics and the macro versus micro average of precision. BMC does not endorse the information in these external links. Use the information in these links for reference purposes only.

  • GreyAtom Blog, Performance Metrics for Classification problems in Machine Learning. https://medium.com/greyatom/
  • Text Mining, Analytics, & More Blog, Computing Precision and Recall for Multi-Class Classification Problems. http://text-analytics101.rxnlp.com
  • Data Science Stack Exchange Question and Answer Site, Micro Average vs Macro average Performance in a Multiclass classification setting. https://datascience.stackexchange.com
  • Rushdi Shams Blog, Micro and Macro-average of Precision, Recall and F-Score. http://rushdishams.blogspot.com
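The macro versus micro distinction discussed in these blogs can be sketched in Python: macro averaging computes precision per category and then averages the results (every category counts equally), while micro averaging pools the counts across categories first (frequent categories dominate). The function names and sample labels below are illustrative assumptions, not part of the product:

```python
def per_class_counts(expected, predicted, labels):
    """Per-category true positives, false positives, and false negatives."""
    counts = {}
    for label in labels:
        tp = sum(e == p == label for e, p in zip(expected, predicted))
        fp = sum(p == label and e != label for e, p in zip(expected, predicted))
        fn = sum(e == label and p != label for e, p in zip(expected, predicted))
        counts[label] = (tp, fp, fn)
    return counts

def macro_precision(counts):
    # Average the per-category precisions; every category counts equally.
    precisions = [tp / (tp + fp) if (tp + fp) else 0.0
                  for tp, fp, _ in counts.values()]
    return sum(precisions) / len(precisions)

def micro_precision(counts):
    # Pool the counts across categories first; frequent categories dominate.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    return tp / (tp + fp) if (tp + fp) else 0.0

expected = ["A", "A", "A", "A", "B"]
predicted = ["A", "A", "A", "B", "A"]
counts = per_class_counts(expected, predicted, ["A", "B"])
print(macro_precision(counts))  # 0.375 (per-class precisions 0.75 and 0.0)
print(micro_precision(counts))  # 0.6   (pooled: 3 TP / (3 TP + 2 FP))
```

The two averages diverge whenever categories differ in size or difficulty, which is why the linked articles discuss them separately.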

Scenario of testing the cognitive service data sets by using a CSV file

Scenario: An organization uses BMC Helix Platform Cognitive Service to automatically route end user service requests for increasing RAM as Hardware | Component | Memory. As an administrator, you create a CSV data set to train the cognitive service to auto-categorize this service request to the correct support group. Before implementing the data set, you also test whether it correctly categorizes the service requests.

Example of training data: According to the training data, when an end user requests an increase in RAM, the ticket is categorized as Hardware | Component | Memory. You want to test the data set to check whether variations of the request such as increase laptop RAM and increase RAM on network file server are also categorized as Hardware | Component | Memory. You also specify the percentage of data that must be used for training the cognitive service, for example, 70%. The CSV file is randomly split into training data and test data according to the specified percentage.

Example of test results: The test results CSV file shows that the variation increase RAM on network file server is categorized incorrectly as Network | Router | Remote Access Server. The administrator adds more examples of this variation to the data set so that the cognitive service categorizes it correctly.

Maximum amount of test data in a CSV file: After the system automatically splits the CSV file into a training data set and a test data set, the test data set is limited to 10,000 rows.

Scenario of testing the cognitive service data sets by using application data

Scenario: Continuing with the preceding example, an organization uses BMC Helix Platform Cognitive Service to automatically categorize end user requests for increasing RAM as Hardware | Component | Memory. As an administrator, you want to use application data to train the cognitive service. In your application, the Request Service record definition is used to raise service requests.

Example of training data: You select the Request Service record definition and the Requester, Summary, Description, Categorization Tier 1, Categorization Tier 2, and Categorization Tier 3 fields in the record definition from which data is used to train and test the cognitive service. You also specify the percentage of data that must be used for training the cognitive service, for example, 70%. The application data is randomly split into training data and test data according to the specified percentage. After testing the application data, a CSV file of the test results is generated.

Example of test results: The test results CSV file shows that the variation increase RAM on network file server is categorized incorrectly as Network | Router | Remote Access Server. The administrator adds more examples of this variation to the data set so that the cognitive service categorizes it correctly.

Related topics

Leveraging machine learning metrics to improve chatbot predictability

Types of data sets used to train and test the cognitive service

Training and testing the cognitive service for a custom application

Evaluating the cognitive service test results

