Leveraging machine learning metrics to improve cognitive service data sets


AI Service Management (Categorization and Classification) provides auto-categorization, auto-assignment, and chatbot capabilities for an application. To utilize these capabilities, you must train IBM Watson Assistant to work with your data by creating data sets. Create a CSV file of the data set or select application data, and test the accuracy of the data sets to ensure that the cognitive service and chatbot application correctly auto-categorizes service requests raised by end users.


After testing the cognitive service data sets, the test results provide the exact problem area when the data sets do not predict the appropriate categories, so that you can rectify the data sets. The tests are particularly important when you are implementing a new training data set or if you have made major changes to the data sets.

Test metrics derived after testing the cognitive service data sets

Test the cognitive service to get the following test metrics:

  • Accuracy—Accuracy is the ratio of the number of correct predictions to the total number of input samples.
    For example, if the test results indicate that 9 out of 10 variations of the increase RAM request are correctly predicted, the accuracy is 9/10 = 0.9.
  • Recall—Recall is the number of correct positive results divided by the total number of relevant samples (all the samples that should have been identified as positive).
    For example, for a search query that contains increase RAM, if the system returns 10 results and 8 of those results include the phrase increase RAM, but 20 more relevant instances exist that the system does not return, the recall is 8/28 ≈ 0.29.
    Higher recall indicates higher viability of the data sets.
  • Precision—Precision is the number of correct positive results divided by the number of positive results predicted by the cognitive service.
    For example, for a search query that contains increase RAM, the system returns 10 results that contain both increase and RAM, and 8 of those results include the phrase increase RAM. In this case, the precision is 8/10 = 0.8.
    Higher precision indicates higher viability of the data sets.
  • F-score—F-score is the harmonic average of precision and recall. F-score reaches its best value at 1 (indicating perfect precision and recall) and its worst at 0.
    Traditionally, F-score is calculated as F = 2 × (Precision × Recall) / (Precision + Recall)
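As a rough illustration, the four metrics above can be computed from the outcome counts of a single category. The following sketch uses counts matching the increase RAM search example; the number of true negatives is an illustrative assumption.

```python
# Minimal sketch of the four test metrics defined above, computed from
# the outcome counts of a single category.
def classification_metrics(true_pos, false_pos, false_neg, true_neg):
    total = true_pos + false_pos + false_neg + true_neg
    accuracy = (true_pos + true_neg) / total
    precision = true_pos / (true_pos + false_pos)  # correct positives / predicted positives
    recall = true_pos / (true_pos + false_neg)     # correct positives / relevant samples
    f_score = 2 * (precision * recall) / (precision + recall)
    return accuracy, precision, recall, f_score

# 10 results returned, 8 of them relevant (precision 8/10); 20 relevant
# instances not returned (recall 8/28); 70 true negatives assumed.
accuracy, precision, recall, f_score = classification_metrics(8, 2, 20, 70)
print(precision)         # 0.8
print(round(recall, 2))  # 0.29
```

Note that the F-score is the harmonic mean of the two, so a low recall pulls the F-score down even when precision is high.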

Higher precision and recall indicate higher viability of the data sets. For more information about how test metrics are calculated, see FAQs and additional resources.

Additional information about machine learning metrics

The following blogs provide more information about machine learning metrics and macro versus micro averages of precision. BMC does not endorse the information in these external links. The information in these links is provided for reference purposes only.

Scenario of testing the cognitive service data sets by using a CSV file

An organization uses AI Service Management (Categorization and Classification) to automatically categorize end user service requests for increasing RAM as Hardware | Component | Memory. As an administrator, you create a CSV data set to train the cognitive service to auto-categorize this service request and route it to the correct support group. Before implementing the data set, you also test whether it correctly categorizes the service requests.

Example of training data

According to the training data, when an end user requests an increase in RAM, the ticket is categorized as Hardware | Component | Memory. You want to test the data set to check whether variations of the request, such as increase laptop RAM and increase RAM on network file server, are also categorized as Hardware | Component | Memory. You also specify the percentage of data that must be used for training the cognitive service, for example, 70%. The CSV file is randomly split into training data and test data according to the specified percentage.
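The random split described above can be sketched roughly as follows. The column names, the sample rows, and the fixed seed are illustrative assumptions, not the product's actual implementation.

```python
# Rough sketch of randomly splitting a data set into training and test
# portions by a training percentage. Illustrative only.
import random

def split_dataset(rows, training_percentage=70, seed=None):
    """Shuffle the rows and split them by the training percentage."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = len(shuffled) * training_percentage // 100
    return shuffled[:cut], shuffled[cut:]

# Rows would typically be loaded from the CSV file with csv.DictReader.
rows = [
    {"Summary": "increase RAM", "Category": "Hardware | Component | Memory"},
    {"Summary": "increase laptop RAM", "Category": "Hardware | Component | Memory"},
] * 5  # 10 illustrative rows

training, test = split_dataset(rows, training_percentage=70, seed=42)
print(len(training), len(test))  # 7 3
```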

Example of test results

The test results CSV file shows that the variation Increase RAM on network file server is incorrectly categorized as Network | Router | Remote Access Server. The administrator adds more examples of this variation to the data set so that the cognitive service categorizes it correctly.
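One way to locate such misclassifications in a test results CSV file could look like the following sketch. The column names (Summary, Expected, Predicted) are assumptions for illustration, not the product's actual export format.

```python
# Hypothetical sketch: scan a test-results CSV for rows where the
# predicted category differs from the expected category.
import csv
import io

results_csv = """Summary,Expected,Predicted
increase laptop RAM,Hardware | Component | Memory,Hardware | Component | Memory
increase RAM on network file server,Hardware | Component | Memory,Network | Router | Remote Access Server
"""

misclassified = [
    row for row in csv.DictReader(io.StringIO(results_csv))
    if row["Expected"] != row["Predicted"]
]
for row in misclassified:
    print(row["Summary"], "->", row["Predicted"])
```

Each misclassified variation points to a gap in the training data; adding more examples of that variation is the fix described above.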

After the system automatically splits the CSV file into training and test data sets, the test data set is limited to 10,000 rows.

Scenario of testing the cognitive service data sets by using application data


Continuing with the preceding example, an organization uses the AI Service Management (Categorization and Classification) capabilities to automatically categorize end user requests for increasing RAM as Hardware | Component | Memory. As an administrator, you want to use application data to train the cognitive service. The Request Service record definition in your application is used to raise service requests.

Example of training data

You select the Request Service record definition and the Requester, Summary, Description, Categorization Tier 1, Categorization Tier 2, and Categorization Tier 3 fields in the record definition from which data is used to train and test the cognitive service. You also specify the percentage of data that must be used for training the cognitive service, for example, 70%. The application data is randomly split into training data and test data according to the specified percentage. After testing the application data, a CSV file of the test results is generated.

Example of test results

The test results CSV file shows that the variation Increase RAM on network file server is incorrectly categorized as Network | Router | Remote Access Server. The administrator adds more examples of this variation to the data set so that the cognitive service categorizes it correctly.

Benefits

BMC Helix Innovation Studio provides a tool to test the cognitive service. These tests are particularly important when you are implementing a new training data set or if you have made major changes to the data sets. Testing the cognitive service and chatbot has the following benefits:

  • Requires no prior knowledge of data science.
  • Helps evaluate the cognitive service on the basis of standard machine learning metrics.
  • Helps identify the exact problem area so that you can rectify the data sets and improve the performance of the cognitive service.
  • Provides a history of the test results.