Leveraging machine learning metrics to improve cognitive service data sets
After you test the cognitive service data sets, the test results pinpoint the exact problem areas where the cognitive service does not predict the appropriate categories, so that you can rectify the data sets. These tests are particularly important when you implement a new training data set or make major changes to an existing data set.
Test metrics derived after testing the cognitive service data sets
Test the cognitive service to get the following test metrics:
- Accuracy—Accuracy is the ratio of the number of correct predictions to the total number of input samples.
For example, if the test results indicate that 9 out of 10 variations of the increase RAM request are correctly predicted, the accuracy is 9/10 = 0.9.
- Recall—Recall is the number of correct positive results divided by the total number of relevant samples, that is, all the samples that should have been identified as positive.
For example, for a search query that contains increase RAM, if the data set contains 30 instances related to increase RAM and the system correctly returns 8 of them, the recall is 8/30 ≈ 0.27.
Higher recall indicates higher viability of the data sets.
- Precision—Precision is the number of correct positive results divided by the total number of positive results predicted by the cognitive service.
For example, for a search query that contains increase RAM, the system returns 10 results that contain both increase and RAM, and 8 of those results include the phrase increase RAM. In this case, the precision is 8/10 = 0.8.
Higher precision indicates higher viability of the data sets.
- F-score—F-score is the harmonic average of precision and recall. F-score reaches its best value at 1 (indicating perfect precision and recall) and its worst value at 0.
F-score is calculated as F = 2 × (Precision × Recall) / (Precision + Recall).
Higher precision and recall indicate higher viability of the data sets.

For more information about how test metrics are calculated, see FAQs and additional resources.
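The following Python sketch illustrates how these metrics can be computed from lists of expected and predicted categories. It is a minimal illustration only; the category names and counts are hypothetical and do not come from actual cognitive service output.

```python
# Illustrative only: computing accuracy, precision, recall, and F-score
# from expected vs. predicted categories. The category labels and counts
# below are hypothetical examples.

expected = ["Hardware | Component | Memory"] * 8 + ["Network | Router | Remote Access Server"] * 2
predicted = ["Hardware | Component | Memory"] * 7 + ["Network | Router | Remote Access Server"] * 3

def metrics_for(category, expected, predicted):
    true_pos = sum(e == category and p == category for e, p in zip(expected, predicted))
    predicted_pos = sum(p == category for p in predicted)  # all predictions of this category
    actual_pos = sum(e == category for e in expected)      # all samples that truly belong to it
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0
    f_score = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f_score

# Accuracy: correct predictions over all samples (9/10 = 0.9 in this example)
accuracy = sum(e == p for e, p in zip(expected, predicted)) / len(expected)
print(f"Accuracy: {accuracy:.2f}")

for category in sorted(set(expected)):
    p, r, f = metrics_for(category, expected, predicted)
    print(f"{category}: precision={p:.2f} recall={r:.2f} F-score={f:.2f}")
```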
Scenario of testing the cognitive service data sets by using a CSV file
Example of training data
According to the training data, when an end user requests an increase in RAM, the ticket is categorized as Hardware | Component | Memory. You want to test the data set to check whether variations of the request, such as increase laptop RAM and increase RAM on network file server, are also categorized as Hardware | Component | Memory. You also specify the percentage of data that must be used for training the cognitive service, for example, 70%. The CSV file is randomly split into training data and test data according to the specified percentage.
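The random split can be pictured with a short Python sketch. This is a minimal illustration only, assuming a hypothetical file name and layout; in practice the cognitive service performs the split for you.

```python
# Illustrative only: a simple random 70/30 split of a training CSV into
# training and test rows. The file name is an assumption.

import csv
import random

TRAINING_PERCENTAGE = 70  # percentage of rows used for training

with open("categorization_training_data.csv", newline="") as f:  # hypothetical file name
    reader = csv.reader(f)
    header = next(reader)
    rows = list(reader)

random.shuffle(rows)
split_index = round(len(rows) * TRAINING_PERCENTAGE / 100)
training_rows = rows[:split_index]   # used to train the cognitive service
test_rows = rows[split_index:]       # used to evaluate the trained model

print(f"{len(training_rows)} training rows, {len(test_rows)} test rows")
```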
Example of test results
The test results CSV file shows that the variation Increase RAM on network file server is categorized incorrectly as Network | Router | Remote Access Server. The administrator adds more examples of this variation to the data set so that the cognitive service categorizes it correctly.
After the system automatically splits the CSV file into a training data set and a test data set, the test data set is limited to 10,000 rows.
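If you want to inspect the generated test results outside the product, a sketch such as the following can flag the incorrectly categorized rows. The file name and column names are assumptions; check the headers of your generated test-results file for the actual names.

```python
# Illustrative only: scanning a test-results CSV for rows where the predicted
# category differs from the expected category. The column names
# ("Summary", "Expected Category", "Predicted Category") are hypothetical.

import csv

with open("test_results.csv", newline="") as f:  # hypothetical file name
    for row in csv.DictReader(f):
        if row["Predicted Category"] != row["Expected Category"]:
            print(f'"{row["Summary"]}" was categorized as '
                  f'{row["Predicted Category"]} instead of {row["Expected Category"]}')
```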
Scenario of testing the cognitive service data sets by using application data
Example of training data
The administrator selects the Request Service record definition and the Requester, Summary, Description, Categorization Tier 1, Categorization Tier 2, and Categorization Tier 3 fields in the record definition from which data is used to train and test the cognitive service. The administrator also specifies the percentage of data that must be used for training the cognitive service, for example, 70%. The application data is randomly split into training data and test data according to the specified percentage. After the application data is tested, a CSV file of the test results is generated.
Example of test results
The test results CSV file shows that the variation Increase RAM on network file server is categorized incorrectly as Network | Router | Remote Access Server. The administrator adds more examples of this variation to the data set so that the cognitive service categorizes it correctly.
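To decide where additional training examples would help most, the miscategorized rows can also be aggregated per expected category. This sketch is illustrative only; the file name and column names are assumptions.

```python
# Illustrative only: counting miscategorized records per expected category to
# decide where more training examples are needed. Column names are hypothetical.

import csv
from collections import Counter

miscategorized = Counter()
with open("test_results.csv", newline="") as f:  # hypothetical file name
    for row in csv.DictReader(f):
        if row["Predicted Category"] != row["Expected Category"]:
            miscategorized[row["Expected Category"]] += 1

# Categories with the most miscategorized records are the best candidates
# for additional training examples.
for category, count in miscategorized.most_common():
    print(f"{category}: {count} miscategorized records")
```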
Benefits
BMC Helix Innovation Studio provides a tool to test the cognitive service. These tests are particularly important when you implement a new training data set or make major changes to the data sets. Testing the cognitive service and chatbot has the following benefits:
- Requires no prior knowledge of data science to use the tool.
- Helps you evaluate the cognitive service based on standard machine learning metrics.
- Helps you identify the exact problem areas so that you can rectify the data sets and improve the performance of the cognitive service.
- Provides a history of the test results.