Training and testing the cognitive service for a custom application

To use the AI Service Management (Categorization and Classification) capabilities for auto-categorization and auto-assignment, administrators must train the cognitive service. You can update the sample data set or create new data sets. You must then use the data sets in existing or new processes or rules for the application.

You can create data sets in the following ways:

By using a CSV file
By using application data

You must specify the percentage of data that can be used for training the cognitive service. By default, 80% of the data set is used for training and 20% is used as test data.
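The following minimal sketch illustrates what a random percentage split looks like. It is not the product's implementation: the function name `split_rows`, the `seed` parameter, and the rounding rule are assumptions made for illustration only.

```python
import random


def split_rows(rows, training_percent=80, seed=None):
    """Randomly split rows into (training, test) lists by percentage.

    Hypothetical helper: the rounding rule used by the cognitive
    service is not documented here, so round() is an assumption.
    """
    if not 0 <= training_percent <= 100:
        raise ValueError("training_percent must be between 0 and 100")
    shuffled = list(rows)                   # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)   # seeded shuffle for repeatable splits
    cut = round(len(shuffled) * training_percent / 100)
    return shuffled[:cut], shuffled[cut:]


rows = [f"row {i}" for i in range(10)]
training, test = split_rows(rows, training_percent=80, seed=42)
print(len(training), len(test))  # 8 2
```

With the default 80% setting, a 10-row data set yields 8 training rows and 2 test rows. The test percentage is always the remainder (100 minus the training percentage).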

Important

To train and test BMC Native (Google) classification, make sure that you do not create and train more than 100 data sets. If you exceed this limit, the following error is generated:
Data set training job has failed: datasetAbsoluteName : data set name job ID: job ID

Process for training and testing cognitive service

The following image explains the tasks that an administrator must perform to train and test the cognitive service by using a CSV data set or application data:

[Image: process for training and testing the cognitive service]

The following table describes the steps to train and test the cognitive service:

Create data sets
Type of data set	Task	Description	Reference
CSV	Create a CSV data set	Create the CSV data set according to the defined structure and guidelines.	Types of cognitive data sets
CSV	Upload the CSV file	Upload the CSV data and specify the percentage of rows that you want to use as training data and test data. The system randomly splits the CSV file into a training data set and a test data set.	To upload the CSV data file
Application data	(Optional) Upload seed data	When you start training the cognitive service, you might not yet have application data. In this case, the seed data acts as startup data for machine learning. The cognitive service learns from the seed data initially and then picks up the application data. Important: The seed data is provided in a CSV file.	To create a sample CSV data set file, see Types-of-data-sets-used-to-train-and-test-the-cognitive-service. To upload the sample CSV data set, see To upload seed data
Application data	Specify the application data that you want to use for training	You must select the fields in the record definitions from which data is used for training and testing the cognitive service. To ensure that the training data does not exceed the limit specified by IBM Watson, you can define a condition to further filter the data. Example: You select the Service request record definition and the Summary and Category fields in that record definition. You also define a condition such as Status = Open and Priority = High so that only the data that matches these conditions is used for training.	To specify application data that you want to use for training
Train the cognitive service
CSV	Train the cognitive service	Select the CSV training data set that you uploaded earlier to train the cognitive service.	To train the cognitive service
Application data	Train the cognitive service	Select the application data that you specified earlier to train the cognitive service.	To train the cognitive service
Evaluate the cognitive service training
CSV and Application data	Evaluate whether the cognitive service is trained correctly.	After you train the cognitive service, you can evaluate the cognitive service training for auto-categorization or auto-assignment. For auto-assignment, the cognitive service returns the login IDs of the assignee. When you create assignment training data sets, you must ensure that the assignee belongs to the Agents group in the Foundation library.	To evaluate the cognitive service training
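The filter condition from the application data example in the table (Status = Open and Priority = High on the Summary and Category fields) can be pictured with the following sketch. The records, their values, and the `matches` function are hypothetical. In the product, you define the condition in the data set configuration, not in code.

```python
# Hypothetical service request records; field names follow the example in the table.
records = [
    {"Summary": "VPN drops every hour", "Category": "Network",
     "Status": "Open", "Priority": "High"},
    {"Summary": "Request new laptop", "Category": "Hardware",
     "Status": "Closed", "Priority": "High"},
    {"Summary": "Email sync is slow", "Category": "Email",
     "Status": "Open", "Priority": "Low"},
]


def matches(record):
    """Return True when the record satisfies Status = Open AND Priority = High."""
    return record["Status"] == "Open" and record["Priority"] == "High"


# Only matching records contribute (text, category) pairs to the training data.
training_rows = [(r["Summary"], r["Category"]) for r in records if matches(r)]
print(training_rows)  # [('VPN drops every hour', 'Network')]
```

Narrowing the condition in this way reduces the number of rows sent for training, which helps keep the data within the provider's limit.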

Before you begin

Before you train or test the cognitive service by using the CSV file or by using application data, make sure that you have completed the following tasks:

To be able to train or test the IBM Watson or BMC Native (Google) classification, administrators must create an application configuration UI.
For more information, see Enabling-a-custom-application-for-cognitive-service.
Based on the classification service provider you select, perform one of the following tasks. To know more about classification service providers, see Centralized-configuration.
- To communicate with IBM Watson Assistant, add the Watson credentials in BMC Helix Innovation Studio.
  For more information, see Configuring-cognitive-service-for-custom-applications-by-using-IBM-Watson-activated-by-BMC.
  Important
  If you want to update the IBM Watson Assistant API key, ensure that you complete the following tasks:
  - For CSV data sets, back up the existing data sets.
  - Delete the data sets and restore them, if required.
- To communicate with Google Cloud Platform and use BMC Native (Google) classification service, add the service account credentials in BMC Helix Innovation Studio.
  For more information, see Configuring-cognitive-service-for-a-custom-application-by-using-BMC-Native-Google-classification.

If you want to use application data for training data sets, identify the record definition and fields that you want to use for training and testing.

To upload a CSV file

After creating the CSV data set, perform the following steps to upload the CSV file:

Log in to BMC Helix Innovation Studio and navigate to the Administration tab.
Click the configuration that you created for training the cognitive service.
For example, select My application > Cognitive Training.
Based on the value defined in the Classification-Service-Provider setting, one of the following tabs is displayed:
Value	Tab
WATSON	Auto-classification Training and Evaluation - IBM Watson
NATIVE	Auto-classification training and evaluation - BMC Native (Google)
For information about the Classification-Service-Provider setting, see Configuration-settings-C-D.
From the selected tab, in the Data Sets section, click New, and select CSV Data Set.

The following image is an example of uploading CSV data sets:
Fill out the Data Set Name and Description fields.
In Training Type, the option is populated automatically based on the value defined in the Classification-Service-Provider setting.
You cannot change this option.
In CSV File, select the CSV training data set file that you created earlier.
From the Locale list, select the locale of the training data set.
Important
The Arabic (ar) and Japanese (ja) locales are not supported for natural language classification with BMC Native (Google).
In Training Data, select the percentage of the CSV data that you want to use as training data.
In Testing Data, the percentage of CSV data that is used as test data is calculated automatically from the Training Data percentage.
Click Save.

To use application data in a training data set

Perform the following tasks when you want to use application data to train and test the cognitive service:

To upload seed data
To specify application data that you want to use for training and testing

To upload seed data

Log in to BMC Helix Innovation Studio and navigate to the Administration tab.
Click the configuration that you created for training the cognitive service.
For example, select My application > Cognitive Training.
Click Configure Training Data Sets.
In the Training Data Sets section, click New, and select Platform Data Set.
Fill out the Data Set Name and Description fields.
In Training Type, the option is populated automatically based on the value defined in the Classification-Service-Provider setting.
You cannot change this option.
In CSV File, select the CSV data set file that you created earlier.
From the Locale list, select the locale of the training data set.
Important
The Arabic (ar) and Japanese (ja) locales are not supported for natural language classification with BMC Native (Google).
In Record Definition Name, select the record definition that you want to use to provide data to the cognitive service.
In Text Fields, click Add/Remove Text Fields and select one or more text fields that contain the text values for the cognitive service to classify. If you select more than one text field, the values are concatenated.

In Category Fields, click Add/Remove Category Fields and select one or more category fields to classify the values in the text fields. If you select more than one category field, the values are concatenated.

Click Save.
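The following sketch illustrates how values from several selected fields could be concatenated into one training row. The helper names, the single-space separator, and the field names `Description`, `Category Tier 1`, and `Category Tier 2` are assumptions for illustration. The separator that the platform actually uses is not stated in this topic.

```python
def join_fields(record, field_names, separator=" "):
    """Concatenate the non-empty values of the named fields, in the order given.

    The separator is an assumption; the platform's actual separator is not documented here.
    """
    values = (record.get(name) for name in field_names)
    return separator.join(str(v) for v in values if v not in (None, ""))


def build_training_row(record, text_fields, category_fields):
    """Return a (text, category) pair built from the selected fields."""
    return join_fields(record, text_fields), join_fields(record, category_fields)


# Hypothetical record with two text fields and two category fields.
record = {
    "Summary": "Cannot print",
    "Description": "Printer on floor 2 is offline",
    "Category Tier 1": "Hardware",
    "Category Tier 2": "Printer",
}
row = build_training_row(
    record,
    text_fields=["Summary", "Description"],
    category_fields=["Category Tier 1", "Category Tier 2"],
)
print(row)  # ('Cannot print Printer on floor 2 is offline', 'Hardware Printer')
```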

To specify application data that you want to use for training and testing

Log in to BMC Helix Innovation Studio and navigate to the Administration tab.
Click the configuration that you created for training the cognitive service.
For example, select My application > Cognitive Training.
Click Configure Training Data Sets.
In the Training Data Sets section, select one of the following options:
- If you have not uploaded seed data, click New, and select Platform Data Set.
- If you have uploaded seed data, click the name of the data set in which you uploaded the seed data.
Modify the data set in which you have uploaded the seed data.
If you have uploaded the seed data
1. On the Edit Platform Data Set page, in Record Definition Name, select the record definition that you want to use to provide data to the cognitive service.
2. In Text Fields, click Add/Remove Text Fields and select one or more text fields that contain the text values for the cognitive service to classify.
3. In Category Fields, click Add/Remove Category Fields and select one or more category fields to classify the values in the text fields.
4. (Optional) In the Data Split section, in Training Data, specify the percentage of data that you want to use as training data.
  For new data sets, by default, 80% of the CSV seed data is used as Training Data, and the remaining 20% is used as Test Data. For existing data sets, 100% of the rows are used as training data.
  The percentage of CSV data that you want to use as Test Data is automatically calculated according to the Training Data percentage.
Specify the application data that you want to use for training.
If you have not uploaded the seed data
1. On the New Platform Data Set page, fill out the Data Set Name and Description fields.
2. From the Locale list, select the locale of the training data set.
  Important
  The Arabic (ar) and Japanese (ja) locales are not supported for natural language classification with BMC Native (Google).
3. In Record Definition Name, select the record definition that you want to use to provide data to the cognitive service.
4. In Text Fields, click Add/Remove Text Fields and select one or more text fields that contain the text values for the cognitive service to classify. If you select more than one text field, the values are concatenated.
5. In Category Fields, click Add/Remove Category Fields and select one or more category fields to classify the values in the text fields. If you select more than one category field, the values are concatenated.
6. In Data Split, in Training Data, change the percentage of data that you want to use as training data.
  The percentage of Test Data is automatically calculated.
The new training data set is displayed in the Auto-classification Training and Evaluation section. An administrator can delete the training data set or create a copy of the existing training data set.
Click Save.

To train and test the cognitive service

After you have uploaded the CSV data set or selected the application data, you can train and test the cognitive service.

Best practice

We recommend that you do not change the value defined in the Classification-Service-Provider setting while the status of a training data set is Training. If you change the value, the training data set with the status Training is not displayed in the grid.

Log in to BMC Helix Innovation Studio and navigate to the Administration tab.
Click the configuration that you created for training the cognitive service.
For example, select My application > Cognitive Training.
In the Auto-classification Training and Evaluation section, select the training data set that you want to use for training, and click Train and Test.
The data set is randomly split into a training data set and a test data set based on the percentage that you specified earlier.
The status of the training data set changes to Training. When the training is completed successfully, the status changes to Trained.

Warning

If the status of the training data set is Trained and if you change the value defined in the Classification-Service-Provider setting, then the Auto-classification Training and Evaluation section displays the data sets from the Classification-Service-Provider you selected. The trained data sets from the earlier Classification-Service-Provider are not displayed in the grid.

You do not lose the trained data sets after you change the Classification-Service-Provider. The data sets are displayed again when you switch back to the earlier Classification-Service-Provider.
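The behavior described in this warning can be pictured as a filter on the grid, not a deletion. The following sketch uses a hypothetical `DataSet` class and `visible_data_sets` function with hypothetical data set names. It only models the visibility rule described above and is not the product's implementation.

```python
from dataclasses import dataclass


@dataclass
class DataSet:
    name: str
    provider: str  # "WATSON" or "NATIVE", as in the Classification-Service-Provider setting
    status: str    # for example "Training" or "Trained"


def visible_data_sets(data_sets, active_provider):
    """Return only the data sets that belong to the currently selected provider."""
    return [d for d in data_sets if d.provider == active_provider]


data_sets = [
    DataSet("Incidents CSV", "WATSON", "Trained"),
    DataSet("Requests CSV", "NATIVE", "Trained"),
]

# Switching the provider changes what the grid shows; nothing is removed.
print([d.name for d in visible_data_sets(data_sets, "NATIVE")])  # ['Requests CSV']
print([d.name for d in visible_data_sets(data_sets, "WATSON")])  # ['Incidents CSV']
print(len(data_sets))  # 2
```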

Where to go from here

To understand how to evaluate the cognitive service test results, see Evaluating-the-cognitive-service-test-results.