Discovery Training Tool


Discovery Training Tool provides automatic training for project data in IBM Watson Discovery V2. This data is used as a database that responds to queries in HVA. Trained data provides the most accurate responses to users. IBM has deprecated IBM Watson Discovery V1.

Administrators had a trained collection in IBM Watson Discovery V1 that provided accurate answers to user queries asked through the chatbot. As part of the migration process to IBM Watson Discovery V2, users may want to retain the training data that was available in the V1 instance.

To get access to the IBM Watson Discovery V1 features, administrators need to bring all the relevant available information into the V2 instance. However, IBM does not offer a copy-and-paste process to move the data from V1 to V2. As a result, the administrator needs to ingest the collections in the V2 instance by using the BMC crawler utility. After the source data in the V1 instance, such as record definitions and knowledge articles, is migrated to the V2 instance, the user needs to restore all the training data of the V1 instance and migrate it to the V2 instance.

Before you begin

Make sure that you have completed the following tasks:

Product: System configuration

Task: You must have the following system and software:

  • A Linux, Unix, or Windows machine on which you can run the BMC crawler utility.
  • Java version 11 or later.

Product: IBM Watson Discovery

Task: You must have the following details of the IBM Watson Discovery instance to which you want to upload the BMC Helix ITSM: Knowledge Management articles:

  • Identity and Access Management (IAM) API key
  • Endpoint URL
  • IBM Watson Discovery V2

You can get the IAM API key and the endpoint URL by logging in to IBM Cloud.


To migrate IBM Watson Discovery V1 training data to IBM Watson Discovery V2

  1. Add the questions in the <filename>.txt file. Refer to the Required files table and sample files.
  2. Modify the <filename>.properties file. Refer to the Required files table and sample files.
  3. Download the Discovery Training JAR file.
  4. Ensure that the Java archive and <filename>.txt files are in the same folder.
  5. Open the command prompt and run the JAR file.

    java -DdiscoveryTrainingPropFile=<sample-training-properties-file>.properties -jar com.bmc.dsm.DiscoveryTraining-99.0.00-SNAPSHOT.jar

    The Java utility converts the <filename>.txt file (the file that has user queries) to a *.csv file.

    The system stores the results in the generated *.csv file, with the following data:

    • Question
    • Document ID
    • Collection ID
    • Title
    • Passage text: first 400 characters
    • Confidence answer
    • Confidence passage 
  6. Run the IBM query API search process.
    After a successful run, you can access the V2 instance and customize the training data.
  7. Fine-tune the training data further.
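
Before moving on from step 5, it can help to confirm that the generated *.csv file is readable and not empty. The following Python sketch is one way to do that and is not part of the Discovery Training tool. It assumes that the file is UTF-8 encoded and that its first row is a header row, neither of which is stated on this page. The function name summarize_csv is hypothetical.

```python
import csv


def summarize_csv(path):
    """Return (header, row_count) for a CSV file.

    Assumption: the first row is a header row and the file is UTF-8 encoded.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])           # empty list if the file is empty
        row_count = sum(1 for _ in reader)  # remaining rows are data rows
    return header, row_count
```

For example, calling summarize_csv with the path that is set in trainingFile returns the column names and the number of result rows, which you can compare against the number of questions and the resultsCountKey value.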


Required files 

The following table describes the files and their attributes that are required to run the Discovery Training tool. You must set the attributes in the properties file before running the Discovery Training tool.

File name: <filename>.txt

Description: The administrator must prepare a list of questions that are similar to user queries.

Important: The <filename>.txt file must contain a minimum of 51 questions.

To prepare the questions, the administrator must have prior knowledge about the data in the IBM Watson Discovery instance.

Example: <filename>.txt
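
A quick way to confirm that the questions file meets the 51-question minimum before running the tool is sketched below in Python. This is an illustration, not part of the Discovery Training tool. It assumes that the file contains one question per line and is UTF-8 encoded, which this page does not specify. The function names are hypothetical.

```python
from pathlib import Path

MIN_QUESTIONS = 51  # minimum stated for the <filename>.txt file


def load_questions(path):
    """Return the non-blank lines of the questions file, stripped of whitespace.

    Assumption: the file holds one question per line.
    """
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def check_questions(path, minimum=MIN_QUESTIONS):
    """Return (ok, count), where ok is True when the file has enough questions."""
    questions = load_questions(path)
    return len(questions) >= minimum, len(questions)
```

Calling check_questions with the path of the questions file returns whether the minimum is met together with the number of questions found. Blank lines are ignored.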


File name: <filename>.properties

Description: This is a configuration file. Ensure that this file is available in the IBM Watson Discovery V2 instance.

Mandatory attributes:

  • questionsFile: File that has user queries.
  • trainingFile: File that is autogenerated by the Discovery tool.
  • version: Latest version of the IBM API calls
  • API key
  • URL: IBM Watson Discovery V2 URL
  • Project ID
  • Collection ID

Optional attributes:

  • desireConfidenceKey: Desired confidence rank to train the data (default value is 0.01)
  • desireExampleKey: Number of examples that are used to train the data (default value is 10)
  • resultsCountKey: Number of results that the IBM Watson Discovery V2 instance returns from the query request API (default value is 10)
  • fieldNamesKey: Field names separated by commas (default values are id, confidence, extracted_metadata, text, metadata, and title)
  • resultsCharsKey: Number of characters that are returned in the "TEXT" field (default value is 400)
  • toolModeKey: Determines whether the tool queries the project or trains the project data. Valid values are query and train (default value is train)


Example of mandatory attributes:

  • questionsFile=<Filename.txt>
  • trainingFile=<Filename.csv>
  • version=2023-03-31
  • apikey=<watson_discovery_v2_apikey>
  • url=<watson_discovery_v2_url>
  • projectId=<project_id>
  • collectionId=<collection_id>

Example of optional attributes:

  • desireConfidenceKey=<desireConfidenceKey>
  • desireExampleKey=<desireExampleKey>
  • resultsCountKey=<resultsCountKey>
  • fieldNamesKey=<fieldName1,fieldName2...>
  • resultsCharsKey=<resultsCharsKey>
  • toolModeKey=query
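
Because the tool depends on every mandatory attribute being set, a small pre-flight check of the properties file can catch omissions early. The Python sketch below is an illustration only and is not part of the Discovery Training tool. The key names come from the sample attributes listed above. The parser handles only plain key=value lines plus comment lines that start with # or !, which is a simplification of the full Java properties format. The function names are hypothetical.

```python
MANDATORY_KEYS = (
    "questionsFile", "trainingFile", "version",
    "apikey", "url", "projectId", "collectionId",
)


def parse_properties(text):
    """Parse simple key=value lines; skip blank lines and # or ! comments.

    Assumption: no line continuations, escapes, or ':' separators are used.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")  # split on the first '=' only
        if sep:
            props[key.strip()] = value.strip()
    return props


def missing_mandatory(props):
    """Return the mandatory keys that are absent or empty, in listed order."""
    return [key for key in MANDATORY_KEYS if not props.get(key)]
```

To use it, read the contents of the properties file into a string, pass the string to parse_properties, and then pass the result to missing_mandatory. An empty list means that all mandatory attributes have a value.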




BMC Helix Innovation Suite 25.4