This documentation supports the 20.02 version of BMC Helix Platform. 

Creating file system and database collections for searching external data

To make file system documents and database tables that reside outside Remedy Knowledge Management available for search, you must create an IBM Watson Discovery collection. After you create the collection, use the IBM Watson Data Crawler to upload the data to the collection. You can then map the collection to an external search data set in BMC Helix Platform.

Note

If you want to include Remedy Knowledge Management articles in cognitive insights, you must install and configure the BMC crawler. For more information, see Installing and configuring the cognitive search data crawler for Remedy Knowledge Management articles.

The following image outlines the tasks that you must perform to create the collection in IBM Watson Discovery. After you create the collection, you can upload your external data into this collection.

Before you begin

  • You must have the IBM Watson Discovery account credentials (API key). 
  • You must have access to a Linux virtual machine on which to set up the data crawler. 
  • You must have the username and password of the database whose data you want to crawl.

To create a configuration for the collection 

Perform the following steps to create the configuration required for defining the IBM Watson Discovery collection.

  1. Log in to the virtual machine.
  2. Create a JSON configuration file for the file system or database collection.
  3. If your IBM Watson Discovery instance does not have an environment yet, run the following sample command at the command prompt to create one:

    Command to create an environment
    curl -X POST -u "apikey":"<apikey>" -H "Content-Type: application/json" -d '{"name": "my_environment", "description": "My environment"}' <IBM Watson Discovery instance URL>

    Note

    Run the command with the API key of your IBM Watson Discovery account.

    Note down the environment ID for the next step.

  4. Run the following command to upload the configuration to IBM Watson Discovery:

    Command to upload a configuration
    curl -X POST -u "apikey":"<apikey>" -H "Content-Type: application/json" -d @<Name and location of configuration JSON file> "<IBM Watson Discovery instance URL>"

    Replace the values in the command placeholders as follows:

      • apikey: Enter the API key of your IBM Watson Discovery service instance.
      • Environment ID: Enter the value that was generated in step 3.
      • IBM Watson Discovery instance URL: Enter the URL of your IBM Watson Discovery instance, such as https://gateway-syd.watsonplatform.net/discovery/api/v1/environments/<Environment ID>/configurations?version=2018-12-03
      • Name and location of the configuration JSON file: Enter the absolute path of the configuration file.
  5. (Optional) If you want to use the IBM Watson data crawler to upload data from an external file system or database to the IBM Watson Discovery collection, run a command similar to the following to update the configuration:

    Example of command to update the configuration for uploading data from an external database or file system
    curl -X PUT -u "apikey":"<apikey>" -H "Content-Type: application/json" -d @<json> "https://gateway-syd.watsonplatform.net/discovery/api/v1/environments/<ENV ID>/configurations/<CONFIGID>?version=2018-10-15"
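
The requests in steps 3 and 4 share one pattern: HTTP Basic authentication with the literal user name apikey, a JSON body, and a version query parameter. The following Python sketch builds (but does not send) those two requests with the standard library only. The base URL, the version date, and all function names are illustrative assumptions; the environment ID is read from the environment_id field of the create-environment response.

```python
import base64
import json
import urllib.parse
import urllib.request

# Assumptions for illustration: the gateway-syd endpoint and the version date
# used in step 4. Substitute the values for your own instance.
BASE_URL = "https://gateway-syd.watsonplatform.net/discovery/api/v1"
VERSION = "2018-12-03"


def basic_auth(apikey: str) -> str:
    """Same credentials as curl -u "apikey":"<apikey>" (the user name is the word apikey)."""
    token = base64.b64encode(f"apikey:{apikey}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def build_request(apikey: str, method: str, path: str, body=None) -> urllib.request.Request:
    """Build, but do not send, a JSON request against the Discovery v1 API."""
    query = urllib.parse.urlencode({"version": VERSION})
    data = None if body is None else json.dumps(body).encode("utf-8")
    req = urllib.request.Request(f"{BASE_URL}{path}?{query}", data=data, method=method)
    req.add_header("Authorization", basic_auth(apikey))
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


def create_environment_request(apikey: str, name: str, description: str):
    # Step 3: POST /v1/environments
    return build_request(apikey, "POST", "/environments",
                         {"name": name, "description": description})


def environment_id_from_response(response_text: str) -> str:
    # The create-environment response is a JSON object with an environment_id field.
    return json.loads(response_text)["environment_id"]


def create_configuration_request(apikey: str, environment_id: str, config_path: str):
    # Step 4: POST the configuration JSON file to /v1/environments/<id>/configurations
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)  # fails early if the file is not valid JSON
    return build_request(apikey, "POST",
                         f"/environments/{environment_id}/configurations", config)
```

To actually send a request, pass it to urllib.request.urlopen. Parsing the configuration file locally before the upload catches JSON syntax errors that curl would otherwise pass to the service unchanged.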

To create the IBM Watson Discovery collection

After you have created the configuration, perform the following steps to create the IBM Watson Discovery collection.

  1. Log in to the IBM Cloud console.

  2. From the IBM Cloud dashboard, select the IBM Watson Discovery service instance.

  3. Click Launch tool. 

    The list of existing collections, if any, is displayed.

  4. To create a new collection, click Upload your own data.

  5. In the Name your new collection dialog box, enter the following values, and then click Create.

    Field | Action
    Collection name | Enter a name for the collection.
    Select the language of your documents | Select the language in which your documents are written.

    A new collection is created.

  6. On the new collection screen, click View API details.
    Note down the Configuration ID and Environment ID of the new collection for use in later steps.
    The following image shows an example of Configuration ID and Environment ID:
  7. At the command prompt, run the following sample command to retrieve the configuration details in JSON format:

    Command to get the JSON details
    curl -u "apikey":"<apikey>" \
    "https://gateway.watsonplatform.net/discovery/api/v1/environments/<environment_id>/configurations/<configuration_id>?version=2019-03-25"

    Replace the values in the command placeholders as follows:

    • <apikey>: Enter the API key of your IBM Watson Discovery service instance.

    • <environment_id>: Enter the Environment ID that you noted in an earlier step.

    • <configuration_id>: Enter the Configuration ID that you noted in an earlier step.

  8. Modify the JSON to meet the requirements of your collection, and save it as a new file.
  9. At the command prompt, run the following sample command to update your configuration with the new JSON details:

    Command to update the configuration with the new JSON details
    curl -X PUT -u "apikey":"<apikey>" -H \
    "Content-Type: application/json" -d @<new_config.json> \
    "https://gateway.watsonplatform.net/discovery/api/v1/environments/<environment_id>/configurations/<configuration_id>?version=2019-03-25"

    Replace the values in the command placeholders as follows:

    • <apikey>: Enter the API key of your IBM Watson Discovery service instance.

    • <environment_id>: Enter the Environment ID that you noted in an earlier step.

    • <configuration_id>: Enter the Configuration ID that you noted in an earlier step.

    • <new_config.json>: Enter the name and absolute path of the modified configuration JSON file. 

    The new configuration takes effect when you next upload your documents to the collection. 
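
Steps 7 through 9 form a retrieve, edit, and replace cycle on a single configuration ID. The following Python sketch shows that cycle with the standard library, building the GET and PUT requests without sending them. The base URL and version date are copied from the sample commands; the function names and the example change (a new description) are hypothetical, and the helper replaces only top-level keys, so nested edits must be made on the dictionary directly.

```python
import base64
import json
import urllib.request

# Assumed values taken from the sample commands in steps 7 and 9; adjust for your region.
BASE_URL = "https://gateway.watsonplatform.net/discovery/api/v1"
VERSION = "2019-03-25"


def config_url(environment_id: str, configuration_id: str) -> str:
    return (f"{BASE_URL}/environments/{environment_id}"
            f"/configurations/{configuration_id}?version={VERSION}")


def with_auth(req: urllib.request.Request, apikey: str) -> urllib.request.Request:
    # Equivalent of curl -u "apikey":"<apikey>"
    token = base64.b64encode(f"apikey:{apikey}".encode("utf-8")).decode("ascii")
    req.add_header("Authorization", f"Basic {token}")
    return req


def get_configuration_request(apikey, environment_id, configuration_id):
    # Step 7: retrieve the current configuration as JSON.
    req = urllib.request.Request(config_url(environment_id, configuration_id), method="GET")
    return with_auth(req, apikey)


def save_modified_config(current_json: str, changes: dict, out_path: str) -> dict:
    # Step 8: apply top-level changes to the retrieved JSON and save it as a new file.
    config = json.loads(current_json)
    config.update(changes)
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2)
    return config


def update_configuration_request(apikey, environment_id, configuration_id, new_config_path):
    # Step 9: PUT the modified file back to the same configuration ID.
    with open(new_config_path, "rb") as f:
        body = f.read()
    req = urllib.request.Request(config_url(environment_id, configuration_id),
                                 data=body, method="PUT")
    req.add_header("Content-Type", "application/json")
    return with_auth(req, apikey)
```

Keeping the retrieved JSON as the starting point, rather than writing a configuration from scratch, preserves any settings you do not intend to change.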

If you use the collection to crawl .pdf or .doc files and populate HTML and text in IBM Watson Discovery, create the collection by using the following command:

curl -X POST -u "<userName>":"<password>" -H \
"Content-Type: application/json" -d '{
 "name":"<collectionName>",
 "description": "My test collection",
 "configuration_id":"<configuration_id>",
 "language":"en"
}' "https://gateway.watsonplatform.net/discovery/api/v1/environments/<environment_id>/collections?version=2019-01-30"

Replace the values in the command placeholders as follows:

  • <userName> and <password>: Enter the service credentials of your IBM Watson Discovery service instance.
  • <collectionName>: Enter a name for the collection.
  • <configuration_id>: Enter the Configuration ID as noted in an earlier step.
  • <environment_id>: Enter the Environment ID as noted in an earlier step.

Alternatively, to authenticate with the API key of your IBM Watson Discovery service instance, use the following command:

curl -X POST -u "apikey":"<apikey>" -H "Content-Type: application/json" \
-d '{
 "name":"<collectionname>",
 "description": "My test collection",
 "configuration_id":"<configuration_id>",
 "language":"en"
}' "https://gateway.watsonplatform.net/discovery/api/v1/environments/<environment_id>/collections?version=2019-01-30"

Replace the values in the command placeholders as follows:

  • <apikey>: Enter the API key of your IBM Watson Discovery service instance.

  • <collectionname>: Enter a name for the collection.

  • <configuration_id>: Enter the Configuration ID as noted in an earlier step.

  • <environment_id>: Enter the Environment ID as noted in an earlier step.
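
The collection-creation command sends a small JSON payload with four fields: name, description, configuration_id, and language. The following Python sketch builds the same payload and the API-key variant of the request without sending it. The base URL and version date come from the sample command; the function names are hypothetical.

```python
import base64
import json
import urllib.request

BASE_URL = "https://gateway.watsonplatform.net/discovery/api/v1"  # assumed; region-specific
VERSION = "2019-01-30"  # version date used in the sample commands


def collection_body(name, configuration_id, description="My test collection", language="en"):
    # Same fields as the -d payload of the curl command.
    return {"name": name, "description": description,
            "configuration_id": configuration_id, "language": language}


def create_collection_request(apikey, environment_id, name, configuration_id):
    """Build, but do not send, POST /v1/environments/<environment_id>/collections."""
    url = f"{BASE_URL}/environments/{environment_id}/collections?version={VERSION}"
    data = json.dumps(collection_body(name, configuration_id)).encode("utf-8")
    req = urllib.request.Request(url, data=data, method="POST")
    token = base64.b64encode(f"apikey:{apikey}".encode("utf-8")).decode("ascii")
    req.add_header("Authorization", f"Basic {token}")
    req.add_header("Content-Type", "application/json")
    return req
```

Generating the payload with a JSON library instead of typing it inside shell single quotes avoids quoting errors, for example when a collection name or description contains an apostrophe.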

To configure a stop word list

After you create a collection, you can filter specific words out of it by uploading a stop word list. To do so, perform the following steps:

  1. Open a command prompt.
  2. Navigate to the directory that contains the text file with the stop word list.
  3. To upload the text file, run the following command:

    Command to upload the text file for stop words list
    curl -u "apikey":"<apikey>" -X POST --data-binary @stopwords.txt "<IBM Watson Discovery instance URL>/discovery/api/v1/environments/<environment_id>/collections/<collection_id>/word_lists/stopwords?version=2018-12-03"

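    Replace the values in the command placeholders as follows:

    • <apikey>: Enter the API key of your IBM Watson Discovery service instance.

    • <IBM Watson Discovery instance URL>: Enter the host URL of your IBM Watson Discovery instance, such as https://gateway-syd.watsonplatform.net.

    • <environment_id>: Enter the Environment ID of the collection.

    • <collection_id>: Enter the unique Collection ID that is associated with the collection.

    • stopwords.txt: Enter the name of the file that contains the stop word list. The file must be present in the current directory.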
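
Because the upload sends the file as-is, it can help to clean the list first. The following Python sketch reads the stop word file from the current directory, drops blank lines and duplicates, and builds the same POST request without sending it. It assumes one stop word per line (an assumption, because the file format is not described here); the function names and the default version date are illustrative.

```python
import base64
import os
import urllib.request


def read_stopwords(path: str = "stopwords.txt") -> list:
    """Read a stop word file, assuming one word per line; drop blanks and duplicates."""
    if not os.path.isfile(path):
        # The curl command reads the file from the current directory.
        raise FileNotFoundError(f"{path} not found (current directory: {os.getcwd()})")
    with open(path, encoding="utf-8") as f:
        words = [line.strip() for line in f if line.strip()]
    return list(dict.fromkeys(words))  # removes duplicates, keeps first-seen order


def stopwords_request(apikey, instance_url, environment_id, collection_id, words,
                      version="2018-12-03"):
    """Build, but do not send, a POST whose raw body is the cleaned list (like --data-binary)."""
    url = (f"{instance_url}/discovery/api/v1/environments/{environment_id}"
           f"/collections/{collection_id}/word_lists/stopwords?version={version}")
    body = ("\n".join(words) + "\n").encode("utf-8")
    req = urllib.request.Request(url, data=body, method="POST")
    token = base64.b64encode(f"apikey:{apikey}".encode("utf-8")).decode("ascii")
    req.add_header("Authorization", f"Basic {token}")
    return req
```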

To install the data crawler

You must use the data crawler to upload documents into the IBM Watson Discovery collection. Perform the following steps to install the data crawler.

Note

You must have the data crawler executable file for Linux in DEB, RPM, or ZIP format. To get the file, contact BMC Customer Support.

  1. Depending on the operating system, run the corresponding command to install the crawler:

    Operating system | Command
    Red Hat and CentOS virtual machines that use RPM packages | rpm -i /full/path/to/rpm/package/rpm-file-name
    Ubuntu and Debian virtual machines that use DEB packages | dpkg -i /full/path/to/deb/package/deb-file-name

    The crawler scripts are installed in the installation_directory/bin directory, such as /opt/ibm/crawler/bin, and in the /usr/local/bin directory.

  2. Create a working directory, such as /home/config, and copy the contents of the installation_directory/share/examples/config folder into it.
  3. On the virtual machine, run the following commands to set the environment variables:

    Commands to set environment variables
    export JAVA_HOME=/opt/jdk
    export PATH=/opt/jdk/jre/bin:$PATH
    export PATH={installation_directory}/bin:$PATH

    With the data crawler installed, you can upload the file system documents and data from the database tables into the IBM Watson Discovery collection.
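
Before you start a crawl, you might want to confirm that the export commands in step 3 took effect. The following Python sketch, with a hypothetical function name, checks that JAVA_HOME points to an existing directory and that java and the crawler script can be found on PATH. It inspects only the environment it is given, so run it from the same shell session in which you ran the export commands.

```python
import os
import shutil


def check_crawler_environment(env=None):
    """Return a list of problems found in the crawler environment; empty means OK."""
    env = dict(os.environ if env is None else env)
    problems = []
    java_home = env.get("JAVA_HOME")
    if not java_home or not os.path.isdir(java_home):
        problems.append("JAVA_HOME is not set to an existing directory")
    search_path = env.get("PATH", "")
    # shutil.which searches only the directories listed in search_path.
    for command in ("java", "crawler"):
        if shutil.which(command, path=search_path) is None:
            problems.append(f"{command} is not on PATH")
    return problems
```

For example, print(check_crawler_environment()) prints an empty list when both executables are found and JAVA_HOME is valid.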

To upload file system documents by using the crawler 

After you have installed the crawler on the virtual machine, perform the following steps to configure the crawler for uploading file system-based documents to the IBM Watson Discovery collection.

Note

In the following steps, directory paths assume that you copied the contents of the installation_directory/share/examples/config folder to /home/config when you installed the data crawler. If you copied the contents to a different location, replace /home/config with that path.

  1. On the virtual machine, go to the /home/config/connectors directory, open the filesystem.conf file, and specify the following values:

    Parameter name | Action | Value
    protocol | Enter the name of the connector protocol that the crawler uses. | sdk-fs
    collection | Enter the attribute that is used to unpack temporary files. | crawler-fs
    logging-config | Enter the name of the file that is used to configure the logging options. | Must be formatted as a log4j XML string.
    classname | Enter the Java class name for the connector. | plugin:filesystem.plugin@filesystem
  2. Go to the /home/config/seeds directory, open the filesystem-seed.conf file, and specify the following values:

    Parameter name | Action | Value
    url | Enter the list of files and folders to upload. Separate the list entries with a newline character. | For example, to crawl the /home/watson/mydocs folder, enter sdk-fs:///home/watson/mydocs.
  3. Go to the /home/config/ directory, open the crawler.conf file, and specify the following values. 

    Parameter name | Action | Value
    crawl_config_file | Enter the path of the configuration file that you updated in step 1. | connectors/filesystem.conf
    crawl_seed_file | Enter the path of the seed configuration file that you updated in step 2. | seeds/filesystem-seed.conf
    output_adapter | Enter the following value:

    class = "com.ibm.watson.crawler.discoveryserviceoutputadapter.DiscoveryServiceOutputAdapter",
    config = "discovery_service",
    discovery_service {
        include "discovery/discovery_service.conf"
    },

    For other parameters, see Configuring crawler options.

  4. Go to the /home/config/discovery directory, open the discovery_service.conf file, and specify the following values.

    To obtain the values for this step, log in to your IBM Watson Discovery account and navigate to the file system collection that you created.

    Parameter name | Action
    environment_id | Enter the Environment ID of the collection.
    collection_id | Enter the Collection ID of the collection.
    configuration_id | Enter the Configuration ID of the collection.
    configuration | Enter the complete path of this discovery_service.conf file, such as /home/config/discovery/discovery_service.conf.
    apikey | Enter the API key of your IBM Watson Discovery service instance.


    Note

    The base URL changes based on the region of the IBM Watson Discovery service instance.

  5. To upload the documents from the virtual machine to the IBM Watson Discovery collection, run the following command from the installation_directory/bin directory:

    Command to upload the documents
    ./crawler crawl --config /home/config/crawler.conf
  6. To verify that the documents were successfully uploaded to IBM Watson Discovery, check the console logs for messages similar to the following:

    [crawler-output-adapter-41] INFO: HikariPool-1 - Shutdown initiated...

    [crawler-output-adapter-41] INFO: HikariPool-1 - Shutdown completed.

    [crawler-io-13] INFO: The service for the Connector Framework Input Adapter was signaled to halt.

    You can also log in to IBM Watson Discovery and view the number of uploaded documents. 
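
The url value in filesystem-seed.conf is a newline-separated list of sdk-fs:// entries, one per file or folder. The following Python sketch, with a hypothetical helper name, builds that value from local paths and reproduces the documented example for /home/watson/mydocs. It rejects relative paths as a precaution; that check is an assumption based on the documented example, which uses an absolute path. How the value is quoted inside filesystem-seed.conf is not covered here, so follow the existing entries in that file.

```python
import os


def filesystem_seed_urls(paths):
    """Turn absolute file or folder paths into the newline-separated seed url value."""
    urls = []
    for p in paths:
        if not os.path.isabs(p):
            raise ValueError(f"use an absolute path: {p}")
        # normpath removes trailing slashes and redundant separators.
        urls.append("sdk-fs://" + os.path.normpath(p))
    return "\n".join(urls)
```

For example, filesystem_seed_urls(["/home/watson/mydocs"]) returns sdk-fs:///home/watson/mydocs.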

To upload database data by using the crawler

After you have installed the crawler on the virtual machine, perform the following steps to configure the crawler for uploading the database tables to the IBM Watson Discovery collection.

Note

In the following steps, directory paths assume that you copied the contents of the installation_directory/share/examples/config folder to /home/config when you installed the data crawler. If you copied the contents to a different location, replace /home/config with that path.

  1. On the virtual machine, go to the /home/config/connectors directory, open the database.conf file, and specify the following values:

    Parameter name | Action | Value
    protocol | Enter the name of the connector protocol that the crawler uses. | sqlserver
    collection | Enter the attribute that is used to unpack temporary files. | tempcollection
    logging-config | Enter the name of the file that is used to configure the logging options. | Must be formatted as a log4j XML string.
    classname | Enter the Java class name for the connector. | plugin:database.plugin@database
  2. Go to the /home/config/seeds directory, open the database-seed.conf file, and specify the following values:

    Parameter name | Action | Example
    url | Enter the seed URL for your custom SQL database. The structure of the URL is as follows: database-system://host:port/database?[per=number of records]&[sql=SQL] | sqlserver://mydbserver.test.com:5000/countries/street_view?per=1000
    user-password | Enter the credentials for the database system. Separate the user name and the password with a colon. | None
        Note: You must encrypt the password by using the vcrypt utility that is available with the data crawler. To encrypt the password, run the following command:
        vcrypt --encrypt --keyfile /home/config/id_vcrypt -- "myPassw0rd" > /home/config/db_pwd.txt
    jdbc-class | Enter the name of the JDBC driver. | com.microsoft.sqlserver.jdbc.SQLServerDriver
    connection-string | (Optional) Enter a value if you want to provide a more detailed configuration for the database connection, such as load balancing or SSL connections. The value overrides the automatically generated JDBC connection string. | jdbc:netezza://127.0.0.1:5480/databasename

    Note

    The third-party JDBC drivers are located in the connectorFramework/crawler-connector-framework-#.#.#/lib/java/database folder within the crawler installation directory. 

     You can use the extra_jars_dir parameter in the crawler.conf file to specify another location.

  3. Go to the /home/config/ directory, open the crawler.conf file, and specify the following values. 

    For other parameters, see Configuring crawler options.

    Parameter name | Action | Value
    crawl_config_file | Enter the path of the configuration file that you updated in step 1. | connectors/database.conf
    crawl_seed_file | Enter the path of the seed configuration file that you updated in step 2. | seeds/database-seed.conf
    output_adapter | Enter the following value:

    class = "com.ibm.watson.crawler.discoveryserviceoutputadapter.DiscoveryServiceOutputAdapter",
    config = "discovery_service",
    discovery_service {
        include "discovery/discovery_service.conf"
    },

  4. Go to the /home/config/discovery directory, open the discovery_service.conf file, and specify the following values.

    To obtain the values for this step, log in to your IBM Watson Discovery account and navigate to the collection that you created for the database data.

    Parameter name | Action
    environment_id | Enter the Environment ID of the collection.
    collection_id | Enter the Collection ID of the collection.
    configuration_id | Enter the Configuration ID of the collection.
    configuration | Enter the complete path of this discovery_service.conf file. For example, /home/config/discovery/discovery_service.conf.
    apikey | Enter the API key of your IBM Watson Discovery service instance.

    Note

    The base URL changes based on the region of the IBM Watson Discovery service instance.

  5. To upload the documents from the virtual machine to the IBM Watson Discovery collection, run the following command from the installation_directory/bin directory:

    Command for uploading documents
    ./crawler crawl --config /home/config/crawler.conf
  6. To verify that the documents are successfully uploaded to IBM Watson Discovery, check the console logs.

    You can also log in to IBM Watson Discovery and view the number of uploaded documents.
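
The seed url and user-password values in database-seed.conf follow fixed patterns: database-system://host:port/database?[per=number of records]&[sql=SQL], and the user name joined to the vcrypt-encrypted password with a colon. The following Python sketch assembles both values; the function names are hypothetical, the path argument holds whatever follows host:port (countries/street_view in the documented example), and the SQL text is inserted verbatim because any encoding requirement for it is not stated in this procedure.

```python
def database_seed_url(system, host, port, path, per=None, sql=None):
    """Build database-system://host:port/<path>?[per=N]&[sql=SQL] for the seed url."""
    url = f"{system}://{host}:{port}/{path}"
    params = []
    if per is not None:
        params.append(f"per={int(per)}")  # number of records, as in the documented structure
    if sql is not None:
        # Inserted verbatim; verify whether your crawler version expects encoding.
        params.append(f"sql={sql}")
    if params:
        url += "?" + "&".join(params)
    return url


def user_password_value(user, encrypted_password_file):
    """Join the user name and the vcrypt output (for example, db_pwd.txt) with a colon."""
    with open(encrypted_password_file, encoding="utf-8") as f:
        encrypted = f.read().strip()  # strip surrounding whitespace such as a trailing newline
    return f"{user}:{encrypted}"
```

For example, database_seed_url("sqlserver", "mydbserver.test.com", 5000, "countries/street_view", per=1000) returns the sample URL from the seed table.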

Where to go from here

Defining search data sets
