To include knowledge articles from BMC Helix ITSM: Knowledge Management in cognitive search, you must install and configure the BMC crawler utility. The BMC crawler utility is an improved crawler that crawls only BMC Helix ITSM: Knowledge Management articles.
After crawling, the knowledge articles in BMC Helix ITSM: Knowledge Management are uploaded to the IBM Watson Discovery collection. For more details about the data flow, see Leveraging cognitive search in your application.
Tip
BMC Helix Innovation Studio cognitive search supports BMC Helix Business Workflows knowledge articles and does not require a crawler to include them.
Important
- If you have previously crawled the BMC Helix ITSM: Knowledge Management articles by using the IBM Watson Discovery Data Crawler, you must perform one of the following tasks:
  - Create a new collection and re-crawl by using the BMC crawler utility.
  - Configure the BMC crawler utility so that the utility deletes the older collection, creates a new collection, and re-crawls the knowledge articles.
After re-crawling, use the updated display templates. The older display templates do not work with the BMC crawler.
Before you begin
Ensure that you have completed the following tasks:
- Make sure that you have a Linux, Unix, or Windows machine on which you can run the BMC crawler utility.
- Make sure that Java 11 or later is installed on that machine.
- Have the following details of the IBM Watson Discovery instance to which you want to upload the BMC Helix ITSM: Knowledge Management articles:
  - Identity and Access Management (IAM) API key
  - Endpoint URL
  You can get the IAM API key and the endpoint URL by logging in to IBM Cloud.
BMC Helix ITSM: Knowledge Management
- Have the Action Request System host name and port. (You do not need to run the utility on the AR System server.)
- Have the Remedy administrator credentials.
- To ensure that end users can access the articles in BMC Helix ITSM: Knowledge Management, set the visibility conditions for the knowledge articles. For information about configuring the visibility for knowledge articles, see Managing knowledge article visibility.
- To ensure that the articles are indexed, make sure that you have registered the file system path. For more information, see Registering file system paths. The files in the registered path are mapped to the RKM:VF_FileSystem_Manageable_Join vendor form in BMC Helix ITSM: Knowledge Management.
Process for installing and configuring the BMC crawler utility
The following image illustrates the end-to-end process to install and configure the BMC crawler utility:

Task 1: To set up your system and download the BMC crawler utility
Download the BMC crawler utility.
- As an administrator, download the BMC crawler utility file.
- Save the crawler utility on your local machine.
Extract the .zip file of the utility.
The extracted folder includes the following files:
- bmcCrawler.properties
- bmcCrawler-versionNumber.jar
Set the Java environment variables.
You must set the Java environment variables in your system to run the BMC crawler utility.
- Open the command prompt.
- Perform one of the following steps:
For Linux, run the following commands:
Commands to set environment variables
export JAVA_HOME=/opt/jdk
export PATH=/opt/jdk/jre/bin:$PATH
export PATH=$JAVA_HOME/bin:$PATH
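After you set the variables, it can help to confirm that the java command on your PATH meets the Java 11 requirement. The following is a minimal sketch, not part of the BMC crawler utility; the java_major_version helper is a hypothetical name introduced here for illustration.

```shell
# Hypothetical helper: extract the major version from a Java version string.
# Handles the legacy scheme ("1.8.0_292" -> 8) and the current one ("11.0.2" -> 11).
java_major_version() {
    version="$1"
    case "$version" in
        1.*) version="${version#1.}" ;;
    esac
    # Keep only the leading digits.
    echo "${version%%[!0-9]*}"
}

# `java -version` writes to stderr, so redirect it before parsing.
if command -v java >/dev/null 2>&1; then
    reported=$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
    major=$(java_major_version "$reported")
    if [ "${major:-0}" -ge 11 ]; then
        echo "Java $reported meets the Java 11 requirement"
    else
        echo "Java $reported is older than Java 11"
    fi
else
    echo "java was not found on PATH"
fi
```

If the check reports an older version, review the JAVA_HOME and PATH values before you run the crawler.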
Configure the bmcCrawler.properties file.
- Navigate to the folder where you downloaded the BMC crawler utility.
- Open the extracted folder and then open the bmcCrawler.properties file.
Specify the following parameter values:
| Parameter | Description | Example or default value |
---|---|---|
| | Specify the API key of the IBM Watson Discovery instance. | zABCd4EFgHijKLmn1O1pqrStuu0vWXYzaBCDEfGHIjk_ |
| | Specify the endpoint URL of the IBM Watson Discovery instance. | https://gateway-/location.watsonplatform.net/discovery/api |
| | Specify the host name of the Action Request System. | |
| | Specify the port number of the Action Request System. | |
| | Specify the AR System administrator user name. | |
| | If you are running the BMC crawler utility to encrypt the password, specify the password in plain text. If you are running the BMC crawler utility after encrypting the password, specify the encrypted password. | Plain text password: password123. Encrypted password: 1234567891106a28b29c843d98e5fg1hi59j0k12345678953592258d3a981312b1 |
| | Specify the BMC Helix ITSM: Knowledge Management form name. Important: You can specify only one form at a time. Comma-separated form names are not valid. | RKM:HowToTemplate_Manageable_Join |
| | Specify the field names whose values are appended to the Text field of the IBM Watson Discovery collection. During a cognitive search, matching paragraphs are derived from the Text field. Important: For file system articles, specify only one attachment field. If you specify multiple attachment fields, only the first one is considered. For database tables, do not specify attachment fields. | Article_Keywords,ArticleTitle |
| | Specify the field names that should be added as metadata fields in the IBM Watson Discovery collection. During a cognitive search, the metadata fields are used to display the knowledge article details. Important: For file system articles, specify only one attachment field. If you specify multiple attachment fields, only the first one is considered. For database tables, do not specify attachment fields. | |
| discoveryConfigurationName | If you do not want to specify the value for discoveryConfigurationFilePath, specify the name of the IBM Watson Discovery configuration. If you specify the value for discoveryConfigurationFilePath, the configuration name is taken from the JSON file. | |
| | Specify the name of the IBM Watson Discovery collection. | |
| | Specify the customer ID that you want to use for labeling data for General Data Protection Regulation (GDPR). | |
| | Specify the name of the IBM Watson Discovery project. Important: The crawl words are uploaded to IBM Watson Discovery v2 only if the project is of the type document_retrieval. For other types, the crawl words are not uploaded to IBM Watson Discovery v2. | Default value: BMC Crawler Project |
| Optional parameters with default values | | |
| | Specify the batch size for querying entries from the BMC Helix ITSM: Knowledge Management form. | |
| | Specify the number of knowledge articles that will be crawled simultaneously. | Default value: 10. Maximum valid value: 50 |
| deleteCollectionBeforeCrawl | Specify whether you want to delete the collection before you start crawling. | Default value: no. Valid values: yes, no |
| | Specify a valid AR System qualification to include BMC Helix ITSM: Knowledge Management articles. | |
| | Specify the language of the IBM Watson Discovery collection. Valid values: en, es, de, fr, it, ja, ko, pt, nl, zh-CN | |
| | Specify the IBM Watson Discovery plan. | Default value: LT (for the Lite plan). Valid values: LT, S (for all other plans) |
| fieldNameForGettingModifiedRecord | Specify the field name that represents the modified date of a form in BMC Helix ITSM: Knowledge Management. | Default value: Modified Date |
| Optional parameters without default values | | |
| discoveryConfigurationFilePath | (After you download and modify the out-of-the-box files) Specify the path of the JSON file that you modified. | C:\\Path\\Demo_Discovery_Collection_Configuration_HowTo_.json |
| | Specify the field name that will be considered as the title for the knowledge article in the search results. | |
| | Specify the document ID to be used in IBM Watson Discovery. Important: If the parameter value is blank, the BMC Helix ITSM: Knowledge Management article request ID is set as the unique ID in IBM Watson Discovery. If there are multiple versions of the knowledge article, all the versions of the article might be displayed to the end user. If you want to use the BMC Helix ITSM: Knowledge Management knowledge base ID as the unique ID in IBM Watson Discovery, you must set the value to DocID. If there are multiple versions of the knowledge article, only the latest version of the article is displayed to the end user. | |
| | Specify whether to upload a custom stop words list to the IBM Watson Discovery collection. | Path to the file that contains the custom stop words list to upload to the IBM Watson Discovery collection. |
| | Compares stop words in the file with stop words available in IBM Watson Discovery v2. | Default value: No. Valid values: yes, no |
Restrict the properties file to administrators.
After downloading the BMC crawler utility, you must restrict the bmcCrawler.properties file so that only authorized administrators can access and modify this file.
Best Practice
We recommend that you grant access only to the user who owns the bmcCrawler.properties file. Do not provide access to any other users.
- To restrict the file on Windows, perform one of the following steps:
- Restrict file permissions by using Windows Explorer.
To restrict the file on Linux, open the command prompt and run the chmod 700 command as specified in The chmod command description.
For example, use one of the following commands:
- chmod 700 bmcCrawler.properties
- chmod u+rwx,go-rwx bmcCrawler.properties
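To confirm that the restriction took effect on Linux, you can inspect the mode string that ls reports. The following is a minimal sketch under the assumption that the properties file is in the current directory; the is_owner_only helper is a hypothetical name, not part of the BMC crawler utility.

```shell
# Hypothetical helper: succeeds only when group and others have no permissions on the file.
is_owner_only() {
    # Characters 5-10 of the `ls -l` mode string hold the group and other permission bits.
    group_other=$(ls -ld "$1" | cut -c5-10)
    [ "$group_other" = "------" ]
}

# Assumption: the properties file is in the current directory.
properties_file="bmcCrawler.properties"
if [ -e "$properties_file" ]; then
    if is_owner_only "$properties_file"; then
        echo "$properties_file is restricted to its owner"
    else
        echo "$properties_file is still accessible to group or others"
    fi
fi
```

Both chmod forms shown above leave the mode as rwx------, so the check passes after either command.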
Task 2: To download the out-of-the-box files
BMC provides the following files out-of-the-box:
- Stop words file: Text file with a list of words that you can filter out from the data collection.
- Discovery Collection configuration file: JSON file that defines the format in which the crawled knowledge articles are uploaded to the IBM Watson Discovery collection.
After downloading, you can modify the stop words list or the enrichments of the knowledge articles.
- Download the sample stop words file.
- Download the Discovery Collection configuration file.
- (Optional) If required, modify the stop words list.
- Modify the enrichments section in the Discovery Collection configuration file according to the BMC Helix ITSM: Knowledge Management form that you want to crawl.
- Save the modified file with a different name.
Task 3: To encrypt the password and restrict the key file
Run the crawler for the first time to encrypt the Remedy password.
The first run of the BMC crawler encrypts the Remedy password by using the AES cipher in GCM mode with a 256-bit key.
- Open command prompt.
Run the following command:
Example of command to encrypt the password
java -jar -DbmcCrawlerPropertyFile=.\bmcCrawler.properties bmcCrawler-20.2.0.jar -encpassword
The encrypted password is displayed in the command prompt.
- Note the encrypted password for future reference.
(For Microsoft Windows only) Restrict the key file to administrators
After the password is encrypted, the key.txt file is generated in the directory from which you ran the BMC crawler utility. If you use Microsoft Windows, you must restrict this file to administrators only. If you use Linux, this file is automatically restricted and you do not have to restrict it manually.
- Navigate to the location of the key.txt file.
- Perform one of the following steps:
- Restrict file permissions by using Windows Explorer.
Task 4: To run the crawler for a second time and forcefully pause the crawler for stop words
Perform this task if you want to upload and activate the stop words.
Best practice
We recommend that you wait until the stop words file status becomes active, which takes several minutes. To check the status of the stop words file, perform the steps in Task 6.
- Open command prompt.
Run the following command:
Example of command to pause the crawler for stop words
java -jar -DbmcCrawlerPropertyFile=.\bmcCrawler.properties bmcCrawler-20.2.0.jar -stopToCreateStopwordList
The BMC crawler displays a message that the crawler has stopped. You must then go to the IBM Watson Discovery UI to upload and activate the stop words file.
Task 5: To upload the stop words file
- Log in to IBM Watson Discovery.
- Navigate to the data collection to which you want to upload the BMC Helix ITSM: Knowledge Management articles.
- Click the Search settings tab.
- In the Stopwords section, click Upload, as shown in the following image:

For more information about uploading stop words, see Defining stop words in IBM documentation.
Task 6: To check the status of the stop words file
After you upload the stop words file, it takes several minutes for the stop words status to become active. You can check the status of the file by performing the following steps:
- In IBM Watson Discovery, open the data collection to which you want to upload the articles.
- Copy the Collection Id and the Environment Id, as shown in the following image:

- Open command prompt.
Run the following command:
Example of command to check the stop words file status
curl -u "apikey:{apikey}" -X GET "https://gateway.watsonplatform.net/discovery/api/v1/environments/{environment_id}/collections/{collection_id}/word_lists/stopwords?version=2019-04-30"
If the file is in pending status, as shown in the following example, wait for the status to change to active:
{"status":"pending","type":"stopwords"}
After the file is active, the status is displayed as shown in the following example:
{"status":"active","type":"stopwords"}
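Because activation takes several minutes, you might prefer to poll the status instead of rerunning the command manually. The following is a minimal sketch based on the curl command above. The environment variable names API_KEY, ENVIRONMENT_ID, and COLLECTION_ID, the stopwords_status helper, the 30-second interval, and the limit of 20 attempts are all assumptions made for illustration.

```shell
# Hypothetical helper: extract the "status" value from the JSON response.
stopwords_status() {
    printf '%s' "$1" | sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Poll only when the (assumed) environment variables are set.
if [ -n "${API_KEY:-}" ] && [ -n "${ENVIRONMENT_ID:-}" ] && [ -n "${COLLECTION_ID:-}" ]; then
    url="https://gateway.watsonplatform.net/discovery/api/v1/environments/${ENVIRONMENT_ID}/collections/${COLLECTION_ID}/word_lists/stopwords?version=2019-04-30"
    attempt=0
    max_attempts=20   # assumed limit so that the loop always ends
    while [ "$attempt" -lt "$max_attempts" ]; do
        response=$(curl -s -u "apikey:${API_KEY}" -X GET "$url") || response=""
        status=$(stopwords_status "$response")
        echo "Stop words status: ${status:-unknown}"
        if [ "$status" = "active" ]; then
            break
        fi
        attempt=$((attempt + 1))
        sleep 30      # assumed polling interval
    done
fi
```

When the loop reports active, you can continue with the next task. If it ends without reporting active, check the status again later before you run the crawler.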
Task 7: To run the crawler
You run the BMC crawler for a third time to crawl BMC Helix ITSM: Knowledge Management articles so that they are uploaded to the IBM Watson Discovery collection.
- Open the command prompt.
Run the following command:
Example command to run BMC crawler
java -jar -DbmcCrawlerPropertyFile=.\bmcCrawler.properties bmcCrawler-20.2.0.jar
A log file is generated every time you run the BMC crawler and is saved in the same directory from where you ran the crawler.
Tip
After crawling articles from BMC Helix ITSM: Knowledge Management, you can train and test the search result relevancy by using IBM Watson Discovery tooling methods. For more information, see Improving result relevance with the tooling in IBM documentation.
Where to go from here
Defining search data sets