Installing and configuring the cognitive search data crawler for BMC Helix ITSM: Knowledge Management articles

To include knowledge articles from BMC Helix ITSM: Knowledge Management in cognitive search, you must install and configure the BMC crawler utility. The BMC crawler utility is an improved crawler that is used to crawl the BMC Helix ITSM: Knowledge Management articles only.

After crawling, the knowledge articles in BMC Helix ITSM: Knowledge Management are uploaded to the IBM Watson Discovery collection. For more details about the data flow, see Leveraging cognitive search in your application.

Tip

BMC Helix Innovation Studio cognitive search supports BMC Helix Business Workflows knowledge articles and does not require a crawler to include knowledge articles in cognitive search.

Important

If you have previously crawled the BMC Helix ITSM: Knowledge Management articles by using the IBM Watson Discovery Data Crawler, you must perform one of the following tasks:
- Create a new collection and re-crawl by using BMC crawler.
- Configure the BMC crawler utility so that the utility deletes the older collection, creates a new collection, and re-crawls the knowledge articles.

After re-crawling, use the updated display templates. The older display templates do not work with the BMC crawler.

If you want to crawl a database or file system that is outside of BMC Helix ITSM: Knowledge Management, use the IBM Watson Discovery Data Crawler.
For more information, see Creating file system and database collections for searching external data.

Before you begin

Ensure that you have completed the following tasks:

Product	Task
System configuration	Make sure that you have a Linux, UNIX, or Windows machine on which you can run the BMC crawler utility. The machine must have Java 11 or later installed.
IBM Watson Discovery	Have the following details of the IBM Watson Discovery instance to which you want to upload the BMC Helix ITSM: Knowledge Management articles: the Identity and Access Management (IAM) API key and the endpoint URL. You can get the IAM API key and the endpoint URL by logging in to IBM Cloud.
BMC Helix ITSM: Knowledge Management	Have the Action Request System host name and port. (You do not need to run the utility on the AR System server.) Have the Remedy administrator credentials. To ensure that end users can access the articles in BMC Helix ITSM: Knowledge Management, set the visibility conditions for the knowledge articles. For information about configuring the visibility for knowledge articles, see Managing knowledge article visibility. To ensure that the articles are indexed, make sure that you have registered the file system path. For more information, see Registering file system paths. The files in the registered path are mapped to the RKM:VF_FileSystem_Manageable_Join vendor form in BMC Helix ITSM: Knowledge Management.

Process for installing and configuring the BMC crawler utility

The following image illustrates the end-to-end process to install and configure the BMC crawler utility:

Task 1: To set up your system and download the BMC crawler utility

Download the BMC crawler utility.
1. As an administrator, download the BMC crawler utility file.
2. Save the crawler utility on your local machine.
3. Extract the .zip file of the utility.
  The extracted folder includes the following files:
  - bmcCrawler.properties
  - bmcCrawler-versionNumber.jar
Set the Java environment variables.
You must set the Java environment variables in your system to run the BMC crawler utility.
1. Open the command prompt.
2. Perform one of the following steps:
- For Linux, run the following commands:
  Commands to set environment variables
  export JAVA_HOME=/opt/jdk
  export PATH=/opt/jdk/jre/bin:$PATH
  export PATH=$JAVA_HOME/bin:$PATH
- For Windows, set the JAVA_HOME variable.
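After setting the variables, it can help to confirm that the Java runtime on the PATH meets the Java 11 prerequisite. The following is a minimal sketch for Linux or UNIX shells; it is not part of the BMC crawler utility, and the function names java_major_version, java_is_supported, and installed_java_version are hypothetical helpers introduced only for illustration.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not shipped with the BMC crawler utility.

# Print the major version for a Java version string.
# Legacy strings such as 1.8.0_292 map to 8; 11.0.2 maps to 11.
java_major_version() {
  version=$1
  case $version in
    1.*) version=${version#1.} ;;   # drop the legacy "1." prefix
  esac
  # Keep only the leading digits.
  printf '%s\n' "$version" | sed 's/^\([0-9]*\).*/\1/'
}

# Succeed when the given version string is Java 11 or later.
java_is_supported() {
  major=$(java_major_version "$1")
  [ -n "$major" ] && [ "$major" -ge 11 ]
}

# Print the version string reported by the java binary on the PATH.
# "java -version" writes to standard error, hence the redirection.
installed_java_version() {
  java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}
```

For example, running java_is_supported "$(installed_java_version)" || echo "Java 11 or later is required" prints the warning only when the detected version is older than 11 or no version could be detected.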

Configure the bmcCrawler.properties file.

Navigate to the folder where you downloaded the BMC crawler utility.
Open the extracted folder and then open the bmcCrawler.properties file.

Specify the following parameter values:

Parameter	Description	Example value
Mandatory parameters
discoveryApiKey	Specify the API key of the IBM Watson Discovery instance.	zABCd4EFgHijKLmn1O1pqrStuu0vWXYzaBCDEfGHIjk_
discoveryEndPoint	Specify the endpoint URL of the IBM Watson Discovery instance.	https://gateway-<location>.watsonplatform.net/discovery/api
remedyHostName	Specify the host name of the Action Request System.	abc-tenantname-1234
remedyPort	Specify the port number of the Action Request System.	46262
remedyUser	Specify the AR System administrator user name.	Administrator
remedyPassword	(If you are running the BMC Crawler utility to encrypt the password) Specify the password in plain text. (If you are running the BMC Crawler utility after encrypting the password) Specify the encrypted password.	Plain text password—password123 Encrypted password—1234567891106a28b29c843d98e5fg1hi59j0k12345678953592258d3a981312b1
formName	Specify the BMC Helix ITSM: Knowledge Management form name. Important: You can specify only one form at a time. Comma-separated form names are not valid.	RKM:HowToTemplate_Manageable_Join
fieldNamesForParagraphs	Specify the field names whose values are appended to the Text field of the IBM Watson Discovery collection. During a cognitive search, matching paragraphs are derived from the Text field. Important: For file system articles, specify only one attachment field. If you specify multiple attachment fields, only the first one is considered. For database tables, do not specify attachment fields.	Article_Keywords,ArticleTitle
fieldNamesForDetails	Specify the field names that should be added as metadata fields in the IBM Watson Discovery collection. During a cognitive search, the metadata fields are used to display the knowledge article details. Important: For file system articles, specify only one attachment field. If you specify multiple attachment fields, only the first one is considered. For database tables, do not specify attachment fields.	DocID,ArticleTitle
discoveryConfigurationName	If you do not want to specify the value for discoveryConfigurationFilePath, specify the name of the IBM Watson Discovery configuration. If you specify the value for discoveryConfigurationFilePath, the configuration name is taken from the JSON file.	HowTo-Config
discoveryCollectionName	Specify the name of the IBM Watson Discovery collection.	HowTo-Collection
customerId	Specify the customer ID that you want to use for labeling data for General Data Protection Regulation (GDPR).	BMC
discoveryProjectName	Specify the name of the IBM Watson Discovery project. Important: If the project is only of the type document_retrieval, the crawl words are uploaded to IBM Watson Discovery v2. For other types, the crawl words are not uploaded to IBM Watson Discovery v2.	Default value—BMC Crawler Project
Optional parameters with default values
chunkSize	Specify the batch size for querying entries from the BMC Helix ITSM: Knowledge Management form.	Default value—1000
threadPoolSize	Specify the number of knowledge articles that will be crawled simultaneously.	Default value—10 Maximum valid value—50
deleteCollectionBeforeCrawl	Specify whether you want to delete the collection before you start crawling.	Default value—no Valid values—yes/no
qualification	Specify a valid AR System qualification to include BMC Helix ITSM: Knowledge Management articles.	Default value—'1=1'
language	Specify the language of the IBM Watson Discovery collection. Valid values: en, es, de, fr, it, ja, ko, pt, nl, zh-CN	Default value—en
discoveryEnvironmentSize	Specify the IBM Watson Discovery plan.	Default value—LT (for the Lite plan) Valid values—LT/ S (for all other plans)
fieldNameForGettingModifiedRecord	Specify the field name that represents the modified date of a form in BMC Helix ITSM: Knowledge Management.	Default value—Modified Date
Optional parameters without default values
discoveryConfigurationFilePath	(After you download and modify the out-of-the-box files) Specify the path of the JSON file that you modified.	C:\\Path\\Demo_Discovery_Collection_Configuration_HowTo_.json
titleFieldName	Specify the field name that will be considered as the title for the knowledge article in the search results.	Title
documentUniqueID	Specify the document ID to be used in IBM Watson Discovery. Important: If the parameter value is blank, the BMC Helix ITSM: Knowledge Management article request ID is set as the unique ID in IBM Watson Discovery. If there are multiple versions of the knowledge article, all the versions of the article might be displayed to the end user. If you want to use the BMC Helix ITSM: Knowledge Management knowledge base ID as the unique ID in IBM Watson Discovery, you must set the value to DocID. If there are multiple versions of the knowledge article, only the latest version of the article is displayed to the end user.	docID
stopWordsFilePath	Specify the path of the custom stop words file that you want to upload to the IBM Watson Discovery collection.	Path to the file that contains the custom stop words list.
checkStopwords	Specify whether to compare the stop words in the file with the stop words available in IBM Watson Discovery v2.	Default value: no. Valid values: yes/no
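The parameter table above can be turned into a quick pre-flight check. The following sketch assumes that bmcCrawler.properties uses the usual Java properties layout of one key=value pair per line, which the table does not state explicitly. The sample values are the example values from the table, not working credentials, and the helper names has_prop, missing_props, and write_sample_props are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of the BMC crawler utility.

# Parameters listed under "Mandatory parameters" in the table.
MANDATORY_PROPS="discoveryApiKey discoveryEndPoint remedyHostName remedyPort
remedyUser remedyPassword formName fieldNamesForParagraphs fieldNamesForDetails
discoveryConfigurationName discoveryCollectionName customerId discoveryProjectName"

# Succeed when the file ($1) has a non-empty value for the key ($2).
has_prop() {
  grep -q "^[[:space:]]*$2[[:space:]]*=[[:space:]]*[^[:space:]]" "$1"
}

# Print every mandatory parameter that is missing or empty in the file ($1).
missing_props() {
  for key in $MANDATORY_PROPS; do
    # Per the table, the configuration name is taken from the JSON file
    # when discoveryConfigurationFilePath is specified.
    if [ "$key" = discoveryConfigurationName ] &&
       has_prop "$1" discoveryConfigurationFilePath; then
      continue
    fi
    has_prop "$1" "$key" || printf '%s\n' "$key"
  done
}

# Write a sample file ($1) that uses the example values from the table.
write_sample_props() {
  cat > "$1" <<'EOF'
discoveryApiKey=zABCd4EFgHijKLmn1O1pqrStuu0vWXYzaBCDEfGHIjk_
discoveryEndPoint=https://gateway-<location>.watsonplatform.net/discovery/api
remedyHostName=abc-tenantname-1234
remedyPort=46262
remedyUser=Administrator
remedyPassword=password123
formName=RKM:HowToTemplate_Manageable_Join
fieldNamesForParagraphs=Article_Keywords,ArticleTitle
fieldNamesForDetails=DocID,ArticleTitle
discoveryConfigurationName=HowTo-Config
discoveryCollectionName=HowTo-Collection
customerId=BMC
discoveryProjectName=BMC Crawler Project
EOF
}
```

Running missing_props bmcCrawler.properties prints nothing when every mandatory parameter has a value, and prints one parameter name per line otherwise. The check looks only at presence, not at whether the values are valid.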

Restrict the properties file to administrators.
After downloading the BMC crawler utility, you must restrict the bmcCrawler.properties file so that only authorized administrators can access and modify this file.
Best Practice
We recommend that you grant access to the bmcCrawler.properties file only to the user who owns it. Do not provide access to any other users.
- To restrict the file on Windows, perform one of the following steps:
  - Restrict file permissions by using Windows Explorer.
  - Open the command prompt and apply Discretionary Access Control Lists (DACLs) to the file.
- To restrict the file on Linux, open the command prompt and run the chmod 700 command as specified in the chmod command description.
  For example, use one of the following commands:
  
  chmod 700 bmcCrawler.properties
  chmod u+rwx,go-rwx bmcCrawler.properties
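On Linux, you can confirm that the restriction took effect by inspecting the mode string that ls reports. The following is a small sketch; the helper name is_owner_only is hypothetical and is not part of the BMC crawler utility.

```shell
#!/bin/sh
# Hypothetical helper for illustration; not part of the BMC crawler utility.

# Succeed only when group and others have no permissions on the file ($1).
# The mode string printed by "ls -ld" looks like "-rwx------";
# characters 5 to 10 hold the group and other permission bits.
is_owner_only() {
  mode=$(ls -ld -- "$1" 2>/dev/null | cut -c5-10)
  [ "$mode" = "------" ]
}
```

For example, is_owner_only bmcCrawler.properties || echo "File is readable by other users" prints the warning when group or other users still have any access, or when the file does not exist.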

Task 2: To download the out-of-the-box files

BMC provides the following files out-of-the-box:

Stop words file: Text file with a list of words that you can filter out from the data collection.
Discovery Collection configuration file: JSON file that defines the format in which the crawled knowledge articles are uploaded to the IBM Watson Discovery collection.

After downloading, you can modify the stop words list or the enrichments of the knowledge articles.

Download the sample stop words file.
Download the Discovery Collection configuration file.
(Optional) If required, modify the stop words list.
Modify the enrichments section in the Discovery Collection configuration file according to the BMC Helix ITSM: Knowledge Management form that you want to crawl.
Save the modified file with a different name.
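If you edit the stop words list by hand, a quick cleanup pass can remove stray whitespace, blank lines, and duplicate entries before you upload it. The following sketch assumes that the stop words file contains one word per line, which this page does not state explicitly. The helper name normalize_stopwords is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper for illustration; not part of the BMC crawler utility.

# Print a cleaned copy of a stop words file ($1): entries are trimmed,
# blank lines and duplicates are dropped, and the result is sorted.
# Assumes one stop word per line.
normalize_stopwords() {
  sed 's/^[[:space:]]*//; s/[[:space:]]*$//' "$1" |
    sed '/^$/d' |
    LC_ALL=C sort -u
}
```

For example, normalize_stopwords stopwords.txt > stopwords_clean.txt writes the cleaned list to a new file and leaves the downloaded file untouched. Both file names are placeholders.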

Task 3: To encrypt the password

Run the crawler for the first time to encrypt the Remedy password.
The first run of the BMC crawler only encrypts the Remedy password. The password is encrypted by using AES in GCM mode with a 256-bit key.
1. Open the command prompt.
2. Run the following command:
  Example of command to encrypt the password
  java -jar -DbmcCrawlerPropertyFile=.\bmcCrawler.properties bmcCrawler-20.2.0.jar -encpassword
  The encrypted password is displayed in the command prompt.
3. Note the encrypted password, and then specify it as the value of the remedyPassword parameter in the bmcCrawler.properties file.
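On Linux, replacing the plain text password with the encrypted value can be scripted instead of edited by hand. The following sketch assumes that bmcCrawler.properties holds one key=value pair per line. The helper name set_prop is hypothetical and is not part of the BMC crawler utility.

```shell
#!/bin/sh
# Hypothetical helper for illustration; not part of the BMC crawler utility.

# Replace the value of one key in a properties file, or append the key
# when it is not present. Arguments: file, key, new value.
set_prop() {
  sp_file=$1
  sp_tmp=$(mktemp) || return 1
  # Pass the key and value through the environment so that awk does not
  # interpret backslashes or other special characters in them.
  if SP_KEY=$2 SP_VALUE=$3 awk '
      BEGIN { key = ENVIRON["SP_KEY"]; value = ENVIRON["SP_VALUE"]; found = 0 }
      index($0, key "=") == 1 { print key "=" value; found = 1; next }
      { print }
      END { if (!found) print key "=" value }
    ' "$sp_file" > "$sp_tmp"; then
    # Copy the content back instead of moving the temporary file,
    # so that the restricted permissions on the original file are kept.
    cat "$sp_tmp" > "$sp_file"
    sp_status=$?
  else
    sp_status=1
  fi
  rm -f "$sp_tmp"
  return "$sp_status"
}
```

For example, set_prop bmcCrawler.properties remedyPassword "encrypted-value" replaces the existing remedyPassword line, where encrypted-value stands for the string that the crawler displayed.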
(For Microsoft Windows only) Restrict the key.txt file to administrators.
After the password is encrypted, the key.txt file is generated in the directory from which you ran the BMC crawler utility. On Microsoft Windows, you must restrict this file to administrators only. On Linux, this file is automatically restricted, and you do not have to restrict it manually.
1. Navigate to the location of the key.txt file.
2. Perform one of the following steps:
  - Restrict file permissions by using Windows Explorer.
  - Open the command prompt and apply Discretionary Access Control Lists (DACLs) to the file.

Task 4: To run the crawler for a second time and forcefully pause the crawler for stop words

Perform this task if you want to upload and activate the stop words.

Best practice

We recommend that you wait until the stop words file status becomes active, which takes several minutes. To check the status of the stop words file, perform the steps in Task 6.

Open the command prompt.
Run the following command:
Example of command to pause the crawler for stop words
java -jar -DbmcCrawlerPropertyFile=.\bmcCrawler.properties bmcCrawler-20.2.0.jar -stopToCreateStopwordList
The BMC crawler displays a message that the crawler has stopped. You must then go to the IBM Watson Discovery UI to upload and activate the stop words file.

Task 5: To upload the stop words file

Log in to IBM Watson Discovery.
Navigate to the data collection to which you want to upload the BMC Helix ITSM: Knowledge Management articles.
Click the Search settings tab.
In the Stopwords section, click Upload, as shown in the following image:

For more information about uploading stop words, see Defining stop words in IBM documentation.

Task 6: To check the status of the stop words file

After you upload the stop words file, it takes several minutes for the stop words status to become active. You can check the status of the file by performing the following steps:

In IBM Watson Discovery, open the data collection that was created (data collection to which you want to upload the articles).
Copy the Collection Id and the Environment Id, as shown in the following image:
Open the command prompt.
Run the following command:
Example of command to check the stop words file status
curl -u "apikey:{apikey}" -X GET "https://gateway.watsonplatform.net/discovery/api/v1/environments/{environment_id}/collections/{collection_id}/word_lists/stopwords?version=2019-04-30"
If the file is in pending status, as shown in the following example, wait for the status to change to active:
{"status":"pending","type":"stopwords"}
After the file is active, the status is displayed as shown in the following example:

{"status":"active","type":"stopwords"}
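Instead of rerunning the status command by hand, you can poll it until the status changes. The following sketch wraps the same request in a loop. The helper names stopwords_status and wait_for_stopwords, the 30-second interval, and the limit of 30 attempts are assumptions chosen for illustration, not values documented for the service.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of the BMC crawler utility.

# Print the value of the "status" field from the JSON response ($1).
stopwords_status() {
  printf '%s\n' "$1" |
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Poll the word list endpoint until the status is "active".
# Arguments: API key, environment ID, collection ID.
# The interval and the attempt limit are arbitrary illustration values.
wait_for_stopwords() {
  base="https://gateway.watsonplatform.net/discovery/api/v1"
  url="$base/environments/$2/collections/$3/word_lists/stopwords?version=2019-04-30"
  attempts=0
  while [ "$attempts" -lt 30 ]; do
    response=$(curl -s -u "apikey:$1" -X GET "$url")
    status=$(stopwords_status "$response")
    [ "$status" = "active" ] && return 0
    echo "Stop words status: ${status:-unknown}; checking again in 30 seconds"
    sleep 30
    attempts=$((attempts + 1))
  done
  return 1
}
```

For example, wait_for_stopwords "$API_KEY" "$ENVIRONMENT_ID" "$COLLECTION_ID" returns successfully as soon as the response reports the active status, and fails if the status is still not active after the last attempt. The three variable names are placeholders for the values that you copied earlier.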

Task 7: To run the crawler

You run the BMC crawler for a third time to crawl BMC Helix ITSM: Knowledge Management articles so that they are uploaded to the IBM Watson Discovery collection.

Open the command prompt.
Run the following command:

Example command to run BMC crawler
java -jar -DbmcCrawlerPropertyFile=.\bmcCrawler.properties bmcCrawler-20.2.0.jar
A log file is generated every time you run the BMC crawler and is saved in the directory from which you ran the crawler.
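The three crawler runs described on this page differ only in the trailing option. The following sketch prints the command line for each run so that the differences are easy to compare. The helper name crawler_command and the mode names encrypt, pause, and crawl are hypothetical; the options themselves are the ones used in the commands above.

```shell
#!/bin/sh
# Hypothetical helper for illustration; not part of the BMC crawler utility.

# Print the command line for one crawler run.
# Arguments: mode (encrypt, pause, or crawl), properties file, JAR file.
crawler_command() {
  case $1 in
    encrypt) flag=" -encpassword" ;;              # first run: encrypt the password
    pause)   flag=" -stopToCreateStopwordList" ;; # second run: pause for stop words
    crawl)   flag="" ;;                           # third run: crawl the articles
    *) echo "unknown mode: $1" >&2; return 1 ;;
  esac
  printf 'java -jar -DbmcCrawlerPropertyFile=%s %s%s\n' "$2" "$3" "$flag"
}
```

For example, crawler_command pause ./bmcCrawler.properties bmcCrawler-20.2.0.jar prints the command for the second run. The helper only prints the command; review it and then run it yourself.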