Installing and configuring the cognitive search data crawler for BMC Helix ITSM: Knowledge Management articles
To include knowledge articles from BMC Helix ITSM: Knowledge Management in cognitive search, you must install and configure the BMC crawler utility. The BMC crawler utility is an improved crawler that crawls only BMC Helix ITSM: Knowledge Management articles.
After crawling, the knowledge articles in BMC Helix ITSM: Knowledge Management are uploaded to the IBM Watson Discovery collection. For more details about the data flow, see Leveraging cognitive search in your application.
Tip
BMC Helix Innovation Studio cognitive search supports BMC Helix Business Workflows knowledge articles and does not require a crawler to include knowledge articles in cognitive search.
Important
- If you have previously crawled the BMC Helix ITSM: Knowledge Management articles by using the IBM Watson Discovery Data Crawler, you must perform one of the following tasks:
  - Create a new collection and re-crawl by using the BMC crawler.
  - Configure the BMC crawler utility so that the utility deletes the older collection, creates a new collection, and re-crawls the knowledge articles.
  After re-crawling, use the updated display templates. The older display templates do not work with the BMC crawler.
- If you want to crawl a database or file system that is outside of BMC Helix ITSM: Knowledge Management, use the IBM Watson Discovery Data Crawler. For more information, see Creating file system and database collections for searching external data.
Before you begin
Ensure that you have completed the following tasks:
Product | Task |
---|---|
System configuration | |
IBM Watson Discovery | You can get the IAM API key and the endpoint URL |
BMC Helix ITSM: Knowledge Management | |
Process for installing and configuring the BMC crawler utility
The following image illustrates the end-to-end process to install and configure the BMC crawler utility:
Task 1: To set up your system and download the BMC crawler utility
Task 2: To download the out-of-the-box files
BMC provides the following files out-of-the-box:
- Stop words file: Text file with a list of words that you can filter out from the data collection.
- Discovery Collection configuration file: JSON file that defines the format in which crawled knowledge articles are uploaded to the IBM Watson Discovery collection.
After downloading, you can modify the stop words list or the enrichments of the knowledge articles.
- Download the sample stop words file.
- Download the Discovery Collection configuration file.
- (Optional) If required, modify the stop words list.
- Modify the enrichments section in the Discovery Collection configuration file according to the BMC Helix ITSM: Knowledge Management form that you want to crawl.
- Save the modified file with a different name.
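The optional stop words edit in the preceding steps can also be scripted. The following is a minimal Python sketch, assuming the stop words file is plain text with one word per line and that lowercasing the words is acceptable. The function name `merge_stop_words` and the sample words are hypothetical and are not part of the files that BMC provides.

```python
def merge_stop_words(existing_text, extra_words):
    """Return stop words text with extra_words appended, one word per line.

    Assumption: the file holds one stop word per line. Words are trimmed,
    lowercased, and de-duplicated; the original order is preserved.
    """
    seen = set()
    merged = []
    for word in list(existing_text.splitlines()) + list(extra_words):
        cleaned = word.strip().lower()
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            merged.append(cleaned)
    return "\n".join(merged) + "\n" if merged else ""


if __name__ == "__main__":
    sample = "a\nan\nthe\n"  # hypothetical file content
    print(merge_stop_words(sample, ["The", "please"]), end="")
```

Write the returned text to a new file name so that the downloaded sample file stays unchanged.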
Task 3: To encrypt the password and forcefully pause the crawler for stop words
Task 4: To run the crawler for a second time and forcefully pause the crawler for stop words
Perform this task if you want to upload and activate the stop words.
Best practice
We recommend that you wait until the stop words file status becomes active, which takes several minutes. To check the status, perform the steps in Task 6.
- Open the command prompt.
- Run the following command:
Example of command to run the crawler and pause it for stop words
java -jar -DbmcCrawlerPropertyFile=.\bmcCrawler.properties bmcCrawler-20.2.0.jar -stopToCreateStopwordList
The BMC crawler displays a message that the crawler has stopped. You must then go to the IBM Watson Discovery UI to upload and activate the stop words file.
Task 5: To upload the stop words file
- Log in to IBM Watson Discovery.
- Navigate to the data collection to which you want to upload the BMC Helix ITSM: Knowledge Management articles.
- Click the Search settings tab.
- In the Stopwords section, click Upload, as shown in the following image:
For more information about uploading stop words, see Defining stop words in IBM documentation.
Task 6: To check the status of the stop words file
After you upload the stop words file, it takes several minutes for the stop words status to become active. To check the status of the file, perform the following steps:
- In IBM Watson Discovery, open the data collection to which you want to upload the articles.
- Copy the Collection Id and the Environment Id, as shown in the following image:
- Open the command prompt.
- Run the following command:
Example of command to check the stop words file status
curl -u "apikey":"{apikey}" -X GET "https://gateway.watsonplatform.net/discovery/api/v1/environments/{environment_id}/collections/{collection_id}/word_lists/stopwords?version=2019-04-30"
If the file is in pending status, as shown in the following example, wait for the status to change to active:
{"status":"pending","type":"stopwords"}
After the file is active, the status is displayed as shown in the following example:
{"status":"active","type":"stopwords"}
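Because the status can stay pending for several minutes, you might prefer to poll for it rather than rerun the command by hand. The following is a minimal Python sketch of such a loop, assuming the response body has the JSON shape shown in the preceding examples. The function names, the retry count, and the wait interval are hypothetical. The sketch takes the request as a callable, so it makes no assumption about how you call the IBM Watson Discovery API.

```python
import json
import time


def parse_stopwords_status(body):
    """Extract the "status" field from a response body like the examples above."""
    return json.loads(body).get("status")


def wait_until_active(fetch_body, max_attempts=20, interval_seconds=30,
                      sleep=time.sleep):
    """Call fetch_body() until the status is "active" or attempts run out.

    fetch_body is any zero-argument callable that returns the JSON text,
    for example a wrapper around the status request shown above.
    Returns True if the status became active, otherwise False.
    """
    for attempt in range(max_attempts):
        if parse_stopwords_status(fetch_body()) == "active":
            return True
        if attempt < max_attempts - 1:
            sleep(interval_seconds)  # wait before the next check
    return False


if __name__ == "__main__":
    # Simulated responses instead of a real network call.
    responses = iter([
        '{"status":"pending","type":"stopwords"}',
        '{"status":"active","type":"stopwords"}',
    ])
    print(wait_until_active(lambda: next(responses), sleep=lambda s: None))
```

Run the crawler again only after the function returns True. If it returns False, the status did not become active within the allowed attempts.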
Task 7: To run the crawler
Run the BMC crawler a third time to crawl the BMC Helix ITSM: Knowledge Management articles and upload them to the IBM Watson Discovery collection.
- Open the command prompt.
- Run the following command:
Example of command to run the BMC crawler
java -jar -DbmcCrawlerPropertyFile=.\bmcCrawler.properties bmcCrawler-20.2.0.jar
A log file is generated every time you run the BMC crawler and is saved in the directory from which you ran the crawler.
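The crawler commands in this topic differ only in the -stopToCreateStopwordList flag, so a small wrapper can build both variants. The following is a minimal Python sketch, assuming that java is on the PATH and that the properties file and JAR file are in the current directory. The function names are hypothetical, and the meaning of the crawler's exit code is not documented here, so the sketch only returns it.

```python
import subprocess


def build_crawler_command(properties_file, jar_file, pause_for_stop_words=False):
    """Build the java argument list in the same order as the commands above."""
    command = [
        "java",
        "-jar",
        "-DbmcCrawlerPropertyFile=" + properties_file,
        jar_file,
    ]
    if pause_for_stop_words:
        # Flag taken from the Task 4 command; omit it for the final crawl.
        command.append("-stopToCreateStopwordList")
    return command


def run_crawler(properties_file, jar_file, pause_for_stop_words=False):
    """Run the crawler and return its exit code (requires java on the PATH)."""
    command = build_crawler_command(properties_file, jar_file, pause_for_stop_words)
    return subprocess.run(command).returncode


if __name__ == "__main__":
    print(build_crawler_command(r".\bmcCrawler.properties", "bmcCrawler-20.2.0.jar"))
```

Passing the arguments as a list avoids shell quoting problems with the backslash in the properties file path.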
Tip
After crawling articles from BMC Helix ITSM: Knowledge Management, you can train and test the search result relevancy by using IBM Watson Discovery tooling methods. For more information, see Improving result relevance with the tooling in IBM documentation.