Installing and configuring the cognitive search data crawler for BMC Helix ITSM: Knowledge Management articles
After crawling, the knowledge articles in BMC Helix ITSM: Knowledge Management are uploaded to the IBM Watson Discovery collection. For more details about the data flow, see Leveraging cognitive search in your application.
Before you begin
Make sure that you have completed the following tasks:
Product | Task |
---|---|
System configuration | Make sure that you have:
|
IBM Watson Discovery |
You can get the IAM API key and the endpoint URL by logging in to IBM Cloud. |
BMC Helix ITSM: Knowledge Management |
|
Process for installing and configuring the BMC crawler utility
The following image illustrates the end-to-end process to install and configure the BMC crawler utility:
Task 1: To set up your system and download the BMC crawler utility
Task 2: To download the out-of-the-box files
BMC provides the following files out-of-the-box:
- Stop words file: A text file with a list of words that you can filter out from the data collection.
- Discovery Collection configuration file: A JSON file that defines the format in which crawled knowledge articles are uploaded to the IBM Watson Discovery collection.
After downloading, you can modify the stop words list or the enrichments of the knowledge articles.
- Download the sample stop words file.
- Download the Discovery Collection configuration file.
- (Optional) If required, modify the stop words list.
- Modify the enrichments section in the Discovery Collection configuration file according to the BMC Helix ITSM: Knowledge Management form that you want to crawl.
- Save the modified file with a different name.
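The optional stop words edit can be scripted. The following is a minimal sketch, not part of the BMC crawler utility. It assumes the stop words file is plain text with one word per line, and that lowercasing and de-duplicating entries is acceptable. The function names and file names are hypothetical.

```python
def merge_stop_words(existing, extra):
    """Combine two word lists, dropping blanks and duplicates.

    Assumption: one word per line, compared case-insensitively.
    The first occurrence of each word keeps its position.
    """
    seen = set()
    merged = []
    for word in list(existing) + list(extra):
        normalized = word.strip().lower()
        if normalized and normalized not in seen:
            seen.add(normalized)
            merged.append(normalized)
    return merged


def write_merged_file(source_path, target_path, extra):
    """Read the downloaded file and save the result under a different name."""
    with open(source_path, encoding="utf-8") as source:
        merged = merge_stop_words(source.read().splitlines(), extra)
    with open(target_path, "w", encoding="utf-8") as target:
        target.write("\n".join(merged) + "\n")
    return merged
```

For example, `write_merged_file("stopwords.txt", "stopwords_custom.txt", ["ticket", "incident"])` would leave the downloaded file unchanged and write the combined list to a new file (both file names are placeholders).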
Task 3: To encrypt the password and forcefully pause the crawler for stop words
Task 4: To run the crawler for a second time and forcefully pause the crawler for stop words
Perform this task if you want to upload and activate the stop words.
- Open a command prompt.
Run the following command:
Example command to pause the crawler for stop words:
java -jar -DbmcCrawlerPropertyFile=.\bmcCrawler.properties bmcCrawler-20.2.0.jar -stopToCreateStopwordList
The BMC crawler displays a message that the crawler has stopped. You must then go to the IBM Watson Discovery UI to upload and activate the stop words file.
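If you start the crawler from a script instead of typing the command, the argument list can be assembled as in the following sketch. The helper names are hypothetical. The only values taken from this topic are the `-DbmcCrawlerPropertyFile` property, the jar file name, and the `-stopToCreateStopwordList` flag. It also assumes `java` is on the PATH.

```python
import subprocess


def build_crawler_command(property_file, jar_file, flags=()):
    """Return the argument list in the same order as the command in this topic."""
    return [
        "java",
        "-jar",
        f"-DbmcCrawlerPropertyFile={property_file}",
        jar_file,
        *flags,
    ]


def run_crawler(property_file, jar_file, flags=(), working_dir="."):
    """Run the crawler from working_dir and return its exit code.

    check=False leaves the decision about a non-zero exit code to the caller.
    """
    command = build_crawler_command(property_file, jar_file, flags)
    return subprocess.run(command, cwd=working_dir, check=False).returncode
```

Calling `build_crawler_command` with no flags gives the arguments for a plain crawl run. Passing `["-stopToCreateStopwordList"]` gives the arguments for the paused run described in this task.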
Task 5: To upload the stop words file
- Log in to IBM Watson Discovery.
- Open an existing project.
- Open the Improve and customize pane.
- Go to Improve Relevance and select Stopwords.
- Click Upload stopwords to upload the stop words file.
File format: <filename>.json
The IBM Watson Discovery search ignores the words listed in the stop words file.
For more information about uploading stop words, see Identifying words to ignore in IBM documentation.
Task 6: To run the crawler
Run the BMC crawler a third time to crawl the BMC Helix ITSM: Knowledge Management articles and upload them to the IBM Watson Discovery collection.
- Open a command prompt.
Run the following command:
Example command to run the BMC crawler:
java -jar -DbmcCrawlerPropertyFile=.\bmcCrawler.properties bmcCrawler-20.2.0.jar
A log file is generated every time you run the BMC crawler and is saved in the same directory from where you ran the crawler.
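Because each run writes a new log file to the directory the crawler was started from, a small helper can find the most recent one. This is a sketch only. This topic does not state the log file name, so the `*.log` pattern is an assumption that you should adjust to match the files the crawler actually produces. The function name is hypothetical.

```python
from pathlib import Path


def newest_log(directory=".", pattern="*.log"):
    """Return the most recently modified file matching pattern, or None.

    Assumption: the crawler log files match "*.log". Change the pattern
    if the actual file names differ.
    """
    candidates = [path for path in Path(directory).glob(pattern) if path.is_file()]
    if not candidates:
        return None
    return max(candidates, key=lambda path: path.stat().st_mtime)
```

After a run, `newest_log(".")` returns the path of the latest matching file in the current directory, which you can then open to review the crawl results.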
To run the crawler to delete retired knowledge articles
Run the BMC crawler on BMC Helix ITSM: Knowledge Management articles to delete retired knowledge articles from the IBM Watson Discovery collection. Deleting these articles ensures that the chatbot does not display retired knowledge articles when users send it queries.
- Open the command prompt.
Run the following command:
java -jar -DbmcCrawlerPropertyFile=./<filename>.properties com.bmc.dsm.bmcCrawler-99.0.00-SNAPSHOT.jar
Where to go from here
To define search data sets, see Defining search data sets.