Deletion Tool


As an administrator, you must keep projects up to date by retaining only the required documents and deleting those that are no longer needed. When many documents must be deleted from the collections in an IBM Watson Discovery V2 instance, deleting them manually is impractical. The Deletion tool automates the deletion of documents from the collections in the IBM Watson Discovery V2 instance.

For example, Jordan, an administrator, observes that a web crawler has created several documents in multiple collections in the IBM Watson Discovery V2 instance. None of these documents are useful, and all of them need to be deleted. Because manually deleting so many documents would take too long, Jordan uses the Deletion tool to remove the documents from the project.

The Deletion tool performs the following two functions:

  • Get the details of the documents from the IBM Watson Discovery V2 instance. The document details include the documentID, collectionID, and source URL.
  • Delete the documents in the collection. 

You must know the documentID of each document that needs to be deleted. You can either search for the documentIDs manually or run the Deletion tool to automatically get the documentIDs of all the documents in the collection. You can run the tool multiple times to get the list of documentIDs.

The Deletion tool deletes the documents in chunks. You can configure the number of documents to be deleted. For example, you can configure the Deletion tool to delete 100 documents at a time.
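The chunking behavior described above can be pictured with a minimal Java sketch. It is illustrative only: the class name ChunkSketch, the chunk method, and the doc-N document IDs are hypothetical, and the Deletion tool's internal implementation is not documented here.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: not the Deletion tool's actual code.
final class ChunkSketch {

    // Splits the IDs into consecutive sublists of at most chunkSize elements.
    static List<List<String>> chunk(List<String> ids, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<String>> chunks = new ArrayList<>();
        for (int start = 0; start < ids.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, ids.size());
            chunks.add(new ArrayList<>(ids.subList(start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> ids = new ArrayList<>();
        for (int i = 1; i <= 250; i++) {
            ids.add("doc-" + i); // hypothetical document IDs
        }
        // With a chunk size of 100, 250 IDs yield chunks of 100, 100, and 50.
        for (List<String> c : chunk(ids, 100)) {
            System.out.println("Would delete a chunk of " + c.size() + " documents");
        }
    }
}
```

With 250 IDs and a chunk size of 100, the sketch reports three chunks of 100, 100, and 50 documents. The last chunk is smaller because it holds only the remaining IDs.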

Before you begin

Make sure that you have completed the following tasks:

Product

Task

System configuration

You must have the following system and software:

  • A Linux, Unix, or Windows machine on which you can run the BMC crawler utility.
  • Java version 11 or later.

IBM Watson Discovery

You must have the following details of the IBM Watson Discovery instance to which you want to upload the BMC Helix ITSM: Knowledge Management articles:


    • Identity and Access Management (IAM) API key
    • Endpoint URL
    • IBM Watson Discovery V2

You can get the IAM API key and the endpoint URL by logging in to IBM Cloud.

To delete the documents in the IBM Watson Discovery V2 instance collection

  1. To get the list of document IDs by using the Deletion tool, perform the following steps. Refer to the Properties file table.
    1. In the properties file (refer to the Properties file table and sample files), specify the values for the following parameters:
      • documentIdsFile = <filename>.txt
      • filterPattern = SharePoint URL 
      • saveDocumentIdsToFileKey = Y 
      • responseFilenameKey = <filename>.json 
    2. Run the POST request on the IBM Watson Discovery V2 instance.
      The POST request returns documents that match the filter criteria.
      The Deletion tool writes the document details to the file specified in the responseFilenameKey parameter (response.json by default).
      The Deletion tool writes the document IDs to the <filename>.txt file specified in the documentIdsFile parameter.
  2. Modify the properties file. Refer to the Properties file table and sample files.
  3. Download the Deletion Tool jar file.
  4. Ensure that the Java archive and its files are in the same folder.
  5. Open a command prompt and run the JAR file.

    java -DdeletionToolPropertyFile=<sample-deletionTool-file>.properties -jar com.bmc.dsm.DeletionTool-99.0.00-SNAPSHOT.jar

    When you run this command after getting the list of IDs, the documents are deleted.
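If you want to start the tool from a script or a scheduled job instead of typing the command, the command line above can be assembled in Java. The following is a minimal sketch: the class name DeletionToolCommand and its build method are hypothetical helpers, and only the -DdeletionToolPropertyFile system property and the -jar invocation come from the command shown in step 5.

```java
import java.util.List;

// Hypothetical helper that assembles the documented command line.
final class DeletionToolCommand {

    // Returns the arguments: java -DdeletionToolPropertyFile=<file> -jar <jar>
    static List<String> build(String propertiesFile, String jarFile) {
        return List.of(
                "java",
                "-DdeletionToolPropertyFile=" + propertiesFile,
                "-jar",
                jarFile);
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: DeletionToolCommand <properties-file> <jar-file>");
            System.exit(2);
        }
        // Starts the tool as a child process and forwards its console output.
        Process process = new ProcessBuilder(build(args[0], args[1]))
                .inheritIO()
                .start();
        System.exit(process.waitFor());
    }
}
```

Passing the properties file and the JAR file as separate arguments avoids shell quoting problems when a path contains spaces.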

Properties file

The following table describes the attributes of the properties file that is required to run the Deletion tool. You must set these attributes in the properties file before running the Deletion tool.

Mandatory attributes

Description

Example

documentIdsFile

Specify the file that contains the valid IDs of all the documents that need to be deleted. All the documents listed in the text file are deleted. You can generate the file automatically by using the Deletion tool. See step 1 of To delete the documents in the IBM Watson Discovery V2 instance collection.

<file name>.txt 

version

Specify the latest version date of the IBM Watson Discovery API.

2023-03-31

discoveryApiKey

Specify the IBM Watson Discovery V2 API key


discoveryEndPoint

Specify the IBM Watson Discovery V2 endpoint URL


 https://api.us-south.discovery.watson.cloud.ibm.com/instances/4eeed535-b28f-4ee9-9049-21661e7ceabc

discoveryCollectionName

Specify the IBM Watson Discovery V2 collection name


 SP project

discoveryProjectName

Specify the IBM Watson Discovery V2 project name

Default is BMC Default Project 

BMC Default Project

customerId

Specify the name of your company

BMC

filterPattern

Specify the SharePoint URL. The Deletion tool extracts the document IDs of the documents listed on this page.  

Provide the URL in one of the following formats:

  • ! (*//*.aspx) - Filter to exclude certain files.
  • (*//*.aspx) - Filter to include certain files.
  • //<url>


//helixvirtualagent.sharepoint.com/sites/<SOME_SITE_NAME>/Lists/

Optional attributes

Description

Example

approveToDelete

Specify whether the tool must delete the documents.

Valid values are Y or N. 

Y: Yes. Deletes all the documents mentioned in the documentIdsFile. (default)

N: No. Does not delete the documents.

saveDocumentIdsToFileKey

Specify whether the tool should write the list of document IDs to the file specified in the documentIdsFile parameter before deleting the documents.

Valid values are Y or N. 

Y: Yes. Writes the documentIDs for all documents in the collection. (default)

N: No. Does not write the document IDs to the file.

responseFilenameKey

Specify the response file name. The Deletion tool writes the document details returned by the IBM Watson Discovery V2 instance to this file.

<filename>.json

response.json (default)

chunkSizeKey

Specify the number of documents to delete in each chunk.

1000 (default)

threadPoolSizeKey

Specify the number of threads that must run in parallel to delete the documents.

10 (default)
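Before you run the Deletion tool, you might want to confirm that a properties file sets every mandatory attribute and to see which defaults apply to the optional ones. The following Java sketch does this with the standard java.util.Properties class. The attribute names and default values are taken from the table above. The class name PropertiesCheck, its methods, and the sample values are hypothetical, and the sketch assumes that the file uses the standard key = value properties syntax shown in step 1.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical pre-flight check; not part of the Deletion tool.
final class PropertiesCheck {

    // Mandatory attribute names, copied from the table above.
    static final List<String> MANDATORY = List.of(
            "documentIdsFile", "version", "discoveryApiKey", "discoveryEndPoint",
            "discoveryCollectionName", "discoveryProjectName", "customerId",
            "filterPattern");

    // Parses properties text (assumed to use key = value syntax).
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Returns the mandatory attributes that are absent or blank.
    static List<String> missing(Properties props) {
        List<String> result = new ArrayList<>();
        for (String key : MANDATORY) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                result.add(key);
            }
        }
        return result;
    }

    // Optional attributes fall back to the defaults listed in the table.
    static int chunkSize(Properties props) {
        return Integer.parseInt(props.getProperty("chunkSizeKey", "1000").trim());
    }

    static int threadPoolSize(Properties props) {
        return Integer.parseInt(props.getProperty("threadPoolSizeKey", "10").trim());
    }

    static boolean approveToDelete(Properties props) {
        return "Y".equalsIgnoreCase(props.getProperty("approveToDelete", "Y").trim());
    }

    public static void main(String[] args) {
        // Deliberately incomplete sample so that the check reports something.
        Properties props = load(String.join("\n",
                "documentIdsFile = ids.txt",
                "version = 2023-03-31",
                "filterPattern = //helixvirtualagent.sharepoint.com/sites/<SOME_SITE_NAME>/Lists/"));
        System.out.println("Missing mandatory attributes: " + missing(props));
        System.out.println("Chunk size: " + chunkSize(props));
        System.out.println("Thread pool size: " + threadPoolSize(props));
        System.out.println("Deletion approved: " + approveToDelete(props));
    }
}
```

Because approveToDelete defaults to Y, a check like this makes it visible that a run deletes documents unless the attribute is explicitly set to N.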

 

BMC Helix Innovation Suite 25.4