Ingesting data into BMC HelixGPT


After defining the data sources for the chatbot, knowledge article search, and summarization use cases, you must ingest data into the database through data connection jobs. The data connection jobs collect data from the configured data sources and ingest the data into the BMC HelixGPT database.

After the data is ingested, users receive responses to their queries from the information that is ingested into the BMC HelixGPT database. Users have a seamless experience of getting the appropriate answer across multiple data sources.

 

To ingest data into BMC HelixGPT, you can set the scheduler rules that are available out of the box.

You can enable BMC HelixGPT to read text from attachments linked to BMC Helix Innovation Studio record definitions. 

BMC HelixGPT can read text data from the following attachment types:

  • txt
  • docx
  • pdf

However, the attachments linked to a record definition are unavailable as an out-of-the-box data source. You must create a data connection if you plan to use text from attachments.
For more information about using text from attachments linked to record definitions, see Read text from attachments.
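
As a small illustration of the attachment-type restriction above, the following sketch checks a file name against the three supported types before it is attached. The helper name `is_supported_attachment` and the case-insensitive extension match are assumptions made for this example; they are not part of BMC HelixGPT.

```python
from pathlib import PurePath

# Attachment types that BMC HelixGPT can read text from, per the list above.
SUPPORTED_ATTACHMENT_TYPES = {"txt", "docx", "pdf"}


def is_supported_attachment(file_name: str) -> bool:
    """Return True if the file extension is one of the supported types.

    Hypothetical helper: matching on the extension, ignoring case, is an
    assumption made for this sketch.
    """
    extension = PurePath(file_name).suffix.lstrip(".").lower()
    return extension in SUPPORTED_ATTACHMENT_TYPES
```

For example, `is_supported_attachment("guide.PDF")` is true, while a spreadsheet such as `sheet.xlsx` is rejected.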

 

Before you begin

You must have the HelixGPT Administrator role to ingest data into BMC HelixGPT.

 

Process for setting up BMC HelixGPT

The following image shows the process of setting up BMC HelixGPT and the current step that you are on:

23301_SettingUpAndGoingLive_IngestingData.jpg

 

Ingesting data in BMC HelixGPT

Perform the following tasks to ingest data into BMC HelixGPT:

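| Step | Action | Reference |
|------|--------|-----------|
| 1 | Set scheduler rules available out-of-the-box. (Optional) Create a connection to read text from attachments. | Task 1: To ingest data by specifying a schedule; (Optional) To read text from attachments linked to BMC Helix Innovation Studio record definitions |
| 2 | Create a data connection job. | Task 2: To ingest data by creating a data connection job |
| 3 | Verify the data connection. | Verifying data ingestion |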

 

Task 1:  To ingest data by specifying a schedule

You can schedule data ingestion and regular data updates for the following data sources by configuring a schedule for the respective out-of-the-box rules:

| Data source | Rule |
|-------------|------|
| BMC Helix Business Workflows | Index Update - Business Workflows |
| (SaaS-only) BMC Helix Knowledge Management by ComAround | Index Update - Helix Knowledge Management |
| BMC Helix ITSM: Knowledge Management (Important: You cannot ingest data from custom BMC Helix ITSM: Knowledge Management templates.) | Index Update - ITSM Knowledge Management |
| BMC Helix ITSM | Index Update - ITSM Tickets |

Important

  • The rules are not enabled by default. You must configure and schedule the rules for ingesting data into the BMC HelixGPT database.
  • No out-of-the-box rules are available for Confluence, Microsoft SharePoint Online, Salesforce Knowledge, BMC Helix Customer Service Management, and Web.
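
The relationship between data sources and out-of-the-box rules can be summarized as a simple lookup. This is only an illustrative sketch: the names `OOTB_RULES`, `rule_for`, and `sources_without_rule` are hypothetical, and the data is copied from the table and note above.

```python
from typing import Optional

# Out-of-the-box rule for each data source, as listed above.
# None marks data sources for which no out-of-the-box rule is available.
OOTB_RULES: dict[str, Optional[str]] = {
    "BMC Helix Business Workflows": "Index Update - Business Workflows",
    "BMC Helix Knowledge Management by ComAround": "Index Update - Helix Knowledge Management",
    "BMC Helix ITSM: Knowledge Management": "Index Update - ITSM Knowledge Management",
    "BMC Helix ITSM": "Index Update - ITSM Tickets",
    "Confluence": None,
    "Microsoft SharePoint Online": None,
    "Salesforce Knowledge": None,
    "BMC Helix Customer Service Management": None,
    "Web": None,
}


def rule_for(data_source: str) -> Optional[str]:
    """Return the out-of-the-box rule name, or None if there is none."""
    return OOTB_RULES[data_source]


def sources_without_rule() -> list[str]:
    """List the data sources that have no out-of-the-box rule."""
    return [source for source, rule in OOTB_RULES.items() if rule is None]
```

Data sources returned by `sources_without_rule()` cannot be scheduled through a rule and are ingested through a data connection job instead.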

To schedule data ingestion or updates

  1. Log in to BMC Helix Innovation Studio. 
  2. On the Workspace tab, click HelixGPT Manager.
  3. On the Rules tab, open one of the following rules:
    • Index Update - Business Workflows
    • Index Update - Helix Knowledge Management
    • Index Update - ITSM Knowledge Management
    • Index Update - ITSM Tickets
  4. On the rule details page, select the Trigger element.
    The following image shows the rule details page:
    Edit rule page
  5. To modify the default schedule time, click Edit for the Schedule Definition parameter.
  6. In the Edit Schedule Definition window, select the month days, week days, hours, and time to schedule the data update, and click OK.
    Scheduler for data update
  7. Click Save.
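
To make the schedule fields more concrete, the following sketch models a schedule as month days, week days, hours, and a minute, and checks whether a given moment matches it. Everything here is an assumption for illustration: the `ScheduleDefinition` class is hypothetical, and the rule that an empty selection means "any" and that all selections must match together is a simplification, not the documented behavior of the Edit Schedule Definition window.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class ScheduleDefinition:
    """Hypothetical model of the schedule fields; not a BMC API."""

    month_days: set[int] = field(default_factory=set)  # 1-31; empty means any day
    week_days: set[int] = field(default_factory=set)   # 0 = Monday ... 6 = Sunday; empty means any
    hours: set[int] = field(default_factory=set)       # 0-23; empty means any hour
    minute: int = 0                                     # minute of the hour

    def is_due(self, moment: datetime) -> bool:
        """Assumed semantics: every non-empty selection must match."""
        if self.month_days and moment.day not in self.month_days:
            return False
        if self.week_days and moment.weekday() not in self.week_days:
            return False
        if self.hours and moment.hour not in self.hours:
            return False
        return moment.minute == self.minute
```

For example, `ScheduleDefinition(week_days={0}, hours={2}, minute=30)` describes an update every Monday at 02:30 under these assumptions.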

 

(Optional)  To read text from attachments linked to BMC Helix Innovation Studio record definitions

You can enable BMC HelixGPT to read text from attachments linked to a record definition. To do this, you must connect with an existing record definition.

  1. On the Innovation Studio > Workspace tab, select HelixGPT Manager.
  2. Select a record definition and click Edit data.
    23_3_04_Attachment_Helper.png
    The data editor is displayed.
  3. In Data editor, click New to add a new record.
    The New record dialog box is displayed.
  4. On the File tab, select the attachment that you want to add.
    You can add a text file, a Microsoft Word file, or a PDF file as an attachment.
  5. In the Name field, enter the name of the file you want to attach.
  6. Click Save.

The new record definition you created is available in the HelixGPT Manager.
 

To create a connection record 

  1. In the HelixGPT Manager, select Connection_Record_Definition and click Edit data.
  2. Click New to create a new connection record.
    A new record dialog box is displayed:
    23_3_04_Connection_record.png
  3. In the DataSource ID list, select RECORD_DEFINITION.
  4. In the Field ID field, enter the field ID of the File field on the record definition.
  5. In the Record Definition field, enter the name that you gave to the record definition.
  6. In the Name field, enter the name of the record definition.
  7. Click Save.

A connection record is created to read data from attachments.
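
The fields of the connection record can be pictured as a small data structure. The sketch below is illustrative only: `ConnectionRecord` and `check_attachment_connection` are hypothetical names, and treating every field as required is an assumption drawn from the steps above rather than from a documented validation rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectionRecord:
    """Hypothetical mirror of the fields in the new record dialog box."""

    data_source_id: str     # DataSource ID list value
    field_id: str           # Field ID of the File field on the record definition
    record_definition: str  # Record Definition name
    name: str               # Name


def check_attachment_connection(record: ConnectionRecord) -> list[str]:
    """Return a list of problems; an empty list means the record looks complete."""
    problems = []
    if record.data_source_id != "RECORD_DEFINITION":
        problems.append("DataSource ID must be RECORD_DEFINITION")
    labelled_values = (
        ("Field ID", record.field_id),
        ("Record Definition", record.record_definition),
        ("Name", record.name),
    )
    for label, value in labelled_values:
        if not value.strip():
            problems.append(f"{label} is empty")  # assumed to be required
    return problems
```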

 

Task 2: To ingest data by creating a data connection job

You can ingest data into the BMC HelixGPT database by creating a data connection job. All published documents are ingested from the data sources into BMC HelixGPT. From the SharePoint and Confluence data sources, attached documents, such as PDFs, Microsoft Word documents, and plain text files, are also ingested. However, SharePoint web pages are not ingested. You can also ingest a single document by specifying the document or article ID.

  1. Log in to BMC Helix Innovation Studio. 
  2. On the Workspace tab, click HelixGPT Manager.
  3. On the Records tab, select the DataConnectionJob record definition and click Edit data, as shown in the following image:

    image-2023-12-7_14-27-7.png
     
  4. On the Data Editor (DataConnectionJob) page, click New.

    image-2023-12-7_14-47-59.png
     
  5. In the New Record pane, specify the following information:
    1. In the Data source field, enter one of the following data sources:

      | Data source | Value to be entered |
      |-------------|---------------------|
      | BMC Helix Business Workflows | BWF |
      | BMC Helix Knowledge Management by ComAround | HKM |
      | BMC Helix ITSM: Knowledge Management | RKM |
      | BMC Helix ITSM | ITSM |
      | Confluence | CNF |
      | Microsoft SharePoint Online | SPT |
      | Web | WEB |
      | BMC Helix Customer Service Management | CSM |
      | Salesforce Knowledge | SALESFORCE_KNOWLEDGE |
      | Attachments of a record definition | RECORD_DEFINITION |

    2. Specify a description for the data connection job.
    3. (Optional) Specify the Assignee.
    4. Specify the Connection ID.
      The Connection ID is the ID that you noted when you added the data source in HelixGPT Manager, as described in Adding data sources in BMC HelixGPT.
    5. (Optional) To ingest a single document, specify the DocDisplayId or DocId, or click Attach file, and select a file.
      The DocDisplayId or DocId is the unique ID of the single document that you want to upload, such as the article display ID in BMC Helix ITSM: Knowledge Management, content ID in BMC Helix Knowledge Management by ComAround, and article UUID or content ID in BMC Helix Business Workflows.
      The following table shows the usage of DocDisplayId and DocId:

      | Data source | Inputs | Example | Scope | Notes |
      |-------------|--------|---------|-------|-------|
      | ITSM | NA | NA | Closed Incidents associated with BMC Helix ITSM: Knowledge Management knowledge articles | NA |
      | RKM | NA | NA | All RKM articles | NA |
      | RKM | DocId = <article instance ID> | Datasource = RKM; DocId = KMHAA5V0GPLUUANDADAXGA6CSQG49C | A single RKM article | Use the instance ID of RKM:KnowledgeArticleManager. |
      | RKM | DocDisplayId = <article display ID> | Datasource = RKM; DocDisplayId = KBA90000067 | A single RKM article | The display ID is visible in the BMC Helix ITSM: Knowledge Management user interface. |
      | HKM | NA | NA | All HKM articles | NA |
      | HKM | DocId = <article "content ID"> | Datasource = HKM; DocId = 1721446-2537-1033-1772837 | A single HKM article | NA |
      | HKM | ConnectionId = <Connection_HKM record ID> | Datasource = HKM; ConnectionId = AGGADGG8ECDC2ASI46SDSI46SD3O1X | All HKM articles while given user is being impersonated. | Using a connection allows a user to impersonate another user when connecting to BMC Helix Knowledge Management by ComAround. It is sometimes needed because the default IS user might not have the correct group mappings in BMC Helix Knowledge Management by ComAround. To specify such a user, you must create or update a record in the Connection_HKM record definition. |
      | BWF | NA | NA | All BWF articles | NA |
      | BWF | DocId = <article UUID> | DocId = AGGADG1AAP0ICAOQVYJ6OPZVOTL7BU | A single BWF article | Use field 379 of BWF:KnowledgeArticleTemplate. |
      | BWF | DocDisplayId = <article "Content ID"> | DocDisplayId = KA-000000000007 | A single BWF article | The ID is visible in the BMC Helix Business Workflows user interface. |

       

    6. To run the job immediately, enable the Execute now toggle key.
    7. If you are updating data, in the ModifiedSince field, specify the date and time of the last update.
      Use this option for incremental updates, in which only documents modified since the specified date and time are indexed.
    8. To delete the data from BMC HelixGPT that has been deleted from the source, select Sync deletions.
      The following screen shows an example of creating a new data connection job:
      Data connection job
  6. Click Save.

Repeat the steps to add multiple data connection jobs.
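
The fields of a data connection job described in the steps above can be sketched as a small data structure. This is a hypothetical model for illustration, not a BMC API: the class `DataConnectionJob`, its attribute names, and the `scope` summary are assumptions; only the data source values and the field meanings come from the tables above.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Values accepted in the Data source field, per the table above.
DATA_SOURCE_VALUES = {
    "BWF", "HKM", "RKM", "ITSM", "CNF", "SPT", "WEB", "CSM",
    "SALESFORCE_KNOWLEDGE", "RECORD_DEFINITION",
}


@dataclass
class DataConnectionJob:
    """Hypothetical mirror of the New Record pane fields."""

    data_source: str
    description: str
    connection_id: str
    assignee: Optional[str] = None             # optional
    doc_id: Optional[str] = None               # optional, single document
    doc_display_id: Optional[str] = None       # optional, single document
    execute_now: bool = False                  # Execute now toggle
    modified_since: Optional[datetime] = None  # incremental updates
    sync_deletions: bool = False               # Sync deletions option

    def validate(self) -> list[str]:
        """Return a list of problems; empty if the data source value is known."""
        if self.data_source not in DATA_SOURCE_VALUES:
            return [f"Unknown data source value: {self.data_source}"]
        return []

    def scope(self) -> str:
        """Summarize what the job would ingest, following the usage table."""
        if self.doc_id or self.doc_display_id:
            return "single document"
        if self.modified_since is not None:
            return "documents modified since the given date"
        return "all published documents"
```

For example, a job with `data_source="RKM"` and `doc_display_id="KBA90000067"` corresponds to the single-article row for RKM in the usage table.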

 

Verifying data ingestion

Data ingestion takes place one item at a time, and the time required for the ingestion to be completed depends on the number of documents to be ingested and the amount of data. If a user asks queries during data ingestion, the responses might be incorrect or incomplete. Therefore, it is important to verify that data ingestion is completed successfully.

  1. Log in to BMC Helix Innovation Studio. 
  2. On the Workspace tab, click HelixGPT Manager.
  3. On the Records tab, select the DataConnectionJobStep record definition and click Edit data.
  4. Verify that the status of the job that you created is DONE.
    The following image shows sample jobs with the DONE status:
    Data ingestion verification
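
The verification above amounts to checking that every job step has the DONE status. The following sketch expresses that check over a list of step records. The record layout (dictionaries with `job` and `status` keys) and the function names are hypothetical; only the DONE status value comes from this topic.

```python
def unfinished_steps(steps: list[dict[str, str]]) -> list[dict[str, str]]:
    """Return the job steps whose status is anything other than DONE."""
    return [step for step in steps if step.get("status") != "DONE"]


def ingestion_complete(steps: list[dict[str, str]]) -> bool:
    """True only if there is at least one step and every step is DONE."""
    return bool(steps) and not unfinished_steps(steps)
```

If `ingestion_complete` returns false, the entries from `unfinished_steps` show which jobs to check again before users start asking queries.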

 

Result

The following screenshot shows BMC HelixGPT fetching data from a PDF file attached to a record definition:

23_3_04_Result.png

 

 

 
