Ingesting data into BMC HelixGPT
After defining the data sources for the chatbot, knowledge article search, and summarization use cases, you must ingest data into the database through data connection jobs. The data connection jobs collect data from the configured data sources and ingest the data into the BMC HelixGPT database.
After the data is ingested, responses to user queries are drawn from the information in the BMC HelixGPT database, so users get the appropriate answer across multiple data sources.
To ingest data into BMC HelixGPT, you can set the scheduler rules that are available out-of-the-box.
You can enable BMC HelixGPT to read text from attachments linked to BMC Helix Innovation Studio record definitions.
BMC HelixGPT can read text data from the following attachment types:
- txt
- docx
However, the attachments linked to a record definition are unavailable as an out-of-the-box data source. You must create a data connection if you plan to use text from attachments.
For more information about using text from attachments linked to record definitions, see Read text from attachments.
Before you begin
You must have the HelixGPT Administrator role to ingest data into BMC HelixGPT.
Process for setting up BMC HelixGPT
The following image shows the process of setting up BMC HelixGPT and the current step that you are on:
(Optional) Configure tags in data sources for indexing
Adding tags as metadata in data sources helps you differentiate the content indexed from the different connections of the same data source. You can add tags relevant to your data sources, such as product name and version.
Before you add tags to your data sources, make sure to update your router prompt with the same tags. Note that the tags are case sensitive.
The following code is a sample router prompt with the tags defined in it:
To add tags to data sources:
- On the Innovation Studio > Workspace tab, select HelixGPT Manager.
- Select the Connection record definition and click Edit data.
The Data editor page opens.
The tags you add must be the same as those added in the router prompt.
- In Data editor, click the data source for which you want to add the tags.
The Edit record window opens.
- In the Edit record window, add the required tags.
- Click Save.
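Because tags are case sensitive, a quick comparison can catch mismatches before indexing. The following is a minimal sketch and not part of BMC HelixGPT; the function name and the sample tag values are hypothetical.

```python
def find_unmatched_tags(connection_tags, router_prompt_tags):
    """Return connection tags that have no exact (case-sensitive) match
    among the tags defined in the router prompt."""
    known = set(router_prompt_tags)
    return sorted(tag for tag in set(connection_tags) if tag not in known)


# Hypothetical tag values, for illustration only.
router_prompt_tags = ["ProductA", "v23.3"]
connection_tags = ["producta", "v23.3"]

# "producta" differs from "ProductA" only in case, so it is reported.
print(find_unmatched_tags(connection_tags, router_prompt_tags))  # ['producta']
```

An empty result means every tag on the connection has an exact counterpart in the router prompt.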
Ingesting data in BMC HelixGPT
Perform the following tasks to ingest data into BMC HelixGPT:
Step | Action | Reference |
---|---|---|
1 | Set scheduler rules available out-of-the-box. | |
(Optional) | Create a connection to read text from attachments. | |
2 | Create a data connection job. | |
3 | Verify the data connection. | |
Task 1: To ingest data by specifying a schedule
Scheduling index updates keeps the indexes in sync with the latest changes in the data sources. An up-to-date index helps models find the required data more efficiently.
- In BMC HelixGPT Manager, click Settings.
- Select HelixGPT > Connections > Information sources.
- Click the connection name for which you want to add the schedule.
The Edit connection window opens.
- Click Schedule index updates.
The Schedule section opens.
- In the Schedule section, specify the following fields:
Field | Description |
---|---|
Month dates | Select the dates of the month when you want to run the index update job. |
Days of the week | Select the days of the week when you want to run the index update job. |
Time | Set the time to run the job on the scheduled dates and days. |
- Click Save.
- To add the schedule for index updates for new connections, first add a connection and then edit it to configure the schedule.
- When you select the dates of the month and the days of the week, the indexing job runs on the selected dates and also on the selected days.
- Index update schedules already exist for the out-of-the-box information sources; to change the existing configuration, follow the steps in this section.
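The note about combining month dates and days of the week describes an "either one matches" rule. The following minimal sketch illustrates that rule only; it is not the scheduler's implementation, and the function name and sample selections are hypothetical.

```python
from datetime import date

WEEKDAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]


def runs_on(day, month_dates, weekdays):
    """Return True if the index update job would run on `day`.

    month_dates: set of day-of-month numbers (1-31).
    weekdays: set of weekday names, for example {"Monday"}.
    The job runs when EITHER selection matches.
    """
    return day.day in month_dates or WEEKDAY_NAMES[day.weekday()] in weekdays


# Hypothetical schedule: the 1st and 15th of the month, plus every Monday.
month_dates = {1, 15}
weekdays = {"Monday"}

print(runs_on(date(2024, 8, 15), month_dates, weekdays))  # True: a selected date (a Thursday)
print(runs_on(date(2024, 7, 8), month_dates, weekdays))   # True: a Monday, not a selected date
print(runs_on(date(2024, 7, 16), month_dates, weekdays))  # False: a Tuesday, not a selected date
```

Selecting both fields therefore widens the schedule rather than narrowing it.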
(Optional) To read text from attachments linked to BMC Helix Innovation Studio record definitions
You can enable BMC HelixGPT to read text from attachments linked to a record definition. To do this, you must connect with an existing record definition.
- On the Innovation Studio > Workspace tab, select HelixGPT Manager.
- Select a record definition and click Edit data.
The data editor is displayed.
- In Data editor, click New to add a new record.
The New record dialog box is displayed.
- On the File tab, select an attachment that you want to add.
- You can add a text file, a Microsoft Word file, or a PDF file as an attachment.
- In the Name field, enter the name of the file you want to attach.
- Click Save.
The new record definition you created is available in the HelixGPT Manager.
To create a connection record
- In the HelixGPT Manager, select Connection_Record_Definition and click Edit data.
- Click New to create a new connection record.
The New record dialog box is displayed.
- In the DataSource ID list, select RECORD_DEFINITION.
- In the Field ID field, enter the Field ID of the File field on the record definition.
- In the Record Definition field, enter the name of the record definition you have given.
- In the Name field, enter the name of the record definition.
- Click Save.
A connection record is created to read data from attachments.
Task 2: To ingest data by creating a data connection job
You can ingest data into the BMC HelixGPT database by creating a data connection job. All published documents are ingested from the data sources into BMC HelixGPT. From the SharePoint and Confluence data sources, attached documents, such as PDFs, Microsoft Word documents, and plain text files, are also ingested. However, SharePoint web pages are not ingested. You can also ingest a single document by specifying the document or article ID.
- Log in to BMC Helix Innovation Studio.
- On the Workspace tab, click HelixGPT Manager.
- On the Records tab, select the DataConnectionJob record definition and click Edit data, as shown in the following image:
- On the Data Editor (DataConnectionJob) page, click New.
- In the New Record pane, specify the following information:
- In the Data source field, enter one of the following values:
Data source | Value to be entered |
---|---|
BMC Helix Business Workflows | BWF |
BMC Helix Knowledge Management by ComAround | HKM |
BMC Helix ITSM: Knowledge Management | RKM |
BMC Helix ITSM | ITSM |
Confluence | CNF |
Microsoft SharePoint Online | SPT |
Web | WEB |
BMC Helix Customer Service Management | CSM |
Salesforce Knowledge | SALESFORCE_KNOWLEDGE |
Attachments of a record definition | RECORD_DEFINITION |
- Specify a description for the data connection job.
- (Optional) Specify the Assignee.
- Specify the Connection ID.
The Connection ID is the ID that you noted when you added the data source in HelixGPT Manager, as described in Adding data sources in BMC HelixGPT.
- (Optional) To ingest a single document, specify the DocDisplayId or DocId, or click Attach file and select a file.
The DocDisplayId or DocId is the unique ID of the single document that you want to upload, such as the article display ID in BMC Helix ITSM: Knowledge Management, content ID in BMC Helix Knowledge Management by ComAround, and article UUID or content ID in BMC Helix Business Workflows.
The following table shows the usage of DocDisplayId and DocId:
Data source | Inputs | Example | Scope | Notes |
---|---|---|---|---|
ITSM | NA | NA | Closed Incidents associated with BMC Helix ITSM: Knowledge Management knowledge articles | NA |
RKM | NA | NA | All RKM articles | NA |
RKM | DocId = <article instance ID> | Datasource = RKM; DocId = KMHAA5V0GPLUUANDADAXGA6CSQG49C | A single RKM article | Use the instance ID of RKM:KnowledgeArticleManager. |
RKM | DocDisplayId = <article display ID> | Datasource = RKM; DocDisplayId = KBA90000067 | A single RKM article | The display ID is visible in the BMC Helix ITSM: Knowledge Management user interface. |
HKM | NA | NA | All HKM articles | NA |
HKM | DocId = <article "content ID"> | Datasource = HKM; DocId = 1721446-2537-1033-1772837 | A single HKM article | NA |
HKM | ConnectionId = <Connection_HKM record ID> | Datasource = HKM; ConnectionId = AGGADGG8ECDC2ASI46SDSI46SD3O1X | All HKM articles, while the given user is being impersonated. | Using a connection allows a user to impersonate another user when connecting to BMC Helix Knowledge Management by ComAround. It is sometimes needed because the default IS user might not have the correct group mappings in BMC Helix Knowledge Management by ComAround. To specify such a user, you must create or update a record in the Connection_HKM record definition. |
BWF | NA | NA | All BWF articles | NA |
BWF | DocId = <article UUID> | DocId = AGGADG1AAP0ICAOQVYJ6OPZVOTL7BU | A single BWF article | Use field 379 of BWF:KnowledgeArticleTemplate. |
BWF | DocDisplayId = <article "Content ID"> | DocDisplayId = KA-000000000007 | A single BWF article | The ID is visible in the BMC Helix Business Workflows user interface. |
- To run the job immediately, enable the Execute now toggle key.
- If you are updating data, in the ModifiedSince field, specify the date and time of the last update.
Use this option for incremental updates, in which only documents modified since the specified date and time are indexed.
- To delete data from BMC HelixGPT that has been deleted from the source, select Sync deletions.
The following screen shows an example of creating a new data connection job:
- Click Save.
Repeat the steps to add multiple data connection jobs.
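The ModifiedSince and Sync deletions options described in the steps above can be pictured as two simple selections over document IDs. The following is a conceptual sketch only, not how BMC HelixGPT implements them; the function names and sample documents are hypothetical, and treating the ModifiedSince boundary as inclusive is an assumption.

```python
from datetime import datetime


def select_for_reindex(source_docs, modified_since):
    """Pick the IDs of source documents changed on or after `modified_since`.

    source_docs maps a document ID to its last-modified datetime.
    The inclusive boundary (>=) is an assumption of this sketch.
    """
    return sorted(doc_id for doc_id, modified in source_docs.items()
                  if modified >= modified_since)


def select_for_deletion(indexed_ids, source_docs):
    """Pick the IDs that are indexed but no longer exist in the source."""
    return sorted(set(indexed_ids) - set(source_docs))


# Hypothetical documents, for illustration only.
source_docs = {
    "doc-1": datetime(2024, 1, 10, 9, 0),
    "doc-2": datetime(2024, 3, 5, 14, 30),
}
indexed_ids = ["doc-1", "doc-2", "doc-3"]

print(select_for_reindex(source_docs, datetime(2024, 2, 1)))  # ['doc-2']
print(select_for_deletion(indexed_ids, source_docs))          # ['doc-3']
```

In this picture, an incremental job touches only doc-2, and a job with Sync deletions selected also removes doc-3, which no longer exists in the source.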
Verifying data ingestion
Data ingestion takes place one item at a time, and the time required for the ingestion to be completed depends on the number of documents to be ingested and the amount of data. If a user asks queries during data ingestion, the responses might be incorrect or incomplete. Therefore, it is important to verify that data ingestion is completed successfully.
- Log in to BMC Helix Innovation Studio.
- On the Workspace tab, click HelixGPT Manager.
- On the Records tab, select the DataConnectionJobStep record definition and click Edit data.
- Verify that the status of the job that you created is DONE.
The following image shows sample jobs with the DONE status:
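If you export or copy the step statuses for review, the check described above amounts to confirming that every step reports DONE. The following is a minimal sketch under that assumption; the function name is hypothetical, and "NOT_DONE" is a stand-in for any other status, not a documented value.

```python
def ingestion_complete(step_statuses):
    """Return True only when there is at least one step and all are DONE."""
    return len(step_statuses) > 0 and all(
        status == "DONE" for status in step_statuses
    )


# "NOT_DONE" is a placeholder for any status other than DONE.
print(ingestion_complete(["DONE", "DONE"]))      # True
print(ingestion_complete(["DONE", "NOT_DONE"]))  # False
print(ingestion_complete([]))                    # False
```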
Result
The following screenshot shows BMC HelixGPT fetching data from a PDF file attached to a record definition:
Where to go from here
Provisioning and setting up the generative AI provider for your application
Related topics