Testing models by using BMC HelixGPT Agent Studio Test Automation

HelixGPT Agent Studio test automation enables BMC HelixGPT administrators to evaluate model performance, automate scenario testing, and analyze metrics. These capabilities help you select the most accurate and cost-effective model and ensure consistent outcomes for AI agents in HelixGPT Agent Studio.

To create test questions for a model

  1. Log in to HelixGPT Agent Studio and click Testing.
  2. Click the Questions tab.
  3. Perform one of the following procedures:
  • To create a set of questions, perform the following steps:
    1. Click Upload.
    2. In the Upload dialog box, attach one or more files containing questions and their expected outputs.
      Supported file types: CSV or XLS
      Maximum file size: 20 MB
      Maximum file count: 4
    3. Click Upload.
      If you upload a set of questions, each question is displayed individually. For a minimal example of a question file, see the sketch after this list.
  • To create a single question, perform the following steps:
    1. Click Create question, and in the Create question dialog box, enter the question and the expected output.
    2. Click Create.
      The question is added to the list of questions.
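
The upload expects each row of a question file to pair a question with its expected output. The following Python sketch writes such a CSV file; the column headers are an assumption for illustration, not a documented HelixGPT template:

import csv

# Hypothetical column headers; confirm them against the template
# that your HelixGPT Agent Studio upload dialog expects.
rows = [
    {"question": "How do I restart my computer?",
     "expected_output": "Please restart your computer."},
    {"question": "How do I reset my password?",
     "expected_output": "Use the Forgot password link on the sign-in page."},
]

with open("test_questions.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["question", "expected_output"])
    writer.writeheader()
    writer.writerows(rows)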

To run a test question

  1. Log in to HelixGPT Agent Studio and click Testing.
  2. Click the Questions tab.
  3. Select the question that you want to run, and click Run.

  4. Complete the required fields and click Run.

To add questions to the test sets

  1. Log in to HelixGPT Agent Studio and click Testing.
  2. Click the Questions tab.
  3. Perform one of the following procedures:
  • To add the questions to an existing test set, perform the following steps:
    1. Select the questions that you want to add and then click Add to test set.
    2. In the dialog box that appears, select the existing test set from the list, and click Add.
  • To add the questions to a new test set, perform the following steps:
    1. Select the questions that you want to add and then click Create test set.
    2. In the Create a test set dialog box, specify the test set name and the type of use case, and click Create.
      Alternatively, if you select multiple questions and run them, a test set is created automatically.

To upload and run a test set

  1. Log in to HelixGPT Agent Studio and click Testing.
  2. Click the Test sets tab.
  3. Click Upload to upload a test set.
  4. Specify the type of use case and attach the test set file(s) containing questions and their expected outputs.
    Supported file types: CSV or XLS
    Maximum file size: 20 MB
    Maximum file count: 4
    A sketch that checks these limits before upload follows this procedure.
  5. Click Upload.
  6. To run a test set, select it from the list, click Action, and select Run.

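Because uploads that exceed these limits are rejected, you can check files locally before uploading. The following Python sketch encodes the limits stated above; the helper function is illustrative, not a HelixGPT API:

import os

# Upload limits stated in this documentation.
ALLOWED_EXTENSIONS = {".csv", ".xls"}
MAX_FILE_SIZE = 20 * 1024 * 1024   # 20 MB
MAX_FILE_COUNT = 4

def validate_test_set_files(paths):
    """Raise ValueError if any upload limit is violated."""
    if len(paths) > MAX_FILE_COUNT:
        raise ValueError(f"At most {MAX_FILE_COUNT} files can be uploaded at once.")
    for path in paths:
        if os.path.splitext(path)[1].lower() not in ALLOWED_EXTENSIONS:
            raise ValueError(f"{path}: only CSV or XLS files are supported.")
        if os.path.getsize(path) > MAX_FILE_SIZE:
            raise ValueError(f"{path}: exceeds the 20 MB limit.")

# Example: validate the file written earlier before uploading it.
validate_test_set_files(["test_questions.csv"])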

To view the metrics for a specific run

  1. Log in to HelixGPT Agent Studio and click Testing.
  2. Click the Runs tab.
    The output folders containing the test set runs are displayed.
  3. Select the desired output folder.
    A list of test set runs is displayed.
    You can filter the records as needed and use the search field to find a specific output folder or test set run by name.

  4. To create a new folder to hold test set runs, click Add output folder.
  5. In the Create output folder dialog box, specify a name for the folder and click Create.
  6. Click the arrow on a test set run record to expand it.
    Metrics for the selected run are displayed.

    The metrics help you identify the most accurate and cost-effective model based on the passed and failed test cases.
    When a test case runs, the agent or skill produces a response. Another AI model, called the LLM judge, evaluates how close that response is to the expected answer.
    The LLM judge assigns a score between 1 and 10, where a higher score means the response is more similar to the expected answer. The score is then compared against a threshold to decide whether the test case passed or failed. The default threshold is 5, but an administrator can change it as needed.
    Example:
    If the model response is Try rebooting your system and the expected answer is Please restart your computer, the LLM judge might assign a score of 8.
    If the score is 5 or more, the test case is marked as passed; if it is less than 5, it is marked as failed. In this way, the results reflected in the metrics help assess a model's performance.
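
    The pass/fail decision described above can be summarized in a few lines of Python. The function name and signature are illustrative only; the score comes from the LLM judge, and the threshold defaults to 5 as noted:

    DEFAULT_THRESHOLD = 5  # administrators can change this default

    def judge_verdict(score: int, threshold: int = DEFAULT_THRESHOLD) -> str:
        """Map an LLM-judge score (1-10) to a pass or fail verdict."""
        if not 1 <= score <= 10:
            raise ValueError("LLM-judge scores range from 1 to 10.")
        return "passed" if score >= threshold else "failed"

    # Example from the documentation: "Try rebooting your system" versus
    # "Please restart your computer" might receive a score of 8.
    print(judge_verdict(8))  # passed
    print(judge_verdict(3))  # failed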
