Configuring the summarization middleware for agents


Summarization middleware helps administrators ensure that the AI agents can handle long conversations without losing important context. When a conversation reaches a predefined limit for messages or tokens, the agent automatically summarizes the older parts of the conversation into a single, clear summary. At the same time, it preserves the most recent messages unchanged and keeps key contextual information available without retaining the entire message history. This approach helps users to follow and continue long conversations without interruption, while enabling the agent to retain recent context and key documents. By reducing the amount of conversation history sent to the large language model (LLM), it also helps control usage and operational costs.

Information
Scenario

Apex Global uses an AI agent to help IT teams troubleshoot complex incidents, such as database outages affecting payroll systems, repeated application crashes caused by memory leaks, and network latency issues traced across multiple services. These investigations often require long, detailed conversations. As investigations progress, earlier parts of the discussion become less critical, while recent findings and decisions remain essential. To support this workflow, an administrator enables summarization so the agent automatically condenses older conversation history and keeps only the most recent and relevant details. This capability prevents the agent from breaking during extended sessions due to LLM context limits, eliminates the need for restarts or repeated explanations, and enables operators to stay focused on resolving the issue. By reducing the amount of text sent to an LLM, Apex Global also controls LLM usage and lowers operational costs.

Availability and enablement of summarization across agent types

Summarization middleware support is provided either through a built‑in middleware or via API integration, depending on the agent type:

  • Summarization middleware is available for the React Agent, Supervisor Agent, and Catalog Request Agent.
  • For other agent types, the agent owner can enable the summarization feature by using summarize_messages and build_summarized_messages APIs.

    For example, an agent owner can call summarize_messages to generate a summary of older conversation messages and then use build_summarized_messages to replace those messages with the summary while keeping recent messages unchanged.
    Example:

    summary = summarize_messages(
      messages=conversation_history,
      max_tokens=800
    )

    updated_history = build_summarized_messages(
      summary=summary,
      keep_last_messages=20
    )

To configure the summarization middleware

  1. Log in to the HelixGPT Agent Studio.
  2. On the Agents tab, select the agent for which you want to enable history summarization.
  3. Go to the General configuration tab and expand the Behavior settings.
  4. In the Configuration box, add the following middleware parameter and click Save changes.

    "middleware": [
      {
        "type": "summary",
        "config": {
          "trigger": ["messages", 60],
          "keep": ["messages", 20]
        }
      }
    ]

The trigger value defines when summarization must occur, and the keep value specifies how many recent messages must be preserved in full. The administrator can adjust both values as required. 

ParameterDescriptionDefault valueRecommended value
trigger

Specifies when summarization runs based on conversation size. Summarization is triggered when the configured limit is reached, such as the number of messages, the total number of tokens, or a fraction of the model’s context window. Multiple trigger conditions can be defined, and summarization runs when any one condition is met.

Not set. If not specified, summarization does not run automatically.

["tokens", 3000–4000] for long troubleshooting sessions, or

["messages", 30–50] for message‑heavy conversations.

keep

Defines how much of the most recent conversation history is preserved unchanged when summarization runs. Older messages outside this limit are summarized. This helps retain recent context while reducing overall conversation size.

["messages", 20]

 

The following screenshot shows an example of adding summarization middleware for an agent:

Add summarization middleware to an agent

In this example, summarization is triggered when the conversation reaches either 3,000 tokens or 30 messages, whichever happens first. When summarization runs, the agent keeps the most recent 20 messages unchanged and summarizes the older messages. The keep_document: 3 setting ensures that up to three important referenced documents are preserved without summarization. This approach helps the agent continue long conversations smoothly, retain recent context and key documents, and reduce the amount of text sent to the language model, which helps control usage and cost.

 

Tip: For faster searching, add an asterisk to the end of your partial query. Example: cert*

BMC HelixGPT 26.1