Configuring the summarization middleware for agents
Summarization middleware helps administrators ensure that AI agents can handle long conversations without losing important context. When a conversation reaches a configured message or token limit, the agent automatically condenses the older part of the conversation into a single summary. It preserves the most recent messages unchanged, so key context remains available without retaining the entire message history. As a result, users can continue long conversations without interruption, and the agent retains recent context and key documents. Because less conversation history is sent to the large language model (LLM), summarization also helps control usage and operational costs.
Availability and enablement of summarization across agent types
Summarization support is provided either through built-in middleware or through API integration, depending on the agent type:
- Built-in summarization middleware is available for the React Agent, Supervisor Agent, and Catalog Request Agent.
- For other agent types, the agent owner can enable summarization by using the summarize_messages and build_summarized_messages APIs.
For example, an agent owner can call summarize_messages to generate a summary of older conversation messages and then call build_summarized_messages to replace those messages with the summary while keeping recent messages unchanged.
Example:
summary = summarize_messages(
    messages=conversation_history,
    max_tokens=800
)
updated_history = build_summarized_messages(
    summary=summary,
    keep_last_messages=20
)
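The following Python sketch illustrates the summarize-then-rebuild pattern that these two APIs provide. It does not call summarize_messages or build_summarized_messages, whose full signatures are not documented here. The Message class, naive_summarize, and summarize_and_rebuild are hypothetical names, and the summarizer is a stub that truncates text instead of calling an LLM.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str     # "user", "assistant", or "system"
    content: str


def naive_summarize(messages: list[Message], max_chars: int = 200) -> str:
    """Hypothetical stand-in for an LLM summarization call.

    It joins the message text and truncates it so the sketch runs offline.
    """
    text = " ".join(f"{m.role}: {m.content}" for m in messages)
    return text[:max_chars]


def summarize_and_rebuild(history: list[Message], keep_last: int) -> list[Message]:
    """Replace all but the last `keep_last` messages with one summary message."""
    if keep_last < 0:
        raise ValueError("keep_last must be zero or positive")
    if len(history) <= keep_last:
        return list(history)  # nothing is old enough to summarize
    split = len(history) - keep_last
    older, recent = history[:split], history[split:]
    summary = Message("system", "Summary of earlier conversation: " + naive_summarize(older))
    return [summary] + recent


history = [Message("user" if i % 2 == 0 else "assistant", f"message {i}") for i in range(25)]
updated = summarize_and_rebuild(history, keep_last=20)
print(len(updated))         # 21: one summary message plus the 20 most recent messages
print(updated[1].content)   # message 5
```

Placing the summary as a single system message at the head of the history is one possible design choice; where the real APIs place the summary is not specified in this article.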
To configure the summarization middleware
- Log in to the HelixGPT Agent Studio.
- On the Agents tab, select the agent for which you want to enable history summarization.
- Go to the General configuration tab and expand the Behavior settings.
- In the Configuration box, add the following middleware configuration, and then click Save changes.
"middleware": [
{
"type": "summary",
"config": {
"trigger": ["messages", 60],
"keep": ["messages", 20]
}
}
]
The trigger value defines when summarization runs, and the keep value specifies how many of the most recent messages are preserved unchanged. Administrators can adjust both values as required.
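Because the middleware setting is JSON, a quick local check can catch syntax or shape mistakes before you click Save changes. The following Python sketch is an illustration only, not part of HelixGPT Agent Studio: it wraps the fragment in braces so it parses on its own, locates the entry of type "summary", and checks that trigger and keep each have the [unit, value] form. The set of allowed units, including the "fraction" keyword, is an assumption based on the parameter descriptions in this article, and only the single-condition form is checked.

```python
import json
from typing import Optional

# The Configuration box holds the "middleware" key inside a larger JSON object;
# it is wrapped in braces here so json.loads() can parse it on its own.
CONFIG_TEXT = """
{
  "middleware": [
    {
      "type": "summary",
      "config": {
        "trigger": ["messages", 60],
        "keep": ["messages", 20]
      }
    }
  ]
}
"""

# Assumption: unit names inferred from this article; "fraction" is a guessed
# keyword for a share of the model's context window.
VALID_UNITS = {"messages", "tokens", "fraction"}


def find_summary_config(config: dict) -> Optional[dict]:
    """Return the "config" object of the first middleware entry of type "summary"."""
    for entry in config.get("middleware", []):
        if entry.get("type") == "summary":
            return entry.get("config", {})
    return None


def check_condition(name: str, condition: list) -> None:
    """Raise ValueError unless `condition` looks like [unit, positive number]."""
    if not (isinstance(condition, list) and len(condition) == 2):
        raise ValueError(f"{name}: expected [unit, value], got {condition!r}")
    unit, value = condition
    if unit not in VALID_UNITS:
        raise ValueError(f"{name}: unknown unit {unit!r}")
    if isinstance(value, bool) or not isinstance(value, (int, float)) or value <= 0:
        raise ValueError(f"{name}: value must be a positive number, got {value!r}")


summary_config = find_summary_config(json.loads(CONFIG_TEXT))
check_condition("trigger", summary_config["trigger"])
check_condition("keep", summary_config["keep"])
print(summary_config)  # {'trigger': ['messages', 60], 'keep': ['messages', 20]}
```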
| Parameter | Description | Default value | Recommended value |
|---|---|---|---|
| trigger | Specifies when summarization runs based on conversation size. Summarization is triggered when the configured limit is reached, such as the number of messages, the total number of tokens, or a fraction of the model’s context window. Multiple trigger conditions can be defined, and summarization runs when any one condition is met. | Not set. If not specified, summarization does not run automatically. | ["tokens", 3000–4000] for long troubleshooting sessions, or ["messages", 30–50] for message‑heavy conversations. |
| keep | Defines how much of the most recent conversation history is preserved unchanged when summarization runs. Older messages outside this limit are summarized, which retains recent context while reducing the overall conversation size. | ["messages", 20] | |
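The rule that summarization runs when any one trigger condition is met can be illustrated with a short Python sketch. It is a hypothetical model of the behavior described in the table, not the middleware's actual code. The word-count token estimate, the "fraction" unit keyword, and the context_window argument are assumptions, and a limit counts as reached when the conversation size is greater than or equal to it.

```python
from typing import Optional


def approx_tokens(messages: list[str]) -> int:
    """Rough token estimate (assumption): one token per whitespace-separated word.

    Real counts come from the model's tokenizer and are usually higher.
    """
    return sum(len(m.split()) for m in messages)


def should_summarize(messages: list[str], triggers: list[tuple[str, float]],
                     context_window: Optional[int] = None) -> bool:
    """Return True as soon as any one trigger condition has been reached."""
    tokens = approx_tokens(messages)
    for unit, limit in triggers:
        if unit == "messages":
            size = len(messages)
        elif unit == "tokens":
            size = tokens
        elif unit == "fraction":  # hypothetical unit: share of the context window
            if context_window is None:
                raise ValueError("a 'fraction' trigger needs the context window size")
            size, limit = tokens, limit * context_window
        else:
            raise ValueError(f"unknown trigger unit {unit!r}")
        if size >= limit:
            return True
    return False


triggers = [("tokens", 3000), ("messages", 30)]
chat = ["please check the VPN logs"] * 29        # 29 messages of 5 words each
print(should_summarize(chat, triggers))                    # False
print(should_summarize(chat + ["any update?"], triggers))  # True: 30 messages reached
```

Because the function returns on the first condition that is met, the order of the conditions in the list does not change the result.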
The following screenshot shows an example of adding summarization middleware for an agent:

In this example, summarization is triggered when the conversation reaches either 3,000 tokens or 30 messages, whichever happens first. When summarization runs, the agent keeps the 20 most recent messages unchanged and summarizes the older messages. The keep_document: 3 setting preserves up to three important referenced documents without summarizing them. With these settings, the agent can continue long conversations while retaining recent context and key documents, and it sends less text to the LLM, which helps control usage and cost.
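The compaction step in this example can be sketched in Python. This article does not describe how the middleware decides which referenced documents are important, so the sketch assumes a hypothetical is_document flag on each message and keeps the most recent document messages from the older portion of the conversation. The Msg class and compact function are illustrative names only, and the summary is a placeholder string instead of an LLM call.

```python
from dataclasses import dataclass


@dataclass
class Msg:
    text: str
    is_document: bool = False  # hypothetical flag marking a referenced document


def compact(history: list[Msg], keep_messages: int = 20, keep_document: int = 3) -> list[Msg]:
    """Summarize older messages, but keep recent messages and up to
    `keep_document` referenced documents verbatim (assumed behavior)."""
    split = max(len(history) - keep_messages, 0)
    older, recent = history[:split], history[split:]
    # Assumption: "important" documents are approximated as the most recent ones.
    docs = [m for m in older if m.is_document]
    kept_docs = docs[max(len(docs) - keep_document, 0):]
    kept_ids = {id(m) for m in kept_docs}
    to_summarize = [m for m in older if id(m) not in kept_ids]
    # Placeholder for an LLM call that would condense `to_summarize`.
    summary = [Msg(f"[summary of {len(to_summarize)} earlier messages]")] if to_summarize else []
    return summary + kept_docs + recent


history = [Msg(f"doc {i}", is_document=True) if i in (1, 3, 5, 7) else Msg(f"msg {i}")
           for i in range(30)]
result = compact(history, keep_messages=20, keep_document=3)
print([m.text for m in result[:5]])
# ['[summary of 7 earlier messages]', 'doc 3', 'doc 5', 'doc 7', 'msg 10']
print(len(result))  # 24
```

With 30 messages, the 10 oldest fall outside the keep limit; three of the four documents among them are kept verbatim and the remaining seven messages are summarized.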