Troubleshooting performance latency issues


Latency in BMC HelixGPT's performance might occur due to the following reasons:

  • Changes in parameter values in the default skill configuration.
  • A proxy or gateway is configured between BMC HelixGPT and the LLM provider.
  • Changes in the LLM model or client configuration parameters.

Issue symptoms

End users experience latency in responses from BMC HelixGPT. Sometimes, BMC HelixGPT takes more than a minute to generate a response.

Issue scope

This issue can occur in the following scenarios:

  • Change in the default skill configuration parameters.
  • A proxy or gateway is configured between BMC HelixGPT and the LLM provider.
  • Lack of optimization of client and LLM model configuration parameters for specific requirements.

Resolution

  • For agentic skills, we recommend setting the numberOfDocumentsToReturn parameter to 5 or less.
    Increasing the default value can degrade performance. For more information, see Updating the configuration parameters of a skill.
  • If a proxy or gateway is configured between BMC HelixGPT and the LLM provider, make sure to allocate sufficient memory and CPU to the proxy server, optimize its configuration, and enable resource usage monitoring.
  • Optimize the client and LLM model configuration parameters for your specific requirements. To optimize the model configurations, based on your environment requirements, perform any or all of the following steps:

    Add custom headers

    This configuration can be added to all LLM models, such as Azure, OpenAI, Llama, and Gemini, to improve streaming performance when a proxy or gateway is configured between BMC HelixGPT and the LLM model.

    Add the following configurations to the model's default configurations. For more information about updating skill configurations, see Updating the configuration parameters of a skill.


    "customHeaders": [
     {"name": "X-Accel-Buffering", "value": "no"},
     {"name": "Cache-Control", "value": "no-cache, no-store"},
     {"name": "Connection", "value": "keep-alive"}
    ]
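    To illustrate how these headers behave on the wire, the following Python sketch (hypothetical, not BMC HelixGPT code) translates the customHeaders array into the header dictionary an HTTP client would attach to each request to the LLM provider:

```python
import json

# Hypothetical sketch: translating the customHeaders array from the model
# configuration into HTTP request headers (illustrative only).
custom_headers = json.loads("""
[
 {"name": "X-Accel-Buffering", "value": "no"},
 {"name": "Cache-Control", "value": "no-cache, no-store"},
 {"name": "Connection", "value": "keep-alive"}
]
""")

# Build the header dictionary a client would send with each LLM request.
headers = {h["name"]: h["value"] for h in custom_headers}
print(headers)
```

    X-Accel-Buffering: no tells buffering proxies such as NGINX to pass the response through immediately rather than buffering it, which is what keeps token-by-token streaming responsive.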

    Enable custom HTTP client

    This configuration can be added only to Azure OpenAI models to improve network stability and performance by optimizing timeouts, connection pooling, and persistent (keep-alive) connections.

    Add the following configurations to the model's default configurations:


    "httpxClientEnabled": true

    The following default settings are configured in a model when the custom HTTP client is enabled. You can customize these settings to suit your requirements; however, we recommend that you do not change the default configurations.


    "httpxClientEnabled": true
    "httpxClientConfig": {
    "readTimeout": 300.0,
    "connectTimeout": 120.0,
    "writeTimeout": 120.0,
    "poolTimeout": 120.0,
    "maxConnections": 100,
    "maxKeepaliveConnections": 20,
    "keepaliveExpiry": 30.0,
    "http2Enabled": false,
    "followRedirects": true
    }

    Increase HTTP Connection Pool Max Size

    The default value for this configuration is 10. For high-concurrency environments, we recommend increasing the value to 50.

    Add the following configuration to your .env file and restart the application for the updates to take effect.

      #.env file
      HTTP_CONNECTION_POOL_MAX_SIZE=50 
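    The application reads this variable at startup. A minimal illustrative sketch of that lookup (the variable name comes from this article; the reading logic itself is hypothetical):

```python
import os

# Hypothetical sketch: resolve the pool size from the environment, falling
# back to the documented default of 10 when the variable is not set.
pool_max_size = int(os.environ.get("HTTP_CONNECTION_POOL_MAX_SIZE", "10"))
print(pool_max_size)
```

    Because the value is read once at startup, the restart mentioned above is required before the new pool size takes effect.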

 


BMC HelixGPT 26.1