Configuring guardrails for AI agents
Guardrails help administrators secure AI agents by enforcing security controls across the agent lifecycle.
Integrated into HelixGPT Agent Studio, guardrails validate prompts and responses for security and compliance, enforce least-privilege access, monitor runtime behavior for policy violations, and ensure visibility through audit logs.
Administrators can enable and configure guardrails manually at the global and agent levels to meet specific security and governance requirements.
With guardrails, administrators can reduce risks such as prompt injection, data poisoning, unauthorized actions, and exposure of sensitive information, while maintaining control and visibility.
Benefits of using guardrails
| Business challenge | How guardrails help | Result |
|---|---|---|
| AI agents exposed to misuse and attacks | Validates prompts and responses and blocks prompt injection, jailbreaks, and manipulation. | Safer AI interactions with reduced security risk. |
| Limited control over agent actions | Enforces least‑privilege permissions and policy‑based runtime checks. | Controlled and predictable agent behavior. |
| Limited visibility and governance | Provides logs for consistent runtime enforcement. | Better governance, auditability, and trust at scale. |
Guardrail configuration during fresh installation and upgrade
| Scenario | Agent type | Guardrail configuration |
|---|---|---|
| Fresh installation | Prebuilt agents | Manually enable guardrail configuration. |
| New custom agent | Guardrail configuration is added by default. | |
| Upgrade | Prebuilt agents | Manually enable guardrail configuration. |
| New custom agent | Guardrail configuration is added by default. | |
| Existing custom agents | Manually enable guardrail configuration. |
Supported models
Guardrails support the following LLM models for security checks and enforcement:
- Azure OpenAI
- OpenAI
- Gemini
- Bring your own LLM for guardrail validation
Administrators can set up a separate, lower-cost model just for guardrail checks, while the main AI agent continues to use its primary model to generate responses.
Guardrail policies
Guardrails include prebuilt security policies that protect BMC HelixGPT agents from common AI‑specific threats. The following table describes prebuilt policies:
| Guardrail policy | Description |
|---|---|
| Prompt injection | Prevents malicious or hidden inputs from altering the system's intended behavior. |
| Role manipulation | Blocks attempts to change or override the assistant's assigned role or authority. |
| System prompt extraction | Protects internal system instructions from being exposed to users. |
| Jailbreak detection | Identifies and stops attempts to bypass safety and policy restrictions. |
| Instruction injection detection | Detects unauthorized or harmful instructions embedded within user inputs. |
| Context manipulation detection | Prevents tampering with or misusing conversation context to influence responses. |
These policies are applied during agent configuration and enforced at runtime. Administrators can review and adjust how these policies are applied to specific agents, but cannot add new detection categories beyond those provided.
Best practices for configuring guardrails
Configure guardrails when building agents
Keep guardrails enabled by default when configuring and building BMC HelixGPT agents.
- Make sure that the guardrails are applied in HelixGPT Agent Studio.
Use default guardrail policies
Use the default guardrail policies provided for each BMC HelixGPT agent use case and subject data.
- Modify policies only when required to meet organizational security standards or regulatory requirements.
Validate agent prompts and responses
Enable input validation to detect and block prompt injection, jailbreak attempts, role manipulation, and the extraction of system prompts.
- Enable output validation to prevent sensitive data leakage, exposure of internal configuration, or unsafe responses.
(Optional) To select a low-cost model for security checks
- Open BMC Helix Innovation Studio > Administration tab.
- Click HelixGPT > Connections > AI Service Providers.
- Select a model.
- In the Edit model dialog box, select a value from the Guardrail model drop-down list, and click Save.
The following screenshot shows the selected option:
To enable guardrails for all agents
- Open BMC Helix Innovation Studio > Workspace tab.
- Select HelixGPT Agent Studio.
- On the Records tab, select GuardRailSettings.
- Click Edit data.
- In Data editor (GuardRailsSetting), select enableGuardRails and click Edit.
- In the Value field, enter true and click Save.

To modify the guardrail configuration
- Open BMC Helix Innovation Studio > Workspace tab.
- Select HelixGPT Agent Studio.
- On the Records tab, select GuardRailSettings.
- Click Edit data.
- On the Data editor (GuardRailsSetting), from the following available options, select the guardrail policy record definition for which you want to modify settings:
- Prompt injection detection
- Role manipulation detection
- System prompt extraction prevention
- Jailbreak detection
- Context manipulation detection
- Instruction injection detection
- In the Edit record dialog box, update the Value field as per your requirement, and click Save.

To add a guardrail policy for an agent
- From the Application launcher, select HelixGPT Agent Studio.
- Click the Agents tab and select the agent for which you want to add a guardrail policy.
- Click the General Configuration tab, and click Guardrail Configuration.
- Add the following configuration and click Save changes.
.# NeMo Guardrails Configuration for Prompt Injection Prevention
# This configuration file defines the behavior of the guardrails for detecting
# and preventing prompt injection attacks.
# Model configuration will be dynamically overridden from environment variables
# by the middleware at runtime
models:
- type: main
engine: openai
model: gpt-3.5-turbo
# Rails configuration
rails:
input:
flows:
- prompt injection detection
- role manipulation detection
- jailbreak detection
- system prompt extraction prevention
- instruction injection detection
- context manipulation detection
To turn off the guardrail policy for an agent
- From the Application launcher, select HelixGPT Agent Studio.
- On the Agents tab, select the agent for which you want to turn off the guardrail policy.
- Click the General Configuration tab, and click Guardrail Configuration.

Remove the pre-built guardrails configuration, and save the changes.
To view guardrail logs
- Open BMC Helix Innovation Studio > Workspace tab.
- Click HelixGPT Agents Studio.
- On the Records tab, select GuardRailAuditLogs.
- Click Edit data.
- Select a log record and click Edit.
The following details are displayed:Field name Description Attack type Classification of the detected threat, such as direct prompt injection, jailbreak attempt, role manipulation, system prompt extraction, context manipulation, instruction injection, or obfuscation-based attack. Detection message The reason the input or output was flagged. User content Truncated snippet of the user input or agent response. User ID Identifier of the user whose input triggered the detection. Session ID Identifier of the session in which the detection occurred. Agent ID Identifier of the agent involved, if applicable. Request ID Identifier of the request or conversation. Blocked Indicates whether the request was blocked (true) or only monitored (false). Source Origin of detection, such as middleware (input validation), output check (response validation), agent, or tool. Timestamp UTC time when the detection occurred.
Results
The following screenshot shows the response sent for a sensitive query, such as a request for system prompts.
Related topic