Bringing your own integrations


This topic describes how you can use the Bring Your Own Integrations feature to seamlessly integrate generative AI services into your application, tailored to your unique use case. You can use BMC-provided LLMs or connect external ones for greater flexibility and control over your AI-powered solutions.

Using API keys for authentication

The BMC AMI Platform API uses API keys for authentication. You can download API keys for an integration at the account level.

Each integration provides the following credentials:

  • Integration ID—Unique identifier of an integration for internal use. 
  • Integration key—Provides access to an integration. We highly recommend transitioning to project keys for best security practices, although access via this method is still supported. 
Important

Keep your API key confidential. Do not share it or expose it in client-side code (browsers, apps). Route production requests through your back-end server, where the API key can be securely loaded from an environment variable or key management service.

All API requests must include your integration key in the Authorization HTTP header and your integration ID in the integration-id header, as follows:

Authorization: Bearer INTEGRATION_API_KEY
integration-id: INTEGRATION_ID
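
For example, a back-end service might load these credentials from environment variables and attach them to every request it sends to the integration. The following sketch assumes the values are stored in hypothetical INTEGRATION_API_KEY and INTEGRATION_ID environment variables; adjust the names to your own setup:

import os
import httpx

# Hypothetical environment variable names; load the values from wherever your
# back end keeps secrets (environment variables or a key management service)
api_key = os.environ["INTEGRATION_API_KEY"]
integration_id = os.environ["INTEGRATION_ID"]

# Headers sent with every request to the integration
headers = {
    "Authorization": f"Bearer {api_key}",
    "integration-id": integration_id,
}

client = httpx.Client(headers=headers)  # reuse the same headers on every call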

Generating a chat completion endpoint

You can create a model response for chat conversations.

To generate a chat completion, send the following request:

POST $INTEGRATION_PATH/generate

Parameter support varies depending on the model used to generate the response. BMC AMI AI Services provides models that support the following parameters, but locally added models, such as Bring Your Own LLM (BYOLLM), might not support all parameters.

Request body

  • messages (required): Specifies a list of messages that make up the conversation up to this point.
  • frequency_penalty (float; default 0.0; optional): Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood of repeating the same line verbatim.
  • logit_bias (dict[str, float]; default None; optional): Modifies the likelihood of specified tokens appearing in the completion. This parameter accepts a JSON object that maps tokens (specified by their token ID in the model tokenizer) to an associated bias value from -100 to 100. You can use a tokenizer tool to convert text to token IDs. Mathematically, the bias is added to the logits generated by the model before sampling. The exact effect varies per model, but values between -1 and 1 should decrease or increase the likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token. For example, you can pass {"50256": -100} to prevent the <|endoftext|> token from being generated.
  • logprobs (bool; default False; optional): Specifies whether to return log probabilities of the output tokens. If set to true, the log probabilities of each output token are returned in the content of the message.
  • top_logprobs (int; default None; optional): Specifies the number of most likely tokens to return at each token position, each with an associated log probability. Valid values are integers from 0 to 5. To use this parameter, you must set the logprobs parameter value to true.
  • max_completion_tokens (int; default None; optional): Specifies an upper limit for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.
  • n (int; optional): Specifies how many completions to generate for each prompt. Important: This parameter generates many completions, so it can quickly consume your token quota. Use it carefully and ensure that you have reasonable settings for max_tokens and stop.
  • presence_penalty (float; default 0.0; optional): Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood of talking about new topics.
  • stop (str or list[str]; default []; optional): Specifies up to four sequences where the API stops generating further tokens. The returned text does not contain the stop sequence.
  • stream (bool; default False; optional): Specifies whether to stream back partial progress. If set, tokens are sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message. For more information about the event stream format for server-sent events, see the Server-Sent Events documentation on the MDN Web Docs. Important: This parameter supports BMC-provided LLMs only.
  • stream_options (default None; optional): Specifies options for the streaming response. Set this parameter only when you set the stream parameter value to true. See the sketch after the StreamOptions object below.
  • temperature (float; default None; optional): Specifies which sampling temperature to use, between 0 and 2. Higher values, such as 0.8, make the output more random, while lower values, such as 0.2, make it more focused and deterministic. We generally recommend altering this parameter or top_p, but not both.
  • top_p (float; default None; optional): An alternative to sampling with temperature, called nucleus sampling, in which the model considers the results of the tokens with top_p probability mass. A value of 0.1 means that only the tokens comprising the top 10 percent probability mass are considered. We generally recommend altering this parameter or the temperature parameter, but not both. Important: The top_p value must be between 0.1 and 1.
  • best_of (int; default None; optional): Generates best_of completions server-side and returns the "best" (the one with the highest log probability per token). Results cannot be streamed. When used with n, best_of controls the number of candidate completions and n specifies how many to return. The best_of value must be greater than or equal to n. Important: Because this parameter generates many completions, it can quickly consume your token quota. Use this parameter carefully and make sure that you have reasonable settings for max_tokens and stop.
  • use_beam_search (bool; default False; optional): Specifies whether to use beam search instead of sampling.
  • top_k (int; default None; optional): Controls the number of top tokens to consider. Set to -1 to consider all tokens.
  • min_p (float; default 0.0; optional): Specifies the minimum probability for a token to be considered, relative to the probability of the most likely token. Values must be in [0, 1]. Set to 0 to disable this parameter.
  • repetition_penalty (float; default None; optional): Penalizes new tokens based on whether they appear in the prompt and the generated text so far. Values greater than 1 encourage the model to use new tokens, while values less than 1 encourage the model to repeat tokens.
  • length_penalty (float; default 1.0; optional): Penalizes sequences based on their length. This parameter is used in beam search.
  • stop_token_ids (list[int]; default []; optional): Specifies a list of tokens that stop the generation when they are generated. The returned output contains the stop tokens unless the stop tokens are special tokens.
  • include_stop_str_in_output (bool; default False; optional): Specifies whether to include the stop strings in the output text.
  • ignore_eos (bool; default False; optional): Specifies whether to ignore the EOS token and continue generating tokens after the EOS token is generated.
  • min_tokens (int; optional): Specifies the minimum number of tokens to generate per output sequence before EOS or stop_token_ids can be generated.
  • skip_special_tokens (bool; default True; optional): Specifies whether to skip special tokens in the output.
  • spaces_between_special_tokens (bool; default True; optional): Specifies whether to add spaces between special tokens in the output.
  • truncate_prompt_tokens (int; default None; optional): If set to an integer k, uses only the last k tokens from the prompt (that is, left truncation).
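
As an illustration, the following request body combines the required messages list with a few of the optional parameters described above. This is a sketch only; which optional parameters are honored depends on the model behind your integration:

# Illustrative payload for the generate endpoint; optional parameter support
# varies by model, so treat the values below as examples rather than defaults
json_data = {
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize the benefits of AI agents in one sentence."},
    ],
    "temperature": 0.2,            # lower values give more focused, deterministic output
    "max_completion_tokens": 128,  # upper limit on the number of generated tokens
    "stop": ["\n\n"],              # stop generating at the first blank line
}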

Response

  • id (str): Unique identifier for the chat completion.
  • choices: List of chat completion choices. Multiple choices are returned if n is greater than 1.
  • created (int): UNIX time stamp (in seconds) of when the chat completion was created.
  • model (str): Model used for the chat completion.
  • object (str): Object type, which is always chat.completion.
  • usage: Usage statistics for the completion request.

ChatMessage

  • role: Specifies the message role (user, system, or assistant).
  • content (str): Contains the query or input from the user.

MessageRoleType

  • USER: user
  • SYSTEM: system
  • AI: assistant
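
For example, a multi-turn exchange lists earlier assistant replies alongside the user messages so that the next completion has the full conversation context. A sketch of such a messages list:

# A hypothetical multi-turn conversation; the assistant role carries the
# model's earlier replies, and the final user message is the new query
json_data = {
    "messages": [
        {"role": "system", "content": "You are a helpful mainframe assistant."},
        {"role": "user", "content": "What does the health endpoint check?"},
        {"role": "assistant", "content": "It validates that dependent services and models are operational."},
        {"role": "user", "content": "How often should I call it?"},
    ]
}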

StreamOptions

  • include_usage (bool; default True; optional): If set, an additional chunk is streamed before the data: [DONE] message. This chunk's usage field shows token usage statistics for the entire request, while the choices field is always an empty array. All other chunks include a usage field with a null value.
  • continuous_usage_stats (bool; default False; optional): If set to true, usage statistics are tracked continuously during the model run.
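
The sketch below shows one way to consume a streamed response with httpx when stream is set to true (supported for BMC-provided LLMs only). The data: framing and the [DONE] terminator follow the stream parameter description above; the variable names and the final print are illustrative only:

import json
import httpx

URL = "$INTEGRATION_PATH/generate"   # replace with your integration path
headers = {
    "Authorization": "Bearer $INTEGRATION_API_KEY",  # replace with your integration key
    "integration-id": "$INTEGRATION_ID",             # replace with your integration ID
    "Content-Type": "application/json",
}

json_data = {
    "messages": [{"role": "user", "content": "Explain beam search briefly."}],
    "stream": True,
    "stream_options": {"include_usage": True},  # ask for a final usage chunk
}

with httpx.stream("POST", URL, json=json_data, headers=headers, timeout=600) as response:
    for line in response.iter_lines():
        if not line.startswith("data: "):
            continue  # skip blank lines and any non-data fields
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # the stream ends with a data: [DONE] message
        chunk = json.loads(payload)
        # Each chunk mirrors the response objects above; with include_usage,
        # the last chunk carries usage statistics and an empty choices array
        print(chunk)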

Choices

  • index (int): Index of the choice in the list of choices.
  • message: Chat completion message generated by the model.
  • logprobs (optional): Log probability information for the choice.
  • finish_reason (str): Valid values are as follows:
      • stop—The generation stopped because a specified stop condition was met, such as encountering a stop token or string.
      • length—The generation stopped because it reached the maximum number of specified tokens.
      • abort—The generation was stopped due to an error or interruption.

Message

  • content (str; optional): Contents of the message.
  • role (str): Role of the author of this message.
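
For example, assuming data holds the parsed JSON returned by the generate endpoint, the generated text and the stop reason of the first choice can be read as follows (a sketch, not part of the API itself):

first_choice = data["choices"][0]          # index 0 unless n > 1 was requested
print(first_choice["message"]["content"])  # text generated by the model
print(first_choice["finish_reason"])       # stop, length, or abort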

Logprob

  • content (optional): List of message content tokens with log probability information.

LogprobData

  • top_logprobs: List of the most likely tokens and their log probability at this token position. In rare cases, fewer than the number of requested top_logprobs are returned.

TopLogprob

  • token (str): Token name.
  • logprob (number): Log probability of this token, if it is within the top 20 most likely tokens. Otherwise, the value -9999.0 is used to signify that the token is very unlikely.
  • bytes (list; optional): List of integers representing the UTF-8 bytes representation of the token. This parameter is useful when characters are represented by multiple tokens and their byte representations must be combined to generate the correct text representation. The value can be null if the token has no bytes representation.
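
For example, to receive these structures, a request can enable logprobs and ask for up to five alternatives per token position with top_logprobs; the resulting Logprob object then appears on each choice. This is a sketch; parameter support depends on the model behind your integration:

# Request log probabilities for the generated tokens
json_data = {
    "messages": [{"role": "user", "content": "Name one mainframe operating system."}],
    "logprobs": True,    # return log probabilities of the output tokens
    "top_logprobs": 3,   # up to 3 most likely alternatives per token position
}

# After posting json_data to the generate endpoint and parsing the response,
# choice["logprobs"] on each choice holds the Logprob object described above.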

Usage

  • completion_tokens (int): Number of tokens in the generated completion.
  • prompt_tokens (int): Number of tokens in the prompt.
  • total_tokens (int): Total number of tokens used in the request (prompt and completion).
  • prompt_tokens_details: Breakdown of tokens used in the prompt.

PromptTokensDetails

  • cached_tokens (integer): Cached tokens present in the prompt.
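
As a small sketch, assuming data holds the parsed JSON response, the usage block can be used to track token consumption per request:

usage = data["usage"]
print("prompt tokens:    ", usage["prompt_tokens"])
print("completion tokens:", usage["completion_tokens"])
print("total tokens:     ", usage["total_tokens"])
print("cached tokens:    ", usage["prompt_tokens_details"]["cached_tokens"])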

Getting the integration status (health endpoint)

You can use the health endpoint to check the status of the integration. It internally validates whether all dependent services and models are operational and functioning properly.

To call the health endpoint, use the following request:

GET $INTEGRATION_PATH/health

When you call this endpoint, include the API keys described in the Using API keys for authentication section.

Using a chat completion endpoint

To run your first API request, use the following Python example. Make sure to replace $INTEGRATION_PATH with your integration path, $INTEGRATION_API_KEY with your integration key, and $INTEGRATION_ID with your integration ID.

Request

import httpx



# Replace $INTEGRATION_PATH with your integration path
URL = "$INTEGRATION_PATH/generate"

# Replace $INTEGRATION_API_KEY with your integration API key
BEARER_TOKEN = "$INTEGRATION_API_KEY"

# Replace $INTEGRATION_ID with your integration ID
INTEGRATION_ID = "$INTEGRATION_ID"

# Define the JSON data to be sent in the POST request
json_data = {
   "messages": [
        {
           "role": "system",
           "content": "Provide a concise summary of the provided text, limiting it to 100 words.",
        },
        {
           "role": "user",
           "content": "Artificial Intelligence (AI) agents are systems designed to perform specific tasks autonomously or with minimal human intervention, using machine learning and reasoning capabilities. These agents are capable of perceiving their environment, making decisions, and taking actions to achieve specific goals. AI agents can be categorized into reactive agents, which respond to their environment without internal models, and deliberative agents, which use internal models and reasoning to plan actions.\r\n\r\nOne of the key components of an AI agent is its ability to sense and act within its environment. This is often facilitated through sensors (which gather information about the world) and actuators (which execute actions based on decisions). The sophistication of an AI agent’s decision-making process can vary; some agents rely on predefined rules and logic, while others use advanced algorithms such as neural networks and reinforcement learning to adapt and optimize their behavior over time.\r\n\r\nAI agents can be applied in various domains, such as robotics, virtual assistants, and autonomous vehicles. In robotics, AI agents allow machines to perform tasks like object manipulation, navigation, and interaction with humans. Virtual assistants, like Siri and Alexa, are also examples of AI agents that help users with daily tasks such as setting reminders or answering questions. Autonomous vehicles, on the other hand, rely on AI agents to interpret sensor data, make real-time decisions, and navigate roads safely without human drivers.\r\n\r\nWhile AI agents offer significant benefits, they also present challenges, especially when it comes to ensuring ethical behavior, decision-making transparency, and avoiding unintended consequences. As AI technology continues to evolve, researchers and engineers are working on improving the capabilities of AI agents while addressing these concerns to ensure that they can be safely integrated into society.",
        },
    ]
}

# Define the headers, including the Bearer token in the Authorization header
headers = {
   "Authorization": f"Bearer {BEARER_TOKEN}",
   "integration-id": f"{INTEGRATION_ID}",
   "Content-Type": "application/json",
}

# Send the POST request using httpx
response = httpx.post(URL, json=json_data, headers=headers, timeout=600)

# Check if the request was successful
if response.status_code == 200:
   print(response.json())

Response

This request queries BMC AMI AI Services to summarize the content that you provide. The response should resemble the following:

{
    "id": "chatcmpl-85282769098a4307b75a38f9aa857d1b",
    "object": "chat.completion",
    "created": 1738664849,
    "model": "mistralai/Mixtral-8x7B-Instruct-v0.1",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": " Artificial Intelligence (AI) agents are systems that perform tasks autonomously using machine learning and reasoning. They can be reactive or deliberative, perceiving the environment through sensors and actuators. AI agents can be found in robotics, virtual assistants, and autonomous vehicles. While beneficial, they also pose ethical and safety challenges. As AI technology advances, researchers aim to improve capabilities while addressing these concerns. AI agents make decisions based on predefined rules or advanced algorithms like neural networks and reinforcement learning. They help with tasks in various domains, including object manipulation, navigation, and virtual assistance."
            },
            "logprobs": null,
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 400,
        "total_tokens": 525,
        "completion_tokens": 125,
        "prompt_tokens_details": {
            "cached_tokens": 0
        }
    }
}

Using a health endpoint

To use the health endpoint, see the following example. Make sure to replace $INTEGRATION_PATH with your integration path, $INTEGRATION_API_KEY with your integration key, and $INTEGRATION_ID with your integration ID. The response is a JSON object.


Request

import httpx



# Replace $INTEGRATION_PATH with your integration path
URL = "$INTEGRATION_PATH/health"

# Replace $INTEGRATION_API_KEY with your integration API key
BEARER_TOKEN = "$INTEGRATION_API_KEY"

# Replace $INTEGRATION_ID with your integration ID
INTEGRATION_ID = "$INTEGRATION_ID"

# Define the headers, including the Bearer token in the Authorization header
headers = {
   "Authorization": f"Bearer {BEARER_TOKEN}",
   "integration-id": f"{INTEGRATION_ID}",
   "Content-Type": "application/json",
}

# Send the GET request using httpx
response = httpx.get(URL, headers=headers, timeout=60)

# Check if the request was successful
if response.status_code == 200:
   print(response.json())

Response

{
    "data": null,
    "error": null
}

 
