Bringing your own integrations
Using API keys for authentication
The BMC AMI Platform API uses API keys for authentication. You can download API keys for an integration at the account level.
You can assign each API key the following setting:
- Integration key—Provides access to an integration. Although this access method is still supported, we strongly recommend transitioning to project keys as a security best practice.
All API requests must include your integration key in the Authorization HTTP header, along with your integration ID, as follows:
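The following is a minimal sketch of the request headers, based on the request examples later in this topic. The Authorization header carries the integration key as a bearer token; the x-integration-id header name shown here for the integration ID is a placeholder assumption, because the examples in this topic do not show it.
# Minimal header sketch (see the full request examples later in this topic).
# "x-integration-id" is a placeholder name for the integration ID header.
headers = {
    "Authorization": "Bearer $INTEGRATION_API_KEY",
    "x-integration-id": "$INTEGRATION_ID",
    "Content-Type": "application/json",
}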
Generating a chat completion
You can create a model response for chat conversations.
To generate a chat completion, send a POST request to the generate endpoint, as in the following sketch (based on the full example later in this topic):
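# Sketch based on the full request example later in this topic:
# chat completions are generated by a POST request to the /generate path
# under your integration path.
URL = "$INTEGRATION_PATH/generate"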
Parameter support varies depending on the model used to generate the response.
BMC AMI AI Services provides models that support the following parameters, but locally added models, such as Bring Your Own LLM (BYOLLM), might not support all parameters.
Request body
Name | Type | Default | Optional | Description |
|---|---|---|---|---|
messages | list[ChatMessage] | None | No | Specifies a list of messages that make up the conversation up to this point. |
frequency_penalty | float | 0.0 | Yes | Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood of repeating the same line verbatim. |
logit_bias | dict[str, float] | None | Yes | Modifies the likelihood of specified tokens appearing in the completion. This parameter accepts a JSON object that maps tokens (specified by their token ID in the model tokenizer) to an associated bias value from -100 to 100. You can use a tokenizer tool to convert text to token IDs. Mathematically, the bias is added to the logits generated by the model before sampling. The exact effect varies per model, but values between -1 and 1 should decrease or increase the likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token. For example, you can pass {"50256": -100} to prevent the <|endoftext|> token from being generated. |
logprobs | bool | False | Yes | Specifies whether to return log probabilities of the output tokens. The true value returns the log probabilities of each output token returned in the content of the message parameter. |
top_logprobs | int | None | Yes | Specifies the number of most likely tokens to return at each token position, each with an associated log probability. Valid values are integers from 0 to 5. To use this parameter, you must set the logprobs parameter value to true. |
max_completion_tokens | int | None | Yes | Specifies an upper limit for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens |
n | int | 1 | Yes | Specifies how many completions to generate for each prompt |
presence_penalty | float | 0.0 | Yes | Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood of talking about new topics. |
stop | str or list[str] | [] | Yes | Specifies up to four sequences where the API stops generating further tokens. The returned text will not contain the stop sequence. |
stream | bool | False | Yes | Specifies whether to stream back partial progress. If set, tokens are sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message. For more information about the event stream format for server-sent events, see the Server-Sent Events documentation on the MDN Web Docs. |
stream_options | StreamOptions | None | Yes | Specifies options for the streaming response. Set this parameter only when you set the stream parameter value to true. |
temperature | float | None | Yes | Specifies which sampling temperature to use, between 0 and 2. Higher values, such as 0.8, make the output more random, while lower values, such as 0.2, make it more focused and deterministic. We generally recommend altering this or top_p, but not both. |
top_p | float | None | Yes | Specifies nucleus sampling, an alternative to sampling with temperature, in which the model considers the tokens comprising the top_p probability mass. For example, 0.1 means that only the tokens comprising the top 10 percent probability mass are considered. We generally recommend altering this parameter or the temperature parameter, but not both. |
best_of | int | None | Yes | Generates best_of completions server-side and returns the "best" (the one with the highest log probability per token). Results cannot be streamed. When used with n, best_of controls the number of candidate completions and n specifies how many to return. The best_of value must be greater than or equal to n. |
use_beam_search | bool | False | Yes | Specifies whether to use beam search instead of sampling |
top_k | int | None | Yes | Controls the number of top tokens to consider. Set to -1 to consider all tokens. |
min_p | float | 0.0 | Yes | Specifies a float that represents the minimum probability for a token to be considered relative to the probability of the most likely token. Values must be in [0, 1]. Set to 0 to disable this parameter. |
repetition_penalty | float | None | Yes | Specifies a float that penalizes new tokens based on whether they appear in the prompt and the generated text so far. Values greater than 1 encourage the model to use new tokens, while values less than 1 encourage the model to repeat tokens. |
length_penalty | float | 1.0 | Yes | Specifies a float that penalizes sequences based on their length. This parameter is used in beam search. |
stop_token_ids | list[int] | [] | Yes | Specifies a list of tokens that stop the generation when they are generated. The returned output contains the stop tokens unless the stop tokens are special tokens. |
include_stop_str_in_output | bool | False | Yes | Specifies whether to include the stop strings in output text. |
ignore_eos | bool | False | Yes | Specifies whether to ignore the EOS token and continue generating tokens after the EOS token is generated |
min_tokens | int | 0 | Yes | Specifies the minimum number of tokens to generate per output sequence before EOS or stop_token_ids can be generated. |
skip_special_tokens | bool | True | Yes | Specifies whether to skip special tokens in the output |
spaces_between_special_tokens | bool | True | Yes | Specifies whether to add spaces between special tokens in the output. |
truncate_prompt_tokens | int | None | Yes | If set to an integer k, this parameter uses only the last k tokens from the prompt (that is, left truncation). |
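As a hedged illustration, the following sketch shows a request body that combines the required messages parameter with a few of the optional parameters described in the table above. Which optional parameters are honored depends on the model, as noted earlier.
# Sketch of a request body (parameter support varies by model).
json_data = {
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize what an AI agent is in two sentences."},
    ],
    "temperature": 0.2,           # lower temperature for more deterministic output
    "max_completion_tokens": 200, # upper limit on generated tokens
    "n": 1,                       # number of completions to generate
}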
Response
Name | Type | Optional | Description |
|---|---|---|---|
id | str | No | Unique identifier for the chat completion |
choices | list[Choices] | No | List of chat completion choices. Multiple choices are valid if n is greater than 1. |
created | int | No | UNIX time stamp (in seconds) of when the chat completion was created |
model | str | No | Model used for the chat completion. |
object | str | No | Object type, which is always chat.completion |
usage | Usage | No | Usage statistics for the completion request |
ChatMessage
Name | Type | Optional | Description |
|---|---|---|---|
role | MessageRoleType | No | Specifies the role of the message author (user, system, or assistant) |
content | str | No | Contains the query or input from the user |
MessageRoleType
Role | Value |
|---|---|
USER | user |
SYSTEM | system |
AI | assistant |
StreamOptions
Name | Type | Default | Optional | Description |
|---|---|---|---|---|
include_usage | bool | True | Yes | If set, an additional chunk is streamed before the data: [DONE] message. This chunk's usage field shows token usage statistics for the entire request, while the choices field is always an empty array. All other chunks include a usage field with a null value. |
continuous_usage_stats | bool | False | Yes | If set to true, usage statistics are tracked continuously during the model run. |
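As a sketch only, the stream and stream_options request parameters described above might be combined as follows; the exact streaming behavior depends on the model and deployment.
# Sketch (assumption): enable streaming and request a final usage chunk.
json_data = {
    "messages": [{"role": "user", "content": "Explain streaming in one sentence."}],
    "stream": True,
    "stream_options": {"include_usage": True},
}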
Choices
Name | Type | Optional | Description |
|---|---|---|---|
index | int | No | Index of the choice in the list of choices |
message | list[Message] | No | Chat completion message generated by the model |
logprobs | list[Logprob] | Yes | Log probability information for the choice |
finish_reason | str | No | Reason the model stopped generating tokens (for example, stop in the sample response later in this topic) |
Message
Name | Type | Optional | Description |
|---|---|---|---|
content | str | Yes | Contents of the message |
role | str | No | Role of the author of this message |
Logprob
Name | Type | Optional | Description |
|---|---|---|---|
content | List[LogprobData] | Yes | List of message content tokens with log probability information |
LogprobData
Name | Type | Optional | Description |
|---|---|---|---|
top_logprobs | list[TopLogprob] | No | List of the most likely tokens and their log probability, at this token position. In rare cases, fewer than the number of requested top_logprobs are returned. |
TopLogprob
Name | Type | Optional | Description |
|---|---|---|---|
token | str | No | Token name |
logprob | number | No | Log probability of this token, if it is within the top 20 most likely tokens. Otherwise, the value -9999.0 is used to signify that the token is very unlikely. |
bytes | list | Yes | List of integers representing the UTF-8 bytes representation of the token. This parameter is useful when characters are represented by multiple tokens and their byte representations must be combined to generate the correct text representation. The value can be null if the token has no bytes representation. |
Usage
Name | Type | Optional | Description |
|---|---|---|---|
completion_tokens | int | No | Number of tokens in the generated completion |
prompt_tokens | int | No | Number of tokens in the prompt |
total_tokens | int | No | Total number of tokens used in the request (prompt and completion) |
prompt_tokens_details | PromptTokensDetails | No | Breakdown of tokens used in the prompt |
PromptTokensDetails
Name | Type | Optional | Description |
|---|---|---|---|
cached_tokens | int | No | Cached tokens present in the prompt |
Getting the integration status (health endpoint)
You can use the health endpoint to check the status of the integration. It internally validates whether all dependent services and models are operational and functioning properly.
To check the integration status, send a request to the health endpoint with the API key described in the authentication section, as in the following sketch (based on the full example later in this topic):
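# Sketch based on the full example later in this topic:
# the health check is a GET request to the /health path
# under your integration path.
URL = "$INTEGRATION_PATH/health"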
Using a chat completion endpoint
To run your first API request, use the following Python example, which uses the httpx library. Replace $INTEGRATION_PATH with your integration path and $INTEGRATION_API_KEY with your integration key.
Request
import httpx

# Replace $INTEGRATION_PATH with your integration path
URL = "$INTEGRATION_PATH/generate"
# Replace $INTEGRATION_API_KEY with your integration API key
BEARER_TOKEN = "$INTEGRATION_API_KEY"
# Define the JSON data to be sent in the POST request
json_data = {
"messages": [
{
"role": "system",
"content": "Provide a concise summary of the provided text, limiting it to 100 words.",
},
{
"role": "user",
"content": "Artificial Intelligence (AI) agents are systems designed to perform specific tasks autonomously or with minimal human intervention, using machine learning and reasoning capabilities. These agents are capable of perceiving their environment, making decisions, and taking actions to achieve specific goals. AI agents can be categorized into reactive agents, which respond to their environment without internal models, and deliberative agents, which use internal models and reasoning to plan actions.\r\n\r\nOne of the key components of an AI agent is its ability to sense and act within its environment. This is often facilitated through sensors (which gather information about the world) and actuators (which execute actions based on decisions). The sophistication of an AI agent’s decision-making process can vary; some agents rely on predefined rules and logic, while others use advanced algorithms such as neural networks and reinforcement learning to adapt and optimize their behavior over time.\r\n\r\nAI agents can be applied in various domains, such as robotics, virtual assistants, and autonomous vehicles. In robotics, AI agents allow machines to perform tasks like object manipulation, navigation, and interaction with humans. Virtual assistants, like Siri and Alexa, are also examples of AI agents that help users with daily tasks such as setting reminders or answering questions. Autonomous vehicles, on the other hand, rely on AI agents to interpret sensor data, make real-time decisions, and navigate roads safely without human drivers.\r\n\r\nWhile AI agents offer significant benefits, they also present challenges, especially when it comes to ensuring ethical behavior, decision-making transparency, and avoiding unintended consequences. As AI technology continues to evolve, researchers and engineers are working on improving the capabilities of AI agents while addressing these concerns to ensure that they can be safely integrated into society.",
},
]
}
# Define the headers, including the Bearer token in the Authorization header
headers = {
"Authorization": f"Bearer {BEARER_TOKEN}",
"Content-Type": "application/json",
}
# Send the POST request using httpx
response = httpx.post(URL, json=json_data, headers=headers, timeout=600)
# Check if the request was successful
if response.status_code == 200:
    print(response.json())
Response
This request queries BMC AMI AI Services to summarize the content that you provide. The response should resemble the following:
"id": "chatcmpl-85282769098a4307b75a38f9aa857d1b",
"object": "chat.completion",
"created": 1738664849,
"model": "mistralai/Mixtral-8x7B-Instruct-v0.1",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": " Artificial Intelligence (AI) agents are systems that perform tasks autonomously using machine learning and reasoning. They can be reactive or deliberative, perceiving the environment through sensors and actuators. AI agents can be found in robotics, virtual assistants, and autonomous vehicles. While beneficial, they also pose ethical and safety challenges. As AI technology advances, researchers aim to improve capabilities while addressing these concerns. AI agents make decisions based on predefined rules or advanced algorithms like neural networks and reinforcement learning. They help with tasks in various domains, including object manipulation, navigation, and virtual assistance."
},
"logprobs": null,
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 400,
"total_tokens": 525,
"completion_tokens": 125,
"prompt_tokens_details": {
"cached_tokens": 0
}
}
}
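Assuming the response structure shown above, the generated text can be read from the first choice as follows:
# Extract the generated summary from the first choice of the response above.
completion = response.json()
summary = completion["choices"][0]["message"]["content"]
print(summary)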
Using a health endpoint
To call the health endpoint, use the following example. Replace $INTEGRATION_PATH with your integration path and $INTEGRATION_API_KEY with your integration key. The response is a JSON object.
Request
import httpx

# Replace $INTEGRATION_PATH with your integration path
URL = "$INTEGRATION_PATH/health"
# Replace $INTEGRATION_API_KEY with your integration API key
BEARER_TOKEN = "$INTEGRATION_API_KEY"
# Define the headers, including the Bearer token in the Authorization header
headers = {
"Authorization": f"Bearer {BEARER_TOKEN}",
"Content-Type": "application/json",
}
# Send the GET request using httpx
response = httpx.get(URL, headers=headers, timeout=60)
# Check if the request was successful
if response.status_code == 200:
    print(response.json())
Response
"data": null,
"error": null
}
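Assuming that a healthy integration returns HTTP 200 with a null error field, as in the example above, a simple check might look like the following sketch:
# Sketch (assumption): treat HTTP 200 with a null "error" field as healthy.
health = response.json()
if response.status_code == 200 and health.get("error") is None:
    print("Integration is healthy")
else:
    print("Integration reported a problem:", health.get("error"))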