Bringing your own integrations
Using API keys for authentication
The BMC AMI Platform API uses API keys for authentication. You can download API keys for an integration at the account level.
You can assign each API key the following setting:
- Integration key—Provides access to an integration. Although this access method is still supported, we strongly recommend transitioning to project keys as a security best practice.
All API requests must include your integration key in the Authorization HTTP header, along with your integration ID, as follows:
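The following is a minimal sketch of the request headers, based on the request examples later in this topic. The Authorization header carries the integration key as a bearer token; the x-integration-id header name shown here for the integration ID is a placeholder assumption, because the examples in this topic do not show it.
# Minimal header sketch (see the full request examples later in this topic).
# "x-integration-id" is a placeholder name for the integration ID header.
headers = {
    "Authorization": "Bearer $INTEGRATION_API_KEY",
    "x-integration-id": "$INTEGRATION_ID",
    "Content-Type": "application/json",
}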
Generating a chat completion
You can create a model response for chat conversations.
To generate a chat completion, send a POST request to the generate endpoint, as in the following sketch (based on the full example later in this topic):
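# Sketch based on the full request example later in this topic:
# chat completions are generated by a POST request to the /generate path
# under your integration path.
URL = "$INTEGRATION_PATH/generate"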
Parameter support varies depending on the model used to generate the response.
BMC AMI AI Services provides models that support the following parameters, but locally added models, such as Bring Your Own LLM (BYOLLM), might not support all parameters.
Request body
Name | Type | Default | Optional | Description |
|---|---|---|---|---|
messages | list[ChatMessage] | None | No | Specifies a list of messages that make up the conversation up to this point. |
frequency_penalty | float | 0.0 | Yes | Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood of repeating the same line verbatim. |
logit_bias | dict[str, float] | None | Yes | Modifies the likelihood of specified tokens appearing in the completion. This parameter accepts a JSON object that maps tokens (specified by their token ID in the model tokenizer) to an associated bias value from -100 to 100. You can use a tokenizer tool to convert text to token IDs. Mathematically, the bias is added to the logits generated by the model before sampling. The exact effect varies per model, but values between -1 and 1 should decrease or increase the likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token. For example, you can pass {"50256": -100} to prevent the <|endoftext|> token from being generated. |
logprobs | bool | False | Yes | Specifies whether to return log probabilities of the output tokens. The true value returns the log probabilities of each output token returned in the content of the message parameter. |
top_logprobs | int | None | Yes | Specifies the number of most likely tokens to return at each token position, each with an associated log probability. Valid values are integers from 0 to 5. To use this parameter, you must set the logprobs parameter value to true. |
max_completion_tokens | int | None | Yes | Specifies an upper limit for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens |
n | int | 1 | Yes | Specifies how many completions to generate for each prompt |
presence_penalty | float | 0.0 | Yes | Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood of talking about new topics. |
stop | str or list[str] | [] | Yes | Specifies up to four sequences where the API stops generating further tokens. The returned text will not contain the stop sequence. |
stream | bool | False | Yes | Specifies whether to stream back partial progress. If set, tokens are sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message. For more information about the event stream format for server-sent events, see the Server-Sent Events documentation on the MDN Web Docs. |
stream_options | StreamOptions | None | Yes | Specifies options for the streaming response. Set this parameter only when you set the stream parameter value to true. |
temperature | float | None | Yes | Specifies which sampling temperature to use, between 0 and 2. Higher values, such as 0.8, make the output more random, while lower values, such as 0.2, make it more focused and deterministic. We generally recommend altering this or top_p, but not both. |
top_p | float | None | Yes | Specifies nucleus sampling, an alternative to sampling with temperature, in which the model considers the tokens comprising the top_p probability mass. For example, 0.1 means that only the tokens comprising the top 10 percent probability mass are considered. We generally recommend altering this parameter or the temperature parameter, but not both. |
best_of | int | None | Yes | Generates best_of completions server-side and returns the "best" (the one with the highest log probability per token). Results cannot be streamed. When used with n, best_of controls the number of candidate completions and n specifies how many to return. The best_of value must be greater than or equal to n. |
use_beam_search | bool | False | Yes | Specifies whether to use beam search instead of sampling |
top_k | int | None | Yes | Controls the number of top tokens to consider. Set to -1 to consider all tokens. |
min_p | float | 0.0 | Yes | Specifies a float that represents the minimum probability for a token to be considered relative to the probability of the most likely token. Values must be in [0, 1]. Set to 0 to disable this parameter. |
repetition_penalty | float | None | Yes | Specifies a float that penalizes new tokens based on whether they appear in the prompt and the generated text so far. Values greater than 1 encourage the model to use new tokens, while values less than 1 encourage the model to repeat tokens. |
length_penalty | float | 1.0 | Yes | Specifies a float that penalizes sequences based on their length. This parameter is used in beam search. |
stop_token_ids | list[int] | [] | Yes | Specifies a list of tokens that stop the generation when they are generated. The returned output contains the stop tokens unless the stop tokens are special tokens. |
include_stop_str_in_output | bool | False | Yes | Specifies whether to include the stop strings in output text. |
ignore_eos | bool | False | Yes | Specifies whether to ignore the EOS token and continue generating tokens after the EOS token is generated |
min_tokens | int | 0 | Yes | Specifies the minimum number of tokens to generate per output sequence before EOS or stop_token_ids can be generated. |
skip_special_tokens | bool | True | Yes | Specifies whether to skip special tokens in the output |
spaces_between_special_tokens | bool | True | Yes | Specifies whether to add spaces between special tokens in the output. |
truncate_prompt_tokens | int | None | Yes | If set to an integer k, this parameter uses only the last k tokens from the prompt (that is, left truncation). |
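As a hedged illustration, the following sketch shows a request body that combines the required messages parameter with a few of the optional parameters described in the table above. Which optional parameters are honored depends on the model, as noted earlier.
# Sketch of a request body (parameter support varies by model).
json_data = {
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize what an AI agent is in two sentences."},
    ],
    "temperature": 0.2,           # lower temperature for more deterministic output
    "max_completion_tokens": 200, # upper limit on generated tokens
    "n": 1,                       # number of completions to generate
}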
Response
Name | Type | Optional | Description |
|---|---|---|---|
id | str | No | Unique identifier for the chat completion |
choices | list[Choices] | No | List of chat completion choices. Multiple choices are valid if n is greater than 1. |
created | int | No | UNIX time stamp (in seconds) of when the chat completion was created |
model | str | No | Model used for the chat completion. |
object | str | No | Object type, which is always chat.completion |
usage | Usage | No | Usage statistics for the completion request |
ChatMessage
Name | Type | Optional | Description |
|---|---|---|---|
role | MessageRoleType | No | Specifies the role of the message author (user, system, or assistant) |
content | str | No | Contains the query or input from the user |
MessageRoleType
Role | Value |
|---|---|
USER | user |
SYSTEM | system |
AI | assistant |
StreamOptions
Name | Type | Default | Optional | Description |
|---|---|---|---|---|
include_usage | bool | True | Yes | If set, an additional chunk is streamed before the data: [DONE] message. This chunk's usage field shows token usage statistics for the entire request, while the choices field is always an empty array. All other chunks include a usage field with a null value. |
continuous_usage_stats | bool | False | Yes | If set to true, usage statistics are tracked continuously during the model run. |
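As a sketch only, the stream and stream_options request parameters described above might be combined as follows; the exact streaming behavior depends on the model and deployment.
# Sketch (assumption): enable streaming and request a final usage chunk.
json_data = {
    "messages": [{"role": "user", "content": "Explain streaming in one sentence."}],
    "stream": True,
    "stream_options": {"include_usage": True},
}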
Choices
Name | Type | Optional | Description |
|---|---|---|---|
index | int | No | Index of the choice in the list of choices |
message | list[Message] | No | Chat completion message generated by the model |
logprobs | list[Logprob] | Yes | Log probability information for the choice |
finish_reason | str | No | Reason the model stopped generating tokens (for example, stop in the sample response later in this topic) |
Message
Name | Type | Optional | Description |
|---|---|---|---|
content | str | Yes | Contents of the message |
role | str | No | Role of the author of this message |
Logprob
Name | Type | Optional | Description |
|---|---|---|---|
content | List[LogprobData] | Yes | List of message content tokens with log probability information |
LogprobData
Name | Type | Optional | Description |
|---|---|---|---|
top_logprobs | list[TopLogprob] | No | List of the most likely tokens and their log probability, at this token position. In rare cases, fewer than the number of requested top_logprobs are returned. |
TopLogprob
Name | Type | Optional | Description |
|---|---|---|---|
token | str | No | Token name |
logprob | number | No | Log probability of this token, if it is within the top 20 most likely tokens. Otherwise, the value -9999.0 is used to signify that the token is very unlikely. |
bytes | list | Yes | List of integers representing the UTF-8 bytes representation of the token. This parameter is useful when characters are represented by multiple tokens and their byte representations must be combined to generate the correct text representation. The value can be null if the token has no bytes representation. |
Usage
Name | Type | Optional | Description |
|---|---|---|---|
completion_tokens | int | No | Number of tokens in the generated completion |
prompt_tokens | int | No | Number of tokens in the prompt |
total_tokens | int | No | Total number of tokens used in the request (prompt and completion) |
prompt_tokens_details | PromptTokensDetails | No | Breakdown of tokens used in the prompt |
PromptTokensDetails
Name | Type | Optional | Description |
|---|---|---|---|
cached_tokens | int | No | Cached tokens present in the prompt |
Getting the integration status (health endpoint)
You can use the health endpoint to check the status of the integration. It internally validates whether all dependent services and models are operational and functioning properly.
To check the integration status, send a request to the health endpoint with the API key described in the authentication section, as in the following sketch (based on the full example later in this topic):
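# Sketch based on the full example later in this topic:
# the health check is a GET request to the /health path
# under your integration path.
URL = "$INTEGRATION_PATH/health"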
Using a chat completion endpoint
To run your first API request, use the following Python example, which uses the httpx library. Replace $INTEGRATION_PATH with your integration path and $INTEGRATION_API_KEY with your integration key.
Request
import httpx

# Replace $INTEGRATION_PATH with your integration path
URL = "$INTEGRATION_PATH/generate"
# Replace $INTEGRATION_API_KEY with your integration API key
BEARER_TOKEN = "$INTEGRATION_API_KEY"
# Define the JSON data to be sent in the POST request
json_data = {
"messages": [
{
"role": "system",
"content": "Provide a concise summary of the provided text, limiting it to 100 words.",
},
{
"role": "user",
"content": "Artificial Intelligence (AI) agents are systems designed to perform specific tasks autonomously or with minimal human intervention, using machine learning and reasoning capabilities. These agents are capable of perceiving their environment, making decisions, and taking actions to achieve specific goals. AI agents can be categorized into reactive agents, which respond to their environment without internal models, and deliberative agents, which use internal models and reasoning to plan actions.\r\n\r\nOne of the key components of an AI agent is its ability to sense and act within its environment. This is often facilitated through sensors (which gather information about the world) and actuators (which execute actions based on decisions). The sophistication of an AI agent’s decision-making process can vary; some agents rely on predefined rules and logic, while others use advanced algorithms such as neural networks and reinforcement learning to adapt and optimize their behavior over time.\r\n\r\nAI agents can be applied in various domains, such as robotics, virtual assistants, and autonomous vehicles. In robotics, AI agents allow machines to perform tasks like object manipulation, navigation, and interaction with humans. Virtual assistants, like Siri and Alexa, are also examples of AI agents that help users with daily tasks such as setting reminders or answering questions. Autonomous vehicles, on the other hand, rely on AI agents to interpret sensor data, make real-time decisions, and navigate roads safely without human drivers.\r\n\r\nWhile AI agents offer significant benefits, they also present challenges, especially when it comes to ensuring ethical behavior, decision-making transparency, and avoiding unintended consequences. As AI technology continues to evolve, researchers and engineers are working on improving the capabilities of AI agents while addressing these concerns to ensure that they can be safely integrated into society.",
},
]
}
# Define the headers, including the Bearer token in the Authorization header
headers = {
"Authorization": f"Bearer {BEARER_TOKEN}",
"Content-Type": "application/json",
}
# Send the POST request using httpx
response = httpx.post(URL, json=json_data, headers=headers, timeout=600)
# Check if the request was successful
if response.status_code == 200:
    print(response.json())
Response
This request queries BMC AMI AI Services to summarize the content that you provide. The response should resemble the following:
"id": "chatcmpl-85282769098a4307b75a38f9aa857d1b",
"object": "chat.completion",
"created": 1738664849,
"model": "mistralai/Mixtral-8x7B-Instruct-v0.1",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": " Artificial Intelligence (AI) agents are systems that perform tasks autonomously using machine learning and reasoning. They can be reactive or deliberative, perceiving the environment through sensors and actuators. AI agents can be found in robotics, virtual assistants, and autonomous vehicles. While beneficial, they also pose ethical and safety challenges. As AI technology advances, researchers aim to improve capabilities while addressing these concerns. AI agents make decisions based on predefined rules or advanced algorithms like neural networks and reinforcement learning. They help with tasks in various domains, including object manipulation, navigation, and virtual assistance."
},
"logprobs": null,
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 400,
"total_tokens": 525,
"completion_tokens": 125,
"prompt_tokens_details": {
"cached_tokens": 0
}
}
}
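Assuming the response structure shown above, the generated text can be read from the first choice as follows:
# Extract the generated summary from the first choice of the response above.
completion = response.json()
summary = completion["choices"][0]["message"]["content"]
print(summary)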
Using a health endpoint
To call the health endpoint, use the following example. Replace $INTEGRATION_PATH with your integration path and $INTEGRATION_API_KEY with your integration key. The response is a JSON object.
Request
import httpx

# Replace $INTEGRATION_PATH with your integration path
URL = "$INTEGRATION_PATH/health"
# Replace $INTEGRATION_API_KEY with your integration API key
BEARER_TOKEN = "$INTEGRATION_API_KEY"
# Define the headers, including the Bearer token in the Authorization header
headers = {
"Authorization": f"Bearer {BEARER_TOKEN}",
"Content-Type": "application/json",
}
# Send the GET request using httpx
response = httpx.get(URL, headers=headers, timeout=60)
# Check if the request was successful
if response.status_code == 200:
    print(response.json())
Response
"data": null,
"error": null
}
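Assuming that a healthy integration returns HTTP 200 with a null error field, as in the example above, a simple check might look like the following sketch:
# Sketch (assumption): treat HTTP 200 with a null "error" field as healthy.
health = response.json()
if response.status_code == 200 and health.get("error") is None:
    print("Integration is healthy")
else:
    print("Integration reported a problem:", health.get("error"))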