Key concepts
The BMC AMI Platform leverages Kubernetes and OpenShift to provide resilient workload orchestration, secure service communication, and scalable operations for containerized applications.
Kubernetes and OpenShift
The BMC AMI Platform runs on Kubernetes‑based container orchestration, with Red Hat OpenShift providing the enterprise Kubernetes distribution. Kubernetes provides the core control plane and worker node architecture required to schedule, execute, scale, and self‑heal containerized workloads.

Control‑plane components, including the API server, scheduler, controller manager, and etcd datastore, manage cluster state and orchestration logic, while worker nodes host platform and application pods via the kubelet and container runtime. Cluster networking, implemented through the Container Network Interface (CNI) and kube‑proxy, enables reliable pod‑to‑pod and pod‑to‑service communication. This architecture exposes a consistent API for workload definition, traffic routing, capacity scaling, and automated recovery from node or pod failures.

Red Hat OpenShift extends upstream Kubernetes with enterprise‑focused security, operational, and lifecycle‑management capabilities. It maintains the same highly available control‑plane and worker‑node model while adding hardened security controls, integrated networking and routing, and automation. Key enhancements include Security Context Constraints (SCCs) for enforcing secure pod execution policies, integrated image management where configured, native HTTP/S ingress and routing, and Operator‑based lifecycle management for platform and add‑on services. OpenShift also provides unified operational interfaces through the oc CLI and web console alongside standard Kubernetes tooling.

Within the BMC AMI Platform, services are deployed using standard Kubernetes resources such as Deployments, StatefulSets, Services, and Routes or Ingress. The control plane continuously reconciles the desired state, redistributes workloads, and reschedules failed pods across worker nodes.
OpenShift augments this model with enterprise‑grade guardrails and automation, enabling the BMC AMI Platform to operate securely and reliably at scale in multi‑tenant and regulated environments.
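The continuous-reconciliation behavior described above can be illustrated with a minimal sketch. This is illustrative pseudologic only, not platform or controller code; the function and pod names are hypothetical.

```python
# Minimal sketch of a Kubernetes-style reconciliation loop (illustrative only;
# function and pod names are hypothetical, not BMC AMI Platform code).

def reconcile(desired_replicas: int, observed_pods: list) -> list:
    """Drive observed state toward desired state, as a controller would."""
    pods = list(observed_pods)
    # Scale up: schedule replacement pods for any shortfall
    # (for example, after a pod or node failure).
    while len(pods) < desired_replicas:
        pods.append("pod-new-%d" % len(pods))
    # Scale down: remove surplus pods beyond the declared replica count.
    while len(pods) > desired_replicas:
        pods.pop()
    return pods

# A failed pod disappears from observed state; the loop restores the count.
print(reconcile(3, ["pod-a", "pod-c"]))
```

Real controllers run this loop continuously against the API server's view of cluster state, which is why a deleted or failed pod is replaced without operator intervention.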
Core objectives
- Standardize how services run across environments.
- Scale quickly while keeping deployments reproducible.
- Enable controlled rollouts and fast recovery.
Key primitives
- Deployment: declares the desired app version, pod count, and rollback strategy.
- ReplicaSet: guarantees the Deployment keeps the correct number of identical pods running.
- Pod: wraps one or more containers with shared networking and storage.
- Service: exposes pods through a stable DNS/IP address with built-in load balancing.
- ConfigMap / secret: externalize configuration and sensitive values.
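The primitives above come together in a resource manifest. Kubernetes accepts manifests as JSON as well as YAML; the sketch below builds a minimal Deployment object that ties a pod template, replica count, and ConfigMap reference together. The service name, namespace, and image are placeholders, not actual BMC AMI Platform resources.

```python
import json

# Hypothetical minimal Deployment for illustration; the name, namespace,
# and image are placeholders, not actual BMC AMI Platform resources.
deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "example-service", "namespace": "core"},
    "spec": {
        "replicas": 3,  # the ReplicaSet keeps exactly this many identical pods
        "selector": {"matchLabels": {"app": "example-service"}},
        "template": {
            "metadata": {"labels": {"app": "example-service"}},
            "spec": {
                "containers": [{
                    "name": "example-service",
                    "image": "registry.example.com/example-service:1.0",
                    # ConfigMap keys become environment variables in the pod.
                    "envFrom": [{"configMapRef": {"name": "example-config"}}],
                }]
            },
        },
    },
}

print(json.dumps(deployment, indent=2))
```

A Service would then select these pods by the `app: example-service` label to give them a stable DNS name and load-balanced address.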
Cluster strategy
- Control vs. Compute nodes: Control-plane nodes run etcd, the API server, scheduler, and controllers; worker nodes run business workloads managed by kubelet.
- Namespace isolation: group workloads by Service group (e.g., AI, Core, DataService, and observability) to simplify quota and security policies.
- Scheduling and availability: rely on node labels/taints and Pod anti-affinity to spread replicas, with Horizontal Pod Autoscaler for demand spikes and PodDisruptionBudgets to protect availability.
- Cluster health: monitor control-plane latency, node readiness, resource saturation (CPU/memory/disk), and kube-system pods to catch degradation early.
- Takeaway: Kubernetes provides consistent packaging and automated deployment. Our cluster strategy keeps workloads isolated yet portable, ensures rollouts are safe, and lets us scale and recover quickly.
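The Horizontal Pod Autoscaler mentioned above scales on a documented formula: desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue), clamped to the configured minimum and maximum. A sketch of that calculation (the bound defaults here are illustrative, not platform settings):

```python
import math

def hpa_desired_replicas(current_replicas, current_metric, target_metric,
                         min_replicas=1, max_replicas=10):
    """Core Horizontal Pod Autoscaler formula:
    desired = ceil(current * currentMetric / targetMetric),
    clamped to [min_replicas, max_replicas].
    The bound defaults are illustrative, not platform settings."""
    desired = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, desired))

# CPU at 90% against a 60% target: 3 replicas -> ceil(3 * 90 / 60) = 5
print(hpa_desired_replicas(3, 90.0, 60.0))  # 5
```

PodDisruptionBudgets then cap how many of those replicas may be taken down at once during voluntary disruptions such as node drains.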
Large Language Models (LLMs)
BMC AMI Platform currently supports the following large language models (LLMs):
- Mixtral 8x7B
- Llama 3.1 8B
- Granite 3.1 8B
This topic describes the key capabilities and hardware support of these LLMs. It also describes the test-evaluation framework we used to select these LLMs.
Mixtral 8x7B
Mixtral 8x7B is a state-of-the-art, instruction-tuned LLM developed by Mistral AI. The deployed model is quantized to 3-bit precision to reduce hardware requirements and improve computational efficiency. It employs a Sparse Mixture of Experts (SMoE) architecture, allowing for high performance with a relatively compact model size. It is trained on diverse data, providing a broad understanding while delivering compact yet accurate responses across various tasks.
Key capabilities
- High performance: Outperforms many other open-source LLMs across various benchmarks
- Efficiency: The SMoE architecture enables faster inference, making it more cost-effective
- Context length: Supports up to 32,768 tokens, allowing for more complex and detailed interactions
- Language support: Supports multiple languages, including English, French, Italian, German, and Spanish
- Code generation: Excels in code generation tasks
- Instruction following: Can be fine-tuned to follow instructions effectively
Benefits of the SMoE architecture
- Increased capacity: Allows for a larger model size without sacrificing efficiency
- Improved performance: Boosts performance on various tasks
- Reduced cost: Uses only a subset of the model’s parameters for each input, reducing computational costs
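The cost reduction comes from routing: for each token, a small gating network selects only a few of the eight experts to run. The sketch below shows simplified top-2 gating; it is illustrative only and is not Mixtral's actual routing implementation.

```python
import math

def top2_gate(logits):
    """Pick the two highest-scoring experts and softmax-normalize their
    weights. Only these experts run for this token, so per-token compute
    scales with 2 experts rather than all 8 (simplified illustration;
    not Mixtral's actual routing code)."""
    top2 = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:2]
    exps = [math.exp(logits[i]) for i in top2]
    total = sum(exps)
    return {i: e / total for i, e in zip(top2, exps)}

# Eight router scores, one per expert; only experts 2 and 5 are activated.
weights = top2_gate([0.1, -0.3, 2.0, 0.5, -1.0, 1.5, 0.0, 0.2])
print(weights)
```

The selected experts' outputs are then combined using these normalized weights, so total parameters stay large while per-token compute stays small.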
Hardware support
GPU support: Optimized for running efficiently on GPU clusters, leveraging its architecture for faster and more resource-efficient inference.
Llama 3.1 8B
Meta Llama 3.1 8B is an eight-billion-parameter model that is pre-trained and instruction-tuned for dialogue and question-answering use cases. It outperforms many open-source chat models on industry benchmarks, and Llama 3.1 was developed to optimize helpfulness and safety.
Key capabilities
- Auto-regressive model: Uses an optimized transformer architecture for efficient text generation
- Instruction-tuned: Models are fine-tuned via Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) to align with human preferences for helpfulness and safety
- Context length: Handles up to 128k tokens, allowing for detailed interactions
- Benchmark performance: Excels in dialogue and assistant-like chat tasks, outperforming other open-source models
- Language support: Primarily optimized for English, but can be adapted for a variety of natural language generation tasks
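Auto-regressive generation, noted above, means each predicted token is fed back as input for the next prediction. A toy greedy-decoding sketch follows; the scoring table and tokens are made up purely for illustration and stand in for the actual model.

```python
def generate(prompt, score, max_new_tokens=5, eos="<eos>"):
    """Greedy auto-regressive decoding: at each step, append the single
    highest-scoring next token given everything generated so far."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        candidates = score(tokens)          # token -> score, given the context
        next_token = max(candidates, key=candidates.get)
        tokens.append(next_token)           # feed the prediction back in
        if next_token == eos:
            break
    return tokens

# Toy scorer: a hard-coded bigram table standing in for the real model.
table = {"hello": {"world": 0.9, "<eos>": 0.1},
         "world": {"<eos>": 0.8, "hello": 0.2}}
toy_score = lambda ts: table.get(ts[-1], {"<eos>": 1.0})
print(generate(["hello"], toy_score))  # ['hello', 'world', '<eos>']
```

Real models replace the toy table with a transformer that scores the entire vocabulary at each step, and typically sample rather than always taking the maximum.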
Hardware support
GPU support: Optimized for GPUs, enabling faster training and inference.
Granite 3.1 8B
Granite 3.1 8B Instruct is a powerful instruction-tuned LLM designed for a wide range of natural language tasks. It is built on the foundation of the Granite 3.1 base model, a state-of-the-art LLM that has been pre-trained on a massive dataset of text and code.
The Granite 3.1 8B Instruct model has been further fine-tuned on a diverse set of instruction data to improve its ability to follow instructions, write various types of creative content, and answer questions informatively.
Key capabilities
- Instruction following: The model is highly adept at understanding and following instructions, making it suitable for a wide range of tasks, from simple question answering to complex content generation.
- High performance: The model delivers state-of-the-art performance on a variety of benchmarks, including question answering, text generation, and translation.
- Text generation: The model can generate high-quality text, including stories, poems, articles, and code.
- Translation: The model can translate text between multiple languages.
- Versatility: The model can be used for a wide range of tasks, making it a valuable tool for businesses and researchers.
- Ease of use: The model is easy to use and can be accessed through a variety of APIs.
Hardware support
GPU support: The model is optimized for running on GPUs, which can significantly accelerate its performance.
Test evaluation framework
We tested multiple LLMs and selected models producing higher-quality outcomes based on evaluations from subject-matter experts.
After selection, we fine-tuned inference hyperparameters and applied effective prompt engineering to further improve output quality.
All tests were evaluated automatically by an in-house test-evaluation framework to benchmark and maintain result quality. Each response was assessed against the following groups of metrics:
- Similarity metrics
- Generation metrics
- End-to-end metrics
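As an illustration of the similarity-metric category, the sketch below computes a token-overlap F1 score between a model response and a reference answer. This is a common, generic text-similarity measure; it is not necessarily the exact metric used in the in-house framework.

```python
from collections import Counter

def token_f1(candidate, reference):
    """Token-overlap F1: harmonic mean of precision and recall over shared
    tokens (a generic similarity measure; the in-house metrics may differ)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

# 3 of 4 candidate tokens match 3 of 4 reference tokens -> F1 = 0.75
print(token_f1("the job completed successfully", "the batch job completed"))
```

Generation metrics typically score fluency and faithfulness of the text itself, while end-to-end metrics score the full pipeline's output against the expected final result.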