Key concepts
Mixtral 8x7B
Mixtral 8x7B is a state-of-the-art, instruction-tuned LLM developed by Mistral AI. It is quantized to 3-bit to reduce hardware requirements and improve computational efficiency. It employs a Sparse Mixture of Experts (SMoE) architecture, which delivers high performance while activating only a fraction of the model's parameters for each token. Trained on diverse data, it provides broad understanding and delivers compact yet accurate responses across a variety of tasks.
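As a concrete illustration, the following is a minimal sketch of loading a 3-bit quantized Mixtral checkpoint with llama-cpp-python. The GGUF file name and the Q3_K_M quantization level are assumptions for illustration, not properties of the model itself; substitute the checkpoint used in your deployment.

```python
# Minimal sketch: serving a 3-bit quantized Mixtral 8x7B Instruct checkpoint
# with llama-cpp-python. The file name and Q3_K_M level are assumptions.
from llama_cpp import Llama

llm = Llama(
    model_path="mixtral-8x7b-instruct-v0.1.Q3_K_M.gguf",  # hypothetical local path
    n_ctx=32768,       # Mixtral supports a 32,768-token context window
    n_gpu_layers=-1,   # offload all layers to the GPU if one is available
)

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize the SMoE architecture in two sentences."}]
)
print(response["choices"][0]["message"]["content"])
```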
Key capabilities
- High performance: Outperforms many other open-source LLMs across various benchmarks
- Efficiency: The SMoE architecture enables faster inference, making it more cost-effective
- Context length: Supports up to 32,768 tokens, allowing for more complex and detailed interactions
- Language support: Supports multiple languages, including English, French, Italian, German, and Spanish
- Code generation: Excels in code generation tasks
- Instruction following: Can be fine-tuned to follow instructions effectively
Benefits of the SMoE architecture
- Increased capacity: Allows for a larger model size without sacrificing efficiency
- Improved performance: Boosts performance on various tasks
- Reduced cost: Uses only a subset of the model’s parameters for each input, reducing computational costs (see the routing sketch after this list)
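To make the routing idea concrete, here is a minimal, self-contained sketch of top-2 expert gating in PyTorch. The layer sizes and expert count are illustrative and do not match Mixtral's actual configuration.

```python
# Minimal sketch of top-2 expert routing, the core idea behind SMoE: a gating
# network scores all experts per token, but only the two highest-scoring
# experts actually run, so most parameters stay idle for any given input.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoE(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=8):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                        # x: (tokens, d_model)
        scores = self.gate(x)                    # (tokens, n_experts)
        weights, idx = scores.topk(2, dim=-1)    # keep the 2 best experts per token
        weights = F.softmax(weights, dim=-1)     # renormalize over the selected pair
        out = torch.zeros_like(x)
        for k in range(2):                       # run only the selected experts
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out

moe = Top2MoE()
print(moe(torch.randn(4, 512)).shape)  # torch.Size([4, 512])
```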
Hardware support
- GPU support: Optimized for running efficiently on GPU clusters, leveraging its architecture for faster and more resource-efficient inference.
Llama 3 8B
Meta Llama 3 8B is an eight-billion-parameter model that is pre-trained, instruction-tuned, and optimized for dialogue and query-answer use cases. It outperforms many open-source chat models on industry benchmarks, and Llama 3 was developed to optimize helpfulness and safety.
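As an illustration of dialogue use, the sketch below runs the instruct model through the Hugging Face transformers chat pipeline. The model ID, dtype, and generation settings are assumptions about a typical setup.

```python
# Minimal sketch of dialogue inference with Llama 3 8B Instruct via Hugging Face
# transformers. The model ID and generation settings are illustrative assumptions.
import torch
from transformers import pipeline

chat = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a concise, helpful assistant."},
    {"role": "user", "content": "Explain RLHF in one sentence."},
]
out = chat(messages, max_new_tokens=128)
print(out[0]["generated_text"][-1]["content"])  # the assistant's reply
```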
Key capabilities
- Auto-Regressive model: Uses an optimized transformer architecture for efficient text generation
- Instruction-tuned: Models are fine-tuned via Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) to align with human preferences for helpfulness and safety
- Context length: Handles up to 8,192 tokens, allowing for detailed interactions
- Benchmark performance: Excels in dialogue and assistant-like chat tasks, outperforming other open-source models
- Language support: Primarily optimized for English, but can be adapted for a variety of natural language generation tasks
Hardware support
- GPU support: Optimized for GPU, allowing faster training and inference.
- CPU compatibility: Llama 3’s quantized version can run efficiently on CPU machines, making it a cost-effective solution for deployments on low-cost hardware configurations (see the sketch below).
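The following minimal sketch shows CPU-only inference with a quantized Llama 3 checkpoint via llama-cpp-python. The GGUF file name, the Q4_K_M quantization level, and the thread count are assumptions to adapt to the target machine.

```python
# Minimal sketch of CPU-only inference with a quantized Llama 3 8B checkpoint
# via llama-cpp-python. File name and quantization level are assumptions.
from llama_cpp import Llama

llm = Llama(
    model_path="meta-llama-3-8b-instruct.Q4_K_M.gguf",  # hypothetical local path
    n_ctx=8192,       # Llama 3's full 8,192-token context window
    n_gpu_layers=0,   # keep every layer on the CPU
    n_threads=8,      # tune to the number of physical cores available
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Give one benefit of CPU deployment."}],
    max_tokens=64,
)
print(out["choices"][0]["message"]["content"])
```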
Granite 3 8B
Granite 3 8B Instruct is a powerful instruction-tuned LLM designed for a wide range of natural language tasks. It is built on the foundation of the Granite 3 base model, a state-of-the-art LLM that has been pre-trained on a massive dataset of text and code.
The Granite 3 8B Instruct model has been further fine-tuned on a diverse set of instruction data to improve its ability to follow instructions, generate different kinds of creative content, and answer questions informatively.
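As a brief illustration of instruction following, the sketch below queries Granite 3 8B Instruct through the Hugging Face transformers API. The checkpoint ID reflects the public Hugging Face name and, along with the generation settings, is an assumption about a typical setup.

```python
# Minimal sketch of instruction following with Granite 3 8B Instruct through
# the Hugging Face transformers API. Model ID and settings are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-granite/granite-3.0-8b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Translate 'good morning' into French and Spanish."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=64)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0, inputs.shape[-1]:], skip_special_tokens=True))
```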
Key capabilities
- Instruction following: The model is highly adept at understanding and following instructions, making it suitable for a wide range of tasks, from simple question answering to complex content generation.
- High performance: The model delivers state-of-the-art performance on a variety of benchmarks spanning question answering, text generation, and translation.
- Text generation: The model can generate high-quality text, including stories, poems, articles, and code.
- Translation: The model can translate text between multiple languages.
- Versatility: The model can be used for a wide range of tasks, making it a valuable tool for businesses and researchers.
- Ease of use: The model is easy to use and can be accessed through a variety of APIs.
Hardware support
- GPU support: The model is optimized for running on GPUs, which can significantly accelerate its performance.
Test evaluation framework
We tested multiple LLMs and selected the models that produced higher-quality outcomes, based on evaluations from subject-matter experts.
Following the selection, we fine-tuned inference hyperparameters and applied effective prompt engineering to further improve output quality.
All tests were evaluated automatically with an in-house test evaluation framework to maintain and benchmark result quality. Each response was assessed against the following groups of metrics (a sample similarity check follows the list):
- Similarity metrics
- Generation metrics
- End-to-end metrics
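As a sample of what a similarity check can look like, the sketch below scores a generated answer against a reference answer with embedding cosine similarity. The embedding model and the acceptance threshold are illustrative assumptions, not the in-house framework itself.

```python
# Minimal sketch of one similarity metric: cosine similarity between embeddings
# of a generated answer and a reference answer. The embedding model and the 0.8
# threshold are illustrative assumptions.
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")

reference = "Mixtral routes each token through two of eight experts."
generated = "Every token in Mixtral is processed by two out of its eight experts."

ref_vec, gen_vec = embedder.encode([reference, generated], convert_to_tensor=True)
score = util.cos_sim(ref_vec, gen_vec).item()

print(f"similarity: {score:.3f}")
print("pass" if score >= 0.8 else "fail")  # hypothetical quality threshold
```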