Key concepts


Before you start interacting with LLMs, you should understand the following key concepts:

  • The LLMs provided and their capabilities
  • The difference between those LLMs
  • Hardware support
  • How we selected the provided LLMs (test evaluation framework)
  • Key capabilities

LLMs provided and their capabilities

The BMC AMI AI Services product currently supports the following Large Language Models (LLMs) on the BMC AMI Platform:

  • Mixtral 8x7B
  • Llama 3 8B 

Mixtral 8x7B

Mixtral 8x7B is a state-of-the-art, instruction-tuned LLM developed by Mistral AI. On the BMC AMI Platform, it is quantized to 3-bit to reduce hardware requirements and improve computational efficiency. It employs a Sparse Mixture of Experts (SMoE) architecture, which activates only a subset of its parameters for each token, delivering high performance at a relatively low inference cost. It is trained on diverse data, providing broad understanding and compact yet accurate responses across a variety of tasks.

Key capabilities

  • High performance: Outperforms many other open-source LLMs across various benchmarks
  • Efficiency: The SMoE architecture enables faster inference, making it more cost-effective
  • Context length: Supports up to 32,768 tokens, allowing for more complex and detailed interactions
  • Language support: Supports multiple languages, including English, French, Italian, German, and Spanish
  • Code generation: Excels in code generation tasks
  • Instruction following: Can be fine-tuned to follow instructions effectively

Benefits of the SMoE architecture

  • Increased capacity: Allows for a larger model size without sacrificing efficiency
  • Improved performance: Boosts performance on various tasks
  • Reduced cost: Uses only a subset of the model’s parameters for each input, reducing computational costs
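The routing idea behind SMoE can be sketched in a few lines of Python. This is an illustrative toy, not Mixtral's actual implementation: the expert and router functions below are stand-ins, and real MoE layers operate on vectors inside a transformer.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_moe_layer(token, experts, router, top_k=2):
    """Route a token to the top_k highest-scoring experts and combine
    their outputs, weighted by renormalized router scores. Only top_k
    experts run per token, which is why SMoE inference is cheaper than
    a dense model with the same total parameter count."""
    scores = softmax([r(token) for r in router])
    ranked = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    norm = sum(scores[i] for i in ranked)
    return sum(experts[i](token) * (scores[i] / norm) for i in ranked)

# Toy setup: 8 experts, mirroring Mixtral's 8 experts with 2 active per token
experts = [lambda x, k=k: x * (k + 1) for k in range(8)]
router = [lambda x, k=k: (x * k) % 5 for k in range(8)]
out = sparse_moe_layer(3.0, experts, router, top_k=2)
```

The key property is visible in the last line: although 8 experts exist, only 2 are evaluated for the token, so compute per token stays close to that of a much smaller dense model.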

Hardware support

  • GPU support: Optimized for running efficiently on GPU clusters, leveraging its architecture for faster and more resource-efficient inference

Llama 3 8B

Meta Llama 3 8B is an eight-billion-parameter, instruction-tuned model optimized for question-answering tasks. It is available in a 4-bit quantized version on the BMC AMI AI Services platform, making it suitable for cost-efficient deployment on CPU-based machines. Built on a transformer architecture with attention mechanisms, the model excels in domain-specific tasks, providing coherent and well-structured responses.

This model is pre-trained, instruction-tuned, and optimized for dialogue use cases, and it outperforms many open-source chat models on industry benchmarks. Llama 3 was developed to optimize helpfulness and safety.

Key capabilities

  • Auto-Regressive model: Uses an optimized transformer architecture for efficient text generation
  • Instruction-tuned: Fine-tuned by using Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) to align with human preferences for helpfulness and safety
  • Context length: Handles up to 8192 tokens, allowing for detailed interactions
  • Benchmark performance: Excels in dialogue and assistant-like chat tasks, outperforming other open-source models
  • Language support: Primarily optimized for English, but can be adapted for a variety of natural language generation tasks

Hardware support

  • GPU support: Optimized for GPU, allowing faster training and inference. 
  • CPU compatibility: The quantized version of Llama 3 can run efficiently on CPU machines, making it a cost-effective solution for deployments on low-cost hardware configurations
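The 4-bit quantization that makes CPU deployment practical can be illustrated with a minimal sketch. This shows simple symmetric per-tensor quantization as an illustration only; the platform's actual scheme is not specified here, and production methods (such as block-wise or grouped quantization) are more sophisticated.

```python
def quantize_4bit(weights):
    """Map float weights to signed 4-bit integers in [-7, 7]
    using a single symmetric scale (a deliberate simplification)."""
    scale = max(abs(w) for w in weights) / 7 or 1.0
    q = [max(-7, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize_4bit(q, scale):
    """Recover approximate float weights from the 4-bit integers."""
    return [x * scale for x in q]

weights = [0.12, -0.5, 0.33, 0.07, -0.21]
q, scale = quantize_4bit(weights)
approx = dequantize_4bit(q, scale)
```

Each weight now needs only 4 bits instead of 16 or 32, which is what shrinks the memory and bandwidth footprint enough for CPU inference, at the cost of a small rounding error per weight.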

How we selected the provided LLMs (test evaluation framework)

We tested multiple LLMs and selected the models that produced higher-quality outcomes, based on evaluations by subject-matter experts.

Following the selection, we fine-tuned inference hyperparameters and applied effective prompt engineering to further improve the output quality. 

We evaluated all tests automatically by using an in-house test evaluation framework to maintain and benchmark result quality. Each response was assessed against nine distinct metrics, grouped into the following categories:

  • Similarity metrics
  • Generation metrics
  • End-to-end metrics
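The nine individual metrics are not named here, but as an illustration of the similarity-metrics category, a framework might compare an embedding of the model's response against an embedding of a reference answer, for example with cosine similarity. This is a hypothetical sketch, not necessarily a metric the framework uses, and the vectors below are invented stand-ins for real embeddings.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors: 1.0 means
    identical direction, 0.0 means orthogonal (unrelated)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings of a model response and a reference answer
response_vec = [0.9, 0.1, 0.3]
reference_vec = [0.8, 0.2, 0.4]
score = cosine_similarity(response_vec, reference_vec)
```

A score near 1.0 indicates the response is semantically close to the reference; automating such comparisons is what lets every test run be benchmarked without manual review.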


BMC AMI Platform 1.2