Key concepts
Mixtral 8x7B
Mixtral 8x7B is a state-of-the-art, instruction-tuned LLM developed by Mistral AI. It is quantized to 3-bit to reduce hardware requirements and improve computational efficiency. It employs a Sparse Mixture of Experts (SMoE) architecture, which delivers high performance while activating only a fraction of the model's parameters for each token. Trained on diverse data, it provides broad understanding and delivers compact yet accurate responses across a variety of tasks.
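As a concrete illustration, the following is a minimal sketch of loading a 3-bit quantized Mixtral checkpoint with llama-cpp-python. The GGUF file name and the Q3_K_M quantization level are assumptions for illustration, not properties of the model itself; substitute the checkpoint used in your deployment.

```python
# Minimal sketch: serving a 3-bit quantized Mixtral 8x7B Instruct checkpoint
# with llama-cpp-python. The file name and Q3_K_M level are assumptions.
from llama_cpp import Llama

llm = Llama(
    model_path="mixtral-8x7b-instruct-v0.1.Q3_K_M.gguf",  # hypothetical local path
    n_ctx=32768,       # Mixtral supports a 32,768-token context window
    n_gpu_layers=-1,   # offload all layers to the GPU if one is available
)

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize the SMoE architecture in two sentences."}]
)
print(response["choices"][0]["message"]["content"])
```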
Key capabilities
- High performance: Outperforms many other open-source LLMs across various benchmarks
- Efficiency: The SMoE architecture enables faster inference, making it more cost-effective
- Context length: Supports up to 32,768 tokens, allowing for more complex and detailed interactions
- Language support: Supports multiple languages, including English, French, Italian, German, and Spanish
- Code generation: Excels in code generation tasks
- Instruction following: Can be fine-tuned to follow instructions effectively
Benefits of the SMoE architecture
- Increased capacity: Allows for a larger model size without sacrificing efficiency
- Improved performance: Boosts performance on various tasks
- Reduced cost: Uses only a subset of the model’s parameters for each input, reducing computational costs (see the routing sketch after this list)
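To make the routing idea concrete, here is a minimal, self-contained sketch of top-2 expert gating in PyTorch. The layer sizes and expert count are illustrative and do not match Mixtral's actual configuration.

```python
# Minimal sketch of top-2 expert routing, the core idea behind SMoE: a gating
# network scores all experts per token, but only the two highest-scoring
# experts actually run, so most parameters stay idle for any given input.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoE(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=8):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                        # x: (tokens, d_model)
        scores = self.gate(x)                    # (tokens, n_experts)
        weights, idx = scores.topk(2, dim=-1)    # keep the 2 best experts per token
        weights = F.softmax(weights, dim=-1)     # renormalize over the selected pair
        out = torch.zeros_like(x)
        for k in range(2):                       # run only the selected experts
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out

moe = Top2MoE()
print(moe(torch.randn(4, 512)).shape)  # torch.Size([4, 512])
```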
Hardware support
- GPU support: Optimized for running efficiently on GPU clusters, leveraging its architecture for faster and more resource-efficient inference.
Llama 3 8B
Meta Llama 3 8B is an eight-billion-parameter model that is pre-trained, instruction-tuned, and optimized for dialogue and query-answer use cases. It outperforms many open-source chat models on industry benchmarks, and Llama 3 was developed to optimize helpfulness and safety.
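As an illustration of dialogue use, the sketch below runs the instruct model through the Hugging Face transformers chat pipeline. The model ID, dtype, and generation settings are assumptions about a typical setup.

```python
# Minimal sketch of dialogue inference with Llama 3 8B Instruct via Hugging Face
# transformers. The model ID and generation settings are illustrative assumptions.
import torch
from transformers import pipeline

chat = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a concise, helpful assistant."},
    {"role": "user", "content": "Explain RLHF in one sentence."},
]
out = chat(messages, max_new_tokens=128)
print(out[0]["generated_text"][-1]["content"])  # the assistant's reply
```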
Key capabilities
- Auto-Regressive model: Uses an optimized transformer architecture for efficient text generation
- Instruction-tuned: Models are fine-tuned via Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) to align with human preferences for helpfulness and safety
- Context length: Handles up to 8,192 tokens, allowing for detailed interactions
- Benchmark performance: Excels in dialogue and assistant-like chat tasks, outperforming other open-source models
- Language support: Primarily optimized for English, but can be adapted for a variety of natural language generation tasks
Hardware support
- GPU support: Optimized for GPU, allowing faster training and inference.
- CPU compatibility: Llama 3’s quantized version can run efficiently on CPU machines, making it a cost-effective solution for deployments on low-cost hardware configurations (see the sketch below).
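The following minimal sketch shows CPU-only inference with a quantized Llama 3 checkpoint via llama-cpp-python. The GGUF file name, the Q4_K_M quantization level, and the thread count are assumptions to adapt to the target machine.

```python
# Minimal sketch of CPU-only inference with a quantized Llama 3 8B checkpoint
# via llama-cpp-python. File name and quantization level are assumptions.
from llama_cpp import Llama

llm = Llama(
    model_path="meta-llama-3-8b-instruct.Q4_K_M.gguf",  # hypothetical local path
    n_ctx=8192,       # Llama 3's full 8,192-token context window
    n_gpu_layers=0,   # keep every layer on the CPU
    n_threads=8,      # tune to the number of physical cores available
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Give one benefit of CPU deployment."}],
    max_tokens=64,
)
print(out["choices"][0]["message"]["content"])
```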
Granite 3 8B
Granite 3 8B Instruct is a powerful instruction-tuned LLM designed for a wide range of natural language tasks. It is built on the foundation of the Granite 3 base model, a state-of-the-art LLM that has been pre-trained on a massive dataset of text and code.
The Granite 3 8B Instruct model has been further fine-tuned on a diverse set of instruction data to improve its ability to follow instructions, generate different kinds of creative content, and answer questions informatively.
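As a brief illustration of instruction following, the sketch below queries Granite 3 8B Instruct through the Hugging Face transformers API. The checkpoint ID reflects the public Hugging Face name and, along with the generation settings, is an assumption about a typical setup.

```python
# Minimal sketch of instruction following with Granite 3 8B Instruct through
# the Hugging Face transformers API. Model ID and settings are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-granite/granite-3.0-8b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Translate 'good morning' into French and Spanish."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=64)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0, inputs.shape[-1]:], skip_special_tokens=True))
```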
Key capabilities
- Instruction following: The model is highly adept at understanding and following instructions, making it suitable for a wide range of tasks, from simple question answering to complex content generation.
- High performance: The model delivers state-of-the-art performance on a variety of benchmarks spanning question answering, text generation, and translation.
- Text generation: The model can generate high-quality text, including stories, poems, articles, and code.
- Translation: The model can translate text between multiple languages.
- Versatility: The model can be used for a wide range of tasks, making it a valuable tool for businesses and researchers.
- Ease of use: The model is easy to use and can be accessed through a variety of APIs.
Hardware support
- GPU support: The model is optimized for running on GPUs, which can significantly accelerate its performance.
Test evaluation framework
We tested multiple LLMs and selected the models that produced higher-quality outcomes, based on evaluations from subject-matter experts.
Following the selection, we fine-tuned inference hyperparameters and applied effective prompt engineering to further improve output quality.
All tests were evaluated automatically with an in-house test evaluation framework to maintain and benchmark result quality. Each response was assessed against the following groups of metrics (a sample similarity check follows the list):
- Similarity metrics
- Generation metrics
- End-to-end metrics
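As a sample of what a similarity check can look like, the sketch below scores a generated answer against a reference answer with embedding cosine similarity. The embedding model and the acceptance threshold are illustrative assumptions, not the in-house framework itself.

```python
# Minimal sketch of one similarity metric: cosine similarity between embeddings
# of a generated answer and a reference answer. The embedding model and the 0.8
# threshold are illustrative assumptions.
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")

reference = "Mixtral routes each token through two of eight experts."
generated = "Every token in Mixtral is processed by two out of its eight experts."

ref_vec, gen_vec = embedder.encode([reference, generated], convert_to_tensor=True)
score = util.cos_sim(ref_vec, gen_vec).item()

print(f"similarity: {score:.3f}")
print("pass" if score >= 0.8 else "fail")  # hypothetical quality threshold
```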