Key concepts


Before you start interacting with LLMs, you should understand the following key concepts:

  • The LLMs provided and their capabilities
  • The difference between those LLMs
  • Hardware support
  • How we selected the provided LLMs (test evaluation framework)
  • Key capabilities

LLMs provided and their capabilities

The BMC AMI AI Services product currently supports the following Large Language Models (LLMs) on the BMC AMI Platform:

  • Mixtral 8x7B
  • Llama 3 8B 

Mixtral 8x7B

Mixtral 8x7B is a state-of-the-art, instruction-tuned LLM developed by Mistral AI. On the BMC AMI Platform, it is quantized to 3-bit to reduce hardware requirements and improve computational efficiency. It employs a Sparse Mixture of Experts (SMoE) architecture, which activates only a subset of its parameters for each token, delivering high performance at a relatively low inference cost. It is trained on diverse data, providing broad understanding and compact yet accurate responses across a variety of tasks.

Key capabilities

  • High performance: Outperforms many other open-source LLMs across various benchmarks
  • Efficiency: The SMoE architecture enables faster inference, making it more cost-effective
  • Context length: Supports up to 32,768 tokens, allowing for more complex and detailed interactions
  • Language support: Supports multiple languages, including English, French, Italian, German, and Spanish
  • Code generation: Excels in code generation tasks
  • Instruction following: Can be fine-tuned to follow instructions effectively

Benefits of the SMoE architecture

  • Increased capacity: Allows for a larger model size without sacrificing efficiency
  • Improved performance: Boosts performance on various tasks
  • Reduced cost: Uses only a subset of the model’s parameters for each input, reducing computational costs
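The routing idea behind SMoE can be sketched in a few lines of Python. This is an illustrative toy, not Mixtral's actual implementation: the expert and router functions below are stand-ins, and real MoE layers operate on vectors inside a transformer.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_moe_layer(token, experts, router, top_k=2):
    """Route a token to the top_k highest-scoring experts and combine
    their outputs, weighted by renormalized router scores. Only top_k
    experts run per token, which is why SMoE inference is cheaper than
    a dense model with the same total parameter count."""
    scores = softmax([r(token) for r in router])
    ranked = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    norm = sum(scores[i] for i in ranked)
    return sum(experts[i](token) * (scores[i] / norm) for i in ranked)

# Toy setup: 8 experts, mirroring Mixtral's 8 experts with 2 active per token
experts = [lambda x, k=k: x * (k + 1) for k in range(8)]
router = [lambda x, k=k: (x * k) % 5 for k in range(8)]
out = sparse_moe_layer(3.0, experts, router, top_k=2)
```

The key property is visible in the last line: although 8 experts exist, only 2 are evaluated for the token, so compute per token stays close to that of a much smaller dense model.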

Hardware support

  • GPU support: Optimized for running efficiently on GPU clusters, leveraging its architecture for faster and more resource-efficient inference

Llama 3 8B

Meta Llama 3 8B is an eight-billion-parameter, instruction-tuned model optimized for question-answering tasks. It is available in a 4-bit quantized version on the BMC AMI AI Services platform, making it suitable for cost-efficient deployment on CPU-based machines. Built on a transformer architecture with attention mechanisms, the model excels in domain-specific tasks, providing coherent and well-structured responses.

This model is pre-trained, instruction-tuned, and optimized for dialogue use cases, and it outperforms many open-source chat models on industry benchmarks. Llama 3 was developed to optimize helpfulness and safety.

Key capabilities

  • Auto-Regressive model: Uses an optimized transformer architecture for efficient text generation
  • Instruction-tuned: Fine-tuned by using Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) to align with human preferences for helpfulness and safety
  • Context length: Handles up to 8192 tokens, allowing for detailed interactions
  • Benchmark performance: Excels in dialogue and assistant-like chat tasks, outperforming other open-source models
  • Language support: Primarily optimized for English, but can be adapted for a variety of natural language generation tasks

Hardware support

  • GPU support: Optimized for GPU, allowing faster training and inference. 
  • CPU compatibility: The quantized version of Llama 3 can run efficiently on CPU machines, making it a cost-effective solution for deployments on low-cost hardware configurations
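The 4-bit quantization that makes CPU deployment practical can be illustrated with a minimal sketch. This shows simple symmetric per-tensor quantization as an illustration only; the platform's actual scheme is not specified here, and production methods (such as block-wise or grouped quantization) are more sophisticated.

```python
def quantize_4bit(weights):
    """Map float weights to signed 4-bit integers in [-7, 7]
    using a single symmetric scale (a deliberate simplification)."""
    scale = max(abs(w) for w in weights) / 7 or 1.0
    q = [max(-7, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize_4bit(q, scale):
    """Recover approximate float weights from the 4-bit integers."""
    return [x * scale for x in q]

weights = [0.12, -0.5, 0.33, 0.07, -0.21]
q, scale = quantize_4bit(weights)
approx = dequantize_4bit(q, scale)
```

Each weight now needs only 4 bits instead of 16 or 32, which is what shrinks the memory and bandwidth footprint enough for CPU inference, at the cost of a small rounding error per weight.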

How we selected the provided LLMs (test evaluation framework)

We tested multiple LLMs and selected the models that produced higher-quality outcomes, based on evaluations by subject-matter experts.

Following the selection, we fine-tuned inference hyperparameters and applied effective prompt engineering to further improve the output quality. 

We evaluated all tests automatically by using an in-house test evaluation framework to maintain and benchmark result quality. Each response was assessed against nine distinct metrics, grouped into the following categories:

  • Similarity metrics
  • Generation metrics
  • End-to-end metrics
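The nine individual metrics are not named here, but as an illustration of the similarity-metrics category, a framework might compare an embedding of the model's response against an embedding of a reference answer, for example with cosine similarity. This is a hypothetical sketch, not necessarily a metric the framework uses, and the vectors below are invented stand-ins for real embeddings.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors: 1.0 means
    identical direction, 0.0 means orthogonal (unrelated)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings of a model response and a reference answer
response_vec = [0.9, 0.1, 0.3]
reference_vec = [0.8, 0.2, 0.4]
score = cosine_similarity(response_vec, reference_vec)
```

A score near 1.0 indicates the response is semantically close to the reference; automating such comparisons is what lets every test run be benchmarked without manual review.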


BMC AMI Platform 1.2