About the Mistral-7B Model (Large Language Model)
Mistral-7B-v0.1 is a 7-billion-parameter language model known for its strong performance and efficiency in natural language processing. It combines grouped-query attention (GQA) for faster inference with sliding window attention (SWA) for handling long sequences at lower computational cost. The model outperforms Llama 2 13B across evaluated benchmarks and Llama 1 34B on many, including reasoning, mathematics, and code generation.
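To make the SWA idea concrete, here is a minimal sketch of a sliding-window causal attention mask in PyTorch. This is illustrative only, not Mistral's implementation; the 4,096-token window size comes from the Mistral-7B paper.

```python
import torch

def sliding_window_mask(seq_len: int, window: int = 4096) -> torch.Tensor:
    """True where a query position may attend: causal AND within the window.

    The Mistral-7B paper uses a 4,096-token window, so each layer attends to
    at most the previous 4,096 positions instead of the full sequence.
    """
    idx = torch.arange(seq_len)
    causal = idx[None, :] <= idx[:, None]               # no attending to the future
    in_window = (idx[:, None] - idx[None, :]) < window  # stay inside the window
    return causal & in_window

# Small example: with window=3, token i sees itself plus the 2 previous tokens
print(sliding_window_mask(seq_len=6, window=3).int())
```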
Overview
- Use Case: Text generation, reasoning, mathematics, code generation
- Creator: Mistral AI
- Architecture: Transformer with grouped-query attention (GQA) and sliding window attention (SWA)
- Parameters: 7.3B
- Release Date: 2023-09-27
- License: Apache 2.0
- Context Length: 8,192 tokens
GPU Memory Requirements
Default (FP16) inference requires approximately 14 GB of GPU memory (2 bytes per parameter × ~7.3B parameters, before KV-cache and activation overhead).
| Quantization | Memory (GB) | Notes |
|---|---|---|
| FP16 | 14 | - |
| INT8 | 7 | - |
| INT4 | 4 | Using GPTQ or AWQ quantization |
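The table's figures follow from simple weight-only arithmetic, as the sketch below shows (it ignores KV-cache and activation memory, which add overhead that grows with batch size and sequence length):

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Weight-only memory estimate: parameters x bits, converted to gigabytes."""
    return n_params * bits_per_param / 8 / 1e9

for precision, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    print(f"{precision}: {weight_memory_gb(7.3e9, bits):.1f} GB")
# FP16: 14.6, INT8: 7.3, INT4: 3.7 -- in line with the table above
```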
Training Data
Not publicly disclosed by Mistral AI
Evaluation Benchmarks
- Hellaswag
- Winogrande
- PIQA
- SIQA
- OpenbookQA
- ARC-Easy
- ARC-Challenge
- CommonsenseQA
- NaturalQuestions
- TriviaQA
- BoolQ
- QuAC
- GSM8K
- MATH
- HumanEval
- MBPP
- MMLU
- BBH
- AGI Eval
Compare GPUs for AI/ML
Compare GPUs by price-per-performance metrics for machine learning workloads.
Try on Hugging Face
Explore the Mistral-7B model on Hugging Face, including model weights and documentation.
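As a starting point, here is a minimal loading sketch using the Hugging Face transformers library (assumes a transformers version with Mistral support, roughly 4.34 or newer, plus accelerate for device_map; the model ID is the official mistralai/Mistral-7B-v0.1 repository):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # FP16 -> ~14 GB of GPU memory (see table above)
    device_map="auto",          # requires the accelerate package
)

inputs = tokenizer("Mistral-7B is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```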
Read the Paper
Read the original research paper describing the Mistral-7B architecture and training methodology.
Notes
- Training data details not publicly disclosed by Mistral AI
- Fine-tuned variant Mistral-7B-Instruct available for instruction-following tasks (see the prompt-format sketch after this list)
- System prompts available for enforcing ethical guardrails and content moderation
- Parameter count is approximately 7.3B (7.24B more precisely); the 'Mistral-7B' name uses a rounded approximation
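As an illustration of the instruct variant's prompt format, here is a minimal sketch using the tokenizer's chat template (assumes the mistralai/Mistral-7B-Instruct-v0.1 repository and a transformers version providing apply_chat_template):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")

messages = [
    {"role": "user", "content": "Explain grouped-query attention in one sentence."},
]
# apply_chat_template wraps the turn in Mistral's [INST] ... [/INST] format
prompt = tokenizer.apply_chat_template(messages, tokenize=False)
print(prompt)  # -> "<s>[INST] Explain grouped-query attention in one sentence. [/INST]"
```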