About the Llama 2 Model (Large Language Model)

Llama 2 is a collection of pretrained and fine-tuned large language models developed by Meta AI, with parameter sizes ranging from 7 billion to 70 billion. It represents a significant advancement over its predecessor Llama 1, featuring an expanded pretraining corpus (40% larger), doubled context length, and grouped-query attention. The fine-tuned Llama 2-Chat variants are optimized for dialogue use cases and match some closed-source models in helpfulness and safety benchmarks.
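
The chat variants are sensitive to prompt formatting. As a minimal sketch of the single-turn prompt template described in the Llama 2 paper (the helper function name here is illustrative, not part of any official API):

```python
def build_llama2_chat_prompt(system_prompt: str, user_message: str) -> str:
    """Build a single-turn Llama 2-Chat prompt.

    Follows the template from the Llama 2 paper: the system prompt is
    wrapped in <<SYS>> tags inside the first [INST] block. The BOS token
    (<s>) is omitted here because tokenizers typically prepend it.
    """
    return f"[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n{user_message} [/INST]"


prompt = build_llama2_chat_prompt(
    "You are a helpful assistant.",
    "Summarize what grouped-query attention does.",
)
print(prompt)
```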

Overview

GPU Memory Requirements

Default (FP16) inference requires approximately 14 GB of GPU memory for the 7B model's weights (about 2 bytes per parameter); activations and the KV cache add further overhead on top of the figures below.

Quantization    Memory (GB)    Notes
FP16            14             -
INT8            7              -
INT4            4              Using GPTQ or bitsandbytes quantization
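
These figures follow from simple bytes-per-parameter arithmetic. A minimal sketch, assuming the 7B model (7 × 10⁹ parameters) and counting weight memory only:

```python
# Back-of-the-envelope weight-memory estimate for Llama 2 7B.
# Real usage is higher: activations and the KV cache add overhead,
# and quantized formats carry some metadata (scales, zero points).
PARAMS = 7e9  # Llama 2 7B parameter count

bytes_per_param = {"FP16": 2.0, "INT8": 1.0, "INT4": 0.5}

for quant, nbytes in bytes_per_param.items():
    gb = PARAMS * nbytes / 1e9
    print(f"{quant}: ~{gb:.1f} GB weights")
# FP16: ~14.0 GB, INT8: ~7.0 GB, INT4: ~3.5 GB
# (the table rounds INT4 up to ~4 GB to account for quantization overhead)
```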

Training Data

Llama 2 was pretrained on 2 trillion tokens from publicly available sources (pretraining data cutoff September 2022). Fine-tuning used publicly available instruction datasets plus over 1 million new human-annotated examples.


Try on Hugging Face

Explore the Llama 2 model on Hugging Face, including model weights and documentation.
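
As a minimal sketch, assuming access to the gated meta-llama repository on Hugging Face and the transformers, accelerate, and bitsandbytes packages, the 7B chat model can be loaded in 4-bit roughly like this:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-7b-chat-hf"  # gated repo: requires approved access

# 4-bit quantization via bitsandbytes, matching the INT4 row in the table above
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # place layers on available GPUs automatically
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```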

Read the Paper

The original research paper, "Llama 2: Open Foundation and Fine-Tuned Chat Models" (Touvron et al., 2023), describes the architecture and training methodology in detail.
