About the BERT Model (Large Language Model)
BERT (Bidirectional Encoder Representations from Transformers) is a deep learning model that focuses on pre-training deep bidirectional representations from unlabeled text. This approach enables the model to understand the context of a word based on all of its surroundings (left and right of the word). BERT has achieved state-of-the-art results in a wide range of natural language processing tasks, showcasing its versatility and effectiveness.
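As a quick illustration of the bidirectional masked-language-model objective, the hedged sketch below uses the Hugging Face `transformers` library (an assumption; this card only links to the Hugging Face model page) to predict a masked token from context on both sides.

```python
# Sketch only: predicting a [MASK] token with the bert-base-uncased checkpoint.
# Assumes the Hugging Face `transformers` library and a PyTorch backend are installed.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# BERT attends to tokens on both the left and right of [MASK],
# so the prediction is conditioned on the full sentence.
for candidate in fill_mask("The capital of France is [MASK]."):
    print(f"{candidate['token_str']:>10}  score={candidate['score']:.3f}")
```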
Overview
- Use Case: Natural language understanding tasks including question answering, language inference, sentiment analysis, and named entity recognition
- Creator: Google AI Language (Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova)
- Architecture: Transformer-based encoder with bidirectional self-attention using Masked Language Model (MLM) and Next Sentence Prediction (NSP) pre-training objectives
- Parameters: 110M
- Release Date: 2018
- License: Apache 2.0
- Context Length: 512 tokens (see the verification sketch after this list)
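The parameter count and context length listed above can be sanity-checked directly from the published checkpoint; a minimal sketch, assuming the Hugging Face `transformers` library with a PyTorch backend:

```python
# Sketch: verify the Overview figures (assumes `transformers` + PyTorch).
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")

# Total parameters in millions (~110M for BERT-base).
n_params = sum(p.numel() for p in model.parameters())
print(f"parameters: {n_params / 1e6:.0f}M")

# The learned position embeddings cap the usable sequence length at 512 tokens.
print(f"context length: {model.config.max_position_embeddings} tokens")
```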
GPU Memory Requirements
Default (FP16) inference requires approximately 0.25 GB of GPU memory.
| Quantization | Memory (GB) | Notes |
|---|---|---|
| FP32 | 0.5 | - |
| FP16 | 0.25 | - |
| INT8 | 0.12 | - |
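These figures are consistent with a back-of-the-envelope estimate of parameter count times bytes per parameter; the sketch below reproduces that arithmetic (activations and framework overhead are not included, so real usage is somewhat higher).

```python
# Rough estimate of weight memory per precision for BERT-base (110M parameters).
PARAMS = 110e6
BYTES_PER_PARAM = {"FP32": 4, "FP16": 2, "INT8": 1}

for precision, nbytes in BYTES_PER_PARAM.items():
    gib = PARAMS * nbytes / 1024**3
    print(f"{precision}: ~{gib:.2f} GB of weights")  # ~0.41, ~0.20, ~0.10 GB
```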
Training Data
BooksCorpus (800M words) and English Wikipedia (2,500M words)
Evaluation Benchmarks
- GLUE
- SQuAD 1.1
- SQuAD 2.0
- SWAG
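For sentence-pair benchmarks such as GLUE, inputs are packed as `[CLS] sentence A [SEP] sentence B [SEP]`, the same pair format used by the NSP objective. A hedged sketch of that preparation for the GLUE MRPC task, assuming the Hugging Face `datasets` and `transformers` libraries:

```python
# Sketch: preparing a GLUE sentence-pair task for BERT
# (assumes the `datasets` and `transformers` libraries).
from datasets import load_dataset
from transformers import BertTokenizerFast

dataset = load_dataset("glue", "mrpc", split="train")
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

example = dataset[0]
encoded = tokenizer(example["sentence1"], example["sentence2"],
                    truncation=True, max_length=512)

# Tokens begin with [CLS] and the two sentences are separated by [SEP].
print(tokenizer.convert_ids_to_tokens(encoded["input_ids"])[:12])
```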
Try on Hugging Face
Explore the BERT model on Hugging Face, including model weights and documentation.
Read the Paper
Read the original research paper describing the BERT architecture and training methodology.
References
- https://arxiv.org/abs/1810.04805
- https://huggingface.co/bert-base-uncased
- https://github.com/google-research/bert
Notes
- Parameter count is for BERT-base model; BERT-large has 340M parameters
- GPU memory requirements are approximate for inference with batch size 1; the headline figure above reflects FP16 precision
- Care should be taken in applications that could amplify biases present in training data
- BERT is an encoder-only transformer model designed for natural language understanding tasks, not text generation (see the usage sketch below)
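Because BERT is encoder-only, typical usage extracts contextual embeddings that a task-specific head consumes rather than generating text. A minimal sketch, again assuming the Hugging Face `transformers` library with PyTorch:

```python
# Sketch: encoder-only feature extraction with bert-base-uncased
# (assumes `transformers` + PyTorch; BERT is not a text-generation model).
import torch
from transformers import BertModel, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("BERT encodes the whole sentence at once.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One 768-dimensional vector per token; the [CLS] vector is a common
# sentence-level representation for classification heads.
print(outputs.last_hidden_state.shape)  # torch.Size([1, seq_len, 768])
```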