Model notes
Llama 3.1 8B
Compact dense Llama model with grouped-query attention and a 128K context window.
8B dense • 131,072 context • 8 KV heads
Architecture

Model spec
Architecture: Dense decoder-only transformer
Total params: 8B
Active params: 8B (dense model; all parameters active)
Layers: 32
Hidden size: 4,096
Attention heads: 32
KV heads: 8
KV-bearing layers: 32
Context length: 131,072
Modality: Text
License: Llama 3.1 Community License
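The spec above implies a couple of derived quantities the card does not state directly. A minimal sketch, assuming the standard Llama convention that head dimension equals hidden size divided by attention heads:

```python
# Values taken from the spec table above; head_dim and the GQA group size
# are inferred from them, not stated explicitly in the card.
hidden_size = 4096
num_attention_heads = 32
num_kv_heads = 8

head_dim = hidden_size // num_attention_heads         # 4096 / 32 = 128
gqa_group_size = num_attention_heads // num_kv_heads  # 4 query heads share each KV head

print(head_dim, gqa_group_size)  # 128 4
```

The 4:1 grouping is what makes this grouped-query rather than multi-head (1:1) or multi-query (all heads sharing one KV head) attention.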
Why it matters
Why memory behaves this way
Research highlight
Grouped-query attention keeps KV state lighter than full multi-head attention while retaining a long native context window.
Memory note
Dense weights dominate the footprint; grouped KV heads keep cache growth manageable at long context.
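The memory note above can be made concrete with back-of-envelope arithmetic, assuming BF16 (2 bytes per element) throughout and the inferred head dimension of 128 (4096 / 32); these are assumptions from the spec table, not figures the card states:

```python
# KV-cache and weight sizing from the spec table (all numbers derived,
# assuming BF16 storage and head_dim = hidden_size / attention_heads = 128).
layers = 32
kv_heads = 8
query_heads = 32
head_dim = 128
bytes_per_elem = 2        # BF16
context = 131_072

# Factor of 2 for the separate K and V tensors stored per layer.
kv_bytes_per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem
full_context_gib = kv_bytes_per_token * context / 2**30
mha_full_context_gib = full_context_gib * (query_heads // kv_heads)
weights_gb = 8e9 * bytes_per_elem / 1e9

print(kv_bytes_per_token)     # 131072 bytes, i.e. 128 KiB per token
print(full_context_gib)       # 16.0 GiB of KV cache at full 131,072-token context
print(mha_full_context_gib)   # 64.0 GiB if all 32 heads kept their own KV state
print(weights_gb)             # 16.0 GB of BF16 weights
```

At full context the 8-KV-head cache (16 GiB) is roughly the same size as the BF16 weights themselves; full multi-head attention would quadruple it, which is the point of the research highlight above.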
Checkpoints
Official profiles
Official BF16 checkpoint
Meta's official Llama 3.1 8B Instruct release is a BF16 checkpoint with grouped-query attention.
vLLM · Transformers
Open checkpoint