Model notes
Gemma 2 9B
Instruction-tuned Gemma checkpoint with a relatively short native context window and efficient KV usage.
9.2B dense • 8,192 context • 8 KV heads
Model spec
Architecture: Dense decoder-only transformer
Total params: 9.2B
Active params: Dense model (all 9.2B active)
Layers: 42
Hidden size: 3,584
Attention heads: 16
KV heads: 8
KV-bearing layers: 42
Context length: 8,192
Modality: Text
License: Gemma terms
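The spec above gives everything needed for a rough KV cache estimate except the per-head dimension, which is not listed. The sketch below assumes head_dim = 256 (the value in published Gemma 2 configs, which decouple it from hidden_size / attention_heads), so treat the numbers as an estimate:

```python
# KV cache sizing sketch for Gemma 2 9B from the spec above.
# head_dim is an ASSUMPTION (256, per published configs); all other
# values come straight from the table.
layers = 42          # KV-bearing layers
kv_heads = 8         # grouped-query attention: 8 KV heads vs 16 query heads
head_dim = 256       # assumed; not listed in the spec above
dtype_bytes = 2      # bf16
context = 8192

# K and V each store kv_heads * head_dim values per layer per token.
per_token = 2 * layers * kv_heads * head_dim * dtype_bytes
full_context_gib = per_token * context / 2**30

print(f"{per_token} bytes per token")               # 344,064 (~336 KiB)
print(f"{full_context_gib:.2f} GiB at 8k context")  # 2.62 GiB
```

Under these assumptions a full 8,192-token cache stays well under 3 GiB, small next to the weights themselves.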
Why it matters
Research highlight: Gemma 2 focuses on efficient dense inference rather than extreme context length.
Memory note: The shorter native context window keeps the KV cache moderate, so the main memory driver is still the dense weight tensors.
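That claim can be sanity-checked with back-of-the-envelope arithmetic. A minimal sketch, assuming bf16 (2 bytes per value) for both weights and cache, and head_dim = 256 taken from published Gemma 2 configs rather than the table above:

```python
# Rough bf16 memory split for Gemma 2 9B (sketch; head_dim=256 is an
# assumption from published checkpoint configs, not the spec table).
params = 9.2e9
weight_bytes = params * 2                # bf16 weights: ~18.4 GB
kv_bytes = 2 * 42 * 8 * 256 * 2 * 8192   # K+V, 42 layers, full 8k context

print(f"weights:  {weight_bytes / 1e9:.1f} GB")  # 18.4 GB
print(f"kv cache: {kv_bytes / 1e9:.1f} GB")      # ~2.8 GB
```

Even at the full 8,192-token context, the cache is several times smaller than the weight tensors, which is why the weights dominate the memory budget here.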
Checkpoints
Official BF16 checkpoint: Google's official Gemma 2 9B Instruct release is exported in bfloat16 as an open checkpoint.
Runs with: vLLM, Transformers