Model notes
Mistral Nemo 12B
Long-context dense Mistral checkpoint that remains practical on a single 24 GB card with quantization.
12.2B dense • 128,000 context • 8 KV heads
Architecture
Model spec
Architecture: Dense decoder-only transformer
Total params: 12.2B
Active params: all 12.2B (dense model)
Layers: 40
Hidden size: 5,120
Attention heads: 32
KV heads: 8
KV-bearing layers: 40
Context length: 128,000 tokens
Modality: Text
License: Apache 2.0
Why it matters
Research highlight: a long-context dense Mistral design tuned for efficient single-node inference.

Why memory behaves this way
Memory note: dense weights set the baseline footprint; at long context lengths, the KV cache becomes the next thing to watch after weight quantization.
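The KV-cache pressure can be sketched from the spec figures above. A minimal back-of-envelope estimate, assuming an FP16/BF16 cache (the exact layout and paging granularity vary by runtime):

```python
# KV-cache size estimate for Mistral Nemo 12B, using the spec above.
N_LAYERS = 40          # KV-bearing layers
N_KV_HEADS = 8         # grouped-query attention: 8 KV heads vs 32 query heads
HEAD_DIM = 5120 // 32  # hidden size / attention heads = 160
BYTES = 2              # FP16/BF16 cache element; an FP8 cache would halve this

def kv_cache_bytes(seq_len: int) -> int:
    # factor of 2 covers both keys and values
    return 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * BYTES * seq_len

print(kv_cache_bytes(1))              # 204,800 bytes = 200 KiB per token
print(kv_cache_bytes(128_000) / 1e9)  # ~26.2 GB at the full 128k context
```

The 8-vs-32 head ratio is why the cache is 4x smaller than it would be with full multi-head attention; even so, a full-length context costs more memory than the quantized weights themselves.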
Checkpoints
Official profiles
Official BF16 checkpoint
Mistral's official consolidated BF16 weights for Mistral Nemo are about 24.5 GB.
Runs with: vLLM, Transformers
Official FP8 checkpoint
Mistral's official FP8 checkpoint repository for Mistral Nemo is about 13.6 GB on Hugging Face.
Runs with: vLLM, Transformers
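As a sanity check on the checkpoint sizes above, weight memory is roughly parameter count times bytes per parameter. A minimal sketch (the official FP8 repo comes out larger than the raw estimate, presumably because some tensors, scales, and metadata stay in higher precision):

```python
# Rough weight-memory estimate: params x bytes per param.
# Excludes runtime overhead, activations, and the KV cache.
PARAMS = 12.2e9  # total parameters from the spec above

def weight_gb(bytes_per_param: float) -> float:
    return PARAMS * bytes_per_param / 1e9

print(weight_gb(2))    # BF16: 24.4 GB (official repo: ~24.5 GB)
print(weight_gb(1))    # FP8:  12.2 GB (official repo: ~13.6 GB with extras)
print(weight_gb(0.5))  # 4-bit: ~6.1 GB, leaving KV-cache room on a 24 GB card
```

This is why the BF16 checkpoint only barely fits a 24 GB card on its own, while FP8 or 4-bit quantization is what leaves headroom for a usable context window.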
Sources