Model notes
Qwen 2.5 32B
Larger dense Qwen variant that often pushes single-GPU inference toward aggressive quantization.
32.5B dense • 131,072 context • 8 KV heads
Architecture
Model spec
Architecture: dense decoder-only transformer
Total params: 32.5B
Active params: 32.5B (dense, all parameters active per token)
Layers: 64
Hidden size: 5,120
Attention heads: 40
KV heads: 8 (GQA)
KV-bearing layers: 64
Context length: 131,072 tokens
Modality: text
License: Apache 2.0
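With 8 KV heads shared across 40 query heads (GQA), the KV cache stays tractable even at the full 131,072-token window. A back-of-the-envelope sketch, assuming the spec above (64 KV-bearing layers, head dim 128 = 5,120 / 40) and a 2-byte FP16/BF16 cache:

    # Rough KV-cache estimate for Qwen2.5-32B with grouped-query attention.
    # Assumptions: 64 KV-bearing layers, 8 KV heads, head dim 128, 2-byte cache.
    layers, kv_heads, head_dim, bytes_per_elem = 64, 8, 128, 2
    context = 131_072

    per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem   # K and V
    full_cache = per_token * context
    print(f"{per_token / 1024:.0f} KiB per token")                  # -> 256 KiB
    print(f"{full_cache / 2**30:.1f} GiB at {context:,} tokens")    # -> 32.0 GiB

Roughly 256 KiB per cached token, or about 32 GiB for a completely full 131,072-token cache, and that sits on top of the resident weights.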
Why it matters
Research highlight
High-capacity dense Qwen checkpoint optimized for long-context inference rather than sparse routing.
Why memory behaves this way
Memory note
Dense resident weights dominate here, which is why 4-bit loading is usually the difference between fitting and not fitting on one card.
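The arithmetic behind that note is worth writing down. A minimal sketch, assuming ~32.5B dense parameters, taking the 4-bit figure from the official GPTQ-Int4 repository size quoted below, and using 24/48/80 GB purely as illustrative single-GPU VRAM tiers:

    # Resident-weight arithmetic behind the memory note above.
    # KV cache, activations, and framework overhead are ignored here.
    params = 32.5e9
    bf16_gb = params * 2 / 1e9   # ~65 GB, in line with the ~65.5 GB BF16 repo
    int4_gb = 19.4               # ~ size of the official GPTQ-Int4 checkpoint

    for vram in (24, 48, 80):    # illustrative single-GPU VRAM tiers
        bf16_fit = "fits" if bf16_gb < vram else "does not fit"
        int4_fit = "fits" if int4_gb < vram else "does not fit"
        print(f"{vram} GB card: BF16 {bf16_fit}, 4-bit {int4_fit}")

Even in the 4-bit case on a 24 GB card, only a few gigabytes remain for KV cache and activations, so the context window still has to be budgeted carefully.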
Checkpoints
Official profiles
Official BF16 checkpoint
The official Qwen2.5-32B-Instruct checkpoint repository is about 65.5 GB on Hugging Face.
Official GPTQ 4-bit checkpoint
The official Qwen2.5-32B-Instruct-GPTQ-Int4 checkpoint repository is about 19.4 GB on Hugging Face.
Official AWQ 4-bit checkpoint
The official Qwen2.5-32B-Instruct-AWQ checkpoint repository is about 19.3 GB on Hugging Face.
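Those repository sizes are easy to re-check. A small sketch with the huggingface_hub client, assuming the checkpoints live under the Qwen organization on the Hub (files_metadata=True is what populates per-file sizes):

    # Sum the file sizes of each official checkpoint repository on Hugging Face.
    from huggingface_hub import HfApi

    api = HfApi()
    repos = [
        "Qwen/Qwen2.5-32B-Instruct",            # BF16, ~65.5 GB
        "Qwen/Qwen2.5-32B-Instruct-GPTQ-Int4",  # GPTQ 4-bit, ~19.4 GB
        "Qwen/Qwen2.5-32B-Instruct-AWQ",        # AWQ 4-bit, ~19.3 GB
    ]
    for repo_id in repos:
        info = api.model_info(repo_id, files_metadata=True)
        total_bytes = sum(f.size or 0 for f in info.siblings)
        print(f"{repo_id}: {total_bytes / 1e9:.1f} GB")

Note these totals count every file in the repository, not just the weight shards, which is why the figures above are approximate.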
Sources