Model notes
Qwen2.5 7B
Small-to-mid-sized Qwen model with long context support and efficient grouped KV heads.
7.6B dense • 131,072 context • 4 KV heads
Architecture

Model spec
Architecture: dense decoder-only transformer
Total params: 7.6B
Active params: 7.6B (dense, no MoE routing)
Layers: 28
Hidden size: 3,584
Attention heads: 28
KV heads: 4 (grouped-query attention)
KV-bearing layers: 28 (all layers)
Context length: 131,072 tokens
Modality: text
License: Apache-2.0
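To make the grouped-KV claim concrete, here is a minimal sketch of the KV-cache footprint at full context. The 4 KV heads and 131,072-token context come from the spec above; the layer count (28) and head dimension (128) are assumed from the published Qwen2.5-7B config, and the cache is assumed to be kept in BF16.

```python
# Sketch: KV-cache footprint for a Qwen2.5-7B-style GQA layout.
# LAYERS and HEAD_DIM are assumed config values (see lead-in).
LAYERS = 28
KV_HEADS = 4        # grouped KV heads (from the spec above)
Q_HEADS = 28        # query heads, for the no-GQA comparison
HEAD_DIM = 128
DTYPE_BYTES = 2     # BF16 cache
CONTEXT = 131_072

def kv_bytes_per_token(kv_heads: int) -> int:
    # One K tensor and one V tensor per layer, hence the factor of 2.
    return 2 * LAYERS * kv_heads * HEAD_DIM * DTYPE_BYTES

gqa = kv_bytes_per_token(KV_HEADS) * CONTEXT   # 7.0 GiB at full context
mha = kv_bytes_per_token(Q_HEADS) * CONTEXT    # 49.0 GiB if KV heads matched Q heads
print(f"GQA cache: {gqa / 2**30:.1f} GiB; full-MHA cache would be {mha / 2**30:.1f} GiB")
```

Under these assumptions, grouping 28 query heads onto 4 KV heads cuts the cache by 7x, which is what keeps a 131K-token context tractable.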
Research highlight
Long-context Qwen architecture with grouped KV heads to keep inference memory manageable.
Memory note
This is still a dense model, so resident weights set the floor; the compact KV layout mainly helps as context grows.
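The weights-set-the-floor point can be sketched numerically. The 7.6B parameter count is from the spec above; BF16 weights (2 bytes/param) and a ~56 KiB/token KV cache (28 layers x 4 KV heads x head_dim 128 in BF16, the latter two being assumed config values) are the working assumptions.

```python
# Sketch: resident weights vs. KV cache as context grows.
# 7.6e9 params is from the spec; layer count and head_dim are assumed.
WEIGHTS_GB = 7.6e9 * 2 / 1e9           # 15.2 GB floor before the first token
KV_PER_TOKEN = 2 * 28 * 4 * 128 * 2    # bytes of K+V per cached token

for ctx in (4_096, 32_768, 131_072):
    kv_gb = KV_PER_TOKEN * ctx / 1e9
    print(f"{ctx:>7} tokens: weights {WEIGHTS_GB:.1f} GB + KV cache {kv_gb:.2f} GB")
```

The cache only approaches the 15.2 GB weight floor well past 131K tokens, so for typical prompts the dense weights dominate and the compact KV layout pays off mainly at long context.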
Checkpoints
Official profiles
Official BF16 checkpoint
The official Qwen2.5-7B-Instruct checkpoint repository is about 15.2 GB on Hugging Face.
Official GPTQ 4-bit checkpoint
The official Qwen2.5-7B-Instruct-GPTQ-Int4 checkpoint repository is about 5.59 GB on Hugging Face.
Official AWQ 4-bit checkpoint
The official Qwen2.5-7B-Instruct-AWQ checkpoint repository is about 5.58 GB on Hugging Face.
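The repository sizes above can be sanity-checked with a naive bytes-per-parameter estimate. The 7.6B parameter count is from the spec; the rest is arithmetic.

```python
# Sketch: naive checkpoint-size estimate from parameter count and bit width.
def naive_size_gb(params: float, bits: int) -> float:
    # params * bits gives total bits; divide by 8 for bytes, 1e9 for GB.
    return params * bits / 8 / 1e9

bf16 = naive_size_gb(7.6e9, 16)   # ~15.2 GB, matching the official BF16 repo
int4 = naive_size_gb(7.6e9, 4)    # ~3.8 GB, a lower bound for 4-bit repos
print(f"BF16 estimate: {bf16:.1f} GB, 4-bit lower bound: {int4:.1f} GB")
```

The BF16 estimate lands on the reported 15.2 GB, while the real GPTQ/AWQ repositories (~5.6 GB) sit well above the 3.8 GB lower bound: group-wise scales and zero-points add overhead, and some modules (typically the embeddings and LM head) are commonly left in 16-bit.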