Model notes
Qwen3.5-9B
The largest practical Qwen3.5 release in this batch: a 9B language model paired with a resident multimodal stack, still sized for single-GPU text serving.
10B dense • 262,144-token context • 4 KV heads
Architecture
Model spec
Architecture: hybrid (gated attention on 8 of 32 layers)
Total params: 10B (dense)
Active params: 10B (dense; all parameters active)
Layers: 32
Hidden size: not stated
Attention heads: not stated
KV heads: 4
KV-bearing layers: 8
Context length: 262,144 tokens
Modality: multimodal (text + vision stack)
License: not stated
Research highlight
The hybrid layout keeps only 8 of 32 layers in the gated-attention path, which materially changes KV-cache behavior versus a dense long-context model.
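A rough estimate makes the KV-cache effect concrete: cache size scales with the number of KV-bearing layers, not total depth. A minimal sketch, assuming a 128-dim head and a BF16 (2 bytes/element) cache, neither of which is stated in this note:

```python
def kv_cache_bytes(kv_layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """Bytes for the K and V caches across all KV-bearing layers.

    The leading 2 counts the separate K and V tensors per layer.
    """
    return 2 * kv_layers * kv_heads * head_dim * seq_len * bytes_per_elem

# Hybrid layout: only 8 gated-attention layers carry KV cache.
hybrid = kv_cache_bytes(kv_layers=8, kv_heads=4, head_dim=128,
                        seq_len=262_144)   # 4 GiB at full context
# Dense baseline: all 32 layers carry KV cache.
dense = kv_cache_bytes(kv_layers=32, kv_heads=4, head_dim=128,
                       seq_len=262_144)    # 16 GiB at full context
print(hybrid / 2**30, dense / 2**30)
```

Under these assumptions the hybrid layout needs roughly a quarter of the dense model's cache at the same context length.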
Memory note
This estimate intentionally keeps the full multimodal checkpoint resident even for text-only use, so it is conservative compared with runtimes that load only the language tower.
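The conservative accounting described here amounts to budgeting weight memory for the whole checkpoint in BF16. A minimal sketch, assuming the rounded 10B figure from the spec line and 2 bytes per parameter:

```python
def bf16_weight_gib(total_params: float) -> float:
    """Resident weight memory in GiB when the full checkpoint stays loaded in BF16."""
    return total_params * 2 / 2**30  # 2 bytes per BF16 parameter

# Whole multimodal stack resident, even for text-only serving.
full_checkpoint = bf16_weight_gib(10e9)
print(round(full_checkpoint, 1))  # ~18.6 GiB before KV cache and activations
```

A language-only load would subtract the vision stack's share, which this note deliberately does not do.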
Checkpoints
Official profiles
Official BF16 checkpoint
Qwen documents Qwen3.5-9B for Transformers and vLLM.
Sources