Model notes
Qwen 3.5 4B
A mid-sized Qwen3.5 checkpoint whose resident multimodal weights enlarge its memory footprint, yet it remains practical for careful single-GPU text-only serving.
5B dense • 262,144 context • 4 KV heads
Architecture

Model spec
Architecture: dense
Total params: ~5B resident (including multimodal weights)
Active params: 4B (language model)
Layers
Hidden size
Attention heads
KV heads: 4
KV-bearing layers: 8
Context length: 262,144
Modality: multimodal; text-only serving supported
License
Why it matters

Research highlight
The 4B language model sits inside a roughly 5B resident multimodal artifact, and only 8 gated-attention layers carry KV cache during generation.

Why memory behaves this way

Memory note
The hybrid attention layout keeps KV-cache growth lower than in dense 32-layer models, but the extra resident multimodal weights raise the single-card memory floor.
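The cache-growth claim above can be checked with simple arithmetic. This is a back-of-envelope sketch, not a measurement: the head dimension (128) and BF16 storage (2 bytes per element) are assumptions, while the 4 KV heads, 8 KV-bearing layers, and 262,144 context come from the spec above.

```python
# Rough per-sequence KV-cache size: K and V tensors for each
# KV-bearing layer, each of shape (kv_heads, seq_len, head_dim).
# head_dim=128 and BF16 (2 bytes) are assumed, not from the spec.

def kv_cache_bytes(kv_layers, kv_heads, head_dim, seq_len, dtype_bytes=2):
    """Bytes of KV cache for one sequence."""
    return 2 * kv_layers * kv_heads * head_dim * seq_len * dtype_bytes

full = kv_cache_bytes(kv_layers=8, kv_heads=4, head_dim=128, seq_len=262_144)
print(f"{full / 2**30:.1f} GiB per sequence at full context")  # -> 4.0 GiB

# A dense stack where all 32 layers carry KV would cost 4x as much.
dense = kv_cache_bytes(kv_layers=32, kv_heads=4, head_dim=128, seq_len=262_144)
print(f"{dense / full:.0f}x for a dense 32-layer stack")  # -> 4x
```

Under these assumptions the full-context cache is about 4 GiB per sequence, versus roughly 16 GiB if all 32 layers carried KV, which is why the hybrid layout matters more than the raw parameter count at long context.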
Checkpoints
Official profiles
Official BF16 checkpoint
Qwen publishes Qwen3.5-4B in Hugging Face Transformers format, with explicit guidance for both Transformers and vLLM, including a text-only serving mode in vLLM.
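A text-only vLLM launch along the lines described above might look like the following. This is a hypothetical sketch: the `Qwen/Qwen3.5-4B` model id, the context cap, and the `--limit-mm-per-prompt` flag (which caps multimodal items per request in recent vLLM versions) should all be confirmed against the model card and your installed vLLM release.

```shell
# Hypothetical: serve the checkpoint while disallowing image inputs,
# so only the text path is exercised. Flag syntax varies by vLLM
# version; check `vllm serve --help` before relying on this.
vllm serve Qwen/Qwen3.5-4B \
  --max-model-len 32768 \
  --limit-mm-per-prompt '{"image": 0}'
```

Capping `--max-model-len` below the full 262,144-token window is the usual way to keep the KV-cache reservation within a single card's memory budget.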