Model notes
Qwen3.5-0.8B
Compact Qwen3.5 checkpoint with a hybrid text-plus-vision stack and a small resident footprint for text-only local experimentation.
900M dense • 262,144 context • 2 KV heads
Architecture

Model spec
Architecture: dense, hybrid Gated DeltaNet + gated attention
Total params: 900M
Active params: 900M (dense)
Layers: —
Hidden size: —
Attention heads: —
KV heads: 2
KV-bearing layers: —
Context length: 262,144
Modality: text + vision
License: —
Why it matters
Why memory behaves this way
Research highlight
Qwen3.5 alternates gated DeltaNet blocks with gated attention, so only a subset of layers carry a full KV cache during text generation.
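The memory consequence of that layer mix can be sketched numerically. The notes only give 2 KV heads and a 262,144-token context; the layer count (28), the fraction of KV-bearing layers (1 in 4), the head dimension (128), and the BF16 cache dtype below are all hypothetical assumptions chosen for illustration, not published specs.

```python
# Rough KV-cache sizing for a hybrid-attention model.
# Known from these notes: 2 KV heads, 262,144-token context.
# HYPOTHETICAL: 28 layers, 1 in 4 KV-bearing, head_dim 128, BF16 (2 B/elem).
def kv_cache_bytes(seq_len, kv_layers, kv_heads, head_dim, bytes_per_elem=2):
    # K and V each hold seq_len * kv_heads * head_dim elements per layer,
    # hence the leading factor of 2.
    return 2 * kv_layers * seq_len * kv_heads * head_dim * bytes_per_elem

full = kv_cache_bytes(262_144, kv_layers=28, kv_heads=2, head_dim=128)
hybrid = kv_cache_bytes(262_144, kv_layers=7, kv_heads=2, head_dim=128)
print(f"all layers attention: {full / 2**30:.2f} GiB")    # 7.00 GiB
print(f"7 KV-bearing layers:  {hybrid / 2**30:.2f} GiB")  # 1.75 GiB
```

Under these assumptions, caching KV in only a quarter of the layers cuts the full-context cache from 7 GiB to 1.75 GiB, which is why the hybrid stack matters at a 262,144-token context.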
Memory note
This text-only estimate still counts the resident multimodal checkpoint weights; only media-token-specific memory is excluded in v1.
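The resident-weight portion of that estimate follows directly from the spec line: 900M parameters stored in BF16 at 2 bytes each, vision stack included, whether or not any media tokens are processed.

```python
# Resident weight footprint of the BF16 checkpoint, as counted by the
# text-only estimate: every parameter stays loaded, vision stack included.
params = 900_000_000        # 900M total (dense), from the spec line
bytes_per_param = 2         # BF16 = 2 bytes per parameter
weight_bytes = params * bytes_per_param
print(f"{weight_bytes / 1e9:.1f} GB")  # 1.8 GB
```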
Checkpoints
Official profiles
Official BF16 checkpoint
Qwen ships Qwen3.5-0.8B in Hugging Face Transformers format with documented Transformers and vLLM usage.