Qwen
Qwen 3 4B
Dense Qwen3 release with switchable thinking modes, stronger reasoning, and 131K extended-context support through YaRN.
Overview and architecture
What it is
Company: Alibaba (Qwen team)
Family: Qwen3
Release date: April 2025
Architecture: Dense decoder-only transformer
License: Apache 2.0
Modality: Text in, text out
Context window: 32K native, 131K with YaRN
Total params: ~4B
Active params: ~4B (dense, all parameters active per token)
Layers
Hidden size
Attention heads
KV heads
KV-bearing layers
Research highlight
What improved
Thinking-mode switch
The 4B model keeps the family’s dual-mode design, letting one checkpoint cover deeper reasoning and faster everyday dialogue.
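As a minimal sketch, assuming the enable_thinking flag that Qwen documents for the Qwen3 chat template in Hugging Face transformers, the switch looks like this; the prompt content is illustrative.

```python
# Sketch only: assumes Qwen3's chat template exposes an enable_thinking flag,
# as described on Qwen's model cards. Verify against the current template.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B")
messages = [{"role": "user", "content": "Prove that sqrt(2) is irrational."}]

# Thinking mode: the template adds the reasoning scaffold before the answer.
thinking_prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)

# Non-thinking mode: same checkpoint, formatted for a fast direct answer.
direct_prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
)
```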
Reasoning and agent uplift
Qwen emphasizes stronger reasoning, tool use, and agent-task performance over earlier Qwen2.5 instruct models.
Extended context with YaRN
The model is published with 32K native context, with 131K support through YaRN documented as part of its standard serving setup.
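A hedged sketch of what that extension looks like in practice: the rope_scaling override below follows the YaRN pattern Qwen's model cards describe (a 4x factor over the 32K base), and the exact keys should be checked against the current config.json.

```python
# Sketch of enabling YaRN-style context extension via a config override in
# Hugging Face transformers. The rope_scaling values follow the pattern Qwen
# documents (4x factor over 32K native positions); treat them as assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-4B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
    # Stretch the 32K native window toward ~131K tokens.
    rope_scaling={
        "rope_type": "yarn",
        "factor": 4.0,
        "original_max_position_embeddings": 32768,
    },
    max_position_embeddings=131072,
)
```

Qwen's model cards also caution that static YaRN scaling applies to short prompts as well, so the override is typically enabled only when long inputs are actually expected.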
Training and release context
How it was released
Family release
Qwen3 is released as a dense and MoE model family centered on switching between thinking and non-thinking modes within the same model.
Training stage
Qwen describes the release as a full pretraining-plus-post-training effort rather than a lightweight instruction-only adaptation of an existing base.
Context packaging
The 4B model is published with 32K native context and, like the larger dense variants, extends to 131K with YaRN.
Where it is strong
Thinking and non-thinking use
The 4B release is built to switch between deeper reasoning mode and faster general dialogue mode without changing models.
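Qwen also documents a per-turn soft switch, where /think and /no_think tags in user messages override the default mode for that turn; a tiny illustrative exchange (the ticket content is made up):

```python
# Illustrative multi-turn messages using the /think and /no_think soft
# switches Qwen describes; the ticket content is invented for the example.
messages = [
    {"role": "user", "content": "Summarize this support ticket in one line. /no_think"},
    {"role": "assistant", "content": "Customer reports login failures after the 2.3 update."},
    {"role": "user", "content": "Now reason through the most likely root cause. /think"},
]
```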
Agent workflows
Qwen positions the family for tool use and agent-style tasks in both thinking and non-thinking modes.
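As a hedged illustration of that positioning, transformers' apply_chat_template accepts a tools list that the Qwen3 template renders into a tool-calling prompt; the get_weather schema below is hypothetical, not something shipped with the model.

```python
# Sketch only: pass a tool schema through the chat template's tools argument.
# The get_weather function is a hypothetical example, not a Qwen3 built-in.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Hangzhou right now?"}]
prompt = tokenizer.apply_chat_template(
    messages, tools=tools, tokenize=False, add_generation_prompt=True
)
```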
Multilingual assistant work
The family is published with support for 100+ languages and dialects, making it a broad multilingual assistant line rather than a narrow specialist release.
Memory behavior
What dominates VRAM
This is still a dense model, so the weights set the VRAM floor; the main extra lever is the KV cache, which grows linearly with context length and becomes a major factor when YaRN-style serving pushes the window toward 131K.
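A back-of-the-envelope estimate makes this concrete. The layer, head, and head-dimension values below are commonly reported Qwen3-4B config numbers used here as assumptions; check them against config.json before planning capacity.

```python
# Rough KV-cache sizing for a dense model: 2 (K and V) x layers x KV heads
# x head_dim x tokens x bytes per element. The defaults are assumed
# Qwen3-4B config values (verify against config.json before relying on them).
def kv_cache_gib(tokens: int, layers: int = 36, kv_heads: int = 8,
                 head_dim: int = 128, bytes_per_elem: int = 2) -> float:
    return 2 * layers * kv_heads * head_dim * tokens * bytes_per_elem / 1024**3

print(f"32K context:  {kv_cache_gib(32_768):.1f} GiB of KV cache")
print(f"131K context: {kv_cache_gib(131_072):.1f} GiB of KV cache")
```

Under these assumed numbers, the fp16 KV cache at a full 131K window is comparable to or larger than the fp16 weight footprint itself, which is why long-context serving dominates the memory plan for this model.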
Sources