Qwen
Qwen 3 30B A3B
Qwen3 mixture-of-experts (MoE) release with 30.5B total parameters, of which 3.3B are activated per token, built to need less per-token compute than a dense model of similar total size.
Overview and architecture
What it is
Company
Family
Release date
Architecture
License
Modality
Context window
Total params
Active params
Layers
Hidden size
Attention heads
KV heads
KV-bearing layers
Research highlight
What improved
MoE branch of Qwen3
This model brings Qwen3 to a sparse MoE architecture while keeping the same user-facing thinking and non-thinking modes.
Low active path
Only 3.3B of the 30.5B total parameters (about 11%) are activated per token, which is the main deployment difference from the dense Qwen3 models.
Agent and reasoning focus
Qwen still positions the model for reasoning, instruction following, and complex agent workflows rather than only general chat.
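The "Low active path" figures above can be checked with a little arithmetic. The sketch below uses only the two parameter counts quoted in this card. The function names and the "2 FLOPs per activated parameter" rule of thumb are illustrative assumptions, not figures published by Qwen.

```python
# Parameter counts as stated in this card, in billions.
TOTAL_PARAMS_B = 30.5
ACTIVE_PARAMS_B = 3.3


def active_fraction(active_b: float, total_b: float) -> float:
    """Share of the parameter pool that is activated for each token."""
    if total_b <= 0 or not 0 <= active_b <= total_b:
        raise ValueError("expected total > 0 and 0 <= active <= total")
    return active_b / total_b


def approx_forward_flops_per_token(active_b: float) -> float:
    """Rough forward-pass cost per token.

    Assumption (rule of thumb, not a Qwen figure): about 2 FLOPs per
    activated parameter, ignoring attention over the context.
    """
    return 2.0 * active_b * 1e9


if __name__ == "__main__":
    frac = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
    flops = approx_forward_flops_per_token(ACTIVE_PARAMS_B)
    print(f"active fraction: {frac:.1%}")
    print(f"approx forward FLOPs per token: {flops:.2e}")
```

Under these assumptions, roughly one parameter in nine takes part in each token's forward pass.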
Training and release context
How it was released
MoE family branch
Qwen3 includes dedicated MoE models alongside the dense line. They keep the same user-facing thinking and non-thinking modes but change the serving profile substantially: the full parameter pool stays resident while only a subset is active per token.
Sparse activation
The MoE releases expose total and activated parameter counts separately, which is the key deployment distinction versus the dense Qwen3 models.
Long-context packaging
The base MoE releases ship with a 32K-token native context, extendable to 131K tokens with YaRN, while the 2507 update ships with a 256K-token native context.
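The 32K-to-131K extension implies a YaRN scaling factor, sketched below. This is a minimal illustration and rests on two assumptions: that "32K" and "131K" mean 32,768 and 131,072 tokens, and that the key names follow the `rope_scaling` convention used in Hugging Face Transformers configs. Check both against the official model card before relying on them. The helper name `yarn_rope_scaling` is hypothetical.

```python
NATIVE_CONTEXT = 32_768     # assumed exact value behind "32K"
EXTENDED_CONTEXT = 131_072  # assumed exact value behind "131K"


def yarn_rope_scaling(native_ctx: int, target_ctx: int) -> dict:
    """Build a rope_scaling-style dict for extending context with YaRN.

    The key names are an assumption based on the Transformers config
    convention; they are not taken from this card.
    """
    if native_ctx <= 0 or target_ctx < native_ctx:
        raise ValueError("expected 0 < native_ctx <= target_ctx")
    return {
        "rope_type": "yarn",
        "factor": target_ctx / native_ctx,
        "original_max_position_embeddings": native_ctx,
    }


if __name__ == "__main__":
    print(yarn_rope_scaling(NATIVE_CONTEXT, EXTENDED_CONTEXT))
```

With the assumed token counts the factor comes out to 4.0. If most requests fit within the native window, leaving the scaling off is usually the simpler choice.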
Where it is strong
Reasoning with lower active compute
The MoE line targets users who want a larger total parameter count without paying a dense model's per-token compute cost.
Agent and tool use
Qwen still positions the MoE branch around agent workflows, tool calling, and mixed reasoning/general dialogue use.
Large multilingual serving
Useful when you want high-capacity multilingual serving without moving to a fully dense 70B+ model.
Memory behavior
What dominates VRAM
Resident VRAM tracks the full 30.5B-parameter pool, because all expert weights normally stay loaded, even though per-token compute is closer to the 3.3B activated path. MoE therefore reduces compute pressure far more than it lowers the weight-memory floor.
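A back-of-the-envelope estimate makes the weight floor concrete. The sketch below covers weights only and assumes a uniform number of bytes per parameter for each precision. KV cache, activations, and runtime overhead are deliberately left out, and the precision table is illustrative rather than a list of officially published checkpoints.

```python
TOTAL_PARAMS = 30.5e9   # full parameter pool, as stated in this card
ACTIVE_PARAMS = 3.3e9   # parameters activated per token, as stated in this card

# Assumed storage cost per parameter for a few common precisions.
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_gib(num_params: float, dtype: str) -> float:
    """Weight memory in GiB for a parameter count at a given precision."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        resident = weight_gib(TOTAL_PARAMS, dtype)
        touched = weight_gib(ACTIVE_PARAMS, dtype)
        print(f"{dtype}: resident {resident:.1f} GiB, "
              f"touched per token {touched:.1f} GiB")
```

At BF16 this puts the resident weights at roughly 56.8 GiB, while the weights touched for any single token amount to only about 6.1 GiB. That gap is why the memory floor follows the total count and the compute cost follows the active count.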