FitMyGPU

Qwen

Qwen 3 30B A3B Instruct 2507

Non-thinking Qwen3 MoE update with stronger general capabilities, better alignment, and a native 256K context window.

Overview and architecture

What it is

Company

Qwen

Family

Qwen

Release date

Jul 28, 2025

Architecture

Mixture-of-experts transformer

License

Apache 2.0

Modality

Text

Context window

262,144

Total params

30.5B

Active params

3.3B

Layers

48

Hidden size

2,048

Attention heads

32

KV heads

4

KV-bearing layers

48
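The geometry above is enough to estimate the per-token KV-cache footprint. A minimal sketch, assuming the standard grouped-query attention layout where head dimension = hidden size / attention heads (a common convention, not confirmed on this page):

```python
# Derive per-token KV-cache size from the listed geometry.
# Assumption: head_dim = hidden_size / attention_heads.
layers = 48       # KV-bearing layers
hidden = 2048     # hidden size
heads = 32        # attention heads
kv_heads = 4      # KV heads (GQA)
head_dim = hidden // heads  # 64

# K and V tensors, per layer, per token, at fp16/bf16 (2 bytes/element)
bytes_per_token = 2 * layers * kv_heads * head_dim * 2
print(bytes_per_token)  # 49152 bytes, i.e. 48 KiB per cached token
```

With only 4 KV heads against 32 query heads, GQA keeps the cache at 48 KiB per token rather than the 384 KiB full multi-head attention would require.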

Research highlight

What improved

Non-thinking update

This release is explicitly the non-thinking-mode update and no longer requires users to force thinking off at inference time.

General-capability uplift

Qwen describes stronger instruction following, logical reasoning, comprehension, mathematics, science, coding, and tool use than the earlier non-thinking version.

256K native context

The update is packaged with 256K native context, making long-context serving more central than in the base 30B-A3B release.

Training and release context

How it was released

Release lineage

This is an updated non-thinking-mode variant of Qwen3-30B-A3B rather than a brand-new architecture branch.

MoE geometry

The model keeps the same 30.5B total / 3.3B active parameter geometry, 48 layers, 128 experts, and 8 activated experts as the base A3B release.
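A quick check of those numbers shows why the active-parameter fraction (about 11%) is larger than the activated-expert fraction (6.25%): attention, embeddings, and router weights run for every token regardless of expert routing. A small sketch under those published figures:

```python
# Published MoE geometry for Qwen3-30B-A3B.
total_params = 30.5e9
active_params = 3.3e9
experts = 128
active_experts = 8

expert_fraction = active_experts / experts     # 0.0625 of experts fire per token
param_fraction = active_params / total_params  # ~0.108 of weights are active

# The gap between the two fractions is the always-active share:
# attention, embeddings, and routing are used for every token.
print(expert_fraction, round(param_fraction, 3))
```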

Output behavior

Qwen notes that this update no longer emits <think></think> blocks and is intended as a cleaner non-thinking deployment target.

Where it is strong

General assistant quality

Best fit when you want the Qwen3 MoE branch without exposing explicit thinking-mode behavior in outputs.

Tool and workflow use

Qwen emphasizes stronger tool usage, instruction following, and text generation alignment in this update.

Long-context non-thinking serving

The 256K native context makes it useful for long-input assistant workflows where explicit reasoning blocks are not desired.

Memory behavior

What dominates VRAM

Resident VRAM still tracks the full 30.5B-parameter MoE checkpoint, but with 256K native context, KV-cache growth becomes far more visible during long-context serving than in the base 32K-native release.
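To put that cache growth in concrete terms, a sketch of the KV-cache cost of one full-context sequence, assuming fp16/bf16 cache and head dim = hidden size / attention heads (hypothetical serving setup; actual engines may quantize or page the cache):

```python
# KV cache for a single sequence at the full 262,144-token context.
# 2 tensors (K+V) * 48 layers * 4 KV heads * head dim 64 * 2 bytes (fp16)
bytes_per_token = 2 * 48 * 4 * (2048 // 32) * 2  # 49,152 bytes/token
context = 262_144

kv_total_gib = bytes_per_token * context / 2**30
print(kv_total_gib)  # 12.0 GiB of cache for one maxed-out sequence
```

At roughly 12 GiB per full-context sequence, the cache can rival the quantized weights themselves, which is why cache handling dominates planning for long-context deployments of this model.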
