Qwen 3.6 27B
A Qwen3.6 dense hybrid release focused on coding-agent stability, repository-level reasoning, and preserved thinking context across longer development sessions.
Overview and architecture
What it is
Company
Family
Release date
Architecture
License
Modality
Context window
Total params
Active params
Layers
Hidden size
Attention heads
KV heads
KV-bearing layers
Training scope
Built as a unified vision-language foundation with pre-training and post-training on multimodal tokens rather than a separate late-fusion stack.
Hybrid layout
16 of 64 layers use gated attention while the rest use Gated DeltaNet blocks, so the stack is not a full-attention transformer end to end.
Context design
Published with a native 262K context window and an architecture designed to extrapolate beyond that length in longer-context settings.
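The hybrid split above can be sketched as a layer map. The exact interleaving of attention and DeltaNet layers is not published on this page, so the every-fourth-layer placement below is purely an assumed illustration of the 16-of-64 ratio.

```python
# Sketch of the 64-layer hybrid stack: 16 gated-attention layers and
# 48 Gated DeltaNet layers. The interleaving pattern (one attention
# layer every 4th position) is an ASSUMPTION for illustration only.
N_LAYERS = 64

def layer_kind(i: int) -> str:
    # Assumed pattern: every 4th layer is gated attention.
    return "gated_attention" if i % 4 == 3 else "gated_deltanet"

stack = [layer_kind(i) for i in range(N_LAYERS)]
kv_layers = [i for i, kind in enumerate(stack) if kind == "gated_attention"]

print(len(kv_layers))                # 16 KV-bearing layers
print(N_LAYERS - len(kv_layers))     # 48 linear-attention layers
```

Only the gated-attention positions accumulate a per-token KV cache; the DeltaNet positions carry a fixed-size recurrent state instead.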
Research highlight
What improved
Agentic coding upgrade
Qwen3.6 is framed around better coding-agent behavior, especially frontend workflows and repository-level reasoning, rather than around a broad architectural reset.
Thinking preservation
The release adds an option to preserve reasoning context across prior messages, which matters for iterative development workflows and multi-turn tool use.
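The preserve-reasoning option can be pictured as a history-building step that keeps, rather than strips, prior reasoning when assembling the next turn. The message schema below (a `reasoning` field alongside `content`) is hypothetical and not the actual Qwen chat template.

```python
# Hedged sketch of "thinking preservation": carry prior reasoning in the
# running history instead of discarding it each turn. The message schema
# ("reasoning" field, role names) is an assumed illustration, not the
# real Qwen template.
def build_history(turns, preserve_thinking: bool):
    history = []
    for turn in turns:
        msg = {"role": turn["role"], "content": turn["content"]}
        if preserve_thinking and "reasoning" in turn:
            msg["reasoning"] = turn["reasoning"]  # kept across turns
        history.append(msg)
    return history

turns = [
    {"role": "user", "content": "Refactor util.py"},
    {"role": "assistant", "content": "Done.", "reasoning": "The helper is unused, so..."},
    {"role": "user", "content": "Now add tests"},
]

kept = build_history(turns, preserve_thinking=True)
dropped = build_history(turns, preserve_thinking=False)
print("reasoning" in kept[1], "reasoning" in dropped[1])  # True False
```

With preservation on, the model sees why it made earlier edits instead of rebuilding that reasoning from scratch every turn.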
Stability over novelty
Qwen presents 3.6 as the first open-weight follow-up to Qwen3.5 built from community feedback, with more emphasis on dependable real-world utility than on introducing a new model family.
Training and release context
How it was released
Release lineage
Qwen3.6 is a direct successor to the February Qwen3.5 series rather than a separate architecture branch, and it keeps the same unified multimodal release format.
Architecture continuity
The line still uses the hybrid DeltaNet-plus-attention recipe, so the serving geometry stays governed by partial KV layers plus static sequence state rather than by full-attention on every layer.
Deployment target
Qwen explicitly packages the release for Transformers, vLLM, SGLang, and related serving stacks, which signals an operationally mature release rather than a research-only drop.
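For the vLLM path, serving reduces to a one-line launch. `vllm serve` and the flags shown are real vLLM CLI options, but the model identifier below is an assumption derived from the release name, not a confirmed Hugging Face path.

```shell
# Hedged serving sketch: flags are standard vLLM options; the model id
# is an ASSUMED placeholder, not a verified repository path.
vllm serve Qwen/Qwen3.6-27B \
  --max-model-len 262144 \
  --tensor-parallel-size 2
```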
Where it is strong
Coding agents
The line is tuned most visibly for repository work, frontend changes, tool use, and multi-step coding-agent flows.
Iterative reasoning
Thinking preservation makes the release better suited to long back-and-forth development sessions where reasoning context should not be rebuilt from scratch every turn.
Long-context hybrid serving
It keeps the hybrid long-context advantage of Qwen3.5 while shifting the capability story toward developer productivity and stability.
Memory behavior
What dominates VRAM
Checkpoint weights dominate: this text-only estimate still keeps the resident multimodal checkpoint weights on the card, so the floor is higher than that of a pure language-only artifact of similar active size.
Only 16 of 64 layers carry a standard KV cache. The remaining layers contribute a fixed sequence-state term instead, which makes long-context growth less aggressive than a dense full-attention stack.
Longer context and higher concurrency still increase memory monotonically, but more of the footprint shifts into mixed KV-plus-state behavior instead of pure transformer cache expansion.
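The partial-KV effect above is easy to put in numbers. The KV head count and head dimension are not published on this page, so the values below (8 KV heads, head dim 128, fp16) are assumptions chosen only to show the 16-versus-64-layer gap at the native context length.

```python
# Back-of-envelope KV-cache arithmetic for the partial-KV layout.
# KV_HEADS and HEAD_DIM are ASSUMED values for illustration; the page
# does not publish them. BYTES=2 corresponds to fp16 cache entries.
KV_HEADS, HEAD_DIM, BYTES = 8, 128, 2
CONTEXT = 262_144  # native window

def kv_cache_gib(kv_layers: int, tokens: int) -> float:
    per_token_per_layer = 2 * KV_HEADS * HEAD_DIM * BYTES  # K and V
    return kv_layers * per_token_per_layer * tokens / 2**30

hybrid = kv_cache_gib(16, CONTEXT)  # only 16 of 64 layers hold KV
dense = kv_cache_gib(64, CONTEXT)   # hypothetical full-attention stack
print(hybrid, dense)  # 16.0 64.0 (GiB, under these assumptions)
```

Under these assumed dimensions, the hybrid stack caches 16 GiB at full context where an all-attention stack of the same geometry would cache 64 GiB; the DeltaNet layers add only a fixed sequence-state term on top.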
FitMyGPU currently treats this as a text-only estimate. Resident multimodal weights remain counted, but media-token overhead is excluded.
Sources