Gemma
Gemma 2 27B
Larger Gemma model that trades a shorter native context window for more capacity per token.
Overview and architecture
What it is
Company
Family
Release date
Architecture
License
Modality
Context window
Total params
Active params
Layers
Hidden size
Attention heads
KV heads
KV-bearing layers
Research highlight
What improved
Compact open model line
Google positions Gemma 2 27B as part of a lightweight open family derived from Gemini-era research, aimed at getting strong dense-model quality from smaller deployment footprints.
Efficiency over frontier scale
The release emphasis is efficient open deployment and good quality-per-parameter, not sparse routing, multimodal fusion, or ultra-long-context serving.
Instruction-tuned product focus
The instruction variants are framed as practical developer models, so the story is real deployment usability rather than experimental architecture novelty.
Training and release context
How it was released
Open-model tier
Gemma 2 sits in Google's smaller open model line rather than in the flagship Gemini product tier.
Architecture continuity
The family stays within a straightforward dense-transformer deployment pattern and does not depend on sparse or hybrid serving mechanics.
Release packaging
The instruction-tuned variants are packaged as practical deployment checkpoints rather than as research-preview artifacts.
Where it is strong
Where it is strong
Efficiency
Strong quality-per-parameter for teams that want a smaller dense model footprint.
General language tasks
Useful as a compact open baseline for assistant-style and retrieval-augmented applications.
Operational simplicity
Straightforward dense checkpoints make the family easier to reason about than more exotic architectures.
Memory behavior
What dominates VRAM
Because the context window is shorter, most VRAM pressure comes from resident weights rather than cache growth.
Sources