Microsoft Phi
Phi-4 14B
Reasoning-oriented dense Phi model with moderate context length and a straightforward single-GPU footprint.
Overview and architecture
What it is
Company
Family
Release date
Architecture
License
Modality
Context window
Total params
Active params
Layers
Hidden size
Attention heads
KV heads
KV-bearing layers
Research highlight
What improved
Reasoning-per-parameter focus
Microsoft positions Phi-4 as delivering unusually strong reasoning and coding quality for its size, so the release story is capability per parameter rather than frontier-scale parameter counts.
Synthetic and curated data mix
The model card emphasizes the training recipe itself, especially high-quality synthetic and curated data for math, code, instruction following, and commonsense tasks.
Straightforward dense deployment
Phi-4 does not introduce sparse routing or hybrid attention; the practical angle is that it stays a normal dense deployment target while aiming for stronger reasoning than many peers in its class.
Training and release context
How it was released
Data-centric release
Phi-4 is framed heavily around its synthetic and curated training recipe rather than around a radical architecture change.
Architecture continuity
The family stays close to a conventional dense-transformer deployment story rather than introducing sparse or hybrid serving behavior.
Packaging path
Microsoft complements the BF16 release with an official ONNX INT4 path, so lower-VRAM deployment is part of the release packaging itself.
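To see why an INT4 path matters for lower-VRAM deployment, the weight footprint can be estimated from the parameter count alone. The sketch below is a back-of-the-envelope calculation, not a measurement: it takes the nominal 14B figure from the model name, assumes 2 bytes per parameter for BF16 and 0.5 bytes for INT4, and ignores quantization scales, zero-points, and any tensors an actual INT4 export keeps at higher precision. The name `weight_gib` is illustrative.

```python
# Rough weight-only footprint at two precisions (assumption-laden estimate).
PARAMS = 14e9  # nominal count taken from the "Phi-4 14B" name

# Bytes per parameter: BF16 is 16 bits, INT4 is 4 bits.
# Real INT4 exports add scale/zero-point overhead that is ignored here.
BYTES_PER_PARAM = {"bf16": 2.0, "int4": 0.5}


def weight_gib(params: float, precision: str) -> float:
    """Return the approximate weight size in GiB for the given precision."""
    return params * BYTES_PER_PARAM[precision] / 2**30


if __name__ == "__main__":
    for precision in BYTES_PER_PARAM:
        print(f"{precision}: {weight_gib(PARAMS, precision):.1f} GiB")
```

Under these assumptions the BF16 weights come to roughly 26 GiB and the INT4 weights to roughly 6.5 GiB, a 4x reduction before runtime overhead and cache are counted.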
Where it is strong
Reasoning density
Strong per-parameter reasoning is the main reason to consider Phi-4.
Coding and math
The release is consistently framed around quantitative and code-heavy capability.
Smaller deployment footprint
Useful when teams want a serious reasoning model without taking on the VRAM requirements of 30B+ parameter models.
Memory behavior
What dominates VRAM
With a moderate context window, the model behaves like a classic dense checkpoint: the weights account for most of the VRAM, and the KV cache remains a secondary cost even at full context.
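The weights-versus-cache balance can be checked with the standard estimate for a dense transformer: the KV cache stores one key and one value vector per KV head, per KV-bearing layer, per token. The sketch below is a rough calculation under stated assumptions, not a measurement. The configuration values (40 KV-bearing layers, 10 KV heads, head dimension 128, 16,384-token context, 2-byte cache entries) are assumptions that do not come from this page and should be checked against the published model config; the function names are illustrative.

```python
# Weights vs. KV cache estimate for a dense transformer (sketch, assumed config).

def weight_bytes(params: float, bytes_per_param: float = 2.0) -> float:
    """Approximate weight size in bytes (default 2 bytes/param, i.e. BF16)."""
    return params * bytes_per_param


def kv_cache_bytes(kv_layers: int, kv_heads: int, head_dim: int,
                   tokens: int, bytes_per_value: int = 2, batch: int = 1) -> int:
    """Approximate KV cache size in bytes.

    The leading factor of 2 covers one key tensor plus one value tensor
    for each KV-bearing layer.
    """
    return 2 * kv_layers * kv_heads * head_dim * tokens * bytes_per_value * batch


# ASSUMED values, not taken from this page; verify against the model config.
ASSUMED_CONFIG = dict(kv_layers=40, kv_heads=10, head_dim=128, tokens=16_384)

if __name__ == "__main__":
    weights = weight_bytes(14e9)               # nominal 14B parameters, BF16
    cache = kv_cache_bytes(**ASSUMED_CONFIG)   # single sequence at full context
    print(f"weights:  {weights / 2**30:.1f} GiB")
    print(f"kv cache: {cache / 2**30:.1f} GiB")
    print(f"cache / weights: {cache / weights:.1%}")
```

With these assumed values, a single full-context sequence needs a few GiB of cache against roughly 26 GiB of BF16 weights. The cache term scales linearly with `batch`, so the balance shifts under heavily batched serving.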