FitMyGPU

OpenAI

GPT-OSS 20B

The smaller GPT-OSS release, aimed at general-purpose and reasoning workloads that must fit a single-card memory budget of roughly 16 GB.

Overview and architecture

What it is

Company

OpenAI

Family

GPT-OSS

Release date

Aug 4, 2025

Architecture

Mixture-of-experts transformer

License

Apache 2.0

Modality

Text

Context window

128,000 tokens

Total params

21B

Active params

3.6B

Layers

24

Hidden size

2,880

Attention heads

64

KV heads

8

KV-bearing layers

24
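
To make the attention figures above concrete, here is a minimal sketch of a per-sequence KV cache estimate. The layer count, KV head count, and context window come from this page. The per-head dimension (`HEAD_DIM = 64`) and the 2-byte cache precision are assumptions that this page does not state, and the estimate treats every layer as caching the full context, so it should be read as an upper bound rather than a measured number.

```python
# Figures taken from the spec list above.
NUM_LAYERS = 24           # KV-bearing layers
NUM_KV_HEADS = 8
CONTEXT_TOKENS = 128_000

# Assumptions (not stated on this page).
HEAD_DIM = 64             # assumed per-head dimension
BYTES_PER_VALUE = 2       # assumed 16-bit cache entries


def kv_cache_bytes(tokens: int) -> int:
    """Upper-bound KV cache size in bytes for one sequence of `tokens` tokens."""
    # Factor 2 covers keys plus values.
    per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE
    return per_token * tokens


if __name__ == "__main__":
    for tokens in (8_192, 32_768, CONTEXT_TOKENS):
        gib = kv_cache_bytes(tokens) / 2**30
        print(f"{tokens:>7} tokens -> {gib:.2f} GiB")
```

Under these assumptions each token costs 48 KiB of cache, which is why long contexts can add several gigabytes on top of the weights.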

Research highlight

What improved

Low-memory GPT-OSS entry point

The headline change at release is that GPT-OSS capability fits in roughly 16 GB of memory instead of requiring an 80 GB-class accelerator.

Configurable reasoning effort

Like the larger model, gpt-oss-20b supports low, medium, and high reasoning effort settings so latency and reasoning depth can be traded off per use case.
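
As an illustration of how the effort setting might be chosen per request, the sketch below builds a system prompt that ends with a `Reasoning: <level>` line. It assumes the level is selected through the system prompt in that form; the helper names `reasoning_line` and `build_system_prompt` are hypothetical and are not part of any OpenAI library.

```python
VALID_EFFORTS = ("low", "medium", "high")


def reasoning_line(effort: str) -> str:
    """Return the system-prompt line for a reasoning effort level."""
    level = effort.strip().lower()
    if level not in VALID_EFFORTS:
        raise ValueError(f"effort must be one of {VALID_EFFORTS}, got {effort!r}")
    return f"Reasoning: {level}"


def build_system_prompt(base: str, effort: str = "medium") -> str:
    """Append the reasoning line to a base system prompt (hypothetical helper)."""
    return f"{base.rstrip()}\n\n{reasoning_line(effort)}"


if __name__ == "__main__":
    # Lower effort for latency-sensitive chat, higher effort for hard problems.
    print(build_system_prompt("You are a helpful assistant.", "low"))
    print(build_system_prompt("You are a careful math tutor.", "high"))
```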

Native agent features

The smaller release still keeps the same first-class agent surface: function calling, web browsing, Python execution, and structured outputs.

Full chain-of-thought access

OpenAI exposes the reasoning trace for debugging and trust, even though it is not intended for direct end-user display.

Training and release context

How it was released

Harmony-only format

Both GPT-OSS models were trained on OpenAI's Harmony response format and are expected to be used with that format rather than a generic chat template.

Model geometry

gpt-oss-20b uses 24 layers, 21B total parameters, 3.6B active parameters per token, 32 total experts, 4 active experts per token, and a 128K context window.
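
The sparsity implied by these numbers can be checked with a few lines of arithmetic. This sketch uses only the figures quoted above; the variable names are illustrative.

```python
# Figures quoted in the geometry summary above.
TOTAL_PARAMS = 21e9
ACTIVE_PARAMS = 3.6e9
TOTAL_EXPERTS = 32
ACTIVE_EXPERTS = 4

# Share of experts that run for each token.
expert_fraction = ACTIVE_EXPERTS / TOTAL_EXPERTS

# Share of all parameters that take part in each token's forward pass.
param_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

if __name__ == "__main__":
    print(f"experts active per token:    {expert_fraction:.1%}")
    print(f"parameters active per token: {param_fraction:.1%}")
```

The active-parameter share comes out higher than the active-expert share, which is consistent with non-expert weights (such as attention and embeddings) running for every token regardless of routing.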

Quantized MoE release

The model was post-trained with its MoE weights quantized to MXFP4, which is the release decision that makes the smaller checkpoint practical in roughly 16 GB of memory.

Training data and tokenizer

OpenAI describes the training mix as mostly English, text-only data with emphasis on STEM, coding, and general knowledge, tokenized with the open-sourced o200k_harmony tokenizer.

Where it is strong

Smaller-memory deployment

Best fit when you want GPT-OSS reasoning and agent behavior without stepping into 80 GB class hardware first.

General-purpose assistant work

Designed as a broad open assistant and reasoning model rather than a narrow specialist checkpoint.

Fine-tuning and customization

OpenAI positions the model as fine-tunable, which makes it useful when a smaller open reasoning model needs to be adapted to a specific task.

Commercial deployment

The Apache 2.0 license keeps experimentation and product deployment straightforward for teams that want permissive usage terms.

Memory behavior

What dominates VRAM

More than 90% of GPT-OSS 20B's parameters sit in MoE weights quantized to MXFP4, while the remaining shared weights stay in BF16.
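
A rough weight-memory estimate follows from that split. The sketch below assumes exactly 90% of the 21B parameters are MoE weights (the page says "more than 90%", so this leans toward overestimating), and that MXFP4 stores 4-bit elements with one shared 8-bit scale per 32-element block, or 4.25 bits per parameter. It ignores KV cache, activations, and runtime overhead.

```python
TOTAL_PARAMS = 21e9

# Assumption: the page says "more than 90%", so 0.90 is a conservative share.
MOE_SHARE = 0.90

# Assumption: 4-bit elements plus one 8-bit scale per block of 32 elements.
MXFP4_BITS = 4 + 8 / 32
BF16_BITS = 16


def weight_bytes(total_params: float, moe_share: float) -> float:
    """Estimate weight storage with MoE weights in MXFP4 and the rest in BF16."""
    moe = total_params * moe_share * MXFP4_BITS / 8
    shared = total_params * (1 - moe_share) * BF16_BITS / 8
    return moe + shared


if __name__ == "__main__":
    mixed = weight_bytes(TOTAL_PARAMS, MOE_SHARE)
    all_bf16 = weight_bytes(TOTAL_PARAMS, 0.0)
    print(f"MXFP4 MoE + BF16 shared: {mixed / 1e9:.1f} GB")
    print(f"everything in BF16:      {all_bf16 / 1e9:.1f} GB")
```

Under these assumptions the weights land a little above 14 GB, against 42 GB if everything were kept in BF16. That gap is what brings the model within reach of a 16 GB card, with limited headroom left for the KV cache.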
