OpenAI

GPT-OSS 120B

Production GPT-OSS release for general-purpose and higher-reasoning workloads that can fit on a single 80 GB class GPU.

Overview and architecture

What it is

Company

OpenAI

Family

GPT-OSS

Release date

Aug 4, 2025

Architecture

Mixture-of-experts transformer

License

Apache 2.0

Modality

Text

Context window

128,000

Total params

117B

Active params

5.1B

Layers

Hidden size

2,880

Attention heads

KV heads

KV-bearing layers

Research highlight

What improved

Single-80GB target

The release is explicitly positioned around fitting production-grade reasoning into one 80 GB class GPU such as an H100 or MI300X, which is the main operational change versus a typical 100B+ open model.

Configurable reasoning effort

OpenAI exposes low, medium, and high reasoning effort settings so latency and reasoning depth can be traded off at inference time instead of using one fixed behavior.

Native agent features

The model is released with native support for function calling, web browsing, Python execution, and structured outputs rather than treating those as wrapper-level add-ons.

Full chain-of-thought access

The release provides access to the model's reasoning trace for debugging and auditability, though OpenAI notes it is not intended for direct end-user display.

Training and release context

How it was released

Harmony-only format

Both GPT-OSS models were trained on OpenAI's Harmony response format and are expected to be used with that format rather than a generic chat template.

Model geometry

gpt-oss-120b uses 36 layers, 117B total parameters, 5.1B active parameters per token, 128 total experts, 4 active experts per token, and a 128K context window.

Quantized MoE release

The MoE weights were post-trained in MXFP4, which is the packaging decision that makes the 120B checkpoint practical on a single 80 GB GPU.

Training data and tokenizer

OpenAI describes the training mix as mostly English, text-only data with emphasis on STEM, coding, and general knowledge, tokenized with the open-sourced o200k_harmony tokenizer.

Where it is strong

Production general-purpose serving

Best fit when you want one open model that can cover broad assistant, coding, and reasoning workloads without moving to multi-GPU serving first.

High-reasoning workloads

Strong match for use cases that benefit from controllable deeper reasoning rather than the fastest possible low-latency answers.

Fine-tuning and customization

OpenAI explicitly positions the model as fine-tunable, which matters if you want to adapt one large reasoning-capable checkpoint to a narrower production task.

Commercial deployment

The Apache 2.0 license makes it unusually straightforward to experiment, customize, and deploy commercially without copyleft friction.

Memory behavior

What dominates VRAM

More than 90% of GPT-OSS 120B's parameters sit in MXFP4-quantized MoE weights, while the remaining shared weights stay in BF16.

Sources

Where this page is grounded

https://openai.com/open-modelsopen https://huggingface.co/openai/gpt-oss-120bopen