
OpenAI Launches GPT-5.4 Mini and Nano for High-Volume AI Workloads



Peter Zhang Mar 17, 2026 18:05

OpenAI releases GPT-5.4 mini and nano models with 2x faster speeds and dramatically lower costs, targeting coding assistants and agentic AI systems.


OpenAI dropped its most cost-efficient models yet on March 17, 2026—GPT-5.4 mini and nano—targeting developers building latency-sensitive applications where the flagship model's horsepower becomes overkill.

The mini variant runs more than twice as fast as GPT-5 mini while approaching the full GPT-5.4's performance on coding benchmarks. On SWE-Bench Pro, mini scored 54.4% compared to the flagship's 57.7%—a narrow gap that matters when you're paying 75 cents per million input tokens instead of premium rates.

Nano goes even cheaper at $0.20 per million input tokens and $1.25 per million output tokens. OpenAI positions it for classification, data extraction, and what they call "coding subagents"—smaller AI workers handling simpler tasks within larger systems.
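The per-token prices quoted above make the cost gap easy to quantify. The sketch below is a minimal illustration using only the figures stated in this article (mini at $0.75 per million input tokens; nano at $0.20 input and $1.25 output per million); mini's output price is not given here, so only input costs are compared.

```python
def token_cost(tokens: int, price_per_million: float) -> float:
    """Cost in USD for a token count at a given per-million-token price."""
    return tokens / 1_000_000 * price_per_million

# Example workload: 10M input tokens per day of classification traffic.
daily_input = 10_000_000
mini_input_cost = token_cost(daily_input, 0.75)  # mini: $0.75/M input
nano_input_cost = token_cost(daily_input, 0.20)  # nano: $0.20/M input

print(mini_input_cost, nano_input_cost)  # 7.5 2.0 (USD per day)
```

At this volume, routing classification traffic to nano instead of mini cuts the input bill from $7.50 to $2.00 a day, which is the kind of arithmetic driving the subagent architecture described below.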

The Subagent Play

Here's where this gets interesting for developers building agentic systems. OpenAI is explicitly pushing a tiered architecture: let GPT-5.4 handle planning and complex judgment while mini or nano subagents execute narrower tasks in parallel. In the Codex platform, mini counts against only 30% of the GPT-5.4 usage quota.

The benchmark numbers back this up. Mini hit 72.1% on OSWorld-Verified for computer use tasks—nearly matching the flagship's 75%—while nano dropped to 39%. Translation: mini can interpret screenshots and navigate interfaces almost as well as the big model, but nano shouldn't touch those workflows.
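The tiered pattern described above can be sketched as a simple router that assigns each subtask to a model tier. This is an illustrative sketch only: the model identifier strings and routing categories are assumptions for the example, not OpenAI's published API, and the tier assignments follow the article's benchmark guidance (mini for coding, tool calls, and computer use; nano for classification and extraction, kept away from interface workflows).

```python
# Hypothetical model identifiers -- assumed for illustration.
FLAGSHIP = "gpt-5.4"       # planning and complex judgment
MINI = "gpt-5.4-mini"      # coding, tool-calling, computer use
NANO = "gpt-5.4-nano"      # classification, data extraction

def route(task_kind: str) -> str:
    """Pick a model tier for a subtask, per the tiering the article describes."""
    if task_kind in {"planning", "complex_reasoning"}:
        return FLAGSHIP
    if task_kind in {"coding", "tool_call", "computer_use"}:
        return MINI    # near-flagship on SWE-Bench Pro and OSWorld-Verified
    if task_kind in {"classification", "extraction"}:
        return NANO    # cheapest tier; not suited to interface navigation
    return FLAGSHIP    # default up, not down, when the task is ambiguous
```

In a real agentic system the orchestrator would call `route` per subtask and dispatch parallel requests to the chosen models; the point of the sketch is just the tiering logic.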

Where Each Model Fits

The performance spread tells you exactly what OpenAI optimized for:

Mini excels at coding (54.4% SWE-Bench Pro, 60% Terminal-Bench 2.0) and tool-calling (93.4% on τ2-bench telecom tasks). It supports a 400k context window with text and image inputs, web search, and function calling.

Nano trades capability for cost efficiency. It scored 52.4% on SWE-Bench Pro and 46.3% on Terminal-Bench 2.0—respectable for a model at roughly a quarter of mini's price. But its long-context performance drops significantly, hitting just 33.1% on the 128K-256K needle retrieval test.

Hebbia's CTO Aabhas Sharma noted that mini "matched or exceeded competitive models on several output tasks and citation recall at a much lower cost" while achieving "stronger source attribution than the larger GPT-5.4 model."

Availability

Mini is live across the API, Codex, and ChatGPT. Free and Go users can access it through the Thinking feature; other tiers get it as a rate-limit fallback for GPT-5.4 Thinking.

Nano remains API-only—a signal that OpenAI sees it primarily as infrastructure for developers rather than a consumer-facing product.

For teams running high-volume AI workloads, the math just changed. The question isn't whether to use smaller models anymore—it's figuring out which tasks actually need the flagship.
