OpenAI Ships GPT-5.5: The Agent That Actually Operates Your Computer

OpenAI released GPT-5.5 on Thursday, just seven weeks after GPT-5.4. The company is calling it "a new class of intelligence" — its first fully retrained base model since GPT-4.5, and the model that makes OpenAI's long-promised agent vision feel real instead of speculative.

The headline numbers: 82.7% on Terminal-Bench 2.0, 84.9% on GDPval, and 78.7% on OSWorld-Verified, the benchmark that measures whether a model can autonomously operate real computer environments. On Tau2-bench Telecom — a test of multi-turn customer-service workflows — GPT-5.5 hits 98.0% without any prompt tuning. Across those benchmarks, OpenAI says GPT-5.5 edges out Anthropic's Claude Opus 4.7 and Google's Gemini 3.1 Pro.

Context: The "Spud" That Finished Pretraining in Late March

This release has been telegraphed. OpenAI finished pretraining the model, codenamed "Spud" internally, in late March, and CEO Sam Altman has discussed it openly at employee all-hands meetings. The roughly seven-week window between GPT-5.4 (early March) and GPT-5.5 is the shortest gap between flagship releases in OpenAI's history.

The pace matters. Anthropic shipped Claude Opus 4.7 on April 16. Google announced its Gemini Enterprise Agent Platform and its 8th-generation TPUs at Cloud Next on April 22. GPT-5.5 lands in the middle of the most compressed week of frontier-model and infrastructure announcements the industry has ever seen.

What's Actually New

GPT-5.5 is built around one central premise: you give it a messy, multi-part task, and you trust it to plan, pick the right tools, check its own work, and keep going until the job is done. That's a meaningful shift from GPT-5.4, which was strong at single-turn reasoning but still needed hand-holding on long workflows.

The concrete capabilities OpenAI highlighted:

Computer use. The model can see a screen, click, type, and move between applications. The 78.7% OSWorld-Verified score is the highest any general-purpose model has posted on that benchmark — and OSWorld is specifically designed to be hard, with tasks spread across browsers, spreadsheets, file managers, and terminals.

Long, multi-step workflows. GPT-5.5 can operate across tools for extended runs — the kind of work where GPT-5.4 would lose the thread halfway through. OpenAI's framing: "messy, multi-part tasks."

Scientific and technical research. OpenAI claims meaningful gains on expert-level research workflows, with the model now capable of helping scientists make actual progress rather than just summarizing papers.

Documents and spreadsheets. The model produces finished outputs — populated .xlsx files, formatted .pdf reports — rather than descriptions of what those outputs should contain. It's the same "artifact generation" push that xAI shipped in Grok 4.3 Beta last week and Anthropic teased with Claude Design.

The context window expands to 1 million tokens, matching Gemini and the larger end of open-source releases.

Pricing: Double the Cost of GPT-5.4

Here's the part developers won't love: GPT-5.5 API pricing is $5 per million input tokens and $30 per million output tokens. That's exactly 2x GPT-5.4, which ran at $2.50 input and $15 output. GPT-5.5 Pro, the higher-accuracy variant, costs $30 input and $180 output — six times GPT-5.5 standard.
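To make the multiples concrete, here is a minimal sketch of the per-request arithmetic at the prices quoted above. The model labels are placeholder dictionary keys rather than confirmed API identifiers, and the token counts (200,000 in, 20,000 out) are a hypothetical agent run chosen purely for illustration.

```python
# Prices in USD per million tokens (input, output), as quoted in this article.
# The keys are illustrative labels, not confirmed API model identifiers.
PRICES = {
    "gpt-5.4": (2.50, 15.00),
    "gpt-5.5": (5.00, 30.00),
    "gpt-5.5-pro": (30.00, 180.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the listed per-million-token rates."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 200k tokens read, 20k tokens written.
    for model in PRICES:
        cost = request_cost(model, input_tokens=200_000, output_tokens=20_000)
        print(f"{model}: ${cost:.2f}")
```

Because both the input and output rates scale by the same factor, the 2x and 6x multiples hold for any mix of input and output tokens, not just this one.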

OpenAI hasn't said much about the pricing jump publicly, but the implied logic is straightforward. A model that can actually complete a multi-step workflow on its own is worth more per token than a model that needs a human in the loop. And the compute costs of a fully retrained base model — GPT-5.5 isn't a fine-tune, it's a new training run — are substantial.

In ChatGPT and Codex, GPT-5.5 rolls out starting Thursday to the Plus, Pro, Business, and Enterprise tiers. GPT-5.5 Pro ships to Pro, Business, and Enterprise only. OpenAI says API availability is coming "very soon," pending additional safety work.

Industry Impact

The people who should pay closest attention are enterprise buyers evaluating frontier models for automation projects. OSWorld-Verified isn't a purely academic benchmark: it maps directly to the "have an AI do my computer work" use case that every RPA vendor, customer-support platform, and back-office automation team is trying to build around. If GPT-5.5 can sustain something close to its 78.7% benchmark score on real production workloads, a lot of pilot projects suddenly have a much better chance of making it to rollout.

The competitive picture gets tighter. Claude Opus 4.7 and Gemini 3.1 Pro are both strong frontier models, and GPT-5.5's lead on the announced benchmarks is narrow rather than decisive. VentureBeat's early review noted that GPT-5.5 "narrowly beats Anthropic's Claude Mythos Preview on Terminal-Bench 2.0," another sign of how thin the margins are. In practice, the frontier is a three-way tie among OpenAI, Anthropic, and Google, with different strengths in different domains.

For the "super app" thesis OpenAI has been selling to enterprises, GPT-5.5 is the model that makes the pitch coherent. The vision has always been ChatGPT plus Codex plus an AI browser, unified into one thing a knowledge worker can send tasks to. That was hard to imagine with GPT-5.4. It's easier to imagine now.

What's Next

Watch for the API launch, which OpenAI said is "very soon" but hasn't dated. That's when the Terminal-Bench and OSWorld numbers get tested by actual developer workloads rather than OpenAI's own eval harness.

Watch for Anthropic's response. The company shipped Claude Opus 4.7 a week before this launch, and early users report that it is strong on coding and long-context tasks. Head-to-head benchmark comparisons over the next two weeks will likely drive a lot of enterprise decision-making.

Watch the price. If GPT-5.4 at $2.50/$15 was already pressuring Anthropic on price-performance, GPT-5.5 at $5/$30 opens the door for Claude Sonnet and Gemini Flash to pitch themselves as the "good enough, half the cost" option for less demanding workloads.

Bottom Line

If you build with the OpenAI API, GPT-5.5 is the first model that genuinely makes "let the AI drive the computer" feel viable instead of aspirational, but once API access opens you'll pay double for the privilege. If you're on ChatGPT Plus or Pro, log in and try it on a multi-step task you'd normally do yourself. The benchmarks suggest you'll be surprised by how far it gets.
