Claude Opus 4.7 Hits 87.6% on SWE-bench, Tops Five Major AI Benchmarks

Anthropic released Claude Opus 4.7 to general availability earlier this month, and the benchmark numbers are reshaping how enterprises pick a frontier model. The new flagship leads on five major public benchmarks as of release day, including a 64.3% score on SWE-bench Pro — a 10.9-point jump over the prior Opus 4.6 in a single version bump.

For developers, the bigger story is that the price stayed the same: $5 per million input tokens and $25 per million output tokens, identical to Opus 4.6. A double-digit benchmark gain at flat pricing is rare in the frontier model market.

What's New in Opus 4.7

Anthropic positions Opus 4.7 as a refinement of the Claude 4 family rather than a new generation. The headline improvements are in coding, vision resolution, instruction-following, and agentic reliability — the cluster of capabilities most relevant to long-running agent workflows.

On SWE-bench Verified, the standard benchmark for real-world GitHub issue resolution, Opus 4.7 hits 87.6%, up from 80.8% on Opus 4.6. On SWE-bench Pro, the harder variant focused on more complex codebases, it reaches 64.3%. Anthropic also reports a 70% score on CursorBench, a coding-agent benchmark maintained in collaboration with Cursor.

Outside coding, the model scores 94.2% on GPQA Diamond (graduate-level science Q&A) and 64.4% on Finance Agent — both state-of-the-art at release.

Why this matters: SWE-bench is the closest the AI industry has to a real-world test for whether a model can actually fix bugs in production codebases. Pushing past 87% on Verified means a coding agent built on Opus 4.7 will solve a meaningful majority of the issues teams put in front of it, with less hand-holding.

Where Opus 4.7 Sits Against Rivals

The competitive picture as of April 26, 2026: Opus 4.7 leads OpenAI's GPT-5.4 (57.7% on SWE-bench Pro) and Google's Gemini 3.1 Pro (54.2%) by wide margins on coding. OpenAI's GPT-5.5, released April 23, has not yet posted full SWE-bench Pro results.

DeepSeek's V4 Pro preview, released April 24, falls "marginally short" of GPT-5.4 and Gemini 3.1 Pro on coding benchmarks — putting it roughly 3 to 6 months behind the Opus 4.7 frontier.

Anthropic's restricted Claude Mythos Preview sits above Opus 4.7 on raw capability but is not commercially available. Mythos is gated behind Project Glasswing and limited to roughly 50 vetted partners.

Vision and Instruction-Following

The under-discussed upgrade in Opus 4.7 is vision. Anthropic raised the maximum image resolution the model can ingest, which makes it materially better at reading dense screenshots, design mockups, financial documents, and scientific figures.

That improvement is what makes the new Claude Design product — Anthropic's design tool launched the day after Opus 4.7 — possible. Claude Design relies on Opus 4.7's ability to interpret existing mockups and brand systems with high fidelity, then generate matching artifacts.

Instruction-following gains show up most clearly in long-running agent runs, where small drift in following instructions compounds into failed tasks. Anthropic reports lower failure rates on multi-step agent benchmarks, though the company has not published a single headline number.

Distribution and Pricing

Opus 4.7 is available across Anthropic's full distribution: claude.ai (Pro, Max, Team, Enterprise), the Anthropic API, Amazon Bedrock, Google Cloud Vertex AI, Microsoft Foundry, and GitHub Copilot's Pro+, Business, and Enterprise tiers.

Pricing is unchanged at $5 per million input tokens and $25 per million output tokens. Prompt caching and batch API discounts also carry over from Opus 4.6.

For developers, the migration path is straightforward — change the model identifier in your API call. Anthropic says behavior is largely backward-compatible with Opus 4.6, though prompts heavily tuned for older Claude models may behave slightly differently.

Industry Implications

For Cursor, Replit, GitHub Copilot, and the broader AI coding ecosystem, Opus 4.7 raises the bar on coding-agent expectations. Cursor routes a significant share of traffic to Claude, and the SWE-bench Pro jump means fewer abandoned agent sessions for its users.

For OpenAI, the timing is awkward. GPT-5.5 launched a week later with a strong coding pitch, but public benchmarks favor Anthropic for now. Both labs have been trading the SWE-bench lead for over a year.

For Google, Gemini 3.1 Pro's coding scores have lagged for several releases. Google is betting on the broader Gemini Enterprise Agent Platform — but for code-heavy customers, the model gap matters.

Expert Perspectives

Researchers and engineers on X focused on two themes. First, the size of the SWE-bench Pro jump suggests Anthropic made real progress on long-context coding reasoning, not just fine-tuning. Second, flat pricing is a strategic signal: Anthropic is choosing to compete on capability per dollar.

Skeptics noted benchmarks tell only part of the story — real-world coding agents fail in ways benchmarks don't capture.

What's Next

Expect Anthropic to follow Opus 4.7 with more application-layer products. Claude Design is already in market, and Claude Code and Claude Cowork are likely to get Opus 4.7-powered upgrades soon. On the infrastructure side, the recent $40 billion Google investment and 5-gigawatt Broadcom-Google compute deal will fund the next Opus generation.

The bottom line: Claude Opus 4.7 is the model to beat for AI coding as of late April 2026, and at unchanged pricing it's an easy swap for any team on a frontier coding model. If you build agents or run developer tools, this is the upgrade to evaluate this week.

Claude Opus 4.7 Hits 87.6% on SWE-bench, Tops Five Major AI Benchmarks

Claude Opus 4.7 Hits 87.6% on SWE-bench, Tops Five Major AI Benchmarks

What's New in Opus 4.7

Where Opus 4.7 Sits Against Rivals

Vision and Instruction-Following

Distribution and Pricing

Industry Implications

Expert Perspectives

What's Next

Sources

Don't fall behind

Related Articles

OpenAI's GeneBench-Pro Tests AI on Real Biology Research

Anthropic Launches Claude Fable 5: Its Most Capable Model Yet

China Plans $295B AI Data Center Buildout to Rival the US