Claude Opus 4.7 Hits 87.6% on SWE-bench, Tops Five Major AI Benchmarks
Krasa AI
2026-04-26
5 minute read
Claude Opus 4.7 Hits 87.6% on SWE-bench, Tops Five Major AI Benchmarks
Anthropic released Claude Opus 4.7 to general availability earlier this month, and the benchmark numbers are reshaping how enterprises pick a frontier model. The new flagship leads on five major public benchmarks as of release day, including a 64.3% score on SWE-bench Pro — a 10.9-point jump over the prior Opus 4.6 in a single version bump.
For developers, the bigger story is that the price stayed the same: $5 per million input tokens and $25 per million output tokens, identical to Opus 4.6. A double-digit benchmark gain at flat pricing is rare in the frontier model market.
What's New in Opus 4.7
Anthropic positions Opus 4.7 as a refinement of the Claude 4 family rather than a new generation. The headline improvements are in coding, vision resolution, instruction-following, and agentic reliability — the cluster of capabilities most relevant to long-running agent workflows.
On SWE-bench Verified, the standard benchmark for real-world GitHub issue resolution, Opus 4.7 hits 87.6%, up from 80.8% on Opus 4.6. On SWE-bench Pro, the harder variant focused on more complex codebases, it reaches 64.3%. Anthropic also reports a 70% score on CursorBench, a coding-agent benchmark maintained in collaboration with Cursor.
Outside coding, the model scores 94.2% on GPQA Diamond (graduate-level science Q&A) and 64.4% on Finance Agent — both state-of-the-art at release.
Why this matters: SWE-bench is the closest the AI industry has to a real-world test for whether a model can actually fix bugs in production codebases. Pushing past 87% on Verified means a coding agent built on Opus 4.7 will solve a meaningful majority of the issues teams put in front of it, with less hand-holding.
Where Opus 4.7 Sits Against Rivals
The competitive picture as of April 26, 2026: Opus 4.7 leads OpenAI's GPT-5.4 (57.7% on SWE-bench Pro) and Google's Gemini 3.1 Pro (54.2%) by wide margins on coding. OpenAI's GPT-5.5, released April 23, has not yet posted full SWE-bench Pro results.
DeepSeek's V4 Pro preview, released April 24, falls "marginally short" of GPT-5.4 and Gemini 3.1 Pro on coding benchmarks — putting it roughly 3 to 6 months behind the Opus 4.7 frontier.
Anthropic's restricted Claude Mythos Preview sits above Opus 4.7 on raw capability but is not commercially available. Mythos is gated behind Project Glasswing and limited to roughly 50 vetted partners.
Vision and Instruction-Following
The under-discussed upgrade in Opus 4.7 is vision. Anthropic raised the maximum image resolution the model can ingest, which makes it materially better at reading dense screenshots, design mockups, financial documents, and scientific figures.
That improvement is what makes the new Claude Design product — Anthropic's design tool launched the day after Opus 4.7 — possible. Claude Design relies on Opus 4.7's ability to interpret existing mockups and brand systems with high fidelity, then generate matching artifacts.
Instruction-following gains show up most clearly in long-running agent runs, where small drift in following instructions compounds into failed tasks. Anthropic reports lower failure rates on multi-step agent benchmarks, though the company has not published a single headline number.
Distribution and Pricing
Opus 4.7 is available across Anthropic's full distribution: claude.ai (Pro, Max, Team, Enterprise), the Anthropic API, Amazon Bedrock, Google Cloud Vertex AI, Microsoft Foundry, and GitHub Copilot's Pro+, Business, and Enterprise tiers.
Pricing is unchanged at $5 per million input tokens and $25 per million output tokens. Prompt caching and batch API discounts also carry over from Opus 4.6.
For developers, the migration path is straightforward — change the model identifier in your API call. Anthropic says behavior is largely backward-compatible with Opus 4.6, though prompts heavily tuned for older Claude models may behave slightly differently.
Industry Implications
For Cursor, Replit, GitHub Copilot, and the broader AI coding ecosystem, Opus 4.7 raises the bar on coding-agent expectations. Cursor routes a significant share of traffic to Claude, and the SWE-bench Pro jump means fewer abandoned agent sessions for its users.
For OpenAI, the timing is awkward. GPT-5.5 launched a week later with a strong coding pitch, but public benchmarks favor Anthropic for now. Both labs have been trading the SWE-bench lead for over a year.
For Google, Gemini 3.1 Pro's coding scores have lagged for several releases. Google is betting on the broader Gemini Enterprise Agent Platform — but for code-heavy customers, the model gap matters.
Expert Perspectives
Researchers and engineers on X focused on two themes. First, the size of the SWE-bench Pro jump suggests Anthropic made real progress on long-context coding reasoning, not just fine-tuning. Second, flat pricing is a strategic signal: Anthropic is choosing to compete on capability per dollar.
Skeptics noted benchmarks tell only part of the story — real-world coding agents fail in ways benchmarks don't capture.
What's Next
Expect Anthropic to follow Opus 4.7 with more application-layer products. Claude Design is already in market, and Claude Code and Claude Cowork are likely to get Opus 4.7-powered upgrades soon. On the infrastructure side, the recent $40 billion Google investment and 5-gigawatt Broadcom-Google compute deal will fund the next Opus generation.
The bottom line: Claude Opus 4.7 is the model to beat for AI coding as of late April 2026, and at unchanged pricing it's an easy swap for any team on a frontier coding model. If you build agents or run developer tools, this is the upgrade to evaluate this week.
Sources
Don't fall behind
Expert AI Implementation →Related Articles
NVIDIA Cosmos 3: First Open Physical AI Omnimodel Cuts Training Cycles to Days
NVIDIA's Cosmos 3 launches at Computex 2026 — a fully open foundation model that unifies vision, world generation, and action for robots and autonomous systems.
min read
Anthropic Adds Services Track and Partner Hub to Claude Network
Anthropic launches a 3-tier Services Track and a public Partner Hub. 40,000 firms have applied; 10,000 consultants are certified.
min read
Apoha Exits Stealth With $36M to Build 'Liquid Brain' AI for Materials
UK startup Apoha emerges with $36M Series A and a wild new data type: how materials vibrate in liquid. The pitch is AI for materials discovery.
min read