Cursor Composer 2.5 Matches Opus 4.7 at One-Tenth the Cost

Cursor has officially shipped Composer 2.5, its third-generation in-house coding agent — and the pricing is what's making the whole industry pay attention. The model matches Claude Opus 4.7 and GPT-5.5 on coding benchmarks at roughly one-tenth the cost per task, according to benchmark data Cursor published with the launch.

With a free first-week promotion ending today (May 25), Cursor is also betting that developers who try Composer 2.5 will stick with it once the discount runs out.

The Headline Numbers

On SWE-Bench Multilingual, the benchmark most often cited for coding agents, Composer 2.5 scores 79.8%. On Cursor's own CursorBench v3.1, it hits 63.2%. Both numbers are essentially tied with the frontier models from Anthropic and OpenAI — within margin of error on the public scores Opus 4.7 and GPT-5.5 have posted.

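The pricing is where Composer 2.5 separates from the pack. The Standard tier sits at $0.50 per million input tokens and $2.50 per million output tokens. The "Fast" interactive tier is $3.00 in and $15.00 out per million. For comparison, Opus 4.7 lists at roughly $15.00 in and $75.00 out. On those list prices, Opus output tokens cost 30x the Standard tier and 5x the Fast tier, and output is where coding tasks burn the most tokens. Cursor's one-tenth figure is a cost per task, which also depends on how many tokens each model spends on a task, so it does not have to match either list-price ratio.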

Why this matters: agent runs are token-hungry. A real coding session — read a file, plan an edit, write code, run tests, iterate — easily produces hundreds of thousands of output tokens. At Opus prices, that's expensive. At Composer 2.5 prices, it's affordable to run on production workloads.
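To put rough numbers on that, here is a minimal sketch that prices one agent session at the list prices quoted in this article. The session size (2 million input tokens, 300,000 output tokens) is a made-up illustration, not a measurement, and the sketch assumes every model uses the same number of tokens for the same task, which may not hold in practice.

```python
# Per-million-token prices in USD (input, output), as quoted in this article.
PRICES = {
    "Composer 2.5 Standard": (0.50, 2.50),
    "Composer 2.5 Fast": (3.00, 15.00),
    "Opus 4.7": (15.00, 75.00),
}

# Hypothetical session size, chosen only for illustration.
INPUT_TOKENS = 2_000_000
OUTPUT_TOKENS = 300_000


def session_cost(input_tokens, output_tokens, price_in, price_out):
    """Return the USD cost of one session at the given per-million prices."""
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


costs = {
    name: session_cost(INPUT_TOKENS, OUTPUT_TOKENS, p_in, p_out)
    for name, (p_in, p_out) in PRICES.items()
}

for name, cost in costs.items():
    print(f"{name}: ${cost:.2f}")
# Composer 2.5 Standard: $1.75
# Composer 2.5 Fast: $10.50
# Opus 4.7: $52.50
```

The gap here comes purely from list prices. A per-task comparison like the one Cursor published would also reflect how many tokens each model actually needs to finish a task, which this sketch does not model.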

What's Under the Hood

The base architecture is Moonshot AI's open-source Kimi K2.5 checkpoint, a mixture-of-experts (MoE) model with roughly 1 trillion total parameters and about 32 billion parameters active per token. That's the same family Cursor used for Composer 2, but the post-training is where the real work happened.
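For readers unfamiliar with how a model can have far more total parameters than active ones, the sketch below shows generic top-k expert routing on a toy scale. It is not Kimi K2.5's actual router, which this article does not describe. The expert functions, gate scores, and k value are all invented for illustration, and in a real model the gate scores would come from a learned router rather than being passed in by hand.

```python
import math

# Share of parameters active at once, using the article's rough figures.
ACTIVE_FRACTION = 32e9 / 1e12  # about 3.2%


def top_k_route(gate_scores, k):
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    top = sorted(range(len(gate_scores)), key=lambda i: gate_scores[i], reverse=True)[:k]
    exps = [math.exp(gate_scores[i]) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_layer(x, experts, gate_scores, k):
    """Run only the selected experts and mix their outputs by routing weight."""
    return sum(w * experts[i](x) for i, w in top_k_route(gate_scores, k))


# Four toy "experts" that each scale the input by a different factor.
experts = [lambda x, m=m: m * x for m in (1.0, 2.0, 3.0, 4.0)]
gate_scores = [0.1, 2.0, -1.0, 2.0]  # hypothetical router output for one token

routes = top_k_route(gate_scores, k=2)
y = moe_layer(10.0, experts, gate_scores, k=2)
print(routes)  # experts 1 and 3 are chosen, each with weight 0.5
print(y)
```

Only two of the four experts run for this token; the other two cost nothing at inference time, which is the basic reason an MoE model's active parameter count is a small fraction of its total.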

Cursor spent 85% of the total compute budget on its own post-training pipeline: reinforcement learning, continued pretraining, and a new technique the team calls textual feedback RL. Instead of learning only from a final reward at the end of a long rollout, the model gets localized hints at the exact tool call where it failed. That changes the learning signal from "you got it wrong somewhere" to "you got it wrong here, and here's why."
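The difference between the two signals can be shown with a toy data structure. This is only a sketch of the shape of the feedback, not Cursor's training code: the `Step` class, the example rollout, and the hint format are all hypothetical, and the article does not say where the hints come from or how they enter the training objective.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    tool_call: str
    error: Optional[str] = None  # None means the call succeeded


def final_reward(rollout: List[Step]) -> float:
    """Outcome-only signal: one scalar for the whole rollout."""
    return 0.0 if any(s.error is not None for s in rollout) else 1.0


def localized_feedback(rollout: List[Step]) -> List[Optional[str]]:
    """Per-step signal: a text hint attached to each failing tool call."""
    return [
        None if s.error is None else f"step {i} ({s.tool_call}): {s.error}"
        for i, s in enumerate(rollout)
    ]


# Hypothetical three-step rollout where only the test run fails.
rollout = [
    Step("read_file src/app.py"),
    Step("edit_file src/app.py"),
    Step("run_tests", error="test_login failed: expected 200, got 500"),
]

print(final_reward(rollout))        # 0.0
print(localized_feedback(rollout))  # hint only on the failing step
```

The scalar reward says the rollout failed but not where. The per-step list pins the failure to a specific tool call and carries a reason, which is the kind of localized signal the paragraph above describes.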

The team also generated 25x more synthetic training tasks than for Composer 2, including a category it calls "feature deletion" puzzles — exercises where the model has to reconstruct deleted functionality from surrounding context. And they rebuilt the training infrastructure for MoE scale, using sharded Muon optimizers and dual-mesh hybrid sharded data parallelism (a way to split very large models across many GPUs efficiently).
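One plausible way to build a "feature deletion" task is to cut a function out of working code and ask the model to restore it from what remains. The sketch below does that with Python's standard `ast` module. The article does not describe Cursor's actual task generator, so the function name `make_deletion_puzzle` and the sample source are hypothetical, and the sketch only handles undecorated top-level functions.

```python
import ast


def make_deletion_puzzle(source: str, func_name: str):
    """Split source into (context with the function removed, deleted function text)."""
    tree = ast.parse(source)
    for node in tree.body:
        if isinstance(node, ast.FunctionDef) and node.name == func_name:
            lines = source.splitlines()
            start, end = node.lineno - 1, node.end_lineno  # 0-based slice bounds
            deleted = "\n".join(lines[start:end])
            context = "\n".join(lines[:start] + lines[end:])
            return context, deleted
    raise ValueError(f"no top-level function named {func_name!r}")


# Hypothetical sample module used as raw material for the puzzle.
SOURCE = '''def area(w, h):
    return w * h

def perimeter(w, h):
    return 2 * (w + h)

def describe(w, h):
    return f"area={area(w, h)} perimeter={perimeter(w, h)}"
'''

context, target = make_deletion_puzzle(SOURCE, "perimeter")
print(context)  # still calls perimeter(), but no longer defines it
print(target)   # the reference answer the model should reconstruct
```

The surviving call site in `describe` is the "surrounding context" the model has to work from, and the deleted text serves as a reference answer for checking the reconstruction.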

The result is a model that handles longer, more autonomous coding jobs without losing the thread mid-task.

What It Actually Does

Composer 2.5 is an agentic coding model, not a chat assistant. It reads files, writes code across multiple files at once, runs terminal commands, executes tests, and iterates on failures, all inside the Cursor IDE and CLI and without a human approving each step. A developer hands it a task, the agent grinds through the work, and the developer reviews the result.

That's the same loop other coding agents target. What's different here is the cost per loop. At Composer 2.5 prices, running an agent on every pull request, every bug ticket, or every small refactor stops being a budget question. Teams can leave agents running.

The pricing also makes Composer 2.5 viable for indie hackers and small teams who couldn't afford frontier-model agent loops at scale. That's a meaningful expansion of the user base.

Why the Open-Source Base Matters

Building on Kimi K2.5 is a strategic choice with real implications. Moonshot AI's decision to open-source the 1T-parameter checkpoint a few weeks ago gave Cursor — and any other lab willing to invest in post-training — a starting point that would have cost hundreds of millions to train from scratch.

This is part of the larger pattern that's been playing out all year. Open-source base models from Moonshot, Alibaba's Qwen team, and DeepSeek have collapsed the cost of getting to frontier-level performance. The differentiator has moved from "who has the biggest model" to "who has the best post-training stack." Cursor's investment in textual feedback RL and the synthetic task pipeline is exactly that kind of differentiator.

Frontier labs are now competing not against each other's flagship models but against the combination of open-source base + specialist post-training. That's a different shape of competition than in 2025.

Industry Reaction

The developer community reacted quickly, and responses on X have leaned heavily toward "this changes the math." Several engineering leaders posted side-by-side cost comparisons showing the same agent run costing $40 on Opus 4.7 and $4 on Composer 2.5.

Analysts at WinBuzzer and Testing Catalog framed Composer 2.5 as a pricing event more than a capability event. If the CursorBench cost curve holds in production, the frontier labs will need to respond — either with their own cheaper coding-specific models, or with price cuts on the flagship tier.

Anthropic and OpenAI are both reportedly testing more aggressive pricing on coding-specific products. xAI's Grok Build CLI, launched earlier in May, was already positioned as a lower-cost coding agent. Composer 2.5 just lowered the floor again.

What's Next

The free-week promo ends today (May 25). After that, Composer 2.5 reverts to standard Cursor pricing — which is still aggressive relative to frontier models, but no longer free for heavy use. Developers who've been testing the model on real workloads will now decide whether to migrate their default agent from Opus or GPT-5.5 to Composer.

Cursor has also confirmed that a successor is already in training, in collaboration with SpaceX and xAI under a partnership the companies are calling "SpaceXAI." That model is being trained on Colossus 2 with roughly 10x the compute of Composer 2.5. No timeline yet, but the implication is clear: Cursor's pricing pressure isn't going to ease.

Bottom Line

Composer 2.5 isn't the most capable coding agent on the market — it's tied. What makes it the news of the week is the price. At one-tenth the cost of the frontier flagships, it shifts agent coding from a premium feature into a default workflow. The question for Anthropic, OpenAI, and Google is no longer "can we be better?" but "can we be 10x cheaper?"
