DeepSeek Makes 75% V4-Pro Price Cut Permanent, Reigniting AI Price War
Krasa AI
2026-05-29
5 minute read
DeepSeek Makes 75% V4-Pro Price Cut Permanent, Reigniting AI Price War
DeepSeek has made its 75% price cut on flagship model V4-Pro permanent, locking in API pricing that is roughly seven times cheaper than Anthropic's Claude Opus 4.7 and nine times cheaper than OpenAI's GPT-5.5. The change, originally rolled out as a temporary promotion when V4-Pro launched on April 24, was scheduled to expire May 31. As of this week, it's the new floor.
The math is the story. V4-Pro now permanently sits at $0.435 per million input tokens and $0.87 per million output tokens, with cache hits at roughly $0.0036 per million. For a developer running 100 million output tokens per month, that's about $348 a month on V4-Pro versus around $2,500 on Claude Opus 4.7 and roughly $3,000 on GPT-5.5.
What's actually new
DeepSeek published the price cut as a limited-time promotion at the V4-Pro launch — a familiar tactic in the Chinese AI ecosystem, which has used aggressive launch pricing to seed adoption. The expected pattern was that promotional pricing would expire May 31 and the model would float back up to something closer to the original card.
That didn't happen. DeepSeek announced this week that the discounted rate is permanent. The company framed the move as a structural change rather than a marketing push, and credited improved infrastructure economics — specifically the buildout of Huawei Ascend 950 supernode capacity — as making the lower pricing sustainable.
That last point is the geopolitical undercurrent. V4-Pro reportedly runs on Huawei Ascend 950 silicon rather than Nvidia GPUs, which remain difficult for Chinese AI labs to access at scale due to U.S. export controls. If Huawei's Ascend supply chain is stable enough to support permanent pricing this aggressive, that's a notable independence signal for the Chinese AI stack.
The benchmark math
The price comparison gets attention because V4-Pro is competitive on capability, not just cost. The model is reportedly within striking distance of Claude Opus 4.7 and GPT-5.5 on several reasoning and coding benchmarks. It is not the absolute frontier on every task, but for a meaningful share of production workloads — RAG (retrieval-augmented generation), summarization, code generation, structured extraction — the gap is small enough that price wins.
The cost differential is what changes purchasing behavior. A platform team running an internal coding assistant for 5,000 engineers can swap from Claude or GPT-5.5 to V4-Pro and free up six figures of monthly spend. Even with a 5-10% capability hit on some tasks, the economics push enterprises to evaluate.
Why this matters
The AI inference market has been quietly bifurcating for months. Frontier U.S. labs have held pricing flat or raised it, on the theory that capability gains justify the premium. Chinese labs — DeepSeek, Alibaba's Qwen, Moonshot's Kimi, Zhipu's GLM — have been racing in the opposite direction.
The OpenRouter data from last week was already striking: Chinese models hit 60% of OpenRouter traffic for the first time in May. The DeepSeek price move is likely to push that share higher.
For Western frontier labs, the strategic question becomes whether to respond on price, on capability, or on tooling. None of those are easy. Cutting prices on Claude Opus or GPT-5.5 by 75% would be ruinous given the training and inference costs involved. Holding the line on capability assumes customers will keep paying the premium even as the gap to "good enough" Chinese models narrows. And tooling moats — Claude Code, Codex, Gemini's Workspace integration — only matter if developers are still in the ecosystem.
What changes for developers
In practice, developers are likely to do what they already do: route by task. Frontier tasks (long-context reasoning, complex agentic flows, the most demanding code generation) keep going to Claude or GPT-5.5. Higher-volume, lower-stakes tasks (summarization, extraction, classification, internal tooling) increasingly move to V4-Pro or other open-weights alternatives via OpenRouter.
The interesting wrinkle is that V4-Pro is not just cheap — it's also available through self-hosted deployments because DeepSeek continues to release weights. For enterprises with data-residency or compliance constraints that previously blocked them from using DeepSeek's hosted API, that's a path that doesn't exist with closed models.
Industry reaction
Reactions on X have split along predictable lines. Developers are largely enthusiastic — the cost-per-token math is hard to argue with, and many have posted before/after spend comparisons after swapping V4-Pro into their pipelines. AI infrastructure executives at U.S. labs have been quieter publicly, but several have noted privately that the pressure on mid-tier pricing is now structural rather than transient.
Some U.S. national-security commentators flagged the Huawei dependency as a reminder that the Chinese AI stack is becoming functionally independent of U.S. supplier ecosystems. That's a longer-term policy question, but it is shaping how the U.S. Commerce Department thinks about export controls going forward.
What's next
Three things to watch.
First, whether Anthropic, OpenAI, and Google respond with mid-tier price cuts. Claude Sonnet, GPT-5.5 mini, and Gemini 3.5 Flash are the obvious targets — frontier models are unlikely to move on price, but the rung below them is now exposed.
Second, whether DeepSeek follows with a V4-Pro-Plus or similar successor that pushes capability while keeping the new pricing intact. The company has been on a roughly six-week release cadence.
Third, the OpenRouter share data for June. If Chinese models cross 65-70% of routed traffic, that becomes hard to dismiss as a temporary distortion and starts to reshape the AI market narrative outside specialist circles.
Bottom line
DeepSeek just removed the most plausible "this is temporary" caveat from the inference price war. V4-Pro at $0.435 in and $0.87 out, as a permanent rate, restructures the cost basis for AI products at scale. Western labs are now in the position of either matching on price, differentiating sharply on capability, or accepting that the high-volume middle of the market belongs to open-weights competitors. Developers should be running V4-Pro evals against their workloads this week if they aren't already.
Don't fall behind
Expert AI Implementation →Related Articles
NVIDIA Cosmos 3: First Open Physical AI Omnimodel Cuts Training Cycles to Days
NVIDIA's Cosmos 3 launches at Computex 2026 — a fully open foundation model that unifies vision, world generation, and action for robots and autonomous systems.
min read
Anthropic Adds Services Track and Partner Hub to Claude Network
Anthropic launches a 3-tier Services Track and a public Partner Hub. 40,000 firms have applied; 10,000 consultants are certified.
min read
Apoha Exits Stealth With $36M to Build 'Liquid Brain' AI for Materials
UK startup Apoha emerges with $36M Series A and a wild new data type: how materials vibrate in liquid. The pitch is AI for materials discovery.
min read