AI News Roundup: Week of May 25-31, 2026 - The Price War Becomes Permanent

Krasa AI

2026-05-31

If last week was about which lab could ship the most product in five days, this week was about what those products are actually worth. The throughline isn't a single keynote — it's a market that has stopped pretending capability alone will hold pricing intact. DeepSeek made its 75% V4-Pro price cut permanent on Friday. Anthropic shipped Claude Opus 4.8 with a flagship fast mode running at one-third the previous price. Microsoft and Anthropic opened formal talks to migrate Claude inference onto Maia 200 silicon. Chinese models crossed the 60% threshold on OpenRouter for the first time. Cognition closed a $1 billion round at a $26 billion valuation. Meta launched its first paid AI subscriptions at $7.99.

Underneath the pricing story, the enterprise-AI-actually-works story finally produced numbers worth quoting in board meetings: TD Bank compressed a 15-hour mortgage review into 3 minutes, and OpenAI's self-improving tax system hit 97% accuracy across 7,000 returns. Two years into the production-AI cycle, the week's signal is that the conversation has moved from "does this work" to "what does it cost." Here is what this week's news actually means.


Top Stories of the Week

1. DeepSeek Makes the 75% Price Cut Permanent — and Chinese Models Cross 60% of OpenRouter

The biggest story of the week is really two stories. On Friday, DeepSeek announced that the 75% V4-Pro discount it shipped as a "promotional" launch price on April 24, which had been scheduled to expire May 31, is now the permanent rate. V4-Pro now sits at $0.435 per million input tokens and $0.87 per million output tokens, roughly 7x cheaper than Claude Opus 4.7 and 9x cheaper than GPT-5.5. DeepSeek framed it as a structural change, crediting the buildout of Huawei Ascend 950 supernode capacity for making the economics work.
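To make those per-token rates concrete, here is a minimal sketch that prices a workload at the V4-Pro rates quoted above and backs out the pre-cut price implied by a 75% discount. The workload size is a made-up example, the function names are illustrative, and the implied list prices are derived from the quoted figures rather than taken from DeepSeek's pricing page.

```python
# Per-million-token prices quoted in the article for DeepSeek V4-Pro.
V4_PRO_INPUT_PER_M = 0.435
V4_PRO_OUTPUT_PER_M = 0.87
DISCOUNT = 0.75  # the 75% cut that is now permanent


def workload_cost(input_tokens: int, output_tokens: int,
                  input_per_m: float = V4_PRO_INPUT_PER_M,
                  output_per_m: float = V4_PRO_OUTPUT_PER_M) -> float:
    """Dollar cost of a workload at per-million-token prices."""
    return ((input_tokens / 1_000_000) * input_per_m
            + (output_tokens / 1_000_000) * output_per_m)


def implied_list_price(discounted: float, discount: float = DISCOUNT) -> float:
    """Pre-discount price implied by a discounted price and a fractional discount."""
    return discounted / (1.0 - discount)


if __name__ == "__main__":
    # Hypothetical monthly workload: 2B input tokens, 500M output tokens.
    cost = workload_cost(2_000_000_000, 500_000_000)
    print(f"Monthly cost at V4-Pro rates: ${cost:,.2f}")
    print(f"Implied pre-cut input price:  ${implied_list_price(V4_PRO_INPUT_PER_M):.2f}/M")
    print(f"Implied pre-cut output price: ${implied_list_price(V4_PRO_OUTPUT_PER_M):.2f}/M")
```

For that hypothetical workload the bill comes to $1,305 a month, and the implied pre-cut rates are $1.74 and $3.48 per million tokens.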

Two days earlier, the platform-level data caught up to the model-level economics. OpenRouter, the largest third-party model router used by developers, confirmed that Chinese-built large language models now serve more than 60% of all tokens routed through the platform, a roughly 50x jump in 18 months. Kimi K2.6 (Moonshot), DeepSeek V4, GLM-5.1 (Zhipu), and Qwen 3 (Alibaba) have become the default open-weights stack for developers building cost-sensitive agentic workloads. Meta's delayed Avocado model, the last credible Western open-weights frontier candidate, has gone silent.

The two stories compound. The strategic problem for Western labs isn't the 60% number itself — it's that the trajectory is structural. As long as Chinese labs release weights at price points the closed-model labs cannot match without ruining their unit economics, the open-weights frontier stays Chinese-led, and the high-volume middle of the inference market with it.


2. Cognition Raises $1B at $26B Valuation, Doubling in Eight Months

On Wednesday, Cognition, the company behind the autonomous AI software engineer Devin, announced it had closed more than $1 billion at a $25 billion pre-money valuation, roughly $26 billion post-money. That's more than double the $10.2 billion valuation it closed at just eight months ago. The round was led by Lux Capital, General Catalyst, and 8VC, with Founders Fund, Ribbit, Atreides, and Peter Thiel's firm also participating.

The numbers behind the valuation explain why investors moved so fast. Cognition reports $492 million in annualized run-rate revenue, with enterprise Devin usage growing 50% month-over-month for six consecutive months. Customers now include Mercedes-Benz, NASA, Goldman Sachs, Santander, Citi, Dell, and the U.S. military. Cognition has previously disclosed that the share of its own codebase written by Devin climbed from 13% in December 2025 to 89% in May 2026.
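The growth figures above compound quickly, which is easier to see written out. The sketch below only applies the numbers already quoted (50% month-over-month for six months, and $26 billion post-money against $10.2 billion); the function names are illustrative, and it assumes the 50% rate held exactly in each of the six months.

```python
def compound_multiple(monthly_growth: float, months: int) -> float:
    """Total multiple after compounding a fixed monthly growth rate."""
    return (1.0 + monthly_growth) ** months


def step_up(new_valuation: float, old_valuation: float) -> float:
    """Ratio of the new valuation to the old one."""
    return new_valuation / old_valuation


if __name__ == "__main__":
    usage_multiple = compound_multiple(0.50, 6)  # 50% per month for six months
    valuation_ratio = step_up(26.0, 10.2)        # $26B post-money vs. $10.2B
    print(f"Usage multiple over six months: {usage_multiple:.1f}x")
    print(f"Valuation step-up: {valuation_ratio:.2f}x")
```

Six months at 50% works out to roughly an 11x increase in enterprise usage, against a valuation step-up of about 2.5x.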

The strategic message matters more than the dollars. A year ago, the consensus thesis on AI coding was that Anthropic's Claude Code, OpenAI's Codex, and Google's Jules would absorb the entire category. Cognition's round is the clearest counterpunch yet: top-tier VCs are collectively betting that owning the workflow (how engineers actually build, test, and deploy software) is defensible even if you don't own the underlying model. Expect M&A among the smaller agentic coding startups to pick up sharply in the second half of 2026, now that a clear $26B leader exists at the top of the category.


3. Anthropic and Microsoft Open Talks on Maia 200 — Custom Silicon Goes Multi-Tenant

CNBC reported on Tuesday that Anthropic and Microsoft are in early-stage negotiations for Claude to become the first external frontier model running on Microsoft's custom Maia 200 AI accelerator. The deal under discussion would have Anthropic rent Azure servers running Maia 200 to serve Claude Haiku and Claude Sonnet inference, the workloads that dominate the company's request volume by raw count.

Maia 200 is fabricated on TSMC's 3nm process with four linked accelerators per package. On Microsoft's April earnings call, Satya Nadella said the chip "offers over 30% improved tokens per dollar, compared to the latest silicon in our fleet." Internal benchmarks reported in industry press suggest up to 40% better performance-per-watt for LLM inference. For Anthropic, even a modest 30% migration of inference volume could trim its Azure cloud bill by double-digit percentages against the $30 billion long-term Azure compute commitment.
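The savings claim above is easier to judge with the arithmetic written out. The sketch below is a back-of-envelope model, not Anthropic's or Microsoft's numbers: it assumes the quoted 30% tokens-per-dollar gain passes through in full to the price Anthropic pays, and that savings scale linearly with the share of inference volume migrated. The function names are illustrative.

```python
def cost_reduction(tokens_per_dollar_gain: float) -> float:
    """Fractional drop in cost per token for a given tokens-per-dollar gain.

    A 30% gain means 1.3x the tokens for the same dollar, so each token
    costs 1/1.3 of what it did before.
    """
    return 1.0 - 1.0 / (1.0 + tokens_per_dollar_gain)


def blended_saving(migrated_share: float, tokens_per_dollar_gain: float) -> float:
    """Fractional saving on total inference spend when only part of it migrates."""
    return migrated_share * cost_reduction(tokens_per_dollar_gain)


def share_needed(target_saving: float, tokens_per_dollar_gain: float) -> float:
    """Share of volume that must migrate to reach a target blended saving."""
    return target_saving / cost_reduction(tokens_per_dollar_gain)


if __name__ == "__main__":
    gain = 0.30  # tokens-per-dollar figure quoted from the earnings call
    print(f"Cost per token falls by {cost_reduction(gain):.1%}")
    print(f"Saving with 30% of volume migrated: {blended_saving(0.30, gain):.1%}")
    print(f"Share needed for a 10% saving: {share_needed(0.10, gain):.1%}")
```

Under these assumptions, a 30% migration saves roughly 7% of inference spend, and a 10% saving needs about 43% of volume on Maia 200. Reaching double digits at a 30% share would take a larger per-token gain than the earnings-call figure alone.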

The deal, if it closes, makes Microsoft the fifth silicon partner in Anthropic's compute portfolio — alongside Nvidia GPUs, AWS Trainium, Google TPUs, and SpaceX's Colossus capacity. Most frontier labs lock into one chip vendor; Anthropic is doing the opposite, betting that compute optionality is the durable moat. The broader signal: custom silicon has graduated from a hyperscaler cost-control experiment into a real market for outside frontier customers. If Claude runs on Maia 200 at scale, every other lab considering a multi-vendor compute strategy gets a working template.


4. Claude Opus 4.8 Lands With Dynamic Workflows and a 3x Cheaper Fast Mode

On Thursday, just 41 days after Opus 4.7 shipped, Anthropic released Claude Opus 4.8 with a runtime called Dynamic Workflows that orchestrates up to 1,000 AI subagents in parallel on a single task. The release also brought a fast mode running 2.5x faster than standard Opus inference at one-third the previous price, bringing Opus into a price-performance range that previously belonged to mid-tier models.

The benchmark story is incremental on coding (SWE-bench Verified climbs from 87.6% to 88.6%; SWE-bench Pro from 64.3% to 69.2%) but dramatic on reasoning (USAMO 2026 jumps from 69.3% to 96.7%; GraphWalks at 1M-token context more than doubles from 40.3% to 68.1%). Opus 4.8 also beats GPT-5.5 and Gemini 3.1 Pro on GDPval-AA, the real-world economic-task benchmark Anthropic has been pushing as a more meaningful test than purely academic ones.

The headline framing isn't capability, though — it's honesty. Anthropic says Opus 4.8 is roughly 4x less likely than 4.7 to let coding flaws slip through unflagged, scores 0% on uncritically reporting flawed results (a first for any Claude model), and shows a more than 10x reduction in overconfidence. For agentic systems that amplify whatever errors the underlying model makes, this is the difference between a tool senior engineers can review in 10 minutes and one that quietly ships broken code. Opus 4.8 is generally available across the Claude API, AWS Bedrock, Google Vertex AI, GitHub Copilot, and Harvey from day one.


5. Enterprise AI Finally Produces ROI Numbers Worth Quoting

Two enterprise stories this week shifted the conversation from "promising deployments" to "do the math." TD Bank Group, working with its in-house AI lab Layer 6, launched its first agentic AI system for mortgage and HELOC applications, and early data shows the system is compressing a 15-hour underwriter task into less than 3 minutes. Mortgage processing has been the canonical AI-replaces-work use case for two years; TD just shipped the production version of the demo every fintech vendor has been pitching.

The same week, OpenAI and Thrive Holdings disclosed the results of a Codex-powered tax preparation pilot across 30+ accounting firms processing 7,000 returns. The system rewrites its own implementation in response to accountant corrections, regenerating the underlying logic and not just adjusting prompts. At launch, 25% of returns hit 75% correct field completion; six weeks later, 86% did. Final accuracy crossed 97%, with prep time down a third and throughput up roughly 50%.
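The two headline figures above reduce to simple ratios. The sketch below uses only the quoted numbers (15 hours to under 3 minutes for TD, and prep time down a third for the tax pilot); the function names are illustrative. It treats 3 minutes as an upper bound, and it assumes throughput is limited only by time per return.

```python
def speedup(before_minutes: float, after_minutes: float) -> float:
    """How many times faster a task runs after the change."""
    return before_minutes / after_minutes


def throughput_gain(time_reduction: float) -> float:
    """Fractional throughput gain when time per unit falls by `time_reduction`."""
    return 1.0 / (1.0 - time_reduction) - 1.0


if __name__ == "__main__":
    td = speedup(15 * 60, 3)        # 15 hours vs. an upper bound of 3 minutes
    tax = throughput_gain(1 / 3)    # prep time down a third
    print(f"TD mortgage review: at least {td:.0f}x faster")
    print(f"Tax pilot throughput gain implied by time saved: {tax:.0%}")
```

The TD figure is at least a 300x speedup, and a one-third cut in prep time implies a 50% throughput gain, which matches the figure reported for the pilot.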

These deployments matter because they answer the question CFOs have been asking for two years: where does the AI money come back? The other side of the answer came from Intuit, which disclosed Tuesday it was cutting 3,000 employees, 17% of its workforce, and taking a $300 million restructuring charge to "refocus on AI." CEO Sasan Goodarzi insisted "none of it had to do with AI." It is increasingly hard to find an enterprise software company shrinking headcount where AI is not the explanation either offered openly or denied prominently.


Industry Impact Analysis

For Finance and Professional Services. The agentic-AI ROI story crystallized this week for the financial-services sector. TD's 15-hour-to-3-minute mortgage compression is the kind of number that becomes a board slide at every other major North American bank within a quarter. Layer 6 has been TD's quiet differentiator since the 2017 acquisition, and the company is now using it to ship benchmarks competitors will have to match or accept a structural cost disadvantage on mortgage origination. The OpenAI–Thrive tax system extends the same logic into accounting — and Thrive's 30+ firm pilot footprint suggests the playbook is already being repeated across regional practices. CFOs in banking, insurance, and accounting should treat the next 18 months as the window in which back-office process automation moves from pilot to standard. Expect the second half of 2026 to surface the first round of mortgage-processor, paralegal, and junior-accountant headcount adjustments at firms that adopted these tools as 2026 productivity programs.

For Software Engineering. The agentic coding stack reached a level of competitive maturity this week that meaningfully changes engineering procurement. Cognition's $1 billion round at $492 million ARR validates that an independent application-layer company can build a durable business alongside the labs. Claude Opus 4.8 ships Dynamic Workflows, a runtime that orchestrates up to 1,000 subagents per task and a primitive that competing agent frameworks have been chasing for two years. Cursor's Composer 2.5 release this week, running on Kimi K2.5 at one-tenth the cost of Opus, is the other half: developers are increasingly routing to whichever model wins their task economics. For engineering leaders, per-developer AI spend will continue to fragment across a Devin-style autonomous tier ($150-$500/seat/month), an interactive Copilot/Cursor tier ($30-$80), and a high-volume open-weights inference layer routed via OpenRouter. Building cost controls and usage tagging into the engineering stack before the bill arrives on its own is now the table-stakes governance task.
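As a rough illustration of what usage tagging and cost controls can look like, here is a minimal sketch that tags each model call with a team, prices it from a per-model rate table, and flags teams over budget. Everything here is hypothetical scaffolding: the event shape, team names, budgets, and the `example-frontier-model` rate are placeholders, and only the V4-Pro rate comes from the figures in this roundup. A real setup would pull events from gateway or provider usage logs.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageEvent:
    team: str           # the usage tag: who the spend is attributed to
    model: str
    input_tokens: int
    output_tokens: int


# USD per million tokens as (input, output).
PRICES = {
    "deepseek-v4-pro": (0.435, 0.87),          # rates quoted in this article
    "example-frontier-model": (5.00, 25.00),   # placeholder, not a real quote
}


def event_cost(event: UsageEvent) -> float:
    """Dollar cost of one tagged call, using the rate table above."""
    input_rate, output_rate = PRICES[event.model]
    return (event.input_tokens * input_rate
            + event.output_tokens * output_rate) / 1_000_000


def spend_by_team(events: list[UsageEvent]) -> dict[str, float]:
    """Total spend per team tag."""
    totals: dict[str, float] = defaultdict(float)
    for event in events:
        totals[event.team] += event_cost(event)
    return dict(totals)


def over_budget(spend: dict[str, float], budgets: dict[str, float]) -> list[str]:
    """Teams whose spend exceeds their budget (teams without a budget are skipped)."""
    return sorted(team for team, total in spend.items()
                  if team in budgets and total > budgets[team])


if __name__ == "__main__":
    events = [
        UsageEvent("platform", "deepseek-v4-pro", 10_000_000, 2_000_000),
        UsageEvent("platform", "example-frontier-model", 1_000_000, 200_000),
        UsageEvent("growth", "deepseek-v4-pro", 1_000_000, 1_000_000),
    ]
    spend = spend_by_team(events)
    print(spend)
    print("Over budget:", over_budget(spend, {"platform": 10.0, "growth": 5.0}))
```

The design choice worth noting is that the tag lives on the event itself, so attribution survives even when the same team routes different tasks to different models.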

For Consumer AI and Media. The publisher-versus-AI fight took a major step on Thursday when CNN sued Perplexity in federal court over 17,000 stories, photos, and videos, plus trademark dilution claims. It's the first AI copyright action by any television network and pushes the fight into broadcast journalism, where video and image rights are the financial core. On the consumer side, Meta finally put a price on the $145 billion AI capex it committed to for 2026, launching Meta One AI subscriptions globally at $7.99 (Plus) and $19.99 (Premium). Meta One undercuts ChatGPT Plus on the entry tier and gives Zuckerberg his first direct consumer revenue line. The contradiction in the consumer market is now explicit: large language models are getting structurally cheaper to serve, but consumer subscription pricing is rising to fund the build-out. The labs that solve this gap first, whether through ad-supported inference, agent-commerce revenue, or platform partnerships, will define the consumer-AI business model for the rest of the decade.


What's Coming Next

The most-watched event of the next seven days is Microsoft Build 2026, opening Tuesday in Seattle. Build is expected to ship the public preview of the Windows Agent Store, deeper Copilot Studio governance, and broader Maia 200 commercial pricing, which is especially interesting in light of the Anthropic talks reported this week. The same morning, Nvidia's Jensen Huang opens Computex 2026 in Taipei with the Vera Rubin keynote, expected to ship production SKUs for the company's first in-house CPU paired with the Rubin GPU.

The following Monday, June 8, Apple's WWDC 2026 opens at Apple Park. Bloomberg's build-verified reporting points to a rebuilt Siri 2.0 with a chatbot interface, multi-step task handling, on-screen awareness, and, most consequentially, a new Extensions system that lets third-party models like Claude, Gemini, and OpenAI's GPT line plug in as swappable backends. If Extensions ships as reported, it is the most significant AI distribution change since ChatGPT was embedded in Bing, putting every frontier lab into competition for default placement across 1.4 billion active iPhones.

On the financial calendar, Anthropic is reportedly preparing an October S-1 and OpenAI's confidential filing puts a September listing window in play, so both roadshows could overlap in Q4. DeepSeek's roughly six-week release cadence puts a V4-Pro-Plus successor in the late-June window. OpenAI's o3 and GPT-4.5 sunset dates landed as part of the ChatGPT consolidation, freeing inference capacity for GPT-5.5 Instant. The FTC and EU AI Office are both expected to publish frontier-model transparency guidance before the end of June.


Resources & Tools Mentioned

For readers who want to go deeper, the following resources are worth a look this week.

The official Anthropic announcement of Claude Opus 4.8 and Dynamic Workflows is at anthropic.com/news; the system card with the honesty benchmarks is linked in the post. DeepSeek's V4-Pro pricing page is at api-docs.deepseek.com. OpenRouter's public traffic rankings at openrouter.ai/rankings are the source for the Chinese-share data. Cognition's funding announcement is at cognition.ai/blog. OpenAI's Rosalind Biodefense program and the Thrive self-improving tax case study are both on the OpenAI newsroom.

For the enterprise stories, the Krasa.ai hubs to start with are the TD Bank Layer 6 deployment, the OpenAI–Thrive tax case study, and the Intuit restructuring memo. For the publisher fight, CNN v. Perplexity (1:26-cv-04427) is the docket to watch. For the consumer pricing layer, the Meta One launch coverage is the most concrete data point on consumer-AI monetization yet.

For ongoing follows, the highest-signal accounts this week were the official DeepSeek, Anthropic, Cognition, and Microsoft Azure accounts on X; analysis newsletters from Stratechery, The Information, and Latent Space; and OpenRouter's weekly rankings post. Krasa.ai will publish ongoing coverage of Microsoft Build, Computex, and Apple WWDC over the next two weeks.

This was the week the AI race stopped being about who could ship the most product and became about what those products cost to deliver. DeepSeek made the floor permanent, Cognition proved the application layer can hold value, Anthropic turned a chip negotiation into a moat, and the first real enterprise ROI numbers landed. The next two weeks decide whether Microsoft, Nvidia, and Apple can shift the conversation back to capability — or whether pricing is now the dominant frame for the rest of the year.

