Anthropic Admits Claude Got Worse: Three Bugs, One Ugly Post-Mortem
Krasa AI
2026-04-23
6 minute read
Anthropic published a post-mortem on Thursday acknowledging what Claude Code users have been complaining about loudly for weeks: quality dropped, it wasn't their imagination, and it was the company's fault. The post-mortem identifies three separate changes between March 4 and April 20 that degraded performance — two bugs and one bad configuration choice — and lists the fixes for each.
The API wasn't affected. The damage was concentrated in Claude Code, the Claude Agent SDK, and Claude Cowork (the consumer-facing agent product). Anthropic has reset usage limits for all subscribers as of April 23, which is the company's unusually blunt way of saying: we know we owe you.
Context: A Quiet Revolt in the Developer Community
For the last six weeks, Claude Code users on X, Reddit, and Hacker News have been reporting the same thing: the model got dumber. Tasks it used to handle cleanly started requiring multiple attempts. Responses felt truncated. Power users started openly benchmarking Claude against GPT-5.4 and Gemini 3.1 and publishing unflattering comparisons.
Anthropic's initial response through most of March and early April was cautious. The company acknowledged the reports and said it was investigating, but wouldn't confirm anything had changed. That's defensible when complaints are subjective — but untenable when the benchmark community starts producing hard numbers showing real degradation.
The tipping point came last week, when high-profile developers started publicly threatening to cancel. Anthropic had also tested "yanking Claude Code from Pro," per an April 22 report in The Register, which raised the temperature further. Today's post-mortem is the resolution: a detailed technical confession plus usage-limit resets.
What Actually Broke
The post-mortem is structured around three distinct root causes. Each one is minor on its own. Compounded, they made Claude Code meaningfully worse for weeks.
The first issue landed on March 4, when Anthropic reduced Claude Code's default reasoning effort from "high" to "medium." Reasoning effort controls how much internal deliberation the model does before producing output. Lowering it saves compute — fine for simple tasks, but for the complex, multi-step engineering work Claude Code is actually used for, it meant the model was thinking less about each step. The change ran unnoticed for over a month before being reverted.
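To make that failure mode concrete, here is a minimal sketch of how a single harness-level default can silently govern every session. The names here (HarnessConfig, reasoning_effort, build_request) are hypothetical, not Anthropic's actual code, which has not been published.

```python
from dataclasses import dataclass

@dataclass
class HarnessConfig:
    # One default here applies to every session that doesn't override it.
    reasoning_effort: str = "medium"   # hypothetically "high" before the change

def build_request(prompt: str, config: HarnessConfig | None = None) -> dict:
    config = config or HarnessConfig()
    return {
        "prompt": prompt,
        # Simple tasks barely notice a lower setting; long multi-step
        # engineering work loses deliberation on every single step.
        "reasoning_effort": config.reasoning_effort,
    }

if __name__ == "__main__":
    print(build_request("refactor the auth module"))
```

The point of the sketch is that nothing errors and nothing logs: every request still succeeds, just with less deliberation, which is why a change like this can run unnoticed for a month.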
The second issue landed on March 26. Engineers shipped a cache optimization meant to clear output tokens for idle sessions. The intent was reasonable. The implementation had a bug — the cache clear was firing on every single turn of every active session, not just idle ones. The model was losing its working context mid-conversation and reconstructing state from scratch. This is exactly the kind of bug that makes a model feel "dumber" in ways that are hard to pin down.
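For a sense of how the stated intent and the reported behavior diverge, here is a minimal sketch with entirely hypothetical names and thresholds; Anthropic has not published the offending code.

```python
import time

IDLE_THRESHOLD_SECONDS = 30 * 60  # hypothetical: treat a session as idle after 30 minutes

class SessionCache:
    def __init__(self) -> None:
        self.cached_output_tokens: list[str] = []
        self.last_activity = time.monotonic()

    def clear(self) -> None:
        self.cached_output_tokens.clear()

def on_turn_buggy(cache: SessionCache) -> None:
    # The reported bug: the clear ran unconditionally on every turn,
    # so working context was rebuilt from scratch mid-conversation.
    cache.clear()
    cache.last_activity = time.monotonic()

def on_turn_fixed(cache: SessionCache) -> None:
    # The stated intent: only reclaim memory from sessions that went idle.
    if time.monotonic() - cache.last_activity > IDLE_THRESHOLD_SECONDS:
        cache.clear()
    cache.last_activity = time.monotonic()
```

A missing guard of roughly this shape would produce no crashes and no error logs, only answers that feel subtly worse, which matches why users could sense the regression long before anyone could point to its cause.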
The third issue landed on April 16, the same day Anthropic shipped Claude Opus 4.7 and /ultrareview. The new system prompt added verbosity constraints: "keep text between tool calls to ≤25 words" and "keep final responses to ≤100 words." Those constraints produced a measurable 3 percent drop on internal benchmarks and were reverted on April 20.
Each individual change was small. Stacked, the result was a noticeable regression that persisted for six weeks.
Industry Impact
This post-mortem is a reputation event, not just a technical one. Anthropic's entire brand is built on safety, rigor, and predictability — exactly the kind of company enterprises are supposed to trust with production AI workloads. Shipping three unrelated regressions in six weeks and only catching them after a public outcry is not the positioning Anthropic wants.
Enterprise buyers will read this post-mortem carefully. The good news: the failures weren't in the model itself. Claude Opus 4.7, Claude Sonnet 4.6, and the underlying API were unaffected. The damage was in the harness and configuration wrapping the model for specific products. For customers using the API directly, there's no change. For customers using Claude Code as a managed product, it's a warning about the difference between trusting a model and trusting the full stack.
The timing is also unkind. OpenAI shipped GPT-5.5 the same morning with 82.7% on Terminal-Bench 2.0 and aggressive agent claims. A developer-focused competitor launching a flagship agent model on the day you publish a "we shipped three regressions in six weeks" post-mortem is not ideal.
Expert Perspectives
The Register was blunt: "Anthropic admits it dumbed down Claude when trying to make it smarter." That framing will stick, fair or not — because the pattern repeats. Labs ship optimizations, optimizations introduce regressions, users notice, labs deny, evidence mounts, labs eventually confirm.
VentureBeat emphasized a structural point: "the degradation came from changes to Claude's harnesses and operating instructions, not from the model itself." That distinction matters. Model quality is one thing. Platform quality — reasoning budgets, system prompts, caching, tool use — is another, and it's increasingly where real product differences show up.
What's Next
Watch how Anthropic changes its release process. Post-mortems this public usually come with commitments to process changes, and the obvious one here is better regression testing on the harness layer, not just on the model. Expect an announcement in the next few weeks about how Claude Code changes will be staged and tested going forward.
Watch the subscriber churn numbers. Anthropic hasn't disclosed them, but usage-limit resets signal that the company is worried about cancellations. If Q2 paid subscriber growth comes in weak, this post-mortem is the reason.
Watch competitors' positioning. OpenAI, Google, and xAI will all be tempted to run comparative benchmarks against Claude over the next few weeks, now that there's a specific "Claude got worse" story they can point to. Some of those comparisons will be fair. Most won't be.
Bottom Line
If you use Claude Code, the fixes are in — run your workflows again and see if quality feels restored. If you were considering switching to Codex or Cursor with GPT-5.5, Anthropic just handed you a reason to try. And if you're an enterprise buyer evaluating vendor reliability, this post-mortem is unusually honest — worth reading in full, because most labs would not publish it.