Anthropic Admits Claude Got Worse: Three Bugs, One Ugly Post-Mortem
Krasa AI
2026-04-23
6 minute read
Anthropic published a post-mortem on Thursday acknowledging what Claude Code users have been complaining about loudly for weeks: quality dropped, it wasn't their imagination, and it was the company's fault. The post-mortem identifies three separate changes between March 4 and April 20 that degraded performance — two bugs and one bad configuration choice — and lists the fixes for each.
The API wasn't affected. The damage was concentrated in Claude Code, the Claude Agent SDK, and Claude Cowork (the consumer-facing agent product). Anthropic has reset usage limits for all subscribers as of April 23, which is the company's unusually blunt way of saying: we know we owe you.
Context: A Quiet Revolt in the Developer Community
For the last six weeks, Claude Code users on X, Reddit, and Hacker News have been reporting the same thing: the model got dumber. Tasks it used to handle cleanly started requiring multiple attempts. Responses felt truncated. Power users started openly benchmarking Claude against GPT-5.4 and Gemini 3.1 and publishing unflattering comparisons.
Anthropic's initial response through most of March and early April was cautious. The company acknowledged the reports and said it was investigating, but wouldn't confirm anything had changed. That's defensible when complaints are subjective — but untenable when the benchmark community starts producing hard numbers showing real degradation.
The tipping point came last week, when high-profile developers started publicly threatening to cancel. Anthropic had also tested "yanking Claude Code from Pro," per an April 22 report in The Register, which raised the temperature further. Today's post-mortem is the resolution: a detailed technical confession plus usage-limit resets.
What Actually Broke
The post-mortem is structured around three distinct root causes. Each one is minor on its own. Compounded, they made Claude Code meaningfully worse for weeks.
The first issue landed on March 4, when Anthropic reduced Claude Code's default reasoning effort from "high" to "medium." Reasoning effort controls how much internal deliberation the model does before producing output. Lowering it saves compute — fine for simple tasks, but for the complex, multi-step engineering work Claude Code is actually used for, it meant the model was thinking less about each step. The change ran unnoticed for over a month before being reverted.
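To make that failure mode concrete, here is a minimal sketch of how a single harness-level default can silently govern every session. The names here (HarnessConfig, reasoning_effort, build_request) are hypothetical, not Anthropic's actual code, which has not been published.

```python
from dataclasses import dataclass

@dataclass
class HarnessConfig:
    # One default here applies to every session that doesn't override it.
    reasoning_effort: str = "medium"   # hypothetically "high" before the change

def build_request(prompt: str, config: HarnessConfig | None = None) -> dict:
    config = config or HarnessConfig()
    return {
        "prompt": prompt,
        # Simple tasks barely notice a lower setting; long multi-step
        # engineering work loses deliberation on every single step.
        "reasoning_effort": config.reasoning_effort,
    }

if __name__ == "__main__":
    print(build_request("refactor the auth module"))
```

The point of the sketch is that nothing errors and nothing logs: every request still succeeds, just with less deliberation, which is why a change like this can run unnoticed for a month.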
The second issue landed on March 26. Engineers shipped a cache optimization meant to clear output tokens for idle sessions. The intent was reasonable. The implementation had a bug — the cache clear was firing on every single turn of every active session, not just idle ones. The model was losing its working context mid-conversation and reconstructing state from scratch. This is exactly the kind of bug that makes a model feel "dumber" in ways that are hard to pin down.
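For a sense of how the stated intent and the reported behavior diverge, here is a minimal sketch with entirely hypothetical names and thresholds; Anthropic has not published the offending code.

```python
import time

IDLE_THRESHOLD_SECONDS = 30 * 60  # hypothetical: treat a session as idle after 30 minutes

class SessionCache:
    def __init__(self) -> None:
        self.cached_output_tokens: list[str] = []
        self.last_activity = time.monotonic()

    def clear(self) -> None:
        self.cached_output_tokens.clear()

def on_turn_buggy(cache: SessionCache) -> None:
    # The reported bug: the clear ran unconditionally on every turn,
    # so working context was rebuilt from scratch mid-conversation.
    cache.clear()
    cache.last_activity = time.monotonic()

def on_turn_fixed(cache: SessionCache) -> None:
    # The stated intent: only reclaim memory from sessions that went idle.
    if time.monotonic() - cache.last_activity > IDLE_THRESHOLD_SECONDS:
        cache.clear()
    cache.last_activity = time.monotonic()
```

A missing guard of roughly this shape would produce no crashes and no error logs, only answers that feel subtly worse, which matches why users could sense the regression long before anyone could point to its cause.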
The third issue landed on April 16, the same day Anthropic shipped Claude Opus 4.7 and /ultrareview. The new system prompt added verbosity constraints: "keep text between tool calls to ≤25 words" and "keep final responses to ≤100 words." Those constraints produced a measurable 3 percent drop on internal benchmarks and were reverted on April 20.
Each individual change was small. Stacked, the result was a noticeable regression that persisted for six weeks.
Industry Impact
This post-mortem is a reputation event, not just a technical one. Anthropic's entire brand is built on safety, rigor, and predictability — exactly the kind of company enterprises are supposed to trust with production AI workloads. Shipping three unrelated regressions in six weeks and only catching them after a public outcry is not the positioning Anthropic wants.
Enterprise buyers will read this post-mortem carefully. The good news: the failures weren't in the model itself. Claude Opus 4.7, Claude Sonnet 4.6, and the underlying API were unaffected. The damage was in the harness and configuration wrapping the model for specific products. For customers using the API directly, there's no change. For customers using Claude Code as a managed product, it's a warning about the difference between trusting a model and trusting the full stack.
The timing is also unkind. OpenAI shipped GPT-5.5 the same morning with 82.7% on Terminal-Bench 2.0 and aggressive agent claims. A developer-focused competitor launching a flagship agent model on the day you publish a "we shipped three regressions in six weeks" post-mortem is not ideal.
Expert Perspectives
The Register was blunt: "Anthropic admits it dumbed down Claude when trying to make it smarter." That framing will stick, fair or not — because the pattern repeats. Labs ship optimizations, optimizations introduce regressions, users notice, labs deny, evidence mounts, labs eventually confirm.
VentureBeat emphasized a structural point: "the degradation came from changes to Claude's harnesses and operating instructions, not from the model itself." That distinction matters. Model quality is one thing. Platform quality — reasoning budgets, system prompts, caching, tool use — is another, and it's increasingly where real product differences show up.
What's Next
Watch how Anthropic changes its release process. Post-mortems this public usually come with commitments to process changes, and the obvious one here is better regression testing on the harness layer, not just on the model. Expect an announcement in the next few weeks about how Claude Code changes will be staged and tested going forward.
Watch the subscriber churn numbers. Anthropic hasn't disclosed them, but usage-limit resets signal that the company is worried about cancellations. If Q2 paid subscriber growth comes in weak, this post-mortem is the reason.
Watch competitors' positioning. OpenAI, Google, and xAI will all be tempted to run comparative benchmarks against Claude over the next few weeks, now that there's a specific "Claude got worse" story they can point to. Some of those comparisons will be fair. Most won't be.
Bottom Line
If you use Claude Code, the fixes are in — run your workflows again and see if quality feels restored. If you were considering switching to Codex or Cursor with GPT-5.5, Anthropic just handed you a reason to try. And if you're an enterprise buyer evaluating vendor reliability, this post-mortem is unusually honest — worth reading in full, because most labs would not publish it.