OpenAI and Thrive Launch Self-Improving Tax AI Hitting 97% Accuracy

OpenAI and Thrive Holdings have unveiled a Codex-powered tax preparation system that rewrites itself based on accountant corrections — and the results are striking. In a pilot across 30+ accounting firms, the system processed 7,000 tax returns, hit up to 97% accuracy, cut preparation time by a third, and raised throughput by roughly 50%. The technical details landed in an OpenAI blog post this week, formally announcing what may be the most consequential demonstration yet of agentic AI that improves itself in production.

The headline isn't the accuracy number. It's the trajectory. At launch, only 25% of returns hit 75% correct field completion. Six weeks later, that figure had climbed to 86%. The system was learning from accountant corrections, regenerating its own code in response, and getting measurably better every week with no model retraining required.
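To make that metric concrete, here is a minimal Python sketch of how a "share of returns reaching 75% correct field completion" figure could be computed. The data layout (a pair of correct and total field counts per return) and the function name are assumptions for illustration; the article does not describe how OpenAI or Thrive actually score returns.

```python
from typing import Iterable, Tuple


def share_meeting_threshold(
    returns: Iterable[Tuple[int, int]], threshold: float = 0.75
) -> float:
    """Fraction of returns whose correct-field ratio is at least `threshold`.

    Each item is (correct_fields, total_fields); this layout is hypothetical.
    """
    items = [(c, t) for c, t in returns if t > 0]  # skip returns with no fields
    if not items:
        return 0.0
    passing = sum(1 for c, t in items if c / t >= threshold)
    return passing / len(items)


# Illustrative numbers only, not pilot data.
sample = [(30, 40), (10, 40), (39, 40), (20, 40)]
print(share_meeting_threshold(sample))  # 0.5
```

Under this reading, the pilot's reported move from 25% to 86% describes how many returns clear the bar, not the average accuracy of any single return.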

What's actually new

The Tax AI is built on OpenAI's Codex, but it doesn't work like a typical Codex deployment. Most production AI systems are static: the model ships, customers use it, feedback gets collected into a backlog, and a future model release eventually incorporates the improvements. The cycle is measured in months.

The OpenAI–Thrive system collapses that cycle into a continuous loop. The system ingests three streams of input simultaneously: direct feedback from practitioners on individual returns, full production traces of prior corrections and filings, and pattern data about what kinds of errors are happening most frequently. Codex then runs targeted evaluations and writes code modifications in response to recurring problems.

Every time a human accountant fixes something the AI got wrong, the system learns from that correction and updates the underlying logic, not just a prompt. That's the "self-improving" part — Codex is rewriting the tax agent's own implementation, not just adjusting its outputs.
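One way to picture "run targeted evaluations, then change the code" is an evaluation gate: a rewritten rule replaces the current one only if it does better on cases accountants have already corrected. The Python sketch below is a toy under that assumption. The rule functions, field names, and the strict-improvement criterion are all hypothetical, and nothing here reflects the actual OpenAI or Thrive implementation.

```python
from typing import Callable, Dict, List, Tuple

Case = Tuple[Dict[str, float], float]  # (inputs, accountant-approved value)
Rule = Callable[[Dict[str, float]], float]


def pass_rate(rule: Rule, cases: List[Case], tol: float = 0.005) -> float:
    """Share of cases where the rule matches the approved value within tol."""
    if not cases:
        return 0.0
    hits = sum(1 for inputs, expected in cases if abs(rule(inputs) - expected) <= tol)
    return hits / len(cases)


def choose_rule(current: Rule, candidate: Rule, cases: List[Case]) -> Rule:
    """Adopt the candidate only if it scores strictly better on past corrections."""
    if pass_rate(candidate, cases) > pass_rate(current, cases):
        return candidate
    return current


# Hypothetical rules: the current one ignores interest income.
def current_rule(x: Dict[str, float]) -> float:
    return x["wages"]


def candidate_rule(x: Dict[str, float]) -> float:
    return x["wages"] + x.get("interest", 0.0)


corrected_cases: List[Case] = [
    ({"wages": 50000.0, "interest": 120.0}, 50120.0),
    ({"wages": 42000.0}, 42000.0),
]
chosen = choose_rule(current_rule, candidate_rule, corrected_cases)
print(chosen is candidate_rule)  # True
```

The point of the gate is that a code change has to earn its place against real corrections, which is what separates rewriting logic from tweaking a prompt and hoping.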

Why this matters

Self-improving production systems have been a long-promised feature of agentic AI, but most demonstrations so far have been narrow lab benchmarks. The OpenAI–Thrive system is the first widely reported deployment to show large gains in a real, regulated, high-stakes domain.

Tax preparation is also a useful proving ground. Returns are structured, the correctness signal is comparatively unambiguous (a field either survives accountant review or gets corrected, and the IRS either accepts a return or doesn't), and the feedback loop from accountants is high-quality. If you wanted to test self-improvement on a non-trivial enterprise task, this is close to the ideal setup.

The economic implications for the accounting industry are significant. A 33% reduction in preparation time and 50% throughput increase, combined with continuously improving accuracy, changes the unit economics of a tax practice. Small and mid-sized firms — which have been losing talent to larger firms and have been most exposed to capacity constraints during tax season — are the most direct beneficiaries.
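The two pilot figures are arithmetically consistent with each other. If each return takes one third less time and staff hours stay fixed, throughput scales by the inverse of the remaining time fraction: 1 / (1 - 1/3) = 1.5, a 50% increase. The short Python check below works that out; the constant-hours assumption is an illustration, not something the pilot reported.

```python
from fractions import Fraction


def throughput_gain(time_reduction: Fraction) -> Fraction:
    """Relative throughput increase when time per return falls by
    `time_reduction`, assuming total working hours stay constant."""
    if not 0 <= time_reduction < 1:
        raise ValueError("time_reduction must be in [0, 1)")
    return 1 / (1 - time_reduction) - 1


print(throughput_gain(Fraction(1, 3)))  # 1/2, i.e. a 50% increase
```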

How the partnership came together

OpenAI took an equity stake in Thrive Holdings in December 2025. Thrive is a holding company building enterprise software businesses across IT services and accounting, and the OpenAI stake was framed at the time as an investment to accelerate enterprise AI adoption in mid-market verticals.

The Tax AI is the first major output of that partnership. OpenAI and Thrive engineers worked together for six months to build the system. Notably, Thrive Holdings owns the resulting intellectual property and products from the collaboration — an unusual structure for a Big Tech AI partnership, where the model provider typically retains most IP rights.

That arrangement gives Thrive room to build a standalone accounting software business on top of the Tax AI, rather than just serving as a distribution channel for an OpenAI product. It also signals that OpenAI's strategy with Thrive is more about vertical penetration of the SMB market than direct competition with Intuit or H&R Block.

What the system actually does

The Tax AI drafts returns end-to-end. It ingests source documents (1099s, W-2s, K-1s, brokerage statements, real-estate documents), extracts the data, structures it into the appropriate return forms, and produces a draft for accountant review. The accountant then reviews and corrects the draft. Those corrections feed back into the system, where Codex evaluates whether the error reflects a systemic issue or a one-off, and rewrites the relevant logic if needed.
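The "systemic issue or one-off" decision can be illustrated with a simple frequency rule: an error type counts as systemic once it has been corrected on several distinct returns. The Python sketch below assumes a hypothetical correction log of (return ID, error type) pairs and an arbitrary threshold of three returns. The real system presumably uses richer signals that the article does not detail.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def classify_errors(
    corrections: Iterable[Tuple[str, str]], min_returns: int = 3
) -> Dict[str, str]:
    """Label each error type 'systemic' if it was corrected on at least
    `min_returns` distinct returns, otherwise 'one-off'.

    `corrections` holds (return_id, error_type) pairs; both are hypothetical.
    """
    seen: Dict[str, Set[str]] = defaultdict(set)
    for return_id, error_type in corrections:
        seen[error_type].add(return_id)  # repeats on one return count once
    return {
        etype: ("systemic" if len(ids) >= min_returns else "one-off")
        for etype, ids in seen.items()
    }


# Made-up log entries for illustration.
log = [
    ("r1", "k1_box_mismatch"),
    ("r2", "k1_box_mismatch"),
    ("r3", "k1_box_mismatch"),
    ("r2", "w2_ocr_typo"),
    ("r2", "w2_ocr_typo"),
]
print(classify_errors(log))
# {'k1_box_mismatch': 'systemic', 'w2_ocr_typo': 'one-off'}
```

Counting distinct returns rather than raw corrections keeps one messy filing from triggering a logic rewrite on its own.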

The 97% accuracy figure is the ceiling on common return types. The 86% figure (up from 25%) is the more meaningful metric: the share of returns, across the full distribution, that reach at least 75% correct field completion on the first pass. That climb in six weeks is what's driving industry attention.

Industry reaction

The accounting industry's reaction has been a mix of fascination and concern. Trade publications have noted that the throughput gains are exactly what mid-sized firms need to survive the demographic squeeze in the profession without losing client relationships. The concern is what happens to staff accountants when the routine preparation work that used to train them gets automated.

Analysts covering managed service providers (MSPs) also flagged the Thrive partnership as a signal that OpenAI is preparing to roll the same self-improving agent pattern out to IT services such as ticket triage, network configuration, and software support, all areas where Thrive has portfolio companies.

What to watch

The most important question is whether the self-improvement loop keeps delivering past the six-week mark. Most learning curves flatten quickly. If the system keeps improving through tax season 2027, that establishes self-improvement as a durable pattern, not an onboarding boost.

The second thing to watch is the regulatory response. The IRS has not yet taken a public position on AI-prepared returns. If accuracy stays near the 97% ceiling, that question stays quiet; if it slips, regulatory attention follows.

Bottom line

This is what self-improving production AI looks like when it works. The 97% number is impressive; the 25%-to-86% climb is the more important story. For accounting firms, the system is a real efficiency gain available now. For the AI industry, the OpenAI–Thrive deployment is the clearest evidence yet that self-improvement is moving from research demo to production. Expect the pattern to show up in other regulated verticals soon.
