Anthropic's Project Deal: Claude Agents Closed 186 Real Deals

Anthropic ran a one-week experiment inside its San Francisco office where Claude agents handled real money, real goods, and real negotiations on behalf of employees — and closed 186 deals worth more than $4,000 with no human approvals along the way. The company published the results, branded "Project Deal," as a window into what agent-to-agent commerce looks like when it actually works.

The setup was simple: a Craigslist-style marketplace open only to Anthropic staff, but where every transaction had to be conducted by a Claude agent acting for each side.

Why this matters

Most discussion of "AI agents" in commerce so far has been hypothetical. Companies talk about agents booking travel or buying groceries, but very little of that actually runs end-to-end without a human in the loop. Project Deal took the human out of the loop for the deal itself: once an employee told their agent what to do, the agent ran the negotiation, agreed to terms, and committed to a transaction.

That's an important threshold. It's the first widely shared empirical look at how today's frontier models behave when asked to make real economic decisions on someone else's behalf while the counterparty is also an AI.

The takeaways are practical for anyone thinking about agentic commerce: agent quality directly determines who wins, agents will close deals their humans might not have, and people are willing to delegate more financial authority to AI than industry observers have assumed.

What Anthropic actually did

The experiment ran for a week. Sixty-nine Anthropic employees opted in, each given a $100 budget paid out via gift cards. Participants told their assigned Claude agent what they wanted to buy or sell, and the agents handled the rest — listings, search, messaging, negotiation, and final agreement.

Crucially, there was no escape hatch back to the human during a deal. The agents didn't ping their owners for approval mid-negotiation, and they didn't pause when bidding got competitive. Once the parameters were set, the agents executed.

By the end of the week, Anthropic counted 186 completed deals across the marketplace, totaling more than $4,000 in transaction value. The items traded ranged from used office equipment to event tickets to handmade crafts.

The most interesting finding: model quality decided outcomes

Anthropic ran an A/B variant where some employees were represented by Claude Opus 4.5 and others by the smaller, faster Claude Haiku 4.5. The result: people represented by the more capable Opus model got measurably better outcomes — better prices, better terms, more deals closed.

Opus negotiated more aggressively, recognized leverage points, and held firm on pricing more often than Haiku did. Haiku tended to settle quickly and was more likely to accept a first offer.

The kicker, according to Anthropic's writeup: people whose agents lost the negotiation didn't notice. Without a benchmark to compare against, participants represented by Haiku were just as satisfied with their outcomes as those who used Opus — even when, on the data, they paid more or got less.

That has real implications for the agentic commerce market that's about to form. If consumers can't tell whether their agent is actually negotiating well, the choice of which model represents you starts to matter a lot — and most consumers won't have a way to evaluate that on their own.

Industry impact: agent-to-agent commerce is closer than it looks

Project Deal isn't a product. Anthropic was clear that it's a research experiment, not the rollout of a marketplace. But it reads as a deliberate signal to the industry: agent-to-agent transactions are technically feasible right now, and the market is going to materialize whether the rules are figured out or not.

That's a problem for existing commerce platforms. Marketplaces like eBay, Amazon, and Facebook Marketplace are designed around human buyers and sellers reading listings, sending messages, and making decisions. If agents become the actual users (listing items, negotiating, buying), much of the human-facing UX of e-commerce stops mattering. The interesting question becomes which platforms expose protocols that agents can use directly and which double down on human-only experiences.

It's also a problem for trust. In a marketplace where both sides are bots, fraud detection, dispute resolution, and identity verification all need to be redesigned. Anthropic acknowledged in its writeup that handling edge cases — disputes, fraud, low-quality counterparties — is one of the open research questions.

What competitors are doing

The agentic commerce race is no longer abstract. OpenAI has published research on agents that complete shopping tasks across the open web. Google has demonstrated agents booking travel and reservations on its I/O stage. Visa and Mastercard have both announced AI-agent-aware payment rails over the past six months.

Anthropic is the first of these to publish a controlled internal study with real money and real outcomes, which gives its results more empirical weight than the typical demo.

What industry insiders are saying

The experiment has been the most-discussed AI research drop on X this week. The conversation has split roughly in two: developer enthusiasm about how cleanly the agents handled multi-turn negotiation, and concern from policy and trust-and-safety researchers about the implications of letting agents commit users to financial deals without human review.

Several commentators noted that the most striking result — that humans don't notice when their agent loses — should put pressure on agent providers to publish negotiation benchmarks the way labs publish coding and reasoning benchmarks. If model quality determines who wins in agentic commerce, buyers deserve to see the scorecards.

What's next

Anthropic hasn't announced any plans to commercialize Project Deal. Internally, the company says it sees the experiment as input to ongoing work on agent capabilities, alignment, and the protocols that enable agents to interact safely.

For builders, the more useful signal is what to expect from frontier models in production: today's Opus-tier models can already handle multi-turn negotiation against another AI without losing context, hallucinating commitments, or failing to close. That's a higher capability bar than most consumer agent products are using right now.

For everyone else, the takeaway is more philosophical. Agent-to-agent commerce is no longer a thought experiment. The first marketplace where bots closed every deal has already run: inside Anthropic, last month, with real money. The version that reaches consumers is just a question of who builds it first and how the rules get written.

The bottom line

Project Deal is a proof point that the agentic economy is technically here. The next eighteen months will be about turning that proof point into infrastructure: protocols, payment rails, dispute mechanisms, and benchmarks that let consumers know which agents are actually working in their interest. If you're building in commerce, fintech, or marketplaces, the implication is clear — the user on the other end of your API may not be a human for much longer.
