Sakana's Fugu: a 7B Model That Conducts GPT-5, Claude, and Gemini

Tokyo-based Sakana AI has opened beta access to Fugu, a commercial multi-agent system that uses a small, reinforcement-learned "conductor" to route problems across pools of frontier models including GPT-5, Claude Sonnet 4, and Gemini 2.5 Pro. The system reaches state-of-the-art performance on hard coding and scientific reasoning benchmarks while using far fewer tokens and API calls than the underlying models do on their own.

The headline detail is the conductor itself. It's a 7-billion-parameter model — smaller than most local laptop LLMs — trained from scratch through reinforcement learning to do one thing well: decide which frontier model should handle which subproblem, and stitch their outputs together. Sakana calls this "collective intelligence" and is betting it's the next architectural step after monolithic frontier models.

What Fugu Actually Does

Most multi-agent systems on the market are scaffolds. You write a graph of prompts, plug in one or more LLMs, and orchestrate the flow with code. LangGraph, AutoGen, and CrewAI all work that way. Fugu is different: the orchestrator is itself a trained model that learned, through trial and error, which combinations of workers and communication structures produce the best answer for a given input.

During training, the 7B conductor was given a task, a pool of candidate worker models, and a reward signal based on whether its final answer was correct and properly formatted. Through reinforcement learning, it discovered strategies — sometimes routing the whole problem to one model, sometimes splitting it across several, sometimes running models in parallel and picking the best answer.

Sakana says the conductor is based on its ICLR 2026 papers, Trinity and Conductor, both of which laid the groundwork for using RL to train orchestrators rather than hand-coding routing logic. Fugu is the commercial product built on top of that research.

The Performance Numbers

The benchmarks Sakana published are pointed. Fugu sets state of the art on SWE-Pro (real-world software engineering tasks), GPQA-D (graduate-level science Q&A), and ALE-Bench (a competitive programming and algorithmic reasoning suite). On those benchmarks, the orchestrated system outperforms any individual frontier model it draws from — including the ones running Sakana's own conductor.

That last part is the unusual claim. A 7B model is too small to do graduate-level physics or solve a real GitHub issue on its own. But by routing those problems intelligently to GPT-5 or Claude Sonnet 4 — and sometimes by asking multiple models the same question and picking the best answer — the system beats what any of those models do alone.

Crucially, it does this with fewer tokens. VentureBeat's reporting notes the framework hit state of the art "while using far fewer tokens and API calls than competing orchestration systems." For enterprises that pay per token across multiple frontier APIs, that's the actual selling point.

Why a Conductor Beats a Bigger Model

The intuition behind Sakana's bet is straightforward. Frontier models keep getting bigger, more expensive, and less differentiated from one another at the top. Each has strengths — Claude on reasoning chains, GPT-5 on tool use, Gemini on long context — but no single model dominates every task.

Hand-routing between them works for simple flows but breaks down at scale. A human-designed router for a complex agentic workflow has to anticipate which model handles which subproblem best, which is exactly the kind of decision a trained RL agent can learn from feedback faster than humans can rewrite their code.

The 7B size is also deliberate. A larger conductor would defeat the purpose: the whole point is that orchestration is cheap and the expensive frontier models are called sparingly. The conductor decides; the heavy models execute.

Two Product Tiers

Sakana is shipping Fugu in two configurations. Fugu Mini is built for low-latency operations — customer support, real-time agents, anything where waiting matters more than perfecting the answer. Fugu Ultra is the maximum-performance version, designed for the hardest reasoning workloads where you'd run a frontier model with extended thinking budgets anyway.

Both are accessible through OpenAI-compatible API endpoints, which is the right deployment choice. Any enterprise that has already written code against the OpenAI API can point at Sakana's endpoint with minimal change. That dramatically lowers the switching cost from a single-model deployment to an orchestrated one.

Pricing has not been published. Sakana is taking applications for the beta and prioritizing software development, research, and "autonomous multi-agent workflows across industries such as finance and defense."

Industry Implications

If Sakana's claims hold up in production, the multi-agent layer becomes the place where margin lives in enterprise AI. Frontier model providers — OpenAI, Anthropic, Google — are commodified into "skilled workers" that a router calls. The router operator captures the relationship with the customer and the workflow logic.

That dynamic is exactly why companies like LangChain, Cohere, and Microsoft have been building orchestration tooling, and why Anthropic has invested in the Model Context Protocol (MCP) as a way to keep Claude central to agent workflows. Sakana's wrinkle is that the orchestrator isn't a framework — it's a model. That makes the orchestration itself a defensible asset rather than glue code.

The Japanese provenance also matters strategically. Sakana raised a Series B in 2025 backed in part by sovereign-adjacent capital and has been positioned as Japan's frontier AI champion. A commercial product that turns the global frontier model market into an inference backend for Japanese-built coordination is a coherent strategic story for Tokyo's AI policy goals.

What to Watch Next

Three things will determine whether Fugu is a curiosity or a platform shift. First, third-party benchmark replication. Sakana's published numbers are strong; independent evaluation from groups like LMSYS or Vellum will tell us how much holds up outside their test setup.

Second, who the early commercial customers are. Sakana has flagged finance and defense as priority sectors. If Fugu ships into a large bank's coding workflow or a defense contractor's R&D pipeline, that's a meaningful signal that the orchestration layer is real enterprise infrastructure, not a research demo.

Third, the response from frontier model providers. OpenAI, Anthropic, and Google all have a strong interest in keeping their models central to customer workflows rather than being routed by a third-party model. Expect counter-moves around native orchestration features in the next two quarters.

The bottom line: a 7B Japanese model just claimed it can outperform any individual frontier model by directing them as a team. If that's true at production scale, the next twelve months of enterprise AI architecture look different than the last twelve. The frontier itself may matter less than the conductor that calls it.

Sakana's Fugu: a 7B Model That Conducts GPT-5, Claude, and Gemini

Sakana's Fugu: a 7B Model That Conducts GPT-5, Claude, and Gemini

What Fugu Actually Does

The Performance Numbers

Why a Conductor Beats a Bigger Model

Two Product Tiers

Industry Implications

What to Watch Next

Sources

Don't fall behind

Related Articles

OpenAI's GeneBench-Pro Tests AI on Real Biology Research

Anthropic Launches Claude Fable 5: Its Most Capable Model Yet

China Plans $295B AI Data Center Buildout to Rival the US