Google Splits TPUs in Two: Meet TPU 8t and 8i, Built for the Agent Era

Google used Cloud Next 2026 in Las Vegas to unveil its eighth-generation Tensor Processing Units — and for the first time, the flagship accelerator is shipping as two different chips. TPU 8t handles training. TPU 8i handles inference. Both are purpose-built for AI agents that plan, call tools, and run for hours at a time.

The split is Google's most aggressive swing yet at Nvidia's data-center dominance. It also signals that the "one chip does everything" era of AI silicon is ending.

Why the split, and why now

For seven generations, Google shipped a single TPU design that tried to do it all. That worked when the industry was focused on training ever-larger models. But the workload has changed. In 2026, most compute is spent on inference — running models in production, often inside multi-step agent loops that hammer memory bandwidth far more than raw FLOPs.

Google's argument is that training and inference have drifted so far apart that a single chip compromises both. Training needs massive parallelism across tens of thousands of chips. Inference needs huge, low-latency memory for the key-value caches that keep long conversations fast. One die can't win at both.

That's the opening TPU 8t and 8i are built to exploit.

TPU 8t: the training chip

TPU 8t is the beast. Google says a single cluster can scale to 9,600 chips sharing 2 petabytes of high-bandwidth memory. Total compute is roughly three times that of the previous-generation Ironwood TPU.

Performance-per-dollar is the headline number Google wants enterprises to hear: the company claims 2.8x the training performance of Ironwood at the same price. For frontier labs, taken at face value, that would cut a six-month pretraining run to roughly nine weeks.

Why this matters: training cost is the gating factor for how fast any AI lab can iterate. Shaving months off a pretraining cycle isn't a marginal improvement — it changes how often new models can ship.

TPU 8i: the inference chip

TPU 8i is the more surprising half of the announcement. Instead of chasing FLOPs, Google tripled the on-chip SRAM relative to Ironwood. That extra static memory lets a single TPU 8i hold a much larger key-value cache at inference time, which is exactly what long-context models and agent workloads need.
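To see why cache capacity matters so much for long-context serving, here is a minimal sketch of the standard key-value cache sizing formula for a decoder-only transformer. The model shape below is hypothetical and chosen only for illustration; it is not a TPU 8i specification or the shape of any Google model.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_value=2):
    """Bytes needed to cache keys and values for a decoder-only transformer."""
    # Factor of 2: each layer stores one tensor for keys and one for values.
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
    return per_token * seq_len * batch_size


# Hypothetical model shape (assumption, for illustration only):
# 32 layers, 8 KV heads, head dimension 128, 16-bit values, 131,072 tokens.
size = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128,
                      seq_len=131_072)
print(f"{size / 2**30:.1f} GiB")  # prints "16.0 GiB"
```

With these illustrative settings, a single 131,072-token sequence needs 16 GiB of cache, and the total grows linearly with both context length and batch size.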

The result is roughly 80% better inference performance at the same price point as Ironwood. Both chips also deliver up to 2x better performance-per-watt than the previous generation, a figure that matters when a single hyperscale deployment can draw multiple gigawatts.
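As a rough way to read those ratios, the sketch below converts a performance-per-dollar or performance-per-watt multiple into cost or energy per unit of work relative to Ironwood. It assumes the claimed ratios apply uniformly to a workload, which real deployments may not match; the function name is illustrative.

```python
def relative_cost(perf_ratio):
    """Cost (or energy) per unit of work relative to the baseline chip,
    given a performance-per-dollar (or per-watt) ratio over that baseline."""
    if perf_ratio <= 0:
        raise ValueError("perf_ratio must be positive")
    return 1.0 / perf_ratio


# Ratios taken from Google's stated claims; "up to 2x" is treated as the best case.
inference_cost = relative_cost(1.8)   # 80% better performance at the same price
energy = relative_cost(2.0)           # up to 2x performance-per-watt

print(f"cost per unit of inference: {inference_cost:.0%} of Ironwood")  # 56%
print(f"energy per unit of work: {energy:.0%} of Ironwood")             # 50%
```

In other words, "80% better at the same price" means paying a little over half as much per unit of inference, not 80% less.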

If 8t is aimed at the labs building models, 8i is aimed at everyone running them.

Industry impact

The timing is loud. Google made the announcement one day after Anthropic confirmed a 3.5-gigawatt TPU capacity deal that comes online starting in 2027. Nvidia is still the default for most AI startups, but with Anthropic, Google has landed what may be the biggest inference customer outside of OpenAI.

For cloud customers, the practical change is simpler. If you're training a custom model, you rent 8t clusters. If you're running a product — a chatbot, a coding assistant, an agent — you rent 8i. Google's pitch is that you pay less for both than you would on equivalent GPU capacity.

Nvidia still has the software moat. CUDA isn't going anywhere in the short term. But for workloads that live inside Google Cloud, the economics just shifted.

What analysts are saying

Industry observers at Hyperframe Research called the split "the end of general-purpose silicon," arguing that specialization is now the only way to keep improving performance-per-watt at frontier scale. SiliconANGLE noted the bifurcation mirrors what Nvidia itself has started to do with its Blackwell and Rubin inference-tuned variants, but Google has gone further by separating the product lines entirely.

Sundar Pichai, in his Cloud Next keynote, framed the announcement as Google Cloud's "full-stack bet" on the agentic era — chips, models, and platform together.

What's next

Ironwood, the seventh-generation TPU, is generally available to Google Cloud customers as of Cloud Next. TPU 8t and 8i are both listed as "coming soon," with no firm GA date yet. Based on Google's historical cadence, broad availability is likely in late 2026 or early 2027, which lines up with the Anthropic capacity ramp.

Enterprises that want early access will go through Google's Vertex AI and the new Gemini Enterprise Agent Platform — the two surfaces where TPU 8i will show up first for customer inference.

Bottom line

Google just told the AI industry that training and inference are two different businesses now, and it's going to sell a different chip for each. If the performance numbers hold up in production, every team running serious inference on GPUs will at least have to run the math on 8i. That alone makes this one of the most consequential hardware announcements of the year.
