Google’s Ironwood TPU v7 Is Coming — The Inference Superchip Set to Rewrite AI’s Power and Profit Rules

By Anup S

Google’s Ironwood Chip Redefines AI Economics, Powering a New Era of Inference

A Technical Leap for a Power-Constrained World

Google Cloud’s Ironwood TPU v7 is stepping into the spotlight, moving toward general availability after its April 2025 preview. This isn’t just another chip launch—it’s a bold architectural gamble. Google is betting big on inference rather than training, a shift made crystal clear after new technical details emerged at Hot Chips 2025.

Each Ironwood unit delivers a staggering 4,614 teraflops of FP8 compute power, supported by 192 gigabytes of lightning-fast HBM3e memory running at 7.3 terabytes per second. Built on an advanced 5-nanometer process, the chip sips roughly 600 watts of power—impressive for its output.

The real magic happens at the pod level. Picture 9,216 liquid-cooled chips connected through optical circuit switching, together achieving 42.5 exaflops of FP8 performance and an eye-popping 1.77 petabytes of shared memory, a record for a machine-learning system. This setup reveals Google’s core belief: the biggest hurdle in late-2025 AI deployment isn’t raw compute anymore; it’s the memory capacity, bandwidth, and power efficiency needed to run massive, stateful AI agents at scale.
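
For the mathematically curious, those pod totals fall straight out of the per-chip specs quoted above; a quick back-of-envelope check in Python:

```python
# Sanity check: pod totals derived from the article's per-chip specs
# (4,614 TFLOPS of FP8 and 192 GB of HBM3e per chip).
chips_per_pod = 9_216
fp8_tflops_per_chip = 4_614
hbm_gb_per_chip = 192

pod_exaflops = chips_per_pod * fp8_tflops_per_chip / 1e6  # TFLOPS -> EFLOPS
pod_memory_pb = chips_per_pod * hbm_gb_per_chip / 1e6     # GB -> PB

print(f"Pod FP8 compute: {pod_exaflops:.1f} EFLOPS")  # ~42.5 EFLOPS
print(f"Pod HBM capacity: {pod_memory_pb:.2f} PB")    # ~1.77 PB
```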

Ironwood’s 1.2-terabyte-per-second I/O fabric and its doubled performance-per-watt compared to the previous Trillium generation directly tackle those pain points. Hyperscalers are running into the physical limits of power grids, so squeezing more inference out of every watt has become the new benchmark. In today’s multi-gigawatt data centers, the key metric isn’t how fast you can train, but how efficiently you can serve inference workloads.
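
Those efficiency claims can be made concrete with the numbers already on the table. The sketch below uses only the per-chip figures quoted above, and the pod-power estimate counts silicon alone; cooling, host CPUs, and networking are deliberately left out, so treat it as a floor rather than a facility number:

```python
# Efficiency math from the per-chip figures above. Assumption: the
# ~600 W figure is chip power only; cooling, hosts, and networking
# would push a real deployment's draw higher.
fp8_tflops_per_chip = 4_614
watts_per_chip = 600
chips_per_pod = 9_216

tflops_per_watt = fp8_tflops_per_chip / watts_per_chip    # ~7.7 TFLOPS/W
pod_chip_power_mw = chips_per_pod * watts_per_chip / 1e6  # ~5.5 MW

print(f"FP8 efficiency: {tflops_per_watt:.1f} TFLOPS per watt")
print(f"Per-pod chip power: {pod_chip_power_mw:.1f} MW (silicon only)")
```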

Anthropic Deal Sparks Demand and Fuels Rivalry with Nvidia

The turning point came on October 23, 2025. Anthropic signed a massive deal committing to “up to one million TPUs” and “tens of billions of dollars” in contracts, with projected power usage topping one gigawatt by 2026. Overnight, Ironwood transformed from a roadmap promise into a production reality backed by real, high-stakes demand.
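
The gigawatt figure squares with the chip count, at least roughly. A hedged sanity check, assuming the ~600 W per-chip draw discussed above and a hypothetical 1.4x facility overhead for cooling, hosts, and networking (my assumption, not a disclosed number):

```python
# Does "up to one million TPUs" line up with ~1 GW by 2026?
# Assumptions: ~600 W per chip (from the spec discussion above) and
# a hypothetical 1.4x facility overhead; neither is a disclosed figure.
tpus = 1_000_000
watts_per_chip = 600
facility_overhead = 1.4

total_gw = tpus * watts_per_chip * facility_overhead / 1e9
print(f"Estimated draw: {total_gw:.2f} GW")  # ~0.84 GW, near the 1 GW mark
```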

For Google, this deal means visibility and stability. It can now plan data center construction and power agreements without fearing unused capacity—a major worry back in April.

The scale of Anthropic’s bet says it all. Instead of waiting for Nvidia’s or AWS’s latest chips, the Claude developer chose Google’s Ironwood for its speed-to-market and power efficiency. That’s a clear nod to TPU v7’s economics: more inference, less energy. In a world where power, not silicon, limits growth, that matters more than ever.

Competition in the AI chip world is now splitting by workload. Nvidia’s Blackwell chips still rule frontier training and deliver up to 30 times Hopper’s inference throughput, setting the stage for Rubin’s 3.6-exaflop rack-scale systems arriving in 2026. AWS, meanwhile, has deployed 500,000 Trainium2 chips connected through its UltraCluster network, though each chip offers roughly 1.3 petaflops of FP8 compute and far less onboard memory than Ironwood. Microsoft’s Maia program is still lagging, with next-generation hardware delayed until 2026.

Google’s strategy is different. It isn’t chasing the biggest number—it’s chasing the right one. Ironwood’s 1.77-petabyte shared memory gives it an edge in handling mixture-of-experts models, long-context reasoning, and retrieval-heavy systems. Those are the workhorses of modern AI. While Nvidia sells a one-size-fits-all solution, Google is building infrastructure tailor-made for what it calls the “age of inference.”
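
To see why pod-scale memory is the edge here, consider how the KV cache balloons in long-context serving. The configuration below is purely hypothetical (layer count, KV heads, head dimension, and FP8 cache precision are all illustrative assumptions, not any real model’s specs):

```python
# Why 1.77 PB of shared memory matters: a transformer's KV cache
# grows linearly with context length. All model numbers below are
# hypothetical, chosen only to show the shape of the math.
num_layers = 128
num_kv_heads = 8
head_dim = 128
bytes_per_value = 1       # FP8 cache, assumed
context_len = 1_000_000   # one long-context session

# Keys and values, per token, across every layer
kv_bytes_per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
session_gb = kv_bytes_per_token * context_len / 1e9

pod_memory_gb = 1.77e6  # 1.77 PB
concurrent_sessions = pod_memory_gb / session_gb

print(f"KV cache per token: {kv_bytes_per_token / 1024:.0f} KiB")  # 256 KiB
print(f"One 1M-token session: {session_gb:.0f} GB")                # ~262 GB
print(f"Sessions per pod: {concurrent_sessions:,.0f}")             # ~6,752
```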

Investment Insight: Protecting Margins Through Vertical Integration

For Alphabet investors, Ironwood represents more than just a new chip—it’s a defense against shrinking margins in the cloud AI business. Hyperscalers like AWS are projected to hit 11.8 gigawatts of power capacity by 2027, and the entire industry is spending heavily through that period. Custom silicon lets Google turn that spending into profit, capturing value from chip design to deployment.

The numbers tell the story. Ironwood doubles performance-per-watt compared to Trillium, meaning each megawatt of data center capacity in 2026 can produce twice the inference output of 2024 systems. Add in smarter software—like Google’s vLLM integration and improved Pathways scheduling—and Google can price its AI services competitively while still improving margins. Simply put, running your own chips beats reselling someone else’s.
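
The flow-through to serving economics is easy to sketch. The baseline throughput and electricity price below are placeholders of mine, there only to show the shape of the calculation, not real fleet numbers:

```python
# How a 2x perf/watt gain halves the power cost of serving.
# Baseline throughput and power price are illustrative placeholders.
baseline_tokens_per_sec_per_mw = 1_000_000  # hypothetical 2024 fleet
perf_per_watt_gain = 2.0                    # Ironwood vs. Trillium
power_price_per_mwh = 80.0                  # assumed $/MWh

for label, tput in [
    ("2024 (Trillium-era)", baseline_tokens_per_sec_per_mw),
    ("2026 (Ironwood)", baseline_tokens_per_sec_per_mw * perf_per_watt_gain),
]:
    tokens_per_mwh = tput * 3600
    cost_per_m_tokens = power_price_per_mwh / (tokens_per_mwh / 1e6)
    print(f"{label}: ${cost_per_m_tokens:.4f} power cost per 1M tokens")
```

Same megawatt, twice the tokens, half the power cost per token; that is the whole margin story in three lines of arithmetic.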

Anthropic’s contract also removes the uncertainty from Google’s capital spending plans. Instead of building capacity and hoping customers come, Google now builds against guaranteed demand. That shifts the financial model from speculation to certainty—AI infrastructure spending is now linked directly to locked-in revenue.

Still, three big questions hang in the air. First, can Google attract more anchor clients? Two or three additional long-term TPU deals would prove Ironwood isn’t just a one-customer wonder. Second, will power projects stay on schedule? The 2026 target hinges on substation approvals and construction timelines that aren’t entirely in Google’s hands. Third, can Google’s software stack keep pace with Nvidia’s CUDA ecosystem? Utilization rates will depend on it—falling from 90% to 70% would hurt efficiency.
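
That third risk is easy to quantify: with capital and power costs largely fixed, the amortized cost of each served token scales inversely with utilization, so the 90%-to-70% slide mentioned above is worth nearly a third on unit costs. A minimal sketch:

```python
# Fixed fleet costs amortize over served tokens, so unit cost
# scales as 1/utilization.
high_util, low_util = 0.90, 0.70
cost_increase = high_util / low_util - 1
print(f"Cost per served token rises {cost_increase:.1%}")  # ~28.6%
```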

While Nvidia’s dominance in training remains safe, the threat from custom chips like Ironwood is real. Google isn’t trying to dethrone Nvidia in research or rapid prototyping. Instead, it’s targeting the bread-and-butter workloads—the massive, steady inference tasks that keep AI systems running daily. By 2027, Google’s TPUs could handle up to 30% of the total inference market.

That shift, combined with similar pushes from Amazon and Microsoft, explains why Nvidia faces mounting pressure to justify its pricing. The AI chip market is evolving from one giant supplier to several vertically integrated ecosystems, each owning its stack.

Ironwood’s true significance lies in proving that AI infrastructure can boost profit margins—not just scale endlessly. For Alphabet, it’s a strategic safety net, offering flexibility to switch between internal use and cloud rental while reducing reliance on outside chipmakers. In a world where efficiency is king, Google’s Ironwood may be the chip that rewires the economics of AI itself.

NOT INVESTMENT ADVICE
