Google has sold so much TPU capacity that its own researchers are queueing for the rest
At a glance:
- Google's AI researchers, including those at DeepMind, now compete for internal TPU access because of massive external commitments to Anthropic and Meta.
- The company has locked up 5GW of TPU capacity for Anthropic over five years and signed a separate deal with Meta, straining internal resources.
- Hardware constraints (high-bandwidth memory shortages) and internal allocation policies based on seniority exacerbate the compute crunch.
The TPU Advantage and Its Consequences
Alphabet has spent the past decade building an enviable AI infrastructure position: a thriving cloud business, proprietary custom chips, and supply deals that make its TPUs the default alternative to Nvidia for major external customers. The strategy has worked well enough to turn Google into a critical infrastructure provider for its own competitors. That success has also created an internal bottleneck, Bloomberg reported on Monday: Google’s own AI researchers, including teams within DeepMind, are now jockeying for access to the same TPU resources the company markets to third-party clients such as Anthropic and Meta. The capacity that made Google a cornerstone of external AI infrastructure has become scarce inside the company.
The structural cause is straightforward: external commitments have consumed capacity that historically went to internal research. Google’s decade-long TPU bet has finally produced unit economics good enough to sell chips to rivals, host their models, and run frontier research on the same fleet, but that shared fleet is no longer large enough for all three uses at once. This exposes a basic tension in Google’s AI strategy: validating TPUs against Nvidia requires volume traction with named customers, while sustaining internal research requires reserved compute. In the short run the result is zero-sum: with total capacity fixed, every chip committed to an external customer is one that internal teams cannot use.
The Scale of External Commitments
Google’s largest external TPU commitment is its landmark deal with Anthropic, valued at up to $40 billion over five years. The agreement covers five gigawatts of TPU capacity and access to up to one million seventh-generation Ironwood chips, resources central to Anthropic’s training and serving roadmap. A separate Broadcom-mediated supply line adds another 3.5GW of TPU capacity for Anthropic starting in 2027, on top of the 1GW already allocated for 2026. These commitments are not theoretical: Anthropic has explicitly cited Google’s TPU stack as foundational to its AI operations. Meta signed its own TPU deal with Google earlier this year, locking up further capacity that would otherwise flow to internal teams. Together, these partners account for a substantial share of Google’s total TPU infrastructure, leaving internal researchers to queue for what remains.
The numbers behind these commitments are large even by Big Tech standards. Alphabet has guided 2026 capital expenditures of $175 billion to $185 billion, part of a combined Big Tech AI infrastructure spend exceeding $650 billion this year. Google has brought well over a gigawatt of new AI compute capacity online in 2026, yet even that expansion has not outpaced demand: the 1GW allocated to Anthropic for 2026 alone is comparable in size to all of the year’s new capacity. The Anthropic and Meta deals together account for gigawatts of capacity that would historically have fueled Gemini training runs and DeepMind experiments. That is why internal researchers now face rationing: the infrastructure that lets Google challenge Nvidia in the AI chip market also constrains its own research pipeline.
The Internal Crunch: Hardware and Allocation
The compute crunch takes two distinct forms, as DeepMind CEO Demis Hassabis outlined earlier this year. The first is hardware: key components are in short supply, particularly high-bandwidth memory from Samsung, Micron, and SK Hynix. These choke points affect external and internal users alike, but they bite harder internally because researchers "need a lot of chips to be able to experiment on new ideas at a big enough scale," as Hassabis put it. The second is allocation: internal compute is rationed by managerial seniority rather than by the unit-cost economics that govern external customer contracts. Oren Etzioni, former CEO of the Allen Institute for AI, has publicly described this as the predictable outcome of an internal market in which access depends on hierarchy rather than on merit or project potential.
This dual constraint has tangible consequences. Bloomberg’s reporting notes that researchers such as Ioannis Antonoglou, a long-tenured DeepMind contributor, have left for startups in the past 18 months, a pattern that has accelerated as compute became harder to secure inside Google. Such departures threaten Google’s ability to attract and retain top AI researchers. The seniority-based system also misallocates resources: junior teams with promising ideas may not get enough compute, while senior teams with established projects may hoard it. Beyond slowing innovation, this risks a two-tier research culture in which organizational rank counts for more than scientific merit.
Google’s Delicate Position
For the past 18 months, Google has managed this tension through a precarious balancing act. On one hand, its TPU program needs volume traction with marquee customers like Anthropic and Meta to validate the technology against Nvidia’s dominance. On the other, Google must preserve enough internal capacity for Gemini training runs and DeepMind research to maintain its leadership in AI development. Its response has included expanding its inference-chip supply chain through partnerships with Broadcom, MediaTek, and Marvell, a hedge intended to relieve downstream constraints by adding capacity outside the TPU training pool. That effort is still nascent, however: the inference chips have not yet shipped in the volumes demand requires.
Google has not, on the record, disputed Bloomberg’s characterization of its internal allocation challenges. Instead, the company pointed to its broader infrastructure investment and noted that compute constraints affect every major model provider. That is accurate: Q1 2026 earnings across the industry show every major AI firm operating under compute constraints relative to its research ambitions. What sets Google apart is its paradoxical role as its main competitors’ largest infrastructure supplier even as it struggles to resource its own teams. It must keep selling the asset that underpins its rivals’ growth while holding back enough to sustain its own research.
Industry Context and Future Outlook
Every major model provider faces hardware constraints, but only Google has also become its rivals’ primary chip supplier, which turns its version of the crunch into a strategic dilemma. If Google keeps selling TPU capacity at its current pace, internal research may remain perpetually constrained. If it prioritizes internal access instead, it risks ceding ground in the AI chip market to Nvidia. The next several quarters, as Google’s capex expansion and inference-chip partnerships mature, should show which way it leans.
For the broader AI industry, Google’s situation marks an inflection point. As compute becomes the scarcest input in AI development, companies must decide how to divide it between external revenue and internal research. Google’s experience suggests that even the best-resourced organizations cannot satisfy both demands indefinitely. Its response, expanding supply chains through partners like Broadcom and investing heavily in new capacity, may offer a template for others facing similar tensions. How well Google manages this internal-external competition for compute will shape both its own AI trajectory and the evolution of the wider AI infrastructure market.
FAQ
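What is causing Google's internal TPU capacity crunch?
Large external commitments, chiefly to Anthropic and Meta, have absorbed capacity that once went to internal research. High-bandwidth memory shortages limit how fast new capacity can be added, and a seniority-based allocation system decides who gets what remains.
How much TPU capacity has Google committed to Anthropic?
Up to 5GW over five years, under a deal valued at up to $40 billion that includes access to up to one million Ironwood chips. A separate Broadcom-mediated supply line adds 3.5GW starting in 2027, on top of 1GW allocated for 2026.
What steps has Google taken to alleviate the TPU shortage?
Alphabet has guided 2026 capex of $175 billion to $185 billion, Google has brought well over a gigawatt of new AI compute online this year, and it is expanding its inference-chip supply chain with Broadcom, MediaTek, and Marvell, though those chips have not yet shipped at scale.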
Prepared by the editorial stack from public data and external sources.