AXISPRESS

AI Infrastructure Is Commoditizing Fast. Build on Top of It.

Five simultaneous waves are collapsing the AI stack's cost floor — expanding, not shrinking, the surface area for value creation above each layer.

Nora Gray · 15 min read

When Amazon launched S3 and EC2 in 2006, the conventional fear was that cheap, managed compute would destroy the server hardware business. That fear was correct. It also missed almost everything that mattered. What AWS actually did was eliminate the capital barrier to starting a technology company — and in doing so, it made Airbnb, Dropbox, Netflix's streaming architecture, and thousands of other businesses possible for the first time. The companies that were threatened were the ones selling the plumbing. The companies that won were the ones building on top of it.

The AI stack is now running the same play, on a compressed timeline. The question for anyone building on AI infrastructure is not "will my moat survive?" but "what does the new plumbing make possible that wasn't possible before?"

There is a wrinkle the AWS analogy obscures: this time, commoditization is happening at multiple layers simultaneously, not sequentially. That compression deserves careful treatment.

The Pattern History Keeps Running

Infrastructure commoditization follows a recognizable arc. It destroys value at the layer being commoditized and creates far more value at the application layer above it, by collapsing the barrier to entry for applications that depended on the now-cheap infrastructure.

Stripe commoditized payments infrastructure. Months of bank relationship development and PCI compliance engineering — the work that previously stopped companies from launching with online payments — collapsed to a few API calls. Shopify, Lyft, and an entire generation of internet commerce were the result. Not competitors to Stripe, but beneficiaries of its existence.

The App Store commoditized mobile distribution, which had previously required carrier relationships and platform approvals that most developers couldn't navigate. The mobile application economy, generating over $130 billion in annual revenue within fifteen years, followed.³

The invariant: infrastructure commoditization lowers the cost floor for dependent applications, expands their addressable market, and shifts the competitive axis from "who can build the infrastructure" to "who builds the most valuable things on top of it."

The AWS analogy also contains a warning usually omitted from the bullish retelling. Not every company that benefited from cheap cloud compute won. The ones that lost treated the infrastructure saving as margin to capture rather than leverage to deploy — they ran the same business cheaper instead of building a categorically different one. Cheap agent infrastructure will draw the same distinction. Companies that use it to reduce cost in existing workflows will capture modest gains. Companies that use it to pursue workflows that were previously impossible will capture the era.

Five Waves, Climbing the Stack

The AI infrastructure stack is commoditizing in waves, each building on the last. What is unusual about this cycle — and what the AWS analogy understates — is that waves two and three are arriving before wave one has fully settled. That concurrency changes the risk calculus.

Wave 1 — Foundation models (roughly 2023–2025): The capability-to-cost ratio of foundation models has improved at a pace without precedent in computing history. The "Densing Law," documented in Nature Machine Intelligence, describes capability density doubling approximately every 3.3 months.⁴ In concrete terms: achieving more than 60% on the MMLU benchmark required a 540-billion-parameter model in 2022; by 2024, a 3.8-billion-parameter model crossed the same threshold — a 142-fold reduction in parameter count for equivalent capability. GPT-3.5-class query costs fell from roughly $20 per million tokens in November 2022 to roughly $0.07 per million tokens by October 2024 — a 280-fold reduction in under two years.⁵ Frontier model capability at a given quality level is commoditizing, but the frontier itself keeps advancing. The shape of commoditization is a staircase, not a floor: yesterday's premium capability becomes cheap while today's frontier remains scarce.
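The arithmetic behind these two data points can be checked directly. A small sketch, assuming the 3.3-month capability-density doubling rate from the Densing Law:

```python
import math

# Densing Law: capability density doubles roughly every 3.3 months, so the
# parameter count needed for a fixed capability halves at the same rate.
DOUBLING_MONTHS = 3.3

def expected_reduction(months: float) -> float:
    """Parameter-count reduction factor implied over `months`."""
    return 2 ** (months / DOUBLING_MONTHS)

def months_for_reduction(factor: float) -> float:
    """Months needed to reach a given reduction factor."""
    return DOUBLING_MONTHS * math.log2(factor)

# The 540B -> 3.8B observation (2022 to 2024) is a ~142x reduction.
print(round(540 / 3.8))                      # 142
print(round(months_for_reduction(142), 1))   # ~23.6 months, i.e. about two years
print(round(expected_reduction(24)))         # ~155x implied over a full 24 months
```

The observed 142-fold reduction corresponds to about seven doublings, which at 3.3 months each lands almost exactly on the two-year gap between the data points.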

Wave 2 — Inference infrastructure (2025–2026): Managed inference APIs from AWS, Azure, and Google Vertex AI commoditized the operational burden of serving foundation models. Multi-provider routing enabled automatic fallback and price arbitrage. Quantization techniques — including 1-bit and ternary weight approaches that match full-precision performance — dramatically reduced hardware requirements for inference.⁶ Serving models at scale is commoditizing. Inference margin, once a significant source of value for early API providers, is converging toward commodity pricing; differentiation is shifting to reliability, latency guarantees, and compliance posture rather than raw capability.
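The routing pattern described above fits in a few lines. A minimal sketch; the provider names, prices, and failure mode are illustrative assumptions, not real endpoints:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Provider:
    name: str
    price_per_mtok: float          # illustrative price, USD per million tokens
    call: Callable[[str], str]     # the provider's completion endpoint

def route(prompt: str, providers: list[Provider]) -> tuple[str, str]:
    """Try providers cheapest-first; fall back to the next on failure."""
    for p in sorted(providers, key=lambda p: p.price_per_mtok):
        try:
            return p.name, p.call(prompt)
        except Exception:
            continue  # provider down or rate-limited: fall through to the next
    raise RuntimeError("all providers failed")

# Hypothetical providers: one cheap but flaky, one pricier but reliable.
def flaky(prompt: str) -> str:
    raise TimeoutError("rate limited")

def stable(prompt: str) -> str:
    return f"answer to: {prompt}"

providers = [
    Provider("budget-api", 0.07, flaky),
    Provider("premium-api", 0.60, stable),
]
name, answer = route("summarize Q3", providers)
print(name)  # premium-api (the cheap provider failed, the router fell back)
```

Price arbitrage and fallback are the same loop: sort by cost, try in order, keep going on failure.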

Wave 3 — Agent infrastructure (2026, now): Stateful orchestration, session management, tool coordination, identity and permissions, sandboxed execution — the custom engineering that every serious agent application previously required — are becoming managed platform services. This wave is arriving now. Its significance is not just cost reduction but complexity reduction: the six-to-twelve-month infrastructure buildout that preceded any serious agent application disappears.

Wave 4 — Agent frameworks and patterns (2026–2027, projected): The Model Context Protocol, developed by Anthropic and now being adopted broadly, represents an attempt to standardize the interface between AI agents and external tools — the HTTP of agent-tool communication.⁸ When this or a successor becomes the universal tool interface, integrating agents with external services drops from an engineering problem to a configuration problem. Multi-agent coordination patterns and standard workflow topologies will follow. The HTTP analogy makes the consequence precise: standardization enables ecosystem explosion, but the interface itself confers no advantage. Value migrates entirely to what sits at each endpoint — the quality and depth of the tool or agent behind the MCP interface, not the interface itself.
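To make "configuration problem" concrete: an MCP-style tool declaration reduces to a name, a description, and a JSON Schema for inputs. The tool name and fields below are hypothetical; the shape is a sketch of the protocol's tool-definition convention:

```python
import json

# A single tool declaration in the MCP style. The agent-side integration
# reduces to registering this declaration; the engineering lives behind the
# endpoint, not in the interface.
search_tool = {
    "name": "search_case_law",            # hypothetical domain tool
    "description": "Full-text search over a case-law corpus.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "jurisdiction": {"type": "string"},
            "limit": {"type": "integer", "default": 10},
        },
        "required": ["query"],
    },
}

# The declaration is plain data: it serializes, versions, and diffs like config.
print(json.dumps(search_tool["inputSchema"]["required"]))  # ["query"]
```

The point of the analogy holds here: nothing in this declaration is differentiating. The value is in the corpus and ranking behind `search_case_law`, not in the schema that exposes it.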

Wave 5 — Unknown: History suggests Wave 5 will be whatever requires significant custom engineering at the Wave 4 layer. The safe prediction is that it involves domain-specific intelligence, trust and verification infrastructure for autonomous agents in regulated or high-stakes environments, or governance systems for multi-agent coordination at scale — the problems that don't look like infrastructure problems until they obviously are. This is worth naming because it is actionable: companies building Wave 4 applications today will develop the clearest view of what Wave 5 needs to solve, and that view has durable strategic value.

The critical observation across all five waves: each makes something categorically new possible at the application layer, not merely a cheaper version of what existed before. Wave 1 made AI-powered features viable for any company, regardless of ML team size. Wave 3 makes autonomous agent systems viable for any company with a use case, without custom orchestration engineering. Wave 4 will make complex multi-agent organizations viable without reinventing coordination. The surface area for value creation expands with each wave.

A Chapter Break in February

Something significant happened in the agent infrastructure market: Amazon Web Services and OpenAI announced a jointly developed Stateful Runtime Environment running natively in Amazon Bedrock — persistent working context, memory, tool and workflow state, and identity boundaries that survive across agent sessions.¹

The technical content is notable. Amazon Bedrock AgentCore provides session isolation via Firecracker microVMs (the same lightweight virtualization underlying AWS Lambda), abstracted short- and long-term memory, native OAuth2 identity integration, Model Context Protocol-based tool coordination, and sandboxed code execution — supporting agent tasks up to eight hours without custom cluster management.¹

The strategic content is more significant. AWS has framed this explicitly through their classic "undifferentiated heavy lifting" thesis — the same argument they made for compute in 2006, for managed databases in 2009, and for container orchestration in 2017. The thesis: handle the commodity infrastructure burden so customers can focus on building their actual business. When AWS applies this to agent state management, it signals that the hyperscalers have reached a collective conclusion — stateless language models are insufficient as a foundation for autonomous AI applications, and stateful orchestration is the next platform layer.

LangGraph Cloud, with roughly 400 companies in production and 90 million monthly downloads, and Letta (the commercialized successor to MemGPT) represent the open-source and commercial parallel to the hyperscaler buildout.² The infrastructure race for stateful agent orchestration is not emerging — it has already been decided. The race that matters is the one above it.

Why Stateless to Stateful Is a Categorical Upgrade

The shift from stateless to stateful AI deserves precise treatment, because the intuition that it's merely incremental is wrong. The distinction is categorical in a formal sense.

Alan Turing's foundational work established that the key distinction between finite automata and Turing machines is the tape — external, unbounded memory. A finite automaton can recognize patterns within a fixed window but cannot, for example, verify that an arbitrarily long sequence of parentheses is properly balanced, because doing so requires remembering how deep the nesting goes. A Turing machine, with access to external memory, can solve this and any computable problem. External persistent state changes the class of problems a system can solve, not just the efficiency with which it solves a fixed class.
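The parentheses example is small enough to make concrete. A sketch in Python, where a single unbounded counter plays the role of the tape:

```python
def balanced(s: str) -> bool:
    """Check parenthesis balance with one unbounded counter.

    A finite automaton with k states can only distinguish nesting depths up
    to k; the counter below is the external memory that removes the bound.
    """
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:   # closed more than opened
                return False
    return depth == 0

print(balanced("(()())"))                       # True
print(balanced("(" * 10_000 + ")" * 10_000))    # True: depth exceeds any fixed window
print(balanced("(()"))                          # False
```

No fixed window of context suffices for the second case; only unbounded external state does. That is the categorical distinction in miniature.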

A stateless language model is an extraordinarily sophisticated finite system. It can answer questions, synthesize information, generate code — all within a single context window. But it has no memory between calls, no persistent identity, no ability to learn from its own actions over time.

A stateful system — one that persists memory, tracks tool state, maintains identity, and builds institutional knowledge across sessions — can pursue goals across time, accumulate expertise, correct its own errors, and improve its own processes. The distinction is not "better" in the sense of a faster engine. It is "better" in the sense of a computer versus a calculator.

Precision matters here, because it shapes application design. Statefulness does not make the model smarter — the weights do not change. It makes the system capable of compound behavior across time: early actions inform later ones, errors surface through feedback loops, successful patterns get encoded and reused. This is the same reason a junior employee with a good memory and a notebook outperforms a brilliant consultant who takes no notes and never returns.

The Voyager system, developed at NVIDIA and Caltech, provides the clearest empirical evidence.⁷ Operating in a Minecraft environment, Voyager achieved a 15.3x speedup through persistent skill accumulation — learned skills stored as executable code in an external library, retrieved and composed to solve new problems as the library grew. The model weights never changed. The system improved because the external state accumulated. What makes this result transferable beyond Minecraft is the structure: the feedback loop between in-context computation and persistent external memory is not domain-specific. It is a general mechanism, and it is now accessible as managed infrastructure.
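The mechanism can be sketched independently of Minecraft. A minimal skill library, with keyword overlap standing in for Voyager's embedding-based retrieval; the class and skill names are illustrative:

```python
# Voyager-style loop in miniature: model weights never change; capability
# compounds because solved tasks are stored as executable skills and
# retrieved for reuse on new tasks.
class SkillLibrary:
    def __init__(self) -> None:
        self.skills: dict[str, str] = {}   # description -> executable code

    def add(self, description: str, code: str) -> None:
        self.skills[description] = code

    def retrieve(self, task: str) -> list[str]:
        # Voyager retrieves by embedding similarity; keyword overlap
        # stands in here to keep the sketch self-contained.
        words = set(task.lower().split())
        return [code for desc, code in self.skills.items()
                if words & set(desc.lower().split())]

library = SkillLibrary()
library.add("craft wooden pickaxe", "def craft_pickaxe(bot): ...")
library.add("mine iron ore", "def mine_iron(bot): ...")

# A new task composes previously learned skills instead of starting from zero.
relevant = library.retrieve("mine iron with pickaxe")
print(len(relevant))  # 2
```

Each solved task makes the next one cheaper: that is the compounding the speedup measures, and the library, not the model, is where it lives.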

MemGPT (now Letta, out of UC Berkeley) implements a virtual memory hierarchy for language models — core memory in-context, archival memory in an external database retrieved by search, conversation history in a rolling buffer.² The formal analogy is precise: MemGPT is to a language model as a virtual memory system is to RAM. A finite resource appears effectively infinite to the application. The bottleneck was never raw intelligence; it was the absence of a disciplined architecture for managing what the intelligence had already learned.
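A toy version of that hierarchy, with a simplified page-out policy and keyword search standing in for Letta's actual implementation (all names here are illustrative, not the Letta API):

```python
from collections import deque

class VirtualMemory:
    """Core memory stays in context; a rolling buffer holds recent turns;
    evicted turns page out to an unbounded, searchable archive."""

    def __init__(self, buffer_size: int = 4) -> None:
        self.core: dict[str, str] = {}            # always in context
        self.buffer = deque(maxlen=buffer_size)   # recent turns; oldest evicted
        self.archive: list[str] = []              # external store, unbounded

    def add_turn(self, turn: str) -> None:
        if len(self.buffer) == self.buffer.maxlen:
            self.archive.append(self.buffer[0])   # page out before eviction
        self.buffer.append(turn)

    def search_archive(self, query: str) -> list[str]:
        q = set(query.lower().split())
        return [t for t in self.archive if q & set(t.lower().split())]

    def context(self, query: str = "") -> str:
        """Assemble the 'RAM' visible to the model for the next call."""
        recalled = self.search_archive(query) if query else []
        return "\n".join([*self.core.values(), *recalled, *self.buffer])

mem = VirtualMemory(buffer_size=2)
mem.core["user"] = "User prefers concise answers."
for turn in ["discussed budget", "discussed hiring", "discussed roadmap"]:
    mem.add_turn(turn)
print(mem.search_archive("what was the budget"))  # ['discussed budget']
```

The context window stays fixed while the archive grows without bound, which is exactly the virtual-memory trick: a finite resource made to appear effectively infinite.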

What Bedrock AgentCore now provides as a managed service — session isolation, memory abstraction, identity, tool coordination, sandboxed execution — was previously custom engineering. The barrier to building systems with these capabilities has just dropped by an order of magnitude. The Voyager result, which originally required a research team at NVIDIA and Caltech to produce, now requires a weekend.

The Undifferentiated Heavy Lifting Thesis, Applied to Agents

AWS's 2006 pitch was elegant in its simplicity: handle the heavy lifting of running data centers so customers can focus on their actual business. The "undifferentiated" qualifier mattered — it referred to work that every company needed to do but that gave no company a competitive advantage. Building and operating a data center was expensive and critical, but doing it slightly better than competitors produced no differentiation in the product customers actually valued.

The same logic applies, precisely, to agent state management. Every company building autonomous agent systems needs session isolation, persistent memory, tool coordination, and identity management. None of them gains competitive advantage from building these components better than competitors. Differentiation lies in what they do with the agents — the domain knowledge, the workflow design, the quality of decisions made at critical junctures.

There is a subtlety the simple "undifferentiated heavy lifting" framing glosses over. When infrastructure becomes a managed service, it becomes a dependency as well as a capability. Companies that built their differentiation on top of custom agent infrastructure now face a choice: migrate to the managed service, ceding control in exchange for cost reduction and reliability, or maintain custom infrastructure as a hedge against platform risk. This is the same tension that emerged when companies decided whether to move workloads from their own data centers to AWS — and the resolution will be the same. Companies with genuine reasons to control their infrastructure (specific latency requirements, regulatory constraints, data residency needs) will maintain it. Companies that confused "custom infrastructure" with "competitive advantage" will migrate and be better off.

The competitive axis is already shifting. The race is moving from model capability to control plane — but even this framing will prove transient as the control plane commoditizes. The race that persists is the one for the best applications built on the control plane, not the control plane itself.

What this means for strategy is a reframe. The old question: "how do we protect our infrastructure investment from commoditization?" The new question: "what does the newly commoditized infrastructure make possible that we should build right now?" These questions produce entirely different answers and entirely different resource allocations.

The Constraint That Doesn't Commoditize

There is one limitation that infrastructure waves cannot solve, and it sets a hard boundary on application design.

Current AI systems — even sophisticated compound systems with persistent state, tool use, and multi-agent coordination — score roughly 37.6% on ARC-AGI-2, a benchmark designed to require genuinely novel reasoning that cannot be solved by pattern matching from training data.⁹ The tasks require fluid intelligence: recognizing abstract patterns from minimal examples and applying them in unfamiliar contexts. Pure language models score 0%. Even with substantial compute, performance plateaus well below human levels.

This is not primarily an infrastructure problem, and it is not clearly a scaling problem either. It reflects a deeper distinction between two types of cognitive work: pattern application within a known distribution (where current systems are remarkably capable) and novel abstraction that restructures existing conceptual frameworks (where they reliably fail). Stateful orchestration, more compute, and better tool use all improve the former. None of them obviously addresses the latter, which fails not because of memory limitations or context length but because the underlying computation is never performed.

The theoretical framing is useful here. Language models process information within a fixed computational depth per token. Deep novel reasoning — the kind required by ARC-AGI-2 — requires iterative refinement of intermediate representations that current inference-time architecture does not natively support. Chain-of-thought prompting is a partial workaround, not a solution. Extended thinking approaches show some improvement, but the gains are modest relative to the gap.

The practical implication for application design is precise: target domains where the task is pattern application within distribution, even sophisticated and multi-step pattern application. Identify and respect the boundary where genuinely novel abstraction is required. This boundary is not a vague caveat — it can be operationalized. ARC-AGI-2 performance is a leading indicator. When frontier systems cross 90% at reasonable inference cost, the boundary has moved materially. Until then, the constraint is real, measurable, and should be reflected in application scope.

The more consequential insight is that this constraint is domain-shaped. In some domains — legal research within established precedent, financial analysis within known market regimes, code generation within documented APIs — the vast majority of productive work is pattern application within distribution, and the novel-abstraction tail can be routed to human oversight. In others — scientific hypothesis generation, strategic planning under genuine uncertainty, novel medical diagnosis — the proportion is reversed. Infrastructure commoditization dramatically expands the viable surface area in the former category. It does not yet touch the latter.
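One way to operationalize the routing this implies: gate each task on a familiarity score and escalate the low-scoring tail to a human. The scorer, threshold, and task names below are illustrative assumptions:

```python
from typing import Callable

def route_task(task: str,
               familiarity: Callable[[str], float],
               agent: Callable[[str], str],
               threshold: float = 0.8) -> tuple[str, str]:
    """Return (handler, result); escalate low-familiarity tasks to a human."""
    score = familiarity(task)
    if score >= threshold:
        return "agent", agent(task)
    return "human", f"escalated (familiarity={score:.2f})"

# Stand-in scorer: in practice this might be retrieval similarity against
# a corpus of previously solved cases in the domain.
def toy_familiarity(task: str) -> float:
    known = {"summarize contract": 0.95, "novel diagnosis": 0.3}
    return known.get(task, 0.5)

print(route_task("summarize contract", toy_familiarity, lambda t: "done"))
# ('agent', 'done')
print(route_task("novel diagnosis", toy_familiarity, lambda t: "done")[0])
# human
```

The threshold is where the domain shape shows up: in precedent-bound legal research it can sit low and still rarely escalate; in novel diagnosis most tasks fall below it.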

Operational Data: The Asset With a Time Dimension

Every infrastructure layer that commoditizes makes one thing more valuable by comparison: the operational data generated by running applications on that infrastructure.

Model capability is being commoditized by the Densing Law. Inference infrastructure is being commoditized by managed APIs and quantization. Orchestration infrastructure is being commoditized by Bedrock AgentCore, LangGraph Cloud, and Letta.¹ ² None of these assets has a persistent time dimension — a competitor can access them today for roughly the same price as anyone else.

Operational data — agent performance histories, failure patterns, research outputs, workflow traces, decision quality metrics, domain-specific accumulated knowledge — can only be generated by running the system over time. It is the natural byproduct of building and operating actual applications. It cannot be purchased, downloaded, or replicated by a competitor who starts later. Every week of operation generates data a new entrant cannot have.

This claim deserves scrutiny, because it is the foundation of the timing argument. The counterargument: operational data only compounds into durable advantage if it is (a) proprietary, (b) systematically captured, and (c) actually used to improve the system. Many companies generate operational data and use none of it. The data exists in logs, not in a feedback loop.

The differentiating factor is architecture, not operation. A system designed from the start to capture agent traces, annotate failures, and feed this data back into workflow refinement will compound. A system that happens to generate logs while its operators focus on other things will not. The companies that own the most valuable operational data sets in a few years will be the ones that treated data capture as a primary design requirement from the beginning, not a retrospective analytics project.
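What "data capture as a primary design requirement" can look like in miniature: every agent step recorded with enough structure to aggregate failure patterns later. The field names and steps below are illustrative:

```python
import time

class TraceLog:
    """Structured trace capture for agent runs: logs that feed a loop,
    not logs that sit in storage."""

    def __init__(self) -> None:
        self.records: list[dict] = []

    def record(self, step: str, inputs: dict, output: str,
               success: bool, note: str = "") -> None:
        self.records.append({
            "ts": time.time(), "step": step, "inputs": inputs,
            "output": output, "success": success, "note": note,
        })

    def failure_patterns(self) -> dict[str, int]:
        """Count failures per step: the raw material for workflow refinement."""
        counts: dict[str, int] = {}
        for r in self.records:
            if not r["success"]:
                counts[r["step"]] = counts.get(r["step"], 0) + 1
        return counts

log = TraceLog()
log.record("retrieve", {"q": "prior filings"}, "3 docs", success=True)
log.record("draft", {"docs": 3}, "", success=False, note="hallucinated cite")
log.record("draft", {"docs": 2}, "", success=False, note="missing section")
print(log.failure_patterns())  # {'draft': 2}
```

The difference between this and incidental logging is the annotation and the aggregation: the `note` and `failure_patterns` fields exist so that a human or an agent can act on the data, which is what turns operation into compounding advantage.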

This is the only competitive asset in AI that accretes over time rather than commoditizing. Architecture can be replicated; the data it generates, and the systematic feedback loops that make it useful, cannot.

The timing implication follows. The combination arriving now — frontier intelligence at commodity prices, plus managed agent infrastructure reaching general availability — creates the preconditions for autonomous AI applications that could not have been built before. The window is defined by first-mover advantage in operational data accumulation, not first-mover advantage in infrastructure. Infrastructure will be equally available to everyone. The applications running on it, and the operational data they generate and exploit, will not be.

The Build Window

The infrastructure waves converge on a specific moment. Frontier model capability will be available at commodity prices within the next year, following the Densing Law's 3.3-month capability density doubling. Managed agent infrastructure — stateful orchestration, session management, memory, tool coordination, identity — is reaching general availability from hyperscalers and open-source alternatives simultaneously.¹ ²

This combination is the precondition for building autonomous AI applications that can pursue goals across time, accumulate domain knowledge, coordinate multiple specialized agents, and improve through feedback — without requiring a large custom infrastructure engineering team.

None of this was possible two years ago, because the infrastructure cost was prohibitive. In two more years, it will be widely available and the first-mover advantage will have compounded into the operational data of whoever started building now.

The window has three components that close at different rates. The infrastructure availability window — during which early builders gain experience with managed services before competitors do — is already closing; the services are public. The complexity window — during which genuine domain expertise in agent orchestration is scarce enough to constitute an advantage — will persist for another one to two years. The operational data window — during which running production systems generates compounding, proprietary knowledge that late entrants cannot replicate — closes on the timeline of the applications being built, not the infrastructure underneath them.

The race is not defensive. It is not about protecting a moat from the commoditization wave — the wave will arrive regardless. It is about building the valuable things that the new infrastructure makes possible, and accumulating the operational knowledge that comes from running them, before others recognize that the plumbing has changed.

That is precisely what AWS made possible in 2006, what Stripe made possible in 2010, and what app distribution made possible in 2008. The companies that won weren't the ones worried about the infrastructure commoditizing. They were the ones who recognized that the starting gun had fired and started building.

The starting gun for autonomous agent applications has fired. The infrastructure that was a barrier six months ago is managed overhead today. The question left standing is not whether to build, but whether the applications being built are the ones only possible now — or cheaper versions of what was already possible before.

Sources

1. Amazon Web Services, "Amazon Bedrock AgentCore," AWS Documentation, 2025.
2. LangChain, "LangGraph Cloud," LangChain Documentation, 2025.
3. Data.ai (formerly App Annie), "State of Mobile 2023," Data.ai, 2023.
4. Wang, S., et al., "The Densing Law of LLMs," *Nature Machine Intelligence*, 2025.
5. Artificial Analysis, "AI Model & API Benchmarks," Artificial Analysis, 2024.
6. Ma, S., et al., "The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits," Microsoft Research, 2024.
7. Wang, G., et al., "Voyager: An Open-Ended Embodied Agent with Large Language Models," NVIDIA and Caltech, 2023.
8. Anthropic, "Model Context Protocol," Anthropic Documentation, 2024.
9. ARC Prize Foundation, "ARC-AGI-2 Benchmark," ARC Prize, 2025.