Skip to main content
🎓 Claude Code Masterclass Learn AI-assisted development on Udemy — plus the companion book on Leanpub & Amazon. Start Learning
Spencer Kimball's keynote slide 'What CockroachDB Becomes' at RoachFest London 2026
database

Inside Cockroach Labs' AI Playbook: The Database Reckoning

From Cockroach Labs' own floor at RoachFest London 2026: 1,000 AI-built internal apps, an AI support engine, and why agents have no natural ceiling.

LB
Luca Berton
· 8 min read

Between the main-stage keynote and a series of floor conversations at RoachFest London 2026, I got a far more detailed picture of how Cockroach Labs itself is thinking about the agentic AI wave than any single talk could cover. Pieced together, it reads less like a vendor pitch and more like a company narrating its own reckoning with what is coming.

1,000 AI-Built Apps in Two Months

The starting data point: Cockroach Labs built an internal tool called Micah in a single week back in February. It gives every employee a ready-made environment to build apps, with access to whichever data sources their credentials already permit, out of the box. Two months after launch, 1,000 internal applications had already been deployed — every one backed by a CockroachDB serverless database.

Their working rule of thumb: one AI hour is roughly equivalent to one human week of output, and they consider that a conservative estimate. Scale that ratio across every company running the same playbook, and the proliferation curve gets steep fast — which is exactly the premise behind the next part of the story.

Trillion Agents, Zero Ceiling

One line stopped me cold on the floor: “There’s no limit to this. There’ll be 10 billion agents. There’ll be 100 billion agents. And there’ll be a trillion agents.”

The argument behind it: mobile saturated at roughly 8 billion humans — a hard ceiling set by the size of the population. Agents have no equivalent ceiling. Cockroach Labs is already seeing 10x traffic growth from AI agents, and each successive 10x is arriving faster than the last, far faster than the three-to-five years the first jump took. The practical consequence for infrastructure: monoliths hit a wall, and most enterprises are still not on distributed infrastructure. What is coming is described as a “swarm proliferation” of databases following a power-law distribution across every enterprise, as AI-generated apps spin up in minutes and each one needs its own data layer.

This is the same rising-tide pattern I heard described elsewhere on the floor: training GPUs came first, inference flipped the balance, agents went from a speculative proposition to near-ubiquitous in coding workflows in under a year, and data infrastructure is next in line. Agents call tools, tools call APIs, and data infrastructure has to handle a new class of machine-driven traffic at a scale nobody has architected for yet. Cloud providers are already publishing breakdowns of human-driven versus bot/agent API traffic — this is measurable now, not a forecast.

Mission-Critical, Zero RPO

None of this matters if the database underneath cannot survive it. Cockroach Labs’ pitch to mission-critical customers stays constant regardless of the AI framing: lose an entire region and still hit zero recovery point objective, hold cross-country global transactions with strong consistency, and keep elastic scalability firm at peak load. In the speaker’s words, these are “very, very difficult problems to solve in a context of distributed relational data systems.” The customer logos on screen — Booking.com, DoorDash, Roblox, OpenAI, SpaceX, Cisco — were running mission-critical workloads on this foundation long before the market understood it needed this level of resilience.

A separate, more technical floor conversation walked through what that resilience actually costs to operate: topology and node configuration, data locality and its latency implications, what Raft consensus really means for an ops team’s day-to-day, and where latency shifts during a large-scale outage. The reassurance offered was genuine rather than glib — once a team has climbed that learning curve, those concerns fade into the background and focus shifts back to shipping.

The Database Cost Iceberg

Most companies see only the tip of the database cost iceberg, and that is exactly why modernization stalls before it starts. An estimated 80% of database spend sits hidden below the surface — idle compute, storage, and network capacity quietly draining budget, plus human labor costs that are among the most significant line items in the total cost equation. Migration costs are high enough, and success rates uncertain enough, that many organizations simply do not move, sitting on mountains of legacy technical debt instead.

The reframing on offer: a single database going from supporting hundreds of use cases to supporting thousands is a genuinely disruptive multiplier, but every inefficiency compounds if the underlying dynamics are not addressed first — infrastructure cannot keep scaling linearly the way it did in the 1990s.

Storage Virtualization: The Next Chapter

The most forward-looking technical thread covered CockroachDB’s next storage architecture. A new, purpose-built storage layer is coming — echoing how Pebble replaced RocksDB — cutting write amplification, delivering roughly 10x throughput gains, reducing latency, and lowering storage costs by multiples. Compute-storage decoupling lets clusters auto-scale without moving data between nodes, and virtual clusters — already powering the serverless product since around 2020 — are being extended to enterprise and BYOC customers on dedicated infrastructure.

The uncomfortable number behind the pitch: most operational databases run at just 5-15% utilization. Virtualization can push that above 50%, which is a direct cost reduction rather than a marginal efficiency gain. An emerging agent layer is planned to sit on top of this stack, covering migration tooling and fleet management — the same agent-native database architecture Spencer Kimball outlined on the main stage.

AI Support: 90%+ Case Resolution From the First Email

The most striking individual statistic came from a conversation about turning years of CockroachDB support history — runbooks, filed tickets, engineer back-and-forths, symptoms, remediations — into a real-time AI knowledge engine. The idea: an AI agent queries that institutional knowledge the moment a customer’s first support email lands, with no human engineer in the loop yet.

The results: 40% of cases get the exact correct answer from the initial email alone, with cited evidence and a full remediation path. A second AI evaluates the first AI’s output for quality control. More than 90% of the time, the result is at least a mostly correct, genuinely useful answer. The broader point volunteered alongside the number: this is not a database-specific story. AI in the support loop is coming for every industry that has years of distilled institutional knowledge sitting in ticket history, because it moves faster and holds an entire organization’s accumulated knowledge at once.

Fleet Management and Multiplying Team Leverage

The operational thread tying all of this together was a shift in how Cockroach Labs frames database ops: stop treating individual workloads as isolated problems and start managing the full estate as a fleet. Setting policy, economics, and capacity centrally dramatically amortizes costs at scale. AI becomes the ops engine for the grunt work no team would ever staff for, catching the “little threads” before they escalate into outages — while humans stay elevated to own policy and governance rather than executing every routine task themselves.

The people-side framing was equally direct: you cannot hire your way out of a database estate that is growing faster than headcount ever will. The real unlock is multiplying the leverage of the team already in place — including, notably, finally tackling the backlog most teams currently cannot even reach because there was never enough capacity to get to it.

AI Agents as Real-Time Guardrails

The most concrete safety pitch involved AI agents functioning as a real-time layer against operator error. The opening example was sobering: a trading app incident where a costly mistake led to someone being put on administrative leave — entirely preventable, in the speaker’s view. The proposed fix borrows a “two-key” analogy from nuclear launch protocols: destructive database actions should require a second check, the same way turning two keys at once prevents a single person from acting alone. An agent with full cluster context can flag “this is a production cluster” or catch a query touching an index that is no longer in use, essentially for free and in real time — no waiting on a slow human review cycle.

The same agent-as-monitor pattern extends to day-to-day operations: agents stepping into monitoring and advisory roles, catching slow queries and missing indexes before they escalate, and handling migration diagnosis proactively rather than after the fact. The stated payoff is freeing engineering teams to focus on the strategic decisions the business actually needs from them.

The Throughline

Every one of these conversations, taken separately, sounds like a different pitch: internal tooling, resilience, cost, storage architecture, support automation, ops, safety. Taken together, they describe one company using its own infrastructure as the test bed for a thesis it is also selling: that the AI wave does not arrive as a single product feature, it arrives as pressure on every layer of the database estate simultaneously, and the only architecture built to absorb that pressure is one that was already distributed before agents showed up asking for it.

About the Author

I am Luca Berton, AI and Cloud Advisor. I work at the intersection of distributed systems, platform engineering, and enterprise AI deployments. Book a consultation.

Free 30-min AI & Cloud consultation

Book Now