The Clearinghouse for Code
Part of the AI Systems topic guide.
Why code review is only the first bottleneck in an agent-scale software delivery stack.
Code review is the bottleneck today because we are still forcing an agent-scale world through a human-scale workflow. That's temporary. Review is just the first place the old software delivery stack starts to crack.
The next bottleneck in software is not writing code. It's clearing it.
For the last few years, AI has been eating software development one layer at a time. First autocomplete. Then code generation. Then agents opening pull requests directly: bug fixes, dependency bumps, entire features produced overnight without a human in the loop. Then AI code review. All of this is moving fast, but what hasn't caught up is everything that happens after the code is written.
Code still has to be tested, scheduled against finite CI capacity, merged without conflicting with other work, deployed into real environments, validated, monitored, and rolled back if it misbehaves. Those steps were tolerable when software changes arrived at human speed. They start to break when changes arrive at agent speed. This bottleneck is moving downstream and it isn't stopping at review.
Even AI-powered code review still operates on the same abstraction: a diff that someone has to read and approve. At agent scale, the problem isn't reading diffs faster. It's that the diff is the wrong unit of work entirely. A diff says "these lines changed". It doesn't say what outcome was intended or what evidence proves it's safe. That gap is manageable when a human author is sitting right there to explain. It's not manageable when a thousand agents are submitting overnight.
And even if you solved that, the system around it (CI capacity, merge ordering, environment contention, deploy sequencing) was never built for this volume. Review is the first wall, but it is not the last.
Your repo needs a clearinghouse.
What breaks first
The failure mode is easy to miss because it doesn't look like "bad code." It looks like congestion.
Imagine it's 9am. Your team's agents ran overnight and produced 23 pull requests. An AI reviewer has already triaged most of them. Two were flagged for logic issues. Six dependency bumps were auto-approved. A latency fix in the query layer has inline comments waiting for follow-up.
So far, so good.
Then you look at the actual queue.
Four PRs are blocked on CI. Three are contending for the same ephemeral test environment, which only one can hold at a time. Two others conflict because separate agents touched overlapping files without knowing about each other. One PR has been sitting for 18 hours not because anyone thought it was risky, but because it got buried. A security patch that should have shipped yesterday is queued behind a performance experiment with a week-long test run.
None of these are exotic failures. They are normal software delivery failures amplified by volume.
The review problem is increasingly tractable. The coordination problem is not.
Twenty-three legitimate changes are now competing for the same scarce resources: CI minutes, environments, deploy windows, and human attention on the small number of things that actually need judgment. Nothing in the system is deciding clearly what should clear first, what should block what, or what policy is supposed to govern the queue.
At 1,000x the hard part is no longer reading diffs. It's routing, sequencing, and allocating trust across a system that was never designed for this much contribution.
Once agents can generate code cheaply, code is no longer the scarce thing. Trust is.
The unit of work has to change
In the current model, the unit of work is a diff. It says: "someone changed these lines, please inspect them."
That works when humans are the authors, because the missing context lives outside the diff. Reviewers can ask questions. They can infer intent. They can rely on some combination of conversation, reputation, and intuition.
Agents don't have that kind of ambient legibility. They have outputs.
And if we keep treating those outputs as diffs to be manually reviewed, we'll drown in them.
The unit of work has to change from diff to claim.
A claim has two parts:
- Intent: what outcome is this change supposed to produce?
- Evidence: what proof shows that it did so safely?
"Fix this crash signature" is intent. "Here is a reproducible sandbox run, before-and-after traces, passing tests, no regression in p99 latency, and a rollback plan" is evidence.
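As a concrete sketch, a claim could be represented as a small data structure. This is illustrative only; the field names, risk classes, and the `satisfied` check are assumptions for the example, not a real schema:

```python
from dataclasses import dataclass, field

@dataclass
class Evidence:
    kind: str          # e.g. "sandbox_run", "latency_trace", "test_suite"
    passed: bool
    detail: str = ""

@dataclass
class Claim:
    intent: str                      # the outcome the change is supposed to produce
    diff_ref: str                    # pointer to the underlying change
    risk_class: str                  # e.g. "docs", "dep_bump", "bug_fix", "auth"
    evidence: list[Evidence] = field(default_factory=list)

    def satisfied(self, required_kinds: set[str]) -> bool:
        """True if every required kind of evidence is present and passing."""
        passing = {e.kind for e in self.evidence if e.passed}
        return required_kinds <= passing

# Hypothetical example claim from an agent
claim = Claim(
    intent="Fix a crash signature in the query layer",
    diff_ref="agent/fix-query-crash",
    risk_class="bug_fix",
    evidence=[
        Evidence("sandbox_run", True, "repro no longer crashes"),
        Evidence("test_suite", True),
        Evidence("latency_trace", True, "p99 unchanged"),
    ],
)
print(claim.satisfied({"sandbox_run", "test_suite"}))  # → True
```

The point of the shape is that the evidence travels with the change, so a machine can check it before a human ever sees it.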
That shift matters because it changes what humans are reviewing. Humans stop reviewing code line by line. They review claims backed by evidence.
That is a different model of software delivery. Trust moves away from "someone looked at the diff and felt okay about it" and toward "the change satisfied policy with verifiable proof."
Why a clearinghouse?
Clearinghouses show up when transaction volume outgrows trust.
The old model (each party has to assess every other party directly) works at small scales. At large scales, it becomes unmanageable. These systems stop scaling not because the underlying activity is bad, but because the coordination mechanism around it can't keep up.
The fix is familiar: move trust out of individual relationships and into process. This means risk checks, settlement rules, audit trails, and clear guarantees about what happens when something goes wrong. Software delivery is heading the same way.
Today, review is still largely bilateral and social. An author submits work. A reviewer reads the diff. A team relies on its habits and shared context to decide what is safe enough to merge. That works when contributions arrive in dozens and most authors are people you know.
It breaks when contributions arrive in thousands and many of the "authors" are agents running continuously.
At that point, the system cannot depend on manual inspection and social trust as its primary mechanism. It needs something closer to a clearinghouse: a layer that classifies risk, allocates scarce resources, verifies claims, settles approved changes, and maintains a complete provenance record.
The change should not move forward because a human happened to trust it. It should move forward because it produced the evidence required by policy.
What the clearinghouse for code actually does
An agent does not simply open a pull request and hope someone looks at it. It submits a claim: the intended outcome, the proposed change, the evidence it plans to produce, and the risk class it believes applies.
Imagine not just 23 changes overnight, but 1,000.
The clearinghouse takes 1,000 incoming changes and immediately classifies them: 400 dependency bumps, 300 localized bug fixes, 200 test/docs changes, 80 performance optimizations needing benchmarks, and 20 touching sensitive surfaces like auth or payments. Then it nets: which changes overlap, which are redundant, which conflict, which can be batched. That alone might collapse 1,000 changes into 650 real units. Then those 650 compete explicitly for scarce resources (CI minutes, staging environments, canary slots) based on policy, not arrival order. Security fixes jump the queue. Low-risk bumps run in bulk on cheap shared runners. Experiments wait if they'd crowd out production fixes.
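The policy-over-arrival-order idea can be sketched with a priority queue. The risk classes and their priorities here are invented examples, not a real policy:

```python
import heapq

# Hypothetical policy: lower number clears first
PRIORITY = {"security": 0, "bug_fix": 1, "dep_bump": 2, "perf_experiment": 3}

def clear_order(changes):
    """changes: list of (arrival_index, risk_class).
    Returns the order in which changes clear, by policy, with
    arrival order used only as a tiebreaker within a class."""
    heap = [(PRIORITY[risk], arrival, risk) for arrival, risk in changes]
    heapq.heapify(heap)
    order = []
    while heap:
        _, arrival, risk = heapq.heappop(heap)
        order.append((arrival, risk))
    return order

# The security patch arrived last but clears first;
# the week-long experiment arrived first but clears last.
queue = [(0, "perf_experiment"), (1, "dep_bump"), (2, "security"), (3, "bug_fix")]
print(clear_order(queue))
```

Real scheduling would also account for resource contention (who needs which environment), but the inversion of arrival order is the core move.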
Then verification happens, proportional to risk. A docs fix needs almost nothing. An auth change needs sandboxing, invariants, and human signoff. Of the 650, maybe 500 clear automatically, 90 get kicked back to agents, 40 batch together, and 10 surface to humans because they exceed confidence thresholds.
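Risk-proportional verification amounts to a mapping from risk class to required evidence. The classes, evidence kinds, and routing rules below are illustrative assumptions, not a real ruleset:

```python
# Hypothetical policy table: what each risk class must prove
REQUIRED_EVIDENCE = {
    "docs":     set(),                                   # almost nothing
    "dep_bump": {"test_suite"},
    "bug_fix":  {"test_suite", "sandbox_run"},
    "perf":     {"test_suite", "benchmark_delta"},
    "auth":     {"test_suite", "sandbox_run", "invariants", "human_signoff"},
}

def route(risk_class, produced_evidence):
    """Decide whether a claim clears, is kicked back to its agent,
    or is escalated to a human."""
    required = REQUIRED_EVIDENCE[risk_class]
    missing = required - produced_evidence
    if "human_signoff" in missing:
        return "escalate_to_human"
    if missing:
        return f"kick_back: missing {sorted(missing)}"
    return "clear"

print(route("docs", set()))                  # clears automatically
print(route("bug_fix", {"test_suite"}))      # kicked back: needs a sandbox run
print(route("auth", {"test_suite", "sandbox_run", "invariants"}))  # human signoff
```

The interesting property is that "human review" is just one more kind of required evidence, demanded only where policy says the risk warrants it.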
Only after that comes settlement.
Approved changes are merged in an order the system understands, not in the order they happened to show up. Some settle immediately. Some settle in batches. Some go out through canaries with rollback conditions already attached. Some are held until related changes clear so the system can avoid introducing instability from partial ordering.
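The "held until related changes clear" behavior is essentially a topological sort: a change settles only after everything it depends on has settled. A minimal sketch, with made-up change names:

```python
from collections import deque

def settlement_order(deps):
    """deps: {change: set of changes it must wait for}.
    Returns a settlement order respecting all dependencies (Kahn's algorithm)."""
    waiting = {c: set(d) for c, d in deps.items()}
    ready = deque(c for c, d in waiting.items() if not d)
    order = []
    while ready:
        c = ready.popleft()
        order.append(c)
        for other, blockers in waiting.items():
            if c in blockers:
                blockers.discard(c)
                if not blockers:       # all of other's dependencies settled
                    ready.append(other)
    return order

# Hypothetical dependency graph: the client fix must not ship
# before the API change, which must not ship before the migration.
deps = {
    "schema_migration": set(),
    "api_change": {"schema_migration"},
    "client_fix": {"api_change"},
    "docs_update": set(),
}
print(settlement_order(deps))
```

Batching and canary rollout would layer on top of this ordering, but the ordering itself is what prevents instability from partially landed work.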
And then everything lands in an audit trail: which agent proposed it, which model generated it, what evidence it produced, what policy was applied, what resources it consumed, why it cleared, and what happened after deployment. And when a claim that cleared verification later fails in production, that signal feeds back, tightening classification, raising evidence thresholds, recalibrating which agents and which risk classes earn automated trust.
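A rough sketch of that provenance record and feedback loop, with invented field names; here "tightening" is reduced to bumping a per-risk-class evidence threshold, which is a stand-in for whatever recalibration a real system would do:

```python
audit_log = []

def record(claim_id, agent, risk_class, evidence, decision):
    """Append a provenance entry at settlement time."""
    audit_log.append({"claim": claim_id, "agent": agent,
                      "risk": risk_class, "evidence": evidence,
                      "decision": decision, "outcome": None})

def report_outcome(claim_id, ok, thresholds):
    """Feed the production signal back: a regression from a cleared
    claim tightens the evidence threshold for its whole risk class."""
    for entry in audit_log:
        if entry["claim"] == claim_id:
            entry["outcome"] = "ok" if ok else "regression"
            if not ok:
                thresholds[entry["risk"]] = thresholds.get(entry["risk"], 0) + 1

thresholds = {"perf": 1}
record("c-17", "agent-a", "perf", ["benchmark_delta"], "cleared")
report_outcome("c-17", ok=False, thresholds=thresholds)
print(thresholds)  # perf claims now face a stricter bar
```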
The point is not that review disappears. It's that the clearinghouse changes what the system is optimizing for. The question becomes: how do we turn incoming change into a flow of classified, verified, resource-aware settlement objects, such that only true exceptions ever need a human?
So the winning product here probably isn't "GitHub, but with more AI." The core interface in today's workflow is the pull request: a place where humans exchange trust through comments, approvals, conventions, and social knowledge. The product is built around making those interactions easy.
If the unit of work becomes a claim backed by evidence, and if the primary problem becomes clearing risk rather than reading diffs, then the valuable surface moves. The "future GitHub" starts to look less like a collaboration tool and more like a control plane.
Its job is to accept intent, enforce policy, schedule scarce resources, route exceptions, and settle approved changes. It auto-clears everything that satisfies the evidentiary bar. What it surfaces to humans is not an inbox full of diffs. It is a small, curated set of exceptions: unusual risk profiles, failed verification, policy edge cases, high-blast-radius changes that exceeded automated confidence thresholds.
Human review becomes exception handling.
CI/CD becomes evidence infrastructure
Today, CI is mostly a gate. Did the tests pass? Did the build succeed? That's designed for a world where humans are doing the primary trust evaluation and CI is just providing a bit of extra confidence.
The clearinghouse needs something much more expansive. CI has to become the system that actually produces the evidence claims are verified against, and that evidence has to go well beyond testing.
A claim that says "this speeds up the query layer" shouldn't be verified by a unit test. It should be verified against benchmark deltas across representative workloads, with before-and-after latency traces showing whether p99 actually moved. A claim that says "this simplifies the checkout flow" should face synthetic users attempting to complete a purchase, not just a passing integration suite. A bug fix should come with trace replay showing the failure is gone, not just a green checkmark.
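Verifying a latency claim, for instance, means comparing p99 before and after rather than checking a green build. A minimal sketch; the 5% tolerance and the sample numbers are made up for illustration:

```python
def p99(samples_ms):
    """99th-percentile latency via nearest-rank on the sorted samples."""
    ordered = sorted(samples_ms)
    return ordered[min(len(ordered) - 1, int(0.99 * len(ordered)))]

def verify_latency_claim(before_ms, after_ms, max_regression=1.05):
    """Clear the claim only if p99 did not regress beyond tolerance."""
    return p99(after_ms) <= p99(before_ms) * max_regression

# Hypothetical before/after traces from a sandbox run
before = [10, 12, 11, 14, 90]   # slow tail at 90 ms
after  = [10, 11, 11, 13, 60]   # tail improved to 60 ms
print(verify_latency_claim(before, after))  # → True
```

A real version would use representative workloads and many more samples, but the shape is the same: the evidence is a measured delta, not a passing test.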
The gap between what CI produces today and what a clearinghouse needs to consume is enormous: reproducing bugs from traces, spinning up production-faithful sandboxes, generating property-based checks, comparing real behavioral patterns before and after. That's a whole layer of infrastructure that mostly doesn't exist yet, and someone has to build it.
The new bottleneck
A lot of people are still framing agentic coding as a throughput race: generate more code, open more PRs, automate more of the writing.
That is the wrong place to look. Code generation is becoming abundant, and human attention is not. And in between those two facts sits the real bottleneck: how quickly a system can verify that a change is safe, prioritize it against competing demands, and move it into production with confidence.
As AI-generated code floods the system, the real job shifts from producing changes to clearing them. The next big software platform will be built around that fact. It will look less like GitHub and more like a clearinghouse for code.