Summary

Post-generation review was designed for human-written code, where generation and review operated at comparable speeds. AI tools have decoupled those rates. The result is a governance model that is structurally unable to scale with generation volume. The fix is not better review — it is moving architectural constraint enforcement earlier in the workflow, before code reaches human review at all.

There is a pattern in how software engineering disciplines mature. They begin as things humans do manually at the end of a workflow, and they migrate, over time, towards the beginning. Testing began as QA at the end of the release cycle. It migrated to unit tests written by developers, then to test-driven development, then to pre-commit hooks that refuse to pass untested code. Security scanning began as periodic penetration tests. It migrated to static analysis integrated into the IDE, firing before code leaves the developer’s machine. Linting began as style reviews in pull requests. It migrated to formatters that run on save.

The pattern is not coincidence. It reflects a consistent engineering principle: the cost of catching a violation grows with every stage it survives. A bug caught in unit tests costs minutes. The same bug caught in production costs days, sometimes weeks, and carries the additional cost of everything built on the assumption that it was not there.

AI governance has not yet gone through this migration. It is still operating at the post-generation stage — catching violations during pull request review, after code has been written, after it has been committed, after it has potentially influenced decisions built on top of it. That is the wrong place. And the pressure to move it earlier is now acute in a way it was not before.

Why Post-Generation Review Cannot Scale

The core argument for shifting AI governance left is straightforward. Post-generation review was calibrated for a world where the speed of code generation was bounded by human capacity. When a developer wrote a thousand lines in a day, a reviewer could reasonably be expected to examine those thousand lines at depth. The ratio of generation to review was roughly manageable.

AI-assisted development has broken the assumption that generation and review operate at comparable speeds.

A developer using AI tools generates code substantially faster than a developer writing manually. The exact multiplier varies by task type, by tool, by developer experience with the tooling — but the direction is not in dispute. Generation volume is up. Review capacity is flat. That asymmetry creates queue pressure even when teams are not yet feeling it acutely, because the queue is cumulative and the pressure compounds.

There is a second problem specific to AI-generated code that makes post-generation review more expensive, not less. When a human developer writes code, the reviewer can, in principle, ask the author what they intended. The code has a human provenance. The reasoning behind specific choices can be surfaced through conversation. When an AI tool generates code, that option disappears. The reviewer must reconstruct intent from output alone. They must determine not just whether the code does what the ticket specifies, but whether the model understood the architectural constraints governing that area of the codebase — constraints the model was never given and cannot be interrogated about.

That reconstruction is a cognitively expensive task. Performing it at scale, across a growing volume of AI-generated PRs, is not a problem that more diligent review solves. It is a structural mismatch between the volume of work and the throughput available to handle it.

Where Governance Needs to Live Instead

If post-generation review is too late, the question is: how early can governance checks move?

The answer depends on what you can make machine-readable. Testing moved left because tests could be automated — the check did not require human judgement to execute. Security scanning moved left because vulnerability patterns could be expressed in rules that static analysis could apply mechanically. Linting moved left because code style could be formalised as a rule set that a formatter could apply without human involvement.

Architectural governance is harder because architectural decisions are harder to formalise. A decision about which ORM adapter to use, or which authentication pattern to follow, or which module structure to maintain, does not have an obvious automated check. It requires understanding intent, not just syntax. The obvious response is that it therefore cannot be automated, and must remain in the human review stage.

That conclusion is partially right and mostly wrong.

It is partially right because some architectural decisions will always require human judgement to evaluate. Novel design choices, context-dependent trade-offs, decisions that depend on understanding the broader product roadmap — these are genuinely human review tasks and should remain so.

It is mostly wrong because a large class of architectural decisions can be expressed precisely enough to be enforced mechanically. “No new database adapters without an ADR” can be expressed as a dependency constraint and checked automatically. “All API endpoints must go through the versioned router” can be expressed as a structural rule and validated against the file tree. “JWT middleware must not be modified without senior review” can be expressed as a file ownership constraint. None of these require a human to evaluate. They require a rule and a check.
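Rules of this kind can be sketched as plain predicates over a change set. The following is a minimal illustration, not a real enforcement tool; all paths, rule logic, and the allowed-adapter list are hypothetical:

```python
# Minimal sketch: architectural constraints as mechanical checks over
# a list of changed file paths. All paths and rules are hypothetical.

def no_new_db_adapters(changed_files, allowed_adapters):
    """Dependency constraint: new adapter files require an ADR entry."""
    return [
        f"{p}: new database adapter without an ADR"
        for p in changed_files
        if p.startswith("src/adapters/") and p not in allowed_adapters
    ]

def endpoints_use_versioned_router(changed_files):
    """Structural rule: API endpoints must live under a versioned path."""
    return [
        f"{p}: endpoint outside the versioned router"
        for p in changed_files
        if p.startswith("src/api/") and "/v" not in p
    ]

def protected_files_untouched(changed_files, protected):
    """File ownership constraint: protected files need senior review."""
    return [f"{p}: protected file modified" for p in changed_files if p in protected]

changed = ["src/adapters/duckdb.py", "src/api/v2/users.py", "src/auth/jwt_middleware.py"]
report = (
    no_new_db_adapters(changed, allowed_adapters={"src/adapters/postgres.py"})
    + endpoints_use_versioned_router(changed)
    + protected_files_untouched(changed, protected={"src/auth/jwt_middleware.py"})
)
for violation in report:
    print(violation)
```

Each check is a rule plus a mechanical test over paths; none of them asks a human to exercise judgement.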

The current state of most codebases is that these decisions exist as prose documentation — written down somewhere, consulted rarely, not integrated into any automated tooling. Moving governance left means converting those prose decisions into machine-readable constraints and enforcing them automatically at three points in the workflow: in AI context before generation, in pre-commit checks before code is committed, and in CI before code is merged.

Shift-Left Governance in Practice

What does this actually look like as a workflow change?

The first step is making decisions machine-readable. This is the hardest step, and also the most durable investment. An architectural decision expressed as a structured record — with a scope, a severity level, and a machine-interpretable constraint — can be enforced automatically, injected into AI context, and validated against generated output. The same decision as a Confluence page can do none of those things.

The second step is injecting decisions into AI context before generation begins. When the model has access to the constraints that govern the area of the codebase it is working on, drift becomes less likely. Not impossible — models do not always follow instructions perfectly — but structurally less likely, because the model is generating code with awareness of the decisions it should be respecting. This is the generation-time layer.

The third step is enforcing before review. A CLI check run before opening a pull request — or, better, as a pre-commit hook that runs before code is committed — surfaces violations when they are cheapest to fix: immediately after generation, before the code has been seen by anyone else, before it has influenced anything downstream. The human reviewing the PR then focuses on what the automated check cannot cover: the genuinely novel decisions, the trade-offs that require context, the edge cases that no rule anticipates.

Human review is most valuable when it is focused on decisions that actually require human judgement.

That value is diluted when reviewers are spending bandwidth catching drift that a machine could have flagged. Shifting governance left does not reduce the role of human review. It concentrates it on the work that makes it irreplaceable.

The Sequencing Problem

The objection most teams raise when they first encounter this argument is that their decisions are not currently in a form that can be made machine-readable. They live in Confluence, in Slack, in the institutional memory of senior engineers. The implication is that shift-left governance requires a large up-front investment in converting existing decisions before the approach can deliver value.

That is a real cost, but it is not a reason to defer. The alternative — continuing to rely on post-generation review as generation volume grows — has a cost that compounds with every sprint. The decisions accumulated in human memory depreciate as people leave. The Slack threads become harder to find. The Confluence pages get out of date. The gap between where decisions live and where code is generated widens over time, not narrows.

The practical approach is incremental. Start with the decisions that matter most: the constraints that, if violated, create the most expensive remediation. The dependency choices, the security-adjacent patterns, the structural rules that touch the most code. Record those as machine-readable constraints first. Build the enforcement habit before attempting to capture everything.

The shift does not have to be complete before it delivers value. Even partial constraint enforcement at generation time reduces the review overhead on the decisions it covers, freeing human review capacity for everything else.

The Governance Model AI-Assisted Development Needs

The software industry is in an early period of adapting its governance models to AI tools. Most of the adaptation so far has been in the generation layer: better prompts, better context management, better model selection. The review layer has received less systematic attention. The enforcement layer, where decisions are actually applied to generated output, has received almost none.

Shift-left governance is the missing layer. It is not a complete answer to AI governance — there are aspects of architectural quality for which no automated check can substitute for human judgement. But it is the layer that enables human review to function at scale in an environment where generation volume is growing indefinitely. Without it, the governance model breaks under its own weight. With it, human review can focus where it genuinely matters.

The question is not whether to shift governance left. It is how quickly the toolchain catches up to the structural reality that generation and review no longer operate at comparable speeds.

Key Takeaways

- AI tools have decoupled generation speed from review speed; post-generation review cannot close that gap by working harder.
- A large class of architectural decisions can be expressed as machine-readable constraints and checked mechanically, even though some decisions will always require human judgement.
- Enforcement belongs at three points: in AI context before generation, in pre-commit hooks before commit, and in CI before merge.
- Adoption can be incremental — start with the constraints whose violation is most expensive to remediate; partial enforcement delivers value immediately.
- Shifting governance left does not reduce human review; it concentrates it on the decisions that genuinely require human judgement.

FAQ

Why can’t post-generation review simply be done more carefully?
Review throughput is bounded by human cognitive capacity. Generation volume in AI-assisted teams is not similarly bounded. Asking reviewers to review more carefully does not increase throughput; it increases the per-review cost, which worsens the queue pressure rather than relieving it.
What architectural decisions can actually be automated?
Dependency constraints, file structure rules, pattern requirements (e.g., all new services must extend a base class), import restrictions, security-relevant file protections. The class of decisions that can be partially automated is larger than most teams assume; the goal is not to automate all decisions, but to automate the ones where the rule can be stated clearly.
How does shift-left governance interact with existing CI pipelines?
It adds a new enforcement layer alongside existing checks (linting, testing, security scanning). Constraint checks run as pre-commit hooks or CI steps, fail builds on severity-error violations, and generate reports on warnings. No existing tooling needs to be replaced.
Is this just ADRs by another name?
ADRs are the source format — a well-established practice for recording architectural decisions as structured documents. Shift-left governance adds the enforcement layer: converting ADR prose into machine-readable constraints and running them automatically. ADRs tell you what was decided. Enforcement checks whether the decision is still being respected.