The Core Finding
AI tools are remarkably good at making the change you ask for. They are structurally bad at propagating that change to every place it needs to go.
This is not just a performance limitation. It is not mainly a matter of choosing a better model or writing a better prompt. The research points to a repeatable mechanism: when a rule applies in multiple places but the instruction names only one of those places, the AI updates the named place and leaves the others untouched.
The rule is implemented where it is mentioned. It is silently absent everywhere else.
The shortest version is this:
AI binds to the scope you name.
Everything you do not name tends to remain unchanged, even when the same rule should apply there too.
In the underlying experiments, the system under test had two file-handling paths: a deletion path and an overwrite path. A new requirement — preserve manually created files — applied to both. In every code-only run that reached the cross-surface stage, the deletion path was updated and the overwrite path was not. Manual files were protected during deletion and silently destroyed during overwrite.
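The failure shape can be sketched in a few lines. This is an illustrative reconstruction, not the actual experimental code: the function names, the `manual` flag, and the data layout are all hypothetical, chosen only to show one path receiving the rule while the other keeps its old behavior.

```python
# Hypothetical sketch of the two file-handling paths.
# Names (delete_stale, overwrite_stale, the "manual" flag) are illustrative.

def delete_stale(files):
    """Deletion path: the new rule WAS applied here."""
    # Manually created files survive deletion.
    return [f for f in files if f["manual"]]

def overwrite_stale(files, new_content):
    """Overwrite path: the same rule was NOT propagated here."""
    for f in files:
        f["content"] = new_content  # manual files silently clobbered
    return files

files = [
    {"name": "generated.txt", "manual": False, "content": "old"},
    {"name": "notes.txt", "manual": True, "content": "precious"},
]

survivors = delete_stale(files)       # notes.txt is preserved
overwrite_stale(files, "regenerated")  # notes.txt is destroyed anyway
```

Both paths pass any test aimed at the surface the instruction named; only the unnamed surface violates the rule, which is exactly why the failure is quiet.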
The fix was not “prompt better.” The fix was explicit scope. When the full set of affected surfaces was written down in a durable artifact — what the research calls a contract — the failure disappeared. When that artifact did not exist, the failure reproduced with striking consistency.
The Mechanism
To understand why this happens, it helps to distinguish between two kinds of changes.
Same-surface changes affect one thing. “Make the button blue.” “Fix the deletion logic.” These are comparatively reliable under iteration because the scope of the request and the scope of the required change are the same.
Cross-surface changes affect multiple things. “Preserve manual files” applies to deletion, overwrite, and directory reset. “Update the brand color” applies to the product, the marketing site, the email templates, and the pitch deck. “Enforce this compliance rule” applies to every workflow that touches the regulated data.
The research shows a clear stability gradient:
| Change Type | Scope | AI Stability |
| --- | --- | --- |
| Same-surface, behavioral | One place, one behavior | Stable |
| Same-surface, architectural | One place, structural change | Prompt-sensitive |
| Cross-surface invariant | Multiple places, one rule | Fails without enumeration |
The critical point is that this is not primarily a model-quality problem. A prompt-gradient experiment showed that the failure disappears only when the prompt explicitly lists every affected surface — at which point the prompt has become structurally equivalent to a contract.
The immediate mechanism is surface-scope binding: AI changes what you mention, and only what you mention.
Why This Is Not Obvious
This failure is easy to miss for three reasons.
First, the changed surface usually works correctly. Every test of the thing you asked about passes.
Second, the unchanged surfaces still function. They keep doing what they were doing before. Nothing visibly explodes. They are simply now wrong relative to the new rule.
Third, each individual iteration can look reasonable. The drift accumulates across steps rather than announcing itself in a single dramatic failure.
This is what the research describes as authority drift: reasonable-seeming change without explicit consent, compounding across iterations until the system no longer reflects what anyone actually decided it should do.
The Research Behind This
This document summarizes findings from a published research corpus. The claims here are not presented as loose theory. They are grounded in controlled experiments with documented methodology, replication materials, and source artifacts.
The broader finding is not simply about code. Code was the experimental medium because it allowed controlled measurement. The mechanism itself — surface-scope binding — generalizes as a way of thinking about iteration wherever a rule must hold across multiple surfaces. That is why the bridge into design, operations, and governance is legitimate, even though the measured experiments were in software implementation.
Recommended reading path
Start here: Authority, Execution, and Refusal
Ten linked essays diagnosing where control actually lives in systems that execute.
The proof: Contract-Centered Iterative Stability
The experimental paper documenting the stability boundary, the failure mechanism, and the cross-run results.
The method: Contract-Centered Engineering
The practical loop for specification, derivation, evaluation, and refinement.
The mechanism: Breaking the Loop
A shorter treatment focused on why iterative AI edits produce structural drift.
Replication materials
Methodology, comparison summaries, failure-class documentation, and operational contracts used in the experiments.
The practical minimum is simple:
If a rule applies in more than one place, name every place.
The stronger structural solution is a versioned contract that preserves that scope across iteration.
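A contract can be as small as a versioned record that enumerates every surface an invariant touches, plus a check that a change covered all of them. This is a minimal sketch under assumed field names (`version`, `invariant`, `surfaces`); the research corpus does not prescribe this exact shape.

```python
# Minimal sketch of a versioned contract artifact. Field names are
# illustrative assumptions, not taken from the research materials.

CONTRACT = {
    "version": 3,
    "invariant": "Preserve manually created files",
    "surfaces": ["deletion", "overwrite", "directory_reset"],
}

def uncovered_surfaces(implemented, contract=CONTRACT):
    """Return the contract-named surfaces a change failed to touch."""
    return sorted(set(contract["surfaces"]) - set(implemented))

# A change that only touched deletion leaves two surfaces exposed:
missing = uncovered_surfaces(["deletion"])
# missing == ["directory_reset", "overwrite"]
```

The point of the artifact is durability: because the surface list lives outside any single prompt, the full scope survives across iterations instead of being re-derived (and partially dropped) each time.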