The Core Finding
AI tools are remarkably good at making the change you ask for. They are structurally bad at propagating that change to every place it needs to go.
This is not just a performance limitation. It is not mainly a matter of choosing a better model or writing a better prompt. The research points to a repeatable mechanism: when a rule applies in multiple places but the instruction names only one of those places, the AI updates the named place and leaves the others untouched.
The rule is implemented where it is mentioned. It is silently absent everywhere else.
The shortest version is this:
AI binds to the scope you name.
Everything you do not name tends to remain unchanged, even when the same rule should apply there too.
In the underlying experiments, the system under test had two file-handling paths: a deletion path and an overwrite path. A new requirement — preserve manually created files — applied to both. In every code-only run that reached the cross-surface stage, the deletion path was updated and the overwrite path was not. Manual files were protected during deletion and silently destroyed during overwrite.
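The failure shape can be sketched in a few lines. This is an illustrative reconstruction, not the actual experimental code: the function names, the `manual` flag, and the data layout are all hypothetical, chosen only to show one path receiving the rule while the other keeps its old behavior.

```python
# Hypothetical sketch of the two file-handling paths.
# Names (delete_stale, overwrite_stale, the "manual" flag) are illustrative.

def delete_stale(files):
    """Deletion path: the new rule WAS applied here."""
    # Manually created files survive deletion.
    return [f for f in files if f["manual"]]

def overwrite_stale(files, new_content):
    """Overwrite path: the same rule was NOT propagated here."""
    for f in files:
        f["content"] = new_content  # manual files silently clobbered
    return files

files = [
    {"name": "generated.txt", "manual": False, "content": "old"},
    {"name": "notes.txt", "manual": True, "content": "precious"},
]

survivors = delete_stale(files)       # notes.txt is preserved
overwrite_stale(files, "regenerated")  # notes.txt is destroyed anyway
```

Both paths pass any test aimed at the surface the instruction named; only the unnamed surface violates the rule, which is exactly why the failure is quiet.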
The fix was not “prompt better.” The fix was explicit scope. When the full set of affected surfaces was written down in a durable artifact — what the research calls a contract — the failure disappeared. When that artifact did not exist, the failure reproduced with striking consistency.
The Mechanism
To understand why this happens, it helps to distinguish between two kinds of changes.
Same-surface changes affect one thing. “Make the button blue.” “Fix the deletion logic.” These are comparatively reliable under iteration because the scope of the request and the scope of the required change are the same.
Cross-surface changes affect multiple things. “Preserve manual files” applies to deletion, overwrite, and directory reset. “Update the brand color” applies to the product, the marketing site, the email templates, and the pitch deck. “Enforce this compliance rule” applies to every workflow that touches the regulated data.
The research shows a clear stability gradient:
| Change Type | Scope | AI Stability |
| --- | --- | --- |
| Same-surface, behavioral | One place, one behavior | Stable |
| Same-surface, architectural | One place, structural change | Prompt-sensitive |
| Cross-surface invariant | Multiple places, one rule | Fails without enumeration |
The critical point is that this is not primarily a model-quality problem. A prompt-gradient experiment showed that the failure disappears only when the prompt explicitly lists every affected surface — at which point the prompt has become structurally equivalent to a contract.
The immediate mechanism is surface-scope binding: AI changes what you mention, and only what you mention.
Why This Is Not Obvious
This failure is easy to miss for three reasons.
First, the changed surface usually works correctly. Every test of the thing you asked about passes.
Second, the unchanged surfaces still function. They keep doing what they were doing before. Nothing visibly explodes. They are simply now wrong relative to the new rule.
Third, each individual iteration can look reasonable. The drift accumulates across steps rather than announcing itself in a single dramatic failure.
This is what the research describes as authority drift: reasonable-seeming change without explicit consent, compounding across iterations until the system no longer reflects what anyone actually decided it should do.
The Research Behind This
This document summarizes findings from a published research corpus. The claims here are not presented as loose theory. They are grounded in controlled experiments with documented methodology, replication materials, and source artifacts.
The broader finding is not simply about code. Code was the experimental medium because it allowed controlled measurement. The mechanism itself — surface-scope binding — generalizes as a way of thinking about iteration wherever a rule must hold across multiple surfaces. That is why the bridge into design, operations, and governance is legitimate, even though the measured experiments were in software implementation.
Recommended reading path
Start here: Authority, Execution, and Refusal
Ten linked essays diagnosing where control actually lives in systems that execute.
The proof: Contract-Centered Iterative Stability
The experimental paper documenting the stability boundary, the failure mechanism, and the cross-run results.
The method: Contract-Centered Engineering
The practical loop for specification, derivation, evaluation, and refinement.
The mechanism: Breaking the Loop
A shorter treatment focused on why iterative AI edits produce structural drift.
Replication materials
Methodology, comparison summaries, failure-class documentation, and operational contracts used in the experiments.
The practical minimum is simple:
If a rule applies in more than one place, name every place.
The stronger structural solution is a versioned contract that preserves that scope across iteration.
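A contract can be as small as a versioned record that enumerates every surface an invariant touches, plus a check that a change covered all of them. This is a minimal sketch under assumed field names (`version`, `invariant`, `surfaces`); the research corpus does not prescribe this exact shape.

```python
# Minimal sketch of a versioned contract artifact. Field names are
# illustrative assumptions, not taken from the research materials.

CONTRACT = {
    "version": 3,
    "invariant": "Preserve manually created files",
    "surfaces": ["deletion", "overwrite", "directory_reset"],
}

def uncovered_surfaces(implemented, contract=CONTRACT):
    """Return the contract-named surfaces a change failed to touch."""
    return sorted(set(contract["surfaces"]) - set(implemented))

# A change that only touched deletion leaves two surfaces exposed:
missing = uncovered_surfaces(["deletion"])
# missing == ["directory_reset", "overwrite"]
```

The point of the artifact is durability: because the surface list lives outside any single prompt, the full scope survives across iterations instead of being re-derived (and partially dropped) each time.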