Probabilistic Systems Engineering

Contract-Centered Iterative Stability v4.7.2


Abstract

This paper identifies a stability boundary in iterative AI-assisted software development.

Across five structured experimental executions spanning two LLM families (Claude Opus 4.6 and GPT-5.2), same-surface behavioral changes remained stable under iteration and same-surface architectural changes proved prompt-sensitive, while cross-surface invariants repeatedly failed to propagate unless their full scope was made explicit.

At Stage D, code-only workflows consistently modified the deletion path while leaving the overwrite/re-export path untouched, producing the same structural failure: manual files were preserved during deletion but destroyed during overwrite. Spec-first workflows updated all affected surfaces and passed.

A prompt gradient experiment shows that the failure disappears only when prompts explicitly encode full invariant scope — i.e., when the prompt becomes structurally equivalent to a contract.

The core finding is not a performance comparison between models. It is a repeatable mechanism: when an invariant spans multiple mutation surfaces, iterative prompts bind to the surfaces they name.


1. Thesis Under Test

When implementation cost collapses under AI assistance, iteration accelerates.

The structural question:
Does conversational iteration without invariant enumeration introduce predictable invariant-scope drift?

Falsifiable claim:
Workflows that externalize authority into explicit, versioned contract invariants exhibit measurably lower regression and drift than workflows that modify code directly via conversational prompts.

Counter-claim:
If code-only workflows exhibit equivalent invariant preservation rates across tightening deltas, the contract-first hypothesis weakens.

The experiment does not ask whether a model can implement a requirement once. It asks whether invariant guarantees survive sequential tightening without explicit restatement.

In the code-only track, authority lives in the prompt, not in a durable artifact. At Stage C or D, the only stable yardstick remains the original v2.6.3 contract. Prompt-based deltas do not accumulate into an evolving reference. In the spec-first track, the yardstick itself evolves with each versioned contract patch.
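The difference between the two tracks' yardsticks can be sketched as data. The following minimal Python sketch assumes a contract is a mapping from clause identifiers to clauses, and that each versioned patch adds or amends clauses without dropping earlier ones. `Clause`, `Contract`, the abbreviated clause texts, and the Stage B and C prompt wordings are hypothetical. Only the two v2.6.3 sections this paper cites are shown.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Clause:
    cid: str   # clause identifier, e.g. "6.3" or "ΔD"
    text: str  # abbreviated clause text (hypothetical wording)


@dataclass
class Contract:
    version: str
    clauses: dict[str, Clause] = field(default_factory=dict)

    def patch(self, version: str, *new: Clause) -> "Contract":
        # A patch yields a new version; earlier clauses carry forward unless amended.
        merged = dict(self.clauses)
        merged.update({c.cid: c for c in new})
        return Contract(version, merged)


# Spec-first track: the yardstick at each stage is the accumulated contract.
v263 = Contract("2.6.3", {"6.3": Clause("6.3", "Atomic write"),
                          "7.3": Clause("7.3", "Deletion")})
v27 = v263.patch("2.7", Clause("ΔB", "Deletions disabled and stale units exist: exit 1"))
v28 = v27.patch("2.8", Clause("ΔC", "Convergence is atomic"))
v29 = v28.patch("2.9", Clause("ΔD", "Manual files preserved under all operations"))

# Code-only track: each stage's authority is only that stage's prompt,
# and the sole durable reference is still v2.6.3.
# (Prompt wording for B and C is hypothetical; D is quoted from the paper.)
prompts = {
    "B": "Fail if deletions are disabled and stale units exist.",
    "C": "Make convergence atomic.",
    "D": "Preserve manual files during deletion.",
}

print(v29.version, sorted(v29.clauses))
print(v263.version, sorted(v263.clauses))
```

At Stage D the spec-first yardstick (`v29`) still carries ΔB and ΔC alongside ΔD, whereas the code-only track has only the latest prompt plus the unchanged v2.6.3 contract.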


2. Experimental System

2.1 Baseline Artifact

System under test:
artifact-sync v2.6.3 (verified baseline)


convergence-contract-v2.6.3


Baseline properties:

Baseline harness status:


2.2 Delta Chain

Four sequential stages: a baseline followed by three tightening deltas.

Stage A — Baseline
No change.

Stage B — ΔB (Deletion Tightening)
If deletions are disabled and stale units exist:
Run MUST fail (exit 1).
Topology: same-surface behavioral.
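ΔB touches a single decision point, which is consistent with its stability in the code-only track. A minimal sketch, assuming units are identified by name and that a stale unit is one present on disk but absent from the desired output; `delta_b_exit_code` is a hypothetical helper, not artifact-sync's actual code.

```python
import sys


def delta_b_exit_code(on_disk: set[str], desired: set[str], deletions_enabled: bool) -> int:
    """Exit code under ΔB: stale units combined with disabled deletions is a hard failure."""
    stale = on_disk - desired  # assumption: stale = on disk but no longer produced
    if stale and not deletions_enabled:
        print(f"error: deletions disabled but {len(stale)} stale unit(s): {sorted(stale)}",
              file=sys.stderr)
        return 1
    return 0
```

A caller would pass the result to `sys.exit()`.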

Stage C — ΔC (Atomic Convergence)
Convergence MUST be atomic.
No partial state permitted.
All writes and deletes must be deferred and committed as a single operation.
Topology: same-surface architectural.
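One way to meet ΔC is to stage the entire next state and make the commit a single atomic step. A minimal sketch, assuming a POSIX filesystem and a hypothetical layout in which `root/current` is a symlink to a generation directory. The paper does not specify how artifact-sync implements atomicity, and `ConvergencePlan` is illustrative only.

```python
import os
import shutil
import tempfile
from pathlib import Path


class ConvergencePlan:
    """Hypothetical deferred plan: nothing under root/current changes until commit()."""

    def __init__(self, root: Path):
        self.root = root
        self.writes: dict[str, bytes] = {}  # relative path -> new content
        self.deletes: set[str] = set()      # relative paths to remove

    def write(self, rel: str, data: bytes) -> None:
        self.writes[rel] = data

    def delete(self, rel: str) -> None:
        self.deletes.add(rel)

    def commit(self) -> Path:
        live = self.root / "current"
        # 1. Build the complete next state in a fresh generation directory.
        stage = Path(tempfile.mkdtemp(prefix="gen-", dir=str(self.root)))
        if live.exists():
            shutil.copytree(live, stage, dirs_exist_ok=True)
        for rel in self.deletes:
            (stage / rel).unlink(missing_ok=True)
        for rel, data in self.writes.items():
            target = stage / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(data)
        # 2. Single commit point: atomically repoint the 'current' symlink.
        tmp_link = self.root / "current.tmp"
        if tmp_link.is_symlink():
            tmp_link.unlink()
        tmp_link.symlink_to(stage.name)
        os.replace(tmp_link, live)
        return stage
```

A crash anywhere before `os.replace` leaves `current` pointing at the previous, complete generation, which is the "no partial state" property. Superseded generations are not cleaned up in this sketch.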

Stage D — ΔD (Manual File Preservation)
Manual (non-managed) files inside managed unit directories MUST be preserved under ALL operations:

  1. Stale deletion path
  2. Overwrite / re-export path
  3. Directory reset logic

Topology: cross-surface invariant.
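What makes ΔD cross-surface is that one predicate (is this file managed?) must be honored by every path that mutates a unit directory. A minimal sketch of surfaces 1 and 3, assuming each unit's meta.json lists its exporter-owned files under a hypothetical `managed` key; all function names are hypothetical. Surface 2, the overwrite/re-export path, needs the same predicate and is the one the code-only runs left unchanged.

```python
import json
from pathlib import Path


def managed_set(unit: Path) -> set[str]:
    # Assumption: meta.json lists exporter-owned files under a hypothetical "managed" key.
    meta = json.loads((unit / "meta.json").read_text())
    return set(meta["managed"]) | {"meta.json"}


def remove_managed(unit: Path) -> set[str]:
    """Delete only exporter-owned files; return the manual files left behind."""
    for rel in managed_set(unit):
        (unit / rel).unlink(missing_ok=True)
    # Prune directories emptied by the removal, deepest first.
    for d in sorted((p for p in unit.rglob("*") if p.is_dir()),
                    key=lambda p: len(p.parts), reverse=True):
        if not any(d.iterdir()):
            d.rmdir()
    return {p.relative_to(unit).as_posix() for p in unit.rglob("*") if p.is_file()}


def delete_stale_unit(unit: Path) -> None:
    """Surface 1 (stale deletion): the directory survives if manual files remain."""
    if not remove_managed(unit):
        unit.rmdir()


def reset_unit_dir(unit: Path) -> None:
    """Surface 3 (directory reset): clear managed content, never manual files."""
    remove_managed(unit)
```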

2.3 Two Tracks Per Run

Spec-first
Each delta encoded in a versioned contract patch (v2.7 ΔB, v2.8 ΔC, v2.9 ΔD). Implementation derived from contract.

Code-only
Plain-English conversational prompt. No contract artifact updated. No persistent invariant enumeration.


3. Experimental Phases

The experiment proceeded in structured phases rather than identical linear runs.


Phase I — Architectural Tightening Exploration (ΔC)

Purpose: determine whether same-surface architectural tightening remains stable under conversational prompting.

Result:

Observed failure mechanism (code-only, one run):

Spec-first track restructured the full pipeline.

Interpretation:
ΔC requires inference about the full mutation surface of convergence. When scope language is explicit, code-only can succeed. When scope language is implicit, surface coverage may be incomplete.


Phase II — Cross-Surface Invariant Replication (ΔD)

Purpose: test whether conversational iteration propagates a cross-surface invariant.

Across all runs reaching Stage D:

Failure mechanism was identical across:

The overwrite path (write_unit_atomically()) retained shutil.rmtree(old) behavior in code-only runs, destroying manual files during re-export.


Prior-Outcome Awareness Test

In one run, the agent was aware of prior ΔD failures. Despite awareness:

This suggests the failure is not resolved by awareness of the prior outcome. The constraint is structural: prompts bind to the scope they name.


Phase II-B — Placement Sensitivity Testing

Variants executed:

Observed pattern: regardless of injection position:

Conclusion: ΔD failure is topology-dependent, not depth-dependent.


4. Invariant Topology Gradient

Observed stability gradient:

Delta   Topology                     Code-only stability
ΔB      Same-surface behavioral      Stable
ΔC      Same-surface architectural   Prompt-sensitive
ΔD      Cross-surface invariant      Structurally incomplete without enumeration

Failure probability increases with required invariant-scope inference:


5. Prompt Gradient Threshold Experiment

An independent prompt gradient, P1 (variants P1-a through P1-e), progressively increased prompt specificity.

Result:


5.1 Mechanism Clarification — Enumeration vs Externalization

The experiment isolates invariant enumeration as the immediate mechanism preventing drift. A sufficiently explicit prompt can succeed.

However, conversational prompts naturally scope work to mentioned surfaces. Clause-structured contracts require global invariant expression and force enumeration of affected mutation surfaces.

Thus:


Concrete Illustration — Contract Surface Enumeration

Consider §6.3 Atomic Write from convergence-contract-v2.6.3:


Exporter MUST:

  1. Build full Derived Unit in a temporary directory.
  2. Validate Representation Invariants.
  3. Write complete meta.json.
  4. Atomically rename into place.
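The four steps can be sketched as follows, using the move-aside-and-remove pattern that the overwrite path's `shutil.rmtree(old)` implies. This is a hypothetical reconstruction: only the name `write_unit_atomically()` and the `rmtree(old)` call appear in this paper. The signature, the `managed` key in meta.json, and the placeholder validation are assumptions.

```python
import json
import os
import shutil
import tempfile
from pathlib import Path


def write_unit_atomically(target: Path, files: dict[str, bytes]) -> None:
    """Hypothetical reconstruction of the v2.6.3 §6.3 steps (no manual-file handling)."""
    # 1. Build the full Derived Unit in a temporary sibling directory.
    tmp = Path(tempfile.mkdtemp(prefix=".tmp-", dir=str(target.parent)))
    for rel, data in files.items():
        dest = tmp / rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        dest.write_bytes(data)
    # 2. Validate Representation Invariants (placeholder check: no empty files).
    if any(len(data) == 0 for data in files.values()):
        shutil.rmtree(tmp)
        raise ValueError("representation invariant violated")
    # 3. Write complete meta.json listing the managed files.
    (tmp / "meta.json").write_text(json.dumps({"managed": sorted(files)}))
    # 4. Rename into place. POSIX rename() cannot replace a non-empty directory,
    #    so the old unit is moved aside first and then removed wholesale.
    old = target.with_name(target.name + ".old")
    if target.exists():
        os.rename(target, old)
    os.rename(tmp, target)
    if old.exists():
        shutil.rmtree(old)  # deletes everything in the old unit, manual files included
```

Exporting twice over a unit that contains a hand-written file removes that file on the second export, which is the Stage D failure on the overwrite path. The two renames in step 4 also leave a brief window in which the target does not exist.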

In v2.9 (ΔD), the contract patch adds:


Manual (non-managed) files within the target directory MUST be preserved prior to atomic rename. This applies to:

This patch cannot be written without enumerating which sections of the contract mutate directory state. The act of writing the invariant forces reconciliation with every clause that touches directory contents.
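Under the v2.9 patch, the overwrite path gains a step between building the temporary directory and the rename. A minimal sketch of that step, assuming the previous unit's managed files are known (for example from its meta.json); `carry_over_manual` and its conflict policy are hypothetical. The patch's list of affected paths is elided above, so this covers only the overwrite case.

```python
import shutil
from pathlib import Path


def carry_over_manual(old: Path, tmp: Path, managed: set[str]) -> list[str]:
    """Copy non-managed files from the live unit into the staged unit before the rename."""
    carried = []
    for src in sorted(p for p in old.rglob("*") if p.is_file()):
        rel = src.relative_to(old).as_posix()
        if rel in managed or rel == "meta.json":
            continue  # exporter-owned: the freshly built version wins
        dest = tmp / rel
        if dest.exists():
            # Conflict policy is an assumption: refuse rather than silently drop either file.
            raise FileExistsError(rel)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)
        carried.append(rel)
    return carried
```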


Mechanism Walkthrough

In the code-only track, the Stage D prompt typically states:
“Preserve manual files during deletion.”

The model therefore modifies the deletion path. The overwrite/re-export path (§6.3 Atomic Write) is not mentioned and remains unchanged. Manual files are preserved during deletion but destroyed during overwrite via shutil.rmtree(old).

The failure is not incorrect implementation. It is incomplete invariant propagation across mutation surfaces.

In the spec-first track, ΔD is expressed as a global invariant. That invariant must reconcile with all clauses that mutate directory state. The implementer must therefore examine both §7.3 (deletion) and §6.3 (atomic write), forcing modification of both paths.

The mechanism is surface-scope binding: prompts bind to mentioned surfaces; contracts bind to enumerated invariant scope.
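The binding rule can be written as a toy set model. It restates the hypothesis and is not fitted to any run data; `bind` and `Coverage` are hypothetical, and the two surface names come from the walkthrough above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Coverage:
    changed: frozenset[str]  # surfaces that receive the new invariant
    missed: frozenset[str]   # surfaces the invariant spans but nothing touched


def bind(invariant_scope: set[str], named: set[str]) -> Coverage:
    """Toy model: only in-scope surfaces that are explicitly named get changed."""
    return Coverage(frozenset(invariant_scope & named),
                    frozenset(invariant_scope - named))


# ΔD's scope per the walkthrough: §7.3 deletion and §6.3 atomic write.
DELTA_D = {"§7.3 deletion", "§6.3 atomic write"}

prompt_track = bind(DELTA_D, named={"§7.3 deletion"})  # "Preserve manual files during deletion."
contract_track = bind(DELTA_D, named=DELTA_D)         # the invariant enumerates its own scope
print(sorted(prompt_track.missed), sorted(contract_track.missed))
```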


6. Analogous Scope Behavior in a Different Task Domain

The following observation is qualitative and not part of the controlled artifact-sync experiment. It documents analogous scope-binding behavior observed while drafting and restructuring this paper using conversational AI. The purpose is illustrative, not evidentiary.

During drafting and restructuring tasks:

Mechanism mirrors ΔD behavior: scope-bound mutation expands unless constrained by an explicit boundary.


7. Operational Definition of Authority Structure

In this experiment, authority structure refers to the artifact that determines what scope is treated as in-bounds during implementation.

Authority structure determines whether invariant topology must be re-derived at each conversational step or is persistently encoded.

The experiment demonstrates that when cross-surface invariant scope is not persistently encoded, propagation failure is structurally predictable.


8. Limitations

This experiment evaluates iterative stability within controlled runs. The broader persistence advantage of contracts — that versioned invariants accumulate into a durable evolving yardstick across temporally separated sessions while prompts do not — is demonstrated within runs but not empirically tested across independent, time-separated sessions.

This paper demonstrates mechanism replication, not statistical generalization.


9. Conclusion

Same-surface behavioral changes remain stable under conversational mutation. Architectural tightening is prompt-sensitive. Cross-surface invariants drift predictably when invariant topology is not explicitly enumerated.

Stopping drift requires either:

The differentiator is whether invariant scope is explicitly enumerated and persistently authoritative.


10. Practical Implications

This experiment isolates a specific boundary in AI-assisted iteration: when a change affects more than one code path, and the prompt names only one of them, only that named path should be expected to change.

In Stage D, the requirement was to preserve manual files. In every code-only run:

Practical takeaway:

If a change applies in multiple places, name all of those places explicitly.

Instead of:

“Preserve manual files during deletion.”

Write:

“Preserve manual files during deletion and during overwrite/re-export (and any directory reset path).”

In this study, once prompts explicitly named all affected paths, the propagation gap disappeared.
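The same advice applies to tests: a check that names every mutation path catches the gap that a prompt naming one path leaves. A minimal sketch, assuming each path can be invoked as a function on a unit directory; the three toy implementations are hypothetical stand-ins that mimic the Stage D pattern, not artifact-sync code.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable

MANUAL = "notes.md"  # hypothetical hand-written file used as the probe


def fresh_unit() -> Path:
    unit = Path(tempfile.mkdtemp()) / "unit"
    unit.mkdir()
    (unit / "gen.txt").write_text("managed")
    (unit / MANUAL).write_text("manual")
    return unit


def violations(surfaces: dict[str, Callable[[Path], None]]) -> list[str]:
    """Run every enumerated mutation path on a fresh unit; report those that lose the manual file."""
    bad = []
    for name, run in surfaces.items():
        unit = fresh_unit()
        run(unit)
        if not (unit / MANUAL).exists():
            bad.append(name)
    return bad


# Toy implementations reproducing the Stage D pattern: deletion fixed, overwrite not.
def stale_delete(unit: Path) -> None:
    (unit / "gen.txt").unlink()


def overwrite(unit: Path) -> None:
    shutil.rmtree(unit)  # old behavior: the whole unit is replaced
    unit.mkdir()
    (unit / "gen.txt").write_text("managed v2")


def reset(unit: Path) -> None:
    for p in unit.iterdir():
        if p.name != MANUAL:
            p.unlink()


print(violations({"stale deletion": stale_delete,
                  "overwrite/re-export": overwrite,
                  "directory reset": reset}))
```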

Version status


Referenced artifacts


Verification & replication