Probabilistic Systems Engineering

Replication v0.7 — Draft

1. Purpose

This document defines the replication instruction artifact for rerunning the experiment.

Its purpose is to tell an operator or AI agent how to execute a valid replication of the methodology against a shared baseline system, shared governing artifact, and shared delta set.

This document is not a findings summary, benchmark report, or theory document. Execution of this document produces replicated runs, replication results, and later comparison artifacts.

The goal is to test whether a workflow that first updates a written contract and then conforms the code to it preserves a cross-surface invariant more reliably than a workflow that modifies code directly from conversational instructions.

2. Minimum Shape

A valid replication requires:

The same-surface change serves as the calibration case.

The cross-surface change is the primary target.

3. Two Tracks

Spec-First

For each delta:

  1. Update the written contract for the change.
  2. Update the code to conform.
  3. Evaluate against the shared invariant under test using the agreed validation method.
  4. Save the code and results.

Code-Only

For each delta:

  1. State the change conversationally.
  2. Update the code directly.
  3. Do not update the contract artifact.
  4. Evaluate against the same shared invariant under test using the same validation method.
  5. Save the code and results.

The asymmetry is intentional and under test.
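The two procedures above can be sketched as a single loop that differs only in whether the contract is touched. This is a minimal illustration, not the methodology's harness: `Workspace`, `RunRecord`, `run_track`, and the two toy stand-ins are hypothetical names, and the real code change is made by an operator or agent rather than a function call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Workspace:
    """Hypothetical stand-in for the shared baseline system."""
    contract: List[str] = field(default_factory=list)   # written contract clauses
    code: Dict[str, str] = field(default_factory=dict)  # toy model of the codebase


@dataclass
class RunRecord:
    track: str
    delta: str
    contract_updated: bool
    passed: bool


def run_track(
    track: str,
    deltas: List[str],
    apply_code_change: Callable[[Workspace, str], None],
    invariant_holds: Callable[[Workspace], bool],
) -> Tuple[Workspace, List[RunRecord]]:
    """Run every delta through one track and record the result per delta."""
    if track not in ("spec-first", "code-only"):
        raise ValueError(f"unknown track: {track}")
    ws = Workspace()
    records: List[RunRecord] = []
    for delta in deltas:
        contract_updated = track == "spec-first"
        if contract_updated:
            ws.contract.append(delta)      # Spec-First step 1: update the contract
        apply_code_change(ws, delta)       # both tracks: update the code
        passed = invariant_holds(ws)       # both tracks: same invariant, same method
        records.append(RunRecord(track, delta, contract_updated, passed))
    return ws, records


# Toy stand-ins (assumptions) so the sketch runs end to end.
def toy_apply(ws: Workspace, delta: str) -> None:
    ws.code[delta] = "implemented"


def toy_invariant(ws: Workspace) -> bool:
    return all(status == "implemented" for status in ws.code.values())
```

The only branch in the loop is the contract update. Everything else, including the invariant check, is shared, which keeps the asymmetry the single variable under test.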

4. Surface Requirement

The replication must include both of these change classes.

Same-Surface Change

A same-surface change is one where the invariant can be satisfied within the already-targeted mutation surface.

Cross-Surface Change

A cross-surface change is one where the invariant must hold across multiple independently reachable mutation surfaces affecting the same governed outcome.

That is the core requirement for replication.
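One way to read the two definitions above is as a set comparison: a change is same-surface when every mutation surface that can affect the governed outcome is already targeted, and cross-surface otherwise. The sketch below encodes that reading only. The function name, the `reach` mapping, and the surface and outcome names are hypothetical.

```python
from typing import Dict, Set


def classify_change(targeted: Set[str], reach: Dict[str, Set[str]], outcome: str) -> str:
    """Classify a change by comparing targeted surfaces with affecting surfaces.

    `reach` maps each mutation surface to the governed outcomes it can affect.
    """
    affecting = {surface for surface, outcomes in reach.items() if outcome in outcomes}
    # Same-surface: nothing outside the targeted set can affect the outcome.
    return "same-surface" if affecting <= targeted else "cross-surface"


# Hypothetical system with three mutation surfaces.
reach = {
    "cleanup_job": {"file_retention"},
    "sync_command": {"file_retention"},
    "report_view": {"report_layout"},
}

calibration = classify_change({"report_view"}, reach, "report_layout")
primary = classify_change({"cleanup_job"}, reach, "file_retention")
```

Here `calibration` is a same-surface change, while `primary` is cross-surface because `sync_command` can also affect file retention and is not targeted.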

A valid cross-surface replication must make it possible for:

5. Evaluation Rule

The workflow under test is not the oracle.

The code-only path may not be judged by treating its resulting implementation as proof of what should have been true. Both tracks must be evaluated against the same invariant under test for each delta.

The invariant under test may come from business rules, operational obligations, or system-level correctness requirements. It is not derived from whatever final code shape emerges.

This matters most for cross-surface changes, where the prompt may name one affected path while the required invariant applies across several.
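The rule can be sketched as an evaluator whose expected decisions are fixed before either track runs and never read back from an implementation. The toy invariant (orders of 100 or more require approval on every path), the path names, and both implementations are assumptions invented to exercise the evaluator. They are not results from either track.

```python
from typing import Callable, List, Tuple

# Expected decisions, fixed in advance from the invariant under test:
# (path, amount, approval_required)
INVARIANT_CASES: List[Tuple[str, int, bool]] = [
    ("web", 150, True),
    ("api", 150, True),
    ("web", 50, False),
    ("api", 50, False),
]


def evaluate(impl: Callable[[str, int], bool]) -> List[Tuple[str, int]]:
    """Return the cases where the implementation disagrees with the invariant."""
    return [
        (path, amount)
        for path, amount, expected in INVARIANT_CASES
        if impl(path, amount) != expected
    ]


def impl_all_paths(path: str, amount: int) -> bool:
    # Applies the rule on every path.
    return amount >= 100


def impl_named_path_only(path: str, amount: int) -> bool:
    # Applies the rule only on the path the prompt happened to name.
    return path == "web" and amount >= 100
```

`evaluate(impl_all_paths)` returns an empty list, while `evaluate(impl_named_path_only)` returns `[("api", 150)]`. The second implementation would look correct if it were judged only against its own behavior on the named path.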

6. Decision-Surface Rule

Validation must be judged at the Decision Surface.

A replication result matters only if it changes an externally observable decision or state transition, such as:

Wording differences, formatting differences, byte-shape differences, or implementation-style differences do not count as instability unless they alter the Decision Surface.
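A minimal sketch of this comparison follows, assuming the Decision Surface of a run can be reduced to a decision and a resulting state. The `Outcome` fields and the example values are hypothetical. Which fields belong to the Decision Surface is defined by the accompanying documents, not by this sketch.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Outcome:
    decision: str    # externally observable decision, e.g. "allow" or "deny"
    next_state: str  # externally observable state transition
    message: str     # wording only; assumed to be off the Decision Surface


def decision_surface(outcome: Outcome) -> Tuple[str, str]:
    """Project an outcome onto the fields that count for validation."""
    return (outcome.decision, outcome.next_state)


def is_unstable(a: Outcome, b: Outcome) -> bool:
    """Two runs are unstable only if they differ at the Decision Surface."""
    return decision_surface(a) != decision_surface(b)


run_1 = Outcome("deny", "pending_review", "Request denied.")
run_2 = Outcome("deny", "pending_review", "The request was not approved.")
run_3 = Outcome("allow", "completed", "Request denied.")
```

`run_1` and `run_2` differ only in wording, so `is_unstable(run_1, run_2)` is `False`. `run_1` and `run_3` share wording but differ in decision and state, so `is_unstable(run_1, run_3)` is `True`.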

7. What the Agent Must Be Given

To run the test, the agent must receive:

If the agent is expected to generate the exact delta wording itself, say so explicitly.

If you intend to supply the delta wording, provide it verbatim.

8. Minimum Recorded Outputs

For each track and each change, record:

For the cross-surface change, also record:

Minimum outputs should be sufficient to support:

9. What Counts as Supporting the Finding

The replication supports the finding if:

10. What Counts as Weakening the Finding

The replication weakens the finding if:

11. Relationship to Other Artifacts

This document should be read with the accompanying glossary, thesis, and methodology.

Those documents define:

This document does not replace them. It tells an operator how to rerun the experiment in conformance with them.

12. Worked Example

Assume a system has a rule that manually created files must be preserved unless explicitly targeted for removal.
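As a sketch, the rule above can be written as a check over the file set before and after a run. The `violations` function, the origin labels, and the file paths are hypothetical and stand in for whatever the agreed validation method actually inspects.

```python
from typing import Dict, Set


def violations(before: Dict[str, str], after: Set[str], targeted: Set[str]) -> Set[str]:
    """Return manually created files that were removed without being targeted.

    `before` maps each path to its origin: "manual" or "generated".
    `after` is the set of paths that still exist once the run finishes.
    `targeted` is the set of paths explicitly named for removal.
    """
    return {
        path
        for path, origin in before.items()
        if origin == "manual" and path not in after and path not in targeted
    }


before = {
    "notes.txt": "manual",
    "old.md": "manual",
    "build/out.js": "generated",
}

# Run A: old.md was explicitly targeted and removed; generated output was cleaned.
run_a = violations(before, after={"notes.txt"}, targeted={"old.md"})

# Run B: the same cleanup also removed notes.txt, which nobody targeted.
run_b = violations(before, after=set(), targeted={"old.md"})
```

`run_a` is empty, so the rule holds. `run_b` contains `notes.txt`, so the rule is violated no matter which mutation surface performed the removal.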

A valid replication must make it possible for:

In that case:
