Probabilistic Systems Engineering Protocols & Contracts
Publishing Repos Operational Semantics Contract v0.11.0

1. Purpose

This contract defines the operational semantics for the Probabilistic Systems Engineering publishing repository.

Its purpose is to transform committed publishing inputs into a deployed public research/library site while preserving explicit authority over:

This contract governs the publishing and deployment repository only.

Canonical prose-authoring authority remains outside this repository.

2. Scope

This contract applies to:

This contract covers:

This contract does not yet cover:

3. Repository Roles

3.1 Canonical Authoring Source

Canonical document authority remains outside this repository.

The repository SHALL NOT be treated as the prose authoring source of truth.

3.2 Repository Role

The repository is the publishing and deployment system.

It stores:

3.3 Generated Output

Generated site output is derived state only.

dist/ SHALL be treated as build output.

Generated HTML, extracted assets, derived metadata artifacts, recommendation artifacts, and other dist/ contents SHALL NOT be committed back into git as part of normal publishing behavior.

4. Source Artifact Model

4.1 Incoming Roots

Committed governed source roots SHALL be:

4.2 Publish-Unit Roots vs Asset Roots

incoming/authority/, incoming/papers/, incoming/contracts/, and incoming/replication/ are publish-unit roots.

incoming/assets/ is not a publish-unit root. It is a committed static-asset source copied into dist/assets/.

Files under incoming/assets/ SHALL NOT be interpreted as candidate publish units.

4.3 Grouping Folders

Grouping folders MAY exist at arbitrary depth under a publish-unit root.

Grouping folders are organizational only.

A grouping folder SHALL NOT itself be treated as a publishable unit unless it independently satisfies §4.4.

4.4 Renderable Publish Unit

A renderable publish unit is a directory under a publish-unit root that contains:

A candidate directory that does not satisfy both requirements SHALL NOT be published.

For a candidate publish-unit directory:

4.5 Authority Publish Unit

An authority publish unit remains one publish unit even when it emits:

Derived authority essay pages SHALL NOT be treated as separately discovered incoming publish units.

4.6 Publish-Unit Identity

The publish-unit identity SHALL be its relative path from the incoming type root.

Examples:

The full relative slug path is authoritative for output routing and grouping.
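The identity rule in §4.6 can be sketched as a relative-path computation. This is a minimal illustration, not the repository's implementation: the function name and the example directory names are hypothetical, and rejecting the type root itself is an assumption of this sketch.

```python
from pathlib import Path, PurePosixPath


def publish_unit_identity(unit_dir: Path, type_root: Path) -> str:
    """Return the unit's slug path relative to its incoming type root."""
    # relative_to raises ValueError when unit_dir is not under type_root,
    # so a misplaced directory fails loudly instead of receiving a guessed slug.
    relative = unit_dir.relative_to(type_root)
    if not relative.parts:
        # Assumption of this sketch: the type root itself is never a publish unit.
        raise ValueError("the type root itself has no publish-unit identity")
    # Normalize to forward slashes so the slug is stable across platforms.
    return PurePosixPath(*relative.parts).as_posix()


# Hypothetical layout: incoming/papers/group-a/example-paper-v1/
slug = publish_unit_identity(
    Path("incoming/papers/group-a/example-paper-v1"),
    Path("incoming/papers"),
)
print(slug)
```

Keeping the grouping folder in the slug matches the statement that the full relative path, not only the final segment, drives output routing and grouping.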

4.7 Display Label

Default display label SHALL be the PDF stem unless a more specific governed rule overrides it for a derived authority essay page.

No sidecar authoring metadata file is required by this contract.

5. Discovery Semantics

5.1 Traversal

Discovery SHALL traverse each governed incoming publish-unit root recursively.

Traversal order SHALL NOT affect governed output semantics.

5.2 Candidate Directory Rule

If a directory under a publish-unit root contains any governed publish artifacts (*.pdf or *.zip), that directory MUST be treated as a candidate publish-unit directory.

5.3 Qualification and Refusal

Qualification and refusal SHALL follow §4.4.

The system SHALL fail explicitly rather than guess intended grouping, parentage, or primary content shape.

5.4 Traversal Termination

If a directory qualifies as a publish unit, traversal below that directory SHALL terminate.

Child directories beneath a qualified publish unit SHALL NOT be independently discovered or published.
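The discovery rules in §5.1 through §5.4 can be sketched as a recursive walk. This sketch assumes the two qualification requirements are "exactly one PDF and exactly one ZIP", as acceptance criterion 3 states, and it assumes refusal is surfaced as an exception. The names discover_publish_units and PublishUnitRefusal are hypothetical, and PDF-only contract entries (§14) are deliberately out of scope here.

```python
from pathlib import Path


class PublishUnitRefusal(Exception):
    """Raised when a candidate directory cannot be qualified without guessing."""


def discover_publish_units(root: Path) -> list[Path]:
    """Return qualified publish-unit directories found under one publish-unit root."""
    qualified: list[Path] = []

    def visit(directory: Path) -> None:
        files = [p for p in directory.iterdir() if p.is_file()]
        pdfs = [p for p in files if p.suffix.lower() == ".pdf"]
        zips = [p for p in files if p.suffix.lower() == ".zip"]
        if pdfs or zips:
            # Candidate directory rule: any governed artifact makes it a candidate.
            if len(pdfs) != 1 or len(zips) != 1:
                raise PublishUnitRefusal(
                    f"{directory}: found {len(pdfs)} PDF(s) and {len(zips)} ZIP(s)"
                )
            qualified.append(directory)
            return  # Traversal termination: do not descend below a qualified unit.
        # Otherwise this is a grouping folder; keep descending.
        for child in sorted(p for p in directory.iterdir() if p.is_dir()):
            visit(child)

    # Only directories under the root are candidates, not the root itself.
    for child in sorted(p for p in root.iterdir() if p.is_dir()):
        visit(child)
    # Sorting the result keeps output independent of filesystem traversal order.
    return sorted(qualified)
```

Sorting both the traversal and the result is one way to keep traversal order from leaking into governed output.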

6. HTML ZIP Handling

6.1 Accepted Export Shape

The system SHALL accept Google Docs HTML exports packaged as ZIP files.

6.2 Extraction Rule

For each qualified publish unit, the ZIP SHALL be extracted into a temporary working directory.

6.3 Main HTML Detection

After extraction:

The system SHALL NOT guess a primary HTML file.
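The extract-then-detect step can be sketched as follows. The detection rule used here, exactly one .html or .htm file anywhere in the extracted archive, is an assumption made for illustration; the governed rule may be narrower. The names extract_main_html and MainHtmlAmbiguity are hypothetical.

```python
import tempfile
import zipfile
from pathlib import Path


class MainHtmlAmbiguity(Exception):
    """Raised when the export does not contain exactly one HTML file."""


def extract_main_html(zip_path: Path, work_dir: Path) -> Path:
    """Extract a Google Docs HTML export and return its single HTML file."""
    work_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(work_dir)
    candidates = sorted(
        p for p in work_dir.rglob("*")
        if p.is_file() and p.suffix.lower() in {".html", ".htm"}
    )
    if len(candidates) != 1:
        # Fail explicitly: zero or several candidates would require a guess.
        names = [p.name for p in candidates]
        raise MainHtmlAmbiguity(f"{zip_path.name}: expected one HTML file, found {names}")
    return candidates[0]


def main_html_name(zip_path: Path) -> str:
    """Usage example: extract into a temporary working directory (§6.2)."""
    with tempfile.TemporaryDirectory() as tmp:
        return extract_main_html(zip_path, Path(tmp)).name
```

Raising on both the zero-file and the multi-file case keeps the refusal boundary symmetric: neither case lets the build proceed on a guessed primary file.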

6.4 Asset Preservation

Assets required for correct rendered output SHALL be preserved in published output.

6.5 Authority Collection Projection

For authority publish units:

6.6 Projection Refusal Boundary

The system SHALL fail rather than guess when ambiguity affects:

The absence of child-essay projection alone does not require failure if the collection landing page remains deterministically renderable.

7. Normalization Semantics

7.1 Head and Metadata Normalization

Rendered pages SHALL normalize or inject:

7.2 Exported Style Preservation

Exported Google Docs <style> blocks MAY be preserved where required for text fidelity.

7.3 Empty Paragraph Cleanup

Paragraphs containing no meaningful text and no structural content SHALL be removed.

Paragraphs containing structural content such as images, tables, SVG, or rules SHALL NOT be treated as empty.
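The emptiness test in §7.3 can be sketched as a predicate over a paragraph's inner HTML. This is a simplified, regex-based illustration rather than a full HTML parser. The structural tag set (img, table, svg, hr) is read from the examples in the clause, and treating non-breaking spaces as whitespace is an assumption. The function name is hypothetical.

```python
import html
import re

# Tags that count as structural content under §7.3 ("rules" is read as <hr>).
STRUCTURAL_TAG = re.compile(r"<\s*(img|table|svg|hr)\b", re.IGNORECASE)
ANY_TAG = re.compile(r"<[^>]+>")


def is_empty_paragraph(inner_html: str) -> bool:
    """Return True when a paragraph has no meaningful text and no structural content."""
    if STRUCTURAL_TAG.search(inner_html):
        return False  # Images, tables, SVG, and rules are never "empty".
    # Drop remaining markup, then decode entities such as &nbsp;.
    text = html.unescape(ANY_TAG.sub("", inner_html))
    # Assumption: non-breaking spaces carry no meaningful text.
    return text.replace("\u00a0", " ").strip() == ""


print(is_empty_paragraph("<span>&nbsp;</span>"))
print(is_empty_paragraph('<span><img src="image1.png"></span>'))
```

Checking for structural tags before stripping markup is what keeps an image-only paragraph from being removed.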

7.4 Generic Structural Cleanup

The system MAY remove or normalize generic Google Docs export leakage where that cleanup does not alter governed reading semantics.

7.5 Canonical Rendered Title

Each rendered page SHALL expose one canonical rendered title.

7.6 Competing Title Cleanup

Where exported HTML contains competing title-like wrappers that clearly represent the same document title, the system MAY canonicalize them down to one rendered title.

The system SHALL prefer omission over duplicate title noise.

7.7 Authority Essay Title Authority

For derived authority essay pages:

7.8 Duplicate Wrapper Removal for Authority Essays

For derived authority essay pages only:

7.9 Structural Classification

The system MAY classify normalized blocks into governed presentation classes such as lead-in, callout, compact paragraph, and similar presentational categories where implemented.

This contract governs the output behavior, not one mandatory internal classifier implementation.

8. Homepage Semantics

8.1 Homepage Role

The homepage is a governed public entry surface for the published site.

It SHALL reflect the current site role as a research/library surface rather than only an archive landing page.

8.2 Required Homepage Contents

The homepage SHALL include:

8.3 Homepage Role Hierarchy

Homepage ordering SHALL reflect role hierarchy, not only box order.

The primary path SHALL emphasize:

Supporting surfaces SHALL include:

The browse/library path SHALL expose public browsing into the larger corpus.

8.4 Homepage Reachability

Every homepage entry MUST expose a valid primary navigation target.

8.5 Homepage Proof Entry

The homepage secondary proof/papers CTA SHALL resolve deterministically to a designated proof-entry family or designated proof-entry artifact as defined by implementation under this contract.

That target SHALL be explicitly designated; it SHALL NOT rest on a silently hardcoded, version-specific choice.

If no qualifying proof-entry target exists, the CTA SHALL be suppressed rather than rendered as a dead link.

9. Listing Semantics

9.1 Listing Surfaces

The system SHALL emit:

9.2 Listing Sections

Listing surfaces SHALL expose:

9.3 Deterministic Listing Order

Section order, group order, family order, and item order SHALL be deterministic and SHALL respect current version/lineage rules.

9.4 Authority Collection Listing Integrity

Listing surfaces MAY expose derived authority essay links beneath their parent authority collection entry.

Authority derived essays SHALL NOT be flattened into independent peer top-level list items when they are already being represented via their parent collection entry.

9.5 Latest Visibility

latest SHALL:

9.6 Archive Visibility

archive SHALL:

10. Structured Metadata Semantics

10.1 Per-Rendered-Document Metadata

Every rendered HTML document SHALL emit required derived metadata and page-level structured metadata.

10.2 Site Metadata Index

dist/metadata/documents.json SHALL contain one metadata entry per rendered document.

10.3 Authority Collection Metadata

For authority collection landing pages, metadata MAY include:

10.4 Derived Authority Essay Metadata

For derived authority essay pages, metadata MAY include:

Such metadata is implementation-supporting derived metadata, not sidecar authoring metadata.

10.5 Metadata Completeness Boundary

Only rendered HTML documents are required members of rendered-document metadata completeness under this contract.

PDF-only contract entries are not rendered HTML documents and are excluded unless a future contract version explicitly changes that rule.

11. Discovery and Recommendation Semantics

11.1 Distinct Discovery Surfaces

When emitted, the following surfaces SHALL remain distinct and labeled:

11.2 Recommendation Computation Timing

Recommendations SHALL be computed at build/publish time over the full rendered document corpus.

11.3 Candidate Generation Latitude

Candidate generation MAY use corpus-wide textual similarity over cleaned rendered document text.

Metadata MAY be used for reranking or policy shaping.

This contract does not freeze one specific scoring formula or threshold implementation beyond the governed outcomes below.

11.4 Hard Exclusions

Normal recommendation/discovery surfaces SHALL exclude, where applicable:

11.5 Reference Conservatism

Referenced-artifact detection MUST prefer omission over weak or ambiguous matches.

11.6 Related-Docs Conservatism

Heuristic related-document recommendations MUST be thresholded, suppressible, and conservative.

False positives are worse than no recommendation.
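A conservative related-documents pass consistent with §11.3 and §11.6 might look like the sketch below. The contract does not freeze a scoring formula, so the bag-of-words cosine similarity, the 0.5 threshold, the three-item cap, and all names here are illustrative assumptions only.

```python
import math
import re
from collections import Counter


def _vector(text: str) -> Counter:
    """Lowercased bag-of-words term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def related_documents(
    doc_id: str,
    corpus: dict[str, str],
    threshold: float = 0.5,
    max_items: int = 3,
    suppressed: frozenset = frozenset(),
) -> list[str]:
    """Return related document ids, preferring an empty list over weak matches."""
    if doc_id in suppressed:
        return []  # Suppressible: this document emits no recommendations.
    source = _vector(corpus[doc_id])
    scored = []
    for other_id, text in corpus.items():
        if other_id == doc_id or other_id in suppressed:
            continue
        score = _cosine(source, _vector(text))
        if score >= threshold:  # Thresholded: weak matches are dropped.
            scored.append((score, other_id))
    # Sort by descending score, then id, so ties resolve deterministically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other_id for _, other_id in scored[:max_items]]
```

Returning an empty list when nothing clears the threshold is the behavior the clause asks for: no recommendation is preferable to a false positive.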

11.7 Authority Essay Read-Next Precedence

For derived authority essay pages:

This clause governs Read next precedence only. It does not require blanket suppression of other discovery surfaces when they otherwise qualify.

12. Version and Lineage Semantics

12.1 Slug-Family Authority

Version-family derivation SHALL use only the final slug segment and its terminal version suffix pattern.

12.2 Stable Historical Reachability

Versioned artifact URLs SHALL remain stable even when newer versions exist.

12.3 Latest-Only Main Visibility

When multiple versions exist in the same slug family, only the latest version appears in latest-oriented primary listing contexts, subject to family rules.
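The slug-family rule (§12.1) and latest-only visibility (§12.3) can be sketched together. The suffix pattern assumed here, a trailing "-v" followed by dot-separated integers, is hypothetical, as are the function names and example slugs. The sketch does not model the additional family rules the clause refers to.

```python
import re

# Assumed terminal version suffix, e.g. "example-contract-v0.11.0".
VERSION_SUFFIX = re.compile(r"^(?P<family>.+)-v(?P<version>\d+(?:\.\d+)*)$")


def split_family(slug: str) -> tuple[str, tuple[int, ...]]:
    """Derive (family, version) from the final slug segment only."""
    final = slug.rpartition("/")[2]
    match = VERSION_SUFFIX.match(final)
    if match is None:
        return final, ()  # Unversioned slugs form their own family.
    version = tuple(int(part) for part in match.group("version").split("."))
    return match.group("family"), version


def latest_per_family(slugs: list[str]) -> list[str]:
    """Keep only the highest version of each family, in deterministic order."""
    latest: dict[str, tuple[tuple[int, ...], str]] = {}
    for slug in slugs:
        family, version = split_family(slug)
        # Tuples compare numerically, so (0, 11, 0) ranks above (0, 9, 0).
        if family not in latest or version > latest[family][0]:
            latest[family] = (version, slug)
    return sorted(slug for _, slug in latest.values())


print(latest_per_family(["group/example-v0.9.0", "group/example-v0.11.0", "notes"]))
```

Comparing versions as integer tuples rather than strings avoids ranking 0.9.0 above 0.11.0. Older versions are only filtered from latest-oriented listings here; their URLs are untouched, consistent with §12.2.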

12.4 Authority Child Lineage

Authority collection child essay pages are not independent version families derived from essay titles.

They inherit collection lineage from the parent publish unit.

12.5 No Cross-Collection Latest Collapse for Child Essays

Collection child essays SHALL NOT be independently latest-collapsed across unrelated parent collections.

13. Document Navigation Semantics

13.1 Depth-Correct Navigation

Per-document home navigation MUST resolve correctly for the document’s relative slug depth.
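Depth-correct home navigation can be sketched as a relative-path computation. This sketch assumes each rendered page lives at dist/<type>/<slug>/index.html and that the homepage is dist/index.html. That layout and the function name are assumptions for illustration, not something this contract states.

```python
import posixpath


def home_href(type_root: str, slug: str) -> str:
    """Return the relative link from a document page back to the homepage."""
    # Directory holding the page, relative to dist/ (assumed layout).
    page_dir = posixpath.join(type_root, slug)
    # relpath emits one "../" per directory level between the page and dist/.
    return posixpath.relpath("index.html", start=page_dir)


print(home_href("papers", "group-a/example-paper-v1"))
print(home_href("contracts", "example-contract-v1"))
```

Deriving the link from the slug, instead of hardcoding a fixed number of "../" segments, is what keeps navigation correct when grouping folders add depth.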

13.2 Collection-Relative Navigation

Authority essay navigation SHALL be ordered and collection-relative.

13.3 Collection Back-Link

Derived authority essay pages SHALL expose navigation back to the parent collection landing page.

13.4 Previous/Next Essay Navigation

When projection succeeds and adjacent essays exist, derived authority essay pages SHALL preserve deterministic previous/next navigation according to collection order.

13.5 Collection Landing Essay List

Authority collection landing pages MAY render an ordered essay list when derived essay pages exist.

13.6 PDF Navigation for Authority Collections

Per-document PDF navigation for authority collection-derived pages SHALL remain rooted in the collection output directory as implemented under this contract.

14. PDF-Only Contract Entry Semantics

14.1 Supported PDF-Only Mode

PDF-only contract entries stored under contracts/ are a supported contract publication form.

14.2 Scope Restriction

PDF-only contract entry mode is supported only for contracts.

This mode SHALL NOT be generalized by this contract to papers, authority collections, or replication materials.

14.3 PDF-Only Entry Behavior

A PDF-only contract entry:

14.4 Precedence

If both a rendered incoming contract and a PDF-only contract entry exist for the same slug:

14.5 Rendered-Surface Exclusion

PDF-only contract entries SHALL NOT be treated as rendered documents for:

unless a future contract version explicitly adds such support.

15. Static Asset Semantics

15.1 Static Asset Copy

Committed assets under incoming/assets/ SHALL be copied to dist/assets/.

15.2 Non-Publish-Unit Rule

Static assets SHALL NOT be treated as publish units.

15.3 Preservation

Static asset copy semantics SHALL preserve relative asset reachability required by governed rendered output.
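The copy rule can be sketched with the standard library. The function name is hypothetical. Preserving the directory tree beneath incoming/assets/ is how this sketch keeps relative asset paths reachable.

```python
import shutil
from pathlib import Path


def copy_static_assets(source: Path, destination: Path) -> list[str]:
    """Copy the asset tree and return the relative paths that were copied."""
    # dirs_exist_ok lets the copy merge into an existing dist/assets/ directory.
    shutil.copytree(source, destination, dirs_exist_ok=True)
    return sorted(
        p.relative_to(source).as_posix() for p in source.rglob("*") if p.is_file()
    )
```

A typical call would be copy_static_assets(Path("incoming/assets"), Path("dist/assets")). Nothing under the source tree is inspected as a publish unit.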

16. Sitemap Semantics

16.1 Sitemap Output

The system SHALL emit dist/sitemap.xml.

16.2 HTML Reachability Constraint

Sitemap entries MUST resolve to valid published HTML pages.

16.3 Completeness Boundary

This contract requires sitemap validity for included entries.

This contract does not require every non-listed reachable artifact class to appear in sitemap unless otherwise stated by a future version.
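The reachability constraint in §16.2 can be sketched as a check performed while the sitemap is built. The base URL, the function name, and the choice to raise on an invalid entry rather than drop it are assumptions of this sketch.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(dist: Path, base_url: str, page_paths: list[str]) -> str:
    """Return sitemap XML for the given dist-relative HTML page paths."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for rel in sorted(page_paths):
        target = dist / rel
        # Every entry must resolve to a published HTML file.
        if target.suffix != ".html" or not target.is_file():
            raise FileNotFoundError(f"sitemap entry is not a published HTML page: {rel}")
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = f"{base_url.rstrip('/')}/{rel}"
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

The sketch validates only the entries it is given, which mirrors the completeness boundary in §16.3: validity is required for included entries, not exhaustive coverage.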

17. Build Manifest and Published State

17.1 Build Manifest

Successful builds MUST emit dist/build.json.

17.2 Manifest Sufficiency

The manifest MUST be sufficient for live published-state observability and drift comparison.
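Manifest emission might be sketched as below. This section does not enumerate the required fields, so the field names source_identity, document_count, and built_at are hypothetical placeholders for whatever the implementation needs for observability and drift comparison.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def write_build_manifest(dist: Path, source_identity: str, document_count: int) -> dict:
    """Write dist/build.json and return the manifest that was written."""
    manifest = {
        # Hypothetical fields; the governed field set is not listed here.
        "source_identity": source_identity,  # e.g. an identifier of the source state
        "document_count": document_count,
        "built_at": datetime.now(timezone.utc).isoformat(),
    }
    dist.mkdir(parents=True, exist_ok=True)
    text = json.dumps(manifest, indent=2, sort_keys=True) + "\n"
    (dist / "build.json").write_text(text, encoding="utf-8")
    return manifest
```

Recording an identifier of the source state in the manifest is what makes a later comparison between published state and source state possible.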

17.3 Recommendation Artifact

When discovery/recommendation emission is enabled, successful builds SHALL emit dist/metadata/recommendations.json.

18. Deployment and Reconciliation

18.1 Deployment Source

Deployment SHALL publish from dist/.

18.2 Reconciliation Role

Scheduled reconciliation MAY republish generated site output but MUST NOT mutate repository source artifacts.

18.3 Drift Detection

Scheduled reconciliation MUST determine drift from two inputs: whether the live manifest is readable, and manifest/source identity sufficient to compare the current published state with the current source state for the run.

18.4 Self-Heal

When governed drift is detected, reconciliation MAY republish generated output to restore intended published state.
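The drift decision in §18.3 can be sketched as a pure function over the live manifest text and the current source identity. Treating an unreadable or malformed live manifest as drift is an assumption of this sketch, and the source_identity field name is hypothetical.

```python
import json
from typing import Optional


def detect_drift(live_manifest_text: Optional[str], current_source_identity: str) -> bool:
    """Return True when the published state should be considered drifted."""
    if live_manifest_text is None:
        return True  # The live manifest could not be fetched.
    try:
        manifest = json.loads(live_manifest_text)
    except json.JSONDecodeError:
        return True  # The live manifest is not readable as JSON.
    if not isinstance(manifest, dict):
        return True
    # Drift when the published identity differs from the current source state.
    return manifest.get("source_identity") != current_source_identity


print(detect_drift('{"source_identity": "abc123"}', "abc123"))
print(detect_drift(None, "abc123"))
```

The function only reads its inputs. Any republish that follows a True result would regenerate dist/ output and leave repository source artifacts unmodified, as §18.2 requires.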

19. Invariants

20. Non-Goals

This contract does not require:

21. Acceptance Criteria

This contract is satisfied when the system can:

  1. consume incoming inputs under incoming/authority/, incoming/papers/, incoming/contracts/, and incoming/replication/
  2. copy static assets from incoming/assets/ to dist/assets/
  3. qualify only candidate directories containing exactly one PDF and exactly one ZIP
  4. terminate traversal below qualified publish units
  5. extract and normalize Google Docs HTML export content
  6. fail rather than guess when no unique main HTML file exists
  7. canonicalize duplicate title wrappers conservatively
  8. generate collection landing pages for authority publish units
  9. generate derived authority essay pages only when deterministic section boundaries are available
  10. refrain from guessed child-essay projection when deterministic section boundaries are unavailable
  11. generate rendered pages under dist/authority/..., dist/papers/..., dist/contracts/..., and dist/replication/...
  12. generate homepage, latest, and archive surfaces consistent with the governed role hierarchy
  13. keep authority child essays nested under their parent collection in listing contexts rather than flattening them into peer top-level items
  14. emit per-document structured metadata and site-level metadata index artifacts
  15. emit dist/metadata/recommendations.json when recommendation/discovery emission is enabled
  16. emit sitemap.xml
  17. emit build.json with required published-state fields
  18. detect explicit references conservatively
  19. emit distinct discovery/recommendation surfaces at publish time
  20. preserve old versioned URLs while showing only latest versions in latest-oriented listing contexts subject to lineage rules
  21. resolve collection and per-document navigation correctly
  22. support PDF-only contract entries under contracts/ without implying rendered HTML pages
  23. enforce rendered incoming contract precedence over same-slug PDF-only contract entries
  24. deploy from dist/
  25. detect governed drift and republish on drift
  26. avoid mutating source artifacts during reconciliation
