Architecture
A compiler for hostile, underspecified legal deltas.
LawVM's architecture is not chosen by taste. It is forced by the structural properties of law itself. The essay derives the necessity; this page describes the result.
The compiler model
A jurisdiction frontend is a phased compiler with explicit contracts between phases. The phases run in this order:
- Acquire and archive source artifacts
- Parse amendment text into a clause surface
- Extract payloads and normalize source-locally
- Elaborate against live legal state (snapshot-pure)
- Lower to canonical typed operations
- Replay over a base state
- Materialize point-in-time text
- Adjudicate against oracle or witness surfaces
The pipeline separates lowering, target resolution, replay, and divergence accounting so that mismatches are inspectable rather than collapsed into a single opaque failure state.
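The separation of phases can be sketched as a runner that records which phase failed instead of returning one generic error. This is a minimal illustration, not LawVM's implementation: `PhaseResult`, `PipelineRun`, `run_pipeline`, the toy `verb target :: payload` instruction syntax, and the three-phase subset are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class PhaseResult:
    phase: str
    ok: bool
    detail: str = ""


@dataclass
class PipelineRun:
    results: list[PhaseResult] = field(default_factory=list)

    def first_failure(self) -> Optional[PhaseResult]:
        return next((r for r in self.results if not r.ok), None)


def run_pipeline(phases: list[tuple[str, Callable[[Any], Any]]], artifact: Any):
    """Run phases in order; stop at the first failure and record where it happened."""
    run, value = PipelineRun(), artifact
    for name, fn in phases:
        try:
            value = fn(value)
        except ValueError as exc:
            run.results.append(PhaseResult(name, False, str(exc)))
            return run, None
        run.results.append(PhaseResult(name, True))
    return run, value


def parse(text: str) -> dict:
    # Hypothetical toy syntax: "<verb> <target> :: <payload>"
    verb, _, rest = text.partition(" ")
    target, _, payload = rest.partition(" :: ")
    if verb not in {"replace", "repeal"} or not target:
        raise ValueError(f"unparseable instruction: {text!r}")
    return {"verb": verb, "target": target, "payload": payload}


def make_phases(base: dict[str, str]):
    def resolve_target(op: dict) -> dict:
        if op["target"] not in base:
            raise ValueError(f"no such section: {op['target']}")
        return op

    def replay(op: dict) -> dict[str, str]:
        state = dict(base)  # never mutate the base state
        if op["verb"] == "repeal":
            del state[op["target"]]
        else:
            state[op["target"]] = op["payload"]
        return state

    return [("parse", parse), ("resolve_target", resolve_target), ("replay", replay)]


phases = make_phases({"1": "a", "2": "b"})
failed_run, _ = run_pipeline(phases, "replace 3 :: new text")
ok_run, new_state = run_pipeline(phases, "replace 2 :: c")
```

Here `failed_run.first_failure()` names the `resolve_target` phase, so a missing target is distinguishable from a parse failure or a replay failure.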
Two planes
LawVM operates on two simultaneous planes:
Semantic plane: source artifacts → clause surface → payload surface → elaborated intent → canonical effects → timelines → PIT materialization. This is the path from raw legal text to point-in-time state.
Epistemic plane: parse witnesses → observations → obligations → adjudications → claims → evidence bundle. This is the path that records why the result should be trusted — what was observed, what was inferred, what was recovered, and what remains unresolved.
Both planes run together. A replay result without its epistemic trail is not a LawVM result.
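One way to make that rule structural is to have the result type refuse construction without an evidence trail. The sketch below assumes hypothetical names (`Observation`, `EvidenceBundle`, `ReplayResult`) and reduces the epistemic plane to a flat tuple of tagged observations; the real plane described above is richer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    kind: str  # assumed tags: "observed", "inferred", "recovered", "unresolved"
    note: str


@dataclass(frozen=True)
class EvidenceBundle:
    observations: tuple[Observation, ...]

    def unresolved(self) -> list[Observation]:
        return [o for o in self.observations if o.kind == "unresolved"]


@dataclass(frozen=True)
class ReplayResult:
    text: str                 # semantic plane: materialized point-in-time text
    evidence: EvidenceBundle  # epistemic plane: why the text should be trusted

    def __post_init__(self) -> None:
        # Reject results that carry no epistemic trail at all.
        if not self.evidence.observations:
            raise ValueError("replay result has no epistemic trail")


result = ReplayResult(
    text="Section 2 as amended.",
    evidence=EvidenceBundle((
        Observation("observed", "amending clause parsed from source"),
        Observation("unresolved", "commencement date not stated in source"),
    )),
)
```

A consumer can then inspect `result.evidence.unresolved()` before relying on `result.text`.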
Three hard waists
The architecture has three stable interfaces that must not be bypassed:
- Clause surface — the first stable representation of amendment meaning. A typed AST for amendment instruction language.
- Payload surface — the amendment body after source-local normalization, before live-state-dependent meaning recovery. This is where the source text stops being raw and starts being structured, but meaning recovery against the current statute state has not yet happened.
- Canonical execution — replay consumes only typed canonical execution artifacts, not raw amendment XML. No unresolved meaning crosses this boundary.
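The third waist can be illustrated with a replay function that accepts only typed operations and rejects anything else. The operation names (`Replace`, `Repeal`, `Insert`), the string addresses, and the dictionary state model are assumptions for this sketch, not LawVM's actual operation vocabulary.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Replace:
    address: str
    text: str


@dataclass(frozen=True)
class Repeal:
    address: str


@dataclass(frozen=True)
class Insert:
    address: str
    text: str


CanonicalOp = Union[Replace, Repeal, Insert]


def replay(base: dict[str, str], ops: list[CanonicalOp]) -> dict[str, str]:
    """Apply typed operations to a copy of the base state."""
    state = dict(base)
    for op in ops:
        if isinstance(op, Replace):
            if op.address not in state:
                raise KeyError(f"replace target missing: {op.address}")
            state[op.address] = op.text
        elif isinstance(op, Repeal):
            if op.address not in state:
                raise KeyError(f"repeal target missing: {op.address}")
            del state[op.address]
        elif isinstance(op, Insert):
            if op.address in state:
                raise ValueError(f"insert target already exists: {op.address}")
            state[op.address] = op.text
        else:
            # Raw XML, strings, or partially resolved intent never reach replay.
            raise TypeError(f"not a canonical operation: {op!r}")
    return state


base = {"1 §": "old text", "2 §": "to be repealed"}
after = replay(base, [Replace("1 §", "new text"), Repeal("2 §"), Insert("1 a §", "added")])
```

Because the function fails on a missing target rather than guessing, any target recovery has to happen, and be recorded, before this boundary.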
Strict mode and quirks mode
LawVM serves two worlds:
Quirks mode is for the historical corpus. Real legislative text is full of omitted context, editorial shortcuts, inconsistent numbering, source encoding oddities, and amendments that only make sense against a specific live consolidated witness. Quirks mode uses recovery heuristics — but marks every recovery path with provenance. It never pretends inferred structure was explicit in source.
Strict mode is for a future where law is authored to compile cleanly. Every amendment is structurally unambiguous, every target is explicitly addressable, every action is typed, every temporal effect is explicit. Strict mode forbids: target guessing, hidden insertion anchoring, fallback whole-section replacement, ambiguous omission expansion, silent date estimation.
The endgame is not "replace legal prose with code." Law remains human-readable, and official publication emits canonical machine-readable state and change artifacts alongside the human text. Strict mode is the compilation target for that future. Quirks mode is the recovery compiler for the past.
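The difference between the two modes can be shown on target resolution alone. In this sketch, strict mode refuses to guess, while quirks mode may recover a target but labels the result with the heuristic used. `Mode`, `ResolvedTarget`, `StrictModeViolation`, and the whitespace-and-case normalization heuristic are all hypothetical, chosen only to make the contrast concrete.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    STRICT = "strict"
    QUIRKS = "quirks"


class StrictModeViolation(Exception):
    pass


@dataclass(frozen=True)
class ResolvedTarget:
    address: str
    provenance: str  # "explicit" or "recovered:<heuristic name>"


def _normalize(address: str) -> str:
    # Hypothetical heuristic: ignore whitespace and letter case.
    return "".join(address.split()).lower()


def resolve_target(cited: str, live_addresses: set[str], mode: Mode) -> ResolvedTarget:
    if cited in live_addresses:
        return ResolvedTarget(cited, "explicit")
    if mode is Mode.STRICT:
        # Strict mode forbids target guessing.
        raise StrictModeViolation(f"target {cited!r} is not explicitly addressable")
    candidates = [a for a in live_addresses if _normalize(a) == _normalize(cited)]
    if len(candidates) == 1:
        # Quirks mode recovers, but never presents the match as explicit.
        return ResolvedTarget(candidates[0], "recovered:normalized-match")
    raise LookupError(f"cannot resolve {cited!r}: {len(candidates)} candidates")


live = {"3 a §", "4 §"}
recovered = resolve_target("3a §", live, Mode.QUIRKS)
explicit = resolve_target("4 §", live, Mode.QUIRKS)
```

The same call with `Mode.STRICT` and the citation `"3a §"` raises `StrictModeViolation` instead of returning a recovered target.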
Frontend / kernel boundary
The shared kernel is jurisdiction-agnostic: canonical legal-address and tree model, operation vocabulary, replay execution, timeline semantics, materialization, structural invariants.
Frontends are jurisdiction-local: source acquisition, parsing conventions, drafting idioms, payload extraction, elaboration rules, source pathology, oracle comparison.
The important design question is never "can we extract something useful?" It is: what is the smallest honest executable claim for this jurisdiction, and what source family makes that claim defensible?
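The boundary can be pictured as two frontends with different drafting idioms lowering to the same kernel operation. Everything here is a toy: the `Frontend` protocol, the two frontend classes, the single-sentence repeal idioms they recognize, and the tuple address format are assumptions for illustration, not LawVM's interfaces.

```python
import re
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Repeal:
    address: tuple[str, ...]  # assumed canonical path, e.g. ("section", "5")


class Frontend(Protocol):
    def lower(self, instruction: str) -> Repeal: ...


class ToyFinnishFrontend:
    _pattern = re.compile(r"^kumotaan (\d+) §$")

    def lower(self, instruction: str) -> Repeal:
        match = self._pattern.match(instruction)
        if match is None:
            raise ValueError(f"unrecognized idiom: {instruction!r}")
        return Repeal(("section", match.group(1)))


class ToyEnglishFrontend:
    _pattern = re.compile(r"^section (\d+) is repealed$", re.IGNORECASE)

    def lower(self, instruction: str) -> Repeal:
        match = self._pattern.match(instruction)
        if match is None:
            raise ValueError(f"unrecognized idiom: {instruction!r}")
        return Repeal(("section", match.group(1)))


def kernel_apply(state: dict[tuple[str, ...], str], op: Repeal) -> dict[tuple[str, ...], str]:
    """Jurisdiction-agnostic: sees only the canonical operation, never source text."""
    new_state = dict(state)
    new_state.pop(op.address)  # raises KeyError if the target does not exist
    return new_state


fi_op = ToyFinnishFrontend().lower("kumotaan 5 §")
en_op = ToyEnglishFrontend().lower("Section 5 is repealed")
state = kernel_apply({("section", "5"): "text", ("section", "6"): "other"}, fi_op)
```

Both frontends produce an equal `Repeal` value, so the kernel needs no knowledge of either drafting language.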
Beyond Layer 0
LawVM is deliberately narrow. It computes what the legal text says at a point in time. It does not compute what the law means, how it is applied in practice, or what it costs. Those are higher layers:
| Layer | Question | Scope |
|---|---|---|
| L0: LawVM | What does the text say? | Text-state compilation, provenance, timelines |
| L1: Legal views | Which view to run? | Territorial, commencement, transitional overlays |
| L2: Interpretation | What do authorities say it means? | Court holdings, guidance, doctrine |
| L3: Praxis | How is it actually applied? | Enforcement, institutional behavior |
| L4: Reasoning | What follows for this fact pattern? | Compliance, simulation, argument |
| L5: Products | What can users do? | Search, Q&A, drafting assistants |
Upper layers attach claims to L0 anchors without mutating the text-state kernel. LawVM is designed as a substrate: stable identities, span-level anchoring, explicit provenance, overlay hooks.
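A minimal sketch of that attachment model follows, assuming a hypothetical `OverlayStore` that holds a read-only view of L0 text and keeps upper-layer claims in a separate list. The `Anchor` fields (address, as-of date, character span) and the `Claim` shape are illustrative choices, not LawVM's schema.

```python
from dataclasses import dataclass
from types import MappingProxyType


@dataclass(frozen=True)
class Anchor:
    address: str
    as_of: str  # ISO date of the point-in-time text
    start: int  # span start, inclusive
    end: int    # span end, exclusive


@dataclass(frozen=True)
class Claim:
    layer: str  # e.g. "L2"
    anchor: Anchor
    statement: str


class OverlayStore:
    def __init__(self, text_state: dict[tuple[str, str], str]) -> None:
        # Read-only view over a private copy: overlays cannot mutate L0 text.
        self._text = MappingProxyType(dict(text_state))
        self._claims: list[Claim] = []

    def span(self, anchor: Anchor) -> str:
        return self._text[(anchor.address, anchor.as_of)][anchor.start:anchor.end]

    def attach(self, claim: Claim) -> None:
        text = self._text[(claim.anchor.address, claim.anchor.as_of)]
        if not (0 <= claim.anchor.start < claim.anchor.end <= len(text)):
            raise ValueError("anchor span is outside the anchored text")
        self._claims.append(claim)

    def claims_at(self, address: str) -> list[Claim]:
        return [c for c in self._claims if c.anchor.address == address]


store = OverlayStore({("5 §", "2020-01-01"): "The fee is 10 euros."})
fee = Anchor("5 §", "2020-01-01", 4, 7)
store.attach(Claim("L2", fee, "hypothetical interpretive note about this term"))
```

Claims that point outside the anchored text are rejected, and the text itself is never written to by `attach`.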
Two downstream examples: Lakikartta joins the legal graph to budget data (92k statutes, €500B in budget weights, and PageRank, Katz, and DebtRank centrality measures). MeV mechanism tests analyze whether the mechanisms in government bills produce their stated goals. Both are separate projects that show what becomes possible once L0 text-state compilation is reliable.