Why Law Is Law-Shaped

And why that shape demands a compiler.

Elias Kunnas

I. The Structural Constraint

Law is not prose that happens to be organized in sections. It is an incrementally maintained system authored by distributed agents with partial authority over time, requiring stable fine-grained addresses for external reference.

Every element of this definition is load-bearing:

  • Incrementally maintained: Parliament cannot restate the entire legal corpus each session. Amendments modify specific provisions of existing statutes. The legal state at any moment is the accumulated result of thousands of incremental patches applied over decades or centuries.
  • Distributed agents with partial authority: Different parliaments, at different times, with different mandates, enacted different provisions. A subsection added in 2021 coexists with a section from 1995 and a chapter structure from 1972. Each retains its own authority provenance. No single entity “owns” the current text of a statute — it is a palimpsest of multiple authors across time.
  • Stable fine-grained addresses: Other laws say “pursuant to Section 12(2) of Act X.” Court decisions cite specific provisions. Contracts reference them. These are external pointers into the legal corpus. If addresses change, external references break silently. The addressing scheme must survive amendments — which is why law uses hierarchical structural paths rather than page numbers or byte offsets.

This is not a metaphor for software. Software codebases evolved the same structural constraint for the same reason: incremental modification, multiple authors, stable external references (API contracts, imports, URLs). The resemblance between git blame and statutory provenance tracing is convergent evolution from identical structural pressure.

II. The Tree Is a Serialization Format

Statutes are organized as trees: parts contain chapters, chapters contain sections, sections contain subsections. This hierarchy exists because paper is linear — a statute must be printed as a sequence of pages, and the hierarchy provides navigable structure.

But law does not operate as a tree:

  • Section 12 says “as defined in Section 4” — a cross-reference, a pointer from one node to another, often across branches.
  • Section 30 says “notwithstanding Section 15(3)” — a conditional override, an edge that modifies the meaning of a distant node.
  • A tax statute says “as specified in Regulation (EU) 2016/679 Article 4” — a cross-jurisdiction dependency, linking nodes in different legal corpora entirely.
  • An EU directive requires member state implementation — the Finnish implementing statute is a derived node whose existence was caused by an EU-level obligation.

These are graph relationships. They connect nodes across branches of the tree, across statutes, across jurisdictions. The tree structure cannot represent them — it only holds the content of each provision and its position within one statute’s hierarchy.

This separation of structure from semantics is not new. Akoma Ntoso (ISO 24679) separated the document structure (the tree) from legal analysis (references, metadata, lifecycle) in the early 2000s. ELI and FRBR provide identification frameworks. LegalRuleML encodes normative content. The contribution here is not the observation that law has graph structure — that’s established — but the argument that computing the text layer correctly (which no tool does reliably across jurisdictions) is a prerequisite for computing the semantic layer correctly.

Law is written as a tree because paper demands it. Law operates as a graph because provisions interact through references, overrides, and dependencies that ignore hierarchical boundaries. A legal state compiler operates on the tree (the text) to produce the substrate that semantic tools operate on.

III. The Amendment Is an Operation, Not an Edit

An amendment act does not say “here is the new text of Section 12.” It says “Section 12, subsection 2 is amended to read as follows.” This is a typed operation with:

  • A target address: Section 12, subsection 2
  • An action: replace (or: repeal, insert, renumber)
  • A payload: the new text
  • A source: which act, enacted when, effective when, by whose authority
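
To make the four parts concrete, here is a minimal Python sketch of a typed operation record. The field names, the `Action` enum, and the act identifier "Act 123/2024" are hypothetical illustrations, not LawVM's actual `LegalOperation` definition. The point is that a typed record can reject malformed instructions (a repeal with a payload, a replacement without one) before anything is applied.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class Action(Enum):
    REPLACE = "replace"
    REPEAL = "repeal"
    INSERT = "insert"
    RENUMBER = "renumber"


@dataclass(frozen=True)
class Source:
    act_id: str        # hypothetical identifier of the amending act
    enacted: date
    effective: date    # may differ from (even precede) the enactment date


@dataclass(frozen=True)
class LegalOperation:
    target: tuple[tuple[str, str], ...]  # path of (kind, label) steps
    action: Action
    payload: Optional[str]               # new text, or new label for RENUMBER; None for REPEAL
    source: Source

    def __post_init__(self) -> None:
        # Reject structurally malformed operations before they reach the compiler.
        if self.action is Action.REPEAL and self.payload is not None:
            raise ValueError("a repeal carries no payload")
        if self.action is not Action.REPEAL and self.payload is None:
            raise ValueError(f"{self.action.value} requires a payload")


# "Section 12, subsection 2 is amended to read as follows: ..."
op = LegalOperation(
    target=(("section", "12"), ("subsection", "2")),
    action=Action.REPLACE,
    payload="The Minister may issue further provisions by decree.",
    source=Source("Act 123/2024", date(2024, 12, 13), date(2026, 1, 1)),  # hypothetical act
)
```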

The vocabulary of text-level operations is small:

| Action       | Structural effect                             |
|--------------|-----------------------------------------------|
| Replace      | Node content update                           |
| Repeal       | Tombstone version (not deletion; see §III.1)  |
| Insert       | New node at specified position                |
| Renumber     | Address change, identity preserved            |
| Text-replace | Substring substitution within a leaf          |
| Text-repeal  | Substring removal                             |

This vocabulary is verified across Finnish, UK, Estonian, and EU amendment systems. The surface language differs (“muutetaan,” “muudetakse,” “is amended to read”) but the structural operations on the text are the same.
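
The vocabulary is small enough to fit in one function. The sketch below is a simplification under stated assumptions: addresses are flat strings such as "12(2)" rather than structured paths, the state is an ordered `dict` from address to text, and `None` marks a tombstone. The function name `apply` and its keyword arguments are hypothetical. Each call returns a new state, so earlier versions survive for temporal queries.

```python
from typing import Optional

# Simplified text state: address -> text, in document order; None is a tombstone.
State = dict[str, Optional[str]]


def apply(state: State, action: str, target: str, **kw: str) -> State:
    """Apply one text-level operation and return a new state; the input is not modified."""
    items = list(state.items())
    keys = [k for k, _ in items]
    if action == "insert":
        if target in state:
            raise ValueError(f"{target} already exists")
        pos = keys.index(kw["after"]) + 1 if "after" in kw else len(items)
        items.insert(pos, (target, kw["text"]))
        return dict(items)
    text = state.get(target)
    if text is None:
        raise ValueError(f"{target} does not exist or is repealed")
    i = keys.index(target)
    if action == "replace":
        items[i] = (target, kw["text"])
    elif action == "repeal":
        items[i] = (target, None)  # tombstone, not deletion
    elif action == "renumber":
        if kw["new_address"] in state:
            raise ValueError(f"{kw['new_address']} already exists")
        items[i] = (kw["new_address"], text)  # same content and position, new address
    elif action in ("text-replace", "text-repeal"):
        old = kw["old"]
        if old not in text:
            raise ValueError(f"{old!r} not found in {target}")
        new = kw["new"] if action == "text-replace" else ""
        items[i] = (target, text.replace(old, new))
    else:
        raise ValueError(f"unknown action {action!r}")
    return dict(items)


s = {"11": "Text of section 11.", "12": "The Secretary of State may issue guidance."}
s = apply(s, "insert", "11a", after="11", text="Text of new section 11a.")
s = apply(s, "text-replace", "12", old="Secretary of State", new="Minister")
s = apply(s, "repeal", "11")
```

Keeping the tombstone in place (rather than deleting the key) is what lets later queries distinguish "repealed" from "never existed".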

III.1: What the vocabulary does NOT cover

The text-level operation vocabulary captures how the serialized text changes. It does not capture how the legal meaning changes. Several classes of legal action operate on meaning rather than text:

Interpretive overlays. “Section 5 shall be read as if ‘the Board’ meant ‘the Council’.” The text of §5 is unchanged. Its meaning changes. This is a semantic operation, not a text operation. Common law “deeming clauses” and reading-down provisions operate this way.

Delegated legislation. An act empowers a minister to make regulations. This creates authority to produce new law — a meta-operation that generates future operations, not an operation on existing text.

Conditional applicability. “This section applies only to entities exceeding a turnover of €10M.” The text exists unconditionally; its legal effect is conditional. The condition is metadata, not a text operation.

Revivor. If a repealer is itself repealed, does the original provision revive? The answer is jurisdiction-dependent (no in some common law systems, yes in others). The tombstone model (repeal = version with null content) doesn’t inherently resolve this — it requires a policy decision at the VM level.

Immutable-base traditions. In systems where the foundational text is sacred or constitutionally entrenched (Sharia, constitutional provisions with supermajority requirements), the operation isn’t “replace” — it’s “layer an interpretation over an immutable base.” The base text cannot be patched; it can only be wrapped.

These are real phenomena. A compiler that claims to capture “what the law says” must either model them or explicitly scope itself to the text layer. LawVM takes the second approach: it compiles the text state — what each provision literally says at a point in time — and leaves the normative state (what the law means, how it applies, what obligations it creates) to downstream semantic tools. This is a deliberate separation of concerns: get the text right first, then build interpretation on a correct textual substrate. The alternative — building semantic models on unverified text — is what produced decades of legal ontology work with no reliable text layer underneath it.
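
Of the items above, revivor is the one a text-level compiler cannot simply leave downstream, because the tombstone model still has to decide what the text state is after a repealer is repealed. One way to keep it out of the core data model is an explicit, per-jurisdiction policy. In this hypothetical sketch (the `RevivorPolicy`, `Version`, and `repeal_repealer` names and the act identifiers are illustrative), history is append-only: revival appends a new version carrying the pre-repeal text instead of deleting the tombstone, so the record of the repeal survives under either policy. Which jurisdiction uses which policy is left as configuration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class RevivorPolicy(Enum):
    NO_REVIVAL = "no_revival"  # repealing a repealer leaves the provision repealed
    REVIVAL = "revival"        # repealing a repealer restores the pre-repeal text


@dataclass(frozen=True)
class Version:
    content: Optional[str]  # None = tombstone
    cause: str              # hypothetical id of the act that produced this version


def repeal_repealer(history: list[Version], repealer: str, cause: str,
                    policy: RevivorPolicy) -> list[Version]:
    """Process the repeal of the act whose repeal produced the current tombstone."""
    current = history[-1]
    if current.content is not None or current.cause != repealer:
        raise ValueError(f"current version was not produced by a repeal in {repealer}")
    if policy is RevivorPolicy.NO_REVIVAL:
        return list(history)  # the tombstone stands
    prior = history[-2]  # text in force immediately before the repeal
    return history + [Version(prior.content, cause)]  # append, never rewrite history


h = [Version("Original text of §5.", "Act 1/1990"), Version(None, "Act 2/2000")]
revived = repeal_repealer(h, "Act 2/2000", "Act 3/2010", RevivorPolicy.REVIVAL)
unrevived = repeal_repealer(h, "Act 2/2000", "Act 3/2010", RevivorPolicy.NO_REVIVAL)
```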

IV. Multiple Time Axes

An amendment published in December 2024 may take effect on January 1, 2026. For more than a year in between, the amendment exists (enacted, published, legally valid), but the provisions it modifies have not yet changed for legal purposes.

Law has at least two temporal dimensions:

  1. Publication/enactment time: when the amendment was officially decided
  2. Legal effect time: when the changed provisions enter into force

These axes are independent. “What has parliament decided?” and “what is the law?” give different answers during the gap.
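
A minimal sketch of the two axes, assuming each amendment carries exactly one publication date and one effective date and replaces the whole provision (real acts can split effective dates by provision, as described next). `decided` answers the first question and `in_force` the second; both names are hypothetical. `in_force` only considers amendments already published on the query date.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Amendment:
    provision: str
    text: str         # replacement text (whole-provision replace, for simplicity)
    published: date   # axis 1: officially decided and published
    effective: date   # axis 2: enters into force


def decided(amendments: list[Amendment], provision: str, as_of: date) -> list[Amendment]:
    """What has parliament decided about this provision, as of `as_of`?"""
    return [a for a in amendments if a.provision == provision and a.published <= as_of]


def in_force(base: str, amendments: list[Amendment], provision: str, on: date) -> str:
    """What does the provision say, for legal purposes, on `on`?"""
    text = base
    for a in sorted(decided(amendments, provision, on), key=lambda a: a.effective):
        if a.effective <= on:
            text = a.text
    return text


pending = Amendment("12(2)", "New text.", date(2024, 12, 13), date(2026, 1, 1))
```

During the gap, `decided` already reports the amendment while `in_force` still returns the old text.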

It gets harder:

  • A single amendment act can specify different effective dates for different provisions (§§1–5 immediately, §6 next year, §7 “when a decree so provides”).
  • Retroactive amendments change the legal effect of provisions for a past period — an amendment published today can declare that it applies from last year. This retroactively alters the legal state at historical points in time.
  • Ultra-active provisions continue to apply to past events even after repeal. A provision repealed today still governs contracts entered into while it was in force. The provision is legally dead for new events but alive for old ones.
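
Retroactivity is why the two axes must be queried together. The sketch below (hypothetical names, simplified to a single provision with whole-text replacements) asks what the provision said on date `on`, using only amendments published by `known_as_of`. A retroactive amendment changes the answer for a past date once it is published, while the earlier answer stays reproducible.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Amendment:
    text: str
    published: date
    effective: date  # may precede `published` when the amendment is retroactive


def text_at(base: str, amendments: list[Amendment], on: date, known_as_of: date) -> str:
    """Text in legal effect on `on`, using only amendments published by `known_as_of`."""
    known = [a for a in amendments if a.published <= known_as_of and a.effective <= on]
    # The latest effective date wins; publication date breaks ties.
    known.sort(key=lambda a: (a.effective, a.published))
    return known[-1].text if known else base


retro = Amendment("Retroactive text.", published=date(2025, 3, 1), effective=date(2024, 1, 1))
before = text_at("Base text.", [retro], on=date(2024, 6, 1), known_as_of=date(2024, 12, 31))
after = text_at("Base text.", [retro], on=date(2024, 6, 1), known_as_of=date(2025, 6, 1))
```

In this simplified model, ultra-activity corresponds to setting `on` to the date of the event rather than today: a provision repealed later is still returned for events that occurred while it was in force. Deciding when that is the legally right question is a normative matter outside the text layer.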

A third operational axis exists: corpus observation time — when the compiler ingested the data. If Finlex publishes a correction, the corpus before and after differs. Reproducibility requires recording which source version was used.

Version control systems like git have one temporal dimension (commit history). Law requires at minimum two substantive dimensions plus conditional dimensions (territory, sector, contingency). “What is the law at date T?” is a multi-dimensional query, not a linear checkout.
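
The sketch below adds one conditional dimension (territory) to the date axis. Each version carries a validity interval and a scope predicate; a query is a point in the multi-dimensional space, and resolution filters on every dimension at once. The territory labels ("mainland", "region-A") and the class names are hypothetical placeholders.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Optional


@dataclass(frozen=True)
class Query:
    on: date
    territory: str  # one conditional dimension; sector, contingency, etc. would be added alike


@dataclass(frozen=True)
class Version:
    text: str
    valid_from: date
    valid_to: Optional[date]             # exclusive end; None = open-ended
    applies: Callable[[Query], bool]     # scope predicate over the query


def resolve(versions: list[Version], q: Query) -> list[str]:
    """Texts of all versions whose interval contains q.on and whose scope accepts q."""
    return [
        v.text
        for v in versions
        if v.valid_from <= q.on
        and (v.valid_to is None or q.on < v.valid_to)
        and v.applies(q)
    ]


versions = [
    Version("General rule.", date(2020, 1, 1), None, lambda q: q.territory != "region-A"),
    Version("Regional variant.", date(2022, 1, 1), None, lambda q: q.territory == "region-A"),
]
```

The same date yields different answers depending on territory, which a single linear commit history cannot express.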

V. The Atom Is Not What You Think

The tree structure suggests that provisions (sections, articles) are the atoms — the smallest units. In practice, the atom is whatever granularity the amendment system addresses.

Most amendments target structural nodes: replace this section, repeal this subsection. These are tree operations.

But text-level amendments go below any structural node: “In Section 12(2), the words ‘Secretary of State’ are replaced by the word ‘Minister’.” This targets a substring within a leaf node. The sentence is not a node in the tree. The amendment operates at a granularity finer than the tree’s resolution.

Two regimes:

  1. Structural operations: target a node. The tree handles these.
  2. Text operations: target content within a node. The tree represents the result (leaf text changed) but not the operation (which words and why).
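
The operation record can retain what the tree discards. This hypothetical sketch applies a text-replace to a leaf and also returns the character spans it touched in the original text, so provenance can say which words changed. It assumes an amendment applies to every occurrence of the phrase; whether a given amendment means all occurrences or only some is a drafting question a real parser would have to resolve.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class TextReplace:
    leaf: str  # address of the leaf node, e.g. "12(2)"
    old: str
    new: str


def apply_text_op(text: str, op: TextReplace) -> tuple[str, list[tuple[int, int]]]:
    """Replace every occurrence of op.old and report the (start, end) spans it occupied."""
    spans = [(m.start(), m.end()) for m in re.finditer(re.escape(op.old), text)]
    if not spans:
        raise ValueError(f"{op.old!r} not found in {op.leaf}")
    # str.replace and re.finditer both scan left to right without overlaps,
    # so the spans correspond exactly to the substitutions made.
    return text.replace(op.old, op.new), spans


leaf = "The Secretary of State may delegate. The Secretary of State shall report."
new_text, spans = apply_text_op(leaf, TextReplace("12(2)", "Secretary of State", "Minister"))
```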

Note that even a “simple” word substitution like replacing “Secretary of State” with “Minister” shifts statutory powers — it’s not just a string edit but a reallocation of legal authority. The compiler operates on the text; downstream normative analysis operates on the meaning. Both layers matter. The compiler provides the correct text so that normative analysis can be accurate.

VI. Convergent Evolution of Addressing

Any system requiring stable addressing into a hierarchically maintained structure under distributed authority converges on the same pattern: path-based identifiers.

  • Law: §12(2)(c) — path through the statute tree
  • OIDs (ITU-T/ISO): 1.3.6.1.4.1.311 — path through the global object tree
  • Xanadu tumblers (Nelson, 1960s): hierarchical address down to character granularity
  • ELI (European Legislation Identifier): /eli/fi/sd/2002/738/...
  • FRBR (Functional Requirements for Bibliographic Records): Work → Expression → Manifestation → Item hierarchy
  • Filesystem paths: /usr/local/bin/lawvm

These are all the same abstraction: a tuple of (kind, label) pairs specifying a traversal through a hierarchy.
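
The shared abstraction fits in a few lines. This is a hypothetical shape for a path-based address; the `steps` field and the `/kind/label` rendering are illustrative, not LawVM's `LegalAddress` and not ELI syntax. Containment in a subtree reduces to a prefix test, which is one reason path addresses are convenient.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LegalAddress:
    steps: tuple[tuple[str, str], ...]  # ((kind, label), ...), root first

    def child(self, kind: str, label: str) -> "LegalAddress":
        return LegalAddress(self.steps + ((kind, label),))

    def is_within(self, ancestor: "LegalAddress") -> bool:
        """True if this address lies in the subtree rooted at `ancestor` (a prefix test)."""
        return self.steps[: len(ancestor.steps)] == ancestor.steps

    def cite(self) -> str:
        """Statute-style citation: '§label' for sections, '(label)' for every other kind."""
        return "".join(f"§{label}" if kind == "section" else f"({label})" for kind, label in self.steps)

    def path(self) -> str:
        """Filesystem-style rendering of the same traversal."""
        return "/" + "/".join(f"{kind}/{label}" for kind, label in self.steps)


sec12 = LegalAddress((("section", "12"),))
addr = sec12.child("subsection", "2").child("point", "c")
```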

The convergence is forced by the structural constraint: when multiple agents independently modify parts of a hierarchical structure, and external systems reference specific parts by address, the address must encode the path through the hierarchy.

Path-based addresses have a known weakness: they are positional. Insertion of §11a between §11 and §12 doesn’t break §12’s path, but renumbering (§12 becomes §13) does. This is why renumber is an explicit operation in the vocabulary — it records address changes so that external references can be updated. FRBR and ELI handle this through abstraction levels (Work-level identity survives Expression-level renumbering). LawVM’s ProvisionTimeline preserves identity through renumber events — the same timeline, new address.
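
A minimal sketch of identity surviving renumbering, assuming a timeline records a dated list of addresses under a stable identifier (field names and the "prov-A"/"prov-B" identifiers are hypothetical, not LawVM's `ProvisionTimeline`). Resolving a citation then requires the date on which it was made: after §12 is renumbered to §13 and a new §12 is inserted, an older citation to §12 must resolve to the provision now at §13.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class ProvisionTimeline:
    provision_id: str  # stable identity, independent of the address
    addresses: list[tuple[date, str]] = field(default_factory=list)  # (effective date, address)

    def renumber(self, effective: date, new_address: str) -> None:
        self.addresses.append((effective, new_address))
        self.addresses.sort()

    def address_on(self, day: date) -> Optional[str]:
        current = [addr for start, addr in self.addresses if start <= day]
        return current[-1] if current else None


def resolve_citation(timelines: list[ProvisionTimeline], address: str, cited_on: date) -> list[str]:
    """Which provision did `address` denote on the date it was cited?"""
    return [t.provision_id for t in timelines if t.address_on(cited_on) == address]


old = ProvisionTimeline("prov-A", [(date(1995, 1, 1), "§12")])
old.renumber(date(2021, 1, 1), "§13")                          # §12 becomes §13
new = ProvisionTimeline("prov-B", [(date(2021, 1, 1), "§12")])  # a new §12 is inserted
```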

Ted Nelson’s Xanadu anticipated this in the 1960s. His tumblers addressed any content at any granularity, and transclusion meant references were live links. Legal cross-references (“as defined in §4”) are exactly Nelson’s transclusions — without the live-update mechanism. When §4 is amended, every provision referencing it changes meaning silently. The legal system has unversioned, untyped, silently-breaking transclusions.
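
Without live updates, the least a tool can do is show which provisions a change reaches. The sketch below computes the transitive set of provisions that reference a changed one, using a breadth-first search over reversed reference edges. The input format is hypothetical, and the result is an over-approximation: whether each dependent's meaning actually changes is a normative question.

```python
from collections import defaultdict, deque


def dependents(references: dict[str, set[str]], changed: str) -> set[str]:
    """Provisions that reference `changed` directly or through a chain of references."""
    referenced_by = defaultdict(set)
    for source, targets in references.items():
        for target in targets:
            referenced_by[target].add(source)
    seen: set[str] = set()
    queue = deque([changed])
    while queue:  # breadth-first walk over reversed reference edges
        for source in referenced_by[queue.popleft()]:
            if source not in seen:
                seen.add(source)
                queue.append(source)
    seen.discard(changed)  # a reference cycle can lead back to the start
    return seen


# §12 says "as defined in §4"; §30 refers to §12 and overrides §15(3).
refs = {"§12": {"§4"}, "§30": {"§12", "§15(3)"}, "§7": {"§2"}}
```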

VII. Authority Models Vary

Different jurisdictions assign different legal authority to consolidated texts, which changes the compiler’s role:

Finland: Finlex consolidated texts are “informational” — not legally binding. The “real” law is the original statute plus all amendment acts. If Finlex’s consolidation has a bug, that’s editorial. Nobody is legally responsible for the computed state of law. The compiler serves as an independent oracle — replaying amendments to find bugs in the editorial consolidation.

Estonia: Riigi Teataja consolidated text IS authoritative law. If there is a discrepancy between amendment acts and consolidation, the consolidation wins. The compiler serves as a consistency verifier for binding law — divergences are legally significant findings, not editorial footnotes.

United Kingdom: legislation.gov.uk publishes versioned texts with effect metadata. The compiler serves as independent verification of the official version graph.

The structural operations are the same. The social function of finding a divergence differs: editorial error (Finland), legal inconsistency (Estonia), version graph bug (UK).
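
Because the comparison itself is identical across jurisdictions, only the interpretation of a mismatch needs to be configured. In this hypothetical sketch, the jurisdiction keys and labels paraphrase the three cases above, and the diff uses Python's standard `difflib`.

```python
import difflib
from typing import Optional

# Same comparison everywhere; only the meaning of a mismatch differs.
DIVERGENCE_MEANING = {
    "FI": "editorial error in a non-binding consolidation",
    "EE": "inconsistency in binding consolidated law",
    "UK": "possible bug in the official version graph",
}


def check(jurisdiction: str, address: str, compiled: str, official: str) -> Optional[dict]:
    """Compare replayed text with the official consolidation; None means they agree."""
    if compiled == official:
        return None
    diff = list(difflib.unified_diff(
        official.splitlines(), compiled.splitlines(),
        fromfile="official", tofile="compiled", lineterm="",
    ))
    return {"address": address, "kind": DIVERGENCE_MEANING[jurisdiction], "diff": diff}


finding = check("EE", "§5", compiled="Line A.\nLine B.", official="Line A.\nLine C.")
```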

VIII. Why Nobody Built It

The problem is well-defined: parse amendment acts, extract operations, compile against prior state, produce consolidated text with full provenance and temporal versioning. It is a compiler problem.

The closest prior work:

  • Akoma Ntoso / ELI / FRBR: document standards and identification frameworks — the schema layer. They define how legal documents should be structured and identified, but don’t compile amendments.
  • Semantic Finlex / LawSampo (Aalto/SECO): linked data, semantic search, faceted browsing over Finnish legislation. World-class ontology work. No amendment replay, no temporal compilation, no independent verification. Code not published.
  • Graphie project (King’s College London): network analysis of UK legislation. Structural metrics. No budget weighting, no compilation.
  • LegalRuleML: encoding normative content as rules. Operates on the semantic layer, not the text layer.

Each addresses a piece. None compiles the text layer reliably across jurisdictions — producing consolidated statute text from amendment chains with provenance and temporal versioning. Commercial systems (Xcential, Propylon) handle parts of this in production contexts; official systems (legislation.gov.uk) maintain versioned editorial workflows. But the open, cross-jurisdiction, proof-bearing replay approach remains underexplored. This gap exists because the problem spans computer science (compilers, VCS, graph theory), legal informatics (amendment semantics, temporal logic), public administration (authority, publication), and linguistics (parsing amendment language). Each discipline sees its part. The tool requires all four.

IX. Therefore LawVM

LawVM is shaped the way it is because law is shaped the way it is:

  • A tree parser (IRNode) because law is stored as hierarchical trees
  • Typed operations (LegalOperation) because amendments are structured operations, not prose edits
  • Path-based addressing (LegalAddress) because stable fine-grained addressing is forced by the structural constraint
  • A graph model (ProvisionTimeline + operation edges) because law operates as a graph despite being serialized as trees
  • Multiple temporal axes because law has irreducibly multi-dimensional time
  • Scope predicates because applicability varies along dimensions orthogonal to time and structure
  • Jurisdiction-agnostic core because the structural operations are universal even though surface languages and authority models differ
  • Explicit text/semantics boundary because compiling the text correctly is prerequisite to interpreting it correctly — and because the text layer is computationally tractable while the normative layer requires interpretation

The end state is a semantic version control system for law: a graph where nodes are provision versions, edges are legislative operations, and point-in-time materialization, cross-date diff, provision lineage, and cross-jurisdiction dependency tracking are queries over the graph.
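
A toy version of that graph, under simplifying assumptions (whole-provision versions, a single effective date per version, string act identifiers; all class and method names are hypothetical): nodes are provision versions, edges record the act that produced each version, and materialization, diff, and lineage are each a short query over the edge list.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class VersionNode:
    provision: str          # stable provision identity
    text: Optional[str]     # None = tombstone (repealed)
    effective: date


@dataclass(frozen=True)
class OpEdge:
    before: Optional[VersionNode]  # None when the operation created the provision
    after: VersionNode
    act: str                       # hypothetical id of the amending act


class LawGraph:
    def __init__(self, edges: list[OpEdge]) -> None:
        self.edges = list(edges)

    def materialize(self, on: date) -> dict[str, str]:
        """Point-in-time view: the latest version of each provision effective by `on`."""
        latest: dict[str, VersionNode] = {}
        for e in sorted(self.edges, key=lambda e: e.after.effective):
            if e.after.effective <= on:
                latest[e.after.provision] = e.after
        return {p: v.text for p, v in latest.items() if v.text is not None}

    def diff(self, d1: date, d2: date) -> dict[str, tuple[Optional[str], Optional[str]]]:
        """Provisions whose text differs between two dates, as (old, new) pairs."""
        a, b = self.materialize(d1), self.materialize(d2)
        return {p: (a.get(p), b.get(p)) for p in a.keys() | b.keys() if a.get(p) != b.get(p)}

    def lineage(self, provision: str) -> list[str]:
        """Acts that produced versions of a provision, in effective-date order."""
        ordered = sorted(self.edges, key=lambda e: e.after.effective)
        return [e.act for e in ordered if e.after.provision == provision]


v1 = VersionNode("s12", "Original.", date(1995, 1, 1))
v2 = VersionNode("s12", "Amended.", date(2021, 1, 1))
v3 = VersionNode("s30", "New section.", date(2021, 1, 1))
g = LawGraph([OpEdge(None, v1, "Act A"), OpEdge(v1, v2, "Act B"), OpEdge(None, v3, "Act B")])
```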

Not a replacement for legal ontologies or normative reasoning systems. The substrate they need. A correct, verified, temporally versioned text layer that semantic tools can build on without first having to solve the compilation problem themselves.

Built because the structural constraints of law demand it, and the open tooling to do it reliably did not yet exist.


This essay explains the structural constraints behind LawVM, an open-source replay compiler for amendment-driven law. See the Finland showcase for empirical evidence, or the repository.