cssbruno · Jun 11, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -7,6 +7,33 @@ packages, and tooling contracts may change before a stable release.
 
 ### Changed
 
+- Conflict diagnostics (`duplicate_route`, `route_method_conflict`, including
+  contract-route conflicts) now carry a related source location pointing at the
+  first declaration. `gowdk check --json` gains an additive `related` array per
+  diagnostic, and the language server reports it as `relatedInformation`.
+- The formatter now tracks brace depth with the parser's string- and
+  comment-aware scanner, so braces inside string literals, comments, and
+  template literals (for example `title "a { b"`) no longer skew indentation.
+
+### Implemented
+
+- A machine-checked `.gwdk` conformance corpus
+  (`internal/lang/testdata/conformance/`) pins the language contract: `accept/`
+  cases must check clean and `reject/` cases must produce their declared stable
+  diagnostic codes. See `docs/language/conformance.md`.
+- A per-construct stability and deprecation table
+  (`docs/language/stability.md`) documents which blocks, metadata keywords, and
+  `g:` directives are stable, partial, planned, or deprecated, guarded against
+  drift from the code registries by a test.
+- `source.SourcePosition` carries a byte `Offset`, with `source.PositionAt` and
+  `source.OffsetOf` conversion helpers, as the exact substrate for future
+  AST-backed formatting and precise editor edits.
+- ADR 0010 records the decision to replace the line-oriented parser with a
+  shared tokenizer and a recursive-descent parser with error recovery, migrated
+  behind the stable `gwdkast` AST seam.
+
+### Changed
+
 - A page that declares no `guard` is no longer a build error. `guard` is now
   optional, but a page is not public by default: `missing_page_guard` is now a
   **warning** and the page's route is denied (403) at request time until the

diff --git a/docs/compiler/pipeline.md b/docs/compiler/pipeline.md
@@ -70,3 +70,9 @@ project config
 
 Future build work should expand from the current generated-output slice while
 keeping downstream passes on `internal/gwdkir.Program`.
+
+The `lex/parse full AST` front-end is the line-oriented parser today. The
+decision to replace it with a shared tokenizer and a recursive-descent parser
+with error recovery, migrated behind the stable `internal/gwdkast` AST seam, is
+recorded in
+`docs/engineering/decisions/0010-tokenizer-recursive-descent-parser.md`.
diff --git a/docs/compiler/syntax-contributors.md b/docs/compiler/syntax-contributors.md
@@ -44,6 +44,11 @@ language contract.
    - LSP/editor: `go test ./internal/lsp` plus editor checks when touched.
    - CLI report changes: update `cmd/gowdk/testdata/*_golden` and run
      `go test ./cmd/gowdk`.
+8. Add a conformance corpus case:
+   - Accepted syntax: an `accept/` file under
+     `internal/lang/testdata/conformance/` that exercises it.
+   - A rejection or new diagnostic: a `reject/` file with a leading
+     `// expect: <code>` directive. See `docs/language/conformance.md`.
 
 ## Guardrails
 

diff --git a/docs/engineering/decisions/0010-tokenizer-recursive-descent-parser.md b/docs/engineering/decisions/0010-tokenizer-recursive-descent-parser.md
@@ -0,0 +1,139 @@
+# ADR 0010: Tokenizer and Recursive-Descent Parser Direction
+
+Date: 2026-06-11
+
+## Status
+
+Accepted
+
+## Context
+
+The compiler front-end is line-oriented. `internal/parser.ParseSyntax` reads
+source with a `bufio.Scanner`, matches patterns against each trimmed line
+(`internal/parser/patterns.go` `lexLine`), tracks nesting with a separate stateful
+brace scanner (`internal/parser/braces.go`), and returns on the first syntax
+error with no recovery. Source positions are 1-based line/column with no byte
+offset, so many spans are line-wide approximations (`sourceLineSpan`). The
+formatter (`internal/lang/format.go`) is independent whitespace-only string
+manipulation that counts braces without skipping strings or comments.
+
+This single foundation is the upstream constraint behind most of the deferred
+parser/formatter/diagnostics work (#250): error recovery, an AST-backed
+formatter, exact token spans, and granular per-construct diagnostic codes are all
+downstream of having a real token stream and a node-producing parser. Right now
+replacing the line-oriented parser is deferred by omission rather than by an
+explicit decision.
+
+Two facts make the direction clear rather than open-ended:
+
+1. The documented target pipeline (`docs/compiler/pipeline.md`) already names a
+   `lex/parse full AST -> semantic analysis -> stable internal IR` front-end.
+   This ADR makes explicit the parser-internals decision that target already
+   implies.
+2. A real character-level tokenizer already exists. `internal/lang.Lex`
+   (`internal/lang/lexer.go`) scans runes into typed tokens with line/column
+   positions, but only editor and CLI tooling consume it. The compiler parser
+   ignores it and re-lexes per line. The codebase therefore maintains two
+   divergent front-ends for the same language.
+
+Crucially, the typed AST is already a stable seam. `internal/parser.ParseSyntax`
+produces the `internal/gwdkast` AST, and every downstream pass
+(`internal/gwdkanalysis` lowering to `internal/gwdkir.Program`, validation, and
+generation) consumes that AST. The parser can be replaced behind that seam
+without disturbing IR, validation, reports, or codegen.
+
+## Decision
+
+Commit to a single shared tokenizer and a recursive-descent parser with error
+recovery, producing the existing `internal/gwdkast` AST. Migrate incrementally
+behind the AST seam.
+
+Concretely:
+
+- **One tokenizer.** Promote the `internal/lang` rune scanner into the shared
+  lexer that both the compiler parser and editor/CLI tooling consume. Retire the
+  per-line `lexLine` path in `internal/parser`. There is one lexical definition
+  of `.gwdk`, not two.
+- **Recursive-descent parser over tokens.** Parse the token stream into
+  `gwdkast.File` with explicit declaration, block, and view productions instead
+  of line-pattern matching. The brace scanner's string/comment/template state
+  becomes ordinary lexer state rather than a separate counter.
+- **Error recovery.** The parser synchronizes at top-level declaration
+  boundaries and block braces so one syntax error does not hide the rest of the
+  file. It accumulates diagnostics instead of returning on the first error.
+- **Exact spans.** Tokens carry byte offsets (depends on #294), so AST nodes
+  and diagnostics get exact token ranges instead of line-wide approximations.
+- **AST is the frozen seam.** `internal/gwdkast.File` is the contract. The new
+  parser must produce the same AST as the line-oriented parser for the currently
+  supported subset; `gwdkanalysis`, `gwdkir`, validation, reports, and codegen do
+  not change as part of this work.
+- **Formatter follows.** Once the parser yields full nodes, the AST-backed
+  formatter deferred in #250 becomes possible and replaces line-oriented
+  `format.go`. Until then, the line-oriented formatter keeps its documented
+  limits (see #296).
+
+Migration is incremental and non-breaking. The line-oriented parser keeps working
+while the new parser is built to produce identical `gwdkast.File` output for the
+supported subset, gated by golden AST-equivalence tests and the language
+conformance corpus (#295). Cutover happens per declaration kind once equivalence
+holds, then the line-oriented path and `lexLine` are removed.
+
+## Consequences
+
+### Positive
+
+- One lexical and grammatical definition of `.gwdk` shared by the compiler and
+  the language server, instead of a line parser plus a separate tooling lexer.
+- Error recovery, exact spans, AST-backed formatting, and granular diagnostic
+  codes become reachable; #250 stops being blocked by the front-end.
+- Diagnostics point at tokens rather than whole lines, improving CLI output and
+  LSP precision.
+- Braces inside strings, comments, and template literals are handled by lexer
+  state, removing a class of parser and formatter miscounts by construction.
+
+### Negative
+
+- A recursive-descent parser plus recovery is materially more code than the
+  current line parser, and the migration must preserve AST output exactly to stay
+  non-breaking.
+- Equivalence testing across every declaration kind is required before cutover;
+  this is real up-front cost before any user-visible benefit lands.
+- Recovery and span precision depend on byte offsets (#294) landing first.
+
+### Neutral
+
+- The public language surface does not change. This is a front-end
+  implementation decision, not a grammar change; the conformance corpus (#295)
+  pins behavior across the migration.
+- Downstream passes are untouched because the AST seam is stable.
+
+## Alternatives Considered
+
+- **Keep the line-oriented parser, document its limits.** Lowest cost, but
+  permanently caps span precision, error recovery, and AST-backed formatting, and
+  keeps two divergent front-ends. Rejected: it contradicts the already-documented
+  target pipeline and leaves #250 structurally blocked.
+- **Adopt a parser generator or third-party combinator library** (ANTLR,
+  participle, goyacc). Rejected: adds a dependency and a generated/runtime layer
+  against the project's lean-dependency stance, and a hand-written
+  recursive-descent parser gives better control over recovery and diagnostics for
+  a small surface language.
+- **Incremental/streaming parser from day one.** Useful for an editor, but
+  premature. The AST seam lets an incremental layer be added later without
+  another front-end decision.
+
+## Follow-Up
+
+- #294 (byte offsets in source positions) is the prerequisite; land it first.
+- Build the shared tokenizer by promoting `internal/lang`'s scanner; retire
+  `internal/parser` `lexLine`.
+- Build the recursive-descent parser to `gwdkast.File` with recovery, gated by
+  golden AST-equivalence tests and the conformance corpus (#295).
+- Cut over per declaration kind; remove the line-oriented parser when equivalence
+  holds across the supported subset.
+- AST-backed formatter and granular per-construct diagnostic codes (#250) consume
+  the new parser; #296 is the interim formatter guard.
+- Link this ADR from the #250 deferral so the line-oriented limitation is a
+  conscious choice with a committed exit.
+- Keep `docs/compiler/pipeline.md` and `docs/engineering/architecture.md` aligned
+  as the migration proceeds.
diff --git a/docs/engineering/decisions/README.md b/docs/engineering/decisions/README.md
@@ -23,3 +23,6 @@ Recommended naming:
 - `0007-static-first-spa-navigation.md`: accepted static-first SPA navigation and generated JavaScript guardrails.
 - `0008-bounded-client-language.md`: accepted bounded `client {}` language and page-scoped store boundaries.
 - `0009-optional-inline-go-authoring.md`: accepted optional inline Go authoring direction, with extraction to normal package Go.
+- `0010-tokenizer-recursive-descent-parser.md`: accepted shared tokenizer and
+  recursive-descent parser with error recovery, migrated behind the stable
+  `gwdkast` AST seam.
diff --git a/docs/language/README.md b/docs/language/README.md
@@ -51,6 +51,8 @@ component contract and inline package-go-block slices.
 - `hybrid.md`: hybrid request-time behavior and deferred hybrid capabilities.
 - `diagnostics.md`: current diagnostic shape and known codes.
 - `formatting.md`: current formatter behavior.
+- `stability.md`: per-construct stability and deprecation tiers.
+- `conformance.md`: machine-checked accept/reject corpus that pins the contract.
 
 ## File Kinds
 

diff --git a/docs/language/conformance.md b/docs/language/conformance.md
@@ -0,0 +1,51 @@
+# .gwdk Conformance Corpus
+
+The conformance corpus is the machine-checked source of truth for the `.gwdk`
+language contract. The prose in `docs/language/spec.md` and
+`docs/language/grammar.md` describes the language; the corpus *pins* it, so a
+parser or validator change that silently accepts or rejects different syntax
+fails a test instead of drifting from the docs.
+
+## Location
+
+```text
+internal/lang/testdata/conformance/
+  accept/   # files that must check clean (no error-severity diagnostics)
+  reject/   # files that must produce specific stable diagnostic codes
+```
+
+The runners are `TestConformanceCorpusAccept` and `TestConformanceCorpusReject`
+in `internal/lang/conformance_test.go`. Each file is checked with
+`lang.CheckSource`, the same single-file path the editor and `gowdk check` use,
+so cases are hermetic and need no project layout.
+
+## Accept cases
+
+Any `.gwdk` file under `accept/` must produce no error-severity diagnostics.
+Warnings (for example `missing_img_alt`) are allowed, because they do not fail a
+build. File-kind classification follows the filename suffix, so a component case
+is named `*.cmp.gwdk` and a layout case `*.layout.gwdk`.
+
+## Reject cases
+
+Any `.gwdk` file under `reject/` must declare the stable diagnostic codes it is
+expected to produce in a leading directive comment:
+
+```gwdk
+// expect: old_action_block_syntax
+package pages
+...
+```
+
+Multiple codes may be comma- or space-separated. The test asserts every named
+code appears among the diagnostics for that file. Diagnostic codes are the ones
+registered in `internal/diagnostics/registry.go` and documented in
+`docs/reference/diagnostic-codes.md`.
+
+## Adding a corpus case
+
+New or changed `.gwdk` syntax must come with a corpus case. Adding accepted
+syntax means an `accept/` file exercising it; adding a rejection or a new
+diagnostic means a `reject/` file with the expected code. This requirement is
+part of the syntax contributor checklist in
+`docs/compiler/syntax-contributors.md`.
diff --git a/docs/language/grammar.md b/docs/language/grammar.md
@@ -2,6 +2,10 @@
 
 This is the grammar accepted by the current metadata parser. It is intentionally line-oriented and incomplete.
 
+Accepted and rejected syntax is pinned by the machine-checked conformance corpus
+in [Conformance Corpus](conformance.md), which is the contract source of truth
+when this grammar drifts.
+
 ```text
 file        = line*
 line        = blank | comment | packageDecl | metadataDecl | importDecl | useDecl | blockDecl | goDecl | actionDecl | apiDecl | unsupportedBlock | other

diff --git a/docs/language/spec.md b/docs/language/spec.md
@@ -8,6 +8,16 @@ instead of becoming accidental behavior.
 Detailed behavior stays in the feature pages linked from
 [GOWDK Language](README.md).
 
+This prose is pinned by the machine-checked conformance corpus described in
+[Conformance Corpus](conformance.md): accepted syntax has an `accept/` case that
+must check clean, and rejected syntax has a `reject/` case asserting its stable
+diagnostic code. When this spec and the corpus disagree, the corpus is the
+contract and one of them is a bug.
+
+Per-construct stability and deprecation tiers (which blocks, metadata keywords,
+and `g:` directives are stable, partial, planned, or deprecated) are published
+in [Language Construct Stability](stability.md).
+
 ## Status Terms
 
 - Implemented: accepted by the current compiler and covered by tests or a