Lambé 0.9.0 by hakimjonas · Pull Request #7 · hakimjonas/lambe

hakimjonas · 2026-05-23T21:40:13Z

Summary

The schema-feedback-loop release. Declare a JSON Schema, check queries
against it, round-trip schemas with the ecosystem. Plus: a 27-class
pipe-op AST consolidation, a rumil_tokens-based REPL highlighter,
the text op for markdown prose extraction, a -n / --null-input
flag, richer --explain warnings, JSON-string keys in object
construction, and an end-to-end CLI ~3.3× faster than 0.8.0 on
parse-bound workloads.

Highlights:

- Pipe-op AST consolidation. The 27 per-op AST classes
  (FilterOp, MapOp, SortOp, …) collapse into a single
  BuiltinPipeOp(name, args); pipe_ops.dart's spec table is now
  the only place per-op behaviour lives.
- REPL highlighter on rumil_tokens. The 100-line hand-rolled
  tokenizer is gone; the highlighter consumes a typed token stream.
  Pipe op names colour as keywords; redraw on every keystroke.
- Markdown text op. Walks a node tree and concatenates prose
  recursively, replacing the structurally broken .children[0].text
  pattern. Soft breaks become ' ', hard breaks become '\n'.
- Schemas as a first-class contract. --schema <path> takes a
  JSON Schema; --print-shape emits one. lambe_check,
  lambe_explain, lambe_print_shape MCP tools.
- Performance. lam --print-shape big.json (1.5 MB JSON):
  2.4 s → 732 ms (3.28×). Inherited from rumil 0.7's combinator
  work + rumil_parsers 0.8.0's JSON AST split + capture-based
  parsing.

Full release notes in CHANGELOG.md.

Test plan

Companion release

rumil_parsers 0.8.0, shipped on pub.dev today, carries the JSON AST
split (JsonNumber → JsonInt | JsonDouble), the HCL AST split
(HclNumber → HclInt | HclDouble), the HCL decoder N=1-vs-N≥2 fix,
capture-based number/string parsing, and the common.floatingPoint
precision fix that YAML inherits.

An opt-in escape hatch for CSV/TSV: non-scalar cells are encoded inline as JSON strings instead of being refused. The default stays at 0.8.0's refuse behavior.

Core
- CellPolicy { refuse, json } enum in output_format.dart.
- formatOutput, canWriteAs, canWriteShapeAs, requirementFor, and explain all take an optional CellPolicy flattenCells = CellPolicy.refuse.
- Under json, requirementFor(csv/tsv) widens MustBeFlatList to MustBeList; the writer JSON-encodes list- or map-valued cells via const JsonEncoder().convert(cell); the shape check accepts any list at the root.
- _scalarCell renamed to _cell (no longer always-scalar).
- The as(fmt) combinator deliberately does NOT read the CLI/REPL/MCP policy; it stays a pipeline-level transform so queries remain portable.

NotWritable.hints (new field)
- List<String> hints on NotWritable, default const [].
- _hintsFor populates one hint when: format is csv/tsv, policy is refuse, and the root shape is already SList (so only the cells are the problem). Hint text names all three surfaces: --flatten-cells json (CLI), :flatten-cells json (REPL), flatten_cells=json (MCP).
- OutputShapeError.hints getter; _render appends each hint on its own line after the suggestion list.
- Uniform channel means CLI, REPL, and MCP render the same guidance without re-deriving the condition.

--explain
- explain() takes CellPolicy; threads into canWriteShapeAs for the writability lists.
- ExplainReport.flattenCells field round-trips the policy.
- renderExplain emits the "Cell policy: json" footer only when non-default, so default output is byte-for-byte unchanged.

CLI (bin/lam.dart)
- --flatten-cells option, allowed [refuse, json], defaults to refuse.
- Threaded into _writeWithBridge and the --explain path.

REPL (lib/src/repl.dart)
- :flatten-cells <policy> session command with validation.
- Threaded through _formatResult, _encode, _handleShapeError.
- :help entry.

MCP (bin/mcp_server.dart)
- flatten_cells parameter on lambe_query inputSchema.
- Threaded into formatOutput; JSON bypass path unchanged.
- hints key in _renderShapeErrorPayload.

Docs
- doc/lam.1.md: --flatten-cells option and :flatten-cells REPL command.
- doc/lam.1: regenerated via tool/manpage.dart.
- CHANGELOG.md: new 0.9.0-dev section.
- README.md: non-scalar-cells subsection + CLI example.

Tests (+106)
- csv_element_shape_test.dart: 5 hint tests.
- shape_explain_test.dart: 4 CellPolicy threading tests.
- shape_output_consistency_test.dart: 97-case hint matrix (every representative value × every format, verifying hints fire exactly for csv/tsv refuse + SList root).

Quality gates: dart analyze clean, 1256 tests pass (was 1150), dart format clean, pana 160/160, manpage round-trip matches.
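The cell-policy behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in output_format.dart: `encodeCell` and `_quote` are hypothetical names, the CSV quoting rule is an assumption, and only the `CellPolicy` values and the `const JsonEncoder().convert(cell)` call come from the commit message.

```dart
import 'dart:convert';

/// Stand-in for the CellPolicy enum named in the commit message.
enum CellPolicy { refuse, json }

/// Encodes one CSV cell (hypothetical helper). Scalars are written as text;
/// list- or map-valued cells are JSON-encoded under [CellPolicy.json] and
/// rejected under [CellPolicy.refuse].
String encodeCell(Object? cell, CellPolicy policy) {
  if (cell is List || cell is Map) {
    if (policy == CellPolicy.refuse) {
      throw FormatException('non-scalar cell: $cell');
    }
    return _quote(const JsonEncoder().convert(cell));
  }
  return _quote(cell?.toString() ?? '');
}

/// Assumed CSV quoting: wrap in quotes when the text contains a comma,
/// a quote, or a newline, and double any embedded quotes.
String _quote(String text) {
  if (!text.contains(RegExp(r'[",\n]'))) return text;
  return '"${text.replaceAll('"', '""')}"';
}

void main() {
  final row = <Object?>['ada', ['x', 'y'], {'k': 1}];
  print(row.map((c) => encodeCell(c, CellPolicy.json)).join(','));
}
```

Under `CellPolicy.refuse` the same row throws on the second cell, which corresponds to the default refuse behavior the commit keeps.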

…ring

Pre-commit audit on track D caught a real bug in NotWritable.hints before track B cements the pattern.

The problem: hints were List<String> with CLI/REPL/MCP syntax baked into a single string. An MCP agent receiving the error got "--flatten-cells json (CLI) / :flatten-cells json (REPL) / flatten_cells=json (MCP)" as an undifferentiated blob, and had to string-parse to find the actionable parameter. A REPL user saw CLI flag syntax they could not type; a CLI user saw REPL colon-commands.

The fix: a structured Hint type in lib/src/shape/check.dart, exported from package:lambe/lambe.dart. Each hint carries label, cliFlag, replCommand, mcpParameter (a (String, String) record), and explanation. Each surface renders only its native form:
- OutputShapeError.message: no hints baked in (stays surface-neutral).
- CLI (bin/lam.dart): writes "Or pass ${cliFlag}: ${explanation}" to stderr after the error message, via _writeHintsCli.
- REPL (lib/src/repl.dart): writes "Or run ${replCommand}: ${explanation}" in _handleShapeError, before the bridge prompt.
- MCP (bin/mcp_server.dart): emits structured JSON {label, parameter, value, explanation} in the payload.

Tests updated to match:
- csv_element_shape_test.dart hint tests now check Hint fields.
- shape_output_consistency_test.dart hint matrix pins the cliFlag value.
- Added an explicit assertion that OutputShapeError.message does NOT bake in any of the three surface syntax forms.

1256 tests pass, pana 160/160.

Not tested: surface-level rendering (that CLI stderr contains the hint line, that the MCP payload shape matches). Manually verified; a regression-proof test belongs in the end-of-four-tracks audit.
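A structured hint along these lines might look like the sketch below. The field names follow the commit message; the render method names and the sample label and explanation strings are assumptions made for illustration, not the contents of lib/src/shape/check.dart.

```dart
/// Hypothetical sketch of a surface-neutral hint. Each surface picks the
/// one form it can act on instead of parsing a combined string.
class Hint {
  const Hint({
    required this.label,
    required this.cliFlag,
    required this.replCommand,
    required this.mcpParameter,
    required this.explanation,
  });

  final String label;
  final String cliFlag;
  final String replCommand;

  /// (parameter name, value), as a Dart record.
  final (String, String) mcpParameter;
  final String explanation;

  /// CLI form, following the "Or pass ..." wording in the commit message.
  String renderCli() => 'Or pass $cliFlag: $explanation';

  /// REPL form, following the "Or run ..." wording in the commit message.
  String renderRepl() => 'Or run $replCommand: $explanation';

  /// MCP form: structured keys, no CLI or REPL syntax.
  Map<String, String> renderMcp() => {
        'label': label,
        'parameter': mcpParameter.$1,
        'value': mcpParameter.$2,
        'explanation': explanation,
      };
}

void main() {
  // Label and explanation text are invented for this example.
  const hint = Hint(
    label: 'flatten-cells',
    cliFlag: '--flatten-cells json',
    replCommand: ':flatten-cells json',
    mcpParameter: ('flatten_cells', 'json'),
    explanation: 'encode non-scalar cells as JSON strings',
  );
  print(hint.renderCli());
  print(hint.renderRepl());
  print(hint.renderMcp());
}
```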

Evaluate each line of ndjson/jsonl input as an independent JSON document, no shared state between lines, one compact JSON result per line out. Covers the "tail a log" use case at the CLI layer without touching the core "AST over in-memory tree" model.

Library
- New `queryNdjson(Iterable<String> lines, LamExpr ast)` in lib/lambe.dart. Lazy via `sync*` so a caller (or a pipe into `take`) can pull only as many results as needed; fail-fast with a `line N:` prefix on the first parse or evaluation error. Empty and whitespace-only lines are skipped silently.

CLI
- New `--ndjson` flag in bin/lam.dart. Auto-enabled when the file extension is `.ndjson` or `.jsonl`, consistent with the existing auto-detection convention for .csv, .yaml, etc.
- File input reads all lines eagerly (bounded size). Stdin uses a lazy `sync*` iterator so `tail -f app.log | lam --ndjson '.level'` emits each result as the line arrives (verified with a time-stamped streaming test: line N emerges with N*0.5s delay).
- Rejects combining --ndjson with --interactive, --schema, --assert, --explain, or --to <non-json>. The mode is narrow on purpose; other output formats and non-execution modes don't combine sensibly with per-line eval.

Tests
- New test/ndjson_test.dart: 14 tests covering basic per-line evaluation, empty-line skipping, parse/eval error annotation with line numbers, lazy iteration (results yielded before later error), and complex pipe queries per line.

Docs
- doc/lam.1.md: --ndjson option block and a "line-delimited JSON" example.
- doc/lam.1: regenerated.
- CHANGELOG.md: new bullet under 0.9.0-dev Added.
- README.md: CLI example.

Quality gates: dart analyze clean, 1270 tests pass (was 1256, +14), dart format clean, pana 160/160, manpage round-trip matches.
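The lazy, fail-fast behaviour can be illustrated with a small `sync*` generator. This is a sketch under assumptions: `queryLines` is a hypothetical name, a plain Dart callback stands in for evaluating a parsed Lambé expression, `dart:convert` stands in for the real JSON parser, and counting blank lines toward the `line N:` number is a guess.

```dart
import 'dart:convert';

/// Lazily evaluates [query] against each non-blank line of [lines],
/// yielding one compact JSON string per input document.
Iterable<String> queryLines(
  Iterable<String> lines,
  Object? Function(Object? doc) query,
) sync* {
  var lineNumber = 0;
  for (final line in lines) {
    lineNumber++; // assumption: blank lines still count toward N
    if (line.trim().isEmpty) continue; // skipped silently
    Object? result;
    try {
      result = query(jsonDecode(line));
    } on Object catch (e) {
      // Fail fast, prefixing the physical line number.
      throw FormatException('line $lineNumber: $e');
    }
    yield jsonEncode(result);
  }
}

void main() {
  final lines = ['{"level":"info"}', '', '{"level":"warn"}', 'not json'];
  // take(2) pulls only the first two results, so the malformed
  // fourth line is never parsed.
  for (final out
      in queryLines(lines, (doc) => (doc as Map)['level']).take(2)) {
    print(out);
  }
}
```

Because the generator body only runs when the consumer asks for the next element, an earlier result is delivered before a later malformed line is even read, which is the property the lazy-iteration tests pin.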

Audit after track C found that track C and track D had strong library-level unit tests but no coverage for the wiring that actually exposes them to users: the CLI argument parsing, the ndjson file-extension auto-detect, the mode-combination guards, the stdin streaming claim, and the MCP payload shape. Manual smoke tests are not regression-proof; a wiring regression would ship silently.

Changes:

lib/src/mcp_payload.dart (new, factored out of bin/mcp_server.dart): renderMcpShapeErrorPayload takes an OutputShapeError + expression and returns the JSON string an MCP agent receives. Pure function, no I/O, testable without starting the MCP server as a subprocess.

bin/mcp_server.dart: calls the library function; private method _renderShapeErrorPayload removed.

lib/lambe.dart: exports renderMcpShapeErrorPayload.

test/mcp_payload_test.dart (new, 5 tests):
- Payload parses as JSON with all documented keys.
- Suggestions carry 1-based ids and composed `apply_as` queries.
- Hints carry structured {parameter, value} pairs.
- Hints do NOT leak CLI or REPL syntax into the agent-facing JSON.
- Empty hints still expose an empty list (key always present).

test/cli_integration_test.dart (new, 18 tests): shells out to `dart bin/lam.dart` with Process.start. Coverage:
- Explicit --ndjson flag produces per-line compact JSON.
- .ndjson and .jsonl file extensions auto-enable the mode.
- Stdin with --ndjson works via pipe.
- Empty and whitespace-only lines skipped silently.
- Malformed line exits 1 with "line N" in stderr.
- File-not-found exits 1 with a clear error.
- Five mode-combo guards: --ndjson rejects --interactive, --schema, --assert, --explain, --to yaml. Accepts --to json (redundant).
- Streaming: four stdin lines with 500ms gaps, asserts the last two inter-output gaps are >= 300ms. A buffered implementation would deliver all four near EOF with near-zero gaps. Proves tail -f | lam --ndjson emits as lines arrive.
- --flatten-cells refuse writes the CLI-form hint (--flatten-cells json) to stderr, NOT REPL or MCP syntax. Regression guard for the surface-specific rendering chosen in the track D audit.
- --flatten-cells json produces CSV with JSON-encoded cells.
- --explain --flatten-cells json widens writable formats and prints the "Cell policy: json" footer.
- --explain without the flag: no footer, csv in "Not writable as".

What's deliberately not covered:
- REPL I/O (ReadLine-driven, not testable without a real TTY).
- Exact error message phrasing (substring assertions only, so phrasing can improve without breaking tests).
- MCP server subprocess JSON-RPC (the payload function it calls is tested directly; the server wiring is a single dart_mcp method).

Quality gates: dart analyze clean, 1293 tests pass (was 1270, +23), dart format clean, pana 160/160, manpage round-trip matches.

Three sub-features added to the 0.8.0 explain infrastructure:

Runtime-rejection warnings (always on)
Pipe-op acceptance predicates in pipe_ops.dart already know which input shapes each op rejects. Explain now surfaces the mismatch statically: `.config | filter(.x)` on a known map produces "filter rejects map<...>; this will throw at runtime". SAny inputs are ignored (cannot prove); compatible inputs pass silently. The new _analyzeRejection helper runs alongside the existing _analyzePredicate in explain()'s per-stage loop.

Trivial-result warnings (opt-in)
For sort_by, group_by, map, unique_by: when the argument references a field provably absent from the element shape, emit a warning saying "the result is trivial". Reuses _missingFieldPath (the helper that already powers empty-filter warnings) on the element shape of the input list. Opt-in via explain(..., includeTrivial: true) because legitimate uses exist (stable no-op sort, explicit null projection).

Structured JSON output
renderExplainJson(ExplainReport) emits the full report as JSON with snake_case keys (stages, warnings, writable_as, not_writable_as, flatten_cells). Warning kinds serialize as empty_filter, runtime_rejection, trivial_result. Shapes render as strings via renderShape; agents that need structural shape access should call the lambe_schema MCP tool separately. Text output from renderExplain is unchanged byte-for-byte; JSON is pure-additive.

Supporting API changes
- WarningKind enum: emptyFilter, runtimeRejection, trivialResult.
- ExplainWarning.kind field (required at construction).
- explain() gains a `bool includeTrivial = false` parameter.

CLI wiring (bin/lam.dart)
- --explain-trivial flag: implies --explain, enables the trivial class.
- --explain-json flag: implies --explain, switches to the JSON renderer.
- Both compose: --explain-trivial --explain-json emits JSON including trivial_result warnings.
- --ndjson rejection of --explain remains correct (covers the implies cases via the existing guard).

Docs
- doc/lam.1.md: two new option blocks, --explain description extended.
- doc/lam.1: regenerated.
- CHANGELOG.md: 0.9.0-dev bullet covering all three sub-features.
- README.md: one paragraph added to the --explain section.

Tests (+24)
shape_explain_test.dart (+17):
- 4 runtime-rejection cases (filter/sum on map, SAny untouched, compatible input untouched).
- 6 trivial-result cases (sort_by/group_by/map flagged when opt-in, NOT flagged by default, existing field untouched, SAny element cannot prove).
- 6 JSON renderer cases (top-level shape, stage/warning/writability fields, snake_case kind names, flatten_cells).
cli_integration_test.dart (+7): runtime-rejection in default output, trivial-result gated on --explain-trivial, --explain-json shape, the "implies --explain" behavior for both sub-flags, combined usage, and the --ndjson --explain-json rejection path.

Quality gates: dart analyze clean, 1317 tests pass (was 1293, +24), dart format clean, pana 160/160, manpage round-trip matches.

Post-track-B audit caught two real gaps; one closed, one deferred with documentation.

Structured shapes in --explain-json
renderExplainJson previously emitted stage shapes as text strings via renderShape ("list<map<name: string>>"). Agents consuming the JSON had to re-parse that text to access structure, defeating the point of a JSON mode. Fixed by adding shapeToJson(Shape) in lib/src/shape/shape.dart, a sealed-ADT walk that produces {kind, ...} nested trees:
{"kind": "list", "element": {"kind": "map", "fields": {...}}}
renderExplainJson now uses this form. Text output from renderExplain is byte-for-byte unchanged. Exported from the library barrel.

Rejection cascade coverage
New test in shape_explain_test.dart verifies the interaction between runtime-rejection warnings and inferShape's SAny widening: `. | filter(.a) | sort` starting from an SMap produces exactly one rejection warning (on filter, stage 1). Sort sees the post-filter ctx as SAny and does NOT emit its own rejection. Prevents double-warning regressions.

REPL surface verified manually
The :flatten-cells colon-command, session-state persistence, and the REPL-native hint rendering (":flatten-cells json" not "--flatten-cells json") were all verified in a real session on a list-of-maps-with-lists fixture. A ReadLine-seam refactor would be needed for automated REPL tests; documented in memory as a known gap accepted for 0.9.0.

Tests (+9 total)
shape_test.dart (+7): shapeToJson on every Shape constructor plus nested round-trips, empty map, empty list, and JSON round-trippability.
shape_explain_test.dart (+1 new + 1 rewrite):
- Rewrote the "stages carry shape" test to assert the structured {kind: list, element: {kind: string}} form instead of the old string.
- New "rejection cascade" test pinning single-warning behavior.

Quality gates: dart analyze clean, 1325 tests pass (was 1317), dart format clean, pana 160/160.
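A sealed-ADT walk of this kind can be sketched in a few lines of Dart 3. The four-variant `Shape` hierarchy below is a hypothetical miniature (the real ADT has more variants), and the `any` kind name is an assumption; the `list`, `map`, `string`, `element`, and `fields` keys follow the examples in the commit message.

```dart
import 'dart:convert';

// Hypothetical miniature of the shape ADT, for illustration only.
sealed class Shape {
  const Shape();
}

class SAny extends Shape {
  const SAny();
}

class SString extends Shape {
  const SString();
}

class SList extends Shape {
  const SList(this.element);
  final Shape element;
}

class SMap extends Shape {
  const SMap(this.fields);
  final Map<String, Shape> fields;
}

/// Walks the shape and emits a nested {kind, ...} tree. The switch is
/// exhaustive over the sealed hierarchy, so adding a variant without a
/// case here is a compile-time error.
Map<String, Object?> shapeToJson(Shape shape) => switch (shape) {
      SAny() => {'kind': 'any'},
      SString() => {'kind': 'string'},
      SList(:final element) => {
          'kind': 'list',
          'element': shapeToJson(element),
        },
      SMap(:final fields) => {
          'kind': 'map',
          'fields': {
            for (final entry in fields.entries)
              entry.key: shapeToJson(entry.value),
          },
        },
    };

void main() {
  const shape = SList(SMap({'name': SString()}));
  print(jsonEncode(shapeToJson(shape)));
}
```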

Decision record for the schema-as-contract feature. Resolves the design questions from the handover plus several the handover didn't raise.

Format: JSON Schema subset (not custom DSL). Subset is type/properties/items/required. Value-level constraints (minimum/pattern/enum/etc) are rejected at load time with per-keyword errors. Structural combinators (allOf/oneOf/$ref/if-then/dependencies) are rejected. Unknown keywords are ignored per the JSON Schema extensibility convention. Key call: rumil_parsers.parseJson does the JSON parsing for free with line-aware errors, so the parser collapses to ~50 lines of exhaustive switch on JsonValue. My earlier "Lambe DSL is cheaper" argument died once I accounted for that.

Shape ADT: SOptional(Shape) added. Required by JSON Schema's `required` semantics; shipping without it would silently lie whenever users have optional fields. Termination and the bounded-language contract are preserved: SOptional lives in the static analyzer, not the query language.

Disagreement: schema augments shapeOf(data); error on concrete-type conflict. Keeps --explain honest. Structural validation falls out as a side effect; no separate --validate command for 0.9.0.

CLI: rename --schema to --print-shape (first breaking change in 0.9.0); add --schema <path>. --print-shape output becomes JSON Schema, round-trippable with --schema input. Sibling convention: data.json paired with data.schema.json.

MCP: new schema parameter on lambe_query; rename lambe_schema to lambe_print_shape; new lambe_check tool for on-demand validation.

Explicit non-goals called out: no runtime coercion, no value-level constraints, no conditional schemas, no external $ref, no templating. Lambe is not CUE and shouldn't try to be.

Implementation plan: SOptional first (compiler finds all the switch sites), then parser/loader/merge, then CLI/REPL/MCP wiring, then tests and docs. Estimated ~1 week.

Positioning sharpened via research: Lambe is "a query language for structured data that shows you what you're working with"; use it when you don't already know the data. Not "typed jq" (that market never materialized in 10 years). Not "parity with CUE" (different audience). The shape feedback loop is the actual win.

Add SOptional(Shape) to the sealed shape ADT. This is the variant JSON Schema's `required` semantics demand and the shape the wider "shape as feedback loop" positioning needs. Shipping the schema feature without it would silently misrepresent optional fields.

Constructor semantics
SOptional(SOptional(x)) collapses to SOptional(x) via the factory. Guarantees no stacked optionality anywhere, so downstream code never has to handle the degenerate case.

Acceptance semantics (op predicates)
Optional unwraps for op acceptance: `filter` on SOptional<SList<T>> is accepted. The potential absence is surfaced by the explain runtime-rejection analyzer, not by the acceptance predicate. Helpers in pipe_ops.dart (_acceptsList, _acceptsMap, etc.) all unwrap via a shared _unwrap helper.

Root-requirement semantics (output formats)
MustBeMap / MustBeList / MustBeFlatList do NOT unwrap. An optional at the root means "value may be absent entirely"; TOML/HCL/CSV cannot serialize an absence. Users must materialize a default before the --to step. The distinction between op-acceptance and root-requirement is deliberate: ops tolerate runtime null propagation, root serializers don't.

Inference propagation
Field access on SMap with an optional field yields SOptional<T>. Field access on SOptional<SMap<...>> (null propagation) also yields SOptional<T>. The factory collapses nested cases so the result is never SOptional<SOptional<X>>.

Analyzer integration
- Empty-filter check unwraps optional bool predicates; an optional bool may be true, so not "provably non-boolean."
- Missing-field path check walks through optional wrappers to inspect the underlying SMap fields.
- Runtime-rejection check does NOT unwrap: optional counts as a potential mismatch worth warning about.

Completer integration
Tab completion unwraps optional for field enumeration and inner-expression resolution. An optional list still completes against its element shape in `.users | map(<TAB>)` contexts.

Serialization
renderShape: optional<inner>. shapeToJson: {kind: optional, inner: ...}.

Tests (+16 across two files)
shape_test.dart: render, serialize, equality, nested collapse, embedding in other shapes.
shape_explain_test.dart: field propagation, access through optional wrapper, op acceptance, missing-field walk, optional bool predicate, root rejection by TOML, nested collapse via inference, JSON round-trip.

Quality gates: dart analyze clean, 1338 tests pass (was 1325, +13 new including the 16 above minus overlap with existing tests), dart format clean, pana 160/160. Zero test regressions.

This is step 1 of the track A implementation plan in doc/schema-design.md. Next: JSON Schema subset parser.
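The collapsing factory and the null-propagating field access can be sketched as below. This is an illustrative miniature, assuming a three-variant ADT and a hypothetical `fieldShape` helper; the real inference code handles more shapes and is not shown in this PR text.

```dart
// Hypothetical miniature of the shape ADT, for illustration only.
sealed class Shape {
  const Shape();
}

class SString extends Shape {
  const SString();
}

class SMap extends Shape {
  const SMap(this.fields);
  final Map<String, Shape> fields;
}

class SOptional extends Shape {
  const SOptional._(this.inner);

  /// Collapses SOptional(SOptional(x)) to SOptional(x), so stacked
  /// optionality can never be constructed.
  factory SOptional(Shape inner) =>
      inner is SOptional ? inner : SOptional._(inner);

  final Shape inner;
}

/// Shape of `.name` on [target], or null when the field is unknown.
/// Access through an optional map yields an optional field; the factory
/// keeps an already-optional field from being wrapped twice.
Shape? fieldShape(Shape target, String name) => switch (target) {
      SMap(:final fields) => fields[name],
      SOptional(:final inner) => switch (fieldShape(inner, name)) {
          null => null,
          final Shape s => SOptional(s),
        },
      SString() => null,
    };

void main() {
  final user = SOptional(SMap({'email': SOptional(const SString())}));
  final email = fieldShape(user, 'email');
  // Optional map + optional field still gives a single optional layer.
  print(email is SOptional && email.inner is SString);
}
```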

Add parseJsonSchema(String): Shape in lib/src/schema/parser.dart. Walks the JsonValue output of rumil_parsers' parseJson, mapping four keywords onto the shape ADT:
- type: string selects the target kind
- properties: nested field schemas for "object"
- items: element schema for "array"
- required: which properties stay concrete vs become SOptional

Rejected keywords (23 total) each produce a targeted error with a JSON path pointing at the site. Rejections cover value-level constraints (minimum/maximum/pattern/enum/format/minLength/maxLength/minItems/maxItems/uniqueItems/const/multipleOf), structural combinators (allOf/oneOf/anyOf/not), conditionals (if/then/else/dependencies/dependentRequired/dependentSchemas), references ($ref/$defs/definitions), and extra object constraints (additionalProperties/patternProperties/propertyNames).

Unknown keywords are tolerated per JSON Schema's extensibility convention: $schema, $id, title, description all pass through as ignored metadata.

Error diagnostics carry a JSON path ($.properties.a.properties.b) so users can find the offending nested schema without scanning the whole file.

Tests (41 new):
- 5 scalar types round-trip (null, bool, number, integer→number, string).
- 3 array variants (no items, scalar items, object items).
- 5 object + required combinations (empty, all required, no required, partial required, nested object with own required).
- 18 rejection tests (one per keyword class).
- 2 metadata-tolerance tests.
- 7 error-diagnostic tests (invalid JSON, non-object root, missing type, unsupported type, properties type error, required type error, nested error with path).
- 2 realistic round-trip scenarios (user record, list of records).

Exported parseJsonSchema from package:lambe/lambe.dart.

Quality gates: dart analyze clean, 1379 tests pass (was 1338, +41), dart format clean, pana 160/160.

Step 2 of 9 in doc/schema-design.md's implementation plan. Next: loader (file IO + sibling auto-detect) and mergeSchemaWithData (disagreement-is-error semantics).
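A rough sketch of the four-keyword mapping follows. It is not the parser in lib/src/schema/parser.dart: it walks `dart:convert`'s `jsonDecode` output rather than rumil_parsers' JsonValue, folds all scalars into a hypothetical `SScalar` class, lists only a handful of the rejected keywords, and does not validate the types of `properties` or `required`.

```dart
import 'dart:convert';

// Hypothetical miniature of the shape ADT, for illustration only.
sealed class Shape {
  const Shape();
}

class SAny extends Shape {
  const SAny();
}

class SScalar extends Shape {
  const SScalar(this.name); // 'null', 'boolean', 'number', or 'string'
  final String name;
}

class SList extends Shape {
  const SList(this.element);
  final Shape element;
}

class SMap extends Shape {
  const SMap(this.fields);
  final Map<String, Shape> fields;
}

class SOptional extends Shape {
  const SOptional(this.inner);
  final Shape inner;
}

/// Partial list of rejected keywords; the commit describes many more.
const _rejected = {'minimum', 'maximum', 'pattern', 'enum', 'allOf', r'$ref'};

Shape parseSchema(Object? node, [String path = r'$']) {
  if (node is! Map<String, dynamic>) {
    throw FormatException('$path: schema must be an object');
  }
  for (final key in node.keys) {
    if (_rejected.contains(key)) {
      throw FormatException('$path: unsupported keyword "$key"');
    }
  }
  final required = <Object?>{...?(node['required'] as List?)};
  final props = (node['properties'] as Map<String, dynamic>?) ??
      const <String, dynamic>{};

  // Properties named in `required` stay concrete; the rest become optional.
  Shape field(String name) {
    final shape = parseSchema(props[name], '$path.properties.$name');
    return required.contains(name) ? shape : SOptional(shape);
  }

  final type = node['type'];
  return switch (type) {
    'integer' => const SScalar('number'),
    'null' || 'boolean' || 'number' || 'string' => SScalar(type as String),
    'array' => SList(node.containsKey('items')
        ? parseSchema(node['items'], '$path.items')
        : const SAny()),
    'object' => SMap({for (final name in props.keys) name: field(name)}),
    final other => throw FormatException('$path: unsupported type $other'),
  };
}

void main() {
  const source = '{"type":"object","required":["name"],'
      '"properties":{"name":{"type":"string"},"age":{"type":"integer"}}}';
  final shape = parseSchema(jsonDecode(source)) as SMap;
  print(shape.fields['age'] is SOptional);
}
```

Unknown keywords such as `title` simply fall through untouched, which mirrors the tolerance the commit describes.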

Add lib/src/schema/loader.dart with three functions:

loadSchemaFromFile(path)
Reads and parses a schema file. QueryError on missing file or parser rejection.

loadSchemaForData({explicitSchemaPath, dataPath})
Explicit path wins. Otherwise auto-detects a <dataPath>.schema.json sibling. Returns null when neither exists. Handles extension rewriting (data.json -> data.schema.json, events.ndjson -> events.schema.json).

mergeSchemaWithData(schema, data)
Schema-augments-data merge per doc/schema-design.md:
- SAny on either side: the other wins.
- SOptional + present data: strip optional, merge inners (field is concretely there for this run).
- SOptional + absent data: keep optional (field may be absent in other runs).
- SOptional + null data: keep optional (Lambe-style null propagation: null ~ absent).
- Schema-only fields: preserved.
- Data-only fields: preserved (schema is a partial description).
- Lists and maps recurse.
- Concrete-type disagreement at any path: QueryError naming path ($.user.age, $[*]).

Path format uses JSON Path-ish notation: $ for root, .field for map descent, [*] for list element.

Same rule throughout: agreement passes, schema fills in gaps, data fills in extras, concrete disagreement is an error. Keeps --explain honest in the schema-agrees-with-data case and loud in the schema-contradicts-data case.

Null-data policy
The stance "schema optional + data null keeps optional" is a deliberate choice: JSON Schema users commonly use null for absent fields, and Lambe's null-propagation semantics treat null similarly to absent. Being strict here would produce friction with real-world JSON Schemas. Documented in the test that pins this behavior.

Tests (+25)
- 3 loadSchemaFromFile tests (success, missing file, parser error propagation).
- 5 sibling auto-detect tests (no sibling, with sibling, .ndjson extension, explicit beats sibling, explicit only).
- 3 agreement tests (equal scalars, SAny on either side, both SAny).
- 5 disagreement tests (scalar vs scalar with path, map vs non-map, list vs non-list, nested path, list element path).
- 4 SOptional handling tests (present strips, absent keeps, null keeps, disagreement on inner).
- 5 augmentation tests (schema-only field, data-only field, empty list uses schema element, non-empty merges element, recursive merge).

Exported loadSchemaFromFile, loadSchemaForData, mergeSchemaWithData from package:lambe/lambe.dart.

Quality gates: dart analyze clean, 1404 tests pass (was 1379, +25), dart format clean, pana 160/160.

Step 3 of 9 in doc/schema-design.md. Next: CLI wiring with the --schema rename.
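The merge rules listed above can be approximated with a record-pattern switch. This is a sketch under assumptions: `merge` and `describe` are hypothetical names, scalars are folded into one `SScalar` class, a plain FormatException stands in for QueryError, and data-side optionals are not handled.

```dart
// Hypothetical miniature of the shape ADT, for illustration only.
sealed class Shape {
  const Shape();
}

class SAny extends Shape {
  const SAny();
}

class SNull extends Shape {
  const SNull();
}

class SScalar extends Shape {
  const SScalar(this.name);
  final String name;
}

class SList extends Shape {
  const SList(this.element);
  final Shape element;
}

class SMap extends Shape {
  const SMap(this.fields);
  final Map<String, Shape> fields;
}

class SOptional extends Shape {
  const SOptional(this.inner);
  final Shape inner;
}

String describe(Shape s) => switch (s) {
      SAny() => 'any',
      SNull() => 'null',
      SScalar(:final name) => name,
      SList() => 'list',
      SMap() => 'map',
      SOptional(:final inner) => 'optional<${describe(inner)}>',
    };

/// Schema-augments-data merge; throws on concrete-type disagreement.
Shape merge(Shape schema, Shape data, [String path = r'$']) =>
    switch ((schema, data)) {
      (SAny(), final d) => d, // SAny on either side: the other wins
      (final s, SAny()) => s,
      (SOptional(), SNull()) => schema, // null ~ absent: keep optional
      (SOptional(:final inner), final d) => merge(inner, d, path),
      (SList(element: final s), SList(element: final d)) =>
        SList(merge(s, d, '$path[*]')),
      (SMap(fields: final s), SMap(fields: final d)) => SMap({
          for (final k in {...s.keys, ...d.keys})
            k: !d.containsKey(k)
                ? s[k]! // schema-only field: preserved
                : !s.containsKey(k)
                    ? d[k]! // data-only field: preserved
                    : merge(s[k]!, d[k]!, '$path.$k'),
        }),
      (SScalar(name: final a), SScalar(name: final b)) when a == b => schema,
      (SNull(), SNull()) => schema,
      _ => throw FormatException('schema disagreement at $path: '
          'schema says ${describe(schema)}, data is ${describe(data)}'),
    };

void main() {
  const schema = SList(SMap({'age': SScalar('string')}));
  const data = SList(SMap({'age': SScalar('number')}));
  try {
    merge(schema, data);
  } on FormatException catch (e) {
    print(e.message);
  }
}
```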

Add renderJsonSchema(Shape, {pretty}): String in lib/src/schema/renderer.dart. Walks the shape ADT and emits a JSON Schema subset document that parseJsonSchema accepts. Main decisions:

SOptional handling
SOptional inside SMap becomes a non-required property: the inner shape goes into `properties`, and the field name is omitted from `required`. This is JSON Schema's standard way to express "this field may be absent," and it's the only position where Lambe can round-trip optionality. SOptional elsewhere (top-level, inside SList, etc.) has no standard JSON Schema spelling in our subset. The renderer flattens to the inner shape; it's a one-way drop for these positions. The round-trip is preserved for every shape the parser can produce, which is the only invariant we promise.

SAny handling
Renders as the empty object {}. The parser treats an empty object as SAny (the "empty schema accepts anything" JSON Schema convention). Round-trip preserved. Added to parser: an empty object with no `type` is now SAny instead of a "missing type" error.

Pretty vs compact
Default `pretty: true` emits 2-space-indented JSON for human reading (print-shape output). `pretty: false` is for embedding in other JSON payloads (future MCP responses).

Round-trip invariant
parseJsonSchema(renderJsonSchema(s)) == s for every shape the parser can emit. 12 representative cases pin this in the test file, plus two complex-shape tests (optional field in a nested list, four-deep nested maps).

Tests (+32)
- 5 scalar renderings.
- 4 container renderings (list with items, list of any, map all required, map no required, empty map).
- 1 mixed-required round-trip.
- 3 SOptional positions (top, inside list, inside map).
- 3 pretty/compact checks.
- 12 explicit round-trip cases covering every parser-reachable shape plus 3 complex scenarios.

Exported renderJsonSchema from package:lambe/lambe.dart.

Quality gates: dart analyze clean, 1436 tests pass (was 1404, +32), dart format clean, pana 160/160.

Step 4 of 9. Next: CLI wiring: rename --schema to --print-shape, add the --schema <path> option, thread through evaluation and explain.
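The rendering decisions above can be sketched as follows. The ADT is a hypothetical miniature, `toSchema` and `render` are illustrative names, and details such as always emitting `required` (even when empty) and always emitting `items` are assumptions rather than documented behaviour of renderJsonSchema.

```dart
import 'dart:convert';

// Hypothetical miniature of the shape ADT, for illustration only.
sealed class Shape {
  const Shape();
}

class SAny extends Shape {
  const SAny();
}

class SScalar extends Shape {
  const SScalar(this.name); // a JSON Schema type name such as 'string'
  final String name;
}

class SList extends Shape {
  const SList(this.element);
  final Shape element;
}

class SMap extends Shape {
  const SMap(this.fields);
  final Map<String, Shape> fields;
}

class SOptional extends Shape {
  const SOptional(this.inner);
  final Shape inner;
}

Map<String, Object?> toSchema(Shape shape) => switch (shape) {
      SAny() => <String, Object?>{}, // empty schema accepts anything
      SScalar(:final name) => {'type': name},
      // Outside a map field, optionality has no spelling here and is
      // dropped. Inside a map it is expressed through `required` below.
      SOptional(:final inner) => toSchema(inner),
      SList(:final element) => {'type': 'array', 'items': toSchema(element)},
      SMap(:final fields) => {
          'type': 'object',
          'properties': {
            for (final e in fields.entries) e.key: toSchema(e.value),
          },
          'required': [
            for (final e in fields.entries)
              if (e.value is! SOptional) e.key,
          ],
        },
    };

String render(Shape shape, {bool pretty = true}) => pretty
    ? const JsonEncoder.withIndent('  ').convert(toSchema(shape))
    : jsonEncode(toSchema(shape));

void main() {
  const user = SMap({
    'name': SScalar('string'),
    'email': SOptional(SScalar('string')),
  });
  print(render(user));
}
```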

First user-visible breaking change in 0.9.0: rename the existing --schema flag to --print-shape, add a new --schema <path> option that takes a JSON Schema file.

--schema <path>
New option on `lam`. Threads the declared shape through both --explain inference (via mergeSchemaWithData) and normal evaluation (validation as a side effect: a concrete-type disagreement between schema and data errors at load time). Auto-detection: if --schema is omitted and a sibling <datafile>.schema.json exists, it's used implicitly. Same convention as 0.9.0's .ndjson auto-detect.

--print-shape
Replaces the 0.8.0 --schema flag. Emits the inferred shape as a JSON Schema subset document, round-trippable with --schema input. Output format is now JSON Schema (second breaking change): 0.8.0's type-name-string JSON is replaced with the canonical schema form so that `lam --print-shape data.json > data.schema.json` followed by `lam --schema data.schema.json ...` round-trips cleanly.

Mode combination guards
- --print-shape + --schema is rejected: --print-shape prints the inferred shape from data, which a schema would only second-guess.
- --ndjson + --schema is rejected (added to existing ndjson guards).
- --ndjson + --print-shape is rejected.

Help text updates documented in doc/lam.1.md; regenerated doc/lam.1.

CLI flow (when --schema is active):
- --explain path: shape = mergeSchemaWithData(schema, shapeOf(data)) (or just schema when data is absent); fed to explain() as inputShape.
- Normal eval: mergeSchemaWithData is invoked purely for its side-effect validation (throws on disagreement). Evaluation runs on raw data as usual.
- --print-shape: schema is rejected (see above).

Smoke-tested end to end with:
* --print-shape emits JSON Schema (verified by eye).
* --explain with sibling .schema.json auto-loads, surfaces SOptional from the `required` semantics, shows it in the shape trace ("list<map<name: string, age: number, email: optional<string>>>").
* --schema api.json '.' response.json where schema says age:string but data has age:number errors cleanly with "schema disagreement at $[*].age: schema says string, data is number" and exits 1.

Existing legacy inferSchema function stays referenced in REPL and MCP (updated in steps 6 and 7).

Quality gates: dart analyze clean, 1436 tests pass (no changes; no new tests yet; step 8 adds CLI integration coverage), dart format clean, pana 160/160, manpage round-trip matches.

Step 5 of 9. Next: REPL integration.

Self-review of steps 1-5 caught two honesty gaps:

renderJsonSchema: lossy positions documented
The round-trip invariant holds only for shapes parseJsonSchema can produce (SOptional inside SMap fields). Callers composing shapes outside that path (e.g., an inference result where SOptional lands at the root or inside a list) hit a silent flatten. The previous docstring said "no standard JSON Schema representation", which was true but terse; it is now explicit that optionality is **dropped** in those positions, so the user knows the output isn't lossless for arbitrary shapes.

inferSchema: deprecated for 1.0 removal
inferSchema emits type-names-as-strings (e.g. `{"age": "number"}`), a format that doesn't round-trip with any parser we ship. With renderJsonSchema as the canonical JSON Schema emitter and shapeOf for the Shape ADT, inferSchema is vestigial. Marked @deprecated with a migration pointer to renderJsonSchema(shapeOf(value)). Removal scheduled for 1.0 per the "freeze the shape API" target. REPL and MCP callsites migrate in steps 6 and 7.

Also verified via exploratory tests (not committed, cleanup-only):
- SOptional(SOptional(x)) collapses at the factory level AND through _lookupField's recursion-then-factory-wrap, so stacked optionals cannot exist from inference.
- mergeSchemaWithData never produces stacked optionals either: the data-side optional branch unwraps before merging inners.

Other self-review findings deferred:
- CLI guard matrix (7 mode-combo rejections) is accreting. Noted in project_lambe_cli_test_matrix memory as a post-4-tracks refactor.
- Validation errors aren't structured like OutputShapeError. Deliberately not forcing them into that mold; they're a different class of problem (input validation vs output serialization).
- CLI integration tests for --schema / --print-shape deferred to step 8.

Quality gates: dart analyze clean, 1436 tests still pass, dart format clean, pana 160/160.

Migrate REPL's :schema command from inferSchema-based output to the 0.9.0 schema infrastructure, and add :print-shape.

Session state
New `Shape? activeSchema` variable in runRepl. Loaded by :schema <path>, queried by :schema (no arg), used to validate future data loads.

:schema [path]
With a path: loads the schema via loadSchemaFromFile, stores it on the session. If data is currently loaded, runs mergeSchemaWithData on the fly as a structural validation check; reports "Schema loaded (agrees with current data)" or "Schema loaded, but disagrees with current data: <path>: ...". No path: prints the active schema via renderJsonSchema, or the no-schema-loaded message.

:print-shape
New command. Prints shapeOf(currentData) as JSON Schema. The REPL analog of the CLI --print-shape; replaces the old :schema (no arg) behavior.

:load <file> re-validates against the active schema
When a schema is loaded and the user switches data via :load, runs mergeSchemaWithData again and warns on disagreement. Keeps the REPL session honest across data changes.

Completer
Added flatten-cells and print-shape to the _replCommands list in completer.dart so tab completion on bare `:` offers the new commands alongside the old ones. 11 total now (was 9).

:help updated to document both :schema forms and :print-shape.

inferSchema callsite removed from REPL. The legacy function stays in lib/src/output.dart as @deprecated; MCP migrates in step 7.

Manual REPL verification (interactive, can't be automated without a TTY seam):
* :print-shape emits JSON Schema for the data.
* :schema <path> loads, reports agreement / disagreement vs data.
* :schema (no arg) prints the active schema.
* :load <file> re-validates against the active schema.
* :help lists the new commands.
* Tab completion on bare `:` offers all 11 commands.

Test update: completer_test.dart "all commands on bare colon" updated from expecting 9 to expecting 11, plus explicit checks for flatten-cells and print-shape.

Quality gates: dart analyze clean, 1436 tests pass, dart format clean, pana 160/160.

Step 6 of 9. Next: MCP server.


Three MCP surface changes aligning with the CLI schema work. lambe_query: new schema parameter Optional inline JSON Schema string. When provided, data is parsed and validated against the schema before the query runs; a structural disagreement returns an error with the path. Agents wanting to fail-fast on unexpected shapes now have a first-class way to do it. Threaded through _handleQuery via parseJsonSchema + mergeSchemaWithData. lambe_schema renamed to lambe_print_shape Tool rename aligning with the CLI rename (--schema -> --print-shape). Output format changed from type-name-string JSON (e.g. `{"age": "number"}`) to canonical JSON Schema (e.g. `{"type": "object", "properties": {"age": {"type": "number"}}, ...}`). The new output round-trips with lambe_query's schema parameter, lambe_check, and the parseJsonSchema library function. This is a breaking change for agents that hardcoded the old tool name; the description calls it out explicitly. lambe_check: new tool Validates data against a JSON Schema subset without running a query. Returns `{"ok": true}` on agreement or `{"ok": false, "error": "..."}` with the disagreement path. Intended for API-contract checks, CI gates, and agents that want to verify fixtures before running queries. Server instructions updated The initial MCP instructions string now lists all four tools by name with one-line descriptions of when to use each. Helps agents pick the right tool without having to call tools/list. AGENTS.md updated Tool list in the top-level agent guide mirrors the new surface. Smoke-tested end-to-end via JSON-RPC: - tools/list returns [lambe_query, lambe_print_shape, lambe_check, lambe_assert]. - lambe_print_shape on a users object emits valid JSON Schema with required set from the data's concrete keys. - lambe_check with matching schema returns {"ok": true}. - lambe_check with mismatched schema returns {"ok": false, "error": "schema disagreement at $.age: ..."}. 
- lambe_query with a schema parameter that disagrees with data returns isError=true before running the query. inferSchema is no longer referenced from bin/mcp_server.dart. The legacy function remains in lib/src/output.dart marked @deprecated; all repo callsites have now migrated. Quality gates: dart analyze clean, 1436 tests pass, dart format clean, pana 160/160. Step 7 of 9. Next: CLI integration tests for --schema / --print-shape.
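A rough sketch of the kind of structural check `lambe_check` is described as performing, written in Python for illustration. It handles only `type`, `properties`, and `required`; the `check` function, the type table, and the wording after the path in the error string are assumptions of this sketch, not the server's actual output.

```python
# Illustrative mapping from JSON Schema type names to Python types.
JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def check(data, schema, path="$"):
    """Return {"ok": True} or {"ok": False, "error": ...} naming the first bad path."""
    expected = schema.get("type")
    if expected is not None:
        matches = isinstance(data, JSON_TYPES[expected])
        # Python treats bool as an int subclass; JSON does not treat it as a number.
        if expected == "number" and isinstance(data, bool):
            matches = False
        if not matches:
            return {"ok": False,
                    "error": f"schema disagreement at {path}: expected {expected}"}
    if isinstance(data, dict):
        for key in schema.get("required", []):
            if key not in data:
                return {"ok": False,
                        "error": f"schema disagreement at {path}.{key}: missing"}
        for key, sub in schema.get("properties", {}).items():
            if key in data:
                result = check(data[key], sub, f"{path}.{key}")
                if not result["ok"]:
                    return result
    return {"ok": True}


schema = {"type": "object", "properties": {"age": {"type": "string"}}}
print(check({"age": 30}, schema)["error"])
```

Returning a plain result object instead of raising keeps the sketch close to the `{"ok": ...}` contract the commit describes for agents and CI gates.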

Nine new end-to-end tests in test/cli_integration_test.dart pin the schema surface at the CLI layer. Each spawns `dart bin/lam.dart` and asserts on exit code, stdout, stderr. --print-shape (3 tests) 1. Emits valid JSON Schema for a typical object (parses as JSON, carries type/properties/required). 2. Round-trip: print-shape data.json > data.schema.json, then --schema data.schema.json '.' data.json succeeds. Proves the renderer + parser agree end-to-end via a real subprocess, closing the loop the library-level round-trip tests opened. 3. --print-shape + --schema is rejected (redundant combination). --schema (6 tests) 4. Explicit --schema threads into --explain inputShape; the shape trace surfaces schema-declared optional fields (email: optional<string>) that don't exist in data. 5. Sibling <data>.schema.json is auto-detected when --schema is omitted. Verifies the same schema information flows through. 6. Schema disagreement (data.age is number, schema says string) exits 1 with a path-annotated stderr message ("$.age", "string", "number" all present). 7. Schema parse error on rejected keyword (allOf) surfaces a clear diagnostic (contains "allOf" and "unsupported"). 8. Missing schema file exits 1 with "schema file not found". 9. --ndjson + --schema is rejected. These exercise the full wiring added in step 5 (CLI) plus the parser/loader/renderer library layer from steps 2-4. Library tests stay the foundation; integration tests here pin the glue. Quality gates: dart analyze clean, 1445 tests pass (was 1436, +9), dart format clean, pana 160/160. Step 8 of 9 in doc/schema-design.md. Next: docs polish — CHANGELOG 0.9.0 entry, README reframe, doc/schema.md user guide, man page examples.

Self-review of the full 0.9.0 before the docs polish surfaced a real gap: track B shipped --explain-json at the CLI but never surfaced --explain to MCP agents. The positioning pitch ("shows you what you're working with") specifically targets agents; leaving them without structured explain output undermines the track B deliverable. Framing this as "future" was reflexive, not reasoned. 40 lines of tool wiring calling existing library functions is not a future feature; it's an unfinished track. lambe_explain tool Parameters: expression (required): the query to analyze. data (optional): when provided, shape seeds from shapeOf(data). format (optional): input format for data; auto-detected if not given. schema (optional inline string): merges with shapeOf(data) for a more precise initial shape. With no data and no schema, starts from SAny. include_trivial (optional bool): surfaces trivial-result warnings (--explain-trivial equivalent). flatten_cells (optional enum): affects the writable_as summary. Returns renderExplainJson(report) — the exact same payload the CLI's --explain-json emits, with snake_case keys and nested-kind shape trees. Agents get one structured contract across surfaces. Updated the MCP server instructions to list all five tools. Updated AGENTS.md tool inventory. Smoke-tested end-to-end via JSON-RPC: - tools/list returns five tools including lambe_explain. - lambe_explain with data + expression returns a trace where .users shape is list<map<...>> and |map(.name) is list<string>. - lambe_explain with data + schema (schema declares email as optional) produces list<optional<string>> when .email is accessed — agent-advantage use case proven. - lambe_explain with include_trivial: true surfaces trivial_result warnings for sort_by(.missing). - lambe_explain with no data (expression-only) still produces a meaningful trace (length on unknown input infers SNum). 
Existing library-level tests cover the underlying renderExplainJson and explain functions; the new MCP tool is a thin wrapper. CLI subprocess tests for MCP are consistently deferred across all server tools. Quality gates: dart analyze clean, 1445 tests pass, dart format clean, pana 160/160. REPL still lacks :explain. Leaving as genuine future work: REPL users can already run queries live (sub-100ms), so the "see-before-run" need is weaker there than it is for agents. Clears the track-A step-9 prerequisite: MCP surface is now coherently covered.

Ship the 0.9.0 documentation pass: reframe the pitch to match what shipped, consolidate the scattered 0.9.0-dev CHANGELOG entries into a single coherent release section, add a user-guide for schemas, and fix stale references to the pre-rename CLI flags, deprecated library symbols, and old MCP tool names. pubspec.yaml - Version: 0.9.0 (regenerated lib/src/_version.dart). - Description reframed to the "shows you what you're working with" pitch, trimmed to fit pana's 180-char limit. - Added `schema` to topics. CHANGELOG.md - New 0.9.0 section organized by theme, not by track. Opens with the shape-feedback-loop framing. Five sections: schemas as a first-class contract, SOptional in the shape ADT, richer --explain, --ndjson, --flatten-cells, cross-surface Hint type. - Breaking changes called out explicitly: --schema renamed to --print-shape; --print-shape output format changed; MCP tool lambe_schema renamed to lambe_print_shape; Shape gains SOptional variant; ExplainWarning gains required kind param. - Deprecated section notes inferSchema scheduled for 1.0 removal. README.md - New lead: "a query language for structured data that shows you what you're working with." Drops the jq comparison from the pitch and names the actual use case ("when you don't already know the data"). - New --schema section after --explain, showing both threaded- into-explain and validation-on-load examples, plus round-trip via --print-shape. - CLI examples: --schema and --print-shape replace the stale --schema data.json (which now means something different). - Library example: shapeOf/renderJsonSchema/parseJsonSchema/ mergeSchemaWithData replace the deprecated inferSchema. - MCP tool list: five tools with their feedback-loop roles. - Docs index: added doc/schema.md. - REPL banner version bumped. DESIGN.md - MCP tool list updated to five tools. 
doc/schema.md (new) - Complete user guide for the schema feature: why-use, accepted keywords, rejected keywords, CLI/REPL/MCP/library surface, disagreement semantics, round-trip, what schemas don't do. - Clarifies the shapeOf-vs-schema division of labor. doc/lam.1.md - Added schema-checked query and schema-seeded explain examples to the EXAMPLES section. - Regenerated doc/lam.1 via tool/manpage.dart. AGENTS.md was already updated in step 7 (MCP). Quality gates: dart analyze clean, 1445 tests pass, dart format clean, pana 160/160 (description length was over 180 chars on first pass; trimmed). Completes track A. Release-ready from a code/docs perspective. What remains outside track A: install.sh + Homebrew tap for 1.0, the downstream rem/arda-web commits still unpushed, and the push of the 0.9.0-dev branch itself.

Ship the one-line installer the 0.8.0 handover called out as "the biggest single 1.0 ergonomic win." Users no longer need to know their architecture, fetch three curl commands, or use sudo. install.sh curl -fsSL https://raw.githubusercontent.com/hakimjonas/lambe/main/install.sh | sh Detects OS (Linux/macOS) and arch (x64/arm64). Resolves the latest release via the GitHub API (no auth, no JSON parser — grep+sed). Downloads lam and lam-mcp binaries into ~/.local/bin/. Verifies SHA256 against a published checksums.txt before installing; refuses to install on mismatch. Honors LAMBE_VERSION to pin a tag, LAMBE_PREFIX to change the install dir, LAMBE_BASE_URL to override the release base URL (useful for mirrors and testing), LAMBE_NO_MAN to skip the man page. Does NOT modify shell rc files. Prints a PATH reminder if the target bin dir isn't on PATH, showing the exact export line the user would add if they choose. Man page install is best-effort: if the release has a lam.1 asset (current releases do not — placeholder for a future bump), it's installed to ~/.local/share/man/man1/. Silently skipped otherwise. Release workflow: checksums.txt .github/workflows/release.yml now runs `sha256sum lam-* > checksums.txt` over the collected artifacts and uploads the manifest as a release asset. install.sh fetches this before any binary, and every binary is verified against it before install. Smoke-tested end to end with a local python HTTP server and fake artifacts: - Platform detection correctly identified linux-x64. - LAMBE_BASE_URL override worked (needed for the test). - checksums.txt parsed, expected hashes looked up per asset. - Correctly matched hashes: binaries installed with 0755 perms. - Corrupted lam-linux-x64 (hash mismatch): refused install, exited 1, wrote no files to the install prefix. - PATH reminder rendered correctly when target wasn't on PATH. 
README: new Installation section leads with the one-liner, keeps pub.dev / library / source-build options below for Dart users. CHANGELOG: new "Install ergonomics" section under 0.9.0. Still deferred: Homebrew tap (noted in handover, independent work, can be added post-0.9.0 without breaking the install story). Quality gates: dart analyze clean, 1445 tests pass, pana 160/160, install.sh `sh -n` syntax check clean.
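The verify-before-install step can be sketched as follows. This is a Python illustration of the idea, not a port of install.sh: it assumes `checksums.txt` uses the usual `sha256sum` line format (`<hex digest>  <file name>`), and the function names are hypothetical.

```python
import hashlib


def parse_checksums(text):
    """Parse sha256sum-style lines ("<hex>  <name>") into {name: hex}."""
    table = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            digest, name = parts
            # sha256sum prefixes the name with '*' in binary mode.
            table[name.strip().lstrip("*")] = digest.lower()
    return table


def verify(name, payload, table):
    """Return True only when the payload hash matches the manifest entry."""
    expected = table.get(name)
    if expected is None:
        return False  # asset missing from the manifest: refuse rather than trust
    return hashlib.sha256(payload).hexdigest() == expected


good = b"fake lam binary"
manifest = f"{hashlib.sha256(good).hexdigest()}  lam-linux-x64\n"
table = parse_checksums(manifest)
print(verify("lam-linux-x64", good, table))          # True
print(verify("lam-linux-x64", b"corrupted", table))  # False
```

Checking every asset before anything is written matches the smoke-test result above, where a corrupted binary left no files in the install prefix.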

Full audit of 0.9.0 before release. Findings and fixes: doc/lam.1.md frontmatter `source: Lambë 0.8.0` -> `0.9.0`. Not auto-generated; no CI check caught it. Regenerated doc/lam.1. pubspec.yaml Stray blank line in the dev_dependencies section removed (cosmetic; pana had no opinion). server.json + .github/workflows/release.yml MCP registry description was still the 0.8.0 "Query JSON, YAML, TOML, HCL, CSV, TSV, and Markdown" pitch. Updated both to the 0.9.0 "A query language for structured data that shows you what you're working with" framing so the MCP registry entry matches pubspec and README. The workflow's hardcoded description in the publish-mcp step now also reflects 0.9.0. tool/release_prep.sh (new) Scriptable release gate. Runs the full check matrix before tagging: * Version consistency (pubspec, _version.dart, man page frontmatter, CHANGELOG section, README banner). * File hygiene (nothing tracked that matches .gitignore patterns for secrets/benchmarks/session notes). * Dependencies (pubspec_overrides.yaml not tracked, dart pub get). * Quality gates (analyze, format, test, pana 160/160). * Documentation (doc/lam.1 synced with .md source, dart doc produces zero errors). * Release workflow (.yml present, all per-platform assets referenced, checksums.txt step present, server.json description matches pubspec). * Git state (clean working tree, tag doesn't exist yet, branch check). Exit 0 means ready to tag. Non-zero collects and reports all issues at once rather than failing on the first one — so you fix the whole list and re-run, not whack-a-mole. Usage: bash tool/release_prep.sh [version] The script flagged the doc/lam.1.md frontmatter on first run — so it's already paying for itself. The README banner check initially had a shell-word-splitting bug (grep output tokenized by whitespace meant `lambe` and `v0.9.0` became separate tokens); fixed with a while-read loop over a here-doc. What the script does NOT do: * Tag, push, or publish — those stay manual. 
This is the "am I ready?" audit, not the release itself. * Verify install.sh against a live release. Checked manually against a staged HTTP server during install.sh development; post-tag verification with LAMBE_VERSION=v0.9.0 is noted in the "Next steps" output. Post-audit state: dart analyze clean, 1445 tests pass, dart format clean, pana 160/160, man page round-trip matches. Ready to tag after the remaining uncommitted state (this commit) lands.

…gn.md Cleanup pass before pushing 0.9.0 to make sure the public repo state is free of internal-development-only content. .gitignore: add the local AI-tool session cache directory Mirrors how .idea/ and .vscode/ are already ignored — local tooling state belongs with the checkout, not the public repo. doc/schema-design.md: reframe as rationale, not internal plan The file was written as a track-A design doc in plan mode, using internal vocabulary ("Track A", "approved, ready for implementation"). That framing is meaningful mid-release but noise to a public reader: "Track A" is not documented anywhere users would see. Retitled as "Schema-typed queries — design rationale" with a pointer to doc/schema.md for user-facing content. Removed the "Tracks B/C/D" reference from the Context section. The file's value — a record of why JSON Schema subset was chosen over a Lambe DSL, why SOptional was added, why disagreement-is-error rather than schema-wins — is preserved. Audit confirmed nothing else tracked reads as internal dev content: * AGENTS.md, AI.md, DESIGN.md, ROADMAP.md — all public by intent. * No HANDOVER_*.md tracked (commit 613803e removed it; .gitignore prevents re-adding). * No bench-results-*.json tracked (.gitignore + .pubignore both catch them). * No secrets (.mcpregistry_* ignored). * No stale binaries (lam-mcp ignored). * No local dep overrides (pubspec_overrides.yaml ignored). The 0.8.0 handover plan is still in git history (commit 93271aa, removed in 613803e). Not cleaning history: the content is planning notes from a committed-then-removed workflow, not secrets, and rewrite would break existing clones. The removal commit itself documents the intent going forward.

Three new language features for Lambé queries:

1. **List literals** (`[expr, expr, ...]`)
   - New `ListConstruct` AST node holding a `List<LamExpr>` of member expressions, evaluated against the current context.
   - Parsed at atom level so it never shadows postfix indexing (`expr[i]`, which requires a prior atom on the left of `[`).
   - Plus list concatenation: `+` on two lists produces concatenation. Mixed list/scalar `+` is a type error (Lambé strictness over silent lifting); evaluator wrapper `_binaryOp` intercepts before delegating to `applyBinaryOp` for the scalar dispatcher.

2. **`//` alternative operator** (jq-style fallback)
   - New `Alternative` AST node. `a // b` returns `a`'s value if non-null, else `b`'s. `b` is only evaluated on fallback.
   - Lambé semantics differ from jq deliberately: jq fires on "null or false"; Lambé fires only on `null`. Genuine `false`/`0`/`""` pass through.
   - Right-associative; one level above `||` so `a // b // c` means `a // (b // c)`. Built by hand because Lambé's parser combinators ship `chainl1` (left-associative) only.
   - Doubles as missing-key fallback via null-propagation: `.user.email // .user.contact.email // "unknown"`.
   - The `/` binary op gets a `notFollowedBy('/')` guard so it doesn't shadow `//`.

3. **Keyword aliases for binary operators** (`and`/`or`/`tonumber`)
   - `and` parses as `&&`, `or` as `||`. Both keep word-boundary semantics so `.andy` and `.orbit` still tokenize as fields. The result `BinaryOp` node carries the canonical symbol so shape/eval don't see the alias.
   - `tonumber` parses as the canonical `to_number` pipe op. Registered as a jq-ism alias at the parser layer; shape and evaluator stay alias-unaware.

Shape inference (`shape/infer.dart`) and rendering (`shape/explain.dart`) updated for both new AST nodes.

All 1,496+ tests pass: 7 new tests for `//` (eval + parser), 5 for list literals (parser), 5 for list literals (eval), 1 for `+` list concat, 6 for jq-ism aliases.
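The difference between the two `//` semantics, and the lazy right-hand side, can be sketched in a few lines of Python. This models the behaviour described above rather than Lambé's evaluator: `alternative`, `jq_alternative`, and the null-propagating `get` helper are illustrative names, with Python's `None` standing in for `null`.

```python
def alternative(left, right_thunk):
    """Lambé-style `a // b`: fall back only when the left value is null (None)."""
    return right_thunk() if left is None else left


def jq_alternative(left, right_thunk):
    """jq-style `a // b`: fall back on null or false."""
    return right_thunk() if left is None or left is False else left


def get(data, *keys):
    """Null-propagating lookup: a missing key anywhere yields None."""
    for key in keys:
        if not isinstance(data, dict):
            return None
        data = data.get(key)
    return data


print(alternative(False, lambda: "fallback"))     # False passes through
print(jq_alternative(False, lambda: "fallback"))  # fallback

# .user.email // .user.contact.email // "unknown", nested right-associatively
doc = {"user": {"contact": {"email": "a@example.com"}}}
print(alternative(
    get(doc, "user", "email"),
    lambda: alternative(get(doc, "user", "contact", "email"), lambda: "unknown"),
))  # a@example.com
```

Passing the right side as a thunk mirrors the "only evaluated on fallback" rule: when the left value is non-null, the thunk is never called.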

Recognises common jq idioms that Lambé does not support and surfaces a targeted hint instead of the generic "expected ..." fallback. Keeps error messages short and actionable for agents trained on jq priors.

Recognised idioms:

- `.users[]` (jq array iteration) → hint to use `map(...)` for per-element work.
- `.foo?` (jq error suppression) → hint to use `has()` or a shape check.
- `..` (jq recursive descent) → hint to use explicit paths.
- `| select(pred)` (jq filter) → hint to use `filter(...)`.
- `map(select(...))` (jq filter idiom) → hint to use `filter(...)`.
- `| empty` (jq drop stage) → hint to use `filter(...)` for the intended drop semantics.
- `if/then/else/end with empty` (jq conditional drop) → same filter-based hint.
- `| if ... then ... else ... end` (jq if-as-pipe-stage) → explains Lambé's expression-only `if/then/else` rule.

Two integration points in `lib/lambe.dart`:

- `_jqIdiomHint(expression, offset)`: pattern-matches the input and returns a `String?` hint. Wired into `_formatParseErrors` before the verbose "expected ..." fallback, and into `_describeLeftover` for unparsed-remainder context.
- `_jqPipeOpHint(word)`: fires when the user writes `.x | <jq-name>` for a name Lambé doesn't have, mapping to the Lambé equivalent.

Plain typos still fall through to the existing did-you-mean (closest-match) suggestion. The hint short-circuits only when the jq idiom is recognised.

10 tests in `test/parse_error_format_test.dart` cover each idiom, including the falls-through case where did-you-mean still fires.
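A minimal sketch of the "pattern-match, else fall through" shape of such a hint function, in Python. The regexes and hint strings here are assumptions covering only four of the idioms; they are deliberately naive (they do not skip string literals, for example) and are not the patterns `_jqIdiomHint` uses.

```python
import re

# Hypothetical pattern table, checked in order; hint wording paraphrases the commit text.
_JQ_IDIOMS = [
    (re.compile(r"\bmap\(\s*select\("), "use filter(...) instead of map(select(...))"),
    (re.compile(r"\|\s*select\("), "use filter(...) instead of select(...)"),
    # Require an identifier character before [] so a bare list literal [] is left alone.
    (re.compile(r"\w\[\]"), "use map(...) for per-element work instead of [] iteration"),
    (re.compile(r"\.\."), "use explicit paths instead of recursive descent"),
]


def jq_idiom_hint(expression):
    """Return a targeted hint for a recognised jq idiom, or None to fall through."""
    for pattern, hint in _JQ_IDIOMS:
        if pattern.search(expression):
            return hint
    return None


print(jq_idiom_hint(".users[]"))
print(jq_idiom_hint(".users | fliter(.x)"))  # None: left for did-you-mean
```

Returning `None` for anything unrecognised is what lets the existing closest-match suggestion keep handling plain typos.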

Replace the six layered chainl1 calls plus the recursive _unary definition with a single pratt<LamExpr>(_postfix, [...]) call covering prefix unary -/!, the six binary precedence levels, and the right-associative // alternative. The if/then/else conditional stays inside _atom rather than as a Pratt operator because its three-branch shape doesn't fit infix dispatch.

Binding powers (low to high):

- // alternative (right-assoc): 5
- ||, or: 10
- &&, and: 20
- ==, !=: 30
- <=, >=, <, >: 40
- +, -: 50
- *, /, %: 60
- prefix -, !: 70

The / operator keeps its notFollowedBy(/) guard so it doesn't shadow the // alternative; the keyword aliases 'and' / 'or' use _kw(...) (word boundary) so .andy / .orbit don't tokenize as 'and y' / 'or bit'.

Bench numbers (tool/bench/run.dart --aot --runs 5, completer scenarios across 5 shapes x 4 sizes):

- vs rumil 0.6 + chainl1 baseline: mean +7.1%, median +5.1%
- vs rumil 0.7 + chainl1 (just rumil bump)
- vs rumil 0.7 + Pratt (this commit): mean -10.1%, median -8.7%

Net change for Lambé queries ~17% faster on the completer hot path vs the chainl1 baseline that shipped with rumil 0.6. The win comes from collapsing six chainl1 dispatch layers into one Pratt loop plus eliminating the defer(() => _unary) recursion via the explicit Prefix descriptor. The opTable fast path is not engaged here because operators are wrapped with _sym/_kw; the gain is structural.

All 1,496 lambe tests pass unchanged. No public API change (parseQuery / parsePartial signatures untouched).
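To make the binding-power table concrete, here is a toy Pratt loop in Python using the same numbers. It is a sketch of the general technique only: the tokenizer, the tuple-based AST, and all names are assumptions, and it has no connection to rumil's `pratt` combinator beyond the table it borrows.

```python
import re

# Binding powers copied from the commit message; `//` is the only right-assoc entry.
INFIX = {
    "//": (5, "right"),
    "||": (10, "left"), "or": (10, "left"),
    "&&": (20, "left"), "and": (20, "left"),
    "==": (30, "left"), "!=": (30, "left"),
    "<=": (40, "left"), ">=": (40, "left"), "<": (40, "left"), ">": (40, "left"),
    "+": (50, "left"), "-": (50, "left"),
    "*": (60, "left"), "/": (60, "left"), "%": (60, "left"),
}
PREFIX = {"-": 70, "!": 70}
ALIASES = {"and": "&&", "or": "||"}  # aliases resolve to the canonical symbol

# `//` is listed before the single-character class so it wins over `/`.
TOKEN = re.compile(r"\s*(//|\|\||&&|==|!=|<=|>=|[-+*/%<>!()]|\d+|\w+)")


def tokenize(src):
    """Split a tiny expression into operator, number, and word tokens."""
    src = src.rstrip()
    tokens, pos = [], 0
    while pos < len(src):
        match = TOKEN.match(src, pos)
        if match is None:
            raise SyntaxError(f"unexpected input at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(tokens, min_bp=0):
    """Pratt loop: parse one operand, then absorb infix ops binding at least min_bp."""
    tok = tokens.pop(0)
    if tok in PREFIX:
        left = (tok, parse(tokens, PREFIX[tok]))
    elif tok == "(":
        left = parse(tokens, 0)
        tokens.pop(0)  # consume ")"
    else:
        left = tok
    while tokens and tokens[0] in INFIX:
        bp, assoc = INFIX[tokens[0]]
        if bp < min_bp:
            break
        op = tokens.pop(0)
        # Left-assoc operands must bind strictly tighter; right-assoc may reuse bp.
        right = parse(tokens, bp if assoc == "right" else bp + 1)
        left = (ALIASES.get(op, op), left, right)
    return left


print(parse(tokenize("a // b // c")))   # ('//', 'a', ('//', 'b', 'c'))
print(parse(tokenize("a or b and c")))  # ('||', 'a', ('&&', 'b', 'c'))
```

The single `bp` versus `bp + 1` choice on the recursive call is the whole associativity mechanism, which is why one loop can replace a ladder of per-level left-associative chains plus a hand-built right-associative rule.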

Replaces the inline 13-operator infix ladder with a single call to rumil's new cFamilyPrecedence preset. Lambé-specific operators stay inline: - The right-associative `//` alternative (no C-family analogue) - Keyword aliases `and` / `or` for `&&` / `||` (Lambé extension) - The `/` notFollowedBy(/) guard, supplied via sym dispatch Functionally equivalent to the previous hand-rolled list; bench numbers within noise of pre-preset (mean -9.7% vs -10.1% on the completer matrix). All 1,496 lambe tests pass unchanged.

rumil 0.7.0 (and rumil_parsers, rumil_expressions) published to pub.dev. Lambé now resolves these from pub.dev directly rather than via the local-path override that carried us through the 0.7 development cycle. Constraints: - rumil: ^0.6.0 -> ^0.7.0 - rumil_parsers: ^0.6.0 -> ^0.7.0 - rumil_expressions: ^0.6.0 -> ^0.7.0 pubspec_overrides.yaml is removed (it's gitignored, so this is a local-file deletion only). Future contributors clone and `dart pub get` resolves real published packages. All 1,496 lambe tests pass against the published rumil 0.7 family.

Mirrors the same convention added to rumil-dart's .gitignore. Lets release-planning notes, status snapshots, and similar working-memory documents live in the repo for discoverability without ever getting committed.


…A followups Bundles steps 1–8 of LAMBE_0.9.0_PLAN with Tier A items A1–A7. The 27 per-op AST classes collapse into a single BuiltinPipeOp(name, args) backed by an extended pipe_ops.dart spec table that owns acceptance, shape inference, runtime evaluation, and parse arity on one record. Adding or renaming a pipe op is now a one-file change. As(target) keeps a dedicated AST class for its typed OutputFormat argument. REPL highlighter migrated from a 100-line hand-rolled tokenizer to a rumil_tokens LangGrammar defined in lib/src/highlight_grammar.dart. New runtime dependency: rumil_tokens ^0.1.0. Other 0.9.0 wins: - _normalize short-circuits canonical inputs (identity-pass for Map<String,Object?> / List<Object?> / scalars) - queryNdjsonString(lines, expression) convenience added - Six doc-precision fixes inlined into pipe_ops.dart and evaluator.dart (// is null-fallback; empty-list policy; unique distinguishes int/double; duplicate-key behaviour; from_entries rejects non-map / non-string-key entries explicitly; type rejects non-JSON values with a hint) - inferSchema @deprecated annotation already in place Tier A followups from the discovery session: - TSV input honors header rows the same way CSV does (input.dart now runs detectDialect with the tab delimiter forced) - String single-char indexing: .name[0] returns "a" instead of erroring (mirrors slice semantics; out-of-range returns null) - jq alias: add → sum - Stale // line removed from _jqIdiomHint doc - doc/getting-started.md pubspec snippet bumped to ^0.9.0 - doc/syntax.md bare-literal examples rewritten as runnable echo/lam invocations (every rewritten example verified) - CHANGELOG appended for both batches 1516 tests pass (1500 baseline + 16 new). pana 160/160. dart analyze clean (one pre-existing test warning at evaluator_test.dart:646).

The `lam` AOT binary built into the repo root was showing up as untracked. It is now ignored, matching the `lam-mcp` entry just below it in .gitignore.

The to_entries example in doc/syntax.md showed compact single-line output (`-> [{"key": ...}]`), but real `lam` defaults to pretty-printed JSON. Rewrote the example as a runnable echo/lam invocation matching the real output, consistent with the Tier A doc rewrites.

The implementer flagged both as out of scope after Tier A landed.

`lam --print-shape '.users' data.json` now prints the schema of the result of evaluating `.users` rather than the schema of the whole document. Pre-0.9.0 the expression was silently ignored. Composes with the existing inferShape / renderJsonSchema machinery and mirrors --explain's no-data fallback (infer against SAny when no data is available). The single-positional case is disambiguated by file existence: if rest[0] is an existing file, treat it as the file (legacy form); otherwise treat it as an expression. Plain identifier filenames aren't valid lambé queries either, so the collision case is vanishingly unlikely. Empty stdin in static-analysis modes (--explain, --print-shape) is now treated as "no data" rather than triggering a JSON parse error. Tests: 4 new cases pinning compose, legacy, no-data, and null-result behaviours. 1516 -> 1520 tests pass.
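The single-positional rule reads naturally as a small decision function. A Python sketch under the assumption that "existing file" means a regular-file check; `classify_single_positional` is a hypothetical name, not a function in bin/lam.dart.

```python
import os
import tempfile


def classify_single_positional(arg):
    """Treat the lone positional as a file if it exists on disk, else as an expression."""
    if os.path.isfile(arg):
        return ("file", arg)
    return ("expression", arg)


# Demonstrate both branches with a throwaway file.
with tempfile.NamedTemporaryFile(suffix=".json") as handle:
    print(classify_single_positional(handle.name)[0])
print(classify_single_positional(".users | length")[0])
```

The check costs one filesystem lookup and keeps the legacy `lam --print-shape data.json` form working unchanged.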

`{name: .x}` was the only spelling for object construction; `{"name": .x}` errored with a confusing "unexpected" message. Lambé's data model accepts any string as a key — the construction grammar should match. Both spellings now produce identical maps. Bare identifiers stay the canonical form for keys that are valid identifiers; JSON-string literals are the way to construct keys that are not (`{"x-axis": .a}`, `{"Content-Type": "application/json"}`, `{"my key": 1}`). Mirrors `_stringLit` minus interpolation: `\(...)` in key position is rejected with a clear message because key position is structurally not an expression position. Shorthand `{name}` continues to require a bare identifier — `{"name"}` alone would conflict with treating the JSON-string as a value-with-defaulted-key, which we don't support. Adds C6a regression test pinning `{name, tags: ["x", "y"]}` parses correctly (discovery 4.1 reported broken on 0.8.0; works post-Pratt migration; this guards against future breakage). Adds full C6b test coverage: AST equivalence with bare form, hyphenated keys, keys with spaces, mixed forms, escapes inside JSON-string keys, interpolation rejection, end-to-end query roundtrip.

`-r` / `--raw` is a silent no-op on structured output (objects, arrays, numbers, booleans, null) — only top-level string scalars get unquoted. The previous wording ("Output strings without quotes") read as a pretty-print toggle and surprised users on non-string values. Rebuild of `doc/lam.1` follows.

CHANGELOG additions span the Tier C surface: HCL N=1 uniformity (Bug fixes), markdown `text` op (new Markdown text extraction subsection), JSON-string keys in object construction (Bug fixes), -r raw semantics and the new non-goals page (Documentation precision), and the load- bearing precedent comment for the `text` op. `tool/bench/cli_bench.sh` is the C4 fact-finding harness — three cases drawn from the discovery report (50k --print-shape, filter + length, group_by) on synthetic inputs, AOT binary, min/median/max across N runs. The user runs it on their workstation; cherry-pick wins land as separate commits with measured numbers in the message.

`rumil_parsers 0.8.0` ships the JSON AST split (`JsonNumber` → `JsonInt | JsonDouble`) along with the HCL decoder fix originally adopted under the local 0.7.1 override. Lambé's constraint moves from `^0.7.0` to `^0.8.0`. Single consumer: `lib/src/schema/parser.dart`'s `_kindOf` switch case maps both `JsonInt()` and `JsonDouble()` to `'number'` — preserves the JSON Schema `type: number` semantics where lambé's schema layer reads the JSON AST directly, while letting downstream type-flow analysis specialize when it cares about the discrimination. No user-visible behavior change at the lambé surface. Tests: 1639 pass unchanged (the bump touches a single switch case that had no behavior-level dependency on the flattened representation).

End-to-end CLI is 3.3× faster on parse-bound workloads, measured against the discovery report's 0.8.0 baselines on a Linux x86_64 workstation. Most of the win is inherited from rumil 0.7's combinator work; rumil_parsers 0.8.0's JSON AST split and capture-based parsing contribute ~11% on the parse-bound cases and ~13% on group_by.

Each invariant runs as a separate `lam --assert` and reports the failing invariant by name; failures don't short-circuit, so all problems surface in one run. Picks up `./lam` if compiled, falls back to `dart run bin/lam.dart`. Invariants: - at least one H2 release entry exists - no duplicate H2 release entries - the first heading is an H2 - the latest H2 matches `pubspec.yaml`'s version
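The four invariants and the no-short-circuit reporting can be sketched in Python. This is an illustration, not the script: it assumes headings arrive as `(level, text)` pairs, that the latest release is the first H2, and that an H2's text is exactly the version string. The real script expresses each check as a separate `lam --assert` query.

```python
def lint_changelog(headings, pubspec_version):
    """Evaluate every invariant and return the names of all that failed."""
    h2 = [text for level, text in headings if level == 2]
    checks = {
        "at least one H2 release entry exists": len(h2) >= 1,
        "no duplicate H2 release entries": len(h2) == len(set(h2)),
        "the first heading is an H2": bool(headings) and headings[0][0] == 2,
        "the latest H2 matches pubspec version": bool(h2) and h2[0] == pubspec_version,
    }
    # No early exit: every failing invariant is reported in one run.
    return [name for name, ok in checks.items() if not ok]


good = [(2, "0.9.0"), (3, "Added"), (2, "0.8.0")]
bad = [(1, "Changelog"), (2, "0.8.0"), (2, "0.8.0")]
print(lint_changelog(good, "0.9.0"))  # []
print(lint_changelog(bad, "0.9.0"))
```

Collecting failures by name means one CI run shows the whole list of problems instead of only the first.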

New `lint-changelog` job compiles the AOT `lam` binary and runs the script. Reuses `dart-lang/setup-dart` like the existing jobs.

A new subsection under Markdown shows extracting every release version, the latest version, every subsection title, and the no-duplicate-H2 invariant. Closes with a pointer to `tool/lint_changelog.sh` as the in-repo example of these queries gated by `--assert` in CI.

Adds a Tooling subsection mentioning `tool/lint_changelog.sh` and the four invariants it gates.

`expect(query('[]', {}), [])` had no element type for the actual list, which dart analyze rightly flagged as inference_failure_on_collection_literal. Annotated as `<Object?>[]` to match what `query` returns. This was the last analyzer warning the project carried; lambé is now analyze-clean across the board.

…output doc/syntax.md was carrying compact-JSON output drift in 28 examples plus a few outright lies (commentary-form output like "[Alice (25), Bob (35), Carol (42)]" presented as if it were lam's output, the "32" arithmetic result that's actually "32.0", "expected bool" that's "expected boolean"). The pre-Tier-A `query → -> result` form was itself a teaching abstraction that diverged from real CLI behavior. Now every example is a `$ lam ...` invocation with the output captured by actually running it. Examples that don't reference data use `lam -n`; examples that do reference data use `data.json` (declared at the top of the doc as a save-this-file block). Copy-paste works end-to-end. Doc grew from 599 to 737 lines (+138, ~23%). The growth is from multi-line pretty-printed output that matches what users see; the compact `-> [...]` form was hiding that cost from readers.

… follow-on Re-ran tool/bench/cli_bench.sh against the new rumil_parsers (HCL AST split + common.dart capture rewrites + YAML overflow fix on top of the JSON pass). Numbers shifted modestly in the right direction: - --print-shape big.json: 744 ms → 732 ms - filter | length (50k): 747 ms → 742 ms - group_by (1k records): 34 ms → 33 ms The HCL fold-in doesn't directly help JSON workloads but the common.dart precision/capture rewrites contribute marginal gains through cleaner shared-helper paths. Total speedup vs 0.8.0 stays at ~3.3× — the headline is unchanged; the tail just got a touch tighter.

…roke Two highlighter bugs surfaced during the 0.9.0 live REPL smoke test: 1. Pipe op names (`filter`, `map`, `text`, etc.) rendered uncoloured because `lambeGrammar` only listed the language keywords (`if`, `then`, `else`, `true`, `false`, `null`, `and`, `or`). The tokenizer correctly classified `filter` as a plain identifier; the highlighter had no rule to colour it. Fixed by routing pipe op names through `LangGrammar.types` (the semantically appropriate field — they're not language keywords, they're language-defined identifiers from the user's perspective) and adding a `TypeName() => _hMagenta` case in `_colorFor`. The list is sourced from `pipe_ops.dart`'s spec table so adding an op picks up colouring automatically. `lambeGrammar` is now `final` instead of `const` because `pipeOpNames` is built at runtime from the spec table. The const was incidental; nothing depended on it. 2. Forward-typing skipped re-tokenisation: appending a character at end-of-line took a fast path (`stdout.writeCharCode`) that wrote the new character verbatim without re-running the tokenizer over the buffer. Result: typing `filter` left it plain until a subsequent edit triggered a full redraw. The fast path made sense when the highlighter was hand-rolled and per-keystroke tokenisation was expensive; with `rumil_tokens` actually being fast, the fast path was a UX bug. Fixed by always going through `_redraw` on character insertion. Keywords and pipe op names now colour as soon as the trigger character is typed.

`map(t<TAB>` now offers `text`, `to_entries`, `type` (whichever accept the element shape) instead of doing nothing useful.

Inside `map(...)` and `filter(...)`, bare pipe-op names like `text`, `length`, `to_entries` are legal expressions in lambé (sugar for `. | op`), so the completer should offer them as candidates when the user is typing a partial name without a leading `.`.

Implementation: a third remainder context parser (`_bareIdentCtx`) matches a partial identifier with no leading `.` or `|`. When the parsed AST is a `Pipe` with a parameterised op and the remainder matches a non-empty bare identifier, candidates are pipe ops accepted on the element shape of the pipe input.

Surfaced during the 0.9.0 live REPL smoke test: the new `text` op makes `map(text)` a useful and discoverable pattern, but the completer couldn't help users find it.

Five new tests pin the behaviour:

- `map(t` → t-prefix pipe ops accepted on element shape
- `filter(le` → `length` (accepts list element shape)
- `map(.t` → field completion takes precedence (dot present)
- `map(` → field completion (empty bare partial doesn't trigger)
- bare `t` at top level → no pipe-op completion (not in pipe ctx)

100/100 completer tests pass (was 95).
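The five pinned cases can be approximated with a small stand-alone sketch. This is not the completer's code: the real implementation works on the parsed AST and a `_bareIdentCtx` context parser, whereas the regular expressions, the `bareOpCandidates` name, and the caller-supplied `acceptedOps` list below are all assumptions made for illustration.

```dart
// Hypothetical stand-in for the context check: are we inside `map(` or
// `filter(` with nothing but a partial name after the open paren?
final _pipeCtx = RegExp(r'(?:map|filter)\(([^()]*)$');

// A non-empty bare identifier: no leading `.` or `|`.
final _bareIdent = RegExp(r'^[A-Za-z_][A-Za-z0-9_]*$');

/// Returns the ops from [acceptedOps] (assumed to be already filtered by
/// element shape) that match the bare partial at the end of [input].
/// An empty result means "not this context"; field completion would
/// handle those inputs in the real completer.
List<String> bareOpCandidates(String input, List<String> acceptedOps) {
  final ctx = _pipeCtx.firstMatch(input);
  if (ctx == null) return const []; // not inside map(...) / filter(...)
  final partial = ctx.group(1)!;
  if (!_bareIdent.hasMatch(partial)) return const []; // empty, or has a dot
  return [
    for (final op in acceptedOps)
      if (op.startsWith(partial)) op,
  ];
}

void main() {
  const ops = ['length', 'text', 'to_entries', 'type'];
  print(bareOpCandidates('map(t', ops)); // [text, to_entries, type]
  print(bareOpCandidates('filter(le', ops)); // [length]
  print(bareOpCandidates('map(.t', ops)); // []
  print(bareOpCandidates('map(', ops)); // []
  print(bareOpCandidates('t', ops)); // []
}
```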

…ke test Two REPL-related fixes landed during the 0.9.0 live smoke test: keyword-colouring for pipe op names (highlighter), and Tab completion for bare pipe ops inside `map(...)` / `filter(...)` (completer). Both are user-visible behaviour changes worth documenting under their existing or new REPL subsections.

Surfaced during the 0.9.0 live REPL smoke test. CHANGELOG paragraphs have soft line wraps in source ("queries\nagainst it"), and the prior empty-string-on-break behaviour produced "queriesagainst it": words ran together at every wrap. mdast-util-to-string has the same default and is widely cited as awkward for this reason.

New behaviour:

- soft_break (single newline in source, paragraph continuation) → ' '. Preserves word boundaries without imposing line structure.
- hard_break (`\` at end of line or two trailing spaces, explicit break) → '\n'. Preserves authorial intent.

Users who want a fully flat string can post-process with a whitespace collapser. Deliberate divergence from mdast-util-to-string. The op's docstring documents the choice; new tests pin both behaviours.

CommonMark parsers emit hard_break + soft_break in sequence for "hello \nworld" (the explicit break followed by the line wrap), so the hard-break test asserts the relevant invariants (newline present, words present) rather than a literal expected-string match.
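A minimal sketch of the break handling described above, assuming a node is a plain map with a `type`, an optional `text` leaf value, and optional `children`. That layout and the `textOf` name are illustrative guesses, not lambé's actual markdown model or implementation.

```dart
/// Hypothetical recursive prose extraction over map-shaped nodes.
String textOf(Map<String, Object?> node) {
  switch (node['type']) {
    case 'soft_break':
      return ' '; // line wrap inside a paragraph: keep the word boundary
    case 'hard_break':
      return '\n'; // explicit author break: keep the line structure
  }
  final text = node['text'];
  if (text is String) return text; // leaf carrying prose
  final children = node['children'];
  if (children is List) {
    // Concatenate every descendant, not just children[0].
    return children.whereType<Map<String, Object?>>().map(textOf).join();
  }
  return '';
}

void main() {
  final paragraph = <String, Object?>{
    'type': 'paragraph',
    'children': [
      {'type': 'text', 'text': 'queries'},
      {'type': 'soft_break'},
      {
        'type': 'emphasis',
        'children': [
          {'type': 'text', 'text': 'against'},
        ],
      },
      {'type': 'text', 'text': ' it'},
    ],
  };
  print(textOf(paragraph)); // queries against it
}
```

The nested `emphasis` node is why a `.children[0].text` lookup falls short: the prose lives at varying depths, so only a recursive walk recovers all of it.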

Surfaced during the 0.9.0 live REPL smoke test: `.children | map(.<TAB>` on a real markdown CHANGELOG returned no candidates, even though every visible child is a {type, level, children} map. The static shape system collapses heterogeneous lists to `SList<SAny>` (correct, conservative), which gives completion no field hints to offer.

Fix: when the static element shape resolves to `SAny` and we have the underlying `data`, navigate the actual values along the pipe's input AST and shape-of the first element. The recovered shape feeds back into the existing field-completion path. No structural change to the Shape ADT.

`_navigate` is deliberately restricted to a small AST shape (Identity / Field / Access / Index, plus shape-preserving pipe ops filter / sort / sort_by / unique / unique_by / reverse). Per-element ops like map / group_by / to_entries change the element family and are excluded: better to give up than to guess at the new shape. Completion never runs the user's query, only structural navigation, so cost stays bounded (one `[0]` access per Pipe step).

Eight new tests pin: positive cases on a heterogeneous-by-`type` list (map/filter/sort_by/reverse threading); empty list and null data fall back gracefully to no candidates.

107/107 completer tests pass.
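The sampling fallback can be sketched roughly as follows. The `Step` classes, `navigate`, and `sampledFields` are hypothetical stand-ins for the AST subset and helpers named above (Access and Index are omitted for brevity); the real `_navigate` and the Shape ADT are not shown in this PR text.

```dart
// Hypothetical, reduced AST: only what structural navigation needs.
sealed class Step {}

class Identity extends Step {}

class Field extends Step {
  Field(this.name);
  final String name;
}

/// filter / sort / sort_by / unique / unique_by / reverse.
class Preserve extends Step {
  Preserve(this.op);
  final String op;
}

/// map / group_by / to_entries: the element family changes, so give up.
class Transform extends Step {
  Transform(this.op);
  final String op;
}

/// Follows [steps] through [data] without evaluating any user expression.
/// Returns null as soon as a step cannot be followed structurally.
Object? navigate(Object? data, List<Step> steps) {
  var current = data;
  for (final step in steps) {
    final Object? next = switch ((step, current)) {
      (Identity(), final v) => v,
      (Field(:final name), final Map m) => m[name],
      (Preserve(), final List l) => l, // same elements, maybe reordered
      _ => null,
    };
    if (next == null) return null;
    current = next;
  }
  return current;
}

/// Field names of the first element, or no candidates at all.
List<String> sampledFields(Object? data, List<Step> pipeInput) {
  final list = navigate(data, pipeInput);
  if (list is! List || list.isEmpty) return const [];
  final first = list.first;
  if (first is! Map) return const [];
  return [
    for (final key in first.keys)
      if (key is String) key,
  ];
}

void main() {
  final doc = {
    'children': [
      {'type': 'heading', 'level': 1, 'children': []},
      {'type': 'paragraph', 'children': []},
    ],
  };
  print(sampledFields(doc, [Field('children'), Preserve('reverse')]));
  // [type, level, children]
  print(sampledFields(doc, [Field('children'), Transform('map')])); // []
  print(sampledFields(null, [Field('children')])); // []
}
```

Treating `reverse` as a no-op is the point of the restriction: the sample only needs some element of the right family, so shape-preserving ops never have to be executed.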

…tion fixes Two more fixes from the 0.9.0 live REPL smoke test: `text` op's break-handling change (soft → space, hard → newline) and the completer's data-sampling fallback for heterogeneous lists. Both are real user-visible behaviour changes.

`dart pub publish --dry-run` revealed three classes of dev-only content shipping to pub.dev:

1. The local `lam` AOT binary (7MB, Linux x86_64). Useless to pub consumers: they'll either `dart run` from source or `dart pub global activate`, which rebuilds. Now in `.pubignore`.
2. Scratch planning docs (`*.scratch.md`, ~75KB). Already gitignored for the working tree, but `.pubignore` doesn't inherit from `.gitignore`, so they were leaking into published payloads. Added the matching `.pubignore` rule (with a comment noting why we repeat ourselves).
3. `tool/probe_completer.dart`, a manual exploratory probe used during completer development. Test coverage in `test/completer_test.dart` has long since superseded it. Deleted.

Net: 3MB compressed → 214KB compressed. The package now contains only what users need: source, tests, docs, executables (`bin/lam.dart`, `bin/mcp_server.dart`, `tool/release_prep.sh`, `tool/manpage.dart`, `tool/gen_version.dart`, `tool/lint_changelog.sh`, `install.sh`).

server.json is the MCP registry manifest template, regenerated by .github/workflows/release.yml at release time and consumed by the MCP registry publish flow. It carries a placeholder version (`0.0.0-placeholder`) that would be actively misleading if shipped to pub.dev. No runtime code references it; only `tool/release_prep.sh` (release-time validation) and the GitHub workflow itself. Pub clients never use it; exclude it.

The audit pass flagged that lambé's typed-ADT walks (Shape, LamExpr, JsonValue) consistently use switch expressions, but four spots that walk the untyped Object? JSON model still used is-cascade if-else chains. Each had the same null/bool/num/String/List/Map shape; each fits cleanly into a Dart 3 switch with type patterns.

Touched:

- evaluator.dart#_index: list/map/string indexing
- evaluator.dart#_slice: list/string slicing
- output.dart#_describeCellKind: list/map/runtimeType labelling
- repl.dart#_colorJson: null/bool/num/string/list/map JSON colorizer

Each conversion is length-equivalent or shorter; no behavior change; exhaustively typed against the inhabited Object? cases. The pattern "is-cascades belong at the typed/untyped boundary, not in the typed core" now holds throughout.

1652/1652 tests pass.
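For readers unfamiliar with the refactor, here is the general shape of an is-cascade next to its Dart 3 equivalent. `describeCascade` and `describe` are made-up examples, not any of the four functions listed above.

```dart
/// Before: an is-cascade over the untyped JSON model.
String describeCascade(Object? v) {
  if (v == null) return 'null';
  if (v is bool) return 'bool';
  if (v is num) return 'number';
  if (v is String) return 'string';
  if (v is List) return 'list(${v.length})';
  if (v is Map) return 'map(${v.length})';
  return v.runtimeType.toString();
}

/// After: the same cases as a switch expression with type patterns.
String describe(Object? v) => switch (v) {
      null => 'null',
      bool() => 'bool',
      num() => 'number',
      String() => 'string',
      List(:final length) => 'list($length)',
      Map(:final length) => 'map($length)',
      _ => v.runtimeType.toString(),
    };

void main() {
  final samples = <Object?>[null, true, 3.5, 'hi', [1, 2], {'a': 1}];
  for (final sample in samples) {
    // Both forms agree on every inhabited JSON case.
    print('${describe(sample)} ${describeCascade(sample) == describe(sample)}');
  }
}
```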

…mendations

Two cleanups in bin/mcp_server.dart surfaced by the audit pass plus a real find while reading the file:

1. The error path of every handler built `CallToolResult(content: [TextContent(text: ...)], isError: true)` inline: eight repetitions of the same boilerplate. Extracted to `_errorResult(message)`. The non-error result builder stays inline since it's rare and doesn't benefit from the abstraction.
2. The MCP server's `instructions` and `Markdown query patterns` blocks still recommended `.children[0].text` for heading text extraction, which is structurally wrong for non-trivial markdown (nested emphasis, links, inline code). The 0.9.0 `text` op was created exactly to replace this pattern; AGENTS.md and the recipes were updated in earlier commits, but the MCP instructions text hadn't been. Now uses `text` everywhere, with an explicit note about why `.children[0].text` was wrong.

The `_handleCheck` `{"ok": false, "error": ...}` JSON-shaped error path stays inline: it's structurally different from the standard "Error: ..." prefixed `isError: true` shape and shouldn't share the helper.

1652/1652 tests pass.

…t switch `parseAst` and `queryString` shared a near-identical `switch (result) { Success() => value, Partial() | Failure() => throw ... }` pattern. Audit flagged the duplication. queryString now calls parseAst to get the AST, removing the second copy of the parse-error rendering. The shared parse path means the error message for queryString and parseAst is guaranteed to match exactly. 1652/1652 tests pass; no API surface change.
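The deduplication can be illustrated with a toy version. Only the names `parseAst`, `queryString`, `Success`, `Partial`, and `Failure` come from the commit message; the `ParseResult` hierarchy, the `_parse` stub, the string-typed "AST", and what `queryString` does with it are assumptions made so the sketch runs on its own.

```dart
// Hypothetical result type standing in for the parser library's own.
sealed class ParseResult {}

class Success extends ParseResult {
  Success(this.ast);
  final String ast; // stand-in for a real AST node
}

class Partial extends ParseResult {
  Partial(this.consumed);
  final int consumed;
}

class Failure extends ParseResult {
  Failure(this.message);
  final String message;
}

// Toy parser: rejects empty input, stops at the first space.
ParseResult _parse(String source) {
  if (source.isEmpty) return Failure('empty query');
  final space = source.indexOf(' ');
  if (space >= 0) return Partial(space);
  return Success(source);
}

/// The single place where parse results turn into values or errors.
String parseAst(String source) => switch (_parse(source)) {
      Success(:final ast) => ast,
      Partial(:final consumed) =>
        throw FormatException('unexpected input', source, consumed),
      Failure(:final message) => throw FormatException(message, source),
    };

/// Reuses parseAst instead of repeating the switch, so both entry points
/// report parse errors identically. The wrapping is a placeholder.
String queryString(String source) => 'query(${parseAst(source)})';

void main() {
  print(queryString('.name')); // query(.name)
  for (final call in [parseAst, queryString]) {
    try {
      call('a b');
    } on FormatException catch (e) {
      print(e.message); // unexpected input
    }
  }
}
```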

…pub.dev

The pre-0.9.0 audit pass + cross-vendor research found that lambé's agent-facing docs were split across AGENTS.md (CLI cheat sheet) and AI.md (natural-language → query table, syntax reference, markdown data model), but each agent platform reads only one of them, so no agent saw the full picture.

Consolidates both into a single AGENTS.md focused on pure tool-use guidance:

- "When / when not to" decision aid (from AI.md)
- Natural language → query table (the highest-leverage single artifact)
- Syntax reference (property access, pipeline ops, expressions)
- CLI flags worth knowing (-n, --null-input; --no-pretty; --explain; --print-shape; --schema; --assert; --ndjson; --flatten-cells)
- Markdown data model with the `text` op recommendation (replaces the broken `.children[0].text` pattern)
- Error patterns (null vs throw, OutputShapeError, parse error)
- Format auto-detection rules
- Library API one-liners + lambe_test matchers
- MCP server framed as the "if shell access isn't available" fallback, not the primary distribution

Excluded from the pub.dev publish payload via `.pubignore` because pub.dev consumers are Dart developers, not AI agents working in a checked-out repo. Same exclusion applies to `.claude/` and `.agents/` (skill directories that may land later). The README.md is the Dart-developer-facing surface; AGENTS.md is the agent-facing surface on GitHub.

Updated `test/doc_examples_test.dart` to parse and evaluate every `lam '...'` example in the consolidated AGENTS.md against a fixture, guarding against doc drift the same way it did for AI.md before.

38/38 doc-example tests pass.

Adds a SKILL.md following the Agent Skills open standard (agentskills.io / agentskills.so), the cross-vendor format adopted by Anthropic, Microsoft (Microsoft Agent Framework), Vercel, and others.

Key facts about the format:

- Discovery cost is ~100 tokens per skill (name + description in the system prompt at session start). Activation loads the full body only when the agent identifies a matching task.
- Format is byte-identical across vendors: YAML frontmatter (name, description, optional license/compatibility/metadata) + Markdown body, recommended ≤500 lines.
- Cross-tool: Claude (Code, .ai, API), Microsoft Agent Framework, agentskills.so registry. Gemini CLI also reads the format but is being sunset June 18 and replaced with Antigravity CLI; treat the Google angle as wobbly.

The skill body is a tighter subset of AGENTS.md, focused on "core moves" plus the markdown data model (lambé's most distinctive feature) plus the common gotchas. AGENTS.md remains the broader reference loaded by repo-rooting agents (Cursor, Copilot, Claude Code project mode); the skill is the focused entry point loaded on-demand when the agent identifies a structured-data-query task.

`.gitignore` adjusted: `.claude/*` (not `.claude/`) so the negation re-including `.claude/skills/` actually fires. Without this, git won't descend into directories ignored by name and `!` patterns can't reach inside.

`.pubignore` already excludes `.claude/` from the pub.dev publish payload; the skill is for GitHub / Agent-Skills-compatible clients, not for Dart developers running `dart pub global activate lambe`.

CI was using `dart-lang/setup-dart@v1.7.2` without an `sdk:` parameter, which floats to whatever the stable channel's latest is at job time. Locally we were on 3.11.4 and CI fetched 3.12.0, so the formatter disagreed: 3.12.0 wraps a long `test('...', () => fail(...))` call differently than 3.11.4 did, and the format job correctly flagged the drift. Pin all four jobs (analyze, format, test, lint-changelog) plus the release workflow to `sdk: 3.12.0` so local and CI agree. Reformat the single affected file under 3.12.0. Local Dart bumped to 3.12.0 to match. The same pin needs to land in rumil-dart's CI to keep the family coherent — separate PR there.

GitHub Actions runners batch a child process's stdout in a way that defeats the timing assertions: the parent test's `process.stdout.forEach` receives all four lines together at EOF, even though `lam` itself emits them line-by-line as they arrive (verified locally against TTY and file-redirected stdout). The test is a useful local smoke check that lambé's --ndjson actually flushes per line, but it isn't reliable under CI's stdio plumbing. Skip when CI=true is set; keep the assertion local. The test still runs in every developer environment.

hakimjonas added 30 commits May 2, 2026 21:14

chore: gitignore *.scratch.md for local scratch notes

24702e8

Mirrors the same convention added to rumil-dart's .gitignore. Lets release-planning notes, status snapshots, and similar working-memory documents live in the repo for discoverability without ever getting committed.

hakimjonas added 25 commits May 22, 2026 09:07

ci: gate every push on tool/lint_changelog.sh

dfdfaca

New `lint-changelog` job compiles the AOT `lam` binary and runs the script. Reuses `dart-lang/setup-dart` like the existing jobs.

docs(CHANGELOG): note self-validation tooling under 0.9.0

9f5b440

Adds a Tooling subsection mentioning `tool/lint_changelog.sh` and the four invariants it gates.

hakimjonas changed the title ~~Test/rumil 0.7~~ Lambé 0.9.0 May 23, 2026

hakimjonas added 2 commits May 23, 2026 23:46

hakimjonas merged commit ae41947 into main May 23, 2026
4 checks passed

hakimjonas deleted the test/rumil-0.7 branch May 23, 2026 21:54

Lambé 0.9.0 #7: hakimjonas merged 67 commits into main from test/rumil-0.7
