From f8375bef6254392a8437da0814f2b50a585b7a89 Mon Sep 17 00:00:00 2001
From: Hakim Jonas Ghoula
Date: Sat, 2 May 2026 21:14:49 +0200
Subject: [PATCH 01/67] Track D: --flatten-cells json for CSV/TSV, with NotWritable.hints
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

An opt-in escape hatch for CSV/TSV: non-scalar cells encoded as JSON
strings inline instead of refused. Default stays at 0.8.0's refuse
behavior.

Core
- CellPolicy { refuse, json } enum in output_format.dart
- formatOutput, canWriteAs, canWriteShapeAs, requirementFor, explain
  all take an optional CellPolicy flattenCells = CellPolicy.refuse
- Under json, requirementFor(csv/tsv) widens MustBeFlatList to
  MustBeList; the writer JSON-encodes list- or map-valued cells via
  const JsonEncoder().convert(cell); the shape check accepts any list
  at the root
- _scalarCell renamed to _cell (no longer always-scalar)
- as(fmt) combinator deliberately does NOT read the CLI/REPL/MCP
  policy; stays a pipeline-level transform so queries remain portable

NotWritable.hints (new field)
- List<String> hints on NotWritable, default const []
- _hintsFor populates one hint when: format is csv/tsv, policy is
  refuse, and the root shape is already SList (so only the cells are
  the problem). Hint text names all three surfaces:
  --flatten-cells json (CLI), :flatten-cells json (REPL),
  flatten_cells=json (MCP)
- OutputShapeError.hints getter; _render appends each hint on its own
  line after the suggestion list
- Uniform channel means CLI, REPL, and MCP render the same guidance
  without re-deriving the condition

--explain
- explain() takes CellPolicy; threads into canWriteShapeAs for the
  writability lists
- ExplainReport.flattenCells field round-trips the policy
- renderExplain emits "Cell policy: json" footer only when
  non-default, so default output is byte-for-byte unchanged

CLI (bin/lam.dart)
- --flatten-cells option, allowed [refuse, json], defaults to refuse
- Threaded into _writeWithBridge and the --explain path

REPL (lib/src/repl.dart)
- :flatten-cells session command with validation
- Threaded through _formatResult, _encode, _handleShapeError
- :help entry

MCP (bin/mcp_server.dart)
- flatten_cells parameter on lambe_query inputSchema
- Threaded into formatOutput; JSON bypass path unchanged
- hints key in _renderShapeErrorPayload

Docs
- doc/lam.1.md: --flatten-cells option and :flatten-cells REPL command
- doc/lam.1: regenerated via tool/manpage.dart
- CHANGELOG.md: new 0.9.0-dev section
- README.md: non-scalar-cells subsection + CLI example

Tests (+106)
- csv_element_shape_test.dart: 5 hint tests
- shape_explain_test.dart: 4 CellPolicy threading tests
- shape_output_consistency_test.dart: 97-case hint matrix (every
  representative value × every format, verifying hints fire exactly
  for csv/tsv refuse + SList root)

Quality gates: dart analyze clean, 1256 tests pass (was 1150), dart
format clean, pana 160/160, manpage round-trip matches.
--- CHANGELOG.md | 27 ++++ README.md | 5 + bin/lam.dart | 28 +++- bin/mcp_server.dart | 23 ++- doc/lam.1 | 6 + doc/lam.1.md | 6 + lib/lambe.dart | 3 +- lib/src/errors.dart | 8 ++ lib/src/output.dart | 67 +++++---- lib/src/output_format.dart | 23 +++ lib/src/repl.dart | 63 +++++++-- lib/src/shape/check.dart | 71 ++++++++-- lib/src/shape/explain.dart | 25 +++- test/csv_element_shape_test.dart | 179 ++++++++++++++++++++++++ test/shape_explain_test.dart | 52 +++++++ test/shape_output_consistency_test.dart | 96 +++++++++++++ 16 files changed, 625 insertions(+), 57 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 091e480..9ceda74 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,3 +1,30 @@ +## 0.9.0-dev + +In progress. + +### Added + +- **`--flatten-cells` option for CSV/TSV output.** Accepts `refuse` + (default, 0.8.0 behavior) or `json`. Under `json`, non-scalar cells + are encoded as JSON strings inline; the shape check widens + `MustBeFlatList` to `MustBeList` for csv/tsv. Available in the CLI + (`--flatten-cells`), the REPL (`:flatten-cells`), the MCP server + (`flatten_cells` parameter), and as a `CellPolicy flattenCells` + named parameter on `formatOutput`, `canWriteAs`, `canWriteShapeAs`, + `requirementFor`, and `explain`. Round-tripping the resulting CSV + back into Lambë does not recover the original structure; this is + an output-side escape hatch, not a faithful encoding. +- **`NotWritable.hints`.** A list of strings surfacing environmental + guidance (flags, settings) relevant to the mismatch. The first such + hint covers the `--flatten-cells json` escape hatch: when a + CSV/TSV request rejects under `refuse` but a list root is already + present, the hint points at the equivalent CLI flag, REPL command, + and MCP parameter. Uniform channel across CLI, REPL, and MCP so + tools don't re-derive the condition. +- **`ExplainReport.flattenCells`.** The cell policy the report was + generated under. 
`renderExplain` prints `Cell policy: json` as a + footer when non-default; default output is byte-for-byte unchanged. + ## 0.8.0 Adds element-level shape checking for CSV/TSV output, union headers diff --git a/README.md b/README.md index 0750561..674514b 100644 --- a/README.md +++ b/README.md @@ -57,6 +57,10 @@ The same flow applies to CSV and TSV (which require a list of records at the roo Suggestions surface the intent-level `as()` form. The explanation names the raw fragment (`{value: .}`, `to_entries`, etc.) the bridge composes, so `--explain` and manual composition stay available to anyone who wants them. +### Non-scalar cells in CSV/TSV + +By default, nested lists or maps in CSV/TSV cells are rejected — there is no faithful delimited rendering for them. When you need a quick export and lossy is acceptable, pass `--flatten-cells json` (CLI) or `:flatten-cells json` (REPL) to encode them as JSON strings inline. Round-tripping the resulting file back into Lambë does not recover the original structure; prefer reshaping the data query-side when fidelity matters. + ### `as(fmt)` — bridging in the query language When the shape of the target format is known up front, `as(fmt)` performs the bridge inside the query. The combinator is a no-op when the input already satisfies the target, applies a single curated bridge when one exists, and lists the candidates when more than one could apply. @@ -181,6 +185,7 @@ lam --assert '.replicas >= 2' deployment.yaml lam --to yaml '.config' data.json lam --to csv '.users | map({name, age})' data.json lam --to toml '.config | as(toml)' data.json +lam --to csv --flatten-cells json '.users' data.json # encode nested cells as JSON # Query any format (auto-detected from extension) lam '. 
| filter(.status != "closed")' issues.csv diff --git a/bin/lam.dart b/bin/lam.dart index d9bee8e..41e6b9a 100644 --- a/bin/lam.dart +++ b/bin/lam.dart @@ -34,6 +34,15 @@ void main(List arguments) { help: 'Output format', allowed: ['json', 'yaml', 'toml', 'csv', 'tsv', 'hcl'], ) + ..addOption( + 'flatten-cells', + help: + 'CSV/TSV policy for non-scalar cells. ' + 'refuse (default) rejects them; json encodes them as ' + 'JSON strings inline.', + allowed: ['refuse', 'json'], + defaultsTo: 'refuse', + ) ..addFlag( 'schema', help: 'Show data structure without values', @@ -183,7 +192,8 @@ void main(List arguments) { exit(1); } final inputShape = data == null ? const SAny() : shapeOf(data); - final report = explain(ast, inputShape); + final cellPolicy = CellPolicy.values.byName(args.option('flatten-cells')!); + final report = explain(ast, inputShape, flattenCells: cellPolicy); stdout.write(renderExplain(report)); return; } @@ -225,12 +235,14 @@ void main(List arguments) { final toArg = args.option('to'); if (toArg != null) { final outputFormat = OutputFormat.values.byName(toArg); + final cellPolicy = CellPolicy.values.byName(args.option('flatten-cells')!); _writeWithBridge( result, outputFormat, pretty: args.flag('pretty'), queryAst: queryAst, data: data, + flattenCells: cellPolicy, ); } else if (args.flag('raw') && result is String) { stdout.writeln(result); @@ -253,9 +265,12 @@ void _writeWithBridge( required bool pretty, required LamExpr queryAst, required Object? data, + required CellPolicy flattenCells, }) { try { - stdout.writeln(formatOutput(result, fmt, pretty: pretty)); + stdout.writeln( + formatOutput(result, fmt, pretty: pretty, flattenCells: flattenCells), + ); return; } on OutputShapeError catch (e) { if (!(stdin.hasTerminal && stdout.hasTerminal)) { @@ -274,7 +289,14 @@ void _writeWithBridge( final bridged = applyBridge(queryAst, choice.template); try { final Object? 
newResult = evaluateAst(bridged, data); - stdout.writeln(formatOutput(newResult, fmt, pretty: pretty)); + stdout.writeln( + formatOutput( + newResult, + fmt, + pretty: pretty, + flattenCells: flattenCells, + ), + ); } on QueryError catch (e2) { stderr.writeln('Error applying "${choice.display}": ${e2.message}'); exit(1); diff --git a/bin/mcp_server.dart b/bin/mcp_server.dart index df2d91c..86e6962 100644 --- a/bin/mcp_server.dart +++ b/bin/mcp_server.dart @@ -168,6 +168,14 @@ base class LambeServer extends MCPServer with ToolsSupport { 'list of lists).', values: ['json', 'yaml', 'toml', 'csv', 'tsv', 'hcl'], ), + 'flatten_cells': UntitledSingleSelectEnumSchema( + description: + 'CSV/TSV policy for non-scalar cells. refuse (default) ' + 'rejects list- or map-valued cells with a shape error; ' + 'json encodes them as JSON strings inline. Ignored for ' + 'other output formats.', + values: ['refuse', 'json'], + ), }, required: ['expression', 'data'], ), @@ -179,6 +187,7 @@ base class LambeServer extends MCPServer with ToolsSupport { final data = args['data'] as String; final formatStr = args['format'] as String?; final outputFormatStr = args['output_format'] as String?; + final flattenCellsStr = args['flatten_cells'] as String?; try { final format = formatStr != null ? Format.values.byName(formatStr) : null; @@ -187,10 +196,14 @@ base class LambeServer extends MCPServer with ToolsSupport { outputFormatStr != null ? OutputFormat.values.byName(outputFormatStr) : OutputFormat.json; + final flattenCells = + flattenCellsStr != null + ? CellPolicy.values.byName(flattenCellsStr) + : CellPolicy.refuse; final rendered = outputFormat == OutputFormat.json ? 
const JsonEncoder.withIndent(' ').convert(result) - : formatOutput(result, outputFormat); + : formatOutput(result, outputFormat, flattenCells: flattenCells); return CallToolResult(content: [TextContent(text: rendered)]); } on OutputShapeError catch (e) { return CallToolResult( @@ -214,11 +227,14 @@ base class LambeServer extends MCPServer with ToolsSupport { /// consumption. /// /// The payload has keys `error`, `message`, `format`, `got_shape`, - /// `original_expression`, and `suggestions`. Each entry in + /// `original_expression`, `suggestions`, and `hints`. Each entry in /// `suggestions` carries a 1-based `id`, a `label`, a `template_text` /// (the query-fragment source), an `apply_as` (the complete query /// formed by appending the template to the original expression via - /// `|`), and an `explanation`. + /// `|`), and an `explanation`. `hints` is a list of strings + /// describing environmental remedies (tool parameters, CLI flags) + /// that would resolve the mismatch without changing the query; + /// empty when no such remedy exists. String _renderShapeErrorPayload(OutputShapeError e, String expression) => const JsonEncoder.withIndent(' ').convert({ 'error': 'output_shape_mismatch', @@ -236,6 +252,7 @@ base class LambeServer extends MCPServer with ToolsSupport { 'explanation': e.suggestions[i].explanation, }, ], + 'hints': e.hints, }); final _schemaTool = Tool( diff --git a/doc/lam.1 b/doc/lam.1 index c397693..3cb7f1c 100644 --- a/doc/lam.1 +++ b/doc/lam.1 @@ -33,6 +33,9 @@ Input format. One of: json, yaml, toml, hcl, csv, tsv, markdown. Auto-detected f \fB-t\fR, \fB--to\fR \fIFMT\fR Output format. One of: json, yaml, toml, csv, tsv, hcl. Default is json. .TP +\fB--flatten-cells\fR \fIPOLICY\fR +CSV/TSV policy for non-scalar cells. \fBrefuse\fR (default) rejects list- or map-valued cells with a shape error. \fBjson\fR encodes them as JSON strings inline; the shape check correspondingly widens to accept any list at the root. 
Ignored for other output formats. +.TP \fB--schema\fR Show the data structure with type names instead of values. .TP @@ -171,6 +174,9 @@ Toggle unquoted string output. \fB:pretty\fR Toggle pretty-printing. .TP +\fB:flatten-cells\fR \fIPOLICY\fR +Set CSV/TSV cell policy for this session. One of: refuse, json. +.TP \fB:load\fR \fIfile\fR Load a different data file. .TP diff --git a/doc/lam.1.md b/doc/lam.1.md index 8c86380..d943ba7 100644 --- a/doc/lam.1.md +++ b/doc/lam.1.md @@ -41,6 +41,9 @@ If no file is given, reads from standard input. **-t**, **--to** *FMT* : Output format. One of: json, yaml, toml, csv, tsv, hcl. Default is json. +**--flatten-cells** *POLICY* +: CSV/TSV policy for non-scalar cells. **refuse** (default) rejects list- or map-valued cells with a shape error. **json** encodes them as JSON strings inline; the shape check correspondingly widens to accept any list at the root. Ignored for other output formats. + **--schema** : Show the data structure with type names instead of values. @@ -193,6 +196,9 @@ Computation on null throws: **null + 5** and **null > 3** are errors. **:pretty** : Toggle pretty-printing. +**:flatten-cells** *POLICY* +: Set CSV/TSV cell policy for this session. One of: refuse, json. + **:load** *file* : Load a different data file. 
diff --git a/lib/lambe.dart b/lib/lambe.dart index f9eaf35..10878f3 100644 --- a/lib/lambe.dart +++ b/lib/lambe.dart @@ -29,7 +29,8 @@ export 'src/ast.dart'; export 'src/errors.dart'; export 'src/input.dart' show Format, detectFormat, sniffFormat, parseInput, mdToNative; -export 'src/output.dart' show OutputFormat, formatOutput, inferSchema; +export 'src/output.dart' + show OutputFormat, CellPolicy, formatOutput, inferSchema; export 'src/shape/shape.dart' show Shape, diff --git a/lib/src/errors.dart b/lib/src/errors.dart index 0f7d337..a3b2cac 100644 --- a/lib/src/errors.dart +++ b/lib/src/errors.dart @@ -47,6 +47,10 @@ class OutputShapeError extends QueryError { /// Query-fragment suggestions that would produce a compatible shape. List get suggestions => report.suggestions; + /// Environmental hints (CLI flags, REPL settings, MCP parameters) + /// that would resolve the mismatch without altering the query. + List get hints => report.hints; + static String _render(NotWritable r) { final buf = StringBuffer(); buf.write(r.format.name.toUpperCase()); @@ -64,6 +68,10 @@ class OutputShapeError extends QueryError { buf.write(s.explanation); } } + for (final h in r.hints) { + buf.write('\n'); + buf.write(h); + } return buf.toString(); } } diff --git a/lib/src/output.dart b/lib/src/output.dart index 7c86822..89bea63 100644 --- a/lib/src/output.dart +++ b/lib/src/output.dart @@ -9,7 +9,7 @@ import 'errors.dart'; import 'output_format.dart'; import 'shape/check.dart'; -export 'output_format.dart' show OutputFormat; +export 'output_format.dart' show OutputFormat, CellPolicy; /// Format [value] as a string in the given [format]. /// @@ -19,23 +19,32 @@ export 'output_format.dart' show OutputFormat; /// For CSV/TSV, requires a list of maps, a list of lists, or a list of /// scalars. For a list of maps, headers are the union of keys across /// all rows in first-seen order; a row missing a key renders as an -/// empty cell. 
Every cell value must be a scalar: null, bool, num, -/// or string. List-of-maps or list-of-lists with non-scalar cells -/// throws [OutputShapeError]; a non-scalar cell that slips past shape -/// inference (for example via [SAny]) throws [QueryError] at -/// serialization time. -String formatOutput(Object? value, OutputFormat format, {bool pretty = true}) => - switch (format) { - OutputFormat.json => - pretty - ? const JsonEncoder.withIndent(' ').convert(value) - : const JsonEncoder().convert(value), - OutputFormat.yaml => _toYaml(value), - OutputFormat.toml => _toToml(value), - OutputFormat.csv => _toCsv(value, ','), - OutputFormat.tsv => _toCsv(value, '\t'), - OutputFormat.hcl => _toHcl(value), - }; +/// empty cell. +/// +/// Cell handling in CSV/TSV is governed by [flattenCells]. With the +/// default [CellPolicy.refuse], every cell value must be a scalar: +/// null, bool, num, or string. Non-scalar cells throw [OutputShapeError] +/// from the shape check, or [QueryError] from the writer's defensive +/// guard if the shape check was too lossy to prove incompatibility. +/// With [CellPolicy.json], non-scalar cells are encoded as JSON +/// strings inline; the shape check widens to accept any list at the +/// root. +String formatOutput( + Object? value, + OutputFormat format, { + bool pretty = true, + CellPolicy flattenCells = CellPolicy.refuse, +}) => switch (format) { + OutputFormat.json => + pretty + ? const JsonEncoder.withIndent(' ').convert(value) + : const JsonEncoder().convert(value), + OutputFormat.yaml => _toYaml(value), + OutputFormat.toml => _toToml(value), + OutputFormat.csv => _toCsv(value, ',', flattenCells), + OutputFormat.tsv => _toCsv(value, '\t', flattenCells), + OutputFormat.hcl => _toHcl(value), +}; /// Infer the structure of [value] without showing actual data. /// @@ -81,9 +90,9 @@ String _toToml(Object? value) { return serializeToml(doc); } -String _toCsv(Object? value, String delimiter) { +String _toCsv(Object? 
value, String delimiter, CellPolicy policy) { final fmt = delimiter == '\t' ? OutputFormat.tsv : OutputFormat.csv; - final report = canWriteAs(value, fmt); + final report = canWriteAs(value, fmt, flattenCells: policy); if (report is NotWritable) throw OutputShapeError(report); final list = value as List; final config = DelimitedConfig(delimiter: delimiter); @@ -96,7 +105,7 @@ String _toCsv(Object? value, String delimiter) { for (final map in maps) [ for (final h in headers) - map.containsKey(h) ? _scalarCell(map[h], fmt) : '', + map.containsKey(h) ? _cell(map[h], fmt, policy) : '', ], ]; return serializeCsvWithHeaders(headers, rows, config: config); @@ -105,29 +114,33 @@ String _toCsv(Object? value, String delimiter) { if (list.first is List) { final rows = [ for (final row in list) - [for (final cell in row as List) _scalarCell(cell, fmt)], + [for (final cell in row as List) _cell(cell, fmt, policy)], ]; return serializeCsv(rows, config: config); } return serializeCsv([ - for (final item in list) [_scalarCell(item, fmt)], + for (final item in list) [_cell(item, fmt, policy)], ], config: config); } -/// Render a single cell for CSV/TSV output, refusing any non-scalar -/// value. +/// Render a single cell for CSV/TSV output. /// -/// The shape check in [_toCsv] is the primary defense; this is a +/// Under [CellPolicy.refuse], non-scalar cells throw [QueryError]. The +/// shape check in [_toCsv] is the primary defense; this is a /// belt-and-braces guard for cases where the check was bypassed (for /// example, a [SAny] shape that the checker could not prove /// incompatible, or heterogeneous list elements that sampling missed). /// Throws [QueryError] rather than [OutputShapeError] because by this /// point the shape check has already passed: reaching here means the /// shape language was unable to prove the mismatch. -String _scalarCell(Object? 
cell, OutputFormat fmt) { +/// +/// Under [CellPolicy.json], non-scalar cells are JSON-encoded inline +/// as compact strings, and the writer never throws for shape reasons. +String _cell(Object? cell, OutputFormat fmt, CellPolicy policy) { if (cell == null) return ''; if (cell is num || cell is bool || cell is String) return '$cell'; + if (policy == CellPolicy.json) return const JsonEncoder().convert(cell); throw QueryError( '${fmt.name.toUpperCase()} cell must be a scalar, ' 'got ${_describeCellKind(cell)}.', diff --git a/lib/src/output_format.dart b/lib/src/output_format.dart index 9db1f8a..1203d40 100644 --- a/lib/src/output_format.dart +++ b/lib/src/output_format.dart @@ -22,3 +22,26 @@ enum OutputFormat { /// HCL output (root must be a map). hcl, } + +/// Policy for handling non-scalar cells in CSV/TSV output. +/// +/// Delimited formats project rows onto a flat grid of scalar cells. A +/// list-of-maps whose cells hold nested lists or maps has no faithful +/// delimited rendering. This policy controls what the writer does when +/// it encounters such a cell, and correspondingly widens what +/// [requirementFor] accepts at shape-check time. +enum CellPolicy { + /// Refuse to serialize: the shape check rejects non-scalar element + /// shapes, and the writer's defensive guard throws when a non-scalar + /// cell slips past (for example via [SAny]). This is the 0.8.0 + /// default and the safest choice. + refuse, + + /// Encode non-scalar cells as JSON strings inline. The shape check + /// accepts any list at the root; the writer JSON-encodes list- or + /// map-valued cells rather than refusing them. Round-tripping the + /// resulting CSV back into Lambë does not recover the original + /// structure; this is an output-side escape hatch, not a faithful + /// encoding. + json, +} diff --git a/lib/src/repl.dart b/lib/src/repl.dart index e695276..9d40aaf 100644 --- a/lib/src/repl.dart +++ b/lib/src/repl.dart @@ -31,6 +31,7 @@ void runRepl(Object? 
data, {OutputFormat format = OutputFormat.json}) { var outputFormat = format; var pretty = true; var raw = false; + var flattenCells = CellPolicy.refuse; final history = _loadHistory(); final rl = ReadLine( @@ -81,6 +82,19 @@ void runRepl(Object? data, {OutputFormat format = OutputFormat.json}) { pretty = !pretty; stdout.writeln('Pretty-printing: ${pretty ? "on" : "off"}'); + case 'flatten-cells' when arg != null: + final policy = + CellPolicy.values.where((p) => p.name == arg).firstOrNull; + if (policy != null) { + flattenCells = policy; + stdout.writeln('Flatten cells: ${policy.name}'); + } else { + stderr.writeln('Usage: :flatten-cells '); + } + + case 'flatten-cells': + stderr.writeln('Usage: :flatten-cells '); + case 'load' when arg != null: final loaded = _loadFile(arg); if (loaded != null) { @@ -123,6 +137,7 @@ void runRepl(Object? data, {OutputFormat format = OutputFormat.json}) { outputFormat, pretty: pretty, raw: raw, + flattenCells: flattenCells, ); if (elapsed >= 100) { stdout.writeln('[${elapsed}ms] $output'); @@ -138,6 +153,7 @@ void runRepl(Object? 
data, {OutputFormat format = OutputFormat.json}) { outputFormat: outputFormat, pretty: pretty, raw: raw, + flattenCells: flattenCells, ); } } on QueryError catch (e) { @@ -161,6 +177,7 @@ void _handleShapeError( required OutputFormat outputFormat, required bool pretty, required bool raw, + required CellPolicy flattenCells, }) { stderr.writeln('Error: ${e.message}'); if (e.suggestions.isEmpty) return; @@ -183,7 +200,13 @@ void _handleShapeError( try { final result = evaluateAst(bridged, data); stdout.writeln( - _formatResult(result, outputFormat, pretty: pretty, raw: raw), + _formatResult( + result, + outputFormat, + pretty: pretty, + raw: raw, + flattenCells: flattenCells, + ), ); } on QueryError catch (e2) { stderr.writeln('Error applying "${choice.display}": ${e2.message}'); @@ -272,21 +295,32 @@ String _formatResult( OutputFormat format, { required bool pretty, required bool raw, + required CellPolicy flattenCells, }) { if (raw && result is String) return result; if (result is List && result.length > 10) { final truncated = result.sublist(0, 10); final rest = result.length - 10; - return '${_encode(truncated, format, pretty: pretty)}\n... and $rest more'; + return '${_encode(truncated, format, pretty: pretty, flattenCells: flattenCells)}\n... and $rest more'; } - return _encode(result, format, pretty: pretty); + return _encode(result, format, pretty: pretty, flattenCells: flattenCells); } -String _encode(Object? value, OutputFormat format, {required bool pretty}) { +String _encode( + Object? value, + OutputFormat format, { + required bool pretty, + required CellPolicy flattenCells, +}) { if (format != OutputFormat.json) { - return formatOutput(value, format, pretty: pretty); + return formatOutput( + value, + format, + pretty: pretty, + flattenCells: flattenCells, + ); } if (stdout.hasTerminal && pretty) { return _colorJson(value, 0); @@ -379,16 +413,19 @@ Object? 
_loadFile(String path) { void _printHelp() { stdout.writeln('Commands:'); - stdout.writeln(' :schema Show data structure'); + stdout.writeln(' :schema Show data structure'); + stdout.writeln( + ' :to Set output format (json, yaml, toml, csv, tsv, hcl)', + ); + stdout.writeln(' :raw Toggle raw string output'); + stdout.writeln(' :pretty Toggle pretty-printing'); stdout.writeln( - ' :to Set output format (json, yaml, toml, csv, tsv, hcl)', + ' :flatten-cells CSV/TSV cell policy (refuse, json)', ); - stdout.writeln(' :raw Toggle raw string output'); - stdout.writeln(' :pretty Toggle pretty-printing'); - stdout.writeln(' :load Load a different data file'); - stdout.writeln(' :history Show query history'); - stdout.writeln(' :help Show this help'); - stdout.writeln(' :quit, :q Exit'); + stdout.writeln(' :load Load a different data file'); + stdout.writeln(' :history Show query history'); + stdout.writeln(' :help Show this help'); + stdout.writeln(' :quit, :q Exit'); stdout.writeln(); stdout.writeln('Shortcuts: Tab for completion, Up/Down for history'); } diff --git a/lib/src/shape/check.dart b/lib/src/shape/check.dart index 082e36e..4c1d97c 100644 --- a/lib/src/shape/check.dart +++ b/lib/src/shape/check.dart @@ -130,13 +130,23 @@ final class MustBeFlatList extends ShapeRequirement { } /// The requirement for each supported [OutputFormat]. -ShapeRequirement requirementFor(OutputFormat format) => switch (format) { +/// +/// [flattenCells] relaxes the cell-shape requirement for CSV/TSV. When +/// [CellPolicy.json], a list-of-maps or list-of-lists with non-scalar +/// cells is accepted at shape-check time because the writer will +/// JSON-encode those cells inline. 
+ShapeRequirement requirementFor( + OutputFormat format, { + CellPolicy flattenCells = CellPolicy.refuse, +}) => switch (format) { OutputFormat.json => const AnyShape(), OutputFormat.yaml => const AnyShape(), OutputFormat.toml => const MustBeMap(), OutputFormat.hcl => const MustBeMap(), - OutputFormat.csv => const MustBeFlatList(), - OutputFormat.tsv => const MustBeFlatList(), + OutputFormat.csv || OutputFormat.tsv => + flattenCells == CellPolicy.json + ? const MustBeList() + : const MustBeFlatList(), }; /// Report returned by [canWriteAs]. @@ -158,7 +168,9 @@ final class Writable extends ShapeReport { /// /// Carries the target [format], the actual [got] shape, the expected /// [required], and a non-empty list of [suggestions] the user can append -/// to their query to produce a shape the format accepts. +/// to their query to produce a shape the format accepts. [hints] surface +/// environmental remedies (CLI flags, REPL settings, MCP parameters) +/// that would change the outcome without modifying the query itself. final class NotWritable extends ShapeReport { /// The output format that was requested. final OutputFormat format; @@ -172,12 +184,20 @@ final class NotWritable extends ShapeReport { /// Query-fragment suggestions that would produce a compatible shape. final List suggestions; + /// Environmental guidance for the consumer (CLI flags, REPL + /// settings, MCP parameters) that would resolve the mismatch without + /// altering the query. Populated when a configuration knob exists; + /// empty otherwise. Suggestions modify the query, hints modify the + /// invocation. + final List hints; + /// Creates a [NotWritable] report. const NotWritable({ required this.format, required this.got, required this.required, required this.suggestions, + this.hints = const [], }); } @@ -273,28 +293,63 @@ final class Remediation { /// Returns [Writable] if the value's shape satisfies the format's /// requirement, otherwise [NotWritable] with suggestions. 
/// +/// [flattenCells] widens the CSV/TSV element-shape requirement; see +/// [requirementFor]. +/// /// Cost is dominated by [shapeOf] on [value], which is bounded by /// structural depth rather than element count. -ShapeReport canWriteAs(Object? value, OutputFormat format) { +ShapeReport canWriteAs( + Object? value, + OutputFormat format, { + CellPolicy flattenCells = CellPolicy.refuse, +}) { final shape = shapeOf(value); - return canWriteShapeAs(shape, format); + return canWriteShapeAs(shape, format, flattenCells: flattenCells); } /// Shape-only variant of [canWriteAs]. /// /// Prefer this when a [Shape] is already available, for example from /// [inferShape] over a query AST, to avoid re-inferring from a value. -ShapeReport canWriteShapeAs(Shape shape, OutputFormat format) { - final req = requirementFor(format); +ShapeReport canWriteShapeAs( + Shape shape, + OutputFormat format, { + CellPolicy flattenCells = CellPolicy.refuse, +}) { + final req = requirementFor(format, flattenCells: flattenCells); if (req.accepts(shape)) return const Writable(); return NotWritable( format: format, got: shape, required: req, suggestions: _suggestionsFor(shape, format), + hints: _hintsFor(shape, format, flattenCells), ); } +/// Environmental remedies for a shape/format/policy mismatch. +/// +/// Currently one class fires: a CSV/TSV request under the default +/// [CellPolicy.refuse] where the root is already a list, so only the +/// cells are the problem. Switching to [CellPolicy.json] would accept +/// the value as-is. Hints are surfaced via [NotWritable.hints] and +/// rendered in [OutputShapeError]'s message, REPL, and MCP payload by +/// their respective consumers. +List _hintsFor(Shape got, OutputFormat format, CellPolicy policy) { + if (policy != CellPolicy.refuse) return const []; + if (format != OutputFormat.csv && format != OutputFormat.tsv) { + return const []; + } + if (got is! 
SList) return const []; + // At this point the list root is fine; the rejection must be + // element-level. Flipping to json would accept. + return const [ + 'Or pass --flatten-cells json (CLI) / :flatten-cells json (REPL) / ' + 'flatten_cells=json (MCP) to encode non-scalar cells as JSON ' + 'strings inline.', + ]; +} + List _suggestionsFor(Shape got, OutputFormat format) => switch (( got, format, diff --git a/lib/src/shape/explain.dart b/lib/src/shape/explain.dart index ba3da72..d70a295 100644 --- a/lib/src/shape/explain.dart +++ b/lib/src/shape/explain.dart @@ -69,18 +69,35 @@ final class ExplainReport { /// nothing was flagged. final List warnings; + /// The CSV/TSV cell policy the report was generated under. Default + /// is [CellPolicy.refuse]; pass [CellPolicy.json] to [explain] to + /// get writability lists that reflect the widened element-shape + /// requirement. + final CellPolicy flattenCells; + /// Creates an [ExplainReport]. const ExplainReport({ required this.stages, required this.writableAs, required this.notWritableAs, this.warnings = const [], + this.flattenCells = CellPolicy.refuse, }); } /// Produce an [ExplainReport] for [expr] given [inputShape] as the /// initial context. Pass [SAny] when the input's shape is unknown. -ExplainReport explain(LamExpr expr, Shape inputShape) { +/// +/// [flattenCells] widens the CSV/TSV element-shape requirement so the +/// report's writability lists reflect the policy in effect at the +/// caller (CLI `--flatten-cells`, REPL `:flatten-cells`, MCP +/// `flatten_cells`). Default is [CellPolicy.refuse], matching the +/// library's conservative default. 
+ExplainReport explain( + LamExpr expr, + Shape inputShape, { + CellPolicy flattenCells = CellPolicy.refuse, +}) { final backbone = _flattenPipe(expr); final stages = []; final warnings = []; @@ -105,7 +122,7 @@ ExplainReport explain(LamExpr expr, Shape inputShape) { final writable = []; final notWritable = []; for (final fmt in OutputFormat.values) { - if (canWriteShapeAs(ctx, fmt) is Writable) { + if (canWriteShapeAs(ctx, fmt, flattenCells: flattenCells) is Writable) { writable.add(fmt); } else { notWritable.add(fmt); @@ -117,6 +134,7 @@ ExplainReport explain(LamExpr expr, Shape inputShape) { writableAs: writable, notWritableAs: notWritable, warnings: warnings, + flattenCells: flattenCells, ); } @@ -337,5 +355,8 @@ String renderExplain(ExplainReport report) { ); buf.write('\n'); } + if (report.flattenCells != CellPolicy.refuse) { + buf.write('Cell policy: ${report.flattenCells.name}\n'); + } return buf.toString(); } diff --git a/test/csv_element_shape_test.dart b/test/csv_element_shape_test.dart index 1fd92a5..82b8e76 100644 --- a/test/csv_element_shape_test.dart +++ b/test/csv_element_shape_test.dart @@ -129,6 +129,66 @@ void main() { }); }); + group('NotWritable.hints surface the --flatten-cells escape hatch', () { + test('csv refuse + non-flat list-of-maps: hint points at the flag', () { + final v = [ + { + 'k': [1, 2], + }, + ]; + final report = canWriteAs(v, OutputFormat.csv) as NotWritable; + expect(report.hints, isNotEmpty); + expect(report.hints.first, contains('--flatten-cells')); + expect(report.hints.first, contains(':flatten-cells')); + expect(report.hints.first, contains('flatten_cells')); + }); + + test('csv under json policy accepts the value, no hint to produce', () { + final v = [ + { + 'k': [1, 2], + }, + ]; + final report = canWriteAs( + v, + OutputFormat.csv, + flattenCells: CellPolicy.json, + ); + expect(report, isA()); + }); + + test('toml mismatch carries no --flatten-cells hint', () { + final report = + canWriteAs([1, 2, 3], 
OutputFormat.toml) as NotWritable; + expect(report.hints, isEmpty); + }); + + test( + 'csv refuse + map-rooted rejection: no hint (flag would not help)', + () { + final report = + canWriteAs({'a': 1}, OutputFormat.csv) + as NotWritable; + expect(report.hints, isEmpty); + }, + ); + + test('OutputShapeError.message includes the hint text', () { + final v = [ + { + 'k': [1, 2], + }, + ]; + try { + formatOutput(v, OutputFormat.csv); + fail('expected OutputShapeError'); + } on OutputShapeError catch (e) { + expect(e.message, contains('--flatten-cells')); + expect(e.hints, isNotEmpty); + } + }); + }); + group('Defensive writer guard uses descriptive type names', () { test('list cell fires _scalarCell with "list" in the message', () { final heteroRows = [ @@ -146,6 +206,125 @@ void main() { }); }); + group('CSV/TSV with CellPolicy.json encodes non-scalar cells inline', () { + final listOfMapsWithListValue = [ + { + 'key': 'items', + 'value': [1, 2, 3], + }, + ]; + final listOfMapsWithMapValue = [ + { + 'key': 'first', + 'value': {'nested': 'x'}, + }, + ]; + final listOfListsWithListElement = [ + [ + 1, + [2, 3], + ], + ]; + + for (final fmt in [OutputFormat.csv, OutputFormat.tsv]) { + test('${fmt.name}: list-valued cell JSON-encodes', () { + final out = formatOutput( + listOfMapsWithListValue, + fmt, + flattenCells: CellPolicy.json, + ); + expect(out, contains('[1,2,3]')); + expect(out, isNot(contains('{key:'))); + }); + + test('${fmt.name}: map-valued cell JSON-encodes', () { + final out = formatOutput( + listOfMapsWithMapValue, + fmt, + flattenCells: CellPolicy.json, + ); + // JSON's embedded double-quotes get RFC 4180 escaping (doubled + // and quote-wrapped) by the delimited writer regardless of + // delimiter. 
+ expect(out, contains('"{""nested"":""x""}"')); + }); + + test('${fmt.name}: nested-list cell JSON-encodes', () { + final out = formatOutput( + listOfListsWithListElement, + fmt, + flattenCells: CellPolicy.json, + ); + expect(out, contains('[2,3]')); + }); + + test('${fmt.name}: scalar cells still pass through unchanged', () { + final v = [ + {'a': 1, 'b': 'x'}, + {'a': 2, 'b': 'y'}, + ]; + expect( + formatOutput(v, fmt, flattenCells: CellPolicy.json), + formatOutput(v, fmt), + ); + }); + } + + test('canWriteAs widens MustBeFlatList to MustBeList under json', () { + final value = listOfMapsWithListValue; + expect(canWriteAs(value, OutputFormat.csv), isA()); + expect( + canWriteAs(value, OutputFormat.csv, flattenCells: CellPolicy.json), + isA(), + ); + }); + + test('requirementFor csv/tsv returns MustBeList under json policy', () { + expect(requirementFor(OutputFormat.csv), isA()); + expect( + requirementFor(OutputFormat.csv, flattenCells: CellPolicy.json), + isA(), + ); + expect( + requirementFor(OutputFormat.tsv, flattenCells: CellPolicy.json), + isA(), + ); + }); + + test('refuse policy is unchanged from 0.8.0 default', () { + expect( + () => formatOutput( + listOfMapsWithListValue, + OutputFormat.csv, + flattenCells: CellPolicy.refuse, + ), + throwsA(isA()), + ); + }); + + test('json policy is still a scalar-root list rejection for non-list', () { + const scalarRoot = 'hello'; + expect( + canWriteAs(scalarRoot, OutputFormat.csv, flattenCells: CellPolicy.json), + isA(), + ); + }); + + test('json policy: embedded delimiter triggers cell quoting for CSV', () { + final v = [ + { + 'k': [1, 2], + }, + ]; + final csvOut = formatOutput( + v, + OutputFormat.csv, + flattenCells: CellPolicy.json, + ); + expect(csvOut, contains('"[1,2]"')); + }); + }); + group('CSV/TSV preserve every column across heterogeneous-keyed rows', () { test('disjoint keys: both columns appear, rows fill with empties', () { final v = [ diff --git a/test/shape_explain_test.dart 
b/test/shape_explain_test.dart index eb31071..a7454d4 100644 --- a/test/shape_explain_test.dart +++ b/test/shape_explain_test.dart @@ -282,4 +282,56 @@ void main() { expect(text, isNot(contains('Warning:'))); }); }); + + group('explain: CellPolicy threads through to writability', () { + // A list of maps whose cells hold a list. Under refuse (default), + // csv/tsv are NOT writable; under json, they ARE. + const nonFlatShape = SList( + SMap({'name': SString(), 'tags': SList(SString())}), + ); + + test('default (refuse) rejects csv/tsv for non-flat list-of-maps', () { + final report = explain(_parse('.'), nonFlatShape); + expect(report.writableAs, isNot(contains(OutputFormat.csv))); + expect(report.writableAs, isNot(contains(OutputFormat.tsv))); + expect(report.notWritableAs, contains(OutputFormat.csv)); + expect(report.notWritableAs, contains(OutputFormat.tsv)); + }); + + test('json policy accepts csv/tsv for the same shape', () { + final report = explain( + _parse('.'), + nonFlatShape, + flattenCells: CellPolicy.json, + ); + expect(report.writableAs, contains(OutputFormat.csv)); + expect(report.writableAs, contains(OutputFormat.tsv)); + expect(report.notWritableAs, isNot(contains(OutputFormat.csv))); + expect(report.notWritableAs, isNot(contains(OutputFormat.tsv))); + }); + + test('report.flattenCells round-trips the requested policy', () { + final refuse = explain(_parse('.'), nonFlatShape); + expect(refuse.flattenCells, CellPolicy.refuse); + + final json = explain( + _parse('.'), + nonFlatShape, + flattenCells: CellPolicy.json, + ); + expect(json.flattenCells, CellPolicy.json); + }); + + test('renderExplain emits Cell policy footer only when non-default', () { + final refuse = explain(_parse('.'), nonFlatShape); + expect(renderExplain(refuse), isNot(contains('Cell policy:'))); + + final json = explain( + _parse('.'), + nonFlatShape, + flattenCells: CellPolicy.json, + ); + expect(renderExplain(json), contains('Cell policy: json')); + }); + }); } diff --git 
a/test/shape_output_consistency_test.dart b/test/shape_output_consistency_test.dart index 56f4b65..a1895b5 100644 --- a/test/shape_output_consistency_test.dart +++ b/test/shape_output_consistency_test.dart @@ -107,6 +107,102 @@ void main() { } }); + group('canWriteAs agrees with formatOutput under CellPolicy.json', () { + for (final entry in _representatives.entries) { + final label = entry.key; + final value = entry.value; + for (final fmt in [OutputFormat.csv, OutputFormat.tsv]) { + test('$label as ${fmt.name} with json policy', () { + final report = canWriteAs(value, fmt, flattenCells: CellPolicy.json); + Object? thrown; + try { + formatOutput(value, fmt, flattenCells: CellPolicy.json); + } catch (e) { + thrown = e; + } + + switch (report) { + case Writable(): + expect( + thrown, + isNot(isA()), + reason: + 'canWriteAs(flattenCells: json) said Writable for ' + '$label -> ${fmt.name}, but formatOutput raised ' + 'OutputShapeError. Under json policy the writer ' + 'must accept any list shape the check accepts.', + ); + case NotWritable(): + expect( + thrown, + isA(), + reason: + 'canWriteAs(flattenCells: json) said NotWritable ' + 'for $label -> ${fmt.name}, but formatOutput did ' + 'not raise OutputShapeError. Widened check and ' + 'widened writer must agree on rejection too.', + ); + } + }); + } + } + }); + + group('NotWritable.hints fire exactly for CSV/TSV refuse + SList root', () { + for (final entry in _representatives.entries) { + final label = entry.key; + final value = entry.value; + for (final fmt in OutputFormat.values) { + test('$label as ${fmt.name} under refuse', () { + final report = canWriteAs(value, fmt); + if (report is! NotWritable) return; // hints only on rejection. 
+ final isListRoot = value is List; + final isDelimited = + fmt == OutputFormat.csv || fmt == OutputFormat.tsv; + if (isListRoot && isDelimited) { + expect( + report.hints, + isNotEmpty, + reason: + 'List-root rejection under csv/tsv refuse should surface ' + 'the --flatten-cells hint for $label -> ${fmt.name}.', + ); + expect(report.hints.first, contains('--flatten-cells')); + } else { + expect( + report.hints, + isEmpty, + reason: + 'Hint should not fire for $label -> ${fmt.name}: the flag ' + 'would not resolve this mismatch.', + ); + } + }); + } + } + + test('json policy never produces hints (nothing left to recommend)', () { + for (final entry in _representatives.entries) { + for (final fmt in [OutputFormat.csv, OutputFormat.tsv]) { + final report = canWriteAs( + entry.value, + fmt, + flattenCells: CellPolicy.json, + ); + if (report is NotWritable) { + expect( + report.hints, + isEmpty, + reason: + 'Already under json policy; no further --flatten-cells ' + 'hint should be added for ${entry.key} -> ${fmt.name}.', + ); + } + } + } + }); + }); + group('Writer never silently stringifies non-scalar CSV/TSV cells', () { final offenders = { 'list of maps with a list-valued cell': [ From ad449db07e937babded29e4d3772420491e51967 Mon Sep 17 00:00:00 2001 From: Hakim Jonas Ghoula Date: Sat, 2 May 2026 21:24:04 +0200 Subject: [PATCH 02/67] Track D audit: structured Hint type instead of baked multi-surface string Pre-commit audit on track D caught a real bug in NotWritable.hints before track B cements the pattern. The problem: hints were List with CLI/REPL/MCP syntax baked into a single string. An MCP agent receiving the error got "--flatten-cells json (CLI) / :flatten-cells json (REPL) / flatten_cells=json (MCP)" as an undifferentiated blob, and had to string-parse to find the actionable parameter. A REPL user saw CLI flag syntax they could not type; a CLI user saw REPL colon-commands. 

The fix: structured Hint type in lib/src/shape/check.dart, exported
from package:lambe/lambe.dart. Each hint carries label, cliFlag,
replCommand, mcpParameter (a (String, String) record), and
explanation. Each surface renders only its native form:

- OutputShapeError.message: no hints baked in (stays
  surface-neutral).
- CLI (bin/lam.dart): writes "Or pass ${cliFlag}: ${explanation}" to
  stderr after the error message, via _writeHintsCli.
- REPL (lib/src/repl.dart): writes "Or run ${replCommand}:
  ${explanation}" in _handleShapeError, before the bridge prompt.
- MCP (bin/mcp_server.dart): emits structured JSON {label, parameter,
  value, explanation} in the payload.

Tests updated to match:
- csv_element_shape_test.dart hint tests now check Hint fields.
- shape_output_consistency_test.dart hint matrix pins cliFlag value.
- Added an explicit assertion that OutputShapeError.message does NOT
  bake any of the three surface syntax forms.

1256 tests pass, pana 160/160.

Not tested: surface-level rendering (that CLI stderr contains the
hint line, that MCP payload shape matches). Manually verified; a
regression-proof test belongs in the end-of-four-tracks audit.
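
For reviewers, a minimal sketch of the per-surface rendering described
in this message. The Hint class below is a local stand-in that mirrors
the fields added by this patch so the snippet runs without importing
package:lambe; renderCli, renderRepl, renderMcp and flattenHint are
hypothetical names used only for illustration, not the functions in
bin/lam.dart, lib/src/repl.dart or bin/mcp_server.dart.

```dart
import 'dart:convert';

// Stand-in mirroring the Hint fields from this patch (assumption: the
// real class lives in lib/src/shape/check.dart).
final class Hint {
  final String label;
  final String cliFlag;
  final String replCommand;
  final (String, String) mcpParameter;
  final String explanation;

  const Hint({
    required this.label,
    required this.cliFlag,
    required this.replCommand,
    required this.mcpParameter,
    required this.explanation,
  });
}

// The one hint this patch produces, copied from _hintsFor.
const flattenHint = Hint(
  label: 'Flatten non-scalar cells',
  cliFlag: '--flatten-cells json',
  replCommand: ':flatten-cells json',
  mcpParameter: ('flatten_cells', 'json'),
  explanation: 'Encodes list- or map-valued cells as JSON strings inline.',
);

// Hypothetical helpers: each surface reads only the field it can act on.
String renderCli(Hint h) => 'Or pass ${h.cliFlag}: ${h.explanation}';

String renderRepl(Hint h) => 'Or run ${h.replCommand}: ${h.explanation}';

Map<String, String> renderMcp(Hint h) => {
  'label': h.label,
  'parameter': h.mcpParameter.$1, // record fields are positional
  'value': h.mcpParameter.$2,
  'explanation': h.explanation,
};

void main() {
  print(renderCli(flattenHint));
  print(renderRepl(flattenHint));
  print(jsonEncode(renderMcp(flattenHint)));
}
```

The point of the split is that no consumer has to string-parse a
combined message: an MCP agent reads `parameter` and `value` directly,
while CLI and REPL users only ever see syntax they can type.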
--- bin/lam.dart | 13 +++++ bin/mcp_server.dart | 22 ++++++-- lib/lambe.dart | 1 + lib/src/errors.dart | 14 ++--- lib/src/repl.dart | 3 ++ lib/src/shape/check.dart | 69 ++++++++++++++++++++----- test/csv_element_shape_test.dart | 41 +++++++++------ test/shape_output_consistency_test.dart | 4 +- 8 files changed, 126 insertions(+), 41 deletions(-) diff --git a/bin/lam.dart b/bin/lam.dart index 41e6b9a..bb0e373 100644 --- a/bin/lam.dart +++ b/bin/lam.dart @@ -275,11 +275,13 @@ void _writeWithBridge( } on OutputShapeError catch (e) { if (!(stdin.hasTerminal && stdout.hasTerminal)) { stderr.writeln('Error: ${e.message}'); + _writeHintsCli(e.hints); exit(1); } final choice = _promptForRemediation(e); if (choice == null) { stderr.writeln('Error: ${e.message}'); + _writeHintsCli(e.hints); exit(1); } // Re-evaluate with the chosen bridge applied to the user's AST, @@ -307,6 +309,14 @@ void _writeWithBridge( } } +/// Render [hints] in CLI form to stderr, one per line, after the +/// shape-error message. Silent when no hints are present. +void _writeHintsCli(List hints) { + for (final h in hints) { + stderr.writeln('Or pass ${h.cliFlag}: ${h.explanation}'); + } +} + /// Interactive prompt for the remediations carried by an /// [OutputShapeError]. /// @@ -315,6 +325,9 @@ void _writeWithBridge( /// `q`, a blank line, EOF, or an index outside the valid range. Remediation? 
_promptForRemediation(OutputShapeError err) { stdout.writeln(err.message); + for (final h in err.hints) { + stdout.writeln('Or pass ${h.cliFlag}: ${h.explanation}'); + } stdout.writeln(); stdout.writeln('Apply a bridge?'); for (var i = 0; i < err.suggestions.length; i++) { diff --git a/bin/mcp_server.dart b/bin/mcp_server.dart index 86e6962..9c1f4a6 100644 --- a/bin/mcp_server.dart +++ b/bin/mcp_server.dart @@ -231,10 +231,14 @@ base class LambeServer extends MCPServer with ToolsSupport { /// `suggestions` carries a 1-based `id`, a `label`, a `template_text` /// (the query-fragment source), an `apply_as` (the complete query /// formed by appending the template to the original expression via - /// `|`), and an `explanation`. `hints` is a list of strings - /// describing environmental remedies (tool parameters, CLI flags) - /// that would resolve the mismatch without changing the query; - /// empty when no such remedy exists. + /// `|`), and an `explanation`. + /// + /// `hints` describes environmental remedies (tool parameters) that + /// would resolve the mismatch without changing the query. Each hint + /// carries a `label`, a `parameter`/`value` pair naming an argument + /// of this MCP tool, and an `explanation`. CLI-flag and REPL-command + /// forms are omitted because they do not apply to an agent calling + /// the MCP server. Empty when no such remedy exists. 
String _renderShapeErrorPayload(OutputShapeError e, String expression) => const JsonEncoder.withIndent(' ').convert({ 'error': 'output_shape_mismatch', @@ -252,7 +256,15 @@ base class LambeServer extends MCPServer with ToolsSupport { 'explanation': e.suggestions[i].explanation, }, ], - 'hints': e.hints, + 'hints': [ + for (final h in e.hints) + { + 'label': h.label, + 'parameter': h.mcpParameter.$1, + 'value': h.mcpParameter.$2, + 'explanation': h.explanation, + }, + ], }); final _schemaTool = Tool( diff --git a/lib/lambe.dart b/lib/lambe.dart index 10878f3..d141b90 100644 --- a/lib/lambe.dart +++ b/lib/lambe.dart @@ -55,6 +55,7 @@ export 'src/shape/check.dart' Writable, NotWritable, Remediation, + Hint, canWriteAs, canWriteShapeAs; export 'src/shape/explain.dart' diff --git a/lib/src/errors.dart b/lib/src/errors.dart index a3b2cac..ffa552d 100644 --- a/lib/src/errors.dart +++ b/lib/src/errors.dart @@ -47,9 +47,13 @@ class OutputShapeError extends QueryError { /// Query-fragment suggestions that would produce a compatible shape. List get suggestions => report.suggestions; - /// Environmental hints (CLI flags, REPL settings, MCP parameters) - /// that would resolve the mismatch without altering the query. - List get hints => report.hints; + /// Structured environmental remedies (invocation-level changes that + /// would resolve the mismatch). Each [Hint] carries the CLI, REPL, + /// and MCP syntax; consumers render the form that applies to their + /// surface. [message] does NOT include hints, so that a REPL user + /// does not see `--flatten-cells` CLI syntax and an MCP agent does + /// not see REPL colon-commands. 
+ List get hints => report.hints; static String _render(NotWritable r) { final buf = StringBuffer(); @@ -68,10 +72,6 @@ class OutputShapeError extends QueryError { buf.write(s.explanation); } } - for (final h in r.hints) { - buf.write('\n'); - buf.write(h); - } return buf.toString(); } } diff --git a/lib/src/repl.dart b/lib/src/repl.dart index 9d40aaf..bebed9a 100644 --- a/lib/src/repl.dart +++ b/lib/src/repl.dart @@ -180,6 +180,9 @@ void _handleShapeError( required CellPolicy flattenCells, }) { stderr.writeln('Error: ${e.message}'); + for (final h in e.hints) { + stderr.writeln('Or run ${h.replCommand}: ${h.explanation}'); + } if (e.suggestions.isEmpty) return; stdout.writeln(); stdout.writeln('Apply a bridge?'); diff --git a/lib/src/shape/check.dart b/lib/src/shape/check.dart index 4c1d97c..1486103 100644 --- a/lib/src/shape/check.dart +++ b/lib/src/shape/check.dart @@ -184,12 +184,14 @@ final class NotWritable extends ShapeReport { /// Query-fragment suggestions that would produce a compatible shape. final List suggestions; - /// Environmental guidance for the consumer (CLI flags, REPL - /// settings, MCP parameters) that would resolve the mismatch without - /// altering the query. Populated when a configuration knob exists; - /// empty otherwise. Suggestions modify the query, hints modify the - /// invocation. - final List hints; + /// Environmental guidance for the consumer that would resolve the + /// mismatch without altering the query. Each [Hint] carries the + /// invocation-syntax for every supported surface (CLI flag, REPL + /// command, MCP parameter); surfaces render the form that applies + /// to them. + /// + /// Suggestions modify the query; hints modify the invocation. + final List hints; /// Creates a [NotWritable] report. 
const NotWritable({ @@ -201,6 +203,45 @@ final class NotWritable extends ShapeReport { }); } +/// An environmental remedy: a flag, setting, or parameter change that +/// would resolve a shape mismatch without modifying the query. +/// +/// One [Hint] can be rendered as a CLI flag (`--flatten-cells json`), +/// a REPL command (`:flatten-cells json`), or an MCP parameter +/// (`flatten_cells=json`). Consumers pick the form that matches their +/// surface, so the message seen by an end user is never cluttered with +/// the other surfaces' syntax. +final class Hint { + /// Short human-readable label, for example `"Flatten non-scalar + /// cells"`. Suitable for menu items or UI chips. + final String label; + + /// CLI flag form, including value: `"--flatten-cells json"`. + final String cliFlag; + + /// REPL command form, including value: `":flatten-cells json"`. + final String replCommand; + + /// MCP tool parameter as a `(name, value)` pair: + /// `('flatten_cells', 'json')`. Consumers serialize this into their + /// own tool-argument format. + final (String, String) mcpParameter; + + /// One-line description of the change's effect, for example + /// `"Encodes list- or map-valued cells as JSON strings inline."`. + /// Must read naturally as a sentence after "Or" or "With". + final String explanation; + + /// Creates a [Hint]. + const Hint({ + required this.label, + required this.cliFlag, + required this.replCommand, + required this.mcpParameter, + required this.explanation, + }); +} + /// A query fragment that bridges a shape mismatch. /// /// A [Remediation] is intended to be composed with the user's query via @@ -333,9 +374,9 @@ ShapeReport canWriteShapeAs( /// [CellPolicy.refuse] where the root is already a list, so only the /// cells are the problem. Switching to [CellPolicy.json] would accept /// the value as-is. Hints are surfaced via [NotWritable.hints] and -/// rendered in [OutputShapeError]'s message, REPL, and MCP payload by -/// their respective consumers. 
-List _hintsFor(Shape got, OutputFormat format, CellPolicy policy) { +/// rendered into their surface's native form (CLI flag, REPL command, +/// MCP parameter) by each consumer. +List _hintsFor(Shape got, OutputFormat format, CellPolicy policy) { if (policy != CellPolicy.refuse) return const []; if (format != OutputFormat.csv && format != OutputFormat.tsv) { return const []; @@ -344,9 +385,13 @@ List _hintsFor(Shape got, OutputFormat format, CellPolicy policy) { // At this point the list root is fine; the rejection must be // element-level. Flipping to json would accept. return const [ - 'Or pass --flatten-cells json (CLI) / :flatten-cells json (REPL) / ' - 'flatten_cells=json (MCP) to encode non-scalar cells as JSON ' - 'strings inline.', + Hint( + label: 'Flatten non-scalar cells', + cliFlag: '--flatten-cells json', + replCommand: ':flatten-cells json', + mcpParameter: ('flatten_cells', 'json'), + explanation: 'Encodes list- or map-valued cells as JSON strings inline.', + ), ]; } diff --git a/test/csv_element_shape_test.dart b/test/csv_element_shape_test.dart index 82b8e76..c3c3dec 100644 --- a/test/csv_element_shape_test.dart +++ b/test/csv_element_shape_test.dart @@ -130,18 +130,24 @@ void main() { }); group('NotWritable.hints surface the --flatten-cells escape hatch', () { - test('csv refuse + non-flat list-of-maps: hint points at the flag', () { - final v = [ - { - 'k': [1, 2], - }, - ]; - final report = canWriteAs(v, OutputFormat.csv) as NotWritable; - expect(report.hints, isNotEmpty); - expect(report.hints.first, contains('--flatten-cells')); - expect(report.hints.first, contains(':flatten-cells')); - expect(report.hints.first, contains('flatten_cells')); - }); + test( + 'csv refuse + non-flat list-of-maps: hint carries all three forms', + () { + final v = [ + { + 'k': [1, 2], + }, + ]; + final report = canWriteAs(v, OutputFormat.csv) as NotWritable; + expect(report.hints, hasLength(1)); + final h = report.hints.first; + expect(h.cliFlag, '--flatten-cells 
json'); + expect(h.replCommand, ':flatten-cells json'); + expect(h.mcpParameter, ('flatten_cells', 'json')); + expect(h.label, isNotEmpty); + expect(h.explanation, isNotEmpty); + }, + ); test('csv under json policy accepts the value, no hint to produce', () { final v = [ @@ -173,7 +179,10 @@ void main() { }, ); - test('OutputShapeError.message includes the hint text', () { + test('OutputShapeError.message does NOT bake hint text', () { + // Hints are structured data; each surface renders the form that + // applies to it. The baked message stays neutral so a REPL user + // does not see --flatten-cells CLI syntax and vice versa. final v = [ { 'k': [1, 2], @@ -183,8 +192,10 @@ void main() { formatOutput(v, OutputFormat.csv); fail('expected OutputShapeError'); } on OutputShapeError catch (e) { - expect(e.message, contains('--flatten-cells')); - expect(e.hints, isNotEmpty); + expect(e.message, isNot(contains('--flatten-cells'))); + expect(e.message, isNot(contains(':flatten-cells'))); + expect(e.message, isNot(contains('flatten_cells'))); + expect(e.hints, hasLength(1)); } }); }); diff --git a/test/shape_output_consistency_test.dart b/test/shape_output_consistency_test.dart index a1895b5..106fc62 100644 --- a/test/shape_output_consistency_test.dart +++ b/test/shape_output_consistency_test.dart @@ -162,12 +162,12 @@ void main() { if (isListRoot && isDelimited) { expect( report.hints, - isNotEmpty, + hasLength(1), reason: 'List-root rejection under csv/tsv refuse should surface ' 'the --flatten-cells hint for $label -> ${fmt.name}.', ); - expect(report.hints.first, contains('--flatten-cells')); + expect(report.hints.first.cliFlag, '--flatten-cells json'); } else { expect( report.hints, From d2094e1de93057c67c4988add27e29b347978f67 Mon Sep 17 00:00:00 2001 From: Hakim Jonas Ghoula Date: Sat, 2 May 2026 21:34:19 +0200 Subject: [PATCH 03/67] Track C: --ndjson mode for line-delimited JSON input MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 
Content-Transfer-Encoding: 8bit

Evaluate each line of ndjson/jsonl input as an independent JSON
document: no shared state between lines, one compact JSON result per
line out. Covers the "tail a log" use case at the CLI layer without
touching the core "AST over in-memory tree" model.

Library
- New `queryNdjson(Iterable<String> lines, LamExpr ast)` in
  lib/lambe.dart. Lazy via `sync*` so a caller (or a pipe into `take`)
  can pull only as many results as needed; fail-fast with a `line N:`
  prefix on the first parse or evaluation error. Empty and
  whitespace-only lines are skipped silently.

CLI
- New `--ndjson` flag in bin/lam.dart. Auto-enabled when the file
  extension is `.ndjson` or `.jsonl`, consistent with the existing
  auto-detection convention for .csv, .yaml, etc.
- File input reads all lines eagerly (bounded size). Stdin uses a
  lazy `sync*` iterator so `tail -f app.log | lam --ndjson '.level'`
  emits each result as the line arrives; verified with a time-stamped
  streaming test (line N emerges with N*0.5s delay).
- Rejects combining --ndjson with --interactive, --schema, --assert,
  --explain, or a non-JSON --to value. The mode is narrow on purpose;
  other output formats and non-execution modes don't combine sensibly
  with per-line eval.

Tests
- New test/ndjson_test.dart: 14 tests covering basic per-line
  evaluation, empty-line skipping, parse/eval error annotation with
  line numbers, lazy iteration (results yielded before later error),
  and complex pipe queries per line.

Docs
- doc/lam.1.md: --ndjson option block and a "line-delimited JSON"
  example.
- doc/lam.1: regenerated.
- CHANGELOG.md: new bullet under 0.9.0-dev Added.
- README.md: CLI example.

Quality gates: dart analyze clean, 1270 tests pass (was 1256, +14),
dart format clean, pana 160/160, manpage round-trip matches.
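
For reviewers, a minimal sketch of the lazy, fail-fast per-line
pattern that queryNdjson follows. It is not the library code: perLine
is a hypothetical stand-in, the "query" is a plain Dart closure rather
than a LamExpr, and errors are reported as FormatException instead of
QueryError, so the snippet depends only on dart:convert.

```dart
import 'dart:convert';

// Hypothetical stand-in for queryNdjson. Each non-blank line is decoded
// and queried on its own; nothing is carried over between lines.
Iterable<Object?> perLine(
  Iterable<String> lines,
  Object? Function(Object? doc) query,
) sync* {
  var lineNum = 0;
  for (final raw in lines) {
    lineNum++; // blank lines are skipped but still counted
    final line = raw.trim();
    if (line.isEmpty) continue;
    final Object? doc;
    try {
      doc = jsonDecode(line);
    } on FormatException catch (e) {
      // Fail fast: annotate with the line number and stop iterating.
      throw FormatException('line $lineNum: ${e.message}');
    }
    yield query(doc);
  }
}

void main() {
  final lines = [
    '{"level": "info"}',
    '',
    '{"level": "warn"}',
    'not json',
  ];
  final results = perLine(lines, (doc) => (doc as Map)['level']);

  // Laziness: taking two results never reaches the malformed line.
  print(results.take(2).toList()); // [info, warn]

  // Draining the iterable hits line 4 and throws with a line prefix.
  try {
    results.toList();
  } on FormatException catch (e) {
    print(e.message);
  }
}
```

Because the generator body only runs as far as the consumer pulls, the
same shape works for a bounded file and for an unbounded stdin pipe.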
--- CHANGELOG.md | 11 +++ README.md | 4 + bin/lam.dart | 110 ++++++++++++++++++++++++++ doc/lam.1 | 10 +++ doc/lam.1.md | 8 ++ lib/lambe.dart | 38 +++++++++ test/ndjson_test.dart | 180 ++++++++++++++++++++++++++++++++++++++++++ 7 files changed, 361 insertions(+) create mode 100644 test/ndjson_test.dart diff --git a/CHANGELOG.md b/CHANGELOG.md index 9ceda74..29b07c9 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -4,6 +4,17 @@ In progress. ### Added +- **`--ndjson` mode for line-delimited JSON input.** Each line of the + source is parsed as an independent JSON document, the query is + evaluated per line with no shared state, and one compact JSON + result is emitted per line. Auto-enabled when the file extension is + `.ndjson` or `.jsonl`. Fail-fast on the first malformed or + unevaluable line; the line number is carried in the error. Covers + the "tail a log" use case without touching the core "AST over + in-memory tree" model. Available as a new top-level `queryNdjson` + function on the library (`Iterable -> Iterable`). + Cannot combine with `--interactive`, `--schema`, `--assert`, or + `--explain`; output is restricted to JSON. - **`--flatten-cells` option for CSV/TSV output.** Accepts `refuse` (default, 0.8.0 behavior) or `json`. Under `json`, non-scalar cells are encoded as JSON strings inline; the shape check widens diff --git a/README.md b/README.md index 674514b..6b6e220 100644 --- a/README.md +++ b/README.md @@ -187,6 +187,10 @@ lam --to csv '.users | map({name, age})' data.json lam --to toml '.config | as(toml)' data.json lam --to csv --flatten-cells json '.users' data.json # encode nested cells as JSON +# Line-delimited JSON (logs, event streams) +lam --ndjson '.user.id' events.ndjson +tail -f app.log | lam --ndjson '.level' + # Query any format (auto-detected from extension) lam '. 
| filter(.status != "closed")' issues.csv lam '.resource | map(._labels)' main.tf diff --git a/bin/lam.dart b/bin/lam.dart index bb0e373..e2a35cb 100644 --- a/bin/lam.dart +++ b/bin/lam.dart @@ -64,6 +64,13 @@ void main(List arguments) { help: 'Interactive REPL mode', negatable: false, ) + ..addFlag( + 'ndjson', + help: + 'Treat input as ndjson/jsonl: one JSON document per line, ' + 'evaluated independently. One result per line on stdout.', + negatable: false, + ) ..addFlag('help', abbr: 'h', negatable: false, help: 'Show usage'); final ArgResults args; @@ -86,6 +93,7 @@ void main(List arguments) { final isAssertMode = args.flag('assert'); final isInteractive = args.flag('interactive'); final isExplainMode = args.flag('explain'); + var isNdjsonMode = args.flag('ndjson'); final rest = args.rest; if (rest.isEmpty && !isSchemaMode && !isInteractive) { @@ -105,6 +113,45 @@ void main(List arguments) { final expression = rest.isNotEmpty ? rest[0] : '.'; final fileArgIndex = (isSchemaMode || isInteractive) && rest.length == 1 ? 0 : 1; + + // Auto-enable ndjson mode when the file extension suggests it, even + // without an explicit --ndjson flag. Consistent with the existing + // format auto-detection convention for .csv, .yaml, etc. 
+ if (!isNdjsonMode && rest.length > fileArgIndex) { + final fpath = rest[fileArgIndex].toLowerCase(); + if (fpath.endsWith('.ndjson') || fpath.endsWith('.jsonl')) { + isNdjsonMode = true; + } + } + + if (isNdjsonMode) { + if (isInteractive) { + stderr.writeln('Error: --ndjson cannot be combined with --interactive.'); + exit(1); + } + if (isSchemaMode) { + stderr.writeln('Error: --ndjson cannot be combined with --schema.'); + exit(1); + } + if (isAssertMode) { + stderr.writeln('Error: --ndjson cannot be combined with --assert.'); + exit(1); + } + if (isExplainMode) { + stderr.writeln('Error: --ndjson cannot be combined with --explain.'); + exit(1); + } + final toArg = args.option('to'); + if (toArg != null && toArg != 'json') { + stderr.writeln( + 'Error: --ndjson emits one compact JSON document per line; ' + '--to $toArg is not supported.', + ); + exit(1); + } + _runNdjson(argParser, expression, rest, fileArgIndex); + return; + } String? input; String? filePath; @@ -346,6 +393,69 @@ Remediation? _promptForRemediation(OutputShapeError err) { return err.suggestions[pick - 1]; } +/// Handle `--ndjson` mode: evaluate the query against each non-empty +/// line of input independently, emit one compact JSON document per +/// line. +/// +/// File input is read eagerly into a list of lines (sufficient for +/// typical ndjson files). Stdin is read line by line, so `tail -f | +/// lam --ndjson` works as expected. On the first line that fails to +/// parse or evaluate, writes the error with line number to stderr and +/// exits 1; subsequent lines are not evaluated. Fail-fast matches the +/// single-document CLI's semantics and jq's default behavior. 
+void _runNdjson( + ArgParser argParser, + String expression, + List rest, + int fileArgIndex, +) { + final LamExpr queryAst; + try { + queryAst = parseAst(expression); + } on QueryError catch (e) { + stderr.writeln('Error: ${e.message}'); + exit(1); + } + + Iterable lines; + if (rest.length > fileArgIndex) { + final filePath = rest[fileArgIndex]; + final file = File(filePath); + if (!file.existsSync()) { + stderr.writeln('Error: file not found: $filePath'); + exit(1); + } + lines = file.readAsLinesSync(); + } else if (stdin.hasTerminal) { + stderr.writeln('Error: --ndjson needs a file argument or piped stdin.'); + stderr.writeln(); + _usage(argParser); + exit(1); + } else { + // Lazy stdin reader so `tail -f app.log | lam --ndjson ...` emits + // each line's result as soon as it arrives, not after EOF. The + // iterable completes when readLineSync returns null (pipe closed). + lines = _stdinLines(); + } + + try { + for (final result in queryNdjson(lines, queryAst)) { + stdout.writeln(const JsonEncoder().convert(result)); + } + } on QueryError catch (e) { + stderr.writeln('Error: ${e.message}'); + exit(1); + } +} + +/// Lazy iterable over stdin lines, terminating at EOF. +Iterable _stdinLines() sync* { + String? line; + while ((line = stdin.readLineSync()) != null) { + yield line!; + } +} + /// Print usage information to stderr. void _usage(ArgParser parser) { stderr.writeln('Usage: lam [options] [file]'); diff --git a/doc/lam.1 b/doc/lam.1 index 3cb7f1c..b9de202 100644 --- a/doc/lam.1 +++ b/doc/lam.1 @@ -48,6 +48,9 @@ Evaluate the expression and exit with code 0 if the result is true, 1 if false. \fB-i\fR, \fB--interactive\fR Start the interactive REPL. Requires a file argument. .TP +\fB--ndjson\fR +Treat input as ndjson or jsonl: one JSON document per line, evaluated independently with no state shared between lines. Emits one compact JSON result per line on stdout. Auto-enabled when the file extension is \fB.ndjson\fR or \fB.jsonl\fR. 
Cannot combine with \fB--interactive\fR, \fB--schema\fR, \fB--assert\fR, or \fB--explain\fR. Output must be JSON (\fB--to json\fR or default); other \fB--to\fR values are refused. +.TP \fB-h\fR, \fB--help\fR Show usage information. .SH QUERY LANGUAGE @@ -262,6 +265,13 @@ Interactive exploration: .nf lam -i data.json .fi +.PP +Line-delimited JSON (logs, event streams): +.PP +.nf +lam --ndjson '.level' events.ndjson +tail -f app.log | lam --ndjson '.user.id' +.fi .SH SEE ALSO .PP \fBjq\fR(1) — the established JSON query tool. Lambe shares its pipeline aesthetic and extends to multi-format input with shape-aware output. diff --git a/doc/lam.1.md b/doc/lam.1.md index d943ba7..a279c8e 100644 --- a/doc/lam.1.md +++ b/doc/lam.1.md @@ -56,6 +56,9 @@ If no file is given, reads from standard input. **-i**, **--interactive** : Start the interactive REPL. Requires a file argument. +**--ndjson** +: Treat input as ndjson or jsonl: one JSON document per line, evaluated independently with no state shared between lines. Emits one compact JSON result per line on stdout. Auto-enabled when the file extension is **.ndjson** or **.jsonl**. Cannot combine with **--interactive**, **--schema**, **--assert**, or **--explain**. Output must be JSON (**--to json** or default); other **--to** values are refused. + **-h**, **--help** : Show usage information. @@ -268,6 +271,11 @@ Interactive exploration: lam -i data.json +Line-delimited JSON (logs, event streams): + + lam --ndjson '.level' events.ndjson + tail -f app.log | lam --ndjson '.user.id' + # SEE ALSO **jq**(1) — the established JSON query tool. Lambe shares its pipeline aesthetic and extends to multi-format input with shape-aware output. diff --git a/lib/lambe.dart b/lib/lambe.dart index d141b90..633b281 100644 --- a/lib/lambe.dart +++ b/lib/lambe.dart @@ -144,6 +144,44 @@ Object? queryString(String expression, String input, {Format? format}) { Object? 
queryJson(String expression, String json) => queryString(expression, json, format: Format.json); +/// Evaluate [ast] against each non-empty line of [lines] independently +/// as a JSON document. +/// +/// Each line is parsed as JSON, normalized, and evaluated in isolation. +/// No state is shared between lines; each line sees a fresh context. +/// Empty or whitespace-only lines are skipped silently. +/// +/// A parse or evaluation error on any line throws [QueryError] with a +/// `line N:` prefix and stops iteration; subsequent lines are not +/// evaluated. This is the same fail-fast semantics `lam` uses at the +/// CLI. Callers that want per-line error isolation should iterate +/// their own lines and call [evaluateAst] per line with their own +/// exception handling. +/// +/// Lazy: returns an [Iterable] that evaluates on demand. Safe to use +/// over large inputs as long as individual lines fit in memory. +Iterable<Object?> queryNdjson(Iterable<String> lines, LamExpr ast) sync* { + var lineNum = 0; + for (final raw in lines) { + lineNum++; + final line = raw.trim(); + if (line.isEmpty) continue; + final Object? data; + try { + data = input_.parseInput(line, Format.json); + } on QueryError catch (e) { + throw QueryError('line $lineNum: ${e.message}'); + } + try { + yield eval_.evaluate(ast, data); + } on EvalException catch (e) { + throw QueryError('line $lineNum: ${e.message}'); + } on QueryError catch (e) { + throw QueryError('line $lineNum: ${e.message}'); + } + } +} + /// Parse a query expression string into a [LamExpr] AST. /// /// Returns a Rumil [Result] which is [Success], [Partial], or [Failure]. diff --git a/test/ndjson_test.dart b/test/ndjson_test.dart new file mode 100644 index 0000000..41abf94 --- /dev/null +++ b/test/ndjson_test.dart @@ -0,0 +1,180 @@ +/// Tests for [queryNdjson]: per-line evaluation of JSON documents. +/// +/// Properties to pin: +/// 1. Each non-empty line is evaluated independently; no state bleeds +/// across lines. +/// 2.
Empty and whitespace-only lines are skipped silently. +/// 3. A parse error on any line throws [QueryError] with a `line N:` +/// prefix and stops iteration there (fail-fast, matching the +/// single-document CLI's semantics). +/// 4. An evaluation error is surfaced the same way. +/// 5. Lazy iteration: earlier results are produced before later +/// lines are parsed, so a failing line does not prevent the +/// already-yielded results from being consumed. +library; + +import 'package:lambe/lambe.dart'; +import 'package:test/test.dart'; + +void main() { + group('queryNdjson: basic evaluation', () { + test('one line, one result', () { + final ast = parseAst('.name'); + final results = queryNdjson(['{"name": "alice"}'], ast).toList(); + expect(results, ['alice']); + }); + + test('three lines, three results, in order', () { + final ast = parseAst('.age'); + final results = + queryNdjson([ + '{"name": "alice", "age": 30}', + '{"name": "bob", "age": 25}', + '{"name": "carol", "age": 45}', + ], ast).toList(); + expect(results, [30, 25, 45]); + }); + + test('per-line evaluation is independent', () { + // A query that would fail on an aggregate tree but succeeds on + // individual lines proves no accidental aggregation. 
+ final ast = parseAst('.x'); + final results = queryNdjson(['{"x": 1}', '{"x": 2}'], ast).toList(); + expect(results, [1, 2]); + }); + + test('filter predicate returning booleans', () { + final ast = parseAst('.age > 28'); + final results = + queryNdjson([ + '{"age": 30}', + '{"age": 25}', + '{"age": 45}', + ], ast).toList(); + expect(results, [true, false, true]); + }); + }); + + group('queryNdjson: skipping empty lines', () { + test('empty strings are skipped', () { + final ast = parseAst('.a'); + final results = queryNdjson(['{"a": 1}', '', '{"a": 2}'], ast).toList(); + expect(results, [1, 2]); + }); + + test('whitespace-only lines are skipped', () { + final ast = parseAst('.a'); + final results = + queryNdjson(['{"a": 1}', ' ', '\t', '{"a": 2}'], ast).toList(); + expect(results, [1, 2]); + }); + + test('all empty lines produces no results (but does not error)', () { + final ast = parseAst('.a'); + final results = queryNdjson(['', ' ', '\t'], ast).toList(); + expect(results, isEmpty); + }); + }); + + group('queryNdjson: error handling', () { + test('parse error annotates line number', () { + final ast = parseAst('.a'); + expect( + () => queryNdjson(['{"a": 1}', 'not json'], ast).toList(), + throwsA( + isA<QueryError>().having( + (e) => e.message, + 'message', + contains('line 2'), + ), + ), + ); + }); + + test('evaluation error annotates line number', () { + // Arithmetic on null throws at evaluation; `.age + 5` on a line + // without age fails. + final ast = parseAst('.age + 5'); + expect( + () => queryNdjson(['{"age": 30}', '{"name": "bob"}'], ast).toList(), + throwsA( + isA<QueryError>().having( + (e) => e.message, + 'message', + contains('line 2'), + ), + ), + ); + }); + + test('line numbers count empty lines too', () { + final ast = parseAst('.a'); + // Bad input on line 3 of source, still reported as line 3.
+ expect( + () => queryNdjson(['{"a": 1}', '', 'bad'], ast).toList(), + throwsA( + isA<QueryError>().having( + (e) => e.message, + 'message', + contains('line 3'), + ), + ), + ); + }); + }); + + group('queryNdjson: laziness', () { + test('yields earlier results before hitting a later error', () { + final ast = parseAst('.a'); + final it = + queryNdjson(['{"a": 1}', '{"a": 2}', 'bad line'], ast).iterator; + + expect(it.moveNext(), isTrue); + expect(it.current, 1); + expect(it.moveNext(), isTrue); + expect(it.current, 2); + expect(it.moveNext, throwsA(isA<QueryError>())); + }); + + test('only consumes as many lines as are pulled', () { + final ast = parseAst('.a'); + // Line 3 is malformed; if we only pull two, we never see the + // error. + final results = + queryNdjson([ + '{"a": 1}', + '{"a": 2}', + 'malformed', + ], ast).take(2).toList(); + expect(results, [1, 2]); + }); + }); + + group('queryNdjson: complex queries per line', () { + test('pipe chain works per-line', () { + final ast = parseAst('.users | filter(.active) | map(.name)'); + final results = + queryNdjson([ + '{"users": [{"name": "a", "active": true}, {"name": "b", "active": false}]}', + '{"users": [{"name": "c", "active": true}]}', + ], ast).toList(); + expect(results, [ + ['a'], + ['c'], + ]); + }); + + test('object construction per line', () { + final ast = parseAst('{name, senior: .age > 65}'); + final results = + queryNdjson([ + '{"name": "alice", "age": 30}', + '{"name": "carol", "age": 70}', + ], ast).toList(); + expect(results, [ + {'name': 'alice', 'senior': false}, + {'name': 'carol', 'senior': true}, + ]); + }); + }); +} From 3f6741c3d65d9916bbf3164420a0045fe4399180 Mon Sep 17 00:00:00 2001 From: Hakim Jonas Ghoula Date: Sat, 2 May 2026 21:41:33 +0200 Subject: [PATCH 04/67] Functional test coverage for CLI + MCP wiring Audit after track C found that track C and track D had strong library-level unit tests but no coverage for the wiring that actually exposes them to users: the CLI argument parsing, the ndjson file-
extension auto-detect, the mode-combination guards, the stdin streaming claim, and the MCP payload shape. Manual smoke tests are not regression-proof; a wiring regression would ship silently. Changes: lib/src/mcp_payload.dart (new, factored out of bin/mcp_server.dart): renderMcpShapeErrorPayload takes an OutputShapeError + expression and returns the JSON string an MCP agent receives. Pure function, no I/O, testable without starting the MCP server as a subprocess. bin/mcp_server.dart: calls the library function; private method _renderShapeErrorPayload removed. lib/lambe.dart: exports renderMcpShapeErrorPayload. test/mcp_payload_test.dart (new, 5 tests): - Payload parses as JSON with all documented keys. - Suggestions carry 1-based ids and composed `apply_as` queries. - Hints carry structured {parameter, value} pairs. - Hints do NOT leak CLI or REPL syntax into the agent-facing JSON. - Empty hints still expose an empty list (key always present). test/cli_integration_test.dart (new, 18 tests): Shells out to `dart bin/lam.dart` with Process.start. Coverage: - Explicit --ndjson flag produces per-line compact JSON. - .ndjson and .jsonl file extensions auto-enable the mode. - Stdin with --ndjson works via pipe. - Empty and whitespace-only lines skipped silently. - Malformed line exits 1 with "line N" in stderr. - File-not-found exits 1 with a clear error. - Five mode-combo guards: --ndjson rejects --interactive, --schema, --assert, --explain, --to yaml. Accepts --to json (redundant). - Streaming: four stdin lines with 500ms gaps, asserts the last two inter-output gaps are >= 300ms. A buffered implementation would deliver all four near EOF with near-zero gaps. Proves tail -f | lam --ndjson emits as lines arrive. - --flatten-cells refuse writes CLI-form hint (--flatten-cells json) to stderr, NOT REPL or MCP syntax. Regression guard for the surface-specific rendering chosen in the track D audit. - --flatten-cells json produces CSV with JSON-encoded cells. 
- --explain --flatten-cells json widens writable formats and prints "Cell policy: json" footer. - --explain without the flag: no footer, csv in "Not writable as". What's deliberately not covered: - REPL I/O (ReadLine-driven, not testable without a real TTY). - Exact error message phrasing (substring assertions only, so phrasing can improve without breaking tests). - MCP server subprocess JSON-RPC (the payload function it calls is tested directly; the server wiring is a single dart_mcp method). Quality gates: dart analyze clean, 1293 tests pass (was 1270, +23), dart format clean, pana 160/160, manpage round-trip matches. --- bin/mcp_server.dart | 47 +---- lib/lambe.dart | 1 + lib/src/mcp_payload.dart | 55 ++++++ test/cli_integration_test.dart | 336 +++++++++++++++++++++++++++++++++ test/mcp_payload_test.dart | 119 ++++++++++++ 5 files changed, 514 insertions(+), 44 deletions(-) create mode 100644 lib/src/mcp_payload.dart create mode 100644 test/cli_integration_test.dart create mode 100644 test/mcp_payload_test.dart diff --git a/bin/mcp_server.dart b/bin/mcp_server.dart index 9c1f4a6..1696c42 100644 --- a/bin/mcp_server.dart +++ b/bin/mcp_server.dart @@ -207,7 +207,7 @@ base class LambeServer extends MCPServer with ToolsSupport { return CallToolResult(content: [TextContent(text: rendered)]); } on OutputShapeError catch (e) { return CallToolResult( - content: [TextContent(text: _renderShapeErrorPayload(e, expression))], + content: [TextContent(text: renderMcpShapeErrorPayload(e, expression))], isError: true, ); } on QueryError catch (e) { @@ -223,49 +223,8 @@ base class LambeServer extends MCPServer with ToolsSupport { } } - /// Render an [OutputShapeError] as a JSON payload for agent - /// consumption. - /// - /// The payload has keys `error`, `message`, `format`, `got_shape`, - /// `original_expression`, `suggestions`, and `hints`. 
Each entry in - /// `suggestions` carries a 1-based `id`, a `label`, a `template_text` - /// (the query-fragment source), an `apply_as` (the complete query - /// formed by appending the template to the original expression via - /// `|`), and an `explanation`. - /// - /// `hints` describes environmental remedies (tool parameters) that - /// would resolve the mismatch without changing the query. Each hint - /// carries a `label`, a `parameter`/`value` pair naming an argument - /// of this MCP tool, and an `explanation`. CLI-flag and REPL-command - /// forms are omitted because they do not apply to an agent calling - /// the MCP server. Empty when no such remedy exists. - String _renderShapeErrorPayload(OutputShapeError e, String expression) => - const JsonEncoder.withIndent(' ').convert({ - 'error': 'output_shape_mismatch', - 'message': e.message, - 'format': e.format.name, - 'got_shape': renderShape(e.got), - 'original_expression': expression, - 'suggestions': [ - for (var i = 0; i < e.suggestions.length; i++) - { - 'id': i + 1, - 'label': e.suggestions[i].label, - 'template_text': e.suggestions[i].display, - 'apply_as': '$expression | ${e.suggestions[i].display}', - 'explanation': e.suggestions[i].explanation, - }, - ], - 'hints': [ - for (final h in e.hints) - { - 'label': h.label, - 'parameter': h.mcpParameter.$1, - 'value': h.mcpParameter.$2, - 'explanation': h.explanation, - }, - ], - }); + // See `renderMcpShapeErrorPayload` in package:lambe/lambe.dart for + // the payload shape this server emits on output-shape mismatches. 
final _schemaTool = Tool( name: 'lambe_schema', diff --git a/lib/lambe.dart b/lib/lambe.dart index 633b281..fd29990 100644 --- a/lib/lambe.dart +++ b/lib/lambe.dart @@ -29,6 +29,7 @@ export 'src/ast.dart'; export 'src/errors.dart'; export 'src/input.dart' show Format, detectFormat, sniffFormat, parseInput, mdToNative; +export 'src/mcp_payload.dart' show renderMcpShapeErrorPayload; export 'src/output.dart' show OutputFormat, CellPolicy, formatOutput, inferSchema; export 'src/shape/shape.dart' diff --git a/lib/src/mcp_payload.dart b/lib/src/mcp_payload.dart new file mode 100644 index 0000000..09f5f3a --- /dev/null +++ b/lib/src/mcp_payload.dart @@ -0,0 +1,55 @@ +/// MCP payload rendering for structured errors. +/// +/// The functions here produce JSON strings intended to be returned as +/// the text content of an MCP `CallToolResult`'s error response. They +/// are pure (no I/O, no process state) so tests can pin the payload +/// shape without running the server. +library; + +import 'dart:convert'; + +import 'errors.dart'; +import 'shape/shape.dart' show renderShape; + +/// Render an [OutputShapeError] as a JSON payload for agent consumption. +/// +/// The payload has keys `error`, `message`, `format`, `got_shape`, +/// `original_expression`, `suggestions`, and `hints`. Each entry in +/// `suggestions` carries a 1-based `id`, a `label`, a `template_text` +/// (the query-fragment source), an `apply_as` (the complete query +/// formed by appending the template to the original expression via +/// `|`), and an `explanation`. +/// +/// `hints` describes environmental remedies (tool parameters) that +/// would resolve the mismatch without changing the query. Each hint +/// carries a `label`, a `parameter`/`value` pair naming an argument of +/// this MCP tool, and an `explanation`. CLI-flag and REPL-command +/// forms are omitted because they do not apply to an agent calling the +/// MCP server. Empty when no such remedy exists. 
+String renderMcpShapeErrorPayload(OutputShapeError e, String expression) => + const JsonEncoder.withIndent(' ').convert({ + 'error': 'output_shape_mismatch', + 'message': e.message, + 'format': e.format.name, + 'got_shape': renderShape(e.got), + 'original_expression': expression, + 'suggestions': [ + for (var i = 0; i < e.suggestions.length; i++) + { + 'id': i + 1, + 'label': e.suggestions[i].label, + 'template_text': e.suggestions[i].display, + 'apply_as': '$expression | ${e.suggestions[i].display}', + 'explanation': e.suggestions[i].explanation, + }, + ], + 'hints': [ + for (final h in e.hints) + { + 'label': h.label, + 'parameter': h.mcpParameter.$1, + 'value': h.mcpParameter.$2, + 'explanation': h.explanation, + }, + ], + }); diff --git a/test/cli_integration_test.dart b/test/cli_integration_test.dart new file mode 100644 index 0000000..2f431db --- /dev/null +++ b/test/cli_integration_test.dart @@ -0,0 +1,336 @@ +/// End-to-end CLI tests that shell out to `dart bin/lam.dart`. +/// +/// These cover behaviors that live above the library surface: flag +/// parsing, auto-detection from file extension, mode-combination +/// rejection, stdin streaming, and the stderr/stdout split for hints +/// and errors. Individual library functions are unit-tested in their +/// own files; this file pins the wiring that glues them together. +/// +/// Each test runs the real `dart bin/lam.dart` so regressions in the +/// argument parser, the ndjson loop, or the error rendering surface +/// here rather than in a mock. +library; + +import 'dart:convert'; +import 'dart:io'; + +import 'package:test/test.dart'; + +/// Runs `dart bin/lam.dart [args]` with optional [stdinContents] and +/// returns `(exitCode, stdout, stderr)`. +Future<(int, String, String)> _runLam( + List<String> args, { + String?
stdinContents, +}) async { + final process = await Process.start('dart', [ + 'bin/lam.dart', + ...args, + ], workingDirectory: Directory.current.path); + + if (stdinContents != null) { + process.stdin.add(utf8.encode(stdinContents)); + } + await process.stdin.close(); + + final stdoutFuture = process.stdout.transform(utf8.decoder).join(); + final stderrFuture = process.stderr.transform(utf8.decoder).join(); + final exitCode = await process.exitCode; + return (exitCode, await stdoutFuture, await stderrFuture); +} + +/// Runs `dart bin/lam.dart` with [stdinLines] fed one at a time with +/// [gap] between each line, so streaming behavior can be observed. +/// Returns stdout lines paired with their arrival timestamps (ms since +/// process start). +Future<List<(int, String)>> _runLamWithTimedStdin( + List<String> args, + List<String> stdinLines, + Duration gap, +) async { + final start = DateTime.now(); + final process = await Process.start('dart', [ + 'bin/lam.dart', + ...args, + ], workingDirectory: Directory.current.path); + + // Feed lines with gaps; don't await each write (writeln is buffered + // through IOSink). Close stdin after the last line.
+ () async { + for (var i = 0; i < stdinLines.length; i++) { + if (i > 0) await Future.delayed(gap); + process.stdin.writeln(stdinLines[i]); + await process.stdin.flush(); + } + await process.stdin.close(); + }(); + + final results = <(int, String)>[]; + await process.stdout + .transform(utf8.decoder) + .transform(const LineSplitter()) + .forEach((line) { + final ms = DateTime.now().difference(start).inMilliseconds; + results.add((ms, line)); + }); + await process.exitCode; + return results; +} + +void main() { + late Directory tmp; + + setUp(() { + tmp = Directory.systemTemp.createTempSync('lambe_cli_test_'); + }); + + tearDown(() { + if (tmp.existsSync()) tmp.deleteSync(recursive: true); + }); + + group('--ndjson: basic CLI invocation', () { + test( + 'explicit --ndjson flag evaluates per line, compact JSON out', + () async { + final file = File('${tmp.path}/events.ndjson') + ..writeAsStringSync('{"name":"a","age":30}\n{"name":"b","age":25}\n'); + final (code, out, _) = await _runLam(['--ndjson', '.age', file.path]); + expect(code, 0); + expect(out.trim().split('\n'), ['30', '25']); + }, + ); + + test('.ndjson extension auto-enables the mode without the flag', () async { + final file = File('${tmp.path}/events.ndjson') + ..writeAsStringSync('{"a":1}\n{"a":2}\n'); + final (code, out, _) = await _runLam(['.a', file.path]); + expect(code, 0); + expect(out.trim().split('\n'), ['1', '2']); + }); + + test('.jsonl extension auto-enables the mode without the flag', () async { + final file = File('${tmp.path}/events.jsonl') + ..writeAsStringSync('{"a":1}\n{"a":2}\n'); + final (code, out, _) = await _runLam(['.a', file.path]); + expect(code, 0); + expect(out.trim().split('\n'), ['1', '2']); + }); + + test('stdin with --ndjson works (piped input)', () async { + final (code, out, _) = await _runLam([ + '--ndjson', + '.a', + ], stdinContents: '{"a":1}\n{"a":2}\n{"a":3}\n'); + expect(code, 0); + expect(out.trim().split('\n'), ['1', '2', '3']); + }); + + test('empty lines are 
skipped silently', () async { + final file = File('${tmp.path}/sparse.ndjson') + ..writeAsStringSync('{"a":1}\n\n{"a":2}\n \n{"a":3}\n'); + final (code, out, _) = await _runLam(['.a', file.path]); + expect(code, 0); + expect(out.trim().split('\n'), ['1', '2', '3']); + }); + }); + + group('--ndjson: error handling', () { + test('malformed line fails with line number, exit 1', () async { + final file = File('${tmp.path}/bad.ndjson') + ..writeAsStringSync('{"a":1}\nnot json\n{"a":3}\n'); + final (code, _, err) = await _runLam(['.a', file.path]); + expect(code, 1); + expect(err, contains('line 2')); + }); + + test('file not found exits 1 with a clear error', () async { + final (code, _, err) = await _runLam([ + '--ndjson', + '.a', + '${tmp.path}/nonexistent.ndjson', + ]); + expect(code, 1); + expect(err, contains('file not found')); + }); + }); + + group('--ndjson: mode combination guards', () { + test('rejects --ndjson --interactive', () async { + final file = File('${tmp.path}/x.ndjson')..writeAsStringSync('{}\n'); + final (code, _, err) = await _runLam(['--ndjson', '-i', file.path]); + expect(code, 1); + expect(err, contains('--interactive')); + }); + + test('rejects --ndjson --schema', () async { + final file = File('${tmp.path}/x.ndjson')..writeAsStringSync('{}\n'); + final (code, _, err) = await _runLam(['--ndjson', '--schema', file.path]); + expect(code, 1); + expect(err, contains('--schema')); + }); + + test('rejects --ndjson --assert', () async { + final file = File('${tmp.path}/x.ndjson')..writeAsStringSync('{}\n'); + final (code, _, err) = await _runLam([ + '--ndjson', + '--assert', + '.a > 0', + file.path, + ]); + expect(code, 1); + expect(err, contains('--assert')); + }); + + test('rejects --ndjson --explain', () async { + final file = File('${tmp.path}/x.ndjson')..writeAsStringSync('{}\n'); + final (code, _, err) = await _runLam([ + '--ndjson', + '--explain', + '.a', + file.path, + ]); + expect(code, 1); + expect(err, contains('--explain')); + }); + + 
test('rejects --ndjson --to yaml (and other non-json formats)', () async { + final file = File('${tmp.path}/x.ndjson')..writeAsStringSync('{"a":1}\n'); + final (code, _, err) = await _runLam([ + '--ndjson', + '--to', + 'yaml', + '.a', + file.path, + ]); + expect(code, 1); + expect(err, contains('not supported')); + }); + + test('accepts --ndjson --to json (redundant but explicit)', () async { + final file = File('${tmp.path}/x.ndjson')..writeAsStringSync('{"a":1}\n'); + final (code, out, _) = await _runLam([ + '--ndjson', + '--to', + 'json', + '.a', + file.path, + ]); + expect(code, 0); + expect(out.trim(), '1'); + }); + }); + + group('--ndjson: stdin streaming', () { + test( + 'lines emitted as they arrive on stdin, not buffered to EOF', + () async { + // Feed four lines with 500ms between each. Dart VM startup + // (~400ms) means the first two lines are likely already in + // the pipe when the process starts reading, so lines 1 and 2 + // may appear to arrive together. The real streaming signal is + // the gap between the *last two* output lines, since by then + // the VM is fully up and any delay reflects the stdin flush + // pattern. + const gap = Duration(milliseconds: 500); + final results = await _runLamWithTimedStdin( + ['--ndjson', '.a'], + ['{"a":1}', '{"a":2}', '{"a":3}', '{"a":4}'], + gap, + ); + + expect(results.length, 4); + expect([for (final (_, l) in results) l], ['1', '2', '3', '4']); + + // The two mid-stream gaps (between lines 2->3 and 3->4) must + // each be at least 300ms (200ms slack on the 500ms feed gap). + // A buffered implementation would deliver all four at EOF + // with near-zero mid-stream gaps.
+ final t2 = results[1].$1; + final t3 = results[2].$1; + final t4 = results[3].$1; + expect( + t3 - t2, + greaterThanOrEqualTo(300), + reason: 'gap between lines 2 and 3 too small; output is batched', + ); + expect( + t4 - t3, + greaterThanOrEqualTo(300), + reason: 'gap between lines 3 and 4 too small; output is batched', + ); + }, + // Spawning dart + waiting on three 500ms gaps + VM startup + // takes several seconds; bump the default timeout. + timeout: const Timeout(Duration(seconds: 30)), + ); + }); + + group('--flatten-cells: CLI error surface', () { + test( + 'refuse writes CLI-form hint to stderr, not REPL/MCP syntax', + () async { + final file = File('${tmp.path}/data.json') + ..writeAsStringSync('[{"name":"a","tags":["x","y"]}]'); + final (code, _, err) = await _runLam(['--to', 'csv', '.', file.path]); + expect(code, 1); + expect(err, contains('--flatten-cells json')); + // The baked message must not leak other-surface syntax. + expect(err, isNot(contains(':flatten-cells'))); + expect(err, isNot(contains('flatten_cells=json'))); + }, + ); + + test('--flatten-cells json produces CSV with JSON-encoded cells', () async { + final file = File('${tmp.path}/data.json') + ..writeAsStringSync('[{"name":"a","tags":["x","y"]}]'); + final (code, out, _) = await _runLam([ + '--to', + 'csv', + '--flatten-cells', + 'json', + '.', + file.path, + ]); + expect(code, 0); + expect(out, contains('name')); + expect(out, contains('tags')); + // JSON-encoded cell, CSV-escaped: "[""x"",""y""]" + expect(out, contains(r'"[""x"",""y""]"')); + }); + + test( + '--explain --flatten-cells json widens writable formats and prints footer', + () async { + final file = File('${tmp.path}/data.json') + ..writeAsStringSync('[{"name":"a","tags":["x","y"]}]'); + final (code, out, _) = await _runLam([ + '--explain', + '--flatten-cells', + 'json', + '.', + file.path, + ]); + expect(code, 0); + expect(out, contains('Writable as:')); + expect(out, contains('csv')); + expect(out, contains('Cell 
policy: json')); + }, + ); + + test( + '--explain without --flatten-cells: no footer, csv NOT writable', + () async { + final file = File('${tmp.path}/data.json') + ..writeAsStringSync('[{"name":"a","tags":["x","y"]}]'); + final (code, out, _) = await _runLam(['--explain', '.', file.path]); + expect(code, 0); + expect(out, isNot(contains('Cell policy:'))); + // csv appears under "Not writable as:" in this scenario. + expect(out, contains('Not writable as:')); + final notLine = out + .split('\n') + .firstWhere((l) => l.startsWith('Not writable as:')); + expect(notLine, contains('csv')); + }, + ); + }); +} diff --git a/test/mcp_payload_test.dart b/test/mcp_payload_test.dart new file mode 100644 index 0000000..26d1a18 --- /dev/null +++ b/test/mcp_payload_test.dart @@ -0,0 +1,119 @@ +/// Tests for [renderMcpShapeErrorPayload]: the JSON payload shape an +/// MCP agent receives when the query result's shape is incompatible +/// with the requested output format. +/// +/// The contract this pins: +/// 1. Payload is valid JSON with the documented top-level keys. +/// 2. Suggestions carry 1-based ids and include both the template +/// text and the fully-composed `apply_as` query. +/// 3. Hints carry structured `parameter`/`value` pairs, not the CLI +/// or REPL syntax (which do not apply to an agent). +/// 4. Both lists are empty when no guidance exists, not missing. +library; + +import 'dart:convert'; + +import 'package:lambe/lambe.dart'; +import 'package:test/test.dart'; + +void main() { + group('renderMcpShapeErrorPayload: top-level shape', () { + test('payload parses as JSON and carries all documented keys', () { + // A scalar root against TOML: has suggestions, no hints. 
+ final report = canWriteAs('hello', OutputFormat.toml) as NotWritable; + final error = OutputShapeError(report); + + final json = renderMcpShapeErrorPayload(error, '.name'); + final payload = jsonDecode(json) as Map<String, dynamic>; + + expect(payload['error'], 'output_shape_mismatch'); + expect(payload['message'], contains('TOML')); + expect(payload['format'], 'toml'); + expect(payload['got_shape'], 'string'); + expect(payload['original_expression'], '.name'); + expect(payload['suggestions'], isA<List<dynamic>>()); + expect(payload['hints'], isA<List<dynamic>>()); + }); + }); + + group('renderMcpShapeErrorPayload: suggestions', () { + test( + 'each suggestion has id, label, template_text, apply_as, explanation', + () { + final report = + canWriteAs([1, 2, 3], OutputFormat.toml) as NotWritable; + final error = OutputShapeError(report); + final json = renderMcpShapeErrorPayload(error, '.items'); + final payload = jsonDecode(json) as Map<String, dynamic>; + final suggestions = payload['suggestions'] as List<dynamic>; + + expect(suggestions, isNotEmpty); + final first = suggestions.first as Map<String, dynamic>; + expect(first['id'], 1); + expect(first['label'], isA<String>()); + expect(first['template_text'], isA<String>()); + expect(first['apply_as'], startsWith('.items | ')); + expect(first['explanation'], isA<String>()); + }, + ); + + test('ids are 1-based and increment across suggestions', () { + final report = + canWriteAs([1, 2, 3], OutputFormat.toml) as NotWritable; + final error = OutputShapeError(report); + final json = renderMcpShapeErrorPayload(error, '.items'); + final payload = jsonDecode(json) as Map<String, dynamic>; + final suggestions = payload['suggestions'] as List<dynamic>; + + for (var i = 0; i < suggestions.length; i++) { + final s = suggestions[i] as Map<String, dynamic>; + expect(s['id'], i + 1); + } + }); + }); + + group('renderMcpShapeErrorPayload: hints', () { + test( + 'csv + non-scalar cells produces a structured hint, no CLI/REPL noise', + () { + final v = [ + { + 'k': [1, 2], + }, + ]; + final report = canWriteAs(v, OutputFormat.csv) as NotWritable; + final error =
OutputShapeError(report); + final json = renderMcpShapeErrorPayload(error, '.rows'); + final payload = jsonDecode(json) as Map<String, dynamic>; + final hints = payload['hints'] as List<dynamic>; + + expect(hints, hasLength(1)); + final hint = hints.first as Map<String, dynamic>; + expect(hint['label'], 'Flatten non-scalar cells'); + expect(hint['parameter'], 'flatten_cells'); + expect(hint['value'], 'json'); + expect(hint['explanation'], isA<String>()); + + // The payload MUST NOT leak CLI or REPL syntax: an agent can + // only invoke MCP tool parameters, so those forms would be + // misleading noise in the structured response. + expect(hint.keys, isNot(contains('cliFlag'))); + expect(hint.keys, isNot(contains('replCommand'))); + expect(json, isNot(contains('--flatten-cells'))); + expect(json, isNot(contains(':flatten-cells'))); + }, + ); + + test('toml mismatch has empty hints (no relevant parameter)', () { + final report = + canWriteAs([1, 2, 3], OutputFormat.toml) as NotWritable; + final error = OutputShapeError(report); + final json = renderMcpShapeErrorPayload(error, '.items'); + final payload = jsonDecode(json) as Map<String, dynamic>; + expect(payload['hints'], isEmpty); + // Key must still be present; missing hints would make agents + // guess whether the field is optional or absent. + expect(payload.containsKey('hints'), isTrue); + }); + }); +} From 8b679691bbf2661d3886adbb92850c035118f377 Mon Sep 17 00:00:00 2001 From: Hakim Jonas Ghoula Date: Sat, 2 May 2026 22:01:12 +0200 Subject: [PATCH 05/67] Track B: richer --explain output (runtime/trivial warnings + JSON) Three sub-features added to the 0.8.0 explain infrastructure: Runtime-rejection warnings (always on) Pipe-op acceptance predicates in pipe_ops.dart already know which input shapes each op rejects. Explain now surfaces the mismatch statically: `.config | filter(.x)` on a known map produces "filter rejects map<...>; this will throw at runtime". SAny inputs are ignored (cannot prove); compatible inputs pass silently.
The new _analyzeRejection helper runs alongside the existing
_analyzePredicate in explain()'s per-stage loop.

Trivial-result warnings (opt-in)
For sort_by, group_by, map, unique_by: when the argument references a
field provably absent from the element shape, emit a warning saying
"the result is trivial". Reuses _missingFieldPath (the helper that
already powers empty-filter warnings) on the element shape of the
input list. Opt-in via explain(..., includeTrivial: true) because
legitimate uses exist (stable no-op sort, explicit null projection).

Structured JSON output
renderExplainJson(ExplainReport) emits the full report as JSON with
snake_case keys (stages, warnings, writable_as, not_writable_as,
flatten_cells). Warning kinds serialize as empty_filter,
runtime_rejection, trivial_result. Shapes render as strings via
renderShape; agents that need structural shape access should call the
lambe_schema MCP tool separately. Text output from renderExplain is
unchanged byte-for-byte; JSON is pure-additive.

Supporting API changes
- WarningKind enum: emptyFilter, runtimeRejection, trivialResult.
- ExplainWarning.kind field (required at construction).
- explain() gains `bool includeTrivial = false` parameter.

CLI wiring (bin/lam.dart)
- --explain-trivial flag: implies --explain, enables trivial class.
- --explain-json flag: implies --explain, switches to JSON renderer.
- Both compose: --explain-trivial --explain-json emits JSON including
  trivial_result warnings.
- --ndjson rejection of --explain remains correct (covers the implies
  cases via the existing guard).

Docs
- doc/lam.1.md: two new option blocks, --explain description extended.
- doc/lam.1: regenerated.
- CHANGELOG.md: 0.9.0-dev bullet covering all three sub-features.
- README.md: one paragraph added to the --explain section.

Tests (+24)
shape_explain_test.dart (+16):
- 4 runtime-rejection cases (filter/sum on map, SAny untouched,
  compatible input untouched).
- 6 trivial-result cases (sort_by/group_by/map flagged when opt-in,
  NOT flagged by default, existing field untouched, SAny element
  cannot prove).
- 6 JSON renderer cases (top-level shape, stage/warning/writability
  fields, snake_case kind names, flatten_cells).
cli_integration_test.dart (+8): runtime-rejection in default output,
trivial-result gated on --explain-trivial, --explain-json shape, the
"implies --explain" behavior for both sub-flags, combined usage, and
the --ndjson --explain-json rejection path.

Quality gates: dart analyze clean, 1317 tests pass (was 1293, +24),
dart format clean, pana 160/160, manpage round-trip matches.
---
 CHANGELOG.md | 25 ++++
 README.md | 2 +
 bin/lam.dart | 34 ++++-
 doc/lam.1 | 8 +-
 doc/lam.1.md | 8 +-
 lib/lambe.dart | 9 +-
 lib/src/shape/explain.dart | 185 +++++++++++++++++++++++++--
 test/cli_integration_test.dart | 121 ++++++++++++++++++
 test/shape_explain_test.dart | 220 +++++++++++++++++++++++++++++++++
 9 files changed, 595 insertions(+), 17 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md index 29b07c9..4bcbc23 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -4,6 +4,31 @@ In progress. ### Added +- **Richer `--explain` output.** Three new categories of static + analysis, plus a structured output mode: + - **Runtime-rejection warnings** (always on): flags pipe ops whose + input shape is provably incompatible. `.config | filter(.x)` on a + known map produces "filter rejects map<...>; this will throw at + runtime." The existing pipe-op acceptance predicates in + `pipe_ops.dart` supply the check; `explain` surfaces it. + - **Trivial-result warnings** (opt-in via `--explain-trivial`): + flags `sort_by`, `group_by`, `map`, and `unique_by` whose + argument references a field provably absent on the element shape. + Often a typo but legitimate uses exist (stable no-op sort, + explicit null projection), hence opt-in.
+ - **Structured JSON output** (`--explain-json`): emits the full + explain report as JSON with snake_case keys + (`stages`, `warnings`, `writable_as`, `not_writable_as`, + `flatten_cells`). Warning kinds serialize as `empty_filter`, + `runtime_rejection`, `trivial_result`. For agent tooling and + build-pipeline integration. +- **`ExplainWarning.kind`** (new field, [`WarningKind`] enum). + Classifier for filtering: CLI, JSON consumers, and future tooling + can select warning categories without parsing message strings. The + existing `emptyFilter` case carries the kind it always had. +- **`renderExplainJson`** library function: produces the JSON report. +- Both `--explain-trivial` and `--explain-json` imply `--explain`, + following the pattern of `--ndjson` being a non-combinable mode. - **`--ndjson` mode for line-delimited JSON input.** Each line of the source is parsed as an independent JSON document, the query is evaluated per line with no shared state, and one compact JSON diff --git a/README.md b/README.md index 6b6e220..f53b958 100644 --- a/README.md +++ b/README.md @@ -93,6 +93,8 @@ Writable as: json, yaml, csv, tsv Not writable as: toml, hcl ``` +Explain flags provably-empty filters (`filter(.missing)` on a known shape) and runtime-rejection mismatches (`filter` on a non-list input) by default. Pass `--explain-trivial` to also flag `sort_by`/`group_by`/`map`/`unique_by` whose argument references a missing field (often a typo, sometimes intentional). For agent tooling and build pipelines, `--explain-json` emits the same information as a structured JSON document. 
+ ## Query Syntax Queries start with `.` (the current data) and chain operations with `|`: diff --git a/bin/lam.dart b/bin/lam.dart index e2a35cb..bf0269e 100644 --- a/bin/lam.dart +++ b/bin/lam.dart @@ -53,6 +53,21 @@ void main(List arguments) { help: 'Show shape trace of the query (static analysis, no execution)', negatable: false, ) + ..addFlag( + 'explain-trivial', + help: + 'Include trivial-result warnings in the explain report ' + '(sort_by/group_by/map/unique_by on a missing field). ' + 'Implies --explain.', + negatable: false, + ) + ..addFlag( + 'explain-json', + help: + 'Emit the explain report as JSON instead of the text table. ' + 'Implies --explain.', + negatable: false, + ) ..addFlag( 'assert', help: 'Assert expression is true (exit 1 if false)', @@ -92,7 +107,11 @@ void main(List arguments) { final isSchemaMode = args.flag('schema'); final isAssertMode = args.flag('assert'); final isInteractive = args.flag('interactive'); - final isExplainMode = args.flag('explain'); + // --explain-trivial and --explain-json imply --explain, so enable + // explain mode if any of the three is set. + final explainTrivial = args.flag('explain-trivial'); + final explainJson = args.flag('explain-json'); + final isExplainMode = args.flag('explain') || explainTrivial || explainJson; var isNdjsonMode = args.flag('ndjson'); final rest = args.rest; @@ -240,8 +259,17 @@ void main(List arguments) { } final inputShape = data == null ? 
const SAny() : shapeOf(data); final cellPolicy = CellPolicy.values.byName(args.option('flatten-cells')!); - final report = explain(ast, inputShape, flattenCells: cellPolicy); - stdout.write(renderExplain(report)); + final report = explain( + ast, + inputShape, + flattenCells: cellPolicy, + includeTrivial: explainTrivial, + ); + if (explainJson) { + stdout.writeln(renderExplainJson(report)); + } else { + stdout.write(renderExplain(report)); + } return; } diff --git a/doc/lam.1 b/doc/lam.1 index b9de202..f8ee63d 100644 --- a/doc/lam.1 +++ b/doc/lam.1 @@ -40,7 +40,13 @@ CSV/TSV policy for non-scalar cells. \fBrefuse\fR (default) rejects list- or map Show the data structure with type names instead of values. .TP \fB--explain\fR -Trace the shape of values flowing through each pipeline stage. Static analysis only; does not execute the query. Reports which output formats the final shape can be serialized as. +Trace the shape of values flowing through each pipeline stage. Static analysis only; does not execute the query. Reports which output formats the final shape can be serialized as. Flags provably-empty filters and runtime-rejection mismatches. +.TP +\fB--explain-trivial\fR +Include trivial-result warnings in the explain report. Flags parameterised ops (\fBsort_by\fR, \fBgroup_by\fR, \fBmap\fR, \fBunique_by\fR) whose argument references a field provably absent on the element shape. Implies \fB--explain\fR. +.TP +\fB--explain-json\fR +Emit the explain report as a JSON document instead of the text table. Useful for agent tooling or build-pipeline integration. Implies \fB--explain\fR. .TP \fB--assert\fR Evaluate the expression and exit with code 0 if the result is true, 1 if false. diff --git a/doc/lam.1.md b/doc/lam.1.md index a279c8e..c051aa3 100644 --- a/doc/lam.1.md +++ b/doc/lam.1.md @@ -48,7 +48,13 @@ If no file is given, reads from standard input. : Show the data structure with type names instead of values. 
**--explain** -: Trace the shape of values flowing through each pipeline stage. Static analysis only; does not execute the query. Reports which output formats the final shape can be serialized as. +: Trace the shape of values flowing through each pipeline stage. Static analysis only; does not execute the query. Reports which output formats the final shape can be serialized as. Flags provably-empty filters and runtime-rejection mismatches. + +**--explain-trivial** +: Include trivial-result warnings in the explain report. Flags parameterised ops (**sort_by**, **group_by**, **map**, **unique_by**) whose argument references a field provably absent on the element shape. Implies **--explain**. + +**--explain-json** +: Emit the explain report as a JSON document instead of the text table. Useful for agent tooling or build-pipeline integration. Implies **--explain**. **--assert** : Evaluate the expression and exit with code 0 if the result is true, 1 if false. diff --git a/lib/lambe.dart b/lib/lambe.dart index fd29990..30bbf2d 100644 --- a/lib/lambe.dart +++ b/lib/lambe.dart @@ -60,7 +60,14 @@ export 'src/shape/check.dart' canWriteAs, canWriteShapeAs; export 'src/shape/explain.dart' - show ExplainReport, ExplainStage, ExplainWarning, explain, renderExplain; + show + ExplainReport, + ExplainStage, + ExplainWarning, + WarningKind, + explain, + renderExplain, + renderExplainJson; export 'src/shape/infer.dart' show inferShape; export 'src/shape/pipe_ops.dart' show diff --git a/lib/src/shape/explain.dart b/lib/src/shape/explain.dart index d70a295..0a9f74b 100644 --- a/lib/src/shape/explain.dart +++ b/lib/src/shape/explain.dart @@ -8,10 +8,13 @@ /// pass [SAny] when no input data is available. library; +import 'dart:convert'; + import '../ast.dart'; import '../output_format.dart'; import 'check.dart'; import 'infer.dart'; +import 'pipe_ops.dart'; import 'shape.dart'; /// A single row in an explain trace. 
@@ -33,15 +36,38 @@ final class ExplainStage { const ExplainStage({required this.source, required this.shape}); } -/// A static-analysis warning attached to an explain report. +/// Category of static-analysis finding surfaced by [explain]. /// -/// Warnings call out constructs that evaluate to a trivial result -/// regardless of input, such as a `filter` predicate whose inferred -/// shape is not [SBool]. `filter` requires `== true`, so any non-bool -/// predicate makes the filter always empty. +/// - [emptyFilter]: a `filter`/`filter_values`/`filter_keys` predicate +/// is provably non-boolean, so the filter always returns empty. +/// - [runtimeRejection]: a pipe op's input shape is provably +/// incompatible with the op (e.g. `filter` on an [SMap]); the query +/// will throw at runtime if reached. +/// - [trivialResult]: a parameterised op (`sort_by`, `group_by`, +/// `map`, `unique_by`) references a field that is provably absent +/// from the element shape. The op runs, but the field access yields +/// null for every element, so the result is trivial (same order, +/// same group, same null). Often a typo but legitimate uses exist, +/// which is why this class is opt-in via [explain]'s +/// `includeTrivial` parameter. +enum WarningKind { + /// A filter predicate is provably non-boolean. + emptyFilter, + + /// The op's input shape is provably incompatible; runtime throw. + runtimeRejection, + + /// The op runs but the result is trivial (opt-in). + trivialResult, +} + +/// A static-analysis finding attached to an explain report. /// -/// [stageIndex] points into [ExplainReport.stages] so a renderer can -/// highlight the offending stage. +/// Each warning points at a specific [ExplainReport.stages] entry +/// via [stageIndex] and carries a one-line human-readable [message] +/// plus a [kind] classifier for filtering (CLI flag gates +/// [WarningKind.trivialResult], for example, and a JSON consumer +/// might want to surface only [WarningKind.runtimeRejection]). 
final class ExplainWarning { /// The stage this warning refers to, as an index into /// [ExplainReport.stages]. @@ -50,8 +76,15 @@ final class ExplainWarning { /// One-line human-readable message. final String message; + /// The warning category, for filtering and machine-readable output. + final WarningKind kind; + /// Creates an [ExplainWarning]. - const ExplainWarning({required this.stageIndex, required this.message}); + const ExplainWarning({ + required this.stageIndex, + required this.message, + required this.kind, + }); } /// A full explain report for a query. @@ -93,10 +126,20 @@ final class ExplainReport { /// caller (CLI `--flatten-cells`, REPL `:flatten-cells`, MCP /// `flatten_cells`). Default is [CellPolicy.refuse], matching the /// library's conservative default. +/// +/// [includeTrivial] controls whether [WarningKind.trivialResult] +/// findings are emitted. Defaults to `false`; trivial findings are +/// often legitimate (e.g. `sort_by(.missing)` intentionally as a +/// stable no-op sort) and can produce noise. The CLI enables them via +/// `--explain-trivial`. [WarningKind.emptyFilter] and +/// [WarningKind.runtimeRejection] findings are always emitted: +/// empty-filter is almost always a bug, and runtime-rejection means +/// the query will throw. 
ExplainReport explain( LamExpr expr, Shape inputShape, { CellPolicy flattenCells = CellPolicy.refuse, + bool includeTrivial = false, }) { final backbone = _flattenPipe(expr); final stages = []; @@ -105,10 +148,42 @@ ExplainReport explain( var ctx = inputShape; for (var i = 0; i < backbone.length; i++) { final piece = backbone[i]; - final warning = _analyzePredicate(piece, prev); - if (warning != null) { - warnings.add(ExplainWarning(stageIndex: i, message: warning)); + + final emptyFilter = _analyzePredicate(piece, prev); + if (emptyFilter != null) { + warnings.add( + ExplainWarning( + stageIndex: i, + message: emptyFilter, + kind: WarningKind.emptyFilter, + ), + ); + } + + final rejection = _analyzeRejection(piece, prev); + if (rejection != null) { + warnings.add( + ExplainWarning( + stageIndex: i, + message: rejection, + kind: WarningKind.runtimeRejection, + ), + ); + } + + if (includeTrivial) { + final trivial = _analyzeTrivial(piece, prev); + if (trivial != null) { + warnings.add( + ExplainWarning( + stageIndex: i, + message: trivial, + kind: WarningKind.trivialResult, + ), + ); + } } + ctx = inferShape(piece, ctx); stages.add( ExplainStage( @@ -200,6 +275,54 @@ String? _predicateWarning( '$opName requires a boolean, so this will always be empty'; } +/// Detect input shapes that the pipe op will reject at runtime. +/// +/// Each pipe op has an `accepts(Shape)` predicate in `pipe_ops.dart`. +/// When the input shape is concrete (not [SAny]) and the predicate +/// returns false, the query will throw at runtime. This warning +/// surfaces that statically. +/// +/// Returns `null` when [op] is not a pipe op (e.g. an object +/// constructor), when the input shape is [SAny] (cannot prove), or +/// when the op accepts the input shape. +String? 
_analyzeRejection(LamExpr op, Shape inputShape) { + if (inputShape is SAny) return null; + final info = pipeOpInfoFor(op); + if (info == null) return null; + if (info.accepts(inputShape)) return null; + return '${info.name} rejects ${renderShape(inputShape)}; ' + 'this will throw at runtime'; +} + +/// Detect parameterised ops whose argument references a field +/// provably absent from the element shape. +/// +/// Applies to `sort_by`, `group_by`, `map`, `unique_by`. The op runs, +/// but because the field access yields null for every element, the +/// result is trivial (identity sort, single group, all-nulls map). +/// Often a typo but legitimate uses exist (stable no-op sort for +/// padding, explicit null projection), which is why this warning is +/// opt-in via `explain(..., includeTrivial: true)`. +/// +/// Returns `null` for ops not in this set, for inputs that are not +/// lists (outer shape errors surface as runtime-rejection warnings +/// instead), or when the argument references a field that may exist. +String? _analyzeTrivial(LamExpr op, Shape inputShape) { + final (argExpr, opName) = switch (op) { + SortByOp(:final key) => (key, 'sort_by'), + GroupByOp(:final key) => (key, 'group_by'), + MapOp(:final transform) => (transform, 'map'), + UniqueByOp(:final key) => (key, 'unique_by'), + _ => (null, null), + }; + if (argExpr == null || opName == null) return null; + if (inputShape is! SList) return null; + final missing = _missingFieldPath(argExpr, inputShape.element); + if (missing == null) return null; + return '$opName argument $missing does not exist on the element shape; ' + 'the result is trivial'; +} + /// Render `.a.b.c` if [expr] is a [Field]/[Access] chain whose root /// resolves to a known [SMap] missing a segment in the chain; otherwise /// `null`. 
@@ -360,3 +483,43 @@ String renderExplain(ExplainReport report) { } return buf.toString(); } + +/// Render an [ExplainReport] as a JSON string for programmatic +/// consumers (agent tooling, build pipelines). +/// +/// The payload is a map with keys `stages`, `warnings`, `writable_as`, +/// `not_writable_as`, and `flatten_cells`. Each stage carries its +/// `source` string and a `shape` rendered via [renderShape] (same text +/// form as the human-readable renderer). Each warning carries +/// `stage_index`, `kind` (one of `empty_filter`, `runtime_rejection`, +/// `trivial_result`), and `message`. +/// +/// Shapes are rendered as strings rather than structurally decomposed +/// into nested maps. Agents that need structural access should use +/// the `lambe_schema` MCP tool on the relevant input. +String renderExplainJson(ExplainReport report) { + final payload = { + 'stages': [ + for (final s in report.stages) + {'source': s.source, 'shape': renderShape(s.shape)}, + ], + 'warnings': [ + for (final w in report.warnings) + { + 'stage_index': w.stageIndex, + 'kind': _warningKindName(w.kind), + 'message': w.message, + }, + ], + 'writable_as': [for (final f in report.writableAs) f.name], + 'not_writable_as': [for (final f in report.notWritableAs) f.name], + 'flatten_cells': report.flattenCells.name, + }; + return const JsonEncoder.withIndent(' ').convert(payload); +} + +String _warningKindName(WarningKind k) => switch (k) { + WarningKind.emptyFilter => 'empty_filter', + WarningKind.runtimeRejection => 'runtime_rejection', + WarningKind.trivialResult => 'trivial_result', +}; diff --git a/test/cli_integration_test.dart b/test/cli_integration_test.dart index 2f431db..744b70c 100644 --- a/test/cli_integration_test.dart +++ b/test/cli_integration_test.dart @@ -333,4 +333,125 @@ void main() { }, ); }); + + group('--explain: richer warnings', () { + test('--explain flags runtime-rejection by default', () async { + final file = File('${tmp.path}/data.json') + 
..writeAsStringSync('{"users":[]}'); + final (code, out, _) = await _runLam([ + '--explain', + '. | filter(.x)', + file.path, + ]); + expect(code, 0); + expect(out, contains('Warning')); + expect(out, contains('filter rejects')); + expect(out, contains('throw at runtime')); + }); + + test('--explain does NOT flag trivial-result by default', () async { + final file = File('${tmp.path}/data.json') + ..writeAsStringSync('[{"name":"a","age":30}]'); + final (code, out, _) = await _runLam([ + '--explain', + '. | sort_by(.missing)', + file.path, + ]); + expect(code, 0); + expect(out, isNot(contains('result is trivial'))); + }); + + test('--explain-trivial enables trivial-result warnings', () async { + final file = File('${tmp.path}/data.json') + ..writeAsStringSync('[{"name":"a","age":30}]'); + final (code, out, _) = await _runLam([ + '--explain-trivial', + '. | sort_by(.missing)', + file.path, + ]); + expect(code, 0); + expect(out, contains('Warning')); + expect(out, contains('sort_by')); + expect(out, contains('trivial')); + }); + + test('--explain-trivial implies --explain', () async { + final file = File('${tmp.path}/data.json') + ..writeAsStringSync('[{"a":1}]'); + final (code, out, _) = await _runLam([ + '--explain-trivial', + '.', + file.path, + ]); + expect(code, 0); + expect(out, contains('Writable as:')); + }); + + test('--explain-json emits a JSON document', () async { + final file = File('${tmp.path}/data.json') + ..writeAsStringSync('{"name":"alice"}'); + final (code, out, _) = await _runLam([ + '--explain-json', + '.name', + file.path, + ]); + expect(code, 0); + final parsed = jsonDecode(out.trim()) as Map; + expect( + parsed.keys, + containsAll([ + 'stages', + 'warnings', + 'writable_as', + 'not_writable_as', + 'flatten_cells', + ]), + ); + }); + + test('--explain-json implies --explain', () async { + final file = File('${tmp.path}/data.json')..writeAsStringSync('{"a":1}'); + final (code, out, _) = await _runLam(['--explain-json', '.a', file.path]); + 
expect(code, 0); + // Without --explain-json implying --explain, the query would + // execute and print `1`, not the structured report. + final parsed = jsonDecode(out.trim()); + expect(parsed, isA>()); + expect((parsed as Map).keys, contains('stages')); + }); + + test( + '--explain-json --explain-trivial: structured warnings include trivial_result', + () async { + final file = File('${tmp.path}/data.json') + ..writeAsStringSync('[{"a":1}]'); + final (code, out, _) = await _runLam([ + '--explain-json', + '--explain-trivial', + '. | sort_by(.missing)', + file.path, + ]); + expect(code, 0); + final parsed = jsonDecode(out.trim()) as Map; + final warnings = parsed['warnings'] as List; + expect(warnings, isNotEmpty); + final kinds = [ + for (final w in warnings) (w as Map)['kind'], + ]; + expect(kinds, contains('trivial_result')); + }, + ); + + test('--ndjson rejects --explain-json (via --explain guard)', () async { + final file = File('${tmp.path}/x.ndjson')..writeAsStringSync('{}\n'); + final (code, _, err) = await _runLam([ + '--ndjson', + '--explain-json', + '.', + file.path, + ]); + expect(code, 1); + expect(err, contains('--explain')); + }); + }); } diff --git a/test/shape_explain_test.dart b/test/shape_explain_test.dart index a7454d4..c6e3977 100644 --- a/test/shape_explain_test.dart +++ b/test/shape_explain_test.dart @@ -9,6 +9,8 @@ /// final shape. library; +import 'dart:convert'; + import 'package:lambe/lambe.dart'; import 'package:lambe/src/parser.dart' show parseQuery; import 'package:rumil/rumil.dart' show Success, ParseError; @@ -334,4 +336,222 @@ void main() { expect(renderExplain(json), contains('Cell policy: json')); }); }); + + group('explain: runtime-rejection warnings', () { + test('filter on a map shape is flagged', () { + const shape = SMap({'a': SNum()}); + final report = explain(_parse('. 
| filter(.x)'), shape); + final rejection = + report.warnings + .where((w) => w.kind == WarningKind.runtimeRejection) + .toList(); + expect(rejection, hasLength(1)); + expect(rejection.first.message, contains('filter rejects')); + expect(rejection.first.message, contains('throw at runtime')); + }); + + test('sum on a map shape is flagged', () { + const shape = SMap({'a': SNum()}); + final report = explain(_parse('. | sum'), shape); + final rejection = + report.warnings + .where((w) => w.kind == WarningKind.runtimeRejection) + .toList(); + expect(rejection, hasLength(1)); + expect(rejection.first.message, contains('sum rejects')); + }); + + test('SAny input does not trigger rejection (cannot prove)', () { + final report = explain(_parse('. | filter(.x)'), const SAny()); + final rejection = report.warnings.where( + (w) => w.kind == WarningKind.runtimeRejection, + ); + expect(rejection, isEmpty); + }); + + test('compatible input (list for filter) does not trigger', () { + const shape = SList(SMap({'active': SBool()})); + final report = explain(_parse('. | filter(.active)'), shape); + final rejection = report.warnings.where( + (w) => w.kind == WarningKind.runtimeRejection, + ); + expect(rejection, isEmpty); + }); + }); + + group('explain: trivial-result warnings (opt-in)', () { + const userListShape = SList(SMap({'name': SString(), 'age': SNum()})); + + test('sort_by(.missing) flagged when includeTrivial: true', () { + final report = explain( + _parse('. | sort_by(.missing)'), + userListShape, + includeTrivial: true, + ); + final trivial = + report.warnings + .where((w) => w.kind == WarningKind.trivialResult) + .toList(); + expect(trivial, hasLength(1)); + expect(trivial.first.message, contains('sort_by')); + expect(trivial.first.message, contains('.missing')); + }); + + test('group_by(.missing) flagged when includeTrivial: true', () { + final report = explain( + _parse('. 
| group_by(.missing)'), + userListShape, + includeTrivial: true, + ); + final trivial = + report.warnings + .where((w) => w.kind == WarningKind.trivialResult) + .toList(); + expect(trivial, hasLength(1)); + expect(trivial.first.message, contains('group_by')); + }); + + test('map(.missing) flagged when includeTrivial: true', () { + final report = explain( + _parse('. | map(.missing)'), + userListShape, + includeTrivial: true, + ); + final trivial = + report.warnings + .where((w) => w.kind == WarningKind.trivialResult) + .toList(); + expect(trivial, hasLength(1)); + expect(trivial.first.message, contains('map')); + }); + + test('NOT flagged by default (includeTrivial: false)', () { + final report = explain(_parse('. | sort_by(.missing)'), userListShape); + final trivial = report.warnings.where( + (w) => w.kind == WarningKind.trivialResult, + ); + expect(trivial, isEmpty); + }); + + test('existing field does not produce a trivial warning', () { + final report = explain( + _parse('. | sort_by(.age)'), + userListShape, + includeTrivial: true, + ); + final trivial = report.warnings.where( + (w) => w.kind == WarningKind.trivialResult, + ); + expect(trivial, isEmpty); + }); + + test('SAny element shape cannot prove missing; no trivial warning', () { + final report = explain( + _parse('. 
| sort_by(.missing)'), + const SList(SAny()), + includeTrivial: true, + ); + final trivial = report.warnings.where( + (w) => w.kind == WarningKind.trivialResult, + ); + expect(trivial, isEmpty); + }); + }); + + group('renderExplainJson: machine-readable output', () { + test('valid JSON with documented top-level keys', () { + final report = explain( + _parse('.users | map(.name)'), + const SMap({ + 'users': SList(SMap({'name': SString()})), + }), + ); + final json = renderExplainJson(report); + final parsed = jsonDecode(json) as Map; + expect( + parsed.keys, + containsAll([ + 'stages', + 'warnings', + 'writable_as', + 'not_writable_as', + 'flatten_cells', + ]), + ); + }); + + test('stages carry source and shape strings', () { + final report = explain( + _parse('.users | length'), + const SMap({'users': SList(SString())}), + ); + final parsed = + jsonDecode(renderExplainJson(report)) as Map; + final stages = parsed['stages'] as List; + expect(stages, hasLength(2)); + final first = stages.first as Map; + expect(first['source'], '.users'); + expect(first['shape'], 'list'); + }); + + test('warnings carry stage_index, kind (snake_case), and message', () { + const shape = SMap({'a': SNum()}); + final report = explain(_parse('. | filter(.x)'), shape); + final parsed = + jsonDecode(renderExplainJson(report)) as Map; + final warnings = parsed['warnings'] as List; + expect(warnings, isNotEmpty); + final w = warnings.first as Map; + expect(w.keys, containsAll(['stage_index', 'kind', 'message'])); + expect(w['kind'], 'runtime_rejection'); + }); + + test('kind uses snake_case for all three categories', () { + // empty_filter + const listShape = SList(SMap({'a': SNum()})); + final emptyReport = explain(_parse('. 
| filter(.b)'), listShape); + final emptyKinds = [ + for (final w + in (jsonDecode(renderExplainJson(emptyReport)) + as Map)['warnings'] + as List) + (w as Map)['kind'], + ]; + expect(emptyKinds, contains('empty_filter')); + + // trivial_result + final trivialReport = explain( + _parse('. | sort_by(.missing)'), + listShape, + includeTrivial: true, + ); + final trivialKinds = [ + for (final w + in (jsonDecode(renderExplainJson(trivialReport)) + as Map)['warnings'] + as List) + (w as Map)['kind'], + ]; + expect(trivialKinds, contains('trivial_result')); + }); + + test('writable_as / not_writable_as are name lists', () { + final report = explain(_parse('.'), const SList(SString())); + final parsed = + jsonDecode(renderExplainJson(report)) as Map; + expect(parsed['writable_as'], contains('json')); + expect(parsed['not_writable_as'], contains('toml')); + }); + + test('flatten_cells is the policy name string', () { + final report = explain( + _parse('.'), + const SList(SMap({'a': SList(SNum())})), + flattenCells: CellPolicy.json, + ); + final parsed = + jsonDecode(renderExplainJson(report)) as Map; + expect(parsed['flatten_cells'], 'json'); + }); + }); } From 9c96f7c7a61d41b6a3405347af113e70fd329e4e Mon Sep 17 00:00:00 2001 From: Hakim Jonas Ghoula Date: Sat, 2 May 2026 22:31:26 +0200 Subject: [PATCH 06/67] Gap closure: structured shapes in --explain-json, cascade cover Post-track-B audit caught two real gaps; one closed, one deferred with documentation. Structured shapes in --explain-json renderExplainJson previously emitted stage shapes as text strings via renderShape ("list>"). Agents consuming the JSON had to re-parse that text to access structure, defeating the point of a JSON mode. Fixed by adding shapeToJson(Shape) in lib/src/shape/shape.dart, a sealed-ADT walk that produces {kind, ...} nested trees: {"kind": "list", "element": {"kind": "map", "fields": {...}}} renderExplainJson now uses this form. Text output from renderExplain is byte-for-byte unchanged. 
Exported from the library barrel.

Rejection cascade coverage
New test in shape_explain_test.dart verifies the interaction between
runtime-rejection warnings and inferShape's SAny widening:
`. | filter(.a) | sort` starting from an SMap produces exactly one
rejection warning (on filter, stage 1). Sort sees the post-filter ctx
as SAny and does NOT emit its own rejection. Prevents double-warning
regressions.

REPL surface verified manually
:flatten-cells colon-command, session-state persistence, and the
REPL-native hint rendering (":flatten-cells json" not
"--flatten-cells json") all verified in a real session on a
list-of-maps-with-lists fixture. A ReadLine-seam refactor would be
needed for automated REPL tests; documented in memory as a known gap
accepted for 0.9.0.

Tests (+9 total)
shape_test.dart (+7): shapeToJson on every Shape constructor plus
nested round-trips, empty map, empty list, and JSON
round-trippability.
shape_explain_test.dart (+1 new + 1 rewrite):
- Rewrote the "stages carry shape" test to assert the structured
  {kind: list, element: {kind: string}} form instead of the old
  string.
- New "rejection cascade" test pinning single-warning behavior.

Quality gates: dart analyze clean, 1325 tests pass (was 1317), dart
format clean, pana 160/160.
---
 CHANGELOG.md | 9 +++--
 lib/lambe.dart | 3 +-
 lib/src/shape/explain.dart | 15 ++++----
 lib/src/shape/shape.dart | 35 +++++++++++++++++++
 test/shape_explain_test.dart | 25 ++++++++++++-
 test/shape_test.dart | 68 ++++++++++++++++++++++++++++++++++++
 6 files changed, 142 insertions(+), 13 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md index 4bcbc23..5720ab7 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -20,8 +20,13 @@ In progress. explain report as JSON with snake_case keys (`stages`, `warnings`, `writable_as`, `not_writable_as`, `flatten_cells`). Warning kinds serialize as `empty_filter`, - `runtime_rejection`, `trivial_result`. For agent tooling and - build-pipeline integration.
+ `runtime_rejection`, `trivial_result`. Shapes serialize as nested + `{kind, ...}` trees (via `shapeToJson`) rather than stringified, + so agents can pattern-match shape structure without re-parsing. + For agent tooling and build-pipeline integration. +- **`shapeToJson`** library function: serializes a [`Shape`] as a + nested `Map` with a `kind` discriminator on each + node. The structured format used by `--explain-json`. - **`ExplainWarning.kind`** (new field, [`WarningKind`] enum). Classifier for filtering: CLI, JSON consumers, and future tooling can select warning categories without parsing message strings. The diff --git a/lib/lambe.dart b/lib/lambe.dart index 30bbf2d..ed53510 100644 --- a/lib/lambe.dart +++ b/lib/lambe.dart @@ -43,7 +43,8 @@ export 'src/shape/shape.dart' SList, SMap, shapeOf, - renderShape; + renderShape, + shapeToJson; export 'src/shape/check.dart' show ShapeRequirement, diff --git a/lib/src/shape/explain.dart b/lib/src/shape/explain.dart index 0a9f74b..7ec0a88 100644 --- a/lib/src/shape/explain.dart +++ b/lib/src/shape/explain.dart @@ -489,19 +489,16 @@ String renderExplain(ExplainReport report) { /// /// The payload is a map with keys `stages`, `warnings`, `writable_as`, /// `not_writable_as`, and `flatten_cells`. Each stage carries its -/// `source` string and a `shape` rendered via [renderShape] (same text -/// form as the human-readable renderer). Each warning carries -/// `stage_index`, `kind` (one of `empty_filter`, `runtime_rejection`, -/// `trivial_result`), and `message`. -/// -/// Shapes are rendered as strings rather than structurally decomposed -/// into nested maps. Agents that need structural access should use -/// the `lambe_schema` MCP tool on the relevant input. +/// `source` string and a `shape` serialized via [shapeToJson] (a +/// nested `{kind, ...}` tree rather than the `renderShape` text form, +/// so consumers can pattern-match without re-parsing). 
Each warning +/// carries `stage_index`, `kind` (one of `empty_filter`, +/// `runtime_rejection`, `trivial_result`), and `message`. String renderExplainJson(ExplainReport report) { final payload = { 'stages': [ for (final s in report.stages) - {'source': s.source, 'shape': renderShape(s.shape)}, + {'source': s.source, 'shape': shapeToJson(s.shape)}, ], 'warnings': [ for (final w in report.warnings) diff --git a/lib/src/shape/shape.dart b/lib/src/shape/shape.dart index 49401ba..7742ece 100644 --- a/lib/src/shape/shape.dart +++ b/lib/src/shape/shape.dart @@ -208,3 +208,38 @@ const int _heteroSampleLimit = 8; /// Equivalent to [Shape.toString], provided as a function for callers that /// prefer `renderShape(s)` over `s.toString()`. String renderShape(Shape shape) => shape.toString(); + +/// Serialize [shape] as a JSON-shaped `Map` tree. +/// +/// Every shape is a map with a `kind` discriminator. List shapes carry +/// `element` (a nested shape). Map shapes carry `fields` (a +/// `Map` of field-name to nested shape). Scalars carry +/// only the kind. The output is intended for `--explain-json` and +/// other programmatic consumers that want to reason about shape +/// structure without re-parsing the `renderShape` text form. 
+/// +/// ``` +/// shapeToJson(SList(SMap({'a': SNum()}))) +/// // { +/// // "kind": "list", +/// // "element": { +/// // "kind": "map", +/// // "fields": {"a": {"kind": "number"}} +/// // } +/// // } +/// ``` +Map shapeToJson(Shape shape) => switch (shape) { + SAny() => const {'kind': 'any'}, + SNull() => const {'kind': 'null'}, + SBool() => const {'kind': 'bool'}, + SNum() => const {'kind': 'number'}, + SString() => const {'kind': 'string'}, + SList(:final element) => {'kind': 'list', 'element': shapeToJson(element)}, + SMap(:final fields) => { + 'kind': 'map', + 'fields': { + for (final MapEntry(:key, :value) in fields.entries) + key: shapeToJson(value), + }, + }, +}; diff --git a/test/shape_explain_test.dart b/test/shape_explain_test.dart index c6e3977..8dd739e 100644 --- a/test/shape_explain_test.dart +++ b/test/shape_explain_test.dart @@ -377,6 +377,24 @@ void main() { ); expect(rejection, isEmpty); }); + + test( + 'after a rejection, downstream stages see SAny and do not double-warn', + () { + // `. | filter(.a) | sort` starting from a map: filter rejects + // (warning emitted), inferShape widens ctx to SAny, sort then + // accepts any shape and should NOT emit its own rejection. + const shape = SMap({'a': SNum()}); + final report = explain(_parse('. | filter(.a) | sort'), shape); + final rejections = + report.warnings + .where((w) => w.kind == WarningKind.runtimeRejection) + .toList(); + expect(rejections, hasLength(1)); + expect(rejections.first.stageIndex, 1); + expect(rejections.first.message, contains('filter rejects')); + }, + ); }); group('explain: trivial-result warnings (opt-in)', () { @@ -491,7 +509,12 @@ void main() { expect(stages, hasLength(2)); final first = stages.first as Map; expect(first['source'], '.users'); - expect(first['shape'], 'list'); + // Structured shape: {kind: list, element: {kind: string}}. 
+ expect(first['shape'], isA>()); + final shape = first['shape'] as Map; + expect(shape['kind'], 'list'); + final element = shape['element'] as Map; + expect(element['kind'], 'string'); }); test('warnings carry stage_index, kind (snake_case), and message', () { diff --git a/test/shape_test.dart b/test/shape_test.dart index 6f168f4..7186a51 100644 --- a/test/shape_test.dart +++ b/test/shape_test.dart @@ -5,6 +5,8 @@ /// `list`, and nested structures recurse predictably. library; +import 'dart:convert'; + import 'package:lambe/src/shape/shape.dart'; import 'package:test/test.dart'; @@ -142,4 +144,70 @@ void main() { expect(renderShape(const SList(SAny())), 'list'); }); }); + + group('shapeToJson: structured serialization', () { + test('scalars encode as {kind: ...}', () { + expect(shapeToJson(const SAny()), {'kind': 'any'}); + expect(shapeToJson(const SNull()), {'kind': 'null'}); + expect(shapeToJson(const SBool()), {'kind': 'bool'}); + expect(shapeToJson(const SNum()), {'kind': 'number'}); + expect(shapeToJson(const SString()), {'kind': 'string'}); + }); + + test('list encodes with nested element shape', () { + expect(shapeToJson(const SList(SNum())), { + 'kind': 'list', + 'element': {'kind': 'number'}, + }); + }); + + test('map encodes with nested fields', () { + expect(shapeToJson(const SMap({'a': SNum(), 'b': SString()})), { + 'kind': 'map', + 'fields': { + 'a': {'kind': 'number'}, + 'b': {'kind': 'string'}, + }, + }); + }); + + test('nested list-of-maps round-trips the shape tree', () { + const shape = SList(SMap({'name': SString(), 'tags': SList(SString())})); + expect(shapeToJson(shape), { + 'kind': 'list', + 'element': { + 'kind': 'map', + 'fields': { + 'name': {'kind': 'string'}, + 'tags': { + 'kind': 'list', + 'element': {'kind': 'string'}, + }, + }, + }, + }); + }); + + test('empty map has empty fields', () { + expect(shapeToJson(const SMap({})), { + 'kind': 'map', + 'fields': {}, + }); + }); + + test('empty list has SAny element', () { + 
expect(shapeToJson(const SList(SAny())), { + 'kind': 'list', + 'element': {'kind': 'any'}, + }); + }); + + test('result serializes to JSON without error', () { + const shape = SMap({'a': SList(SNum())}); + final json = jsonEncode(shapeToJson(shape)); + expect(json, contains('"kind":"map"')); + expect(json, contains('"kind":"list"')); + expect(json, contains('"kind":"number"')); + }); + }); } From 8979e44378e314d6312a44a7d0b8ff3e11fba965 Mon Sep 17 00:00:00 2001 From: Hakim Jonas Ghoula Date: Sat, 2 May 2026 23:37:37 +0200 Subject: [PATCH 07/67] Track A design doc: schema-typed queries with SOptional MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Decision record for the schema-as-contract feature. Resolves the design questions from the handover plus several the handover didn't raise. Format: JSON Schema subset (not custom DSL). Subset is type/properties/items/required. Value-level constraints (minimum/pattern/enum/etc) are rejected at load time with per-keyword errors. Structural combinators (allOf/oneOf/$ref/if-then /dependencies) rejected. Unknown keywords ignored per JSON Schema extensibility convention. Key call: rumil_parsers.parseJson does the JSON parsing for free with line-aware errors, so the parser collapses to ~50 lines of exhaustive switch on JsonValue. My earlier "Lambe DSL is cheaper" argument died once I accounted for that. Shape ADT: SOptional(Shape) added. Required by JSON Schema's `required` semantics — shipping without it would silently lie whenever users have optional fields. Termination and the bounded- language contract are preserved: SOptional lives in the static analyzer, not the query language. Disagreement: schema augments shapeOf(data); error on concrete-type conflict. Keeps --explain honest. Structural validation falls out as a side effect; no separate --validate command for 0.9.0. CLI: rename --schema to --print-shape (first breaking change in 0.9.0); add --schema . 
--print-shape output becomes JSON Schema, round-trippable with --schema input. Sibling convention: data.json paired with data.schema.json. MCP: new schema parameter on lambe_query; rename lambe_schema to lambe_print_shape; new lambe_check tool for on-demand validation. Explicit non-goals called out: no runtime coercion, no value-level constraints, no conditional schemas, no external $ref, no templating. Lambe is not CUE and shouldn't try to be. Implementation plan: SOptional first (compiler finds all the switch sites), then parser/loader/merge, then CLI/REPL/MCP wiring, then tests and docs. Estimated ~1 week. Positioning sharpened via research: Lambe is "a query language for structured data that shows you what you're working with" — use it when you don't already know the data. Not "typed jq" (that market never materialized in 10 years). Not "parity with CUE" (different audience). The shape feedback loop is the actual win. --- doc/schema-design.md | 364 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 364 insertions(+) create mode 100644 doc/schema-design.md diff --git a/doc/schema-design.md b/doc/schema-design.md new file mode 100644 index 0000000..900b028 --- /dev/null +++ b/doc/schema-design.md @@ -0,0 +1,364 @@ +# Lambe 0.9.0 Track A: Schema-typed queries — design document + +Status: **approved**, ready for implementation. + +## Context + +0.9.0 completes the shape feedback loop: declare a shape, check queries +against it, round-trip with JSON Schema tooling. Tracks B/C/D landed +the per-feature polish; track A ships the piece that lets Lambe's +shape system act as a contract between the tool and its users' data. 
+ +The positioning is *"a query language for structured data that shows +you what you're working with."* Schemas are how a user tells Lambe +what they're working with when the data alone doesn't say enough +(empty lists, optional fields, heterogeneous sampling) — and how +Lambe tells the user, statically, whether their query makes sense +against that contract. + +## Non-goals + +- **No value-level constraints.** `minimum`, `maximum`, `pattern`, + `enum`, `format`, `minLength`, `maxLength` are rejected at + schema-load time with a one-line per-keyword error. Lambe is a + query tool that understands shape, not a constraint system. Users + who want value validation reach for ajv, check-jsonschema, or CUE. +- **No conditional schemas** (`if`/`then`/`else`, `dependencies`, + `allOf`/`oneOf`/`anyOf`/`not`). These introduce a constraint solver + and break the bounded-tree-transformer promise. +- **No external `$ref` resolution.** Schemas are single-file. +- **No runtime coercion.** A schema saying `age: number` does not + cause CSV's `"30"` string to be parsed as a number at query time. + The user still writes `.age | to_number`. +- **No pure-validation CLI command** (`lam --validate`). A user who + wants to validate data against a schema can write + `lam --schema s.json '.' data.yaml`. If data violates the schema, + the load fails with a structural error. That's enough. + +## Design decisions + +### 1. Schema format: JSON Schema subset + +Accept JSON files that describe a shape using four JSON Schema +keywords: `type`, `properties`, `items`, `required`. + +**Chosen over a custom Lambe DSL because:** + +- Ecosystem leverage. JSON Schema is what users already have — + OpenAPI specs, pub.dev metadata, IDE validators, CI linters all + emit or consume it. Zero authoring cost for users with an existing + schema. +- `rumil_parsers.parseJson` does the parse for free, with typed + errors and line/column locations. 
The "walk `JsonValue` → build + `Shape`" layer is ~50 lines of exhaustive switch. +- JSON Schema as the ecosystem's lingua franca for structural + description is a fact. A Lambe-specific DSL would be one more + thing to learn with no reciprocal win. + +**Accepted keywords and their mapping:** + +| JSON Schema | Maps to | +|-------------|---------| +| `{"type": "null"}` | `SNull` | +| `{"type": "boolean"}` | `SBool` | +| `{"type": "number"}` or `"integer"` | `SNum` | +| `{"type": "string"}` | `SString` | +| `{"type": "array", "items": S}` | `SList(parse(S))` | +| `{"type": "object", "properties": P}` | `SMap({...})` with each property recursively parsed | +| `"required": [names]` on an object | Non-listed properties become `SOptional` in the `SMap` | + +**Rejected keywords** produce a clear per-keyword error pointing at +the source location: + +- `minimum`, `maximum`, `exclusiveMinimum`, `exclusiveMaximum` +- `multipleOf` +- `minLength`, `maxLength`, `pattern`, `format` +- `minItems`, `maxItems`, `uniqueItems`, tuple-form `items` +- `minProperties`, `maxProperties`, `additionalProperties`, + `patternProperties`, `propertyNames` +- `const`, `enum` +- `allOf`, `oneOf`, `anyOf`, `not` +- `if`/`then`/`else`, `dependencies` +- `$ref`, `$defs`, `definitions`, `$schema`, `$id` + +**Unknown keywords** are ignored (JSON Schema's extensibility +convention). A schema with `"description"` or `"title"` is fine — +those are metadata that don't affect shape. + +### 2. `SOptional(Shape)` variant + +Adding a new sealed variant to `Shape`: + +```dart +/// A value that may be absent. Used for JSON Schema properties not +/// listed in `required`, and for other cases where optionality is +/// statically known. +final class SOptional extends Shape { + final Shape inner; + const SOptional(this.inner); + // == / hashCode / toString +} +``` + +**What this gives us:** + +- JSON Schema's `required` semantics correctly map to the shape + system. Schemas ship honestly or not at all. 
+- `--explain` can point out "this field is optional; `.age + 5` may + throw at runtime on rows without `age`." +- `SMap` field shapes can carry `SOptional(...)` to represent + "declared but optional." + +**What this costs:** + +- Every exhaustive `switch` on `Shape` gets a new case. The Dart + compiler finds them all. Expected sites: `pipe_ops.dart` predicates, + `inferShape`, `renderShape`, `shapeToJson`, `canWriteAs` + requirements, `check.dart` hints. +- Op acceptance semantics: for a list pipe op (like `filter`), an + `SOptional>` input means "might be a list, might not be." + The op accepts (treating as the inner `SList`), but a + runtime-rejection warning fires: "this may be absent; guard with a + null check." + +**What this preserves:** + +- **Termination.** `SOptional` lives in the shape ADT, not the query + language. Query evaluation semantics unchanged. +- **The bounded-language contract.** No new query operators. +- **The "narrow on purpose" scope.** The analyzer gets richer; the + language surface is unchanged. + +### 3. Disagreement semantics: schema augments data + +When `--schema` is provided AND data is present, the initial shape +for `inferShape` is `mergeSchemaWithData(schemaShape, shapeOf(data))`. + +Merge rules: + +- Both agree on a concrete type: use that type. +- Schema has a field, data doesn't: use schema's shape. +- Data has a field, schema doesn't: use data's shape. +- Schema marks a field optional, data has it: field is present; + outer `SOptional` wrapper is stripped at the merged point. +- Schema and data disagree on a concrete type at any path: **error + at load time** with a diagnostic showing the path, expected, and + actual. +- Empty-list element shapes always take the schema's element if one + is declared. `shapeOf([])` = `SList(SAny)`, schema + `list` → result is `SList`. + +**Rationale:** the value proposition of `--explain` is "what it +says is what will happen." 
A schema-wins policy would make +`--explain` lie whenever schema and data diverge. Error-on-conflict +keeps `--explain` honest. The merge preserves the case where schema +adds information (optionality, empty-list elements) without +overriding data. + +**This gives structural validation as a side effect.** A user +running `lam --schema api.json '.' response.json` whose response +doesn't match the schema gets a load-time error naming the path and +types. No separate `--validate` mode needed. + +### 4. CLI surface + +**Rename `--schema` to `--print-shape`.** The existing `--schema` +flag prints the inferred shape of data; its semantics are really +"print the shape you'd infer." Renaming aligns the flag names with +their verbs. + +```bash +# 0.8.0 (old) +lam --schema data.json # prints the inferred shape + +# 0.9.0 (new) +lam --print-shape data.json # prints the inferred shape (as JSON Schema) +lam --schema spec.json data.json # uses spec.json as the input schema +lam --schema spec.json 'query' # schema-only (no data) +lam --schema spec.json --explain 'q' # trace a query against the schema +``` + +**Auto-detection:** if `--schema` is not passed and a file named +`.schema.json` exists next to the data file, use it +implicitly. Consistent with the `.ndjson` auto-detection shipped in +track C. + +**`--print-shape` output is JSON Schema.** Round-trips with +`--schema` input: + +```bash +lam --print-shape data.json > data.schema.json +# edit data.schema.json as needed +lam --schema data.schema.json query.lam data.json +``` + +This replaces the current type-name-string JSON output. **Breaking +change**, documented in CHANGELOG. + +**REPL additions:** + +- `:schema ` — load a schema for this session. +- `:schema` — show the active schema (if any). +- `:print-shape` — print the inferred shape of the currently loaded + data, in JSON Schema form. 
+ +**JSON-Schema-looking reject:** if `--schema` is passed a file with +no recognized content (empty, random text, HTML, etc.), error with +a clear message. If it contains unsupported JSON Schema features, +error per feature. If it's valid JSON but not a schema (a bare +number, a plain object without `type`/`properties`), error with +"schema root must declare a shape (use `{\"type\": ...}`)." + +### 5. MCP integration + +The `lambe_query` tool gains an optional `schema` parameter: a JSON +string containing the schema. Threaded through like `flatten_cells`. + +The `lambe_schema` MCP tool is renamed to `lambe_print_shape` for +consistency. Returns the shape as JSON Schema. (Agents that were +calling `lambe_schema` get a clear deprecation: tool not found, +suggest `lambe_print_shape`.) + +New MCP tool: `lambe_check` — takes `schema` and `data`, returns +`{ok: true}` or `{ok: false, errors: [...]}`. This is structural +validation on demand, using the same `mergeSchemaWithData` logic. +Useful for agents verifying they have the right fixtures before +running queries. + +### 6. Library surface + +New module: `lib/src/schema/parser.dart` + +```dart +/// Parse a JSON Schema subset into a [Shape]. +/// +/// Accepts a subset of JSON Schema: `type`, `properties`, `items`, +/// `required`. Rejects value-level constraints and structural +/// combinators; see doc/schema-design.md for the full list. +/// +/// Throws [QueryError] with a line-aware diagnostic on parse error. +Shape parseJsonSchema(String source); +``` + +New module: `lib/src/schema/loader.dart` + +```dart +/// Load a schema from a file path, auto-detecting siblings if +/// [explicitPath] is null. +Shape? loadSchema({String? explicitPath, String? dataPath}); + +/// Merge a schema shape with an observed data shape per the rules +/// in doc/schema-design.md section 3. Throws [QueryError] on concrete- +/// type disagreement. 
+Shape mergeSchemaWithData(Shape schema, Shape dataShape); + +/// Render a [Shape] as a JSON Schema document. +/// +/// Round-trips with [parseJsonSchema] — parsing the output of +/// `renderJsonSchema(s)` yields a shape equal to `s`. +String renderJsonSchema(Shape shape); +``` + +Library barrel exports: `parseJsonSchema`, `loadSchema`, +`mergeSchemaWithData`, `renderJsonSchema`, `SOptional`. + +Existing APIs (`explain`, `inferShape`, `canWriteShapeAs`, +`renderExplain`, `renderExplainJson`, `shapeToJson`) are unchanged +in signature. `SOptional` propagates through them naturally via the +exhaustive-switch update. + +### 7. Interaction with existing 0.9.0 features + +- **ndjson**: `lam --ndjson --schema line.schema.json query file.ndjson` + threads the schema as each line's initial shape. No new design. +- **`--flatten-cells json`**: schema-aware. Nested-list cells still + refuse by default; `--flatten-cells json` still widens writer + acceptance. Schema provides richer element shape for CSV writers. +- **`--explain-trivial`**: a schema-provided optional field accessed + without a null guard still triggers the runtime-rejection warning + even under `--explain-trivial`. Trivial-result detection benefits + from schema: `sort_by(.missing)` becomes provably missing when the + schema doesn't declare it. +- **Hints**: `mergeSchemaWithData` errors populate `hints` where a + CLI flag would resolve the conflict. For 0.9.0, no such flags + exist, so `hints` stays empty on schema errors. + +### 8. 
Grammar of the accepted JSON Schema subset + +``` +schema := object_schema + | array_schema + | scalar_schema + +scalar_schema := {"type": "null"} + | {"type": "boolean"} + | {"type": "number"} + | {"type": "integer"} # same as number, per lambe + | {"type": "string"} + +array_schema := {"type": "array", "items": } + +object_schema := {"type": "object", + "properties": {: , ...}, + "required": [, ...]?} +``` + +Keywords outside this grammar (but not in the explicit reject list) +are ignored as metadata. Reject-list violations are errors. + +## Implementation plan + +~1 week. Order: + +1. **`SOptional` variant.** Add to `shape.dart`. Run the analyzer; + fix every exhaustive-switch compile error. Each fix is local: + - `renderShape`: `optional`. + - `shapeToJson`: `{"kind": "optional", "inner": ...}`. + - `canWriteAs` requirements: optional unwraps to inner for + writability purposes, except for TOML/HCL where "optional at + root" is unwritable. + - `inferShape`: field access on `SMap` with optional field yields + `SOptional`. Subsequent ops propagate or strip as + appropriate. + - `pipe_ops.dart` predicates: optional is accepted wherever + inner is, but a runtime-rejection warning is emitted. +2. **Parser** (`lib/src/schema/parser.dart`): walk `JsonValue`, + recursive. Line-aware errors via `rumil_parsers.parseJson` error + positions. +3. **Loader + merge** (`lib/src/schema/loader.dart`): file reader, + sibling auto-detect, `mergeSchemaWithData` with diagnostic errors. +4. **Renderer** (`lib/src/schema/render.dart` or inline): shape → + JSON Schema. Used by `--print-shape`. +5. **CLI** (`bin/lam.dart`): rename flag, add option, thread + through explain and evaluation paths. +6. **REPL** (`lib/src/repl.dart`): `:schema`, `:print-shape`. +7. **MCP** (`bin/mcp_server.dart`): `schema` param, `lambe_check` + tool, `lambe_schema` → `lambe_print_shape` rename. +8. 
**Tests**: + - `test/schema_parser_test.dart`: every shape constructor, + `required` semantics, unknown-keyword tolerance, rejected- + keyword errors, round-trip with `renderJsonSchema`. + - `test/schema_loader_test.dart`: sibling auto-detect, merge + rules, disagreement errors, validation-as-side-effect. + - Extend `test/cli_integration_test.dart`: `--schema`, + `--print-shape`, rename rejection error. + - Extend `test/shape_explain_test.dart`: schema-seeded explain + reports. +9. **Docs**: + - `doc/schema.md`: user-facing guide with examples. + - `doc/lam.1.md`: `--schema`, `--print-shape`. + - `CHANGELOG.md`: Added bullets + Breaking callout for rename. + - `README.md`: reframe to the shape-feedback-loop pitch (held + until all tracks land). + +## Open decisions + +- **MCP tool rename `lambe_schema` → `lambe_print_shape`.** Strictly + speaking, backward-compatible would keep the old name. Renaming + aligns with the CLI rename. Lean: rename. Any agent with the old + name gets a clear "tool not found" and can update. + +- **Auto-detect behavior when both `--schema ` and a sibling + `.schema.json` exist.** Explicit wins. + +These are resolved; calling them out for the record. From c61ae44778ea5885dffd7021ee0ae9447f70385f Mon Sep 17 00:00:00 2001 From: Hakim Jonas Ghoula Date: Sat, 2 May 2026 23:47:39 +0200 Subject: [PATCH 08/67] Track A step 1: SOptional shape variant Add SOptional(Shape) to the sealed shape ADT. This is the variant JSON Schema's `required` semantics demand and the shape the wider "shape as feedback loop" positioning needs. Shipping the schema feature without it would silently misrepresent optional fields. Constructor semantics SOptional(SOptional(x)) collapses to SOptional(x) via the factory. Guarantees no stacked optionality anywhere, so downstream code never has to handle the degenerate case. Acceptance semantics (op predicates) Optional unwraps for op acceptance: `filter` on SOptional> is accepted. 
The potential absence is surfaced by the explain runtime-rejection analyzer, not by the acceptance predicate. Helpers in pipe_ops.dart (_acceptsList, _acceptsMap, etc.) all unwrap via a shared _unwrap helper. Root-requirement semantics (output formats) MustBeMap / MustBeList / MustBeFlatList do NOT unwrap. An optional at the root means "value may be absent entirely"; TOML/HCL/CSV cannot serialize an absence. Users must materialize a default before the --to step. The distinction between op-acceptance and root-requirement is deliberate: ops tolerate runtime null propagation, root serializers don't. Inference propagation Field access on SMap with an optional field yields SOptional. Field access on SOptional> (null propagation) also yields SOptional. The factory collapses nested cases so the result is never SOptional>. Analyzer integration - Empty-filter check unwraps optional bool predicates; an optional bool may be true, so not "provably non-boolean." - Missing-field path check walks through optional wrappers to inspect the underlying SMap fields. - Runtime-rejection check does NOT unwrap: optional counts as a potential mismatch worth warning about. Completer integration Tab completion unwraps optional for field enumeration and inner- expression resolution. An optional list still completes against its element shape in `.users | map()` contexts. Serialization renderShape: optional. shapeToJson: {kind: optional, inner: ...}. Tests (+16 across two files) shape_test.dart: render, serialize, equality, nested collapse, embedding in other shapes. shape_explain_test.dart: field propagation, access through optional wrapper, op acceptance, missing-field walk, optional bool predicate, root rejection by TOML, nested collapse via inference, JSON round-trip. Quality gates: dart analyze clean, 1338 tests pass (was 1325, +13 new including the 16 above minus overlap with existing tests), dart format clean, pana 160/160. Zero test regressions. 
This is step 1 of the track A implementation plan in doc/schema-design.md. Next: JSON Schema subset parser. --- lib/lambe.dart | 1 + lib/src/completer.dart | 19 ++++++--- lib/src/shape/check.dart | 11 +++++ lib/src/shape/explain.dart | 30 +++++++++++--- lib/src/shape/infer.dart | 8 ++++ lib/src/shape/pipe_ops.dart | 39 +++++++++++++++--- lib/src/shape/shape.dart | 38 +++++++++++++++++ test/shape_explain_test.dart | 79 ++++++++++++++++++++++++++++++++++++ test/shape_test.dart | 37 +++++++++++++++++ 9 files changed, 245 insertions(+), 17 deletions(-) diff --git a/lib/lambe.dart b/lib/lambe.dart index ed53510..c1987c7 100644 --- a/lib/lambe.dart +++ b/lib/lambe.dart @@ -42,6 +42,7 @@ export 'src/shape/shape.dart' SString, SList, SMap, + SOptional, shapeOf, renderShape, shapeToJson; diff --git a/lib/src/completer.dart b/lib/src/completer.dart index 92339d3..d72b889 100644 --- a/lib/src/completer.dart +++ b/lib/src/completer.dart @@ -231,8 +231,10 @@ Completions _completionContext(LamExpr ast, int astEnd, Shape inputShape) { final inner = _innerExpr(ast.op); if (inner != null) { final collection = inferShape(ast.input, inputShape); - if (collection is SList) { - return _completionContext(inner, astEnd, collection.element); + // An optional list completes against its element shape. + final unwrapped = collection is SOptional ? collection.inner : collection; + if (unwrapped is SList) { + return _completionContext(inner, astEnd, unwrapped.element); } return (start: astEnd, end: astEnd, candidates: []); } @@ -282,11 +284,15 @@ Completions _completeAstTail( /// produces no field candidates. Completions _fieldsOf(Shape target, String partial, int dotPos) { final tokenEnd = dotPos + 1 + partial.length; - if (target is! SMap) { + // Optional maps still offer their fields for completion; null + // propagation at runtime handles the absent case. + final unwrapped = target is SOptional ? target.inner : target; + if (unwrapped is! 
SMap) { return (start: tokenEnd, end: tokenEnd, candidates: []); } final matching = - target.fields.keys.where((k) => k.startsWith(partial)).toList()..sort(); + unwrapped.fields.keys.where((k) => k.startsWith(partial)).toList() + ..sort(); return ( start: dotPos, end: tokenEnd, @@ -307,8 +313,9 @@ Shape _resolveTarget(LamExpr? ast, Shape inputShape) { final inner = _innerExpr(ast.op); if (inner != null) { final collection = inferShape(ast.input, inputShape); - if (collection is SList) { - return inferShape(inner, collection.element); + final unwrapped = collection is SOptional ? collection.inner : collection; + if (unwrapped is SList) { + return inferShape(inner, unwrapped.element); } return const SAny(); } diff --git a/lib/src/shape/check.dart b/lib/src/shape/check.dart index 1486103..e826792 100644 --- a/lib/src/shape/check.dart +++ b/lib/src/shape/check.dart @@ -60,6 +60,10 @@ final class MustBeMap extends ShapeRequirement { @override String describe() => 'a map'; + + // Note: does NOT unwrap [SOptional]. An optional root means the + // value may be absent; TOML/HCL cannot serialize that. Users must + // materialize with a default before the `--to` step. } /// Requires a list at the root, with no constraint on element shape. @@ -113,19 +117,26 @@ final class MustBeFlatList extends ShapeRequirement { /// Whether an element shape of the outer list produces only scalar /// cells when serialized as a CSV/TSV row. + /// + /// [SOptional] is transparent: an optional cell is flat iff its + /// inner shape is flat. An absent optional renders as an empty + /// cell, which is always valid. static bool _cellShapeIsFlat(Shape elem) => switch (elem) { SAny() || SNull() || SBool() || SNum() || SString() => true, SList(:final element) => _isScalar(element), SMap(:final fields) => fields.values.every(_isScalar), + SOptional(:final inner) => _cellShapeIsFlat(inner), }; /// Whether [s] is a scalar shape (null, bool, num, string, or unknown). 
/// /// `SAny` counts as scalar here: when the shape is unknown, the check /// cannot prove incompatibility and defers to the runtime guard. + /// [SOptional] is transparent; the inner shape decides. static bool _isScalar(Shape s) => switch (s) { SAny() || SNull() || SBool() || SNum() || SString() => true, SList() || SMap() => false, + SOptional(:final inner) => _isScalar(inner), }; } diff --git a/lib/src/shape/explain.dart b/lib/src/shape/explain.dart index 7ec0a88..709c58a 100644 --- a/lib/src/shape/explain.dart +++ b/lib/src/shape/explain.dart @@ -234,12 +234,16 @@ ExplainReport explain( /// /// Returns `null` when no warning applies. String? _analyzePredicate(LamExpr op, Shape inputShape) { + // Unwrap optional for per-op analysis: the op's behavior is + // determined by the inner shape; absence is handled by the + // runtime-rejection warning elsewhere. + final concrete = inputShape is SOptional ? inputShape.inner : inputShape; switch (op) { case FilterOp(:final predicate): - final element = inputShape is SList ? inputShape.element : const SAny(); + final element = concrete is SList ? concrete.element : const SAny(); return _predicateWarning(predicate, element, 'filter', 'element'); case FilterValuesOp(:final predicate): - final value = switch (inputShape) { + final value = switch (concrete) { SMap(:final fields) when fields.isNotEmpty => fields.values.reduce( (a, b) => a == b ? a : const SAny(), ), @@ -270,7 +274,15 @@ String? _predicateWarning( '$opName will always be empty'; } final predShape = inferShape(predicate, context); - if (predShape is SBool || predShape is SAny) return null; + // Unwrap optional: an optional bool predicate may be absent at + // runtime (yields null, fails the == true check) but isn't + // "provably" non-boolean. Let it pass this check; the + // runtime-rejection warning surfaces the absence concern. 
+ final unwrapped = switch (predShape) { + SOptional(:final inner) => inner, + _ => predShape, + }; + if (unwrapped is SBool || unwrapped is SAny) return null; return '$opName predicate has shape ${renderShape(predShape)}; ' '$opName requires a boolean, so this will always be empty'; } @@ -316,8 +328,11 @@ String? _analyzeTrivial(LamExpr op, Shape inputShape) { _ => (null, null), }; if (argExpr == null || opName == null) return null; - if (inputShape is! SList) return null; - final missing = _missingFieldPath(argExpr, inputShape.element); + // Unwrap optional: the op either runs on the inner list or is + // flagged by runtime-rejection, both handled elsewhere. + final concrete = inputShape is SOptional ? inputShape.inner : inputShape; + if (concrete is! SList) return null; + final missing = _missingFieldPath(argExpr, concrete.element); if (missing == null) return null; return '$opName argument $missing does not exist on the element shape; ' 'the result is trivial'; @@ -351,6 +366,11 @@ String? _missingFieldPath(LamExpr expr, Shape context) { var ctx = context; for (var i = 0; i < segments.length; i++) { + // Walk through optional wrappers transparently: an optional map + // still has its declared fields, just maybe absent as a whole. + while (ctx is SOptional) { + ctx = ctx.inner; + } if (ctx is SAny) return null; if (ctx is! SMap) return null; final name = segments[i]; diff --git a/lib/src/shape/infer.dart b/lib/src/shape/infer.dart index f3c4e75..34bbd22 100644 --- a/lib/src/shape/infer.dart +++ b/lib/src/shape/infer.dart @@ -116,6 +116,14 @@ Shape _asShape(Shape input, OutputFormat target) { } Shape _lookupField(Shape context, String name) { + if (context is SOptional) { + // Field access through an optional propagates the optional: if + // the outer value is absent, null propagation returns null for + // the field access too. So `.field` on SOptional yields + // SOptional. 
+ final inner = _lookupField(context.inner, name); + return SOptional(inner); + } if (context is SMap) { return context.fields[name] ?? const SAny(); } diff --git a/lib/src/shape/pipe_ops.dart b/lib/src/shape/pipe_ops.dart index deaeada..d6b9685 100644 --- a/lib/src/shape/pipe_ops.dart +++ b/lib/src/shape/pipe_ops.dart @@ -211,12 +211,39 @@ Shape inferPipeOpShape(Shape input, LamExpr op) { // call site, keeps the invariant a property of the spec table // itself: any new spec defined via these helpers inherits it. -bool _acceptsList(Shape s) => s is SList || s is SAny; -bool _acceptsMap(Shape s) => s is SMap || s is SAny; -bool _acceptsListOrMap(Shape s) => s is SList || s is SMap || s is SAny; -bool _acceptsListMapOrString(Shape s) => - s is SList || s is SMap || s is SString || s is SAny; -bool _acceptsStringOrNum(Shape s) => s is SString || s is SNum || s is SAny; +// Optional wraps the value's potential absence. For acceptance +// purposes, unwrap: if the inner shape is accepted, so is the +// optional. The runtime-rejection warning in `explain.dart` is the +// user-visible note that "may be absent at runtime." Downstream +// inference still sees the optional propagated by [inferShape] so +// warnings keep firing along the chain. +Shape _unwrap(Shape s) => s is SOptional ? 
s.inner : s; + +bool _acceptsList(Shape s) { + s = _unwrap(s); + return s is SList || s is SAny; +} + +bool _acceptsMap(Shape s) { + s = _unwrap(s); + return s is SMap || s is SAny; +} + +bool _acceptsListOrMap(Shape s) { + s = _unwrap(s); + return s is SList || s is SMap || s is SAny; +} + +bool _acceptsListMapOrString(Shape s) { + s = _unwrap(s); + return s is SList || s is SMap || s is SString || s is SAny; +} + +bool _acceptsStringOrNum(Shape s) { + s = _unwrap(s); + return s is SString || s is SNum || s is SAny; +} + bool _acceptsAny(Shape _) => true; // --- List-consuming ops -------------------------------------------- diff --git a/lib/src/shape/shape.dart b/lib/src/shape/shape.dart index 7742ece..36b1122 100644 --- a/lib/src/shape/shape.dart +++ b/lib/src/shape/shape.dart @@ -162,6 +162,43 @@ final class SMap extends Shape { } } +/// A value that may be absent. Used for JSON Schema properties not +/// listed in `required`, and any other statically-known optionality. +/// +/// Optional appears in the shape system to let schema-declared +/// absences be represented faithfully. At evaluation time, an +/// optional field that's absent produces null through Lambe's usual +/// null-propagation; the query language itself is unchanged. The +/// variant's purpose is purely to sharpen [explain] and writer +/// compatibility checks: an optional field accessed without a null +/// guard produces a runtime-rejection warning during static analysis. +/// +/// Nested optionality collapses: `SOptional(SOptional(x))` is +/// semantically identical to `SOptional(x)`. Constructors enforce +/// this by unwrapping. +final class SOptional extends Shape { + /// The shape of the value when present. + final Shape inner; + + /// Creates an [SOptional] shape. + /// + /// If [inner] is itself [SOptional], unwraps to avoid nested + /// optionality. + factory SOptional(Shape inner) => + inner is SOptional ? 
inner : SOptional._(inner); + + const SOptional._(this.inner); + + @override + bool operator ==(Object other) => other is SOptional && other.inner == inner; + + @override + int get hashCode => Object.hash('optional', inner); + + @override + String toString() => 'optional<$inner>'; +} + /// Infer the structural [Shape] of [value]. /// /// Recurses through lists and maps. Lists are sampled rather than fully @@ -242,4 +279,5 @@ Map shapeToJson(Shape shape) => switch (shape) { key: shapeToJson(value), }, }, + SOptional(:final inner) => {'kind': 'optional', 'inner': shapeToJson(inner)}, }; diff --git a/test/shape_explain_test.dart b/test/shape_explain_test.dart index 8dd739e..2dd3bee 100644 --- a/test/shape_explain_test.dart +++ b/test/shape_explain_test.dart @@ -577,4 +577,83 @@ void main() { expect(parsed['flatten_cells'], 'json'); }); }); + + group('SOptional: propagates through inference and analyzers', () { + test('field access on map with optional field returns optional', () { + final mapShape = SMap({'age': SOptional(const SNum())}); + final report = explain(_parse('.age'), mapShape); + expect(report.stages.last.shape, SOptional(const SNum())); + }); + + test('field access on optional map wraps result in optional', () { + final mapShape = SOptional(const SMap({'name': SString()})); + final report = explain(_parse('.name'), mapShape); + expect(report.stages.last.shape, SOptional(const SString())); + }); + + test('filter accepts optional list (acceptance unwraps)', () { + final listShape = SOptional(const SList(SNum())); + final report = explain(_parse('. | filter(. > 0)'), listShape); + // Rejection analyzer should NOT fire: optional unwraps for + // acceptance, and the inner SList is accepted. 
+ final rejection = report.warnings.where( + (w) => w.kind == WarningKind.runtimeRejection, + ); + expect(rejection, isEmpty); + }); + + test('missing-field check walks through optional wrappers', () { + final shape = SOptional( + const SMap({ + 'users': SList(SMap({'name': SString()})), + }), + ); + // `.users | filter(.missing)` on an optional-outer map: the + // walk should see users is a list of maps with only `name`. + final report = explain(_parse('.users | filter(.missing)'), shape); + final emptyFilter = + report.warnings + .where((w) => w.kind == WarningKind.emptyFilter) + .toList(); + expect(emptyFilter, hasLength(1)); + expect(emptyFilter.first.message, contains('.missing')); + }); + + test('optional bool predicate is not provably-empty', () { + // An optional bool is "bool or absent" — not provably non-boolean. + // The empty-filter check should NOT fire. + final listShape = SList(SMap({'active': SOptional(const SBool())})); + final report = explain(_parse('. | filter(.active)'), listShape); + final emptyFilter = report.warnings.where( + (w) => w.kind == WarningKind.emptyFilter, + ); + expect(emptyFilter, isEmpty); + }); + + test('root optional map rejects TOML (MustBeMap does NOT unwrap)', () { + // Root optional means "might be absent entirely" — TOML cannot + // serialize that without a materialization step. + final shape = SOptional(const SMap({'a': SNum()})); + final report = explain(_parse('.'), shape); + expect(report.notWritableAs, contains(OutputFormat.toml)); + }); + + test('nested optionality collapses through factory', () { + // Verified via the factory, but also check that inference never + // produces stacked optionals via field-through-optional. + final shape = SOptional(SMap({'nested': SOptional(const SNum())})); + final report = explain(_parse('.nested'), shape); + // Two optional steps (outer map, inner field) should collapse + // to a single SOptional. 
+ expect(report.stages.last.shape, SOptional(const SNum())); + }); + + test('shapeToJson round-trips optional', () { + final shape = SOptional(const SNum()); + expect(shapeToJson(shape), { + 'kind': 'optional', + 'inner': {'kind': 'number'}, + }); + }); + }); } diff --git a/test/shape_test.dart b/test/shape_test.dart index 7186a51..53fa8af 100644 --- a/test/shape_test.dart +++ b/test/shape_test.dart @@ -210,4 +210,41 @@ void main() { expect(json, contains('"kind":"number"')); }); }); + + group('SOptional: the optionality wrapper', () { + test('renders as optional', () { + expect(renderShape(SOptional(const SNum())), 'optional'); + expect( + renderShape(SOptional(const SList(SString()))), + 'optional>', + ); + }); + + test('serializes to {kind: optional, inner: ...}', () { + expect(shapeToJson(SOptional(const SNum())), { + 'kind': 'optional', + 'inner': {'kind': 'number'}, + }); + }); + + test('equality compares inner shapes', () { + expect(SOptional(const SNum()) == SOptional(const SNum()), isTrue); + expect(SOptional(const SNum()) == SOptional(const SString()), isFalse); + expect(SOptional(const SNum()) == const SNum(), isFalse); + }); + + test('nested optional collapses (factory unwraps)', () { + // SOptional(SOptional(x)) is semantically identical to + // SOptional(x); the factory enforces this. 
+ final nested = SOptional(SOptional(const SNum())); + expect(nested, equals(SOptional(const SNum()))); + expect(renderShape(nested), 'optional'); + }); + + test('lives inside other shapes too', () { + final shape = SMap({'age': SOptional(const SNum())}); + expect(renderShape(shape), 'map>'); + expect(shape.fields['age'], isA()); + }); + }); } From 48b3780ad4395e65fb46412900b3d6808bc019c6 Mon Sep 17 00:00:00 2001 From: Hakim Jonas Ghoula Date: Sun, 3 May 2026 00:01:26 +0200 Subject: [PATCH 09/67] Track A step 2: JSON Schema subset parser MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add parseJsonSchema(String): Shape in lib/src/schema/parser.dart. Walks the JsonValue output of rumil_parsers' parseJson, mapping four keywords onto the shape ADT: - type: string selects the target kind - properties: nested field schemas for "object" - items: element schema for "array" - required: which properties stay concrete vs become SOptional Rejected keywords (23 total) each produce a targeted error with a JSON path pointing at the site. Rejections cover value-level constraints (minimum/maximum/pattern/enum/format/minLength/maxLength /minItems/maxItems/uniqueItems/const/multipleOf), structural combinators (allOf/oneOf/anyOf/not), conditionals (if/then/else /dependencies/dependentRequired/dependentSchemas), references ($ref/$defs/definitions), and extra object constraints (additionalProperties/patternProperties/propertyNames). Unknown keywords are tolerated per JSON Schema's extensibility convention — $schema, $id, title, description all pass through as ignored metadata. Error diagnostics carry a JSON path ($.properties.a.properties.b) so users can find the offending nested schema without scanning the whole file. Tests (41 new): - 5 scalar types round-trip (null, bool, number, integer→number, string). - 3 array variants (no items, scalar items, object items). 
- 5 object + required combinations (empty, all required, no required, partial required, nested object with own required). - 18 rejection tests (one per keyword class). - 2 metadata-tolerance tests. - 7 error-diagnostic tests (invalid JSON, non-object root, missing type, unsupported type, properties type error, required type error, nested error with path). - 2 realistic round-trip scenarios (user record, list of records). Exported parseJsonSchema from package:lambe/lambe.dart. Quality gates: dart analyze clean, 1379 tests pass (was 1338, +41), dart format clean, pana 160/160. Step 2 of 9 in doc/schema-design.md's implementation plan. Next: loader (file IO + sibling auto-detect) and mergeSchemaWithData (disagreement-is-error semantics). --- lib/lambe.dart | 1 + lib/src/schema/parser.dart | 202 +++++++++++++++++++++ test/schema_parser_test.dart | 339 +++++++++++++++++++++++++++++++++++ 3 files changed, 542 insertions(+) create mode 100644 lib/src/schema/parser.dart create mode 100644 test/schema_parser_test.dart diff --git a/lib/lambe.dart b/lib/lambe.dart index c1987c7..c95c520 100644 --- a/lib/lambe.dart +++ b/lib/lambe.dart @@ -32,6 +32,7 @@ export 'src/input.dart' export 'src/mcp_payload.dart' show renderMcpShapeErrorPayload; export 'src/output.dart' show OutputFormat, CellPolicy, formatOutput, inferSchema; +export 'src/schema/parser.dart' show parseJsonSchema; export 'src/shape/shape.dart' show Shape, diff --git a/lib/src/schema/parser.dart b/lib/src/schema/parser.dart new file mode 100644 index 0000000..60ba332 --- /dev/null +++ b/lib/src/schema/parser.dart @@ -0,0 +1,202 @@ +/// Parser for a JSON Schema subset that maps to Lambe [Shape]. +/// +/// Accepts four keywords: +/// - `type` (string): `"null"`, `"boolean"`, `"number"`, `"integer"`, +/// `"string"`, `"array"`, or `"object"`. +/// - `properties` (object, only meaningful when `type` is `"object"`): +/// field name → nested schema. 
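Illustration of the keyword mapping described above, as a minimal self-contained sketch rather than the code in this patch. It assumes plain strings in place of the Shape ADT and uses dart:convert instead of rumil_parsers; the `describe` helper and its rendering are hypothetical names chosen for this example only.

```dart
import 'dart:convert';

// Hypothetical stand-in for the parser's type mapping. Shapes are rendered
// as plain strings here instead of the Shape ADT used by the patch.
String describe(Map<String, dynamic> schema) {
  final type = schema['type'];
  switch (type) {
    case 'null':
    case 'boolean':
    case 'string':
      return type as String;
    case 'number':
    case 'integer':
      // "integer" folds into "number": no int/double distinction.
      return 'number';
    case 'array':
      final items = schema['items'];
      return items == null
          ? 'list<any>'
          : 'list<${describe(items as Map<String, dynamic>)}>';
    case 'object':
      final props = (schema['properties'] as Map<String, dynamic>?) ??
          const <String, dynamic>{};
      final required =
          ((schema['required'] as List?) ?? const <dynamic>[]).toSet();
      final fields = props.entries.map((e) {
        final inner = describe(e.value as Map<String, dynamic>);
        // Properties not listed in `required` become optional.
        return required.contains(e.key)
            ? '${e.key}: $inner'
            : '${e.key}: optional<$inner>';
      });
      return 'map<${fields.join(', ')}>';
    default:
      throw FormatException('unsupported type "$type"');
  }
}

void main() {
  final schema = jsonDecode(
    '{"type": "object", "properties": {"name": {"type": "string"}, '
    '"age": {"type": "integer"}}, "required": ["name"]}',
  ) as Map<String, dynamic>;
  print(describe(schema));
}
```

In this sketch `name` stays concrete because it is listed in `required`, while `age` is wrapped as optional and its `integer` type is reported as `number`.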
+/// - `items` (schema, only meaningful when `type` is `"array"`): the +/// element schema. +/// - `required` (array of strings, only meaningful when `type` is +/// `"object"`): listed properties are required; others become +/// [SOptional]. +/// +/// Rejects structural combinators, value-level constraints, and +/// references with a clear per-keyword error. Unknown keywords are +/// ignored (JSON Schema's extensibility convention for metadata like +/// `description` or `title`). +library; + +import 'package:rumil/rumil.dart'; +import 'package:rumil_parsers/rumil_parsers.dart'; + +import '../errors.dart'; +import '../shape/shape.dart'; + +/// Parse a JSON Schema subset [source] into a [Shape]. +/// +/// Throws [QueryError] on JSON parse error, on unsupported schema +/// features, or on schemas that do not describe a shape. +Shape parseJsonSchema(String source) { + final parseResult = parseJson(source); + final json = switch (parseResult) { + Success(:final value) => value, + Partial(:final value) => value, + Failure(:final errors) => + throw QueryError( + 'schema: invalid JSON (${errors.firstOrNull?.toString() ?? "parse failed"})', + ), + }; + return _schema(json, path: r'$'); +} + +Shape _schema(JsonValue node, {required String path}) { + if (node is! JsonObject) { + throw QueryError( + 'schema at $path: expected a JSON object describing a shape, ' + 'got ${_kindOf(node)}', + ); + } + _rejectUnsupportedKeywords(node, path: path); + + final typeValue = node.fields['type']; + if (typeValue == null) { + throw QueryError( + 'schema at $path: missing "type" keyword. A schema must declare ' + 'a type such as "null", "boolean", "number", "string", "array", ' + 'or "object".', + ); + } + if (typeValue is! 
JsonString) { + throw QueryError( + 'schema at $path: "type" must be a string, got ${_kindOf(typeValue)}', + ); + } + + switch (typeValue.value) { + case 'null': + return const SNull(); + case 'boolean': + return const SBool(); + case 'number': + case 'integer': + return const SNum(); + case 'string': + return const SString(); + case 'array': + return _array(node, path: path); + case 'object': + return _object(node, path: path); + default: + throw QueryError( + 'schema at $path: unsupported type "${typeValue.value}". ' + 'Supported: null, boolean, number, integer, string, array, object.', + ); + } +} + +Shape _array(JsonObject node, {required String path}) { + final items = node.fields['items']; + if (items == null) return const SList(SAny()); + return SList(_schema(items, path: '$path.items')); +} + +Shape _object(JsonObject node, {required String path}) { + final props = node.fields['properties']; + final required = _requiredList(node.fields['required'], path: path); + + if (props == null) return const SMap({}); + if (props is! JsonObject) { + throw QueryError( + 'schema at $path: "properties" must be a JSON object, ' + 'got ${_kindOf(props)}', + ); + } + + final fields = <String, Shape>{}; + for (final MapEntry(:key, :value) in props.fields.entries) { + final inner = _schema(value, path: '$path.properties.$key'); + fields[key] = required.contains(key) ? inner : SOptional(inner); + } + return SMap(fields); +} + +Set<String> _requiredList(JsonValue? node, {required String path}) { + if (node == null) { + // No `required`: JSON Schema's default is "no properties are + // required." Every property becomes SOptional. + return const <String>{}; + } + if (node is! JsonArray) { + throw QueryError( + 'schema at $path: "required" must be an array of strings, ' + 'got ${_kindOf(node)}', + ); + } + final names = <String>{}; + for (var i = 0; i < node.elements.length; i++) { + final el = node.elements[i]; + if (el is! 
JsonString) { + throw QueryError( + 'schema at $path: "required[$i]" must be a string, ' + 'got ${_kindOf(el)}', + ); + } + names.add(el.value); + } + return names; +} + +/// Keywords that are part of JSON Schema but have no mapping to +/// Lambe's shape system. Each is rejected with a targeted error so the +/// user sees exactly which feature is unsupported. +const _rejectedKeywords = { + // Value-level constraints — out of scope. Lambe is a shape system, + // not a validator. + 'minimum': 'value-level constraints are not supported', + 'maximum': 'value-level constraints are not supported', + 'exclusiveMinimum': 'value-level constraints are not supported', + 'exclusiveMaximum': 'value-level constraints are not supported', + 'multipleOf': 'value-level constraints are not supported', + 'minLength': 'value-level constraints are not supported', + 'maxLength': 'value-level constraints are not supported', + 'pattern': 'value-level constraints are not supported', + 'format': 'value-level constraints are not supported', + 'minItems': 'value-level constraints are not supported', + 'maxItems': 'value-level constraints are not supported', + 'uniqueItems': 'value-level constraints are not supported', + 'minProperties': 'value-level constraints are not supported', + 'maxProperties': 'value-level constraints are not supported', + 'const': 'value-level constraints are not supported', + 'enum': 'value-level constraints are not supported', + // Structural combinators — out of scope. Lambe's shape ADT is + // unions-free by design. + 'allOf': 'structural combinators are not supported', + 'oneOf': 'structural combinators are not supported', + 'anyOf': 'structural combinators are not supported', + 'not': 'structural combinators are not supported', + // Conditionals — would require a constraint solver, not a shape + // system. 
+ 'if': 'conditional schemas are not supported', + 'then': 'conditional schemas are not supported', + 'else': 'conditional schemas are not supported', + 'dependencies': 'conditional schemas are not supported', + 'dependentRequired': 'conditional schemas are not supported', + 'dependentSchemas': 'conditional schemas are not supported', + // References — schemas are single-file in 0.9.0. + '\$ref': 'schema references (\$ref) are not supported', + '\$defs': 'schema references (\$ref) are not supported', + 'definitions': 'schema references (\$ref) are not supported', + // Extra object constraints — out of scope. + 'additionalProperties': 'additionalProperties is not supported', + 'patternProperties': 'patternProperties is not supported', + 'propertyNames': 'propertyNames is not supported', +}; + +void _rejectUnsupportedKeywords(JsonObject node, {required String path}) { + for (final key in node.fields.keys) { + final reason = _rejectedKeywords[key]; + if (reason != null) { + throw QueryError('schema at $path: "$key" is unsupported — $reason.'); + } + } +} + +String _kindOf(JsonValue v) => switch (v) { + JsonNull() => 'null', + JsonBool() => 'bool', + JsonNumber() => 'number', + JsonString() => 'string', + JsonArray() => 'array', + JsonObject() => 'object', +}; diff --git a/test/schema_parser_test.dart b/test/schema_parser_test.dart new file mode 100644 index 0000000..cb3ee8f --- /dev/null +++ b/test/schema_parser_test.dart @@ -0,0 +1,339 @@ +/// Tests for the JSON Schema subset parser. +/// +/// The contract: +/// 1. All seven scalar and container shapes in the Lambe ADT +/// round-trip through `type` plus the appropriate subkey. +/// 2. `required` drives the optionality of properties: listed keys +/// stay required, unlisted keys become [SOptional]. +/// 3. Rejected JSON Schema keywords produce targeted errors; +/// unknown metadata keywords are ignored. +/// 4. Errors include a JSON-path hint pointing at the site. 
+library; + +import 'package:lambe/lambe.dart'; +import 'package:test/test.dart'; + +void main() { + group('parseJsonSchema: scalars', () { + test('null', () { + expect(parseJsonSchema('{"type": "null"}'), const SNull()); + }); + test('boolean', () { + expect(parseJsonSchema('{"type": "boolean"}'), const SBool()); + }); + test('number', () { + expect(parseJsonSchema('{"type": "number"}'), const SNum()); + }); + test('integer maps to number (Lambe has no int/double distinction)', () { + expect(parseJsonSchema('{"type": "integer"}'), const SNum()); + }); + test('string', () { + expect(parseJsonSchema('{"type": "string"}'), const SString()); + }); + }); + + group('parseJsonSchema: arrays', () { + test('array without items defaults to list', () { + expect(parseJsonSchema('{"type": "array"}'), const SList(SAny())); + }); + + test('array with scalar items', () { + expect( + parseJsonSchema('{"type": "array", "items": {"type": "string"}}'), + const SList(SString()), + ); + }); + + test('array of objects', () { + const schema = + '{"type": "array", "items": {"type": "object", ' + '"properties": {"x": {"type": "number"}}, "required": ["x"]}}'; + expect(parseJsonSchema(schema), const SList(SMap({'x': SNum()}))); + }); + }); + + group('parseJsonSchema: objects and required', () { + test('empty object', () { + expect( + parseJsonSchema('{"type": "object"}'), + const SMap({}), + ); + }); + + test('all properties required when listed in required', () { + const schema = + '{"type": "object", "properties": ' + '{"a": {"type": "number"}, "b": {"type": "string"}}, ' + '"required": ["a", "b"]}'; + expect( + parseJsonSchema(schema), + const SMap({'a': SNum(), 'b': SString()}), + ); + }); + + test('absent required means all properties are SOptional', () { + const schema = + '{"type": "object", "properties": ' + '{"a": {"type": "number"}, "b": {"type": "string"}}}'; + final shape = parseJsonSchema(schema) as SMap; + expect(shape.fields['a'], isA()); + expect((shape.fields['a']! 
as SOptional).inner, const SNum()); + expect(shape.fields['b'], isA<SOptional>()); + expect((shape.fields['b']! as SOptional).inner, const SString()); + }); + + test('partial required: unlisted become SOptional', () { + const schema = + '{"type": "object", "properties": ' + '{"a": {"type": "number"}, "b": {"type": "string"}}, ' + '"required": ["a"]}'; + final shape = parseJsonSchema(schema) as SMap; + expect(shape.fields['a'], const SNum()); + expect(shape.fields['b'], isA<SOptional>()); + }); + + test('nested object with its own required list', () { + const schema = ''' + { + "type": "object", + "properties": { + "user": { + "type": "object", + "properties": { + "name": {"type": "string"}, + "age": {"type": "number"} + }, + "required": ["name"] + } + }, + "required": ["user"] + } + '''; + final shape = parseJsonSchema(schema) as SMap; + final user = shape.fields['user']! as SMap; + expect(user.fields['name'], const SString()); + expect(user.fields['age'], isA<SOptional>()); + }); + }); + + group('parseJsonSchema: rejected keywords', () { + final rejections = { + // Value-level constraints + '"minimum"': '{"type": "number", "minimum": 0}', + '"maximum"': '{"type": "number", "maximum": 100}', + '"pattern"': '{"type": "string", "pattern": "^[a-z]+\$"}', + '"enum"': '{"type": "string", "enum": ["a", "b"]}', + '"format"': '{"type": "string", "format": "email"}', + '"minLength"': '{"type": "string", "minLength": 1}', + // Structural combinators + '"allOf"': '{"allOf": [{"type": "object"}]}', + '"oneOf"': '{"oneOf": [{"type": "string"}, {"type": "number"}]}', + '"anyOf"': '{"anyOf": [{"type": "string"}]}', + '"not"': '{"not": {"type": "null"}}', + // Conditionals + '"if"': '{"type": "object", "if": {"type": "object"}}', + '"dependencies"': '{"type": "object", "dependencies": {"a": ["b"]}}', + // References + '"\$ref"': '{"\$ref": "#/definitions/foo"}', + '"\$defs"': '{"\$defs": {"foo": {"type": "string"}}}', + 'definitions': + '{"type": "object", "definitions": {"x": {"type": "string"}}}', + // Extra 
object constraints + '"additionalProperties"': + '{"type": "object", "additionalProperties": false}', + '"patternProperties"': + '{"type": "object", "patternProperties": {".*": {"type": "string"}}}', + }; + + for (final entry in rejections.entries) { + test('rejects ${entry.key}', () { + expect( + () => parseJsonSchema(entry.value), + throwsA( + isA().having( + (e) => e.message, + 'message', + // The rejection message names the keyword without quotes. + contains(entry.key.replaceAll('"', '')), + ), + ), + ); + }); + } + }); + + group('parseJsonSchema: ignored metadata', () { + test('description and title are tolerated', () { + const schema = ''' + { + "type": "object", + "title": "User", + "description": "A user record", + "properties": {"name": {"type": "string", "description": "Full name"}}, + "required": ["name"] + } + '''; + expect(parseJsonSchema(schema), const SMap({'name': SString()})); + }); + + test('\$schema and \$id at root are ignored', () { + const schema = ''' + { + "\$schema": "http://json-schema.org/draft-07/schema", + "\$id": "https://example.com/schemas/x", + "type": "string" + } + '''; + expect(parseJsonSchema(schema), const SString()); + }); + }); + + group('parseJsonSchema: error diagnostics', () { + test('invalid JSON surfaces a JSON parse error', () { + expect( + () => parseJsonSchema('{not valid'), + throwsA( + isA().having( + (e) => e.message, + 'message', + contains('invalid JSON'), + ), + ), + ); + }); + + test('non-object root is rejected', () { + expect( + () => parseJsonSchema('42'), + throwsA( + isA().having( + (e) => e.message, + 'message', + contains('expected a JSON object'), + ), + ), + ); + }); + + test('missing type is rejected with a clear message', () { + expect( + () => parseJsonSchema('{"properties": {}}'), + throwsA( + isA().having( + (e) => e.message, + 'message', + contains('missing "type"'), + ), + ), + ); + }); + + test('unsupported type value is rejected', () { + expect( + () => parseJsonSchema('{"type": "color"}'), + 
throwsA( + isA().having( + (e) => e.message, + 'message', + contains('unsupported type "color"'), + ), + ), + ); + }); + + test('properties must be an object', () { + expect( + () => parseJsonSchema('{"type": "object", "properties": [1, 2]}'), + throwsA( + isA().having( + (e) => e.message, + 'message', + contains('"properties" must be'), + ), + ), + ); + }); + + test('required must be an array of strings', () { + expect( + () => parseJsonSchema( + '{"type": "object", "properties": {"a": {"type": "number"}}, ' + '"required": "a"}', + ), + throwsA( + isA().having( + (e) => e.message, + 'message', + contains('"required" must be'), + ), + ), + ); + }); + + test('nested error includes the JSON path to the offender', () { + expect( + () => parseJsonSchema( + '{"type": "object", "properties": ' + '{"a": {"type": "object", "properties": ' + '{"b": {"type": "nonsense"}}}}, "required": ["a"]}', + ), + throwsA( + isA().having( + (e) => e.message, + 'message', + allOf( + contains('.properties.a.properties.b'), + contains('unsupported type "nonsense"'), + ), + ), + ), + ); + }); + }); + + group('parseJsonSchema: full round-trip scenarios', () { + test('realistic user record', () { + const schema = ''' + { + "type": "object", + "properties": { + "name": {"type": "string"}, + "age": {"type": "number"}, + "active": {"type": "boolean"}, + "tags": {"type": "array", "items": {"type": "string"}} + }, + "required": ["name", "age"] + } + '''; + final shape = parseJsonSchema(schema) as SMap; + expect(shape.fields['name'], const SString()); + expect(shape.fields['age'], const SNum()); + expect(shape.fields['active'], isA()); + expect(shape.fields['tags'], isA()); + expect( + (shape.fields['tags']! 
as SOptional).inner, + const SList(SString()), + ); + }); + + test('list of records (API response shape)', () { + const schema = ''' + { + "type": "array", + "items": { + "type": "object", + "properties": { + "id": {"type": "string"}, + "count": {"type": "integer"} + }, + "required": ["id", "count"] + } + } + '''; + expect( + parseJsonSchema(schema), + const SList(SMap({'id': SString(), 'count': SNum()})), + ); + }); + }); +} From 6a93b9ac66fe74a248c1183b30c58e079311bbbe Mon Sep 17 00:00:00 2001 From: Hakim Jonas Ghoula Date: Sun, 3 May 2026 00:06:48 +0200 Subject: [PATCH 10/67] Track A step 3: schema loader with sibling auto-detect + merge Add lib/src/schema/loader.dart with three functions: loadSchemaFromFile(path) Reads and parses a schema file. QueryError on missing file or parser rejection. loadSchemaForData({explicitSchemaPath, dataPath}) Explicit path wins. Otherwise auto-detects a .schema.json sibling. Returns null when neither exists. Handles extension rewriting (data.json -> data.schema.json, events.ndjson -> events.schema.json). mergeSchemaWithData(schema, data) Schema-augments-data merge per doc/schema-design.md: - SAny on either side: the other wins. - SOptional + present data: strip optional, merge inners (field is concretely there for this run). - SOptional + absent data: keep optional (field may be absent in other runs). - SOptional + null data: keep optional (Lambe-style null propagation: null ~ absent). - Schema-only fields: preserved. - Data-only fields: preserved (schema is a partial description). - Lists and maps recurse. - Concrete-type disagreement at any path: QueryError naming path ($.user.age, $[*]). Path format uses JSON Path-ish notation: $ for root, .field for map descent, [*] for list element. Same rule throughout: agreement passes, schema fills in gaps, data fills in extras, concrete disagreement is an error. Keeps --explain honest in the schema-agrees-with-data case and loud in the schema-contradicts-data case. 
Null-data policy The stance "schema optional + data null keeps optional" is a deliberate choice: JSON Schema users commonly use null for absent fields, and Lambe's null-propagation semantics treat null similarly to absent. Being strict here would produce friction with real-world JSON Schemas. Documented in the test that pins this behavior. Tests (+25) - 3 loadSchemaFromFile tests (success, missing file, parser error propagation). - 5 sibling auto-detect tests (no sibling, with sibling, .ndjson extension, explicit beats sibling, explicit only). - 3 agreement tests (equal scalars, SAny on either side, both SAny). - 5 disagreement tests (scalar vs scalar with path, map vs non-map, list vs non-list, nested path, list element path). - 4 SOptional handling tests (present strips, absent keeps, null keeps, disagreement on inner). - 5 augmentation tests (schema-only field, data-only field, empty list uses schema element, non-empty merges element, recursive merge). Exported loadSchemaFromFile, loadSchemaForData, mergeSchemaWithData from package:lambe/lambe.dart. Quality gates: dart analyze clean, 1404 tests pass (was 1379, +25), dart format clean, pana 160/160. Step 3 of 9 in doc/schema-design.md. Next: CLI wiring with the --schema rename. 
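Illustration of the merge rules listed above, as a minimal self-contained sketch rather than the loader code in this patch. It assumes a small hypothetical shape hierarchy (Any, Nul, Scalar, Opt, Lst, Mp) standing in for the real Shape ADT, and throws StateError where the patch throws QueryError.

```dart
// Hypothetical miniature of the shape ADT, for illustration only.
sealed class S {
  const S();
}

class Any extends S {
  const Any();
}

class Nul extends S {
  const Nul();
}

class Scalar extends S {
  final String kind;
  const Scalar(this.kind);
}

class Opt extends S {
  final S inner;
  const Opt(this.inner);
}

class Lst extends S {
  final S element;
  const Lst(this.element);
}

class Mp extends S {
  final Map<String, S> fields;
  const Mp(this.fields);
}

S merge(S schema, S data, [String path = r'$']) {
  // Any on either side: the other side wins.
  if (schema is Any) return data;
  if (data is Any) return schema;
  // Optional + null data keeps the optional; optional + present data
  // strips it and merges the inner shapes.
  if (schema is Opt) {
    return data is Nul ? schema : merge(schema.inner, data, path);
  }
  if (schema is Lst && data is Lst) {
    return Lst(merge(schema.element, data.element, '$path[*]'));
  }
  if (schema is Mp && data is Mp) {
    final out = <String, S>{};
    for (final e in schema.fields.entries) {
      final d = data.fields[e.key];
      // Schema-only fields are preserved unchanged.
      out[e.key] = d == null ? e.value : merge(e.value, d, '$path.${e.key}');
    }
    for (final e in data.fields.entries) {
      // Data-only fields are preserved as well.
      out.putIfAbsent(e.key, () => e.value);
    }
    return Mp(out);
  }
  if (schema is Nul && data is Nul) return schema;
  if (schema is Scalar && data is Scalar && schema.kind == data.kind) {
    return schema;
  }
  // Concrete disagreement is an error that names the path.
  throw StateError('schema and data disagree at $path');
}

void main() {
  final schema = Mp({
    'age': Opt(Scalar('number')),
    'tags': Lst(Scalar('string')),
  });
  final data = Mp({
    'age': Scalar('number'),
    'tags': Lst(Any()), // stands in for an empty list in the data
    'extra': Nul(),
  });
  final merged = merge(schema, data) as Mp;
  print(merged.fields.keys.toList());

  try {
    merge(Mp({'age': Scalar('number')}), Mp({'age': Scalar('string')}));
  } on StateError catch (e) {
    print(e.message);
  }
}
```

In this sketch the present `age` loses its optional wrapper, the element type of `tags` comes from the schema, `extra` survives as a data-only field, and the second merge fails with a message ending in `$.age`.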
--- lib/lambe.dart | 2 + lib/src/schema/loader.dart | 151 +++++++++++++++++ test/schema_loader_test.dart | 309 +++++++++++++++++++++++++++++++++++ 3 files changed, 462 insertions(+) create mode 100644 lib/src/schema/loader.dart create mode 100644 test/schema_loader_test.dart diff --git a/lib/lambe.dart b/lib/lambe.dart index c95c520..ae4b393 100644 --- a/lib/lambe.dart +++ b/lib/lambe.dart @@ -32,6 +32,8 @@ export 'src/input.dart' export 'src/mcp_payload.dart' show renderMcpShapeErrorPayload; export 'src/output.dart' show OutputFormat, CellPolicy, formatOutput, inferSchema; +export 'src/schema/loader.dart' + show loadSchemaFromFile, loadSchemaForData, mergeSchemaWithData; export 'src/schema/parser.dart' show parseJsonSchema; export 'src/shape/shape.dart' show diff --git a/lib/src/schema/loader.dart b/lib/src/schema/loader.dart new file mode 100644 index 0000000..5264454 --- /dev/null +++ b/lib/src/schema/loader.dart @@ -0,0 +1,151 @@ +/// Load and merge schemas for the `--schema` entry point. +/// +/// [loadSchemaFromFile] reads a schema file (JSON) and returns the +/// parsed [Shape], after a JSON-Schema-looking sanity check. +/// [loadSchemaForData] adds sibling auto-detection: given a data file +/// path, it looks for `.schema.json` next to it and loads +/// that when present. +/// +/// [mergeSchemaWithData] combines a user-declared schema with the +/// shape inferred from actual data. See `doc/schema-design.md` section +/// on "Disagreement semantics" for the rules; the short version is +/// "schema augments, never contradicts" — agreements pass, schema +/// fills in what data can't express (empty-list elements, optional +/// fields), concrete-type disagreements error at load time. +library; + +import 'dart:io'; + +import '../errors.dart'; +import '../shape/shape.dart'; +import 'parser.dart'; + +/// Load a schema from a file path, parsing it as a JSON Schema subset. 
+/// +/// Throws [QueryError] if the file is missing or unreadable, or if +/// the schema parser rejects the content. +Shape loadSchemaFromFile(String path) { + final file = File(path); + if (!file.existsSync()) { + throw QueryError('schema file not found: $path'); + } + final source = file.readAsStringSync(); + return parseJsonSchema(source); +} + +/// Load a schema for [dataPath], preferring [explicitSchemaPath] when +/// provided and falling back to a `.schema.json` sibling. +/// +/// Returns `null` when no explicit path is given and no sibling +/// exists. Throws [QueryError] for explicit paths that fail to load. +Shape? loadSchemaForData({String? explicitSchemaPath, String? dataPath}) { + if (explicitSchemaPath != null) { + return loadSchemaFromFile(explicitSchemaPath); + } + if (dataPath != null) { + final sibling = _siblingSchemaPath(dataPath); + if (sibling != null && File(sibling).existsSync()) { + return loadSchemaFromFile(sibling); + } + } + return null; +} + +/// Compute the sibling schema path for [dataPath]. +/// +/// Strips the data file's extension and appends `.schema.json`: +/// `data.json` → `data.schema.json`, `events.ndjson` → `events.schema.json`. +/// Returns `null` for paths without a recognizable extension. +String? _siblingSchemaPath(String dataPath) { + final lastDot = dataPath.lastIndexOf('.'); + if (lastDot < 0) return null; + final base = dataPath.substring(0, lastDot); + return '$base.schema.json'; +} + +/// Merge a schema-declared [schema] shape with a data-inferred [data] +/// shape. Schema augments data: +/// +/// - Both agree on a concrete type: that type. +/// - Either side is [SAny]: use the other side. +/// - Schema-only field: keep the schema's shape (possibly optional). +/// - Data-only field: use the data's shape. +/// - Schema optional + data present: strip optional, use the merged +/// inner shape (the field is definitely there for this run). 
+/// - List elements merge recursively; empty-data lists take the +/// schema's element. +/// +/// Throws [QueryError] with a JSON-path when schema and data disagree +/// on a concrete type. Error path is rooted at `$` (the whole value). +Shape mergeSchemaWithData(Shape schema, Shape data) => + _merge(schema, data, r'$'); + +Shape _merge(Shape schema, Shape data, String path) { + // SAny: the other side wins. When both are SAny, the first check + // returns data, which is also SAny. + if (schema is SAny) return data; + if (data is SAny) return schema; + + // Optional handling. Schema-side optional: if data has the value, + // strip the optional and merge inners; if data is null, keep the + // schema's optional as-is (field may still be absent at other call + // sites, though this particular data has null for it). + if (schema is SOptional) { + if (data is SNull) return schema; + return _merge(schema.inner, data, path); + } + if (data is SOptional) { + // Data is never an SOptional from shapeOf (shapeOf has no + // optionality signal). Included for defensive symmetry. + return _merge(schema, data.inner, path); + } + + if (schema is SList) { + if (data is! SList) { + throw _disagree(path, schema, data); + } + return SList(_merge(schema.element, data.element, '$path[*]')); + } + + if (schema is SMap) { + if (data is! SMap) { + throw _disagree(path, schema, data); + } + return _mergeMaps(schema, data, path); + } + + // Scalar shapes: must match, or disagree. + if (schema.runtimeType == data.runtimeType) { + return schema; + } + throw _disagree(path, schema, data); +} + +Shape _mergeMaps(SMap schema, SMap data, String path) { + final merged = <String, Shape>{}; + + // Schema fields: merge with data if present, keep as-is otherwise. 
+ for (final MapEntry(:key, value: schemaField) in schema.fields.entries) { + final dataField = data.fields[key]; + if (dataField == null) { + merged[key] = schemaField; + continue; + } + merged[key] = _merge(schemaField, dataField, '$path.$key'); + } + + // Data-only fields: pass through unchanged. Schema is a partial + // description by design; extras are fine. + for (final MapEntry(:key, value: dataField) in data.fields.entries) { + if (!schema.fields.containsKey(key)) { + merged[key] = dataField; + } + } + + return SMap(merged); +} + +QueryError _disagree(String path, Shape schema, Shape data) => QueryError( + 'schema disagreement at $path: schema says ${renderShape(schema)}, ' + 'data is ${renderShape(data)}', +); diff --git a/test/schema_loader_test.dart b/test/schema_loader_test.dart new file mode 100644 index 0000000..ac1b34f --- /dev/null +++ b/test/schema_loader_test.dart @@ -0,0 +1,309 @@ +/// Tests for the schema loader: file IO, sibling auto-detect, and +/// [mergeSchemaWithData] semantics. +/// +/// Merge rules under test: +/// 1. Agreement: both sides concrete and equal → that type. +/// 2. SAny on either side: the other side wins. +/// 3. Schema-only fields: preserved. +/// 4. Data-only fields: preserved. +/// 5. Schema optional + data present: strip optional. +/// 6. Schema optional + data null: keep optional (Lambe-style +/// null propagation: null means absent-ish). +/// 7. Disagreement on concrete types: QueryError with a path. +/// 8. Recursion through lists and maps. 
+library; + +import 'dart:io'; + +import 'package:lambe/lambe.dart'; +import 'package:test/test.dart'; + +void main() { + group('loadSchemaFromFile', () { + late Directory tmp; + + setUp(() { + tmp = Directory.systemTemp.createTempSync('lambe_schema_loader_'); + }); + + tearDown(() { + if (tmp.existsSync()) tmp.deleteSync(recursive: true); + }); + + test('reads a valid schema file', () { + final path = '${tmp.path}/s.json'; + File(path).writeAsStringSync('{"type": "string"}'); + expect(loadSchemaFromFile(path), const SString()); + }); + + test('throws on missing file with a clear message', () { + expect( + () => loadSchemaFromFile('${tmp.path}/nope.json'), + throwsA( + isA<QueryError>().having( + (e) => e.message, + 'message', + contains('schema file not found'), + ), + ), + ); + }); + + test('propagates parser errors on malformed content', () { + final path = '${tmp.path}/bad.json'; + File(path).writeAsStringSync('{"type": "nonsense"}'); + expect( + () => loadSchemaFromFile(path), + throwsA( + isA<QueryError>().having( + (e) => e.message, + 'message', + contains('unsupported type "nonsense"'), + ), + ), + ); + }); + }); + + group('loadSchemaForData: sibling auto-detect', () { + late Directory tmp; + + setUp(() { + tmp = Directory.systemTemp.createTempSync('lambe_schema_sibling_'); + }); + + tearDown(() { + if (tmp.existsSync()) tmp.deleteSync(recursive: true); + }); + + test('returns null when no explicit path and no sibling exists', () { + final dataPath = '${tmp.path}/data.json'; + File(dataPath).writeAsStringSync('{}'); + expect(loadSchemaForData(dataPath: dataPath), isNull); + }); + + test('finds sibling .schema.json next to data file', () { + final dataPath = '${tmp.path}/users.json'; + File(dataPath).writeAsStringSync('[]'); + File( + '${tmp.path}/users.schema.json', + ).writeAsStringSync('{"type": "array"}'); + expect(loadSchemaForData(dataPath: dataPath), const SList(SAny())); + }); + + test('sibling works for .ndjson extension too', () { + final dataPath = 
'${tmp.path}/events.ndjson'; + File(dataPath).writeAsStringSync('{}\n'); + File( + '${tmp.path}/events.schema.json', + ).writeAsStringSync('{"type": "object"}'); + expect( + loadSchemaForData(dataPath: dataPath), + const SMap({}), + ); + }); + + test('explicit path beats sibling', () { + final dataPath = '${tmp.path}/data.json'; + File(dataPath).writeAsStringSync('{}'); + // Sibling says number. + File( + '${tmp.path}/data.schema.json', + ).writeAsStringSync('{"type": "number"}'); + // Explicit says string. + final explicit = '${tmp.path}/explicit.json'; + File(explicit).writeAsStringSync('{"type": "string"}'); + expect( + loadSchemaForData(explicitSchemaPath: explicit, dataPath: dataPath), + const SString(), + ); + }); + + test('explicit path without data path still works', () { + final explicit = '${tmp.path}/only.json'; + File(explicit).writeAsStringSync('{"type": "boolean"}'); + expect(loadSchemaForData(explicitSchemaPath: explicit), const SBool()); + }); + }); + + group('mergeSchemaWithData: agreement and SAny', () { + test('equal concrete scalars pass through', () { + expect(mergeSchemaWithData(const SNum(), const SNum()), const SNum()); + }); + + test('SAny on either side yields the other', () { + expect(mergeSchemaWithData(const SAny(), const SNum()), const SNum()); + expect( + mergeSchemaWithData(const SString(), const SAny()), + const SString(), + ); + }); + + test('both SAny collapses to SAny', () { + expect(mergeSchemaWithData(const SAny(), const SAny()), const SAny()); + }); + }); + + group('mergeSchemaWithData: disagreement errors', () { + test('scalar vs scalar disagreement raises with path', () { + expect( + () => mergeSchemaWithData(const SNum(), const SString()), + throwsA( + isA<QueryError>().having( + (e) => e.message, + 'message', + allOf( + contains('disagreement'), + contains(r'$'), + contains('number'), + contains('string'), + ), + ), + ), + ); + }); + + test('schema map + data non-map raises', () { + expect( + () => mergeSchemaWithData(const SMap({'a': 
SNum()}), const SNum()), + throwsA(isA<QueryError>()), + ); + }); + + test('schema list + data non-list raises', () { + expect( + () => mergeSchemaWithData(const SList(SNum()), const SString()), + throwsA(isA<QueryError>()), + ); + }); + + test('nested disagreement carries the nested path', () { + expect( + () => mergeSchemaWithData( + const SMap({ + 'user': SMap({'age': SNum()}), + }), + const SMap({ + 'user': SMap({'age': SString()}), + }), + ), + throwsA( + isA<QueryError>().having( + (e) => e.message, + 'message', + contains(r'$.user.age'), + ), + ), + ); + }); + + test('list element disagreement carries [*] path', () { + expect( + () => mergeSchemaWithData(const SList(SNum()), const SList(SString())), + throwsA( + isA<QueryError>().having( + (e) => e.message, + 'message', + contains(r'$[*]'), + ), + ), + ); + }); + }); + + group('mergeSchemaWithData: SOptional handling', () { + test('schema optional + data present strips optional', () { + // Schema says field is optional; data has it present. + // Merged should be the concrete inner shape. + final merged = mergeSchemaWithData( + SMap({'age': SOptional(const SNum())}), + const SMap({'age': SNum()}), + ); + expect(merged, const SMap({'age': SNum()})); + }); + + test('schema optional + data absent keeps optional', () { + // Data has no `age` field. Schema wins. + final schema = SMap({'age': SOptional(const SNum())}); + const data = SMap({}); + expect(mergeSchemaWithData(schema, data), schema); + }); + + test('schema optional + data null keeps optional ' + '(Lambe null-propagation stance)', () { + final schema = SMap({'age': SOptional(const SNum())}); + const data = SMap({'age': SNull()}); + final merged = mergeSchemaWithData(schema, data) as SMap; + expect(merged.fields['age'], isA<SOptional>()); + }); + + test('optional inner still checks for disagreement', () { + // Schema says optional<number>, data has string. String is not + // number-or-absent, so error. 
+ expect( + () => mergeSchemaWithData( + SMap({'age': SOptional(const SNum())}), + const SMap({'age': SString()}), + ), + throwsA(isA<QueryError>()), + ); + }); + }); + + group('mergeSchemaWithData: augmentation', () { + test('schema-only field is preserved', () { + final schema = SMap({ + 'name': const SString(), + 'age': SOptional(const SNum()), + }); + const data = SMap({'name': SString()}); + final merged = mergeSchemaWithData(schema, data) as SMap; + expect(merged.fields['name'], const SString()); + expect(merged.fields['age'], isA<SOptional>()); + }); + + test('data-only field is preserved', () { + const schema = SMap({'name': SString()}); + const data = SMap({'name': SString(), 'extra': SBool()}); + final merged = mergeSchemaWithData(schema, data) as SMap; + expect(merged.fields['name'], const SString()); + expect(merged.fields['extra'], const SBool()); + }); + + test('empty data list + schema with typed items uses schema element', () { + // shapeOf([]) == SList(SAny()). Schema says list<string>. + // Merge should yield list<string>. + const schema = SList(SString()); + const data = SList(SAny()); + expect(mergeSchemaWithData(schema, data), const SList(SString())); + }); + + test('non-empty data list passes through schema element merge', () { + // Both sides know the element; they agree. + const schema = SList(SNum()); + const data = SList(SNum()); + expect(mergeSchemaWithData(schema, data), const SList(SNum())); + }); + + test('recursive merge across nested lists and maps', () { + final schema = SMap({ + 'users': SList( + SMap({ + 'name': const SString(), + 'tags': SOptional(const SList(SString())), + }), + ), + }); + const data = SMap({ + 'users': SList(SMap({'name': SString(), 'active': SBool()})), + }); + final merged = mergeSchemaWithData(schema, data) as SMap; + final users = (merged.fields['users']! 
as SList).element as SMap; + expect(users.fields['name'], const SString()); + expect(users.fields['active'], const SBool()); + // Schema-declared optional field missing from data stays + // optional. + expect(users.fields['tags'], isA<SOptional>()); + }); + }); +} From ebb146016bd83478b4a302e2af16cf05dbb79185 Mon Sep 17 00:00:00 2001 From: Hakim Jonas Ghoula Date: Sun, 3 May 2026 00:25:35 +0200 Subject: [PATCH 11/67] Track A step 4: JSON Schema renderer with round-trip invariant MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add renderJsonSchema(Shape, {pretty}): String in lib/src/schema/renderer.dart. Walks the shape ADT and emits a JSON Schema subset document that parseJsonSchema accepts. Main decisions: SOptional handling SOptional inside SMap becomes a non-required property: the inner shape goes into `properties`, and the field name is omitted from `required`. This is JSON Schema's standard way to express "this field may be absent," and it's the only position where Lambe can round-trip optionality. SOptional elsewhere (top-level, inside SList, etc.) has no standard JSON Schema spelling in our subset. Renderer flattens to the inner shape; it's a one-way drop for these positions. The round-trip is preserved for every shape the parser can produce, which is the only invariant we promise. SAny handling Renders as the empty object {}. Parser treats an empty object as SAny (the "empty schema accepts anything" JSON Schema convention). Round-trip preserved. Added to parser: an empty object with no `type` is now SAny instead of a "missing type" error. Pretty vs compact Default `pretty: true` emits 2-space-indented JSON for human reading (print-shape output). `pretty: false` for embedding in other JSON payloads (future MCP responses). Round-trip invariant parseJsonSchema(renderJsonSchema(s)) == s for every shape the parser can emit. 
12 representative cases pin this in the test file, plus three complex-shape tests (a map mixing required and optional fields, an optional field in a nested list, four-deep nested maps). Tests (+32) - 5 scalar renderings. - 5 container renderings (list with items, list of any, map all required, map no required, empty map). - 1 mixed-required round-trip. - 3 SOptional positions (top, inside list, inside map). - 3 pretty/compact checks. - 12 explicit round-trip cases covering every parser-reachable shape plus 3 complex scenarios. Exported renderJsonSchema from package:lambe/lambe.dart. Quality gates: dart analyze clean, 1436 tests pass (was 1404, +32), dart format clean, pana 160/160. Step 4 of 9. Next: CLI wiring: rename --schema to --print-shape, add --schema option, thread through evaluation and explain. --- lib/lambe.dart | 1 + lib/src/schema/parser.dart | 3 + lib/src/schema/renderer.dart | 71 ++++++++++++ test/schema_renderer_test.dart | 197 +++++++++++++++++++++++++++++++++ 4 files changed, 272 insertions(+) create mode 100644 lib/src/schema/renderer.dart create mode 100644 test/schema_renderer_test.dart diff --git a/lib/lambe.dart b/lib/lambe.dart index ae4b393..e733bc9 100644 --- a/lib/lambe.dart +++ b/lib/lambe.dart @@ -35,6 +35,7 @@ export 'src/output.dart' export 'src/schema/loader.dart' show loadSchemaFromFile, loadSchemaForData, mergeSchemaWithData; export 'src/schema/parser.dart' show parseJsonSchema; +export 'src/schema/renderer.dart' show renderJsonSchema; export 'src/shape/shape.dart' show Shape, diff --git a/lib/src/schema/parser.dart b/lib/src/schema/parser.dart index 60ba332..7628083 100644 --- a/lib/src/schema/parser.dart +++ b/lib/src/schema/parser.dart @@ -51,6 +51,9 @@ Shape _schema(JsonValue node, {required String path}) { final typeValue = node.fields['type']; if (typeValue == null) { + // Empty-object convention: {} accepts any value. Round-trips + // with [renderJsonSchema] on SAny. 
+ if (node.fields.isEmpty) return const SAny(); throw QueryError( 'schema at $path: missing "type" keyword. A schema must declare ' 'a type such as "null", "boolean", "number", "string", "array", ' diff --git a/lib/src/schema/renderer.dart b/lib/src/schema/renderer.dart new file mode 100644 index 0000000..fef4504 --- /dev/null +++ b/lib/src/schema/renderer.dart @@ -0,0 +1,71 @@ +/// Render a [Shape] as a JSON Schema subset document. +/// +/// Output is the input format [parseJsonSchema] accepts: a JSON +/// object with `type`, and (for containers) `properties`/`required` +/// or `items`. [SOptional] inside an [SMap] becomes a missing entry +/// in `required`; [SOptional] at other positions is flattened (there +/// is no standard JSON Schema representation for a nullable/optional +/// non-field position — the inner shape is rendered). +/// +/// Round-trip guarantee: for any `Shape` produced by +/// [parseJsonSchema], `parseJsonSchema(renderJsonSchema(s)) == s`. +library; + +import 'dart:convert'; + +import '../shape/shape.dart'; + +/// Render [shape] as a pretty-printed JSON Schema string. +/// +/// Pretty-prints with 2-space indent by default. For a compact form +/// suitable for embedding in another JSON payload (e.g. an MCP tool +/// response), pass `pretty: false`. +String renderJsonSchema(Shape shape, {bool pretty = true}) { + final payload = _encode(shape); + final encoder = + pretty ? const JsonEncoder.withIndent(' ') : const JsonEncoder(); + return encoder.convert(payload); +} + +Map _encode(Shape shape) { + // Top-level SOptional has no standard JSON Schema spelling in our + // subset. Flatten: a user who called renderJsonSchema on an + // SOptional gets the JSON Schema for T. This is the same + // behavior as `renderShape` which shows `optional` but + // parseJsonSchema has no way to re-parse that syntax. + final concrete = shape is SOptional ? shape.inner : shape; + return switch (concrete) { + // JSON Schema convention: {} accepts any value. 
The parser + // mirrors this by treating an empty object (no `type`) as SAny, + // so the round-trip holds. + SAny() => const {}, + SNull() => {'type': 'null'}, + SBool() => {'type': 'boolean'}, + SNum() => {'type': 'number'}, + SString() => {'type': 'string'}, + SList(:final element) => {'type': 'array', 'items': _encode(element)}, + SMap(:final fields) => _encodeMap(fields), + // Unreachable: SOptional was unwrapped above. Present for + // exhaustive-switch conformance. + SOptional() => throw StateError('unreachable: SOptional unwrapped above'), + }; +} + +Map _encodeMap(Map fields) { + final properties = {}; + final required = []; + for (final MapEntry(:key, :value) in fields.entries) { + if (value is SOptional) { + properties[key] = _encode(value.inner); + } else { + properties[key] = _encode(value); + required.add(key); + } + } + final result = { + 'type': 'object', + if (properties.isNotEmpty) 'properties': properties, + if (required.isNotEmpty) 'required': required, + }; + return result; +} diff --git a/test/schema_renderer_test.dart b/test/schema_renderer_test.dart new file mode 100644 index 0000000..ff3c4d5 --- /dev/null +++ b/test/schema_renderer_test.dart @@ -0,0 +1,197 @@ +/// Tests for [renderJsonSchema] and its round-trip with +/// [parseJsonSchema]. +/// +/// Invariants pinned here: +/// 1. Every shape the parser can produce renders to valid JSON +/// Schema the parser re-accepts. Round-trip: parse(render(s)) == s +/// for every shape reachable through [parseJsonSchema]. +/// 2. Optional fields inside an [SMap] render as missing entries in +/// `required`, not as a modification to the property's shape. +/// 3. [SAny] renders as the empty object `{}`, which parses back to +/// [SAny] via the "empty object means any" convention. +/// 4. Pretty vs compact output both parse to the same shape. 
+library; + +import 'package:lambe/lambe.dart'; +import 'package:test/test.dart'; + +void main() { + group('renderJsonSchema: scalars', () { + test('SNull renders as {type: null}', () { + expect(renderJsonSchema(const SNull()), contains('"type": "null"')); + }); + test('SBool renders as {type: boolean}', () { + expect(renderJsonSchema(const SBool()), contains('"type": "boolean"')); + }); + test('SNum renders as {type: number}', () { + expect(renderJsonSchema(const SNum()), contains('"type": "number"')); + }); + test('SString renders as {type: string}', () { + expect(renderJsonSchema(const SString()), contains('"type": "string"')); + }); + test('SAny renders as empty object', () { + expect(renderJsonSchema(const SAny()).trim(), '{}'); + }); + }); + + group('renderJsonSchema: containers', () { + test('SList with typed items', () { + final out = renderJsonSchema(const SList(SString())); + expect(out, contains('"type": "array"')); + expect(out, contains('"items"')); + expect(out, contains('"type": "string"')); + }); + + test('SList renders items as empty object', () { + final out = renderJsonSchema(const SList(SAny())); + expect(out, contains('"type": "array"')); + // The items field is present but its value is {}. 
+ expect(out, contains('"items":')); + }); + + test('SMap with all required fields lists all in required', () { + final out = renderJsonSchema(const SMap({'a': SNum(), 'b': SString()})); + expect(out, contains('"type": "object"')); + expect(out, contains('"properties"')); + expect(out, contains('"required"')); + expect(out, contains('"a"')); + expect(out, contains('"b"')); + }); + + test('SMap with only optional fields omits required', () { + final shape = SMap({ + 'a': SOptional(const SNum()), + 'b': SOptional(const SString()), + }); + final out = renderJsonSchema(shape); + expect(out, contains('"properties"')); + expect(out, isNot(contains('"required"'))); + }); + + test('SMap with mix: only required fields appear in required list', () { + final shape = SMap({ + 'name': const SString(), + 'age': SOptional(const SNum()), + }); + // Round-trip-verify (shape-level) rather than string-match the + // list contents. + final reparsed = parseJsonSchema(renderJsonSchema(shape)) as SMap; + expect(reparsed.fields['name'], const SString()); + expect(reparsed.fields['age'], isA<SOptional>()); + }); + + test('empty SMap omits both properties and required', () { + final out = renderJsonSchema(const SMap({})); + expect(out, contains('"type": "object"')); + expect(out, isNot(contains('"properties"'))); + expect(out, isNot(contains('"required"'))); + }); + }); + + group('renderJsonSchema: SOptional handling', () { + test('SOptional at top level flattens to the inner shape', () { + // There is no JSON Schema idiom in our subset for a top-level + // optional; renderer flattens to the inner shape. + final out = renderJsonSchema(SOptional(const SNum())); + expect(out, contains('"type": "number"')); + }); + + test('SOptional inside SList flattens on the inner element', () { + // Same reasoning: a list whose element is "optional T" has no + // standard JSON Schema spelling in our subset, so we render as + // list<T>. This is a lossy edge case called out in design doc. 
+ final out = renderJsonSchema(SList(SOptional(const SString()))); + expect(out, contains('"type": "array"')); + expect(out, contains('"type": "string"')); + }); + + test('SOptional inside SMap becomes non-required property', () { + // This is the principal case. Round-trip with parser to verify. + final shape = SMap({'x': SOptional(const SNum())}); + final reparsed = parseJsonSchema(renderJsonSchema(shape)) as SMap; + expect(reparsed.fields['x'], isA<SOptional>()); + }); + }); + + group('renderJsonSchema: compact vs pretty', () { + test('pretty output has whitespace/newlines', () { + final pretty = renderJsonSchema(const SString()); + expect(pretty, contains('\n')); + }); + + test('compact output has no newlines', () { + final compact = renderJsonSchema(const SString(), pretty: false); + expect(compact, isNot(contains('\n'))); + }); + + test('pretty and compact parse back to the same shape', () { + const shape = SList(SMap({'name': SString()})); + final pretty = renderJsonSchema(shape); + final compact = renderJsonSchema(shape, pretty: false); + expect(parseJsonSchema(pretty), parseJsonSchema(compact)); + }); + }); + + group('renderJsonSchema: round-trip with parser', () { + // parse(render(s)) == s for every shape the parser can emit. 
+ final roundTripCases = { + 'null': const SNull(), + 'bool': const SBool(), + 'number': const SNum(), + 'string': const SString(), + 'any (empty-object)': const SAny(), + 'list of numbers': const SList(SNum()), + 'list of strings': const SList(SString()), + 'list of any': const SList(SAny()), + 'empty map': const SMap({}), + 'map of scalars all required': const SMap({'a': SNum(), 'b': SString()}), + 'list of maps': const SList(SMap({'id': SString(), 'n': SNum()})), + 'nested maps': const SMap({ + 'user': SMap({'name': SString(), 'age': SNum()}), + }), + }; + + for (final entry in roundTripCases.entries) { + test('round-trip: ${entry.key}', () { + final rendered = renderJsonSchema(entry.value); + final reparsed = parseJsonSchema(rendered); + expect(reparsed, entry.value); + }); + } + + test('round-trip: map with one required and one optional field', () { + final shape = SMap({ + 'name': const SString(), + 'age': SOptional(const SNum()), + }); + final rendered = renderJsonSchema(shape); + final reparsed = parseJsonSchema(rendered); + expect(reparsed, shape); + }); + + test('round-trip: list of maps with optional field in element', () { + final shape = SList( + SMap({ + 'id': const SString(), + 'tags': SOptional(const SList(SString())), + }), + ); + final rendered = renderJsonSchema(shape); + final reparsed = parseJsonSchema(rendered); + expect(reparsed, shape); + }); + + test('deeply nested round-trip', () { + final shape = SMap({ + 'a': SMap({ + 'b': SMap({ + 'c': SList(SMap({'d': SOptional(const SNum())})), + }), + }), + }); + final rendered = renderJsonSchema(shape); + final reparsed = parseJsonSchema(rendered); + expect(reparsed, shape); + }); + }); +} From a5da7d69460a7f6be80eb2cc2d367bea6b4af516 Mon Sep 17 00:00:00 2001 From: Hakim Jonas Ghoula Date: Sun, 3 May 2026 00:38:59 +0200 Subject: [PATCH 12/67] Track A step 5: CLI rewiring with --schema and --print-shape MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit First 
user-visible breaking change in 0.9.0: rename the existing --schema flag to --print-shape, add a new --schema option that takes a JSON Schema file. --schema New option on `lam`. Threads the declared shape through both --explain inference (via mergeSchemaWithData) and normal evaluation (validation-as-side-effect — a concrete-type disagreement between schema and data errors at load time). Auto-detection: if --schema is omitted and a sibling .schema.json exists, it's used implicitly. Same convention as 0.9.0's .ndjson auto-detect. --print-shape Replaces the 0.8.0 --schema flag. Emits the inferred shape as a JSON Schema subset document, round-trippable with --schema input. Output format is now JSON Schema (second breaking change): 0.8.0's type-name-string JSON is replaced with the canonical schema form so that `lam --print-shape data.json > data.schema.json` followed by `lam --schema data.schema.json ...` round-trips cleanly. Mode combination guards --print-shape + --schema is rejected: --print-shape prints the inferred shape from data, which a schema would only second-guess. --ndjson + --schema is rejected (added to existing ndjson guards). --ndjson + --print-shape is rejected. Help text updates documented in doc/lam.1.md; regenerated doc/lam.1. CLI flow (when --schema is active): --explain path: shape = mergeSchemaWithData(schema, shapeOf(data)) (or just schema when data is absent); fed to explain() as inputShape. Normal eval: mergeSchemaWithData is invoked purely for its side-effect validation (throws on disagreement). Evaluation runs on raw data as usual. --print-shape: schema is rejected (see above). Smoke-tested end to end with: * --print-shape emits JSON Schema (verified by eye). * --explain with sibling .schema.json auto-loads, surfaces SOptional from the `required` semantics, shows it in the shape trace ("list>>"). * --schema api.json '.' 
response.json where schema says age:string but data has age:number errors cleanly with "schema disagreement at $[*].age: schema says string, data is number" and exits 1. Existing legacy inferSchema function stays referenced in REPL and MCP (updated in steps 6 and 7). Quality gates: dart analyze clean, 1436 tests pass (no changes; no new tests yet — step 8 adds CLI integration coverage), dart format clean, pana 160/160, manpage round-trip matches. Step 5 of 9. Next: REPL integration. --- bin/lam.dart | 82 ++++++++++++++++++++++++++++++++++++++++++++-------- doc/lam.1 | 11 ++++--- doc/lam.1.md | 11 ++++--- 3 files changed, 84 insertions(+), 20 deletions(-) diff --git a/bin/lam.dart b/bin/lam.dart index bf0269e..a7932cb 100644 --- a/bin/lam.dart +++ b/bin/lam.dart @@ -43,9 +43,18 @@ void main(List arguments) { allowed: ['refuse', 'json'], defaultsTo: 'refuse', ) - ..addFlag( + ..addOption( 'schema', - help: 'Show data structure without values', + help: + 'Path to a JSON Schema subset file. Threads the declared ' + 'shape through inference and explain. If omitted, a ' + 'sibling .schema.json is used when present.', + ) + ..addFlag( + 'print-shape', + help: + 'Print the inferred shape of the data as a JSON Schema. ' + 'Renames the 0.8.0 --schema flag with the same meaning.', negatable: false, ) ..addFlag( @@ -103,8 +112,9 @@ void main(List arguments) { return; } - // --schema mode: no expression needed, just file - final isSchemaMode = args.flag('schema'); + // --print-shape mode: no expression needed, just file. 
+ final isPrintShapeMode = args.flag('print-shape'); + final schemaPath = args.option('schema'); final isAssertMode = args.flag('assert'); final isInteractive = args.flag('interactive'); // --explain-trivial and --explain-json imply --explain, so enable @@ -115,7 +125,7 @@ void main(List arguments) { var isNdjsonMode = args.flag('ndjson'); final rest = args.rest; - if (rest.isEmpty && !isSchemaMode && !isInteractive) { + if (rest.isEmpty && !isPrintShapeMode && !isInteractive) { stderr.writeln('Error: missing query expression.'); stderr.writeln(); _usage(argParser); @@ -131,7 +141,7 @@ void main(List arguments) { final expression = rest.isNotEmpty ? rest[0] : '.'; final fileArgIndex = - (isSchemaMode || isInteractive) && rest.length == 1 ? 0 : 1; + (isPrintShapeMode || isInteractive) && rest.length == 1 ? 0 : 1; // Auto-enable ndjson mode when the file extension suggests it, even // without an explicit --ndjson flag. Consistent with the existing @@ -148,7 +158,11 @@ void main(List arguments) { stderr.writeln('Error: --ndjson cannot be combined with --interactive.'); exit(1); } - if (isSchemaMode) { + if (isPrintShapeMode) { + stderr.writeln('Error: --ndjson cannot be combined with --print-shape.'); + exit(1); + } + if (schemaPath != null) { stderr.writeln('Error: --ndjson cannot be combined with --schema.'); exit(1); } @@ -241,10 +255,17 @@ void main(List arguments) { return; } - // --schema mode: show structure and exit - if (isSchemaMode) { - final schema = inferSchema(data); - stdout.writeln(const JsonEncoder.withIndent(' ').convert(schema)); + // --print-shape mode: emit the inferred shape as JSON Schema. + if (isPrintShapeMode) { + if (schemaPath != null) { + stderr.writeln( + 'Error: --print-shape prints the inferred shape of the data; ' + '--schema has nothing to contribute.', + ); + exit(1); + } + final shape = data == null ? 
const SAny() : shapeOf(data); + stdout.writeln(renderJsonSchema(shape)); return; } @@ -257,7 +278,25 @@ void main(List arguments) { stderr.writeln('Error: ${e.message}'); exit(1); } - final inputShape = data == null ? const SAny() : shapeOf(data); + // Initial shape: schema when provided (explicit or auto-detected + // sibling), merged with shapeOf(data). Falls back to SAny / data + // shape when no schema is available. + final dataShape = data == null ? const SAny() : shapeOf(data); + final Shape inputShape; + try { + final schema = loadSchemaForData( + explicitSchemaPath: schemaPath, + dataPath: + data != null && rest.length > fileArgIndex + ? rest[fileArgIndex] + : null, + ); + inputShape = + schema == null ? dataShape : mergeSchemaWithData(schema, dataShape); + } on QueryError catch (e) { + stderr.writeln('Error: ${e.message}'); + exit(1); + } final cellPolicy = CellPolicy.values.byName(args.option('flatten-cells')!); final report = explain( ast, @@ -273,6 +312,25 @@ void main(List arguments) { return; } + // If a schema is in effect, validate it against the data before + // evaluating. mergeSchemaWithData throws on concrete-type + // disagreement; this gives structural validation as a side effect + // of --schema. + if (data != null) { + try { + final schema = loadSchemaForData( + explicitSchemaPath: schemaPath, + dataPath: rest.length > fileArgIndex ? rest[fileArgIndex] : null, + ); + if (schema != null) { + mergeSchemaWithData(schema, shapeOf(data)); + } + } on QueryError catch (e) { + stderr.writeln('Error: ${e.message}'); + exit(1); + } + } + // The parsed AST is retained so that, if serialization later hits an // OutputShapeError, a chosen remediation can be composed with it via // applyBridge without re-parsing. diff --git a/doc/lam.1 b/doc/lam.1 index f8ee63d..fb08bb3 100644 --- a/doc/lam.1 +++ b/doc/lam.1 @@ -36,8 +36,11 @@ Output format. One of: json, yaml, toml, csv, tsv, hcl. Default is json. 
\fB--flatten-cells\fR \fIPOLICY\fR CSV/TSV policy for non-scalar cells. \fBrefuse\fR (default) rejects list- or map-valued cells with a shape error. \fBjson\fR encodes them as JSON strings inline; the shape check correspondingly widens to accept any list at the root. Ignored for other output formats. .TP -\fB--schema\fR -Show the data structure with type names instead of values. +\fB--schema\fR \fIPATH\fR +Path to a JSON Schema subset file. Threads the declared shape through inference and \fB--explain\fR, validates data against the schema at load time (errors on concrete-type disagreement), and fills in shape details the sampled data doesn't cover (empty-list elements, optional fields). Auto-detected as a sibling \fB.schema.json\fR if omitted. Accepts \fBtype\fR, \fBproperties\fR, \fBitems\fR, and \fBrequired\fR; rejects structural combinators (allOf/oneOf/$ref) and value-level constraints (minimum/pattern/enum/etc) with a per-keyword error. +.TP +\fB--print-shape\fR +Print the inferred shape of the data as a JSON Schema subset document. Replaces the 0.8.0 \fB--schema\fR flag with the same meaning, renamed because \fB--schema\fR now takes a path value. .TP \fB--explain\fR Trace the shape of values flowing through each pipeline stage. Static analysis only; does not execute the query. Reports which output formats the final shape can be serialized as. Flags provably-empty filters and runtime-rejection mismatches. @@ -55,7 +58,7 @@ Evaluate the expression and exit with code 0 if the result is true, 1 if false. Start the interactive REPL. Requires a file argument. .TP \fB--ndjson\fR -Treat input as ndjson or jsonl: one JSON document per line, evaluated independently with no state shared between lines. Emits one compact JSON result per line on stdout. Auto-enabled when the file extension is \fB.ndjson\fR or \fB.jsonl\fR. Cannot combine with \fB--interactive\fR, \fB--schema\fR, \fB--assert\fR, or \fB--explain\fR. 
Output must be JSON (\fB--to json\fR or default); other \fB--to\fR values are refused. +Treat input as ndjson or jsonl: one JSON document per line, evaluated independently with no state shared between lines. Emits one compact JSON result per line on stdout. Auto-enabled when the file extension is \fB.ndjson\fR or \fB.jsonl\fR. Cannot combine with \fB--interactive\fR, \fB--schema\fR, \fB--print-shape\fR, \fB--assert\fR, or \fB--explain\fR. Output must be JSON (\fB--to json\fR or default); other \fB--to\fR values are refused. .TP \fB-h\fR, \fB--help\fR Show usage information. @@ -239,7 +242,7 @@ lam --to yaml '.config' data.json Schema inspection: .PP .nf -lam --schema deployment.yaml +lam --print-shape deployment.yaml .fi .PP Shape trace for a pipeline: diff --git a/doc/lam.1.md b/doc/lam.1.md index c051aa3..7410a07 100644 --- a/doc/lam.1.md +++ b/doc/lam.1.md @@ -44,8 +44,11 @@ If no file is given, reads from standard input. **--flatten-cells** *POLICY* : CSV/TSV policy for non-scalar cells. **refuse** (default) rejects list- or map-valued cells with a shape error. **json** encodes them as JSON strings inline; the shape check correspondingly widens to accept any list at the root. Ignored for other output formats. -**--schema** -: Show the data structure with type names instead of values. +**--schema** *PATH* +: Path to a JSON Schema subset file. Threads the declared shape through inference and **--explain**, validates data against the schema at load time (errors on concrete-type disagreement), and fills in shape details the sampled data doesn't cover (empty-list elements, optional fields). Auto-detected as a sibling **.schema.json** if omitted. Accepts **type**, **properties**, **items**, and **required**; rejects structural combinators (allOf/oneOf/$ref) and value-level constraints (minimum/pattern/enum/etc) with a per-keyword error. + +**--print-shape** +: Print the inferred shape of the data as a JSON Schema subset document. 
Replaces the 0.8.0 **--schema** flag with the same meaning, renamed because **--schema** now takes a path value. **--explain** : Trace the shape of values flowing through each pipeline stage. Static analysis only; does not execute the query. Reports which output formats the final shape can be serialized as. Flags provably-empty filters and runtime-rejection mismatches. @@ -63,7 +66,7 @@ If no file is given, reads from standard input. : Start the interactive REPL. Requires a file argument. **--ndjson** -: Treat input as ndjson or jsonl: one JSON document per line, evaluated independently with no state shared between lines. Emits one compact JSON result per line on stdout. Auto-enabled when the file extension is **.ndjson** or **.jsonl**. Cannot combine with **--interactive**, **--schema**, **--assert**, or **--explain**. Output must be JSON (**--to json** or default); other **--to** values are refused. +: Treat input as ndjson or jsonl: one JSON document per line, evaluated independently with no state shared between lines. Emits one compact JSON result per line on stdout. Auto-enabled when the file extension is **.ndjson** or **.jsonl**. Cannot combine with **--interactive**, **--schema**, **--print-shape**, **--assert**, or **--explain**. Output must be JSON (**--to json** or default); other **--to** values are refused. **-h**, **--help** : Show usage information. 
@@ -255,7 +258,7 @@ Format conversion: Schema inspection: - lam --schema deployment.yaml + lam --print-shape deployment.yaml Shape trace for a pipeline: From 5ec6c5ebdb90a6ed2386ea574f0d062c7ed70e02 Mon Sep 17 00:00:00 2001 From: Hakim Jonas Ghoula Date: Sun, 3 May 2026 00:48:07 +0200 Subject: [PATCH 13/67] Track A self-review fixes MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Self-review of steps 1-5 caught two honesty gaps: renderJsonSchema: lossy positions documented The round-trip invariant holds only for shapes parseJsonSchema can produce (SOptional inside SMap fields). Callers composing shapes outside that path — e.g., an inference result where SOptional lands at the root or inside a list — hit a silent flatten. The previous docstring said "no standard JSON Schema representation" which was true but terse; now explicit that optionality is **dropped** in those positions, so the user knows the output isn't lossless for arbitrary shapes. inferSchema: deprecated for 1.0 removal inferSchema emits type-names-as-strings (e.g. `{"age": "number"}`), a format that doesn't round-trip with any parser we ship. With renderJsonSchema as the canonical JSON Schema emitter and shapeOf for the Shape ADT, inferSchema is vestigial. Marked @Deprecated with a migration pointer to renderJsonSchema(shapeOf(value)). Removal scheduled for 1.0 per the "freeze the shape API" target. REPL and MCP callsites migrate in steps 6 and 7. Also verified via exploratory tests (not committed, cleanup-only): - SOptional(SOptional(x)) collapses at the factory level AND through _lookupField's recursion-then-factory-wrap, so stacked optionals cannot exist from inference. - mergeSchemaWithData never produces stacked optionals either: the data-side optional branch unwraps before merging inners. Other self-review findings deferred: - CLI guard matrix (7 mode-combo rejections) is accreting. 
Noted in project_lambe_cli_test_matrix memory as a post-4-tracks refactor. - Validation errors aren't structured like OutputShapeError. Deliberately not forcing them into that mold; they're a different class of problem (input validation vs output serialization). - CLI integration tests for --schema / --print-shape deferred to step 8. Quality gates: dart analyze clean, 1436 tests still pass, dart format clean, pana 160/160. --- lib/src/output.dart | 9 +++++++++ lib/src/schema/renderer.dart | 16 ++++++++++++++++ 2 files changed, 25 insertions(+) diff --git a/lib/src/output.dart b/lib/src/output.dart index 89bea63..502a920 100644 --- a/lib/src/output.dart +++ b/lib/src/output.dart @@ -55,6 +55,15 @@ String formatOutput( /// - `"hello"` → `"string"` /// - `[1, 2]` → `["number"]` (schema of first element) /// - `{a: 1}` → `{a: "number"}` +/// +/// Deprecated in 0.9.0, to be removed in 1.0. Use +/// `renderJsonSchema(shapeOf(value))` for the canonical JSON Schema +/// output that round-trips with `parseJsonSchema`, or `shapeOf(value)` +/// alone for the [Shape] ADT. +@Deprecated( + 'Use renderJsonSchema(shapeOf(value)) for JSON Schema output, or ' + 'shapeOf(value) for the Shape ADT. Scheduled for removal in 1.0.', +) Object? inferSchema(Object? value) { if (value == null) return 'null'; if (value is bool) return 'boolean'; diff --git a/lib/src/schema/renderer.dart b/lib/src/schema/renderer.dart index fef4504..d0667a3 100644 --- a/lib/src/schema/renderer.dart +++ b/lib/src/schema/renderer.dart @@ -20,6 +20,22 @@ import '../shape/shape.dart'; /// Pretty-prints with 2-space indent by default. For a compact form /// suitable for embedding in another JSON payload (e.g. an MCP tool /// response), pass `pretty: false`. +/// +/// ### Lossy positions +/// +/// [SOptional] inside [SMap] encodes faithfully (missing entry in +/// `required`) and round-trips through [parseJsonSchema]. 
+/// +/// [SOptional] anywhere else — at the root, inside a list's +/// `element`, or nested — is **flattened to its inner shape**. Our +/// JSON Schema subset has no idiom for "optional at this position," +/// so the optionality signal is dropped. Callers composing shapes +/// via inference (for example, a query result whose outermost shape +/// is [SOptional]) should be aware: the rendered schema does not +/// preserve the "may be absent" information. +/// +/// Shapes produced by [parseJsonSchema] only put [SOptional] inside +/// [SMap] fields, so the round-trip invariant holds for those. String renderJsonSchema(Shape shape, {bool pretty = true}) { final payload = _encode(shape); final encoder = From 4941e3754565fab85752014403644575bb3c8d5e Mon Sep 17 00:00:00 2001 From: Hakim Jonas Ghoula Date: Sun, 3 May 2026 01:00:35 +0200 Subject: [PATCH 14/67] Track A step 6: REPL schema integration Migrate REPL's :schema command from inferSchema-based output to the 0.9.0 schema infrastructure, and add :print-shape. Session state New `Shape? activeSchema` variable in runRepl. Loaded by :schema <path>, queried by :schema (no arg), used to validate future data loads. :schema [path] With a path: loads the schema via loadSchemaFromFile, stores it on the session. If data is currently loaded, runs mergeSchemaWithData on the fly as a structural validation check; reports "Schema loaded (agrees with current data)" or "Schema loaded, but disagrees with current data: <path>: ...". No path: prints the active schema via renderJsonSchema, or the no-schema-loaded message. :print-shape New command. Prints shapeOf(currentData) as JSON Schema. The REPL analog of the CLI --print-shape; replaces the old :schema (no arg) behavior. :load re-validates against the active schema When a schema is loaded and the user switches data via :load, runs mergeSchemaWithData again and warns on disagreement. Keeps the REPL session honest across data changes.
Completer Added flatten-cells and print-shape to the _replCommands list in completer.dart so tab completion on bare `:` offers the new commands alongside the old ones. 11 total now (was 9). :help updated to document both :schema forms and :print-shape. inferSchema callsite removed from REPL. The legacy function stays in lib/src/output.dart as @Deprecated; MCP migrates in step 7. Manual REPL verification (interactive, can't be automated without a TTY seam): * :print-shape emits JSON Schema for the data. * :schema <path> loads, reports agreement / disagreement vs data. * :schema (no arg) prints the active schema. * :load re-validates against the active schema. * :help lists the new commands. * Tab completion on bare `:` offers all 11 commands. Test update: completer_test.dart "all commands on bare colon" updated from expecting 9 to expecting 11, plus explicit checks for flatten-cells and print-shape. Quality gates: dart analyze clean, 1436 tests pass, dart format clean, pana 160/160. Step 6 of 9. Next: MCP server. --- lib/src/completer.dart | 2 ++ lib/src/repl.dart | 58 +++++++++++++++++++++++++++++++++++----- test/completer_test.dart | 4 ++- 3 files changed, 57 insertions(+), 7 deletions(-) diff --git a/lib/src/completer.dart b/lib/src/completer.dart index d72b889..d3ba4eb 100644 --- a/lib/src/completer.dart +++ b/lib/src/completer.dart @@ -40,10 +40,12 @@ final List pipelineOps = parser_.pipeOpNames; /// REPL command names, sorted alphabetically. const _replCommands = [ + 'flatten-cells', 'help', 'history', 'load', 'pretty', + 'print-shape', 'q', 'quit', 'raw', diff --git a/lib/src/repl.dart b/lib/src/repl.dart index bebed9a..ddeb951 100644 --- a/lib/src/repl.dart +++ b/lib/src/repl.dart @@ -32,6 +32,7 @@ void runRepl(Object? data, {OutputFormat format = OutputFormat.json}) { var pretty = true; var raw = false; var flattenCells = CellPolicy.refuse; + Shape? activeSchema; final history = _loadHistory(); final rl = ReadLine( @@ -52,12 +53,41 @@ void runRepl(Object?
data, {OutputFormat format = OutputFormat.json}) { final arg = parts.length > 1 ? parts.skip(1).join(' ') : null; switch (command) { + case 'schema' when arg != null: + try { + final schema = loadSchemaFromFile(arg); + activeSchema = schema; + // Validate against currently loaded data, if any. + if (currentData != null) { + try { + mergeSchemaWithData(schema, shapeOf(currentData)); + stdout.writeln('Schema loaded (agrees with current data).'); + } on QueryError catch (e) { + stdout.writeln( + 'Schema loaded, but disagrees with current data: ' + '${e.message}', + ); + } + } else { + stdout.writeln('Schema loaded.'); + } + } on QueryError catch (e) { + stderr.writeln('Error: ${e.message}'); + } + case 'schema': - stdout.writeln( - const JsonEncoder.withIndent( - ' ', - ).convert(inferSchema(currentData)), - ); + if (activeSchema == null) { + stdout.writeln('No schema loaded. Use :schema to load one.'); + } else { + stdout.writeln(renderJsonSchema(activeSchema)); + } + + case 'print-shape': + if (currentData == null) { + stderr.writeln('No data loaded. Use :load first.'); + } else { + stdout.writeln(renderJsonSchema(shapeOf(currentData))); + } case 'to' when arg != null: final fmt = @@ -100,6 +130,17 @@ void runRepl(Object? data, {OutputFormat format = OutputFormat.json}) { if (loaded != null) { currentData = loaded; stdout.writeln('Data loaded: ${_briefDescription(currentData)}'); + // Re-validate against the active schema, if any. + if (activeSchema != null) { + try { + mergeSchemaWithData(activeSchema, shapeOf(loaded)); + } on QueryError catch (e) { + stdout.writeln( + 'Warning: data disagrees with active schema: ' + '${e.message}', + ); + } + } } case 'load': @@ -416,7 +457,12 @@ Object? 
_loadFile(String path) { void _printHelp() { stdout.writeln('Commands:'); - stdout.writeln(' :schema Show data structure'); + stdout.writeln( + ' :schema [path] Load (with path) or show the active schema', + ); + stdout.writeln( + ' :print-shape Print the data\'s inferred shape as JSON Schema', + ); stdout.writeln( ' :to Set output format (json, yaml, toml, csv, tsv, hcl)', ); diff --git a/test/completer_test.dart b/test/completer_test.dart index 18c2512..b63a930 100644 --- a/test/completer_test.dart +++ b/test/completer_test.dart @@ -270,9 +270,11 @@ void main() { test('all commands on bare colon', () { final (:start, :end, :candidates) = complete(':', 1, null); - expect(candidates.length, 9); + expect(candidates.length, 11); expect(candidates, contains('help')); expect(candidates, contains('schema')); + expect(candidates, contains('print-shape')); + expect(candidates, contains('flatten-cells')); }); }); From 878abec20547d8daf6519d26c0912e9c4f31552d Mon Sep 17 00:00:00 2001 From: Hakim Jonas Ghoula Date: Sun, 3 May 2026 01:06:55 +0200 Subject: [PATCH 15/67] Track A step 7: MCP server schema integration Three MCP surface changes aligning with the CLI schema work. lambe_query: new schema parameter Optional inline JSON Schema string. When provided, data is parsed and validated against the schema before the query runs; a structural disagreement returns an error with the path. Agents wanting to fail-fast on unexpected shapes now have a first-class way to do it. Threaded through _handleQuery via parseJsonSchema + mergeSchemaWithData. lambe_schema renamed to lambe_print_shape Tool rename aligning with the CLI rename (--schema -> --print-shape). Output format changed from type-name-string JSON (e.g. `{"age": "number"}`) to canonical JSON Schema (e.g. `{"type": "object", "properties": {"age": {"type": "number"}}, ...}`). The new output round-trips with lambe_query's schema parameter, lambe_check, and the parseJsonSchema library function. 
This is a breaking change for agents that hardcoded the old tool name; the description calls it out explicitly. lambe_check: new tool Validates data against a JSON Schema subset without running a query. Returns `{"ok": true}` on agreement or `{"ok": false, "error": "..."}` with the disagreement path. Intended for API-contract checks, CI gates, and agents that want to verify fixtures before running queries. Server instructions updated The initial MCP instructions string now lists all four tools by name with one-line descriptions of when to use each. Helps agents pick the right tool without having to call tools/list. AGENTS.md updated Tool list in the top-level agent guide mirrors the new surface. Smoke-tested end-to-end via JSON-RPC: - tools/list returns [lambe_query, lambe_print_shape, lambe_check, lambe_assert]. - lambe_print_shape on a users object emits valid JSON Schema with required set from the data's concrete keys. - lambe_check with matching schema returns {"ok": true}. - lambe_check with mismatched schema returns {"ok": false, "error": "schema disagreement at $.age: ..."}. - lambe_query with a schema parameter that disagrees with data returns isError=true before running the query. inferSchema is no longer referenced from bin/mcp_server.dart. The legacy function remains in lib/src/output.dart marked @Deprecated; all repo callsites have now migrated. Quality gates: dart analyze clean, 1436 tests pass, dart format clean, pana 160/160. Step 7 of 9. Next: CLI integration tests for --schema / --print-shape. --- AGENTS.md | 2 +- bin/mcp_server.dart | 111 ++++++++++++++++++++++++++++++++++++++------ 2 files changed, 97 insertions(+), 16 deletions(-) diff --git a/AGENTS.md b/AGENTS.md index 7d40591..595137b 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -54,7 +54,7 @@ The `lambe_query` MCP tool is available for querying structured data. 
Connect wi lam-mcp # stdio transport ``` -Tools: `lambe_query` (extract/filter/transform), `lambe_schema` (structure inspection), `lambe_assert` (validation). +Tools: `lambe_query` (extract/filter/transform; optional `schema` parameter for structural validation before the query runs), `lambe_print_shape` (structure inspection — returns JSON Schema), `lambe_check` (validate data against a JSON Schema), `lambe_assert` (boolean assertion on a query). ### In Dart Code diff --git a/bin/mcp_server.dart b/bin/mcp_server.dart index 1696c42..6152e66 100644 --- a/bin/mcp_server.dart +++ b/bin/mcp_server.dart @@ -24,10 +24,12 @@ base class LambeServer extends MCPServer with ToolsSupport { implementation: Implementation(name: 'lambe', version: lambeVersion), instructions: 'Lambé is a multi-format query language for structured data. ' - 'Use the query tool to find, extract, filter, transform, or look up ' + 'Use lambe_query to find, extract, filter, transform, or look up ' 'values from JSON, YAML, TOML, HCL, CSV, TSV, or Markdown files. ' - 'Use the schema tool to understand data structure before querying. ' - 'Use the assert tool to validate or check conditions on data.\n\n' + 'Use lambe_print_shape to understand data structure before ' + 'querying (returns JSON Schema). ' + 'Use lambe_check to validate data against a JSON Schema. 
' + 'Use lambe_assert to validate or check conditions on data.\n\n' 'Common patterns:\n' ' .database.host — extract a value\n' ' .users | filter(.age > 30) | map(.name) — filter and project\n' @@ -73,7 +75,8 @@ base class LambeServer extends MCPServer with ToolsSupport { ' — code blocks for one language\n', ) { registerTool(_queryTool, _handleQuery); - registerTool(_schemaTool, _handleSchema); + registerTool(_printShapeTool, _handlePrintShape); + registerTool(_checkTool, _handleCheck); registerTool(_assertTool, _handleAssert); } @@ -176,6 +179,17 @@ base class LambeServer extends MCPServer with ToolsSupport { 'other output formats.', values: ['refuse', 'json'], ), + 'schema': Schema.string( + description: + 'Optional inline JSON Schema subset (as a string) ' + 'describing the expected shape of data. When provided, ' + 'the data is validated against the schema before the ' + 'query runs; a concrete-type disagreement returns an ' + 'error. Accepts type, properties, items, required. ' + 'Rejects structural combinators, value-level ' + 'constraints, references, and additionalProperties with ' + 'a per-keyword error.', + ), }, required: ['expression', 'data'], ), @@ -188,9 +202,19 @@ base class LambeServer extends MCPServer with ToolsSupport { final formatStr = args['format'] as String?; final outputFormatStr = args['output_format'] as String?; final flattenCellsStr = args['flatten_cells'] as String?; + final schemaStr = args['schema'] as String?; try { final format = formatStr != null ? Format.values.byName(formatStr) : null; + + // Validate data against schema first, if provided. A structural + // disagreement returns an error before the query runs. + if (schemaStr != null) { + final schema = parseJsonSchema(schemaStr); + final parsed = parseInput(data, format ?? 
sniffFormat(data)); + mergeSchemaWithData(schema, shapeOf(parsed)); + } + final result = queryString(expression, data, format: format); final outputFormat = outputFormatStr != null @@ -226,14 +250,17 @@ base class LambeServer extends MCPServer with ToolsSupport { // See `renderMcpShapeErrorPayload` in package:lambe/lambe.dart for // the payload shape this server emits on output-shape mismatches. - final _schemaTool = Tool( - name: 'lambe_schema', + final _printShapeTool = Tool( + name: 'lambe_print_shape', description: 'Use this tool to understand the structure of unfamiliar data before ' - 'writing queries. Returns type names (string, number, boolean, null) ' - 'instead of actual values. Use when the user says "show me the ' - 'structure", "what fields are in this", or "what does this data look ' - 'like".', + 'writing queries. Returns a JSON Schema subset document ' + '(type/properties/items/required) describing the inferred shape. Use ' + 'when the user says "show me the structure", "what fields are in ' + 'this", or "what does this data look like". The output round-trips ' + 'with the `schema` parameter on lambe_query and with lambe_check. ' + 'Renamed from the 0.8.0 lambe_schema tool; output format changed ' + 'from type-name strings to JSON Schema.', inputSchema: Schema.object( properties: { 'data': Schema.string( @@ -250,7 +277,7 @@ base class LambeServer extends MCPServer with ToolsSupport { ), ); - FutureOr<CallToolResult> _handleSchema(CallToolRequest request) { + FutureOr<CallToolResult> _handlePrintShape(CallToolRequest request) { final args = request.arguments!; final data = args['data'] as String; final formatStr = args['format'] as String?; @@ -258,11 +285,8 @@ base class LambeServer extends MCPServer with ToolsSupport { try { final format = formatStr != null ? Format.values.byName(formatStr) : null; final parsed = parseInput(data, format ??
sniffFormat(data)); - final schema = inferSchema(parsed); return CallToolResult( - content: [ - TextContent(text: const JsonEncoder.withIndent(' ').convert(schema)), - ], + content: [TextContent(text: renderJsonSchema(shapeOf(parsed)))], ); } on QueryError catch (e) { return CallToolResult( @@ -272,6 +296,63 @@ base class LambeServer extends MCPServer with ToolsSupport { } } + final _checkTool = Tool( + name: 'lambe_check', + description: + 'Validate data against a JSON Schema subset. Use this when the user ' + 'wants to verify that data matches an expected shape without ' + 'running a query — API response shape checks, CI contract ' + 'validation, "does this match the spec". Returns ' + '{"ok": true} on agreement, or ' + '{"ok": false, "error": "..."} naming the disagreement path. ' + 'Accepts the same JSON Schema subset as lambe_query\'s schema ' + 'parameter: type, properties, items, required. Structural ' + 'combinators, value-level constraints, and references are ' + 'rejected per-keyword.', + inputSchema: Schema.object( + properties: { + 'schema': Schema.string( + description: 'Inline JSON Schema subset as a string.', + ), + 'data': Schema.string( + description: + 'The input data as a string (JSON, YAML, TOML, HCL, CSV, TSV, ' + 'or Markdown).', + ), + 'format': UntitledSingleSelectEnumSchema( + description: 'Input format. Auto-detected if omitted.', + values: ['json', 'yaml', 'toml', 'hcl', 'csv', 'tsv', 'markdown'], + ), + }, + required: ['schema', 'data'], + ), + ); + + FutureOr<CallToolResult> _handleCheck(CallToolRequest request) { + final args = request.arguments!; + final schemaStr = args['schema'] as String; + final data = args['data'] as String; + final formatStr = args['format'] as String?; + + try { + final schema = parseJsonSchema(schemaStr); + final format = formatStr != null ? Format.values.byName(formatStr) : null; + final parsed = parseInput(data, format ??
sniffFormat(data)); + mergeSchemaWithData(schema, shapeOf(parsed)); + return CallToolResult(content: [TextContent(text: '{"ok": true}')]); + } on QueryError catch (e) { + return CallToolResult( + content: [ + TextContent( + text: const JsonEncoder.withIndent( + ' ', + ).convert({'ok': false, 'error': e.message}), + ), + ], + ); + } + } + final _assertTool = Tool( name: 'lambe_assert', description: From 6e360081a222f36b683a696682be37f1671eef62 Mon Sep 17 00:00:00 2001 From: Hakim Jonas Ghoula Date: Sun, 3 May 2026 01:12:19 +0200 Subject: [PATCH 16/67] Track A step 8: CLI integration tests for --schema and --print-shape MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Nine new end-to-end tests in test/cli_integration_test.dart pin the schema surface at the CLI layer. Each spawns `dart bin/lam.dart` and asserts on exit code, stdout, stderr. --print-shape (3 tests) 1. Emits valid JSON Schema for a typical object (parses as JSON, carries type/properties/required). 2. Round-trip: print-shape data.json > data.schema.json, then --schema data.schema.json '.' data.json succeeds. Proves the renderer + parser agree end-to-end via a real subprocess, closing the loop the library-level round-trip tests opened. 3. --print-shape + --schema is rejected (redundant combination). --schema (6 tests) 4. Explicit --schema threads into --explain inputShape; the shape trace surfaces schema-declared optional fields (email: optional) that don't exist in data. 5. Sibling .schema.json is auto-detected when --schema is omitted. Verifies the same schema information flows through. 6. Schema disagreement (data.age is number, schema says string) exits 1 with a path-annotated stderr message ("$.age", "string", "number" all present). 7. Schema parse error on rejected keyword (allOf) surfaces a clear diagnostic (contains "allOf" and "unsupported"). 8. Missing schema file exits 1 with "schema file not found". 9. --ndjson + --schema is rejected. 
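
The nine tests listed above all go through a `_runLam` helper that is defined elsewhere in test/cli_integration_test.dart and is not part of this diff. For readers without the file open, here is a minimal sketch of what such a helper could look like. The name `runLam`, the `entrypoint` parameter, and the use of `dart` from PATH are assumptions made for illustration, not the helper's actual definition.

```dart
import 'dart:io';

/// Hypothetical stand-in for the test file's private `_runLam` helper.
/// Spawns the CLI in a subprocess and returns (exitCode, stdout, stderr).
///
/// Assumption: the CLI is launched as `dart bin/lam.dart <args>` from the
/// package root. [entrypoint] is only a parameter so the sketch can be
/// pointed at a different script.
Future<(int, String, String)> runLam(
  List<String> args, {
  List<String> entrypoint = const ['bin/lam.dart'],
}) async {
  // Process.run waits for the child to exit and collects both streams;
  // with the default encodings, stdout and stderr are Strings.
  final result = await Process.run('dart', [...entrypoint, ...args]);
  return (
    result.exitCode,
    result.stdout as String,
    result.stderr as String,
  );
}
```

A test then destructures the record and asserts on each part, as in `final (code, out, err) = await runLam(['--print-shape', file.path]);`.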
These exercise the full wiring added in step 5 (CLI) plus the parser/loader/renderer library layer from steps 2-4. Library tests stay the foundation; integration tests here pin the glue. Quality gates: dart analyze clean, 1445 tests pass (was 1436, +9), dart format clean, pana 160/160. Step 8 of 9 in doc/schema-design.md. Next: docs polish — CHANGELOG 0.9.0 entry, README reframe, doc/schema.md user guide, man page examples. --- test/cli_integration_test.dart | 153 +++++++++++++++++++++++++++++++++ 1 file changed, 153 insertions(+) diff --git a/test/cli_integration_test.dart b/test/cli_integration_test.dart index 744b70c..935f858 100644 --- a/test/cli_integration_test.dart +++ b/test/cli_integration_test.dart @@ -454,4 +454,157 @@ void main() { expect(err, contains('--explain')); }); }); + + group('--print-shape: JSON Schema output', () { + test('emits valid JSON Schema for a typical object', () async { + final file = File('${tmp.path}/data.json') + ..writeAsStringSync('{"name":"alice","age":30}'); + final (code, out, _) = await _runLam(['--print-shape', file.path]); + expect(code, 0); + // Parse to prove it's valid JSON and has the documented shape. + final parsed = jsonDecode(out) as Map; + expect(parsed['type'], 'object'); + expect(parsed['properties'], isA>()); + expect(parsed['required'], containsAll(['name', 'age'])); + }); + + test('output is round-trippable through --schema input', () async { + // print-shape data.json > data.schema.json, then running + // --schema data.schema.json '.' data.json must succeed. 
+ final dataFile = File('${tmp.path}/data.json') + ..writeAsStringSync('{"a":1,"b":"x"}'); + final (code1, out, _) = await _runLam(['--print-shape', dataFile.path]); + expect(code1, 0); + + final schemaFile = File('${tmp.path}/regen.schema.json') + ..writeAsStringSync(out); + final (code2, _, err2) = await _runLam([ + '--schema', + schemaFile.path, + '.a', + dataFile.path, + ]); + expect( + code2, + 0, + reason: + 'print-shape -> schema round-trip should validate ' + 'cleanly; stderr was: $err2', + ); + }); + + test('rejects combination with --schema (redundant)', () async { + final data = File('${tmp.path}/d.json')..writeAsStringSync('{}'); + final schema = File('${tmp.path}/s.json') + ..writeAsStringSync('{"type":"object"}'); + final (code, _, err) = await _runLam([ + '--print-shape', + '--schema', + schema.path, + data.path, + ]); + expect(code, 1); + expect(err, contains('--print-shape')); + }); + }); + + group('--schema: input schema threading', () { + test('explicit --schema threads into --explain inputShape', () async { + final data = File('${tmp.path}/data.json') + ..writeAsStringSync('{"users":[{"name":"alice","age":30}]}'); + // Schema declares `email` as optional on users. + final schema = File('${tmp.path}/s.json')..writeAsStringSync( + '{"type":"object","properties":{"users":{"type":"array","items":' + '{"type":"object","properties":{"name":{"type":"string"},' + '"age":{"type":"number"},"email":{"type":"string"}},' + '"required":["name","age"]}}},"required":["users"]}', + ); + final (code, out, _) = await _runLam([ + '--schema', + schema.path, + '--explain', + '.users | map(.email)', + data.path, + ]); + expect(code, 0); + // The explain output should show `email: optional` + // in the users element shape. 
+ expect(out, contains('email: optional')); + }); + + test('sibling .schema.json is auto-detected', () async { + final data = File('${tmp.path}/items.json') + ..writeAsStringSync('[{"id":"x","n":1}]'); + File('${tmp.path}/items.schema.json').writeAsStringSync( + '{"type":"array","items":{"type":"object","properties":' + '{"id":{"type":"string"},"n":{"type":"number"},' + '"note":{"type":"string"}},"required":["id","n"]}}', + ); + final (code, out, _) = await _runLam(['--explain', '.', data.path]); + expect(code, 0); + // Auto-detected schema adds `note: optional` to element. + expect(out, contains('note: optional')); + }); + + test('schema disagreement exits 1 with a path-annotated error', () async { + final data = File('${tmp.path}/d.json')..writeAsStringSync('{"age":30}'); + final schema = File('${tmp.path}/s.json')..writeAsStringSync( + '{"type":"object","properties":{"age":{"type":"string"}},' + '"required":["age"]}', + ); + final (code, _, err) = await _runLam([ + '--schema', + schema.path, + '.', + data.path, + ]); + expect(code, 1); + expect(err, contains('disagreement')); + expect(err, contains(r'$.age')); + expect(err, contains('string')); + expect(err, contains('number')); + }); + + test('schema parse error surfaces a clear diagnostic', () async { + final data = File('${tmp.path}/d.json')..writeAsStringSync('{}'); + final schema = File('${tmp.path}/bad.json') + ..writeAsStringSync('{"allOf":[{"type":"object"}]}'); + final (code, _, err) = await _runLam([ + '--schema', + schema.path, + '.', + data.path, + ]); + expect(code, 1); + expect(err, contains('allOf')); + expect(err, contains('unsupported')); + }); + + test('missing schema file exits 1 with a clear error', () async { + final data = File('${tmp.path}/d.json')..writeAsStringSync('{}'); + final (code, _, err) = await _runLam([ + '--schema', + '${tmp.path}/nonexistent.json', + '.', + data.path, + ]); + expect(code, 1); + expect(err, contains('schema file not found')); + }); + + test('--ndjson rejects 
--schema', () async { + final data = File('${tmp.path}/e.ndjson')..writeAsStringSync('{}\n'); + final schema = File('${tmp.path}/s.json') + ..writeAsStringSync('{"type":"object"}'); + final (code, _, err) = await _runLam([ + '--ndjson', + '--schema', + schema.path, + '.', + data.path, + ]); + expect(code, 1); + expect(err, contains('--schema')); + }); + }); } From 7a099b7cc5f407bbb18334852b10f728c691087f Mon Sep 17 00:00:00 2001 From: Hakim Jonas Ghoula Date: Sun, 3 May 2026 01:45:21 +0200 Subject: [PATCH 17/67] Pre-docs: add lambe_explain MCP tool MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Self-review of the full 0.9.0 before the docs polish surfaced a real gap: track B shipped --explain-json at the CLI but never surfaced --explain to MCP agents. The positioning pitch ("shows you what you're working with") specifically targets agents; leaving them without structured explain output undermines the track B deliverable. Framing this as "future" was reflexive, not reasoned. 40 lines of tool wiring calling existing library functions is not a future feature; it's an unfinished track. lambe_explain tool Parameters: expression (required): the query to analyze. data (optional): when provided, shape seeds from shapeOf(data). format (optional): input format for data; auto-detected if not given. schema (optional inline string): merges with shapeOf(data) for a more precise initial shape. With no data and no schema, starts from SAny. include_trivial (optional bool): surfaces trivial-result warnings (--explain-trivial equivalent). flatten_cells (optional enum): affects the writable_as summary. Returns renderExplainJson(report) — the exact same payload the CLI's --explain-json emits, with snake_case keys and nested-kind shape trees. Agents get one structured contract across surfaces. Updated the MCP server instructions to list all five tools. Updated AGENTS.md tool inventory. 
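For reference, a call to the new tool over JSON-RPC can be sketched as below. This is a minimal illustration under stated assumptions, not the smoke-test script itself: the helper name `buildExplainRequest` and the request id are hypothetical, the argument keys are taken from the parameter list above, and the `tools/call` envelope (`name` plus `arguments`) follows the usual MCP convention.

```dart
import 'dart:convert';

/// Builds a JSON-RPC 2.0 `tools/call` request targeting `lambe_explain`.
/// Hypothetical helper for illustration; only the argument keys
/// (expression, data, schema, include_trivial, flatten_cells) come from
/// the tool description in this commit.
Map<String, Object?> buildExplainRequest({
  required String expression,
  String? data,
  String? schema,
  bool includeTrivial = false,
  String flattenCells = 'refuse',
  int id = 1,
}) {
  return {
    'jsonrpc': '2.0',
    'id': id,
    'method': 'tools/call',
    'params': {
      'name': 'lambe_explain',
      'arguments': {
        'expression': expression,
        // Optional inputs are omitted entirely when not supplied, so the
        // server falls back to its documented defaults.
        if (data != null) 'data': data,
        if (schema != null) 'schema': schema,
        'include_trivial': includeTrivial,
        'flatten_cells': flattenCells,
      },
    },
  };
}

void main() {
  final request = buildExplainRequest(
    expression: '.users | map(.name)',
    data: '{"users":[{"name":"alice","age":30}]}',
  );
  // One JSON document per line, as a stdio transport would expect.
  print(jsonEncode(request));
}
```

Leaving out `data` and `schema` corresponds to the expression-only case, where the trace starts from the "any" shape.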
Smoke-tested end-to-end via JSON-RPC: - tools/list returns five tools including lambe_explain. - lambe_explain with data + expression returns a trace where .users shape is list> and |map(.name) is list. - lambe_explain with data + schema (schema declares email as optional) produces list> when .email is accessed — agent-advantage use case proven. - lambe_explain with include_trivial: true surfaces trivial_result warnings for sort_by(.missing). - lambe_explain with no data (expression-only) still produces a meaningful trace (length on unknown input infers SNum). Existing library-level tests cover the underlying renderExplainJson and explain functions; the new MCP tool is a thin wrapper. CLI subprocess tests for MCP are consistently deferred across all server tools. Quality gates: dart analyze clean, 1445 tests pass, dart format clean, pana 160/160. REPL still lacks :explain. Leaving as genuine future work: REPL users can already run queries live (sub-100ms), so the "see-before-run" need is weaker there than it is for agents. Clears the track-A step-9 prerequisite: MCP surface is now coherently covered. --- AGENTS.md | 2 +- bin/mcp_server.dart | 116 ++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 117 insertions(+), 1 deletion(-) diff --git a/AGENTS.md b/AGENTS.md index 595137b..1d94f9f 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -54,7 +54,7 @@ The `lambe_query` MCP tool is available for querying structured data. Connect wi lam-mcp # stdio transport ``` -Tools: `lambe_query` (extract/filter/transform; optional `schema` parameter for structural validation before the query runs), `lambe_print_shape` (structure inspection — returns JSON Schema), `lambe_check` (validate data against a JSON Schema), `lambe_assert` (boolean assertion on a query). 
+Tools: `lambe_query` (extract/filter/transform; optional `schema` parameter for structural validation before the query runs), `lambe_print_shape` (structure inspection — returns JSON Schema), `lambe_check` (validate data against a JSON Schema), `lambe_explain` (trace a query statically, with or without data; returns a structured shape-per-stage report), `lambe_assert` (boolean assertion on a query). ### In Dart Code diff --git a/bin/mcp_server.dart b/bin/mcp_server.dart index 6152e66..42d5984 100644 --- a/bin/mcp_server.dart +++ b/bin/mcp_server.dart @@ -29,6 +29,8 @@ base class LambeServer extends MCPServer with ToolsSupport { 'Use lambe_print_shape to understand data structure before ' 'querying (returns JSON Schema). ' 'Use lambe_check to validate data against a JSON Schema. ' + 'Use lambe_explain to trace a query statically before running ' + 'it (returns a structured JSON report of shape at each stage). ' 'Use lambe_assert to validate or check conditions on data.\n\n' 'Common patterns:\n' ' .database.host — extract a value\n' @@ -77,6 +79,7 @@ base class LambeServer extends MCPServer with ToolsSupport { registerTool(_queryTool, _handleQuery); registerTool(_printShapeTool, _handlePrintShape); registerTool(_checkTool, _handleCheck); + registerTool(_explainTool, _handleExplain); registerTool(_assertTool, _handleAssert); } @@ -353,6 +356,119 @@ base class LambeServer extends MCPServer with ToolsSupport { } } + final _explainTool = Tool( + name: 'lambe_explain', + description: + 'Use this tool to trace the shape of values flowing through a ' + 'Lambe query without running it. Returns a structured JSON ' + 'report with one entry per pipe stage (source + inferred shape), ' + 'static-analysis warnings (empty filters, runtime rejections, ' + 'and optionally trivial results), and the output formats the ' + 'final shape can be serialized as. 
Use before `lambe_query` to ' + 'verify a query does what the user expects, or to find out why ' + 'an unfamiliar query would fail. Data is optional: without it, ' + 'the trace starts from "any" and still catches many classes of ' + 'mistake. A schema, when provided, sharpens the trace further.', + inputSchema: Schema.object( + properties: { + 'expression': Schema.string( + description: 'The Lambe query expression to analyze.', + ), + 'data': Schema.string( + description: + 'Optional input data. When present, shape inference seeds ' + 'from shapeOf(data); without it, the initial shape is ' + '"any".', + ), + 'format': UntitledSingleSelectEnumSchema( + description: 'Input format for [data]. Auto-detected if omitted.', + values: ['json', 'yaml', 'toml', 'hcl', 'csv', 'tsv', 'markdown'], + ), + 'schema': Schema.string( + description: + 'Optional inline JSON Schema subset. When provided, the ' + 'schema is merged with shapeOf(data) (or used alone when ' + 'no data is given) to produce a more precise initial ' + 'shape — optional fields and empty-list elements from ' + 'the schema become visible in the trace.', + ), + 'include_trivial': Schema.bool( + description: + 'When true, includes trivial-result warnings ' + '(sort_by/group_by/map/unique_by on a missing field). ' + 'Off by default because legitimate uses exist.', + ), + 'flatten_cells': UntitledSingleSelectEnumSchema( + description: + 'CSV/TSV cell policy for the writability summary. refuse ' + '(default) requires scalar cells; json accepts any list ' + 'at the root.', + values: ['refuse', 'json'], + ), + }, + required: ['expression'], + ), + ); + + FutureOr _handleExplain(CallToolRequest request) { + final args = request.arguments!; + final expression = args['expression'] as String; + final data = args['data'] as String?; + final formatStr = args['format'] as String?; + final schemaStr = args['schema'] as String?; + final includeTrivial = args['include_trivial'] as bool? ?? 
false; + final flattenCellsStr = args['flatten_cells'] as String?; + + try { + final ast = parseAst(expression); + final flattenCells = + flattenCellsStr != null + ? CellPolicy.values.byName(flattenCellsStr) + : CellPolicy.refuse; + + // Build the initial shape. Four cases: + // - no data, no schema: SAny + // - data only: shapeOf(data) + // - schema only: parseJsonSchema(schema) + // - both: mergeSchemaWithData(schema, shapeOf(data)) + Shape inputShape; + if (data == null && schemaStr == null) { + inputShape = const SAny(); + } else if (data == null) { + inputShape = parseJsonSchema(schemaStr!); + } else { + final format = + formatStr != null ? Format.values.byName(formatStr) : null; + final parsed = parseInput(data, format ?? sniffFormat(data)); + final dataShape = shapeOf(parsed); + inputShape = + schemaStr != null + ? mergeSchemaWithData(parseJsonSchema(schemaStr), dataShape) + : dataShape; + } + + final report = explain( + ast, + inputShape, + flattenCells: flattenCells, + includeTrivial: includeTrivial, + ); + return CallToolResult( + content: [TextContent(text: renderExplainJson(report))], + ); + } on QueryError catch (e) { + return CallToolResult( + content: [TextContent(text: 'Error: ${e.message}')], + isError: true, + ); + } on FormatException catch (e) { + return CallToolResult( + content: [TextContent(text: 'Parse error: ${e.message}')], + isError: true, + ); + } + } + final _assertTool = Tool( name: 'lambe_assert', description: From 3f4e3c4dd18326ef32588859a0aa3c03aa2d1b67 Mon Sep 17 00:00:00 2001 From: Hakim Jonas Ghoula Date: Sun, 3 May 2026 01:53:29 +0200 Subject: [PATCH 18/67] Track A step 9: docs polish for 0.9.0 Ship the 0.9.0 documentation pass: reframe the pitch to match what shipped, consolidate the scattered 0.9.0-dev CHANGELOG entries into a single coherent release section, add a user-guide for schemas, and fix stale references to the pre-rename CLI flags, deprecated library symbols, and old MCP tool names. 
pubspec.yaml
- Version: 0.9.0 (regenerated lib/src/_version.dart).
- Description reframed to the "shows you what you're working with"
  pitch, trimmed to fit pana's 180-char limit.
- Added `schema` to topics.

CHANGELOG.md
- New 0.9.0 section organized by theme, not by track. Opens with
  the shape-feedback-loop framing. Six sections: schemas as a
  first-class contract, SOptional in the shape ADT, richer
  --explain, --ndjson, --flatten-cells, cross-surface Hint type.
- Breaking changes called out explicitly: --schema renamed to
  --print-shape; --print-shape output format changed; MCP tool
  lambe_schema renamed to lambe_print_shape; Shape gains SOptional
  variant; ExplainWarning gains required kind param.
- Deprecated section notes inferSchema scheduled for 1.0 removal.

README.md
- New lead: "a query language for structured data that shows you
  what you're working with." Drops the jq comparison from the
  pitch and names the actual use case ("when you don't already
  know the data").
- New --schema section after --explain, showing both
  threaded-into-explain and validation-on-load examples, plus
  round-trip via --print-shape.
- CLI examples: --schema and --print-shape replace the stale
  --schema data.json (which now means something different).
- Library example: shapeOf/renderJsonSchema/parseJsonSchema/mergeSchemaWithData
  replace the deprecated inferSchema.
- MCP tool list: five tools with their feedback-loop roles.
- Docs index: added doc/schema.md.
- REPL banner version bumped.

DESIGN.md
- MCP tool list updated to five tools.

doc/schema.md (new)
- Complete user guide for the schema feature: why-use, accepted
  keywords, rejected keywords, CLI/REPL/MCP/library surface,
  disagreement semantics, round-trip, what schemas don't do.
- Clarifies the shapeOf-vs-schema division of labor.

doc/lam.1.md
- Added schema-checked query and schema-seeded explain examples
  to the EXAMPLES section.
- Regenerated doc/lam.1 via tool/manpage.dart.

AGENTS.md was already updated in step 7 (MCP).
Quality gates: dart analyze clean, 1445 tests pass, dart format clean, pana 160/160 (description length was over 180 chars on first pass; trimmed). Completes track A. Release-ready from a code/docs perspective. What remains outside track A: install.sh + Homebrew tap for 1.0, the downstream rem/arda-web commits still unpushed, and the push of the 0.9.0-dev branch itself. --- CHANGELOG.md | 225 +++++++++++++++++++++++++++++------------- DESIGN.md | 2 +- README.md | 65 ++++++++++-- doc/lam.1 | 14 ++- doc/lam.1.md | 10 +- doc/schema.md | 186 ++++++++++++++++++++++++++++++++++ lib/src/_version.dart | 2 +- pubspec.yaml | 7 +- 8 files changed, 429 insertions(+), 82 deletions(-) create mode 100644 doc/schema.md diff --git a/CHANGELOG.md b/CHANGELOG.md index 5720ab7..bb07de6 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,70 +1,161 @@ -## 0.9.0-dev - -In progress. - -### Added - -- **Richer `--explain` output.** Three new categories of static - analysis, plus a structured output mode: - - **Runtime-rejection warnings** (always on): flags pipe ops whose - input shape is provably incompatible. `.config | filter(.x)` on a - known map produces "filter rejects map<...>; this will throw at - runtime." The existing pipe-op acceptance predicates in - `pipe_ops.dart` supply the check; `explain` surfaces it. - - **Trivial-result warnings** (opt-in via `--explain-trivial`): - flags `sort_by`, `group_by`, `map`, and `unique_by` whose - argument references a field provably absent on the element shape. - Often a typo but legitimate uses exist (stable no-op sort, - explicit null projection), hence opt-in. - - **Structured JSON output** (`--explain-json`): emits the full - explain report as JSON with snake_case keys - (`stages`, `warnings`, `writable_as`, `not_writable_as`, - `flatten_cells`). Warning kinds serialize as `empty_filter`, - `runtime_rejection`, `trivial_result`. 
Shapes serialize as nested - `{kind, ...}` trees (via `shapeToJson`) rather than stringified, - so agents can pattern-match shape structure without re-parsing. - For agent tooling and build-pipeline integration. -- **`shapeToJson`** library function: serializes a [`Shape`] as a - nested `Map` with a `kind` discriminator on each - node. The structured format used by `--explain-json`. -- **`ExplainWarning.kind`** (new field, [`WarningKind`] enum). - Classifier for filtering: CLI, JSON consumers, and future tooling - can select warning categories without parsing message strings. The - existing `emptyFilter` case carries the kind it always had. -- **`renderExplainJson`** library function: produces the JSON report. -- Both `--explain-trivial` and `--explain-json` imply `--explain`, - following the pattern of `--ndjson` being a non-combinable mode. -- **`--ndjson` mode for line-delimited JSON input.** Each line of the - source is parsed as an independent JSON document, the query is - evaluated per line with no shared state, and one compact JSON - result is emitted per line. Auto-enabled when the file extension is - `.ndjson` or `.jsonl`. Fail-fast on the first malformed or - unevaluable line; the line number is carried in the error. Covers - the "tail a log" use case without touching the core "AST over - in-memory tree" model. Available as a new top-level `queryNdjson` - function on the library (`Iterable -> Iterable`). - Cannot combine with `--interactive`, `--schema`, `--assert`, or - `--explain`; output is restricted to JSON. -- **`--flatten-cells` option for CSV/TSV output.** Accepts `refuse` - (default, 0.8.0 behavior) or `json`. Under `json`, non-scalar cells - are encoded as JSON strings inline; the shape check widens - `MustBeFlatList` to `MustBeList` for csv/tsv. 
Available in the CLI - (`--flatten-cells`), the REPL (`:flatten-cells`), the MCP server - (`flatten_cells` parameter), and as a `CellPolicy flattenCells` - named parameter on `formatOutput`, `canWriteAs`, `canWriteShapeAs`, - `requirementFor`, and `explain`. Round-tripping the resulting CSV - back into Lambë does not recover the original structure; this is - an output-side escape hatch, not a faithful encoding. -- **`NotWritable.hints`.** A list of strings surfacing environmental - guidance (flags, settings) relevant to the mismatch. The first such - hint covers the `--flatten-cells json` escape hatch: when a - CSV/TSV request rejects under `refuse` but a list root is already - present, the hint points at the equivalent CLI flag, REPL command, - and MCP parameter. Uniform channel across CLI, REPL, and MCP so - tools don't re-derive the condition. -- **`ExplainReport.flattenCells`.** The cell policy the report was - generated under. `renderExplain` prints `Cell policy: json` as a - footer when non-default; default output is byte-for-byte unchanged. +## 0.9.0 + +Closes the shape feedback loop. Declare a JSON Schema, check queries +against it, round-trip schemas with the ecosystem. Plus: richer +static analysis in `--explain`, line-delimited JSON input, and an +opt-in CSV escape hatch for nested cells. + +### Schemas as a first-class contract + +- **`--schema `** on the CLI. Threads a JSON Schema subset + through both `--explain` inference and normal evaluation. With + data, the schema validates at load time (structural disagreement + exits 1 with a JSON path). Without data, the schema alone seeds + shape inference for design-time planning. +- **Sibling auto-detect.** Data at `path/to/data.json` picks up + `path/to/data.schema.json` implicitly. Same convention as ndjson + auto-detect. +- **`--print-shape`** on the CLI. Emits `shapeOf(data)` as a JSON + Schema subset document, round-trippable with `--schema` input. 
The + same shape-to-JSON-Schema rendering powers + `renderJsonSchema(shape)` on the library and the MCP + `lambe_print_shape` tool. +- **REPL: `:schema [path]` and `:print-shape`.** `:schema ` + loads a schema for the session and reports agreement/disagreement + vs current data. `:schema` (no arg) prints the active schema. + `:load` re-validates against an active schema and warns on + disagreement. +- **MCP: `lambe_print_shape`, `lambe_check`, `lambe_explain`, plus + a `schema` parameter on `lambe_query`.** Agents can print a + shape, validate fixtures against a schema, trace a query + structurally before running, or gate a query on schema + conformance. `lambe_check` returns `{"ok": true}` / + `{"ok": false, "error": "..."}`. +- **Library surface.** `parseJsonSchema`, `renderJsonSchema`, + `loadSchemaFromFile`, `loadSchemaForData`, `mergeSchemaWithData` + are all exported from `package:lambe/lambe.dart`. + +### `SOptional` in the shape ADT + +- New sealed variant `SOptional(Shape)`. Represents + statically-known optionality — populated by JSON Schema's + `required` semantics, propagated through field access and op + inference, and surfaced by the explain trace. Nested optionality + collapses at construction: `SOptional(SOptional(x))` is always + `SOptional(x)`. +- Acceptance predicates unwrap `SOptional` for op inputs — `filter` + on `SOptional>` is accepted, with the potential absence + surfaced by a runtime-rejection warning rather than a silent + accept or a false reject. +- Root-level requirements (TOML/HCL `MustBeMap`) do NOT unwrap: an + absent root can't be serialized, so users must materialize a + default first. This asymmetry is deliberate. +- `shapeToJson` emits `{"kind": "optional", "inner": ...}`. + `renderJsonSchema` flattens `SOptional` inside `SMap` fields into + missing `required` entries (standard JSON Schema idiom); + non-field-position `SOptional` has no standard spelling in our + subset and is flattened with a docstring caveat. 
+ +### Richer `--explain` output + +Three new categories of static analysis, plus a structured output +mode: +- **Runtime-rejection warnings** (always on). Flags pipe ops whose + input shape is provably incompatible. `.config | filter(.x)` on a + known map produces `"filter rejects map<...>; this will throw at + runtime"`. Uses the existing pipe-op acceptance predicates. +- **Trivial-result warnings** (opt-in via `--explain-trivial`). + Flags `sort_by`, `group_by`, `map`, and `unique_by` whose + argument references a field provably absent on the element shape. + Opt-in because legitimate uses exist (stable no-op sort, explicit + null projection). +- **Structured JSON output** (`--explain-json`). Emits the full + explain report as JSON with snake_case keys (`stages`, + `warnings`, `writable_as`, `not_writable_as`, `flatten_cells`). + Warning kinds serialize as `empty_filter`, `runtime_rejection`, + `trivial_result`. Shapes serialize as nested `{kind, ...}` trees + (via `shapeToJson`) so agents can pattern-match shape structure + without re-parsing. Also surfaces in the new `lambe_explain` MCP + tool. +- Both `--explain-trivial` and `--explain-json` imply `--explain`. +- New `shapeToJson(Shape)`, `renderExplainJson(ExplainReport)`, + `WarningKind` enum, and `ExplainWarning.kind` field on the + library. + +### `--ndjson` mode for line-delimited JSON input + +- Each line is parsed as an independent JSON document; the query is + evaluated per line with no shared state; one compact JSON result + per line. Auto-enabled when the file extension is `.ndjson` or + `.jsonl`. Stdin support streams: `tail -f app.log | lam --ndjson + '.level'` emits each result as the line arrives. +- Fail-fast on the first malformed or unevaluable line; error + carries the line number. +- New `queryNdjson(Iterable, LamExpr)` library function + (`Iterable`, lazy). 
+- Cannot combine with `--interactive`, `--schema`, `--assert`, or + `--explain`; output is restricted to JSON (`--to` other than + `json` is refused). + +### `--flatten-cells` for CSV/TSV + +- Opt-in escape hatch: non-scalar cells encoded as JSON strings + inline. Accepts `refuse` (default, 0.8.0 behavior) or `json`. + Under `json`, the shape check widens `MustBeFlatList` to + `MustBeList` for csv/tsv. Round-tripping the resulting CSV back + into Lambe does NOT recover structure; this is an output-side + escape hatch, not a faithful encoding. +- Surfaced at the CLI (`--flatten-cells`), REPL + (`:flatten-cells`), MCP (`flatten_cells` parameter), and as + `CellPolicy flattenCells` on `formatOutput`, `canWriteAs`, + `canWriteShapeAs`, `requirementFor`, and `explain`. + +### Cross-surface hints + +- **`NotWritable.hints`.** When a shape mismatch has an + environmental resolution (a flag, a setting, a tool parameter), + the report carries a structured `Hint` type with `label`, + `cliFlag`, `replCommand`, `mcpParameter`, and `explanation`. CLI, + REPL, and MCP each render the form that applies to them. + Agent-facing JSON carries `parameter`/`value` pairs, not CLI + syntax. +- The first shipping hint covers `--flatten-cells json`: when a + CSV/TSV request rejects under `refuse` but a list root is + already present. + +### Breaking changes + +- **`--schema` flag renamed to `--print-shape`.** 0.8.0's `--schema` + printed a type-name JSON summary of the data. That function moved + to `--print-shape`. The new `--schema` takes a JSON Schema file + path. Users scripting `lam --schema data.json` must change to + `lam --print-shape data.json`. ArgParser rejects the old form + because `--schema` now requires a value. +- **`--print-shape` output format changed.** Emits a JSON Schema + subset document (`{"type": "object", "properties": ..., "required": + ...}`) instead of the type-name-string JSON format 0.8.0 emitted + (`{"age": "number"}`). 
The new output round-trips with + `--schema` input; the old format had no round-trip path. +- **MCP tool `lambe_schema` renamed to `lambe_print_shape`.** Output + format also changed to JSON Schema, matching the CLI. Agents that + hardcoded the old tool name get "tool not found" and a message + pointing at `lambe_print_shape`. +- **`Shape` ADT gained `SOptional` variant.** Source-breaking for + external code that pattern-matches `Shape` without a default case + (probably just Lambe itself). Exhaustive switches now need a + fifth branch. +- **`ExplainWarning` constructor gained required `kind` parameter.** + External code constructing warnings directly must add a + `WarningKind`. Uncommon; the existing pattern is consuming + warnings, not producing them. + +### Deprecated + +- **`inferSchema(Object? value)`** library function. Emits + type-name-string JSON (no round-trip). Use + `renderJsonSchema(shapeOf(value))` for JSON Schema output, or + `shapeOf(value)` for the `Shape` ADT. Scheduled for removal in + 1.0. ## 0.8.0 diff --git a/DESIGN.md b/DESIGN.md index c9ae4d9..344d2c3 100644 --- a/DESIGN.md +++ b/DESIGN.md @@ -66,7 +66,7 @@ Absence is data (Maybe/Option semantics). Type mismatch is an error. 
|---------|---------|-----| | **CLI binary** | Platform engineers, DevOps | `dart compile exe` -> standalone `lam` binary | | **Dart library** | Flutter/Dart developers | `import 'package:lambe/lambe.dart'` | -| **MCP tool** | AI agents, LLM frameworks | `lambe_query`, `lambe_schema`, `lambe_assert` | +| **MCP tool** | AI agents, LLM frameworks | `lambe_query`, `lambe_print_shape`, `lambe_check`, `lambe_explain`, `lambe_assert` | --- diff --git a/README.md b/README.md index f53b958..1097308 100644 --- a/README.md +++ b/README.md @@ -1,8 +1,10 @@ # Lambë -*Query structured data, get errors with suggested fixes, and reshape results to the format you need.* +*A query language for structured data that shows you what you're working with.* -Lambë is a query language for JSON, YAML, TOML, HCL, CSV, TSV, and Markdown. Queries compose through a pipe operator, the same way a shell pipeline does. What's different: when a query produces a result your target format cannot serialize, Lambë infers the shape, explains the mismatch, and lists the curated query fragments that bridge it. The `as(fmt)` operator lets you ask for the bridge directly in the query language; `--explain` shows the shape at every pipe stage without running anything. +`lam` queries JSON, YAML, TOML, HCL, CSV, TSV, and Markdown. Unlike other query tools, it tells you what your query *does* before you run it — the shape at each pipe stage, which output formats can serialize the result, what would go wrong. + +Use it when you don't already know the data: inspecting an unfamiliar API response, auditing a Helm chart, verifying a CI pipeline's assumptions, or asking an AI agent to extract something without guessing at the structure. ``` $ lam --to toml '.dependencies | keys' pubspec.yaml @@ -14,6 +16,8 @@ $ lam --to toml '.dependencies | keys | as(toml)' pubspec.yaml items = ["rumil", "rumil_parsers", "rumil_expressions"] ``` +Queries are bounded and always terminate. No recursion, no lambdas, no `def`. 
That's the tradeoff: Lambe doesn't try to be a programming language, so its shape inference, `--explain`, `--schema`, and error remediations all work. + *Lambë (pronounced "lam-beh") means "language" in Quenya (Tolkien's elvish). The package name is `lambe` for ASCII compatibility.* ## Installation @@ -95,6 +99,34 @@ Not writable as: toml, hcl Explain flags provably-empty filters (`filter(.missing)` on a known shape) and runtime-rejection mismatches (`filter` on a non-list input) by default. Pass `--explain-trivial` to also flag `sort_by`/`group_by`/`map`/`unique_by` whose argument references a missing field (often a typo, sometimes intentional). For agent tooling and build pipelines, `--explain-json` emits the same information as a structured JSON document. +### `--schema` — declare a shape and let Lambe check your work + +When you have a JSON Schema for your data — from an API contract, OpenAPI spec, or hand-written docs — point `--schema` at it: + +``` +$ lam --schema api.schema.json --explain '.users | map(.email)' response.json +.users : list>> +| map(.email) : list> + +Writable as: json, yaml, csv, tsv +Not writable as: toml, hcl +``` + +The schema fills in information data alone can't express: optional fields (from JSON Schema's `required`), element shapes of empty lists, types `shapeOf` couldn't infer from sampling. `--explain` shows them; the evaluator trusts them. + +With data present, Lambe also validates: a schema saying `age: number` against data with `age: "30"` exits 1 at load time with a JSON-path-annotated diagnostic. No silent drift, no running a query against data that doesn't match its contract. + +A sibling `.schema.json` is auto-detected, so a project convention of placing schemas next to data works without explicit flags. + +The reverse direction is symmetrical: `lam --print-shape data.json` emits the inferred shape as a JSON Schema document. 
Round-trip: + +``` +lam --print-shape data.json > data.schema.json # bootstrap a schema from data +lam --schema data.schema.json '.users' data.json # use it back +``` + +Accepted JSON Schema keywords: `type`, `properties`, `items`, `required`. Value-level constraints (`minimum`, `pattern`, `enum`, etc.), structural combinators (`allOf`, `oneOf`), `$ref`, and conditional schemas are rejected with a per-keyword error. Lambe is a shape system, not a validation engine — for richer validation, reach for `ajv` or `check-jsonschema`. + ## Query Syntax Queries start with `.` (the current data) and chain operations with `|`: @@ -176,8 +208,11 @@ lam '.users | map("\(.name) is \(.age)")' data.json # Shape trace lam --explain '.users | map(.name)' data.json -# Schema inference -lam --schema data.json +# Shape inspection (JSON Schema output) +lam --print-shape data.json + +# Schema-checked queries: validate data against a schema as it runs +lam --schema api.schema.json '.users | map(.email)' response.json # CI validation lam --assert '.version != "0.0.0"' package.json @@ -209,7 +244,7 @@ lam -i data.json ``` ``` -lambe v0.8.0 - type :help for commands, :q to quit +lambe v0.9.0 - type :help for commands, :q to quit Data loaded: {3 fields, 42 users} lambe> .users | filter(.age > 30) | map(.name) @@ -250,8 +285,13 @@ final result2 = evaluateAst(ast, dataset2); final yaml = formatOutput(data, OutputFormat.yaml); final csv = formatOutput(users, OutputFormat.csv); -// Schema inference -final schema = inferSchema(data); +// Shape inference and JSON Schema output +final shape = shapeOf(data); // Shape ADT +final schemaJson = renderJsonSchema(shape); // JSON Schema text + +// Or parse a schema file and merge with observed data +final schema = parseJsonSchema(schemaSource); +final merged = mergeSchemaWithData(schema, shape); // throws on disagreement ``` ### Shape and bridging API @@ -353,7 +393,15 @@ Install, then add `.mcp.json` to your project: } ``` -This gives AI assistants three 
tools: `lambe_query` (extract/filter/transform), `lambe_schema` (structure inspection), `lambe_assert` (validation). When `lambe_query` encounters a shape mismatch with the requested output format, the error response includes a structured `suggestions` array: each entry carries a `template_text`, an `apply_as` (the complete query formed by appending the template to the original expression), and a one-line `explanation`. Agents can call the tool again with an `apply_as` verbatim. +This gives AI assistants five tools that cover the whole feedback loop: + +- `lambe_query` — extract/filter/transform, with an optional `schema` parameter that validates data structurally before the query runs. +- `lambe_print_shape` — inspect unfamiliar data; returns a JSON Schema subset document. +- `lambe_check` — validate data against a JSON Schema. Returns `{"ok": true}` or `{"ok": false, "error": "..."}` naming the disagreement path. +- `lambe_explain` — trace a query statically (with or without data); returns a structured JSON report with shape-per-stage, warnings, and writability. +- `lambe_assert` — boolean assertion on a query result. + +When `lambe_query` encounters a shape mismatch with the requested output format, the error response includes a structured `suggestions` array: each entry carries a `template_text`, an `apply_as` (the complete query formed by appending the template to the original expression), and a one-line `explanation`. Agents can call the tool again with an `apply_as` verbatim. 
### For AI Coding Agents @@ -387,6 +435,7 @@ expect(data, lamHas('.users[0].address.city')); - [Getting started](doc/getting-started.md) - install and first queries - [Syntax reference](doc/syntax.md) - the full query language - [REPL guide](doc/repl.md) - interactive mode, commands, keyboard shortcuts +- [Schema guide](doc/schema.md) - the JSON Schema subset, merge semantics, round-trip - [Recipes](doc/recipes.md) - real-world patterns for Kubernetes, Terraform, CI, CSV - [Man page](doc/lam.1.md) - Unix man page (`man -l doc/lam.1`) diff --git a/doc/lam.1 b/doc/lam.1 index fb08bb3..09d7453 100644 --- a/doc/lam.1 +++ b/doc/lam.1 @@ -239,12 +239,24 @@ Format conversion: lam --to yaml '.config' data.json .fi .PP -Schema inspection: +Shape inspection: .PP .nf lam --print-shape deployment.yaml .fi .PP +Schema-checked query (validates data against the schema before running): +.PP +.nf +lam --schema api.schema.json '.users | map(.email)' response.json +.fi +.PP +Shape trace, schema-seeded (no data needed): +.PP +.nf +lam --schema api.schema.json --explain '.users | map(.email)' +.fi +.PP Shape trace for a pipeline: .PP .nf diff --git a/doc/lam.1.md b/doc/lam.1.md index 7410a07..7520dcb 100644 --- a/doc/lam.1.md +++ b/doc/lam.1.md @@ -256,10 +256,18 @@ Format conversion: lam --to yaml '.config' data.json -Schema inspection: +Shape inspection: lam --print-shape deployment.yaml +Schema-checked query (validates data against the schema before running): + + lam --schema api.schema.json '.users | map(.email)' response.json + +Shape trace, schema-seeded (no data needed): + + lam --schema api.schema.json --explain '.users | map(.email)' + Shape trace for a pipeline: lam --explain '.users | filter(.age > 30) | map(.name)' data.json diff --git a/doc/schema.md b/doc/schema.md new file mode 100644 index 0000000..4786512 --- /dev/null +++ b/doc/schema.md @@ -0,0 +1,186 @@ +# Lambe schemas + +Lambe supports a JSON Schema subset as the contract between a query and its data. 
Declare the shape once; let Lambe check that queries make sense against it, validate data conforms at runtime, and round-trip schemas with the rest of the ecosystem. + +## Why use a schema? + +Lambe's default inference samples the data at hand. That's robust for known inputs but has gaps: + +- **Empty lists and maps.** `shapeOf([])` returns `list`; the element type is lost. +- **Mixed sampling.** Lists with heterogeneity beyond the sampling window collapse to `list`. +- **Queries without data.** CI planning, design documents, `--explain` without a file — no data to sample, no precision. + +A schema fills those in. `--explain` shows a sharper trace, errors fire earlier, and you can validate data against the shape before running anything. + +## Accepted JSON Schema subset + +Four keywords. That's it. + +| Keyword | Meaning | +|---|---| +| `type` | `"null"`, `"boolean"`, `"number"`, `"integer"`, `"string"`, `"array"`, `"object"` | +| `properties` | Map of field name → nested schema (for `object`) | +| `items` | Element schema (for `array`) | +| `required` | List of required property names (for `object`) | + +The empty object `{}` means "any shape" — JSON Schema's convention, preserved through round-trip. + +Unknown keywords are ignored (JSON Schema's extensibility rule), so `$schema`, `$id`, `title`, `description`, and other metadata flow through without complaint. + +## Rejected keywords + +Everything else is rejected with a per-keyword error and a JSON path: + +- **Value-level constraints** (`minimum`, `maximum`, `minLength`, `maxLength`, `pattern`, `enum`, `const`, `format`, `multipleOf`, `minItems`, `maxItems`, `uniqueItems`, `minProperties`, `maxProperties`). Lambe is a shape system, not a value validator. +- **Structural combinators** (`allOf`, `oneOf`, `anyOf`, `not`). The shape ADT is union-free by design. +- **Conditionals** (`if`, `then`, `else`, `dependencies`, `dependentRequired`, `dependentSchemas`). Would require a constraint solver. 
+- **References** (`$ref`, `$defs`, `definitions`). Schemas are single-file in 0.9. +- **Object constraints** (`additionalProperties`, `patternProperties`, `propertyNames`). + +If you have a richer schema, strip it down or run it through `ajv`/`check-jsonschema` for value validation separately. + +## Example schemas + +Simple: + +```json +{"type": "string"} +``` + +List of strings: + +```json +{"type": "array", "items": {"type": "string"}} +``` + +Object with required and optional fields: + +```json +{ + "type": "object", + "properties": { + "name": {"type": "string"}, + "age": {"type": "number"}, + "email": {"type": "string"} + }, + "required": ["name", "age"] +} +``` + +In Lambe's shape language, that last one is `map>`. + +## How Lambe uses your schema + +### CLI + +```bash +# Thread schema into --explain: shape trace reflects declared optionality +lam --schema api.schema.json --explain '.users | map(.email)' response.json + +# With data: schema validates at load time. Disagreement exits 1. +lam --schema api.schema.json '.users' response.json + +# Without data: schema alone is the initial shape (design-time planning) +lam --schema api.schema.json --explain '.users | map(.email)' +``` + +### Sibling auto-detect + +If you have `data.json` and `data.schema.json` side-by-side, `lam` picks up the schema implicitly: + +```bash +lam '.users' data.json # data.schema.json used automatically if present +``` + +Same convention as `.ndjson` auto-detect. An explicit `--schema ` overrides the sibling. + +### REPL + +``` +lambe> :schema api.schema.json +Schema loaded (agrees with current data). +lambe> :schema +{...prints the loaded schema as JSON Schema...} +lambe> :load other-data.json +Warning: data disagrees with active schema: schema disagreement at $.users: ... +lambe> :print-shape +{...prints the inferred shape of currently loaded data...} +``` + +### MCP + +Four tools cover the schema story for agents: + +- `lambe_print_shape` — takes data, returns its JSON Schema. 
+- `lambe_check` — takes schema + data, returns `{"ok": true}` or `{"ok": false, "error": "..."}`. +- `lambe_query` — takes an optional `schema` parameter that validates data before running the query. +- `lambe_explain` — takes an optional `schema` parameter; the explain trace reflects it. + +### Library + +```dart +import 'package:lambe/lambe.dart'; + +// Parse a schema string +final schema = parseJsonSchema(schemaText); + +// Load from a file (throws QueryError on missing/invalid) +final schema2 = loadSchemaFromFile('api.schema.json'); + +// Merge with observed data (throws on disagreement) +final merged = mergeSchemaWithData(schema, shapeOf(data)); + +// Emit a schema from a shape +final schemaText2 = renderJsonSchema(shape); +``` + +## Disagreement semantics + +When schema and data are both present, Lambe merges them: + +- **Both agree on a concrete type.** Use that type. +- **Schema has a field data doesn't.** Use the schema's shape for that field. +- **Data has a field schema doesn't.** Use data's shape. +- **Schema marks a field optional, data has it present.** Strip the `optional` wrapper for this run. +- **Concrete-type disagreement** (schema: `number`, data: `string`). Error at load time with a JSON path. + +The error path is designed to be actionable: + +``` +Error: schema disagreement at $.users[*].age: schema says number, data is string +``` + +Merge is the heart of why schemas matter: `--explain` stays honest (what it says is what will happen, because data and schema agree), and validation falls out as a side effect of loading. + +## Round-trip + +```bash +lam --print-shape data.json > data.schema.json # Shape -> JSON Schema +lam --schema data.schema.json '.' data.json # JSON Schema -> Shape +``` + +Round-trip invariant: `parseJsonSchema(renderJsonSchema(shape))` equals `shape` for every shape reachable through `parseJsonSchema`. Pinned by 12 representative cases in `test/schema_renderer_test.dart`. 
+ +Lossy corner: `SOptional` inside a list's `items` or at the top level has no standard JSON Schema spelling in our subset. The renderer flattens those positions. `SOptional` inside an `SMap` field — the common case — round-trips faithfully via `required`. + +## What schemas don't do + +- **No value coercion.** Schema says `age: number`, data has `"30"`. Lambe does not parse the string at query time. The user still writes `.age | to_number`. A future release may add opt-in coercion. +- **No runtime constraints.** Schema saying `age` is `number` does not enforce `age >= 0` or `age <= 150` at query time. Value-level constraints are rejected from the schema at load time. +- **No schema composition.** `$ref` is rejected. For cross-file schemas, merge them yourself before pointing `--schema` at the result. +- **No runtime validation after load.** A CSV column with mixed strings and numbers won't surface at per-row granularity; we check the aggregate shape, not every value. + +## `shapeOf` vs schema + +Different tools for different jobs: + +| | `shapeOf(data)` | Schema | +|---|---|---| +| Source of truth | This particular dataset | The contract | +| Sees empty lists as | `list` | Declared element type | +| Handles mixed lists | Collapses to `list` | Declared element type | +| Available when data is absent | No | Yes | +| Sees optionality | No | Via `required` | +| Validates | N/A | Yes (at load time) | + +Use both when you can — `mergeSchemaWithData` is the merge function designed for this. Schema augments; data fills in extras; disagreement errors. diff --git a/lib/src/_version.dart b/lib/src/_version.dart index 5fa3407..f67190f 100644 --- a/lib/src/_version.dart +++ b/lib/src/_version.dart @@ -3,4 +3,4 @@ // pubspec.yaml version. /// Lambe version, sourced from pubspec.yaml at generation time. 
-const lambeVersion = '0.8.0'; +const lambeVersion = '0.9.0'; diff --git a/pubspec.yaml b/pubspec.yaml index 25f2a01..ecada50 100644 --- a/pubspec.yaml +++ b/pubspec.yaml @@ -1,12 +1,13 @@ name: lambe description: >- - Query JSON, YAML, TOML, HCL, and Markdown files with a composable pipeline DSL. - Like jq but multi-format, with cleaner syntax. CLI tool + Dart library + MCP server for AI agents. -version: 0.8.0 + A query language for structured data that shows you what you're working with. + Shape-aware --explain, JSON Schema input, format bridges. CLI + library + MCP. +version: 0.9.0 homepage: https://ardaproject.org/lambe repository: https://github.com/hakimjonas/lambe topics: - query + - schema - json - yaml - jq From 0b8d3a5041cb367cc10962c074291f9c171267f0 Mon Sep 17 00:00:00 2001 From: Hakim Jonas Ghoula Date: Sun, 3 May 2026 02:18:11 +0200 Subject: [PATCH 19/67] Install ergonomics: install.sh with checksum verification MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Ship the one-line installer the 0.8.0 handover called out as "the biggest single 1.0 ergonomic win." Users no longer need to know their architecture, fetch three curl commands, or use sudo. install.sh curl -fsSL https://raw.githubusercontent.com/hakimjonas/lambe/main/install.sh | sh Detects OS (Linux/macOS) and arch (x64/arm64). Resolves the latest release via the GitHub API (no auth, no JSON parser — grep+sed). Downloads lam and lam-mcp binaries into ~/.local/bin/. Verifies SHA256 against a published checksums.txt before installing; refuses to install on mismatch. Honors LAMBE_VERSION to pin a tag, LAMBE_PREFIX to change the install dir, LAMBE_BASE_URL to override the release base URL (useful for mirrors and testing), LAMBE_NO_MAN to skip the man page. Does NOT modify shell rc files. Prints a PATH reminder if the target bin dir isn't on PATH, showing the exact export line the user would add if they choose. 
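The PATH reminder's membership test can be sketched as follows. This is a minimal illustration of the `case ":$PATH:"` pattern that install.sh uses; the function name `on_path` and its two-argument form are hypothetical and do not appear in the script.

```sh
#!/bin/sh
# Sketch only: `on_path` is a hypothetical helper, not part of install.sh.
# Reports whether a directory is an exact entry of a colon-separated,
# PATH-style list.
on_path() {
  dir="$1"
  list="$2"
  # Wrapping the list in colons lets a single pattern match an entry at
  # the start, middle, or end without matching partial paths
  # (for example /home/u/.local against /home/u/.local/bin).
  case ":$list:" in
    *":$dir:"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Calling `on_path "$BIN_DIR" "${PATH:-}"` would then select between the "is on PATH" message and the reminder that prints the export line.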
Man page install is best-effort: if the release has a lam.1 asset (current releases do not — placeholder for a future bump), it's installed to ~/.local/share/man/man1/. Silently skipped otherwise. Release workflow: checksums.txt .github/workflows/release.yml now runs `sha256sum lam-* > checksums.txt` over the collected artifacts and uploads the manifest as a release asset. install.sh fetches this before any binary, and every binary is verified against it before install. Smoke-tested end to end with a local python HTTP server and fake artifacts: - Platform detection correctly identified linux-x64. - LAMBE_BASE_URL override worked (needed for the test). - checksums.txt parsed, expected hashes looked up per asset. - Correctly matched hashes: binaries installed with 0755 perms. - Corrupted lam-linux-x64 (hash mismatch): refused install, exited 1, wrote no files to the install prefix. - PATH reminder rendered correctly when target wasn't on PATH. README: new Installation section leads with the one-liner, keeps pub.dev / library / source-build options below for Dart users. CHANGELOG: new "Install ergonomics" section under 0.9.0. Still deferred: Homebrew tap (noted in handover, independent work, can be added post-0.9.0 without breaking the install story). Quality gates: dart analyze clean, 1445 tests pass, pana 160/160, install.sh `sh -n` syntax check clean. --- .github/workflows/release.yml | 8 ++ CHANGELOG.md | 15 +++ README.md | 12 +- install.sh | 200 ++++++++++++++++++++++++++++++++++ 4 files changed, 232 insertions(+), 3 deletions(-) create mode 100755 install.sh diff --git a/.github/workflows/release.yml b/.github/workflows/release.yml index 7166e8a..c38423e 100644 --- a/.github/workflows/release.yml +++ b/.github/workflows/release.yml @@ -56,6 +56,14 @@ jobs: path: artifacts merge-multiple: true + - name: Generate checksums.txt + run: | + cd artifacts + # One SHA256 per line, matching sha256sum / shasum -a 256 format. 
+ # install.sh reads this to verify downloaded binaries. + sha256sum lam-* > checksums.txt + cat checksums.txt + - uses: softprops/action-gh-release@v3 with: files: artifacts/* diff --git a/CHANGELOG.md b/CHANGELOG.md index bb07de6..912060e 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -157,6 +157,21 @@ mode: `shapeOf(value)` for the `Shape` ADT. Scheduled for removal in 1.0. +### Install ergonomics + +- **`install.sh`** — one-line installer at the repo root. + `curl -fsSL https://raw.githubusercontent.com/hakimjonas/lambe/main/install.sh | sh` + downloads the latest `lam` and `lam-mcp` binaries for the current + platform (Linux x64/arm64, macOS x64/arm64), verifies SHA256 + against a published `checksums.txt`, and installs to + `~/.local/bin/`. No sudo, no shell rc edits. Respects + `LAMBE_VERSION` and `LAMBE_PREFIX` env vars. +- **Release workflow generates `checksums.txt`.** `.github/workflows/release.yml` + now publishes a combined SHA256 manifest for every release + artifact as an asset. `install.sh` relies on this for integrity + checking; downstream package managers (a future Homebrew tap, + apt/rpm) can reuse it. + ## 0.8.0 Adds element-level shape checking for CSV/TSV output, union headers diff --git a/README.md b/README.md index 1097308..b3f8f40 100644 --- a/README.md +++ b/README.md @@ -22,11 +22,17 @@ Queries are bounded and always terminate. No recursion, no lambdas, no `def`. Th ## Installation +One-line installer (Linux and macOS, no `sudo`, verifies SHA256 checksums): + ```bash -# Pre-built binary (no Dart required) -curl -L https://github.com/hakimjonas/lambe/releases/latest/download/lam-linux-x64 -o lam -chmod +x lam && sudo mv lam /usr/local/bin/ +curl -fsSL https://raw.githubusercontent.com/hakimjonas/lambe/main/install.sh | sh +``` + +This downloads `lam` and `lam-mcp` from the latest GitHub release into `~/.local/bin/`. 
Environment variables `LAMBE_VERSION` (pin a version) and `LAMBE_PREFIX` (change install dir) are supported; see the script for details. +Other options: + +```bash # From pub.dev (Dart users) dart pub global activate lambe diff --git a/install.sh b/install.sh new file mode 100755 index 0000000..e0a117f --- /dev/null +++ b/install.sh @@ -0,0 +1,200 @@ +#!/bin/sh +# +# Lambe installer. Downloads the latest release binaries for your +# platform, verifies SHA256 checksums against the published +# `checksums.txt`, and installs to `~/.local/bin/`. +# +# Usage: +# curl -fsSL https://raw.githubusercontent.com/hakimjonas/lambe/main/install.sh | sh +# +# Or clone and run: +# ./install.sh +# +# Environment variables (all optional): +# LAMBE_VERSION — pin a specific release (default: latest). Example: v0.9.0 +# LAMBE_PREFIX — installation prefix (default: $HOME/.local) +# LAMBE_NO_MAN — skip man page install when non-empty +# LAMBE_BASE_URL — override the release asset base URL. For testing +# against a mirror or staged release. Default: +# https://github.com/hakimjonas/lambe/releases/download +# +# Installs (under $LAMBE_PREFIX): +# bin/lam — the CLI binary +# bin/lam-mcp — the MCP server binary +# share/man/man1/lam.1 — man page, when a `man` command is present +# +# Does NOT modify shell rc files. Prints a PATH reminder if needed. + +set -eu + +REPO="hakimjonas/lambe" +VERSION="${LAMBE_VERSION:-}" +PREFIX="${LAMBE_PREFIX:-$HOME/.local}" +BIN_DIR="$PREFIX/bin" +MAN_DIR="$PREFIX/share/man/man1" + +# ---- pretty --------------------------------------------------------- + +# Detect whether stdout is a terminal (so we don't emit ANSI to pipes). 
+if [ -t 1 ] && command -v tput >/dev/null 2>&1; then + BOLD=$(tput bold) + DIM=$(tput dim) + RED=$(tput setaf 1) + GREEN=$(tput setaf 2) + YELLOW=$(tput setaf 3) + RESET=$(tput sgr0) +else + BOLD="" + DIM="" + RED="" + GREEN="" + YELLOW="" + RESET="" +fi + +info() { printf "%s==>%s %s\n" "${BOLD}" "${RESET}" "$*"; } +ok() { printf "%s✓%s %s\n" "${GREEN}" "${RESET}" "$*"; } +warn() { printf "%s!%s %s\n" "${YELLOW}" "${RESET}" "$*"; } +fail() { printf "%sx%s %s\n" "${RED}" "${RESET}" "$*" >&2; exit 1; } + +# ---- platform detection -------------------------------------------- + +detect_platform() { + os="$(uname -s)" + arch="$(uname -m)" + case "$os" in + Linux) platform_os="linux" ;; + Darwin) platform_os="macos" ;; + *) fail "Unsupported OS: $os. Use Scoop (Windows) or a pre-built binary from releases." ;; + esac + case "$arch" in + x86_64|amd64) platform_arch="x64" ;; + aarch64|arm64) platform_arch="arm64" ;; + *) fail "Unsupported arch: $arch." ;; + esac + printf "%s-%s" "$platform_os" "$platform_arch" +} + +# ---- version resolution -------------------------------------------- + +resolve_version() { + if [ -n "$VERSION" ]; then + printf "%s" "$VERSION" + return + fi + # Ask GitHub for the latest tag via the API. No JSON parser + # required: grep+sed is sufficient. + tag=$(curl -fsSL "https://api.github.com/repos/$REPO/releases/latest" \ + | sed -n 's/.*"tag_name": *"\([^"]*\)".*/\1/p' \ + | head -n1) + if [ -z "$tag" ]; then + fail "Could not resolve the latest release tag from GitHub API." + fi + printf "%s" "$tag" +} + +# ---- download + verify --------------------------------------------- + +download() { + url="$1" + dest="$2" + if ! 
curl -fsSL --retry 3 "$url" -o "$dest"; then + fail "Failed to download $url" + fi +} + +verify_checksum() { + # $1 = filename in checksums.txt $2 = local path + name="$1" + path="$2" + expected=$(grep " $name\$" "$CHECKSUMS" | awk '{print $1}') + if [ -z "$expected" ]; then + fail "No checksum found for $name in checksums.txt" + fi + if command -v sha256sum >/dev/null 2>&1; then + actual=$(sha256sum "$path" | awk '{print $1}') + elif command -v shasum >/dev/null 2>&1; then + actual=$(shasum -a 256 "$path" | awk '{print $1}') + else + fail "Neither sha256sum nor shasum is available for checksum verification." + fi + if [ "$expected" != "$actual" ]; then + fail "Checksum mismatch for $name: expected $expected, got $actual" + fi +} + +# ---- main ---------------------------------------------------------- + +main() { + info "Detecting platform..." + PLATFORM=$(detect_platform) + ok "platform: $PLATFORM" + + if [ -n "${LAMBE_BASE_URL:-}" ]; then + BASE_URL="$LAMBE_BASE_URL" + ok "base URL: $BASE_URL (override)" + else + info "Resolving version..." + TAG=$(resolve_version) + ok "version: $TAG" + BASE_URL="https://github.com/$REPO/releases/download/$TAG" + fi + LAM_ASSET="lam-$PLATFORM" + MCP_ASSET="lam-mcp-$PLATFORM" + + TMP=$(mktemp -d 2>/dev/null || mktemp -d -t lambe-install) + # Cleanup even on unexpected exit. + trap 'rm -rf "$TMP"' EXIT + + info "Downloading checksums..." + CHECKSUMS="$TMP/checksums.txt" + download "$BASE_URL/checksums.txt" "$CHECKSUMS" + ok "checksums.txt" + + info "Downloading $LAM_ASSET..." + download "$BASE_URL/$LAM_ASSET" "$TMP/lam" + verify_checksum "$LAM_ASSET" "$TMP/lam" + ok "$LAM_ASSET (verified)" + + info "Downloading $MCP_ASSET..." + download "$BASE_URL/$MCP_ASSET" "$TMP/lam-mcp" + verify_checksum "$MCP_ASSET" "$TMP/lam-mcp" + ok "$MCP_ASSET (verified)" + + info "Installing to $BIN_DIR..." 
+ mkdir -p "$BIN_DIR" + install -m 0755 "$TMP/lam" "$BIN_DIR/lam" + install -m 0755 "$TMP/lam-mcp" "$BIN_DIR/lam-mcp" + ok "installed lam and lam-mcp" + + if [ -z "${LAMBE_NO_MAN:-}" ] && command -v man >/dev/null 2>&1; then + # Release artifacts don't include the man page today, so we skip + # gracefully when it's not in the tarball. Placeholder for a + # future release that ships the man page as an asset. + MAN_URL="$BASE_URL/lam.1" + if curl -fsSL --head "$MAN_URL" >/dev/null 2>&1; then + info "Installing man page to $MAN_DIR..." + mkdir -p "$MAN_DIR" + download "$MAN_URL" "$MAN_DIR/lam.1" + ok "installed man page (run: man lam)" + fi + fi + + info "Done." + printf "\n" + + # PATH reminder. Use `case` on :$PATH: to match start/middle/end + # of a colon-separated list without matching partial paths. + case ":${PATH:-}:" in + *":$BIN_DIR:"*) + ok "$BIN_DIR is on PATH. Try: lam --help" + ;; + *) + warn "$BIN_DIR is not on PATH." + printf " Add it by appending this to your shell rc (~/.bashrc, ~/.zshrc, ~/.config/fish/config.fish, etc.):\n" + printf " %sexport PATH=\"\$PATH:%s\"%s\n" "${DIM}" "$BIN_DIR" "${RESET}" + ;; + esac +} + +main "$@" From e50c68149468f6a0c02e1e31e8b8a68899ab5fa2 Mon Sep 17 00:00:00 2001 From: Hakim Jonas Ghoula Date: Sun, 3 May 2026 09:46:34 +0200 Subject: [PATCH 20/67] Release prep audit + tool/release_prep.sh MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Full audit of 0.9.0 before release. Findings and fixes: doc/lam.1.md frontmatter `source: Lambë 0.8.0` -> `0.9.0`. Not auto-generated; no CI check caught it. Regenerated doc/lam.1. pubspec.yaml Stray blank line in the dev_dependencies section removed (cosmetic; pana had no opinion). server.json + .github/workflows/release.yml MCP registry description was still the 0.8.0 "Query JSON, YAML, TOML, HCL, CSV, TSV, and Markdown" pitch. 
Updated both to the 0.9.0 "A query language for structured data that shows you what you're working with" framing so the MCP registry entry matches pubspec and README. The workflow's hardcoded description in the publish-mcp step now also reflects 0.9.0. tool/release_prep.sh (new) Scriptable release gate. Runs the full check matrix before tagging: * Version consistency (pubspec, _version.dart, man page frontmatter, CHANGELOG section, README banner). * File hygiene (nothing tracked that matches .gitignore patterns for secrets/benchmarks/session notes). * Dependencies (pubspec_overrides.yaml not tracked, dart pub get). * Quality gates (analyze, format, test, pana 160/160). * Documentation (doc/lam.1 synced with .md source, dart doc produces zero errors). * Release workflow (.yml present, all per-platform assets referenced, checksums.txt step present, server.json description matches pubspec). * Git state (clean working tree, tag doesn't exist yet, branch check). Exit 0 means ready to tag. Non-zero collects and reports all issues at once rather than failing on the first one — so you fix the whole list and re-run, not whack-a-mole. Usage: bash tool/release_prep.sh [version] The script flagged the doc/lam.1.md frontmatter on first run — so it's already paying for itself. The README banner check initially had a shell-word-splitting bug (grep output tokenized by whitespace meant `lambe` and `v0.9.0` became separate tokens); fixed with a while-read loop over a here-doc. What the script does NOT do: * Tag, push, or publish — those stay manual. This is the "am I ready?" audit, not the release itself. * Verify install.sh against a live release. Checked manually against a staged HTTP server during install.sh development; post-tag verification with LAMBE_VERSION=v0.9.0 is noted in the "Next steps" output. Post-audit state: dart analyze clean, 1445 tests pass, dart format clean, pana 160/160, man page round-trip matches. 
Ready to tag after the remaining uncommitted state (this commit) lands. --- .github/workflows/release.yml | 2 +- doc/lam.1 | 2 +- doc/lam.1.md | 2 +- pubspec.yaml | 1 - server.json | 2 +- tool/release_prep.sh | 337 ++++++++++++++++++++++++++++++++++ 6 files changed, 341 insertions(+), 5 deletions(-) create mode 100755 tool/release_prep.sh diff --git a/.github/workflows/release.yml b/.github/workflows/release.yml index c38423e..c981a6d 100644 --- a/.github/workflows/release.yml +++ b/.github/workflows/release.yml @@ -96,7 +96,7 @@ jobs: "\$schema": "https://static.modelcontextprotocol.io/schemas/2025-12-11/server.schema.json", "name": "io.github.hakimjonas/lambe", "title": "Lambe", - "description": "Query JSON, YAML, TOML, HCL, CSV, TSV, and Markdown with a composable pipeline syntax.", + "description": "A query language for structured data that shows you what you're working with. Shape-aware --explain, JSON Schema input, format bridges.", "repository": { "url": "https://github.com/hakimjonas/lambe.git", "source": "github" diff --git a/doc/lam.1 b/doc/lam.1 index 09d7453..ab4fa4e 100644 --- a/doc/lam.1 +++ b/doc/lam.1 @@ -1,4 +1,4 @@ -.TH "LAM" "1" "May 2026" "Lambë 0.8.0" "" +.TH "LAM" "1" "May 2026" "Lambë 0.9.0" "" .SH AUTHOR Hakim Jonas Ghoula .SH NAME diff --git a/doc/lam.1.md b/doc/lam.1.md index 7520dcb..525d46a 100644 --- a/doc/lam.1.md +++ b/doc/lam.1.md @@ -1,7 +1,7 @@ --- title: LAM section: 1 -source: Lambë 0.8.0 +source: Lambë 0.9.0 author: Hakim Jonas Ghoula date: May 2026 --- diff --git a/pubspec.yaml b/pubspec.yaml index ecada50..d14f4b5 100644 --- a/pubspec.yaml +++ b/pubspec.yaml @@ -22,7 +22,6 @@ dependencies: args: ^2.6.0 dart_mcp: ^0.5.0 - dev_dependencies: test: ^1.31.0 lints: ^6.0.0 diff --git a/server.json b/server.json index f039fe1..2af97ae 100644 --- a/server.json +++ b/server.json @@ -2,7 +2,7 @@ "$schema": "https://static.modelcontextprotocol.io/schemas/2025-12-11/server.schema.json", "name": "io.github.hakimjonas/lambe", "title": 
"Lambe", - "description": "Query JSON, YAML, TOML, HCL, CSV, TSV, and Markdown with a composable pipeline syntax.", + "description": "A query language for structured data that shows you what you're working with. Shape-aware --explain, JSON Schema input, format bridges.", "repository": { "url": "https://github.com/hakimjonas/lambe.git", "source": "github" diff --git a/tool/release_prep.sh b/tool/release_prep.sh new file mode 100755 index 0000000..81c8c82 --- /dev/null +++ b/tool/release_prep.sh @@ -0,0 +1,337 @@ +#!/usr/bin/env bash +# +# Release preparation routine for Lambe. +# +# Runs the full check matrix for a release candidate: version +# consistency, quality gates, docs, release workflow sanity. Does NOT +# tag, push, or publish — those stay manual. This script is the +# "am I ready to release?" audit you run before `git tag`. +# +# Usage: +# tool/release_prep.sh [version] +# +# Where `version` (optional) is the target version, e.g. "0.9.0". If +# omitted, read from pubspec.yaml. The script asserts every other +# place that names a version matches. +# +# Exit code 0 means ready to release. Non-zero means something is off +# and is reported to stderr. + +set -euo pipefail + +# ---- pretty --------------------------------------------------------- + +if [ -t 1 ] && command -v tput >/dev/null 2>&1; then + BOLD=$(tput bold) + GREEN=$(tput setaf 2) + RED=$(tput setaf 1) + YELLOW=$(tput setaf 3) + DIM=$(tput dim) + RESET=$(tput sgr0) +else + BOLD="" GREEN="" RED="" YELLOW="" DIM="" RESET="" +fi + +# Tracks whether any check has surfaced a failure. Checks that find +# issues set this to 1 so the script can continue running later checks +# and give you the full picture, while still exiting non-zero at the end. +FAILED=0 + +# Run a named check. First arg is the section label; remaining args +# are the command to run. We funnel stdout/stderr through so tests that +# want to be noisy still are, but we preface each with a visible +# banner. 
+section() { + label=$1 + shift + printf "\n%s== %s ==%s\n" "${BOLD}" "${label}" "${RESET}" +} + +ok() { printf "%s✓%s %s\n" "${GREEN}" "${RESET}" "$1"; } +fail() { + printf "%s✗%s %s\n" "${RED}" "${RESET}" "$1" + FAILED=1 +} +warn_note() { printf "%s!%s %s\n" "${YELLOW}" "${RESET}" "$1"; } +note() { printf "%s %s%s\n" "${DIM}" "$1" "${RESET}"; } + +# ---- repo layout sanity -------------------------------------------- + +section "Repo layout" + +# Run from the repo root. +cd "$(dirname "$0")/.." + +if [ ! -f pubspec.yaml ]; then + fail "pubspec.yaml not found — are you running from the repo root?" + exit 1 +fi + +# Read the pubspec version. +PUBSPEC_VERSION=$(sed -n 's/^version: *//p' pubspec.yaml | head -n1) +TARGET_VERSION="${1:-$PUBSPEC_VERSION}" + +if [ "$TARGET_VERSION" != "$PUBSPEC_VERSION" ]; then + fail "Argument version $TARGET_VERSION disagrees with pubspec.yaml version $PUBSPEC_VERSION" +else + ok "pubspec.yaml version: $PUBSPEC_VERSION" +fi + +# ---- version consistency ------------------------------------------- + +section "Version consistency" + +# lib/src/_version.dart +CODE_VERSION=$(grep -oE "'[0-9]+\\.[0-9]+\\.[0-9]+[^']*'" lib/src/_version.dart | tr -d "'") +if [ "$CODE_VERSION" = "$TARGET_VERSION" ]; then + ok "lib/src/_version.dart matches ($CODE_VERSION)" +else + fail "lib/src/_version.dart has $CODE_VERSION, expected $TARGET_VERSION. 
Run: dart run tool/gen_version.dart" +fi + +# Man page frontmatter source field +MAN_SOURCE_VERSION=$(sed -n 's/^source: *Lambë *//p' doc/lam.1.md | head -n1) +if [ "$MAN_SOURCE_VERSION" = "$TARGET_VERSION" ]; then + ok "doc/lam.1.md frontmatter source matches" +else + fail "doc/lam.1.md frontmatter source is 'Lambë $MAN_SOURCE_VERSION', expected 'Lambë $TARGET_VERSION'" +fi + +# CHANGELOG.md must have a section for this version at the top +if head -n3 CHANGELOG.md | grep -qE "^## $TARGET_VERSION\$"; then + ok "CHANGELOG.md has section '## $TARGET_VERSION' at top" +else + fail "CHANGELOG.md does not lead with '## $TARGET_VERSION'. Got: $(head -n1 CHANGELOG.md)" +fi + +# CHANGELOG must not still have a -dev suffix anywhere +if grep -qE "^## $TARGET_VERSION-dev" CHANGELOG.md; then + fail "CHANGELOG.md still contains a '$TARGET_VERSION-dev' section. Merge/rename before release." +else + ok "CHANGELOG.md has no leftover -dev section for this version" +fi + +# README REPL banner example (if present). Read lines from the grep +# output with a newline delimiter so we compare whole banner matches, +# not whitespace-split tokens. +if grep -qE "lambe v[0-9]+\\.[0-9]+\\.[0-9]+" README.md; then + while IFS= read -r v; do + got="${v#lambe v}" + if [ "$got" = "$TARGET_VERSION" ]; then + ok "README.md REPL banner example: $v" + else + fail "README.md REPL banner shows '$v', expected 'lambe v$TARGET_VERSION'" + fi + done </dev/null | grep -E '^(bench-results-.*\.json|lam-mcp|HANDOVER_.*\.md|\.mcpregistry_.*|pubspec_overrides\.yaml)$' || true) +if [ -z "$UNEXPECTED" ]; then + ok "no gitignored patterns in tracked files" +else + fail "these files are tracked but gitignored:" + echo "$UNEXPECTED" | sed 's/^/ /' +fi + +# ---- dependency sanity --------------------------------------------- + +section "Dependencies" + +# Check for path overrides in pubspec_overrides.yaml (local dev only, +# must never be committed). 
+if git ls-files | grep -q pubspec_overrides.yaml; then + fail "pubspec_overrides.yaml is tracked; remove it before release (path deps break for pub.dev consumers)" +else + ok "no tracked pubspec_overrides.yaml" +fi + +# dart pub get must succeed (check is skipped when dart is not on PATH) +if command -v dart >/dev/null 2>&1; then + if dart pub get >/dev/null 2>&1; then + ok "dart pub get succeeds" + else + fail "dart pub get failed" + fi +fi + +# ---- quality gates ------------------------------------------------- + +section "Quality gates" + +if dart analyze 2>&1 | tail -n1 | grep -q "No issues found"; then + ok "dart analyze clean" +else + fail "dart analyze reported issues" + dart analyze 2>&1 | tail -5 | sed 's/^/ /' +fi + +if dart format --output=none --set-exit-if-changed . >/dev/null 2>&1; then + ok "dart format clean" +else + fail "dart format has pending changes. Run: dart format ." +fi + +# dart test: must say "All tests passed!" +TEST_OUT=$(dart test 2>&1 | tail -n3) +if echo "$TEST_OUT" | grep -q "All tests passed"; then + TEST_COUNT=$(echo "$TEST_OUT" | grep -oE '\+[0-9]+' | tr -d '+' | sort -n | tail -n1) + ok "dart test: $TEST_COUNT tests pass" +else + fail "dart test did not pass" + echo "$TEST_OUT" | sed 's/^/ /' +fi + +# pana 160/160 +if command -v pana >/dev/null 2>&1; then + PANA_SCORE=$(pana --no-warning --json 2>/dev/null \ + | python3 -c " +import json, sys +try: + d = json.load(sys.stdin) + g = sum(s['grantedPoints'] for s in d['report']['sections']) + m = sum(s['maxPoints'] for s in d['report']['sections']) + print(f'{g}/{m}') +except Exception as e: + print(f'ERROR: {e}') +") + if [ "$PANA_SCORE" = "160/160" ]; then + ok "pana: $PANA_SCORE" + else + fail "pana: $PANA_SCORE (expected 160/160)" + fi +else + warn_note "pana not installed — skipping (install: dart pub global activate pana)" +fi + +# ---- documentation ------------------------------------------------- + +section "Documentation" + +# Man page round-trip test is part of `dart test`, but 
explicitly +# regenerate + diff here to catch doc/lam.1.md edits that weren't +# followed by a manpage regen. +if dart run tool/manpage.dart > /tmp/lambe-release-manpage.$$.txt 2>/dev/null \ + && diff -q /tmp/lambe-release-manpage.$$.txt doc/lam.1 >/dev/null 2>&1; then + ok "doc/lam.1 matches tool/manpage.dart output" + rm -f /tmp/lambe-release-manpage.$$.txt +else + fail "doc/lam.1 is out of sync with doc/lam.1.md. Run: dart run tool/manpage.dart > doc/lam.1" + rm -f /tmp/lambe-release-manpage.$$.txt +fi + +# dart doc gen (warnings are known; fail on errors only) +DOC_OUT=$(rm -rf doc/api && dart doc --validate-links 2>&1 || true) +DOC_ERRORS=$(echo "$DOC_OUT" | grep -oE 'Found [0-9]+ warnings? and [0-9]+ errors?' || true) +if echo "$DOC_ERRORS" | grep -q "0 errors"; then + ok "dart doc: $DOC_ERRORS" +else + fail "dart doc reported errors: $DOC_ERRORS" +fi + +# ---- release workflow references ----------------------------------- + +section "Release workflow" + +# The workflow triggers on tags matching v*. Make sure the workflow +# file is present and the artifacts it produces match what install.sh +# expects. +if [ -f .github/workflows/release.yml ]; then + ok ".github/workflows/release.yml present" +else + fail ".github/workflows/release.yml missing" +fi + +EXPECTED_ASSETS="lam-linux-x64 lam-linux-arm64 lam-macos-x64 lam-macos-arm64 lam-windows-x64.exe lam-mcp-linux-x64 lam-mcp-linux-arm64 lam-mcp-macos-x64 lam-mcp-macos-arm64 lam-mcp-windows-x64.exe" +MISSING_ASSETS="" +for asset in $EXPECTED_ASSETS; do + if ! 
grep -q "$asset" .github/workflows/release.yml; then + MISSING_ASSETS="$MISSING_ASSETS $asset" + fi +done +if [ -z "$MISSING_ASSETS" ]; then + ok "release.yml references all expected per-platform binaries" +else + fail "release.yml is missing references to:$MISSING_ASSETS" +fi + +# checksums.txt generation step +if grep -q "checksums.txt" .github/workflows/release.yml; then + ok "release.yml generates checksums.txt (install.sh depends on this)" +else + fail "release.yml does not generate checksums.txt — install.sh will break" +fi + +# server.json description should match pubspec description +PUBSPEC_DESC=$(sed -n '/^description:/,/^[^ ]/{/^description:/!{/^[^ ]/!p;};}' pubspec.yaml | tr '\n' ' ' | sed 's/ */ /g; s/^ *//; s/ *$//') +SERVER_DESC=$(sed -n 's/.*"description": "\(.*\)",/\1/p' server.json) +# Compare first 80 chars — descriptions differ slightly (pubspec wraps, server.json is one line). +PUBSPEC_PREFIX=$(echo "$PUBSPEC_DESC" | cut -c1-80) +SERVER_PREFIX=$(echo "$SERVER_DESC" | cut -c1-80) +if [ "$PUBSPEC_PREFIX" = "$SERVER_PREFIX" ]; then + ok "server.json description matches pubspec.yaml" +else + warn_note "server.json and pubspec.yaml descriptions diverge at the lead" + note "pubspec: $PUBSPEC_PREFIX" + note "server : $SERVER_PREFIX" +fi + +# ---- git state ----------------------------------------------------- + +section "Git state" + +if [ -z "$(git status --porcelain)" ]; then + ok "working tree clean" +else + fail "uncommitted changes — commit or stash before tagging:" + git status --short | sed 's/^/ /' +fi + +# Existing tag for this version? +if git rev-parse --verify "v$TARGET_VERSION" >/dev/null 2>&1; then + fail "tag v$TARGET_VERSION already exists locally" +else + ok "tag v$TARGET_VERSION does not yet exist" +fi + +# Are we on main? 
+BRANCH=$(git rev-parse --abbrev-ref HEAD) +if [ "$BRANCH" = "main" ] || [ "$BRANCH" = "master" ]; then + ok "on branch: $BRANCH" +else + warn_note "on branch: $BRANCH (expected main/master)" +fi + +# Local commits ahead of origin? +AHEAD=$(git rev-list --count "origin/$BRANCH..$BRANCH" 2>/dev/null || echo "?") +if [ "$AHEAD" != "0" ] && [ "$AHEAD" != "?" ]; then + note "$AHEAD local commits ahead of origin/$BRANCH (push before tagging)" +fi + +# ---- summary ------------------------------------------------------- + +printf "\n" +if [ "$FAILED" -eq 0 ]; then + printf "%s✓ Ready to release %s%s\n" "${BOLD}${GREEN}" "$TARGET_VERSION" "${RESET}" + printf "\n" + printf "Next steps:\n" + printf " 1. git push origin $BRANCH\n" + printf " 2. git tag v$TARGET_VERSION\n" + printf " 3. git push origin v$TARGET_VERSION\n" + printf " 4. Watch the release workflow build binaries and publish.\n" + printf " 5. After binaries land, verify install.sh against the new release:\n" + printf " LAMBE_VERSION=v$TARGET_VERSION LAMBE_PREFIX=/tmp/verify sh install.sh\n" + exit 0 +else + printf "%s✗ Not ready to release %s%s\n" "${BOLD}${RED}" "$TARGET_VERSION" "${RESET}" + printf " Fix the issues above and re-run.\n" + exit 1 +fi From 14f757b79133add448a81f7fa42aa7762e56ee4e Mon Sep 17 00:00:00 2001 From: Hakim Jonas Ghoula Date: Sun, 3 May 2026 12:01:55 +0200 Subject: [PATCH 21/67] Pre-push cleanup: gitignore local tool cache dir, reframe schema-design.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Cleanup pass before pushing 0.9.0 to make sure the public repo state is free of internal-development-only content. .gitignore: add the local AI-tool session cache directory Mirrors how .idea/ and .vscode/ are already ignored — local tooling state belongs with the checkout, not the public repo. 
doc/schema-design.md: reframe as rationale, not internal plan The file was written as a track-A design doc in plan mode, using internal vocabulary ("Track A", "approved, ready for implementation"). That framing is meaningful mid-release but noise to a public reader: "Track A" is not documented anywhere users would see. Retitled as "Schema-typed queries — design rationale" with a pointer to doc/schema.md for user-facing content. Removed the "Tracks B/C/D" reference from the Context section. The file's value — a record of why a JSON Schema subset was chosen over a Lambe DSL, why SOptional was added, and why disagreement is an error rather than schema-wins — is preserved. Audit confirmed nothing else tracked reads as internal dev content: * AGENTS.md, AI.md, DESIGN.md, ROADMAP.md — all public by intent. * No HANDOVER_*.md tracked (commit 613803e removed it; .gitignore prevents re-adding). * No bench-results-*.json tracked (.gitignore + .pubignore both catch them). * No secrets (.mcpregistry_* ignored). * No stale binaries (lam-mcp ignored). * No local dep overrides (pubspec_overrides.yaml ignored). The 0.8.0 handover plan is still in git history (commit 93271aa, removed in 613803e). Not cleaning history: the content is planning notes from a committed-then-removed workflow, not secrets, and a rewrite would break existing clones. The removal commit itself documents the intent going forward. --- .gitignore | 1 + doc/schema-design.md | 12 +++++++----- 2 files changed, 8 insertions(+), 5 deletions(-) diff --git a/.gitignore b/.gitignore index fcb7d82..4d5cf4b 100644 --- a/.gitignore +++ b/.gitignore @@ -7,6 +7,7 @@ doc/api/ .idea/ *.iml .vscode/ +.claude/ .DS_Store Thumbs.db diff --git a/doc/schema-design.md b/doc/schema-design.md index 900b028..6417b94 100644 --- a/doc/schema-design.md +++ b/doc/schema-design.md @@ -1,13 +1,15 @@ -# Lambe 0.9.0 Track A: Schema-typed queries — design document +# Schema-typed queries — design rationale -Status: **approved**, ready for implementation. 
+The decisions behind the 0.9.0 schema feature. User-facing documentation +is in [doc/schema.md](schema.md); this file records *why* the design is +what it is, for contributors and curious readers. ## Context 0.9.0 completes the shape feedback loop: declare a shape, check queries -against it, round-trip with JSON Schema tooling. Tracks B/C/D landed -the per-feature polish; track A ships the piece that lets Lambe's -shape system act as a contract between the tool and its users' data. +against it, round-trip with JSON Schema tooling. The schema feature is +the piece that lets Lambe's shape system act as a contract between the +tool and its users' data. The positioning is *"a query language for structured data that shows you what you're working with."* Schemas are how a user tells Lambe From f951f8a8f2929eee828365e83e21f06adf757265 Mon Sep 17 00:00:00 2001 From: Hakim Jonas Ghoula Date: Thu, 21 May 2026 00:13:24 +0200 Subject: [PATCH 22/67] feat(parser+eval): list literals, // alternative, jq-ism keyword aliases MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Three new language features for Lambé queries: 1. **List literals** (`[expr, expr, ...]`) - New `ListConstruct` AST node holding a `List` of member expressions, evaluated against the current context. - Parsed at atom level so it never shadows postfix indexing (`expr[i]`, which requires a prior atom on the left of `[`). - Plus list concatenation: `+` on two lists produces concatenation. Mixed list/scalar `+` is a type error (Lambé strictness over silent lifting); evaluator wrapper `_binaryOp` intercepts before delegating to `applyBinaryOp` for the scalar dispatcher. 2. **`//` alternative operator** (jq-style fallback) - New `Alternative` AST node. `a // b` returns `a`'s value if non-null, else `b`'s. `b` is only evaluated on fallback. - Lambé semantics differ from jq deliberately: jq fires on "null or false"; Lambé fires only on `null`. 
Genuine `false`/`0`/`""` pass through. - Right-associative; one level above `||` so `a // b // c` means `a // (b // c)`. Built by hand because Lambé's parser combinators ship `chainl1` (left-associative) only. - Doubles as missing-key fallback via null-propagation: `.user.email // .user.contact.email // "unknown"`. - The `/` binary op gets a `notFollowedBy('/')` guard so it doesn't shadow `//`. 3. **jq-ism keyword aliases** (`and`/`or`/`tonumber`) - `and` parses as `&&`, `or` as `||`. Both keep word-boundary semantics so `.andy` and `.orbit` still tokenize as fields. The result `BinaryOp` node carries the canonical symbol so shape/eval don't see the alias. - `tonumber` parses as the canonical `to_number` pipe op. Registered as a jq-ism alias at the parser layer; shape and evaluator stay alias-unaware. Shape inference (`shape/infer.dart`) and rendering (`shape/explain.dart`) updated for both new AST nodes. All 1,496+ tests pass: 16 new tests for `//` (10 eval + 6 parser), 5 for list literals (parser), 5 for list literals (eval), 9 for `+` list concat, 6 for jq-ism aliases. --- lib/src/ast.dart | 38 +++++++++++ lib/src/evaluator.dart | 32 ++++++++- lib/src/parser.dart | 102 ++++++++++++++++++++++++---- lib/src/shape/explain.dart | 4 ++ lib/src/shape/infer.dart | 15 +++++ test/evaluator_test.dart | 127 +++++++++++++++++++++++++++++++++++ test/parser_test.dart | 132 +++++++++++++++++++++++++++++++++++++ 7 files changed, 438 insertions(+), 12 deletions(-) diff --git a/lib/src/ast.dart b/lib/src/ast.dart index 5624cf0..89dc67b 100644 --- a/lib/src/ast.dart +++ b/lib/src/ast.dart @@ -373,6 +373,44 @@ final class As extends LamExpr { const As(this.target); } +/// Alternative: `a // b` — evaluate [left]; if it is `null`, evaluate +/// [right] instead. Otherwise return [left]'s result unchanged. +/// +/// Lambé's semantics differ deliberately from jq's: jq's `//` fires on +/// "null or false". Lambé's fires only on `null`. 
A genuine `false` +/// passes through — matching Lambé's broader strictness stance. +/// +/// Because field access on a missing key already yields `null` via +/// null-propagation, `//` doubles as a missing-key fallback: +/// `.user.email // .user.contact.email // "unknown"`. +final class Alternative extends LamExpr { + /// The primary expression, tried first. + final LamExpr left; + + /// The fallback, evaluated only when [left] yields `null`. + final LamExpr right; + + /// Creates an alternative expression. + const Alternative(this.left, this.right); +} + +/// List construction: `[expr, expr, ...]`. +/// +/// Each [parts] expression is evaluated against the current context +/// and the results are collected into a list. Empty list literals +/// `[]` produce the empty list. +/// +/// Distinct from [Index] (postfix `expr[i]`): list construction has +/// no target on the left, so it can never parse in a context where +/// indexing would apply. +final class ListConstruct extends LamExpr { + /// The member expressions, evaluated per-call against the context. + final List parts; + + /// Creates a list construction. + const ListConstruct(this.parts); +} + /// Conditional expression: `if cond then a else b`. final class Conditional extends LamExpr { /// The condition (must evaluate to bool). diff --git a/lib/src/evaluator.dart b/lib/src/evaluator.dart index 9892a37..34de3c5 100644 --- a/lib/src/evaluator.dart +++ b/lib/src/evaluator.dart @@ -38,7 +38,7 @@ Object? evaluate(LamExpr expr, Object? ctx) => switch (expr) { op, evaluate(operand, ctx), ), - BinaryOp(:final op, :final left, :final right) => applyBinaryOp( + BinaryOp(:final op, :final left, :final right) => _binaryOp( op, evaluate(left, ctx), evaluate(right, ctx), @@ -50,6 +50,10 @@ Object? evaluate(LamExpr expr, Object? ctx) => switch (expr) { asBool(evaluate(condition, ctx), 'if') ? 
evaluate(then_, ctx) : evaluate(else_, ctx), + Alternative(:final left, :final right) => _alternative(left, right, ctx), + ListConstruct(:final parts) => [ + for (final p in parts) evaluate(p, ctx), + ], StringInterp(:final parts) => _interpolate(parts, ctx), Slice(:final target, :final start, :final end) => _slice( evaluate(target, ctx), @@ -142,6 +146,32 @@ Object? _pipe(Object? input, LamExpr op) { return evaluate(op, input); } +/// Evaluate `left // right`: returns `left`'s value if non-null, +/// otherwise `right`'s value. `right` is only evaluated on fallback, +/// so `.a // someExpensiveFallback` pays nothing when `.a` hits. +Object? _alternative(LamExpr left, LamExpr right, Object? ctx) { + final primary = evaluate(left, ctx); + if (primary != null) return primary; + return evaluate(right, ctx); +} + +/// Lambé's binary-op wrapper. Intercepts `+` on two lists for +/// concatenation; delegates everything else to rumil_expressions' +/// scalar dispatcher. A mixed list/scalar `+` is a type error — +/// Lambé's strictness stance over silent lifting. +Object _binaryOp(String op, Object? l, Object? r) { + if (op == '+' && l is List && r is List) { + return [...l, ...r]; + } + if (op == '+' && (l is List) != (r is List)) { + throw QueryError( + '+: cannot mix list with ${typeName(r is List ? l : r)}; ' + 'coerce one side explicitly.', + ); + } + return applyBinaryOp(op, l, r); +} + List _filter(Object? input, LamExpr predicate) { final list = _asList(input, 'filter'); return [ diff --git a/lib/src/parser.dart b/lib/src/parser.dart index 03d056e..fd6c92f 100644 --- a/lib/src/parser.dart +++ b/lib/src/parser.dart @@ -2,9 +2,10 @@ /// via layered `chainl1` calls. 
/// /// Grammar structure (lowest to highest precedence): -/// _expr = _logicOr (top-level, lowest precedence) -/// _logicOr = _logicAnd chainl1 '||' -/// _logicAnd = _equality chainl1 '&&' +/// _expr = _alternative (top-level, lowest precedence) +/// _alternative = _logicOr ('//' _logicOr)* right-associative +/// _logicOr = _logicAnd chainl1 '||' | 'or' +/// _logicAnd = _equality chainl1 '&&' | 'and' /// _equality = _comparison chainl1 '==' | '!=' /// _comparison = _additive chainl1 '<' | '>' | '<=' | '>=' /// _additive = _multiplicative chainl1 '+' | '-' @@ -16,7 +17,7 @@ /// | _postfix '[' _expr ']' /// | _atom ) /// _atom = number | string | bool | null | '(' _expr ')' | dotField -/// | objConstruct | conditional | pipe_op +/// | objConstruct | listConstruct | conditional | pipe_op library; import 'package:rumil/rumil.dart'; @@ -158,6 +159,15 @@ final Parser _objConstruct = _sym('{') .thenSkip(_closeBrace) .map((entries) => ObjConstruct(entries) as LamExpr); +/// List literal: `[expr, expr, ...]` or `[]`. +/// +/// Parsed at atom level so it never shadows postfix indexing +/// (`expr[i]`), which requires a prior atom to the left of `[`. +final Parser _listConstruct = _sym('[') + .skipThen(defer(() => _expr).sepBy(_sym(','))) + .thenSkip(_closeBracket) + .map((parts) => ListConstruct(parts) as LamExpr); + final Parser _conditional = _sym('if') .skipThen(_innerExpr) .flatMap( @@ -186,6 +196,7 @@ final Parser _atom = _nullLit | _conditional | _objConstruct | + _listConstruct | _parenExpr | _dotField | _pipeOp; @@ -249,6 +260,14 @@ final Parser _asOp = _sym('as') /// still need an explicit rule here. final Parser _pipeOp = _buildPipeOp(); +/// jq-ism aliases: names agents reach for that map cleanly to an +/// existing Lambé op. Registered at the parser layer so shape/eval +/// stay unaware. Canonical name is what `--print-shape` / `--explain` +/// emit; these just let jq-trained agents land the query. 
+const Map _jqAliases = { + 'tonumber': 'to_number', +}; + Parser _buildPipeOp() { final alternatives = >[]; for (final spec in shape_ops.pipeOpSpecs) { @@ -265,6 +284,20 @@ Parser _buildPipeOp() { // Custom ops: hand-written rules, in the order the grammar wants // to try them. Currently just `as(fmt)`. alternatives.add(_asOp); + // jq-idiom aliases. Registered last so a canonical spec always wins + // the parse; the alias only fires when nothing else matches. + for (final entry in _jqAliases.entries) { + final canonical = shape_ops.pipeOpInfoForName(entry.value); + if (canonical == null) continue; + switch (canonical.parseKind) { + case shape_ops.PipeOpParseKind.zeroArg: + alternatives.add(_kw(entry.key).as(canonical.zeroArgCtor!())); + case shape_ops.PipeOpParseKind.oneArg: + alternatives.add(_paramOp(entry.key, canonical.oneArgCtor!)); + case shape_ops.PipeOpParseKind.custom: + break; + } + } return alternatives.reduce((a, b) => a | b); } @@ -317,10 +350,29 @@ final Parser _unary = ) | _postfix; -Parser _binOp(String op) => - _sym( - op, - ).as((l, r) => BinaryOp(op, l, r)); +Parser _binOp(String op) { + // `/` must not match the first `/` of `//` (alternative operator). + // Other single-char ops don't have a longer variant that would be + // ambiguous at this level, so we only special-case `/`. + final sym = op == '/' + ? _lex(string('/').thenSkip(char('/').notFollowedBy)) + : _sym(op); + return sym.as( + (l, r) => BinaryOp(op, l, r), + ); +} + +/// Word-boundary binary op for keyword aliases like `and` / `or`. +/// +/// `_sym` matches any substring; for keyword aliases we need +/// `.andy` / `.orbit` to keep working. The result node carries the +/// canonical symbol so shape/eval don't see the alias. 
+Parser _binOpKw( + String keyword, + String canonical, +) => _kw(keyword).as( + (l, r) => BinaryOp(canonical, l, r), +); Parser _binOps( List ops, @@ -349,8 +401,36 @@ final Parser _equality = _comparison.chainl1( _binOps(['==', '!=']), ); -final Parser _logicAnd = _equality.chainl1(_binOp('&&')); +final Parser _logicAnd = _equality.chainl1( + _binOp('&&') | _binOpKw('and', '&&'), +); + +final Parser _logicOr = _logicAnd.chainl1( + _binOp('||') | _binOpKw('or', '||'), +); + +/// `//` alternative: `a // b` returns `a` if non-null, else `b`. +/// Right-associative, one level above `||` so `a // b // c` means +/// `a // (b // c)`. Built by hand because Lambé's parser combinators +/// ship `chainl1` (left-associative) only. +final Parser _alternative = + _logicOr.flatMap( + (first) => _altTail.many.map( + (tail) { + if (tail.isEmpty) return first; + final all = [first, ...tail]; + LamExpr acc = all.last; + for (var i = all.length - 2; i >= 0; i--) { + acc = Alternative(all[i], acc); + } + return acc; + }, + ), + ); -final Parser _logicOr = _logicAnd.chainl1(_binOp('||')); +/// A single `// expr` suffix. Matched against the `//` symbol directly +/// to avoid ambiguity with `/` (division). +final Parser _altTail = + _sym('//').skipThen(_logicOr); -final Parser _expr = _logicOr; +final Parser _expr = _alternative; diff --git a/lib/src/shape/explain.dart b/lib/src/shape/explain.dart index 709c58a..2ba0e68 100644 --- a/lib/src/shape/explain.dart +++ b/lib/src/shape/explain.dart @@ -452,6 +452,10 @@ String _render(LamExpr expr) => switch (expr) { '${end == null ? '' : _render(end)}]', Conditional(:final condition, :final then_, :final else_) => 'if ${_render(condition)} then ${_render(then_)} else ${_render(else_)}', + Alternative(:final left, :final right) => + '${_render(left)} // ${_render(right)}', + ListConstruct(:final parts) => + '[${parts.map(_render).join(', ')}]', }; /// Render an [ExplainReport] as a plaintext table suitable for stdout. 
diff --git a/lib/src/shape/infer.dart b/lib/src/shape/infer.dart index 34bbd22..be750d8 100644 --- a/lib/src/shape/infer.dart +++ b/lib/src/shape/infer.dart @@ -100,6 +100,21 @@ Shape inferShape(LamExpr expr, Shape input) { inferShape(else_, input), ), + // `a // b` is either a's shape (when non-null) or b's. Equal + // shapes pass through; otherwise widen. + Alternative(:final left, :final right) => _joinBranches( + inferShape(left, input), + inferShape(right, input), + ), + + // `[e1, e2, ...]` yields `SList(join(parts))`. Empty list literal + // has no element shape, so widen to `SList(SAny)`. + ListConstruct(:final parts) => parts.isEmpty + ? const SList(SAny()) + : SList(parts + .map((p) => inferShape(p, input)) + .reduce(_joinBranches)), + // Pipe ops are handled above via [pipeOpInfoFor]; reaching this // case means the spec table is missing an op AST subtype. Falling // through to [SAny] is the safe default. diff --git a/test/evaluator_test.dart b/test/evaluator_test.dart index 147f95c..e67e49e 100644 --- a/test/evaluator_test.dart +++ b/test/evaluator_test.dart @@ -589,4 +589,131 @@ void main() { expect(result, ['Bob']); }); }); + + group('`//` alternative', () { + test('null falls through', () { + expect(query('.a // .b', {'a': null, 'b': 42}), 42); + }); + + test('non-null wins', () { + expect(query('.a // .b', {'a': 'hi', 'b': 42}), 'hi'); + }); + + test('missing key falls through (via null-propagation)', () { + expect(query('.email // "unknown"', {'name': 'alice'}), 'unknown'); + }); + + test('false is NOT a fallback trigger (Lambé is NOT jq)', () { + expect(query('.active // true', {'active': false}), false); + }); + + test('0 is NOT a fallback trigger', () { + expect(query('.count // 99', {'count': 0}), 0); + }); + + test('empty string is NOT a fallback trigger', () { + expect(query('.s // "default"', {'s': ''}), ''); + }); + + test('chained fallback', () { + expect( + query('.a // .b // .c // "none"', {'a': null, 'b': null, 'c': 'hi'}), + 'hi', + ); 
+ }); + + test('last fallback wins when all are null', () { + expect( + query('.a // .b // "default"', {'a': null, 'b': null}), + 'default', + ); + }); + + test('right expression not evaluated when left is non-null', () { + // .b accesses a field on a null-valued 'a' which would error + // if evaluated. Since .a is "hi", the right side must not run. + expect(query('.a // .b.nested', {'a': 'hi', 'b': null}), 'hi'); + }); + + test('union-schema: email from either shape', () { + final result = queryJson( + '.contacts | map(.email // .contact.email) | filter(. != null)', + '{"contacts":[{"email":"a@x"},{"contact":{"email":"b@y"}},{}]}', + ); + expect(result, ['a@x', 'b@y']); + }); + }); + + group('List literals', () { + test('[] evaluates to empty list', () { + expect(query('[]', {}), []); + }); + + test('[1, 2, 3] evaluates to a list of numbers', () { + expect(query('[1, 2, 3]', {}), [1, 2, 3]); + }); + + test('[.a, .b] projects fields across context', () { + expect(query('[.a, .b]', {'a': 1, 'b': 2}), [1, 2]); + }); + + test('map([.name, .age]) produces pairs', () { + final result = queryJson( + '.users | map([.name, .age])', + '{"users":[{"name":"Alice","age":30},{"name":"Bob","age":25}]}', + ); + expect(result, [ + ['Alice', 30], + ['Bob', 25], + ]); + }); + + test('list literal preserves null values (no implicit filter)', () { + expect(query('[.a, .b, .c]', {'a': 1, 'b': null}), [1, null, null]); + }); + }); + + group('`+` list concatenation', () { + test('[1, 2] + [3] concatenates', () { + expect(query('[1, 2] + [3]', {}), [1, 2, 3]); + }); + + test('.a + .b where both are lists', () { + expect( + query('.a + .b', { + 'a': [1, 2], + 'b': [3, 4], + }), + [1, 2, 3, 4], + ); + }); + + test('empty + non-empty', () { + expect(query('[] + [1]', {}), [1]); + }); + + test('non-empty + empty', () { + expect(query('[1] + []', {}), [1]); + }); + + test('preserves order (left then right)', () { + expect(query('[3, 1] + [2, 4]', {}), [3, 1, 2, 4]); + }); + + test('mixed list 
+ scalar is a type error (strict)', () { + expect(() => query('[1] + 2', {}), throwsA(isA())); + }); + + test('mixed list + null is a type error (strict)', () { + expect(() => query('[1] + null', {}), throwsA(isA())); + }); + + test('numbers still add (no interference)', () { + expect(query('.x + .y', {'x': 1, 'y': 2}), 3); + }); + + test('strings still concatenate (no interference)', () { + expect(query('.x + .y', {'x': 'hi', 'y': 'there'}), 'hithere'); + }); + }); } diff --git a/test/parser_test.dart b/test/parser_test.dart index ce58e20..ea18c2a 100644 --- a/test/parser_test.dart +++ b/test/parser_test.dart @@ -186,6 +186,138 @@ void main() { }); }); + group('jq-ism aliases', () { + test('`and` parses as `&&`', () { + final expr = _parse('.a and .b'); + expect(expr, isA()); + expect((expr as BinaryOp).op, '&&'); + }); + + test('`or` parses as `||`', () { + final expr = _parse('.a or .b'); + expect(expr, isA()); + expect((expr as BinaryOp).op, '||'); + }); + + test('`and` keeps word boundary: .andy is still a field', () { + final expr = _parse('.andy'); + expect(expr, isA()); + expect((expr as Field).name, 'andy'); + }); + + test('`or` keeps word boundary: .orbit is still a field', () { + final expr = _parse('.orbit'); + expect(expr, isA()); + expect((expr as Field).name, 'orbit'); + }); + + test('`tonumber` parses as to_number', () { + final expr = _parse('.x | tonumber'); + expect(expr, isA()); + final pipe = expr as Pipe; + expect(pipe.op, isA()); + }); + + test('`and` precedence: .a or .b and .c == .a or (.b and .c)', () { + final expr = _parse('.a or .b and .c'); + expect(expr, isA()); + final top = expr as BinaryOp; + expect(top.op, '||'); + expect(top.right, isA()); + expect((top.right as BinaryOp).op, '&&'); + }); + }); + + group('`//` alternative', () { + test('.a // .b is Alternative', () { + final expr = _parse('.a // .b'); + expect(expr, isA()); + final alt = expr as Alternative; + expect(alt.left, isA()); + expect(alt.right, isA()); + }); + + 
test('chained .a // .b // .c is right-associative', () { + final expr = _parse('.a // .b // .c'); + expect(expr, isA()); + final outer = expr as Alternative; + expect((outer.left as Field).name, 'a'); + expect(outer.right, isA()); + final inner = outer.right as Alternative; + expect((inner.left as Field).name, 'b'); + expect((inner.right as Field).name, 'c'); + }); + + test('// does not swallow / (division)', () { + final expr = _parse('.x / .y'); + expect(expr, isA()); + expect((expr as BinaryOp).op, '/'); + }); + + test('// binds looser than ||', () { + // `.a || .b // .c` should parse as `(.a || .b) // .c` + final expr = _parse('.a || .b // .c'); + expect(expr, isA()); + final alt = expr as Alternative; + expect(alt.left, isA()); + expect((alt.left as BinaryOp).op, '||'); + }); + + test('// is lower precedence than pipe | (jq-compatible)', () { + // `.a // .b | length` parses as `.a // (.b | length)` — same as jq. + // Right side of // is a full pipeline expression. + final expr = _parse('.a // .b | length'); + expect(expr, isA()); + expect((expr as Alternative).right, isA()); + }); + + test('parens override: (.a // .b) | length forces the alt first', () { + final expr = _parse('(.a // .b) | length'); + expect(expr, isA()); + expect((expr as Pipe).input, isA()); + }); + }); + + group('List literals', () { + test('[] is empty ListConstruct', () { + final expr = _parse('[]'); + expect(expr, isA()); + expect((expr as ListConstruct).parts, isEmpty); + }); + + test('[1, 2, 3] is a three-part ListConstruct', () { + final expr = _parse('[1, 2, 3]'); + expect(expr, isA()); + final list = expr as ListConstruct; + expect(list.parts.length, 3); + expect(list.parts.every((p) => p is NumLit), true); + }); + + test('[.a, .b] collects fields', () { + final expr = _parse('[.a, .b]'); + expect(expr, isA()); + final list = expr as ListConstruct; + expect(list.parts.length, 2); + expect(list.parts.every((p) => p is Field), true); + }); + + test('list literal at atom level does not 
conflict with indexing', () { + // `.users[0]` must still parse as Index, not ListConstruct. + final expr = _parse('.users[0]'); + expect(expr, isA()); + }); + + test('pipeline can feed into a list literal', () { + // `.users | map([.name, .age])` — list literal inside map's + // transform expression. + final expr = _parse('.users | map([.name, .age])'); + expect(expr, isA()); + final pipe = expr as Pipe; + expect(pipe.op, isA()); + expect((pipe.op as MapOp).transform, isA()); + }); + }); + group('Pipeline operations', () { test('.users | filter(.age > 30)', () { final expr = _parse('.users | filter(.age > 30)'); From a1be73347f118a70fa4ecac6d57bd1fffa7990f9 Mon Sep 17 00:00:00 2001 From: Hakim Jonas Ghoula Date: Thu, 21 May 2026 00:14:22 +0200 Subject: [PATCH 23/67] feat(errors): jq-idiom hints for parse failures MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Recognises common jq idioms that Lambé does not support and surfaces a targeted hint instead of the generic "expected ..." fallback. Keeps error messages short and actionable for agents trained on jq priors. Recognised idioms: - `.users[]` (jq array iteration) → hint to use `map(...)` for per-element work. - `.foo?` (jq error suppression) → hint to use `has()` or a shape check. - `..` (jq recursive descent) → hint to use explicit paths. - `| select(pred)` (jq filter) → hint to use `filter(...)`. - `map(select(...))` (jq filter idiom) → hint to use `filter(...)`. - `| empty` (jq drop stage) → hint to use `filter(...)` for the intended drop semantics. - `if/then/else/end with empty` (jq conditional drop) → same filter-based hint. - `| if ... then ... else ... end` (jq if-as-pipe-stage) → explains Lambé's expression-only `if/then/else` rule. Two integration points in `lib/lambe.dart`: - `_jqIdiomHint(expression, offset)` — pattern-matches the input and returns a `String?` hint. Wired into `_formatParseErrors` before the verbose "expected ..." 
fallback, and into `_describeLeftover` for unparsed-remainder context. - `_jqPipeOpHint(word)` — fires when the user writes `.x | ` for a name Lambé doesn't have, mapping to the Lambé equivalent. Plain typos still fall through to the existing did-you-mean (closest-match) suggestion. The hint short-circuits only when the jq idiom is recognised. 10 tests in `test/parse_error_format_test.dart` cover each idiom, including the falls-through case where did-you-mean still fires. --- lib/lambe.dart | 118 ++++++++++++++++++++++++++++++ test/parse_error_format_test.dart | 98 +++++++++++++++++++++++++ 2 files changed, 216 insertions(+) diff --git a/lib/lambe.dart b/lib/lambe.dart index e733bc9..2c1cbf6 100644 --- a/lib/lambe.dart +++ b/lib/lambe.dart @@ -281,6 +281,15 @@ String _formatParseErrors(String expression, List errors) { } } + // Before the verbose "expected ..." fallback, check whether the + // failure matches a recognisable jq idiom and surface a targeted + // hint instead. Keeps the error short and actionable for agents + // trained on jq priors. + final idiom = _jqIdiomHint(expression, offset); + if (idiom != null) { + return _renderParseError(expression, line, col, idiom); + } + final what = expected.isEmpty ? 'unexpected input' @@ -335,6 +344,8 @@ String _describeLeftover(String expression, int offset) { if (after.isEmpty) return 'unexpected | at end of expression'; final word = after.split(RegExp(r'[^a-zA-Z_]')).first; if (word.isNotEmpty && !parser_.pipeOpNames.contains(word)) { + final jqHint = _jqPipeOpHint(word); + if (jqHint != null) return 'unknown operation "$word" after |\n help: $jqHint'; final suggestion = _closestMatch(word, parser_.pipeOpNames); final hint = suggestion != null ? '\n help: did you mean "$suggestion"?' 
: ''; @@ -342,11 +353,118 @@ String _describeLeftover(String expression, int offset) { } return 'unexpected input after |'; } + final idiom = _jqIdiomHint(expression, offset); + if (idiom != null) return idiom; final token = rest.split(RegExp(r'\s')).first; if (token.isNotEmpty) return 'unexpected "$token"'; return 'unexpected input'; } +/// Hint for a jq pipe-op name that Lambé does not support. Returns null +/// for unknown names. +/// +/// Fires when the model wrote `.x | empty` or `.x | select(...)` — +/// jq pipe stages Lambé rejects. The hint points at the Lambé +/// equivalent so the retry lands the right idiom. +String? _jqPipeOpHint(String word) { + switch (word) { + case 'select': + return '`select(pred)` only works inside `filter(...)`; ' + 'write `filter(pred)` as the pipe stage instead.'; + case 'empty': + return '`empty` does not exist in Lambé. ' + 'Use `filter(pred)` to drop items that fail a predicate.'; + case 'if': + return '`if/then/else/end` is not a pipe stage in Lambé. ' + 'Use it as an expression inside `map(...)` or `filter(...)`, ' + 'or replace it with `filter(pred)`.'; + case 'not': + return '`not` is a prefix in Lambé: write `!pred`.'; + default: + return null; + } +} + +/// Hint for a jq idiom detected at [offset] in [expression]. Returns +/// null if the surrounding context doesn't match a known pattern. +/// +/// Recognises: +/// - `[]` iterate-all (Lambé has no iterate-all; use `map(...)`). +/// - `?` optional suffix (no optional-suffix; filter or shape-check). +/// - `..` recursive descent (no recursive descent; explicit paths). +/// - `select(...)` in non-filter position (only valid inside +/// `filter(...)`). +/// - `empty` keyword (no `empty`; use `filter(pred)`). +/// - `end` from a stranded `if/then/else/end` tail. +/// - `//` alternative operator (no `//` in Lambé; use `if` or +/// `filter`). +String? 
_jqIdiomHint(String expression, int offset) { + // `.users[]`: parser expected an index expression after `[` and + // failed on `]`. Detect by: offset points at `]` and the previous + // non-whitespace char is `[`. + if (offset < expression.length && expression[offset] == ']') { + final before = expression.substring(0, offset).trimRight(); + if (before.endsWith('[')) { + return 'Lambé has no `[]` iterate-all. ' + 'Use `map(.)` to fan out, or `map(.field)` to project. ' + 'E.g. `.users | map(.name)` not `.users[].name`, ' + '`.items | map(.spec.containers) | flatten | map(.name)` ' + 'for nested fan-out.'; + } + } + // `.foo?`: `?` immediately after an identifier or bracket. + if (offset < expression.length && expression[offset] == '?') { + return 'Lambé has no `?` optional-path suffix. ' + 'Use `filter(has("foo")) | .foo`, or check the shape with ' + '`--print-shape` (CLI) / `lambe_print_shape` (MCP) first.'; + } + // `..`: second `.` with no identifier. + if (offset < expression.length && expression[offset] == '.') { + final before = expression.substring(0, offset).trimRight(); + if (before.endsWith('.') && + !before.endsWith('..')) { + return 'Lambé has no `..` recursive descent. ' + 'Use explicit paths; combine `map(...)` and `flatten` for ' + 'nested fan-out.'; + } + } + final rest = expression.substring(offset).trimLeft(); + // `select(...)` in non-filter position. Fires anywhere — inside + // `map(...)`, at top level, in the middle of a pipeline — since + // `select` is only valid inside `filter(...)` in Lambé. + if (rest.startsWith('select(') || rest == 'select' || + (rest.startsWith('select') && + rest.length >= 7 && + !_isIdentChar(rest.codeUnitAt(6)))) { + return '`select(pred)` is only valid inside `filter(...)` in ' + 'Lambé. Replace `map(select(pred))` with `filter(pred)`, and ' + '`map(select(pred) | .field)` with ' + '`filter(pred) | map(.field)`.'; + } + // `empty` keyword. Similar: may appear inside `map(if ... then ... else empty end)`. 
+ if (rest.startsWith('empty') && + (rest.length == 5 || !_isIdentChar(rest.codeUnitAt(5)))) { + return 'Lambé has no `empty` keyword. ' + 'Drop items with `filter(pred)` instead of ' + '`map(if pred then x else empty end)`.'; + } + // `end` from a stranded `if/then/else/end`. + if (rest.startsWith('end') && + (rest.length == 3 || + !_isIdentChar(rest.codeUnitAt(3)))) { + return '`if/then/else/end` is an expression in Lambé, not a pipe ' + 'stage. Use it inside `map(...)` / `filter(...)`, and drop ' + 'the `end` keyword — Lambé terminates `if` at the else branch.'; + } + return null; +} + +bool _isIdentChar(int code) => + (code >= 0x30 && code <= 0x39) || // 0-9 + (code >= 0x41 && code <= 0x5a) || // A-Z + (code >= 0x61 && code <= 0x7a) || // a-z + code == 0x5f; // _ + String? _closestMatch(String input, List candidates) { final maxDist = (input.length / 2).ceil().clamp(1, 3); String? best; diff --git a/test/parse_error_format_test.dart b/test/parse_error_format_test.dart index 21d3597..fda3a35 100644 --- a/test/parse_error_format_test.dart +++ b/test/parse_error_format_test.dart @@ -156,4 +156,102 @@ void main() { } }); }); + + group('jq-idiom hints', () { + test('.users[] suggests map()', () { + try { + parseAst('.users[]'); + fail('expected parse to fail'); + } on QueryError catch (e) { + expect(e.message, contains('no `[]` iterate-all')); + expect(e.message, contains('map(.name)')); + } + }); + + test('.items[].name suggests map()', () { + try { + parseAst('.items[].name'); + fail('expected parse to fail'); + } on QueryError catch (e) { + expect(e.message, contains('no `[]` iterate-all')); + } + }); + + test('.foo? suggests has() / shape-check', () { + try { + parseAst('.foo?'); + fail('expected parse to fail'); + } on QueryError catch (e) { + expect(e.message, contains('no `?` optional-path suffix')); + expect(e.message, contains('has(')); + } + }); + + test('.. 
suggests explicit paths', () { + try { + parseAst('..'); + fail('expected parse to fail'); + } on QueryError catch (e) { + expect(e.message, contains('no `..` recursive descent')); + } + }); + + test('| select(pred) suggests filter()', () { + try { + parseAst('.x | select(.active)'); + fail('expected parse to fail'); + } on QueryError catch (e) { + expect(e.message, contains('select')); + expect(e.message, contains('filter')); + } + }); + + test('map(select(...)) suggests filter()', () { + try { + parseAst('.users | map(select(.active))'); + fail('expected parse to fail'); + } on QueryError catch (e) { + expect(e.message, contains('only valid inside `filter')); + } + }); + + test('| empty suggests filter()', () { + try { + parseAst('.x | empty'); + fail('expected parse to fail'); + } on QueryError catch (e) { + expect(e.message, contains('`empty` does not exist')); + expect(e.message, contains('filter')); + } + }); + + test('if/then/else/end with empty suggests filter()', () { + try { + parseAst('.x | map(if .a then .a else empty end)'); + fail('expected parse to fail'); + } on QueryError catch (e) { + expect(e.message, contains('no `empty` keyword')); + } + }); + + + test('| if as pipe stage explains the expression-only rule', () { + try { + parseAst('.x | if . > 0 then . 
else null end'); + fail('expected parse to fail'); + } on QueryError catch (e) { + expect(e.message, contains('if/then/else/end')); + expect(e.message, contains('expression')); + } + }); + + test('did-you-mean still fires for plain typos', () { + try { + parseAst('.users | filtre(.age)'); + fail('expected parse to fail'); + } on QueryError catch (e) { + expect(e.message, contains('did you mean "filter"?')); + } + }); + }); } From 29658925a162b10015de6aa85fc6b8cdb217d06a Mon Sep 17 00:00:00 2001 From: Hakim Jonas Ghoula Date: Mon, 18 May 2026 22:51:39 +0200 Subject: [PATCH 24/67] perf(parser): migrate operator precedence from chainl1 ladder to Pratt MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Replace the six layered chainl1 calls plus the recursive _unary definition with a single pratt(_postfix, [...]) call covering prefix unary -/!, the six binary precedence levels, and the right-associative // alternative. The if/then/else conditional stays inside _atom rather than as a Pratt operator because its three-branch shape doesn't fit infix dispatch. Binding powers (low to high):

  // alternative (right-assoc)    5
  ||, or                         10
  &&, and                        20
  ==, !=                         30
  <=, >=, <, >                   40
  +, -                           50
  *, /, %                        60
  prefix -, !                    70

The / operator keeps its notFollowedBy(/) guard so it doesn't shadow the // alternative; the keyword aliases 'and' and 'or' use _kw(...) (word boundary) so .andy / .orbit don't tokenize as 'and y' / 'or bit'. Bench numbers (tool/bench/run.dart --aot --runs 5, completer scenarios across 5 shapes x 4 sizes): vs rumil 0.6 + chainl1 baseline mean +7.1%, median +5.1% vs rumil 0.7 + chainl1 (just rumil bump) vs rumil 0.7 + Pratt (this commit) mean -10.1%, median -8.7% Net change for Lambé queries ~17% faster on the completer hot path vs the chainl1 baseline that shipped with rumil 0.6. The win comes from collapsing six chainl1 dispatch layers into one Pratt loop plus eliminating the defer(() => _unary) recursion via the explicit Prefix descriptor. 
The opTable fast path is not engaged here because operators are wrapped with _sym/_kw; the gain is structural. All 1,496 lambe tests pass unchanged. No public API change (parseQuery / parsePartial signatures untouched). --- lib/src/parser.dart | 165 +++++++++++++++++--------------------------- 1 file changed, 63 insertions(+), 102 deletions(-) diff --git a/lib/src/parser.dart b/lib/src/parser.dart index fd6c92f..ec68827 100644 --- a/lib/src/parser.dart +++ b/lib/src/parser.dart @@ -1,16 +1,26 @@ /// Query parser. Left-recursive grammar via `rule()`, operator precedence -/// via layered `chainl1` calls. +/// via the `pratt` combinator. /// -/// Grammar structure (lowest to highest precedence): -/// _expr = _alternative (top-level, lowest precedence) -/// _alternative = _logicOr ('//' _logicOr)* right-associative -/// _logicOr = _logicAnd chainl1 '||' | 'or' -/// _logicAnd = _equality chainl1 '&&' | 'and' -/// _equality = _comparison chainl1 '==' | '!=' -/// _comparison = _additive chainl1 '<' | '>' | '<=' | '>=' -/// _additive = _multiplicative chainl1 '+' | '-' -/// _multiplicative = _unary chainl1 '*' | '/' | '%' -/// _unary = ('-' | '!') _unary | _postfix +/// Grammar structure: +/// _expr = _operators (top-level) +/// _operators = pratt(_postfix, [ +/// // prefix unary at bp 70 +/// Prefix('-', 70), Prefix('!', 70), +/// // multiplicative (left-assoc) bp 60 +/// *, /, % +/// // additive (left-assoc) bp 50 +/// +, - +/// // comparison (left-assoc) bp 40 +/// <=, >=, <, > +/// // equality (left-assoc) bp 30 +/// ==, != +/// // logic and (left-assoc) bp 20 +/// &&, and +/// // logic or (left-assoc) bp 10 +/// ||, or +/// // alternative (right-assoc) bp 5 +/// // +/// ]) /// _postfix = rule( (left-recursive via Warth) /// _postfix '|' pipe_op /// | _postfix '.' 
ident @@ -343,94 +353,45 @@ final Parser _postfix = rule( _atom, ); -final Parser _unary = - (_sym('-').as('-') | _sym('!').as('!')).flatMap( - (op) => - defer(() => _unary).map((operand) => UnaryOp(op, operand) as LamExpr), - ) | - _postfix; - -Parser _binOp(String op) { - // `/` must not match the first `/` of `//` (alternative operator). - // Other single-char ops don't have a longer variant that would be - // ambiguous at this level, so we only special-case `/`. - final sym = op == '/' - ? _lex(string('/').thenSkip(char('/').notFollowedBy)) - : _sym(op); - return sym.as( - (l, r) => BinaryOp(op, l, r), - ); -} - -/// Word-boundary binary op for keyword aliases like `and` / `or`. -/// -/// `_sym` matches any substring; for keyword aliases we need -/// `.andy` / `.orbit` to keep working. The result node carries the -/// canonical symbol so shape/eval don't see the alias. -Parser _binOpKw( - String keyword, - String canonical, -) => _kw(keyword).as( - (l, r) => BinaryOp(canonical, l, r), -); - -Parser _binOps( - List ops, -) { - var p = _binOp(ops.first); - for (var i = 1; i < ops.length; i++) { - p = p | _binOp(ops[i]); - } - return p; -} - -final Parser _multiplicative = _unary.chainl1( - _binOps(['*', '/', '%']), -); - -final Parser _additive = _multiplicative.chainl1( - _binOps(['+', '-']), -); - -final Parser _comparison = () { - final ops = _binOp('<=') | _binOp('>=') | _binOp('<') | _binOp('>'); - return _additive.chainl1(ops); -}(); - -final Parser _equality = _comparison.chainl1( - _binOps(['==', '!=']), -); - -final Parser _logicAnd = _equality.chainl1( - _binOp('&&') | _binOpKw('and', '&&'), -); - -final Parser _logicOr = _logicAnd.chainl1( - _binOp('||') | _binOpKw('or', '||'), -); - -/// `//` alternative: `a // b` returns `a` if non-null, else `b`. -/// Right-associative, one level above `||` so `a // b // c` means -/// `a // (b // c)`. Built by hand because Lambé's parser combinators -/// ship `chainl1` (left-associative) only. 
-final Parser _alternative = - _logicOr.flatMap( - (first) => _altTail.many.map( - (tail) { - if (tail.isEmpty) return first; - final all = [first, ...tail]; - LamExpr acc = all.last; - for (var i = all.length - 2; i >= 0; i--) { - acc = Alternative(all[i], acc); - } - return acc; - }, - ), - ); - -/// A single `// expr` suffix. Matched against the `//` symbol directly -/// to avoid ambiguity with `/` (division). -final Parser _altTail = - _sym('//').skipThen(_logicOr); - -final Parser _expr = _alternative; +/// `/` must not match the first `/` of `//` (alternative operator). Other +/// single-char ops don't have a longer variant that would be ambiguous at +/// the binary-operator level, so only `/` needs a notFollowedBy guard. +final Parser _divSym = + _lex(string('/').thenSkip(char('/').notFollowedBy)); + +LamExpr _binOp(String op, LamExpr a, LamExpr b) => BinaryOp(op, a, b); + +/// Single Pratt parse covering prefix unary, six binary precedence levels, +/// and the right-associative `//` alternative. The conditional (`if/then/ +/// else`) is parsed inside `_atom` rather than as a Pratt operator because +/// its three-branch shape doesn't fit infix dispatch. +final Parser _operators = pratt(_postfix, [ + // Alternative (right-associative, lowest precedence). + InfixRight(_sym('//'), 5, Alternative.new), + // Logical OR. + InfixLeft(_sym('||'), 10, (LamExpr a, LamExpr b) => _binOp('||', a, b)), + InfixLeft(_kw('or'), 10, (LamExpr a, LamExpr b) => _binOp('||', a, b)), + // Logical AND. + InfixLeft(_sym('&&'), 20, (LamExpr a, LamExpr b) => _binOp('&&', a, b)), + InfixLeft(_kw('and'), 20, (LamExpr a, LamExpr b) => _binOp('&&', a, b)), + // Equality. + InfixLeft(_sym('=='), 30, (LamExpr a, LamExpr b) => _binOp('==', a, b)), + InfixLeft(_sym('!='), 30, (LamExpr a, LamExpr b) => _binOp('!=', a, b)), + // Comparison. 
+ InfixLeft(_sym('<='), 40, (LamExpr a, LamExpr b) => _binOp('<=', a, b)), + InfixLeft(_sym('>='), 40, (LamExpr a, LamExpr b) => _binOp('>=', a, b)), + InfixLeft(_sym('<'), 40, (LamExpr a, LamExpr b) => _binOp('<', a, b)), + InfixLeft(_sym('>'), 40, (LamExpr a, LamExpr b) => _binOp('>', a, b)), + // Additive. + InfixLeft(_sym('+'), 50, (LamExpr a, LamExpr b) => _binOp('+', a, b)), + InfixLeft(_sym('-'), 50, (LamExpr a, LamExpr b) => _binOp('-', a, b)), + // Multiplicative. + InfixLeft(_sym('*'), 60, (LamExpr a, LamExpr b) => _binOp('*', a, b)), + InfixLeft(_divSym, 60, (LamExpr a, LamExpr b) => _binOp('/', a, b)), + InfixLeft(_sym('%'), 60, (LamExpr a, LamExpr b) => _binOp('%', a, b)), + // Prefix unary (highest precedence). + Prefix(_sym('-'), 70, (LamExpr e) => UnaryOp('-', e)), + Prefix(_sym('!'), 70, (LamExpr e) => UnaryOp('!', e)), +]); + +final Parser _expr = _operators; From 8529abaf7438835a3b2aeb7b356dac8c1afe5272 Mon Sep 17 00:00:00 2001 From: Hakim Jonas Ghoula Date: Tue, 19 May 2026 08:46:08 +0200 Subject: [PATCH 25/67] refactor(parser): use rumil's cFamilyPrecedence preset MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Replaces the inline 13-operator infix ladder with a single call to rumil's new cFamilyPrecedence preset. Lambé-specific operators stay inline: - The right-associative `//` alternative (no C-family analogue) - Keyword aliases `and` / `or` for `&&` / `||` (Lambé extension) - The `/` notFollowedBy(/) guard, supplied via sym dispatch Functionally equivalent to the previous hand-rolled list; bench numbers within noise of pre-preset (mean -9.7% vs -10.1% on the completer matrix). All 1,496 lambe tests pass unchanged. 
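For reference, the precedence scheme the preset has to reproduce can be sketched without rumil as a plain precedence-climbing loop, using the binding powers from the Pratt migration (5 for `//` up to 70 for prefix `-` / `!`). This is only an illustrative sketch, not rumil's `pratt` or `cFamilyPrecedence`: `Climber`, `group`, and the space-separated token input are assumptions made for the example, and the pipe `|` is left out because Lambé handles it at the postfix level.

```dart
// Minimal precedence climbing over pre-split tokens.
// Hypothetical helper names; this is NOT rumil's `pratt` combinator.

const Map<String, int> infixPower = {
  '//': 5,
  '||': 10, 'or': 10,
  '&&': 20, 'and': 20,
  '==': 30, '!=': 30,
  '<=': 40, '>=': 40, '<': 40, '>': 40,
  '+': 50, '-': 50,
  '*': 60, '/': 60, '%': 60,
};
const int prefixPower = 70;
const Set<String> rightAssoc = {'//'};

class Climber {
  Climber(this.tokens);
  final List<String> tokens;
  int pos = 0;

  /// Parses one expression whose operators all bind at least as
  /// tightly as [minPower]; returns a fully parenthesised string.
  String parse([int minPower = 0]) {
    final tok = tokens[pos++];
    String left;
    if (tok == '-' || tok == '!') {
      // Prefix operator: its operand is parsed at the prefix power.
      left = '($tok${parse(prefixPower)})';
    } else {
      left = tok; // Operand (a field path or literal in this sketch).
    }
    while (pos < tokens.length) {
      final op = tokens[pos];
      final power = infixPower[op];
      if (power == null || power < minPower) break;
      pos++;
      // Left-assoc ops require a strictly tighter right side;
      // right-assoc ops accept the same power again.
      final right = parse(rightAssoc.contains(op) ? power : power + 1);
      left = '($left $op $right)';
    }
    return left;
  }
}

String group(String source) => Climber(source.split(' ')).parse();

void main() {
  print(group('.a // .b // .c')); // (.a // (.b // .c))
  print(group('.a || .b // .c')); // ((.a || .b) // .c)
  print(group('1 - 2 - 3')); // ((1 - 2) - 3)
  print(group('- 1 * 2')); // ((-1) * 2)
}
```

The first two results mirror the parser tests: `.a // .b // .c` groups to the right, and `.a || .b // .c` groups as `(.a || .b) // .c`.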
--- lib/src/parser.dart | 53 +++++++++++++++++++-------------------------- 1 file changed, 22 insertions(+), 31 deletions(-) diff --git a/lib/src/parser.dart b/lib/src/parser.dart index ec68827..6c702d0 100644 --- a/lib/src/parser.dart +++ b/lib/src/parser.dart @@ -359,39 +359,30 @@ final Parser _postfix = rule( final Parser _divSym = _lex(string('/').thenSkip(char('/').notFollowedBy)); -LamExpr _binOp(String op, LamExpr a, LamExpr b) => BinaryOp(op, a, b); - -/// Single Pratt parse covering prefix unary, six binary precedence levels, -/// and the right-associative `//` alternative. The conditional (`if/then/ -/// else`) is parsed inside `_atom` rather than as a Pratt operator because -/// its three-branch shape doesn't fit infix dispatch. +/// Lambé's symbol parser routing: `/` requires a not-followed-by guard +/// so it doesn't shadow the `//` alternative; everything else is a +/// whitespace-tolerant `_sym(...)`. +Parser _opSym(String s) => s == '/' ? _divSym : _sym(s); + +/// Single Pratt parse covering prefix unary, the six binary precedence +/// levels supplied by [cFamilyPrecedence], plus Lambé extensions: the +/// right-associative `//` alternative at the bottom, and the keyword +/// aliases `and` / `or`. The conditional (`if/then/else`) is parsed +/// inside `_atom` rather than as a Pratt operator because its +/// three-branch shape doesn't fit infix dispatch. final Parser _operators = pratt(_postfix, [ - // Alternative (right-associative, lowest precedence). + // Alternative (right-associative, below `||`). InfixRight(_sym('//'), 5, Alternative.new), - // Logical OR. - InfixLeft(_sym('||'), 10, (LamExpr a, LamExpr b) => _binOp('||', a, b)), - InfixLeft(_kw('or'), 10, (LamExpr a, LamExpr b) => _binOp('||', a, b)), - // Logical AND. - InfixLeft(_sym('&&'), 20, (LamExpr a, LamExpr b) => _binOp('&&', a, b)), - InfixLeft(_kw('and'), 20, (LamExpr a, LamExpr b) => _binOp('&&', a, b)), - // Equality. 
- InfixLeft(_sym('=='), 30, (LamExpr a, LamExpr b) => _binOp('==', a, b)), - InfixLeft(_sym('!='), 30, (LamExpr a, LamExpr b) => _binOp('!=', a, b)), - // Comparison. - InfixLeft(_sym('<='), 40, (LamExpr a, LamExpr b) => _binOp('<=', a, b)), - InfixLeft(_sym('>='), 40, (LamExpr a, LamExpr b) => _binOp('>=', a, b)), - InfixLeft(_sym('<'), 40, (LamExpr a, LamExpr b) => _binOp('<', a, b)), - InfixLeft(_sym('>'), 40, (LamExpr a, LamExpr b) => _binOp('>', a, b)), - // Additive. - InfixLeft(_sym('+'), 50, (LamExpr a, LamExpr b) => _binOp('+', a, b)), - InfixLeft(_sym('-'), 50, (LamExpr a, LamExpr b) => _binOp('-', a, b)), - // Multiplicative. - InfixLeft(_sym('*'), 60, (LamExpr a, LamExpr b) => _binOp('*', a, b)), - InfixLeft(_divSym, 60, (LamExpr a, LamExpr b) => _binOp('/', a, b)), - InfixLeft(_sym('%'), 60, (LamExpr a, LamExpr b) => _binOp('%', a, b)), - // Prefix unary (highest precedence). - Prefix(_sym('-'), 70, (LamExpr e) => UnaryOp('-', e)), - Prefix(_sym('!'), 70, (LamExpr e) => UnaryOp('!', e)), + // Standard C-family operators. + ...cFamilyPrecedence( + sym: _opSym, + binary: BinaryOp.new, + unary: UnaryOp.new, + ), + // Lambé-specific keyword aliases for && / ||. _kw enforces a word + // boundary so `.andy` / `.orbit` keep tokenizing as identifiers. + InfixLeft(_kw('and'), 20, (LamExpr a, LamExpr b) => BinaryOp('&&', a, b)), + InfixLeft(_kw('or'), 10, (LamExpr a, LamExpr b) => BinaryOp('||', a, b)), ]); final Parser _expr = _operators; From f0f6f4bafa0bd7d00759e275661108c2b07e5956 Mon Sep 17 00:00:00 2001 From: Hakim Jonas Ghoula Date: Thu, 21 May 2026 00:17:06 +0200 Subject: [PATCH 26/67] chore(deps): bump rumil family to ^0.7.0; drop pubspec_overrides MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit rumil 0.7.0 (and rumil_parsers, rumil_expressions) published to pub.dev. Lambé now resolves these from pub.dev directly rather than via the local-path override that carried us through the 0.7 development cycle. 
Constraints: - rumil: ^0.6.0 -> ^0.7.0 - rumil_parsers: ^0.6.0 -> ^0.7.0 - rumil_expressions: ^0.6.0 -> ^0.7.0 pubspec_overrides.yaml is removed (it's gitignored, so this is a local-file deletion only). Future contributors clone and `dart pub get` resolves real published packages. All 1,496 lambe tests pass against the published rumil 0.7 family. --- pubspec.yaml | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/pubspec.yaml b/pubspec.yaml index d14f4b5..f9b5d2a 100644 --- a/pubspec.yaml +++ b/pubspec.yaml @@ -16,9 +16,9 @@ environment: sdk: ^3.7.0 dependencies: - rumil: ^0.6.0 - rumil_parsers: ^0.6.0 - rumil_expressions: ^0.6.0 + rumil: ^0.7.0 + rumil_parsers: ^0.7.0 + rumil_expressions: ^0.7.0 args: ^2.6.0 dart_mcp: ^0.5.0 From 24702e8cdc87666254943160c588162c1867f278 Mon Sep 17 00:00:00 2001 From: Hakim Jonas Ghoula Date: Thu, 21 May 2026 00:30:53 +0200 Subject: [PATCH 27/67] chore: gitignore *.scratch.md for local scratch notes Mirrors the same convention added to rumil-dart's .gitignore. Lets release-planning notes, status snapshots, and similar working-memory documents live in the repo for discoverability without ever getting committed. --- .gitignore | 3 +++ 1 file changed, 3 insertions(+) diff --git a/.gitignore b/.gitignore index 4d5cf4b..6deb7f3 100644 --- a/.gitignore +++ b/.gitignore @@ -27,3 +27,6 @@ bench-results-*.json # Session handover notes (internal workflow, not code) HANDOVER_*.md + +# Local scratch notes (release planning, status snapshots, etc.) +*.scratch.md From e7dc1612b57e5ce21d5ffcf52a41ca2fa9c0f2f7 Mon Sep 17 00:00:00 2001 From: Hakim Jonas Ghoula Date: Thu, 21 May 2026 20:18:55 +0200 Subject: [PATCH 28/67] 0.9.0: pipe-op AST consolidation + REPL highlighter migration + Tier A followups MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Bundles steps 1–8 of LAMBE_0.9.0_PLAN with Tier A items A1–A7. 
The 27 per-op AST classes collapse into a single BuiltinPipeOp(name,
args) backed by an extended pipe_ops.dart spec table that owns
acceptance, shape inference, runtime evaluation, and parse arity on
one record. Adding or renaming a pipe op is now a one-file change.
As(target) keeps a dedicated AST class for its typed OutputFormat
argument.

REPL highlighter migrated from a 100-line hand-rolled tokenizer to a
rumil_tokens LangGrammar defined in lib/src/highlight_grammar.dart.
New runtime dependency: rumil_tokens ^0.1.0.

Other 0.9.0 wins:
- _normalize short-circuits canonical inputs (identity-pass for
  Map / List / scalars)
- queryNdjsonString(lines, expression) convenience added
- Six doc-precision fixes inlined into pipe_ops.dart and
  evaluator.dart (// is null-fallback; empty-list policy; unique
  distinguishes int/double; duplicate-key behaviour; from_entries
  rejects non-map / non-string-key entries explicitly; type rejects
  non-JSON values with a hint)
- inferSchema @Deprecated annotation already in place

Tier A followups from the discovery session:
- TSV input honors header rows the same way CSV does (input.dart now
  runs detectDialect with the tab delimiter forced)
- String single-char indexing: .name[0] returns "a" instead of
  erroring (mirrors slice semantics; out-of-range returns null)
- jq alias: add → sum
- Stale // line removed from _jqIdiomHint doc
- doc/getting-started.md pubspec snippet bumped to ^0.9.0
- doc/syntax.md bare-literal examples rewritten as runnable echo/lam
  invocations (every rewritten example verified)
- CHANGELOG appended for both batches

1516 tests pass (1500 baseline + 16 new). pana 160/160. dart analyze
clean (one pre-existing test warning at evaluator_test.dart:646).
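
The single-node-plus-spec-table idea can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the contents of pipe_ops.dart: `PipeOpSpec`, `specs`, `aliases`, and `run` are hypothetical names, arguments are plain values rather than sub-expressions, acceptance and shape inference are omitted, and the alias is resolved at run time here purely for brevity. The empty-list behaviour follows the policy stated in the changelog (`first` returns null, `max` throws, `sum` returns 0).

```dart
// Hypothetical, simplified stand-ins for illustration only.

/// One record per op: parse arity plus runtime behaviour.
class PipeOpSpec {
  const PipeOpSpec(this.arity, this.eval);
  final int arity;
  final Object? Function(List<Object?> input, List<Object?> args) eval;
}

/// A single AST node in place of one class per op.
class BuiltinPipeOp {
  const BuiltinPipeOp(this.name, this.args);
  final String name;
  final List<Object?> args;
}

num _sum(List<Object?> xs) => xs.cast<num>().fold<num>(0, (a, b) => a + b);

/// The only place per-op behaviour lives in this sketch.
final Map<String, PipeOpSpec> specs = {
  'sum': PipeOpSpec(0, (input, _) => _sum(input)),
  'first': PipeOpSpec(0, (input, _) => input.isEmpty ? null : input.first),
  'max': PipeOpSpec(0, (input, _) {
    if (input.isEmpty) throw StateError('max of empty list');
    return input.cast<num>().reduce((a, b) => a > b ? a : b);
  }),
};

/// jq alias table: `add` maps to `sum`.
const Map<String, String> aliases = {'add': 'sum'};

Object? run(BuiltinPipeOp op, List<Object?> input) {
  final spec = specs[aliases[op.name] ?? op.name];
  if (spec == null) throw ArgumentError('unknown operation "${op.name}"');
  if (op.args.length != spec.arity) {
    throw ArgumentError('${op.name} expects ${spec.arity} argument(s)');
  }
  return spec.eval(input, op.args);
}

void main() {
  print(run(const BuiltinPipeOp('add', []), [1, 2, 3])); // 6
  print(run(const BuiltinPipeOp('sum', []), [])); // 0
  print(run(const BuiltinPipeOp('first', []), [])); // null
}
```

In a layout like this, adding an op means adding one entry to `specs`; nothing else has to learn the new name.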
--- CHANGELOG.md | 88 ++++- doc/getting-started.md | 4 +- doc/syntax.md | 70 +++- lib/lambe.dart | 79 ++++- lib/src/ast.dart | 204 +---------- lib/src/completer.dart | 22 +- lib/src/evaluator.dart | 322 ++---------------- lib/src/highlight_grammar.dart | 25 ++ lib/src/input.dart | 19 +- lib/src/parser.dart | 48 ++- lib/src/readline.dart | 151 +++------ lib/src/shape/explain.dart | 55 +-- lib/src/shape/infer.dart | 9 +- lib/src/shape/pipe_ops.dart | 507 ++++++++++++++++++++-------- pubspec.yaml | 1 + test/evaluator_test.dart | 5 +- test/ndjson_test.dart | 27 ++ test/normalize_test.dart | 24 ++ test/parse_error_format_test.dart | 1 - test/parser_test.dart | 63 ++-- test/pipe_ops_consistency_test.dart | 135 ++++---- test/ring4_test.dart | 6 +- test/ring5_test.dart | 6 +- test/string_indexing_test.dart | 57 ++++ test/tsv_headers_test.dart | 83 +++++ 25 files changed, 1065 insertions(+), 946 deletions(-) create mode 100644 lib/src/highlight_grammar.dart create mode 100644 test/string_indexing_test.dart create mode 100644 test/tsv_headers_test.dart diff --git a/CHANGELOG.md b/CHANGELOG.md index 912060e..3bae50d 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -2,8 +2,92 @@ Closes the shape feedback loop. Declare a JSON Schema, check queries against it, round-trip schemas with the ecosystem. Plus: richer -static analysis in `--explain`, line-delimited JSON input, and an -opt-in CSV escape hatch for nested cells. +static analysis in `--explain`, line-delimited JSON input, an opt-in +CSV escape hatch for nested cells, an architectural pipe-op +consolidation, and a `rumil_tokens`-based REPL highlighter. + +### Pipe-op AST consolidation + +- The 27 per-op AST classes (`FilterOp`, `MapOp`, `SortOp`, …) + collapse into a single `BuiltinPipeOp(name, args)`. The spec table + in `pipe_ops.dart` is now the only place per-op behaviour lives: + acceptance, shape inference, runtime evaluation, and parse arity + all live on the same record. 
Adding or renaming a pipe op is a + one-file change. +- `As(target)` keeps a dedicated AST class for its typed + `OutputFormat` argument — it's the only custom-arity op. +- `pipeOpInfoFor(LamExpr)` recognises both `BuiltinPipeOp` and `As`. +- Source-breaking for external code that constructed pipe-op AST + nodes directly. The pre-1.0 contract here was that AST classes + were internals; we're taking that out properly. Tests that + assembled `MapOp(.x)` etc. now write `BuiltinPipeOp('map', [.x])`. + +### REPL syntax highlighter on `rumil_tokens` + +- `lib/src/readline.dart`'s 100-line hand-rolled tokenizer is gone. + The highlighter now consumes a `Token` stream from the + `rumil_tokens` `LangGrammar` defined in + `lib/src/highlight_grammar.dart`. The grammar lives in lambé (not + in `rumil_tokens`' built-in five) because it's lambé-specific. +- New runtime dependency: `rumil_tokens ^0.1.0`. +- Visible behavioural change in the REPL: `.field` colours as two + tokens (`.` punctuation + `field` identifier) rather than one + cyan run; negative literals colour as `-` operator + number + rather than one yellow run. The audit determined the new + behaviour is more principled; the visual effect is subtle. + +### `queryNdjsonString` convenience + +- New `queryNdjsonString(Iterable lines, String expression)` + parses the expression once and delegates to `queryNdjson`. Resolves + the asymmetry where the existing `queryNdjson` took a pre-parsed + AST while every other `query*` took a string. + +### Performance + +- `_normalize` short-circuits canonical inputs. + `Map` / `List` / scalars round-trip + through the public API without allocating a copy. Non-canonical + inputs (e.g. `Map` from some YAML decoders) + still rebuild as before. 
+ +### Documentation precision + +- Six per-op behavioural details now have load-bearing docstrings: + `//` is a null-fallback (not an error-handler), the empty-list + policy (`first`/`last` return null; `min`/`max`/`avg` throw; `sum` + returns 0), `unique` distinguishes int from double by canonical + encoding, duplicate keys in `{a: x, a: y}` follow Dart map literal + semantics (last wins), `from_entries` rejects non-map / non-string- + key entries explicitly (was silent skip), `type` rejects non-JSON + runtime values with a hint pointing at `parseInput` / `jsonDecode`. +- The `from_entries` change is the only behavioural one: non-map + entries used to be dropped silently, now they throw `QueryError`. + The silent skip hid a class of bugs where upstream pipelines emit + the wrong shape. + +### Bug fixes + +- **TSV input now honors header rows the same way CSV does.** Pre-0.9.0 + every TSV file returned a list of lists because the parser + passed a static `defaultTsvConfig` and skipped dialect detection. + Now `parseInput` runs `detectDialect` for TSV with the tab + delimiter forced, so files where the first row looks like headers + return a list of maps. `--print-shape data.tsv` and + `--print-shape data.csv` agree on logical content. +- **String single-char indexing.** `.name[0]` now returns a + one-character substring instead of erroring with `Cannot index + string`. Slicing (`.name[0:3]`) already worked; the asymmetry is + gone. Out-of-range returns `null` (mirrors list indexing); + non-int still throws. + +### jq compatibility + +- **`add` is now recognized as an alias for `sum`.** A jq idiom that + matches Lambé's `sum` exactly. `_jqAliases` in `parser.dart` is the + table; entries belong there only when the jq semantics are an + exact match. Other unsupported jq idioms still surface a + "did you mean" hint or an explanatory message via `_jqIdiomHint`. 
### Schemas as a first-class contract diff --git a/doc/getting-started.md b/doc/getting-started.md index 9c666bc..4f4358b 100644 --- a/doc/getting-started.md +++ b/doc/getting-started.md @@ -195,7 +195,7 @@ For exploring unfamiliar data, use interactive mode: ```bash $ lam -i data.json -lambe v0.8.0 - type :help for commands, :q to quit +lambe v0.9.0 - type :help for commands, :q to quit Data loaded: {2 fields, 3 users} lambe> @@ -250,7 +250,7 @@ Add to your `pubspec.yaml`: ```yaml dependencies: - lambe: ^0.7.0 + lambe: ^0.9.0 ``` ## Next steps diff --git a/doc/syntax.md b/doc/syntax.md index c28196e..1517f5f 100644 --- a/doc/syntax.md +++ b/doc/syntax.md @@ -288,8 +288,12 @@ Group elements by a key. Returns `[{key, values}]`. Remove duplicate values. ``` -[1, 2, 2, 3, 1] | unique --> [1, 2, 3] +$ echo '[1, 2, 2, 3, 1]' | lam '. | unique' +[ + 1, + 2, + 3 +] ``` ### unique_by(key) @@ -306,8 +310,14 @@ Remove duplicates by a key expression. Flatten one level of nesting. ``` -[[1, 2], [3, 4], [5]] | flatten --> [1, 2, 3, 4, 5] +$ echo '[[1, 2], [3, 4], [5]]' | lam '. | flatten' +[ + 1, + 2, + 3, + 4, + 5 +] ``` ### reverse @@ -402,8 +412,10 @@ Convert between maps and `[{key, value}]` lists. .config.database | to_entries -> [{"key": "host", "value": "localhost"}, {"key": "port", "value": 5432}] -[{"key": "a", "value": 1}] | from_entries --> {"a": 1} +$ echo '[{"key": "a", "value": 1}]' | lam '. | from_entries' +{ + "a": 1 +} ``` ### to_number @@ -414,11 +426,17 @@ CSV and TSV cells are strings by default; use `to_number` to coerce them before arithmetic. ``` -"42" | to_number -> 42 -"3.14" | to_number -> 3.14 -100 | to_number -> 100 +$ echo '"42"' | lam '. | to_number' +42 + +$ echo '"3.14"' | lam '. | to_number' +3.14 -.price | to_number on {price: "29.99"} -> 29.99 +$ echo '100' | lam '. 
| to_number' +100 + +$ echo '{"price": "29.99"}' | lam '.price | to_number' +29.99 ``` Throws on strings that do not parse, and on inputs that are not strings @@ -432,13 +450,26 @@ Possible return values: `"null"`, `"boolean"`, `"number"`, `"string"`, `"array"`, `"object"`. ``` -42 | type -> "number" -"hello" | type -> "string" -null | type -> "null" -[1, 2] | type -> "array" -{"a": 1} | type -> "object" +$ echo '42' | lam '. | type' +"number" + +$ echo '"hello"' | lam '. | type' +"string" -. | filter((. | type) == "number") on [1, "two", 3] -> [1, 3] +$ echo 'null' | lam '. | type' +"null" + +$ echo '[1, 2]' | lam '. | type' +"array" + +$ echo '{"a": 1}' | lam '. | type' +"object" + +$ echo '[1, "two", 3]' | lam '. | filter((. | type) == "number")' +[ + 1, + 3 +] ``` ### filter_values(predicate) @@ -455,8 +486,11 @@ Filter a map's values. Transform a map's values. ``` -{"a": 1, "b": 2} | map_values(. * 10) --> {"a": 10, "b": 20} +$ echo '{"a": 1, "b": 2}' | lam '. | map_values(. * 10)' +{ + "a": 10, + "b": 20 +} ``` ### filter_keys(predicate) diff --git a/lib/lambe.dart b/lib/lambe.dart index 2c1cbf6..9e89b81 100644 --- a/lib/lambe.dart +++ b/lib/lambe.dart @@ -174,6 +174,12 @@ Object? queryJson(String expression, String json) => /// /// Lazy: returns an [Iterable] that evaluates on demand. Safe to use /// over large inputs as long as individual lines fit in memory. +/// +/// For one-shot use where the expression is a string, see +/// [queryNdjsonString], which parses the expression once and delegates +/// here. Use this AST-taking variant when you've parsed the expression +/// up front (REPL session, bench harness) and want to apply it to many +/// ndjson lines without re-parsing. 
Iterable queryNdjson(Iterable lines, LamExpr ast) sync* { var lineNum = 0; for (final raw in lines) { @@ -196,6 +202,23 @@ Iterable queryNdjson(Iterable lines, LamExpr ast) sync* { } } +/// Evaluate [expression] against each non-empty line of [lines] +/// independently as a JSON document. +/// +/// Convenience equivalent to parsing [expression] once via [parseAst] +/// and calling [queryNdjson] with the resulting AST. The parse cost is +/// paid once, then amortized across every line. Errors flow through +/// the same `line N:` prefix machinery as [queryNdjson]. +/// +/// Throws [QueryError] if [expression] fails to parse, or on the first +/// per-line parse or evaluation error. Lazy in [lines]: parsing of +/// [expression] is eager (so syntax errors fire before any line is +/// read), but evaluation per line happens on demand. +Iterable queryNdjsonString(Iterable lines, String expression) { + final ast = parseAst(expression); + return queryNdjson(lines, ast); +} + /// Parse a query expression string into a [LamExpr] AST. /// /// Returns a Rumil [Result] which is [Success], [Partial], or [Failure]. @@ -218,14 +241,45 @@ Object? eval(LamExpr ast, Object? data) { /// Normalize [value] into the canonical shape the evaluator expects. /// -/// Recursively converts any `Map` into `Map` and any `List` -/// into `List`, regardless of original element type parameters. -/// Canonical collections from `parseInput`, `jsonDecode`, and hand-written -/// typed literals round-trip through this cheaply (one traversal, no per- -/// value reconstruction of scalars). +/// Already-canonical inputs (`Map`, `List`, +/// scalars) are returned unchanged via an identity-pass check. Non- +/// canonical inputs (e.g. `Map` from some third-party +/// JSON decoders) are recursively rebuilt as canonical types. /// /// Throws [QueryError] if a map has a non-string key. Object? _normalize(Object? 
value) { + if (_isCanonical(value)) return value; + return _rebuild(value); +} + +/// Returns `true` iff [value] already matches the canonical shape the +/// evaluator expects: scalars, `Map` (recursively), or +/// `List` (recursively). The recursive walk short-circuits on +/// the first non-canonical element, so canonical inputs cost one +/// traversal and no allocation. +bool _isCanonical(Object? value) { + if (value == null || value is num || value is bool || value is String) { + return true; + } + // Match the same `is List` / `is Map` checks + // the evaluator and pipe-op specs use, so canonical-by-evaluator-rules + // inputs always short-circuit here. + if (value is Map) { + for (final v in value.values) { + if (!_isCanonical(v)) return false; + } + return true; + } + if (value is List) { + for (final e in value) { + if (!_isCanonical(e)) return false; + } + return true; + } + return false; +} + +Object? _rebuild(Object? value) { if (value == null || value is num || value is bool || value is String) { return value; } @@ -345,7 +399,9 @@ String _describeLeftover(String expression, int offset) { final word = after.split(RegExp(r'[^a-zA-Z_]')).first; if (word.isNotEmpty && !parser_.pipeOpNames.contains(word)) { final jqHint = _jqPipeOpHint(word); - if (jqHint != null) return 'unknown operation "$word" after |\n help: $jqHint'; + if (jqHint != null) { + return 'unknown operation "$word" after |\n help: $jqHint'; + } final suggestion = _closestMatch(word, parser_.pipeOpNames); final hint = suggestion != null ? '\n help: did you mean "$suggestion"?' : ''; @@ -396,8 +452,6 @@ String? _jqPipeOpHint(String word) { /// `filter(...)`). /// - `empty` keyword (no `empty`; use `filter(pred)`). /// - `end` from a stranded `if/then/else/end` tail. -/// - `//` alternative operator (no `//` in Lambé; use `if` or -/// `filter`). String? _jqIdiomHint(String expression, int offset) { // `.users[]`: parser expected an index expression after `[` and // failed on `]`. 
Detect by: offset points at `]` and the previous @@ -421,8 +475,7 @@ String? _jqIdiomHint(String expression, int offset) { // `..`: second `.` with no identifier. if (offset < expression.length && expression[offset] == '.') { final before = expression.substring(0, offset).trimRight(); - if (before.endsWith('.') && - !before.endsWith('..')) { + if (before.endsWith('.') && !before.endsWith('..')) { return 'Lambé has no `..` recursive descent. ' 'Use explicit paths; combine `map(...)` and `flatten` for ' 'nested fan-out.'; @@ -432,7 +485,8 @@ String? _jqIdiomHint(String expression, int offset) { // `select(...)` in non-filter position. Fires anywhere — inside // `map(...)`, at top level, in the middle of a pipeline — since // `select` is only valid inside `filter(...)` in Lambé. - if (rest.startsWith('select(') || rest == 'select' || + if (rest.startsWith('select(') || + rest == 'select' || (rest.startsWith('select') && rest.length >= 7 && !_isIdentChar(rest.codeUnitAt(6)))) { @@ -450,8 +504,7 @@ String? _jqIdiomHint(String expression, int offset) { } // `end` from a stranded `if/then/else/end`. if (rest.startsWith('end') && - (rest.length == 3 || - !_isIdentChar(rest.codeUnitAt(3)))) { + (rest.length == 3 || !_isIdentChar(rest.codeUnitAt(3)))) { return '`if/then/else/end` is an expression in Lambé, not a pipe ' 'stage. Use it inside `map(...)` / `filter(...)`, and drop ' 'the `end` keyword — Lambé terminates `if` at the else branch.'; diff --git a/lib/src/ast.dart b/lib/src/ast.dart index 89dc67b..b65c316 100644 --- a/lib/src/ast.dart +++ b/lib/src/ast.dart @@ -123,198 +123,26 @@ final class BinaryOp extends LamExpr { const BinaryOp(this.op, this.left, this.right); } -/// Filter elements by predicate: `filter(.age > 30)`. -final class FilterOp extends LamExpr { - /// The predicate expression, evaluated per element. - final LamExpr predicate; - - /// Creates a filter operation with [predicate]. 
- const FilterOp(this.predicate); -} - -/// Transform each element: `map(.name)`. -final class MapOp extends LamExpr { - /// The transform expression, evaluated per element. - final LamExpr transform; - - /// Creates a map operation with [transform]. - const MapOp(this.transform); -} - -/// Sort elements naturally: `sort`. -final class SortOp extends LamExpr { - /// Creates a sort operation. - const SortOp(); -} - -/// Reverse element order: `reverse`. -final class ReverseOp extends LamExpr { - /// Creates a reverse operation. - const ReverseOp(); -} - -/// Get keys of a map or indices of a list: `keys`. -final class KeysOp extends LamExpr { - /// Creates a keys operation. - const KeysOp(); -} - -/// Get values of a map (or identity for a list): `values`. -final class ValuesOp extends LamExpr { - /// Creates a values operation. - const ValuesOp(); -} - -/// Get length of a list, map, or string: `length`. -final class LengthOp extends LamExpr { - /// Creates a length operation. - const LengthOp(); -} - -/// Get first element of a list: `first`. -final class FirstOp extends LamExpr { - /// Creates a first operation. - const FirstOp(); -} - -/// Get last element of a list: `last`. -final class LastOp extends LamExpr { - /// Creates a last operation. - const LastOp(); -} - -/// Sum all numeric elements: `sum`. -final class SumOp extends LamExpr { - /// Creates a sum operation. - const SumOp(); -} - -/// Average of all numeric elements: `avg`. -final class AvgOp extends LamExpr { - /// Creates an avg operation. - const AvgOp(); -} - -/// Minimum element: `min`. -final class MinOp extends LamExpr { - /// Creates a min operation. - const MinOp(); -} - -/// Maximum element: `max`. -final class MaxOp extends LamExpr { - /// Creates a max operation. - const MaxOp(); -} - -/// Sort by a key expression: `sort_by(.age)`. -final class SortByOp extends LamExpr { - /// The key expression, evaluated per element. - final LamExpr key; - - /// Creates a sort_by operation with [key]. 
- const SortByOp(this.key); -} - -/// Group elements by a key expression: `group_by(.type)`. +/// A built-in pipe operation: `filter(...)`, `map(...)`, `sort`, `length`, ... /// -/// Returns `[{key: k, values: [items]}, ...]`. -final class GroupByOp extends LamExpr { - /// The key expression, evaluated per element. - final LamExpr key; - - /// Creates a group_by operation with [key]. - const GroupByOp(this.key); -} - -/// Remove duplicate elements: `unique`. -final class UniqueOp extends LamExpr { - /// Creates a unique operation. - const UniqueOp(); -} - -/// Remove duplicates by key: `unique_by(.name)`. -final class UniqueByOp extends LamExpr { - /// The key expression, evaluated per element. - final LamExpr key; - - /// Creates a unique_by operation with [key]. - const UniqueByOp(this.key); -} - -/// Flatten one level of nesting: `flatten`. -final class FlattenOp extends LamExpr { - /// Creates a flatten operation. - const FlattenOp(); -} - -/// Filter map values by predicate: `filter_values(. > 5)`. -final class FilterValuesOp extends LamExpr { - /// The predicate expression, evaluated per value. - final LamExpr predicate; - - /// Creates a filter_values operation with [predicate]. - const FilterValuesOp(this.predicate); -} - -/// Transform map values: `map_values(. * 2)`. -final class MapValuesOp extends LamExpr { - /// The transform expression, evaluated per value. - final LamExpr transform; - - /// Creates a map_values operation with [transform]. - const MapValuesOp(this.transform); -} - -/// Check if a key exists: `has("name")` or `has(.key_field)`. -/// -/// The key expression is evaluated and must produce a `String`. -/// Returns `true` if the input map contains the key. -final class HasOp extends LamExpr { - /// The key expression (must evaluate to a string). - final LamExpr key; - - /// Creates a has operation with [key]. - const HasOp(this.key); -} - -/// Convert a map to a list of `{key, value}` entries: `to_entries`. 
-final class ToEntriesOp extends LamExpr { - /// Creates a to_entries operation. - const ToEntriesOp(); -} - -/// Convert a list of `{key, value}` entries back to a map: `from_entries`. -final class FromEntriesOp extends LamExpr { - /// Creates a from_entries operation. - const FromEntriesOp(); -} - -/// Parse a string as a number: `to_number`. +/// The [name] corresponds to a spec in `shape/pipe_ops.dart`, which is the +/// single source of truth for the op's input acceptance, shape inference, +/// runtime evaluation, and parser arity. Adding a new op is a one-file +/// change to that table. /// -/// Matches CSV and TSV cells, which are strings by default. Pass-through -/// for existing numbers. Throws on strings that do not parse. -final class ToNumberOp extends LamExpr { - /// Creates a to_number operation. - const ToNumberOp(); -} - -/// Runtime type of the input as a string: `type`. -/// -/// Returns one of `"null"`, `"boolean"`, `"number"`, `"string"`, -/// `"array"`, `"object"`. -final class TypeOp extends LamExpr { - /// Creates a type operation. - const TypeOp(); -} +/// [args] holds parsed sub-expressions: empty for zero-arg ops like +/// `length`, single-element for one-arg ops like `filter(predicate)` or +/// `map(transform)`. Custom-arity ops (currently just `as(fmt)` with its +/// typed [OutputFormat] argument) keep dedicated AST classes; see [As]. +final class BuiltinPipeOp extends LamExpr { + /// The canonical op name (matches a [PipeOpInfo.name] in the spec table). + final String name; -/// Filter map keys by predicate: `filter_keys(. != "internal")`. -final class FilterKeysOp extends LamExpr { - /// The predicate expression, evaluated per key. - final LamExpr predicate; + /// Parsed argument expressions, in source order. Empty for zero-arg ops. + final List args; - /// Creates a filter_keys operation with [predicate]. - const FilterKeysOp(this.predicate); + /// Creates a built-in pipe op. 
+ const BuiltinPipeOp(this.name, this.args); } /// Object construction: `{name, total: .price * .qty}`. diff --git a/lib/src/completer.dart b/lib/src/completer.dart index d3ba4eb..a42d630 100644 --- a/lib/src/completer.dart +++ b/lib/src/completer.dart @@ -327,17 +327,11 @@ Shape _resolveTarget(LamExpr? ast, Shape inputShape) { /// Extract the inner expression from a parameterized pipe operation. /// -/// Returns `null` for simple (no-arg) ops like [SortOp], [ReverseOp], -/// etc. and for non-operation expressions like [ObjConstruct]. -LamExpr? _innerExpr(LamExpr op) => switch (op) { - FilterOp(:final predicate) => predicate, - MapOp(:final transform) => transform, - SortByOp(:final key) => key, - GroupByOp(:final key) => key, - UniqueByOp(:final key) => key, - FilterValuesOp(:final predicate) => predicate, - MapValuesOp(:final transform) => transform, - FilterKeysOp(:final predicate) => predicate, - HasOp(:final key) => key, - _ => null, -}; +/// Returns `null` for zero-arg ops (`sort`, `reverse`, `length`, ...) +/// and for non-operation expressions like [ObjConstruct]. The unified +/// [BuiltinPipeOp] dispatch makes this trivial: any one-arg op stores +/// its inner expression at `args[0]`. +LamExpr? _innerExpr(LamExpr op) { + if (op is! BuiltinPipeOp) return null; + return op.args.isEmpty ? null : op.args[0]; +} diff --git a/lib/src/evaluator.dart b/lib/src/evaluator.dart index 34de3c5..1a49359 100644 --- a/lib/src/evaluator.dart +++ b/lib/src/evaluator.dart @@ -1,15 +1,14 @@ /// Query evaluator. Walks the AST over `Object?` JSON values. library; -import 'dart:convert'; - import 'package:rumil_expressions/rumil_expressions.dart' - show applyBinaryOp, applyUnaryOp, asBool, compareValues, typeName; + show applyBinaryOp, applyUnaryOp, asBool, typeName; import 'ast.dart'; import 'errors.dart'; import 'output_format.dart'; import 'shape/check.dart'; +import 'shape/pipe_ops.dart'; import 'shape/shape.dart'; /// Evaluate a [LamExpr] AST against a JSON [ctx] value. 
@@ -43,6 +42,9 @@ Object? evaluate(LamExpr expr, Object? ctx) => switch (expr) { evaluate(left, ctx), evaluate(right, ctx), ), + // Duplicate keys: later entries silently override earlier ones (Dart + // map literal semantics). The parser does not reject duplicates; users + // wanting strictness can validate the AST. ObjConstruct(:final entries) => { for (final (key, valExpr) in entries) key: evaluate(valExpr, ctx), }, @@ -51,9 +53,7 @@ Object? evaluate(LamExpr expr, Object? ctx) => switch (expr) { ? evaluate(then_, ctx) : evaluate(else_, ctx), Alternative(:final left, :final right) => _alternative(left, right, ctx), - ListConstruct(:final parts) => [ - for (final p in parts) evaluate(p, ctx), - ], + ListConstruct(:final parts) => [for (final p in parts) evaluate(p, ctx)], StringInterp(:final parts) => _interpolate(parts, ctx), Slice(:final target, :final start, :final end) => _slice( evaluate(target, ctx), @@ -61,32 +61,7 @@ Object? evaluate(LamExpr expr, Object? ctx) => switch (expr) { end, ctx, ), - FilterOp(:final predicate) => _filter(ctx, predicate), - MapOp(:final transform) => _mapOp(ctx, transform), - SortOp() => _sort(ctx), - ReverseOp() => _reverse(ctx), - KeysOp() => _keys(ctx), - ValuesOp() => _values(ctx), - LengthOp() => _length(ctx), - FirstOp() => _first(ctx), - LastOp() => _last(ctx), - SumOp() => _sum(ctx), - AvgOp() => _avg(ctx), - MinOp() => _min(ctx), - MaxOp() => _max(ctx), - SortByOp(:final key) => _sortBy(ctx, key), - GroupByOp(:final key) => _groupBy(ctx, key), - UniqueOp() => _unique(ctx), - UniqueByOp(:final key) => _uniqueBy(ctx, key), - FlattenOp() => _flatten(ctx), - FilterValuesOp(:final predicate) => _filterValues(ctx, predicate), - MapValuesOp(:final transform) => _mapValues(ctx, transform), - FilterKeysOp(:final predicate) => _filterKeys(ctx, predicate), - HasOp(:final key) => _has(ctx, key), - ToEntriesOp() => _toEntries(ctx), - FromEntriesOp() => _fromEntries(ctx), - ToNumberOp() => _toNumber(ctx), - TypeOp() => _typeOf(ctx), + 
BuiltinPipeOp() => evalBuiltinPipeOp(expr, ctx, evaluate), As(:final target) => _as(ctx, target), }; @@ -138,6 +113,19 @@ Object? _index(Object? target, Object? idx) { if (idx is String) return target[idx]; throw QueryError('Cannot index map with ${typeName(idx)}'); } + // String single-char indexing mirrors slice semantics: `.name[0]` + // returns a one-character substring, matching how `.name[0:1]` already + // worked. Out-of-range returns null (same convention as list + // indexing). + if (target is String) { + if (idx is num) { + final i = idx.toInt(); + final resolved = i < 0 ? target.length + i : i; + if (resolved < 0 || resolved >= target.length) return null; + return target.substring(resolved, resolved + 1); + } + throw QueryError('Cannot index string with ${typeName(idx)}'); + } throw QueryError('Cannot index ${typeName(target)}'); } @@ -147,8 +135,15 @@ Object? _pipe(Object? input, LamExpr op) { } /// Evaluate `left // right`: returns `left`'s value if non-null, -/// otherwise `right`'s value. `right` is only evaluated on fallback, -/// so `.a // someExpensiveFallback` pays nothing when `.a` hits. +/// otherwise `right`'s value. +/// +/// `//` is a null-fallback, not an error-handler. If `left` throws +/// (e.g. a type error during evaluation), the throw propagates without +/// trying `right`. To rescue from errors, use shape-checking facilities +/// (e.g. `filter(has("foo")) | .foo` instead of `.foo // ...`). +/// +/// `right` is only evaluated on null fallback, so +/// `.a // someExpensiveFallback` pays nothing when `.a` hits. Object? _alternative(LamExpr left, LamExpr right, Object? ctx) { final primary = evaluate(left, ctx); if (primary != null) return primary; @@ -172,198 +167,6 @@ Object _binaryOp(String op, Object? l, Object? r) { return applyBinaryOp(op, l, r); } -List _filter(Object? 
input, LamExpr predicate) { - final list = _asList(input, 'filter'); - return [ - for (final item in list) - if (evaluate(predicate, item) == true) item, - ]; -} - -List _mapOp(Object? input, LamExpr transform) { - final list = _asList(input, 'map'); - return [for (final item in list) evaluate(transform, item)]; -} - -List _sort(Object? input) { - final list = List.of(_asList(input, 'sort')); - list.sort(compareValues); - return list; -} - -List _reverse(Object? input) => - List.of(_asList(input, 'reverse').reversed); - -List _keys(Object? input) { - if (input is Map) return input.keys.toList(); - if (input is List) { - return [for (var i = 0; i < input.length; i++) i]; - } - throw QueryError('keys: expected map or list, got ${typeName(input)}'); -} - -List _values(Object? input) { - if (input is Map) return input.values.toList(); - if (input is List) return input; - throw QueryError('values: expected map or list, got ${typeName(input)}'); -} - -int _length(Object? input) { - if (input is List) return input.length; - if (input is Map) return input.length; - if (input is String) return input.length; - throw QueryError( - 'length: expected list, map, or string, got ${typeName(input)}', - ); -} - -Object? _first(Object? input) { - final list = _asList(input, 'first'); - return list.isEmpty ? null : list.first; -} - -Object? _last(Object? input) { - final list = _asList(input, 'last'); - return list.isEmpty ? null : list.last; -} - -num _sum(Object? input) { - final list = _asList(input, 'sum'); - num total = 0; - for (final item in list) { - if (item is! num) { - throw QueryError('sum: expected number, got ${typeName(item)}'); - } - total += item; - } - return total; -} - -double _avg(Object? input) { - final list = _asList(input, 'avg'); - if (list.isEmpty) throw const QueryError('avg: empty list'); - return _sum(list).toDouble() / list.length; -} - -Object? _min(Object? 
input) { - final list = _asList(input, 'min'); - if (list.isEmpty) throw const QueryError('min: empty list'); - var best = list.first; - for (var i = 1; i < list.length; i++) { - if (compareValues(list[i], best) < 0) best = list[i]; - } - return best; -} - -Object? _max(Object? input) { - final list = _asList(input, 'max'); - if (list.isEmpty) throw const QueryError('max: empty list'); - var best = list.first; - for (var i = 1; i < list.length; i++) { - if (compareValues(list[i], best) > 0) best = list[i]; - } - return best; -} - -List _sortBy(Object? input, LamExpr key) { - final list = List.of(_asList(input, 'sort_by')); - list.sort((a, b) => compareValues(evaluate(key, a), evaluate(key, b))); - return list; -} - -List> _groupBy(Object? input, LamExpr key) { - final list = _asList(input, 'group_by'); - // Group on a canonical string representation so structurally-equal - // Maps and Lists compare as equal. A side map preserves the original - // key value for the output record. - final groups = >{}; - final originalKeys = {}; - for (final item in list) { - final k = evaluate(key, item); - final canonical = _canonicalKey(k); - originalKeys[canonical] = k; - (groups[canonical] ??= []).add(item); - } - return [ - for (final entry in groups.entries) - {'key': originalKeys[entry.key], 'values': entry.value}, - ]; -} - -List _unique(Object? input) { - final list = _asList(input, 'unique'); - final seen = {}; - return [ - for (final item in list) - if (seen.add(_canonicalKey(item))) item, - ]; -} - -List _uniqueBy(Object? input, LamExpr key) { - final list = _asList(input, 'unique_by'); - final seen = {}; - return [ - for (final item in list) - if (seen.add(_canonicalKey(evaluate(key, item)))) item, - ]; -} - -/// Canonical string representation of [value] for use as a hash key. -/// -/// Dart's native equality on `List` and `Map` is reference-based, so -/// structurally-equal collections compare as unequal. 
`unique`, `unique_by`, -/// and `group_by` need structural equality to behave sensibly. Encoding the -/// value as JSON with sorted map keys gives a stable, equality-friendly key. -String _canonicalKey(Object? value) => jsonEncode(_sortKeys(value)); - -/// Recursively sort map keys so `jsonEncode` produces a stable output. -Object? _sortKeys(Object? value) { - if (value is Map) { - final sorted = {}; - final keys = value.keys.toList()..sort(); - for (final k in keys) { - sorted[k] = _sortKeys(value[k]); - } - return sorted; - } - if (value is List) { - return [for (final e in value) _sortKeys(e)]; - } - return value; -} - -List _flatten(Object? input) { - final list = _asList(input, 'flatten'); - return [ - for (final item in list) - if (item is List) ...item else item, - ]; -} - -Map _filterValues(Object? input, LamExpr predicate) { - final map = _asMap(input, 'filter_values'); - return { - for (final MapEntry(:key, :value) in map.entries) - if (evaluate(predicate, value) == true) key: value, - }; -} - -Map _mapValues(Object? input, LamExpr transform) { - final map = _asMap(input, 'map_values'); - return { - for (final MapEntry(:key, :value) in map.entries) - key: evaluate(transform, value), - }; -} - -Map _filterKeys(Object? input, LamExpr predicate) { - final map = _asMap(input, 'filter_keys'); - return { - for (final MapEntry(:key, :value) in map.entries) - if (evaluate(predicate, key) == true) key: value, - }; -} - String _interpolate(List parts, Object? ctx) { final buffer = StringBuffer(); for (final part in parts) { @@ -411,70 +214,3 @@ int _resolveSliceIndex( } throw QueryError('Slice index must be a number, got ${typeName(value)}'); } - -bool _has(Object? 
input, LamExpr key) { - if (input is Map) { - final k = evaluate(key, input); - if (k is String) return input.containsKey(k); - throw QueryError('has: key must be a string, got ${typeName(k)}'); - } - if (input is List) { - final k = evaluate(key, input); - if (k is num) return k.toInt() >= 0 && k.toInt() < input.length; - throw QueryError('has: index must be a number, got ${typeName(k)}'); - } - throw QueryError('has: expected map or list, got ${typeName(input)}'); -} - -List> _toEntries(Object? input) { - final map = _asMap(input, 'to_entries'); - return [ - for (final MapEntry(:key, :value) in map.entries) - {'key': key, 'value': value}, - ]; -} - -Map _fromEntries(Object? input) { - final list = _asList(input, 'from_entries'); - return { - for (final item in list) - if (item is Map) - (item['key'] as String? ?? - (throw const QueryError( - 'from_entries: entry missing "key" field', - ))): - item['value'], - }; -} - -num _toNumber(Object? input) { - if (input is num) return input; - if (input is String) { - final parsed = num.tryParse(input); - if (parsed != null) return parsed; - throw QueryError('to_number: cannot parse "$input" as a number'); - } - throw QueryError( - 'to_number: expected string or number, got ${typeName(input)}', - ); -} - -String _typeOf(Object? input) => switch (input) { - null => 'null', - bool() => 'boolean', - num() => 'number', - String() => 'string', - List() => 'array', - Map() => 'object', - _ => throw QueryError('type: unexpected runtime type ${input.runtimeType}'), -}; - -List _asList(Object? v, String ctx) { - if (v is List) return v; - throw QueryError('$ctx: expected list, got ${typeName(v)}'); -} - -Map _asMap(Object? 
v, String ctx) { - if (v is Map) return v; - throw QueryError('$ctx: expected map, got ${typeName(v)}'); -} diff --git a/lib/src/highlight_grammar.dart b/lib/src/highlight_grammar.dart new file mode 100644 index 0000000..3b3a3d9 --- /dev/null +++ b/lib/src/highlight_grammar.dart @@ -0,0 +1,25 @@ +/// Lexical grammar for the Lambé REPL syntax highlighter. +/// +/// Lives in lambé rather than in rumil_tokens' built-in grammars +/// because the grammar is lambé-specific. The REPL's `_highlight` +/// builds a tokenizer from this grammar once at startup and re-runs +/// it on every keystroke, so the cost is amortized across a session. +library; + +import 'package:rumil_tokens/rumil_tokens.dart'; + +/// Lambé query grammar for the REPL highlighter. +/// +/// Keywords cover the conditional (`if/then/else`), the literals +/// (`true/false/null`), and the `and`/`or` aliases. Operator tables +/// match Lambé's actual operator set, including the right-associative +/// `//` alternative and the `&&`/`||` symbolic forms. No comments — +/// Lambé queries are one-liners typed at the REPL prompt. +const LangGrammar lambeGrammar = LangGrammar( + name: 'lambe', + keywords: ['if', 'then', 'else', 'true', 'false', 'null', 'and', 'or'], + stringDelimiters: ['"'], + punctuationChars: '(){}[],;:.', + operatorChars: '+-*/%=!<>&|', + multiCharOperators: ['==', '!=', '<=', '>=', '&&', '||', '//'], +); diff --git a/lib/src/input.dart b/lib/src/input.dart index 1a24278..96d0234 100644 --- a/lib/src/input.dart +++ b/lib/src/input.dart @@ -43,7 +43,7 @@ Object? parseInput(String input, Format format) => switch (format) { Format.toml => _parse(parseToml(input), tomlDocToNative, 'TOML'), Format.hcl => _parse(parseHcl(input), hclDocToNative, 'HCL'), Format.csv => _parseDelimited(input, null), - Format.tsv => _parseDelimited(input, defaultTsvConfig), + Format.tsv => _parseDelimited(input, _detectTsvDialect(input)), Format.markdown => _parseMd(input), }; @@ -94,6 +94,23 @@ Object? 
_parse( throw QueryError('$formatName parse error: ${result.errors}'), }; +/// Detect a TSV dialect by reusing [detectDialect]'s header and quote +/// inference, but force the tab delimiter. +/// +/// The file extension (or explicit `Format.tsv`) is the strongest signal +/// that fields are tab-separated; `detectDialect` would otherwise be free +/// to pick `,` or `;` if the sample is ambiguous. Header detection still +/// runs because TSV's documented model matches CSV: a header row produces +/// `List>`. +DelimitedConfig _detectTsvDialect(String input) { + final detected = detectDialect(input); + return DelimitedConfig( + delimiter: '\t', + quote: detected.quote, + hasHeader: detected.hasHeader, + ); +} + /// Parse delimited input, auto-detecting dialect if [config] is null. /// /// If the detected (or provided) dialect has headers, returns diff --git a/lib/src/parser.dart b/lib/src/parser.dart index 6c702d0..cd1e8ce 100644 --- a/lib/src/parser.dart +++ b/lib/src/parser.dart @@ -231,12 +231,17 @@ final Parser _closeBracket = _sym(']').recover(succeed('')); final Parser _closeBrace = _sym('}').recover(succeed('')); /// Parameterized pipe op: `name(expr)` with tolerant inner and close. -Parser _paramOp( - String name, - LamExpr Function(LamExpr) ctor, -) => _sym( - name, -).skipThen(_sym('(')).skipThen(_innerExpr).thenSkip(_closeParen).map(ctor); +/// +/// [astName] is the canonical op name written into [BuiltinPipeOp]; +/// [synName] is the keyword the parser matches in the source. They +/// differ for jq-idiom aliases (e.g. parser sees `tonumber`, AST says +/// `to_number`). For canonical ops the two are equal. +Parser _paramOp(String synName, String astName) => + _sym(synName) + .skipThen(_sym('(')) + .skipThen(_innerExpr) + .thenSkip(_closeParen) + .map((inner) => BuiltinPipeOp(astName, [inner])); /// `as(format)` parser: shape-directed bridge to an output format. /// @@ -260,8 +265,8 @@ final Parser _asOp = _sym('as') /// so `sort_by` is tried before `sort`). 
Each spec contributes one /// alternative whose shape depends on [shape_ops.PipeOpParseKind]: /// -/// - `zeroArg` → `_kw(name).as(zeroArgCtor())` -/// - `oneArg` → `_paramOp(name, oneArgCtor)` +/// - `zeroArg` → `_kw(name).as(BuiltinPipeOp(name, const []))` +/// - `oneArg` → `_paramOp(name, name)` (builds `BuiltinPipeOp(name, [arg])`) /// - `custom` → hand-written rule (currently only `as(fmt)`, which /// takes a closed keyword set rather than an arbitrary expression). /// @@ -274,18 +279,24 @@ final Parser _pipeOp = _buildPipeOp(); /// existing Lambé op. Registered at the parser layer so shape/eval /// stay unaware. Canonical name is what `--print-shape` / `--explain` /// emit; these just let jq-trained agents land the query. -const Map _jqAliases = { - 'tonumber': 'to_number', -}; +/// +/// Only entries whose jq semantics match an existing Lambé op exactly +/// belong here. `select` deliberately stays out — `select(p)` is only +/// valid inside `filter(...)` in Lambé and an alias would mislead; +/// `_jqIdiomHint` already steers users to `filter`. `paths`, +/// `recurse`, etc. need pattern hints, not aliases. +const Map _jqAliases = {'tonumber': 'to_number', 'add': 'sum'}; Parser _buildPipeOp() { final alternatives = >[]; for (final spec in shape_ops.pipeOpSpecs) { switch (spec.parseKind) { case shape_ops.PipeOpParseKind.zeroArg: - alternatives.add(_kw(spec.name).as(spec.zeroArgCtor!())); + alternatives.add( + _kw(spec.name).as(BuiltinPipeOp(spec.name, const [])), + ); case shape_ops.PipeOpParseKind.oneArg: - alternatives.add(_paramOp(spec.name, spec.oneArgCtor!)); + alternatives.add(_paramOp(spec.name, spec.name)); case shape_ops.PipeOpParseKind.custom: // Handled below. 
break; @@ -301,9 +312,11 @@ Parser _buildPipeOp() { if (canonical == null) continue; switch (canonical.parseKind) { case shape_ops.PipeOpParseKind.zeroArg: - alternatives.add(_kw(entry.key).as(canonical.zeroArgCtor!())); + alternatives.add( + _kw(entry.key).as(BuiltinPipeOp(canonical.name, const [])), + ); case shape_ops.PipeOpParseKind.oneArg: - alternatives.add(_paramOp(entry.key, canonical.oneArgCtor!)); + alternatives.add(_paramOp(entry.key, canonical.name)); case shape_ops.PipeOpParseKind.custom: break; } @@ -356,8 +369,9 @@ final Parser _postfix = rule( /// `/` must not match the first `/` of `//` (alternative operator). Other /// single-char ops don't have a longer variant that would be ambiguous at /// the binary-operator level, so only `/` needs a notFollowedBy guard. -final Parser _divSym = - _lex(string('/').thenSkip(char('/').notFollowedBy)); +final Parser _divSym = _lex( + string('/').thenSkip(char('/').notFollowedBy), +); /// Lambé's symbol parser routing: `/` requires a not-followed-by guard /// so it doesn't shadow the `//` alternative; everything else is a diff --git a/lib/src/readline.dart b/lib/src/readline.dart index fe0fda9..e000c36 100644 --- a/lib/src/readline.dart +++ b/lib/src/readline.dart @@ -2,11 +2,17 @@ /// /// Handles printable characters, cursor movement, history navigation, /// tab completion with common-prefix fill, and standard editing shortcuts. -/// No external dependencies - uses only `dart:io`. +/// Uses [rumil_tokens] for the syntax highlighter; no other external +/// runtime dependency. library; import 'dart:io'; +import 'package:rumil/rumil.dart'; +import 'package:rumil_tokens/rumil_tokens.dart'; + +import 'highlight_grammar.dart'; + /// Callback for tab completion. /// /// Takes the current input [text] and [cursor] position. Returns a record @@ -380,115 +386,58 @@ const _hYellow = '\x1b[33m'; const _hMagenta = '\x1b[35m'; const _hRed = '\x1b[31m'; -/// Colorize a buffer for display. 
Lightweight lexer-level scan - not a full -/// parse, but good enough for interactive highlighting. +/// rumil_tokens parser for the Lambé grammar. Built once at module +/// load time so per-keystroke highlighting only pays the run cost. +final Parser>> _highlightTokenizer = + buildTokenizer(lambeGrammar); + +/// Colorize a buffer for display. +/// +/// Tokenizes through [rumil_tokens] and maps each token kind to an +/// ANSI color. The tokenizer is lossless: concatenating the colored +/// segments reproduces the original input verbatim. On the rare +/// tokenizer failure the raw text is returned uncolored so the user +/// still sees what they typed. String _highlight(List buf) { if (buf.isEmpty) return ''; - - final out = StringBuffer(); final text = String.fromCharCodes(buf); - var i = 0; - - while (i < text.length) { - final c = text[i]; - - if (c == '"') { - out.write(_hGreen); - out.write('"'); - i++; - while (i < text.length && text[i] != '"') { - if (text[i] == r'\' && i + 1 < text.length) { - out.write(text[i]); - out.write(text[i + 1]); - i += 2; - } else { - out.write(text[i]); - i++; - } - } - if (i < text.length) { - out.write('"'); - i++; - } - out.write(_hReset); - continue; - } - - if ((c == '-' && i + 1 < text.length && _isDigit(text.codeUnitAt(i + 1))) || - _isDigit(c.codeUnitAt(0))) { - out.write(_hYellow); - if (c == '-') { - out.write(c); - i++; - } - while (i < text.length && - (_isDigit(text.codeUnitAt(i)) || text[i] == '.')) { - out.write(text[i]); - i++; - } - out.write(_hReset); - continue; - } - - if ('|><=!&+-*/%'.contains(c)) { - out.write(_hDim); - out.write(c); - if (i + 1 < text.length && '|&='.contains(text[i + 1])) { - out.write(text[i + 1]); - i++; - } - out.write(_hReset); - i++; - continue; - } - - if ('()[]{}:,'.contains(c)) { - out.write(_hDim); - out.write(c); - out.write(_hReset); - i++; - continue; - } - - if (c == '.') { - out.write(_hCyan); - out.write('.'); - i++; - while (i < text.length && 
_isWordChar(text.codeUnitAt(i))) { - out.write(text[i]); - i++; - } + final result = _highlightTokenizer.run(text); + final spans = switch (result) { + Success>>(:final value) => value, + Partial>>(:final value) => value, + Failure>>() => null, + }; + if (spans == null) return text; + final out = StringBuffer(); + for (final span in spans) { + final color = _colorFor(span.token); + if (color.isEmpty) { + out.write(span.token.text); + } else { + out.write(color); + out.write(span.token.text); out.write(_hReset); - continue; - } - - if (_isWordChar(c.codeUnitAt(0))) { - final start = i; - while (i < text.length && _isWordChar(text.codeUnitAt(i))) { - i++; - } - final word = text.substring(start, i); - switch (word) { - case 'true' || 'false': - out.write('$_hMagenta$word$_hReset'); - case 'null': - out.write('$_hRed$word$_hReset'); - case 'if' || 'then' || 'else': - out.write('$_hMagenta$word$_hReset'); - default: - out.write(word); - } - continue; } - - out.write(c); - i++; } - return out.toString(); } -bool _isDigit(int c) => c >= 0x30 && c <= 0x39; +/// ANSI color (or empty string for "no color") for [token]. +/// +/// Choices preserve the previous hand-rolled highlighter's vibe: +/// strings green, numbers yellow, keywords magenta, `null` red, +/// punctuation/operators dim, `.` cyan (the field-access mark). +String _colorFor(Token token) => switch (token) { + StringLit() => _hGreen, + NumberLit() => _hYellow, + Keyword(text: 'null') => _hRed, + Keyword() => _hMagenta, + Punctuation(text: '.') => _hCyan, + Operator() || Punctuation() => _hDim, + Comment() => _hDim, + Annotation() => _hCyan, + _ => '', +}; bool _isWordChar(int c) => (c >= 0x30 && c <= 0x39) || diff --git a/lib/src/shape/explain.dart b/lib/src/shape/explain.dart index 2ba0e68..a046e75 100644 --- a/lib/src/shape/explain.dart +++ b/lib/src/shape/explain.dart @@ -238,21 +238,22 @@ String? 
_analyzePredicate(LamExpr op, Shape inputShape) { // determined by the inner shape; absence is handled by the // runtime-rejection warning elsewhere. final concrete = inputShape is SOptional ? inputShape.inner : inputShape; - switch (op) { - case FilterOp(:final predicate): + if (op is! BuiltinPipeOp) return null; + switch (op.name) { + case 'filter': final element = concrete is SList ? concrete.element : const SAny(); - return _predicateWarning(predicate, element, 'filter', 'element'); - case FilterValuesOp(:final predicate): + return _predicateWarning(op.args[0], element, 'filter', 'element'); + case 'filter_values': final value = switch (concrete) { SMap(:final fields) when fields.isNotEmpty => fields.values.reduce( (a, b) => a == b ? a : const SAny(), ), _ => const SAny(), }; - return _predicateWarning(predicate, value, 'filter_values', 'value'); - case FilterKeysOp(:final predicate): + return _predicateWarning(op.args[0], value, 'filter_values', 'value'); + case 'filter_keys': return _predicateWarning( - predicate, + op.args[0], const SString(), 'filter_keys', 'key', @@ -320,11 +321,9 @@ String? _analyzeRejection(LamExpr op, Shape inputShape) { /// lists (outer shape errors surface as runtime-rejection warnings /// instead), or when the argument references a field that may exist. String? _analyzeTrivial(LamExpr op, Shape inputShape) { - final (argExpr, opName) = switch (op) { - SortByOp(:final key) => (key, 'sort_by'), - GroupByOp(:final key) => (key, 'group_by'), - MapOp(:final transform) => (transform, 'map'), - UniqueByOp(:final key) => (key, 'unique_by'), + if (op is! 
BuiltinPipeOp) return null; + final (argExpr, opName) = switch (op.name) { + 'sort_by' || 'group_by' || 'map' || 'unique_by' => (op.args[0], op.name), _ => (null, null), }; if (argExpr == null || opName == null) return null; @@ -417,32 +416,9 @@ String _render(LamExpr expr) => switch (expr) { UnaryOp(:final op, :final operand) => '$op${_render(operand)}', BinaryOp(:final op, :final left, :final right) => '${_render(left)} $op ${_render(right)}', - FilterOp(:final predicate) => 'filter(${_render(predicate)})', - MapOp(:final transform) => 'map(${_render(transform)})', - SortByOp(:final key) => 'sort_by(${_render(key)})', - GroupByOp(:final key) => 'group_by(${_render(key)})', - UniqueByOp(:final key) => 'unique_by(${_render(key)})', - FilterValuesOp(:final predicate) => 'filter_values(${_render(predicate)})', - MapValuesOp(:final transform) => 'map_values(${_render(transform)})', - FilterKeysOp(:final predicate) => 'filter_keys(${_render(predicate)})', - HasOp(:final key) => 'has(${_render(key)})', - SortOp() => 'sort', - ReverseOp() => 'reverse', - KeysOp() => 'keys', - ValuesOp() => 'values', - LengthOp() => 'length', - FirstOp() => 'first', - LastOp() => 'last', - SumOp() => 'sum', - AvgOp() => 'avg', - MinOp() => 'min', - MaxOp() => 'max', - UniqueOp() => 'unique', - FlattenOp() => 'flatten', - ToEntriesOp() => 'to_entries', - FromEntriesOp() => 'from_entries', - ToNumberOp() => 'to_number', - TypeOp() => 'type', + BuiltinPipeOp(:final name, :final args) when args.isEmpty => name, + BuiltinPipeOp(:final name, :final args) => + '$name(${args.map(_render).join(', ')})', As(:final target) => 'as(${target.name})', ObjConstruct(:final entries) => '{${[for (final (k, v) in entries) '$k: ${_render(v)}'].join(', ')}}', @@ -454,8 +430,7 @@ String _render(LamExpr expr) => switch (expr) { 'if ${_render(condition)} then ${_render(then_)} else ${_render(else_)}', Alternative(:final left, :final right) => '${_render(left)} // ${_render(right)}', - ListConstruct(:final parts) 
=> - '[${parts.map(_render).join(', ')}]', + ListConstruct(:final parts) => '[${parts.map(_render).join(', ')}]', }; /// Render an [ExplainReport] as a plaintext table suitable for stdout. diff --git a/lib/src/shape/infer.dart b/lib/src/shape/infer.dart index be750d8..7dca400 100644 --- a/lib/src/shape/infer.dart +++ b/lib/src/shape/infer.dart @@ -109,11 +109,10 @@ Shape inferShape(LamExpr expr, Shape input) { // `[e1, e2, ...]` yields `SList(join(parts))`. Empty list literal // has no element shape, so widen to `SList(SAny)`. - ListConstruct(:final parts) => parts.isEmpty - ? const SList(SAny()) - : SList(parts - .map((p) => inferShape(p, input)) - .reduce(_joinBranches)), + ListConstruct(:final parts) => + parts.isEmpty + ? const SList(SAny()) + : SList(parts.map((p) => inferShape(p, input)).reduce(_joinBranches)), // Pipe ops are handled above via [pipeOpInfoFor]; reaching this // case means the spec table is missing an op AST subtype. Falling diff --git a/lib/src/shape/pipe_ops.dart b/lib/src/shape/pipe_ops.dart index d6b9685..389f8a7 100644 --- a/lib/src/shape/pipe_ops.dart +++ b/lib/src/shape/pipe_ops.dart @@ -1,10 +1,11 @@ -/// Single source of truth for pipe-op metadata. +/// Single source of truth for pipe-op metadata, runtime, and parsing. /// /// Each [PipeOpInfo] record describes one pipe operation: its canonical /// name, which input [Shape]s it accepts (structurally; element-level -/// constraints are not modelled), and how it transforms the input shape -/// into an output shape. The parser's [pipeOpNames], the completer's -/// shape-gated candidate filter, and [inferShape]'s per-op cases all +/// constraints are not modelled), how it transforms the input shape into +/// an output shape, and how it evaluates at runtime. 
The parser's +/// [pipeOpNames], the completer's shape-gated candidate filter, +/// [inferShape]'s per-op cases, and the evaluator's per-op dispatch all /// derive from these specs, so adding or renaming an op is a single- /// file change. /// @@ -25,34 +26,48 @@ /// kind and cross-checking with the evaluator. library; +import 'dart:convert'; + +import 'package:rumil_expressions/rumil_expressions.dart' + show compareValues, typeName; + import '../ast.dart'; +import '../errors.dart'; import 'shape.dart'; +/// Recursive evaluator callback. The spec's `eval` field invokes this to +/// evaluate sub-expressions (predicates, key extractors, transforms) +/// against a given context. Pipe ops do not import the evaluator +/// directly; they reach back through this callback to keep the +/// dependency direction acyclic. +typedef PipeOpEval = Object? Function(LamExpr expr, Object? ctx); + /// How the parser should build a grammar rule for this op. /// /// - [zeroArg]: bare keyword followed by a word boundary — `sort`, -/// `length`, `to_entries`. Constructed via the spec's `zeroArgCtor`. +/// `length`, `to_entries`. Parser builds `BuiltinPipeOp(name, [])`. /// - [oneArg]: keyword followed by `(expr)` with tolerant inner and -/// close paren — `filter(...)`, `map(...)`. Constructed via the -/// spec's `oneArgCtor`. +/// close paren — `filter(...)`, `map(...)`. Parser builds +/// `BuiltinPipeOp(name, [innerExpr])`. /// - [custom]: the op has grammar the generic rules cannot express -/// (e.g. `as(fmt)` takes a keyword set, not an arbitrary expression). -/// The parser hand-writes these rules and the spec table provides -/// metadata only. +/// (e.g. `as(fmt)` takes a closed keyword set, not an arbitrary +/// expression). The parser hand-writes these rules and the spec +/// provides shape metadata only; runtime dispatch lives outside +/// [BuiltinPipeOp] (see [As]). enum PipeOpParseKind { /// Bare keyword followed by a word boundary — `sort`, `length`, - /// `to_entries`. 
Parser builds `_kw(name).as(zeroArgCtor())`. + /// `to_entries`. Parser builds `BuiltinPipeOp(name, const [])`. zeroArg, /// Keyword followed by `(expr)` with a tolerant inner expression and /// close paren — `filter(.x)`, `map(.y)`, `sort_by(.name)`. Parser - /// builds `_paramOp(name, oneArgCtor)`. + /// builds `BuiltinPipeOp(name, [innerExpr])`. oneArg, /// Op has custom grammar not expressible as `zeroArg` or `oneArg`, /// e.g. `as(fmt)` takes a closed keyword set instead of an arbitrary - /// expression. Parser hand-writes the rule; the spec table supplies - /// only the name and shape-inference metadata. + /// expression. Parser hand-writes the rule; runtime dispatch lives + /// in a dedicated AST node (see [As]). custom, } @@ -60,70 +75,39 @@ enum PipeOpParseKind { /// /// The `accepts` field is a structural predicate on the input shape. /// The `infer` field is the shape transformer — given the input shape -/// and the AST node for this op (so parameterized ops can recurse -/// into their inner expression), it returns the output shape. -/// -/// The `parseKind`, `zeroArgCtor`, and `oneArgCtor` fields let the -/// parser build its pipe-op grammar rules from this table rather -/// than hand-writing them per op. A spec's `parseKind` determines -/// which constructor reference is consulted: +/// and the AST node for this op (so parameterized ops can recurse into +/// their inner expression), it returns the output shape. The `eval` +/// field is the runtime evaluator — given the input value, the parsed +/// argument expressions, and the recursive [PipeOpEval] callback, it +/// returns the op's result. /// -/// - `zeroArg` → `zeroArgCtor!()` produces the AST node. -/// - `oneArg` → `oneArgCtor!(innerExpr)` produces the AST node. -/// - `custom` → the parser handles it with a hand-written rule; both -/// ctor fields may be null. 
-/// -/// The `infer` function receives the op AST node itself, not the -/// surrounding [Pipe] — callers must destructure the specific op type -/// they expect. Since [PipeOpInfo] is looked up by AST runtime type, -/// the match is exhaustive at registration time. +/// `parseKind` tells the parser which generic rule shape this op uses. +/// `custom` ops are hand-written in the parser and use dedicated AST +/// nodes; their `eval` field is unreachable via the [BuiltinPipeOp] +/// dispatch and may be omitted. typedef PipeOpInfo = ({ String name, bool Function(Shape input) accepts, Shape Function(Shape input, LamExpr op) infer, + Object? Function(Object? ctx, List args, PipeOpEval eval) eval, PipeOpParseKind parseKind, - LamExpr Function()? zeroArgCtor, - LamExpr Function(LamExpr)? oneArgCtor, }); /// Look up the spec for a pipe-op AST node, or `null` if [node] is not /// a pipe op. /// -/// Returns `null` for non-op expressions that happen to appear on the -/// right-hand side of a pipe (object constructors, literals, etc.). -/// The shape inference and completer code paths that consume the spec -/// must handle `null` by falling back to generic expression inference. -PipeOpInfo? 
pipeOpInfoFor(LamExpr node) => switch (node) { - FilterOp _ => _filterSpec, - MapOp _ => _mapSpec, - SortOp _ => _sortSpec, - ReverseOp _ => _reverseSpec, - KeysOp _ => _keysSpec, - ValuesOp _ => _valuesSpec, - LengthOp _ => _lengthSpec, - FirstOp _ => _firstSpec, - LastOp _ => _lastSpec, - SumOp _ => _sumSpec, - AvgOp _ => _avgSpec, - MinOp _ => _minSpec, - MaxOp _ => _maxSpec, - SortByOp _ => _sortBySpec, - GroupByOp _ => _groupBySpec, - UniqueOp _ => _uniqueSpec, - UniqueByOp _ => _uniqueBySpec, - FlattenOp _ => _flattenSpec, - FilterValuesOp _ => _filterValuesSpec, - MapValuesOp _ => _mapValuesSpec, - FilterKeysOp _ => _filterKeysSpec, - HasOp _ => _hasSpec, - ToEntriesOp _ => _toEntriesSpec, - FromEntriesOp _ => _fromEntriesSpec, - ToNumberOp _ => _toNumberSpec, - TypeOp _ => _typeSpec, - As _ => _asSpec, - _ => null, -}; +/// Recognises the unified [BuiltinPipeOp] dispatch and the dedicated +/// [As] node (the only custom-arity op). Returns `null` for non-op +/// expressions that happen to appear on the right-hand side of a pipe +/// (object constructors, literals, etc.). The shape inference and +/// completer code paths that consume the spec must handle `null` by +/// falling back to generic expression inference. +PipeOpInfo? pipeOpInfoFor(LamExpr node) { + if (node is BuiltinPipeOp) return _specsByName[node.name]; + if (node is As) return _asSpec; + return null; +} /// Spec lookup by op name. Returns `null` for names that are not in /// the spec table. @@ -189,27 +173,26 @@ Shape inferPipeOpShape(Shape input, LamExpr op) { return info.infer(input, op); } -// Sentinel specs. Each is a one-shot record whose `infer` closes over -// its own op-type expectations; the AST parameter is destructured -// where needed (parameterized ops only). -// -// Conventions: -// - `identity` in `infer` means "this op does not change the shape" -// (e.g. `sort` on a list). Ops whose output shape equals the input -// shape by design use this. 
-// - For ops that only work on one shape kind, `accepts` is the -// positive predicate and `infer` can rely on the input matching. -// [inferPipeOpShape] has already gated on `accepts`, so the pattern -// match on `input` is exhaustive against the accepted cases. +/// Evaluate a built-in pipe op against [ctx]. +/// +/// Looks up the spec for [op]'s name and invokes its `eval` field with +/// the parsed argument expressions and the recursive evaluator +/// callback. Throws [QueryError] if [op]'s name has no registered +/// spec — that means the parser produced a node the table does not +/// know about, which is a programmer error rather than a user-input +/// error. +Object? evalBuiltinPipeOp(BuiltinPipeOp op, Object? ctx, PipeOpEval eval) { + final spec = _specsByName[op.name]; + if (spec == null) { + throw QueryError('${op.name}: no registered pipe-op spec'); + } + return spec.eval(ctx, op.args, eval); +} // Every predicate treats [SAny] as accepted. Inference cannot prove // an SAny input will fail at runtime, so rejecting it would hide // correct candidates from the completer — a violation of the // design invariant documented at the top of this file. -// -// Putting the SAny check inside every predicate, rather than at the -// call site, keeps the invariant a property of the spec table -// itself: any new spec defined via these helpers inherits it. // Optional wraps the value's potential absence. For acceptance // purposes, unwrap: if the inner shape is accepted, so is the @@ -246,6 +229,44 @@ bool _acceptsStringOrNum(Shape s) { bool _acceptsAny(Shape _) => true; +// ------------------------------------------------------------------ +// Runtime helpers shared across op evals. +// ------------------------------------------------------------------ + +List _asList(Object? v, String ctx) { + if (v is List) return v; + throw QueryError('$ctx: expected list, got ${typeName(v)}'); +} + +Map _asMap(Object? 
v, String ctx) { + if (v is Map) return v; + throw QueryError('$ctx: expected map, got ${typeName(v)}'); +} + +/// Canonical string representation of [value] for use as a hash key. +/// +/// Dart's native equality on `List` and `Map` is reference-based, so +/// structurally-equal collections compare as unequal. `unique`, +/// `unique_by`, and `group_by` need structural equality to behave +/// sensibly. Encoding the value as JSON with sorted map keys gives a +/// stable, equality-friendly key. +String _canonicalKey(Object? value) => jsonEncode(_sortKeys(value)); + +Object? _sortKeys(Object? value) { + if (value is Map) { + final sorted = {}; + final keys = value.keys.toList()..sort(); + for (final k in keys) { + sorted[k] = _sortKeys(value[k]); + } + return sorted; + } + if (value is List) { + return [for (final e in value) _sortKeys(e)]; + } + return value; +} + // --- List-consuming ops -------------------------------------------- final PipeOpInfo _filterSpec = ( @@ -253,9 +274,15 @@ final PipeOpInfo _filterSpec = ( accepts: _acceptsList, // `filter` preserves the list-of-elements shape. 
infer: (input, _) => input, + eval: (ctx, args, eval) { + final list = _asList(ctx, 'filter'); + final pred = args[0]; + return [ + for (final item in list) + if (eval(pred, item) == true) item, + ]; + }, parseKind: PipeOpParseKind.oneArg, - zeroArgCtor: null, - oneArgCtor: FilterOp.new, ); final PipeOpInfo _mapSpec = ( @@ -263,59 +290,87 @@ final PipeOpInfo _mapSpec = ( accepts: _acceptsList, infer: (input, op) => switch ((input, op)) { - (SList(element: final e), MapOp(:final transform)) => SList( - _inferSubExpr(transform, e), - ), + ( + SList(element: final e), + BuiltinPipeOp(name: 'map', args: [final transform]), + ) => + SList(_inferSubExpr(transform, e)), _ => const SAny(), }, + eval: (ctx, args, eval) { + final list = _asList(ctx, 'map'); + final transform = args[0]; + return [for (final item in list) eval(transform, item)]; + }, parseKind: PipeOpParseKind.oneArg, - zeroArgCtor: null, - oneArgCtor: MapOp.new, ); final PipeOpInfo _sortSpec = ( name: 'sort', accepts: _acceptsList, infer: (input, _) => input, + eval: (ctx, _, _) { + final list = List.of(_asList(ctx, 'sort')); + list.sort(compareValues); + return list; + }, parseKind: PipeOpParseKind.zeroArg, - zeroArgCtor: SortOp.new, - oneArgCtor: null, ); final PipeOpInfo _reverseSpec = ( name: 'reverse', accepts: _acceptsList, infer: (input, _) => input, + eval: (ctx, _, _) => List.of(_asList(ctx, 'reverse').reversed), parseKind: PipeOpParseKind.zeroArg, - zeroArgCtor: ReverseOp.new, - oneArgCtor: null, ); final PipeOpInfo _sortBySpec = ( name: 'sort_by', accepts: _acceptsList, infer: (input, _) => input, + eval: (ctx, args, eval) { + final list = List.of(_asList(ctx, 'sort_by')); + final key = args[0]; + list.sort((a, b) => compareValues(eval(key, a), eval(key, b))); + return list; + }, parseKind: PipeOpParseKind.oneArg, - zeroArgCtor: null, - oneArgCtor: SortByOp.new, ); final PipeOpInfo _uniqueSpec = ( name: 'unique', accepts: _acceptsList, infer: (input, _) => input, + // `unique` distinguishes int 
from double even when numerically + // equal: `unique([1, 1.0])` keeps both because the canonical + // encodings differ. Use `unique_by(.)` with `to_number` if numeric + // equality is required. + eval: (ctx, _, _) { + final list = _asList(ctx, 'unique'); + final seen = {}; + return [ + for (final item in list) + if (seen.add(_canonicalKey(item))) item, + ]; + }, parseKind: PipeOpParseKind.zeroArg, - zeroArgCtor: UniqueOp.new, - oneArgCtor: null, ); final PipeOpInfo _uniqueBySpec = ( name: 'unique_by', accepts: _acceptsList, infer: (input, _) => input, + eval: (ctx, args, eval) { + final list = _asList(ctx, 'unique_by'); + final key = args[0]; + final seen = {}; + return [ + for (final item in list) + if (seen.add(_canonicalKey(eval(key, item)))) item, + ]; + }, parseKind: PipeOpParseKind.oneArg, - zeroArgCtor: null, - oneArgCtor: UniqueByOp.new, ); final PipeOpInfo _flattenSpec = ( @@ -330,11 +385,24 @@ final PipeOpInfo _flattenSpec = ( SList() => const SList(SAny()), _ => const SAny(), }, + eval: (ctx, _, _) { + final list = _asList(ctx, 'flatten'); + return [ + for (final item in list) + if (item is List) ...item else item, + ]; + }, parseKind: PipeOpParseKind.zeroArg, - zeroArgCtor: FlattenOp.new, - oneArgCtor: null, ); +// Empty-list policy: +// - Operations with an identity element (sum -> 0) return that. +// - Operations that pick an existing element (first, last) return +// null. +// - Operations that compute a property of the elements (min, max, +// avg) throw, because no sensible result exists. +// +// This mirrors how most languages handle the same situations. final PipeOpInfo _firstSpec = ( name: 'first', accepts: _acceptsList, @@ -343,9 +411,11 @@ final PipeOpInfo _firstSpec = ( SList(element: final e) => e, _ => const SAny(), }, + eval: (ctx, _, _) { + final list = _asList(ctx, 'first'); + return list.isEmpty ? 
null : list.first; + }, parseKind: PipeOpParseKind.zeroArg, - zeroArgCtor: FirstOp.new, - oneArgCtor: null, ); final PipeOpInfo _lastSpec = ( @@ -356,27 +426,48 @@ final PipeOpInfo _lastSpec = ( SList(element: final e) => e, _ => const SAny(), }, + eval: (ctx, _, _) { + final list = _asList(ctx, 'last'); + return list.isEmpty ? null : list.last; + }, parseKind: PipeOpParseKind.zeroArg, - zeroArgCtor: LastOp.new, - oneArgCtor: null, ); final PipeOpInfo _sumSpec = ( name: 'sum', accepts: _acceptsList, infer: (_, _) => const SNum(), + eval: (ctx, _, _) { + final list = _asList(ctx, 'sum'); + num total = 0; + for (final item in list) { + if (item is! num) { + throw QueryError('sum: expected number, got ${typeName(item)}'); + } + total += item; + } + return total; + }, parseKind: PipeOpParseKind.zeroArg, - zeroArgCtor: SumOp.new, - oneArgCtor: null, ); final PipeOpInfo _avgSpec = ( name: 'avg', accepts: _acceptsList, infer: (_, _) => const SNum(), + eval: (ctx, _, _) { + final list = _asList(ctx, 'avg'); + if (list.isEmpty) throw const QueryError('avg: empty list'); + num total = 0; + for (final item in list) { + if (item is! 
num) { + throw QueryError('avg: expected number, got ${typeName(item)}'); + } + total += item; + } + return total.toDouble() / list.length; + }, parseKind: PipeOpParseKind.zeroArg, - zeroArgCtor: AvgOp.new, - oneArgCtor: null, ); final PipeOpInfo _minSpec = ( @@ -387,9 +478,16 @@ final PipeOpInfo _minSpec = ( SList(element: final e) => e, _ => const SAny(), }, + eval: (ctx, _, _) { + final list = _asList(ctx, 'min'); + if (list.isEmpty) throw const QueryError('min: empty list'); + var best = list.first; + for (var i = 1; i < list.length; i++) { + if (compareValues(list[i], best) < 0) best = list[i]; + } + return best; + }, parseKind: PipeOpParseKind.zeroArg, - zeroArgCtor: MinOp.new, - oneArgCtor: null, ); final PipeOpInfo _maxSpec = ( @@ -400,9 +498,16 @@ final PipeOpInfo _maxSpec = ( SList(element: final e) => e, _ => const SAny(), }, + eval: (ctx, _, _) { + final list = _asList(ctx, 'max'); + if (list.isEmpty) throw const QueryError('max: empty list'); + var best = list.first; + for (var i = 1; i < list.length; i++) { + if (compareValues(list[i], best) > 0) best = list[i]; + } + return best; + }, parseKind: PipeOpParseKind.zeroArg, - zeroArgCtor: MaxOp.new, - oneArgCtor: null, ); final PipeOpInfo _groupBySpec = ( @@ -415,9 +520,26 @@ final PipeOpInfo _groupBySpec = ( ), _ => const SAny(), }, + eval: (ctx, args, eval) { + final list = _asList(ctx, 'group_by'); + final key = args[0]; + // Group on a canonical string representation so structurally-equal + // Maps and Lists compare as equal. A side map preserves the + // original key value for the output record. 
+ final groups = >{}; + final originalKeys = {}; + for (final item in list) { + final k = eval(key, item); + final canonical = _canonicalKey(k); + originalKeys[canonical] = k; + (groups[canonical] ??= []).add(item); + } + return [ + for (final entry in groups.entries) + {'key': originalKeys[entry.key], 'values': entry.value}, + ]; + }, parseKind: PipeOpParseKind.oneArg, - zeroArgCtor: null, - oneArgCtor: GroupByOp.new, ); final PipeOpInfo _fromEntriesSpec = ( @@ -427,9 +549,28 @@ final PipeOpInfo _fromEntriesSpec = ( // Callers that only need to know "is it a map" (e.g. TOML/HCL // writability) still get a correct answer. infer: (_, _) => const SMap({}), + // Non-map entries are rejected explicitly. Earlier silent skipping + // hid bugs where upstream pipelines emitted the wrong shape. + eval: (ctx, _, _) { + final list = _asList(ctx, 'from_entries'); + final result = {}; + for (final item in list) { + if (item is! Map) { + throw QueryError( + 'from_entries: entry must be a map, got ${typeName(item)}', + ); + } + final key = item['key']; + if (key is! 
String) { + throw QueryError( + 'from_entries: entry "key" must be a string, got ${typeName(key)}', + ); + } + result[key] = item['value']; + } + return result; + }, parseKind: PipeOpParseKind.zeroArg, - zeroArgCtor: FromEntriesOp.new, - oneArgCtor: null, ); // --- Map-consuming ops --------------------------------------------- @@ -438,9 +579,15 @@ final PipeOpInfo _filterValuesSpec = ( name: 'filter_values', accepts: _acceptsMap, infer: (input, _) => input, + eval: (ctx, args, eval) { + final map = _asMap(ctx, 'filter_values'); + final predicate = args[0]; + return { + for (final MapEntry(:key, :value) in map.entries) + if (eval(predicate, value) == true) key: value, + }; + }, parseKind: PipeOpParseKind.oneArg, - zeroArgCtor: null, - oneArgCtor: FilterValuesOp.new, ); final PipeOpInfo _mapValuesSpec = ( @@ -448,33 +595,61 @@ final PipeOpInfo _mapValuesSpec = ( accepts: _acceptsMap, infer: (input, op) => switch ((input, op)) { - (SMap(fields: final fields), MapValuesOp(:final transform)) => SMap({ - for (final MapEntry(:key, :value) in fields.entries) - key: _inferSubExpr(transform, value), - }), + ( + SMap(fields: final fields), + BuiltinPipeOp(name: 'map_values', args: [final transform]), + ) => + SMap({ + for (final MapEntry(:key, :value) in fields.entries) + key: _inferSubExpr(transform, value), + }), _ => const SAny(), }, + eval: (ctx, args, eval) { + final map = _asMap(ctx, 'map_values'); + final transform = args[0]; + return { + for (final MapEntry(:key, :value) in map.entries) + key: eval(transform, value), + }; + }, parseKind: PipeOpParseKind.oneArg, - zeroArgCtor: null, - oneArgCtor: MapValuesOp.new, ); final PipeOpInfo _filterKeysSpec = ( name: 'filter_keys', accepts: _acceptsMap, infer: (input, _) => input, + eval: (ctx, args, eval) { + final map = _asMap(ctx, 'filter_keys'); + final predicate = args[0]; + return { + for (final MapEntry(:key, :value) in map.entries) + if (eval(predicate, key) == true) key: value, + }; + }, parseKind: 
PipeOpParseKind.oneArg, - zeroArgCtor: null, - oneArgCtor: FilterKeysOp.new, ); final PipeOpInfo _hasSpec = ( name: 'has', accepts: _acceptsMap, infer: (_, _) => const SBool(), + eval: (ctx, args, eval) { + final key = args[0]; + if (ctx is Map) { + final k = eval(key, ctx); + if (k is String) return ctx.containsKey(k); + throw QueryError('has: key must be a string, got ${typeName(k)}'); + } + if (ctx is List) { + final k = eval(key, ctx); + if (k is num) return k.toInt() >= 0 && k.toInt() < ctx.length; + throw QueryError('has: index must be a number, got ${typeName(k)}'); + } + throw QueryError('has: expected map or list, got ${typeName(ctx)}'); + }, parseKind: PipeOpParseKind.oneArg, - zeroArgCtor: null, - oneArgCtor: HasOp.new, ); final PipeOpInfo _toEntriesSpec = ( @@ -490,9 +665,14 @@ final PipeOpInfo _toEntriesSpec = ( ), _ => const SAny(), }, + eval: (ctx, _, _) { + final map = _asMap(ctx, 'to_entries'); + return [ + for (final MapEntry(:key, :value) in map.entries) + {'key': key, 'value': value}, + ]; + }, parseKind: PipeOpParseKind.zeroArg, - zeroArgCtor: ToEntriesOp.new, - oneArgCtor: null, ); // --- List-or-map ops ----------------------------------------------- @@ -506,9 +686,14 @@ final PipeOpInfo _keysSpec = ( SList() => const SList(SNum()), _ => const SAny(), }, + eval: (ctx, _, _) { + if (ctx is Map) return ctx.keys.toList(); + if (ctx is List) { + return [for (var i = 0; i < ctx.length; i++) i]; + } + throw QueryError('keys: expected map or list, got ${typeName(ctx)}'); + }, parseKind: PipeOpParseKind.zeroArg, - zeroArgCtor: KeysOp.new, - oneArgCtor: null, ); final PipeOpInfo _valuesSpec = ( @@ -521,9 +706,12 @@ final PipeOpInfo _valuesSpec = ( SList() => input, _ => const SAny(), }, + eval: (ctx, _, _) { + if (ctx is Map) return ctx.values.toList(); + if (ctx is List) return ctx; + throw QueryError('values: expected map or list, got ${typeName(ctx)}'); + }, parseKind: PipeOpParseKind.zeroArg, - zeroArgCtor: ValuesOp.new, - oneArgCtor: null, ); // 
--- List, map, or string ------------------------------------------ @@ -532,9 +720,15 @@ final PipeOpInfo _lengthSpec = ( name: 'length', accepts: _acceptsListMapOrString, infer: (_, _) => const SNum(), + eval: (ctx, _, _) { + if (ctx is List) return ctx.length; + if (ctx is Map) return ctx.length; + if (ctx is String) return ctx.length; + throw QueryError( + 'length: expected list, map, or string, got ${typeName(ctx)}', + ); + }, parseKind: PipeOpParseKind.zeroArg, - zeroArgCtor: LengthOp.new, - oneArgCtor: null, ); // --- String or number ---------------------------------------------- @@ -543,9 +737,18 @@ final PipeOpInfo _toNumberSpec = ( name: 'to_number', accepts: _acceptsStringOrNum, infer: (_, _) => const SNum(), + eval: (ctx, _, _) { + if (ctx is num) return ctx; + if (ctx is String) { + final parsed = num.tryParse(ctx); + if (parsed != null) return parsed; + throw QueryError('to_number: cannot parse "$ctx" as a number'); + } + throw QueryError( + 'to_number: expected string or number, got ${typeName(ctx)}', + ); + }, parseKind: PipeOpParseKind.zeroArg, - zeroArgCtor: ToNumberOp.new, - oneArgCtor: null, ); // --- Universal ops ------------------------------------------------- @@ -554,9 +757,22 @@ final PipeOpInfo _typeSpec = ( name: 'type', accepts: _acceptsAny, infer: (_, _) => const SString(), + eval: + (ctx, _, _) => switch (ctx) { + null => 'null', + bool() => 'boolean', + num() => 'number', + String() => 'string', + List() => 'array', + Map() => 'object', + _ => + throw QueryError( + 'type: data contains a non-JSON value (${ctx.runtimeType}). 
' + 'Lambé queries operate on JSON-shaped data — pass results of ' + 'parseInput, jsonDecode, or canonical literals.', + ), + }, parseKind: PipeOpParseKind.zeroArg, - zeroArgCtor: TypeOp.new, - oneArgCtor: null, ); /// `as(target)` is structurally universal: it accepts any shape and @@ -570,14 +786,17 @@ final PipeOpInfo _typeSpec = ( /// `parseKind` is `custom` because `as(fmt)` takes a keyword set /// (`json`, `yaml`, etc.) rather than an arbitrary expression — the /// generic `oneArg` rule cannot express that. The parser hand-writes -/// `_asOp` and this spec provides the name/shape metadata only. +/// `_asOp` and the runtime evaluator handles [As] directly. The `eval` +/// field here is unreachable via [BuiltinPipeOp] dispatch and is a +/// stub. final PipeOpInfo _asSpec = ( name: 'as', accepts: _acceptsAny, infer: (_, _) => const SAny(), + eval: + (_, _, _) => + throw const QueryError('as: dispatched outside BuiltinPipeOp'), parseKind: PipeOpParseKind.custom, - zeroArgCtor: null, - oneArgCtor: null, ); // ------------------------------------------------------------------ diff --git a/pubspec.yaml b/pubspec.yaml index f9b5d2a..97c8994 100644 --- a/pubspec.yaml +++ b/pubspec.yaml @@ -19,6 +19,7 @@ dependencies: rumil: ^0.7.0 rumil_parsers: ^0.7.0 rumil_expressions: ^0.7.0 + rumil_tokens: ^0.1.0 args: ^2.6.0 dart_mcp: ^0.5.0 diff --git a/test/evaluator_test.dart b/test/evaluator_test.dart index e67e49e..546ddbb 100644 --- a/test/evaluator_test.dart +++ b/test/evaluator_test.dart @@ -623,10 +623,7 @@ void main() { }); test('last fallback wins when all are null', () { - expect( - query('.a // .b // "default"', {'a': null, 'b': null}), - 'default', - ); + expect(query('.a // .b // "default"', {'a': null, 'b': null}), 'default'); }); test('right expression not evaluated when left is non-null', () { diff --git a/test/ndjson_test.dart b/test/ndjson_test.dart index 41abf94..f2c9232 100644 --- a/test/ndjson_test.dart +++ b/test/ndjson_test.dart @@ -177,4 +177,31 @@ void 
main() { ]); }); }); + + group('queryNdjsonString: string-expression convenience', () { + test('parses once, applies to every line', () { + final results = + queryNdjsonString([ + '{"name": "alice"}', + '{"name": "bob"}', + ], '.name').toList(); + expect(results, ['alice', 'bob']); + }); + + test('expression syntax error throws QueryError', () { + expect( + () => queryNdjsonString(['{"a": 1}'], '.a |').toList(), + throwsA(isA()), + ); + }); + + test('per-line errors carry line number', () { + expect( + () => queryNdjsonString(['{"a": 1}', 'not json'], '.a').toList(), + throwsA( + predicate((e) => e is QueryError && e.message.contains('line 2')), + ), + ); + }); + }); } diff --git a/test/normalize_test.dart b/test/normalize_test.dart index 1ded2fc..fac469b 100644 --- a/test/normalize_test.dart +++ b/test/normalize_test.dart @@ -141,4 +141,28 @@ void main() { expect(eval(ast, data), 'Bob'); }); }); + + group('canonical inputs short-circuit', () { + // Already-canonical inputs (Map, List, + // scalars) must pass through query() without any per-element rebuild + // — so `.identity` returns the same object, not a fresh copy. + test('canonical map is returned identical', () { + final data = {'a': 1, 'b': 'two'}; + expect(identical(query('.', data), data), isTrue); + }); + + test('canonical list is returned identical', () { + final data = [1, 2, 3]; + expect(identical(query('.', data), data), isTrue); + }); + + test('nested canonical map of list is returned identical', () { + final data = { + 'users': [ + {'name': 'Alice'}, + ], + }; + expect(identical(query('.', data), data), isTrue); + }); + }); } diff --git a/test/parse_error_format_test.dart b/test/parse_error_format_test.dart index fda3a35..6dd30f4 100644 --- a/test/parse_error_format_test.dart +++ b/test/parse_error_format_test.dart @@ -234,7 +234,6 @@ void main() { } }); - test('| if as pipe stage explains the expression-only rule', () { try { parseAst('.x | if . > 0 then . 
else null end'); diff --git a/test/parser_test.dart b/test/parser_test.dart index ea18c2a..17d8af6 100644 --- a/test/parser_test.dart +++ b/test/parser_test.dart @@ -11,6 +11,11 @@ LamExpr _parse(String input) { }; } +void _expectOp(LamExpr op, String name) { + expect(op, isA<BuiltinPipeOp>()); + expect((op as BuiltinPipeOp).name, name); +} + void main() { group('Atoms', () { test('identity (.)', () { @@ -215,7 +220,22 @@ void main() { final expr = _parse('.x | tonumber'); expect(expr, isA<Pipe>()); final pipe = expr as Pipe; - expect(pipe.op, isA<ToNumberOp>()); + expect(pipe.op, isA<BuiltinPipeOp>()); + expect((pipe.op as BuiltinPipeOp).name, 'to_number'); }); + + test('`add` parses as sum (jq alias)', () { + // jq's `add` reduces a list of numbers to their sum, matching + // Lambé's `sum` exactly. Aliased so jq-trained agents land the + // right idiom; the AST and `--explain` output use the canonical + // name. + final viaAlias = _parse('.x | add') as Pipe; + final viaCanonical = _parse('.x | sum') as Pipe; + _expectOp(viaAlias.op, 'sum'); + _expectOp(viaCanonical.op, 'sum'); + // Args identical (both empty) on both sides — same canonical AST. 
+ expect((viaAlias.op as BuiltinPipeOp).args, isEmpty); + expect((viaCanonical.op as BuiltinPipeOp).args, isEmpty); }); test('`and` precedence: .a or .b and .c == .a or (.b and .c)', () { @@ -313,8 +333,9 @@ void main() { final expr = _parse('.users | map([.name, .age])'); expect(expr, isA()); final pipe = expr as Pipe; - expect(pipe.op, isA()); - expect((pipe.op as MapOp).transform, isA()); + expect(pipe.op, isA()); + expect((pipe.op as BuiltinPipeOp).name, 'map'); + expect((pipe.op as BuiltinPipeOp).args[0], isA()); }); }); @@ -324,8 +345,8 @@ void main() { expect(expr, isA()); final pipe = expr as Pipe; expect(pipe.input, isA()); - expect(pipe.op, isA()); - final pred = (pipe.op as FilterOp).predicate; + _expectOp(pipe.op, 'filter'); + final pred = (pipe.op as BuiltinPipeOp).args[0]; expect(pred, isA()); expect((pred as BinaryOp).op, '>'); }); @@ -334,21 +355,21 @@ void main() { final expr = _parse('.users | map(.name)'); expect(expr, isA()); final pipe = expr as Pipe; - expect(pipe.op, isA()); - expect((pipe.op as MapOp).transform, isA()); + _expectOp(pipe.op, 'map'); + expect((pipe.op as BuiltinPipeOp).args[0], isA()); }); test('chained: .users | filter(.active) | map(.name) | sort', () { final expr = _parse('.users | filter(.active) | map(.name) | sort'); expect(expr, isA()); final sort = expr as Pipe; - expect(sort.op, isA()); + _expectOp(sort.op, 'sort'); expect(sort.input, isA()); final map = sort.input as Pipe; - expect(map.op, isA()); + _expectOp(map.op, 'map'); expect(map.input, isA()); final filter = map.input as Pipe; - expect(filter.op, isA()); + _expectOp(filter.op, 'filter'); expect(filter.input, isA()); }); @@ -357,43 +378,43 @@ void main() { expect(expr, isA()); final pipe = expr as Pipe; expect(pipe.input, isA()); - expect(pipe.op, isA()); + _expectOp(pipe.op, 'keys'); }); test('. | values', () { final expr = _parse('. | values'); expect(expr, isA()); - expect((expr as Pipe).op, isA()); + _expectOp((expr as Pipe).op, 'values'); }); test('. 
| length', () { final expr = _parse('. | length'); expect(expr, isA()); - expect((expr as Pipe).op, isA()); + _expectOp((expr as Pipe).op, 'length'); }); test('. | sort', () { final expr = _parse('. | sort'); expect(expr, isA()); - expect((expr as Pipe).op, isA()); + _expectOp((expr as Pipe).op, 'sort'); }); test('. | reverse', () { final expr = _parse('. | reverse'); expect(expr, isA()); - expect((expr as Pipe).op, isA()); + _expectOp((expr as Pipe).op, 'reverse'); }); test('. | first', () { final expr = _parse('. | first'); expect(expr, isA()); - expect((expr as Pipe).op, isA()); + _expectOp((expr as Pipe).op, 'first'); }); test('. | last', () { final expr = _parse('. | last'); expect(expr, isA()); - expect((expr as Pipe).op, isA()); + _expectOp((expr as Pipe).op, 'last'); }); }); @@ -423,7 +444,7 @@ void main() { test('named ops still parse as before', () { final expr = _parse('. | filter(.age > 30)'); expect(expr, isA()); - expect((expr as Pipe).op, isA()); + _expectOp((expr as Pipe).op, 'filter'); }); }); @@ -476,14 +497,14 @@ void main() { final expr = _parse('.users | filter(.tags | length > 0)'); expect(expr, isA()); final pipe = expr as Pipe; - expect(pipe.op, isA()); - final pred = (pipe.op as FilterOp).predicate; + _expectOp(pipe.op, 'filter'); + final pred = (pipe.op as BuiltinPipeOp).args[0]; expect(pred, isA()); final gt = pred as BinaryOp; expect(gt.op, '>'); expect(gt.left, isA()); final inner = gt.left as Pipe; - expect(inner.op, isA()); + _expectOp(inner.op, 'length'); expect(inner.input, isA()); }); diff --git a/test/pipe_ops_consistency_test.dart b/test/pipe_ops_consistency_test.dart index 54e8653..f4e9622 100644 --- a/test/pipe_ops_consistency_test.dart +++ b/test/pipe_ops_consistency_test.dart @@ -35,38 +35,49 @@ final _representatives = { /// AST node to evaluate for each op. Parameterized ops use a minimal /// inner expression — `Identity()` where the evaluator just passes -/// through, and a string literal for `HasOp` which needs a key. 
-LamExpr _opNode(String name) => switch (name) { - 'filter' => const FilterOp(BoolLit(true)), - 'map' => const MapOp(Identity()), - 'sort' => const SortOp(), - 'reverse' => const ReverseOp(), - 'keys' => const KeysOp(), - 'values' => const ValuesOp(), - 'length' => const LengthOp(), - 'first' => const FirstOp(), - 'last' => const LastOp(), - 'sum' => const SumOp(), - 'avg' => const AvgOp(), - 'min' => const MinOp(), - 'max' => const MaxOp(), - 'sort_by' => const SortByOp(Identity()), - 'group_by' => const GroupByOp(Identity()), - 'unique' => const UniqueOp(), - 'unique_by' => const UniqueByOp(Identity()), - 'flatten' => const FlattenOp(), - 'filter_values' => const FilterValuesOp(BoolLit(true)), - 'map_values' => const MapValuesOp(Identity()), - 'filter_keys' => const FilterKeysOp(BoolLit(true)), - 'has' => const HasOp(StrLit('a')), - 'to_entries' => const ToEntriesOp(), - 'from_entries' => const FromEntriesOp(), - 'to_number' => const ToNumberOp(), - 'type' => const TypeOp(), - // `as(json)` is universal; every shape is writable as JSON. - 'as' => const As(OutputFormat.json), - _ => throw StateError('No test AST for op "$name"'), -}; +/// through, `BoolLit(true)` for filter predicates, and a string literal +/// for `has` which needs a key. After the pipe-op AST consolidation, +/// every built-in op resolves to a [BuiltinPipeOp]; the only exception +/// is `as(...)`, which keeps a dedicated AST class for its typed +/// argument. 
+LamExpr _opNode(String name) { + switch (name) { + case 'as': + return const As(OutputFormat.json); + case 'filter': + case 'filter_values': + case 'filter_keys': + return BuiltinPipeOp(name, const [BoolLit(true)]); + case 'has': + return BuiltinPipeOp(name, const [StrLit('a')]); + case 'map': + case 'map_values': + case 'sort_by': + case 'group_by': + case 'unique_by': + return BuiltinPipeOp(name, const [Identity()]); + case 'sort': + case 'reverse': + case 'keys': + case 'values': + case 'length': + case 'first': + case 'last': + case 'sum': + case 'avg': + case 'min': + case 'max': + case 'unique': + case 'flatten': + case 'to_entries': + case 'from_entries': + case 'to_number': + case 'type': + return BuiltinPipeOp(name, const []); + default: + throw StateError('No test AST for op "$name"'); + } +} /// Runtime outcome of evaluating an op against a representative value /// of some shape. @@ -214,58 +225,26 @@ void main() { } }); - test('zeroArg specs have a zeroArgCtor', () { - // The parser iterates over pipeOpSpecs and dereferences - // zeroArgCtor / oneArgCtor based on parseKind. Missing a ctor - // where one is required would be a runtime null-deref in - // parser initialization — catch it here with a clearer error. - for (final spec in pipeOpSpecs) { - if (spec.parseKind == PipeOpParseKind.zeroArg) { - expect( - spec.zeroArgCtor, - isNotNull, - reason: - '${spec.name}.zeroArgCtor must be set when ' - 'parseKind is zeroArg', - ); - } - } - }); - - test('oneArg specs have a oneArgCtor', () { - for (final spec in pipeOpSpecs) { - if (spec.parseKind == PipeOpParseKind.oneArg) { - expect( - spec.oneArgCtor, - isNotNull, - reason: - '${spec.name}.oneArgCtor must be set when ' - 'parseKind is oneArg', - ); - } - } - }); - - test('spec ctor output matches pipeOpInfoFor lookup', () { - // Round-trip: the AST produced by a spec's ctor must map back - // to the same spec via pipeOpInfoFor. 
This pins the ctor and - // the AST-subtype switch together — renaming one without the - // other fails here. + test('BuiltinPipeOp(name) round-trips through pipeOpInfoFor', () { + // Round-trip: building a [BuiltinPipeOp] with a spec's name and + // looking it up via pipeOpInfoFor must yield the same spec. This + // pins the unified dispatch — renaming a spec or breaking the + // name lookup fails here. for (final spec in pipeOpSpecs) { - final LamExpr? node = switch (spec.parseKind) { - PipeOpParseKind.zeroArg => spec.zeroArgCtor!(), - PipeOpParseKind.oneArg => spec.oneArgCtor!(const Identity()), - PipeOpParseKind.custom => null, - }; - if (node == null) continue; + if (spec.parseKind == PipeOpParseKind.custom) continue; + final args = + spec.parseKind == PipeOpParseKind.oneArg + ? const [Identity()] + : const []; + final node = BuiltinPipeOp(spec.name, args); final resolved = pipeOpInfoFor(node); expect( resolved?.name, spec.name, reason: - 'Ctor for ${spec.name} produced an AST that ' - 'pipeOpInfoFor resolved to "${resolved?.name}" instead. ' - 'Ensure the new AST subtype is wired into pipeOpInfoFor.', + 'BuiltinPipeOp("${spec.name}", ...) resolved to ' + '"${resolved?.name}" instead. Ensure the spec is ' + 'registered in _specsByName.', ); } }); diff --git a/test/ring4_test.dart b/test/ring4_test.dart index 6af6ca4..dad722e 100644 --- a/test/ring4_test.dart +++ b/test/ring4_test.dart @@ -37,7 +37,8 @@ void main() { test('parse structure', () { final expr = _parse('. | sort_by(.name)'); expect(expr, isA()); - expect((expr as Pipe).op, isA()); + expect((expr as Pipe).op, isA()); + expect(((expr).op as BuiltinPipeOp).name, 'sort_by'); }); }); @@ -75,7 +76,8 @@ void main() { test('parse structure', () { final expr = _parse('. 
| group_by(.type)'); expect(expr, isA<Pipe>()); - expect((expr as Pipe).op, isA<GroupByOp>()); + expect((expr as Pipe).op, isA<BuiltinPipeOp>()); + expect(((expr).op as BuiltinPipeOp).name, 'group_by'); }); }); diff --git a/test/ring5_test.dart b/test/ring5_test.dart index 97c3152..2589d91 100644 --- a/test/ring5_test.dart +++ b/test/ring5_test.dart @@ -187,7 +187,8 @@ void main() { test('parse structure', () { final expr = _parse('. | has("name")'); expect(expr, isA<Pipe>()); - expect((expr as Pipe).op, isA<HasOp>()); + expect((expr as Pipe).op, isA<BuiltinPipeOp>()); + expect(((expr).op as BuiltinPipeOp).name, 'has'); }); }); @@ -237,7 +238,8 @@ void main() { test('parse structure', () { final expr = _parse('. | to_entries'); expect(expr, isA<Pipe>()); - expect((expr as Pipe).op, isA<ToEntriesOp>()); + expect((expr as Pipe).op, isA<BuiltinPipeOp>()); + expect(((expr).op as BuiltinPipeOp).name, 'to_entries'); }); }); diff --git a/test/string_indexing_test.dart b/test/string_indexing_test.dart new file mode 100644 index 0000000..ac8c4d0 --- /dev/null +++ b/test/string_indexing_test.dart @@ -0,0 +1,57 @@ +/// Pins string single-char indexing semantics. +/// +/// Pre-0.9.0 string slicing (`.name[0:3]`) worked but single-char +/// indexing (`.name[0]`) threw `Cannot index string`. The asymmetry was +/// gratuitous; 0.9.0 mirrors slice semantics — out-of-range returns +/// null (same as list indexing), non-int still throws. 
+library; + +import 'package:lambe/lambe.dart'; +import 'package:test/test.dart'; + +void main() { + group('string single-char indexing', () { + test('first char', () { + expect(query('.name[0]', {'name': 'alice'}), 'a'); + }); + + test('middle char', () { + expect(query('.name[2]', {'name': 'alice'}), 'i'); + }); + + test('last char via -1', () { + expect(query('.name[-1]', {'name': 'alice'}), 'e'); + }); + + test('negative offset within range', () { + expect(query('.name[-3]', {'name': 'alice'}), 'i'); + }); + + test('out of range returns null', () { + expect(query('.name[10]', {'name': 'alice'}), null); + }); + + test('negative out of range returns null', () { + expect(query('.name[-99]', {'name': 'alice'}), null); + }); + + test('empty string is always out of range', () { + expect(query('.name[0]', {'name': ''}), null); + }); + + test('non-int index still throws', () { + expect( + () => query('.name["a"]', {'name': 'alice'}), + throwsA(isA()), + ); + }); + + test('slice still works (regression check)', () { + expect(query('.name[0:3]', {'name': 'alice'}), 'ali'); + }); + + test('slice and index compose', () { + expect(query('.name[0:3] | .[1]', {'name': 'alice'}), 'l'); + }); + }); +} diff --git a/test/tsv_headers_test.dart b/test/tsv_headers_test.dart new file mode 100644 index 0000000..91fa751 --- /dev/null +++ b/test/tsv_headers_test.dart @@ -0,0 +1,83 @@ +/// Pins TSV header detection to match CSV semantics. +/// +/// 0.8.0 always returned `List>` for TSV regardless of +/// whether the first row looked like headers — a silent inconsistency +/// vs the documented CSV model. 0.9.0 runs `detectDialect` for TSV with +/// the tab delimiter forced, so a header row produces +/// `List>` like CSV does. 
+library; + +import 'package:lambe/lambe.dart'; +import 'package:test/test.dart'; + +void main() { + group('TSV header detection mirrors CSV', () { + test('header row produces List', () { + const tsv = + 'name\tage\tcity\n' + 'alice\t30\tboston\n' + 'bob\t25\tseattle\n'; + final result = parseInput(tsv, Format.tsv); + expect(result, isA>()); + final rows = result as List; + expect(rows, hasLength(2)); + expect(rows[0], isA>()); + expect(rows[0], {'name': 'alice', 'age': '30', 'city': 'boston'}); + expect(rows[1], {'name': 'bob', 'age': '25', 'city': 'seattle'}); + }); + + test('all-numeric data (no header) returns List', () { + const tsv = + '1\t2\t3\n' + '4\t5\t6\n' + '7\t8\t9\n'; + final result = parseInput(tsv, Format.tsv); + expect(result, isA>()); + final rows = result as List; + expect(rows, hasLength(3)); + expect(rows[0], isA>()); + expect(rows[0], ['1', '2', '3']); + }); + + test('mixed numeric/string in row 1 does NOT false-detect a header', () { + // detectDialect's heuristic: header iff row1 is all non-numeric AND + // row2 has at least one numeric. A row1 with a number is not a + // header. + const tsv = + 'alice\t30\tboston\n' + 'bob\t25\tseattle\n'; + final result = parseInput(tsv, Format.tsv); + expect(result, isA>()); + final rows = result as List; + expect(rows[0], isA>()); + }); + + test('quoted fields parse correctly under header detection', () { + // Header detection requires row2 to have at least one numeric + // field. With age=30 the heuristic fires and quoted fields in + // headers + data must round-trip through parseDelimitedWithHeaders. 
+ const tsv = + 'name\tage\n' + '"alice, smith"\t30\n'; + final result = parseInput(tsv, Format.tsv); + expect(result, isA>()); + final rows = result as List; + expect(rows, hasLength(1)); + expect(rows[0], isA>()); + final row0 = rows[0] as Map; + expect(row0['name'], 'alice, smith'); + expect(row0['age'], '30'); + }); + + test('CSV with same logical content also returns List', () { + // Sanity: the docstring promise is "TSV honors headers the same + // way CSV does". Pin both side by side. + const csv = + 'name,age,city\n' + 'alice,30,boston\n'; + final csvResult = parseInput(csv, Format.csv); + expect(csvResult, isA>()); + expect((csvResult as List)[0], isA>()); + }); + }); +} From ee83616629d530388d06ca85fe061226916327da Mon Sep 17 00:00:00 2001 From: Hakim Jonas Ghoula Date: Thu, 21 May 2026 20:22:03 +0200 Subject: [PATCH 29/67] chore: gitignore lam binary; fix to_entries doc example output The `lam` AOT binary built into the repo root was tracked as untracked. Now matches the `lam-mcp` entry just below. The to_entries example in doc/syntax.md showed compact single-line output (`-> [{"key": ...}]`) but real `lam` defaults to pretty-printed JSON. Rewrote the example as a runnable echo/lam invocation matching the real output, consistent with the Tier A doc rewrites. Implementer surfaced both as out-of-scope-but-flagged after Tier A landed. --- .gitignore | 1 + doc/syntax.md | 13 +++++++++++-- 2 files changed, 12 insertions(+), 2 deletions(-) diff --git a/.gitignore b/.gitignore index 6deb7f3..1f64299 100644 --- a/.gitignore +++ b/.gitignore @@ -13,6 +13,7 @@ doc/api/ Thumbs.db # Compiled binaries +lam lam-mcp # Local dependency overrides diff --git a/doc/syntax.md b/doc/syntax.md index 1517f5f..51180f2 100644 --- a/doc/syntax.md +++ b/doc/syntax.md @@ -409,8 +409,17 @@ Check if a map contains a key. Convert between maps and `[{key, value}]` lists. 
``` -.config.database | to_entries --> [{"key": "host", "value": "localhost"}, {"key": "port", "value": 5432}] +$ echo '{"config":{"database":{"host":"localhost","port":5432}}}' | lam '.config.database | to_entries' +[ + { + "key": "host", + "value": "localhost" + }, + { + "key": "port", + "value": 5432 + } +] $ echo '[{"key": "a", "value": 1}]' | lam '. | from_entries' { From d2e07e22aaa7c2e6d9534cd191b6718a0257f2f1 Mon Sep 17 00:00:00 2001 From: Hakim Jonas Ghoula Date: Thu, 21 May 2026 21:26:56 +0200 Subject: [PATCH 30/67] feat(cli): --print-shape composes with query expression MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit `lam --print-shape '.users' data.json` now prints the schema of the result of evaluating `.users` rather than the schema of the whole document. Pre-0.9.0 the expression was silently ignored. Composes with the existing inferShape / renderJsonSchema machinery and mirrors --explain's no-data fallback (infer against SAny when no data is available). The single-positional case is disambiguated by file existence: if rest[0] is an existing file, treat it as the file (legacy form); otherwise treat it as an expression. Plain identifier filenames aren't valid lambé queries either, so the collision case is vanishingly unlikely. Empty stdin in static-analysis modes (--explain, --print-shape) is now treated as "no data" rather than triggering a JSON parse error. Tests: 4 new cases pinning compose, legacy, no-data, and null-result behaviours. 1516 -> 1520 tests pass. --- bin/lam.dart | 78 ++++++++++++++++++++++++++++++---- test/cli_integration_test.dart | 66 ++++++++++++++++++++++++++++ 2 files changed, 135 insertions(+), 9 deletions(-) diff --git a/bin/lam.dart b/bin/lam.dart index a7932cb..7b8ad76 100644 --- a/bin/lam.dart +++ b/bin/lam.dart @@ -139,9 +139,27 @@ void main(List arguments) { exit(1); } - final expression = rest.isNotEmpty ? 
rest[0] : '.'; - final fileArgIndex = - (isPrintShapeMode || isInteractive) && rest.length == 1 ? 0 : 1; + final int fileArgIndex; + if (isInteractive && rest.length == 1) { + fileArgIndex = 0; + } else if (isPrintShapeMode && rest.length == 1) { + // --print-shape is overloaded: a single positional may be either + // a file (legacy form: `lam --print-shape data.json`) or an + // expression (compose form: `lam --print-shape '.users'` with + // piped or no data). Disambiguate by file existence — if rest[0] + // names an existing file, treat it as the file; otherwise treat + // it as an expression. The collision case (a file whose name + // happens to be a valid lambé expression like `.users`) is + // vanishingly unlikely; plain identifier filenames aren't valid + // queries either. + fileArgIndex = File(rest[0]).existsSync() ? 0 : 1; + } else { + fileArgIndex = 1; + } + // The expression sits at rest[0] when fileArgIndex isn't 0; when + // fileArgIndex == 0 the user gave a file but no expression, so the + // identity expression is the right default. + final expression = (rest.isNotEmpty && fileArgIndex != 0) ? rest[0] : '.'; // Auto-enable ndjson mode when the file extension suggests it, even // without an explicit --ndjson flag. Consistent with the existing @@ -197,9 +215,11 @@ void main(List arguments) { } input = file.readAsStringSync(); } else if (stdin.hasTerminal) { - // `--explain` performs static analysis and can run without input. - // Every other mode requires a file argument or piped stdin. - if (!isExplainMode) { + // `--explain` and `--print-shape` perform static analysis and can + // run without input — `--print-shape EXPR` falls back to inferring + // from SAny, mirroring the explain-without-data flow. Every other + // mode requires a file argument or piped stdin. + if (!isExplainMode && !isPrintShapeMode) { stderr.writeln('Error: no input. 
Provide a file or pipe data via stdin.'); stderr.writeln(); _usage(argParser); @@ -211,7 +231,14 @@ void main(List arguments) { while ((line = stdin.readLineSync()) != null) { buffer.writeln(line); } - input = buffer.toString(); + // Empty stdin in static-analysis modes (--explain, --print-shape): + // treat as "no data" rather than trying to parse the empty string + // as JSON. This matches the no-stdin branch's contract. + if (buffer.isEmpty && (isExplainMode || isPrintShapeMode)) { + input = null; + } else { + input = buffer.toString(); + } } // Determine input format (only relevant when we have input). @@ -256,6 +283,11 @@ void main(List arguments) { } // --print-shape mode: emit the inferred shape as JSON Schema. + // Composes with the query expression — `lam --print-shape '.users' + // data.json` prints the shape of the result of evaluating `.users`, + // not the whole document. With no expression, prints the document + // shape (the legacy 0.8.0 form). Without data, falls back to + // inferShape against SAny — same as `--explain` without data. if (isPrintShapeMode) { if (schemaPath != null) { stderr.writeln( @@ -264,8 +296,36 @@ void main(List arguments) { ); exit(1); } - final shape = data == null ? const SAny() : shapeOf(data); - stdout.writeln(renderJsonSchema(shape)); + // No expression: print the document shape directly. + final hasExpression = rest.isNotEmpty && fileArgIndex != 0; + if (!hasExpression) { + final shape = data == null ? const SAny() : shapeOf(data); + stdout.writeln(renderJsonSchema(shape)); + return; + } + final LamExpr ast; + try { + ast = parseAst(expression); + } on QueryError catch (e) { + stderr.writeln('Error: ${e.message}'); + exit(1); + } + final Shape resultShape; + if (data == null) { + // Mirror --explain-without-data: infer shape statically against + // the empty-prior SAny. The user gets the static shape of the + // query, the same answer --explain would give. 
+ resultShape = inferShape(ast, const SAny()); + } else { + try { + final result = evaluateAst(ast, data); + resultShape = shapeOf(result); + } on QueryError catch (e) { + stderr.writeln('Error: ${e.message}'); + exit(1); + } + } + stdout.writeln(renderJsonSchema(resultShape)); return; } diff --git a/test/cli_integration_test.dart b/test/cli_integration_test.dart index 935f858..9bb5c6f 100644 --- a/test/cli_integration_test.dart +++ b/test/cli_integration_test.dart @@ -493,6 +493,72 @@ void main() { ); }); + test('composes with EXPR: shape of evaluated result', () async { + // `--print-shape '.users' data.json` returns the schema of the + // users array, not the schema of the whole document. Pre-0.9.0 + // (when this composed) the expression was silently ignored. + final file = File('${tmp.path}/data.json')..writeAsStringSync( + '{"users":[{"name":"alice","age":30}],"version":"1.0.0"}', + ); + final (code, out, _) = await _runLam([ + '--print-shape', + '.users', + file.path, + ]); + expect(code, 0); + final parsed = jsonDecode(out) as Map; + expect(parsed['type'], 'array'); + // items reflect a user, not the whole doc. + final items = parsed['items'] as Map; + expect(items['type'], 'object'); + final props = items['properties'] as Map; + expect(props.keys, containsAll(['name', 'age'])); + }); + + test('no expression form unchanged (legacy)', () async { + // `--print-shape data.json` (single positional that's a file) + // continues to print the whole-document shape, matching the + // 0.8.0 -> 0.9.0 contract. 
+ final file = File('${tmp.path}/data.json') + ..writeAsStringSync('{"a":1,"b":"x"}'); + final (code, out, _) = await _runLam(['--print-shape', file.path]); + expect(code, 0); + final parsed = jsonDecode(out) as Map; + expect(parsed['type'], 'object'); + expect( + (parsed['properties'] as Map).keys, + containsAll(['a', 'b']), + ); + }); + + test('EXPR with no data: matches --explain-without-data', () async { + // `lam --print-shape '.users'` (no file, no piped stdin) + // infers statically from SAny. Because . | .users on SAny + // resolves to SAny, the rendered schema is the empty + // (any-typed) schema. + final (code, out, _) = await _runLam(['--print-shape', '.users']); + expect(code, 0); + // Non-empty output, valid JSON. + expect(out.trim(), isNotEmpty); + final parsed = jsonDecode(out); + expect(parsed, isA>()); + }); + + test('EXPR result is null: schema is the null/empty form', () async { + // .field-that-does-not-exist evaluates to null; shapeOf(null) + // is SNull. The renderer must produce valid JSON Schema for + // that case rather than crashing. + final file = File('${tmp.path}/data.json')..writeAsStringSync('{"a":1}'); + final (code, out, _) = await _runLam([ + '--print-shape', + '.missing', + file.path, + ]); + expect(code, 0); + final parsed = jsonDecode(out); + expect(parsed, isA>()); + }); + test('rejects combination with --schema (redundant)', () async { final data = File('${tmp.path}/d.json')..writeAsStringSync('{}'); final schema = File('${tmp.path}/s.json') From 9da554713452915226da028665bb541d5684b19c Mon Sep 17 00:00:00 2001 From: Hakim Jonas Ghoula Date: Thu, 21 May 2026 21:28:53 +0200 Subject: [PATCH 31/67] fix(explain): suppress writability when runtime-rejection fires MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit When a pipe op's input shape is provably incompatible (e.g. `.config | flatten` on a map), the rejection warning fires AND the post-stage shape widens to SAny. 
SAny passes canWriteAs for every format, so the explain renderer would print `Writable as: json, yaml, toml, csv, tsv, hcl` for a pipeline that will throw before any writer runs. Suppress both writability lists in that case. Text rendering shows `Writable as: (suppressed — runtime-rejection warning above)`. JSON rendering sets `writable_as` and `not_writable_as` to null. Other warning kinds (emptyFilter, trivialResult) don't suppress — emptyFilter pipelines run to completion just with empty results. Tests: 4 new cases (text suppress, text emptyFilter unaffected, JSON suppress, JSON clean pipeline keeps lists). 1520 -> 1524. --- lib/src/shape/explain.dart | 50 ++++++++++++++++++------- test/shape_explain_test.dart | 72 ++++++++++++++++++++++++++++++++++++ 2 files changed, 109 insertions(+), 13 deletions(-) diff --git a/lib/src/shape/explain.dart b/lib/src/shape/explain.dart index a046e75..f7aedce 100644 --- a/lib/src/shape/explain.dart +++ b/lib/src/shape/explain.dart @@ -465,17 +465,27 @@ String renderExplain(ExplainReport report) { } buf.write('\n'); - if (report.writableAs.isNotEmpty) { - buf.write( - 'Writable as: ${report.writableAs.map((f) => f.name).join(", ")}', - ); - buf.write('\n'); - } - if (report.notWritableAs.isNotEmpty) { - buf.write( - 'Not writable as: ${report.notWritableAs.map((f) => f.name).join(", ")}', - ); - buf.write('\n'); + // When a runtime-rejection warning is present earlier in the + // pipeline, the writability lists are misleading: the pipeline will + // throw before any writer runs, but inferShape widens the post- + // rejection shape to SAny, which makes every format pass canWriteAs. + // Suppress the section and surface why. 
+ if (_hasRuntimeRejection(report)) { + buf.write('Writable as: (suppressed — runtime-rejection warning above)\n'); + } else { + if (report.writableAs.isNotEmpty) { + buf.write( + 'Writable as: ${report.writableAs.map((f) => f.name).join(", ")}', + ); + buf.write('\n'); + } + if (report.notWritableAs.isNotEmpty) { + buf.write( + 'Not writable as: ' + '${report.notWritableAs.map((f) => f.name).join(", ")}', + ); + buf.write('\n'); + } } if (report.flattenCells != CellPolicy.refuse) { buf.write('Cell policy: ${report.flattenCells.name}\n'); @@ -483,6 +493,9 @@ String renderExplain(ExplainReport report) { return buf.toString(); } +bool _hasRuntimeRejection(ExplainReport report) => + report.warnings.any((w) => w.kind == WarningKind.runtimeRejection); + /// Render an [ExplainReport] as a JSON string for programmatic /// consumers (agent tooling, build pipelines). /// @@ -494,6 +507,11 @@ String renderExplain(ExplainReport report) { /// carries `stage_index`, `kind` (one of `empty_filter`, /// `runtime_rejection`, `trivial_result`), and `message`. String renderExplainJson(ExplainReport report) { + // Suppress writability when a runtime-rejection warning fires — + // listing every format would mislead, since the pipeline throws + // before any writer runs. Agents should pattern-match on warnings + // first; null on writability is the explicit "uncertain" signal. + final suppressWritability = _hasRuntimeRejection(report); final payload = { 'stages': [ for (final s in report.stages) @@ -507,8 +525,14 @@ String renderExplainJson(ExplainReport report) { 'message': w.message, }, ], - 'writable_as': [for (final f in report.writableAs) f.name], - 'not_writable_as': [for (final f in report.notWritableAs) f.name], + 'writable_as': + suppressWritability + ? null + : [for (final f in report.writableAs) f.name], + 'not_writable_as': + suppressWritability + ? 
null + : [for (final f in report.notWritableAs) f.name], 'flatten_cells': report.flattenCells.name, }; return const JsonEncoder.withIndent(' ').convert(payload); diff --git a/test/shape_explain_test.dart b/test/shape_explain_test.dart index 2dd3bee..90c9b42 100644 --- a/test/shape_explain_test.dart +++ b/test/shape_explain_test.dart @@ -146,6 +146,78 @@ void main() { expect(text, contains('Not writable as:')); expect(text, contains('toml')); }); + + test('suppresses writability when a runtime-rejection warning fires', () { + // `.config | flatten` on a map shape: flatten rejects map at + // runtime, so the post-stage shape is SAny — which would + // otherwise pass canWriteAs for every format. Listing every + // format would mislead because the pipeline will throw before + // any writer runs. + final report = explain( + _parse('.config | flatten'), + const SMap({ + 'config': SMap({'host': SString()}), + }), + ); + // Sanity: the rejection warning is in fact present. + expect( + report.warnings.any((w) => w.kind == WarningKind.runtimeRejection), + isTrue, + ); + final text = renderExplain(report); + expect(text, contains('runtime-rejection warning above')); + expect(text, isNot(contains('Writable as: json'))); + expect(text, isNot(contains('Not writable as:'))); + }); + + test('empty-filter warning alone does NOT suppress writability', () { + // emptyFilter is not runtimeRejection — the pipeline runs to + // completion, just produces an empty result. Writability still + // applies. 
+ final report = explain( + _parse('.users | filter(.missing)'), + const SMap({ + 'users': SList(SMap({'name': SString()})), + }), + ); + expect( + report.warnings.any((w) => w.kind == WarningKind.emptyFilter), + isTrue, + ); + expect( + report.warnings.any((w) => w.kind == WarningKind.runtimeRejection), + isFalse, + ); + final text = renderExplain(report); + expect(text, contains('Writable as:')); + expect(text, isNot(contains('runtime-rejection warning above'))); + }); + }); + + group('renderExplainJson: writability suppression', () { + test('writable_as / not_writable_as become null on runtime-rejection', () { + final report = explain( + _parse('.config | flatten'), + const SMap({ + 'config': SMap({'host': SString()}), + }), + ); + final json = + jsonDecode(renderExplainJson(report)) as Map; + expect(json['writable_as'], isNull); + expect(json['not_writable_as'], isNull); + // warnings still present so consumers can see why. + expect(json['warnings'], isA>()); + expect((json['warnings'] as List), isNotEmpty); + }); + + test('clean pipeline keeps both writability lists', () { + final report = explain(_parse('.'), const SMap({'a': SNum()})); + final json = + jsonDecode(renderExplainJson(report)) as Map; + expect(json['writable_as'], isA>()); + expect(json['not_writable_as'], isA>()); + }); }); group('explain: predicate warnings for provably-empty filters', () { From 8501256615e8720362179d01fc096dcd0e7c9d1f Mon Sep 17 00:00:00 2001 From: Hakim Jonas Ghoula Date: Thu, 21 May 2026 21:55:17 +0200 Subject: [PATCH 32/67] feat(cli): -n / --null-input flag MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit `lam -n '[1,2,3] | unique'` now runs the query against null context with no input. Useful for value computations: literal lists, type ops, scratch arithmetic. 
Without -n, the missing-input guard fires (typo'd filenames and missing redirects are common footguns); the flag puts the "I have no input" intent on the command line where it's visible in scripts. Long form `--null-input` matches jq's spelling exactly. Borrowed convention, not borrowed semantics — the same convergent design that gave us `tonumber → to_number` and `add → sum` jq aliases. Rejects combination with -i, --ndjson, --schema, --assert. Empty piped stdin in evaluation mode now surfaces the standard "no input" error rather than confusing the user with a JSON parse error on the empty string. The test runner's closed empty stdin exposed this latent bug. doc/syntax.md examples that A6 rewrote as `echo '...' | lam '. | op'` revert to the cleaner `lam -n '... | op'` form. Several were also silently broken: lambé object construction uses bare identifiers (`{a: 1}`), not JSON-string keys (`{"a": 1}`), so the pre-A6 doc examples like `[{"key": "a"}] | from_entries` were never runnable. Fixed. doc/getting-started.md gains a "Value computations with no input" section. doc/lam.1.md gains the flag entry; doc/lam.1 regenerated. Tests: 11 new cases pinning the flag, the rejections, and the no-input footgun preservation. 1524 -> 1535. --- bin/lam.dart | 62 ++++++++++++++++++++---- doc/getting-started.md | 21 ++++++++ doc/lam.1 | 3 ++ doc/lam.1.md | 3 ++ doc/syntax.md | 26 +++++----- test/cli_integration_test.dart | 87 ++++++++++++++++++++++++++++++++++ 6 files changed, 181 insertions(+), 21 deletions(-) diff --git a/bin/lam.dart b/bin/lam.dart index 7b8ad76..d1d71f7 100644 --- a/bin/lam.dart +++ b/bin/lam.dart @@ -95,6 +95,14 @@ void main(List arguments) { 'evaluated independently. One result per line on stdout.', negatable: false, ) + ..addFlag( + 'null-input', + abbr: 'n', + help: + 'Run the query against null context with no input. 
Useful ' + 'for value computations: `lam -n \'[1,2,3] | unique\'`.', + negatable: false, + ) ..addFlag('help', abbr: 'h', negatable: false, help: 'Show usage'); final ArgResults args; @@ -123,6 +131,29 @@ void main(List arguments) { final explainJson = args.flag('explain-json'); final isExplainMode = args.flag('explain') || explainTrivial || explainJson; var isNdjsonMode = args.flag('ndjson'); + final nullInput = args.flag('null-input'); + + // -n / --null-input combinations. The flag's purpose is "run the + // query against null with no input"; combinations that take input + // (REPL, ndjson, schema validation, assert) are nonsensical. + if (nullInput) { + if (isInteractive) { + stderr.writeln('Error: -n cannot be combined with --interactive.'); + exit(1); + } + if (isNdjsonMode) { + stderr.writeln('Error: -n cannot be combined with --ndjson.'); + exit(1); + } + if (schemaPath != null) { + stderr.writeln('Error: -n cannot be combined with --schema.'); + exit(1); + } + if (isAssertMode) { + stderr.writeln('Error: -n cannot be combined with --assert.'); + exit(1); + } + } final rest = args.rest; if (rest.isEmpty && !isPrintShapeMode && !isInteractive) { @@ -217,9 +248,10 @@ void main(List arguments) { } else if (stdin.hasTerminal) { // `--explain` and `--print-shape` perform static analysis and can // run without input — `--print-shape EXPR` falls back to inferring - // from SAny, mirroring the explain-without-data flow. Every other - // mode requires a file argument or piped stdin. - if (!isExplainMode && !isPrintShapeMode) { + // from SAny, mirroring the explain-without-data flow. `-n` is the + // explicit "run against null" opt-in. Every other mode requires + // a file argument or piped stdin. + if (!isExplainMode && !isPrintShapeMode && !nullInput) { stderr.writeln('Error: no input. 
Provide a file or pipe data via stdin.'); stderr.writeln(); _usage(argParser); @@ -231,11 +263,25 @@ void main(List arguments) { while ((line = stdin.readLineSync()) != null) { buffer.writeln(line); } - // Empty stdin in static-analysis modes (--explain, --print-shape): - // treat as "no data" rather than trying to parse the empty string - // as JSON. This matches the no-stdin branch's contract. - if (buffer.isEmpty && (isExplainMode || isPrintShapeMode)) { - input = null; + // Empty stdin in static-analysis modes (--explain, --print-shape) + // and explicit null-input mode (-n): treat as "no data" rather + // than trying to parse the empty string as JSON. This matches the + // no-stdin branch's contract. + if (buffer.isEmpty) { + if (isExplainMode || isPrintShapeMode || nullInput) { + input = null; + } else { + // Empty piped stdin in evaluation mode is the same footgun as + // a missing file argument: surface the "no input" message + // rather than confusing the user with a JSON parse error on + // the empty string. + stderr.writeln( + 'Error: no input. Provide a file or pipe data via stdin.', + ); + stderr.writeln(); + _usage(argParser); + exit(1); + } } else { input = buffer.toString(); } diff --git a/doc/getting-started.md b/doc/getting-started.md index 4f4358b..f13764f 100644 --- a/doc/getting-started.md +++ b/doc/getting-started.md @@ -189,6 +189,27 @@ $ echo $? The exit code is 0 if the assertion passes, 1 if it fails. +## Value computations with no input + +For pure value computations — query expressions that build their own +data and don't read from a file or stdin — pass `-n` (`--null-input`): + +```bash +$ lam -n '[1,2,3] | unique' +[ + 1, + 2, + 3 +] + +$ lam -n '[1,2,3] | sum' +6 +``` + +Without `-n`, lambé errors on a missing input — that's deliberate +footgun-catching for typo'd filenames and missing redirects. The +flag makes "I have no input" explicit. 
+ ## The REPL For exploring unfamiliar data, use interactive mode: diff --git a/doc/lam.1 b/doc/lam.1 index ab4fa4e..ea67e2b 100644 --- a/doc/lam.1 +++ b/doc/lam.1 @@ -60,6 +60,9 @@ Start the interactive REPL. Requires a file argument. \fB--ndjson\fR Treat input as ndjson or jsonl: one JSON document per line, evaluated independently with no state shared between lines. Emits one compact JSON result per line on stdout. Auto-enabled when the file extension is \fB.ndjson\fR or \fB.jsonl\fR. Cannot combine with \fB--interactive\fR, \fB--schema\fR, \fB--print-shape\fR, \fB--assert\fR, or \fB--explain\fR. Output must be JSON (\fB--to json\fR or default); other \fB--to\fR values are refused. .TP +\fB-n\fR, \fB--null-input\fR +Run the query against \fBnull\fR context with no input. Useful for value computations like \fClam -n '[1,2,3] | unique'\fR. Without \fB-n\fR, the missing-input guard fires (a typo'd filename or missing redirect is a common footgun); the flag makes the "I have no input" intent explicit. Cannot combine with \fB--interactive\fR, \fB--ndjson\fR, \fB--schema\fR, or \fB--assert\fR. +.TP \fB-h\fR, \fB--help\fR Show usage information. .SH QUERY LANGUAGE diff --git a/doc/lam.1.md b/doc/lam.1.md index 525d46a..fc44188 100644 --- a/doc/lam.1.md +++ b/doc/lam.1.md @@ -68,6 +68,9 @@ If no file is given, reads from standard input. **--ndjson** : Treat input as ndjson or jsonl: one JSON document per line, evaluated independently with no state shared between lines. Emits one compact JSON result per line on stdout. Auto-enabled when the file extension is **.ndjson** or **.jsonl**. Cannot combine with **--interactive**, **--schema**, **--print-shape**, **--assert**, or **--explain**. Output must be JSON (**--to json** or default); other **--to** values are refused. +**-n**, **--null-input** +: Run the query against **null** context with no input. Useful for value computations like `lam -n '[1,2,3] | unique'`. 
Without **-n**, the missing-input guard fires (a typo'd filename or missing redirect is a common footgun); the flag makes the "I have no input" intent explicit. Cannot combine with **--interactive**, **--ndjson**, **--schema**, or **--assert**. + **-h**, **--help** : Show usage information. diff --git a/doc/syntax.md b/doc/syntax.md index 51180f2..682b77f 100644 --- a/doc/syntax.md +++ b/doc/syntax.md @@ -288,7 +288,7 @@ Group elements by a key. Returns `[{key, values}]`. Remove duplicate values. ``` -$ echo '[1, 2, 2, 3, 1]' | lam '. | unique' +$ lam -n '[1, 2, 2, 3, 1] | unique' [ 1, 2, @@ -310,7 +310,7 @@ Remove duplicates by a key expression. Flatten one level of nesting. ``` -$ echo '[[1, 2], [3, 4], [5]]' | lam '. | flatten' +$ lam -n '[[1, 2], [3, 4], [5]] | flatten' [ 1, 2, @@ -421,7 +421,7 @@ $ echo '{"config":{"database":{"host":"localhost","port":5432}}}' | lam '.config } ] -$ echo '[{"key": "a", "value": 1}]' | lam '. | from_entries' +$ lam -n '[{key: "a", value: 1}] | from_entries' { "a": 1 } @@ -435,13 +435,13 @@ CSV and TSV cells are strings by default; use `to_number` to coerce them before arithmetic. ``` -$ echo '"42"' | lam '. | to_number' +$ lam -n '"42" | to_number' 42 -$ echo '"3.14"' | lam '. | to_number' +$ lam -n '"3.14" | to_number' 3.14 -$ echo '100' | lam '. | to_number' +$ lam -n '100 | to_number' 100 $ echo '{"price": "29.99"}' | lam '.price | to_number' @@ -459,22 +459,22 @@ Possible return values: `"null"`, `"boolean"`, `"number"`, `"string"`, `"array"`, `"object"`. ``` -$ echo '42' | lam '. | type' +$ lam -n '42 | type' "number" -$ echo '"hello"' | lam '. | type' +$ lam -n '"hello" | type' "string" -$ echo 'null' | lam '. | type' +$ lam -n 'null | type' "null" -$ echo '[1, 2]' | lam '. | type' +$ lam -n '[1, 2] | type' "array" -$ echo '{"a": 1}' | lam '. | type' +$ lam -n '{a: 1} | type' "object" -$ echo '[1, "two", 3]' | lam '. | filter((. | type) == "number")' +$ lam -n '[1, "two", 3] | filter((. 
| type) == "number")' [ 1, 3 @@ -495,7 +495,7 @@ Filter a map's values. Transform a map's values. ``` -$ echo '{"a": 1, "b": 2}' | lam '. | map_values(. * 10)' +$ lam -n '{a: 1, b: 2} | map_values(. * 10)' { "a": 10, "b": 20 diff --git a/test/cli_integration_test.dart b/test/cli_integration_test.dart index 9bb5c6f..06af9a0 100644 --- a/test/cli_integration_test.dart +++ b/test/cli_integration_test.dart @@ -673,4 +673,91 @@ void main() { expect(err, contains('--schema')); }); }); + + group('-n / --null-input: input-less queries', () { + test('-n with literal-list query', () async { + final (code, out, _) = await _runLam(['-n', '[1,2,3] | unique']); + expect(code, 0); + expect(jsonDecode(out), [1, 2, 3]); + }); + + test('-n with identity returns null', () async { + final (code, out, _) = await _runLam(['-n', '.']); + expect(code, 0); + expect(jsonDecode(out), isNull); + }); + + test('-n with field access on null is null (null-propagation)', () async { + final (code, out, _) = await _runLam(['-n', '.name']); + expect(code, 0); + expect(jsonDecode(out), isNull); + }); + + test('--null-input long form works the same', () async { + final (code, out, _) = await _runLam([ + '--null-input', + '[1,2,2,3] | unique', + ]); + expect(code, 0); + expect(jsonDecode(out), [1, 2, 3]); + }); + + test('-n without expression errors with missing-query message', () async { + final (code, _, err) = await _runLam(['-n']); + expect(code, 1); + expect(err, contains('missing query expression')); + }); + + test('rejects -n -i', () async { + final (code, _, err) = await _runLam(['-n', '-i', '.']); + expect(code, 1); + expect(err, contains('-n')); + expect(err, contains('--interactive')); + }); + + test('rejects -n --ndjson', () async { + final (code, _, err) = await _runLam(['-n', '--ndjson', '.']); + expect(code, 1); + expect(err, contains('-n')); + expect(err, contains('--ndjson')); + }); + + test('rejects -n --schema', () async { + final schema = File('${tmp.path}/s.json') + 
..writeAsStringSync('{"type":"object"}'); + final (code, _, err) = await _runLam([ + '-n', + '--schema', + schema.path, + '.', + ]); + expect(code, 1); + expect(err, contains('-n')); + expect(err, contains('--schema')); + }); + + test('rejects -n --assert', () async { + final (code, _, err) = await _runLam(['-n', '--assert', 'true']); + expect(code, 1); + expect(err, contains('-n')); + expect(err, contains('--assert')); + }); + + test( + 'without -n, no input still errors with the standard message', + () async { + // The default footgun catch must stay. `-n` is the explicit + // opt-in. + final (code, _, err) = await _runLam(['[1,2,3] | unique']); + expect(code, 1); + expect(err, contains('no input')); + }, + ); + + test('-n with sum on a literal list', () async { + final (code, out, _) = await _runLam(['-n', '[1,2,3] | sum']); + expect(code, 0); + expect(jsonDecode(out), 6); + }); + }); } From 710f9e2aaea6ff10dfe1534d4706edbd32bb03fe Mon Sep 17 00:00:00 2001 From: Hakim Jonas Ghoula Date: Thu, 21 May 2026 21:57:56 +0200 Subject: [PATCH 33/67] feat(parser): jq idiom hints for try/recurse/paths/range/@csv Extends `_jqIdiomHint` and `_jqPipeOpHint` with one-liner pointers for the column-1 jq keywords agents reach for that produce a giant vocabulary dump otherwise: - `try` / `try ... catch` -> shape-checks or `if`/`else` - `recurse`, `walk` -> explicit paths + map/flatten - `paths`, `leaf_paths` -> `--print-shape` / `lambe_print_shape` - `range` -> data-driven build (no generator) - `limit`, `nth` -> slicing or `first`/`last` - `@csv`, `@tsv` -> `as(csv)` / `as(tsv)` - `@base64` -> explicitly unsupported `_describeLeftover` falls through to `_jqIdiomHint` when the post- pipe token starts with a non-identifier char (the `@` formatters were hitting the generic "unexpected input after |" message otherwise). Tests: 12 new cases in parse_error_format_test. 1535 -> 1547. 
--- lib/lambe.dart | 89 +++++++++++++++++++++++ test/parse_error_format_test.dart | 114 ++++++++++++++++++++++++++++++ 2 files changed, 203 insertions(+) diff --git a/lib/lambe.dart b/lib/lambe.dart index 9e89b81..59bb502 100644 --- a/lib/lambe.dart +++ b/lib/lambe.dart @@ -407,6 +407,15 @@ String _describeLeftover(String expression, int offset) { suggestion != null ? '\n help: did you mean "$suggestion"?' : ''; return 'unknown operation "$word" after |$hint'; } + // Word-based dispatch didn't fire (often because the next token + // starts with a non-identifier char like `@`). Try the + // idiom-detection pass against the post-pipe content before + // falling back to the generic message. + final pipeIdiom = _jqIdiomHint( + expression, + expression.length - rest.length + 1, + ); + if (pipeIdiom != null) return pipeIdiom; return 'unexpected input after |'; } final idiom = _jqIdiomHint(expression, offset); @@ -436,6 +445,27 @@ String? _jqPipeOpHint(String word) { 'or replace it with `filter(pred)`.'; case 'not': return '`not` is a prefix in Lambé: write `!pred`.'; + case 'try': + return 'Lambé has no exception model. ' + 'Use `if`/`else` or shape checks (`has("k")`, ' + '`--print-shape`) instead of `try ... catch`.'; + case 'recurse': + case 'walk': + return 'Lambé has no recursive descent. Use explicit paths; ' + 'combine `map(...)` and `flatten` for nested fan-out.'; + case 'paths': + case 'leaf_paths': + return 'Lambé has no `$word` op. Use `--print-shape` (CLI) or ' + '`lambe_print_shape` (MCP) to see the structure of the data.'; + case 'range': + return 'Lambé has no `range` generator. Build the list inline ' + '(`[0,1,2,...]`) or pre-compute it; lambé queries are ' + 'data-driven, not generator-driven.'; + case 'limit': + case 'nth': + return '`$word` is not a lambé op. Use slicing `[:n]` to take a ' + 'prefix, `[n:n+1]` to take an index, or `first`/`last` for ' + 'the ends.'; default: return null; } @@ -452,6 +482,12 @@ String? 
_jqPipeOpHint(String word) { /// `filter(...)`). /// - `empty` keyword (no `empty`; use `filter(pred)`). /// - `end` from a stranded `if/then/else/end` tail. +/// - `try` / `try ... catch` (Lambé has no exception model). +/// - `recurse`, `walk` (no recursive descent; explicit paths). +/// - `paths`, `leaf_paths` (use `--print-shape` to inspect structure). +/// - `range`, `limit`, `nth` (use slicing or `first`/`last`). +/// - `@csv`, `@tsv`, `@base64` (use `as(csv)` / `as(tsv)`; base64 is +/// not supported). String? _jqIdiomHint(String expression, int offset) { // `.users[]`: parser expected an index expression after `[` and // failed on `]`. Detect by: offset points at `]` and the previous @@ -509,9 +545,62 @@ String? _jqIdiomHint(String expression, int offset) { 'stage. Use it inside `map(...)` / `filter(...)`, and drop ' 'the `end` keyword — Lambé terminates `if` at the else branch.'; } + // `try` / `try ... catch`. jq's exception model has no lambé + // analogue. + if (_atKeyword(rest, 'try')) { + return 'Lambé has no exception model. ' + 'Use `if`/`else` or shape checks (`has("k")`, `--print-shape`) ' + 'instead of `try ... catch`.'; + } + // `recurse`, `walk` — both jq's recursive-descent operators. + if (_atKeyword(rest, 'recurse') || _atKeyword(rest, 'walk')) { + return 'Lambé has no recursive descent. Use explicit paths; ' + 'combine `map(...)` and `flatten` for nested fan-out.'; + } + // `paths`, `leaf_paths` — jq's path enumeration. Lambé exposes + // structure via `--print-shape` instead. + if (_atKeyword(rest, 'paths') || _atKeyword(rest, 'leaf_paths')) { + return 'Lambé has no `paths`/`leaf_paths`. Use `--print-shape` ' + '(CLI) or `lambe_print_shape` (MCP) to see the structure of ' + 'the data.'; + } + // `range`, `limit`, `nth` — jq generators / slicing helpers. + if (_atKeyword(rest, 'range')) { + return 'Lambé has no `range` generator. 
Build the list inline ' + '(`[0,1,2,...]`) or pre-compute it; lambé queries are ' + 'data-driven, not generator-driven.'; + } + if (_atKeyword(rest, 'limit') || _atKeyword(rest, 'nth')) { + final word = _atKeyword(rest, 'limit') ? 'limit' : 'nth'; + return '`$word` is not a lambé op. Use slicing `[:n]` to take a ' + 'prefix, `[n:n+1]` to take an index, or `first`/`last` for ' + 'the ends.'; + } + // `@csv` / `@tsv` — jq's format strings. Lambé routes through + // `as(csv)` / `as(tsv)` instead. + if (rest.startsWith('@csv') || rest.startsWith('@tsv')) { + final fmt = rest.startsWith('@csv') ? 'csv' : 'tsv'; + return 'Lambé has no `@$fmt` format string. Use `as($fmt)` to ' + 'serialize a list-of-records as $fmt, or `--to $fmt` at the ' + 'CLI level.'; + } + // `@base64` — explicitly unsupported. + if (rest.startsWith('@base64')) { + return 'Lambé does not support `@base64` encoding/decoding. ' + 'Pre-process the data outside lambé if you need it.'; + } return null; } +/// Whether [rest] begins with [keyword] followed by a non-identifier +/// character (or end-of-string). Mirrors the `select`/`empty`/`end` +/// detection above; centralised here to keep the new cases compact. 
+bool _atKeyword(String rest, String keyword) { + if (!rest.startsWith(keyword)) return false; + if (rest.length == keyword.length) return true; + return !_isIdentChar(rest.codeUnitAt(keyword.length)); +} + bool _isIdentChar(int code) => (code >= 0x30 && code <= 0x39) || // 0-9 (code >= 0x41 && code <= 0x5a) || // A-Z diff --git a/test/parse_error_format_test.dart b/test/parse_error_format_test.dart index 6dd30f4..3d3260e 100644 --- a/test/parse_error_format_test.dart +++ b/test/parse_error_format_test.dart @@ -252,5 +252,119 @@ void main() { expect(e.message, contains('did you mean "filter"?')); } }); + + test('| try suggests if/else or shape checks', () { + try { + parseAst('.x | try .a'); + fail('expected parse to fail'); + } on QueryError catch (e) { + expect(e.message, contains('try')); + expect(e.message, contains('exception model')); + } + }); + + test('try at top level suggests if/else', () { + try { + parseAst('try .a catch null'); + fail('expected parse to fail'); + } on QueryError catch (e) { + expect(e.message, contains('try')); + } + }); + + test('| recurse suggests explicit paths', () { + try { + parseAst('.x | recurse(.children)'); + fail('expected parse to fail'); + } on QueryError catch (e) { + expect(e.message, contains('recursive descent')); + } + }); + + test('| walk suggests explicit paths', () { + try { + parseAst('.x | walk(. 
* 2)'); + fail('expected parse to fail'); + } on QueryError catch (e) { + expect(e.message, contains('recursive descent')); + } + }); + + test('| paths suggests --print-shape', () { + try { + parseAst('.x | paths'); + fail('expected parse to fail'); + } on QueryError catch (e) { + expect(e.message, contains('paths')); + expect(e.message, contains('print-shape')); + } + }); + + test('| leaf_paths suggests --print-shape', () { + try { + parseAst('.x | leaf_paths'); + fail('expected parse to fail'); + } on QueryError catch (e) { + expect(e.message, contains('print-shape')); + } + }); + + test('range generator hint', () { + try { + parseAst('range(0; 10)'); + fail('expected parse to fail'); + } on QueryError catch (e) { + expect(e.message, contains('range')); + expect(e.message, contains('generator')); + } + }); + + test('| limit suggests slicing', () { + try { + parseAst('.x | limit(3; .)'); + fail('expected parse to fail'); + } on QueryError catch (e) { + expect(e.message, contains('limit')); + expect(e.message, contains('slicing')); + } + }); + + test('| nth suggests slicing or first/last', () { + try { + parseAst('.x | nth(0; .)'); + fail('expected parse to fail'); + } on QueryError catch (e) { + expect(e.message, contains('nth')); + } + }); + + test('@csv suggests as(csv)', () { + try { + parseAst('.users | @csv'); + fail('expected parse to fail'); + } on QueryError catch (e) { + expect(e.message, contains('@csv')); + expect(e.message, contains('as(csv)')); + } + }); + + test('@tsv suggests as(tsv)', () { + try { + parseAst('.users | @tsv'); + fail('expected parse to fail'); + } on QueryError catch (e) { + expect(e.message, contains('as(tsv)')); + } + }); + + test('@base64 explicitly unsupported', () { + try { + parseAst('.x | @base64'); + fail('expected parse to fail'); + } on QueryError catch (e) { + expect(e.message, contains('@base64')); + expect(e.message, contains('not support')); + } + }); }); } From fad6c9a2a63ceb3c0582465515a4904f6d59d182 Mon Sep 17 
00:00:00 2001 From: Hakim Jonas Ghoula Date: Thu, 21 May 2026 21:59:48 +0200 Subject: [PATCH 34/67] fix(shape): heterogeneous list rendering hint MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit When `shapeOf` collapses a list with mixed element types to SList(SAny()), the rendered JSON Schema now carries a description: "sampled, may be heterogeneous". Users see the hint in `--print-shape` output and know the schema reflects sampling, not a guarantee. Per the followups doc this is the lower-effort path. The deeper fix (an SUnion variant on the Shape ADT) is bigger than Tier B's scope and intentionally deferred. The schema parser ignores unknown keywords, so the hint round-trips through parseJsonSchema — typed lists don't carry it, and the SList(SAny()) shape parses back identically. Tests: 3 new cases (presence, absence on typed lists, round-trip). 1547 -> 1550. --- lib/src/schema/renderer.dart | 13 ++++++++++++- test/schema_renderer_test.dart | 26 ++++++++++++++++++++++++++ 2 files changed, 38 insertions(+), 1 deletion(-) diff --git a/lib/src/schema/renderer.dart b/lib/src/schema/renderer.dart index d0667a3..bc085cf 100644 --- a/lib/src/schema/renderer.dart +++ b/lib/src/schema/renderer.dart @@ -59,7 +59,18 @@ Map _encode(Shape shape) { SBool() => {'type': 'boolean'}, SNum() => {'type': 'number'}, SString() => {'type': 'string'}, - SList(:final element) => {'type': 'array', 'items': _encode(element)}, + SList(:final element) => { + 'type': 'array', + 'items': _encode(element), + // SList(SAny()) means "this list contained heterogeneous or + // unknown elements" — `shapeOf` collapses to SAny when it can't + // narrow the element type. Surface the hint so users know the + // schema reflects sampling, not a guarantee. The lambé schema + // parser ignores unknown keywords (per JSON Schema's + // extensibility convention for metadata), so this round-trips + // safely. 
+ if (element is SAny) 'description': 'sampled, may be heterogeneous', + }, SMap(:final fields) => _encodeMap(fields), // Unreachable: SOptional was unwrapped above. Present for // exhaustive-switch conformance. diff --git a/test/schema_renderer_test.dart b/test/schema_renderer_test.dart index ff3c4d5..89feedf 100644 --- a/test/schema_renderer_test.dart +++ b/test/schema_renderer_test.dart @@ -49,6 +49,32 @@ void main() { expect(out, contains('"items":')); }); + test('SList carries a "sampled, may be heterogeneous" hint', () { + // shapeOf collapses heterogeneous elements to SAny. The + // renderer surfaces that via a description so users know the + // schema reflects sampling, not a guarantee. + final out = renderJsonSchema(const SList(SAny())); + expect(out, contains('"description"')); + expect(out, contains('sampled, may be heterogeneous')); + }); + + test('typed list does NOT carry the heterogeneous hint', () { + final out = renderJsonSchema(const SList(SString())); + expect(out, isNot(contains('description'))); + }); + + test( + 'SList heterogeneous hint round-trips through parseJsonSchema', + () { + // The hint is metadata; the parser ignores unknown keywords. + // Round-trip preserves the SList shape. + const shape = SList(SAny()); + final out = renderJsonSchema(shape); + final reparsed = parseJsonSchema(out); + expect(reparsed, shape); + }, + ); + test('SMap with all required fields lists all in required', () { final out = renderJsonSchema(const SMap({'a': SNum(), 'b': SString()})); expect(out, contains('"type": "object"')); From 29abef58ef47a6e5214dcecfe8524ccf8c5dc2e1 Mon Sep 17 00:00:00 2001 From: Hakim Jonas Ghoula Date: Thu, 21 May 2026 22:01:57 +0200 Subject: [PATCH 35/67] chore: as(fmt) ambiguity investigation; soften As class doc MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The discovery session couldn't reproduce the "ambiguous bridge" error path. 
Investigation confirms why: every arm of `_suggestionsFor` in `lib/src/shape/check.dart` returns a list of exactly one `Remediation`. The runtime check at `evaluator.dart:_as`'s `nw.suggestions.length > 1` branch is structurally unreachable with the current curated table. Recommendation (kept as-is): the runtime branch is a cheap defensive guard against future curation errors. Keep it. But the user-facing `As` class doc (`ast.dart`) was claiming the path was reachable; softened to be honest about what users will and won't hit. Added a synthesize_test invariant: every (representative shape × every format) produces ≤ 1 bridge. If a future contributor adds a second bridge for any pair the test fails, surfacing the design choice instead of letting the multi-bridge branch silently become reachable. 54 new test cases (9 shapes × 6 formats). 1550 -> 1604. --- lib/src/ast.dart | 16 ++++++++++----- test/shape_synthesize_test.dart | 35 +++++++++++++++++++++++++++++++++ 2 files changed, 46 insertions(+), 5 deletions(-) diff --git a/lib/src/ast.dart b/lib/src/ast.dart index b65c316..e6d8891 100644 --- a/lib/src/ast.dart +++ b/lib/src/ast.dart @@ -188,11 +188,17 @@ final class Slice extends LamExpr { /// /// At runtime the evaluator infers the shape of the current context and /// checks it against the target format's requirement. If the shape is -/// already compatible, [As] returns the context unchanged. If exactly -/// one curated remediation exists for the mismatch, it is applied. If -/// the combination has no curated remediation, or more than one, -/// evaluation throws a [QueryError] listing the available candidates -/// so the caller can pick one explicitly. +/// already compatible, [As] returns the context unchanged. If a curated +/// remediation exists for the mismatch, it is applied. Otherwise +/// evaluation throws a [QueryError]. 
+/// +/// The curated remediation table in `shape/check.dart:_suggestionsFor` +/// returns at most one bridge per `(input shape, format)` pair, so in +/// practice "no curated bridge" is the only failure mode users hit. +/// A defensive multi-bridge branch in the evaluator (`_as` in +/// `evaluator.dart`) guards against future curation errors that might +/// add competing bridges; if that path ever fires the user will get a +/// listing and a request to pick one explicitly. final class As extends LamExpr { /// The target output format the pipeline should fit. final OutputFormat target; diff --git a/test/shape_synthesize_test.dart b/test/shape_synthesize_test.dart index b3da864..f2ba735 100644 --- a/test/shape_synthesize_test.dart +++ b/test/shape_synthesize_test.dart @@ -116,4 +116,39 @@ void main() { expect(composed.op, same(bridge)); }); }); + + group('curated table: at most one bridge per (shape, format)', () { + // Pins the invariant the documented "ambiguous bridge" error path + // currently relies on by being unreachable. If a future curation + // adds a second bridge for any pair, this test fails and the + // contributor either picks one or accepts that the multi-bridge + // branch in `evaluator.dart` becomes user-visible. 
+ final shapes = [ + const SNull(), + const SBool(), + const SNum(), + const SString(), + const SList(SAny()), + const SList(SString()), + const SList(SNum()), + const SMap({}), + const SMap({'a': SNum()}), + ]; + for (final shape in shapes) { + for (final fmt in OutputFormat.values) { + test('${shape.runtimeType} -> ${fmt.name}: ≤ 1 bridge', () { + final bridges = synthesize(shape, fmt); + expect( + bridges.length, + lessThanOrEqualTo(1), + reason: + 'Curated table produced ${bridges.length} bridges for ' + '${shape.runtimeType} -> ${fmt.name}; if this is ' + 'intentional, the multi-bridge ambiguity path becomes ' + 'reachable and the As class doc should be updated.', + ); + }); + } + } + }); } From e53fa22062ce28765fce282e5b757622bdd47fe7 Mon Sep 17 00:00:00 2001 From: Hakim Jonas Ghoula Date: Thu, 21 May 2026 22:02:45 +0200 Subject: [PATCH 36/67] docs: as(fmt) bridges reference in recipes.md Documents the four canonical bridges with runnable examples: - list | as(toml/hcl) -> {items: [...]} - scalar | as(toml/hcl) -> {value: scalar} - map | as(csv/tsv) -> derived from to_entries - scalar | as(csv/tsv) -> {value: .} | to_entries (one-row csv) Each example was verified by running through dart run bin/lam.dart; output in the doc matches actual output. Recipes is the right home because these are end-to-end CLI demonstrations, not grammar reference. --- doc/recipes.md | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 66 insertions(+) diff --git a/doc/recipes.md b/doc/recipes.md index 3cfeb27..2b8ac2f 100644 --- a/doc/recipes.md +++ b/doc/recipes.md @@ -336,6 +336,72 @@ $ lam '.spec.template.spec' deployment.yaml $ lam -i deployment.yaml ``` +## Bridging shapes to output formats with `as(fmt)` + +Some output formats restrict the root shape: TOML and HCL want a map +at the top level; CSV and TSV want a list of records. When the +pipeline produces something else, `as(fmt)` applies a curated bridge +so the value fits. 
+ +There are four canonical bridges. All four are reachable via `as(...)` +or via the CLI's `--to` flag with `--flatten-cells refuse` (the +default). + +### `list | as(toml)` and `as(hcl)` + +Wrap a list under a single `items` key. + +``` +$ lam -n --to toml '["a", "b", "c"] | as(toml)' +items = ["a", "b", "c"] + + +$ lam -n --to hcl '["a", "b"] | as(hcl)' +items = ["a", "b"] +``` + +### `scalar | as(toml)` and `as(hcl)` + +Wrap a scalar under a single `value` key. + +``` +$ lam -n --to toml '"hello" | as(toml)' +value = "hello" + + +$ lam -n --to hcl '"hello" | as(hcl)' +value = "hello" +``` + +### `map | as(csv)` and `as(tsv)` + +Convert a map to a two-column key/value list of records via +`to_entries`. + +``` +$ lam -n --to csv '{a: 1, b: 2} | as(csv)' +key,value +a,1 +b,2 +``` + +### `scalar | as(csv)` and `as(tsv)` + +Compose: wrap the scalar under `value`, then `to_entries`. The +result is a one-row CSV with a `key`/`value` header. + +``` +$ lam -n --to csv '42 | as(csv)' +key,value +value,42 +``` + +### When `as(fmt)` does nothing + +A shape that already satisfies the format's requirement passes +through unchanged: `map | as(toml)` is identity, as is `list | +as(csv)`. The bridge fires only when there's a real mismatch. + ## Next steps - [Getting started](getting-started.md) for installation From 82168f604752686e73ea4b86b143e9a6c13a22ef Mon Sep 17 00:00:00 2001 From: Hakim Jonas Ghoula Date: Thu, 21 May 2026 22:05:01 +0200 Subject: [PATCH 37/67] docs(CHANGELOG): Tier B entries MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Bug fixes section gains entries for B3 (writability suppression), B2 (heterogeneous list hint), and the empty-stdin guard surfaced during B4. jq compatibility section gains B5 (try/recurse/paths/ range/@csv hints). Schemas as a first-class contract is extended with B1 (--print-shape EXPR composes). New "Null input" subsection for B4 (the -n / --null-input flag). 
Documentation precision gains B7 (as(fmt) bridges reference) and B6 (As class doc honesty + the ≤1-bridge invariant test). --- CHANGELOG.md | 65 ++++++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 63 insertions(+), 2 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 3bae50d..181afb9 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -65,6 +65,24 @@ consolidation, and a `rumil_tokens`-based REPL highlighter. entries used to be dropped silently, now they throw `QueryError`. Hides a class of bugs where upstream pipelines emit the wrong shape. +- **`as(fmt)` bridges reference** in `doc/recipes.md`. Documents the + four canonical bridges with runnable examples: `list | + as(toml/hcl)` wraps as `{items: ...}`; `scalar | as(toml/hcl)` + wraps as `{value: ...}`; `map | as(csv/tsv)` derives via + `to_entries`; `scalar | as(csv/tsv)` composes both. +- **`As` class doc** softened to be honest about which error paths + users will and won't hit. The "ambiguous bridge" runtime branch is + defensive against future curation errors but unreachable with the + current curated table — the doc no longer claims otherwise. A new + invariant test in `shape_synthesize_test` pins `≤ 1 bridge per + (shape, format)` so the path becomes reachable only by a + deliberate change. +- **`syntax.md` examples** revert from `echo … | lam '. | op'` to + the cleaner `lam -n '… | op'` form now that `-n` exists. Several + pre-A6 examples were also silently broken: lambé object + construction uses bare identifiers (`{a: 1}`), not JSON-string + keys (`{"a": 1}`), so `[{"key": "a"}] | from_entries` was never + runnable. Fixed. ### Bug fixes @@ -80,14 +98,39 @@ consolidation, and a `rumil_tokens`-based REPL highlighter. string`. Slicing (`.name[0:3]`) already worked; the asymmetry is gone. Out-of-range returns `null` (mirrors list indexing); non-int still throws. 
+- **`--explain` writability section is suppressed when a + runtime-rejection warning fires.** When a pipe op's input shape is + provably incompatible the post-stage shape widens to `SAny`, which + used to make every output format pass `canWriteAs` — so the + explain report listed every format for a pipeline that would throw + before any writer ran. Both `Writable as:` and `Not writable as:` + are now suppressed; the text renderer prints a one-line note in + their place, and the JSON renderer sets both keys to `null`. +- **Heterogeneous list rendering hint.** `shapeOf([1, "two", true])` + collapses the element type to `SAny`. The rendered JSON Schema now + carries a `description: "sampled, may be heterogeneous"` so + `--print-shape` users see that the schema reflects sampling, not + a guarantee. The hint round-trips through `parseJsonSchema` + (unknown keywords are ignored per JSON Schema's extensibility + convention). +- **Empty piped stdin.** Empty stdin in evaluation mode now surfaces + the standard "no input" error rather than a confusing JSON parse + error on the empty string. ### jq compatibility - **`add` is now recognized as an alias for `sum`.** A jq idiom that matches Lambé's `sum` exactly. `_jqAliases` in `parser.dart` is the table; entries belong there only when the jq semantics are an - exact match. Other unsupported jq idioms still surface a - "did you mean" hint or an explanatory message via `_jqIdiomHint`. + exact match. +- **Idiom hints for column-1 jq keywords.** `_jqIdiomHint` and + `_jqPipeOpHint` now recognise `try` / `try ... catch`, `recurse`, + `walk`, `paths`, `leaf_paths`, `range`, `limit`, `nth`, `@csv`, + `@tsv`, and `@base64`. Each produces a one-liner pointing at the + lambé equivalent (or, for `@base64`, the explicit "not supported" + signal) instead of the giant op-vocabulary dump. Folds into the + pre-existing hints for `[]`, `?`, `..`, `select`, `empty`, and + stranded `end`. 
### Schemas as a first-class contract @@ -104,6 +147,12 @@ consolidation, and a `rumil_tokens`-based REPL highlighter. same shape-to-JSON-Schema rendering powers `renderJsonSchema(shape)` on the library and the MCP `lambe_print_shape` tool. +- **`--print-shape EXPR` composes with the query.** When given an + expression, `lam --print-shape '.users' data.json` now returns the + schema of the result of evaluating `.users` rather than the schema + of the whole document. Pre-0.9.0 the expression was silently + ignored. Without data, falls back to inferring from `SAny` — + matches the `--explain`-without-data flow. - **REPL: `:schema [path]` and `:print-shape`.** `:schema <path>` loads a schema for the session and reports agreement/disagreement vs current data. `:schema` (no arg) prints the active schema. @@ -181,6 +230,18 @@ mode: `--explain`; output is restricted to JSON (`--to` other than `json` is refused). +### Null input + +- **`-n` / `--null-input` flag.** Run a query against `null` context + with no input file. Useful for value computations: + `lam -n '[1,2,3] | unique'`. Without `-n`, the missing-input guard + fires (typo'd filename or missing redirect is a common footgun); + the flag puts the "I have no input" intent on the command line + where it's visible in scripts and code review. The `--null-input` + spelling matches jq exactly. +- Cannot combine with `--interactive`, `--ndjson`, `--schema`, or + `--assert`. The TTY stdin guard is unchanged. + ### `--flatten-cells` for CSV/TSV - Opt-in escape hatch: non-scalar cells encoded as JSON strings From 4d0e3bbcc182a79715459073c55173789ec89833 Mon Sep 17 00:00:00 2001 From: Hakim Jonas Ghoula Date: Fri, 22 May 2026 08:34:31 +0200 Subject: [PATCH 38/67] chore(deps): bump rumil_parsers usage; update HCL block-shape tests MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit `rumil_parsers 0.7.1` flips HCL block decoding to always return a list, regardless of count. 
ring6_test's single-block HCL queries now access `.resource[0]._labels` / `.resource[0].ami` rather than walking the old N=1 single-map shape. New regression test pins `.variable` is a list for both N=1 and N≥2 fixtures. --- test/ring6_test.dart | 23 +++++++++++++++++++++-- 1 file changed, 21 insertions(+), 2 deletions(-) diff --git a/test/ring6_test.dart b/test/ring6_test.dart index e1f3396..b1db146 100644 --- a/test/ring6_test.dart +++ b/test/ring6_test.dart @@ -171,7 +171,7 @@ void main() { test('block access', () { final result = queryString( - '.resource._labels', + '.resource[0]._labels', 'resource "aws_instance" "web" {\n ami = "abc"\n}\n', format: Format.hcl, ); @@ -180,13 +180,32 @@ void main() { test('block body field', () { final result = queryString( - '.resource.ami', + '.resource[0].ami', 'resource "aws_instance" "web" {\n ami = "abc"\n}\n', format: Format.hcl, ); expect(result, 'abc'); }); + test('blocks are list-shaped uniformly across N=1 and N=2', () { + final n1 = queryString( + '.variable', + 'variable "region" {\n default = "us-east-1"\n}\n', + format: Format.hcl, + ); + expect(n1, isA<List<dynamic>>()); + expect((n1 as List).length, 1); + + final n2 = queryString( + '.variable', + 'variable "region" {\n default = "us-east-1"\n}\n' + 'variable "instance_type" {\n default = "t3.micro"\n}\n', + format: Format.hcl, + ); + expect(n2, isA<List<dynamic>>()); + expect((n2 as List).length, 2); + }); + test('.tf extension auto-detected', () { expect(detectFormat('main.tf'), Format.hcl); expect(detectFormat('config.hcl'), Format.hcl); From d5588f83ab4ddddc40fac867764ed7f9d8ab892d Mon Sep 17 00:00:00 2001 From: Hakim Jonas Ghoula Date: Fri, 22 May 2026 09:07:23 +0200 Subject: [PATCH 39/67] feat(eval): markdown text extraction op MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit `text` walks a markdown node (or list of nodes) and concatenates every prose-bearing leaf — `text`, `code`, `code_block`, and `image.alt` — in document order. 
Container nodes recurse through their `children`. `html_block` / `html_inline` are skipped (the `Node.textContent` trap of dragging `