From efaef0e22eff3f9b08b073976f7ed6e7f1c92780 Mon Sep 17 00:00:00 2001
From: Hakim Jonas Ghoula
Date: Tue, 26 May 2026 00:21:35 +0200
Subject: [PATCH 1/2] feat(lambe)!: lib/ dart:io cleanup + 0.10.0 surface polish
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Largest concrete change: the published library now has zero dart:io
imports, so the lambé playground in arda-web (which compiles with
`dart compile wasm`) gets a clean platform compatibility report from
pana and runs without dart:io bridges.

Breaking — library API surface
==============================

Removed from `package:lambe`:
  - `loadSchemaFromFile`
  - `loadSchemaForData`

Both helpers used `dart:io` (file IO). They moved to
`bin/schema_io.dart` where the CLI consumes them. `mergeSchemaWithData`
(pure) stays in `lib/src/schema/loader.dart` and remains exported.

REPL and readline (CLI-only) moved out of lib/ into bin/:
  - `lib/src/repl.dart` → `bin/repl.dart`
  - `lib/src/readline.dart` → `bin/readline.dart`
  - `lib/src/highlight_grammar.dart` → `bin/highlight_grammar.dart`

bin/lam.dart and bin/repl.dart updated to use `package:lambe/...` for
the lib/ imports they still need, plus relative imports for sibling
bin/ files.

Markdown frontmatter
====================

`parseInput(text, Format.markdown)` now uses
`parseMarkdownWithFrontmatter` from rumil_parsers 0.8.1. A leading
`---` YAML frontmatter block becomes a sibling `frontmatter` field on
the document instead of being absorbed into the body as prose. Files
without frontmatter parse byte-identically to before.

The frontmatter conversion goes through rumil_parsers' yamlToNative so
anchors and aliases resolve correctly without lambé reimplementing the
walk.

Foreign-idiom redirects (jq compatibility)
==========================================

Adds redirect hints for the rest of the common jq habits a model might
draft.

Pipe-op redirects (fire on `... | name(...)`): `getpath`, `setpath`,
`env`, `gsub`, `sub`, `test`, `match`, `scan`, `splits`, `tojson`,
`fromjson`.

Inline-idiom redirects: `@uri`, `@html`, `@sh`, `@json`, `$ENV` and
other variable-binding forms.

The regex family (`test`, `match`, `sub`) previously hit the
closest-match suggestion ("did you mean text/map/sum?"). Now they get
the regex-family explanation directly.

Heterogeneous-list shape descriptions
=====================================

`SList` gains an optional `sampledKinds: List?` field, populated when
widening to `SAny` because of observed heterogeneity. Empty lists keep
`sampledKinds: null` (no observations to report). `renderJsonSchema`
reads it and emits, for example,
`"sampled: number, string, boolean, null, array (heterogeneous)"`
instead of the generic `"sampled, may be heterogeneous"`.

CLI / error message polish
==========================

- Output-shape error messages clarified: `Append one of these stages
  to the end of your query (keep your existing flags such as -t hcl):`
  instead of the ambiguous `Try appending one of:`. The format name
  interpolates dynamically so HCL / TOML / CSV all read correctly.
- `--schema ` migration hint when the argument has a data extension
  (`.json`, `.yaml`, etc.) rather than `*.schema.json`, pointing at
  the new `--print-shape` flag instead of dumping usage.
- `.jsonlines` extension joins `.ndjson` and `.jsonl` in
  auto-detecting --ndjson mode.

Tests
=====

- 4 new tests in `test/markdown_text_test.dart` for the frontmatter
  contract (text op no longer scoops frontmatter; frontmatter
  addressable; absent case unchanged; child shape preserved).
- 2 tests in `test/shape_test.dart` updated to assert the new
  `sampledKinds` contract on heterogeneous lists.
- `test/schema_loader_test.dart` imports the moved file-loaders via
  relative path `../bin/schema_io.dart`.

1657 tests pass, format clean, analyze clean.

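The sampling rule from "Heterogeneous-list shape descriptions" above can
be illustrated with a standalone sketch. It is not the code in this
patch: kinds are plain strings rather than lambé's `Shape` classes, and
`kindOf`, `sampledKinds`, and the `sampleLimit` default of 100 are
hypothetical names and values chosen for illustration (the real
`_heteroSampleLimit` value is not shown in this diff).

```dart
// Illustrative only: a simplified stand-in for the Shape-based logic.

/// Maps a value to a short kind name (assumed naming, mirroring the
/// words used in the rendered description string).
String kindOf(Object? v) => switch (v) {
      null => 'null',
      bool() => 'boolean',
      num() => 'number',
      String() => 'string',
      List() => 'array',
      Map() => 'object',
      _ => 'any',
    };

/// Returns the distinct kinds seen in the sample window, in first-seen
/// order, or null when the list is empty or homogeneous.
List<String>? sampledKinds(List<Object?> list, {int sampleLimit = 100}) {
  if (list.isEmpty) return null; // no observations to report
  final limit = list.length < sampleLimit ? list.length : sampleLimit;
  final kinds = <String>[];
  for (var i = 0; i < limit; i++) {
    final k = kindOf(list[i]);
    if (!kinds.contains(k)) kinds.add(k); // by-equality dedup
  }
  return kinds.length > 1 ? kinds : null;
}

void main() {
  final kinds = sampledKinds([1, 'x', true, null, [1]]);
  if (kinds != null) {
    // prints: sampled: number, string, boolean, null, array (heterogeneous)
    print('sampled: ${kinds.join(', ')} (heterogeneous)');
  }
  print(sampledKinds([])); // prints: null
}
```

Keeping first-seen order makes the description deterministic for a
given input, and the linear `contains` check stays cheap because the
window is bounded.
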
Performance
===========

Bench (lambé library bench, median of 7, after warm-up) versus lambé
0.9.0 with rumil 0.7.0 / rumil_parsers 0.8.0:

  Workload                        AOT 0.9.0  0.10.0    Δ        WASM 0.9.0  0.10.0    Δ
  --print-shape on 50k items      742.2 ms   693.8 ms  -6.5%    319.9 ms    290.1 ms  -9.3%
  filter+length on 50k items      748.5 ms   704.8 ms  -5.8%    324.9 ms    285.9 ms  -12.0%
  group_by(.role) on 1k records   30.7 ms    27.1 ms   -11.7%   12.4 ms     11.8 ms   -4.8%

Comes from rumil's hot/cold dispatch split (4-6% on synthetic format
benches), compounded through lambé's deeper call chain.
---
 {lib/src => bin}/highlight_grammar.dart |  2 +-
 bin/lam.dart                            | 52 +++++++++++++++++++++-
 {lib/src => bin}/readline.dart          |  0
 {lib/src => bin}/repl.dart              |  5 ++-
 bin/schema_io.dart                      | 56 +++++++++++++++++++++++
 lib/lambe.dart                          | 56 ++++++++++++++++++++++-
 lib/src/errors.dart                     |  5 ++-
 lib/src/input.dart                      | 36 +++++++++++++--
 lib/src/schema/loader.dart              | 59 +++----------------------
 lib/src/schema/renderer.dart            | 31 ++++++++++++-
 lib/src/shape/shape.dart                | 49 ++++++++++++++++++--
 test/markdown_text_test.dart            | 42 ++++++++++++++++++
 test/schema_loader_test.dart            |  5 +++
 test/shape_test.dart                    | 22 ++++++++-
 14 files changed, 348 insertions(+), 72 deletions(-)
 rename {lib/src => bin}/highlight_grammar.dart (95%)
 rename {lib/src => bin}/readline.dart (100%)
 rename {lib/src => bin}/repl.dart (99%)
 create mode 100644 bin/schema_io.dart

diff --git a/lib/src/highlight_grammar.dart b/bin/highlight_grammar.dart
similarity index 95%
rename from lib/src/highlight_grammar.dart
rename to bin/highlight_grammar.dart
index 2b63f30..2687a3a 100644
--- a/lib/src/highlight_grammar.dart
+++ b/bin/highlight_grammar.dart
@@ -8,7 +8,7 @@ library;

 import 'package:rumil_tokens/rumil_tokens.dart';

-import 'shape/pipe_ops.dart' as shape_ops;
+import 'package:lambe/src/shape/pipe_ops.dart' as shape_ops;

 /// Lambé query grammar for the REPL highlighter.
/// diff --git a/bin/lam.dart b/bin/lam.dart index d1d71f7..972a78f 100644 --- a/bin/lam.dart +++ b/bin/lam.dart @@ -10,7 +10,9 @@ import 'dart:io'; import 'package:args/args.dart'; import 'package:lambe/lambe.dart'; -import 'package:lambe/src/repl.dart' show runRepl; + +import 'repl.dart' show runRepl; +import 'schema_io.dart'; void main(List arguments) { final argParser = @@ -157,6 +159,19 @@ void main(List arguments) { final rest = args.rest; if (rest.isEmpty && !isPrintShapeMode && !isInteractive) { + // 0.8 → 0.9 migration: --schema took a data file in 0.8 and printed + // its shape. In 0.9 it takes a JSON Schema file (the shape printer + // moved to --print-shape). When the argument looks like data, point + // the user at the new flag instead of the generic usage dump. + if (schemaPath != null && _looksLikeDataFile(schemaPath)) { + stderr.writeln('Error: missing query expression.'); + stderr.writeln( + ' hint: --schema is now for declaring a JSON Schema (renamed ' + 'from 0.8.0).', + ); + stderr.writeln(' To inspect data shape use: lam --print-shape '); + exit(1); + } stderr.writeln('Error: missing query expression.'); stderr.writeln(); _usage(argParser); @@ -197,7 +212,9 @@ void main(List arguments) { // format auto-detection convention for .csv, .yaml, etc. if (!isNdjsonMode && rest.length > fileArgIndex) { final fpath = rest[fileArgIndex].toLowerCase(); - if (fpath.endsWith('.ndjson') || fpath.endsWith('.jsonl')) { + if (fpath.endsWith('.ndjson') || + fpath.endsWith('.jsonl') || + fpath.endsWith('.jsonlines')) { isNdjsonMode = true; } } @@ -648,6 +665,37 @@ Iterable _stdinLines() sync* { } } +/// True if [path] has a data-format extension (`.json`, `.yaml`, etc.) +/// rather than the `*.schema.json` JSON-Schema convention. Used by the +/// 0.8 → 0.9 migration hint: `--schema /path/to/data.json` is almost +/// certainly stale shell history from when `--schema` printed shapes +/// (now `--print-shape`). 
+bool _looksLikeDataFile(String path) { + final lower = path.toLowerCase(); + // *.schema.json is the canonical JSON Schema filename — not data. + if (lower.endsWith('.schema.json')) return false; + const dataExts = [ + '.json', + '.ndjson', + '.jsonl', + '.jsonlines', + '.yaml', + '.yml', + '.toml', + '.tf', + '.hcl', + '.csv', + '.tsv', + '.md', + '.markdown', + '.proto', + ]; + for (final ext in dataExts) { + if (lower.endsWith(ext)) return true; + } + return false; +} + /// Print usage information to stderr. void _usage(ArgParser parser) { stderr.writeln('Usage: lam [options] [file]'); diff --git a/lib/src/readline.dart b/bin/readline.dart similarity index 100% rename from lib/src/readline.dart rename to bin/readline.dart diff --git a/lib/src/repl.dart b/bin/repl.dart similarity index 99% rename from lib/src/repl.dart rename to bin/repl.dart index 681028a..d174ffc 100644 --- a/lib/src/repl.dart +++ b/bin/repl.dart @@ -11,9 +11,10 @@ import 'dart:io'; import 'package:rumil/rumil.dart'; -import '../lambe.dart'; -import 'completer.dart'; +import 'package:lambe/lambe.dart'; +import 'package:lambe/src/completer.dart'; import 'readline.dart'; +import 'schema_io.dart'; /// Run the interactive REPL with [data]. /// diff --git a/bin/schema_io.dart b/bin/schema_io.dart new file mode 100644 index 0000000..d0b5f58 --- /dev/null +++ b/bin/schema_io.dart @@ -0,0 +1,56 @@ +/// CLI-only schema file loaders. +/// +/// Lives in bin/ rather than lib/ so the published library has zero +/// `dart:io` imports — making it safely usable from `dart compile wasm` +/// and dart2js consumers (e.g. the lambé playground in arda-web). +/// +/// The pure schema-merge logic stays in `lib/src/schema/loader.dart` as +/// [mergeSchemaWithData]; only the path-based loaders moved. +library; + +import 'dart:io'; + +import 'package:lambe/lambe.dart'; + +/// Load a schema from a file path, parsing it as a JSON Schema subset. 
+/// +/// Throws [QueryError] if the file is missing or unreadable, or if +/// the schema parser rejects the content. +Shape loadSchemaFromFile(String path) { + final file = File(path); + if (!file.existsSync()) { + throw QueryError('schema file not found: $path'); + } + final source = file.readAsStringSync(); + return parseJsonSchema(source); +} + +/// Load a schema for [dataPath], preferring [explicitSchemaPath] when +/// provided and falling back to a `.schema.json` sibling. +/// +/// Returns `null` when no explicit path is given and no sibling +/// exists. Throws [QueryError] for explicit paths that fail to load. +Shape? loadSchemaForData({String? explicitSchemaPath, String? dataPath}) { + if (explicitSchemaPath != null) { + return loadSchemaFromFile(explicitSchemaPath); + } + if (dataPath != null) { + final sibling = _siblingSchemaPath(dataPath); + if (sibling != null && File(sibling).existsSync()) { + return loadSchemaFromFile(sibling); + } + } + return null; +} + +/// Compute the sibling schema path for [dataPath]. +/// +/// Strips the data file's extension and appends `.schema.json`: +/// `data.json` → `data.schema.json`, `events.ndjson` → `events.schema.json`. +/// Returns `null` for paths without a recognizable extension. +String? 
_siblingSchemaPath(String dataPath) { + final lastDot = dataPath.lastIndexOf('.'); + if (lastDot < 0) return null; + final base = dataPath.substring(0, lastDot); + return '$base.schema.json'; +} diff --git a/lib/lambe.dart b/lib/lambe.dart index 3a76ce7..20dbc7d 100644 --- a/lib/lambe.dart +++ b/lib/lambe.dart @@ -32,8 +32,7 @@ export 'src/input.dart' export 'src/mcp_payload.dart' show renderMcpShapeErrorPayload; export 'src/output.dart' show OutputFormat, CellPolicy, formatOutput, inferSchema; -export 'src/schema/loader.dart' - show loadSchemaFromFile, loadSchemaForData, mergeSchemaWithData; +export 'src/schema/loader.dart' show mergeSchemaWithData; export 'src/schema/parser.dart' show parseJsonSchema; export 'src/schema/renderer.dart' show renderJsonSchema; export 'src/shape/shape.dart' @@ -452,6 +451,16 @@ String? _jqPipeOpHint(String word) { case 'leaf_paths': return 'Lambé has no `$word` op. Use `--print-shape` (CLI) or ' '`lambe_print_shape` (MCP) to see the structure of the data.'; + case 'getpath': + return '`getpath([...])` is not a lambé op. Lambé paths are ' + 'static: write `.users[0].age` instead of ' + '`getpath(["users",0,"age"])`. For dynamic indexing, ' + 'compose with `map(...)` over the path components.'; + case 'setpath': + return '`setpath([...]; v)` is not a lambé op. Lambé does not ' + 'mutate input; it produces new values. Construct the new ' + 'object with `{...}` literals, or use `map(...)` / ' + '`map_values(...)` to update fields in lists / maps.'; case 'range': return 'Lambé has no `range` generator. Build the list inline ' '(`[0,1,2,...]`) or pre-compute it; lambé queries are ' @@ -461,6 +470,27 @@ String? _jqPipeOpHint(String word) { return '`$word` is not a lambé op. Use slicing `[:n]` to take a ' 'prefix, `[n:n+1]` to take an index, or `first`/`last` for ' 'the ends.'; + case 'env': + return 'Lambé has no `env` op (queries are pure; environment ' + 'access lives outside the query). 
Set up the values via the ' + 'shell and pipe them in as data.'; + case 'gsub': + case 'sub': + case 'test': + case 'match': + case 'scan': + case 'splits': + return '`$word` is a regex op; lambé treats strings as opaque. ' + 'Pipe through `grep` / `sed` / a regex tool before or after ' + '`lam` for regex transforms.'; + case 'tojson': + return '`tojson` is not a lambé op. Use `as(json)` to bridge to ' + 'a JSON-shaped value, or run `lam` with `-t json` (the ' + 'default) to serialize the result.'; + case 'fromjson': + return '`fromjson` is not a lambé op. Lambé parses input by ' + 'format on read; for JSON-in-strings, decode upstream of ' + 'lambé or use `as(json)` after coercing the wrapping shape.'; default: return null; } @@ -584,6 +614,28 @@ String? _jqIdiomHint(String expression, int offset) { return 'Lambé does not support `@base64` encoding/decoding. ' 'Pre-process the data outside lambé if you need it.'; } + // `@uri` — explicitly unsupported. + if (rest.startsWith('@uri')) { + return 'Lambé does not support `@uri` URL-encoding. ' + 'Pre-process the data outside lambé if you need it.'; + } + // `@html` / `@sh` / `@json` — other jq format strings. + if (rest.startsWith('@html') || + rest.startsWith('@sh') || + rest.startsWith('@json')) { + final fmt = RegExp(r'^@(\w+)').firstMatch(rest)?.group(1) ?? 'fmt'; + return 'Lambé has no `@$fmt` format string. ' + 'Use `as(json)` for JSON shape bridges, `--to ` for ' + 'output formats, or pre-process outside lambé.'; + } + // `$ENV` / `$NAME` — jq's environment-variable / variable-binding + // syntax. Lambé queries are pure; no in-query variables. + if (rest.startsWith(r'$')) { + return 'Lambé has no variable-binding (`\$NAME`) or environment ' + '(`\$ENV`) syntax. Queries are pure; environment access lives ' + 'outside the query. 
Set up values via the shell, pipe them ' + 'as data.'; + } return null; } diff --git a/lib/src/errors.dart b/lib/src/errors.dart index ffa552d..f2180a0 100644 --- a/lib/src/errors.dart +++ b/lib/src/errors.dart @@ -64,7 +64,10 @@ class OutputShapeError extends QueryError { buf.write(renderShape(r.got)); buf.write('.'); if (r.suggestions.isNotEmpty) { - buf.write('\nTry appending one of:'); + buf.write( + '\nAppend one of these stages to the end of your query ' + '(keep your existing flags such as `-t ${r.format.name}`):', + ); for (final s in r.suggestions) { buf.write('\n | '); buf.write(s.display); diff --git a/lib/src/input.dart b/lib/src/input.dart index 96d0234..ccff560 100644 --- a/lib/src/input.dart +++ b/lib/src/input.dart @@ -134,15 +134,45 @@ Object? _parseDelimited(String input, DelimitedConfig? config) { } /// Parse CommonMark Markdown into queryable native Dart types. +/// +/// Uses [parseMarkdownWithFrontmatter] so a leading `---` YAML +/// frontmatter block is detected and surfaced as a sibling +/// `frontmatter` field on the document, instead of being absorbed into +/// the body as prose. Files without frontmatter parse identically to +/// the previous [parseMarkdown]-only path. Object? _parseMd(String input) { - final result = parseMarkdown(input); + final result = parseMarkdownWithFrontmatter(input); return switch (result) { - Success(:final value) => mdToNative(value), - Partial(:final value) => mdToNative(value), + Success(:final value) => mdToNativeWithFrontmatter(value), + Partial(:final value) => mdToNativeWithFrontmatter(value), Failure() => throw QueryError('Markdown parse error: ${result.errors}'), }; } +/// Convert a [MarkdownDocument] (Markdown body + optional YAML +/// frontmatter) into queryable native Dart types. +/// +/// When frontmatter is absent the result matches [mdToNative] +/// byte-for-byte — `{type: 'document', children: [...]}`. 
When present, +/// a sibling `frontmatter` key carries the parsed YAML as native Dart +/// values (maps, lists, scalars), addressable via the usual lambé path +/// access (e.g. `.frontmatter.title`). +/// +/// Frontmatter is decoded via rumil_parsers' [yamlToNative], which +/// resolves YAML anchors and aliases before flattening — so a +/// frontmatter block that happens to use them (rare but valid) +/// queries identically to inline-only YAML. +Map mdToNativeWithFrontmatter(MarkdownDocument doc) { + final body = mdToNative(doc.document); + final fm = doc.frontmatter; + if (fm == null) return body as Map; + return { + 'type': 'document', + 'frontmatter': yamlToNative(fm), + 'children': (body as Map)['children'], + }; +} + /// Convert an [MdDocument] into queryable native Dart types. /// /// Every node becomes a map with a `type` discriminator. Container nodes diff --git a/lib/src/schema/loader.dart b/lib/src/schema/loader.dart index 5264454..1a05b6c 100644 --- a/lib/src/schema/loader.dart +++ b/lib/src/schema/loader.dart @@ -1,10 +1,4 @@ -/// Load and merge schemas for the `--schema` entry point. -/// -/// [loadSchemaFromFile] reads a schema file (JSON) and returns the -/// parsed [Shape], after a JSON-Schema-looking sanity check. -/// [loadSchemaForData] adds sibling auto-detection: given a data file -/// path, it looks for `.schema.json` next to it and loads -/// that when present. +/// Pure schema-merge logic for the `--schema` entry point. /// /// [mergeSchemaWithData] combines a user-declared schema with the /// shape inferred from actual data. See `doc/schema-design.md` section @@ -12,56 +6,15 @@ /// "schema augments, never contradicts" — agreements pass, schema /// fills in what data can't express (empty-list elements, optional /// fields), concrete-type disagreements error at load time. +/// +/// File-loading helpers (`loadSchemaFromFile`, `loadSchemaForData`) +/// live in `bin/schema_io.dart` because they pull in `dart:io`. 
The +/// published library has zero `dart:io` imports so it stays +/// WASM-compilable for browser consumers (e.g. the lambé playground). library; -import 'dart:io'; - import '../errors.dart'; import '../shape/shape.dart'; -import 'parser.dart'; - -/// Load a schema from a file path, parsing it as a JSON Schema subset. -/// -/// Throws [QueryError] if the file is missing or unreadable, or if -/// the schema parser rejects the content. -Shape loadSchemaFromFile(String path) { - final file = File(path); - if (!file.existsSync()) { - throw QueryError('schema file not found: $path'); - } - final source = file.readAsStringSync(); - return parseJsonSchema(source); -} - -/// Load a schema for [dataPath], preferring [explicitSchemaPath] when -/// provided and falling back to a `.schema.json` sibling. -/// -/// Returns `null` when no explicit path is given and no sibling -/// exists. Throws [QueryError] for explicit paths that fail to load. -Shape? loadSchemaForData({String? explicitSchemaPath, String? dataPath}) { - if (explicitSchemaPath != null) { - return loadSchemaFromFile(explicitSchemaPath); - } - if (dataPath != null) { - final sibling = _siblingSchemaPath(dataPath); - if (sibling != null && File(sibling).existsSync()) { - return loadSchemaFromFile(sibling); - } - } - return null; -} - -/// Compute the sibling schema path for [dataPath]. -/// -/// Strips the data file's extension and appends `.schema.json`: -/// `data.json` → `data.schema.json`, `events.ndjson` → `events.schema.json`. -/// Returns `null` for paths without a recognizable extension. -String? _siblingSchemaPath(String dataPath) { - final lastDot = dataPath.lastIndexOf('.'); - if (lastDot < 0) return null; - final base = dataPath.substring(0, lastDot); - return '$base.schema.json'; -} /// Merge a schema-declared [schema] shape with a data-inferred [data] /// shape. 
Schema augments data: diff --git a/lib/src/schema/renderer.dart b/lib/src/schema/renderer.dart index bc085cf..1e54614 100644 --- a/lib/src/schema/renderer.dart +++ b/lib/src/schema/renderer.dart @@ -59,7 +59,7 @@ Map _encode(Shape shape) { SBool() => {'type': 'boolean'}, SNum() => {'type': 'number'}, SString() => {'type': 'string'}, - SList(:final element) => { + SList(:final element, :final sampledKinds) => { 'type': 'array', 'items': _encode(element), // SList(SAny()) means "this list contained heterogeneous or @@ -69,7 +69,17 @@ Map _encode(Shape shape) { // parser ignores unknown keywords (per JSON Schema's // extensibility convention for metadata), so this round-trips // safely. - if (element is SAny) 'description': 'sampled, may be heterogeneous', + // + // When `sampledKinds` is populated, the heterogeneity was + // observed (mixed types in the sample) — list the distinct + // shapes so users see what's actually there. When it's null, + // the list was empty or shape-inference widened structurally + // without an observable sample (e.g. via static query analysis). + if (element is SAny) + 'description': + sampledKinds != null && sampledKinds.isNotEmpty + ? 'sampled: ${_describeKinds(sampledKinds)} (heterogeneous)' + : 'sampled, may be heterogeneous', }, SMap(:final fields) => _encodeMap(fields), // Unreachable: SOptional was unwrapped above. Present for @@ -78,6 +88,23 @@ Map _encode(Shape shape) { }; } +/// Render a list of distinct sampled shapes as a comma-separated word +/// list for the heterogeneous-list description string. Maps each +/// [Shape] to a short human-readable name (number, string, list, ...); +/// nested container shapes collapse to their kind without recursion. 
+String _describeKinds(List kinds) => kinds.map(_kindName).join(', '); + +String _kindName(Shape s) => switch (s) { + SNull() => 'null', + SBool() => 'boolean', + SNum() => 'number', + SString() => 'string', + SList() => 'array', + SMap() => 'object', + SOptional() => 'optional', + SAny() => 'any', +}; + Map _encodeMap(Map fields) { final properties = {}; final required = []; diff --git a/lib/src/shape/shape.dart b/lib/src/shape/shape.dart index 36b1122..f98d1c8 100644 --- a/lib/src/shape/shape.dart +++ b/lib/src/shape/shape.dart @@ -106,23 +106,52 @@ final class SString extends Shape { /// /// The [element] is the shape of all elements if they agree, or [SAny] if /// the list is empty or contains mixed shapes. +/// +/// [sampledKinds] is non-null only when [element] is [SAny] *because of +/// observed heterogeneity* (mixed types in the sample), and lists the +/// distinct observed element shapes. It is null for the empty-list case +/// (no observations) and for any [SList] whose element is not [SAny]. +/// Renderers and explainers can use it to surface what was actually +/// seen, without having to re-walk the source data. final class SList extends Shape { /// Shape of each element. [SAny] for empty or heterogeneous lists. final Shape element; + /// Distinct observed element shapes, when [element] is [SAny] due to + /// heterogeneity. Null when not applicable. + final List? sampledKinds; + /// Creates an [SList] shape with the given [element] shape. - const SList(this.element); + const SList(this.element, {this.sampledKinds}); @override - bool operator ==(Object other) => other is SList && other.element == element; + bool operator ==(Object other) => + other is SList && + other.element == element && + _kindsEqual(other.sampledKinds, sampledKinds); @override - int get hashCode => Object.hash('list', element); + int get hashCode => Object.hash('list', element, _kindsHash(sampledKinds)); @override String toString() => 'list<$element>'; } +bool _kindsEqual(List? 
a, List? b) { + if (identical(a, b)) return true; + if (a == null || b == null) return false; + if (a.length != b.length) return false; + for (var i = 0; i < a.length; i++) { + if (a[i] != b[i]) return false; + } + return true; +} + +int _kindsHash(List? kinds) { + if (kinds == null) return 0; + return Object.hashAll(kinds); +} + /// Shape of a map, with the shape of each known field. /// /// [SMap] preserves field order (insertion order), which matches Dart's @@ -227,7 +256,19 @@ Shape _listShape(List list) { final limit = list.length < _heteroSampleLimit ? list.length : _heteroSampleLimit; for (var i = 1; i < limit; i++) { - if (shapeOf(list[i]) != first) return const SList(SAny()); + final next = shapeOf(list[i]); + if (next != first) { + // Heterogeneous: collect distinct shapes from the sample so the + // schema renderer can describe what was observed instead of just + // saying "may be heterogeneous." Stable insertion order, by-equality + // dedup; cheap because limit is bounded by [_heteroSampleLimit]. + final kinds = [first]; + for (var j = 1; j < limit; j++) { + final k = shapeOf(list[j]); + if (!kinds.contains(k)) kinds.add(k); + } + return SList(const SAny(), sampledKinds: kinds); + } } return SList(first); } diff --git a/test/markdown_text_test.dart b/test/markdown_text_test.dart index 43fa531..e2c69f7 100644 --- a/test/markdown_text_test.dart +++ b/test/markdown_text_test.dart @@ -126,6 +126,48 @@ void main() { }); }); + group('YAML frontmatter is split out, not absorbed as prose', () { + const fmSource = + '---\n' + 'title: My Title\n' + 'tags:\n' + ' - alpha\n' + ' - beta\n' + '---\n' + '\n' + '# Heading\n' + '\n' + 'Body.\n'; + + test('text op no longer scoops up the frontmatter', () { + final doc = _md(fmSource); + final result = query('. | text', doc) as String; + // Pre-fix: the result included "title: My Title alpha beta..." because + // the YAML block was parsed as a paragraph. Post-fix: only the body. 
+ expect(result, contains('Heading')); + expect(result, contains('Body.')); + expect(result, isNot(contains('title:'))); + expect(result, isNot(contains('alpha'))); + }); + + test('frontmatter is addressable via .frontmatter', () { + final doc = _md(fmSource); + expect(query('.frontmatter.title', doc), 'My Title'); + expect(query('.frontmatter.tags', doc), ['alpha', 'beta']); + }); + + test('document without frontmatter has no frontmatter field', () { + final doc = _md('# heading\n\nbody.\n'); + expect(query('. | has("frontmatter")', doc), false); + }); + + test('children of frontmatter doc preserve previous shape', () { + final doc = _md(fmSource); + final firstChild = query('.children[0]', doc) as Map; + expect(firstChild['type'], 'heading'); + }); + }); + group('text op metadata', () { test('registered in pipeOpNames', () { expect(pipeOpNames, contains('text')); diff --git a/test/schema_loader_test.dart b/test/schema_loader_test.dart index ac1b34f..dfe8f3a 100644 --- a/test/schema_loader_test.dart +++ b/test/schema_loader_test.dart @@ -18,6 +18,11 @@ import 'dart:io'; import 'package:lambe/lambe.dart'; import 'package:test/test.dart'; +// File-loading helpers live in `bin/` (not in `lib/`) so the published +// library has no `dart:io` imports. They're still under test via this +// relative import. +import '../bin/schema_io.dart'; + void main() { group('loadSchemaFromFile', () { late Directory tmp; diff --git a/test/shape_test.dart b/test/shape_test.dart index 53fa8af..0f331e7 100644 --- a/test/shape_test.dart +++ b/test/shape_test.dart @@ -45,7 +45,17 @@ void main() { }); test('heterogeneous list collapses to SAny element', () { - expect(shapeOf([1, 'x', true]), const SList(SAny())); + final shape = shapeOf([1, 'x', true]); + expect(shape, isA()); + final list = shape as SList; + expect(list.element, const SAny()); + // Heterogeneous lists carry the distinct observed shapes so the + // schema renderer can describe what was actually present. 
+      expect(list.sampledKinds, isNotNull);
+      expect(
+        list.sampledKinds,
+        containsAll([const SNum(), const SString(), const SBool()]),
+      );
     });

     test('nested list of lists', () {
@@ -76,7 +86,15 @@ void main() {

     test('sampling: heterogeneity within the sample window is detected', () {
       final value = [1, 2, 3, 'x', 5];
-      expect(shapeOf(value), const SList(SAny()));
+      final shape = shapeOf(value);
+      expect(shape, isA());
+      final list = shape as SList;
+      expect(list.element, const SAny());
+      // Both number and string were in the sample window.
+      expect(
+        list.sampledKinds,
+        containsAll([const SNum(), const SString()]),
+      );
     });
   });

From c89b72f0dee9daa52c8eef3fc63d215852336996 Mon Sep 17 00:00:00 2001
From: Hakim Jonas Ghoula
Date: Tue, 26 May 2026 00:22:19 +0200
Subject: [PATCH 2/2] docs(lambe): 0.10.0 release notes, jq-alias docs, PATH skill note
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

CHANGELOG entry for 0.10.0 covering the breaking library API removal,
the frontmatter parser adoption, foreign-idiom redirect coverage,
heterogeneous-list description upgrade, error message polish,
performance numbers (5-12% AOT, 5-12% WASM on representative
workloads), and the rumil_parsers ^0.8.0 → ^0.8.1 dependency bump.

jq-compatibility aliases now documented across the canonical surfaces:
- `tonumber` (jq alias for `to_number`) — was implemented in 0.9 but
  undocumented; now appears in jq-to-lambe.md, syntax.md, lam.1.md,
  AGENTS.md.
- `add` (jq alias for `sum`) — same story; same surfaces.

`//` documentation tightened in jq-to-lambe.md: it's the null-fallback
operator (returns the right-hand side when the left evaluates to
null), not an error-handler. The doc previously said "not yet
supported" — that was stale; 0.9 implemented it.

Skill file (.claude/skills/lambe/SKILL.md) gains a sandbox note: when
an agent harness reports `lam: command not found`, the fall-back is
the absolute path `~/.pub-cache/bin/lam`. This is a shell-PATH issue,
not a lambé issue. Documented so future readers of the skill don't
bounce off it.

doc/lam.1 regenerated from doc/lam.1.md via tool/manpage.dart so the
committed mandoc round-trips cleanly with the source.

pubspec bumped to 0.10.0; rumil_parsers constraint bumped to ^0.8.1.
---
 .claude/skills/lambe/SKILL.md |   6 ++
 AGENTS.md                     |   4 +-
 CHANGELOG.md                  | 113 ++++++++++++++++++++++++++++++++++
 doc/jq-to-lambe.md            |  25 +++++++-
 doc/lam.1                     |   8 +--
 doc/lam.1.md                  |   8 +--
 doc/syntax.md                 |   7 ++-
 pubspec.yaml                  |   4 +-
 8 files changed, 160 insertions(+), 15 deletions(-)

diff --git a/.claude/skills/lambe/SKILL.md b/.claude/skills/lambe/SKILL.md
index 36b58a5..200ca24 100644
--- a/.claude/skills/lambe/SKILL.md
+++ b/.claude/skills/lambe/SKILL.md
@@ -12,6 +12,12 @@ metadata:
 Lambé is on the user's PATH after `dart pub global activate lambe`. You
 invoke it via shell. The binary is named `lam`.

+**Sandbox note for Claude Code:** the Bash tool does not always inherit
+the user's interactive shell PATH. If `lam: command not found` appears,
+fall back to the absolute path `~/.pub-cache/bin/lam`, which is where
+`dart pub global activate` installs it. This is a Claude-Code shell
+behavior, not a lambé issue.
+
 ## When to reach for `lam`

 The user wants to do something with a **structured data file**:
diff --git a/AGENTS.md b/AGENTS.md
index 675e3d2..eb421be 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -91,7 +91,7 @@ recursion, no `def`/lambdas. Don't reach for it when the user wants:
 . | length                count elements (list / map / string)
 . | first                 first element
 . | last                  last element
-. | sum                   sum numbers
+. | sum                   sum numbers (jq alias: add)
 . | avg                   average
 . | min / max             minimum / maximum
 . | keys                  map keys or list indices
@@ -99,7 +99,7 @@ recursion, no `def`/lambdas. Don't reach for it when the user wants:
 . | has("field")          check field exists (returns bool)
 . | to_entries            map to [{key, value}]
 . | from_entries          [{key, value}] to map
-. | to_number             parse a string as a number (use on CSV numeric columns)
+. | to_number             parse a string as a number (use on CSV numeric columns; jq alias: tonumber)
 . | type                  runtime type: null, boolean, number, string, array, object
 . | filter_values(. > 5)  filter a map's values
 . | map_values(. * 2)     transform a map's values
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 441611a..693abe8 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,3 +1,116 @@
+## 0.10.0
+
+Polish release built on rumil 0.7.1 / rumil_parsers 0.8.1. The library
+becomes WASM-clean (no `dart:io` in `lib/`) so it runs in browsers
+without the CLI binary, frontmatter no longer leaks into Markdown
+prose, foreign-jq idioms get redirect hints, and a handful of error
+messages clarify themselves. End-to-end speedup on representative
+workloads: 5–12% AOT, 5–12% WASM (the rumil hot/cold split lands
+through the lambé pipeline).
+
+### Breaking — library API surface
+
+- **Removed from `package:lambe`**: `loadSchemaFromFile`,
+  `loadSchemaForData`. The file-loading helpers moved to
+  `bin/schema_io.dart` because they pull in `dart:io`. `lib/` is now
+  `dart:io`-free, which lets the library compile to WASM without
+  bridges (the lambé playground in arda-web depends on this).
+
+  **`mergeSchemaWithData` stays** — it's pure and remains exported.
+
+  Migration: a library consumer who needs the file-loading shape
+  inlines `parseJsonSchema(File(path).readAsStringSync())`. The CLI is
+  unchanged; both `lam` and `lam-mcp` keep their schema flags.
+
+### Markdown frontmatter no longer absorbed as prose
+
+- `parseInput(text, Format.markdown)` now uses
+  `parseMarkdownWithFrontmatter` from rumil_parsers 0.8.1. A leading
+  YAML frontmatter block (delimited by `---` lines) becomes a sibling
+  `frontmatter` field on the document, instead of being concatenated
+  into the body text by the `text` op or by the children list.
+
+  Files without frontmatter parse byte-identically to before. Files
+  with frontmatter gain a `frontmatter` key addressable via
+  `.frontmatter.title`, `.frontmatter.tags[0]`, etc.
+
+  Pre-fix: `lam '. | text' SKILL.md` returned `"name: lambe
+  description: | ...First headingBody."` (the YAML scooped up by the
+  prose walker).
+
+  Post-fix: `lam '. | text' SKILL.md` returns `"First headingBody."`
+  and the metadata is queryable separately.
+
+### Foreign-idiom redirects (jq-compatibility)
+
+The parser already emitted `help: ...` redirects for `select`/`paths`/
+`..`/`try` in 0.9. This release widens coverage to the rest of the
+common jq habits the model might draft:
+
+- **Pipe-op redirects** (fire when written as `... | name(...)`):
+  `getpath`, `setpath`, `env`, `gsub`/`sub`/`test`/`match`/`scan`/
+  `splits` (regex family), `tojson`, `fromjson`.
+- **Inline-idiom redirects** (fire on character patterns):
+  `@uri`, `@html`, `@sh`, `@json`, `$ENV` and other variable-binding
+  forms (`$NAME`).
+- **Closest-match overrides**: `test`, `match`, `sub` previously
+  produced misleading "did you mean text/map/sum?" guesses. Now they
+  produce the regex-family explanation directly.
+
+### Other user-visible changes
+
+- **Output-shape error messages clarify "appending":** the writer's
+  bridge suggestion now reads `Append one of these stages to the end
+  of your query (keep your existing flags such as -t hcl):` instead
+  of the ambiguous `Try appending one of:`. The format name is
+  interpolated dynamically.
+- **`--schema ` migration hint:** when the argument has a
+  data extension (`.json`/`.yaml`/`.toml`/etc.) instead of a
+  `*.schema.json`, the error suggests the new flag:
+  `hint: --schema is now for declaring a JSON Schema (renamed from
+  0.8.0). To inspect data shape use: lam --print-shape `.
+- **`.jsonlines` extension auto-implies `--ndjson`** (joining the
+  existing `.ndjson` and `.jsonl` auto-detection).
+- **Heterogeneous-list shape descriptions list the sampled types:**
+  `--print-shape` on a mixed array now emits `"description":
+  "sampled: number, string, boolean, null, array (heterogeneous)"`
+  instead of the generic `"sampled, may be heterogeneous"`. Empty
+  lists keep the original wording (no observed kinds).
+- **`tonumber` and `add` jq-compatibility aliases now documented** in
+  `doc/jq-to-lambe.md`, `doc/syntax.md`, `doc/lam.1.md`, and
+  `AGENTS.md`. The aliases were already implemented in 0.9; the
+  surface was just undocumented.
+- **`//` documentation tightened** in `doc/jq-to-lambe.md` (it's the
+  null-fallback operator, not an error-handler — `expr // alt`
+  returns `alt` when `expr` evaluates to null, but computation
+  errors still propagate).
+
+### Performance
+
+End-to-end CLI / library benchmark (median of 7 runs, after
+warm-up), comparing lambé 0.9.0 (rumil 0.7.0 / rumil_parsers 0.8.0)
+against this release (rumil 0.7.1 / rumil_parsers 0.8.1):
+
+| Workload                                  | AOT 0.9.0 | AOT 0.10.0 |      Δ | WASM 0.9.0 | WASM 0.10.0 |      Δ |
+|-------------------------------------------|----------:|-----------:|-------:|-----------:|------------:|-------:|
+| `--print-shape` on 50k items              |  742.2 ms |   693.8 ms |  -6.5% |   319.9 ms |    290.1 ms |  -9.3% |
+| `.items \| filter(.value > 50000) \| length` |  748.5 ms |   704.8 ms |  -5.8% |   324.9 ms |    285.9 ms | -12.0% |
+| `group_by(.role)` on 1k records           |   30.7 ms |    27.1 ms | -11.7% |    12.4 ms |     11.8 ms |  -4.8% |
+
+The win comes from rumil's hot/cold dispatch split (4–6% on
+synthetic format benches; compounded through lambé's deeper call
+chain). WASM is also the relevant runtime for the lambé playground
+in arda-web; users get faster live-explain feedback in the browser.
+
+Reproduce with `tool/bench/cli_bench.sh` (AOT) — the WASM library
+bench requires a host program importing `package:lambe` and
+compiling with `dart compile wasm`.
+ +### Dependency bumps + +- `rumil_parsers ^0.8.0` → `^0.8.1` (required for + `parseMarkdownWithFrontmatter`). + ## 0.9.0 Closes the shape feedback loop. Declare a JSON Schema, check queries diff --git a/doc/jq-to-lambe.md b/doc/jq-to-lambe.md index 0bea6e3..2fcb442 100644 --- a/doc/jq-to-lambe.md +++ b/doc/jq-to-lambe.md @@ -95,6 +95,23 @@ jq returns `[[group1], [group2]]`. Lambe returns `[{key: true, values: [...]}, { jq uses `add` for sum and `add / length` for average. Lambe has `sum` and `avg` directly. +`add` is also accepted as a jq-compatibility alias for `sum` — both +parse, both produce the same AST, and `--explain` canonicalises to +`sum`. Use `sum` in new lambé queries; `add` exists so jq habits +don't fail on the parser. + +## Type coercion + +| jq | Lambe | +|----|-------| +| `"42" \| tonumber` | `"42" \| to_number` (or jq alias `tonumber`) | +| `"3.14" \| tonumber` | `"3.14" \| to_number` (or jq alias `tonumber`) | + +`to_number` is lambé's canonical name; `tonumber` is accepted as a +jq-compatibility alias. Both parse, both throw `to_number: cannot +parse "..."` on a non-numeric string, and `--explain` canonicalises +to `to_number`. + ## Object construction | jq | Lambe | @@ -126,9 +143,13 @@ Identical syntax. | jq | Lambe | |----|-------| | `.config \| has("host")` | `.config \| has("host")` | -| `.config.missing // "default"` | not yet supported | +| `.config.missing // "default"` | `.config.missing // "default"` | -`has` is identical. jq's `//` (alternative operator) does not exist in Lambe yet. +`has` is identical. `//` is the null-fallback operator: `expr // alt` +returns `alt` when `expr` evaluates to `null`. It is not an +error-handler — computation errors still propagate. For "the field +might not exist," `// default` is the idiom; for "this might fail," +use shape checks (`has(...)`, `--print-shape`) before the call site. 
## Entry conversion diff --git a/doc/lam.1 b/doc/lam.1 index a0d03c7..baf8bba 100644 --- a/doc/lam.1 +++ b/doc/lam.1 @@ -139,8 +139,8 @@ Length of list, map, or string. \fBfirst\fR, \fBlast\fR First or last element. .TP -\fBsum\fR, \fBavg\fR, \fBmin\fR, \fBmax\fR -Aggregate operations on numeric lists. +\fBsum\fR (jq alias: \fBadd\fR), \fBavg\fR, \fBmin\fR, \fBmax\fR +Aggregate operations on numeric lists. \fCadd\fR is accepted for jq-compatibility; \fC--explain\fR canonicalises to \fCsum\fR. .TP \fBhas\fR(\fIkey\fR) Check if a map contains a key. @@ -151,8 +151,8 @@ Map to [{key, value}]. \fBfrom_entries\fR [{key, value}] to map. .TP -\fBto_number\fR -Parse a string as a number. Pass-through for existing numbers. +\fBto_number\fR (jq alias: \fBtonumber\fR) +Parse a string as a number. Pass-through for existing numbers. Both names parse identically; \fC--explain\fR canonicalises to \fCto_number\fR. .TP \fBtype\fR Runtime type of the value as a string: "null", "boolean", "number", "string", "array", or "object". diff --git a/doc/lam.1.md b/doc/lam.1.md index 729f25b..851905b 100644 --- a/doc/lam.1.md +++ b/doc/lam.1.md @@ -159,8 +159,8 @@ Queries start with **.** (the current document) and chain operations with **|**. **first**, **last** : First or last element. -**sum**, **avg**, **min**, **max** -: Aggregate operations on numeric lists. +**sum** (jq alias: **add**), **avg**, **min**, **max** +: Aggregate operations on numeric lists. `add` is accepted for jq-compatibility; `--explain` canonicalises to `sum`. **has**(*key*) : Check if a map contains a key. @@ -171,8 +171,8 @@ Queries start with **.** (the current document) and chain operations with **|**. **from_entries** : [{key, value}] to map. -**to_number** -: Parse a string as a number. Pass-through for existing numbers. +**to_number** (jq alias: **tonumber**) +: Parse a string as a number. Pass-through for existing numbers. Both names parse identically; `--explain` canonicalises to `to_number`. 
**type** : Runtime type of the value as a string: "null", "boolean", "number", "string", "array", or "object". diff --git a/doc/syntax.md b/doc/syntax.md index 37f7fc5..a50955b 100644 --- a/doc/syntax.md +++ b/doc/syntax.md @@ -510,7 +510,8 @@ $ lam '.tags | last' data.json ### sum, avg, min, max -Aggregate operations on numeric lists. +Aggregate operations on numeric lists. `add` is accepted as a +jq-compatibility alias for `sum`; `--explain` canonicalises to `sum`. ```bash $ lam '.users | map(.age) | sum' data.json @@ -568,6 +569,10 @@ Parse a string as a number. Pass-through for existing numbers. CSV and TSV cells are strings by default; use `to_number` to coerce them before arithmetic. +`tonumber` is accepted as a jq-compatibility alias — both names parse +identically and `--explain` canonicalises to `to_number`. Use +`to_number` in new lambé queries. + ```bash $ lam -n '"42" | to_number' 42 diff --git a/pubspec.yaml b/pubspec.yaml index ca971c9..2fe90a8 100644 --- a/pubspec.yaml +++ b/pubspec.yaml @@ -2,7 +2,7 @@ name: lambe description: >- A query language for structured data that shows you what you're working with. Shape-aware --explain, JSON Schema input, format bridges. CLI + library + MCP. -version: 0.9.0 +version: 0.10.0 homepage: https://ardaproject.org/lambe repository: https://github.com/hakimjonas/lambe topics: @@ -17,7 +17,7 @@ environment: dependencies: rumil: ^0.7.0 - rumil_parsers: ^0.8.0 + rumil_parsers: ^0.8.1 rumil_expressions: ^0.7.0 rumil_tokens: ^0.1.0 args: ^2.6.0