Merged
6 changes: 6 additions & 0 deletions .claude/skills/lambe/SKILL.md
@@ -12,6 +12,12 @@ metadata:
Lambé is on the user's PATH after `dart pub global activate lambe`. You
invoke it via shell. The binary is named `lam`.

**Sandbox note for Claude Code:** the Bash tool does not always inherit
the user's interactive shell PATH. If `lam: command not found` appears,
fall back to the absolute path `~/.pub-cache/bin/lam`, which is where
`dart pub global activate` installs it. This is a Claude-Code shell
behavior, not a lambé issue.

## When to reach for `lam`

The user wants to do something with a **structured data file**:
4 changes: 2 additions & 2 deletions AGENTS.md
@@ -91,15 +91,15 @@ recursion, no `def`/lambdas. Don't reach for it when the user wants:
. | length count elements (list / map / string)
. | first first element
. | last last element
. | sum sum numbers
. | sum sum numbers (jq alias: add)
. | avg average
. | min / max minimum / maximum
. | keys map keys or list indices
. | values map values
. | has("field") check field exists (returns bool)
. | to_entries map to [{key, value}]
. | from_entries [{key, value}] to map
. | to_number parse a string as a number (use on CSV numeric columns)
. | to_number parse a string as a number (use on CSV numeric columns; jq alias: tonumber)
. | type runtime type: null, boolean, number, string, array, object
. | filter_values(. > 5) filter a map's values
. | map_values(. * 2) transform a map's values
113 changes: 113 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,116 @@
## 0.10.0

Polish release built on rumil 0.7.1 / rumil_parsers 0.8.1. The library
becomes WASM-clean (no `dart:io` in `lib/`) so it runs in browsers
without the CLI binary, frontmatter no longer leaks into Markdown
prose, foreign-jq idioms get redirect hints, and a handful of error
messages clarify themselves. End-to-end speedup on representative
workloads: 6–12% AOT, 5–12% WASM (the rumil hot/cold split lands
through the lambé pipeline).

### Breaking — library API surface

- **Removed from `package:lambe`**: `loadSchemaFromFile`,
`loadSchemaForData`. The file-loading helpers moved to
`bin/schema_io.dart` because they pull in `dart:io`. `lib/` is now
`dart:io`-free, which lets the library compile to WASM without
bridges (the lambé playground in arda-web depends on this).

**`mergeSchemaWithData` stays** — it's pure and remains exported.

Migration: a library consumer who needs the file-loading shape
inlines `parseJsonSchema(File(path).readAsStringSync())`. The CLI is
unchanged; both `lam` and `lam-mcp` keep their schema flags.
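
A minimal sketch of that migration, assuming the consumer wants to keep a missing-file check like the old helper had. `loadSchemaText` is a hypothetical name, and `jsonDecode` stands in for lambé's `parseJsonSchema` so the snippet runs without the package; real code would pass `parseJsonSchema` instead.

```dart
import 'dart:convert';
import 'dart:io';

/// Hypothetical helper: reads [path] and hands the text to [parse].
/// Lives in the consumer's own code, so `dart:io` stays out of the library.
T loadSchemaText<T>(String path, T Function(String source) parse) {
  final file = File(path);
  if (!file.existsSync()) {
    throw FileSystemException('schema file not found', path);
  }
  return parse(file.readAsStringSync());
}

void main() {
  final dir = Directory.systemTemp.createTempSync('schema_demo');
  final path = '${dir.path}/data.schema.json';
  File(path).writeAsStringSync('{"type": "object"}');

  // Assumption: jsonDecode is only a stand-in for parseJsonSchema.
  final schema = loadSchemaText(path, (source) => jsonDecode(source));
  print((schema as Map<String, dynamic>)['type']);

  dir.deleteSync(recursive: true);
}
```

Passing the parser as a parameter keeps the file access in one place and makes the helper easy to exercise without the real schema parser.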

### Markdown frontmatter no longer absorbed as prose

- `parseInput(text, Format.markdown)` now uses
`parseMarkdownWithFrontmatter` from rumil_parsers 0.8.1. A leading
YAML frontmatter block (delimited by `---` lines) becomes a sibling
`frontmatter` field on the document, instead of being concatenated
into the body text by the `text` op or by the children list.

Files without frontmatter parse byte-identically to before. Files
with frontmatter gain a `frontmatter` key addressable via
`.frontmatter.title`, `.frontmatter.tags[0]`, etc.

Pre-fix: `lam '. | text' SKILL.md` returned `"name: lambe
description: | ...First headingBody."` (the YAML scooped up by the
prose walker).

Post-fix: `lam '. | text' SKILL.md` returns `"First headingBody."`
and the metadata is queryable separately.
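
To illustrate the split described above, here is a rough sketch of separating a leading `---`-delimited block from the Markdown body. It is not the `parseMarkdownWithFrontmatter` implementation from rumil_parsers; `splitFrontmatter` is a hypothetical helper, and it leaves the YAML as raw text rather than parsing it.

```dart
/// Hypothetical helper: splits a leading `---` ... `---` block from [text].
/// Returns a null frontmatter when the document does not start with `---`
/// or the block is never closed, leaving the body untouched in that case.
({String? frontmatter, String body}) splitFrontmatter(String text) {
  final lines = text.split('\n');
  if (lines.first.trim() != '---') {
    return (frontmatter: null, body: text);
  }
  // Look for the closing delimiter, starting after the opening line.
  final close = lines.indexWhere((line) => line.trim() == '---', 1);
  if (close < 0) {
    return (frontmatter: null, body: text);
  }
  return (
    frontmatter: lines.sublist(1, close).join('\n'),
    body: lines.sublist(close + 1).join('\n'),
  );
}

void main() {
  final doc = splitFrontmatter('---\nname: lambe\n---\n# First heading\nBody.');
  print(doc.frontmatter); // the raw YAML, kept apart from the prose
  print(doc.body); // only the Markdown body reaches a prose walker
}
```

A document with no frontmatter comes back with its body unchanged, which matches the "parse byte-identically to before" behaviour described for such files.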

### Foreign-idiom redirects (jq-compatibility)

The parser already emitted `help: ...` redirects for `select`/`paths`/
`..`/`try` in 0.9. This release widens coverage to the rest of the
common jq habits the model might draft:

- **Pipe-op redirects** (fire when written as `... | name(...)`):
`getpath`, `setpath`, `env`, `gsub`/`sub`/`test`/`match`/`scan`/
`splits` (regex family), `tojson`, `fromjson`.
- **Inline-idiom redirects** (fire on character patterns):
`@uri`, `@html`, `@sh`, `@json`, `$ENV` and other variable-binding
forms (`$NAME`).
- **Closest-match overrides**: `test`, `match`, `sub` previously
produced misleading "did you mean text/map/sum?" guesses. Now they
produce the regex-family explanation directly.

### Other user-visible changes

- **Output-shape error messages clarify "appending":** the writer's
bridge suggestion now reads `Append one of these stages to the end
of your query (keep your existing flags such as -t hcl):` instead
of the ambiguous `Try appending one of:`. The format name is
interpolated dynamically.
- **`--schema <data-file>` migration hint:** when the argument has a
data extension (`.json`/`.yaml`/`.toml`/etc.) instead of a
`*.schema.json`, the error suggests the new flag:
`hint: --schema is now for declaring a JSON Schema (renamed from
0.8.0). To inspect data shape use: lam --print-shape <file>`.
- **`.jsonlines` extension auto-implies `--ndjson`** (joining the
existing `.ndjson` and `.jsonl` auto-detection).
- **Heterogeneous-list shape descriptions list the sampled types:**
`--print-shape` on a mixed array now emits `"description":
"sampled: number, string, boolean, null, array (heterogeneous)"`
instead of the generic `"sampled, may be heterogeneous"`. Empty
lists keep the original wording (no observed kinds).
- **`tonumber` and `add` jq-compatibility aliases now documented** in
`doc/jq-to-lambe.md`, `doc/syntax.md`, `doc/lam.1.md`, and
`AGENTS.md`. The aliases were already implemented in 0.9; the
surface was just undocumented.
- **`//` documentation tightened** in `doc/jq-to-lambe.md` (it's the
null-fallback operator, not an error-handler — `expr // alt`
returns `alt` when `expr` evaluates to null, but computation
errors still propagate).

### Performance

End-to-end CLI / library benchmark (median of 7 runs, after
warm-up), comparing lambé 0.9.0 (rumil 0.7.0 / rumil_parsers 0.8.0)
against this release (rumil 0.7.1 / rumil_parsers 0.8.1):

| Workload | AOT 0.9.0 | AOT 0.10.0 | Δ | WASM 0.9.0 | WASM 0.10.0 | Δ |
|-------------------------------------------|----------:|-----------:|-------:|-----------:|------------:|-------:|
| `--print-shape` on 50k items | 742.2 ms | 693.8 ms | -6.5% | 319.9 ms | 290.1 ms | -9.3% |
| `.items \| filter(.value > 50000) \| length` | 748.5 ms | 704.8 ms | -5.8% | 324.9 ms | 285.9 ms | -12.0% |
| `group_by(.role)` on 1k records | 30.7 ms | 27.1 ms | -11.7% | 12.4 ms | 11.8 ms | -4.8% |

The win comes from rumil's hot/cold dispatch split (4–6% on
synthetic format benches; compounded through lambé's deeper call
chain). WASM is also the relevant runtime for the lambé playground
in arda-web; users get faster live-explain feedback in the browser.
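
The Δ columns in the table above read as relative change against the 0.9.0 timing. A small sketch of that arithmetic, using the table's own numbers (`percentChange` is a name chosen here for illustration):

```dart
/// Relative change from [before] to [after], in percent.
/// Negative values mean the newer build is faster.
double percentChange(double before, double after) =>
    (after - before) / before * 100;

void main() {
  // (label, 0.9.0 ms, 0.10.0 ms), copied from the benchmark table.
  const timings = [
    ('print-shape AOT', 742.2, 693.8),
    ('filter+length AOT', 748.5, 704.8),
    ('group_by AOT', 30.7, 27.1),
    ('print-shape WASM', 319.9, 290.1),
    ('filter+length WASM', 324.9, 285.9),
    ('group_by WASM', 12.4, 11.8),
  ];
  for (final (label, before, after) in timings) {
    final delta = percentChange(before, after).toStringAsFixed(1);
    print('$label: $delta%');
  }
}
```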

Reproduce with `tool/bench/cli_bench.sh` (AOT) — the WASM library
bench requires a host program importing `package:lambe` and
compiling with `dart compile wasm`.

### Dependency bumps

- `rumil_parsers ^0.8.0` → `^0.8.1` (required for
`parseMarkdownWithFrontmatter`).

## 0.9.0

Closes the shape feedback loop. Declare a JSON Schema, check queries
@@ -8,7 +8,7 @@ library;

import 'package:rumil_tokens/rumil_tokens.dart';

import 'shape/pipe_ops.dart' as shape_ops;
import 'package:lambe/src/shape/pipe_ops.dart' as shape_ops;

/// Lambé query grammar for the REPL highlighter.
///
52 changes: 50 additions & 2 deletions bin/lam.dart
@@ -10,7 +10,9 @@ import 'dart:io';

import 'package:args/args.dart';
import 'package:lambe/lambe.dart';
import 'package:lambe/src/repl.dart' show runRepl;

import 'repl.dart' show runRepl;
import 'schema_io.dart';

void main(List<String> arguments) {
final argParser =
@@ -157,6 +159,19 @@ void main(List<String> arguments) {

  final rest = args.rest;
  if (rest.isEmpty && !isPrintShapeMode && !isInteractive) {
    // 0.8 → 0.9 migration: --schema took a data file in 0.8 and printed
    // its shape. In 0.9 it takes a JSON Schema file (the shape printer
    // moved to --print-shape). When the argument looks like data, point
    // the user at the new flag instead of the generic usage dump.
    if (schemaPath != null && _looksLikeDataFile(schemaPath)) {
      stderr.writeln('Error: missing query expression.');
      stderr.writeln(
        ' hint: --schema is now for declaring a JSON Schema (renamed '
        'from 0.8.0).',
      );
      stderr.writeln(' To inspect data shape use: lam --print-shape <file>');
      exit(1);
    }
    stderr.writeln('Error: missing query expression.');
    stderr.writeln();
    _usage(argParser);
@@ -197,7 +212,9 @@ void main(List<String> arguments) {
// format auto-detection convention for .csv, .yaml, etc.
if (!isNdjsonMode && rest.length > fileArgIndex) {
final fpath = rest[fileArgIndex].toLowerCase();
if (fpath.endsWith('.ndjson') || fpath.endsWith('.jsonl')) {
if (fpath.endsWith('.ndjson') ||
fpath.endsWith('.jsonl') ||
fpath.endsWith('.jsonlines')) {
isNdjsonMode = true;
}
}
@@ -648,6 +665,37 @@ Iterable<String> _stdinLines() sync* {
}
}

/// True if [path] has a data-format extension (`.json`, `.yaml`, etc.)
/// rather than the `*.schema.json` JSON-Schema convention. Used by the
/// 0.8 → 0.9 migration hint: `--schema /path/to/data.json` is almost
/// certainly stale shell history from when `--schema` printed shapes
/// (now `--print-shape`).
bool _looksLikeDataFile(String path) {
  final lower = path.toLowerCase();
  // *.schema.json is the canonical JSON Schema filename, not data.
  if (lower.endsWith('.schema.json')) return false;
  const dataExts = [
    '.json',
    '.ndjson',
    '.jsonl',
    '.jsonlines',
    '.yaml',
    '.yml',
    '.toml',
    '.tf',
    '.hcl',
    '.csv',
    '.tsv',
    '.md',
    '.markdown',
    '.proto',
  ];
  for (final ext in dataExts) {
    if (lower.endsWith(ext)) return true;
  }
  return false;
}

/// Print usage information to stderr.
void _usage(ArgParser parser) {
stderr.writeln('Usage: lam [options] <expression> [file]');
File renamed without changes.
5 changes: 3 additions & 2 deletions lib/src/repl.dart → bin/repl.dart
@@ -11,9 +11,10 @@ import 'dart:io';

import 'package:rumil/rumil.dart';

import '../lambe.dart';
import 'completer.dart';
import 'package:lambe/lambe.dart';
import 'package:lambe/src/completer.dart';
import 'readline.dart';
import 'schema_io.dart';

/// Run the interactive REPL with [data].
///
56 changes: 56 additions & 0 deletions bin/schema_io.dart
@@ -0,0 +1,56 @@
/// CLI-only schema file loaders.
///
/// Lives in bin/ rather than lib/ so the published library has zero
/// `dart:io` imports — making it safely usable from `dart compile wasm`
/// and dart2js consumers (e.g. the lambé playground in arda-web).
///
/// The pure schema-merge logic stays in `lib/src/schema/loader.dart` as
/// [mergeSchemaWithData]; only the path-based loaders moved.
library;

import 'dart:io';

import 'package:lambe/lambe.dart';

/// Load a schema from a file path, parsing it as a JSON Schema subset.
///
/// Throws [QueryError] if the file is missing or unreadable, or if
/// the schema parser rejects the content.
Shape loadSchemaFromFile(String path) {
  final file = File(path);
  if (!file.existsSync()) {
    throw QueryError('schema file not found: $path');
  }
  final source = file.readAsStringSync();
  return parseJsonSchema(source);
}

/// Load a schema for [dataPath], preferring [explicitSchemaPath] when
/// provided and falling back to a `<dataPath>.schema.json` sibling.
///
/// Returns `null` when no explicit path is given and no sibling
/// exists. Throws [QueryError] for explicit paths that fail to load.
Shape? loadSchemaForData({String? explicitSchemaPath, String? dataPath}) {
  if (explicitSchemaPath != null) {
    return loadSchemaFromFile(explicitSchemaPath);
  }
  if (dataPath != null) {
    final sibling = _siblingSchemaPath(dataPath);
    if (sibling != null && File(sibling).existsSync()) {
      return loadSchemaFromFile(sibling);
    }
  }
  return null;
}

/// Compute the sibling schema path for [dataPath].
///
/// Strips the data file's extension and appends `.schema.json`:
/// `data.json` → `data.schema.json`, `events.ndjson` → `events.schema.json`.
/// Returns `null` for paths without a recognizable extension.
String? _siblingSchemaPath(String dataPath) {
  final lastDot = dataPath.lastIndexOf('.');
  if (lastDot < 0) return null;
  final base = dataPath.substring(0, lastDot);
  return '$base.schema.json';
}
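
One edge case worth noting for the sibling lookup: `lastIndexOf('.')` also matches a dot in a directory name, so an extensionless file under `my.dir/` would map to `my.schema.json`. The sketch below shows one possible guard. It is not part of this change; `siblingSchemaPath` is a hypothetical variant, and treating dotfiles such as `.env` as extensionless is an assumption made here.

```dart
/// Hypothetical variant of the sibling-path computation that only
/// accepts a dot appearing inside the final path segment.
String? siblingSchemaPath(String dataPath) {
  // Index of the last `/` or `\`, or -1 when the path has no separator.
  final lastSep = dataPath.lastIndexOf(RegExp(r'[/\\]'));
  final lastDot = dataPath.lastIndexOf('.');
  // Reject: no dot, a dot in a directory name, or a leading-dot filename.
  if (lastDot <= lastSep + 1) return null;
  return '${dataPath.substring(0, lastDot)}.schema.json';
}

void main() {
  for (final path in ['data.json', 'logs/events.ndjson', 'my.dir/data']) {
    print('$path -> ${siblingSchemaPath(path)}');
  }
}
```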
25 changes: 23 additions & 2 deletions doc/jq-to-lambe.md
@@ -95,6 +95,23 @@ jq returns `[[group1], [group2]]`. Lambe returns `[{key: true, values: [...]}, {

jq uses `add` for sum and `add / length` for average. Lambe has `sum` and `avg` directly.

`add` is also accepted as a jq-compatibility alias for `sum` — both
parse, both produce the same AST, and `--explain` canonicalises to
`sum`. Use `sum` in new lambé queries; `add` exists so jq habits
don't fail on the parser.

## Type coercion

| jq | Lambe |
|----|-------|
| `"42" \| tonumber` | `"42" \| to_number` (or jq alias `tonumber`) |
| `"3.14" \| tonumber` | `"3.14" \| to_number` (or jq alias `tonumber`) |

`to_number` is lambé's canonical name; `tonumber` is accepted as a
jq-compatibility alias. Both parse, both throw `to_number: cannot
parse "..."` on a non-numeric string, and `--explain` canonicalises
to `to_number`.
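
As a rough model of what "canonicalises" means here, an alias table can map jq spellings onto the canonical op name before anything downstream sees it. This is only an illustration, not lambé's parser code; `jqAliases` and `canonicalOp` are hypothetical names.

```dart
/// Hypothetical alias table: jq spelling -> canonical lambé op name.
const jqAliases = <String, String>{
  'add': 'sum',
  'tonumber': 'to_number',
};

/// Returns the canonical name for [name], or [name] itself when it is
/// not a known alias.
String canonicalOp(String name) => jqAliases[name] ?? name;

void main() {
  for (final op in ['add', 'sum', 'tonumber', 'to_number', 'avg']) {
    print('$op -> ${canonicalOp(op)}');
  }
}
```

Resolving aliases once, up front, is one way both spellings could end up producing the same AST and the same error text.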

## Object construction

| jq | Lambe |
@@ -126,9 +143,13 @@ Identical syntax.
| jq | Lambe |
|----|-------|
| `.config \| has("host")` | `.config \| has("host")` |
| `.config.missing // "default"` | not yet supported |
| `.config.missing // "default"` | `.config.missing // "default"` |

`has` is identical. jq's `//` (alternative operator) does not exist in Lambe yet.
`has` is identical. `//` is the null-fallback operator: `expr // alt`
returns `alt` when `expr` evaluates to `null`. It is not an
error-handler — computation errors still propagate. For "the field
might not exist," `// default` is the idiom; for "this might fail,"
use shape checks (`has(...)`, `--print-shape`) before the call site.
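
The distinction between a null fallback and an error handler can be modelled in a few lines of Dart. This is a sketch of the semantics described above, not lambé's evaluator; `nullFallback` is a hypothetical name.

```dart
/// Hypothetical model of `expr // alt`: evaluate [expr]; if it yields
/// null, evaluate and return [alt]. Nothing is caught, so an exception
/// thrown by [expr] propagates to the caller.
Object? nullFallback(Object? Function() expr, Object? Function() alt) {
  final value = expr();
  return value ?? alt();
}

void main() {
  final config = <String, Object?>{'host': 'localhost'};

  // Missing field evaluates to null, so the fallback is used.
  print(nullFallback(() => config['missing'], () => 'default'));

  // Present field wins; the fallback is never evaluated.
  print(nullFallback(() => config['host'], () => 'default'));

  // A failing computation is not rescued by the fallback.
  try {
    nullFallback(() => throw StateError('boom'), () => 'default');
  } on StateError catch (e) {
    print('propagated: ${e.message}');
  }
}
```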

## Entry conversion

8 changes: 4 additions & 4 deletions doc/lam.1
@@ -139,8 +139,8 @@ Length of list, map, or string.
\fBfirst\fR, \fBlast\fR
First or last element.
.TP
\fBsum\fR, \fBavg\fR, \fBmin\fR, \fBmax\fR
Aggregate operations on numeric lists.
\fBsum\fR (jq alias: \fBadd\fR), \fBavg\fR, \fBmin\fR, \fBmax\fR
Aggregate operations on numeric lists. \fCadd\fR is accepted for jq-compatibility; \fC--explain\fR canonicalises to \fCsum\fR.
.TP
\fBhas\fR(\fIkey\fR)
Check if a map contains a key.
@@ -151,8 +151,8 @@ Map to [{key, value}].
\fBfrom_entries\fR
[{key, value}] to map.
.TP
\fBto_number\fR
Parse a string as a number. Pass-through for existing numbers.
\fBto_number\fR (jq alias: \fBtonumber\fR)
Parse a string as a number. Pass-through for existing numbers. Both names parse identically; \fC--explain\fR canonicalises to \fCto_number\fR.
.TP
\fBtype\fR
Runtime type of the value as a string: "null", "boolean", "number", "string", "array", or "object".
8 changes: 4 additions & 4 deletions doc/lam.1.md
@@ -159,8 +159,8 @@ Queries start with **.** (the current document) and chain operations with **|**.
**first**, **last**
: First or last element.

**sum**, **avg**, **min**, **max**
: Aggregate operations on numeric lists.
**sum** (jq alias: **add**), **avg**, **min**, **max**
: Aggregate operations on numeric lists. `add` is accepted for jq-compatibility; `--explain` canonicalises to `sum`.

**has**(*key*)
: Check if a map contains a key.
@@ -171,8 +171,8 @@ Queries start with **.** (the current document) and chain operations with **|**.
**from_entries**
: [{key, value}] to map.

**to_number**
: Parse a string as a number. Pass-through for existing numbers.
**to_number** (jq alias: **tonumber**)
: Parse a string as a number. Pass-through for existing numbers. Both names parse identically; `--explain` canonicalises to `to_number`.

**type**
: Runtime type of the value as a string: "null", "boolean", "number", "string", "array", or "object".