diff --git a/.claude/skills/lambe/SKILL.md b/.claude/skills/lambe/SKILL.md new file mode 100644 index 0000000..36b58a5 --- /dev/null +++ b/.claude/skills/lambe/SKILL.md @@ -0,0 +1,142 @@ +--- +name: lambe +description: Query, filter, transform, validate, and convert structured data files (JSON, YAML, TOML, HCL/Terraform, CSV, TSV, Markdown) using the `lam` CLI. Use when the user asks to extract a field, filter records, aggregate values, check structure, validate against a schema, or convert between formats. Works on config files, API responses, deployment manifests, data exports, and Markdown documents (parsed as a typed AST). Bounded — no recursion, no `def`, no regex; for those the user should reach for a real programming language. +license: MIT +metadata: + homepage: https://pub.dev/packages/lambe + repository: https://github.com/hakimjonas/lambe +--- + +# Lambé (`lam`) — structured data queries + +Lambé is on the user's PATH after `dart pub global activate lambe`. You +invoke it via shell. The binary is named `lam`. + +## When to reach for `lam` + +The user wants to do something with a **structured data file**: +- "Get the X field from this JSON" +- "Filter the Y where Z" +- "Sum / count / list / sort the items" +- "What's the structure of this file?" +- "Check that the deployment has at least 2 replicas" +- "Convert this YAML to TOML" +- "List all the headings in this README" + +Don't reach for `lam` when: +- The data is binary, in a database, or a stream. +- The user explicitly asked for jq syntax (use jq). +- The query needs recursion, `try`/`catch`, regex, or accumulating state — write code instead. 
+ +## Core moves + +```bash +# Extract — single value or path +lam '.database.host' config.toml + +# Filter + project +lam '.users | filter(.age > 30) | map(.name)' data.json + +# Aggregate +lam '.items | map(.price) | sum' data.json + +# Inspect structure (returns JSON Schema) +lam --print-shape data.json + +# Static query trace (no execution; surfaces shape per stage + warnings) +lam --explain '.config | flatten | as(toml)' data.json + +# CI assertion (exit 0 on true, 1 on false) +lam --assert '.replicas >= 2' deployment.yaml + +# Convert format +lam --to yaml '.config' data.json + +# Run without input (literal-only queries) +lam -n '[1, 2, 2, 3] | unique' + +# Markdown headings (use `text` op, not `.children[0].text`) +lam '.children | filter(.type == "heading") | map(text)' README.md +``` + +## Syntax in 30 seconds + +**Property access**: `.field`, `.users[0]`, `.users[-1]`, `.tags[1:3]`, +`.["x-axis"]` (bracket form for non-identifier keys). + +**Pipeline ops** chained with `|`: +`filter(p)`, `map(e)`, `sort`, `sort_by(k)`, `group_by(k)`, `unique`, +`unique_by(k)`, `flatten`, `reverse`, `length`, `first`, `last`, +`sum`, `avg`, `min`, `max`, `keys`, `values`, `has("k")`, +`to_entries`, `from_entries`, `to_number`, `type`, +`filter_values(p)`, `map_values(e)`, `filter_keys(p)`, `text` (markdown), +`as(fmt)` (cross-format bridge). + +**Expressions**: arithmetic `+ - * / %`, comparison `< > <= >= == !=`, +boolean `&& || !`, null fallback `//`, conditional `if c then a else b`, +object construction `{name, total: .price * .qty}`, string interpolation +`"\(.name) is \(.age)"`, list literal `[1, 2, 3]`. + +**Boolean keywords**: lambé's logic operators are `&&` `||` `!`. Don't +write `and` `or` `not` — the parser will tell you, but save the round trip. + +## Markdown data model + +Markdown parses to a CommonMark AST. Root is `{type: "document", children: [...]}`. +Every node has a `type`. Container nodes have `children`; leaves carry +content directly. 
+ +Common queries: + +```bash +# Heading texts (use `text` op for prose extraction) +lam '.children | filter(.type == "heading") | map(text)' doc.md + +# Headings with levels +lam '.children | filter(.type == "heading") | map({level, title: text})' doc.md + +# Code blocks by language +lam '.children | filter(.type == "code_block") | map({language, code})' doc.md + +# Whole document as plain text +lam '. | text' doc.md +``` + +The `text` op walks any node tree and concatenates prose recursively +(text + code + code_block + image alt). Use it instead of +`.children[0].text` — that pattern only sees the first immediate child +and misses nested emphasis, links, and inline code. + +## Common gotchas + +- **Output is pretty-printed JSON by default.** Pass `--no-pretty` for + compact output, or `-r` for raw top-level strings (no quotes). +- **Lambé's null contract**: navigation returns null (`.missing` is null, + doesn't throw); computation throws (`.missing + 5` errors). Use + `.field == null` to test, or `.field // default` to substitute. +- **Empty-list policy**: `first`/`last` return null on empty; + `min`/`max`/`avg` throw; `sum` returns 0. +- **Heterogeneous lists** widen to `any` in shape inference. Real-world + markdown children, mixed JSON arrays. The shape system is honest about + this; `--print-shape` shows the widening. +- **Output format errors give actionable hints.** If `lam --to toml ...` + rejects the shape, the error names the `as(...)` bridge to apply. +- **`--explain` is your friend** when a query is unexpectedly empty or + errors. It prints the shape at every stage statically, plus warnings + for runtime-rejected ops and provably-empty filters. + +## What lambé deliberately doesn't do + +`..` recursive descent, `def` user functions, `try`/`catch`, regex, +`getpath`/`setpath`, in-place mutation, streaming. If you draft a +query needing any of these, lambé will tell you with an "unknown +pipe op" error or a `_jqIdiomHint`. 
The repo's `doc/non-goals.md` +documents the lambé idiom that replaces each omission. + +## When you hit something this skill doesn't cover + +The repo's `AGENTS.md` has the broader reference (more examples, full +pipeline op list, error pattern table, format auto-detect rules). +`doc/syntax.md` is the language reference. `doc/recipes.md` has +end-to-end examples. The MCP server `lam-mcp` is available for +sandboxed agents that can't shell out. diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 83438ca..e592ee8 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -12,6 +12,8 @@ jobs: steps: - uses: actions/checkout@v6 - uses: dart-lang/setup-dart@v1.7.2 + with: + sdk: 3.12.0 - run: dart pub get - run: cd lambe_test && dart pub get && cd .. - run: dart analyze --fatal-infos @@ -21,6 +23,8 @@ jobs: steps: - uses: actions/checkout@v6 - uses: dart-lang/setup-dart@v1.7.2 + with: + sdk: 3.12.0 - run: dart pub get - run: dart format --set-exit-if-changed . @@ -29,6 +33,19 @@ jobs: steps: - uses: actions/checkout@v6 - uses: dart-lang/setup-dart@v1.7.2 + with: + sdk: 3.12.0 - run: dart pub get - run: dart test - run: cd lambe_test && dart pub get && dart test + + lint-changelog: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v6 + - uses: dart-lang/setup-dart@v1.7.2 + with: + sdk: 3.12.0 + - run: dart pub get + - run: dart compile exe bin/lam.dart -o lam + - run: ./tool/lint_changelog.sh diff --git a/.github/workflows/release.yml b/.github/workflows/release.yml index 7166e8a..175560b 100644 --- a/.github/workflows/release.yml +++ b/.github/workflows/release.yml @@ -29,6 +29,8 @@ jobs: steps: - uses: actions/checkout@v6 - uses: dart-lang/setup-dart@v1.7.2 + with: + sdk: 3.12.0 - run: dart pub get - run: dart run tool/gen_version.dart @@ -56,6 +58,14 @@ jobs: path: artifacts merge-multiple: true + - name: Generate checksums.txt + run: | + cd artifacts + # One SHA256 per line, matching sha256sum / shasum -a 256 format. 
+ # install.sh reads this to verify downloaded binaries. + sha256sum lam-* > checksums.txt + cat checksums.txt + - uses: softprops/action-gh-release@v3 with: files: artifacts/* @@ -88,7 +98,7 @@ jobs: "\$schema": "https://static.modelcontextprotocol.io/schemas/2025-12-11/server.schema.json", "name": "io.github.hakimjonas/lambe", "title": "Lambe", - "description": "Query JSON, YAML, TOML, HCL, CSV, TSV, and Markdown with a composable pipeline syntax.", + "description": "A query language for structured data that shows you what you're working with. Shape-aware --explain, JSON Schema input, format bridges.", "repository": { "url": "https://github.com/hakimjonas/lambe.git", "source": "github" diff --git a/.gitignore b/.gitignore index fcb7d82..f80e81b 100644 --- a/.gitignore +++ b/.gitignore @@ -8,10 +8,21 @@ doc/api/ *.iml .vscode/ +# Claude Code local state (settings, session caches). The +# .claude/skills/ subdirectory is the exception — it ships an Agent +# Skills package to GitHub for AI coding agents working on lambé and +# is tracked deliberately. .pubignore separately excludes the whole +# .claude/ tree from the pub.dev publish payload. +# `.claude/*` (not `.claude/`) so the negation can re-include +# subdirectories — git won't descend into a directory ignored by name. +.claude/* +!.claude/skills/ + .DS_Store Thumbs.db # Compiled binaries +lam lam-mcp # Local dependency overrides @@ -26,3 +37,6 @@ bench-results-*.json # Session handover notes (internal workflow, not code) HANDOVER_*.md + +# Local scratch notes (release planning, status snapshots, etc.) +*.scratch.md diff --git a/.pubignore b/.pubignore index 461fd30..6c3b0e5 100644 --- a/.pubignore +++ b/.pubignore @@ -2,8 +2,43 @@ HANDOVER_*.md bench-results-*.json tool/bench/ doc/api/ -# Stale local AOT build; per-platform binaries are published via CI + +# Stale local AOT builds; per-platform binaries are published via CI # to the GitHub release and consumed by the MCP registry. 
Pub clients -# rebuild from source on `dart pub global activate`, so shipping this -# is dead weight. +# rebuild from source on `dart pub global activate`, so shipping these +# is dead weight (and locks pub.dev consumers to a Linux x86_64 binary +# that's useless on other platforms). +lam lam-mcp + +# Local scratch notes (release planning, status snapshots, etc.). +# Same intent as the matching `.gitignore` rule, repeated here because +# `.pubignore` does not inherit from `.gitignore`. +*.scratch.md + +# Claude Code session state and skill bundles. The .claude/skills/ +# subdirectory ships an Agent Skills package for AI coding agents +# working in a clone of this repo; pub.dev consumers (Dart developers) +# don't load skills, so it would just be noise on the package page. +.claude/ + +# Cross-vendor Agent Skills path (recognised by Gemini CLI and others +# as the interoperable alias for .claude/skills/). Same exclusion logic. +.agents/ + +# Agent-facing instruction docs. Read by Cursor, Copilot, Claude Code, +# Gemini CLI, Devin, and other agents inspecting a cloned repo. +# pub.dev consumers (Dart developers) don't need them. +AGENTS.md + +# Local pubspec overrides. +pubspec_overrides.yaml + +# MCP registry tokens. +.mcpregistry_* + +# MCP registry manifest template — regenerated by .github/workflows/release.yml +# at release time and consumed by the MCP registry publish flow. Pub.dev +# consumers never use it, and the placeholder version in the file would be +# misleading if shipped. +server.json diff --git a/AGENTS.md b/AGENTS.md index 7d40591..675e3d2 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -1,71 +1,233 @@ -# AGENTS.md +# AGENTS.md — using Lambë + +Lambë (`lam`) is a query language for structured data. It extracts, +filters, transforms, validates, and converts JSON, YAML, TOML, HCL, +CSV, TSV, and Markdown — auto-detecting format from file extension. + +This file teaches you (the agent) **when to reach for `lam` and how to +write queries that work**. 
The `lam` binary is on the user's PATH after +`dart pub global activate lambe`; you can invoke it from a shell tool. + +## When to use it + +Reach for `lam` when the user wants to: + +- **Extract** values from a structured file (one field, an array, a nested path). +- **Filter** records by a predicate. +- **Transform** records into a different shape. +- **Aggregate** numbers (sum, avg, min, max, count). +- **Validate** structure or values (`--assert`, `--schema`, `--explain`). +- **Convert** between formats (`--to yaml`, `--to csv`, etc.). +- **Inspect** unfamiliar data (`--print-shape` returns JSON Schema). + +Lambë is a **bounded tree transformer** — every query terminates, no +recursion, no `def`/lambdas. Don't reach for it when the user wants: + +- Binary data, images, databases, streaming. +- jq syntax specifically (use jq). +- SQL queries (use SQL). +- Programmatic processing with loops or accumulating state (write code instead). +- Recursive descent (`..`), `try`/`catch`, regex, `getpath`/`setpath`, + in-place mutation. See [doc/non-goals.md](doc/non-goals.md) for the + full list and the lambë idiom that replaces each omission. If you + hit "unknown pipe op" or a `_jqIdiomHint` message, that page is the + canonical reference. + +## Natural language → `lam` query + +| User says | Query | +|---|---| +| "Get the database host" | `lam '.database.host' config.yaml` | +| "List all user names" | `lam '.users \| map(.name)' data.json` | +| "Filter active users over 30" | `lam '.users \| filter(.active && .age > 30)' data.json` | +| "How many items?" 
| `lam '.items \| length' data.json` | +| "Sort by price descending" | `lam '.items \| sort_by(.price) \| reverse' data.json` | +| "Group by department" | `lam '.users \| group_by(.dept)' data.json` | +| "Total price" | `lam '.items \| map(.price) \| sum' data.json` | +| "Show the structure" | `lam --print-shape data.json` | +| "Check version isn't empty" | `lam --assert '.version != ""' package.json` | +| "Convert to YAML" | `lam --to yaml '.' data.json` | +| "Export as CSV" | `lam --to csv '.users \| map({name, age})' data.json` | +| "Get all unique tags" | `lam '.items \| map(.tags) \| flatten \| unique' data.json` | +| "Get the first 3 items" | `lam '.items[:3]' data.json` | +| "Build a summary object" | `lam '{count: .items \| length, total: .items \| map(.price) \| sum}' data.json` | +| "Find containers without limits" | `lam '.spec.template.spec.containers \| filter(has("resources") == false) \| map(.name)' deployment.yaml` | +| "List Terraform resources" | `lam '.resource \| map(._labels)' main.tf` | +| "Query CSV data" | `lam '. \| filter(.status != "closed") \| map(.title)' issues.csv` | +| "Sum a CSV numeric column" | `lam '. \| map(.price \| to_number) \| sum' orders.csv` | +| "Inspect a value's type" | `lam '.config \| type' data.yaml` | +| "List all headings in this markdown" | `lam '.children \| filter(.type == "heading") \| map(text)' README.md` | +| "What languages are in the code blocks?" 
| `lam '.children \| filter(.type == "code_block") \| map(.language)' tutorial.md` | +| "Run a query without input" | `lam -n '[1, 2, 2, 3] \| unique'` | +| "Explore interactively" | `lam -i data.json` | + +## Syntax reference + +### Property access -## Structured Data Queries +``` +.name field access +.users[0] index +.users[0].name chained +.users[-1] negative index (from end) +.users[1:3] slice +.users[:3] slice from start +.users[-2:] slice from end +.["x-axis"] bracket form for keys with hyphens / spaces / dots +``` -This project uses [Lambë](https://pub.dev/packages/lambe) (`lam`) for querying structured data files. +### Pipeline operations -### CLI +``` +. | filter(.age > 30) keep matching elements +. | map(.name) transform each element +. | sort natural-order sort +. | sort_by(.age) sort by key expression +. | group_by(.type) group into [{key, values}] +. | unique deduplicate +. | unique_by(.id) deduplicate by key +. | flatten flatten one level +. | reverse reverse order +. | length count elements (list / map / string) +. | first first element +. | last last element +. | sum sum numbers +. | avg average +. | min / max minimum / maximum +. | keys map keys or list indices +. | values map values +. | has("field") check field exists (returns bool) +. | to_entries map to [{key, value}] +. | from_entries [{key, value}] to map +. | to_number parse a string as a number (use on CSV numeric columns) +. | type runtime type: null, boolean, number, string, array, object +. | filter_values(. > 5) filter a map's values +. | map_values(. * 2) transform a map's values +. | filter_keys(. != "x") filter a map's keys +. | text markdown-only — concatenate prose from a node tree +. 
| as(yaml) cross-format bridge (also as(toml), as(csv), as(hcl)) +``` -```bash -# Extract values -lam '.database.host' config.toml -lam '.spec.containers[0].image' deployment.yaml +### Expressions -# Filter and transform -lam '.users | filter(.age > 30) | map(.name)' data.json +``` +.price * .qty arithmetic (+, -, *, /, %) +.age > 30 comparison (<, >, <=, >=, ==, !=) +.active && .verified logic (&&, ||, !) +.config // "default" null fallback (// is null-fallback, not error-handler) +if .age > 65 then "senior" else "active" +{name, total: .price * .qty} object construction +"\(.name) is \(.age)" string interpolation +[1, 2, 3] list literal +``` -# Aggregate -lam '.items | map(.price) | sum' data.json +## CLI flags worth knowing -# Schema inspection -lam --schema data.json +``` +-n, --null-input Run without input ("lam -n '[1,2,3] | unique'") +-i, --interactive REPL mode (loads data, then prompts for queries) +-f, --format FMT Override input format detection +--to FMT Output format (json default; yaml, toml, csv, tsv, hcl) +--no-pretty Compact (single-line) output +-r, --raw Output top-level string scalars without quotes + (no effect on structured output) +--print-shape Emit a JSON Schema describing the data's shape +--schema FILE Validate input against a JSON Schema before querying +--explain Static shape trace per pipeline stage (no execution) +--explain-json Same as --explain but emits structured JSON +--explain-trivial Surface trivially-empty / shape-rejected ops as warnings +--assert Exit 0 if the query returns true, exit 1 otherwise +--ndjson Each line of input is a JSON document (line-delimited) +--flatten-cells json For CSV/TSV output: encode non-scalar cells as JSON strings +``` -# CI validation -lam --assert '.replicas >= 2' deployment.yaml +## Markdown data model + +Markdown files are parsed into a CommonMark AST. Every node is a map +with a `type` field. Container nodes have `children`. The root is +`{type: "document", children: [...]}`. 
+ +| Node type | Fields | Notes | +|---|---|---| +| `document` | `children` | root | +| `heading` | `level`, `children` | block | +| `paragraph` | `children` | block | +| `list` | `ordered`, `tight`, `items`, `start?` | block | +| `list_item` | `children` | block | +| `code_block` | `code`, `language?` | leaf-ish | +| `blockquote` | `children` | block | +| `link` | `href`, `children`, `title?` | inline | +| `image` | `src`, `alt`, `title?` | inline | +| `emphasis` | `children` | inline (italic) | +| `strong` | `children` | inline (bold) | +| `text` | `text` | leaf | +| `code` | `code` | leaf (inline code) | +| `thematic_break` | — | horizontal rule | +| `hard_break` | — | explicit line break | +| `soft_break` | — | source line wrap | +| `html_block` | `html` | raw HTML block | +| `html_inline` | `html` | raw inline HTML | + +**Use the `text` pipe op for prose extraction**, not `.children[0].text`. +The `text` op walks any node tree and concatenates text/code/code_block +content + image alt text in document order, recursing into nested +emphasis, strong, links, and inline code. `.children[0].text` only +sees the first immediate child and misses nested formatting. -# Format conversion -lam --to yaml '.config' data.json -lam --to csv '.users | map({name, age})' data.json +```bash +# All heading texts +lam '.children | filter(.type == "heading") | map(text)' README.md -# Query CSV/TSV -lam '. | filter(.status != "closed") | map(.title)' issues.csv +# Headings with their levels +lam '.children | filter(.type == "heading") | map({level, title: text})' README.md -# Query Terraform -lam '.resource | filter(._labels[0] == "aws_instance") | map(._labels[1])' main.tf +# Every code block by language +lam '.children | filter(.type == "code_block") | map({language, code})' tutorial.md -# Query Markdown (AST with typed nodes: heading, paragraph, link, code_block, etc.) 
-lam '.children | filter(.type == "heading") | map(.children[0].text)' README.md -lam '.children | filter(.type == "code_block") | map(.language)' tutorial.md +# Python code blocks only +lam '.children | filter(.type == "code_block" && .language == "python") | map(.code)' tutorial.md -# Interactive REPL -lam -i data.json +# Whole document as plain prose +lam '. | text' README.md ``` -### Supported Formats +## Error patterns -Input: JSON, YAML, TOML, HCL/Terraform, CSV, TSV, Markdown (auto-detected from file extension). -Output: JSON (default), YAML, TOML, CSV, TSV, HCL. +| Behaviour | What's happening | +|---|---| +| Result is `null` | Field doesn't exist; navigation returns null. Lambë's null-propagation contract: navigation is null-safe, computation throws. | +| `QueryError: ... expected number, got null` | Arithmetic / comparison on a missing value. Use `.field == null` to check, or `.field // default` to substitute, or filter upstream. | +| `QueryError: ... rejects map<...>` | Op needs a list but got a map (or vice versa). Use `--explain` to see the shape at each stage. | +| Parse error with caret | Invalid query syntax. Check parentheses, quotes, pipe placement. The error message names the column and offers a "did you mean" hint for typos. | +| `OutputShapeError` | The chosen output format needs a different shape (e.g., TOML needs a map root). Lambë's error message names the bridge `as(...)` to apply. | -### As MCP Tool +## Format auto-detection -The `lambe_query` MCP tool is available for querying structured data. Connect with: - -```bash -lam-mcp # stdio transport -``` +| Extension | Format | +|---|---| +| `.json` | JSON | +| `.yaml`, `.yml` | YAML | +| `.toml` | TOML | +| `.tf`, `.hcl` | HCL | +| `.csv` | CSV | +| `.tsv`, `.tab` | TSV | +| `.md`, `.markdown` | Markdown | -Tools: `lambe_query` (extract/filter/transform), `lambe_schema` (structure inspection), `lambe_assert` (validation). +Stdin sniffs from content. Override with `-f`/`--format`. 
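The extension table above can be sketched as a small shell helper. This is only an illustration of the mapping, not how `lam` implements detection: the function name `lam_format_for` is hypothetical, and the lowercase format labels it prints are assumptions for this sketch (the source only lists the human-readable format names).

```bash
# Hypothetical helper (not part of lam): map a file name to the format
# that the auto-detection table associates with its extension.
lam_format_for() {
  case "${1##*.}" in            # strip everything up to the last dot
    json)        echo json ;;
    yaml|yml)    echo yaml ;;
    toml)        echo toml ;;
    tf|hcl)      echo hcl ;;
    csv)         echo csv ;;
    tsv|tab)     echo tsv ;;
    md|markdown) echo markdown ;;
    *)           return 1 ;;    # unknown extension: leave the choice to the caller
  esac
}

lam_format_for deployment.yml   # prints: yaml
lam_format_for main.tf          # prints: hcl
```

A helper like this is mainly useful for reasoning about which parser a given file name would select; for stdin, where there is no extension, the documented route is still `-f`/`--format`.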
+ -### In Dart Code +## In Dart code ```dart import 'package:lambe/lambe.dart'; final name = query('.users[0].name', data); final active = queryString('.users | filter(.active)', jsonString); +final config = queryString( + '.database.host', tomlString, format: Format.toml, +); ``` -### In Dart Tests +In tests: ```dart import 'package:lambe_test/lambe_test.dart'; @@ -74,8 +236,11 @@ expect(response, lamWhere('.errors | length == 0')); expect(config, lamEquals('.database.port', 5432)); ``` -### Pipeline Operations +## MCP server (sandboxed agents) -filter, map, sort, sort_by, group_by, unique, unique_by, flatten, reverse, -keys, values, length, first, last, sum, avg, min, max, has, to_entries, -from_entries, to_number, type, filter_values, map_values, filter_keys. +`lam-mcp` exposes the same query surface via the Model Context Protocol +for agents that can't shell out. Tools: `lambe_query`, +`lambe_print_shape`, `lambe_check`, `lambe_explain`, `lambe_assert`. Most +agents should prefer running `lam` from the shell directly: it's +cheaper per turn and offers the same capabilities. Reach for the MCP server +when shell access isn't available. diff --git a/AI.md b/AI.md deleted file mode 100644 index 85b46d5..0000000 --- a/AI.md +++ /dev/null @@ -1,176 +0,0 @@ -# Lambë AI Reference - -This document helps AI assistants decide when and how to use Lambë. 
- -## When to Use - -Use Lambë when the user needs to **extract, filter, transform, validate, or convert** data from structured files: -- JSON, YAML, TOML, HCL/Terraform, CSV, TSV, Markdown -- Configuration files, API responses, deployment manifests, data exports - -## When NOT to Use - -- Binary data, images, databases, streaming data -- If the user specifically requests jq syntax, use jq -- For SQL databases, use SQL -- For programmatic data processing (loops, variables), write code instead - -## Natural Language to Lambë - -| User says | Lambë query | -|-----------|-------------| -| "Get the database host" | `lam '.database.host' config.yaml` | -| "List all user names" | `lam '.users \| map(.name)' data.json` | -| "Filter active users over 30" | `lam '.users \| filter(.active && .age > 30)' data.json` | -| "How many items?" | `lam '.items \| length' data.json` | -| "Sort by price descending" | `lam '.items \| sort_by(.price) \| reverse' data.json` | -| "Group by department" | `lam '.users \| group_by(.dept)' data.json` | -| "Total price" | `lam '.items \| map(.price) \| sum' data.json` | -| "Show the structure" | `lam --schema data.json` | -| "Check version isn't empty" | `lam --assert '.version != ""' package.json` | -| "Convert to YAML" | `lam --to yaml '.' data.json` | -| "Export as CSV" | `lam --to csv '.users \| map({name, age})' data.json` | -| "Get all unique tags" | `lam '.items \| map(.tags) \| flatten \| unique' data.json` | -| "Get the first 3 items" | `lam '.items[:3]' data.json` | -| "Build a summary object" | `lam '{count: .items \| length, total: .items \| map(.price) \| sum}' data.json` | -| "Find containers without limits" | `lam '.spec.template.spec.containers \| filter(has("resources") == false) \| map(.name)' deployment.yaml` | -| "List Terraform resources" | `lam '.resource \| map(._labels)' main.tf` | -| "Query CSV data" | `lam '. \| filter(.status != "closed") \| map(.title)' issues.csv` | -| "Sum a CSV numeric column" | `lam '. 
\| map(.price \| to_number) \| sum' orders.csv` | -| "Inspect a value's type" | `lam '.config \| type' data.yaml` | -| "List all headings in this markdown" | `lam '.children \| filter(.type == "heading") \| map(.children[0].text)' README.md` | -| "What languages are in the code blocks?" | `lam '.children \| filter(.type == "code_block") \| map(.language)' tutorial.md` | -| "Explore interactively" | `lam -i data.json` | - -## Syntax Quick Reference - -### Property Access -``` -.name field access -.users[0] index -.users[0].name chained -.users[-1] negative index (from end) -.users[1:3] slice -.users[:3] slice from start -.users[-2:] slice from end -``` - -### Pipeline Operations -``` -. | filter(.age > 30) keep matching elements -. | map(.name) transform each element -. | sort_by(.age) sort by key -. | group_by(.type) group into [{key, values}] -. | unique_by(.id) deduplicate by key -. | flatten flatten one level -. | reverse reverse order -. | length count elements -. | first first element -. | last last element -. | sum sum numbers -. | avg average -. | min / max minimum / maximum -. | keys map keys or list indices -. | values map values -. | has("field") check field exists -. | to_entries map to [{key, value}] -. | from_entries [{key, value}] to map -. | to_number parse a string as a number (for CSV numeric columns) -. | type runtime type as string: null, boolean, number, string, array, object -. | filter_values(. > 5) filter map values -. | map_values(. * 2) transform map values -. | filter_keys(. != "x") filter map keys -``` - -### Expressions -``` -.price * .qty arithmetic (+, -, *, /, %) -.age > 30 comparison (<, >, <=, >=, ==, !=) -.active && .verified logic (&&, ||, !) -if .age > 65 then "senior" else "active" -{name, total: .price * .qty} object construction -"\(.name) is \(.age)" string interpolation -``` - -## Markdown Data Model - -Markdown files are parsed into a CommonMark AST. Every node is a map with a `type` field. Container nodes have `children`. 
The root is `{type: "document", children: [...]}`. - -### Node types and their fields - -| Node type | Fields | Example query | -|-----------|--------|---------------| -| `document` | children | `.children` | -| `heading` | level, children | `.children \| filter(.type == "heading" && .level == 1)` | -| `paragraph` | children | `.children \| filter(.type == "paragraph")` | -| `list` | ordered, tight, items, start? | `.children \| filter(.type == "list" && .ordered)` | -| `list_item` | children | `.children[0].items \| map(.children)` | -| `code_block` | code, language? | `.children \| filter(.type == "code_block") \| map({language, code})` | -| `blockquote` | children | `.children \| filter(.type == "blockquote")` | -| `link` | href, children, title? | inline node inside paragraph/heading children | -| `image` | src, alt, title? | inline node inside paragraph children | -| `emphasis` | children | inline node (italic) | -| `strong` | children | inline node (bold) | -| `text` | text | leaf inline node | -| `code` | code | inline code span | -| `thematic_break` | — | horizontal rule | -| `hard_break` | — | line break | -| `soft_break` | — | line break | -| `html_block` | html | raw HTML block | -| `html_inline` | html | raw inline HTML | - -Inline nodes (text, emphasis, strong, code, link, image, etc.) appear inside the `children` of block nodes like heading and paragraph. 
- -### Common markdown query patterns - -```bash -# All heading texts -lam '.children | filter(.type == "heading") | map(.children[0].text)' README.md - -# Headings with levels -lam '.children | filter(.type == "heading") | map({level, text: .children[0].text})' README.md - -# Code block languages -lam '.children | filter(.type == "code_block") | map(.language)' tutorial.md - -# Code block contents by language -lam '.children | filter(.type == "code_block" && .language == "python") | map(.code)' tutorial.md - -# Count headings by level -lam '.children | filter(.type == "heading") | group_by(.level) | map({level: .values[0].level, count: .values | length})' README.md - -# Extract plain text from paragraphs -lam '.children | filter(.type == "paragraph") | map(.children | filter(.type == "text") | map(.text))' doc.md -``` - -## Error Patterns - -- **Result is `null`**: the field doesn't exist (navigation returns null) -- **`QueryError` thrown**: type mismatch (e.g., arithmetic on null, filtering a non-list) -- **Parse error**: invalid query syntax (check parentheses, quotes, pipe placement) - -## Format Detection - -Lambë auto-detects format from file extension: -- `.json` → JSON -- `.yaml`, `.yml` → YAML -- `.toml` → TOML -- `.tf`, `.hcl` → HCL -- `.csv` → CSV -- `.tsv`, `.tab` → TSV -- `.md`, `.markdown` → Markdown - -Use `--format` / `-f` to override. - -## Installation - -```bash -# CLI tool -dart pub global activate lambe - -# Dart dependency -dart pub add lambe - -# MCP server (after global activate) -lam-mcp -``` diff --git a/CHANGELOG.md b/CHANGELOG.md index 091e480..441611a 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,3 +1,478 @@ +## 0.9.0 + +Closes the shape feedback loop. Declare a JSON Schema, check queries +against it, round-trip schemas with the ecosystem. 
Plus: richer +static analysis in `--explain`, line-delimited JSON input, an opt-in +CSV escape hatch for nested cells, an architectural pipe-op +consolidation, and a `rumil_tokens`-based REPL highlighter. + +### Pipe-op AST consolidation + +- The 27 per-op AST classes (`FilterOp`, `MapOp`, `SortOp`, …) + collapse into a single `BuiltinPipeOp(name, args)`. The spec table + in `pipe_ops.dart` is now the only place per-op behaviour lives: + acceptance, shape inference, runtime evaluation, and parse arity + all live on the same record. Adding or renaming a pipe op is a + one-file change. +- `As(target)` keeps a dedicated AST class for its typed + `OutputFormat` argument — it's the only custom-arity op. +- `pipeOpInfoFor(LamExpr)` recognises both `BuiltinPipeOp` and `As`. +- Source-breaking for external code that constructed pipe-op AST + nodes directly. The pre-1.0 contract here was that AST classes + were internals; we're taking that out properly. Tests that + assembled `MapOp(.x)` etc. now write `BuiltinPipeOp('map', [.x])`. + +### REPL syntax highlighter on `rumil_tokens` + +- `lib/src/readline.dart`'s 100-line hand-rolled tokenizer is gone. + The highlighter now consumes a `Token` stream from the + `rumil_tokens` `LangGrammar` defined in + `lib/src/highlight_grammar.dart`. The grammar lives in lambé (not + in `rumil_tokens`' built-in five) because it's lambé-specific. +- New runtime dependency: `rumil_tokens ^0.1.0`. +- Pipe op names (`filter`, `map`, `text`, etc.) now colour as + keywords (magenta) — they're routed through `LangGrammar.types` + and sourced from `pipe_ops.dart`'s spec table, so adding a new op + picks up colouring automatically. +- Highlighting is re-rendered on every keystroke. Earlier sessions + used a fast-path that wrote each typed character verbatim without + re-tokenising, so keywords stayed plain until a later edit + triggered a full redraw. 
With `rumil_tokens` actually being fast, + the fast-path was a UX bug; now `filter` colours on the final `r`, + not after the next backspace. +- Visible behavioural change in the REPL: `.field` colours as two + tokens (`.` punctuation + `field` identifier) rather than one + cyan run; negative literals colour as `-` operator + number + rather than one yellow run. The audit determined the new + behaviour is more principled; the visual effect is subtle. + +### REPL Tab completion: bare pipe ops inside parameterised ops + +- `map(t` now offers `text`, `to_entries`, `type`, etc. instead + of nothing useful. Bare pipe-op names like `text`, `length`, + `to_entries` are legal expressions in lambé (sugar for `. | op`), + so the completer should offer them inside `map(...)` / + `filter(...)` when the user is typing a partial name without a + leading `.`. Candidates are filtered by the element shape of the + surrounding pipe input — same shape-gated rule that already + governed post-pipe completion. The new `text` op makes + `map(text)` a useful and discoverable pattern; this change ensures + the completer can help users find it. + +### REPL Tab completion: heterogeneous lists via data sampling + +- `.children | map(.` on a heterogeneous list (e.g. a real + markdown document where `children` mixes headings / paragraphs / + code blocks) now offers the actual fields of the first list + element, instead of an empty candidate list. The static shape + system correctly widens such lists to `SList`, which gives + completion no hints; the completer falls back to navigating the + actual data values and shape-of-ing the first element to recover + a useful shape. Sampling threads through pipe ops that preserve + the element family (`filter`, `sort`, `sort_by`, `unique`, + `unique_by`, `reverse`) so + `.children | filter(.type == "heading") | map(.` works too. + Completion never runs the user's query — only structural + navigation — so cost stays bounded. 
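
The data-sampling fallback described for heterogeneous lists can be pictured with a minimal Python sketch. It is an illustration only, not lambé's Dart implementation: the function name `candidate_fields`, the list-of-keys path representation, and the sorted candidate order are all assumptions made for this example. Element-preserving ops such as `filter` are simply not represented, which mirrors the note that completion navigates structure and never runs the query.

```python
from typing import Any


def candidate_fields(data: Any, field_path: list[str]) -> list[str]:
    """Hypothetical helper: suggest fields by sampling the first list element.

    Walks plain map keys only; nothing from the user's query is evaluated.
    """
    node = data
    for key in field_path:
        if not isinstance(node, dict) or key not in node:
            return []  # path does not resolve, so there is nothing to offer
        node = node[key]
    # Sample the first element of the list and report its keys.
    if isinstance(node, list) and node and isinstance(node[0], dict):
        return sorted(node[0].keys())  # sorted only to make the output stable
    return []


# A tiny markdown-like document whose children mix node types.
doc = {
    "type": "document",
    "children": [
        {"type": "heading", "level": 1,
         "children": [{"type": "text", "text": "Title"}]},
        {"type": "paragraph",
         "children": [{"type": "text", "text": "Body"}]},
    ],
}

print(candidate_fields(doc, ["children"]))
```

Because only the first element is sampled, fields that appear solely on later elements would not be offered in this sketch.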
+ +### Markdown text extraction + +- **`text` pipe op.** Walks a markdown node (or list of nodes) and + concatenates every prose-bearing leaf — `text`, `code`, `code_block`, + and `image.alt` — in document order. Container nodes recurse through + their `children`. `html_block` and `html_inline` are skipped (avoids + the `Node.textContent` trap of dragging raw HTML, scripts, and styles + into "give me the text"). `soft_break` (a paragraph wrap in source) + contributes a single space, preserving word boundaries across line + wraps; `hard_break` (`\` at end of line, or two trailing spaces, an + explicit author-intended break) contributes a literal `'\n'`. This + diverges from `mdast-util-to-string`'s empty-on-break default — + trades strict precedent for the typical case of "produce readable + prose". Users who want a fully flat string can post-process with a + whitespace collapser. The previous recommendation, + `.children[0].text`, is structurally wrong for non-trivial markdown + (nested emphasis, inline code, links) and the existing pipe surface + cannot fix that without recursion. +- **First op tuned to a specific input format's vocabulary.** This is + the only pipe op whose `eval` switches on a value's `type` field. + The behaviour is bounded to markdown's node-type vocabulary as + defined in `lib/src/input.dart`'s `_nodeToNative`. It does NOT + authorise content-level dispatch in any other op — the spec entry + carries a load-bearing comment to that effect. Prior art (XPath + `string(node)`, mdast-util-to-string) converges on the same shape: + format-aware leaf primitive with hardcoded knowledge of which fields + carry prose vs metadata. jq's `..` approach drags in `link.href`, + `image.src`, and `code_block.language` — exactly the trap this op + avoids. + +### `queryNdjsonString` convenience + +- New `queryNdjsonString(Iterable lines, String expression)` + parses the expression once and delegates to `queryNdjson`. 
Resolves + the asymmetry where the existing `queryNdjson` took a pre-parsed + AST while every other `query*` took a string. + +### Performance + +- `_normalize` short-circuits canonical inputs. + `Map` / `List` / scalars round-trip + through the public API without allocating a copy. Non-canonical + inputs (e.g. `Map` from some YAML decoders) + still rebuild as before. +- End-to-end CLI is roughly **3.3× faster** than 0.8.0 on + parse-bound workloads. Measured on a 50k-element JSON document + (1.5 MB), AOT, on a Linux x86_64 workstation with the bench + harness in `tool/bench/cli_bench.sh`: + - `lam --print-shape big.json`: 2.4 s → 732 ms (3.28×). + - `lam '.items | filter(.value > 50000) | length' big.json`: + 2.5 s → 742 ms (3.37×). + Most of the win is inherited: rumil 0.7's FIRST-set Or dispatch, + the `firstCharChoice` combinator, and the Pratt migration carried + the bulk; rumil_parsers 0.8.0's JSON AST split, capture-based + number/string parsing, HCL AST split, and `common.dart` capture + rewrites carried the rest. Non-parse-bound paths benefit too — + `group_by` on 1k records is ~15% faster (39 ms → 33 ms) because + the JSON AST split removes a per-number `truncateToDouble` check + in `jsonToNative`. + See `tool/bench/cli_bench.sh` for the harness and reproduction. + +### Documentation precision + +- Six per-op behavioural details now have load-bearing docstrings: + `//` is a null-fallback (not an error-handler), the empty-list + policy (`first`/`last` return null; `min`/`max`/`avg` throw; `sum` + returns 0), `unique` distinguishes int from double by canonical + encoding, duplicate keys in `{a: x, a: y}` follow Dart map literal + semantics (last wins), `from_entries` rejects non-map / non-string- + key entries explicitly (was silent skip), `type` rejects non-JSON + runtime values with a hint pointing at `parseInput` / `jsonDecode`. 
+- The `from_entries` change is the only behavioural one — non-map + entries used to be dropped silently, now they throw `QueryError`. + Hides a class of bugs where upstream pipelines emit the wrong + shape. +- **`as(fmt)` bridges reference** in `doc/recipes.md`. Documents the + four canonical bridges with runnable examples: `list | + as(toml/hcl)` wraps as `{items: ...}`; `scalar | as(toml/hcl)` + wraps as `{value: ...}`; `map | as(csv/tsv)` derives via + `to_entries`; `scalar | as(csv/tsv)` composes both. +- **`As` class doc** softened to be honest about which error paths + users will and won't hit. The "ambiguous bridge" runtime branch is + defensive against future curation errors but unreachable with the + current curated table — the doc no longer claims otherwise. A new + invariant test in `shape_synthesize_test` pins `≤ 1 bridge per + (shape, format)` so the path becomes reachable only by a + deliberate change. +- **`syntax.md` examples** revert from `echo … | lam '. | op'` to + the cleaner `lam -n '… | op'` form now that `-n` exists. Several + pre-A6 examples were also silently broken: lambé object + construction uses bare identifiers (`{a: 1}`), not JSON-string + keys (`{"a": 1}`), so `[{"key": "a"}] | from_entries` was never + runnable. Fixed. +- **`-r` / `--raw` semantics** — man page entry now states the option + only affects top-level string scalars and is a silent no-op on + structured output (objects, arrays, numbers, booleans, null). The + previous wording ("Output strings without quotes") read as a + pretty-print toggle and surprised users on non-string values. +- **`doc/non-goals.md`** — new page enumerating the features lambé + deliberately omits, with the lambé idiom that replaces each one. + Cross-linked from `README.md` ("What lambé is not"), + `jq-to-lambe.md`, and `AGENTS.md`. 
Covers Turing-completeness, + recursive descent (`..`), `try`/`catch`, `select` outside `filter`, + `paths`/`leaf_paths`/`getpath`/`setpath`, regex, `range`/`limit`/ + `nth`, `.[]` iteration, `def`/lambdas, `@base64`/`@uri`, + streaming, `env`/`$__loc__`, HCL evaluation, and XML. Staying + bounded is a feature; the page makes that legible. +- **`text` op precedent** — the new `text` pipe op (see Markdown text + extraction) is the only op tuned to a specific input format's + vocabulary. The spec entry carries a load-bearing dartdoc comment + declaring this is bounded to markdown's node-type vocabulary as + defined in `_nodeToNative`, and does NOT authorise content-level + dispatch in any other op. + +### Bug fixes + +- **TSV input now honors header rows the same way CSV does.** Pre-0.9.0 + every TSV file returned `List>` because the parser + passed a static `defaultTsvConfig` and skipped dialect detection. + Now `parseInput` runs `detectDialect` for TSV with the tab + delimiter forced, so files where the first row looks like headers + return `List>`. `--print-shape data.tsv` and + `--print-shape data.csv` agree on logical content. +- **String single-char indexing.** `.name[0]` now returns a + one-character substring instead of erroring with `Cannot index + string`. Slicing (`.name[0:3]`) already worked; the asymmetry is + gone. Out-of-range returns `null` (mirrors list indexing); + non-int still throws. +- **`--explain` writability section is suppressed when a + runtime-rejection warning fires.** When a pipe op's input shape is + provably incompatible the post-stage shape widens to `SAny`, which + used to make every output format pass `canWriteAs` — so the + explain report listed every format for a pipeline that would throw + before any writer ran. Both `Writable as:` and `Not writable as:` + are now suppressed; the text renderer prints a one-line note in + their place, and the JSON renderer sets both keys to `null`. 
+- **Heterogeneous list rendering hint.** `shapeOf([1, "two", true])` + collapses the element type to `SAny`. The rendered JSON Schema now + carries a `description: "sampled, may be heterogeneous"` so + `--print-shape` users see that the schema reflects sampling, not + a guarantee. The hint round-trips through `parseJsonSchema` + (unknown keywords are ignored per JSON Schema's extensibility + convention). +- **Empty piped stdin.** Empty stdin in evaluation mode now surfaces + the standard "no input" error rather than a confusing JSON parse + error on the empty string. +- **HCL block access is now uniform across N=1 and N≥2 cases.** + Previously, querying `.variable` returned a single map for one + `variable` block but a list for two or more — forcing defensive + shape checks in queries. Now `.variable` is always a list, regardless + of count. Common Terraform patterns (one `terraform`, one `provider`, + single `variable`) no longer require N=1-vs-N≥2 branching. Fixed + upstream in `rumil_parsers 0.8.0` (decoder uses the `HclBlock` + discriminator already present in the AST instead of inferring shape + from key collisions); lambé adopts it via a constraint bump from + `^0.7.0` to `^0.8.0`. + +### Dependencies + +- **`rumil_parsers ^0.8.0`.** The JSON parser AST splits `JsonNumber` + into a sealed `JsonInt | JsonDouble` sum. Lambé propagates the + change through one schema-parser switch case — `JsonInt() || + JsonDouble() => 'number'` in `lib/src/schema/parser.dart`. No + user-visible behavior change at the lambé surface; downstream + consumers of lambé's library API see no shape difference because + `parseInput`-flavored Map/List types remain canonical Dart types + (the AST split is only visible when you reach into the JSON AST + directly via the lambé schema layer). The HCL fix described above + also rides this dependency bump (originally scoped as + `rumil_parsers 0.7.1`; rolled into 0.8.0 alongside the AST split). 
+ See `rumil_parsers/BENCHMARKS.md` for the JSON parser perf wins on + the 0.8.0 release; lambé queries operating on JSON inputs benefit + transparently. +- **Object construction accepts JSON-string keys.** `{name: .x}` was + the only spelling; `{"name": .x}` errored with a confusing + "unexpected" message. Now both spellings produce the same map. Keys + that are valid identifiers should still use the bare form (`name:`); + keys that aren't (hyphenated, spaces, leading digits) use a + JSON-string literal in key position — `{"x-axis": .a}`, + `{"Content-Type": "application/json"}`, `{"my key": 1}`. Lambé's + data model accepts any string as a key; the construction grammar + now matches. Interpolation (`{"\(expr)": .y}`) is rejected with a + clear message — key position is structurally not an expression + position; build dynamic keys via `from_entries` on a list of + `{key, value}` maps. Shorthand `{name}` continues to require a bare + identifier (`{"name"}` alone is intentionally not supported). + +### jq compatibility + +- **`add` is now recognized as an alias for `sum`.** A jq idiom that + matches Lambé's `sum` exactly. `_jqAliases` in `parser.dart` is the + table; entries belong there only when the jq semantics are an + exact match. +- **Idiom hints for column-1 jq keywords.** `_jqIdiomHint` and + `_jqPipeOpHint` now recognise `try` / `try ... catch`, `recurse`, + `walk`, `paths`, `leaf_paths`, `range`, `limit`, `nth`, `@csv`, + `@tsv`, and `@base64`. Each produces a one-liner pointing at the + lambé equivalent (or, for `@base64`, the explicit "not supported" + signal) instead of the giant op-vocabulary dump. Folds into the + pre-existing hints for `[]`, `?`, `..`, `select`, `empty`, and + stranded `end`. + +### Schemas as a first-class contract + +- **`--schema `** on the CLI. Threads a JSON Schema subset + through both `--explain` inference and normal evaluation. 
With + data, the schema validates at load time (structural disagreement + exits 1 with a JSON path). Without data, the schema alone seeds + shape inference for design-time planning. +- **Sibling auto-detect.** Data at `path/to/data.json` picks up + `path/to/data.schema.json` implicitly. Same convention as ndjson + auto-detect. +- **`--print-shape`** on the CLI. Emits `shapeOf(data)` as a JSON + Schema subset document, round-trippable with `--schema` input. The + same shape-to-JSON-Schema rendering powers + `renderJsonSchema(shape)` on the library and the MCP + `lambe_print_shape` tool. +- **`--print-shape EXPR` composes with the query.** When given an + expression, `lam --print-shape '.users' data.json` now returns the + schema of the result of evaluating `.users` rather than the schema + of the whole document. Pre-0.9.0 the expression was silently + ignored. Without data, falls back to inferring from `SAny` — + matches the `--explain`-without-data flow. +- **REPL: `:schema [path]` and `:print-shape`.** `:schema ` + loads a schema for the session and reports agreement/disagreement + vs current data. `:schema` (no arg) prints the active schema. + `:load` re-validates against an active schema and warns on + disagreement. +- **MCP: `lambe_print_shape`, `lambe_check`, `lambe_explain`, plus + a `schema` parameter on `lambe_query`.** Agents can print a + shape, validate fixtures against a schema, trace a query + structurally before running, or gate a query on schema + conformance. `lambe_check` returns `{"ok": true}` / + `{"ok": false, "error": "..."}`. +- **Library surface.** `parseJsonSchema`, `renderJsonSchema`, + `loadSchemaFromFile`, `loadSchemaForData`, `mergeSchemaWithData` + are all exported from `package:lambe/lambe.dart`. + +### `SOptional` in the shape ADT + +- New sealed variant `SOptional(Shape)`. 
Represents + statically-known optionality — populated by JSON Schema's + `required` semantics, propagated through field access and op + inference, and surfaced by the explain trace. Nested optionality + collapses at construction: `SOptional(SOptional(x))` is always + `SOptional(x)`. +- Acceptance predicates unwrap `SOptional` for op inputs — `filter` + on `SOptional>` is accepted, with the potential absence + surfaced by a runtime-rejection warning rather than a silent + accept or a false reject. +- Root-level requirements (TOML/HCL `MustBeMap`) do NOT unwrap: an + absent root can't be serialized, so users must materialize a + default first. This asymmetry is deliberate. +- `shapeToJson` emits `{"kind": "optional", "inner": ...}`. + `renderJsonSchema` flattens `SOptional` inside `SMap` fields into + missing `required` entries (standard JSON Schema idiom); + non-field-position `SOptional` has no standard spelling in our + subset and is flattened with a docstring caveat. + +### Richer `--explain` output + +Three new categories of static analysis, plus a structured output +mode: +- **Runtime-rejection warnings** (always on). Flags pipe ops whose + input shape is provably incompatible. `.config | filter(.x)` on a + known map produces `"filter rejects map<...>; this will throw at + runtime"`. Uses the existing pipe-op acceptance predicates. +- **Trivial-result warnings** (opt-in via `--explain-trivial`). + Flags `sort_by`, `group_by`, `map`, and `unique_by` whose + argument references a field provably absent on the element shape. + Opt-in because legitimate uses exist (stable no-op sort, explicit + null projection). +- **Structured JSON output** (`--explain-json`). Emits the full + explain report as JSON with snake_case keys (`stages`, + `warnings`, `writable_as`, `not_writable_as`, `flatten_cells`). + Warning kinds serialize as `empty_filter`, `runtime_rejection`, + `trivial_result`. 
Shapes serialize as nested `{kind, ...}` trees + (via `shapeToJson`) so agents can pattern-match shape structure + without re-parsing. Also surfaces in the new `lambe_explain` MCP + tool. +- Both `--explain-trivial` and `--explain-json` imply `--explain`. +- New `shapeToJson(Shape)`, `renderExplainJson(ExplainReport)`, + `WarningKind` enum, and `ExplainWarning.kind` field on the + library. + +### `--ndjson` mode for line-delimited JSON input + +- Each line is parsed as an independent JSON document; the query is + evaluated per line with no shared state; one compact JSON result + per line. Auto-enabled when the file extension is `.ndjson` or + `.jsonl`. Stdin support streams: `tail -f app.log | lam --ndjson + '.level'` emits each result as the line arrives. +- Fail-fast on the first malformed or unevaluable line; error + carries the line number. +- New `queryNdjson(Iterable, LamExpr)` library function + (`Iterable`, lazy). +- Cannot combine with `--interactive`, `--schema`, `--assert`, or + `--explain`; output is restricted to JSON (`--to` other than + `json` is refused). + +### Null input + +- **`-n` / `--null-input` flag.** Run a query against `null` context + with no input file. Useful for value computations: + `lam -n '[1,2,3] | unique'`. Without `-n`, the missing-input guard + fires (typo'd filename or missing redirect is a common footgun); + the flag puts the "I have no input" intent on the command line + where it's visible in scripts and code review. The `--null-input` + spelling matches jq exactly. +- Cannot combine with `--interactive`, `--ndjson`, `--schema`, or + `--assert`. The TTY stdin guard is unchanged. + +### `--flatten-cells` for CSV/TSV + +- Opt-in escape hatch: non-scalar cells encoded as JSON strings + inline. Accepts `refuse` (default, 0.8.0 behavior) or `json`. + Under `json`, the shape check widens `MustBeFlatList` to + `MustBeList` for csv/tsv. 
Round-tripping the resulting CSV back + into Lambe does NOT recover structure; this is an output-side + escape hatch, not a faithful encoding. +- Surfaced at the CLI (`--flatten-cells`), REPL + (`:flatten-cells`), MCP (`flatten_cells` parameter), and as + `CellPolicy flattenCells` on `formatOutput`, `canWriteAs`, + `canWriteShapeAs`, `requirementFor`, and `explain`. + +### Cross-surface hints + +- **`NotWritable.hints`.** When a shape mismatch has an + environmental resolution (a flag, a setting, a tool parameter), + the report carries a structured `Hint` type with `label`, + `cliFlag`, `replCommand`, `mcpParameter`, and `explanation`. CLI, + REPL, and MCP each render the form that applies to them. + Agent-facing JSON carries `parameter`/`value` pairs, not CLI + syntax. +- The first shipping hint covers `--flatten-cells json`: when a + CSV/TSV request rejects under `refuse` but a list root is + already present. + +### Breaking changes + +- **`--schema` flag renamed to `--print-shape`.** 0.8.0's `--schema` + printed a type-name JSON summary of the data. That function moved + to `--print-shape`. The new `--schema` takes a JSON Schema file + path. Users scripting `lam --schema data.json` must change to + `lam --print-shape data.json`. ArgParser rejects the old form + because `--schema` now requires a value. +- **`--print-shape` output format changed.** Emits a JSON Schema + subset document (`{"type": "object", "properties": ..., "required": + ...}`) instead of the type-name-string JSON format 0.8.0 emitted + (`{"age": "number"}`). The new output round-trips with + `--schema` input; the old format had no round-trip path. +- **MCP tool `lambe_schema` renamed to `lambe_print_shape`.** Output + format also changed to JSON Schema, matching the CLI. Agents that + hardcoded the old tool name get "tool not found" and a message + pointing at `lambe_print_shape`. 
+- **`Shape` ADT gained `SOptional` variant.** Source-breaking for + external code that pattern-matches `Shape` without a default case + (probably just Lambe itself). Exhaustive switches now need a + fifth branch. +- **`ExplainWarning` constructor gained required `kind` parameter.** + External code constructing warnings directly must add a + `WarningKind`. Uncommon; the existing pattern is consuming + warnings, not producing them. + +### Deprecated + +- **`inferSchema(Object? value)`** library function. Emits + type-name-string JSON (no round-trip). Use + `renderJsonSchema(shapeOf(value))` for JSON Schema output, or + `shapeOf(value)` for the `Shape` ADT. Scheduled for removal in + 1.0. + +### Install ergonomics + +- **`install.sh`** — one-line installer at the repo root. + `curl -fsSL https://raw.githubusercontent.com/hakimjonas/lambe/main/install.sh | sh` + downloads the latest `lam` and `lam-mcp` binaries for the current + platform (Linux x64/arm64, macOS x64/arm64), verifies SHA256 + against a published `checksums.txt`, and installs to + `~/.local/bin/`. No sudo, no shell rc edits. Respects + `LAMBE_VERSION` and `LAMBE_PREFIX` env vars. +- **Release workflow generates `checksums.txt`.** `.github/workflows/release.yml` + now publishes a combined SHA256 manifest for every release + artifact as an asset. `install.sh` relies on this for integrity + checking; downstream package managers (a future Homebrew tap, + apt/rpm) can reuse it. + +### Tooling + +- **CHANGELOG self-validation.** `tool/lint_changelog.sh` uses lambé + itself (via `--assert`) to validate this file's structural + invariants on every CI run: at least one H2 release entry, no + duplicate H2s, the first heading is H2, and the latest H2 matches + `pubspec.yaml`'s version. The toolchain checks itself: rumil's + Markdown parser handles the input, lambé's query model expresses + the invariants. See `doc/recipes.md#querying-a-changelog` for the + underlying queries. 
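
The walk performed by the `text` op, as described under "Markdown text extraction" above, can be sketched in a few lines of Python. This is a hedged illustration of the stated rules, not the Dart code in `pipe_ops.dart`. Several details are assumptions made for the example: that inline `code` nodes store their content in a `code` field like `code_block` does, that an `emphasis` container type exists, that `html_inline` carries an `html` field, and that a list of nodes is joined without a separator.

```python
from typing import Any

# Node types the changelog says are skipped entirely.
SKIPPED = {"html_block", "html_inline"}


def extract_text(node: Any) -> str:
    """Concatenate prose-bearing leaves in document order (illustrative sketch)."""
    if isinstance(node, list):
        # Assumption: a list of nodes is joined with no separator.
        return "".join(extract_text(child) for child in node)
    if not isinstance(node, dict):
        return ""
    kind = node.get("type")
    if kind in SKIPPED:
        return ""
    if kind == "soft_break":
        return " "   # a source line wrap keeps the word boundary
    if kind == "hard_break":
        return "\n"  # an explicit author-intended break
    if kind == "text":
        return node.get("text", "")
    if kind in ("code", "code_block"):
        return node.get("code", "")  # field name for inline code is assumed
    if kind == "image":
        return node.get("alt", "")
    # Any other node is treated as a container and recursed into.
    return extract_text(node.get("children", []))


heading = {
    "type": "heading",
    "level": 2,
    "children": [
        {"type": "text", "text": "Use "},
        {"type": "code", "code": "lam"},
        {"type": "soft_break"},
        {"type": "emphasis", "children": [{"type": "text", "text": "today"}]},
        {"type": "html_inline", "html": "<br>"},
    ],
}

print(extract_text(heading))  # prints: Use lam today
```

The nested `emphasis` node is the case that `.children[0].text` cannot reach, which is why the walk has to recurse.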
+ ## 0.8.0 Adds element-level shape checking for CSV/TSV output, union headers diff --git a/DESIGN.md b/DESIGN.md index c9ae4d9..344d2c3 100644 --- a/DESIGN.md +++ b/DESIGN.md @@ -66,7 +66,7 @@ Absence is data (Maybe/Option semantics). Type mismatch is an error. |---------|---------|-----| | **CLI binary** | Platform engineers, DevOps | `dart compile exe` -> standalone `lam` binary | | **Dart library** | Flutter/Dart developers | `import 'package:lambe/lambe.dart'` | -| **MCP tool** | AI agents, LLM frameworks | `lambe_query`, `lambe_schema`, `lambe_assert` | +| **MCP tool** | AI agents, LLM frameworks | `lambe_query`, `lambe_print_shape`, `lambe_check`, `lambe_explain`, `lambe_assert` | --- diff --git a/README.md b/README.md index 0750561..a0a3fe6 100644 --- a/README.md +++ b/README.md @@ -1,8 +1,10 @@ # Lambë -*Query structured data, get errors with suggested fixes, and reshape results to the format you need.* +*A query language for structured data that shows you what you're working with.* -Lambë is a query language for JSON, YAML, TOML, HCL, CSV, TSV, and Markdown. Queries compose through a pipe operator, the same way a shell pipeline does. What's different: when a query produces a result your target format cannot serialize, Lambë infers the shape, explains the mismatch, and lists the curated query fragments that bridge it. The `as(fmt)` operator lets you ask for the bridge directly in the query language; `--explain` shows the shape at every pipe stage without running anything. +`lam` queries JSON, YAML, TOML, HCL, CSV, TSV, and Markdown. Unlike other query tools, it tells you what your query *does* before you run it — the shape at each pipe stage, which output formats can serialize the result, what would go wrong. + +Use it when you don't already know the data: inspecting an unfamiliar API response, auditing a Helm chart, verifying a CI pipeline's assumptions, or asking an AI agent to extract something without guessing at the structure. 
``` $ lam --to toml '.dependencies | keys' pubspec.yaml @@ -14,15 +16,23 @@ $ lam --to toml '.dependencies | keys | as(toml)' pubspec.yaml items = ["rumil", "rumil_parsers", "rumil_expressions"] ``` +Queries are bounded and always terminate. No recursion, no lambdas, no `def`. That's the tradeoff: Lambe doesn't try to be a programming language, so its shape inference, `--explain`, `--schema`, and error remediations all work. + *Lambë (pronounced "lam-beh") means "language" in Quenya (Tolkien's elvish). The package name is `lambe` for ASCII compatibility.* ## Installation +One-line installer (Linux and macOS, no `sudo`, verifies SHA256 checksums): + ```bash -# Pre-built binary (no Dart required) -curl -L https://github.com/hakimjonas/lambe/releases/latest/download/lam-linux-x64 -o lam -chmod +x lam && sudo mv lam /usr/local/bin/ +curl -fsSL https://raw.githubusercontent.com/hakimjonas/lambe/main/install.sh | sh +``` + +This downloads `lam` and `lam-mcp` from the latest GitHub release into `~/.local/bin/`. Environment variables `LAMBE_VERSION` (pin a version) and `LAMBE_PREFIX` (change install dir) are supported; see the script for details. +Other options: + +```bash # From pub.dev (Dart users) dart pub global activate lambe @@ -57,6 +67,10 @@ The same flow applies to CSV and TSV (which require a list of records at the roo Suggestions surface the intent-level `as()` form. The explanation names the raw fragment (`{value: .}`, `to_entries`, etc.) the bridge composes, so `--explain` and manual composition stay available to anyone who wants them. +### Non-scalar cells in CSV/TSV + +By default, nested lists or maps in CSV/TSV cells are rejected — there is no faithful delimited rendering for them. When you need a quick export and lossy is acceptable, pass `--flatten-cells json` (CLI) or `:flatten-cells json` (REPL) to encode them as JSON strings inline. 
Round-tripping the resulting file back into Lambë does not recover the original structure; prefer reshaping the data query-side when fidelity matters. + ### `as(fmt)` — bridging in the query language When the shape of the target format is known up front, `as(fmt)` performs the bridge inside the query. The combinator is a no-op when the input already satisfies the target, applies a single curated bridge when one exists, and lists the candidates when more than one could apply. @@ -89,6 +103,36 @@ Writable as: json, yaml, csv, tsv Not writable as: toml, hcl ``` +Explain flags provably-empty filters (`filter(.missing)` on a known shape) and runtime-rejection mismatches (`filter` on a non-list input) by default. Pass `--explain-trivial` to also flag `sort_by`/`group_by`/`map`/`unique_by` whose argument references a missing field (often a typo, sometimes intentional). For agent tooling and build pipelines, `--explain-json` emits the same information as a structured JSON document. + +### `--schema` — declare a shape and let Lambe check your work + +When you have a JSON Schema for your data — from an API contract, OpenAPI spec, or hand-written docs — point `--schema` at it: + +``` +$ lam --schema api.schema.json --explain '.users | map(.email)' response.json +.users : list>> +| map(.email) : list> + +Writable as: json, yaml, csv, tsv +Not writable as: toml, hcl +``` + +The schema fills in information data alone can't express: optional fields (from JSON Schema's `required`), element shapes of empty lists, types `shapeOf` couldn't infer from sampling. `--explain` shows them; the evaluator trusts them. + +With data present, Lambe also validates: a schema saying `age: number` against data with `age: "30"` exits 1 at load time with a JSON-path-annotated diagnostic. No silent drift, no running a query against data that doesn't match its contract. + +A sibling `.schema.json` is auto-detected, so a project convention of placing schemas next to data works without explicit flags. 
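
To make the load-time check concrete, here is a minimal Python sketch of a structural comparison that stops at the first disagreement and reports where it occurred. It is not Lambë's implementation: the function names, the `$`-rooted path notation, and the message wording are assumptions for illustration, and it looks only at `type`, `properties`, `items`, and `required`.

```python
from typing import Any, Optional


def json_type(value: Any) -> str:
    """Map a decoded JSON value to its JSON Schema type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # checked before int: bool subclasses int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"not a JSON value: {value!r}")


def type_matches(expected: str, value: Any) -> bool:
    if expected == "integer":
        return isinstance(value, int) and not isinstance(value, bool)
    return expected == json_type(value)


def first_disagreement(schema: dict, value: Any, path: str = "$") -> Optional[str]:
    """Return a path-annotated message for the first mismatch, or None."""
    expected = schema.get("type")
    actual = json_type(value)
    if expected is not None and not type_matches(expected, value):
        return f"{path}: expected {expected}, got {actual}"
    if actual == "object":
        for name in schema.get("required", []):
            if name not in value:
                return f"{path}.{name}: required field is missing"
        for name, sub in schema.get("properties", {}).items():
            if name in value:
                err = first_disagreement(sub, value[name], f"{path}.{name}")
                if err:
                    return err
    if actual == "array" and "items" in schema:
        for i, item in enumerate(value):
            err = first_disagreement(schema["items"], item, f"{path}[{i}]")
            if err:
                return err
    return None


schema = {
    "type": "object",
    "required": ["users"],
    "properties": {
        "users": {
            "type": "array",
            "items": {
                "type": "object",
                "required": ["age"],
                "properties": {"age": {"type": "number"}},
            },
        }
    },
}
data = {"users": [{"age": 31}, {"age": "30"}]}

print(first_disagreement(schema, data))
# prints: $.users[1].age: expected number, got string
```

A caller in this sketch would exit non-zero whenever the function returns a message, which matches the fail-at-load behavior described above.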
+ +The reverse direction is symmetrical: `lam --print-shape data.json` emits the inferred shape as a JSON Schema document. Round-trip: + +``` +lam --print-shape data.json > data.schema.json # bootstrap a schema from data +lam --schema data.schema.json '.users' data.json # use it back +``` + +Accepted JSON Schema keywords: `type`, `properties`, `items`, `required`. Value-level constraints (`minimum`, `pattern`, `enum`, etc.), structural combinators (`allOf`, `oneOf`), `$ref`, and conditional schemas are rejected with a per-keyword error. Lambe is a shape system, not a validation engine — for richer validation, reach for `ajv` or `check-jsonschema`. + ## Query Syntax Queries start with `.` (the current data) and chain operations with `|`: @@ -170,8 +214,11 @@ lam '.users | map("\(.name) is \(.age)")' data.json # Shape trace lam --explain '.users | map(.name)' data.json -# Schema inference -lam --schema data.json +# Shape inspection (JSON Schema output) +lam --print-shape data.json + +# Schema-checked queries: validate data against a schema as it runs +lam --schema api.schema.json '.users | map(.email)' response.json # CI validation lam --assert '.version != "0.0.0"' package.json @@ -181,6 +228,11 @@ lam --assert '.replicas >= 2' deployment.yaml lam --to yaml '.config' data.json lam --to csv '.users | map({name, age})' data.json lam --to toml '.config | as(toml)' data.json +lam --to csv --flatten-cells json '.users' data.json # encode nested cells as JSON + +# Line-delimited JSON (logs, event streams) +lam --ndjson '.user.id' events.ndjson +tail -f app.log | lam --ndjson '.level' # Query any format (auto-detected from extension) lam '. 
| filter(.status != "closed")' issues.csv @@ -198,7 +250,7 @@ lam -i data.json ``` ``` -lambe v0.8.0 - type :help for commands, :q to quit +lambe v0.9.0 - type :help for commands, :q to quit Data loaded: {3 fields, 42 users} lambe> .users | filter(.age > 30) | map(.name) @@ -239,8 +291,13 @@ final result2 = evaluateAst(ast, dataset2); final yaml = formatOutput(data, OutputFormat.yaml); final csv = formatOutput(users, OutputFormat.csv); -// Schema inference -final schema = inferSchema(data); +// Shape inference and JSON Schema output +final shape = shapeOf(data); // Shape ADT +final schemaJson = renderJsonSchema(shape); // JSON Schema text + +// Or parse a schema file and merge with observed data +final schema = parseJsonSchema(schemaSource); +final merged = mergeSchemaWithData(schema, shape); // throws on disagreement ``` ### Shape and bridging API @@ -342,7 +399,15 @@ Install, then add `.mcp.json` to your project: } ``` -This gives AI assistants three tools: `lambe_query` (extract/filter/transform), `lambe_schema` (structure inspection), `lambe_assert` (validation). When `lambe_query` encounters a shape mismatch with the requested output format, the error response includes a structured `suggestions` array: each entry carries a `template_text`, an `apply_as` (the complete query formed by appending the template to the original expression), and a one-line `explanation`. Agents can call the tool again with an `apply_as` verbatim. +This gives AI assistants five tools that cover the whole feedback loop: + +- `lambe_query` — extract/filter/transform, with an optional `schema` parameter that validates data structurally before the query runs. +- `lambe_print_shape` — inspect unfamiliar data; returns a JSON Schema subset document. +- `lambe_check` — validate data against a JSON Schema. Returns `{"ok": true}` or `{"ok": false, "error": "..."}` naming the disagreement path. 
+- `lambe_explain` — trace a query statically (with or without data); returns a structured JSON report with shape-per-stage, warnings, and writability. +- `lambe_assert` — boolean assertion on a query result. + +When `lambe_query` encounters a shape mismatch with the requested output format, the error response includes a structured `suggestions` array: each entry carries a `template_text`, an `apply_as` (the complete query formed by appending the template to the original expression), and a one-line `explanation`. Agents can call the tool again with an `apply_as` verbatim. ### For AI Coding Agents @@ -376,9 +441,21 @@ expect(data, lamHas('.users[0].address.city')); - [Getting started](doc/getting-started.md) - install and first queries - [Syntax reference](doc/syntax.md) - the full query language - [REPL guide](doc/repl.md) - interactive mode, commands, keyboard shortcuts +- [Schema guide](doc/schema.md) - the JSON Schema subset, merge semantics, round-trip - [Recipes](doc/recipes.md) - real-world patterns for Kubernetes, Terraform, CI, CSV - [Man page](doc/lam.1.md) - Unix man page (`man -l doc/lam.1`) +## What lambé is not + +Lambé is a bounded tree transformer over JSON-shaped data. It +deliberately omits Turing-completeness, user-defined functions, +recursive descent (`..`), `try`/`catch`, regex, streaming, and +in-place mutation. Staying bounded is what makes shape inference, +`--explain`, and `as(fmt)` bridging work. + +See [doc/non-goals.md](doc/non-goals.md) for the full list and the +lambé idiom that replaces each omission. + ## Design See [DESIGN.md](DESIGN.md) for architecture and design decisions. diff --git a/bin/lam.dart b/bin/lam.dart index d9bee8e..d1d71f7 100644 --- a/bin/lam.dart +++ b/bin/lam.dart @@ -34,9 +34,27 @@ void main(List arguments) { help: 'Output format', allowed: ['json', 'yaml', 'toml', 'csv', 'tsv', 'hcl'], ) - ..addFlag( + ..addOption( + 'flatten-cells', + help: + 'CSV/TSV policy for non-scalar cells. 
' + 'refuse (default) rejects them; json encodes them as ' + 'JSON strings inline.', + allowed: ['refuse', 'json'], + defaultsTo: 'refuse', + ) + ..addOption( 'schema', - help: 'Show data structure without values', + help: + 'Path to a JSON Schema subset file. Threads the declared ' + 'shape through inference and explain. If omitted, a ' + 'sibling .schema.json is used when present.', + ) + ..addFlag( + 'print-shape', + help: + 'Print the inferred shape of the data as a JSON Schema. ' + 'Renames the 0.8.0 --schema flag with the same meaning.', negatable: false, ) ..addFlag( @@ -44,6 +62,21 @@ void main(List arguments) { help: 'Show shape trace of the query (static analysis, no execution)', negatable: false, ) + ..addFlag( + 'explain-trivial', + help: + 'Include trivial-result warnings in the explain report ' + '(sort_by/group_by/map/unique_by on a missing field). ' + 'Implies --explain.', + negatable: false, + ) + ..addFlag( + 'explain-json', + help: + 'Emit the explain report as JSON instead of the text table. ' + 'Implies --explain.', + negatable: false, + ) ..addFlag( 'assert', help: 'Assert expression is true (exit 1 if false)', @@ -55,6 +88,21 @@ void main(List arguments) { help: 'Interactive REPL mode', negatable: false, ) + ..addFlag( + 'ndjson', + help: + 'Treat input as ndjson/jsonl: one JSON document per line, ' + 'evaluated independently. One result per line on stdout.', + negatable: false, + ) + ..addFlag( + 'null-input', + abbr: 'n', + help: + 'Run the query against null context with no input. Useful ' + 'for value computations: `lam -n \'[1,2,3] | unique\'`.', + negatable: false, + ) ..addFlag('help', abbr: 'h', negatable: false, help: 'Show usage'); final ArgResults args; @@ -72,14 +120,43 @@ void main(List arguments) { return; } - // --schema mode: no expression needed, just file - final isSchemaMode = args.flag('schema'); + // --print-shape mode: no expression needed, just file. 
+ final isPrintShapeMode = args.flag('print-shape'); + final schemaPath = args.option('schema'); final isAssertMode = args.flag('assert'); final isInteractive = args.flag('interactive'); - final isExplainMode = args.flag('explain'); + // --explain-trivial and --explain-json imply --explain, so enable + // explain mode if any of the three is set. + final explainTrivial = args.flag('explain-trivial'); + final explainJson = args.flag('explain-json'); + final isExplainMode = args.flag('explain') || explainTrivial || explainJson; + var isNdjsonMode = args.flag('ndjson'); + final nullInput = args.flag('null-input'); + + // -n / --null-input combinations. The flag's purpose is "run the + // query against null with no input"; combinations that take input + // (REPL, ndjson, schema validation, assert) are nonsensical. + if (nullInput) { + if (isInteractive) { + stderr.writeln('Error: -n cannot be combined with --interactive.'); + exit(1); + } + if (isNdjsonMode) { + stderr.writeln('Error: -n cannot be combined with --ndjson.'); + exit(1); + } + if (schemaPath != null) { + stderr.writeln('Error: -n cannot be combined with --schema.'); + exit(1); + } + if (isAssertMode) { + stderr.writeln('Error: -n cannot be combined with --assert.'); + exit(1); + } + } final rest = args.rest; - if (rest.isEmpty && !isSchemaMode && !isInteractive) { + if (rest.isEmpty && !isPrintShapeMode && !isInteractive) { stderr.writeln('Error: missing query expression.'); stderr.writeln(); _usage(argParser); @@ -93,9 +170,70 @@ void main(List arguments) { exit(1); } - final expression = rest.isNotEmpty ? rest[0] : '.'; - final fileArgIndex = - (isSchemaMode || isInteractive) && rest.length == 1 ? 
0 : 1; + final int fileArgIndex; + if (isInteractive && rest.length == 1) { + fileArgIndex = 0; + } else if (isPrintShapeMode && rest.length == 1) { + // --print-shape is overloaded: a single positional may be either + // a file (legacy form: `lam --print-shape data.json`) or an + // expression (compose form: `lam --print-shape '.users'` with + // piped or no data). Disambiguate by file existence — if rest[0] + // names an existing file, treat it as the file; otherwise treat + // it as an expression. The collision case (a file whose name + // happens to be a valid lambé expression like `.users`) is + // vanishingly unlikely; plain identifier filenames aren't valid + // queries either. + fileArgIndex = File(rest[0]).existsSync() ? 0 : 1; + } else { + fileArgIndex = 1; + } + // The expression sits at rest[0] when fileArgIndex isn't 0; when + // fileArgIndex == 0 the user gave a file but no expression, so the + // identity expression is the right default. + final expression = (rest.isNotEmpty && fileArgIndex != 0) ? rest[0] : '.'; + + // Auto-enable ndjson mode when the file extension suggests it, even + // without an explicit --ndjson flag. Consistent with the existing + // format auto-detection convention for .csv, .yaml, etc. 
+ if (!isNdjsonMode && rest.length > fileArgIndex) { + final fpath = rest[fileArgIndex].toLowerCase(); + if (fpath.endsWith('.ndjson') || fpath.endsWith('.jsonl')) { + isNdjsonMode = true; + } + } + + if (isNdjsonMode) { + if (isInteractive) { + stderr.writeln('Error: --ndjson cannot be combined with --interactive.'); + exit(1); + } + if (isPrintShapeMode) { + stderr.writeln('Error: --ndjson cannot be combined with --print-shape.'); + exit(1); + } + if (schemaPath != null) { + stderr.writeln('Error: --ndjson cannot be combined with --schema.'); + exit(1); + } + if (isAssertMode) { + stderr.writeln('Error: --ndjson cannot be combined with --assert.'); + exit(1); + } + if (isExplainMode) { + stderr.writeln('Error: --ndjson cannot be combined with --explain.'); + exit(1); + } + final toArg = args.option('to'); + if (toArg != null && toArg != 'json') { + stderr.writeln( + 'Error: --ndjson emits one compact JSON document per line; ' + '--to $toArg is not supported.', + ); + exit(1); + } + _runNdjson(argParser, expression, rest, fileArgIndex); + return; + } String? input; String? filePath; @@ -108,9 +246,12 @@ void main(List arguments) { } input = file.readAsStringSync(); } else if (stdin.hasTerminal) { - // `--explain` performs static analysis and can run without input. - // Every other mode requires a file argument or piped stdin. - if (!isExplainMode) { + // `--explain` and `--print-shape` perform static analysis and can + // run without input — `--print-shape EXPR` falls back to inferring + // from SAny, mirroring the explain-without-data flow. `-n` is the + // explicit "run against null" opt-in. Every other mode requires + // a file argument or piped stdin. + if (!isExplainMode && !isPrintShapeMode && !nullInput) { stderr.writeln('Error: no input. 
Provide a file or pipe data via stdin.'); stderr.writeln(); _usage(argParser); @@ -122,7 +263,28 @@ void main(List arguments) { while ((line = stdin.readLineSync()) != null) { buffer.writeln(line); } - input = buffer.toString(); + // Empty stdin in static-analysis modes (--explain, --print-shape) + // and explicit null-input mode (-n): treat as "no data" rather + // than trying to parse the empty string as JSON. This matches the + // no-stdin branch's contract. + if (buffer.isEmpty) { + if (isExplainMode || isPrintShapeMode || nullInput) { + input = null; + } else { + // Empty piped stdin in evaluation mode is the same footgun as + // a missing file argument: surface the "no input" message + // rather than confusing the user with a JSON parse error on + // the empty string. + stderr.writeln( + 'Error: no input. Provide a file or pipe data via stdin.', + ); + stderr.writeln(); + _usage(argParser); + exit(1); + } + } else { + input = buffer.toString(); + } } // Determine input format (only relevant when we have input). @@ -166,10 +328,50 @@ void main(List arguments) { return; } - // --schema mode: show structure and exit - if (isSchemaMode) { - final schema = inferSchema(data); - stdout.writeln(const JsonEncoder.withIndent(' ').convert(schema)); + // --print-shape mode: emit the inferred shape as JSON Schema. + // Composes with the query expression — `lam --print-shape '.users' + // data.json` prints the shape of the result of evaluating `.users`, + // not the whole document. With no expression, prints the document + // shape (the legacy 0.8.0 form). Without data, falls back to + // inferShape against SAny — same as `--explain` without data. + if (isPrintShapeMode) { + if (schemaPath != null) { + stderr.writeln( + 'Error: --print-shape prints the inferred shape of the data; ' + '--schema has nothing to contribute.', + ); + exit(1); + } + // No expression: print the document shape directly. 
+ final hasExpression = rest.isNotEmpty && fileArgIndex != 0; + if (!hasExpression) { + final shape = data == null ? const SAny() : shapeOf(data); + stdout.writeln(renderJsonSchema(shape)); + return; + } + final LamExpr ast; + try { + ast = parseAst(expression); + } on QueryError catch (e) { + stderr.writeln('Error: ${e.message}'); + exit(1); + } + final Shape resultShape; + if (data == null) { + // Mirror --explain-without-data: infer shape statically against + // the empty-prior SAny. The user gets the static shape of the + // query, the same answer --explain would give. + resultShape = inferShape(ast, const SAny()); + } else { + try { + final result = evaluateAst(ast, data); + resultShape = shapeOf(result); + } on QueryError catch (e) { + stderr.writeln('Error: ${e.message}'); + exit(1); + } + } + stdout.writeln(renderJsonSchema(resultShape)); return; } @@ -182,12 +384,59 @@ void main(List arguments) { stderr.writeln('Error: ${e.message}'); exit(1); } - final inputShape = data == null ? const SAny() : shapeOf(data); - final report = explain(ast, inputShape); - stdout.write(renderExplain(report)); + // Initial shape: schema when provided (explicit or auto-detected + // sibling), merged with shapeOf(data). Falls back to SAny / data + // shape when no schema is available. + final dataShape = data == null ? const SAny() : shapeOf(data); + final Shape inputShape; + try { + final schema = loadSchemaForData( + explicitSchemaPath: schemaPath, + dataPath: + data != null && rest.length > fileArgIndex + ? rest[fileArgIndex] + : null, + ); + inputShape = + schema == null ? 
dataShape : mergeSchemaWithData(schema, dataShape); + } on QueryError catch (e) { + stderr.writeln('Error: ${e.message}'); + exit(1); + } + final cellPolicy = CellPolicy.values.byName(args.option('flatten-cells')!); + final report = explain( + ast, + inputShape, + flattenCells: cellPolicy, + includeTrivial: explainTrivial, + ); + if (explainJson) { + stdout.writeln(renderExplainJson(report)); + } else { + stdout.write(renderExplain(report)); + } return; } + // If a schema is in effect, validate it against the data before + // evaluating. mergeSchemaWithData throws on concrete-type + // disagreement; this gives structural validation as a side effect + // of --schema. + if (data != null) { + try { + final schema = loadSchemaForData( + explicitSchemaPath: schemaPath, + dataPath: rest.length > fileArgIndex ? rest[fileArgIndex] : null, + ); + if (schema != null) { + mergeSchemaWithData(schema, shapeOf(data)); + } + } on QueryError catch (e) { + stderr.writeln('Error: ${e.message}'); + exit(1); + } + } + // The parsed AST is retained so that, if serialization later hits an // OutputShapeError, a chosen remediation can be composed with it via // applyBridge without re-parsing. @@ -225,12 +474,14 @@ void main(List arguments) { final toArg = args.option('to'); if (toArg != null) { final outputFormat = OutputFormat.values.byName(toArg); + final cellPolicy = CellPolicy.values.byName(args.option('flatten-cells')!); _writeWithBridge( result, outputFormat, pretty: args.flag('pretty'), queryAst: queryAst, data: data, + flattenCells: cellPolicy, ); } else if (args.flag('raw') && result is String) { stdout.writeln(result); @@ -253,18 +504,23 @@ void _writeWithBridge( required bool pretty, required LamExpr queryAst, required Object? 
data, + required CellPolicy flattenCells, }) { try { - stdout.writeln(formatOutput(result, fmt, pretty: pretty)); + stdout.writeln( + formatOutput(result, fmt, pretty: pretty, flattenCells: flattenCells), + ); return; } on OutputShapeError catch (e) { if (!(stdin.hasTerminal && stdout.hasTerminal)) { stderr.writeln('Error: ${e.message}'); + _writeHintsCli(e.hints); exit(1); } final choice = _promptForRemediation(e); if (choice == null) { stderr.writeln('Error: ${e.message}'); + _writeHintsCli(e.hints); exit(1); } // Re-evaluate with the chosen bridge applied to the user's AST, @@ -274,7 +530,14 @@ void _writeWithBridge( final bridged = applyBridge(queryAst, choice.template); try { final Object? newResult = evaluateAst(bridged, data); - stdout.writeln(formatOutput(newResult, fmt, pretty: pretty)); + stdout.writeln( + formatOutput( + newResult, + fmt, + pretty: pretty, + flattenCells: flattenCells, + ), + ); } on QueryError catch (e2) { stderr.writeln('Error applying "${choice.display}": ${e2.message}'); exit(1); @@ -285,6 +548,14 @@ void _writeWithBridge( } } +/// Render [hints] in CLI form to stderr, one per line, after the +/// shape-error message. Silent when no hints are present. +void _writeHintsCli(List hints) { + for (final h in hints) { + stderr.writeln('Or pass ${h.cliFlag}: ${h.explanation}'); + } +} + /// Interactive prompt for the remediations carried by an /// [OutputShapeError]. /// @@ -293,6 +564,9 @@ void _writeWithBridge( /// `q`, a blank line, EOF, or an index outside the valid range. Remediation? _promptForRemediation(OutputShapeError err) { stdout.writeln(err.message); + for (final h in err.hints) { + stdout.writeln('Or pass ${h.cliFlag}: ${h.explanation}'); + } stdout.writeln(); stdout.writeln('Apply a bridge?'); for (var i = 0; i < err.suggestions.length; i++) { @@ -311,6 +585,69 @@ Remediation? 
_promptForRemediation(OutputShapeError err) { return err.suggestions[pick - 1]; } +/// Handle `--ndjson` mode: evaluate the query against each non-empty +/// line of input independently, emit one compact JSON document per +/// line. +/// +/// File input is read eagerly into a list of lines (sufficient for +/// typical ndjson files). Stdin is read line by line, so `tail -f | +/// lam --ndjson` works as expected. On the first line that fails to +/// parse or evaluate, writes the error with line number to stderr and +/// exits 1; subsequent lines are not evaluated. Fail-fast matches the +/// single-document CLI's semantics and jq's default behavior. +void _runNdjson( + ArgParser argParser, + String expression, + List<String> rest, + int fileArgIndex, +) { + final LamExpr queryAst; + try { + queryAst = parseAst(expression); + } on QueryError catch (e) { + stderr.writeln('Error: ${e.message}'); + exit(1); + } + + Iterable<String> lines; + if (rest.length > fileArgIndex) { + final filePath = rest[fileArgIndex]; + final file = File(filePath); + if (!file.existsSync()) { + stderr.writeln('Error: file not found: $filePath'); + exit(1); + } + lines = file.readAsLinesSync(); + } else if (stdin.hasTerminal) { + stderr.writeln('Error: --ndjson needs a file argument or piped stdin.'); + stderr.writeln(); + _usage(argParser); + exit(1); + } else { + // Lazy stdin reader so `tail -f app.log | lam --ndjson ...` emits + // each line's result as soon as it arrives, not after EOF. The + // iterable completes when readLineSync returns null (pipe closed). + lines = _stdinLines(); + } + + try { + for (final result in queryNdjson(lines, queryAst)) { + stdout.writeln(const JsonEncoder().convert(result)); + } + } on QueryError catch (e) { + stderr.writeln('Error: ${e.message}'); + exit(1); + } +} + +/// Lazy iterable over stdin lines, terminating at EOF. +Iterable<String> _stdinLines() sync* { + String?
line; + while ((line = stdin.readLineSync()) != null) { + yield line!; + } +} + /// Print usage information to stderr. void _usage(ArgParser parser) { stderr.writeln('Usage: lam [options] [file]'); diff --git a/bin/mcp_server.dart b/bin/mcp_server.dart index df2d91c..d40ab0f 100644 --- a/bin/mcp_server.dart +++ b/bin/mcp_server.dart @@ -24,10 +24,14 @@ base class LambeServer extends MCPServer with ToolsSupport { implementation: Implementation(name: 'lambe', version: lambeVersion), instructions: 'Lambé is a multi-format query language for structured data. ' - 'Use the query tool to find, extract, filter, transform, or look up ' + 'Use lambe_query to find, extract, filter, transform, or look up ' 'values from JSON, YAML, TOML, HCL, CSV, TSV, or Markdown files. ' - 'Use the schema tool to understand data structure before querying. ' - 'Use the assert tool to validate or check conditions on data.\n\n' + 'Use lambe_print_shape to understand data structure before ' + 'querying (returns JSON Schema). ' + 'Use lambe_check to validate data against a JSON Schema. ' + 'Use lambe_explain to trace a query statically before running ' + 'it (returns a structured JSON report of shape at each stage). ' + 'Use lambe_assert to validate or check conditions on data.\n\n' 'Common patterns:\n' ' .database.host — extract a value\n' ' .users | filter(.age > 30) | map(.name) — filter and project\n' @@ -58,25 +62,38 @@ base class LambeServer extends MCPServer with ToolsSupport { '(children), link (href, title, children), image (src, alt, title), ' 'emphasis (children), strong (children), text (text), code (code), ' 'thematic_break, hard_break, soft_break, html_block (html), ' - 'html_inline (html). Links and images are inline nodes and appear ' - 'nested inside heading/paragraph children (no recursive descent op ' - 'currently; drill in via explicit .children paths).\n' + 'html_inline (html). Links and images are inline nodes nested ' + 'inside heading/paragraph children. 
Use the `text` pipe op to ' + 'extract prose from any node tree (it walks children recursively ' + 'and concatenates text/code/code_block/image.alt leaves) — ' + '`.children[0].text` only sees the first immediate child and ' + 'misses nested emphasis, links, and inline code.\n' '\n' 'Markdown query patterns:\n' - ' .children | filter(.type == "heading") | map(.children[0].text)\n' - ' — extract all heading texts\n' - ' .children | filter(.type == "heading") | map({level, text: .children[0].text})\n' + ' .children | filter(.type == "heading") | map(text)\n' + ' — extract all heading texts (handles nested formatting)\n' + ' .children | filter(.type == "heading") | map({level, title: text})\n' ' — headings with levels\n' ' .children | filter(.type == "code_block") | map(.language)\n' ' — list code block languages\n' ' .children | filter(.type == "code_block" && .language == "python") | map(.code)\n' - ' — code blocks for one language\n', + ' — code blocks for one language\n' + ' . | text\n' + ' — entire document as plain prose\n', ) { registerTool(_queryTool, _handleQuery); - registerTool(_schemaTool, _handleSchema); + registerTool(_printShapeTool, _handlePrintShape); + registerTool(_checkTool, _handleCheck); + registerTool(_explainTool, _handleExplain); registerTool(_assertTool, _handleAssert); } + /// Build an error-shaped [CallToolResult] (`isError: true`) wrapping + /// [message]. Centralises the boilerplate at every handler's catch + /// site. + CallToolResult _errorResult(String message) => + CallToolResult(content: [TextContent(text: message)], isError: true); + final _queryTool = Tool( name: 'lambe_query', description: @@ -168,6 +185,25 @@ base class LambeServer extends MCPServer with ToolsSupport { 'list of lists).', values: ['json', 'yaml', 'toml', 'csv', 'tsv', 'hcl'], ), + 'flatten_cells': UntitledSingleSelectEnumSchema( + description: + 'CSV/TSV policy for non-scalar cells. 
refuse (default) ' + 'rejects list- or map-valued cells with a shape error; ' + 'json encodes them as JSON strings inline. Ignored for ' + 'other output formats.', + values: ['refuse', 'json'], + ), + 'schema': Schema.string( + description: + 'Optional inline JSON Schema subset (as a string) ' + 'describing the expected shape of data. When provided, ' + 'the data is validated against the schema before the ' + 'query runs; a concrete-type disagreement returns an ' + 'error. Accepts type, properties, items, required. ' + 'Rejects structural combinators, value-level ' + 'constraints, references, and additionalProperties with ' + 'a per-keyword error.', + ), }, required: ['expression', 'data'], ), @@ -179,73 +215,57 @@ base class LambeServer extends MCPServer with ToolsSupport { final data = args['data'] as String; final formatStr = args['format'] as String?; final outputFormatStr = args['output_format'] as String?; + final flattenCellsStr = args['flatten_cells'] as String?; + final schemaStr = args['schema'] as String?; try { final format = formatStr != null ? Format.values.byName(formatStr) : null; + + // Validate data against schema first, if provided. A structural + // disagreement returns an error before the query runs. + if (schemaStr != null) { + final schema = parseJsonSchema(schemaStr); + final parsed = parseInput(data, format ?? sniffFormat(data)); + mergeSchemaWithData(schema, shapeOf(parsed)); + } + final result = queryString(expression, data, format: format); final outputFormat = outputFormatStr != null ? OutputFormat.values.byName(outputFormatStr) : OutputFormat.json; + final flattenCells = + flattenCellsStr != null + ? CellPolicy.values.byName(flattenCellsStr) + : CellPolicy.refuse; final rendered = outputFormat == OutputFormat.json ? 
const JsonEncoder.withIndent(' ').convert(result) - : formatOutput(result, outputFormat); + : formatOutput(result, outputFormat, flattenCells: flattenCells); return CallToolResult(content: [TextContent(text: rendered)]); } on OutputShapeError catch (e) { - return CallToolResult( - content: [TextContent(text: _renderShapeErrorPayload(e, expression))], - isError: true, - ); + return _errorResult(renderMcpShapeErrorPayload(e, expression)); } on QueryError catch (e) { - return CallToolResult( - content: [TextContent(text: 'Error: ${e.message}')], - isError: true, - ); + return _errorResult('Error: ${e.message}'); } on FormatException catch (e) { - return CallToolResult( - content: [TextContent(text: 'Parse error: ${e.message}')], - isError: true, - ); + return _errorResult('Parse error: ${e.message}'); } } - /// Render an [OutputShapeError] as a JSON payload for agent - /// consumption. - /// - /// The payload has keys `error`, `message`, `format`, `got_shape`, - /// `original_expression`, and `suggestions`. Each entry in - /// `suggestions` carries a 1-based `id`, a `label`, a `template_text` - /// (the query-fragment source), an `apply_as` (the complete query - /// formed by appending the template to the original expression via - /// `|`), and an `explanation`. - String _renderShapeErrorPayload(OutputShapeError e, String expression) => - const JsonEncoder.withIndent(' ').convert({ - 'error': 'output_shape_mismatch', - 'message': e.message, - 'format': e.format.name, - 'got_shape': renderShape(e.got), - 'original_expression': expression, - 'suggestions': [ - for (var i = 0; i < e.suggestions.length; i++) - { - 'id': i + 1, - 'label': e.suggestions[i].label, - 'template_text': e.suggestions[i].display, - 'apply_as': '$expression | ${e.suggestions[i].display}', - 'explanation': e.suggestions[i].explanation, - }, - ], - }); + // See `renderMcpShapeErrorPayload` in package:lambe/lambe.dart for + // the payload shape this server emits on output-shape mismatches. 
- final _schemaTool = Tool( - name: 'lambe_schema', + final _printShapeTool = Tool( + name: 'lambe_print_shape', description: 'Use this tool to understand the structure of unfamiliar data before ' - 'writing queries. Returns type names (string, number, boolean, null) ' - 'instead of actual values. Use when the user says "show me the ' - 'structure", "what fields are in this", or "what does this data look ' - 'like".', + 'writing queries. Returns a JSON Schema subset document ' + '(type/properties/items/required) describing the inferred shape. Use ' + 'when the user says "show me the structure", "what fields are in ' + 'this", or "what does this data look like". The output round-trips ' + 'with the `schema` parameter on lambe_query and with lambe_check. ' + 'Renamed from the 0.8.0 lambe_schema tool; output format changed ' + 'from type-name strings to JSON Schema.', inputSchema: Schema.object( properties: { 'data': Schema.string( @@ -262,25 +282,183 @@ base class LambeServer extends MCPServer with ToolsSupport { ), ); - FutureOr _handleSchema(CallToolRequest request) { + FutureOr _handlePrintShape(CallToolRequest request) { + final args = request.arguments!; + final data = args['data'] as String; + final formatStr = args['format'] as String?; + + try { + final format = formatStr != null ? Format.values.byName(formatStr) : null; + final parsed = parseInput(data, format ?? sniffFormat(data)); + return CallToolResult( + content: [TextContent(text: renderJsonSchema(shapeOf(parsed)))], + ); + } on QueryError catch (e) { + return _errorResult('Error: ${e.message}'); + } + } + + final _checkTool = Tool( + name: 'lambe_check', + description: + 'Validate data against a JSON Schema subset. Use this when the user ' + 'wants to verify that data matches an expected shape without ' + 'running a query — API response shape checks, CI contract ' + 'validation, "does this match the spec". 
Returns ' + '{"ok": true} on agreement, or ' + '{"ok": false, "error": "..."} naming the disagreement path. ' + 'Accepts the same JSON Schema subset as lambe_query\'s schema ' + 'parameter: type, properties, items, required. Structural ' + 'combinators, value-level constraints, and references are ' + 'rejected per-keyword.', + inputSchema: Schema.object( + properties: { + 'schema': Schema.string( + description: 'Inline JSON Schema subset as a string.', + ), + 'data': Schema.string( + description: + 'The input data as a string (JSON, YAML, TOML, HCL, CSV, TSV, ' + 'or Markdown).', + ), + 'format': UntitledSingleSelectEnumSchema( + description: 'Input format. Auto-detected if omitted.', + values: ['json', 'yaml', 'toml', 'hcl', 'csv', 'tsv', 'markdown'], + ), + }, + required: ['schema', 'data'], + ), + ); + + FutureOr _handleCheck(CallToolRequest request) { final args = request.arguments!; + final schemaStr = args['schema'] as String; final data = args['data'] as String; final formatStr = args['format'] as String?; try { + final schema = parseJsonSchema(schemaStr); final format = formatStr != null ? Format.values.byName(formatStr) : null; final parsed = parseInput(data, format ?? sniffFormat(data)); - final schema = inferSchema(parsed); + mergeSchemaWithData(schema, shapeOf(parsed)); + return CallToolResult(content: [TextContent(text: '{"ok": true}')]); + } on QueryError catch (e) { return CallToolResult( content: [ - TextContent(text: const JsonEncoder.withIndent(' ').convert(schema)), + TextContent( + text: const JsonEncoder.withIndent( + ' ', + ).convert({'ok': false, 'error': e.message}), + ), ], ); - } on QueryError catch (e) { + } + } + + final _explainTool = Tool( + name: 'lambe_explain', + description: + 'Use this tool to trace the shape of values flowing through a ' + 'Lambe query without running it. 
Returns a structured JSON ' + 'report with one entry per pipe stage (source + inferred shape), ' + 'static-analysis warnings (empty filters, runtime rejections, ' + 'and optionally trivial results), and the output formats the ' + 'final shape can be serialized as. Use before `lambe_query` to ' + 'verify a query does what the user expects, or to find out why ' + 'an unfamiliar query would fail. Data is optional: without it, ' + 'the trace starts from "any" and still catches many classes of ' + 'mistake. A schema, when provided, sharpens the trace further.', + inputSchema: Schema.object( + properties: { + 'expression': Schema.string( + description: 'The Lambe query expression to analyze.', + ), + 'data': Schema.string( + description: + 'Optional input data. When present, shape inference seeds ' + 'from shapeOf(data); without it, the initial shape is ' + '"any".', + ), + 'format': UntitledSingleSelectEnumSchema( + description: 'Input format for [data]. Auto-detected if omitted.', + values: ['json', 'yaml', 'toml', 'hcl', 'csv', 'tsv', 'markdown'], + ), + 'schema': Schema.string( + description: + 'Optional inline JSON Schema subset. When provided, the ' + 'schema is merged with shapeOf(data) (or used alone when ' + 'no data is given) to produce a more precise initial ' + 'shape — optional fields and empty-list elements from ' + 'the schema become visible in the trace.', + ), + 'include_trivial': Schema.bool( + description: + 'When true, includes trivial-result warnings ' + '(sort_by/group_by/map/unique_by on a missing field). ' + 'Off by default because legitimate uses exist.', + ), + 'flatten_cells': UntitledSingleSelectEnumSchema( + description: + 'CSV/TSV cell policy for the writability summary. 
refuse ' + '(default) requires scalar cells; json accepts any list ' + 'at the root.', + values: ['refuse', 'json'], + ), + }, + required: ['expression'], + ), + ); + + FutureOr _handleExplain(CallToolRequest request) { + final args = request.arguments!; + final expression = args['expression'] as String; + final data = args['data'] as String?; + final formatStr = args['format'] as String?; + final schemaStr = args['schema'] as String?; + final includeTrivial = args['include_trivial'] as bool? ?? false; + final flattenCellsStr = args['flatten_cells'] as String?; + + try { + final ast = parseAst(expression); + final flattenCells = + flattenCellsStr != null + ? CellPolicy.values.byName(flattenCellsStr) + : CellPolicy.refuse; + + // Build the initial shape. Four cases: + // - no data, no schema: SAny + // - data only: shapeOf(data) + // - schema only: parseJsonSchema(schema) + // - both: mergeSchemaWithData(schema, shapeOf(data)) + Shape inputShape; + if (data == null && schemaStr == null) { + inputShape = const SAny(); + } else if (data == null) { + inputShape = parseJsonSchema(schemaStr!); + } else { + final format = + formatStr != null ? Format.values.byName(formatStr) : null; + final parsed = parseInput(data, format ?? sniffFormat(data)); + final dataShape = shapeOf(parsed); + inputShape = + schemaStr != null + ? 
mergeSchemaWithData(parseJsonSchema(schemaStr), dataShape) + : dataShape; + } + + final report = explain( + ast, + inputShape, + flattenCells: flattenCells, + includeTrivial: includeTrivial, + ); return CallToolResult( - content: [TextContent(text: 'Error: ${e.message}')], - isError: true, + content: [TextContent(text: renderExplainJson(report))], ); + } on QueryError catch (e) { + return _errorResult('Error: ${e.message}'); + } on FormatException catch (e) { + return _errorResult('Parse error: ${e.message}'); } } @@ -324,21 +502,13 @@ base class LambeServer extends MCPServer with ToolsSupport { } else if (result == false) { return CallToolResult(content: [TextContent(text: 'FAIL')]); } else { - return CallToolResult( - content: [ - TextContent( - text: - 'Error: assertion expression must return boolean, got ${result.runtimeType}: $result', - ), - ], - isError: true, + return _errorResult( + 'Error: assertion expression must return boolean, ' + 'got ${result.runtimeType}: $result', ); } } on QueryError catch (e) { - return CallToolResult( - content: [TextContent(text: 'Error: ${e.message}')], - isError: true, - ); + return _errorResult('Error: ${e.message}'); } } } diff --git a/doc/getting-started.md b/doc/getting-started.md index 9c666bc..f13764f 100644 --- a/doc/getting-started.md +++ b/doc/getting-started.md @@ -189,13 +189,34 @@ $ echo $? The exit code is 0 if the assertion passes, 1 if it fails. +## Value computations with no input + +For pure value computations — query expressions that build their own +data and don't read from a file or stdin — pass `-n` (`--null-input`): + +```bash +$ lam -n '[1,2,3] | unique' +[ + 1, + 2, + 3 +] + +$ lam -n '[1,2,3] | sum' +6 +``` + +Without `-n`, lambé errors on a missing input — that's deliberate +footgun-catching for typo'd filenames and missing redirects. The +flag makes "I have no input" explicit. 
+ ## The REPL For exploring unfamiliar data, use interactive mode: ```bash $ lam -i data.json -lambe v0.8.0 - type :help for commands, :q to quit +lambe v0.9.0 - type :help for commands, :q to quit Data loaded: {2 fields, 3 users} lambe> @@ -250,7 +271,7 @@ Add to your `pubspec.yaml`: ```yaml dependencies: - lambe: ^0.7.0 + lambe: ^0.9.0 ``` ## Next steps diff --git a/doc/jq-to-lambe.md b/doc/jq-to-lambe.md index 6060660..0bea6e3 100644 --- a/doc/jq-to-lambe.md +++ b/doc/jq-to-lambe.md @@ -4,12 +4,15 @@ A side-by-side mapping of common jq patterns to their Lambe equivalents. Lambe and jq have overlapping but distinct scopes. jq is the established standard for JSON processing on the command line, with a long history and -features Lambe does not have (e.g. streaming, `//` alternative operator, -recursive descent, regex filters). Lambe covers more input formats by default +features Lambe does not have (e.g. streaming, recursive descent, regex +filters, user-defined functions). Lambe covers more input formats by default (YAML, TOML, HCL, CSV, TSV, Markdown) and leans on explicit SQL-like verbs (`filter`, `map`, `sort_by`) rather than jq's terser generic filter model. If you already know jq, most of it translates directly. +See [non-goals.md](non-goals.md) for the full list of deliberate +omissions and the lambé idiom that replaces each one. + All examples use this data: ```json diff --git a/doc/lam.1 b/doc/lam.1 index c397693..a0d03c7 100644 --- a/doc/lam.1 +++ b/doc/lam.1 @@ -1,4 +1,4 @@ -.TH "LAM" "1" "May 2026" "Lambë 0.8.0" "" +.TH "LAM" "1" "May 2026" "Lambë 0.9.0" "" .SH AUTHOR Hakim Jonas Ghoula .SH NAME @@ -25,7 +25,7 @@ Pretty-print output. On by default. Disable pretty-printing. .TP \fB-r\fR, \fB--raw\fR -Output strings without quotes. +Output top-level string scalars without quotes. No effect on structured output (objects, arrays, numbers, booleans, null) — those still serialize through the active output format. 
.TP \fB-f\fR, \fB--format\fR \fIFMT\fR Input format. One of: json, yaml, toml, hcl, csv, tsv, markdown. Auto-detected from file extension if omitted. @@ -33,11 +33,23 @@ Input format. One of: json, yaml, toml, hcl, csv, tsv, markdown. Auto-detected f \fB-t\fR, \fB--to\fR \fIFMT\fR Output format. One of: json, yaml, toml, csv, tsv, hcl. Default is json. .TP -\fB--schema\fR -Show the data structure with type names instead of values. +\fB--flatten-cells\fR \fIPOLICY\fR +CSV/TSV policy for non-scalar cells. \fBrefuse\fR (default) rejects list- or map-valued cells with a shape error. \fBjson\fR encodes them as JSON strings inline; the shape check correspondingly widens to accept any list at the root. Ignored for other output formats. +.TP +\fB--schema\fR \fIPATH\fR +Path to a JSON Schema subset file. Threads the declared shape through inference and \fB--explain\fR, validates data against the schema at load time (errors on concrete-type disagreement), and fills in shape details the sampled data doesn't cover (empty-list elements, optional fields). Auto-detected as a sibling \fB.schema.json\fR if omitted. Accepts \fBtype\fR, \fBproperties\fR, \fBitems\fR, and \fBrequired\fR; rejects structural combinators (allOf/oneOf/$ref) and value-level constraints (minimum/pattern/enum/etc) with a per-keyword error. +.TP +\fB--print-shape\fR +Print the inferred shape of the data as a JSON Schema subset document. Replaces the 0.8.0 \fB--schema\fR flag with the same meaning, renamed because \fB--schema\fR now takes a path value. .TP \fB--explain\fR -Trace the shape of values flowing through each pipeline stage. Static analysis only; does not execute the query. Reports which output formats the final shape can be serialized as. +Trace the shape of values flowing through each pipeline stage. Static analysis only; does not execute the query. Reports which output formats the final shape can be serialized as. Flags provably-empty filters and runtime-rejection mismatches. 
+.TP +\fB--explain-trivial\fR +Include trivial-result warnings in the explain report. Flags parameterised ops (\fBsort_by\fR, \fBgroup_by\fR, \fBmap\fR, \fBunique_by\fR) whose argument references a field provably absent on the element shape. Implies \fB--explain\fR. +.TP +\fB--explain-json\fR +Emit the explain report as a JSON document instead of the text table. Useful for agent tooling or build-pipeline integration. Implies \fB--explain\fR. .TP \fB--assert\fR Evaluate the expression and exit with code 0 if the result is true, 1 if false. @@ -45,6 +57,12 @@ Evaluate the expression and exit with code 0 if the result is true, 1 if false. \fB-i\fR, \fB--interactive\fR Start the interactive REPL. Requires a file argument. .TP +\fB--ndjson\fR +Treat input as ndjson or jsonl: one JSON document per line, evaluated independently with no state shared between lines. Emits one compact JSON result per line on stdout. Auto-enabled when the file extension is \fB.ndjson\fR or \fB.jsonl\fR. Cannot combine with \fB--interactive\fR, \fB--schema\fR, \fB--print-shape\fR, \fB--assert\fR, or \fB--explain\fR. Output must be JSON (\fB--to json\fR or default); other \fB--to\fR values are refused. +.TP +\fB-n\fR, \fB--null-input\fR +Run the query against \fBnull\fR context with no input. Useful for value computations like \fClam -n '[1,2,3] | unique'\fR. Without \fB-n\fR, the missing-input guard fires (a typo'd filename or missing redirect is a common footgun); the flag makes the "I have no input" intent explicit. Cannot combine with \fB--interactive\fR, \fB--ndjson\fR, \fB--schema\fR, or \fB--assert\fR. +.TP \fB-h\fR, \fB--help\fR Show usage information. .SH QUERY LANGUAGE @@ -171,6 +189,9 @@ Toggle unquoted string output. \fB:pretty\fR Toggle pretty-printing. .TP +\fB:flatten-cells\fR \fIPOLICY\fR +Set CSV/TSV cell policy for this session. One of: refuse, json. +.TP \fB:load\fR \fIfile\fR Load a different data file. 
.TP @@ -221,10 +242,22 @@ Format conversion: lam --to yaml '.config' data.json .fi .PP -Schema inspection: +Shape inspection: .PP .nf -lam --schema deployment.yaml +lam --print-shape deployment.yaml +.fi +.PP +Schema-checked query (validates data against the schema before running): +.PP +.nf +lam --schema api.schema.json '.users | map(.email)' response.json +.fi +.PP +Shape trace, schema-seeded (no data needed): +.PP +.nf +lam --schema api.schema.json --explain '.users | map(.email)' .fi .PP Shape trace for a pipeline: @@ -256,6 +289,13 @@ Interactive exploration: .nf lam -i data.json .fi +.PP +Line-delimited JSON (logs, event streams): +.PP +.nf +lam --ndjson '.level' events.ndjson +tail -f app.log | lam --ndjson '.user.id' +.fi .SH SEE ALSO .PP \fBjq\fR(1) — the established JSON query tool. Lambe shares its pipeline aesthetic and extends to multi-format input with shape-aware output. diff --git a/doc/lam.1.md b/doc/lam.1.md index 8c86380..729f25b 100644 --- a/doc/lam.1.md +++ b/doc/lam.1.md @@ -1,7 +1,7 @@ --- title: LAM section: 1 -source: Lambë 0.8.0 +source: Lambë 0.9.0 author: Hakim Jonas Ghoula date: May 2026 --- @@ -33,7 +33,7 @@ If no file is given, reads from standard input. : Disable pretty-printing. **-r**, **--raw** -: Output strings without quotes. +: Output top-level string scalars without quotes. No effect on structured output (objects, arrays, numbers, booleans, null) — those still serialize through the active output format. **-f**, **--format** *FMT* : Input format. One of: json, yaml, toml, hcl, csv, tsv, markdown. Auto-detected from file extension if omitted. @@ -41,11 +41,23 @@ If no file is given, reads from standard input. **-t**, **--to** *FMT* : Output format. One of: json, yaml, toml, csv, tsv, hcl. Default is json. -**--schema** -: Show the data structure with type names instead of values. +**--flatten-cells** *POLICY* +: CSV/TSV policy for non-scalar cells. **refuse** (default) rejects list- or map-valued cells with a shape error. 
**json** encodes them as JSON strings inline; the shape check correspondingly widens to accept any list at the root. Ignored for other output formats. + +**--schema** *PATH* +: Path to a JSON Schema subset file. Threads the declared shape through inference and **--explain**, validates data against the schema at load time (errors on concrete-type disagreement), and fills in shape details the sampled data doesn't cover (empty-list elements, optional fields). Auto-detected as a sibling **.schema.json** if omitted. Accepts **type**, **properties**, **items**, and **required**; rejects structural combinators (allOf/oneOf/$ref) and value-level constraints (minimum/pattern/enum/etc) with a per-keyword error. + +**--print-shape** +: Print the inferred shape of the data as a JSON Schema subset document. Replaces the 0.8.0 **--schema** flag with the same meaning, renamed because **--schema** now takes a path value. **--explain** -: Trace the shape of values flowing through each pipeline stage. Static analysis only; does not execute the query. Reports which output formats the final shape can be serialized as. +: Trace the shape of values flowing through each pipeline stage. Static analysis only; does not execute the query. Reports which output formats the final shape can be serialized as. Flags provably-empty filters and runtime-rejection mismatches. + +**--explain-trivial** +: Include trivial-result warnings in the explain report. Flags parameterised ops (**sort_by**, **group_by**, **map**, **unique_by**) whose argument references a field provably absent on the element shape. Implies **--explain**. + +**--explain-json** +: Emit the explain report as a JSON document instead of the text table. Useful for agent tooling or build-pipeline integration. Implies **--explain**. **--assert** : Evaluate the expression and exit with code 0 if the result is true, 1 if false. @@ -53,6 +65,12 @@ If no file is given, reads from standard input. 
**-i**, **--interactive** : Start the interactive REPL. Requires a file argument. +**--ndjson** +: Treat input as ndjson or jsonl: one JSON document per line, evaluated independently with no state shared between lines. Emits one compact JSON result per line on stdout. Auto-enabled when the file extension is **.ndjson** or **.jsonl**. Cannot combine with **--interactive**, **--schema**, **--print-shape**, **--assert**, or **--explain**. Output must be JSON (**--to json** or default); other **--to** values are refused. + +**-n**, **--null-input** +: Run the query against **null** context with no input. Useful for value computations like `lam -n '[1,2,3] | unique'`. Without **-n**, the missing-input guard fires (a typo'd filename or missing redirect is a common footgun); the flag makes the "I have no input" intent explicit. Cannot combine with **--interactive**, **--ndjson**, **--schema**, or **--assert**. + **-h**, **--help** : Show usage information. @@ -193,6 +211,9 @@ Computation on null throws: **null + 5** and **null > 3** are errors. **:pretty** : Toggle pretty-printing. +**:flatten-cells** *POLICY* +: Set CSV/TSV cell policy for this session. One of: refuse, json. + **:load** *file* : Load a different data file. @@ -238,9 +259,17 @@ Format conversion: lam --to yaml '.config' data.json -Schema inspection: +Shape inspection: - lam --schema deployment.yaml + lam --print-shape deployment.yaml + +Schema-checked query (validates data against the schema before running): + + lam --schema api.schema.json '.users | map(.email)' response.json + +Shape trace, schema-seeded (no data needed): + + lam --schema api.schema.json --explain '.users | map(.email)' Shape trace for a pipeline: @@ -262,6 +291,11 @@ Interactive exploration: lam -i data.json +Line-delimited JSON (logs, event streams): + + lam --ndjson '.level' events.ndjson + tail -f app.log | lam --ndjson '.user.id' + # SEE ALSO **jq**(1) — the established JSON query tool. 
Lambe shares its pipeline aesthetic and extends to multi-format input with shape-aware output. diff --git a/doc/non-goals.md b/doc/non-goals.md new file mode 100644 index 0000000..9dcf04e --- /dev/null +++ b/doc/non-goals.md @@ -0,0 +1,91 @@ +# Non-goals + +Lambé is a bounded tree transformer over JSON-shaped data. The list +below is what lambé deliberately does not do, and the lambé idiom that +replaces each omission where one exists. The tool is small *because* +these are excluded; staying bounded is what makes shape inference, +`--explain`, and `as(fmt)` bridging work. + +If you came from jq looking for a feature that's missing, this is the +short answer. The full migration guide is in +[jq-to-lambe.md](jq-to-lambe.md). + +## Language scope + +- **Turing-completeness** → no `def`, no recursion, no lambdas. jq has + these and regrets them: their presence is exactly what prevents + static analysis, makes error messages vague, and turns "quick query" + into "programming language to learn." Lambé's shape inference, + `--explain`, and `as(fmt)` bridging all work *because* lambé is a + bounded tree transformer. +- **User-defined functions (`def`)** → not supported. The bounded tree + transformer is the design. +- **Lambdas** → same. +- **Recursive descent (`..`)** → not supported. Compose with explicit + paths plus `flatten` / `map`. For prose extraction from markdown, + use the `text` op (the only op tuned to a specific input format's + vocabulary). For paths into structured data, use `--print-shape` to + see the structure first. +- **`.[]` iteration sugar** → list ops are list ops. Use + `.users | map(.)` instead of `.users[]`. jq's `.[]` overloads on + container type, which conflicts with lambé's shape-aware approach. +- **`try` / `catch`** → lambé's contract is "navigation returns null, + computation throws." There is no exception model in user space. Use + `// fallback` for null handling; let computation errors propagate to + the CLI. 
+- **`select(p)` outside `filter(...)`** → `select` is only valid as the + predicate of `filter`. `map(select(p))` is just `filter(p)`. + +## Path manipulation + +- **`paths` / `leaf_paths`** → use `--print-shape` (CLI), + `lambe_print_shape` (MCP), or `renderJsonSchema(shapeOf(value))` + (library). Structural exploration is a separate tool from query + evaluation. +- **`getpath` / `setpath`** → read-only by design. lambé does not + mutate input; it produces new values. There is no in-place update. + +## Iteration & limits + +- **`range`, `limit`, `nth`** → use slicing (`[:n]`, `[n:]`, `[a:b]`) + and `first` / `last`. These cover the common cases without + introducing iteration as a language primitive. + +## Strings + +- **Regex (`test`, `match`, `sub`, `gsub`)** → out of scope. Lambé + treats strings as opaque values. For regex, pipe through `grep` or a + regex tool before / after `lam`. +- **`@base64`, `@uri`** → not supported. Encoding is out of scope. +- **`@csv`, `@tsv`** → use `--to csv` / `--to tsv` on the CLI, or + `as(csv)` / `as(tsv)` in the query, plus `formatOutput(value, + OutputFormat.csv)` in the library. Output formatting belongs to the + format layer, not the query language. + +## Environment + +- **`env`, `$__loc__`** → not supported. Queries are pure; environment + access lives outside the query (set up via the shell). + +## Streaming + +- **Streaming evaluation** → out of scope. Two blockers: (1) half the + language (`sort`, `group_by`, `sum`, `unique`) cannot stream; + building a parallel streaming pipeline would fork the semantics. + (2) Rumil's parser uses Warth seed-growth for left recursion, which + requires re-parsing a prefix as a seed grows; a streaming parser + cannot rewind buffers it has already discarded. This is algorithmic, + not a tuning knob. For the "tail a log file" use case, `--ndjson` + evaluates one document per line with no shared state. 
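As a rough illustration of the per-line model behind `--ndjson`, the Dart sketch below decodes, evaluates, and re-encodes each line on its own, so nothing carries over between documents. It is not lambé's implementation: `evalNdjson` and the `Query` callback are hypothetical names standing in for a compiled query, and skipping blank lines is an assumption.

```dart
import 'dart:convert';

/// Hypothetical stand-in for a compiled query: takes one decoded
/// document and returns the query result for it.
typedef Query = Object? Function(Object? document);

/// Evaluates [query] once per non-empty line of [input] and yields one
/// compact JSON string per document. No state is shared between lines.
Iterable<String> evalNdjson(String input, Query query) sync* {
  for (final line in const LineSplitter().convert(input)) {
    if (line.trim().isEmpty) continue; // assumption: blank lines are skipped
    yield jsonEncode(query(jsonDecode(line)));
  }
}

void main() {
  const input = '{"level":"info","user":{"id":1}}\n'
      '{"level":"error","user":{"id":2}}\n';
  // Rough equivalent of the query `.level` applied to each line.
  for (final result in evalNdjson(input, (doc) => (doc as Map)['level'])) {
    print(result); // prints "info", then "error" (with the JSON quotes)
  }
}
```

Because each line is handled in isolation, whole-input operations such as `sort` or `sum` only ever see one document at a time in this model.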
+ +## Format-specific + +- **HCL evaluation** → lambé reads HCL syntax (parses Terraform `.tf` + files, surfaces blocks and attributes), but does NOT evaluate + Terraform expressions. Variable resolution, function calls, + `for` expressions, splats, and conditionals serialise back to their + source form. Use Terraform's own tooling for evaluation. +- **XML** → temporarily out of scope. The 0.4.0 release dropped XML + because the projection was lossy; see CHANGELOG. A future release + may reintroduce it once the array-preserved-siblings projection is + designed. diff --git a/doc/recipes.md b/doc/recipes.md index 3cfeb27..8df420d 100644 --- a/doc/recipes.md +++ b/doc/recipes.md @@ -127,6 +127,81 @@ $ lam '. | map(.count | to_number) | max' inventory.csv 942 ``` +## Markdown + +Extract heading text (`text` walks the node tree, so it handles +emphasis, inline code, links, and nested formatting): + +```bash +$ lam '.children | filter(.type == "heading") | map(text)' README.md +``` + +Headings paired with their level: + +```bash +$ lam '.children | filter(.type == "heading") | map({level, text: text})' README.md +``` + +Plain text from each paragraph: + +```bash +$ lam '.children | filter(.type == "paragraph") | map(text)' doc.md +``` + +Code-block contents by language: + +```bash +$ lam '.children | filter(.type == "code_block" && .language == "python") | map(.code)' tutorial.md +``` + +Full document prose, no markup: + +```bash +$ lam '. | text' README.md +``` + +### Querying a CHANGELOG + +A release notes file follows a recurring shape: H2 per release, H3 per +subsection. The same `text` op recovers the release names regardless of +inline formatting. 
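To see why a tree-walking `text` op is insensitive to inline formatting, consider this minimal Dart sketch of a recursive plain-text extraction. It is only an illustration, not lambé's `text` implementation: the function name `plainText` is hypothetical, and the node fields it reads (`children` on containers, a `text` or `code` string on leaves) are assumptions about the AST.

```dart
/// Hypothetical plain-text extraction over a Markdown-like node tree.
/// Assumes container nodes carry `children` and leaves carry their
/// content in a `text` or `code` string field.
String plainText(Map<String, Object?> node) {
  final children = node['children'];
  if (children is List) {
    // Containers contribute the concatenated text of their children.
    return children
        .whereType<Map<String, Object?>>()
        .map(plainText)
        .join();
  }
  final content = node['text'] ?? node['code'];
  return content is String ? content : '';
}

void main() {
  // A heading such as `## Release **0.9.0**`, modelled by hand.
  final heading = <String, Object?>{
    'type': 'heading',
    'level': 2,
    'children': [
      <String, Object?>{'type': 'text', 'text': 'Release '},
      <String, Object?>{
        'type': 'strong',
        'children': [
          <String, Object?>{'type': 'text', 'text': '0.9.0'},
        ],
      },
    ],
  };
  print(plainText(heading)); // Release 0.9.0
}
```

The nesting depth of the emphasis node does not matter to the caller, which is the property the CHANGELOG queries rely on.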
+ +Every release version: + +```bash +$ lam '.children | filter(.type == "heading" && .level == 2) | map(text)' CHANGELOG.md +[ + "0.9.0", + "0.8.0", + "0.7.1" +] +``` + +Latest release name: + +```bash +$ lam '.children | filter(.type == "heading" && .level == 2) | map(text) | first' CHANGELOG.md +"0.9.0" +``` + +Every subsection title (informational; structure under each release): + +```bash +$ lam '.children | filter(.type == "heading" && .level == 3) | map(text)' CHANGELOG.md +``` + +Check for duplicate release entries (returns `true` when none): + +```bash +$ lam '.children | filter(.type == "heading" && .level == 2) | map(text) | length == (.children | filter(.type == "heading" && .level == 2) | map(text) | unique | length)' CHANGELOG.md +true +``` + +These same queries are gated by `--assert` in `tool/lint_changelog.sh`, +which CI runs on every push: lambé itself validates lambé's release +notes, parsed by lambé's own Markdown parser. This is a real-world +example of the pattern. + ## TOML (Rust, Python config) Get a dependency version from Cargo.toml: @@ -336,6 +411,72 @@ $ lam '.spec.template.spec' deployment.yaml $ lam -i deployment.yaml ``` +## Bridging shapes to output formats with `as(fmt)` + +Some output formats restrict the root shape: TOML and HCL want a map +at the top level; CSV and TSV want a list of records. When the +pipeline produces something else, `as(fmt)` applies a curated bridge +so the value fits. + +There are four canonical bridges. All four are reachable via `as(...)` +or via the CLI's `--to` flag with `--flatten-cells refuse` (the +default). + +### `list | as(toml)` and `as(hcl)` + +Wrap a list under a single `items` key. + +``` +$ lam -n --to toml '["a", "b", "c"] | as(toml)' +items = ["a", "b", "c"] + + +$ lam -n --to hcl '["a", "b"] | as(hcl)' +items = ["a", "b"] +``` + +### `scalar | as(toml)` and `as(hcl)` + +Wrap a scalar under a single `value` key. 
+ +``` +$ lam -n --to toml '"hello" | as(toml)' +value = "hello" + + +$ lam -n --to hcl '"hello" | as(hcl)' +value = "hello" +``` + +### `map | as(csv)` and `as(tsv)` + +Convert a map to a two-column key/value list of records via +`to_entries`. + +``` +$ lam -n --to csv '{a: 1, b: 2} | as(csv)' +key,value +a,1 +b,2 +``` + +### `scalar | as(csv)` and `as(tsv)` + +Compose: wrap the scalar under `value`, then `to_entries`. The +result is a one-row CSV with a `key`/`value` header. + +``` +$ lam -n --to csv '42 | as(csv)' +key,value +value,42 +``` + +### When `as(fmt)` does nothing + +A shape that already satisfies the format's requirement passes +through unchanged: `map | as(toml)` is identity, as is `list | +as(csv)`. The bridge fires only when there's a real mismatch. + ## Next steps - [Getting started](getting-started.md) for installation diff --git a/doc/schema-design.md b/doc/schema-design.md new file mode 100644 index 0000000..6417b94 --- /dev/null +++ b/doc/schema-design.md @@ -0,0 +1,366 @@ +# Schema-typed queries — design rationale + +The decisions behind the 0.9.0 schema feature. User-facing documentation +is in [doc/schema.md](schema.md); this file records *why* the design is +what it is, for contributors and curious readers. + +## Context + +0.9.0 completes the shape feedback loop: declare a shape, check queries +against it, round-trip with JSON Schema tooling. The schema feature is +the piece that lets Lambe's shape system act as a contract between the +tool and its users' data. + +The positioning is *"a query language for structured data that shows +you what you're working with."* Schemas are how a user tells Lambe +what they're working with when the data alone doesn't say enough +(empty lists, optional fields, heterogeneous sampling) — and how +Lambe tells the user, statically, whether their query makes sense +against that contract. 
+ +## Non-goals + +- **No value-level constraints.** `minimum`, `maximum`, `pattern`, + `enum`, `format`, `minLength`, `maxLength` are rejected at + schema-load time with a one-line per-keyword error. Lambe is a + query tool that understands shape, not a constraint system. Users + who want value validation reach for ajv, check-jsonschema, or CUE. +- **No conditional schemas** (`if`/`then`/`else`, `dependencies`, + `allOf`/`oneOf`/`anyOf`/`not`). These introduce a constraint solver + and break the bounded-tree-transformer promise. +- **No external `$ref` resolution.** Schemas are single-file. +- **No runtime coercion.** A schema saying `age: number` does not + cause CSV's `"30"` string to be parsed as a number at query time. + The user still writes `.age | to_number`. +- **No pure-validation CLI command** (`lam --validate`). A user who + wants to validate data against a schema can write + `lam --schema s.json '.' data.yaml`. If data violates the schema, + the load fails with a structural error. That's enough. + +## Design decisions + +### 1. Schema format: JSON Schema subset + +Accept JSON files that describe a shape using four JSON Schema +keywords: `type`, `properties`, `items`, `required`. + +**Chosen over a custom Lambe DSL because:** + +- Ecosystem leverage. JSON Schema is what users already have — + OpenAPI specs, pub.dev metadata, IDE validators, CI linters all + emit or consume it. Zero authoring cost for users with an existing + schema. +- `rumil_parsers.parseJson` does the parse for free, with typed + errors and line/column locations. The "walk `JsonValue` → build + `Shape`" layer is ~50 lines of exhaustive switch. +- JSON Schema as the ecosystem's lingua franca for structural + description is a fact. A Lambe-specific DSL would be one more + thing to learn with no reciprocal win. 
+ +**Accepted keywords and their mapping:** + +| JSON Schema | Maps to | +|-------------|---------| +| `{"type": "null"}` | `SNull` | +| `{"type": "boolean"}` | `SBool` | +| `{"type": "number"}` or `"integer"` | `SNum` | +| `{"type": "string"}` | `SString` | +| `{"type": "array", "items": S}` | `SList(parse(S))` | +| `{"type": "object", "properties": P}` | `SMap({...})` with each property recursively parsed | +| `"required": [names]` on an object | Non-listed properties become `SOptional` in the `SMap` | + +**Rejected keywords** produce a clear per-keyword error pointing at +the source location: + +- `minimum`, `maximum`, `exclusiveMinimum`, `exclusiveMaximum` +- `multipleOf` +- `minLength`, `maxLength`, `pattern`, `format` +- `minItems`, `maxItems`, `uniqueItems`, tuple-form `items` +- `minProperties`, `maxProperties`, `additionalProperties`, + `patternProperties`, `propertyNames` +- `const`, `enum` +- `allOf`, `oneOf`, `anyOf`, `not` +- `if`/`then`/`else`, `dependencies` +- `$ref`, `$defs`, `definitions`, `$schema`, `$id` + +**Unknown keywords** are ignored (JSON Schema's extensibility +convention). A schema with `"description"` or `"title"` is fine — +those are metadata that don't affect shape. + +### 2. `SOptional(Shape)` variant + +Adding a new sealed variant to `Shape`: + +```dart +/// A value that may be absent. Used for JSON Schema properties not +/// listed in `required`, and for other cases where optionality is +/// statically known. +final class SOptional extends Shape { + final Shape inner; + const SOptional(this.inner); + // == / hashCode / toString +} +``` + +**What this gives us:** + +- JSON Schema's `required` semantics correctly map to the shape + system. Schemas ship honestly or not at all. +- `--explain` can point out "this field is optional; `.age + 5` may + throw at runtime on rows without `age`." +- `SMap` field shapes can carry `SOptional(...)` to represent + "declared but optional." 
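The keyword mapping and the `required` handling described above can be sketched in Dart roughly as follows. This is a minimal illustration under stated assumptions, not the library's `parseJsonSchema`: the shape classes are simplified stand-ins, `shapeFromSchema` is a hypothetical name, `FormatException` stands in for the real error type, only a few rejected keywords are listed, and treating a missing `items` as "any" is an assumption.

```dart
import 'dart:convert';

// Simplified stand-ins for the shape ADT named in this document.
sealed class Shape { const Shape(); }
final class SNull extends Shape { const SNull(); }
final class SBool extends Shape { const SBool(); }
final class SNum extends Shape { const SNum(); }
final class SString extends Shape { const SString(); }
final class SAny extends Shape { const SAny(); }
final class SList extends Shape { final Shape element; const SList(this.element); }
final class SMap extends Shape { final Map<String, Shape> fields; const SMap(this.fields); }
final class SOptional extends Shape { final Shape inner; const SOptional(this.inner); }

// A few of the rejected keywords; the full list above is longer.
const rejectedKeywords = {'minimum', 'maximum', 'pattern', 'enum', 'allOf', 'oneOf', r'$ref'};

/// Hypothetical helper: maps a decoded JSON Schema subset node to a Shape.
Shape shapeFromSchema(Object? node) {
  if (node is! Map<String, dynamic>) {
    throw FormatException('schema node must be an object');
  }
  for (final key in node.keys) {
    if (rejectedKeywords.contains(key)) {
      throw FormatException('unsupported JSON Schema keyword: $key');
    }
  }
  // Keywords that are neither handled nor rejected are simply ignored.
  switch (node['type']) {
    case 'null':
      return const SNull();
    case 'boolean':
      return const SBool();
    case 'number' || 'integer':
      return const SNum();
    case 'string':
      return const SString();
    case 'array':
      final items = node['items']; // assumption: missing items means "any"
      return SList(items == null ? const SAny() : shapeFromSchema(items));
    case 'object':
      final props = (node['properties'] as Map<String, dynamic>?) ??
          const <String, dynamic>{};
      final requiredNames =
          ((node['required'] as List?) ?? const <dynamic>[]).toSet();
      return SMap({
        for (final entry in props.entries)
          // Properties not listed in `required` become optional.
          entry.key: requiredNames.contains(entry.key)
              ? shapeFromSchema(entry.value)
              : SOptional(shapeFromSchema(entry.value)),
      });
    default:
      throw FormatException('schema node must declare a supported "type"');
  }
}

void main() {
  final shape = shapeFromSchema(jsonDecode(
    '{"type":"object","properties":{"name":{"type":"string"},'
    '"age":{"type":"number"}},"required":["name"]}',
  ));
  final fields = (shape as SMap).fields;
  print(fields['name'] is SString); // true
  print(fields['age'] is SOptional); // true
}
```

Metadata such as `"description"` or `"title"` passes through this sketch untouched, matching the unknown-keyword convention.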
+ +**What this costs:** + +- Every exhaustive `switch` on `Shape` gets a new case. The Dart + compiler finds them all. Expected sites: `pipe_ops.dart` predicates, + `inferShape`, `renderShape`, `shapeToJson`, `canWriteAs` + requirements, `check.dart` hints. +- Op acceptance semantics: for a list pipe op (like `filter`), an + `SOptional(SList(...))` input means "might be a list, might not be." + The op accepts (treating as the inner `SList`), but a + runtime-rejection warning fires: "this may be absent; guard with a + null check." + +**What this preserves:** + +- **Termination.** `SOptional` lives in the shape ADT, not the query + language. Query evaluation semantics are unchanged. +- **The bounded-language contract.** No new query operators. +- **The "narrow on purpose" scope.** The analyzer gets richer; the + language surface is unchanged. + +### 3. Disagreement semantics: schema augments data + +When `--schema` is provided AND data is present, the initial shape +for `inferShape` is `mergeSchemaWithData(schemaShape, shapeOf(data))`. + +Merge rules: + +- Both agree on a concrete type: use that type. +- Schema has a field, data doesn't: use schema's shape. +- Data has a field, schema doesn't: use data's shape. +- Schema marks a field optional, data has it: field is present; + outer `SOptional` wrapper is stripped at the merged point. +- Schema and data disagree on a concrete type at any path: **error + at load time** with a diagnostic showing the path, expected, and + actual. +- Empty-list element shapes always take the schema's element if one + is declared. `shapeOf([])` = `SList(SAny)`; when the schema + declares `SList(S)`, the result is `SList(S)`. + +**Rationale:** the value proposition of `--explain` is "what it +says is what will happen." A schema-wins policy would make +`--explain` lie whenever schema and data diverge. Error-on-conflict +keeps `--explain` honest. The merge preserves the case where schema +adds information (optionality, empty-list elements) without +overriding data. 
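The merge rules in this section can be read as a small recursive function. The Dart sketch below is one possible reading, not the actual `mergeSchemaWithData`: the shape classes are simplified stand-ins, `merge` and its `path` parameter are hypothetical, `StateError` stands in for the real error type, and letting the data side win when the schema side is `SAny` is an assumption.

```dart
// Simplified stand-ins for the shape ADT named in this document.
sealed class Shape { const Shape(); }
final class SAny extends Shape { const SAny(); }
final class SNum extends Shape { const SNum(); }
final class SString extends Shape { const SString(); }
final class SList extends Shape { final Shape element; const SList(this.element); }
final class SMap extends Shape { final Map<String, Shape> fields; const SMap(this.fields); }
final class SOptional extends Shape { final Shape inner; const SOptional(this.inner); }

/// Hypothetical merge following the rules listed in this section.
/// [path] is only used to build the diagnostic.
Shape merge(Shape schema, Shape data, [String path = '.']) {
  // Data is present at this point, so the optional wrapper is stripped.
  if (schema is SOptional) return merge(schema.inner, data, path);
  // An unknown data shape (e.g. the element of an empty list) defers
  // to whatever the schema declares.
  if (data is SAny) return schema;
  if (schema is SAny) return data; // assumption: nothing declared, keep data
  if (schema is SList && data is SList) {
    return SList(merge(schema.element, data.element, '$path[]'));
  }
  if (schema is SMap && data is SMap) {
    // Start from the schema so schema-only fields are kept as declared.
    final merged = <String, Shape>{...schema.fields};
    for (final entry in data.fields.entries) {
      final declared = schema.fields[entry.key];
      final childPath = path == '.' ? '.${entry.key}' : '$path.${entry.key}';
      merged[entry.key] = declared == null
          ? entry.value // data-only field: use the data's shape
          : merge(declared, entry.value, childPath);
    }
    return SMap(merged);
  }
  // Scalars of the same kind agree; anything else is a conflict.
  if (schema.runtimeType == data.runtimeType) return schema;
  throw StateError('shape mismatch at $path: schema expects '
      '${schema.runtimeType}, data has ${data.runtimeType}');
}

void main() {
  final schema = SMap({
    'users': SList(SMap({
      'name': const SString(),
      'age': const SOptional(SNum()),
    })),
  });
  // Shape of the document {"users": []}: the element shape is unknown.
  final data = SMap({'users': const SList(SAny())});
  final merged = merge(schema, data) as SMap;
  final element = (merged.fields['users'] as SList).element;
  print(element is SMap); // true: the schema fills in the empty list
}
```

A conflicting pair such as `merge(const SNum(), const SString())` throws with the path in the message, which mirrors the error-on-conflict policy argued for in the rationale.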
+ +**This gives structural validation as a side effect.** A user +running `lam --schema api.json '.' response.json` whose response +doesn't match the schema gets a load-time error naming the path and +types. No separate `--validate` mode needed. + +### 4. CLI surface + +**Rename `--schema` to `--print-shape`.** The existing `--schema` +flag prints the inferred shape of data; its semantics are really +"print the shape you'd infer." Renaming aligns the flag names with +their verbs. + +```bash +# 0.8.0 (old) +lam --schema data.json # prints the inferred shape + +# 0.9.0 (new) +lam --print-shape data.json # prints the inferred shape (as JSON Schema) +lam --schema spec.json data.json # uses spec.json as the input schema +lam --schema spec.json 'query' # schema-only (no data) +lam --schema spec.json --explain 'q' # trace a query against the schema +``` + +**Auto-detection:** if `--schema` is not passed and a file named +`.schema.json` exists next to the data file, use it +implicitly. Consistent with the `.ndjson` auto-detection shipped in +track C. + +**`--print-shape` output is JSON Schema.** Round-trips with +`--schema` input: + +```bash +lam --print-shape data.json > data.schema.json +# edit data.schema.json as needed +lam --schema data.schema.json query.lam data.json +``` + +This replaces the current type-name-string JSON output. **Breaking +change**, documented in CHANGELOG. + +**REPL additions:** + +- `:schema <path>`: load a schema for this session. +- `:schema`: show the active schema (if any). +- `:print-shape`: print the inferred shape of the currently loaded + data, in JSON Schema form. + +**JSON-Schema-looking reject:** if `--schema` is passed a file with +no recognized content (empty, random text, HTML, etc.), error with +a clear message. If it contains unsupported JSON Schema features, +error per feature. 
If it's valid JSON but not a schema (a bare +number, a plain object without `type`/`properties`), error with +"schema root must declare a shape (use `{\"type\": ...}`)." + +### 5. MCP integration + +The `lambe_query` tool gains an optional `schema` parameter: a JSON +string containing the schema. Threaded through like `flatten_cells`. + +The `lambe_schema` MCP tool is renamed to `lambe_print_shape` for +consistency. Returns the shape as JSON Schema. (Agents that were +calling `lambe_schema` get a clear deprecation: tool not found, +suggest `lambe_print_shape`.) + +New MCP tool: `lambe_check` — takes `schema` and `data`, returns +`{ok: true}` or `{ok: false, errors: [...]}`. This is structural +validation on demand, using the same `mergeSchemaWithData` logic. +Useful for agents verifying they have the right fixtures before +running queries. + +### 6. Library surface + +New module: `lib/src/schema/parser.dart` + +```dart +/// Parse a JSON Schema subset into a [Shape]. +/// +/// Accepts a subset of JSON Schema: `type`, `properties`, `items`, +/// `required`. Rejects value-level constraints and structural +/// combinators; see doc/schema-design.md for the full list. +/// +/// Throws [QueryError] with a line-aware diagnostic on parse error. +Shape parseJsonSchema(String source); +``` + +New module: `lib/src/schema/loader.dart` + +```dart +/// Load a schema from a file path, auto-detecting siblings if +/// [explicitPath] is null. +Shape? loadSchema({String? explicitPath, String? dataPath}); + +/// Merge a schema shape with an observed data shape per the rules +/// in doc/schema-design.md section 3. Throws [QueryError] on concrete- +/// type disagreement. +Shape mergeSchemaWithData(Shape schema, Shape dataShape); + +/// Render a [Shape] as a JSON Schema document. +/// +/// Round-trips with [parseJsonSchema] — parsing the output of +/// `renderJsonSchema(s)` yields a shape equal to `s`. 
+String renderJsonSchema(Shape shape); +``` + +Library barrel exports: `parseJsonSchema`, `loadSchema`, +`mergeSchemaWithData`, `renderJsonSchema`, `SOptional`. + +Existing APIs (`explain`, `inferShape`, `canWriteShapeAs`, +`renderExplain`, `renderExplainJson`, `shapeToJson`) are unchanged +in signature. `SOptional` propagates through them naturally via the +exhaustive-switch update. + +### 7. Interaction with existing 0.9.0 features + +- **ndjson**: `lam --ndjson --schema line.schema.json query file.ndjson` + threads the schema as each line's initial shape. No new design. +- **`--flatten-cells json`**: schema-aware. Nested-list cells still + refuse by default; `--flatten-cells json` still widens writer + acceptance. Schema provides richer element shape for CSV writers. +- **`--explain-trivial`**: a schema-provided optional field accessed + without a null guard still triggers the runtime-rejection warning + even under `--explain-trivial`. Trivial-result detection benefits + from schema: `sort_by(.missing)` becomes provably missing when the + schema doesn't declare it. +- **Hints**: `mergeSchemaWithData` errors populate `hints` where a + CLI flag would resolve the conflict. For 0.9.0, no such flags + exist, so `hints` stays empty on schema errors. + +### 8. Grammar of the accepted JSON Schema subset + +``` +schema := object_schema + | array_schema + | scalar_schema + +scalar_schema := {"type": "null"} + | {"type": "boolean"} + | {"type": "number"} + | {"type": "integer"} # same as number, per lambe + | {"type": "string"} + +array_schema := {"type": "array", "items": } + +object_schema := {"type": "object", + "properties": {: , ...}, + "required": [, ...]?} +``` + +Keywords outside this grammar (but not in the explicit reject list) +are ignored as metadata. Reject-list violations are errors. + +## Implementation plan + +~1 week. Order: + +1. **`SOptional` variant.** Add to `shape.dart`. Run the analyzer; + fix every exhaustive-switch compile error. 
Each fix is local: + - `renderShape`: `optional`. + - `shapeToJson`: `{"kind": "optional", "inner": ...}`. + - `canWriteAs` requirements: optional unwraps to inner for + writability purposes, except for TOML/HCL where "optional at + root" is unwritable. + - `inferShape`: field access on `SMap` with optional field yields + `SOptional`. Subsequent ops propagate or strip as + appropriate. + - `pipe_ops.dart` predicates: optional is accepted wherever + inner is, but a runtime-rejection warning is emitted. +2. **Parser** (`lib/src/schema/parser.dart`): walk `JsonValue`, + recursive. Line-aware errors via `rumil_parsers.parseJson` error + positions. +3. **Loader + merge** (`lib/src/schema/loader.dart`): file reader, + sibling auto-detect, `mergeSchemaWithData` with diagnostic errors. +4. **Renderer** (`lib/src/schema/render.dart` or inline): shape → + JSON Schema. Used by `--print-shape`. +5. **CLI** (`bin/lam.dart`): rename flag, add option, thread + through explain and evaluation paths. +6. **REPL** (`lib/src/repl.dart`): `:schema`, `:print-shape`. +7. **MCP** (`bin/mcp_server.dart`): `schema` param, `lambe_check` + tool, `lambe_schema` → `lambe_print_shape` rename. +8. **Tests**: + - `test/schema_parser_test.dart`: every shape constructor, + `required` semantics, unknown-keyword tolerance, rejected- + keyword errors, round-trip with `renderJsonSchema`. + - `test/schema_loader_test.dart`: sibling auto-detect, merge + rules, disagreement errors, validation-as-side-effect. + - Extend `test/cli_integration_test.dart`: `--schema`, + `--print-shape`, rename rejection error. + - Extend `test/shape_explain_test.dart`: schema-seeded explain + reports. +9. **Docs**: + - `doc/schema.md`: user-facing guide with examples. + - `doc/lam.1.md`: `--schema`, `--print-shape`. + - `CHANGELOG.md`: Added bullets + Breaking callout for rename. + - `README.md`: reframe to the shape-feedback-loop pitch (held + until all tracks land). 
+ +## Open decisions + +- **MCP tool rename `lambe_schema` → `lambe_print_shape`.** Strictly + speaking, backward-compatible would keep the old name. Renaming + aligns with the CLI rename. Lean: rename. Any agent with the old + name gets a clear "tool not found" and can update. + +- **Auto-detect behavior when both `--schema ` and a sibling + `.schema.json` exist.** Explicit wins. + +These are resolved; calling them out for the record. diff --git a/doc/schema.md b/doc/schema.md new file mode 100644 index 0000000..4786512 --- /dev/null +++ b/doc/schema.md @@ -0,0 +1,186 @@ +# Lambe schemas + +Lambe supports a JSON Schema subset as the contract between a query and its data. Declare the shape once; let Lambe check that queries make sense against it, validate data conforms at runtime, and round-trip schemas with the rest of the ecosystem. + +## Why use a schema? + +Lambe's default inference samples the data at hand. That's robust for known inputs but has gaps: + +- **Empty lists and maps.** `shapeOf([])` returns `list`; the element type is lost. +- **Mixed sampling.** Lists with heterogeneity beyond the sampling window collapse to `list`. +- **Queries without data.** CI planning, design documents, `--explain` without a file — no data to sample, no precision. + +A schema fills those in. `--explain` shows a sharper trace, errors fire earlier, and you can validate data against the shape before running anything. + +## Accepted JSON Schema subset + +Four keywords. That's it. + +| Keyword | Meaning | +|---|---| +| `type` | `"null"`, `"boolean"`, `"number"`, `"integer"`, `"string"`, `"array"`, `"object"` | +| `properties` | Map of field name → nested schema (for `object`) | +| `items` | Element schema (for `array`) | +| `required` | List of required property names (for `object`) | + +The empty object `{}` means "any shape" — JSON Schema's convention, preserved through round-trip. 
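As a rough illustration of how small the subset is, the four keywords can be read with one recursive walk. This is a Python sketch, not Lambe's Dart parser: `parse_schema` and the tuple-based shape encoding are hypothetical names for this example only. It assumes that `integer` is treated as `number`, that a schema with no `type` means "any shape", that a missing `items` leaves the element shape open, and that a property absent from `required` is optional.

```python
import json

SCALARS = {"null", "boolean", "number", "string"}


def parse_schema(node, path="$"):
    """Read the four-keyword subset into nested tuples (hypothetical encoding)."""
    if not isinstance(node, dict):
        raise ValueError(f"{path}: schema must be a JSON object")
    if "type" not in node:
        # Assumption: no `type` (including the empty object) means "any shape".
        return ("any",)
    kind = node["type"]
    if not isinstance(kind, str):
        raise ValueError(f"{path}: `type` must be a single string")
    if kind == "integer":
        kind = "number"  # assumption: integers are not distinguished from numbers
    if kind in SCALARS:
        return (kind,)
    if kind == "array":
        # Assumption: a missing `items` leaves the element shape open.
        return ("list", parse_schema(node.get("items", {}), f"{path}.items"))
    if kind == "object":
        required = set(node.get("required", []))
        fields = {}
        for name, sub in node.get("properties", {}).items():
            shape = parse_schema(sub, f"{path}.properties.{name}")
            # Properties not listed in `required` are wrapped as optional.
            fields[name] = shape if name in required else ("optional", shape)
        return ("map", fields)
    raise ValueError(f"{path}: unsupported type {kind!r}")


schema = json.loads("""
{
  "title": "metadata the walk never reads",
  "type": "object",
  "properties": {
    "name": {"type": "string"},
    "tags": {"type": "array", "items": {"type": "string"}},
    "email": {"type": "string"}
  },
  "required": ["name", "tags"]
}
""")
shape = parse_schema(schema)
print(shape)
```

Keywords the walk never reads, such as `title` here, pass through untouched. Rejecting unsupported keywords is left out of this sketch for brevity.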
+ +Unknown keywords are ignored (JSON Schema's extensibility rule), so `$schema`, `$id`, `title`, `description`, and other metadata flow through without complaint. + +## Rejected keywords + +Everything else is rejected with a per-keyword error and a JSON path: + +- **Value-level constraints** (`minimum`, `maximum`, `minLength`, `maxLength`, `pattern`, `enum`, `const`, `format`, `multipleOf`, `minItems`, `maxItems`, `uniqueItems`, `minProperties`, `maxProperties`). Lambe is a shape system, not a value validator. +- **Structural combinators** (`allOf`, `oneOf`, `anyOf`, `not`). The shape ADT is union-free by design. +- **Conditionals** (`if`, `then`, `else`, `dependencies`, `dependentRequired`, `dependentSchemas`). Would require a constraint solver. +- **References** (`$ref`, `$defs`, `definitions`). Schemas are single-file in 0.9. +- **Object constraints** (`additionalProperties`, `patternProperties`, `propertyNames`). + +If you have a richer schema, strip it down or run it through `ajv`/`check-jsonschema` for value validation separately. + +## Example schemas + +Simple: + +```json +{"type": "string"} +``` + +List of strings: + +```json +{"type": "array", "items": {"type": "string"}} +``` + +Object with required and optional fields: + +```json +{ + "type": "object", + "properties": { + "name": {"type": "string"}, + "age": {"type": "number"}, + "email": {"type": "string"} + }, + "required": ["name", "age"] +} +``` + +In Lambe's shape language, that last one is `map>`. + +## How Lambe uses your schema + +### CLI + +```bash +# Thread schema into --explain: shape trace reflects declared optionality +lam --schema api.schema.json --explain '.users | map(.email)' response.json + +# With data: schema validates at load time. Disagreement exits 1. 
+lam --schema api.schema.json '.users' response.json + +# Without data: schema alone is the initial shape (design-time planning) +lam --schema api.schema.json --explain '.users | map(.email)' +``` + +### Sibling auto-detect + +If you have `data.json` and `data.schema.json` side-by-side, `lam` picks up the schema implicitly: + +```bash +lam '.users' data.json # data.schema.json used automatically if present +``` + +Same convention as `.ndjson` auto-detect. An explicit `--schema ` overrides the sibling. + +### REPL + +``` +lambe> :schema api.schema.json +Schema loaded (agrees with current data). +lambe> :schema +{...prints the loaded schema as JSON Schema...} +lambe> :load other-data.json +Warning: data disagrees with active schema: schema disagreement at $.users: ... +lambe> :print-shape +{...prints the inferred shape of currently loaded data...} +``` + +### MCP + +Four tools cover the schema story for agents: + +- `lambe_print_shape` — takes data, returns its JSON Schema. +- `lambe_check` — takes schema + data, returns `{"ok": true}` or `{"ok": false, "error": "..."}`. +- `lambe_query` — takes an optional `schema` parameter that validates data before running the query. +- `lambe_explain` — takes an optional `schema` parameter; the explain trace reflects it. + +### Library + +```dart +import 'package:lambe/lambe.dart'; + +// Parse a schema string +final schema = parseJsonSchema(schemaText); + +// Load from a file (throws QueryError on missing/invalid) +final schema2 = loadSchemaFromFile('api.schema.json'); + +// Merge with observed data (throws on disagreement) +final merged = mergeSchemaWithData(schema, shapeOf(data)); + +// Emit a schema from a shape +final schemaText2 = renderJsonSchema(shape); +``` + +## Disagreement semantics + +When schema and data are both present, Lambe merges them: + +- **Both agree on a concrete type.** Use that type. +- **Schema has a field data doesn't.** Use the schema's shape for that field. 
+- **Data has a field schema doesn't.** Use data's shape. +- **Schema marks a field optional, data has it present.** Strip the `optional` wrapper for this run. +- **Concrete-type disagreement** (schema: `number`, data: `string`). Error at load time with a JSON path. + +The error path is designed to be actionable: + +``` +Error: schema disagreement at $.users[*].age: schema says number, data is string +``` + +Merge is the heart of why schemas matter: `--explain` stays honest (what it says is what will happen, because data and schema agree), and validation falls out as a side effect of loading. + +## Round-trip + +```bash +lam --print-shape data.json > data.schema.json # Shape -> JSON Schema +lam --schema data.schema.json '.' data.json # JSON Schema -> Shape +``` + +Round-trip invariant: `parseJsonSchema(renderJsonSchema(shape))` equals `shape` for every shape reachable through `parseJsonSchema`. Pinned by 12 representative cases in `test/schema_renderer_test.dart`. + +Lossy corner: `SOptional` inside a list's `items` or at the top level has no standard JSON Schema spelling in our subset. The renderer flattens those positions. `SOptional` inside an `SMap` field — the common case — round-trips faithfully via `required`. + +## What schemas don't do + +- **No value coercion.** Schema says `age: number`, data has `"30"`. Lambe does not parse the string at query time. The user still writes `.age | to_number`. A future release may add opt-in coercion. +- **No runtime constraints.** Schema saying `age` is `number` does not enforce `age >= 0` or `age <= 150` at query time. Value-level constraints are rejected from the schema at load time. +- **No schema composition.** `$ref` is rejected. For cross-file schemas, merge them yourself before pointing `--schema` at the result. +- **No runtime validation after load.** A CSV column with mixed strings and numbers won't surface at per-row granularity; we check the aggregate shape, not every value. 
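The merge rules listed under "Disagreement semantics" can be sketched as a short recursive function. This is an illustrative Python model, not the `mergeSchemaWithData` implementation: the `merge` function and the tuple-based shapes are hypothetical, and treating an "any" shape on either side as deferring to the other side is an assumption of this sketch. Null-valued optional fields are not handled specially here.

```python
def merge(schema, data, path="$"):
    """Merge a declared shape with an observed one, or raise on disagreement."""
    if schema[0] == "optional":
        # The field is present in the data, so strip the wrapper for this run.
        return merge(schema[1], data, path)
    if schema[0] == "any":
        return data  # assumption: an open schema defers to the data
    if data[0] == "any":
        return schema  # e.g. an empty list whose element shape is unknown
    if schema[0] != data[0]:
        raise ValueError(
            f"schema disagreement at {path}: "
            f"schema says {schema[0]}, data is {data[0]}"
        )
    if schema[0] == "list":
        return ("list", merge(schema[1], data[1], f"{path}[*]"))
    if schema[0] == "map":
        fields = {}
        for name in {**schema[1], **data[1]}:
            if name not in data[1]:
                fields[name] = schema[1][name]  # schema-only field
            elif name not in schema[1]:
                fields[name] = data[1][name]  # data-only field
            else:
                fields[name] = merge(
                    schema[1][name], data[1][name], f"{path}.{name}"
                )
        return ("map", fields)
    return schema  # matching scalar types


schema = ("map", {"users": ("list", ("map", {
    "name": ("string",),
    "age": ("number",),
    "email": ("optional", ("string",)),
}))})
data = ("map", {"users": ("list", ("map", {
    "name": ("string",),
    "age": ("string",),
}))})

try:
    merge(schema, data)
except ValueError as err:
    message = str(err)
    print(message)
```

Carrying the path down through the recursion is what lets the error name the exact field that disagrees, so validation falls out of the same walk that builds the merged shape.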
+ +## `shapeOf` vs schema + +Different tools for different jobs: + +| | `shapeOf(data)` | Schema | +|---|---|---| +| Source of truth | This particular dataset | The contract | +| Sees empty lists as | `list` | Declared element type | +| Handles mixed lists | Collapses to `list` | Declared element type | +| Available when data is absent | No | Yes | +| Sees optionality | No | Via `required` | +| Validates | N/A | Yes (at load time) | + +Use both when you can — `mergeSchemaWithData` is the merge function designed for this. Schema augments; data fills in extras; disagreement errors. diff --git a/doc/syntax.md b/doc/syntax.md index c28196e..37f7fc5 100644 --- a/doc/syntax.md +++ b/doc/syntax.md @@ -2,7 +2,7 @@ The complete Lambë query language. Every feature, with input and output examples. -All examples use this data unless stated otherwise: +All examples use this data unless stated otherwise. Save it as `data.json`: ```json { @@ -20,6 +20,8 @@ All examples use this data unless stated otherwise: } ``` +Examples that don't reference input data use `lam -n` (null input). + ## Data model Lambë operates on JSON-compatible values: maps (objects), lists (arrays), strings, numbers, booleans, and null. @@ -30,196 +32,258 @@ All input formats (YAML, TOML, HCL, CSV, TSV, Markdown) are converted to this mo `.` returns the current value unchanged. -``` -. --> (the entire document) +```bash +$ lam '.' data.json +# (the entire document, pretty-printed) ``` ## Field access `.field` accesses a named field on a map. -``` -.version --> "1.0.0" +```bash +$ lam '.version' data.json +"1.0.0" -.config.database.host --> "localhost" +$ lam '.config.database.host' data.json +"localhost" ``` Accessing a field that doesn't exist returns `null`: -``` -.missing --> null +```bash +$ lam '.missing' data.json +null -.missing.nested --> null +$ lam '.missing.nested' data.json +null ``` ## Indexing `[n]` indexes into a list. Zero-based. Negative indices count from the end. 
-``` -.users[0] --> {"name": "Alice", "age": 25, "active": true} +```bash +$ lam '.users[0]' data.json +{ + "name": "Alice", + "age": 25, + "active": true +} -.users[-1].name --> "Carol" +$ lam '.users[-1].name' data.json +"Carol" -.tags[1] --> "v1" +$ lam '.tags[1]' data.json +"v1" ``` Out-of-bounds returns `null`: -``` -.users[99] --> null +```bash +$ lam '.users[99]' data.json +null ``` ## Slicing `[start:end]` extracts a sub-list. Start is inclusive, end is exclusive. -``` -.tags[0:2] --> ["api", "v1"] +```bash +$ lam '.tags[0:2]' data.json +[ + "api", + "v1" +] -.tags[:2] --> ["api", "v1"] +$ lam '.tags[:2]' data.json +[ + "api", + "v1" +] -.tags[1:] --> ["v1", "stable"] +$ lam '.tags[1:]' data.json +[ + "v1", + "stable" +] -.tags[:-1] --> ["api", "v1"] +$ lam '.tags[:-1]' data.json +[ + "api", + "v1" +] ``` Slicing works on strings too: -``` -.version[0:1] --> "1" +```bash +$ lam '.version[0:1]' data.json +"1" ``` ## Arithmetic `+`, `-`, `*`, `/`, `%` on numbers. -``` -.users[0].age + 10 --> 35 +```bash +$ lam '.users[0].age + 10' data.json +35 -.users[0].age * 2 --> 50 +$ lam '.users[0].age * 2' data.json +50 -.config.database.port % 100 --> 32 +$ lam '.config.database.port % 100' data.json +32.0 ``` Using arithmetic on null throws an error: -``` -.missing + 5 --> Error: +: expected number, got null +```bash +$ lam '.missing + 5' data.json +Error: +: expected number, got null ``` ## Comparison `<`, `>`, `<=`, `>=` compare numbers. `==`, `!=` compare any type with deep equality. 
-``` -.users[0].age > 30 --> false +```bash +$ lam '.users[0].age > 30' data.json +false -.version == "1.0.0" --> true +$ lam '.version == "1.0.0"' data.json +true -.config.debug != true --> true +$ lam '.config.debug != true' data.json +true ``` Comparing null throws (except for `==` and `!=`): -``` -.missing > 5 --> Error: >: expected number, got null +```bash +$ lam '.missing > 5' data.json +Error: >: expected number, got null -.missing == null --> true +$ lam '.missing == null' data.json +true ``` ## Boolean logic `&&`, `||`, `!` with short-circuit evaluation. -``` -.users[0].active && .users[0].age < 30 --> true +```bash +$ lam '.users[0].active && .users[0].age < 30' data.json +true -!.config.debug --> true +$ lam '!.config.debug' data.json +true ``` ## String literals Double-quoted. Supports `\"`, `\\`, `\n`, `\t`. -``` -.users | filter(.name == "Alice") | length --> 1 +```bash +$ lam '.users | filter(.name == "Alice") | length' data.json +1 ``` ## String interpolation `\(expr)` inside a string evaluates the expression and inserts the result. -``` -.users | map("\(.name) is \(.age)") --> ["Alice is 25", "Bob is 35", "Carol is 42"] +```bash +$ lam '.users | map("\(.name) is \(.age)")' data.json +[ + "Alice is 25", + "Bob is 35", + "Carol is 42" +] ``` ## Object construction Build new maps from the current context. `{name}` expands to `{name: .name}`. 
+```bash +$ lam '.users[0] | {name, age}' data.json +{ + "name": "Alice", + "age": 25 +} + +$ lam '.users | map({name, senior: .age > 40})' data.json +[ + { + "name": "Alice", + "senior": false + }, + { + "name": "Bob", + "senior": false + }, + { + "name": "Carol", + "senior": true + } +] ``` -.users[0] | {name, age} --> {"name": "Alice", "age": 25} -.users | map({name, senior: .age > 40}) --> [ - {"name": "Alice", "senior": false}, - {"name": "Bob", "senior": false}, - {"name": "Carol", "senior": true} - ] +Keys that are valid identifiers use the bare form (`name:`); keys that +are not (hyphenated, spaces, leading digits) use a JSON-string literal +in key position. Both spellings produce identical maps. + +```bash +$ lam '{"x-axis": .config.database.port, "y-axis": .users[0].age}' data.json +{ + "x-axis": 5432, + "y-axis": 25 +} + +$ lam '{name: .users[0].name, "Content-Type": "application/json"}' data.json +{ + "name": "Alice", + "Content-Type": "application/json" +} ``` +Interpolation (`"\(expr)"`) is not allowed in key position — build +dynamic keys via `from_entries` on a list of `{key, value}` maps. The +shorthand form `{name}` only applies to bare identifiers; `{"name"}` +on its own is not supported. + ## Conditionals `if condition then value else value`. The condition must evaluate to a boolean. -``` -.users | map(if .age > 40 then "senior" else "junior") --> ["junior", "junior", "senior"] +```bash +$ lam '.users | map(if .age > 40 then "senior" else "junior")' data.json +[ + "junior", + "junior", + "senior" +] ``` ## Pipelines `|` passes the left side's result into the right side's operation. 
-``` -.users | filter(.active) | sort_by(.age) | map(.name) --> ["Alice", "Carol"] +```bash +$ lam '.users | filter(.active) | sort_by(.age) | map(.name)' data.json +[ + "Alice", + "Carol" +] ``` Pipelines bind tighter than binary operators: -``` -.tags | length > 0 --> true +```bash +$ lam '.tags | length > 0' data.json +true ``` This parses as `(.tags | length) > 0`, not `.tags | (length > 0)`. @@ -230,180 +294,271 @@ This parses as `(.tags | length) > 0`, not `.tags | (length > 0)`. Keep elements where the predicate is true. -``` -.users | filter(.age > 30) --> [{"name": "Bob", ...}, {"name": "Carol", ...}] - -.users | filter(.active && .age < 40) --> [{"name": "Alice", "age": 25, "active": true}] +```bash +$ lam '.users | filter(.age > 30)' data.json +[ + { + "name": "Bob", + "age": 35, + "active": false + }, + { + "name": "Carol", + "age": 42, + "active": true + } +] + +$ lam '.users | filter(.active && .age < 40)' data.json +[ + { + "name": "Alice", + "age": 25, + "active": true + } +] ``` ### map(expression) Transform each element. -``` -.users | map(.name) --> ["Alice", "Bob", "Carol"] +```bash +$ lam '.users | map(.name)' data.json +[ + "Alice", + "Bob", + "Carol" +] -.users | map(.age * 2) --> [50, 70, 84] +$ lam '.users | map(.age * 2)' data.json +[ + 50, + 70, + 84 +] ``` ### sort Sort elements by natural order. -``` -.tags | sort --> ["api", "stable", "v1"] +```bash +$ lam '.tags | sort' data.json +[ + "api", + "stable", + "v1" +] ``` ### sort_by(key) Sort elements by a key expression. -``` -.users | sort_by(.age) --> [Alice (25), Bob (35), Carol (42)] - -.users | sort_by(.name) | map(.name) --> ["Alice", "Bob", "Carol"] +```bash +$ lam '.users | sort_by(.age) | map(.name)' data.json +[ + "Alice", + "Bob", + "Carol" +] ``` ### group_by(key) Group elements by a key. Returns `[{key, values}]`. 
-``` -.users | group_by(.active) --> [ - {"key": true, "values": [Alice, Carol]}, - {"key": false, "values": [Bob]} - ] +```bash +$ lam '.users | group_by(.active)' data.json +[ + { + "key": true, + "values": [ + { + "name": "Alice", + "age": 25, + "active": true + }, + { + "name": "Carol", + "age": 42, + "active": true + } + ] + }, + { + "key": false, + "values": [ + { + "name": "Bob", + "age": 35, + "active": false + } + ] + } +] ``` ### unique Remove duplicate values. -``` -[1, 2, 2, 3, 1] | unique --> [1, 2, 3] +```bash +$ lam -n '[1, 2, 2, 3, 1] | unique' +[ + 1, + 2, + 3 +] ``` ### unique_by(key) Remove duplicates by a key expression. -``` -.users | unique_by(.active) | map(.name) --> ["Alice", "Bob"] +```bash +$ lam '.users | unique_by(.active) | map(.name)' data.json +[ + "Alice", + "Bob" +] ``` ### flatten Flatten one level of nesting. -``` -[[1, 2], [3, 4], [5]] | flatten --> [1, 2, 3, 4, 5] +```bash +$ lam -n '[[1, 2], [3, 4], [5]] | flatten' +[ + 1, + 2, + 3, + 4, + 5 +] ``` ### reverse Reverse the order. -``` -.tags | reverse --> ["stable", "v1", "api"] +```bash +$ lam '.tags | reverse' data.json +[ + "stable", + "v1", + "api" +] ``` ### keys Map keys or list indices. -``` -.config | keys --> ["database", "debug"] +```bash +$ lam '.config | keys' data.json +[ + "database", + "debug" +] -.tags | keys --> [0, 1, 2] +$ lam '.tags | keys' data.json +[ + 0, + 1, + 2 +] ``` ### values Map values (identity for lists). -``` -.config.database | values --> ["localhost", 5432] +```bash +$ lam '.config.database | values' data.json +[ + "localhost", + 5432 +] ``` ### length Length of a list, map, or string. -``` -.users | length --> 3 +```bash +$ lam '.users | length' data.json +3 -.version | length --> 5 +$ lam '.version | length' data.json +5 ``` ### first, last First or last element of a list. 
-``` -.users | first | .name --> "Alice" +```bash +$ lam '.users | first | .name' data.json +"Alice" -.tags | last --> "stable" +$ lam '.tags | last' data.json +"stable" ``` ### sum, avg, min, max Aggregate operations on numeric lists. -``` -.users | map(.age) | sum --> 102 +```bash +$ lam '.users | map(.age) | sum' data.json +102 -.users | map(.age) | avg --> 34.0 +$ lam '.users | map(.age) | avg' data.json +34.0 -.users | map(.age) | min --> 25 +$ lam '.users | map(.age) | min' data.json +25 -.users | map(.age) | max --> 42 +$ lam '.users | map(.age) | max' data.json +42 ``` ### has(key) Check if a map contains a key. -``` -.config | has("database") --> true +```bash +$ lam '.config | has("database")' data.json +true -.config | has("missing") --> false +$ lam '.config | has("missing")' data.json +false ``` ### to_entries, from_entries Convert between maps and `[{key, value}]` lists. -``` -.config.database | to_entries --> [{"key": "host", "value": "localhost"}, {"key": "port", "value": 5432}] +```bash +$ lam '.config.database | to_entries' data.json +[ + { + "key": "host", + "value": "localhost" + }, + { + "key": "port", + "value": 5432 + } +] -[{"key": "a", "value": 1}] | from_entries --> {"a": 1} +$ lam -n '[{key: "a", value: 1}] | from_entries' +{ + "a": 1 +} ``` ### to_number @@ -413,12 +568,18 @@ Parse a string as a number. Pass-through for existing numbers. CSV and TSV cells are strings by default; use `to_number` to coerce them before arithmetic. -``` -"42" | to_number -> 42 -"3.14" | to_number -> 3.14 -100 | to_number -> 100 +```bash +$ lam -n '"42" | to_number' +42 + +$ lam -n '"3.14" | to_number' +3.14 + +$ lam -n '100 | to_number' +100 -.price | to_number on {price: "29.99"} -> 29.99 +$ echo '{"price": "29.99"}' | lam '.price | to_number' +29.99 ``` Throws on strings that do not parse, and on inputs that are not strings @@ -431,41 +592,91 @@ Return the runtime type of the input as a string. 
Possible return values: `"null"`, `"boolean"`, `"number"`, `"string"`, `"array"`, `"object"`. -``` -42 | type -> "number" -"hello" | type -> "string" -null | type -> "null" -[1, 2] | type -> "array" -{"a": 1} | type -> "object" +```bash +$ lam -n '42 | type' +"number" + +$ lam -n '"hello" | type' +"string" + +$ lam -n 'null | type' +"null" + +$ lam -n '[1, 2] | type' +"array" + +$ lam -n '{a: 1} | type' +"object" -. | filter((. | type) == "number") on [1, "two", 3] -> [1, 3] +$ lam -n '[1, "two", 3] | filter((. | type) == "number")' +[ + 1, + 3 +] ``` ### filter_values(predicate) Filter a map's values. -``` -.config.database | filter_values(. == "localhost") --> {"host": "localhost"} +```bash +$ lam '.config.database | filter_values(. == "localhost")' data.json +{ + "host": "localhost" +} ``` ### map_values(expression) Transform a map's values. -``` -{"a": 1, "b": 2} | map_values(. * 10) --> {"a": 10, "b": 20} +```bash +$ lam -n '{a: 1, b: 2} | map_values(. * 10)' +{ + "a": 10, + "b": 20 +} ``` ### filter_keys(predicate) Filter a map's keys. +```bash +$ lam '.config | filter_keys(. != "debug")' data.json +{ + "database": { + "host": "localhost", + "port": 5432 + } +} ``` -.config | filter_keys(. != "debug") --> {"database": {"host": "localhost", "port": 5432}} + +### text + +Markdown-only. Walks a node or list of nodes and concatenates every +prose-bearing leaf — `text`, `code`, `code_block`, and `image.alt` — in +document order. Container nodes recurse through their `children`. +`html_block` and `html_inline` are skipped (a deliberate divergence +from mdast: raw HTML in "give me the text" is the same trap that drags +`