Draft
3 changes: 3 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -45,6 +45,9 @@ pnpm-debug.log*
# Local AI agent instructions
.github/copilot-instructions.md

# Local agent state
.agents/state/

# MkDocs build output
site/

48 changes: 24 additions & 24 deletions docs/architecture.md
@@ -129,40 +129,40 @@ Session is intentionally outside RFC 002’s normative emitted contract. It cons

For current package behavior, see [Execution context (Reference)][execution-ref] and [Execution context (Explanation)][execution-expl].

## Current implementation shape
## Package implementation shape

The package currently uses the following implementation shape:
The package uses the following implementation shape:

- author-facing carrier types live in [mod.incn](../src/dataset/mod.incn)
- canonical relational operator helpers live in [ops.incn](../src/dataset/ops.incn)
- Substrait emission lives under [substrait/](../src/substrait/)
- Prism internals live under [prism/](../src/prism/)
- `LazyFrame[T]` currently routes through a backend-native `PrismCursor[T]`
- `DataFrame[T]` and `DataStream[T]` are not yet fully converged on the same internal backing model as `LazyFrame[T]`
- `LazyFrame[T]` routes through a backend-native `PrismCursor[T]`
- `DataFrame[T]` and `DataStream[T]` keep their carrier-specific backing shapes while sharing the public dataset surface

This is enough to explain the package architecture while keeping current API behavior in language docs and follow-on gaps in RFCs, issues, and release notes.
This keeps package architecture in this document while detailed API behavior lives in language docs and future surface expansion stays in RFCs, issues, and release notes.

## Repository layout

| Path | Role |
| -------------------------------- | -------------------------------------------------- |
| `incan.toml` | Package metadata and Rust dependency declarations |
| `src/lib.incn` | Public package exports |
| `src/dataset/mod.incn` | Carrier types and trait surface |
| `src/dataset/ops.incn` | Canonical relational operator helpers |
| `src/prism/mod.incn` | Internal Prism graph, cursor, and lowering logic |
| `src/substrait/relations.incn` | Concrete `Rel` builders and relation lowering |
| `src/substrait/plans.incn` | Top-level `Plan` assembly helpers |
| `src/substrait/inspect.incn` | Relation/plan inspection and output-column helpers |
| `src/substrait/schema_registry.incn` | Named-table schema registration and lookup |
| `src/substrait/extensions.incn` | Extension anchors, URIs, and declaration helpers |
| `src/substrait/expr_lowering.incn` | Builder-to-Substrait expression lowering |
| `src/substrait/conformance.incn` | Typed conformance facade over catalog + validators |
| `src/substrait/schema.incn` | Model/schema to Substrait type bridging |
| `tests/` | Package tests run through `incan test` |
| `docs/language/` | Current package docs |
| `docs/rfcs/` | Normative RFC series |
| `docs/release_notes/` | Release-facing notes |
| Path | Role |
| ------------------------------------ | -------------------------------------------------- |
| `incan.toml` | Package metadata and Rust dependency declarations |
| `src/lib.incn` | Public package exports |
| `src/dataset/mod.incn` | Carrier types and trait surface |
| `src/dataset/ops.incn` | Canonical relational operator helpers |
| `src/prism/mod.incn` | Internal Prism graph, cursor, and lowering logic |
| `src/substrait/relations.incn` | Concrete `Rel` builders and relation lowering |
| `src/substrait/plans.incn` | Top-level `Plan` assembly helpers |
| `src/substrait/inspect.incn` | Relation/plan inspection and output-column helpers |
| `src/substrait/schema_registry.incn` | Named-table schema registration and lookup |
| `src/substrait/extensions.incn` | Extension anchors, URIs, and declaration helpers |
| `src/substrait/expr_lowering.incn` | Builder-to-Substrait expression lowering |
| `src/substrait/conformance.incn` | Typed conformance facade over catalog + validators |
| `src/substrait/schema.incn` | Model/schema to Substrait type bridging |
| `tests/` | Package tests run through `incan test` |
| `docs/language/` | Current package docs |
| `docs/rfcs/` | Normative RFC series |
| `docs/release_notes/` | Release-facing notes |

Normative behavior lives in the RFC series first. Current package behavior and usage belong in the language docs. If code and RFCs disagree, treat that as a bug or transition state to resolve explicitly.

22 changes: 11 additions & 11 deletions docs/language/explanation/dataset_carriers.md
@@ -56,11 +56,11 @@ Use `LazyFrame[T]` when you want to compose operations before execution:

```incan
from pub::inql import LazyFrame
from pub::inql.functions import col, gt, int_lit
from pub::inql.functions import col, gt, lit
from models import Order

def high_value_orders(orders: LazyFrame[Order]) -> LazyFrame[Order]:
    return orders.filter(gt(col("amount"), int_lit(100)))
    return orders.filter(gt(col("amount"), lit(100)))
```

### `DataStream[T]` — streaming
@@ -69,11 +69,11 @@ Use `DataStream[T]` for streaming/unbounded data:

```incan
from pub::inql import DataStream
from pub::inql.functions import col, eq, str_lit
from pub::inql.functions import col, eq, lit
from models import Event

def important_events(events: DataStream[Event]) -> DataStream[Event]:
    return events.filter(eq(col("severity"), str_lit("critical")))
    return events.filter(eq(col("severity"), lit("critical")))
```

`DataStream[T]` shares the same operation API as batch carriers, but signals that its source is unbounded. Static streaming constraints are specified in RFC 001 and enforced as the compiler gains analysis for `UnboundedDataSet[T]`.
@@ -122,9 +122,9 @@ Current relational authoring is explicit and builder-based. That is deliberate:

Today there are three concrete builder families:

- filters: `eq(...)`, `gt(...)`, `int_lit(...)`, `str_lit(...)`
- filters: `eq(...)`, `gt(...)`, `lit(...)`
- aggregates: `col(...)`, `sum(...)`, `count()`
- projections: `with_column(...)`, `add(...)`, `mul(...)`, `int_expr(...)`, `str_expr(...)`, `bool_expr(...)`
- projections: `with_column(...)`, `add(...)`, `mul(...)`, `lit(...)`

### Aggregate helpers

@@ -148,15 +148,15 @@ That is the current semantic target for future sugar such as `.customer_id` or `
Computed columns now have one real entrypoint: `with_column(name, expr)`.

```incan
from pub::inql.functions import add, col, int_expr, mul
from pub::inql.functions import add, col, lit, mul
from pub::inql import LazyFrame
from models import Order

def enrich_orders(orders: LazyFrame[Order]) -> LazyFrame[Order]:
    return (
        orders
        .with_column("amount_x2", mul(col("amount"), int_expr(2)))
        .with_column("amount_plus_one", add(col("amount"), int_expr(1)))
        .with_column("amount_x2", mul(col("amount"), lit(2)))
        .with_column("amount_plus_one", add(col("amount"), lit(1)))
    )
```

@@ -177,14 +177,14 @@ The most useful way to read the current surface is to separate:
This is real current InQL, not aspirational pseudocode:

```incan
from pub::inql.functions import add, col, count, int_expr, sum
from pub::inql.functions import add, col, count, lit, sum
from pub::inql import LazyFrame
from models import Order

def summarize_orders(orders: LazyFrame[Order]) -> LazyFrame[Order]:
    grouped = (
        orders
        .with_column("amount_plus_one", add(col("amount"), int_expr(1)))
        .with_column("amount_plus_one", add(col("amount"), lit(1)))
        .group_by([col("customer_id")])
        .agg([sum(col("amount")), count()])
    )
24 changes: 12 additions & 12 deletions docs/language/explanation/execution_context.md
@@ -52,36 +52,36 @@ This is the boundary where deferred relational work becomes local data in hand.

Some convenience APIs are nicer when they do not force the session parameter through every call site. `lazy.collect()` is one of those cases.

That convenience still needs a real execution context underneath, so it resolves through the active session at call time.
That convenience needs a real execution context underneath, so it resolves through the active session at call time.

- `session.activate()` sets the current active session
- `lazy.collect()` uses that active session

If there is no active session, the convenience API fails clearly instead of pretending execution context can be ambient without definition.
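The active-session behavior described above can be modeled with a short sketch. This is a Python illustration of the general pattern only, not InQL's implementation: the `Session` and `LazyFrame` classes here are hypothetical stand-ins, and `NoActiveSessionError`, `active()`, and `execute()` are names assumed for the example.

```python
class NoActiveSessionError(RuntimeError):
    """Raised when a convenience API runs with no active session."""


class Session:
    # Hypothetical stand-in for an execution context; not InQL's Session.
    _active = None  # the session that convenience APIs resolve to

    def __init__(self, name):
        self.name = name

    def activate(self):
        # Make this session the active one.
        Session._active = self

    @classmethod
    def active(cls):
        # Resolved at call time; fails clearly when nothing is active.
        if cls._active is None:
            raise NoActiveSessionError("no active session; call activate() first")
        return cls._active

    def execute(self, plan):
        # Stand-in for real execution: report which session ran which steps.
        return f"{self.name}: " + " -> ".join(plan)


class LazyFrame:
    # Deferred work, modeled here as a list of step names.
    def __init__(self, plan):
        self.plan = list(plan)

    def collect(self):
        # No session parameter: look up the active session when called.
        return Session.active().execute(self.plan)
```

In this model the lookup happens inside `collect()`, so activating a different session between two calls changes which session runs the second one, and calling `collect()` before any `activate()` raises instead of silently choosing a default.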

## Writing is still Session-owned
## Writing is Session-owned

`session.write_csv(...)` and `session.write_parquet(...)` remain explicit Session methods because writing is not just a carrier concern. It requires binding, execution, and sink ownership.

So the current ergonomic split is:
The ergonomic split is:

- convenience materialization: `lazy.collect()`
- explicit writes: `session.write_csv(...)`, `session.write_parquet(...)`

This is a current package ergonomics choice, not a statement that all future convenience APIs must keep the same shape.
This keeps materialization convenient while leaving sink ownership explicit at the session boundary.

## Typical flow

```incan
from pub::inql import Session
from pub::inql.functions import col, gt, int_expr, int_lit, mul
from pub::inql import LazyFrame, Session
from pub::inql.functions import col, gt, lit, mul
from models import Order

session = Session.default()

orders = session.read_csv[Order]("orders", "orders.csv")?
enriched = orders.with_column("amount_x2", mul(col("amount"), int_expr(2)))
filtered = enriched.filter(gt(col("amount"), int_lit(100))).limit(10)
orders: LazyFrame[Order] = session.read_csv("orders", "orders.csv")?
enriched = orders.with_column("amount_x2", mul(col("amount"), lit(2)))
filtered = enriched.filter(gt(col("amount"), lit(100))).limit(10)

session.activate()
preview = filtered.collect()?
@@ -98,14 +98,14 @@ This pattern is intentionally simple:

For the exact method surface, see [Dataset methods (Reference)](../reference/dataset_methods.md).

## Current limitation
## Materialized carrier shape

`DataFrame[T]` is already the materialized carrier, but its row-level user API is still intentionally narrow. The important current semantic distinction is already in place:
`DataFrame[T]` is the materialized carrier. The important semantic distinction is:

- `LazyFrame[T]` = deferred
- `DataFrame[T]` = local materialized

Today that materialized carrier exposes structured collection metadata first:
The materialized carrier exposes structured collection metadata:

- resolved columns
- row count
19 changes: 10 additions & 9 deletions docs/language/reference/builders/aggregates.md
@@ -1,24 +1,25 @@
# Aggregate builders (Reference)

Current aggregate authoring is explicit and builder-based.
Current aggregate authoring is explicit and scalar-expression-based.

## Functions

| Builder | Signature | Meaning |
| ------- | ----------------------------------------------- | ---------------------------------------------------------------------- |
| `col` | `def col(name: str) -> ColumnExpr` | Column reference builder used by aggregates, filters, and projections. |
| `sum` | `def sum(expr: ColumnExpr) -> AggregateMeasure` | Sum one selected numeric column. |
| `count` | `def count() -> AggregateMeasure` | Count rows in the current relation or group. |
| Builder | Signature | Meaning |
| ------- | ----------------------------------------------------------- | ---------------------------------------------------------------------- |
| `col` | `def col(name: str) -> ColumnExpr` | Column reference builder used by aggregates, filters, and projections. |
| `lit` | `def lit(value: int \| float \| str \| bool) -> ColumnExpr` | Canonical scalar literal helper. |
| `sum` | `def sum(expr: ColumnExpr) -> AggregateMeasure` | Sum one scalar expression. |
| `count` | `def count() -> AggregateMeasure` | Count rows in the current relation or group. |

## Example

```incan
from pub::inql.functions import col, count, sum
from pub::inql.functions import add, col, count, lit, sum

grouped = orders.group_by([col("customer_id")]).agg([sum(col("amount")), count()])
grouped = orders.group_by([col("customer_id")]).agg([sum(add(col("amount"), lit(5))), count()])
```

## Notes

- The current package slice requires explicit `col(...)` builders.
- Aggregate inputs use the same scalar-expression model as filters, projections, and grouping keys.
- Future `.column` sugar and scoped aggregate symbols should lower to this same surface rather than replacing its semantics.
32 changes: 17 additions & 15 deletions docs/language/reference/builders/filters.md
@@ -1,32 +1,34 @@
# Filter builders (Reference)

Current filter authoring is explicit and builder-based.
Current filter authoring uses the shared scalar-expression builder model.

## Functions

| Builder | Signature | Meaning |
| -------------- | ----------------------------------------------------------------------- | ------------------------------------------------------ |
| `always_true` | `def always_true() -> FilterPredicate` | Trivial predicate; canonical rewrite can eliminate it. |
| `always_false` | `def always_false() -> FilterPredicate` | Predicate that rejects every row. |
| `eq` | `def eq(column: ColumnExpr, literal: FilterLiteral) -> FilterPredicate` | Equality predicate. |
| `gt` | `def gt(column: ColumnExpr, literal: FilterLiteral) -> FilterPredicate` | Greater-than predicate. |
| `int_lit` | `def int_lit(value: int) -> FilterLiteral` | Integer literal for filter predicates. |
| `str_lit` | `def str_lit(value: str) -> FilterLiteral` | String literal for filter predicates. |
| `bool_lit` | `def bool_lit(value: bool) -> FilterLiteral` | Boolean literal for filter predicates. |
| Builder | Signature | Meaning |
| -------------- | ----------------------------------------------------------- | ---------------------------------------------------------------------- |
| `always_true` | `def always_true() -> ColumnExpr` | Trivial boolean scalar expression; canonical rewrite can eliminate it. |
| `always_false` | `def always_false() -> ColumnExpr` | Boolean scalar expression that rejects every row. |
| `eq` | `def eq(left: ColumnExpr, right: ColumnExpr) -> ColumnExpr` | Equality predicate scalar expression. |
| `gt` | `def gt(left: ColumnExpr, right: ColumnExpr) -> ColumnExpr` | Greater-than predicate scalar expression. |
| `lit` | `def lit(value: int \| float \| str \| bool) -> ColumnExpr` | Canonical scalar literal helper. |
| `int_lit` | `def int_lit(value: int) -> ColumnExpr` | Typed integer literal helper. |
| `str_lit` | `def str_lit(value: str) -> ColumnExpr` | Typed string literal helper. |
| `bool_lit` | `def bool_lit(value: bool) -> ColumnExpr` | Typed boolean literal helper. |

## Example

```incan
from pub::inql.functions import col, eq, gt, int_lit, str_lit
from pub::inql.functions import col, eq, gt, lit

filtered = (
    orders
    .filter(gt(col("amount"), int_lit(100)))
    .filter(eq(col("status"), str_lit("open")))
    .filter(gt(col("amount"), lit(100)))
    .filter(eq(col("status"), lit("open")))
)
```

## Notes

- Filter predicates currently operate on one explicit column builder plus one explicit literal.
- Rich boolean composition is follow-up work.
- Filter predicates are scalar expressions, not a separate predicate-only builder hierarchy.
- The typed `*_lit(...)` helpers construct the same scalar-literal representation as `lit(...)`.
- Boolean composition belongs to the broader scalar-function surface.
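
The notes above say that predicates are ordinary scalar expressions and that the typed helpers build the same literal representation as `lit(...)`. A minimal Python model of that idea follows. It is a sketch under assumptions, not the package's code: the `kind`/`args` node layout is invented for illustration, and only the builder names (`col`, `lit`, `int_lit`, `str_lit`, `gt`) come from the reference table.

```python
from dataclasses import dataclass
from typing import Union

Scalar = Union[int, float, str, bool]


@dataclass(frozen=True)
class ColumnExpr:
    # Hypothetical node shape: a kind tag plus its arguments.
    kind: str
    args: tuple


def col(name: str) -> ColumnExpr:
    # Named column reference.
    return ColumnExpr("col", (name,))


def lit(value: Scalar) -> ColumnExpr:
    # Canonical literal: record the value together with its type name.
    return ColumnExpr("lit", (type(value).__name__, value))


def int_lit(value: int) -> ColumnExpr:
    # Typed helper: delegates to lit, so the representation matches.
    return lit(int(value))


def str_lit(value: str) -> ColumnExpr:
    # Typed helper: delegates to lit, so the representation matches.
    return lit(str(value))


def gt(left: ColumnExpr, right: ColumnExpr) -> ColumnExpr:
    # A predicate is just another scalar expression node.
    return ColumnExpr("gt", (left, right))
```

Because every builder returns the same node type in this model, `gt(col("amount"), int_lit(100))` and `gt(col("amount"), lit(100))` compare equal, and a predicate can appear anywhere a scalar expression is accepted.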
19 changes: 11 additions & 8 deletions docs/language/reference/builders/projections.md
@@ -1,18 +1,21 @@
# Projection builders (Reference)

Projection builders are the current semantic target for computed columns.
Projection builders are the current semantic target for scalar expressions in computed columns and other row-level positions.

## Functions

| Builder | Signature | Meaning |
| ------------ | ------------------------------------------------------------ | --------------------------- |
| `col` | `def col(name: str) -> ColumnExpr` | Named column reference. |
| `lit` | `def lit(value: int \| float \| str \| bool) -> ColumnExpr` | Canonical scalar literal. |
| `int_expr` | `def int_expr(value: int) -> ColumnExpr` | Integer literal expression. |
| `float_expr` | `def float_expr(value: float) -> ColumnExpr` | Float literal expression. |
| `str_expr` | `def str_expr(value: str) -> ColumnExpr` | String literal expression. |
| `bool_expr` | `def bool_expr(value: bool) -> ColumnExpr` | Boolean literal expression. |
| `add` | `def add(left: ColumnExpr, right: ColumnExpr) -> ColumnExpr` | Binary addition. |
| `mul` | `def mul(left: ColumnExpr, right: ColumnExpr) -> ColumnExpr` | Binary multiplication. |
| `eq` | `def eq(left: ColumnExpr, right: ColumnExpr) -> ColumnExpr` | Equality predicate. |
| `gt` | `def gt(left: ColumnExpr, right: ColumnExpr) -> ColumnExpr` | Greater-than predicate. |

## Dataset entrypoint

@@ -26,17 +29,17 @@ def with_column(self, name: str, expr: ColumnExpr) -> Self
## Example

```incan
from pub::inql.functions import add, col, int_expr, mul
from pub::inql.functions import add, col, lit, mul

projected = (
    orders
    .with_column("amount_x2", mul(col("amount"), int_expr(2)))
    .with_column("amount_plus_one", add(col("amount"), int_expr(1)))
    .with_column("amount_x2", mul(col("amount"), lit(2)))
    .with_column("amount_plus_one", add(col("amount"), lit(1)))
)
```

## Current limits
## Capability notes

- No argument-bearing `select(...)` yet.
- No query-block projection sugar yet.
- No alias-free symbolic surface like `.amount * 2` yet.
- `with_column(...)` is the explicit computed-column entrypoint.
- Projection-list selection, query-block projection sugar, and alias-free symbolic surfaces lower to this scalar-expression model when exposed.
- The typed literal helpers construct the same scalar-literal representation as `lit(...)`.