- Status: In Progress
- Created: 2026-03-23
- Author(s): Danny Meijer
- Related:
  - InQL RFC 000 (language specification: naming, schema shapes, compilation model)
  - InQL RFC 001 (dataset types: DataSet[T] carriers and schema parameter)
- Issue: InQL #3
- RFC PR: -
- Written against: Incan v0.2
- Shipped in: -
This RFC defines Apache Substrait as the normative logical interchange for InQL relational plans: which Rel and expression shapes implementations produce, how read roots remain backend-agnostic while environment binding (adapters, credentials, runner choice) stays outside InQL, and how extensions cover capabilities that lack a stable logical Rel in core Substrait. The query {} surface requires lowering to Substrait; this RFC owns the cross-surface contract so method-chain APIs (InQL RFC 001), query {} blocks, and optional pipe-forward do not diverge at emission time.
- A checked InQL relational tree must be expressible as a Substrait Plan whose executable root is a Rel tree, optionally a DAG via ReferenceRel when subplans are shared.
- Logical reads are ReadRel (or extension leaf relations) carrying names, virtual rows, or extension payloads instead of host-specific connection strings or secrets in the normative interchange.
- Scalar and aggregate computation uses Substrait expressions and aggregate functions; functions outside the pinned core set must use registered extension URIs documented with the compiler.
- North-star operator catalog: InQL capabilities map to logical Rel kinds as specified in the Substrait operator catalog reference; MVP subsets are implementation choices but must not contradict this RFC for operators they expose.
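The first guarantee above (a Rel tree that may become a DAG through ReferenceRel) can be illustrated with a minimal sketch. The Python classes below are hypothetical stand-ins, not the Substrait protobuf messages or any InQL API; the only detail borrowed from Substrait is that a reference points at a shared subtree by ordinal. Treating the last entry of `relations` as the root is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Read:
    names: Tuple[str, ...]  # logical relation name only, no connection string


@dataclass(frozen=True)
class Filter:
    input: "Node"
    predicate: str  # stand-in for a Substrait expression


@dataclass(frozen=True)
class Cross:
    left: "Node"
    right: "Node"


@dataclass(frozen=True)
class Reference:
    subtree_ordinal: int  # index into Plan.relations


Node = Union[Read, Filter, Cross, Reference]


@dataclass(frozen=True)
class Plan:
    relations: Tuple[Node, ...]  # assumption: the last entry is the root


def expand(plan: Plan, node: Node) -> Node:
    """Inline every Reference so the DAG becomes an equivalent plain tree."""
    if isinstance(node, Reference):
        return expand(plan, plan.relations[node.subtree_ordinal])
    if isinstance(node, Filter):
        return Filter(expand(plan, node.input), node.predicate)
    if isinstance(node, Cross):
        return Cross(expand(plan, node.left), expand(plan, node.right))
    return node


# The filtered read is stored once and referenced twice by ordinal.
shared = Filter(Read(("orders",)), "amount > 100")
plan = Plan(relations=(shared, Cross(Reference(0), Reference(0))))
tree = expand(plan, plan.relations[-1])
```

Expanding the references yields the same tree a producer would have emitted without sharing, which is the sense in which the DAG form is optional.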
Without a dedicated specification, Substrait lowering risks drifting between front-ends (query {}, APIs on DataSet[T]) and emitters, and risks smuggling execution concerns (storage URIs, credentials, engine choice) into the query IR. Substrait is the ecosystem's portable relational algebra serialization; InQL needs a single Rel-level contract, version pinning rules, and an explicit boundary between plan semantics and operational binding.
- Require that conforming implementations emit Substrait for relational features they claim to support, using logical Rel nodes unless a documented extension applies.
- Publish a versioned mapping catalog from InQL plan concepts to Substrait logical relations and expression patterns, marking core spec, extension, or documented expansion / gap.
- Specify read roots: logical ReadRel shapes in InQL vs adapter resolution in the host execution environment.
- Require documented pinning of Substrait revision and of any bundled extension function sets shipped with the toolchain.
- List known gaps (unnest, pivot, advanced joins, streaming-specific semantics) without blocking InQL RFC 003.
- Defining orchestration, workflow, or adapter authoring syntax — out of scope; only binding boundaries relative to InQL plans are stated here.
- Mandating a default Substrait consumer (specific engine or library) — implementation detail; InQL RFC 004 names the reference backend.
- Physical Substrait relations as a normative InQL output — consumers may use them; InQL may emit them when documented as a non-portable or target-specific mode.
- ANSI SQL completeness — mapping is capability-based, not a SQL compliance checklist.
The v1 implementation profile for this RFC is explicitly scoped to InQL package code (.incn) and is the contract for current delivery tracking.
- Core read/query Rel coverage is implemented through a thin proto-backed Substrait boundary in InQL package code.
- Optional mutation relations remain modeled but are not required to be executable in v1.
- Gap and extension semantics are represented as typed contracts in package code and conformance scenarios, rather than ad hoc string payloads.
- Richer planning semantics remain outside this profile when they logically belong to future query {} lowering or Prism.
This profile is reflected by:
- src/substrait/schema.incn
- src/substrait/plan.incn
- src/substrait/conformance.incn
- src/substrait/conformance_catalog.incn
- src/substrait/conformance_validate.incn
- docs/language/reference/substrait/conformance.md
| Area | Current status |
|---|---|
| Read roots, filter, cross, sort, fetch | Implemented at the proto-backed Substrait boundary |
| Join and set operation selection | Implemented at the boundary through explicit package-level enums/helpers |
| Reference rel | Implemented at the boundary for ordinal preservation only |
| Project and aggregate | Present as boundary-shape scaffolds; richer expression/grouping semantics remain deferred |
| Extension URI policy and explode gap encoding | Implemented through a registered package-level URI and documented gap policy |
| query {} lowering parity | Deferred to RFC 003 and Prism-backed lowering |
| Optional mutation profile | Deferred; not required for the v0.1 read/query analytical core |
Authors build DataSet[T] values (InQL RFC 001) using query {} or relational method chains. After typechecking, the relational work becomes a Substrait plan: mostly FilterRel, ProjectRel, JoinRel, AggregateRel, and so on, rooted in a ReadRel when new data enters the plan.
When a plan says "read this named relation" or "read this logical asset id," the plan carries the logical identity. The execution context resolves that identity to concrete storage, applies policy, and supplies credentials. That split keeps InQL portable and keeps governance-sensitive details out of the serialized plan's normative story.
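A minimal sketch of that split, assuming hypothetical names throughout (`LogicalRead`, `PhysicalBinding`, `ExecutionContext` are illustrations, not InQL or Substrait APIs): the plan side holds only a logical name, and each execution context owns its own mapping to storage and credentials.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class LogicalRead:
    names: Tuple[str, ...]  # the only identity the serialized plan carries


@dataclass(frozen=True)
class PhysicalBinding:
    uri: str             # concrete storage location, known to the host only
    credential_ref: str  # a reference to a credential, never the secret itself


class ExecutionContext:
    """Hypothetical host-side resolver; lives outside InQL."""

    def __init__(self, bindings: Dict[Tuple[str, ...], PhysicalBinding]):
        self._bindings = dict(bindings)

    def resolve(self, read: LogicalRead) -> PhysicalBinding:
        try:
            return self._bindings[read.names]
        except KeyError:
            dotted = ".".join(read.names)
            raise LookupError(f"no binding for logical relation {dotted}") from None


read = LogicalRead(("sales", "orders"))
dev = ExecutionContext({("sales", "orders"): PhysicalBinding("file:///tmp/orders.parquet", "none")})
prod = ExecutionContext({("sales", "orders"): PhysicalBinding("s3://warehouse/sales/orders/", "vault:sales-reader")})

dev_binding = dev.resolve(read)
prod_binding = prod.resolve(read)
```

The same `read` value resolves differently in `dev` and `prod` without any change to the plan, which is the portability property the paragraph above describes.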
- Implementations must be able to produce a Substrait Plan for every relational operator they expose that is claimed portable in documentation.
- Lowering semantics must be identical whether the surface is query {}, trait methods, or desugared pipe-forward, for the same checked tree.
- Implementations may additionally lower to InQL RFC 001 operations for execution; if both paths exist, they must match the Substrait semantics for those operators.
For the full capability → Rel mapping, profile classifications, and gap encoding requirements, see the Substrait operator catalog reference.
Conformance scenarios should use stable scenario IDs and typed InQL model contracts as defined by the Substrait conformance corpus reference, so implementation and CI reporting can track portability status consistently across toolchains.
The following are the primary logical relations InQL targets. Exact protobuf message paths follow the pinned Substrait version selected by the toolchain for a given release.
| Substrait Rel | Role |
|---|---|
| ReadRel | Scans: named table, local files, virtual rows, extension-defined sources |
| FilterRel | Row filter |
| ProjectRel | Derived columns; window expressions appear here per Substrait |
| JoinRel | Joins including semi, anti, single, and mark variants; optional post_join_filter |
| CrossRel | Cartesian product |
| AggregateRel | Grouping sets, measures, FILTER on measures; distinct via keys-only aggregate |
| SortRel | Sort |
| FetchRel | Limit / offset |
| SetRel | Union, intersect, except variants |
| ReferenceRel | Shared subplans within a Plan |
| WriteRel | DML / CTAS (optional mutation profile) |
| UpdateRel | Table update without a full child Rel input (optional profile) |
| DdlRel | DDL (optional profile) |
| ExtensionSingleRel / ExtensionMultiRel / ExtensionLeafRel | Extension escape hatches |
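The AggregateRel row notes that distinct is expressed as a keys-only aggregate. A small sketch of why that works, evaluated over plain Python rows rather than a Substrait consumer (the `aggregate` and `distinct` helpers are illustrative assumptions, not part of any InQL or Substrait API): grouping by the selected columns with no measures leaves exactly one output row per distinct key.

```python
from typing import Callable, Dict, List, Sequence

Row = Dict[str, object]
Measure = Callable[[List[Row]], object]


def aggregate(rows: List[Row], keys: Sequence[str], measures: Dict[str, Measure]) -> List[Row]:
    """Group rows by `keys` and evaluate each measure once per group."""
    groups: Dict[tuple, List[Row]] = {}
    for row in rows:
        groups.setdefault(tuple(row[k] for k in keys), []).append(row)
    result: List[Row] = []
    for key_values, members in groups.items():
        out: Row = dict(zip(keys, key_values))
        for name, fn in measures.items():
            out[name] = fn(members)
        result.append(out)
    return result


def distinct(rows: List[Row], columns: Sequence[str]) -> List[Row]:
    # Keys-only aggregate: grouping keys, zero measures.
    return aggregate(rows, columns, measures={})


rows = [
    {"region": "eu", "sku": "a"},
    {"region": "eu", "sku": "a"},
    {"region": "us", "sku": "a"},
]
unique = distinct(rows, ["region", "sku"])
counts = aggregate(rows, ["region"], {"n": len})
```

Here `unique` holds two rows (one per distinct region/sku pair) and `counts` shows the ordinary grouped form with a single measure.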
InQL defines a north-star operator catalog that maps every InQL plan capability to a Substrait Rel or expression pattern, and classifies each capability as one of: core (maps to a standard logical Rel in the pinned revision), extension (requires a registered extension URI), gap (no stable logical Rel in core Substrait; encoding must be documented), or optional-mutation (not required for read/query analytical core).
The full, versioned catalog — including all profile tags, gap encoding requirements, and mutation-profile operators — lives in the Substrait operator catalog reference. Conforming implementations must follow the mappings listed there for any capability they claim portable.
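As a rough sketch of what a typed catalog entry could look like, the following models the four classifications as an enum and checks that extension and gap entries point at documented encodings. The entry fields, the sample rows, and the `validate` helper are assumptions made for illustration; the authoritative mappings are the ones in the operator catalog reference.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class Profile(Enum):
    CORE = "core"
    EXTENSION = "extension"
    GAP = "gap"
    OPTIONAL_MUTATION = "optional-mutation"


@dataclass(frozen=True)
class CatalogEntry:
    capability: str
    profile: Profile
    target: str                         # Substrait Rel or expression pattern
    encoding_doc: Optional[str] = None  # where the non-core encoding is documented


# Illustrative rows only; not the published catalog.
CATALOG: Dict[str, CatalogEntry] = {
    entry.capability: entry
    for entry in [
        CatalogEntry("filter", Profile.CORE, "FilterRel"),
        CatalogEntry("limit", Profile.CORE, "FetchRel"),
        CatalogEntry("explode", Profile.GAP, "ExtensionSingleRel", encoding_doc="operator-catalog#explode"),
        CatalogEntry("insert", Profile.OPTIONAL_MUTATION, "WriteRel"),
    ]
}


def validate(entry: CatalogEntry) -> List[str]:
    """Return human-readable problems; an empty list means the entry is acceptable."""
    problems: List[str] = []
    if entry.profile in (Profile.EXTENSION, Profile.GAP) and not entry.encoding_doc:
        problems.append(f"{entry.capability}: {entry.profile.value} entries need a documented encoding")
    return problems


all_problems = [p for e in CATALOG.values() for p in validate(e)]
undocumented = validate(CatalogEntry("pivot", Profile.GAP, "ExtensionSingleRel"))
```

Keeping the classification as a closed enum rather than a free-form string makes it harder for an undocumented encoding to be reported as portable.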
- InQL must express new data entering a plan as logical reads: names, virtual values, or opaque extension table types that still serialize as Substrait ReadRel (or an extension leaf) without normative dependence on secret material in the plan text.
- The execution context must resolve logical reads to physical resources through its adapter and execution layer; that layer must not redefine relational semantics of the plan.
- Product SDKs may present a unified import surface; adapter-specific "open connection" APIs should not be specified as core InQL — they remain thin wrappers at most.
For the full ReadRel variant reference, the detailed execution context obligations, and the adapter boundary contract, see the read-root and binding contract reference.
- Functions not in the pinned core Substrait bundle must use extension URIs registered in the compiler's public catalog for that toolchain version.
- AdvancedExtension fields may carry hints; normative semantics must be expressible without relying on hints.
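The registration rule can be sketched as a lookup that refuses to emit a function that is neither in the pinned core set nor listed under a registered extension URI. Everything named here is a placeholder: the function names, the `example.invalid` URI, and the `"core"` marker are assumptions, and the sketch does not model how the core bundle itself is referenced in a real plan.

```python
from typing import Dict, FrozenSet, Tuple

# Placeholder for the function names in the pinned core bundle.
CORE_FUNCTIONS: FrozenSet[str] = frozenset({"add", "equal", "sum", "count"})

# Placeholder for the compiler's public extension catalog: URI -> function names.
EXTENSION_CATALOG: Dict[str, FrozenSet[str]] = {
    "https://example.invalid/inql/extensions/text.yaml": frozenset({"slugify"}),
}


def resolve_function(name: str) -> Tuple[str, str]:
    """Return (origin, name), where origin is "core" or a registered extension URI."""
    if name in CORE_FUNCTIONS:
        return ("core", name)
    for uri, names in EXTENSION_CATALOG.items():
        if name in names:
            return (uri, name)
    raise LookupError(f"function {name!r} is not in the core bundle or any registered extension")


core_hit = resolve_function("sum")
extension_hit = resolve_function("slugify")
```

An emitter built this way fails at lowering time for unregistered functions instead of producing a plan that only one consumer can interpret.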
For revision pin requirements, URI registration policy, bundle naming, compatibility conventions, and the release-note checklist for pin bumps, see the revision and extension policy reference.
- InQL may expose WriteRel, DdlRel, or UpdateRel for warehouse-style mutation. Absence of these in a given distribution does not make InQL incomplete for read-only analytical use.
The optional mutation profile operators, per-operator portability notes, and support expectations are listed in the Substrait operator catalog reference.
The following reference documents expand on the operational detail that is too long-lived and versioned to remain inside this RFC text:
| Document | What it covers |
|---|---|
| Substrait operator catalog | Full InQL capability → Rel mapping; profile tags; gap encoding rules; mutation profile operators |
| Substrait revision and extension policy | Revision pin requirements; extension URI registration policy; compatibility conventions; release-note checklist |
| Substrait read-root and binding contract | ReadRel variant reference; execution context obligations; adapter boundary |
| Substrait conformance corpus | Canonical corpus structure, scenario metadata schema, profile taxonomy, and stable scenario ID conventions |
- Field references and types must align with model-backed schemas (InQL RFC 001) and lower to Substrait types and field indices consistent with the emitted NamedStruct.
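A minimal sketch of that alignment for a flat schema, using hypothetical Python stand-ins (`NamedStruct`, `FieldRef`, `resolve_field`) and placeholder type names: a field name from the model resolves to its ordinal position, and the type travels with the index so the two cannot drift apart. Nested structs, which need a more careful name-to-index mapping, are out of scope for this sketch.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class NamedStruct:
    names: Tuple[str, ...]  # field names in declaration order
    types: Tuple[str, ...]  # one type label per name (placeholders here)


@dataclass(frozen=True)
class FieldRef:
    index: int  # ordinal position used by the emitted field reference
    type: str


def resolve_field(schema: NamedStruct, name: str) -> FieldRef:
    """Map a model field name to its ordinal and type, or fail loudly."""
    if name not in schema.names:
        raise KeyError(f"field {name!r} is not declared in the model schema")
    index = schema.names.index(name)
    return FieldRef(index=index, type=schema.types[index])


order_schema = NamedStruct(
    names=("order_id", "customer_id", "amount"),
    types=("i64", "i64", "fp64"),
)
amount_ref = resolve_field(order_schema, "amount")
```

Resolving through the schema on every reference means a reordered or renamed model field shows up as a lowering error rather than as a silently shifted index.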
- Additive mapping catalog changes should be the default; breaking emitter changes must ship with release notes and, when user-visible, an RFC amendment.
- SQL strings only as interchange — rejected (weak structure for optimizers and cross-language tools).
- Custom proprietary IR only — rejected (ecosystem and long-term coupling).
- Substrait optional for "portable" builds — rejected; optional Substrait may exist only for explicitly non-portable or closed backends if documented as such.
- Substrait lags some front-end expressiveness; extensions and rewrites add maintenance.
- Dual lowering (InQL RFC 001 APIs + Substrait) increases test surface unless one path is canonical in practice.
- Producer / consumer version skew requires disciplined pinning and clear compatibility statements.
Non-normative: toolchains should maintain golden Substrait plans or equivalent fixture tests for representative API-lowered trees, and later add query {} fixtures once that surface lowers through the same boundary.
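One way such a golden check could look, as a sketch under stated assumptions: plain dictionaries stand in for a serialized plan, canonical JSON stands in for whatever stable serialization the toolchain uses, and `canonical` / `check_golden` are hypothetical helpers rather than part of any InQL test harness.

```python
import json
import tempfile
from pathlib import Path


def canonical(plan: dict) -> str:
    """Serialize with sorted keys so equal plans always produce equal text."""
    return json.dumps(plan, sort_keys=True, indent=2) + "\n"


def check_golden(plan: dict, golden_path: Path, update: bool = False) -> bool:
    """Compare a plan against its golden file; rewrite the file only when asked."""
    text = canonical(plan)
    if update:
        golden_path.write_text(text, encoding="utf-8")
        return True
    if not golden_path.exists():
        return False
    return golden_path.read_text(encoding="utf-8") == text


with tempfile.TemporaryDirectory() as tmp:
    golden = Path(tmp) / "filter_orders.json"
    plan = {"filter": {"input": {"read": {"names": ["sales", "orders"]}}, "condition": "amount > 100"}}

    missing = check_golden(plan, golden)                 # no golden file yet
    check_golden(plan, golden, update=True)              # record the golden file
    same = check_golden(plan, golden)                    # unchanged emitter output
    drifted = check_golden({**plan, "fetch": {"count": 10}}, golden)  # emitter changed
```

Requiring an explicit `update=True` keeps a missing or drifted fixture from being silently regenerated, which matters most when the Substrait pin is bumped.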
- IR / lowering to Substrait and extension registration.
- Conformance / testing artifacts for serialized plans.
- Published operator catalog and release notes for Substrait pin bumps.
- Lock down the versioned Substrait revision pinning policy in compiler documentation and release artifacts.
- Publish the normative operator catalog mapping InQL capabilities to Substrait Rel kinds, including gap annotations for unnest, pivot, and streaming semantics.
- Document extension URI registration conventions in the public toolchain catalog.
- Lower current package-authored DataSet[T] method-chain relational trees to Substrait Plan / Rel nodes covering: ReadRel, FilterRel, ProjectRel, JoinRel, CrossRel, AggregateRel, SortRel, FetchRel, SetRel, and ReferenceRel.
- Keep the current package layer thin: relation-shape ownership stays here, while richer planning semantics remain candidates for Prism.
- Align field references and types with model-backed schemas (RFC 001) where the current package code materially owns that boundary.
- Implement extension URI registration for functions outside the pinned core Substrait bundle.
- Implement logical ReadRel emission for named tables, virtual rows, and extension-defined sources without normative secret material in the plan text.
- Implement documented extension encoding for unnest / explode (gap handling).
- Implement WriteRel, DdlRel, and UpdateRel emission under the optional mutation profile.
- Add golden Substrait plan fixtures for representative API-lowered trees.
- Add query {} fixtures later when that surface lowers through the same boundary.
- Verify fixture round-trips when Substrait revision is bumped.
- Update docs-site pages and operator catalog for public release.
- Substrait revision pinning policy documented in release artifacts and compiler docs.
- Normative operator catalog published (including gap annotations).
- Extension URI registration conventions documented.
- ReadRel emission: named table, virtual rows, extension sources.
- FilterRel emission.
- ProjectRel boundary scaffold emission.
- JoinRel emission (semi, anti, single, mark variants).
- CrossRel emission.
- AggregateRel boundary scaffold emission.
- SortRel emission.
- FetchRel emission (limit / offset).
- SetRel emission (union / intersect / except).
- ReferenceRel ordinal emission at the Substrait boundary.
- Lowering is identical across query {}, method chains, and desugared pipe-forward for the same checked tree once RFC 003 and Prism-backed lowering land.
- Field references align with RFC 001 model-backed schemas and NamedStruct indices.
- Extension URI registration for non-core functions wired in toolchain catalog.
- Logical ReadRel carries no normative secret material (binding left to execution context).
- Documented extension encoding for unnest / explode gap.
- WriteRel emission (optional profile).
- DdlRel emission (optional profile).
- UpdateRel emission (optional profile).
- Golden Substrait plan fixtures for representative API-lowered (DataSet[T]) trees.
- Golden Substrait plan fixtures for representative query {} trees once that surface lowers through the same boundary.
- Fixture round-trip tests on Substrait revision bump.
- Tests confirm no secret material in emitted ReadRel plans.
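A test for the last item could start from a heuristic scan like the sketch below. It assumes the emitted plan is available as nested dictionaries and lists (for example from a JSON rendering); the key patterns, the URI check, and the sample plan shapes are illustrative guesses, not a complete or official definition of secret material.

```python
import re
from typing import Any, Iterator

# Heuristics only: key names that suggest credentials, and URIs with user:password@ userinfo.
SUSPECT_KEYS = re.compile(r"(password|secret|token|api[_-]?key|credential)", re.IGNORECASE)
URI_WITH_USERINFO = re.compile(r"^[a-z][a-z0-9+.-]*://[^/\s:@]+:[^/\s@]+@", re.IGNORECASE)


def find_secret_material(node: Any, path: str = "$") -> Iterator[str]:
    """Yield the paths of values that look like secrets inside a plan structure."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if SUSPECT_KEYS.search(str(key)):
                yield child
            yield from find_secret_material(value, child)
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from find_secret_material(item, f"{path}[{i}]")
    elif isinstance(node, str) and URI_WITH_USERINFO.search(node):
        yield path


clean_plan = {"read": {"namedTable": {"names": ["sales", "orders"]}}}
leaky_plan = {
    "read": {
        "localFiles": {"items": [{"uriFile": "s3://user:hunter2@bucket/orders.parquet"}]},
        "options": {"password": "x"},
    }
}

clean_hits = list(find_secret_material(clean_plan))
leaky_hits = list(find_secret_material(leaky_plan))
```

A scan like this only catches obvious leaks; it complements, rather than replaces, keeping binding details out of the emitter's inputs in the first place.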
- Operator catalog page updated in docs-site.
- Release notes entry added.
- Substrait revision pinning: this RFC defines the pinning policy, not one timeless revision number. Each conforming InQL toolchain release must publish the exact Substrait revision it targets and any bundled extension sets in public release artifacts and compiler documentation.
- Canonical unnest / explode encoding: until core Substrait standardizes a portable unnest relation that InQL adopts, EXPLODE-style behavior must lower through a documented extension relation or another documented non-core encoding listed in the toolchain's public operator catalog. Implementations must not present ad hoc or undocumented encodings as portable core behavior.
- Mutation relations: WriteRel, DdlRel, and UpdateRel remain an optional mutation profile. They are not part of the minimum read/query analytical core required for InQL v0.1, and implementations may expose them only when the execution context and backend support them.
- Correlated subqueries: InQL v0.1 does not standardize a single correlated-subquery desugaring because correlated subquery surface syntax is not part of the minimum relational grammar. If a future RFC adds correlated subqueries, that RFC must define the lowering contract explicitly rather than relying on implicit emitter policy.