feat: Subqueries in SELECT for hierarchical data (includes)#1294

Open
kevin-dp wants to merge 19 commits into main from kevin/includes

Conversation

@kevin-dp
Contributor

Summary

  • Adds support for subqueries inside .select() that produce hierarchical results — each parent row gets a child Collection (e.g., projects with nested issues, issues with nested comments)
  • Child queries are inner-joined with the parent pipeline so only children matching filtered parents flow through
  • Supports ORDER BY and LIMIT/OFFSET on child queries (uses grouped ORDER BY so limits are per-parent)
  • Nested includes work recursively (projects → issues → comments)
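
The per-parent semantics described above can be illustrated without the library. The following is a minimal sketch in plain TypeScript, not the PR's implementation: `includeChildren`, `ProjectRow`, and `IssueRow` are hypothetical names, and plain arrays stand in for the live child Collections the PR actually produces.

```typescript
// Hypothetical row shapes; these are not types from @tanstack/db.
type ProjectRow = { id: number; name: string }
type IssueRow = { id: number; projectId: number; title: string }
type ProjectWithIssues = ProjectRow & { issues: IssueRow[] }

// Attach children to each parent that passed the parent filter.
// Ordering and the limit are applied inside each parent's group,
// so `limit: 2` means "at most 2 issues per project", not 2 overall.
function includeChildren(
  parentRows: ProjectRow[],
  childRows: IssueRow[],
  opts: { compare: (a: IssueRow, b: IssueRow) => number; limit?: number },
): ProjectWithIssues[] {
  // Group children by correlation key (here: projectId).
  const byProject = new Map<number, IssueRow[]>()
  childRows.forEach((child) => {
    const group = byProject.get(child.projectId) ?? []
    group.push(child)
    byProject.set(child.projectId, group)
  })

  // Only parents present in `parentRows` receive children (inner-join behavior):
  // issues whose project was filtered out never appear in the result.
  return parentRows.map((p) => {
    const sorted = (byProject.get(p.id) ?? []).slice().sort(opts.compare)
    const limited =
      opts.limit === undefined ? sorted : sorted.slice(0, opts.limit)
    return { ...p, issues: limited }
  })
}

const sampleProjects: ProjectRow[] = [
  { id: 1, name: 'Alpha' },
  { id: 2, name: 'Beta' },
]
const sampleIssues: IssueRow[] = [
  { id: 12, projectId: 1, title: 'Another Alpha issue' },
  { id: 10, projectId: 1, title: 'Bug in Alpha' },
  { id: 11, projectId: 1, title: 'Feature for Alpha' },
  { id: 20, projectId: 2, title: 'Bug in Beta' },
  { id: 30, projectId: 3, title: 'Issue of a filtered-out project' },
]

const nested = includeChildren(sampleProjects, sampleIssues, {
  compare: (a, b) => a.id - b.id,
  limit: 2,
})
```

With a limit of 2, Alpha keeps issues 10 and 11 while Beta keeps its single issue, and issue 30 is dropped because project 3 is not among the parents.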

Closes #288

Test plan

  • Basic includes: parent rows have child Collections with correct items
  • Reactivity: adding/removing children updates child Collections without touching parents
  • Parent remove + re-add: child Collection resets correctly
  • Inner join filtering: children only shown for parents matching WHERE
  • Nested includes: two levels deep (projects → issues → comments)
  • Ordered child queries: child Collections respect ORDER BY
  • Ordered + LIMIT: limit applied per parent, not globally; insertions displace correctly
  • All 1815 existing tests pass (no regressions)

🤖 Generated with Claude Code

@changeset-bot

changeset-bot bot commented Feb 25, 2026

🦋 Changeset detected

Latest commit: fc269d5

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 14 packages
Name Type
@tanstack/db Minor
@tanstack/angular-db Patch
@tanstack/electric-db-collection Patch
@tanstack/offline-transactions Patch
@tanstack/powersync-db-collection Patch
@tanstack/query-db-collection Patch
@tanstack/react-db Patch
@tanstack/rxdb-db-collection Patch
@tanstack/solid-db Patch
@tanstack/svelte-db Patch
@tanstack/trailbase-db-collection Patch
@tanstack/vue-db Patch
todos Patch
@tanstack/db-example-paced-mutations-demo Patch

@pkg-pr-new

pkg-pr-new bot commented Feb 25, 2026

@tanstack/angular-db

npm i https://pkg.pr.new/@tanstack/angular-db@1294

@tanstack/db

npm i https://pkg.pr.new/@tanstack/db@1294

@tanstack/db-ivm

npm i https://pkg.pr.new/@tanstack/db-ivm@1294

@tanstack/electric-db-collection

npm i https://pkg.pr.new/@tanstack/electric-db-collection@1294

@tanstack/offline-transactions

npm i https://pkg.pr.new/@tanstack/offline-transactions@1294

@tanstack/powersync-db-collection

npm i https://pkg.pr.new/@tanstack/powersync-db-collection@1294

@tanstack/query-db-collection

npm i https://pkg.pr.new/@tanstack/query-db-collection@1294

@tanstack/react-db

npm i https://pkg.pr.new/@tanstack/react-db@1294

@tanstack/rxdb-db-collection

npm i https://pkg.pr.new/@tanstack/rxdb-db-collection@1294

@tanstack/solid-db

npm i https://pkg.pr.new/@tanstack/solid-db@1294

@tanstack/svelte-db

npm i https://pkg.pr.new/@tanstack/svelte-db@1294

@tanstack/trailbase-db-collection

npm i https://pkg.pr.new/@tanstack/trailbase-db-collection@1294

@tanstack/vue-db

npm i https://pkg.pr.new/@tanstack/vue-db@1294

commit: fc269d5

@github-actions
Contributor

github-actions bot commented Feb 25, 2026

Size Change: +3.38 kB (+3.65%)

Total Size: 96 kB

Filename Size Change
./packages/db/dist/esm/query/builder/index.js 4.59 kB +485 B (+11.83%) ⚠️
./packages/db/dist/esm/query/compiler/group-by.js 2.24 kB +9 B (+0.4%)
./packages/db/dist/esm/query/compiler/index.js 2.68 kB +641 B (+31.5%) 🚨
./packages/db/dist/esm/query/compiler/order-by.js 1.5 kB +52 B (+3.58%)
./packages/db/dist/esm/query/compiler/select.js 1.11 kB +20 B (+1.83%)
./packages/db/dist/esm/query/ir.js 738 B +65 B (+9.66%) ⚠️
./packages/db/dist/esm/query/live/collection-config-builder.js 7.65 kB +2.11 kB (+37.97%) 🚨
ℹ️ View Unchanged
Filename Size
./packages/db/dist/esm/collection/change-events.js 1.39 kB
./packages/db/dist/esm/collection/changes.js 1.22 kB
./packages/db/dist/esm/collection/events.js 388 B
./packages/db/dist/esm/collection/index.js 3.32 kB
./packages/db/dist/esm/collection/indexes.js 1.1 kB
./packages/db/dist/esm/collection/lifecycle.js 1.75 kB
./packages/db/dist/esm/collection/mutations.js 2.34 kB
./packages/db/dist/esm/collection/state.js 3.49 kB
./packages/db/dist/esm/collection/subscription.js 3.71 kB
./packages/db/dist/esm/collection/sync.js 2.41 kB
./packages/db/dist/esm/deferred.js 207 B
./packages/db/dist/esm/errors.js 4.7 kB
./packages/db/dist/esm/event-emitter.js 748 B
./packages/db/dist/esm/index.js 2.69 kB
./packages/db/dist/esm/indexes/auto-index.js 742 B
./packages/db/dist/esm/indexes/base-index.js 766 B
./packages/db/dist/esm/indexes/btree-index.js 2.17 kB
./packages/db/dist/esm/indexes/lazy-index.js 1.1 kB
./packages/db/dist/esm/indexes/reverse-index.js 538 B
./packages/db/dist/esm/local-only.js 808 B
./packages/db/dist/esm/local-storage.js 2.1 kB
./packages/db/dist/esm/optimistic-action.js 359 B
./packages/db/dist/esm/paced-mutations.js 496 B
./packages/db/dist/esm/proxy.js 3.75 kB
./packages/db/dist/esm/query/builder/functions.js 733 B
./packages/db/dist/esm/query/builder/ref-proxy.js 1.05 kB
./packages/db/dist/esm/query/compiler/evaluators.js 1.43 kB
./packages/db/dist/esm/query/compiler/expressions.js 430 B
./packages/db/dist/esm/query/compiler/joins.js 2.11 kB
./packages/db/dist/esm/query/expression-helpers.js 1.43 kB
./packages/db/dist/esm/query/live-query-collection.js 360 B
./packages/db/dist/esm/query/live/collection-registry.js 264 B
./packages/db/dist/esm/query/live/collection-subscriber.js 2.42 kB
./packages/db/dist/esm/query/live/internal.js 145 B
./packages/db/dist/esm/query/optimizer.js 2.62 kB
./packages/db/dist/esm/query/predicate-utils.js 2.97 kB
./packages/db/dist/esm/query/subset-dedupe.js 921 B
./packages/db/dist/esm/scheduler.js 1.3 kB
./packages/db/dist/esm/SortedMap.js 1.3 kB
./packages/db/dist/esm/strategies/debounceStrategy.js 247 B
./packages/db/dist/esm/strategies/queueStrategy.js 428 B
./packages/db/dist/esm/strategies/throttleStrategy.js 246 B
./packages/db/dist/esm/transactions.js 2.9 kB
./packages/db/dist/esm/utils.js 924 B
./packages/db/dist/esm/utils/browser-polyfills.js 304 B
./packages/db/dist/esm/utils/btree.js 5.61 kB
./packages/db/dist/esm/utils/comparison.js 952 B
./packages/db/dist/esm/utils/cursor.js 457 B
./packages/db/dist/esm/utils/index-optimization.js 1.51 kB
./packages/db/dist/esm/utils/type-guards.js 157 B

compressed-size-action::db-package-size

@github-actions
Contributor

github-actions bot commented Feb 25, 2026

Size Change: 0 B

Total Size: 3.7 kB

ℹ️ View Unchanged
Filename Size
./packages/react-db/dist/esm/index.js 225 B
./packages/react-db/dist/esm/useLiveInfiniteQuery.js 1.17 kB
./packages/react-db/dist/esm/useLiveQuery.js 1.34 kB
./packages/react-db/dist/esm/useLiveSuspenseQuery.js 559 B
./packages/react-db/dist/esm/usePacedMutations.js 401 B

compressed-size-action::react-db-package-size

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
kevin-dp and others added 5 commits February 25, 2026 11:16
Replace O(n) parent collection scans with a reverse index
(correlationKey → Set<parentKey>) for attaching child Collections
to parent rows. The index is populated during parent INSERTs
and cleaned up on parent DELETEs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
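
The reverse index described in the commit message above could look roughly like the sketch below. This is an illustration under assumptions, not the PR's code: the class name and method names are hypothetical, and only the bookkeeping (populate on parent insert, clean up on parent delete, look up without scanning) is modeled.

```typescript
// Sketch only: class and method names are hypothetical.
// Keys are compared by identity, as in any Map, so this assumes
// primitive correlation keys such as numbers or strings.
class CorrelationReverseIndex<CK, PK> {
  private readonly parentsByCorrelation = new Map<CK, Set<PK>>()

  // Called when a parent row is inserted.
  addParent(correlationKey: CK, parentKey: PK): void {
    let parentKeys = this.parentsByCorrelation.get(correlationKey)
    if (!parentKeys) {
      parentKeys = new Set<PK>()
      this.parentsByCorrelation.set(correlationKey, parentKeys)
    }
    parentKeys.add(parentKey)
  }

  // Called when a parent row is deleted; empty sets are dropped
  // so the map does not keep entries for vanished correlation keys.
  removeParent(correlationKey: CK, parentKey: PK): void {
    const parentKeys = this.parentsByCorrelation.get(correlationKey)
    if (!parentKeys) return
    parentKeys.delete(parentKey)
    if (parentKeys.size === 0) this.parentsByCorrelation.delete(correlationKey)
  }

  // Lookup by correlation key without scanning the parent collection.
  parentsFor(correlationKey: CK): PK[] {
    const parentKeys = this.parentsByCorrelation.get(correlationKey)
    return parentKeys ? Array.from(parentKeys) : []
  }
}

const reverseIndex = new CorrelationReverseIndex<number, string>()
reverseIndex.addParent(1, 'project-a')
reverseIndex.addParent(1, 'project-b')
reverseIndex.addParent(2, 'project-c')
reverseIndex.removeParent(1, 'project-a')
```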
Collaborator

@samwillis samwillis left a comment

This is looking really great. Awesome work!

Depending on whether we release before or after the follow-up PRs, it may make sense to add some defensive errors for unsupported queries (groupBy, referencing multiple fields on the parent).

? where.expression
: where

// Look for eq(a, b) where one side references parent and other references child
Collaborator

I believe this finds the first expression that references both sides, which is correct. We should consider what something like this does:

q.from({ p: projects }).select(({ p }) => ({
  id: p.id,
  name: p.name,
  issues: q
    .from({ i: issues })
    // Two correlating expressions: both reference the parent and the child
    .where(({ i }) => and(eq(i.projectId, p.id), eq(i.createdBy, p.createdBy)))
    .select(({ i }) => ({
      id: i.id,
      title: i.title,
    })),
}))

I suspect it breaks at the moment, and so we may want to throw if there is more than one expression matching both sources.

I think it's possible to make this work though by pulling the parent project value temporarily into the child issue pipeline.

Contributor Author

Indeed, it breaks right now because the parent row is not in the child pipeline. I added support for this in this PR: #1307

Comment on lines +213 to +226
// Re-add project Alpha — should get a fresh child collection
projects.utils.begin()
projects.utils.write({
  type: `insert`,
  value: { id: 1, name: `Alpha Reborn` },
})
projects.utils.commit()

const alpha = collection.get(1) as any
expect(alpha).toMatchObject({ id: 1, name: `Alpha Reborn` })
expect(childItems(alpha.issues)).toEqual([
  { id: 10, title: `Bug in Alpha` },
  { id: 11, title: `Feature for Alpha` },
])
Collaborator

🥳

@samwillis
Collaborator

ChatGPT review:


Here’s my review of TanStack/db PR #1294 (adds “includes” subqueries / nested child collections). ([GitHub]1)

What this PR is doing (as I understand it)

  • New “includes subquery” syntax: returning a QueryBuilder from inside a parent .select() field now becomes an IncludesSubquery IR node, by detecting a correlating eq(child.fk, parent.pk) in the child query’s where. ([GitHub]2)
  • Compiler support: the compiler extracts those IncludesSubquery nodes, compiles the child query “per parent correlation key”, and plumbs an extra correlationKey through the result tuples so the output layer can route rows into the correct child collection. ([GitHub]3)
  • Output/runtime support: live query builder wires child pipelines via output() callbacks, creates per-parent child Collections, attaches them onto parent rows, and handles nested includes via a shared-buffer + routing-index approach. ([GitHub]4)
  • Tests: good coverage for basic includes, reactivity (insert/delete), ordered children, per-parent limit, and 2-level nesting (projects → issues → comments). ([GitHub]5)
  • Related fix: containsAggregate is now defensive against nested Select objects (important because includes introduces nested select-shaped objects). ([GitHub]6)

Overall: the API is very ergonomic, and the nested routing solution is clever.


Things I like

  • The user-facing API is dead simple and reads like a real ORM include. ([GitHub]5)
  • The compiler approach (extract includes early, compile child with a parent-key stream, and then let the live layer attach real child collections) is a solid separation of concerns. ([GitHub]3)
  • The tests hit the highest-risk areas: ordering + limit per parent + nested includes. ([GitHub]5)

Key concerns / suggested changes

1) Alias collisions between parent and child queries (correctness bug risk)

extractCorrelation() decides “parent vs child” purely by membership in parentAliases / childAliases. If an alias name appears in both sets (e.g. user reuses p inside the child query), the correlation detection can mis-classify and/or silently do the wrong thing. ([GitHub]2)

Suggestion

  • Enforce disjoint alias sets for includes subqueries at build time:

    • If childAliases intersects parentAliases, throw a dedicated error (ideally the same family as DuplicateAliasInSubqueryError used elsewhere). ([GitHub]3)
  • Also consider extending validateQueryStructure to walk select and validate nested IncludesSubquery nodes too (right now it looks focused on from/join QueryRefs). ([GitHub]3)
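
A build-time disjointness check along the lines suggested above might be sketched as follows. The error class and function name are hypothetical and exist only for this illustration; the real code base may prefer to reuse an existing error type.

```typescript
// Hypothetical error type for this sketch; not an error class from the library.
class OverlappingIncludeAliasError extends Error {
  constructor(alias: string) {
    super(
      `Alias "${alias}" is used by both the parent query and the includes subquery`,
    )
    this.name = 'OverlappingIncludeAliasError'
  }
}

// Throws on the first alias that appears in both the parent and the child query.
function assertDisjointAliases(
  parentAliases: string[],
  childAliases: string[],
): void {
  const parentSet = new Set(parentAliases)
  childAliases.forEach((alias) => {
    if (parentSet.has(alias)) throw new OverlappingIncludeAliasError(alias)
  })
}

// Passes silently: the alias sets do not intersect.
assertDisjointAliases(['p'], ['i'])
```

Calling `assertDisjointAliases(['p'], ['i', 'p'])` would throw, which turns a silent mis-classification into an explicit failure.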

2) Correlation extraction only matches top-level eq(ref, ref)

Right now the includes correlation must be a direct where(() => eq(child.fk, parent.pk)). If someone writes:

.where(({ i }) => and(eq(i.projectId, p.id), eq(i.status, 'open')))

…correlation won’t be found (because it doesn’t traverse and() trees). That’s fine for v1, but it needs to be explicit.

Suggestion

  • Either:

    1. document “correlation must be a top-level where(eq(...))”, and add a clearer error type/message; or
    2. traverse boolean expression trees (and/or) to find the first correlating eq.

The current thrown Error(...) is a bit “raw” for public API behavior. ([GitHub]2)
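
Option 2 can be sketched on a simplified expression tree. Everything here is an assumption for illustration: the `QueryExpr` types are a hypothetical stand-in for the library's IR, and `findCorrelation` is not the PR's `extractCorrelation()`. The sketch descends only into and(), because an eq nested under or() does not have to hold for every child row and so cannot safely serve as the correlation.

```typescript
// Hypothetical, simplified expression IR (not the library's IR types).
type RefExpr = { kind: 'ref'; alias: string; field: string }
type ValExpr = { kind: 'val'; value: unknown }
type FuncExpr = { kind: 'func'; name: string; args: QueryExpr[] }
type QueryExpr = RefExpr | ValExpr | FuncExpr

type Correlation = { parentRef: RefExpr; childRef: RefExpr }

// Returns the first eq(ref, ref) that links a parent alias to a child alias,
// searching depth-first through nested and() calls.
function findCorrelation(
  expr: QueryExpr,
  parentAliases: Set<string>,
  childAliases: Set<string>,
): Correlation | undefined {
  if (expr.kind !== 'func') return undefined

  if (expr.name === 'eq') {
    const left = expr.args[0]
    const right = expr.args[1]
    if (left && right && left.kind === 'ref' && right.kind === 'ref') {
      if (parentAliases.has(left.alias) && childAliases.has(right.alias)) {
        return { parentRef: left, childRef: right }
      }
      if (childAliases.has(left.alias) && parentAliases.has(right.alias)) {
        return { parentRef: right, childRef: left }
      }
    }
    return undefined
  }

  if (expr.name === 'and') {
    for (let idx = 0; idx < expr.args.length; idx++) {
      const arg = expr.args[idx]
      if (!arg) continue
      const found = findCorrelation(arg, parentAliases, childAliases)
      if (found) return found
    }
  }
  return undefined
}

const makeRef = (alias: string, field: string): RefExpr => ({
  kind: 'ref',
  alias,
  field,
})

// Models and(eq(i.status, 'open'), eq(i.projectId, p.id)).
const whereExpr: QueryExpr = {
  kind: 'func',
  name: 'and',
  args: [
    {
      kind: 'func',
      name: 'eq',
      args: [makeRef('i', 'status'), { kind: 'val', value: 'open' }],
    },
    {
      kind: 'func',
      name: 'eq',
      args: [makeRef('i', 'projectId'), makeRef('p', 'id')],
    },
  ],
}

const correlation = findCorrelation(whereExpr, new Set(['p']), new Set(['i']))
```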

3) Compiler mutates the query IR’s select in-place

replaceIncludesInSelect(query.select, key) replaces includes entries with Val(null) so processSelect() doesn’t see them. That’s convenient, but in-place mutation of a query object that may be cached/reused is a footgun (especially if you ever recompile without cache, or if other passes expect to see includes). ([GitHub]3)

Suggestion

  • Treat IR as immutable here:

    • clone select into selectWithoutIncludes,
    • compile using the clone,
    • keep the original IR intact.
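
A non-mutating variant of the replacement step might look like the sketch below. The node types and `splitIncludes` are hypothetical simplifications made for this example (the real select IR can nest, which this shallow version ignores); the point is only that the placeholder goes into a copy while the original select object is left untouched.

```typescript
// Hypothetical, simplified select IR for this sketch.
type IncludesNode = { type: 'includes'; fieldName: string }
type ValueNode = { type: 'val'; value: unknown }
type RefNode = { type: 'ref'; path: string[] }
type SelectNode = IncludesNode | ValueNode | RefNode
type SelectClause = { [field: string]: SelectNode }

// Builds a copy of `select` in which includes entries are replaced by a
// null-valued placeholder, and returns the extracted includes separately.
function splitIncludes(select: SelectClause): {
  selectWithoutIncludes: SelectClause
  includes: IncludesNode[]
} {
  const selectWithoutIncludes: SelectClause = {}
  const includes: IncludesNode[] = []
  Object.keys(select).forEach((field) => {
    const node = select[field]
    if (!node) return
    if (node.type === 'includes') {
      includes.push(node)
      // Placeholder so later select processing does not see the subquery.
      selectWithoutIncludes[field] = { type: 'val', value: null }
    } else {
      selectWithoutIncludes[field] = node
    }
  })
  return { selectWithoutIncludes, includes }
}

const originalSelect: SelectClause = {
  id: { type: 'ref', path: ['p', 'id'] },
  issues: { type: 'includes', fieldName: 'issues' },
}
const split = splitIncludes(originalSelect)
```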

4) Parent “correlation key” changes on update aren’t handled cleanly

In flushIncludesState, Phase 1 runs on changes.inserts > 0, so it will also run for updates (which often show as delete+insert deltas internally). But Phase 5 only cleans up on “pure delete” (deletes > 0 && inserts === 0). ([GitHub]4)

If a parent row’s correlation value can change (even if rare), you can end up with:

  • stale correlationToParentKeys membership under the old key,
  • orphaned child collections / routing entries,
  • parent row still pointing at the old child collection.

Suggestion

  • Either explicitly state/validate that the correlation field must be stable (usually the parent primary key), or

  • Track the last correlationKey per parent key and on update:

    • remove old mapping + routing entries when the correlationKey changes,
    • attach the new child collection.

Given the current code already supports “multiple parents per correlationKey”, it’s close—just missing the “move” case. ([GitHub]4)
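
The "move" case from the second suggestion could be handled by remembering the last correlation key per parent. The sketch below is an assumption-laden illustration: `ParentCorrelationTracker` and its methods are hypothetical, keys are simplified to strings, and the child collections and routing entries themselves are left out.

```typescript
// Sketch only: hypothetical names; child collections are not modeled.
class ParentCorrelationTracker {
  private readonly lastKeyByParent = new Map<string, string>()
  private readonly parentsByCorrelation = new Map<string, Set<string>>()

  // Handles both the insert and the update of a parent row.
  upsertParent(
    parentKey: string,
    correlationKey: string,
  ): 'attached' | 'moved' | 'unchanged' {
    const previous = this.lastKeyByParent.get(parentKey)
    if (previous === correlationKey) return 'unchanged'
    // The correlation value changed: drop the stale membership first.
    if (previous !== undefined) this.detach(parentKey, previous)

    let parents = this.parentsByCorrelation.get(correlationKey)
    if (!parents) {
      parents = new Set<string>()
      this.parentsByCorrelation.set(correlationKey, parents)
    }
    parents.add(parentKey)
    this.lastKeyByParent.set(parentKey, correlationKey)
    return previous === undefined ? 'attached' : 'moved'
  }

  deleteParent(parentKey: string): void {
    const previous = this.lastKeyByParent.get(parentKey)
    if (previous === undefined) return
    this.detach(parentKey, previous)
    this.lastKeyByParent.delete(parentKey)
  }

  parentsFor(correlationKey: string): string[] {
    const parents = this.parentsByCorrelation.get(correlationKey)
    return parents ? Array.from(parents) : []
  }

  private detach(parentKey: string, correlationKey: string): void {
    const parents = this.parentsByCorrelation.get(correlationKey)
    if (!parents) return
    parents.delete(parentKey)
    if (parents.size === 0) this.parentsByCorrelation.delete(correlationKey)
  }
}

const tracker = new ParentCorrelationTracker()
const firstResult = tracker.upsertParent('p1', 'team-a')
const moveResult = tracker.upsertParent('p1', 'team-b')
```

After the second call, `p1` is registered only under `team-b`, so no stale membership remains under the old key.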

5) Child collection IDs use String(correlationKey)

id: ${parentId}-${fieldName}-${String(correlationKey)}

  • If correlationKey is an object/composite, String() becomes "[object Object]" → collisions.
  • If it contains awkward chars, IDs get messy.

Suggestion

  • If you want IDs to be stable + readable, consider serializeValue(correlationKey) (you’re already using serializeValue elsewhere) or a tiny hash of a stable serialization. ([GitHub]6)
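
The collision is easy to reproduce. In the sketch below, `stableSerialize` is a hypothetical stand-in written for this example and is not the library's serializeValue; `childCollectionId` is likewise an illustrative helper. It shows only that a key-sorted serialization keeps object-valued correlation keys distinct where String() does not.

```typescript
// Hypothetical stand-in for a stable serializer (NOT the library's serializeValue).
// Object keys are sorted so that equal values always produce equal strings.
function stableSerialize(value: unknown): string {
  if (value === null || typeof value !== 'object') {
    // Prefix with the type so that 1 and '1' stay distinct.
    return `${typeof value}:${String(value)}`
  }
  if (value instanceof Date) return `date:${value.getTime()}`
  if (Array.isArray(value)) {
    return `[${value.map((item) => stableSerialize(item)).join(',')}]`
  }
  const record = value as Record<string, unknown>
  const entries = Object.keys(record)
    .sort()
    .map((key) => `${JSON.stringify(key)}:${stableSerialize(record[key])}`)
  return `{${entries.join(',')}}`
}

// Illustrative ID builder mirroring the shape of the ID discussed above.
function childCollectionId(
  parentId: string,
  fieldName: string,
  correlationKey: unknown,
): string {
  return `${parentId}-${fieldName}-${stableSerialize(correlationKey)}`
}

const keyA = { tenant: 'acme', region: 'eu' }
const keyB = { tenant: 'globex', region: 'us' }

// String() gives "[object Object]" for both keys, so the IDs collide.
const collidingIds = [String(keyA), String(keyB)]
// The stable serialization keeps them apart.
const distinctIds = [
  childCollectionId('live-query', 'issues', keyA),
  childCollectionId('live-query', 'issues', keyB),
]
```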

6) Internal __correlationKey property naming

Compiler/runtime appears to rely on a magic __correlationKey on the source row for grouping/limit-per-parent logic. If user data can contain that field, there’s collision risk. ([GitHub]3)

Suggestion

  • Use a Symbol, or a namespaced internal key that can’t realistically collide, or store correlation metadata outside user rows.
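
The Symbol option can be sketched as follows. `CORRELATION_KEY`, `tagRow`, and `readCorrelationKey` are hypothetical names for this illustration and do not describe how the PR stores its metadata; the sketch only shows that a symbol-keyed property cannot clash with a user column named `__correlationKey`.

```typescript
// Hypothetical internal key; a Symbol cannot clash with any string column name.
const CORRELATION_KEY = Symbol('correlationKey')

type UserRow = Record<string, unknown>
type TaggedRow = UserRow & { [CORRELATION_KEY]?: unknown }

// Returns a copy of the row carrying the correlation key under the symbol.
function tagRow(row: UserRow, correlationKey: unknown): TaggedRow {
  return { ...row, [CORRELATION_KEY]: correlationKey }
}

function readCorrelationKey(row: TaggedRow): unknown {
  return row[CORRELATION_KEY]
}

// A user row that happens to contain a `__correlationKey` column of its own.
const userRow: UserRow = { id: 10, __correlationKey: 'user data' }
const tagged = tagRow(userRow, 1)
```

Symbol-keyed properties are skipped by `Object.keys` and `JSON.stringify`, so the metadata also stays out of ordinary enumeration and serialization of the row.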

Tests I’d add (small but high value)

  1. Alias collision: parent uses alias p, child also uses alias p → assert it throws a clear error.
  2. Correlation inside and(): demonstrate either supported traversal or a friendly “must be top-level eq” error.
  3. Parent update that changes correlation value (even if discouraged): verify the old child collection detaches and the new one attaches (or verify it’s rejected).
  4. Multiple parents share same correlationKey (since your reverse index supports it): ensure attach/update works for all parents in the set. ([GitHub]4)

Bottom line

This is a strong feature with a very nice API and solid initial test coverage. The biggest things I’d address before merging are:

  • alias overlap validation (likely correctness),
  • avoid mutating query IR in compiler,
  • define behavior/constraints for correlation extraction (top-level eq vs expression traversal),
  • and handle or forbid parent correlationKey changes.

@kevin-dp
Contributor Author

@samwillis response to Codex's review:

  1. Alias collisions between parent and child queries (correctness bug risk)

Valid concern but I don't think it's a real risk in practice. The child query is built via the builder API where the user explicitly declares aliases in .from({ i: issues }). If they reuse p as a child alias, it would shadow the parent's p in the closure scope — so p.id in the child's .where() would already refer to the child's p, not the parent's.

  2. Correlation extraction only matches top-level eq(ref, ref)

Fixed in #1307

  3. Compiler mutates the query IR’s select in-place

This is a fair observation but the compiler already runs on the output of optimizeQuery(), which returns a new object. And the cache is keyed by the raw query, with queryMapping linking optimized → raw. So in practice the mutation happens on a fresh optimized copy, not the user's original IR.

Still a valid code hygiene concern, but it's out of scope for this PR. If it were to be fixed, it should be its own cleanup.

  4. Parent “correlation key” changes on update aren’t handled cleanly

The correlation field is almost always the parent's primary key (e.g., p.id in eq(i.projectId, p.id)). PKs don't change by definition.

Could a user correlate on a non-PK field? Technically yes, but it would be semantically wrong — the correlation field determines how children are grouped to parents. If it's not stable, the entire grouping model is broken, not just the cleanup logic.

I'd lean toward the first suggestion: document/validate that the correlation field should be a stable key (which it naturally is in every real use case). The "move" case handling would add complexity for a scenario that doesn't really make sense to support.

  5. Child collection IDs use String(correlationKey)

True: the correlation field doesn't have to be the PK (PKs are restricted to string | number). Someone could correlate on any field, and arbitrary field values could be objects, arrays, dates, etc. Even though it's unusual, String() would silently produce "[object Object]" and cause collisions.

Using serializeValue is a cheap one-line fix that makes it robust. Will do.

  6. Internal __correlationKey property naming

This is fine. We have a couple of these reserved properties around the code. It's unlikely someone would use this name.

Regarding the additional tests:

  1. Alias collisions: not needed because, as explained above, this is handled by standard shadowing in TS.
  2. Correlation inside and(): these tests already exist in the follow-up PRs that introduced support for this.
  3. Parent update that changes correlation value: As discussed, the correlation field is practically always the PK which doesn't change. Testing this would be testing undefined behavior. I'd rather document the constraint than test around it.
  4. Multiple parents sharing same correlationKey: This is a good one. It tests a real scenario (e.g., multiple projects with the same foreign key value). Added this one.

…hIncludesState reads from that stamp. The stamp is cleaned up at the end of flush so it never leaks to the user
@kevin-dp kevin-dp requested a review from samwillis February 26, 2026 15:06

Development

Successfully merging this pull request may close these issues.

Joins with a hierarchical projection (includes)

2 participants