docs: datafusion-future-improvements.md (post-Lance-6 DF state + future work)#114

Open
aaltshuler wants to merge 3 commits into main from andrew/datafusion-future-improvements-doc

@aaltshuler commented May 23, 2026

Collaborator

Summary

New dev doc capturing the DataFusion state of the codebase after PR #111 (Lance 4 → 6) and PR #113 (structured Expr pushdown). The chat-history-only record of "what we did, what's free, what's still on the table" gets a permanent home.

Structure

  • Current pin — DF 53.1.0 (workspace pin + features)
  • Direct touchpoints — only 2 sites in our code (narrow surface)
  • Shipped — PR-by-PR delta of what's landed
  • Passive wins — DF 53 optimizer/perf wins active automatically, each linked to the upstream DF PR with a where-it-bites-us note
  • Still on the table — ranked by tier:
      • T1: structural, unblocked today (hydrate_nodes Expr pushdown)
      • T2: gated on Lance v7 (delete Expr via MR-A / issue #112)
      • T3: future-shape unlocks (extension planner, expression placement, etc.)
      • T4: won't reach us without major changes (custom ExecutionPlan territory)
  • Upstream cadence — Lance dictates the DF version; we follow
  • Maintenance — explicit rules for keeping the doc current

Why now

We've made two material DataFusion-side moves in two PRs: the bump to DF 53 and the switch of the bulk of read-path pushdown to structured Expr. Without a doc, the next time someone asks "where are we with DataFusion?" we re-derive the answer from chat history. The doc costs ~120 lines and saves that re-derivation on every future ask.

Test plan

  • scripts/check-agents-md.sh clean (cross-link integrity — 35 links, 34 docs)
  • Doc linked from docs/dev/index.md
  • No code changes; docs-only

🤖 Generated with Claude Code



Captures the post-PR #111 (Lance 4→6) + PR #113 (structured Expr
pushdown) DataFusion state in one place, so future maintainers don't
have to re-derive what's done, what's free, and what's still on the
table from chat history.

Structure:
- Direct touchpoints (only 2 — narrow surface)
- Shipped: PR-by-PR delta of what's landed
- Passive wins active on DF 53 (PR-linked, with where-it-bites-us
  notes)
- Still on the table, ranked by tier:
  - T1: structural, unblocked today (hydrate_nodes Expr pushdown)
  - T2: gated on Lance v7 (delete Expr via MR-A / issue #112)
  - T3: future-shape unlocks (extension planner, expression
    placement, etc.)
  - T4: won't reach us without major changes (custom ExecutionPlan
    territory)
- Upstream cadence note (Lance dictates the DF version)
- Maintenance section

Linked from docs/dev/index.md so the check-agents-md CI guard
passes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@aaltshuler requested a review from ragnorc as a code owner May 23, 2026 11:52

@cubic-dev-ai cubic-dev-ai Bot left a comment

No issues found across 2 files


@devin-ai-integration devin-ai-integration Bot left a comment

Contributor


✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no bugs or issues to report.


aaltshuler and others added 2 commits June 6, 2026 19:36
Lance 7.0.0 shipped stable 2026-05-28 and still pins datafusion = "^53"
/ arrow = "^58" (verified against the published 7.0.0 dependency
manifest), so the pending 6.0.1 -> 7.0.0 bump is not a DataFusion bump:
the "Passive wins" table is unchanged.

- Current-pin stanza: note 7.0.0 is available upstream and holds DF ^53.
- Tier 2: the delete-Expr item's upstream gate (execute_uncommitted,
  lance#6658) is now satisfied (in 7.0.0 stable); reframe the trigger as
  our own 6->7 bump rather than waiting on a Lance release.
- Upstream cadence: correct the pre-release speculation — 7.0.0 stayed on
  DF 53; a DF 54/55 jump is deferred to a later Lance.
- Drop the brittle exec/query.rs:771-796 line range (drifted; hydrate_nodes
  is at 863 on main) in favor of the stable function name.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>