Blog post on writing table providers by timsaucer · Pull Request #161 · apache/datafusion-site

timsaucer · 2026-03-20T22:06:08Z

This blog post is designed to help new users of DataFusion write their own table providers and understand some of the core concepts.

Preview site: https://datafusion.staged.apache.org/blog/2026/03/20/writing-table-providers/

stuhood

Thanks for doing this!

content/blog/2026-03-31-writing-table-providers.md

content/blog/2026-03-20-writing-table-providers.md

2010YOUY01

LGTM. I read through it and found the concepts well explained and easy to follow. One follow-up after publishing would be to link this blog from the doc comments of related APIs such as TableProvider.

pgwhalen

As someone who struggled in the past, I'm thrilled to see this get created now! I added some comments that highlight my biggest struggles.

Co-authored-by: Yongting You <2010youy01@gmail.com>

timsaucer · 2026-03-23T16:54:56Z

Thanks everyone for the feedback. The post is updated in case anyone wants another look.

alamb · 2026-03-24T19:31:56Z

Starting to check this out

alamb

Thank you so much @timsaucer -- this is really great and I think will help people write table providers a lot

The only thing I think we should be careful of is suggesting that people run CPU work on blocking threads, as I don't think that is necessarily best practice -- I left some comments to that effect inline

Also, once we publish this blog, I think it would be sweet to incorporate a bunch of its content into the https://datafusion.apache.org/library-user-guide/custom-table-providers.html section of the doc

alamb · 2026-03-31T17:50:31Z

@timsaucer how is this post going? Shall we publish it?

timsaucer · 2026-03-31T18:24:16Z

> @timsaucer how is this post going? Shall we publish it?

I was working on it as you pinged. I hope to get it wrapped today

- Clarify intro sentence to mention planning/execution work
- Label TableProvider as Logical Plan and ExecutionPlan as Physical Plan
- Change "four phases" to "several phases" (list has 5 items)
- "Some logical optimizations" and "rewrites such as" to signal non-exhaustive lists
- Clarify scan() comment: "don't do any execution work here"
- Rewrite partitioning section to lead with simple advice (match data layout) before covering target_partitions and hash partitioning subtleties
- Narrow CPU thread pool advice: spawn_blocking is for blocking/long-running work, not all CPU work
- Add "scan is single-threaded" as a reason to keep scan() lightweight

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Addresses alamb's suggestion to move the section earlier so readers understand what level of work is required before diving in.

- Moved section to just before Layer 1: TableProvider
- Trimmed the file-based path detail to a short paragraph with links (the full trait hierarchy was too deep for an intro-position section)
- Removed RecordBatchStreamAdapter reference (not yet introduced at that point in the article)
- Added a sentence orienting the reader to what the rest of the post covers

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Fix use-after-move bug in DatePartitionedExec construction (dirs.len() called after dirs moved into struct field)
- Fix incorrect import: SessionState → catalog::Session in CountingTable example
- Remove double space before scan_with_args link
- Add missing blank line before '### Using EXPLAIN' heading
- Split dense 'Only Push Down Filters' paragraph for readability
- Change 'full working example' to 'illustrative example' for the filter pushdown code that contains todo!() stubs
- Use 'Rerun is building' instead of repeating [Rerun.io] link

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
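For readers unfamiliar with the use-after-move fix mentioned in the commit message above, here is a minimal sketch of that class of bug. The struct is a hypothetical stand-in and not the blog post's actual `DatePartitionedExec`. The field names `dirs` and `partition_count` and the `new` constructor are assumptions made for illustration only.

```rust
/// Hypothetical stand-in for the post's `DatePartitionedExec`.
/// The field names here are assumptions, not the post's real definition.
#[derive(Debug)]
pub struct DatePartitionedExec {
    pub dirs: Vec<String>,
    pub partition_count: usize,
}

impl DatePartitionedExec {
    pub fn new(dirs: Vec<String>) -> Self {
        // Buggy ordering (does not compile):
        //
        //     Self { dirs, partition_count: dirs.len() }
        //
        // Field initializers are evaluated in the order written, so `dirs`
        // has already been moved into the struct when `dirs.len()` runs,
        // and the compiler reports a borrow of a moved value.

        // Fix: read the length before `dirs` is moved into the struct.
        let partition_count = dirs.len();
        Self { dirs, partition_count }
    }
}
```

Calling `DatePartitionedExec::new` with a vector of two directory names yields a value whose `partition_count` is 2. Writing the `partition_count: dirs.len()` initializer before `dirs` in the struct expression would also compile, but binding the length to a local first does not depend on field order.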

- Fix grammar: "Best practices are" → "Best practice is"
- Remove unused StringArray import from complete example
- Fix outdated arrow-datafusion repo link → apache/datafusion
- Add missing reviewers to acknowledgements: adriangb, kevinjqliu, Omega359

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

alamb · 2026-04-01T15:52:17Z

It's live!

https://datafusion.apache.org/blog/2026/03/31/writing-table-providers/

alamb · 2026-04-01T16:09:04Z

> Also, once we publish this blog, I think it would be sweet to incorporate a bunch of its content into the https://datafusion.apache.org/library-user-guide/custom-table-providers.html section of the doc

FYI filed apache/datafusion#21304 to track

Initial commit for blog post on writing table providers

1fe3d2e

timsaucer marked this pull request as ready for review March 20, 2026 22:09

timsaucer added 2 commits March 20, 2026 18:11

Minor text changes

dea6e3f

Add acknowledgement

2de8b7b

stuhood reviewed Mar 21, 2026

2010YOUY01 approved these changes Mar 22, 2026

pgwhalen reviewed Mar 22, 2026

adriangb reviewed Mar 22, 2026

timsaucer and others added 4 commits March 23, 2026 12:06

Add note about when to add push down filters

0a66a20

Co-authored-by: Yongting You <2010youy01@gmail.com>

Address a variety of user feedback

df813ec

Update links

d72f9c6

pelican processing didn't handle backticks in links well

e03d448

adriangb reviewed Mar 23, 2026

Omega359 reviewed Mar 24, 2026

alamb mentioned this pull request Mar 25, 2026

Add example implementing filter pushdown apache/datafusion#21145

Open

alamb approved these changes Mar 25, 2026

kevinjqliu reviewed Mar 28, 2026

timsaucer and others added 9 commits March 31, 2026 14:48

Add an explanation of different ways to use FileFormat for a ListingT…

dfc0520

…able

update date

46edd5a

Add link to thread_pools example for blocking work section

e076627

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

remove use statements from example

981d98d

revert section on scan_with_args and drop to single line

5c61689

Add reviewer acknowledgements to blog post

3d74c05

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

timsaucer and others added 3 commits March 31, 2026 15:56

Add 'Get Involved' section

735ea2f

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

make it alphabetical

d286cba

timsaucer merged commit f4ee574 into main Mar 31, 2026
4 checks passed

timsaucer deleted the site/writing-table-providers branch March 31, 2026 20:13

alamb mentioned this pull request Apr 1, 2026

Incorporate "Writing Custom Table Providers in Apache DataFusion" into the main datafusion docs apache/datafusion#21304

Open

8 participants