
Commit f4ee574

timsaucer, 2010YOUY01 and claude authored
Blog post on writing table providers (#161)
* Initial commit for blog post on writing table providers
* Minor text changes
* Add acknowledgement
* Add note about when to add push down filters

  Co-authored-by: Yongting You <2010youy01@gmail.com>

* Address a variety of user feedback
* Update links
* pelican processing didn't handle backticks in links well
* Add an explanation of different ways to use FileFormat for a ListingTable
* Address alamb review feedback

  - Clarify intro sentence to mention planning/execution work
  - Label TableProvider as Logical Plan and ExecutionPlan as Physical Plan
  - Change "four phases" to "several phases" (list has 5 items)
  - "Some logical optimizations" and "rewrites such as" to signal non-exhaustive lists
  - Clarify scan() comment: "don't do any execution work here"
  - Rewrite partitioning section to lead with simple advice (match data layout) before covering target_partitions and hash partitioning subtleties
  - Narrow CPU thread pool advice: spawn_blocking is for blocking/long-running work, not all CPU work
  - Add "scan is single-threaded" as a reason to keep scan() lightweight

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* update date
* Add link to thread_pools example for blocking work section

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* remove use statements from example
* revert section on scan_with_args and drop to single line
* Move 'Choosing the Right Starting Point' before Layer 1

  Addresses alamb's suggestion to move the section earlier so readers understand what level of work is required before diving in.

  - Moved section to just before Layer 1: TableProvider
  - Trimmed the file-based path detail to a short paragraph with links (the full trait hierarchy was too deep for an intro-position section)
  - Removed RecordBatchStreamAdapter reference (not yet introduced at that point in the article)
  - Added a sentence orienting the reader to what the rest of the post covers

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Fix pre-publish review issues

  - Fix use-after-move bug in DatePartitionedExec construction (dirs.len() called after dirs moved into struct field)
  - Fix incorrect import: SessionState → catalog::Session in CountingTable example
  - Remove double space before scan_with_args link
  - Add missing blank line before '### Using EXPLAIN' heading
  - Split dense 'Only Push Down Filters' paragraph for readability
  - Change 'full working example' to 'illustrative example' for the filter pushdown code that contains todo!() stubs
  - Use 'Rerun is building' instead of repeating [Rerun.io] link

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Add reviewer acknowledgements to blog post

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Add 'Get Involved' section

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Final pre-publish fixes

  - Fix grammar: "Best practices are" → "Best practice is"
  - Remove unused StringArray import from complete example
  - Fix outdated arrow-datafusion repo link → apache/datafusion
  - Add missing reviewers to acknowledgements: adriangb, kevinjqliu, Omega359

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* make it alphabetical

---------

Co-authored-by: Yongting You <2010youy01@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent 64c98bf, commit f4ee574

1 file changed

Lines changed: 916 additions & 0 deletions
