diff --git a/content/blog/2026-04-02-datafusion-53.0.0.md b/content/blog/2026-04-02-datafusion-53.0.0.md new file mode 100644 index 00000000..030e4f37 --- /dev/null +++ b/content/blog/2026-04-02-datafusion-53.0.0.md @@ -0,0 +1,402 @@ +--- +layout: post +title: Apache DataFusion 53.0.0 Released +date: 2026-04-02 +author: pmc +categories: [release] +--- + + + +[TOC] + +We are proud to announce the release of [DataFusion 53.0.0]. This post highlights +some of the major improvements since [DataFusion 52.0.0]. The complete list of +changes is available in the [changelog]. Thanks to the [114 contributors] for +making this release possible. + +[DataFusion 53.0.0]: https://crates.io/crates/datafusion/53.0.0 +[DataFusion 52.0.0]: https://datafusion.apache.org/blog/2026/01/12/datafusion-52.0.0/ +[changelog]: https://github.com/apache/datafusion/blob/branch-53/dev/changelog/53.0.0.md +[114 contributors]: https://github.com/apache/datafusion/blob/branch-53/dev/changelog/53.0.0.md#credits + +## Performance Improvements 🚀 + + + +**Figure 1**: Average and median normalized execution times for DataFusion 53.0.0 on ClickBench queries, compared to previous releases. +Query times are normalized using the ClickBench definition. See the +[DataFusion Benchmarking Page](https://alamb.github.io/datafusion-benchmarking/) +for more details. + +DataFusion 53 continues the project-wide focus on performance. This release +reduces planning overhead, skips more unnecessary I/O, and pushes more work +into earlier and cheaper stages of execution. + +### `LIMIT`-Aware Parquet Row Group Pruning + +DataFusion 53 includes a new optimization that makes Parquet pruning aware of +`LIMIT`. This optimization is described in full in the [limit pruning blog post]. If +DataFusion can prove that an entire row group matches the predicate, and those +fully matching row groups contain enough rows to satisfy the `LIMIT`, partially +matching row groups are skipped entirely. + +
+ +
Figure 2: Limit pruning runs between row-group pruning and page-index pruning.
+
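The rule described above can be modeled in a few lines of standalone Rust. This is a minimal sketch, not DataFusion's implementation: the `Match` enum, the `RowGroup` struct, and `prune_with_limit` are hypothetical names introduced only for illustration, and the sketch assumes a plain `LIMIT` with no ordering requirement.

```rust
/// Result of checking a predicate against a row group's statistics.
/// (Hypothetical type for this sketch, not a DataFusion API.)
#[derive(Clone, Copy, Debug, PartialEq)]
enum Match {
    /// Statistics prove that every row satisfies the predicate.
    Full,
    /// Some rows may satisfy the predicate; the data must be read to know.
    Partial,
    /// Statistics prove that no row satisfies the predicate.
    Empty,
}

/// A toy description of one Parquet row group.
#[derive(Debug)]
struct RowGroup {
    id: usize,
    num_rows: usize,
    matches: Match,
}

/// Returns the ids of the row groups to scan for `WHERE <pred> LIMIT <limit>`.
fn prune_with_limit(groups: &[RowGroup], limit: usize) -> Vec<usize> {
    // Ordinary statistics pruning: drop groups that cannot match at all.
    let candidates: Vec<&RowGroup> = groups
        .iter()
        .filter(|g| g.matches != Match::Empty)
        .collect();

    // Count the rows that are guaranteed to match.
    let full_rows: usize = candidates
        .iter()
        .filter(|g| g.matches == Match::Full)
        .map(|g| g.num_rows)
        .sum();

    if full_rows >= limit {
        // Fully matching groups alone satisfy the LIMIT, so partially
        // matching groups are skipped entirely.
        candidates
            .iter()
            .filter(|g| g.matches == Match::Full)
            .map(|g| g.id)
            .collect()
    } else {
        // Not enough guaranteed rows: every candidate must still be read.
        candidates.iter().map(|g| g.id).collect()
    }
}
```

With two fully matching groups of 100 and 50 rows, a `LIMIT` of 120 lets the sketch drop every partially matching group, while a `LIMIT` of 200 falls back to scanning all candidates.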
+ +Thanks to [@xudong963] for implementing this feature. Related PRs: [#18868] + +### Improved Filter Pushdown + +DataFusion 53 pushes filters down through more join types and through `UnionExec`, +and expands support for pushing down [dynamic filters]. More +pushdown means fewer rows flow into joins, repartitions, and later operators, +which reduces CPU, memory, and I/O. + +For example: + +```sql +SELECT * +FROM ( + SELECT * + FROM t1 + LEFT ANTI JOIN t2 ON t1.k = t2.k +) a +JOIN t1 b ON a.k = b.k +WHERE b.v = 1; +``` + +Now DataFusion can often transform the physical plan so filters and +[dynamic filters] are pushed deeper into the plan, even through subqueries and +nested joins. In this example, the filter on `b.v` helps produce dynamic filters +that can be pushed into both sides of the nested anti join. + +
+ +
Figure 3: DataFusion 53 pushes dynamic filters through subqueries and into both sides of nested joins.
+
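To make the effect on the example query concrete, here is a toy model in standalone Rust. It is a sketch under stated assumptions: rows are `(k, v)` pairs, the dynamic filter is modeled as a plain set of join keys, and `build_dynamic_filter`, `scan_with_filter`, and `left_anti_join` are hypothetical helpers, not DataFusion APIs. DataFusion's real dynamic filters are expressions built at runtime.

```rust
use std::collections::HashSet;

/// A row is a `(k, v)` pair, mirroring the columns used in the example query.
type Row = (i64, i64);

/// Models the `b` side of the outer join: apply the static filter `v = <v>`
/// and collect the join keys that survive. This key set plays the role of
/// the dynamic filter in this sketch.
fn build_dynamic_filter(build_side: &[Row], v: i64) -> HashSet<i64> {
    build_side
        .iter()
        .filter(|(_, val)| *val == v)
        .map(|(k, _)| *k)
        .collect()
}

/// Models a scan with the dynamic filter pushed into it: rows whose key
/// cannot join are dropped before they reach any later operator.
fn scan_with_filter(table: &[Row], keys: &HashSet<i64>) -> Vec<Row> {
    table
        .iter()
        .copied()
        .filter(|(k, _)| keys.contains(k))
        .collect()
}

/// `left LEFT ANTI JOIN right ON left.k = right.k`: keep the left rows
/// whose key has no match on the right.
fn left_anti_join(left: &[Row], right: &[Row]) -> Vec<Row> {
    let right_keys: HashSet<i64> = right.iter().map(|(k, _)| *k).collect();
    left.iter()
        .copied()
        .filter(|(k, _)| !right_keys.contains(k))
        .collect()
}
```

Filtering both anti-join inputs by the key set is safe in this model because a right-side row whose key is outside the set can only eliminate left-side rows that the outer join would discard anyway.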
+ +Thanks to [@nuno-faria], [@haohuaijin], and [@jackkleeman] for +driving this work. Related PRs: [#19918], [#20145], [#20192] + +### Faster Query Planning + +DataFusion 53 improves query planning performance by making immutable pieces of +execution plans cheaper to clone. This helps applications that need extremely +low latency, plan many or complex queries, or use prepared statements or +parameterized queries. In some benchmarks, overall execution time drops from +roughly 4-5 ms to about 100 us. + +Thanks to [@askalt] for leading this work. Related PRs: [#19792], [#19893] + +### Faster Functions + +DataFusion includes [235 built-in functions]. Improving the performance of these +functions benefits a wide range of workloads. This release improves the performance of 42 of those +functions, such as [strpos], [replace], [concat], [translate], [array_has], +[array_agg], [left], [right], and [case_when]. + +Thanks to the contributors who drove this work, especially [@neilconway], +[@theirix], [@lyne7-sc], [@kumarUjjawal], [@pepijnve], [@zhangxffff], and +[@UBarney]. + + +### Nested Field Pushdown + +DataFusion 53 pushes expressions such as `get_field` down the plan and into data +sources. This is especially important for nested data such as structs in +Parquet files. Instead of reading an entire struct column and then extracting +the field of interest, DataFusion 53 pushes the field extraction into the scan. + +For example, the following query reads a struct column `s` and extracts the +`label` field for rows where the `value` field is greater than 150: + +```sql +SELECT id, s['label'] +FROM t +WHERE s['value'] > 150; +``` + +
+ +
Figure 4: DataFusion 53 pushes field-access expressions closer to the scan.
+
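The I/O saving can be illustrated by counting the leaf columns a scan has to decode. The following standalone Rust sketch is not DataFusion code: `leaves_to_read`, the dotted `s.label` path notation, and the extra `payload` field used in the usage note are assumptions made for illustration.

```rust
/// Lists the leaf columns a scan must decode for a projection.
///
/// `schema` maps each top-level column to its struct fields (empty for a
/// scalar column). A projection entry is either a whole column (`"s"`) or a
/// single nested field written as `"s.label"` (hypothetical notation).
fn leaves_to_read(schema: &[(&str, Vec<&str>)], projection: &[&str]) -> Vec<String> {
    let mut leaves = Vec::new();
    for expr in projection {
        if expr.contains('.') {
            // Field access pushed into the scan: only this leaf is read.
            leaves.push(expr.to_string());
            continue;
        }
        for (name, fields) in schema {
            if name != expr {
                continue;
            }
            if fields.is_empty() {
                // Scalar column: it is its own leaf.
                leaves.push(name.to_string());
            } else {
                // Whole struct requested: every field must be decoded.
                leaves.extend(fields.iter().map(|f| format!("{name}.{f}")));
            }
        }
    }
    leaves
}
```

For a struct `s` with fields `label`, `value`, and a hypothetical `payload`, projecting `[id, s]` decodes four leaves, while projecting `[id, s.label, s.value]` decodes three and never touches `payload`.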
+ +Special thanks to [@adriangb] for designing and implementing this optimizer +work. Related PRs: [#20065], [#20117], [#20239] + +## New Features ✨ + +* **JSON Array File Support**: DataFusion 53 can now read JSON arrays such as + `[{...}, {...}]` directly as multiple rows, including streaming inputs from + object stores. Thanks to [@zhuqi-lucas] for implementing this feature. + Related PRs: [#19924] + +* **Support for `:` operator**: DataFusion can plan queries such as + `SELECT payload:'user_id' FROM events;`, enabling better [Parquet Variant] + support via [datafusion-variant]. Thanks to [@Samyak2]. Related PRs: + [#20717] + +* **New SQL**: DataFusion supports additional set-comparison subqueries, null-aware + anti join, and deletion predicates. Thanks to [@waynexia], [@viirya], and + [@askalt] for key contributions in this area. Related PRs: [#19109], + [#19635], [#20137] + +* **Spark-Compatible Functions**: This release includes almost 20 new or improved + Spark-compatible functions and behaviors in the [datafusion-spark crate]. + It includes functions such as [collect_list], [date_diff], + [from_utc_timestamp], [json_tuple], [arrays_zip], [bin], and [array_contains]. + Thanks to the contributors who drove this work, especially [@cht42], + [@CuteChuanChuan], [@SubhamSinghal], [@kazantsev-maksim], [@unknowntpo], + [@aryan-212], [@hsiang-c], and [@davidlghellin]. + +## Stability and Release Engineering 🦺 + +The community spent significant time this release cycle stabilizing the release +branch and improving the release process. While such improvements are not as +headline-friendly as new features, they are highly important for real +deployments. We are discussing ways to improve the process on [#21034] and would +welcome suggestions and contributions to help with release engineering work in +the future. 
+ +Thanks to [@comphead] for running this release, and to [@jonathanc-n], [@alamb], +[@xanderbailey], [@haohuaijin], [@friendlymatthew], [@fwojciec], +[@Kontinuation], [@nathanb9], and many others who helped stabilize the release +branch. + +## Upgrade Notes + +DataFusion 53 includes some breaking changes, including updates to the SQL +parser, optimizer behavior, and some physical-plan APIs. Please see the [upgrade +guide] and [changelog] for the full details before upgrading. + +## Known Issues + +A small number of issues were discovered after the 53.0.0 release, +and we expect to publish DataFusion 53.1.0 soon. See the [53.1.0 release tracking +issue] for the latest status. + + +## Thank You + +Thank you to everyone in the DataFusion community who contributed code, reviews, +testing, bug reports, documentation, and release engineering work for 53.0.0. +This release contains direct contributions from 114 different people, and we are +grateful for the time and effort that everyone put in to make it happen. 
+ +[limit pruning blog post]: https://datafusion.apache.org/blog/2026/03/20/limit-pruning/ +[@neilconway]: https://github.com/neilconway +[@xudong963]: https://github.com/xudong963 +[@nuno-faria]: https://github.com/nuno-faria +[@haohuaijin]: https://github.com/haohuaijin +[@jackkleeman]: https://github.com/jackkleeman +[@askalt]: https://github.com/askalt +[@adriangb]: https://github.com/adriangb +[@zhuqi-lucas]: https://github.com/zhuqi-lucas +[@linhr]: https://github.com/linhr +[@Samyak2]: https://github.com/Samyak2 +[@timsaucer]: https://github.com/timsaucer +[@waynexia]: https://github.com/waynexia +[@viirya]: https://github.com/viirya +[@jonathanc-n]: https://github.com/jonathanc-n +[@alamb]: https://github.com/alamb +[@xanderbailey]: https://github.com/xanderbailey +[@friendlymatthew]: https://github.com/friendlymatthew +[@fwojciec]: https://github.com/fwojciec +[@Kontinuation]: https://github.com/Kontinuation +[@nathanb9]: https://github.com/nathanb9 +[@comphead]: https://github.com/comphead +[@cht42]: https://github.com/cht42 +[@CuteChuanChuan]: https://github.com/CuteChuanChuan +[@SubhamSinghal]: https://github.com/SubhamSinghal +[@kazantsev-maksim]: https://github.com/kazantsev-maksim +[@unknowntpo]: https://github.com/unknowntpo +[@aryan-212]: https://github.com/aryan-212 +[@hsiang-c]: https://github.com/hsiang-c +[@davidlghellin]: https://github.com/davidlghellin +[@theirix]: https://github.com/theirix +[@lyne7-sc]: https://github.com/lyne7-sc +[@kumarUjjawal]: https://github.com/kumarUjjawal +[@pepijnve]: https://github.com/pepijnve +[@zhangxffff]: https://github.com/zhangxffff +[@UBarney]: https://github.com/UBarney + +[abs_ansi]: https://github.com/apache/datafusion/pull/18828 +[add_months]: https://github.com/apache/datafusion/pull/19711 +[array_agg]: https://github.com/apache/datafusion/pull/20504 +[array_contains]: https://github.com/apache/datafusion/pull/20685 +[array_distinct]: https://github.com/apache/datafusion/pull/20364 +[array_has]: 
https://github.com/apache/datafusion/pull/20374 +[array_has_any]: https://github.com/apache/datafusion/pull/20385 +[array_intersect]: https://github.com/apache/datafusion/pull/20243 +[array_position]: https://github.com/apache/datafusion/pull/20532 +[array_remove]: https://github.com/apache/datafusion/pull/19996 +[array_repeat_func]: https://github.com/apache/datafusion/pull/20049 +[array_repeat_spark]: https://github.com/apache/datafusion/pull/19702 +[array_to_string]: https://github.com/apache/datafusion/pull/20553 +[array_union]: https://github.com/apache/datafusion/pull/20243 +[arrays_zip]: https://github.com/apache/datafusion/pull/20440 +[ascii]: https://github.com/apache/datafusion/pull/19951 +[atan2]: https://github.com/apache/datafusion/pull/20336 +[base64]: https://github.com/apache/datafusion/pull/19968 +[bin]: https://github.com/apache/datafusion/pull/20479 +[bitmap_bit_position]: https://github.com/apache/datafusion/pull/20275 +[bitmap_bucket_number]: https://github.com/apache/datafusion/pull/20288 +[case_when]: https://github.com/apache/datafusion/pull/20097 +[chr]: https://github.com/apache/datafusion/pull/20073 +[collect_list]: https://github.com/apache/datafusion/pull/19699 +[collect_set]: https://github.com/apache/datafusion/pull/19699 +[concat]: https://github.com/apache/datafusion/pull/20317 +[date_diff]: https://github.com/apache/datafusion/pull/19845 +[from_utc_timestamp]: https://github.com/apache/datafusion/pull/19880 +[hash_table_lookup]: https://github.com/apache/datafusion/pull/19602 +[in_list]: https://github.com/apache/datafusion/pull/20528 +[initcap]: https://github.com/apache/datafusion/pull/20352 +[json_tuple]: https://github.com/apache/datafusion/pull/20412 +[left]: https://github.com/apache/datafusion/pull/19980 +[lpad]: https://github.com/apache/datafusion/pull/20278 +[negative]: https://github.com/apache/datafusion/pull/20006 +[regexp_like]: https://github.com/apache/datafusion/pull/20354 +[replace]: 
https://github.com/apache/datafusion/pull/20344 +[right]: https://github.com/apache/datafusion/pull/20069 +[round]: https://github.com/apache/datafusion/pull/19831 +[rpad]: https://github.com/apache/datafusion/pull/20278 +[signum]: https://github.com/apache/datafusion/pull/19871 +[slice]: https://github.com/apache/datafusion/pull/19811 +[strpos]: https://github.com/apache/datafusion/pull/20295 +[string_to_map]: https://github.com/apache/datafusion/pull/20120 +[substring]: https://github.com/apache/datafusion/pull/19805 +[to_array_of_size]: https://github.com/apache/datafusion/pull/20459 +[to_utc_timestamp]: https://github.com/apache/datafusion/pull/19880 +[translate]: https://github.com/apache/datafusion/pull/20305 +[trim]: https://github.com/apache/datafusion/pull/20328 +[unbase64]: https://github.com/apache/datafusion/pull/19968 +[unix_date]: https://github.com/apache/datafusion/pull/19892 +[unix_timestamp]: https://github.com/apache/datafusion/pull/19892 + +[#18868]: https://github.com/apache/datafusion/pull/18868 +[#18828]: https://github.com/apache/datafusion/pull/18828 +[#19109]: https://github.com/apache/datafusion/pull/19109 +[#19592]: https://github.com/apache/datafusion/pull/19592 +[#19635]: https://github.com/apache/datafusion/pull/19635 +[#19699]: https://github.com/apache/datafusion/pull/19699 +[#19702]: https://github.com/apache/datafusion/pull/19702 +[#19711]: https://github.com/apache/datafusion/pull/19711 +[#19792]: https://github.com/apache/datafusion/pull/19792 +[#19805]: https://github.com/apache/datafusion/pull/19805 +[#19811]: https://github.com/apache/datafusion/pull/19811 +[#19829]: https://github.com/apache/datafusion/pull/19829 +[#19845]: https://github.com/apache/datafusion/pull/19845 +[#19865]: https://github.com/apache/datafusion/pull/19865 +[#19880]: https://github.com/apache/datafusion/pull/19880 +[#19892]: https://github.com/apache/datafusion/pull/19892 +[#19893]: https://github.com/apache/datafusion/pull/19893 +[#19918]: 
https://github.com/apache/datafusion/pull/19918 +[#19924]: https://github.com/apache/datafusion/pull/19924 +[#19951]: https://github.com/apache/datafusion/pull/19951 +[#19996]: https://github.com/apache/datafusion/pull/19996 +[#19968]: https://github.com/apache/datafusion/pull/19968 +[#19984]: https://github.com/apache/datafusion/pull/19984 +[#19977]: https://github.com/apache/datafusion/pull/19977 +[#20049]: https://github.com/apache/datafusion/pull/20049 +[#20065]: https://github.com/apache/datafusion/pull/20065 +[#20006]: https://github.com/apache/datafusion/pull/20006 +[#20073]: https://github.com/apache/datafusion/pull/20073 +[#20097]: https://github.com/apache/datafusion/pull/20097 +[#20117]: https://github.com/apache/datafusion/pull/20117 +[#20137]: https://github.com/apache/datafusion/pull/20137 +[#20145]: https://github.com/apache/datafusion/pull/20145 +[#20192]: https://github.com/apache/datafusion/pull/20192 +[#20120]: https://github.com/apache/datafusion/pull/20120 +[#20243]: https://github.com/apache/datafusion/pull/20243 +[#20239]: https://github.com/apache/datafusion/pull/20239 +[#20275]: https://github.com/apache/datafusion/pull/20275 +[#20288]: https://github.com/apache/datafusion/pull/20288 +[#20278]: https://github.com/apache/datafusion/pull/20278 +[#20295]: https://github.com/apache/datafusion/pull/20295 +[#20305]: https://github.com/apache/datafusion/pull/20305 +[#20317]: https://github.com/apache/datafusion/pull/20317 +[#20323]: https://github.com/apache/datafusion/pull/20323 +[#20328]: https://github.com/apache/datafusion/pull/20328 +[#20336]: https://github.com/apache/datafusion/pull/20336 +[#20344]: https://github.com/apache/datafusion/pull/20344 +[#20354]: https://github.com/apache/datafusion/pull/20354 +[#20352]: https://github.com/apache/datafusion/pull/20352 +[#20374]: https://github.com/apache/datafusion/pull/20374 +[#20385]: https://github.com/apache/datafusion/pull/20385 +[#20364]: https://github.com/apache/datafusion/pull/20364 
+[#20412]: https://github.com/apache/datafusion/pull/20412 +[#20440]: https://github.com/apache/datafusion/pull/20440 +[#20461]: https://github.com/apache/datafusion/pull/20461 +[#20479]: https://github.com/apache/datafusion/pull/20479 +[#20459]: https://github.com/apache/datafusion/pull/20459 +[#20548]: https://github.com/apache/datafusion/pull/20548 +[#20504]: https://github.com/apache/datafusion/pull/20504 +[#20532]: https://github.com/apache/datafusion/pull/20532 +[#20538]: https://github.com/apache/datafusion/pull/20538 +[#20553]: https://github.com/apache/datafusion/pull/20553 +[#20528]: https://github.com/apache/datafusion/pull/20528 +[#20685]: https://github.com/apache/datafusion/pull/20685 +[#20717]: https://github.com/apache/datafusion/pull/20717 +[#20722]: https://github.com/apache/datafusion/pull/20722 +[#20726]: https://github.com/apache/datafusion/pull/20726 +[#20791]: https://github.com/apache/datafusion/pull/20791 +[#20792]: https://github.com/apache/datafusion/pull/20792 +[#20882]: https://github.com/apache/datafusion/pull/20882 +[#20883]: https://github.com/apache/datafusion/pull/20883 +[#20884]: https://github.com/apache/datafusion/pull/20884 +[#20890]: https://github.com/apache/datafusion/pull/20890 +[#20891]: https://github.com/apache/datafusion/pull/20891 +[#20892]: https://github.com/apache/datafusion/pull/20892 +[#20895]: https://github.com/apache/datafusion/pull/20895 +[#20898]: https://github.com/apache/datafusion/pull/20898 +[#20903]: https://github.com/apache/datafusion/pull/20903 +[#20918]: https://github.com/apache/datafusion/pull/20918 +[#20932]: https://github.com/apache/datafusion/pull/20932 +[dynamic filters]: https://datafusion.apache.org/blog/2025/09/10/dynamic-filters +[235 built-in functions]: https://datafusion.apache.org/user-guide/sql/scalar_functions.html +[datafusion-variant]: https://github.com/datafusion-contrib/datafusion-variant +[Parquet Variant]: 
https://parquet.apache.org/blog/2026/02/27/variant-type-in-apache-parquet-for-semi-structured-data/ +[#21034]: https://github.com/apache/datafusion/issues/21034 +[53.1.0 release tracking issue]: https://github.com/apache/datafusion/issues/21079 +[upgrade guide]: https://datafusion.apache.org/library-user-guide/upgrading/index.html +[datafusion-spark crate]: https://docs.rs/datafusion-spark/latest/datafusion_spark/index.html diff --git a/content/images/datafusion-53.0.0/field-access-pushdown.svg b/content/images/datafusion-53.0.0/field-access-pushdown.svg new file mode 100644 index 00000000..e0c7b90f --- /dev/null +++ b/content/images/datafusion-53.0.0/field-access-pushdown.svg @@ -0,0 +1,58 @@ + + + + + + + + + + Before + + + + ProjectionExec + s['label'] + + + + + FilterExec + s['value'] > 150 + + + + + DataSourceExec + projection=[id, s] + reads full struct column + + + After + + + + ProjectionExec + __label + + + + + FilterExec + __value > 150 + + + + + DataSourceExec + projection=[id, s['label'], s['value']] + field access extracted near the scan + diff --git a/content/images/datafusion-53.0.0/join-filter-pushdown.svg b/content/images/datafusion-53.0.0/join-filter-pushdown.svg new file mode 100644 index 00000000..141dd827 --- /dev/null +++ b/content/images/datafusion-53.0.0/join-filter-pushdown.svg @@ -0,0 +1,99 @@ + + + + + + + + + Before + Dynamic filters stop at the subquery boundary + + + + HashJoinExec + a.k = b.k + + + + + + Subquery a + + + HashJoinExec + LeftAnti + + + + + + DataSourceExec + t1 + + + DataSourceExec + t2 + + + FilterExec + b.v = 1 + + + + + DataSourceExec + t1 as b + + No dynamic filters reach the nested join inputs + + After + Dynamic filters pushed through the subquery into both scans + + + + HashJoinExec + a.k = b.k + + + + + + Subquery a + + + HashJoinExec + LeftAnti + + + + + + DataSourceExec + t1 IN <Dynamic Filter> + + + DataSourceExec + t2 IN <Dynamic Filter> + + + FilterExec + b.v = 1 + + + + + DataSourceExec + t1 as b + + 
Dynamic filters are pushed into both sides of the nested anti join + diff --git a/content/images/datafusion-53.0.0/performance_over_time_clickbench.png b/content/images/datafusion-53.0.0/performance_over_time_clickbench.png new file mode 100644 index 00000000..74f23c18 Binary files /dev/null and b/content/images/datafusion-53.0.0/performance_over_time_clickbench.png differ