---
layout: post
title: Apache DataFusion Comet 0.14.0 Release
date: 2026-03-18
author: pmc
categories: [subprojects]
---

<!--
{% comment %}
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to you under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
{% endcomment %}
-->

[TOC]

The Apache DataFusion PMC is pleased to announce version 0.14.0 of the [Comet](https://datafusion.apache.org/comet/) subproject.

Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for improved performance and efficiency without requiring any code changes.

This release covers approximately eight weeks of development work and is the result of merging 189 PRs from 21 contributors. See the [change log] for more information.

[change log]: https://github.com/apache/datafusion-comet/blob/main/dev/changelog/0.14.0.md

## Key Features

### Native Iceberg Improvements

Comet's fully native Iceberg integration received several enhancements:

**Per-Partition Plan Serialization**: `CometExecRDD` now supports per-partition plan data, reducing serialization overhead for native Iceberg scans and enabling dynamic partition pruning (DPP).

**Vended Credentials**: Native Iceberg scans now support passing vended credentials from the catalog, improving integration with cloud storage services.

**Upstream Reader Performance Improvements**: The Comet team contributed a number of [reader performance improvements](https://iceberg.apache.org/blog/apache-iceberg-rust-0.9.0-release/#reader-performance-improvements) to iceberg-rust 0.9.0, which Comet now uses. These improvements benefit all iceberg-rust users.

**Performance Optimizations**:

- Single-pass `FileScanTask` validation for reduced planning overhead
- Configurable data file concurrency via `spark.comet.scan.icebergNative.dataFileConcurrencyLimit`
- Channel-based executor thread parking instead of `yield_now()` for reduced CPU overhead
- Reuse of `CometConf` and native utility instances in batch decoding
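The concurrency limit above is set like any other Spark configuration, for example at submit time (a minimal sketch; the value `4` and the application jar name are illustrative, not recommendations):

```shell
# Illustrative only: cap how many Iceberg data files are read concurrently
# per native scan (4 is an arbitrary example value, not a tuned default).
spark-submit \
  --conf spark.comet.scan.icebergNative.dataFileConcurrencyLimit=4 \
  your-app.jar
```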

### Native Columnar-to-Row Conversion

Comet now uses a native columnar-to-row (C2R) conversion by default. This feature replaces Comet's JVM-based columnar-to-row transition with a native Rust implementation, reducing JVM memory overhead when data flows from Comet's native execution back to Spark operators that require row-based input.
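Conceptually, a columnar-to-row transition turns a set of equal-length column vectors into row tuples. A toy Python sketch of the idea (Comet's implementation is native Rust operating on Arrow buffers, not Python lists):

```python
def columnar_to_row(columns):
    """Convert equal-length columns (lists) into a list of row tuples."""
    # zip(*columns) pairs up the i-th element of every column into row i.
    return list(zip(*columns))

cols = [[1, 2, 3], ["a", "b", "c"]]
print(columnar_to_row(cols))  # [(1, 'a'), (2, 'b'), (3, 'c')]
```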

### New Expressions

This release adds support for the following expressions:

- Date/time functions: `make_date`, `next_day`
- String functions: `right`, `string_split`, `luhn_check`
- Math functions: `crc32`
- Map functions: `map_contains_key`, `map_from_entries`
- Conversion functions: `to_csv`
- Cast support: date to timestamp, numeric to timestamp, integer to binary, boolean to decimal, date to numeric
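For reference, `luhn_check` validates a digit string with the Luhn checksum (commonly used for card numbers). A minimal Python sketch of the algorithm, not Comet's native implementation:

```python
def luhn_check(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    if not number or not number.isdigit():
        return False
    total = 0
    # Walk digits right to left, doubling every second digit.
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # equivalent to summing the two digits of d
        total += d
    return total % 10 == 0

print(luhn_check("79927398713"))  # classic valid test number -> True
print(luhn_check("79927398710"))  # -> False
```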

### ANSI Mode Error Messages

ANSI SQL mode now produces proper error messages matching Spark's expected output, improving compatibility for workloads that rely on strict SQL error handling.

### DataFusion Configuration Passthrough

DataFusion session-level configurations can now be set directly from Spark using the `spark.comet.datafusion.*` prefix. This enables tuning DataFusion internals such as batch sizes and memory limits without modifying Comet code.
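A hedged example of what this might look like at submit time. `datafusion.execution.batch_size` is a real DataFusion session option, but the exact way key names map onto the `spark.comet.datafusion.*` prefix is an assumption here; verify against the Comet configuration docs for your version:

```shell
# Hypothetical mapping: the portion after the spark.comet.datafusion. prefix
# is assumed to name a DataFusion session option. Check the Comet docs for
# the exact prefix handling before relying on this key.
spark-submit \
  --conf spark.comet.datafusion.execution.batch_size=8192 \
  your-app.jar
```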

## Performance Improvements

This release includes extensive performance optimizations:

- **Sum aggregation**: Specialized implementations for each eval mode eliminate per-row mode checks
- **Contains expression**: SIMD-based scalar pattern search for faster string matching
- **Batch coalescing**: Reduced IPC schema overhead in `BufBatchWriter` by coalescing small batches
- **Tokio runtime**: Worker threads now initialize from `spark.executor.cores` for better resource utilization
- **Decimal expressions**: Optimized decimal arithmetic operations
- **Row-to-columnar transition**: Improved performance for JVM shuffle data conversion
- **Aligned pointer reads**: Optimized `SparkUnsafeRow` field accessors using aligned memory reads
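To illustrate the batch-coalescing idea behind the `BufBatchWriter` change: small batches are buffered and merged until a target row count is reached, so fewer, larger batches are written and per-batch overhead (such as repeated IPC schema headers) is amortized. A conceptual Python sketch, not Comet's actual writer:

```python
def coalesce_batches(batches, target_rows):
    """Merge consecutive small batches (lists of rows) so each emitted
    batch has at least target_rows rows, except possibly the last."""
    out, buf = [], []
    for batch in batches:
        buf.extend(batch)
        if len(buf) >= target_rows:
            out.append(buf)  # flush once the buffer reaches the target
            buf = []
    if buf:
        out.append(buf)      # emit any remaining partial batch
    return out

batches = [[1, 2], [3], [4, 5, 6], [7]]
print(coalesce_batches(batches, 4))  # [[1, 2, 3, 4, 5, 6], [7]]
```

Four input batches become two output batches, so any fixed per-batch cost is paid half as often.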

## Deprecations and Removals

The deprecated `native_comet` scan mode has been removed. Use `native_datafusion` instead. Note that the `native_iceberg_compat` scan mode is now deprecated and will be removed in a future release.
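Migrating means selecting the `native_datafusion` scan explicitly. A sketch of what that looks like at submit time, assuming the scan implementation is chosen via a `spark.comet.scan.impl` setting (this key name is an assumption; confirm it in the Comet configuration docs for your version):

```shell
# Assumed config key (verify against the Comet docs): select the
# native_datafusion scan implementation in place of the removed native_comet.
spark-submit \
  --conf spark.comet.scan.impl=native_datafusion \
  your-app.jar
```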

## Compatibility

This release upgrades to DataFusion 52.3, Arrow 57.3, and iceberg-rust 0.9.0. Published binaries now target x86-64-v3 and neoverse-n1 CPU architectures for improved performance on modern hardware.

Supported platforms include Spark 3.4.3, 3.5.4-3.5.8, and Spark 4.0.x with various JDK and Scala combinations.

The community encourages users to test Comet with existing Spark workloads and welcomes contributions to ongoing development.