---
layout: post
title: Apache DataFusion Comet 0.14.0 Release
date: 2026-03-18
author: pmc
categories: [subprojects]
---

<!--
{% comment %}
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to you under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
{% endcomment %}
-->

[TOC]

The Apache DataFusion PMC is pleased to announce version 0.14.0 of the [Comet](https://datafusion.apache.org/comet/) subproject.

Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for
improved performance and efficiency without requiring any code changes.

This release covers approximately eight weeks of development work and is the result of merging 189 PRs from 21
contributors. See the [change log] for more information.

[change log]: https://github.com/apache/datafusion-comet/blob/main/dev/changelog/0.14.0.md

## Key Features

### Native Iceberg Improvements

Comet's fully-native Iceberg integration received several enhancements:

**Per-Partition Plan Serialization**: `CometExecRDD` now supports per-partition plan data, reducing serialization
overhead for native Iceberg scans and enabling dynamic partition pruning (DPP).

**Vended Credentials**: Native Iceberg scans now support passing vended credentials from the catalog, improving
integration with cloud storage services.

**Upstream Reader Performance Improvements**: The Comet team contributed a number of
[reader performance improvements](https://iceberg.apache.org/blog/apache-iceberg-rust-0.9.0-release/#reader-performance-improvements)
to iceberg-rust 0.9.0, which Comet now uses. These improvements benefit all iceberg-rust users.

**Performance Optimizations**:

- Single-pass `FileScanTask` validation for reduced planning overhead
- Configurable data file concurrency via `spark.comet.scan.icebergNative.dataFileConcurrencyLimit`
- Channel-based executor thread parking instead of `yield_now()` for reduced CPU overhead
- Reuse of `CometConf` and native utility instances in batch decoding
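
The thread-parking item above can be sketched in miniature: rather than busy-polling (the `yield_now()` pattern), a worker blocks on a channel receive and consumes no CPU until work arrives. A toy illustration in plain Python (Comet's actual implementation is in Rust on Tokio):

```python
import queue
import threading

# Toy sketch of channel-based thread parking: the worker blocks on
# tasks.get() instead of spinning/yielding in a poll loop, so it uses
# no CPU while idle. A None sentinel tells the worker to shut down.

def worker(tasks: queue.Queue, results: list) -> None:
    while True:
        item = tasks.get()        # parks the thread until work arrives
        if item is None:
            return
        results.append(item * 2)  # stand-in for "decode a batch"

tasks: queue.Queue = queue.Queue()
results: list = []
t = threading.Thread(target=worker, args=(tasks, results))
t.start()

for i in range(3):
    tasks.put(i)
tasks.put(None)  # shut the worker down
t.join()

print(results)  # [0, 2, 4]
```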

### Native Columnar-to-Row Conversion

Comet now uses a native columnar-to-row (C2R) conversion by default. This
feature replaces Comet's JVM-based columnar-to-row transition with a native Rust implementation, reducing JVM memory overhead
when data flows from Comet's native execution back to Spark operators that require row-based input.
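
Conceptually, a columnar-to-row transition assembles the i-th value of every column into the i-th row. A toy sketch in plain Python (Comet's real C2R operates on Arrow buffers in Rust; this only shows the shape of the transform):

```python
# Toy columnar-to-row (C2R) sketch: turn per-column vectors into
# per-row tuples. Illustrative only; not Comet's actual code path.

columns = {
    "id":   [1, 2, 3],
    "name": ["a", "b", "c"],
}

rows = list(zip(*columns.values()))  # i-th element of each column -> row i

print(rows)  # [(1, 'a'), (2, 'b'), (3, 'c')]
```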

### New Expressions

This release adds support for the following expressions:

- Date/time functions: `make_date`, `next_day`
- String functions: `right`, `string_split`, `luhn_check`
- Math functions: `crc32`
- Map functions: `map_contains_key`, `map_from_entries`
- Conversion functions: `to_csv`
- Cast support: date to timestamp, numeric to timestamp, integer to binary, boolean to decimal, date to numeric
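
As one example from the list, `luhn_check` validates a digit string with the standard Luhn checksum. A plain-Python reference of the algorithm (Comet implements it natively in Rust; the edge-case handling here is illustrative, not a spec of Spark's behavior):

```python
# Reference Luhn checksum, the algorithm behind luhn_check: starting
# from the rightmost digit, double every second digit (subtracting 9
# when the result exceeds 9); valid if the total is divisible by 10.

def luhn_check(number: str) -> bool:
    if not number or not number.isdigit():
        return False
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:      # every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

print(luhn_check("4111111111111111"))  # True  (well-known test card number)
print(luhn_check("4111111111111112"))  # False
```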

### ANSI Mode Error Messages

ANSI SQL mode now produces proper error messages matching Spark's expected output, improving compatibility for
workloads that rely on strict SQL error handling.

### DataFusion Configuration Passthrough

DataFusion session-level configurations can now be set directly from Spark using the `spark.comet.datafusion.*`
prefix. This enables tuning DataFusion internals such as batch sizes and memory limits without modifying Comet code.
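
Mechanically, the pass-through amounts to stripping the `spark.comet.datafusion.` prefix and handing the remainder to the DataFusion session configuration. A hedged sketch of that mapping in plain Python (the key names used here are illustrative; consult the Comet documentation for the exact set of keys it forwards):

```python
# Hedged sketch: Spark conf entries under spark.comet.datafusion.* are
# forwarded to DataFusion's session config with the prefix stripped.
# Key names are illustrative, not an authoritative list.

PREFIX = "spark.comet.datafusion."

spark_conf = {
    "spark.comet.enabled": "true",                           # not forwarded
    "spark.comet.datafusion.execution.batch_size": "16384",  # forwarded
}

datafusion_conf = {
    key[len(PREFIX):]: value
    for key, value in spark_conf.items()
    if key.startswith(PREFIX)
}

print(datafusion_conf)  # {'execution.batch_size': '16384'}
```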

## Performance Improvements

This release includes extensive performance optimizations:

- **Sum aggregation**: Specialized implementations for each eval mode eliminate per-row mode checks
- **Contains expression**: SIMD-based scalar pattern search for faster string matching
- **Batch coalescing**: Reduced IPC schema overhead in `BufBatchWriter` by coalescing small batches
- **Tokio runtime**: Worker threads now initialize from `spark.executor.cores` for better resource utilization
- **Decimal expressions**: Optimized decimal arithmetic operations
- **Row-to-columnar transition**: Improved performance for JVM shuffle data conversion
- **Aligned pointer reads**: Optimized `SparkUnsafeRow` field accessors using aligned memory reads
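
The sum-aggregation item reflects a common specialization pattern: select the eval-mode-specific accumulator once, up front, rather than branching on the mode inside the row loop. A toy sketch in plain Python (mode names mirror Spark's legacy/ANSI eval modes; overflow handling is elided):

```python
# Toy sketch of per-eval-mode specialization: the mode branch runs once,
# when the accumulator is chosen, not once per row. Illustrative only;
# Comet's real accumulators are Rust implementations.

def sum_legacy(values):
    # Legacy mode would wrap around on overflow (elided in this sketch).
    return sum(values)

def sum_ansi(values):
    # ANSI mode would raise on overflow (elided in this sketch).
    return sum(values)

_IMPLS = {"legacy": sum_legacy, "ansi": sum_ansi}

def make_sum(eval_mode: str):
    return _IMPLS[eval_mode]  # one dispatch, zero per-row mode checks

accumulate = make_sum("ansi")
print(accumulate([1, 2, 3]))  # 6
```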

## Deprecations and Removals

The deprecated `native_comet` scan mode has been removed. Use `native_datafusion` instead. Note
that the `native_iceberg_compat` scan mode is now deprecated and will be removed in a future release.
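
If your jobs pinned the removed mode explicitly, update the scan implementation setting. A hedged PySpark example, assuming the `spark.comet.scan.impl` key (verify the key name against the Comet configuration guide for your version):

```python
# Hedged example: select the native_datafusion scan mode explicitly.
# The spark.comet.scan.impl key is assumed from Comet's configuration
# guide; check the docs for your Comet version.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.comet.enabled", "true")
    .config("spark.comet.scan.impl", "native_datafusion")  # was: native_comet
    .getOrCreate()
)
```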

## Compatibility

This release upgrades to DataFusion 52.3, Arrow 57.3, and iceberg-rust 0.9.0. Published binaries now target
x86-64-v3 and neoverse-n1 CPU architectures for improved performance on modern hardware.

Supported platforms include Spark 3.4.3, 3.5.4-3.5.8, and Spark 4.0.x with various JDK and Scala combinations.

The community encourages users to test Comet with existing Spark workloads and welcomes contributions to ongoing development.