Fix duplicate filtering path in Arrow task batches (#3448)
Closes #3272
## What this changes
This PR updates the Arrow scan path in `_task_to_record_batches` to
avoid redundant filtering when there are no positional deletes.
- Keeps predicate pushdown in `Scanner.from_fragment` as the only filter
path when `positional_deletes` is absent.
- Applies `current_batch.filter(pyarrow_filter)` only in the
positional-delete path, after deletes are applied.
- Preserves empty-batch handling after both delete application and
conditional filtering.
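
The control flow described above can be sketched in plain Python. This is a simplified model, not the actual `_task_to_record_batches` code: batches are lists of dicts instead of Arrow record batches, and `scan`, `task_to_batches`, and `keep_odd` are hypothetical names used only for illustration. It also assumes that the scanner receives no predicate when positional deletes are present, so that row positions still line up with the file.

```python
from typing import Callable, Iterable, Iterator, Optional

Row = dict
Batch = list  # a batch is modelled as a list of rows
Predicate = Callable[[Row], bool]


def scan(batches: Iterable[Batch], predicate: Optional[Predicate]) -> Iterator[Batch]:
    """Hypothetical stand-in for a scanner with predicate pushdown."""
    for batch in batches:
        yield batch if predicate is None else [row for row in batch if predicate(row)]


def task_to_batches(
    batches: Iterable[Batch],
    predicate: Optional[Predicate] = None,
    positional_deletes: Optional[Iterable[int]] = None,
) -> Iterator[Batch]:
    deletes = set(positional_deletes or ())
    # Assumption: push the predicate down only when there are no positional
    # deletes, so that positions in the scanned batches match file positions.
    pushdown = None if deletes else predicate

    next_index = 0
    for batch in scan(batches, pushdown):
        current = batch
        if deletes:
            start = next_index
            next_index += len(batch)
            # Drop rows whose file position is marked as deleted.
            current = [
                row for pos, row in enumerate(batch, start=start) if pos not in deletes
            ]
            # Skip batches emptied by the deletes.
            if not current:
                continue
            # Filter only here, after the deletes have been applied.
            if predicate is not None:
                current = [row for row in current if predicate(row)]
        # Skip batches emptied by the scan-level or post-delete filter.
        if not current:
            continue
        yield current


data = [[{"id": 0}, {"id": 1}], [{"id": 2}, {"id": 3}]]


def keep_odd(row: Row) -> bool:
    return row["id"] % 2 == 1
```

Without deletes, the predicate runs once, inside `scan`. With `positional_deletes=[1]`, the first batch becomes empty after deletes and filtering and is skipped, while the second batch yields only the row with `id` 3.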
## Why
The previous flow could perform an extra table-level refilter even when
the scanner already applied the predicate. This change removes that
stale workaround path while keeping correct behavior for positional
delete scenarios.
## Tests
Added regression coverage in `tests/io/test_pyarrow.py`:
- `test_task_to_record_batches_filter_without_positional_deletes_avoids_table_refilter`
- `test_task_to_record_batches_filter_with_positional_deletes_handles_empty_batch`
Validated locally:
- `python -m pytest tests/io/test_pyarrow.py -q -k "task_to_record_batches_nanos or filter_without_positional_deletes_avoids_table_refilter or filter_with_positional_deletes_handles_empty_batch"`
- `make lint`
---------
Co-authored-by: Gayathri Srividya Rajavarapu <gayathrir@Gayathris-MacBook-Air.local>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Fokko Driesprong <fokko@apache.org>