Search before asking
- I searched in the issues and found nothing similar.
Description
Introduce server-side record batch filtering using batch-level statistics (min/max values, null counts) that are already available in the V1 log batch format. When a client sends a fetch request with a filter predicate, the server evaluates the predicate against each batch's statistics and skips batches that cannot contain matching records.
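A minimal sketch of the batch-level pruning rule. It assumes a single `long` column and uses hypothetical `ColumnStats` and predicate types that are not Fluss classes. A batch is skipped only when its min/max and null count prove that no row can match. Any other answer means "might match", and the batch is returned for row-level filtering on the client.

```java
/** Hypothetical sketch: decide from batch statistics alone whether a batch may contain matching rows. */
public final class BatchPruning {

    /** Per-batch statistics for one long column (hypothetical shape, not the Fluss API). */
    public record ColumnStats(long min, long max, long nullCount, long rowCount) {}

    /** A tiny predicate language over a single long column (hypothetical). */
    public sealed interface Predicate permits Eq, Gt, Lt, IsNull, And {}
    public record Eq(long value) implements Predicate {}
    public record Gt(long value) implements Predicate {}
    public record Lt(long value) implements Predicate {}
    public record IsNull() implements Predicate {}
    public record And(Predicate left, Predicate right) implements Predicate {}

    /**
     * Returns false only when the statistics prove that no row can match.
     * True means "might match", so the batch must still be sent to the client.
     */
    public static boolean mightMatch(Predicate p, ColumnStats s) {
        // min/max only describe non-null values; an all-null batch has no usable range.
        boolean hasNonNull = s.nullCount() < s.rowCount();
        if (p instanceof Eq eq) {
            return hasNonNull && s.min() <= eq.value() && eq.value() <= s.max();
        } else if (p instanceof Gt gt) {
            return hasNonNull && s.max() > gt.value();
        } else if (p instanceof Lt lt) {
            return hasNonNull && s.min() < lt.value();
        } else if (p instanceof IsNull) {
            return s.nullCount() > 0;
        } else if (p instanceof And and) {
            // A conjunction can be pruned if either side alone rules the batch out.
            return mightMatch(and.left(), s) && mightMatch(and.right(), s);
        }
        return true; // unknown predicate: be conservative and keep the batch
    }
}
```

For example, a batch with `min = 10`, `max = 20` and no nulls is skipped for `Eq(5)` or `Gt(20)` but kept for `Eq(15)`. Pruning is one-sided by design: a false "might match" only costs extra bytes, while a false "cannot match" would silently drop records.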
Key points:
- Batch-level filtering, not row-level: the server uses batch statistics to skip entire batches. The client still performs row-level filtering on the returned batches.
- ARROW format only: only the ARROW log format carries batch-level statistics (V1+ magic). COMPACTED/INDEXED formats fall back to unfiltered reads.
- Schema evolution safe: a `PredicateSchemaResolver` adapts the predicate when the batch schema differs from the predicate schema, with a safe fallback (include the batch) on any failure.
- Offset advancement: when all batches in a fetch are filtered out, the server returns a `filteredEndOffset` so the client can advance past the filtered range without re-fetching.
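A rough sketch of how the fetch path could combine these points. Everything here is hypothetical: `Batch`, `GreaterThan`, `FetchResult`, and `resolve`, which stands in for `PredicateSchemaResolver` and only checks that the predicate's column exists in the batch's schema. It shows the control flow: resolve per batch, include the batch on any resolution failure, skip when the statistics rule it out, and report a resume offset when every batch was skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch of server-side batch skipping during a fetch (not the Fluss implementation). */
public final class FilteredFetch {

    /** A log batch as seen by the server: offsets, schema id, and per-column {min, max} stats. */
    public record Batch(long baseOffset, long nextOffset, int schemaId,
                        Map<String, long[]> minMaxByColumn) {}

    /** Predicate "column > threshold", expressed against the predicate schema's column name. */
    public record GreaterThan(String column, long threshold) {}

    /** Batches to send, plus an offset the client may advance to when none survive. */
    public record FetchResult(List<Batch> batches, Optional<Long> filteredEndOffset) {}

    /**
     * Hypothetical stand-in for a schema resolver: adapts the predicate to a batch's schema.
     * Here it only checks that the column exists; empty means "could not resolve".
     */
    public static Optional<GreaterThan> resolve(GreaterThan p, int batchSchemaId,
                                                Map<Integer, List<String>> columnsBySchema) {
        List<String> cols = columnsBySchema.get(batchSchemaId);
        if (cols == null || !cols.contains(p.column())) {
            return Optional.empty();
        }
        return Optional.of(p);
    }

    public static FetchResult fetch(List<Batch> batches, GreaterThan predicate,
                                    Map<Integer, List<String>> columnsBySchema) {
        List<Batch> kept = new ArrayList<>();
        for (Batch b : batches) {
            Optional<GreaterThan> resolved = resolve(predicate, b.schemaId(), columnsBySchema);
            if (resolved.isEmpty()) {
                kept.add(b); // safe fallback: cannot adapt the predicate, so include the batch
                continue;
            }
            GreaterThan r = resolved.get();
            long[] minMax = b.minMaxByColumn().get(r.column());
            // Skip only when stats exist and prove no row can satisfy "column > threshold".
            boolean canSkip = minMax != null && minMax[1] <= r.threshold();
            if (!canSkip) {
                kept.add(b);
            }
        }
        Optional<Long> filteredEnd = Optional.empty();
        if (kept.isEmpty() && !batches.isEmpty()) {
            // Every batch was skipped: tell the client where to resume instead of re-fetching.
            filteredEnd = Optional.of(batches.get(batches.size() - 1).nextOffset());
        }
        return new FetchResult(kept, filteredEnd);
    }
}
```

Batches that survive are still only "might match", so the client-side row-level filter remains necessary. Falling back to inclusion whenever resolution fails keeps schema changes from ever causing records to be dropped.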
Willingness to contribute
- I'm willing to submit a PR!