Add server-side record batch filter execution #2950

@platinumhamburg

Search before asking

  • I searched in the issues and found nothing similar.

Description

Introduce server-side record batch filtering using batch-level statistics (min/max values, null counts) that are already available in the V1 log batch format. When a client sends a fetch request with a filter predicate, the server evaluates the predicate against each batch's statistics and skips batches that cannot contain matching records.
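To make the skip decision concrete, here is a minimal sketch of how a predicate might be tested against batch statistics rather than rows. All type and method names (`ColumnStats`, `StatsPredicate`, `mightMatch`) are hypothetical and chosen for illustration; they are not the project's actual API, and the sketch assumes a single long-typed column.

```java
import java.util.List;

// Illustrative only: these types are hypothetical, not the project's actual API.
final class BatchStatsFilterSketch {

    /** Statistics for one long-typed column within a single batch. */
    static final class ColumnStats {
        final long min;
        final long max;
        final long nullCount;
        final long rowCount;

        ColumnStats(long min, long max, long nullCount, long rowCount) {
            this.min = min;
            this.max = max;
            this.nullCount = nullCount;
            this.rowCount = rowCount;
        }

        /** True when at least one row in the batch holds a non-null value. */
        boolean hasNonNull() {
            return nullCount < rowCount;
        }
    }

    /** Tested against statistics; returns false only if no row can match. */
    interface StatsPredicate {
        boolean mightMatch(ColumnStats stats);
    }

    static StatsPredicate greaterThan(long literal) {
        return s -> s.hasNonNull() && s.max > literal;
    }

    static StatsPredicate equalTo(long literal) {
        return s -> s.hasNonNull() && s.min <= literal && literal <= s.max;
    }

    static StatsPredicate isNull() {
        return s -> s.nullCount > 0;
    }

    /** Counts the batches the server would still have to return. */
    static long countKept(List<ColumnStats> batches, StatsPredicate predicate) {
        return batches.stream().filter(predicate::mightMatch).count();
    }

    public static void main(String[] args) {
        List<ColumnStats> batches = List.of(
                new ColumnStats(0, 99, 0, 100),
                new ColumnStats(100, 199, 5, 100),
                new ColumnStats(200, 299, 0, 100));
        // Only the last batch can contain a value greater than 250.
        System.out.println(countKept(batches, greaterThan(250)));
        // Only the second batch reports any nulls.
        System.out.println(countKept(batches, isNull()));
    }
}
```

A `true` result only means the batch may contain a match, which is why row-level filtering is still needed on whatever is returned.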

Key points:

  • Batch-level filtering, not row-level: the server uses batch statistics to skip entire batches. The client still performs row-level filtering on the returned batches.
  • ARROW format only: only the ARROW log format carries batch-level statistics (batch magic V1 or later). The COMPACTED and INDEXED formats fall back to unfiltered reads.
  • Schema evolution safe: a PredicateSchemaResolver adapts the predicate when the batch schema differs from the predicate schema, with safe fallback (include the batch) on any failure.
  • Offset advancement: when all batches in a fetch are filtered out, the server returns a filteredEndOffset so the client can advance past the filtered range without re-fetching.
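
The fallback and offset-advancement points above could be sketched roughly as follows. This is an assumption-laden illustration, not the proposed implementation: `Batch`, `FetchResult`, `fetch`, and `nextFetchOffset` are hypothetical names, and using `-1` to mean "no filteredEndOffset" is a convention invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Illustrative only: hypothetical types, not the project's actual API.
final class FilteredFetchSketch {

    /** A log batch with its offset range and min/max stats for one column. */
    static final class Batch {
        final long baseOffset;
        final long lastOffset;
        final long min;
        final long max;

        Batch(long baseOffset, long lastOffset, long min, long max) {
            this.baseOffset = baseOffset;
            this.lastOffset = lastOffset;
            this.min = min;
            this.max = max;
        }

        long nextOffset() {
            return lastOffset + 1;
        }
    }

    static final class FetchResult {
        final List<Batch> batches;
        /** End of the scanned range if everything was skipped, else -1 (assumed sentinel). */
        final long filteredEndOffset;

        FetchResult(List<Batch> batches, long filteredEndOffset) {
            this.batches = batches;
            this.filteredEndOffset = filteredEndOffset;
        }
    }

    /** Server side: skip batches whose statistics rule out a match. */
    static FetchResult fetch(List<Batch> log, long fetchOffset, Predicate<Batch> mightMatch) {
        List<Batch> kept = new ArrayList<>();
        long scannedEnd = fetchOffset;
        for (Batch b : log) {
            if (b.lastOffset < fetchOffset) {
                continue; // entirely before the requested offset
            }
            scannedEnd = b.nextOffset();
            boolean include;
            try {
                include = mightMatch.test(b);
            } catch (RuntimeException e) {
                include = true; // safe fallback: include the batch on any failure
            }
            if (include) {
                kept.add(b);
            }
        }
        boolean allFiltered = kept.isEmpty() && scannedEnd > fetchOffset;
        return new FetchResult(kept, allFiltered ? scannedEnd : -1L);
    }

    /** Client side: decide where the next fetch should start. */
    static long nextFetchOffset(long currentOffset, FetchResult result) {
        if (!result.batches.isEmpty()) {
            return result.batches.get(result.batches.size() - 1).nextOffset();
        }
        return result.filteredEndOffset >= 0 ? result.filteredEndOffset : currentOffset;
    }
}
```

Without the `filteredEndOffset` branch in `nextFetchOffset`, a client whose predicate matches nothing in the scanned range would keep requesting the same offset.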

Willingness to contribute

  • I'm willing to submit a PR!
