@geruh commented Jan 16, 2026

Related to #2255.

Rationale for this change

This PR splits out one piece of the larger delete file index (DFI) work in #2255. It removes the existing logic that matches delete files to data files and replaces it with an index over the delete files for efficient lookup.

The previous implementation:

  1. Scanned all delete files with sequence number >= data file's sequence number
  2. Created a new _InclusiveMetricsEvaluator instance for each data file
  3. Evaluated every candidate delete file against the data file's path
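For comparison, the old matching strategy behaves roughly like the linear filter below. This is a minimal sketch, not the removed code: DeleteFile, DataFile, may_reference and match_deletes_linear are hypothetical names, and may_reference stands in for constructing an _InclusiveMetricsEvaluator per data file and evaluating it against each delete file's file_path bounds.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DeleteFile:
    path: str
    sequence_number: int
    # Hypothetical stand-in for the delete file's file_path column bounds:
    # a single referenced path, or None if it may touch any data file.
    referenced_path: Optional[str] = None


@dataclass(frozen=True)
class DataFile:
    path: str
    sequence_number: int


def may_reference(delete: DeleteFile, data_file: DataFile) -> bool:
    # Stand-in for building an _InclusiveMetricsEvaluator on
    # file_path == data_file.path and evaluating it against the delete file.
    return delete.referenced_path is None or delete.referenced_path == data_file.path


def match_deletes_linear(data_file: DataFile, delete_files: List[DeleteFile]) -> List[DeleteFile]:
    # Every delete file is inspected for every data file.
    return [
        d
        for d in delete_files
        if d.sequence_number >= data_file.sequence_number and may_reference(d, data_file)
    ]
```

Because each data file triggers its own pass over the candidate deletes, planning cost grows roughly with the number of data files times the number of delete files.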

This PR replaces that workflow with a DeleteFileIndex that:

  • Indexes path-specific deletion vectors (DVs) by the data file path they reference
  • Indexes partition-scoped deletes by (spec_id, partition record)
  • Uses bisect_left for sequence number filtering
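The indexing scheme above can be sketched roughly as follows. This is a simplified illustration, not the PR's code: the class and field names (SimpleDeleteFileIndex, referenced_data_file, and so on) are hypothetical, partition records are modeled as hashable tuples, and it omits checks a real index still needs, such as pruning partition-scoped position deletes by their file_path bounds and handling equality or global deletes.

```python
from bisect import bisect_left
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class DeleteFile:
    path: str
    sequence_number: int
    spec_id: int
    partition: Tuple[object, ...]  # hypothetical: partition record as a hashable tuple
    referenced_data_file: Optional[str] = None  # set only for DVs (hypothetical field name)


@dataclass(frozen=True)
class DataFile:
    path: str
    sequence_number: int
    spec_id: int
    partition: Tuple[object, ...]


class _SeqSortedDeletes:
    """Delete files in one group, sorted by sequence number."""

    def __init__(self, files: List[DeleteFile]) -> None:
        self.files = sorted(files, key=lambda f: f.sequence_number)
        self.seqs = [f.sequence_number for f in self.files]

    def at_or_after(self, seq: int) -> List[DeleteFile]:
        # bisect_left returns the first index whose sequence number is >= seq,
        # so the slice holds exactly the deletes with sequence number >= seq.
        return self.files[bisect_left(self.seqs, seq):]


class SimpleDeleteFileIndex:
    """Hypothetical index: DVs keyed by data file path, other deletes by (spec_id, partition)."""

    def __init__(self, delete_files: List[DeleteFile]) -> None:
        by_path: Dict[str, List[DeleteFile]] = defaultdict(list)
        by_partition: Dict[Tuple[int, Tuple[object, ...]], List[DeleteFile]] = defaultdict(list)
        for f in delete_files:
            if f.referenced_data_file is not None:
                by_path[f.referenced_data_file].append(f)
            else:
                by_partition[(f.spec_id, f.partition)].append(f)
        self._by_path = {k: _SeqSortedDeletes(v) for k, v in by_path.items()}
        self._by_partition = {k: _SeqSortedDeletes(v) for k, v in by_partition.items()}

    def for_data_file(self, data_file: DataFile) -> List[DeleteFile]:
        matches: List[DeleteFile] = []
        # Path-specific DVs: one dict probe on the data file's path.
        path_group = self._by_path.get(data_file.path)
        if path_group is not None:
            matches.extend(path_group.at_or_after(data_file.sequence_number))
        # Partition-scoped deletes: one dict probe on (spec_id, partition).
        part_group = self._by_partition.get((data_file.spec_id, data_file.partition))
        if part_group is not None:
            matches.extend(part_group.at_or_after(data_file.sequence_number))
        return matches
```

With this layout, each lookup costs two dictionary probes plus one binary search per matching group, instead of a scan over every delete file in the table.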

This aligns with the Java implementation of DeleteFileIndex while building on the existing Python infrastructure.

Are these changes tested?

Yes. New tests were added, and existing tests continue to pass.

Are there any user-facing changes?

No
