[python] Add schema short-circuit to SplitRead and FileScanner read paths by MgjLLL · Pull Request #8217 · apache/paimon

MgjLLL · 2026-06-12T06:45:16Z

Purpose

Fix redundant filesystem I/O in SplitRead and FileScanner when reading schema.

SplitRead has 3 call sites that unconditionally call schema_manager.get_schema(schema_id), even when schema_id == table.table_schema.id and the schema is therefore already in memory. This causes unnecessary filesystem reads in the common case (no schema evolution).

Java equivalent (RawFileSplitRead.createFileReader()) short-circuits with:

schemaId == schema.id() ? schema : schemaManager.schema(schemaId)

Changes

split_read.py: Add _resolve_schema() method that returns in-memory schema when id matches, replacing 3 direct get_schema() calls in raw_reader_supplier, _get_fields_and_predicate, and _file_read_fields
file_scanner.py: Add _schema_fields() method with same short-circuit pattern for SimpleStatsEvolutions
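
The short-circuit described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: TableSchema, CountingSchemaManager, and resolve_schema are hypothetical stand-ins, and the read counter only simulates filesystem access.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TableSchema:
    # Hypothetical stand-in for the real table schema class.
    id: int
    fields: List[str] = field(default_factory=list)


class CountingSchemaManager:
    """Hypothetical stand-in that counts simulated filesystem reads."""

    def __init__(self, schemas: Dict[int, TableSchema]):
        self._schemas = schemas
        self.reads = 0

    def get_schema(self, schema_id: int) -> TableSchema:
        self.reads += 1  # each call stands for one filesystem read
        return self._schemas[schema_id]


def resolve_schema(current: TableSchema,
                   manager: CountingSchemaManager,
                   schema_id: int) -> TableSchema:
    # Short-circuit: reuse the in-memory schema when the id matches.
    if schema_id == current.id:
        return current
    # Otherwise fall back to the schema manager (schema evolution case).
    return manager.get_schema(schema_id)


current = TableSchema(id=1, fields=["a", "b"])
manager = CountingSchemaManager({0: TableSchema(id=0, fields=["a"]), 1: current})

same = resolve_schema(current, manager, 1)   # served from memory, no read
older = resolve_schema(current, manager, 0)  # delegated, one simulated read
```

Comparing ids with == rather than relying on truthiness keeps a schema id of 0 on the short-circuit path.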

Tests

Added file_scanner_schema_fields_test.py with 3 test cases covering the short-circuit, delegation, and the zero-id edge case
All existing tests pass (106 passed)
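
The three cases listed above might look roughly like the sketch below. It is not the content of file_scanner_schema_fields_test.py: schema_fields and make_table are hypothetical helpers written for illustration, and the mocks only assume the attributes named in this description (table_schema.id, table_schema.fields, get_schema).

```python
from unittest.mock import Mock


def schema_fields(table, schema_manager, schema_id):
    # Hypothetical helper mirroring the described short-circuit.
    if schema_id == table.table_schema.id:
        return table.table_schema.fields
    return schema_manager.get_schema(schema_id).fields


def make_table(schema_id, fields):
    # Build a mock table whose current schema has the given id and fields.
    table = Mock()
    table.table_schema = Mock(id=schema_id, fields=fields)
    return table


# Case 1: short-circuit, the schema manager must not be touched.
manager = Mock()
table = make_table(3, ["a", "b"])
short = schema_fields(table, manager, 3)
manager.get_schema.assert_not_called()

# Case 2: delegation, a different id goes through the schema manager.
manager.get_schema.return_value = Mock(fields=["a"])
delegated = schema_fields(table, manager, 2)
manager.get_schema.assert_called_once_with(2)

# Case 3: zero id, which is falsy in Python but must still short-circuit.
zero_manager = Mock()
zero_table = make_table(0, ["x"])
zero = schema_fields(zero_table, zero_manager, 0)
zero_manager.get_schema.assert_not_called()
```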

This closes #8216

…er docstring

The schema short-circuit in FileScanner._schema_fields() returns table.table_schema.fields when schema_id matches the current schema id. The test fixture only mocked Mock(id=0) without .fields, causing the short-circuit path to return a Mock auto-attribute that is not iterable when used by SimpleStatsEvolutions._create_index_cast_mapping.
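
The fixture problem can be reproduced in isolation with unittest.mock alone. The snippet below is a small sketch and does not touch the project's code; the variable names are illustrative.

```python
from unittest.mock import Mock

# A fixture that sets only `id`: `.fields` is auto-created as another Mock.
incomplete = Mock(id=0)
try:
    list(incomplete.fields)
    iterable = True
except TypeError:
    # A plain Mock does not support iteration.
    iterable = False

# Setting `.fields` explicitly gives the short-circuit path a real list.
complete = Mock(id=0, fields=[])
fields = list(complete.fields)
```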

MgjLLL added 3 commits June 12, 2026 11:13

[python] Fix schema fields callback to short-circuit current schema id

6f56c7c

[python] Add schema short-circuit to SplitRead and simplify FileScann…

e9f1267

…er docstring

MgjLLL wants to merge 3 commits into apache:master from MgjLLL:python-fix-stats-evolutions-eager-schema-read
