
[Bug][python] SplitRead redundantly reads schema from filesystem when current schema is already in memory #8216

@MgjLLL


### Search before asking

- [x] I searched in the issues and found nothing similar.

### Paimon version

master (latest)

### Compute Engine

PythonAPI

### Minimal reproduce step

1. Create a Paimon table and write data
2. Use pypaimon to read data via `SplitRead`
3. Observe that `schema_manager.get_schema()` is called even when `schema_id` matches the current table schema id
4. This triggers redundant filesystem reads of schema files whose contents are already available in memory
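The reported behavior can be reproduced in miniature with stand-in objects. This is a self-contained sketch, not pypaimon code: `TableSchema`, `CountingSchemaManager`, `FakeTable` and `naive_read_schema` are hypothetical names, and only `get_schema()`, `table_schema` and `.id` mirror the names used in this issue. The stub counts each `get_schema()` call as one simulated filesystem read.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TableSchema:
    # Hypothetical stand-in for pypaimon's table schema: only id and fields.
    id: int
    fields: List[str] = field(default_factory=list)


class CountingSchemaManager:
    """Hypothetical stub that records every simulated filesystem read."""

    def __init__(self, schemas: Dict[int, TableSchema]):
        self._schemas = schemas
        self.fs_reads = 0

    def get_schema(self, schema_id: int) -> TableSchema:
        self.fs_reads += 1  # stands in for opening and parsing a schema file
        return self._schemas[schema_id]


@dataclass
class FakeTable:
    # Hypothetical table object that already holds the current schema in memory.
    table_schema: TableSchema


def naive_read_schema(table: FakeTable, manager: CountingSchemaManager, schema_id: int) -> TableSchema:
    # The pattern reported in this issue: the in-memory table.table_schema is
    # ignored and every lookup goes through the schema manager.
    return manager.get_schema(schema_id)


current = TableSchema(id=1, fields=["id", "name"])
manager = CountingSchemaManager({0: TableSchema(id=0, fields=["id"]), 1: current})
table = FakeTable(table_schema=current)

naive_read_schema(table, manager, schema_id=1)
print(manager.fs_reads)  # 1, even though table.table_schema already has id 1
```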

### What doesn't meet your expectations?

When `schema_id == table.table_schema.id`, the Python read path should return the in-memory `table.table_schema` directly without filesystem access, matching the Java short-circuit pattern in `RawFileSplitRead.createFileReader()`.
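A minimal sketch of the expected short-circuit, again using hypothetical stand-ins rather than the real pypaimon classes (`resolve_schema` and the stub types are illustrative names; the actual `SplitRead` code and signatures may differ). The comparison `schema_id == table.table_schema.id` is the one described above; any other schema id, for example one recorded on data files written before a schema change, still falls back to `schema_manager.get_schema()`.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TableSchema:
    # Hypothetical stand-in for the schema object; only id and fields matter here.
    id: int
    fields: List[str]


class CountingSchemaManager:
    # Hypothetical stub; counts simulated filesystem reads.
    def __init__(self, schemas: Dict[int, TableSchema]):
        self._schemas = schemas
        self.fs_reads = 0

    def get_schema(self, schema_id: int) -> TableSchema:
        self.fs_reads += 1
        return self._schemas[schema_id]


@dataclass
class FakeTable:
    table_schema: TableSchema


def resolve_schema(table: FakeTable, manager: CountingSchemaManager, schema_id: int) -> TableSchema:
    """Return the schema for schema_id, skipping the filesystem for the current schema."""
    if schema_id == table.table_schema.id:
        # Short-circuit: the current schema is already in memory.
        return table.table_schema
    # Any other schema id still requires a lookup through the schema manager.
    return manager.get_schema(schema_id)


current = TableSchema(id=1, fields=["id", "name"])
manager = CountingSchemaManager({0: TableSchema(id=0, fields=["id"]), 1: current})
table = FakeTable(table_schema=current)

assert resolve_schema(table, manager, 1) is current
print(manager.fs_reads)  # 0
resolve_schema(table, manager, 0)
print(manager.fs_reads)  # 1
```

Returning the same in-memory object (rather than a re-parsed copy) also means callers see exactly the schema the table was opened with.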

### Anything else?

This is a companion to the fix for `FileScanner._schema_fields`, which had the same redundant read pattern. Both share the same root cause: a missing short-circuit for the current table schema id.
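Since both call sites share the same root cause, one option is to route them through a single helper so the check cannot drift between them. The sketch below assumes that design; `schema_for_id`, `ScannerLike` and `SplitReadLike` are hypothetical stand-ins for the lookup logic in `FileScanner._schema_fields` and `SplitRead`, not the actual pypaimon classes.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TableSchema:
    # Hypothetical stand-in for the schema object.
    id: int
    fields: List[str]


class CountingSchemaManager:
    # Hypothetical stub; counts simulated filesystem reads.
    def __init__(self, schemas: Dict[int, TableSchema]):
        self._schemas = schemas
        self.fs_reads = 0

    def get_schema(self, schema_id: int) -> TableSchema:
        self.fs_reads += 1
        return self._schemas[schema_id]


def schema_for_id(current: TableSchema, manager: CountingSchemaManager, schema_id: int) -> TableSchema:
    # Single place for the short-circuit, shared by both call sites below.
    if schema_id == current.id:
        return current
    return manager.get_schema(schema_id)


class ScannerLike:
    """Hypothetical stand-in for FileScanner; only the schema lookup is modeled."""

    def __init__(self, current: TableSchema, manager: CountingSchemaManager):
        self.current = current
        self.manager = manager

    def _schema_fields(self, schema_id: int) -> List[str]:
        return schema_for_id(self.current, self.manager, schema_id).fields


class SplitReadLike:
    """Hypothetical stand-in for SplitRead; only the schema lookup is modeled."""

    def __init__(self, current: TableSchema, manager: CountingSchemaManager):
        self.current = current
        self.manager = manager

    def read_schema(self, schema_id: int) -> TableSchema:
        return schema_for_id(self.current, self.manager, schema_id)


current = TableSchema(id=2, fields=["id", "name", "ts"])
manager = CountingSchemaManager({1: TableSchema(id=1, fields=["id", "name"]), 2: current})
scanner = ScannerLike(current, manager)
reader = SplitReadLike(current, manager)

print(scanner._schema_fields(2))  # ['id', 'name', 'ts']
print(reader.read_schema(2).id)   # 2
print(manager.fs_reads)           # 0
```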

### Are you willing to submit a PR?

- [x] I'm willing to submit a PR!

Metadata

- Assignees: No one assigned
- Labels: bug (Something isn't working)
- Type: No type
- Projects: No projects
- Milestone: No milestone
- Relationships: None yet
- Development: No branches or pull requests
