[SPARK-55402][SS] Move streamingSourceIdentifyingName from CatalogTable to DataSource #54185

Open
ericm-db wants to merge 2 commits into apache:master from ericm-db:source-name-ds

ericm-db (Contributor) commented Feb 6, 2026

What changes were proposed in this pull request?

This PR refactors the streamingSourceIdentifyingName field to move it from CatalogTable into DataSource as a constructor parameter. The changes include:

  1. DataSource.scala: Added streamingSourceIdentifyingName as an optional constructor parameter
  2. StreamingRelation.scala: Updated StreamingRelation.apply() to read the name from DataSource instead of CatalogTable
  3. DataSourceStrategy.scala: Modified FindDataSourceTable.getStreamingRelation() to pass the name directly to DataSource constructor instead of copying it onto CatalogTable
  4. CatalogTable (interface.scala): Removed the streamingSourceIdentifyingName field entirely
  5. Test updates: Updated test files to reflect the new architecture
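The wiring described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: `TableSketch`, `DataSourceSketch`, and `StreamingRelationSketch` are hypothetical stand-ins for `CatalogTable`, `DataSource`, and `StreamingRelation`, and the real classes take many more parameters.

```scala
// Hypothetical, simplified stand-ins; these are NOT Spark's real classes.

// The table model no longer carries any query-specific name.
final case class TableSketch(identifier: String)

// The data source takes the name as an optional constructor parameter.
final case class DataSourceSketch(
    className: String,
    catalogTable: Option[TableSketch] = None,
    streamingSourceIdentifyingName: Option[String] = None)

final case class StreamingRelationSketch(
    dataSource: DataSourceSketch,
    sourceName: String,
    sourceIdentifyingName: Option[String])

object StreamingRelationSketch {
  // Reads the identifying name from the data source, not from the table.
  def apply(ds: DataSourceSketch): StreamingRelationSketch =
    StreamingRelationSketch(ds, ds.className, ds.streamingSourceIdentifyingName)
}

object WiringDemo {
  def main(args: Array[String]): Unit = {
    val table = TableSketch("db.events")
    // The name is passed straight to the data source; the table is not copied.
    val ds = DataSourceSketch("rate", Some(table), Some("events_src"))
    val relation = StreamingRelationSketch(ds)
    println(relation.sourceIdentifyingName) // Some(events_src)
  }
}
```

In this sketch the table instance is passed through unchanged, so nothing query-specific is ever written onto it.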

Why are the changes needed?

streamingSourceIdentifyingName represents query-specific metadata (which source name was assigned in a particular streaming query plan), not an intrinsic property of the table itself. Storing it in CatalogTable breaks table equality semantics:

  • Two references to the same table in a single query can have different streamingSourceIdentifyingName values
  • This causes them to compare as unequal via CatalogTable.equals()
  • This can impact multi-statement transactions and any caching/deduplication logic that relies on CatalogTable equality

By moving this field to DataSource (which is already query-specific), we restore proper catalog table equality while maintaining the ability to track streaming source identifying names for stable checkpoints.
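The equality problem can be reproduced with a small sketch, assuming the table model is a Scala case class whose generated `equals` compares every field. `TableWithName` and `TableOnly` are hypothetical, heavily simplified models invented for this illustration, and the source names are made up.

```scala
// Hypothetical, simplified models; the real CatalogTable has many more fields.

// Before: the query-specific name lives on the table model.
final case class TableWithName(
    identifier: String,
    streamingSourceIdentifyingName: Option[String] = None)

// After: the table model holds only table-intrinsic data.
final case class TableOnly(identifier: String)

object EqualityDemo {
  def main(args: Array[String]): Unit = {
    // Two references to the same table, each given a different source name.
    val a = TableWithName("db.events", Some("source_0"))
    val b = TableWithName("db.events", Some("source_1"))
    println(a == b)         // false: the name takes part in equals()
    println(Set(a, b).size) // 2: deduplication treats them as distinct tables

    // Without the field, both references compare as equal again.
    val c = TableOnly("db.events")
    val d = TableOnly("db.events")
    println(c == d)         // true
    println(Set(c, d).size) // 1
  }
}
```

Any cache or set keyed on the table value behaves like the `Set` here, which is why a per-query field on the table model leaks into unrelated logic.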

Does this PR introduce any user-facing change?

No

How was this patch tested?

Unit tests

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.6

github-actions bot commented Feb 6, 2026

JIRA Issue Information

=== Task SPARK-55402 ===
Summary: Move streamingSourceIdentifyingName from CatalogTable to DataSource
Assignee: None
Status: Open
Affected: ["4.2.0"]


This comment was automatically generated by GitHub Actions

brkyvz (Contributor) left a comment

LGTM minus one comment

ericm-db requested a review from brkyvz February 6, 2026 23:43