-
Notifications
You must be signed in to change notification settings - Fork 488
[coordinator] Refactor SchemaUpdate to delegate schema changes to Schema.Builder #2416
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[coordinator] Refactor SchemaUpdate to delegate schema changes to Schema.Builder #2416
Conversation
5da4435 to
f10a940
Compare
wuchong
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Prajwal-banakar thanks for the contribution. The new code looks much clear now. I only left a minor comment.
@loserwang1024 could you have another look?
| // 1. Clear current builder state | ||
| this.columns.clear(); | ||
| this.autoIncrementColumnNames.clear(); | ||
| this.primaryKey = null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's better make sure the fromSchema is the first API to be called on the builder, rather than silently dropping previous columns. Otherwise, it's hard to debug the problem. You can validate all the members should be empty.
| /** Schema update. */ | ||
| public class SchemaUpdate { | ||
|
|
||
| /** Apply schema changes to the given table info and return the updated schema. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keep the original javadoc?
f10a940 to
08590d8
Compare
|
Hi @wuchong I have addressed the feedback: Added checkState validation in Schema.Builder#fromSchema. I am seeing different failures in the CI (CommitRemoteLogManifestITCase in the first run and TableChangeWatcherTest in the second). I've verified these tests locally and they pass consistently. It seems the CI environment is experiencing some flakiness unrelated to this refactor. |
| } | ||
| this.highestFieldId = new AtomicInteger(schema.highestFieldId); | ||
| // Check that the builder is empty before adopting from an existing schema | ||
| checkState( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's very useful check. Maybe we can move this to fromColumns. Thus, they can share the same check. A
8d6ff7c to
1b59edd
Compare
loserwang1024
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
Hi @loserwang1024 and @wuchong, I was about to move the checkState validation to fromColumns as suggested, but I noticed that the strict "empty builder" rule causes failures in existing integration tests, specifically FlussAdminITCase#testAlterTableColumn. It seems some tests use a pattern where they call .column(...) to set up a template and then call .fromColumns(...) to load the remaining fields. If we enforce that the builder must be empty at the start of fromColumns, these existing tests will break. I have two thoughts on how to proceed and wanted to get your preference: Strict for IDs only: Only throw the IllegalStateException if the columns being adopted already have assigned IDs (initialization mode). This protects the schema's "source of truth" while allowing tests to append template columns. Keep it in fromSchema: Revert to keeping the check only in fromSchema(...). This ensures that when loading a full schema object, the builder is fresh, but doesn't restrict the more flexible fromColumns API. What do you think is the best approach for the project's long-term architecture? |
|
@Prajwal-banakar Agree with you. |
1b59edd to
5806e63
Compare
|
Hi @loserwang1024 and @wuchong , |
|
Hi @Prajwal-banakar @loserwang1024 , I think we still need to keep the |
|
@Prajwal-banakar, besides, just a quick note: please avoid squashing commits while responding to review comments. Keeping the changes in separate commits makes it much easier to track what has been updated since the last review. When commits are squashed, it becomes difficult to distinguish new changes from previous ones, forcing reviewers to review the entire changes again. This significantly increases review time and may slow down the overall merge process. The committer will help to squash commits and improve the commit message before merging the PR. So don’t worry about having multiple "fix" commits in PR. |
Purpose
Linked issue: close #2344
The purpose of this change is to refactor the SchemaUpdate class to delegate all schema membership management (columns, primary keys, and auto-increment fields) directly to Schema.Builder. Currently, SchemaUpdate manually maintains these members, which is error-prone and can lead to broken schemas during evolution.
Brief change log
Refactored SchemaUpdate: Removed local lists for columns, primary keys, and auto-increment fields.
Builder Delegation: Modified SchemaUpdate to directly maintain a Schema.Builder instance, ensuring a single source of truth for schema building and validation logic.
Enhanced Schema.Builder: Updated Schema.Builder#fromSchema to correctly adopt all members from an existing schema, including column IDs and metadata.
Added Helper API: Added Schema.Builder#getColumn(String columnName) to allow checking for existing columns within the builder's state.
Tests
Verified the fix by running TableSchemaTest and SchemaUpdateTest to ensure the delegation logic handles schema evolution correctly without IllegalStateException.
Performed a full test suite execution for the fluss-common module: mvn test -pl fluss-common.
Results: 1,491 tests run, 0 failures.
API and Format
This change adds a public helper method getColumn(String columnName) to the Schema.Builder API.
It does not affect the storage format.
Documentation
This change does not introduce a new feature; it is a refactoring of existing internal logic.