Skip to content

Conversation

@Prajwal-banakar
Copy link
Contributor

Purpose

Linked issue: close #2344

The purpose of this change is to refactor the SchemaUpdate class to delegate all schema membership management (columns, primary keys, and auto-increment fields) directly to Schema.Builder. Currently, SchemaUpdate manually maintains these members, which is error-prone and can lead to broken schemas during evolution.

Brief change log

Refactored SchemaUpdate: Removed local lists for columns, primary keys, and auto-increment fields.

Builder Delegation: Modified SchemaUpdate to directly maintain a Schema.Builder instance, ensuring a single source of truth for schema building and validation logic.

Enhanced Schema.Builder: Updated Schema.Builder#fromSchema to correctly adopt all members from an existing schema, including column IDs and metadata.

Added Helper API: Added Schema.Builder#getColumn(String columnName) to allow checking for existing columns within the builder's state.

Tests

Verified the fix by running TableSchemaTest and SchemaUpdateTest to ensure the delegation logic handles schema evolution correctly without IllegalStateException.

Performed a full test suite execution for the fluss-common module: mvn test -pl fluss-common.

Results: 1,491 tests run, 0 failures.

API and Format

This change adds a public helper method getColumn(String columnName) to the Schema.Builder API.

It does not affect the storage format.

Documentation

This change does not introduce a new feature; it is a refactoring of existing internal logic.

@Prajwal-banakar Prajwal-banakar force-pushed the schemaupdate-delegation branch from 5da4435 to f10a940 Compare January 20, 2026 12:48
@Prajwal-banakar Prajwal-banakar changed the title [common] Refactor SchemaUpdate to delegate schema changes to Schema.Builder [coordinator] Refactor SchemaUpdate to delegate schema changes to Schema.Builder Jan 20, 2026
Copy link
Member

@wuchong wuchong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Prajwal-banakar thanks for the contribution. The new code looks much clear now. I only left a minor comment.

@loserwang1024 could you have another look?

Comment on lines 288 to 291
// 1. Clear current builder state
this.columns.clear();
this.autoIncrementColumnNames.clear();
this.primaryKey = null;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better make sure the fromSchema is the first API to be called on the builder, rather than silently dropping previous columns. Otherwise, it's hard to debug the problem. You can validate all the members should be empty.

/** Schema update. */
public class SchemaUpdate {

/** Apply schema changes to the given table info and return the updated schema. */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep the original javadoc?

@Prajwal-banakar Prajwal-banakar force-pushed the schemaupdate-delegation branch from f10a940 to 08590d8 Compare January 21, 2026 17:19
@Prajwal-banakar
Copy link
Contributor Author

Hi @wuchong I have addressed the feedback:

Added checkState validation in Schema.Builder#fromSchema.
Restored the original Javadoc in SchemaUpdate.

I am seeing different failures in the CI (CommitRemoteLogManifestITCase in the first run and TableChangeWatcherTest in the second). I've verified these tests locally and they pass consistently. It seems the CI environment is experiencing some flakiness unrelated to this refactor.
All core metadata tests in fluss-common and fluss-server passed perfectly. Could you please trigger a manual re-run or review the logic?

}
this.highestFieldId = new AtomicInteger(schema.highestFieldId);
// Check that the builder is empty before adopting from an existing schema
checkState(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's very useful check. Maybe we can move this to fromColumns. Thus, they can share the same check. A

@Prajwal-banakar Prajwal-banakar force-pushed the schemaupdate-delegation branch from 8d6ff7c to 1b59edd Compare January 22, 2026 05:26
Copy link
Contributor

@loserwang1024 loserwang1024 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Prajwal-banakar
Copy link
Contributor Author

Hi @loserwang1024 and @wuchong,

I was about to move the checkState validation to fromColumns as suggested, but I noticed that the strict "empty builder" rule causes failures in existing integration tests, specifically FlussAdminITCase#testAlterTableColumn.

It seems some tests use a pattern where they call .column(...) to set up a template and then call .fromColumns(...) to load the remaining fields. If we enforce that the builder must be empty at the start of fromColumns, these existing tests will break.

I have two thoughts on how to proceed and wanted to get your preference:

Strict for IDs only: Only throw the IllegalStateException if the columns being adopted already have assigned IDs (initialization mode). This protects the schema's "source of truth" while allowing tests to append template columns.

Keep it in fromSchema: Revert to keeping the check only in fromSchema(...). This ensures that when loading a full schema object, the builder is fresh, but doesn't restrict the more flexible fromColumns API.

What do you think is the best approach for the project's long-term architecture?

@loserwang1024
Copy link
Contributor

@Prajwal-banakar Agree with you.

@Prajwal-banakar Prajwal-banakar force-pushed the schemaupdate-delegation branch from 1b59edd to 5806e63 Compare January 22, 2026 10:09
@Prajwal-banakar
Copy link
Contributor Author

Hi @loserwang1024 and @wuchong ,
I have finalized the implementation of Option 1. The validation in fromColumns(...) now only enforces the empty builder rule when adopting columns with assigned IDs (initialization mode).
Verification:
Verified locally with a full test run. Both fluss-common and fluss-server resulted in BUILD SUCCESS. I also verified that FlussAdminITCase now passes perfectly with this refined logic.
Ready for final review

@wuchong
Copy link
Member

wuchong commented Jan 23, 2026

Hi @Prajwal-banakar @loserwang1024 , I think we still need to keep the checkState in fromSchema(..), otherwise, autoIncrementColumnNames and primaryKey are silently overridden. I added a commit to fix. Will merge this PR when CI is passed.

@wuchong
Copy link
Member

wuchong commented Jan 23, 2026

@Prajwal-banakar, besides, just a quick note: please avoid squashing commits while responding to review comments. Keeping the changes in separate commits makes it much easier to track what has been updated since the last review.

When commits are squashed, it becomes difficult to distinguish new changes from previous ones, forcing reviewers to review the entire changes again. This significantly increases review time and may slow down the overall merge process.

The committer will help to squash commits and improve the commit message before merging the PR. So don’t worry about having multiple "fix" commits in PR.

@loserwang1024 loserwang1024 merged commit 2185afb into apache:main Jan 23, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Use Schema.Builder#fromSchema in SchemaUpdate

3 participants