kafka: bump sarama to fix out of order when controller failure #5483

Open
3AceShowHand wants to merge 7 commits into pingcap:master from 3AceShowHand:sarama-fix

Conversation

@3AceShowHand

@3AceShowHand 3AceShowHand commented Jun 23, 2026

Collaborator

What problem does this PR solve?

Issue Number: close #xxx

What is changed and how it works?

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Questions

Will it cause performance regression or break compatibility?
Do you need to update user documentation, design documentation or monitoring documentation?

Release note

Please refer to [Release Notes Language Style Guide](https://pingcap.github.io/tidb-dev-guide/contribute-to-tidb/release-notes-style-guide.html) to write a quality release note.

If you don't think this PR needs a release note then fill it with `None`.

Summary by CodeRabbit

Release Notes

  • Improvements

    • Enhanced schema file handling and path management in cloud storage sink operations.
    • Improved data replay ordering mechanism for better consistency across distributed setups.
    • Optimized logging initialization for reduced overhead.
  • Chores

    • Updated internal dependencies and package organization for cloud storage sink components.

@ti-chi-bot ti-chi-bot Bot added do-not-merge/needs-linked-issue release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Jun 23, 2026
@ti-chi-bot

ti-chi-bot Bot commented Jun 23, 2026


[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign 3aceshowhand for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot Bot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Jun 23, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request updates the sarama dependency version and modifies initSaramaLogger in pkg/logger/log.go to restrict Sarama logging to levels below InfoLevel. The reviewer suggested simplifying the level check to level < zapcore.InfoLevel for better readability and consistency with other parts of the codebase.
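The two forms of the level check discussed above can be compared in a small sketch. It assumes a local `Level` type that mirrors the ordering of `zapcore.Level` (Debug below Info below Warn below Error) rather than importing zap, and the helper names `discardSarama` and `keepSarama` are hypothetical, not taken from the PR.

```go
package main

import "fmt"

// Level is a hypothetical local stand-in that mirrors the ordering of
// zapcore.Level: Debug (-1) < Info (0) < Warn (1) < Error (2).
type Level int8

const (
	DebugLevel Level = iota - 1
	InfoLevel
	WarnLevel
	ErrorLevel
)

// Enabled reports whether lvl is at or above l, matching the documented
// behaviour of zapcore.Level.Enabled.
func (l Level) Enabled(lvl Level) bool { return lvl >= l }

// discardSarama is the guard in the form used by the PR: discard Sarama
// output when the configured level is Info or higher.
func discardSarama(level Level) bool { return InfoLevel.Enabled(level) }

// keepSarama is the plain comparison suggested in the review: keep Sarama
// output only when the configured level is below Info.
func keepSarama(level Level) bool { return level < InfoLevel }

func main() {
	for _, lvl := range []Level{DebugLevel, InfoLevel, WarnLevel, ErrorLevel} {
		fmt.Printf("level=%d discard=%t keep=%t\n", lvl, discardSarama(lvl), keepSarama(lvl))
	}
}
```

Under these assumptions the two guards are exact negations of each other, so the choice between them is about readability rather than behaviour.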

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread pkg/logger/log.go Outdated
@coderabbitai

coderabbitai Bot commented Jun 23, 2026

Contributor

Walkthrough

Replaces TableDefinition with SchemaFile as the persisted cloud storage schema payload across pkg/cloudstorage, the downstream sink, and the storage consumer. Introduces new DmlPathKey/SchemaPathKey types in pkg/cloudstorage with non-regex path parsing, table-ID layout support, and CompareDMLPathKey replay ordering. Rewires all downstreamadapter/sink/cloudstorage files to import pkg/cloudstorage instead of pkg/sink/cloudstorage. Updates sarama's module replace directive and adds an InfoLevel discard guard to initSaramaLogger.

Changes

Cloud storage path keys, schema validation, consumer wiring, and sink DDL write flow

Layer / File(s) Summary
SchemaFile contracts, checksum, and parser validation
pkg/cloudstorage/schema_file.go, pkg/cloudstorage/schema_file_test.go
Defines SchemaFile struct replacing TableDefinition; adds Build, ToDDLEvent, ToTableInfo, Marshal, Parse, Sum32, and GenerateSchemaFilePath methods; tests cover build/serialize/parse/checksum round-trips and column-order invariance.
DML/index path key types and pkg/cloudstorage path generation
pkg/cloudstorage/path_key.go, pkg/cloudstorage/path_key_test.go, pkg/cloudstorage/path.go, pkg/cloudstorage/path_test.go
Introduces SchemaPathKey, FileIndex, DmlPathKey types with CompareDMLPathKey, NewSchemaFileDMLPathKey, GenerateDMLFilePath, ParseDMLFilePath, ParseIndexFilePath; refactors CheckOrWriteSchema and generateDataDirPath to use SchemaFile and DmlPathKey; replaces FetchIndexFromFileName with ParseFileIndexFromFileName.
pkg/sink/cloudstorage path_key refactor
pkg/sink/cloudstorage/path_key.go, pkg/sink/cloudstorage/path_key_test.go
Expands DmlPathKey with UseTableIDAsPath/TableID fields and schema-file sentinel constant; rewrites ParseDMLFilePath/ParseIndexFilePath without regex; adds CompareDMLPathKey; validates parsing, rejection of invalid paths, and ordering semantics in extended tests.
Sink DDL write flow and import rewiring
downstreamadapter/sink/cloudstorage/sink.go, downstreamadapter/sink/cloudstorage/sink_test.go, downstreamadapter/sink/cloudstorage/buffer_manager.go, downstreamadapter/sink/cloudstorage/dml_writers.go, downstreamadapter/sink/cloudstorage/task.go, downstreamadapter/sink/cloudstorage/writer.go, downstreamadapter/sink/cloudstorage/writer_test.go, downstreamadapter/sink/cloudstorage/buffer_manager_test.go, downstreamadapter/sink/cloudstorage/encoder_group_test.go
Replaces TableDefinition with SchemaFile in writeDDLEvent/writeFile; removes tableSchemaStore field; rewires all files from pkg/sink/cloudstorage to pkg/cloudstorage; updates tests to assert SchemaFile JSON fields.
Storage consumer schema-file ingestion and replay
cmd/storage-consumer/consumer.go
Replaces tableDefMap with schemaFileMap; parses schema files via cloudstorage.Parse; inserts synthetic DML keys via NewSchemaFileDMLPathKey; sorts replay with CompareDMLPathKey; gates DDL via IsSchemaFileDMLPathKey; uses SchemaFile.ToDDLEvent/ToTableInfo for replay.

Sarama version bump and logger initialization guard

Layer / File(s) Summary
Logger discard guard and sarama version bump
pkg/logger/log.go, go.mod
initSaramaLogger short-circuits to a discard stdlib logger when the configured level is Info or higher (that is, when zapcore.InfoLevel.Enabled(level) is true); sarama replace directive updated to v1.41.2-pingcap-20260622.1.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • pingcap/ticdc#4658: The exchange-partition DDL write path uses CloneWithRouting to produce separate schema and source events, which depends on routing field plumbing introduced in this PR.

Suggested labels

release-note-none

Suggested reviewers

  • asddongmen
  • wk989898
  • flowbehappy
  • hongyunyan

Poem

🐇 Hoppity-hop through the schema store,
No more TableDefinition, SchemaFile forevermore!
The path keys are parsed without regex today,
Checksums and versions keep chaos at bay.
DML replays in the right order now, yay! 🎉

🚥 Pre-merge checks | ✅ 2 | ❌ 3

❌ Failed checks (3 warnings)

Check name Status Explanation Resolution
Title check ⚠️ Warning Title references Kafka/Sarama updates, but actual changes show extensive cloud storage schema refactoring unrelated to Kafka. Update title to accurately reflect the primary changes: cloud storage schema file implementation and replay ordering, or clarify if this is a multi-part PR.
Description check ⚠️ Warning Description is entirely placeholder template with unfilled critical sections and no substance about actual changes made. Fill in issue number, describe changes (cloud storage schema refactoring), specify test coverage, and provide release notes documenting the schema-file migration and API changes.
Docstring Coverage ⚠️ Warning Docstring coverage is 18.84% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 OpenGrep (1.23.0)
downstreamadapter/sink/cloudstorage/encoder_group_test.go

┌──────────────┐
│ Opengrep CLI │
└──────────────┘

✔ Opengrep OSS
✔ Basic security coverage for first-party code vulnerabilities.

[00.35][ERROR]: unable to find a config; path .coderabbit-opengrep-fallback.yml does not exist

downstreamadapter/sink/cloudstorage/dml_writers.go

┌──────────────┐
│ Opengrep CLI │
└──────────────┘

✔ Opengrep OSS
✔ Basic security coverage for first-party code vulnerabilities.

[00.71][ERROR]: unable to find a config; path .coderabbit-opengrep-fallback.yml does not exist

downstreamadapter/sink/cloudstorage/buffer_manager_test.go

┌──────────────┐
│ Opengrep CLI │
└──────────────┘

✔ Opengrep OSS
✔ Basic security coverage for first-party code vulnerabilities.

[00.80][ERROR]: unable to find a config; path .coderabbit-opengrep-fallback.yml does not exist

  • 13 others

@ti-chi-bot ti-chi-bot Bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jun 23, 2026
@3AceShowHand

Collaborator Author

/test all

@3AceShowHand

Collaborator Author

/test all

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/sink/cloudstorage/schema_file.go (1)

230-249: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Build should reset receiver state before populating fields.

Build mutates SchemaFile in place but never clears existing Columns/TotalColumns. A second call can keep stale column data (especially on the early-return branch when TableInfo is nil), producing invalid schema artifacts.

Proposed fix
 func (t *SchemaFile) Build(event *commonEvent.DDLEvent, outputColumnID bool) {
-	t.Version = defaultSchemaFileVersion
-	t.TableVersion = event.FinishedTs
-	t.Query = event.Query
-	t.Type = event.Type
+	*t = SchemaFile{
+		Version:      defaultSchemaFileVersion,
+		TableVersion: event.FinishedTs,
+		Query:        event.Query,
+		Type:         event.Type,
+	}
 
 	info := event.TableInfo
 	if info == nil {
 		t.Schema = event.GetTargetSchemaName()
 		t.Table = event.GetTargetTableName()
 		return
 	}
 	t.Schema = info.GetTargetSchemaName()
 	t.Table = info.GetTargetTableName()
 	t.TotalColumns = len(info.GetColumns())
+	t.Columns = make([]TableCol, 0, t.TotalColumns)
 	for _, col := range info.GetColumns() {
 		var tableCol TableCol
 		tableCol.FromTiColumnInfo(col, outputColumnID)
 		t.Columns = append(t.Columns, tableCol)
 	}
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/sink/cloudstorage/schema_file.go` around lines 230 - 249, The Build
method on SchemaFile mutates the receiver in place without clearing existing
state before population. At the start of the Build method (before any field
assignments), reset the Columns slice to empty and reset TotalColumns to zero.
This ensures that subsequent calls to Build do not retain stale column data from
previous invocations, especially on the early-return path when TableInfo is nil.
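The stale-state problem described in this comment can be reproduced with a reduced sketch. The types `schemaFile`, `column`, and `ddlEvent` below are hypothetical stand-ins with far fewer fields than the PR's `SchemaFile` and `DDLEvent`; a nil `Columns` slice on the event models the case where `TableInfo` is nil.

```go
package main

import "fmt"

// column, schemaFile, and ddlEvent are simplified, hypothetical stand-ins
// for the types discussed in the review.
type column struct{ Name string }

type schemaFile struct {
	Table        string
	Columns      []column
	TotalColumns int
}

type ddlEvent struct {
	Table   string
	Columns []string // nil models an event that carries no table info
}

// buildInPlace assigns fields one by one, so column data from an earlier
// call survives the early return.
func (s *schemaFile) buildInPlace(ev ddlEvent) {
	s.Table = ev.Table
	if ev.Columns == nil {
		return
	}
	s.TotalColumns = len(ev.Columns)
	for _, name := range ev.Columns {
		s.Columns = append(s.Columns, column{Name: name})
	}
}

// buildWithReset overwrites the whole receiver before populating it.
func (s *schemaFile) buildWithReset(ev ddlEvent) {
	*s = schemaFile{Table: ev.Table}
	if ev.Columns == nil {
		return
	}
	s.TotalColumns = len(ev.Columns)
	s.Columns = make([]column, 0, s.TotalColumns)
	for _, name := range ev.Columns {
		s.Columns = append(s.Columns, column{Name: name})
	}
}

func main() {
	tableEvent := ddlEvent{Table: "t1", Columns: []string{"id", "name"}}
	schemaOnlyEvent := ddlEvent{Table: "t2"}

	var stale, fresh schemaFile
	stale.buildInPlace(tableEvent)
	stale.buildInPlace(schemaOnlyEvent)
	fresh.buildWithReset(tableEvent)
	fresh.buildWithReset(schemaOnlyEvent)

	fmt.Printf("in place:   table=%s columns=%d\n", stale.Table, len(stale.Columns))
	fmt.Printf("with reset: table=%s columns=%d\n", fresh.Table, len(fresh.Columns))
}
```

In this sketch the reused value built in place keeps the columns of `t1` while reporting table `t2`, which is the kind of inconsistent artifact the comment warns about.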
🧹 Nitpick comments (2)
pkg/logger/log_test.go (1)

37-49: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Consider testing additional log levels (optional).

The test validates the behavior for DebugLevel (zap logger) and InfoLevel (discard logger). For completeness, you could add assertions for WarnLevel and ErrorLevel to ensure they also use the discard logger. However, the current test is focused and sufficient given that the guard condition applies uniformly to all levels ≥ Info.

🧪 Optional: Additional test coverage
 	require.NoError(t, initSaramaLogger(zapcore.InfoLevel))
 	require.NotSame(t, debugLogger, sarama.Logger)
 	require.IsType(t, stdlog.New(nil, "", 0), sarama.Logger)
+
+	// Verify WarnLevel and ErrorLevel also use discard logger
+	require.NoError(t, initSaramaLogger(zapcore.WarnLevel))
+	require.IsType(t, stdlog.New(nil, "", 0), sarama.Logger)
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/logger/log_test.go` around lines 37 - 49, The test
TestInitSaramaLoggerResetsWhenInfoEnabled currently only validates behavior for
DebugLevel and InfoLevel. To improve test coverage, add additional assertions
that call initSaramaLogger with WarnLevel and ErrorLevel, and verify that these
levels also result in sarama.Logger being set to a discard logger (using
require.IsType with stdlog.New), similar to the existing InfoLevel validation.
pkg/logger/log.go (1)

261-265: 📐 Maintainability & Code Quality | 🔵 Trivial

Consider adding a clarifying comment for the guard condition.

The condition zapcore.InfoLevel.Enabled(level) correctly silences Sarama when the log level is Info or higher (less verbose), but the intent may not be immediately clear to future readers. A brief comment explaining that Sarama is intentionally silenced at less verbose levels would improve maintainability.

📝 Suggested clarifying comment
 func initSaramaLogger(level zapcore.Level) error {
+	// Silence Sarama at Info or higher levels; it is noisy and not needed for normal operation.
 	if zapcore.InfoLevel.Enabled(level) {
 		sarama.Logger = stdlog.New(io.Discard, "[Sarama] ", stdlog.LstdFlags)
 		return nil
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/logger/log.go` around lines 261 - 265, Add a clarifying comment above the
if statement that checks `zapcore.InfoLevel.Enabled(level)` to explain that
Sarama logging is intentionally silenced when the log level is Info or higher
(less verbose logging levels). The comment should make it clear to future
maintainers that this condition prevents overly verbose output from Sarama by
discarding its logs when verbosity is lower, improving code readability and
maintainability.
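The effect of swapping in a discard logger can be seen with the standard library alone. This is a minimal sketch: `newClientLogger` and `writtenBytes` are hypothetical helpers, and the `quiet` flag stands in for the result of the level guard; only `log.New` and `io.Discard` are real standard-library calls.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"log"
)

// newClientLogger is a hypothetical helper. It returns a logger that drops
// every message when quiet is true, and one that writes to w otherwise.
func newClientLogger(w io.Writer, quiet bool) *log.Logger {
	if quiet {
		return log.New(io.Discard, "[Sarama] ", log.LstdFlags)
	}
	return log.New(w, "[Sarama] ", log.LstdFlags)
}

// writtenBytes logs one line through the chosen logger and reports how many
// bytes reached the underlying buffer.
func writtenBytes(quiet bool) int {
	var buf bytes.Buffer
	newClientLogger(&buf, quiet).Println("metadata refresh")
	return buf.Len()
}

func main() {
	fmt.Println("bytes written when quiet:", writtenBytes(true))
	fmt.Println("any bytes written when verbose:", writtenBytes(false) > 0)
}
```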
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cmd/storage-consumer/consumer.go`:
- Around line 593-594: The mustGetSchemaFile function at line 593-594 panics
when a schema file is missing, which causes hard failures when DML files are
encountered before their corresponding schema files due to unordered file walk
traversal in getNewFiles. To fix this, either enforce a two-pass approach in
getNewFiles that processes all schema files before DML files, or add schema
existence validation before invoking mustGetSchemaFile in the loop to check if
the schema is present in schemaFileMap and skip or log the DML key gracefully
instead of panicking. Additionally, replace the hard panic logic at lines 523,
529, and 536 with recoverable error handling or skip logic to handle transient
storage inconsistencies.
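One way to picture the two-pass approach mentioned above is the following sketch. It is not the consumer's code: `walkedFile` and `planReplay` are hypothetical, and a boolean flag replaces real path parsing. The first pass records which tables have a schema file, and the second pass skips DML files whose schema is unknown instead of panicking.

```go
package main

import "fmt"

// walkedFile is a hypothetical record for one file seen during a storage walk.
type walkedFile struct {
	Table    string
	IsSchema bool
}

// planReplay returns the DML files whose schema is already known (ready) and
// the DML files that have to wait for a later scan (skipped).
func planReplay(files []walkedFile) (ready, skipped []walkedFile) {
	known := make(map[string]struct{})

	// Pass 1: collect every schema file, regardless of walk order.
	for _, f := range files {
		if f.IsSchema {
			known[f.Table] = struct{}{}
		}
	}

	// Pass 2: classify DML files against the collected schemas.
	for _, f := range files {
		if f.IsSchema {
			continue
		}
		if _, ok := known[f.Table]; !ok {
			skipped = append(skipped, f)
			continue
		}
		ready = append(ready, f)
	}
	return ready, skipped
}

func main() {
	files := []walkedFile{
		{Table: "t1"},                 // DML file seen before its schema file
		{Table: "t1", IsSchema: true}, // schema file for t1
		{Table: "t2"},                 // DML file with no schema file yet
	}
	ready, skipped := planReplay(files)
	fmt.Println("ready:", len(ready), "skipped:", len(skipped))
}
```

Whether a skipped file should be retried on the next scan or reported as an error is a policy decision this sketch leaves open.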

In `@pkg/sink/cloudstorage/schema_file.go`:
- Around line 300-305: The checksumPayload struct initialization is missing the
Version field, which means schema file version changes won't be reflected in the
checksum-derived filenames. In the checksumPayload struct literal construction
within marshalForChecksum, add the Version field assignment alongside the other
fields like Table, Schema, Columns, and TotalColumns. This ensures that version
information is included in the checksum payload calculation.
- Around line 275-280: The isTableLevel() method currently uses log.Panic to
handle schema validation failures, which crashes the process instead of allowing
callers to handle the error. Refactor isTableLevel() to return an error in
addition to the boolean result (or return only an error if appropriate), and
instead of calling log.Panic when len(t.Columns) != t.TotalColumns, return a
predefined repository error as per the coding guidelines documented in
docs/agents/error-handling.md. This allows callers to handle validation failures
gracefully without causing process-level availability impact.
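The checksum concern above can be illustrated with a reduced payload. This sketch assumes JSON serialization and CRC-32 (IEEE) as the hash; the PR's actual `Sum32` algorithm is not stated here, and `checksumPayload` and `checksumFor` are hypothetical names with fewer fields than the real struct.

```go
package main

import (
	"encoding/json"
	"fmt"
	"hash/crc32"
)

// checksumPayload is a hypothetical, reduced version of the payload described
// in the review. Version is omitted from the JSON when it is zero.
type checksumPayload struct {
	Version      uint64   `json:"Version,omitempty"`
	Schema       string   `json:"Schema"`
	Table        string   `json:"Table"`
	Columns      []string `json:"Columns"`
	TotalColumns int      `json:"TotalColumns"`
}

// checksumFor hashes the same table definition for a given schema-file
// version. When includeVersion is false the version is left out of the
// payload, which models the behaviour the review comment flags.
func checksumFor(version uint64, includeVersion bool) (uint32, error) {
	p := checksumPayload{
		Schema:       "test",
		Table:        "t1",
		Columns:      []string{"id", "name"},
		TotalColumns: 2,
	}
	if includeVersion {
		p.Version = version
	}
	data, err := json.Marshal(p)
	if err != nil {
		return 0, err
	}
	return crc32.ChecksumIEEE(data), nil
}

func main() {
	for _, include := range []bool{false, true} {
		v1, err1 := checksumFor(1, include)
		v2, err2 := checksumFor(2, include)
		if err1 != nil || err2 != nil {
			fmt.Println("marshal failed")
			return
		}
		fmt.Printf("includeVersion=%t sameChecksum=%t\n", include, v1 == v2)
	}
}
```

With the version left out, two schema-file versions of the same table hash identically, so a checksum-derived filename would not change when the version does.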

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3d711bcf-bb42-4ca5-a5a9-64a068bff015

📥 Commits

Reviewing files that changed from the base of the PR and between 7741785 and 0703056.

📒 Files selected for processing (13)
  • cmd/storage-consumer/consumer.go
  • downstreamadapter/sink/cloudstorage/sink.go
  • downstreamadapter/sink/cloudstorage/sink_test.go
  • pkg/logger/log.go
  • pkg/logger/log_test.go
  • pkg/sink/cloudstorage/path.go
  • pkg/sink/cloudstorage/path_key.go
  • pkg/sink/cloudstorage/path_key_test.go
  • pkg/sink/cloudstorage/path_test.go
  • pkg/sink/cloudstorage/schema_file.go
  • pkg/sink/cloudstorage/schema_file_parse.go
  • pkg/sink/cloudstorage/schema_file_parse_test.go
  • pkg/sink/cloudstorage/schema_file_test.go

Comment thread cmd/storage-consumer/consumer.go
Comment thread pkg/cloudstorage/schema_file.go Outdated
Comment thread pkg/cloudstorage/schema_file.go
@3AceShowHand

Collaborator Author

/test all

@ti-chi-bot

ti-chi-bot Bot commented Jun 23, 2026


[FORMAT CHECKER NOTIFICATION]

Notice: To remove the do-not-merge/needs-linked-issue label, please provide the linked issue number on one line in the PR body, for example: Issue Number: close #123 or Issue Number: ref #456.

📖 For more info, you can check the "Contribute Code" section in the development guide.

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cmd/storage-consumer/consumer.go`:
- Around line 445-452: The code at ParseFileIndexFromFileName extracts a
dispatcherID from the filename into fileIndex, but then immediately overwrites
it with the dispatcherID from the index path without validating they match. Add
validation after ParseFileIndexFromFileName returns to check if the extracted
dispatcherID from the filename matches the dispatcherID parameter being assigned
to FileIndexKey. If there is a mismatch, return an error indicating the
dispatcher IDs are inconsistent, as this can cause incorrect file-index state
and replay behavior.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: fbd9743c-107c-4c78-a91f-205d8272aa8a

📥 Commits

Reviewing files that changed from the base of the PR and between 0703056 and eb69660.

📒 Files selected for processing (19)
  • cmd/storage-consumer/consumer.go
  • downstreamadapter/sink/cloudstorage/buffer_manager.go
  • downstreamadapter/sink/cloudstorage/buffer_manager_test.go
  • downstreamadapter/sink/cloudstorage/dml_writers.go
  • downstreamadapter/sink/cloudstorage/encoder_group_test.go
  • downstreamadapter/sink/cloudstorage/sink.go
  • downstreamadapter/sink/cloudstorage/sink_test.go
  • downstreamadapter/sink/cloudstorage/task.go
  • downstreamadapter/sink/cloudstorage/writer.go
  • downstreamadapter/sink/cloudstorage/writer_test.go
  • pkg/cloudstorage/config.go
  • pkg/cloudstorage/config_test.go
  • pkg/cloudstorage/main_test.go
  • pkg/cloudstorage/path.go
  • pkg/cloudstorage/path_key.go
  • pkg/cloudstorage/path_key_test.go
  • pkg/cloudstorage/path_test.go
  • pkg/cloudstorage/schema_file.go
  • pkg/cloudstorage/schema_file_test.go
✅ Files skipped from review due to trivial changes (1)
  • downstreamadapter/sink/cloudstorage/buffer_manager.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • downstreamadapter/sink/cloudstorage/sink_test.go

Comment on lines +445 to 452
+	fileIndex, err := cloudstorage.ParseFileIndexFromFileName(fileName, c.fileExtension)
 	if err != nil {
 		return err
 	}
-	fileIndex := &cloudstorage.FileIndex{
-		FileIndexKey: cloudstorage.FileIndexKey{
-			DispatcherID:           dispatcherID,
-			EnableTableAcrossNodes: dispatcherID != "",
-		},
-		Idx: fileIdx,
+	fileIndex.FileIndexKey = cloudstorage.FileIndexKey{
+		DispatcherID:           dispatcherID,
+		EnableTableAcrossNodes: dispatcherID != "",
 	}

Contributor

🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Validate dispatcher consistency between index-path and index-file payload.

At Line 445 and Line 449-452, the dispatcher parsed from the data filename is discarded and replaced by the dispatcher from the index path without validation. A mismatch can mis-key file-index state and replay from the wrong stream.

Proposed fix
-	fileIndex, err := cloudstorage.ParseFileIndexFromFileName(fileName, c.fileExtension)
+	fileIndex, err := cloudstorage.ParseFileIndexFromFileName(fileName, c.fileExtension)
 	if err != nil {
 		return err
 	}
+	if fileIndex.DispatcherID != dispatcherID {
+		return errors.ErrStorageSinkInvalidFileName.GenWithStack(
+			"dispatcher mismatch between index path and file payload: path=%s, payload=%s, file=%s",
+			dispatcherID, fileIndex.DispatcherID, fileName,
+		)
+	}
 	fileIndex.FileIndexKey = cloudstorage.FileIndexKey{
 		DispatcherID:           dispatcherID,
 		EnableTableAcrossNodes: dispatcherID != "",
 	}
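The validation proposed here can be exercised in isolation with a small sketch. The types `fileIndexKey` and `fileIndex` and the function `reconcileDispatcher` are hypothetical simplifications, and a plain `fmt.Errorf` stands in for the repository's error type.

```go
package main

import "fmt"

// fileIndexKey and fileIndex are hypothetical, reduced versions of the types
// named in the review comment.
type fileIndexKey struct {
	DispatcherID           string
	EnableTableAcrossNodes bool
}

type fileIndex struct {
	fileIndexKey
	Idx uint64
}

// reconcileDispatcher rejects an index whose dispatcher ID (parsed from the
// data file name) disagrees with the one taken from the index path, and only
// then rewrites the key from the path value.
func reconcileDispatcher(parsed fileIndex, pathDispatcherID string) (fileIndex, error) {
	if parsed.DispatcherID != pathDispatcherID {
		return fileIndex{}, fmt.Errorf(
			"dispatcher mismatch between index path and file payload: path=%q, payload=%q",
			pathDispatcherID, parsed.DispatcherID)
	}
	parsed.fileIndexKey = fileIndexKey{
		DispatcherID:           pathDispatcherID,
		EnableTableAcrossNodes: pathDispatcherID != "",
	}
	return parsed, nil
}

func main() {
	parsed := fileIndex{fileIndexKey: fileIndexKey{DispatcherID: "d1"}, Idx: 7}

	if _, err := reconcileDispatcher(parsed, "d2"); err != nil {
		fmt.Println("rejected:", err)
	}
	if ok, err := reconcileDispatcher(parsed, "d1"); err == nil {
		fmt.Println("accepted index", ok.Idx, "acrossNodes:", ok.EnableTableAcrossNodes)
	}
}
```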

@ti-chi-bot

ti-chi-bot Bot commented Jun 23, 2026


@3AceShowHand: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-error-log-review eb69660 link true /test pull-error-log-review
pull-cdc-storage-integration-heavy eb69660 link true /test pull-cdc-storage-integration-heavy
pull-cdc-storage-integration-light eb69660 link true /test pull-cdc-storage-integration-light

Labels

do-not-merge/needs-linked-issue release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
