100w-master by hongyunyan · Pull Request #4958 · pingcap/ticdc

hongyunyan · 2026-04-30T03:09:07Z

What problem does this PR solve?

Issue Number: close #xxx

What is changed and how it works?

Check List

Tests

Unit test
Integration test
Manual test (add detailed scripts or steps below)
No code

Questions

Will it cause performance regression or break compatibility?

Do you need to update user documentation, design documentation or monitoring documentation?

Release note

Please refer to [Release Notes Language Style Guide](https://pingcap.github.io/tidb-dev-guide/contribute-to-tidb/release-notes-style-guide.html) to write a quality release note.

If you don't think this PR needs a release note then fill it with `None`.

Summary by CodeRabbit

Chores
- Adjusted periodic event interval timing for the maintainer service.
- Refined event message logging output to improve debugging visibility.

ti-chi-bot · 2026-04-30T03:09:13Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign kennytm for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

coderabbitai · 2026-04-30T03:09:18Z

Walkthrough

Modified periodic event interval in the maintainer's event loop from 100ms to 120s, and adjusted slow-event logging for EventMessage to log only the message type instead of the entire message object.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Event Loop & Logging Configuration `maintainer/maintainer.go` | Increased `periodEventInterval` from 100ms to 120s and modified EventMessage slow-event logging to record only `MessageType` instead of the full `Message` object. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

Suggested labels

size/M

Suggested reviewers

wk989898
lidezhu
flowbehappy

Poem

🐰 A tick-tock so patient, now 120 seconds strong,
No need for verbose chatter when the type will do just fine,
Changes so surgical, logging refined,
The maintainer hops forward, ✨ one hop at a time!

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Title check | ⚠️ Warning | The title '100w-master' is not descriptive of the actual changes made. The changes increase the periodic event interval from 100ms to 120s and adjust slow-event logging, but the title provides no indication of these modifications. | Revise the title to clearly describe the main change, such as 'Increase periodic event interval and adjust event logging' or similar descriptive phrasing. |
| Description check | ⚠️ Warning | The PR description is entirely a template with no actual content filled in. Critical sections like 'What is changed and how it works?', issue reference, test information, and release notes are empty or show only placeholders. | Complete the PR description with: a specific issue number or link, explanation of the interval increase and logging changes, which tests were performed, answers to the compatibility and documentation questions, and an appropriate release note. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

gemini-code-assist

Code Review

This pull request increases the periodEventInterval from 100ms to 120s and updates event logging to record only the message type. Feedback suggests that the significant increase in the interval may negatively impact system responsiveness and replication lag. Additionally, it is recommended to use zap.Stringer for logging the message type to improve log readability.

gemini-code-assist · 2026-04-30T03:10:02Z


 const (
-	periodEventInterval = time.Millisecond * 100
+	periodEventInterval = time.Second * 120


Increasing periodEventInterval from 100ms to 120s is a drastic change that will significantly impact the system's responsiveness. This interval controls how frequently the maintainer calculates the global checkpoint and resends critical messages (such as bootstrap requests or barrier ACKs). A 2-minute delay in these operations will lead to excessive replication lag and extremely slow recovery from transient network issues or node failures. If this change was intended to reduce CPU overhead for large-scale changefeeds (e.g., 1 million tables), it should be made configurable or set to a more reasonable value like 1-5 seconds.
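A minimal sketch of the "make it configurable" option mentioned in this comment. The names `defaultPeriodEventInterval`, `maxPeriodEventInterval`, and `resolvePeriodEventInterval` are hypothetical and do not exist in the PR; the 5-second cap is taken from the upper end of the range suggested above, and where the configured value would come from is left open.

```go
package main

import (
	"fmt"
	"time"
)

// Hypothetical bounds for illustration only; the PR defines neither of them.
const (
	defaultPeriodEventInterval = 100 * time.Millisecond // value before this PR
	maxPeriodEventInterval     = 5 * time.Second        // upper end of the suggested 1-5s range
)

// resolvePeriodEventInterval returns the interval to use for the periodic
// event tick. An unset (zero or negative) value falls back to the default,
// and values above the cap are clamped to it.
func resolvePeriodEventInterval(configured time.Duration) time.Duration {
	if configured <= 0 {
		return defaultPeriodEventInterval
	}
	if configured > maxPeriodEventInterval {
		return maxPeriodEventInterval
	}
	return configured
}

func main() {
	for _, d := range []time.Duration{0, time.Second, 120 * time.Second} {
		fmt.Println(d, "->", resolvePeriodEventInterval(d))
	}
}
```

Clamping rather than rejecting an out-of-range value is one possible design choice; returning an error for values above the cap would be an equally reasonable alternative.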

gemini-code-assist · 2026-04-30T03:10:03Z

 					zap.Int("eventType", event.eventType),
 					zap.Duration("duration", duration),
-					zap.Any("Message", event.message),
+					zap.Any("MessageType", event.message.Type),


Using zap.Stringer is preferred here since event.message.Type (of type IOType) implements the fmt.Stringer interface. This ensures the log contains the human-readable string representation of the message type rather than its raw integer value, which is much more helpful for debugging slow event processing. Additionally, logging only the type instead of the entire message is a good improvement for performance when dealing with large table counts.

Suggested change

-					zap.Any("MessageType", event.message.Type),
+					zap.Stringer("MessageType", event.message.Type),
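The suggestion depends on the message type implementing `fmt.Stringer`. The sketch below uses only the standard library to show what that interface provides; the `IOType` definition here is a hypothetical stand-in with made-up values and names, not the real type from the repository, and it makes no claim about how any particular zap field constructor encodes the value.

```go
package main

import "fmt"

// IOType is a hypothetical stand-in for the message type enum.
// The constants and their names are illustrative assumptions.
type IOType int32

const (
	TypeUnknown IOType = iota
	TypeHeartbeat
	TypeBootstrap
)

// String makes IOType satisfy fmt.Stringer.
func (t IOType) String() string {
	switch t {
	case TypeUnknown:
		return "Unknown"
	case TypeHeartbeat:
		return "Heartbeat"
	case TypeBootstrap:
		return "Bootstrap"
	default:
		return fmt.Sprintf("IOType(%d)", int32(t))
	}
}

// describe accepts any fmt.Stringer, which is the contract a
// Stringer-based log field relies on.
func describe(s fmt.Stringer) string {
	return s.String()
}

func main() {
	t := TypeBootstrap
	fmt.Println(int32(t))    // raw integer value
	fmt.Println(describe(t)) // human-readable name via String()
}
```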

ti-chi-bot · 2026-04-30T03:16:33Z

[FORMAT CHECKER NOTIFICATION]

Notice: To remove the do-not-merge/needs-linked-issue label, please provide the linked issue number on one line in the PR body, for example: Issue Number: close #123 or Issue Number: ref #456.

📖 For more info, you can check the "Contribute Code" section in the development guide.

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@maintainer/maintainer.go`:
- Around line 50-51: The single shared variable periodEventInterval was
increased to 120s but is used for both checkpoint calculation and EventPeriod
resend/housekeeping; split it into two clearly named intervals (e.g.,
periodCheckpointInterval for the checkpoint calculation used in the checkpoint
function referenced at Line 655, and periodEventInterval for EventPeriod
resend/housekeeping used around Lines 1178 and 1230) and update all references
so checkpoint logic uses periodCheckpointInterval (shorter, e.g.,
original/near-real-time value) while EventPeriod resend/housekeeping continues
to use the longer periodEventInterval; keep periodRedoInterval unchanged and
ensure names (periodCheckpointInterval, periodEventInterval, periodRedoInterval)
match across maintainer.go and any functions that read them.
- Line 312: In the defer slow-event logging path, guard against nil
event.message before accessing event.message.Type: update the zap logging call
that currently uses zap.Any("MessageType", event.message.Type) to first check if
event.message != nil and only include event.message.Type when non-nil (otherwise
log nil or omit the field) so the defer logger cannot nil-deref; locate this
change around the slow-event defer block and the usage of event.message and
"MessageType".

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 15a8564c-8b20-4472-b77d-509703c7cc7f

📥 Commits

Reviewing files that changed from the base of the PR and between 5518eb2 and cf1d9bd.

📒 Files selected for processing (1)

maintainer/maintainer.go

coderabbitai · 2026-04-30T03:18:29Z

+	periodEventInterval = time.Second * 120
 	periodRedoInterval  = time.Second * 1


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

120s shared periodic tick can regress checkpoint freshness and resend recovery.

periodEventInterval now drives both checkpoint calculation (Line 655) and EventPeriod resend/housekeeping (Line 1230 + Line 1178). Moving it to 120s means both flows can lag by up to 2 minutes.

Suggested split of intervals

 const (
-	periodEventInterval = time.Second * 120
+	periodEventInterval    = time.Second * 120 // resend/housekeeping
+	checkpointCalcInterval = time.Second       // checkpoint advancement cadence
 	periodRedoInterval  = time.Second * 1
 )
@@
-func (m *Maintainer) calCheckpointTs(ctx context.Context) {
-	ticker := time.NewTicker(periodEventInterval)
+func (m *Maintainer) calCheckpointTs(ctx context.Context) {
+	ticker := time.NewTicker(checkpointCalcInterval)
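To illustrate the effect of the proposed split, here is a standalone sketch in which two independent tickers drive a fast "checkpoint" cadence and a slow "housekeeping" cadence. `runTickers` and `demo` are hypothetical helpers, not functions in `maintainer/maintainer.go`, and the millisecond intervals are scaled down so the program finishes quickly.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// runTickers counts ticks from two independent tickers until ctx is done.
// It stands in for a checkpoint loop and a resend/housekeeping loop that
// no longer share one interval.
func runTickers(ctx context.Context, checkpointEvery, housekeepingEvery time.Duration) (checkpoints, housekeeping int) {
	cp := time.NewTicker(checkpointEvery)
	defer cp.Stop()
	hk := time.NewTicker(housekeepingEvery)
	defer hk.Stop()

	for {
		select {
		case <-ctx.Done():
			return
		case <-cp.C:
			checkpoints++ // checkpoint advancement would happen here
		case <-hk.C:
			housekeeping++ // resend/housekeeping would happen here
		}
	}
}

// demo runs both tickers for a short, fixed window with scaled-down intervals.
func demo() (int, int) {
	ctx, cancel := context.WithTimeout(context.Background(), 300*time.Millisecond)
	defer cancel()
	return runTickers(ctx, 5*time.Millisecond, 100*time.Millisecond)
}

func main() {
	cp, hk := demo()
	fmt.Println("checkpoint ticks:", cp, "housekeeping ticks:", hk)
}
```

With separate tickers, lengthening the housekeeping interval leaves the checkpoint cadence untouched, which is the property the review comment asks for.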

coderabbitai · 2026-04-30T03:18:29Z

 					zap.Int("eventType", event.eventType),
 					zap.Duration("duration", duration),
-					zap.Any("Message", event.message),
+					zap.Any("MessageType", event.message.Type),


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Guard event.message before logging MessageType in defer path.

event.message.Type can nil-deref in the slow-event logger. A small guard avoids panic in diagnostic code.

Nil-safe logging tweak

 if event.eventType == EventMessage {
+	messageType := "nil"
+	if event.message != nil {
+		messageType = event.message.Type.String()
+	}
 	log.Info("maintainer is too slow",
 		zap.Stringer("changefeedID", m.changefeedID),
 		zap.Int("eventType", event.eventType),
 		zap.Duration("duration", duration),
-		zap.Any("MessageType", event.message.Type),
+		zap.String("messageType", messageType),
 	)
 } else {

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-					zap.Any("MessageType", event.message.Type),
+			if event.eventType == EventMessage {
+				messageType := "nil"
+				if event.message != nil {
+					messageType = event.message.Type.String()
+				}
+				log.Info("maintainer is too slow",
+					zap.Stringer("changefeedID", m.changefeedID),
+					zap.Int("eventType", event.eventType),
+					zap.Duration("duration", duration),
+					zap.String("messageType", messageType),
+				)
+			} else {
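The nil guard in this suggestion can also be factored into a small helper. The sketch below is self-contained and uses hypothetical stand-in types (`IOType`, `TargetMessage`, `Event`) plus a hypothetical `messageTypeLabel` function; the real types in the repository may be shaped differently.

```go
package main

import "fmt"

// IOType is a hypothetical stand-in for the message type enum.
type IOType int32

// String gives IOType a printable form; the format is illustrative.
func (t IOType) String() string {
	return fmt.Sprintf("IOType(%d)", int32(t))
}

// TargetMessage and Event are simplified stand-ins that keep only the
// fields the slow-event logging path touches.
type TargetMessage struct {
	Type IOType
}

type Event struct {
	message *TargetMessage
}

// messageTypeLabel returns a printable message type and never dereferences
// a nil event or a nil message.
func messageTypeLabel(e *Event) string {
	if e == nil || e.message == nil {
		return "nil"
	}
	return e.message.Type.String()
}

func main() {
	fmt.Println(messageTypeLabel(&Event{}))
	fmt.Println(messageTypeLabel(&Event{message: &TargetMessage{Type: 3}}))
}
```

Keeping the guard in one helper means diagnostic code in a deferred logger cannot panic on a missing message, whichever log field constructor is used afterwards.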

update

cf1d9bd

ti-chi-bot Bot added the release-note and do-not-merge/needs-linked-issue labels Apr 30, 2026

ti-chi-bot Bot added the size/XS label Apr 30, 2026

gemini-code-assist Bot reviewed Apr 30, 2026

coderabbitai Bot reviewed Apr 30, 2026

100w-master #4958
hongyunyan wants to merge 1 commit into pingcap:master from hongyunyan:100w-master

1 participant
