100w-master#4958

Open
hongyunyan wants to merge 1 commit into pingcap:master from hongyunyan:100w-master

Conversation

Collaborator

@hongyunyan hongyunyan commented Apr 30, 2026

What problem does this PR solve?

Issue Number: close #xxx

What is changed and how it works?

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Questions

Will it cause performance regression or break compatibility?
Do you need to update user documentation, design documentation or monitoring documentation?

Release note

Please refer to [Release Notes Language Style Guide](https://pingcap.github.io/tidb-dev-guide/contribute-to-tidb/release-notes-style-guide.html) to write a quality release note.

If you don't think this PR needs a release note then fill it with `None`.

Summary by CodeRabbit

  • Chores
    • Adjusted periodic event interval timing for the maintainer service.
    • Refined event message logging output to improve debugging visibility.

@ti-chi-bot ti-chi-bot Bot added the release-note (denotes a PR that will be considered when it comes time to generate release notes) and do-not-merge/needs-linked-issue labels Apr 30, 2026

ti-chi-bot Bot commented Apr 30, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign kennytm for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Contributor

coderabbitai Bot commented Apr 30, 2026

📝 Walkthrough

Modified periodic event interval in the maintainer's event loop from 100ms to 120s, and adjusted slow-event logging for EventMessage to log only the message type instead of the entire message object.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Event Loop & Logging Configuration: `maintainer/maintainer.go` | Increased periodEventInterval from 100ms to 120s and modified EventMessage slow-event logging to record only MessageType instead of the full Message object. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

Suggested labels

size/M

Suggested reviewers

  • wk989898
  • lidezhu
  • flowbehappy

Poem

🐰 A tick-tock so patient, now 120 seconds strong,
No need for verbose chatter when the type will do just fine,
Changes so surgical, logging refined,
The maintainer hops forward, ✨ one hop at a time!

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Title check | ⚠️ Warning | The title '100w-master' is not descriptive of the actual changes made. The changes increase the periodic event interval from 100ms to 120s and adjust slow-event logging, but the title provides no indication of these modifications. | Revise the title to clearly describe the main change, such as 'Increase periodic event interval and adjust event logging' or similar descriptive phrasing. |
| Description check | ⚠️ Warning | The PR description is entirely a template with no actual content filled in. Critical sections like 'What is changed and how it works?', issue reference, test information, and release notes are empty or show only placeholders. | Complete the PR description with: a specific issue number or link, explanation of the interval increase and logging changes, which tests were performed, answers to the compatibility and documentation questions, and an appropriate release note. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@ti-chi-bot ti-chi-bot Bot added the size/XS label (denotes a PR that changes 0-9 lines, ignoring generated files) Apr 30, 2026
@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request increases the periodEventInterval from 100ms to 120s and updates event logging to record only the message type. Feedback suggests that the significant increase in the interval may negatively impact system responsiveness and replication lag. Additionally, it is recommended to use zap.Stringer for logging the message type to improve log readability.

Comment thread maintainer/maintainer.go

 const (
-	periodEventInterval = time.Millisecond * 100
+	periodEventInterval = time.Second * 120

high

Increasing periodEventInterval from 100ms to 120s is a drastic change that will significantly impact the system's responsiveness. This interval controls how frequently the maintainer calculates the global checkpoint and resends critical messages (such as bootstrap requests or barrier ACKs). A 2-minute delay in these operations will lead to excessive replication lag and extremely slow recovery from transient network issues or node failures. If this change was intended to reduce CPU overhead for large-scale changefeeds (e.g., 1 million tables), it should be made configurable or set to a more reasonable value like 1-5 seconds.

Comment thread maintainer/maintainer.go
 zap.Int("eventType", event.eventType),
 zap.Duration("duration", duration),
-zap.Any("Message", event.message),
+zap.Any("MessageType", event.message.Type),

medium

Using zap.Stringer is preferred here since event.message.Type (of type IOType) implements the fmt.Stringer interface. This ensures the log contains the human-readable string representation of the message type rather than its raw integer value, which is much more helpful for debugging slow event processing. Additionally, logging only the type instead of the entire message is a good improvement for performance when dealing with large table counts.

Suggested change
-zap.Any("MessageType", event.message.Type),
+zap.Stringer("MessageType", event.message.Type),

ti-chi-bot Bot commented Apr 30, 2026

[FORMAT CHECKER NOTIFICATION]

Notice: To remove the do-not-merge/needs-linked-issue label, please provide the linked issue number on one line in the PR body, for example: Issue Number: close #123 or Issue Number: ref #456.

📖 For more info, you can check the "Contribute Code" section in the development guide.

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@maintainer/maintainer.go`:
- Around line 50-51: The single shared variable periodEventInterval was
increased to 120s but is used for both checkpoint calculation and EventPeriod
resend/housekeeping; split it into two clearly named intervals (e.g.,
periodCheckpointInterval for the checkpoint calculation used in the checkpoint
function referenced at Line 655, and periodEventInterval for EventPeriod
resend/housekeeping used around Lines 1178 and 1230) and update all references
so checkpoint logic uses periodCheckpointInterval (shorter, e.g.,
original/near-real-time value) while EventPeriod resend/housekeeping continues
to use the longer periodEventInterval; keep periodRedoInterval unchanged and
ensure names (periodCheckpointInterval, periodEventInterval, periodRedoInterval)
match across maintainer.go and any functions that read them.
- Line 312: In the defer slow-event logging path, guard against nil
event.message before accessing event.message.Type: update the zap logging call
that currently uses zap.Any("MessageType", event.message.Type) to first check if
event.message != nil and only include event.message.Type when non-nil (otherwise
log nil or omit the field) so the defer logger cannot nil-deref; locate this
change around the slow-event defer block and the usage of event.message and
"MessageType".

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 15a8564c-8b20-4472-b77d-509703c7cc7f

📥 Commits

Reviewing files that changed from the base of the PR and between 5518eb2 and cf1d9bd.

📒 Files selected for processing (1)
  • maintainer/maintainer.go

Comment thread maintainer/maintainer.go
Comment on lines +50 to 51
+	periodEventInterval = time.Second * 120
 	periodRedoInterval = time.Second * 1
Contributor

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

120s shared periodic tick can regress checkpoint freshness and resend recovery.

periodEventInterval now drives both checkpoint calculation (Line 655) and EventPeriod resend/housekeeping (Line 1230 + Line 1178). Moving it to 120s means both flows can lag by up to 2 minutes.

Suggested split of intervals
 const (
-	periodEventInterval = time.Second * 120
+	periodEventInterval      = time.Second * 120 // resend/housekeeping
+	checkpointCalcInterval   = time.Second       // checkpoint advancement cadence
 	periodRedoInterval  = time.Second * 1
 )
@@
-func (m *Maintainer) calCheckpointTs(ctx context.Context) {
-	ticker := time.NewTicker(periodEventInterval)
+func (m *Maintainer) calCheckpointTs(ctx context.Context) {
+	ticker := time.NewTicker(checkpointCalcInterval)

Comment thread maintainer/maintainer.go
 zap.Int("eventType", event.eventType),
 zap.Duration("duration", duration),
-zap.Any("Message", event.message),
+zap.Any("MessageType", event.message.Type),
Contributor

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Guard event.message before logging MessageType in defer path.

event.message.Type can nil-deref in the slow-event logger. A small guard avoids panic in diagnostic code.

Nil-safe logging tweak
 			if event.eventType == EventMessage {
+				messageType := "nil"
+				if event.message != nil {
+					messageType = event.message.Type.String()
+				}
 				log.Info("maintainer is too slow",
 					zap.Stringer("changefeedID", m.changefeedID),
 					zap.Int("eventType", event.eventType),
 					zap.Duration("duration", duration),
-					zap.Any("MessageType", event.message.Type),
+					zap.String("messageType", messageType),
 				)
 			} else {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-	zap.Any("MessageType", event.message.Type),
+	if event.eventType == EventMessage {
+		messageType := "nil"
+		if event.message != nil {
+			messageType = event.message.Type.String()
+		}
+		log.Info("maintainer is too slow",
+			zap.Stringer("changefeedID", m.changefeedID),
+			zap.Int("eventType", event.eventType),
+			zap.Duration("duration", duration),
+			zap.String("messageType", messageType),
+		)
+	} else {
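
The guard can also be pulled into a small helper so the logging call stays a single expression. The sketch below is self-contained and uses only the standard library; `IOType`, `TargetMessage`, `Event`, and `messageTypeLabel` are hypothetical names declared here for illustration and are not taken from maintainer.go.

```go
package main

import "fmt"

// IOType, TargetMessage, and Event are hypothetical stand-ins for the
// types involved in the slow-event log line.
type IOType int32

func (t IOType) String() string { return fmt.Sprintf("IOType(%d)", int32(t)) }

type TargetMessage struct {
	Type IOType
}

type Event struct {
	message *TargetMessage
}

// messageTypeLabel returns a printable message type and never
// dereferences a nil event or a nil message.
func messageTypeLabel(e *Event) string {
	if e == nil || e.message == nil {
		return "nil"
	}
	return e.message.Type.String()
}

func main() {
	fmt.Println(messageTypeLabel(&Event{}))
	fmt.Println(messageTypeLabel(&Event{message: &TargetMessage{Type: 3}}))
}
```

A helper like this keeps the nil check out of the deferred logging block, so diagnostic code cannot panic while reporting a slow event.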
