server: honor scheduler concurrency for coordinator (#4832) #5396
ti-chi-bot wants to merge 1 commit into
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
This cherry-pick PR is for a release branch and has not yet been approved by triage owners.
[APPROVALNOTIFIER] This PR is NOT APPROVED
Code Review
This pull request improves changefeed resume validation by retrieving persisted metadata from the backend instead of using stale in-memory copies, and clones changefeed info in GetChangefeed to prevent concurrent mutation issues. It also captures scheduler settings from the server startup configuration. The review feedback highlights three potential nil pointer dereference panics: when overwriting cfInfo with the result of GetPersistedChangefeedInfo in ResumeChangefeed, when dereferencing info in EtcdBackend.GetChangefeedInfo, and when accessing schedulerCfg in coordinatorSchedulerSettings.
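The cloning point in the summary above can be illustrated with a minimal Go sketch. It only shows the general idea of handing out a deep copy instead of the shared in-memory pointer. The changefeedInfo and replicaConfig types, the store registry, and getChangefeed are hypothetical stand-ins and are not the types or functions used in this repository.

```go
package main

import (
	"fmt"
	"sync"
)

// replicaConfig and changefeedInfo are hypothetical stand-ins for the
// real changefeed metadata types; only the pointer-sharing shape matters.
type replicaConfig struct {
	SinkURI string
}

type changefeedInfo struct {
	Name   string
	Config *replicaConfig
}

// clone returns a deep copy so callers cannot mutate shared state.
// A nil receiver yields nil, which keeps lookups of missing keys safe.
func (i *changefeedInfo) clone() *changefeedInfo {
	if i == nil {
		return nil
	}
	cp := *i
	if i.Config != nil {
		cfg := *i.Config
		cp.Config = &cfg
	}
	return &cp
}

// store is an assumed in-memory registry guarded by a mutex.
type store struct {
	mu    sync.RWMutex
	feeds map[string]*changefeedInfo
}

// getChangefeed hands out a clone, never the stored pointer.
func (s *store) getChangefeed(name string) *changefeedInfo {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.feeds[name].clone()
}

func main() {
	s := &store{feeds: map[string]*changefeedInfo{
		"cf-1": {Name: "cf-1", Config: &replicaConfig{SinkURI: "blackhole://"}},
	}}
	got := s.getChangefeed("cf-1")
	got.Config.SinkURI = "mutated" // only the copy changes
	fmt.Println(s.feeds["cf-1"].Config.SinkURI) // prints "blackhole://"
}
```

With a shallow copy (copying the struct but not Config), the last line would print "mutated" instead, which is the kind of shared-mutation problem the summary refers to.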
cfInfo, err = co.GetPersistedChangefeedInfo(ctx, cfInfo.ChangefeedID)
if err != nil {
	_ = c.Error(err)
	return
}
Overwriting cfInfo directly with the result of GetPersistedChangefeedInfo before checking if it is nil can lead to a nil pointer dereference panic later in the function (e.g., when accessing cfInfo.ChangefeedID or cfInfo.Config). Using a temporary variable to perform a nil check ensures safety.
persistedInfo, err := co.GetPersistedChangefeedInfo(ctx, cfInfo.ChangefeedID)
if err != nil {
	_ = c.Error(err)
	return
}
if persistedInfo == nil {
	_ = c.Error(errors.ErrChangeFeedNotExists.GenWithStackByArgs(cfInfo.ChangefeedID.Name()))
	return
}
cfInfo = persistedInfo

info, err := b.etcdClient.GetChangeFeedInfo(ctx, id.DisplayName)
if err != nil {
	return nil, errors.Trace(err)
}
If b.etcdClient.GetChangeFeedInfo returns nil, nil (which can happen in certain error/mock scenarios), dereferencing info.ChangefeedID on line 136 will cause a nil pointer panic. Adding a defensive nil check right after retrieving the info prevents this.
info, err := b.etcdClient.GetChangeFeedInfo(ctx, id.DisplayName)
if err != nil {
	return nil, errors.Trace(err)
}
if info == nil {
	return nil, errors.Trace(cerror.ErrChangeFeedNotExists.GenWithStackByArgs(id.Name()))
}

func coordinatorSchedulerSettings(schedulerCfg *config.SchedulerConfig) (int, time.Duration) {
	return schedulerCfg.MaxTaskConcurrency, time.Duration(schedulerCfg.CheckBalanceInterval)
}
If schedulerCfg is nil (which can happen during tests or if the configuration is partially initialized), accessing schedulerCfg.MaxTaskConcurrency will cause a panic. Adding a defensive nil check and returning the previous hardcoded defaults (10000 and time.Minute) as fallback values improves robustness.
func coordinatorSchedulerSettings(schedulerCfg *config.SchedulerConfig) (int, time.Duration) {
	if schedulerCfg == nil {
		return 10000, time.Minute
	}
	return schedulerCfg.MaxTaskConcurrency, time.Duration(schedulerCfg.CheckBalanceInterval)
}
This is an automated cherry-pick of #4832
What problem does this PR solve?
Issue Number: close #4831
What is changed and how it works?
The coordinator was constructed with hard-coded scheduling settings: a concurrency of 10000 and a balance check interval of time.Minute.
This bypassed Debug.Scheduler.MaxTaskConcurrency and Debug.Scheduler.CheckBalanceInterval. During bulk changefeed creation, the basic scheduler could therefore schedule a very large number of absent changefeeds at once, causing many maintainer bootstraps to run concurrently.
This PR reads the coordinator scheduler settings from the validated global server config before constructing the coordinator. With the default config, maintainer scheduling concurrency returns to 10, which reduces creation-time memory and CPU spikes by throttling concurrent bootstrap work.
Check List
Tests
go test ./server
Questions
Will it cause performance regression or break compatibility?
It should reduce CPU and memory spikes during bulk changefeed creation by honoring the existing scheduler concurrency config. It may make very large bulk creation finish more gradually compared with the previous hard-coded 10000 concurrency, but that behavior matches the intended configurable scheduler limit and can be tuned via server config.
Do you need to update user documentation, design documentation or monitoring documentation?
No. This PR makes existing scheduler config effective for coordinator scheduling.
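The throttling effect described under "What is changed and how it works?" can be sketched in Go as follows. This is not the basic scheduler's actual implementation. The nextBatch function and its parameters are hypothetical and only illustrate how a concurrency cap bounds the number of absent changefeeds scheduled in one pass.

```go
package main

import "fmt"

// nextBatch picks which absent changefeeds to schedule in one pass.
// running is the number of bootstraps already in flight and maxConcurrency
// is the configured cap. Both names are assumptions for this sketch.
func nextBatch(absent []string, running, maxConcurrency int) []string {
	free := maxConcurrency - running
	if free <= 0 {
		return nil // cap reached: schedule nothing this pass
	}
	if free > len(absent) {
		free = len(absent)
	}
	return absent[:free]
}

func main() {
	absent := make([]string, 25)
	for i := range absent {
		absent[i] = fmt.Sprintf("cf-%d", i)
	}
	fmt.Println(len(nextBatch(absent, 0, 10)))    // prints 10
	fmt.Println(len(nextBatch(absent, 0, 10000))) // prints 25
	fmt.Println(len(nextBatch(absent, 10, 10)))   // prints 0
}
```

With a cap of 10000, all 25 absent changefeeds start bootstrapping in the same pass. With a cap of 10, the same work is spread over several passes, which is the more gradual behavior the answer above describes.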
Release note