
eventservice: optimize scanwindow#4950

Draft
asddongmen wants to merge 5 commits into pingcap:master from asddongmen:0427-scanwindow-v2


@asddongmen
Collaborator

What problem does this PR solve?

Issue Number: close #xxx

What is changed and how it works?

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Questions

Will it cause performance regression or break compatibility?
Do you need to update user documentation, design documentation or monitoring documentation?

Release note

Please refer to [Release Notes Language Style Guide](https://pingcap.github.io/tidb-dev-guide/contribute-to-tidb/release-notes-style-guide.html) to write a quality release note.

If you don't think this PR needs a release note then fill it with `None`.

Signed-off-by: dongmen <414110582@qq.com>
@ti-chi-bot added the labels do-not-merge/needs-linked-issue, release-note, and do-not-merge/work-in-progress on Apr 29, 2026
@ti-chi-bot

ti-chi-bot Bot commented Apr 29, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@ti-chi-bot

ti-chi-bot Bot commented Apr 29, 2026

[FORMAT CHECKER NOTIFICATION]

Notice: To remove the do-not-merge/needs-linked-issue label, please provide the linked issue number on one line in the PR body, for example: Issue Number: close #123 or Issue Number: ref #456.

📖 For more info, you can check the "Contribute Code" section in the development guide.

@ti-chi-bot

ti-chi-bot Bot commented Apr 29, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign hongyunyan for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai
Contributor

coderabbitai Bot commented Apr 29, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 60b848c0-b349-4a50-84d2-26739fb7a9a3

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


@ti-chi-bot ti-chi-bot Bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Apr 29, 2026

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request replaces the existing scan interval adjustment logic with a new adaptiveScanWindowController that utilizes Exponential Moving Averages (EMAs) and a pressure score for more stable memory pressure management. The update includes comprehensive simulation tests and enhanced Prometheus metrics for monitoring controller decisions. Review feedback highlights a non-monotonic discontinuity in the emergency brake calculation, potential over-throttling caused by latching peak usage values, the presence of magic numbers, and the use of a redundant maxFloat64 helper that should be replaced by the built-in max function.
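For readers unfamiliar with the technique named in the summary, here is a minimal sketch of a fast and a slow exponential moving average tracking the same usage signal. The function name `updateEMA` and both smoothing factors are assumptions made for illustration; the PR's actual coefficients and field layout are not shown in this conversation.

```go
package main

import "fmt"

// updateEMA applies the standard EMA recurrence:
// next = alpha*sample + (1-alpha)*prev.
// A larger alpha reacts faster to new samples.
func updateEMA(prev, sample, alpha float64) float64 {
	return alpha*sample + (1-alpha)*prev
}

func main() {
	// Hypothetical smoothing factors, not taken from the PR.
	const fastAlpha, slowAlpha = 0.5, 0.1

	fast, slow := 0.0, 0.0
	// Feed a step change in memory usage from 0 to 1.
	for i := 1; i <= 5; i++ {
		fast = updateEMA(fast, 1.0, fastAlpha)
		slow = updateEMA(slow, 1.0, slowAlpha)
		fmt.Printf("sample %d: fast=%.3f slow=%.3f\n", i, fast, slow)
	}
}
```

The fast average converges on the new level within a few samples, while the slow one lags behind, which is why a controller can use the pair to separate short spikes from sustained pressure.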

Comment on lines +562 to +567
func scanWindowEmergencyBrakeInterval(current time.Duration) time.Duration {
	if current <= 6*defaultScanInterval {
		return max(current/2, defaultScanInterval)
	}
	return max(current/4, minScanInterval)
}

high

The logic in scanWindowEmergencyBrakeInterval has two significant issues:

  1. Discontinuity: There is a sharp jump at the boundary of 6*defaultScanInterval (30s). For an input of 30s, it returns 15s (current/2), but for 30.1s, it returns ~7.5s (current/4). This non-monotonic behavior can cause unstable oscillations in the scan interval.
  2. Unreachable Minimum: The function floors at defaultScanInterval (5s) for any current <= 30s. This makes the minScanInterval (1s) constant effectively unreachable during emergency pressure (98%+ usage) if the interval has already been reduced to a moderate level. If the goal is to allow the interval to drop to 1s under extreme congestion, the floor should be minScanInterval in both branches.


return c.fastUsageEMA >= scanWindowHighPressureThreshold ||
	c.slowUsageEMA >= scanWindowHighPressureThreshold ||
	usage.max >= memoryUsageHighThreshold

medium

Using usage.max in shouldReduceForHighPressureLocked may lead to excessive interval reductions. Since usage.max tracks the peak usage over the last 30 seconds, a single transient spike will keep triggering reductions every 10 seconds (the cooldown) for the entire duration the spike remains in the window, even if the current usage (usage.last) and EMAs indicate that pressure has subsided. Consider relying primarily on EMAs or the current report value to ensure the response is truly adaptive to the present state.
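A minimal sketch of the difference, assuming a usage summary with `last` and `max` fields as the comment describes. The struct names and both threshold values are placeholders chosen for illustration, not values from the PR.

```go
package main

import "fmt"

// Placeholder thresholds; the real values are not shown in this review.
const (
	scanWindowHighPressureThreshold = 0.70
	memoryUsageHighThreshold        = 0.80
)

// usageStats is a hypothetical summary of recent memory usage reports.
type usageStats struct {
	last float64 // most recent report
	max  float64 // peak over the sliding window
}

// controller holds only the fields needed for this illustration.
type controller struct {
	fastUsageEMA float64
	slowUsageEMA float64
}

// shouldReduceLatching follows the condition quoted in the review:
// a stale peak in usage.max keeps it true.
func (c *controller) shouldReduceLatching(usage usageStats) bool {
	return c.fastUsageEMA >= scanWindowHighPressureThreshold ||
		c.slowUsageEMA >= scanWindowHighPressureThreshold ||
		usage.max >= memoryUsageHighThreshold
}

// shouldReduceCurrent is the suggested direction: rely on the EMAs
// and the current report instead of the windowed peak.
func (c *controller) shouldReduceCurrent(usage usageStats) bool {
	return c.fastUsageEMA >= scanWindowHighPressureThreshold ||
		c.slowUsageEMA >= scanWindowHighPressureThreshold ||
		usage.last >= memoryUsageHighThreshold
}

func main() {
	// Pressure has subsided, but a spike is still inside the window.
	c := &controller{fastUsageEMA: 0.40, slowUsageEMA: 0.45}
	usage := usageStats{last: 0.35, max: 0.90}
	fmt.Println("latching:", c.shouldReduceLatching(usage))
	fmt.Println("current: ", c.shouldReduceCurrent(usage))
}
```

With these inputs the latching check still requests a reduction while the current-value check does not, which is the over-throttling scenario the comment warns about.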

Comment on lines +738 to +739
c.fastUsageEMA < memoryUsageLowThreshold+0.03 &&
c.slowUsageEMA < memoryUsageLowThreshold+0.02

medium

The magic numbers 0.03 and 0.02 used as offsets for the low pressure check should be defined as named constants to improve code clarity and maintainability.
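One possible shape for this, as a sketch: the constant names `lowPressureFastEMAMargin` and `lowPressureSlowEMAMargin`, the helper `isLowPressure`, and the value of `memoryUsageLowThreshold` are all hypothetical. Only the offsets 0.03 and 0.02 come from the quoted code.

```go
package main

import "fmt"

const (
	// Placeholder value; the real threshold is not shown in this review.
	memoryUsageLowThreshold = 0.20

	// Hypothetical names for the offsets quoted in the review.
	lowPressureFastEMAMargin = 0.03
	lowPressureSlowEMAMargin = 0.02
)

// isLowPressure reports whether both EMAs sit below the low threshold
// plus their respective margins.
func isLowPressure(fastUsageEMA, slowUsageEMA float64) bool {
	return fastUsageEMA < memoryUsageLowThreshold+lowPressureFastEMAMargin &&
		slowUsageEMA < memoryUsageLowThreshold+lowPressureSlowEMAMargin
}

func main() {
	fmt.Println(isLowPressure(0.22, 0.21))  // both under their limits
	fmt.Println(isLowPressure(0.22, 0.225)) // slow EMA over its limit
}
```

Named constants also give a natural place for a comment explaining why the fast EMA gets a wider margin than the slow one.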

Comment on lines +796 to 801
func maxFloat64(a float64, b float64) float64 {
	if a > b {
		return a
	}
	return b
}

medium

The maxFloat64 helper function is redundant as Go 1.21+ provides a built-in max function. The codebase already uses the built-in max in other parts of this file (e.g., lines 450, 464), so this helper should be removed and its call sites (lines 679, 681, 687) updated to use the built-in function for consistency.


Labels

do-not-merge/needs-linked-issue do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
