fix(resourcecontrol): bypass RU accounting for NextGen background activities#1930

Open
YuhaoZhang00 wants to merge 10 commits intotikv:masterfrom
YuhaoZhang00:rc/bypass-background-activities

Conversation

@YuhaoZhang00 YuhaoZhang00 commented Mar 31, 2026

Ref pingcap/tidb#64339.

Supersedes #1809.
Supersedes #1929.

Summary

  • Add RU bypass logic for NextGen background activities

    • Expand shouldBypass() to skip RU accounting for heavy internal operations under the nextgen build tag: DDL index backfill (add_index, merge_temp_index), BR (br), IMPORT INTO (ImportInto), and workload learning (WorkloadLearning).
  • Plumb RequestSource into resourcecontrol.RequestInfo

    • Add RequestSource() to the local request info object built in client-go for metric collection purpose
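The source-based part of the bypass rule in the summary above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function name `isBypassedBackgroundSource`, the `bypassSources` slice, and the `nextGen` parameter (standing in for the `nextgen` build tag) are assumptions, and only the five source strings come from the description. The pre-existing bypass paths (analyze cop requests, `internal_others`) are left out.

```go
package main

import (
	"fmt"
	"strings"
)

// bypassSources holds the request-source fragments named in the PR summary.
// The variable name and the use of a slice are assumptions for illustration.
var bypassSources = []string{
	"add_index",
	"merge_temp_index",
	"br",
	"ImportInto",
	"WorkloadLearning",
}

// isBypassedBackgroundSource reports whether requestSource belongs to one of
// the heavy background activities. nextGen is a hypothetical stand-in for the
// nextgen build tag; outside NextGen this sketch never bypasses.
func isBypassedBackgroundSource(requestSource string, nextGen bool) bool {
	if !nextGen {
		return false
	}
	for _, s := range bypassSources {
		// Substring match, as described in the review walkthrough.
		if strings.Contains(requestSource, s) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isBypassedBackgroundSource("leader_internal_ddl_add_index", true))
	fmt.Println(isBypassedBackgroundSource("leader_external_Select", true))
	fmt.Println(isBypassedBackgroundSource("leader_internal_ddl_add_index", false))
}
```

A substring match keeps the check independent of prefixes such as `leader_` or `retry_leader_` that appear in the wire sources, at the cost of matching any source that happens to contain one of the fragments.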

Related PR:

Tests

Manually verified on a local NextGen cluster (NextGen PD + CSE TiKV + TiDB NEXT_GEN=1):

Test Summary

| Source | Activity | Bypassed? | RU accrued? |
|---|---|---|---|
| Control (SELECT) | normal user queries | No | Yes, RU correctly charged |
| add_index | ALTER TABLE t1 ADD INDEX on 4M rows | Yes, 148 requests | No, 0 RU |
| stats (cop tp=104) | ANALYZE TABLE t1 on 4M rows | Yes, +13 analyze requests | No, 0 RU for analyze cops |
| stats (non-analyze) | metadata reads, histogram writes | No | Yes, +13k RU |
| internal_others | background ops (always-on) | Yes, 7,500+ requests | No, 0 RU |

Test Detail

Data: bypass_test.t1 — 4M rows, ~370 MB
Observation: Before and after each activity, snapshot metrics:

curl -s http://127.0.0.1:10080/metrics | grep 'resource_control_bypassed_request_total{' | sort
curl -s http://127.0.0.1:10080/metrics | grep 'resource_control_ru_total{' | sort
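The before/after comparison can also be done mechanically. The sketch below assumes each snapshot was saved as Prometheus text output (one `name{labels} value` sample per line, no timestamps) and computes per-series deltas. The helper names `parseSnapshot` and `diffSnapshots`, the shortened metric name, and the `source` label are hypothetical; the sample values mirror numbers reported in this description.

```go
package main

import (
	"bufio"
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// parseSnapshot maps each series (metric name plus label set) to its value.
// Comment lines and lines that do not end in a number are skipped.
func parseSnapshot(text string) map[string]float64 {
	out := make(map[string]float64)
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		i := strings.LastIndex(line, " ")
		if i < 0 {
			continue
		}
		v, err := strconv.ParseFloat(line[i+1:], 64)
		if err != nil {
			continue
		}
		out[line[:i]] = v
	}
	return out
}

// diffSnapshots returns after-before for every series whose value changed.
// A series missing from before is treated as starting at zero.
func diffSnapshots(before, after map[string]float64) map[string]float64 {
	delta := make(map[string]float64)
	for k, a := range after {
		if d := a - before[k]; d != 0 {
			delta[k] = d
		}
	}
	return delta
}

func main() {
	// Hypothetical snapshot contents; metric and label names are assumptions.
	before := parseSnapshot(`bypassed_request_total{source="internal_others"} 29`)
	after := parseSnapshot(`bypassed_request_total{source="internal_others"} 37
bypassed_request_total{source="leader_internal_ddl_add_index"} 71`)

	delta := diffSnapshots(before, after)
	keys := make([]string, 0, len(delta))
	for k := range delta {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s %+g\n", k, delta[k])
	}
}
```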

A. Baseline (before any test)

bypassed_request_total:

internal_others                         29
leader_internal_others                  6186
leader_internal_stats                   20
retry_leader_leader_internal_others     62

ru_total:

external_Insert                         wru    531,100.27
internal_ddl                            wru    10.40
internal_DDLNotifier                    wru    28.71
internal_DistTask                       wru    3.53
internal_gc                             wru    213.44
internal_stats                          wru    1,123.36
leader_external_Select                  rru    16,483.78
leader_internal_ddl                     rru    1,486.64
leader_internal_DDLNotifier             rru    1,353.71
leader_internal_DistTask                rru    6,804.38
leader_internal_gc                      rru    135.55
leader_internal_stats                   rru    22,804.10
leader_internal_Timer                   rru    25.30
leader_internal_TTL                     rru    43.04
leader_unknown                          rru    2.02
retry_leader_leader_internal_ddl        rru    1.02
retry_leader_leader_internal_stats      rru    544.25
unknown                                 wru    22.63

B. add_index — DDL index backfill

SQL: ALTER TABLE t1 ADD INDEX idx_col2 (col2); (4M rows)

bypassed_request_total After DDL add_index:

internal_ddl_add_index                  62      ← NEW
internal_others                         37
leader_internal_ddl_add_index           71      ← NEW
leader_internal_ddl_add_index_ddl       15      ← NEW
leader_internal_others                  6983
leader_internal_stats                   34
retry_leader_leader_internal_others     68

bypassed_request_total delta:

| Wire source | Delta |
|---|---|
| internal_ddl_add_index | +62 |
| leader_internal_ddl_add_index | +71 |
| leader_internal_ddl_add_index_ddl | +15 |
| Total | +148 |

ru_total: unchanged from baseline (no entries containing "add_index" appeared).

C. stats + cop type 104 — ANALYZE TABLE

SQL: ANALYZE TABLE t1; (4M rows)

bypassed_request_total After ANALYZE TABLE:

...
leader_internal_stats                   56      ← was 34 in the section B snapshot (+22); the delta table below uses 43 as its "before" value
...

bypassed_request_total delta:

| Wire source | Before | After | Delta |
|---|---|---|---|
| leader_internal_stats | 43 | 56 | +13 |

ru_total (stats-related delta) After ANALYZE TABLE:

...
internal_stats                          wru    3,066.25    ← was 1,123.36
leader_internal_stats                   rru    34,377.64   ← was 22,804.10
retry_leader_leader_internal_stats      rru    547.62      ← was 544.25
...

ru_total delta:

| Wire source | Type | Delta |
|---|---|---|
| internal_stats | wru | +1,943 |
| leader_internal_stats | rru | +11,574 |

This is correct. The bypass only fires for cop requests with tp == 104 (analyze). Other stats operations (metadata reads, histogram writes) correctly consume RU.
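A rough sketch of why only part of the stats traffic is bypassed: the check needs both a coprocessor request and `tp == 104`. The `copRequest` type and the `bypassAnalyze` function are hypothetical stand-ins, and the additional requirement that the source contain `internal_stats` is an assumption inferred from the wire-source labels shown here, not confirmed code.

```go
package main

import (
	"fmt"
	"strings"
)

// reqTypeAnalyze is the coprocessor request type for analyze (104, per the
// PR description).
const reqTypeAnalyze int64 = 104

// copRequest is a hypothetical, simplified view of a request.
type copRequest struct {
	Source string // wire request source, e.g. "leader_internal_stats"
	IsCop  bool   // true for coprocessor requests
	Tp     int64  // coprocessor request type
}

// bypassAnalyze reports whether r is an internal analyze coprocessor request.
// Requiring "internal_stats" in the source is an assumption of this sketch.
func bypassAnalyze(r copRequest) bool {
	return r.IsCop &&
		r.Tp == reqTypeAnalyze &&
		strings.Contains(r.Source, "internal_stats")
}

func main() {
	analyzeCop := copRequest{Source: "leader_internal_stats", IsCop: true, Tp: 104}
	histogramWrite := copRequest{Source: "internal_stats", IsCop: false}
	userCop := copRequest{Source: "leader_external_Select", IsCop: true, Tp: 104}

	fmt.Println(bypassAnalyze(analyzeCop))     // analyze cop: bypassed
	fmt.Println(bypassAnalyze(histogramWrite)) // stats write: still charged
	fmt.Println(bypassAnalyze(userCop))        // non-stats source: still charged
}
```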

D. internal_others — low-resource internal ops

Not triggered by a specific test — these accumulate from background system operations.

bypassed_request_total at end of session:

| Wire source | Count |
|---|---|
| internal_others | 45 |
| leader_internal_others | 7,522 |
| retry_leader_leader_internal_others | 72 |

ru_total: no entries containing "internal_others" appeared.

Control: Normal user queries

SQL: SELECT COUNT(*), SELECT AVG(), point reads

| Metric | Delta |
|---|---|
| ru_total leader_external_Select rru | +11,677 |
| bypassed_request_total for external sources | 0 |

User queries are unaffected by bypass logic.

Summary by CodeRabbit

  • Bug Fixes

    • Improved resource control bypass logic to correctly handle internal operations (DDL, backup/restore, import) while preserving resource group information.
    • Enhanced request source tracking for better resource management in NextGen mode.
  • Tests

    • Added test coverage for resource control bypass scenarios and request source handling.

JmPotato and others added 7 commits December 9, 2025 14:55
Signed-off-by: JmPotato <github@ipotato.me>
…s for DDL source types

Bypass WorkloadLearning internal requests from resource control, and
replace hardcoded "add_index"/"merge_temp_index" strings with named
constants for consistency.

Signed-off-by: Yuhao Zhang <yhzhang00@outlook.com>
Signed-off-by: Yuhao Zhang <yhzhang00@outlook.com>
Signed-off-by: Yuhao Zhang <yhzhang00@outlook.com>
Add a `tikv_client_go_resource_control_ru_total` counter that records
RRU and WRU consumed per request, labeled by resource group, request
source, and RU type. This fills an observability gap where client-go
computes RU but never exposes it — all existing RU metrics live on the
PD server side after aggregation, with no per-source breakdown.

Signed-off-by: Yuhao Zhang <yhzhang00@outlook.com>
…terceptor

Add a new bypassed_request_total counter to track requests that bypass
resource control. Update getResourceControlInfo to return request info
for bypassed requests so the interceptor can count them.

Signed-off-by: Yuhao Zhang <yhzhang00@outlook.com>
@ti-chi-bot ti-chi-bot bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. dco-signoff: yes Indicates the PR's author has signed the dco. labels Mar 31, 2026

ti-chi-bot bot commented Mar 31, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign myonkeminta for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.



coderabbitai bot commented Mar 31, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 289cf8dc-4935-4a3d-8e99-64d50ffca317

📥 Commits

Reviewing files that changed from the base of the PR and between e791ac6 and 993380a.

📒 Files selected for processing (4)
  • internal/client/client_interceptor.go
  • internal/client/client_interceptor_test.go
  • internal/resourcecontrol/resource_control.go
  • internal/resourcecontrol/resource_control_test.go

Walkthrough

This PR enhances resource control bypass detection by adding request source tracking to RequestInfo and extending bypass logic to detect internal operations. It modifies getResourceControlInfo to return resource group name and request info even when bypassed, enabling downstream callers to maintain contextual information while skipping resource control.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Core Bypass Logic: internal/client/client_interceptor.go | Modified getResourceControlInfo to return the computed resourceGroupName and non-nil reqInfo when bypass is true, instead of empty group name and nil request info. |
| Request Source Tracking & Bypass Detection: internal/resourcecontrol/resource_control.go | Added requestSource field to RequestInfo struct with accessor method. Extended shouldBypass logic to detect internal operations (BR, ImportInto, DDL index operations, workload learning) when NextGen mode is enabled by checking request source substrings in addition to existing analyze-request checks. |
| Internal Request Source Constants: util/request_source.go | Added five new exported constants for internal transaction types: InternalTxnBR, InternalImportInto, InternalTxnAddIndex, InternalTxnMergeTempIndex, and InternalTxnWorkloadLearning. |
| Bypass Logic Tests: internal/client/client_interceptor_test.go | Added testResourceControlInterceptor stub implementation and new test TestGetResourceControlInfoBypassesResourceControl verifying that resource group name and non-nil request info are returned on bypass with nil interceptor. |
| Request Source & Bypass Tests: internal/resourcecontrol/resource_control_test.go | Updated TestMakeRequestInfo to verify RequestSource() accessor and added new table-driven test TestMakeRequestInfoBypassCases covering bypass behavior across multiple internal request sources with conditional NextGen mode checks. |
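The changed return contract of getResourceControlInfo can be pictured with the simplified sketch below. All types and names here (`requestInfo`, `interceptor`, `resourceControlInfo`, `classify`) are hypothetical stand-ins rather than the real client-go signatures, and returning a nil interceptor on bypass is an assumption based on the test description in the table.

```go
package main

import "fmt"

// requestInfo is a hypothetical stand-in for resourcecontrol.RequestInfo.
type requestInfo struct {
	requestSource string
	bypass        bool
}

func (r *requestInfo) Bypass() bool          { return r.bypass }
func (r *requestInfo) RequestSource() string { return r.requestSource }

// interceptor is a hypothetical stand-in for the resource-control interceptor.
type interceptor interface{ OnRequest(group string) }

type noopInterceptor struct{}

func (noopInterceptor) OnRequest(string) {}

// resourceControlInfo keeps the group name and request info on bypass while
// dropping the interceptor, so callers skip RU accounting but keep context.
func resourceControlInfo(group string, ic interceptor, info *requestInfo) (string, interceptor, *requestInfo) {
	if group == "" || info == nil {
		return "", nil, nil
	}
	if info.Bypass() {
		return group, nil, info
	}
	return group, ic, info
}

// classify shows how a caller might branch on the three return values.
func classify(group string, ic interceptor, info *requestInfo) string {
	g, i, ri := resourceControlInfo(group, ic, info)
	switch {
	case i != nil:
		return "charged:" + g
	case ri != nil && ri.Bypass():
		return "bypassed:" + g + ":" + ri.RequestSource()
	default:
		return "untracked"
	}
}

func main() {
	user := &requestInfo{requestSource: "leader_external_Select"}
	ddl := &requestInfo{requestSource: "internal_ddl_add_index", bypass: true}
	fmt.Println(classify("default", noopInterceptor{}, user))
	fmt.Println(classify("default", noopInterceptor{}, ddl))
}
```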

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • #1926: Uses MakeRequestInfo(req).Bypass() to gate RUv2 updates in client send paths, directly consuming the bypass information propagated by this PR's changes to RequestInfo and getResourceControlInfo.

Suggested labels

lgtm, approved

Suggested reviewers

  • nolouch
  • ekexium
  • glorv


🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 12.50% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'fix(resourcecontrol): bypass RU accounting for NextGen background activities' is specific and directly summarizes the main change: extending bypass logic for background operations. |


@ti-chi-bot ti-chi-bot bot added contribution This PR is from a community contributor. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Mar 31, 2026
@YuhaoZhang00 YuhaoZhang00 marked this pull request as ready for review March 31, 2026 03:00
@ti-chi-bot ti-chi-bot bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 31, 2026

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (3)
util/request_source.go (1)

44-53: Consider renaming InternalImportInto to InternalTxnImportInto for naming consistency.

The other constants in this block follow the InternalTxn* naming pattern (e.g., InternalTxnBR, InternalTxnAddIndex, InternalTxnWorkloadLearning), but InternalImportInto does not include the Txn infix. This inconsistency may cause confusion when developers search for or reference these constants.

🔧 Suggested rename
-	// InternalImportInto is the type of IMPORT INTO usage
-	InternalImportInto = "ImportInto"
+	// InternalTxnImportInto is the type of IMPORT INTO usage
+	InternalTxnImportInto = "ImportInto"

Note: If renamed, update the reference in internal/resourcecontrol/resource_control.go (bypassResourceSourceList) accordingly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@util/request_source.go` around lines 44 - 53, The constant name
InternalImportInto is inconsistent with the InternalTxn* pattern; rename
InternalImportInto to InternalTxnImportInto across the codebase (replace
occurrences of InternalImportInto with InternalTxnImportInto) and update any
references such as bypassResourceSourceList in
internal/resourcecontrol/resource_control.go to use InternalTxnImportInto;
ensure you update imports/usage sites and run tests to verify no remaining
references to InternalImportInto.
internal/client/client_interceptor.go (1)

158-170: Consider defining constants for RU type labels.

The string literals "rru" and "wru" are used directly. For consistency with other label constants in the metrics package and to avoid typos in future uses, consider defining these as constants.

🔧 Suggested constants (in metrics/metrics.go)
const (
	// ... existing constants ...
	LblRRU = "rru"
	LblWRU = "wru"
)

Then use:

-		metrics.TiKVResourceControlRUCounter.WithLabelValues(resourceGroupName, requestSource, "rru").Add(consumption.RRU)
+		metrics.TiKVResourceControlRUCounter.WithLabelValues(resourceGroupName, requestSource, metrics.LblRRU).Add(consumption.RRU)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/client/client_interceptor.go` around lines 158 - 170, The function
recordResourceControlMetrics uses hardcoded RU label strings ("rru", "wru");
define constants in the metrics package (e.g., LblRRU = "rru" and LblWRU = "wru"
in metrics/metrics.go) and replace the literals in recordResourceControlMetrics
(and any other usages) to use metrics.LblRRU and metrics.LblWRU when calling
metrics.TiKVResourceControlRUCounter.WithLabelValues to ensure consistency and
avoid typos.
internal/resourcecontrol/resource_control.go (1)

98-103: Consider edge cases with substring matching.

The strings.Contains approach could theoretically cause false positives if a future request source contains a bypass source as a substring (e.g., a hypothetical "no_add_index" would match "add_index"). Given the current naming conventions in TiDB, this seems unlikely, but worth noting for future reference.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/resourcecontrol/resource_control.go` around lines 98 - 103, The
current substring check using strings.Contains(requestSource, source) can
produce false positives; update the loop in resource_control.go
(bypassResourceSourceList and requestSource) to use a safer comparison such as
exact equality (requestSource == source) or, if requestSource can contain
multiple tokens, split requestSource into tokens and check membership, or use
boundaries (strings.HasPrefix/HasSuffix with delimiters) so only whole-source
matches bypass; implement one of these approaches in the loop to replace
strings.Contains.
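One of the options mentioned in this nitpick, delimiter-bounded matching, might look like the sketch below. The function name `containsSourceToken` and the choice of `_` as the delimiter are assumptions; the `internal_library_scan` source is invented purely to show a false positive that plain substring matching on `br` would produce.

```go
package main

import (
	"fmt"
	"strings"
)

// containsSourceToken reports whether source occurs in requestSource as a
// whole underscore-delimited run: each side must be the string boundary or
// an underscore.
func containsSourceToken(requestSource, source string) bool {
	if source == "" {
		return false
	}
	for start := 0; ; {
		i := strings.Index(requestSource[start:], source)
		if i < 0 {
			return false
		}
		begin := start + i
		end := begin + len(source)
		leftOK := begin == 0 || requestSource[begin-1] == '_'
		rightOK := end == len(requestSource) || requestSource[end] == '_'
		if leftOK && rightOK {
			return true
		}
		// Keep scanning for a later, properly bounded occurrence.
		start = begin + 1
	}
}

func main() {
	fmt.Println(containsSourceToken("leader_internal_br", "br"))
	fmt.Println(containsSourceToken("internal_library_scan", "br"))
	fmt.Println(containsSourceToken("leader_internal_ddl_add_index_ddl", "add_index"))
}
```

Because underscores both separate the parts of a request source and appear inside names such as `add_index`, this check still accepts the hypothetical `no_add_index`. Ruling that out would need exact matching against a parsed source type rather than a string scan.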

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9424dd89-6885-425f-95f8-0aec17877926

📥 Commits

Reviewing files that changed from the base of the PR and between 3805cb7 and e791ac6.

📒 Files selected for processing (4)
  • internal/client/client_interceptor.go
  • internal/resourcecontrol/resource_control.go
  • metrics/metrics.go
  • util/request_source.go

}
recordResourceControlMetrics(resourceGroupName, req.GetRequestSource(), consumption)
} else if reqInfo != nil && reqInfo.Bypass() {
metrics.TiKVResourceControlBypassedCounter.WithLabelValues(resourceGroupName, req.GetRequestSource()).Inc()
Member

Avoid calling WithLabelValues on the hot path; the labeled counter should be cached in advance.
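One common way to act on this kind of comment is to resolve each label combination once and reuse the returned handle. The sketch below uses a small stand-in `counterVec` type rather than the Prometheus client so that it runs without dependencies; `cachedCounters`, `labelKey`, and the `lookups` field are hypothetical names used only to make the effect observable.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
	"sync/atomic"
)

// counter is a stand-in for a labeled counter child.
type counter struct{ n int64 }

func (c *counter) Inc()         { atomic.AddInt64(&c.n, 1) }
func (c *counter) Value() int64 { return atomic.LoadInt64(&c.n) }

// counterVec is a stand-in for a counter vector: every WithLabelValues call
// takes a lock and builds a key, which is the cost to keep off the hot path.
type counterVec struct {
	mu       sync.Mutex
	children map[string]*counter
	lookups  int // number of WithLabelValues calls, for demonstration only
}

func (v *counterVec) WithLabelValues(lvs ...string) *counter {
	v.mu.Lock()
	defer v.mu.Unlock()
	v.lookups++
	key := strings.Join(lvs, "\xff")
	c, ok := v.children[key]
	if !ok {
		c = &counter{}
		v.children[key] = c
	}
	return c
}

type labelKey struct{ group, source string }

// cachedCounters resolves each (group, source) pair once and reuses the handle.
type cachedCounters struct {
	vec   *counterVec
	cache sync.Map // labelKey -> *counter
}

func (c *cachedCounters) get(group, source string) *counter {
	k := labelKey{group, source}
	if v, ok := c.cache.Load(k); ok {
		return v.(*counter)
	}
	v, _ := c.cache.LoadOrStore(k, c.vec.WithLabelValues(group, source))
	return v.(*counter)
}

func main() {
	vec := &counterVec{children: make(map[string]*counter)}
	cc := &cachedCounters{vec: vec}
	for i := 0; i < 1000; i++ {
		cc.get("default", "internal_others").Inc()
	}
	fmt.Println("lookups:", vec.lookups)
	fmt.Println("count:", cc.get("default", "internal_others").Value())
}
```

The cache grows with the number of distinct label pairs, so this pattern fits best when resource groups and request sources form a small, stable set.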

return resp, err
})
} else if reqInfo != nil && reqInfo.Bypass() {
metrics.TiKVResourceControlBypassedCounter.WithLabelValues(resourceGroupName, req.GetRequestSource()).Inc()
Member

Ditto.

Comment on lines +165 to +168
metrics.TiKVResourceControlRUCounter.WithLabelValues(resourceGroupName, requestSource, "rru").Add(consumption.RRU)
}
if consumption.WRU > 0 {
metrics.TiKVResourceControlRUCounter.WithLabelValues(resourceGroupName, requestSource, "wru").Add(consumption.WRU)
Member

Ditto.

@YuhaoZhang00 YuhaoZhang00 marked this pull request as draft April 9, 2026 03:18
@ti-chi-bot ti-chi-bot bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 9, 2026
@YuhaoZhang00
Author

Remove the client-go local resource-control metrics introduced in earlier iterations
- Do not record RU-by-source metrics in client-go
- Do not record bypass metrics in client-go

…und activities

Signed-off-by: Yuhao Zhang <yhzhang00@outlook.com>
…und-activities

Signed-off-by: Yuhao Zhang <yhzhang00@outlook.com>
@YuhaoZhang00 YuhaoZhang00 marked this pull request as ready for review April 9, 2026 05:00
@ti-chi-bot ti-chi-bot bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 9, 2026
@YuhaoZhang00 YuhaoZhang00 requested a review from JmPotato April 9, 2026 05:00
Labels

contribution This PR is from a community contributor. dco-signoff: yes Indicates the PR's author has signed the dco. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
