fix(resourcecontrol): bypass RU accounting for NextGen background activities#1930
fix(resourcecontrol): bypass RU accounting for NextGen background activities#1930YuhaoZhang00 wants to merge 10 commits intotikv:masterfrom
Conversation
Signed-off-by: JmPotato <github@ipotato.me>
…s for DDL source types Bypass WorkloadLearning internal requests from resource control, and replace hardcoded "add_index"/"merge_temp_index" strings with named constants for consistency. Signed-off-by: Yuhao Zhang <yhzhang00@outlook.com>
Signed-off-by: Yuhao Zhang <yhzhang00@outlook.com>
Signed-off-by: Yuhao Zhang <yhzhang00@outlook.com>
Add a `tikv_client_go_resource_control_ru_total` counter that records RRU and WRU consumed per request, labeled by resource group, request source, and RU type. This fills an observability gap where client-go computes RU but never exposes it — all existing RU metrics live on the PD server side after aggregation, with no per-source breakdown. Signed-off-by: Yuhao Zhang <yhzhang00@outlook.com>
…terceptor Add a new bypassed_request_total counter to track requests that bypass resource control. Update getResourceControlInfo to return request info for bypassed requests so the interceptor can count them. Signed-off-by: Yuhao Zhang <yhzhang00@outlook.com>
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (4)
📝 WalkthroughWalkthroughThis PR enhances resource control bypass detection by adding request source tracking to RequestInfo and extending bypass logic to detect internal operations. It modifies getResourceControlInfo to return resource group name and request info even when bypassed, enabling downstream callers to maintain contextual information while skipping resource control. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (3)
util/request_source.go (1)
44-53: Consider renamingInternalImportIntotoInternalTxnImportIntofor naming consistency.The other constants in this block follow the
InternalTxn*naming pattern (e.g.,InternalTxnBR,InternalTxnAddIndex,InternalTxnWorkloadLearning), butInternalImportIntodoes not include theTxninfix. This inconsistency may cause confusion when developers search for or reference these constants.🔧 Suggested rename
- // InternalImportInto is the type of IMPORT INTO usage - InternalImportInto = "ImportInto" + // InternalTxnImportInto is the type of IMPORT INTO usage + InternalTxnImportInto = "ImportInto"Note: If renamed, update the reference in
internal/resourcecontrol/resource_control.go(bypassResourceSourceList) accordingly.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@util/request_source.go` around lines 44 - 53, The constant name InternalImportInto is inconsistent with the InternalTxn* pattern; rename InternalImportInto to InternalTxnImportInto across the codebase (replace occurrences of InternalImportInto with InternalTxnImportInto) and update any references such as bypassResourceSourceList in internal/resourcecontrol/resource_control.go to use InternalTxnImportInto; ensure you update imports/usage sites and run tests to verify no remaining references to InternalImportInto.internal/client/client_interceptor.go (1)
158-170: Consider defining constants for RU type labels.The string literals
"rru"and"wru"are used directly. For consistency with other label constants in the metrics package and to avoid typos in future uses, consider defining these as constants.🔧 Suggested constants (in metrics/metrics.go)
const ( // ... existing constants ... LblRRU = "rru" LblWRU = "wru" )Then use:
- metrics.TiKVResourceControlRUCounter.WithLabelValues(resourceGroupName, requestSource, "rru").Add(consumption.RRU) + metrics.TiKVResourceControlRUCounter.WithLabelValues(resourceGroupName, requestSource, metrics.LblRRU).Add(consumption.RRU)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/client/client_interceptor.go` around lines 158 - 170, The function recordResourceControlMetrics uses hardcoded RU label strings ("rru", "wru"); define constants in the metrics package (e.g., LblRRU = "rru" and LblWRU = "wru" in metrics/metrics.go) and replace the literals in recordResourceControlMetrics (and any other usages) to use metrics.LblRRU and metrics.LblWRU when calling metrics.TiKVResourceControlRUCounter.WithLabelValues to ensure consistency and avoid typos.internal/resourcecontrol/resource_control.go (1)
98-103: Consider edge cases with substring matching.The
strings.Containsapproach could theoretically cause false positives if a future request source contains a bypass source as a substring (e.g., a hypothetical "no_add_index" would match "add_index"). Given the current naming conventions in TiDB, this seems unlikely, but worth noting for future reference.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/resourcecontrol/resource_control.go` around lines 98 - 103, The current substring check using strings.Contains(requestSource, source) can produce false positives; update the loop in resource_control.go (bypassResourceSourceList and requestSource) to use a safer comparison such as exact equality (requestSource == source) or, if requestSource can contain multiple tokens, split requestSource into tokens and check membership, or use boundaries (strings.HasPrefix/HasSuffix with delimiters) so only whole-source matches bypass; implement one of these approaches in the loop to replace strings.Contains.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@internal/client/client_interceptor.go`:
- Around line 158-170: The function recordResourceControlMetrics uses hardcoded
RU label strings ("rru", "wru"); define constants in the metrics package (e.g.,
LblRRU = "rru" and LblWRU = "wru" in metrics/metrics.go) and replace the
literals in recordResourceControlMetrics (and any other usages) to use
metrics.LblRRU and metrics.LblWRU when calling
metrics.TiKVResourceControlRUCounter.WithLabelValues to ensure consistency and
avoid typos.
In `@internal/resourcecontrol/resource_control.go`:
- Around line 98-103: The current substring check using
strings.Contains(requestSource, source) can produce false positives; update the
loop in resource_control.go (bypassResourceSourceList and requestSource) to use
a safer comparison such as exact equality (requestSource == source) or, if
requestSource can contain multiple tokens, split requestSource into tokens and
check membership, or use boundaries (strings.HasPrefix/HasSuffix with
delimiters) so only whole-source matches bypass; implement one of these
approaches in the loop to replace strings.Contains.
In `@util/request_source.go`:
- Around line 44-53: The constant name InternalImportInto is inconsistent with
the InternalTxn* pattern; rename InternalImportInto to InternalTxnImportInto
across the codebase (replace occurrences of InternalImportInto with
InternalTxnImportInto) and update any references such as
bypassResourceSourceList in internal/resourcecontrol/resource_control.go to use
InternalTxnImportInto; ensure you update imports/usage sites and run tests to
verify no remaining references to InternalImportInto.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 9424dd89-6885-425f-95f8-0aec17877926
📒 Files selected for processing (4)
internal/client/client_interceptor.gointernal/resourcecontrol/resource_control.gometrics/metrics.goutil/request_source.go
| } | ||
| recordResourceControlMetrics(resourceGroupName, req.GetRequestSource(), consumption) | ||
| } else if reqInfo != nil && reqInfo.Bypass() { | ||
| metrics.TiKVResourceControlBypassedCounter.WithLabelValues(resourceGroupName, req.GetRequestSource()).Inc() |
There was a problem hiding this comment.
Avoid introducing WithLabelValues op on the hot path, should cache it in advance.
| return resp, err | ||
| }) | ||
| } else if reqInfo != nil && reqInfo.Bypass() { | ||
| metrics.TiKVResourceControlBypassedCounter.WithLabelValues(resourceGroupName, req.GetRequestSource()).Inc() |
| metrics.TiKVResourceControlRUCounter.WithLabelValues(resourceGroupName, requestSource, "rru").Add(consumption.RRU) | ||
| } | ||
| if consumption.WRU > 0 { | ||
| metrics.TiKVResourceControlRUCounter.WithLabelValues(resourceGroupName, requestSource, "wru").Add(consumption.WRU) |
|
Remove the client-go local resource-control metrics introduced in earlier iterations |
…und activities Signed-off-by: Yuhao Zhang <yhzhang00@outlook.com>
…und-activities Signed-off-by: Yuhao Zhang <yhzhang00@outlook.com>
Ref pingcap/tidb#64339.
Supersedes #1809.
Supersedes #1929.
Summary
Add RU bypass logic for NextGen background activities
shouldBypass()to skip RU accounting for heavy internal operations under thenextgenbuild tag: DDL index backfill (add_index,merge_temp_index), BR (br), IMPORT INTO (ImportInto), and workload learning (WorkloadLearning).Plumb
RequestSourceintoresourcecontrol.RequestInfoRequestSource()to the local request info object built inclient-gofor metric collection purposeRelative PR:
Tests
Manually verified on a local NextGen cluster (NextGen PD + CSE TiKV + TiDB
NEXT_GEN=1):Test Summary
SELECT)add_indexALTER TABLE t1 ADD INDEXon 4M rowsstats(cop tp=104)ANALYZE TABLE t1on 4M rowsstats(non-analyze)internal_othersTest Detail
Data:
bypass_test.t1— 4M rows, ~370 MBObservation: Before and after each activity, snapshot metrics:
A. Baseline (before any test)
bypassed_request_total:ru_total:B.
add_index— DDL index backfillSQL:
ALTER TABLE t1 ADD INDEX idx_col2 (col2);(4M rows)bypassed_request_totalAfter DDL add_index:bypassed_request_totaldelta:internal_ddl_add_indexleader_internal_ddl_add_indexleader_internal_ddl_add_index_ddlru_total: unchanged from baseline (no entries containing"add_index"appeared).C.
stats+ cop type 104 — ANALYZE TABLESQL:
ANALYZE TABLE t1;(4M rows)bypassed_request_totalAfter ANALYZE TABLE:bypassed_request_totaldelta:leader_internal_statsru_total(stats-related delta) After ANALYZE TABLE:ru_totaldelta:internal_statsleader_internal_statsThis is correct. The bypass only fires for cop requests with
tp == 104(analyze). Other stats operations (metadata reads, histogram writes) correctly consume RU.D.
internal_others— low-resource internal opsNot triggered by a specific test — these accumulate from background system operations.
bypassed_request_totalat end of session:internal_othersleader_internal_othersretry_leader_leader_internal_othersru_total: no entries containing"internal_others"appeared.Control: Normal user queries
SQL:
SELECT COUNT(*),SELECT AVG(), point readsru_totalleader_external_Selectrrubypassed_request_totalfor external sourcesUser queries are unaffected by bypass logic.
Summary by CodeRabbit
Bug Fixes
Tests