Flash: add MPP sql digest and connection metadata tracing#10751
Flash: add MPP sql digest and connection metadata tracing#10751JaySon-Huang wants to merge 4 commits intopingcap:masterfrom
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
📝 WalkthroughWalkthroughAdded sql_digest and plan_digest fields to DAGContext, propagated those plus connection_id and connection_alias into MPPTaskStatistics, and updated DispatchMPPTask logging; also advanced the Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@dbms/src/Flash/Coprocessor/DAGContext.cpp`:
- Around line 107-108: The code incorrectly accesses the protobuf field as a
member (meta_.plan_digest) which doesn't exist; change the reference to use the
protobuf accessor method meta_.plan_digest() (consistent with the existing use
of meta_.sql_digest()) in the DAGContext constructor/initializer to fix the
compile error.
In `@dbms/src/Flash/Mpp/MPPTaskStatistics.cpp`:
- Line 117: The JSON-format string in MPPTaskStatistics.cpp embeds raw values
(connection_alias, sql_digest, plan_digest) directly into tracing logs; before
formatting those fields into the
R"(,"connection_id":{},"connection_alias":"{}","sql_digest":"{}","plan_digest":"{}")"
template (and the similar template at lines 129-132), run each string through a
JSON-escaping helper to sanitize characters like quotes, backslashes, and
control chars. Add or reuse a function such as escapeJsonString/escapeForJson
and apply it to connection_alias, sql_digest, and plan_digest in the code paths
that build the trace message (within the MPPTaskStatistics logging/trace
methods) so the formatted JSON is always escaped correctly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: fc49a427-f9df-4282-b797-9342b294a9b0
📒 Files selected for processing (6)
contrib/kvprotodbms/src/Flash/Coprocessor/DAGContext.cppdbms/src/Flash/Coprocessor/DAGContext.hdbms/src/Flash/FlashService.cppdbms/src/Flash/Mpp/MPPTaskStatistics.cppdbms/src/Flash/Mpp/MPPTaskStatistics.h
|
Review Complete Findings: 2 issues ℹ️ Learn more details on Pantheon AI. |
| /// don't use info log for initializing status since it does not contains too many information | ||
| status == INITIALIZING ? Poco::Message::PRIO_DEBUG : Poco::Message::PRIO_INFORMATION, | ||
| R"({{"query_tso":{},"task_id":{},"is_root":{},"sender_executor_id":"{}","executors":{},"host":"{}")" | ||
| R"(,"connection_id":{},"connection_alias":"{}","sql_digest":"{}","plan_digest":"{}")" |
There was a problem hiding this comment.
@windtalker I think only log it in
Handling mpp dispatch requestis enough?
This is useful to add sql_digest in the final report logging of MPPTaskStatistics.cpp. The MPPTaskStatistics.cpp logging contains rich information, such as how much RU was consumed, which executors were used, and how much data was scanned by the storage layer. We can filter logs containing MPPTaskStatistics.cpp and related information using Loki queries according to our needs.
For example, in the internal case of O11Y-2155585, we can filter the SQLs only with TopN executor that are executed on tiflash by Loki query like
{...", container=~"serverlog.*"} |= `MPPTaskStatistics` |~ `TopN`
and tidb log by Loki query like
{... container=~".*tidb"} |= `local_mpp_coordinator.go` |~ `TopN`
If there are
[2026/03/05 03:19:40.409 +00:00] [INFO] [MPPTaskStatistics.cpp:139] ["{\"query_tso\":464697630464938873,\"task_id\":1,\"is_root\":true,\"sender_executor_id\":\"ExchangeSender_53\",\"executors\":[{\"id\":\"ExchangeSender_53\",\"type\":\"ExchangeSender\",\"children\":[\"TopN_52\"],\"outbound_rows\":500,\"outbound_blocks\":1,\"outbound_bytes\":44960,\"outbound_allocated_bytes\":53248,\"concurrency\":32,\"execution_time_ns\":89997426,\"partition_num\":1,\"sender_target_task_ids\":[-1],\"exchange_type\":\"PassThrough\",\"connection_details\":[{\"tunnel_id\":\"tunnel1+-1\",\"sender_target_task_id\":-1,\"sender_target_host\":\"172.26.26.37:33298\",\"is_local\":false,\"packets\":1,\"bytes\":43054}]},{\"id\":\"Selection_51\",\"type\":\"Selection\",\"children\":[\"TableFullScan_50\"],\"outbound_rows\":608139,\"outbound_blocks\":206,\"outbound_bytes\":52208714,\"outbound_allocated_bytes\":74575488,\"concurrency\":33,\"execution_time_ns\":79997710},{\"id\":\"TableFullScan_50\",\"type\":\"TableScan\",\"children\":[],\"outbound_rows\":1020425,\"outbound_blocks\":206,\"outbound_bytes\":76121302,\"outbound_allocated_bytes\":250056320,\"concurrency\":33,\"execution_time_ns\":79997710,\"connection_details\":[{\"is_local\":true,\"packets\":0,\"bytes\":75914204,\"max\":39.998855,\"min\":29.999141},{\"is_local\":false,\"packets\":1,\"bytes\":137933,\"max\":79.99771,\"min\":79.99771}],\"scan_details\":{\"build_bitmap_time\":\"174.074ms\",\"build_stream_time\":\"177.412ms\",\"create_snapshot_time\":\"0.089ms\",\"delta_bytes\":255338442,\"delta_rows\":243061,\"dmfile_data_scanned_rows\":1634016,\"dmfile_data_skipped_rows\":7913032,\"dmfile_lm_filter_scanned_rows\":1650083,\"dmfile_lm_filter_skipped_rows\":7896965,\"dmfile_mvcc_scanned_rows\":1650083,\"dmfile_mvcc_skipped_rows\":7896965,\"dmfile_read_time\":\"133.610ms\",\"late_materialization_skip_rows\":16067,\"learner_read_time\":\"1.632ms\",\"local_max_stream_cost_ms\":\"39.999ms\",\"local_min_stream_cost_ms\":\"29.999ms\",\"mvcc_input_bytes\":27660819,\"mvcc_input_rows\":1627107,\"mvcc_read_bytes\":30523857,\"mvcc_skip_rows\":515147,\"num_columns\":6,\"num_local_region\":39,\"num_read_tasks\":15,\"num_remote_region\":1,\"num_segments\":15,\"num_stale_read\":0,\"pushdown\":{\"rs_operator\":{\"children\":[{\"op\":\"unsupported\",\"reason\":\"ColumnRef with field type is not supported, sig=EQString field_type=15\"},{\"op\":\"unsupported\",\"reason\":\"ColumnRef with field type is not supported, sig=EQString field_type=15\"},{\"children\":[{\"op\":\"unsupported\",\"reason\":\"ColumnRef with field type is not supported, sig=NEString field_type=15\"},{\"op\":\"unsupported\",\"reason\":\"ColumnRef with field type is not supported, sig=NEString field_type=15\"}],\"op\":\"or\"},{\"col\":\"id\",\"op\":\"greater\",\"value\":\"?\"}],\"op\":\"and\"}},\"query_read_bytes\":106567779,\"read_mode\":\"Bitmap\",\"region_num_of_instance\":[{\"instance_id\":\"db-tiflash-0.db-tiflash-peer.tidb10518506906906971664.svc:3930\",\"region_num\":39}],\"remote_max_stream_cost_ms\":\"79.998ms\",\"remote_min_stream_cost_ms\":\"79.998ms\",\"rs_dmfile_read_with_all\":0,\"rs_pack_filter_all\":0,\"rs_pack_filter_all_null\":0,\"rs_pack_filter_check_time\":\"1.480ms\",\"rs_pack_filter_none\":1008,\"rs_pack_filter_some\":216}},{\"id\":\"TopN_52\",\"type\":\"TopN\",\"children\":[\"Selection_51\"],\"outbound_rows\":500,\"outbound_blocks\":1,\"outbound_bytes\":44960,\"outbound_allocated_bytes\":53248,\"concurrency\":32,\"execution_time_ns\":89997426}],\"host\":\"db-tiflash-0.db-tiflash-peer.tidb10518506906906971664.svc:3930\",\"task_init_timestamp\":1772680780318360000,\"task_start_timestamp\":1772680780321330000,\"task_end_timestamp\":1772680780409195000,\"compile_start_timestamp\":1772680780318604000,\"compile_end_timestamp\":1772680780321318000,\"read_wait_index_start_timestamp\":1772680780320297000,\"read_wait_index_end_timestamp\":1772680780320332000,\"local_input_bytes\":75914204,\"remote_input_bytes\":207098,\"output_bytes\":0,\"status\":\"FINISHED\",\"error_message\":\"\",\"cpu_ru\":13.333333333333334,\"read_ru\":2091.8523559570312,\"memory_peak\":198287670,\"extra_info\":{\"cpu_execute_time_ns\":39998858,\"cpu_pending_time_ns\":0,\"io_execute_time_ns\":0,\"io_pending_time_ns\":0,\"await_time_ns\":79997710,\"wait_for_notify_time_ns\":4209879560}}"] [source="mpp_task_tracing MPP<gather_id:1, query_ts:1772680780316129293, local_query_id:18885517, server_id:1490, start_ts:464697630464938873,task_id:1>"] [thread_id=8526]
[2026/03/05 03:19:40.317 +00:00] [INFO] [local_mpp_coordinator.go:222] ["Dispatch mpp task"] [timestamp=464697630464938873] [ID=1] [QueryTs=1772680780316129293] [LocalQueryId=18885517] [ServerID=1490] [address=db-tiflash-0.db-tiflash-peer.tidb10518506906906971664.svc:3930] [plan="Table(dh_account_basic)->Sel([eq(dh_app_250.dh_account_basic.site_code, 250) eq(dh_app_250.dh_account_basic.register_currency, BRL) or(ne(dh_app_250.dh_account_basic.mtpush_id, ), ne(dh_app_250.dh_account_basic.jg_web_id, ))])->TopN([dh_app_250.dh_account_basic.id],0,500)->Send(-1, )"] [mpp-version=2] [exchange-compression-mode=NONE] [GatherID=1] [resource_group=default]
[2026/03/05 03:19:40.317 +00:00] [INFO] [local_mpp_coordinator.go:222] ["Dispatch mpp task"] [timestamp=464697630464938873] [ID=2] [QueryTs=1772680780316129293] [LocalQueryId=18885517] [ServerID=1490] [address=db-tiflash-3.db-tiflash-peer.tidb10518506906906971664.svc:3930] [plan="Table(dh_account_basic)->Sel([eq(dh_app_250.dh_account_basic.site_code, 250) eq(dh_app_250.dh_account_basic.register_currency, BRL) or(ne(dh_app_250.dh_account_basic.mtpush_id, ), ne(dh_app_250.dh_account_basic.jg_web_id, ))])->TopN([dh_app_250.dh_account_basic.id],0,500)->Send(-1, )"] [mpp-version=2] [exchange-compression-mode=NONE] [GatherID=1] [resource_group=default]
[2026/03/05 03:19:40.317 +00:00] [INFO] [local_mpp_coordinator.go:222] ["Dispatch mpp task"] [timestamp=464697630464938873] [ID=3] [QueryTs=1772680780316129293] [LocalQueryId=18885517] [ServerID=1490] [address=db-tiflash-1.db-tiflash-peer.tidb10518506906906971664.svc:3930] [plan="Table(dh_account_basic)->Sel([eq(dh_app_250.dh_account_basic.site_code, 250) eq(dh_app_250.dh_account_basic.register_currency, BRL) or(ne(dh_app_250.dh_account_basic.mtpush_id, ), ne(dh_app_250.dh_account_basic.jg_web_id, ))])->TopN([dh_app_250.dh_account_basic.id],0,500)->Send(-1, )"] [mpp-version=2] [exchange-compression-mode=NONE] [GatherID=1] [resource_group=default]
[2026/03/05 03:19:40.317 +00:00] [INFO] [local_mpp_coordinator.go:222] ["Dispatch mpp task"] [timestamp=464697630464938873] [ID=4] [QueryTs=1772680780316129293] [LocalQueryId=18885517] [ServerID=1490] [address=db-tiflash-2.db-tiflash-peer.tidb10518506906906971664.svc:3930] [plan="Table(dh_account_basic)->Sel([eq(dh_app_250.dh_account_basic.site_code, 250) eq(dh_app_250.dh_account_basic.register_currency, BRL) or(ne(dh_app_250.dh_account_basic.mtpush_id, ), ne(dh_app_250.dh_account_basic.jg_web_id, ))])->TopN([dh_app_250.dh_account_basic.id],0,500)->Send(-1, )"] [mpp-version=2] [exchange-compression-mode=NONE] [GatherID=1] [resource_group=default]
There was a problem hiding this comment.
If we log down the sql_digest in MPPTaskStatistics.cpp, then we can
- use loki filter as follow and directly collect the sql_digest from the returned rows
|= `MPPTaskStatistics` |~ `TopN`
If we only log down the sql_digest in Handling mpp dispatch request, then we have to
- filter logging with "MPPTaskStatistics.cpp" and "TopN", then collect start_ts
- filter the logging with "Handling mpp dispatch request" and the collected start_ts, then collect the "sql_digest" values.
Signed-off-by: JaySon-Huang <tshent@qq.com>
c7a8731 to
33cdede
Compare
Signed-off-by: JaySon-Huang <tshent@qq.com>
33cdede to
3a815d2
Compare
|
[FORMAT CHECKER NOTIFICATION] Notice: To remove the 📖 For more info, you can check the "Contribute Code" section in the development guide. |
There was a problem hiding this comment.
♻️ Duplicate comments (1)
dbms/src/Flash/Mpp/MPPTaskStatistics.cpp (1)
117-117:⚠️ Potential issue | 🟠 MajorEscape string fields before embedding them into JSON tracing logs.
Line 117 and Line 130-Line 132 still inject raw string values into JSON. Unescaped quotes/backslashes/control chars (especially in
connection_alias) can break JSON and enable log-injection style corruption.Also applies to: 129-132
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@dbms/src/Flash/Mpp/MPPTaskStatistics.cpp` at line 117, The JSON trace format string currently injects raw string fields (see the R"(,"connection_id":{},"connection_alias":"{}","sql_digest":"{}","plan_digest":"{}")" literal in MPPTaskStatistics.cpp) which can break JSON or permit log injection; escape/encode connection_alias, sql_digest and plan_digest before formatting into this JSON (e.g. call a JSON-string-escaping utility like escapeJSONString or toJsonString for those values inside the function/method that builds the trace string in class MPPTaskStatistics/related logging routine) so quotes, backslashes and control characters are properly backslash-escaped and then use the escaped values when populating the format string.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@dbms/src/Flash/Mpp/MPPTaskStatistics.cpp`:
- Line 117: The JSON trace format string currently injects raw string fields
(see the
R"(,"connection_id":{},"connection_alias":"{}","sql_digest":"{}","plan_digest":"{}")"
literal in MPPTaskStatistics.cpp) which can break JSON or permit log injection;
escape/encode connection_alias, sql_digest and plan_digest before formatting
into this JSON (e.g. call a JSON-string-escaping utility like escapeJSONString
or toJsonString for those values inside the function/method that builds the
trace string in class MPPTaskStatistics/related logging routine) so quotes,
backslashes and control characters are properly backslash-escaped and then use
the escaped values when populating the format string.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 3bc2a7db-9645-4e69-bd5b-eca034d3f7d9
📒 Files selected for processing (6)
contrib/kvprotodbms/src/Flash/Coprocessor/DAGContext.cppdbms/src/Flash/Coprocessor/DAGContext.hdbms/src/Flash/FlashService.cppdbms/src/Flash/Mpp/MPPTaskStatistics.cppdbms/src/Flash/Mpp/MPPTaskStatistics.h
🚧 Files skipped from review as they are similar to previous changes (3)
- contrib/kvproto
- dbms/src/Flash/Mpp/MPPTaskStatistics.h
- dbms/src/Flash/Coprocessor/DAGContext.cpp
| resource_group, | ||
| task_meta.connection_id(), | ||
| task_meta.connection_alias()); | ||
| task_meta.connection_alias(), |
There was a problem hiding this comment.
[P0] Unbounded sql_digest accepted from gRPC can cause OOM/disk-fill DoS
Why: sql_digest is accepted from gRPC without any length or format validation and is logged at INFO level before security checks (checkGrpcContext at line 504). With gRPC max receive size set to unlimited (SetMaxReceiveMessageSize(-1) in FlashGrpcServerHolder.cpp:172), a single malicious request can force very large allocations and log writes, potentially OOM-crashing the TiFlash process or rapidly consuming disk space.
Evidence:
- Untrusted input logged before security check:
FlashService.cpp:488-503 - Copied into
DAGContextwithout validation:DAGContext.cpp:107(sql_digest(meta_.sql_digest())) - Unlimited gRPC receive size:
FlashGrpcServerHolder.cpp:172 - Default bind on all interfaces:
0.0.0.0:3930
Fix: Enforce max size (e.g., 128 bytes) and expected pattern (fixed hex digest length) on sql_digest before storing/logging. Truncate or hash if oversized.
| @@ -110,6 +114,7 @@ void MPPTaskStatistics::logTracingJson() | |||
| /// don't use info log for initializing status since it does not contains too many information | |||
| status == INITIALIZING ? Poco::Message::PRIO_DEBUG : Poco::Message::PRIO_INFORMATION, | |||
| R"({{"query_tso":{},"task_id":{},"is_root":{},"sender_executor_id":"{}","executors":{},"host":"{}")" | |||
There was a problem hiding this comment.
[P2] JSON tracing log builds JSON without escaping user-controlled fields
Why: logTracingJson() constructs JSON using fmt::format and interpolates connection_alias and sql_digest as raw strings (\"{}\"). If these fields contain quotes, backslashes, or control characters, the emitted JSON becomes invalid, breaking downstream log parsers and ETL pipelines that expect valid JSON.
Evidence:
- Line 116:
R\"(,\"connection_id\":{},\"connection_alias\":\"{}\",\"sql_digest\":\"{}\")\"with raw string interpolation - Fields sourced from protobuf without sanitization:
DAGContext.cpp:84-107
Fix: JSON-escape connection_alias, sql_digest, and other string fields (e.g., error_message, status) before interpolation, or use a proper JSON serialization library instead of manual string formatting.
What problem does this PR solve?
Issue Number: ref pingcap/tidb#66762
rely on pingcap/kvproto#1428
Problem Summary:
What is changed and how it works?
TaskMeta.sql_digestintoDAGContextand expose it via getter.MPPTaskStatistics::logTracingJson) to printsql_digest,connection_id, andconnection_alias.FlashService::DispatchMPPTaskhandling log withsql_digest.contrib/kvprotosubmodule pointer to include matched proto changes.Check List
Tests
Side effects
Documentation
Release note
Summary by CodeRabbit