[fix](be) Preserve nested paths for lazy rowid fetch #64242

Open
mrhhsg wants to merge 2 commits into apache:master from mrhhsg:fix/topn-lazy-nested-rowid-fetch

@mrhhsg mrhhsg commented Jun 8, 2026

Member

[fix](be) Preserve nested paths for lazy rowid fetch

What problem does this PR solve?

Issue Number: None

Problem Summary: TopN lazy materialization can fetch nested-pruned columns by row id after nested-column pruning. The probe must keep the relation output slot metadata; otherwise the slot remapped to the TopN output ExprId can lose nested access paths. Then FE may consider a nested-pruned lazy slot safe for row-store fetch, while BE row-store fetch maps values only by column unique id and cannot apply sub-column or nested access paths. This patch preserves relation slot access paths through lazy materialization, passes slot access paths to the normal storage rowid fetch path, rejects row-store fetch only for lazy slots that actually carry sub-column paths or nested access paths, and keeps struct iterators readable when child fields are pruned.
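The row-store eligibility rule in the summary above can be sketched as follows. This is a minimal illustration, not the patch itself: `LazySlot`, `carries_nested_metadata`, and `can_use_row_store` are hypothetical names, and the real check lives in the FE (Java) planner rather than in C++.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a lazily materialized slot.
// The field names are illustrative and are not Doris's real classes.
struct LazySlot {
    int column_unique_id;
    std::vector<std::string> sub_column_path;      // non-empty for a sub-column slot
    std::vector<std::string> nested_access_paths;  // non-empty after nested pruning
};

// A slot carries nested metadata when either list is non-empty. Row-store
// fetch locates values only by column unique id, so it cannot honor these.
inline bool carries_nested_metadata(const LazySlot& slot) {
    return !slot.sub_column_path.empty() || !slot.nested_access_paths.empty();
}

// Row-store fetch stays eligible only if no lazy slot carries nested metadata.
inline bool can_use_row_store(const std::vector<LazySlot>& lazy_slots) {
    for (const LazySlot& slot : lazy_slots) {
        if (carries_nested_metadata(slot)) {
            return false;
        }
    }
    return true;
}
```

Under this sketch, plain scalar lazy slots remain eligible for row-store fetch, and a single nested-pruned slot sends the whole fetch to the column-store path.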

Release note

None

Check List (For Author)

  • Test:
    • Build: ~/.codex/skills/doris-local-regression/scripts/doris-local-regression.sh --network 10.26.20.3/24 build
    • Build: ./build.sh --fe
    • Unit Test: ./run-fe-ut.sh --run org.apache.doris.nereids.glue.translator.PhysicalPlanTranslatorTest,org.apache.doris.nereids.processor.post.materialize.MaterializeProbeVisitorTest
    • Regression test: ~/.codex/skills/doris-local-regression/scripts/doris-local-regression.sh --network 10.26.20.3/24 run -d nereids_rules_p0/column_pruning -s topn_lazy_nested_column_pruning
    • Format: build-support/clang-format.sh be/src/exec/rowid_fetcher.cpp
    • Check: git diff --check
  • Behavior changed: No
  • Does this need documentation: No

@hello-stephen

Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

mrhhsg commented Jun 8, 2026
Member Author

/review

@github-actions github-actions Bot left a comment

Contributor

Review result: no blocking issues found.

Critical checkpoint conclusions:

  • Goal and proof: The PR preserves nested access paths for TopN lazy rowid fetch, disables row-store lazy fetch for complex or nested-pruned slots, and adds FE unit plus regression coverage for the affected path.
  • Scope: The changes are focused on lazy materialization slot propagation, row-store eligibility, and BE rowid storage reads.
  • Concurrency: No new shared mutable state, thread entry, lock ordering, or lifecycle/concurrency-sensitive path was introduced.
  • Lifecycle/static initialization: Only namespace-scope helper functions were added; no new static object lifecycle issue found.
  • Configuration: No new config item was added.
  • Compatibility: Protobuf slot descriptors already carry access paths; this change consumes them in the rowid fetch path without introducing new protocol fields.
  • Parallel paths: Both PMultiGetRequest and PMultiGetRequestV2/internal rowid fetch paths are covered; row-store fallback is guarded on both FE and BE sides.
  • Special checks: Complex/nested-pruned row-store rejection has a clear storage-layout reason and matching tests.
  • Tests: Added FE unit assertions and a regression test with ordered output; I did not rerun tests in this review runner.
  • Observability: Existing errors/logging are sufficient for this narrow read-path fix.
  • Transaction/persistence/data writes: No transaction, persistence, or data write behavior is changed.
  • FE-BE variable passing: Existing SlotDescriptor access-path serialization/deserialization is used; rowid fetch now passes those options to segment iterators.
  • Performance: Row-store is conservatively disabled only where it cannot return the required nested/pruned layout; no obvious avoidable hot-path regression found.

User focus response: No additional user-provided review focus was present.

Residual risk: Coverage is strongest for internal rowid fetch with nested STRUCT/MAP/ARRAY pruning; external table lazy materialization paths were checked for obvious incompatibility but not executed here.

mrhhsg commented Jun 8, 2026
Member Author

run buildall

@hello-stephen

Contributor

BE UT Coverage Report

Increment line coverage 0.00% (0/61) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.90% (21134/39211)
Line Coverage 37.61% (201213/534954)
Region Coverage 33.64% (157814/469173)
Branch Coverage 34.68% (69079/199182)

@hello-stephen

Contributor

BE Regression && UT Coverage Report

Increment line coverage 72.13% (44/61) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.97% (27570/38306)
Line Coverage 55.47% (295063/531908)
Region Coverage 52.04% (245327/471377)
Branch Coverage 53.32% (106350/199465)

@hello-stephen

Contributor

FE Regression Coverage Report

Increment line coverage 55.56% (10/18) 🎉
Increment coverage report
Complete coverage report

@hello-stephen

Contributor
TPC-H: Total hot run time: 29446 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 751a4be168135037195c9270a35891f4290b5f50, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17619	4132	3993	3993
q2	q3	10867	1365	808	808
q4	4692	502	354	354
q5	7520	887	591	591
q6	183	174	138	138
q7	784	847	653	653
q8	9344	1619	1637	1619
q9	5933	4535	4489	4489
q10	6765	1792	1539	1539
q11	439	276	258	258
q12	633	439	296	296
q13	18094	3394	2789	2789
q14	275	263	242	242
q15	q16	828	804	711	711
q17	1008	938	959	938
q18	7044	5957	5605	5605
q19	1323	1270	1123	1123
q20	529	421	273	273
q21	6347	2815	2706	2706
q22	473	367	321	321
Total cold run time: 100700 ms
Total hot run time: 29446 ms
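
The Round 1 totals above are consistent with reading each row as one cold run, two hot runs, and the smaller of the two hot runs. That reading is inferred from the numbers and is not stated by the benchmark bot. A minimal sketch of the arithmetic, with hypothetical names (`QueryTiming`, `best_hot_ms`, `total_cold_ms`, `total_hot_ms`):

```cpp
#include <algorithm>
#include <vector>

// One benchmark row: a cold run followed by two hot runs, all in milliseconds.
struct QueryTiming {
    long cold_ms;
    long hot1_ms;
    long hot2_ms;
};

// Best hot time for a query: the smaller of the two hot runs.
inline long best_hot_ms(const QueryTiming& t) {
    return std::min(t.hot1_ms, t.hot2_ms);
}

// Sum of the cold-run column.
inline long total_cold_ms(const std::vector<QueryTiming>& rows) {
    long total = 0;
    for (const QueryTiming& t : rows) total += t.cold_ms;
    return total;
}

// Sum of the per-query best hot times.
inline long total_hot_ms(const std::vector<QueryTiming>& rows) {
    long total = 0;
    for (const QueryTiming& t : rows) total += best_hot_ms(t);
    return total;
}
```

Applied to all Round 1 rows, the cold column sums to 100700 ms and the best-hot column sums to 29446 ms, matching the reported totals.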

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	5232	4810	4744	4744
q2	q3	4903	5406	4709	4709
q4	2126	2186	1408	1408
q5	4809	4910	4623	4623
q6	233	175	127	127
q7	1992	1750	1592	1592
q8	2393	2099	2145	2099
q9	7944	7509	7432	7432
q10	4780	4693	4171	4171
q11	537	380	351	351
q12	752	738	533	533
q13	3040	3361	2780	2780
q14	271	287	258	258
q15	q16	681	697	607	607
q17	1277	1267	1271	1267
q18	7200	6742	6958	6742
q19	1133	1114	1110	1110
q20	2216	2211	1950	1950
q21	5329	4598	4479	4479
q22	539	460	406	406
Total cold run time: 57387 ms
Total hot run time: 51388 ms

@hello-stephen

Contributor
TPC-DS: Total hot run time: 170485 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 751a4be168135037195c9270a35891f4290b5f50, data reload: false

query5	4326	645	480	480
query6	452	215	189	189
query7	4841	587	312	312
query8	371	218	199	199
query9	8743	4067	4061	4061
query10	458	319	265	265
query11	5925	2376	2201	2201
query12	159	107	100	100
query13	1331	634	429	429
query14	6412	5430	5053	5053
query14_1	4424	4432	4405	4405
query15	214	202	180	180
query16	1015	464	457	457
query17	1138	725	597	597
query18	2549	482	360	360
query19	212	190	149	149
query20	115	111	107	107
query21	214	150	119	119
query22	13614	13556	13505	13505
query23	17317	16459	16082	16082
query23_1	16318	16246	16268	16246
query24	7493	1784	1321	1321
query24_1	1346	1288	1334	1288
query25	595	481	413	413
query26	1309	327	169	169
query27	2633	569	358	358
query28	4421	2077	2032	2032
query29	1107	636	505	505
query30	316	239	198	198
query31	1135	1075	961	961
query32	103	62	63	62
query33	536	365	242	242
query34	1187	1116	641	641
query35	764	783	690	690
query36	1413	1380	1217	1217
query37	155	107	93	93
query38	3208	3125	3062	3062
query39	942	913	887	887
query39_1	885	855	874	855
query40	220	122	101	101
query41	64	63	62	62
query42	95	94	94	94
query43	334	322	284	284
query44	
query45	195	191	183	183
query46	1123	1246	745	745
query47	2358	2397	2248	2248
query48	400	406	283	283
query49	622	465	347	347
query50	1020	339	274	274
query51	4463	4320	4284	4284
query52	88	88	75	75
query53	246	272	194	194
query54	278	227	203	203
query55	83	76	70	70
query56	237	223	210	210
query57	1428	1406	1315	1315
query58	241	220	212	212
query59	1612	1665	1435	1435
query60	287	248	235	235
query61	157	152	153	152
query62	696	646	588	588
query63	235	201	192	192
query64	2583	831	665	665
query65	
query66	1799	466	341	341
query67	29841	29621	29632	29621
query68	
query69	436	310	275	275
query70	994	906	980	906
query71	311	209	218	209
query72	3073	2811	2419	2419
query73	839	762	461	461
query74	5143	4957	4763	4763
query75	2664	2590	2235	2235
query76	2312	1194	822	822
query77	361	383	297	297
query78	12418	12463	11815	11815
query79	1401	1048	778	778
query80	579	463	392	392
query81	459	286	243	243
query82	587	156	123	123
query83	359	277	250	250
query84	
query85	892	540	438	438
query86	397	310	291	291
query87	3389	3377	3200	3200
query88	3709	2787	2729	2729
query89	422	394	335	335
query90	1992	193	180	180
query91	180	158	142	142
query92	72	65	59	59
query93	1554	1457	890	890
query94	542	359	304	304
query95	693	479	347	347
query96	1088	799	344	344
query97	2725	2691	2550	2550
query98	212	207	205	205
query99	1154	1181	1045	1045
Total cold run time: 251608 ms
Total hot run time: 170485 ms

@mrhhsg mrhhsg force-pushed the fix/topn-lazy-nested-rowid-fetch branch from 751a4be to a61638c Compare June 8, 2026 23:12
mrhhsg commented Jun 8, 2026
Member Author

/review

@github-actions github-actions Bot left a comment

Contributor

Review summary: no blocking issues found in the actual GitHub PR diff.

Critical checkpoint conclusions:

  • Goal and proof: The change preserves nested access paths for TopN lazy rowid fetch, disables row-store fetch where BE cannot apply nested/complex projections, and adds FE/unit plus regression coverage for nested lazy materialization.
  • Scope/focus: The actual PR diff is small and focused on lazy materialization rowid fetch, access-path propagation, row-store eligibility, and struct iterator pruning.
  • Concurrency/lifecycle: No new shared mutable lifecycle or concurrency paths were introduced; existing segment/iterator ownership is preserved.
  • Configuration/compatibility: No new config or incompatible storage/protocol format change was introduced.
  • Parallel paths: Both old PMultiGetRequest and V2 block-desc rowid fetch paths are guarded for row-store fallback and pass slot access paths into storage reads.
  • Special checks: The row-store disqualification for nested access paths and complex types is consistent with BE row-store limitations.
  • Tests: Added unit coverage for relation-slot access paths and row-store eligibility, plus a regression query that exercises sparse nested STRUCT/MAP/ARRAY lazy fetch with ordered output.
  • Observability: No additional observability appears necessary for this narrowly scoped correctness fix.
  • Transaction/persistence/data-write concerns: Not applicable; this is a read-path/materialization change.
  • Performance: Disabling row-store only for unsupported nested/complex cases is appropriate; simple scalar row-store lazy fetch remains eligible.

User focus: no additional user-provided review focus was supplied.

Comment thread be/src/exec/rowid_fetcher.cpp Outdated
}
slot->to_protobuf(mget_req.add_slots());
}
bool fetch_row_store = _fetch_option.t_fetch_opt.fetch_row_store;

@HappenLee HappenLee Jun 9, 2026

Contributor

Could this be changed directly in FE instead? When sub-column pruning applies, FE would skip row store, so BE would not need to make this check.

Comment thread be/src/exec/rowid_fetcher.cpp Outdated
}

static ReaderType rowid_fetch_reader_type(const SlotDescriptor& slot) {
return has_nested_access_paths(slot) || has_complex_type(slot) ? ReaderType::READER_ALTER_TABLE

Contributor

Why use ReaderType::READER_ALTER_TABLE here?

Comment thread be/src/exec/rowid_fetcher.cpp Outdated
if (!slot.all_access_paths().empty()) {
storage_read_options.all_access_paths[unique_id] = slot.all_access_paths();
} else {
storage_read_options.all_access_paths.erase(unique_id);

Contributor

Why is erase needed here? Is there anywhere else that inserts an entry into this map?
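
The excerpt in this thread either assigns the slot's access paths or erases the map entry for that column. The sketch below shows one possible reading of that pattern. It assumes the options object is reused across slots or requests, which this thread does not confirm, and `ReadOptions` and `apply_slot_access_paths` are hypothetical names.

```cpp
#include <map>
#include <string>
#include <vector>

using AccessPaths = std::vector<std::string>;

// Hypothetical stand-in for the storage read options in the excerpt.
struct ReadOptions {
    std::map<int, AccessPaths> all_access_paths;  // keyed by column unique id
};

// Make the entry for `unique_id` reflect exactly this slot's paths.
// The erase branch only has an effect when `options` already holds an
// entry for the same unique id from an earlier call.
inline void apply_slot_access_paths(ReadOptions& options, int unique_id,
                                    const AccessPaths& slot_paths) {
    if (!slot_paths.empty()) {
        options.all_access_paths[unique_id] = slot_paths;
    } else {
        options.all_access_paths.erase(unique_id);
    }
}
```

If the options object were always freshly constructed per slot, the erase branch would be a no-op, which is the point the reviewer's question raises.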

@mrhhsg mrhhsg force-pushed the fix/topn-lazy-nested-rowid-fetch branch 3 times, most recently from da7118a to 767e993 Compare June 9, 2026 07:09
mrhhsg commented Jun 9, 2026
Member Author

/review

@github-actions github-actions Bot left a comment

Contributor

Automated review summary for PR 64242:

I reviewed the current diff and the existing inline review threads. I did not find an additional distinct blocking issue beyond the already-known comments in be/src/exec/rowid_fetcher.cpp.

Critical checkpoint conclusions:

  • Goal and coverage: The PR addresses nested-column lazy materialization rowid fetch by propagating access paths to BE column-store fetch and by preventing row-store lazy fetch for nested/complex slots. Added FE unit tests and a regression case cover the targeted nested pruning path.
  • Scope: The change is focused on lazy materialization, row-store eligibility, and rowid fetch access-path propagation.
  • Concurrency: No new shared mutable state, thread lifecycle, atomics, or lock ordering changes were introduced. Existing batched/external rowid fetch scheduling remains unchanged.
  • Lifecycle/static initialization: No new static objects with cross-translation-unit initialization dependencies were introduced.
  • Configuration: No new configuration item was added.
  • Compatibility: No storage format or thrift schema change was introduced; the BE protobuf request already carries slot access paths through existing fields.
  • Parallel paths: Both older PMultiGetRequest and batched PMultiGetRequestV2 Doris-format rowid fetch paths now set slot access paths before creating/reusing the column iterator. External-file fetch paths are not directly affected.
  • Conditional checks: The FE row-store gating is conservative for complex/nested lazy slots, matching the BE row-store limitation that it does not apply nested access paths.
  • Tests: The added tests cover FE row-store gating, relation-slot access-path preservation, and an end-to-end regression query for nested STRUCT/MAP/ARRAY lazy rowid fetch. I did not run the tests locally in this review session.
  • Test results: The regression output uses a deterministic ORDER BY in the query and the new expected rows match the inserted data and ordering.
  • Observability: No new observability appears necessary for this focused correctness fix; existing rowid fetch logging/profile counters remain in place.
  • Transaction/persistence/data write: The PR does not modify transaction processing, persistence, or write paths.
  • FE/BE variable passing: The existing slot access-path fields are propagated through the lazy materialize output slot and then through SlotDescriptor protobuf to BE.
  • Performance: The BE change adds small per-slot map assignments before rowid column iterator creation/reuse; no obvious hot-path regression beyond the necessary access-path setup was found.
  • User focus: No additional user-provided review focus was specified.

@mrhhsg mrhhsg force-pushed the fix/topn-lazy-nested-rowid-fetch branch from 767e993 to 1ceb1bb Compare June 9, 2026 07:24
mrhhsg commented Jun 9, 2026
Member Author

/review

mrhhsg commented Jun 9, 2026
Member Author

run buildall

@hello-stephen

Contributor
TPC-H: Total hot run time: 28864 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 72b6fc7d0d1cd2e2b214f8fb4f78ab2df2191e58, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17615	4067	4039	4039
q2	q3	10748	1434	802	802
q4	4693	482	337	337
q5	7599	873	603	603
q6	197	179	143	143
q7	802	849	663	663
q8	9823	1527	1519	1519
q9	6565	4484	4568	4484
q10	6866	1828	1565	1565
q11	437	277	257	257
q12	648	428	310	310
q13	18115	3474	2789	2789
q14	259	265	241	241
q15	q16	837	769	711	711
q17	1015	984	882	882
q18	6947	5806	5560	5560
q19	1338	1211	974	974
q20	516	411	264	264
q21	6013	2650	2413	2413
q22	445	360	308	308
Total cold run time: 101478 ms
Total hot run time: 28864 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4373	4294	4274	4274
q2	q3	4543	4972	4364	4364
q4	2097	2214	1407	1407
q5	4456	4304	4263	4263
q6	230	172	127	127
q7	1727	1911	1777	1777
q8	2565	2160	2100	2100
q9	8059	7963	7943	7943
q10	4817	4768	4288	4288
q11	612	433	385	385
q12	771	758	539	539
q13	3479	3642	2996	2996
q14	326	316	270	270
q15	q16	766	737	644	644
q17	1361	1337	1351	1337
q18	7983	7382	6965	6965
q19	1163	1113	1131	1113
q20	2239	2233	1942	1942
q21	5294	4580	4476	4476
q22	540	472	424	424
Total cold run time: 57401 ms
Total hot run time: 51634 ms

@hello-stephen

Contributor
TPC-DS: Total hot run time: 169157 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 72b6fc7d0d1cd2e2b214f8fb4f78ab2df2191e58, data reload: false

query5	4328	637	485	485
query6	434	207	179	179
query7	4838	605	305	305
query8	365	220	222	220
query9	8771	4041	4044	4041
query10	448	312	267	267
query11	5929	2343	2184	2184
query12	171	108	103	103
query13	1262	625	432	432
query14	6737	5415	5066	5066
query14_1	4379	4403	4361	4361
query15	210	199	179	179
query16	1070	472	438	438
query17	1151	743	622	622
query18	2742	493	359	359
query19	220	200	151	151
query20	115	112	120	112
query21	236	148	129	129
query22	13703	13620	13446	13446
query23	17367	16640	16214	16214
query23_1	16407	16311	16340	16311
query24	7695	1810	1306	1306
query24_1	1317	1311	1325	1311
query25	576	488	418	418
query26	1317	313	176	176
query27	2605	551	341	341
query28	4421	2082	2105	2082
query29	1115	639	518	518
query30	331	237	201	201
query31	1142	1097	949	949
query32	117	69	63	63
query33	539	345	275	275
query34	1171	1171	648	648
query35	755	808	672	672
query36	1441	1389	1247	1247
query37	153	110	94	94
query38	3240	3138	3044	3044
query39	915	931	907	907
query39_1	916	856	873	856
query40	239	123	104	104
query41	80	62	62	62
query42	96	94	94	94
query43	318	326	282	282
query44	
query45	205	185	179	179
query46	1087	1238	754	754
query47	2442	2404	2244	2244
query48	410	421	297	297
query49	625	474	360	360
query50	1053	358	259	259
query51	4379	4361	4268	4268
query52	89	88	79	79
query53	240	265	202	202
query54	274	216	206	206
query55	86	75	77	75
query56	263	255	214	214
query57	1471	1413	1336	1336
query58	237	208	215	208
query59	1579	1617	1413	1413
query60	286	255	234	234
query61	154	157	159	157
query62	705	646	597	597
query63	238	190	188	188
query64	2534	809	634	634
query65	
query66	1732	472	339	339
query67	29165	29680	29524	29524
query68	
query69	440	307	262	262
query70	988	955	932	932
query71	307	223	207	207
query72	3029	2763	2415	2415
query73	845	801	405	405
query74	5124	4929	4828	4828
query75	2659	2583	2246	2246
query76	2332	1130	759	759
query77	368	394	281	281
query78	12389	12574	11812	11812
query79	1435	1121	769	769
query80	1310	470	401	401
query81	536	282	256	256
query82	649	161	120	120
query83	329	274	250	250
query84	
query85	959	527	430	430
query86	439	300	281	281
query87	3405	3315	3163	3163
query88	3681	2788	2749	2749
query89	422	392	339	339
query90	1901	182	183	182
query91	176	157	143	143
query92	68	57	55	55
query93	1621	1482	819	819
query94	731	358	311	311
query95	681	492	339	339
query96	1092	790	380	380
query97	2743	2747	2595	2595
query98	209	204	204	204
query99	1160	1180	1059	1059
Total cold run time: 253122 ms
Total hot run time: 169157 ms

@hello-stephen

Contributor

BE Regression && UT Coverage Report

Increment line coverage 75.00% (15/20) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.83% (28276/38299)
Line Coverage 57.89% (308179/532323)
Region Coverage 54.65% (257703/471583)
Branch Coverage 56.08% (111966/199651)

@hello-stephen

Contributor

FE Regression Coverage Report

Increment line coverage 71.43% (20/28) 🎉
Increment coverage report
Complete coverage report

### What problem does this PR solve?

Issue Number: None

Problem Summary: TopN lazy materialization can fetch pruned complex columns by row id after nested-column pruning. The materialization tuple kept the pruned slot type but did not preserve the relation slot access paths, so BE could build full storage iterators and read full child layouts into pruned result columns. Row-store lazy fetch also cannot apply nested access paths, so FE now rejects row-store fetch for complex or nested-pruned lazy slots. This patch carries relation slot access paths into lazy materialization slots, passes slot access paths to storage rowid fetch, keeps rowid fetch on the normal query reader path, preserves FE's row-store fetch decision in the BE request and row-store decode path, and keeps struct iterators readable when only some child fields are pruned.

### Release note

None

### Check List (For Author)

- Test:
    - Build: ~/.codex/skills/doris-local-regression/scripts/doris-local-regression.sh --network 10.26.20.3/24 build
    - Unit Test: ./run-fe-ut.sh --run org.apache.doris.nereids.processor.post.materialize.MaterializeProbeVisitorTest
    - Unit Test: ./run-fe-ut.sh --run org.apache.doris.nereids.glue.translator.PhysicalPlanTranslatorTest#testCanUseRowStoreForLazySlots
    - Regression test: ~/.codex/skills/doris-local-regression/scripts/doris-local-regression.sh --network 10.26.20.3/24 run -d nereids_rules_p0/column_pruning -s topn_lazy_nested_column_pruning
    - Format: build-support/clang-format.sh be/src/exec/rowid_fetcher.cpp
    - Check: git diff --check
- Behavior changed: No
- Does this need documentation: No
@mrhhsg mrhhsg force-pushed the fix/topn-lazy-nested-rowid-fetch branch from 72b6fc7 to af79aaf Compare June 11, 2026 03:43
mrhhsg added a commit to mrhhsg/doris that referenced this pull request Jun 11, 2026
### What problem does this PR solve?

Issue Number: None

Related PR: apache#64242

Problem Summary: Lazy TopN materialization remaps a relation slot to the output slot ExprId when the source slot carries sub-column metadata. The previous remap kept subPath but dropped nested access paths, so row-store eligibility checks could not see nested pruning metadata. Preserve access paths in this local remap and keep row-store lazy fetch disabled only for slots that actually carry sub-column paths or nested access paths, instead of using a broad complex-type guard without an independent failure case.
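
The remap issue described above (subPath kept, access paths dropped) can be illustrated with a small sketch. It is written in C++ for consistency with the BE excerpts in this conversation, although the real remap is FE Java code. `Slot` and `remap_to_output` are hypothetical names.

```cpp
#include <string>
#include <vector>

// Hypothetical slot model; the real FE classes differ in detail.
struct Slot {
    int expr_id;
    std::vector<std::string> sub_path;
    std::vector<std::string> access_paths;
};

// Remap by copying the whole source slot and changing only the id, so that
// sub_path and access_paths are both carried over to the output slot.
inline Slot remap_to_output(const Slot& relation_slot, int output_expr_id) {
    Slot remapped = relation_slot;
    remapped.expr_id = output_expr_id;
    return remapped;
}
```

Copying the full slot and overriding one field is a design choice in this sketch: it avoids the failure mode where a remap rebuilds the slot field by field and silently omits one of them.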

### Release note

None

### Check List (For Author)

- Test: Regression test / Unit Test

    - ./build.sh --fe

    - ./run-fe-ut.sh --run org.apache.doris.nereids.glue.translator.PhysicalPlanTranslatorTest,org.apache.doris.nereids.processor.post.materialize.MaterializeProbeVisitorTest

    - ~/.codex/skills/doris-local-regression/scripts/doris-local-regression.sh --network 10.26.20.3/24 run -d nereids_rules_p0/column_pruning -s topn_lazy_nested_column_pruning

- Behavior changed: No

- Does this need documentation: No
HappenLee previously approved these changes Jun 11, 2026

@HappenLee HappenLee left a comment

Contributor

LGTM

@github-actions

Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions Bot added approved Indicates a PR has been approved by one committer. reviewed labels Jun 11, 2026
@github-actions

Contributor

PR approved by anyone and no changes requested.

### What problem does this PR solve?

Issue Number: None

Related PR: apache#64242

Problem Summary: Lazy TopN materialization needs nested access paths from the relation output slot to reach the BE rowid fetch SlotDescriptor, and row-store lazy fetch must be disabled when a lazily fetched source column has nested access paths. Preserve the relation output slot as MaterializeSource.baseSlot for lazy materialization output, and make PhysicalPlanTranslator.shouldUseRowStore check the lazy slots against the relation output before enabling row-store fetch.

### Release note

None

### Check List (For Author)

- Test: Unit Test / Regression test / Build
    - ./run-fe-ut.sh --run org.apache.doris.nereids.glue.translator.PhysicalPlanTranslatorTest,org.apache.doris.nereids.processor.post.materialize.MaterializeProbeVisitorTest
    - ~/.codex/skills/doris-local-regression/scripts/doris-local-regression.sh --network 10.26.20.3/24 run -d nereids_rules_p0/column_pruning -s topn_lazy_nested_column_pruning
    - ./build.sh --be --fe
    - git diff --check
- Behavior changed: No
- Does this need documentation: No
@mrhhsg mrhhsg force-pushed the fix/topn-lazy-nested-rowid-fetch branch from af79aaf to 7d7d152 Compare June 12, 2026 09:15
@github-actions github-actions Bot removed the approved Indicates a PR has been approved by one committer. label Jun 12, 2026
mrhhsg commented Jun 12, 2026
Member Author

/review

@github-actions github-actions Bot left a comment

Contributor

Review opinion: no blocking issues found in the final PR diff.

Critical checkpoint conclusions:

  • Goal and proof: The PR preserves nested access-path metadata through TopN lazy materialization and passes those paths into BE row-id column-store fetch. The added MaterializeProbeVisitorTest cases and the topn_lazy_nested_column_pruning regression cover the targeted relation-slot/access-path path and sparse nested STRUCT/MAP/ARRAY row-id fetch.
  • Scope: The final PR is focused on lazy materialization metadata propagation, row-store eligibility for nested-pruned lazy slots, one struct iterator pruning behavior, and matching tests.
  • Concurrency: No new shared mutable state, thread entry, atomics, dependency state, or lock ordering is introduced. Existing row-id RPC/task scheduling behavior is unchanged.
  • Lifecycle/static initialization: Only helper methods/functions are added; no new static object or cross-translation-unit initialization dependency was found.
  • Configuration: No new configuration item was added.
  • Compatibility: No thrift/protobuf/storage-format field is added. The change reuses existing SlotDescriptor access-path serialization and existing BE StorageReadOptions maps, so mixed-format compatibility risk is low.
  • Parallel paths: Both Doris-format row-id fetch paths are covered: the older PMultiGetRequest path and the V2/internal batched path both call set_slot_access_paths before iterator creation/reuse. Row-store lazy fetch is guarded in FE for relation outputs carrying nested access paths.
  • Special conditions: The FE nested-path check is reached only after canUseRowStoreForLazySlots validates slot shape/original columns. The struct iterator change keeps the parent readable while pruning only skipped children, matching the downstream iterator layout requirement.
  • Tests: I did not run the full test suite in this review runner. I verified the PR patch applies cleanly in reverse with whitespace errors enabled. The submitted PR reports FE unit/regression coverage, and the added regression query uses explicit ordering and drops the table before use.
  • Test results: The new expected regression rows match the inserted data and ORDER BY ABS(pk % 3), pk ordering.
  • Observability: No additional logs or metrics appear necessary for this narrow correctness fix; existing row-id fetch errors/statistics remain in place.
  • Transaction/persistence/data writes: Not applicable; this PR changes read-path planning/execution and tests only.
  • FE-BE variable passing: Access paths flow from relation output slots to materialized output slots, then through tuple slot descriptors into BE requests, where storage iterators consume them by normalized column unique id.
  • Performance: The added per-slot access-path map setup is small and tied to row-id iterator creation/reuse. Row-store is only disabled for nested-pruned lazy slots where it cannot apply access paths.
  • Other issues: The current BE UT (macOS) check fails before running tests because the job reports ERROR: The JAVA version is 25, it must be JDK-17; I did not see evidence in the failed log that this is caused by the PR code.
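
The struct iterator behavior mentioned in the checkpoints above (the parent stays readable while skipped children are pruned) might look roughly like the sketch below. This is an assumption about the general technique, not the actual BE iterator: `ChildReader`, `Cell`, and `read_struct_row` are hypothetical names, and real struct columns are not limited to integer children.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical child reader; real struct column iterators are more involved.
struct ChildReader {
    std::string name;
    bool pruned;              // true when no access path needs this child
    std::vector<int> values;  // stored values, one per row (unused if pruned)
};

// One output cell: `present` is false for a pruned child placeholder.
struct Cell {
    bool present;
    int value;
};

// Read one row of a struct. Pruned children produce a placeholder cell
// instead of being dropped, so the output keeps one position per child.
inline std::vector<Cell> read_struct_row(const std::vector<ChildReader>& children,
                                         std::size_t row) {
    std::vector<Cell> out;
    out.reserve(children.size());
    for (const ChildReader& child : children) {
        if (child.pruned) {
            out.push_back(Cell{false, 0});  // skip the read, keep the position
        } else {
            out.push_back(Cell{true, child.values.at(row)});
        }
    }
    return out;
}
```

Keeping one output position per child is what lets a downstream consumer that expects the full struct layout continue to index children by position.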

Existing review context: I read the existing inline threads in be/src/exec/rowid_fetcher.cpp and did not duplicate the already-known comments about BE-side row-store gating, the earlier READER_ALTER_TABLE approach, or stale access-path erasure.

User focus: No additional user-provided review focus was present.

mrhhsg commented Jun 12, 2026
Member Author

run buildall

@hello-stephen

Contributor
TPC-H: Total hot run time: 28887 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 7d7d152feae3fb22df5cdcadab89e5010aa0865a, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17625	4039	4007	4007
q2	q3	10768	1404	821	821
q4	4685	479	338	338
q5	7529	861	583	583
q6	182	169	133	133
q7	796	827	645	645
q8	9380	1647	1694	1647
q9	5726	4438	4474	4438
q10	6726	1800	1535	1535
q11	437	271	243	243
q12	631	423	291	291
q13	18228	3335	2783	2783
q14	263	260	235	235
q15	
q16	815	768	706	706
q17	931	967	988	967
q18	6749	5711	5508	5508
q19	1317	1325	1053	1053
q20	563	400	265	265
q21	5917	2578	2381	2381
q22	433	353	308	308
Total cold run time: 99701 ms
Total hot run time: 28887 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4330	4235	4233	4233
q2	
q3	4485	4990	4369	4369
q4	2089	2196	1382	1382
q5	4425	4281	4271	4271
q6	222	172	126	126
q7	1731	1603	1443	1443
q8	2933	2323	2178	2178
q9	8218	8119	8053	8053
q10	4823	4759	4232	4232
q11	568	419	394	394
q12	731	763	541	541
q13	3223	3635	3002	3002
q14	291	308	282	282
q15	
q16	709	759	664	664
q17	1350	1329	1329	1329
q18	8149	7237	7223	7223
q19	1174	1175	1132	1132
q20	2226	2205	1967	1967
q21	5283	4551	4383	4383
q22	516	443	395	395
Total cold run time: 57476 ms
Total hot run time: 51599 ms

@hello-stephen

Contributor
TPC-DS: Total hot run time: 167729 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 7d7d152feae3fb22df5cdcadab89e5010aa0865a, data reload: false

query5	4307	616	469	469
query6	437	191	175	175
query7	4891	552	311	311
query8	365	215	191	191
query9	8741	3977	3975	3975
query10	485	311	254	254
query11	5915	2342	2181	2181
query12	160	101	95	95
query13	1283	604	463	463
query14	6388	5353	5081	5081
query14_1	4391	4352	4352	4352
query15	204	193	173	173
query16	988	458	438	438
query17	944	718	570	570
query18	2476	476	349	349
query19	205	186	146	146
query20	116	117	106	106
query21	212	137	115	115
query22	13529	13568	13388	13388
query23	17246	16568	16238	16238
query23_1	16250	16353	16284	16284
query24	7591	1764	1301	1301
query24_1	1309	1321	1317	1317
query25	568	452	393	393
query26	1295	321	164	164
query27	2696	529	332	332
query28	4440	2049	2051	2049
query29	1074	615	485	485
query30	306	232	202	202
query31	1118	1074	947	947
query32	107	63	59	59
query33	521	321	272	272
query34	1174	1192	674	674
query35	756	822	666	666
query36	1410	1415	1214	1214
query37	147	98	88	88
query38	3209	3136	3037	3037
query39	914	922	885	885
query39_1	871	863	878	863
query40	218	118	95	95
query41	63	62	59	59
query42	94	94	94	94
query43	314	322	278	278
query44	
query45	186	182	177	177
query46	1037	1156	715	715
query47	2377	2369	2224	2224
query48	406	376	286	286
query49	621	461	352	352
query50	947	343	251	251
query51	4381	4265	4257	4257
query52	86	87	79	79
query53	254	276	182	182
query54	271	214	191	191
query55	80	73	68	68
query56	232	222	205	205
query57	1422	1403	1326	1326
query58	233	214	210	210
query59	1562	1633	1458	1458
query60	264	238	224	224
query61	161	154	143	143
query62	692	642	582	582
query63	230	182	185	182
query64	2497	766	597	597
query65	
query66	1758	447	347	347
query67	29760	29643	29503	29503
query68	
query69	422	299	258	258
query70	956	916	926	916
query71	289	215	204	204
query72	2806	2587	2329	2329
query73	833	768	434	434
query74	5109	4964	4779	4779
query75	2643	2542	2237	2237
query76	2315	1139	778	778
query77	351	379	284	284
query78	12440	12424	11663	11663
query79	1384	1040	742	742
query80	590	461	372	372
query81	456	278	244	244
query82	580	158	117	117
query83	351	260	246	246
query84	
query85	836	502	415	415
query86	367	294	282	282
query87	3348	3345	3162	3162
query88	3593	2735	2687	2687
query89	423	380	330	330
query90	1983	172	177	172
query91	171	153	130	130
query92	63	64	56	56
query93	1470	1589	849	849
query94	533	354	300	300
query95	689	461	334	334
query96	1129	838	365	365
query97	2691	2663	2596	2596
query98	209	204	196	196
query99	1141	1163	1029	1029
Total cold run time: 249727 ms
Total hot run time: 167729 ms

@hello-stephen

Contributor

FE UT Coverage Report

Increment line coverage 25.81% (8/31) 🎉
Increment coverage report
Complete coverage report

@hello-stephen

Contributor

BE UT Coverage Report

Increment line coverage 0.00% (0/20) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.91% (21117/39174)
Line Coverage 37.63% (201357/535110)
Region Coverage 33.66% (157907/469140)
Branch Coverage 34.71% (69144/199231)

@hello-stephen

Contributor

BE Regression && UT Coverage Report

Increment line coverage 75.00% (15/20) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.90% (28275/38261)
Line Coverage 57.90% (308044/531991)
Region Coverage 54.87% (258598/471307)
Branch Coverage 56.17% (112048/199494)

@hello-stephen

Contributor

FE Regression Coverage Report

Increment line coverage 1.31% (10/764) 🎉
Increment coverage report
Complete coverage report

@github-actions github-actions Bot added the approved Indicates a PR has been approved by one committer. label Jun 15, 2026
@github-actions

Contributor

PR approved by at least one committer and no changes requested.

@HappenLee HappenLee left a comment

Contributor

LGTM

Labels

approved Indicates a PR has been approved by one committer. dev/4.1.x reviewed
