Storage: Fix TableScan performance regression under wide-sparse table (#10379) #10385

ti-chi-bot · 2025-08-29T09:01:12Z

This is an automated cherry-pick of #10379

What problem does this PR solve?

Issue Number: close #10361

Problem Summary:

What is changed and how it works?

Storage: Fix TableScan performance regression under wide-sparse table
* Use merged_file_info.size as the buffer size when reading data (mark, min-max index, col-data) from a merged file, to minimize read amplification
* Use merged_file_info.size as the buffer size when parsing data as ChecksumReadBuffer, to minimize memory allocation overhead
* Introduce class `MinMaxIndexLoader` to tidy up the code that reads the min-max index
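
To make the first bullet concrete, here is a rough back-of-the-envelope sketch of why the buffer size matters for small sub-files inside a merged file. It is not TiFlash code: the helper name `read_amplification`, the byte sizes, and the simplified model (a reader that always fills its whole buffer on the first read) are all assumptions for illustration.

```bash
#!/bin/bash
# Hypothetical model, not taken from the PR: a reader that fills its whole
# buffer fetches max(buffer, subfile) bytes in order to deliver subfile bytes.
# Usage: read_amplification BUFFER_BYTES SUBFILE_BYTES
read_amplification() {
  awk -v buf="$1" -v size="$2" 'BEGIN {
    fetched = (buf > size) ? buf : size
    printf "%.1f\n", fetched / size
  }'
}

# Illustrative sizes only.
read_amplification 1048576 4096   # fixed 1 MiB buffer, 4 KiB sub-file
read_amplification 4096 4096      # buffer sized to the sub-file
```

Under this model a buffer sized to the sub-file fetches nothing beyond the bytes that are needed, which is the effect the first bullet aims for. A wide table has many columns, each with its own small sub-files, so the overhead of an oversized buffer would be paid many times per scan.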

test logging output: test.log.zip

Tested on an x86_64 machine with 8 cores and 32 GB of memory (m7i.2xlarge), using a gp3 EBS volume with 16000 IOPS and 625 MB/s throughput (amd_rockylinux9).
The tables below compare the uncompressed read/write throughput measured with dttool bench.

Before this PR:

The write throughput of the v3 DMFile format is more than 60% higher than that of the v2 format.
The read throughput drops by about 8% to 28% with the v3 DMFile format, and the drop grows with the sparse ratio, so wide-sparse tables are hit hardest.

sparse_ratio	v2 write throughput	v2 read throughput	v3 write throughput	v3 read throughput
0	132.409	1814.432	220.047 (+66.2%)	1671.053 (-7.9%)
0.05	132.893	1695.434	216.352 (+62.8%)	1486.356 (-12.3%)
0.1	129.537	1599.987	212.952 (+64.4%)	1375.033 (-14.1%)
0.5	130.932	1318.756	211.084 (+61.2%)	1120.210 (-15.1%)
0.8	137.630	1497.458	227.778 (+65.5%)	1216.922 (-18.7%)
0.9	149.608	1737.411	245.560 (+64.1%)	1361.528 (-21.6%)
0.99	162.110	2160.127	295.130 (+82.1%)	1564.444 (-27.6%)
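
The percentages in parentheses appear to be the change relative to the v2 value in the same row, and recomputing a few of them from the table is a quick sanity check. A minimal sketch, assuming `awk` is available; the helper name `pct_change` is made up for this example.

```bash
#!/bin/bash
# pct_change BASELINE VALUE
# Prints the signed relative change of VALUE against BASELINE, in percent.
pct_change() {
  awk -v base="$1" -v val="$2" \
    'BEGIN { printf "%+.1f%%\n", (val - base) / base * 100 }'
}

pct_change 132.409 220.047     # sparse_ratio 0, write throughput: +66.2%
pct_change 1814.432 1671.053   # sparse_ratio 0, read throughput:  -7.9%
```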

After this PR:

The write throughput of the v3 DMFile format is still more than 60% higher than that of the v2 format.
The read throughput difference is no longer significant: the regression stays below 7%.

sparse_ratio	v2 write throughput	v2 read throughput	v3 write throughput	v3 read throughput
0	131.757	1771.513	221.029 (+67.8%)	1839.603 (+3.8%)
0.05	129.977	1683.676	220.039 (+69.3%)	1691.280 (+0.5%)
0.1	130.816	1580.936	211.728 (+61.9%)	1559.970 (-1.3%)
0.5	130.103	1337.525	211.439 (+62.5%)	1292.864 (-3.3%)
0.8	140.769	1479.410	227.881 (+61.9%)	1386.370 (-6.3%)
0.9	146.884	1664.910	244.082 (+66.2%)	1603.719 (-3.7%)
0.99	161.374	2098.286	291.528 (+80.7%)	2116.858 (+0.9%)
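
The "below 7%" bound can be checked directly against the read columns of the table above. The sketch below scans the rows and reports the one with the lowest relative change of v3 against v2. The helper name `worst_read_change` is hypothetical, and the input is the three relevant columns copied from the table.

```bash
#!/bin/bash
# worst_read_change: reads "sparse_ratio v2_read v3_read" rows on stdin and
# prints the sparse ratio with the lowest relative change of v3 versus v2.
worst_read_change() {
  awk '{
    change = ($3 - $2) / $2 * 100
    if (NR == 1 || change < worst) { worst = change; ratio = $1 }
  } END { printf "%s %+.1f%%\n", ratio, worst }'
}

# Read throughput after this PR (sparse_ratio, v2, v3), copied from the table.
worst_read_change <<'EOF'
0    1771.513 1839.603
0.05 1683.676 1691.280
0.1  1580.936 1559.970
0.5  1337.525 1292.864
0.8  1479.410 1386.370
0.9  1664.910 1603.719
0.99 2098.286 2116.858
EOF
# prints: 0.8 -6.3%
```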

Compare the read/write throughput of v3 DMFile format before and after this PR:

The write throughput is almost unchanged (within 2%).
The read throughput increases by 10% to 35%, with the largest gain on the sparsest table (sparse_ratio 0.99).

sparse_ratio	v3 (before) write throughput	v3 (before) read throughput	v3 (after) write throughput	v3 (after) read throughput
0	220.047	1671.053	221.029 (+0.4%)	1839.603 (+10.1%)
0.05	216.352	1486.356	220.039 (+1.7%)	1691.280 (+13.8%)
0.1	212.952	1375.033	211.728 (-0.6%)	1559.970 (+13.4%)
0.5	211.084	1120.210	211.439 (+0.2%)	1292.864 (+15.4%)
0.8	227.778	1216.922	227.881 (+0.0%)	1386.370 (+13.9%)
0.9	245.560	1361.528	244.082 (-0.6%)	1603.719 (+17.8%)
0.99	295.130	1564.444	291.528 (-1.2%)	2116.858 (+35.3%)

Check List

Tests

Unit test
Integration test
Manual test (add detailed scripts or steps below)

#!/bin/bash

# Define arrays for sparse-ratio and version
sparse_ratios=(0 0.05 0.1 0.5 0.8 0.9 0.99)
#sparse_ratios=(0 0.1 0.9 0.99)
versions=(2 3)

# Define the base command and fixed arguments
base_cmd="beforefix/tiflash/tiflash"
sub_cmd="dttool bench"
fixed_args="--columns 600 --rows 131000 --field 12 --write-repeat 5 --repeat 20 --random 2021268696"

# Iterate through all combinations of sparse-ratio and version
for version in "${versions[@]}"; do
  for sparse_ratio in "${sparse_ratios[@]}"; do
    # Define a unique workdir for each run
    workdir="./tmp_v${version}_sr${sparse_ratio}"

    # Construct the full command
    cmd="${base_cmd} ${sub_cmd} ${fixed_args} --sparse-ratio ${sparse_ratio} --workdir ${workdir} --version ${version}"

    # Print the command being executed (optional, for logging/debugging)
    echo "Executing: $cmd"

    # Execute the command
    $cmd

    # Check if the command executed successfully
    if [ $? -ne 0 ]; then
      echo "Error: Command failed for version ${version}, sparse-ratio ${sparse_ratio}"
      # Optional: Add 'exit 1' here to stop the script on failure
      # exit 1
    fi

    echo "----------------------------------------"
  done
done

echo "All benchmarks completed."
echo "=================================="

base_cmd="afterfix/tiflash/tiflash"

# Iterate through all combinations of sparse-ratio and version
for version in "${versions[@]}"; do
  for sparse_ratio in "${sparse_ratios[@]}"; do
    # Define a unique workdir for each run
    workdir="./tmp_v${version}_sr${sparse_ratio}"

    # Construct the full command
    cmd="${base_cmd} ${sub_cmd} ${fixed_args} --sparse-ratio ${sparse_ratio} --workdir ${workdir} --version ${version}"

    # Print the command being executed (optional, for logging/debugging)
    echo "Executing: $cmd"

    # Execute the command
    $cmd

    # Check if the command executed successfully
    if [ $? -ne 0 ]; then
      echo "Error: Command failed for version ${version}, sparse-ratio ${sparse_ratio}"
      # Optional: Add 'exit 1' here to stop the script on failure
      # exit 1
    fi

    echo "----------------------------------------"
  done
done

echo "All benchmarks completed."

No code

Side effects

Performance regression: Consumes more CPU
Performance regression: Consumes more Memory
Breaking backward compatibility

Documentation

Release note

Fix a bug that causes a TableScan performance regression on wide-sparse tables

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>

ti-chi-bot · 2025-08-29T09:01:16Z

This cherry pick PR is for a release branch and has not yet been approved by triage owners.
Adding the do-not-merge/cherry-pick-not-approved label.

To merge this cherry pick:

It must first be approved by the approvers.
After the approvers have approved it, please wait for the cherry-pick merging approval from the triage owners.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

ti-chi-bot · 2025-08-29T09:01:16Z

@JaySon-Huang This PR has conflicts, so I have put it on hold.
Please resolve them or ask others to resolve them, then comment /unhold to remove the hold label.

ti-chi-bot · 2025-08-29T09:01:18Z

@ti-chi-bot: If you want to know how to resolve it, please read the guide in TiDB Dev Guide.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

ti-chi-bot · 2025-08-29T09:01:19Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign lidezhu for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

ti-chi-bot · 2025-08-29T09:24:50Z

@ti-chi-bot: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
pull-integration-test	`40edfd0`	link	true	`/test pull-integration-test`
pull-unit-test	`40edfd0`	link	true	`/test pull-unit-test`


This is an automated cherry-pick of pingcap#10379

40edfd0

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>

ti-chi-bot mentioned this pull request Aug 29, 2025

Storage: Fix TableScan performance regression under wide-sparse table #10379

Merged

12 tasks

ti-chi-bot bot added the do-not-merge/cherry-pick-not-approved label Aug 29, 2025

ti-chi-bot assigned JaySon-Huang Aug 29, 2025
