chore: gather more stat4 samples #26

Merged
tantaman merged 1 commit into main from mlaw/more-samples
May 4, 2026
@tantaman tantaman commented May 4, 2026

stat4 gathers sample values and sample cardinalities for each index.

This helps the SQLite planner choose better plans.

Problems can still arise, though. One such case exists in terabugs: the `issueLabel` -> `issue` join.

`SELECT * FROM issueLabel WHERE label_id = xx`

If `xx` is not present in stat4, SQLite falls back to an average. The average is roughly the number of rows not covered by any sample divided by the number of distinct values not covered by any sample.

If the data is heavily skewed, this average can be far off. In terabugs we see it off by 10x in some cases.

This PR gathers more stat4 samples to get better averages for the case where a sample is missing.
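
To make the effect of skew concrete, here is a rough arithmetic sketch in Python. It is not SQLite's code: it assumes the fallback is approximately (rows not covered by samples) / (distinct values not covered by samples), and it assumes the samples happen to be the most frequent values. The row counts and the name `fallback_rows_per_value` are invented for illustration and are not taken from terabugs.

```python
def fallback_rows_per_value(row_counts, sampled_keys):
    """Approximate rows-per-value estimate for a key that has no sample.

    row_counts:   dict mapping each distinct key to its true row count.
    sampled_keys: keys assumed to be present as samples (an assumption;
                  the real sampler picks its own set of keys).
    """
    sampled = set(sampled_keys)
    total_rows = sum(row_counts.values())
    sampled_rows = sum(row_counts[k] for k in sampled)
    unsampled_distinct = len(row_counts) - len(sampled)
    if unsampled_distinct <= 0 or sampled_rows >= total_rows:
        return 1  # nothing left to average over
    return max(1, (total_rows - sampled_rows) // unsampled_distinct)


# Hypothetical skewed distribution of rows per label_id:
# 2 very common labels, 8 moderately common ones, 990 rare ones.
label_rows = {}
for label_id in range(1000):
    if label_id < 2:
        label_rows[label_id] = 100_000
    elif label_id < 10:
        label_rows[label_id] = 5_000
    else:
        label_rows[label_id] = 10

# Only the 2 largest labels sampled: the 8 medium labels inflate the average.
few_samples = fallback_rows_per_value(label_rows, range(2))

# The 10 largest labels sampled: the average now reflects only rare labels.
more_samples = fallback_rows_per_value(label_rows, range(10))

print(few_samples, more_samples)  # prints: 50 10
```

In this made-up distribution a rare label really has 10 rows. With two samples the fallback estimates 50 rows (5x too high); with ten samples it estimates 10. More samples pull the heavy values out of the average, which is the effect this PR is after.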
@tantaman tantaman merged commit d04ee2a into main May 4, 2026
53 of 54 checks passed
@tantaman tantaman deleted the mlaw/more-samples branch May 4, 2026 17:35