stat4 gathers sample values and sample cardinalities for each index. This helps the SQLite planner choose better plans, but problems can still arise. One such case exists for terabugs: the issueLabel -> issue join. `SELECT * FROM issueLabel WHERE label_id = xx` If `xx` is not present in stat4, SQLite falls back to an average. The average is roughly the number of unsampled rows divided by the number of distinct unsampled values. If the data is heavily skewed, this average can be way off. In terabugs we see it off by 10x in some cases. This PR gathers more stats to get better averages for the case where a sample is missing.
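To make the skew problem concrete, here is a rough sketch in Python. It is not SQLite's code: it assumes the fallback is simply the unsampled rows divided by the number of distinct unsampled keys, and the function name `fallback_estimate`, the label names, and all row counts are invented for illustration.

```python
def fallback_estimate(rows_per_key, sampled_keys):
    """Average rows per key over the keys that are NOT in the sample set.

    Assumption: this approximates the planner's fallback for a key
    that has no stat4 sample.
    """
    unsampled = {k: n for k, n in rows_per_key.items() if k not in sampled_keys}
    if not unsampled:
        return 0
    return sum(unsampled.values()) // len(unsampled)


# Hypothetical skewed distribution of issueLabel rows per label_id.
rows_per_key = {f"hot{i}": 10_000 for i in range(5)}      # captured by samples
rows_per_key["warm"] = 5_000                               # large, but unsampled
rows_per_key.update({f"rare{i}": 5 for i in range(99)})   # small, unsampled

sampled = {f"hot{i}" for i in range(5)}

estimate = fallback_estimate(rows_per_key, sampled)
actual = rows_per_key["rare0"]
print(f"estimated rows: {estimate}, actual rows: {actual}")
```

In this made-up distribution a single large unsampled label pulls the average up, so the estimate for a typical rare label is about 10x too high, which is the kind of error described above.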