perf: batch ERB snippets into single RuboCop investigation per file #457
Draft
ryanquanz wants to merge 2 commits into
Instead of calling RuboCop's team.investigate() once per ERB tag (thousands of times per run), combine all valid-syntax snippets from each file into a single source and investigate once per file. Uses Ripper for fast syntax pre-filtering to reject obviously invalid snippets (~38%) before batch assembly. When the combined source has invalid syntax (rare, ~4% of files due to Ripper/RuboCop parser disagreements), falls back to per-snippet validation and retries the batch with only RuboCop-validated snippets. Includes binary search for mapping offense byte positions back to their originating ERB snippets, and correct offset handling for autocorrect in the batched path.
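
The offense-to-snippet mapping mentioned above can be illustrated with a minimal sketch. This is not the PR's `find_snippet_for_byte_pos`. The `Snippet` struct, its fields, the newline separator, and the sample offsets are all assumptions made for illustration. It only shows how a binary search over sorted byte ranges could locate the snippet that owns a position in the combined source and translate it back to an ERB offset.

```ruby
# Hypothetical record of where one ERB snippet landed in the combined source.
# batch_start/batch_end are byte offsets in the combined source (end exclusive);
# erb_offset is the assumed byte offset of the snippet's code in the ERB file.
Snippet = Struct.new(:code, :batch_start, :batch_end, :erb_offset)

# Returns the snippet whose [batch_start, batch_end) range contains byte_pos,
# or nil when the position falls on a separator or past the end.
def find_snippet(snippets, byte_pos)
  # Array#bsearch in find-minimum mode: first snippet ending after byte_pos.
  candidate = snippets.bsearch { |s| s.batch_end > byte_pos }
  candidate if candidate && candidate.batch_start <= byte_pos
end

# Lay out three snippets as if they were joined with "\n".
codes = ["foo(1)", "bar = 2", "baz"]
erb_offsets = [4, 30, 58] # made-up positions in the original template
cursor = 0
snippets = codes.zip(erb_offsets).map do |code, erb_offset|
  snippet = Snippet.new(code, cursor, cursor + code.bytesize, erb_offset)
  cursor = snippet.batch_end + 1 # skip the "\n" separator
  snippet
end

# Map an offense at byte 9 of the combined source back to the ERB file.
hit = find_snippet(snippets, 9)
erb_pos = hit.erb_offset + (9 - hit.batch_start)
puts "#{hit.code.inspect} at ERB byte #{erb_pos}"
```

Because the snippets are appended in order, their ranges are already sorted, so each lookup is O(log n) in the number of snippets rather than a linear scan per offense.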
Summary
Instead of calling `team.investigate()` once per ERB tag, combine all valid-syntax snippets from each file into a single source string and investigate once per file.

Depends on #454 (cache RuboCop team), which is cherry-picked into this branch.
Problem
The Rubocop linter previously called `team.investigate()` for each `<%= %>` / `<% %>` tag independently. Each `investigate()` call has significant fixed overhead (creating a Commissioner, running `on_new_investigation` for ~500 cops, walking the AST) that dominates for the small code snippets typical of ERB tags. A file with 20 ERB tags made 20 separate investigate calls.

Approach
- Use `Ripper.sexp` as a fast pre-filter to reject obviously invalid syntax (~38% of snippets are partial blocks like `if`, `end`, `else`)
- Run `team.investigate()` on the combined source

When the combined source has invalid syntax (~4% of files, due to rare Ripper/RuboCop parser disagreements), the linter falls back to per-snippet validation and retries the batch with only RuboCop-validated snippets. It falls back to per-snippet investigation only as a final resort.
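
The pre-filter and batch assembly steps can be sketched as follows. This is a minimal illustration, not the PR's `prepare_snippets`. The helper name `valid_ruby?`, the sample snippets, and the newline separator are assumptions. It relies only on the documented behavior that `Ripper.sexp` returns `nil` when the source does not parse.

```ruby
require "ripper"

# Ripper.sexp returns nil for source that fails to parse, which makes it
# usable as a cheap syntax check before handing anything to RuboCop.
def valid_ruby?(code)
  !Ripper.sexp(code).nil?
end

# Sample ERB tag contents: partial blocks such as `if`, `end`, and `else`
# are not valid Ruby on their own and get rejected.
snippets = ["user.name", "if admin?", "link_to 'Home', root_path", "end", "else"]

valid, rejected = snippets.partition { |code| valid_ruby?(code) }

# Join the surviving snippets into one source string (separator is an assumption).
combined = valid.join("\n")

puts "kept:     #{valid.inspect}"
puts "rejected: #{rejected.inspect}"
puts "combined parses: #{valid_ruby?(combined)}"
```

In the real linter the combined string would then go to a single `team.investigate()` call. The sketch stops before that step because Ripper and RuboCop's parser can disagree, which is the case the fallback path exists for.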
Impact
The improvement scales with ERB tag density: more tags per file means more avoided `investigate()` calls. The old per-tag cost was roughly constant (~6–7ms). The new per-tag cost decreases as density increases because per-file overhead is amortized.

(50 synthetic files, all default linters enabled. Includes the effect of #454.)
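
The amortization argument can be written as a rough cost model. The symbols are assumptions introduced here, not measurements from the PR: $F$ is the fixed overhead of one `investigate()` call, $c$ is the variable cost of analyzing one snippet, and $n$ is the number of ERB tags in a file. The model ignores the Ripper pre-filter and the fallback path.

```latex
% Per-tag investigation: every tag pays the fixed overhead.
T_{\text{old}}(n) \approx n\,(F + c)
\quad\Rightarrow\quad
\frac{T_{\text{old}}(n)}{n} \approx F + c

% Batched investigation: the fixed overhead is paid once per file.
T_{\text{new}}(n) \approx F + n\,c
\quad\Rightarrow\quad
\frac{T_{\text{new}}(n)}{n} \approx \frac{F}{n} + c
```

Under this model the old per-tag cost is constant in $n$, while the new per-tag cost falls toward $c$ as $n$ grows, which matches the qualitative behavior described above.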
With `--cache` enabled, the improvement only applies to cache misses. Fully cached runs are unaffected.

Changes
2 files changed, +190 −23.
- Replace `inspect_content` (per-node) with `run` → `prepare_snippets` → `run_batched_snippets` / `run_single_snippet`
- `Ripper` dependency for fast syntax pre-filtering
- `find_snippet_for_byte_pos` (binary search for offense→snippet mapping)
- `investigate_batched` with correct offset handling for autocorrect
- Tests in `rubocop_spec.rb` covering: multi-tag batch detection with position mapping, mixed valid/invalid syntax with fallback, single-tag bypass, and batch-path autocorrect

Reproducible benchmark