perf: batch ERB snippets into single RuboCop investigation per file by ryanquanz · Pull Request #457 · Shopify/erb_lint

ryanquanz · 2026-03-30T14:18:49Z

Summary

Instead of calling team.investigate() once per ERB tag, combine all valid-syntax snippets from each file into a single source string and investigate once per file.

Depends on #454 (cache RuboCop team), which is cherry-picked into this branch.

Problem

The Rubocop linter previously called team.investigate() for each <%= %> / <% %> tag independently. Each investigate() call has significant fixed overhead — creating a Commissioner, running on_new_investigation for ~500 cops, walking the AST — that dominates for the small code snippets typical of ERB tags. A file with 20 ERB tags made 20 separate investigate calls.

Approach

1. Prepare snippets: Extract and align each ERB tag's Ruby code, using Ripper.sexp as a fast pre-filter to reject obviously invalid syntax (~38% of snippets are partial blocks like if, end, else).
2. Batch: Combine all valid-syntax snippets into a single source string (one snippet per line) and track each snippet's byte offset within it.
3. Investigate once: Run team.investigate() on the combined source.
4. Map back: A binary search maps each offense's byte position back to its originating snippet, then translates it to ERB-source coordinates.
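
The prepare, batch, and map-back steps above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `Snippet` struct, its field names, and the helpers `valid_ruby?`, `build_batch`, and `find_snippet` are all hypothetical. Only `Ripper.sexp` (which returns nil on a parse error) and `Array#bsearch_index` are standard Ruby.

```ruby
require "ripper"

# Hypothetical container; the field names are assumptions for illustration.
# erb_offset: where the Ruby code starts in the ERB file (bytes).
# batch_offset: where it starts in the combined source (filled in by build_batch).
Snippet = Struct.new(:code, :erb_offset, :batch_offset)

# Ripper.sexp returns nil when the source does not parse, so it works as a cheap pre-filter.
def valid_ruby?(code)
  !Ripper.sexp(code).nil?
end

# Join the valid snippets, one per line, recording the byte offset at which each one starts.
def build_batch(snippets)
  source = +""
  kept = []
  snippets.each do |snippet|
    next unless valid_ruby?(snippet.code)

    snippet.batch_offset = source.bytesize
    source << snippet.code << "\n"
    kept << snippet
  end
  [source, kept]
end

# Binary search for the last kept snippet whose batch_offset is <= byte_pos.
def find_snippet(kept, byte_pos)
  first_after = kept.bsearch_index { |snippet| snippet.batch_offset > byte_pos }
  idx = first_after.nil? ? kept.size - 1 : first_after - 1
  idx.negative? ? nil : kept[idx]
end

snippets = [
  Snippet.new("helper(arg)", 10),
  Snippet.new("if condition", 40), # partial block, rejected by the pre-filter
  Snippet.new("user.name", 70),
  Snippet.new("end", 95)           # rejected by the pre-filter
]
source, kept = build_batch(snippets)
pos = source.index("name") # ASCII-only here, so the character index equals the byte offset
hit = find_snippet(kept, pos)
erb_pos = hit.erb_offset + (pos - hit.batch_offset)
puts "offense at ERB byte #{erb_pos} (snippet #{hit.code.inspect})"
```

Because the offsets are recorded in increasing order as the batch is built, the lookup needs no extra sorting step.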

When the combined source has invalid syntax (~4% of files, due to rare Ripper/RuboCop parser disagreements), the linter falls back to per-snippet validation and retries the batch with only RuboCop-validated snippets. It falls back to per-snippet investigation only as a last resort.
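
The fallback order described above might look something like the sketch below. It is an assumption-laden outline rather than the PR's implementation: `investigate_with_fallback` is a hypothetical name, and `investigate:` and `parses:` are caller-supplied callables standing in for RuboCop's investigation and parser check, not erb_lint or RuboCop APIs.

```ruby
# Hypothetical driver for the three-stage fallback. Returns [stage, result] so the
# caller can see which path was taken.
def investigate_with_fallback(snippets, investigate:, parses:)
  combined = snippets.join("\n")
  return [:batched, investigate.call(combined)] if parses.call(combined)

  # The batch did not parse: validate each snippet with the stricter parser and retry.
  validated = snippets.select { |snippet| parses.call(snippet) }
  retried = validated.join("\n")
  return [:batched_after_validation, investigate.call(retried)] if parses.call(retried)

  # Last resort: one investigation per validated snippet.
  [:per_snippet, validated.flat_map { |snippet| investigate.call(snippet) }]
end

# Stub callables for demonstration only: "BAD" marks a snippet the parser rejects,
# and the fake investigation just reports the size of the source it was given.
parses = ->(src) { !src.include?("BAD") }
investigate = ->(src) { [src.bytesize] }

stage, result = investigate_with_fallback(["a", "BAD", "b"], investigate: investigate, parses: parses)
puts "#{stage}: #{result.inspect}"
```

Returning the stage alongside the result is only a convenience for this sketch; it makes the chosen path easy to assert on in a test.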

Impact

The improvement scales with ERB tag density: more tags per file means more avoided investigate() calls. The old per-tag cost was roughly constant (5.6–7.2ms across the rows below). The new per-tag cost decreases as density increases because per-file overhead is amortized.

| ERB tags/file | Before | After | Reduction | Per-tag cost: before → after |
| --- | --- | --- | --- | --- |
| 2 | 0.64s | 0.16s | −76% | 6.4ms → 1.6ms |
| 5 | 1.41s | 0.35s | −75% | 5.6ms → 1.4ms |
| 10 | 3.25s | 0.53s | −84% | 6.5ms → 1.1ms |
| 15 | 5.35s | 0.76s | −86% | 7.1ms → 1.0ms |
| 25 | 8.94s | 0.99s | −89% | 7.2ms → 0.8ms |

(50 synthetic files, all default linters enabled. Includes the effect of #454.)
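
For readers who want to check the per-tag column, it appears to be the total time divided by the total number of tags (tags per file × 50 files). The sketch below recomputes it from the rounded totals in the table; the helper name `per_tag_ms` is made up for illustration, and small differences from the published figures are possible because the table was presumably derived from unrounded timings.

```ruby
FILES = 50 # from the benchmark note above

# [tags per file, seconds before, seconds after], copied from the table.
ROWS = [
  [2, 0.64, 0.16],
  [5, 1.41, 0.35],
  [10, 3.25, 0.53],
  [15, 5.35, 0.76],
  [25, 8.94, 0.99]
].freeze

# Total wall time spread across every tag in every file, in milliseconds.
def per_tag_ms(total_seconds, tags_per_file, files: FILES)
  total_seconds * 1000.0 / (tags_per_file * files)
end

ROWS.each do |tags, before, after|
  reduction = (1 - after / before) * 100
  puts format("%2d tags/file: %.1fms -> %.1fms per tag (-%.0f%%)",
              tags, per_tag_ms(before, tags), per_tag_ms(after, tags), reduction)
end
```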

With --cache enabled, the improvement only applies to cache misses. Fully cached runs are unaffected.

Changes

2 files changed, +190 −23.

- Replaces inspect_content (per-node) with run → prepare_snippets → run_batched_snippets / run_single_snippet
- Adds Ripper dependency for fast syntax pre-filtering
- Adds find_snippet_for_byte_pos (binary search for offense→snippet mapping)
- Adds investigate_batched with correct offset handling for autocorrect
- Adds 6 new tests in rubocop_spec.rb covering: multi-tag batch detection with position mapping, mixed valid/invalid syntax with fallback, single-tag bypass, and batch-path autocorrect

Reproducible benchmark

# Save anywhere, run with: bundle exec ruby /tmp/erblint_benchmark.rb
require "erb_lint/all"; require "benchmark"; require "fileutils"; require "tmpdir" # tmpdir provides Dir.mktmpdir
WORKDIR = Dir.mktmpdir("erblint_bench")
50.times do |i|
  lines = ['<div class="container">']
  15.times do |j|
    lines.concat(["  <div>", "    <%= helper_method_#{j}(arg1, arg2) %>",
      "    <% if condition_#{j} %>", "      <span><%= object_#{j}.name %></span>",
      "    <% end %>", "  </div>"])
  end
  File.write(File.join(WORKDIR, "template_#{i}.html.erb"), lines.push("</div>").join("\n"))
end
files = Dir.glob(File.join(WORKDIR, "*.html.erb"))
fl = ERBLint::FileLoader.new(WORKDIR)
cfg = ERBLint::RunnerConfig.default_for(ERBLint::RunnerConfig.new({
  "EnableDefaultLinters" => true, "linters" => { "ErbSafety" => { "enabled" => false },
  "Rubocop" => { "enabled" => true, "rubocop_config" => {
    "Layout/InitialIndentation" => { "Enabled" => false },
    "Layout/TrailingEmptyLines" => { "Enabled" => false },
    "Layout/TrailingWhitespace" => { "Enabled" => false },
    "Naming/FileName" => { "Enabled" => false },
    "Style/FrozenStringLiteralComment" => { "Enabled" => false },
    "Layout/LineLength" => { "Enabled" => false },
    "Lint/UselessAssignment" => { "Enabled" => false }}}}}, fl))
run = -> {
  r = ERBLint::Runner.new(fl, cfg)
  files.each { |f| r.clear_offenses; r.run(ERBLint::ProcessedSource.new(f, File.read(f, encoding: Encoding::UTF_8))) }
}
run.call # warmup
times = 3.times.map { GC.start; Benchmark.realtime { run.call } }.sort
puts "#{files.size} files, median: #{format("%.3f", times[1])}s"
FileUtils.rm_rf(WORKDIR)

Instead of calling RuboCop's team.investigate() once per ERB tag (thousands of times per run), combine all valid-syntax snippets from each file into a single source and investigate once per file. Uses Ripper for fast syntax pre-filtering to reject obviously invalid snippets (~38%) before batch assembly. When the combined source has invalid syntax (rare, ~4% of files due to Ripper/RuboCop parser disagreements), falls back to per-snippet validation and retries the batch with only RuboCop-validated snippets. Includes binary search for mapping offense byte positions back to their originating ERB snippets, and correct offset handling for autocorrect in the batched path.

ryanquanz added 2 commits March 29, 2026 23:26

perf: cache RuboCop team and filter cop registry to enabled cops

e2faeca

ryanquanz changed the base branch from main to perf/cache-rubocop-team March 30, 2026 15:04

ryanquanz wants to merge 2 commits into perf/cache-rubocop-team from perf/batch-rubocop-investigation
