perf: batch ERB snippets into single RuboCop investigation per file #457

Draft: ryanquanz wants to merge 2 commits into perf/cache-rubocop-team from perf/batch-rubocop-investigation

@ryanquanz
Summary

Instead of calling team.investigate() once per ERB tag, combine all valid-syntax snippets from each file into a single source string and investigate once per file.

Depends on #454 (cache RuboCop team), which is cherry-picked into this branch.

Problem

The Rubocop linter previously called team.investigate() for each <%= %> / <% %> tag independently. Each investigate() call has significant fixed overhead — creating a Commissioner, running on_new_investigation for ~500 cops, walking the AST — that dominates for the small code snippets typical of ERB tags. A file with 20 ERB tags made 20 separate investigate calls.

Approach

  1. Prepare snippets: Extract and align each ERB tag's Ruby code, using Ripper.sexp as a fast pre-filter to reject obviously invalid syntax (~38% of snippets are partial blocks like if, end, else)
  2. Batch: Combine all valid-syntax snippets into a single source string (one snippet per line), track byte offsets for each
  3. Investigate once: Run team.investigate() on the combined source
  4. Map back: Binary search maps each offense's byte position back to its originating snippet, translates to ERB-source coordinates
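The four steps above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `Snippet` struct, `valid_ruby?`, `build_batch`, and `snippet_for_byte_pos` are hypothetical names, and the sketch assumes each snippet fits on one line and that ERB coordinates are plain byte offsets.

```ruby
require "ripper"

# Hypothetical container for one ERB tag's Ruby code (not erb_lint's real type).
# erb_offset: byte offset of the code inside the ERB file.
# batch_offset: byte offset of the code inside the combined source.
Snippet = Struct.new(:code, :erb_offset, :batch_offset)

# Ripper.sexp returns nil when the source does not parse,
# which makes it usable as a cheap validity pre-filter.
def valid_ruby?(code)
  !Ripper.sexp(code).nil?
end

# Join the valid snippets one per line and record where each one starts.
def build_batch(snippets)
  source = +""
  kept = snippets.select { |s| valid_ruby?(s.code) }
  kept.each do |s|
    s.batch_offset = source.bytesize
    source << s.code << "\n"
  end
  [source, kept]
end

# Binary search for the last snippet whose batch_offset is <= byte_pos.
def snippet_for_byte_pos(kept, byte_pos)
  first_after = kept.bsearch_index { |s| s.batch_offset > byte_pos }
  idx = first_after.nil? ? kept.size - 1 : first_after - 1
  idx.negative? ? nil : kept[idx]
end

snippets = [
  Snippet.new("helper_method(arg1, arg2)", 10),
  Snippet.new("if condition", 50), # partial block, rejected by the pre-filter
  Snippet.new("object.name", 90),
]
source, kept = build_batch(snippets)

# An offense reported at byte 27 of the combined source belongs to the
# second kept snippet; translate it back into ERB-file coordinates.
hit = snippet_for_byte_pos(kept, 27)
erb_pos = hit.erb_offset + (27 - hit.batch_offset)
puts "#{hit.code} at ERB byte #{erb_pos}"
```

Because the offsets are recorded in ascending order while the batch is built, the lookup needs no extra sorting and costs O(log n) per offense.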

When the combined source has invalid syntax (~4% of files, due to rare Ripper/RuboCop parser disagreements), the linter falls back to per-snippet validation and retries the batch with only the RuboCop-validated snippets. It investigates snippet by snippet only as a last resort.
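The fallback order can be pictured with a small driver like the one below. It is a sketch under assumptions: `investigate_with_fallback` is a hypothetical name, and the `valid` and `investigate` callables stand in for RuboCop's parser check and `team.investigate()`. In the demo, Ripper plays the role of the stricter parser and the "offenses" are just the lines that were seen.

```ruby
require "ripper"

# Hypothetical three-tier driver; not the actual erb_lint implementation.
def investigate_with_fallback(snippets, valid:, investigate:)
  # Tier 1: a single investigation over the whole batch.
  source = snippets.join("\n")
  return [:batched, investigate.call(source)] if valid.call(source)

  # Tier 2: validate each snippet on its own, then retry the batch
  # with only the snippets that passed.
  validated = snippets.select { |code| valid.call(code) }
  source = validated.join("\n")
  return [:batched_after_validation, investigate.call(source)] if valid.call(source)

  # Tier 3: last resort, one investigation per surviving snippet.
  [:per_snippet, validated.flat_map { |code| investigate.call(code) }]
end

# Demo stand-ins (assumptions, see the lead-in above).
valid = ->(src) { !Ripper.sexp(src).nil? }
investigate = ->(src) { src.lines.map(&:chomp) }

# "if x" slipped past the pre-filter, so the first batch fails to parse
# and the second attempt runs without it.
p investigate_with_fallback(["foo(1)", "if x", "bar"], valid: valid, investigate: investigate)
```

Returning the tier alongside the result is only for illustration; it makes it easy to see which path a given file took.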

Impact

The improvement scales with ERB tag density — more tags per file means more avoided investigate() calls. The old per-tag cost was roughly constant (~6–7ms). The new per-tag cost decreases as density increases because per-file overhead is amortized.

| ERB tags/file | Before | After | Reduction | Per-tag cost (before → after) |
|---|---|---|---|---|
| 2 | 0.64s | 0.16s | −76% | 6.4ms → 1.6ms |
| 5 | 1.41s | 0.35s | −75% | 5.6ms → 1.4ms |
| 10 | 3.25s | 0.53s | −84% | 6.5ms → 1.1ms |
| 15 | 5.35s | 0.76s | −86% | 7.1ms → 1.0ms |
| 25 | 8.94s | 0.99s | −89% | 7.2ms → 0.8ms |

(50 synthetic files, all default linters enabled. Includes the effect of #454.)
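For readers checking the arithmetic, the per-tag column appears to be the total wall time divided by the total number of tags (50 files × tags per file). That reading is an inference from the numbers, not something stated elsewhere in this PR; the helper name `per_tag_ms` is made up for this sketch.

```ruby
FILES = 50 # number of synthetic files in the benchmark

# tags per file => [before, after] total wall time in seconds, copied from the table
TOTALS = {
  2  => [0.64, 0.16],
  5  => [1.41, 0.35],
  10 => [3.25, 0.53],
  15 => [5.35, 0.76],
  25 => [8.94, 0.99],
}.freeze

# Total seconds spread over every tag in the run, in milliseconds.
def per_tag_ms(total_seconds, tags_per_file, files: FILES)
  (total_seconds * 1000.0 / (files * tags_per_file)).round(1)
end

TOTALS.each do |tags, (before, after)|
  puts format("%2d tags/file: %.1fms -> %.1fms",
              tags, per_tag_ms(before, tags), per_tag_ms(after, tags))
end
```

The "before" figure stays in the 5.6 to 7.2ms range while the "after" figure keeps falling as density rises, which is what amortizing a fixed per-file cost over more tags would produce.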

With --cache enabled, the improvement only applies to cache misses. Fully cached runs are unaffected.

Changes

2 files changed, +190 −23.

  • Replaces inspect_content (per-node) with run → prepare_snippets → run_batched_snippets / run_single_snippet
  • Adds Ripper dependency for fast syntax pre-filtering
  • Adds find_snippet_for_byte_pos (binary search for offense→snippet mapping)
  • Adds investigate_batched with correct offset handling for autocorrect
  • 6 new tests in rubocop_spec.rb covering: multi-tag batch detection with position mapping, mixed valid/invalid syntax with fallback, single-tag bypass, and batch-path autocorrect
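
One way to think about the offset handling for autocorrect in the batched path: a correction reported against the combined source has to be shifted into ERB-file coordinates, and applying corrections from the end of the file backwards keeps earlier offsets valid. The sketch below illustrates that general idea only. `Span`, `Fix`, and `apply_fixes` are hypothetical names, and this PR's `investigate_batched` may well work differently.

```ruby
# Hypothetical shapes, not erb_lint or RuboCop types.
# Span: where a snippet starts in the combined source and in the ERB file.
Span = Struct.new(:batch_offset, :erb_offset)
# Fix: replace bytes [start, stop) of the combined source with `replacement`.
Fix = Struct.new(:start, :stop, :replacement)

def apply_fixes(erb_source, spans, fixes)
  out = erb_source.b # binary copy so that indexing is by byte

  # Shift each fix from combined-source offsets to ERB-file offsets.
  translated = fixes.map do |fix|
    span = spans.reverse.find { |s| s.batch_offset <= fix.start }
    delta = span.erb_offset - span.batch_offset
    [fix.start + delta, fix.stop + delta, fix.replacement]
  end

  # Apply back to front so that earlier byte offsets stay valid.
  translated.sort_by { |start, _stop, _text| -start }.each do |start, stop, text|
    out[start...stop] = text.b
  end
  out.force_encoding(erb_source.encoding)
end

erb = "<%= foo( 1 ) %> <%= bar( 2 ) %>"
spans = [Span.new(0, 4), Span.new(9, 20)] # combined source: "foo( 1 )\nbar( 2 )\n"
fixes = [Fix.new(0, 8, "foo(1)"), Fix.new(9, 17, "bar(2)")]
puts apply_fixes(erb, spans, fixes)
```
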
Reproducible benchmark
# Save anywhere, run with: bundle exec ruby /tmp/erblint_benchmark.rb
require "erb_lint/all"; require "benchmark"; require "fileutils"
require "tmpdir" # Dir.mktmpdir is defined by the tmpdir library
WORKDIR = Dir.mktmpdir("erblint_bench")
50.times do |i|
  lines = ['<div class="container">']
  15.times do |j|
    lines.concat(["  <div>", "    <%= helper_method_#{j}(arg1, arg2) %>",
      "    <% if condition_#{j} %>", "      <span><%= object_#{j}.name %></span>",
      "    <% end %>", "  </div>"])
  end
  File.write(File.join(WORKDIR, "template_#{i}.html.erb"), lines.push("</div>").join("\n"))
end
files = Dir.glob(File.join(WORKDIR, "*.html.erb"))
fl = ERBLint::FileLoader.new(WORKDIR)
cfg = ERBLint::RunnerConfig.default_for(ERBLint::RunnerConfig.new({
  "EnableDefaultLinters" => true, "linters" => { "ErbSafety" => { "enabled" => false },
  "Rubocop" => { "enabled" => true, "rubocop_config" => {
    "Layout/InitialIndentation" => { "Enabled" => false },
    "Layout/TrailingEmptyLines" => { "Enabled" => false },
    "Layout/TrailingWhitespace" => { "Enabled" => false },
    "Naming/FileName" => { "Enabled" => false },
    "Style/FrozenStringLiteralComment" => { "Enabled" => false },
    "Layout/LineLength" => { "Enabled" => false },
    "Lint/UselessAssignment" => { "Enabled" => false }}}}}, fl))
run = -> {
  r = ERBLint::Runner.new(fl, cfg)
  files.each { |f| r.clear_offenses; r.run(ERBLint::ProcessedSource.new(f, File.read(f, encoding: Encoding::UTF_8))) }
}
run.call # warmup
times = 3.times.map { GC.start; Benchmark.realtime { run.call } }.sort
puts "#{files.size} files, median: #{format("%.3f", times[1])}s"
FileUtils.rm_rf(WORKDIR)

Instead of calling RuboCop's team.investigate() once per ERB tag
(thousands of times per run), combine all valid-syntax snippets from
each file into a single source and investigate once per file.

Uses Ripper for fast syntax pre-filtering to reject obviously invalid
snippets (~38%) before batch assembly. When the combined source has
invalid syntax (rare, ~4% of files due to Ripper/RuboCop parser
disagreements), falls back to per-snippet validation and retries the
batch with only RuboCop-validated snippets.

Includes binary search for mapping offense byte positions back to
their originating ERB snippets, and correct offset handling for
autocorrect in the batched path.
ryanquanz changed the base branch from main to perf/cache-rubocop-team on March 30, 2026, 15:04