fix(grep): exclude token-bomb dirs from recursive search (#2064) #2174

Open
maxmilian wants to merge 1 commit into rtk-ai:develop from maxmilian:fix/grep-rn-fallback-expansion

@maxmilian maxmilian commented May 30, 2026

Summary

Target branch is develop per CONTRIBUTING.

Root cause

src/cmds/system/grep_cmd.rs passes --no-ignore-vcs to rg (intentional, to avoid false negatives for .gitignore'd files — comment at the call site). The side effect: rg descends dependency/build dirs like node_modules, emitting megabytes of minified single-line bundles. When rg is absent, the fallback execs system grep -rnHZ (BSD grep on macOS), which is not gitignore-aware and descends node_modules too.

Fix

Wire the existing but previously-unused config.filters.ignore_dirs / ignore_files list (node_modules, target, .venv, vendor, *.min.js, *.lock, …) into:

  • the rg invocation as --glob '!<pat>', and
  • the system-grep fallback as --exclude-dir=<dir> / --exclude=<file>.
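
The wiring described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `IgnoreLists`, `rg_glob_args`, and `grep_exclude_args` are hypothetical names standing in for whatever `grep_cmd.rs` really uses, and the struct only mirrors the two lists the description mentions.

```rust
/// Hypothetical stand-in for the `ignore_dirs` / `ignore_files` lists
/// under `config.filters` (the real type lives in the project's config).
struct IgnoreLists {
    ignore_dirs: Vec<String>,
    ignore_files: Vec<String>,
}

/// Build rg arguments: each pattern becomes a `--glob` / `!<pat>` pair.
/// The two halves are separate argv entries, so no shell quoting is needed.
fn rg_glob_args(lists: &IgnoreLists) -> Vec<String> {
    lists
        .ignore_dirs
        .iter()
        .chain(lists.ignore_files.iter())
        .flat_map(|pat| vec!["--glob".to_string(), format!("!{}", pat)])
        .collect()
}

/// Build system-grep arguments: directories use `--exclude-dir=<dir>`,
/// file patterns use `--exclude=<file>`.
fn grep_exclude_args(lists: &IgnoreLists) -> Vec<String> {
    let dirs = lists
        .ignore_dirs
        .iter()
        .map(|d| format!("--exclude-dir={}", d));
    let files = lists
        .ignore_files
        .iter()
        .map(|f| format!("--exclude={}", f));
    dirs.chain(files).collect()
}
```

With an empty config both helpers return an empty vector, so the invocation is unchanged when no ignore patterns are configured.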

Correctness (no false negatives)

  • rg honors an explicitly-given path inside an excluded dir, so rtk grep foo node_modules/pkg works.
  • grep --exclude-dir does not honor that (verified on BSD grep). To keep both paths consistent and never drop a result the user asked for on purpose, exclusions are skipped entirely when the search path itself targets an ignored dir (path_targets_ignored_dir). The check is path-component-anchored, so my-node_modules-helper is unaffected.
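
A path-component-anchored check of this kind might look like the sketch below. The function name `path_targets_ignored_dir` comes from the PR description, but the signature and body here are assumptions: the sketch treats `ignore_dirs` as plain directory names (no glob patterns) and compares them against whole path components only.

```rust
use std::path::{Component, Path};

/// Returns true when any component of `path` is exactly one of the
/// ignored directory names. Substring matches do not count, so
/// `my-node_modules-helper` is not treated as `node_modules`.
/// Assumed signature; the real function may differ.
fn path_targets_ignored_dir(path: &Path, ignore_dirs: &[&str]) -> bool {
    path.components().any(|component| match component {
        // Only ordinary name components are compared; `.`, `..`,
        // and the root are never ignored directories.
        Component::Normal(name) => ignore_dirs
            .iter()
            .any(|dir| name.to_str() == Some(*dir)),
        _ => false,
    })
}
```

When this returns true, the caller would skip adding any exclusion arguments, so an explicit `rtk grep foo node_modules/pkg` behaves the same on the rg path and the grep fallback.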

Tests

  • cargo fmt --all --check && cargo clippy --all-targets && cargo test — all green (1986 passed).
  • New unit tests (red→green): test_ignore_globs_exclude_node_modules, test_ignore_globs_empty_config_yields_nothing, test_ignore_grep_excludes_for_fallback, test_path_targets_ignored_dir.
  • Manual before/after on a frontend-style repo with node_modules:
    • rtk grep define . → 10,517 chars before, 104 chars after (node_modules no longer dumped).
    • rtk grep define node_modules/pkg-a → still returns the matches (explicit path honored).

Token-savings ≥60% checklist item is N/A: this is a correctness/expansion fix on the existing grep filter, not a new compression filter — and it strictly reduces output. Cites design principle #1 (Correctness over Token Savings).

Scope

  • Intentionally keeps --no-ignore-vcs (false-negative-avoidance preserved for non-token-bomb gitignore'd files).
  • Reuses the existing FilterConfig ignore list rather than inventing a new hard-coded set, so users can already customize it via config.
  • No shell-string interpolation; all args go through Command::arg.
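
To illustrate the last point, here is a small sketch of building the fallback invocation with `std::process::Command`. The helper name `build_fallback_grep` is hypothetical, and the use of `-e` before the pattern is an assumption of this sketch rather than something the PR states; only the `-rnHZ` flags come from the description.

```rust
use std::process::Command;

/// Assemble (but do not run) a system-grep fallback command.
/// Every value is passed as its own argv entry via `arg`/`args`,
/// so shell metacharacters in the pattern are never interpreted.
fn build_fallback_grep(pattern: &str, path: &str, excludes: &[String]) -> Command {
    let mut cmd = Command::new("grep");
    cmd.arg("-rnHZ");
    cmd.args(excludes);
    // `-e` is an assumption here: it keeps a pattern that starts
    // with `-` from being parsed as an option.
    cmd.arg("-e").arg(pattern).arg(path);
    cmd
}
```

Because no shell is involved, a pattern such as `$(rm -rf x)` reaches grep as a literal string to search for.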

Follow-up

rtk grep passes --no-ignore-vcs to rg so it still searches most
.gitignore'd files (avoids false negatives, rtk-ai#1436-era behavior). But
that also makes rg descend dependency/build dirs like node_modules,
dumping minified vendored code and turning a token *reducer* into a
token *expander* — a single `grep -rn` in a frontend repo can push
megabytes of minified code into context. When rg is absent, the
system-grep fallback (BSD grep on macOS) is not gitignore-aware and
descends node_modules too.

Re-exclude only the well-known token-bomb dirs/files by wiring the
existing-but-unused config.filters.ignore_dirs / ignore_files list into
both the rg invocation (--glob '!x') and the grep fallback
(--exclude-dir / --exclude).

Correctness: rg honors an explicitly-given path inside an excluded dir,
but `grep --exclude-dir` does not. To keep both paths consistent and
never drop a result the user asked for on purpose, exclusions are
skipped entirely when the search path itself targets an ignored dir
(e.g. `rtk grep foo node_modules/pkg`).

Repro (frontend repo with node_modules): `rtk grep define .`
10,517 chars -> 104 chars; explicit `rtk grep define node_modules/pkg`
still returns the match.
@maxmilian maxmilian force-pushed the fix/grep-rn-fallback-expansion branch from 57151ba to a4e0f5c on May 30, 2026 16:38
@maxmilian maxmilian marked this pull request as ready for review May 30, 2026 16:45

Development

Successfully merging this pull request may close these issues.

grep -rn fallback causes token *expansion*: execs system grep, descends node_modules, dumps minified files
