fix(grep): exclude token-bomb dirs from recursive search (#2064)#2174
Open
maxmilian wants to merge 1 commit into
Open
fix(grep): exclude token-bomb dirs from recursive search (#2064)#2174maxmilian wants to merge 1 commit into
maxmilian wants to merge 1 commit into
Conversation
rtk grep passes --no-ignore-vcs to rg so it still searches most .gitignore'd files (avoids false negatives, rtk-ai#1436-era behavior). But that also makes rg descend dependency/build dirs like node_modules, dumping minified vendored code and turning a token *reducer* into a token *expander* — a single `grep -rn` in a frontend repo can push megabytes of minified code into context. When rg is absent, the system-grep fallback (BSD grep on macOS) is not gitignore-aware and descends node_modules too. Re-exclude only the well-known token-bomb dirs/files by wiring the existing-but-unused config.filters.ignore_dirs / ignore_files list into both the rg invocation (--glob '!x') and the grep fallback (--exclude-dir / --exclude). Correctness: rg honors an explicitly-given path inside an excluded dir, but `grep --exclude-dir` does not. To keep both paths consistent and never drop a result the user asked for on purpose, exclusions are skipped entirely when the search path itself targets an ignored dir (e.g. `rtk grep foo node_modules/pkg`). Repro (frontend repo with node_modules): `rtk grep define .` 10,517 chars -> 104 chars; explicit `rtk grep define node_modules/pkg` still returns the match.
57151ba to
a4e0f5c
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
rtk grep -rn <pat> <dir>descendsnode_modulesand dumps minified vendored code, turning a token reducer into a token expander (~57× the gitignore-aware grep in the report).--no-ignore-vcsso genuinely-searched .gitignore'd files (configs, etc.) are still found.Root cause
src/cmds/system/grep_cmd.rspasses--no-ignore-vcsto rg (intentional, to avoid false negatives for .gitignore'd files — comment at the call site). The side effect: rg descends dependency/build dirs likenode_modules, emitting megabytes of minified single-line bundles. When rg is absent, the fallback execs systemgrep -rnHZ(BSD grep on macOS), which is not gitignore-aware and descendsnode_modulestoo.Fix
Wire the existing but previously-unused
config.filters.ignore_dirs/ignore_fileslist (node_modules,target,.venv,vendor,*.min.js,*.lock, …) into:--glob '!<pat>', and--exclude-dir=<dir>/--exclude=<file>.Correctness (no false negatives)
rtk grep foo node_modules/pkgworks.grep --exclude-dirdoes not honor that (verified on BSD grep). To keep both paths consistent and never drop a result the user asked for on purpose, exclusions are skipped entirely when the search path itself targets an ignored dir (path_targets_ignored_dir). The check is path-component-anchored, somy-node_modules-helperis unaffected.Tests
cargo fmt --all --check && cargo clippy --all-targets && cargo test— all green (1986 passed).test_ignore_globs_exclude_node_modules,test_ignore_globs_empty_config_yields_nothing,test_ignore_grep_excludes_for_fallback,test_path_targets_ignored_dir.node_modules:rtk grep define .→ 10,517 chars → 104 chars (node_modules no longer dumped).rtk grep define node_modules/pkg-a→ still returns the matches (explicit path honored).Token-savings ≥60% checklist item is N/A: this is a correctness/expansion fix on the existing grep filter, not a new compression filter — and it strictly reduces output. Cites design principle #1 (Correctness over Token Savings).
Scope
--no-ignore-vcs(false-negative-avoidance preserved for non-token-bomb gitignore'd files).FilterConfigignore list rather than inventing a new hard-coded set, so users can already customize it via config.Command::arg.Follow-up
-rnfallback as a coverage/measurement gap; this PR addresses the severity (expansion) side and is independent of those.