perf(scanner): chunk file discovery and cache I/O across rules #11

Merged: NhanAZ merged 1 commit into main from agent/performance-optimization on Jun 9, 2026

@NhanAZ NhanAZ commented Jun 9, 2026

Collaborator

What changed

  • src/scanner/index.ts: Introduced a chunk-based processing model in which context.files is sliced into chunks of 100 files, and every rule runs against one chunk before the scanner moves on to the next.
  • src/types.ts: Added an optional getFileContent async method to ScanContext.
  • src/rules/opk-001, opk-002, opk-003: Refactored to consume getFileContent instead of calling fs.readFileSync independently.
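
To make the shape of these changes concrete, here is a minimal sketch of the optional `getFileContent` member and the slicing step. The interface fields, the `chunk` helper, and the sample file names are assumptions for illustration; the real definitions live in src/types.ts and src/scanner/index.ts and may differ.

```typescript
// Hypothetical shape of the context; only `files` and `getFileContent`
// are named in this PR, everything else is illustrative.
interface ScanContext {
  files: string[];
  // Optional, so rules written before this change still type-check.
  getFileContent?: (filePath: string) => Promise<string | undefined>;
}

const CHUNK_SIZE = 100; // chunk size stated in the description above

// Split a list into consecutive slices of at most `size` items.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

const context: ScanContext = {
  files: Array.from({ length: 250 }, (_, i) => `src/file-${i}.ts`),
};
const chunks = chunk(context.files, CHUNK_SIZE);
console.log(chunks.map((c) => c.length)); // [ 100, 100, 50 ]
```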

Why

Previously, the scanner iterated over rules in the outer loop. For a repository with 10,000 files and 5 rules that each read file contents for AST/regex checks, every file was read from disk 5 times, for a total of 50,000 fs.readFileSync calls.

A shared cache per file chunk eliminates this redundant disk I/O: within a chunk, each file is read at most once, no matter how many rules inspect it. The chunk size (100) ensures that we never hold the entire repository in memory, which prevents Out of Memory (OOM) failures.
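
The read-count arithmetic can be checked with a small simulation. This is a sketch under stated assumptions, not the scanner's code: `createChunkCache`, `countReads`, and the fake reader are hypothetical names, and the cache here stores promises so that concurrent requests for the same file also share one read.

```typescript
type ReadFile = (filePath: string) => Promise<string>;

// Build a getFileContent function whose cache lives for one chunk only.
function createChunkCache(read: ReadFile): ReadFile {
  const cache = new Map<string, Promise<string>>();
  return (filePath) => {
    let pending = cache.get(filePath);
    if (pending === undefined) {
      pending = read(filePath); // first request triggers the only read
      cache.set(filePath, pending);
    }
    return pending;
  };
}

// Count how many reads reach the "disk" when every rule asks for every file.
async function countReads(
  fileCount: number,
  ruleCount: number,
  chunkSize: number,
): Promise<number> {
  let reads = 0;
  const fakeRead: ReadFile = async (p) => {
    reads++;
    return `contents of ${p}`;
  };
  const files = Array.from({ length: fileCount }, (_, i) => `f${i}.ts`);
  for (let start = 0; start < files.length; start += chunkSize) {
    const chunkFiles = files.slice(start, start + chunkSize);
    const getFileContent = createChunkCache(fakeRead); // fresh cache per chunk
    for (let rule = 0; rule < ruleCount; rule++) {
      for (const file of chunkFiles) {
        await getFileContent(file);
      }
    }
    // The cache becomes unreachable here, so at most one chunk is held.
  }
  return reads;
}

countReads(10000, 5, 100).then((reads) => console.log(reads)); // 10000
```

With the figures from the example above, the simulated scan issues 10,000 reads instead of 50,000.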

Architecture

  • The Rule interface remains backwards compatible. Existing rules that don't use getFileContent continue to run unchanged.
  • Internal rules check whether getFileContent exists and use it when it does.
  • This is transparent to the end user; the only visible effect is that scan duration drops significantly on large codebases.
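
The "use it if it exists" check could look roughly like the sketch below. `readForRule` is a hypothetical helper, not a function from this repository, and swallowing errors in the fallback path is an assumption made to keep both paths symmetrical.

```typescript
import { readFileSync } from "fs";

// Hypothetical context shape; see src/types.ts for the real one.
interface ScanContext {
  files: string[];
  getFileContent?: (filePath: string) => Promise<string | undefined>;
}

// Prefer the shared cache when the scanner provides it; otherwise fall
// back to a direct synchronous read, as rules did before this change.
async function readForRule(
  context: ScanContext,
  filePath: string,
): Promise<string | undefined> {
  if (context.getFileContent) {
    return context.getFileContent(filePath);
  }
  try {
    return readFileSync(filePath, "utf8");
  } catch {
    // Assumption: an unreadable file is skipped rather than fatal.
    return undefined;
  }
}

// Usage with a stubbed cache, so no file system access is needed.
const cachedContext: ScanContext = {
  files: ["a.ts"],
  getFileContent: async () => "const a = 1;",
};
readForRule(cachedContext, "a.ts").then((text) => console.log(text)); // const a = 1;
```

A context without `getFileContent` takes the fallback branch, which is what keeps older callers working.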

Testing

  • All 41 automated tests continue to pass.
  • Verified that rules produce identical findings before and after the change (no functional regression).
  • Binary files and files >1MB are skipped during the caching phase.
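
The skip condition for the caching phase might be expressed as a small predicate like the one below. Both details are assumptions: the description does not say whether "1MB" means 1,000,000 or 1,048,576 bytes, and it does not say how binary files are detected. The NUL-byte check is a common heuristic swapped in for illustration.

```typescript
const MAX_CACHED_BYTES = 1024 * 1024; // assumption: "1MB" read as 1 MiB

// Decide whether a file is worth caching, given its size and a sample of
// its leading bytes. `shouldCache` is a hypothetical name.
function shouldCache(sizeInBytes: number, head: Uint8Array): boolean {
  if (sizeInBytes > MAX_CACHED_BYTES) {
    return false; // too large to hold in the per-chunk cache
  }
  // Heuristic: text files rarely contain a NUL byte, binary files often do.
  return head.indexOf(0) === -1;
}

console.log(shouldCache(2, new Uint8Array([104, 105]))); // true
console.log(shouldCache(4, new Uint8Array([0x89, 0x50, 0x00, 0x47]))); // false
console.log(shouldCache(MAX_CACHED_BYTES + 1, new Uint8Array([104]))); // false
```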

Risks

Negligible. Read failures are caught inside the cache, which resolves to undefined so that rules skip inaccessible files instead of crashing the whole scan.
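
One way to get this behavior is to convert a rejected read into `undefined` at the point where the result is cached. The sketch below assumes a promise-based reader; `createSafeCache`, `scanReadable`, and the flaky reader are hypothetical names used only for this illustration.

```typescript
type ReadFile = (filePath: string) => Promise<string>;

// Wrap a reader so that failures surface as `undefined`, never as a throw.
function createSafeCache(read: ReadFile) {
  const cache = new Map<string, Promise<string | undefined>>();
  return (filePath: string): Promise<string | undefined> => {
    let pending = cache.get(filePath);
    if (pending === undefined) {
      // Promise.resolve().then(...) also captures synchronous throws.
      pending = Promise.resolve()
        .then(() => read(filePath))
        .catch(() => undefined);
      cache.set(filePath, pending);
    }
    return pending;
  };
}

// A rule-like loop that skips files whose content is unavailable.
async function scanReadable(files: string[], read: ReadFile): Promise<string[]> {
  const getFileContent = createSafeCache(read);
  const scanned: string[] = [];
  for (const file of files) {
    const content = await getFileContent(file);
    if (content === undefined) continue; // inaccessible: skip, do not crash
    scanned.push(file);
  }
  return scanned;
}

const flakyRead: ReadFile = async (p) => {
  if (p === "locked.ts") throw new Error("EACCES");
  return "ok";
};
scanReadable(["a.ts", "locked.ts", "b.ts"], flakyRead).then((s) =>
  console.log(s),
); // [ 'a.ts', 'b.ts' ]
```

Caching the `undefined` result also means a failing file is not retried by every rule in the chunk.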

Follow-up

Next step: Add more comprehensive test fixtures to fully close out v0.3.0.

@NhanAZ NhanAZ merged commit b00b819 into main Jun 9, 2026
@NhanAZ NhanAZ deleted the agent/performance-optimization branch June 9, 2026 16:21