perf(scanner): chunk file discovery and cache I/O across rules by NhanAZ · Pull Request #11 · NhanAZ-Drops/OpenPolicyKit

NhanAZ · 2026-06-09T16:21:21Z

What changed

src/scanner/index.ts: Introduced a chunk-based processing model in which context.files is sliced into chunks of 100 files.
src/types.ts: Added an optional getFileContent async method to ScanContext.
src/rules/opk-001, opk-002, opk-003: Refactored to consume getFileContent instead of calling fs.readFileSync independently.
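A minimal sketch of the chunking step, for reviewers who want the shape without opening the diff. Only `files`, `getFileContent`, and the chunk size of 100 come from this PR; the `toChunks` helper, the `CHUNK_SIZE` constant, and the return type of `getFileContent` are assumptions made for illustration.

```typescript
// Hypothetical shape: only `files` and `getFileContent` are named in the PR.
interface ScanContext {
  files: string[];
  // Optional, so existing rules and contexts keep type-checking.
  // The `string | undefined` result type is an assumption.
  getFileContent?: (path: string) => Promise<string | undefined>;
}

const CHUNK_SIZE = 100; // chunk size stated in the PR

// Split a list into consecutive slices of at most `size` items.
function toChunks<T>(items: readonly T[], size: number = CHUNK_SIZE): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Usage: 250 files become chunks of 100, 100, and 50.
const context: ScanContext = {
  files: Array.from({ length: 250 }, (_, i) => `src/file-${i}.ts`),
};
const chunkSizes = toChunks(context.files).map((chunk) => chunk.length);
```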

Why

Previously, the scanner iterated over rules in the outer loop. In a repository with 10,000 files where 5 rules each needed file contents for AST or regex checks, every file was read from disk 5 times, for a total of 50,000 fs.readFileSync calls.

A cache shared by all rules within each file chunk eliminates this redundant disk I/O: each file is read once instead of once per rule. The chunk size (100) means the scanner never holds the entire repository in memory, which avoids out-of-memory (OOM) errors.
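The effect on read counts can be sketched as follows. This is not the code in src/scanner/index.ts; `createChunkCache`, `countReads`, and the in-memory `fakeDisk` reader are hypothetical names used to show how a per-chunk cache collapses repeated reads.

```typescript
type Reader = (path: string) => Promise<string | undefined>;

// Wrap a reader so each path is read at most once while the cache is alive.
// Promises are cached, so concurrent requests for one path share a single read.
function createChunkCache(read: Reader): Reader {
  const cache = new Map<string, Promise<string | undefined>>();
  return (path) => {
    let pending = cache.get(path);
    if (pending === undefined) {
      pending = read(path);
      cache.set(path, pending);
    }
    return pending;
  };
}

// Hypothetical driver: run every rule over each chunk with a fresh cache.
// Discarding the cache after each chunk is what bounds memory use.
async function countReads(
  fileCount: number,
  ruleCount: number,
  chunkSize: number,
): Promise<number> {
  let diskReads = 0;
  const fakeDisk: Reader = async (path) => {
    diskReads += 1; // stands in for one real disk read
    return `contents of ${path}`;
  };
  const files = Array.from({ length: fileCount }, (_, i) => `file-${i}`);
  for (let start = 0; start < files.length; start += chunkSize) {
    const chunk = files.slice(start, start + chunkSize);
    const getFileContent = createChunkCache(fakeDisk);
    for (let rule = 0; rule < ruleCount; rule++) {
      for (const file of chunk) {
        await getFileContent(file);
      }
    }
  }
  return diskReads;
}
```

Under these assumptions, `countReads(10000, 5, 100)` resolves to 10,000 rather than 50,000, because every file belongs to exactly one chunk and is read once within it.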

Architecture

The Rule interface remains backwards compatible. Existing rules that don't use getFileContent continue to run unchanged.
Internal rules now check whether getFileContent exists and use it when it does.
The change is transparent to the end user; the only visible difference should be shorter scan times on large codebases.
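The "use it if it exists" check in the internal rules might look roughly like the sketch below. The helper name `readForRule` is hypothetical, and `legacyRead` stands in for the rule's previous fs.readFileSync call; it is injected here only so the sketch runs without touching disk.

```typescript
// Hypothetical shape: only `files` and `getFileContent` are named in the PR.
interface ScanContext {
  files: string[];
  getFileContent?: (path: string) => Promise<string | undefined>;
}

// Prefer the shared cache when the scanner provides one; otherwise fall back
// to the rule's own synchronous read, as before this PR.
async function readForRule(
  context: ScanContext,
  path: string,
  legacyRead: (path: string) => string,
): Promise<string | undefined> {
  if (context.getFileContent) {
    // New path: undefined means "skip this file".
    return context.getFileContent(path);
  }
  // Old path: a context without getFileContent behaves exactly as it used to.
  return legacyRead(path);
}
```

Because getFileContent is optional, a context built by older code still satisfies the type, which is what keeps existing rules working without changes.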

Testing

All 41 automated tests continue to pass.
Validated that rule findings are identical before and after the change (no functional regression).
Binary files and files larger than 1 MB are skipped during the caching phase.
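For illustration, a skip check of this kind could be written as below. The PR does not say how binary files are detected, so the NUL-byte heuristic, the 8000-byte sample size, and the reading of "1MB" as 1024 * 1024 bytes are all assumptions, as are the names `looksBinary` and `shouldCache`.

```typescript
// "1MB" in the PR; the exact byte count is an assumption.
const MAX_CACHED_BYTES = 1024 * 1024;
// Assumed sample size for the binary check.
const SNIFF_BYTES = 8000;

// Heuristic (an assumption, not confirmed by the PR): treat a file as binary
// if a NUL byte appears within its first SNIFF_BYTES bytes.
function looksBinary(bytes: Uint8Array): boolean {
  const end = Math.min(bytes.length, SNIFF_BYTES);
  for (let i = 0; i < end; i++) {
    if (bytes[i] === 0) return true;
  }
  return false;
}

// A file is cached only if it is at most 1 MB and does not look binary.
function shouldCache(bytes: Uint8Array): boolean {
  return bytes.length <= MAX_CACHED_BYTES && !looksBinary(bytes);
}
```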

Risks

Low. Read failures are caught inside the cache, which returns undefined instead of rejecting, so rules skip inaccessible files rather than crashing the whole scan.
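The failure handling described above amounts to a wrapper like this sketch. The name `tolerant` and the `StrictReader` type are hypothetical; the actual cache may structure this differently.

```typescript
type StrictReader = (path: string) => Promise<string>;

// Convert a reader that rejects on failure into one that resolves to
// undefined, so a single unreadable file cannot abort the whole scan.
function tolerant(
  read: StrictReader,
): (path: string) => Promise<string | undefined> {
  return async (path) => {
    try {
      return await read(path);
    } catch {
      // Permission errors, vanished files, and similar: signal "skip".
      return undefined;
    }
  };
}

// Usage with a fake reader that fails for one path.
const fakeRead: StrictReader = async (path) => {
  if (path === "locked.txt") throw new Error("EACCES");
  return `contents of ${path}`;
};
const safeRead = tolerant(fakeRead);
```

One trade-off of this design is that a rule cannot tell a skipped file from an unreadable one, since both surface as undefined.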

Follow-up

Next step: Add more comprehensive test fixtures to fully close out v0.3.0.

perf(scanner): chunk file discovery and cache I/O across rules

956d26e

NhanAZ merged commit b00b819 into main Jun 9, 2026

NhanAZ deleted the agent/performance-optimization branch June 9, 2026 16:21

NhanAZ merged 1 commit into main from agent/performance-optimization
