perf(store): speed up redaction scanning by 0xjunha · Pull Request #155 · 0xjunha/darc

0xjunha · 2026-05-20T15:32:41Z

Summary

This PR speeds up Darc's store redaction path while preserving the existing redacted index output.

The main change is a redaction prefilter and targeted scanner rewrite in darc-store. Instead of sending every text field through every expensive detector, Darc now first checks for cheap literal indicators of redactable content. When there is no candidate signal, it skips the heavier redaction pipeline. Hot regex-based paths for generic secret assignments and base64-like blobs were also replaced with targeted scanners that preserve the previous matching behavior.
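As a rough illustration of the prefilter idea, here is a minimal std-only sketch. It assumes a hypothetical indicator list and a hypothetical `heavy_redact` stand-in for the expensive detectors; neither is taken from darc-store. The PR itself uses aho-corasick, which checks all literals in a single pass, whereas this sketch calls `str::contains` once per literal to stay dependency-free.

```rust
use std::borrow::Cow;

/// Hypothetical indicator literals; the real list in darc-store is not shown here.
/// Matching is case-sensitive to keep the sketch allocation-free.
const INDICATORS: &[&str] = &["password=", "secret", "token"];

/// Cheap prefilter: true if any literal indicator appears in the text.
fn has_candidate(text: &str) -> bool {
    INDICATORS.iter().any(|lit| text.contains(*lit))
}

/// Hypothetical stand-in for the expensive detectors:
/// masks the non-whitespace run that follows each "password=".
fn heavy_redact(text: &str) -> String {
    const KEY: &str = "password=";
    let mut out = String::with_capacity(text.len());
    let mut rest = text;
    while let Some(pos) = rest.find(KEY) {
        let value_start = pos + KEY.len();
        out.push_str(&rest[..value_start]);
        let after = &rest[value_start..];
        let value_len = after.find(char::is_whitespace).unwrap_or(after.len());
        out.push_str("[REDACTED]");
        rest = &after[value_len..];
    }
    out.push_str(rest);
    out
}

/// Skips the heavy pipeline, and any allocation, when no indicator is present.
fn redact(text: &str) -> Cow<'_, str> {
    if !has_candidate(text) {
        return Cow::Borrowed(text);
    }
    Cow::Owned(heavy_redact(text))
}
```

Returning `Cow<'_, str>` lets callers treat both cases uniformly while the common no-candidate case hands back the original slice untouched.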

A benchmark helper script was added to make cold-refresh and index-rebuild timing easier to reproduce, including optional redaction snapshot comparison.

Changes

- Add an aho-corasick-based prefilter for text redaction candidates.
- Avoid allocating replacement strings for no-op redaction cases.
- Replace the broad generic secret assignment regex with a manual scanner.
- Replace base64-like blob regex scanning with a targeted scanner.
- Keep the old generic assignment regex under cfg(test) as an oracle for behavior checks.
- Add tests comparing scanner behavior against the prior regex behavior.
- Add scripts/bench-cold-refresh.sh for repeatable timing and snapshot checks.
- Update CHANGELOG.md under Unreleased.
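To give a feel for what a targeted scanner for base64-like blobs can look like, here is a minimal sketch. The minimum run length (`MIN_BLOB_LEN`) and the accepted alphabet are assumptions made for illustration; the thresholds and character set actually used in darc-store are not stated in this PR description.

```rust
use std::ops::Range;

/// Assumed threshold; the real minimum length in darc-store is not stated here.
const MIN_BLOB_LEN: usize = 32;

/// Assumed alphabet: standard base64 characters plus '=' padding.
fn is_base64_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || matches!(b, b'+' | b'/' | b'=')
}

/// Returns byte ranges of maximal runs of base64-alphabet bytes
/// that are at least MIN_BLOB_LEN bytes long.
fn find_base64_like(text: &str) -> Vec<Range<usize>> {
    let bytes = text.as_bytes();
    let mut spans = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        if !is_base64_byte(bytes[i]) {
            i += 1;
            continue;
        }
        // Start of a run: advance to the first byte outside the alphabet.
        let start = i;
        while i < bytes.len() && is_base64_byte(bytes[i]) {
            i += 1;
        }
        if i - start >= MIN_BLOB_LEN {
            spans.push(start..i);
        }
    }
    spans
}
```

Because only ASCII bytes are matched, every returned range starts and ends on a UTF-8 character boundary, so the ranges can be used to slice the original `&str` directly.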

Correctness

The intended behavior is no regression in redaction output.

Validation performed:

- Compared old and new redacted SQLite outputs. Result: 0 diffs across:
  - sessions
  - turns
  - turn_search
  - tool_calls
  - file_accesses
  - turn_evidence
- Ran the new benchmark helper with snapshot comparison. Result: snapshot_match=yes.
- Added test coverage that keeps the old regex behavior available as a test oracle.
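The oracle approach can be sketched as a differential check: run a simple reference implementation and the hand-written scanner over the same inputs and report the first disagreement. In the PR the oracle is the old regex kept under cfg(test); in this sketch a naive split-based function stands in for it, and the `NAME=VALUE` rule below is a hypothetical simplification, not the detector's real semantics.

```rust
/// Reference ("oracle") implementation: simple and easy to audit.
/// Counts whitespace-separated words of the form NAME=VALUE, where NAME is
/// non-empty ASCII alphanumerics/underscores and VALUE is non-empty.
fn oracle_count_assignments(text: &str) -> usize {
    text.split_whitespace()
        .filter(|word| match word.split_once('=') {
            Some((key, value)) => {
                !key.is_empty()
                    && key.chars().all(|c| c.is_ascii_alphanumeric() || c == '_')
                    && !value.is_empty()
            }
            None => false,
        })
        .count()
}

/// Hand-written scanner: one pass over the characters, no intermediate slices.
fn scan_count_assignments(text: &str) -> usize {
    let mut count = 0;
    let (mut key_len, mut key_ok, mut seen_eq, mut has_value) = (0usize, true, false, false);
    // A trailing space flushes the final word through the same branch.
    for c in text.chars().chain(std::iter::once(' ')) {
        if c.is_whitespace() {
            if seen_eq && key_ok && key_len > 0 && has_value {
                count += 1;
            }
            key_len = 0;
            key_ok = true;
            seen_eq = false;
            has_value = false;
        } else if seen_eq {
            has_value = true;
        } else if c == '=' {
            seen_eq = true;
        } else {
            key_len += 1;
            key_ok &= c.is_ascii_alphanumeric() || c == '_';
        }
    }
    count
}

/// Returns the first input on which the two implementations disagree, if any.
fn first_mismatch<'a>(corpus: &[&'a str]) -> Option<&'a str> {
    corpus
        .iter()
        .copied()
        .find(|s| oracle_count_assignments(s) != scan_count_assignments(s))
}
```

A test would feed `first_mismatch` a corpus of edge cases (empty keys, empty values, repeated `=`, non-ASCII text) and assert that it returns `None`.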

Performance

Observed local results:

Fresh temp-root cold refresh:
- before: 29.34s
- after: 9.59s
Frozen archive index rebuild:
- before: 28.82s
- after: 7.85s
Old regex-oracle rebuild:
- 30.37s

That is roughly a 3.1x speedup on cold refresh and a 3.7x speedup on index rebuild. The gain comes mostly from avoiding expensive regex work on text that has no plausible redaction candidate.

Risk

The main risk is false negatives: redactable content that the old detectors caught but the new prefilter or scanners miss. This PR mitigates that by preserving the old detector semantics through oracle tests and by comparing full redacted index snapshots before and after the change.

The new scanners are intentionally scoped to match the previous behavior rather than expanding the redaction policy in this PR.

0xjunha added 2 commits May 21, 2026 00:28

perf(store): speed up redaction scanning

c101d42

chore(scripts): add cold refresh benchmark helper

0e205ba

0xjunha merged commit f27724f into main May 20, 2026
12 checks passed

0xjunha deleted the perf-redaction branch May 20, 2026 15:37
