
fix: BoundedBacktracker span-based CanHandle + ReplaceAllStringFunc O(n) #128

Merged
kolkov merged 2 commits into main from feature/issue-127-bt-canhandle-fix on Mar 8, 2026


kolkov commented Mar 8, 2026

Summary

  • BoundedBacktracker span-based CanHandle: SearchAtWithState(haystack, at, state) now checks CanHandle(len(haystack) - at) instead of CanHandle(len(haystack)). The visited table is sized for the search span only, with positions stored relative to SpanStart. The full haystack is still available to \b assertions, which need the bytes before the span. This matches Rust regex's Input span model (backtrack.rs line 1848).
  • ReplaceAllStringFunc O(n) performance: replaced O(n²) `result += string` concatenation with strings.Builder. On 150K replacements over a 6MB string, runtime drops from 2m19s to 1.3s.
  • Tests: added TestBoundedBacktracker_SearchAtWithState_SpanBased (span search over a large haystack) and TestBoundedBacktracker_SearchAtWithState_WordBoundary (\b context preservation).
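To make the first bullet concrete, here is a minimal Go sketch of the budget check. It assumes a visited set with one bit per (NFA state, position) pair and a fixed bit budget. `canHandle`, `canHandleSearchAt`, and `visitedBudgetBits` are hypothetical names and the budget value is illustrative; this is not coregex's actual API. The sketch shows why charging the whole haystack rejects a search whose remaining span would fit.

```go
package main

import "fmt"

// visitedBudgetBits is a hypothetical cap on the visited-set size, in bits
// (256 KiB here, chosen only for illustration).
const visitedBudgetBits = 256 * 1024 * 8

// canHandle reports whether a search over spanLen bytes fits the budget,
// assuming one visited bit per (state, position) with positions 0..spanLen inclusive.
func canHandle(numStates, spanLen int) bool {
	return numStates*(spanLen+1) <= visitedBudgetBits
}

// canHandleSearchAt charges only the span [at, len(haystack)] against the
// budget, mirroring the span-based check described in this PR.
func canHandleSearchAt(numStates int, haystack []byte, at int) bool {
	return canHandle(numStates, len(haystack)-at)
}

func main() {
	haystack := make([]byte, 100_000)
	const numStates = 50
	fmt.Println(canHandle(numStates, len(haystack)))            // false: whole haystack charged
	fmt.Println(canHandleSearchAt(numStates, haystack, 70_000)) // true: only 30,000 bytes remain
}
```

With 50 states, the whole-haystack check rejects the 100,000-byte input outright. The span-based check accepts a search starting at offset 70,000, because only the remaining 30,000 bytes need visited bits.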

Fixes #127. Reported by @kostya via #124.

Test plan

  • go test ./... — all 9 packages pass
  • gofmt -l . — no formatting issues
  • Verified LogParser: 13 patterns × 52000 lines (7MB) — all match counts identical to stdlib
  • Verified Template: ReplaceAllStringFunc 150K replacements — results identical, 1.3s vs 2m19s
  • Span-based test: haystack > maxInputSize, search from late position finds matches
  • Word boundary test: \bfoo\b with search starting mid-haystack preserves backward context
  • CI: tests + benchmarks + lint

…(n) performance

SearchAtWithState checked CanHandle(len(haystack)) against the full
haystack, rejecting valid searches even when the remaining span
[at, len(haystack)] fit within the budget.
LogParser on 7MB input returned 22004 matches instead of 33089.

Fix: span-based visited table sizing matching Rust regex's Input span model.
Visited positions stored relative to SpanStart. Full haystack preserved for
zero-width assertions (\b) needing backward context.
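The two halves of this fix can be sketched in Go. First, a visited bitset covers only the span, and absolute positions are translated relative to the span start. Second, \b is evaluated against the full haystack. `spanVisited`, `newSpanVisited`, `insert`, and `isWordBoundary` are hypothetical names, and ASCII-only word bytes are an assumption. coregex's real types and Unicode handling may differ.

```go
package main

import "fmt"

// spanVisited is a hypothetical visited bitset covering only [spanStart, spanEnd].
// Its size depends on the span length, not on the haystack length.
type spanVisited struct {
	bits      []uint64
	spanStart int
	stride    int // positions per NFA state: spanEnd - spanStart + 1
}

func newSpanVisited(numStates, spanStart, spanEnd int) *spanVisited {
	stride := spanEnd - spanStart + 1
	n := numStates * stride
	return &spanVisited{bits: make([]uint64, (n+63)/64), spanStart: spanStart, stride: stride}
}

// insert marks (state, pos) as visited and reports whether it was new.
// pos is an absolute haystack offset, translated to a span-relative index.
func (v *spanVisited) insert(state, pos int) bool {
	i := state*v.stride + (pos - v.spanStart)
	word, bit := i/64, uint64(1)<<(uint(i)%64)
	if v.bits[word]&bit != 0 {
		return false
	}
	v.bits[word] |= bit
	return true
}

// isWordByte treats ASCII letters, digits, and '_' as word bytes (assumption).
func isWordByte(b byte) bool {
	return b == '_' || ('0' <= b && b <= '9') || ('a' <= b && b <= 'z') || ('A' <= b && b <= 'Z')
}

// isWordBoundary evaluates \b at absolute offset pos against the FULL haystack,
// so a search starting mid-haystack still sees the byte before the span.
func isWordBoundary(haystack []byte, pos int) bool {
	before := pos > 0 && isWordByte(haystack[pos-1])
	after := pos < len(haystack) && isWordByte(haystack[pos])
	return before != after
}

func main() {
	haystack := []byte("xfoo foo")
	v := newSpanVisited(3, 1, len(haystack)) // span [1, 8], one 64-bit word suffices
	fmt.Println(v.insert(2, 1))              // true: first visit
	fmt.Println(v.insert(2, 1))              // false: already visited
	// Full-haystack context: 'x' precedes 'f', so offset 1 is not a boundary.
	fmt.Println(isWordBoundary(haystack, 1)) // false
	// Slicing the haystack at the span start loses 'x' and wrongly reports a boundary.
	fmt.Println(isWordBoundary(haystack[1:], 0)) // true
}
```

The last two lines show why the PR keeps the full haystack rather than slicing it. With a sliced input, `\bfoo\b` would wrongly match inside `xfoo`.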

ReplaceAllStringFunc replaced O(n^2) string concatenation with
strings.Builder for O(n) performance (2m19s -> 1.3s on 150K matches).

Fixes #127

codecov bot commented Mar 8, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.



github-actions bot commented Mar 8, 2026

Benchmark Comparison

Comparing main → PR #128

Summary: geomean 121.9n (main) vs 121.8n (PR #128), -0.06%

⚠️ Potential regressions detected:

name                                              main sec/op     PR #128 sec/op   vs base
AhoCorasickLargeInput/coregex_IsMatch_64KB-4      109.4µ ± ∞ ¹    116.2µ ± ∞ ¹     +6.25% (p=0.008 n=5)
MatchAnchoredLiteral/no_match_prefix-4            2.500n ± ∞ ¹    2.508n ± ∞ ¹     +0.32% (p=0.040 n=5)
ASCIIOptimization_Issue79/short_WithASCII-4       276.1n ± ∞ ¹    279.2n ± ∞ ¹     +1.12% (p=0.008 n=5)
ASCIIOptimization_Issue79/medium_WithoutASCII-4   1.646µ ± ∞ ¹    1.692µ ± ∞ ¹     +2.79% (p=0.008 n=5)
ASCIIOptimization_Issue79/long_WithASCII-4        1.101µ ± ∞ ¹    1.202µ ± ∞ ¹     +9.17% (p=0.032 n=5)
ASCIIOptimization_Issue79/long_WithoutASCII-4     2.638µ ± ∞ ¹    2.791µ ± ∞ ¹     +5.80% (p=0.008 n=5)

Full results available in workflow artifacts. CI runners have ~10-20% variance.
For accurate benchmarks, run locally: ./scripts/bench.sh --compare

kolkov merged commit 3f1ca71 into main on Mar 8, 2026
9 checks passed

Closes issue: fix: BoundedBacktracker SearchAtWithState rejects valid searches on large inputs
