Commit 5d43429

Merge pull request #134 from coregx/feature/pikevm-sparse-dispatch
perf: PikeVM sparse-dispatch for dot patterns — 2.8-4.8x speedup (#132)
2 parents 30fb2d9 + 28ded30 commit 5d43429

5 files changed

Lines changed: 213 additions & 36 deletions

CHANGELOG.md

Lines changed: 15 additions & 0 deletions
@@ -12,6 +12,21 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - ARM NEON SIMD support (Go 1.26 `simd/archsimd` intrinsics — [#120](https://github.com/coregx/coregex/issues/120))
 - SIMD prefilter for CompositeSequenceDFA (#83)
 
+## [0.12.7] - 2026-03-10
+
+### Performance
+- **PikeVM sparse-dispatch for `.` patterns** (Issue [#132](https://github.com/coregx/coregex/issues/132)) —
+The NFA compiler generated ~9 split states chaining UTF-8 byte-range alternation
+branches for each `.` (AnyCharNotNL). PikeVM had to DFS-traverse the entire split
+chain at every byte position, resulting in O(branches) work per byte. For `.*?`
+patterns on large inputs (e.g., `\{\{(.*?)\}\}` on a 10MB template), this caused
+~5 billion branch evaluations.
+Fix: new `compileUTF8AnySparse()` compiles `.` as a single sparse state that maps
+each leading byte range directly to its continuation chain — O(1) dispatch instead
+of O(branches) split-chain traversal. Same approach as Rust regex's `State::Sparse`.
+PikeVM speedup: **2.8-4.8x** on dot-heavy patterns. DFA unaffected (uses byte-level NFA).
+Reported by [@kostya](https://github.com/kostya) via LangArena benchmarks.
+
 ## [0.12.6] - 2026-03-08
 
 ### Fixed
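To make the O(branches) versus O(1) contrast in the entry above concrete, here is a minimal Go sketch that models only the per-byte dispatch cost. It assumes the leading-byte ranges that well-formed UTF-8 allows for `.` without newline: ten alternatives, which a chain of nine binary splits would produce (one breakdown consistent with the "~9 split states" figure). Each branch a split chain exposes counts as one test; real PikeVM work per branch (epsilon-closure bookkeeping, thread-list insertion) is heavier than this counter suggests. `leadRange`, `dotLeads`, `splitChainStep`, and `sparseStep` are hypothetical names, not coregex's `compileUTF8AnySparse()`.

```go
package main

import (
	"fmt"
	"sort"
)

// leadRange is one leading-byte alternative for '.' (any char except '\n').
// Hypothetical type for illustration; continuation bytes are not modeled here.
type leadRange struct{ lo, hi byte }

// dotLeads lists the leading-byte ranges of well-formed UTF-8, minus '\n'.
// E0, ED, F0 and F4 are separate entries because their first continuation
// byte has a narrower allowed range than their neighbours.
var dotLeads = []leadRange{
	{0x00, 0x09}, {0x0B, 0x7F}, // ASCII except '\n'
	{0xC2, 0xDF}, // leads of 2-byte sequences
	{0xE0, 0xE0}, {0xE1, 0xEC}, {0xED, 0xED}, {0xEE, 0xEF}, // leads of 3-byte sequences
	{0xF0, 0xF0}, {0xF1, 0xF3}, {0xF4, 0xF4}, // leads of 4-byte sequences
}

// splitChainStep models the split-chain layout: following every split puts
// all branch states on the thread list, so each one is tested against b.
// It returns the matching branch (or -1) and how many branches were tested.
func splitChainStep(b byte) (match, tested int) {
	match = -1
	for i, r := range dotLeads {
		tested++
		if r.lo <= b && b <= r.hi {
			match = i
		}
	}
	return match, tested
}

// sparseStep models a single sparse state: the sorted, disjoint ranges are
// binary-searched, so the cost is bounded by the fixed table size.
func sparseStep(b byte) int {
	i := sort.Search(len(dotLeads), func(i int) bool { return dotLeads[i].hi >= b })
	if i < len(dotLeads) && dotLeads[i].lo <= b {
		return i
	}
	return -1
}

func main() {
	splitTests, sparseLookups := 0, 0
	for v := 0; v < 256; v++ {
		b := byte(v)
		match, tested := splitChainStep(b)
		splitTests += tested
		sparseLookups++
		if sparseStep(b) != match {
			panic(fmt.Sprintf("dispatch mismatch at 0x%02X", b))
		}
	}
	// Prints: split-chain branch tests: 2560, sparse lookups: 256
	fmt.Printf("split-chain branch tests: %d, sparse lookups: %d\n", splitTests, sparseLookups)
}
```

Both layouts pick the same branch for every byte; the difference is that the split chain touches all ten branches per position, while the sparse lookup stays bounded by the table size regardless of input length, which is the sense in which the entry calls it O(1) per byte.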

ROADMAP.md

Lines changed: 10 additions & 6 deletions
@@ -2,7 +2,7 @@
 
 > **Strategic Focus**: Production-grade regex engine with RE2/rust-regex level optimizations
 
-**Last Updated**: 2026-03-08 | **Current Version**: v0.12.6 | **Target**: v1.0.0 stable
+**Last Updated**: 2026-03-10 | **Current Version**: v0.12.7 | **Target**: v1.0.0 stable
 
 ---
 
@@ -12,7 +12,7 @@ Build a **production-ready, high-performance regex engine** for Go that matches
 
 ### Current State vs Target
 
-| Metric | Current (v0.12.6) | Target (v1.0.0) |
+| Metric | Current (v0.12.7) | Target (v1.0.0) |
 |--------|-------------------|-----------------|
 | Inner literal speedup | **280-3154x** | ✅ Achieved |
 | Case-insensitive speedup | **263x** | ✅ Achieved |
@@ -68,7 +68,9 @@ v0.12.4 ✅ → Test coverage 80%+, CI improvements, awesome-go readiness
 
 v0.12.5 ✅ → Non-greedy quantifier fix, ReverseSuffix correctness (#124)
 
-v0.12.6 (Current) ✅ → BoundedBacktracker span-based CanHandle, ReplaceAllStringFunc O(n) (#127)
+v0.12.6 ✅ → BoundedBacktracker span-based CanHandle, ReplaceAllStringFunc O(n) (#127)
+
+v0.12.7 (Current) ✅ → PikeVM sparse-dispatch for dot patterns, 2.8-4.8x speedup (#132)
 
 v1.0.0-rc → Feature freeze, API locked
 
@@ -103,6 +105,7 @@ v1.0.0 STABLE → Production release with API stability guarantee
 -**v0.12.4**: Test coverage 80%+, CI improvements, awesome-go readiness (#123)
 -**v0.12.5**: Non-greedy quantifier fix, ReverseSuffix forward verification (#124)
 -**v0.12.6**: BoundedBacktracker span-based CanHandle, ReplaceAllStringFunc O(n) (#127)
+-**v0.12.7**: PikeVM sparse-dispatch for `.` patterns, 2.8-4.8x speedup (#132)
 
 ---
 
@@ -194,7 +197,7 @@ v1.0.0 STABLE → Production release with API stability guarantee
 
 ## Feature Comparison Matrix
 
-| Feature | RE2 | rust-regex | coregex v0.12.6 | coregex v1.0 |
+| Feature | RE2 | rust-regex | coregex v0.12.7 | coregex v1.0 |
 |---------|-----|------------|-----------------|--------------|
 | Lazy DFA |||||
 | Thompson NFA |||||
@@ -352,7 +355,8 @@ Reference implementations available locally:
 
 | Version | Date | Type | Key Changes |
 |---------|------|------|-------------|
-| **v0.12.6** | 2026-03-08 | Fix | **BoundedBacktracker span-based CanHandle, ReplaceAllStringFunc O(n) (#127)** |
+| **v0.12.7** | 2026-03-10 | Performance | **PikeVM sparse-dispatch for `.` patterns, 2.8-4.8x speedup (#132)** |
+| v0.12.6 | 2026-03-08 | Fix | BoundedBacktracker span-based CanHandle, ReplaceAllStringFunc O(n) (#127) |
 | v0.12.5 | 2026-03-08 | Fix | Non-greedy quantifier fix, ReverseSuffix correctness (#124) |
 | v0.12.4 | 2026-03-01 | Test | Test coverage 80%+, CI improvements, awesome-go readiness |
 | **v0.12.3** | 2026-02-16 | Performance | **Cross-product literal expansion, 110x regexdna speedup (#119)** |
@@ -392,4 +396,4 @@ Reference implementations available locally:
 
 ---
 
-*Current: v0.12.6 | Next: v0.13.0 | Target: v1.0.0*
+*Current: v0.12.7 | Next: v0.13.0 | Target: v1.0.0*

meta/compile.go

Lines changed: 73 additions & 30 deletions
@@ -279,11 +279,12 @@ func buildCharClassSearchers(
 	strategy Strategy,
 	re *syntax.Regexp,
 	nfaEngine *nfa.NFA,
+	btNFA *nfa.NFA, // NFA for BoundedBacktracker (runeNFA when available, else nfaEngine)
 ) charClassSearcherResult {
 	result := charClassSearcherResult{finalStrategy: strategy}
 
 	if strategy == UseBoundedBacktracker {
-		result.boundedBT = nfa.NewBoundedBacktracker(nfaEngine)
+		result.boundedBT = nfa.NewBoundedBacktracker(btNFA)
 	}
 
 	if strategy == UseCharClassSearcher {
@@ -298,7 +299,7 @@ func buildCharClassSearchers(
 		} else {
 			// Fallback to BoundedBacktracker if extraction fails
 			result.finalStrategy = UseBoundedBacktracker
-			result.boundedBT = nfa.NewBoundedBacktracker(nfaEngine)
+			result.boundedBT = nfa.NewBoundedBacktracker(btNFA)
 		}
 	}
 
@@ -309,7 +310,7 @@ func buildCharClassSearchers(
 		if result.compositeSrch == nil {
 			// Fallback to BoundedBacktracker if extraction fails
 			result.finalStrategy = UseBoundedBacktracker
-			result.boundedBT = nfa.NewBoundedBacktracker(nfaEngine)
+			result.boundedBT = nfa.NewBoundedBacktracker(btNFA)
 		} else {
 			// Try to build faster DFA (uses subset construction for overlapping patterns)
 			result.compositeSeqDFA = nfa.NewCompositeSequenceDFA(re)
@@ -334,7 +335,7 @@ func buildCharClassSearchers(
 		if result.branchDispatcher == nil {
 			// Fallback to BoundedBacktracker if dispatch not possible
 			result.finalStrategy = UseBoundedBacktracker
-			result.boundedBT = nfa.NewBoundedBacktracker(nfaEngine)
+			result.boundedBT = nfa.NewBoundedBacktracker(btNFA)
 		}
 	}
 
@@ -343,12 +344,63 @@ func buildCharClassSearchers(
 	// generation-based visited tracking (O(1) reset) vs PikeVM's thread queues.
 	// This is similar to how stdlib uses backtracking for simple patterns.
 	if result.finalStrategy == UseNFA && result.boundedBT == nil && nfaEngine.States() < 50 {
-		result.boundedBT = nfa.NewBoundedBacktracker(nfaEngine)
+		result.boundedBT = nfa.NewBoundedBacktracker(btNFA)
 	}
 
 	return result
 }
 
+// buildDotOptimizedNFAs compiles optimized NFA variants for patterns with '.'.
+// Returns:
+// - asciiNFA: NFA with '.' compiled as single ASCII byte range (for ASCII-only input)
+// - asciiBT: BoundedBacktracker for asciiNFA
+// - runeNFA: NFA with '.' compiled as sparse dispatch (fewer split states for PikeVM)
+func buildDotOptimizedNFAs(
+	re *syntax.Regexp, config Config,
+) (*nfa.NFA, *nfa.BoundedBacktracker, *nfa.NFA) {
+	if !nfa.ContainsDot(re) {
+		return nil, nil, nil
+	}
+
+	// ASCII-only NFA (V11-002 optimization):
+	// compile '.' as single byte range [0x00-0x7F] for ASCII-only inputs.
+	var asciiNFAEngine *nfa.NFA
+	var asciiBT *nfa.BoundedBacktracker
+	if config.EnableASCIIOptimization {
+		asciiCompiler := nfa.NewCompiler(nfa.CompilerConfig{
+			UTF8: true,
+			Anchored: false,
+			DotNewline: false,
+			ASCIIOnly: true,
+			MaxRecursionDepth: config.MaxRecursionDepth,
+		})
+		var err error
+		asciiNFAEngine, err = asciiCompiler.CompileRegexp(re)
+		if err == nil {
+			asciiBT = nfa.NewBoundedBacktracker(asciiNFAEngine)
+		}
+	}
+
+	// Sparse-dispatch NFA: compile '.' as a single sparse state mapping each
+	// leading byte range to the correct continuation chain. This eliminates
+	// ~9 split states per dot, giving PikeVM O(1) dispatch instead of
+	// O(branches) split-chain DFS. Measured 2.8-4.8x PikeVM speedup.
+	var runeNFAEngine *nfa.NFA
+	runeCompiler := nfa.NewCompiler(nfa.CompilerConfig{
+		UTF8: true,
+		Anchored: false,
+		DotNewline: false,
+		UseRuneStates: true,
+		MaxRecursionDepth: config.MaxRecursionDepth,
+	})
+	runeNFAEngine, err := runeCompiler.CompileRegexp(re)
+	if err != nil {
+		runeNFAEngine = nil
+	}
+
+	return asciiNFAEngine, asciiBT, runeNFAEngine
+}
+
 // CompileRegexp compiles a parsed syntax.Regexp with default configuration.
 //
 // This is useful when you already have a parsed regexp from another source.
@@ -373,25 +425,8 @@ func CompileRegexp(re *syntax.Regexp, config Config) (*Engine, error) {
 		}
 	}
 
-	// Compile ASCII-only NFA for patterns with '.' (V11-002 optimization).
-	// This enables runtime ASCII detection: if input is all ASCII, use the faster
-	// ASCII NFA which has ~2.8x fewer states for '.'-heavy patterns.
-	var asciiNFAEngine *nfa.NFA
-	var asciiBT *nfa.BoundedBacktracker
-	if nfa.ContainsDot(re) && config.EnableASCIIOptimization {
-		asciiCompiler := nfa.NewCompiler(nfa.CompilerConfig{
-			UTF8: true,
-			Anchored: false,
-			DotNewline: false,
-			ASCIIOnly: true, // Key: compile '.' as single byte range
-			MaxRecursionDepth: config.MaxRecursionDepth,
-		})
-		asciiNFAEngine, err = asciiCompiler.CompileRegexp(re)
-		if err == nil {
-			asciiBT = nfa.NewBoundedBacktracker(asciiNFAEngine)
-		}
-		// If ASCII NFA compilation fails, we fall back to UTF-8 NFA (asciiNFAEngine stays nil)
-	}
+	// Compile optimized NFA variants for patterns with '.'
+	asciiNFAEngine, asciiBT, runeNFAEngine := buildDotOptimizedNFAs(re, config)
 
 	// Extract literals for prefiltering
 	// NOTE: Don't build prefilter for start-anchored patterns (^...).
@@ -418,8 +453,14 @@ func CompileRegexp(re *syntax.Regexp, config Config) (*Engine, error) {
 	// Select strategy (pass re for anchor detection)
 	strategy := SelectStrategy(nfaEngine, re, literals, config)
 
-	// Build PikeVM (always needed for fallback)
-	pikevm := nfa.NewPikeVM(nfaEngine)
+	// Build PikeVM (always needed for fallback).
+	// Use runeNFA when available — sparse dispatch replaces ~9 split states
+	// with a single sparse state, giving PikeVM O(1) byte dispatch per '.'.
+	pikevmNFA := nfaEngine
+	if runeNFAEngine != nil {
+		pikevmNFA = runeNFAEngine
+	}
+	pikevm := nfa.NewPikeVM(pikevmNFA)
 
 	// Build OnePass DFA for anchored patterns with captures (optional optimization)
 	onePassRes := buildOnePassDFA(re, nfaEngine, config)
@@ -428,8 +469,9 @@ func CompileRegexp(re *syntax.Regexp, config Config) (*Engine, error) {
 	engines := buildStrategyEngines(strategy, re, nfaEngine, literals, pf, config)
 	strategy = engines.finalStrategy
 
-	// Build specialized searchers for character class patterns
-	charClassResult := buildCharClassSearchers(strategy, re, nfaEngine)
+	// Build specialized searchers for character class patterns.
+	// Pass pikevmNFA so BoundedBacktrackers benefit from rune states.
+	charClassResult := buildCharClassSearchers(strategy, re, nfaEngine, pikevmNFA)
 	strategy = charClassResult.finalStrategy
 
 	// Check if pattern can match empty string.
@@ -497,7 +539,7 @@ func CompileRegexp(re *syntax.Regexp, config Config) (*Engine, error) {
 		// Fallback if detection fails (shouldn't happen since SelectStrategy checked)
 		if anchoredLiteralInfo == nil {
 			strategy = UseBoundedBacktracker
-			charClassResult.boundedBT = nfa.NewBoundedBacktracker(nfaEngine)
+			charClassResult.boundedBT = nfa.NewBoundedBacktracker(pikevmNFA)
 		}
 	}
 
@@ -506,6 +548,7 @@ func CompileRegexp(re *syntax.Regexp, config Config) (*Engine, error) {
 
 	return &Engine{
 		nfa: nfaEngine,
+		runeNFA: runeNFAEngine,
 		asciiNFA: asciiNFAEngine,
 		asciiBoundedBacktracker: asciiBT,
 		dfa: engines.dfa,
@@ -534,7 +577,7 @@ func CompileRegexp(re *syntax.Regexp, config Config) (*Engine, error) {
 		canMatchEmpty: canMatchEmpty,
 		isStartAnchored: isStartAnchored,
 		fatTeddyFallback: fatTeddyFallback,
-		statePool: newSearchStatePool(nfaEngine, numCaptures),
+		statePool: newSearchStatePool(pikevmNFA, numCaptures),
 		stats: Stats{},
 	}, nil
 }

meta/engine.go

Lines changed: 11 additions & 0 deletions
@@ -50,6 +50,17 @@ type Engine struct {
 
 	nfa *nfa.NFA
 
+	// runeNFA is an NFA compiled with UseRuneStates=true (sparse dispatch).
+	// Instead of ~9 split states chaining UTF-8 byte-range alternation branches
+	// for each '.', it uses a single sparse state that maps each leading byte
+	// range directly to its continuation chain. This gives PikeVM O(1) dispatch
+	// instead of O(branches) split-chain DFS per byte position.
+	// Measured 2.8-4.8x PikeVM speedup on dot-heavy patterns.
+	//
+	// This field is nil if the pattern doesn't contain '.' (no benefit).
+	// The byte-level NFA (nfa field) remains unchanged for DFA/strategy use.
+	runeNFA *nfa.NFA
+
 	// asciiNFA is an NFA compiled in ASCII-only mode (V11-002 optimization).
 	// When the pattern contains '.' and input is ASCII-only (all bytes < 0x80),
 	// this NFA is used instead of the main NFA. ASCII mode compiles '.' to
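After this commit the engine carries up to three NFAs for a dot pattern: `nfa` stays byte-level for the DFA and strategy selection, `runeNFA` feeds the PikeVM (and, through `pikevmNFA` in `CompileRegexp`, the BoundedBacktrackers and the search-state pool), and `asciiNFA` targets ASCII-only input. The Go sketch below models that selection with stand-in types. The fallback in `pikeVMNFA` mirrors the `pikevmNFA` logic shown in `CompileRegexp`; the runtime ASCII check in `nfaForInput`, and the assumption that it takes precedence over `runeNFA`, are illustrative only and not taken from coregex. `NFA`, `engineNFAs`, `isASCII`, `pikeVMNFA`, and `nfaForInput` are hypothetical names.

```go
package main

import "fmt"

// NFA is a stand-in for *nfa.NFA; only a label is kept for illustration.
type NFA struct{ name string }

// engineNFAs mirrors the three NFA fields described on Engine.
// Hypothetical type, not coregex's API.
type engineNFAs struct {
	byteNFA  *NFA // always present: used by the DFA and strategy selection
	runeNFA  *NFA // nil if the pattern has no '.' or rune compilation failed
	asciiNFA *NFA // nil unless the pattern has '.' and the ASCII optimization is enabled
}

// isASCII reports whether every byte of input is below 0x80.
func isASCII(input []byte) bool {
	for _, b := range input {
		if b >= 0x80 {
			return false
		}
	}
	return true
}

// pikeVMNFA mirrors the compile-time choice: prefer the sparse-dispatch
// runeNFA, otherwise fall back to the byte-level NFA.
func (e engineNFAs) pikeVMNFA() *NFA {
	if e.runeNFA != nil {
		return e.runeNFA
	}
	return e.byteNFA
}

// nfaForInput layers an ASCII-only fast path on top of pikeVMNFA.
// ASSUMPTION: this precedence order is illustrative, not coregex's dispatch.
func (e engineNFAs) nfaForInput(input []byte) *NFA {
	if e.asciiNFA != nil && isASCII(input) {
		return e.asciiNFA
	}
	return e.pikeVMNFA()
}

func main() {
	e := engineNFAs{byteNFA: &NFA{"byte"}, runeNFA: &NFA{"rune"}, asciiNFA: &NFA{"ascii"}}
	for _, in := range []string{"{{name}}", "{{naïve}}"} {
		fmt.Printf("%q -> %s\n", in, e.nfaForInput([]byte(in)).name)
	}
}
```

Keeping `byteNFA` mandatory and treating the other two as optional matches how `buildDotOptimizedNFAs` returns nil for a variant it cannot build, so every fallback path still ends at a valid NFA.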
