[pull] main from VirusTotal:main#236

Merged
pull[bot] merged 1 commit into threatcode:main from VirusTotal:main on Jun 5, 2026

@pull pull Bot commented Jun 5, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

This PR optimizes the atom extraction process to prevent generating an excessive number of atoms for patterns with optional suffixes. It also fixes a bug where exact prefix atoms could prematurely match and bypass the regex VM, leading to incorrect (non-greedy) match results.

Summary of Changes
Modified concat_seq in lib/src/re/thompson/compiler.rs to identify and ignore optional sequences at the tail of the concatenation when extracting candidate atoms. For example, in (com|net)[^{}]{0,100}, the [^{}]{0,100} suffix is optional and can be empty. We now stop concatenating before this suffix, avoiding the generation of a massive number of atoms.
If any tail sequences are skipped, the resulting atoms are correctly marked as inexact to force the regex VM to run and verify the optional suffix.
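
The tail-skipping rule above can be sketched in isolation. This is a minimal illustration, not the actual `concat_seq` code: the `Piece` and `Atom` types and the `concat_atoms` function are hypothetical stand-ins, and each piece of the concatenation is assumed to be already reduced to a list of literal alternatives plus a flag saying whether it can match the empty string.

```rust
/// Hypothetical, simplified stand-in for one piece of a concatenation.
#[derive(Debug, Clone)]
struct Piece {
    /// Literal alternatives this piece can match, e.g. ["com", "net"].
    alternatives: Vec<Vec<u8>>,
    /// True if the piece can match the empty string, e.g. `[^{}]{0,100}`.
    can_be_empty: bool,
}

/// Hypothetical atom type: candidate bytes plus an exactness flag.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Atom {
    bytes: Vec<u8>,
    /// An exact atom is a full match on its own; an inexact atom
    /// must still be verified by the regex VM.
    exact: bool,
}

/// Builds candidate atoms for a concatenation, ignoring optional pieces
/// at the tail instead of expanding them.
fn concat_atoms(pieces: &[Piece]) -> Vec<Atom> {
    // Drop trailing pieces that can match the empty string.
    let mut end = pieces.len();
    while end > 0 && pieces[end - 1].can_be_empty {
        end -= 1;
    }
    let skipped_tail = end < pieces.len();

    // Cross product of the alternatives of the remaining pieces.
    let mut acc: Vec<Vec<u8>> = vec![Vec::new()];
    for piece in &pieces[..end] {
        let mut next = Vec::with_capacity(acc.len() * piece.alternatives.len());
        for prefix in &acc {
            for alt in &piece.alternatives {
                let mut bytes = prefix.clone();
                bytes.extend_from_slice(alt);
                next.push(bytes);
            }
        }
        acc = next;
    }

    // If any tail piece was skipped, the atoms no longer cover the whole
    // pattern, so they are marked inexact to force verification by the VM.
    acc.into_iter()
        .map(|bytes| Atom { bytes, exact: !skipped_tail })
        .collect()
}
```

Under these assumptions, `(com|net)[^{}]{0,100}` yields only the two inexact atoms `com` and `net`, rather than one atom for every possible expansion of the suffix.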

If a shorter atom is a prefix of a longer atom (e.g., ab and abc), the shorter one is made inexact. This forces the engine to run the regex VM, which guarantees greedy matching (e.g., for /a(bb|b)b/ it matches abbb instead of stopping at abb).

Since the shorter inexact atom triggers the VM, which follows the longer path anyway, the longer atom (e.g., abc) is redundant and is removed from the set.

If two atoms have the same bytes but different exactness (one exact and one inexact), the exact one is removed.
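
The prefix rules above can be sketched as a single pass over a sorted atom list. This is an illustration under assumptions, not the code from the PR: the `Atom` type and the `dedup_prefixes` function are hypothetical names, and the sort-based approach is one possible way to implement the described behavior.

```rust
/// Hypothetical atom type: candidate bytes plus an exactness flag.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Atom {
    bytes: Vec<u8>,
    exact: bool,
}

/// Applies the prefix rules: a shorter atom that is a prefix of a longer
/// one becomes inexact and the longer one is dropped; for identical bytes
/// with mixed exactness, only the inexact atom survives.
fn dedup_prefixes(mut atoms: Vec<Atom>) -> Vec<Atom> {
    // Lexicographic order places a prefix right before the atoms it
    // prefixes. For equal bytes, inexact (`false`) sorts before exact.
    atoms.sort_by(|a, b| a.bytes.cmp(&b.bytes).then(a.exact.cmp(&b.exact)));

    let mut kept: Vec<Atom> = Vec::new();
    for atom in atoms {
        if let Some(last) = kept.last_mut() {
            if atom.bytes.starts_with(&last.bytes) {
                // `last` is a strict prefix of `atom`: it can no longer
                // be trusted as a full match, so the VM must run.
                if atom.bytes.len() > last.bytes.len() {
                    last.exact = false;
                }
                // The longer (or duplicate) atom is redundant.
                continue;
            }
        }
        kept.push(atom);
    }
    kept
}
```

For /a(bb|b)b/, the candidate atoms `abbb` and `abb` collapse to a single inexact `abb`, so the VM always decides the final match length.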
@pull pull Bot locked and limited conversation to collaborators Jun 5, 2026
@pull pull Bot added the ⤵️ pull label Jun 5, 2026
@pull pull Bot merged commit f301811 into threatcode:main Jun 5, 2026