[pull] main from VirusTotal:main#236
Merged
This PR optimizes the atom extraction process to prevent generating an excessive number of atoms for patterns with optional suffixes. It also fixes a bug where exact prefix atoms could prematurely match and bypass the regex VM, leading to incorrect (non-greedy) match results.
Summary of Changes
Modified concat_seq in lib/src/re/thompson/compiler.rs to identify and ignore optional sequences at the tail of the concatenation when extracting candidate atoms. For example, in (com|net)[^{}]{0,100}, the [^{}]{0,100} suffix is optional and can be empty. We now stop concatenating before this suffix, avoiding the generation of a massive number of atoms.
If any tail sequences are skipped, the resulting atoms are correctly marked as inexact to force the regex VM to run and verify the optional suffix.
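The tail-trimming rule can be illustrated with a minimal Rust sketch. This is not the code in compiler.rs. It assumes a simplified model in which each element of a concatenation is a finite set of literal alternatives (the hypothetical `Seq` alias), and an element counts as optional when one of its alternatives is the empty string. `Atom`, `is_optional` and `concat_seq_sketch` are illustrative names only. A real class such as [^{}]{0,100} is not enumerated like this; the example cuts it down to two bytes and at most one repetition.

```rust
/// Hypothetical candidate atom: literal bytes plus whether a hit on these
/// bytes is already a complete match (exact) or must be verified by the VM.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Atom {
    bytes: Vec<u8>,
    exact: bool,
}

/// Hypothetical model of one concatenation element: the literal strings it
/// can match. An element that can match nothing (e.g. `x{0,n}`) contains `b""`.
type Seq = Vec<Vec<u8>>;

fn is_optional(seq: &Seq) -> bool {
    seq.iter().any(|alt| alt.is_empty())
}

/// Sketch: concatenate `seqs` into atoms, but stop before the run of optional
/// elements at the tail. If anything was skipped, every atom becomes inexact
/// so that the regex VM still runs and checks the dropped suffix.
fn concat_seq_sketch(seqs: &[Seq]) -> Vec<Atom> {
    // One past the last element that is NOT optional; everything after it is
    // an optional tail and is ignored for atom extraction.
    let end = seqs
        .iter()
        .rposition(|s| !is_optional(s))
        .map_or(0, |i| i + 1);
    let skipped_tail = end < seqs.len();

    // Cross product of the kept elements, starting from the empty literal.
    let mut acc: Vec<Vec<u8>> = vec![Vec::new()];
    for seq in &seqs[..end] {
        let mut next = Vec::with_capacity(acc.len() * seq.len());
        for prefix in &acc {
            for alt in seq {
                let mut lit = prefix.clone();
                lit.extend_from_slice(alt);
                next.push(lit);
            }
        }
        acc = next;
    }

    acc.into_iter()
        .map(|bytes| Atom { bytes, exact: !skipped_tail })
        .collect()
}

fn main() {
    // Rough stand-in for `(com|net)[^{}]{0,1}` with the class reduced to {a, b}.
    let seqs: Vec<Seq> = vec![
        vec![b"com".to_vec(), b"net".to_vec()],
        vec![b"".to_vec(), b"a".to_vec(), b"b".to_vec()],
    ];
    for a in concat_seq_sketch(&seqs) {
        println!("{:?} exact={}", String::from_utf8_lossy(&a.bytes), a.exact);
    }
}
```

With the trim, this input yields just two inexact atoms, com and net. Without it, the cross product would produce six (com, coma, comb, net, neta, netb), and with a full byte class repeated up to 100 times, each extra position multiplies the count again. That multiplication is the blow-up the change avoids.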
If a shorter atom is a prefix of a longer one (e.g., ab and abc), the shorter atom is made inexact. This forces the engine to run the regex VM so that matching stays greedy: for /a(bb|b)b/, the VM matches abbb instead of stopping at abb. Because a hit on the shorter inexact atom already triggers the VM, which will find the longer match anyway, the longer atom (abc in the example above) is redundant and is removed from the set entirely. If two atoms have the same bytes but differ in exactness (one exact, one inexact), the exact one is removed.
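The prefix rule can be sketched as a single pass over a sorted atom list. This is a minimal illustration under assumptions, not the yara-x implementation: `Atom` and `minimize_atoms` are hypothetical names. Sorting lexicographically places every atom that extends a given prefix in a contiguous run directly after that prefix, so comparing each atom with the last kept one is enough. Ties on equal bytes sort the inexact copy first, so the exact duplicate is the one discarded. Collapsing two identical exact atoms into one exact atom is an assumption the description does not spell out.

```rust
/// Hypothetical candidate atom (same shape as in the previous sketch).
#[derive(Debug, Clone, PartialEq, Eq)]
struct Atom {
    bytes: Vec<u8>,
    exact: bool,
}

/// Sketch of the prefix rule: if atom A is a strict prefix of atom B, A becomes
/// inexact and B is dropped; for equal bytes, the exact copy is dropped.
fn minimize_atoms(mut atoms: Vec<Atom>) -> Vec<Atom> {
    // Lexicographic by bytes; for equal bytes `false < true`, so inexact first.
    atoms.sort_by(|a, b| a.bytes.cmp(&b.bytes).then(a.exact.cmp(&b.exact)));

    let mut out: Vec<Atom> = Vec::with_capacity(atoms.len());
    for atom in atoms {
        if let Some(last) = out.last_mut() {
            if atom.bytes == last.bytes {
                // Same bytes: `last` is the inexact copy if one exists, so
                // skipping `atom` removes the exact duplicate.
                continue;
            }
            if atom.bytes.starts_with(&last.bytes) {
                // `last` is a strict prefix of `atom`: a hit on `last` must now
                // run the VM, which also covers `atom`, so `atom` is redundant.
                last.exact = false;
                continue;
            }
        }
        out.push(atom);
    }
    out
}

fn main() {
    // Atoms for /a(bb|b)b/, both initially exact.
    let atoms = vec![
        Atom { bytes: b"abbb".to_vec(), exact: true },
        Atom { bytes: b"abb".to_vec(), exact: true },
    ];
    println!("{:?}", minimize_atoms(atoms));
}
```

For /a(bb|b)b/ this leaves a single inexact abb atom, so a hit on abb hands control to the VM, which can then return the greedy abbb match.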
See Commits and Changes for more details.
Created by pull[bot] (v2.0.0-alpha.4)