Can some linear-time regex engines be considered harmful? A runtime analysis of linear-time regex engines in the context of production software systems.
Linear-time Regex engines are considered the gold standard for reducing the risk of Regular Expression Denial of Service (ReDoS) attacks. However, engines that operate in linear-time can in theory still cause harm to software systems if the coefficient of the linear runtime is large enough. We investigate if any linear-time Regex engines found in either literature or libraries can be considered harmful in the context of production software systems, by causing a large enough stall in runtime.
Important
ReDoS Vulnerability Found! Use the following one-liner to run it if you have uv installed.
This code should timeout, as it tries to compute an exponentially large Regex.
uv run --with pyre2==0.3.10 python -c "import re2 as pyre2; pyre2.match('^(?=(a+)+b)\\w+$', 'a' * 50)"You can also run run_pyre2_timeout_simple.py to see proof of concept.
- Python code to run a regex pattern with many different libraries
- Code to interpret the runtime output
- A list of datasets for ReDoS
- include Python libraries
- include JavaScript / TypeScript libraries
- include Go libraries
- include Rust libraries
- include Re# and Dotnet library
- Vary input size and not just input pattern
- Make table of Regex libraries
- Collect more regex patterns from literature
- Draft up poster for initial review
- Make ReDoS test cases JSON
- Run scaling on tests in
./python/run_pyre2_timeout10_large.pyto check for exponential behavior
| Name | Language | Claimed to be linear |
|---|---|---|
| Re | Python | No |
| Dotnet Regex | C# | No |
| Regex | Python | Reduces backtracking chance but no guarantee |
| Rure | Python | Yes "guarantees linear time" |
| Pyre2 | Python | Yes "guarantees linear-time behavior" |
| RE# | C# | Yes "the main matching algorithm has input-linear complexity both in theory as well as experimentally" |
| Regolith | JavaScript | Yes "guarantees linear time" |
| RegExp | Go | Yes "guaranteed to run in time linear" |
| Regex | Rust | Yes "worst time O(m*nt)" |
These libraries were picked after I searched for "linear time regex library python". Re2 was removed from the test because it could not be installed. Similarly, Regexy was archived and out of date, so it too was excluded.
I use Python's default "re" library as a control even though it does not claim to be linear time.
Test 1 and 2 were done in just Python
- Test 1 -- Scaling Test
- Test 2 -- Preliminary Results
- Test 3 -- Dotnet & RE# Test
- Test 4 -- Check Python, TypeScript (bun runtime), and C# (.NET)
Each Regex pattern was run with an input size of 0 to 30 on all 4 of the tested Regex libraries. Each line represents a different Regex library, the y axis represents time on a log scale with a hard timeout at 2 seconds. The regex patterns where created by asking Claude Sonnet 4.5 for regex patterns that may lead to catastrophic backtracking.
Here is an example of one of the tests where both Regex and Re can be considered harmful.
Here is a list of each test run that links to its corresponding graph.
- Nested quantifiers (
^(a+)+$) - Nested quantifiers with Kleene star (
^(a*)*$) - Nested quantifiers with mismatch (
^(a+)+b$) - Alternation with overlapping patterns (
^(a|a)*$) - Alternation with prefix overlap (
^(a|ab)*$) - Multiple alternations (
(a|a|a|a|a|b)*c) - Triple nested groups (
^((a+)+)+$) - Nested Kleene star with suffix (
^(a*)*b$) - Nested plus with suffix (
^(a+)*b$) - Email-like pattern (ReDoS)
- Overlapping character classes lowercase (
^([a-z]+)+[A-Z]$) - Overlapping character classes alphanumeric (
^([0-9a-z]+)+[A-Z]$) - Wildcard nested quantifiers (
^(.*)*$) - Wildcard plus nested (
^(.+)+$) - Wildcard with suffix (
^(.*)+b$) - Multiple overlapping quantifiers (
^(a*)+b$) - Optional nested quantifiers (
^(a?)+b$) - Non-greedy nested quantifiers (
^(a*?)*b$) - Word boundary catastrophic (
^(\\w+\\s*)+$) - Word with spaces pattern (
^([\\w]+[\\s]*)*$) - Digit nested plus (
^(\\d+)+$) - Digit nested star (
^([0-9]+)*$) - Complex alternation plus (
^(a+|a+)+$) - Complex alternation star (
^(a*|a*)*$) - Alternation with length variation (
^(aa+|a+)+$) - URL pattern (simplified)
- Whitespace with letters (
^(\\s*a+\\s*)+$) - Whitespace alternation (
^(\\s+|a+)*b$) - Optional group patterns (
^(a+)?b?(a+)?$) - Optional with nested groups (
^(a+b?)+c$) - Character class repetition (
^([a-zA-Z]+)*$) - Alphanumeric with symbol (
^([a-z0-9]+)+[!]$) - Nested alternation simple (
^((a|b)+)+c$) - Nested alternation overlap (
^((a|ab)+)+c$) - Long repeating with suffix (
^(a+b)+c$) - Repeating pattern variation (
^(ab+)+c$)
| Name | Language | Claimed to be linear | Found to be harmful | Quantity of harmful results (out of 36) |
|---|---|---|---|---|
| Re | Python | No | Yes | 25 |
| Rure | Python | Yes "guarantees linear time" | No | 0 |
| Regex | Python | Reduces backtracking chance but no guarantee | Yes | 1 |
| Pyre2 | Python | Yes "guarantees linear-time behavior" | No | 0 |
This was the first test I ran where each pattern was run with a single input size. These results are preliminary and were to test if I was using a reasonable method for running regex patterns.
We run Program.cs with dotnet run. This tests runs 113 tests in both the RE# library and the default Dotnet Regex library. The RE# library has zero cases that can be considered harmful, but 75 cases that can be conspired harmful. Those results are expected, as the Dotnet Regex library does not claim to be linear-time and RE# does claim to be linear.
Included are the full results.
I standardized the tests into a JSON file called test_cases.json and changed how test cases are handled in Python, TS, and C# to use this test case file. I ran each language on these test cases and to get the results py_redos_test_results.json, ts_redos_test_results.json, csharp_redos_test_results.json. I then created results_table.py that produced a few graphs and tables.
A few takeaways:
- C# Regex is very vulnerable to ReDoS compared to the other languages, failing in 40 test cases for each 3 of the runs
- We did not find evidence that any library that claimed to be linear-time can be considered harmful
I had an issue installing https://pypi.org/project/re2.
I found a pull request from one of the authors of resharp where they optimize the dotnet regex library dotnet/runtime#102655
The source code for resharp has been moved or removed https://github.com/ieviev/resharp
You can install it from the library website https://www.nuget.org/packages/Resharp

