Skip to content

Reuse the precompiled regex for collection variable selectors#3573

Open
chweidling wants to merge 5 commits into
owasp-modsecurity:v3/masterfrom
chweidling:perf/precompiled-regex-collection-selector
Open

Reuse the precompiled regex for collection variable selectors#3573
chweidling wants to merge 5 commits into
owasp-modsecurity:v3/masterfrom
chweidling:perf/precompiled-regex-collection-selector

Conversation

@chweidling

@chweidling chweidling commented Jun 10, 2026

Copy link
Copy Markdown

what

  • Add a fast-path InMemoryPerProcess::resolveRegularExpression(const Utils::Regex *) overload that takes an already-compiled regex.
  • Tx_DictElementRegexp (the TX:/regex/ variable selector) now reuses its pre-compiled m_r: it dynamic_casts the TX collection to InMemoryPerProcess and calls the fast path if possible.
  • Adds a regression test for TX:/regex/ selection (positive + negative cases).

why

  • The collection backends' resolveRegularExpression(const std::string&, ...) built a fresh Utils::Regex — i.e. a pcre2_compile() and pcre2_jit_compile() — on every call.
  • For a regex variable selector such as TX:/regex/ that is evaluated per transaction (e.g. CRS 921180), this recompiled the same pattern on every request, even though the calling VariableRegex already holds it compiled once at configuration time in m_r. That compiled regex was simply being ignored.
  • Behaviour is unchanged: m_r is constructed with the same arguments — Utils::Regex(pattern, /*ignoreCase=*/true) — that the backend used, so the identical regex is applied; it is just compiled once instead of per transaction.
  • Measured (built against v3/master, gcc -O2, system PCRE2 with JIT):
    • Microbenchmark of the resolution call: reusing the compiled regex vs recompiling is ~1.6× faster for a short selector, and the saving grows with pattern length (it is the eliminated compile + JIT).
  • No behaviour change: the full regression suite passes (634 passed, 80 skipped — unchanged), including the new test.

references

chweidling and others added 2 commits June 10, 2026 11:27
resolveRegularExpression() in the collection backends compiled (and
JIT-compiled) a fresh Utils::Regex from the pattern string on every
call. For a regex variable selector such as TX:/regex/ that is evaluated
per transaction, this recompiled the same pattern on every request -
even though the calling VariableRegex already holds it compiled once at
configuration time in its m_r member.

Add a Collection::resolveRegularExpression(Utils::Regex *) overload that
accepts the pre-compiled regex. The base class keeps the previous
behaviour by default (it compiles from r->pattern and delegates), so
backends that do not override it are unaffected; InMemoryPerProcess
overrides it to scan the collection with the supplied regex directly.
Tx_DictElementRegexp now passes its already-compiled &m_r instead of the
pattern string.

Behaviour is unchanged: m_r is constructed with the same arguments
(Utils::Regex(pattern, /*ignoreCase=*/true)) the backend used, so the
identical regex is applied - it is just compiled once instead of per
transaction. A regression test covering TX:/regex/ selection is added.
@chweidling

Copy link
Copy Markdown
Author

The failing cppcheck check is unrelated to this PR. On a clean v3/master
checkout with no changes, make check-static already fails under cppcheck
2.21.0 (and passes under 2.20.0) — the job installs cppcheck unpinned, so the
analyzer version drifts. All 31 findings are in files this PR doesn't touch.

Filed as #3574.

All other CI jobs (Linux x32/x64 gcc+clang, all configs, macOS) pass.

@airween

airween commented Jun 16, 2026

Copy link
Copy Markdown
Member

Hi @chweidling,

could you pick these changes or update your branch?

Thanks!

@airween airween added the 3.x Related to ModSecurity version 3.x label Jun 16, 2026
@airween airween requested a review from Copilot June 16, 2026 18:54

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR optimizes TX:/regex/ collection variable selection by reusing the regex compiled at configuration time (in VariableRegex) instead of recompiling (and JIT-compiling) the same pattern on every transaction. It introduces a new Collection::resolveRegularExpression(Utils::Regex*) overload with a default fallback to the existing string-based method, and adds a regression test covering chained-rule behavior and non-matching key exclusion.

Changes:

  • Add a Collection::resolveRegularExpression(Utils::Regex*) overload (default implementation delegates to the string overload).
  • Update TX:/regex/ selection to pass the precompiled m_r, and implement a backend override in InMemoryPerProcess to use it directly.
  • Add regression coverage for precompiled TX:/regex/ selection (match and non-match cases).

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
headers/modsecurity/collection/collection.h Adds new public virtual overload accepting a compiled regex.
src/collection/collection.cc Implements the default fallback behavior for the new overload.
src/collection/backend/in_memory-per_process.h Declares backend override for compiled-regex resolution.
src/collection/backend/in_memory-per_process.cc Implements compiled-regex resolution and delegates string overload to it.
src/variables/tx.h Switches TX regex selector to pass the precompiled regex instead of the pattern string.
src/Makefile.am Adds the new collection.cc to the build sources.
test/test-cases/regression/variable-tx-regex-precompiled.json Adds regression tests validating the TX regex selector behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +75 to +84
/*
* Resolve using a regular expression that was already compiled (e.g. the
* one held by a VariableRegex). This avoids recompiling - and JIT'ing - the
* same pattern on every transaction. The default implementation delegates
* to the string overload using the pattern text, so backends that do not
* override it keep their previous behaviour.
*/
virtual void resolveRegularExpression(Utils::Regex *r,
std::vector<const VariableValue *> *l,
variables::KeyExclusions &ke);

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — this is accurate. Adding a virtual to Collection grows the
vtable, so a Collection backend built out-of-tree against an older
collection.h would be binary-incompatible with a newer libmodsecurity.

A few points on how I'd weigh it:

  • I kept it virtual on purpose. m_collections.m_tx_collection is typed as
    Collection*, so dispatching through the base class is the natural fit. The
    alternatives suggested here - a separate dynamic_cast-able interface, or a
    non-virtual helper keyed on the concrete type - add indirection at the call
    site for what is otherwise a drop-in. The method is also non-pure (it ships
    a default that delegates to the existing string overload), so neither in-tree
    nor out-of-tree backends are required to implement anything.
  • libmodsecurity is already SONAME-versioned via libtool -version-info
    (MSC_VERSION_INFO = current:revision:age, derived from the release version),
    so the mechanism to flag an ABI change is in place. The remaining question is
    a release-policy one rather than a code one.

@airween - this is your call: are you comfortable shipping this as an
ABI-affecting change with the corresponding version-info bump, or would you
prefer I keep the base vtable untouched and confine the optimization to the
concrete InMemoryPerProcess path instead? I can also move the new declaration
to the end of the class to minimize in-tree vtable churn if that helps. Happy
to take whichever direction you prefer.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recently I've fixed something where the situation was the same like this. Then I solved it with the suggested method and created a helper.

Actually, I've been working on V4, where the collections will be replaced, because there are several issues. Those changes completely brakes the current API (this is why we need to increase the version).

If you deal with it, you can wait with this PR for V4. Otherwise, it would be nice to avoid to break the ABI.

Note: there will be one more release from v3.0 at least, because we have some security issues, so if you solve this, then it can be a part of the current code. I hope I can do that at the end of this month at last.

(Expected release of V4 is in autumn, or later...)

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reworked to avoid the ABI break. The introduced virtual methode in collection.h is removed, and the optimization is in InMemoryPerProcess only using a dynamic_cast.

Comment thread headers/modsecurity/collection/collection.h Outdated
Comment thread src/collection/collection.cc Outdated
chweidling and others added 3 commits June 17, 2026 16:17
Move the precompiled-regex fast path off the public Collection base class
(its vtable is unchanged) and onto the concrete `InMemoryPerProcess` backend.
`Tx_DictElementRegexp` reaches it via a guarded `dynamic_cast`, falling back to
the string-based resolution for any other backend. Drops the previously
added Collection `virtual` and `src/collection/collection.cc`.
@sonarqubecloud

Copy link
Copy Markdown

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

Comment thread src/variables/tx.h
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

3.x Related to ModSecurity version 3.x

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants