Fix: Chunked regex with overlapping Matcher for boundary-safe data masking #959

Open
anthonygiuliano wants to merge 1 commit into jongpie:main from anthonygiuliano:fix/639-overlapping-chunk-data-masking

@anthonygiuliano

Summary

Fixes #639, where applyDataMaskRules throws System.LimitException: Regex too complicated when processing long strings (~35K+ characters).

The existing chunking logic in applyDataMaskRuleToLongLine had three correctness issues that this PR addresses:

  • Longest-match deduplication: When two overlapping chunks find a match at the same start position, the old code kept whichever was found first, which could be a shorter match from a chunk that ends partway through the matched text. The new code keeps the longest match, because the chunk with more trailing context captures the complete match.
  • Sorted left-to-right processing: The old code applied matches in the order they were discovered across chunks, which is not guaranteed to be left to right. The new code explicitly sorts match start positions and skips any match that begins inside text already consumed by an earlier replacement.
  • Single-pass expandReplacement: The old code expanded the replacement with iterative String.replace('$' + i, groups[i]) calls, which reinterpreted $N patterns inside captured group values (e.g., if group 1 captured PRICE:$3, the $3 was then replaced with group 3's value). The new implementation scans the replacement template once, left to right, and interprets $N only in the template itself, matching the behavior of Java's Matcher.appendReplacement.
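The first two fixes (longest-match deduplication and sorted, skip-aware replacement) can be illustrated with a small Java sketch; Java is used because Apex's Pattern and Matcher closely mirror Java's regex API. This is not the PR's Apex code: the class, the maskLongLine method, and its chunkSize/overlap parameters are hypothetical, a single literal mask stands in for a data mask rule's replacement, a TreeMap stands in for the explicit sort of start positions, and the pattern is assumed never to match the empty string (Java 11+ for String.repeat).

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ChunkedMaskSketch {

    // Hypothetical helper: masks every match of `pattern` in `input` by running the
    // regex over overlapping chunks instead of the whole string. Any match no longer
    // than `overlap` characters lies entirely inside at least one chunk.
    // Assumes `pattern` never matches the empty string.
    static String maskLongLine(String input, Pattern pattern, String mask, int chunkSize, int overlap) {
        int step = chunkSize - overlap;
        if (overlap < 0 || step <= 0) {
            throw new IllegalArgumentException("need 0 <= overlap < chunkSize");
        }

        // Absolute start offset -> end offset of the longest match found at that start.
        // TreeMap keeps keys sorted, standing in for an explicit sort of start positions.
        TreeMap<Integer, Integer> longestByStart = new TreeMap<>();
        for (int chunkStart = 0; chunkStart < input.length(); chunkStart += step) {
            int chunkEnd = Math.min(chunkStart + chunkSize, input.length());
            Matcher m = pattern.matcher(input.substring(chunkStart, chunkEnd));
            while (m.find()) {
                // Longest-match deduplication: a chunk that ends mid-match yields a
                // shorter (truncated) match, so keep the longest end seen per start.
                longestByStart.merge(chunkStart + m.start(), chunkStart + m.end(), Integer::max);
            }
            if (chunkEnd == input.length()) {
                break; // this chunk already reaches the end of the input
            }
        }

        // Apply replacements strictly left to right.
        StringBuilder out = new StringBuilder();
        int pos = 0; // index of the first character not yet copied or replaced
        for (Map.Entry<Integer, Integer> match : longestByStart.entrySet()) {
            int start = match.getKey();
            if (start < pos) {
                continue; // starts inside text an earlier replacement already consumed
            }
            out.append(input, pos, start).append(mask);
            pos = match.getValue();
        }
        out.append(input, pos, input.length());
        return out.toString();
    }

    public static void main(String[] args) {
        Pattern ssn = Pattern.compile("\\d{3}-\\d{2}-\\d{4}");
        String text = "a".repeat(18) + "123-45-6789" + "b".repeat(10);
        // prints 18 'a's, then XXX-XX-XXXX, then 10 'b's
        System.out.println(maskLongLine(text, ssn, "XXX-XX-XXXX", 20, 12));

        Pattern card = Pattern.compile("\\d{13,16}");
        String cardText = "xxxxx" + "4111111111111111" + "yyyyy";
        System.out.println(maskLongLine(cardText, card, "[CC]", 20, 16)); // prints xxxxx[CC]yyyyy
    }
}
```

In the card example, the chunk starting at offset 0 sees only 15 of the 16 digits, so keeping the first-found match would mask those 15 and leak the last digit. Keeping the longest match at offset 5 masks all 16, and the shorter match the third chunk finds at offset 8 is skipped because it begins inside the span already replaced.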
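The $N re-expansion problem can be reproduced in a short Java sketch that contrasts the iterative String.replace approach with a single left-to-right pass. The class and method names are hypothetical, and this is a simplification rather than the PR's Apex implementation: only $N group references are handled (no backslash escapes or named groups), and a $ not followed by a digit, or a reference to a nonexistent group, is copied literally, whereas Java's Matcher.appendReplacement would throw. Multi-digit references follow Java's documented rule of extending the group number only while it still names an existing group.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplacementSketch {

    // Collects groups 0..groupCount; groups that did not participate become "".
    static String[] groupsOf(Matcher m) {
        String[] groups = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            String value = m.group(i);
            groups[i] = value == null ? "" : value;
        }
        return groups;
    }

    // Old approach as described in the PR: repeated replace of "$" + i with groups[i].
    // Text inserted for an earlier group is rescanned by later iterations.
    static String expandIteratively(String template, String[] groups) {
        String result = template;
        for (int i = 0; i < groups.length; i++) {
            result = result.replace("$" + i, groups[i]);
        }
        return result;
    }

    static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Single pass: only $N tokens in the template are expanded; group values are
    // appended verbatim and never rescanned.
    static String expandSinglePass(String template, String[] groups) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < template.length()) {
            char c = template.charAt(i);
            if (c != '$' || i + 1 >= template.length() || !isAsciiDigit(template.charAt(i + 1))) {
                out.append(c);
                i++;
                continue;
            }
            int j = i + 1;
            int group = template.charAt(j++) - '0';
            // Extend to more digits only while the number still names an existing group.
            while (j < template.length() && isAsciiDigit(template.charAt(j))) {
                int next = group * 10 + (template.charAt(j) - '0');
                if (next >= groups.length) {
                    break;
                }
                group = next;
                j++;
            }
            if (group < groups.length) {
                out.append(groups[group]);
            } else {
                out.append(template, i, j); // unknown group: keep the token literally
            }
            i = j;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Matcher m = Pattern.compile("(\\S+) (\\S+) (\\S+)").matcher("PRICE:$3 foo bar");
        if (!m.find()) {
            throw new IllegalStateException("expected a match");
        }
        String[] groups = groupsOf(m);
        System.out.println(expandIteratively("[$1]", groups)); // prints [PRICE:bar]
        System.out.println(expandSinglePass("[$1]", groups));  // prints [PRICE:$3]
    }
}
```

The iterative version first turns [$1] into [PRICE:$3], then its i = 3 iteration rewrites the captured $3 into bar. The single pass appends each group value once and moves on, which gives the same [PRICE:$3] result as Java's own Matcher.replaceFirst("[$1]") on this input.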

Test plan

  • 13 new targeted tests covering:
    • SSN in the overlap zone (no double-masking)
    • Multiple SSNs near the same chunk boundary
    • SSN at start/end of long strings
    • SSN exactly at the chunk step position
    • Credit card straddling a chunk boundary
    • Exact chunk-size and chunk-size+1 boundary conditions
    • Multiline where one line is exactly chunk size
    • Longer match wins when two chunks find same start
    • Left-to-right ordering across chunks (Map.keySet() has no guaranteed order in Apex)
    • Overlapping match skip (start < pos branch)
    • $N tokens inside captured group values are preserved literally (not re-expanded as group references)

🤖 Generated with Claude Code

Development

Successfully merging this pull request may close these issues.

System.LimitException: Regex too complicated from LogEntryEventBuilder.applyDataMaskRules: