fix: add missing capture groups to multilingual named_months regex patterns #228
Open
Whishp wants to merge 2 commits into
…tterns
Russian and Italian named_months regex patterns used non-capturing groups
(?:...) only, leaving zero capture groups in the pattern. The code at
line 3562 unconditionally calls m.group(1), which raises IndexError
('no such group') when the pattern has no capture groups.
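The failure mode can be reproduced in isolation. The sketch below uses a hypothetical, heavily shortened stand-in for the Russian pattern (the real pattern in the PR is not shown here) and compares it with the wrapped version:

```python
import re

# Hypothetical, simplified stand-ins for the Russian named_months pattern.
# BROKEN uses only non-capturing groups; FIXED wraps it in one capturing group.
BROKEN = re.compile(r'(?:\d{1,2}\s+(?:января|июня)\s+\d{4})')
FIXED = re.compile(r'((?:\d{1,2}\s+(?:января|июня)\s+\d{4}))')

text = 'встреча 2 июня 2026'

m = BROKEN.search(text)  # the pattern matches, but defines no groups
try:
    m.group(1)
    raised = False
    message = ''
except IndexError as exc:
    raised = True
    message = str(exc)

# With one outer capturing group, group(1) returns the whole date.
fixed_date = FIXED.search(text).group(1)
```

Note that the match itself succeeds in both cases; only the group lookup differs, which is why the bug shows up only on inputs that actually contain a date.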
Spanish named_months had 3 capture groups but the code only reads
m.group(1), producing broken named_date entries.
Affected languages:
- Russian (ru): zero groups -> 1 group
- Italian (it): zero groups -> 1 group
- Spanish (es): 3 groups -> 1 group (consolidated)
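For the Spanish case, one plausible reading is that m.group(1) returned only the first of the three captured pieces. The sketch below illustrates that reading with hypothetical, shortened patterns; the actual patterns and the exact shape of the broken named_date entries are assumptions:

```python
import re

# Hypothetical, simplified Spanish patterns (not the ones from the PR).
# THREE_GROUPS captures day, month, and year separately.
THREE_GROUPS = re.compile(r'(\d{1,2}) de (enero|febrero) de (\d{4})')
# ONE_GROUP keeps the same structure but captures the whole date once.
ONE_GROUP = re.compile(r'((?:\d{1,2}) de (?:enero|febrero) de (?:\d{4}))')

text = 'nos vemos el 15 de enero de 2024'

# Code that reads only group(1) sees just the day with the old pattern...
partial = THREE_GROUPS.search(text).group(1)
# ...and the full date with the consolidated pattern.
full = ONE_GROUP.search(text).group(1)
```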
Also added try/except guards around all three extract_and_store_facts()
calls in remember() dedup path, remember() new-row path, and
remember_batch() so that any future regex extraction errors never block
memory storage.
This fix is critical for multilingual users: any Russian, Italian, or
Spanish text containing dates (e.g. '2 июня 2026', 'Gennaio 15, 2024',
'15 de enero de 2024') would silently crash memory writes. The try/except
defense ensures that even if other patterns have the same bug, memory storage
continues to work.
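The guard described above can be sketched as follows. Everything here is a stand-in: the function bodies, the logger name, and the order of storage and extraction are assumptions, and only the names remember() and extract_and_store_facts() come from the commit message:

```python
import logging
import re

logger = logging.getLogger("mnemosyne.sketch")  # hypothetical logger name


def extract_and_store_facts(text, facts):
    """Stand-in extractor with the zero-capture-group bug left in on purpose."""
    m = re.search(r'(?:\d{4})', text)
    if m:
        facts.append(m.group(1))  # raises IndexError: no such group


def remember(text, store, facts):
    """Store the memory; treat fact extraction as best-effort."""
    store.append(text)
    try:
        extract_and_store_facts(text, facts)
    except Exception:
        # Extraction failed, but the memory row is already stored.
        logger.warning("fact extraction failed; memory kept", exc_info=True)
    return len(store) - 1


store, facts = [], []
row_id = remember('año 2024', store, facts)
```

Catching a broad Exception is a deliberate trade-off in this sketch: it hides extractor bugs from callers, so logging the traceback is what keeps them discoverable.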
Two more patterns had 0 capture groups despite code using .group(1):
- ru.negation: wrapped non-capturing (?:...) in capturing ((?:...))
- it.negation: same fix
Full automated verification: all 35 patterns (5 langs x 7 types) now have
>=1 capture group. entity patterns have 2.
Problem
mnemosyne_remember raises IndexError: no such group when the input contains Russian, Italian, or Spanish date strings or Russian/Italian negation sentences. This is triggered inside extract_and_store_facts(), where the code unconditionally calls m.group(1) on pattern matches that have zero capture groups.
Root cause
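named_months patterns (original fix)
- ru.named_months: r'(?:...)... has 0 capture groups, so m.group(1) → IndexError
- it.named_months: r'(?:...)... has 0 capture groups, so m.group(1) → IndexError
English and German named_months patterns were correct (1 capture group).
negation patterns (added in v2)
- ru.negation: r'(?:...)... has 0 capture groups, so m.group(1) → IndexError
- it.negation: r'(?:...)... has 0 capture groups, so m.group(1) → IndexError
English, German, and Spanish negation patterns already had >=1 capture group.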
Fix
named_months
- ru.named_months: wrapped (?:...) → ((?:...)), now has 1 capture group
- it.named_months: same fix
- es.named_months: consolidated all 3 groups into 1 outer group
negation
- ru.negation: wrapped (?:...) → ((?:...)), now has 1 capture group
- it.negation: same fix
Defense-in-depth
- Wrapped all three extract_and_store_facts() calls (remember() dedup path, remember() new-row path, remember_batch()) in try/except so regex extraction failures never block memory storage
Impact
Any Russian, Italian, or Spanish text containing date references, or Russian/Italian negation phrases, causes remember() to crash. The try/except guard prevents similar undiscovered pattern issues from affecting memory writes.
Full verification
All 35 patterns (5 languages × 7 types: named_months, negation, decision, entity, sequence, instruction, preference) were programmatically verified:
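A check of this kind can be sketched with the groups attribute of a compiled pattern, which reports how many capturing groups it defines. The registry below is a hypothetical three-entry stand-in, not the project's actual 35 patterns, and the helper name is made up for illustration:

```python
import re

# Hypothetical registry keyed by (language, pattern type); the pattern
# strings are simplified stand-ins, not the ones from the PR.
PATTERNS = {
    ('ru', 'negation'): r'((?:не|никогда)\s+\w+)',
    ('it', 'negation'): r'((?:non|mai)\s+\w+)',
    ('en', 'entity'): r'(\w+) is (?:a|an) (\w+)',
}


def patterns_missing_groups(patterns, minimum=1):
    """Return the keys whose pattern has fewer than `minimum` capturing groups."""
    return [
        key
        for key, source in sorted(patterns.items())
        if re.compile(source).groups < minimum
    ]


missing = patterns_missing_groups(PATTERNS)
# A pattern built only from non-capturing groups is flagged.
flagged = patterns_missing_groups({('xx', 'named_months'): r'(?:a|b)'})
```

Running such a check in the test suite would catch a new zero-group pattern before it reaches m.group(1) at runtime.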
Statistics
mnemosyne/core/beam.py)