fix: add missing capture groups to multilingual named_months regex patterns #228

Open
Whishp wants to merge 2 commits into AxDSan:main from Whishp:fix/named-months-missing-capture-groups
@Whishp Whishp commented Jun 2, 2026

Problem

mnemosyne_remember raises IndexError: no such group when the input contains Russian or Italian date strings or negation sentences. The error is raised inside extract_and_store_facts(), which unconditionally calls m.group(1) on matches from patterns that have zero capture groups. Spanish date strings do not raise, but the Spanish pattern has 3 capture groups and only m.group(1) is read, which produces broken named_date entries.
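The failure mode can be reproduced in isolation. The following is a minimal sketch, not the code from beam.py: `ZERO_GROUP_PATTERN` is a hypothetical, heavily shortened stand-in for a pattern built only from non-capturing groups, and `first_group_or_error` is an illustrative helper that mimics the unconditional `m.group(1)` call.

```python
import re

# Hypothetical, simplified stand-in for a named_months pattern that uses
# only non-capturing groups (the real patterns in beam.py are longer).
ZERO_GROUP_PATTERN = re.compile(
    r'(?:gennaio|febbraio|marzo)\s+\d{1,2},\s+\d{4}',
    re.IGNORECASE,
)


def first_group_or_error(pattern, text):
    """Mimic an unconditional m.group(1) call and report what happens."""
    m = pattern.search(text)
    if m is None:
        return "no match"
    try:
        return m.group(1)
    except IndexError as exc:
        # Raised because the pattern defines no capture group 1.
        return f"IndexError: {exc}"


result = first_group_or_error(ZERO_GROUP_PATTERN, "Gennaio 15, 2024")
print(result)  # IndexError: no such group
```

The pattern matches the text, so the error comes from the group lookup and not from a failed match.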

Root cause

named_months patterns (original fix)

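| Language | Old pattern | Groups | Problem |
|---|---|---|---|
| Russian (ru) | `r'(?:...)...` | 0 | m.group(1) → IndexError |
| Italian (it) | `r'(?:...)...` | 0 | m.group(1) → IndexError |
| Spanish (es) | `r'(\d{1,2})...de...(enero ...)...(?:de(\d{4}))?'` | 3 | only m.group(1) is read → broken named_date entries |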

English and German named_months patterns were correct (1 capture group).
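The Spanish case fails differently from the Russian and Italian ones, which a small sketch can illustrate. `SPANISH_OLD` below is a hypothetical simplification with the same three-group shape as the pattern in the table (the month list and separators are assumptions, since the original is elided).

```python
import re

# Hypothetical simplified pattern with three capture groups:
# day, month name, optional year.
SPANISH_OLD = re.compile(
    r'(\d{1,2})\s+de\s+(enero|febrero|marzo)(?:\s+de\s+(\d{4}))?',
    re.IGNORECASE,
)

m = SPANISH_OLD.search("15 de enero de 2024")

group_count = SPANISH_OLD.groups  # number of capture groups in the pattern
first_group = m.group(1)          # what a caller reading only group(1) gets
whole_match = m.group(0)          # the full date string

print(group_count)  # 3
print(first_group)  # 15
print(whole_match)  # 15 de enero de 2024
```

No exception is raised here, but a caller that reads only `m.group(1)` receives the day number instead of the full date.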

negation patterns (added in v2)

| Language | Old pattern | Groups | Problem |
|---|---|---|---|
| Russian (ru) | `r'(?:...)...` | 0 | m.group(1) → IndexError |
| Italian (it) | `r'(?:...)...` | 0 | m.group(1) → IndexError |

English, German, and Spanish negation patterns already had >=1 capture group.

Fix

named_months

  1. Russian named_months: wrapped the non-capturing (?:...) in a capturing ((?:...)); now has 1 capture group
  2. Italian named_months: same fix
  3. Spanish named_months: consolidated all 3 groups into 1 outer group
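The two kinds of change listed above can be sketched as follows. All patterns here are hypothetical simplifications (shortened month lists, assumed separators), not the ones in beam.py, and `group_one` is an illustrative helper.

```python
import re

# Hypothetical simplified Russian pattern with zero capture groups.
OLD_RU = r'(?:\d{1,2}\s+(?:января|февраля|июня)(?:\s+\d{4})?)'
# Fix: wrap the whole expression in one outer capturing group.
NEW_RU = '(' + OLD_RU + ')'

# Hypothetical simplified Spanish pattern with three capture groups.
OLD_ES = r'(\d{1,2})\s+de\s+(enero|febrero|marzo)(?:\s+de\s+(\d{4}))?'
# Fix: make the inner groups non-capturing and add one outer group.
NEW_ES = r'((?:\d{1,2})\s+de\s+(?:enero|febrero|marzo)(?:\s+de\s+(?:\d{4}))?)'


def group_one(pattern, text):
    """Return group(1) of the first match, or None if nothing matches."""
    m = re.search(pattern, text, re.IGNORECASE)
    return m.group(1) if m else None


ru_date = group_one(NEW_RU, "встреча 2 июня 2026")
es_date = group_one(NEW_ES, "reunión el 15 de enero de 2024")

print(ru_date)  # 2 июня 2026
print(es_date)  # 15 de enero de 2024
```

With a single outer group, `m.group(1)` returns the full date string in both languages, so the calling code does not need to change.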

negation

  1. Russian negation: wrapped the non-capturing (?:...) in a capturing ((?:...)); now has 1 capture group
  2. Italian negation: same fix

Defense-in-depth

  1. Wrapped all three extract_and_store_facts() calls (remember() dedup path, remember() new-row path, remember_batch()) in try/except so regex extraction failures never block memory storage
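A guard of this kind might look like the sketch below. It is an assumption-laden illustration, not the code in the PR: `store_with_guarded_extraction`, `fake_store`, and `broken_extract` are hypothetical names, and storing before extracting is an assumed ordering.

```python
import logging

logger = logging.getLogger(__name__)


def store_with_guarded_extraction(store_memory, extract_facts, content):
    """Store the memory, then attempt fact extraction without letting
    an extraction error propagate to the caller."""
    memory_id = store_memory(content)
    try:
        extract_facts(memory_id, content)
    except Exception:
        # Log with traceback and continue: the memory is already stored.
        logger.exception("fact extraction failed for memory %s", memory_id)
    return memory_id


# Hypothetical stand-ins for the storage and extraction steps.
stored = []


def fake_store(content):
    stored.append(content)
    return len(stored)


def broken_extract(memory_id, content):
    raise IndexError("no such group")


memory_id = store_with_guarded_extraction(fake_store, broken_extract, "2 июня 2026")
print(memory_id)  # 1
```

Logging the exception instead of swallowing it silently keeps undiscovered pattern bugs visible while still letting the write succeed.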

Impact

Any Russian or Italian text containing date references or negation phrases causes remember() to crash, and Spanish date references are stored as broken named_date entries. The try/except guard prevents similar undiscovered pattern issues from blocking memory writes.

Full verification

All 35 patterns (5 languages × 7 types: named_months, negation, decision, entity, sequence, instruction, preference) were programmatically verified:

  • All have >=1 capture group
  • entity patterns have exactly 2 (used with m.group(1) + m.group(2))
  • Runtime tested with real text samples for all languages
  • All compiled without regex syntax errors
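A check along these lines can be written with the `groups` attribute of a compiled pattern. This is a sketch under stated assumptions: the `PATTERNS` table is a hypothetical two-language excerpt (with one deliberately broken entry), and `find_group_count_problems` is an illustrative helper, not the script used for this PR.

```python
import re

# Hypothetical excerpt of a {language: {type: pattern}} table.
PATTERNS = {
    "en": {
        "named_months": r'((?:january|february)\s+\d{1,2})',
        "entity": r'(\w+)\s+is\s+(\w+)',
    },
    "it": {
        "named_months": r'(?:gennaio|febbraio)\s+\d{1,2}',  # 0 groups: a bug
        "entity": r'(\w+)\s+è\s+(\w+)',
    },
}


def find_group_count_problems(patterns, exact=None, minimum=1):
    """Return (language, type, group_count) for every pattern whose number
    of capture groups does not meet the requirement for its type."""
    exact = exact or {}
    problems = []
    for lang, by_type in patterns.items():
        for kind, pattern in by_type.items():
            # re.compile raises re.error on a syntax error, so a clean run
            # also confirms that every pattern compiles.
            count = re.compile(pattern).groups
            required = exact.get(kind)
            if required is not None:
                if count != required:
                    problems.append((lang, kind, count))
            elif count < minimum:
                problems.append((lang, kind, count))
    return problems


problems = find_group_count_problems(PATTERNS, exact={"entity": 2})
print(problems)  # [('it', 'named_months', 0)]
```

Running a check like this in the test suite would catch a zero-group pattern before it reaches a `m.group(1)` call at runtime.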

Statistics

  • Files changed: 1 (mnemosyne/core/beam.py)
  • Additions: +18, Deletions: -6

…tterns

Russian and Italian named_months regex patterns used non-capturing groups
(?:...) only, leaving zero capture groups in the pattern. The code at
line 3562 unconditionally calls m.group(1), which raises IndexError
('no such group') when the pattern has no capture groups.

Spanish named_months had 3 capture groups but the code only reads
m.group(1), producing broken named_date entries.

Affected languages:
- Russian (ru): zero groups  -> 1 group
- Italian (it): zero groups  -> 1 group
- Spanish (es): 3 groups     -> 1 group (consolidated)

Also added try/except guards around all three extract_and_store_facts()
calls in remember() dedup path, remember() new-row path, and
remember_batch() so that any future regex extraction errors never block
memory storage.

This fix is critical for multilingual users: Russian or Italian text
containing dates (e.g. '2 июня 2026', 'Gennaio 15, 2024') would crash
memory writes with IndexError, and Spanish dates (e.g.
'15 de enero de 2024') would produce broken named_date entries. The
try/except defense ensures that even if other patterns have the same
bug, memory storage continues to work.
AxDSan added the bug and core labels Jun 2, 2026
AxDSan self-assigned this Jun 2, 2026
Two more patterns had 0 capture groups despite code using .group(1):

- ru.negation: wrapped non-capturing (?:...) in capturing ((?:...))
- it.negation: same fix

Full automated verification: all 35 patterns (5 langs x 7 types)
now have >=1 capture group. entity patterns have 2.