fix: remove diacritic maps for languages with distinct alphabet letters #176
Commit 574ab2f bulk-added diacritic_maps to 43 languages, but several had characters that are distinct letters (not accent variants):
- Finnish ä/ö/å, Swedish ä/ö/å, Danish å/æ/ø are the 27th-29th letters
- Turkish ç/ğ/ı/ö/ş/ü, Polish ą/ć/ę/ł etc. are separate alphabet entries
- This caused wrong tile colors (e.g., ö showing yellow when answer has o)

Fix: reviewed all 43 languages individually:
- 17 languages: removed entire diacritic_map (all chars distinct)
- 9 languages: removed on-keyboard distinct chars, kept off-keyboard accent variants (e.g., Czech: removed háčky, kept long vowels)
- 17 languages: kept map (chars are genuine accent variants)

Added automated test: any diacritic_map char that has its own keyboard key triggers a test failure unless the language is explicitly allowlisted.

Fixes #175
Fixes #175
Root cause
Commit 574ab2f bulk-added `diacritic_map` to 43 languages for accent-insensitive input. However, several languages had characters that are distinct alphabet letters, not accent variants. This caused the color algorithm to wrongly treat them as equivalent (e.g., Finnish `ö` showing yellow when the answer has `o`).
Fix
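Reviewed all 43 languages individually and made per-language decisions:
- 17 languages: removed the entire `diacritic_map` (all characters are distinct letters)
- 9 languages: removed on-keyboard distinct characters, kept off-keyboard accent variants (e.g., Czech: removed háčky, kept long vowels)
- 17 languages: kept the map (characters are genuine accent variants)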
Decision rule: If a character has its own key on the keyboard, it's a distinct letter and must not be normalized. Exception: allowlisted languages where this is intentional (e.g., German treats ö as a variant of o).
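To make the effect of the rule concrete, here is a minimal sketch of a Wordle-style coloring pass with an optional diacritic map. The function names (`normalize`, `color_tiles`) and the `{accented: base}` map shape are assumptions for illustration, not the project's actual code.

```python
from collections import Counter


def normalize(word, diacritic_map):
    """Replace each character by its mapped base letter, if it has one."""
    return "".join(diacritic_map.get(ch, ch) for ch in word)


def color_tiles(guess, answer, diacritic_map=None):
    """Return 'green', 'yellow', or 'gray' for each guess position.

    Hypothetical helper: assumes guess and answer have equal length and
    that diacritic_map maps an accented character to its base letter.
    """
    dmap = diacritic_map or {}
    g, a = normalize(guess, dmap), normalize(answer, dmap)
    colors = ["gray"] * len(g)
    remaining = Counter()
    # First pass: mark exact matches, count unmatched answer letters.
    for i, (gc, ac) in enumerate(zip(g, a)):
        if gc == ac:
            colors[i] = "green"
        else:
            remaining[ac] += 1
    # Second pass: a letter present elsewhere in the answer turns yellow.
    for i, gc in enumerate(g):
        if colors[i] != "green" and remaining[gc] > 0:
            colors[i] = "yellow"
            remaining[gc] -= 1
    return colors


# With ö mapped to o, the guess "öljy" gets a yellow tile against "koti".
print(color_tiles("öljy", "koti", {"ö": "o"}))
# Without the map, ö and o are independent and no tile is colored.
print(color_tiles("öljy", "koti"))
```

Under these assumptions, dropping the `ö` entry from a language's map is enough to stop the cross-coloring, which is why the fix is a data change per language and not a change to the coloring logic.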
Automated test
Added `test_diacritic_maps.py`: any `diacritic_map` character that also appears as a keyboard key triggers a failure unless the language is in an explicit allowlist with justification.
Test plan
- `uv run pytest tests/test_diacritic_maps.py`: 14 passed, 27 skipped (allowlisted)
- `ö` and `o` should be independent (no cross-coloring)
- `ö` and `o` should still be treated as equivalent (allowlisted)
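The automated check described under "Automated test" could look roughly like the sketch below. The data layout (`LANGUAGES`, `ALLOWLIST`), the sample keyboards and maps, and the helper names are all hypothetical; the actual `test_diacritic_maps.py` may be structured differently.

```python
# Hypothetical sample data; the real per-language config layout may differ.
LANGUAGES = {
    "fi": {
        "keyboard": "qwertyuiopåasdfghjklöäzxcvbnm",
        "diacritic_map": {"ö": "o", "ä": "a"},
    },
    "de": {
        "keyboard": "qwertzuiopüasdfghjklöäyxcvbnm",
        "diacritic_map": {"ö": "o", "ä": "a", "ü": "u"},
    },
    "fr": {
        "keyboard": "azertyuiopqsdfghjklmwxcvbn",
        "diacritic_map": {"é": "e", "è": "e"},
    },
}

# Each allowlisted language carries a written justification.
ALLOWLIST = {"de": "umlauts are intentionally treated as variants of the base vowel"}


def find_conflicts(config):
    """Return mapped characters that also have their own keyboard key."""
    return sorted(set(config["diacritic_map"]) & set(config["keyboard"]))


def check_language(lang, config, allowlist):
    """Return ('skipped' | 'passed' | 'failed', conflicting characters)."""
    if lang in allowlist:
        return "skipped", []
    conflicts = find_conflicts(config)
    return ("failed", conflicts) if conflicts else ("passed", [])


for lang, config in LANGUAGES.items():
    status, conflicts = check_language(lang, config, ALLOWLIST)
    print(lang, status, "".join(conflicts))
```

In this sketch the sample Finnish entry fails because `ä` and `ö` are both mapped and present on the keyboard, the German entry is skipped through the allowlist, and the French entry passes because its mapped characters have no dedicated keys.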