fix: remove diacritic maps for languages with distinct alphabet letters#176

Merged
Hugo0 merged 2 commits into main from fix/diacritic-maps-distinct-letters on Mar 16, 2026

Hugo0 (Owner) commented Mar 15, 2026

Fixes #175

Root cause

Commit 574ab2f bulk-added diacritic_map to 43 languages for accent-insensitive input. However, in several languages the map included characters that are distinct alphabet letters, not accent variants. The color algorithm then treated those letters as equivalent to their base letters (e.g., in Finnish, ö showed yellow when the answer contained o).
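The sketch below illustrates the failure mode, assuming the game normalizes both guess and answer through diacritic_map before running a standard two-pass Wordle coloring. The map shape (base letter to a list of variants), the function names normalize and color_guess, and the contents of BUGGY_FI_MAP are hypothetical and not taken from the repository.

```python
from collections import Counter

# Hypothetical map shape: base letter -> accent variants.
# Illustrative pre-fix Finnish map: ö, ä and å wrongly listed as variants.
BUGGY_FI_MAP = {"o": ["ö"], "a": ["ä", "å"]}


def normalize(word: str, diacritic_map: dict[str, list[str]]) -> str:
    """Replace every mapped variant with its base letter."""
    reverse = {v: base for base, variants in diacritic_map.items() for v in variants}
    return "".join(reverse.get(ch, ch) for ch in word)


def color_guess(guess: str, answer: str, diacritic_map: dict[str, list[str]]) -> list[str]:
    """Two-pass Wordle coloring on normalized strings: greens first, then yellows."""
    g = normalize(guess, diacritic_map)
    a = normalize(answer, diacritic_map)
    colors = ["gray"] * len(g)
    remaining = Counter()  # answer letters not already matched as green
    for i, (gc, ac) in enumerate(zip(g, a)):
        if gc == ac:
            colors[i] = "green"
        else:
            remaining[ac] += 1
    for i, gc in enumerate(g):
        if colors[i] != "green" and remaining[gc] > 0:
            colors[i] = "yellow"
            remaining[gc] -= 1
    return colors


buggy = color_guess("pöytä", "ruoka", BUGGY_FI_MAP)
fixed = color_guess("pöytä", "ruoka", {})
```

With the buggy map, the ö tile turns yellow because the answer contains o, and ä turns green against a; with no map, all five tiles are gray.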

Fix

Reviewed all 43 languages individually and made per-language decisions:

| Action | Count | Languages |
|---|---|---|
| Removed entire map | 17 | az, bg, da, et, hr, lt, ltg, lv, mk, mn, ro, sl, sq, sv, tk, tr, uk |
| Removed on-keyboard distinct chars, kept off-keyboard accent variants | 9 | cs, fi, fo, hu, is, lb, pl, ru, sk |
| Kept map (chars are genuine accent variants) | 17 | br, ckb, eo, eu, fa, fur, fy, ga, gd, hi, ie, mi, nds, oc, qya, tl, ur |

Decision rule: If a character has its own key on the keyboard, it's a distinct letter and must not be normalized. Exception: allowlisted languages where this is intentional (e.g., German treats ö as a variant of o).
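The decision rule amounts to partitioning each map against the language's keyboard. This is a minimal sketch, not the PR's code: split_by_keyboard is a hypothetical helper, the map is assumed to map a base letter to its variants, and the Czech-like data is illustrative rather than the real cs config. The allowlist exception (e.g., German) is left out here.

```python
def split_by_keyboard(
    diacritic_map: dict[str, list[str]], keyboard_chars: set[str]
) -> tuple[dict[str, list[str]], dict[str, list[str]]]:
    """Partition a map per the decision rule.

    Returns (kept, removed): variants without their own key stay as accent
    variants; variants that have a key are distinct letters and are dropped.
    """
    kept: dict[str, list[str]] = {}
    removed: dict[str, list[str]] = {}
    for base, variants in diacritic_map.items():
        off_keyboard = [v for v in variants if v not in keyboard_chars]
        on_keyboard = [v for v in variants if v in keyboard_chars]
        if off_keyboard:
            kept[base] = off_keyboard
        if on_keyboard:
            removed[base] = on_keyboard
    return kept, removed


# Illustrative Czech-like data: háček letters have keys, long vowels do not.
cs_keyboard = set("qwertzuiopasdfghjklyxcvbnm") | {"č", "š", "ž"}
cs_map = {"a": ["á"], "c": ["č"], "e": ["é"], "s": ["š"], "z": ["ž"]}
kept, removed = split_by_keyboard(cs_map, cs_keyboard)
```

An empty kept result corresponds to the "Removed entire map" row of the table; a non-empty one corresponds to partial trims such as Czech.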

Automated test

Added test_diacritic_maps.py — any diacritic_map character that also appears as a keyboard key triggers a failure unless the language is in an explicit allowlist with justification.
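A plain-Python sketch of this overlap check follows. It borrows the helper names get_keyboard_chars and get_languages_with_diacritic_maps from the CodeRabbit walkthrough, but their signatures and bodies here are assumptions, as are the keyboard shape (a "keyboard" key holding rows of key labels inside language_config.json), the ALLOWLIST contents, and the hypothetical find_conflicts, check_all and load_configs helpers. The real suite uses parametrized pytest cases that skip allowlisted languages; this version collects failures in a loop so it also runs without pytest.

```python
import json
from pathlib import Path

# Hypothetical allowlist: language code -> justification for keeping on-keyboard variants.
ALLOWLIST = {"de": "German intentionally treats ö as a variant of o."}


def get_keyboard_chars(config: dict) -> set[str]:
    """Single-character key labels; multi-character labels such as 'ENTER' are skipped."""
    return {key for row in config.get("keyboard", []) for key in row if len(key) == 1}


def get_languages_with_diacritic_maps(configs: dict[str, dict]) -> list[str]:
    """Sorted language codes whose config has a non-empty diacritic_map."""
    return sorted(code for code, cfg in configs.items() if cfg.get("diacritic_map"))


def find_conflicts(config: dict) -> set[str]:
    """Map variants that also have their own keyboard key."""
    variants = {v for vs in config.get("diacritic_map", {}).values() for v in vs}
    return variants & get_keyboard_chars(config)


def check_all(configs: dict[str, dict]) -> dict[str, set[str]]:
    """Return {lang: conflicting chars} for every non-allowlisted language."""
    failures = {}
    for code in get_languages_with_diacritic_maps(configs):
        if code in ALLOWLIST:
            continue  # the real suite reports these as skipped
        conflicts = find_conflicts(configs[code])
        if conflicts:
            failures[code] = conflicts
    return failures


def load_configs(root: Path = Path("data/languages")) -> dict[str, dict]:
    """Read every <lang>/language_config.json under root."""
    return {
        path.parent.name: json.loads(path.read_text(encoding="utf-8"))
        for path in root.glob("*/language_config.json")
    }


def test_diacritic_maps_do_not_overlap_keyboard():
    # pytest would collect this; run it from the repository root.
    assert check_all(load_configs()) == {}
```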

Test plan

  • uv run pytest tests/test_diacritic_maps.py — 14 passed, 27 skipped (allowlisted)
  • Play Finnish — ö and o should be independent (no cross-coloring)
  • Play German — ö and o should still be treated as equivalent (allowlisted)
  • Play Czech — háček letters (č, š, ž) independent; long vowels (á, é) still equivalent

Summary by CodeRabbit

  • Chores
    • Simplified or removed diacritic mappings across many language configurations (Azerbaijani, Bulgarian, Czech, Danish, Estonian, Finnish, Hungarian, Polish, Romanian, Turkish, Ukrainian, and others).
    • Added a data-driven tool to update and normalize diacritic mappings across language configs.
  • Tests
    • Added automated tests to ensure diacritic mappings don’t conflict with keyboard characters and to catch regressions.

Commit 574ab2f bulk-added diacritic_map to 43 languages, but several
had characters that are distinct letters (not accent variants):
- Finnish ä/ö/å, Swedish ä/ö/å, Danish å/æ/ø are the 27th-29th letters
- Turkish ç/ğ/ı/ö/ş/ü, Polish ą/ć/ę/ł etc. are separate alphabet entries
- This caused wrong tile colors (e.g., ö showing yellow when answer has o)

Fix: reviewed all 43 languages individually:
- 17 languages: removed entire diacritic_map (all chars distinct)
- 9 languages: removed on-keyboard distinct chars, kept off-keyboard
  accent variants (e.g., Czech: removed háčky, kept long vowels)
- 17 languages: kept map (chars are genuine accent variants)

Added automated test: any diacritic_map char that has its own keyboard
key triggers a test failure unless the language is explicitly allowlisted.

Fixes #175
Hugo0 mentioned this pull request Mar 16, 2026

coderabbitai bot commented Mar 16, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 26aa4942-6ba8-4b6a-b7f2-b56dc49b11d2

📥 Commits

Reviewing files that changed from the base of the PR and between 756f117 and e70a1c7.

📒 Files selected for processing (1)
  • scripts/fix_diacritic_maps.py

📝 Walkthrough

This PR removes or trims diacritic_map entries across many data/languages/*/language_config.json files, adds scripts/fix_diacritic_maps.py to apply those edits programmatically, and introduces tests/test_diacritic_maps.py to validate diacritic maps against keyboard characters.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Language configuration edits | data/languages/{lang}/language_config.json, where lang is one of: az, bg, cs, da, et, fi, fo, hr, hu, is, lb, lt, ltg, lv, mk, mn, pl, ro, ru, sk, sl, sq, sv, tk, tr, uk | Removed or reduced diacritic_map content: some files had the entire diacritic_map removed, others had specific diacritic entries pruned (varies per language). No other UI keys were added. |
| Automation script | scripts/fix_diacritic_maps.py | New script that applies data-driven changes: constant sets REMOVE_ALL and REMOVE_SPECIFIC, fix_language(lang_code) to load/modify/save JSON, and main() to batch-process languages. Exposes constants and functions for reuse. |
| Validation tests | tests/test_diacritic_maps.py | New pytest suite that discovers languages with diacritic_map, extracts keyboard characters, and asserts diacritic entries do not overlap keyboard characters (with an allowlist). Adds helpers get_keyboard_chars() and get_languages_with_diacritic_maps() and parametrized tests. |
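The walkthrough describes scripts/fix_diacritic_maps.py only by its names. The sketch below is a hedged reconstruction of that shape, not the actual script: REMOVE_ALL is copied from the "Removed entire map" languages in the PR description, while REMOVE_SPECIFIC holds only an illustrative Czech entry (the háček letters named in the test plan), because the PR does not list the exact characters for the other eight partially trimmed languages. The map shape and the JSON output formatting (indent=2, ensure_ascii=False) are also assumptions.

```python
import json
from pathlib import Path

# From the PR description: languages whose whole diacritic_map is dropped.
REMOVE_ALL = {
    "az", "bg", "da", "et", "hr", "lt", "ltg", "lv", "mk",
    "mn", "ro", "sl", "sq", "sv", "tk", "tr", "uk",
}

# Illustrative only: per-language characters to drop. The PR names cs, fi, fo,
# hu, is, lb, pl, ru, sk but spells out specific characters only for Czech.
REMOVE_SPECIFIC = {"cs": {"č", "š", "ž"}}


def fix_language(lang_code: str, root: Path = Path("data/languages")) -> bool:
    """Rewrite one language_config.json; return True if it was changed."""
    path = root / lang_code / "language_config.json"
    config = json.loads(path.read_text(encoding="utf-8"))
    dmap = config.get("diacritic_map")
    if not dmap:
        return False

    if lang_code in REMOVE_ALL:
        del config["diacritic_map"]
    elif lang_code in REMOVE_SPECIFIC:
        drop = REMOVE_SPECIFIC[lang_code]
        trimmed = {}
        for base, variants in dmap.items():
            kept = [v for v in variants if v not in drop]
            if kept:  # keep a base letter only if some variant survives
                trimmed[base] = kept
        if trimmed:
            config["diacritic_map"] = trimmed
        else:
            del config["diacritic_map"]
    else:
        return False  # not listed: leave the file untouched

    path.write_text(
        json.dumps(config, ensure_ascii=False, indent=2) + "\n", encoding="utf-8"
    )
    return True


def main(root: Path = Path("data/languages")) -> None:
    """Batch-process every listed language that has a config under root."""
    for code in sorted(REMOVE_ALL | set(REMOVE_SPECIFIC)):
        if (root / code / "language_config.json").exists():
            status = "updated" if fix_language(code, root) else "unchanged"
            print(f"{code}: {status}")


if __name__ == "__main__":
    main()
```

Keeping the removal lists as data rather than logic makes each per-language decision reviewable in one place, which matches the "data-driven" description in the walkthrough.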

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 I pruned the maps with careful paws,
I hopped through keys and fixed the laws,
No phantom ö or misplaced ç,
A tidy script, a test that says,
Hooray — the languages hop free! 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 60.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title accurately and concisely summarizes the main change: removing diacritic maps for languages where the mapped characters are distinct alphabet letters rather than accent variants. |
| Linked Issues check | ✅ Passed | The PR comprehensively addresses issue #175 by identifying and removing problematic diacritic mappings that treat distinct alphabet letters as equivalents, implementing per-language solutions, and adding automated validation tests. |
| Out of Scope Changes check | ✅ Passed | All changes directly support the stated objective of fixing diacritic map issues: language config updates remove problematic mappings, the fix script automates the corrections, and the test suite validates the solution. |


Hugo0 merged commit 70c4229 into main Mar 16, 2026
5 checks passed
Hugo0 deleted the fix/diacritic-maps-distinct-letters branch March 16, 2026 00:04
Development

Successfully merging this pull request may close these issues.

Wrong letters

1 participant