Scalable sub-component tile coloring for composition-based scripts (Korean, Tamil, Hindi, Chinese) #157

@Hugo0

Problem

Wordle Global currently treats every language's writing system as a simple alphabet: one character = one tile = one color. This works perfectly for Latin, Cyrillic, Arabic, and other scripts where each letter is an atomic unit. But for composition-based scripts — where characters are built from smaller phonetic components — this model breaks down.

Korean (the immediate case)

Korean syllable blocks are composed of 2-3 jamo (consonants and vowels):

  • 한 = ㅎ (initial) + ㅏ (vowel) + ㄴ (final)
  • 글 = ㄱ (initial) + ㅡ (vowel) + ㄹ (final)
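The split shown above can be computed arithmetically, because precomposed Hangul syllables are laid out in Unicode as a regular grid of 19 initials × 21 vowels × 28 finals (including "no final") starting at U+AC00. A minimal sketch, assuming we want Compatibility Jamo (the characters a keyboard shows) as output; `decomposeSyllable` is a hypothetical helper name, not existing project code:

```typescript
// Compatibility jamo in Unicode syllable order: 19 initials, 21 vowels, 27 finals plus "none".
const INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ";
const VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ";
const FINALS = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ", "ㄻ", "ㄼ", "ㄽ", "ㄾ",
    "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ", "ㅆ", "ㅇ", "ㅈ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ"];

const SYLLABLE_BASE = 0xac00; // 가, first precomposed syllable
const SYLLABLE_LAST = 0xd7a3; // 힣, last precomposed syllable

// Hypothetical helper: split one syllable block into 2 or 3 jamo.
function decomposeSyllable(syllable: string): string[] {
    const code = syllable.charCodeAt(0);
    if (syllable.length !== 1 || code < SYLLABLE_BASE || code > SYLLABLE_LAST) {
        return [syllable]; // not a precomposed Hangul syllable, return unchanged
    }
    const offset = code - SYLLABLE_BASE;
    const initial = INITIALS.charAt(Math.floor(offset / (21 * 28)));
    const vowel = VOWELS.charAt(Math.floor((offset % (21 * 28)) / 28));
    const coda = FINALS[offset % 28]; // "" when the syllable has no final consonant
    return coda ? [initial, vowel, coda] : [initial, vowel];
}
```

With these tables, `decomposeSyllable("한")` yields `["ㅎ", "ㅏ", "ㄴ"]` and `decomposeSyllable("하")` yields `["ㅎ", "ㅏ"]`.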

Our current approach (PR #155) decomposes words into individual jamo and puts one jamo per tile — 5 jamo per word. This creates several problems:

  1. Unnatural grid: Korean speakers see ㅎ ㅏ ㄴ ㄱ ㅡ ㄹ (6 separate jamo) instead of 한 글 (2 syllables). It doesn't look like Korean.
  2. Unpredictable word length: A 2-syllable word could be 4, 5, or 6 jamo depending on whether syllables have final consonants. "4-letter Korean Wordle" doesn't map to any natural Korean concept.
  3. Doesn't scale: Variable word lengths, phrase-of-the-day, or difficulty modes (3-syllable → 5-syllable) are impossible because jamo count ≠ syllable count.
  4. 65-character keyboard: Because compound vowels (ㅘ, ㅙ) and compound jongseong (ㄺ, ㄻ) are separate characters, the keyboard needs 40+ keys across 5 rows. 꼬들 (kordle.kr) avoids this by decomposing everything to 26 basic jamo, but then needs 6 cells per word.
  5. IME conflict: Physical Korean keyboards compose syllable blocks via the OS IME. Our game must either bypass the IME (current fix: physical_key_map) or decompose composed input — both are workarounds for a data model that fights the writing system.
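To make the word-length problem in item 2 concrete: each syllable contributes 2 jamo, plus 1 if it has a final consonant. A small sketch of that count, assuming compound vowels and compound finals each count as one jamo (as in the one-jamo-per-tile model); `jamoCount` is a hypothetical name:

```typescript
// Hypothetical helper: number of jamo tiles a word needs under one-jamo-per-tile.
function jamoCount(word: string): number {
    let count = 0;
    for (let i = 0; i < word.length; i++) {
        const offset = word.charCodeAt(i) - 0xac00; // position within the Hangul syllable block
        if (offset < 0 || offset > 0xd7a3 - 0xac00) {
            count += 1; // not a precomposed syllable, count it as a single tile
            continue;
        }
        // offset % 28 is the final-consonant index; 0 means "no final"
        count += offset % 28 === 0 ? 2 : 3;
    }
    return count;
}
```

All three of these are 2-syllable words, yet `jamoCount("하나")` is 4, `jamoCount("하늘")` is 5, and `jamoCount("한글")` is 6.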

The same problem exists in other scripts

| Language | Natural unit | Components within unit | Speakers |
| --- | --- | --- | --- |
| Korean | Syllable block (한) | Initial consonant + vowel + optional final consonant | 80M |
| Tamil | Akshara (கா) | Consonant + vowel mark (matra) | 80M |
| Hindi/Devanagari | Akshara (क्षा) | Consonant(s) + vowel mark, with conjuncts | 600M |
| Bengali | Akshara (ক্ষা) | Same as Hindi | 230M |
| Chinese | Character (春) | Pinyin: initial + final + tone | 1.1B |
| Thai | Syllable (กาน) | Consonant + vowel (multi-position) + tone mark | 60M |
| Khmer | Syllable | Base + subscript consonants + vowel | 16M |

Current approach (PR #155)

PR #155 fixes the immediate Korean keyboard bug (Unicode mismatch between Compatibility Jamo and Hangul Jamo) using the existing diacritic normalization system. It works but adds complexity:

  • diacritic_map with 50+ Jamo mappings
  • 5-row keyboard with compound vowel and double consonant keys
  • Blocklist of 129 words with compound jongseong that can't be typed on the default keyboard
  • physical_key_map to bypass IME for physical keyboards
  • All of this to work around the fundamental mismatch between "one jamo per tile" and how Korean actually works
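For readers unfamiliar with the Unicode mismatch mentioned above, the sketch below shows why the same jamo can fail a string comparison: standard NFD normalization of a syllable block yields conjoining Hangul Jamo (U+1100 block), while standalone keyboard jamo live in the Compatibility Jamo block (U+3130 block). This only illustrates the two blocks; how exactly PR #155 runs into the mismatch is not shown here, and `codePointsOf` is a hypothetical helper:

```typescript
// Format each UTF-16 code unit as U+XXXX (all Hangul jamo are in the BMP).
function codePointsOf(text: string): string[] {
    const out: string[] = [];
    for (let i = 0; i < text.length; i++) {
        out.push("U+" + text.charCodeAt(i).toString(16).toUpperCase());
    }
    return out;
}

const composed = "\uD55C";                 // 한 as one precomposed syllable
const keyboardJamo = "\u314E\u314F\u3134"; // ㅎ ㅏ ㄴ from the Compatibility Jamo block

// NFD splits the syllable into conjoining Hangul Jamo, not Compatibility Jamo.
const decomposed = composed.normalize("NFD");
const sameText = decomposed === keyboardJamo; // false: same letters, different code points
```

Here `codePointsOf(decomposed)` is `["U+1112", "U+1161", "U+11AB"]`, while `codePointsOf(keyboardJamo)` is `["U+314E", "U+314F", "U+3134"]`, so any comparison needs an explicit mapping between the two blocks.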

Proposed solution: sub-component tile coloring

The abstraction

Instead of decomposing characters into separate tiles, keep the natural linguistic unit as the tile and color its sub-components independently:

Current:     [ㅎ] [ㅏ] [ㄴ] [ㄱ] [ㅡ] [ㄹ]     (6 tiles, 1 color each)
             🟩   🟩   🟩   🟨   ⬜   🟩

Proposed:    [한]           [글]                  (2 tiles, 3 colors each)
              ㅎ=🟩 ㅏ=🟩 ㄴ=🟩   ㄱ=🟨 ㅡ=⬜ ㄹ=🟩

The data model would be:

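// Assumed value names; only "correct" appears in this issue.
type ComponentColor = "correct" | "present" | "absent";

interface TileResult {
    display: string;              // "한": what the player sees
    components: string[];         // ["ㅎ", "ㅏ", "ㄴ"]: what gets compared
    colors: ComponentColor[];     // ["correct", "correct", "correct"], one per component
}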

This single abstraction handles every script:

  • Latin/Cyrillic/Arabic: 1 component per tile (current behavior, no change)
  • Korean: 2-3 components (initial, vowel, final)
  • Tamil/Hindi: 2-3 components (consonant, vowel mark, optional conjunct)
  • Chinese: 3-4 components (character, pinyin initial, pinyin final, tone)
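One way the component-level comparison could work is the usual two-pass Wordle scoring, applied to components instead of letters. The sketch below is one possible rule set, not a settled design: it assumes a component is "correct" when it matches the same slot of the same tile, and "present" when it occurs anywhere else in the answer (regardless of role) and has not been used up. The `Tile` type, `scoreGuess`, and the "present"/"absent" names are hypothetical.

```typescript
type ComponentColor = "correct" | "present" | "absent"; // "present"/"absent" are assumed names

interface Tile {
    display: string;      // e.g. "한"
    components: string[]; // e.g. ["ㅎ", "ㅏ", "ㄴ"]
}

interface TileResult extends Tile {
    colors: ComponentColor[];
}

function scoreGuess(guess: Tile[], answer: Tile[]): TileResult[] {
    // Start with every component marked absent.
    const results: TileResult[] = guess.map((tile) => ({
        display: tile.display,
        components: tile.components.slice(),
        colors: tile.components.map((): ComponentColor => "absent"),
    }));
    const pool: Record<string, number> = {}; // answer components not matched exactly

    // Pass 1: exact matches (same tile, same slot); everything else goes into the pool.
    answer.forEach((tile, i) => {
        tile.components.forEach((component, j) => {
            const result = results[i];
            if (result !== undefined && result.components[j] === component) {
                result.colors[j] = "correct";
            } else {
                pool[component] = (pool[component] || 0) + 1;
            }
        });
    });

    // Pass 2: remaining guess components consume matching entries from the pool.
    results.forEach((result) => {
        result.components.forEach((component, j) => {
            if (result.colors[j] === "correct") return;
            const remaining = pool[component] || 0;
            if (remaining > 0) {
                result.colors[j] = "present";
                pool[component] = remaining - 1;
            }
        });
    });
    return results;
}
```

Under these assumed rules, guessing 글 (ㄱ, ㅡ, ㄹ) against the answer 늑 (ㄴ, ㅡ, ㄱ) colors the tile present, correct, absent. Whether an initial ㄱ should count as "present" when the answer only has ㄱ as a final is a design question this sketch does not settle.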

Rendering approaches (by complexity)

  1. CSS diagonal gradient (simplest, 2 signals): Split tile diagonally — top-left = consonant color, bottom-right = vowel color. Used by Solladal (Tamil Wordle). ~5 lines of CSS.

  2. CSS absolute positioning (medium, 3-5 signals): Main character centered, component indicators positioned around it as colored dots or small text. Used by 汉兜 (Handle) (Chinese Wordle).

  3. SVG path decomposition (most polished, 3-5 signals): Decompose the font glyph into separate SVG paths per component, color each path independently. Used by 한들 (Handle) (Korean Wordle). Visually seamless — the syllable block looks normal but each jamo stroke is a different color. Requires a font with non-connected jamo paths.
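As a rough illustration of approach 1, the gradient can be generated from a tile's component colors as a hard-stop `linear-gradient`. This is a sketch only: `diagonalGradient` is a hypothetical helper, the color values are placeholders, and it generalizes the two-color split to equal-width diagonal bands when a tile has more components.

```typescript
// Hypothetical helper: build a CSS background-image value with one diagonal band per color.
// "to bottom right" puts the first color in the top-left corner and the last in the bottom-right.
function diagonalGradient(colors: string[]): string {
    const n = colors.length;
    if (n === 0) return "none";
    const stops = colors.map((color, i) => {
        // Equal-width bands with hard edges, rounded to two decimals.
        const start = Math.round((i * 10000) / n) / 100;
        const end = Math.round(((i + 1) * 10000) / n) / 100;
        return color + " " + start + "% " + end + "%";
    });
    return "linear-gradient(to bottom right, " + stops.join(", ") + ")";
}
```

For a consonant/vowel pair, `diagonalGradient(["green", "gold"])` returns `linear-gradient(to bottom right, green 0% 50%, gold 50% 100%)`, which could be assigned to a tile's `background-image`.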

Benefits

  • Natural word lengths: "5-letter word" = 5 syllables for Korean, 5 aksharas for Tamil/Hindi
  • Clean keyboard: Korean needs only 26 basic jamo keys (3 rows), IME works natively
  • No blocklists: No compound jongseong keyboard gap — they compose naturally within syllable blocks
  • Scalable: Variable word lengths, phrase-of-the-day, and difficulty modes all trivial
  • More information per guess: up to 3 color signals per Korean tile instead of 1, giving the player richer feedback
  • Universal: One tile system handles every current and future script

Prior art

| Game | Script | Signals/cell | Technique |
| --- | --- | --- | --- |
| 한들 (Handle) | Korean | 3-5 | SVG path decomposition |
| 汉兜 (Handle) | Chinese | 5 | CSS positioned spans |
| Solladal | Tamil | 2 | CSS diagonal gradient |
| Shabdle | Hindi | 1 | No sub-coloring (chose not to) |
| 꼬들 (Kordle) | Korean | 1 | Full decomposition (6 cells, avoids the problem) |

Scope

This is a significant frontend architecture change — not a quick fix. It involves:

  1. Tile data model: Extend from string to { display, components, colors }
  2. Color algorithm: Compare at component level, not character level
  3. Rendering: Choose and implement a sub-coloring technique (CSS gradient → SVG path)
  4. Word list migration (Korean): Re-encode from decomposed jamo to syllable blocks
  5. Per-language decomposition config: Define how each script splits characters into components
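For item 5, the per-language config could be as small as a map from language code to a decomposition function. The sketch below is illustrative only: the `Decomposer` type, the `decomposers` table, and `decomposeUnit` are hypothetical names, and the Tamil rule (split an akshara into its code points) is a simplification that ignores conjuncts and the inherent vowel.

```typescript
// A decomposer turns one display unit (letter, syllable block, akshara) into its components.
type Decomposer = (unit: string) => string[];

// Hypothetical registry keyed by language code.
const decomposers: Record<string, Decomposer> = {
    // Alphabetic scripts: the tile is its own single component (current behavior).
    en: (unit) => [unit],
    // Tamil (simplified): consonant and vowel sign are separate code points.
    // Tamil letters and vowel signs are all in the BMP, so splitting by UTF-16 unit is safe here.
    ta: (unit) => unit.split(""),
};

// Languages without an entry fall back to one component per tile.
function decomposeUnit(lang: string, unit: string): string[] {
    const decompose = decomposers[lang];
    return decompose ? decompose(unit) : [unit];
}
```

With this table, `decomposeUnit("ta", "கா")` gives the consonant க and the vowel sign ா as two components, while any Latin-script language keeps its one-component tiles. A Korean entry would plug in a Hangul syllable decomposer in the same way.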

PR #155 ships the immediate Korean fix using the current architecture. This issue tracks the long-term scalable solution.
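The Korean word list migration in step 4 could be done with a small greedy composer. This is a sketch under an assumption the issue does not confirm: that the current list stores each word as a flat string of Compatibility Jamo, with compound vowels and compound finals as single characters. `composeJamo` is a hypothetical name.

```typescript
// Compatibility jamo in Unicode syllable order.
const INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ";
const VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ";
const FINALS = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ", "ㄻ", "ㄼ", "ㄽ", "ㄾ",
    "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ", "ㅆ", "ㅇ", "ㅈ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ"];

// Hypothetical helper: re-encode a flat jamo string as syllable blocks.
function composeJamo(jamo: string): string {
    let out = "";
    let i = 0;
    while (i < jamo.length) {
        const initial = INITIALS.indexOf(jamo.charAt(i));
        const vowel = i + 1 < jamo.length ? VOWELS.indexOf(jamo.charAt(i + 1)) : -1;
        if (initial < 0 || vowel < 0) {
            out += jamo.charAt(i); // cannot start a syllable here, copy the character through
            i += 1;
            continue;
        }
        let coda = 0; // index 0 means "no final consonant"
        if (i + 2 < jamo.length) {
            const candidate = FINALS.indexOf(jamo.charAt(i + 2));
            // A consonant followed by a vowel is the initial of the next syllable, not a final.
            const nextIsVowel = i + 3 < jamo.length && VOWELS.indexOf(jamo.charAt(i + 3)) >= 0;
            if (candidate > 0 && !nextIsVowel) coda = candidate;
        }
        out += String.fromCharCode(0xac00 + (initial * 21 + vowel) * 28 + coda);
        i += coda > 0 ? 3 : 2;
    }
    return out;
}
```

Under that encoding assumption, `composeJamo("ㅎㅏㄴㄱㅡㄹ")` returns `"한글"` and `composeJamo("ㅎㅏㄴㅏ")` returns `"하나"`; the lookahead is what keeps ㄴ from being attached to the first syllable in the second case.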

Affected languages

Currently supported, would benefit:

  • Korean (ko) — most impacted, current workarounds are complex

Not yet supported, would be unblocked:

  • Tamil, Hindi, Bengali, Thai, Khmer, Chinese, Japanese (kana — simple case, no sub-coloring needed but same tile model)

Not affected (already work fine):

  • All Latin, Cyrillic, Arabic, Greek, Hebrew, Georgian, Armenian scripts
