Scalable sub-component tile coloring for composition-based scripts (Korean, Tamil, Hindi, Chinese) #157

## Problem

Wordle Global currently treats every language's writing system as a simple alphabet: one character = one tile = one color. This works well for Latin, Cyrillic, Arabic, and other scripts where each letter is an atomic unit. For **composition-based scripts**, where characters are built from smaller phonetic components, the model breaks down.

### Korean (the immediate case)

Korean syllable blocks are composed of 2-3 jamo (consonants and vowels):
- 한 = ㅎ (initial) + ㅏ (vowel) + ㄴ (final)
- 글 = ㄱ (initial) + ㅡ (vowel) + ㄹ (final)
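
The decomposition above follows directly from how Unicode lays out precomposed Hangul syllables (U+AC00 to U+D7A3), so it can be computed arithmetically. The following is a minimal sketch, not code from this repository: the function name `decomposeSyllable` and the choice to return compatibility jamo are assumptions made for illustration.

```typescript
// Unicode Hangul syllable arithmetic: syllable = S_BASE + (initial * 21 + vowel) * 28 + final
const S_BASE = 0xac00;
const V_COUNT = 21; // medial vowels
const T_COUNT = 28; // finals, where index 0 means "no final"
const S_COUNT = 19 * V_COUNT * T_COUNT; // 11172 precomposed syllables

// Compatibility jamo in Unicode slot order (hypothetical output alphabet).
const INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ".split("");
const VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ".split("");
const FINALS = [""].concat("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ".split(""));

// Returns 2 or 3 jamo for a precomposed syllable, or the input unchanged otherwise.
function decomposeSyllable(syllable: string): string[] {
    if (syllable.length !== 1) return [syllable];
    const offset = syllable.charCodeAt(0) - S_BASE;
    if (offset < 0 || offset >= S_COUNT) return [syllable];

    const initialJamo = INITIALS[Math.floor(offset / (V_COUNT * T_COUNT))];
    const vowelJamo = VOWELS[Math.floor((offset % (V_COUNT * T_COUNT)) / T_COUNT)];
    const finalJamo = FINALS[offset % T_COUNT];
    return finalJamo === "" ? [initialJamo, vowelJamo] : [initialJamo, vowelJamo, finalJamo];
}
```

Because the result has 2 or 3 entries depending on whether the syllable has a final consonant, the jamo count of a word cannot be derived from its syllable count, which is the root of the problems listed below.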

Our current approach (PR #155) decomposes words into individual jamo and puts one jamo per tile — 5 jamo per word. This creates several problems:

1. **Unnatural grid**: Korean speakers see `ㅎ ㅏ ㄴ ㄱ ㅡ ㄹ` (6 separate jamo) instead of `한 글` (2 syllable blocks). It doesn't look like Korean.
2. **Unpredictable word length**: A 2-syllable word could be 4, 5, or 6 jamo depending on whether syllables have final consonants. "4-letter Korean Wordle" doesn't map to any natural Korean concept.
3. **Doesn't scale**: Variable word lengths, phrase-of-the-day, or difficulty modes (3-syllable → 5-syllable) are impossible because jamo count ≠ syllable count.
4. **65-character keyboard**: Because compound vowels (ㅘ, ㅙ) and compound jongseong (ㄺ, ㄻ) are separate characters, the keyboard needs 40+ keys across 5 rows. 꼬들 (kordle.kr) avoids this by decomposing everything to 26 basic jamo, but then needs 6 cells per word.
5. **IME conflict**: Physical Korean keyboards compose syllable blocks via the OS IME. Our game must either bypass the IME (current fix: `physical_key_map`) or decompose composed input — both are workarounds for a data model that fights the writing system.

### The same problem exists in other scripts

| Language | Natural unit | Components within unit | Speakers |
|---|---|---|---|
| **Korean** | Syllable block (한) | Initial consonant + vowel + optional final consonant | 80M |
| **Tamil** | Akshara (கா) | Consonant + vowel mark (matra) | 80M |
| **Hindi/Devanagari** | Akshara (क्षा) | Consonant(s) + vowel mark, with conjuncts | 600M |
| **Bengali** | Akshara (ক্ষা) | Same as Hindi | 230M |
| **Chinese** | Character (春) | Pinyin: initial + final + tone | 1.1B |
| **Thai** | Syllable (กาน) | Consonant + vowel (multi-position) + tone mark | 60M |
| **Khmer** | Syllable | Base + subscript consonants + vowel | 16M |

## Current approach (PR #155)

PR #155 fixes the immediate Korean keyboard bug (Unicode mismatch between Compatibility Jamo and Hangul Jamo) using the existing diacritic normalization system. It works but adds complexity:

- `diacritic_map` with 50+ Jamo mappings
- 5-row keyboard with compound vowel and double consonant keys
- Blocklist of 129 words with compound jongseong that can't be typed on the default keyboard
- `physical_key_map` to bypass IME for physical keyboards
- All of this exists only to work around the fundamental mismatch between "one jamo per tile" and how Korean is actually written
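
To make the Unicode mismatch concrete: the same jamo exists in two blocks, Hangul Compatibility Jamo (U+3130 to U+318F) and Hangul Jamo (U+1100 to U+11FF), and the two code points do not compare equal. The sketch below only illustrates the mismatch using the standard `String.prototype.normalize`; it is not how PR #155 works (that uses `diacritic_map`), and the helper name `sameJamo` is hypothetical.

```typescript
const compat: string = "\u3131";     // HANGUL LETTER KIYEOK (Hangul Compatibility Jamo block)
const conjoining: string = "\u1100"; // HANGUL CHOSEONG KIYEOK (Hangul Jamo block)

// The glyphs look alike, but a plain string comparison fails.
const equalAsTyped = compat === conjoining; // false

// NFKD applies the compatibility mapping U+3131 -> U+1100.
const equalAfterNfkd = compat.normalize("NFKD") === conjoining; // true

// Hypothetical helper: compare two single jamo after compatibility normalization.
function sameJamo(a: string, b: string): boolean {
    return a.normalize("NFKD") === b.normalize("NFKD");
}
```

Note that NFKD maps a compatibility consonant to its initial (choseong) form, so `sameJamo("\u3131", "\u11A8")` is still false for the final (jongseong) form of the same consonant. That gap is one reason an explicit mapping table is needed under the current one-jamo-per-tile model.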

## Proposed solution: sub-component tile coloring

### The abstraction

Instead of decomposing characters into separate tiles, keep the natural linguistic unit as the tile and color its **sub-components independently**:

```
Current:     [ㅎ] [ㅏ] [ㄴ] [ㄱ] [ㅡ] [ㄹ]     (6 tiles, 1 color each)
             🟩   🟩   🟩   🟨   ⬜   🟩

Proposed:    [한]           [글]                  (2 tiles, 3 colors each)
              ㅎ=🟩 ㅏ=🟩 ㄴ=🟩   ㄱ=🟨 ㅡ=⬜ ㄹ=🟩
```

The data model would be:

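```typescript
// Assumed color states (standard Wordle feedback); align with the existing color type.
type ComponentColor = "correct" | "present" | "absent";

interface TileResult {
    display: string;              // "한": what the player sees
    components: string[];         // ["ㅎ", "ㅏ", "ㄴ"]: what gets compared
    colors: ComponentColor[];     // ["correct", "correct", "correct"], one per component
}
```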

This single abstraction handles every script:
- **Latin/Cyrillic/Arabic**: 1 component per tile (current behavior, no change)
- **Korean**: 2-3 components (initial, vowel, final)
- **Tamil/Hindi**: 2-3 components (consonant, vowel mark, optional conjunct)
- **Chinese**: 3-4 components (character, pinyin initial, pinyin final, tone)
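
One way to express the per-script behavior listed above is a decomposer function per language, with atomic scripts falling back to a one-component default. This is a rough sketch under stated assumptions: the names `Decomposer`, `DECOMPOSERS`, and `decomposeFor` are hypothetical, and the language codes other than `ko` are placeholders. Korean here relies on Unicode NFD, which yields conjoining jamo (so an initial ㄴ and a final ㄴ are different code points); whether those should compare equal is a design decision this sketch does not settle.

```typescript
// A decomposer turns one displayed tile into the components that get compared.
type Decomposer = (tile: string) => string[];

// Latin, Cyrillic, Arabic, ...: the tile is its own single component.
const atomic: Decomposer = (tile) => [tile];

// Korean: NFD splits a precomposed syllable into 2 or 3 conjoining jamo.
const koreanNfd: Decomposer = (tile) => tile.normalize("NFD").split("");

// Tamil (simple case): an akshara stored as consonant + vowel sign splits by code point.
const byCodePoint: Decomposer = (tile) => Array.from(tile);

// Hypothetical registry keyed by language code.
const DECOMPOSERS = new Map<string, Decomposer>([
    ["ko", koreanNfd],
    ["ta", byCodePoint],
]);

function decomposeFor(lang: string, tile: string): string[] {
    const decomposer = DECOMPOSERS.get(lang) ?? atomic;
    return decomposer(tile);
}
```

With this shape, adding a script means registering one function, and every language without an entry keeps today's behavior.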

### Rendering approaches (by complexity)

1. **CSS diagonal gradient** (simplest, 2 signals): Split tile diagonally — top-left = consonant color, bottom-right = vowel color. Used by [Solladal](https://solladal.github.io/) (Tamil Wordle). ~5 lines of CSS.

2. **CSS absolute positioning** (medium, 3-5 signals): Main character centered, component indicators positioned around it as colored dots or small text. Used by [汉兜 (Handle)](https://handle.antfu.me/) (Chinese Wordle).

3. **SVG path decomposition** (most polished, 3-5 signals): Decompose the font glyph into separate SVG paths per component, color each path independently. Used by [한들 (Handle)](https://handle.wolim.net/) (Korean Wordle). Visually seamless — the syllable block looks normal but each jamo stroke is a different color. Requires a font with non-connected jamo paths.
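
As an illustration of the first (simplest) approach, a two-signal tile can be drawn with a hard-stop `linear-gradient`. This is a sketch only: the palette values and the function name `diagonalBackground` are assumptions, and it is not taken from Solladal's source.

```typescript
type ComponentColor = "correct" | "present" | "absent";

// Hypothetical palette; substitute the game's existing theme colors.
const PALETTE: Record<ComponentColor, string> = {
    correct: "#6aaa64",
    present: "#c9b458",
    absent: "#787c7e",
};

// Builds a CSS background that splits the tile along its diagonal:
// the first color fills the top-left half, the second the bottom-right half.
// Both stops sit at 50%, so the transition is a sharp line rather than a blend.
function diagonalBackground(first: ComponentColor, second: ComponentColor): string {
    return `linear-gradient(135deg, ${PALETTE[first]} 50%, ${PALETTE[second]} 50%)`;
}
```

The returned string can be assigned to a tile element's `style.background`. This technique tops out at two signals per tile, so Korean syllables with a final consonant would need one of the other two approaches.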

### Benefits

- **Natural word lengths**: "5-letter word" = 5 syllables for Korean, 5 aksharas for Tamil/Hindi
- **Clean keyboard**: Korean needs only 26 basic jamo keys (3 rows), and the OS IME works natively
- **No blocklists**: Compound jongseong compose naturally within syllable blocks, so the keyboard gap disappears
- **Scalable**: Variable word lengths, phrase-of-the-day, and difficulty modes all become straightforward
- **More information per guess**: Up to 3 color signals per tile instead of 1, giving the player richer feedback
- **Universal**: One tile system handles every current and future script

### Prior art

| Game | Script | Signals/cell | Technique |
|---|---|---|---|
| [한들 (Handle)](https://handle.wolim.net/) | Korean | 3-5 | SVG path decomposition |
| [汉兜 (Handle)](https://handle.antfu.me/) | Chinese | 5 | CSS positioned spans |
| [Solladal](https://solladal.github.io/) | Tamil | 2 | CSS diagonal gradient |
| [Shabdle](https://kach.github.io/shabdle/) | Hindi | 1 | No sub-coloring (chose not to) |
| [꼬들 (Kordle)](https://kordle.kr/) | Korean | 1 | Full decomposition (6 cells, avoids the problem) |

## Scope

This is a significant frontend architecture change — not a quick fix. It involves:

1. **Tile data model**: Extend from `string` to `{ display, components, colors }`
2. **Color algorithm**: Compare at component level, not character level
3. **Rendering**: Choose and implement a sub-coloring technique (CSS gradient → SVG path)
4. **Word list migration** (Korean): Re-encode from decomposed jamo to syllable blocks
5. **Per-language decomposition config**: Define how each script splits characters into components
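
For item 2, component-level comparison could reuse the usual two-pass Wordle scoring, applied to components instead of characters. The sketch below makes several assumptions that this issue does not decide: a component is "correct" only when it matches the target in the same tile and the same slot, "present" when an unmatched copy exists anywhere else in the target, and the same jamo counts as the same component regardless of initial or final position. The function name `scoreGuess` is hypothetical.

```typescript
type ComponentColor = "correct" | "present" | "absent";

// guess and target are words already split into tiles, each tile into components.
function scoreGuess(guess: string[][], target: string[][]): ComponentColor[][] {
    const colors: ComponentColor[][] = guess.map((tile) =>
        tile.map((): ComponentColor => "absent")
    );
    const remaining = new Map<string, number>(); // unmatched target components

    // Pass 1: mark exact (tile, slot) matches and count everything else in the target.
    target.forEach((tile, i) => {
        tile.forEach((component, j) => {
            if (guess[i] !== undefined && guess[i][j] === component) {
                colors[i][j] = "correct";
            } else {
                remaining.set(component, (remaining.get(component) ?? 0) + 1);
            }
        });
    });

    // Pass 2: hand out "present" while unmatched copies remain.
    guess.forEach((tile, i) => {
        tile.forEach((component, j) => {
            if (colors[i][j] === "correct") return;
            const left = remaining.get(component) ?? 0;
            if (left > 0) {
                colors[i][j] = "present";
                remaining.set(component, left - 1);
            }
        });
    });

    return colors;
}
```

For example, guessing 나라 (`[["ㄴ","ㅏ"],["ㄹ","ㅏ"]]`) against 한글 (`[["ㅎ","ㅏ","ㄴ"],["ㄱ","ㅡ","ㄹ"]]`) yields `[["present","correct"],["present","absent"]]` under these rules. Tiles with different component counts are handled by treating a missing slot as a non-match.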

PR #155 ships the immediate Korean fix using the current architecture. This issue tracks the long-term scalable solution.

## Affected languages

**Currently supported, would benefit:**
- Korean (ko) — most impacted, current workarounds are complex

**Not yet supported, would be unblocked:**
- Tamil, Hindi, Bengali, Thai, Khmer, Chinese, Japanese (kana — simple case, no sub-coloring needed but same tile model)

**Not affected (already work fine):**
- All Latin, Cyrillic, Arabic, Greek, Hebrew, Georgian, Armenian scripts
