Improve Translation Quality: Eliminate AI_MUST_REPLACE Leakage & Strengthen Validation Pipeline #1656

## 📋 Issue Type
Bug Fix / Quality Improvement

## 🎯 Objective
Eliminate the persistent AI_MUST_REPLACE marker leakage in translated (non-EN) articles and strengthen the translation validation pipeline so that content quality issues are caught before articles reach production. Currently, 146 of 408 April 2026 articles (35.8%) still contain unresolved markers.

## 📊 Current State
- **AI_MUST_REPLACE leakage**: 146 out of 408 April 2026 articles contain AI_MUST_REPLACE markers (35.8%)
- **EN articles**: Clean — 0 markers in `2026-04-10-committee-reports-en.html`, `2026-04-09-opposition-motions-en.html`
- **Translated articles**: ES and FI articles have 2 markers each (e.g., `2026-04-09-opposition-motions-es.html`, `2026-04-09-opposition-motions-fi.html`)
- **Pattern**: EN articles are clean; AI_MUST_REPLACE markers leak into translations when AI agents translate the template comments instead of replacing them
- **Translation validation**: `validate-news-translations.ts` v3.0 warns on EN/SV body-content leakage but does NOT fail CI (exit 0)
- **Swedish leakage detector**: `detect-swedish-leakage.ts` (366 lines) detects SV content in non-SV articles
- **Translation dictionary**: `translation-dictionary.ts` (3,673 lines) — massive file, hard to maintain
- **Banned patterns**: `check-banned-patterns.ts` checks for banned generic template text, sourcing from `shared.ts` BANNED_PATTERNS
- **shared.ts markers**: 17 AI_MUST_REPLACE markers, emitted as HTML comments by the content generator functions; they are designed to be invisible in rendered output, but translations sometimes expose them
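
The leakage figures above can be reproduced with a small counting pass. The following is a minimal sketch, not code from the repository: `summarizeLeakage`, `languageOf`, and `leakPercent` are hypothetical names, and the only assumption taken from this issue is the `YYYY-MM-DD-slug-lang.html` filename pattern visible in the examples.

```typescript
// Hypothetical helpers for illustration; not taken from the repository's scripts.
const MARKER = "AI_MUST_REPLACE";

interface LeakSummary {
  total: number;                      // number of articles inspected
  withMarkers: number;                // articles containing at least one marker
  percent: number;                    // share of affected articles, one decimal place
  byLanguage: Record<string, number>; // affected articles per language code
}

// Reads the language code from names like "2026-04-09-opposition-motions-es.html".
function languageOf(filename: string): string {
  const match = /-([a-z]{2})\.html$/.exec(filename);
  return match?.[1] ?? "unknown";
}

// Percentage rounded to one decimal place; 0 when there are no articles.
function leakPercent(withMarkers: number, total: number): number {
  return total === 0 ? 0 : Math.round((withMarkers / total) * 1000) / 10;
}

// `articles` maps a filename to its HTML content.
function summarizeLeakage(articles: Record<string, string>): LeakSummary {
  const byLanguage: Record<string, number> = {};
  const names = Object.keys(articles);
  let withMarkers = 0;
  for (const name of names) {
    const html = articles[name] ?? "";
    if (html.indexOf(MARKER) !== -1) {
      withMarkers += 1;
      const lang = languageOf(name);
      byLanguage[lang] = (byLanguage[lang] ?? 0) + 1;
    }
  }
  return {
    total: names.length,
    withMarkers,
    percent: leakPercent(withMarkers, names.length),
    byLanguage,
  };
}
```

With the counts reported in this issue, `leakPercent(146, 408)` evaluates to `35.8`. The per-language breakdown is meant to make the "EN clean, translations leaking" pattern visible at a glance.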

## 🚀 Desired State
1. **Zero AI_MUST_REPLACE markers** in all published articles (EN and translated)
2. **CI-enforced validation**: `validate-news-translations.ts` should **fail CI** (exit 1) when AI_MUST_REPLACE markers are detected, not just warn
3. **Improved translation workflow**: `news-translate.md` workflow prompt should explicitly instruct AI to detect and replace ALL AI_MUST_REPLACE comments during translation
4. **Better marker detection**: Strengthen `check-banned-patterns.ts` to catch markers inside HTML comments (`<!-- ... -->`), not just visible text
5. **Translation dictionary improvements**: Better coverage for political terminology across all 14 languages
6. **Content leakage prevention**: Strengthen EN→non-EN content leakage detection to catch English text in translated articles
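
Items 2 and 4 can be sketched together: find every marker, record whether it sits inside an HTML comment or in visible text, and derive the exit code from the result. This is an illustrative sketch under stated assumptions, not the current behaviour of `validate-news-translations.ts` or `check-banned-patterns.ts`; `findMarkers`, `validateArticles`, and the result shapes are hypothetical.

```typescript
// Hypothetical sketch; function names and result shapes are assumptions.
const MARKER = "AI_MUST_REPLACE";

interface MarkerHit {
  location: "comment" | "visible"; // inside <!-- ... --> or in rendered text
  offset: number;                  // character offset of the marker in the HTML
}

function findMarkers(html: string): MarkerHit[] {
  // Collect the [start, end) span of every HTML comment first.
  const commentSpans: Array<[number, number]> = [];
  const commentPattern = /<!--[\s\S]*?-->/g;
  let match: RegExpExecArray | null;
  while ((match = commentPattern.exec(html)) !== null) {
    // With the g flag, lastIndex points just past the current match.
    commentSpans.push([match.index, commentPattern.lastIndex]);
  }

  // Then classify each marker occurrence by whether it falls inside a span.
  const hits: MarkerHit[] = [];
  let next = html.indexOf(MARKER);
  while (next !== -1) {
    const offset = next;
    const inComment = commentSpans.some(([start, end]) => offset >= start && offset < end);
    hits.push({ location: inComment ? "comment" : "visible", offset });
    next = html.indexOf(MARKER, offset + MARKER.length);
  }
  return hits;
}

interface ValidationResult {
  exitCode: 0 | 1;   // 1 when any article still contains a marker
  failing: string[]; // filenames that contain at least one marker
}

// `articles` maps a filename to its HTML content.
function validateArticles(articles: Record<string, string>): ValidationResult {
  const failing = Object.keys(articles).filter(
    (name) => findMarkers(articles[name] ?? "").length > 0,
  );
  return { exitCode: failing.length > 0 ? 1 : 0, failing };
}
```

A CLI wrapper would pass `exitCode` to `process.exit` so that CI fails instead of only warning. Both comment and visible hits count as failures here; the `location` field is there only to make reports easier to read.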

## 🔧 Implementation Approach

### Files to modify (NO overlap with other issues):
- `scripts/validate-news-translations.ts` — enforce CI failure on AI_MUST_REPLACE detection
- `scripts/detect-swedish-leakage.ts` — extend to detect any source-language leakage patterns
- `scripts/translation-dictionary.ts` — improve political terminology coverage, split into manageable sections
- `scripts/check-banned-patterns.ts` — extend to detect markers inside HTML comments
- `scripts/validate-translations.ts` — improve validation coverage
- `scripts/statistical-claims-detector.ts` — validate statistical claims survive translation
- `.github/workflows/news-translate.md` — improve translation prompt to explicitly handle AI_MUST_REPLACE markers
- `.github/workflows/news-translate.lock.yml` — recompile after prompt update
- `scripts/validate-news-generation.sh` — update Check 15 for stricter enforcement
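
For the `translation-dictionary.ts` split, one option is to keep each domain in its own table and merge them at a single entry point, failing loudly if two sections define the same term. The sketch below assumes a `term -> language -> translation` shape, which may not match the real dictionary; `TermTable`, `mergeTermTables`, `translate`, and the sample entries are hypothetical and shown in one block only for brevity.

```typescript
// Hypothetical shape: source term -> language code -> translation.
type TermTable = Record<string, Record<string, string>>;

// Illustrative sample entries only; the real files would hold the full tables.
const committeeNames: TermTable = {
  utskott: { en: "committee", es: "comisión", fi: "valiokunta" },
};
const partyNames: TermTable = {
  Socialdemokraterna: { en: "Social Democrats" },
};

function hasOwn(obj: object, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

// Merges named sections into one table and rejects duplicate terms,
// so a split cannot silently shadow an existing entry.
function mergeTermTables(sections: Record<string, TermTable>): TermTable {
  const merged: TermTable = {};
  const owner: Record<string, string> = {};
  for (const sectionName of Object.keys(sections)) {
    const table = sections[sectionName] ?? {};
    for (const term of Object.keys(table)) {
      if (hasOwn(merged, term)) {
        throw new Error(
          `Duplicate term "${term}" in "${sectionName}" and "${owner[term]}"`,
        );
      }
      merged[term] = table[term] ?? {};
      owner[term] = sectionName;
    }
  }
  return merged;
}

// Returns undefined when the term or the language is not covered.
function translate(table: TermTable, term: string, lang: string): string | undefined {
  if (!hasOwn(table, term)) return undefined;
  const entry = table[term] ?? {};
  return hasOwn(entry, lang) ? entry[lang] : undefined;
}

const dictionary = mergeTermTables({
  "committee-names": committeeNames,
  "party-names": partyNames,
});
```

Returning `undefined` for a missing language makes coverage gaps easy to count per language, which is useful when working toward better terminology coverage.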

### Key improvements:
1. **Enforce CI failure**: Change `validate-news-translations.ts` to exit with code 1 when any article contains AI_MUST_REPLACE markers (currently warns only, exit 0)
2. **Extend banned pattern detection**: Update `check-banned-patterns.ts` to scan HTML comments for AI_MUST_REPLACE markers, using a regex that matches the marker inside `<!-- ... -->`
3. **Improve translation prompt**: Add explicit instruction in `news-translate.md`: "SCAN every HTML comment in the source article. If any contains 'AI_MUST_REPLACE', you MUST generate replacement content in the target language."
4. **Split translation dictionary**: Break `translation-dictionary.ts` (3,673 lines) into domain-specific files: `political-terms.ts`, `committee-names.ts`, `party-names.ts`, `general-terms.ts`
5. **Add marker-stripping step**: Add a post-translation cleanup step that strips any remaining AI_MUST_REPLACE markers and logs a warning, so that they cannot reach production
6. **Recompile workflow**: After updating `news-translate.md`, run `gh aw compile news-translate`
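
The marker-stripping step (item 5) could look roughly like the following. This is a sketch under assumptions, not an existing script: `stripMarkers` and `StripResult` are hypothetical, and because the exact marker syntax is not specified here, the code removes whole HTML comments that contain the marker and, as a fallback, the bare token when it appears in visible text.

```typescript
// Hypothetical cleanup step; the name and result shape are assumptions.
const MARKER = "AI_MUST_REPLACE";

interface StripResult {
  html: string;       // cleaned HTML
  removed: number;    // comments plus bare tokens removed
  warnings: string[]; // one message per affected file, for the CI log
}

function stripMarkers(html: string, filename: string): StripResult {
  let removedComments = 0;
  let removedTokens = 0;

  // 1. Drop every HTML comment that contains the marker; keep other comments.
  const withoutComments = html.replace(/<!--[\s\S]*?-->/g, (comment) => {
    if (comment.indexOf(MARKER) === -1) return comment;
    removedComments += 1;
    return "";
  });

  // 2. Drop marker tokens that escaped their comment wrapper. Any instruction
  //    text that followed the token is left in place and needs manual review.
  const cleaned = withoutComments.replace(/AI_MUST_REPLACE/g, () => {
    removedTokens += 1;
    return "";
  });

  const warnings: string[] = [];
  if (removedComments + removedTokens > 0) {
    warnings.push(
      `${filename}: stripped ${removedComments} marker comment(s) and ` +
        `${removedTokens} visible marker token(s)`,
    );
  }
  return { html: cleaned, removed: removedComments + removedTokens, warnings };
}
```

Stripping is a safety net rather than a fix: a stripped marker means a section that was supposed to receive generated content is now empty, so the warning should still be surfaced, and the CI check from item 1 should run before this step rather than after it.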

## 🤖 Recommended Agent
**quality-engineer** — Best expertise in validation pipelines, content quality gates, and testing

## ✅ Acceptance Criteria
- [ ] Zero AI_MUST_REPLACE markers in all newly generated articles (EN + all translations)
- [ ] `validate-news-translations.ts` exits with code 1 when markers detected
- [ ] `check-banned-patterns.ts` detects markers in HTML comments
- [ ] Translation prompt explicitly handles AI_MUST_REPLACE replacement
- [ ] `translation-dictionary.ts` split into ≤4 manageable domain files
- [ ] `news-translate.md` updated and recompiled to `.lock.yml`
- [ ] All existing tests pass (`npx vitest run`)
- [ ] Validation scripts correctly flag existing articles with markers

## 📚 References
- Translation Validator: `scripts/validate-news-translations.ts` (v3.0)
- Swedish Leakage: `scripts/detect-swedish-leakage.ts`
- Banned Patterns: `scripts/check-banned-patterns.ts`
- Translation Dict: `scripts/translation-dictionary.ts`
- Shared Markers: `scripts/data-transformers/content-generators/shared.ts` (17 markers)
- Validation Shell: `scripts/validate-news-generation.sh` (Check 15)
- Architecture: [ARCHITECTURE.md](https://github.com/Hack23/riksdagsmonitor/blob/main/ARCHITECTURE.md)

## 🏷️ Labels
`translation`, `validation`, `bug`, `code-quality`
