Issue Type
Bug Fix / Quality Improvement
Objective
Eliminate the persistent AI_MUST_REPLACE marker leakage in translated (non-EN) articles and strengthen the translation validation pipeline so that content quality issues are caught before articles reach production. Currently 35.8% of April 2026 articles (146 of 408) still contain unresolved markers.
Current State
- AI_MUST_REPLACE leakage: 146 of 408 April 2026 articles (35.8%) contain AI_MUST_REPLACE markers
- EN articles: clean, 0 markers in 2026-04-10-committee-reports-en.html and 2026-04-09-opposition-motions-en.html
- Translated articles: ES and FI articles have 2 markers each (e.g., 2026-04-09-opposition-motions-es.html, 2026-04-09-opposition-motions-fi.html)
- Pattern: EN articles are clean; AI_MUST_REPLACE markers leak into translations when AI agents translate template comments instead of replacing them
- Translation validation: validate-news-translations.ts v3.0 warns on EN/SV body-content leakage but does NOT fail CI (exit 0)
- Swedish leakage detector: detect-swedish-leakage.ts (366 lines) detects SV content in non-SV articles
- Translation dictionary: translation-dictionary.ts (3,673 lines) is a massive file that is hard to maintain
- Banned patterns: check-banned-patterns.ts checks for banned generic template text, sourcing its patterns from BANNED_PATTERNS in shared.ts
- shared.ts markers: 17 comment-only AI_MUST_REPLACE HTML markers in content generator functions, designed to be invisible, but translations sometimes expose them
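The leakage figures above can be reproduced with a small tally. The following is a minimal sketch, assuming articles are already loaded into memory; the `Article` shape and the function names are hypothetical and do not come from the existing scripts. The language code is read from the `-xx.html` filename suffix seen in the examples above.

```typescript
// Hypothetical article shape; the real pipeline reads HTML files from disk.
interface Article {
  filename: string; // e.g. "2026-04-09-opposition-motions-es.html"
  html: string;
}

const MARKER = "AI_MUST_REPLACE";

// Extract the language code from the "-xx.html" filename suffix.
function languageOf(filename: string): string {
  const match = /-([a-z]{2})\.html$/.exec(filename);
  return match?.[1] ?? "unknown";
}

// Count articles that contain the marker, overall and per language.
function leakageReport(articles: Article[]) {
  const perLanguage: Record<string, number> = {};
  let leaking = 0;
  for (const article of articles) {
    if (!article.html.includes(MARKER)) continue;
    leaking += 1;
    const lang = languageOf(article.filename);
    perLanguage[lang] = (perLanguage[lang] ?? 0) + 1;
  }
  const rate = articles.length === 0 ? 0 : leaking / articles.length;
  return { total: articles.length, leaking, rate, perLanguage };
}

// Format a ratio as a percentage with one decimal place.
function formatRate(rate: number): string {
  return `${(rate * 100).toFixed(1)}%`;
}
```

With the numbers reported in this issue, `formatRate(146 / 408)` yields `35.8%`.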
Desired State
- Zero AI_MUST_REPLACE markers in all published articles (EN and translated)
- CI-enforced validation: validate-news-translations.ts should fail CI (exit 1) when AI_MUST_REPLACE markers are detected, not just warn
- Improved translation workflow: the news-translate.md workflow prompt should explicitly instruct the AI to detect and replace ALL AI_MUST_REPLACE comments during translation
- Better marker detection: strengthen check-banned-patterns.ts to catch markers inside HTML comments (<!-- AI_MUST_REPLACE -->), not just visible text
- Translation dictionary improvements: better coverage for political terminology across all 14 languages
- Content leakage prevention: strengthen EN→non-EN content leakage detection to catch English text in translated articles
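One way to approach the EN→non-EN leakage check is a function-word ratio per paragraph. This is only a rough sketch under stated assumptions: the stopword list, the 0.25 threshold, the 8-token minimum, and all function names are illustrative guesses, not taken from detect-swedish-leakage.ts or any other existing script. Some listed words also exist in other languages, so false positives are possible.

```typescript
// Small illustrative list of English function words (an assumption).
const ENGLISH_STOPWORDS = new Set([
  "the", "and", "of", "that", "with", "this", "which", "have",
  "has", "are", "from", "been", "were", "their", "will", "would",
]);

// Remove comments and tags so that only visible text is scored.
function visibleText(html: string): string {
  return html.replace(/<!--[\s\S]*?-->/g, " ").replace(/<[^>]+>/g, " ");
}

// Share of tokens in a text that are English function words.
function englishRatio(text: string): number {
  const tokens = text.toLowerCase().split(/[^a-zà-ÿ]+/).filter(Boolean);
  if (tokens.length === 0) return 0;
  const hits = tokens.filter((t) => ENGLISH_STOPWORDS.has(t)).length;
  return hits / tokens.length;
}

// Return paragraphs that look English. Threshold and minimum length are guesses.
function findEnglishLeakage(html: string, threshold = 0.25, minTokens = 8): string[] {
  return html
    .split(/<\/p>/i)
    .map((block) => visibleText(block).trim())
    .filter((text) => text.split(/\s+/).length >= minTokens)
    .filter((text) => englishRatio(text) >= threshold);
}
```

Short blocks are skipped because a ratio over a handful of tokens is too noisy to act on. Proper nouns and quoted English titles would still need an allowlist.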
Implementation Approach
Files to modify (NO overlap with other issues):
- scripts/validate-news-translations.ts: enforce CI failure on AI_MUST_REPLACE detection
- scripts/detect-swedish-leakage.ts: extend to detect any source-language leakage patterns
- scripts/translation-dictionary.ts: improve political terminology coverage, split into manageable sections
- scripts/check-banned-patterns.ts: extend to detect markers inside HTML comments
- scripts/validate-translations.ts: improve validation coverage
- scripts/statistical-claims-detector.ts: validate that statistical claims survive translation
- .github/workflows/news-translate.md: improve translation prompt to explicitly handle AI_MUST_REPLACE markers
- .github/workflows/news-translate.lock.yml: recompile after prompt update
- scripts/validate-news-generation.sh: update Check 15 for stricter enforcement
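For the statistical-claims item in the file list above, one possible check is to compare the numeric tokens of the source article with those of its translation. This is a minimal sketch, not the logic of statistical-claims-detector.ts; the function names are hypothetical, and it assumes only the decimal separator and the spacing before `%` differ between languages. Thousands separators and spelled-out numbers are not handled.

```typescript
// Pull numeric tokens (integers, decimals, percentages) out of text and
// normalise them so that "35,8 %" and "35.8%" compare equal.
function extractNumbers(text: string): string[] {
  const matches = text.match(/\d+(?:[.,]\d+)?\s?%?/g) ?? [];
  return matches
    .map((m) => m.replace(/\s/g, "").replace(",", "."))
    .sort();
}

// Return the numbers present in the source but missing from the translation.
function missingNumbers(source: string, translated: string): string[] {
  const remaining = extractNumbers(translated);
  const missing: string[] = [];
  for (const n of extractNumbers(source)) {
    const index = remaining.indexOf(n);
    if (index === -1) missing.push(n);
    else remaining.splice(index, 1); // consume each match only once
  }
  return missing;
}
```

A non-empty result from `missingNumbers` would indicate that a figure was dropped or altered during translation and could be reported alongside the other validation findings.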
Key improvements:
- Enforce CI failure: change validate-news-translations.ts to exit with code 1 when any article contains AI_MUST_REPLACE markers (currently warns only, exit 0)
- Extend banned pattern detection: update check-banned-patterns.ts to scan HTML comments for AI_MUST_REPLACE markers using regex <!--[^>]*AI_MUST_REPLACE[^>]*-->
- Improve translation prompt: add an explicit instruction in news-translate.md: "SCAN every HTML comment in the source article. If any contains 'AI_MUST_REPLACE', you MUST generate replacement content in the target language."
- Split translation dictionary: break translation-dictionary.ts (3,673 lines) into domain-specific files: political-terms.ts, committee-names.ts, party-names.ts, general-terms.ts
- Add marker-stripping step: add a post-translation cleanup step that strips any remaining AI_MUST_REPLACE markers with a warning, preventing them from reaching production
- Recompile workflow: after updating news-translate.md, run gh aw compile news-translate
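The detection, CI-failure, and stripping steps above could fit together roughly as follows. This is a sketch under assumptions: it uses the regex quoted in this issue, but the `MarkerScan` shape and the function names are hypothetical and do not reflect the current code in check-banned-patterns.ts or validate-news-translations.ts. Because `[^>]*` stops at the first `>`, a comment containing `>` before the marker will not match, so a bare-token check is kept as a fallback.

```typescript
// Regex quoted in the issue, plus a fallback for markers outside comments.
const COMMENT_MARKER = /<!--[^>]*AI_MUST_REPLACE[^>]*-->/g;
const BARE_MARKER = /AI_MUST_REPLACE/g;

interface MarkerScan {
  inComments: number; // HTML comments that contain the marker
  exposed: number;    // marker tokens outside any matched comment
}

// Count comment-wrapped markers and markers exposed in visible text.
function scanMarkers(html: string): MarkerScan {
  const inComments = (html.match(COMMENT_MARKER) ?? []).length;
  const withoutComments = html.replace(COMMENT_MARKER, "");
  const exposed = (withoutComments.match(BARE_MARKER) ?? []).length;
  return { inComments, exposed };
}

// Exit code a validator could return: 1 if any article has any marker.
function exitCodeFor(scans: MarkerScan[]): 0 | 1 {
  return scans.some((s) => s.inComments + s.exposed > 0) ? 1 : 0;
}

// Post-translation cleanup: drop comment markers and report how many were removed.
function stripMarkers(html: string): { html: string; removed: number } {
  const removed = (html.match(COMMENT_MARKER) ?? []).length;
  return { html: html.replace(COMMENT_MARKER, ""), removed };
}
```

In this sketch the cleanup step removes only comment-wrapped markers. Exposed markers sit in visible text, so silently deleting them could leave broken sentences; they are left in place so that the validator still fails on them.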
Recommended Agent
quality-engineer: best expertise in validation pipelines, content quality gates, and testing
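Acceptance Criteria
- validate-news-translations.ts exits with code 1 when markers are detected
- check-banned-patterns.ts detects markers in HTML comments
- translation-dictionary.ts split into ≤4 manageable domain files
- news-translate.md updated and recompiled to .lock.yml
- Tests pass (npx vitest run)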
References
- Translation Validator: scripts/validate-news-translations.ts (v3.0)
- Swedish Leakage: scripts/detect-swedish-leakage.ts
- Banned Patterns: scripts/check-banned-patterns.ts
- Translation Dict: scripts/translation-dictionary.ts
- Shared Markers: scripts/data-transformers/content-generators/shared.ts (17 markers)
- Validation Shell: scripts/validate-news-generation.sh (Check 15)
- Architecture: ARCHITECTURE.md
Labels
translation, validation, bug, code-quality