Commit 3b4c57f

Merge pull request #1660 from Hack23/copilot/improve-translation-quality
Improve translation quality: enforce AI_MUST_REPLACE CI failure, extend banned-pattern detection, split translation dictionary
2 parents: 17cc36d + 76868ec

8 files changed: 1380 additions, 1102 deletions
.github/workflows/news-translate.md

Lines changed: 33 additions & 0 deletions
@@ -654,6 +654,39 @@ The TypeScript script generates **structural baselines only** — it translates
 3. **Raw Swedish API text** — Interpellation excerpts, proposition summaries that come from the Riksdag API are often pasted as-is. You MUST translate these to the target language or summarize them.
 4. **English boilerplate phrases** — Remove or translate phrases like "Read the full proposition", "Live intelligence platform for Swedish Parliament monitoring"
 5. **Section headings** that were not covered by CONTENT_LABELS (e.g., specific policy domain names used as h3/h4 headings)
+6. **🚨 AI_MUST_REPLACE HTML comments** — SCAN every HTML comment in the **translated** article. If any contains `AI_MUST_REPLACE`, you MUST generate replacement content in the target language. See critical section below.
+
+#### 🚨 CRITICAL: AI_MUST_REPLACE Comment Handling
+
+The content generator embeds placeholder HTML comments in the form:
+```html
+<!-- AI_MUST_REPLACE: marker_name — DATA: hint text. Write specific analysis here. Output MUST be in the article's language. -->
+```
+
+**These comments MUST be replaced with real content before publication.** Leaving them in the article is a hard CI failure (exit 1). The translation workflow MUST:
+
+1. **SCAN every HTML comment** in the translated article for `AI_MUST_REPLACE`
+2. **For each marker found**, read the `DATA:` hint inside the comment to understand what content to generate
+3. **Replace the entire `<!-- AI_MUST_REPLACE ... -->` comment** with genuine, specific analysis written in the **target language** (not English)
+4. **Use actual document data** (party names, vote counts, document titles) — NOT generic templates
+5. **Verify zero markers remain** before creating a PR
+
+**Detection command (run before PR creation):**
+```bash
+grep -r 'AI_MUST_REPLACE' news/${ARTICLE_DATE}-*-${lang}.html && echo "❌ MARKERS FOUND — must replace before PR" || echo "✅ No markers found"
+```
+
+**Common marker types and required output:**
+- `timeline_context` → Analysis of scheduling significance and political timing
+- `why_matters` → Specific explanation of why these documents matter politically
+- `political_impact` → Named-party analysis of political impact with vote arithmetic
+- `consequences` → Specific implementation consequences and next steps
+- `coalition_instability` → Current coalition stability indicators with evidence
+- `critical_assessment` → Critical evaluation of intent vs. likely outcomes
+- `single_party_dominance` → Analysis of why one party dominates
+- `debate_analysis` → Insights from debate data
+- `majority_impact` → Effect of thin majority on specific legislation
+- `winners_losers_analysis` → Political winners and losers analysis
 
 #### Translation Completeness Check Process:
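
The handling steps added in this hunk (scan, read the `DATA:` hint, replace the whole comment, verify none remain) could be sketched in TypeScript roughly as follows. This is an illustration only: `Marker`, `MARKER_RE`, `findMarkers` and `replaceMarkers` are hypothetical names that do not appear in the commit, and the pattern assumes marker names consist of letters, digits and underscores, as in the listed marker types.

```typescript
// Hypothetical sketch; none of these names come from the repository.
interface Marker {
  name: string; // marker name, e.g. "why_matters"
  hint: string; // text following "DATA:" inside the comment ('' if absent)
  raw: string;  // the full HTML comment as found in the article
}

// One HTML comment whose body starts with "AI_MUST_REPLACE: <name>".
// (?:(?!-->)[\s\S])* consumes characters without stepping over a "-->",
// so a match always stays inside a single comment.
const MARKER_RE =
  /<!--\s*AI_MUST_REPLACE:\s*([A-Za-z0-9_]+)((?:(?!-->)[\s\S])*)-->/;

/** Steps 1 and 2: find every marker and extract its DATA hint. */
function findMarkers(html: string): Marker[] {
  const re = new RegExp(MARKER_RE.source, 'g');
  const markers: Marker[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(html)) !== null) {
    const body = m[2];
    const at = body.indexOf('DATA:');
    const hint = at >= 0 ? body.slice(at + 'DATA:'.length).trim() : '';
    markers.push({ name: m[1], hint, raw: m[0] });
  }
  return markers;
}

/**
 * Step 3: swap each marker comment for generated content, keyed by marker
 * name. Markers without supplied content are left in place so that a later
 * check still reports them.
 */
function replaceMarkers(html: string, content: Record<string, string>): string {
  const re = new RegExp(MARKER_RE.source, 'g');
  return html.replace(re, (raw: string, name: string) =>
    name in content ? content[name] : raw,
  );
}
```

Under these assumptions, an empty result from `findMarkers` on the final HTML corresponds to step 5, verifying that zero markers remain before a PR is created.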

scripts/check-banned-patterns.ts

Lines changed: 11 additions & 0 deletions
@@ -3,6 +3,7 @@
  *
  * Sources banned patterns from the canonical BANNED_PATTERNS list in shared.ts
  * so the bash script does not maintain a duplicate pattern set.
+ * Also detects unresolved AI_MUST_REPLACE markers in HTML comments.
  *
  * Usage: npx tsx scripts/check-banned-patterns.ts news/*.html
  * Exit code: number of articles containing banned patterns (0 = clean)
@@ -12,13 +13,23 @@
 import { readFileSync } from 'fs';
 import { detectBannedPatterns } from './data-transformers/content-generators/shared.js';
 
+/** Regex for unresolved AI_MUST_REPLACE placeholders in HTML comments. */
+const AI_MUST_REPLACE_RE = /<!--[\s\S]*?AI_MUST_REPLACE[\s\S]*?-->/;
+
 const files = process.argv.slice(2);
 let count = 0;
 
 for (const file of files) {
   try {
     const html = readFileSync(file, 'utf-8');
     const labels = detectBannedPatterns(html);
+
+    // Detect AI_MUST_REPLACE markers inside HTML comments — these are
+    // unresolved template placeholders that must never reach production.
+    if (AI_MUST_REPLACE_RE.test(html)) {
+      labels.push('aiMustReplaceComment: Unresolved AI_MUST_REPLACE placeholder in HTML comment');
+    }
+
     if (labels.length > 0) {
       count++;
       // Machine-readable output for the bash wrapper
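
One property of `AI_MUST_REPLACE_RE` that may be worth knowing when reading this hunk: because `[\s\S]*?` can step over a `-->`, a match may start at an unrelated earlier comment and end at a later one, as long as `AI_MUST_REPLACE` appears anywhere in between. For a CI gate that errs on the strict side, which may well be intended. The sketch below compares it with a hypothetical variant, `STRICT_RE` (not part of the commit), that only matches when the token sits inside a single comment.

```typescript
// Pattern as it appears in the diff.
const AI_MUST_REPLACE_RE = /<!--[\s\S]*?AI_MUST_REPLACE[\s\S]*?-->/;

// Hypothetical stricter variant: (?:(?!-->)[\s\S])* cannot cross a "-->",
// so the whole match is confined to one HTML comment.
const STRICT_RE =
  /<!--(?:(?!-->)[\s\S])*?AI_MUST_REPLACE(?:(?!-->)[\s\S])*?-->/;

// Token inside a comment: both patterns match.
const insideComment = '<!-- AI_MUST_REPLACE: why_matters DATA: x -->';
// Token in visible text between two unrelated comments.
const betweenComments = '<!-- nav --><p>AI_MUST_REPLACE</p><!-- footer -->';

console.log(AI_MUST_REPLACE_RE.test(insideComment));   // true
console.log(STRICT_RE.test(insideComment));            // true
console.log(AI_MUST_REPLACE_RE.test(betweenComments)); // true
console.log(STRICT_RE.test(betweenComments));          // false
```

Either behavior flags the article in the second case if the broader pattern is used; the choice only matters if the token is ever expected to appear legitimately outside a comment.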
Lines changed: 199 additions & 0 deletions
@@ -0,0 +1,199 @@
+/**
+ * @module Translation Dictionary — Committee Names
+ * @description Committee names and Swedish parliamentary institution names
+ * for all 14 supported languages.
+ *
+ * Split from translation-dictionary.ts for maintainability.
+ * Imported and combined in translation-dictionary.ts.
+ */
+
+import type { Language } from './types/language.js';
+
+/**
+ * Swedish parliamentary institution names and committee names.
+ * Each entry: [Swedish term, per-language translations].
+ */
+export const COMMITTEE_NAME_TERMS: ReadonlyArray<readonly [string, Record<Language, string>]> = [
+  // ---- Swedish parliamentary institution names ----
+  [
+    'riksdagen',
+    {
+      sv: 'riksdagen', en: 'the Riksdag', da: 'Riksdag', no: 'Riksdag',
+      fi: 'Riksdag', de: 'Riksdag', fr: 'le Riksdag', es: 'el Riksdag',
+      nl: 'de Riksdag', ar: 'البرلمان السويدي', he: 'הריקסדאג',
+      ja: 'スウェーデン国会', ko: '스웨덴 의회', zh: '瑞典议会',
+    },
+  ],
+  [
+    'regeringen',
+    {
+      sv: 'regeringen', en: 'the Government', da: 'regeringen', no: 'regjeringen',
+      fi: 'hallitus', de: 'die Regierung', fr: 'le gouvernement', es: 'el gobierno',
+      nl: 'de regering', ar: 'الحكومة', he: 'הממשלה',
+      ja: '政府', ko: '정부', zh: '政府',
+    },
+  ],
+  // ---- Committee names ----
+  [
+    'arbetsmarknadsutskottet',
+    {
+      sv: 'arbetsmarknadsutskottet', en: 'Committee on Labour Market Affairs',
+      da: 'Arbejdsmarkedsudvalget', no: 'Arbeidsmarkedskomiteen',
+      fi: 'Työvaliokunta', de: 'Ausschuss für Arbeitsmarktangelegenheiten',
+      fr: 'Comité du marché du travail', es: 'Comité de Mercado Laboral',
+      nl: 'Commissie voor Arbeidsmarktzaken', ar: 'لجنة سوق العمل',
+      he: 'ועדת שוק העבודה', ja: '労働市場委員会', ko: '노동시장위원회', zh: '劳动市场委员会',
+    },
+  ],
+  [
+    'civilutskottet',
+    {
+      sv: 'civilutskottet', en: 'Committee on Civil Affairs',
+      da: 'Civiludvalget', no: 'Sivilkomiteen', fi: 'Siviiliasioiden valiokunta',
+      de: 'Ausschuss für Zivilrecht', fr: 'Comité des affaires civiles',
+      es: 'Comité de Asuntos Civiles', nl: 'Commissie voor Burgerlijke Zaken',
+      ar: 'لجنة الشؤون المدنية', he: 'ועדת ענייני אזרחות',
+      ja: '市民問題委員会', ko: '민사문제위원회', zh: '民事委员会',
+    },
+  ],
+  [
+    'finansutskottet',
+    {
+      sv: 'finansutskottet', en: 'Committee on Finance',
+      da: 'Finansudvalget', no: 'Finanskomiteen', fi: 'Valtiovarainvaliokunta',
+      de: 'Finanzausschuss', fr: 'Comité des finances',
+      es: 'Comité de Finanzas', nl: 'Financiëncommissie',
+      ar: 'لجنة المالية', he: 'ועדת האוצר',
+      ja: '財政委員会', ko: '재정위원회', zh: '财政委员会',
+    },
+  ],
+  [
+    'försvarsutskottet',
+    {
+      sv: 'försvarsutskottet', en: 'Committee on Defence',
+      da: 'Forsvarsudvalget', no: 'Forsvarskomiteen', fi: 'Puolustusvaliokunta',
+      de: 'Verteidigungsausschuss', fr: 'Comité de la défense',
+      es: 'Comité de Defensa', nl: 'Defensiecommissie',
+      ar: 'لجنة الدفاع', he: 'ועדת הביטחון', ja: '防衛委員会', ko: '방위위원회', zh: '国防委员会',
+    },
+  ],
+  [
+    'justitieutskottet',
+    {
+      sv: 'justitieutskottet', en: 'Committee on Justice',
+      da: 'Retsudvalget', no: 'Justiskomiteen', fi: 'Lakivaliokunta',
+      de: 'Rechtsausschuss', fr: 'Comité de justice',
+      es: 'Comité de Justicia', nl: 'Justitiecommissie',
+      ar: 'لجنة العدل', he: 'ועדת המשפטים', ja: '司法委員会', ko: '법무위원회', zh: '司法委员会',
+    },
+  ],
+  [
+    'konstitutionsutskottet',
+    {
+      sv: 'konstitutionsutskottet', en: 'Committee on the Constitution',
+      da: 'Forfatningsudvalget', no: 'Konstitusjonskomiteen', fi: 'Perustuslakivaliokunta',
+      de: 'Verfassungsausschuss', fr: 'Comité de la Constitution',
+      es: 'Comité Constitucional', nl: 'Grondwetcommissie',
+      ar: 'لجنة الدستور', he: 'ועדת החוקה', ja: '憲法委員会', ko: '헌법위원회', zh: '宪法委员会',
+    },
+  ],
+  [
+    'kulturutskottet',
+    {
+      sv: 'kulturutskottet', en: 'Committee on Cultural Affairs',
+      da: 'Kulturudvalget', no: 'Kulturkomiteen', fi: 'Kulttuurivaliokunta',
+      de: 'Kulturausschuss', fr: 'Comité de la culture',
+      es: 'Comité de Cultura', nl: 'Cultuurcommissie',
+      ar: 'لجنة الثقافة', he: 'ועדת התרבות', ja: '文化委員会', ko: '문화위원회', zh: '文化委员会',
+    },
+  ],
+  [
+    'miljö- och jordbruksutskottet',
+    {
+      sv: 'miljö- och jordbruksutskottet', en: 'Committee on Environment and Agriculture',
+      da: 'Miljø- og Landbrugsudvalget', no: 'Miljø- og Landbrukskomiteen',
+      fi: 'Ympäristö- ja maatalousvaliokunta',
+      de: 'Ausschuss für Umwelt und Landwirtschaft',
+      fr: 'Comité de l\'environnement et de l\'agriculture',
+      es: 'Comité de Medio Ambiente y Agricultura',
+      nl: 'Commissie voor Milieu en Landbouw',
+      ar: 'لجنة البيئة والزراعة', he: 'ועדת הסביבה והחקלאות',
+      ja: '環境農業委員会', ko: '환경농업위원회', zh: '环境农业委员会',
+    },
+  ],
+  [
+    'näringsutskottet',
+    {
+      sv: 'näringsutskottet', en: 'Committee on Industry and Trade',
+      da: 'Erhvervsudvalget', no: 'Næringskomiteen', fi: 'Talousvaliokunta',
+      de: 'Ausschuss für Wirtschaft und Handel', fr: 'Comité de l\'industrie et du commerce',
+      es: 'Comité de Industria y Comercio', nl: 'Commissie voor Industrie en Handel',
+      ar: 'لجنة الصناعة والتجارة', he: 'ועדת התעשייה והמסחר',
+      ja: '産業貿易委員会', ko: '산업통상위원회', zh: '工业贸易委员会',
+    },
+  ],
+  [
+    'skatteutskottet',
+    {
+      sv: 'skatteutskottet', en: 'Committee on Taxation',
+      da: 'Skatteudvalget', no: 'Skattekomiteen', fi: 'Verovaliokunta',
+      de: 'Steuerausschuss', fr: 'Comité de la fiscalité',
+      es: 'Comité Fiscal', nl: 'Belastingcommissie',
+      ar: 'لجنة الضرائب', he: 'ועדת המיסים', ja: '税制委員会', ko: '세금위원회', zh: '税务委员会',
+    },
+  ],
+  [
+    'socialförsäkringsutskottet',
+    {
+      sv: 'socialförsäkringsutskottet', en: 'Committee on Social Insurance',
+      da: 'Socialforsikringsudvalget', no: 'Sosialforsikringskomiteen',
+      fi: 'Sosiaalivakuutusvaliokunta',
+      de: 'Ausschuss für Sozialversicherung', fr: 'Comité de l\'assurance sociale',
+      es: 'Comité de Seguro Social', nl: 'Commissie voor Sociale Verzekering',
+      ar: 'لجنة التأمين الاجتماعي', he: 'ועדת הביטוח הסוציאלי',
+      ja: '社会保険委員会', ko: '사회보험위원회', zh: '社会保险委员会',
+    },
+  ],
+  [
+    'socialutskottet',
+    {
+      sv: 'socialutskottet', en: 'Committee on Social Affairs',
+      da: 'Socialudvalget', no: 'Sosialkomiteen', fi: 'Sosiaaliasioiden valiokunta',
+      de: 'Sozialausschuss', fr: 'Comité des affaires sociales',
+      es: 'Comité de Asuntos Sociales', nl: 'Sociale Commissie',
+      ar: 'لجنة الشؤون الاجتماعية', he: 'ועדת הרווחה',
+      ja: '社会問題委員会', ko: '사회문제위원회', zh: '社会事务委员会',
+    },
+  ],
+  [
+    'trafikutskottet',
+    {
+      sv: 'trafikutskottet', en: 'Committee on Transport',
+      da: 'Trafikudvalget', no: 'Transportkomiteen', fi: 'Liikennevaliokunta',
+      de: 'Verkehrsausschuss', fr: 'Comité des transports',
+      es: 'Comité de Transporte', nl: 'Transportcommissie',
+      ar: 'لجنة المواصلات', he: 'ועדת התחבורה', ja: '交通委員会', ko: '교통위원회', zh: '交通委员会',
+    },
+  ],
+  [
+    'utbildningsutskottet',
+    {
+      sv: 'utbildningsutskottet', en: 'Committee on Education',
+      da: 'Uddannelsesudvalget', no: 'Utdanningskomiteen', fi: 'Koulutusvaliokunta',
+      de: 'Bildungsausschuss', fr: 'Comité de l\'éducation',
+      es: 'Comité de Educación', nl: 'Onderwijscommissie',
+      ar: 'لجنة التعليم', he: 'ועדת החינוך', ja: '教育委員会', ko: '교육위원회', zh: '教育委员会',
+    },
+  ],
+  [
+    'utrikesutskottet',
+    {
+      sv: 'utrikesutskottet', en: 'Committee on Foreign Affairs',
+      da: 'Udenrigsudvalget', no: 'Utenrikskomiteen', fi: 'Ulkoasiainvaliokunta',
+      de: 'Außenpolitischer Ausschuss', fr: 'Comité des affaires étrangères',
+      es: 'Comité de Asuntos Exteriores', nl: 'Commissie voor Buitenlandse Zaken',
+      ar: 'لجنة الشؤون الخارجية', he: 'ועדת החוץ',
+      ja: '外務委員会', ko: '외무위원회', zh: '外交委员会',
+    },
+  ],
+];
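
One way a consumer might use a table shaped like `COMMITTEE_NAME_TERMS` is to index it by Swedish term and look up a translation per language. The sketch below is an assumption-laden illustration: it uses a reduced three-language `Language` union (the real type lives in `./types/language.js` and covers 14 languages) and two entries copied from the diff. `TermEntry`, `SAMPLE_TERMS`, `buildIndex` and `translateTerm` are hypothetical names, and how translation-dictionary.ts actually combines and applies these entries is not shown in this commit view.

```typescript
// Assumption: a reduced Language union for the sake of a short example.
type Language = 'sv' | 'en' | 'de';

// Same tuple shape as in the diff: [Swedish term, per-language translations].
type TermEntry = readonly [string, Record<Language, string>];

// Two entries copied from the diff, trimmed to three languages.
const SAMPLE_TERMS: ReadonlyArray<TermEntry> = [
  ['finansutskottet', { sv: 'finansutskottet', en: 'Committee on Finance', de: 'Finanzausschuss' }],
  ['försvarsutskottet', { sv: 'försvarsutskottet', en: 'Committee on Defence', de: 'Verteidigungsausschuss' }],
];

/** Hypothetical helper: index entries by lower-cased Swedish term. */
function buildIndex(
  terms: ReadonlyArray<TermEntry>,
): Map<string, Record<Language, string>> {
  const index = new Map<string, Record<Language, string>>();
  terms.forEach(([term, translations]) => {
    index.set(term.toLowerCase(), translations);
  });
  return index;
}

/** Hypothetical helper: returns undefined when the term is not in the table. */
function translateTerm(
  index: Map<string, Record<Language, string>>,
  term: string,
  lang: Language,
): string | undefined {
  const entry = index.get(term.toLowerCase());
  return entry ? entry[lang] : undefined;
}

const index = buildIndex(SAMPLE_TERMS);
console.log(translateTerm(index, 'Finansutskottet', 'en')); // Committee on Finance
console.log(translateTerm(index, 'okänt utskott', 'de'));   // undefined
```

Typing the translations as `Record<Language, string>` means the compiler rejects an entry that omits any supported language, which is presumably part of why the commit keeps that shape after the split.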
