Merged
15 commits
12 changes: 9 additions & 3 deletions .github/workflows/news-evening-analysis.md
@@ -300,7 +300,7 @@ fi

### MCP Health Gate

- Call `get_sync_status({})` first; retry up to 3× (30s wait). After 3 failures → `safeoutputs___noop({"message": "MCP unavailable"})`. All content MUST come from live MCP data.
+ STEP 1: ALWAYS check data freshness first — call `get_sync_status({})` to warm up MCP and check stale data. Retry up to 3× (30s wait). After 3 failures → `safeoutputs___noop({"message": "MCP unavailable"})`. All content MUST come from live MCP data.
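
The health gate above amounts to a bounded retry with a fixed wait. A minimal TypeScript sketch of that control flow follows; `retryWithWait`, `mcpHealthGate`, and the injected `getSyncStatus`/`noop` callbacks are hypothetical names standing in for the real MCP tool calls, not part of the workflow itself.

```typescript
// Hypothetical helper: call an async function up to maxAttempts times,
// waiting waitMs between failed attempts.
type Attempt<T> = () => Promise<T>;

type RetryResult<T> = { ok: true; value: T } | { ok: false; error: unknown };

async function retryWithWait<T>(
  call: Attempt<T>,
  maxAttempts: number,
  waitMs: number,
): Promise<RetryResult<T>> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return { ok: true, value: await call() };
    } catch (error) {
      lastError = error;
      // Only wait when another attempt will follow.
      if (attempt < maxAttempts) {
        await new Promise<void>((resolve) => setTimeout(resolve, waitMs));
      }
    }
  }
  return { ok: false, error: lastError };
}

// Assumed wiring: 3 attempts, 30 s apart; emit the noop message when all fail.
async function mcpHealthGate(
  getSyncStatus: Attempt<unknown>,
  noop: (payload: { message: string }) => void,
  waitMs: number = 30000,
): Promise<boolean> {
  const result = await retryWithWait(getSyncStatus, 3, waitMs);
  if (!result.ok) {
    noop({ message: 'MCP unavailable' });
    return false;
  }
  return true;
}
```

Returning a boolean rather than throwing lets the caller stop article generation cleanly once the noop has been emitted.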

### DATA FRESHNESS CHECK

@@ -329,15 +329,18 @@ const recent = results.filter(item =>
);
```

- **Tools with native date params:** `get_calendar_events` (`from`/`tom`), `search_regering` + `analyze_g0v_by_department` (`dateFrom`/`dateTo`).
- **Tools requiring post-query filter:** `search_voteringar` (`datum`), `get_betankanden` (`publicerad`), `get_motioner` (`inlämnad`), `get_propositioner` (`publicerad`), `search_anforanden` (`datum`).
+ **Tools with native date params:** `get_calendar_events` supports `from`/`tom`, `search_regering` + `analyze_g0v_by_department` supports `dateFrom`/`dateTo`.
+ **Tools requiring post-query filter:** `search_voteringar` (filter by `datum`), `get_betankanden` (filter by `publicerad`), `get_motioner` (filter by `inlämnad`), `get_propositioner` (filter by `publicerad`), `search_anforanden` (filter by `datum`).
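
For the tools that need a post-query filter, the field names listed above can drive one shared helper. The sketch below is an illustration only: `filterByDateField` and `POST_QUERY_DATE_FIELD` are hypothetical names, and it assumes each record carries its date as an ISO-like string whose first ten characters are `YYYY-MM-DD`.

```typescript
// Field to filter on for each tool, copied from the list above.
const POST_QUERY_DATE_FIELD: Record<string, string> = {
  search_voteringar: 'datum',
  get_betankanden: 'publicerad',
  get_motioner: 'inlämnad',
  get_propositioner: 'publicerad',
  search_anforanden: 'datum',
};

// Hypothetical helper: keep records whose date field falls in [fromDateIso, toDateIso].
// Records with a missing or non-string date are dropped rather than assumed recent.
function filterByDateField<T extends Record<string, unknown>>(
  items: readonly T[],
  field: string,
  fromDateIso: string,
  toDateIso?: string,
): T[] {
  return items.filter((item) => {
    const raw = item[field];
    if (typeof raw !== 'string') return false;
    const day = raw.slice(0, 10); // YYYY-MM-DD strings compare correctly as text
    return day >= fromDateIso && (toDateIso === undefined || day <= toDateIso);
  });
}

const sampleReports = [
  { beteckning: 'A1', publicerad: '2026-04-09T10:00:00' },
  { beteckning: 'A2', publicerad: '2026-03-01' },
  { beteckning: 'A3', publicerad: null },
];
const recentReports = filterByDateField(
  sampleReports,
  POST_QUERY_DATE_FIELD['get_betankanden'],
  '2026-04-03',
);
```

Dropping undated records is a design choice for this sketch; the workflow may prefer to keep them and flag the gap instead.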

### ⚠️ Calendar API Fallback

`get_calendar_events` intermittently returns HTML. If it fails: (1) do NOT treat failure as "no events"; (2) use `search_dokument({ from_date, to_date, doktyp: "bet" })` as a proxy for active parliamentary work; (3) flag the error in output. Calendar failure must never block article generation from other sources.
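
The three fallback rules above can be sketched as two pure functions: one that tells a failed response apart from a genuinely empty calendar, and one that decides whether to query the proxy. Everything here is an assumption for illustration: the raw response arriving as a string, events arriving as a JSON array, and the names `classifyCalendarResponse` and `planCalendarFallback`.

```typescript
type CalendarOutcome =
  | { kind: 'events'; events: unknown[] }
  | { kind: 'failed'; reason: string };

// Assumed input: the raw response body as text.
function classifyCalendarResponse(raw: string): CalendarOutcome {
  const trimmed = raw.trim();
  if (trimmed.startsWith('<')) {
    return { kind: 'failed', reason: 'HTML returned instead of JSON' };
  }
  try {
    const parsed: unknown = JSON.parse(trimmed);
    return Array.isArray(parsed)
      ? { kind: 'events', events: parsed }
      : { kind: 'failed', reason: 'Unexpected JSON shape' };
  } catch {
    return { kind: 'failed', reason: 'Unparseable response' };
  }
}

interface FallbackPlan {
  useProxy: boolean;
  proxyQuery?: { from_date: string; to_date: string; doktyp: 'bet' };
  warnings: string[];
}

// A failure never becomes "no events": it switches to the search_dokument
// proxy query and records a warning to surface in the output.
function planCalendarFallback(
  outcome: CalendarOutcome,
  fromDate: string,
  toDate: string,
): FallbackPlan {
  if (outcome.kind === 'events') {
    return { useProxy: false, warnings: [] };
  }
  return {
    useProxy: true,
    proxyQuery: { from_date: fromDate, to_date: toDate, doktyp: 'bet' },
    warnings: [`Calendar API error: ${outcome.reason}`],
  };
}
```

An empty array still classifies as `events`, so a quiet week and a broken endpoint stay distinguishable downstream.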

### Cross-Referencing Strategy

Cross-reference related data sources to produce richer analysis. Combine committee reports, voting records, propositions, and motions for comprehensive coverage.

**Example 1:** Link committee reports with voting records to show how parties voted on specific policy areas:
```javascript
// Committee Report Deep Dive
const fromDateIso = new Date(Date.now() - 7 * 86400000).toISOString().slice(0, 10);
@@ -346,7 +349,10 @@ const reports = (await get_betankanden({ rm: currentRm }))
for (const report of reports) {
  const votes = await search_voteringar({ bet: report.beteckning });
}
```

**Example 2:** Cross-reference government propositions with press releases and party speeches:
```javascript
// Government Activity Analysis
const props = (await get_propositioner({ rm: currentRm, limit: 20 }))
  .filter(p => (p.publicerad || '').slice(0, 10) >= fromDateIso);
201 changes: 201 additions & 0 deletions scripts/article-quality-enhancer.ts
@@ -62,6 +62,9 @@ const DEFAULT_THRESHOLDS: QualityThresholds = {
  recommendInternationalComparison: false,
  recommendEconomicContext: true,
  recommendSCBContext: true,
  recommendWhatHappensNext: true,
  recommendWinnersLosers: true,
  minSpecificClaims: 3,
};

/**
@@ -179,6 +182,174 @@ function hasWhyThisMatters(content: string): boolean {
  return patterns.some((pattern: RegExp) => pattern.test(content));
}

/**
* Detect "What Happens Next" timeline section.
* Looks for the rendered section class or heading variants across supported languages.
* Covers: en, sv, da, no, fi, de, fr, es, nl, ar, he, ja, ko, zh.
*
* @param content - HTML content of article
* @returns True if the section is present
*/
function hasWhatHappensNext(content: string): boolean {
Comment on lines +185 to +193
Copilot AI, Apr 10, 2026:
New validation helpers (hasWhatHappensNext, hasWinnersLosers, countSpecificClaims, hasSubstantialLede, countSectionWords) are introduced here, but there don’t appear to be corresponding unit tests (existing suites already test other article-quality-enhancer helpers). Adding focused tests for positive/negative matches would prevent regressions in these regex-based detectors.

  const patterns: readonly RegExp[] = [
    /class=["'][^"']*\bwhat-happens-next\b/,
    /what\s+happens\s+next/i,
    /vad\s+händer\s+härnäst/i,
    /hvad\s+sker\s+der\s+nu/i,
    /hva\s+skjer\s+videre/i,
    /mitä\s+tapahtuu\s+seuraavaksi/i,
    /was\s+passiert\s+als\s+nächstes/i,
    /la\s+suite\s+des\s+événements/i,
    /qué\s+sucede\s+a\s+continuación/i,
    /wat\s+gebeurt\s+er\s+nu/i,
    /ماذا يحدث بعد ذلك/,
    /מה קורה בהמשך/,
    /次のステップ/,
    /다음\s+단계/,
    /下一步/,
  ];
  return patterns.some((p: RegExp) => p.test(content));
}
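
The kind of focused positive/negative check a reviewer might want for this detector can be sketched as a small table of cases. The block below is a hedged illustration, not the project's test suite: `hasWhatHappensNextSubset` is a hypothetical reduced copy that keeps only the class marker and the English and Swedish patterns from the function above.

```typescript
// Reduced copy of the detector for illustration (three of the patterns above).
const WHAT_HAPPENS_NEXT_SUBSET: readonly RegExp[] = [
  /class=["'][^"']*\bwhat-happens-next\b/,
  /what\s+happens\s+next/i,
  /vad\s+händer\s+härnäst/i,
];

function hasWhatHappensNextSubset(content: string): boolean {
  return WHAT_HAPPENS_NEXT_SUBSET.some((p: RegExp) => p.test(content));
}

// Each case pairs an HTML fragment with the expected detection result.
const detectorCases: ReadonlyArray<[string, boolean]> = [
  ['<section class="timeline what-happens-next">', true], // class marker among other classes
  ['<h2>What  Happens Next</h2>', true],                  // case and spacing variations
  ['<h2>Vad händer härnäst</h2>', true],                  // Swedish heading
  ['<h2>What happened before</h2>', false],               // unrelated heading
  ['<section class="what-happens-nextish">', false],      // word boundary blocks a longer class name
];

const detectorFailures = detectorCases.filter(
  ([html, expected]) => hasWhatHappensNextSubset(html) !== expected,
);
```

A table like this makes it cheap to add one row per supported language as the pattern list grows.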

/**
* Detect "Winners & Losers" analysis section.
* Looks for the rendered section class or heading variants across supported languages.
* Covers: en, sv, da, no, fi, de, fr, es, nl, ar, he, ja, ko, zh.
*
* @param content - HTML content of article
* @returns True if the section is present
*/
Comment on lines +185 to +221
Copilot AI, Apr 10, 2026:
hasWhatHappensNext()/hasWinnersLosers() docstrings claim heading detection “in 14 languages”, but the regex lists only cover a subset of languages. Either update the comment to match the current behavior, or add the missing language title variants so the documentation stays accurate (especially if the class-based marker is ever absent).

function hasWinnersLosers(content: string): boolean {
  const patterns: readonly RegExp[] = [
    /class=["'][^"']*\bwinners-losers\b/,
    /winners\s*(?:&|and)\s*losers/i,
    /vinnare\s+och\s+förlorare/i,
    /vindere\s+og\s+tabere/i,
    /vinnere\s+og\s+tapere/i,
    /voittajat\s+ja\s+häviäjät/i,
    /gewinner\s+und\s+verlierer/i,
    /gagnants\s+et\s+perdants/i,
    /ganadores\s+y\s+perdedores/i,
    /winnaars\s+en\s+verliezers/i,
    /الرابحون والخاسرون/,
    /מנצחים ומפסידים/,
    /勝者と敗者/,
    /승자와\s+패자/,
    /赢家与输家/,
  ];
  return patterns.some((p: RegExp) => p.test(content));
}

/**
* Count approximate words in a specific HTML section identified by its CSS class.
* Returns 0 if the section is not found.
*
* @param content - Full HTML content of article
* @param sectionClass - CSS class of the target section element
* @returns Estimated word count within the section
*/
function countSectionWords(content: string, sectionClass: string): number {
  // Find the opening tag with the requested class, then scan forward to the
  // matching closing tag while tracking nested <section>/<div> elements.
  const escapedClass = sectionClass.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const openingTagPattern = new RegExp(
    `<(section|div)\\b[^>]*class=(?:"|')[^"']*\\b${escapedClass}\\b[^"']*(?:"|')[^>]*>`,
    'i',
  );
  const openingMatch = openingTagPattern.exec(content);
  if (!openingMatch || openingMatch.index < 0) return 0;

  const rootTag = openingMatch[1].toLowerCase();
  const contentStart = openingMatch.index + openingMatch[0].length;
  const tagPattern = /<\/?(section|div)\b[^>]*>/gi;
  tagPattern.lastIndex = contentStart;

  const stack: string[] = [rootTag];
  let tagMatch: RegExpExecArray | null = tagPattern.exec(content);

  while (tagMatch) {
    const matchedTag = tagMatch[0];
    const tagName = tagMatch[1].toLowerCase();
    const isClosingTag = matchedTag.startsWith('</');

    if (!isClosingTag) {
      stack.push(tagName);
    } else if (stack.length > 0 && stack[stack.length - 1] === tagName) {
      stack.pop();
      if (stack.length === 0) {
        const innerHtml = content.slice(contentStart, tagMatch.index);
        const text = innerHtml.replace(/<[^>]+>/g, ' ').replace(/\s+/g, ' ').trim();
        return text.split(' ').filter((w: string) => w.length > 0).length;
      }
    }

    tagMatch = tagPattern.exec(content);
  }

  return 0;
}

/**
* Validate that the lede paragraph has sufficient depth (minimum 30 words).
* A lede shorter than this is likely a stub or template placeholder.
*
* @param content - HTML content of article
* @returns True if lede meets the minimum word count
*/
function hasSubstantialLede(content: string): boolean {
  const ledeMatch = content.match(/<p[^>]*class=["'][^"']*\blede\b[^"']*["'][^>]*>([\s\S]*?)<\/p>/i);
  if (!ledeMatch?.[1]) return false;
  const text = ledeMatch[1].replace(/<[^>]+>/g, ' ').replace(/\s+/g, ' ').trim();
  return text.split(' ').filter((w: string) => w.length > 0).length >= 30;
}
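
To see where the 30-word boundary falls, the lede check can be exercised on generated paragraphs. The sketch below repeats the function body from the diff under the hypothetical name `ledeHasThirtyWords`; `makeWords` is an illustrative helper that produces placeholder words.

```typescript
// Same logic as hasSubstantialLede above, repeated so the example is self-contained.
function ledeHasThirtyWords(content: string): boolean {
  const ledeMatch = content.match(/<p[^>]*class=["'][^"']*\blede\b[^"']*["'][^>]*>([\s\S]*?)<\/p>/i);
  if (!ledeMatch?.[1]) return false;
  const text = ledeMatch[1].replace(/<[^>]+>/g, ' ').replace(/\s+/g, ' ').trim();
  return text.split(' ').filter((w: string) => w.length > 0).length >= 30;
}

// Illustrative helper: n space-separated placeholder words.
function makeWords(n: number): string {
  return Array.from({ length: n }, (_, i) => `word${i}`).join(' ');
}

const ledeAtThreshold = ledeHasThirtyWords(`<p class="lede">${makeWords(30)}</p>`);
const ledeBelowThreshold = ledeHasThirtyWords(`<p class="lede">${makeWords(29)}</p>`);
// Inline markup is stripped before counting, so tags do not add words.
const ledeWithMarkup = ledeHasThirtyWords(
  `<p class="lede"><strong>${makeWords(15)}</strong> ${makeWords(15)}</p>`,
);
// A long paragraph without the lede class is not treated as a lede at all.
const paragraphWithoutClass = ledeHasThirtyWords(`<p>${makeWords(40)}</p>`);
```

Because a missing lede paragraph and a short one both return false, the resulting warning cannot tell the two cases apart.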

/**
* Count specific claim indicators in article text.
* Looks for patterns that indicate potentially verifiable, specific content:
* - Explicit document references (Prop., Bet., Mot., IP)
* - Percentage figures
* - Named MPs or ministers matching the pattern "Firstname Lastname (Party)"
*
* @param content - HTML content of article
* @returns Number of detected specific claim indicators
*/
function countSpecificClaims(content: string): number {
  const text = stripHtml(content);
  let count = 0;

  // Count unique normalized document IDs rather than every occurrence to
  // prevent repeated mentions of the same citation from inflating the score.
  // Cap at 5, consistent with other claim signals below.
  const uniqueDocumentReferences = new Set<string>();
  DOCUMENT_ID_PATTERNS.forEach((pattern: RegExp) => {
    const flags = pattern.global ? pattern.flags : `${pattern.flags}g`;
    const globalPattern = new RegExp(pattern.source, flags);

    for (const match of text.matchAll(globalPattern)) {
      const documentReference = match[0]?.trim();
      if (documentReference) {
        uniqueDocumentReferences.add(documentReference.replace(/\s+/g, ' ').toLowerCase());
      }
    }
  });
Comment on lines +324 to +334
Copilot AI, Apr 10, 2026:
countSpecificClaims() counts every occurrence of a document ID (and doesn’t dedupe), so repeating the same reference can satisfy minSpecificClaims without adding new verifiable claims. Consider counting unique document IDs (e.g., via a Set) and/or capping doc-reference contribution similar to the percent/name caps, so the threshold reflects distinct claims rather than repetition.

Suggested change
-  DOCUMENT_ID_PATTERNS.forEach((pattern: RegExp) => {
-    const matches = text.match(pattern);
-    count += matches ? matches.length : 0;
-  });
+  // Count unique normalized document IDs rather than every occurrence to
+  // prevent repeated mentions of the same citation from inflating the score.
+  // Cap at 5, consistent with other claim signals below.
+  const uniqueDocumentReferences = new Set<string>();
+  DOCUMENT_ID_PATTERNS.forEach((pattern: RegExp) => {
+    const flags = pattern.flags.includes('g') ? pattern.flags : `${pattern.flags}g`;
+    const globalPattern = new RegExp(pattern.source, flags);
+    for (const match of text.matchAll(globalPattern)) {
+      const documentReference = match[0]?.trim();
+      if (documentReference) {
+        uniqueDocumentReferences.add(documentReference.replace(/\s+/g, ' ').toLowerCase());
+      }
+    }
+  });
+  count += Math.min(uniqueDocumentReferences.size, 5);

  count += Math.min(uniqueDocumentReferences.size, 5);

  // Percentage figures (e.g. "12%", "3.5%")
  // Cap at 5 to prevent a heavily statistics-driven article from
  // single-handedly satisfying the minSpecificClaims threshold via
  // repetitive figures alone (e.g. budget tables with 20+ percentages).
  const percentMatches = text.match(/\b\d+(?:\.\d+)?%/g);
Copilot AI, Apr 10, 2026:
The comment says “Percentage figures with surrounding context (e.g. "increased by 12%")”, but the implementation counts any percentage token via /\b\d+(?:\.\d+)?%/g (no context check). Either adjust the comment to match the current behavior, or implement the intended context requirement to avoid over-counting unrelated percentages (e.g. in tables).

Suggested change
-  const percentMatches = text.match(/\b\d+(?:\.\d+)?%/g);
+  const percentMatches = text.match(
+    /\b(?:increased?|decreased?|rose|risen|fall(?:s|en)?|fell|dropped?|declined?|grew|growth|shrank|reduced?|up|down|change(?:d)?|gain(?:ed)?|loss(?:es)?|surge(?:d)?|jump(?:ed)?|improv(?:ed|ement)|worsen(?:ed|ing)?|inflation|unemployment|approval|support)\s+(?:by|of|to|at)?\s*\d+(?:\.\d+)?%/gi,
+  );

  count += percentMatches ? Math.min(percentMatches.length, 5) : 0;

  // Named MPs or ministers (Swedish name pattern: "Firstname Lastname (Party)")
  // Capped at 5 for the same reason as percentage matches: prevents a roster
  // of names without substantive claims from satisfying the threshold.
  const namedActors = text.match(/[A-ZÅÄÖ][a-zåäö]+\s+[A-ZÅÄÖ][a-zåäö]+\s*\([A-ZÅÄÖ]{1,3}\)/g);
  count += namedActors ? Math.min(namedActors.length, 5) : 0;

  return count;
}
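
The effect of deduplicating and capping document references can be shown in isolation. This is a sketch under stated assumptions: `DOCUMENT_ID_PATTERNS` is not shown in this diff, so `EXAMPLE_DOCUMENT_ID_PATTERNS` below is an invented stand-in, and `countUniqueDocumentReferences` is a hypothetical extraction that uses `String.prototype.match` with a global regex in place of `matchAll`.

```typescript
// Invented stand-ins for the real DOCUMENT_ID_PATTERNS, which this diff does not show.
const EXAMPLE_DOCUMENT_ID_PATTERNS: readonly RegExp[] = [
  /\bProp\.\s*\d{4}\/\d{2}:\d+/gi,
  /\bBet\.\s*\d{4}\/\d{2}:[A-Za-zÅÄÖåäö]+\d+/gi,
];

// Count distinct references after normalizing whitespace and case, capped at `cap`.
function countUniqueDocumentReferences(
  text: string,
  patterns: readonly RegExp[],
  cap: number = 5,
): number {
  const unique = new Set<string>();
  for (const pattern of patterns) {
    // Force the global flag so match() returns every occurrence.
    const flags = pattern.global ? pattern.flags : `${pattern.flags}g`;
    const matches = text.match(new RegExp(pattern.source, flags)) ?? [];
    for (const match of matches) {
      unique.add(match.trim().replace(/\s+/g, ' ').toLowerCase());
    }
  }
  return Math.min(unique.size, cap);
}

// Three mentions, but only two distinct documents.
const repeatedCitations =
  'Prop. 2025/26:100 was debated. Critics of prop.  2025/26:100 cite Bet. 2025/26:FiU20.';
const distinctReferences = countUniqueDocumentReferences(
  repeatedCitations,
  EXAMPLE_DOCUMENT_ID_PATTERNS,
);

// Eight distinct propositions still contribute at most five.
const manyCitations = [1, 2, 3, 4, 5, 6, 7, 8].map((n) => `Prop. 2025/26:${n}`).join(' ');
const cappedReferences = countUniqueDocumentReferences(manyCitations, EXAMPLE_DOCUMENT_ID_PATTERNS);
```

Normalizing before insertion into the set is what makes `Prop. 2025/26:100` and `prop.  2025/26:100` collapse into one entry.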

/**
* Detect historical context
*
@@ -337,6 +508,10 @@ export async function enhanceArticleQuality(
    hasLanguageSwitcher: hasLanguageSwitcher(content),
    hasArticleTopNav: hasArticleTopNav(content),
    hasBackToNews: hasBackToNews(content),
    hasWhatHappensNext: hasWhatHappensNext(content),
    hasWinnersLosers: hasWinnersLosers(content),
    specificClaimsCount: countSpecificClaims(content),
    hasSubstantialLede: hasSubstantialLede(content),
  };

  // Calculate overall score
@@ -369,6 +544,15 @@ export async function enhanceArticleQuality(
    issues.push('Missing required historical context (at least one historical comparison required)');
  }

  // Content depth validation — minimum specific claims
  const minClaims = options.minSpecificClaims ?? 0;
  if (minClaims > 0 && (metrics.specificClaimsCount ?? 0) < minClaims) {
    issues.push(
      `Only ${metrics.specificClaimsCount ?? 0} specific verifiable claims detected (need ${minClaims}). ` +
        'Add document references, percentage figures, or named actors with party attributions.',
    );
  }

  // Separate warnings (recommendations) from blocking failures
  const warnings: string[] = [];

@@ -389,6 +573,18 @@ export async function enhanceArticleQuality(
    warnings.push('Recommended: Add Swedish statistical context (SCB official statistics)');
  }

  if (options.recommendWhatHappensNext && !metrics.hasWhatHappensNext) {
    warnings.push('Recommended: Add "What Happens Next" timeline section with legislative pipeline dates');
  }

  if (options.recommendWinnersLosers && !metrics.hasWinnersLosers) {
    warnings.push('Recommended: Add "Winners & Losers" section naming political actors with evidence');
  }

  if (!metrics.hasSubstantialLede) {
    warnings.push('Lede paragraph appears thin (< 30 words) — consider expanding with the most newsworthy fact');
  }

  if (metrics.hasStatisticalClaims) {
    warnings.push('Info: Article contains statistical claims — consider adding fact-check section');
  }
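
The split between blocking issues and advisory warnings in the hunks above can be condensed into a small standalone function. This is a reduced sketch, not the real `enhanceArticleQuality`: the `DepthMetrics` and `DepthOptions` shapes, the function name `evaluateDepth`, and the shortened messages are all assumptions made for illustration.

```typescript
// Hypothetical, reduced shapes; the real metrics and options types carry more fields.
interface DepthMetrics {
  specificClaimsCount?: number;
  hasWhatHappensNext: boolean;
  hasWinnersLosers: boolean;
  hasSubstantialLede: boolean;
}

interface DepthOptions {
  minSpecificClaims?: number;
  recommendWhatHappensNext?: boolean;
  recommendWinnersLosers?: boolean;
}

function evaluateDepth(
  metrics: DepthMetrics,
  options: DepthOptions,
): { issues: string[]; warnings: string[] } {
  const issues: string[] = [];
  const warnings: string[] = [];

  // Blocking: too few specific claims, but only when a minimum is configured.
  const minClaims = options.minSpecificClaims ?? 0;
  const claims = metrics.specificClaimsCount ?? 0;
  if (minClaims > 0 && claims < minClaims) {
    issues.push(`Only ${claims} specific claims detected (need ${minClaims})`);
  }

  // Advisory: recommended sections and a thin lede never block publication.
  if (options.recommendWhatHappensNext && !metrics.hasWhatHappensNext) {
    warnings.push('Recommended: add "What Happens Next"');
  }
  if (options.recommendWinnersLosers && !metrics.hasWinnersLosers) {
    warnings.push('Recommended: add "Winners & Losers"');
  }
  if (!metrics.hasSubstantialLede) {
    warnings.push('Lede paragraph appears thin');
  }

  return { issues, warnings };
}

const thinDraft = evaluateDepth(
  { specificClaimsCount: 1, hasWhatHappensNext: false, hasWinnersLosers: true, hasSubstantialLede: false },
  { minSpecificClaims: 3, recommendWhatHappensNext: true, recommendWinnersLosers: true },
);
```

Leaving `minSpecificClaims` unset disables the blocking check entirely, which matches the `?? 0` default in the diff.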
@@ -517,11 +713,16 @@ export {
  countPartyPerspectives,
  countCrossReferences,
  hasWhyThisMatters,
  hasWhatHappensNext,
  hasWinnersLosers,
  hasHistoricalContext,
  hasInternationalComparison,
  hasLanguageSwitcher,
  hasArticleTopNav,
  hasBackToNews,
  countSpecificClaims,
  hasSubstantialLede,
  countSectionWords,
  calculateQualityScore,
  DEFAULT_THRESHOLDS,
};
103 changes: 103 additions & 0 deletions scripts/article-template/constants.ts
@@ -148,6 +148,109 @@ export const WATCH_SECTION_TITLES: Record<Language, string> = {
  zh: '本周关注要点'
};
};

/**
* "What Happens Next" timeline section titles for all 14 languages.
* Renders the legislative pipeline with dates and significance indicators.
*/
export const WHAT_HAPPENS_NEXT_TITLES: Record<Language, string> = {
  en: 'What Happens Next',
  sv: 'Vad händer härnäst',
  da: 'Hvad sker der nu',
  no: 'Hva skjer videre',
  fi: 'Mitä tapahtuu seuraavaksi',
  de: 'Was passiert als Nächstes',
  fr: 'La suite des événements',
  es: 'Qué sucede a continuación',
  nl: 'Wat gebeurt er nu',
  ar: 'ماذا يحدث بعد ذلك',
  he: 'מה קורה בהמשך',
  ja: '次のステップ',
  ko: '다음 단계',
  zh: '下一步将会发生什么'
};

/**
* "Winners & Losers" analysis section titles for all 14 languages.
* Names political actors with evidence-backed outcome assessments.
*/
export const WINNERS_LOSERS_TITLES: Record<Language, string> = {
  en: 'Winners & Losers',
  sv: 'Vinnare och förlorare',
  da: 'Vindere og tabere',
  no: 'Vinnere og tapere',
  fi: 'Voittajat ja häviäjät',
  de: 'Gewinner und Verlierer',
  fr: 'Gagnants et perdants',
  es: 'Ganadores y perdedores',
  nl: 'Winnaars en verliezers',
  ar: 'الرابحون والخاسرون',
  he: 'מנצחים ומפסידים',
  ja: '勝者と敗者',
  ko: '승자와 패자',
  zh: '赢家与输家'
};
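
One way to keep the title constants and the detector regexes in step is a coverage check that runs every localized title through the heading patterns. The sketch below copies the titles above and the heading patterns from `hasWinnersLosers`; the names `TITLES_UNDER_TEST`, `HEADING_PATTERNS`, and `undetectedLanguages` are hypothetical.

```typescript
// Titles copied from WINNERS_LOSERS_TITLES above.
const TITLES_UNDER_TEST: Record<string, string> = {
  en: 'Winners & Losers',
  sv: 'Vinnare och förlorare',
  da: 'Vindere og tabere',
  no: 'Vinnere og tapere',
  fi: 'Voittajat ja häviäjät',
  de: 'Gewinner und Verlierer',
  fr: 'Gagnants et perdants',
  es: 'Ganadores y perdedores',
  nl: 'Winnaars en verliezers',
  ar: 'الرابحون والخاسرون',
  he: 'מנצחים ומפסידים',
  ja: '勝者と敗者',
  ko: '승자와 패자',
  zh: '赢家与输家',
};

// Heading patterns copied from hasWinnersLosers (class marker omitted on purpose,
// so the check exercises the text fallback only).
const HEADING_PATTERNS: readonly RegExp[] = [
  /winners\s*(?:&|and)\s*losers/i,
  /vinnare\s+och\s+förlorare/i,
  /vindere\s+og\s+tabere/i,
  /vinnere\s+og\s+tapere/i,
  /voittajat\s+ja\s+häviäjät/i,
  /gewinner\s+und\s+verlierer/i,
  /gagnants\s+et\s+perdants/i,
  /ganadores\s+y\s+perdedores/i,
  /winnaars\s+en\s+verliezers/i,
  /الرابحون والخاسرون/,
  /מנצחים ומפסידים/,
  /勝者と敗者/,
  /승자와\s+패자/,
  /赢家与输家/,
];

function matchesAnyHeading(text: string): boolean {
  return HEADING_PATTERNS.some((p: RegExp) => p.test(text));
}

// Languages whose title would not be detected by the heading patterns alone.
const undetectedLanguages: string[] = Object.keys(TITLES_UNDER_TEST).filter(
  (lang) => !matchesAnyHeading(TITLES_UNDER_TEST[lang]),
);

// An HTML-escaped English heading, as it might appear in rendered markup.
const escapedHeadingDetected = matchesAnyHeading('Winners &amp; Losers');
```

With the titles as written, every language is detected. The escaped English heading is not matched by the text patterns, so if the renderer escapes the ampersand, detection for English would rest on the class marker.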

/**
* FAQ section titles for all 14 languages.
*/
export const FAQ_SECTION_TITLES: Record<Language, string> = {
  en: 'Frequently Asked Questions',
  sv: 'Vanliga frågor',
  da: 'Ofte stillede spørgsmål',
  no: 'Vanlige spørsmål',
  fi: 'Usein kysytyt kysymykset',
  de: 'Häufig gestellte Fragen',
  fr: 'Questions fréquemment posées',
  es: 'Preguntas frecuentes',
  nl: 'Veelgestelde vragen',
  ar: 'الأسئلة المتكررة',
  he: 'שאלות נפוצות',
  ja: 'よくある質問',
  ko: '자주 묻는 질문',
  zh: '常见问题'
};

/**
* Significance labels for "What Happens Next" timeline items.
* Maps significance level to a localized label (High / Medium / Low).
*/
export const SIGNIFICANCE_LABELS: Record<Language, Record<'high' | 'medium' | 'low', string>> = {
  en: { high: 'High', medium: 'Medium', low: 'Low' },
  sv: { high: 'Hög', medium: 'Medel', low: 'Låg' },
  da: { high: 'Høj', medium: 'Mellem', low: 'Lav' },
  no: { high: 'Høy', medium: 'Middels', low: 'Lav' },
  fi: { high: 'Korkea', medium: 'Keskitaso', low: 'Matala' },
  de: { high: 'Hoch', medium: 'Mittel', low: 'Niedrig' },
  fr: { high: 'Élevé', medium: 'Moyen', low: 'Faible' },
  es: { high: 'Alto', medium: 'Medio', low: 'Bajo' },
  nl: { high: 'Hoog', medium: 'Gemiddeld', low: 'Laag' },
  ar: { high: 'عالٍ', medium: 'متوسط', low: 'منخفض' },
  he: { high: 'גבוה', medium: 'בינוני', low: 'נמוך' },
  ja: { high: '高', medium: '中', low: '低' },
  ko: { high: '높음', medium: '보통', low: '낮음' },
  zh: { high: '高', medium: '中', low: '低' },
};
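
How a significance label might be looked up and rendered for one timeline entry is sketched below. None of this markup comes from the PR: `renderTimelineItem`, the `TimelineItem` shape, the CSS class names, and the two-language `DEMO_SIGNIFICANCE_LABELS` subset are assumptions chosen to keep the example short.

```typescript
type Significance = 'high' | 'medium' | 'low';
type DemoLanguage = 'en' | 'sv';

// Two-language subset of the labels above, for illustration.
const DEMO_SIGNIFICANCE_LABELS: Record<DemoLanguage, Record<Significance, string>> = {
  en: { high: 'High', medium: 'Medium', low: 'Low' },
  sv: { high: 'Hög', medium: 'Medel', low: 'Låg' },
};

// Hypothetical shape of one timeline entry.
interface TimelineItem {
  date: string; // ISO date, e.g. 2026-04-15
  event: string;
  significance: Significance;
}

function escapeHtml(value: string): string {
  return value
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Assumed markup: one list item with a machine-readable date and a localized label.
function renderTimelineItem(lang: DemoLanguage, item: TimelineItem): string {
  const label = DEMO_SIGNIFICANCE_LABELS[lang][item.significance];
  return (
    `<li class="significance-${item.significance}">` +
    `<time datetime="${escapeHtml(item.date)}">${escapeHtml(item.date)}</time> ` +
    `${escapeHtml(item.event)} ` +
    `<span class="significance">${escapeHtml(label)}</span></li>`
  );
}

const renderedItem = renderTimelineItem('sv', {
  date: '2026-04-15',
  event: 'Votering i kammaren',
  significance: 'high',
});
```

Typing the lookup as a nested `Record` means a missing language or level is a compile-time error rather than an `undefined` label at render time.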

/**
* Winners/Losers outcome labels for all 14 languages.
*/
export const OUTCOME_LABELS: Record<Language, Record<'wins' | 'loses' | 'mixed', string>> = {
  en: { wins: 'Wins', loses: 'Loses', mixed: 'Mixed' },
  sv: { wins: 'Vinner', loses: 'Förlorar', mixed: 'Blandat' },
  da: { wins: 'Vinder', loses: 'Taber', mixed: 'Blandet' },
  no: { wins: 'Vinner', loses: 'Taper', mixed: 'Blandet' },
  fi: { wins: 'Voittaa', loses: 'Häviää', mixed: 'Sekoitettu' },
  de: { wins: 'Gewinnt', loses: 'Verliert', mixed: 'Gemischt' },
  fr: { wins: 'Gagne', loses: 'Perd', mixed: 'Mitigé' },
  es: { wins: 'Gana', loses: 'Pierde', mixed: 'Mixto' },
  nl: { wins: 'Wint', loses: 'Verliest', mixed: 'Gemengd' },
  ar: { wins: 'يفوز', loses: 'يخسر', mixed: 'مختلط' },
  he: { wins: 'מנצח', loses: 'מפסיד', mixed: 'מעורב' },
  ja: { wins: '勝ち', loses: '負け', mixed: '混在' },
  ko: { wins: '승리', loses: '패배', mixed: '혼합' },
  zh: { wins: '获益', loses: '受损', mixed: '复杂' },
};

/**
* Locale map for Intl date formatting across all 14 languages
*/