Feature: Phrase-Level Syllabic Homogeneity Scorer #77

@craigtrim

Overview

Given a list of text phrases (candidate parallel units), compute the total syllable count of each phrase and the coefficient of variation (CV: standard deviation divided by mean) across those totals. This is the syllabic validation layer for tricolon candidate scoring.
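To make the metric concrete, here is a minimal sketch of the CV calculation applied to the two example tricolons used in this issue. It assumes the population standard deviation (the existing _compute_rhythmic_regularity may use a different convention) and assumes syllable counts of 1, 1, 5 for "quick", "smart", "extraordinary" and 1, 1, 2 for "came", "saw", "conquered". The helper name coefficient_of_variation is illustrative only.

```python
import statistics


def coefficient_of_variation(counts):
    """Population standard deviation divided by the mean (assumed convention)."""
    return statistics.pstdev(counts) / statistics.fmean(counts)


# Assumed per-phrase syllable totals for the two examples in this issue.
uneven_cv = coefficient_of_variation([1, 1, 5])  # quick, smart, extraordinary
even_cv = coefficient_of_variation([1, 1, 2])    # came, saw, conquered

print(round(uneven_cv, 3))  # 0.808
print(round(even_cv, 3))    # 0.354
```

Under the proposed 1.0 / (1.0 + CV) normalization, these map to scores of roughly 0.55 and 0.74, so the "low" and "high" bands in the acceptance criteria would need thresholds somewhere between those values.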

Why It's Needed

"quick, smart, and extraordinary" passes the surface tricolon test but is prosodically broken: the third item carries roughly five times the syllabic weight of each of the first two. A rhythmically coherent tricolon has roughly equal syllabic weight across all three members. Without this check, the candidate gate lets prosodically unbalanced candidates through as false positives.

What Already Exists

All building blocks are in prosody/rhythm_prosody.py:

  • _count_syllables(word) — per-word syllable count via CMU dict with heuristic fallback
  • _compute_rhythmic_regularity(syllable_counts) — CV calculation already implemented

The gap: neither function operates on phrases. There is no wrapper that takes a list of phrase strings, totals syllables per phrase, and computes CV across the set.

Proposed Implementation

def compute_syllabic_homogeneity(phrases: list[str]) -> SyllabicHomogeneityResult:
    # 1. For each phrase: tokenize → sum per-word syllable counts → phrase weight
    # 2. CV across phrase weights → syllable_cv
    # 3. 1.0 / (1.0 + CV) → homogeneity_score in (0, 1] (higher = more uniform)
    raise NotImplementedError  # outline only; body to be written

Output

SyllabicHomogeneityResult:
  phrase_syllable_counts: list[int]   # total syllables per phrase
  syllable_cv: float                  # CV across phrase counts
  homogeneity_score: float            # 0-1, higher = more uniform
  outlier_index: int | None           # index of phrase with anomalous weight, if any

outlier_index is important: it tells the tricolon scorer which member is breaking the beat, not just that one is.
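The outline and output spec above could be fleshed out roughly as follows. This is a sketch under stated assumptions, not the final implementation: the syllable counter is injected as a parameter so the block runs standalone (in the repository it would be _count_syllables from prosody/rhythm_prosody.py), the CV uses the population standard deviation, and the outlier rule with its min_gap and min_rel_gap thresholds is hypothetical, since the issue does not specify one.

```python
import re
import statistics
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class SyllabicHomogeneityResult:
    phrase_syllable_counts: list  # total syllables per phrase
    syllable_cv: float            # CV across phrase counts
    homogeneity_score: float      # in (0, 1], higher = more uniform
    outlier_index: Optional[int]  # index of the anomalous phrase, if any


def compute_syllabic_homogeneity(
    phrases,
    count_syllables: Callable[[str], int],
    min_gap: int = 2,          # hypothetical: minimum absolute gap from the median
    min_rel_gap: float = 0.5,  # hypothetical: minimum gap relative to the median
) -> SyllabicHomogeneityResult:
    # Phrase weight = sum of per-word syllable counts over simple word tokens.
    counts = [
        sum(count_syllables(tok) for tok in re.findall(r"[A-Za-z']+", phrase))
        for phrase in phrases
    ]

    # CV is undefined for fewer than two phrases or a zero mean; treat as 0.0.
    mean = statistics.fmean(counts) if counts else 0.0
    cv = statistics.pstdev(counts) / mean if len(counts) > 1 and mean > 0 else 0.0
    score = 1.0 / (1.0 + cv)

    # Assumed outlier rule: the phrase farthest from the median is flagged
    # only if the gap is large both in absolute and in relative terms.
    outlier_index = None
    if len(counts) >= 3:
        median = statistics.median(counts)
        gaps = [abs(c - median) for c in counts]
        worst = max(range(len(counts)), key=gaps.__getitem__)
        if gaps[worst] >= min_gap and gaps[worst] > min_rel_gap * median:
            outlier_index = worst

    return SyllabicHomogeneityResult(counts, cv, score, outlier_index)


# Hypothetical lookup standing in for _count_syllables in this demo only.
DEMO_SYLLABLES = {
    "quick": 1, "smart": 1, "extraordinary": 5,
    "came": 1, "saw": 1, "conquered": 2,
}


def demo_counter(word: str) -> int:
    return DEMO_SYLLABLES[word.lower()]


uneven = compute_syllabic_homogeneity(["quick", "smart", "extraordinary"], demo_counter)
even = compute_syllabic_homogeneity(["came", "saw", "conquered"], demo_counter)
print(uneven.outlier_index, round(uneven.homogeneity_score, 2))  # 2 0.55
print(even.outlier_index, round(even.homogeneity_score, 2))      # None 0.74
```

Injecting the counter keeps the wrapper free of any duplicated syllable logic and makes it easy to test with a fixed lookup. Whether the CV step should call _compute_rhythmic_regularity directly depends on what that function returns, which this issue does not state.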

Acceptance Criteria

  • ["quick", "smart", "extraordinary"] → low homogeneity score, outlier_index=2
  • ["came", "saw", "conquered"] → high homogeneity score, no outlier
  • Reuses existing _count_syllables without duplication
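One point to keep in mind when choosing the outlier rule, which the criteria above leave open: a plain z-score threshold cannot separate the two examples. With three values where two are equal, the odd one out always has a population z-score of √2, however large the gap. The short check below illustrates this, using the same assumed syllable counts (1, 1, 5 and 1, 1, 2); max_z_score is an illustrative helper, not an existing function.

```python
import statistics


def max_z_score(counts):
    """Largest absolute population z-score among the counts."""
    mean = statistics.fmean(counts)
    std = statistics.pstdev(counts)
    return max(abs(c - mean) / std for c in counts)


# Both tricolons yield the same maximum z-score despite very different gaps.
z_uneven = max_z_score([1, 1, 5])  # quick, smart, extraordinary
z_even = max_z_score([1, 1, 2])    # came, saw, conquered
print(round(z_uneven, 3))  # 1.414
print(round(z_even, 3))    # 1.414
```

A rule based on the gap from the median, or on the ratio between the largest and smallest phrase weights, would distinguish the two cases where a z-score does not.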

Blockers / Downstream

  • Blocks #73 (Feature: Tricolon Detector): required for the candidate validation stage
  • No blockers on this issue; it can be built immediately
