## Overview
Individual AI stylistic tells are weak signals in isolation — a human writer might occasionally use a tricolon, or end a paragraph with a short sentence. The strength of the detection lies in co-occurrence: multiple tells appearing together, repeatedly, across a document.
This module aggregates existing and new per-feature scores into a single weighted confidence score.
## Input Signals
From existing modules (no new code needed):
| Signal | Source | Module |
| --- | --- | --- |
| `complexity_uniformity_score` | Syllabic uniformity | `prosody/sentence_syllable_patterns.py` |
| `repetition_ratio` | Formulaic n-gram patterns | `prosody/syllable_pattern_repetition.py` |
| `starting_pattern_repetition_rate` | Formulaic openings | `prosody/syllable_pattern_repetition.py` |
| `ending_pattern_repetition_rate` | Formulaic closings | `prosody/syllable_pattern_repetition.py` |
| `pattern_entropy` | Distribution concentration | `prosody/syllable_pattern_repetition.py` |
From new modules (pending issues):
| Signal | Source | Issue |
| --- | --- | --- |
| `tricolon_density` | Rule of three frequency | #70 |
| `terminal_brevity_ratio` | Mic-drop paragraph shape | #71 (paragraph segmentation) |
| `short_paragraph_run_length` | Stacked short paragraphs | #71 (paragraph segmentation) |
| `ai_vocabulary_score` | LLM-preferred word frequency | #72 |
## Composite Score
```python
ai_tell_score = weighted_mean([
    complexity_uniformity_score,
    repetition_ratio,
    tricolon_density_normalized,
    terminal_brevity_score,
    stacked_paragraph_score,
    ai_vocabulary_score,
])
```
Weights should be empirically tunable, with sensible defaults derived from signal reliability.
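A minimal sketch of such an aggregator, assuming weights are renormalized over the signals actually present (so a missing signal does not drag the composite toward zero). The default weight values below are illustrative placeholders, not tuned numbers:

```python
def weighted_mean(scores, weights):
    """Weighted average of per-feature scores in [0, 1].

    Weights are renormalized over the signals present in `scores`,
    so absent signals simply drop out of the composite.
    """
    total_w = sum(weights[name] for name in scores)
    if total_w == 0:
        return 0.0
    return sum(scores[name] * weights[name] for name in scores) / total_w

# Hypothetical defaults -- to be tuned empirically per signal reliability.
DEFAULT_WEIGHTS = {
    "complexity_uniformity_score": 1.0,
    "repetition_ratio": 1.0,
    "tricolon_density_normalized": 0.8,
    "terminal_brevity_score": 0.6,
    "stacked_paragraph_score": 0.6,
    "ai_vocabulary_score": 1.2,
}

# Only three signals available here; the other weights are ignored.
scores = {
    "complexity_uniformity_score": 0.9,
    "repetition_ratio": 0.7,
    "ai_vocabulary_score": 0.8,
}
ai_tell_score = weighted_mean(scores, DEFAULT_WEIGHTS)  # (0.9 + 0.7 + 0.96) / 3.2 = 0.8
```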
## Output

- `ai_tell_score`: 0.0–1.0 aggregate confidence
- `signal_breakdown`: per-feature contribution scores
- `dominant_signal`: the feature that contributed most
- Interpretive band: Low / Moderate / High / Very High
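The output fields above could be assembled as follows. The band thresholds (0.25 / 0.5 / 0.75) are illustrative placeholders, not calibrated cutoffs:

```python
def score_report(scores, weights):
    """Build the module's output: composite score, per-feature
    contributions, dominant signal, and an interpretive band."""
    total_w = sum(weights[name] for name in scores)
    contributions = {
        name: scores[name] * weights[name] / total_w for name in scores
    }
    composite = sum(contributions.values())
    dominant = max(contributions, key=contributions.get)
    # Placeholder thresholds; real cutoffs should come from calibration data.
    if composite >= 0.75:
        band = "Very High"
    elif composite >= 0.5:
        band = "High"
    elif composite >= 0.25:
        band = "Moderate"
    else:
        band = "Low"
    return {
        "ai_tell_score": composite,
        "signal_breakdown": contributions,
        "dominant_signal": dominant,
        "interpretive_band": band,
    }
```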
## Relationship to #68
The co-occurrence score feeds directly into the style conformance penalty in #68 (Tonality Derivation). A high `ai_tell_score` in LLM-generated output is a direct deduction against fidelity to a human author's tonality.
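One possible shape for that deduction (purely a sketch — the function name and the `penalty_weight` knob are assumptions; #68 defines the actual penalty mechanics):

```python
def apply_conformance_penalty(tonality_fidelity, ai_tell_score,
                              penalty_weight=0.5):
    """Deduct the co-occurrence confidence from a 0-1 fidelity score,
    clamped at zero. `penalty_weight` is a hypothetical tuning knob."""
    return max(0.0, tonality_fidelity - penalty_weight * ai_tell_score)
```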
## Related