Feature: AI Vocabulary Wordlist and Frequency Scorer #74

@craigtrim

Overview

Certain words appear with statistically elevated frequency in LLM-generated text compared to human writing. These words are not impossible in human prose; they are simply overrepresented in AI output to a degree that creates a detectable signal.

Cassidy Johnston's observation ("the word 'quiet' in some form") is a practitioner-level identification of this phenomenon. Research and community observation have converged on a recognizable set of such words and phrases.

Known AI-Preferred Vocabulary

Abstract/elevated register:
delve, tapestry, nuanced, robust, pivotal, transformative, foster, leverage, navigate, underscore, realm, testament

Framing/structural:
"it's not X, it's Y"; "not only X but also Y"; "at its core"; "in essence"; "ultimately"

Tone softeners:
quiet, quietly, gently, thoughtful, intentional, meaningful

Filler intensifiers:
crucial, vital, essential, critical, key (overused as emphasis)
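
Single words can be matched by token lookup, but the framing/structural entries are templates with variable slots (X, Y), so they likely need pattern matching. A minimal sketch of one way to do that, assuming regular expressions are acceptable; the names FRAMING_PATTERNS and count_framing and the slot-length limits are hypothetical, not part of this proposal:

```python
import re

# Hypothetical patterns. X and Y are approximated as short spans that do not
# cross sentence punctuation; the length limits are arbitrary assumptions.
FRAMING_PATTERNS = {
    "it's not X, it's Y": re.compile(
        r"\bit(?:'s| is) not\b[^.,;!?]{1,60},\s*it(?:'s| is)\b", re.IGNORECASE
    ),
    "not only X but also Y": re.compile(
        r"\bnot only\b[^.;!?]{1,80}?\bbut also\b", re.IGNORECASE
    ),
    "at its core": re.compile(r"\bat its core\b", re.IGNORECASE),
    "in essence": re.compile(r"\bin essence\b", re.IGNORECASE),
    "ultimately": re.compile(r"\bultimately\b", re.IGNORECASE),
}


def count_framing(text):
    """Count non-overlapping matches of each framing pattern in text."""
    text = text.replace("\u2019", "'")  # normalize curly apostrophes
    return {label: len(p.findall(text)) for label, p in FRAMING_PATTERNS.items()}


sample = (
    "It's not a bug, it's a feature. At its core, this is not only fast "
    "but also simple. Ultimately it works."
)
print(count_framing(sample))
```

These patterns will miss variants such as "it isn't X, it's Y", so a real list would probably need several patterns per template.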

Proposed Implementation

  • Curated wordlist stored as a data file (JSON or plain text), versioned and extensible
  • Frequency scorer: count occurrences per 1000 words, normalized against a human baseline corpus
  • Z-score or percentile rank against baseline to flag anomalous elevation
  • Per-word breakdown + aggregate ai_vocabulary_score
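
The first two bullets could be sketched roughly as follows. This is an illustration under stated assumptions, not a settled design: the JSON layout (a "version" field plus named categories), the tokenizer, and the function names are all hypothetical, and the wordlist is inlined as a string here only to keep the example self-contained.

```python
import json
import re
from collections import Counter

# Hypothetical layout for the versioned wordlist data file (abbreviated).
WORDLIST_JSON = """
{
  "version": "0.1.0",
  "categories": {
    "abstract_register": ["delve", "tapestry", "nuanced", "robust"],
    "tone_softeners": ["quiet", "quietly", "gently"],
    "filler_intensifiers": ["crucial", "vital", "key"]
  }
}
"""


def tokenize(text):
    """Lowercase and split into word tokens (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def per_thousand(text, wordlist):
    """Return occurrences per 1000 tokens for every word in the wordlist."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    targets = sorted({w for words in wordlist["categories"].values() for w in words})
    if not tokens:
        return {w: 0.0 for w in targets}
    return {w: 1000.0 * counts[w] / len(tokens) for w in targets}


wordlist = json.loads(WORDLIST_JSON)
rates = per_thousand(
    "We delve into a robust, quiet design. The key idea is robust.", wordlist
)
for word, rate in rates.items():
    if rate > 0:
        print(f"{word}: {rate:.1f} per 1000 words")
```

The sketch matches exact tokens only, so inflected forms ("delves", "delving") are not counted. Catching a word "in some form" would need stemming or lemmatization on top of this.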

Baseline

Requires a reference corpus of human-written text (journalism, fiction, non-fiction) to establish expected frequency distributions. The signal is the delta between observed and expected, not raw counts.
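
The delta between observed and expected could be expressed as a per-word z-score, as in the sketch below. The assumptions are mine: the baseline is treated as a list of documents, each word's expected rate is the mean of its per-document rates, the aggregate ai_vocabulary_score is the plain mean of the per-word z-scores, and min_stdev is a hypothetical floor that avoids division by zero for words that never vary in the baseline. The baseline strings are toy placeholders, not real data.

```python
import re
import statistics


def rate_per_thousand(text, word):
    """Occurrences of word per 1000 tokens of text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return 1000.0 * tokens.count(word) / len(tokens)


def z_scores(text, baseline_docs, words, min_stdev=0.1):
    """Per-word z-score of text against per-document baseline rates.

    Requires at least two baseline documents (sample standard deviation).
    min_stdev is an assumed floor, in occurrences per 1000 words.
    """
    scores = {}
    for word in words:
        baseline = [rate_per_thousand(doc, word) for doc in baseline_docs]
        mean = statistics.mean(baseline)
        stdev = max(statistics.stdev(baseline), min_stdev)
        scores[word] = (rate_per_thousand(text, word) - mean) / stdev
    return scores


def ai_vocabulary_score(scores):
    """Aggregate score: here simply the mean of the per-word z-scores."""
    return statistics.mean(scores.values()) if scores else 0.0


# Toy stand-in for a human baseline corpus.
baseline_docs = [
    "the plan is robust and simple",
    "we walk home in the rain today",
    "a key point in the robust plan here",
]
scores = z_scores("robust robust robust design", baseline_docs, ["robust", "key"])
print(scores, ai_vocabulary_score(scores))
```

A percentile rank against the same per-document rates would be a drop-in alternative to the z-score, and it may behave better for rare words whose baseline distribution is far from normal.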
