Overview
Certain words appear with statistically elevated frequency in LLM-generated text compared to human writing. These words are not impossible in human prose; they are simply overrepresented in AI output to a degree that creates a detectable signal.
Cassidy Johnston's observation ("the word 'quiet' in some form") is a practitioner-level identification of this phenomenon. Research and community observation have converged on a recognizable set.
Known AI-Preferred Vocabulary
Abstract/elevated register:
delve, tapestry, nuanced, robust, pivotal, transformative, foster, leverage, navigate, underscore, realm, testament
Framing/structural:
"it's not X, it's Y"; "not only X but also Y"; "at its core"; "in essence"; "ultimately"
Tone softeners:
quiet, quietly, gently, thoughtful, intentional, meaningful
Filler intensifiers:
crucial, vital, essential, critical, key (overused as emphasis)
Proposed Implementation
- Curated wordlist stored as a data file (JSON or plain text), versioned and extensible
- Frequency scorer: count occurrences per 1000 words, normalized against a human baseline corpus
- Z-score or percentile rank against baseline to flag anomalous elevation
- Per-word breakdown + aggregate ai_vocabulary_score
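The steps above could be sketched roughly as follows. This is a minimal illustration, not a settled design: the `WORDLIST` subset, the `BASELINE` numbers (placeholders, not measured values), the `FLAG_THRESHOLD` of 2.0, and the choice of the mean z-score as the aggregate are all assumptions made for the example. It handles single words only; the multi-word framing patterns would need separate phrase matching.

```python
import re
from collections import Counter

# Hypothetical subset of the curated wordlist; in practice this would be
# loaded from the versioned data file (JSON or plain text).
WORDLIST = ["delve", "tapestry", "quiet", "quietly", "crucial"]

# Hypothetical human baseline: word -> (mean, std) of occurrences per 1000 words.
# Placeholder numbers for illustration only, not measured from any corpus.
BASELINE = {
    "delve": (0.02, 0.05),
    "tapestry": (0.01, 0.04),
    "quiet": (0.5, 0.5),
    "quietly": (0.1, 0.2),
    "crucial": (0.2, 0.3),
}

FLAG_THRESHOLD = 2.0  # assumed z-score cutoff for "anomalous elevation"


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def rates_per_1000(text, wordlist):
    """Occurrences of each listed word per 1000 tokens of text."""
    tokens = tokenize(text)
    total = len(tokens)
    if total == 0:
        return {w: 0.0 for w in wordlist}
    counts = Counter(tokens)
    return {w: 1000.0 * counts[w] / total for w in wordlist}


def score(text, wordlist=WORDLIST, baseline=BASELINE):
    """Return per-word z-scores, flagged words, and an aggregate score."""
    rates = rates_per_1000(text, wordlist)
    per_word = {}
    for word, rate in rates.items():
        if word not in baseline:
            continue  # no baseline statistics, so no z-score can be computed
        mean, std = baseline[word]
        per_word[word] = (rate - mean) / std if std > 0 else 0.0
    flagged = sorted(w for w, z in per_word.items() if z > FLAG_THRESHOLD)
    # Aggregate here is the mean z-score across words (one possible choice).
    aggregate = sum(per_word.values()) / len(per_word) if per_word else 0.0
    return {
        "per_word": per_word,
        "flagged": flagged,
        "ai_vocabulary_score": aggregate,
    }
```

Calling `score(text)` returns a dictionary with the per-word z-scores under `per_word`, the words above the cutoff under `flagged`, and the aggregate under `ai_vocabulary_score`. A percentile rank could be substituted for the z-score without changing the overall shape.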
Baseline
Requires a reference corpus of human-written text (journalism, fiction, non-fiction) to establish expected frequency distributions. The signal is the delta between observed and expected, not raw counts.
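One way the expected distributions might be derived is sketched below, assuming the reference corpus is available as a list of document strings. The function names `per_1000_rates` and `build_baseline` are hypothetical, and the use of the population standard deviation across documents is an assumption; a real baseline would likely be computed per genre and over far more text.

```python
import re
import statistics
from collections import Counter


def per_1000_rates(text, wordlist):
    """Occurrences of each listed word per 1000 tokens in one document."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    if total == 0:
        return {w: 0.0 for w in wordlist}
    counts = Counter(tokens)
    return {w: 1000.0 * counts[w] / total for w in wordlist}


def build_baseline(documents, wordlist):
    """Map each word to (mean, std) of its per-1000 rate across human documents."""
    per_doc = [per_1000_rates(doc, wordlist) for doc in documents if doc.strip()]
    if not per_doc:
        raise ValueError("baseline corpus contains no usable documents")
    baseline = {}
    for word in wordlist:
        values = [rates[word] for rates in per_doc]
        # Population standard deviation: the corpus is treated as the reference set.
        baseline[word] = (statistics.mean(values), statistics.pstdev(values))
    return baseline


def delta(observed_rate, expected_rate):
    """The signal: observed minus expected rate, rather than the raw count."""
    return observed_rate - expected_rate
```

The resulting `(mean, std)` pairs are the expected frequencies that an observed rate is compared against, so a word that is common in human writing contributes little signal even when its raw count is high.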
Related