Feature: trivial-docstring enrichment rule #325

@Alberto-Codes

Problem Statement

A common pattern when developers are told "add docstrings" is to write:

def get_user(user_id):
    """Get user."""

def process_data(data):
    """Process data."""

class UserManager:
    """User manager."""

These docstrings technically satisfy presence checks (D100–D107, missing-docstring) but have zero information value. They restate what the name already communicates. ruff and pydocstyle don't catch this — they only check presence and formatting, not whether the content is meaningful.

This is a quality-of-documentation issue that fits squarely in docvet's "enrichment" layer.

Current Behavior

Docstrings that trivially restate the symbol name produce no finding. Presence checks are satisfied.

Proposed Solution

Detection strategy

Compare the docstring summary line against the function/class name. If the summary is a trivial restatement, emit a finding.

Algorithm:

  1. Extract the first sentence of the docstring (summary line)
  2. Normalize both symbol name and summary: strip punctuation, lowercase, split snake_case and CamelCase into word sets
  3. Filter stop words (a, an, the, of, for, to, in, etc.)
  4. If the summary word set is a subset of the name word set, it's trivial
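
The four steps above can be sketched end to end as follows. This is a minimal illustration, not docvet's implementation: the helper names `first_sentence`, `split_words`, and `is_trivial_docstring` are hypothetical, the tokenizer regex is one possible choice, and treating a summary with no content words as "not trivial" is an assumption made here so that empty summaries are left to other checks.

```python
import re

# Stop-word list mirroring the one proposed in this issue.
STOP_WORDS = frozenset({
    "a", "an", "the", "of", "for", "to", "in", "is", "it",
    "and", "or", "this", "that", "with", "from", "by", "on",
})


def first_sentence(docstring: str) -> str:
    """Return the text up to the first sentence end or line break."""
    text = docstring.strip()
    return re.split(r"(?<=[.!?])\s|\n", text, maxsplit=1)[0]


def split_words(text: str) -> set[str]:
    """Split snake_case, CamelCase, or prose into lowercase content words."""
    # Acronym runs ("HTTP" in "HTTPServer") or ordinary words ("Server", "user").
    tokens = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", text)
    return {t.lower() for t in tokens} - STOP_WORDS


def is_trivial_docstring(name: str, docstring: str) -> bool:
    """True when every content word of the summary already appears in the name."""
    summary_words = split_words(first_sentence(docstring))
    if not summary_words:
        # Assumption: a summary with no content words is not this rule's concern.
        return False
    return summary_words <= split_words(name)
```

For instance, `is_trivial_docstring("process_data", "Process the data.")` is true because "the" is filtered out, while any summary that contributes a word not found in the name returns false.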

Examples that trigger

| Symbol | Docstring | Why trivial |
| --- | --- | --- |
| get_user | """Get user.""" | {get, user} == {get, user} |
| process_data | """Process the data.""" | {process, data} ⊆ {process, data} (ignoring "the") |
| UserManager | """User manager.""" | {user, manager} == {user, manager} |
| calculate_total | """Calculate total.""" | {calculate, total} == {calculate, total} |

Examples that do NOT trigger

| Symbol | Docstring | Why not trivial |
| --- | --- | --- |
| get_user | """Fetch a user from the database by their ID.""" | Adds: "database", "ID", "fetch" |
| process_data | """Apply normalization and deduplication to raw input data.""" | Describes what "process" means |
| UserManager | """Manages user lifecycle including creation, auth, and deletion.""" | Adds substantial context |

Word extraction

import re

_STOP_WORDS = frozenset({
    "a", "an", "the", "of", "for", "to", "in", "is", "it",
    "and", "or", "this", "that", "with", "from", "by", "on",
})

def _name_to_words(name: str) -> set[str]:
    """Split a snake_case or CamelCase name into lowercase words, minus stop words."""
    parts = name.split("_")
    words: set[str] = set()
    for part in parts:
        # Match ordinary words ("User", "user") or acronym runs ("HTTP" in "HTTPServer").
        # Digits are not matched, so "user2" yields "user".
        tokens = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?=[A-Z]|$)", part)
        words.update(t.lower() for t in tokens)
    return words - _STOP_WORDS

Configuration

[tool.docvet.enrichment]
detect-trivial-docstrings = true  # default

Acceptance Criteria

  • Docstrings that restate the symbol name are flagged
  • Adding meaningful words beyond the name avoids the finding
  • Stop words filtered from both sides
  • Works for both snake_case and CamelCase names
  • Config key detect-trivial-docstrings added

Technical Notes

Files changed: enrichment.py (~50 lines), config.py (~3 lines), tests (~60 lines)

Category: recommended — quality signal, not a hard requirement. Some trivial docstrings are acceptable in simple utility code.

False positive mitigation: The subset check is conservative: adding even one meaningful word beyond the name avoids the finding. If the false-positive rate proves too high, the subset check could be replaced with a tunable overlap-ratio threshold.
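
The issue does not define the overlap ratio, so the sketch below assumes one possible reading: the share of summary content words that already appear in the name. The names `overlap_ratio` and `is_trivial_by_ratio` and the `threshold` parameter are hypothetical. Under this definition a threshold of 1.0 reproduces the subset check and lower values flag more docstrings, so a different definition (for example, one that also accounts for name words missing from the summary) may be needed if the goal is fewer findings.

```python
def overlap_ratio(summary_words: set[str], name_words: set[str]) -> float:
    """Fraction of summary words that also occur in the symbol name."""
    if not summary_words:
        # Assumption: an empty summary has no overlap to measure.
        return 0.0
    return len(summary_words & name_words) / len(summary_words)


def is_trivial_by_ratio(
    summary_words: set[str],
    name_words: set[str],
    threshold: float = 1.0,
) -> bool:
    """Flag the summary when its overlap with the name meets the threshold."""
    return overlap_ratio(summary_words, name_words) >= threshold
```

Both functions take word sets that have already been normalized and stop-word filtered, so they can sit behind whatever tokenizer the rule ends up using.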


BMAD Workflow

When ready to implement:

  • /bmad-bmm-quick-spec -> /bmad-bmm-quick-dev
