# Source: explodable/config/rubrics/python_code_quality.yaml (tjkuhns/explodable, branch main)

# Python Code Quality Rubric
#
# Six-criterion rubric for evaluating the subjective aspects of Python
# code quality. Designed to complement linters and formatters (Ruff,
# Black, Pylint) by covering dimensions automated tools cannot reliably
# assess: readability, naming, architecture, documentation intent,
# error-handling philosophy, and testability.
#
# Grounded in: Clean Code (Martin), A Philosophy of Software Design
# (Ousterhout), PEP 8/257, Google Python Style Guide, Buse & Weimer
# (IEEE TSE 2010) readability research.
#
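# Scoring: 1-5 per criterion. Criteria with weight 1.5 (naming_clarity,
# readability_structure) count 1.5x; the others count 1.0x.
# Veto rules: a score of 1/5 on any criterion flags the file for review.
# The veto_rules list at the end of this file adds pattern-based vetoes.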
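#
# Aggregation is not specified in this file. As a hypothetical sketch
# only, assuming a weighted mean on the 1-5 scale, a consumer might
# combine scores as below. WEIGHTS mirrors the weights defined in the
# criteria; aggregate() is an illustrative name, not part of this project.
#
#   WEIGHTS = {
#       "naming_clarity": 1.5, "readability_structure": 1.5,
#       "architectural_fit": 1.0, "documentation_quality": 1.0,
#       "error_handling": 1.0, "testability": 1.0,
#   }
#
#   def aggregate(scores: dict[str, int]) -> tuple[float, bool]:
#       """Return (weighted mean score, flagged_for_review)."""
#       total = sum(WEIGHTS[cid] * score for cid, score in scores.items())
#       mean = total / sum(WEIGHTS[cid] for cid in scores)
#       # Any 1/5 flags the file, matching the veto rule stated above.
#       return mean, any(score == 1 for score in scores.values())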

rubric_version: "1.0"
rubric_type: "python_code_quality"
max_score_per_criterion: 5

criteria:
  - id: naming_clarity
    name: "Naming clarity"
    weight: 1.5
    description: >
      Do identifiers reveal intent? Are names consistent within the
      module's domain vocabulary? Are abbreviations avoided or
      universally understood? Does name length scale with scope
      (short for tight loops, descriptive for module-level)?
    anchors:
      1: >
        Single-letter variables in non-trivial scopes. Generic names
        (data, result, temp, val) dominate. Inconsistent naming
        conventions (camelCase mixed with snake_case). Reader must
        read the implementation to understand what a function does.
      3: >
        Most names communicate intent. A few generic names remain
        in non-obvious contexts. Naming conventions are consistent.
        Function names describe what, not how. Minor ambiguities
        that require brief context-reading to resolve.
      5: >
        Every identifier reads as documentation. Domain vocabulary
        is consistent and precise. Function names are verbs that
        describe the contract. A reader unfamiliar with the codebase
        can navigate by names alone.
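    # Hypothetical illustration (not part of the rubric schema): the same
    # filter written at the 1/5 and 5/5 naming anchors.
    #
    #   # 1/5: generic names; the reader must open the body.
    #   def proc(d, t):
    #       return [x for x in d if x["ts"] > t]
    #
    #   # 5/5: a verb phrase states the contract; names match the domain.
    #   def filter_events_after(events: list[dict], cutoff: float) -> list[dict]:
    #       return [event for event in events if event["ts"] > cutoff]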

  - id: readability_structure
    name: "Readability & structure"
    weight: 1.5
    description: >
      Can a competent Python developer read this file top-to-bottom
      and understand the flow without backtracking? Are functions at
      a consistent abstraction level? Is control flow straightforward
      or nested beyond easy comprehension? Does whitespace create
      meaningful visual grouping?
    anchors:
      1: >
        Functions mix abstraction levels freely. Deeply nested
        conditionals (3+ levels). Long functions (50+ lines) with
        multiple responsibilities. No visual separation between
        logical sections. Reader must hold too much state in their
        head to follow the flow.
      3: >
        Functions are mostly at one abstraction level. Nesting is
        manageable (≤2 levels typical). Logical sections are visually
        separated. A few functions are longer than ideal but each has
        a clear single purpose. Flow is followable on first read with
        minor backtracking.
      5: >
        Each function does one thing at one abstraction level. Control
        flow is linear or uses early returns to flatten nesting.
        The file reads like a well-organized document — top-level
        structure visible at a glance, details available on demand.
        No backtracking needed.
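    # Hypothetical illustration: the nesting described at 1/5, flattened
    # with an early return (guard clause) as at 5/5. Both versions return
    # the same values; `order` and the prices are made up.
    #
    #   def shipping_cost(order):          # 1/5: three nested levels
    #       if order is not None:
    #           if order.items:
    #               if order.express:
    #                   return 15.0
    #               return 5.0
    #       return 0.0
    #
    #   def shipping_cost(order):          # 5/5: guard clause, linear flow
    #       if order is None or not order.items:
    #           return 0.0
    #       return 15.0 if order.express else 5.0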

  - id: architectural_fit
    name: "Architectural fit"
    weight: 1.0
    description: >
      Does the code exhibit good module boundaries? Are classes and
      functions appropriately granular — deep modules (Ousterhout)
      with simple interfaces hiding complex implementations? Is
      coupling between components minimized? Does the code follow
      single-responsibility at the module level?
    anchors:
      1: >
        God class or god function that handles everything. Tight
        coupling between modules through shared mutable state.
        Circular dependencies. Interface exposes implementation
        details. Changing one function requires changing five others.
      3: >
        Reasonable module boundaries. Most classes/functions have a
        clear responsibility. Some coupling exists but is manageable.
        Interfaces mostly hide implementation. A few functions know
        too much about their callers or callees.
      5: >
        Clean module boundaries with deep modules — simple interfaces
        hiding complex implementations. Dependencies flow one
        direction. Each component can be understood, tested, and
        modified independently. Changes are local.
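    # Hypothetical illustration of a deep module (Ousterhout): callers see
    # one function, while file format, defaults, and validation stay
    # hidden. The JSON layout and the _DEFAULTS values are assumptions.
    #
    #   import json
    #   from pathlib import Path
    #
    #   _DEFAULTS = {"host": "localhost", "port": 5432}
    #
    #   def load_db_settings(path: Path) -> dict:
    #       """Return database settings from a JSON file, defaults applied."""
    #       settings = {**_DEFAULTS, **json.loads(path.read_text())["db"]}
    #       if not isinstance(settings["port"], int):
    #           raise ValueError(f"db.port must be an int, got {settings['port']!r}")
    #       return settings
    #
    # Switching the file format would change only this function, not its
    # callers, which is the "changes are local" property at 5/5.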

  - id: documentation_quality
    name: "Documentation quality"
    weight: 1.0
    description: >
      Do docstrings explain the contract (what, not how)? Do comments
      explain WHY, not WHAT? Is the comment-to-code ratio appropriate
      — not over-documented (every line explained) or under-documented
      (complex logic with no context)? Are type hints present and
      accurate?
    anchors:
      1: >
        No docstrings. Comments either absent entirely or explain
        what the code does line-by-line ("increment counter by 1").
        No type hints. A reader must reverse-engineer the interface
        contract from the implementation.
      3: >
        Key functions have docstrings explaining purpose and
        parameters. Comments exist for non-obvious logic. Type hints
        present on public interfaces but incomplete internally.
        A reader can understand the API without reading the body.
      5: >
        Every public function has a clear docstring explaining the
        contract, parameters, return value, and notable edge cases.
        Comments explain only non-obvious decisions (the WHY).
        Type hints are complete and accurate. The module docstring
        frames the purpose and design context.
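    # Hypothetical illustration of the 5/5 anchor in Google docstring style:
    # the docstring states the contract and edge cases, and the one inline
    # comment records a (made-up) decision, the WHY, not what the code does.
    #
    #   def chunk(items: list[str], size: int) -> list[list[str]]:
    #       """Split items into consecutive chunks of at most `size` elements.
    #
    #       Args:
    #           items: Values to split; order is preserved.
    #           size: Maximum chunk length; must be at least 1.
    #
    #       Returns:
    #           A list of chunks. The last chunk may be shorter; an empty
    #           input yields an empty list.
    #
    #       Raises:
    #           ValueError: If size is less than 1.
    #       """
    #       if size < 1:
    #           raise ValueError(f"size must be >= 1, got {size}")
    #       # Slicing instead of itertools.batched (3.12+) keeps 3.9 support.
    #       return [items[i:i + size] for i in range(0, len(items), size)]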

  - id: error_handling
    name: "Error handling & edge cases"
    weight: 1.0
    description: >
      Are failure modes handled explicitly rather than silently
      swallowed? Are exceptions specific (not bare except)? Is input
      validated at system boundaries? Does the code fail loudly and
      early rather than propagating bad state? Does it follow Python's
      EAFP (easier to ask forgiveness than permission) idiom where
      appropriate?
    anchors:
      1: >
        Bare except clauses that swallow all exceptions. No input
        validation. Errors silently produce wrong results instead of
        failing. Functions return None on error with no indication
        of what went wrong. Code trusts all external input.
      3: >
        Most error cases are handled with specific exceptions.
        Input validation exists at major entry points. Some edge
        cases are documented or handled. A few broad exception
        handlers remain but don't silently swallow important errors.
      5: >
        Every failure mode is explicitly considered. Exceptions are
        specific and carry context. Input is validated at system
        boundaries. Functions fail fast with clear error messages.
        Edge cases are either handled or documented as known
        limitations. No silent swallowing.
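    # Hypothetical illustration contrasting the 1/5 and 5/5 anchors on the
    # same input-parsing task (parse_port is a made-up example).
    #
    #   # 1/5: a bare except swallows everything; the caller gets a silent None.
    #   def parse_port(raw):
    #       try:
    #           return int(raw)
    #       except:
    #           return None
    #
    #   # 5/5: EAFP with a specific exception, context preserved, fail fast.
    #   def parse_port(raw: str) -> int:
    #       try:
    #           port = int(raw)
    #       except ValueError as exc:
    #           raise ValueError(f"port must be an integer, got {raw!r}") from exc
    #       if not 1 <= port <= 65535:
    #           raise ValueError(f"port out of range 1-65535: {port}")
    #       return port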

  - id: testability
    name: "Testability & API design"
    weight: 1.0
    description: >
      Could a developer write unit tests for this code without
      extensive mocking? Are dependencies injectable? Are side effects
      isolated and observable? Do functions prefer pure computation
      over hidden state mutation? Is the public API minimal and
      well-defined?
    anchors:
      1: >
        Functions depend on global state, environment variables read
        inline, or hardcoded external services. Testing requires
        mocking half the module. Side effects are hidden inside
        pure-looking functions. No clear public API boundary.
      3: >
        Most functions could be tested with reasonable setup.
        Dependencies are mostly passed as parameters. A few functions
        mix computation with I/O but the core logic is extractable.
        Public API is identifiable if not explicitly defined.
      5: >
        Pure functions dominate. Dependencies are injected.
        Side effects are isolated in clearly marked boundary
        functions. Testing requires minimal or no mocking. Public
        API is explicit, minimal, and stable. Internal implementation
        can change without breaking callers.
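    # Hypothetical illustration: injecting the clock turns a hidden
    # dependency (1/5) into a parameter a test can control (5/5).
    #
    #   from datetime import datetime, timezone
    #   from typing import Callable
    #
    #   # 1/5: reads the wall clock inline; tests must patch datetime.
    #   def is_expired(expires_at: datetime) -> bool:
    #       return datetime.now(timezone.utc) >= expires_at
    #
    #   # 5/5: the clock is a parameter with a production default.
    #   def is_expired(
    #       expires_at: datetime,
    #       now: Callable[[], datetime] = lambda: datetime.now(timezone.utc),
    #   ) -> bool:
    #       return now() >= expires_at
    #
    #   # A test supplies a fixed clock; no mocking needed.
    #   jan = datetime(2024, 1, 1, tzinfo=timezone.utc)
    #   assert is_expired(jan, now=lambda: datetime(2024, 6, 1, tzinfo=timezone.utc))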

veto_rules:
  - pattern: "eval("
    reason: "eval() on potentially untrusted input is a security vulnerability"
  - pattern: "except:"
    reason: "Bare except swallows all exceptions including KeyboardInterrupt and SystemExit"
  - pattern: "except Exception:"
    reason: "Overly broad exception handler — catch specific exceptions"
  - pattern: "password"
    context: "hardcoded"
    reason: "Hardcoded credentials detected"
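
# Hypothetical sketch, not this project's scanner: the matching semantics
# of `pattern` and `context` are not specified in this file. Assuming plain,
# case-sensitive substring matching per source line, and skipping rules
# that carry a `context` key, a checker could look like this (needs PyYAML):
#
#   import yaml
#
#   def veto_hits(rubric_path: str, source: str) -> list[tuple[int, str]]:
#       with open(rubric_path, encoding="utf-8") as fh:
#           rules = yaml.safe_load(fh)["veto_rules"]
#       hits = []
#       for lineno, line in enumerate(source.splitlines(), start=1):
#           for rule in rules:
#               if "context" not in rule and rule["pattern"] in line:
#                   hits.append((lineno, rule["reason"]))
#       return hits
#
# Under that assumption, "except Exception as exc:" is not flagged (it does
# not contain "except Exception:"), while "ast.literal_eval(" is flagged
# because it contains "eval(".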