Opinionated, idiomatic rules for writing public Python SDKs. Optimized for Claude Code to read once and apply consistently. When a rule conflicts with a user request, follow the rule and note it.
| Metric | Limit | Note |
|---|---|---|
| Line length | 80 aspirational, 100 hard | Ruff: line-length = 100 |
| Function body length | 30 lines | Excludes signature, docstring, blanks |
| File length | 300 lines | Split by responsibility, not arbitrarily |
| Cyclomatic complexity | 8 | ruff rule C901 |
| Function parameters | 5 | After 5, take a dataclass |
| Nesting depth | 3 | Extract or invert |
| Public class methods | 15 | Above this = split the class |
Any file or function exceeding these MUST be refactored, not exempted.
- Every public symbol is explicit. Use
__all__in every module that contains anything public. __init__.pyre-exports the package surface. Nothing else is public.- Leading underscore = private. No exceptions. No
_helpers.pywith public functions inside. - Keyword-only arguments for any function with more than 2 parameters.
Use
*separator. - Never expose a boolean positional flag. Use
Enum,Literal, or split into two functions. - Return concrete types. Never
Any. Never baredictorlistwithout parameters. - Public signatures are immutable across minor versions. Adding optional kwargs is fine; reordering, renaming, or removing is not.
- Prefer factory functions or classmethods over constructors with
many optional kwargs (
Client.from_env(),Client.from_config()).
# bad
def fetch(url, retry=True, timeout=30, verify=True):
# good
def fetch(
url: str,
*,
retry: bool = True,
timeout: float = 30.0,
verify: bool = True,
) -> Response: ...- 100% type coverage on public surface.
mypy --strictpasses. - Ship
py.typedmarker file in the package root. - Use built-in generics:
list[str],dict[str, int],tuple[int, ...]. - Use
X | None, neverOptional[X]. NeverUnion[X, Y]; useX | Y. - Use
TYPE_CHECKINGguard for import-only-for-types. Protocolfor duck-typed parameters (input contracts).Literal["a", "b"]for closed string sets;Enumfor runtime sets.TypeAlias(PEP 613) for any type used in 2+ places.Generic[T]for containers;ParamSpecfor decorators.- Never
Anywithout a# type: ignore[<rule>]comment justifying it. cast()only at trust boundaries (deserialization, FFI).
- Value objects:
@dataclass(frozen=True, slots=True). - I/O validation (HTTP bodies, config files): Pydantic v2. Never raw dicts at API boundaries.
- Internal records that need converters but not validation:
attrs. - Never use mutable defaults.
field(default_factory=list). - Models do not have I/O methods.
.save()belongs on a repository, not a model.
- Define one root exception per package:
class LibraryError(Exception). - Subclass by category, then by specific failure:
NetworkError(LibraryError)→TimeoutError(NetworkError). - Never
raise Exception(...). Never catch bareExceptionexcept at top-level entry points (CLI, request handler). - Always chain:
raise NewError(...) from originalorraise NewError(...) from Noneto suppress. - Document
Raises:in every public docstring. - No control flow via exceptions. Exceptions = exceptional.
- Errors carry structured context: include the offending value, the operation attempted, and a remediation hint when possible.
- Pick one: sync-only, async-only, or both. Document the choice.
- If both, generate one from the other (e.g.,
unasync) — don't maintain twice. - Backend-agnostic async: prefer
anyioover rawasynciofor library code (works under asyncio and trio). - Provide
async withfor any resource. Never require the caller to remember to close. - Cancel-safe: every
awaitmust leave the object in a valid state if cancelled. - No blocking calls in async functions.
asyncio.to_threadfor unavoidable sync I/O.
logger = logging.getLogger(__name__)at module top.- Library packages add
logging.getLogger(__name__).addHandler( logging.NullHandler())in__init__.py. - Never
print()in library code. - Never call
logging.basicConfig()or touch the root logger. - Lazy formatting only:
logger.debug("user=%s", user), neverlogger.debug(f"user={user}")— the f-string evaluates even when the log level filters the message out. - Levels:
DEBUG= internal flow,INFO= lifecycle events,WARNING= recoverable issue,ERROR= operation failed,CRITICAL= process must terminate.
- Stdlib first. Reach for a third-party dep only when stdlib costs
50 lines or is materially worse.
- Pin lower bounds on runtime deps (
requests>=2.28). Don't pin upper bounds unless you've actually hit a break. - Extras for optional features:
pip install mylib[async,redis]. - No transitive surface leakage. If you use
httpxinternally, don't exposehttpx.Responsein your public API — wrap it. - Vendor only as last resort, and document why.
src/layout. Always.pyproject.tomlonly. Nosetup.py, nosetup.cfg.- PEP 621 metadata. Single source of truth for version.
- Ship
py.typed(empty file in package root). - Build backend:
hatchling(default),setuptoolsif you need C extensions. - One package per repo (monorepos are a separate decision).
- Absolute imports in library code. Relative imports only inside tightly-coupled sibling modules of the same subpackage.
- No wildcard imports. Ever.
- Sorted by
ruff(isort-compatible). - Order: stdlib, third-party, first-party, local relative.
from __future__ import annotationsat the top of every module (faster startup, forward refs without quoting).
snake_casefor functions, variables, module names.PascalCasefor classes and type aliases.SCREAMING_SNAKE_CASEfor module-level constants._single_leading_underscorefor private.- Never invent
__dunder__names — that namespace is Python's. - Names describe intent, not type.
users, notusers_list. - Booleans read as predicates:
is_active,has_permission,should_retry. - No abbreviations in public API.
configurationnotconfigin type names;cfgis fine for local variables.
- Google-style docstrings. Every public symbol.
- Sections: one-line summary, blank line, longer description,
Args:,Returns:,Raises:,Example:. - Examples are doctests where possible — they run in CI.
- README: install, 30-second quickstart, link to full docs.
- CHANGELOG.md follows [Keep a Changelog]. Updated in every PR that touches public surface.
- Public docs built with
mkdocs-materialorsphinx+ autodoc.
def parse_email(value: str) -> Email:
"""Parse a string into a validated Email.
Args:
value: The raw email string. Whitespace is stripped.
Returns:
A validated Email instance.
Raises:
InvalidEmailError: If the input does not match RFC 5322.
Example:
>>> parse_email("alice@example.com")
Email('alice@example.com')
"""pytestonly. Nounittest, nonose.- Test layout mirrors source:
src/pkg/foo.py→tests/test_foo.py. - 90%+ line coverage on public surface, 100% on parsers/validators.
- Fixtures over
setUp/tearDown. Scope them tightly. - Parametrize over copy-paste:
@pytest.mark.parametrize. hypothesisfor parsers, encoders, anything with an obvious round-trip property.- Mock at the boundary (HTTP, filesystem). Never mock internals of the code under test.
- Snapshot tests (
syrupy) for stable structured output only. - No network in unit tests. Use
responses,respx, orvcrpy. - One behavior per test. Test name describes the behavior:
test_retry_gives_up_after_max_attempts.
rufffor lint + format. Replace black, isort, flake8.mypy --strict(orpyrightstrict).pytest --cov --cov-fail-under=90.pre-commitruns ruff + mypy + pytest --collect-only.- Lock file (
uv.lockorpoetry.lock) committed for reproducibility. - Build artifacts on every PR. Publish only from tagged commits.
Minimal pyproject.toml lint config:
[tool.ruff]
line-length = 100
target-version = "py311"
[tool.ruff.lint]
select = ["E", "F", "I", "B", "C90", "UP", "N", "SIM", "RUF"]
ignore = []
[tool.ruff.lint.mccabe]
max-complexity = 8
[tool.ruff.lint.pylint]
max-args = 5
[tool.mypy]
strict = true
warn_unreachable = trueComplexity is the source of all maintenance pain. These rules are mechanical and non-negotiable.
- Max 300 lines per file. If you're past it, the file has more than one responsibility. Split.
- One public class per file is the default. Group only when the classes are inseparable (e.g., builder + product).
- Module name matches its primary public symbol when possible.
- Max 30 lines per function body. Excludes signature, docstring, and blank lines. If you can't fit, extract.
- Max 5 parameters. Beyond 5, group them into a
@dataclass. - Max nesting depth 3. If you have
for { if { for { ... } } }, invert with guard clauses or extract. - Max cyclomatic complexity 8. Each branch (
if,elif,case,and,or,except) adds 1. Above 8, decompose. - Single Responsibility: a function does one thing. Its name is a verb phrase that describes that one thing. If you find yourself using "and" in the name, split it.
- Early returns. No
elseafterreturn. Guard clauses at the top, happy path at the bottom. - No flag parameters.
parse(strict=True)andparse(strict=False)are two different functions. Make them so:parse_strict()andparse_lenient(). - Replace conditional dispatch with polymorphism or a dict. A
10-arm
if/elifchain on a string is adictof handlers. - Extract guard clauses ruthlessly.
if not user: raise NotFoundis its own line, not nested in the body.
# bad — nested, hard to scan
def process(user, request):
if user is not None:
if user.is_active:
if request.is_valid():
return handle(user, request)
else:
raise InvalidRequest
else:
raise InactiveUser
else:
raise UserNotFound
# good — guards first, happy path last
def process(user: User | None, request: Request) -> Result:
if user is None:
raise UserNotFound
if not user.is_active:
raise InactiveUser
if not request.is_valid():
raise InvalidRequest
return handle(user, request)- Prefer pure functions. Pure = same input → same output, no side effects.
- Push side effects (I/O, logging, mutation) to the edges. Pure core, impure shell.
- Immutability by default. Mutable state needs a reason.
- Don't reach into objects (
a.b.c.d). Law of Demeter — talk to your neighbor.
- Prefer composition over inheritance. Inherit only when there is a true is-a relationship and you control both sides.
- No more than 1 level of inheritance in your own code. Mixins are not exempt.
- Favor small protocols over abstract base classes for plug points.
You may not. Refactor instead. If genuinely impossible (generated
code, vendored upstream), add # noqa: <rule> with a comment
explaining why and a TODO with an owner.
- Strict SemVer. Breaking change → major bump.
- Deprecate one minor version before removing. Use
warnings.warn(..., DeprecationWarning, stacklevel=2). - Maintain CHANGELOG.md with sections: Added, Changed, Deprecated, Removed, Fixed, Security.
- Support the oldest Python that >5% of your users still run. Drop versions on major bumps only.
- CI matrix tests every supported Python on every PR.
- Correctness, clarity, complexity limits first. Performance second.
- Measure before optimizing.
pytest-benchmark,py-spy, orscalene. Never guess. - When you optimize, comment why and link the benchmark.
- Constant-time wins (avoid quadratic loops, use
setfor membership) are not optimization — they're correctness.
- Never
eval,exec,pickle.loadson untrusted input. - Never log secrets. Add a
__repr__to credential types that redacts. - Validate at the boundary. Trust internal types.
- Use
secretsmodule for tokens, neverrandom. - Pin TLS verification on by default. Make disabling it loud.
- Dependencies:
pip-auditin CI.
"""Email parsing for the foo package."""
from __future__ import annotations
import logging
import re
from dataclasses import dataclass
from foo.errors import InvalidEmailError
__all__ = ["Email", "parse_email"]
logger = logging.getLogger(__name__)
_EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
@dataclass(frozen=True, slots=True)
class Email:
"""A validated email address."""
value: str
def __str__(self) -> str:
return self.value
def parse_email(value: str) -> Email:
"""Parse a string into a validated Email.
Args:
value: Raw input. Stripped of surrounding whitespace.
Returns:
A validated Email.
Raises:
InvalidEmailError: If the input is not a valid email.
"""
cleaned = value.strip()
if not cleaned:
raise InvalidEmailError("email is empty")
if not _EMAIL_RE.match(cleaned):
raise InvalidEmailError(f"invalid email: {cleaned!r}")
return Email(cleaned)That file is 30 lines of code. It hits every rule. Refactor toward this shape.