Status: Draft
Author: James Wiesebron, james-in-a-box
Created: December 2025
Purpose: A philosophy and framework for code review where LLMs handle mechanical validation and humans focus on strategic judgment
Guiding Value: Intentionality
Part of: A Pragmatic Guide for Software Engineering in a Post-LLM World
This document articulates a fundamental shift in how we think about code review: LLM-first, human-last. The core insight is that traditional code review has become misaligned with how software is actually developed in the LLM era.
The problem: LLMs can now generate code faster than humans can review it. Traditional code review—where humans catch everything—becomes a bottleneck, not a quality gate.
The solution: Invert the review model. Let LLMs handle mechanical validation while humans focus on strategic judgment. The result is faster feedback, higher consistency, and human attention directed where it matters most.
The goal: Free human reviewers from repetitive feedback so they can focus on critical paths—the high-stakes decisions where human judgment, organizational context, and accountability matter most.
The guiding value—intentionality: Be deliberate about what deserves human cognitive investment. Every moment spent reviewing formatting is a moment not spent on architecture, security, or mentorship. This document is about reclaiming human attention for the work that only humans can do.
- The Core Philosophy
- The Transformation
- Division of Review Responsibilities
- The Review Stack
- Benefits for Reviewers
- Benefits for Teams
- The Self-Improving System
- Implementation Patterns
- Anti-Patterns to Avoid
- Getting Started
- Related Documents
If you find yourself giving the same feedback twice, stop and automate it.
Every piece of recurring feedback represents a process failure. It should either become an automated check or be questioned as not worth giving. This simple principle drives everything that follows.
This is intentionality in action: treating human attention as the scarce resource it is, and investing it only where it creates irreplaceable value.
Traditional code review places an enormous burden on human reviewers:
┌────────────────────────────────────────────────────────────────┐
│ TRADITIONAL REVIEW COGNITIVE LOAD │
│ │
│ Strategic Concerns │ Mechanical Concerns │
│ ───────────────── │ ────────────────── │
│ • Does this solve the │ • Is this formatted correctly? │
│ right problem? │ • Are type hints present? │
│ • Is the architecture │ • Is naming consistent? │
│ sound? │ • Are there obvious bugs? │
│ • Are there security │ • Is documentation current? │
│ implications? │ • Are tests comprehensive? │
│ • Is this the right │ • Are patterns followed? │
│ direction? │ • Are imports organized? │
│ │ │
└────────────────────────────────────────────────────────────────┘
A significant portion of reviewer attention goes toward mechanical concerns—tasks that require tedious attention to detail rather than strategic insight.
Modern LLMs can assess far more than mechanical concerns. They can evaluate architecture decisions, identify security vulnerabilities, analyze business logic coherence, and suggest improvements across nearly every dimension of code quality. The question isn't "what can LLMs review?" but "where should humans focus?"
┌────────────────────────────────────────────────────────────────┐
│ LLM-FIRST REVIEW RESPONSIBILITY │
│ │
│ HUMAN FOCUS │ LLM REVIEWER │
│ (Critical Paths) │ (Comprehensive Analysis) │
│ ─────────────── │ ───────────────────────── │
│ • Final go/no-go on │ • Format and style validation │
│ high-risk changes │ • Type safety verification │
│ • Organizational context │ • Pattern consistency checks │
│ LLMs can't access │ • Anti-pattern detection │
│ • Accountability for │ • Architecture assessment │
│ critical decisions │ • Security vulnerability scan │
│ • Novel strategic │ • Business logic coherence │
│ trade-offs │ • Documentation completeness │
│ • Relationship & trust │ • Test coverage analysis │
│ building with team │ • Performance implications │
│ │ • Suggested improvements │
│ │ │
│ High-stakes, Accountable │ Comprehensive, Tireless │
└────────────────────────────────────────────────────────────────┘
Human attention is concentrated on critical paths—not because LLMs can't assess other areas, but because these paths require accountability, organizational context, and the kind of judgment that carries weight with stakeholders. This is the heart of intentionality: knowing where your attention belongs, and protecting that focus.
Before (traditional review):

Developer writes code
↓
Developer opens PR
↓
Human reviewer checks EVERYTHING
• Formatting ← tedious
• Types ← automatable
• Patterns ← automatable
• Architecture ← requires judgment
• Business logic ← requires judgment
↓
Developer fixes feedback
↓
Reviewer re-reviews
↓
Repeat until approved
↓
Eventually merges
After (LLM-first review):

Developer writes code
↓
Pre-commit hooks catch formatting, linting
↓
Developer opens PR
↓
LLM reviewer analyzes automatically
• Patterns, naming, anti-patterns
• Suggests or applies fixes
↓
Human reviewer focuses on:
• Business requirements
• Architecture decisions
• Security implications
• Strategic direction
↓
Approval
↓
Merge
Traditional review accumulated problems that LLMs can solve:
| Problem | Traditional Impact | LLM-First Solution |
|---|---|---|
| Same feedback given repeatedly | Reviewer fatigue, inconsistency | Automated once, enforced forever |
| Different reviewers catch different things | Inconsistent quality | Uniform rule application |
| Feedback comes late | Wasted developer time | Issues caught at commit time |
| Review doesn't scale | Bottleneck when PRs increase | Parallel automated review |
| Nitpicks dominate discussion | Important feedback gets lost | Mechanical issues pre-resolved |
Modern LLMs are capable reviewers across almost all dimensions of code quality:
| Domain | LLM Capability | Effectiveness |
|---|---|---|
| Formatting & Style | Perfect pattern matching | Excellent |
| Type Safety | Exhaustive verification | Excellent |
| Patterns & Conventions | Tireless consistency | Excellent |
| Common Bugs | Known anti-pattern detection | Excellent |
| Documentation | Completeness checking | Excellent |
| Test Coverage | Systematic enumeration | Excellent |
| Architecture | Structural analysis, coupling detection | Strong |
| Security | Vulnerability patterns, input validation | Strong |
| Performance | Algorithmic complexity, resource usage | Strong |
| Business Logic | Coherence, edge cases, spec alignment | Moderate-Strong |
| Design Trade-offs | Pattern selection, scalability concerns | Moderate |
The question isn't whether LLMs can assess these areas—they can. The question is where human attention provides irreplaceable value:
| Critical Path | Focus | Why Human |
|---|---|---|
| Go/No-Go Decisions | Should we proceed with this change? | Accountability—someone must own the decision |
| Organizational Context | How does this fit our roadmap, politics, constraints? | LLMs lack access to internal dynamics |
| Novel Strategic Trade-offs | Unprecedented situations requiring judgment | Creative problem-solving in new territory |
| Stakeholder Trust | Demonstrating human oversight to partners, regulators | Some contexts require human accountability |
| Team Dynamics | Mentorship, relationship building, morale | Human-to-human interaction matters |
Many review concerns fall into an overlap zone where both LLMs and humans can contribute:
┌──────────────────────────────────────────────────────────────────┐
│ THE OVERLAP ZONE │
│ │
│ LLM Primary │ Shared │ Human Primary │
│ ──────────── │ ────── │ ───────────── │
│ • Formatting │ • Architecture │ • Go/no-go │
│ • Types │ • Security │ • Org context │
│ • Patterns │ • Business logic │ • Novel strategy │
│ • Documentation │ • Design trade-offs │ • Accountability │
│ • Common bugs │ • Performance │ • Team dynamics │
│ │ │ │
│ LLM handles fully │ LLM provides │ Human provides │
│ │ analysis; human │ irreplaceable │
│ │ decides priority │ value │
└──────────────────────────────────────────────────────────────────┘
In the overlap zone, LLMs provide comprehensive analysis while humans decide what matters most for this specific change.
When a human reviewer catches something mechanical, the response shouldn't be "please fix this"—it should be "how does the system automatically catch this?"
This embodies a core tenet of LLM-first review: the system should make the right thing easy. Rather than requiring engineers to follow checklists or remember rules, build capabilities that guide, detect, and automate.
Shift the language:
- From: "Engineers must add type hints"
  To: "The system detects missing type hints and can propose them automatically"
- From: "Engineers must not do superficial checks"
  To: "The system detects rubber-stamping patterns and flags for deeper review"
- From: "Engineers must add lint rules instead of comments"
  To: "When recurring feedback is detected, the system proposes automating it as a lint rule"
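The first shift above, detecting missing type hints, can be sketched with Python's `ast` module. This is only an illustration of what "the system detects" means in practice; a real setup would lean on `mypy` or ruff's annotation rules rather than a hand-rolled walker:

```python
import ast

def find_unannotated_functions(source: str) -> list[str]:
    """Return names of functions missing parameter or return annotations."""
    tree = ast.parse(source)
    flagged = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Skip `self`; flag any other unannotated parameter or return.
            params = [a for a in node.args.args if a.arg != "self"]
            missing_params = any(a.annotation is None for a in params)
            if missing_params or node.returns is None:
                flagged.append(node.name)
    return flagged

code = """
def add(a, b):
    return a + b

def greet(name: str) -> str:
    return f"hello {name}"
"""
print(find_unannotated_functions(code))  # ['add']
```

A check like this can run in a pre-commit hook or a bot comment, turning "please add type hints" from recurring human feedback into an automatic system capability.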
Human catches issue
↓
Ask: "How can the system handle this?"
↓
┌───────┴──────┐
Automate Augment
↓ ↓
System catches System assists
automatically (suggestions,
templates,
↓ reminders)
↓
Future PRs benefit from system capability
The philosophy: Meet engineers where they are. Build systems that make good practices the path of least resistance, rather than expecting personal rigor to overcome friction.
Different tools handle different concerns, but the stack is more nuanced than a simple hierarchy. LLM reviewers provide comprehensive analysis across nearly all domains, while humans focus on critical paths.
┌────────────────────────────────────────────────────────────────┐
│ THE REVIEW STACK │
│ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ HUMAN REVIEWER (Critical Paths) │ │
│ │ • Go/no-go decisions with accountability │ │
│ │ • Organizational context & strategic fit │ │
│ │ • Team dynamics, mentorship, trust │ │
│ │ Reviews LLM analysis, decides what matters │ │
│ └──────────────────────────────────────────────────────────┘ │
│ ▲ │
│ │ LLM analysis informs human focus │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ LLM REVIEWER (Comprehensive Analysis) │ │
│ │ • Architecture assessment, coupling analysis │ │
│ │ • Security vulnerability detection │ │
│ │ • Business logic coherence, edge cases │ │
│ │ • Design trade-offs, performance implications │ │
│ │ • Code patterns, naming, duplication │ │
│ └──────────────────────────────────────────────────────────┘ │
│ ▲ │
│ │ Semantic issues pre-analyzed │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ STATIC ANALYSIS (SAST) │ │
│ │ • Security basics (detect-secrets, semgrep) │ │
│ │ • Vulnerability patterns │ │
│ └──────────────────────────────────────────────────────────┘ │
│ ▲ │
│ │ Security issues filtered │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ TYPE CHECKERS │ │
│ │ • Type safety (mypy, TypeScript) │ │
│ └──────────────────────────────────────────────────────────┘ │
│ ▲ │
│ │ Type errors filtered │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ LINTERS & FORMATTERS (Base of Stack) │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │
└────────────────────────────────────────────────────────────────┘
The LLM layer doesn't just filter—it provides comprehensive analysis that humans can use to prioritize their attention on what matters most for this specific change.
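The layering can be sketched as a simple pipeline. The layer names and toy checks below are illustrative stand-ins, not a real implementation; an actual stack would shell out to ruff, mypy, semgrep, and an LLM review service:

```python
from dataclasses import dataclass, field

@dataclass
class Finding:
    layer: str
    message: str

@dataclass
class ReviewResult:
    findings: list[Finding] = field(default_factory=list)
    needs_human: bool = True  # the human go/no-go decision is never automated

def run_stack(diff: str, layers: list) -> ReviewResult:
    """Run each layer in order; lower layers absorb mechanical noise so the
    human sees only the aggregated findings that remain."""
    result = ReviewResult()
    for name, check in layers:
        for message in check(diff):
            result.findings.append(Finding(layer=name, message=message))
    return result

# Toy layer implementations for illustration only.
layers = [
    ("formatter", lambda d: ["trailing whitespace"] if d.endswith(" ") else []),
    ("type_checker", lambda d: []),
    ("llm_reviewer", lambda d: ["possible N+1 query in loop"] if "for" in d else []),
]

result = run_stack("for row in rows: db.get(row.id) ", layers)
for f in result.findings:
    print(f"[{f.layer}] {f.message}")
```

The key property is that `needs_human` stays `True` regardless of what the automated layers find: the stack informs the human decision, it never replaces it.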
When LLMs handle mechanical validation:
- Reduced mental fatigue — No more scanning for typos, style violations, missing types
- Fewer context switches — Stay in strategic thinking mode
- Less repetition — Never give the same feedback twice
- More sustainable pace — Review energy isn't depleted by tedious checks
With LLMs providing comprehensive analysis, human reviewers can concentrate on what matters most:
Accountability is central — This is the irreplaceable human contribution. When you approve a PR, you're saying "I am accountable for this change reaching production." That accountability carries weight with stakeholders, regulatory bodies, and end users in ways that automated approval cannot.
The other critical paths flow from this accountability:
- Organizational context — Applying knowledge LLMs don't have access to (roadmap, politics, constraints)
- Strategic prioritization — Using LLM analysis to decide what matters most for this specific change
- Novel trade-offs — Exercising judgment in unprecedented situations where patterns don't apply
- Stakeholder trust — Demonstrating human oversight when partners or regulators require it
- Mentorship — Building relationships and teaching judgment, not rules
The shift in mindset:
Traditional review asks: "Did I catch all the problems?" LLM-first review asks: "Am I comfortable being accountable for this change?"
The LLM has already caught mechanical problems and provided comprehensive analysis. Your job is to decide whether this change—with its trade-offs and implications—should move forward. That decision is inherently human.
When reviewers aren't mentally exhausted:
- Clearer thinking — Full cognitive capacity for important decisions
- Deeper review — Time to actually understand the change
- More thoughtful feedback — Quality over quantity
- Healthier skepticism — Energy to question assumptions
| Metric | Traditional | LLM-First |
|---|---|---|
| Time to first feedback | Hours-days | Minutes |
| Review iterations | 3-5 | 1-2 |
| Time to merge | Days | Hours |
| Reviewer availability | Scarce | Augmented |
- Uniform enforcement — Same rules applied to every PR
- No reviewer lottery — Quality doesn't depend on who reviews
- Institutional knowledge captured — Patterns encoded in automation
- Reduced bus factor — Standards exist independent of individuals
When reviews focus on substance:
- Less friction — Feedback is about design, not nitpicks
- More learning — Discussions are educational, not pedantic
- Better relationships — Reviews feel collaborative, not adversarial
- Sustainable culture — Review doesn't cause burnout
┌─────────────────────────────────────────────────────────────────┐
│ SELF-IMPROVING REVIEW SYSTEM │
│ │
│ ┌─────────────────┐ │
│ │ Human gives │ │
│ │ feedback on PR │ │
│ └────────┬────────┘ │
│ ↓ │
│ ┌─────────────────┐ │
│ │ System tracks: │ │
│ │ "Feedback X │ │
│ │ appeared 3+ │ │
│ │ times" │ │
│ └────────┬────────┘ │
│ ↓ │
│ ┌─────────────────┐ │
│ │ New automated │ │
│ │ check proposed │ │
│ └────────┬────────┘ │
│ ↓ │
│ ┌─────────────────┐ │
│ │ Team reviews │ │
│ │ and approves │ │
│ └────────┬────────┘ │
│ ↓ │
│ ┌─────────────────┐ │
│ │ Future PRs │ │
│ │ never have │ │
│ │ this issue │ │
│ └─────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
In this model, human reviewers become trainers of the automation, not gatekeepers:
- Every piece of feedback potentially improves the system
- Giving feedback once is an investment in never giving it again
- The system gets smarter over time
- Reviewer expertise is multiplied across all future PRs
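The "feedback X appeared 3+ times" trigger can be sketched with a simple counter. This exact-match version is only illustrative; a production system would cluster semantically similar comments (e.g., with embeddings) rather than compare strings:

```python
from collections import Counter

AUTOMATION_THRESHOLD = 3  # matches the "appeared 3+ times" trigger above

def propose_automations(feedback_log: list[str]) -> list[str]:
    """Surface review comments given often enough to deserve an automated check."""
    counts = Counter(comment.strip().lower() for comment in feedback_log)
    return [comment for comment, n in counts.items() if n >= AUTOMATION_THRESHOLD]

log = [
    "Please add a type hint",
    "please add a type hint",
    "Rename for clarity",
    "Please add a type hint ",
]
print(propose_automations(log))  # ['please add a type hint']
```

Each proposal then goes to the team for review, exactly as in the loop above: the system suggests the rule, humans decide whether to adopt it.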
This connects to Radical Self-Improvement for LLMs: the review process isn't static; it evolves.
Catch issues before code is even committed:
```yaml
# .pre-commit-config.yaml
repos:
  - repo: local
    hooks:
      - id: format
        name: Format code
        entry: ruff format
        language: system
      - id: lint
        name: Lint code
        entry: ruff check --fix
        language: system
      - id: typecheck
        name: Type check
        entry: mypy
        language: system
```

Configure LLM reviewers to run on every PR:
```yaml
# LLM review triggers
on_pr_opened:
  - check_patterns
  - check_naming
  - check_complexity
  - suggest_improvements

on_feedback_pattern:
  threshold: 3
  action: propose_automation
```

Guide human reviewers on what to focus on:
```markdown
## Human Review Checklist

- [ ] Does this PR solve the stated problem?
- [ ] Is the architecture appropriate for this change?
- [ ] Are there security implications?
- [ ] Does this align with our technical direction?
- [ ] Would I be comfortable deploying this?

Note: Style, formatting, and mechanical issues are handled
by automated tools. Focus on strategic concerns.
```

Problem: Human comments on something a linter should catch
Human: "Please use const instead of let here"
Why it fails: Wastes human attention on automatable concerns.
Solution: Add a lint rule; never comment on this again.
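For this particular example, the fix is a single configuration line using ESLint's built-in `prefer-const` rule, e.g. in an `.eslintrc.json`:

```json
{
  "rules": {
    "prefer-const": "error"
  }
}
```

Once this is in place, the linter flags every `let` that is never reassigned, and no human ever needs to type that comment again.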
Problem: Reviewers argue about formatting, naming conventions
Reviewer A: "I prefer camelCase for this"
Reviewer B: "snake_case is more readable"
Why it fails: Style debates should happen once, then be encoded.
Solution: Establish conventions in automation; PRs follow rules, not opinions.
Problem: Human approves without substantive review
Human: "LGTM" (after 30 seconds on 500-line change)
Why it fails: Defeats the purpose of human oversight.
Solution: If automated tools pass and you have no strategic concerns, explicitly note what you checked.
Problem: Human reviewer doesn't read or use the LLM's comprehensive analysis
Human: "I'm worried about the architecture here."
(LLM already flagged coupling issues and suggested refactoring)
Why it fails: Duplicates work and misses the opportunity to build on LLM insights.
Solution: Review the LLM review first. Use it to prioritize your attention on what matters most.
- Trust the LLM analysis — LLMs can assess architecture, security, and business logic; use their analysis as input
- Focus on critical paths — Go/no-go decisions, organizational context, accountability, team dynamics
- Review the LLM review — Scan what the LLM flagged; decide what's worth human attention
- Think "automate or accept" — Every manual comment should either become a rule or be acknowledged as uniquely human
- Own your decisions — When you approve, you're providing human accountability for the change
- Configure LLMs for comprehensive review — Architecture, security, business logic—not just formatting
- Define critical paths — Explicitly list where human attention is irreplaceable
- Audit your feedback — What comments are given repeatedly? Automate them via LLM prompts.
- Build the stack — Linters → Type checkers → SAST → LLM reviewers → Human critical path review
- Track the loop — Monitor what LLMs miss; refine their prompts
The goal is not to limit LLM review to mechanical concerns—LLMs can assess nearly everything. The goal is to concentrate human attention on critical paths where accountability, organizational context, and human judgment are irreplaceable.
When LLMs provide comprehensive analysis and humans focus on high-stakes decisions, everyone wins: reviewers are more engaged, developers get faster feedback, and critical paths get the human attention they deserve.
This is what intentionality looks like in practice: a deliberate choice to protect human cognitive capacity for the decisions that matter most.
| Document | Description |
|---|---|
| A Pragmatic Guide for Software Engineering in a Post-LLM World | Strategic umbrella connecting all three pillars |
| Human-Led, LLM-Navigated Development | Philosophy for human-LLM collaboration |
| Radical Self-Improvement for LLMs | Framework for autonomous LLM self-improvement |
Authored-by: jib