Multi-agent conversational system for comprehensive code quality evaluation using a 7-pillar methodology.
CodeWave is a sophisticated Node.js CLI tool that leverages multiple AI agents in a coordinated discussion framework to perform in-depth analysis of Git commits. Using LangChain, LangGraph, and multiple LLM providers, CodeWave generates beautiful interactive HTML reports with conversation timelines, detailed metrics, and actionable insights.
- Key Features
- Quick Start
- Installation
- CLI Commands
- Output Structure
- Configuration
- The 7-Pillar Evaluation Methodology
- The 5 AI Agents
- Multi-Round Conversation Framework
- Developer Overview
- Advanced Features
- Examples
- Project Structure
- Contributing
- Troubleshooting
- Performance Considerations
- API Reference
- License
- Support & Community
- 🤖 Multi-Agent Conversations: 5 specialized AI agents discuss commits across 3 rounds (Initial Assessment → Concerns → Validation & Agreement)
- 📊 7-Pillar Methodology: Comprehensive evaluation across Code Quality, Code Complexity, Ideal Time, Actual Time, Technical Debt, Functional Impact, and Test Coverage
- 🎨 Interactive HTML Reports: Beautiful, timeline-based reports with conversation history and metric visualization
- 📈 Batch Processing: Evaluate multiple commits with real-time progress tracking
- 🧠 RAG (Retrieval-Augmented Generation): Automatic handling of large diffs (>100KB) using vector storage and semantic search
- 🔌 Multi-LLM Support: Works with Anthropic Claude, OpenAI GPT, and Google Gemini
- ⚡ Production-Ready: LangGraph-based state machines with comprehensive error handling
- 💾 JSON Output: Structured results for programmatic access and CI/CD integration
- 🎯 Zero Configuration: Interactive setup wizard with sensible defaults
Get up and running in 3 simple steps:
npm install -g @techdebtgpt/codewave
codewave --help
git clone <repo-url>
cd codewave
npm install
npm run build
codewave config --init
This launches an interactive wizard to configure:
- LLM Provider: Choose Anthropic Claude, OpenAI, or Google Gemini
- API Keys: Set your LLM provider credentials
- Model Selection: Pick your preferred model (defaults recommended)
- Default Settings: Configure batch size, output directory, and reporting preferences
Configuration is stored securely and only needs to be done once.
Verify Setup:
codewave config --list
codewave evaluate --commit HEAD
Or use the shorthand:
codewave evaluate HEAD
The system will:
- Fetch the commit details from your Git repository
- Extract the diff and metadata
- Run multi-agent conversation workflow (3 rounds)
- Generate interactive HTML report and JSON results
Find Your Results:
# Results are in: .evaluated-commits/{commit-hash}_{date}_{time}/
open .evaluated-commits/*/report.html # macOS
xdg-open .evaluated-commits/*/report.html # Linux
start .evaluated-commits\*\report.html # Windows
- Node.js: 18.0.0 or later
- npm: 9.0.0 or later
- Git: 2.0.0 or later
- LLM API Key: Claude, OpenAI, or Google Gemini
npm install -g @techdebtgpt/codewave
Then verify installation:
codewave --help
codewave --version
git clone <repo-url>
cd codewave
npm install
npm run build
codewave [options] <command> [command-options]
codewave --help, -h       Show help message
codewave --version, -v    Show version number
codewave evaluate --commit <commit-hash>
# Alternative (shorthand):
codewave evaluate <commit-hash>
Examples:
# Evaluate a specific commit (recommended)
codewave evaluate --commit HEAD
codewave evaluate --commit a1b2c3d
codewave evaluate --commit HEAD~5
# Alternative shorthand syntax
codewave evaluate HEAD
codewave evaluate a1b2c3d
# Evaluate staged changes
codewave evaluate --staged
# Evaluate all current changes (staged + unstaged)
codewave evaluate --current
# Evaluate from diff file
codewave evaluate --file my-changes.diff
codewave batch [options]
Examples:
# Evaluate last 10 commits on current branch
codewave batch --count 10
# Evaluate with progress tracking
codewave batch --count 20 --verbose
# Evaluate commits in date range
codewave batch --since "2024-01-01" --until "2024-01-31"
# Evaluate with custom output and parallelization
codewave batch --count 50 --output "./reports" --parallel 3
Verify Batch Results:
# Count evaluations
ls -1 .evaluated-commits/ | wc -l
# Calculate total cost
jq -s '[.[].totalCost] | add' .evaluated-commits/*/results.json
codewave config --init # Interactive setup wizard
codewave config --list # Display current configuration
codewave config --reset # Reset to defaults
Issue: "API Key not found"
# Solution: Run setup again or set environment variable
codewave config --init
# OR
export CODEWAVE_API_KEY=sk-ant-...
codewave evaluate --commit HEAD
Issue: "codewave: command not found" (after npm install -g)
# Solution: Restart your terminal
# The terminal needs to reload PATH after global npm install
codewave --version
Issue: Evaluation times out
# Solution: Enable RAG for large commits
codewave config --init
# Then when prompted, enable RAG for large diffs
codewave evaluate --commit HEAD
See TROUBLESHOOTING.md for more help.
Evaluation results are organized in the .evaluated-commits/ directory, with one subdirectory per evaluated commit:
.evaluated-commits/
├── a1b2c3d_2024-01-15_10-30-45/
│ ├── report.html # Interactive HTML report with conversation timeline
│ ├── results.json # Full evaluation data with all metrics
│ ├── commit.diff # Original commit diff
│ └── summary.txt # Quick text summary
├── x9y8z7w_2024-01-15_11-15-20/
│ ├── report.html
│ ├── results.json
│ ├── commit.diff
│ └── summary.txt
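The directory names above follow the `{commit-hash}_{date}_{time}` pattern. As a rough illustration, the sketch below builds such a name. The helper names `pad2` and `evaluationDirName` are hypothetical, and the 7-character short hash and the use of UTC are assumptions; the README does not say which time zone or hash length CodeWave actually uses.

```typescript
// Zero-pad a number to two digits (e.g. 5 -> "05").
function pad2(value: number): string {
  return value < 10 ? "0" + value : String(value);
}

// Build "{commit-hash}_{date}_{time}" from a commit hash and a timestamp.
// Assumptions (not confirmed by the README): 7-character short hash, UTC time.
function evaluationDirName(commitHash: string, when: Date): string {
  const shortHash = commitHash.slice(0, 7);
  const date = [
    String(when.getUTCFullYear()),
    pad2(when.getUTCMonth() + 1), // getUTCMonth() is zero-based
    pad2(when.getUTCDate()),
  ].join("-");
  const time = [
    pad2(when.getUTCHours()),
    pad2(when.getUTCMinutes()),
    pad2(when.getUTCSeconds()),
  ].join("-");
  return shortHash + "_" + date + "_" + time;
}

const exampleName = evaluationDirName(
  "a1b2c3d4e5f60718293a4b5c6d7e8f9012345678",
  new Date(Date.UTC(2024, 0, 15, 10, 30, 45))
);
console.log(exampleName); // a1b2c3d_2024-01-15_10-30-45
```

A helper like this can be handy in scripts that need to locate or sort evaluation directories by timestamp.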
report.html is an interactive report featuring:
- Commit metadata (hash, author, date, message)
- Agent roles and responsibilities
- Round-by-round conversation timeline
- Evolution of metrics across discussion rounds
- Final consensus scores and insights
- Key concerns and recommendations
- Beautiful responsive design
results.json contains structured data including:
- Commit information and diff
- Full conversation transcript
- All agent responses and reasoning
- Evolution of metrics (Initial → Final)
- Consensus scores and weights
- Processing metadata (tokens used, cost, duration)
commit.diff holds the original commit in unified diff format for reference and archival.
summary.txt is a quick text summary with key metrics and the top 3 recommendations.
You can customize where evaluation results are saved using any of these methods (in priority order):
Use --commit flag for single evaluation:
# Single evaluation (recommended)
codewave evaluate --commit HEAD
# Alternative shorthand syntax
codewave evaluate HEAD
# Batch evaluation
codewave batch --count 10
Set for current session or script:
export CODEWAVE_OUTPUT_DIR=./reports
codewave evaluate --commit HEAD
codewave batch --count 10
Set as default for all evaluations:
User config (~/.codewave/config.json or %APPDATA%\codewave\config.json):
{
"outputDirectory": "./my-evaluations"
}
Project config (.codewave.config.json in project root):
{
"output": {
"directory": "./commit-analysis"
}
}
If not configured, defaults to .evaluated-commits/ in current directory.
Control which file formats to generate:
# Evaluate specific commit (recommended)
codewave evaluate --commit HEAD
# Evaluate staged changes
codewave evaluate --staged
# Set default format
codewave config set report-format json
Or in config file:
{
"reportFormat": "json"
}
Available formats:
- html - Interactive HTML report (default)
- json - Structured JSON for programmatic access
- markdown - Markdown format
- all - Generate all three formats
CodeWave uses a layered configuration system with five levels, in priority order:
- Environment Variables (highest priority)
- CLI Arguments
- Project Configuration (.codewave.config.json)
- User Configuration (user home directory)
- Defaults (lowest priority)
On first run, use codewave config --init to set up your LLM provider:
codewave config --init
This creates a user-level configuration file.
Applied to all projects in your user account:
- macOS/Linux: ~/.codewave/config.json
- Windows: %APPDATA%\codewave\config.json
Example: Set once, used everywhere
{
"llmProvider": "anthropic",
"model": "claude-haiku-4-5-20251001",
"apiKey": "sk-ant-...",
"apiBaseUrl": null,
"outputDirectory": ".evaluated-commits",
"defaultBatchSize": 10,
"parallelEvaluations": 3,
"maxTokensPerRequest": 4000,
"enableRag": true,
"ragChunkSize": 2000,
"vectorStoreType": "memory",
"reportFormat": "all",
"verbose": false
}
Applied only to a specific project, overrides user-level settings:
Location: .codewave.config.json in your project root
Example with Real-World Setup:
{
"apiKeys": {
"anthropic": "sk-ant-...",
"openai": "sk-proj-...",
"google": "",
"xai": ""
},
"llm": {
"provider": "openai",
"model": "gpt-4o-mini",
"temperature": 0.2,
"maxTokens": 16000
},
"agents": {
"enabled": [
"business-analyst",
"sdet",
"developer-author",
"senior-architect",
"developer-reviewer"
],
"retries": 3,
"timeout": 300000,
"minRounds": 2,
"maxRounds": 3,
"clarityThreshold": 0.85
},
"output": {
"directory": "./commit-analysis",
"format": "json",
"generateHtml": true
},
"tracing": {
"enabled": true,
"apiKey": "lsv2_pt_...",
"project": "codewave-evaluations",
"endpoint": "https://api.smith.langchain.com"
}
}
When to use project config:
- Different API keys per project
- Team-specific settings
- CI/CD pipeline customization
- Integration with LangSmith tracing
Override any configuration setting using environment variables (highest priority):
# LLM Settings
export CODEWAVE_LLM_PROVIDER=anthropic
export CODEWAVE_API_KEY=sk-ant-...
export CODEWAVE_MODEL=claude-haiku-4-5-20251001
# Output Settings
export CODEWAVE_OUTPUT_DIR=./reports
export CODEWAVE_REPORT_FORMAT=json
# Token & Cost Management
export CODEWAVE_MAX_TOKENS=4000
export CODEWAVE_BATCH_SIZE=10
export CODEWAVE_PARALLEL=3
# RAG Settings
export CODEWAVE_ENABLE_RAG=true
export CODEWAVE_RAG_CHUNK_SIZE=2000
export CODEWAVE_RAG_THRESHOLD=102400
# Logging
export CODEWAVE_VERBOSE=true
# Run evaluation
codewave evaluate --commit HEAD
Priority order (environment variables override all):
Environment Variables > CLI Arguments > Project Config > User Config > Defaults
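The precedence chain above can be pictured as a lookup that walks the layers from highest to lowest priority and returns the first defined value. The following is a minimal sketch of that idea, not CodeWave's actual loader: the `ConfigLayers` type and `resolveSetting` function are hypothetical, and settings are flattened to simple string keys for illustration even though the real project config nests them.

```typescript
// A flat bag of settings; undefined means "not set at this layer".
type Settings = { [key: string]: string | undefined };

// Hypothetical layers, named after the priority list in this README.
interface ConfigLayers {
  env: Settings;      // values derived from CODEWAVE_* environment variables
  cli: Settings;      // values parsed from command-line flags
  project: Settings;  // .codewave.config.json
  user: Settings;     // ~/.codewave/config.json
  defaults: Settings; // built-in fallbacks
}

// Return the first defined value, walking from highest to lowest priority.
function resolveSetting(key: string, layers: ConfigLayers): string | undefined {
  const ordered: Settings[] = [
    layers.env,
    layers.cli,
    layers.project,
    layers.user,
    layers.defaults,
  ];
  for (const layer of ordered) {
    const value = layer[key];
    if (value !== undefined) {
      return value;
    }
  }
  return undefined;
}

const layers: ConfigLayers = {
  env: { outputDirectory: "./reports" },
  cli: {},
  project: { outputDirectory: "./commit-analysis", reportFormat: "json" },
  user: { reportFormat: "all" },
  defaults: { outputDirectory: ".evaluated-commits", reportFormat: "html" },
};

console.log(resolveSetting("outputDirectory", layers)); // ./reports (environment wins)
console.log(resolveSetting("reportFormat", layers));    // json (project overrides user)
```

Keeping the layers separate until lookup time makes it easy to report where a value came from when debugging an unexpected setting.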
CodeWave evaluates commits across 7 carefully chosen dimensions, with each pillar assigned to a specialized AI agent:
Code Quality
Agent: Developer Reviewer
Description: Evaluates code correctness, design patterns, adherence to best practices, readability, and potential bugs.
Weights: Critical for production quality and maintainability.
Code Complexity
Agent: Senior Architect
Description: Measures cyclomatic complexity, cognitive complexity, and maintainability. Higher score = Lower complexity.
Scale: 10 (simple) to 1 (very complex)
Weights: Critical for long-term maintenance and team velocity.
Ideal Time Hours
Agent: Business Analyst
Description: Estimates ideal development time under optimal conditions (clear requirements, no interruptions).
Scale: Hours (0.5 to 80)
Weights: Baseline for productivity metrics.
Actual Time Hours
Agent: Developer Author
Description: Actual time taken to implement (including research, debugging, iterations).
Scale: Hours (0.5 to 160)
Weights: Identifies scope creep and process inefficiencies.
Technical Debt Hours
Agent: Senior Architect
Description: Positive = Additional debt introduced; Negative = Debt reduced/eliminated.
Scale: Hours (+/- 0 to 40)
Weights: Critical for assessing long-term codebase health.
Functional Impact
Agent: Business Analyst
Description: User-facing impact, business value, feature completeness, and alignment with requirements.
Scale: 1 (no impact) to 10 (transformative)
Weights: Aligns engineering efforts with business goals.
Test Coverage
Agent: QA Engineer
Description: Comprehensiveness of tests: unit, integration, edge cases, error scenarios.
Scale: 1 (no tests) to 10 (comprehensive coverage)
Weights: Critical for reliability and preventing regressions.
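The scales listed for each pillar can be captured as a small range table, which is useful when sanity-checking scores read from results.json. This is an illustrative sketch only: `codeQuality` and `testCoverage` appear as metric keys elsewhere in this README, but the other key names are hypothetical, and the 1 to 10 range for Code Quality is an assumption because no scale is stated for that pillar.

```typescript
interface Range {
  min: number;
  max: number;
}

// Ranges taken from the pillar descriptions above.
// Assumption: codeQuality uses 1-10 (the README gives no explicit scale).
const PILLAR_RANGES: { [pillar: string]: Range } = {
  codeQuality: { min: 1, max: 10 },
  codeComplexity: { min: 1, max: 10 },      // 10 = simple, 1 = very complex
  idealTimeHours: { min: 0.5, max: 80 },
  actualTimeHours: { min: 0.5, max: 160 },
  technicalDebtHours: { min: -40, max: 40 }, // negative = debt reduced
  functionalImpact: { min: 1, max: 10 },
  testCoverage: { min: 1, max: 10 },
};

// Return the names of pillars whose score is missing or outside its range.
function findOutOfRange(scores: { [pillar: string]: number }): string[] {
  const problems: string[] = [];
  for (const pillar of Object.keys(PILLAR_RANGES)) {
    const range = PILLAR_RANGES[pillar];
    const value = scores[pillar];
    if (typeof value !== "number" || value < range.min || value > range.max) {
      problems.push(pillar);
    }
  }
  return problems;
}

const plausibleScores = {
  codeQuality: 8,
  codeComplexity: 7,
  idealTimeHours: 4,
  actualTimeHours: 6,
  technicalDebtHours: -2,
  functionalImpact: 6,
  testCoverage: 5,
};

console.log(findOutOfRange(plausibleScores)); // []
```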
Business Analyst
Role: Strategic stakeholder representing business value and user impact.
Metrics: Ideal Time Hours, Functional Impact
Responsibilities:
- Assess business value and feature completeness
- Estimate ideal development time
- Evaluate functional impact on users
- Consider market alignment and competitive advantage
Developer Author
Role: Original implementation owner providing implementation insights.
Metrics: Actual Time Hours
Responsibilities:
- Report actual development time
- Explain implementation decisions
- Discuss challenges and blockers encountered
- Provide context for complexity and time variance
Developer Reviewer
Role: Code quality auditor ensuring production readiness.
Metrics: Code Quality
Responsibilities:
- Evaluate code correctness and design patterns
- Identify potential bugs and security issues
- Assess readability and maintainability
- Recommend refactoring opportunities
Senior Architect
Role: Technical leader focused on scalability, design, and debt.
Metrics: Code Complexity, Technical Debt Hours
Responsibilities:
- Assess architectural decisions and scalability
- Measure code complexity and maintainability
- Estimate technical debt introduced or reduced
- Recommend long-term improvements
QA Engineer
Role: Quality assurance specialist ensuring reliability.
Metrics: Test Coverage
Responsibilities:
- Evaluate test coverage and comprehensiveness
- Identify untested edge cases and error scenarios
- Assess reliability and resilience
- Recommend testing improvements
CodeWave's evaluation happens across 3 structured rounds:
Each agent independently evaluates the commit against their pillar metrics, providing initial scores and reasoning.
Duration: ~30-60 seconds
Output: Initial scores, concerns, and observations
Agents present their concerns and challenge each other's assumptions. This creates a realistic discussion where different perspectives can influence thinking.
Duration: ~30-90 seconds
Output: Refined perspectives, acknowledged concerns, potential consensus areas
Agents finalize their positions, considering all previous inputs. Final scores are calculated with a weighted consensus algorithm.
Duration: ~20-60 seconds
Output: Final scores, consensus reasoning, and agreed-upon recommendations
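The README does not spell out the weighted consensus algorithm used in the final round. One common way to combine per-agent scores is a weighted average, sketched below under that assumption. The `AgentScore` shape, the `weightedConsensus` function, and the example weights are all hypothetical; the real weights live in the project's own constants.

```typescript
interface AgentScore {
  agent: string;  // agent identifier, e.g. "developer-reviewer"
  score: number;  // the agent's final-round score for one pillar
  weight: number; // how much this agent's opinion counts for that pillar
}

// Weighted average: sum(score * weight) / sum(weight).
function weightedConsensus(scores: AgentScore[]): number {
  let weightedSum = 0;
  let totalWeight = 0;
  for (const entry of scores) {
    weightedSum += entry.score * entry.weight;
    totalWeight += entry.weight;
  }
  if (totalWeight === 0) {
    throw new Error("at least one agent must have a non-zero weight");
  }
  return weightedSum / totalWeight;
}

// Hypothetical Code Quality scores: the pillar's primary agent gets the largest weight.
const codeQualityScores: AgentScore[] = [
  { agent: "developer-reviewer", score: 8, weight: 3 },
  { agent: "senior-architect", score: 6, weight: 1 },
  { agent: "sdet", score: 10, weight: 1 },
];

console.log(weightedConsensus(codeQualityScores)); // 8
```

Giving the pillar's owning agent a larger weight lets the other agents nudge the result without overriding the specialist's judgment.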
Every evaluation begins with an AI-generated Developer Overview - a concise, intelligent summary of what changed in the commit, automatically extracted and formatted before agents evaluate.
The Developer Overview contains:
- Summary: One-line executive summary of the change (max 150 chars)
- Details: Paragraph explaining key changes and context (max 400 chars)
- Key Changes: Bullet list of implementation details
Summary: Added actual estimation as a separate step
Details:
Introduced actual time estimation alongside ideal time in PR analysis
for better accuracy.
Key Changes:
- Implemented IActualTimeEstimator interface
- Created ActualTimeRunnable for estimation
- Merged actual time with PR lifecycle data
- HTML Report: Top card in the report
- results.json: developerOverview field
- Agent Context: All agents receive this as context for their evaluation
The Developer Overview provides:
- Quick Context: Understand the change without reading the full diff
- Consistency: Same summary regardless of agent disagreement
- CI/CD Integration: Programmatic access to change summary
- Documentation: Auto-generated change documentation
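For scripts that consume the overview from results.json, the length limits described above can be checked with a small validator. This is a sketch under assumptions: the `developerOverview` field name comes from this README, but the sub-field names `summary`, `details`, and `keyChanges` and the `overviewProblems` helper are hypothetical.

```typescript
// Assumed shape of the developerOverview field (sub-field names are hypothetical).
interface DeveloperOverview {
  summary: string;      // one-line executive summary, max 150 chars
  details: string;      // explanatory paragraph, max 400 chars
  keyChanges: string[]; // bullet list of implementation details
}

const SUMMARY_MAX_CHARS = 150;
const DETAILS_MAX_CHARS = 400;

// Return a list of human-readable problems; an empty list means the overview fits the limits.
function overviewProblems(overview: DeveloperOverview): string[] {
  const problems: string[] = [];
  if (overview.summary.length > SUMMARY_MAX_CHARS) {
    problems.push("summary exceeds " + SUMMARY_MAX_CHARS + " characters");
  }
  if (overview.summary.indexOf("\n") !== -1) {
    problems.push("summary must be a single line");
  }
  if (overview.details.length > DETAILS_MAX_CHARS) {
    problems.push("details exceeds " + DETAILS_MAX_CHARS + " characters");
  }
  if (overview.keyChanges.length === 0) {
    problems.push("keyChanges is empty");
  }
  return problems;
}

// The example overview shown earlier in this section.
const exampleOverview: DeveloperOverview = {
  summary: "Added actual estimation as a separate step",
  details:
    "Introduced actual time estimation alongside ideal time in PR analysis for better accuracy.",
  keyChanges: [
    "Implemented IActualTimeEstimator interface",
    "Created ActualTimeRunnable for estimation",
    "Merged actual time with PR lifecycle data",
  ],
};

console.log(overviewProblems(exampleOverview)); // []
```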
For detailed information about Developer Overview generation, convergence detection, and multi-round discussion, see ADVANCED_FEATURES.md.
When commits exceed 100KB (configurable):
- Diff is chunked into semantic segments
- Vector embeddings generated for each chunk
- Agents query most relevant chunks instead of processing entire diff
- Reduces tokens used and speeds up evaluation
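The first two steps above (deciding whether a diff is large enough for RAG, then splitting it) can be sketched as follows. This is a simplification, not CodeWave's implementation: it splits on line boundaries by size rather than into semantic segments, it measures characters rather than bytes, and it leaves out embedding and retrieval entirely. The function names are hypothetical; the 102400 threshold and 2000 chunk size mirror the defaults shown in this README.

```typescript
const RAG_THRESHOLD = 102400; // default rag-threshold from this README
const RAG_CHUNK_SIZE = 2000;  // default rag-chunk-size from this README

// Decide whether a diff is large enough to go through the RAG path.
// Simplification: uses character count as a stand-in for byte size.
function needsRag(diff: string, threshold: number = RAG_THRESHOLD): boolean {
  return diff.length > threshold;
}

// Split a diff into chunks of at most chunkSize characters, breaking only
// between lines. A single line longer than chunkSize becomes its own
// oversized chunk rather than being cut mid-line.
function chunkDiff(diff: string, chunkSize: number = RAG_CHUNK_SIZE): string[] {
  const chunks: string[] = [];
  let current = "";
  for (const line of diff.split("\n")) {
    const candidate = current === "" ? line : current + "\n" + line;
    if (candidate.length > chunkSize && current !== "") {
      chunks.push(current); // current chunk is full; start a new one
      current = line;
    } else {
      current = candidate;
    }
  }
  if (current !== "") {
    chunks.push(current);
  }
  return chunks;
}

const tinyDiff = ["+added aa", "-removed ", " context ", "+added bb"].join("\n");
console.log(needsRag(tinyDiff));            // false
console.log(chunkDiff(tinyDiff, 20).length); // 2
```

In a real pipeline each chunk would then be embedded and stored so agents can retrieve only the chunks relevant to their question.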
Configuration:
codewave config set enable-rag true
codewave config set rag-chunk-size 2000
codewave config set rag-threshold 102400
Choose your LLM provider and model based on your needs and budget:
Anthropic Claude (Recommended)
- Best for code analysis and reasoning
- Default Model: claude-haiku-4-5-20251001 (6x cheaper, recommended for most use cases)
- Alternatives:
- claude-sonnet-4-5-20250929 (best balance of quality and cost)
- claude-opus-4-1-20250805 (maximum quality, highest cost)
OpenAI GPT
- Excellent multi-agent reasoning
- Cost-optimized: gpt-4o-mini (recommended)
- Balanced: gpt-4o
- Advanced reasoning: o3-mini-2025-01-31, o3
Google Gemini
- Most cost-effective option
- Recommended: gemini-2.5-flash-lite (most efficient)
- Alternatives: gemini-2.5-flash, gemini-2.5-pro
xAI Grok
- Specialized use cases
- Recommended: grok-4-fast-non-reasoning
- Alternatives: grok-4.2, grok-4-0709
Example: Switch to OpenAI
codewave config set llm-provider openai
codewave config set model gpt-4o-mini
codewave config set api-key sk-...
Example: Switch to Google Gemini (most cost-effective)
codewave config set llm-provider google
codewave config set model gemini-2.5-flash-lite
codewave config set api-key YOUR_GEMINI_API_KEY
See CONFIGURATION.md for complete model comparison and cost analysis.
Monitor evaluations in real-time:
codewave batch --count 100 --verbose
Progress Display:
- Overall completion percentage
- Current commit being evaluated
- Elapsed time and ETA
- Tokens used and estimated cost
- Success/error count
- Average evaluation time per commit
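Several of the progress figures above are simple arithmetic over the number of completed commits and the elapsed time. The sketch below shows one plausible way to derive them; the `progressSnapshot` function and its field names are hypothetical and are not taken from CodeWave's progress tracker.

```typescript
interface ProgressSnapshot {
  percentComplete: number;        // 0-100, rounded
  averageSecondsPerCommit: number | null;
  etaSeconds: number | null;      // null until at least one commit has finished
}

// Derive percentage, average time, and ETA from raw counters.
function progressSnapshot(
  completed: number,
  total: number,
  elapsedSeconds: number
): ProgressSnapshot {
  const percentComplete = total > 0 ? Math.round((completed / total) * 100) : 0;
  if (completed <= 0) {
    // No finished commits yet, so there is no average to extrapolate from.
    return { percentComplete, averageSecondsPerCommit: null, etaSeconds: null };
  }
  const average = elapsedSeconds / completed;
  return {
    percentComplete,
    averageSecondsPerCommit: average,
    etaSeconds: average * (total - completed), // remaining commits at the current pace
  };
}

const snapshot = progressSnapshot(25, 100, 50);
console.log(snapshot.percentComplete);         // 25
console.log(snapshot.averageSecondsPerCommit); // 2
console.log(snapshot.etaSeconds);              // 150
```

With parallel evaluation the same formula still applies as long as elapsed time is wall-clock time, since the average then already reflects the parallel throughput.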
All results are saved as JSON files in the evaluation output directory for programmatic access:
codewave evaluate --commit HEAD
# Results are in: .evaluated-commits/{commit-hash}_{date}_{time}/
# Access results.json for structured data
Use cases:
- Integrate with CI/CD pipelines
- Custom reporting and dashboards
- Machine learning on evaluation metrics
- Automated quality gates
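As an example of the quality-gate use case, the sketch below checks a parsed results.json against minimum scores. The `metrics.codeQuality` and `metrics.testCoverage` paths match the jq examples in this README; the `GateThresholds` type, the `failedChecks` function, and the threshold values are hypothetical, and the inline JSON stands in for a file read from the output directory.

```typescript
// Only the fields this gate needs; the real results.json contains more.
interface EvaluationResult {
  metrics: {
    codeQuality: number;
    testCoverage: number;
  };
}

interface GateThresholds {
  minCodeQuality: number;
  minTestCoverage: number;
}

// Return one message per failed check; an empty list means the gate passes.
function failedChecks(result: EvaluationResult, thresholds: GateThresholds): string[] {
  const failures: string[] = [];
  if (result.metrics.codeQuality < thresholds.minCodeQuality) {
    failures.push(
      "codeQuality " + result.metrics.codeQuality + " is below " + thresholds.minCodeQuality
    );
  }
  if (result.metrics.testCoverage < thresholds.minTestCoverage) {
    failures.push(
      "testCoverage " + result.metrics.testCoverage + " is below " + thresholds.minTestCoverage
    );
  }
  return failures;
}

// Stand-in for the contents of a results.json file.
const sampleJson = '{"metrics": {"codeQuality": 4.5, "testCoverage": 7}}';
const sampleResult = JSON.parse(sampleJson) as EvaluationResult;

const gateFailures = failedChecks(sampleResult, { minCodeQuality: 5, minTestCoverage: 6 });
console.log(gateFailures); // [ 'codeQuality 4.5 is below 5' ]
```

In a CI job, a non-empty failure list would typically be printed and turned into a non-zero exit code so the pipeline stops.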
codewave batch --count 5 --verbose
Output:
CodeWave - Commit Intelligence Engine
================================
Evaluating 5 commits...
[████████████████████████████████] 100% (5/5)
Evaluation Summary:
├── Total evaluated: 5
├── Successful: 5
├── Failed: 0
├── Average time: 2.3s per commit
├── Total tokens: 18,450
└── Output: .evaluated-commits/
Reports generated:
✓ a1b2c3d - "feat: add user authentication" (Quality: 8.5/10)
✓ x9y8z7w - "fix: resolve memory leak" (Quality: 9.0/10)
✓ m5n4o3p - "docs: update README" (Quality: 7.0/10)
✓ k1l2m3n - "refactor: simplify payment module" (Quality: 8.5/10)
✓ j0i9h8g - "test: add integration tests" (Quality: 8.0/10)
codewave evaluate feature/auth
# Evaluate last 20 commits with progress display
codewave batch --count 20 --verbose
# Output will show:
# - Current progress (20/20)
# - Elapsed time and ETA
# - Average quality score
# - Token usage and costs
# Evaluate all commits from January 2024
codewave batch --since "2024-01-01" --until "2024-01-31"
# Evaluate commits from past week
codewave batch --since "7 days ago" --until "today"
# Evaluate commits in past month with custom output
codewave batch --since "30 days ago" --output "./monthly-analysis"
# Use cheapest model (Gemini) with max parallelization
codewave config set llm-provider google
codewave config set model gemini-2.5-flash-lite
codewave batch --count 500 --parallel 5
# Expected cost: ~$10 for 500 commits
# Use best model with sequential processing (better reasoning)
codewave config set model claude-opus-4-1-20250805
codewave batch --count 10 --parallel 1 --verbose
# Better quality, slower, higher cost per commit
# Continue on errors, save to specific directory
codewave batch \
--since "2024-01-01" \
--until "2024-01-31" \
--skip-errors \
--parallel 5 \
--output "./january-analysis" \
--verbose
# Generates batch-summary.json with success/failure stats
# Evaluate commits only on develop branch
codewave batch --branch develop --count 30
# Evaluate last 50 commits on feature branch
codewave batch --branch feature/new-auth --count 50
# Compare two branches
codewave batch --branch main --count 20 -o "./main-analysis"
codewave batch --branch develop --count 20 -o "./develop-analysis"
# Evaluate and output only JSON (for programmatic access)
codewave batch \
--count 10 \
--format json \
--output "./ci-results" \
--skip-errors
# Access results programmatically
jq '.metrics | {quality: .codeQuality, coverage: .testCoverage}' \
./ci-results/*/results.json
# Count total evaluations
ls -1 .evaluated-commits/ | wc -l
# Calculate average quality score
jq -s 'map(.metrics.codeQuality) | add/length' \
.evaluated-commits/*/results.json
# Find low-quality commits
jq 'select(.metrics.codeQuality < 5)' \
.evaluated-commits/*/results.json
# Calculate total cost
jq -s 'map(.totalCost) | add' \
.evaluated-commits/*/results.json
# Get average evaluation time
jq -s 'map(.metadata.evaluationTime) | add/length' \
.evaluated-commits/*/results.json
codewave/
├── cli/ # CLI entry points and commands
│ ├── index.ts # Main CLI entry point (Commander setup)
│ ├── commands/
│ │ ├── evaluate-command.ts # Single commit evaluation
│ │ ├── batch-evaluate-command.ts # Multiple commits
│ │ └── config.command.ts # Configuration management
│ └── utils/
│ ├── progress-tracker.ts # Progress bar UI
│ └── shared.utils.ts # CLI utilities
├── src/
│ ├── agents/ # AI agent implementations
│ │ ├── base-agent-workflow.ts # Base agent class
│ │ ├── business-analyst-agent.ts
│ │ ├── developer-author-agent.ts
│ │ ├── developer-reviewer-agent.ts
│ │ ├── qa-engineer-agent.ts
│ │ └── senior-architect-agent.ts
│ ├── config/ # Configuration management
│ │ ├── config.loader.ts # Interactive config loader
│ │ └── types.ts # Config type definitions
│ ├── llm/ # LLM provider integration
│ │ ├── llm.service.ts # Multi-provider service
│ │ └── token-manager.ts # Token tracking
│ ├── formatters/ # Output formatting
│ │ ├── html-report-formatter-enhanced.ts
│ │ ├── json-formatter.ts
│ │ └── markdown-formatter.ts
│ ├── orchestrator/ # LangGraph workflow
│ │ └── orchestrator.ts # Multi-round conversation
│ ├── services/ # Business logic
│ │ ├── commit.service.ts # Git operations
│ │ ├── vector-store.service.ts # RAG support
│ │ └── evaluation.service.ts
│ ├── types/ # Type definitions
│ │ ├── agent.types.ts
│ │ ├── commit.types.ts
│ │ └── output.types.ts
│ ├── constants/ # Constants and weights
│ │ └── agent-weights.ts
│ └── utils/ # Shared utilities
│ ├── token-utils.ts
│ └── file-utils.ts
├── package.json # npm configuration
├── tsconfig.json # TypeScript config
└── README.md # This file
We welcome contributions! Please follow these guidelines:
- Fork and Clone
  git clone <your-fork>
  cd codewave
- Create Feature Branch
  git checkout -b feature/your-feature
- Make Changes and Test
  npm run build
  npm test
- Ensure Code Quality
  npm run lint
  npm run prettier
- Submit Pull Request
  - Include clear description of changes
  - Reference related issues
  - Include test cases for new features
Q: "API Key not found" error
A: Run 'codewave config --init' to set up your LLM provider credentials.
Alternatively, set CODEWAVE_API_KEY environment variable.
Q: Evaluation times out
A: For large commits (>100KB), enable RAG:
codewave config set enable-rag true
RAG automatically handles large diffs by chunking and semantic search.
Q: "Too many requests" error from LLM provider
A: Reduce parallel evaluations:
codewave batch --parallel 2
Or use a different LLM provider with higher rate limits.
Q: Results directory growing too large
A: Delete evaluation files older than 30 days (copy them elsewhere first if you need an archive):
find .evaluated-commits -type f -mtime +30 -delete
Q: Memory issues during batch processing
A: Reduce batch size and parallel count:
codewave batch --count 10 --parallel 1
See TROUBLESHOOTING.md for more detailed solutions.
- Average: 2-4 seconds per commit
- Small commits (<1KB): 1-2 seconds
- Medium commits (1-100KB): 2-5 seconds
- Large commits (>100KB with RAG): 3-8 seconds
- Average: 3,000-5,000 tokens per evaluation
- Small commits: 2,000-3,000 tokens
- Complex commits: 4,000-6,000 tokens
- RAG-assisted: 2,500-4,000 tokens (saved via chunking)
- Single evaluation: ~$0.015-0.030
- 100 commits: ~$1.50-3.00
- 1,000 commits: ~$15-30
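The batch figures above are the per-evaluation range multiplied by the number of commits. The sketch below makes that arithmetic explicit so it can be reused for other batch sizes. The `estimateBatchCost` helper is hypothetical, and actual costs depend on the provider, model, and diff sizes.

```typescript
interface CostRange {
  low: number;  // USD
  high: number; // USD
}

// Per-evaluation range quoted in this README (~$0.015-0.030).
const COST_PER_EVALUATION_USD: CostRange = { low: 0.015, high: 0.03 };

// Scale the per-evaluation range linearly by the number of commits.
function estimateBatchCost(
  commitCount: number,
  perEvaluation: CostRange = COST_PER_EVALUATION_USD
): CostRange {
  return {
    low: commitCount * perEvaluation.low,
    high: commitCount * perEvaluation.high,
  };
}

function formatRange(range: CostRange): string {
  return "$" + range.low.toFixed(2) + " to $" + range.high.toFixed(2);
}

console.log(formatRange(estimateBatchCost(100)));  // $1.50 to $3.00
console.log(formatRange(estimateBatchCost(1000))); // $15.00 to $30.00
```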
For programmatic usage, see API.md.
import { CodeWaveEvaluator } from 'codewave';
const evaluator = new CodeWaveEvaluator({
llmProvider: 'anthropic',
model: 'claude-3-5-sonnet-20241022',
apiKey: process.env.ANTHROPIC_API_KEY,
});
const result = await evaluator.evaluate('HEAD');
console.log('Code Quality:', result.metrics.codeQuality);
console.log('Consensus:', result.consensus);
MIT License - see LICENSE file for details.
We welcome contributions from the community! Please see .github/CONTRIBUTING.md for guidelines on how to contribute to CodeWave.
This project adheres to the Contributor Covenant Code of Conduct. By participating, you are expected to uphold this code.
Please report security vulnerabilities as described in .github/SECURITY.md, or email security@techdebtgpt.com.
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Twitter: @TechDebtGPT
- Email: support@techdebtgpt.com
Built with ❤️ by the TechDebtGPT team using:
- LangChain - AI/LLM orchestration
- LangGraph - Workflow state machines
- Commander.js - CLI framework
- Chalk - Terminal styling
CodeWave - Making commit intelligence accessible to every team.