📑 Table of Contents
One task → Multi-role AI collaboration → One conclusion
Production Ready | V3.6.1
DevSquad V3.6.1 adds 5 new cybernetics modules: FeedbackControlLoop for closed-loop feedback control, ExecutionGuard for safe execution with rollback, PerformanceFingerprint for performance baseline tracking, SimilarTaskRecommender for TF-IDF-based task similarity search, and AdaptiveRoleSelector for intelligent role selection based on task characteristics — making multi-agent collaboration more adaptive, self-optimizing, and resilient.
Chinese Name: 反馈闭环控制器 (Feedback Closed-Loop Controller) Core Capabilities:
- Closed-loop feedback control with automatic iteration until quality threshold met
- Configurable quality gate (
quality_gate) and maximum iterations - Lightweight quality assessment (no LLM calls), supports dry-run mode
from scripts.collaboration.feedback_control_loop import FeedbackControlLoop
from scripts.collaboration.dispatcher import MultiAgentDispatcher
dispatcher = MultiAgentDispatcher()
loop = FeedbackControlLoop(dispatcher, quality_gate=0.7, max_iterations=3)
result = loop.run("Design secure auth system", roles=["architect", "security"])
print(f"Iterations: {loop.iteration_count}")
print(f"Best quality: {loop.best_quality:.2f}")
# Automatically iterates until quality gate met or max iterations reachedChinese Name: 执行守护者 (Execution Guardian) Core Capabilities:
- Real-time execution monitoring with 4 abort conditions: timeout, output size, token count, critical keywords
- Lightweight checks (<1ms), zero external dependencies
- Dynamically configurable thresholds (max_duration_sec, max_output_tokens, etc.)
from scripts.collaboration.execution_guard import ExecutionGuard
guard = ExecutionGuard(max_duration_sec=300.0, max_output_tokens=8000)
should_abort, reason = guard.check_abort(
worker_output="Generating code...",
elapsed_time=120.5,
token_count=5000
)
if should_abort:
print(f"Aborting: {reason}")
# Example: "Timeout exceeded: 120.5s > 300.0s"
# Also detect warning keywords (without triggering abort)
warnings = guard.check_warnings("WARNING: High memory usage")
print(f"Warnings: {warnings}") # ['WARNING']Chinese Name: 性能指纹系统 (Performance Fingerprint System) Core Capabilities:
- Unified execution fingerprint recording (fuses 4 data sources: invocation counts, latency, state snapshots, retrospective deviations)
- Pure Python TF-IDF implementation (no sklearn/numpy), supports English/Chinese mixed content
- JSON persistence to
.devsquad_data/fingerprints/, graceful cold-start degradation
from scripts.collaboration.performance_fingerprint import PerformanceFingerprint
fingerprint = PerformanceFingerprint()
fid = fingerprint.record_execution(
task="Implement user authentication",
result=dispatch_result,
timing={"total": 12.5, "planning": 2.0, "coding": 8.0, "review": 2.5},
roles_used=["architect", "coder", "tester"],
)
print(f"Fingerprint ID: {fid}") # fp_20260518_143052_a1b2c3d4
# Find similar historical tasks using TF-IDF
similar = fingerprint.find_similar("Add login page", top_k=3)
for case in similar:
print(f"Task: {case['task']}")
print(f"Similarity: {case['similarity']:.2%}")
print(f"Roles used: {case['roles_used']}")
print(f"Success: {case['success']}")
# Get overall statistics
stats = fingerprint.get_stats()
print(f"Total executions: {stats['total']}")
print(f"Success rate: {stats['success_rate']:.1%}")Chinese Name: 相似任务推荐器 (Similar Task Recommender) Core Capabilities:
- TF-IDF-based task similarity search with historical success configuration recommendations
- Intelligent role combination recommendation, intent prediction, execution time estimation
- Confidence scoring (high/medium/low), graceful cold-start degradation
from scripts.collaboration.similar_task_recommender import SimilarTaskRecommender
recommender = SimilarTaskRecommender()
result = recommender.recommend("Design user authentication system")
print(f"Recommended roles: {result['recommended_roles']}")
# Output: ['architect', 'coder', 'tester', 'security']
print(f"Confidence: {result['confidence']}") # high/medium/low
print(f"Estimated duration: {result['estimated_duration_s']:.1f}s")
# View similar case details
for case in result['similar_cases']:
print(f"Task: {case['task']}")
print(f"Similarity: {case['similarity']:.2%}")
print(f"Historical roles: {case['roles']}")
print(f"Success: {case['success']}")
# Quick method: get role suggestions only
roles = recommender.get_role_suggestion("Implement payment API")
print(f"Suggested roles: {roles}")Chinese Name: 自适应角色选择器 (Adaptive Role Selector) Core Capabilities:
- Three-tier selection strategy based on historical success rates (similar tasks → intent match → fallback to default)
- Configurable minimum success rate and maximum role count
- Supports manual statistics updates and comprehensive role effectiveness reporting
from scripts.collaboration.adaptive_role_selector import AdaptiveRoleSelector
selector = AdaptiveRoleSelector()
roles = selector.select_roles(
task="Build high-concurrency microservices architecture",
intent="feature_implementation",
min_success_rate=0.5,
max_roles=5,
)
print(f"Recommended roles: {roles}")
# Output: ['architect', 'devops', 'security', 'tester']
# Or: [] (returns empty when no historical data, caller falls back to default RoleMatcher)
# Manually update statistics (for external system integration)
selector.update_stats(["architect", "coder"], success=True, duration_s=12.5)
# Generate role effectiveness report
report = selector.get_role_report()
for role_name, metrics in report.items():
print(f"{role_name}: success_rate={metrics['success_rate']:.1%}, "
f"avg_duration={metrics['avg_duration']:.1f}s")The 5 cybernetic modules are designed as non-invasive wrappers — they work independently or together without modifying existing core logic:
User Task
↓
[SimilarTaskRecommender] ← Optional: suggest roles from history
↓
[AdaptiveRoleSelector] ← Optional: optimize role selection
↓
[MultiAgentDispatcher]
↓
[FeedbackControlLoop] ← Wrap dispatcher for auto-iteration
↓ [each worker step]
[ExecutionGuard] ← Guard each worker execution
↓
[PerformanceFingerprint] ← Record after dispatch completes
Recommended usage (progressive adoption):
from scripts.collaboration import (
MultiAgentDispatcher, FeedbackControlLoop,
ExecutionGuard, PerformanceFingerprint
)
dispatcher = MultiAgentDispatcher()
guard = ExecutionGuard()
fingerprint = PerformanceFingerprint()
# Option 1: Full cybernetics stack
loop = FeedbackControlLoop(dispatcher, quality_gate=0.7)
result = loop.run("Your task here")
# Option 2: Guard only (minimal adoption)
result = dispatcher.dispatch("Your task")
for w in result.worker_results:
abort, reason = guard.check_abort(w.output, w.duration)
if abort:
print(f"Aborted: {reason}")
# Option 3: Learning only
fingerprint.record_execution("task", result, result.timing, result.matched_roles)
similar = fingerprint.find_similar("new task", top_k=3)All modules are optional switches — DevSquad works perfectly without them.
# Interactive setup wizard (1-2 minutes)
python scripts/cli.py init
# Then start collaborating!
devsquad dispatch -t "your task description"# Start Streamlit dashboard with authentication
streamlit run scripts/dashboard.py
# Open http://localhost:8501
# ⚠️ Security: Default credentials are for initial setup only.
# Login with default account, then change password immediately.
# Username: admin Password: <your-secure-password>
# Or set via environment variables: $DASHBOARD_USER / $DASHBOARD_PASS# Install dependencies
pip install fastapi uvicorn
# Start API server
uvicorn scripts.api_server:app --host 0.0.0.0 --port 8000 --reload
# Access Swagger UI: http://localhost:8000/docs
# Access ReDoc: http://localhost:8000/redoc# Standard CLI usage
python scripts/cli.py lifecycle build
# Enhanced visual output
python scripts/cli.py lifecycle build --visual --verbose┌─────────────────────────────────────────────────────────────┐
│ User Access Layer │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Streamlit │ │ FastAPI REST │ │ CLI/Notebook │ │
│ │ Dashboard │ │ API Server │ │ (Existing) │ │
│ │ (Auth+HTTPS) │ │ (Swagger) │ │ │ │
│ └──────┬───────┘ └──────┬───────┘ └──────────────┘ │
└─────────┼───────────────┼───────────────────────────────────┘
│ │
▼ ▼
┌─────────────────────────────────────────────────────────────┐
│ Business Logic Layer │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │AuthManager │ │AlertManager │ │HistoryMgr │ │
│ │(RBAC Auth) │ │(Multi-Chnl) │ │(SQLite TSDB)│ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ ┌─────────────────────────────────────────────┐ │
│ │ LifecycleProtocol (11-Phase Engine) │ │
│ │ UnifiedGateEngine + CheckpointManager │ │
│ └─────────────────────────────────────────────┘ │
└─────────────────────────┬───────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Data Persistence Layer │
│ ┌────────────┐ ┌────────────┐ ┌────────────────────────┐ │
│ │ SQLite DB │ │ YAML Config│ │ Checkpoint Files │ │
│ │ (History) │ │ (Deploy) │ │ (Lifecycle State) │ │
│ └────────────┘ └────────────┘ └────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
Milestone anchor verification that ensures critical checkpoints are properly validated before proceeding:
- Anchor Point Definition — Define mandatory validation anchors at key lifecycle milestones
- Cross-Phase Verification — Verify consistency between phase outputs and anchor criteria
- Drift Detection — Detect when project execution drifts from defined anchor points
- Auto-Recovery — Suggest corrective actions when anchor checks fail
Independent retrospective mechanism for continuous improvement after each dispatch cycle:
- Post-Dispatch Review — Automatically analyze what went well and what could improve
- Pattern Extraction — Extract reusable patterns from successful collaborations
- Anti-Pattern Detection — Identify recurring issues and suggest process improvements
- Metric Trend Analysis — Track quality metrics across dispatches to spot degradation
Thread-safe feature invocation counter for data-driven feature optimization:
- Invocation Tracking — Count every feature call (dispatch, anchor_check, retrospective, consensus, etc.)
- Usage Reports — Top features, unused features, low-usage features with markdown export
- Auto-Persist — Periodic JSON persistence every 100 ticks
- 30 Known Features — Pre-registered feature set covering all DevSquad capabilities
Structured goal management that decomposes high-level objectives into trackable, verifiable sub-goals:
- Goal Decomposition — Break complex objectives into hierarchical sub-goals with clear criteria
- Progress Tracking — Real-time progress measurement against defined goal structure
- Dependency Mapping — Visualize and manage dependencies between sub-goals
- Completion Verification — Automated verification that goals meet their success criteria
Automatic backend failover that ensures LLM availability even when primary backends are down:
- Health Monitoring — Continuous health checks for all configured LLM backends
- Automatic Failover — Seamlessly switch to backup backend when primary fails
- Priority-Based Routing — Configure backend priority order (e.g., OpenAI → Anthropic → Mock)
- Recovery Detection — Automatically restore primary backend when it recovers
- Prove-It Pattern: Every completion claim must include verifiable evidence (test output, diff, benchmark)
- 7 Red Flags:
no_test|tests_pass_first_run|no_regression_test|no_security_scan|no_perf_baseline|vague_description|evidence_missing - Auto-active: Integrated into TaskCompletionChecker — zero config required
- Multi-user support with role-based access control (RBAC)
- Three roles: Admin (full access), Operator (execute), Viewer (read-only)
- Secure password hashing with SHA-256
- Session management for Streamlit dashboard
- OAuth2 support (optional, for enterprise deployments)
- 10+ endpoints for complete lifecycle management
- Automatic OpenAPI/Swagger documentation at
/docs - CORS middleware for cross-origin requests
- Request timing and comprehensive logging
- Standardized error responses
Key Endpoints:
Lifecycle:
GET /api/v1/lifecycle/phases → List all 11 phases
POST /api/v1/lifecycle/actions → Execute phase actions
GET /api/v1/lifecycle/status → Current status
Metrics:
GET /api/v1/metrics/current → Real-time metrics
GET /api/v1/metrics/history → Historical data
Gates:
GET /api/v1/gates/status → All gate statuses
POST /api/v1/gates/check → Check specific gate
System:
GET /api/v1/health → Health check
- 4 severity levels: INFO, WARNING, ERROR, CRITICAL
- Multiple channels: Console, Slack, Email, Webhook
- Rate limiting to prevent alert spam (configurable)
- Deduplication within time window
- Alert history tracking and statistics
- Metrics snapshots with time-range queries
- Alert history with acknowledgment tracking
- API request logs with performance metrics
- Lifecycle events audit trail
- Automatic cleanup with configurable retention
- Streamlit Dashboard: Real-time monitoring with authentication
- CLI Visual Module: Rich terminal output with colors and icons
- Jupyter Notebook: Interactive 10-section tutorial
- Benchmark Reports: HTML/JSON performance reports
DevSquad provides 6 atomic sub-skills that can be used independently or together. Each sub-skill is a thin wrapper (~50 lines) importing existing core modules — no duplicated logic.
skills/
├── dispatch/ → DispatchSkill — MultiAgentDispatcher (7-role orchestration)
├── intent/ → IntentSkill — IntentWorkflowMapper (6 intents × 3 languages)
├── review/ → ReviewSkill — FiveAxisConsensusEngine (5-axis code review)
├── security/ → SecuritySkill — InputValidator + OperationClassifier + PermissionGuard
├── test/ → TestSkill — TestQualityGuard + test strategy generation
└── retrospective/ → RetroSkill — RetrospectiveEngine + pattern extraction
| Skill | Core Method | Wraps | Mock Mode |
|---|---|---|---|
dispatch |
run(task, roles, mode) |
MultiAgentDispatcher | ✅ |
intent |
detect(text, lang) |
IntentWorkflowMapper | ✅ |
review |
review(code) |
FiveAxisConsensusEngine | ✅ |
security |
scan_input(text) |
InputValidator + OpClassifier | ✅ |
test |
generate_strategy(module) |
TestQualityGuard | ✅ |
retrospective |
run_retrospective(results) |
RetrospectiveEngine | ✅ |
# Direct import (recommended for single skill)
from skills.dispatch.handler import DispatchSkill
result = DispatchSkill().run("Fix login bug", roles=["coder", "tester"])
# Via registry (dynamic discovery)
from skills import get_skill, list_skills
print(list_skills()) # ['dispatch', 'intent', 'review', 'security', 'test', 'retrospective']
skill = get_skill("security")
result = skill.scan_input("DROP TABLE users; --")All sub-skills work without any API key in Mock mode.
Unified Lifecycle Architecture - Resolves CLI 6 commands vs 11-phase lifecycle:
CLI View Layer (6 commands) Core Engine (11 phases)
┌─────────────────────┐ ┌──────────────────────────┐
│ spec → P1, P2 │───View ──→│ P1: Requirements │
│ plan → P7 │ Mapping │ P2: Architecture │
│ build → P8 │ │ P3: Technical Design │
│ test → P9 │ │ ... │
│ review → P8,P6 │ │ P10: Deployment │
│ ship → P10 │ │ P11: Operations │
└─────────────────────┘ └──────────────────────────┘
↓ ↓
UnifiedGateEngine CheckpointManager
(Phase + Worker gates) (Lifecycle state persistence)
Core Components:
- ✅ LifecycleProtocol - Abstract interface for unified lifecycle management
- ✅ UnifiedGateEngine - Integrates VerificationGate + Phase transition gates
- ✅ FullLifecycleAdapter - Complete 11-phase lifecycle with dependency resolution
- ✅ Enhanced CheckpointManager - Auto save/restore lifecycle state across sessions
DevSquad transforms a single AI task into a multi-role AI collaboration. It automatically dispatches your task to the right combination of expert roles — architect, product manager, coder, tester, security reviewer, DevOps — orchestrates their parallel collaboration through a shared workspace, resolves conflicts via weighted consensus voting, and delivers a unified structured report.
You: "Design a microservices e-commerce backend"
│
▼
┌─────────────────┐
│ InputValidator ──→ Security check (XSS, SQL injection, prompt injection)
└────────┬────────┘
▼
┌─────────────────┐
│ RoleMatcher ──→ Auto-match: architect + devops + security
└────────┬────────┘
▼
┌──────────┬──────────┬──────────┐
│ Architect │ DevOps │ Security │ ← ThreadPoolExecutor parallel execution
│(Design) │(Infra) │(Threat) │
└────┬──────┴────┬─────┴────┬────┘
└────────────┼───────────┘
▼
┌──────────────────┐
│ Scratchpad │ ← Shared blackboard (real-time sync)
└────────┬─────────┘
▼
┌──────────────────┐
│ Consensus Engine │ ← Weighted vote + veto + escalation
└────────┬─────────┘
▼
┌──────────────────┐
│ Structured Report │ ← Findings + Action Items (H/M/L)
└──────────────────┘
- Python 3.9+ (3.9, 3.10, 3.11, 3.12 supported)
- pip or pipenv for package management
# Install from PyPI — zero setup, ready to use
pip install devsquad
# With optional dependencies
pip install "devsquad[api]" # FastAPI + Streamlit dashboard
pip install "devsquad[all]" # All optional featuresgit clone https://github.com/your-org/DevSquad.git
cd DevSquad
# Install core package (minimal dependencies)
pip install -e .
# Ready to use!
devsquad dispatch -t "Design user authentication system"# Clone and install with all production features
git clone https://github.com/your-org/DevSquad.git
cd DevSquad
# Install with API server dependencies
pip install -e ".[api]"
# Or install all optional features
pip install -e ".[all]"Optional Feature Groups:
# API Server (FastAPI + Uvicorn)
pip install -e ".[api]"
# Visualization (Streamlit + Jupyter)
pip install -e ".[visualization]"
# Alerting (Slack SDK)
pip install -e ".[alerts]"
# Development & Testing
pip install -e ".[dev]"
# Everything combined
pip install -e ".[all]"# Check version
devsquad --version
# Expected: devsquad 3.6.1
# Run tests
pytest tests/ -v --tb=short
# Expected: 1500+ passed1. CLI (Recommended)
# Mock mode (default) — no API key needed
python3 scripts/cli.py dispatch -t "Design user authentication system"
# Real AI output — set environment variables first
export OPENAI_API_KEY="sk-..."
export OPENAI_BASE_URL="https://api.openai.com/v1" # optional
export OPENAI_MODEL="gpt-4" # optional
python3 scripts/cli.py dispatch -t "Design auth system" --backend openai
# Specify roles (short IDs: arch/pm/test/coder/ui/infra/sec)
python3 scripts/cli.py dispatch -t "Design auth system" -r arch sec --backend openai
# Stream output in real-time
python3 scripts/cli.py dispatch -t "Design auth system" -r arch --backend openai --stream
# Other commands
python3 scripts/cli.py status # System status
python3 scripts/cli.py roles # List available roles
python3 scripts/cli.py --version # Show version (3.6.1)2. Python API
from scripts.collaboration.dispatcher import MultiAgentDispatcher
# Mock mode (default)
disp = MultiAgentDispatcher()
result = disp.dispatch("Design REST API for user management")
print(result.to_markdown())
disp.shutdown()
# With LLM backend
from scripts.collaboration.llm_backend import create_backend
backend = create_backend("openai", api_key="sk-...", base_url="https://api.openai.com/v1")
disp = MultiAgentDispatcher(llm_backend=backend)
result = disp.dispatch("Design auth system", roles=["architect", "security"])
print(result.summary)
disp.shutdown()4. Sub-Skills (Lightweight Independent)
# Each sub-skill works independently — no Dispatcher needed
from skills.security.handler import SecuritySkill
risk = SecuritySkill().scan_input("malicious input")
from skills.review.handler import ReviewScore
verdict = ReviewSkill().review(code_snippet)
from skills.intent.handler import IntentSkill
intent = IntentSkill().detect("修复登录漏洞", lang="zh")3. MCP Server (for Cursor / any MCP client)
pip install mcp
python3 scripts/mcp_server.py # stdio mode
python3 scripts/mcp_server.py --port 8080 # SSE modeExposes 6 tools: multiagent_dispatch, multiagent_quick, multiagent_roles,
multiagent_status, multiagent_analyze, multiagent_shutdown.
| Role | CLI ID | Aliases | Weight | Best For |
|---|---|---|---|---|
| Architect | arch |
architect |
1.5 | System design, tech stack, performance/security architecture |
| Product Manager | pm |
product-manager |
1.2 | Requirements, user stories, acceptance criteria |
| Security Expert | sec |
security |
1.1 | Threat modeling, vulnerability audit, compliance |
| Tester | test |
tester, qa |
1.0 | Test strategy, quality assurance, edge cases |
| Coder | coder |
solo-coder, dev |
1.0 | Implementation, code review, performance optimization |
| DevOps | infra |
devops |
1.0 | CI/CD, containerization, monitoring, infrastructure |
| UI Designer | ui |
ui-designer |
0.9 | UX flow, interaction design, accessibility |
Auto-match: If no roles specified, the dispatcher automatically matches based on task keywords.
DevSquad is built on a layered architecture with clear separation of concerns:
┌─────────────────────────────────────────────────┐
│ CLI / MCP / API │ Entry Points
├─────────────────────────────────────────────────┤
│ MultiAgentDispatcher │ Orchestration
│ ┌────────────┬──────────────┬────────────────┐ │
│ │RoleMatcher │ReportFormatter│InputValidator │ │ Extracted Components
│ └────────────┴──────────────┴────────────────┘ │
│ ┌────────────────────────────────────────────┐ │
│ │ RuleCollector (NL Rule Intercept) │ │ Rule Collection
│ └────────────────────────────────────────────┘ │
├─────────────────────────────────────────────────┤
│ Coordinator │ Task Planning
│ ┌──────────┬───────────┬────────────────────┐ │
│ │ Scratchpad│ Consensus │ BatchScheduler │ │ Collaboration
│ └──────────┴───────────┴────────────────────┘ │
├─────────────────────────────────────────────────┤
│ Worker (per role) │ Execution
│ ┌────────────────────────────────────────────┐ │
│ │ PromptAssembler → LLMBackend → Output │ │
│ └────────────────────────────────────────────┘ │
├─────────────────────────────────────────────────┤
│ LLMBackend: Mock | OpenAI | Anthropic │ LLM Layer
├─────────────────────────────────────────────────┤
│ CheckpointManager | WorkflowEngine | ... │ Infrastructure
└─────────────────────────────────────────────────┘
Milestone anchor verification that ensures critical checkpoints are validated before proceeding:
from scripts.collaboration.anchor_checker import AnchorChecker
checker = AnchorChecker()
checker.define_anchor("architecture_complete", criteria=["API spec defined", "tech stack selected"])
result = checker.check_anchor("architecture_complete", phase_output)
print(f"Anchor passed: {result.passed}")
print(f"Drift detected: {result.drift_score}")Features:
- Cross-phase consistency verification
- Drift detection with severity scoring
- Auto-recovery suggestions
- Anchor point persistence
Independent retrospective mechanism for continuous improvement:
from scripts.collaboration.retrospective_engine import RetrospectiveEngine
engine = RetrospectiveEngine()
report = engine.run_retrospective(dispatch_result)
print(f"Patterns found: {len(report.patterns)}")
print(f"Anti-patterns: {len(report.anti_patterns)}")
print(f"Improvement suggestions: {report.suggestions}")Features:
- Post-dispatch quality analysis
- Pattern and anti-pattern extraction
- Metric trend tracking
- Actionable improvement suggestions
Structured goal management with hierarchical decomposition:
from scripts.collaboration.structured_goal import StructuredGoal
goal = StructuredGoal("Build e-commerce platform")
goal.add_sub_goal("User auth", criteria=["OAuth2 support", "2FA ready"])
goal.add_sub_goal("Product catalog", criteria=["Search", "Filter", "Pagination"])
progress = goal.get_progress()
print(f"Overall: {progress.completion_pct}%")Features:
- Hierarchical goal decomposition
- Dependency mapping between sub-goals
- Real-time progress tracking
- Automated completion verification
Automatic LLM backend failover for high availability:
from scripts.collaboration.llm_backend import FallbackBackend
backend = FallbackBackend(
primary="openai",
fallbacks=["anthropic", "mock"],
health_check_interval=30,
)
result = backend.generate("Design auth system")
# Automatically fails over if primary is downFeatures:
- Continuous backend health monitoring
- Seamless automatic failover
- Priority-based routing configuration
- Automatic primary recovery detection
Automatically detect and store user rules from natural language input:
# User says: "记住规则:写代码时必须加注释"
# DevSquad automatically:
# 1. Detects rule-storing intent
# 2. Extracts: trigger="写代码时", action="必须加注释", type="always"
# 3. Sanitizes content (removes dangerous patterns)
# 4. Stores via CarryMem or local JSON fallback
# List stored rules
# User says: "列出规则" → Returns all stored rules
# Delete a rule
# User says: "删除规则 RULE-LOCAL-abc123"Pipeline: User Input → IntentDetector → RuleExtractor → RuleSanitizer → RuleStorage (CarryMem + local JSON)
Features:
- 11 intent patterns (Chinese + English)
- 4 rule types: always / avoid / prefer / forbid
- Prompt injection protection in rule content
- CarryMem primary + local JSON fallback storage
- Automatic rule injection into Worker prompts
See Integration Guide for detailed usage.
- InputValidator: XSS, SQL injection, command injection, HTML injection detection
- Prompt Injection Protection: 21+ patterns (ignore previous instructions, jailbreak, DAN mode, system prompt extraction, etc.)
- API Key Safety: Environment variables only, never CLI arguments or logs
- PermissionGuard: 4-level safety gate (PLAN → DEFAULT → AUTO → BYPASS)
- ThreadPoolExecutor: Real parallel execution for multi-role dispatch
- LLM Cache: TTL-based LRU cache with disk persistence (60-80% cost reduction)
- LLM Retry: Exponential backoff + circuit breaker + multi-backend fallback
- Streaming Output: Real-time chunk-by-chunk LLM output via
--stream
- CheckpointManager: SHA256 integrity, handoff documents, auto-cleanup
- WorkflowEngine: Task-to-workflow auto-split, step execution, resume from checkpoint, 11-phase lifecycle templates (full/backend/frontend/internal_tool/minimal), requirement change management
- TaskCompletionChecker: DispatchResult/ScheduleResult completion tracking
- ConsensusEngine: Weighted voting with veto power and human escalation
DevSquad V3.6.1 defines an 11-phase (4 optional) project lifecycle with clear roles, dependencies, and gate conditions:
P1 → P2 ──┬──→ P3 ──→ P6 ──→ P7 ──→ P8 ──→ P9 ──→ P10 ──→ P11
├──→ P4(∥P3) ──↗
└──→ P5(dep P1+P3) ──↗
| Template | Phases | Use Case |
|---|---|---|
full |
P1-P11 | Complete project |
backend |
No P5 | Backend services |
frontend |
No P4,P6 | Frontend applications |
internal_tool |
No P4,P5,P6,P11 | Internal tools |
minimal |
P1,P3,P7,P8,P9 | Minimum set |
See GUIDE.md §4 for full lifecycle details with gate conditions and requirement change process.
- Configuration File:
.devsquad.yamlin project root with env var overrides - Quality Control Injection: Auto-inject QC rules (hallucination prevention, overconfidence check, security guard, RACI protocol) into Worker prompts based on
.devsquad.yamlconfig - Docker Support:
docker build -t devsquad . && docker run devsquad dispatch -t "task" - GitHub Actions CI: Python 3.9-3.12 matrix testing
- pip installable:
pip install -e .with optional dependencies
💡 Table too wide? View modules online or use
devsquad --help modulesfor a compact list.
| Module | File | Purpose |
|---|---|---|
| MultiAgentDispatcher | dispatcher.py |
Unified entry point |
| Coordinator | coordinator.py |
Global orchestration: plan → assign → execute → collect |
| Worker | worker.py |
Role executor with LLM backend integration |
| EnhancedWorker | enhanced_worker.py |
Worker with auto QA (briefing + confidence + retry + memory rules) |
| Scratchpad | scratchpad.py |
Shared blackboard for inter-worker communication |
| ConsensusEngine | consensus.py |
Weighted voting + veto + escalation |
| RoleMatcher | role_matcher.py |
Keyword-based role matching with alias resolution |
| ReportFormatter | report_formatter.py |
Structured/compact/detailed report generation |
| InputValidator | input_validator.py |
Security validation + prompt injection detection |
| AISemanticMatcher | ai_semantic_matcher.py |
LLM-powered semantic role matching |
| CheckpointManager | checkpoint_manager.py |
State persistence + handoff documents |
| WorkflowEngine | workflow_engine.py |
Task-to-workflow auto-split + 11-phase lifecycle templates + requirement change |
| TaskCompletionChecker | task_completion_checker.py |
Completion tracking + progress reporting |
| CodeMapGenerator | code_map_generator.py |
Python AST-based code structure analysis |
| DualLayerContextManager | dual_layer_context.py |
Project-level + task-level context management |
| SkillRegistry | skill_registry.py |
Reusable skill registration + discovery |
| IntentWorkflowMapper | intent_workflow_mapper.py |
User intent → workflow chain mapping (6 intents × 3 languages) |
| OperationClassifier | operation_classifier.py |
Three-tier operation classification (ALWAYS_SAFE/NEEDS_REVIEW/FORBIDDEN) |
| FiveAxisConsensusEngine | five_axis_consensus.py |
Five-axis review consensus with weighted voting |
| FeatureUsageTracker | feature_usage_tracker.py |
Feature usage tracking + reporting + auto-persistence |
| LLMBackend | llm_backend.py |
Mock/OpenAI/Anthropic with streaming support |
| LLMCache | llm_cache.py |
TTL-based LRU cache with disk persistence |
| LLMRetry | llm_retry.py |
Exponential backoff + circuit breaker |
| ConfigManager | config_loader.py |
YAML config + env var overrides |
| PromptAssembler | prompt_assembler.py |
Dynamic prompt assembly + QC rule injection |
| AgentBriefing | agent_briefing.py |
Context-aware task briefing with priority filtering |
| ConfidenceScorer | confidence_score.py |
5-factor response quality assessment |
| PerformanceMonitor | performance_monitor.py |
P95/P99 tracking + CPU/memory monitoring |
| MCEAdapter | mce_adapter.py |
CarryMem integration adapter (optional dependency, supports match_rules + format_rules_as_prompt + add_rule) |
| Protocols | protocols.py |
Interface definitions (CacheProvider, MemoryProvider, etc.) |
| NullProviders | null_providers.py |
Graceful degradation providers |
| PermissionGuard | permission_guard.py |
4-level safety gate |
| MemoryBridge | memory_bridge.py |
Cross-session memory |
| BatchScheduler | batch_scheduler.py |
Batch task scheduling |
| ContextCompressor | context_compressor.py |
Context compression for long tasks |
| RoleTemplateMarket | role_template_market.py |
Role template sharing marketplace |
| Skillifier | skillifier.py |
Auto skill learning from tasks |
| UsageTracker | usage_tracker.py |
Token/cost tracking |
| WarmupManager | warmup_manager.py |
Startup warmup optimization |
| TestQualityGuard | test_quality_guard.py |
Test quality enforcement |
| PromptVariantGenerator | prompt_variant_generator.py |
A/B prompt testing |
| ConfigManager (YAML) | config_manager.py |
Project-level YAML config |
| WorkBuddyClawSource | memory_bridge.py |
WorkBuddy read-only bridge |
| Models | models.py |
Shared data models and type definitions |
| LLMCacheAsync | llm_cache_async.py |
Async LLM cache for concurrent workloads |
| LLMRetryAsync | llm_retry_async.py |
Async LLM retry with backoff |
| IntegrationExample | integration_example.py |
DevSquad integration example code |
| AsyncIntegrationExample | async_integration_example.py |
Async DevSquad integration example |
| AnchorChecker | anchor_checker.py |
Milestone anchor verification + drift detection + auto-recovery |
| RetrospectiveEngine | retrospective.py |
Independent post-dispatch retrospective + pattern extraction + anti-pattern detection |
| FeatureUsageTracker | feature_usage_tracker.py |
Feature invocation counter + usage reports + auto-persist |
| FallbackBackend | llm_backend.py |
Automatic LLM backend failover with health monitoring |
Create .devsquad.yaml in your project root:
quality_control:
enabled: true
strict_mode: true
min_quality_score: 85
ai_quality_control:
enabled: true
hallucination_check:
enabled: true
require_traceable_references: true
overconfidence_check:
enabled: true
require_alternatives_min: 2
ai_security_guard:
enabled: true
permission_level: "DEFAULT"
ai_team_collaboration:
enabled: true
raci:
mode: "strict"
llm:
backend: openai
base_url: "" # Set via LLM_BASE_URL env var
model: "" # Set via LLM_MODEL env var
timeout: 120
log_level: WARNINGOr use environment variables (higher priority):
export DEVSQUAD_LLM_BACKEND=openai
export DEVSQUAD_BASE_URL=https://api.openai.com/v1
export DEVSQUAD_MODEL=gpt-4
export OPENAI_API_KEY=sk-...| Variable | Purpose | Default |
|---|---|---|
OPENAI_API_KEY |
OpenAI API key | None (required for OpenAI backend) |
OPENAI_BASE_URL |
OpenAI-compatible base URL | None |
OPENAI_MODEL |
Model name | gpt-4 |
ANTHROPIC_API_KEY |
Anthropic API key | None (required for Anthropic backend) |
ANTHROPIC_MODEL |
Model name | claude-sonnet-4-20250514 |
DEVSQUAD_LLM_BACKEND |
Default backend type | mock |
DEVSQUAD_LOG_LEVEL |
Logging level | WARNING |
# Core tests (748+ tests all passing)
python3 -m pytest scripts/collaboration/core_test.py \
scripts/collaboration/role_mapping_test.py \
scripts/collaboration/upstream_test.py \
scripts/collaboration/mce_adapter_test.py \
tests/ test_v35_integration.py \
tests/test_anti_rationalization.py \
tests/test_verification_gate.py \
tests/test_intent_workflow_mapper.py \
tests/test_cli_lifecycle.py -v
# Quick smoke test
python3 scripts/cli.py --version # 3.6.1
python3 scripts/cli.py status # System ready
python3 scripts/cli.py roles # List 7 roles
# Lifecycle commands (NEW in v3.4.1)
python3 scripts/cli.py spec -t "User authentication system"
python3 scripts/cli.py build -t "Implement login API"
python3 scripts/cli.py test -t "Run all unit tests"
python3 scripts/cli.py review -t "Check PR #123"
python3 scripts/cli.py ship -t "Deploy to production"After upgrading DevSquad, run these commands to verify your environment:
# Quick health check (should complete in < 30s)
python3 scripts/cli.py --version # Expected: DevSquad 3.6.1
python3 scripts/cli.py status # Expected: System ready
python3 scripts/cli.py roles # Expected: 7 core roles listed
# Full test suite
python3 -m pytest tests/ -q --tb=line # Expected: 1662 passed# Install coverage tool first: pip install pytest-cov
python3 -m pytest tests/ --cov=scripts --cov-report=term-missing --cov-fail-under=80
# Expected: coverage ≥ 80%, detailed missing-line reportDevSquad uses a priority-based test layering strategy:
| Priority | Scope | Examples | Count |
|---|---|---|---|
| P0 | Quality Framework Core | AntiRationalization (39), VerificationGate (42), IntentWorkflowMapper (58), AuthManager (35) | ~200 |
| P1 | Enhancement Modules | FiveAxisConsensus (29), OperationClassifier (27), OutputSlicer (26), CIFeedbackAdapter (22) | ~150 |
| P1+ | Cybernetics (V3.6.1) | FeedbackControlLoop (19), ExecutionGuard (40), PerformanceFingerprint (13), SimilarTaskRecommender (17), AdaptiveRoleSelector (21) | 110 |
| P2 | Integration & E2E | Full lifecycle dispatch, cross-module integration | ~200 |
| P3 | Unit per Module | Core dispatcher, RoleMapping, MCEAdapter, LLM backends | ~400+ |
Total: 1662 tests
Run by priority:
# P0 only (critical path, < 10s)
python3 -m pytest tests/ -k "anti_ratif or verification or intent_workflow or auth" -q
# P0 + P1 (quality + enhancement, < 30s)
python3 -m pytest tests/ -k "anti_ratif or verification or intent or auth or five_axis or operation" -q
# Full suite
python3 -m pytest tests/ -q --tb=line| Document | Description |
|---|---|
| QUICK_START_EN.md | Quick start guide (English, 5 minutes) |
| REFERENCE_GUIDE_EN.md | Complete reference guide (English) |
| QUICK_START_JP.md | クイックスタートガイド (日本語, 5分) |
| REFERENCE_GUIDE_JP.md | 完全リファレンスガイド (日本語) |
| GUIDE.md | Complete user guide (Chinese) |
| GUIDE_EN.md | |
| GUIDE_JP.md | |
| INSTALL.md | Installation guide (Unix + Windows) |
| EXAMPLES.md | Real-world usage examples |
| SKILL.md | Skill manual (EN/CN/JP) |
| CLAUDE.md | Claude Code project instructions |
| CHANGELOG.md | Version history |
| README-CN.md | 中文说明 |
| README-JP.md | 日本語説明 |
New to DevSquad? Start here:
# 1. Run the interactive demo (3 scenarios, < 15 seconds)
python examples/quick_demo.py
# 2. Read the quick start guide
# English: docs/i18n/QUICK_START_EN.md
# Japanese: docs/i18n/QUICK_START_JP.md
# 3. Your first dispatch
python3 scripts/cli.py dispatch -t "Design user authentication system"# Deploy with Helm
helm install devsquad ./helm/devsquad
# Port forward
kubectl port-forward svc/devsquad-api 8000:8000See helm/devsquad/README.md for full documentation.
DevSquad is not TRAE-exclusive. It supports 6 integration methods:
| Platform | Integration | Setup Difficulty | Key Features Available |
|---|---|---|---|
| TRAE IDE | Native Skill (skill-manifest.yaml) |
Zero config | Full: Dispatcher + Dashboard + CLI |
| Claude Code | MCP Server / Python import | Low | 6 MCP tools or direct API |
| Cursor | MCP Server (stdio mode) |
Low | Same as Claude Code |
| OpenClaw / WorkBuddy Claw | WorkBuddyClawSource bridge |
Auto | Read-only memory bridge |
| Any MCP Client | stdio / SSE dual mode | Low | 6 tools, configurable port |
| Pure Python | pip install -e . |
Low | CLI + API + Skills + REST |
| Docker | docker build & run |
Low | Isolated container with all features |
# === TRAE IDE ===
# Just use it — zero configuration
# === Claude Code / Cursor (MCP) ===
# Add to .claude/mcp.json or .cursor/mcp.json:
# {"mcpServers": {"devsquad": {"command": "python", "args": ["/path/to/mcp_server.py"]}}}
# === Pure Python ===
pip install -e "/path/to/DevSquad[all]"
devsquad dispatch -t "task description"
# === REST API ===
uvicorn scripts.api_server:app --port 8000 # → http://localhost:8000/docs
# === Docker ===
docker build -t devsquad . && docker run -it devsquad dispatch -t "test"| Date | Version | Highlights |
|---|---|---|
| 2026-05-17 | V3.6.1 | Cybernetics Enhancement — 5 new modules (FeedbackControlLoop/ExecutionGuard/PerformanceFingerprint/SimilarTaskRecommender/AdaptiveRoleSelector) with feedback loops, execution guards, TF-IDF similarity search, and adaptive role selection. Inspired by upstream TraeMultiAgentSkill v2.5's cybernetics architecture. |
| 2026-05-16 | V3.6.0 | Layered Sub-Skill Architecture + Core Modules — 6 atomic sub-skills (dispatch/intent/review/security/test/retrospective) with lazy-loading registry via importlib, each ~50 lines wrapping existing core modules. Plus: AnchorChecker (milestone anchor verification + drift detection), RetrospectiveEngine (independent retrospective + pattern extraction), StructuredGoal (structured goal decomposition + progress tracking), FallbackBackend (automatic LLM failover + health monitoring), FeatureUsageTracker (feature usage tracking + reporting + auto-persistence), 7 module integrations (IntentWorkflowMapper/AISemanticMatcher/DualLayerContextManager/OperationClassifier/SkillRegistry/FiveAxisConsensusEngine/NullProviders), 1662+ tests, 48 core modules. Cross-platform compatibility: Claude Code/Cursor/OpenClaw/Pure Python/Docker/MCP. |
| 2026-05-05 | V3.5.0 | Enhancement Sprint — Code walkthrough enhancement, documentation consistency checks, Karpathy principles, project understanding (AgentBriefing), CLI lifecycle commands, structured output, 748+ tests |
| 2026-05-03 | V3.4.1 | Agent Skills Quality Framework (P0) — AntiRationalizationEngine + VerificationGate + IntentWorkflowMapper + CLI Lifecycle Commands (spec/plan/build/test/review/ship) + 167 new tests + Google Agent Skills integration + 49 core modules |
| 2026-05-02 | V3.4.0 | Foundation Release — Real LLM backend (OpenAI/Anthropic/Mock), ThreadPoolExecutor parallel execution, InputValidator + prompt injection protection, CheckpointManager, WorkflowEngine with 11-phase lifecycle templates (full/backend/frontend/internal_tool/minimal), TaskCompletionChecker, AISemanticMatcher, streaming output, Docker, GitHub Actions CI, config file, CodeMapGenerator, DualLayerContext, SkillRegistry, CarryMem integration, AgentBriefing, ConfidenceScore, EnhancedWorker with auto QA, Protocol interface system, 234+ unit tests, requirement change management with gate mechanism and gap reporting |
| 2026-04-17 | V3.2 | E2E Demo, MCE Adapter, Dispatcher UX |
| 2026-04-16 | V3.0 | Complete redesign — Coordinator/Worker/Scratchpad architecture |
MIT License — see LICENSE for details.
| Link | URL |
|---|---|
| GitHub (This Repo) | https://github.com/lulin70/DevSquad |
| Original / Upstream | https://github.com/weiransoft/TraeMultiAgentSkill |