Skip to content

lulin70/DevSquad

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

185 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

DevSquad — Multi-Role AI Task Orchestrator

📑 Table of Contents

One task → Multi-role AI collaboration → One conclusion
Production Ready | V3.6.1

Python License Tests Version CI Quality Security

Architecture API Dashboard Auth Alerts Database


🚀 V3.6.1: Cybernetics Enhancement Release

DevSquad V3.6.1 adds 5 new cybernetics modules: FeedbackControlLoop for closed-loop feedback control, ExecutionGuard for safe execution with rollback, PerformanceFingerprint for performance baseline tracking, SimilarTaskRecommender for TF-IDF-based task similarity search, and AdaptiveRoleSelector for intelligent role selection based on task characteristics — making multi-agent collaboration more adaptive, self-optimizing, and resilient.

🔄 V3.6.1 Cybernetics Module Details

1️⃣ FeedbackControlLoop (Feedback Controller)

Chinese Name: 反馈闭环控制器 (Feedback Closed-Loop Controller) Core Capabilities:

  • Closed-loop feedback control with automatic iteration until quality threshold met
  • Configurable quality gate (quality_gate) and maximum iterations
  • Lightweight quality assessment (no LLM calls), supports dry-run mode
from scripts.collaboration.feedback_control_loop import FeedbackControlLoop
from scripts.collaboration.dispatcher import MultiAgentDispatcher

dispatcher = MultiAgentDispatcher()
loop = FeedbackControlLoop(dispatcher, quality_gate=0.7, max_iterations=3)
result = loop.run("Design secure auth system", roles=["architect", "security"])
print(f"Iterations: {loop.iteration_count}")
print(f"Best quality: {loop.best_quality:.2f}")
# Automatically iterates until quality gate met or max iterations reached

2️⃣ ExecutionGuard (Execution Guardian)

Chinese Name: 执行守护者 (Execution Guardian) Core Capabilities:

  • Real-time execution monitoring with 4 abort conditions: timeout, output size, token count, critical keywords
  • Lightweight checks (<1ms), zero external dependencies
  • Dynamically configurable thresholds (max_duration_sec, max_output_tokens, etc.)
from scripts.collaboration.execution_guard import ExecutionGuard

guard = ExecutionGuard(max_duration_sec=300.0, max_output_tokens=8000)
should_abort, reason = guard.check_abort(
    worker_output="Generating code...",
    elapsed_time=120.5,
    token_count=5000
)
if should_abort:
    print(f"Aborting: {reason}")
    # Example: "Timeout exceeded: 120.5s > 300.0s"
# Also detect warning keywords (without triggering abort)
warnings = guard.check_warnings("WARNING: High memory usage")
print(f"Warnings: {warnings}")  # ['WARNING']

3️⃣ PerformanceFingerprint (Performance Fingerprint)

Chinese Name: 性能指纹系统 (Performance Fingerprint System) Core Capabilities:

  • Unified execution fingerprint recording (fuses 4 data sources: invocation counts, latency, state snapshots, retrospective deviations)
  • Pure Python TF-IDF implementation (no sklearn/numpy), supports English/Chinese mixed content
  • JSON persistence to .devsquad_data/fingerprints/, graceful cold-start degradation
from scripts.collaboration.performance_fingerprint import PerformanceFingerprint

fingerprint = PerformanceFingerprint()
fid = fingerprint.record_execution(
    task="Implement user authentication",
    result=dispatch_result,
    timing={"total": 12.5, "planning": 2.0, "coding": 8.0, "review": 2.5},
    roles_used=["architect", "coder", "tester"],
)
print(f"Fingerprint ID: {fid}")  # fp_20260518_143052_a1b2c3d4

# Find similar historical tasks using TF-IDF
similar = fingerprint.find_similar("Add login page", top_k=3)
for case in similar:
    print(f"Task: {case['task']}")
    print(f"Similarity: {case['similarity']:.2%}")
    print(f"Roles used: {case['roles_used']}")
    print(f"Success: {case['success']}")

# Get overall statistics
stats = fingerprint.get_stats()
print(f"Total executions: {stats['total']}")
print(f"Success rate: {stats['success_rate']:.1%}")

4️⃣ SimilarTaskRecommender (Similar Task Recommender)

Chinese Name: 相似任务推荐器 (Similar Task Recommender) Core Capabilities:

  • TF-IDF-based task similarity search with historical success configuration recommendations
  • Intelligent role combination recommendation, intent prediction, execution time estimation
  • Confidence scoring (high/medium/low), graceful cold-start degradation
from scripts.collaboration.similar_task_recommender import SimilarTaskRecommender

recommender = SimilarTaskRecommender()
result = recommender.recommend("Design user authentication system")
print(f"Recommended roles: {result['recommended_roles']}")
# Output: ['architect', 'coder', 'tester', 'security']
print(f"Confidence: {result['confidence']}")  # high/medium/low
print(f"Estimated duration: {result['estimated_duration_s']:.1f}s")

# View similar case details
for case in result['similar_cases']:
    print(f"Task: {case['task']}")
    print(f"Similarity: {case['similarity']:.2%}")
    print(f"Historical roles: {case['roles']}")
    print(f"Success: {case['success']}")

# Quick method: get role suggestions only
roles = recommender.get_role_suggestion("Implement payment API")
print(f"Suggested roles: {roles}")

5️⃣ AdaptiveRoleSelector (Adaptive Role Selector)

Chinese Name: 自适应角色选择器 (Adaptive Role Selector) Core Capabilities:

  • Three-tier selection strategy based on historical success rates (similar tasks → intent match → fallback to default)
  • Configurable minimum success rate and maximum role count
  • Supports manual statistics updates and comprehensive role effectiveness reporting
from scripts.collaboration.adaptive_role_selector import AdaptiveRoleSelector

selector = AdaptiveRoleSelector()
roles = selector.select_roles(
    task="Build high-concurrency microservices architecture",
    intent="feature_implementation",
    min_success_rate=0.5,
    max_roles=5,
)
print(f"Recommended roles: {roles}")
# Output: ['architect', 'devops', 'security', 'tester']
# Or: [] (returns empty when no historical data, caller falls back to default RoleMatcher)

# Manually update statistics (for external system integration)
selector.update_stats(["architect", "coder"], success=True, duration_s=12.5)

# Generate role effectiveness report
report = selector.get_role_report()
for role_name, metrics in report.items():
    print(f"{role_name}: success_rate={metrics['success_rate']:.1%}, "
          f"avg_duration={metrics['avg_duration']:.1f}s")

🔗 Integration Architecture

The 5 cybernetic modules are designed as non-invasive wrappers — they work independently or together without modifying existing core logic:

User Task
    ↓
[SimilarTaskRecommender] ← Optional: suggest roles from history
    ↓
[AdaptiveRoleSelector]   ← Optional: optimize role selection
    ↓
[MultiAgentDispatcher]
    ↓
[FeedbackControlLoop]     ← Wrap dispatcher for auto-iteration
    ↓ [each worker step]
[ExecutionGuard]          ← Guard each worker execution
    ↓
[PerformanceFingerprint]  ← Record after dispatch completes

Recommended usage (progressive adoption):

from scripts.collaboration import (
    MultiAgentDispatcher, FeedbackControlLoop,
    ExecutionGuard, PerformanceFingerprint
)

dispatcher = MultiAgentDispatcher()
guard = ExecutionGuard()
fingerprint = PerformanceFingerprint()

# Option 1: Full cybernetics stack
loop = FeedbackControlLoop(dispatcher, quality_gate=0.7)
result = loop.run("Your task here")

# Option 2: Guard only (minimal adoption)
result = dispatcher.dispatch("Your task")
for w in result.worker_results:
    abort, reason = guard.check_abort(w.output, w.duration)
    if abort:
        print(f"Aborted: {reason}")

# Option 3: Learning only
fingerprint.record_execution("task", result, result.timing, result.matched_roles)
similar = fingerprint.find_similar("new task", top_k=3)

All modules are optional switches — DevSquad works perfectly without them.

🎯 Quick Start (4 Ways to Use DevSquad)

0️⃣ First Time? Start Here!

# Interactive setup wizard (1-2 minutes)
python scripts/cli.py init

# Then start collaborating!
devsquad dispatch -t "your task description"

1️⃣ Interactive Web Dashboard (Recommended)

# Start Streamlit dashboard with authentication
streamlit run scripts/dashboard.py

# Open http://localhost:8501
# ⚠️ Security: Default credentials are for initial setup only.
#    Login with default account, then change password immediately.
#    Username: admin   Password: <your-secure-password>
#    Or set via environment variables: $DASHBOARD_USER / $DASHBOARD_PASS

2️⃣ REST API Server

# Install dependencies
pip install fastapi uvicorn

# Start API server
uvicorn scripts.api_server:app --host 0.0.0.0 --port 8000 --reload

# Access Swagger UI: http://localhost:8000/docs
# Access ReDoc:      http://localhost:8000/redoc

3️⃣ Command Line Interface

# Standard CLI usage
python scripts/cli.py lifecycle build

# Enhanced visual output
python scripts/cli.py lifecycle build --visual --verbose

🏗️ Architecture Overview

┌─────────────────────────────────────────────────────────────┐
│                    User Access Layer                         │
│  ┌──────────────┐ ┌──────────────┐ ┌──────────────┐        │
│  │ Streamlit    │ │ FastAPI REST │ │ CLI/Notebook │        │
│  │ Dashboard    │ │ API Server   │ │ (Existing)   │        │
│  │ (Auth+HTTPS) │ │ (Swagger)    │ │              │        │
│  └──────┬───────┘ └──────┬───────┘ └──────────────┘        │
└─────────┼───────────────┼───────────────────────────────────┘
          │               │
          ▼               ▼
┌─────────────────────────────────────────────────────────────┐
│                   Business Logic Layer                      │
│  ┌─────────────┐ ┌─────────────┐ ┌─────────────┐           │
│  │AuthManager  │ │AlertManager │ │HistoryMgr   │           │
│  │(RBAC Auth)  │ │(Multi-Chnl) │ │(SQLite TSDB)│           │
│  └─────────────┘ └─────────────┘ └─────────────┘           │
│  ┌─────────────────────────────────────────────┐            │
│  │     LifecycleProtocol (11-Phase Engine)       │            │
│  │     UnifiedGateEngine + CheckpointManager     │            │
│  └─────────────────────────────────────────────┘            │
└─────────────────────────┬───────────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────────────┐
│                    Data Persistence Layer                    │
│  ┌────────────┐ ┌────────────┐ ┌────────────────────────┐  │
│  │ SQLite DB  │ │ YAML Config│ │ Checkpoint Files       │  │
│  │ (History)  │ │ (Deploy)   │ │ (Lifecycle State)      │  │
│  └────────────┘ └────────────┘ └────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘

✨ Key Features (V3.6.1)

⚓ AnchorChecker (NEW)

Milestone anchor verification that ensures critical checkpoints are properly validated before proceeding:

  • Anchor Point Definition — Define mandatory validation anchors at key lifecycle milestones
  • Cross-Phase Verification — Verify consistency between phase outputs and anchor criteria
  • Drift Detection — Detect when project execution drifts from defined anchor points
  • Auto-Recovery — Suggest corrective actions when anchor checks fail

🔄 RetrospectiveEngine (NEW)

Independent retrospective mechanism for continuous improvement after each dispatch cycle:

  • Post-Dispatch Review — Automatically analyze what went well and what could improve
  • Pattern Extraction — Extract reusable patterns from successful collaborations
  • Anti-Pattern Detection — Identify recurring issues and suggest process improvements
  • Metric Trend Analysis — Track quality metrics across dispatches to spot degradation

📊 FeatureUsageTracker (NEW)

Thread-safe feature invocation counter for data-driven feature optimization:

  • Invocation Tracking — Count every feature call (dispatch, anchor_check, retrospective, consensus, etc.)
  • Usage Reports — Top features, unused features, low-usage features with markdown export
  • Auto-Persist — Periodic JSON persistence every 100 ticks
  • 30 Known Features — Pre-registered feature set covering all DevSquad capabilities

🎯 StructuredGoal (NEW)

Structured goal management that decomposes high-level objectives into trackable, verifiable sub-goals:

  • Goal Decomposition — Break complex objectives into hierarchical sub-goals with clear criteria
  • Progress Tracking — Real-time progress measurement against defined goal structure
  • Dependency Mapping — Visualize and manage dependencies between sub-goals
  • Completion Verification — Automated verification that goals meet their success criteria

🔀 FallbackBackend (NEW)

Automatic backend failover that ensures LLM availability even when primary backends are down:

  • Health Monitoring — Continuous health checks for all configured LLM backends
  • Automatic Failover — Seamlessly switch to backup backend when primary fails
  • Priority-Based Routing — Configure backend priority order (e.g., OpenAI → Anthropic → Mock)
  • Recovery Detection — Automatically restore primary backend when it recovers

🔍 VerificationGate — Evidence-Based Quality

  • Prove-It Pattern: Every completion claim must include verifiable evidence (test output, diff, benchmark)
  • 7 Red Flags: no_test | tests_pass_first_run | no_regression_test | no_security_scan | no_perf_baseline | vague_description | evidence_missing
  • Auto-active: Integrated into TaskCompletionChecker — zero config required

🔐 Authentication & Authorization

  • Multi-user support with role-based access control (RBAC)
  • Three roles: Admin (full access), Operator (execute), Viewer (read-only)
  • Secure password hashing with SHA-256
  • Session management for Streamlit dashboard
  • OAuth2 support (optional, for enterprise deployments)

🌐 REST API (FastAPI)

  • 10+ endpoints for complete lifecycle management
  • Automatic OpenAPI/Swagger documentation at /docs
  • CORS middleware for cross-origin requests
  • Request timing and comprehensive logging
  • Standardized error responses

Key Endpoints:

Lifecycle:
  GET    /api/v1/lifecycle/phases          → List all 11 phases
  POST   /api/v1/lifecycle/actions         → Execute phase actions
  GET    /api/v1/lifecycle/status          → Current status

Metrics:
  GET    /api/v1/metrics/current          → Real-time metrics
  GET    /api/v1/metrics/history          → Historical data

Gates:
  GET    /api/v1/gates/status             → All gate statuses
  POST   /api/v1/gates/check              → Check specific gate

System:
  GET    /api/v1/health                   → Health check

🔔 Alert Notification System

  • 4 severity levels: INFO, WARNING, ERROR, CRITICAL
  • Multiple channels: Console, Slack, Email, Webhook
  • Rate limiting to prevent alert spam (configurable)
  • Deduplication within time window
  • Alert history tracking and statistics

📊 Historical Data Storage (SQLite)

  • Metrics snapshots with time-range queries
  • Alert history with acknowledgment tracking
  • API request logs with performance metrics
  • Lifecycle events audit trail
  • Automatic cleanup with configurable retention

📈 Visualization & Monitoring

  • Streamlit Dashboard: Real-time monitoring with authentication
  • CLI Visual Module: Rich terminal output with colors and icons
  • Jupyter Notebook: Interactive 10-section tutorial
  • Benchmark Reports: HTML/JSON performance reports

🧩 Layered Sub-Skill Architecture (V3.6.1)

DevSquad provides 6 atomic sub-skills that can be used independently or together. Each sub-skill is a thin wrapper (~50 lines) importing existing core modules — no duplicated logic.

skills/
├── dispatch/       → DispatchSkill — MultiAgentDispatcher (7-role orchestration)
├── intent/         → IntentSkill   — IntentWorkflowMapper (6 intents × 3 languages)
├── review/         → ReviewSkill   — FiveAxisConsensusEngine (5-axis code review)
├── security/       → SecuritySkill — InputValidator + OperationClassifier + PermissionGuard
├── test/           → TestSkill     — TestQualityGuard + test strategy generation
└── retrospective/  → RetroSkill    — RetrospectiveEngine + pattern extraction

Sub-Skill Quick Reference

Skill Core Method Wraps Mock Mode
dispatch run(task, roles, mode) MultiAgentDispatcher
intent detect(text, lang) IntentWorkflowMapper
review review(code) FiveAxisConsensusEngine
security scan_input(text) InputValidator + OpClassifier
test generate_strategy(module) TestQualityGuard
retrospective run_retrospective(results) RetrospectiveEngine

Usage Examples

# Direct import (recommended for single skill)
from skills.dispatch.handler import DispatchSkill
result = DispatchSkill().run("Fix login bug", roles=["coder", "tester"])

# Via registry (dynamic discovery)
from skills import get_skill, list_skills
print(list_skills())  # ['dispatch', 'intent', 'review', 'security', 'test', 'retrospective']
skill = get_skill("security")
result = skill.scan_input("DROP TABLE users; --")

All sub-skills work without any API key in Mock mode.


📋 Plan C Architecture (Core Engine)

Unified Lifecycle Architecture - Resolves CLI 6 commands vs 11-phase lifecycle:

CLI View Layer (6 commands)          Core Engine (11 phases)
┌─────────────────────┐            ┌──────────────────────────┐
│ spec → P1, P2       │───View ──→│ P1: Requirements         │
│ plan → P7           │   Mapping │ P2: Architecture         │
│ build → P8          │            │ P3: Technical Design     │
│ test → P9           │            │ ...                      │
│ review → P8,P6      │            │ P10: Deployment          │
│ ship → P10          │            │ P11: Operations          │
└─────────────────────┘            └──────────────────────────┘
        ↓                                    ↓
  UnifiedGateEngine                   CheckpointManager
  (Phase + Worker gates)              (Lifecycle state persistence)

Core Components:

  • LifecycleProtocol - Abstract interface for unified lifecycle management
  • UnifiedGateEngine - Integrates VerificationGate + Phase transition gates
  • FullLifecycleAdapter - Complete 11-phase lifecycle with dependency resolution
  • Enhanced CheckpointManager - Auto save/restore lifecycle state across sessions

What is DevSquad?

DevSquad transforms a single AI task into a multi-role AI collaboration. It automatically dispatches your task to the right combination of expert roles — architect, product manager, coder, tester, security reviewer, DevOps — orchestrates their parallel collaboration through a shared workspace, resolves conflicts via weighted consensus voting, and delivers a unified structured report.

You: "Design a microservices e-commerce backend"
         │
         ▼
┌─────────────────┐
│  InputValidator   ──→ Security check (XSS, SQL injection, prompt injection)
└────────┬────────┘
         ▼
┌─────────────────┐
│  RoleMatcher     ──→ Auto-match: architect + devops + security
└────────┬────────┘
         ▼
┌──────────┬──────────┬──────────┐
│ Architect │  DevOps   │ Security │   ← ThreadPoolExecutor parallel execution
│(Design)   │(Infra)   │(Threat)  │
└────┬──────┴────┬─────┴────┬────┘
     └────────────┼───────────┘
                  ▼
      ┌──────────────────┐
      │    Scratchpad     │ ← Shared blackboard (real-time sync)
      └────────┬─────────┘
               ▼
      ┌──────────────────┐
      │ Consensus Engine  │ ← Weighted vote + veto + escalation
      └────────┬─────────┘
               ▼
      ┌──────────────────┐
      │ Structured Report │ ← Findings + Action Items (H/M/L)
      └──────────────────┘

📦 Installation

Prerequisites

  • Python 3.9+ (3.9, 3.10, 3.11, 3.12 supported)
  • pip or pipenv for package management

Option A: PyPI Install (Recommended)

# Install from PyPI — zero setup, ready to use
pip install devsquad

# With optional dependencies
pip install "devsquad[api]"    # FastAPI + Streamlit dashboard
pip install "devsquad[all]"    # All optional features

Option B: Core Installation (CLI + Dashboard)

git clone https://github.com/your-org/DevSquad.git
cd DevSquad

# Install core package (minimal dependencies)
pip install -e .

# Ready to use!
devsquad dispatch -t "Design user authentication system"

Option C: Full Production Stack (All Features)

# Clone and install with all production features
git clone https://github.com/your-org/DevSquad.git
cd DevSquad

# Install with API server dependencies
pip install -e ".[api]"

# Or install all optional features
pip install -e ".[all]"

Optional Feature Groups:

# API Server (FastAPI + Uvicorn)
pip install -e ".[api]"

# Visualization (Streamlit + Jupyter)
pip install -e ".[visualization]"

# Alerting (Slack SDK)
pip install -e ".[alerts]"

# Development & Testing
pip install -e ".[dev]"

# Everything combined
pip install -e ".[all]"

Verify Installation

# Check version
devsquad --version
# Expected: devsquad 3.6.1

# Run tests
pytest tests/ -v --tb=short
# Expected: 1500+ passed

3 Ways to Use

1. CLI (Recommended)

# Mock mode (default) — no API key needed
python3 scripts/cli.py dispatch -t "Design user authentication system"

# Real AI output — set environment variables first
export OPENAI_API_KEY="sk-..."
export OPENAI_BASE_URL="https://api.openai.com/v1"   # optional
export OPENAI_MODEL="gpt-4"                            # optional
python3 scripts/cli.py dispatch -t "Design auth system" --backend openai

# Specify roles (short IDs: arch/pm/test/coder/ui/infra/sec)
python3 scripts/cli.py dispatch -t "Design auth system" -r arch sec --backend openai

# Stream output in real-time
python3 scripts/cli.py dispatch -t "Design auth system" -r arch --backend openai --stream

# Other commands
python3 scripts/cli.py status          # System status
python3 scripts/cli.py roles           # List available roles
python3 scripts/cli.py --version       # Show version (3.6.1)

2. Python API

from scripts.collaboration.dispatcher import MultiAgentDispatcher

# Mock mode (default)
disp = MultiAgentDispatcher()
result = disp.dispatch("Design REST API for user management")
print(result.to_markdown())
disp.shutdown()

# With LLM backend
from scripts.collaboration.llm_backend import create_backend
backend = create_backend("openai", api_key="sk-...", base_url="https://api.openai.com/v1")
disp = MultiAgentDispatcher(llm_backend=backend)
result = disp.dispatch("Design auth system", roles=["architect", "security"])
print(result.summary)
disp.shutdown()

4. Sub-Skills (Lightweight Independent)

# Each sub-skill works independently — no Dispatcher needed
from skills.security.handler import SecuritySkill
risk = SecuritySkill().scan_input("malicious input")

from skills.review.handler import ReviewScore
verdict = ReviewSkill().review(code_snippet)

from skills.intent.handler import IntentSkill
intent = IntentSkill().detect("修复登录漏洞", lang="zh")

3. MCP Server (for Cursor / any MCP client)

pip install mcp
python3 scripts/mcp_server.py              # stdio mode
python3 scripts/mcp_server.py --port 8080  # SSE mode

Exposes 6 tools: multiagent_dispatch, multiagent_quick, multiagent_roles, multiagent_status, multiagent_analyze, multiagent_shutdown.

7 Core Roles

Role CLI ID Aliases Weight Best For
Architect arch architect 1.5 System design, tech stack, performance/security architecture
Product Manager pm product-manager 1.2 Requirements, user stories, acceptance criteria
Security Expert sec security 1.1 Threat modeling, vulnerability audit, compliance
Tester test tester, qa 1.0 Test strategy, quality assurance, edge cases
Coder coder solo-coder, dev 1.0 Implementation, code review, performance optimization
DevOps infra devops 1.0 CI/CD, containerization, monitoring, infrastructure
UI Designer ui ui-designer 0.9 UX flow, interaction design, accessibility

Auto-match: If no roles specified, the dispatcher automatically matches based on task keywords.

Architecture Overview (60+ Core Modules)

DevSquad is built on a layered architecture with clear separation of concerns:

┌─────────────────────────────────────────────────┐
│                    CLI / MCP / API               │  Entry Points
├─────────────────────────────────────────────────┤
│              MultiAgentDispatcher                │  Orchestration
│  ┌────────────┬──────────────┬────────────────┐ │
│  │RoleMatcher │ReportFormatter│InputValidator  │ │  Extracted Components
│  └────────────┴──────────────┴────────────────┘ │
│  ┌────────────────────────────────────────────┐ │
│  │ RuleCollector (NL Rule Intercept)          │ │  Rule Collection
│  └────────────────────────────────────────────┘ │
├─────────────────────────────────────────────────┤
│                 Coordinator                      │  Task Planning
│  ┌──────────┬───────────┬────────────────────┐  │
│  │ Scratchpad│ Consensus │  BatchScheduler    │  │  Collaboration
│  └──────────┴───────────┴────────────────────┘  │
├─────────────────────────────────────────────────┤
│              Worker (per role)                   │  Execution
│  ┌────────────────────────────────────────────┐ │
│  │ PromptAssembler → LLMBackend → Output      │ │
│  └────────────────────────────────────────────┘ │
├─────────────────────────────────────────────────┤
│  LLMBackend: Mock | OpenAI | Anthropic          │  LLM Layer
├─────────────────────────────────────────────────┤
│  CheckpointManager | WorkflowEngine | ...       │  Infrastructure
└─────────────────────────────────────────────────┘

What's New in V3.6.1 🆕

AnchorChecker System

Milestone anchor verification that ensures critical checkpoints are validated before proceeding:

from scripts.collaboration.anchor_checker import AnchorChecker

checker = AnchorChecker()
checker.define_anchor("architecture_complete", criteria=["API spec defined", "tech stack selected"])
result = checker.check_anchor("architecture_complete", phase_output)
print(f"Anchor passed: {result.passed}")
print(f"Drift detected: {result.drift_score}")

Features:

  • Cross-phase consistency verification
  • Drift detection with severity scoring
  • Auto-recovery suggestions
  • Anchor point persistence

RetrospectiveEngine

Independent retrospective mechanism for continuous improvement:

from scripts.collaboration.retrospective_engine import RetrospectiveEngine

engine = RetrospectiveEngine()
report = engine.run_retrospective(dispatch_result)
print(f"Patterns found: {len(report.patterns)}")
print(f"Anti-patterns: {len(report.anti_patterns)}")
print(f"Improvement suggestions: {report.suggestions}")

Features:

  • Post-dispatch quality analysis
  • Pattern and anti-pattern extraction
  • Metric trend tracking
  • Actionable improvement suggestions

StructuredGoal

Structured goal management with hierarchical decomposition:

from scripts.collaboration.structured_goal import StructuredGoal

goal = StructuredGoal("Build e-commerce platform")
goal.add_sub_goal("User auth", criteria=["OAuth2 support", "2FA ready"])
goal.add_sub_goal("Product catalog", criteria=["Search", "Filter", "Pagination"])
progress = goal.get_progress()
print(f"Overall: {progress.completion_pct}%")

Features:

  • Hierarchical goal decomposition
  • Dependency mapping between sub-goals
  • Real-time progress tracking
  • Automated completion verification

FallbackBackend

Automatic LLM backend failover for high availability:

from scripts.collaboration.llm_backend import FallbackBackend

backend = FallbackBackend(
    primary="openai",
    fallbacks=["anthropic", "mock"],
    health_check_interval=30,
)
result = backend.generate("Design auth system")
# Automatically fails over if primary is down

Features:

  • Continuous backend health monitoring
  • Seamless automatic failover
  • Priority-based routing configuration
  • Automatic primary recovery detection

Natural Language Rule Collection

Automatically detect and store user rules from natural language input:

# User says: "记住规则:写代码时必须加注释"
# DevSquad automatically:
# 1. Detects rule-storing intent
# 2. Extracts: trigger="写代码时", action="必须加注释", type="always"
# 3. Sanitizes content (removes dangerous patterns)
# 4. Stores via CarryMem or local JSON fallback

# List stored rules
# User says: "列出规则" → Returns all stored rules

# Delete a rule
# User says: "删除规则 RULE-LOCAL-abc123"

Pipeline: User Input → IntentDetector → RuleExtractor → RuleSanitizer → RuleStorage (CarryMem + local JSON)

Features:

  • 11 intent patterns (Chinese + English)
  • 4 rule types: always / avoid / prefer / forbid
  • Prompt injection protection in rule content
  • CarryMem primary + local JSON fallback storage
  • Automatic rule injection into Worker prompts

See Integration Guide for detailed usage.


Key Features

Security

  • InputValidator: XSS, SQL injection, command injection, HTML injection detection
  • Prompt Injection Protection: 21+ patterns (ignore previous instructions, jailbreak, DAN mode, system prompt extraction, etc.)
  • API Key Safety: Environment variables only, never CLI arguments or logs
  • PermissionGuard: 4-level safety gate (PLAN → DEFAULT → AUTO → BYPASS)

Performance

  • ThreadPoolExecutor: Real parallel execution for multi-role dispatch
  • LLM Cache: TTL-based LRU cache with disk persistence (60-80% cost reduction)
  • LLM Retry: Exponential backoff + circuit breaker + multi-backend fallback
  • Streaming Output: Real-time chunk-by-chunk LLM output via --stream

Reliability

  • CheckpointManager: SHA256 integrity, handoff documents, auto-cleanup
  • WorkflowEngine: Task-to-workflow auto-split, step execution, resume from checkpoint, 11-phase lifecycle templates (full/backend/frontend/internal_tool/minimal), requirement change management
  • TaskCompletionChecker: DispatchResult/ScheduleResult completion tracking
  • ConsensusEngine: Weighted voting with veto power and human escalation

Project Lifecycle (11-Phase Model)

DevSquad V3.6.1 defines an 11-phase (4 optional) project lifecycle with clear roles, dependencies, and gate conditions:

P1 → P2 ──┬──→ P3 ──→ P6 ──→ P7 ──→ P8 ──→ P9 ──→ P10 ──→ P11
           ├──→ P4(∥P3) ──↗
           └──→ P5(dep P1+P3) ──↗
Template Phases Use Case
full P1-P11 Complete project
backend No P5 Backend services
frontend No P4,P6 Frontend applications
internal_tool No P4,P5,P6,P11 Internal tools
minimal P1,P3,P7,P8,P9 Minimum set

See GUIDE.md §4 for full lifecycle details with gate conditions and requirement change process.

Developer Experience

  • Configuration File: .devsquad.yaml in project root with env var overrides
  • Quality Control Injection: Auto-inject QC rules (hallucination prevention, overconfidence check, security guard, RACI protocol) into Worker prompts based on .devsquad.yaml config
  • Docker Support: docker build -t devsquad . && docker run devsquad dispatch -t "task"
  • GitHub Actions CI: Python 3.9-3.12 matrix testing
  • pip installable: pip install -e . with optional dependencies

Module Reference (60+ Modules)

💡 Table too wide? View modules online or use devsquad --help modules for a compact list.

Module File Purpose
MultiAgentDispatcher dispatcher.py Unified entry point
Coordinator coordinator.py Global orchestration: plan → assign → execute → collect
Worker worker.py Role executor with LLM backend integration
EnhancedWorker enhanced_worker.py Worker with auto QA (briefing + confidence + retry + memory rules)
Scratchpad scratchpad.py Shared blackboard for inter-worker communication
ConsensusEngine consensus.py Weighted voting + veto + escalation
RoleMatcher role_matcher.py Keyword-based role matching with alias resolution
ReportFormatter report_formatter.py Structured/compact/detailed report generation
InputValidator input_validator.py Security validation + prompt injection detection
AISemanticMatcher ai_semantic_matcher.py LLM-powered semantic role matching
CheckpointManager checkpoint_manager.py State persistence + handoff documents
WorkflowEngine workflow_engine.py Task-to-workflow auto-split + 11-phase lifecycle templates + requirement change
TaskCompletionChecker task_completion_checker.py Completion tracking + progress reporting
CodeMapGenerator code_map_generator.py Python AST-based code structure analysis
DualLayerContextManager dual_layer_context.py Project-level + task-level context management
SkillRegistry skill_registry.py Reusable skill registration + discovery
IntentWorkflowMapper intent_workflow_mapper.py User intent → workflow chain mapping (6 intents × 3 languages)
OperationClassifier operation_classifier.py Three-tier operation classification (ALWAYS_SAFE/NEEDS_REVIEW/FORBIDDEN)
FiveAxisConsensusEngine five_axis_consensus.py Five-axis review consensus with weighted voting
FeatureUsageTracker feature_usage_tracker.py Feature usage tracking + reporting + auto-persistence
LLMBackend llm_backend.py Mock/OpenAI/Anthropic with streaming support
LLMCache llm_cache.py TTL-based LRU cache with disk persistence
LLMRetry llm_retry.py Exponential backoff + circuit breaker
ConfigManager config_loader.py YAML config + env var overrides
PromptAssembler prompt_assembler.py Dynamic prompt assembly + QC rule injection
AgentBriefing agent_briefing.py Context-aware task briefing with priority filtering
ConfidenceScorer confidence_score.py 5-factor response quality assessment
PerformanceMonitor performance_monitor.py P95/P99 tracking + CPU/memory monitoring
MCEAdapter mce_adapter.py CarryMem integration adapter (optional dependency, supports match_rules + format_rules_as_prompt + add_rule)
Protocols protocols.py Interface definitions (CacheProvider, MemoryProvider, etc.)
NullProviders null_providers.py Graceful degradation providers
PermissionGuard permission_guard.py 4-level safety gate
MemoryBridge memory_bridge.py Cross-session memory
BatchScheduler batch_scheduler.py Batch task scheduling
ContextCompressor context_compressor.py Context compression for long tasks
RoleTemplateMarket role_template_market.py Role template sharing marketplace
Skillifier skillifier.py Auto skill learning from tasks
UsageTracker usage_tracker.py Token/cost tracking
WarmupManager warmup_manager.py Startup warmup optimization
TestQualityGuard test_quality_guard.py Test quality enforcement
PromptVariantGenerator prompt_variant_generator.py A/B prompt testing
ConfigManager (YAML) config_manager.py Project-level YAML config
WorkBuddyClawSource memory_bridge.py WorkBuddy read-only bridge
Models models.py Shared data models and type definitions
LLMCacheAsync llm_cache_async.py Async LLM cache for concurrent workloads
LLMRetryAsync llm_retry_async.py Async LLM retry with backoff
IntegrationExample integration_example.py DevSquad integration example code
AsyncIntegrationExample async_integration_example.py Async DevSquad integration example
AnchorChecker anchor_checker.py Milestone anchor verification + drift detection + auto-recovery
RetrospectiveEngine retrospective.py Independent post-dispatch retrospective + pattern extraction + anti-pattern detection
FeatureUsageTracker feature_usage_tracker.py Feature invocation counter + usage reports + auto-persist
FallbackBackend llm_backend.py Automatic LLM backend failover with health monitoring

Configuration

Create .devsquad.yaml in your project root:

quality_control:
  enabled: true
  strict_mode: true
  min_quality_score: 85
  ai_quality_control:
    enabled: true
    hallucination_check:
      enabled: true
      require_traceable_references: true
    overconfidence_check:
      enabled: true
      require_alternatives_min: 2
  ai_security_guard:
    enabled: true
    permission_level: "DEFAULT"
  ai_team_collaboration:
    enabled: true
    raci:
      mode: "strict"

llm:
  backend: openai
  base_url: ""  # Set via LLM_BASE_URL env var
  model: ""     # Set via LLM_MODEL env var
  timeout: 120
  log_level: WARNING

Or use environment variables (higher priority):

export DEVSQUAD_LLM_BACKEND=openai
export DEVSQUAD_BASE_URL=https://api.openai.com/v1
export DEVSQUAD_MODEL=gpt-4
export OPENAI_API_KEY=sk-...

Environment Variables

Variable Purpose Default
OPENAI_API_KEY OpenAI API key None (required for OpenAI backend)
OPENAI_BASE_URL OpenAI-compatible base URL None
OPENAI_MODEL Model name gpt-4
ANTHROPIC_API_KEY Anthropic API key None (required for Anthropic backend)
ANTHROPIC_MODEL Model name claude-sonnet-4-20250514
DEVSQUAD_LLM_BACKEND Default backend type mock
DEVSQUAD_LOG_LEVEL Logging level WARNING

Running Tests

# Core tests (748+ tests all passing)
python3 -m pytest scripts/collaboration/core_test.py \
  scripts/collaboration/role_mapping_test.py \
  scripts/collaboration/upstream_test.py \
  scripts/collaboration/mce_adapter_test.py \
  tests/ test_v35_integration.py \
  tests/test_anti_rationalization.py \
  tests/test_verification_gate.py \
  tests/test_intent_workflow_mapper.py \
  tests/test_cli_lifecycle.py -v

# Quick smoke test
python3 scripts/cli.py --version    # 3.6.1
python3 scripts/cli.py status       # System ready
python3 scripts/cli.py roles        # List 7 roles

# Lifecycle commands (NEW in v3.4.1)
python3 scripts/cli.py spec -t "User authentication system"
python3 scripts/cli.py build -t "Implement login API"
python3 scripts/cli.py test -t "Run all unit tests"
python3 scripts/cli.py review -t "Check PR #123"
python3 scripts/cli.py ship -t "Deploy to production"

🔄 Upgrade Smoke Test

After upgrading DevSquad, run these commands to verify your environment:

# Quick health check (should complete in < 30s)
python3 scripts/cli.py --version       # Expected: DevSquad 3.6.1
python3 scripts/cli.py status          # Expected: System ready
python3 scripts/cli.py roles           # Expected: 7 core roles listed

# Full test suite
python3 -m pytest tests/ -q --tb=line # Expected: 1662 passed

With Coverage Report

# Install coverage tool first: pip install pytest-cov
python3 -m pytest tests/ --cov=scripts --cov-report=term-missing --cov-fail-under=80
# Expected: coverage ≥ 80%, detailed missing-line report

Test Layering Strategy

DevSquad uses a priority-based test layering strategy:

Priority Scope Examples Count
P0 Quality Framework Core AntiRationalization (39), VerificationGate (42), IntentWorkflowMapper (58), AuthManager (35) ~200
P1 Enhancement Modules FiveAxisConsensus (29), OperationClassifier (27), OutputSlicer (26), CIFeedbackAdapter (22) ~150
P1+ Cybernetics (V3.6.1) FeedbackControlLoop (19), ExecutionGuard (40), PerformanceFingerprint (13), SimilarTaskRecommender (17), AdaptiveRoleSelector (21) 110
P2 Integration & E2E Full lifecycle dispatch, cross-module integration ~200
P3 Unit per Module Core dispatcher, RoleMapping, MCEAdapter, LLM backends ~400+

Total: 1662 tests

Run by priority:

# P0 only (critical path, < 10s)
python3 -m pytest tests/ -k "anti_ratif or verification or intent_workflow or auth" -q

# P0 + P1 (quality + enhancement, < 30s)
python3 -m pytest tests/ -k "anti_ratif or verification or intent or auth or five_axis or operation" -q

# Full suite
python3 -m pytest tests/ -q --tb=line

Documentation

Document Description
QUICK_START_EN.md Quick start guide (English, 5 minutes)
REFERENCE_GUIDE_EN.md Complete reference guide (English)
QUICK_START_JP.md クイックスタートガイド (日本語, 5分)
REFERENCE_GUIDE_JP.md 完全リファレンスガイド (日本語)
GUIDE.md Complete user guide (Chinese)
GUIDE_EN.md Complete user guide (English) → See QUICK_START + REFERENCE_GUIDE
GUIDE_JP.md 完全なユーザーガイド (日本語) → クイックスタート+リファレンスを参照
INSTALL.md Installation guide (Unix + Windows)
EXAMPLES.md Real-world usage examples
SKILL.md Skill manual (EN/CN/JP)
CLAUDE.md Claude Code project instructions
CHANGELOG.md Version history
README-CN.md 中文说明
README-JP.md 日本語説明

🆕 Quick Start (Recommended for New Users)

New to DevSquad? Start here:

# 1. Run the interactive demo (3 scenarios, < 15 seconds)
python examples/quick_demo.py

# 2. Read the quick start guide
# English: docs/i18n/QUICK_START_EN.md
# Japanese: docs/i18n/QUICK_START_JP.md

# 3. Your first dispatch
python3 scripts/cli.py dispatch -t "Design user authentication system"

☸️ Kubernetes Deployment

# Deploy with Helm
helm install devsquad ./helm/devsquad

# Port forward
kubectl port-forward svc/devsquad-api 8000:8000

See helm/devsquad/README.md for full documentation.

Cross-Platform Compatibility

DevSquad is not TRAE-exclusive. It supports 6 integration methods:

Platform Integration Setup Difficulty Key Features Available
TRAE IDE Native Skill (skill-manifest.yaml) Zero config Full: Dispatcher + Dashboard + CLI
Claude Code MCP Server / Python import Low 6 MCP tools or direct API
Cursor MCP Server (stdio mode) Low Same as Claude Code
OpenClaw / WorkBuddy Claw WorkBuddyClawSource bridge Auto Read-only memory bridge
Any MCP Client stdio / SSE dual mode Low 6 tools, configurable port
Pure Python pip install -e . Low CLI + API + Skills + REST
Docker docker build & run Low Isolated container with all features

Quick Start per Platform

# === TRAE IDE ===
# Just use it — zero configuration

# === Claude Code / Cursor (MCP) ===
# Add to .claude/mcp.json or .cursor/mcp.json:
# {"mcpServers": {"devsquad": {"command": "python", "args": ["/path/to/mcp_server.py"]}}}

# === Pure Python ===
pip install -e "/path/to/DevSquad[all]"
devsquad dispatch -t "task description"

# === REST API ===
uvicorn scripts.api_server:app --port 8000   # → http://localhost:8000/docs

# === Docker ===
docker build -t devsquad . && docker run -it devsquad dispatch -t "test"

Version History

Date Version Highlights
2026-05-17 V3.6.1 Cybernetics Enhancement — 5 new modules (FeedbackControlLoop/ExecutionGuard/PerformanceFingerprint/SimilarTaskRecommender/AdaptiveRoleSelector) with feedback loops, execution guards, TF-IDF similarity search, and adaptive role selection. Inspired by upstream TraeMultiAgentSkill v2.5's cybernetics architecture.
2026-05-16 V3.6.0 Layered Sub-Skill Architecture + Core Modules — 6 atomic sub-skills (dispatch/intent/review/security/test/retrospective) with lazy-loading registry via importlib, each ~50 lines wrapping existing core modules. Plus: AnchorChecker (milestone anchor verification + drift detection), RetrospectiveEngine (independent retrospective + pattern extraction), StructuredGoal (structured goal decomposition + progress tracking), FallbackBackend (automatic LLM failover + health monitoring), FeatureUsageTracker (feature usage tracking + reporting + auto-persistence), 7 module integrations (IntentWorkflowMapper/AISemanticMatcher/DualLayerContextManager/OperationClassifier/SkillRegistry/FiveAxisConsensusEngine/NullProviders), 1662+ tests, 48 core modules. Cross-platform compatibility: Claude Code/Cursor/OpenClaw/Pure Python/Docker/MCP.
2026-05-05 V3.5.0 Enhancement Sprint — Code walkthrough enhancement, documentation consistency checks, Karpathy principles, project understanding (AgentBriefing), CLI lifecycle commands, structured output, 748+ tests
2026-05-03 V3.4.1 Agent Skills Quality Framework (P0) — AntiRationalizationEngine + VerificationGate + IntentWorkflowMapper + CLI Lifecycle Commands (spec/plan/build/test/review/ship) + 167 new tests + Google Agent Skills integration + 49 core modules
2026-05-02 V3.4.0 Foundation Release — Real LLM backend (OpenAI/Anthropic/Mock), ThreadPoolExecutor parallel execution, InputValidator + prompt injection protection, CheckpointManager, WorkflowEngine with 11-phase lifecycle templates (full/backend/frontend/internal_tool/minimal), TaskCompletionChecker, AISemanticMatcher, streaming output, Docker, GitHub Actions CI, config file, CodeMapGenerator, DualLayerContext, SkillRegistry, CarryMem integration, AgentBriefing, ConfidenceScore, EnhancedWorker with auto QA, Protocol interface system, 234+ unit tests, requirement change management with gate mechanism and gap reporting
2026-04-17 V3.2 E2E Demo, MCE Adapter, Dispatcher UX
2026-04-16 V3.0 Complete redesign — Coordinator/Worker/Scratchpad architecture

License

MIT License — see LICENSE for details.

Links

Link URL
GitHub (This Repo) https://github.com/lulin70/DevSquad
Original / Upstream https://github.com/weiransoft/TraeMultiAgentSkill

About

DevSquad — Multi-Role AI Task Orchestrator. One task in, multi-role AI collaboration, one conclusion out. 7 core roles, real LLM backend (OpenAI/Anthropic), CLI + MCP + Python API. ~825+ tests.

Topics

Resources

License

Contributing

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages