A comprehensive 20-phase autonomous project audit system for Claude Code with full GitHub integration.
| Version | Status | Release |
|---|---|---|
| Latest patch | v4.1.1 | |
| Prior major (untagged) | commit aa888d9 | |
| Prior patch | v3.0.1 | |
| Prior major | v3.0.0 | |
| Prior patch | v2.0.1 | |
| Prior major | v2.0.0 | |
Note on the v3.0.1 → v4.1.1 tag gap: The 4.0.0 and 4.1.0 versions were cut internally (VERSION file bumps + CHANGELOG entries) but never tagged on GitHub. v4.1.1 is therefore the first tagged release in the 4.x series; its release notes cumulatively cover the phase-consolidation work from 4.0.0 (27 → 21 phases), the Phase H dissolution in 4.1.0 (21 → 20 phases), and the 4.1.1 patch additions. The "Prior major" entry above links to the commit where 4.0.0 was cut, since no tag exists for it.
Badge Color Convention
Each version segment gets its own badge. The number of badges indicates version depth:
| Level | Current | Prior | Example |
|---|---|---|---|
| Major (W.0.0) | | | v1.0.0, v2.0.0 |
| Minor (W.X.0) | | | v1.1.0, v1.2.0 |
| Patch (W.X.Y) | | | v1.0.3, v1.2.1 |
| Tweak (W.X.Y.Z) | | | v1.0.3.1 |
Color Scheme:
- Current: brightgreen → darkgreen → green → yellow
- Prior: brightred → darkred → red → orange
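The level table and color scheme above can be read as a small lookup. The sketch below assumes the level is set by the last non-zero segment, so that `v1.0.3` is a patch and `v2.0.0` a major, and that the colors are listed in level order. The names `version_level` and `level_color` are illustrative, not part of the project.

```python
LEVELS = ["major", "minor", "patch", "tweak"]
CURRENT_COLORS = ["brightgreen", "darkgreen", "green", "yellow"]
PRIOR_COLORS = ["brightred", "darkred", "red", "orange"]


def version_level(version: str) -> str:
    """Classify a version such as 'v1.0.3' as major, minor, patch or tweak."""
    parts = [int(p) for p in version.lstrip("v").split(".")]
    if not 3 <= len(parts) <= 4:
        raise ValueError(f"expected W.X.Y or W.X.Y.Z, got {version!r}")
    # Depth is the position of the last non-zero segment (W.0.0 counts as 1).
    depth = max((i + 1 for i, p in enumerate(parts) if p != 0), default=1)
    return LEVELS[depth - 1]


def level_color(level: str, current: bool = True) -> str:
    """Pick the badge color for a level (assumed: colors follow level order)."""
    palette = CURRENT_COLORS if current else PRIOR_COLORS
    return palette[LEVELS.index(level)]


print(version_level("v4.1.1"), level_color(version_level("v4.1.1")))
```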
The /test skill performs a complete autonomous audit of any software project - running tests, scanning for vulnerabilities, checking code quality, validating production deployments, and auditing GitHub repository security settings. It fixes ALL issues automatically and loops until the codebase is clean.
Key Features:
- 🔄 Autonomous: Fixes all issues without prompting (no "manual items" lists)
- 🧩 Modular: 93% context reduction via on-demand phase loading
- 🔒 Security-First: Integrated CVE scanning, secret detection, GitHub security auditing
- 🌍 Multi-Language: Python, Node.js, Go, Rust, Shell, Docker, YAML
- 📸 BTRFS Snapshots: Safe rollback points before modifications
- 🐙 GitHub Integration: Full repository security audit and auto-remediation
/test # Full audit (autonomous - fixes everything)
/test security # Comprehensive security audit (Phase 5)
/test --phase=0-2 # Pre-flight through testing
/test prodapp # Validate installed production app
/test docker # Validate Docker image and registry
/test github # Audit GitHub repository settings
/test --phase=ST # Validate test-skill framework (meta-testing)
/test --interactive # Enable prompts for decisions
/test help # Show all options

| Phase | Name | Description |
|---|---|---|
| Safety & Setup | ||
| S | Snapshot | BTRFS safety snapshot before modifications |
| 0 | Pre-Flight | Environment validation, config audit, sandbox setup |
| 1 | Discovery | Detect project type, test frameworks, tools, GitHub remote |
| Testing | ||
| 2 | Execute | Run tests, analyze results, measure coverage |
| 2a | Runtime | Service health checks, stuck process detection |
| Analysis | ||
| 5 | Security | Comprehensive security (7 tools: bandit, semgrep, CodeQL, pip-audit, trivy, grype, checkov) |
| 6 | Dependencies | Package health, outdated/unused/vulnerable packages |
| 7 | Quality | Linting, complexity analysis, dead code cleanup |
| I | Infrastructure | Infrastructure and runtime issue detection |
| Remediation | ||
| 10 | Fix | Auto-fix issues (ruff --fix, black, isort, shfmt, codespell) |
| Validation | ||
| A | App Test | Deployable application testing in sandbox |
| P | Production | Validate live installed application |
| D | Docker | Validate Docker image and registry package |
| G | GitHub | Audit GitHub repo security (Dependabot, CodeQL, branch protection) |
| 12 | Verify | Re-run tests, confirm no regressions |
| Finalization | ||
| 13 | Docs | Update documentation to match codebase |
| C | Restore | Cleanup temp files, restore environment |
| Special | ||
| V | VM Testing | Heavy isolation testing in libvirt/QEMU VMs |
| VM | Lifecycle | VM startup/shutdown management |
| ST | Self-Test | Validate test-skill framework (explicit only: --phase=ST) |
| Mode | Flag | Behavior |
|---|---|---|
| Autonomous (default) | (none) | Fixes ALL issues, no prompts, loops until clean |
| Interactive | --interactive | May prompt for decisions, single pass |
Autonomous mode (default):
- Fixes every issue regardless of priority/severity
- No user prompts except for safety/architecture/external blocks
- Loops between Fix (Phase 10) and Verify (Phase 12) until all tests pass
- Documentation automatically synchronized

Interactive mode (--interactive):
- May prompt for decisions (e.g., Phase P/D conditional execution)
- Still fixes ALL issues; interactive mode changes prompting behavior, not the fix mandate
- Loops until all tests pass and all issues resolved
- Useful when architectural decisions require human judgment
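The Fix/Verify loop described above can be sketched as a small driver. This is an illustration only: `run_until_clean`, the callback signatures, and the `max_rounds` safety cap are assumptions, since the source says only that the skill loops until clean.

```python
from typing import Callable, List


def run_until_clean(
    fix: Callable[[List[str]], None],
    verify: Callable[[], List[str]],
    max_rounds: int = 10,  # hypothetical safety cap, not from the README
) -> int:
    """Alternate verify and fix until verify reports no issues.

    Returns the number of fix rounds that were needed.
    """
    rounds = 0
    issues = verify()
    while issues:
        if rounds >= max_rounds:
            raise RuntimeError(f"{len(issues)} issue(s) left after {rounds} rounds")
        fix(issues)
        rounds += 1
        issues = verify()
    return rounds


# Toy stand-ins for Phase 10 (fix) and Phase 12 (verify).
pending = ["lint error", "failing test"]


def fake_verify() -> List[str]:
    return list(pending)


def fake_fix(issues: List[str]) -> None:
    pending.remove(issues[0])  # resolve one issue per round
```

Calling `run_until_clean(fake_fix, fake_verify)` with the toy callbacks takes two rounds and leaves `pending` empty.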
Full installation guide: See INSTALL.md for detailed instructions including prerequisites, verification, updating, and troubleshooting.
Install with symlinks:
git clone https://github.com/TheBoscoClub/claude-test-skill.git ~/claude-test-skill
mkdir -p ~/.claude/commands ~/.claude/skills
ln -s ~/claude-test-skill/commands/test.md ~/.claude/commands/test.md
ln -s ~/claude-test-skill/skills/test-phases ~/.claude/skills/test-phases

Or install by copying the files:
git clone https://github.com/TheBoscoClub/claude-test-skill.git /tmp/claude-test-skill
mkdir -p ~/.claude/commands ~/.claude/skills
cp /tmp/claude-test-skill/commands/test.md ~/.claude/commands/
cp -r /tmp/claude-test-skill/skills/test-phases ~/.claude/skills/
rm -rf /tmp/claude-test-skill

Verify the installation:
# In Claude Code:
/test --phase=ST

Requires: Claude Code 2.1.0+ (for YAML allowed-tools syntax). See INSTALL.md for full prerequisites.
Phase 1 (Discovery) automatically detects which tools are installed on your system. The skill uses the tools it finds:
| Tool | Languages | Purpose | Install |
|---|---|---|---|
| ruff | Python | Fast linter + formatter | pip install ruff |
| pylint | Python | Deep static analysis | pip install pylint |
| mypy | Python | Type checking | pip install mypy |
| black | Python | Code formatting | pip install black |
| isort | Python | Import sorting | pip install isort |
| eslint | JS/TS | Linting | npm install -g eslint |
| prettier | JS/TS/JSON/MD | Formatting | npm install -g prettier |
| hadolint | Docker | Dockerfile linting | OS package manager |
| yamllint | YAML | YAML validation | pip install yamllint |
| shfmt | Shell | Shell formatting | OS package manager |
| markdownlint-cli | Markdown | Markdown linting | npm install -g markdownlint-cli |
| codespell | All | Spelling errors | pip install codespell |
| Tool | Purpose | Install |
|---|---|---|
| pip-audit | Python CVE scanning | pip install pip-audit |
| bandit | Python security analysis | pip install bandit |
| semgrep | Multi-language SAST | pipx install semgrep |
| npm audit | Node.js CVE scanning | (built-in) |
| cargo audit | Rust CVE scanning | cargo install cargo-audit |
| trivy | Container/filesystem scanning | OS package manager |
| grype | SBOM vulnerability scanning | OS package manager (AUR: grype-bin) |
| checkov | Infrastructure-as-Code security | pipx install checkov |
| CodeQL | Advanced static analysis | GitHub Actions / Local install |
| Tool | Purpose | Install |
|---|---|---|
| gh | GitHub CLI for repo auditing | OS package manager |
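Tool detection of the kind described above can be approximated with a PATH lookup. The sketch below uses `shutil.which` from the Python standard library; the `detect_tools` helper and the grouping in `TOOLS_BY_LANGUAGE` are illustrative assumptions, not the skill's actual discovery logic.

```python
import shutil
from typing import Callable, Iterable, List, Optional, Tuple

# Executable names taken from the tables above (illustrative grouping).
TOOLS_BY_LANGUAGE = {
    "python": ["ruff", "pylint", "mypy", "black", "isort", "bandit", "pip-audit"],
    "javascript": ["eslint", "prettier"],
    "shell": ["shfmt"],
    "docker": ["hadolint", "trivy"],
    "yaml": ["yamllint"],
    "github": ["gh"],
}


def detect_tools(
    candidates: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Tuple[List[str], List[str]]:
    """Split candidate executables into (available, missing) via PATH lookup."""
    available: List[str] = []
    missing: List[str] = []
    for tool in candidates:
        (available if which(tool) else missing).append(tool)
    return available, missing


if __name__ == "__main__":
    found, absent = detect_tools(TOOLS_BY_LANGUAGE["python"])
    print("available:", found)
    print("missing:", absent)
```

The `which` parameter is injectable so the function can be exercised without depending on what is installed locally.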
Projects can include .claude-test.yaml for customization:
# Test coverage requirements
coverage:
  minimum: 85
  fail_on_below: true
# Sandbox configuration
mocking:
  enabled: true
  sandbox_dir: /tmp/claude-test-sandbox-${PROJECT_NAME}
# Cleanup behavior
cleanup:
  after_test: true
  remove_sandbox: true
# Tool-specific settings
tools:
  ruff:
    extend-select: ["I", "UP", "YTT", "ASYNC"]
  pylint:
    disable: ["C0114", "C0115", "C0116"]

Phases execute in tiers with strict dependencies:
TIER 0: Safety [S, 0] ─────────────────── Can run in parallel
│
▼
TIER 1: Discovery [1] ──────────────────── GATE: Project Known
│
▼
TIER 2: Testing [2, 2a] ────────────────── Can run in parallel
│
▼
TIER 3: Analysis [5, 6, 7, I] ─────────── Can run in parallel (read-only)
│
▼
TIER 4: Fix [10] ───────────────────────── MODIFIES FILES (sequential)
│
▼
TIER 5: Validation [P, D, G] ───────────── CONDITIONAL (sequential)
│
▼
TIER 6: Verify [12] ────────────────────── Re-run tests
│
⟲ Loop to Fix if issues found
│
▼
TIER 7: Docs [13] ──────────────────────── ALWAYS runs
│
▼
TIER 8: Cleanup [C] ────────────────────── ALWAYS last
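The tier diagram above can be written down as data and turned into an ordered list of batches. This is a minimal sketch: the `TIERS` structure and `execution_plan` function are hypothetical, and the Verify-to-Fix loop is deliberately left out.

```python
from typing import List, Optional, Set

# (tier name, phases, can run in parallel), transcribed from the diagram.
TIERS = [
    ("Safety", ["S", "0"], True),
    ("Discovery", ["1"], False),
    ("Testing", ["2", "2a"], True),
    ("Analysis", ["5", "6", "7", "I"], True),
    ("Fix", ["10"], False),
    ("Validation", ["P", "D", "G"], False),
    ("Verify", ["12"], False),
    ("Docs", ["13"], False),
    ("Cleanup", ["C"], False),
]


def execution_plan(selected: Optional[Set[str]] = None) -> List[List[str]]:
    """Return batches in tier order.

    A parallel tier becomes one batch holding all of its phases; a
    sequential tier becomes one single-phase batch per phase.
    """
    plan: List[List[str]] = []
    for _name, phases, parallel in TIERS:
        chosen = [p for p in phases if selected is None or p in selected]
        if not chosen:
            continue
        if parallel:
            plan.append(chosen)
        else:
            plan.extend([p] for p in chosen)
    return plan


for batch in execution_plan():
    print(" + ".join(batch))
```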
Phase P (Production) - Skipped if:
- No installable app detected
- App not installed on this system
Phase D (Docker) - Skipped if:
- No Dockerfile in project
- No registry package found
Phase G (GitHub) - Skipped if:
- No GitHub remote configured
- gh CLI not authenticated
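The skip rules above amount to a few boolean checks per phase. The sketch below assumes that any single listed condition is enough to skip a phase, which the source does not state outright. The `ProjectFacts` fields and the `skip_reasons` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ProjectFacts:
    """Facts gathered during discovery (field names are illustrative)."""
    has_installable_app: bool
    app_installed: bool
    has_dockerfile: bool
    has_registry_package: bool
    has_github_remote: bool
    gh_authenticated: bool


def skip_reasons(facts: ProjectFacts) -> Dict[str, List[str]]:
    """Map each conditional phase to its skip reasons; an empty list means run."""
    checks = {
        "P": [
            (facts.has_installable_app, "no installable app detected"),
            (facts.app_installed, "app not installed on this system"),
        ],
        "D": [
            (facts.has_dockerfile, "no Dockerfile in project"),
            (facts.has_registry_package, "no registry package found"),
        ],
        "G": [
            (facts.has_github_remote, "no GitHub remote configured"),
            (facts.gh_authenticated, "gh CLI not authenticated"),
        ],
    }
    return {
        phase: [reason for ok, reason in conditions if not ok]
        for phase, conditions in checks.items()
    }
```

A phase runs only when its list of reasons is empty, and the reasons themselves can be reported to the user.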
Phase G performs a comprehensive GitHub repository audit.

Checks:
- ✅ Dependabot vulnerability alerts
- ✅ Dependabot security updates
- ✅ Secret scanning (if available)
- ✅ Code scanning (CodeQL workflows)
- ✅ Branch protection rules

Auto-remediation:
- Enables Dependabot alerts if missing
- Enables automated security updates if missing
- Reports open security alerts for manual review

Requirements:
- gh CLI installed and authenticated (gh auth login)
- Push access to the repository (for enabling security features)
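For orientation, the checks and fixes above correspond to GitHub REST endpoints that can be reached through `gh api`. The sketch below only builds the command lines and does not run them. Whether the skill issues these exact calls is an assumption, and the helper names and the default branch `main` are hypothetical.

```python
from typing import Dict, List


def gh_api(method: str, path: str) -> List[str]:
    """Build an argv list for a `gh api` call (nothing is executed here)."""
    return ["gh", "api", "--method", method, path]


def audit_commands(owner: str, repo: str, branch: str = "main") -> Dict[str, List[str]]:
    """Read-only checks; the branch name is an assumed default."""
    base = f"repos/{owner}/{repo}"
    return {
        # Responds 204 when alerts are enabled, 404 when they are not.
        "dependabot_alerts_enabled": gh_api("GET", f"{base}/vulnerability-alerts"),
        "security_updates_enabled": gh_api("GET", f"{base}/automated-security-fixes"),
        "branch_protection": gh_api("GET", f"{base}/branches/{branch}/protection"),
        "open_dependabot_alerts": gh_api("GET", f"{base}/dependabot/alerts"),
    }


def remediation_commands(owner: str, repo: str) -> Dict[str, List[str]]:
    """Calls that enable missing protections (these need push/admin access)."""
    base = f"repos/{owner}/{repo}"
    return {
        "enable_dependabot_alerts": gh_api("PUT", f"{base}/vulnerability-alerts"),
        "enable_security_updates": gh_api("PUT", f"{base}/automated-security-fixes"),
    }
```

Each list can be passed to `subprocess.run(cmd, capture_output=True)` once `gh auth login` has been completed.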
claude-test-skill/
├── commands/
│   └── test.md  # Main dispatcher (~1,000 lines)
├── skills/
│   └── test-phases/  # 20 phase files (each with model tier config header)
│       ├── phase-1-snapshot.md  # [haiku]
│       ├── phase-2-preflight.md  # [sonnet]
│       ├── phase-3-discovery.md  # [opus]
│       ├── phase-4a-execute.md  # [sonnet]
│       ├── phase-4b-runtime.md  # [sonnet]
│       ├── phase-5a-security.md  # [opus]
│       ├── phase-5b-dependencies.md  # [sonnet]
│       ├── phase-5c-quality.md  # [opus]
│       ├── phase-5d-infrastructure.md  # [sonnet]
│       ├── phase-6-fix.md  # [opus]
│       ├── phase-7-verify.md  # [sonnet]
│       ├── phase-8-docs.md  # [sonnet]
│       ├── phase-9a-app-testing.md  # [opus]
│       ├── phase-9b-production.md  # [opus]
│       ├── phase-9c-docker.md  # [opus]
│       ├── phase-9d-github.md  # [opus]
│       ├── phase-10a-vm-testing.md  # [sonnet]
│       ├── phase-10b-vm-lifecycle.md  # [sonnet]
│       ├── phase-11-cleanup.md  # [haiku]
│       └── phase-ST-self-test.md  # [opus]
├── agents/  # Integrated into phases (reference docs)
│   ├── coverage-reviewer.md  # → Phase 4a
│   ├── security-scanner.md  # → Phase 5a
│   └── test-analyzer.md  # → Phase 4a
├── docs/
│   └── ARCHITECTURE.md  # System architecture
├── .github/
│   └── workflows/
│       └── security.yml  # Daily security scanning
├── plugin.json
├── INSTALL.md  # Third-party installation guide
├── SKILL.md  # Claude.ai web upload version
└── README.md
The modular architecture significantly reduces context consumption:
| Component | Lines | When Loaded |
|---|---|---|
| Dispatcher | ~1,000 | Always |
| Each Phase | 50-300 | On-demand via subagent |
| Typical audit | ~1,500 | vs 3,652 monolithic |
Result: ~60% reduction in context for typical audits
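The ~60% figure follows directly from the line counts in the table. A quick check of the arithmetic (the function name is illustrative):

```python
def context_reduction(loaded_lines: int, monolithic_lines: int) -> float:
    """Fraction of context saved relative to loading the monolithic file."""
    return 1 - loaded_lines / monolithic_lines


typical = context_reduction(1500, 3652)
print(f"{typical:.1%}")  # 58.9%, which rounds to the ~60% quoted above
```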
/test
Runs all phases autonomously, fixes all issues, loops until clean.

/test security
# or: /test --phase=5
Runs comprehensive security audit with 8 tools (GitHub + local + installed app).

/test --phase=0-2,7
Quick validation: Pre-flight, Discovery, Execute, and Quality.

/test prodapp
Validates the installed production application against install-manifest.json.

/test github
Audits GitHub security settings and enables missing protections.

/test --phase=ST
Validates the test-skill framework itself (phase files, symlinks, tools).
Note: Phase ST is never included in normal /test runs - explicit only.
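A selector such as `--phase=0-2,7` can be expanded as sketched below. This is not the dispatcher's actual parser. It assumes a numeric range covers only the plain numbered phases, so `0-2` yields 0, 1 and 2 but not 2a, matching the "Pre-flight, Discovery, Execute" description above. `PHASE_ORDER` and `parse_phase_spec` are hypothetical names.

```python
from typing import List

# Phase identifiers in the order they appear in the phase table.
PHASE_ORDER = [
    "S", "0", "1", "2", "2a", "5", "6", "7", "I", "10",
    "A", "P", "D", "G", "12", "13", "C", "V", "VM", "ST",
]


def parse_phase_spec(spec: str) -> List[str]:
    """Expand a spec like '0-2,7' into ['0', '1', '2', '7']."""
    selected: List[str] = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start, end = part.split("-", 1)
            low, high = int(start), int(end)
            # Assumption: ranges match numbered phases only (no 2a, no letters).
            names = [p for p in PHASE_ORDER if p.isdigit() and low <= int(p) <= high]
        elif part in PHASE_ORDER:
            names = [part]
        else:
            raise ValueError(f"unknown phase: {part!r}")
        for name in names:
            if name not in selected:
                selected.append(name)
    return selected


print(parse_phase_spec("0-2,7"))
```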
- Create ~/.claude/skills/test-phases/phase-X-name.md
- Add phase to the Available Phases table in commands/test.md
- Define tier placement in dependency graph
- Document in README
# Phase X: Your Phase Name
## Purpose
Brief description of what this phase does.
## Steps
### Step 1: First Action
[Instructions for Claude]
### Step 2: Second Action
[Instructions for Claude]
## Output Format
Status: ✅ PASS / ⚠️ ISSUES / ❌ FAIL
Issues Found: [count]
Key Findings:
- [finding 1]
- [finding 2]

Run gh auth login to authenticate with GitHub.
Install the recommended tools for your language. Phase 1 will detect them automatically.
Ensure you have sudo access or run on a BTRFS filesystem. Snapshots are optional - the skill continues without them on other filesystems.
Phase P validates production installations. If the app isn't installed on this system, Phase P correctly skips.
- Fork the repository
- Create a feature branch
- Make your changes
- Run /test --phase=ST to validate framework integrity
- Run /test on the skill itself for full audit
- Submit a pull request
- All markdown must pass markdownlint
- All YAML must pass yamllint
- No hardcoded secrets or credentials
MIT License - See LICENSE file for details.
See CHANGELOG.md for detailed version history.
- v4.0.0 - Phase consolidation (27 to 21), bloat reduction, project-agnostic, agents integrated (BREAKING)
- v3.0.1 - Canonical help block, Phase ST grep fix, argument-hint update
- v3.0.0 - Dispatcher execution fixes, Phase I/H integration, documentation consistency audit (BREAKING)
- v2.0.1 - Opus 4.6 phase configuration headers, Phase ST integration validation
- v2.0.0 - Opus 4.6 model pinning, subagent tiering, 16 tools, task tracking (BREAKING)
Added after a user accidentally typed /git-release tweak instead of /git-release patch because someone in their Teams meeting said "tweak the memory" at the exact moment they were typing.
Human cognition can be modeled as an I/O system where each modality (language, vision, motor) can handle multiple read-only input streams concurrently, but has only a single process table for output. When a read-only process suddenly needs to write, other read processes can insert data into the write process's I/O register—resulting in cross-talk.
┌─────────────────────────────────────────────────────────────┐
│ HUMAN COGNITION: I/O MODEL │
├─────────────────────────────────────────────────────────────┤
│ LANGUAGE MODALITY │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ INPUT STREAMS (read-only, concurrent OK) │ │
│ │ ├─ Teams meeting audio ──────► buffer[0] │ │
│ │ ├─ Internal monologue ───────► buffer[1] │ │
│ │ └─ Reading (if any) ─────────► buffer[2] │ │
│ └─────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ OUTPUT REGISTER (single writer, NO MUTEX) │ │
│ │ ┌──────────────────────────────────────────────┐ │ │
│ │ │ "tweak" ← RACE CONDITION: buffer[0] won │ │ │
│ │ └──────────────────────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ Motor Cortex (keystrokes) │
└─────────────────────────────────────────────────────────────┘
Evolution optimized for speed over correctness. The panic-response code demonstrates this clearly:
if (predator_detected) {
// DO NOT WAIT for environment_scan() to complete
// Tree collision is survivable. Tiger is not.
motor_cortex.execute(FLEE); // non-blocking, fire-and-forget
}

Survivorship bias in action: Ancestors who stopped to carefully survey escape routes got eaten. Those who face-planted into trees but survived passed on their genes.
EVOLUTION v2.0.0-LTS (Homo sapiens)
├── Release: ~300,000 years ago
├── Support Status: ACTIVE (no EOL planned)
├── Known Issues:
│ ├── #4,271: Panic response overwrites output buffer
│ ├── #12,847: Sugar addiction (deprecated food scarcity)
│ └── #89,421: Cannot distinguish real tigers from work emails
├── Patch Frequency: ~1 per 10,000 generations
└── Upgrade Path: None available. You're stuck with this kernel.
The original devs are unreachable and left no documentation.
// Region-specific threat assessment
if (location.continent == "Asia" && habitat.includes("forest")) {
TIGER_THREAT = LITERAL; // Bengal, Siberian, Indochinese, etc.
TREE_COLLISION_PRIORITY = ACCEPTABLE_RISK;
} else {
TIGER_THREAT = METAPHORICAL; // deadlines, managers, merge conflicts
TREE_COLLISION_PRIORITY = EMBARRASSING;
}

In rural India, Nepal, or the Russian Far East, that legacy panic-response code is still very much production-ready. Evolution's LTS release is still getting real-world use.
This addendum serves as a reminder that humans have eventual consistency at best—and sometimes experience dirty reads.