Claude Code Test Skill

A comprehensive 20-phase autonomous project audit system for Claude Code with full GitHub integration.

Version History

Version	Status	Release
	Latest patch	v4.1.1
	Prior major (untagged)	commit aa888d9
	Prior patch	v3.0.1
	Prior major	v3.0.0
	Prior patch	v2.0.1
	Prior major	v2.0.0

Note on the v3.0.1 → v4.1.1 tag gap: The 4.0.0 and 4.1.0 versions were cut internally (VERSION file bumps + CHANGELOG entries) but never tagged on GitHub. v4.1.1 is therefore the first tagged release in the 4.x series — its release notes cumulatively cover the phase-consolidation work from 4.0.0 (27 → 21 phases), the Phase H dissolution in 4.1.0 (21 → 20 phases), and the 4.1.1 patch additions. The "Prior major" entry above links to the commit where 4.0.0 was cut, since no tag exists for it.

Badge Color Convention

Each version segment gets its own badge. The number of badges indicates version depth:

Level	Current	Prior	Example
Major (`W.0.0`)			v1.0.0, v2.0.0
Minor (`W.X.0`)			v1.1.0, v1.2.0
Patch (`W.X.Y`)			v1.0.3, v1.2.1
Tweak (`W.X.Y.Z`)			v1.0.3.1

Color Scheme:

Current: brightgreen → darkgreen → green → yellow
Prior: brightred → darkred → red → orange

Overview

The /test skill performs a complete autonomous audit of any software project - running tests, scanning for vulnerabilities, checking code quality, validating production deployments, and auditing GitHub repository security settings. It fixes ALL issues automatically and loops until the codebase is clean.

Key Features:

🔄 Autonomous: Fixes all issues without prompting (no "manual items" lists)
🧩 Modular: ~60% context reduction for typical audits via on-demand phase loading
🔒 Security-First: Integrated CVE scanning, secret detection, GitHub security auditing
🌍 Multi-Language: Python, Node.js, Go, Rust, Shell, Docker, YAML
📸 BTRFS Snapshots: Safe rollback points before modifications
🐙 GitHub Integration: Full repository security audit and auto-remediation

Quick Start

/test                    # Full audit (autonomous - fixes everything)
/test security           # Comprehensive security audit (Phase 5)
/test --phase=0-2        # Pre-flight through testing
/test prodapp            # Validate installed production app
/test docker             # Validate Docker image and registry
/test github             # Audit GitHub repository settings
/test --phase=ST         # Validate test-skill framework (meta-testing)
/test --interactive      # Enable prompts for decisions
/test help               # Show all options

Phase Overview

Phase	Name	Description
Safety & Setup
S	Snapshot	BTRFS safety snapshot before modifications
0	Pre-Flight	Environment validation, config audit, sandbox setup
1	Discovery	Detect project type, test frameworks, tools, GitHub remote
Testing
2	Execute	Run tests, analyze results, measure coverage
2a	Runtime	Service health checks, stuck process detection
Analysis
5	Security	Comprehensive security (7 tools: bandit, semgrep, CodeQL, pip-audit, trivy, grype, checkov)
6	Dependencies	Package health, outdated/unused/vulnerable packages
7	Quality	Linting, complexity analysis, dead code cleanup
I	Infrastructure	Infrastructure and runtime issue detection
Remediation
10	Fix	Auto-fix issues (ruff --fix, black, isort, shfmt, codespell)
Validation
A	App Test	Deployable application testing in sandbox
P	Production	Validate live installed application
D	Docker	Validate Docker image and registry package
G	GitHub	Audit GitHub repo security (Dependabot, CodeQL, branch protection)
12	Verify	Re-run tests, confirm no regressions
Finalization
13	Docs	Update documentation to match codebase
C	Restore	Cleanup temp files, restore environment
Special
V	VM Testing	Heavy isolation testing in libvirt/QEMU VMs
VM	Lifecycle	VM startup/shutdown management
ST	Self-Test	Validate test-skill framework (explicit only: `--phase=ST`)

Execution Modes

Mode	Flag	Behavior
Autonomous (default)	(none)	Fixes ALL issues, no prompts, loops until clean
Interactive	`--interactive`	May prompt for decisions; still loops until all issues are resolved

Autonomous Mode (Default)

Fixes every issue regardless of priority/severity
No user prompts except for safety/architecture/external blocks
Loops between Fix (Phase 10) and Verify (Phase 12) until all tests pass
Documentation automatically synchronized
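
The Fix/Verify loop described above can be sketched as follows. This is an illustration, not the dispatcher's actual logic: `fix` and `verify` are hypothetical callables standing in for Phase 10 and Phase 12, and `max_rounds` is a safety cap added for the sketch (the skill itself is described as looping until clean).

```python
from typing import Callable


def run_until_clean(fix: Callable[[list[str]], None],
                    verify: Callable[[], list[str]],
                    max_rounds: int = 10) -> int:
    """Alternate verify and fix until verify reports no issues.

    Returns the number of fix rounds that were needed. max_rounds is a
    safety cap added for this sketch; it is not a documented option.
    """
    for rounds in range(max_rounds + 1):
        issues = verify()
        if not issues:
            return rounds
        if rounds == max_rounds:
            break
        fix(issues)
    raise RuntimeError(f"still failing after {max_rounds} fix rounds")
```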

Interactive Mode

May prompt for decisions (e.g., Phase P/D conditional execution)
Still fixes ALL issues — interactive mode changes prompting behavior, not the fix mandate
Loops until all tests pass and all issues resolved
Useful when architectural decisions require human judgment

Installation

Full installation guide: See INSTALL.md for detailed instructions including prerequisites, verification, updating, and troubleshooting.

Quick Install (Symlinks)

git clone https://github.com/TheBoscoClub/claude-test-skill.git ~/claude-test-skill
mkdir -p ~/.claude/commands ~/.claude/skills
ln -s ~/claude-test-skill/commands/test.md ~/.claude/commands/test.md
ln -s ~/claude-test-skill/skills/test-phases ~/.claude/skills/test-phases

Quick Install (Copy)

git clone https://github.com/TheBoscoClub/claude-test-skill.git /tmp/claude-test-skill
mkdir -p ~/.claude/commands ~/.claude/skills
cp /tmp/claude-test-skill/commands/test.md ~/.claude/commands/
cp -r /tmp/claude-test-skill/skills/test-phases ~/.claude/skills/
rm -rf /tmp/claude-test-skill

Verify Installation

# In Claude Code:
/test --phase=ST

Requires: Claude Code 2.1.0+ (for YAML allowed-tools syntax). See INSTALL.md for full prerequisites.

Tool Detection

Phase 1 (Discovery) automatically detects which tools are installed on your system, and the skill uses whatever it finds:

Code Quality Tools

Tool	Languages	Purpose	Install
ruff	Python	Fast linter + formatter	`pip install ruff`
pylint	Python	Deep static analysis	`pip install pylint`
mypy	Python	Type checking	`pip install mypy`
black	Python	Code formatting	`pip install black`
isort	Python	Import sorting	`pip install isort`
eslint	JS/TS	Linting	`npm install -g eslint`
prettier	JS/TS/JSON/MD	Formatting	`npm install -g prettier`
hadolint	Docker	Dockerfile linting	OS package manager
yamllint	YAML	YAML validation	`pip install yamllint`
shfmt	Shell	Shell formatting	OS package manager
markdownlint-cli	Markdown	Markdown linting	`npm install -g markdownlint-cli`
codespell	All	Spelling errors	`pip install codespell`

Security Tools

Tool	Purpose	Install
pip-audit	Python CVE scanning	`pip install pip-audit`
bandit	Python security analysis	`pip install bandit`
semgrep	Multi-language SAST	`pipx install semgrep`
npm audit	Node.js CVE scanning	(built-in)
cargo audit	Rust CVE scanning	`cargo install cargo-audit`
trivy	Container/filesystem scanning	OS package manager
grype	SBOM vulnerability scanning	OS package manager (AUR: grype-bin)
checkov	Infrastructure-as-Code security	`pipx install checkov`
CodeQL	Advanced static analysis	GitHub Actions / Local install

GitHub Tools

Tool	Purpose	Install
gh	GitHub CLI for repo auditing	OS package manager
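
Detection of this kind usually comes down to a PATH lookup. The sketch below assumes that approach; how Discovery actually probes for tools is not specified here. The tool list is a subset of the tables above, and the injectable `which` parameter exists only to make the function easy to test.

```python
import shutil
from typing import Callable, Iterable, Optional

# Executable names taken from the tables above. npm audit and cargo audit
# are subcommands, so their parent binaries are checked instead.
TOOLS = ("ruff", "pylint", "mypy", "black", "isort", "eslint", "prettier",
         "hadolint", "yamllint", "shfmt", "codespell", "pip-audit", "bandit",
         "semgrep", "npm", "cargo", "trivy", "grype", "checkov", "gh")


def detect_tools(names: Iterable[str] = TOOLS,
                 which: Callable[[str], Optional[str]] = shutil.which
                 ) -> dict[str, bool]:
    """Map each tool name to whether an executable is found on PATH."""
    return {name: which(name) is not None for name in names}
```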

Configuration

Projects can include .claude-test.yaml for customization:

# Test coverage requirements
coverage:
  minimum: 85
  fail_on_below: true

# Sandbox configuration
mocking:
  enabled: true
  sandbox_dir: /tmp/claude-test-sandbox-${PROJECT_NAME}

# Cleanup behavior
cleanup:
  after_test: true
  remove_sandbox: true

# Tool-specific settings
tools:
  ruff:
    extend-select: ["I", "UP", "YTT", "ASYNC"]
  pylint:
    disable: ["C0114", "C0115", "C0116"]
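
As a rough sketch of how such a file might be consumed, the code below assumes the YAML has already been parsed into a dictionary by a YAML library. The default values mirror the example above, and the helper names are hypothetical; the skill's own handling of these keys may differ.

```python
from string import Template

# Defaults mirror the example file above; the real defaults may differ.
DEFAULTS = {
    "coverage": {"minimum": 85, "fail_on_below": True},
    "mocking": {"enabled": True,
                "sandbox_dir": "/tmp/claude-test-sandbox-${PROJECT_NAME}"},
}


def merge(defaults: dict, overrides: dict) -> dict:
    """Recursively overlay project settings on top of the defaults."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def sandbox_dir(config: dict, project_name: str) -> str:
    """Expand ${PROJECT_NAME} in the configured sandbox path."""
    raw = config["mocking"]["sandbox_dir"]
    return Template(raw).substitute(PROJECT_NAME=project_name)


def coverage_ok(config: dict, measured: float) -> bool:
    """False only when coverage is below minimum and fail_on_below is set."""
    cov = config["coverage"]
    return measured >= cov["minimum"] or not cov["fail_on_below"]
```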

Phase Dependencies

Phases execute in tiers with strict dependencies:

TIER 0: Safety [S, 0] ─────────────────── Can run in parallel
           │
           ▼
TIER 1: Discovery [1] ──────────────────── GATE: Project Known
           │
           ▼
TIER 2: Testing [2, 2a] ────────────────── Can run in parallel
           │
           ▼
TIER 3: Analysis [5, 6, 7, I] ─────────── Can run in parallel (read-only)
           │
           ▼
TIER 4: Fix [10] ───────────────────────── MODIFIES FILES (sequential)
           │
           ▼
TIER 5: Validation [P, D, G] ───────────── CONDITIONAL (sequential)
           │
           ▼
TIER 6: Verify [12] ────────────────────── Re-run tests
           │
          ⟲ Loop to Fix if issues found
           │
           ▼
TIER 7: Docs [13] ──────────────────────── ALWAYS runs
           │
           ▼
TIER 8: Cleanup [C] ────────────────────── ALWAYS last
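
One way to picture the tiers is as an ordered table that any phase selection is filtered through. The data below is transcribed from the diagram; the `execution_plan` function is an illustrative sketch, not the dispatcher's implementation, and it leaves out the Verify-to-Fix loop.

```python
# (tier name, phases, may run in parallel) transcribed from the diagram above.
TIERS = [
    ("Safety", ["S", "0"], True),
    ("Discovery", ["1"], False),
    ("Testing", ["2", "2a"], True),
    ("Analysis", ["5", "6", "7", "I"], True),
    ("Fix", ["10"], False),
    ("Validation", ["P", "D", "G"], False),
    ("Verify", ["12"], False),
    ("Docs", ["13"], False),
    ("Cleanup", ["C"], False),
]


def execution_plan(selected: set[str]) -> list[tuple[str, list[str], bool]]:
    """Return the non-empty tiers, in order, restricted to selected phases."""
    plan = []
    for name, phases, parallel in TIERS:
        chosen = [p for p in phases if p in selected]
        if chosen:
            # A tier with a single phase has nothing to parallelize.
            plan.append((name, chosen, parallel and len(chosen) > 1))
    return plan
```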

Conditional Phases

Phase P (Production) - Skipped if:

No installable app detected
App not installed on this system

Phase D (Docker) - Skipped if:

No Dockerfile in project
No registry package found

Phase G (GitHub) - Skipped if:

No GitHub remote configured
gh CLI not authenticated
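
The skip rules above can be expressed as a small lookup. This sketch assumes that any single listed condition is enough to skip a phase, and the fact names (`has_dockerfile`, `gh_authenticated`, and so on) are hypothetical labels for whatever Discovery records.

```python
from typing import Optional


def skip_reason(phase: str, facts: dict) -> Optional[str]:
    """Return why a conditional phase is skipped, or None if it should run."""
    checks = {
        "P": [("installable_app", "No installable app detected"),
              ("app_installed", "App not installed on this system")],
        "D": [("has_dockerfile", "No Dockerfile in project"),
              ("registry_package", "No registry package found")],
        "G": [("github_remote", "No GitHub remote configured"),
              ("gh_authenticated", "gh CLI not authenticated")],
    }
    for fact, reason in checks.get(phase, []):
        # A missing fact is treated as false, so the phase is skipped.
        if not facts.get(fact, False):
            return reason
    return None
```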

GitHub Integration

Phase G performs a comprehensive GitHub repository audit:

Security Features Audited

✅ Dependabot vulnerability alerts
✅ Dependabot security updates
✅ Secret scanning (if available)
✅ Code scanning (CodeQL workflows)
✅ Branch protection rules

Automatic Remediation

Enables Dependabot alerts if missing
Enables automated security updates if missing
Reports open security alerts for manual review

Requirements

gh CLI installed and authenticated (gh auth login)
Push access to the repository (for enabling security features)
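
For orientation, the Dependabot checks map onto two documented GitHub REST endpoints, `vulnerability-alerts` and `automated-security-fixes`, which can be reached through `gh api`. Whether Phase G issues these exact calls is an assumption, and the helper names below are illustrative. Note that a 404 can also mean insufficient access, so a failed check is not conclusive on its own.

```python
import subprocess


def gh_api_command(repo: str, feature: str, enable: bool = False) -> list[str]:
    """Build a `gh api` call that checks (GET) or enables (PUT) a feature.

    repo is "owner/name"; feature is "vulnerability-alerts" or
    "automated-security-fixes".
    """
    method = "PUT" if enable else "GET"
    return ["gh", "api", "--method", method, f"repos/{repo}/{feature}"]


def alerts_enabled(repo: str) -> bool:
    """Check Dependabot alerts for a repository.

    GitHub answers 204 when alerts are on and 404 when they are off,
    and gh exits non-zero on the 404.
    """
    cmd = gh_api_command(repo, "vulnerability-alerts")
    return subprocess.run(cmd, capture_output=True).returncode == 0
```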

Architecture

claude-test-skill/
├── commands/
│   └── test.md              # Main dispatcher (~1,000 lines)
├── skills/
│   └── test-phases/         # 20 phase files (each with model tier config header)
│       ├── phase-1-snapshot.md         # [haiku]
│       ├── phase-2-preflight.md        # [sonnet]
│       ├── phase-3-discovery.md        # [opus]
│       ├── phase-4a-execute.md         # [sonnet]
│       ├── phase-4b-runtime.md         # [sonnet]
│       ├── phase-5a-security.md        # [opus]
│       ├── phase-5b-dependencies.md    # [sonnet]
│       ├── phase-5c-quality.md         # [opus]
│       ├── phase-5d-infrastructure.md  # [sonnet]
│       ├── phase-6-fix.md              # [opus]
│       ├── phase-7-verify.md           # [sonnet]
│       ├── phase-8-docs.md             # [sonnet]
│       ├── phase-9a-app-testing.md     # [opus]
│       ├── phase-9b-production.md      # [opus]
│       ├── phase-9c-docker.md          # [opus]
│       ├── phase-9d-github.md          # [opus]
│       ├── phase-10a-vm-testing.md     # [sonnet]
│       ├── phase-10b-vm-lifecycle.md   # [sonnet]
│       ├── phase-11-cleanup.md         # [haiku]
│       └── phase-ST-self-test.md       # [opus]
├── agents/                  # Integrated into phases (reference docs)
│   ├── coverage-reviewer.md # → Phase 4a
│   ├── security-scanner.md  # → Phase 5a
│   └── test-analyzer.md     # → Phase 4a
├── docs/
│   └── ARCHITECTURE.md      # System architecture
├── .github/
│   └── workflows/
│       └── security.yml     # Daily security scanning
├── plugin.json
├── INSTALL.md               # Third-party installation guide
├── SKILL.md                 # Claude.ai web upload version
└── README.md

Context Efficiency

The modular architecture significantly reduces context consumption:

Component	Lines	When Loaded
Dispatcher	~1,000	Always
Each Phase	50-300	On-demand via subagent
Typical audit	~1,500	vs 3,652 monolithic

Result: ~60% reduction in context for typical audits
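
The figure follows directly from the table: loading roughly 1,500 lines instead of 3,652 saves a little under 60%. A short check of that arithmetic (the function name is illustrative):

```python
def context_reduction(loaded_lines: int, monolithic_lines: int) -> float:
    """Fraction of context saved relative to loading everything at once."""
    return 1 - loaded_lines / monolithic_lines


typical = context_reduction(1_500, 3_652)
print(f"{typical:.1%}")  # prints "58.9%"
```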

Examples

Full Audit

/test

Runs all phases autonomously, fixes all issues, loops until clean.

Security-Only Audit

/test security
# or: /test --phase=5

Runs the comprehensive security audit with the 7 Phase 5 tools (GitHub + local + installed app).

Pre-Commit Check

/test --phase=0-2,7

Quick validation: Pre-flight, Discovery, Execute, and Quality.
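
A sketch of how a spec like `0-2,7` might expand, written to match the description above (Pre-flight, Discovery, Execute, and Quality). It assumes numeric ranges select only integer-named phases, so `0-2` does not pull in `2a`. The parser is illustrative and not the dispatcher's own.

```python
# Phase IDs in the order of the Phase Overview table.
PHASES = ["S", "0", "1", "2", "2a", "5", "6", "7", "I", "10",
          "A", "P", "D", "G", "12", "13", "C", "V", "VM", "ST"]


def parse_phase_spec(spec: str) -> list[str]:
    """Expand a spec such as "0-2,7" into an ordered list of phase IDs."""
    selected: list[str] = []
    for token in spec.split(","):
        token = token.strip()
        if "-" in token:
            start, end = (int(x) for x in token.split("-", 1))
            # Keep only integers in the range that name a real phase.
            ids = [str(n) for n in range(start, end + 1) if str(n) in PHASES]
        elif token in PHASES:
            ids = [token]
        else:
            raise ValueError(f"unknown phase: {token!r}")
        selected.extend(i for i in ids if i not in selected)
    return selected
```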

Production Validation

/test prodapp

Validates the installed production application against install-manifest.json.

GitHub Repository Audit

/test github

Audits GitHub security settings and enables missing protections.

Framework Self-Test (Meta-Testing)

/test --phase=ST

Validates the test-skill framework itself (phase files, symlinks, tools). Note: Phase ST is never included in normal /test runs - explicit only.

Adding Custom Phases

Create ~/.claude/skills/test-phases/phase-X-name.md
Add phase to the Available Phases table in commands/test.md
Define tier placement in dependency graph
Document in README

Phase File Template

# Phase X: Your Phase Name

## Purpose
Brief description of what this phase does.

## Steps

### Step 1: First Action
[Instructions for Claude]

### Step 2: Second Action
[Instructions for Claude]

## Output Format

Status: ✅ PASS / ⚠️ ISSUES / ❌ FAIL
Issues Found: [count]
Key Findings:
- [finding 1]
- [finding 2]
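
Before wiring a new phase into the dispatcher, it can help to confirm the file follows the template. The check below is a hypothetical helper based only on the headings shown above; it is not what Phase ST actually runs.

```python
REQUIRED_HEADINGS = ["## Purpose", "## Steps", "## Output Format"]


def missing_sections(markdown: str) -> list[str]:
    """List template sections that are absent from a phase file's text."""
    lines = [line.strip() for line in markdown.splitlines()]
    missing = []
    if not any(line.startswith("# Phase ") for line in lines):
        missing.append("# Phase X: <name>")
    missing.extend(h for h in REQUIRED_HEADINGS if h not in lines)
    return missing
```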

Troubleshooting

"Phase G skipped: gh not authenticated"

Run gh auth login to authenticate with GitHub.

"No security tools detected"

Install the recommended tools for your language. Phase 1 will detect them automatically.

"BTRFS snapshot failed"

Ensure you have sudo access and that the project lives on a BTRFS filesystem. Snapshots are optional: on other filesystems the skill continues without them.

"Phase P skipped: App not installed"

Phase P validates production installations. If the app isn't installed on this system, Phase P correctly skips.

Contributing

Fork the repository
Create a feature branch
Make your changes
Run /test --phase=ST to validate framework integrity
Run /test on the skill itself for full audit
Submit a pull request

Code Quality Standards

All markdown must pass markdownlint
All YAML must pass yamllint
No hardcoded secrets or credentials

License

MIT License - See LICENSE file for details.

Changelog

See CHANGELOG.md for detailed version history.

Recent Releases

v4.0.0 - Phase consolidation (27 to 21), bloat reduction, project-agnostic, agents integrated (BREAKING)
v3.0.1 - Canonical help block, Phase ST grep fix, argument-hint update
v3.0.0 - Dispatcher execution fixes, Phase I/H integration, documentation consistency audit (BREAKING)
v2.0.1 - Opus 4.6 phase configuration headers, Phase ST integration validation
v2.0.0 - Opus 4.6 model pinning, subagent tiering, 16 tools, task tracking (BREAKING)

Addendum: On Human Multitasking and Evolution's LTS Release

Added after a user accidentally typed /git-release tweak instead of /git-release patch because someone in their Teams meeting said "tweak the memory" at the exact moment they were typing.

The Technical Analogy

Human cognition can be modeled as an I/O system where each modality (language, vision, motor) can handle multiple read-only input streams concurrently, but has only a single process table for output. When a read-only process suddenly needs to write, other read processes can insert data into the write process's I/O register—resulting in cross-talk.

┌─────────────────────────────────────────────────────────────┐
│ HUMAN COGNITION: I/O MODEL                                  │
├─────────────────────────────────────────────────────────────┤
│  LANGUAGE MODALITY                                          │
│  ┌─────────────────────────────────────────────────────┐   │
│  │  INPUT STREAMS (read-only, concurrent OK)           │   │
│  │  ├─ Teams meeting audio ──────► buffer[0]           │   │
│  │  ├─ Internal monologue ───────► buffer[1]           │   │
│  │  └─ Reading (if any) ─────────► buffer[2]           │   │
│  └─────────────────────────────────────────────────────┘   │
│                          │                                  │
│                          ▼                                  │
│  ┌─────────────────────────────────────────────────────┐   │
│  │  OUTPUT REGISTER (single writer, NO MUTEX)          │   │
│  │  ┌──────────────────────────────────────────────┐   │   │
│  │  │  "tweak" ← RACE CONDITION: buffer[0] won     │   │   │
│  │  └──────────────────────────────────────────────┘   │   │
│  └─────────────────────────────────────────────────────┘   │
│                          │                                  │
│                          ▼                                  │
│                     Motor Cortex (keystrokes)               │
└─────────────────────────────────────────────────────────────┘

Why Evolution Didn't Fix This

Evolution optimized for speed over correctness. The panic-response code demonstrates this clearly:

if (predator_detected) {
    // DO NOT WAIT for environment_scan() to complete
    // Tree collision is survivable. Tiger is not.
    motor_cortex.execute(FLEE);  // non-blocking, fire-and-forget
}

Survivorship bias in action: Ancestors who stopped to carefully survey escape routes got eaten. Those who face-planted into trees but survived passed on their genes.

Evolution v2.0.0-LTS

EVOLUTION v2.0.0-LTS (Homo sapiens)
├── Release: ~300,000 years ago
├── Support Status: ACTIVE (no EOL planned)
├── Known Issues:
│   ├── #4,271: Panic response overwrites output buffer
│   ├── #12,847: Sugar addiction (deprecated food scarcity)
│   └── #89,421: Cannot distinguish real tigers from work emails
├── Patch Frequency: ~1 per 10,000 generations
└── Upgrade Path: None available. You're stuck with this kernel.

The original devs are unreachable and left no documentation.

Regional Considerations

// Region-specific threat assessment
if (location.continent == "Asia" && habitat.includes("forest")) {
    TIGER_THREAT = LITERAL;      // Bengal, Siberian, Indochinese, etc.
    TREE_COLLISION_PRIORITY = ACCEPTABLE_RISK;
} else {
    TIGER_THREAT = METAPHORICAL; // deadlines, managers, merge conflicts
    TREE_COLLISION_PRIORITY = EMBARRASSING;
}

In rural India, Nepal, or the Russian Far East, that legacy panic-response code is still very much production-ready. Evolution's LTS release is still getting real-world use.

This addendum serves as a reminder that humans have eventual consistency at best—and sometimes experience dirty reads.
