Skip to content

TheBoscoClub/claude-test-skill

Repository files navigation

Claude Code Test Skill

A comprehensive 20-phase autonomous project audit system for Claude Code with full GitHub integration.

Security Scan GitHub Release

Version History

Version Status Release
411 Latest patch v4.1.1
400 Prior major (untagged) commit aa888d9
301 Prior patch v3.0.1
300 Prior major v3.0.0
201 Prior patch v2.0.1
200 Prior major v2.0.0

Note on the v3.0.1 → v4.1.1 tag gap: The 4.0.0 and 4.1.0 versions were cut internally (VERSION file bumps + CHANGELOG entries) but never tagged on GitHub. v4.1.1 is therefore the first tagged release in the 4.x series — its release notes cumulatively cover the phase-consolidation work from 4.0.0 (27 → 21 phases), the Phase H dissolution in 4.1.0 (21 → 20 phases), and the 4.1.1 patch additions. The "Prior major" entry above links to the commit where 4.0.0 was cut, since no tag exists for it.

Badge Color Convention

Each version segment gets its own badge. The number of badges indicates version depth:

Level Current Prior Example
Major (W.0.0) W W v1.0.0, v2.0.0
Minor (W.X.0) WX WX v1.1.0, v1.2.0
Patch (W.X.Y) WXY WXY v1.0.3, v1.2.1
Tweak (W.X.Y.Z) WXYZ WXYZ v1.0.3.1

Color Scheme:

  • Current: brightgreen → darkgreen → green → yellow
  • Prior: brightred → darkred → red → orange

Overview

The /test skill performs a complete autonomous audit of any software project - running tests, scanning for vulnerabilities, checking code quality, validating production deployments, and auditing GitHub repository security settings. It fixes ALL issues automatically and loops until the codebase is clean.

Key Features:

  • 🔄 Autonomous: Fixes all issues without prompting (no "manual items" lists)
  • 🧩 Modular: 93% context reduction via on-demand phase loading
  • 🔒 Security-First: Integrated CVE scanning, secret detection, GitHub security auditing
  • 🌍 Multi-Language: Python, Node.js, Go, Rust, Shell, Docker, YAML
  • 📸 BTRFS Snapshots: Safe rollback points before modifications
  • 🐙 GitHub Integration: Full repository security audit and auto-remediation

Quick Start

/test                    # Full audit (autonomous - fixes everything)
/test security           # Comprehensive security audit (Phase 5)
/test --phase=0-2        # Pre-flight through testing
/test prodapp            # Validate installed production app
/test docker             # Validate Docker image and registry
/test github             # Audit GitHub repository settings
/test --phase=ST         # Validate test-skill framework (meta-testing)
/test --interactive      # Enable prompts for decisions
/test help               # Show all options

Phase Overview

Phase Name Description
Safety & Setup
S Snapshot BTRFS safety snapshot before modifications
0 Pre-Flight Environment validation, config audit, sandbox setup
1 Discovery Detect project type, test frameworks, tools, GitHub remote
Testing
2 Execute Run tests, analyze results, measure coverage
2a Runtime Service health checks, stuck process detection
Analysis
5 Security Comprehensive security (7 tools: bandit, semgrep, CodeQL, pip-audit, trivy, grype, checkov)
6 Dependencies Package health, outdated/unused/vulnerable packages
7 Quality Linting, complexity analysis, dead code cleanup
I Infrastructure Infrastructure and runtime issue detection
Remediation
10 Fix Auto-fix issues (ruff --fix, black, isort, shfmt, codespell)
Validation
A App Test Deployable application testing in sandbox
P Production Validate live installed application
D Docker Validate Docker image and registry package
G GitHub Audit GitHub repo security (Dependabot, CodeQL, branch protection)
12 Verify Re-run tests, confirm no regressions
Finalization
13 Docs Update documentation to match codebase
C Restore Cleanup temp files, restore environment
Special
V VM Testing Heavy isolation testing in libvirt/QEMU VMs
VM Lifecycle VM startup/shutdown management
ST Self-Test Validate test-skill framework (explicit only: --phase=ST)

Execution Modes

Mode Flag Behavior
Autonomous (default) (none) Fixes ALL issues, no prompts, loops until clean
Interactive --interactive May prompt for decisions, single pass

Autonomous Mode (Default)

  • Fixes every issue regardless of priority/severity
  • No user prompts except for safety/architecture/external blocks
  • Loops between Fix (Phase 10) and Verify (Phase 12) until all tests pass
  • Documentation automatically synchronized

Interactive Mode

  • May prompt for decisions (e.g., Phase P/D conditional execution)
  • Still fixes ALL issues — interactive mode changes prompting behavior, not the fix mandate
  • Loops until all tests pass and all issues resolved
  • Useful when architectural decisions require human judgment

Installation

Full installation guide: See INSTALL.md for detailed instructions including prerequisites, verification, updating, and troubleshooting.

Quick Install (Symlinks)

git clone https://github.com/TheBoscoClub/claude-test-skill.git ~/claude-test-skill
mkdir -p ~/.claude/commands ~/.claude/skills
ln -s ~/claude-test-skill/commands/test.md ~/.claude/commands/test.md
ln -s ~/claude-test-skill/skills/test-phases ~/.claude/skills/test-phases

Quick Install (Copy)

git clone https://github.com/TheBoscoClub/claude-test-skill.git /tmp/claude-test-skill
mkdir -p ~/.claude/commands ~/.claude/skills
cp /tmp/claude-test-skill/commands/test.md ~/.claude/commands/
cp -r /tmp/claude-test-skill/skills/test-phases ~/.claude/skills/
rm -rf /tmp/claude-test-skill

Verify Installation

# In Claude Code:
/test --phase=ST

Requires: Claude Code 2.1.0+ (for YAML allowed-tools syntax). See INSTALL.md for full prerequisites.


Tool Detection

Phase 3 (Discovery) automatically detects which tools are installed on your system. The skill uses the tools it finds:

Code Quality Tools

Tool Languages Purpose Install
ruff Python Fast linter + formatter pip install ruff
pylint Python Deep static analysis pip install pylint
mypy Python Type checking pip install mypy
black Python Code formatting pip install black
isort Python Import sorting pip install isort
eslint JS/TS Linting npm install -g eslint
prettier JS/TS/JSON/MD Formatting npm install -g prettier
hadolint Docker Dockerfile linting OS package manager
yamllint YAML YAML validation pip install yamllint
shfmt Shell Shell formatting OS package manager
markdownlint-cli Markdown Markdown linting npm install -g markdownlint-cli
codespell All Spelling errors pip install codespell

Security Tools

Tool Purpose Install
pip-audit Python CVE scanning pip install pip-audit
bandit Python security analysis pip install bandit
semgrep Multi-language SAST pipx install semgrep
npm audit Node.js CVE scanning (built-in)
cargo audit Rust CVE scanning cargo install cargo-audit
trivy Container/filesystem scanning OS package manager
grype SBOM vulnerability scanning OS package manager (AUR: grype-bin)
checkov Infrastructure-as-Code security pipx install checkov
CodeQL Advanced static analysis GitHub Actions / Local install

GitHub Tools

Tool Purpose Install
gh GitHub CLI for repo auditing OS package manager

Configuration

Projects can include .claude-test.yaml for customization:

# Test coverage requirements
coverage:
  minimum: 85
  fail_on_below: true

# Sandbox configuration
mocking:
  enabled: true
  sandbox_dir: /tmp/claude-test-sandbox-${PROJECT_NAME}

# Cleanup behavior
cleanup:
  after_test: true
  remove_sandbox: true

# Tool-specific settings
tools:
  ruff:
    extend-select: ["I", "UP", "YTT", "ASYNC"]
  pylint:
    disable: ["C0114", "C0115", "C0116"]

Phase Dependencies

Phases execute in tiers with strict dependencies:

TIER 0: Safety [S, 0] ─────────────────── Can run in parallel
           │
           ▼
TIER 1: Discovery [1] ──────────────────── GATE: Project Known
           │
           ▼
TIER 2: Testing [2, 2a] ────────────────── Can run in parallel
           │
           ▼
TIER 3: Analysis [5, 6, 7, I] ─────────── Can run in parallel (read-only)
           │
           ▼
TIER 4: Fix [10] ───────────────────────── MODIFIES FILES (sequential)
           │
           ▼
TIER 5: Validation [P, D, G] ───────────── CONDITIONAL (sequential)
           │
           ▼
TIER 6: Verify [12] ────────────────────── Re-run tests
           │
          ⟲ Loop to Fix if issues found
           │
           ▼
TIER 7: Docs [13] ──────────────────────── ALWAYS runs
           │
           ▼
TIER 8: Cleanup [C] ────────────────────── ALWAYS last

Conditional Phases

Phase 9b (Production) - Skipped if:

  • No installable app detected
  • App not installed on this system

Phase 9c (Docker) - Skipped if:

  • No Dockerfile in project
  • No registry package found

Phase 9d (GitHub) - Skipped if:

  • No GitHub remote configured
  • gh CLI not authenticated

GitHub Integration

Phase G performs a comprehensive GitHub repository audit:

Security Features Audited

  • ✅ Dependabot vulnerability alerts
  • ✅ Dependabot security updates
  • ✅ Secret scanning (if available)
  • ✅ Code scanning (CodeQL workflows)
  • ✅ Branch protection rules

Automatic Remediation

  • Enables Dependabot alerts if missing
  • Enables automated security updates if missing
  • Reports open security alerts for manual review

Requirements

  • gh CLI installed and authenticated (gh auth login)
  • Push access to the repository (for enabling security features)

Architecture

claude-test-skill/
├── commands/
│   └── test.md              # Main dispatcher (~1,000 lines)
├── skills/
│   └── test-phases/         # 20 phase files (each with model tier config header)
│       ├── phase-1-snapshot.md         # [haiku]
│       ├── phase-2-preflight.md        # [sonnet]
│       ├── phase-3-discovery.md        # [opus]
│       ├── phase-4a-execute.md         # [sonnet]
│       ├── phase-4b-runtime.md         # [sonnet]
│       ├── phase-5a-security.md        # [opus]
│       ├── phase-5b-dependencies.md    # [sonnet]
│       ├── phase-5c-quality.md         # [opus]
│       ├── phase-5d-infrastructure.md  # [sonnet]
│       ├── phase-6-fix.md              # [opus]
│       ├── phase-7-verify.md           # [sonnet]
│       ├── phase-8-docs.md             # [sonnet]
│       ├── phase-9a-app-testing.md     # [opus]
│       ├── phase-9b-production.md      # [opus]
│       ├── phase-9c-docker.md          # [opus]
│       ├── phase-9d-github.md          # [opus]
│       ├── phase-10a-vm-testing.md     # [sonnet]
│       ├── phase-10b-vm-lifecycle.md   # [sonnet]
│       ├── phase-11-cleanup.md         # [haiku]
│       └── phase-ST-self-test.md       # [opus]
├── agents/                  # Integrated into phases (reference docs)
│   ├── coverage-reviewer.md # → Phase 4a
│   ├── security-scanner.md  # → Phase 5a
│   └── test-analyzer.md     # → Phase 4a
├── docs/
│   └── ARCHITECTURE.md      # System architecture
├── .github/
│   └── workflows/
│       └── security.yml     # Daily security scanning
├── plugin.json
├── INSTALL.md               # Third-party installation guide
├── SKILL.md                 # Claude.ai web upload version
└── README.md

Context Efficiency

The modular architecture significantly reduces context consumption:

Component Lines When Loaded
Dispatcher ~1,000 Always
Each Phase 50-300 On-demand via subagent
Typical audit ~1,500 vs 3,652 monolithic

Result: ~60% reduction in context for typical audits


Examples

Full Audit

/test

Runs all phases autonomously, fixes all issues, loops until clean.

Security-Only Audit

/test security
# or: /test --phase=5

Runs comprehensive security audit with 8 tools (GitHub + local + installed app).

Pre-Commit Check

/test --phase=0-2,7

Quick validation: Pre-flight, Discovery, Execute, and Quality.

Production Validation

/test prodapp

Validates the installed production application against install-manifest.json.

GitHub Repository Audit

/test github

Audits GitHub security settings and enables missing protections.

Framework Self-Test (Meta-Testing)

/test --phase=ST

Validates the test-skill framework itself (phase files, symlinks, tools). Note: Phase ST is never included in normal /test runs - explicit only.


Adding Custom Phases

  1. Create ~/.claude/skills/test-phases/phase-X-name.md
  2. Add phase to the Available Phases table in commands/test.md
  3. Define tier placement in dependency graph
  4. Document in README

Phase File Template

# Phase X: Your Phase Name

## Purpose
Brief description of what this phase does.

## Steps

### Step 1: First Action
[Instructions for Claude]

### Step 2: Second Action
[Instructions for Claude]

## Output Format

Status: ✅ PASS / ⚠️ ISSUES / ❌ FAIL
Issues Found: [count]
Key Findings:
- [finding 1]
- [finding 2]

Troubleshooting

"Phase G skipped: gh not authenticated"

Run gh auth login to authenticate with GitHub.

"No security tools detected"

Install the recommended tools for your language. Phase 1 will detect them automatically.

"BTRFS snapshot failed"

Ensure you have sudo access or run on a BTRFS filesystem. Snapshots are optional - the skill continues without them on other filesystems.

"Phase P skipped: App not installed"

Phase P validates production installations. If the app isn't installed on this system, Phase P correctly skips.


Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Run /test --phase=ST to validate framework integrity
  5. Run /test on the skill itself for full audit
  6. Submit a pull request

Code Quality Standards

  • All markdown must pass markdownlint
  • All YAML must pass yamllint
  • No hardcoded secrets or credentials

License

MIT License - See LICENSE file for details.


Changelog

See CHANGELOG.md for detailed version history.

Recent Releases

  • v4.0.0 - Phase consolidation (27 to 21), bloat reduction, project-agnostic, agents integrated (BREAKING)
  • v3.0.1 - Canonical help block, Phase ST grep fix, argument-hint update
  • v3.0.0 - Dispatcher execution fixes, Phase I/H integration, documentation consistency audit (BREAKING)
  • v2.0.1 - Opus 4.6 phase configuration headers, Phase ST integration validation
  • v2.0.0 - Opus 4.6 model pinning, subagent tiering, 16 tools, task tracking (BREAKING)

Addendum: On Human Multitasking and Evolution's LTS Release

Added after a user accidentally typed /git-release tweak instead of /git-release patch because someone in their Teams meeting said "tweak the memory" at the exact moment they were typing.

The Technical Analogy

Human cognition can be modeled as an I/O system where each modality (language, vision, motor) can handle multiple read-only input streams concurrently, but has only a single process table for output. When a read-only process suddenly needs to write, other read processes can insert data into the write process's I/O register—resulting in cross-talk.

┌─────────────────────────────────────────────────────────────┐
│ HUMAN COGNITION: I/O MODEL                                  │
├─────────────────────────────────────────────────────────────┤
│  LANGUAGE MODALITY                                          │
│  ┌─────────────────────────────────────────────────────┐   │
│  │  INPUT STREAMS (read-only, concurrent OK)           │   │
│  │  ├─ Teams meeting audio ──────► buffer[0]           │   │
│  │  ├─ Internal monologue ───────► buffer[1]           │   │
│  │  └─ Reading (if any) ─────────► buffer[2]           │   │
│  └─────────────────────────────────────────────────────┘   │
│                          │                                  │
│                          ▼                                  │
│  ┌─────────────────────────────────────────────────────┐   │
│  │  OUTPUT REGISTER (single writer, NO MUTEX)          │   │
│  │  ┌──────────────────────────────────────────────┐   │   │
│  │  │  "tweak" ← RACE CONDITION: buffer[0] won     │   │   │
│  │  └──────────────────────────────────────────────┘   │   │
│  └─────────────────────────────────────────────────────┘   │
│                          │                                  │
│                          ▼                                  │
│                     Motor Cortex (keystrokes)               │
└─────────────────────────────────────────────────────────────┘

Why Evolution Didn't Fix This

Evolution optimized for speed over correctness. The panic-response code demonstrates this clearly:

if (predator_detected) {
    // DO NOT WAIT for environment_scan() to complete
    // Tree collision is survivable. Tiger is not.
    motor_cortex.execute(FLEE);  // non-blocking, fire-and-forget
}

Survivorship bias in action: Ancestors who stopped to carefully survey escape routes got eaten. Those who face-planted into trees but survived passed on their genes.

Evolution v2.0.0-LTS

EVOLUTION v2.0.0-LTS (Homo sapiens)
├── Release: ~300,000 years ago
├── Support Status: ACTIVE (no EOL planned)
├── Known Issues:
│   ├── #4,271: Panic response overwrites output buffer
│   ├── #12,847: Sugar addiction (deprecated food scarcity)
│   └── #89,421: Cannot distinguish real tigers from work emails
├── Patch Frequency: ~1 per 10,000 generations
└── Upgrade Path: None available. You're stuck with this kernel.

The original devs are unreachable and left no documentation.

Regional Considerations

// Region-specific threat assessment
if (location.continent == "Asia" && habitat.includes("forest")) {
    TIGER_THREAT = LITERAL;      // Bengal, Siberian, Indochinese, etc.
    TREE_COLLISION_PRIORITY = ACCEPTABLE_RISK;
} else {
    TIGER_THREAT = METAPHORICAL; // deadlines, managers, merge conflicts
    TREE_COLLISION_PRIORITY = EMBARRASSING;
}

In rural India, Nepal, or the Russian Far East, that legacy panic-response code is still very much production-ready. Evolution's LTS release still getting real-world use cases.


This addendum serves as a reminder that humans have eventual consistency at best—and sometimes experience dirty reads.

About

Modular project audit skill for Claude Code - comprehensive security, quality, and testing phases

Resources

License

Security policy

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages