From 5bdd3be749c14cb0e5f6249fcae2f2fdb3fab9a3 Mon Sep 17 00:00:00 2001 From: Mo King Date: Thu, 19 Mar 2026 22:57:05 -0400 Subject: [PATCH 1/8] Add agent experience testing framework, expand .claude --- .claude/architecture.md | 150 ++++++++++++++++++++++++ .claude/development.md | 114 +++++++++++++++++++ .claude/style-guide.md | 94 ++++++++++++++++ .claude/testing.md | 84 ++++++++++++++ .gitignore | 5 + CLAUDE.md | 153 ++++--------------------- README.md | 58 ++++++++++ tests/README.md | 42 +++++++ tests/TESTS.md | 244 ++++++++++++++++++++++++++++++++++++++++ 9 files changed, 816 insertions(+), 128 deletions(-) create mode 100644 .claude/architecture.md create mode 100644 .claude/development.md create mode 100644 .claude/style-guide.md create mode 100644 .claude/testing.md create mode 100644 tests/README.md create mode 100644 tests/TESTS.md diff --git a/.claude/architecture.md b/.claude/architecture.md new file mode 100644 index 00000000..047e8fcf --- /dev/null +++ b/.claude/architecture.md @@ -0,0 +1,150 @@ +# Documentation Architecture + +## Directory Structure + +``` +mintlifydocs/ +├── docs.json # Site configuration, navigation, theme, redirects +├── CLAUDE.md # AI assistant instructions (this file's parent) +│ +├── get-started/ # Onboarding and account setup +├── flash/ # Flash SDK (Python functions on cloud GPUs) +├── serverless/ # Serverless workers, endpoints, vLLM +├── pods/ # GPU/CPU instances +├── storage/ # Network volumes, S3 API +├── hub/ # Runpod Hub and publishing +├── public-endpoints/ # Public API endpoints +├── instant-clusters/ # Multi-node GPU clusters +├── sdks/ # Python, JavaScript, Go, GraphQL SDKs +├── runpodctl/ # CLI documentation +├── api-reference/ # REST API reference +├── integrations/ # Third-party integrations +├── tutorials/ # Step-by-step guides +├── references/ # Reference tables (GPU types, billing, etc.) 
+├── community-solutions/ # Community-contributed content +│ +├── snippets/ # Reusable content fragments +│ ├── tooltips.jsx # Tooltip component definitions +│ └── *.mdx # Reusable MDX snippets (e.g., pricing tables) +│ +├── images/ # Static image assets +├── logo/ # Logo files +├── styles/ # Custom CSS +│ +├── scripts/ # Utility scripts +│ └── validate-tooltips.js +│ +└── helpers/ # Python scripts for generating content + ├── gpu_types.py # Generates GPU reference tables + └── sls_cpu_types.py # Generates CPU reference tables +``` + +## Configuration (docs.json) + +The `docs.json` file controls: + +- **Theme and styling**: Colors, fonts, code block themes +- **Navigation**: Tab/group/page hierarchy +- **SEO**: Meta tags, Open Graph images +- **Redirects**: URL redirects for moved/renamed pages + +### Navigation Structure + +Pages are organized in a hierarchy: +``` +tabs → groups → pages +``` + +Example: +```json +{ + "tab": "Docs", + "groups": [ + { + "group": "Serverless", + "pages": [ + "serverless/overview", + "serverless/quickstart", + { + "group": "vLLM", + "pages": ["serverless/vllm/overview", "serverless/vllm/get-started"] + } + ] + } + ] +} +``` + +Pages are referenced by file path without the `.mdx` extension. + +## MDX Files + +Each documentation page is an MDX file with: + +1. **Frontmatter** (required): + ```yaml + --- + title: "Page title" + sidebarTitle: "Shorter sidebar title" + description: "SEO description for the page." + --- + ``` + +2. **Imports** (optional): React components, tooltips, snippets +3. **Content**: Markdown with JSX components + +## Snippets + +Reusable content in `snippets/`: + +- **MDX snippets**: Embed with `import Table from '/snippets/pricing-table.mdx'` +- **JSX components**: Import specific exports like tooltips + +### Tooltips + +Tooltips provide hover definitions for technical terms. Defined in `snippets/tooltips.jsx`. 
+ +**Structure:** +```jsx +export const PodTooltip = () => { + return ( + Pod + ); +}; +``` + +**Usage in MDX:** +```mdx +import { PodTooltip, TemplateTooltip } from "/snippets/tooltips.jsx"; + +Deploy your first GPU using a . +``` + +**Guidelines:** +- Use for Runpod-specific terms users might not know. +- Most tooltips have singular/plural variants (`PodTooltip`, `PodsTooltip`). +- Group by category: Pods, Serverless, Storage, Products, Concepts, AI/ML, Flash. +- Run `scripts/validate-tooltips.js` to check imports. + +## Adding New Pages + +1. Create `.mdx` file in the appropriate directory. +2. Add frontmatter with `title`, `sidebarTitle`, and `description`. +3. Add the page path to `docs.json` navigation. +4. Import tooltips for technical terms. + +## Redirects + +When moving or renaming pages, add to `docs.json`: +```json +{ + "redirects": [ + { "source": "/old-path", "destination": "/new-path" } + ] +} +``` diff --git a/.claude/development.md b/.claude/development.md new file mode 100644 index 00000000..1dbdcf9b --- /dev/null +++ b/.claude/development.md @@ -0,0 +1,114 @@ +# Development Guide + +## Local Development + +### Setup + +Install Mintlify globally: +```bash +npm i -g mintlify +``` + +Start the local development server: +```bash +mintlify dev +``` + +Most changes are reflected live without restarting the server. + +### Linting + +Install [Vale](https://vale.sh/docs/vale-cli/installation/), then lint files: +```bash +vale path/to/docs/ +vale path/to/*.mdx +``` + +Vale is configured with Google and Readability style guides via `.vale.ini`. 
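
For reference, a minimal `.vale.ini` wiring in those two style guides might look like the sketch below. The styles path and package names here are assumptions for illustration; defer to the `.vale.ini` actually checked into the repository:

```ini
# Sketch of a Vale config using the Google and Readability style packages.
# Paths and package names are assumed; see the repo's real .vale.ini.
StylesPath = .vale/styles
MinAlertLevel = suggestion

# Vale downloads these packages when you run `vale sync`.
Packages = Google, Readability

[*.{md,mdx}]
BasedOnStyles = Vale, Google, Readability
```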
+ +### Python Code Formatting + +Format Python code examples in documentation: +```bash +pip install blacken-docs +git ls-files -z -- '*.mdx' | xargs -0 blacken-docs +``` + +## Helper Scripts + +### Update GPU/CPU Reference Tables + +These scripts fetch current types from Runpod's GraphQL API: +```bash +cd helpers +python gpu_types.py # Updates GPU reference tables +python sls_cpu_types.py # Updates CPU reference tables +``` + +Requirements: `requests`, `tabulate`, `pandas` (see `helpers/requirements.txt`). + +### Validate Tooltips + +Check that all imported tooltips exist: +```bash +node scripts/validate-tooltips.js +``` + +This runs automatically in CI via `.github/workflows/validate-tooltips.yml`. + +## Publishing Workflow + +1. Create a pull request with changes. +2. Request review from [@muhsinking](https://github.com/muhsinking). +3. Changes deploy automatically to production after merge to `main` branch. + +## Common Tasks + +### Add a New Page + +1. Create `.mdx` file in the appropriate directory. +2. Add frontmatter: + ```yaml + --- + title: "Full page title" + sidebarTitle: "Shorter title" + description: "SEO description." + --- + ``` +3. Add the page path to `docs.json` navigation. +4. Import tooltips for Runpod-specific terms. + +### Add a New Tooltip + +1. Open `snippets/tooltips.jsx`. +2. Add a new export in the appropriate category: + ```jsx + export const NewTermTooltip = () => { + return ( + new term + ); + }; + ``` +3. Create singular and plural variants if needed. + +### Move or Rename a Page + +1. Move/rename the `.mdx` file. +2. Update `docs.json` navigation. +3. Add a redirect in `docs.json`: + ```json + { + "redirects": [ + { "source": "/old-path", "destination": "/new-path" } + ] + } + ``` + +### Update a Pricing Table + +Edit `snippets/serverless-gpu-pricing-table.mdx` or run the helper scripts to regenerate from the API. 
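
As a concrete sketch of what those helper scripts do internally, the following stdlib-only example builds a GraphQL query for hardware types and renders the response as a Markdown table. The query fields and table columns are assumptions for illustration; the real query and the `requests`-based HTTP call live in `helpers/gpu_types.py`:

```python
import json

# Hypothetical GraphQL query; the actual field names are defined in helpers/gpu_types.py.
GPU_TYPES_QUERY = """
query GpuTypes {
  gpuTypes { id displayName memoryInGb }
}
"""


def build_request_body(query: str) -> str:
    """Serialize a GraphQL query into the JSON body a GraphQL endpoint expects."""
    return json.dumps({"query": query})


def to_markdown_table(gpu_types: list) -> str:
    """Render a list of GPU type records as a Markdown reference table."""
    lines = ["| GPU ID | Display name | VRAM (GB) |", "| --- | --- | --- |"]
    for gpu in gpu_types:
        lines.append(f"| {gpu['id']} | {gpu['displayName']} | {gpu['memoryInGb']} |")
    return "\n".join(lines)


# Mocked response data; a live script would POST build_request_body() to
# Runpod's GraphQL API and parse the JSON response instead.
sample = [{"id": "NVIDIA GeForce RTX 4090", "displayName": "RTX 4090", "memoryInGb": 24}]
print(to_markdown_table(sample))
```

A live script would send the body returned by `build_request_body()` to the GraphQL endpoint and feed the parsed result to `to_markdown_table()` before writing the reference page.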
diff --git a/.claude/style-guide.md b/.claude/style-guide.md new file mode 100644 index 00000000..fc584b87 --- /dev/null +++ b/.claude/style-guide.md @@ -0,0 +1,94 @@ +# Style Guide + +Follow the Runpod style guide (`.cursor/rules/rp-styleguide.mdc`) and Google Developer Style Guide (`.cursor/rules/google-style-guide.mdc`). + +## Capitalization and Terminology + +### Proper Nouns (always capitalize) +- Runpod +- Pods +- Serverless +- Hub +- Instant Clusters +- Secure Cloud +- Community Cloud +- Flash + +### Generic Terms (lowercase) +- endpoint +- worker +- cluster +- template +- handler +- fine-tune +- network volume +- data center + +### Headings +Always use **sentence case** for headings and titles: +- ✅ "Create a serverless endpoint" +- ❌ "Create a Serverless Endpoint" + +## Writing Style + +- Use **second person** ("you") instead of first person plural ("we"). +- Prefer **active voice** over passive voice. +- Use **American English** spelling. +- Prefer **paragraphs** over bullet points unless listing discrete items. +- When using bullet points, **end each with a period**. + +## Tutorial Structure + +Tutorials should include: + +1. **Requirements** section (not "Prerequisites") +2. Numbered steps using format: `## Step 1: Create a widget` +3. Clear expected outcomes for each step + +Example: +```markdown +## Requirements + +- A Runpod account with credits +- Docker installed locally + +## Step 1: Create a template + +Navigate to the Templates page... + +## Step 2: Deploy the endpoint + +Click Deploy and configure... +``` + +## Code Examples + +- Always use code blocks with **language identifiers**. +- **Precede** code with context explaining what it does. +- **Follow** code with explanation of key parts. +- Include a **file title** where it makes sense. + +Example: +````markdown +Create a handler function that processes image generation requests: + +```python handler.py +import runpod + +def handler(job): + prompt = job["input"]["prompt"] + # Generate image... 
+ return {"image_url": result} + +runpod.serverless.start({"handler": handler}) +``` + +The `handler` function receives a job dictionary containing the input from the API request. +```` + +## API and Code References + +- Use backticks for inline code: `runpod.serverless.start()` +- Use backticks for file paths: `serverless/workers/handler.py` +- Use backticks for environment variables: `RUNPOD_API_KEY` +- Use backticks for API endpoints: `/v2/endpoint_id/run` diff --git a/.claude/testing.md b/.claude/testing.md new file mode 100644 index 00000000..62a571eb --- /dev/null +++ b/.claude/testing.md @@ -0,0 +1,84 @@ +# Documentation Agent Tests + +The `tests/` directory contains minimal test definitions that simulate real user prompts. Tests are intentionally sparse - the agent must figure out how to accomplish the goal using only the documentation. + +## Philosophy + +Tests should be **hard to pass**. They simulate a user typing a simple request without context. If the docs are good, the agent figures it out. If not, the test reveals gaps. + +## Running Tests + +Use natural language: +``` +Run the flash-quickstart test +Run the vllm-deploy test using local docs +Run all pods tests +``` + +## Test Execution Rules + +1. Read the test definition from `tests/TESTS.md`. +2. **Do NOT use prior knowledge** - only use Runpod docs. +3. Attempt to complete the goal using available tools. +4. All created resources must use `doc_test_` prefix. +5. Clean up resources after test. +6. Write report to `tests/reports/{test-id}-{timestamp}.md`. + +## Doc Source Modes + +### Published Docs (default) + +Use the `mcp__runpod-dops__search_runpod_documentation` tool to search the live published documentation. This tests what real users see. + +### Local Docs + +When the user says "using local docs": +- Search and read `.mdx` files directly from this repository. +- Use Glob to find files: `**/*.mdx` +- Use Grep to search content. +- Use Read to read file contents. 
+ +This validates unpublished doc changes before they go live. + +## Report Format + +```markdown +# Test Report: {Test Name} + +**Date:** {timestamp} +**Status:** PASS | FAIL | PARTIAL + +## What Happened +Brief narrative of the attempt. + +## Where I Got Stuck +Specific points of confusion or failure. + +## Documentation Gaps +What was missing or unclear in the docs. + +## Suggestions +Specific improvements to make tests pass. +``` + +## Test Categories + +Tests are organized by product area in `tests/TESTS.md`: + +- **Flash SDK**: Deploying Python functions +- **Serverless Endpoints**: Creating and managing endpoints (must deploy real endpoints, not use public endpoints) +- **vLLM**: Deploying LLM inference (must deploy real endpoints, not use public endpoints) +- **Pods**: Creating and managing GPU instances +- **Storage**: Network volumes and file transfer +- **Templates**: Creating and using templates +- **Instant Clusters**: Multi-node deployments +- **SDKs & APIs**: Using client libraries +- **CLI (runpodctl)**: Command-line operations +- **Integrations**: Third-party tool integration +- **Tutorials**: End-to-end workflows + +## Requirements + +- Runpod API MCP server configured +- Runpod Docs MCP server configured +- Docker available for building custom images diff --git a/.gitignore b/.gitignore index 64309e45..e1e8f6b4 100644 --- a/.gitignore +++ b/.gitignore @@ -30,3 +30,8 @@ helpers/__pycache__/** */ .idea/* /.mintlify-last + +# Documentation test reports +tests/reports/ + +.serena \ No newline at end of file diff --git a/CLAUDE.md b/CLAUDE.md index 0be9f6d3..47dd1de4 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -1,146 +1,43 @@ # CLAUDE.md -This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. +This file provides guidance to Claude Code when working with this repository. ## Project Overview -This is the Runpod documentation site, built using [Mintlify](https://mintlify.com/). 
The documentation covers Runpod's cloud GPU platform, including Serverless endpoints, Pods, storage solutions, SDKs, and APIs. +This is the Runpod documentation site, built with [Mintlify](https://mintlify.com/). The documentation covers Runpod's cloud GPU platform: Serverless endpoints, Pods, Flash SDK, storage, and APIs. -## Development Commands +## Quick Reference -### Local Development +| Topic | File | +|-------|------| +| Directory structure, navigation, snippets, tooltips | [.claude/architecture.md](.claude/architecture.md) | +| Writing style, capitalization, terminology | [.claude/style-guide.md](.claude/style-guide.md) | +| Running and writing documentation tests | [.claude/testing.md](.claude/testing.md) | +| Local dev, linting, publishing workflow | [.claude/development.md](.claude/development.md) | -Install Mintlify globally: -```bash -npm i -g mintlify -``` - -Start the local development server: -```bash -mintlify dev -``` - -Most changes will be reflected live without restarting the server. - -### Linting - -Install [vale](https://vale.sh/docs/vale-cli/installation/), then lint specific files or folders: -```bash -vale path/to/docs/ -# or -vale path/to/*.md -``` +## Key Commands -Vale is configured with Google and Readability style guides via `.vale.ini`. - -### Python Code Formatting - -For Python code examples in documentation: -```bash -python -m pip install blacken-docs -yarn format -# or directly: -git ls-files -z -- '*.md' | xargs -0 blacken-docs -``` - -### Update GPU and CPU Reference Tables - -These scripts fetch current GPU/CPU types from Runpod's GraphQL API and regenerate reference documentation: ```bash -python helpers/gpu_types.py -python helpers/sls_cpu_types.py +mintlify dev # Start local dev server +vale path/to/file.mdx # Lint documentation +node scripts/validate-tooltips.js # Check tooltip imports ``` -The scripts require: `requests`, `tabulate`, and `pandas` (see `helpers/requirements.txt`). 
- -## Documentation Architecture - -### Content Organization - -The site is organized into major product areas, defined in `docs.json`: - -- **Serverless**: Worker handlers, endpoints, vLLM deployments, and load balancing -- **Pods**: GPU instances, storage, templates, and connections -- **Storage**: Network volumes and S3 API -- **Hub**: Public endpoints and publishing guides -- **Instant Clusters**: Multi-node GPU clusters -- **SDKs**: Python, JavaScript, Go, and GraphQL client libraries -- **API Reference**: REST API documentation for all resources -- **Examples/Tutorials**: Step-by-step guides organized by product area -- **Community**: Community-contributed tools and solutions - -### File Structure - -- **Documentation files**: MDX (`.mdx`) files organized by product area -- **Snippets**: Reusable content fragments in `snippets/` -- **Images**: Static assets in `images/` -- **Configuration**: `docs.json` defines site structure, navigation, theme, and redirects - -### Navigation and Routing - -The `docs.json` file controls all site navigation through a hierarchical tab/group/page structure. Pages are referenced by their file path (without extension). When adding new documentation, you must update the `navigation.tabs` array in `docs.json` to make pages visible. - -### vLLM Documentation - -The vLLM section (`serverless/vllm/`) documents Runpod's vLLM worker for LLM inference. Key topics: -- vLLM overview and architecture (PagedAttention, continuous batching) -- Getting started and configuration -- Environment variable reference -- OpenAI API compatibility -- Request handling - -vLLM documentation should explain both the underlying vLLM technology and Runpod-specific integration details. 
- -## Style Guidelines - -Follow the Runpod style guide (`.cursor/rules/rp-styleguide.mdc`) and Google Developer Style Guide (`.cursor/rules/google-style-guide.mdc`): - -### Capitalization and Terminology - -- **Always use sentence case** for headings and titles -- **Proper nouns**: Runpod, Pods, Serverless, Hub, Instant Clusters, Secure Cloud, Community Cloud, Flash -- **Generic terms** (lowercase): endpoint, worker, cluster, template, handler, fine-tune, network volume - -### Writing Style - -- Use second person ("you") instead of first person plural ("we") -- Prefer active voice -- Use American English spelling -- Prefer paragraphs over bullet points unless specifically requested -- When using bullet points, end each with a period - -### Tutorial Structure - -Tutorials should include: -- **What you'll learn** section -- **Requirements** section (not "Prerequisites") -- Numbered steps using format: `## Step 1: Create a widget` - -### Code Examples - -- Always use code blocks with language identifiers -- Precede code with context/purpose explanation -- Follow code with explanation of key parts - -## Publishing Workflow - -1. Create a pull request with changes -2. Request review from [@muhsinking](https://github.com/muhsinking) -3. Changes deploy automatically to production after merge to `main` branch - -## Common Patterns +## Self-Improvement -### Adding New Documentation Pages +**Claude should continuously learn and improve these docs.** -1. Create `.mdx` file in appropriate directory -2. Add frontmatter with `title`, `sidebarTitle`, and `description` -3. Update `docs.json` navigation to include the page path -4. Ensure proper categorization under relevant tab/group +If you discover something that would be useful for future sessions, ask me: +> "I noticed [insight]. Would you like me to add this to `.claude/[appropriate-file].md`?" 
-### Using Snippets +Examples of things worth capturing: +- Patterns that work well (or don't) in this codebase +- Common mistakes to avoid +- Useful commands or workflows discovered during tasks +- Clarifications about how Runpod products work -Reusable content (like pricing tables) lives in `snippets/` and can be embedded in multiple pages to maintain consistency. +## Terminology Quick Reference -### Redirects +**Capitalize:** Runpod, Pods, Serverless, Hub, Instant Clusters, Flash, Secure Cloud, Community Cloud -When moving or renaming pages, add redirect entries to the `redirects` array in `docs.json` to maintain backward compatibility. +**Lowercase:** endpoint, worker, template, handler, network volume, data center, cluster, fine-tune diff --git a/README.md b/README.md index 7c384da0..d9b6dbac 100644 --- a/README.md +++ b/README.md @@ -63,3 +63,61 @@ pip install -r helpers/requirements.txt python3 helpers/gpu_types.py python3 helpers/sls_cpu_types.py ``` + +## Agent experience testing + +The `tests/TESTS.md` file contains test definitions for validating documentation quality through AI agent testing. Tests simulate real user prompts - a coding agent must accomplish the goal using only the documentation as it currently exists. + +### Requirements + +- [Claude Code](https://docs.anthropic.com/en/docs/claude-code) with the Runpod MCP servers configured: + ```bash + # Add Runpod API MCP server + claude mcp add runpod --scope user -e RUNPOD_API_KEY=your_key -- npx -y @runpod/mcp-server@latest + + # Add Runpod Docs MCP server + claude mcp add runpod-docs --scope user --transport http https://docs.runpod.io/mcp + ``` + +### Running tests + +In Claude Code, use natural language: + +``` +Run the flash-quickstart test +``` + +``` +Run all vLLM tests +``` + +To validate unpublished doc changes, use local docs mode: + +``` +Run the vllm-deploy test using local docs +``` + +Claude will: +1. Read the test from `tests/TESTS.md` +2. 
Attempt to accomplish the goal using only the docs +3. Clean up any resources created (prefixed with `doc_test_`) +4. Write a report to `tests/reports/` +5. Suggest documentation improvements + +### Test definitions + +All tests are defined in [`tests/TESTS.md`](tests/TESTS.md) as a table + +### Adding new tests + +Add a row to the appropriate section in `tests/TESTS.md` with: +- **ID**: Unique test identifier +- **Goal**: One sentence describing what the user wants +- **Cleanup**: Resource types to delete (`endpoints`, `pods`, `templates`, `network-volumes`, or `none`) + +### Reports + +Test reports are saved to `tests/reports/` (gitignored) and include: +- What worked and what didn't +- Where the agent got stuck +- Specific documentation improvement suggestions diff --git a/tests/README.md b/tests/README.md new file mode 100644 index 00000000..f4179f16 --- /dev/null +++ b/tests/README.md @@ -0,0 +1,42 @@ +# Coding Agent Experience Tests + +Tests that simulate real user prompts. A coding agent must accomplish the goal using only the documentation. + +## Philosophy + +These tests should be **hard to pass**. They simulate a user typing a simple request without context. If the docs are good, the agent can figure it out. If not, the test reveals gaps. 
+ +## Running Tests + +In Claude Code: + +``` +Run the flash-quickstart test +``` + +``` +Run all vLLM tests +``` + +### Doc Source Modes + +- **Published docs** (default) - Uses the Runpod Docs MCP server +- **Local docs** - Reads `.mdx` files from this repo (for validating unpublished changes) + +``` +Run the vllm-deploy test using local docs +``` + +## Test Definitions + +All tests are defined in [TESTS.md](./TESTS.md) as a table with: +- **ID**: Test identifier +- **Goal**: What the user wants (one sentence) +- **Cleanup**: Resource types to delete after test + +## Reports + +Reports are saved to `reports/` (gitignored) and include: +- What worked / what didn't +- Where the agent got stuck +- Documentation improvements needed diff --git a/tests/TESTS.md b/tests/TESTS.md new file mode 100644 index 00000000..05ee21f9 --- /dev/null +++ b/tests/TESTS.md @@ -0,0 +1,244 @@ +# Documentation Agent Tests + +Minimal test definitions that simulate real user prompts. Tests are intentionally sparse - the agent must figure out how to accomplish the goal using only the documentation. + +## How to Run + +In Claude Code, use natural language: + +``` +Run the flash-quickstart test +``` + +``` +Run all vLLM tests +``` + +### Doc Source Modes + +**Published docs (default)** - Uses the Runpod Docs MCP server to search published documentation: +``` +Run the vllm-deploy test +``` + +**Local docs** - Reads MDX files directly from this repo (use to validate unpublished changes): +``` +Run the vllm-deploy test using local docs +``` + +When using local docs, the agent will search and read `.mdx` files in this repository instead of querying the MCP server. 
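
The local docs flow described above can be approximated in a few lines of Python. This is only an illustration of the Glob-then-Grep pattern the agent follows, not a tool this repository ships:

```python
from pathlib import Path


def search_local_docs(root: str, term: str) -> list:
    """Return paths of .mdx files under root whose text mentions term,
    roughly mirroring the Glob (**/*.mdx) and Grep steps of local docs mode."""
    matches = []
    for path in sorted(Path(root).rglob("*.mdx")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        if term.lower() in text.lower():
            matches.append(str(path))
    return matches
```

For example, `search_local_docs(".", "vLLM")` run from the repository root would list every page mentioning vLLM, which is what the agent then reads in full before attempting the test goal.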
+ +## Test Format + +Each test has: +- **ID**: Unique identifier for the test +- **Goal**: What a user would ask (one sentence, no hints) +- **Cleanup**: Resources to delete after test (all use `doc_test_*` prefix) + +--- + +## Flash SDK + +| ID | Goal | Difficulty | +|----|------|------------| +| flash-quickstart | Deploy a GPU function using Flash | Easy | +| flash-hello-gpu | Run a simple PyTorch function on a GPU | Easy | +| flash-sdxl | Generate an image using SDXL with Flash | Medium | +| flash-text-gen | Deploy a text generation model with Flash | Medium | +| flash-dependencies | Deploy a function with custom pip dependencies | Easy | +| flash-multi-gpu | Create an endpoint that uses multiple GPUs | Medium | +| flash-cpu-endpoint | Deploy a CPU-only endpoint with Flash | Easy | +| flash-load-balancer | Build a REST API with load balancing using Flash | Hard | +| flash-mixed-workers | Create an app with both GPU and CPU workers | Hard | +| flash-env-vars | Configure environment variables for a Flash endpoint | Easy | +| flash-idle-timeout | Set a custom idle timeout for a Flash endpoint | Easy | +| flash-app-deploy | Initialize and deploy a complete Flash app | Medium | +| flash-local-test | Test a Flash function locally before deploying | Medium | + +--- + +## Serverless Endpoints + +> **Important:** Do NOT use public endpoints for these tests. The goal is to test the full deployment workflow: create a template, deploy an endpoint, send requests, and verify the integration works. Public endpoints are a separate product and skip the deployment steps we need to validate. 
+ +| ID | Goal | Difficulty | +|----|------|------------| +| serverless-create-endpoint | Create a serverless endpoint | Medium | +| serverless-serve-qwen | Create an endpoint to serve a Qwen model | Hard | +| serverless-custom-handler | Write a custom handler function and deploy it | Hard | +| serverless-logs | Build a custom handler that uses progress_update() to send log messages, deploy it, and verify updates appear in /status polling | Hard | +| serverless-send-request | Send a request to an existing endpoint | Easy | +| serverless-async-request | Submit an async job and poll for results | Medium | +| serverless-sync-request | Make a synchronous request to an endpoint using /runsync | Easy | +| serverless-streaming | Build a custom handler that uses yield to stream results, deploy it, and test the /stream endpoint | Hard | +| serverless-webhook | Set up webhook notifications for a serverless endpoint | Medium | +| serverless-cancel-job | Cancel a running or queued job | Easy | +| serverless-queue-delay | Create an endpoint with queue delay scaling | Medium | +| serverless-request-count | Create an endpoint with request count scaling | Medium | +| serverless-min-workers | Create an endpoint with 1 minimum active worker | Easy | +| serverless-idle-timeout | Create an endpoint with an idle timeout of 20 seconds | Easy | +| serverless-gpu-priority | Create an endpoint with GPU type priority/fallback | Medium | +| serverless-docker-deploy | Deploy an endpoint from Docker Hub | Hard | +| serverless-github-deploy | Deploy an endpoint from GitHub | Hard | +| serverless-ssh-worker | SSH into a running worker for debugging | Medium | +| serverless-metrics | View endpoint metrics (execution time, delay) | Easy | + +--- + +## vLLM + +> **Important:** Do NOT use public endpoints for these tests. Deploy your own vLLM endpoint to test the full workflow. Public endpoints skip the deployment and configuration steps we need to validate. 
+ +| ID | Goal | Difficulty | +|----|------|------------| +| vllm-deploy | Deploy a vLLM endpoint | Medium | +| vllm-openai-compat | Use the OpenAI Python client with a vLLM endpoint | Medium | +| vllm-chat-completion | Send a chat completion request to vLLM | Easy | +| vllm-streaming | Stream responses from a vLLM endpoint | Medium | +| vllm-custom-model | Deploy a custom/fine-tuned model with vLLM | Hard | +| vllm-gated-model | Deploy a gated Hugging Face model with vLLM | Medium | + +--- + +## Pods + +| ID | Goal | Difficulty | +|----|------|------------| +| pods-create | Create a GPU Pod | Medium | +| pods-start-stop | Start and stop an existing Pod | Easy | +| pods-ssh-connect | Connect to a Pod via SSH | Medium | +| pods-expose-port | Expose a custom port on a Pod | Medium | +| pods-env-vars | Set environment variables on a Pod | Easy | +| pods-resize-storage | Resize a Pod's container or volume disk | Easy | +| pods-template-use | Deploy a Pod using a custom template | Medium | +| pods-template-create | Create a custom Pod template | Hard | +| pods-comfyui | Deploy ComfyUI on a Pod and generate an image | Hard | + +--- + +## Storage + +| ID | Goal | Difficulty | +|----|------|------------| +| storage-create-volume | Create a network volume | Easy | +| storage-attach-pod | Attach a network volume to a Pod | Medium | +| storage-attach-serverless | Attach a network volume to a Serverless endpoint | Medium | +| storage-s3-api | Access a network volume using the S3 API | Hard | +| storage-upload-s3 | Upload a file to a network volume using S3 | Hard | +| storage-download-s3 | Download a file from a network volume using S3 | Hard | +| storage-runpodctl-send | Transfer files between Pods using runpodctl | Easy | +| storage-migrate-volume | Migrate data between network volumes | Hard | +| storage-cloud-sync | Sync data with cloud storage (S3, GCS) | Hard | +| storage-scp-transfer | Transfer files to a Pod using SCP | Medium | +| storage-rsync | Sync files to a Pod 
using rsync | Medium | + +--- + +## Templates + +| ID | Goal | Difficulty | +|----|------|------------| +| template-create-pod | Create a Pod template | Medium | +| template-create-serverless | Create a Serverless template | Medium | +| template-list | List all templates | Easy | +| template-preload-model | Create a template with a pre-loaded model | Hard | +| template-custom-dockerfile | Create a template with a custom Dockerfile | Hard | +| template-env-vars | Add environment variables to a template | Easy | + +--- + +## Instant Clusters + +| ID | Goal | Difficulty | +|----|------|------------| +| cluster-create | Create an Instant Cluster | Medium | +| cluster-pytorch | Run distributed PyTorch training on a cluster | Hard | +| cluster-slurm | Deploy a Slurm cluster | Hard | +| cluster-axolotl | Fine-tune an LLM with Axolotl on a cluster | Hard | + +--- + +## SDKs & APIs + +| ID | Goal | Difficulty | +|----|------|------------| +| sdk-python-install | Install the Runpod Python SDK | Easy | +| sdk-python-endpoint | Use the Python SDK to call an endpoint | Easy | +| sdk-js-install | Install the Runpod JavaScript SDK | Easy | +| sdk-js-endpoint | Use the JavaScript SDK to call an endpoint | Easy | +| api-graphql-query | Make a GraphQL query to list pods | Medium | +| api-graphql-mutation | Create a resource using GraphQL mutation | Medium | +| api-key-create | Create an API key with specific permissions | Easy | +| api-key-restricted | Create a restricted API key | Medium | + +--- + +## CLI (runpodctl) + +| ID | Goal | Difficulty | +|----|------|------------| +| cli-install | Install runpodctl on your local machine | Easy | +| cli-configure | Configure runpodctl with your API key | Easy | +| cli-list-pods | List pods using runpodctl | Easy | +| cli-create-pod | Create a pod using runpodctl | Medium | +| cli-send-file | Send a file to a Pod using runpodctl | Medium | +| cli-receive-file | Receive a file from a Pod using runpodctl | Medium | + +--- + +## Model Caching 
| ID | Goal | Difficulty |
|----|------|------------|
| cache-enable | Create an endpoint with model caching enabled | Medium |

---

## Integrations

| ID | Goal | Difficulty |
|----|------|------------|
| integration-openai-migrate | Create an OpenAI-compatible endpoint | Medium |
| integration-vercel-ai | Create an image generation app with the Vercel AI SDK | Medium |
| integration-cursor | Configure Cursor to use Runpod endpoints | Medium |
| integration-skypilot | Use Runpod with SkyPilot | Hard |

---

## Public Endpoints

| ID | Goal | Difficulty |
|----|------|------------|
| public-flux | Generate an image using FLUX public endpoint | Easy |
| public-qwen | Use the Qwen3 32B public endpoint | Easy |
| public-video | Generate video using WAN public endpoint | Medium |

---

## Tutorials (End-to-End)

| ID | Goal | Difficulty |
|----|------|------------|
| tutorial-sdxl-serverless | Deploy SDXL as a serverless endpoint | Medium |
| tutorial-comfyui-pod | Deploy ComfyUI on a Pod and generate an image | Medium |
| tutorial-comfyui-serverless | Deploy ComfyUI as a serverless endpoint and generate an image | Hard |
| tutorial-gemma-chatbot | Deploy a Gemma 3 chatbot with vLLM | Medium |
| tutorial-custom-worker | Build and deploy a custom worker | Hard |
| tutorial-web-integration | Integrate a Serverless endpoint into a web application | Hard |
| tutorial-dual-mode-worker | Deploy a dual-mode (Pod/Serverless) worker | Hard |
| tutorial-model-caching | Create an endpoint with model caching enabled | Hard |
| tutorial-pytorch-cluster | Deploy a PyTorch cluster | Hard |

---

## Cleanup Rules

All test resources must use the `doc_test_` prefix.
After each test: + +- **endpoints**: Delete endpoints matching `doc_test_*` +- **pods**: Delete pods matching `doc_test_*` +- **templates**: Delete templates matching `doc_test_*` +- **network-volumes**: Delete network volumes matching `doc_test_*` +- **clusters**: Delete clusters matching `doc_test_*` +- **none**: No cleanup needed (read-only test) From 23b7496d8b82f14b8a421e47e6fd6d3db02eef48 Mon Sep 17 00:00:00 2001 From: Mo King Date: Fri, 20 Mar 2026 09:08:14 -0400 Subject: [PATCH 2/8] Update Pods docs to improve AX --- pods/connect-to-a-pod.mdx | 31 ++++++- pods/manage-pods.mdx | 83 +++++++++++++++++++ runpodctl/reference/runpodctl-create-pod.mdx | 2 +- serverless/development/logs.mdx | 4 + .../containers/docker-commands.mdx | 2 +- tutorials/pods/comfyui.mdx | 37 ++++++++- 6 files changed, 155 insertions(+), 4 deletions(-) diff --git a/pods/connect-to-a-pod.mdx b/pods/connect-to-a-pod.mdx index 864f7edd..a537d7bd 100644 --- a/pods/connect-to-a-pod.mdx +++ b/pods/connect-to-a-pod.mdx @@ -36,11 +36,40 @@ If **Start** doesn't respond, refresh the page. Interactive web environment for code, files, and data analysis. Available on templates with JupyterLab pre-configured (e.g., "Runpod Pytorch"). + + + 1. Deploy a Pod with a JupyterLab-compatible template (all official Runpod PyTorch templates have JupyterLab pre-configured). 2. Navigate to the [Pods page](https://console.runpod.io/pods) and click **Connect**. 3. Under **HTTP Services**, click the **Jupyter Lab** link (usually port 8888). - + + + + +Create a Pod with JupyterLab access using the CLI: + +```bash +runpodctl create pod \ + --name my-jupyter-pod \ + --gpuType "NVIDIA GeForce RTX 4090" \ + --imageName "runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04" \ + --containerDiskSize 20 \ + --volumeSize 50 \ + --ports "8888/http" \ + --env "JUPYTER_PASSWORD=your_secure_password" +``` + +After the Pod starts, access JupyterLab at `https://[POD_ID]-8888.proxy.runpod.net`. 
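The proxy URL above follows a fixed pattern (`https://<podId>-<port>.proxy.runpod.net`), so it can be built directly from the Pod ID returned at creation time. A minimal sketch — the `proxy_url` helper here is illustrative, not part of any Runpod SDK:

```python
def proxy_url(pod_id: str, port: int = 8888) -> str:
    """Build the Runpod HTTP proxy URL for a Pod's exposed port."""
    return f"https://{pod_id}-{port}.proxy.runpod.net"

# Pod ID as returned by `runpodctl create pod` or the REST API
print(proxy_url("uv9wy55tyv30lo"))  # → https://uv9wy55tyv30lo-8888.proxy.runpod.net
```

The same pattern works for any HTTP port the Pod exposes (for example, `proxy_url(pod_id, 8188)` for a ComfyUI Pod).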
+ + +Set the `JUPYTER_PASSWORD` environment variable to configure JupyterLab authentication. If not set, some templates use a default password shown in the Pod logs. + + + + + + If the JupyterLab tab displays a blank page for more than a minute or two, try restarting the Pod and opening it again. diff --git a/pods/manage-pods.mdx b/pods/manage-pods.mdx index 7f80a09d..2172041a 100644 --- a/pods/manage-pods.mdx +++ b/pods/manage-pods.mdx @@ -18,6 +18,7 @@ runpodctl config --apiKey RUNPOD_API_KEY | **Deploy** | [Pods page](https://www.console.runpod.io/pods) → Deploy | `runpodctl create pods --name NAME --gpuType "GPU" --imageName "IMAGE"` | | **Start** | Expand Pod → Play icon | `runpodctl start pod POD_ID` | | **Stop** | Expand Pod → Stop icon | `runpodctl stop pod POD_ID` | +| **Update** | Three-dot menu → Edit Pod | — | | **Terminate** | Expand Pod → Trash icon | `runpodctl remove pod POD_ID` | | **List** | [Pods page](https://www.console.runpod.io/pods) | `runpodctl get pod` | @@ -74,6 +75,21 @@ curl --request POST \ }' ``` +To deploy a Pod from an existing template, use the `templateId` parameter instead of specifying individual configuration options: + +```bash +curl --request POST \ + --url https://rest.runpod.io/v1/pods \ + --header 'Authorization: Bearer RUNPOD_API_KEY' \ + --header 'Content-Type: application/json' \ + --data '{ + "name": "my-pod-from-template", + "templateId": "YOUR_TEMPLATE_ID", + "gpuTypeIds": ["NVIDIA GeForce RTX 4090"], + "gpuCount": 1 + }' +``` + See the [Pod API reference](/api-reference/pods/POST/pods) for all parameters. @@ -110,6 +126,16 @@ runpodctl stop pod $RUNPOD_POD_ID sleep 2h; runpodctl stop pod $RUNPOD_POD_ID & ``` + + + + +```bash +curl --request POST \ + --url "https://rest.runpod.io/v1/pods/$RUNPOD_POD_ID/stop" \ + --header 'Authorization: Bearer RUNPOD_API_KEY' +``` + @@ -131,9 +157,56 @@ Resume a stopped Pod. 
Note: You may be allocated [zero GPUs](/references/trouble runpodctl start pod $RUNPOD_POD_ID ``` + + + + +```bash +curl --request POST \ + --url "https://rest.runpod.io/v1/pods/$RUNPOD_POD_ID/start" \ + --header 'Authorization: Bearer RUNPOD_API_KEY' +``` + +## Update a Pod + +Modify an existing Pod's configuration, such as storage size, image, ports, or environment variables. + + +Editing a running Pod resets it completely, erasing all data not stored in `/workspace` or a network volume. + + + + + +1. Open the [Pods page](https://www.console.runpod.io/pods). +2. Click the three-dot menu next to the Pod you want to update. +3. Click **Edit Pod** and modify your configuration. +4. Click **Save** to apply changes. + + + + + +```bash +curl --request PATCH \ + --url "https://rest.runpod.io/v1/pods/$RUNPOD_POD_ID" \ + --header 'Authorization: Bearer RUNPOD_API_KEY' \ + --header 'Content-Type: application/json' \ + --data '{ + "containerDiskInGb": 100, + "volumeInGb": 200 + }' +``` + +See the [Pod API reference](/api-reference/pods/PATCH/pods/podId) for all editable fields. + + + + + ## Terminate a Pod @@ -158,6 +231,16 @@ runpodctl remove pod $RUNPOD_POD_ID runpodctl remove pods my-bulk-task --podCount 40 ``` + + + + +```bash +curl --request DELETE \ + --url "https://rest.runpod.io/v1/pods/$RUNPOD_POD_ID" \ + --header 'Authorization: Bearer RUNPOD_API_KEY' +``` + diff --git a/runpodctl/reference/runpodctl-create-pod.mdx b/runpodctl/reference/runpodctl-create-pod.mdx index 315e4816..359668da 100644 --- a/runpodctl/reference/runpodctl-create-pod.mdx +++ b/runpodctl/reference/runpodctl-create-pod.mdx @@ -93,7 +93,7 @@ Additional arguments to pass to the container when it starts. -Ports to expose from the container. Maximum of 1 HTTP port and 1 TCP port allowed (e.g., `--ports 8888/http --ports 22/tcp`). +Ports to expose from the container. Specify multiple times for multiple ports (e.g., `--ports 8888/http --ports 22/tcp`). 
You can expose up to 10 HTTP ports and multiple TCP ports. See [Expose ports](/pods/configuration/expose-ports) for details. ## Related commands diff --git a/serverless/development/logs.mdx b/serverless/development/logs.mdx index 270b680c..c8a066ca 100644 --- a/serverless/development/logs.mdx +++ b/serverless/development/logs.mdx @@ -51,6 +51,10 @@ To view worker logs: 4. Use the search and filtering capabilities to find specific log entries. 5. Download logs as text files for offline analysis. +## Stream output to clients + +To send progress updates or stream results to clients during job execution, see [Progress updates](/serverless/workers/handler-functions#progress-updates) and [Streaming handlers](/serverless/workers/handler-functions#streaming-handlers). + ## Troubleshooting ### Missing logs diff --git a/tutorials/introduction/containers/docker-commands.mdx b/tutorials/introduction/containers/docker-commands.mdx index 1a53e645..18a5da7c 100644 --- a/tutorials/introduction/containers/docker-commands.mdx +++ b/tutorials/introduction/containers/docker-commands.mdx @@ -290,7 +290,7 @@ docker logs --tail 100 my-container docker logs -t my-container ``` -For Runpod Serverless, you can view worker logs through the web console or API. For Pods, `docker logs` helps debug containers you're running during development. +For Runpod Serverless, you can view worker logs through the [web console](/serverless/development/logs). For Pods, `docker logs` helps debug containers you're running during development. 
### docker exec diff --git a/tutorials/pods/comfyui.mdx b/tutorials/pods/comfyui.mdx index 9e189a48..e556c48d 100644 --- a/tutorials/pods/comfyui.mdx +++ b/tutorials/pods/comfyui.mdx @@ -28,7 +28,10 @@ Before you begin, you'll need: ## Step 1: Deploy a ComfyUI Pod -First, you'll deploy a Pod using the official Runpod ComfyUI template, which pre-installs ComfyUI and the ComfyUI Manager plugin: +First, you'll deploy a Pod using the official Runpod ComfyUI template, which pre-installs ComfyUI and the ComfyUI Manager plugin. + + + @@ -55,6 +58,38 @@ First, you'll deploy a Pod using the official Runpod ComfyUI template, which pre + + + + +Deploy a ComfyUI Pod programmatically using the REST API: + +```bash +curl --request POST \ + --url https://rest.runpod.io/v1/pods \ + --header 'Authorization: Bearer RUNPOD_API_KEY' \ + --header 'Content-Type: application/json' \ + --data '{ + "name": "comfyui-pod", + "imageName": "runpod/comfyui:latest", + "gpuTypeIds": ["NVIDIA GeForce RTX 4090"], + "gpuCount": 1, + "containerDiskInGb": 50, + "volumeInGb": 100, + "ports": ["8188/http", "22/tcp", "8080/http"] + }' +``` + +**Port configuration:** +- `8188/http`: ComfyUI web interface +- `22/tcp`: SSH access +- `8080/http`: File browser (optional) + +For Blackwell GPUs (RTX 5090, B200), use `runpod/comfyui:cuda12.8` instead. + + + + ## Step 2: Open the ComfyUI interface Once your Pod has finished initializing, you can open the ComfyUI interface: From ab6ac90e95c296cc6c2850807dbfba17a46b665a Mon Sep 17 00:00:00 2001 From: Mo King Date: Fri, 20 Mar 2026 09:39:52 -0400 Subject: [PATCH 3/8] Add terminal workflow to the Pods quickstart --- get-started.mdx | 113 +++++++++++++++++++++++++++++++++++++++--------- tests/TESTS.md | 2 +- 2 files changed, 94 insertions(+), 21 deletions(-) diff --git a/get-started.mdx b/get-started.mdx index fb7575ff..18547333 100644 --- a/get-started.mdx +++ b/get-started.mdx @@ -26,6 +26,9 @@ Planning to share compute resources with your team? 
You can convert your persona Now that you've created your account, you're ready to deploy your first Pod: + + + 1. Open the [Pods page](https://www.console.runpod.io/pods) in the web interface. 2. Click the **Deploy** button. 3. Select **A40** from the list of graphics cards (or any other GPU that's available). @@ -34,36 +37,91 @@ Now that you've created your account, you're ready to deploy your first Pod: 6. Click **Deploy On-Demand** to deploy and start your Pod. You'll be redirected back to the Pods page after a few seconds. - If you haven't set up payments yet, you'll be prompted to add a payment method and purchase credits for your account. - -## Step 3: Explore the Pod detail pane + + + + +First, [create an API key](/get-started/api-keys) if you haven't already. Then deploy your Pod: + +```bash +curl --request POST \ + --url https://rest.runpod.io/v1/pods \ + --header "Authorization: Bearer $RUNPOD_API_KEY" \ + --header "Content-Type: application/json" \ + --data '{ + "name": "quickstart-pod", + "imageName": "runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04", + "gpuTypeIds": ["NVIDIA A40"], + "gpuCount": 1 + }' +``` + +The response includes your Pod ID. Save it for later: + +```bash +export RUNPOD_POD_ID="your-pod-id" +``` + + + + +## Step 3: Execute code on your Pod + +Once your Pod finishes initializing, connect and run some code: + + + + +1. On the [Pods page](https://www.console.runpod.io/pods), click your Pod to open the detail pane. +2. Under **HTTP Services**, click **Jupyter Lab** to open a JupyterLab workspace. +3. Under **Notebook**, select **Python 3 (ipykernel)**. +4. Type `print("Hello, world!")` in the first cell and click the play button. + + -On the [Pods page](https://www.console.runpod.io/pods), click the Pod you just created to open the Pod detail pane. The pane opens onto the **Connect** tab, where you'll find options for connecting to your Pod so you can execute code on your GPU (after it's done initializing). 
+
-Take a minute to explore the other tabs:
+
-- **Details**: Information about your Pod, such as hardware specs, pricing, and storage.
-- **Telemetry**: Realtime utilization metrics for your Pod's CPU, memory, and storage.
-- **Logs**: Logs streamed from your container (including stdout from any applications inside) and the Pod management system.
-- **Template Readme**: Details about the template your Pod is running. Your Pod is configured with the latest official Runpod template.
 
-## Step 4: Execute code on your Pod with JupyterLab
 
+Get your Pod's SSH connection details:
 
-1. Go back to the **Connect** tab, and under **HTTP Services**, click **Jupyter Lab** to open a JupyterLab workspace on your Pod.
-2. Under **Notebook**, select **Python 3 (ipykernel)**.
-3. Type `print("Hello, world!")` in the first line of the notebook.
-4. Click the play button to run your code.
+```bash
+curl --request GET \
+  --url "https://rest.runpod.io/v1/pods/$RUNPOD_POD_ID" \
+  --header "Authorization: Bearer $RUNPOD_API_KEY"
+```
+
+Extract the `ip` and `port` from the response's `runtime.ports` array (look for port 22), then connect:
+
+```bash
+ssh root@<ip> -p <port> -i ~/.ssh/your_key
+python3 -c "print('Hello, world!')"
+```
+
+
+You'll need an [SSH key added to your account](/pods/configuration/use-ssh) for this to work.
+
+
+
+
 
 Congratulations! You just ran your first line of code on Runpod.
 
-## Step 5: Clean up
+## Step 4: Clean up
+
+To avoid incurring unnecessary charges, clean up your Pod resources.
+
+
+Terminating a Pod permanently deletes all data that isn't stored in a network volume. Be sure that you've saved any data you might need to access again. To learn more about how storage works, see the [Pod storage overview](/pods/storage/types).
+
+
+
+
 
-To avoid incurring unnecessary charges, follow these steps to clean up your Pod resources:
+To stop your Pod:
 
 1. Return to the [Pods page](https://www.console.runpod.io/pods) and click your running Pod.
 2. Click the **Stop** button (pause icon) to stop your Pod.
@@ -76,13 +134,28 @@ To terminate your Pod: 1. Click the **Terminate** button (trash icon). 2. Click **Terminate Pod** to confirm. - + -Terminating a Pod permanently deletes all data that isn't stored in a . Be sure that you've saved any data you might need to access again. + -To learn more about how storage works, see the [Pod storage overview](/pods/storage/types). +Stop your Pod: - +```bash +curl --request POST \ + --url "https://rest.runpod.io/v1/pods/$RUNPOD_POD_ID/stop" \ + --header "Authorization: Bearer $RUNPOD_API_KEY" +``` + +You'll still be charged a small amount for storage on stopped Pods (\$0.20 per GB per month). If you don't need to retain any data on your Pod, terminate it completely: + +```bash +curl --request DELETE \ + --url "https://rest.runpod.io/v1/pods/$RUNPOD_POD_ID" \ + --header "Authorization: Bearer $RUNPOD_API_KEY" +``` + + + ## Next steps diff --git a/tests/TESTS.md b/tests/TESTS.md index 05ee21f9..74ebe9a9 100644 --- a/tests/TESTS.md +++ b/tests/TESTS.md @@ -59,7 +59,7 @@ Each test has: ## Serverless Endpoints -> **Important:** Do NOT use public endpoints for these tests. The goal is to test the full deployment workflow: create a template, deploy an endpoint, send requests, and verify the integration works. Public endpoints are a separate product and skip the deployment steps we need to validate. +> **Important:** Do NOT use public endpoints for these tests. The goal is to test the full deployment workflow: deploy an endpoint, send requests, and verify the integration works. Public endpoints are a separate product and skip the deployment steps we need to validate. 
| ID | Goal | Difficulty | |----|------|------------| From 4f001b0a094f5ffb3f348bacfc87377e7757d01c Mon Sep 17 00:00:00 2001 From: Mo King Date: Fri, 20 Mar 2026 09:50:48 -0400 Subject: [PATCH 4/8] Improve Pod quickstart terminal steps --- get-started.mdx | 58 ++++++++++++++++++++++++++++++++++--------------- tests/TESTS.md | 1 + 2 files changed, 41 insertions(+), 18 deletions(-) diff --git a/get-started.mdx b/get-started.mdx index 18547333..b4549a2f 100644 --- a/get-started.mdx +++ b/get-started.mdx @@ -16,12 +16,6 @@ Start by creating a Runpod account: 2. Verify your email address. 3. Set up two-factor authentication (recommended for security). - - -Planning to share compute resources with your team? You can convert your personal account to a team account later. See [Manage accounts](/get-started/manage-accounts) for details. - - - ## Step 2: Deploy a Pod Now that you've created your account, you're ready to deploy your first Pod: @@ -44,7 +38,13 @@ If you haven't set up payments yet, you'll be prompted to add a payment method a -First, [create an API key](/get-started/api-keys) if you haven't already. Then deploy your Pod: +First, [create an API key](/get-started/api-keys) if you haven't already. Export it as an environment variable: + +```bash +export RUNPOD_API_KEY="your-api-key" +``` + +Then deploy your Pod: ```bash curl --request POST \ @@ -59,10 +59,21 @@ curl --request POST \ }' ``` -The response includes your Pod ID. Save it for later: +The response includes your Pod ID: + +```json +{ + "id": "uv9wy55tyv30lo", + "name": "quickstart-pod", + "desiredStatus": "RUNNING", + ... +} +``` + +Save it for later: ```bash -export RUNPOD_POD_ID="your-pod-id" +export RUNPOD_POD_ID="uv9wy55tyv30lo" ``` @@ -84,6 +95,10 @@ Once your Pod finishes initializing, connect and run some code: + +You'll need an [SSH key added to your account](/pods/configuration/use-ssh) for this to work. 
+
+
 Get your Pod's SSH connection details:
 
 ```bash
 curl --request GET \
   --url "https://rest.runpod.io/v1/pods/$RUNPOD_POD_ID" \
   --header "Authorization: Bearer $RUNPOD_API_KEY"
 ```
 
-Extract the `ip` and `port` from the response's `runtime.ports` array (look for port 22), then connect:
+The response includes `publicIp` and `portMappings`:
+
+```json
+{
+  "id": "uv9wy55tyv30lo",
+  "publicIp": "194.68.245.207",
+  "portMappings": {
+    "22": 22100
+  },
+  ...
+}
+```
+
+Use these values to connect via SSH:
 
 ```bash
-ssh root@<ip> -p <port> -i ~/.ssh/your_key
+ssh root@194.68.245.207 -p 22100
 python3 -c "print('Hello, world!')"
 ```
 
-
-You'll need an [SSH key added to your account](/pods/configuration/use-ssh) for this to work.
-
 
@@ -115,7 +139,7 @@ Congratulations! You just ran your first line of code on Runpod.
 
 To avoid incurring unnecessary charges, clean up your Pod resources.
 
-Terminating a Pod permanently deletes all data that isn't stored in a network volume. Be sure that you've saved any data you might need to access again. To learn more about how storage works, see the [Pod storage overview](/pods/storage/types).
+Terminating a Pod permanently deletes all data that isn't stored in a network volume. Be sure that you've saved any data you might need to access again.
 
@@ -159,8 +183,6 @@
 
 ## Next steps
 
-Now that you've learned the basics, you're ready to:
diff --git a/tests/TESTS.md b/tests/TESTS.md index 74ebe9a9..0f733b84 100644 --- a/tests/TESTS.md +++ b/tests/TESTS.md @@ -104,6 +104,7 @@ Each test has: | ID | Goal | Difficulty | |----|------|------------| +| pods-quickstart-terminal | Complete the Pod quickstart using only the terminal | Easy | | pods-create | Create a GPU Pod | Medium | | pods-start-stop | Start and stop an existing Pod | Easy | | pods-ssh-connect | Connect to a Pod via SSH | Medium | From 6201fdfbb6d2f28ab5f76ed9119fb699911442e2 Mon Sep 17 00:00:00 2001 From: Mo King Date: Fri, 20 Mar 2026 12:42:06 -0400 Subject: [PATCH 5/8] Improve agent tests --- .claude/commands/test.md | 73 ++++++++ .claude/testing.md | 214 +++++++++++++++++++++-- .gitignore | 2 +- flash/quickstart.mdx | 2 +- pods/configuration/use-ssh.mdx | 42 +++-- tests/IMPROVEMENT_PLAN.md | 146 ++++++++++++++++ tests/README.md | 19 ++- tests/TESTS.md | 299 +++++++++++++++++++-------------- tests/scripts/README.md | 75 +++++++++ tests/scripts/cleanup.py | 218 ++++++++++++++++++++++++ tests/scripts/report.py | 143 ++++++++++++++++ tests/scripts/stats.py | 170 +++++++++++++++++++ 12 files changed, 1237 insertions(+), 166 deletions(-) create mode 100644 .claude/commands/test.md create mode 100644 tests/IMPROVEMENT_PLAN.md create mode 100644 tests/scripts/README.md create mode 100755 tests/scripts/cleanup.py create mode 100755 tests/scripts/report.py create mode 100755 tests/scripts/stats.py diff --git a/.claude/commands/test.md b/.claude/commands/test.md new file mode 100644 index 00000000..b29a703f --- /dev/null +++ b/.claude/commands/test.md @@ -0,0 +1,73 @@ +# /test - Run a documentation test + +Run a test from the testing framework to validate documentation quality. 
+
+## Usage
+
+```
+/test <test-id>
+/test <test-id> local
+/test smoke
+```
+
+## Arguments
+
+- `<test-id>`: The test ID from `tests/TESTS.md` (e.g., `pods-quickstart-terminal`, `flash-quickstart`)
+- `local`: (Optional) Use local MDX files instead of published docs
+- `smoke`: Run all smoke tests
+
+## Execution Rules
+
+When running a test, you MUST follow these rules:
+
+1. **Read the test definition** from `tests/TESTS.md` - find the row matching the test ID
+2. **Do NOT use prior knowledge** - only use Runpod docs (published MCP or local MDX)
+3. **Doc source mode**:
+   - Default: Use `mcp__runpod-docs__search_runpod_documentation` for published docs
+   - If `local` specified: Search and read `.mdx` files in this repository
+4. **Resource naming**: All created resources MUST use `doc_test_` prefix
+5. **Attempt the goal** using available tools (MCP for API, Bash for CLI)
+6. **Handle GPU availability** - see GPU Fallback section below
+7. **Verify the Expected Outcome** from the test definition
+8. **Clean up** all `doc_test_*` resources after the test
+9. **Generate report** using the helper script:
+   ```bash
+   python tests/scripts/report.py <test-id> [--local]
+   ```
+10. **Complete the report** by filling in the generated template
+
+## GPU Fallback Guidance
+
+GPU availability varies. When tests require GPU resources:
+
+| Queue Wait | Action |
+|------------|--------|
+| < 2 min | Keep waiting |
+| 2-5 min | Try fallback GPU |
+| > 5 min | Use fallback or mark blocked |
+
+**Fallback order**: L4 → A4000 → RTX 3090 (Community Cloud)
+
+**Status marking**:
+- PASS: Completed with documented GPU
+- PARTIAL: Completed with fallback GPU (doc improvement needed)
+- FAIL: Failed even with fallbacks
+
+## Report Locations
+
+Reports are saved to both:
+- `tests/reports/<test-id>-<timestamp>.md` (gitignored)
+- `~/Dev/doc-tests/<test-id>-<timestamp>.md` (persistent archive)
+
+## Example
+
+```
+/test pods-quickstart-terminal local
+```
+
+This will:
+1. Load the test definition for `pods-quickstart-terminal`
+2.
Use local MDX files (not published docs)
+3. Attempt: "Complete the Pod quickstart using only the terminal"
+4. Verify: "Code runs on Pod via SSH"
+5. Clean up and generate report
diff --git a/.claude/testing.md b/.claude/testing.md
index 62a571eb..b03bbbe8 100644
--- a/.claude/testing.md
+++ b/.claude/testing.md
@@ -8,27 +8,131 @@ Tests should be **hard to pass**. They simulate a user typing a simple request w
 
 ## Running Tests
 
-Use natural language:
+Use the `/test` command, or plain natural language (e.g. "Run the flash-quickstart test"):
 
 ```
-Run the flash-quickstart test
-Run the vllm-deploy test using local docs
-Run all pods tests
+/test pods-quickstart-terminal          # Run with published docs
+/test pods-quickstart-terminal local    # Run with local MDX files
+/test smoke                             # Run all smoke tests
 ```
 
+The `/test` command loads the test definition and reminds you of the execution rules.
+
 ## Test Execution Rules
 
 1. Read the test definition from `tests/TESTS.md`.
 2. **Do NOT use prior knowledge** - only use Runpod docs.
 3. Attempt to complete the goal using available tools.
 4. All created resources must use `doc_test_` prefix.
-5. Clean up resources after test.
-6. Write report to `tests/reports/{test-id}-{timestamp}.md`.
+5. Handle GPU availability issues (see GPU Fallback section below).
+6. Clean up resources after test (see Cleanup section below).
+7. Generate report using the helper script:
+   ```bash
+   python3 tests/scripts/report.py <test-id> [--local]
+   ```
+8. Fill in the generated report template with actual results.
+
+## GPU Fallback Guidance
+
+GPU availability varies by type and time.
When a test requires GPU resources: + +### Queue Timeout Thresholds + +| Wait Time | Action | +|-----------|--------| +| < 2 min | Normal, keep waiting | +| 2-5 min | Consider trying fallback GPU | +| > 5 min | Use fallback GPU or mark test blocked | + +### Fallback GPU Order + +If the documented GPU type is unavailable, try these in order: + +1. **First choice**: GPU specified in docs (tests the docs as-is) +2. **Fallback 1**: NVIDIA L4 (good availability, cost-effective) +3. **Fallback 2**: NVIDIA A4000 (broad availability) +4. **Fallback 3**: RTX 3090 (community cloud) + +### When to Use Fallbacks + +- **Test the docs first**: Always try the GPU specified in documentation first. +- **Document the issue**: If you must use a fallback, note it in the report as a documentation gap. +- **Mark appropriately**: + - PASS: Test completed with documented GPU + - PARTIAL: Test completed with fallback GPU (doc improvement needed) + - FAIL: Test failed even with fallbacks + +### Cloud Type Fallbacks + +If Secure Cloud has no availability: +1. Try Community Cloud for the same GPU type +2. Note cloud type used in the test report + +### Example Report Note + +```markdown +## Documentation Gaps +GPU availability: Docs specify RTX 4090 but none available after 3 min wait. +Used fallback: NVIDIA L4 on Community Cloud. +Suggestion: Add note about GPU availability or use more available GPU in example. +``` + +## Cleanup + +All test resources use the `doc_test_` prefix. Clean up after each test to avoid orphaned resources. 
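Because cleanup keys purely off the name prefix, the sweep reduces to a filter over resource listings. A minimal sketch of that selection step (the resource dicts below are illustrative; the real `cleanup.py` fetches them from the Runpod REST API):

```python
PREFIX = "doc_test_"

def find_test_resources(resources):
    """Return only the resources created by doc tests (name prefix match)."""
    return [r for r in resources if r.get("name", "").startswith(PREFIX)]

pods = [
    {"id": "a1b2", "name": "doc_test_pod"},
    {"id": "c3d4", "name": "production-pod"},
]
print(find_test_resources(pods))  # → [{'id': 'a1b2', 'name': 'doc_test_pod'}]
```

Anything the filter returns is safe to delete; everything else is left untouched.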
+ +### During Tests (Claude Code) + +After completing a test, use the Runpod MCP tools to delete created resources: + +``` +# List and identify test resources +mcp__runpod__list-pods (filter by name starting with "doc_test_") +mcp__runpod__list-endpoints +mcp__runpod__list-templates +mcp__runpod__list-network-volumes + +# Delete matching resources +mcp__runpod__delete-pod (podId) +mcp__runpod__delete-endpoint (endpointId) +mcp__runpod__delete-template (templateId) +mcp__runpod__delete-network-volume (networkVolumeId) +``` + +### Manual Cleanup (Standalone Script) + +Run the cleanup script to find and delete orphaned test resources: + +```bash +# Dry run - see what would be deleted +python tests/scripts/cleanup.py + +# Actually delete resources +python tests/scripts/cleanup.py --delete +``` + +### Cleanup Command + +Users can request cleanup directly: +``` +Clean up test resources +Delete all doc_test_ resources +``` + +When this is requested, list all resources matching `doc_test_*` and delete them after confirmation. ## Doc Source Modes ### Published Docs (default) -Use the `mcp__runpod-dops__search_runpod_documentation` tool to search the live published documentation. This tests what real users see. +Use the `mcp__runpod-docs__search_runpod_documentation` tool to search the live published documentation. This tests what real users see. ### Local Docs @@ -40,25 +144,103 @@ When the user says "using local docs": This validates unpublished doc changes before they go live. +## Test Tiers + +### Smoke Tests + +Fast tests that don't require GPU deployments. Run these for quick validation: + +``` +Run smoke tests +Run all smoke tests using local docs +``` + +Smoke tests are listed in the "Smoke Tests" section of `tests/TESTS.md`. They include: +- SDK/CLI installation tests +- Read-only API tests (list templates, view metrics) +- Public endpoint tests (FLUX, Qwen) +- Account configuration tests (SSH keys, API keys) + +### Full Tests + +All tests including GPU deployments. 
Use for comprehensive validation: + +``` +Run all tests +Run all serverless tests +``` + +Full tests may create billable resources. Always clean up after. + ## Report Format +Save reports to **both** locations: +1. `tests/reports/{test-id}-{YYYYMMDD-HHMMSS}.md` (gitignored, in repo) +2. `~/Dev/doc-tests/{test-id}-{YYYYMMDD-HHMMSS}.md` (persistent archive) + +Use this template: + ```markdown -# Test Report: {Test Name} +# Test Report: {Test ID} + +## Metadata +| Field | Value | +|-------|-------| +| **Test ID** | {test-id} | +| **Date** | {YYYY-MM-DD HH:MM:SS} | +| **Git SHA** | {git rev-parse --short HEAD} | +| **Git Branch** | {git branch --show-current} | +| **Doc Source** | Published / Local | +| **Status** | PASS / FAIL / PARTIAL | -**Date:** {timestamp} -**Status:** PASS | FAIL | PARTIAL +## Goal +{Copy the goal from TESTS.md} -## What Happened -Brief narrative of the attempt. +## Expected Outcome +{Copy from TESTS.md} -## Where I Got Stuck -Specific points of confusion or failure. +## Actual Result +{What actually happened - be specific} + +## Steps Taken +1. {First thing tried} +2. {Second thing tried} +... ## Documentation Gaps -What was missing or unclear in the docs. +{What was missing or unclear - be specific about which page/section} ## Suggestions -Specific improvements to make tests pass. +{Concrete improvements to make this test pass} +``` + +### Comparing Runs + +Reports in `~/Dev/doc-tests/` persist across git operations. 
To compare runs:
+
+```bash
+# List all runs for a test
+ls ~/Dev/doc-tests/flash-quickstart-*.md
+
+# Diff two runs
+diff ~/Dev/doc-tests/flash-quickstart-20240115-100000.md ~/Dev/doc-tests/flash-quickstart-20240120-140000.md
+```
+
+### Tracking Pass Rates
+
+Use the stats script to analyze historical results:
+
+```bash
+# Overall summary
+python3 tests/scripts/stats.py
+
+# Group by test
+python3 tests/scripts/stats.py --by-test
+
+# Recent runs
+python3 tests/scripts/stats.py --recent 10
+
+# Show failures
+python3 tests/scripts/stats.py --failures
+```
 
 ## Test Categories
diff --git a/.gitignore b/.gitignore
index e1e8f6b4..0c0d2bcb 100644
--- a/.gitignore
+++ b/.gitignore
@@ -34,4 +34,4 @@ helpers/__pycache__/**
 */
 # Documentation test reports
 tests/reports/
-.serena
\ No newline at end of file
+.serena
diff --git a/flash/quickstart.mdx b/flash/quickstart.mdx
index e2a3aee3..4a20d990 100644
--- a/flash/quickstart.mdx
+++ b/flash/quickstart.mdx
@@ -65,7 +65,7 @@
-from runpod_flash import Endpoint, GpuType
+from runpod_flash import Endpoint, GpuGroup
 
 @Endpoint(
     name="flash-quickstart",
-    gpu=GpuType.NVIDIA_GEFORCE_RTX_4090,
+    gpu=GpuGroup.ANY,  # Use any available GPU
     workers=3,
     dependencies=["numpy", "torch"]
 )
diff --git a/pods/configuration/use-ssh.mdx b/pods/configuration/use-ssh.mdx
index 49a0ee85..b149ba81 100644
--- a/pods/configuration/use-ssh.mdx
+++ b/pods/configuration/use-ssh.mdx
@@ -33,26 +33,38 @@ SSH key authentication is recommended for security and convenience.
 
-    Run this command on your local terminal to retrieve the public SSH key you just generated:
-
-    ```sh
-    cat ~/.ssh/id_ed25519.pub
-    ```
-
-    This will output something similar to this:
+
-    ```sh
-    ssh-ed25519 AAAAC4NzaC1lZDI1JTE5AAAAIGP+L8hnjIcBqUb8NRrDiC32FuJBvRA0m8jLShzgq6BQ YOUR_EMAIL@DOMAIN.COM
-    ```
-
+
+
-    Copy and paste your public key from the previous step into the **SSH Public Keys** field in your [Runpod user account settings](https://www.console.runpod.io/user/settings).
+
+    1.
Run `cat ~/.ssh/id_ed25519.pub` to display your public key. + 2. Copy the output (starts with `ssh-ed25519`). + 3. Paste it into the **SSH Public Keys** field in your [Runpod account settings](https://www.console.runpod.io/user/settings). - If you need to add multiple SSH keys to your Runpod account, make sure that each key pair is on its own line in the **SSH Public Keys** field. + If you need to add multiple SSH keys, make sure each key is on its own line. + + + + + + Use [runpodctl](/runpodctl/overview) to add your key directly: + + ```sh + runpodctl ssh add-key --key-file ~/.ssh/id_ed25519.pub + ``` + + Verify it was added: + + ```sh + runpodctl ssh list-keys + ``` + + + + diff --git a/tests/IMPROVEMENT_PLAN.md b/tests/IMPROVEMENT_PLAN.md new file mode 100644 index 00000000..755725b8 --- /dev/null +++ b/tests/IMPROVEMENT_PLAN.md @@ -0,0 +1,146 @@ +# Testing Framework Improvement Plan + +Based on feedback from PR #561 review by @runpod-Henrik. + +## Immediate Fixes (Blockers) + +### 1. MCP tool name typo +**File:** `.claude/testing.md` line 31 +**Issue:** References `mcp__runpod-dops__search_runpod_documentation` but server is `runpod-docs` +**Fix:** Change to `mcp__runpod-docs__search_runpod_documentation` +**Status:** [x] DONE + +### 2. Test table format mismatch +**Files:** `tests/README.md`, `.claude/testing.md`, `tests/TESTS.md` +**Issue:** Docs say tables have `ID | Goal | Cleanup` but actual tables have `ID | Goal | Difficulty` +**Options:** +- A) Add Cleanup column back to tables (more explicit per-test) +- B) Update docs to say cleanup rules are global (simpler, current reality) +**Recommendation:** Option B - cleanup rules ARE global (by resource type), not per-test +**Status:** [x] DONE - Updated docs to describe actual format with global cleanup rules + +### 3. 
Port limit accuracy +**File:** `runpodctl/reference/runpodctl-create-pod.mdx` +**Issue:** Changed from "1 HTTP + 1 TCP" to "10 HTTP + multiple TCP" - needs verification +**Action:** Verify actual runpodctl behavior before merging +**Status:** [x] VERIFIED - `pods/configuration/expose-ports.mdx` confirms "Expose HTTP Ports (Max 10)" + +## Nits + +### 4. Missing trailing newline in .gitignore +**Status:** [x] DONE + +### 5. Double `---` separator in TESTS.md +**Status:** [x] DONE + +--- + +## Structural Improvements (Future Work) + +Henrik correctly identified that this is currently a **catalog**, not a **framework**. Here's a plan to evolve it: + +### Phase 1: Cleanup Safety Net (Quick Win) +**Status:** ✅ DONE + +Created `tests/scripts/cleanup.py`: +- Lists and deletes resources matching `doc_test_*` prefix +- Supports dry-run mode (default) and `--delete` flag +- Handles pods, endpoints, templates, and network volumes +- Can be run standalone or in CI + +Also updated `.claude/testing.md` with cleanup instructions for Claude Code. + +### Phase 2: Smoke Test Tier +**Status:** ✅ DONE + +Added 12 smoke tests that don't require GPU deploys: +- SDK installs: `sdk-python-install`, `sdk-js-install` +- CLI: `cli-install`, `cli-configure`, `cli-list-pods` +- Read-only: `template-list`, `serverless-metrics` +- Config: `api-key-create`, `pods-add-ssh-key` +- Public endpoints: `public-flux`, `public-qwen`, `public-video` + +Created separate "Smoke Tests" section in TESTS.md. +Updated `.claude/testing.md` with test tier instructions. + +### Phase 3: Success Criteria +**Status:** ✅ DONE + +Added "Expected Outcome" column to all test tables with objective, measurable criteria: +- `Pod status is RUNNING` +- `Endpoint responds to /health` +- `SSH session established` +- etc. + +Now each test has a clear PASS/FAIL condition. + +### Phase 4: Automation Layer +**Status:** ⏸️ DEFERRED + +Requires Claude Code in CI or custom API runner. Skipped for now - tests run manually. 
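
Whichever automation option we eventually pick, every runner needs the same first building block: enumerating test IDs and expected outcomes from the TESTS.md tables. A minimal sketch of that step, stdlib only — the three-column `ID | Goal | Expected Outcome` layout matches the tables in this PR, but the sample rows here are illustrative:

```python
def parse_test_table(markdown: str) -> list[dict]:
    """Extract test rows from the pipe-delimited tables used in TESTS.md.

    Skips the header row and the |---| separator row; returns one dict per test.
    """
    tests = []
    for line in markdown.splitlines():
        stripped = line.strip()
        if not stripped.startswith("|"):
            continue
        cells = [c.strip() for c in stripped.strip("|").split("|")]
        if len(cells) != 3:
            continue
        test_id, goal, expected = cells
        # Header row has "ID" in the first cell; separator rows are all dashes.
        if test_id == "ID" or set(test_id) <= {"-"}:
            continue
        tests.append({"id": test_id, "goal": goal, "expected": expected})
    return tests


sample = """\
| ID | Goal | Expected Outcome |
|----|------|------------------|
| cli-install | Install runpodctl on your local machine | `runpodctl version` returns version |
| cli-list-pods | List pods using runpodctl | `runpodctl get pods` returns list |
"""

print([t["id"] for t in parse_test_table(sample)])  # ['cli-install', 'cli-list-pods']
```

A runner could then dispatch each row's goal as a prompt and check the expected-outcome column against the result.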
+ +Options for future: +1. Claude Code headless mode (when available) +2. Custom runner script with Anthropic API +3. GitHub Action with Claude CLI + +### Phase 5: Results Tracking +**Status:** ✅ DONE + +- Reports saved to **two locations**: + - `tests/reports/` (gitignored, in repo) + - `~/Dev/doc-tests/` (persistent local archive) +- Enhanced report template with: + - Git SHA and branch + - Structured metadata table + - Steps taken section + - Actual vs expected results +- Instructions for comparing runs over time + +### Phase 6: Convenience Tooling (Added) +**Status:** ✅ DONE + +Based on trial run feedback, added: + +1. **`/test` command** (`.claude/commands/test.md`) + - Loads test definition and execution rules + - Supports `local` flag for local docs mode + - Supports `smoke` for running smoke tests + +2. **`report.py` script** (`tests/scripts/report.py`) + - Auto-generates report template with metadata + - Pulls goal and expected outcome from TESTS.md + - Saves to both report locations + +3. **`stats.py` script** (`tests/scripts/stats.py`) + - Analyzes historical test reports + - Shows pass rates overall and by test + - Lists recent runs and failures + +### Phase 7: GPU Fallback Guidance (Added) +**Status:** ✅ DONE + +Based on flash-quickstart test failure (RTX 4090 unavailable), added: + +1. **Queue timeout thresholds** - When to wait vs try fallback +2. **Fallback GPU order** - L4 → A4000 → RTX 3090 +3. **Cloud type fallbacks** - Secure → Community +4. **Status marking guidance** - PASS/PARTIAL/FAIL based on GPU used + +--- + +## Discussion Points + +1. **How often should full suite run?** Weekly? Monthly? On-demand only? +2. **Budget for test runs?** ~$5-10 per full run was mentioned +3. **Who reviews test reports?** Auto-file issues for failures? +4. **Should we version the test definitions?** Track which tests existed at which doc version? + +--- + +## Next Steps + +1. ~~Fix blockers (#1, #2, #3, #4, #5) immediately~~ ✅ All complete +2. 
Merge PR with fixes +3. Create issues for Phase 1-5 improvements +4. Discuss automation priorities with team diff --git a/tests/README.md b/tests/README.md index f4179f16..3f9fc0e9 100644 --- a/tests/README.md +++ b/tests/README.md @@ -32,11 +32,20 @@ Run the vllm-deploy test using local docs All tests are defined in [TESTS.md](./TESTS.md) as a table with: - **ID**: Test identifier - **Goal**: What the user wants (one sentence) -- **Cleanup**: Resource types to delete after test +- **Expected Outcome**: What constitutes PASS + +**Smoke tests** are fast tests that don't require GPU deployments (SDK installs, read-only API calls, public endpoints). + +Cleanup rules are defined globally at the bottom of TESTS.md. All test resources use the `doc_test_` prefix. ## Reports -Reports are saved to `reports/` (gitignored) and include: -- What worked / what didn't -- Where the agent got stuck -- Documentation improvements needed +Reports are saved to two locations: +- `reports/` (gitignored, in repo) +- `~/Dev/doc-tests/` (persistent local archive) + +Each report includes: +- Git SHA and branch +- Steps taken +- Actual vs expected results +- Documentation gaps and suggestions diff --git a/tests/TESTS.md b/tests/TESTS.md index 0f733b84..0a3b606f 100644 --- a/tests/TESTS.md +++ b/tests/TESTS.md @@ -14,6 +14,10 @@ Run the flash-quickstart test Run all vLLM tests ``` +``` +Run smoke tests +``` + ### Doc Source Modes **Published docs (default)** - Uses the Runpod Docs MCP server to search published documentation: @@ -28,32 +32,71 @@ Run the vllm-deploy test using local docs When using local docs, the agent will search and read `.mdx` files in this repository instead of querying the MCP server. +### Test Tiers + +**Smoke tests** - Fast tests that don't deploy GPU resources. Use for quick validation: +``` +Run smoke tests +Run all smoke tests using local docs +``` + +**Full tests** - All tests including GPU deployments. Use for comprehensive validation. 
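
For the `sdk-python-install` smoke test, "import succeeds" can be checked without side effects. A sketch of how an automated check might verify it, stdlib only — the module names below are placeholders so the sketch runs anywhere, not a claim about how the tests are implemented:

```python
import importlib.util


def sdk_importable(module_name: str) -> bool:
    """Return True if the top-level module resolves, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# The Runpod Python SDK installs as the `runpod` module; `json` stands in here.
print(sdk_importable("json"))         # True
print(sdk_importable("no_such_sdk"))  # False
```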
+ ## Test Format Each test has: - **ID**: Unique identifier for the test - **Goal**: What a user would ask (one sentence, no hints) -- **Cleanup**: Resources to delete after test (all use `doc_test_*` prefix) +- **Expected Outcome**: What constitutes PASS (objective, measurable) + +Cleanup rules are defined in the [Cleanup Rules](#cleanup-rules) section at the bottom. All test resources use the `doc_test_` prefix. + +--- + +## Smoke Tests + +Fast tests that don't require GPU deployments. Run these for quick validation. + +| ID | Goal | Expected Outcome | +|----|------|------------------| +| sdk-python-install | Install the Runpod Python SDK | `import runpod` succeeds | +| sdk-js-install | Install the Runpod JavaScript SDK | `require('runpod-sdk')` succeeds | +| cli-install | Install runpodctl on your local machine | `runpodctl version` returns version | +| cli-configure | Configure runpodctl with your API key | `runpodctl config` shows configured key | +| cli-list-pods | List pods using runpodctl | `runpodctl get pods` returns list | +| template-list | List all templates | API returns template array | +| api-key-create | Create an API key with specific permissions | New API key ID returned | +| pods-add-ssh-key | Add an SSH key to your Runpod account | Key appears in account | +| public-flux | Generate an image using FLUX public endpoint | Image data returned | +| public-qwen | Use the Qwen3 32B public endpoint | Chat completion returned | +| public-video | Generate video using WAN public endpoint | Video generation starts | +| serverless-metrics | View endpoint metrics (execution time, delay) | Metrics data returned | + +**Run smoke tests:** +``` +Run smoke tests +Run all smoke tests using local docs +``` --- ## Flash SDK -| ID | Goal | Difficulty | -|----|------|------------| -| flash-quickstart | Deploy a GPU function using Flash | Easy | -| flash-hello-gpu | Run a simple PyTorch function on a GPU | Easy | -| flash-sdxl | Generate an image using SDXL with Flash 
| Medium | -| flash-text-gen | Deploy a text generation model with Flash | Medium | -| flash-dependencies | Deploy a function with custom pip dependencies | Easy | -| flash-multi-gpu | Create an endpoint that uses multiple GPUs | Medium | -| flash-cpu-endpoint | Deploy a CPU-only endpoint with Flash | Easy | -| flash-load-balancer | Build a REST API with load balancing using Flash | Hard | -| flash-mixed-workers | Create an app with both GPU and CPU workers | Hard | -| flash-env-vars | Configure environment variables for a Flash endpoint | Easy | -| flash-idle-timeout | Set a custom idle timeout for a Flash endpoint | Easy | -| flash-app-deploy | Initialize and deploy a complete Flash app | Medium | -| flash-local-test | Test a Flash function locally before deploying | Medium | +| ID | Goal | Expected Outcome | +|----|------|------------------| +| flash-quickstart | Deploy a GPU function using Flash | Endpoint responds to request | +| flash-hello-gpu | Run a simple PyTorch function on a GPU | PyTorch GPU tensor returned | +| flash-sdxl | Generate an image using SDXL with Flash | Image bytes returned | +| flash-text-gen | Deploy a text generation model with Flash | Generated text returned | +| flash-dependencies | Deploy a function with custom pip dependencies | Function using deps succeeds | +| flash-multi-gpu | Create an endpoint that uses multiple GPUs | Multi-GPU endpoint responds | +| flash-cpu-endpoint | Deploy a CPU-only endpoint with Flash | CPU endpoint responds | +| flash-load-balancer | Build a REST API with load balancing using Flash | Multiple routes respond | +| flash-mixed-workers | Create an app with both GPU and CPU workers | Both worker types respond | +| flash-env-vars | Configure environment variables for a Flash endpoint | Env vars accessible in function | +| flash-idle-timeout | Set a custom idle timeout for a Flash endpoint | Timeout visible in config | +| flash-app-deploy | Initialize and deploy a complete Flash app | App deploys successfully 
| +| flash-local-test | Test a Flash function locally before deploying | Local test passes | --- @@ -61,27 +104,27 @@ Each test has: > **Important:** Do NOT use public endpoints for these tests. The goal is to test the full deployment workflow: deploy an endpoint, send requests, and verify the integration works. Public endpoints are a separate product and skip the deployment steps we need to validate. -| ID | Goal | Difficulty | -|----|------|------------| -| serverless-create-endpoint | Create a serverless endpoint | Medium | -| serverless-serve-qwen | Create an endpoint to serve a Qwen model | Hard | -| serverless-custom-handler | Write a custom handler function and deploy it | Hard | -| serverless-logs | Build a custom handler that uses progress_update() to send log messages, deploy it, and verify updates appear in /status polling | Hard | -| serverless-send-request | Send a request to an existing endpoint | Easy | -| serverless-async-request | Submit an async job and poll for results | Medium | -| serverless-sync-request | Make a synchronous request to an endpoint using /runsync | Easy | -| serverless-streaming | Build a custom handler that uses yield to stream results, deploy it, and test the /stream endpoint | Hard | -| serverless-webhook | Set up webhook notifications for a serverless endpoint | Medium | -| serverless-cancel-job | Cancel a running or queued job | Easy | -| serverless-queue-delay | Create an endpoint with queue delay scaling | Medium | -| serverless-request-count | Create an endpoint with request count scaling | Medium | -| serverless-min-workers | Create an endpoint with 1 minimum active worker | Easy | -| serverless-idle-timeout | Create an endpoint with an idle timeout of 20 seconds | Easy | -| serverless-gpu-priority | Create an endpoint with GPU type priority/fallback | Medium | -| serverless-docker-deploy | Deploy an endpoint from Docker Hub | Hard | -| serverless-github-deploy | Deploy an endpoint from GitHub | Hard | -| 
serverless-ssh-worker | SSH into a running worker for debugging | Medium | -| serverless-metrics | View endpoint metrics (execution time, delay) | Easy | +| ID | Goal | Expected Outcome | +|----|------|------------------| +| serverless-create-endpoint | Create a serverless endpoint | Endpoint ID returned | +| serverless-serve-qwen | Create an endpoint to serve a Qwen model | Chat completion works | +| serverless-custom-handler | Write a custom handler function and deploy it | Handler responds to request | +| serverless-logs | Build a custom handler that uses progress_update() to send log messages, deploy it, and verify updates appear in /status polling | Progress updates in /status | +| serverless-send-request | Send a request to an existing endpoint | Response received | +| serverless-async-request | Submit an async job and poll for results | Job completes, output returned | +| serverless-sync-request | Make a synchronous request to an endpoint using /runsync | Sync response returned | +| serverless-streaming | Build a custom handler that uses yield to stream results, deploy it, and test the /stream endpoint | Streamed chunks received | +| serverless-webhook | Set up webhook notifications for a serverless endpoint | Webhook receives callback | +| serverless-cancel-job | Cancel a running or queued job | Job status is CANCELLED | +| serverless-queue-delay | Create an endpoint with queue delay scaling | Scaler type is QUEUE_DELAY | +| serverless-request-count | Create an endpoint with request count scaling | Scaler type is REQUEST_COUNT | +| serverless-min-workers | Create an endpoint with 1 minimum active worker | workersMin is 1 | +| serverless-idle-timeout | Create an endpoint with an idle timeout of 20 seconds | idleTimeout is 20 | +| serverless-gpu-priority | Create an endpoint with GPU type priority/fallback | Multiple GPU types listed | +| serverless-docker-deploy | Deploy an endpoint from Docker Hub | Endpoint from Docker image | +| serverless-github-deploy | 
Deploy an endpoint from GitHub | Endpoint from GitHub repo | +| serverless-ssh-worker | SSH into a running worker for debugging | SSH session established | +| serverless-metrics | View endpoint metrics (execution time, delay) | Metrics data returned | --- @@ -89,148 +132,148 @@ Each test has: > **Important:** Do NOT use public endpoints for these tests. Deploy your own vLLM endpoint to test the full workflow. Public endpoints skip the deployment and configuration steps we need to validate. -| ID | Goal | Difficulty | -|----|------|------------| -| vllm-deploy | Deploy a vLLM endpoint | Medium | -| vllm-openai-compat | Use the OpenAI Python client with a vLLM endpoint | Medium | -| vllm-chat-completion | Send a chat completion request to vLLM | Easy | -| vllm-streaming | Stream responses from a vLLM endpoint | Medium | -| vllm-custom-model | Deploy a custom/fine-tuned model with vLLM | Hard | -| vllm-gated-model | Deploy a gated Hugging Face model with vLLM | Medium | +| ID | Goal | Expected Outcome | +|----|------|------------------| +| vllm-deploy | Deploy a vLLM endpoint | Endpoint responds to /health | +| vllm-openai-compat | Use the OpenAI Python client with a vLLM endpoint | OpenAI client call succeeds | +| vllm-chat-completion | Send a chat completion request to vLLM | Chat response returned | +| vllm-streaming | Stream responses from a vLLM endpoint | Streamed tokens received | +| vllm-custom-model | Deploy a custom/fine-tuned model with vLLM | Custom model responds | +| vllm-gated-model | Deploy a gated Hugging Face model with vLLM | Gated model loads and responds | --- ## Pods -| ID | Goal | Difficulty | -|----|------|------------| -| pods-quickstart-terminal | Complete the Pod quickstart using only the terminal | Easy | -| pods-create | Create a GPU Pod | Medium | -| pods-start-stop | Start and stop an existing Pod | Easy | -| pods-ssh-connect | Connect to a Pod via SSH | Medium | -| pods-expose-port | Expose a custom port on a Pod | Medium | -| 
pods-env-vars | Set environment variables on a Pod | Easy | -| pods-resize-storage | Resize a Pod's container or volume disk | Easy | -| pods-template-use | Deploy a Pod using a custom template | Medium | -| pods-template-create | Create a custom Pod template | Hard | -| pods-comfyui | Deploy ComfyUI on a Pod and generate an image | Hard | +| ID | Goal | Expected Outcome | +|----|------|------------------| +| pods-quickstart-terminal | Complete the Pod quickstart using only the terminal | Code runs on Pod via SSH | +| pods-add-ssh-key | Add an SSH key to your Runpod account | Key appears in account | +| pods-create | Create a GPU Pod | Pod status is RUNNING | +| pods-start-stop | Start and stop an existing Pod | Pod starts and stops | +| pods-ssh-connect | Connect to a Pod via SSH | SSH session established | +| pods-expose-port | Expose a custom port on a Pod | Port accessible via URL | +| pods-env-vars | Set environment variables on a Pod | Env vars visible in Pod | +| pods-resize-storage | Resize a Pod's container or volume disk | Storage size increased | +| pods-template-use | Deploy a Pod using a custom template | Pod uses template config | +| pods-template-create | Create a custom Pod template | Template ID returned | +| pods-comfyui | Deploy ComfyUI on a Pod and generate an image | ComfyUI generates image | --- ## Storage -| ID | Goal | Difficulty | -|----|------|------------| -| storage-create-volume | Create a network volume | Easy | -| storage-attach-pod | Attach a network volume to a Pod | Medium | -| storage-attach-serverless | Attach a network volume to a Serverless endpoint | Medium | -| storage-s3-api | Access a network volume using the S3 API | Hard | -| storage-upload-s3 | Upload a file to a network volume using S3 | Hard | -| storage-download-s3 | Download a file from a network volume using S3 | Hard | -| storage-runpodctl-send | Transfer files between Pods using runpodctl | Easy | -| storage-migrate-volume | Migrate data between network volumes | 
Hard | -| storage-cloud-sync | Sync data with cloud storage (S3, GCS) | Hard | -| storage-scp-transfer | Transfer files to a Pod using SCP | Medium | -| storage-rsync | Sync files to a Pod using rsync | Medium | +| ID | Goal | Expected Outcome | +|----|------|------------------| +| storage-create-volume | Create a network volume | Volume ID returned | +| storage-attach-pod | Attach a network volume to a Pod | Volume mounted in Pod | +| storage-attach-serverless | Attach a network volume to a Serverless endpoint | Volume accessible to workers | +| storage-s3-api | Access a network volume using the S3 API | S3 list/read works | +| storage-upload-s3 | Upload a file to a network volume using S3 | File appears on volume | +| storage-download-s3 | Download a file from a network volume using S3 | File downloaded locally | +| storage-runpodctl-send | Transfer files between Pods using runpodctl | File arrives on target Pod | +| storage-migrate-volume | Migrate data between network volumes | Data exists on new volume | +| storage-cloud-sync | Sync data with cloud storage (S3, GCS) | Data synced both ways | +| storage-scp-transfer | Transfer files to a Pod using SCP | File arrives on Pod | +| storage-rsync | Sync files to a Pod using rsync | Files synced to Pod | --- ## Templates -| ID | Goal | Difficulty | -|----|------|------------| -| template-create-pod | Create a Pod template | Medium | -| template-create-serverless | Create a Serverless template | Medium | -| template-list | List all templates | Easy | -| template-preload-model | Create a template with a pre-loaded model | Hard | -| template-custom-dockerfile | Create a template with a custom Dockerfile | Hard | -| template-env-vars | Add environment variables to a template | Easy | +| ID | Goal | Expected Outcome | +|----|------|------------------| +| template-create-pod | Create a Pod template | Template ID returned | +| template-create-serverless | Create a Serverless template | Template ID returned | +| 
template-list | List all templates | Template array returned | +| template-preload-model | Create a template with a pre-loaded model | Model preloads on start | +| template-custom-dockerfile | Create a template with a custom Dockerfile | Template uses custom image | +| template-env-vars | Add environment variables to a template | Env vars in template config | --- ## Instant Clusters -| ID | Goal | Difficulty | -|----|------|------------| -| cluster-create | Create an Instant Cluster | Medium | -| cluster-pytorch | Run distributed PyTorch training on a cluster | Hard | -| cluster-slurm | Deploy a Slurm cluster | Hard | -| cluster-axolotl | Fine-tune an LLM with Axolotl on a cluster | Hard | +| ID | Goal | Expected Outcome | +|----|------|------------------| +| cluster-create | Create an Instant Cluster | Cluster nodes are RUNNING | +| cluster-pytorch | Run distributed PyTorch training on a cluster | Training completes on all nodes | +| cluster-slurm | Deploy a Slurm cluster | Slurm queue accepts jobs | +| cluster-axolotl | Fine-tune an LLM with Axolotl on a cluster | Fine-tuning starts | --- ## SDKs & APIs -| ID | Goal | Difficulty | -|----|------|------------| -| sdk-python-install | Install the Runpod Python SDK | Easy | -| sdk-python-endpoint | Use the Python SDK to call an endpoint | Easy | -| sdk-js-install | Install the Runpod JavaScript SDK | Easy | -| sdk-js-endpoint | Use the JavaScript SDK to call an endpoint | Easy | -| api-graphql-query | Make a GraphQL query to list pods | Medium | -| api-graphql-mutation | Create a resource using GraphQL mutation | Medium | -| api-key-create | Create an API key with specific permissions | Easy | -| api-key-restricted | Create a restricted API key | Medium | +| ID | Goal | Expected Outcome | +|----|------|------------------| +| sdk-python-install | Install the Runpod Python SDK | `import runpod` succeeds | +| sdk-python-endpoint | Use the Python SDK to call an endpoint | SDK call returns response | +| sdk-js-install | 
Install the Runpod JavaScript SDK | `require('runpod-sdk')` succeeds | +| sdk-js-endpoint | Use the JavaScript SDK to call an endpoint | SDK call returns response | +| api-graphql-query | Make a GraphQL query to list pods | Query returns pod list | +| api-graphql-mutation | Create a resource using GraphQL mutation | Resource created via mutation | +| api-key-create | Create an API key with specific permissions | New API key ID returned | +| api-key-restricted | Create a restricted API key | Key has limited permissions | --- ## CLI (runpodctl) -| ID | Goal | Difficulty | -|----|------|------------| -| cli-install | Install runpodctl on your local machine | Easy | -| cli-configure | Configure runpodctl with your API key | Easy | -| cli-list-pods | List pods using runpodctl | Easy | -| cli-create-pod | Create a pod using runpodctl | Medium | -| cli-send-file | Send a file to a Pod using runpodctl | Medium | -| cli-receive-file | Receive a file from a Pod using runpodctl | Medium | +| ID | Goal | Expected Outcome | +|----|------|------------------| +| cli-install | Install runpodctl on your local machine | `runpodctl version` returns version | +| cli-configure | Configure runpodctl with your API key | `runpodctl config` shows key | +| cli-list-pods | List pods using runpodctl | `runpodctl get pods` returns list | +| cli-create-pod | Create a pod using runpodctl | Pod ID returned | +| cli-send-file | Send a file to a Pod using runpodctl | File arrives on Pod | +| cli-receive-file | Receive a file from a Pod using runpodctl | File downloaded locally | --- ## Model Caching -| ID | Goal | Difficulty | -|----|------|------------| -| cache-enable | Create an endpoint with model caching enabled | Medium | +| ID | Goal | Expected Outcome | +|----|------|------------------| +| cache-enable | Create an endpoint with model caching enabled | Caching enabled in config | --- ## Integrations -| ID | Goal | Difficulty | -|----|------|------------| -| integration-openai-migrate | 
Create an OpenAI-compatible endpoint | Medium | -| integration-vercel-ai | Create an image generation app with the Vercel AI SDK | Medium | -| integration-cursor | Configure Cursor to use Runpod endpoints | Medium | -| integration-skypilot | Use Runpod with SkyPilot | Hard | +| ID | Goal | Expected Outcome | +|----|------|------------------| +| integration-openai-migrate | Create an OpenAI-compatible endpoint | OpenAI client works | +| integration-vercel-ai | Create an image generation app with the Vercel AI SDK | Image generated via Vercel AI | +| integration-cursor | Configure Cursor to use Runpod endpoints | Cursor uses Runpod backend | +| integration-skypilot | Use Runpod with SkyPilot | SkyPilot launches on Runpod | --- ## Public Endpoints -| ID | Goal | Difficulty | -|----|------|------------| -| public-flux | Generate an image using FLUX public endpoint | Easy | -| public-qwen | Use the Qwen3 32B public endpoint | Easy | -| public-video | Generate video using WAN public endpoint | Medium | +| ID | Goal | Expected Outcome | +|----|------|------------------| +| public-flux | Generate an image using FLUX public endpoint | Image data returned | +| public-qwen | Use the Qwen3 32B public endpoint | Chat completion returned | +| public-video | Generate video using WAN public endpoint | Video generation starts | --- ## Tutorials (End-to-End) -| ID | Goal | Difficulty | -|----|------|------------| -| tutorial-sdxl-serverless | Deploy SDXL as a serverless endpoint | Medium | -| tutorial-comfyui-pod | Deploy ComfyUI on a Pod and generate an image | Medium | -| tutorial-comfyui-serverless | Deploy ComfyUI as a serverless endpoint and generate an image | Hard | -| tutorial-gemma-chatbot | Deploy a Gemma 3 chatbot with vLLM | Medium | -| tutorial-custom-worker | Build and deploy a custom worker | Hard | -| tutorial-web-integration | Integrate a Serverless endpoint into a web application | Hard | -| tutorial-dual-mode-worker | Deploy a dual-mode (Pod/Serverless) worker | 
Hard | -| tutorial-model-caching | Create an endpoint with model caching enabled | Hard | -| tutorial-pytorch-cluster | Deploy a PyTorch cluster | Hard | +| ID | Goal | Expected Outcome | +|----|------|------------------| +| tutorial-sdxl-serverless | Deploy SDXL as a serverless endpoint | SDXL generates image | +| tutorial-comfyui-pod | Deploy ComfyUI on a Pod and generate an image | ComfyUI workflow executes | +| tutorial-comfyui-serverless | Deploy ComfyUI as a serverless endpoint and generate an image | ComfyUI endpoint generates image | +| tutorial-gemma-chatbot | Deploy a Gemma 3 chatbot with vLLM | Chatbot responds | +| tutorial-custom-worker | Build and deploy a custom worker | Custom worker responds | +| tutorial-web-integration | Integrate a Serverless endpoint into a web application | Web app calls endpoint | +| tutorial-dual-mode-worker | Deploy a dual-mode (Pod/Serverless) worker | Both modes work | +| tutorial-model-caching | Create an endpoint with model caching enabled | Caching improves cold start | +| tutorial-pytorch-cluster | Deploy a PyTorch cluster | Distributed training runs | ---- --- ## Cleanup Rules diff --git a/tests/scripts/README.md b/tests/scripts/README.md new file mode 100644 index 00000000..f0f132fa --- /dev/null +++ b/tests/scripts/README.md @@ -0,0 +1,75 @@ +# Test Scripts + +Utility scripts for the documentation testing framework. + +## cleanup.py + +Finds and deletes Runpod resources matching the test prefix (`doc_test_*`). + +```bash +# Dry run - see what would be deleted +python cleanup.py + +# Actually delete resources +python cleanup.py --delete + +# Use custom prefix +python cleanup.py --prefix my_test_ +``` + +**Requirements:** `requests`, `RUNPOD_API_KEY` env var + +## report.py + +Generates a test report template with metadata pre-filled. 
+ +```bash +# Generate report for a passing test +python report.py pods-quickstart-terminal PASS + +# Mark as using local docs +python report.py pods-quickstart-terminal PASS --local + +# Generate report for a failing test +python report.py flash-quickstart FAIL +``` + +**Output:** Creates report in both: +- `tests/reports/<test-id>-<timestamp>.md` +- `~/Dev/doc-tests/<test-id>-<timestamp>.md` + +The template includes: +- Timestamp, git SHA, branch +- Test goal and expected outcome (from TESTS.md) +- Placeholder sections for you to fill in + +## stats.py + +Analyzes historical test reports to show pass rates and trends. + +```bash +# Show overall summary +python stats.py + +# Group by test ID +python stats.py --by-test + +# Show last 10 reports +python stats.py --recent 10 + +# Show only failures +python stats.py --failures +``` + +**Data source:** Reads reports from `~/Dev/doc-tests/` + +## CI Integration + +Add to GitHub Actions for scheduled cleanup: + +```yaml +- name: Cleanup orphaned test resources + env: + RUNPOD_API_KEY: ${{ secrets.RUNPOD_API_KEY }} + run: python tests/scripts/cleanup.py --delete +``` diff --git a/tests/scripts/cleanup.py b/tests/scripts/cleanup.py new file mode 100755 index 00000000..9779de99 --- /dev/null +++ b/tests/scripts/cleanup.py @@ -0,0 +1,218 @@ +#!/usr/bin/env python3 +""" +Cleanup script for documentation agent tests. + +Deletes all Runpod resources matching the test prefix (doc_test_*). +Can be run manually or scheduled in CI to catch orphaned resources. + +Usage: + python cleanup.py # Dry run (list only) + python cleanup.py --delete # Actually delete resources + python cleanup.py --prefix my_ # Use custom prefix + +Requires: + RUNPOD_API_KEY environment variable +""" + +import argparse +import os +import sys +from typing import Any + +try: + import requests +except ImportError: + print("Error: requests library required. 
Install with: pip install requests") + sys.exit(1) + + +API_BASE = "https://rest.runpod.io/v1" +DEFAULT_PREFIX = "doc_test_" + + +def get_headers() -> dict: + """Get authorization headers.""" + api_key = os.environ.get("RUNPOD_API_KEY") + if not api_key: + print("Error: RUNPOD_API_KEY environment variable not set") + sys.exit(1) + return {"Authorization": f"Bearer {api_key}"} + + +def list_pods(prefix: str) -> list[dict[str, Any]]: + """List pods matching prefix.""" + resp = requests.get(f"{API_BASE}/pods", headers=get_headers()) + resp.raise_for_status() + pods = resp.json() + if isinstance(pods, dict): + pods = pods.get("pods", []) + return [p for p in pods if p.get("name", "").startswith(prefix)] + + +def list_endpoints(prefix: str) -> list[dict[str, Any]]: + """List serverless endpoints matching prefix.""" + resp = requests.get(f"{API_BASE}/endpoints", headers=get_headers()) + resp.raise_for_status() + endpoints = resp.json() + if isinstance(endpoints, dict): + endpoints = endpoints.get("endpoints", []) + return [e for e in endpoints if e.get("name", "").startswith(prefix)] + + +def list_templates(prefix: str) -> list[dict[str, Any]]: + """List templates matching prefix.""" + resp = requests.get(f"{API_BASE}/templates", headers=get_headers()) + resp.raise_for_status() + templates = resp.json() + if isinstance(templates, dict): + templates = templates.get("templates", []) + return [t for t in templates if t.get("name", "").startswith(prefix)] + + +def list_network_volumes(prefix: str) -> list[dict[str, Any]]: + """List network volumes matching prefix.""" + resp = requests.get(f"{API_BASE}/network-volumes", headers=get_headers()) + resp.raise_for_status() + volumes = resp.json() + if isinstance(volumes, dict): + volumes = volumes.get("networkVolumes", []) + return [v for v in volumes if v.get("name", "").startswith(prefix)] + + +def delete_pod(pod_id: str) -> bool: + """Delete a pod by ID.""" + resp = requests.delete(f"{API_BASE}/pods/{pod_id}", 
headers=get_headers()) + return resp.status_code == 200 + + +def delete_endpoint(endpoint_id: str) -> bool: + """Delete an endpoint by ID.""" + resp = requests.delete(f"{API_BASE}/endpoints/{endpoint_id}", headers=get_headers()) + return resp.status_code == 200 + + +def delete_template(template_id: str) -> bool: + """Delete a template by ID.""" + resp = requests.delete(f"{API_BASE}/templates/{template_id}", headers=get_headers()) + return resp.status_code == 200 + + +def delete_network_volume(volume_id: str) -> bool: + """Delete a network volume by ID.""" + resp = requests.delete( + f"{API_BASE}/network-volumes/{volume_id}", headers=get_headers() + ) + return resp.status_code == 200 + + +def main(): + parser = argparse.ArgumentParser( + description="Clean up test resources matching prefix" + ) + parser.add_argument( + "--delete", action="store_true", help="Actually delete (default: dry run)" + ) + parser.add_argument( + "--prefix", default=DEFAULT_PREFIX, help=f"Resource prefix (default: {DEFAULT_PREFIX})" + ) + args = parser.parse_args() + + prefix = args.prefix + dry_run = not args.delete + + if dry_run: + print(f"DRY RUN - Looking for resources matching '{prefix}*'\n") + else: + print(f"DELETING resources matching '{prefix}*'\n") + + # Track totals + found = {"pods": 0, "endpoints": 0, "templates": 0, "volumes": 0} + deleted = {"pods": 0, "endpoints": 0, "templates": 0, "volumes": 0} + + # Pods + print("Pods:") + pods = list_pods(prefix) + found["pods"] = len(pods) + if not pods: + print(" (none found)") + for pod in pods: + pod_id = pod.get("id") + name = pod.get("name") + if dry_run: + print(f" Would delete: {name} ({pod_id})") + else: + if delete_pod(pod_id): + print(f" Deleted: {name} ({pod_id})") + deleted["pods"] += 1 + else: + print(f" Failed to delete: {name} ({pod_id})") + + # Endpoints + print("\nEndpoints:") + endpoints = list_endpoints(prefix) + found["endpoints"] = len(endpoints) + if not endpoints: + print(" (none found)") + for endpoint in 
endpoints: + endpoint_id = endpoint.get("id") + name = endpoint.get("name") + if dry_run: + print(f" Would delete: {name} ({endpoint_id})") + else: + if delete_endpoint(endpoint_id): + print(f" Deleted: {name} ({endpoint_id})") + deleted["endpoints"] += 1 + else: + print(f" Failed to delete: {name} ({endpoint_id})") + + # Templates + print("\nTemplates:") + templates = list_templates(prefix) + found["templates"] = len(templates) + if not templates: + print(" (none found)") + for template in templates: + template_id = template.get("id") + name = template.get("name") + if dry_run: + print(f" Would delete: {name} ({template_id})") + else: + if delete_template(template_id): + print(f" Deleted: {name} ({template_id})") + deleted["templates"] += 1 + else: + print(f" Failed to delete: {name} ({template_id})") + + # Network Volumes + print("\nNetwork Volumes:") + volumes = list_network_volumes(prefix) + found["volumes"] = len(volumes) + if not volumes: + print(" (none found)") + for volume in volumes: + volume_id = volume.get("id") + name = volume.get("name") + if dry_run: + print(f" Would delete: {name} ({volume_id})") + else: + if delete_network_volume(volume_id): + print(f" Deleted: {name} ({volume_id})") + deleted["volumes"] += 1 + else: + print(f" Failed to delete: {name} ({volume_id})") + + # Summary + print("\n" + "=" * 40) + total_found = sum(found.values()) + total_deleted = sum(deleted.values()) + + if dry_run: + print(f"Found {total_found} resources matching '{prefix}*'") + if total_found > 0: + print("Run with --delete to remove them") + else: + print(f"Deleted {total_deleted}/{total_found} resources") + + +if __name__ == "__main__": + main() diff --git a/tests/scripts/report.py b/tests/scripts/report.py new file mode 100755 index 00000000..36422497 --- /dev/null +++ b/tests/scripts/report.py @@ -0,0 +1,143 @@ +#!/usr/bin/env python3 +""" +Report generator for documentation tests. + +Generates a report template with metadata pre-filled (timestamp, git info). 
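The four `list_*` helpers in the cleanup script above all reduce to the same step: fetch a collection, unwrap the wrapper key if present, and keep only names carrying the test prefix. A network-free sketch of that filtering step (the sample payload is fabricated; nothing here calls the Runpod API):

```python
from typing import Any

DEFAULT_PREFIX = "doc_test_"


def filter_by_prefix(
    resources: list[dict[str, Any]], prefix: str = DEFAULT_PREFIX
) -> list[dict[str, Any]]:
    """Keep resources whose name starts with the test prefix.

    Mirrors the list_* helpers above: entries without a "name" key
    are treated as non-matching instead of raising a KeyError.
    """
    return [r for r in resources if r.get("name", "").startswith(prefix)]


# Fabricated payload shaped like an unwrapped /v1/pods response.
pods = [
    {"id": "abc123", "name": "doc_test_pod_1"},
    {"id": "def456", "name": "production-api"},
    {"id": "ghi789"},  # no name key: skipped safely
]

matches = filter_by_prefix(pods)
print([p["id"] for p in matches])  # prints ['abc123']
```

Because the filter is pure, it can be unit-tested without an API key, unlike the request-bound versions in the script itself.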
+The agent fills in the remaining sections.
+
+Usage:
+    python report.py <test-id> <status> [--local]
+
+Arguments:
+    test-id: The test ID (e.g., pods-quickstart-terminal)
+    status: PASS, FAIL, or PARTIAL
+    --local: Mark as using local docs (default: published)
+
+Example:
+    python report.py pods-quickstart-terminal PASS --local
+"""
+
+import argparse
+import os
+import subprocess
+import sys
+from datetime import datetime
+from pathlib import Path
+
+
+def get_git_info() -> tuple[str, str]:
+    """Get current git SHA and branch."""
+    try:
+        sha = subprocess.check_output(
+            ["git", "rev-parse", "--short", "HEAD"],
+            stderr=subprocess.DEVNULL
+        ).decode().strip()
+    except Exception:
+        sha = "unknown"
+
+    try:
+        branch = subprocess.check_output(
+            ["git", "branch", "--show-current"],
+            stderr=subprocess.DEVNULL
+        ).decode().strip()
+    except Exception:
+        branch = "unknown"
+
+    return sha, branch
+
+
+def get_test_definition(test_id: str) -> tuple[str, str]:
+    """Look up test goal and expected outcome from TESTS.md."""
+    tests_file = Path(__file__).parent.parent / "TESTS.md"
+
+    if not tests_file.exists():
+        return "Unknown", "Unknown"
+
+    with open(tests_file) as f:
+        for line in f:
+            if line.startswith("|") and test_id in line:
+                parts = [p.strip() for p in line.split("|")]
+                if len(parts) >= 4 and parts[1] == test_id:
+                    return parts[2], parts[3]  # goal, expected outcome
+
+    return "Unknown", "Unknown"
+
+
+def generate_report(test_id: str, status: str, local: bool) -> str:
+    """Generate the report markdown."""
+    now = datetime.now()
+    timestamp = now.strftime("%Y-%m-%d %H:%M:%S")
+    sha, branch = get_git_info()
+    doc_source = "Local" if local else "Published"
+    goal, expected = get_test_definition(test_id)
+
+    return f"""# Test Report: {test_id}
+
+## Metadata
+| Field | Value |
+|-------|-------|
+| **Test ID** | {test_id} |
+| **Date** | {timestamp} |
+| **Git SHA** | {sha} |
+| **Git Branch** | {branch} |
+| **Doc Source** | {doc_source} |
+| **Status** | {status} |
+
+## Goal
+
+{goal} + +## Expected Outcome +{expected} + +## Actual Result + + +## Steps Taken + +1. +2. +3. + +## Documentation Gaps + + +## Suggestions + +""" + + +def main(): + parser = argparse.ArgumentParser(description="Generate test report template") + parser.add_argument("test_id", help="Test ID (e.g., pods-quickstart-terminal)") + parser.add_argument("status", choices=["PASS", "FAIL", "PARTIAL"], help="Test status") + parser.add_argument("--local", action="store_true", help="Mark as using local docs") + args = parser.parse_args() + + # Generate timestamp for filename + timestamp = datetime.now().strftime("%Y%m%d-%H%M%S") + filename = f"{args.test_id}-{timestamp}.md" + + # Generate report content + content = generate_report(args.test_id, args.status, args.local) + + # Save to both locations + repo_reports = Path(__file__).parent.parent / "reports" + archive_reports = Path.home() / "Dev" / "doc-tests" + + repo_reports.mkdir(exist_ok=True) + archive_reports.mkdir(exist_ok=True) + + repo_path = repo_reports / filename + archive_path = archive_reports / filename + + repo_path.write_text(content) + archive_path.write_text(content) + + print(f"Report template created:") + print(f" - {repo_path}") + print(f" - {archive_path}") + print(f"\nEdit the report to fill in: Actual Result, Steps Taken, Documentation Gaps, Suggestions") + + +if __name__ == "__main__": + main() diff --git a/tests/scripts/stats.py b/tests/scripts/stats.py new file mode 100755 index 00000000..95cd6c16 --- /dev/null +++ b/tests/scripts/stats.py @@ -0,0 +1,170 @@ +#!/usr/bin/env python3 +""" +Test statistics analyzer. + +Analyzes historical test reports to show pass rates and trends. 
+ +Usage: + python stats.py # Show overall stats + python stats.py --by-test # Group by test ID + python stats.py --recent 10 # Show last 10 reports + python stats.py --failures # Show only failures +""" + +import argparse +import re +from collections import defaultdict +from datetime import datetime +from pathlib import Path + + +def parse_report(path: Path) -> dict | None: + """Parse a report file and extract metadata.""" + try: + content = path.read_text() + + # Extract metadata from table + test_id_match = re.search(r"\*\*Test ID\*\*\s*\|\s*(\S+)", content) + date_match = re.search(r"\*\*Date\*\*\s*\|\s*(.+)", content) + status_match = re.search(r"\*\*Status\*\*\s*\|\s*(\S+)", content) + doc_source_match = re.search(r"\*\*Doc Source\*\*\s*\|\s*(\S+)", content) + git_sha_match = re.search(r"\*\*Git SHA\*\*\s*\|\s*(\S+)", content) + + if not all([test_id_match, status_match]): + return None + + return { + "file": path.name, + "test_id": test_id_match.group(1), + "date": date_match.group(1).strip() if date_match else "Unknown", + "status": status_match.group(1), + "doc_source": doc_source_match.group(1) if doc_source_match else "Unknown", + "git_sha": git_sha_match.group(1) if git_sha_match else "Unknown", + } + except Exception as e: + print(f"Warning: Could not parse {path}: {e}") + return None + + +def load_reports() -> list[dict]: + """Load all reports from the archive directory.""" + archive_dir = Path.home() / "Dev" / "doc-tests" + + if not archive_dir.exists(): + print(f"Archive directory not found: {archive_dir}") + return [] + + reports = [] + for path in sorted(archive_dir.glob("*.md")): + report = parse_report(path) + if report: + reports.append(report) + + return reports + + +def show_summary(reports: list[dict]): + """Show overall summary statistics.""" + if not reports: + print("No reports found.") + return + + total = len(reports) + passed = sum(1 for r in reports if r["status"] == "PASS") + failed = sum(1 for r in reports if r["status"] == "FAIL") 
+ partial = sum(1 for r in reports if r["status"] == "PARTIAL") + + pass_rate = (passed / total) * 100 if total > 0 else 0 + + print("=" * 50) + print("TEST SUMMARY") + print("=" * 50) + print(f"Total runs: {total}") + print(f"Passed: {passed} ({pass_rate:.1f}%)") + print(f"Failed: {failed}") + print(f"Partial: {partial}") + print("=" * 50) + + +def show_by_test(reports: list[dict]): + """Show statistics grouped by test ID.""" + if not reports: + print("No reports found.") + return + + by_test = defaultdict(list) + for r in reports: + by_test[r["test_id"]].append(r) + + print("=" * 60) + print(f"{'TEST ID':<35} {'RUNS':<6} {'PASS':<6} {'RATE':<8}") + print("=" * 60) + + for test_id in sorted(by_test.keys()): + runs = by_test[test_id] + total = len(runs) + passed = sum(1 for r in runs if r["status"] == "PASS") + rate = (passed / total) * 100 if total > 0 else 0 + print(f"{test_id:<35} {total:<6} {passed:<6} {rate:.0f}%") + + print("=" * 60) + + +def show_recent(reports: list[dict], count: int): + """Show most recent reports.""" + if not reports: + print("No reports found.") + return + + recent = reports[-count:] + + print("=" * 80) + print(f"{'DATE':<20} {'TEST ID':<30} {'STATUS':<10} {'SOURCE':<10}") + print("=" * 80) + + for r in reversed(recent): + print(f"{r['date']:<20} {r['test_id']:<30} {r['status']:<10} {r['doc_source']:<10}") + + print("=" * 80) + + +def show_failures(reports: list[dict]): + """Show only failed tests.""" + failures = [r for r in reports if r["status"] in ("FAIL", "PARTIAL")] + + if not failures: + print("No failures found!") + return + + print("=" * 80) + print("FAILURES AND PARTIAL PASSES") + print("=" * 80) + + for r in failures: + print(f"\n{r['test_id']} - {r['status']}") + print(f" Date: {r['date']}") + print(f" File: {r['file']}") + print(f" Git SHA: {r['git_sha']}") + + +def main(): + parser = argparse.ArgumentParser(description="Analyze test report statistics") + parser.add_argument("--by-test", action="store_true", help="Group by 
test ID") + parser.add_argument("--recent", type=int, metavar="N", help="Show last N reports") + parser.add_argument("--failures", action="store_true", help="Show only failures") + args = parser.parse_args() + + reports = load_reports() + + if args.by_test: + show_by_test(reports) + elif args.recent: + show_recent(reports, args.recent) + elif args.failures: + show_failures(reports) + else: + show_summary(reports) + + +if __name__ == "__main__": + main() From c0b46c328a2fc7c38ee1d171ece78a5fefe9aeb6 Mon Sep 17 00:00:00 2001 From: Mo King Date: Fri, 20 Mar 2026 13:01:35 -0400 Subject: [PATCH 6/8] Update styleguide with guidance on using cards for links --- .claude/style-guide.md | 14 ++++++++++++++ .cursor/rules/rp-styleguide.mdc | 14 ++++++++++++++ 2 files changed, 28 insertions(+) diff --git a/.claude/style-guide.md b/.claude/style-guide.md index fc584b87..00cbae5d 100644 --- a/.claude/style-guide.md +++ b/.claude/style-guide.md @@ -92,3 +92,17 @@ The `handler` function receives a job dictionary containing the input from the A - Use backticks for file paths: `serverless/workers/handler.py` - Use backticks for environment variables: `RUNPOD_API_KEY` - Use backticks for API endpoints: `/v2/endpoint_id/run` + +## Next Steps and Learn More Sections + +Use `CardGroup` with horizontal cards instead of bullet lists for "Next steps" and "Learn more" sections: + +```mdx + + + Brief description of the linked content. + + +``` + +Choose icons that match the content (e.g., `github` for repos, `terminal` for CLI, `book` for docs). diff --git a/.cursor/rules/rp-styleguide.mdc b/.cursor/rules/rp-styleguide.mdc index 8d9db909..f1704452 100644 --- a/.cursor/rules/rp-styleguide.mdc +++ b/.cursor/rules/rp-styleguide.mdc @@ -22,3 +22,17 @@ And number steps like this: "## Step 1: Create a widget" ... and so on. 
+ +### Next steps and learn more sections + +Use `CardGroup` with horizontal cards instead of bullet lists for "Next steps" and "Learn more" sections: + +```mdx + + + Brief description of the linked content. + + +``` + +Choose icons that match the content (e.g., `github` for repos, `terminal` for CLI, `book` for docs). From d6d0910e8b7a4792bd6e695ad907c827c0ee208a Mon Sep 17 00:00:00 2001 From: Mo King Date: Fri, 20 Mar 2026 13:02:49 -0400 Subject: [PATCH 7/8] remove testing improvement plan --- tests/IMPROVEMENT_PLAN.md | 146 -------------------------------------- 1 file changed, 146 deletions(-) delete mode 100644 tests/IMPROVEMENT_PLAN.md diff --git a/tests/IMPROVEMENT_PLAN.md b/tests/IMPROVEMENT_PLAN.md deleted file mode 100644 index 755725b8..00000000 --- a/tests/IMPROVEMENT_PLAN.md +++ /dev/null @@ -1,146 +0,0 @@ -# Testing Framework Improvement Plan - -Based on feedback from PR #561 review by @runpod-Henrik. - -## Immediate Fixes (Blockers) - -### 1. MCP tool name typo -**File:** `.claude/testing.md` line 31 -**Issue:** References `mcp__runpod-dops__search_runpod_documentation` but server is `runpod-docs` -**Fix:** Change to `mcp__runpod-docs__search_runpod_documentation` -**Status:** [x] DONE - -### 2. Test table format mismatch -**Files:** `tests/README.md`, `.claude/testing.md`, `tests/TESTS.md` -**Issue:** Docs say tables have `ID | Goal | Cleanup` but actual tables have `ID | Goal | Difficulty` -**Options:** -- A) Add Cleanup column back to tables (more explicit per-test) -- B) Update docs to say cleanup rules are global (simpler, current reality) -**Recommendation:** Option B - cleanup rules ARE global (by resource type), not per-test -**Status:** [x] DONE - Updated docs to describe actual format with global cleanup rules - -### 3. 
Port limit accuracy -**File:** `runpodctl/reference/runpodctl-create-pod.mdx` -**Issue:** Changed from "1 HTTP + 1 TCP" to "10 HTTP + multiple TCP" - needs verification -**Action:** Verify actual runpodctl behavior before merging -**Status:** [x] VERIFIED - `pods/configuration/expose-ports.mdx` confirms "Expose HTTP Ports (Max 10)" - -## Nits - -### 4. Missing trailing newline in .gitignore -**Status:** [x] DONE - -### 5. Double `---` separator in TESTS.md -**Status:** [x] DONE - ---- - -## Structural Improvements (Future Work) - -Henrik correctly identified that this is currently a **catalog**, not a **framework**. Here's a plan to evolve it: - -### Phase 1: Cleanup Safety Net (Quick Win) -**Status:** ✅ DONE - -Created `tests/scripts/cleanup.py`: -- Lists and deletes resources matching `doc_test_*` prefix -- Supports dry-run mode (default) and `--delete` flag -- Handles pods, endpoints, templates, and network volumes -- Can be run standalone or in CI - -Also updated `.claude/testing.md` with cleanup instructions for Claude Code. - -### Phase 2: Smoke Test Tier -**Status:** ✅ DONE - -Added 12 smoke tests that don't require GPU deploys: -- SDK installs: `sdk-python-install`, `sdk-js-install` -- CLI: `cli-install`, `cli-configure`, `cli-list-pods` -- Read-only: `template-list`, `serverless-metrics` -- Config: `api-key-create`, `pods-add-ssh-key` -- Public endpoints: `public-flux`, `public-qwen`, `public-video` - -Created separate "Smoke Tests" section in TESTS.md. -Updated `.claude/testing.md` with test tier instructions. - -### Phase 3: Success Criteria -**Status:** ✅ DONE - -Added "Expected Outcome" column to all test tables with objective, measurable criteria: -- `Pod status is RUNNING` -- `Endpoint responds to /health` -- `SSH session established` -- etc. - -Now each test has a clear PASS/FAIL condition. - -### Phase 4: Automation Layer -**Status:** ⏸️ DEFERRED - -Requires Claude Code in CI or custom API runner. Skipped for now - tests run manually. 
- -Options for future: -1. Claude Code headless mode (when available) -2. Custom runner script with Anthropic API -3. GitHub Action with Claude CLI - -### Phase 5: Results Tracking -**Status:** ✅ DONE - -- Reports saved to **two locations**: - - `tests/reports/` (gitignored, in repo) - - `~/Dev/doc-tests/` (persistent local archive) -- Enhanced report template with: - - Git SHA and branch - - Structured metadata table - - Steps taken section - - Actual vs expected results -- Instructions for comparing runs over time - -### Phase 6: Convenience Tooling (Added) -**Status:** ✅ DONE - -Based on trial run feedback, added: - -1. **`/test` command** (`.claude/commands/test.md`) - - Loads test definition and execution rules - - Supports `local` flag for local docs mode - - Supports `smoke` for running smoke tests - -2. **`report.py` script** (`tests/scripts/report.py`) - - Auto-generates report template with metadata - - Pulls goal and expected outcome from TESTS.md - - Saves to both report locations - -3. **`stats.py` script** (`tests/scripts/stats.py`) - - Analyzes historical test reports - - Shows pass rates overall and by test - - Lists recent runs and failures - -### Phase 7: GPU Fallback Guidance (Added) -**Status:** ✅ DONE - -Based on flash-quickstart test failure (RTX 4090 unavailable), added: - -1. **Queue timeout thresholds** - When to wait vs try fallback -2. **Fallback GPU order** - L4 → A4000 → RTX 3090 -3. **Cloud type fallbacks** - Secure → Community -4. **Status marking guidance** - PASS/PARTIAL/FAIL based on GPU used - ---- - -## Discussion Points - -1. **How often should full suite run?** Weekly? Monthly? On-demand only? -2. **Budget for test runs?** ~$5-10 per full run was mentioned -3. **Who reviews test reports?** Auto-file issues for failures? -4. **Should we version the test definitions?** Track which tests existed at which doc version? - ---- - -## Next Steps - -1. ~~Fix blockers (#1, #2, #3, #4, #5) immediately~~ ✅ All complete -2. 
Merge PR with fixes -3. Create issues for Phase 1-5 improvements -4. Discuss automation priorities with team From b6e21e1c9d01bf0ac9b4ec6556fae199c16c2249 Mon Sep 17 00:00:00 2001 From: Mo King Date: Fri, 20 Mar 2026 18:47:50 -0400 Subject: [PATCH 8/8] Add test batches --- .claude/commands/test.md | 120 ++++++++++++++++++++++---------- .claude/style-guide.md | 2 + .claude/testing.md | 75 ++++++++++++++++---- .cursor/rules/rp-styleguide.mdc | 4 +- CLAUDE.md | 4 +- tests/TESTS.md | 15 ++-- 6 files changed, 158 insertions(+), 62 deletions(-) diff --git a/.claude/commands/test.md b/.claude/commands/test.md index b29a703f..813fdb90 100644 --- a/.claude/commands/test.md +++ b/.claude/commands/test.md @@ -5,40 +5,94 @@ Run a test from the testing framework to validate documentation quality. ## Usage ``` -/test -/test local -/test smoke +/test # Run single test +/test local # Run with local docs +/test # Run all tests in category +/test local # Run category with local docs +/test smoke # Run smoke tests only ``` ## Arguments -- ``: The test ID from `tests/TESTS.md` (e.g., `pods-quickstart-terminal`, `flash-quickstart`) +- ``: Single test ID (e.g., `pods-quickstart-terminal`, `flash-quickstart`) +- ``: Category name to run all tests in that section - `local`: (Optional) Use local MDX files instead of published docs - `smoke`: Run all smoke tests -## Execution Rules - -When running a test, you MUST follow these rules: - -1. **Read the test definition** from `tests/TESTS.md` - find the row matching the test ID -2. 
**Do NOT use prior knowledge** - only use Runpod docs (published MCP or local MDX) +## Categories + +| Category | Tests | Description | +|----------|-------|-------------| +| `smoke` | 12 | Fast tests, no GPU deploys | +| `flash` | 13 | Flash SDK tests | +| `serverless` | 20 | Serverless endpoint tests | +| `vllm` | 6 | vLLM deployment tests | +| `pods` | 11 | Pod management tests | +| `storage` | 11 | Network volume tests | +| `templates` | 6 | Template tests | +| `clusters` | 4 | Instant Cluster tests | +| `sdk` | 8 | SDK and API tests | +| `cli` | 6 | runpodctl tests | +| `integrations` | 4 | Third-party integrations | +| `public` | 3 | Public endpoint tests | +| `tutorials` | 9 | End-to-end tutorials | + +## Single Test Execution + +When running a single test: + +1. **Read the test definition** from `tests/TESTS.md` +2. **Do NOT use prior knowledge** - only use Runpod docs 3. **Doc source mode**: - - Default: Use `mcp__runpod-docs__search_runpod_documentation` for published docs - - If `local` specified: Search and read `.mdx` files in this repository -4. **Resource naming**: All created resources MUST use `doc_test_` prefix -5. **Attempt the goal** using available tools (MCP for API, Bash for CLI) -6. **Handle GPU availability** - see GPU Fallback section below + - Default: Use `mcp__runpod-docs__search_runpod_documentation` + - If `local`: Search and read `.mdx` files in this repository +4. **Resource naming**: All resources MUST use `doc_test_` prefix +5. **Attempt the goal** using available tools +6. **Handle GPU availability** - see GPU Fallback section 7. **Verify the Expected Outcome** from the test definition -8. **Clean up** all `doc_test_*` resources after the test -9. **Generate report** using the helper script: - ```bash - python tests/scripts/report.py [--local] - ``` -10. **Complete the report** by filling in the generated template +8. **Clean up** all `doc_test_*` resources +9. **Generate report**: `python tests/scripts/report.py [--local]` +10. 
**Complete the report** with actual results -## GPU Fallback Guidance +## Batch Execution + +When running a category (e.g., `/test serverless`): + +1. **Parse category** - Identify all test IDs in that section of TESTS.md +2. **Show test list** - Display tests to be run and ask for confirmation +3. **Run sequentially** - Execute each test following single test rules +4. **Track results** - Record PASS/FAIL/PARTIAL for each +5. **Clean up between tests** - Delete all `doc_test_*` resources before next test +6. **Generate summary** - Create batch summary report at end + +### Batch Summary Format + +After running all tests in a batch, output: + +```markdown +## Batch Summary: + +| Test ID | Status | Notes | +|---------|--------|-------| +| test-1 | PASS | | +| test-2 | FAIL | Missing docs for X | +| test-3 | PARTIAL | Used fallback GPU | -GPU availability varies. When tests require GPU resources: +**Results:** X passed, Y failed, Z partial out of N tests +**Doc Source:** Published / Local +**Date:** YYYY-MM-DD HH:MM +``` + +Save the summary to: +- `tests/reports/batch--.md` +- `~/Dev/doc-tests/batch--.md` + +### Batch Options + +- **Stop on failure**: By default, continue through all tests. User can say "stop on first failure" +- **Skip cleanup**: User can say "skip cleanup between tests" for speed (not recommended) + +## GPU Fallback Guidance | Queue Wait | Action | |------------|--------| @@ -53,21 +107,11 @@ GPU availability varies. When tests require GPU resources: - PARTIAL: Completed with fallback GPU (doc improvement needed) - FAIL: Failed even with fallbacks -## Report Locations - -Reports are saved to both: -- `tests/reports/-.md` (gitignored) -- `~/Dev/doc-tests/-.md` (persistent archive) - -## Example +## Examples ``` -/test pods-quickstart-terminal local +/test pods-quickstart-terminal # Single test +/test flash local # All Flash tests with local docs +/test serverless # All Serverless tests +/test smoke # Quick validation ``` - -This will: -1. 
Load the test definition for `pods-quickstart-terminal` -2. Use local MDX files (not published docs) -3. Attempt: "Complete the Pod quickstart using only the terminal" -4. Verify: "Code runs on Pod via SSH" -5. Clean up and generate report diff --git a/.claude/style-guide.md b/.claude/style-guide.md index 00cbae5d..b0b0e776 100644 --- a/.claude/style-guide.md +++ b/.claude/style-guide.md @@ -13,6 +13,7 @@ Follow the Runpod style guide (`.cursor/rules/rp-styleguide.mdc`) and Google Dev - Secure Cloud - Community Cloud - Flash +- Public Endpoint ### Generic Terms (lowercase) - endpoint @@ -23,6 +24,7 @@ Follow the Runpod style guide (`.cursor/rules/rp-styleguide.mdc`) and Google Dev - fine-tune - network volume - data center +- repo ### Headings Always use **sentence case** for headings and titles: diff --git a/.claude/testing.md b/.claude/testing.md index b03bbbe8..f3f93a9f 100644 --- a/.claude/testing.md +++ b/.claude/testing.md @@ -8,24 +8,35 @@ Tests should be **hard to pass**. They simulate a user typing a simple request w ## Running Tests -Use the `/test` command or natural language: +Use the `/test` command: ``` -/test pods-quickstart-terminal # Command form -Run the flash-quickstart test # Natural language +/test # Single test with published docs +/test local # Single test with local docs +/test # All tests in category +/test local # Category with local docs +/test smoke # Smoke tests only ``` -Use the `/test` command to run tests: - -``` -/test pods-quickstart-terminal # Run with published docs -/test pods-quickstart-terminal local # Run with local MDX files -/test smoke # Run all smoke tests -``` - -The `/test` command loads the test definition and reminds you of the execution rules. 
- -## Test Execution Rules +### Categories + +| Category | Description | +|----------|-------------| +| `smoke` | Fast tests, no GPU deploys | +| `flash` | Flash SDK | +| `serverless` | Serverless endpoints | +| `vllm` | vLLM deployment | +| `pods` | Pod management | +| `storage` | Network volumes | +| `templates` | Template management | +| `clusters` | Instant Clusters | +| `sdk` | SDKs and APIs | +| `cli` | runpodctl | +| `integrations` | Third-party integrations | +| `public` | Public endpoints | +| `tutorials` | End-to-end tutorials | + +## Single Test Execution 1. Read the test definition from `tests/TESTS.md`. 2. **Do NOT use prior knowledge** - only use Runpod docs. @@ -39,6 +50,42 @@ The `/test` command loads the test definition and reminds you of the execution r ``` 8. Fill in the generated report template with actual results. +## Batch Execution + +When running a category (e.g., `/test serverless` or `/test flash local`): + +1. **Parse category** - Identify all test IDs in that section of `tests/TESTS.md` +2. **Show test list** - Display tests to be run and ask for confirmation +3. **Run sequentially** - Execute each test following single test rules +4. **Track results** - Record PASS/FAIL/PARTIAL for each test +5. **Clean up between tests** - Delete all `doc_test_*` resources before starting next test +6. **Generate summary** - Create batch summary report at end + +### Batch Summary Format + +```markdown +## Batch Summary: + +| Test ID | Status | Notes | +|---------|--------|-------| +| test-1 | PASS | | +| test-2 | FAIL | Missing docs for X | +| test-3 | PARTIAL | Used fallback GPU | + +**Results:** X passed, Y failed, Z partial out of N tests +**Doc Source:** Published / Local +**Date:** YYYY-MM-DD HH:MM +``` + +Save batch summaries to: +- `tests/reports/batch--.md` +- `~/Dev/doc-tests/batch--.md` + +### Batch Options + +- **Stop on failure**: By default, continue through all tests. Say "stop on first failure" to halt early. 
+- **Skip tests**: Say "skip test-id" during batch to skip specific tests. + ## GPU Fallback Guidance GPU availability varies by type and time. When a test requires GPU resources: diff --git a/.cursor/rules/rp-styleguide.mdc b/.cursor/rules/rp-styleguide.mdc index f1704452..806f6b79 100644 --- a/.cursor/rules/rp-styleguide.mdc +++ b/.cursor/rules/rp-styleguide.mdc @@ -5,8 +5,8 @@ alwaysApply: true --- Always use sentence case for headings and titles. -These are proper nouns: Runpod, Pods, Serverless, Hub, Instant Clusters, Secure Cloud, Community Cloud, Flash. -These are generic terms: endpoint, worker, cluster, template, handler, fine-tune, network volume. +These are proper nouns: Runpod, Pods, Serverless, Hub, Instant Clusters, Secure Cloud, Community Cloud, Flash, Public Endpoint. +These are generic terms: endpoint, worker, cluster, template, handler, fine-tune, network volume, data center, repo. Prefer using paragraphs to bullet points unless directly asked. When using bullet points, end each line with a period. diff --git a/CLAUDE.md b/CLAUDE.md index 47dd1de4..ab419378 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -38,6 +38,6 @@ Examples of things worth capturing: ## Terminology Quick Reference -**Capitalize:** Runpod, Pods, Serverless, Hub, Instant Clusters, Flash, Secure Cloud, Community Cloud +**Capitalize:** Runpod, Pods, Serverless, Hub, Instant Clusters, Flash, Secure Cloud, Community Cloud, Public Endpoint -**Lowercase:** endpoint, worker, template, handler, network volume, data center, cluster, fine-tune +**Lowercase:** endpoint, worker, template, handler, network volume, data center, cluster, fine-tune, repo diff --git a/tests/TESTS.md b/tests/TESTS.md index 0a3b606f..9ce2ac75 100644 --- a/tests/TESTS.md +++ b/tests/TESTS.md @@ -4,18 +4,21 @@ Minimal test definitions that simulate real user prompts. 
Tests are intentionall ## How to Run -In Claude Code, use natural language: +Use the `/test` command: ``` -Run the flash-quickstart test +/test flash-quickstart # Single test +/test serverless # All serverless tests +/test pods local # All pod tests with local docs +/test smoke # Smoke tests only ``` -``` -Run all vLLM tests -``` +Or natural language: ``` -Run smoke tests +Run the flash-quickstart test +Run all vLLM tests +Run smoke tests using local docs ``` ### Doc Source Modes
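The per-test pass rate that `stats.py --by-test` reports reduces to a short aggregation. A standalone sketch over fabricated report records (statuses follow the PASS/FAIL/PARTIAL convention used throughout; FAIL and PARTIAL both count against the rate, matching how `show_by_test` counts only PASS):

```python
from collections import defaultdict


def pass_rates(reports: list[dict[str, str]]) -> dict[str, float]:
    """Percentage of PASS runs per test ID."""
    runs: dict[str, list[str]] = defaultdict(list)
    for report in reports:
        runs[report["test_id"]].append(report["status"])
    return {
        test_id: 100.0 * statuses.count("PASS") / len(statuses)
        for test_id, statuses in runs.items()
    }


# Fabricated history: two runs of one test, one run of another.
history = [
    {"test_id": "flash-quickstart", "status": "PASS"},
    {"test_id": "flash-quickstart", "status": "FAIL"},
    {"test_id": "pods-quickstart-terminal", "status": "PASS"},
]

print(pass_rates(history))
# prints {'flash-quickstart': 50.0, 'pods-quickstart-terminal': 100.0}
```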