From 5bdd3be749c14cb0e5f6249fcae2f2fdb3fab9a3 Mon Sep 17 00:00:00 2001 From: Mo King Date: Thu, 19 Mar 2026 22:57:05 -0400 Subject: [PATCH 1/8] Add agent experience testing framework, expand .claude --- .claude/architecture.md | 150 ++++++++++++++++++++++++ .claude/development.md | 114 +++++++++++++++++++ .claude/style-guide.md | 94 ++++++++++++++++ .claude/testing.md | 84 ++++++++++++++ .gitignore | 5 + CLAUDE.md | 153 ++++--------------------- README.md | 58 ++++++++++ tests/README.md | 42 +++++++ tests/TESTS.md | 244 ++++++++++++++++++++++++++++++++++++++++ 9 files changed, 816 insertions(+), 128 deletions(-) create mode 100644 .claude/architecture.md create mode 100644 .claude/development.md create mode 100644 .claude/style-guide.md create mode 100644 .claude/testing.md create mode 100644 tests/README.md create mode 100644 tests/TESTS.md diff --git a/.claude/architecture.md b/.claude/architecture.md new file mode 100644 index 00000000..047e8fcf --- /dev/null +++ b/.claude/architecture.md @@ -0,0 +1,150 @@ +# Documentation Architecture + +## Directory Structure + +``` +mintlifydocs/ +├── docs.json # Site configuration, navigation, theme, redirects +├── CLAUDE.md # AI assistant instructions (this file's parent) +│ +├── get-started/ # Onboarding and account setup +├── flash/ # Flash SDK (Python functions on cloud GPUs) +├── serverless/ # Serverless workers, endpoints, vLLM +├── pods/ # GPU/CPU instances +├── storage/ # Network volumes, S3 API +├── hub/ # Runpod Hub and publishing +├── public-endpoints/ # Public API endpoints +├── instant-clusters/ # Multi-node GPU clusters +├── sdks/ # Python, JavaScript, Go, GraphQL SDKs +├── runpodctl/ # CLI documentation +├── api-reference/ # REST API reference +├── integrations/ # Third-party integrations +├── tutorials/ # Step-by-step guides +├── references/ # Reference tables (GPU types, billing, etc.) 
+├── community-solutions/ # Community-contributed content +│ +├── snippets/ # Reusable content fragments +│ ├── tooltips.jsx # Tooltip component definitions +│ └── *.mdx # Reusable MDX snippets (e.g., pricing tables) +│ +├── images/ # Static image assets +├── logo/ # Logo files +├── styles/ # Custom CSS +│ +├── scripts/ # Utility scripts +│ └── validate-tooltips.js +│ +└── helpers/ # Python scripts for generating content + ├── gpu_types.py # Generates GPU reference tables + └── sls_cpu_types.py # Generates CPU reference tables +``` + +## Configuration (docs.json) + +The `docs.json` file controls: + +- **Theme and styling**: Colors, fonts, code block themes +- **Navigation**: Tab/group/page hierarchy +- **SEO**: Meta tags, Open Graph images +- **Redirects**: URL redirects for moved/renamed pages + +### Navigation Structure + +Pages are organized in a hierarchy: +``` +tabs → groups → pages +``` + +Example: +```json +{ + "tab": "Docs", + "groups": [ + { + "group": "Serverless", + "pages": [ + "serverless/overview", + "serverless/quickstart", + { + "group": "vLLM", + "pages": ["serverless/vllm/overview", "serverless/vllm/get-started"] + } + ] + } + ] +} +``` + +Pages are referenced by file path without the `.mdx` extension. + +## MDX Files + +Each documentation page is an MDX file with: + +1. **Frontmatter** (required): + ```yaml + --- + title: "Page title" + sidebarTitle: "Shorter sidebar title" + description: "SEO description for the page." + --- + ``` + +2. **Imports** (optional): React components, tooltips, snippets +3. **Content**: Markdown with JSX components + +## Snippets + +Reusable content in `snippets/`: + +- **MDX snippets**: Embed with `import Table from '/snippets/pricing-table.mdx'` +- **JSX components**: Import specific exports like tooltips + +### Tooltips + +Tooltips provide hover definitions for technical terms. Defined in `snippets/tooltips.jsx`. 
+ +**Structure:** +```jsx +export const PodTooltip = () => { + return ( + Pod + ); +}; +``` + +**Usage in MDX:** +```mdx +import { PodTooltip, TemplateTooltip } from "/snippets/tooltips.jsx"; + +Deploy your first GPU using a . +``` + +**Guidelines:** +- Use for Runpod-specific terms users might not know. +- Most tooltips have singular/plural variants (`PodTooltip`, `PodsTooltip`). +- Group by category: Pods, Serverless, Storage, Products, Concepts, AI/ML, Flash. +- Run `scripts/validate-tooltips.js` to check imports. + +## Adding New Pages + +1. Create `.mdx` file in the appropriate directory. +2. Add frontmatter with `title`, `sidebarTitle`, and `description`. +3. Add the page path to `docs.json` navigation. +4. Import tooltips for technical terms. + +## Redirects + +When moving or renaming pages, add to `docs.json`: +```json +{ + "redirects": [ + { "source": "/old-path", "destination": "/new-path" } + ] +} +``` diff --git a/.claude/development.md b/.claude/development.md new file mode 100644 index 00000000..1dbdcf9b --- /dev/null +++ b/.claude/development.md @@ -0,0 +1,114 @@ +# Development Guide + +## Local Development + +### Setup + +Install Mintlify globally: +```bash +npm i -g mintlify +``` + +Start the local development server: +```bash +mintlify dev +``` + +Most changes are reflected live without restarting the server. + +### Linting + +Install [Vale](https://vale.sh/docs/vale-cli/installation/), then lint files: +```bash +vale path/to/docs/ +vale path/to/*.mdx +``` + +Vale is configured with Google and Readability style guides via `.vale.ini`. 
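
For reference, a minimal `.vale.ini` wiring in those two style guides might look like the sketch below. The styles path and package names here are assumptions for illustration; defer to the `.vale.ini` actually checked into the repository:

```ini
# Sketch of a Vale config using the Google and Readability style packages.
# Paths and package names are assumed; see the repo's real .vale.ini.
StylesPath = .vale/styles
MinAlertLevel = suggestion

# Vale downloads these packages when you run `vale sync`.
Packages = Google, Readability

[*.{md,mdx}]
BasedOnStyles = Vale, Google, Readability
```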
+ +### Python Code Formatting + +Format Python code examples in documentation: +```bash +pip install blacken-docs +git ls-files -z -- '*.mdx' | xargs -0 blacken-docs +``` + +## Helper Scripts + +### Update GPU/CPU Reference Tables + +These scripts fetch current types from Runpod's GraphQL API: +```bash +cd helpers +python gpu_types.py # Updates GPU reference tables +python sls_cpu_types.py # Updates CPU reference tables +``` + +Requirements: `requests`, `tabulate`, `pandas` (see `helpers/requirements.txt`). + +### Validate Tooltips + +Check that all imported tooltips exist: +```bash +node scripts/validate-tooltips.js +``` + +This runs automatically in CI via `.github/workflows/validate-tooltips.yml`. + +## Publishing Workflow + +1. Create a pull request with changes. +2. Request review from [@muhsinking](https://github.com/muhsinking). +3. Changes deploy automatically to production after merge to `main` branch. + +## Common Tasks + +### Add a New Page + +1. Create `.mdx` file in the appropriate directory. +2. Add frontmatter: + ```yaml + --- + title: "Full page title" + sidebarTitle: "Shorter title" + description: "SEO description." + --- + ``` +3. Add the page path to `docs.json` navigation. +4. Import tooltips for Runpod-specific terms. + +### Add a New Tooltip + +1. Open `snippets/tooltips.jsx`. +2. Add a new export in the appropriate category: + ```jsx + export const NewTermTooltip = () => { + return ( + new term + ); + }; + ``` +3. Create singular and plural variants if needed. + +### Move or Rename a Page + +1. Move/rename the `.mdx` file. +2. Update `docs.json` navigation. +3. Add a redirect in `docs.json`: + ```json + { + "redirects": [ + { "source": "/old-path", "destination": "/new-path" } + ] + } + ``` + +### Update a Pricing Table + +Edit `snippets/serverless-gpu-pricing-table.mdx` or run the helper scripts to regenerate from the API. 
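
As a concrete sketch of what those helper scripts do internally, the following stdlib-only example builds a GraphQL query for hardware types and renders the response as a Markdown table. The query fields and table columns are assumptions for illustration; the real query and the `requests`-based HTTP call live in `helpers/gpu_types.py`:

```python
import json

# Hypothetical GraphQL query; the actual field names are defined in helpers/gpu_types.py.
GPU_TYPES_QUERY = """
query GpuTypes {
  gpuTypes { id displayName memoryInGb }
}
"""


def build_request_body(query: str) -> str:
    """Serialize a GraphQL query into the JSON body a GraphQL endpoint expects."""
    return json.dumps({"query": query})


def to_markdown_table(gpu_types: list) -> str:
    """Render a list of GPU type records as a Markdown reference table."""
    lines = ["| GPU ID | Display name | VRAM (GB) |", "| --- | --- | --- |"]
    for gpu in gpu_types:
        lines.append(f"| {gpu['id']} | {gpu['displayName']} | {gpu['memoryInGb']} |")
    return "\n".join(lines)


# Mocked response data; a live script would POST build_request_body() to
# Runpod's GraphQL API and parse the JSON response instead.
sample = [{"id": "NVIDIA GeForce RTX 4090", "displayName": "RTX 4090", "memoryInGb": 24}]
print(to_markdown_table(sample))
```

A live script would send the body returned by `build_request_body()` to the GraphQL endpoint and feed the parsed result to `to_markdown_table()` before writing the reference page.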
diff --git a/.claude/style-guide.md b/.claude/style-guide.md new file mode 100644 index 00000000..fc584b87 --- /dev/null +++ b/.claude/style-guide.md @@ -0,0 +1,94 @@ +# Style Guide + +Follow the Runpod style guide (`.cursor/rules/rp-styleguide.mdc`) and Google Developer Style Guide (`.cursor/rules/google-style-guide.mdc`). + +## Capitalization and Terminology + +### Proper Nouns (always capitalize) +- Runpod +- Pods +- Serverless +- Hub +- Instant Clusters +- Secure Cloud +- Community Cloud +- Flash + +### Generic Terms (lowercase) +- endpoint +- worker +- cluster +- template +- handler +- fine-tune +- network volume +- data center + +### Headings +Always use **sentence case** for headings and titles: +- ✅ "Create a serverless endpoint" +- ❌ "Create a Serverless Endpoint" + +## Writing Style + +- Use **second person** ("you") instead of first person plural ("we"). +- Prefer **active voice** over passive voice. +- Use **American English** spelling. +- Prefer **paragraphs** over bullet points unless listing discrete items. +- When using bullet points, **end each with a period**. + +## Tutorial Structure + +Tutorials should include: + +1. **Requirements** section (not "Prerequisites") +2. Numbered steps using format: `## Step 1: Create a widget` +3. Clear expected outcomes for each step + +Example: +```markdown +## Requirements + +- A Runpod account with credits +- Docker installed locally + +## Step 1: Create a template + +Navigate to the Templates page... + +## Step 2: Deploy the endpoint + +Click Deploy and configure... +``` + +## Code Examples + +- Always use code blocks with **language identifiers**. +- **Precede** code with context explaining what it does. +- **Follow** code with explanation of key parts. +- Include a **file title** where it makes sense. + +Example: +````markdown +Create a handler function that processes image generation requests: + +```python handler.py +import runpod + +def handler(job): + prompt = job["input"]["prompt"] + # Generate image... 
+ return {"image_url": result} + +runpod.serverless.start({"handler": handler}) +``` + +The `handler` function receives a job dictionary containing the input from the API request. +```` + +## API and Code References + +- Use backticks for inline code: `runpod.serverless.start()` +- Use backticks for file paths: `serverless/workers/handler.py` +- Use backticks for environment variables: `RUNPOD_API_KEY` +- Use backticks for API endpoints: `/v2/endpoint_id/run` diff --git a/.claude/testing.md b/.claude/testing.md new file mode 100644 index 00000000..62a571eb --- /dev/null +++ b/.claude/testing.md @@ -0,0 +1,84 @@ +# Documentation Agent Tests + +The `tests/` directory contains minimal test definitions that simulate real user prompts. Tests are intentionally sparse - the agent must figure out how to accomplish the goal using only the documentation. + +## Philosophy + +Tests should be **hard to pass**. They simulate a user typing a simple request without context. If the docs are good, the agent figures it out. If not, the test reveals gaps. + +## Running Tests + +Use natural language: +``` +Run the flash-quickstart test +Run the vllm-deploy test using local docs +Run all pods tests +``` + +## Test Execution Rules + +1. Read the test definition from `tests/TESTS.md`. +2. **Do NOT use prior knowledge** - only use Runpod docs. +3. Attempt to complete the goal using available tools. +4. All created resources must use `doc_test_` prefix. +5. Clean up resources after test. +6. Write report to `tests/reports/{test-id}-{timestamp}.md`. + +## Doc Source Modes + +### Published Docs (default) + +Use the `mcp__runpod-dops__search_runpod_documentation` tool to search the live published documentation. This tests what real users see. + +### Local Docs + +When the user says "using local docs": +- Search and read `.mdx` files directly from this repository. +- Use Glob to find files: `**/*.mdx` +- Use Grep to search content. +- Use Read to read file contents. 
+ +This validates unpublished doc changes before they go live. + +## Report Format + +```markdown +# Test Report: {Test Name} + +**Date:** {timestamp} +**Status:** PASS | FAIL | PARTIAL + +## What Happened +Brief narrative of the attempt. + +## Where I Got Stuck +Specific points of confusion or failure. + +## Documentation Gaps +What was missing or unclear in the docs. + +## Suggestions +Specific improvements to make tests pass. +``` + +## Test Categories + +Tests are organized by product area in `tests/TESTS.md`: + +- **Flash SDK**: Deploying Python functions +- **Serverless Endpoints**: Creating and managing endpoints (must deploy real endpoints, not use public endpoints) +- **vLLM**: Deploying LLM inference (must deploy real endpoints, not use public endpoints) +- **Pods**: Creating and managing GPU instances +- **Storage**: Network volumes and file transfer +- **Templates**: Creating and using templates +- **Instant Clusters**: Multi-node deployments +- **SDKs & APIs**: Using client libraries +- **CLI (runpodctl)**: Command-line operations +- **Integrations**: Third-party tool integration +- **Tutorials**: End-to-end workflows + +## Requirements + +- Runpod API MCP server configured +- Runpod Docs MCP server configured +- Docker available for building custom images diff --git a/.gitignore b/.gitignore index 64309e45..e1e8f6b4 100644 --- a/.gitignore +++ b/.gitignore @@ -30,3 +30,8 @@ helpers/__pycache__/** */ .idea/* /.mintlify-last + +# Documentation test reports +tests/reports/ + +.serena \ No newline at end of file diff --git a/CLAUDE.md b/CLAUDE.md index 0be9f6d3..47dd1de4 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -1,146 +1,43 @@ # CLAUDE.md -This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. +This file provides guidance to Claude Code when working with this repository. ## Project Overview -This is the Runpod documentation site, built using [Mintlify](https://mintlify.com/). 
The documentation covers Runpod's cloud GPU platform, including Serverless endpoints, Pods, storage solutions, SDKs, and APIs. +This is the Runpod documentation site, built with [Mintlify](https://mintlify.com/). The documentation covers Runpod's cloud GPU platform: Serverless endpoints, Pods, Flash SDK, storage, and APIs. -## Development Commands +## Quick Reference -### Local Development +| Topic | File | +|-------|------| +| Directory structure, navigation, snippets, tooltips | [.claude/architecture.md](.claude/architecture.md) | +| Writing style, capitalization, terminology | [.claude/style-guide.md](.claude/style-guide.md) | +| Running and writing documentation tests | [.claude/testing.md](.claude/testing.md) | +| Local dev, linting, publishing workflow | [.claude/development.md](.claude/development.md) | -Install Mintlify globally: -```bash -npm i -g mintlify -``` - -Start the local development server: -```bash -mintlify dev -``` - -Most changes will be reflected live without restarting the server. - -### Linting - -Install [vale](https://vale.sh/docs/vale-cli/installation/), then lint specific files or folders: -```bash -vale path/to/docs/ -# or -vale path/to/*.md -``` +## Key Commands -Vale is configured with Google and Readability style guides via `.vale.ini`. - -### Python Code Formatting - -For Python code examples in documentation: -```bash -python -m pip install blacken-docs -yarn format -# or directly: -git ls-files -z -- '*.md' | xargs -0 blacken-docs -``` - -### Update GPU and CPU Reference Tables - -These scripts fetch current GPU/CPU types from Runpod's GraphQL API and regenerate reference documentation: ```bash -python helpers/gpu_types.py -python helpers/sls_cpu_types.py +mintlify dev # Start local dev server +vale path/to/file.mdx # Lint documentation +node scripts/validate-tooltips.js # Check tooltip imports ``` -The scripts require: `requests`, `tabulate`, and `pandas` (see `helpers/requirements.txt`). 
- -## Documentation Architecture - -### Content Organization - -The site is organized into major product areas, defined in `docs.json`: - -- **Serverless**: Worker handlers, endpoints, vLLM deployments, and load balancing -- **Pods**: GPU instances, storage, templates, and connections -- **Storage**: Network volumes and S3 API -- **Hub**: Public endpoints and publishing guides -- **Instant Clusters**: Multi-node GPU clusters -- **SDKs**: Python, JavaScript, Go, and GraphQL client libraries -- **API Reference**: REST API documentation for all resources -- **Examples/Tutorials**: Step-by-step guides organized by product area -- **Community**: Community-contributed tools and solutions - -### File Structure - -- **Documentation files**: MDX (`.mdx`) files organized by product area -- **Snippets**: Reusable content fragments in `snippets/` -- **Images**: Static assets in `images/` -- **Configuration**: `docs.json` defines site structure, navigation, theme, and redirects - -### Navigation and Routing - -The `docs.json` file controls all site navigation through a hierarchical tab/group/page structure. Pages are referenced by their file path (without extension). When adding new documentation, you must update the `navigation.tabs` array in `docs.json` to make pages visible. - -### vLLM Documentation - -The vLLM section (`serverless/vllm/`) documents Runpod's vLLM worker for LLM inference. Key topics: -- vLLM overview and architecture (PagedAttention, continuous batching) -- Getting started and configuration -- Environment variable reference -- OpenAI API compatibility -- Request handling - -vLLM documentation should explain both the underlying vLLM technology and Runpod-specific integration details. 
- -## Style Guidelines - -Follow the Runpod style guide (`.cursor/rules/rp-styleguide.mdc`) and Google Developer Style Guide (`.cursor/rules/google-style-guide.mdc`): - -### Capitalization and Terminology - -- **Always use sentence case** for headings and titles -- **Proper nouns**: Runpod, Pods, Serverless, Hub, Instant Clusters, Secure Cloud, Community Cloud, Flash -- **Generic terms** (lowercase): endpoint, worker, cluster, template, handler, fine-tune, network volume - -### Writing Style - -- Use second person ("you") instead of first person plural ("we") -- Prefer active voice -- Use American English spelling -- Prefer paragraphs over bullet points unless specifically requested -- When using bullet points, end each with a period - -### Tutorial Structure - -Tutorials should include: -- **What you'll learn** section -- **Requirements** section (not "Prerequisites") -- Numbered steps using format: `## Step 1: Create a widget` - -### Code Examples - -- Always use code blocks with language identifiers -- Precede code with context/purpose explanation -- Follow code with explanation of key parts - -## Publishing Workflow - -1. Create a pull request with changes -2. Request review from [@muhsinking](https://github.com/muhsinking) -3. Changes deploy automatically to production after merge to `main` branch - -## Common Patterns +## Self-Improvement -### Adding New Documentation Pages +**Claude should continuously learn and improve these docs.** -1. Create `.mdx` file in appropriate directory -2. Add frontmatter with `title`, `sidebarTitle`, and `description` -3. Update `docs.json` navigation to include the page path -4. Ensure proper categorization under relevant tab/group +If you discover something that would be useful for future sessions, ask me: +> "I noticed [insight]. Would you like me to add this to `.claude/[appropriate-file].md`?" 
-### Using Snippets +Examples of things worth capturing: +- Patterns that work well (or don't) in this codebase +- Common mistakes to avoid +- Useful commands or workflows discovered during tasks +- Clarifications about how Runpod products work -Reusable content (like pricing tables) lives in `snippets/` and can be embedded in multiple pages to maintain consistency. +## Terminology Quick Reference -### Redirects +**Capitalize:** Runpod, Pods, Serverless, Hub, Instant Clusters, Flash, Secure Cloud, Community Cloud -When moving or renaming pages, add redirect entries to the `redirects` array in `docs.json` to maintain backward compatibility. +**Lowercase:** endpoint, worker, template, handler, network volume, data center, cluster, fine-tune diff --git a/README.md b/README.md index 7c384da0..d9b6dbac 100644 --- a/README.md +++ b/README.md @@ -63,3 +63,61 @@ pip install -r helpers/requirements.txt python3 helpers/gpu_types.py python3 helpers/sls_cpu_types.py ``` + +## Agent experience testing + +The `tests/TESTS.md` file contains test definitions for validating documentation quality through AI agent testing. Tests simulate real user prompts - a coding agent must accomplish the goal using only the documentation as it currently exists. + +### Requirements + +- [Claude Code](https://docs.anthropic.com/en/docs/claude-code) with the Runpod MCP servers configured: + ```bash + # Add Runpod API MCP server + claude mcp add runpod --scope user -e RUNPOD_API_KEY=your_key -- npx -y @runpod/mcp-server@latest + + # Add Runpod Docs MCP server + claude mcp add runpod-docs --scope user --transport http https://docs.runpod.io/mcp + ``` + +### Running tests + +In Claude Code, use natural language: + +``` +Run the flash-quickstart test +``` + +``` +Run all vLLM tests +``` + +To validate unpublished doc changes, use local docs mode: + +``` +Run the vllm-deploy test using local docs +``` + +Claude will: +1. Read the test from `tests/TESTS.md` +2. 
Attempt to accomplish the goal using only the docs +3. Clean up any resources created (prefixed with `doc_test_`) +4. Write a report to `tests/reports/` +5. Suggest documentation improvements + +### Test definitions + +All tests are defined in [`tests/TESTS.md`](tests/TESTS.md) as a table + +### Adding new tests + +Add a row to the appropriate section in `tests/TESTS.md` with: +- **ID**: Unique test identifier +- **Goal**: One sentence describing what the user wants +- **Cleanup**: Resource types to delete (`endpoints`, `pods`, `templates`, `network-volumes`, or `none`) + +### Reports + +Test reports are saved to `tests/reports/` (gitignored) and include: +- What worked and what didn't +- Where the agent got stuck +- Specific documentation improvement suggestions diff --git a/tests/README.md b/tests/README.md new file mode 100644 index 00000000..f4179f16 --- /dev/null +++ b/tests/README.md @@ -0,0 +1,42 @@ +# Coding Agent Experience Tests + +Tests that simulate real user prompts. A coding agent must accomplish the goal using only the documentation. + +## Philosophy + +These tests should be **hard to pass**. They simulate a user typing a simple request without context. If the docs are good, the agent can figure it out. If not, the test reveals gaps. 
+ +## Running Tests + +In Claude Code: + +``` +Run the flash-quickstart test +``` + +``` +Run all vLLM tests +``` + +### Doc Source Modes + +- **Published docs** (default) - Uses the Runpod Docs MCP server +- **Local docs** - Reads `.mdx` files from this repo (for validating unpublished changes) + +``` +Run the vllm-deploy test using local docs +``` + +## Test Definitions + +All tests are defined in [TESTS.md](./TESTS.md) as a table with: +- **ID**: Test identifier +- **Goal**: What the user wants (one sentence) +- **Cleanup**: Resource types to delete after test + +## Reports + +Reports are saved to `reports/` (gitignored) and include: +- What worked / what didn't +- Where the agent got stuck +- Documentation improvements needed diff --git a/tests/TESTS.md b/tests/TESTS.md new file mode 100644 index 00000000..05ee21f9 --- /dev/null +++ b/tests/TESTS.md @@ -0,0 +1,244 @@ +# Documentation Agent Tests + +Minimal test definitions that simulate real user prompts. Tests are intentionally sparse - the agent must figure out how to accomplish the goal using only the documentation. + +## How to Run + +In Claude Code, use natural language: + +``` +Run the flash-quickstart test +``` + +``` +Run all vLLM tests +``` + +### Doc Source Modes + +**Published docs (default)** - Uses the Runpod Docs MCP server to search published documentation: +``` +Run the vllm-deploy test +``` + +**Local docs** - Reads MDX files directly from this repo (use to validate unpublished changes): +``` +Run the vllm-deploy test using local docs +``` + +When using local docs, the agent will search and read `.mdx` files in this repository instead of querying the MCP server. 
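
The local docs flow described above can be approximated in a few lines of Python. This is only an illustration of the Glob-then-Grep pattern the agent follows, not a tool this repository ships:

```python
from pathlib import Path


def search_local_docs(root: str, term: str) -> list:
    """Return paths of .mdx files under root whose text mentions term,
    roughly mirroring the Glob (**/*.mdx) and Grep steps of local docs mode."""
    matches = []
    for path in sorted(Path(root).rglob("*.mdx")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        if term.lower() in text.lower():
            matches.append(str(path))
    return matches
```

For example, `search_local_docs(".", "vLLM")` run from the repository root would list every page mentioning vLLM, which is what the agent then reads in full before attempting the test goal.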
+ +## Test Format + +Each test has: +- **ID**: Unique identifier for the test +- **Goal**: What a user would ask (one sentence, no hints) +- **Cleanup**: Resources to delete after test (all use `doc_test_*` prefix) + +--- + +## Flash SDK + +| ID | Goal | Difficulty | +|----|------|------------| +| flash-quickstart | Deploy a GPU function using Flash | Easy | +| flash-hello-gpu | Run a simple PyTorch function on a GPU | Easy | +| flash-sdxl | Generate an image using SDXL with Flash | Medium | +| flash-text-gen | Deploy a text generation model with Flash | Medium | +| flash-dependencies | Deploy a function with custom pip dependencies | Easy | +| flash-multi-gpu | Create an endpoint that uses multiple GPUs | Medium | +| flash-cpu-endpoint | Deploy a CPU-only endpoint with Flash | Easy | +| flash-load-balancer | Build a REST API with load balancing using Flash | Hard | +| flash-mixed-workers | Create an app with both GPU and CPU workers | Hard | +| flash-env-vars | Configure environment variables for a Flash endpoint | Easy | +| flash-idle-timeout | Set a custom idle timeout for a Flash endpoint | Easy | +| flash-app-deploy | Initialize and deploy a complete Flash app | Medium | +| flash-local-test | Test a Flash function locally before deploying | Medium | + +--- + +## Serverless Endpoints + +> **Important:** Do NOT use public endpoints for these tests. The goal is to test the full deployment workflow: create a template, deploy an endpoint, send requests, and verify the integration works. Public endpoints are a separate product and skip the deployment steps we need to validate. 
+ +| ID | Goal | Difficulty | +|----|------|------------| +| serverless-create-endpoint | Create a serverless endpoint | Medium | +| serverless-serve-qwen | Create an endpoint to serve a Qwen model | Hard | +| serverless-custom-handler | Write a custom handler function and deploy it | Hard | +| serverless-logs | Build a custom handler that uses progress_update() to send log messages, deploy it, and verify updates appear in /status polling | Hard | +| serverless-send-request | Send a request to an existing endpoint | Easy | +| serverless-async-request | Submit an async job and poll for results | Medium | +| serverless-sync-request | Make a synchronous request to an endpoint using /runsync | Easy | +| serverless-streaming | Build a custom handler that uses yield to stream results, deploy it, and test the /stream endpoint | Hard | +| serverless-webhook | Set up webhook notifications for a serverless endpoint | Medium | +| serverless-cancel-job | Cancel a running or queued job | Easy | +| serverless-queue-delay | Create an endpoint with queue delay scaling | Medium | +| serverless-request-count | Create an endpoint with request count scaling | Medium | +| serverless-min-workers | Create an endpoint with 1 minimum active worker | Easy | +| serverless-idle-timeout | Create an endpoint with an idle timeout of 20 seconds | Easy | +| serverless-gpu-priority | Create an endpoint with GPU type priority/fallback | Medium | +| serverless-docker-deploy | Deploy an endpoint from Docker Hub | Hard | +| serverless-github-deploy | Deploy an endpoint from GitHub | Hard | +| serverless-ssh-worker | SSH into a running worker for debugging | Medium | +| serverless-metrics | View endpoint metrics (execution time, delay) | Easy | + +--- + +## vLLM + +> **Important:** Do NOT use public endpoints for these tests. Deploy your own vLLM endpoint to test the full workflow. Public endpoints skip the deployment and configuration steps we need to validate. 
+ +| ID | Goal | Difficulty | +|----|------|------------| +| vllm-deploy | Deploy a vLLM endpoint | Medium | +| vllm-openai-compat | Use the OpenAI Python client with a vLLM endpoint | Medium | +| vllm-chat-completion | Send a chat completion request to vLLM | Easy | +| vllm-streaming | Stream responses from a vLLM endpoint | Medium | +| vllm-custom-model | Deploy a custom/fine-tuned model with vLLM | Hard | +| vllm-gated-model | Deploy a gated Hugging Face model with vLLM | Medium | + +--- + +## Pods + +| ID | Goal | Difficulty | +|----|------|------------| +| pods-create | Create a GPU Pod | Medium | +| pods-start-stop | Start and stop an existing Pod | Easy | +| pods-ssh-connect | Connect to a Pod via SSH | Medium | +| pods-expose-port | Expose a custom port on a Pod | Medium | +| pods-env-vars | Set environment variables on a Pod | Easy | +| pods-resize-storage | Resize a Pod's container or volume disk | Easy | +| pods-template-use | Deploy a Pod using a custom template | Medium | +| pods-template-create | Create a custom Pod template | Hard | +| pods-comfyui | Deploy ComfyUI on a Pod and generate an image | Hard | + +--- + +## Storage + +| ID | Goal | Difficulty | +|----|------|------------| +| storage-create-volume | Create a network volume | Easy | +| storage-attach-pod | Attach a network volume to a Pod | Medium | +| storage-attach-serverless | Attach a network volume to a Serverless endpoint | Medium | +| storage-s3-api | Access a network volume using the S3 API | Hard | +| storage-upload-s3 | Upload a file to a network volume using S3 | Hard | +| storage-download-s3 | Download a file from a network volume using S3 | Hard | +| storage-runpodctl-send | Transfer files between Pods using runpodctl | Easy | +| storage-migrate-volume | Migrate data between network volumes | Hard | +| storage-cloud-sync | Sync data with cloud storage (S3, GCS) | Hard | +| storage-scp-transfer | Transfer files to a Pod using SCP | Medium | +| storage-rsync | Sync files to a Pod 
using rsync | Medium | + +--- + +## Templates + +| ID | Goal | Difficulty | +|----|------|------------| +| template-create-pod | Create a Pod template | Medium | +| template-create-serverless | Create a Serverless template | Medium | +| template-list | List all templates | Easy | +| template-preload-model | Create a template with a pre-loaded model | Hard | +| template-custom-dockerfile | Create a template with a custom Dockerfile | Hard | +| template-env-vars | Add environment variables to a template | Easy | + +--- + +## Instant Clusters + +| ID | Goal | Difficulty | +|----|------|------------| +| cluster-create | Create an Instant Cluster | Medium | +| cluster-pytorch | Run distributed PyTorch training on a cluster | Hard | +| cluster-slurm | Deploy a Slurm cluster | Hard | +| cluster-axolotl | Fine-tune an LLM with Axolotl on a cluster | Hard | + +--- + +## SDKs & APIs + +| ID | Goal | Difficulty | +|----|------|------------| +| sdk-python-install | Install the Runpod Python SDK | Easy | +| sdk-python-endpoint | Use the Python SDK to call an endpoint | Easy | +| sdk-js-install | Install the Runpod JavaScript SDK | Easy | +| sdk-js-endpoint | Use the JavaScript SDK to call an endpoint | Easy | +| api-graphql-query | Make a GraphQL query to list pods | Medium | +| api-graphql-mutation | Create a resource using GraphQL mutation | Medium | +| api-key-create | Create an API key with specific permissions | Easy | +| api-key-restricted | Create a restricted API key | Medium | + +--- + +## CLI (runpodctl) + +| ID | Goal | Difficulty | +|----|------|------------| +| cli-install | Install runpodctl on your local machine | Easy | +| cli-configure | Configure runpodctl with your API key | Easy | +| cli-list-pods | List pods using runpodctl | Easy | +| cli-create-pod | Create a pod using runpodctl | Medium | +| cli-send-file | Send a file to a Pod using runpodctl | Medium | +| cli-receive-file | Receive a file from a Pod using runpodctl | Medium | + +--- + +## Model Caching 
| ID | Goal | Difficulty |
|----|------|------------|
| cache-enable | Create an endpoint with model caching enabled | Medium |

---

## Integrations

| ID | Goal | Difficulty |
|----|------|------------|
| integration-openai-migrate | Create an OpenAI-compatible endpoint | Medium |
| integration-vercel-ai | Create an image generation app with the Vercel AI SDK | Medium |
| integration-cursor | Configure Cursor to use Runpod endpoints | Medium |
| integration-skypilot | Use Runpod with SkyPilot | Hard |

---

## Public Endpoints

| ID | Goal | Difficulty |
|----|------|------------|
| public-flux | Generate an image using FLUX public endpoint | Easy |
| public-qwen | Use the Qwen3 32B public endpoint | Easy |
| public-video | Generate video using WAN public endpoint | Medium |

---

## Tutorials (End-to-End)

| ID | Goal | Difficulty |
|----|------|------------|
| tutorial-sdxl-serverless | Deploy SDXL as a serverless endpoint | Medium |
| tutorial-comfyui-pod | Deploy ComfyUI on a Pod and generate an image | Medium |
| tutorial-comfyui-serverless | Deploy ComfyUI as a serverless endpoint and generate an image | Hard |
| tutorial-gemma-chatbot | Deploy a Gemma 3 chatbot with vLLM | Medium |
| tutorial-custom-worker | Build and deploy a custom worker | Hard |
| tutorial-web-integration | Integrate a Serverless endpoint into a web application | Hard |
| tutorial-dual-mode-worker | Deploy a dual-mode (Pod/Serverless) worker | Hard |
| tutorial-model-caching | Create an endpoint with model caching enabled | Hard |
| tutorial-pytorch-cluster | Deploy a PyTorch cluster | Hard |

---

## Cleanup Rules

All test resources must use the `doc_test_` prefix.
After each test: + +- **endpoints**: Delete endpoints matching `doc_test_*` +- **pods**: Delete pods matching `doc_test_*` +- **templates**: Delete templates matching `doc_test_*` +- **network-volumes**: Delete network volumes matching `doc_test_*` +- **clusters**: Delete clusters matching `doc_test_*` +- **none**: No cleanup needed (read-only test) From 23b7496d8b82f14b8a421e47e6fd6d3db02eef48 Mon Sep 17 00:00:00 2001 From: Mo King Date: Fri, 20 Mar 2026 09:08:14 -0400 Subject: [PATCH 2/8] Update Pods docs to improve AX --- pods/connect-to-a-pod.mdx | 31 ++++++- pods/manage-pods.mdx | 83 +++++++++++++++++++ runpodctl/reference/runpodctl-create-pod.mdx | 2 +- serverless/development/logs.mdx | 4 + .../containers/docker-commands.mdx | 2 +- tutorials/pods/comfyui.mdx | 37 ++++++++- 6 files changed, 155 insertions(+), 4 deletions(-) diff --git a/pods/connect-to-a-pod.mdx b/pods/connect-to-a-pod.mdx index 864f7edd..a537d7bd 100644 --- a/pods/connect-to-a-pod.mdx +++ b/pods/connect-to-a-pod.mdx @@ -36,11 +36,40 @@ If **Start** doesn't respond, refresh the page. Interactive web environment for code, files, and data analysis. Available on templates with JupyterLab pre-configured (e.g., "Runpod Pytorch"). + + + 1. Deploy a Pod with a JupyterLab-compatible template (all official Runpod PyTorch templates have JupyterLab pre-configured). 2. Navigate to the [Pods page](https://console.runpod.io/pods) and click **Connect**. 3. Under **HTTP Services**, click the **Jupyter Lab** link (usually port 8888). - + + + + +Create a Pod with JupyterLab access using the CLI: + +```bash +runpodctl create pod \ + --name my-jupyter-pod \ + --gpuType "NVIDIA GeForce RTX 4090" \ + --imageName "runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04" \ + --containerDiskSize 20 \ + --volumeSize 50 \ + --ports "8888/http" \ + --env "JUPYTER_PASSWORD=your_secure_password" +``` + +After the Pod starts, access JupyterLab at `https://[POD_ID]-8888.proxy.runpod.net`. 
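The proxy URL above follows a fixed pattern (`https://<podId>-<port>.proxy.runpod.net`), so it can be built directly from the Pod ID returned at creation time. A minimal sketch — the `proxy_url` helper here is illustrative, not part of any Runpod SDK:

```python
def proxy_url(pod_id: str, port: int = 8888) -> str:
    """Build the Runpod HTTP proxy URL for a Pod's exposed port."""
    return f"https://{pod_id}-{port}.proxy.runpod.net"

# Pod ID as returned by `runpodctl create pod` or the REST API
print(proxy_url("uv9wy55tyv30lo"))  # → https://uv9wy55tyv30lo-8888.proxy.runpod.net
```

The same pattern works for any HTTP port the Pod exposes (for example, `proxy_url(pod_id, 8188)` for a ComfyUI Pod).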
+ + +Set the `JUPYTER_PASSWORD` environment variable to configure JupyterLab authentication. If not set, some templates use a default password shown in the Pod logs. + + + + + + If the JupyterLab tab displays a blank page for more than a minute or two, try restarting the Pod and opening it again. diff --git a/pods/manage-pods.mdx b/pods/manage-pods.mdx index 7f80a09d..2172041a 100644 --- a/pods/manage-pods.mdx +++ b/pods/manage-pods.mdx @@ -18,6 +18,7 @@ runpodctl config --apiKey RUNPOD_API_KEY | **Deploy** | [Pods page](https://www.console.runpod.io/pods) → Deploy | `runpodctl create pods --name NAME --gpuType "GPU" --imageName "IMAGE"` | | **Start** | Expand Pod → Play icon | `runpodctl start pod POD_ID` | | **Stop** | Expand Pod → Stop icon | `runpodctl stop pod POD_ID` | +| **Update** | Three-dot menu → Edit Pod | — | | **Terminate** | Expand Pod → Trash icon | `runpodctl remove pod POD_ID` | | **List** | [Pods page](https://www.console.runpod.io/pods) | `runpodctl get pod` | @@ -74,6 +75,21 @@ curl --request POST \ }' ``` +To deploy a Pod from an existing template, use the `templateId` parameter instead of specifying individual configuration options: + +```bash +curl --request POST \ + --url https://rest.runpod.io/v1/pods \ + --header 'Authorization: Bearer RUNPOD_API_KEY' \ + --header 'Content-Type: application/json' \ + --data '{ + "name": "my-pod-from-template", + "templateId": "YOUR_TEMPLATE_ID", + "gpuTypeIds": ["NVIDIA GeForce RTX 4090"], + "gpuCount": 1 + }' +``` + See the [Pod API reference](/api-reference/pods/POST/pods) for all parameters. @@ -110,6 +126,16 @@ runpodctl stop pod $RUNPOD_POD_ID sleep 2h; runpodctl stop pod $RUNPOD_POD_ID & ``` + + + + +```bash +curl --request POST \ + --url "https://rest.runpod.io/v1/pods/$RUNPOD_POD_ID/stop" \ + --header 'Authorization: Bearer RUNPOD_API_KEY' +``` + @@ -131,9 +157,56 @@ Resume a stopped Pod. 
Note: You may be allocated [zero GPUs](/references/trouble runpodctl start pod $RUNPOD_POD_ID ``` + + + + +```bash +curl --request POST \ + --url "https://rest.runpod.io/v1/pods/$RUNPOD_POD_ID/start" \ + --header 'Authorization: Bearer RUNPOD_API_KEY' +``` + +## Update a Pod + +Modify an existing Pod's configuration, such as storage size, image, ports, or environment variables. + + +Editing a running Pod resets it completely, erasing all data not stored in `/workspace` or a network volume. + + + + + +1. Open the [Pods page](https://www.console.runpod.io/pods). +2. Click the three-dot menu next to the Pod you want to update. +3. Click **Edit Pod** and modify your configuration. +4. Click **Save** to apply changes. + + + + + +```bash +curl --request PATCH \ + --url "https://rest.runpod.io/v1/pods/$RUNPOD_POD_ID" \ + --header 'Authorization: Bearer RUNPOD_API_KEY' \ + --header 'Content-Type: application/json' \ + --data '{ + "containerDiskInGb": 100, + "volumeInGb": 200 + }' +``` + +See the [Pod API reference](/api-reference/pods/PATCH/pods/podId) for all editable fields. + + + + + ## Terminate a Pod @@ -158,6 +231,16 @@ runpodctl remove pod $RUNPOD_POD_ID runpodctl remove pods my-bulk-task --podCount 40 ``` + + + + +```bash +curl --request DELETE \ + --url "https://rest.runpod.io/v1/pods/$RUNPOD_POD_ID" \ + --header 'Authorization: Bearer RUNPOD_API_KEY' +``` + diff --git a/runpodctl/reference/runpodctl-create-pod.mdx b/runpodctl/reference/runpodctl-create-pod.mdx index 315e4816..359668da 100644 --- a/runpodctl/reference/runpodctl-create-pod.mdx +++ b/runpodctl/reference/runpodctl-create-pod.mdx @@ -93,7 +93,7 @@ Additional arguments to pass to the container when it starts. -Ports to expose from the container. Maximum of 1 HTTP port and 1 TCP port allowed (e.g., `--ports 8888/http --ports 22/tcp`). +Ports to expose from the container. Specify multiple times for multiple ports (e.g., `--ports 8888/http --ports 22/tcp`). 
You can expose up to 10 HTTP ports and multiple TCP ports. See [Expose ports](/pods/configuration/expose-ports) for details. ## Related commands diff --git a/serverless/development/logs.mdx b/serverless/development/logs.mdx index 270b680c..c8a066ca 100644 --- a/serverless/development/logs.mdx +++ b/serverless/development/logs.mdx @@ -51,6 +51,10 @@ To view worker logs: 4. Use the search and filtering capabilities to find specific log entries. 5. Download logs as text files for offline analysis. +## Stream output to clients + +To send progress updates or stream results to clients during job execution, see [Progress updates](/serverless/workers/handler-functions#progress-updates) and [Streaming handlers](/serverless/workers/handler-functions#streaming-handlers). + ## Troubleshooting ### Missing logs diff --git a/tutorials/introduction/containers/docker-commands.mdx b/tutorials/introduction/containers/docker-commands.mdx index 1a53e645..18a5da7c 100644 --- a/tutorials/introduction/containers/docker-commands.mdx +++ b/tutorials/introduction/containers/docker-commands.mdx @@ -290,7 +290,7 @@ docker logs --tail 100 my-container docker logs -t my-container ``` -For Runpod Serverless, you can view worker logs through the web console or API. For Pods, `docker logs` helps debug containers you're running during development. +For Runpod Serverless, you can view worker logs through the [web console](/serverless/development/logs). For Pods, `docker logs` helps debug containers you're running during development. 
### docker exec diff --git a/tutorials/pods/comfyui.mdx b/tutorials/pods/comfyui.mdx index 9e189a48..e556c48d 100644 --- a/tutorials/pods/comfyui.mdx +++ b/tutorials/pods/comfyui.mdx @@ -28,7 +28,10 @@ Before you begin, you'll need: ## Step 1: Deploy a ComfyUI Pod -First, you'll deploy a Pod using the official Runpod ComfyUI template, which pre-installs ComfyUI and the ComfyUI Manager plugin: +First, you'll deploy a Pod using the official Runpod ComfyUI template, which pre-installs ComfyUI and the ComfyUI Manager plugin. + + + @@ -55,6 +58,38 @@ First, you'll deploy a Pod using the official Runpod ComfyUI template, which pre + + + + +Deploy a ComfyUI Pod programmatically using the REST API: + +```bash +curl --request POST \ + --url https://rest.runpod.io/v1/pods \ + --header 'Authorization: Bearer RUNPOD_API_KEY' \ + --header 'Content-Type: application/json' \ + --data '{ + "name": "comfyui-pod", + "imageName": "runpod/comfyui:latest", + "gpuTypeIds": ["NVIDIA GeForce RTX 4090"], + "gpuCount": 1, + "containerDiskInGb": 50, + "volumeInGb": 100, + "ports": ["8188/http", "22/tcp", "8080/http"] + }' +``` + +**Port configuration:** +- `8188/http`: ComfyUI web interface +- `22/tcp`: SSH access +- `8080/http`: File browser (optional) + +For Blackwell GPUs (RTX 5090, B200), use `runpod/comfyui:cuda12.8` instead. + + + + ## Step 2: Open the ComfyUI interface Once your Pod has finished initializing, you can open the ComfyUI interface: From ab6ac90e95c296cc6c2850807dbfba17a46b665a Mon Sep 17 00:00:00 2001 From: Mo King Date: Fri, 20 Mar 2026 09:39:52 -0400 Subject: [PATCH 3/8] Add terminal workflow to the Pods quickstart --- get-started.mdx | 113 +++++++++++++++++++++++++++++++++++++++--------- tests/TESTS.md | 2 +- 2 files changed, 94 insertions(+), 21 deletions(-) diff --git a/get-started.mdx b/get-started.mdx index fb7575ff..18547333 100644 --- a/get-started.mdx +++ b/get-started.mdx @@ -26,6 +26,9 @@ Planning to share compute resources with your team? 
You can convert your persona Now that you've created your account, you're ready to deploy your first Pod: + + + 1. Open the [Pods page](https://www.console.runpod.io/pods) in the web interface. 2. Click the **Deploy** button. 3. Select **A40** from the list of graphics cards (or any other GPU that's available). @@ -34,36 +37,91 @@ Now that you've created your account, you're ready to deploy your first Pod: 6. Click **Deploy On-Demand** to deploy and start your Pod. You'll be redirected back to the Pods page after a few seconds. - If you haven't set up payments yet, you'll be prompted to add a payment method and purchase credits for your account. - -## Step 3: Explore the Pod detail pane + + + + +First, [create an API key](/get-started/api-keys) if you haven't already. Then deploy your Pod: + +```bash +curl --request POST \ + --url https://rest.runpod.io/v1/pods \ + --header "Authorization: Bearer $RUNPOD_API_KEY" \ + --header "Content-Type: application/json" \ + --data '{ + "name": "quickstart-pod", + "imageName": "runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04", + "gpuTypeIds": ["NVIDIA A40"], + "gpuCount": 1 + }' +``` + +The response includes your Pod ID. Save it for later: + +```bash +export RUNPOD_POD_ID="your-pod-id" +``` + + + + +## Step 3: Execute code on your Pod + +Once your Pod finishes initializing, connect and run some code: + + + + +1. On the [Pods page](https://www.console.runpod.io/pods), click your Pod to open the detail pane. +2. Under **HTTP Services**, click **Jupyter Lab** to open a JupyterLab workspace. +3. Under **Notebook**, select **Python 3 (ipykernel)**. +4. Type `print("Hello, world!")` in the first cell and click the play button. + + -On the [Pods page](https://www.console.runpod.io/pods), click the Pod you just created to open the Pod detail pane. The pane opens onto the **Connect** tab, where you'll find options for connecting to your Pod so you can execute code on your GPU (after it's done initializing). 
+
-Take a minute to explore the other tabs:
+
-- **Details**: Information about your Pod, such as hardware specs, pricing, and storage.
-- **Telemetry**: Realtime utilization metrics for your Pod's CPU, memory, and storage.
-- **Logs**: Logs streamed from your container (including stdout from any applications inside) and the Pod management system.
-- **Template Readme**: Details about the template your Pod is running. Your Pod is configured with the latest official Runpod template.
 
-## Step 4: Execute code on your Pod with JupyterLab
 
+Get your Pod's SSH connection details:
 
-1. Go back to the **Connect** tab, and under **HTTP Services**, click **Jupyter Lab** to open a JupyterLab workspace on your Pod.
-2. Under **Notebook**, select **Python 3 (ipykernel)**.
-3. Type `print("Hello, world!")` in the first line of the notebook.
-4. Click the play button to run your code.
+```bash
+curl --request GET \
+  --url "https://rest.runpod.io/v1/pods/$RUNPOD_POD_ID" \
+  --header "Authorization: Bearer $RUNPOD_API_KEY"
+```
+
+Extract the `ip` and `port` from the response's `runtime.ports` array (look for port 22), then connect:
+
+```bash
+ssh root@<ip> -p <port> -i ~/.ssh/your_key
+python3 -c "print('Hello, world!')"
+```
+
+
+You'll need an [SSH key added to your account](/pods/configuration/use-ssh) for this to work.
+
+
+
+
 
 Congratulations! You just ran your first line of code on Runpod.
 
-## Step 5: Clean up
+## Step 4: Clean up
+
+To avoid incurring unnecessary charges, clean up your Pod resources.
+
+
+Terminating a Pod permanently deletes all data that isn't stored in a network volume. Be sure that you've saved any data you might need to access again. To learn more about how storage works, see the [Pod storage overview](/pods/storage/types).
+
+
+
+
 
-To avoid incurring unnecessary charges, follow these steps to clean up your Pod resources:
+To stop your Pod:
 
 1. Return to the [Pods page](https://www.console.runpod.io/pods) and click your running Pod.
 2. Click the **Stop** button (pause icon) to stop your Pod.
@@ -76,13 +134,28 @@ To terminate your Pod: 1. Click the **Terminate** button (trash icon). 2. Click **Terminate Pod** to confirm. - + -Terminating a Pod permanently deletes all data that isn't stored in a . Be sure that you've saved any data you might need to access again. + -To learn more about how storage works, see the [Pod storage overview](/pods/storage/types). +Stop your Pod: - +```bash +curl --request POST \ + --url "https://rest.runpod.io/v1/pods/$RUNPOD_POD_ID/stop" \ + --header "Authorization: Bearer $RUNPOD_API_KEY" +``` + +You'll still be charged a small amount for storage on stopped Pods (\$0.20 per GB per month). If you don't need to retain any data on your Pod, terminate it completely: + +```bash +curl --request DELETE \ + --url "https://rest.runpod.io/v1/pods/$RUNPOD_POD_ID" \ + --header "Authorization: Bearer $RUNPOD_API_KEY" +``` + + + ## Next steps diff --git a/tests/TESTS.md b/tests/TESTS.md index 05ee21f9..74ebe9a9 100644 --- a/tests/TESTS.md +++ b/tests/TESTS.md @@ -59,7 +59,7 @@ Each test has: ## Serverless Endpoints -> **Important:** Do NOT use public endpoints for these tests. The goal is to test the full deployment workflow: create a template, deploy an endpoint, send requests, and verify the integration works. Public endpoints are a separate product and skip the deployment steps we need to validate. +> **Important:** Do NOT use public endpoints for these tests. The goal is to test the full deployment workflow: deploy an endpoint, send requests, and verify the integration works. Public endpoints are a separate product and skip the deployment steps we need to validate. 
| ID | Goal | Difficulty | |----|------|------------| From 4f001b0a094f5ffb3f348bacfc87377e7757d01c Mon Sep 17 00:00:00 2001 From: Mo King Date: Fri, 20 Mar 2026 09:50:48 -0400 Subject: [PATCH 4/8] Improve Pod quickstart terminal steps --- get-started.mdx | 58 ++++++++++++++++++++++++++++++++++--------------- tests/TESTS.md | 1 + 2 files changed, 41 insertions(+), 18 deletions(-) diff --git a/get-started.mdx b/get-started.mdx index 18547333..b4549a2f 100644 --- a/get-started.mdx +++ b/get-started.mdx @@ -16,12 +16,6 @@ Start by creating a Runpod account: 2. Verify your email address. 3. Set up two-factor authentication (recommended for security). - - -Planning to share compute resources with your team? You can convert your personal account to a team account later. See [Manage accounts](/get-started/manage-accounts) for details. - - - ## Step 2: Deploy a Pod Now that you've created your account, you're ready to deploy your first Pod: @@ -44,7 +38,13 @@ If you haven't set up payments yet, you'll be prompted to add a payment method a -First, [create an API key](/get-started/api-keys) if you haven't already. Then deploy your Pod: +First, [create an API key](/get-started/api-keys) if you haven't already. Export it as an environment variable: + +```bash +export RUNPOD_API_KEY="your-api-key" +``` + +Then deploy your Pod: ```bash curl --request POST \ @@ -59,10 +59,21 @@ curl --request POST \ }' ``` -The response includes your Pod ID. Save it for later: +The response includes your Pod ID: + +```json +{ + "id": "uv9wy55tyv30lo", + "name": "quickstart-pod", + "desiredStatus": "RUNNING", + ... +} +``` + +Save it for later: ```bash -export RUNPOD_POD_ID="your-pod-id" +export RUNPOD_POD_ID="uv9wy55tyv30lo" ``` @@ -84,6 +95,10 @@ Once your Pod finishes initializing, connect and run some code: + +You'll need an [SSH key added to your account](/pods/configuration/use-ssh) for this to work. 
+
+
 Get your Pod's SSH connection details:
 
 ```bash
 curl --request GET \
   --url "https://rest.runpod.io/v1/pods/$RUNPOD_POD_ID" \
   --header "Authorization: Bearer $RUNPOD_API_KEY"
 ```
 
-Extract the `ip` and `port` from the response's `runtime.ports` array (look for port 22), then connect:
+The response includes `publicIp` and `portMappings`:
+
+```json
+{
+  "id": "uv9wy55tyv30lo",
+  "publicIp": "194.68.245.207",
+  "portMappings": {
+    "22": 22100
+  },
+  ...
+}
+```
+
+Use these values to connect via SSH:
 
 ```bash
-ssh root@<ip> -p <port> -i ~/.ssh/your_key
+ssh root@194.68.245.207 -p 22100
 python3 -c "print('Hello, world!')"
 ```
 
-
-You'll need an [SSH key added to your account](/pods/configuration/use-ssh) for this to work.
-
 
@@ -115,7 +139,7 @@ Congratulations! You just ran your first line of code on Runpod.
 
 To avoid incurring unnecessary charges, clean up your Pod resources.
 
-Terminating a Pod permanently deletes all data that isn't stored in a network volume. Be sure that you've saved any data you might need to access again. To learn more about how storage works, see the [Pod storage overview](/pods/storage/types).
+Terminating a Pod permanently deletes all data that isn't stored in a network volume. Be sure that you've saved any data you might need to access again.
 
@@ -159,8 +183,6 @@
 
 ## Next steps
 
-Now that you've learned the basics, you're ready to:
diff --git a/tests/TESTS.md b/tests/TESTS.md index 74ebe9a9..0f733b84 100644 --- a/tests/TESTS.md +++ b/tests/TESTS.md @@ -104,6 +104,7 @@ Each test has: | ID | Goal | Difficulty | |----|------|------------| +| pods-quickstart-terminal | Complete the Pod quickstart using only the terminal | Easy | | pods-create | Create a GPU Pod | Medium | | pods-start-stop | Start and stop an existing Pod | Easy | | pods-ssh-connect | Connect to a Pod via SSH | Medium | From 6201fdfbb6d2f28ab5f76ed9119fb699911442e2 Mon Sep 17 00:00:00 2001 From: Mo King Date: Fri, 20 Mar 2026 12:42:06 -0400 Subject: [PATCH 5/8] Improve agent tests --- .claude/commands/test.md | 73 ++++++++ .claude/testing.md | 214 +++++++++++++++++++++-- .gitignore | 2 +- flash/quickstart.mdx | 2 +- pods/configuration/use-ssh.mdx | 42 +++-- tests/IMPROVEMENT_PLAN.md | 146 ++++++++++++++++ tests/README.md | 19 ++- tests/TESTS.md | 299 +++++++++++++++++++-------------- tests/scripts/README.md | 75 +++++++++ tests/scripts/cleanup.py | 218 ++++++++++++++++++++++++ tests/scripts/report.py | 143 ++++++++++++++++ tests/scripts/stats.py | 170 +++++++++++++++++++ 12 files changed, 1237 insertions(+), 166 deletions(-) create mode 100644 .claude/commands/test.md create mode 100644 tests/IMPROVEMENT_PLAN.md create mode 100644 tests/scripts/README.md create mode 100755 tests/scripts/cleanup.py create mode 100755 tests/scripts/report.py create mode 100755 tests/scripts/stats.py diff --git a/.claude/commands/test.md b/.claude/commands/test.md new file mode 100644 index 00000000..b29a703f --- /dev/null +++ b/.claude/commands/test.md @@ -0,0 +1,73 @@ +# /test - Run a documentation test + +Run a test from the testing framework to validate documentation quality. 
+
+## Usage
+
+```
+/test <test-id>
+/test <test-id> local
+/test smoke
+```
+
+## Arguments
+
+- `<test-id>`: The test ID from `tests/TESTS.md` (e.g., `pods-quickstart-terminal`, `flash-quickstart`)
+- `local`: (Optional) Use local MDX files instead of published docs
+- `smoke`: Run all smoke tests
+
+## Execution Rules
+
+When running a test, you MUST follow these rules:
+
+1. **Read the test definition** from `tests/TESTS.md` - find the row matching the test ID
+2. **Do NOT use prior knowledge** - only use Runpod docs (published MCP or local MDX)
+3. **Doc source mode**:
+   - Default: Use `mcp__runpod-docs__search_runpod_documentation` for published docs
+   - If `local` specified: Search and read `.mdx` files in this repository
+4. **Resource naming**: All created resources MUST use `doc_test_` prefix
+5. **Attempt the goal** using available tools (MCP for API, Bash for CLI)
+6. **Handle GPU availability** - see GPU Fallback section below
+7. **Verify the Expected Outcome** from the test definition
+8. **Clean up** all `doc_test_*` resources after the test
+9. **Generate report** using the helper script:
+   ```bash
+   python tests/scripts/report.py <test-id> [--local]
+   ```
+10. **Complete the report** by filling in the generated template
+
+## GPU Fallback Guidance
+
+GPU availability varies. When tests require GPU resources:
+
+| Queue Wait | Action |
+|------------|--------|
+| < 2 min | Keep waiting |
+| 2-5 min | Try fallback GPU |
+| > 5 min | Use fallback or mark blocked |
+
+**Fallback order**: L4 → A4000 → RTX 3090 (Community Cloud)
+
+**Status marking**:
+- PASS: Completed with documented GPU
+- PARTIAL: Completed with fallback GPU (doc improvement needed)
+- FAIL: Failed even with fallbacks
+
+## Report Locations
+
+Reports are saved to both:
+- `tests/reports/<test-id>-<timestamp>.md` (gitignored)
+- `~/Dev/doc-tests/<test-id>-<timestamp>.md` (persistent archive)
+
+## Example
+
+```
+/test pods-quickstart-terminal local
+```
+
+This will:
+1. Load the test definition for `pods-quickstart-terminal`
+2.
Use local MDX files (not published docs)
+3. Attempt: "Complete the Pod quickstart using only the terminal"
+4. Verify: "Code runs on Pod via SSH"
+5. Clean up and generate report
diff --git a/.claude/testing.md b/.claude/testing.md
index 62a571eb..b03bbbe8 100644
--- a/.claude/testing.md
+++ b/.claude/testing.md
@@ -8,27 +8,131 @@ Tests should be **hard to pass**. They simulate a user typing a simple request w
 
 ## Running Tests
 
-Use natural language:
+Use the `/test` command, or plain natural language (e.g. "Run the flash-quickstart test"):
 
 ```
-Run the flash-quickstart test
-Run the vllm-deploy test using local docs
-Run all pods tests
+/test pods-quickstart-terminal          # Run with published docs
+/test pods-quickstart-terminal local    # Run with local MDX files
+/test smoke                             # Run all smoke tests
 ```
 
+The `/test` command loads the test definition and reminds you of the execution rules.
+
 ## Test Execution Rules
 
 1. Read the test definition from `tests/TESTS.md`.
 2. **Do NOT use prior knowledge** - only use Runpod docs.
 3. Attempt to complete the goal using available tools.
 4. All created resources must use `doc_test_` prefix.
-5. Clean up resources after test.
-6. Write report to `tests/reports/{test-id}-{timestamp}.md`.
+5. Handle GPU availability issues (see GPU Fallback section below).
+6. Clean up resources after test (see Cleanup section below).
+7. Generate report using the helper script:
+   ```bash
+   python3 tests/scripts/report.py <test-id> [--local]
+   ```
+8. Fill in the generated report template with actual results.
+
+## GPU Fallback Guidance
+
+GPU availability varies by type and time.
When a test requires GPU resources: + +### Queue Timeout Thresholds + +| Wait Time | Action | +|-----------|--------| +| < 2 min | Normal, keep waiting | +| 2-5 min | Consider trying fallback GPU | +| > 5 min | Use fallback GPU or mark test blocked | + +### Fallback GPU Order + +If the documented GPU type is unavailable, try these in order: + +1. **First choice**: GPU specified in docs (tests the docs as-is) +2. **Fallback 1**: NVIDIA L4 (good availability, cost-effective) +3. **Fallback 2**: NVIDIA A4000 (broad availability) +4. **Fallback 3**: RTX 3090 (community cloud) + +### When to Use Fallbacks + +- **Test the docs first**: Always try the GPU specified in documentation first. +- **Document the issue**: If you must use a fallback, note it in the report as a documentation gap. +- **Mark appropriately**: + - PASS: Test completed with documented GPU + - PARTIAL: Test completed with fallback GPU (doc improvement needed) + - FAIL: Test failed even with fallbacks + +### Cloud Type Fallbacks + +If Secure Cloud has no availability: +1. Try Community Cloud for the same GPU type +2. Note cloud type used in the test report + +### Example Report Note + +```markdown +## Documentation Gaps +GPU availability: Docs specify RTX 4090 but none available after 3 min wait. +Used fallback: NVIDIA L4 on Community Cloud. +Suggestion: Add note about GPU availability or use more available GPU in example. +``` + +## Cleanup + +All test resources use the `doc_test_` prefix. Clean up after each test to avoid orphaned resources. 
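Because cleanup keys purely off the name prefix, the sweep reduces to a filter over resource listings. A minimal sketch of that selection step (the resource dicts below are illustrative; the real `cleanup.py` fetches them from the Runpod REST API):

```python
PREFIX = "doc_test_"

def find_test_resources(resources):
    """Return only the resources created by doc tests (name prefix match)."""
    return [r for r in resources if r.get("name", "").startswith(PREFIX)]

pods = [
    {"id": "a1b2", "name": "doc_test_pod"},
    {"id": "c3d4", "name": "production-pod"},
]
print(find_test_resources(pods))  # → [{'id': 'a1b2', 'name': 'doc_test_pod'}]
```

Anything the filter returns is safe to delete; everything else is left untouched.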
+ +### During Tests (Claude Code) + +After completing a test, use the Runpod MCP tools to delete created resources: + +``` +# List and identify test resources +mcp__runpod__list-pods (filter by name starting with "doc_test_") +mcp__runpod__list-endpoints +mcp__runpod__list-templates +mcp__runpod__list-network-volumes + +# Delete matching resources +mcp__runpod__delete-pod (podId) +mcp__runpod__delete-endpoint (endpointId) +mcp__runpod__delete-template (templateId) +mcp__runpod__delete-network-volume (networkVolumeId) +``` + +### Manual Cleanup (Standalone Script) + +Run the cleanup script to find and delete orphaned test resources: + +```bash +# Dry run - see what would be deleted +python tests/scripts/cleanup.py + +# Actually delete resources +python tests/scripts/cleanup.py --delete +``` + +### Cleanup Command + +Users can request cleanup directly: +``` +Clean up test resources +Delete all doc_test_ resources +``` + +When this is requested, list all resources matching `doc_test_*` and delete them after confirmation. ## Doc Source Modes ### Published Docs (default) -Use the `mcp__runpod-dops__search_runpod_documentation` tool to search the live published documentation. This tests what real users see. +Use the `mcp__runpod-docs__search_runpod_documentation` tool to search the live published documentation. This tests what real users see. ### Local Docs @@ -40,25 +144,103 @@ When the user says "using local docs": This validates unpublished doc changes before they go live. +## Test Tiers + +### Smoke Tests + +Fast tests that don't require GPU deployments. Run these for quick validation: + +``` +Run smoke tests +Run all smoke tests using local docs +``` + +Smoke tests are listed in the "Smoke Tests" section of `tests/TESTS.md`. They include: +- SDK/CLI installation tests +- Read-only API tests (list templates, view metrics) +- Public endpoint tests (FLUX, Qwen) +- Account configuration tests (SSH keys, API keys) + +### Full Tests + +All tests including GPU deployments. 
Use for comprehensive validation: + +``` +Run all tests +Run all serverless tests +``` + +Full tests may create billable resources. Always clean up after. + ## Report Format +Save reports to **both** locations: +1. `tests/reports/{test-id}-{YYYYMMDD-HHMMSS}.md` (gitignored, in repo) +2. `~/Dev/doc-tests/{test-id}-{YYYYMMDD-HHMMSS}.md` (persistent archive) + +Use this template: + ```markdown -# Test Report: {Test Name} +# Test Report: {Test ID} + +## Metadata +| Field | Value | +|-------|-------| +| **Test ID** | {test-id} | +| **Date** | {YYYY-MM-DD HH:MM:SS} | +| **Git SHA** | {git rev-parse --short HEAD} | +| **Git Branch** | {git branch --show-current} | +| **Doc Source** | Published / Local | +| **Status** | PASS / FAIL / PARTIAL | -**Date:** {timestamp} -**Status:** PASS | FAIL | PARTIAL +## Goal +{Copy the goal from TESTS.md} -## What Happened -Brief narrative of the attempt. +## Expected Outcome +{Copy from TESTS.md} -## Where I Got Stuck -Specific points of confusion or failure. +## Actual Result +{What actually happened - be specific} + +## Steps Taken +1. {First thing tried} +2. {Second thing tried} +... ## Documentation Gaps -What was missing or unclear in the docs. +{What was missing or unclear - be specific about which page/section} ## Suggestions -Specific improvements to make tests pass. +{Concrete improvements to make this test pass} +``` + +### Comparing Runs + +Reports in `~/Dev/doc-tests/` persist across git operations. 
To compare runs:
+
+```bash
+# List all runs for a test
+ls ~/Dev/doc-tests/flash-quickstart-*.md
+
+# Diff two runs
+diff ~/Dev/doc-tests/flash-quickstart-20240115-100000.md ~/Dev/doc-tests/flash-quickstart-20240120-140000.md
+```
+
+### Tracking Pass Rates
+
+Use the stats script to analyze historical results:
+
+```bash
+# Overall summary
+python3 tests/scripts/stats.py
+
+# Group by test
+python3 tests/scripts/stats.py --by-test
+
+# Recent runs
+python3 tests/scripts/stats.py --recent 10
+
+# Show failures
+python3 tests/scripts/stats.py --failures
+```
 
 ## Test Categories
diff --git a/.gitignore b/.gitignore
index e1e8f6b4..0c0d2bcb 100644
--- a/.gitignore
+++ b/.gitignore
@@ -34,4 +34,4 @@ helpers/__pycache__/**
 */
 # Documentation test reports
 tests/reports/
-.serena
\ No newline at end of file
+.serena
diff --git a/flash/quickstart.mdx b/flash/quickstart.mdx
index e2a3aee3..4a20d990 100644
--- a/flash/quickstart.mdx
+++ b/flash/quickstart.mdx
@@ -65,7 +65,7 @@
-from runpod_flash import Endpoint, GpuType
+from runpod_flash import Endpoint, GpuGroup
 
 @Endpoint(
     name="flash-quickstart",
-    gpu=GpuType.NVIDIA_GEFORCE_RTX_4090,
+    gpu=GpuGroup.ANY,  # Use any available GPU
     workers=3,
     dependencies=["numpy", "torch"]
 )
diff --git a/pods/configuration/use-ssh.mdx b/pods/configuration/use-ssh.mdx
index 49a0ee85..b149ba81 100644
--- a/pods/configuration/use-ssh.mdx
+++ b/pods/configuration/use-ssh.mdx
@@ -33,26 +33,38 @@ SSH key authentication is recommended for security and convenience.
 
-    Run this command on your local terminal to retrieve the public SSH key you just generated:
-
-    ```sh
-    cat ~/.ssh/id_ed25519.pub
-    ```
-
-    This will output something similar to this:
+
-    ```sh
-    ssh-ed25519 AAAAC4NzaC1lZDI1JTE5AAAAIGP+L8hnjIcBqUb8NRrDiC32FuJBvRA0m8jLShzgq6BQ YOUR_EMAIL@DOMAIN.COM
-    ```
-
+
+
-    Copy and paste your public key from the previous step into the **SSH Public Keys** field in your [Runpod user account settings](https://www.console.runpod.io/user/settings).
+
+    1.
Run `cat ~/.ssh/id_ed25519.pub` to display your public key. + 2. Copy the output (starts with `ssh-ed25519`). + 3. Paste it into the **SSH Public Keys** field in your [Runpod account settings](https://www.console.runpod.io/user/settings). - If you need to add multiple SSH keys to your Runpod account, make sure that each key pair is on its own line in the **SSH Public Keys** field. + If you need to add multiple SSH keys, make sure each key is on its own line. + + + + + + Use [runpodctl](/runpodctl/overview) to add your key directly: + + ```sh + runpodctl ssh add-key --key-file ~/.ssh/id_ed25519.pub + ``` + + Verify it was added: + + ```sh + runpodctl ssh list-keys + ``` + + + + diff --git a/tests/IMPROVEMENT_PLAN.md b/tests/IMPROVEMENT_PLAN.md new file mode 100644 index 00000000..755725b8 --- /dev/null +++ b/tests/IMPROVEMENT_PLAN.md @@ -0,0 +1,146 @@ +# Testing Framework Improvement Plan + +Based on feedback from PR #561 review by @runpod-Henrik. + +## Immediate Fixes (Blockers) + +### 1. MCP tool name typo +**File:** `.claude/testing.md` line 31 +**Issue:** References `mcp__runpod-dops__search_runpod_documentation` but server is `runpod-docs` +**Fix:** Change to `mcp__runpod-docs__search_runpod_documentation` +**Status:** [x] DONE + +### 2. Test table format mismatch +**Files:** `tests/README.md`, `.claude/testing.md`, `tests/TESTS.md` +**Issue:** Docs say tables have `ID | Goal | Cleanup` but actual tables have `ID | Goal | Difficulty` +**Options:** +- A) Add Cleanup column back to tables (more explicit per-test) +- B) Update docs to say cleanup rules are global (simpler, current reality) +**Recommendation:** Option B - cleanup rules ARE global (by resource type), not per-test +**Status:** [x] DONE - Updated docs to describe actual format with global cleanup rules + +### 3. 
Port limit accuracy +**File:** `runpodctl/reference/runpodctl-create-pod.mdx` +**Issue:** Changed from "1 HTTP + 1 TCP" to "10 HTTP + multiple TCP" - needs verification +**Action:** Verify actual runpodctl behavior before merging +**Status:** [x] VERIFIED - `pods/configuration/expose-ports.mdx` confirms "Expose HTTP Ports (Max 10)" + +## Nits + +### 4. Missing trailing newline in .gitignore +**Status:** [x] DONE + +### 5. Double `---` separator in TESTS.md +**Status:** [x] DONE + +--- + +## Structural Improvements (Future Work) + +Henrik correctly identified that this is currently a **catalog**, not a **framework**. Here's a plan to evolve it: + +### Phase 1: Cleanup Safety Net (Quick Win) +**Status:** ✅ DONE + +Created `tests/scripts/cleanup.py`: +- Lists and deletes resources matching `doc_test_*` prefix +- Supports dry-run mode (default) and `--delete` flag +- Handles pods, endpoints, templates, and network volumes +- Can be run standalone or in CI + +Also updated `.claude/testing.md` with cleanup instructions for Claude Code. + +### Phase 2: Smoke Test Tier +**Status:** ✅ DONE + +Added 12 smoke tests that don't require GPU deploys: +- SDK installs: `sdk-python-install`, `sdk-js-install` +- CLI: `cli-install`, `cli-configure`, `cli-list-pods` +- Read-only: `template-list`, `serverless-metrics` +- Config: `api-key-create`, `pods-add-ssh-key` +- Public endpoints: `public-flux`, `public-qwen`, `public-video` + +Created separate "Smoke Tests" section in TESTS.md. +Updated `.claude/testing.md` with test tier instructions. + +### Phase 3: Success Criteria +**Status:** ✅ DONE + +Added "Expected Outcome" column to all test tables with objective, measurable criteria: +- `Pod status is RUNNING` +- `Endpoint responds to /health` +- `SSH session established` +- etc. + +Now each test has a clear PASS/FAIL condition. + +### Phase 4: Automation Layer +**Status:** ⏸️ DEFERRED + +Requires Claude Code in CI or custom API runner. Skipped for now - tests run manually. 
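
Whichever automation option we eventually pick, every runner needs the same first building block: enumerating test IDs and expected outcomes from the TESTS.md tables. A minimal sketch of that step, stdlib only — the three-column `ID | Goal | Expected Outcome` layout matches the tables in this PR, but the sample rows here are illustrative:

```python
def parse_test_table(markdown: str) -> list[dict]:
    """Extract test rows from the pipe-delimited tables used in TESTS.md.

    Skips the header row and the |---| separator row; returns one dict per test.
    """
    tests = []
    for line in markdown.splitlines():
        stripped = line.strip()
        if not stripped.startswith("|"):
            continue
        cells = [c.strip() for c in stripped.strip("|").split("|")]
        if len(cells) != 3:
            continue
        test_id, goal, expected = cells
        # Header row has "ID" in the first cell; separator rows are all dashes.
        if test_id == "ID" or set(test_id) <= {"-"}:
            continue
        tests.append({"id": test_id, "goal": goal, "expected": expected})
    return tests


sample = """\
| ID | Goal | Expected Outcome |
|----|------|------------------|
| cli-install | Install runpodctl on your local machine | `runpodctl version` returns version |
| cli-list-pods | List pods using runpodctl | `runpodctl get pods` returns list |
"""

print([t["id"] for t in parse_test_table(sample)])  # ['cli-install', 'cli-list-pods']
```

A runner could then dispatch each row's goal as a prompt and check the expected-outcome column against the result.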
+ +Options for future: +1. Claude Code headless mode (when available) +2. Custom runner script with Anthropic API +3. GitHub Action with Claude CLI + +### Phase 5: Results Tracking +**Status:** ✅ DONE + +- Reports saved to **two locations**: + - `tests/reports/` (gitignored, in repo) + - `~/Dev/doc-tests/` (persistent local archive) +- Enhanced report template with: + - Git SHA and branch + - Structured metadata table + - Steps taken section + - Actual vs expected results +- Instructions for comparing runs over time + +### Phase 6: Convenience Tooling (Added) +**Status:** ✅ DONE + +Based on trial run feedback, added: + +1. **`/test` command** (`.claude/commands/test.md`) + - Loads test definition and execution rules + - Supports `local` flag for local docs mode + - Supports `smoke` for running smoke tests + +2. **`report.py` script** (`tests/scripts/report.py`) + - Auto-generates report template with metadata + - Pulls goal and expected outcome from TESTS.md + - Saves to both report locations + +3. **`stats.py` script** (`tests/scripts/stats.py`) + - Analyzes historical test reports + - Shows pass rates overall and by test + - Lists recent runs and failures + +### Phase 7: GPU Fallback Guidance (Added) +**Status:** ✅ DONE + +Based on flash-quickstart test failure (RTX 4090 unavailable), added: + +1. **Queue timeout thresholds** - When to wait vs try fallback +2. **Fallback GPU order** - L4 → A4000 → RTX 3090 +3. **Cloud type fallbacks** - Secure → Community +4. **Status marking guidance** - PASS/PARTIAL/FAIL based on GPU used + +--- + +## Discussion Points + +1. **How often should full suite run?** Weekly? Monthly? On-demand only? +2. **Budget for test runs?** ~$5-10 per full run was mentioned +3. **Who reviews test reports?** Auto-file issues for failures? +4. **Should we version the test definitions?** Track which tests existed at which doc version? + +--- + +## Next Steps + +1. ~~Fix blockers (#1, #2, #3, #4, #5) immediately~~ ✅ All complete +2. 
Merge PR with fixes +3. Create issues for Phase 1-5 improvements +4. Discuss automation priorities with team diff --git a/tests/README.md b/tests/README.md index f4179f16..3f9fc0e9 100644 --- a/tests/README.md +++ b/tests/README.md @@ -32,11 +32,20 @@ Run the vllm-deploy test using local docs All tests are defined in [TESTS.md](./TESTS.md) as a table with: - **ID**: Test identifier - **Goal**: What the user wants (one sentence) -- **Cleanup**: Resource types to delete after test +- **Expected Outcome**: What constitutes PASS + +**Smoke tests** are fast tests that don't require GPU deployments (SDK installs, read-only API calls, public endpoints). + +Cleanup rules are defined globally at the bottom of TESTS.md. All test resources use the `doc_test_` prefix. ## Reports -Reports are saved to `reports/` (gitignored) and include: -- What worked / what didn't -- Where the agent got stuck -- Documentation improvements needed +Reports are saved to two locations: +- `reports/` (gitignored, in repo) +- `~/Dev/doc-tests/` (persistent local archive) + +Each report includes: +- Git SHA and branch +- Steps taken +- Actual vs expected results +- Documentation gaps and suggestions diff --git a/tests/TESTS.md b/tests/TESTS.md index 0f733b84..0a3b606f 100644 --- a/tests/TESTS.md +++ b/tests/TESTS.md @@ -14,6 +14,10 @@ Run the flash-quickstart test Run all vLLM tests ``` +``` +Run smoke tests +``` + ### Doc Source Modes **Published docs (default)** - Uses the Runpod Docs MCP server to search published documentation: @@ -28,32 +32,71 @@ Run the vllm-deploy test using local docs When using local docs, the agent will search and read `.mdx` files in this repository instead of querying the MCP server. +### Test Tiers + +**Smoke tests** - Fast tests that don't deploy GPU resources. Use for quick validation: +``` +Run smoke tests +Run all smoke tests using local docs +``` + +**Full tests** - All tests including GPU deployments. Use for comprehensive validation. 
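
For the `sdk-python-install` smoke test, "import succeeds" can be checked without side effects. A sketch of how an automated check might verify it, stdlib only — the module names below are placeholders so the sketch runs anywhere, not a claim about how the tests are implemented:

```python
import importlib.util


def sdk_importable(module_name: str) -> bool:
    """Return True if the top-level module resolves, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# The Runpod Python SDK installs as the `runpod` module; `json` stands in here.
print(sdk_importable("json"))         # True
print(sdk_importable("no_such_sdk"))  # False
```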
+ ## Test Format Each test has: - **ID**: Unique identifier for the test - **Goal**: What a user would ask (one sentence, no hints) -- **Cleanup**: Resources to delete after test (all use `doc_test_*` prefix) +- **Expected Outcome**: What constitutes PASS (objective, measurable) + +Cleanup rules are defined in the [Cleanup Rules](#cleanup-rules) section at the bottom. All test resources use the `doc_test_` prefix. + +--- + +## Smoke Tests + +Fast tests that don't require GPU deployments. Run these for quick validation. + +| ID | Goal | Expected Outcome | +|----|------|------------------| +| sdk-python-install | Install the Runpod Python SDK | `import runpod` succeeds | +| sdk-js-install | Install the Runpod JavaScript SDK | `require('runpod-sdk')` succeeds | +| cli-install | Install runpodctl on your local machine | `runpodctl version` returns version | +| cli-configure | Configure runpodctl with your API key | `runpodctl config` shows configured key | +| cli-list-pods | List pods using runpodctl | `runpodctl get pods` returns list | +| template-list | List all templates | API returns template array | +| api-key-create | Create an API key with specific permissions | New API key ID returned | +| pods-add-ssh-key | Add an SSH key to your Runpod account | Key appears in account | +| public-flux | Generate an image using FLUX public endpoint | Image data returned | +| public-qwen | Use the Qwen3 32B public endpoint | Chat completion returned | +| public-video | Generate video using WAN public endpoint | Video generation starts | +| serverless-metrics | View endpoint metrics (execution time, delay) | Metrics data returned | + +**Run smoke tests:** +``` +Run smoke tests +Run all smoke tests using local docs +``` --- ## Flash SDK -| ID | Goal | Difficulty | -|----|------|------------| -| flash-quickstart | Deploy a GPU function using Flash | Easy | -| flash-hello-gpu | Run a simple PyTorch function on a GPU | Easy | -| flash-sdxl | Generate an image using SDXL with Flash 
| Medium | -| flash-text-gen | Deploy a text generation model with Flash | Medium | -| flash-dependencies | Deploy a function with custom pip dependencies | Easy | -| flash-multi-gpu | Create an endpoint that uses multiple GPUs | Medium | -| flash-cpu-endpoint | Deploy a CPU-only endpoint with Flash | Easy | -| flash-load-balancer | Build a REST API with load balancing using Flash | Hard | -| flash-mixed-workers | Create an app with both GPU and CPU workers | Hard | -| flash-env-vars | Configure environment variables for a Flash endpoint | Easy | -| flash-idle-timeout | Set a custom idle timeout for a Flash endpoint | Easy | -| flash-app-deploy | Initialize and deploy a complete Flash app | Medium | -| flash-local-test | Test a Flash function locally before deploying | Medium | +| ID | Goal | Expected Outcome | +|----|------|------------------| +| flash-quickstart | Deploy a GPU function using Flash | Endpoint responds to request | +| flash-hello-gpu | Run a simple PyTorch function on a GPU | PyTorch GPU tensor returned | +| flash-sdxl | Generate an image using SDXL with Flash | Image bytes returned | +| flash-text-gen | Deploy a text generation model with Flash | Generated text returned | +| flash-dependencies | Deploy a function with custom pip dependencies | Function using deps succeeds | +| flash-multi-gpu | Create an endpoint that uses multiple GPUs | Multi-GPU endpoint responds | +| flash-cpu-endpoint | Deploy a CPU-only endpoint with Flash | CPU endpoint responds | +| flash-load-balancer | Build a REST API with load balancing using Flash | Multiple routes respond | +| flash-mixed-workers | Create an app with both GPU and CPU workers | Both worker types respond | +| flash-env-vars | Configure environment variables for a Flash endpoint | Env vars accessible in function | +| flash-idle-timeout | Set a custom idle timeout for a Flash endpoint | Timeout visible in config | +| flash-app-deploy | Initialize and deploy a complete Flash app | App deploys successfully 
| +| flash-local-test | Test a Flash function locally before deploying | Local test passes | --- @@ -61,27 +104,27 @@ Each test has: > **Important:** Do NOT use public endpoints for these tests. The goal is to test the full deployment workflow: deploy an endpoint, send requests, and verify the integration works. Public endpoints are a separate product and skip the deployment steps we need to validate. -| ID | Goal | Difficulty | -|----|------|------------| -| serverless-create-endpoint | Create a serverless endpoint | Medium | -| serverless-serve-qwen | Create an endpoint to serve a Qwen model | Hard | -| serverless-custom-handler | Write a custom handler function and deploy it | Hard | -| serverless-logs | Build a custom handler that uses progress_update() to send log messages, deploy it, and verify updates appear in /status polling | Hard | -| serverless-send-request | Send a request to an existing endpoint | Easy | -| serverless-async-request | Submit an async job and poll for results | Medium | -| serverless-sync-request | Make a synchronous request to an endpoint using /runsync | Easy | -| serverless-streaming | Build a custom handler that uses yield to stream results, deploy it, and test the /stream endpoint | Hard | -| serverless-webhook | Set up webhook notifications for a serverless endpoint | Medium | -| serverless-cancel-job | Cancel a running or queued job | Easy | -| serverless-queue-delay | Create an endpoint with queue delay scaling | Medium | -| serverless-request-count | Create an endpoint with request count scaling | Medium | -| serverless-min-workers | Create an endpoint with 1 minimum active worker | Easy | -| serverless-idle-timeout | Create an endpoint with an idle timeout of 20 seconds | Easy | -| serverless-gpu-priority | Create an endpoint with GPU type priority/fallback | Medium | -| serverless-docker-deploy | Deploy an endpoint from Docker Hub | Hard | -| serverless-github-deploy | Deploy an endpoint from GitHub | Hard | -| 
serverless-ssh-worker | SSH into a running worker for debugging | Medium | -| serverless-metrics | View endpoint metrics (execution time, delay) | Easy | +| ID | Goal | Expected Outcome | +|----|------|------------------| +| serverless-create-endpoint | Create a serverless endpoint | Endpoint ID returned | +| serverless-serve-qwen | Create an endpoint to serve a Qwen model | Chat completion works | +| serverless-custom-handler | Write a custom handler function and deploy it | Handler responds to request | +| serverless-logs | Build a custom handler that uses progress_update() to send log messages, deploy it, and verify updates appear in /status polling | Progress updates in /status | +| serverless-send-request | Send a request to an existing endpoint | Response received | +| serverless-async-request | Submit an async job and poll for results | Job completes, output returned | +| serverless-sync-request | Make a synchronous request to an endpoint using /runsync | Sync response returned | +| serverless-streaming | Build a custom handler that uses yield to stream results, deploy it, and test the /stream endpoint | Streamed chunks received | +| serverless-webhook | Set up webhook notifications for a serverless endpoint | Webhook receives callback | +| serverless-cancel-job | Cancel a running or queued job | Job status is CANCELLED | +| serverless-queue-delay | Create an endpoint with queue delay scaling | Scaler type is QUEUE_DELAY | +| serverless-request-count | Create an endpoint with request count scaling | Scaler type is REQUEST_COUNT | +| serverless-min-workers | Create an endpoint with 1 minimum active worker | workersMin is 1 | +| serverless-idle-timeout | Create an endpoint with an idle timeout of 20 seconds | idleTimeout is 20 | +| serverless-gpu-priority | Create an endpoint with GPU type priority/fallback | Multiple GPU types listed | +| serverless-docker-deploy | Deploy an endpoint from Docker Hub | Endpoint from Docker image | +| serverless-github-deploy | 
Deploy an endpoint from GitHub | Endpoint from GitHub repo | +| serverless-ssh-worker | SSH into a running worker for debugging | SSH session established | +| serverless-metrics | View endpoint metrics (execution time, delay) | Metrics data returned | --- @@ -89,148 +132,148 @@ Each test has: > **Important:** Do NOT use public endpoints for these tests. Deploy your own vLLM endpoint to test the full workflow. Public endpoints skip the deployment and configuration steps we need to validate. -| ID | Goal | Difficulty | -|----|------|------------| -| vllm-deploy | Deploy a vLLM endpoint | Medium | -| vllm-openai-compat | Use the OpenAI Python client with a vLLM endpoint | Medium | -| vllm-chat-completion | Send a chat completion request to vLLM | Easy | -| vllm-streaming | Stream responses from a vLLM endpoint | Medium | -| vllm-custom-model | Deploy a custom/fine-tuned model with vLLM | Hard | -| vllm-gated-model | Deploy a gated Hugging Face model with vLLM | Medium | +| ID | Goal | Expected Outcome | +|----|------|------------------| +| vllm-deploy | Deploy a vLLM endpoint | Endpoint responds to /health | +| vllm-openai-compat | Use the OpenAI Python client with a vLLM endpoint | OpenAI client call succeeds | +| vllm-chat-completion | Send a chat completion request to vLLM | Chat response returned | +| vllm-streaming | Stream responses from a vLLM endpoint | Streamed tokens received | +| vllm-custom-model | Deploy a custom/fine-tuned model with vLLM | Custom model responds | +| vllm-gated-model | Deploy a gated Hugging Face model with vLLM | Gated model loads and responds | --- ## Pods -| ID | Goal | Difficulty | -|----|------|------------| -| pods-quickstart-terminal | Complete the Pod quickstart using only the terminal | Easy | -| pods-create | Create a GPU Pod | Medium | -| pods-start-stop | Start and stop an existing Pod | Easy | -| pods-ssh-connect | Connect to a Pod via SSH | Medium | -| pods-expose-port | Expose a custom port on a Pod | Medium | -| 
pods-env-vars | Set environment variables on a Pod | Easy | -| pods-resize-storage | Resize a Pod's container or volume disk | Easy | -| pods-template-use | Deploy a Pod using a custom template | Medium | -| pods-template-create | Create a custom Pod template | Hard | -| pods-comfyui | Deploy ComfyUI on a Pod and generate an image | Hard | +| ID | Goal | Expected Outcome | +|----|------|------------------| +| pods-quickstart-terminal | Complete the Pod quickstart using only the terminal | Code runs on Pod via SSH | +| pods-add-ssh-key | Add an SSH key to your Runpod account | Key appears in account | +| pods-create | Create a GPU Pod | Pod status is RUNNING | +| pods-start-stop | Start and stop an existing Pod | Pod starts and stops | +| pods-ssh-connect | Connect to a Pod via SSH | SSH session established | +| pods-expose-port | Expose a custom port on a Pod | Port accessible via URL | +| pods-env-vars | Set environment variables on a Pod | Env vars visible in Pod | +| pods-resize-storage | Resize a Pod's container or volume disk | Storage size increased | +| pods-template-use | Deploy a Pod using a custom template | Pod uses template config | +| pods-template-create | Create a custom Pod template | Template ID returned | +| pods-comfyui | Deploy ComfyUI on a Pod and generate an image | ComfyUI generates image | --- ## Storage -| ID | Goal | Difficulty | -|----|------|------------| -| storage-create-volume | Create a network volume | Easy | -| storage-attach-pod | Attach a network volume to a Pod | Medium | -| storage-attach-serverless | Attach a network volume to a Serverless endpoint | Medium | -| storage-s3-api | Access a network volume using the S3 API | Hard | -| storage-upload-s3 | Upload a file to a network volume using S3 | Hard | -| storage-download-s3 | Download a file from a network volume using S3 | Hard | -| storage-runpodctl-send | Transfer files between Pods using runpodctl | Easy | -| storage-migrate-volume | Migrate data between network volumes | 
Hard | -| storage-cloud-sync | Sync data with cloud storage (S3, GCS) | Hard | -| storage-scp-transfer | Transfer files to a Pod using SCP | Medium | -| storage-rsync | Sync files to a Pod using rsync | Medium | +| ID | Goal | Expected Outcome | +|----|------|------------------| +| storage-create-volume | Create a network volume | Volume ID returned | +| storage-attach-pod | Attach a network volume to a Pod | Volume mounted in Pod | +| storage-attach-serverless | Attach a network volume to a Serverless endpoint | Volume accessible to workers | +| storage-s3-api | Access a network volume using the S3 API | S3 list/read works | +| storage-upload-s3 | Upload a file to a network volume using S3 | File appears on volume | +| storage-download-s3 | Download a file from a network volume using S3 | File downloaded locally | +| storage-runpodctl-send | Transfer files between Pods using runpodctl | File arrives on target Pod | +| storage-migrate-volume | Migrate data between network volumes | Data exists on new volume | +| storage-cloud-sync | Sync data with cloud storage (S3, GCS) | Data synced both ways | +| storage-scp-transfer | Transfer files to a Pod using SCP | File arrives on Pod | +| storage-rsync | Sync files to a Pod using rsync | Files synced to Pod | --- ## Templates -| ID | Goal | Difficulty | -|----|------|------------| -| template-create-pod | Create a Pod template | Medium | -| template-create-serverless | Create a Serverless template | Medium | -| template-list | List all templates | Easy | -| template-preload-model | Create a template with a pre-loaded model | Hard | -| template-custom-dockerfile | Create a template with a custom Dockerfile | Hard | -| template-env-vars | Add environment variables to a template | Easy | +| ID | Goal | Expected Outcome | +|----|------|------------------| +| template-create-pod | Create a Pod template | Template ID returned | +| template-create-serverless | Create a Serverless template | Template ID returned | +| 
template-list | List all templates | Template array returned | +| template-preload-model | Create a template with a pre-loaded model | Model preloads on start | +| template-custom-dockerfile | Create a template with a custom Dockerfile | Template uses custom image | +| template-env-vars | Add environment variables to a template | Env vars in template config | --- ## Instant Clusters -| ID | Goal | Difficulty | -|----|------|------------| -| cluster-create | Create an Instant Cluster | Medium | -| cluster-pytorch | Run distributed PyTorch training on a cluster | Hard | -| cluster-slurm | Deploy a Slurm cluster | Hard | -| cluster-axolotl | Fine-tune an LLM with Axolotl on a cluster | Hard | +| ID | Goal | Expected Outcome | +|----|------|------------------| +| cluster-create | Create an Instant Cluster | Cluster nodes are RUNNING | +| cluster-pytorch | Run distributed PyTorch training on a cluster | Training completes on all nodes | +| cluster-slurm | Deploy a Slurm cluster | Slurm queue accepts jobs | +| cluster-axolotl | Fine-tune an LLM with Axolotl on a cluster | Fine-tuning starts | --- ## SDKs & APIs -| ID | Goal | Difficulty | -|----|------|------------| -| sdk-python-install | Install the Runpod Python SDK | Easy | -| sdk-python-endpoint | Use the Python SDK to call an endpoint | Easy | -| sdk-js-install | Install the Runpod JavaScript SDK | Easy | -| sdk-js-endpoint | Use the JavaScript SDK to call an endpoint | Easy | -| api-graphql-query | Make a GraphQL query to list pods | Medium | -| api-graphql-mutation | Create a resource using GraphQL mutation | Medium | -| api-key-create | Create an API key with specific permissions | Easy | -| api-key-restricted | Create a restricted API key | Medium | +| ID | Goal | Expected Outcome | +|----|------|------------------| +| sdk-python-install | Install the Runpod Python SDK | `import runpod` succeeds | +| sdk-python-endpoint | Use the Python SDK to call an endpoint | SDK call returns response | +| sdk-js-install | 
Install the Runpod JavaScript SDK | `require('runpod-sdk')` succeeds | +| sdk-js-endpoint | Use the JavaScript SDK to call an endpoint | SDK call returns response | +| api-graphql-query | Make a GraphQL query to list pods | Query returns pod list | +| api-graphql-mutation | Create a resource using GraphQL mutation | Resource created via mutation | +| api-key-create | Create an API key with specific permissions | New API key ID returned | +| api-key-restricted | Create a restricted API key | Key has limited permissions | --- ## CLI (runpodctl) -| ID | Goal | Difficulty | -|----|------|------------| -| cli-install | Install runpodctl on your local machine | Easy | -| cli-configure | Configure runpodctl with your API key | Easy | -| cli-list-pods | List pods using runpodctl | Easy | -| cli-create-pod | Create a pod using runpodctl | Medium | -| cli-send-file | Send a file to a Pod using runpodctl | Medium | -| cli-receive-file | Receive a file from a Pod using runpodctl | Medium | +| ID | Goal | Expected Outcome | +|----|------|------------------| +| cli-install | Install runpodctl on your local machine | `runpodctl version` returns version | +| cli-configure | Configure runpodctl with your API key | `runpodctl config` shows key | +| cli-list-pods | List pods using runpodctl | `runpodctl get pods` returns list | +| cli-create-pod | Create a pod using runpodctl | Pod ID returned | +| cli-send-file | Send a file to a Pod using runpodctl | File arrives on Pod | +| cli-receive-file | Receive a file from a Pod using runpodctl | File downloaded locally | --- ## Model Caching -| ID | Goal | Difficulty | -|----|------|------------| -| cache-enable | Create an endpoint with model caching enabled | Medium | +| ID | Goal | Expected Outcome | +|----|------|------------------| +| cache-enable | Create an endpoint with model caching enabled | Caching enabled in config | --- ## Integrations -| ID | Goal | Difficulty | -|----|------|------------| -| integration-openai-migrate | 
Create an OpenAI-compatible endpoint | Medium | -| integration-vercel-ai | Create an image generation app with the Vercel AI SDK | Medium | -| integration-cursor | Configure Cursor to use Runpod endpoints | Medium | -| integration-skypilot | Use Runpod with SkyPilot | Hard | +| ID | Goal | Expected Outcome | +|----|------|------------------| +| integration-openai-migrate | Create an OpenAI-compatible endpoint | OpenAI client works | +| integration-vercel-ai | Create an image generation app with the Vercel AI SDK | Image generated via Vercel AI | +| integration-cursor | Configure Cursor to use Runpod endpoints | Cursor uses Runpod backend | +| integration-skypilot | Use Runpod with SkyPilot | SkyPilot launches on Runpod | --- ## Public Endpoints -| ID | Goal | Difficulty | -|----|------|------------| -| public-flux | Generate an image using FLUX public endpoint | Easy | -| public-qwen | Use the Qwen3 32B public endpoint | Easy | -| public-video | Generate video using WAN public endpoint | Medium | +| ID | Goal | Expected Outcome | +|----|------|------------------| +| public-flux | Generate an image using FLUX public endpoint | Image data returned | +| public-qwen | Use the Qwen3 32B public endpoint | Chat completion returned | +| public-video | Generate video using WAN public endpoint | Video generation starts | --- ## Tutorials (End-to-End) -| ID | Goal | Difficulty | -|----|------|------------| -| tutorial-sdxl-serverless | Deploy SDXL as a serverless endpoint | Medium | -| tutorial-comfyui-pod | Deploy ComfyUI on a Pod and generate an image | Medium | -| tutorial-comfyui-serverless | Deploy ComfyUI as a serverless endpoint and generate an image | Hard | -| tutorial-gemma-chatbot | Deploy a Gemma 3 chatbot with vLLM | Medium | -| tutorial-custom-worker | Build and deploy a custom worker | Hard | -| tutorial-web-integration | Integrate a Serverless endpoint into a web application | Hard | -| tutorial-dual-mode-worker | Deploy a dual-mode (Pod/Serverless) worker | 
Hard | -| tutorial-model-caching | Create an endpoint with model caching enabled | Hard | -| tutorial-pytorch-cluster | Deploy a PyTorch cluster | Hard | +| ID | Goal | Expected Outcome | +|----|------|------------------| +| tutorial-sdxl-serverless | Deploy SDXL as a serverless endpoint | SDXL generates image | +| tutorial-comfyui-pod | Deploy ComfyUI on a Pod and generate an image | ComfyUI workflow executes | +| tutorial-comfyui-serverless | Deploy ComfyUI as a serverless endpoint and generate an image | ComfyUI endpoint generates image | +| tutorial-gemma-chatbot | Deploy a Gemma 3 chatbot with vLLM | Chatbot responds | +| tutorial-custom-worker | Build and deploy a custom worker | Custom worker responds | +| tutorial-web-integration | Integrate a Serverless endpoint into a web application | Web app calls endpoint | +| tutorial-dual-mode-worker | Deploy a dual-mode (Pod/Serverless) worker | Both modes work | +| tutorial-model-caching | Create an endpoint with model caching enabled | Caching improves cold start | +| tutorial-pytorch-cluster | Deploy a PyTorch cluster | Distributed training runs | ---- --- ## Cleanup Rules diff --git a/tests/scripts/README.md b/tests/scripts/README.md new file mode 100644 index 00000000..f0f132fa --- /dev/null +++ b/tests/scripts/README.md @@ -0,0 +1,75 @@ +# Test Scripts + +Utility scripts for the documentation testing framework. + +## cleanup.py + +Finds and deletes Runpod resources matching the test prefix (`doc_test_*`). + +```bash +# Dry run - see what would be deleted +python cleanup.py + +# Actually delete resources +python cleanup.py --delete + +# Use custom prefix +python cleanup.py --prefix my_test_ +``` + +**Requirements:** `requests`, `RUNPOD_API_KEY` env var + +## report.py + +Generates a test report template with metadata pre-filled. 
+ +```bash +# Generate report for a passing test +python report.py pods-quickstart-terminal PASS + +# Mark as using local docs +python report.py pods-quickstart-terminal PASS --local + +# Generate report for a failing test +python report.py flash-quickstart FAIL +``` + +**Output:** Creates report in both: +- `tests/reports/<test-id>-<timestamp>.md` +- `~/Dev/doc-tests/<test-id>-<timestamp>.md` + +The template includes: +- Timestamp, git SHA, branch +- Test goal and expected outcome (from TESTS.md) +- Placeholder sections for you to fill in + +## stats.py + +Analyzes historical test reports to show pass rates and trends. + +```bash +# Show overall summary +python stats.py + +# Group by test ID +python stats.py --by-test + +# Show last 10 reports +python stats.py --recent 10 + +# Show only failures +python stats.py --failures +``` + +**Data source:** Reads reports from `~/Dev/doc-tests/` + +## CI Integration + +Add to GitHub Actions for scheduled cleanup: + +```yaml +- name: Cleanup orphaned test resources + env: + RUNPOD_API_KEY: ${{ secrets.RUNPOD_API_KEY }} + run: python tests/scripts/cleanup.py --delete +``` diff --git a/tests/scripts/cleanup.py b/tests/scripts/cleanup.py new file mode 100755 index 00000000..9779de99 --- /dev/null +++ b/tests/scripts/cleanup.py @@ -0,0 +1,218 @@ +#!/usr/bin/env python3 +""" +Cleanup script for documentation agent tests. + +Deletes all Runpod resources matching the test prefix (doc_test_*). +Can be run manually or scheduled in CI to catch orphaned resources. + +Usage: + python cleanup.py # Dry run (list only) + python cleanup.py --delete # Actually delete resources + python cleanup.py --prefix my_ # Use custom prefix + +Requires: + RUNPOD_API_KEY environment variable +""" + +import argparse +import os +import sys +from typing import Any + +try: + import requests +except ImportError: + print("Error: requests library required. 
Install with: pip install requests") + sys.exit(1) + + +API_BASE = "https://rest.runpod.io/v1" +DEFAULT_PREFIX = "doc_test_" + + +def get_headers() -> dict: + """Get authorization headers.""" + api_key = os.environ.get("RUNPOD_API_KEY") + if not api_key: + print("Error: RUNPOD_API_KEY environment variable not set") + sys.exit(1) + return {"Authorization": f"Bearer {api_key}"} + + +def list_pods(prefix: str) -> list[dict[str, Any]]: + """List pods matching prefix.""" + resp = requests.get(f"{API_BASE}/pods", headers=get_headers()) + resp.raise_for_status() + pods = resp.json() + if isinstance(pods, dict): + pods = pods.get("pods", []) + return [p for p in pods if p.get("name", "").startswith(prefix)] + + +def list_endpoints(prefix: str) -> list[dict[str, Any]]: + """List serverless endpoints matching prefix.""" + resp = requests.get(f"{API_BASE}/endpoints", headers=get_headers()) + resp.raise_for_status() + endpoints = resp.json() + if isinstance(endpoints, dict): + endpoints = endpoints.get("endpoints", []) + return [e for e in endpoints if e.get("name", "").startswith(prefix)] + + +def list_templates(prefix: str) -> list[dict[str, Any]]: + """List templates matching prefix.""" + resp = requests.get(f"{API_BASE}/templates", headers=get_headers()) + resp.raise_for_status() + templates = resp.json() + if isinstance(templates, dict): + templates = templates.get("templates", []) + return [t for t in templates if t.get("name", "").startswith(prefix)] + + +def list_network_volumes(prefix: str) -> list[dict[str, Any]]: + """List network volumes matching prefix.""" + resp = requests.get(f"{API_BASE}/network-volumes", headers=get_headers()) + resp.raise_for_status() + volumes = resp.json() + if isinstance(volumes, dict): + volumes = volumes.get("networkVolumes", []) + return [v for v in volumes if v.get("name", "").startswith(prefix)] + + +def delete_pod(pod_id: str) -> bool: + """Delete a pod by ID.""" + resp = requests.delete(f"{API_BASE}/pods/{pod_id}", 
headers=get_headers()) + return resp.status_code == 200 + + +def delete_endpoint(endpoint_id: str) -> bool: + """Delete an endpoint by ID.""" + resp = requests.delete(f"{API_BASE}/endpoints/{endpoint_id}", headers=get_headers()) + return resp.status_code == 200 + + +def delete_template(template_id: str) -> bool: + """Delete a template by ID.""" + resp = requests.delete(f"{API_BASE}/templates/{template_id}", headers=get_headers()) + return resp.status_code == 200 + + +def delete_network_volume(volume_id: str) -> bool: + """Delete a network volume by ID.""" + resp = requests.delete( + f"{API_BASE}/network-volumes/{volume_id}", headers=get_headers() + ) + return resp.status_code == 200 + + +def main(): + parser = argparse.ArgumentParser( + description="Clean up test resources matching prefix" + ) + parser.add_argument( + "--delete", action="store_true", help="Actually delete (default: dry run)" + ) + parser.add_argument( + "--prefix", default=DEFAULT_PREFIX, help=f"Resource prefix (default: {DEFAULT_PREFIX})" + ) + args = parser.parse_args() + + prefix = args.prefix + dry_run = not args.delete + + if dry_run: + print(f"DRY RUN - Looking for resources matching '{prefix}*'\n") + else: + print(f"DELETING resources matching '{prefix}*'\n") + + # Track totals + found = {"pods": 0, "endpoints": 0, "templates": 0, "volumes": 0} + deleted = {"pods": 0, "endpoints": 0, "templates": 0, "volumes": 0} + + # Pods + print("Pods:") + pods = list_pods(prefix) + found["pods"] = len(pods) + if not pods: + print(" (none found)") + for pod in pods: + pod_id = pod.get("id") + name = pod.get("name") + if dry_run: + print(f" Would delete: {name} ({pod_id})") + else: + if delete_pod(pod_id): + print(f" Deleted: {name} ({pod_id})") + deleted["pods"] += 1 + else: + print(f" Failed to delete: {name} ({pod_id})") + + # Endpoints + print("\nEndpoints:") + endpoints = list_endpoints(prefix) + found["endpoints"] = len(endpoints) + if not endpoints: + print(" (none found)") + for endpoint in 
endpoints: + endpoint_id = endpoint.get("id") + name = endpoint.get("name") + if dry_run: + print(f" Would delete: {name} ({endpoint_id})") + else: + if delete_endpoint(endpoint_id): + print(f" Deleted: {name} ({endpoint_id})") + deleted["endpoints"] += 1 + else: + print(f" Failed to delete: {name} ({endpoint_id})") + + # Templates + print("\nTemplates:") + templates = list_templates(prefix) + found["templates"] = len(templates) + if not templates: + print(" (none found)") + for template in templates: + template_id = template.get("id") + name = template.get("name") + if dry_run: + print(f" Would delete: {name} ({template_id})") + else: + if delete_template(template_id): + print(f" Deleted: {name} ({template_id})") + deleted["templates"] += 1 + else: + print(f" Failed to delete: {name} ({template_id})") + + # Network Volumes + print("\nNetwork Volumes:") + volumes = list_network_volumes(prefix) + found["volumes"] = len(volumes) + if not volumes: + print(" (none found)") + for volume in volumes: + volume_id = volume.get("id") + name = volume.get("name") + if dry_run: + print(f" Would delete: {name} ({volume_id})") + else: + if delete_network_volume(volume_id): + print(f" Deleted: {name} ({volume_id})") + deleted["volumes"] += 1 + else: + print(f" Failed to delete: {name} ({volume_id})") + + # Summary + print("\n" + "=" * 40) + total_found = sum(found.values()) + total_deleted = sum(deleted.values()) + + if dry_run: + print(f"Found {total_found} resources matching '{prefix}*'") + if total_found > 0: + print("Run with --delete to remove them") + else: + print(f"Deleted {total_deleted}/{total_found} resources") + + +if __name__ == "__main__": + main() diff --git a/tests/scripts/report.py b/tests/scripts/report.py new file mode 100755 index 00000000..36422497 --- /dev/null +++ b/tests/scripts/report.py @@ -0,0 +1,143 @@ +#!/usr/bin/env python3 +""" +Report generator for documentation tests. + +Generates a report template with metadata pre-filled (timestamp, git info). 
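The four `list_*` helpers in the cleanup script above all reduce to the same step: fetch a collection, unwrap the wrapper key if present, and keep only names carrying the test prefix. A network-free sketch of that filtering step (the sample payload is fabricated; nothing here calls the Runpod API):

```python
from typing import Any

DEFAULT_PREFIX = "doc_test_"


def filter_by_prefix(
    resources: list[dict[str, Any]], prefix: str = DEFAULT_PREFIX
) -> list[dict[str, Any]]:
    """Keep resources whose name starts with the test prefix.

    Mirrors the list_* helpers above: entries without a "name" key
    are treated as non-matching instead of raising a KeyError.
    """
    return [r for r in resources if r.get("name", "").startswith(prefix)]


# Fabricated payload shaped like an unwrapped /v1/pods response.
pods = [
    {"id": "abc123", "name": "doc_test_pod_1"},
    {"id": "def456", "name": "production-api"},
    {"id": "ghi789"},  # no name key: skipped safely
]

matches = filter_by_prefix(pods)
print([p["id"] for p in matches])  # prints ['abc123']
```

Because the filter is pure, it can be unit-tested without an API key, unlike the request-bound versions in the script itself.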
+The agent fills in the remaining sections.
+
+Usage:
+    python report.py <test-id> <status> [--local]
+
+Arguments:
+    test-id: The test ID (e.g., pods-quickstart-terminal)
+    status: PASS, FAIL, or PARTIAL
+    --local: Mark as using local docs (default: published)
+
+Example:
+    python report.py pods-quickstart-terminal PASS --local
+"""
+
+import argparse
+import os
+import subprocess
+import sys
+from datetime import datetime
+from pathlib import Path
+
+
+def get_git_info() -> tuple[str, str]:
+    """Get current git SHA and branch."""
+    try:
+        sha = subprocess.check_output(
+            ["git", "rev-parse", "--short", "HEAD"],
+            stderr=subprocess.DEVNULL
+        ).decode().strip()
+    except Exception:
+        sha = "unknown"
+
+    try:
+        branch = subprocess.check_output(
+            ["git", "branch", "--show-current"],
+            stderr=subprocess.DEVNULL
+        ).decode().strip()
+    except Exception:
+        branch = "unknown"
+
+    return sha, branch
+
+
+def get_test_definition(test_id: str) -> tuple[str, str]:
+    """Look up test goal and expected outcome from TESTS.md."""
+    tests_file = Path(__file__).parent.parent / "TESTS.md"
+
+    if not tests_file.exists():
+        return "Unknown", "Unknown"
+
+    with open(tests_file) as f:
+        for line in f:
+            if line.startswith("|") and test_id in line:
+                parts = [p.strip() for p in line.split("|")]
+                if len(parts) >= 4 and parts[1] == test_id:
+                    return parts[2], parts[3]  # goal, expected outcome
+
+    return "Unknown", "Unknown"
+
+
+def generate_report(test_id: str, status: str, local: bool) -> str:
+    """Generate the report markdown."""
+    now = datetime.now()
+    timestamp = now.strftime("%Y-%m-%d %H:%M:%S")
+    sha, branch = get_git_info()
+    doc_source = "Local" if local else "Published"
+    goal, expected = get_test_definition(test_id)
+
+    return f"""# Test Report: {test_id}
+
+## Metadata
+| Field | Value |
+|-------|-------|
+| **Test ID** | {test_id} |
+| **Date** | {timestamp} |
+| **Git SHA** | {sha} |
+| **Git Branch** | {branch} |
+| **Doc Source** | {doc_source} |
+| **Status** | {status} |
+
+## Goal
+
+{goal} + +## Expected Outcome +{expected} + +## Actual Result + + +## Steps Taken + +1. +2. +3. + +## Documentation Gaps + + +## Suggestions + +""" + + +def main(): + parser = argparse.ArgumentParser(description="Generate test report template") + parser.add_argument("test_id", help="Test ID (e.g., pods-quickstart-terminal)") + parser.add_argument("status", choices=["PASS", "FAIL", "PARTIAL"], help="Test status") + parser.add_argument("--local", action="store_true", help="Mark as using local docs") + args = parser.parse_args() + + # Generate timestamp for filename + timestamp = datetime.now().strftime("%Y%m%d-%H%M%S") + filename = f"{args.test_id}-{timestamp}.md" + + # Generate report content + content = generate_report(args.test_id, args.status, args.local) + + # Save to both locations + repo_reports = Path(__file__).parent.parent / "reports" + archive_reports = Path.home() / "Dev" / "doc-tests" + + repo_reports.mkdir(exist_ok=True) + archive_reports.mkdir(exist_ok=True) + + repo_path = repo_reports / filename + archive_path = archive_reports / filename + + repo_path.write_text(content) + archive_path.write_text(content) + + print(f"Report template created:") + print(f" - {repo_path}") + print(f" - {archive_path}") + print(f"\nEdit the report to fill in: Actual Result, Steps Taken, Documentation Gaps, Suggestions") + + +if __name__ == "__main__": + main() diff --git a/tests/scripts/stats.py b/tests/scripts/stats.py new file mode 100755 index 00000000..95cd6c16 --- /dev/null +++ b/tests/scripts/stats.py @@ -0,0 +1,170 @@ +#!/usr/bin/env python3 +""" +Test statistics analyzer. + +Analyzes historical test reports to show pass rates and trends. 
+ +Usage: + python stats.py # Show overall stats + python stats.py --by-test # Group by test ID + python stats.py --recent 10 # Show last 10 reports + python stats.py --failures # Show only failures +""" + +import argparse +import re +from collections import defaultdict +from datetime import datetime +from pathlib import Path + + +def parse_report(path: Path) -> dict | None: + """Parse a report file and extract metadata.""" + try: + content = path.read_text() + + # Extract metadata from table + test_id_match = re.search(r"\*\*Test ID\*\*\s*\|\s*(\S+)", content) + date_match = re.search(r"\*\*Date\*\*\s*\|\s*(.+)", content) + status_match = re.search(r"\*\*Status\*\*\s*\|\s*(\S+)", content) + doc_source_match = re.search(r"\*\*Doc Source\*\*\s*\|\s*(\S+)", content) + git_sha_match = re.search(r"\*\*Git SHA\*\*\s*\|\s*(\S+)", content) + + if not all([test_id_match, status_match]): + return None + + return { + "file": path.name, + "test_id": test_id_match.group(1), + "date": date_match.group(1).strip() if date_match else "Unknown", + "status": status_match.group(1), + "doc_source": doc_source_match.group(1) if doc_source_match else "Unknown", + "git_sha": git_sha_match.group(1) if git_sha_match else "Unknown", + } + except Exception as e: + print(f"Warning: Could not parse {path}: {e}") + return None + + +def load_reports() -> list[dict]: + """Load all reports from the archive directory.""" + archive_dir = Path.home() / "Dev" / "doc-tests" + + if not archive_dir.exists(): + print(f"Archive directory not found: {archive_dir}") + return [] + + reports = [] + for path in sorted(archive_dir.glob("*.md")): + report = parse_report(path) + if report: + reports.append(report) + + return reports + + +def show_summary(reports: list[dict]): + """Show overall summary statistics.""" + if not reports: + print("No reports found.") + return + + total = len(reports) + passed = sum(1 for r in reports if r["status"] == "PASS") + failed = sum(1 for r in reports if r["status"] == "FAIL") 
+ partial = sum(1 for r in reports if r["status"] == "PARTIAL") + + pass_rate = (passed / total) * 100 if total > 0 else 0 + + print("=" * 50) + print("TEST SUMMARY") + print("=" * 50) + print(f"Total runs: {total}") + print(f"Passed: {passed} ({pass_rate:.1f}%)") + print(f"Failed: {failed}") + print(f"Partial: {partial}") + print("=" * 50) + + +def show_by_test(reports: list[dict]): + """Show statistics grouped by test ID.""" + if not reports: + print("No reports found.") + return + + by_test = defaultdict(list) + for r in reports: + by_test[r["test_id"]].append(r) + + print("=" * 60) + print(f"{'TEST ID':<35} {'RUNS':<6} {'PASS':<6} {'RATE':<8}") + print("=" * 60) + + for test_id in sorted(by_test.keys()): + runs = by_test[test_id] + total = len(runs) + passed = sum(1 for r in runs if r["status"] == "PASS") + rate = (passed / total) * 100 if total > 0 else 0 + print(f"{test_id:<35} {total:<6} {passed:<6} {rate:.0f}%") + + print("=" * 60) + + +def show_recent(reports: list[dict], count: int): + """Show most recent reports.""" + if not reports: + print("No reports found.") + return + + recent = reports[-count:] + + print("=" * 80) + print(f"{'DATE':<20} {'TEST ID':<30} {'STATUS':<10} {'SOURCE':<10}") + print("=" * 80) + + for r in reversed(recent): + print(f"{r['date']:<20} {r['test_id']:<30} {r['status']:<10} {r['doc_source']:<10}") + + print("=" * 80) + + +def show_failures(reports: list[dict]): + """Show only failed tests.""" + failures = [r for r in reports if r["status"] in ("FAIL", "PARTIAL")] + + if not failures: + print("No failures found!") + return + + print("=" * 80) + print("FAILURES AND PARTIAL PASSES") + print("=" * 80) + + for r in failures: + print(f"\n{r['test_id']} - {r['status']}") + print(f" Date: {r['date']}") + print(f" File: {r['file']}") + print(f" Git SHA: {r['git_sha']}") + + +def main(): + parser = argparse.ArgumentParser(description="Analyze test report statistics") + parser.add_argument("--by-test", action="store_true", help="Group by 
test ID") + parser.add_argument("--recent", type=int, metavar="N", help="Show last N reports") + parser.add_argument("--failures", action="store_true", help="Show only failures") + args = parser.parse_args() + + reports = load_reports() + + if args.by_test: + show_by_test(reports) + elif args.recent: + show_recent(reports, args.recent) + elif args.failures: + show_failures(reports) + else: + show_summary(reports) + + +if __name__ == "__main__": + main() From c0b46c328a2fc7c38ee1d171ece78a5fefe9aeb6 Mon Sep 17 00:00:00 2001 From: Mo King Date: Fri, 20 Mar 2026 13:01:35 -0400 Subject: [PATCH 6/8] Update styleguide with guidance on using cards for links --- .claude/style-guide.md | 14 ++++++++++++++ .cursor/rules/rp-styleguide.mdc | 14 ++++++++++++++ 2 files changed, 28 insertions(+) diff --git a/.claude/style-guide.md b/.claude/style-guide.md index fc584b87..00cbae5d 100644 --- a/.claude/style-guide.md +++ b/.claude/style-guide.md @@ -92,3 +92,17 @@ The `handler` function receives a job dictionary containing the input from the A - Use backticks for file paths: `serverless/workers/handler.py` - Use backticks for environment variables: `RUNPOD_API_KEY` - Use backticks for API endpoints: `/v2/endpoint_id/run` + +## Next Steps and Learn More Sections + +Use `CardGroup` with horizontal cards instead of bullet lists for "Next steps" and "Learn more" sections: + +```mdx + + + Brief description of the linked content. + + +``` + +Choose icons that match the content (e.g., `github` for repos, `terminal` for CLI, `book` for docs). diff --git a/.cursor/rules/rp-styleguide.mdc b/.cursor/rules/rp-styleguide.mdc index 8d9db909..f1704452 100644 --- a/.cursor/rules/rp-styleguide.mdc +++ b/.cursor/rules/rp-styleguide.mdc @@ -22,3 +22,17 @@ And number steps like this: "## Step 1: Create a widget" ... and so on. 
+ +### Next steps and learn more sections + +Use `CardGroup` with horizontal cards instead of bullet lists for "Next steps" and "Learn more" sections: + +```mdx + + + Brief description of the linked content. + + +``` + +Choose icons that match the content (e.g., `github` for repos, `terminal` for CLI, `book` for docs). From d6d0910e8b7a4792bd6e695ad907c827c0ee208a Mon Sep 17 00:00:00 2001 From: Mo King Date: Fri, 20 Mar 2026 13:02:49 -0400 Subject: [PATCH 7/8] remove testing improvement plan --- tests/IMPROVEMENT_PLAN.md | 146 -------------------------------------- 1 file changed, 146 deletions(-) delete mode 100644 tests/IMPROVEMENT_PLAN.md diff --git a/tests/IMPROVEMENT_PLAN.md b/tests/IMPROVEMENT_PLAN.md deleted file mode 100644 index 755725b8..00000000 --- a/tests/IMPROVEMENT_PLAN.md +++ /dev/null @@ -1,146 +0,0 @@ -# Testing Framework Improvement Plan - -Based on feedback from PR #561 review by @runpod-Henrik. - -## Immediate Fixes (Blockers) - -### 1. MCP tool name typo -**File:** `.claude/testing.md` line 31 -**Issue:** References `mcp__runpod-dops__search_runpod_documentation` but server is `runpod-docs` -**Fix:** Change to `mcp__runpod-docs__search_runpod_documentation` -**Status:** [x] DONE - -### 2. Test table format mismatch -**Files:** `tests/README.md`, `.claude/testing.md`, `tests/TESTS.md` -**Issue:** Docs say tables have `ID | Goal | Cleanup` but actual tables have `ID | Goal | Difficulty` -**Options:** -- A) Add Cleanup column back to tables (more explicit per-test) -- B) Update docs to say cleanup rules are global (simpler, current reality) -**Recommendation:** Option B - cleanup rules ARE global (by resource type), not per-test -**Status:** [x] DONE - Updated docs to describe actual format with global cleanup rules - -### 3. 
Port limit accuracy -**File:** `runpodctl/reference/runpodctl-create-pod.mdx` -**Issue:** Changed from "1 HTTP + 1 TCP" to "10 HTTP + multiple TCP" - needs verification -**Action:** Verify actual runpodctl behavior before merging -**Status:** [x] VERIFIED - `pods/configuration/expose-ports.mdx` confirms "Expose HTTP Ports (Max 10)" - -## Nits - -### 4. Missing trailing newline in .gitignore -**Status:** [x] DONE - -### 5. Double `---` separator in TESTS.md -**Status:** [x] DONE - ---- - -## Structural Improvements (Future Work) - -Henrik correctly identified that this is currently a **catalog**, not a **framework**. Here's a plan to evolve it: - -### Phase 1: Cleanup Safety Net (Quick Win) -**Status:** ✅ DONE - -Created `tests/scripts/cleanup.py`: -- Lists and deletes resources matching `doc_test_*` prefix -- Supports dry-run mode (default) and `--delete` flag -- Handles pods, endpoints, templates, and network volumes -- Can be run standalone or in CI - -Also updated `.claude/testing.md` with cleanup instructions for Claude Code. - -### Phase 2: Smoke Test Tier -**Status:** ✅ DONE - -Added 12 smoke tests that don't require GPU deploys: -- SDK installs: `sdk-python-install`, `sdk-js-install` -- CLI: `cli-install`, `cli-configure`, `cli-list-pods` -- Read-only: `template-list`, `serverless-metrics` -- Config: `api-key-create`, `pods-add-ssh-key` -- Public endpoints: `public-flux`, `public-qwen`, `public-video` - -Created separate "Smoke Tests" section in TESTS.md. -Updated `.claude/testing.md` with test tier instructions. - -### Phase 3: Success Criteria -**Status:** ✅ DONE - -Added "Expected Outcome" column to all test tables with objective, measurable criteria: -- `Pod status is RUNNING` -- `Endpoint responds to /health` -- `SSH session established` -- etc. - -Now each test has a clear PASS/FAIL condition. - -### Phase 4: Automation Layer -**Status:** ⏸️ DEFERRED - -Requires Claude Code in CI or custom API runner. Skipped for now - tests run manually. 
- -Options for future: -1. Claude Code headless mode (when available) -2. Custom runner script with Anthropic API -3. GitHub Action with Claude CLI - -### Phase 5: Results Tracking -**Status:** ✅ DONE - -- Reports saved to **two locations**: - - `tests/reports/` (gitignored, in repo) - - `~/Dev/doc-tests/` (persistent local archive) -- Enhanced report template with: - - Git SHA and branch - - Structured metadata table - - Steps taken section - - Actual vs expected results -- Instructions for comparing runs over time - -### Phase 6: Convenience Tooling (Added) -**Status:** ✅ DONE - -Based on trial run feedback, added: - -1. **`/test` command** (`.claude/commands/test.md`) - - Loads test definition and execution rules - - Supports `local` flag for local docs mode - - Supports `smoke` for running smoke tests - -2. **`report.py` script** (`tests/scripts/report.py`) - - Auto-generates report template with metadata - - Pulls goal and expected outcome from TESTS.md - - Saves to both report locations - -3. **`stats.py` script** (`tests/scripts/stats.py`) - - Analyzes historical test reports - - Shows pass rates overall and by test - - Lists recent runs and failures - -### Phase 7: GPU Fallback Guidance (Added) -**Status:** ✅ DONE - -Based on flash-quickstart test failure (RTX 4090 unavailable), added: - -1. **Queue timeout thresholds** - When to wait vs try fallback -2. **Fallback GPU order** - L4 → A4000 → RTX 3090 -3. **Cloud type fallbacks** - Secure → Community -4. **Status marking guidance** - PASS/PARTIAL/FAIL based on GPU used - ---- - -## Discussion Points - -1. **How often should full suite run?** Weekly? Monthly? On-demand only? -2. **Budget for test runs?** ~$5-10 per full run was mentioned -3. **Who reviews test reports?** Auto-file issues for failures? -4. **Should we version the test definitions?** Track which tests existed at which doc version? - ---- - -## Next Steps - -1. ~~Fix blockers (#1, #2, #3, #4, #5) immediately~~ ✅ All complete -2. 
Merge PR with fixes -3. Create issues for Phase 1-5 improvements -4. Discuss automation priorities with team From b6e21e1c9d01bf0ac9b4ec6556fae199c16c2249 Mon Sep 17 00:00:00 2001 From: Mo King Date: Fri, 20 Mar 2026 18:47:50 -0400 Subject: [PATCH 8/8] Add test batches --- .claude/commands/test.md | 120 ++++++++++++++++++++++---------- .claude/style-guide.md | 2 + .claude/testing.md | 75 ++++++++++++++++---- .cursor/rules/rp-styleguide.mdc | 4 +- CLAUDE.md | 4 +- tests/TESTS.md | 15 ++-- 6 files changed, 158 insertions(+), 62 deletions(-) diff --git a/.claude/commands/test.md b/.claude/commands/test.md index b29a703f..813fdb90 100644 --- a/.claude/commands/test.md +++ b/.claude/commands/test.md @@ -5,40 +5,94 @@ Run a test from the testing framework to validate documentation quality. ## Usage ``` -/test -/test local -/test smoke +/test # Run single test +/test local # Run with local docs +/test # Run all tests in category +/test local # Run category with local docs +/test smoke # Run smoke tests only ``` ## Arguments -- ``: The test ID from `tests/TESTS.md` (e.g., `pods-quickstart-terminal`, `flash-quickstart`) +- ``: Single test ID (e.g., `pods-quickstart-terminal`, `flash-quickstart`) +- ``: Category name to run all tests in that section - `local`: (Optional) Use local MDX files instead of published docs - `smoke`: Run all smoke tests -## Execution Rules - -When running a test, you MUST follow these rules: - -1. **Read the test definition** from `tests/TESTS.md` - find the row matching the test ID -2. 
**Do NOT use prior knowledge** - only use Runpod docs (published MCP or local MDX) +## Categories + +| Category | Tests | Description | +|----------|-------|-------------| +| `smoke` | 12 | Fast tests, no GPU deploys | +| `flash` | 13 | Flash SDK tests | +| `serverless` | 20 | Serverless endpoint tests | +| `vllm` | 6 | vLLM deployment tests | +| `pods` | 11 | Pod management tests | +| `storage` | 11 | Network volume tests | +| `templates` | 6 | Template tests | +| `clusters` | 4 | Instant Cluster tests | +| `sdk` | 8 | SDK and API tests | +| `cli` | 6 | runpodctl tests | +| `integrations` | 4 | Third-party integrations | +| `public` | 3 | Public endpoint tests | +| `tutorials` | 9 | End-to-end tutorials | + +## Single Test Execution + +When running a single test: + +1. **Read the test definition** from `tests/TESTS.md` +2. **Do NOT use prior knowledge** - only use Runpod docs 3. **Doc source mode**: - - Default: Use `mcp__runpod-docs__search_runpod_documentation` for published docs - - If `local` specified: Search and read `.mdx` files in this repository -4. **Resource naming**: All created resources MUST use `doc_test_` prefix -5. **Attempt the goal** using available tools (MCP for API, Bash for CLI) -6. **Handle GPU availability** - see GPU Fallback section below + - Default: Use `mcp__runpod-docs__search_runpod_documentation` + - If `local`: Search and read `.mdx` files in this repository +4. **Resource naming**: All resources MUST use `doc_test_` prefix +5. **Attempt the goal** using available tools +6. **Handle GPU availability** - see GPU Fallback section 7. **Verify the Expected Outcome** from the test definition -8. **Clean up** all `doc_test_*` resources after the test -9. **Generate report** using the helper script: - ```bash - python tests/scripts/report.py [--local] - ``` -10. **Complete the report** by filling in the generated template +8. **Clean up** all `doc_test_*` resources +9. **Generate report**: `python tests/scripts/report.py [--local]` +10. 
**Complete the report** with actual results -## GPU Fallback Guidance +## Batch Execution + +When running a category (e.g., `/test serverless`): + +1. **Parse category** - Identify all test IDs in that section of TESTS.md +2. **Show test list** - Display tests to be run and ask for confirmation +3. **Run sequentially** - Execute each test following single test rules +4. **Track results** - Record PASS/FAIL/PARTIAL for each +5. **Clean up between tests** - Delete all `doc_test_*` resources before next test +6. **Generate summary** - Create batch summary report at end + +### Batch Summary Format + +After running all tests in a batch, output: + +```markdown +## Batch Summary: + +| Test ID | Status | Notes | +|---------|--------|-------| +| test-1 | PASS | | +| test-2 | FAIL | Missing docs for X | +| test-3 | PARTIAL | Used fallback GPU | -GPU availability varies. When tests require GPU resources: +**Results:** X passed, Y failed, Z partial out of N tests +**Doc Source:** Published / Local +**Date:** YYYY-MM-DD HH:MM +``` + +Save the summary to: +- `tests/reports/batch--.md` +- `~/Dev/doc-tests/batch--.md` + +### Batch Options + +- **Stop on failure**: By default, continue through all tests. User can say "stop on first failure" +- **Skip cleanup**: User can say "skip cleanup between tests" for speed (not recommended) + +## GPU Fallback Guidance | Queue Wait | Action | |------------|--------| @@ -53,21 +107,11 @@ GPU availability varies. When tests require GPU resources: - PARTIAL: Completed with fallback GPU (doc improvement needed) - FAIL: Failed even with fallbacks -## Report Locations - -Reports are saved to both: -- `tests/reports/-.md` (gitignored) -- `~/Dev/doc-tests/-.md` (persistent archive) - -## Example +## Examples ``` -/test pods-quickstart-terminal local +/test pods-quickstart-terminal # Single test +/test flash local # All Flash tests with local docs +/test serverless # All Serverless tests +/test smoke # Quick validation ``` - -This will: -1. 
Load the test definition for `pods-quickstart-terminal` -2. Use local MDX files (not published docs) -3. Attempt: "Complete the Pod quickstart using only the terminal" -4. Verify: "Code runs on Pod via SSH" -5. Clean up and generate report diff --git a/.claude/style-guide.md b/.claude/style-guide.md index 00cbae5d..b0b0e776 100644 --- a/.claude/style-guide.md +++ b/.claude/style-guide.md @@ -13,6 +13,7 @@ Follow the Runpod style guide (`.cursor/rules/rp-styleguide.mdc`) and Google Dev - Secure Cloud - Community Cloud - Flash +- Public Endpoint ### Generic Terms (lowercase) - endpoint @@ -23,6 +24,7 @@ Follow the Runpod style guide (`.cursor/rules/rp-styleguide.mdc`) and Google Dev - fine-tune - network volume - data center +- repo ### Headings Always use **sentence case** for headings and titles: diff --git a/.claude/testing.md b/.claude/testing.md index b03bbbe8..f3f93a9f 100644 --- a/.claude/testing.md +++ b/.claude/testing.md @@ -8,24 +8,35 @@ Tests should be **hard to pass**. They simulate a user typing a simple request w ## Running Tests -Use the `/test` command or natural language: +Use the `/test` command: ``` -/test pods-quickstart-terminal # Command form -Run the flash-quickstart test # Natural language +/test # Single test with published docs +/test local # Single test with local docs +/test # All tests in category +/test local # Category with local docs +/test smoke # Smoke tests only ``` -Use the `/test` command to run tests: - -``` -/test pods-quickstart-terminal # Run with published docs -/test pods-quickstart-terminal local # Run with local MDX files -/test smoke # Run all smoke tests -``` - -The `/test` command loads the test definition and reminds you of the execution rules. 
- -## Test Execution Rules +### Categories + +| Category | Description | +|----------|-------------| +| `smoke` | Fast tests, no GPU deploys | +| `flash` | Flash SDK | +| `serverless` | Serverless endpoints | +| `vllm` | vLLM deployment | +| `pods` | Pod management | +| `storage` | Network volumes | +| `templates` | Template management | +| `clusters` | Instant Clusters | +| `sdk` | SDKs and APIs | +| `cli` | runpodctl | +| `integrations` | Third-party integrations | +| `public` | Public endpoints | +| `tutorials` | End-to-end tutorials | + +## Single Test Execution 1. Read the test definition from `tests/TESTS.md`. 2. **Do NOT use prior knowledge** - only use Runpod docs. @@ -39,6 +50,42 @@ The `/test` command loads the test definition and reminds you of the execution r ``` 8. Fill in the generated report template with actual results. +## Batch Execution + +When running a category (e.g., `/test serverless` or `/test flash local`): + +1. **Parse category** - Identify all test IDs in that section of `tests/TESTS.md` +2. **Show test list** - Display tests to be run and ask for confirmation +3. **Run sequentially** - Execute each test following single test rules +4. **Track results** - Record PASS/FAIL/PARTIAL for each test +5. **Clean up between tests** - Delete all `doc_test_*` resources before starting next test +6. **Generate summary** - Create batch summary report at end + +### Batch Summary Format + +```markdown +## Batch Summary: + +| Test ID | Status | Notes | +|---------|--------|-------| +| test-1 | PASS | | +| test-2 | FAIL | Missing docs for X | +| test-3 | PARTIAL | Used fallback GPU | + +**Results:** X passed, Y failed, Z partial out of N tests +**Doc Source:** Published / Local +**Date:** YYYY-MM-DD HH:MM +``` + +Save batch summaries to: +- `tests/reports/batch--.md` +- `~/Dev/doc-tests/batch--.md` + +### Batch Options + +- **Stop on failure**: By default, continue through all tests. Say "stop on first failure" to halt early. 
+- **Skip tests**: Say "skip test-id" during batch to skip specific tests. + ## GPU Fallback Guidance GPU availability varies by type and time. When a test requires GPU resources: diff --git a/.cursor/rules/rp-styleguide.mdc b/.cursor/rules/rp-styleguide.mdc index f1704452..806f6b79 100644 --- a/.cursor/rules/rp-styleguide.mdc +++ b/.cursor/rules/rp-styleguide.mdc @@ -5,8 +5,8 @@ alwaysApply: true --- Always use sentence case for headings and titles. -These are proper nouns: Runpod, Pods, Serverless, Hub, Instant Clusters, Secure Cloud, Community Cloud, Flash. -These are generic terms: endpoint, worker, cluster, template, handler, fine-tune, network volume. +These are proper nouns: Runpod, Pods, Serverless, Hub, Instant Clusters, Secure Cloud, Community Cloud, Flash, Public Endpoint. +These are generic terms: endpoint, worker, cluster, template, handler, fine-tune, network volume, data center, repo. Prefer using paragraphs to bullet points unless directly asked. When using bullet points, end each line with a period. diff --git a/CLAUDE.md b/CLAUDE.md index 47dd1de4..ab419378 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -38,6 +38,6 @@ Examples of things worth capturing: ## Terminology Quick Reference -**Capitalize:** Runpod, Pods, Serverless, Hub, Instant Clusters, Flash, Secure Cloud, Community Cloud +**Capitalize:** Runpod, Pods, Serverless, Hub, Instant Clusters, Flash, Secure Cloud, Community Cloud, Public Endpoint -**Lowercase:** endpoint, worker, template, handler, network volume, data center, cluster, fine-tune +**Lowercase:** endpoint, worker, template, handler, network volume, data center, cluster, fine-tune, repo diff --git a/tests/TESTS.md b/tests/TESTS.md index 0a3b606f..9ce2ac75 100644 --- a/tests/TESTS.md +++ b/tests/TESTS.md @@ -4,18 +4,21 @@ Minimal test definitions that simulate real user prompts. 
Tests are intentionall ## How to Run -In Claude Code, use natural language: +Use the `/test` command: ``` -Run the flash-quickstart test +/test flash-quickstart # Single test +/test serverless # All serverless tests +/test pods local # All pod tests with local docs +/test smoke # Smoke tests only ``` -``` -Run all vLLM tests -``` +Or natural language: ``` -Run smoke tests +Run the flash-quickstart test +Run all vLLM tests +Run smoke tests using local docs ``` ### Doc Source Modes
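The per-test pass rate that `stats.py --by-test` reports reduces to a short aggregation. A standalone sketch over fabricated report records (statuses follow the PASS/FAIL/PARTIAL convention used throughout; FAIL and PARTIAL both count against the rate, matching how `show_by_test` counts only PASS):

```python
from collections import defaultdict


def pass_rates(reports: list[dict[str, str]]) -> dict[str, float]:
    """Percentage of PASS runs per test ID."""
    runs: dict[str, list[str]] = defaultdict(list)
    for report in reports:
        runs[report["test_id"]].append(report["status"])
    return {
        test_id: 100.0 * statuses.count("PASS") / len(statuses)
        for test_id, statuses in runs.items()
    }


# Fabricated history: two runs of one test, one run of another.
history = [
    {"test_id": "flash-quickstart", "status": "PASS"},
    {"test_id": "flash-quickstart", "status": "FAIL"},
    {"test_id": "pods-quickstart-terminal", "status": "PASS"},
]

print(pass_rates(history))
# prints {'flash-quickstart': 50.0, 'pods-quickstart-terminal': 100.0}
```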