Add test batches

muhsinking · muhsinking · commit b6e21e1c9d01 · 2026-03-20T18:47:50.000-04:00
diff --git a/.claude/commands/test.md b/.claude/commands/test.md
@@ -5,40 +5,94 @@ Run a test from the testing framework to validate documentation quality.
 ## Usage
 
 ```
-/test <test-id>
-/test <test-id> local
-/test smoke
+/test <test-id>              # Run single test
+/test <test-id> local        # Run with local docs
+/test <category>             # Run all tests in category
+/test <category> local       # Run category with local docs
+/test smoke                  # Run smoke tests only
 ```
 
 ## Arguments
 
-- `<test-id>`: The test ID from `tests/TESTS.md` (e.g., `pods-quickstart-terminal`, `flash-quickstart`)
+- `<test-id>`: Single test ID (e.g., `pods-quickstart-terminal`, `flash-quickstart`)
+- `<category>`: Category name; runs every test in that category's section of `tests/TESTS.md`
 - `local`: (Optional) Use local MDX files instead of published docs
 - `smoke`: Run all smoke tests
 
-## Execution Rules
-
-When running a test, you MUST follow these rules:
-
-1. **Read the test definition** from `tests/TESTS.md` - find the row matching the test ID
-2. **Do NOT use prior knowledge** - only use Runpod docs (published MCP or local MDX)
+## Categories
+
+| Category | Tests | Description |
+|----------|-------|-------------|
+| `smoke` | 12 | Fast tests, no GPU deploys |
+| `flash` | 13 | Flash SDK tests |
+| `serverless` | 20 | Serverless endpoint tests |
+| `vllm` | 6 | vLLM deployment tests |
+| `pods` | 11 | Pod management tests |
+| `storage` | 11 | Network volume tests |
+| `templates` | 6 | Template tests |
+| `clusters` | 4 | Instant Cluster tests |
+| `sdk` | 8 | SDK and API tests |
+| `cli` | 6 | runpodctl tests |
+| `integrations` | 4 | Third-party integrations |
+| `public` | 3 | Public Endpoint tests |
+| `tutorials` | 9 | End-to-end tutorials |
+
+## Single Test Execution
+
+When running a single test:
+
+1. **Read the test definition** from `tests/TESTS.md`
+2. **Do NOT use prior knowledge** - only use Runpod docs
 3. **Doc source mode**:
-   - Default: Use `mcp__runpod-docs__search_runpod_documentation` for published docs
-   - If `local` specified: Search and read `.mdx` files in this repository
-4. **Resource naming**: All created resources MUST use `doc_test_` prefix
-5. **Attempt the goal** using available tools (MCP for API, Bash for CLI)
-6. **Handle GPU availability** - see GPU Fallback section below
+   - Default: Use `mcp__runpod-docs__search_runpod_documentation`
+   - If `local`: Search and read `.mdx` files in this repository
+4. **Resource naming**: All resources MUST use `doc_test_` prefix
+5. **Attempt the goal** using available tools
+6. **Handle GPU availability** - see GPU Fallback section
 7. **Verify the Expected Outcome** from the test definition
-8. **Clean up** all `doc_test_*` resources after the test
-9. **Generate report** using the helper script:
-   ```bash
-   python tests/scripts/report.py <test-id> <PASS|FAIL|PARTIAL> [--local]
-   ```
-10. **Complete the report** by filling in the generated template
+8. **Clean up** all `doc_test_*` resources
+9. **Generate report**: `python tests/scripts/report.py <test-id> <PASS|FAIL|PARTIAL> [--local]`
+10. **Complete the report** with actual results
 
-## GPU Fallback Guidance
+## Batch Execution
+
+When running a category (e.g., `/test serverless`):
+
+1. **Parse category** - Identify all test IDs in the category's section of `tests/TESTS.md`
+2. **Show test list** - Display the tests to be run and ask for confirmation
+3. **Run sequentially** - Execute each test following the single test rules
+4. **Track results** - Record PASS/FAIL/PARTIAL for each test
+5. **Clean up between tests** - Delete all `doc_test_*` resources before starting the next test
+6. **Generate summary** - Create a batch summary report at the end
+
+### Batch Summary Format
+
+After running all tests in a batch, output:
+
+```markdown
+## Batch Summary: <category>
+
+| Test ID | Status | Notes |
+|---------|--------|-------|
+| test-1 | PASS | |
+| test-2 | FAIL | Missing docs for X |
+| test-3 | PARTIAL | Used fallback GPU |
 
-GPU availability varies. When tests require GPU resources:
+**Results:** X passed, Y failed, Z partial out of N tests
+**Doc Source:** Published / Local
+**Date:** YYYY-MM-DD HH:MM
+```
+
+Save the summary to:
+- `tests/reports/batch-<category>-<timestamp>.md`
+- `~/Dev/doc-tests/batch-<category>-<timestamp>.md`
+
+### Batch Options
+
+- **Stop on failure**: By default, continue through all tests. The user can say "stop on first failure" to halt early.
+- **Skip cleanup**: The user can say "skip cleanup between tests" to run faster (not recommended).
+
+## GPU Fallback Guidance
 
 | Queue Wait | Action |
 |------------|--------|
@@ -53,21 +107,11 @@ GPU availability varies. When tests require GPU resources:
 - PARTIAL: Completed with fallback GPU (doc improvement needed)
 - FAIL: Failed even with fallbacks
 
-## Report Locations
-
-Reports are saved to both:
-- `tests/reports/<test-id>-<timestamp>.md` (gitignored)
-- `~/Dev/doc-tests/<test-id>-<timestamp>.md` (persistent archive)
-
-## Example
+## Examples
 
 ```
-/test pods-quickstart-terminal local
+/test pods-quickstart-terminal       # Single test
+/test flash local                    # All Flash tests with local docs
+/test serverless                     # All Serverless tests
+/test smoke                          # Quick validation
 ```
-
-This will:
-1. Load the test definition for `pods-quickstart-terminal`
-2. Use local MDX files (not published docs)
-3. Attempt: "Complete the Pod quickstart using only the terminal"
-4. Verify: "Code runs on Pod via SSH"
-5. Clean up and generate report
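The batch summary format that this change adds to `.claude/commands/test.md` can be sketched in Python. This is a minimal illustration under stated assumptions, not part of the commit: `render_batch_summary` and `summary_filename` are hypothetical helpers (the diff only shows `tests/scripts/report.py` being used for single-test reports), results are assumed to arrive as `(test_id, status, notes)` tuples, and the `YYYYMMDD-HHMMSS` timestamp in the filename is an assumption, since the docs only specify `<timestamp>`.

```python
"""Hypothetical sketch: render a /test batch summary in the documented format."""
from collections import Counter
from datetime import datetime

# Status values the test framework reports for each test.
STATUSES = ("PASS", "FAIL", "PARTIAL")


def render_batch_summary(category, results, local=False, now=None):
    """Return the batch summary as Markdown.

    results: (test_id, status, notes) tuples, in the order the tests ran.
    """
    now = now or datetime.now()
    counts = Counter(status for _, status, _ in results)
    # Reject anything outside PASS/FAIL/PARTIAL so typos don't skew totals.
    unknown = set(counts) - set(STATUSES)
    if unknown:
        raise ValueError(f"unexpected status values: {sorted(unknown)}")

    lines = [
        f"## Batch Summary: {category}",
        "",
        "| Test ID | Status | Notes |",
        "|---------|--------|-------|",
    ]
    # One table row per test, in run order.
    lines += [f"| {tid} | {status} | {notes} |" for tid, status, notes in results]
    lines += [
        "",
        f"**Results:** {counts['PASS']} passed, {counts['FAIL']} failed, "
        f"{counts['PARTIAL']} partial out of {len(results)} tests",
        f"**Doc Source:** {'Local' if local else 'Published'}",
        f"**Date:** {now:%Y-%m-%d %H:%M}",
    ]
    return "\n".join(lines) + "\n"


def summary_filename(category, now):
    # Assumed timestamp format; the docs only say <timestamp>.
    return f"batch-{category}-{now:%Y%m%d-%H%M%S}.md"


if __name__ == "__main__":
    when = datetime(2026, 3, 20, 18, 47)
    results = [
        ("test-1", "PASS", ""),
        ("test-2", "FAIL", "Missing docs for X"),
        ("test-3", "PARTIAL", "Used fallback GPU"),
    ]
    print(render_batch_summary("serverless", results, now=when))
    print(summary_filename("serverless", when))
```

Counting with `Counter` keeps the table in run order while still producing the totals line, and the same rendered text could then be written to both `tests/reports/` and `~/Dev/doc-tests/`.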
diff --git a/.claude/style-guide.md b/.claude/style-guide.md
@@ -13,6 +13,7 @@ Follow the Runpod style guide (`.cursor/rules/rp-styleguide.mdc`) and Google Dev
 - Secure Cloud
 - Community Cloud
 - Flash
+- Public Endpoint
 
 ### Generic Terms (lowercase)
 - endpoint
@@ -23,6 +24,7 @@ Follow the Runpod style guide (`.cursor/rules/rp-styleguide.mdc`) and Google Dev
 - fine-tune
 - network volume
 - data center
+- repo
 
 ### Headings
 Always use **sentence case** for headings and titles:
diff --git a/.claude/testing.md b/.claude/testing.md
@@ -8,24 +8,35 @@ Tests should be **hard to pass**. They simulate a user typing a simple request w
 
 ## Running Tests
 
-Use the `/test` command or natural language:
+Use the `/test` command:
 
 ```
-/test pods-quickstart-terminal        # Command form
-Run the flash-quickstart test         # Natural language
+/test <test-id>              # Single test with published docs
+/test <test-id> local        # Single test with local docs
+/test <category>             # All tests in category
+/test <category> local       # Category with local docs
+/test smoke                  # Smoke tests only
 ```
 
-Use the `/test` command to run tests:
-
-```
-/test pods-quickstart-terminal        # Run with published docs
-/test pods-quickstart-terminal local  # Run with local MDX files
-/test smoke                           # Run all smoke tests
-```
-
-The `/test` command loads the test definition and reminds you of the execution rules.
-
-## Test Execution Rules
+### Categories
+
+| Category | Description |
+|----------|-------------|
+| `smoke` | Fast tests, no GPU deploys |
+| `flash` | Flash SDK |
+| `serverless` | Serverless endpoints |
+| `vllm` | vLLM deployment |
+| `pods` | Pod management |
+| `storage` | Network volumes |
+| `templates` | Template management |
+| `clusters` | Instant Clusters |
+| `sdk` | SDKs and APIs |
+| `cli` | runpodctl |
+| `integrations` | Third-party integrations |
+| `public` | Public Endpoints |
+| `tutorials` | End-to-end tutorials |
+
+## Single Test Execution
 
 1. Read the test definition from `tests/TESTS.md`.
 2. **Do NOT use prior knowledge** - only use Runpod docs.
@@ -39,6 +50,42 @@ The `/test` command loads the test definition and reminds you of the execution r
    ```
 8. Fill in the generated report template with actual results.
 
+## Batch Execution
+
+When running a category (e.g., `/test serverless` or `/test flash local`):
+
+1. **Parse category** - Identify all test IDs in the category's section of `tests/TESTS.md`
+2. **Show test list** - Display the tests to be run and ask for confirmation
+3. **Run sequentially** - Execute each test following the single test rules
+4. **Track results** - Record PASS/FAIL/PARTIAL for each test
+5. **Clean up between tests** - Delete all `doc_test_*` resources before starting the next test
+6. **Generate summary** - Create a batch summary report at the end
+
+### Batch Summary Format
+
+```markdown
+## Batch Summary: <category>
+
+| Test ID | Status | Notes |
+|---------|--------|-------|
+| test-1 | PASS | |
+| test-2 | FAIL | Missing docs for X |
+| test-3 | PARTIAL | Used fallback GPU |
+
+**Results:** X passed, Y failed, Z partial out of N tests
+**Doc Source:** Published / Local
+**Date:** YYYY-MM-DD HH:MM
+```
+
+Save batch summaries to:
+- `tests/reports/batch-<category>-<timestamp>.md`
+- `~/Dev/doc-tests/batch-<category>-<timestamp>.md`
+
+### Batch Options
+
+- **Stop on failure**: By default, continue through all tests. Say "stop on first failure" to halt early.
+- **Skip tests**: Say "skip <test-id>" during a batch to skip a specific test.
+
 ## GPU Fallback Guidance
 
 GPU availability varies by type and time. When a test requires GPU resources:
diff --git a/.cursor/rules/rp-styleguide.mdc b/.cursor/rules/rp-styleguide.mdc
@@ -5,8 +5,8 @@ alwaysApply: true
 ---
 
 Always use sentence case for headings and titles.
-These are proper nouns: Runpod, Pods, Serverless, Hub, Instant Clusters, Secure Cloud, Community Cloud, Flash.
-These are generic terms: endpoint, worker, cluster, template, handler, fine-tune, network volume.
+These are proper nouns: Runpod, Pods, Serverless, Hub, Instant Clusters, Secure Cloud, Community Cloud, Flash, Public Endpoint.
+These are generic terms: endpoint, worker, cluster, template, handler, fine-tune, network volume, data center, repo.
 
 Prefer using paragraphs to bullet points unless directly asked.
 When using bullet points, end each line with a period.
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -38,6 +38,6 @@ Examples of things worth capturing:
 
 ## Terminology Quick Reference
 
-**Capitalize:** Runpod, Pods, Serverless, Hub, Instant Clusters, Flash, Secure Cloud, Community Cloud
+**Capitalize:** Runpod, Pods, Serverless, Hub, Instant Clusters, Flash, Secure Cloud, Community Cloud, Public Endpoint
 
-**Lowercase:** endpoint, worker, template, handler, network volume, data center, cluster, fine-tune
+**Lowercase:** endpoint, worker, template, handler, network volume, data center, cluster, fine-tune, repo
diff --git a/tests/TESTS.md b/tests/TESTS.md
@@ -4,18 +4,21 @@ Minimal test definitions that simulate real user prompts. Tests are intentionall
 
 ## How to Run
 
-In Claude Code, use natural language:
+Use the `/test` command:
 
 ```
-Run the flash-quickstart test
+/test flash-quickstart           # Single test
+/test serverless                 # All Serverless tests
+/test pods local                 # All Pod tests with local docs
+/test smoke                      # Smoke tests only
 ```
 
-```
-Run all vLLM tests
-```
+Or use natural language:
 
 ```
-Run smoke tests
+Run the flash-quickstart test
+Run all vLLM tests
+Run smoke tests using local docs
 ```
 
 ### Doc Source Modes