Commit b6e21e1

Add test batches
1 parent d6d0910 commit b6e21e1

6 files changed

Lines changed: 158 additions & 62 deletions

.claude/commands/test.md

Lines changed: 82 additions & 38 deletions
@@ -5,40 +5,94 @@ Run a test from the testing framework to validate documentation quality.
 ## Usage
 
 ```
-/test <test-id>
-/test <test-id> local
-/test smoke
+/test <test-id> # Run single test
+/test <test-id> local # Run with local docs
+/test <category> # Run all tests in category
+/test <category> local # Run category with local docs
+/test smoke # Run smoke tests only
 ```
 
 ## Arguments
 
-- `<test-id>`: The test ID from `tests/TESTS.md` (e.g., `pods-quickstart-terminal`, `flash-quickstart`)
+- `<test-id>`: Single test ID (e.g., `pods-quickstart-terminal`, `flash-quickstart`)
+- `<category>`: Category name to run all tests in that section
 - `local`: (Optional) Use local MDX files instead of published docs
 - `smoke`: Run all smoke tests
 
-## Execution Rules
-
-When running a test, you MUST follow these rules:
-
-1. **Read the test definition** from `tests/TESTS.md` - find the row matching the test ID
-2. **Do NOT use prior knowledge** - only use Runpod docs (published MCP or local MDX)
+## Categories
+
+| Category | Tests | Description |
+|----------|-------|-------------|
+| `smoke` | 12 | Fast tests, no GPU deploys |
+| `flash` | 13 | Flash SDK tests |
+| `serverless` | 20 | Serverless endpoint tests |
+| `vllm` | 6 | vLLM deployment tests |
+| `pods` | 11 | Pod management tests |
+| `storage` | 11 | Network volume tests |
+| `templates` | 6 | Template tests |
+| `clusters` | 4 | Instant Cluster tests |
+| `sdk` | 8 | SDK and API tests |
+| `cli` | 6 | runpodctl tests |
+| `integrations` | 4 | Third-party integrations |
+| `public` | 3 | Public endpoint tests |
+| `tutorials` | 9 | End-to-end tutorials |
+
+## Single Test Execution
+
+When running a single test:
+
+1. **Read the test definition** from `tests/TESTS.md`
+2. **Do NOT use prior knowledge** - only use Runpod docs
 3. **Doc source mode**:
-   - Default: Use `mcp__runpod-docs__search_runpod_documentation` for published docs
-   - If `local` specified: Search and read `.mdx` files in this repository
-4. **Resource naming**: All created resources MUST use `doc_test_` prefix
-5. **Attempt the goal** using available tools (MCP for API, Bash for CLI)
-6. **Handle GPU availability** - see GPU Fallback section below
+   - Default: Use `mcp__runpod-docs__search_runpod_documentation`
+   - If `local`: Search and read `.mdx` files in this repository
+4. **Resource naming**: All resources MUST use `doc_test_` prefix
+5. **Attempt the goal** using available tools
+6. **Handle GPU availability** - see GPU Fallback section
 7. **Verify the Expected Outcome** from the test definition
-8. **Clean up** all `doc_test_*` resources after the test
-9. **Generate report** using the helper script:
-   ```bash
-   python tests/scripts/report.py <test-id> <PASS|FAIL|PARTIAL> [--local]
-   ```
-10. **Complete the report** by filling in the generated template
+8. **Clean up** all `doc_test_*` resources
+9. **Generate report**: `python tests/scripts/report.py <test-id> <PASS|FAIL|PARTIAL> [--local]`
+10. **Complete the report** with actual results
 
-## GPU Fallback Guidance
+## Batch Execution
+
+When running a category (e.g., `/test serverless`):
+
+1. **Parse category** - Identify all test IDs in that section of TESTS.md
+2. **Show test list** - Display tests to be run and ask for confirmation
+3. **Run sequentially** - Execute each test following single test rules
+4. **Track results** - Record PASS/FAIL/PARTIAL for each
+5. **Clean up between tests** - Delete all `doc_test_*` resources before next test
+6. **Generate summary** - Create batch summary report at end
+
+### Batch Summary Format
+
+After running all tests in a batch, output:
+
+```markdown
+## Batch Summary: <category>
+
+| Test ID | Status | Notes |
+|---------|--------|-------|
+| test-1 | PASS | |
+| test-2 | FAIL | Missing docs for X |
+| test-3 | PARTIAL | Used fallback GPU |
 
-GPU availability varies. When tests require GPU resources:
+**Results:** X passed, Y failed, Z partial out of N tests
+**Doc Source:** Published / Local
+**Date:** YYYY-MM-DD HH:MM
+```
+
+Save the summary to:
+- `tests/reports/batch-<category>-<timestamp>.md`
+- `~/Dev/doc-tests/batch-<category>-<timestamp>.md`
+
+### Batch Options
+
+- **Stop on failure**: By default, continue through all tests. User can say "stop on first failure"
+- **Skip cleanup**: User can say "skip cleanup between tests" for speed (not recommended)
+
+## GPU Fallback Guidance
 
 | Queue Wait | Action |
 |------------|--------|
@@ -53,21 +107,11 @@ GPU availability varies. When tests require GPU resources:
 - PARTIAL: Completed with fallback GPU (doc improvement needed)
 - FAIL: Failed even with fallbacks
 
-## Report Locations
-
-Reports are saved to both:
-- `tests/reports/<test-id>-<timestamp>.md` (gitignored)
-- `~/Dev/doc-tests/<test-id>-<timestamp>.md` (persistent archive)
-
-## Example
+## Examples
 
 ```
-/test pods-quickstart-terminal local
+/test pods-quickstart-terminal # Single test
+/test flash local # All Flash tests with local docs
+/test serverless # All Serverless tests
+/test smoke # Quick validation
 ```
-
-This will:
-1. Load the test definition for `pods-quickstart-terminal`
-2. Use local MDX files (not published docs)
-3. Attempt: "Complete the Pod quickstart using only the terminal"
-4. Verify: "Code runs on Pod via SSH"
-5. Clean up and generate report
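
Reviewer note: the new argument rules in `.claude/commands/test.md` (a test ID or a category name, optionally followed by `local`) could be illustrated with a small parser. This is a minimal sketch, not code from the commit. The names `ParsedCommand` and `parse_test_args` are hypothetical, and the category set is copied from the Categories table added in this diff.

```python
from dataclasses import dataclass

# Category names copied from the Categories table in this diff.
CATEGORIES = frozenset({
    "smoke", "flash", "serverless", "vllm", "pods", "storage", "templates",
    "clusters", "sdk", "cli", "integrations", "public", "tutorials",
})


@dataclass(frozen=True)
class ParsedCommand:
    """Hypothetical container for the parsed `/test` arguments."""
    target: str           # a test ID or a category name
    is_category: bool     # True when target matches a known category
    use_local_docs: bool  # True when the optional `local` flag is present


def parse_test_args(args: str) -> ParsedCommand:
    """Parse the text that follows `/test` (hypothetical helper)."""
    tokens = args.split()
    if not tokens or len(tokens) > 2:
        raise ValueError("expected: <test-id|category> [local]")
    use_local = False
    if len(tokens) == 2:
        if tokens[1] != "local":
            raise ValueError(f"unknown option: {tokens[1]!r}")
        use_local = True
    target = tokens[0]
    return ParsedCommand(target, target in CATEGORIES, use_local)
```

Under these assumptions, `parse_test_args("flash local")` is treated as a category run with local docs, while `parse_test_args("pods-quickstart-terminal")` is treated as a single test against published docs. A test ID that happens to equal a category name would be read as a category, which is one reason to keep the two namespaces distinct.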

.claude/style-guide.md

Lines changed: 2 additions & 0 deletions
@@ -13,6 +13,7 @@ Follow the Runpod style guide (`.cursor/rules/rp-styleguide.mdc`) and Google Dev
 - Secure Cloud
 - Community Cloud
 - Flash
+- Public Endpoint
 
 ### Generic Terms (lowercase)
 - endpoint
@@ -23,6 +24,7 @@ Follow the Runpod style guide (`.cursor/rules/rp-styleguide.mdc`) and Google Dev
 - fine-tune
 - network volume
 - data center
+- repo
 
 ### Headings
 Always use **sentence case** for headings and titles:

.claude/testing.md

Lines changed: 61 additions & 14 deletions
@@ -8,24 +8,35 @@ Tests should be **hard to pass**. They simulate a user typing a simple request w
 
 ## Running Tests
 
-Use the `/test` command or natural language:
+Use the `/test` command:
 
 ```
-/test pods-quickstart-terminal # Command form
-Run the flash-quickstart test # Natural language
+/test <test-id> # Single test with published docs
+/test <test-id> local # Single test with local docs
+/test <category> # All tests in category
+/test <category> local # Category with local docs
+/test smoke # Smoke tests only
 ```
 
-Use the `/test` command to run tests:
-
-```
-/test pods-quickstart-terminal # Run with published docs
-/test pods-quickstart-terminal local # Run with local MDX files
-/test smoke # Run all smoke tests
-```
-
-The `/test` command loads the test definition and reminds you of the execution rules.
-
-## Test Execution Rules
+### Categories
+
+| Category | Description |
+|----------|-------------|
+| `smoke` | Fast tests, no GPU deploys |
+| `flash` | Flash SDK |
+| `serverless` | Serverless endpoints |
+| `vllm` | vLLM deployment |
+| `pods` | Pod management |
+| `storage` | Network volumes |
+| `templates` | Template management |
+| `clusters` | Instant Clusters |
+| `sdk` | SDKs and APIs |
+| `cli` | runpodctl |
+| `integrations` | Third-party integrations |
+| `public` | Public endpoints |
+| `tutorials` | End-to-end tutorials |
+
+## Single Test Execution
 
 1. Read the test definition from `tests/TESTS.md`.
 2. **Do NOT use prior knowledge** - only use Runpod docs.
@@ -39,6 +50,42 @@ The `/test` command loads the test definition and reminds you of the execution r
    ```
 8. Fill in the generated report template with actual results.
 
+## Batch Execution
+
+When running a category (e.g., `/test serverless` or `/test flash local`):
+
+1. **Parse category** - Identify all test IDs in that section of `tests/TESTS.md`
+2. **Show test list** - Display tests to be run and ask for confirmation
+3. **Run sequentially** - Execute each test following single test rules
+4. **Track results** - Record PASS/FAIL/PARTIAL for each test
+5. **Clean up between tests** - Delete all `doc_test_*` resources before starting next test
+6. **Generate summary** - Create batch summary report at end
+
+### Batch Summary Format
+
+```markdown
+## Batch Summary: <category>
+
+| Test ID | Status | Notes |
+|---------|--------|-------|
+| test-1 | PASS | |
+| test-2 | FAIL | Missing docs for X |
+| test-3 | PARTIAL | Used fallback GPU |
+
+**Results:** X passed, Y failed, Z partial out of N tests
+**Doc Source:** Published / Local
+**Date:** YYYY-MM-DD HH:MM
+```
+
+Save batch summaries to:
+- `tests/reports/batch-<category>-<timestamp>.md`
+- `~/Dev/doc-tests/batch-<category>-<timestamp>.md`
+
+### Batch Options
+
+- **Stop on failure**: By default, continue through all tests. Say "stop on first failure" to halt early.
+- **Skip tests**: Say "skip test-id" during batch to skip specific tests.
+
 ## GPU Fallback Guidance
 
 GPU availability varies by type and time. When a test requires GPU resources:
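
Reviewer note: the Batch Summary Format added to `.claude/testing.md` is plain Markdown, so it could be produced by a short helper rather than written by hand. The following is a sketch under stated assumptions, not code from the commit. The function name `render_batch_summary` and its parameters are hypothetical, and only the layout (table columns, the three status values, and the `YYYY-MM-DD HH:MM` date) is taken from the diff.

```python
from collections import Counter
from datetime import datetime

# Status values named in the Batch Execution section of this diff.
VALID_STATUSES = ("PASS", "FAIL", "PARTIAL")


def render_batch_summary(category, results, local, when):
    """Render a batch summary in the format shown in the diff.

    `results` is a list of (test_id, status, notes) tuples; this shape is
    an assumption made for the sketch.
    """
    lines = [
        f"## Batch Summary: {category}",
        "",
        "| Test ID | Status | Notes |",
        "|---------|--------|-------|",
    ]
    for test_id, status, notes in results:
        if status not in VALID_STATUSES:
            raise ValueError(f"unknown status: {status!r}")
        lines.append(f"| {test_id} | {status} | {notes} |")

    counts = Counter(status for _, status, _ in results)
    lines += [
        "",
        f"**Results:** {counts['PASS']} passed, {counts['FAIL']} failed, "
        f"{counts['PARTIAL']} partial out of {len(results)} tests",
        f"**Doc Source:** {'Local' if local else 'Published'}",
        f"**Date:** {when:%Y-%m-%d %H:%M}",
    ]
    return "\n".join(lines)
```

The returned string could then be written to the two `batch-<category>-<timestamp>.md` locations listed in the diff. The timestamp format for those file names is not specified there, so it is left out of the sketch.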

.cursor/rules/rp-styleguide.mdc

Lines changed: 2 additions & 2 deletions
@@ -5,8 +5,8 @@ alwaysApply: true
 ---
 
 Always use sentence case for headings and titles.
-These are proper nouns: Runpod, Pods, Serverless, Hub, Instant Clusters, Secure Cloud, Community Cloud, Flash.
-These are generic terms: endpoint, worker, cluster, template, handler, fine-tune, network volume.
+These are proper nouns: Runpod, Pods, Serverless, Hub, Instant Clusters, Secure Cloud, Community Cloud, Flash, Public Endpoint.
+These are generic terms: endpoint, worker, cluster, template, handler, fine-tune, network volume, data center, repo.
 
 Prefer using paragraphs to bullet points unless directly asked.
 When using bullet points, end each line with a period.

CLAUDE.md

Lines changed: 2 additions & 2 deletions
@@ -38,6 +38,6 @@ Examples of things worth capturing:
 
 ## Terminology Quick Reference
 
-**Capitalize:** Runpod, Pods, Serverless, Hub, Instant Clusters, Flash, Secure Cloud, Community Cloud
+**Capitalize:** Runpod, Pods, Serverless, Hub, Instant Clusters, Flash, Secure Cloud, Community Cloud, Public Endpoint
 
-**Lowercase:** endpoint, worker, template, handler, network volume, data center, cluster, fine-tune
+**Lowercase:** endpoint, worker, template, handler, network volume, data center, cluster, fine-tune, repo

tests/TESTS.md

Lines changed: 9 additions & 6 deletions
@@ -4,18 +4,21 @@ Minimal test definitions that simulate real user prompts. Tests are intentionall
 
 ## How to Run
 
-In Claude Code, use natural language:
+Use the `/test` command:
 
 ```
-Run the flash-quickstart test
+/test flash-quickstart # Single test
+/test serverless # All serverless tests
+/test pods local # All pod tests with local docs
+/test smoke # Smoke tests only
 ```
 
-```
-Run all vLLM tests
-```
+Or natural language:
 
 ```
-Run smoke tests
+Run the flash-quickstart test
+Run all vLLM tests
+Run smoke tests using local docs
 ```
 
 ### Doc Source Modes
