
Commit 90997ad

gltanaka and claude committed
feat: Add agentic test workflow for UI test generation (#332)
Add new `pdd test <github_issue_url>` agentic workflow that automates UI test creation through a 9-step process:

1. Duplicate check
2. Documentation research
3. Requirements clarification
4. Frontend detection
5. Test plan creation
6. Test generation
7. Test execution
8. Fix/iterate cycle
9. PR submission

New prompts:
- agentic_test_python.prompt - CLI entry point
- agentic_test_orchestrator_python.prompt - 9-step workflow orchestrator
- 9 LLM step prompts for each workflow phase

Modified:
- cli_python.prompt - Add re-export for agentic_test_main
- README.md - Document new `pdd test` agentic command
- docs/TUTORIALS.md - Add usage tutorial

The existing `pdd test` command remains backward compatible. The --manual flag preserves original behavior.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
1 parent 275111b commit 90997ad

16 files changed

Lines changed: 1739 additions & 0 deletions

README.md

Lines changed: 62 additions & 0 deletions
@@ -28,6 +28,7 @@ For CLI users, PDD also offers powerful **agentic commands** that implement GitH
 - `pdd change <issue-url>` - Implement feature requests (12-step workflow)
 - `pdd bug <issue-url>` - Create failing tests for bugs
 - `pdd fix <issue-url>` - Fix the failing tests
+- `pdd test <issue-url>` - Generate UI tests from issue descriptions (9-step workflow)

 For prompt-based workflows, the **`sync`** command automates the complete development cycle with intelligent decision-making, real-time visual feedback, and sophisticated state management.

@@ -540,6 +541,7 @@ flowchart TB
 change["pdd change &lt;url&gt;"]
 bug["pdd bug &lt;url&gt;"]
 fix_url["pdd fix &lt;url&gt;"]
+test_url["pdd test &lt;url&gt;"]
 end

 sync["pdd sync"]
@@ -570,6 +572,7 @@ flowchart TB
 - **[`change`](#8-change)**: Implement feature requests from GitHub issues (12-step workflow)
 - **[`bug`](#14-bug)**: Analyze bugs and create failing tests from GitHub issues
 - **[`fix`](#6-fix)**: Fix failing tests (supports issue-driven and manual modes)
+- **[`test`](#4-test)**: Generate UI tests from GitHub issues (9-step workflow in agentic mode)

 ### Core Commands (Prompt-Based)
 - **[`sync`](#1-sync)**: **[PRIMARY FOR PROMPT WORKFLOWS]** Automated prompt-to-code cycle
@@ -1433,6 +1436,64 @@ pdd [GLOBAL OPTIONS] example --output examples/factorial_calculator_example.py f

 ### 4. test

+Generate or enhance unit tests for a given code file and its corresponding prompt file. Also supports **agentic mode** for generating UI tests from GitHub issues.
+
+#### Agentic Mode (UI Test Generation)
+
+Generate UI tests from a GitHub issue. The issue describes what needs to be tested (a webpage, CLI, or desktop app), and an agentic workflow analyzes the target, creates a test plan, and generates comprehensive UI tests.
+
+```
+pdd [GLOBAL OPTIONS] test <github-issue-url>
+```
+
+**How it works (9-step workflow with GitHub comments):**
+
+1. **Duplicate check** - Search for existing issues describing the same test requirements. If found, merge the content and close the duplicate. Posts a comment with findings.
+
+2. **Documentation check** - Review the repo documentation and codebase to understand what needs to be tested. Posts a comment with findings.
+
+3. **Analyze & clarify** - Determine whether the issue contains enough information to create tests. Posts a comment requesting clarification if needed.
+
+4. **Detect frontend** - Identify the frontend type: web UI (Next.js, React, etc.), CLI, or desktop app, and determine the appropriate testing framework (e.g., Playwright for web). Posts a comment with the frontend analysis.
+
+5. **Create test plan** - Design a comprehensive test plan and verify it is achievable. Posts a comment requesting information (e.g., credential access) if the plan is blocked.
+
+6. **Generate tests** - Create UI tests in a new worktree, following the test plan. Posts a comment with the generated test code.
+
+7. **Run tests** - Execute the generated tests against the target. Posts a comment with the test results.
+
+8. **Fix & iterate** - Fix any failing tests and re-run until they pass. Posts a comment with fix attempts and the final status.
+
+9. **Submit PR** - Create a draft pull request with the UI tests, linked to the issue. Posts a comment with the PR link.
+
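The nine steps above run in order, and the workflow stops (and later resumes) at any step that needs input. As a rough illustration only, not the actual PDD implementation, the control flow can be sketched like this; `run_step` and the step names are assumptions based on the description above:

```python
from typing import List, Tuple

# Illustrative step names taken from the 9-step description above.
STEPS: List[str] = [
    "duplicate_check", "documentation_check", "analyze_clarify",
    "detect_frontend", "create_test_plan", "generate_tests",
    "run_tests", "fix_iterate", "submit_pr",
]

def run_step(name: str) -> Tuple[bool, str]:
    """Stand-in for an LLM-backed step runner; always succeeds in this sketch."""
    return True, f"{name} completed"

def run_workflow(steps: List[str]) -> List[str]:
    """Run steps in order, stopping at the first blocked step so a later
    invocation can resume from that point."""
    log: List[str] = []
    for name in steps:
        ok, summary = run_step(name)
        log.append(summary)
        if not ok:
            break  # e.g., waiting on clarification in the issue comments
    return log

if __name__ == "__main__":
    for line in run_workflow(STEPS):
        print(line)
```

In the real workflow each step would also post its summary as a GitHub issue comment, which is what makes the run auditable and resumable.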
+**Agentic Options:**
+- `--timeout-adder FLOAT`: Add additional seconds to each step's timeout (default: 0.0)
+- `--no-github-state`: Disable GitHub issue comment-based state persistence and use local-only state
+- `--manual`: Use the legacy prompt-based mode instead of agentic mode
+
+**Cross-Machine Resume**: By default, workflow state is stored in a hidden comment on the GitHub issue, enabling resume from any machine. Use `--no-github-state` to disable this feature. You can also set the `PDD_NO_GITHUB_STATE=1` environment variable.
+
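The `--no-github-state` flag and the `PDD_NO_GITHUB_STATE` environment variable presumably resolve to the same switch. A hedged sketch of how such a toggle is typically computed; the helper name is illustrative, not PDD's actual code:

```python
import os

def use_github_state(no_github_state_flag: bool = False) -> bool:
    """Illustrative helper: decide whether workflow state is stored in a
    hidden GitHub issue comment (True) or kept local-only (False).

    Either the --no-github-state CLI flag or PDD_NO_GITHUB_STATE=1 in the
    environment disables GitHub-backed state.
    """
    if no_github_state_flag:
        return False  # explicit flag always wins
    return os.environ.get("PDD_NO_GITHUB_STATE") != "1"

if __name__ == "__main__":
    os.environ.pop("PDD_NO_GITHUB_STATE", None)
    print(use_github_state())                           # no flag, env unset
    print(use_github_state(no_github_state_flag=True))  # flag disables it
```

Putting the env-var check behind the flag check gives the command-line option precedence, which matches the usual CLI convention.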
+**Example (Agentic Mode):**
+```bash
+# Generate UI tests from a GitHub issue
+pdd test https://github.com/myorg/myrepo/issues/789
+
+# Resume after answering clarifying questions
+pdd test https://github.com/myorg/myrepo/issues/789
+```
+
+**Next Step - Fixing Test Issues:**
+
+If the generated tests reveal issues that need code fixes, use `pdd fix` with the same issue URL:
+
+```bash
+pdd fix https://github.com/myorg/myrepo/issues/789
+```
+
+---
+
+#### Manual Mode (Prompt-Based)
+
 Generate or enhance unit tests for a given code file and its corresponding prompt file.

 Test organization:
@@ -1441,6 +1502,7 @@ Test organization:

 ```
 pdd [GLOBAL OPTIONS] test [OPTIONS] PROMPT_FILE CODE_OR_EXAMPLE_FILE
+pdd [GLOBAL OPTIONS] test --manual [OPTIONS] PROMPT_FILE CODE_OR_EXAMPLE_FILE
 ```

 Arguments:
Lines changed: 167 additions & 0 deletions
@@ -0,0 +1,167 @@
"""
Example usage of the agentic_test_orchestrator module.

This script demonstrates how to invoke the `run_agentic_test_orchestrator` function.
Since the orchestrator relies on internal modules like `run_agentic_task` and
`load_prompt_template`, this example mocks those dependencies to simulate a
successful UI test generation workflow without making actual LLM calls or
requiring a real GitHub issue.

Scenario:
We simulate an issue where a user requests UI tests for a login page.
The orchestrator will step through the 9-step process:
1. Check for duplicate test requests
2. Review codebase documentation
3. Analyze and ask clarifying questions if needed
4. Detect frontend type (web/CLI/desktop)
5. Create test plan
6. Generate UI tests
7. Run tests
8. Fix and iterate on failing tests
9. Submit PR
"""

import sys
from pathlib import Path
from typing import Optional
from unittest.mock import patch

# Ensure the project root is in sys.path so we can import the module
project_root = Path(__file__).resolve().parent.parent
sys.path.append(str(project_root))

try:
    from pdd.agentic_test_orchestrator import run_agentic_test_orchestrator
except ImportError:
    print("Error: Could not import 'pdd.agentic_test_orchestrator'.")
    print("Ensure your PYTHONPATH is set correctly or the file structure matches.")
    sys.exit(1)


def mock_load_prompt_template(template_name: str) -> str:
    """
    Mock implementation of load_prompt_template.

    Returns a dummy prompt string based on the requested template name.
    """
    return f"MOCK PROMPT FOR: {template_name}\nContext: {{issue_content}}"


def mock_run_agentic_task(
    instruction: str,
    cwd: Path,
    verbose: bool,
    quiet: bool,
    label: str,
    timeout: Optional[float] = None,
    max_retries: int = 3,
):
    """
    Mock implementation of run_agentic_task.

    Simulates the output of an LLM agent for each step of the 9-step UI test workflow.
    """
    step_num = label.replace("step", "")

    # Default return values
    success = True
    cost = 0.15  # Simulated cost per step
    provider = "anthropic"
    output = ""

    if step_num == "1":
        output = "No duplicate test requests found. Proceeding with UI test generation."
    elif step_num == "2":
        output = """Codebase review complete:
- Frontend: Next.js application in /frontend
- Auth pages: /frontend/pages/auth/login.tsx, /frontend/pages/auth/register.tsx
- Components: LoginForm, RegisterForm in /frontend/components/auth/
- API routes: /api/auth/login, /api/auth/register"""
    elif step_num == "3":
        output = """Requirements are clear:
- Test login page functionality
- Verify form validation (email format, password requirements)
- Test successful and failed login scenarios
- No clarification needed from author."""
    elif step_num == "4":
        output = """Frontend detected: Next.js (React)
Test framework: Playwright
Base URL: http://localhost:3000
Authentication: Session-based via NextAuth.js"""
    elif step_num == "5":
        output = """Test Plan:
1. Login page renders correctly
2. Form validation - invalid email shows error
3. Form validation - password too short shows error
4. Successful login redirects to dashboard
5. Failed login shows error message
6. Remember me checkbox persists session

Estimated tests: 6 test cases
Files to create: tests/e2e/login.spec.ts"""
    elif step_num == "6":
        output = """FILES_CREATED: tests/e2e/login.spec.ts

Generated Playwright test file with 6 test cases:
- test('login page renders correctly')
- test('shows error for invalid email')
- test('shows error for short password')
- test('successful login redirects to dashboard')
- test('failed login shows error message')
- test('remember me persists session')"""
    elif step_num == "7":
        output = """Test execution results:
6 tests total
5 passed
1 failed: 'remember me persists session' - localStorage not mocked

Overall: 83% pass rate"""
    elif step_num == "8":
        output = """Fixed failing test:
- Added localStorage mock in beforeEach hook
- Re-ran tests: 6/6 passed

FILES_MODIFIED: tests/e2e/login.spec.ts
All tests now passing."""
    elif step_num == "9":
        output = """PR Created: https://github.com/example/myapp/pull/456

Title: Add UI tests for login page (#123)
Branch: test/issue-123
Files: tests/e2e/login.spec.ts"""
    else:
        output = f"Unknown step executed: {step_num}"

    return success, output, cost, provider


def main():
    """Main function to run the agentic test orchestrator simulation."""
    # Define dummy issue data
    issue_data = {
        "issue_url": "https://github.com/example/myapp/issues/123",
        "issue_content": "Create UI tests for the login page. Should test form validation, successful login, and error handling.",
        "repo_owner": "example",
        "repo_name": "myapp",
        "issue_number": 123,
        "issue_author": "test_requester",
        "issue_title": "Add UI tests for login page",
        "cwd": Path("./temp_workspace"),
        "verbose": True,
        "quiet": False,
        "timeout_adder": 0.0,
        "use_github_state": False,  # Disable for simulation
    }

    print("Starting Agentic UI Test Orchestrator Simulation...")
    print("-" * 60)

    # Patch the internal dependencies
    with patch("pdd.agentic_test_orchestrator.load_prompt_template", side_effect=mock_load_prompt_template), \
         patch("pdd.agentic_test_orchestrator.run_agentic_task", side_effect=mock_run_agentic_task):

        # Run the orchestrator
        success, final_msg, total_cost, model, changed_files = run_agentic_test_orchestrator(
            **issue_data
        )

    print("-" * 60)
    print("Simulation Complete.")
    print(f"Success: {success}")
    print(f"Final Message: {final_msg}")
    print(f"Total Cost: ${total_cost:.2f}")
    print(f"Model Used: {model}")
    print(f"Changed Files: {changed_files}")
    print("\nNext step: Review the generated tests and merge the PR.")


if __name__ == "__main__":
    main()

docs/TUTORIALS.md

Lines changed: 28 additions & 0 deletions
@@ -75,6 +75,34 @@ This tutorial walks through implementing a GitHub issue using PDD.
 - The PR is updated with the fix
 - Review and merge when ready

+### Method 4: Generating UI Tests
+
+1. **Create a GitHub Issue**
+   - Describe what needs to be tested (webpage URL, CLI command, or desktop app)
+   - Include screenshots or text descriptions of expected behavior
+   - Specify what elements/interactions should be verified
+
+2. **Generate UI Tests**
+   ```bash
+   pdd test https://github.com/myorg/myrepo/issues/789
+   ```
+   This analyzes the target and creates comprehensive UI tests.
+
+3. **Handle Clarifying Questions**
+   - If PDD needs more information (e.g., credentials or test environment setup), it posts questions to the issue
+   - Answer them in the GitHub issue comments
+   - Run `pdd test` again to resume
+
+4. **Review the Generated Tests**
+   - The PR contains tests for the specified UI (Playwright for web, pytest for CLI, etc.)
+   - Review and adjust the tests as needed
+
+5. **Fix Any Issues Found**
+   ```bash
+   pdd fix https://github.com/myorg/myrepo/issues/789
+   ```
+   Use this if the tests reveal bugs that need fixing.
+
 ### Tips

 - **Resume from anywhere**: Workflow state is saved to GitHub, so you can continue on any machine
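Re-running `pdd test` after answering questions works because the workflow records which steps already completed. A minimal sketch of that resume logic, with a local dict standing in for the hidden GitHub state comment (all names here are illustrative, not PDD's actual internals):

```python
from typing import Dict, List

# Illustrative step names matching the 9-step workflow.
STEPS: List[str] = [
    "duplicate_check", "documentation_check", "analyze_clarify",
    "detect_frontend", "create_test_plan", "generate_tests",
    "run_tests", "fix_iterate", "submit_pr",
]

def resume_point(state: Dict[str, bool], steps: List[str]) -> int:
    """Return the index of the first step not yet marked complete."""
    for i, name in enumerate(steps):
        if not state.get(name, False):
            return i
    return len(steps)  # every step already done

if __name__ == "__main__":
    # A first run that stopped while waiting for clarification would have
    # recorded something like this before exiting:
    saved = {"duplicate_check": True, "documentation_check": True}
    print("resuming at:", STEPS[resume_point(saved, STEPS)])
```

Storing this map in an issue comment rather than a local file is what makes the resume work from any machine.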
