Replay 2906 commits from private repo history #549
Open
jiaminc-cmu wants to merge 2907 commits into main
…parse` in `unfinished_prompt.py`.
- Fix format string injection: Escape curly braces in LLM outputs before
storing in context to prevent KeyError when subsequent prompts contain
{placeholders} from code/error analysis
- Fix silent error: Print KeyError messages to console before returning
- Fix resume message: Calculate actual start step (5.5) before displaying
resume message instead of showing incorrect "step 6"
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
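A minimal sketch of the brace-escaping idea from the commit above, assuming the failure arises when LLM text ends up inside a string that is later passed through `str.format()`. The helper name `escape_braces` and the sample strings are hypothetical, not taken from the repository.

```python
def escape_braces(text: str) -> str:
    """Double every brace so str.format() treats it as a literal character."""
    return text.replace("{", "{{").replace("}", "}}")


# Hypothetical LLM output that happens to contain a {placeholder}-like span.
llm_output = "The dict literal {key: value} is malformed"

# Unescaped: the LLM text becomes part of the template, so format()
# tries to look up a field named "key" and raises KeyError.
unsafe_template = "Previous analysis: " + llm_output + "\nNow fix module {module}."
try:
    unsafe_template.format(module="sync")
    raised = False
except KeyError:
    raised = True

# Escaped: the braces survive formatting as literal characters.
safe_template = (
    "Previous analysis: " + escape_braces(llm_output) + "\nNow fix module {module}."
)
result = safe_template.format(module="sync")
```

Escaping at the point where output is stored in context, rather than at each later `.format()` call, keeps the fix in one place; whether the project does exactly this is not shown here.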
Add 5 new tests covering:
- Format string injection: Verify curly braces in LLM outputs don't cause KeyError
- Restored context escaping: Verify curly braces in resumed state are escaped
- Error console output: Verify KeyError messages are printed to console
- Resume message for step 5.5: Verify correct step shown when resuming after step 5
- Resume message for step 6: Verify correct step shown when resuming after step 5.5
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Cherry-picked changes from PR #267:
- Add optional interactive steering to sync command
- Fix sync animation for horizontal terminal resizes
- Add --no-steer and --steer-timeout options
- Add sync_tui tests and example
Note: Excluded pdd/prompts/* (symlink) and project_dependencies.csv
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
## Non-Python Sync Fixes
- Skip test_extend for non-Python languages since code coverage tooling is Python-specific
- Fix sync returning success without generating tests for non-Python modules
  - Added check for test file existence before accepting workflow as complete
  - The synthetic RunReport from crash/verify was incorrectly triggering "all_synced"
- Add safety checks in sync_orchestration.py and pin_example_hack.py
## Frontend File Detection Fixes
- Support new .pddrc `outputs.code.path` format (Issue #237)
  - Previously only looked for legacy `generate_output_path`
  - Now checks `outputs.code.path`, `outputs.test.path`, `outputs.example.path` first
- Add .test.ts/.spec.ts patterns for Jest/TypeScript test file detection
  - Fixes detection of files like `test_prisma_schema.test.ts`
- Rebuild frontend with updated file detection logic
## Architecture Generation Fixes
- Add valid language suffixes guidance to prevent LLM using invalid suffixes like `_NextJS`
- Escape curly braces in architecture_json.prompt template to prevent .format() errors
- Add preprocessing in orchestrator before .format() calls
## Path Resolution Fixes
- Add `typescriptreact` -> `.tsx`, `javascriptreact` -> `.jsx`, `prisma` -> `.prisma` mappings
- Ensure example and test paths always have fallback defaults
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
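The lookup order described under "Frontend File Detection Fixes" could look roughly like the sketch below. It is written in Python for illustration only (the actual change is in the frontend), and it assumes the .pddrc context has already been parsed into a dict; the function names and the dict shape are hypothetical.

```python
def resolve_output_path(context_cfg: dict, kind: str, default: str) -> str:
    """Prefer outputs.<kind>.path, then the legacy key, then a fallback default."""
    outputs = context_cfg.get("outputs") or {}
    entry = outputs.get(kind) or {}
    path = entry.get("path")
    if path:
        return path
    # The commit names only `generate_output_path` as the legacy key (code output).
    if kind == "code" and context_cfg.get("generate_output_path"):
        return context_cfg["generate_output_path"]
    return default


def looks_like_ts_test(filename: str) -> bool:
    """Match the Jest/TypeScript test suffixes mentioned in the commit."""
    return filename.endswith((".test.ts", ".spec.ts"))
```

For example, a config containing both `outputs.code.path` and `generate_output_path` resolves to the former, and a config with neither falls back to the supplied default.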
## Workflow Changes
Reorganized the agentic architecture workflow for better modularity:
- Steps 1-6: Unchanged (analyze PRD, decompose, research, design, deps, generate)
- Step 7: NEW - Generate .pddrc configuration file
- Step 8: Renamed from step 7 - Generate individual prompts
- Step 9: Renamed from step 8 - Completeness validation
- Step 10: Renamed from step 9 - Sync prompts with architecture
- Step 11: NEW - Dependency resolution
- Step 12: Renamed from step 11 - Fix validation errors
## New Files
- `prompts/agentic_arch_step7_pddrc_LLM.prompt` - .pddrc generation step
- `prompts/agentic_arch_step11_deps_LLM.prompt` - Dependency resolution step
- `prompts/agentic_arch_step12_fix_LLM.prompt` - Enhanced fix step
- `pdd/templates/architecture/example_nextjs_task_notes.prompt` - Example for Next.js projects
- `pdd/templates/architecture/pdd_path_construction_guide.prompt` - Path construction reference
## Template Fixes
- Escape curly braces in docs/prompting_guide.md to prevent .format() errors
- Change {PLACEHOLDER} to [PLACEHOLDER] in generate_prompt.prompt to avoid confusion
- Updated step count references in step 1 and 2 prompts (8 -> 11)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
## Task Queue Panel Improvements
- Make task queue panel draggable by adding drag handle
- Save/restore panel position to localStorage
- Add reset position button to return to default (top-right corner)
- Keep panel within viewport bounds on window resize
- Separate collapse toggle from drag handle for better UX
## Generate Command
- Add --skip-prompts flag to skip prompt generation in agentic architecture mode
- Prompts are generated by default; flag allows skipping when not needed
## Logging
- Change generate_output_paths logging from INFO to DEBUG level
- Reduces noise since paths may be overridden by outputs.code.path config
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- test_agentic_architecture_orchestrator: Update all tests to reflect the new 11-step workflow (steps 1-8 linear + steps 9-11 validation)
- test_sync_determine_operation: Fix test_decision_test_on_low_coverage to create actual test file (required after test file existence check)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Update README and TUTORIALS.md to reflect the new 11-step agentic architecture workflow:
- Steps 1-8: Analysis & generation (architecture.json, .pddrc, prompts)
- Steps 9-11: Validation (completeness, sync, dependencies)
Also document the new --skip-prompts option for faster architecture-only generation.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- test_commands_generate.py: Add skip_prompts=False to expected call
- code_generator_main.py: Handle both {{PLACEHOLDER}} (YAML-escaped)
and {PLACEHOLDER} (single brace) in post_process_args substitution
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
This reverts commit 351813a22a247312e74cefb85f8ed2a4334ad218.
…ck.acquire()
This commit adds comprehensive unit and E2E tests that detect the file handle resource leak in SyncLock.acquire() when non-IOError/OSError exceptions occur.
Tests added:
- Unit tests in test_sync_determine_operation.py (6 tests)
  - KeyboardInterrupt during lock acquisition
  - RuntimeError during lock acquisition
  - Exception during file operations
  - IOError/OSError regression tests
  - Normal operation regression tests
  - Context manager exception handling
- E2E tests in test_e2e_issue_403_file_handle_leak.py (4 tests)
  - Real-world KeyboardInterrupt scenario (Ctrl+C)
  - RuntimeError leak detection
  - Normal operation verification
  - Context manager interrupt handling
All tests correctly fail on current code, detecting the bug where file descriptors remain open when exceptions other than IOError/OSError occur during lock acquisition.
Related to #403
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
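To illustrate the class of leak these tests target, here is a hedged sketch of a lock whose `acquire()` closes the file descriptor on any exception, including `KeyboardInterrupt`. `LeakSafeLock` and the injected `lock_fn` are hypothetical stand-ins; this is not the project's `SyncLock` implementation or its actual fix.

```python
import os


class LeakSafeLock:
    """Toy lock: opens a lock file, then runs lock_fn(fd) to take the lock."""

    def __init__(self, path, lock_fn):
        self.path = path
        self._lock_fn = lock_fn  # stand-in for the real locking call
        self.fd = None

    def acquire(self):
        fd = os.open(self.path, os.O_CREAT | os.O_RDWR)
        try:
            self._lock_fn(fd)
        except BaseException:
            # Catching only IOError/OSError here would leave fd open when
            # KeyboardInterrupt or RuntimeError is raised; BaseException
            # covers those cases before re-raising.
            os.close(fd)
            raise
        self.fd = fd

    def release(self):
        if self.fd is not None:
            os.close(self.fd)
            self.fd = None
```

Injecting `lock_fn` keeps the sketch portable and makes it easy to simulate an interrupt during acquisition in a test.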
- Revert unnecessary code change to handle double braces in code_generator_main.py
- Update test_code_generator_main.py to normalize {{PLACEHOLDER}} to {PLACEHOLDER}
when reading the template for testing purposes
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Fix mock_httpx_client fixture to properly mock async context manager using AsyncMock for __aenter__ and __aexit__
- Fix test_first_heartbeat_sent_immediately to avoid orphan coroutines by using side_effect instead of reassigning the mock
- Fix test_heartbeat_refreshes_token_on_401 to use side_effect pattern
- Fix test_heartbeat_only_refreshes_once_per_cycle to use return_value
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
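The async-context-manager mocking pattern mentioned in the first bullet can be sketched with the standard library alone. The names `client_cm`, `send_heartbeat`, and the `/heartbeat` path are illustrative assumptions, not the project's fixture.

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock

# The object yielded by `async with`; its post() must be awaitable.
client = MagicMock()
client.post = AsyncMock(return_value="ok")

# The context manager itself: __aenter__/__aexit__ must be AsyncMock
# so that `async with` can await them.
client_cm = MagicMock()
client_cm.__aenter__ = AsyncMock(return_value=client)
client_cm.__aexit__ = AsyncMock(return_value=False)


async def send_heartbeat():
    async with client_cm as c:
        return await c.post("/heartbeat")  # hypothetical endpoint


response = asyncio.run(send_heartbeat())
```

Using `return_value` or `side_effect` on the existing `AsyncMock`, instead of replacing the mock with a fresh coroutine object, avoids leaving un-awaited coroutines behind.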
Add explanation that core_dump files are created when PDD runs crash or hit internal errors, per Copilot review suggestion. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
…h expanded agentic architecture, various bug fixes, and refactors.
…gration This update improves the verbose logging setup by allowing the LiteLLM library to toggle its debug output based on the verbose flag or environment variable. It ensures that logging levels are appropriately set for production and development modes, and adds error handling for potential attribute access issues in LiteLLM.
…n SyncLock.acquire()" This reverts commit 400e8ea02e6d20877f41232f568dea3183aadab2.
- Add batch detection using Union-Find algorithm to group modules by dependency
  - Each batch is a connected component in the dependency graph
  - Modules within a batch sync sequentially (by priority), different batches are independent
- Add BatchFilterDropdown component with expandable module list view
- Add batch color stripe indicator on graph nodes
- Add SyncOptionsModal for configuring sync options before execution
- Various UI improvements to PromptSelector, PromptSpace, and constants
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
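As a rough illustration of the batch detection described above, the following Union-Find sketch groups modules into connected components of an undirected dependency graph. The function name, the input shape (a module list plus dependency pairs), and the sample data are assumptions; the real implementation lives in the frontend and may differ.

```python
def find_batches(modules, dependencies):
    """Group modules into batches: one batch per connected component."""
    parent = {m: m for m in modules}

    def find(x):
        # Path halving: point nodes closer to the root while walking up.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Every dependency edge merges the two modules' components.
    for a, b in dependencies:
        union(a, b)

    batches = {}
    for m in modules:
        batches.setdefault(find(m), []).append(m)
    return sorted(batches.values())


batches = find_batches(
    ["a", "b", "c", "d", "e", "f"],
    [("a", "b"), ("b", "c"), ("d", "e")],
)
```

Here `a`, `b`, `c` land in one batch, `d` and `e` in another, and `f` forms a batch of its own. The sketch assumes every dependency endpoint appears in `modules`; ordering within a batch by priority is left out.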
- Add `agentic_mode` parameter to sync_orchestration for Python agentic path
- Change cmd_test_main return type from 3-tuple to 4-tuple with agentic_success flag
- Add run_agentic_test_generate 4-tuple return with success boolean
- Add _use_agentic_path() helper to determine agentic behavior
- Add _create_synthetic_run_report_for_agentic_success() for non-Python languages
- Fix sync_determine_operation to differentiate synthetic vs real run reports using test_hash
- Use sentinel value "agentic_test_success" when agent succeeds but file is at different path
- Trust agentic_success flag for non-Python test generation instead of file existence check
- Update prompts and examples to reflect API changes
Fixes issue where sync reported failure despite successful agentic test generation for non-Python languages (CSS, TypeScript, etc.) where test files may be created at different paths or with different extensions than expected.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
…upport
Add agentic_mode parameter throughout the sync workflow:
- commands/maintenance.py: Add --agentic CLI flag to sync command
- sync_main.py: Pass agentic_mode parameter to sync_orchestration
- prompts/sync_main_python.prompt: Document agentic_mode parameter
Update agentic_test_generate return signature:
- prompts/agentic_test_generate_python.prompt: Update return type from tuple[str, float, str] to tuple[str, float, str, bool] to include success boolean, matching actual implementation
Fix cloud timeout handling:
- fix_verification_errors_loop.py: Use get_cloud_timeout() function instead of hardcoded CLOUD_REQUEST_TIMEOUT constant
Improve server job failure detection:
- server/jobs.py: Check stdout for sync failure indicators even when exit code is 0, since sync may return 0 but report failure in output
Extend language extension mappings:
- server/routes/files.py: Add HTML, CSS, and Makefile extensions; include "Dockerfile" without extension prefix
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Update test_sync_dry_run_mode to expect the new agentic_mode=False parameter in sync_orchestration calls. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
…er for the 3blue1brown demo, along with related dependency and changelog updates.
…of a test fix attempt.
This commit adds comprehensive test coverage for the bug where commits created by LLM agents during Step 1 of the agentic E2E fix workflow are not pushed to the remote repository when the workflow exits early at Step 2.
Test files:
- tests/test_e2e_issue_419_unpushed_commits.py: Unit tests for _commit_and_push()
- tests/test_e2e_issue_419_cli_unpushed_commits.py: E2E integration test
The tests verify the expected behavior documented in CHANGELOG v0.0.121: "pdd fix now automatically commits and pushes changes after successful completion"
These tests fail on the current buggy code and will pass once the fix is implemented in pdd/agentic_e2e_fix_orchestrator.py lines 237-238.
Related to #419
result[-2] on a 4-tuple returns model name string instead of cost float, silently dropping test/test_extend costs. Fixes #508 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix generate test to use correct 4-tuple (content, was_incremental, cost, model) - Update E2E mock for code_generator_main to return 4-tuple - Improve comment accuracy for tuple format documentation Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…antic schema validation, and introduce new grounding stability experiments.
…encies Unit tests and E2E test that reproduce the bug where sync_determine_operation only hashes the raw .prompt file, missing changes to <include>d dependencies. Fixes #522
…er included file changes Fixes #522
Reverts 53a9caa6 and 38d3ab33. The calculate_prompt_hash() fix is correct at the fingerprint-calculation layer but incomplete end-to-end: pdd sync's insert-includes step strips <include> tags from the original .prompt file, so subsequent syncs cannot detect include dependency changes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
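For context, the fingerprint-layer idea that was reverted (hashing a prompt together with its `<include>`d files) might be sketched as follows. The function name, the regex, and the choice to resolve include paths relative to the prompt's directory are assumptions; as the revert message notes, this alone does not help once the include tags have been stripped from the prompt file.

```python
import hashlib
import re
from pathlib import Path

# Assumed tag form: <include>relative/path</include>
INCLUDE_RE = re.compile(r"<include>(.*?)</include>")


def prompt_fingerprint(prompt_path: Path) -> str:
    """Hash the prompt text plus the bytes of every included file that exists."""
    text = prompt_path.read_text(encoding="utf-8")
    digest = hashlib.sha256(text.encode("utf-8"))
    # Sort so the hash does not depend on the order tags appear in.
    for rel in sorted(set(INCLUDE_RE.findall(text))):
        dep = prompt_path.parent / rel.strip()
        if dep.is_file():
            digest.update(dep.read_bytes())
    return digest.hexdigest()
```

With this scheme, editing an included file changes the fingerprint even when the `.prompt` file itself is untouched, which is the behavior issue #522 asks for.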
…cle detection) Fixes #521
…o-deps)
- Skip auto-deps in agentic mode (prompts already have explicit dependencies)
- Add 30s client-side timeout to Firecrawl scrape_url via ThreadPoolExecutor (works around SDK bug where timeout ms is passed to requests as seconds)
- Update Firecrawl method from scrape() to scrape_url() for current SDK
- Add 30s timeout to git ls-files subprocess call
- Add label parameter to _run_with_provider for future heartbeat support
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
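A client-side timeout via `ThreadPoolExecutor`, as mentioned in the second bullet, can be sketched generically like this. The helper `call_with_timeout` and the `slow_call` stand-in are hypothetical; no Firecrawl API is used here.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def call_with_timeout(fn, timeout_s, *args, **kwargs):
    """Run fn in a worker thread; return None if it exceeds timeout_s seconds."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(fn, *args, **kwargs)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        return None
    finally:
        # wait=False so the caller is not blocked by the still-running worker.
        executor.shutdown(wait=False)


def slow_call():
    # Stand-in for a blocking network request.
    time.sleep(0.3)
    return "late"


fast = call_with_timeout(lambda: 42, 1.0)
timed_out = call_with_timeout(slow_call, 0.05)
```

Note that the worker thread is not cancelled; it keeps running in the background until the blocking call returns, so this bounds how long the caller waits rather than how long the request runs.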
… auth When Claude Code runs on subscription (not API key), total_cost_usd is absent from JSON output. Add _calculate_anthropic_cost() with three-tier fallback: (1) modelUsage per-model costUSD, (2) token-based estimation from usage field with model-family-aware pricing (Opus/Sonnet/Haiku), (3) zero. This matches the existing pattern used for Gemini and Codex providers which always estimate from token counts. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
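The three-tier fallback described above could be sketched as below. The field names `total_cost_usd`, `modelUsage`, `costUSD`, and `usage` come from the commit message; the function name, the token field names, and especially the price table are placeholders chosen for illustration, not real pricing or the project's `_calculate_anthropic_cost()`.

```python
# Placeholder USD prices per million (input, output) tokens. NOT real pricing.
PLACEHOLDER_PRICES = {
    "opus": (10.0, 50.0),
    "sonnet": (2.0, 10.0),
    "haiku": (0.5, 2.5),
}


def estimate_cost(data: dict, model: str = "") -> float:
    """Return a cost in USD using the first tier that yields a value."""
    if "total_cost_usd" in data:
        return float(data["total_cost_usd"])

    # Tier 1: sum per-model costUSD entries under modelUsage.
    model_usage = data.get("modelUsage") or {}
    costs = [
        u["costUSD"]
        for u in model_usage.values()
        if isinstance(u, dict) and u.get("costUSD") is not None
    ]
    if costs:
        return float(sum(costs))

    # Tier 2: estimate from token counts with model-family pricing.
    usage = data.get("usage") or {}
    family = next((f for f in PLACEHOLDER_PRICES if f in model.lower()), None)
    if usage and family:
        in_price, out_price = PLACEHOLDER_PRICES[family]
        tokens_in = usage.get("input_tokens", 0)    # assumed field name
        tokens_out = usage.get("output_tokens", 0)  # assumed field name
        return (tokens_in * in_price + tokens_out * out_price) / 1_000_000

    # Tier 3: nothing usable was reported.
    return 0.0
```

Ordering the tiers from most to least authoritative means an estimate is only used when the provider did not report a cost itself.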
…it tests for issue 509.
…and Vertex AI ADC, various bug fixes, build improvements, and grounding experiment documentation.
- Fix result[-2] indexing bug in sync_orchestration.py and pin_example_hack.py that caused $0.00 cost for agentic test generation. For 4-tuple returns (content, cost, model, success), result[-2] gave model (str) not cost (float). Changed to result[1] which is always the cost index.
- Increase MODULE_TIMEOUT from 900s (15 min) to 1800s (30 min) in agentic_sync_runner.py. Complex modules (e.g. TypeScriptReact with <web> tags) need generate+crash+verify+test which can take 20+ min total.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
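The indexing bug in the first bullet is easy to reproduce with plain tuples. The values below are made up; only the tuple layouts (content, cost, model) and (content, cost, model, success) come from the commit messages.

```python
# 3-tuple layout: (content, cost, model)
three = ("test code", 0.42, "model-a")
# 4-tuple layout: (content, cost, model, success)
four = ("test code", 0.42, "model-a", True)

# Negative indexing counts from the end, so it shifts when a field is appended.
cost_from_three = three[-2]   # the cost, as intended
wrong_from_four = four[-2]    # the model name, not the cost

# Positive index 1 points at the cost in both layouts.
cost_three_fixed = three[1]
cost_four_fixed = four[1]
```

A named tuple or dataclass would avoid positional coupling altogether, though the commit describes only the index change.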
- Remove unused signal/threading imports from agentic_common.py
- Guard against UnboundLocalError in _save_state if mkstemp fails
- Clean up temp cost_file on Popen failure in _sync_one_module
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…state desynchronization (Issue #159), update LLM invocation logic and prompts, and add new grounding experiment results.
…e Claude model, and migrate pytest configuration to pyproject.toml.
Introduces user story tests as a first-class PDD feature:
- `pdd/user_story_tests.py`: core validation logic: discover story files, run detect_change against each story, and apply fixes via change_main
- `pdd detect --stories`: new mode that validates all user_stories/story__*.md files against current prompts (pass = no changes needed)
- `pdd fix user_stories/story__<name>.md`: auto-detects affected prompts, applies changes, then re-validates
- `pdd change`: auto-validates user stories post-change before finalizing, respecting `skip_user_stories` context flag to prevent recursion
- `user_stories/story__template.md`: standard story template
- Full test coverage (89 tests across 4 test files)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Set mock_proc.pid = 99999 in _make_mock_popen so that the timeout test calls os.killpg(99999, SIGTERM) instead of resolving MagicMock.__index__() to 1, which was sending SIGTERM to process group 1 and killing the entire pytest-xdist worker mid-run, causing CI to fail at 26% every time. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
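The behavior behind this fix can be observed directly: a `MagicMock` attribute used where an integer is expected converts to 1 through the default `__index__`. This small sketch uses `operator.index` as a safe stand-in and does not call `os.killpg`.

```python
import operator
from unittest.mock import MagicMock

mock_proc = MagicMock()

# An unset attribute is itself a MagicMock, whose default __index__ returns 1,
# so integer-taking APIs would treat it as process group 1.
default_pid = operator.index(mock_proc.pid)

# Assigning a concrete integer removes the ambiguity.
mock_proc.pid = 99999
explicit_pid = operator.index(mock_proc.pid)
```

Setting explicit values on any mock attribute that reaches an OS-level call is a cheap guard against this kind of accidental signal.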
Replace claude-sonnet-4-5 entries (both Vertex AI and Anthropic) with claude-sonnet-4-6 in the canonical data file shipped with the package. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…dates for `sync_orchestration` to fix state desynchronization.
…hestration runs All 5 runs used vertex_ai/claude-opus-4-6 (context-1m-2025-08-07 beta working), achieving 100% test pass rate (108/108) and ref_sim=0.823 ± 0.031. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… models to LLM configurations.
…architecture improvements (#482)
* feat: Add iterative fix-verify loop (steps 3-7) to checkup orchestrator
  Steps 3-7 (build, interfaces, test, fix, verify) now run in a while loop (max 3 iterations) instead of once. If step 7 finds remaining failures, the workflow loops back to re-check/fix until "All Issues Fixed" or max iterations. Worktree is created before the loop; step 8 runs after. Prompts updated with iteration awareness, previous_fixes context, e2e test instructions, and "All Issues Fixed" exit signal. 59 tests (12 new).
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: Copy uncommitted/untracked files into worktree on creation
  The worktree is created from HEAD, which only contains committed files. If the user has uncommitted or untracked files (e.g. new CRM modules), the worktree would be missing them, causing steps 3-7 to see different files than steps 1-2 analyzed.
  Now _setup_worktree calls _copy_uncommitted_changes which:
  1. Applies uncommitted tracked changes via git diff HEAD | git apply
  2. Copies untracked files (excluding .pdd/) into the worktree
  Both operations are best-effort; failures are logged but don't block. Added 7 tests for the new behavior. Reverted prompt workaround.
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: Split step 6 into sub-steps, fix resume bugs, and fix iteration display
  - Split monolithic step 6 into 6.1 (fix), 6.2 (regression tests), 6.3 (e2e tests) with separate prompts and 600s timeouts each
  - Bug A fix: save worktree state immediately after creation so Ctrl+C during step 3 can resume without recreating
  - Bug B fix: detect between-iterations resume (start_step > 7 with fix_verify_iteration > 0) and restart at step 3 with incremented iteration
  - Fix iteration number always showing "1" in GitHub comments: add iteration suffix to steps 3-5 comment headers, add explicit instruction to all loop step prompts to use exact iteration number
  - Fix total step count: "X of 7" -> "X of 8" across all prompts
  - Add STEP_ORDER constant and _next_step() helper for fractional step arithmetic
  - Add checkup command, agentic_checkup module, and comprehensive tests (108 total)
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: Add frontend integration checks to checkup step 4 and step 6.1
  Step 4 (Interface Check) now audits:
  - Frontend navigation reachability: detect orphan pages with no nav link
  - Frontend→Backend API call consistency: detect pages using different URL patterns than the rest of the codebase (e.g. relative vs base URL)
  Step 6.1 (Fix) now handles:
  - Adding missing nav links for orphan pages
  - Updating inconsistent API call patterns to match codebase convention
  Found via QA on the CRM app where the page existed but had no sidebar link and used relative `/adminCrmActions` instead of the standard `${NEXT_PUBLIC_API_BASE_URL}/adminCrmActions` pattern.
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: Increase step 7 (verify) timeout from 600s to 1200s
  Step 7 re-runs the full test suite to verify fixes, which can exceed 10 minutes on larger projects (e.g. 4600+ tests). The 600s timeout caused step 7 to time out after posting its GitHub comment but before returning, leaving state stuck at step 6.3 and causing infinite resume loops.
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: Improve architecture generation with Strategy B support, register checkup command, and add gh timeout
  - Add Strategy B (template-based group contexts) support to arch steps 7, 8, 10, 12
  - Add pdd_path_construction_guide Strategy B documentation
  - Add example_python_backend.prompt template
  - Register checkup command in CLI
  - Add timeout parameter to _run_gh_command()
  - Update test durations
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: Add PDD prompts, examples, and architecture entries for checkup modules
  - Add agentic_checkup_python.prompt and agentic_checkup_orchestrator_python.prompt
  - Add context/agentic_checkup_example.py and context/agentic_checkup_orchestrator_example.py
  - Register checkup and orchestrator in architecture.json (priority 217, 218)
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs: Add README documentation for pdd checkup and pdd sync URL mode
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* ci: trigger Cloud Build CI
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Greg Tanaka <glt@alumni.caltech.edu>
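The fractional step arithmetic mentioned in the squashed commit (a `STEP_ORDER` constant with a `_next_step()` helper) could be approached as in this sketch. The list contents are inferred from the step numbers named in the message (1-8, with 6 split into 6.1, 6.2, 6.3), and the helper here is a hypothetical stand-in rather than the project's code.

```python
# Assumed ordering, inferred from the steps named in the commit message.
STEP_ORDER = [1, 2, 3, 4, 5, 6.1, 6.2, 6.3, 7, 8]


def next_step(current):
    """Return the step that follows `current`, or None after the last step."""
    idx = STEP_ORDER.index(current)  # raises ValueError for unknown steps
    if idx + 1 < len(STEP_ORDER):
        return STEP_ORDER[idx + 1]
    return None


after_five = next_step(5)
after_last_substep = next_step(6.3)
after_final = next_step(8)
```

Looking steps up in an ordered list avoids arithmetic such as `step + 0.1`, which would drift with floating-point error and cannot express the jump from 6.3 to 7.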
Align the replay branch's final file state with main so the PR shows zero diff. The branch preserves the full commit history while ending at the same tree as main.
Copilot wasn't able to review any files in this pull request.
## Summary
- … `shared` section of `.sync-config.yml`
## What's included
- `pdd/*.py`, `pdd/commands/*.py`, `pdd/core/*.py`, `pdd/server/**/*.py`
- `tests/**/*.py`
- `prompts/*_LLM.prompt` (rewritten to `pdd/prompts/*_LLM.prompt`)
- `examples/`, `context/**/*_example.py`
- `docs/`, `utils/vscode_prompt/`
## What's excluded
- `architecture.json`
- `pdd/prompts/*_python.prompt` (cap-only)
- `.github/workflows/` (not synced)
## Verification
`git log --format="%an <%ae> %ad %s" | head -20`
🤖 Generated with Claude Code