Replay 2906 commits from private repo history #549

Open
jiaminc-cmu wants to merge 2907 commits into main from replay-commit-history

@jiaminc-cmu
Collaborator

Summary

  • Replayed 2,906 filtered commits from the private repo onto this branch
  • Only includes files matching the shared section of .sync-config.yml
  • Original author, date, and commit messages are preserved
  • Private-only files (architecture.json, _python.prompt in pdd/prompts/, etc.) were excluded

What's included

  • pdd/*.py, pdd/commands/*.py, pdd/core/*.py, pdd/server/**/*.py
  • tests/**/*.py
  • prompts/*_LLM.prompt (rewritten to pdd/prompts/*_LLM.prompt)
  • examples/, context/**/*_example.py
  • Root configs (README, requirements.txt, pyproject.toml, Makefile, etc.)
  • docs/, utils/vscode_prompt/

What's excluded

  • architecture.json
  • pdd/prompts/*_python.prompt (cap-only)
  • .github/workflows/ (not synced)
  • All non-shared internal files

Verification

  • Leak check passed — no private files introduced by replay
  • Review final file state matches expected public content
  • Verify commit authorship: git log --format="%an <%ae> %ad %s" | head -20

🤖 Generated with Claude Code

gltanaka and others added 30 commits January 26, 2026 12:32
- Fix format string injection: Escape curly braces in LLM outputs before
  storing in context to prevent KeyError when subsequent prompts contain
  {placeholders} from code/error analysis
- Fix silent error: Print KeyError messages to console before returning
- Fix resume message: Calculate actual start step (5.5) before displaying
  resume message instead of showing incorrect "step 6"

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
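The brace-escaping fix in the commit above can be illustrated with a minimal sketch. It assumes that an earlier LLM output gets embedded into a later template which is then passed through `str.format()`. The helper name `escape_braces` and the sample strings are hypothetical, not taken from the repository.

```python
def escape_braces(text: str) -> str:
    """Double every brace so str.format() treats it as a literal character."""
    return text.replace("{", "{{").replace("}", "}}")


# Hypothetical LLM output that happens to contain a placeholder-like token.
llm_output = "The handler fails on {user_id} lookups"

# A later prompt template embeds the earlier output, then is formatted
# with its own variables.
unsafe_template = "Previous analysis: " + llm_output + "\nIssue: {issue}"
safe_template = "Previous analysis: " + escape_braces(llm_output) + "\nIssue: {issue}"

try:
    unsafe_template.format(issue="#1")
    unsafe_failed = False
except KeyError:
    # str.format() looked for a 'user_id' argument that was never supplied.
    unsafe_failed = True

# The escaped version renders, and the braces come back as single literals.
rendered = safe_template.format(issue="#1")
```

Escaping at the point where the output is stored, rather than at each formatting call, means every later template sees text that is already safe.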
Add 5 new tests covering:
- Format string injection: Verify curly braces in LLM outputs don't cause KeyError
- Restored context escaping: Verify curly braces in resumed state are escaped
- Error console output: Verify KeyError messages are printed to console
- Resume message for step 5.5: Verify correct step shown when resuming after step 5
- Resume message for step 6: Verify correct step shown when resuming after step 5.5

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Cherry-picked changes from PR #267:
- Add optional interactive steering to sync command
- Fix sync animation for horizontal terminal resizes
- Add --no-steer and --steer-timeout options
- Add sync_tui tests and example

Note: Excluded pdd/prompts/* (symlink) and project_dependencies.csv

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
## Non-Python Sync Fixes

- Skip test_extend for non-Python languages since code coverage tooling is Python-specific
- Fix sync returning success without generating tests for non-Python modules
  - Added check for test file existence before accepting workflow as complete
  - The synthetic RunReport from crash/verify was incorrectly triggering "all_synced"
- Add safety checks in sync_orchestration.py and pin_example_hack.py

## Frontend File Detection Fixes

- Support new .pddrc `outputs.code.path` format (Issue #237)
  - Previously only looked for legacy `generate_output_path`
  - Now checks `outputs.code.path`, `outputs.test.path`, `outputs.example.path` first
- Add .test.ts/.spec.ts patterns for Jest/TypeScript test file detection
  - Fixes detection of files like `test_prisma_schema.test.ts`
- Rebuild frontend with updated file detection logic

## Architecture Generation Fixes

- Add valid language suffixes guidance to prevent LLM using invalid suffixes like `_NextJS`
- Escape curly braces in architecture_json.prompt template to prevent .format() errors
- Add preprocessing in orchestrator before .format() calls

## Path Resolution Fixes

- Add `typescriptreact` -> `.tsx`, `javascriptreact` -> `.jsx`, `prisma` -> `.prisma` mappings
- Ensure example and test paths always have fallback defaults

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
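The `.pddrc` lookup order described under "Frontend File Detection Fixes" might look roughly like the sketch below. The function name `resolve_output_path` and the shape of the `context` dictionary are assumptions made for illustration. Only the key names `outputs.code.path`, `outputs.test.path`, `outputs.example.path`, and `generate_output_path` come from the commit message.

```python
from typing import Any, Optional


def resolve_output_path(context: dict[str, Any], kind: str = "code") -> Optional[str]:
    """Prefer the new outputs.<kind>.path key, then fall back to the legacy key.

    `kind` is one of "code", "test", or "example".
    """
    outputs = context.get("outputs") or {}
    entry = outputs.get(kind) or {}
    path = entry.get("path") if isinstance(entry, dict) else None
    if path:
        return path
    # Assumption: only the code path has a legacy key named in the commit.
    if kind == "code":
        return context.get("generate_output_path")
    return None


new_style = {"outputs": {"code": {"path": "src/app.ts"}, "test": {"path": "src/app.test.ts"}}}
legacy_style = {"generate_output_path": "pdd/app.py"}
```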
## Workflow Changes

Reorganized the agentic architecture workflow for better modularity:
- Step 1-6: Unchanged (analyze PRD, decompose, research, design, deps, generate)
- Step 7: NEW - Generate .pddrc configuration file
- Step 8: Renamed from step 7 - Generate individual prompts
- Step 9: Renamed from step 8 - Completeness validation
- Step 10: Renamed from step 9 - Sync prompts with architecture
- Step 11: NEW - Dependency resolution
- Step 12: Renamed from step 11 - Fix validation errors

## New Files

- `prompts/agentic_arch_step7_pddrc_LLM.prompt` - .pddrc generation step
- `prompts/agentic_arch_step11_deps_LLM.prompt` - Dependency resolution step
- `prompts/agentic_arch_step12_fix_LLM.prompt` - Enhanced fix step
- `pdd/templates/architecture/example_nextjs_task_notes.prompt` - Example for Next.js projects
- `pdd/templates/architecture/pdd_path_construction_guide.prompt` - Path construction reference

## Template Fixes

- Escape curly braces in docs/prompting_guide.md to prevent .format() errors
- Change {PLACEHOLDER} to [PLACEHOLDER] in generate_prompt.prompt to avoid confusion
- Updated step count references in step 1 and 2 prompts (8 -> 11)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
## Task Queue Panel Improvements

- Make task queue panel draggable by adding drag handle
- Save/restore panel position to localStorage
- Add reset position button to return to default (top-right corner)
- Keep panel within viewport bounds on window resize
- Separate collapse toggle from drag handle for better UX

## Generate Command

- Add --skip-prompts flag to skip prompt generation in agentic architecture mode
- Prompts are generated by default; flag allows skipping when not needed

## Logging

- Change generate_output_paths logging from INFO to DEBUG level
- Reduces noise since paths may be overridden by outputs.code.path config

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- test_agentic_architecture_orchestrator: Update all tests to reflect
  the new 11-step workflow (steps 1-8 linear + steps 9-11 validation)
- test_sync_determine_operation: Fix test_decision_test_on_low_coverage
  to create actual test file (required after test file existence check)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Update README and TUTORIALS.md to reflect the new 11-step agentic
architecture workflow:
- Steps 1-8: Analysis & generation (architecture.json, .pddrc, prompts)
- Steps 9-11: Validation (completeness, sync, dependencies)

Also document the new --skip-prompts option for faster architecture-only
generation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- test_commands_generate.py: Add skip_prompts=False to expected call
- code_generator_main.py: Handle both {{PLACEHOLDER}} (YAML-escaped)
  and {PLACEHOLDER} (single brace) in post_process_args substitution

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
This reverts commit 351813a22a247312e74cefb85f8ed2a4334ad218.
…ck.acquire()

This commit adds comprehensive unit and E2E tests that detect the file handle
resource leak in SyncLock.acquire() when non-IOError/OSError exceptions occur.

Tests added:
- Unit tests in test_sync_determine_operation.py (6 tests)
  - KeyboardInterrupt during lock acquisition
  - RuntimeError during lock acquisition
  - Exception during file operations
  - IOError/OSError regression tests
  - Normal operation regression tests
  - Context manager exception handling

- E2E tests in test_e2e_issue_403_file_handle_leak.py (4 tests)
  - Real-world KeyboardInterrupt scenario (Ctrl+C)
  - RuntimeError leak detection
  - Normal operation verification
  - Context manager interrupt handling

All tests correctly fail on current code, detecting the bug where file
descriptors remain open when exceptions other than IOError/OSError occur
during lock acquisition.

Related to #403

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
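The leak these tests target can be sketched as follows. This is not the real `SyncLock`. `SketchLock` and its injectable `lock_fn` are hypothetical stand-ins, used so the failure can be simulated without platform-specific locking calls. The point is that the cleanup path catches `BaseException`, so `KeyboardInterrupt` and `RuntimeError` also close the handle.

```python
import os
import tempfile
from typing import IO, Callable, Optional


class SketchLock:
    """Hypothetical lock wrapper; lock_fn stands in for the real locking call."""

    def __init__(self, path: str, lock_fn: Callable[[IO[str]], None]):
        self.path = path
        self.lock_fn = lock_fn
        self.handle: Optional[IO[str]] = None

    def acquire(self) -> None:
        handle = open(self.path, "w")
        try:
            self.lock_fn(handle)
        except BaseException:
            # Close on *any* failure, not only IOError/OSError, then re-raise.
            handle.close()
            raise
        self.handle = handle

    def release(self) -> None:
        if self.handle is not None:
            self.handle.close()
            self.handle = None


seen_handles = []


def failing_lock(handle: IO[str]) -> None:
    seen_handles.append(handle)
    raise RuntimeError("simulated failure during lock acquisition")


lock = SketchLock(os.path.join(tempfile.mkdtemp(), "demo.lock"), failing_lock)
try:
    lock.acquire()
except RuntimeError:
    pass
```

After the failed `acquire()`, the handle captured in `seen_handles` is closed and the lock object holds no reference to it.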
- Revert unnecessary code change to handle double braces in code_generator_main.py
- Update test_code_generator_main.py to normalize {{PLACEHOLDER}} to {PLACEHOLDER}
  when reading the template for testing purposes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Fix mock_httpx_client fixture to properly mock async context manager
  using AsyncMock for __aenter__ and __aexit__
- Fix test_first_heartbeat_sent_immediately to avoid orphan coroutines
  by using side_effect instead of reassigning the mock
- Fix test_heartbeat_refreshes_token_on_401 to use side_effect pattern
- Fix test_heartbeat_only_refreshes_once_per_cycle to use return_value

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Add explanation that core_dump files are created when PDD runs crash
or hit internal errors, per Copilot review suggestion.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…h expanded agentic architecture, various bug fixes, and refactors.
…gration

This update improves the verbose logging setup by allowing the LiteLLM library to toggle its debug output based on the verbose flag or environment variable. It ensures that logging levels are appropriately set for production and development modes, and adds error handling for potential attribute access issues in LiteLLM.
…n SyncLock.acquire()"

This reverts commit 400e8ea02e6d20877f41232f568dea3183aadab2.
- Add batch detection using Union-Find algorithm to group modules by dependency
- Each batch is a connected component in the dependency graph
- Modules within a batch sync sequentially (by priority), different batches are independent
- Add BatchFilterDropdown component with expandable module list view
- Add batch color stripe indicator on graph nodes
- Add SyncOptionsModal for configuring sync options before execution
- Various UI improvements to PromptSelector, PromptSpace, and constants

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
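The batch detection described above (one batch per connected component) can be sketched with a small Union-Find. The function name `find_batches`, the input shape, and the module names are illustrative assumptions. The repository's implementation may differ.

```python
from collections import defaultdict


def find_batches(dependencies: dict[str, list[str]]) -> list[list[str]]:
    """Group modules into connected components of the dependency graph."""
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for module, deps in dependencies.items():
        find(module)  # register modules that have no dependencies
        for dep in deps:
            union(module, dep)

    groups: dict[str, list[str]] = defaultdict(list)
    for module in parent:
        groups[find(module)].append(module)
    return sorted(sorted(members) for members in groups.values())


batches = find_batches(
    {"api": ["models"], "models": [], "ui": ["theme"], "theme": [], "cli": []}
)
# batches == [["api", "models"], ["cli"], ["theme", "ui"]]
```

Edge direction is ignored on purpose: two modules belong to the same batch whenever any chain of dependencies links them.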
- Add `agentic_mode` parameter to sync_orchestration for Python agentic path
- Change cmd_test_main return type from 3-tuple to 4-tuple with agentic_success flag
- Add run_agentic_test_generate 4-tuple return with success boolean
- Add _use_agentic_path() helper to determine agentic behavior
- Add _create_synthetic_run_report_for_agentic_success() for non-Python languages
- Fix sync_determine_operation to differentiate synthetic vs real run reports using test_hash
- Use sentinel value "agentic_test_success" when agent succeeds but file is at different path
- Trust agentic_success flag for non-Python test generation instead of file existence check
- Update prompts and examples to reflect API changes

Fixes issue where sync reported failure despite successful agentic test generation
for non-Python languages (CSS, TypeScript, etc.) where test files may be created
at different paths or with different extensions than expected.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…upport

Add agentic_mode parameter throughout the sync workflow:
- commands/maintenance.py: Add --agentic CLI flag to sync command
- sync_main.py: Pass agentic_mode parameter to sync_orchestration
- prompts/sync_main_python.prompt: Document agentic_mode parameter

Update agentic_test_generate return signature:
- prompts/agentic_test_generate_python.prompt: Update return type from
  tuple[str, float, str] to tuple[str, float, str, bool] to include
  success boolean, matching actual implementation

Fix cloud timeout handling:
- fix_verification_errors_loop.py: Use get_cloud_timeout() function
  instead of hardcoded CLOUD_REQUEST_TIMEOUT constant

Improve server job failure detection:
- server/jobs.py: Check stdout for sync failure indicators even when
  exit code is 0, since sync may return 0 but report failure in output

Extend language extension mappings:
- server/routes/files.py: Add HTML, CSS, and Makefile extensions;
  include "Dockerfile" without extension prefix

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
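The server-side failure detection above (exit code 0 but failure reported in stdout) could be sketched like this. The marker strings and the function name `job_succeeded` are hypothetical. The commit message does not list the actual indicators checked in `server/jobs.py`.

```python
# Hypothetical failure indicators; the real list is not given in the commit.
FAILURE_MARKERS = ("sync failed", "overall status: failed")


def job_succeeded(exit_code: int, stdout: str) -> bool:
    """Treat a job as failed if it exits non-zero OR its output reports failure."""
    if exit_code != 0:
        return False
    lowered = stdout.lower()
    return not any(marker in lowered for marker in FAILURE_MARKERS)
```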
Update test_sync_dry_run_mode to expect the new agentic_mode=False
parameter in sync_orchestration calls.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…er for the 3blue1brown demo, along with related dependency and changelog updates.
This commit adds comprehensive test coverage for the bug where commits
created by LLM agents during Step 1 of the agentic E2E fix workflow are
not pushed to the remote repository when the workflow exits early at Step 2.

Test files:
- tests/test_e2e_issue_419_unpushed_commits.py: Unit tests for _commit_and_push()
- tests/test_e2e_issue_419_cli_unpushed_commits.py: E2E integration test

The tests verify the expected behavior documented in CHANGELOG v0.0.121:
"pdd fix now automatically commits and pushes changes after successful completion"

These tests fail on the current buggy code and will pass once the fix
is implemented in pdd/agentic_e2e_fix_orchestrator.py lines 237-238.

Related to #419
Serhan-Asad and others added 28 commits February 15, 2026 18:13
result[-2] on a 4-tuple returns model name string instead of cost float,
silently dropping test/test_extend costs.

Fixes #508

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix generate test to use correct 4-tuple (content, was_incremental, cost, model)
- Update E2E mock for code_generator_main to return 4-tuple
- Improve comment accuracy for tuple format documentation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…antic schema validation, and introduce new grounding stability experiments.
…encies

Unit tests and E2E test that reproduce the bug where sync_determine_operation
only hashes the raw .prompt file, missing changes to <include>d dependencies.

Fixes #522
Reverts 53a9caa6 and 38d3ab33. The calculate_prompt_hash() fix is
correct at the fingerprint-calculation layer but incomplete end-to-end:
pdd sync's insert-includes step strips <include> tags from the original
.prompt file, so subsequent syncs cannot detect include dependency
changes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…o-deps)

- Skip auto-deps in agentic mode (prompts already have explicit dependencies)
- Add 30s client-side timeout to Firecrawl scrape_url via ThreadPoolExecutor
  (works around SDK bug where timeout ms is passed to requests as seconds)
- Update Firecrawl method from scrape() to scrape_url() for current SDK
- Add 30s timeout to git ls-files subprocess call
- Add label parameter to _run_with_provider for future heartbeat support

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
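A client-side timeout of the kind described above can be built with `concurrent.futures`. This is a generic sketch under assumptions: `call_with_timeout` is a hypothetical helper, and a slow stand-in function replaces the Firecrawl call, so no SDK signature is assumed.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def call_with_timeout(fn, timeout_s: float, *args, **kwargs):
    """Run fn in a worker thread and stop waiting after timeout_s seconds."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(fn, *args, **kwargs)
    try:
        return future.result(timeout=timeout_s)
    finally:
        # wait=False: do not block on a call that has already timed out.
        executor.shutdown(wait=False)


def slow_scrape(url: str) -> str:
    """Stand-in for a network call that takes too long."""
    time.sleep(0.5)
    return "content of " + url


fast_result = call_with_timeout(lambda: "ok", 1.0)

try:
    call_with_timeout(slow_scrape, 0.05, "https://example.com")
    timed_out = False
except FutureTimeout:
    timed_out = True
```

The worker thread is not killed on timeout. The caller simply stops waiting, so the underlying call should still have its own bound where possible.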
… auth

When Claude Code runs on subscription (not API key), total_cost_usd is
absent from JSON output. Add _calculate_anthropic_cost() with three-tier
fallback: (1) modelUsage per-model costUSD, (2) token-based estimation
from usage field with model-family-aware pricing (Opus/Sonnet/Haiku),
(3) zero. This matches the existing pattern used for Gemini and Codex
providers which always estimate from token counts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
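The three-tier fallback above can be sketched as follows. The field names `total_cost_usd`, `modelUsage`, `costUSD`, and `usage` come from the commit message. The token field names, the function name `estimate_cost`, and especially the rates are assumptions: the rates are made-up placeholders, not real pricing.

```python
from typing import Any

# Made-up USD rates per million tokens (input, output). Placeholders only.
ASSUMED_RATES = {"opus": (10.0, 50.0), "sonnet": (2.0, 10.0), "haiku": (1.0, 4.0)}


def estimate_cost(data: dict[str, Any], model: str = "sonnet") -> float:
    """Return a cost in USD, falling back tier by tier when fields are missing."""
    if "total_cost_usd" in data:
        return float(data["total_cost_usd"])

    # Tier 1: sum the per-model costUSD entries under modelUsage.
    model_usage = data.get("modelUsage") or {}
    per_model = [
        entry["costUSD"]
        for entry in model_usage.values()
        if isinstance(entry, dict) and "costUSD" in entry
    ]
    if per_model:
        return float(sum(per_model))

    # Tier 2: estimate from token counts with model-family-aware rates.
    usage = data.get("usage") or {}
    if usage:
        family = next((name for name in ASSUMED_RATES if name in model.lower()), "sonnet")
        in_rate, out_rate = ASSUMED_RATES[family]
        tokens_in = usage.get("input_tokens", 0)    # assumed field name
        tokens_out = usage.get("output_tokens", 0)  # assumed field name
        return (tokens_in * in_rate + tokens_out * out_rate) / 1_000_000

    # Tier 3: nothing usable was reported.
    return 0.0
```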
…and Vertex AI ADC, various bug fixes, build improvements, and grounding experiment documentation.
- Fix result[-2] indexing bug in sync_orchestration.py and pin_example_hack.py
  that caused $0.00 cost for agentic test generation. For 4-tuple returns
  (content, cost, model, success), result[-2] gave model (str) not cost (float).
  Changed to result[1] which is always the cost index.

- Increase MODULE_TIMEOUT from 900s (15 min) to 1800s (30 min) in
  agentic_sync_runner.py. Complex modules (e.g. TypeScriptReact with <web> tags)
  need generate+crash+verify+test which can take 20+ min total.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
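The indexing bug above is easy to reproduce in isolation. The tuples below are illustrative values in the `(content, cost, model, success)` layout named in the commit message. `extract_cost` is a hypothetical helper.

```python
# 3-tuple layout: (content, cost, model)
three = ("test code", 0.12, "model-a")
# 4-tuple layout: (content, cost, model, success)
four = ("test code", 0.12, "model-a", True)

# Negative indexing counts from the end, so it shifts when a field is appended.
cost_from_three = three[-2]  # 0.12, the cost
wrong_from_four = four[-2]   # "model-a", the model name, not the cost


def extract_cost(result: tuple) -> float:
    """Index from the front: cost sits at position 1 in both layouts."""
    return result[1]
```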
- Remove unused signal/threading imports from agentic_common.py
- Guard against UnboundLocalError in _save_state if mkstemp fails
- Clean up temp cost_file on Popen failure in _sync_one_module

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…state desynchronization (Issue #159), update LLM invocation logic and prompts, and add new grounding experiment results.
…e Claude model, and migrate pytest configuration to pyproject.toml.
Introduces user story tests as a first-class PDD feature:
- `pdd/user_story_tests.py`: core validation logic — discover story files,
  run detect_change against each story, and apply fixes via change_main
- `pdd detect --stories`: new mode that validates all user_stories/story__*.md
  files against current prompts (pass = no changes needed)
- `pdd fix user_stories/story__<name>.md`: auto-detects affected prompts,
  applies changes, then re-validates
- `pdd change`: auto-validates user stories post-change before finalizing,
  respecting `skip_user_stories` context flag to prevent recursion
- `user_stories/story__template.md`: standard story template
- Full test coverage (89 tests across 4 test files)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Set mock_proc.pid = 99999 in _make_mock_popen so that the timeout test
calls os.killpg(99999, SIGTERM) instead of resolving MagicMock.__index__()
to 1, which was sending SIGTERM to process group 1 and killing the entire
pytest-xdist worker mid-run, causing CI to fail at 26% every time.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
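The integer coercion behind this failure can be shown without sending any signal. `MagicMock` configures `__index__` to return 1 by default, so an unset `pid` attribute coerces to 1. The sketch uses `operator.index` as a stand-in for the conversion that, per the commit message, happens inside `os.killpg`.

```python
import operator
from unittest.mock import MagicMock

mock_proc = MagicMock()

# An unset attribute on a MagicMock is itself a MagicMock, and its default
# __index__ returns 1, so it silently passes as the integer 1.
unset_pid = operator.index(mock_proc.pid)

# Pinning the pid to a concrete int removes the ambiguity.
mock_proc.pid = 99999
fixed_pid = operator.index(mock_proc.pid)
```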
Replace claude-sonnet-4-5 entries (both Vertex AI and Anthropic) with
claude-sonnet-4-6 in the canonical data file shipped with the package.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…dates for `sync_orchestration` to fix state desynchronization.
…hestration runs

All 5 runs used vertex_ai/claude-opus-4-6 (context-1m-2025-08-07 beta working),
achieving 100% test pass rate (108/108) and ref_sim=0.823 ± 0.031.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…architecture improvements (#482)

* feat: Add iterative fix-verify loop (steps 3-7) to checkup orchestrator

Steps 3-7 (build, interfaces, test, fix, verify) now run in a while loop
(max 3 iterations) instead of once. If step 7 finds remaining failures,
the workflow loops back to re-check/fix until "All Issues Fixed" or max
iterations. Worktree is created before the loop; step 8 runs after.

Prompts updated with iteration awareness, previous_fixes context, e2e test
instructions, and "All Issues Fixed" exit signal. 59 tests (12 new).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
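The bounded fix-verify loop described above can be sketched as below. `run_fix_verify_loop` and the `run_steps_3_to_7` callback are hypothetical. Only the three-iteration cap and the "All Issues Fixed" exit signal come from the commit message.

```python
from typing import Callable

MAX_ITERATIONS = 3


def run_fix_verify_loop(
    run_steps_3_to_7: Callable[[int], str],
    max_iterations: int = MAX_ITERATIONS,
) -> tuple[int, bool]:
    """Repeat steps 3-7 until the verify output signals success or the cap is hit.

    Returns (iterations_used, all_fixed).
    """
    for iteration in range(1, max_iterations + 1):
        verify_output = run_steps_3_to_7(iteration)
        if "All Issues Fixed" in verify_output:
            return iteration, True
    return max_iterations, False


def fake_steps(iteration: int) -> str:
    """Stand-in that only succeeds on the second pass."""
    return "All Issues Fixed" if iteration >= 2 else "2 failures remain"
```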

* fix: Copy uncommitted/untracked files into worktree on creation

The worktree is created from HEAD, which only contains committed files.
If the user has uncommitted or untracked files (e.g. new CRM modules),
the worktree would be missing them, causing steps 3-7 to see different
files than steps 1-2 analyzed.

Now _setup_worktree calls _copy_uncommitted_changes which:
1. Applies uncommitted tracked changes via git diff HEAD | git apply
2. Copies untracked files (excluding .pdd/) into the worktree

Both operations are best-effort — failures are logged but don't block.
Added 7 tests for the new behavior. Reverted prompt workaround.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: Split step 6 into sub-steps, fix resume bugs, and fix iteration display

- Split monolithic step 6 into 6.1 (fix), 6.2 (regression tests), 6.3 (e2e tests)
  with separate prompts and 600s timeouts each
- Bug A fix: save worktree state immediately after creation so Ctrl+C during
  step 3 can resume without recreating
- Bug B fix: detect between-iterations resume (start_step > 7 with
  fix_verify_iteration > 0) and restart at step 3 with incremented iteration
- Fix iteration number always showing "1" in GitHub comments: add iteration
  suffix to steps 3-5 comment headers, add explicit instruction to all loop
  step prompts to use exact iteration number
- Fix total step count: "X of 7" -> "X of 8" across all prompts
- Add STEP_ORDER constant and _next_step() helper for fractional step arithmetic
- Add checkup command, agentic_checkup module, and comprehensive tests (108 total)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
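A `STEP_ORDER` list with a `_next_step()` helper, as mentioned above, avoids doing arithmetic on fractional step numbers, where floating-point addition may not land exactly on the next step. The list contents below are an assumption inferred from the 6.1/6.2/6.3 split and the 8-step total. The actual constant may differ.

```python
from typing import Optional

# Assumed ordering: step 6 is replaced by sub-steps 6.1, 6.2, and 6.3.
STEP_ORDER = [1, 2, 3, 4, 5, 6.1, 6.2, 6.3, 7, 8]


def next_step(current: float) -> Optional[float]:
    """Look up the following step by position instead of adding to the number."""
    position = STEP_ORDER.index(current)
    if position + 1 < len(STEP_ORDER):
        return STEP_ORDER[position + 1]
    return None  # current was the final step
```

Looking up by position also makes it cheap to insert further sub-steps later without touching the callers.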

* feat: Add frontend integration checks to checkup step 4 and step 6.1

Step 4 (Interface Check) now audits:
- Frontend navigation reachability: detect orphan pages with no nav link
- Frontend→Backend API call consistency: detect pages using different
  URL patterns than the rest of the codebase (e.g. relative vs base URL)

Step 6.1 (Fix) now handles:
- Adding missing nav links for orphan pages
- Updating inconsistent API call patterns to match codebase convention

Found via QA on the CRM app where the page existed but had no sidebar
link and used relative `/adminCrmActions` instead of the standard
`${NEXT_PUBLIC_API_BASE_URL}/adminCrmActions` pattern.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: Increase step 7 (verify) timeout from 600s to 1200s

Step 7 re-runs the full test suite to verify fixes, which can exceed
10 minutes on larger projects (e.g. 4600+ tests). The 600s timeout
caused step 7 to time out after posting its GitHub comment but before
returning, leaving state stuck at step 6.3 and causing infinite
resume loops.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: Improve architecture generation with Strategy B support, register checkup command, and add gh timeout

- Add Strategy B (template-based group contexts) support to arch steps 7, 8, 10, 12
- Add pdd_path_construction_guide Strategy B documentation
- Add example_python_backend.prompt template
- Register checkup command in CLI
- Add timeout parameter to _run_gh_command()
- Update test durations

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: Add PDD prompts, examples, and architecture entries for checkup modules

- Add agentic_checkup_python.prompt and agentic_checkup_orchestrator_python.prompt
- Add context/agentic_checkup_example.py and context/agentic_checkup_orchestrator_example.py
- Register checkup and orchestrator in architecture.json (priority 217, 218)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: Add README documentation for pdd checkup and pdd sync URL mode

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* ci: trigger Cloud Build CI

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Greg Tanaka <glt@alumni.caltech.edu>
Align the replay branch's final file state with main so the PR
shows zero diff. The branch preserves the full commit history
while ending at the same tree as main.
@gltanaka gltanaka requested a review from Copilot February 21, 2026 07:44
Contributor

Copilot AI left a comment

Copilot wasn't able to review any files in this pull request.

6 participants