Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
95 changes: 95 additions & 0 deletions agent/pr15-debug-comments-handoff.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,95 @@
# PR #15 Debug Doc Comments — Handoff

**Date:** 2026-03-28
**Source:** PR #15 review comments on `docs/ops/igit-merge-test-and-debug.md`
**For:** Parent session implementing eval harness plan

---

## Summary

11 review comments on the debug log. All concern `merge_context.sh`, `git_policy_hook`, or SKILL.md — none require changes to the eval harness plan itself. Several expand the design direction for existing bugs and should be captured in work items.

## Decisions and Actions by Bug

### Bug 1 — Structured errors instead of silent fallback

**Comments:** Lines 13, 28
**Decision:** Two-part fix:
1. `merge_context.sh` should return structured error objects on `gh` failure instead of silent `{}` fallback. Format: `{ "cmd": "...", "error": "...", "hint": "Try unsandboxed" }`
2. Model handles `gh` calls directly as fallback — script should NOT be run unsandboxed entirely (security hole). The fallback path already exists in SKILL.md.

**Action:** New work item — fix `merge_context.sh` error handling for sandboxed `gh` failures.

### Bug 2 — Parallel detection with priority aggregation

**Comment:** Line 47
**Decision:** Restructure task tracking detection to run all methods (env, mcp, cli, fs, cfg) in parallel and aggregate by priority, instead of sequential `if/elif` chain. This is a bigger redesign than just fixing the elif fall-through bug.

**Action:** Update existing Bug 2 fix direction in the debug doc. The work item scope is larger than originally described — not just "fix the elif" but "redesign detection as parallel with priority aggregation."

### Bug 3 — PR URL in AskUserQuestion preview field

**Comment:** Line 66
**Decision:** Use AskUserQuestion's `preview` field for the PR URL instead of interpolating into the `description` field. Whether `<a>` HTML links render in preview is unknown — needs empirical testing.

**Action:** New work item — test AskUserQuestion preview field rendering, then update SKILL.md. Two-part: (1) test what renders, (2) apply fix.

### Bug 4 — Identify PR-associated resources

**Comment:** Line 81
**Decision:** Reframe from "filter out main worktree" to "identify all resources associated with the PR being merged." Filter worktrees to `branch == current_branch && path != main_worktree_path`. The broader "all resources" framing (stashes, related branches, etc.) is future scope.

**Action:** Update Bug 4 fix direction in debug doc. Immediate fix: filter worktrees in `merge_context.sh` by comparing against the primary worktree path.

### Bug 5 — Implicit checklist scope and configuration

**Comments:** Lines 95, 96, 99
**Decisions:**
- Version bumps apply to ANY project releasing versioned artifacts (packages, containers, apps, infrastructure, configs), not just plugins
- "Project documentation" = README.md, CLAUDE.md, docs/, etc.
- Users need to document their merge checklist in a known, discoverable location
- Candidate locations: README.md, DEPLOY.md, RELEASE.md, CLAUDE.md, MERGE.md, or `docs/{i,}git/*.md`
- Concern: `docs/igit/*.md` is "dangerous if not protected" — files could be manipulated

**Action:** Update existing backlog item `docs/work/backlog/2026-03-27-merge-implicit-checklist.md` with these expanded requirements. The checklist location decision is a design question for that work item.

### Observation — bare `git push` bypasses hook

**Comments:** Lines 103, 111
**Decisions:**
- The bare push was in a chained command: `git tag ... && git push`
- Two proposed fix directions for `git_policy_hook`:
1. Fill in missing refspec before applying policies (resolve bare `git push` to the actual remote/branch)
2. Reject commands without specified remote and ref
- Tagging should be bumped from `allow` to `ask` if it was intentionally allowed

**Action:** New work item for `git_policy_hook` — handle bare `git push` and review tag push policy level.

### Observation — `git pull` fails in sandbox

**Comment:** Line 121
**Decision:** Root cause is likely GitHub API failure or sandboxing, not just DNS. The existing "narrate the failure" fix direction still applies.

**Action:** Minor — update debug doc root cause. No separate work item needed (covered by existing sandbox documentation).

## Impact on Eval Harness Plan

**None.** All items are `merge_context.sh`, `git_policy_hook`, or SKILL.md changes. The eval harness plan is not affected.

However, once Bug 1 (structured errors) and Bug 2 (parallel detection) are fixed, the eval cases for those bugs may need updated `context.output` payloads to reflect the new error/detection format.

## Work Items

| File | Scope |
|------|-------|
| `docs/work/backlog/2026-03-28-merge-context-bugs.md` | Bugs 1+2+4 — all `merge_context.sh` fixes (structured errors, parallel detection, worktree filtering) |
| `docs/work/backlog/2026-03-28-merge-skill-bugs.md` | Bugs 3+6 — SKILL.md fixes (PR URL preview field, cleanup order) |
| `docs/work/backlog/2026-03-27-merge-implicit-checklist-bugfix.md` | Bug 5 — expanded scope per review (versioned artifacts, doc scope, checklist location) |
| `docs/work/backlog/2026-03-28-policy-hook-bare-push-tags.md` | Observation — bare push refspec resolution + tag policy level |

Bug 6 (cleanup order) added from live testing 2026-03-28: Stage 3 tries to delete branch before removing worktree, but git refuses.

## PR Comment Replies

Each review comment needs a reply acknowledging the decision. Use `gh api` to reply in-thread.
133 changes: 133 additions & 0 deletions docs/ops/igit-merge-test-and-debug.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,133 @@
# /merge skill — live test & debug log

Running log of bugs and observations from manual testing of the `/merge` skill on real PRs.

---

## Session: 2026-03-27 — PR #14 (feature/merged-skill)

### Bug 1 — `gh` silently fails in `sh` subprocess (sandbox)

**Symptom:** `merge_context.sh` returns `pr.state: "none"` even when an open PR exists.

**Root cause:** Same Mach IPC sandbox restriction that affects `uv`. When `sh script.sh` spawns a child process, `gh` cannot reach the macOS keychain and exits non-zero. The `|| printf '{}'` fallback silently swallows the failure, leaving all PR fields at defaults.
Comment on lines +11 to +13

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of eating the error silently, could the script return the error
pr.state: { cmd: "...", error: "...", hint: "Try unsandboxed (requires user permission or config)" }

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Structured error objects instead of silent fallback — agreed. Script returns { cmd, error, hint } on gh failure so the model can see what broke and fall back to direct gh calls. New backlog item to track.


**Repro:**
```sh
sh -c 'gh pr view --json number,url,state,title,baseRefName 2>/dev/null || printf "{}"'
# → {}
```
Direct call works fine:
```sh
gh pr view --json number,url,state,title,baseRefName
# → {"number":14,...}
```

**Impact:** PR state, mergeable, merge_state_status, title, URL all lost.

**Fix direction:** Same workaround pattern as `uv --no-sync` — document that `merge_context.sh` must be run unsandboxed, or restructure so `gh` calls are made directly by the model (not via the shell script).

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree model should handle since it knows the user env better and running the entire script unsandboxed would be opening a huge hole in security.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed — model handles gh fallback directly, script stays sandboxed. The SKILL.md fallback path already covers this; the missing piece is the script reporting the failure instead of swallowing it.


---

### Bug 2 — MCP elif branch eats filesystem task tracking detection

**Symptom:** `merge_context.sh` returns `task_tracking.tool: "none"` even though `docs/work/active/` exists.

**Root cause:** The detection chain is a flat `if/elif/elif/...`. The MCP branch fires when `~/.claude/settings.json` exists (which it does for any Claude Code user). If no matching MCP server is found (linear/jira/asana), `_tt_tool` stays `"none"` — but the `elif` for `docs/work/active` is never reached.

**Relevant code in `merge_context.sh`:**
```sh
elif [ -f ".mcp.json" ] || [ -f "${HOME}/.claude/settings.json" ]; then
for _cfg in ...; do
# checks linear, jira, asana — if none match, loop exits without setting _tt_tool
done
# elif [ -d "docs/work/active" ] ← never reached
```

**Fix direction:** After the MCP loop, fall through to the remaining checks (cli, fs, cfg) if `_tt_tool` is still `"none"`. Either nest the lower-priority checks inside the MCP branch or restructure as a sequential check without elif.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally the checks would all run in parallel (for speed) and the results would be aggregated based on priority order.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parallel detection with priority aggregation — bigger scope than fixing the elif. Backlog item will capture the redesign: run all methods concurrently, pick highest-priority result.


---

### Bug 3 — PR URL missing from "I'll merge manually" option description

**Symptom:** The `AskUserQuestion` "I'll merge manually" option says `"Open the PR URL — /merge will wait and detect when it's merged"` but doesn't actually include the URL. User has to look it up separately.

**Fix direction:** Interpolate the PR URL into the description: `"Open <pr.url> — /merge will wait and detect when it's merged"`.

**Relevant section in `SKILL.md`:**
```
- label: "I'll merge manually"
description: "Open the PR URL — /merge will wait and detect when it's merged"
```

Should be:
```
- label: "I'll merge manually"
description: "Open <pr.url> — /merge will wait and detect when it's merged"
Comment on lines +65 to +66

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what I actually want is the PR url in the preview field which renders ascii (and possibly html) in a box to the side. Will have to test and see if a link can be rendered.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Preview field for PR URL — will need to test what AskUserQuestion renders. If <a> works, that's the cleanest UX. Backlog item to investigate and apply.

```

---

### Bug 4 — Main worktree appears in `worktrees[]`, could trigger spurious `git worktree remove`

**Symptom:** `merge_context.sh` includes the main worktree in `worktrees[]`. If the skill iterates `worktrees` looking for a match on the current branch, it would find the main worktree and attempt `git worktree remove` on it — which would fail or cause confusion.

**Observed:** During PR #14 test, `worktrees` contained:
```json
[{"branch": "feature/merged-skill", "path": "/Users/tgulls/Code/llm/claude/igit"}]
```
That path is the main working tree, not an additional worktree.

**Fix direction:** Filter the main worktree out in `merge_context.sh` (compare each path against the primary worktree path from `git worktree list --porcelain | head -2`), or have the skill skip `git worktree remove` when the matched path is the repo root.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to only list the worktree associated with an open branch or PR? I think the goal should be to identify all resources associated with the PR being merged.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reframing as "identify all PR-associated resources" — immediate scope is filtering worktrees to branch == current_branch && path != main_worktree. The main worktree path comes from the first entry in git worktree list --porcelain. Broader resource identification (stashes, related branches) is future scope.


---

### Bug 5 — Stage 1 doesn't check implicit merge checklist (docs, tests, version bump)

**Symptom:** Skill proceeded through Stage 1 without flagging that the PR lacked a version bump. The version bump had to be applied after merge as a separate direct-to-main commit.

**Root cause:** Stage 1 only iterates `task_tracking.work_items` and checks their explicit AC. There is no implicit checklist covering common pre-merge hygiene — docs updated, tests passing, version bumped — that applies regardless of task tracking state.

**Impact:** New skill shipped in PR #14 with no version bump; bump was applied post-merge directly to main (bypassing branch protection — see bare `git push` observation below).

**Fix direction:** Add an implicit pre-merge checklist to Stage 1 that the model runs before proceeding, independent of work item AC. Candidates:
- Tests passing (`uv run --no-sync pytest` or equivalent)
- Version bumped if new skills/hooks/commands were added

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

version bumps are relevant to any project that releases or deploys versioned artifacts (packages/libraries or container images, applications, infrastructure, configs, etc.)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted — version bumps scope expanded beyond plugins to any versioned artifact project. Updating the implicit checklist backlog item.

- Docs/CLAUDE.md updated if conventions changed

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Project documentation: README.md, CLAUDE.md, docs/, etc.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Expanding "project documentation" to include README.md, CLAUDE.md, docs/, etc. in the checklist scope.

- No untracked files that should have been committed

This checklist could be surfaced as a confirmation panel or auto-checked where possible (e.g. run tests, check if plugin.json version changed since base branch).

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably need to have users document this in a known place
ex. README.md, DEPLOY.md, RELEASE.md, CLAUDE.md, MERGE.md
One option is docs/{i,}git/*.md files for each workflow (dangerous if not protected..)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checklist location is a design question for the backlog item. The protection concern is real — if the checklist lives in an unprotected path, it could be manipulated to skip checks. Will capture the candidate locations and the security consideration.


---

### Bug 6 — Stage 3 cleanup order: worktree must be removed before branch deletion

**Symptom:** `git branch --delete` fails with `error: cannot delete branch used by worktree` when a worktree exists for the branch.

**Root cause:** SKILL.md Stage 3 orders "Delete Local Branch" before "Remove Worktree." Git refuses to delete a branch checked out in any worktree — safety mechanism to prevent leaving a worktree in an invalid state.

**Fix direction:** Swap the order in Stage 3: remove worktree first, then delete branch.

---

### Observation — bare `git push` bypasses block-push-to-main hook

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was in a chained command git tag ... && git push

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted — chained command context. The hook matcher saw git push with no ref, which doesn't match the main/master pattern.


**Symptom:** Running `git push` (implicit tracking branch push) on main succeeded without being blocked. `git push origin main` is correctly blocked.

**Root cause:** `~/.claude/hooks/block-push-to-main.sh` matches on an explicit `main`/`master` ref in the command string. A bare `git push` with no ref specified doesn't match, even though it implicitly pushes to main via the tracking branch.

**Observed:** During v0.11.0 release, `git tag v0.11.0 5c291a4 && git push && git push origin v0.11.0` pushed a commit directly to main unblocked.

**Fix direction:** Either extend the hook to also block bare `git push` when the current branch is main/master, or amend the release process to require a PR for the version bump commit.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't release tags have to be pushed to main (or the release branch) directly? One solution I'll offer is to change the policy hook to always "fill in" missing refspec before applying policies, or reject commands without specified remote and ref. If allowing tagging was intentional it should be bumped from allow to ask.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two options: (1) policy hook resolves bare git push to actual remote/ref before applying rules, or (2) reject commands without explicit remote and ref. Also bumping tag push from allow to ask. New backlog item for policy hook improvements.


---

### Observation — `git pull` fails in sandbox

**Symptom:** Stage 3 `git pull` fails with `Could not resolve host: github.com`.

**Workaround:** User must run `git pull` in their terminal after the skill completes.

**Fix direction:** Narrate when the pull fails so the user knows to run it manually. Don't treat it as a hard stop.
Comment on lines +125 to +131

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this was due to github API failures. That or its a sandboxing issue.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likely GitHub API or sandboxing — updating root cause in the debug doc.


---
Loading