Epic: Autonomous Swarm Mode for Epics by acreeger · Pull Request #565 · iloom-ai/iloom-cli

acreeger · 2026-02-05T23:37:53Z

PR for issue #557

This PR was created automatically by iloom.

acreeger · 2026-02-05T23:39:11Z

Complexity Assessment for Issue #557

Analysis Plan

Scan issue description and comments for scope
Identify files affected (new and modified)
Estimate lines of code and architectural signals
Assess breaking changes and cross-cutting concerns
Perform quick codebase searches for existing patterns
Classify complexity and document findings

Complexity Assessment

Classification: COMPLEX

Metrics:

Estimated files affected: 14-18 (8-10 new + 6-8 modified)
Estimated lines of code: 1000-1250 LOC
Breaking changes: Yes (il start behavior change, new --swarm flag, user interaction changes)
Database migrations: No (progress file only, no schema changes)
Cross-cutting changes: Yes (epic detection → supervisor spawn → task claiming → worktree → agent → PR merge → progress tracking)
File architecture quality: Good (modular structure, but start.ts is 610 LOC with complex entry logic)
Architectural signals triggered: External constraints (Beads), Uncertain approach (conflict resolver agent pattern), New patterns (supervisor orchestration loop), Integration points (5+ - Beads, agents, GitHub/Linear, worktrees, progress tracking), Implementation unclear (how conflict resolution coordinates with parallel agents)
Overall risk level: High

Reasoning: This epic implements a sophisticated multi-agent orchestration system with external Beads integration, requiring coordination across 14+ files, significant new code (~1000 LOC), and critical architectural decisions about supervisor loops, conflict resolution agents, and parallel task claiming. The cross-cutting nature of the data flow (epic detection threading through supervisor → task claiming → worktree → agent → merge) and multiple integration points with uncertain implementation approaches trigger automatic COMPLEX classification despite moderate individual file impacts.

acreeger · 2026-02-06T02:17:38Z

Code Review: Epic #557 — Autonomous Swarm Mode

Review of all 7 commits (#558-#564) against their respective issue requirements, ADR compliance, and project guidelines.

Critical / High Priority (Must Fix)

1. Silent auto-install in CI when `autoInstallBeads: false` — #558

File: src/lib/BeadsManager.ts (ensureInstalled())

When autoInstall is false and the environment is non-interactive (CI/CD), the code falls through to install without consent — silently downloading and executing a remote script. Contradicts the purpose of having an autoInstallBeads setting.

Fix: In non-interactive mode with autoInstall=false, throw an error telling the user to install bd manually or enable autoInstallBeads.

2. PATH not updated after Beads install — #558

File: src/lib/BeadsManager.ts (ensureInstalled())

After runInstallScript() completes, isInstalled() checks command -v bd which fails because the Node process PATH is stale. Install scripts add to ~/.local/bin but the running process doesn't pick that up.

Fix: Append likely install locations (~/.local/bin) to process.env.PATH before verification check.
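
A minimal sketch of the PATH fix, assuming a hypothetical pure helper named `appendToPath` (not taken from BeadsManager.ts). It appends candidate directories to a PATH string and skips any that are already present, so repeated calls do not grow the variable.

```typescript
// Hypothetical helper: returns a PATH string with extra directories appended.
// Directories already present are skipped so the call is idempotent.
function appendToPath(currentPath: string, extraDirs: string[], delimiter: string): string {
  const entries = currentPath.split(delimiter).filter((entry) => entry.length > 0);
  for (const dir of extraDirs) {
    if (!entries.includes(dir)) {
      entries.push(dir);
    }
  }
  return entries.join(delimiter);
}

// Example with a POSIX-style PATH and an illustrative home directory.
const updatedPath = appendToPath("/usr/bin:/bin", ["/home/u/.local/bin"], ":");
```

Inside the Node process the result would be assigned to process.env.PATH before isInstalled() runs, with the home directory and delimiter taken from the platform rather than hardcoded as they are here.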

3. `findPRForBranch` swallows all errors, causing premature task closure — #562

File: src/lib/SwarmSupervisor.ts:303-316

The catch block returns null for any error (network failures, rate limits, auth errors). When it returns null, checkCompletedAgents assumes no PR was created and permanently closes the Beads task — effectively losing that agent's work. Violates "DO NOT SWALLOW ERRORS".

Fix: Propagate unexpected errors. Only return null for "no PR found" scenarios, not transient failures.
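
One way to express this fix, sketched with hypothetical names (`findPROrNull`, `PullRequest`, and the `isNotFound` predicate are illustrative; the real SwarmSupervisor types are not shown in this thread). Only an explicit "not found" outcome maps to null; every other failure reaches the caller.

```typescript
// Hypothetical PR shape for illustration only.
interface PullRequest {
  number: number;
  headRefName: string;
}

// Wraps a lookup so that only an expected "no PR found" error becomes null.
async function findPROrNull(
  lookup: () => Promise<PullRequest>,
  isNotFound: (error: unknown) => boolean,
): Promise<PullRequest | null> {
  try {
    return await lookup();
  } catch (error) {
    if (isNotFound(error)) {
      return null; // expected: the agent did not open a PR
    }
    throw error; // network, auth, and rate-limit failures are not swallowed
  }
}
```

With this shape, checkCompletedAgents can treat null as "no PR" and leave the Beads task open when the lookup throws.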

4. `GIT_REMOTE` not set for swarm mode — #561

File: src/commands/ignite.ts:562-571

The template's STEP 5-SWARM uses {{GIT_REMOTE}} in git push {{GIT_REMOTE}} HEAD, but GIT_REMOTE is only set inside the draft PR block. In swarm mode without draft PR, the template renders git push HEAD (empty remote).

Fix: Set GIT_REMOTE inside the swarm mode block if not already set (default to origin or settings.mergeBehavior.remote).

5. `EPIC_BRANCH` not validated when `SWARM_MODE` active — #561

File: src/commands/ignite.ts:565-566

EPIC_BRANCH is optional, but the template uses it unconditionally in gh pr create --base {{EPIC_BRANCH}}. Missing value causes gh to fail with a confusing argument parsing error.

Fix: Throw Error('ILOOM_EPIC_BRANCH is required when ILOOM_SWARM_MODE is enabled') if missing.
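
Fixes 4 and 5 can be sketched together. The function and field names below are assumptions made for illustration; only the error message, the `origin` default, and the settings.mergeBehavior.remote fallback come from the review text.

```typescript
// Hypothetical inputs; field names are illustrative, not from ignite.ts.
interface SwarmInputs {
  swarmMode: boolean;
  epicBranch?: string;        // value of ILOOM_EPIC_BRANCH, if provided
  existingGitRemote?: string; // GIT_REMOTE already set by the draft PR block
  configuredRemote?: string;  // settings.mergeBehavior.remote
}

interface SwarmTemplateVars {
  GIT_REMOTE: string;
  EPIC_BRANCH: string;
}

// Returns the template variables needed by the swarm step, or undefined outside swarm mode.
function resolveSwarmVariables(inputs: SwarmInputs): SwarmTemplateVars | undefined {
  if (!inputs.swarmMode) {
    return undefined;
  }
  if (!inputs.epicBranch) {
    throw new Error("ILOOM_EPIC_BRANCH is required when ILOOM_SWARM_MODE is enabled");
  }
  return {
    // Keep a remote that is already set, then fall back to settings, then to origin.
    GIT_REMOTE: inputs.existingGitRemote ?? inputs.configuredRemote ?? "origin",
    EPIC_BRANCH: inputs.epicBranch,
  };
}
```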

6. Broken prompt interpolation in confirmSwarmMode — #559

File: src/commands/start.ts (confirmSwarmMode)

The template literal Issue #${epicDetection.totalChildren > 0 ? ...} never interpolates the issue number. Output is "Issue #is an epic with 3 child issues...".

Fix: Pass issue number into confirmSwarmMode and use it in the template.

7. `parseInt` NaN for `--max-agents` — #559

File: src/cli.ts:384

parseInt('abc') returns NaN, which passes through ?? (NaN is not null/undefined) and flows into SwarmSettings.maxConcurrent. User sees "Max NaN concurrent agents".

Fix: Replace parseInt with a custom parser that validates and throws on invalid input.
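
A sketch of such a parser, assuming the hypothetical name `parseMaxAgents`. The 1-20 range is taken from the fix summary later in this thread; the error wording is illustrative.

```typescript
// Hypothetical CLI option parser: accepts only whole numbers within [min, max].
function parseMaxAgents(raw: string, min = 1, max = 20): number {
  const trimmed = raw.trim();
  // Reject anything that is not purely digits, so 'abc' and '3x' never reach parseInt.
  if (!/^\d+$/.test(trimmed)) {
    throw new Error(`--max-agents must be a whole number, got "${raw}"`);
  }
  const value = Number.parseInt(trimmed, 10);
  if (value < min || value > max) {
    throw new Error(`--max-agents must be between ${min} and ${max}, got ${value}`);
  }
  return value;
}
```

Because the function throws instead of returning NaN, the `??` default in the settings merge only ever sees a valid number or an absent value.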

Medium Priority (Should Fix)

8. Potential infinite loop on permanently failed tasks — #563

File: src/lib/SwarmSupervisor.ts:256

When a task is permanently failed, releaseClaim() returns it to "ready" state in Beads. The code skips claiming it (permanentlyFailed check), but readyTasks.length > 0 prevents the exit condition from being met. The loop polls every 2s indefinitely.

Fix: Filter permanently failed tasks from the exit condition: const actionableReadyTasks = readyTasks.filter(t => !this.permanentlyFailed.has(t.id)).
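
The exit check could look roughly like this. The `Task` shape, the function name, and the assumption that the loop also waits for active agents to drain are illustrative; only the filter itself comes from the fix above.

```typescript
// Hypothetical task shape; only the id is needed here.
interface Task {
  id: string;
}

// The loop may exit once no actionable task is ready and no agent is running.
function shouldExitLoop(
  readyTasks: Task[],
  activeAgentCount: number,
  permanentlyFailed: Set<string>,
): boolean {
  // Permanently failed tasks sit in the "ready" state but will never be claimed again.
  const actionableReadyTasks = readyTasks.filter((t) => !permanentlyFailed.has(t.id));
  return actionableReadyTasks.length === 0 && activeAgentCount === 0;
}
```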

9. Conflict resolver uses wrong working directory — #563

File: src/lib/SwarmSupervisor.ts:649-651

By the time a task reaches the merge queue, activeAgents has already been cleared. The cwd falls back to epicLoom.epicLoomPath instead of the child worktree. The resolver runs without the right branch context.

Fix: Store loomPath in MergeQueueEntry when enqueuing, or keep a separate taskLoomPaths map.

10. Log file streams never closed — #562

File: src/lib/SwarmSupervisor.ts:185-190

fs.createWriteStream is called per agent but never closed. In long-running swarms, this leaks file descriptors.

Fix: Store logStream in ActiveAgent, call logStream.end() when agent completes.

11. Supervisor loop can exit prematurely — #562

File: src/lib/SwarmSupervisor.ts:134-136

If a failed agent releases its claim, the task may not appear immediately in the next ready() call (Beads internal delay). Supervisor exits before the released task becomes ready again.

Fix: Check Beads DAG overall status to distinguish "all done" from "no tasks ready right now".

12. `parseInt` truncates mixed-format IDs — #562

File: src/lib/SwarmSupervisor.ts:332-335

parseInt("100-fix-login", 10) returns 100. Any task ID starting with digits but containing non-digit chars is silently truncated.

Fix: Use strict regex: if (/^\d+$/.test(taskId)) return parseInt(taskId, 10); return taskId;
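
The difference is easy to see in isolation. This sketch reuses the name parseIssueIdentifier from the later comments, but the signature is an assumption.

```typescript
// Returns a number only when the whole ID is digits; otherwise keeps the original string.
function parseIssueIdentifier(taskId: string): number | string {
  if (/^\d+$/.test(taskId)) {
    return parseInt(taskId, 10);
  }
  return taskId;
}

const numericId = parseIssueIdentifier("100");          // numeric issue number
const mixedId = parseIssueIdentifier("100-fix-login");  // left untouched, not truncated to 100
```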

13. `ready()` and `list()` swallow JSON parse errors — #558, #563

Files: src/lib/BeadsManager.ts

Both methods catch JSON parse failures and return [] instead of throwing. Violates "DO NOT SWALLOW ERRORS". Supervisor sees zero tasks and idles indefinitely.

Fix: Throw BeadsError on parse failure. Callers already have try/catch for graceful handling.

14. Sync idempotency check uses `ready()` instead of full list — #558

File: src/lib/BeadsSyncService.ts

bd ready only returns unblocked tasks. Blocked/claimed/completed tasks are missed, causing re-creation attempts on re-sync.

Fix: Use beadsManager.list() for the idempotency check, or rely solely on the "already exists" catch.

15. Catch-all in `detectEpic` swallows all errors — #559

File: src/commands/start.ts:735-738

Catches all errors and returns null. Network errors, auth failures, type errors all treated as "not an epic". Violates project guidelines.

Fix: Narrow catch to expected API/network errors. Re-throw unexpected errors.

16. N+1 dependency fetching — #559

File: src/lib/EpicDetector.ts:107-135

Each child issue's dependencies fetched sequentially. 10 children = 10 serial API calls.

Fix: Use Promise.allSettled for parallel fetching.
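
A sketch of the parallel version, assuming a hypothetical `fetchDeps` callback and result shape (EpicDetector's real types are not shown here). Failures are returned alongside successes so the caller can decide how to handle them rather than having them dropped.

```typescript
// Hypothetical result shape for one child issue.
interface DependencyResult {
  issueId: string;
  dependencies: string[];
  error?: unknown; // set when the fetch for this child failed
}

// Starts all fetches at once and waits for every one to settle.
async function fetchAllDependencies(
  childIds: string[],
  fetchDeps: (issueId: string) => Promise<string[]>,
): Promise<DependencyResult[]> {
  const settled = await Promise.allSettled(childIds.map((id) => fetchDeps(id)));
  // allSettled preserves input order, so index i corresponds to childIds[i].
  return settled.map((result, index) =>
    result.status === "fulfilled"
      ? { issueId: childIds[index], dependencies: result.value }
      : { issueId: childIds[index], dependencies: [], error: result.reason },
  );
}
```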

17. `SKIP_IMPLEMENTATION` path not swarm-aware — #561

File: templates/prompts/issue-prompt.txt:796-798

In swarm mode, marking SKIP_IMPLEMENTATION skips PR creation and issue closing. The supervisor has no visibility into the agent's work.

Fix: Add swarm-mode conditional that still closes the issue and reports to supervisor.

18. "Human review" text contradicts swarm mode — #561

File: templates/prompts/issue-prompt.txt:218-220

The text "Each step requires explicit human approval" is still rendered in swarm mode, where agents are autonomous. Could cause agents to halt waiting for approval.

Fix: Wrap in {{#unless SWARM_MODE}} or add override.

19. Missing `isEpic`/`swarmStatus` in swarm metadata path — #560

File: src/lib/LoomManager.ts:~1203-1218

Normal path propagates isEpic/swarmStatus to metadata but finishSwarmLoom does not. Metadata inconsistency.

Fix: Add same propagation to swarm path, or document intentional omission.

Low Priority (Nice to Fix)

#	Commit	Location	Issue
20	#558	`BeadsManager.ts` (runInstallScript)	`curl | bash` from unpinned `main` branch — no integrity verification
21	#558	`BeadsManager.ts` (execBd)	Spreads full `process.env` to `bd` subprocess — potential secret leakage
22	#558	`BeadsSyncService.test.ts`	Mock missing `providerName` and `issuePrefix` properties
23	#558	`plan-prompt.txt:280-283`	Duplicate numbered list items (two `1.` entries)
24	#559	`start.ts:722`	Settings loaded twice (in execute and detectEpic)
25	#559	`start.ts:225-228`	`--swarm` on null `epicDetection` produces no log message
26	#560	`LoomManager.ts:~1186`	Missing `extractIssueNumber` for PR-type swarm looms
27	#562	`SwarmSupervisor.ts:372-376`	Force shutdown orphans child processes
28	#562	`SwarmSupervisor.ts:279-289`	Graceful shutdown has no timeout — can hang indefinitely
29	#562	`SwarmSupervisor.ts:304-310`	PR search by `issueId in:title` can match wrong PRs
30	#563	`BeadsManager.ts:256-264`	`list()` silently returns `[]` on parse failure (same as #13)
31	#563	`SwarmSupervisor.ts`	`maxRetries=1` means 1 total attempt, 0 retries — naming misleading
32	#563	`SwarmSupervisor.ts:844`	Progress file not written atomically — readers could see partial JSON
33	#564	`start-swarm.test.ts:303`	Dynamic import in test violates project guidelines
34	#564	`start.ts:763`	No range validation for `--max-agents` CLI flag
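
For item 32, the usual technique is to write to a temporary file and rename it over the target. The sketch below assumes a hypothetical helper name and file layout; it is not the code from SwarmSupervisor.ts.

```typescript
import { mkdtempSync, readFileSync, renameSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: readers see either the old file or the new one, never a partial write.
function writeJsonAtomically(filePath: string, data: unknown): void {
  // The temp file lives next to the target so the rename stays on one filesystem.
  const tempPath = `${filePath}.${process.pid}.tmp`;
  writeFileSync(tempPath, JSON.stringify(data, null, 2), "utf8");
  // rename replaces the target in a single step on POSIX filesystems.
  renameSync(tempPath, filePath);
}

// Usage with an illustrative progress payload in a scratch directory.
const scratchDir = mkdtempSync(join(tmpdir(), "swarm-progress-"));
const progressFile = join(scratchDir, "progress.json");
writeJsonAtomically(progressFile, { completed: 2, total: 5 });
const roundTripped = JSON.parse(readFileSync(progressFile, "utf8"));
```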

Positive Observations

Clean dependency injection throughout — all core classes accept dependencies via constructor
Good error typing with BeadsError preserving exit code and stderr
Well-structured test coverage: 98 new tests across 7 commits
Clean separation of concerns: BeadsManager (CLI), BeadsSyncService (sync), EpicDetector (detection), SwarmSupervisor (orchestration)
Swarm fast path cleanly short-circuits without touching normal flow
Signal handler install/removal in try/finally prevents handler leaks
Template conditionals preserve non-swarm workflow unchanged

acreeger · 2026-02-06T03:28:42Z

Code Review Fixes Applied

All 34 issues from the code review have been addressed in commit 7991e33.

Critical/High (7 fixes)

#	Issue	Fix
1	Silent auto-install in CI	Throw error in non-interactive mode when `autoInstallBeads: false`
2	PATH stale after install	Append `~/.local/bin`, `~/.cargo/bin`, `/usr/local/bin` to PATH before verification
3	`findPRForBranch` swallows errors	Only return `null` for "no PR found"; re-throw network/auth/rate-limit errors
4	`GIT_REMOTE` not set in swarm mode	Default to `settings.mergeBehavior.remote` or `origin`
5	`EPIC_BRANCH` not validated	Throw if missing when `SWARM_MODE` is enabled
6	Broken interpolation in `confirmSwarmMode`	Fixed template literal to include issue number
7	`parseInt` NaN for `--max-agents`	Custom parser with NaN check + range validation (1-20)

Medium (12 fixes)

#	Issue	Fix
8	Infinite loop on failed tasks	Filter permanently failed tasks from exit condition
9	Conflict resolver wrong cwd	Store `loomPath` in `MergeQueueEntry` + `taskLoomPaths` map
10	Log streams never closed	Store `logStream` in `ActiveAgent`, close on completion/shutdown
11	Premature supervisor exit	Track `pendingReleases` counter in exit condition
12	`parseInt` truncates IDs	Strict regex `^\d+$` before parsing
13+30	`ready()`/`list()` swallow parse errors	Throw `BeadsError` on JSON parse failure
14	Sync idempotency uses `ready()`	Changed to `list()` for full task visibility
15	`detectEpic` catch-all	Narrow to expected API/network errors only
16	N+1 dependency fetching	Parallel via `Promise.allSettled`
17	`SKIP_IMPLEMENTATION` not swarm-aware	Added swarm conditional for issue close + status reporting
18	Human review contradicts swarm	Wrapped in `{{#unless SWARM_MODE}}`, autonomous override added
19	Missing metadata in swarm path	Propagate `isEpic`/`swarmStatus` in `finishSwarmLoom`

Low (15 fixes)

Issues #20-34: Security comments for unpinned install, env filtering for subprocess, mock property fixes, duplicate list numbering, extractIssueNumber for PR-type looms, dynamic import replaced, settings double-load removed, --swarm on non-epic warning, force shutdown process cleanup, graceful shutdown 30s timeout, PR search branch matching, maxRetries semantics documented, atomic progress file writes.

Validation

3908 tests pass (119 files, 0 failures)
TypeScript compile: clean
ESLint: clean

acreeger · 2026-02-08T15:35:46Z

Implementation Complete - Beads ID format fix

Summary

Fixed bug where Beads CLI rejected task IDs because they were plain GitHub issue numbers (e.g., 54) instead of the required prefix-hash format (e.g., gh-54). Added toBeadsId() and fromBeadsId() helper functions and fixed all ID mapping throughout BeadsSyncService and SwarmSupervisor.

Changes Made

BeadsSyncService.ts: Added toBeadsId() and fromBeadsId() exported helpers; fixed idempotency check to compare using toBeadsId(child.id) instead of raw child.id; replaced inline regex patterns with toBeadsId() calls
SwarmSupervisor.ts: Imported fromBeadsId; updated parseIssueIdentifier() to strip gh- prefix before parsing; updated closeIssue() and findPRForBranch() to use raw issue IDs for GitHub API calls
BeadsSyncService.test.ts: Updated all mock Beads task IDs to use gh- prefix; added unit tests for toBeadsId() and fromBeadsId() helpers
SwarmSupervisor.test.ts: Updated all Beads task IDs and assertions to use gh- prefix format

Validation Results

Build: Passed
Tests: 3905 passed / 3928 total (23 skipped)
All 119 test files passing

Detailed Changes by File

src/lib/BeadsSyncService.ts

Changes: Added toBeadsId() and fromBeadsId() helper functions; fixed idempotency check and replaced inline regex patterns

toBeadsId(): Converts raw issue IDs to Beads format (e.g., '54' -> 'gh-54'). IDs already in prefix-hash format (e.g., 'ENG-123') pass through unchanged
fromBeadsId(): Strips gh- prefix to recover raw issue ID. Non-gh- prefixed IDs pass through unchanged
Fixed existingTaskIds.has() check to use toBeadsId(child.id) so re-sync correctly detects existing tasks
Replaced inline child.id.match(/^[a-z]+-/) ? child.id : `gh-${child.id}` with toBeadsId() calls
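
The two helpers, as described above, might look like this. It is a sketch rather than the committed code: the regex is made case-insensitive here so that the 'ENG-123' example passes through, which is an assumption about the intended behavior.

```typescript
// Adds the gh- prefix to raw issue IDs; IDs that already carry a prefix are left alone.
function toBeadsId(issueId: string): string {
  // Assumption: case-insensitive match so Linear-style IDs such as ENG-123 pass through.
  return /^[a-z]+-/i.test(issueId) ? issueId : `gh-${issueId}`;
}

// Strips the gh- prefix to recover the raw issue ID; other IDs pass through unchanged.
function fromBeadsId(beadsId: string): string {
  return beadsId.startsWith("gh-") ? beadsId.slice("gh-".length) : beadsId;
}
```

Round-tripping a GitHub issue number through both helpers returns the original value, which is what the GitHub API calls in SwarmSupervisor rely on.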

src/lib/SwarmSupervisor.ts

Changes: Fixed ID mapping for GitHub API calls that need raw issue numbers

parseIssueIdentifier(): Now calls fromBeadsId() to strip gh- prefix before parsing (e.g., 'gh-100' -> 100)
closeIssue(): Strips gh- prefix before calling gh issue close
findPRForBranch(): Strips gh- prefix before searching for PRs by issue number

src/lib/BeadsSyncService.test.ts

Changes: Updated all test expectations and added helper function tests

All mockBeadsManager.create return values and assertions now use gh- prefixed IDs
Existing tasks in idempotency test use gh- prefix
Dependency assertions use gh- prefixed IDs
Added describe('toBeadsId') with 2 test cases
Added describe('fromBeadsId') with 3 test cases

src/lib/SwarmSupervisor.test.ts

Changes: Updated all Beads task IDs and assertions to use gh- prefix

All createBeadsTask() calls use gh-100, gh-101, etc.
All syncService.syncEpicToBeads mock results use gh- prefixed beadsTaskId
Assertions for beadsManager.claim, beadsManager.close, beadsManager.releaseClaim updated
closeIssue assertion verifies raw ID (100) is passed to gh issue close

acreeger · 2026-02-08T15:45:18Z

Analysis: Research `beads.role` in Beads CLI

Fetch issue context
Research Beads GitHub repo documentation
Search Beads source code for beads.role
Check our BeadsManager.ts integration
Document findings

Executive Summary

beads.role is a git config value (maintainer or contributor) that controls how Beads routes planning issues. The warning "warning: beads.role not configured. Run 'bd init' to set." is emitted to stderr by DetectUserRole() on every bd command that invokes role detection. Our bd init --quiet --skip-hooks --skip-merge-driver call never sets the role because --quiet suppresses prompts and we run in non-TTY contexts -- the interactive role prompt in promptContributorMode() only fires when stdin is a TTY and no --contributor/--team flag is passed.

The role does not affect our DAG operations (create, ready, claim, close, dep). It only affects Beads' issue routing system, which we bypass entirely since iloom syncs issues from GitHub/Linear to Beads itself.

Impact Summary

The warning is cosmetic -- it does not cause bd commands to fail (exit code 0)
It appears on stderr during commands that trigger DetectUserRole() (routing-related operations)
Fix: After bd init, set the role via either git config beads.role maintainer or bd config set beads.role maintainer in the project directory
Only 1 file affected: /Users/adam/Documents/Projects/iloom-cli/feat-issue-557__autonomous-swarm-mode/src/lib/BeadsManager.ts (the init() method at line 168)

Complete Technical Reference

Answers to Each Question

1. What is `beads.role` and what does it do?

beads.role determines whether Beads treats your repository context as a maintainer (push access, in-repo storage) or contributor (fork workflow, separate planning repo). It governs how DetermineTargetRepo() in internal/routing/routing.go routes planning issues.

For iloom's use case (DAG task management with BEADS_DIR outside the repo), the role is irrelevant -- we never use Beads' issue routing system. We sync issues ourselves and only use bd create, bd ready, bd claim, bd close, and bd dep.

2. Valid values

Exactly two values, validated in cmd/bd/config.go:

maintainer -- repo owner or team with push access
contributor -- fork/OSS contributor without direct push access

Any other value triggers a warning from bd doctor and DetectUserRole().

3. How is it configured?

Primary storage: Git config (git config beads.role <value>)

Set via:

git config beads.role maintainer -- direct git config
bd config set beads.role maintainer -- validates value before writing to git config (preferred)
bd init interactive prompt -- asks "Contributing to someone else's repo? [y/N]" when stdin is TTY and no --contributor/--team flag

Read via:

git config --get beads.role
bd config get beads.role

Fallback: If unset, DetectUserRole() falls back to a deprecated URL-based heuristic (SSH = maintainer, HTTPS without creds = contributor) and emits the warning.

Database fallback: cmd/bd/doctor/role.go also checks SQLite for legacy configs (pre-GH#1531).

4. Is it required or just a warning we can suppress?

Just a warning. The DetectUserRole() function emits to stderr via fmt.Fprintln(os.Stderr, "warning: beads.role not configured. Run 'bd init' to set.") and then falls back to URL heuristics. The bd command still completes with exit code 0. Our execBd() method captures stderr but only throws on non-zero exit codes.

However, the warning pollutes stderr which could be confusing in logs.

5. Can we set it programmatically during our `bd init` call?

bd init does not accept a --role flag. The role-related flags are:

--contributor -- runs contributor wizard (sets role to contributor)
--team -- runs team wizard (sets role to maintainer)

Neither is appropriate since they trigger full wizard flows.

Best approach: After bd init, call bd config set beads.role maintainer which validates and writes to git config. This can be done via our existing execBd() method:

await this.execBd(['config', 'set', 'beads.role', 'maintainer'], { cwd: this.projectPath })

Alternatively, run git config beads.role maintainer directly via execa in the project directory, but bd config set is preferred because it validates the value.

Codebase Research Findings

Affected Area: BeadsManager.init()

Entry Point: /Users/adam/Documents/Projects/iloom-cli/feat-issue-557__autonomous-swarm-mode/src/lib/BeadsManager.ts:168-179 - the init() method
Dependencies:

Uses: execBd() private method (line 298)
Used By: SwarmSupervisor and il start swarm flow

Why the warning occurs with our current init call

Our call: bd init --quiet --skip-hooks --skip-merge-driver

The --quiet flag suppresses output but does NOT set the role. The interactive role prompt (promptContributorMode() in cmd/bd/init.go) only fires when:

In a git repository (yes)
No --contributor or --team flag (correct, we don't pass them)
stdin is a TTY (shouldPromptForRole() check)

Since our execBd() runs with encoding: 'utf8' and no stdio: 'inherit', stdin is NOT a TTY, so the prompt is skipped entirely. The role is never set.

Beads Source Files Investigated

cmd/bd/doctor/role.go -- CheckBeadsRole() validates role, returns warning status if unset
cmd/bd/config.go -- bd config set/get beads.role implementation, validates against allowlist
cmd/bd/init.go -- promptContributorMode() sets role interactively during init
internal/routing/routing.go -- DetectUserRole() emits the warning to stderr, falls back to URL heuristic
internal/beads/context.go -- ErrRoleNotConfigured error constant, RequireRole() method

bd init available flags (full list)

Flag	Description
`--prefix/-p`	Issue prefix
`--quiet/-q`	Suppress output
`--branch/-b`	Git branch for beads commits
`--backend`	Storage backend (sqlite/dolt)
`--force`	Force re-initialization
`--contributor`	Run contributor wizard
`--team`	Run team wizard
`--stealth`	Enable stealth mode
`--setup-exclude`	Configure git exclude for forks
`--skip-hooks`	Skip git hooks installation
`--skip-merge-driver`	Skip merge driver setup
`--from-jsonl`	Import from existing issues.jsonl

No --role flag exists.

acreeger · 2026-02-08T15:55:20Z

Analysis: Beads CLI (`bd`) Prefix System Research

Fetch issue context
Research Beads CLI source code (GitHub repo)
Analyze bd init prefix behavior
Analyze bd create --id validation logic
Document findings

Executive Summary

The prefix mismatch error is a direct consequence of iloom's BeadsManager.init() not passing --prefix to bd init. Beads auto-detects the prefix from the current working directory name, producing iloom-test-project-github as the prefix. When BeadsSyncService.toBeadsId() then generates IDs like gh-54, Beads rejects them because they don't start with iloom-test-project-github-. The fix requires coordinating the prefix between bd init and task ID generation.

Question	Answer
Should the prefix be a short token like `gh` or the full repo slug like `iloom-test-project-github`?	Using the full repo slug prevents collisions across repos but creates verbose IDs. Beads supports any prefix format; this is an iloom design choice.
Should we use `--prefix` on init, or `--force` on create?	`--prefix` on init is the clean approach. `--force` bypasses all validation and is a blunt workaround.
Where should the prefix be derived from (repo name, org/repo, custom)?	The prefix needs to be consistent between init and ID generation. It can be derived from the repo slug or be configurable via swarm settings.

Impact Summary

2 files requiring modification: BeadsManager.ts (init method) and BeadsSyncService.ts (toBeadsId/fromBeadsId)
The prefix choice must be consistent between init and create calls
Existing tests in BeadsSyncService.test.ts assume gh- prefix and will need updating if the prefix changes

Complete Technical Reference

Problem Space Research

Problem Understanding

When iloom syncs GitHub child issues into Beads for swarm mode DAG orchestration, bd create --id gh-54 fails because the Beads database was initialized with prefix iloom-test-project-github (auto-detected from the directory name), but the ID gh-54 starts with gh, not iloom-test-project-github.

Architectural Context

Beads enforces prefix consistency: all IDs in a database must share the same prefix. This is by design to prevent cross-project contamination when databases are shared. iloom stores Beads databases outside the repo at ~/.config/iloom-ai/beads/<project-hash>, so cross-project contamination is already prevented by the project-hash directory isolation.

Edge Cases Identified

Multiple repos with same name: Two repos named my-app in different orgs would hash to different BEADS_DIR paths (iloom uses project path hash), so prefix collision is not a concern at the filesystem level
Prefix with special characters: Beads normalizes prefixes by stripping trailing hyphens. The prefix iloom-test-project-github is valid but verbose
Linear integration: Linear IDs like ENG-123 already have a prefix format. If Linear is the issue tracker, the prefix should match (e.g., ENG)

Third-Party Research Findings

Beads CLI (Go, ~4 months old)

Source: GitHub source code at https://github.com/steveyegge/beads (cloned and analyzed directly)

How `bd init` Sets the Prefix

Prefix determination follows strict precedence (from cmd/bd/init.go:122-152):

--prefix / -p flag: Highest priority. bd init --prefix gh sets prefix to gh
config.yaml value: config.GetString("issue-prefix") from .beads/config.yaml
Auto-detect from git history: Scans JSONL for existing issues, extracts prefix from first issue
Directory name fallback: filepath.Base(cwd) - the basename of CWD

After determination, trailing hyphens are stripped: strings.TrimRight(prefix, "-").

The prefix is permanently stored in SQLite: store.SetConfig(ctx, "issue_prefix", prefix) at init.go:369.
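
The precedence above can be restated as a short TypeScript sketch. Beads implements this in Go; the function and field names here are hypothetical and exist only to make the ordering concrete.

```typescript
// Hypothetical inputs mirroring the four sources listed above.
interface PrefixSources {
  flag?: string;          // --prefix / -p
  configValue?: string;   // issue-prefix from .beads/config.yaml
  historyPrefix?: string; // prefix of the first existing issue found in JSONL
  cwd: string;            // working directory; its basename is the last fallback
}

// Picks the first non-empty source in precedence order, then strips trailing hyphens.
function resolvePrefix(sources: PrefixSources): string {
  const basename = sources.cwd.split("/").filter((part) => part.length > 0).pop() ?? "";
  const chosen = sources.flag || sources.configValue || sources.historyPrefix || basename;
  return chosen.replace(/-+$/, "");
}

// With no flag, config, or history, the directory name wins, as in the failing case.
const autoDetected = resolvePrefix({ cwd: "/work/iloom-test-project-github" });
```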

How `bd create --id` Validates

From cmd/bd/create.go:458-483:

Reads issue_prefix from database (or daemon RPC)
Also reads allowed_prefixes for multi-prefix support
Calls validation.ValidateIDPrefixAllowed(explicitID, dbPrefix, allowedPrefixes, forceCreate)

The validation (internal/validation/bead.go:139-167) checks:

If force is true, skip all validation
If dbPrefix is empty, skip validation
If id starts with dbPrefix + "-", pass
If id starts with any entry in allowedPrefixes + "-", pass
Otherwise: error with "prefix mismatch: database uses '{dbPrefix}-' but ID '{id}' doesn't match (use --force to override)"
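
Restated in TypeScript for illustration, the checks run in this order. This is not the Beads source (which is Go); the function name is hypothetical, and allowedPrefixes is modelled as an array instead of the comma-separated config value.

```typescript
// Returns null when the ID is accepted, or the mismatch message when it is rejected.
function validateIdPrefix(
  id: string,
  dbPrefix: string,
  allowedPrefixes: string[],
  force: boolean,
): string | null {
  if (force) return null;                          // --force skips all validation
  if (dbPrefix === "") return null;                // no prefix configured: nothing to check
  if (id.startsWith(`${dbPrefix}-`)) return null;  // matches the primary prefix
  if (allowedPrefixes.some((p) => id.startsWith(`${p}-`))) return null; // matches an allowed prefix
  return `prefix mismatch: database uses '${dbPrefix}-' but ID '${id}' doesn't match (use --force to override)`;
}

// The failing case from this investigation: gh-54 against an auto-detected prefix.
const mismatch = validateIdPrefix("gh-54", "iloom-test-project-github", [], false);
```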

ID Format Requirements

From internal/validation/bead.go:55-76 (ValidateIDFormat):

Must contain at least one hyphen
Format: prefix-hash or prefix-number (e.g., bd-a3f8e9, bd-42)
Hierarchical: prefix-hash.number (e.g., bd-a3f8.1)
Multi-hyphen prefixes supported: web-app-a3f8e9 extracts as prefix web-app

Querying the Current Prefix

Two methods:

bd info --json returns { "issue_prefix": "..." } among other fields
bd config get issue_prefix reads from the database (not documented for this key, but it uses the same mechanism)

Multi-Prefix Support

allowed_prefixes is a comma-separated config value stored in the database: store.GetConfig(ctx, "allowed_prefixes"). IDs matching any prefix in this list (plus the primary issue_prefix) pass validation.

`--force` Bypass

bd create --id gh-54 --force skips all prefix validation. This is the emergency escape hatch but is not the intended workflow.

Codebase Research Findings

Affected Area: BeadsManager + BeadsSyncService

Entry Point for the bug: BeadsManager.init() at /Users/adam/Documents/Projects/iloom-cli/feat-issue-557__autonomous-swarm-mode/src/lib/BeadsManager.ts:168-181

The init method calls bd init --quiet --skip-hooks --skip-merge-driver without --prefix. Since BEADS_DIR is set to a directory outside the repo, and cwd is set to projectPath, Beads auto-detects the prefix from filepath.Base(projectPath).

For repo iloom-ai/iloom-test-project-github, if the checkout is at a path ending in iloom-test-project-github, the prefix becomes iloom-test-project-github.

ID Generation: BeadsSyncService.toBeadsId() at /Users/adam/Documents/Projects/iloom-cli/feat-issue-557__autonomous-swarm-mode/src/lib/BeadsSyncService.ts:30-32

This function prefixes numeric GitHub IDs with gh- (e.g., 54 becomes gh-54). This hardcoded gh- prefix does not match the database prefix.

Dependencies:

BeadsManager.init() is called from SwarmSupervisor during epic startup
BeadsSyncService.syncEpicToBeads() calls BeadsManager.create() with IDs from toBeadsId()
toBeadsId() and fromBeadsId() are used throughout swarm code for ID conversion

Key Design Tension

The user's stated preference is IDs that include repo info to prevent collisions (e.g., iloom-test-project-github-54). However, iloom already isolates Beads databases per-project via a SHA-256 hash of the project path (BeadsManager.computeProjectHash()), so cross-project collision is already impossible at the filesystem level. The question is whether human-readable disambiguation is worth the verbosity.

Possible Approaches (for reference, not recommendation)

Pass --prefix to bd init with a value that matches what toBeadsId() generates (e.g., gh)
Change toBeadsId() to generate IDs that match whatever prefix Beads auto-detects
Use --force on every bd create call to bypass validation
Use allowed_prefixes to register gh as an additional allowed prefix after init

Affected Files

/Users/adam/Documents/Projects/iloom-cli/feat-issue-557__autonomous-swarm-mode/src/lib/BeadsManager.ts:168-181 - init() method does not pass --prefix to bd init
/Users/adam/Documents/Projects/iloom-cli/feat-issue-557__autonomous-swarm-mode/src/lib/BeadsSyncService.ts:30-32 - toBeadsId() hardcodes gh- prefix
/Users/adam/Documents/Projects/iloom-cli/feat-issue-557__autonomous-swarm-mode/src/lib/BeadsSyncService.ts:40-42 - fromBeadsId() hardcodes gh- stripping
/Users/adam/Documents/Projects/iloom-cli/feat-issue-557__autonomous-swarm-mode/src/lib/BeadsSyncService.test.ts - Tests assume gh- prefix format throughout

Integration Points

BeadsManager.init() is called by SwarmSupervisor which passes this.projectPath as CWD
BeadsSyncService uses BeadsManager.create() with IDs from toBeadsId()
fromBeadsId() is used to convert Beads task IDs back to issue tracker IDs for GitHub API calls
The bd CLI reads BEADS_DIR env var set by BeadsManager.execBd() for all operations

Beads CLI Reference Summary

Command	Relevant Behavior
`bd init --prefix <value>`	Sets prefix explicitly, stored in SQLite
`bd init` (no --prefix)	Auto-detects: config.yaml > git history > directory name
`bd create --id <id>`	Validates ID starts with `{prefix}-`
`bd create --id <id> --force`	Bypasses prefix validation
`bd info --json`	Returns `issue_prefix` among other config
`bd config set allowed_prefixes "gh,other"`	Adds allowed prefixes for multi-prefix support

acreeger · 2026-02-08T16:02:43Z

Implementation Complete - Fix Beads task ID prefix system

Summary

Replaced the hardcoded gh- prefix in Beads task IDs with a repo-aware prefix derived from the repository name. The prefix is now set via bd init --prefix <repo-name> and queried at runtime via bd info --json.

Changes Made

BeadsManager.ts: Added prefix parameter to init() and new getPrefix() method
BeadsSyncService.ts: toBeadsId() and fromBeadsId() now accept a prefix parameter; sync queries prefix from BeadsManager.getPrefix() with caching
SwarmSupervisor.ts: Added beadsPrefix to EpicLoomContext; all fromBeadsId() calls use the dynamic prefix
start.ts: Fetches repo info via getRepoInfo() and passes repoInfo.name as the Beads prefix

Validation Results

Tests: 3935 passed / 3935 total (119 test files)
Typecheck/Build: Passed
Lint: Passed

Detailed Changes by File

Files Modified

`src/lib/BeadsManager.ts`

Changes: Added optional prefix parameter to init() which passes --prefix to bd init; added getPrefix() method that queries bd info --json for the issue_prefix field.

`src/lib/BeadsSyncService.ts`

Changes: toBeadsId(issueId, prefix) and fromBeadsId(beadsId, prefix) now require a prefix parameter instead of hardcoding gh-. The BeadsSyncService class caches the prefix via getBeadsPrefix() using BeadsManager.getPrefix().

`src/lib/SwarmSupervisor.ts`

Changes: Added beadsPrefix field to EpicLoomContext interface. Supervisor loads prefix via beadsManager.getPrefix() after init and uses it in all fromBeadsId() calls (closeIssue, findPRForBranch, parseIssueIdentifier). Passes prefix to beadsManager.init().

`src/commands/start.ts`

Changes: Imports getRepoInfo from github utils. Derives beadsPrefix from repoInfo.name and passes it to the supervisor via EpicLoomContext.beadsPrefix.

`src/lib/BeadsManager.test.ts`

Changes: Added tests for init() with prefix parameter and getPrefix() method.

`src/lib/BeadsSyncService.test.ts`

Changes: Updated all test data to use iloom-test-project prefix instead of gh-. Added test for prefix caching. Updated toBeadsId/fromBeadsId tests for new signature.

`src/lib/SwarmSupervisor.test.ts`

Changes: Added getPrefix mock to BeadsManager, updated all task IDs from gh-XXX to test-repo-XXX, added beadsPrefix to EpicLoomContext.

`src/commands/start-swarm.test.ts`

Changes: Added getRepoInfo mock, added getPrefix to BeadsManager mock, verified beadsPrefix is passed in EpicLoomContext.

acreeger · 2026-02-08T22:03:14Z

Implementation: Simplify Beads prefix system

Remove getPrefix() method from BeadsManager.ts
Update BeadsSyncService.ts to accept prefix as constructor parameter
Update SwarmSupervisor.ts to use epicLoom.beadsPrefix directly
Update start.ts to pass prefix to BeadsSyncService constructor
Update BeadsManager.test.ts - remove getPrefix tests
Update BeadsSyncService.test.ts - update constructor, remove getPrefix mock
Update SwarmSupervisor.test.ts - remove getPrefix mock
Update start-swarm.test.ts - remove getPrefix mock
Run pnpm build and pnpm test to verify

acreeger · 2026-02-08T22:05:05Z

Implementation Complete - Issue #557 (Simplify Beads Prefix) ✅

Summary

Removed stale getPrefix references from test mocks across 3 test files. The production code (BeadsManager.ts, BeadsSyncService.ts, SwarmSupervisor.ts, start.ts) was already correctly using a deterministic prefix pattern -- BeadsSyncService accepts prefix via constructor, SwarmSupervisor uses epicLoom.beadsPrefix, and start.ts computes it from repoInfo.name. No production code changes were needed.

Changes Made

src/lib/BeadsSyncService.test.ts: Replaced stale getPrefix cache test with a test verifying the constructor-provided prefix is used for all task IDs
src/lib/SwarmSupervisor.test.ts: Removed getPrefix from createMockBeadsManager() mock
src/commands/start-swarm.test.ts: Removed getPrefix from BeadsManager mock (2 occurrences)

Validation Results

✅ Build: Passed
✅ Tests: 3910 passed / 3933 total (23 skipped)
✅ All 119 test files passed

📋 Detailed Changes by File

Files Modified

`src/lib/BeadsSyncService.test.ts`

Changes: Replaced the "should cache the prefix across multiple calls within the same sync" test (which asserted mockBeadsManager.getPrefix was called once) with "should use the prefix passed via constructor for all task IDs" (which verifies both create calls use the constructor-provided prefix).

`src/lib/SwarmSupervisor.test.ts`

Changes: Removed getPrefix: vi.fn().mockResolvedValue(TEST_PREFIX) from the mock BeadsManager factory at line 70.

`src/commands/start-swarm.test.ts`

Changes: Removed all getPrefix: vi.fn().mockResolvedValue('iloom-test-project') entries from the BeadsManager mock (lines 62 and 476).

Dependencies Added

None

acreeger · 2026-02-08T22:08:00Z

Implementation Complete - Add user-visible logging to Beads DAG sync

Summary

Added user-visible logger.info() calls to BeadsSyncService.syncEpicToBeads() so users can see the DAG setup process. Changed existing logger.debug() calls at key milestones to logger.info() and removed the redundant summary line from SwarmSupervisor since BeadsSyncService now handles the final summary.

Changes Made

src/lib/BeadsSyncService.ts: Changed open child issue count from debug to info, added per-task creation log, added per-dependency link log, updated final summary to show "DAG ready: N tasks, M dependencies" format with proper singular/plural
src/lib/SwarmSupervisor.ts: Removed redundant "Synced N tasks" log line (now handled by BeadsSyncService's "DAG ready" summary)
src/lib/BeadsSyncService.test.ts: Added 5 new tests verifying user-visible logging for child count, task creation, dependency linking, DAG summary, and singular forms

Validation Results

Tests: 3915 passed / 3915 total (22 BeadsSyncService tests including 5 new)
Build: Passed
No lint issues

Detailed Changes by File

Files Modified

src/lib/BeadsSyncService.ts

Changes: Upgraded 3 debug log calls to info-level user-visible output, added 1 new dependency log line

Line 87: logger.info(' Found N open child issue(s)') - shows count after filtering closed issues
Line 122: logger.info(' Creating task: #ID - Title') - shows each task as it's created
Line 155: logger.info(' Linking dependency: #X depends on #Y') - shows each dependency link
Lines 180-183: logger.info(' DAG ready: N tasks, M dependencies') - final summary with proper singular/plural
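
The singular/plural handling in the final summary could be sketched as below. The helper names pluralize and formatDagSummary are hypothetical, and the exact message text emitted by BeadsSyncService may differ from this approximation of the "DAG ready: N tasks, M dependencies" format.

```typescript
// Hypothetical helpers illustrating the singular/plural summary format.

/** Return "<count> <word>", choosing the singular form only when count is exactly 1. */
function pluralize(count: number, singular: string, plural: string): string {
  return `${count} ${count === 1 ? singular : plural}`
}

/** Format the final DAG summary line from task and dependency counts. */
function formatDagSummary(taskCount: number, dependencyCount: number): string {
  const tasks = pluralize(taskCount, 'task', 'tasks')
  const dependencies = pluralize(dependencyCount, 'dependency', 'dependencies')
  return `DAG ready: ${tasks}, ${dependencies}`
}
```

Passing the plural explicitly avoids a naive "append s" rule, which would produce "dependencys".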

src/lib/SwarmSupervisor.ts

Changes: Removed redundant summary line after sync (line 215 previously: logger.info('Synced N tasks...'))

src/lib/BeadsSyncService.test.ts

Changes: Added logger import and 5 new test cases

should log user-visible output for child issue count
should log user-visible output for each created task
should log user-visible output for dependencies
should log DAG summary with correct counts
should use singular forms when counts are 1

acreeger · 2026-02-09T00:50:02Z

Implementation: Fix swarm mode worktree reuse bug

Analyze bug: reuseIloom() doesn't check swarmMode, launches IDE/terminal/dev server/Claude
Fix reuseIloom() to add swarm mode fast path (skip color sync, issue status, launching)
Add test: reusing worktree with swarmMode: true should not launch components
Add test: reusing worktree with swarmMode: true should write metadata with swarmAgent flag
Run build and tests to validate

acreeger · 2026-02-09T00:54:17Z

Implementation Complete - Issue #557 (Swarm Mode Reuse Path Fix)

Summary

Fixed the bug where SwarmSupervisor child looms launched IDE, terminals, and dev servers when reusing existing worktrees in swarm mode. The reuseIloom() method in LoomManager.ts now checks for swarmMode and calls finishSwarmLoomReuse() to skip all interactive/visual components, matching the behavior of the new worktree creation path.
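
The shape of the fast path can be sketched as follows. This is a minimal illustration under stated assumptions, not the LoomManager code: ReuseOptions, LoomMetadata, ReuseDeps, reuseLoom, and the injected callbacks are hypothetical names, and the real method handles more steps (environment files, colors, issue status) than are shown here.

```typescript
// Hypothetical sketch of a swarm-mode early return on the worktree reuse path.

interface ReuseOptions {
  worktreePath: string
  swarmMode?: boolean
}

interface LoomMetadata {
  path: string
  swarmAgent: boolean
}

interface ReuseDeps {
  writeMetadata: (metadata: LoomMetadata) => void
  // Stands in for launching the IDE, terminal, dev server, and Claude.
  launchWorkspace: (path: string) => void
}

function reuseLoom(options: ReuseOptions, deps: ReuseDeps): LoomMetadata {
  const metadata: LoomMetadata = {
    path: options.worktreePath,
    swarmAgent: options.swarmMode === true,
  }
  deps.writeMetadata(metadata)

  if (options.swarmMode) {
    // Fast path: a headless swarm agent needs no interactive components.
    return metadata
  }

  deps.launchWorkspace(options.worktreePath)
  return metadata
}
```

Injecting the launcher as a dependency makes the "nothing is launched in swarm mode" property straightforward to assert in a unit test.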

Changes Made

src/lib/LoomManager.ts: Added finishSwarmLoomReuse() private method that mirrors finishSwarmLoom() for reused worktrees (skips color sync, issue status, IDE, terminal, dev server, Claude launch). Added swarm mode check in reuseIloom() before the launch sequence.
src/lib/LoomManager.test.ts: Added 3 tests for swarm mode reuse: verifying no workspace components are launched, metadata includes swarmAgent flag, and environment files are still copied.
src/lib/SwarmSupervisor.ts: Added beadsPrefix to EpicLoomContext, uses fromBeadsId() to strip prefix for GitHub operations (closeIssue, findPRForBranch, parseIssueIdentifier), added duplicate progress line suppression.
src/lib/SwarmSupervisor.test.ts: Updated task IDs to use prefixed Beads format (test-repo-100), added test for progress line deduplication.
Other files (BeadsManager, BeadsSyncService, start-swarm, start): Related swarm infrastructure changes.

Validation Results

Tests: 3919 passed / 3942 total (23 skipped)
Build: Passed
All test files: 119 passed

Detailed Changes by File

Files Modified

`src/lib/LoomManager.ts`

Changes: Added swarm mode fast path in reuseIloom() at line ~1417, and finishSwarmLoomReuse() private method (lines 1273-1354) that writes metadata with swarmAgent: true and neutral color, then returns a Loom without launching any interactive components.

`src/lib/LoomManager.test.ts`

Changes: Added 3 test cases under the existing worktree reuse describe block:

should not launch any workspace components when reusing worktree in swarm mode
should write metadata with swarmAgent flag when reusing worktree in swarm mode
should still copy environment files when reusing worktree in swarm mode

`src/lib/SwarmSupervisor.ts`

Changes: Added beadsPrefix field to EpicLoomContext, uses it in closeIssue(), findPRForBranch(), and parseIssueIdentifier() via fromBeadsId(). Added progress line dedup logic.

`src/lib/SwarmSupervisor.test.ts`

Changes: Updated all task IDs from '100' to 'test-repo-100' format to match Beads prefix behavior. Added dedup progress test.

acreeger · 2026-02-09T02:56:52Z

Analysis: SwarmSupervisor Bugs (Issue #557)

Analyze Bug 1: Swarm stops after first task completes
Analyze Bug 2: PR search hits GitHub API rate limit
Document findings

Executive Summary

Two bugs in SwarmSupervisor.orchestrate() cause the swarm to stop after completing just 1 of 6 tasks. Bug 1 is a premature loop exit: the exit condition only checks whether bd ready returns empty, without checking whether uncompleted tasks remain in the DAG. Bug 2 amplifies this: the PR search hits GitHub's rate limit (exhausted by the child agent), causing findPRForBranch to throw. That error is caught and swallowed, so the task gets marked "completed without PR" and its PR is never merged.

HIGH/CRITICAL Risks

Premature loop exit loses work: The exit condition at SwarmSupervisor.ts:279 does not verify all tasks are finished. If bd ready returns empty for even one iteration (propagation delay, closeTask failure, or Beads state inconsistency), the loop breaks and the swarm exits with uncompleted tasks.
Silent closeTask failure blocks dependents: closeTask() at line 760-765 swallows all errors. If bd close fails, dependent tasks never get unblocked in the DAG, and the swarm exits thinking there is nothing left to do.
Rate limit causes missed PR merge: When findPRForBranch fails due to rate limits, the task is marked "completed without PR" -- but the agent likely DID create a PR. That PR is never merged into the epic branch.

Impact Summary

2 methods need modification in /src/lib/SwarmSupervisor.ts
1 method needs rate-limit retry logic: findPRForBranch()
Loop exit condition at line 279 needs to incorporate a total task completion check
Tests in /src/lib/SwarmSupervisor.test.ts need updates for new behaviors

Complete Technical Reference

Problem Space Research

Problem Understanding

The swarm supervisor orchestrates N child agents working on an epic's sub-issues. It relies on Beads (a DAG task tracker) to determine which tasks are ready (unblocked). When a task completes, closing it in Beads should unblock dependent tasks, making them appear in subsequent bd ready calls. Two failures conspire to break this flow.

Architectural Context

The supervisor loop follows a poll-based model: each iteration queries bd ready, claims/spawns agents, checks for completions, processes merges, then checks exit. The exit condition assumes that if bd ready returns empty AND no agents are running AND no merges pending, all work is done. This assumption is invalid when tasks exist in the DAG but aren't yet surfaced by bd ready.

Codebase Research Findings

Bug 1: Premature Loop Exit

Root Cause: The exit condition at /src/lib/SwarmSupervisor.ts:278-281 is:

const actionableReadyTasks = readyTasks.filter(t => !this.permanentlyFailed.has(t.id))
if (actionableReadyTasks.length === 0 && this.activeAgents.size === 0 && this.mergeQueue.length === 0 && this.pendingReleases === 0) {
    break
}

This checks only the current bd ready output. It does not check whether uncompleted tasks remain in the DAG. After task 54 completes:

checkCompletedAgents() processes task 54's completion (line 462-508)
closeTask() is called (line 501), which calls beadsManager.close() to mark it done in Beads
On the next iteration, bd ready is called (line 241). If the 5 dependent tasks haven't been unblocked yet (due to Beads propagation timing, or if closeTask silently failed), bd ready returns []
Exit condition is met: readyTasks empty, no active agents, no merge queue, no pending releases -> loop breaks

Contributing factor: closeTask() at line 760-765 wraps beadsManager.close() in try/catch and only warns on failure. If the Beads close command fails, dependent tasks are never unblocked, but the supervisor doesn't know.

The correct exit condition should verify that result.completed + result.failed + this.activeAgents.size + this.mergeQueue.length >= result.totalTasks, i.e., all tasks have been accounted for. Only then is it safe to exit.
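
To make the difference concrete, the sketch below expresses the current condition and the proposed accounting check as pure predicates over a snapshot of the loop state. LoopState and the function names are hypothetical; the field meanings mirror the counters named above.

```typescript
// Hypothetical snapshot of the supervisor's counters at the end of one poll cycle.
interface LoopState {
  completed: number
  failed: number
  activeAgents: number
  mergeQueueLength: number
  pendingReleases: number
  actionableReadyTasks: number
  totalTasks: number
}

/** Current behavior: exit as soon as nothing is ready, running, or queued. */
function currentShouldExit(s: LoopState): boolean {
  return (
    s.actionableReadyTasks === 0 &&
    s.activeAgents === 0 &&
    s.mergeQueueLength === 0 &&
    s.pendingReleases === 0
  )
}

/** Proposed accounting check: every task is finished, running, or awaiting merge. */
function allTasksAccountedFor(s: LoopState): boolean {
  return s.completed + s.failed + s.activeAgents + s.mergeQueueLength >= s.totalTasks
}

/** Proposed behavior: exit only when idle AND all tasks are accounted for. */
function proposedShouldExit(s: LoopState): boolean {
  return allTasksAccountedFor(s) && currentShouldExit(s)
}
```

With 1 of 6 tasks completed and an empty bd ready result, currentShouldExit returns true while proposedShouldExit returns false, which is the scenario described above. One consequence worth noting: under the proposed predicate, dependents that can never become ready because their dependency failed permanently would have to be counted as failed, otherwise the sum never reaches totalTasks.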

Bug 2: Rate-Limited PR Search

Entry Point: /src/lib/SwarmSupervisor.ts:775-809 - findPRForBranch()

Mechanism:

Child agent (task 54) runs il spin -p, which internally makes many GitHub API calls during its work session
When agent completes, supervisor calls findPRForBranch() at line 482
findPRForBranch() executes gh pr list --search "is:pr is:open 54 in:title" (line 782-783)
This search uses GitHub's GraphQL API, which shares the rate limit budget with the child agent
GitHub returns "API rate limit already exceeded" with exit code 1
The error is caught at line 795-808 and re-thrown (it doesn't match "no pull requests" patterns)
The re-thrown error is caught at line 483-486 in checkCompletedAgents(), logged, and prNumber stays null
Falls through to line 498-503: task marked "completed without PR", never merged

Sub-issues with the search query:

gh pr list --search "54 in:title" is imprecise. Issue number "54" could match PR titles containing "54" in any context (e.g., "Update 54 dependencies", "PR #154")
The --search flag sends a GraphQL query, which is more rate-limit-expensive than REST filtering
gh pr list --head <branch-name> would be a lighter-weight alternative that filters by branch name on the REST API side
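
The precision gap between the two lookups can be illustrated with an in-memory model. The PR data and branch names below are invented for illustration, and plain substring matching stands in for GitHub's title search, whose actual matching rules may differ.

```typescript
// Hypothetical PR records; numbers, titles, and branch names are made up.
interface PullRequest {
  number: number
  title: string
  headRefName: string
}

const openPrs: PullRequest[] = [
  { number: 201, title: 'Fix issue 54: add retry', headRefName: 'feat/issue-54__retry' },
  { number: 202, title: 'Update 54 dependencies', headRefName: 'chore/deps' },
  { number: 203, title: 'Follow-up to #154', headRefName: 'fix/issue-154__followup' },
]

/** Approximates a title search: any PR whose title contains the issue number. */
function matchByTitle(prs: PullRequest[], issueId: string): PullRequest[] {
  return prs.filter((pr) => pr.title.includes(issueId))
}

/** Approximates a head-branch filter: only the PR opened from exactly this branch. */
function matchByHead(prs: PullRequest[], branchName: string): PullRequest[] {
  return prs.filter((pr) => pr.headRefName === branchName)
}
```

Here matchByTitle(openPrs, '54') returns all three PRs, while matchByHead(openPrs, 'feat/issue-54__retry') returns only the intended one.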

executeGhCommand behavior (/src/utils/github.ts:13-32): No retry logic. No rate-limit detection. Throws on any non-zero exit code from gh. The 30-second timeout at line 19 is the only protection.

Affected Files

/src/lib/SwarmSupervisor.ts:278-281 - Loop exit condition: too aggressive, doesn't check for uncompleted tasks remaining in DAG
/src/lib/SwarmSupervisor.ts:760-765 - closeTask(): swallows errors that could prevent dependent task unblocking
/src/lib/SwarmSupervisor.ts:775-809 - findPRForBranch(): no rate-limit retry, imprecise title search query
/src/lib/SwarmSupervisor.ts:480-504 - checkCompletedAgents(): catches PR search failure and proceeds as if no PR exists
/src/lib/SwarmSupervisor.test.ts - Tests need cases for: premature exit with remaining tasks, rate-limited PR search retry

Similar Patterns Found

The pendingReleases counter (lines 166, 383-385, 543) was introduced to prevent premature exit after claim releases. This is the same class of problem -- the exit condition needs awareness of "tasks that exist but aren't yet visible in bd ready." The pendingReleases approach is a band-aid; checking total task completion would be more robust.

Edge Cases Identified

Beads propagation delay: After bd close, there may be a brief window where bd ready hasn't re-evaluated the DAG. A single poll cycle returning empty would cause premature exit.
All tasks blocked by failed task: If task A fails permanently and tasks B-F depend on A, they will never become ready. The current code handles this via permanentlyFailed filter, but does NOT detect that B-F are permanently stuck. The loop would exit correctly (no actionable tasks), but result.failed would only count 1, not 6.
Rate limit recovery: GitHub rate limits reset on a per-hour window. A retry with exponential backoff (e.g., wait 60s) could recover without user intervention, but would slow the loop significantly for one task.

Medium Severity Risks

Imprecise PR title search: Searching 54 in:title could match wrong PRs (e.g., PR #154, or a PR mentioning "54" in its title). The exactMatch filter on line 786 mitigates this somewhat by preferring PRs whose branch name contains the issue ID, but the initial search can still return false positives.
maxRetries=1 semantic confusion: The docstring at line 514-516 says "maxRetries=1 means 1 total attempt (no retries)" but the setting name maxRetries implies "number of retries after first attempt." The default of 1 means tasks get exactly one shot with no retry, which may be surprising.
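
The two readings of maxRetries differ by exactly one attempt. The sketch below spells that out; totalAttempts and the semantics labels are hypothetical names used only for illustration.

```typescript
// Hypothetical helper contrasting the two interpretations of a maxRetries setting.
type RetrySemantics = 'total-attempts' | 'retries-after-first'

/** Number of times a task would be run for a given maxRetries value. */
function totalAttempts(maxRetries: number, semantics: RetrySemantics): number {
  // 'total-attempts' matches the docstring: maxRetries=1 means one attempt, no retries.
  // 'retries-after-first' matches what the setting name suggests: one attempt plus maxRetries retries.
  return semantics === 'total-attempts' ? maxRetries : maxRetries + 1
}
```

Under the documented semantics the default of 1 yields a single attempt; under the reading the name suggests, the same value would yield two.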

acreeger · 2026-02-09T03:03:27Z

Combined Analysis & Plan - Issue #557 (Swarm Bugs)

Executive Summary

Three bugs in SwarmSupervisor: (1) the orchestration loop exits after the first task completes because the exit condition doesn't verify all tasks are done -- blocked tasks won't appear in bd ready but aren't finished; (2) findPRForBranch uses imprecise --search with title matching instead of exact --head <branch> matching; (3) no retry logic for rate-limited GitHub API calls. Additionally, closeTask() silently swallows errors, preventing dependent tasks from being unblocked.

HIGH/CRITICAL Risks

Premature exit loses work: With 6 tasks where task 3 depends on task 1, completing task 1 and closing it in Beads can cause bd ready to return empty for one cycle (before task 3 becomes unblocked), triggering the exit condition with 5 tasks remaining.
closeTask swallowing errors blocks DAG: If bd close fails in processMergeQueue, dependent tasks are never unblocked and the swarm silently stalls or exits incomplete.

Implementation Overview

High-Level Execution Phases

Fix premature exit: Add result.completed + result.failed >= result.totalTasks check to exit condition
Fix closeTask error handling: Propagate errors from closeTask in critical paths (merge queue), keep non-fatal in failure paths
Store branch names: Add taskBranchNames map and populate from loom.branch in claimAndSpawnAgent
Fix findPRForBranch: Switch from --search to --head <branch-name>
Add rate limit retry: Create executeGhCommandWithRetry wrapper with exponential backoff for rate-limited calls
Update tests: Cover all three fixes

Quick Stats

2 files to modify (SwarmSupervisor.ts, SwarmSupervisor.test.ts)
1 file to modify for retry utility (github.ts)
0 new files
Dependencies: None

Complete Analysis & Implementation Details

Research Findings

Problem Space

Problem: SwarmSupervisor has three bugs that cause premature exit, imprecise PR matching, and fragility under rate limits.
Architectural context: SwarmSupervisor orchestrates headless agents via Beads DAG; all three bugs are in the supervisor loop and its helper methods.
Edge cases: (1) All tasks blocked by a single dependency that just closed; (2) PR branch name not matching issue ID format; (3) GitHub returning 403 rate limit during critical merge flow.

Codebase Research

Exit condition: SwarmSupervisor.ts:278-281 -- checks actionableReadyTasks.length === 0 && activeAgents.size === 0 && mergeQueue.length === 0 && pendingReleases === 0 but doesn't check if all tasks are actually done.
closeTask: SwarmSupervisor.ts:760-765 -- catches all errors from beadsManager.close() and logs warning only. Called from processMergeQueue:580, handleMergeConflict:665, handleAgentFailure:550, and checkCompletedAgents:501.
findPRForBranch: SwarmSupervisor.ts:775-809 -- uses --search with rawId in:title. The _epicLoom param is unused (prefixed with underscore).
Branch name availability: claimAndSpawnAgent at line 392-406 calls loomManager.createIloom() which returns a Loom object with .branch field, but only .path is stored in taskLoomPaths.
executeGhCommand: github.ts:13-32 -- thin wrapper with no retry logic.
ActiveAgent interface: SwarmSupervisor.ts:29-40 -- has no branch name field.

Affected Files

/src/lib/SwarmSupervisor.ts:141-173 - Private fields (add taskBranchNames map)
/src/lib/SwarmSupervisor.ts:278-281 - Exit condition in run() loop
/src/lib/SwarmSupervisor.ts:409 - claimAndSpawnAgent where loom.path is stored (also store loom.branch)
/src/lib/SwarmSupervisor.ts:566-607 - processMergeQueue where closeTask errors need to propagate
/src/lib/SwarmSupervisor.ts:760-765 - closeTask method (split into critical/non-critical paths)
/src/lib/SwarmSupervisor.ts:775-809 - findPRForBranch method
/src/lib/SwarmSupervisor.ts:732-736 - mergePR (add retry)
/src/utils/github.ts:13-32 - executeGhCommand (add retry wrapper)
/src/lib/SwarmSupervisor.test.ts - Tests for all three fixes

Medium Severity Risks

Retry delay in tests: Rate limit retry with real delays will slow tests; need to ensure sleep mock covers the retry utility.

Implementation Plan

Automated Test Cases to Create

Test File: /src/lib/SwarmSupervisor.test.ts (MODIFY)

Complete test structure

describe('premature exit fix', () => {
  it('should NOT exit when bd ready returns empty but tasks remain incomplete', async () => {
    // Setup: 3 tasks synced, task 1 ready, tasks 2+3 blocked on task 1
    // Cycle 1: bd ready returns [task1], claim+spawn, agent completes
    // Cycle 2: bd ready returns [] (task1 just closed, task2/3 not yet unblocked)
    // Verify: loop does NOT exit. Cycle 3: bd ready returns [task2, task3]
    // result.completed should be 3, not 1
  })
})

describe('closeTask error propagation', () => {
  it('should propagate closeTask error in processMergeQueue and count as failure', async () => {
    // Setup: agent succeeds, PR found, merge succeeds, bd close throws
    // Verify: error is thrown/caught at processMergeQueue level, task counted as failed
  })

  it('should still log-and-continue for closeTask errors in handleAgentFailure', async () => {
    // Setup: agent fails, exhausts retries, bd close throws
    // Verify: still counted as failed, no crash
  })
})

describe('findPRForBranch uses --head', () => {
  it('should search PR by head branch name instead of title search', async () => {
    // Verify executeGhCommand called with ['pr', 'list', '--state', 'open', '--json', 'number,headRefName', '--head', '<branch-name>']
  })
})

describe('rate limit retry', () => {
  it('should retry on rate limit error with backoff', async () => {
    // Setup: executeGhCommand fails with rate limit error, then succeeds
    // Verify: retried and succeeded
  })

  it('should give up after max retries', async () => {
    // Setup: executeGhCommand keeps failing with rate limit
    // Verify: eventually throws
  })
})

Test File: /src/utils/github.test.ts (CHECK IF EXISTS, otherwise add to SwarmSupervisor.test.ts)

describe('executeGhCommandWithRetry', () => {
  // Test retry on rate limit (403 + "rate limit")
  // Test no retry on non-rate-limit errors
  // Test max retries respected
})

Files to Modify

1. `/src/utils/github.ts`:13-32

Change: Add executeGhCommandWithRetry function that wraps executeGhCommand with exponential backoff for rate limit errors (HTTP 403 or "rate limit" in error message).

Implementation guidance

// Add after executeGhCommand (line 32):
export async function executeGhCommandWithRetry<T = unknown>(
  args: string[],
  options?: { cwd?: string; timeout?: number; maxRetries?: number; baseDelayMs?: number }
): Promise<T> {
  const maxRetries = options?.maxRetries ?? 3
  const baseDelayMs = options?.baseDelayMs ?? 5000
  
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await executeGhCommand<T>(args, options)
    } catch (error) {
      const isRateLimit = error instanceof Error && 
        (error.message.includes('rate limit') || error.message.includes('403') || 
         error.message.includes('secondary rate limit') || error.message.includes('API rate limit'))
      
      if (!isRateLimit || attempt >= maxRetries) throw error
      
      const delay = baseDelayMs * Math.pow(2, attempt) // exponential backoff
      logger.warn(`GitHub rate limit hit, retrying in ${delay / 1000}s (attempt ${attempt + 1}/${maxRetries})...`)
      await sleep(delay) // import from timers/promises
    }
  }
  throw new Error('unreachable')
}
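
For a sense of the wait times this loop implies, the delay schedule can be computed separately. The helper backoffDelays is a hypothetical name; the formula and the defaults (baseDelayMs 5000, maxRetries 3) are taken from the guidance above.

```typescript
// Hypothetical helper: lists the sleep durations the retry loop would use before giving up.

/** Delay before retry number (attempt + 1) is baseDelayMs * 2^attempt. */
function backoffDelays(baseDelayMs: number, maxRetries: number): number[] {
  return Array.from({ length: maxRetries }, (_, attempt) => baseDelayMs * Math.pow(2, attempt))
}

/** Total time spent sleeping if every attempt is rate limited. */
function totalBackoffMs(baseDelayMs: number, maxRetries: number): number {
  return backoffDelays(baseDelayMs, maxRetries).reduce((sum, delay) => sum + delay, 0)
}
```

With the defaults, the delays are 5000, 10000, and 20000 ms, for 35 seconds of waiting in the worst case. Because the loop runs while attempt <= maxRetries, maxRetries here counts retries after the first attempt, so the defaults allow up to four calls in total.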

2. `/src/lib/SwarmSupervisor.ts`:11

Change: Import executeGhCommandWithRetry alongside executeGhCommand.

3. `/src/lib/SwarmSupervisor.ts`:172

Change: Add private taskBranchNames: Map<string, string> = new Map() field after taskLoomPaths.

4. `/src/lib/SwarmSupervisor.ts`:409

Change: After this.taskLoomPaths.set(task.id, loom.path), add this.taskBranchNames.set(task.id, loom.branch).

5. `/src/lib/SwarmSupervisor.ts`:278-281 (exit condition)

Change: Add a check that all tasks are accounted for before allowing exit. Replace the current exit condition with:

const allTasksAccountedFor = result.totalTasks === 0 || (result.completed + result.failed >= result.totalTasks)
const actionableReadyTasks = readyTasks.filter(t => !this.permanentlyFailed.has(t.id))
if (allTasksAccountedFor && actionableReadyTasks.length === 0 && this.activeAgents.size === 0 && this.mergeQueue.length === 0 && this.pendingReleases === 0) {
  break
}

6. `/src/lib/SwarmSupervisor.ts`:760-765 (closeTask)

Change: Make closeTask propagate errors by default. Add a swallow option parameter for non-critical call sites. Critical callers (processMergeQueue line 580, handleMergeConflict line 665) should let the error propagate. Non-critical callers (handleAgentFailure line 550, checkCompletedAgents line 501) should pass { swallow: true }.

Implementation guidance

// Replace closeTask:
private async closeTask(taskId: string, reason?: string, options?: { swallow?: boolean }): Promise<void> {
  try {
    await this.beadsManager.close(taskId, reason)
  } catch (error) {
    const message = `Failed to close Beads task ${taskId}: ${error instanceof Error ? error.message : 'Unknown error'}`
    if (options?.swallow) {
      logger.warn(message)
      return
    }
    logger.error(message)
    throw error
  }
}

// Update call sites:
// Line 501 (checkCompletedAgents, no-PR path): closeTask(id, reason, { swallow: true })
// Line 550 (handleAgentFailure): closeTask(id, reason, { swallow: true })
// Line 580 (processMergeQueue): closeTask(id, reason) -- let it throw, caught by outer try/catch
// Line 665 (handleMergeConflict): closeTask(id, reason) -- let it throw, caught by outer try/catch

7. `/src/lib/SwarmSupervisor.ts`:775-809 (findPRForBranch)

Change: Accept branch name, use --head instead of --search. Use executeGhCommandWithRetry for resilience.

Implementation guidance

// Change signature to accept epicLoom (remove underscore) and use branch name:
private async findPRForBranch(issueId: string, epicLoom: EpicLoomContext): Promise<number | null> {
  // Get the branch name from our stored map
  const branchName = this.taskBranchNames.get(issueId)
  if (!branchName) {
    logger.warn(`No branch name stored for task ${issueId}, falling back to title search`)
    // Fallback: use old approach if branch name unknown
    // (shouldn't happen in normal flow)
    return this.findPRForBranchByTitle(issueId)
  }

  try {
    const prList = await executeGhCommandWithRetry<Array<{ number: number; headRefName: string }>>(
      ['pr', 'list', '--state', 'open', '--json', 'number,headRefName', '--head', branchName],
    )
    if (prList.length > 0 && prList[0]) {
      return prList[0].number
    }
    return null
  } catch (error: unknown) {
    // Same error handling as before for "no PR found" edge cases
    if (error instanceof Error) {
      const msg = error.message.toLowerCase()
      if (msg.includes('no pull requests match') || msg.includes('no open pull requests')) {
        return null
      }
    }
    throw error
  }
}

// Keep old method as fallback (rename from current findPRForBranch):
private async findPRForBranchByTitle(issueId: string): Promise<number | null> {
  // ... existing title-search logic ...
}

8. `/src/lib/SwarmSupervisor.ts`:732-736 (mergePR)

Change: Use executeGhCommandWithRetry instead of executeGhCommand for merge operations.

private async mergePR(prNumber: number): Promise<void> {
  await executeGhCommandWithRetry(['pr', 'merge', String(prNumber), '--merge', '--delete-branch'])
}

Detailed Execution Order

NOTE: These steps are executed in a SINGLE implementation run.

Add retry utility to github.ts
- Files: /src/utils/github.ts
- Add import { setTimeout as sleep } from 'timers/promises' at top
- Add executeGhCommandWithRetry function after line 32 -> Verify: TypeScript compiles
Add taskBranchNames field and populate it
- Files: /src/lib/SwarmSupervisor.ts
- Add field at line ~172, populate in claimAndSpawnAgent at line ~409 -> Verify: loom.branch stored
Fix exit condition
- Files: /src/lib/SwarmSupervisor.ts
- Replace lines 276-281 with allTasksAccountedFor check -> Verify: loop continues when tasks remain
Fix closeTask error handling
- Files: /src/lib/SwarmSupervisor.ts
- Update closeTask signature and body at line 760-765
- Update 4 call sites (lines ~501, ~550, ~580, ~665) -> Verify: critical paths throw, non-critical swallow
Fix findPRForBranch to use --head
- Files: /src/lib/SwarmSupervisor.ts
- Update import at line 11 to add executeGhCommandWithRetry
- Rewrite findPRForBranch at line 775 -> Verify: uses --head with branch name
Update mergePR to use retry
- Files: /src/lib/SwarmSupervisor.ts
- Replace executeGhCommand with executeGhCommandWithRetry in mergePR at line 733 -> Verify: merges retry on rate limit
Update and add tests
- Files: /src/lib/SwarmSupervisor.test.ts
- Add premature exit test, closeTask error test, findPRForBranch --head test, rate limit retry test -> Verify: pnpm test passes
Build verification
- Run pnpm build to verify TypeScript compiles successfully

Dependencies and Configuration

None

acreeger · 2026-02-09T03:41:34Z

Summary: Swarm Bug Fixes & Resilience Improvements

Four swarm mode bugs were fixed, along with progress logging improvements:

Bug 1: Premature Loop Exit (Critical)

Problem: The swarm orchestration loop exited after completing just 1 of 6 tasks. The exit condition only checked if bd ready returned empty + no active agents, but didn't verify all tasks were actually done. Between one task closing and its dependents being unblocked, bd ready could temporarily return empty, triggering premature exit.

Fix: Added allTasksAccountedFor check (result.completed + result.failed >= result.totalTasks) to the exit condition in SwarmSupervisor.ts. The loop now only exits when all tasks are accounted for.

Bug 2: Imprecise PR Search

Problem: findPRForBranch() used gh pr list --search "54 in:title" which was imprecise (could match wrong PRs) and used GraphQL (heavier on rate limits). It also failed due to a GitHub API rate limit during testing.

Fix:

Switched to gh pr list --head <branch-name> for exact matching by branch name
Added taskBranchNames map to store branch names from loom.branch when agents spawn
Falls back to title search if no branch name is stored

Bug 3: No Rate Limit Resilience

Problem: When GitHub API calls hit rate limits, the supervisor just failed and moved on. Critical operations like PR merge and PR search had no retry logic.

Fix: Added executeGhCommandWithRetry() in github.ts with exponential backoff for rate-limited calls (403, "rate limit", "secondary rate limit"). Used for mergePR() and findPRForBranch().

Bug 4: closeTask Error Swallowing

Problem: closeTask() silently swallowed all errors from bd close. If closing a task failed, dependent tasks were never unblocked in the Beads DAG, but the supervisor didn't know and exited incomplete.

Fix: Made closeTask() propagate errors by default in critical paths (merge queue, conflict resolution). Non-critical paths (agent failure handling, no-PR completion) pass { swallow: true } to keep the old behavior.
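
The propagate-by-default behavior might look roughly like this. The `{ swallow: true }` option is from the description; the injected `runBdClose` callback, the boolean return value and the warning text are assumptions made so the sketch is self-contained.

```typescript
interface CloseTaskOptions {
  swallow?: boolean; // opt in to the old behavior on non-critical paths
}

// Hypothetical shape: `runBdClose` stands in for whatever invokes `bd close`.
async function closeTask(
  taskId: string,
  runBdClose: (taskId: string) => Promise<void>,
  options: CloseTaskOptions = {},
): Promise<boolean> {
  try {
    await runBdClose(taskId);
    return true;
  } catch (error) {
    // Critical paths: let the caller see the failure so dependents are not
    // silently left blocked.
    if (!options.swallow) throw error;
    // Non-critical paths: log and continue.
    console.warn(`closeTask(${taskId}) failed, continuing:`, error);
    return false;
  }
}
```

Making propagation the default means a new call site has to opt in to swallowing, rather than opting in to correctness.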

Additional Fix: Progress Logging

Status line now only prints when values change (no more repeated identical lines)
Dots (.) printed to stderr between changes to show liveness
Added detailed DAG sync logging (task creation, dependency linking, summary)
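
The status-line deduplication described above can be sketched as a small closure. `createProgressReporter` and its `write` parameter are hypothetical; the actual logging code in SwarmSupervisor.ts is not shown in this PR summary.

```typescript
// Hypothetical reporter: prints a status line only when it changes and a dot
// otherwise, so the output still shows liveness.
function createProgressReporter(
  write: (text: string) => void,
): (statusLine: string) => void {
  let lastLine: string | undefined;
  let dotsPending = false;
  return (statusLine) => {
    if (statusLine === lastLine) {
      write("."); // unchanged: emit a liveness dot instead of a repeat
      dotsPending = true;
      return;
    }
    if (dotsPending) write("\n"); // end the row of dots before the new status
    write(`${statusLine}\n`);
    lastLine = statusLine;
    dotsPending = false;
  };
}
```

Passing a writer that forwards to stderr would match the description; injecting the writer also lets a test capture the output in an array.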

OOM Investigation Finding

During testing, SwarmSupervisor.test.ts was found to cause OOM crashes when run alongside other test files. Root cause: the process.stderr.write('.') in the progress logging creates an infinite tight loop when setTimeout is mocked to resolve instantly (as the vitest mocks in this test file did).

Fix: Added sleepFn injection parameter to executeGhCommandWithRetry() and ensured the SwarmSupervisor test mocks properly bound the orchestration loop iterations.

Lesson learned: Mocking Node.js built-in modules like timers/promises at file level can destabilize vitest worker processes. Prefer dependency injection (sleepFn parameter) over module-level mocks for timer-dependent code.

Files Changed

src/lib/SwarmSupervisor.ts - Loop exit condition, closeTask error handling, findPRForBranch rewrite, progress dedup, branch name tracking
src/utils/github.ts - Added executeGhCommandWithRetry with exponential backoff
src/lib/BeadsSyncService.ts - DAG sync logging, deterministic prefix
src/lib/BeadsManager.ts - bd init --prefix, bd config set beads.role maintainer
src/lib/LoomManager.ts - Swarm fast path for reused worktrees
src/commands/start.ts - Pluralization fix, beadsPrefix from repoInfo.name
Tests updated: SwarmSupervisor.test.ts, BeadsSyncService.test.ts, BeadsManager.test.ts, github.test.ts, start-swarm.test.ts

…ntation

Fix all critical, medium, and low priority issues from code review:
- Eliminate error swallowing in BeadsManager, SwarmSupervisor, and start.ts
- Add input validation for --max-agents (NaN check + range 1-20)
- Fix silent auto-install in CI when autoInstallBeads is false
- Update PATH after Beads install for verification check
- Prevent infinite loop on permanently failed tasks
- Fix conflict resolver using wrong working directory
- Close log file streams on agent completion
- Add 30s timeout to graceful shutdown
- Write progress file atomically via temp+rename
- Fix broken prompt interpolation in confirmSwarmMode
- Set GIT_REMOTE and validate EPIC_BRANCH in swarm mode
- Make templates swarm-aware (autonomous mode, SKIP_IMPLEMENTATION)
- Parallel dependency fetching via Promise.allSettled
- Filter subprocess environment to prevent secret leakage
- Fix parseInt truncation for mixed-format task IDs
- Propagate isEpic/swarmStatus metadata in swarm finish path
- Replace dynamic import with static import in tests
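
As an illustration of the --max-agents validation item (NaN check plus the 1-20 range), here is one possible sketch. `parseMaxAgents` and its error messages are hypothetical, and using `Number()` instead of `parseInt` is a choice made for this example, not necessarily what the commit does.

```typescript
// Hypothetical validator for the --max-agents flag value.
function parseMaxAgents(raw: string): number {
  // Number() yields NaN for trailing garbage such as "12abc", whereas
  // parseInt("12abc", 10) would silently return 12.
  const value = Number(raw);
  if (!Number.isInteger(value)) {
    throw new Error(`--max-agents must be a whole number, got "${raw}"`);
  }
  if (value < 1 || value > 20) {
    throw new Error(`--max-agents must be between 1 and 20, got ${value}`);
  }
  return value;
}
```

Rejecting malformed input up front avoids passing NaN or an out-of-range agent count into the supervisor.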

github-project-automation bot added this to iloom-cli Feb 5, 2026

acreeger force-pushed the feat/issue-557__autonomous-swarm-mode branch from 6f1f1e6 to f9ecc92 Compare February 6, 2026 02:06

acreeger added 8 commits February 8, 2026 23:25

feat(swarm): add BeadsManager, BeadsSyncService, swarm settings schema, and epic label support (#558)

1eddb75

feat(swarm): add minimal worktree-only start path for swarm agents (#560)

f92ea78

feat(swarm): add swarm mode template variables and prompt changes (#561)

ed8a374

feat(swarm): add epic detection, confirmation prompt, --swarm/--max-agents flags (#559)

05312ed

feat(swarm): add SwarmSupervisor with DAG-driven agent orchestration and sequential merge queue (#562)

6726cf7

feat(swarm): add failure retry, conflict resolution, resume support, and progress reporting (#563)

3fc72f1

feat(swarm): wire swarm supervisor into il start, add integration tests and docs (#564)

12338f9

acreeger force-pushed the feat/issue-557__autonomous-swarm-mode branch from 7991e33 to e4d4a83 Compare February 9, 2026 04:54

fix(beads): make BeadsManager.init() idempotent on re-run

a3fe3c8

Catch "already initialized" error from `bd init` and treat as success, fixing re-runs on existing epic looms. Also improve execBd() error handling to distinguish CLI failures from unexpected runtime errors. Fixes #571
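
The idempotent init described in this commit could be sketched as below. `initIdempotent`, the injected `runBdInit` callback and the string return values are hypothetical; only the idea of treating an "already initialized" error from `bd init` as success comes from the commit message, and the exact error text matched here is an assumption.

```typescript
type InitOutcome = "initialized" | "already-initialized";

// Hypothetical wrapper: `runBdInit` stands in for whatever invokes `bd init`.
async function initIdempotent(
  runBdInit: () => Promise<void>,
): Promise<InitOutcome> {
  try {
    await runBdInit();
    return "initialized";
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    // A re-run on an existing epic loom is treated as success.
    if (/already initialized/i.test(message)) return "already-initialized";
    // Anything else is a real failure and is propagated.
    throw error;
  }
}
```

Only the one known benign error is converted to success, so unexpected CLI or runtime failures still surface to the caller.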

Conversation

acreeger commented Feb 5, 2026

Complexity Assessment for Issue #557

Analysis Plan

Complexity Assessment

acreeger commented Feb 6, 2026

Code Review: Epic #557 — Autonomous Swarm Mode

Critical / High Priority (Must Fix)

1. Silent auto-install in CI when autoInstallBeads: false — #558

2. PATH not updated after Beads install — #558

3. findPRForBranch swallows all errors, causing premature task closure — #562

4. GIT_REMOTE not set for swarm mode — #561

5. EPIC_BRANCH not validated when SWARM_MODE active — #561

6. Broken prompt interpolation in confirmSwarmMode — #559

7. parseInt NaN for --max-agents — #559

Medium Priority (Should Fix)

8. Potential infinite loop on permanently failed tasks — #563

9. Conflict resolver uses wrong working directory — #563

10. Log file streams never closed — #562

11. Supervisor loop can exit prematurely — #562

12. parseInt truncates mixed-format IDs — #562

13. ready() and list() swallow JSON parse errors — #558, #563

14. Sync idempotency check uses ready() instead of full list — #558

15. Catch-all in detectEpic swallows all errors — #559

16. N+1 dependency fetching — #559

17. SKIP_IMPLEMENTATION path not swarm-aware — #561

18. "Human review" text contradicts swarm mode — #561

19. Missing isEpic/swarmStatus in swarm metadata path — #560

Low Priority (Nice to Fix)

Positive Observations

acreeger commented Feb 6, 2026

Code Review Fixes Applied

Critical/High (7 fixes)

Medium (12 fixes)

Low (15 fixes)

Validation

acreeger commented Feb 8, 2026

Implementation Complete - Beads ID format fix

Summary

Changes Made

Validation Results

src/lib/BeadsSyncService.ts

src/lib/SwarmSupervisor.ts

src/lib/BeadsSyncService.test.ts

src/lib/SwarmSupervisor.test.ts

acreeger commented Feb 8, 2026

Analysis: Research beads.role in Beads CLI

Executive Summary

Impact Summary

Answers to Each Question

1. What is beads.role and what does it do?

2. Valid values

3. How is it configured?

4. Is it required or just a warning we can suppress?

5. Can we set it programmatically during our bd init call?

Codebase Research Findings

Affected Area: BeadsManager.init()

Why the warning occurs with our current init call

Beads Source Files Investigated

bd init available flags (full list)

acreeger commented Feb 8, 2026

Analysis: Beads CLI (bd) Prefix System Research

Executive Summary

Impact Summary

Problem Space Research

Problem Understanding

Architectural Context

Edge Cases Identified

Third-Party Research Findings

How `bd init` Sets the Prefix

How `bd create --id` Validates

`--force` Bypass

acreeger commented Feb 8, 2026

`src/lib/BeadsManager.ts`

`src/lib/BeadsSyncService.ts`

`src/lib/SwarmSupervisor.ts`

`src/commands/start.ts`

`src/lib/BeadsManager.test.ts`

`src/lib/BeadsSyncService.test.ts`

`src/lib/SwarmSupervisor.test.ts`

`src/commands/start-swarm.test.ts`
