Skip to content

fix: resolve codegraph CLI hangs in git worktrees (#839)#863

Merged
carlos-alm merged 5 commits intomainfrom
fix/839-wal-worktree-contention
Apr 5, 2026
Merged

fix: resolve codegraph CLI hangs in git worktrees (#839)#863
carlos-alm merged 5 commits intomainfrom
fix/839-wal-worktree-contention

Conversation

@carlos-alm
Copy link
Copy Markdown
Contributor

Summary

  • Hooks used wrong DB path: enrich-context.sh and update-graph.sh used CLAUDE_PROJECT_DIR to locate graph.db, which always resolves to the main repo root — so all concurrent worktree agents contended on the same SQLite database. Now uses git rev-parse --show-toplevel (matching pre-commit.sh which already did this correctly).
  • Missing busy_timeout on readonly connections: openReadonlyOrFail() opened the DB without PRAGMA busy_timeout, so readers hung indefinitely during WAL contention (e.g., while another process ran a checkpoint). Now sets busy_timeout = 5000 to match openDb() and the native engine.
  • Silent fallback amplified the problem: openRepo() caught native SQLITE_BUSY errors and silently fell back to better-sqlite3 (which had no busy_timeout), turning a 5-second timeout into an indefinite hang. Now re-throws locking errors as DbError.

Closes #839

Test plan

  • Existing db.test.ts tests pass (25/25)
  • New test verifies busy_timeout = 5000 on readonly connections
  • Lint clean
  • Manual: run codegraph stats from a git worktree while main session has DB open
  • Manual: verify hooks create/use worktree-local DB in a fresh worktree

Three root causes fixed:

1. Hooks used CLAUDE_PROJECT_DIR for DB path, which always resolves to
   the main repo — all worktrees contended on the same graph.db. Now
   uses git rev-parse --show-toplevel (matching pre-commit.sh).

2. openReadonlyOrFail() had no busy_timeout pragma — readers hung
   indefinitely on WAL contention instead of retrying for 5 seconds.

3. openRepo() silently fell back from native (which has busy_timeout)
   to better-sqlite3 (which didn't) on SQLITE_BUSY errors, making
   contention worse. Now re-throws locking errors as DbError.

Closes #839
@claude
Copy link
Copy Markdown

claude bot commented Apr 4, 2026

Claude finished @carlos-alm's task —— View job


Reviewing PR: fix: resolve codegraph CLI hangs in git worktrees (#839)

Todo List

  • Read and analyze all changed files
  • Read original issue tooling: codegraph CLI hangs in git worktrees (WAL lock contention) #839 for context
  • Analyze root cause and approach justification
  • Check for configuration changes or test integrity issues
  • Evaluate code quality and potential bugs
  • Grade against backlog compliance criteria
  • Provide comprehensive review feedback with rating

Root Cause Analysis

Problem Being Solved: CLI hangs in git worktrees due to SQLite WAL contention when multiple processes try to access the same database file. Three specific issues: (1) hooks used wrong DB path causing all worktrees to share one DB, (2) readonly connections lacked busy_timeout causing indefinite hangs, and (3) silent fallback from native to better-sqlite3 masked locking errors.

Why This Approach: The approach directly addresses each root cause:

  • Uses git rev-parse --show-toplevel to get worktree-specific paths (consistent with pre-commit.sh)
  • Adds missing busy_timeout = 5000 to readonly connections to match write connections
  • Re-throws locking errors instead of silent fallback to prevent hangs without timeout

Risk Assessment: Low risk - changes are surgical fixes to specific concurrency issues with good test coverage.

Backlog Compliance

  • Zero-dep: ✓ — no new dependencies added
  • Foundation-aligned: ✓ — fixes reliability/performance issues, maintains offline-first operation
  • Problem-fit: 4 — improves reliability for concurrent agent sessions (common workflow)
  • Breaking: No — purely internal fixes, no API/CLI changes
  • Tier: 1

Technical Review

Code Quality: Excellent

Hook Changes (.claude/hooks/*.sh):

  • Consistent approach: Both hooks now use git rev-parse --show-toplevel matching pre-commit.sh:31
  • Proper fallback: || WORK_ROOT="${CLAUDE_PROJECT_DIR:-.}" preserves backward compatibility
  • Worktree isolation: Each worktree gets its own .codegraph/graph.db preventing contention

Database Connection (src/db/connection.ts:314):

  • Symmetry restored: Readonly connections now have same busy_timeout = 5000 as write connections (openDb:169)
  • Proper error handling: Lines 385-387 re-throw locking errors as DbError instead of silent fallback
  • Clear reasoning: Comment explains why fallback would hit same contention issue

Test Coverage (tests/unit/db.test.ts:384-394):

  • Targeted verification: New test specifically checks busy_timeout = 5000 on readonly connections
  • Matches existing pattern: Follows same structure as write connection timeout test (lines 110-116)

No Configuration Issues

  • No linting rules relaxed or disabled
  • No test expectations weakened
  • No timeouts increased to hide problems
  • Changes fix actual concurrency bugs, don't mask symptoms

Security & Performance

  • ✅ No command injection risks (uses execFileSync with array args)
  • ✅ No SQL injection (uses pragma with literal value)
  • ✅ Performance neutral (timeout only matters during contention)
  • ✅ Maintains advisory locking for process coordination

Edge Cases Handled

  • ✅ Git worktree detection via git rev-parse --show-toplevel (standard approach)
  • ✅ Graceful fallback when not in git repo
  • ✅ Process PID checking in advisory locks
  • ✅ Proper cleanup of both native and SQLite handles

Critical Concerns

None identified. This is a well-crafted fix addressing a real concurrency issue with:

  • Clear root cause identification
  • Surgical changes targeting specific problems
  • Good test coverage
  • Consistent patterns with existing codebase

Final Recommendation

  • Rating: ⭐⭐⭐⭐⭐ (5/5)
  • Action: APPROVE
  • Reasoning: Excellent technical execution. Fixes a real concurrency problem affecting multi-session workflows with minimal, well-tested changes. Perfect example of how to resolve SQLite WAL contention in multi-process scenarios.

The PR demonstrates solid understanding of SQLite WAL mechanics, git worktree isolation, and proper timeout handling. All changes follow existing patterns and maintain backward compatibility.


Mirror the WORK_ROOT fix from the live hooks to the example hooks in
docs/examples/claude-code-hooks/ so users copying the examples don't
hit the same WAL contention in worktree workflows.
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Apr 4, 2026

Greptile Summary

This PR fixes three root causes of CLI hangs when running codegraph from a git worktree: hooks now use git rev-parse --show-toplevel instead of CLAUDE_PROJECT_DIR so each worktree reads its own SQLite DB; openReadonlyOrFail now sets PRAGMA busy_timeout = 5000 so readonly connections blocked by WAL contention time out in ≤5 s rather than hanging indefinitely; and openRepo now re-throws SQLITE_BUSY errors from the native engine as DbError instead of silently falling back to better-sqlite3 (which also lacked a busy timeout).

  • Hooks (enrich-context.sh, update-graph.sh): both the live .claude/hooks/ and the docs examples are updated consistently to use the correct worktree-local DB path
  • openReadonlyOrFail: one-line pragma fix eliminates the hang path for all readonly callers
  • openRepo: regex guard on the locking-error message prevents silent fallback to an equally-hung connection
  • openReadonlyWithNative: improved debug logging distinguishes busy-related failures from other native failures; silently degrades to better-sqlite3-only (intentional — nativeDb is optional in this API, unlike the mandatory Repository returned by openRepo)
  • Tests: busy_timeout pragma is verified directly; a dedicated mock-based test covers the SQLITE_BUSY re-throw path — addressing the prior review concern from thread

Confidence Score: 5/5

Safe to merge — all three root causes of worktree hangs are correctly addressed with appropriate test coverage and no regressions introduced.

No P0 or P1 issues found. All fixes are correct: busy_timeout is set consistently on readonly connections, SQLITE_BUSY errors are re-thrown in openRepo rather than silently degraded, and hooks now correctly isolate each worktree to its own DB. The prior review concern (missing test for the re-throw behavior) was addressed in 61fe866 with the new openRepo-busy.test.ts file. The asymmetry between openRepo (re-throws) and openReadonlyWithNative (degrades gracefully) is intentional by design since nativeDb is optional in the latter API.

No files require special attention — all changes are straightforward and well-tested.

Important Files Changed

Filename Overview
src/db/connection.ts Added busy_timeout=5000 to openReadonlyOrFail; added SQLITE_BUSY re-throw guard in openRepo; improved debug logging in openReadonlyWithNative to distinguish busy from other native failures
.claude/hooks/enrich-context.sh Fixed DB path to use worktree root via git rev-parse --show-toplevel instead of CLAUDE_PROJECT_DIR
.claude/hooks/update-graph.sh Fixed DB path and build target to use worktree root; fallback node CLI path now uses CLAUDE_PROJECT_DIR with $WORK_ROOT as fallback (correct precedence)
docs/examples/claude-code-hooks/enrich-context.sh Mirrored worktree DB path fix from live hooks to docs example — kept in sync correctly
docs/examples/claude-code-hooks/update-graph.sh Mirrored worktree DB path and build target fix from live hooks to docs example
tests/unit/db.test.ts New test directly verifies busy_timeout=5000 pragma is applied to openReadonlyOrFail connections
tests/unit/openRepo-busy.test.ts New test file mocks native engine to throw SQLITE_BUSY and verifies openRepo re-throws as DbError with correct name and code instead of falling back

Sequence Diagram

sequenceDiagram
    participant H as Hook (enrich/update)
    participant G as git rev-parse
    participant C as codegraph CLI
    participant OR as openReadonlyOrFail
    participant ON as openRepoNative (Rust)
    participant BSQ as better-sqlite3

    H->>G: rev-parse --show-toplevel
    G-->>H: WORK_ROOT (worktree path)
    H->>C: codegraph build/brief WORK_ROOT<br/>-d WORK_ROOT/.codegraph/graph.db

    C->>ON: NativeDatabase.openReadonly(dbPath)
    alt SQLITE_BUSY from native
        ON-->>C: Error(SQLITE_BUSY)
        C-->>C: throw DbError("Database is busy")
    else native succeeds
        ON-->>C: NativeDatabase handle → NativeRepository
    end

    C->>OR: openReadonlyOrFail(dbPath)
    OR->>BSQ: new Database(dbPath, {readonly:true})
    OR->>BSQ: PRAGMA busy_timeout = 5000
    BSQ-->>OR: handle (waits ≤5 s on contention)
    OR-->>C: BetterSqlite3Database → SqliteRepository
Loading

Reviews (3): Last reviewed commit: "fix: address Greptile review — better bu..." | Re-trigger Greptile

Comment on lines +384 to 395
it('sets busy_timeout pragma to 5000 on readonly connections', () => {
const dbPath = path.join(tmpDir, 'readonly-busy.db');
const db = openDb(dbPath);
initSchema(db);
closeDb(db);

const readDb = openReadonlyOrFail(dbPath);
const timeout = readDb.pragma('busy_timeout', { simple: true });
expect(timeout).toBe(5000);
readDb.close();
});
});
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 No test coverage for the openRepo locking-error re-throw

The new test validates busy_timeout on readonly connections (the openReadonlyOrFail fix), but there is no test for the second fix: the regex guard that re-throws locking errors in openRepo instead of silently falling back to better-sqlite3.

Consider adding a test that mocks openRepoNative (or isNativeAvailable) to throw an error with SQLITE_BUSY in the message and verifies that openRepo re-throws a DbError rather than silently degrading:

it('re-throws SQLITE_BUSY errors instead of falling back to better-sqlite3', () => {
  // ... mock isNativeAvailable() to return true and getNative() to throw SQLITE_BUSY
  expect(() => openRepo(dbPath)).toThrow(DbError);
  expect(() => openRepo(dbPath)).toThrow(/busy/i);
});

Without this, a regression that removes the regex guard would not be caught by the automated test suite.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 61fe866 — added tests/unit/openRepo-busy.test.ts which mocks the native engine to throw SQLITE_BUSY and verifies that openRepo re-throws it as a DbError (with code DB_ERROR and message containing SQLITE_BUSY) instead of silently falling back. Two test cases cover both the error message check and the DbError type/code assertions.

- openReadonlyWithNative: distinguish busy/locked errors in debug logs
  instead of generic "native path failed" message
- Add dedicated test file for openRepo SQLITE_BUSY re-throw behavior
@carlos-alm
Copy link
Copy Markdown
Contributor Author

Addressed both Greptile findings in 61fe866:

  1. openReadonlyWithNative busy logging: Now distinguishes busy/locked errors from generic native failures in debug logs, making WAL contention visible during debugging.

  2. Test coverage for openRepo re-throw: Added tests/unit/openRepo-busy.test.ts with dedicated tests that mock the native engine to return SQLITE_BUSY and verify it's re-thrown as DbError instead of silently falling back.

@greptileai

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 4, 2026

Codegraph Impact Analysis

3 functions changed126 callers affected across 85 files

  • openReadonlyOrFail in src/db/connection.ts:304 (114 transitive callers)
  • openRepo in src/db/connection.ts:361 (29 transitive callers)
  • openReadonlyWithNative in src/db/connection.ts:408 (8 transitive callers)

@carlos-alm
Copy link
Copy Markdown
Contributor Author

@greptileai

@carlos-alm carlos-alm merged commit 729789f into main Apr 5, 2026
13 checks passed
@carlos-alm carlos-alm deleted the fix/839-wal-worktree-contention branch April 5, 2026 07:05
@github-actions github-actions bot locked and limited conversation to collaborators Apr 5, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

tooling: codegraph CLI hangs in git worktrees (WAL lock contention)

1 participant