fix: resolve codegraph CLI hangs in git worktrees (#839) #863
carlos-alm merged 5 commits into main
Three root causes fixed:

1. Hooks used `CLAUDE_PROJECT_DIR` for the DB path, which always resolves to the main repo, so all worktrees contended on the same `graph.db`. Now uses `git rev-parse --show-toplevel` (matching `pre-commit.sh`).
2. `openReadonlyOrFail()` had no `busy_timeout` pragma, so readers hung indefinitely on WAL contention instead of retrying for 5 seconds.
3. `openRepo()` silently fell back from native (which has `busy_timeout`) to better-sqlite3 (which didn't) on SQLITE_BUSY errors, making contention worse. Now re-throws locking errors as `DbError`.

Closes #839
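The first root cause is about which directory the DB path is derived from. The sketch below shows the idea in TypeScript rather than in the hooks' shell: ask git for the top level of the current checkout, which inside a linked worktree is the worktree's own root, and build the `.codegraph/graph.db` path from it. The helper names `resolveWorkRoot` and `graphDbPath` and the fallback parameter are assumptions for illustration, not code from this PR.

```typescript
import { execFileSync } from "node:child_process";
import * as path from "node:path";

// Hypothetical helper: return the top-level directory of the checkout that
// contains `cwd`. In a linked git worktree this is the worktree's root,
// not the main repository's root.
export function resolveWorkRoot(cwd: string, fallback: string): string {
  try {
    const out = execFileSync("git", ["rev-parse", "--show-toplevel"], {
      cwd,
      encoding: "utf8",
      stdio: ["ignore", "pipe", "ignore"], // discard git's stderr
    });
    const root = out.trim();
    return root.length > 0 ? root : fallback;
  } catch {
    // git is missing, cwd is not inside a repository, or cwd does not exist.
    return fallback;
  }
}

// Hypothetical helper: each worktree gets its own database under its own root.
export function graphDbPath(workRoot: string): string {
  return path.join(workRoot, ".codegraph", "graph.db");
}
```

With this shape, two agents running in different worktrees compute different `graph.db` paths and no longer share one WAL file.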
Claude finished @carlos-alm's task. Reviewing PR: fix: resolve codegraph CLI hangs in git worktrees (#839)
Root Cause Analysis
Problem Being Solved: CLI hangs in git worktrees due to SQLite WAL contention when multiple processes try to access the same database file. Three specific issues: (1) hooks used wrong DB path causing all worktrees to share one DB, (2) readonly connections lacked
Why This Approach: The approach directly addresses each root cause:
Risk Assessment: Low risk - changes are surgical fixes to specific concurrency issues with good test coverage.
Backlog Compliance
Technical Review
Code Quality: Excellent
Hook Changes (
Database Connection (
Test Coverage (
No Configuration Issues
Security & Performance
Edge Cases Handled
Critical Concerns
None identified. This is a well-crafted fix addressing a real concurrency issue with:
Final Recommendation
The PR demonstrates solid understanding of SQLite WAL mechanics, git worktree isolation, and proper timeout handling. All changes follow existing patterns and maintain backward compatibility.
Mirror the WORK_ROOT fix from the live hooks to the example hooks in docs/examples/claude-code-hooks/ so users copying the examples don't hit the same WAL contention in worktree workflows.
Greptile Summary
This PR fixes three root causes of CLI hangs when running codegraph from a git worktree: hooks now use
Confidence Score: 5/5
Safe to merge: all three root causes of worktree hangs are correctly addressed with appropriate test coverage and no regressions introduced.
No P0 or P1 issues found. All fixes are correct: `busy_timeout` is set consistently on readonly connections, SQLITE_BUSY errors are re-thrown in `openRepo` rather than silently degraded, and hooks now correctly isolate each worktree to its own DB. The prior review concern (missing test for the re-throw behavior) was addressed in 61fe866 with the new `openRepo-busy.test.ts` file. The asymmetry between `openRepo` (re-throws) and `openReadonlyWithNative` (degrades gracefully) is intentional by design, since `nativeDb` is optional in the latter API. No files require special attention: all changes are straightforward and well-tested.
Important Files Changed
Sequence Diagram
sequenceDiagram
participant H as Hook (enrich/update)
participant G as git rev-parse
participant C as codegraph CLI
participant OR as openReadonlyOrFail
participant ON as openRepoNative (Rust)
participant BSQ as better-sqlite3
H->>G: rev-parse --show-toplevel
G-->>H: WORK_ROOT (worktree path)
H->>C: codegraph build/brief WORK_ROOT<br/>-d WORK_ROOT/.codegraph/graph.db
C->>ON: NativeDatabase.openReadonly(dbPath)
alt SQLITE_BUSY from native
ON-->>C: Error(SQLITE_BUSY)
C-->>C: throw DbError("Database is busy")
else native succeeds
ON-->>C: NativeDatabase handle → NativeRepository
end
C->>OR: openReadonlyOrFail(dbPath)
OR->>BSQ: new Database(dbPath, {readonly:true})
OR->>BSQ: PRAGMA busy_timeout = 5000
BSQ-->>OR: handle (waits ≤5 s on contention)
OR-->>C: BetterSqlite3Database → SqliteRepository
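The branch in the diagram (re-throw on SQLITE_BUSY, otherwise fall back to better-sqlite3) can be sketched roughly as follows. This is a minimal illustration under stated assumptions: the regex, the `openWithFallback` and `isLockingError` names, and this stand-in `DbError` class are hypothetical and are not the PR's actual code; only the behavior (locking errors surface as a `DbError` with code `DB_ERROR`, other native failures fall back) is taken from the discussion.

```typescript
// Stand-in for the project's DbError; only the DB_ERROR code is taken from the thread.
export class DbError extends Error {
  readonly code = "DB_ERROR";
  constructor(message: string) {
    super(message);
    this.name = "DbError";
  }
}

// Assumed pattern for "the database is contended"; the PR's real regex may differ.
const LOCKING_ERROR = /SQLITE_BUSY|SQLITE_LOCKED|database is locked/i;

export function isLockingError(err: unknown): boolean {
  const message = err instanceof Error ? err.message : String(err);
  return LOCKING_ERROR.test(message);
}

// Try the native opener first. Contention is reported to the caller instead of
// being hidden behind a second open attempt; any other failure (for example a
// missing native addon) still falls back to the second opener.
export function openWithFallback<T>(openNative: () => T, openFallback: () => T): T {
  try {
    return openNative();
  } catch (err) {
    if (isLockingError(err)) {
      const detail = err instanceof Error ? err.message : String(err);
      throw new DbError(`Database is busy: ${detail}`);
    }
    return openFallback();
  }
}
```

Passing the two openers as parameters keeps the decision logic testable without a real database or native addon.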
  it('sets busy_timeout pragma to 5000 on readonly connections', () => {
    const dbPath = path.join(tmpDir, 'readonly-busy.db');
    const db = openDb(dbPath);
    initSchema(db);
    closeDb(db);

    const readDb = openReadonlyOrFail(dbPath);
    const timeout = readDb.pragma('busy_timeout', { simple: true });
    expect(timeout).toBe(5000);
    readDb.close();
  });
});
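For context on what this test exercises, here is a minimal sketch of setting the pragma on an already opened connection. It is written against a small structural interface so it runs without a database; better-sqlite3's `Database` has a compatible `pragma()` method, but the `applyBusyTimeout` helper, the `PragmaCapable` interface, and the validation are assumptions for illustration, not the PR's implementation.

```typescript
// Only the method this sketch needs; a better-sqlite3 Database satisfies it.
interface PragmaCapable {
  pragma(source: string, options?: { simple?: boolean }): unknown;
}

// 5000 ms matches the value asserted in the test above.
export const BUSY_TIMEOUT_MS = 5000;

// Hypothetical helper: make a connection wait up to `ms` milliseconds on a
// locked database before SQLite reports SQLITE_BUSY.
export function applyBusyTimeout<T extends PragmaCapable>(
  db: T,
  ms: number = BUSY_TIMEOUT_MS,
): T {
  if (!Number.isInteger(ms) || ms < 0) {
    throw new RangeError(`busy timeout must be a non-negative integer, got ${ms}`);
  }
  db.pragma(`busy_timeout = ${ms}`);
  return db;
}
```

A readonly opener would call this right after constructing the connection, so readers and writers share the same waiting behavior.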
No test coverage for the `openRepo` locking-error re-throw
The new test validates busy_timeout on readonly connections (the openReadonlyOrFail fix), but there is no test for the second fix: the regex guard that re-throws locking errors in openRepo instead of silently falling back to better-sqlite3.
Consider adding a test that mocks openRepoNative (or isNativeAvailable) to throw an error with SQLITE_BUSY in the message and verifies that openRepo re-throws a DbError rather than silently degrading:
it('re-throws SQLITE_BUSY errors instead of falling back to better-sqlite3', () => {
// ... mock isNativeAvailable() to return true and getNative() to throw SQLITE_BUSY
expect(() => openRepo(dbPath)).toThrow(DbError);
expect(() => openRepo(dbPath)).toThrow(/busy/i);
});
Without this, a regression that removes the regex guard would not be caught by the automated test suite.
Addressed in 61fe866 — added tests/unit/openRepo-busy.test.ts which mocks the native engine to throw SQLITE_BUSY and verifies that openRepo re-throws it as a DbError (with code DB_ERROR and message containing SQLITE_BUSY) instead of silently falling back. Two test cases cover both the error message check and the DbError type/code assertions.
- openReadonlyWithNative: distinguish busy/locked errors in debug logs instead of generic "native path failed" message
- Add dedicated test file for openRepo SQLITE_BUSY re-throw behavior
Addressed both Greptile findings in 61fe866:
Codegraph Impact Analysis
3 functions changed → 126 callers affected across 85 files
Summary
- `enrich-context.sh` and `update-graph.sh` used `CLAUDE_PROJECT_DIR` to locate `graph.db`, which always resolves to the main repo root, so all concurrent worktree agents contended on the same SQLite database. Now uses `git rev-parse --show-toplevel` (matching `pre-commit.sh`, which already did this correctly).
- `busy_timeout` on readonly connections: `openReadonlyOrFail()` opened the DB without `PRAGMA busy_timeout`, so readers hung indefinitely during WAL contention (e.g., while another process ran a checkpoint). Now sets `busy_timeout = 5000` to match `openDb()` and the native engine.
- `openRepo()` caught native SQLITE_BUSY errors and silently fell back to better-sqlite3 (which had no `busy_timeout`), turning a 5-second timeout into an indefinite hang. Now re-throws locking errors as `DbError`.

Closes #839
Test plan
- `db.test.ts` tests pass (25/25)
- `busy_timeout = 5000` on readonly connections
- `codegraph stats` from a git worktree while main session has DB open