
fix(brainbar): harden reliability under db contention #276

Merged
EtanHey merged 1 commit into main from fix/brainbar-reliability-A on May 3, 2026

Conversation

EtanHey (Owner) commented May 3, 2026

Summary

  • Enforce a single BrainBar instance with a nonblocking socket-side flock; duplicate app launches now terminate instead of competing for /tmp/brainbar.sock.
  • Start the MCP socket/router before database write readiness, and move startup DB open/retry work off the socket queue so initialize and tools/list respond during migration/DDL lock contention.
  • Add a Swift brain_store write-budget fallback to the existing durable pending-store queue, including a regression test covering R02's write burst followed by brain_search staying responsive.

Context

Tests

  • swift test --package-path brain-bar --filter BrainBarReliabilityTests — 4 tests, 0 failures.
  • swift test --package-path brain-bar — 345 tests, 0 failures.
  • ./scripts/run_tests.sh — BrainLayer test gate passed: Python unit suite 1823 passed, 9 skipped, 75 deselected, 1 xfailed; MCP registration 3 passed; isolated eval/hook routing 32 passed; bun 1 passed; FTS5 regression PASS.
  • Pre-push hook reran the repo gate and passed with the same Python/MCP/eval/bun/regression results.

@codex review
@cursor review
@BugBot review


Note

Medium Risk
Touches BrainBar startup/server lifecycle and SQLite write paths; incorrect locking or async DB open coordination could prevent startup or lead to missed/duplicate writes under contention.

Overview
BrainBar now starts the MCP socket/router before the SQLite database is ready, moving database open/retry work off the socket queue so initialize/tools/list remain responsive while the DB is locked or recovering.
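The startup ordering above can be modeled as a toy server whose request handler is live immediately while the database opens on a background thread. This is a Python sketch under assumptions: MiniServer and its callback are illustrative stand-ins, not the real BrainBarServer API; the key property shown is that protocol handshakes never wait on the database, and DB-dependent calls fail loudly rather than hang:

```python
import threading
import time

class MiniServer:
    """Toy model of the startup ordering: the handler answers immediately;
    the (possibly slow) database open happens on a background thread and
    assigns self.db when done."""

    def __init__(self, open_db, on_database_ready=None):
        self.db = None
        self._open_db = open_db
        self._on_ready = on_database_ready
        threading.Thread(target=self._open_async, daemon=True).start()

    def _open_async(self):
        self.db = self._open_db()          # may block on locks/migrations
        if self._on_ready:
            self._on_ready(self.db)

    def handle(self, request):
        method = request.get("method")
        if method in ("initialize", "tools/list"):
            # Protocol handshake never waits on the database.
            return {"ok": True, "method": method}
        if self.db is None:
            # DB-dependent calls raise explicitly instead of hanging
            # or silently returning defaults.
            raise RuntimeError("database not ready")
        return {"ok": True, "db": True}
```

The explicit RuntimeError mirrors the repo convention that tool handlers must throw when the database is nil due to startup ordering, rather than return silent defaults.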

Adds single-instance enforcement via a non-blocking flock lockfile (BrainBarInstanceLock) owned by BrainBarServer, with an onStartRejected callback the app uses to terminate duplicate launches.

Hardens write behavior under contention by adding BrainDatabase.storeOrQueueWithinBudget (short busy-timeout, no retries) and switching brain_store to queue to the pending-store file when the DB is busy; includes new reliability tests covering instance locking, startup responsiveness during write locks, queued writes, and read-path responsiveness during queued write bursts.

Reviewed by Cursor Bugbot for commit d059427. Bugbot is set up for automated code reviews on this repo. Configure here.

Note

Harden BrainBar reliability under database contention with single-instance locking and queued writes

  • Adds BrainBarInstanceLock, a file-based single-instance lock using flock, so secondary app instances are rejected immediately via onStartRejected rather than competing for resources.
  • Refactors BrainBarServer to start with a dbPath and open the database asynchronously after the socket is bound, preventing startup hangs under write contention.
  • Adds storeOrQueueWithinBudget to BrainDatabase, which falls back to queuing writes when the database is busy, and returns a StoreWriteOutcome so callers can distinguish immediate vs. queued writes.
  • Updates the brain_store MCP tool handler in MCPRouter to use the budgeted write path and return queued=true in the response when a write is deferred.
  • Adds reliability tests in BrainBarReliabilityTests covering lock rejection, initialize responsiveness under write locks, and read-path stability during queued write bursts.
📊 Macroscope summarized d059427. 5 files reviewed, 6 issues evaluated, 1 issue filtered, 1 comment posted

🗂️ Filtered Issues

brain-bar/Sources/BrainBar/BrainBarServer.swift — 1 comment posted, 4 evaluated, 1 filtered
  • line 552: Misleading error message in handleSubscribeTool. The guard on lines 552-555 checks agentID, tags, AND database, but always returns "Database not available" regardless of which condition failed. If agent_id or tags is missing/invalid, the client receives an incorrect error message. [ Out of scope ]

Summary by CodeRabbit

Release Notes

  • New Features

    • Prevents multiple app instances from running simultaneously
    • Intelligent database write queuing when the database is busy
    • Improved timeout handling for database operations
  • Bug Fixes

    • Optimized startup sequence for faster initialization
    • Better handling of database contention scenarios
  • Tests

    • Added reliability tests for instance locking and write-lock behavior



coderabbitai Bot commented May 3, 2026

📝 Walkthrough

This PR introduces single-instance locking via OS-level file locks and makes BrainBar resilient to database write-locks during startup. A new BrainBarInstanceLock type enforces exclusive process ownership; BrainBarServer acquires the lock at startup and rejects via onStartRejected if another instance runs. Database opening is refactored to be asynchronous and non-blocking. BrainDatabase adds storeOrQueueWithinBudget() to queue writes when the database is under contention, and the startup flow defers database readiness via server.onDatabaseReady rather than blocking initialization.

Changes

Single-Instance Locking

  • Lock Mechanism (BrainBarInstanceLock.swift): New type providing OS-level exclusive flock-based locking. acquire(lockPath:) ensures the directory exists, opens or creates the lock file, attempts a non-blocking exclusive lock (throwing alreadyRunning on conflict), writes the PID, and fsyncs. release() is idempotent and cleans up the lock file.
  • Server Lock Integration (BrainBarServer.swift): The constructor accepts an optional instanceLockPath; startOnQueue() attempts lock acquisition before server setup and invokes the onStartRejected callback if another instance is detected. Shutdown releases the instance lock state.
  • App Startup Wiring (BrainBarApp.swift): Sets server.onStartRejected to terminate the app immediately if the startup lock fails, preventing duplicate instances from initializing.
  • Reliability Tests (BrainBarReliabilityTests.swift): testSingleInstanceLockRejectsSecondOwner verifies that lock acquisition blocks a second owner and throws alreadyRunning.

Write-Lock Resilience & Async Database Opening

  • Database Write Queuing (BrainDatabase.swift): Introduces the StoreWriteOutcome enum (.stored or .queued). Extends store(...) to accept configurable retries and a per-call busyTimeoutMillis (overriding the SQLite PRAGMA busy_timeout). Adds storeOrQueueWithinBudget(...), which attempts store() with retries: 0; on retryable lock errors it enqueues via queuePendingStore() and returns .queued instead of throwing.
  • Server Async DB Opening (BrainBarServer.swift): Database opening refactored to be asynchronous: attemptDatabaseOpen() initiates the open on a background queue when needed; finishDatabaseOpen() runs on the server queue and closes the DB only if the server is no longer listening and no external DB was provided. Adds a databaseOpenInProgress flag to guard against concurrent attempts.
  • App Startup Flow (BrainBarApp.swift): Removes the blocking background-queue DB open; the app now calls server.start() immediately. Runtime installation, database assignment, and URL flushing move into the server.onDatabaseReady callback (executed on @MainActor). The legacy menu-bar surface and quick-capture hotkey setup now run after server.start() without blocking on DB readiness.
  • Request Routing (MCPRouter.swift): handleBrainStore replaces exception-driven control flow (do/catch plus conditional queueing) with switch-based outcome handling: the .stored path returns queued: false with flush metadata; the .queued path returns queued: true without metadata.
  • Reliability Tests (BrainBarReliabilityTests.swift): testInitializeReturnsImmediatelyWhileStartupDatabaseIsWriteLocked verifies the server responds to initialize requests quickly even while the DB is write-locked. testBrainStoreQueuesWithinBudgetWhenDatabaseWriteLockIsHeld confirms brain_store returns queued: true under lock. testQueuedWriteBurstDoesNotTakeReadPathDown ensures a burst of queued writes does not break the read path (brain_search).

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

The PR introduces a new concurrency-aware locking primitive (BrainBarInstanceLock with flock semantics), refactors server-database lifecycle management into an asynchronous, non-blocking flow, and introduces a new queuing strategy (storeOrQueueWithinBudget) that replaces exception-driven logic with enum-based outcome handling. The changes span multiple core files with nuanced state management (databaseOpenInProgress, instanceLock lifecycle, lock release in shutdown), plus detailed test utilities (socket framing, timeout handling, SQLite lock seeding). Logic is moderately dense and state interactions require careful reasoning across app, server, and database layers.


Poem

🔐 A lock and a queue, both born in this change,
One guards the gates so instances don't range,
The other queues writes when the DB's too tight,
Now startup's resilient—no more freezing in flight! 🐰✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title 'fix(brainbar): harden reliability under db contention' accurately summarizes the main change: hardening BrainBar reliability by addressing database contention through instance locking, responsive socket startup, and write queuing.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate; docstring coverage check skipped.
  • Linked Issues Check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes Check: ✅ Passed. Check skipped because no linked issues were found for this pull request.



@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

guard bindResult == 0 else {
    NSLog("[BrainBar] Failed to bind: errno %d", errno)
    close(fd)
    return
}

P2: Release instance lock on socket bind/listen startup failure

After acquiring the singleton lock, several startup failure branches return early without cleanup (for example on bind failure). In those cases this process keeps instanceLock held even though the server never starts listening, so later relaunch attempts are rejected as "already running" until this broken process exits. These error paths should release the lock (or call a shared cleanup path) before returning.


guard shouldRelease else { return }
flock(fd, LOCK_UN)
close(fd)
try? FileManager.default.removeItem(atPath: lockPath)

P1: Keep lockfile inode stable while another instance may acquire

release() unlinks lockPath after unlocking, which breaks flock-based singleton guarantees during handoff: if a second process acquires the lock between LOCK_UN and removeItem, this code can delete the inode that process is locking, allowing a third process to recreate the path and acquire a separate lock concurrently. For flock locks, the lock file should remain in place (or at least not be removed in this sequence) to avoid split-brain ownership.



🟢 Low

private func handleMessage(fd: Int32, request: [String: Any]) -> [String: Any] {

handleMessage intercepts brain_subscribe, brain_unsubscribe, and brain_ack before they reach MCPRouter.handleToolsCall, skipping the validate(arguments:for:) check. This lets clients send oversized agent_id strings or unbounded tags arrays that bypass the maxLength/maxItems limits defined in limitedInputSchema, risking database storage failures or memory pressure.

🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file brain-bar/Sources/BrainBar/BrainBarServer.swift around line 417:

`handleMessage` intercepts `brain_subscribe`, `brain_unsubscribe`, and `brain_ack` before they reach `MCPRouter.handleToolsCall`, skipping the `validate(arguments:for:)` check. This lets clients send oversized `agent_id` strings or unbounded `tags` arrays that bypass the `maxLength`/`maxItems` limits defined in `limitedInputSchema`, risking database storage failures or memory pressure.

Evidence trail:
BrainBarServer.swift lines 417-425: handleMessage intercepts brain_subscribe/brain_unsubscribe/brain_ack before router.handle(). MCPRouter.swift line 148: validate(arguments:for:) is called in the router path. MCPRouter.swift lines 703-762: validate() enforces maxLength/maxItems from schema. MCPRouter.swift lines 659-700: limitedInputSchema adds default maxLength/maxItems. MCPRouter.swift lines 988-1030: brain_subscribe/brain_unsubscribe/brain_ack schemas use limitedInputSchema. BrainBarServer.swift lines 551-640: handleSubscribeTool/handleUnsubscribeTool/handleAckTool have no validation of argument sizes.
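A minimal illustration of the missing check, in Python (validate_limits is a hypothetical stand-in for MCPRouter's validate(arguments:for:)): enforce the schema's maxLength/maxItems limits before dispatching to any handler, so intercepted tools get the same protection as routed ones:

```python
def validate_limits(arguments: dict, schema: dict):
    """Enforce per-property maxLength on strings and maxItems on arrays,
    mirroring what a router-side validation pass would do before any
    tool handler sees the request."""
    for name, rules in schema.get("properties", {}).items():
        if name not in arguments:
            continue
        value = arguments[name]
        max_len = rules.get("maxLength")
        if max_len is not None and isinstance(value, str) and len(value) > max_len:
            raise ValueError(f"{name} exceeds maxLength {max_len}")
        max_items = rules.get("maxItems")
        if max_items is not None and isinstance(value, list) and len(value) > max_items:
            raise ValueError(f"{name} exceeds maxItems {max_items}")
```

The fix implied by the finding is simply to run this pass (or route through the validating path) for brain_subscribe, brain_unsubscribe, and brain_ack as well.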


@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


Reviewed by Cursor Bugbot for commit d059427. Configure here.

    if let previousBusyTimeout {
        try? executeOnHandle(db, sql: "PRAGMA busy_timeout = \(previousBusyTimeout)")
    }
}

Busy timeout permanently lowered if pragma query fails

Medium Severity

The store() method's defer block only restores the previous busy_timeout when previousBusyTimeout is non-nil. Since previousBusyTimeout is computed via busyTimeoutMillis.flatMap { _ in queryPragma(…) }, it becomes nil if queryPragma fails — yet the PRAGMA is still changed to the low budget value (e.g., 50ms). In that case, the defer skips restoration and the shared database connection keeps the 50ms timeout permanently. All subsequent write operations under any contention would then fail far too quickly instead of using the configured 30-second timeout.
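One way to avoid the skipped restoration is to read the previous value before changing anything and restore it unconditionally in a finally block, so no failure mode can strand the shared connection on the short budget. A Python/sqlite3 sketch of that shape (store_with_budget and its schema are illustrative, not the repo's API):

```python
import sqlite3

def store_with_budget(conn: sqlite3.Connection, text: str, budget_ms: int):
    """Temporarily lower busy_timeout for one write, then restore it
    unconditionally. The previous value is read before any change, so
    a failure can never leave the connection stuck on the short budget."""
    previous_ms = conn.execute("PRAGMA busy_timeout").fetchone()[0]
    conn.execute(f"PRAGMA busy_timeout = {max(1, budget_ms)}")
    try:
        with conn:
            conn.execute("CREATE TABLE IF NOT EXISTS notes(text TEXT)")
            conn.execute("INSERT INTO notes(text) VALUES (?)", (text,))
    finally:
        # Restore even when the write throws.
        conn.execute(f"PRAGMA busy_timeout = {previous_ms}")
```

CodeRabbit's later comment goes further and suggests a dedicated short-lived connection for budgeted writes, which sidesteps the shared-handle mutation altogether.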




@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@brain-bar/Sources/BrainBar/BrainBarApp.swift`:
- Around line 112-120: server.start() returns before the server's lock check
completes causing installLegacyMenuBarSurface(with:), collector.start(), and
configureQuickCaptureHotkey() to run even for rejected duplicates; change the
flow to only perform those side effects after an explicit success callback from
the server (e.g. a completion handler on server.start or observing an
onStartAccepted/onStartRejected event) — move calls to
installLegacyMenuBarSurface(with: collector), collector.start(),
configureQuickCaptureHotkey(), and the NSLog trailing message into the
success/accepted path and ensure the rejected path (onStartRejected) does not
run them.

In `@brain-bar/Sources/BrainBar/BrainBarInstanceLock.swift`:
- Around line 55-64: The release() method currently unlinks the lock file after
releasing the flock, which can cause a race allowing multiple processes to hold
locks on different inodes; remove the FileManager.default.removeItem(atPath:
lockPath) call and keep only the unlock (flock(fd, LOCK_UN)) and close(fd) steps
(preserving the releaseLock guard logic) so the lock file remains on disk as a
marker and prevents the race.

In `@brain-bar/Sources/BrainBar/BrainBarServer.swift`:
- Around line 188-198: After successfully acquiring instanceLock via
BrainBarInstanceLock.acquire, ensure the lock is always released and
onStartRejected is invoked for any subsequent early-return failure paths:
introduce a deferred cleanup (e.g., a defer block or shared failure handler)
immediately after assignment of instanceLock that checks a startupSucceeded flag
(set true only on full successful startup) and if false calls
instanceLock.release (or the appropriate BrainBarInstanceLock release method)
and invokes onStartRejected with the existing error message; update all
post-acquire early returns to rely on this shared cleanup so the lock is not
left held when startup fails.

In `@brain-bar/Sources/BrainBar/BrainDatabase.swift`:
- Around line 722-730: The current code mutates the shared SQLite handle's
PRAGMA busy_timeout via busyTimeoutMillis/queryPragma/executeOnHandle which
races when storeAsync/store() run concurrently; instead, when a custom
busyTimeoutMillis is requested create and use a dedicated short-lived DB
connection/handle for that budgeted write (set its PRAGMA busy_timeout once on
the new handle, run the write via executeOnHandle or equivalent, then close the
handle) rather than changing and restoring the shared connection; update the
logic around busyTimeoutMillis, previousBusyTimeout, queryPragma, and
executeOnHandle to open/close a temporary connection for timed writes to
eliminate interleaving on the shared handle.

In `@brain-bar/Tests/BrainBarTests/BrainBarReliabilityTests.swift`:
- Around line 240-284: connectReliabilitySocket currently returns a blocking
socket but readReliabilityFramedMessage expects a non-blocking fd; after
creating fd in connectReliabilitySocket set O_NONBLOCK (use fcntl to get current
flags, add O_NONBLOCK, set them) and handle any fcntl errors by closing fd and
throwing; ensure the rest of connect loop still works (check for
EINPROGRESS/EWOULDBLOCK as needed) and leave the non-blocking flag on the
returned descriptor so readReliabilityFramedMessage’s EAGAIN/EWOULDBLOCK
handling and the timeout loop behave correctly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4fc27bf1-aa71-41dc-a9d1-e075d2720650

📥 Commits

Reviewing files that changed from the base of the PR and between 70606e0 and d059427.

📒 Files selected for processing (6)
  • brain-bar/Sources/BrainBar/BrainBarApp.swift
  • brain-bar/Sources/BrainBar/BrainBarInstanceLock.swift
  • brain-bar/Sources/BrainBar/BrainBarServer.swift
  • brain-bar/Sources/BrainBar/BrainDatabase.swift
  • brain-bar/Sources/BrainBar/MCPRouter.swift
  • brain-bar/Tests/BrainBarTests/BrainBarReliabilityTests.swift
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Cursor Bugbot
  • GitHub Check: Macroscope - Correctness Check
  • GitHub Check: test (3.12)
  • GitHub Check: test (3.13)
  • GitHub Check: test (3.11)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2026-03-18T00:12:08.774Z
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 87
File: brain-bar/Sources/BrainBar/BrainBarServer.swift:118-129
Timestamp: 2026-03-18T00:12:08.774Z
Learning: In Swift files under brain-bar/Sources/BrainBar, enforce that when a critical dependency like the database is nil due to startup ordering (socket before DB), any tool handler that accesses the database must throw an explicit error (e.g., ToolError.noDatabase) instead of returning a default/empty value. Do not allow silent defaults (e.g., guard let db else { return ... }). Flag patterns that silently return defaults when db is nil, as this masks startup timing issues. This guidance applies broadly to similar Swift files in the BrainBar module, not just this one location.

Applied to files:

  • brain-bar/Sources/BrainBar/BrainBarInstanceLock.swift
  • brain-bar/Sources/BrainBar/BrainBarApp.swift
  • brain-bar/Sources/BrainBar/BrainBarServer.swift
  • brain-bar/Sources/BrainBar/MCPRouter.swift
  • brain-bar/Sources/BrainBar/BrainDatabase.swift
📚 Learning: 2026-03-29T18:45:40.988Z
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 133
File: brain-bar/Sources/BrainBar/BrainDatabase.swift:0-0
Timestamp: 2026-03-29T18:45:40.988Z
Learning: In the BrainBar module’s Swift database layer (notably BrainDatabase.swift), ensure that the `search()` function’s `unreadOnly=true` path orders results by the delivery frontier cursor so the watermark `maxRowID` stays contiguous. Specifically, when `unreadOnly` is enabled, the query must include `ORDER BY c.rowid ASC` (e.g., via `let orderByClause = unreadOnly ? "c.rowid ASC" : "f.rank"`). Do not replace the unread-only ordering with relevance-based sorting (e.g., `f.rank`) unconditionally or for the unread-only path, as it can introduce gaps in the watermark and incorrectly mark unseen rows as delivered. Flag any future change to the `ORDER BY` clause in this function that makes relevance sorting apply to the unread-only case.

Applied to files:

  • brain-bar/Sources/BrainBar/BrainBarInstanceLock.swift
  • brain-bar/Sources/BrainBar/BrainBarApp.swift
  • brain-bar/Sources/BrainBar/BrainBarServer.swift
  • brain-bar/Sources/BrainBar/MCPRouter.swift
  • brain-bar/Sources/BrainBar/BrainDatabase.swift
🪛 SwiftLint (0.63.2)
brain-bar/Tests/BrainBarTests/BrainBarReliabilityTests.swift

[Warning] 5-5: Classes should have an explicit deinit method

(required_deinit)

Comment on lines +112 to +120
server.start()

if launchMode == .legacyStatusItem {
    installLegacyMenuBarSurface(with: collector)
}

collector.start()
configureQuickCaptureHotkey()
NSLog("[BrainBar] Socket ready; database will self-heal — launchMode=%@", String(describing: launchMode))

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Don't start app side effects before startup acceptance is known.

server.start() returns before the lock check runs on the server queue, so a losing duplicate instance still installs the legacy surface, starts the collector, and registers the hotkey before onStartRejected gets back onto the main actor. That briefly recreates the duplicate side effects this PR is trying to eliminate. Gate this work behind an explicit successful-start callback, or make startup acceptance synchronous.


Comment on lines +55 to +64
func release() {
    let shouldRelease = releaseLock.withLock { released -> Bool in
        guard !released else { return false }
        released = true
        return true
    }
    guard shouldRelease else { return }
    flock(fd, LOCK_UN)
    close(fd)
    try? FileManager.default.removeItem(atPath: lockPath)

⚠️ Potential issue | 🔴 Critical


Remove the lock file deletion to prevent a race condition that breaks single-instance guarantees.

The flock syscall attaches locks to inodes, not pathnames. Between close(fd) and removeItem(lockPath), another process can open the same path and acquire the lock on the same inode. A third process can then create a new file at that path before the first process completes deletion, gaining a lock on a different inode. This allows two processes to hold locks simultaneously on different inodes, violating the single-instance invariant.

🔧 Suggested fix
     func release() {
         let shouldRelease = releaseLock.withLock { released -> Bool in
             guard !released else { return false }
             released = true
             return true
         }
         guard shouldRelease else { return }
         flock(fd, LOCK_UN)
         close(fd)
-        try? FileManager.default.removeItem(atPath: lockPath)
     }

The lock file can safely remain on disk; it acts as a marker and causes no harm on subsequent acquisitions.


Comment on lines +188 to +198
do {
    instanceLock = try BrainBarInstanceLock.acquire(lockPath: instanceLockPath)
} catch BrainBarInstanceLock.AcquireError.alreadyRunning {
    NSLog("[BrainBar] Another BrainBar instance owns %@. Exiting this server.", instanceLockPath)
    onStartRejected?("another BrainBar instance owns \(instanceLockPath)")
    return
} catch {
    NSLog("[BrainBar] Failed to acquire single-instance lock %@: %@", instanceLockPath, String(describing: error))
    onStartRejected?("failed to acquire single-instance lock \(instanceLockPath): \(error)")
    return
}

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Release the singleton lock on every startup failure after this point.

Once instanceLock is acquired here, later startup failures still just return. That leaves this process holding the lock and never invokes onStartRejected, so the app can stay alive with no listener while blocking all future launches until exit. Add a shared failure path that releases the lock and reports rejection on every post-acquire early return.


Comment on lines +722 to +730
let previousBusyTimeout = busyTimeoutMillis.flatMap { _ in queryPragma(db, name: "busy_timeout") }
if let busyTimeoutMillis {
    try executeOnHandle(db, sql: "PRAGMA busy_timeout = \(max(1, busyTimeoutMillis))")
}
defer {
    if let previousBusyTimeout {
        try? executeOnHandle(db, sql: "PRAGMA busy_timeout = \(previousBusyTimeout)")
    }
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Scripts executed:

Repository: EtanHey/brainlayer

# Inspect the flagged region, the store()/storeAsync path, and the helpers
cat -n brain-bar/Sources/BrainBar/BrainDatabase.swift | sed -n '1,50p'
cat -n brain-bar/Sources/BrainBar/BrainDatabase.swift | head -100
cat -n brain-bar/Sources/BrainBar/BrainDatabase.swift | sed -n '1,150p' | tail -50
cat -n brain-bar/Sources/BrainBar/BrainDatabase.swift | sed -n '150,250p'
cat -n brain-bar/Sources/BrainBar/BrainDatabase.swift | sed -n '239,400p'
cat -n brain-bar/Sources/BrainBar/BrainDatabase.swift | sed -n '700,780p'
cat -n brain-bar/Sources/BrainBar/BrainDatabase.swift | sed -n '778,800p'
cat -n brain-bar/Sources/BrainBar/BrainDatabase.swift | sed -n '850,950p'
cat -n brain-bar/Sources/BrainBar/BrainDatabase.swift | sed -n '1642,1660p'

# Search for any synchronization around the shared db handle: queues, locks,
# mutexes, actors, serial access patterns (every search came back empty)
rg -n "nonisolated|DispatchQueue|Lock|Mutex|serial" brain-bar/Sources/BrainBar/BrainDatabase.swift | head -30
rg -n "DispatchQueue|serial|queue|Lock|Mutex" brain-bar/Sources/BrainBar/BrainDatabase.swift | head -10
rg -n "var db|let db|private.*db" brain-bar/Sources/BrainBar/BrainDatabase.swift | head -20
rg -n "@MainActor|@globalActor|nonisolated" brain-bar/Sources/BrainBar/BrainDatabase.swift
rg -n "serialQueue|dbQueue|dbLock|synchronized" brain-bar/Sources/BrainBar/BrainDatabase.swift
rg -n "private.*queue|private.*lock|private.*serial" brain-bar/Sources/BrainBar/BrainDatabase.swift
rg -n "FULLMUTEX|Sendable" brain-bar/Sources/BrainBar/BrainDatabase.swift | head -10

# Map callers of store()/storeAsync and the pragma helpers they rely on
rg -n "\.store\(" brain-bar/Sources/BrainBar/ -A 2 | head -40
rg -n "\.store\(|storeAsync|storeOrQueueWithinBudget" brain-bar/Sources/BrainBar/ --type swift | wc -l
rg -n "db.store|db.storeAsync|db.storeOrQueue" brain-bar/Sources/BrainBar/QuickCaptureController.swift -B 3 -A 3 | head -20
rg -n "func runWriteStatement|private func runWriteStatement" brain-bar/Sources/BrainBar/BrainDatabase.swift -A 10 | head -30
rg -n "func queryPragma|func executeOnHandle" brain-bar/Sources/BrainBar/BrainDatabase.swift -A 5
rg -n "func open|private func open" brain-bar/Sources/BrainBar/BrainDatabase.swift -A 15 | head -40
rg -n "func configureConnection|func openConnection" brain-bar/Sources/BrainBar/BrainDatabase.swift -A 15 | head -40
rg -n "openConfiguration|busyTimeoutMillis" brain-bar/Sources/BrainBar/BrainDatabase.swift | head -20

🌐 Web query:

SQLite PRAGMA busy_timeout connection-specific thread-safe

💡 Result:

PRAGMA busy_timeout is connection-specific (per database connection) and operates safely in thread-safe SQLite builds. Key points:

- It sets the busy handler for that specific connection only. Each connection has its own independent busy handler/timeout; setting it on one connection does not affect others. The default is 0 (no timeout, immediate SQLITE_BUSY).
- Usage: PRAGMA busy_timeout = milliseconds; (e.g., PRAGMA busy_timeout = 5000; for 5 seconds). The equivalent C API, sqlite3_busy_timeout(sqlite3 *db, int ms), is explicitly per connection.
- Thread safety: works in serialized mode (SQLITE_THREADSAFE=1, the default), where a full mutex serializes access to the connection, and in multi-thread mode (SQLITE_THREADSAFE=2), where connections must not be shared across threads without external synchronization, though the timeout itself is a simple per-connection value.
- PRAGMAs are session/connection-specific and do not persist, so the timeout must be set after opening each connection.
- In WAL mode (recommended for concurrency) it helps handle lock contention between the single writer and multiple readers. Timeouts may not apply in deadlocks (immediate SQLITE_BUSY) or certain recovery scenarios.


Concurrent PRAGMA mutations on the shared handle can cause operations to inherit wrong timeouts.

The code mutates busy_timeout on the shared SQLite connection and restores it via defer, but since storeAsync dispatches to a concurrent global queue and there is no external synchronization on the db handle, concurrent store() calls can interleave. One thread may set the timeout to 50 ms while another thread reads that value before restoration, causing unrelated operations to use incorrect timeout values or fail spuriously. Use a dedicated short-timeout connection for budgeted writes instead of mutating the shared handle.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@brain-bar/Sources/BrainBar/BrainDatabase.swift` around lines 722 - 730:
the current code mutates the shared SQLite handle's PRAGMA busy_timeout via
busyTimeoutMillis/queryPragma/executeOnHandle which races when
storeAsync/store() run concurrently; instead, when a custom busyTimeoutMillis is
requested create and use a dedicated short-lived DB connection/handle for that
budgeted write (set its PRAGMA busy_timeout once on the new handle, run the
write via executeOnHandle or equivalent, then close the handle) rather than
changing and restoring the shared connection; update the logic around
busyTimeoutMillis, previousBusyTimeout, queryPragma, and executeOnHandle to
open/close a temporary connection for timed writes to eliminate interleaving on
the shared handle.

Comment on lines +240 to +284
private func connectReliabilitySocket(path: String, timeout: TimeInterval) throws -> Int32 {
let fd = socket(AF_UNIX, SOCK_STREAM, 0)
guard fd >= 0 else {
throw NSError(domain: NSPOSIXErrorDomain, code: Int(errno))
}

var addr = sockaddr_un()
addr.sun_family = sa_family_t(AF_UNIX)
let pathBytes = path.utf8CString
let pathCapacity = MemoryLayout.size(ofValue: addr.sun_path)
guard pathBytes.count <= pathCapacity else {
close(fd)
throw NSError(domain: NSPOSIXErrorDomain, code: Int(ENAMETOOLONG), userInfo: [
NSLocalizedDescriptionKey: "Socket path too long (\(pathBytes.count) > \(pathCapacity)): \(path)"
])
}
withUnsafeMutablePointer(to: &addr.sun_path) { ptr in
ptr.withMemoryRebound(to: CChar.self, capacity: pathCapacity) { dest in
pathBytes.withUnsafeBufferPointer { src in
_ = memcpy(dest, src.baseAddress!, src.count)
}
}
}

let deadline = Date().addingTimeInterval(timeout)
var lastErrno = ENOENT
while Date() < deadline {
let result = withUnsafePointer(to: &addr) { addrPtr in
addrPtr.withMemoryRebound(to: sockaddr.self, capacity: 1) { ptr in
connect(fd, ptr, socklen_t(MemoryLayout<sockaddr_un>.size))
}
}
if result == 0 {
return fd
}
lastErrno = errno
if errno != ENOENT && errno != ECONNREFUSED {
break
}
Thread.sleep(forTimeInterval: 0.01)
}

close(fd)
throw NSError(domain: NSPOSIXErrorDomain, code: Int(lastErrno))
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Scripts executed:

find . -name "*.swift" -type f | head -20
rg "readReliabilityFramedMessage" --type swift -A 20
rg "connectReliabilitySocket" --type swift -B 3 -A 3

The socket must be nonblocking for the timeout to function.

The readReliabilityFramedMessage(...) function expects the fd to be non-blocking, as evidenced by its checks for EAGAIN/EWOULDBLOCK. However, connectReliabilitySocket returns a blocking socket from socket(AF_UNIX, SOCK_STREAM, 0). When the server never responds, read() blocks indefinitely and the deadline is never evaluated, causing the test to hang rather than timeout.

🔧 Suggested fix
         if result == 0 {
+            let flags = fcntl(fd, F_GETFL)
+            if flags >= 0 {
+                _ = fcntl(fd, F_SETFL, flags | O_NONBLOCK)
+            }
             return fd
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@brain-bar/Tests/BrainBarTests/BrainBarReliabilityTests.swift` around lines
240 - 284, connectReliabilitySocket currently returns a blocking socket but
readReliabilityFramedMessage expects a non-blocking fd; after creating fd in
connectReliabilitySocket set O_NONBLOCK (use fcntl to get current flags, add
O_NONBLOCK, set them) and handle any fcntl errors by closing fd and throwing;
ensure the rest of connect loop still works (check for EINPROGRESS/EWOULDBLOCK
as needed) and leave the non-blocking flag on the returned descriptor so
readReliabilityFramedMessage’s EAGAIN/EWOULDBLOCK handling and the timeout loop
behave correctly.
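The suggested diff above ignores fcntl failures; a small helper along the lines the prompt asks for (close-and-throw on error) might look like this. The function name is illustrative, not the repo's helper.

```swift
import Foundation
#if canImport(Glibc)
import Glibc
#endif

/// Flips a descriptor to non-blocking mode, surfacing fcntl failures instead
/// of silently ignoring them, so read()'s EAGAIN/EWOULDBLOCK handling and the
/// caller's deadline loop behave as intended.
func makeNonBlocking(_ fd: Int32) throws {
    let flags = fcntl(fd, F_GETFL)
    guard flags >= 0 else {
        throw NSError(domain: NSPOSIXErrorDomain, code: Int(errno))
    }
    guard fcntl(fd, F_SETFL, flags | O_NONBLOCK) == 0 else {
        throw NSError(domain: NSPOSIXErrorDomain, code: Int(errno))
    }
}
```

The caller would invoke this on the successful-connect path (after `result == 0`) and close the descriptor if it throws, keeping the returned fd non-blocking for the framed-read loop.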

@EtanHey EtanHey merged commit 219f697 into main May 3, 2026
7 checks passed