fix: recover from poisoned mutex in SqliteStorage (#11) #23

Merged
Liohtml merged 2 commits into master from claude/storage-mutex-poisoning
May 28, 2026

Conversation

@Liohtml (Owner) commented May 28, 2026

Summary

SqliteStorage::save and ::retrieve called self.conn.lock().unwrap(). A panic in one worker thread while holding that lock would poison it, causing every subsequent lock().unwrap() to panic too — a single bad page could bring down storage for the whole crawler.

Switched both call sites to lock().unwrap_or_else(|e| e.into_inner()), recovering the inner connection from a PoisonError so the storage remains usable.
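
The recovery pattern in this description can be illustrated with a minimal, std-only sketch. The helper names `lock_recovering` and `poison` are hypothetical and are not part of this PR; a plain generic value stands in for the SQLite connection.

```rust
use std::sync::{Arc, Mutex, MutexGuard};
use std::thread;

/// Lock `m`, taking the guard out of the `PoisonError` if a previous
/// holder panicked (hypothetical helper, not the PR's code).
fn lock_recovering<T>(m: &Mutex<T>) -> MutexGuard<'_, T> {
    m.lock().unwrap_or_else(|poisoned| poisoned.into_inner())
}

/// Poison `m` by panicking in another thread while its guard is alive.
fn poison<T: Send + 'static>(m: &Arc<Mutex<T>>) {
    let m = Arc::clone(m);
    let _ = thread::spawn(move || {
        let _guard = m.lock().unwrap();
        panic!("simulated worker panic while holding the lock");
    })
    .join(); // the join error only reports the panic; ignore it
}
```

With a plain `lock().unwrap()`, every caller after `poison` would panic in turn; `lock_recovering` hands back the protected value instead. Whether that value is still in a consistent state is a separate question, and the protected data may have been left half-updated by the panicking thread.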

Test plan

  • New regression test that deliberately poisons the mutex from a panicking thread, then verifies save / retrieve still succeed
  • cargo test — full suite green
  • cargo clippy --all-targets -- -D warnings clean
  • cargo fmt --check clean

Closes #11


Generated by Claude Code

Summary by CodeRabbit

  • Bug Fixes

    • Enhanced storage reliability with improved error recovery for connection failures.
  • Tests

    • Added test coverage for storage resilience scenarios.

A panic in one thread while holding the storage mutex used to poison the
lock for every other thread, taking down the whole crawler. Use
unwrap_or_else(PoisonError::into_inner) so the connection stays usable
after such a panic.

Closes #11

https://claude.ai/code/session_012RmdaovmNWZVAim4XxCWwn
coderabbitai Bot commented May 28, 2026

Walkthrough

SqliteStorage now stores the database path and introduces a locked_conn() helper that recovers from a poisoned Mutex by reopening a fresh SQLite connection. The save() and retrieve() methods use this helper instead of an unconditional lock().unwrap(), so a poisoned lock no longer makes them panic. A new test validates recovery after intentional mutex poisoning.

Changes

Mutex Poisoning Recovery

| Layer / File(s) | Summary |
|---|---|
| Storage struct updates and connection recovery helper (src/core/storage.rs) | SqliteStorage adds db_path field and introduces locked_conn() helper that handles poisoned Mutex<Connection> by extracting the inner connection and reopening a fresh SQLite connection from the stored path. |
| Integration into save and retrieve methods (src/core/storage.rs) | SqliteStorage::save() and ::retrieve() replace direct lock().unwrap() calls with locked_conn() to enable automatic recovery from mutex poisoning. |
| Recovery behavior verification test (src/core/storage.rs) | New test intentionally poisons the mutex by panicking while holding the lock, then asserts that subsequent save() and retrieve() calls recover and succeed without panicking. |

🎯 2 (Simple) | ⏱️ ~12 minutes

🐰 A poisoned lock once panicked and crashed,
But now our little helper works so fast—
It catches poison, reopens fresh,
And lets the storage dance afresh! ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title directly describes the main fix: recovering from poisoned mutexes in SqliteStorage, which matches the primary change in the PR. |
| Linked Issues check | ✅ Passed | The PR successfully implements all coding objectives from issue #11: replaces unwrap() with poisoned mutex recovery, handles PoisonError appropriately, and includes regression tests. |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to fixing mutex poisoning in SqliteStorage as specified in issue #11; no unrelated modifications detected. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/core/storage.rs`:
- Line 41: The code currently uses self.conn.lock().unwrap_or_else(|e|
e.into_inner()) which reuses a potentially poisoned rusqlite::Connection;
instead, detect a poisoned mutex error and recreate a fresh Connection rather
than calling into_inner(). In storage.rs, replace the unwrap_or_else usage
around self.conn with logic that, on Err(std::sync::PoisonError), constructs a
new rusqlite::Connection (e.g., via the same open/create routine used elsewhere
or a new helper like recreate_connection()) and swaps it into self.conn before
proceeding; ensure this logic appears where the mutex is locked (references to
self.conn and the methods using it) so subsequent uses (the other lock at the
later occurrence) also get a recovered connection.

In `@tests/core_storage.rs`:
- Around line 106-139: The test test_save_works_after_mutex_poisoning never
actually poisons the mutex because SqliteStorage::save() releases self.conn
before the spawned thread panics; add a test-only helper on SqliteStorage (e.g.,
pub fn poison_mutex_for_test(&self)) that locks self.conn and panics while the
guard is held, then update the test to call storage.poison_mutex_for_test()
inside the spawned thread (instead of calling save there) so the MutexGuard is
still alive when the panic occurs; keep existing assertions that
storage.save(...) and storage.retrieve(...) recover from the poisoned mutex.

📥 Commits

Reviewing files that changed from the base of the PR and between 805f55b and 36504e2.

📒 Files selected for processing (2)
  • src/core/storage.rs
  • tests/core_storage.rs

…'t poison

Address CodeRabbit review on #23:
- The previous test held the guard only during s.save() and panicked after
  the guard was dropped, so the mutex was never poisoned and the recovery
  path was never exercised. Replace with a unit test in the same module
  that panics while the guard is still alive and asserts is_poisoned().
- Reusing a rusqlite Connection after a panic mid-operation is not
  documented as safe. Instead of into_inner()ing the recovered handle,
  reopen a fresh Connection inside the recovered guard.

https://claude.ai/code/session_012RmdaovmNWZVAim4XxCWwn
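
The first point of the commit message above (a panic after the guard is dropped does not poison the mutex) can be checked with a small std-only sketch. The function name `poisoned_after_panic` is hypothetical and is not the test added in this PR.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Panic in a worker thread either while the guard is still alive or after
/// it has been dropped, then report whether the mutex ended up poisoned.
fn poisoned_after_panic(hold_guard_during_panic: bool) -> bool {
    let shared = Arc::new(Mutex::new(0u32));
    let worker = Arc::clone(&shared);
    let _ = thread::spawn(move || {
        let mut guard = worker.lock().unwrap();
        *guard += 1;
        if !hold_guard_during_panic {
            // Lock released before the panic: nothing gets poisoned.
            drop(guard);
        }
        // If the guard is still alive here, it is dropped during unwinding,
        // which is what marks the mutex as poisoned.
        panic!("simulated failure");
    })
    .join();
    shared.is_poisoned()
}
```

Asserting `is_poisoned()` before exercising `save` and `retrieve`, as the commit message describes, guards against a test that passes without ever reaching the recovery path.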

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/core/storage.rs`:
- Around line 46-56: The current locked_conn silently returns a poisoned
Connection when Self::open(&self.db_path) fails, which can reuse a potentially
corrupted handle; change locked_conn to return Result<MutexGuard<'_,
Connection>, anyhow::Error> (or a suitable error type) and propagate errors
instead of falling back: in the Err(poison) branch, attempt
Self::open(&self.db_path) and on success replace the inner Connection, but if
opening fails return an error; update all callers (notably save and retrieve) to
handle the Result (propagate with ? or map_err) and adjust their signatures to
return Result as needed so failures to reopen are surfaced instead of reusing a
poisoned connection.

📥 Commits

Reviewing files that changed from the base of the PR and between 36504e2 and f228358.

📒 Files selected for processing (1)
  • src/core/storage.rs

Comment thread src/core/storage.rs
Comment on lines +46 to +56
    fn locked_conn(&self) -> MutexGuard<'_, Connection> {
        match self.conn.lock() {
            Ok(guard) => guard,
            Err(poison) => {
                let mut guard = poison.into_inner();
                if let Ok(fresh) = Self::open(&self.db_path) {
                    *guard = fresh;
                }
                guard
            }
        }

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Silent fallback to poisoned connection undermines recovery.

Lines 51-53 silently fall back to the old (potentially corrupted) connection if Self::open() fails. This contradicts the intent stated in the comment (lines 42-45) that "reopening is safer than reusing the recovered handle" and defeats the purpose of the recovery mechanism. Per the past review and rusqlite documentation, a Connection after a panic mid-operation is not guaranteed to be usable.

If reopening fails due to filesystem issues, returning the poisoned connection could lead to data corruption or undefined behavior.

🔧 Proposed fix: propagate the error instead of silent fallback

Change locked_conn to return Result:

-    fn locked_conn(&self) -> MutexGuard<'_, Connection> {
+    fn locked_conn(&self) -> Result<MutexGuard<'_, Connection>, StorageError> {
         match self.conn.lock() {
-            Ok(guard) => guard,
+            Ok(guard) => Ok(guard),
             Err(poison) => {
                 let mut guard = poison.into_inner();
-                if let Ok(fresh) = Self::open(&self.db_path) {
-                    *guard = fresh;
-                }
-                guard
+                let fresh = Self::open(&self.db_path)?;
+                *guard = fresh;
+                Ok(guard)
             }
         }
     }

Update call sites in save and retrieve:

     pub fn save(...) -> Result<(), StorageError> {
         let hash = Self::get_hash(identifier);
         let json = serde_json::to_string(data)?;
-        let conn = self.locked_conn();
+        let conn = self.locked_conn()?;
         conn.execute(...)?;
         Ok(())
     }
     pub fn retrieve(...) -> Result<Option<HashMap<String, serde_json::Value>>, StorageError> {
         let hash = Self::get_hash(identifier);
-        let conn = self.locked_conn();
+        let conn = self.locked_conn()?;
         let mut stmt = conn.prepare(...)?;
         ...
     }
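
For readers who want to try the error-propagating variant outside the repository, here is a self-contained sketch under stated assumptions: `Conn` is a stand-in for `rusqlite::Connection`, and `Storage`, its `open` (which fails on an empty path to simulate a filesystem error), the `Reopen` variant, and `poison_for_test` are all hypothetical, not the project's actual definitions.

```rust
use std::sync::{Arc, Mutex, MutexGuard};
use std::thread;

/// Stand-in for `rusqlite::Connection` (assumption: keeps the sketch std-only).
#[derive(Debug)]
struct Conn {
    writes: u32,
}

#[derive(Debug, PartialEq)]
enum StorageError {
    Reopen(String),
}

struct Storage {
    db_path: String,
    conn: Arc<Mutex<Conn>>,
}

impl Storage {
    /// Hypothetical opener; an empty path simulates a filesystem failure.
    fn open(db_path: &str) -> Result<Conn, StorageError> {
        if db_path.is_empty() {
            return Err(StorageError::Reopen("cannot open database".into()));
        }
        Ok(Conn { writes: 0 })
    }

    fn new(db_path: &str) -> Result<Self, StorageError> {
        Ok(Self {
            db_path: db_path.to_string(),
            conn: Arc::new(Mutex::new(Self::open(db_path)?)),
        })
    }

    /// On poisoning, swap in a fresh connection or surface the reopen error.
    fn locked_conn(&self) -> Result<MutexGuard<'_, Conn>, StorageError> {
        match self.conn.lock() {
            Ok(guard) => Ok(guard),
            Err(poison) => {
                let mut guard = poison.into_inner();
                *guard = Self::open(&self.db_path)?;
                Ok(guard)
            }
        }
    }

    /// Counts writes on the current connection instead of touching a database.
    fn save(&self) -> Result<u32, StorageError> {
        let mut conn = self.locked_conn()?;
        conn.writes += 1;
        Ok(conn.writes)
    }

    /// Test-only helper: panic in another thread while the guard is alive.
    fn poison_for_test(&self) {
        let conn = Arc::clone(&self.conn);
        let _ = thread::spawn(move || {
            let _guard = conn.lock().unwrap();
            panic!("simulated panic while holding the connection");
        })
        .join();
    }
}
```

One design point worth keeping in mind: `into_inner()` does not clear the poison flag, so in this sketch the `Err` branch runs (and reopens) on every later call. Recent Rust releases provide `Mutex::clear_poison` for resetting the flag once a fresh connection is in place.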

@Liohtml Liohtml merged commit 2ff9c70 into master May 28, 2026
6 checks passed

Development

Successfully merging this pull request may close these issues.

[repo-health] Medium: Mutex poisoning panics in SqliteStorage — lock().unwrap() crashes worker threads

2 participants