fix: recover from poisoned mutex in SqliteStorage (#11)#23
Conversation
A panic in one thread while holding the storage mutex used to poison the lock for every other thread, taking down the whole crawler. Use unwrap_or_else(PoisonError::into_inner) so the connection stays usable after such a panic. Closes #11 https://claude.ai/code/session_012RmdaovmNWZVAim4XxCWwn
📝 WalkthroughWalkthroughSqliteStorage now stores the database path and introduces a ChangesMutex Poisoning Recovery
🎯 2 (Simple) | ⏱️ ~12 minutes
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/core/storage.rs`:
- Line 41: The code currently uses self.conn.lock().unwrap_or_else(|e|
e.into_inner()) which reuses a potentially poisoned rusqlite::Connection;
instead, detect a poisoned mutex error and recreate a fresh Connection rather
than calling into_inner(). In storage.rs, replace the unwrap_or_else usage
around self.conn with logic that, on Err(std::sync::PoisonError), constructs a
new rusqlite::Connection (e.g., via the same open/create routine used elsewhere
or a new helper like recreate_connection()) and swaps it into self.conn before
proceeding; ensure this logic appears where the mutex is locked (references to
self.conn and the methods using it) so subsequent uses (the other lock at the
later occurrence) also get a recovered connection.
In `@tests/core_storage.rs`:
- Around line 106-139: The test test_save_works_after_mutex_poisoning never
actually poisons the mutex because SqliteStorage::save() releases self.conn
before the spawned thread panics; add a test-only helper on SqliteStorage (e.g.,
pub fn poison_mutex_for_test(&self)) that locks self.conn and panics while the
guard is held, then update the test to call storage.poison_mutex_for_test()
inside the spawned thread (instead of calling save there) so the MutexGuard is
still alive when the panic occurs; keep existing assertions that
storage.save(...) and storage.retrieve(...) recover from the poisoned mutex.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 8527969e-1cd7-4e0e-9048-249ba780d8ae
📒 Files selected for processing (2)
src/core/storage.rstests/core_storage.rs
…'t poison Address CodeRabbit review on #23: - The previous test held the guard only during s.save() and panicked after the guard was dropped, so the mutex was never poisoned and the recovery path was never exercised. Replace with a unit test in the same module that panics while the guard is still alive and asserts is_poisoned(). - Reusing a rusqlite Connection after a panic mid-operation is not documented as safe. Instead of into_inner()ing the recovered handle, reopen a fresh Connection inside the recovered guard. https://claude.ai/code/session_012RmdaovmNWZVAim4XxCWwn
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/core/storage.rs`:
- Around line 46-56: The current locked_conn silently returns a poisoned
Connection when Self::open(&self.db_path) fails, which can reuse a potentially
corrupted handle; change locked_conn to return Result<MutexGuard<'_,
Connection>, anyhow::Error> (or a suitable error type) and propagate errors
instead of falling back: in the Err(poison) branch, attempt
Self::open(&self.db_path) and on success replace the inner Connection, but if
opening fails return an error; update all callers (notably save and retrieve) to
handle the Result (propagate with ? or map_err) and adjust their signatures to
return Result as needed so failures to reopen are surfaced instead of reusing a
poisoned connection.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
| fn locked_conn(&self) -> MutexGuard<'_, Connection> { | ||
| match self.conn.lock() { | ||
| Ok(guard) => guard, | ||
| Err(poison) => { | ||
| let mut guard = poison.into_inner(); | ||
| if let Ok(fresh) = Self::open(&self.db_path) { | ||
| *guard = fresh; | ||
| } | ||
| guard | ||
| } | ||
| } |
There was a problem hiding this comment.
Silent fallback to poisoned connection undermines recovery.
Lines 51-53 silently fall back to the old (potentially corrupted) connection if Self::open() fails. This contradicts the intent stated in the comment (lines 42-45) that "reopening is safer than reusing the recovered handle" and defeats the purpose of the recovery mechanism. Per the past review and rusqlite documentation, a Connection after a panic mid-operation is not guaranteed to be usable.
If reopening fails due to filesystem issues, returning the poisoned connection could lead to data corruption or undefined behavior.
🔧 Proposed fix: propagate the error instead of silent fallback
Change locked_conn to return Result:
- fn locked_conn(&self) -> MutexGuard<'_, Connection> {
+ fn locked_conn(&self) -> Result<MutexGuard<'_, Connection>, StorageError> {
match self.conn.lock() {
- Ok(guard) => guard,
+ Ok(guard) => Ok(guard),
Err(poison) => {
let mut guard = poison.into_inner();
- if let Ok(fresh) = Self::open(&self.db_path) {
- *guard = fresh;
- }
- guard
+ let fresh = Self::open(&self.db_path)?;
+ *guard = fresh;
+ Ok(guard)
}
}
}Update call sites in save and retrieve:
pub fn save(...) -> Result<(), StorageError> {
let hash = Self::get_hash(identifier);
let json = serde_json::to_string(data)?;
- let conn = self.locked_conn();
+ let conn = self.locked_conn()?;
conn.execute(...)?;
Ok(())
} pub fn retrieve(...) -> Result<Option<HashMap<String, serde_json::Value>>, StorageError> {
let hash = Self::get_hash(identifier);
- let conn = self.locked_conn();
+ let conn = self.locked_conn()?;
let mut stmt = conn.prepare(...)?;
...
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/core/storage.rs` around lines 46 - 56, The current locked_conn silently
returns a poisoned Connection when Self::open(&self.db_path) fails, which can
reuse a potentially corrupted handle; change locked_conn to return
Result<MutexGuard<'_, Connection>, anyhow::Error> (or a suitable error type) and
propagate errors instead of falling back: in the Err(poison) branch, attempt
Self::open(&self.db_path) and on success replace the inner Connection, but if
opening fails return an error; update all callers (notably save and retrieve) to
handle the Result (propagate with ? or map_err) and adjust their signatures to
return Result as needed so failures to reopen are surfaced instead of reusing a
poisoned connection.
Summary
SqliteStorage::saveand::retrievecalledself.conn.lock().unwrap(). A panic in one worker thread while holding that lock would poison it, causing every subsequentlock().unwrap()to panic too — a single bad page could bring down storage for the whole crawler.Switched both call sites to
lock().unwrap_or_else(|e| e.into_inner()), recovering the inner connection from aPoisonErrorso the storage remains usable.Test plan
save/retrievestill succeedcargo test— full suite greencargo clippy --all-targets -- -D warningscleancargo fmt --checkcleanCloses #11
Generated by Claude Code
Summary by CodeRabbit
Bug Fixes
Tests