Fix node startup failure when syncing on slow disk (v5.4) #4170

Merged
reinkrul merged 3 commits into V5.4 from bbolt-read-lock-fix on Apr 10, 2026
@reinkrul
Member

Summary

Updates go-stoabs from v1.9.0 to v1.11.1, which removes an unnecessary Go-level read lock from the BBolt wrapper (go-stoabs#146).

Problem

Nodes with a large transaction log fail to start when disk I/O is slow (e.g. SMB/Azure Files volume mounts on Azure ACI):

unable to start Network: failed to start notifiers: unable to obtain BBolt read lock: database error: context deadline exceeded

On startup, connectToKnownNodes() triggers many incoming transactions. Each write transaction holds a Go-level write lock while committing (including a slow fdatasync over SMB). The go-stoabs wrapper used a sync.RWMutex with writer-preference, which blocked all concurrent read transactions — including network notifiers trying to start. After 1 second, the notifiers time out and the node shuts down.

Fix

go-stoabs v1.11.1 replaces the sync.RWMutex with a plain sync.Mutex that only serializes write transactions. Read transactions now go directly to BBolt's native locking, which handles concurrent reads independently of writes. This eliminates the "unable to obtain BBolt read lock" error class entirely.

See #4162 for the full root cause analysis and performance test results.

Fixes #4162

Test plan

  • Verify all existing tests pass
  • Verify node starts successfully when syncing many transactions on a slow disk

reinkrul and others added 3 commits April 10, 2026 12:25
…to v1.11.1

Updates go-stoabs from v1.9.0 to v1.11.1, which removes an unnecessary
Go-level read lock from the BBolt wrapper. The read lock caused reader
starvation (context deadline exceeded) when a write transaction was pending,
preventing network notifiers from starting on nodes with slow disk I/O
(e.g. SMB/Azure Files volume mounts).

Fixes #4162

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@qltysh

qltysh bot commented Apr 10, 2026

Coverage Impact

⬇️ Merging this pull request will decrease total coverage on V5.4 by 0.03%.


@reinkrul reinkrul merged commit eef688d into V5.4 Apr 10, 2026
8 checks passed
@reinkrul reinkrul deleted the bbolt-read-lock-fix branch April 10, 2026 12:16