Skip to content

Batch transaction writes in handleTransactionList to reduce fsync count#4168

Draft
stevenvegt wants to merge 1 commit intomasterfrom
feature/batch-transaction-add
Draft

Batch transaction writes in handleTransactionList to reduce fsync count#4168
stevenvegt wants to merge 1 commit intomasterfrom
feature/batch-transaction-add

Conversation

@stevenvegt
Copy link
Copy Markdown
Member

Summary

Add State.AddMany() that writes multiple transactions from a TransactionList message in a single BBolt write transaction, requiring only one fsync instead of one per transaction.

The problem

Previously, handleTransactionList called state.Add() for each transaction individually. Each Add() acquires a write lock, writes to BBolt, and triggers an fsync on commit. For a message with N transactions, that's N fsyncs.

On local SSD, fsync takes <1ms — barely noticeable. On network-attached storage (e.g. Azure premium SMB), fsync latency is 10-100ms per call. For 100 transactions, that's 1-10 seconds just in fsyncs per message.

This is compounded by the go-stoabs read lock issue (go-stoabs#146): slow fsyncs hold the write lock longer, and Go's sync.RWMutex writer-preference blocks all concurrent reads while a writer is pending. The result is a cascading slowdown where a bootup that takes 3 minutes on local storage takes 30+ minutes on SMB — not a proportional slowdown, but a multiplicative one due to lock contention amplifying the raw I/O penalty.

The fix

AddMany() processes all transactions from a message in a single write transaction:

  • Verification happens inside the write TX so that later transactions can reference earlier ones in the same batch (BBolt supports read-your-own-writes within a transaction)
  • On the first failure, processing stops and all successfully added transactions are committed
  • Returns (added int, err error) — the caller uses added to identify which transaction failed and err to determine the action (ErrPreviousTransactionMissing triggers state reconciliation, other errors are logged and recovered via gossip)
  • Context cancellation is checked between transactions

Relationship to go-stoabs#146

This PR and go-stoabs#146 are complementary:

  • go-stoabs#146 eliminates read lock contention (reads no longer blocked by writes)
  • This PR reduces write lock duration (1 fsync instead of N)

Together they address both sides of the lock contention that causes slow bootup on network storage.

Test plan

  • All existing TestProtocol_handleTransactionList tests updated and passing
  • Full test suite (106 packages) passes
  • Test on Azure SMB to measure bootup time improvement

Co-Authored-By: Claude Opus 4.6 (1M context) noreply@anthropic.com

@qltysh
Copy link
Copy Markdown

qltysh bot commented Apr 9, 2026

1 new issue

Tool Category Rule Count
qlty Structure Function with many returns (count = 7): handleTransactionList 1

Add State.AddMany() that writes multiple transactions in a single BBolt
write transaction. Previously, each transaction in a TransactionList
message was added individually via State.Add(), each acquiring a write
lock and triggering its own fsync. For a message with N transactions,
this meant N separate fsyncs.

On network-attached storage (e.g. Azure premium SMB), fsync latency is
10-100ms compared to <1ms on local SSD. Combined with the go-stoabs
read lock issue (see go-stoabs#146), this creates a compounding effect:
slow fsyncs hold the write lock longer, blocking concurrent reads via
the RWMutex writer-preference, which in turn blocks subsequent writes.
A bootup that takes 3 minutes on local storage can take 30+ minutes on
SMB because the lock contention multiplies the raw I/O penalty.

With batching, N transactions require only 1 fsync. Verification happens
inside the write transaction so that later transactions in the batch can
reference earlier ones. On the first failure, processing stops and all
successfully added transactions are committed. The caller receives the
count of added transactions and the first error, preserving the existing
error handling (ErrPreviousTransactionMissing triggers state
reconciliation, other errors are logged and recovered via gossip).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@stevenvegt stevenvegt force-pushed the feature/batch-transaction-add branch from d88ccdd to dc50f1c Compare April 9, 2026 19:49
@qltysh
Copy link
Copy Markdown

qltysh bot commented Apr 9, 2026

Qlty

Coverage Impact

⬇️ Merging this pull request will decrease total coverage on master by 0.21%.

Modified Files with Diff Coverage (2)

RatingFile% DiffUncovered Line #s
Coverage rating: B Coverage rating: D
network/dag/state.go0.0%222-309
Coverage rating: B Coverage rating: B
network/transport/v2/transactionlist_handler.go100.0%
Total19.3%
🤖 Increase coverage with AI coding...

In the `feature/batch-transaction-add` branch, add test coverage for this new code:

- `network/dag/state.go` -- Line 222-309

🚦 See full report on Qlty Cloud »

🛟 Help
  • Diff Coverage: Coverage for added or modified lines of code (excludes deleted files). Learn more.

  • Total Coverage: Coverage for the whole repository, calculated as the sum of all File Coverage. Learn more.

  • File Coverage: Covered Lines divided by Covered Lines plus Missed Lines. (Excludes non-executable lines including blank lines and comments.)

    • Indirect Changes: Changes to File Coverage for files that were not modified in this PR. Learn more.

@stevenvegt stevenvegt marked this pull request as draft April 10, 2026 07:08
@stevenvegt
Copy link
Copy Markdown
Member Author

Not ready yet, see failing e2e tests. Keys from previous txs in the same batch don't become available for later txs because the notifier hasn't ran yet. Needs some more thinking.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant