perf(matrix): use DoublyLinkedList + side-map for pending cell changes#27422
perf(matrix): use DoublyLinkedList + side-map for pending cell changes#27422anthony-murphy wants to merge 5 commits into
Conversation
|
Hi! Thank you for opening this PR. Want me to review it? Based on the diff (131 lines, 2 files), I've queued these reviewers:
How this works
|
| const change = pendingCell.local.pop(); | ||
| const change = pendingCell.local.pop()?.data; | ||
| assert(change?.localSeq === setMetadata.localSeq, 0xbaa /* must have change */); | ||
| pendingCell.localByLocalSeq.delete(setMetadata.localSeq); |
There was a problem hiding this comment.
Deep Review: pop() here is correct — rollback is LIFO, the operation being rolled back is always the tail, and assert 0xbaa enforces the match. But on #24604 the Copilot bot flagged this exact construct ("Using pop() to remove the pending operation may remove the most recently added op rather than the one corresponding to the rollback"), and now that this PR adds the localByLocalSeq index — making targeted removal trivial and applying it in reSubmitCore — a reader will reasonably ask why rollback didn't switch too. Switching would be semantically wrong (rollback must restore the previous cell value, which requires popping the tail — see matrix.rollback.spec.ts:163-187, :232-290), so the code is right; only the intent is implicit.
A one-line comment above the pop() pre-empts the same #24604 round-trip.
| pendingCell.localByLocalSeq.delete(setMetadata.localSeq); | |
| // Rollback is LIFO: the operation being rolled back is always the tail. `pop()` is correct; assert 0xbaa enforces the match. | |
| const change = pendingCell.local.pop()?.data; |
Deep ReviewReviewed commit Readiness: 8/10 — ALMOST READY Faithful, well-scoped perf prep: Path to Ready
Context for Reviewers
For human reviewer
Review history (4 prior reviews)
|
| return this.list.length; | ||
| } | ||
|
|
||
| public get last(): ListNode<{ localSeq: number; value: MatrixItem<T> }> | undefined { |
There was a problem hiding this comment.
Deep Review: PendingLocalCellChanges.last returns ListNode<{ localSeq: number; value: MatrixItem<T> }> | undefined, exposing the internal DoublyLinkedList node type as part of the wrapper's public surface. The single call site in setCellCore writes pendingCell.local.last!.data.value with // eslint-disable-next-line @typescript-eslint/no-non-null-assertion — the only such disable in this PR. Both the eslint suppression and the leaky ListNode shape come from the same accessor.
Replace get last(): ListNode<…> with peekLastValue(): MatrixItem<T> | undefined returning this.list.last?.data.value. The call site collapses to pendingCell.local.peekLastValue() ?? pendingCell.consensus, dropping the !.data.value chain and the eslint-disable, and centralising the internal-structure dependency inside the wrapper where it belongs. Josmithr's pattern on #24604 was to push back on exactly this kind of optional/ambiguity surface on the pending-cell interface family.
| } | ||
|
|
||
| public push(localSeq: number, value: MatrixItem<T>): void { | ||
| assert(!this.index.has(localSeq), "duplicate localSeq in PendingLocalCellChanges"); |
There was a problem hiding this comment.
Deep Review: assert(!this.index.has(localSeq), "duplicate localSeq in PendingLocalCellChanges") is the only plain-string assert in the changed file. Every other assert(...) in matrix.ts carries a hex short-code tag, including the three asserts this PR modifies (0xba4–0xba7) and the adjacent 0xba8–0xbab. The convention was established by the bulk-tagging pass in #24797 (jatgarg, 2025-06-09); the release-time assert-short-code tagger converts plain strings to numeric codes. If that policy check runs in CI for this package, it will fail.
Run the assert-short-code tagger (or confirm the intent is to let the next release-time pass tag it). Don't invent a code by hand.
| * (instead of a linear `findIndex` scan). | ||
| * | ||
| * The list and the index are kept in lock-step: every mutator updates both. | ||
| */ |
There was a problem hiding this comment.
Deep Review: The class JSDoc states the purpose (O(1) lookup) but not the motivation for the side-map shape. Under today's runtime contract reSubmitCore always receives a localSeq equal to the head of pendingCell.local (because sendSetCellOp assigns a fresh nextLocalSeq() and pushes to tail, and the runtime walks pending ops FIFO). A Deque + head-shift + head-match assert would deliver identical O(1) behavior with no side-map — which is exactly the question the next reader will ask, and the question this dossier converged on.
The justification — explicit removal by localSeq (a) avoids encoding an implicit FIFO dependency in a per-cell data structure, and (b) is staged for follow-up squash work that needs arbitrary-position removal — is reasonable but lives only in PR discussion, not in code. Expand the PendingLocalCellChanges class comment to state explicitly: (1) why the side-map exists (avoids relying on FIFO/head invariant at call sites; supports upcoming squash/arbitrary-position removal), and (2) the lock-step invariant the wrapper enforces between list and index. Without this, the next reader will re-derive the "why not a Deque?" critique.
|
|
||
| // Both clients should converge on the last-written value. | ||
| const expected = [[N - 1]]; | ||
| assert.deepEqual(extract(matrix1), expected); |
There was a problem hiding this comment.
Deep Review: "resubmits N writes to the same cell after reconnect" stages N=200 pending writes, reconnects, and asserts only assert.deepEqual(extract(matrix1), [[N - 1]]) and extract(matrix2), [[N - 1]] — final-state convergence. The test's own framing ("re-emit all N pending writes") implies a per-op regression target, but no cellsChanged observation is wired up. On #18018, vladsud explicitly objected to resubmit logic preserving final state while skipping intermediate set-cell states, because matrix exposes each intermediate state through change events. The PR adds no batching/squash path so the risk is narrowed today — but the gap remains real for the path being rewritten, and a future squash follow-up could regress this silently.
Attach a cellsChanged consumer (or a counter on the open-handler) and verify both clients observe all N values in order during the reconnect drain — not just the final one.
| assert.deepEqual(extract(matrix1), expected); | ||
| assert.deepEqual(extract(matrix3), expected); | ||
| }); | ||
|
|
There was a problem hiding this comment.
Deep Review: This PR introduces three new code paths on PendingLocalCellChanges: pop() (rollback, matrix.ts:974), last peek (FWW previous-value decision in setCellCore), and shift() (ACK). The single new test exercises only the reSubmit path. Existing tests cover single-pending rollback and single-pending FWW, but the "many pending locals to one cell → rollback / FWW conflict" combinations — exactly the paths that depend on the new last accessor and on pop's side-map cleanup — are not directly exercised.
Add two tests under the same harness:
- A rollback test that stacks ≥3 pending writes to one cell then undoes them in sequence (exercises
pop+last+ side-map cleanup; pairs with the existingmatrix.rollback.spec.tspatterns). - An FWW-conflict test where a remote winner arrives while multiple local pendings exist on the same cell (exercises the
length > 0 ? local.last!.data.value : pendingCell.consensusbranch).
Both fit the existing MockContainerRuntime harness.
Description
Replaces
PendingCellChanges<T>.local: { localSeq, value }[]withDoublyLinkedList<{ localSeq, value }>and adds a sibling side-maplocalByLocalSeq: Map<number, ListNode<{ localSeq, value }>>for O(1)findIndex-by-localSeqlookup.Sites converted:
sendSetCellOppush: insert into list, register node in side-map.reSubmitCore: wasfindIndex + spliceper pending op; nowlocalByLocalSeq.get(localSeq)→list.remove(node)→localByLocalSeq.delete(localSeq). O(1).list.shift()?.data+ side-map delete.pop()+ peek-taillist.last!.data.value.list.length.Why
For a hot cell with k pending writes, the previous reSubmit-per-op was O(k) find + O(k) splice (O(k²) total) and per-ACK was O(k) shift. For n ops to a single hot cell, the whole rebase pass was O(n²). The list-plus-side-map makes every per-op step O(1).
Notes
Snapshot/wire format unaffected (
_pendingCliSeqDataisunknown). Pure perf prep — noreSubmitSquashed, no squash logic, no tests touched, no api-report changes.