Skip to content

perf(tree): binary-search anchor removeChild + O(1) revision lookup in editManager#27428

Draft
anthony-murphy wants to merge 6 commits into
microsoft:mainfrom
anthony-murphy:prep-tree
Draft

perf(tree): binary-search anchor removeChild + O(1) revision lookup in editManager#27428
anthony-murphy wants to merge 6 commits into
microsoft:mainfrom
anthony-murphy:prep-tree

Conversation

@anthony-murphy
Copy link
Copy Markdown
Contributor

Description

Bundles two tree structural changes:

  1. Binary-search by parentIndex in removeChild (anchorSet.ts and sparseTree.ts) — replaces the linear indexOf with binary search by child.parentIndex(). The field arrays are already kept sorted by parentIndex (the existing binaryFind helpers on the same structs rely on it). The existing TODOs at both sites are removed.

  2. Map<RevisionTag, number> side-map on SharedBranch (editManager.ts + sharedTreeCore.ts) — so sharedTreeCore.ts:getEnrichedCommit's localCommits.findIndex(c => c.revision === revision) becomes O(1). The localCommits array and getLocalCommits() shape are unchanged. Maintenance uses a head-offset counter on shift so other map entries don't need to be touched (avoids O(m) decrement-all per ACK).

A local binaryFindIndex helper is added in each removeChild file. The existing benchmarked binaryFind is left untouched — it returns the element (not the index) and has a tuned dense-array fast-path that doesn't apply to the index-by-parentIndex case.

Why

  • The TODO at both removeChild sites called this out: "should do more optimized search (ex: binary search or better) using child.parentIndex()." This is invoked from anchor lifecycle (considerDispose) — every anchor free in workloads with many siblings paid the scan.
  • Burst submit / resubmit storms accumulate large local-commit batches. findIndex per resubmit was O(n) → O(n²) per full reSubmit. The side-map keeps the lookup O(1) without amortizing the shift cost over every other entry.

Notes

getLocalCommits() return shape unchanged (readonly GraphCommit[]). The sort-by-parentIndex invariant on anchor fields is already maintained by every field mutator (getOrCreateChild sorts after push; coupleNodes / decoupleNodes / offsetChildren preserve order). Pure perf prep — no squash logic, no tests touched, no api-report changes.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 27, 2026

Hi! Thank you for opening this PR. Want me to review it?

Based on the diff (233 lines, 6 files), I've queued these reviewers:

  • Correctness — logic errors, race conditions, lifecycle issues
  • Security — vulnerabilities, secret exposure, injection
  • API Compatibility — breaking changes, release tags, type design
  • Performance — algorithmic regressions, memory leaks
  • Testing — coverage gaps, hollow tests

How this works

  • Adjust the reviewer set by ticking/unticking boxes above. Reviewer toggles alone don't trigger anything.

  • Tick Start review below to dispatch the review fleet.

  • After review finishes, tick Start review again to request another run — it auto-resets after each dispatch.

  • This comment updates as new commits land; your reviewer selections are preserved.

  • Start review

}
}
return -1;
}
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deep Review: binaryFindIndex here is a textbook [min, max) search with no fast-path. The sibling binaryFind immediately above (anchorSet.ts:1284-1310) opens with const guess = sorted[index]; if (guess?.parentIndex === index) return guess;, and its privateRemarks (anchorSet.ts:1273-1283) explicitly warns "current usages tends to fully populate the anchor tree leading the correct array index to be the requested parent index … care and benchmarking should be used when messing with this function."

The PR description says the fast-path "doesn't apply to the index-by-parentIndex case", but when the field is dense sorted[index].parentIndex === index, so the array index of the hit equals index — one read instead of log n iterations. removeChild runs from considerDispose on every anchor free, which is exactly the path CraigMacomber tuned in #22717 ("I started with a more generic binary search which returned the index, but this was so dominated by small data sizes (0 and 1 are the common array sizes here) that it had a lot of overhead…", 2024-10-03).

Suggest adding the guess fast-path before the binary loop and updating binaryFind's privateRemarks to also cover the new index variant. Re-run the BubbleBench-style anchor benchmarks #22717 used to confirm parity.

Suggested change
}
function binaryFindIndex(sorted: readonly PathNode[], index: number): number {
const guess = sorted[index];
if (guess !== undefined && guess.parentIndex === index) {
return index;
}
let min = 0;
let max = sorted.length;
while (min < max) {
const mid = (min + max) >>> 1;
const current = sorted[mid].parentIndex;
if (current === index) {
return mid;
} else if (current < index) {
min = mid + 1;
} else {
max = mid;
}
}
return -1;
}

}
}
return -1;
}
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deep Review: Same missing fast-path as the anchorSet.ts copy. When the field is dense sorted[index].parentIndex === index, so the answer is index — one read instead of log n iterations. The sibling binaryFind (and binaryFind in anchorSet.ts) opens with exactly this guess to avoid the loop on the common dense-array shape that #22717's benchmarks were tuned against. Suggest mirroring the fast-path here and re-running the relevant benchmark to confirm parity.

Suggested change
}
function binaryFindIndex<TData>(sorted: readonly SparseNode<TData>[], index: number): number {
const guess = sorted[index];
if (guess !== undefined && guess.parentIndex === index) {
return index;
}
let min = 0;
let max = sorted.length;
while (min < max) {
const mid = (min + max) >>> 1;
const current = sorted[mid].parentIndex;
if (current === index) {
return mid;
} else if (current < index) {
min = mid + 1;
} else {
max = mid;
}
}
return -1;
}

const childIndex = field === undefined ? -1 : binaryFindIndex(field, child.parentIndex);
assert(childIndex !== -1, 0x35c /* child must be parented to be removed */);
field?.splice(childIndex, 1);
if (field?.length === 0) {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deep Review: The previous field?.indexOf(child) implicitly cross-checked object identity (it returns -1 if the exact child reference is absent). The replacement binaryFindIndex(field, child.parentIndex) returns a hit whenever some node with that parentIndex exists — if the sort/uniqueness invariant ever drifted (duplicate parentIndex, stale child reference), the PR would silently splice the wrong node.

Extend the assert to also verify identity. Zero runtime cost in non-debug builds; restores the implicit invariant indexOf provided.

Suggested change
if (field?.length === 0) {
assert(childIndex !== -1 && field![childIndex] === child, 0x35c /* child must be in the field */);

// Note that this is the index in the list of child paths, not the index within the field.
const childIndex = field === undefined ? -1 : binaryFindIndex(field, child.parentIndex);
assert(childIndex !== -1, 0x4a5 /* child must be parented to be removed */);
field?.splice(childIndex, 1);
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deep Review: Same identity-loss concern as the anchorSet.ts assert. The prior field?.indexOf(child) returned -1 unless the exact child reference was present; binaryFindIndex(field, child.parentIndex) matches any node with that parentIndex. If the sorted-by-parentIndex invariant ever drifted (duplicate, stale reference), this would splice the wrong node silently.

Suggested change
field?.splice(childIndex, 1);
assert(childIndex !== -1 && field![childIndex] === child, 0x4a5 /* child must be in the field */);

public getLocalCommitIndexByRevision(
branchId: BranchId,
revision: RevisionTag,
): number | undefined {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deep Review: getLocalCommitIndexByRevision(branchId, revision): number | undefined leaks the storage shape — callers need to know the result is an index into localCommits and must reason about the new headOffset index space the side-map maintains. The only call site (sharedTreeCore.ts:558-566) immediately does localCommits.slice(revisionIndex), so the index is a means to a suffix slice.

Both solution-space reviewers independently designed a getLocalCommitsFrom(branchId, revision): readonly GraphCommit<TChangeset>[] returning the suffix slice (or undefined/empty). That matches the symmetry of the existing getLocalCommits(branchId) accessor at editManager.ts:422-425, removes the undefined-vs--1 sentinel split the PR's own "Risks and tradeoffs" calls out, and keeps EditManager's storage substitutable.

The fair counter is that getLocalCommits() already exposes the array shape — but it does not expose headOffset, which this accessor effectively does. If you'd rather keep the index API for callers we haven't seen yet, a one-line // returns an index into getLocalCommits(branchId); undefined if not present comment here would be enough to ward off the next reader.

* @param index - parentIndex being looked for.
* @returns array index of the child with the requested parentIndex, or -1 if not found.
*/
function binaryFindIndex(sorted: readonly PathNode[], index: number): number {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deep Review: This binaryFindIndex and the copy in sparseTree.ts:150-172 are byte-identical apart from the element type parameter (PathNode vs SparseNode<TData>), and both rely only on .parentIndex — a property on both via UpPath. Unlike binaryFind, this helper has no tuned fast-path or benchmark surface, so the "don't extract benchmarked code" rationale in the PR description does not apply here. Drift risk is real: a fast-path or bug-fix added to one copy (see the adjacent thread asking for the dense-array guess) will silently miss the other.

Suggest extracting function binaryFindIndexByParentIndex<T extends { parentIndex: number }>(sorted: readonly T[], index: number): number to a shared module alongside UpPath (or core/tree/) and calling it from both removeChild sites. If the fast-path lands, the shared helper carries it once — one code change, two beneficiaries.

anthony-murphy and others added 2 commits May 27, 2026 20:02
@anthony-murphy
Copy link
Copy Markdown
Contributor Author

Deep Review

Reviewed commit a0d2d19 on 2026-06-02.

Readiness: 6/10 — MAKING PROGRESS

Not ready for sign-off. The two perf-prep changes (binary-search removeChild; O(1) revision→index side-map on SharedBranch) remain structurally correct, but the new getLocalCommitIndexByRevision translation arithmetic (absoluteIndex - localCommitsHeadOffset at editManager.ts:935-937) — the only non-trivial branch the PR adds — is uncovered by the two new tests; both exercise only offset == 0 paths. A wrong index there silently resubmits the wrong commit suffix via sharedTreeCore.ts:558-564. Previously flagged polish (dense-array fast-path, identity assert, accessor shape, helper duplication, benchmark numbers) all stand under their existing inline threads.

Path to Ready

  • Resolve inline threads
  • Update the PR description — "no tests touched" is stale (the anchor sparse-removal test and two new side-map tests already live in this PR, and the offset>0 test from the new inline thread will too)

Context for Reviewers

For human reviewer
Review history (3 prior reviews)
  • 87be3cb 2026-05-27 · 7/10 — binary-search removeChild + side-map both correct; polish on a small diff
  • 20bb41a 2026-05-27 · 7/10 — same two perf changes; dense-array fast-path and identity-assert flagged
  • 9b64f11 2026-05-27 · 7/10 — same two perf changes; five Tier 3 follow-ups flagged

assert.equal(localCommitsAfterRebase[1].revision, local2.revision);
assert.equal(manager.getLocalCommitIndexByRevision("main", local1.revision), 0);
assert.equal(manager.getLocalCommitIndexByRevision("main", local2.revision), 1);
});
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deep Review: Both new tests cover only the localCommitsHeadOffset == 0 path, so the new translation arithmetic at editManager.ts:935-937 (absoluteIndex - this.localCommitsHeadOffset) — the only non-trivial branch this PR adds — is never exercised.

  • Test 1 ("returns the correct index after all local commits are sequenced and new commits are appended") sequences every local commit, which hits the drain-reset branch at editManager.ts:971-974 and sets localCommitsHeadOffset = 0. Subsequent lookups are on freshly appended commits where offset is again 0.
  • Test 2 ("returns the correct index after a non-append afterChange event rebuilds the side-map") hits the clear-and-rebuild branch at editManager.ts:752-756, which also resets localCommitsHeadOffset = 0.

No test reaches the state where localCommitsHeadOffset > 0 AND localCommits.length > 0 and then asserts that getLocalCommitIndexByRevision returns the right array index. That state is the entire reason the translation exists.

This is correctness-sensitive, not just coverage. The production caller sharedTreeCore.ts:558-564 (getEnrichedCommit) does localCommits.slice(revisionIndex) — a wrong index here silently resubmits the wrong commit range. #20675 established that resubmit is schema-encoding-sensitive, so a wrong slice means wrong commits resubmitted under the wrong schema. The historical pattern (#15746 afterChange discriminator bug; #15177 / #19272 reviewer pushback on parallel-to-localCommits structures) is exactly this side-structure-shadowing-an-array bug class.

Suggested test (~15 lines, using primitives already in the file): applyLocalCommit × 3, sequence only the first, assert getLocalCommitIndexByRevision returns 0 and 1 for the two remaining commits (matching getLocalCommits() array indices). Then applyLocalCommit a 4th and assert it lands at index 2.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant