Commit e5d2f58

Documentation/technical: add merge-base doc
Add a technical document describing merge-base computation and, specifically, the paint_down_to_common() implementation.

Signed-off-by: Kristofer Karlsson <krka@spotify.com>
1 parent c37e4a9 commit e5d2f58

3 files changed

Lines changed: 132 additions & 0 deletions

Documentation/Makefile

Lines changed: 1 addition & 0 deletions
@@ -129,6 +129,7 @@ TECH_DOCS += technical/long-running-process-protocol
 TECH_DOCS += technical/multi-pack-index
 TECH_DOCS += technical/packfile-uri
 TECH_DOCS += technical/pack-heuristics
+TECH_DOCS += technical/paint-down-to-common
 TECH_DOCS += technical/parallel-checkout
 TECH_DOCS += technical/partial-clone
 TECH_DOCS += technical/platform-support

Documentation/technical/meson.build

Lines changed: 1 addition & 0 deletions
@@ -18,6 +18,7 @@ articles = [
 'multi-pack-index.adoc',
 'packfile-uri.adoc',
 'pack-heuristics.adoc',
+'paint-down-to-common.adoc',
 'parallel-checkout.adoc',
 'partial-clone.adoc',
 'platform-support.adoc',
Lines changed: 130 additions & 0 deletions
@@ -0,0 +1,130 @@
Merge-Base Computation and paint_down_to_common()
==================================================

The function `paint_down_to_common()` in `commit-reach.c` computes merge
bases by walking the commit graph backwards from two sets of tips and
finding where their ancestry meets.

Use cases
---------

Merge-base computation is used in two ways:

1. *Finding all merge bases* (`merge-base --all`, `merge-tree`,
   `merge`, `rebase`). A merge base is a common ancestor that is
   not itself an ancestor of another common ancestor.

2. *Ancestry checks* (`in_merge_bases`, used by `merge-base
   --is-ancestor`, `branch -d`, `fetch`). These ask: "is commit A
   an ancestor of commit B?" If a common ancestor equals one of the
   inputs, that input is necessarily the only merge base: every
   other common ancestor is an ancestor of it and therefore
   redundant.

Both use cases share the same algorithm and implementation.
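The two definitions above can be checked on a toy graph. The following is a brute-force sketch in Python, not how `commit-reach.c` works: the `parents` dictionary, the commit names, and the helpers `ancestors`, `merge_bases`, and `is_ancestor` are all made up for illustration.

```python
def ancestors(parents, tip):
    """Return every commit reachable from tip, including tip itself."""
    seen, stack = set(), [tip]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def merge_bases(parents, a, b):
    """Common ancestors that are not ancestors of another common ancestor."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return {c for c in common
            if not any(d != c and c in ancestors(parents, d) for d in common)}


def is_ancestor(parents, a, b):
    """A is an ancestor of B exactly when A is the only merge base."""
    return merge_bases(parents, a, b) == {a}


# Hypothetical criss-cross history: a2 and b2 each merge a1 and b1.
parents = {"o": [], "a1": ["o"], "b1": ["o"],
           "a2": ["a1", "b1"], "b2": ["b1", "a1"]}

print(sorted(merge_bases(parents, "a2", "b2")))  # ['a1', 'b1']
print(is_ancestor(parents, "o", "a2"))           # True
print(is_ancestor(parents, "a1", "b1"))          # False
```

The criss-cross case yields two merge bases, which is why the first use case has to return a set rather than a single commit.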

Algorithm
---------

Given a commit `one` and a set of commits `twos[]`, the walk paints
commits with two colors:

- PARENT1: reachable from `one`
- PARENT2: reachable from any commit in `twos[]`

The walk uses a priority queue ordered by generation number (falling
back to commit date when generation numbers are unavailable). Each
step dequeues the highest-priority commit (this is when we say a
commit is "visited") and propagates its paint flags to its parents,
enqueuing them if they gained new flags. When a commit receives
both PARENT1 and PARENT2, it is a merge-base candidate. A candidate
also gains the STALE flag, which its ancestors inherit: any
deeper common ancestor is necessarily redundant.
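As a rough illustration of the walk just described, here is a minimal Python sketch. It is not a transcription of the C code: the function name `paint_down_sketch`, the dictionary-based graph, and the hand-written `gen` table are assumptions, and every commit is assumed to have a finite generation number. It uses only the first two termination conditions (empty queue, or only STALE entries left).

```python
import heapq

PARENT1, PARENT2, STALE = 1, 2, 4


def paint_down_sketch(parents, gen, one, twos):
    """Return merge-base candidates in the order they are visited."""
    flags = {c: 0 for c in parents}
    queue = []  # min-heap of (-generation, name): highest generation first

    def paint(c, f):
        flags[c] |= f
        heapq.heappush(queue, (-gen[c], c))

    paint(one, PARENT1)
    for t in twos:
        paint(t, PARENT2)

    candidates = []
    # Continue while at least one queued commit is not STALE.
    while any(not (flags[c] & STALE) for _, c in queue):
        _, c = heapq.heappop(queue)  # c is now "visited"
        f = flags[c]
        if f == (PARENT1 | PARENT2):
            candidates.append(c)
            flags[c] = f = f | STALE  # ancestors will inherit STALE
        for p in parents[c]:
            if (flags[p] & f) != f:  # parent gains at least one flag
                paint(p, f)
    return candidates


# Hypothetical criss-cross history with hand-assigned generation numbers.
parents = {"o": [], "a1": ["o"], "b1": ["o"],
           "a2": ["a1", "b1"], "b2": ["b1", "a1"]}
gen = {"o": 1, "a1": 2, "b1": 2, "a2": 3, "b2": 3}

print(paint_down_sketch(parents, gen, "a2", ["b2"]))  # ['a1', 'b1']
print(paint_down_sketch(parents, gen, "o", ["a2"]))   # ['o']
```

In the first call, `o` is reached only after both candidates have marked it STALE, so it is never reported. The second call is an ancestry check: the single candidate equals the input `o`.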

INFINITY and finite generation regions
--------------------------------------

The commit-graph stores a generation number for each commit. Commits
not in the commit-graph have generation `GENERATION_NUMBER_INFINITY`. The
graph is closed under reachability: if a commit is in the graph, all
its ancestors are too. This partitions the commit graph into two regions:

....
+--------------------------------------+
| INFINITY region                      |
| generation = INFINITY                |
| queue order: heuristic (commit date) |
+--------------------------------------+
                   |
                   v
+--------------------------------------+
| Finite region                        |
| generation = finite                  |
| queue order: topological             |
+--------------------------------------+
....

When the commit-graph is enabled, the INFINITY region is typically
very small -- it only contains commits added since the last
commit-graph refresh.

All reachable INFINITY-generation commits are visited before any
finite-generation commit, because INFINITY is larger than any finite
value. Once the walk crosses into the finite region, it stays there.

In the finite region, generation ordering guarantees topological
traversal: children are always visited before their parents. This
means that paint on already-visited commits is final -- no future
traversal step can add paint to them.

In the INFINITY region, commit-date ordering can violate this: a
parent with a later date can be visited before a child with an earlier
date. Paint flags are therefore NOT final at visit time, and a
commit visited with only one side's paint may later gain the other.

Paint flags are only added, never removed. Since each flag can be set
at most once per commit, the number of times a commit can be
re-enqueued is bounded by the number of flag transitions.
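The ordering behaviour described in this section can be seen with a toy priority queue. This sketch only models the queue order (generation first, commit date as tie-breaker) and does not perform a walk. The commit names, dates, and the use of `float("inf")` as a stand-in for `GENERATION_NUMBER_INFINITY` are assumptions for illustration.

```python
import heapq

INFINITY = float("inf")  # stand-in for GENERATION_NUMBER_INFINITY


def dequeue_order(commits):
    """commits maps name -> (generation, commit_date); return visit order."""
    heap = [(-gen, -date, name) for name, (gen, date) in commits.items()]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


commits = {
    "child":  (INFINITY, 100),  # not yet in the commit-graph
    "parent": (INFINITY, 200),  # parent of "child", but with a later date
    "g5":     (5, 300),         # in the commit-graph, newest date of all
    "g4":     (4, 50),          # in the commit-graph, parent of "g5"
}

print(dequeue_order(commits))  # ['parent', 'child', 'g5', 'g4']
```

Both INFINITY commits come out before `g5` even though `g5` has the latest date, and within the INFINITY region the skewed `parent` is visited before its `child`, which is exactly the case where paint is not final at visit time.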

Termination
-----------

Termination happens when we can prove that no further progress is
possible. We are done with the main loop when one of the following
conditions holds:

1. The queue is empty.
2. The queue only contains STALE entries.
3. Side-exhaustion: the walk has reached the finite region and one
   of the sides is fully exhausted.

The loop waits for all pending merge-base candidates to be popped
and recorded before any early exit fires, so no separate drain phase
is needed after termination.

Stale entry condition
~~~~~~~~~~~~~~~~~~~~~
If all entries are stale we cannot find any new merge bases, since
that requires at least one enqueued non-stale commit from one side
to meet the other side. However, continuing could still invalidate
merge bases already found (if there is more than one). This is
unnecessary since `remove_redundant()` will clean that up as a
post-process step.
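To make the post-processing step concrete, suppose a date-ordered walk recorded two candidates, `c2` and then `c1`, where `c2` is an ancestor of `c1`. The brute-force filter below shows what "cleaning up" means in that case. It is a sketch only: `remove_redundant_sketch`, `reachable`, and the graph are hypothetical and say nothing about how the real `remove_redundant()` is implemented.

```python
def reachable(parents, start, target):
    """True if target can be reached from start by following parent links."""
    seen, stack = set(), [start]
    while stack:
        c = stack.pop()
        if c == target:
            return True
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return False


def remove_redundant_sketch(parents, candidates):
    """Drop every candidate that is an ancestor of another candidate."""
    return [c for c in candidates
            if not any(d != c and reachable(parents, d, c)
                       for d in candidates)]


# Hypothetical history: A and B both have c1 and c2 as parents, c1 -> c2 -> r.
parents = {"r": [], "c2": ["r"], "c1": ["c2"],
           "A": ["c1", "c2"], "B": ["c1", "c2"]}

print(remove_redundant_sketch(parents, ["c2", "c1"]))  # ['c1']
```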

Side-exhaustion
~~~~~~~~~~~~~~~
A commit is *exclusive* to one side if it carries that side's paint
but not the other (e.g. PARENT1 without PARENT2).

If we have reached the finite region of the graph, no future
traversal step can add paint to an already-visited commit. Thus if
there are no exclusive PARENT2 commits in the queue, no additional
PARENT2 paint can be introduced into the walk. Even if exclusive
PARENT1 commits remain, no new merge-base candidates can be
discovered. The same holds symmetrically for PARENT1.

This invariant is only valid in the finite region of the graph.
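The effect of this exit can be measured on a toy history. The sketch below is one possible reading of the rule, not the C implementation: `walk`, its `side_exhaustion` switch, and the graph are assumptions, every commit has a finite generation number, and the exit is suppressed while an unrecorded candidate (both paints, not STALE) is still queued.

```python
import heapq

PARENT1, PARENT2, STALE = 1, 2, 4
BOTH = PARENT1 | PARENT2


def walk(parents, gen, one, two, side_exhaustion):
    """Return (merge-base candidates, number of visited queue entries)."""
    flags = {c: 0 for c in parents}
    queue = []

    def paint(c, f):
        flags[c] |= f
        heapq.heappush(queue, (-gen[c], c))

    paint(one, PARENT1)
    paint(two, PARENT2)
    bases, visits = [], 0
    while True:
        live = [flags[c] for _, c in queue if not (flags[c] & STALE)]
        if not live:
            break  # queue empty, or only STALE entries left
        if side_exhaustion and BOTH not in live:
            # No pending candidate; stop once a side has no exclusive entry.
            if PARENT1 not in live or PARENT2 not in live:
                break
        _, c = heapq.heappop(queue)
        visits += 1
        f = flags[c]
        if f == BOTH:
            bases.append(c)
            flags[c] = f = f | STALE
        for p in parents[c]:
            if (flags[p] & f) != f:
                paint(p, f)
    return bases, visits


# Hypothetical history: two unrelated chains m0..m99 and l0..l49.
# x merges both chain tips; t sits directly on top of m99.
parents = {"m0": [], "l0": []}
for i in range(1, 100):
    parents[f"m{i}"] = [f"m{i - 1}"]
for i in range(1, 50):
    parents[f"l{i}"] = [f"l{i - 1}"]
parents["x"] = ["m99", "l49"]
parents["t"] = ["m99"]

gen = {}
for c in parents:  # parents were inserted before their children
    gen[c] = 1 + max((gen[p] for p in parents[c]), default=0)

fast_bases, fast_visits = walk(parents, gen, "x", "t", side_exhaustion=True)
slow_bases, slow_visits = walk(parents, gen, "x", "t", side_exhaustion=False)
print(fast_bases, slow_bases)           # ['m99'] ['m99']
print(fast_visits < slow_visits)        # True
```

Both runs report `m99`. Without the exit, the walk keeps following the PARENT1-only `l` chain down to its root even though no PARENT2 paint is left to meet it.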

Related documentation
---------------------

- `Documentation/technical/commit-graph.adoc` -- generation numbers
  and the reachability closure property.
