replay: support replaying 2-parent merges #2106

Draft
dscho wants to merge 5 commits into gitgitgadget:master from dscho:support-merge-commits-in-git-history
@dscho dscho commented May 6, 2026

git history, the new history-rewriting builtin in v2.54, dies on any merge in the rewrite path with "replaying merge commits is not supported yet!". That makes it not very useful for the workflows I (and many others) actually have, where almost every interesting branch contains at least one merge of a feature topic. The natural fallback, git rebase --rebase-merges, is interactive and stops to ask for re-resolution even when no re-resolution is needed.

This series lifts that limitation for the common 2-parent case. The algorithm itself is not new: Elijah Newren wrote it down in his replay design notes and prototyped it in a 2022 work-in-progress sketch. What is new is wiring it into the libgit-extracted replay_revisions() API that backs both git replay and git history, plus three specific tweaks that make the trickier cases work where the WIP sketch bailed out: identical conflict-marker labels for the inner remerges of the original and the rewritten parents (so their conflict-markered trees compare equal in the regions the user did not touch), tolerating result.clean == 0 from those inner merges (their well-defined conflict-markered trees are valid inputs to the outer 3-way merge), and self-fallback for both merge parents combined with a pre-seed of the rev-range boundary into the rewritten-commit map. The relevant detail is in the algorithm commit's message.

The pre-seed plus self-fallback is what makes the genuinely hard topology work: a merge in the rewrite path whose first parent sits off the walk (a side branch brought in by an older merge into the boundary's ancestry, for example). Without self-fallback, the original first parent gets silently replanted at the rewritten boundary and the whole merge sprouts onto a different ancestry. That case is covered by a dedicated test in t3454-history-merges.sh.
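
The interplay of the pre-seed and the self-fallback can be illustrated with a toy model. This is only a sketch under stated assumptions: the `rewritten` dictionary, the `map_parent()` helper and the commit names are hypothetical stand-ins, not the data structures or functions used in the series.

```python
def map_parent(rewritten, parent, fallback_to_self=True, onto=None):
    """Decide which commit a replayed commit's parent should point at."""
    if parent in rewritten:
        return rewritten[parent]
    # Self-fallback: a parent that was never walked keeps its identity.
    # The alternative (falling back to `onto`) replants it onto the new base.
    return parent if fallback_to_self else onto


# Hypothetical topology: "B" is the rev-range boundary and "B2" the new base.
# Merge M has parents ("S", "T"): "S" sits off the walk, "T" is on it.
rewritten = {"B": "B2"}   # pre-seed: the boundary maps to its replacement
rewritten["T"] = "T2"     # filled in as the walk replays T

new_parents = [map_parent(rewritten, p) for p in ("S", "T")]
replanted = [map_parent(rewritten, p, fallback_to_self=False, onto="B2")
             for p in ("S", "T")]

print(new_parents)  # S is kept as-is
print(replanted)    # S is silently replaced by the new base
```

With the self-fallback, the off-walk parent `S` survives untouched; without it, `S` is swapped for the new base and the merge ends up on a different ancestry.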

Octopus merges and revert-of-merge are surfaced as explicit errors at the dispatch point. The split sub-command of git history continues to refuse when its target is a merge: split semantics simply do not apply there.

The replay propagates the textual diffs the user actually made in a merge commit; it does not extrapolate symbol-level intent. The most interesting limitation in that regard is captured in another t3454-history-merges.sh test: if a topic renamed harry() to hermione(), the merge commit manually renamed an existing caller from harry() to hermione(), and the replay introduces a brand-new caller of harry() via the rewritten parents, the new caller stays as harry(). That is intentional, not a regression; symbol-aware refactoring is out of scope here, just as it is for plain rebase. The xdiff special mode for matching conflict-marker hunks across inner remerges, the XDL_MERGE_FAVOR_BASE variant, and the modify/delete and binary-file specials that the design notes flag as future work all remain future work.

While I was at it, I noticed that git history reword had a pre-existing silent-success bug: a positive return from replay_revisions() (which means "conflict, no updates queued") was treated as success. This should never occur, as a reword does not change any file contents, but bugs do happen. The merge-replay work is complex enough to make this type of bug more likely, so the algorithm commit also reports those conflicts loudly.

I recommend starting with the message of the algorithm commit, as it describes the rationale. The fast-path commit immediately after it is the short-circuit for the dominant git history reword case, where parent and merge-base trees are unchanged; it is independent enough to be evaluated on its own. The historian test helper that follows is what drives the t3454-history-merges.sh scenarios; it leans on git fast-import for everything that matters and is intentionally tiny.

dscho added 2 commits May 6, 2026 00:35
`git history` (introduced in v2.54) and the underlying `git replay`
infrastructure both refused to walk past any commit with more than
one parent, dying with "replaying merge commits is not supported
yet!". For real history-rewriting work this is a showstopper: the
natural fallback `git rebase --rebase-merges` is interactive and
stops to ask for re-resolution even when no re-resolution is needed.

Elijah Newren spelled out a way to lift this limitation in his
replay-design-notes [1] and prototyped it in a 2022
work-in-progress sketch [2]. The idea is that a merge commit M on
parents (P1, P2) records both an automatic merge of those parents
AND any manual layer the author put on top of that automatic merge
(textual conflict resolution and any semantic edit outside conflict
markers). Replaying M onto rewritten parents (P1', P2') must
preserve that manual layer, but the rewritten parents change the
automatic merge, so a simple cherry-pick is wrong: the manual layer
would be re-introduced on top of stale auto-merge text.

What works instead is a three-way merge of three trees the existing
infrastructure already knows how to compute. Let R be the recursive
auto-merge of (P1, P2), O be M's actual tree and N be the recursive
auto-merge of (P1', P2'). Then `git diff R O` is morally
`git show --remerge-diff M`: it captures exactly what the author
added on top of the automatic merge. A non-recursive 3-way merge
with R as the merge base, O as side 1 and N as side 2 layers that
manual contribution onto the freshly auto-merged rewritten parents
(N) and produces the replayed tree.
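
To make the R/O/N construction concrete, here is a minimal Python sketch. It is not the implementation: trees are modelled as plain `{path: content}` dictionaries, whole files are the unit of merging (real merges work on hunks within files), and `merge3()` as well as the file names are hypothetical.

```python
def merge3(base, side1, side2):
    """Non-recursive 3-way merge of trees modelled as {path: content}.

    Returns (tree, conflicts). Conflicted paths are left out of the tree.
    """
    tree, conflicts = {}, []
    for path in sorted(set(base) | set(side1) | set(side2)):
        b, s1, s2 = base.get(path), side1.get(path), side2.get(path)
        if s1 == s2:
            result = s1
        elif s1 == b:
            result = s2          # only side 2 changed this path
        elif s2 == b:
            result = s1          # only side 1 changed this path
        else:
            conflicts.append(path)
            continue
        if result is not None:   # None means the path was deleted
            tree[path] = result
    return tree, conflicts


# R: auto-merge of the original parents (P1, P2)
R = {"a.txt": "from P1\n", "b.txt": "from P2\n"}
# O: what the author committed in M (a manual edit to b.txt on top of R)
O = {"a.txt": "from P1\n", "b.txt": "from P2, fixed up by hand\n"}
# N: auto-merge of the rewritten parents (P1', P2'); P1' changed a.txt
N = {"a.txt": "from P1'\n", "b.txt": "from P2\n"}

replayed, conflicts = merge3(R, O, N)
```

In this toy run, `replayed` takes `a.txt` from N (the rewritten parents) and `b.txt` from O (the author's manual layer), which is the layering described above.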

Implement `pick_merge_commit()` along those lines and dispatch to it
from `replay_revisions()` when the commit being replayed has exactly
two parents. Two specific points (learned the hard way) keep
non-trivial cases working where the WIP sketch [2] bailed out.
First, R and N use identical `merge_options.branch1` and `branch2`
labels ("ours"/"theirs"). When the original parents conflicted on a
region of a file, both R and N produce textually identical conflict
markers; the outer non-recursive merge then sees N == R in that
region and the user's manual resolution from O wins cleanly. Without
this, the conflict-marker text would differ between R and N (because
the inner merges would label the conflicts differently), and the
outer merge would itself be unclean even when the user did supply a
clean resolution. Second, an unclean inner merge
(`result.clean == 0`) is _not_ fatal: the tree merge-ort produces in
that case still has well-defined contents (with conflict markers in
the conflicted files) and is a valid input to the outer
non-recursive merge. Only a real error (`< 0`) propagates as
failure.
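
The effect of the identical labels can be sketched for a single conflicted region. The following Python model is an illustration only: `conflict_text()` and `pick()` are hypothetical helpers, and the differing labels in the second half are made-up examples of what per-commit labelling could look like.

```python
def conflict_text(ours, theirs, label1, label2):
    """Render one conflicted region with conflict markers."""
    return (f"<<<<<<< {label1}\n{ours}"
            f"=======\n{theirs}"
            f">>>>>>> {label2}\n")


def pick(base, side1, side2):
    """3-way choice for one region; None signals a conflict."""
    if side1 == side2:
        return side1
    if side1 == base:
        return side2
    if side2 == base:
        return side1
    return None


ours, theirs = "x = 1\n", "x = 2\n"
resolution = "x = 3\n"   # O: what the author committed for this region

# Same labels in both inner merges: R and N agree, the resolution wins.
R = conflict_text(ours, theirs, "ours", "theirs")
N = conflict_text(ours, theirs, "ours", "theirs")
same_labels = pick(R, resolution, N)

# Hypothetical differing labels: R and N no longer compare equal,
# so the outer merge sees changes on both sides and conflicts.
R_named = conflict_text(ours, theirs, "P1", "P2")
N_named = conflict_text(ours, theirs, "P1-rewritten", "P2-rewritten")
different_labels = pick(R_named, resolution, N_named)
```

Here `same_labels` is the author's resolution, while `different_labels` is `None`, that is, a conflict caused purely by the marker text.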

The replay propagates the textual diffs the user actually made in M;
it does _not_ extrapolate symbol-level intent. If rewriting the
parents pulls in genuinely new content (for example, a brand-new
caller of a function that the merge renamed), that new content stays
as the rewritten parents have it. Symbol-aware refactoring is out of
scope here, just as it is for plain rebase.

Octopus merges (more than two parents) and revert-of-merge are not
supported and are surfaced as explicit errors at the dispatch point.
The "split" sub-command of `git history` continues to refuse when
the targeted commit is itself a merge: split semantics do not apply
to merges. The pre-walk gate in `builtin/history.c` that previously
rejected any merge in the rewrite path now only rejects octopus
merges; rename it accordingly.

A small refactor in `create_commit()` makes the merge case possible:
the helper now takes a `struct commit_list *parents` rather than a
single parent pointer and takes ownership of the list. The single
existing caller in `pick_regular_commit()` builds and passes a
one-element list; the new `pick_merge_commit()` builds a two-element
list, with the order of the `from` and `merge` parents preserved.

Update the negative expectations in t3451, t3452 and t3650 that were
asserting the now-retired "not supported yet" message, replacing
them with positive coverage where it fits. Octopus rejection and
revert-of-merge rejection are covered by new positive tests in
t3650. A dedicated test script with merge-replay scenarios driven by
a new test-tool fixture builder will follow in a subsequent commit.

[1] https://github.com/newren/git/blob/replay/replay-design-notes.txt
[2] newren@4c45e89

Helped-by: Elijah Newren <newren@gmail.com>
Assisted-by: Claude Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
…hanged

For the common `git history reword` case the rewrite changes only
commit messages, so every commit on the line being replayed has the
same tree as before. When such a rewrite reaches a 2-parent merge
whose rewritten parents AND merge bases all carry the same trees as
the originals, the inner auto-merge of the rewritten parents (N) is
tree-equal to the inner auto-merge of the original parents (R), and
the outer 3-way merge with R as the merge base, the original merge
tree as side 1 and N as side 2 yields the original tree as result.

Detect this in `pick_merge_commit()` before doing any merge work and
write the new merge commit directly with the original tree and the
rewritten parents. This saves two recursive merges and one
non-recursive merge per merge commit on the rewrite path, which
dominates the cost of `git history reword` across histories with
many merges.

The merge-base trees must be checked too, in order. Tree-same
parents over a tree-different base could still produce a different
auto-merge (a conflict region that did not exist before, or vice
versa), and the original resolution would be inappropriate to apply.
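
The fast-path condition can be expressed as a small predicate. The Python sketch below is a model, not the code in `pick_merge_commit()`: the `Commit` class, the tree identifiers and the function names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    name: str
    tree: str   # stand-in for a tree object ID


def trees_match(originals, rewritten):
    """True if both lists have equal length and pairwise-equal trees."""
    return (len(originals) == len(rewritten) and
            all(o.tree == r.tree for o, r in zip(originals, rewritten)))


def can_reuse_original_tree(parents, new_parents, bases, new_bases):
    """Fast-path test: parents AND merge bases are tree-same, in order."""
    return (trees_match(parents, new_parents) and
            trees_match(bases, new_bases))


# Reword-style rewrite: new commits, identical trees everywhere.
parents = [Commit("P1", "t1"), Commit("P2", "t2")]
new_parents = [Commit("P1'", "t1"), Commit("P2'", "t2")]
bases, new_bases = [Commit("B", "t0")], [Commit("B'", "t0")]
fast = can_reuse_original_tree(parents, new_parents, bases, new_bases)

# Tree-same parents over a tree-different base: take the slow path.
changed_bases = [Commit("B'", "t9")]
slow = not can_reuse_original_tree(parents, new_parents, bases, changed_bases)
```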

To avoid recomputing the merge bases when the fast path does not
apply, both pairs are computed up front and the slow path that
follows reuses them.

Assisted-by: Claude Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
@dscho dscho marked this pull request as draft May 6, 2026 01:25
dscho added 3 commits May 6, 2026 01:30
The merge-replay tests added in a follow-up commit need a way to set
up specific topologies with full control over blob contents, parent
order, and per-side trees. Sequencing plumbing commands or driving
plain `git fast-import` from shell quickly becomes unreadable for
the kinds of scenarios that exercise non-trivial merge resolution
(textual conflicts, semantic edits outside the conflict region,
intentional limitations such as new content on one side).

Add a small `test-tool historian` subcommand that reads a tight,
shell-quoted, one-line-per-object DSL and feeds an equivalent stream
to a `git fast-import` child process. Each blob and commit is given
a logical name; the helper allocates fast-import marks on first use
and emits a lightweight tag for every commit so tests can refer to
the resulting object via `refs/tags/<name>`.

The DSL has just two directives:

  blob NAME LINE...
  commit NAME BRANCH SUBJECT [from=NAME] [merge=NAME]... [PATH=BLOB]...

A blob's content is the listed lines joined with `\n` (and a final
`\n`); a commit's tree is exactly the listed PATH=BLOB pairs (the
helper emits a `deleteall` so nothing leaks in from the implicit
parent). Token splitting is delegated to `split_cmdline()` so quoted
arguments work as in shell. Marks for parent references and file
contents go through the same `strintmap`-backed name resolver, which
keeps the helper itself trivially small: blob writing, tree
construction, commit creation and merge-base computation are all
handled by `git fast-import`.
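
For illustration, the parsing and mark allocation described above can be approximated in a few lines of Python. This is a sketch and not the helper itself: it uses `shlex.split()` in place of `split_cmdline()` (their quoting rules may differ in corner cases), a plain dictionary in place of `strintmap`, and it stops at a parsed representation instead of emitting a fast-import stream. The function name `parse_historian()` and the sample script are hypothetical.

```python
import shlex


def parse_historian(script):
    """Parse the two-directive DSL into (marks, blobs, commits)."""
    marks = {}

    def mark(name):
        # Allocate a mark the first time a logical name is seen.
        return marks.setdefault(name, len(marks) + 1)

    blobs, commits = [], []
    for line in script.splitlines():
        tokens = shlex.split(line)
        if not tokens:
            continue
        if tokens[0] == "blob":
            name, lines = tokens[1], tokens[2:]
            blobs.append({"mark": mark(name),
                          "data": "\n".join(lines) + "\n"})
        elif tokens[0] == "commit":
            name, branch, subject = tokens[1:4]
            commit = {"mark": mark(name), "branch": branch,
                      "subject": subject,
                      "from": None, "merge": [], "files": {}}
            for arg in tokens[4:]:
                key, _, value = arg.partition("=")
                if key == "from":
                    commit["from"] = mark(value)
                elif key == "merge":
                    commit["merge"].append(mark(value))
                else:               # PATH=BLOB
                    commit["files"][key] = mark(value)
            commits.append(commit)
        else:
            raise ValueError(f"unknown directive: {tokens[0]}")
    return marks, blobs, commits


script = '''
blob hello "hello world" "second line"
commit A main "initial commit" greeting.txt=hello
commit B topic "topic work" from=A greeting.txt=hello
commit M main "merge topic" from=A merge=B greeting.txt=hello
'''
marks, blobs, commits = parse_historian(script)
```

Because blobs and commits share one name-to-mark map, a `from=`, `merge=` or `PATH=` reference resolves the same way regardless of what kind of object the name denotes.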

Note that the DSL reserves the names `from` and `merge` (with a
trailing `=`) for parent specification; a tree path called `from` or
`merge` cannot be expressed via this helper. That is acceptable here
because every input is a tightly controlled test fixture and the
filenames are chosen by the test author.

The helper trusts its caller: malformed input results in a
fast-import error rather than a friendly diagnostic.

Wire the new subcommand into the Makefile and meson build, register
it in `t/helper/test-tool.{c,h}`.

Assisted-by: Claude Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Replace the blanket "does not (yet) work with histories that contain
merges" caveat now that 2-parent merges are supported via the R/O/N
algorithm. Spell out what works (the user's manual conflict
resolution and any semantic edits inside the merge are preserved
through the replay), what is intentionally out of scope (octopus
merges; symbol-level extrapolation when rewriting parents pulls in
genuinely new content), and what still requires interactive rebase
(merges that would actually conflict on replay).

Assisted-by: Claude Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Add a dedicated test script for `git history reword` (and
`git replay` via the same code path) across 2-parent merges, using
the `test-tool historian` fixture builder so each scenario reads as
a small declarative recipe rather than a sequence of plumbing
commands.

The script exercises the cases that motivated the merge-replay
work:

  * a clean merge where each side touches unrelated files;

  * a non-trivial merge where the same line was changed on both
    sides and the user resolved by hand (textual manual resolution
    must be preserved through the replay);

  * a non-trivial merge where the user also touched a line outside
    any conflict region (a "semantic" edit must also be preserved
    through the replay);

  * an octopus merge in the rewrite path, which is rejected;

  * a function rename across the merge with a brand-new caller
    introduced by the rewritten parents. The pre-existing caller
    that the user manually renamed in the original merge must keep
    its rename, and the brand-new caller must _not_ be rewritten
    (calvin/hobbes naming chosen for legibility). This second part
    is the documented limitation: the replay propagates the textual
    diffs the user actually made, it does not extrapolate
    symbol-level intent. Symbol-aware refactoring is out of scope,
    just as it is for plain rebase.
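
The rename limitation in the last scenario can be reproduced with a file-level toy model. The Python sketch below is not taken from the test script: `merge3()` is a hypothetical stand-in for the outer 3-way merge, trees are `{path: content}` dictionaries, and the file names are invented; only the calvin/hobbes naming follows the test.

```python
def merge3(base, side1, side2):
    """File-level 3-way merge of {path: content} trees; raises on conflict."""
    merged = {}
    for path in set(base) | set(side1) | set(side2):
        b, s1, s2 = base.get(path), side1.get(path), side2.get(path)
        if s1 == s2 or s2 == b:
            result = s1
        elif s1 == b:
            result = s2
        else:
            raise ValueError(f"conflict in {path}")
        if result is not None:
            merged[path] = result
    return merged


# R: auto-merge of the original parents. The topic renamed calvin() to
# hobbes(), while the other side still has a caller of calvin().
R = {"lib.py": "def hobbes(): ...\n", "old_caller.py": "calvin()\n"}
# O: in the merge commit, the author renamed the existing caller by hand.
O = {"lib.py": "def hobbes(): ...\n", "old_caller.py": "hobbes()\n"}
# N: the rewritten parents bring in a brand-new caller of calvin().
N = {"lib.py": "def hobbes(): ...\n", "old_caller.py": "calvin()\n",
     "new_caller.py": "calvin()\n"}

replayed = merge3(R, O, N)
```

In `replayed`, `old_caller.py` keeps the author's manual rename, while `new_caller.py` still calls `calvin()`: the diff between R and O never mentioned that file, so there is nothing to propagate.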

The fixture builder lets each scenario sit in roughly a dozen lines
of historian directives plus the assertions, which keeps the test
file readable when more scenarios are added later.

Assisted-by: Claude Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
@dscho dscho force-pushed the support-merge-commits-in-git-history branch from 4fcb053 to 7f73036 Compare May 6, 2026 01:30