Commit 174f35c

Merge branch 'mm/diff-process-hunks' into seen

A new `diff.<driver>.process` configuration has been introduced to allow
a long-running external process to act as a hunk provider. This lets
external tools control which lines Git considers changed while leaving
all output formatting (word diff, color, blame, etc.) to Git's standard
pipeline.

* mm/diff-process-hunks:
  blame: consult diff process for no-hunk detection
  diff: bypass diff process with --no-ext-diff and in format-patch
  diff: add long-running diff process via diff.<driver>.process
  sub-process: separate process lifecycle from hashmap management
  userdiff: add diff.<driver>.process config
  xdiff: support external hunks via xpparam_t

2 parents 85d1a4c + 12bdd74 commit 174f35c

24 files changed

Lines changed: 1368 additions & 21 deletions

Documentation/config/diff.adoc

Lines changed: 5 additions & 0 deletions
@@ -218,6 +218,11 @@ endif::git-diff[]
 	Set this option to `true` to make the diff driver cache the text
 	conversion outputs. See linkgit:gitattributes[5] for details.
 
+`diff.<driver>.process`::
+	The command to run as a long-running diff process that
+	provides hunks to Git's diff pipeline.
+	See linkgit:gitattributes[5] for details.
+
 `diff.indentHeuristic`::
 	Set this option to `false` to disable the default heuristics
 	that shift diff hunk boundaries to make patches easier to read.

Documentation/diff-algorithm-option.adoc

Lines changed: 3 additions & 0 deletions
@@ -18,3 +18,6 @@
 For instance, if you configured the `diff.algorithm` variable to a
 non-default value and want to use the default one, then you
 have to use `--diff-algorithm=default` option.
++
+If you explicitly choose a diff algorithm, it also bypasses
+`diff.<driver>.process` (see linkgit:gitattributes[5]).

Documentation/diff-options.adoc

Lines changed: 3 additions & 1 deletion
@@ -833,7 +833,9 @@ endif::git-format-patch[]
 	to use this option with linkgit:git-log[1] and friends.
 
 `--no-ext-diff`::
-	Disallow external diff drivers.
+	Disallow external diff helpers, including
+	`diff.<driver>.command` and `diff.<driver>.process`
+	(see linkgit:gitattributes[5]).
 
 `--textconv`::
 `--no-textconv`::

Documentation/gitattributes.adoc

Lines changed: 139 additions & 0 deletions
@@ -821,6 +821,145 @@ NOTE: If `diff.<name>.command` is defined for path with the
 (see above), and adding `diff.<name>.algorithm` has no effect, as the
 algorithm is not passed to the external diff driver.
 
+Using an external diff process
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+If `diff.<name>.process` is defined, Git sends the old and new file
+content to an external tool and receives back a list of changed
+regions (pairs of line ranges in the old and new file). Git uses
+these instead of its builtin diff algorithm, but still controls
+all output formatting, so features like word diff, function context,
+color, and blame work normally. This is achieved by using the
+long-running process protocol (described in
+Documentation/technical/long-running-process-protocol.adoc).
+Unlike `diff.<name>.command`, which replaces Git's output entirely,
+the diff process feeds results back into the standard pipeline.
+
+First, in `.gitattributes`, assign the `diff` attribute for paths.
+
+------------------------
+*.c diff=cdiff
+------------------------
+
+Then, define a "diff.<name>.process" configuration to specify
+the diff process command.
+
+----------------------------------------------------------------
+[diff "cdiff"]
+	process = /path/to/diff-process-tool
+----------------------------------------------------------------
+
+When Git encounters the first file that needs to be diffed, it starts
+the process and performs the handshake. In the handshake, the welcome
+message sent by Git is "git-diff-client", only version 1 is supported,
+and the supported capability is "hunks" (the changed regions
+described below).
+
+For each file, Git sends a list of "key=value" pairs terminated with
+a flush packet, followed by the old and new file content as packetized
+data, each terminated with a flush packet. The pathname is relative
+to the repository root. When `diff.<name>.textconv` is also set,
+the tool receives the textconv-transformed content rather than the
+raw blob. Git does not send binary files to the diff process.
+
+-----------------------
+packet: git> command=hunks
+packet: git> pathname=path/file.c
+packet: git> 0000
+packet: git> OLD_CONTENT
+packet: git> 0000
+packet: git> NEW_CONTENT
+packet: git> 0000
+-----------------------
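
Editorial sketch (not part of the patch): each `packet:` line in the listing above corresponds to one pkt-line, which Git frames as four hex digits giving the total length (header included) followed by the payload, with `0000` as the flush packet. The helper below illustrates that framing. The name `pkt_line_format` is hypothetical, and the trailing newline on the key=value payload is an assumption carried over from how text pkt-lines are usually written.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* A flush packet is the special four-byte sequence "0000". */
const char pkt_flush[] = "0000";

/*
 * Frame `payload` as one pkt-line: four lowercase hex digits holding the
 * total length (4-byte header plus payload), then the payload itself.
 * Returns the number of bytes written (excluding the NUL terminator), or
 * -1 if the result would exceed Git's 65520-byte pkt-line limit or would
 * not fit in `out`.
 * Hypothetical helper for illustration; not a function from Git.
 */
int pkt_line_format(char *out, size_t out_size, const char *payload)
{
	size_t len;

	assert(out && payload);
	len = strlen(payload) + 4;
	if (len > 65520 || len + 1 > out_size)
		return -1;
	snprintf(out, out_size, "%04zx%s", len, payload);
	return (int)len;
}
```

File content larger than one packet would have to be split across several pkt-lines before the terminating flush; the sketch leaves that out.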
+
+The tool is expected to respond with zero or more hunk lines,
+a flush packet, and a status packet terminated with a flush packet.
+Each hunk line has the form:
+
+`hunk <old_start> <old_count> <new_start> <new_count>`
+
+where `<old_start>` and `<old_count>` identify a range of lines in
+the old file, and `<new_start>` and `<new_count>` identify the
+replacement range in the new file. Start values are 1-based and
+counts are non-negative. Ranges must not extend beyond the end of
+the file. For example, `hunk 3 2 3 4` means that 2 lines starting
+at line 3 in the old file were replaced by 4 lines starting at
+line 3 in the new file. An `<old_count>` of 0 means no lines were
+removed (pure insertion); a `<new_count>` of 0 means no lines were
+added (pure deletion).
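
Editorial sketch (not part of the patch): the hunk-line format described above is simple enough to parse with `sscanf`. The struct and function names below are hypothetical and are not taken from Git's sources. The checks only cover what the text states: four integers, 1-based starts, and non-negative counts.

```c
#include <assert.h>
#include <stdio.h>

/* One changed region, as carried by a single "hunk" response line. */
struct hunk {
	long old_start, old_count;
	long new_start, new_count;
};

/*
 * Parse "hunk <old_start> <old_count> <new_start> <new_count>".
 * Returns 0 on success and -1 for malformed input, a start below 1,
 * or a negative count. Note that sscanf is lenient about the amount
 * of whitespace between fields.
 * Hypothetical helper for illustration; not a function from Git.
 */
int parse_hunk_line(const char *line, struct hunk *h)
{
	int consumed = 0;

	assert(line && h);
	if (sscanf(line, "hunk %ld %ld %ld %ld%n",
		   &h->old_start, &h->old_count,
		   &h->new_start, &h->new_count, &consumed) != 4)
		return -1;
	/* Allow one trailing newline, reject any other trailing bytes. */
	if (line[consumed] == '\n')
		consumed++;
	if (line[consumed] != '\0')
		return -1;
	if (h->old_start < 1 || h->new_start < 1 ||
	    h->old_count < 0 || h->new_count < 0)
		return -1;
	return 0;
}
```

Parsing the documented example `hunk 3 2 3 4` fills the struct with an old range of 2 lines at line 3 and a new range of 4 lines at line 3.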
+
+Lines are delimited by newlines. A file `"foo\nbar\n"` and a
+file `"foo\nbar"` both have 2 lines.
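
Editorial sketch (not part of the patch): a tool needs to count lines the same way in order to keep its ranges within bounds. The function below follows the rule quoted above. The name `count_lines` is hypothetical, and treating an empty buffer as 0 lines is an assumption the text does not spell out.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Count lines as the text describes: every newline ends a line, and a
 * final run of bytes without a trailing newline still counts as a line.
 * Hypothetical helper for illustration; not a function from Git.
 */
size_t count_lines(const char *buf, size_t len)
{
	size_t lines = 0, i;

	assert(buf || len == 0);
	for (i = 0; i < len; i++)
		if (buf[i] == '\n')
			lines++;
	if (len > 0 && buf[len - 1] != '\n')
		lines++;	/* unterminated last line */
	return lines;
}
```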
+
+Hunks must be listed in order and must not overlap. Any line
+not covered by a hunk is treated as unchanged, so the total
+number of unchanged lines must be the same on both sides.
+For example, if the old file has 10 lines and the hunks cover
+4 of them (`old_count` values summing to 4), then 6 old lines
+are unchanged. The new file must also have exactly 6 lines
+not covered by hunks, so the `new_count` values must sum to
+`new_file_lines - 6`.
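
Editorial sketch (not part of the patch): the ordering, overlap, bounds, and unchanged-line rules above can be checked in one pass. This is a minimal illustration of the stated constraints, not Git's actual validation code. All names are hypothetical, and allowing a zero-count hunk to start one past the last line is an assumption.

```c
#include <assert.h>
#include <stddef.h>

struct hunk {
	long old_start, old_count;
	long new_start, new_count;
};

/*
 * Return 1 if the hunks are in order, non-overlapping, within bounds,
 * and leave the same number of unchanged lines on both sides; return 0
 * otherwise.
 * Hypothetical helper for illustration; not a function from Git.
 */
int hunks_are_valid(const struct hunk *hunks, size_t nr,
		    long old_lines, long new_lines)
{
	long old_next = 1, new_next = 1;	/* first line not yet covered */
	long old_changed = 0, new_changed = 0;
	size_t i;

	assert(hunks || nr == 0);
	for (i = 0; i < nr; i++) {
		const struct hunk *h = &hunks[i];

		if (h->old_count < 0 || h->new_count < 0)
			return 0;
		/* In order and non-overlapping on both sides. */
		if (h->old_start < old_next || h->new_start < new_next)
			return 0;
		/* Ranges must end within the file. */
		if (h->old_start - 1 + h->old_count > old_lines ||
		    h->new_start - 1 + h->new_count > new_lines)
			return 0;
		old_next = h->old_start + h->old_count;
		new_next = h->new_start + h->new_count;
		old_changed += h->old_count;
		new_changed += h->new_count;
	}
	/* Unchanged lines must match in number on both sides. */
	return old_lines - old_changed == new_lines - new_changed;
}
```

With the worked example from the text, a 10-line old file whose hunks cover 4 lines leaves 6 unchanged, so an 8-line new file is consistent only if the `new_count` values sum to 2.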
+
+-----------------------
+packet: git< hunk 1 3 1 5
+packet: git< hunk 10 2 12 2
+packet: git< 0000
+packet: git< status=success
+packet: git< 0000
+-----------------------
+
+If the tool responds with hunks and "success", Git marks those lines
+as changed and feeds them into the standard diff pipeline. Patch
+output features (word diff, function context, color) work normally.
+Note that `--stat` and other summary formats use their own diff path
+and are not affected by the diff process.
+
+If no hunk lines precede the flush, followed by "success", Git
+treats the files as having no changes: `git diff` produces no output
+and `git blame` skips the commit, attributing lines to earlier commits.
+
+-----------------------
+packet: git< 0000
+packet: git< status=success
+packet: git< 0000
+-----------------------
+
+If the tool returns invalid hunks (out of bounds, overlapping), Git
+silently falls back to the builtin diff algorithm.
+
+In case the tool cannot or does not want to process the content,
+it is expected to respond with an "error" status. Git warns and
+falls back to the builtin diff algorithm for this file. The tool
+remains available for subsequent files.
+
+-----------------------
+packet: git< 0000
+packet: git< status=error
+packet: git< 0000
+-----------------------
+
+In case the tool cannot or does not want to process the content as
+well as any future content for the lifetime of the Git process, it
+is expected to respond with an "abort" status. Git silently falls
+back to the builtin diff algorithm for this file and does not send
+further requests to the tool.
+
+-----------------------
+packet: git< 0000
+packet: git< status=abort
+packet: git< 0000
+-----------------------
+
+If the tool dies during the communication or does not adhere to the
+protocol then Git will stop the process and fall back to the builtin
+diff algorithm. Git warns once and does not restart the process for
+subsequent files.
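
Editorial sketch (not part of the patch): the response handling described above amounts to a small decision table keyed on the status packet and on whether any hunk lines arrived. The enum and function names below are hypothetical and do not mirror identifiers in Git. The sketch assumes the status payload has already had its trailing newline stripped and that hunk validation happens separately.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* What the caller should do after reading the status packet. */
enum response_action {
	USE_HUNKS,		/* "success" with hunks: feed them to the pipeline */
	TREAT_AS_UNCHANGED,	/* "success" without hunks: no changes */
	FALLBACK_WARN,		/* "error": warn, use builtin diff, keep the tool */
	FALLBACK_DISABLE,	/* "abort": no warning, builtin diff, stop asking */
	PROTOCOL_FAILURE	/* anything else: stop the process, warn once */
};

/*
 * Map a status packet and the number of hunk lines that preceded it to
 * the behaviour the documentation describes.
 * Hypothetical helper for illustration; not a function from Git.
 */
enum response_action classify_response(const char *status, size_t nr_hunks)
{
	assert(status);
	if (!strcmp(status, "status=success"))
		return nr_hunks ? USE_HUNKS : TREAT_AS_UNCHANGED;
	if (!strcmp(status, "status=error"))
		return FALLBACK_WARN;
	if (!strcmp(status, "status=abort"))
		return FALLBACK_DISABLE;
	return PROTOCOL_FAILURE;
}
```

A caller would still need to fall back silently when a "success" response carries hunks that fail validation, as the text notes.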
+
+Tools should ignore unknown keys in the per-file request to remain
+forward-compatible. Future versions of Git may send additional
+`command=` values; tools that receive an unrecognized command should
+respond with `status=error` rather than terminating.
+
 Defining a custom hunk-header
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 

Makefile

Lines changed: 1 addition & 0 deletions
@@ -1141,6 +1141,7 @@ LIB_OBJS += diff-delta.o
 LIB_OBJS += diff-merges.o
 LIB_OBJS += diff-lib.o
 LIB_OBJS += diff-no-index.o
+LIB_OBJS += diff-process.o
 LIB_OBJS += diff.o
 LIB_OBJS += diffcore-break.o
 LIB_OBJS += diffcore-delta.o

blame.c

Lines changed: 31 additions & 9 deletions
@@ -19,6 +19,8 @@
 #include "tag.h"
 #include "trace2.h"
 #include "blame.h"
+#include "diff-process.h"
+#include "xdiff-interface.h"
 #include "alloc.h"
 #include "commit-slab.h"
 #include "bloom.h"
@@ -314,17 +316,25 @@ static struct commit *fake_working_tree_commit(struct repository *r,
 
 
 
-static int diff_hunks(mmfile_t *file_a, mmfile_t *file_b,
-		      xdl_emit_hunk_consume_func_t hunk_func, void *cb_data, int xdl_opts)
+static int diff_hunks_xpp(mmfile_t *file_a, mmfile_t *file_b,
+			  xdl_emit_hunk_consume_func_t hunk_func,
+			  void *cb_data, xpparam_t *xpp)
 {
-	xpparam_t xpp = {0};
 	xdemitconf_t xecfg = {0};
 	xdemitcb_t ecb = {NULL};
 
-	xpp.flags = xdl_opts;
 	xecfg.hunk_func = hunk_func;
 	ecb.priv = cb_data;
-	return xdi_diff(file_a, file_b, &xpp, &xecfg, &ecb);
+	return xdi_diff(file_a, file_b, xpp, &xecfg, &ecb);
+}
+
+static int diff_hunks(mmfile_t *file_a, mmfile_t *file_b,
+		      xdl_emit_hunk_consume_func_t hunk_func, void *cb_data, int xdl_opts)
+{
+	xpparam_t xpp = {0};
+
+	xpp.flags = xdl_opts;
+	return diff_hunks_xpp(file_a, file_b, hunk_func, cb_data, &xpp);
 }
 
 static const char *get_next_line(const char *start, const char *end)
@@ -1943,6 +1953,7 @@ static void pass_blame_to_parent(struct blame_scoreboard *sb,
 				 struct blame_origin *parent, int ignore_diffs)
 {
 	mmfile_t file_p, file_o;
+	xpparam_t xpp = {0};
 	struct blame_chunk_cb_data d;
 	struct blame_entry *newdest = NULL;
 
@@ -1961,10 +1972,21 @@ static void pass_blame_to_parent(struct blame_scoreboard *sb,
 			 &sb->num_read_blob, ignore_diffs);
 	sb->num_get_patch++;
 
-	if (diff_hunks(&file_p, &file_o, blame_chunk_cb, &d, sb->xdl_opts))
-		die("unable to generate diff (%s -> %s)",
-		    oid_to_hex(&parent->commit->object.oid),
-		    oid_to_hex(&target->commit->object.oid));
+	xpp.flags = sb->xdl_opts;
+	/*
+	 * If the diff process considers the files equivalent,
+	 * skip the diff so blame looks past this commit.
+	 */
+	if (diff_process_fill_hunks(&sb->revs->diffopt, target->path,
+				    &file_p, &file_o, &xpp)
+	    != DIFF_PROCESS_EQUIVALENT) {
+		if (diff_hunks_xpp(&file_p, &file_o, blame_chunk_cb,
+				   &d, &xpp))
+			die("unable to generate diff (%s -> %s)",
+			    oid_to_hex(&parent->commit->object.oid),
+			    oid_to_hex(&target->commit->object.oid));
+	}
+	free(xpp.external_hunks);
 	/* The rest are the same as the parent */
 	blame_chunk(&d.dstq, &d.srcq, INT_MAX, d.offset, INT_MAX, 0,
 		    parent, target, 0);

builtin/log.c

Lines changed: 7 additions & 0 deletions
@@ -2217,6 +2217,13 @@ int cmd_format_patch(int argc,
 	if (argc > 1)
 		die(_("unrecognized argument: %s"), argv[1]);
 
+	/*
+	 * Disable diff.<driver>.process so that patches generated by
+	 * format-patch are always based on the builtin diff algorithm
+	 * and can be applied reliably.
+	 */
+	rev.diffopt.flags.no_diff_process = 1;
+
 	if (rev.diffopt.output_format & DIFF_FORMAT_NAME)
 		die(_("--name-only does not make sense"));
 	if (rev.diffopt.output_format & DIFF_FORMAT_NAME_STATUS)
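
Editorial sketch (not part of the patch): taken together, the documentation and code changes in this commit name several situations in which the diff process is not consulted: `--no-ext-diff`, `git format-patch`, an explicitly chosen diff algorithm, and binary content. The predicate below only summarizes those stated conditions. The struct, its field names, and the function are hypothetical; Git's real logic is spread across its diff option flags and may differ in detail.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of the relevant inputs; not Git's data structures. */
struct diff_request {
	bool has_process_config;	/* diff.<driver>.process is set for the path */
	bool no_ext_diff;		/* --no-ext-diff was given */
	bool is_format_patch;		/* running as git format-patch */
	bool explicit_algorithm;	/* a diff algorithm was chosen explicitly */
	bool is_binary;			/* content is binary */
};

/*
 * Return true only when none of the documented bypass conditions apply.
 * Hypothetical helper for illustration; not a function from Git.
 */
bool would_consult_diff_process(const struct diff_request *req)
{
	assert(req);
	return req->has_process_config &&
	       !req->no_ext_diff &&
	       !req->is_format_patch &&
	       !req->explicit_algorithm &&
	       !req->is_binary;
}
```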
