Skip to content

Commit 24bdb43

Browse files
committed
repack: demote .baddeltas packs into geometric rollup
A pack-aggregate run rolls many small packs up into one aggregator output and writes a sibling `.baddeltas` marker, since the new pack reuses raw delta-chains from its inputs without any cross-input delta search. That marker tells future `pack-objects` runs over the pack to treat its existing deltas as stale and recompute new ones. The aggregator deliberately does not install a `.keep` marker on its output: we want the next opportunity to re-delta its contents to roll up the pack like any other. But a `git repack --geometric=N` pass walks the packs sorted ascending by object count and only rolls up packs below the geometric split; anything already at the heavy end of the progression sits above the split and is kept as-is. Aggregated packs could potentially be large enough to land there, which may leave a `.baddeltas` pack above the split for many subsequent geometric passes resulting in the pack not getting repacked and keeping its stale deltas. Teach `pack_geometry_split()` to do a post-pass demotion of any `.baddeltas` packs from the kept region into the rollup region. The demotion preserves the ascending sort order of the kept region so that `pack_geometry_preferred_pack()` continues to return the largest local kept pack. Demotion is unconditional: a `.baddeltas` marker is an explicit "these deltas need redoing" tag from a previous repack step, and the next geometric pass is the cheapest place to honor it. `.keep`-marked packs are still excluded from the rollup by the existing `pack_kept_objects` / `remove_redundant_packs` paths, so a user who really wants to pin a `.baddeltas` pack in place has that escape hatch. Add two tests to t5336: one that confirms a `.baddeltas` pack above the natural split is rolled up and its marker disappears from the resulting pack, and a control test that confirms a non-`.baddeltas` pack above the split is still preserved. Assisted-by: Claude Opus 4.7 Signed-off-by: Elijah Newren <newren@gmail.com>
1 parent 4a2a398 commit 24bdb43

2 files changed

Lines changed: 129 additions & 1 deletion

File tree

repack-geometry.c

Lines changed: 53 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -187,13 +187,66 @@ static uint32_t compute_pack_geometry_split(struct packed_git **pack, size_t pac
187187
return split;
188188
}
189189

190+
/*
191+
* Move any packs marked with a `.baddeltas` sidecar from the kept
192+
* region of `pack[]` into the rollup region. A `.baddeltas` marker
193+
* declares that the pack's existing deltas are stale (e.g. because
194+
* the pack was assembled by `git pack-aggregate`, which concatenates
195+
* input packs without performing a fresh delta search). We do not
196+
* want such packs to sit on the geometric ladder indefinitely:
197+
* rolling them up forces the next `pack-objects` invocation to
198+
* recompute deltas across their objects, while leaving any
199+
* `.keep`-protected pack alone (those are filtered out by
200+
* `pack_geometry_init()`).
201+
*
202+
* Demotion preserves the ascending sort order of the kept region so
203+
* that `pack_geometry_preferred_pack()` continues to return the
204+
* largest local kept pack.
205+
*/
206+
static void demote_bad_delta_packs(struct pack_geometry *geometry)
207+
{
208+
struct packed_git **demoted, **retained;
209+
uint32_t d_nr = 0, r_nr = 0, kept_nr;
210+
211+
if (geometry->split == geometry->pack_nr)
212+
return;
213+
214+
kept_nr = geometry->pack_nr - geometry->split;
215+
for (uint32_t k = geometry->split; k < geometry->pack_nr; k++) {
216+
if (geometry->pack[k]->has_bad_deltas)
217+
d_nr++;
218+
}
219+
if (!d_nr)
220+
return;
221+
222+
ALLOC_ARRAY(demoted, d_nr);
223+
ALLOC_ARRAY(retained, kept_nr - d_nr);
224+
d_nr = 0;
225+
for (uint32_t k = geometry->split; k < geometry->pack_nr; k++) {
226+
if (geometry->pack[k]->has_bad_deltas)
227+
demoted[d_nr++] = geometry->pack[k];
228+
else
229+
retained[r_nr++] = geometry->pack[k];
230+
}
231+
232+
memcpy(&geometry->pack[geometry->split], demoted,
233+
d_nr * sizeof(*geometry->pack));
234+
memcpy(&geometry->pack[geometry->split + d_nr], retained,
235+
r_nr * sizeof(*geometry->pack));
236+
geometry->split += d_nr;
237+
238+
free(demoted);
239+
free(retained);
240+
}
241+
190242
void pack_geometry_split(struct pack_geometry *geometry)
191243
{
192244
geometry->split = compute_pack_geometry_split(geometry->pack, geometry->pack_nr,
193245
geometry->split_factor);
194246
geometry->promisor_split = compute_pack_geometry_split(geometry->promisor_pack,
195247
geometry->promisor_pack_nr,
196248
geometry->split_factor);
249+
demote_bad_delta_packs(geometry);
197250
for (uint32_t i = 0; i < geometry->split; i++) {
198251
struct packed_git *p = geometry->pack[i];
199252
/*

t/t5336-pack-aggregate.sh

Lines changed: 76 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -406,12 +406,17 @@ test_expect_success 'startup cleans up stale .keep markers from dead pids' '
406406
)
407407
'
408408

409-
test_expect_success 'startup leaves live-pid .keep markers alone' '
409+
test_expect_success !MINGW 'startup leaves live-pid .keep markers alone' '
410410
test_when_finished "rm -fr work" &&
411411
cp -R repo work &&
412412
# Use the test-runner shell pid, which is reliably alive
413413
# for the duration of this test. PID 1 (init) would also
414414
# work since our code treats EPERM as "alive".
415+
#
416+
# !MINGW: on Windows, bash $$ is a virtualized MSYS pid that
417+
# does not correspond to a process kill(pid, 0) can see, so
418+
# our liveness check would treat it as dead and the marker
419+
# would be removed. Skip this test there.
415420
live_pid=$$ &&
416421
(
417422
cd work &&
@@ -430,4 +435,74 @@ test_expect_success 'startup leaves live-pid .keep markers alone' '
430435
)
431436
'
432437

438+
# ---- geometric .baddeltas demotion ----
439+
440+
# Helper for the geometric tests: pack a single unique blob into its
441+
# own packfile. The `tag` parameter must differ between callers so
442+
# that the blob contents do not collide with one another (otherwise
443+
# `git pack-objects --stdin-packs` would see the "rollup" objects
444+
# already present in the kept pack and emit nothing).
445+
make_unique_pack () {
446+
tag=$1 &&
447+
blob=$(echo "unique-${tag}-$$" | git hash-object -w --stdin) &&
448+
echo "$blob" |
449+
git pack-objects --window=0 .git/objects/pack/pack >/dev/null &&
450+
git prune-packed
451+
}
452+
453+
test_expect_success 'geometric repack demotes .baddeltas packs into rollup' '
454+
test_when_finished "rm -fr work" &&
455+
cp -R repo work &&
456+
(
457+
cd work &&
458+
# Build several small packs and consolidate them into
459+
# one larger pack that would normally sit above the
460+
# geometric split.
461+
build_n_packs 8 >/dev/null &&
462+
git repack -d --geometric=2 &&
463+
test 1 -eq "$(count_packs)" &&
464+
big=$(ls .git/objects/pack/pack-*.pack) &&
465+
# Drop a .baddeltas marker on it (simulating an
466+
# aggregator output).
467+
>"${big%.pack}.baddeltas" &&
468+
test 1 -eq "$(count_baddeltas)" &&
469+
# Add small packs that are individually far smaller
470+
# than the big one; ordinarily the big pack would be
471+
# kept above the geometric split.
472+
make_unique_pack a &&
473+
make_unique_pack b &&
474+
make_unique_pack c &&
475+
test 4 -eq "$(count_packs)" &&
476+
# With the .baddeltas demotion, the big pack is rolled
477+
# up despite being above the natural split, and the
478+
# resulting pack carries no .baddeltas marker.
479+
git repack -d --geometric=2 &&
480+
test 1 -eq "$(count_packs)" &&
481+
test 0 -eq "$(count_baddeltas)" &&
482+
git fsck
483+
)
484+
'
485+
486+
test_expect_success 'geometric repack leaves non-baddeltas packs above the split alone' '
487+
test_when_finished "rm -fr work" &&
488+
cp -R repo work &&
489+
(
490+
cd work &&
491+
build_n_packs 8 >/dev/null &&
492+
git repack -d --geometric=2 &&
493+
test 1 -eq "$(count_packs)" &&
494+
# Add small packs; without any .baddeltas marker the
495+
# big pack should be preserved above the split.
496+
make_unique_pack a &&
497+
make_unique_pack b &&
498+
make_unique_pack c &&
499+
test 4 -eq "$(count_packs)" &&
500+
git repack -d --geometric=2 &&
501+
# 1 kept big pack + 1 newly-aggregated rollup = 2.
502+
test 2 -eq "$(count_packs)" &&
503+
test 0 -eq "$(count_baddeltas)" &&
504+
git fsck
505+
)
506+
'
507+
433508
test_done

0 commit comments

Comments
 (0)