4 changes: 4 additions & 0 deletions Documentation/git-backfill.adoc
@@ -80,6 +80,10 @@ OPTIONS
+
You may also use commit-limiting options understood by
linkgit:git-rev-list[1] such as `--first-parent`, `--since`, or pathspecs.
+
Most `--filter=<spec>` options are incompatible with the purpose of
`git backfill`, but the `sparse:<oid>` filter is integrated to provide a
focused set of paths to download, distinct from the `--sparse` option.

SEE ALSO
--------
8 changes: 5 additions & 3 deletions Documentation/git-pack-objects.adoc
@@ -402,9 +402,11 @@ will be automatically changed to version `1`.
of filenames that cause collisions in Git's default name-hash
algorithm.
+
Incompatible with `--delta-islands`, `--shallow`, or `--filter`. The
`--use-bitmap-index` option will be ignored in the presence of
`--path-walk.`
Incompatible with `--delta-islands`. The `--use-bitmap-index` option is
ignored in the presence of `--path-walk`. The `--path-walk` option
supports the `--filter=<spec>` forms `blob:none`, `blob:limit=<n>`,
`tree:0`, `object:type=<type>`, and `sparse:<oid>`. These supported filter
types can be combined with the `combine:<spec>+<spec>` form.


DELTA ISLANDS
8 changes: 5 additions & 3 deletions builtin/backfill.c
@@ -96,9 +96,10 @@ static void reject_unsupported_rev_list_options(struct rev_info *revs)
if (revs->explicit_diff_merges)
die(_("'%s' cannot be used with 'git backfill'"),
"--diff-merges");
if (revs->filter.choice)
die(_("'%s' cannot be used with 'git backfill'"),
"--filter");
if (!path_walk_filter_compatible(&revs->filter))
die(_("cannot backfill with these filter options"));
if (revs->filter.blob_limit_value)
die(_("cannot backfill with blob size limits"));
}

static int do_backfill(struct backfill_context *ctx)
@@ -108,6 +109,7 @@ static int do_backfill(struct backfill_context *ctx)

if (ctx->sparse) {
CALLOC_ARRAY(info.pl, 1);
info.pl_sparse_trees = 1;
if (get_sparse_checkout_patterns(info.pl)) {
path_walk_info_clear(&info);
return error(_("problem loading sparse-checkout"));
9 changes: 4 additions & 5 deletions builtin/pack-objects.c
@@ -4777,6 +4777,8 @@ static void get_object_list_path_walk(struct rev_info *revs)
result = walk_objects_by_path(&info);
trace2_region_leave("pack-objects", "path-walk", revs->repo);

path_walk_info_clear(&info);

if (result)
die(_("failed to pack objects via path-walk"));
}
@@ -5177,7 +5179,7 @@ int cmd_pack_objects(int argc,

if (path_walk) {
const char *option = NULL;
if (filter_options.choice)
if (!path_walk_filter_compatible(&filter_options))
option = "--filter";
else if (use_delta_islands)
option = "--delta-islands";
@@ -5190,10 +5192,7 @@ int cmd_pack_objects(int argc,
}

Junio C Hamano wrote on the Git mailing list (how to reply to this email):

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <stolee@gmail.com>
>
> When 'git pack-objects' has the --path-walk option enabled, it uses a
> different set of revision walk parameters than normal. For once,

"once" -> "one" (or "instance")?

> --objects was previously assumed by the path-walk API and was not needed
> to be added. We also needed --boundary to allow discovering
> UNINTERESTING objects to use as delta bases.
>
> We will be updating the path-walk API soon to work with some filter
> options. However, the revision machinery will trigger a fatal error:
>
>   fatal: object filtering requires --objects
>
> The fix is easy: add the --objects option as an argument. This has no
> effect on the path-walk API but does simplify the revision option
> parsing for the objects filter.
>
> We can remove the comment about "removing" the options because they were
> never removed and instead not added. We still need to disable using
> bitmaps.

In the old code, there was a valid reason why bitmaps were not used
(i.e., "--objects" not enabled), but that no longer holds (i.e., now
we add "--objects" ourselves).  Do we need to give an updated
rationale to keep bitmap disabled?

> Signed-off-by: Derrick Stolee <stolee@gmail.com>
> ---
>  builtin/pack-objects.c | 5 +----
>  1 file changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index dd2480a73d..4338962904 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -5190,10 +5190,7 @@ int cmd_pack_objects(int argc,
>  	}
>  	if (path_walk) {
>  		strvec_push(&rp, "--boundary");
> -		 /*
> -		  * We must disable the bitmaps because we are removing
> -		  * the --objects / --objects-edge[-aggressive] options.
> -		  */
> +		strvec_push(&rp, "--objects");
>  		use_bitmap_index = 0;
>  	} else if (thin) {
>  		use_internal_rev_list = 1;


Derrick Stolee wrote on the Git mailing list (how to reply to this email):

On 5/3/2026 8:49 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> From: Derrick Stolee <stolee@gmail.com>
>>
>> When 'git pack-objects' has the --path-walk option enabled, it uses a
>> different set of revision walk parameters than normal. For once,
> 
> "once" -> "one" (or "instance")?

Yes, "one". Sorry for the typo.

>> --objects was previously assumed by the path-walk API and was not needed
>> to be added. We also needed --boundary to allow discovering
>> UNINTERESTING objects to use as delta bases.
>>
>> We will be updating the path-walk API soon to work with some filter
>> options. However, the revision machinery will trigger a fatal error:
>>
>>   fatal: object filtering requires --objects
>>
>> The fix is easy: add the --objects option as an argument. This has no
>> effect on the path-walk API but does simplify the revision option
>> parsing for the objects filter.
>>
>> We can remove the comment about "removing" the options because they were
>> never removed and instead not added. We still need to disable using
>> bitmaps.
> 
> In the old code, there was a valid reason why bitmaps were not used
> (i.e., "--objects" not enabled), but that no longer holds (i.e., now
> we add "--objects" ourselves).  Do we need to give an updated
> rationale to keep bitmap disabled?

>>  	if (path_walk) {
>>  		strvec_push(&rp, "--boundary");
>> -		 /*
>> -		  * We must disable the bitmaps because we are removing
>> -		  * the --objects / --objects-edge[-aggressive] options.
>> -		  */
>> +		strvec_push(&rp, "--objects");
>>  		use_bitmap_index = 0;
>>  	} else if (thin) {

This old comment is perhaps confusing things. The important thing here
is to disable bitmaps with 'use_bitmap_index = 0;' (though perhaps not
for long [1]).

[1] https://lore.kernel.org/git/f50f8df01a9f216d5b4388b2fe4ff58077b574f3.1777853408.git.me@ttaylorr.com/

The path-walk API itself disables the objects walk for the revision
machinery in walk_objects_by_path():

	info->revs->blob_objects = info->revs->tree_objects = 0;

This allows the path-walk API to rely on the revision walk for a
_commits only_ walk and then have the path-walk API handle the trees
and blobs.

The reason we need to add "--objects" now is to allow for parsing the
"--filter" option without the revision logic complaining.

Thanks,
-Stolee
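
The interaction described in this reply can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: `struct toy_revs`, `toy_setup_revisions()`, and `toy_path_walk_prepare()` are hypothetical names invented for illustration and are not part of Git's revision API. The model shows why `--objects` has to be passed for `--filter` to parse, and why that does not change what the path-walk does afterwards.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the revision walk settings. */
struct toy_revs {
	bool boundary;
	bool tree_objects;
	bool blob_objects;
	bool has_filter;
};

/*
 * Parse a tiny subset of rev-list style arguments. Returns 0 on success
 * and -1 when a filter is given without --objects, loosely mimicking the
 * "object filtering requires --objects" failure quoted in the thread.
 */
int toy_setup_revisions(struct toy_revs *revs, const char **argv, int argc)
{
	memset(revs, 0, sizeof(*revs));
	for (int i = 0; i < argc; i++) {
		if (!strcmp(argv[i], "--objects"))
			revs->tree_objects = revs->blob_objects = true;
		else if (!strcmp(argv[i], "--boundary"))
			revs->boundary = true;
		else if (!strncmp(argv[i], "--filter=", 9))
			revs->has_filter = true;
	}
	if (revs->has_filter && !revs->tree_objects)
		return -1;
	return 0;
}

/*
 * Model of the path-walk side: it switches the revision walk back to
 * commits only and handles trees and blobs itself.
 */
void toy_path_walk_prepare(struct toy_revs *revs)
{
	revs->blob_objects = revs->tree_objects = false;
}
```

In this model, parsing `--boundary --filter=blob:none` fails, while adding `--objects` makes it succeed. Calling `toy_path_walk_prepare()` afterwards clears the tree and blob flags again, which is the sense in which the extra argument has no effect on the walk itself.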

if (path_walk) {
strvec_push(&rp, "--boundary");
/*
* We must disable the bitmaps because we are removing
* the --objects / --objects-edge[-aggressive] options.
*/
strvec_push(&rp, "--objects");
use_bitmap_index = 0;
} else if (thin) {
use_internal_rev_list = 1;
201 changes: 189 additions & 12 deletions path-walk.c
@@ -9,6 +9,9 @@
#include "hashmap.h"
#include "hex.h"
#include "list-objects.h"
#include "list-objects-filter-options.h"
#include "object-name.h"
#include "odb.h"
#include "object.h"
#include "oid-array.h"
#include "path.h"
@@ -178,11 +181,6 @@ static int add_tree_entries(struct path_walk_context *ctx,
return -1;
}

/* Skip this object if already seen. */
if (o->flags & SEEN)
continue;
o->flags |= SEEN;

strbuf_setlen(&path, base_len);
strbuf_add(&path, entry.path, entry.pathlen);

@@ -193,6 +191,40 @@ static int add_tree_entries(struct path_walk_context *ctx,
if (type == OBJ_TREE)
strbuf_addch(&path, '/');

if (o->flags & SEEN) {
/*
* A tree with a shared OID may appear at multiple
* paths. Even though we already added this tree to
* the output at some other path, we still need to
* walk into it at this in-cone path to discover
* blobs that were not found at the earlier
* out-of-cone path.
*
* Only do this for paths not yet in our map, to
* avoid duplicate entries when the same tree OID
* appears at the same path across multiple commits.
*/
if (type == OBJ_TREE && ctx->info->pl &&
ctx->info->pl->use_cone_patterns &&
!ctx->info->pl_sparse_trees &&
!strmap_contains(&ctx->paths_to_lists, path.buf)) {
int dtype;
enum pattern_match_result m;
m = path_matches_pattern_list(path.buf, path.len,
path.buf + base_len,
&dtype,
ctx->info->pl,
ctx->repo->index);
if (m != NOT_MATCHED) {
add_path_to_list(ctx, path.buf, type,
&entry.oid,
!(o->flags & UNINTERESTING));
push_to_stack(ctx, path.buf);
}
}
continue;
}

if (ctx->info->pl) {
int dtype;
enum pattern_match_result match;
@@ -202,7 +234,8 @@ static int add_tree_entries(struct path_walk_context *ctx,
ctx->repo->index);

if (ctx->info->pl->use_cone_patterns &&
match == NOT_MATCHED)
match == NOT_MATCHED &&
(type == OBJ_BLOB || ctx->info->pl_sparse_trees))
continue;
else if (!ctx->info->pl->use_cone_patterns &&
type == OBJ_BLOB &&
@@ -237,6 +270,7 @@ static int add_tree_entries(struct path_walk_context *ctx,
continue;
}

o->flags |= SEEN;
add_path_to_list(ctx, path.buf, type, &entry.oid,
!(o->flags & UNINTERESTING));

@@ -314,9 +348,29 @@ static int walk_path(struct path_walk_context *ctx,
/* Evaluate function pointer on this data, if requested. */
if ((list->type == OBJ_TREE && ctx->info->trees) ||
(list->type == OBJ_BLOB && ctx->info->blobs) ||
(list->type == OBJ_TAG && ctx->info->tags))
ret = ctx->info->path_fn(path, &list->oids, list->type,
ctx->info->path_fn_data);
(list->type == OBJ_TAG && ctx->info->tags)) {
struct oid_array *oids = &list->oids;
struct oid_array filtered = OID_ARRAY_INIT;

if (list->type == OBJ_BLOB && ctx->info->blob_limit) {
for (size_t i = 0; i < list->oids.nr; i++) {
unsigned long size;

if (odb_read_object_info(ctx->repo->objects,
&list->oids.oid[i],
&size) != OBJ_BLOB ||
size < ctx->info->blob_limit)
oid_array_append(&filtered,
&list->oids.oid[i]);
}
oids = &filtered;
}

if (oids->nr)
ret = ctx->info->path_fn(path, oids, list->type,
ctx->info->path_fn_data);
oid_array_clear(&filtered);
}

/* Expand data for children. */
if (list->type == OBJ_TREE) {
@@ -376,7 +430,7 @@ static int setup_pending_objects(struct path_walk_info *info,
CALLOC_ARRAY(tags, 1);
if (info->blobs)
CALLOC_ARRAY(tagged_blobs, 1);
if (info->trees)
if (info->trees || info->blobs)
root_tree_list = strmap_get(&ctx->paths_to_lists, root_path);

/*
@@ -421,7 +475,7 @@ static int setup_pending_objects(struct path_walk_info *info,

switch (obj->type) {
case OBJ_TREE:
if (!info->trees)
if (!info->trees && !info->blobs)
continue;
if (pending->path) {
char *path = *pending->path ? xstrfmt("%s/", pending->path)
@@ -485,6 +539,119 @@ static int setup_pending_objects(struct path_walk_info *info,
return 0;
}

static int prepare_filters_one(struct path_walk_info *info,
			       struct list_objects_filter_options *options)
{
	switch (options->choice) {
	case LOFC_DISABLED:
		return 1;

	case LOFC_BLOB_NONE:
		if (info)
			info->blobs = 0;
		return 1;

	case LOFC_BLOB_LIMIT:
		if (info) {
			if (!options->blob_limit_value) {
				info->blobs = 0;
			} else if (!info->blob_limit ||
				   options->blob_limit_value < info->blob_limit) {
				info->blob_limit = options->blob_limit_value;
			}
		}
		return 1;

	case LOFC_TREE_DEPTH:
		if (options->tree_exclude_depth) {
			error(_("tree:%lu filter not supported by the path-walk API"),
			      options->tree_exclude_depth);
			return 0;
		}
		if (info) {
			info->trees = 0;
			info->blobs = 0;
		}
		return 1;

	case LOFC_OBJECT_TYPE:
		if (info) {
			info->commits &= options->object_type == OBJ_COMMIT;
			info->tags &= options->object_type == OBJ_TAG;
			info->trees &= options->object_type == OBJ_TREE;
			info->blobs &= options->object_type == OBJ_BLOB;
		}
		return 1;

	case LOFC_SPARSE_OID:
		if (info) {
			struct object_id sparse_oid;
			struct repository *repo = info->revs->repo;

			if (info->pl) {
				warning(_("sparse filter cannot be combined with existing sparse patterns"));
				return 0;
			}

			if (repo_get_oid_with_flags(repo,
						    options->sparse_oid_name,
						    &sparse_oid,
						    GET_OID_BLOB)) {
				error(_("unable to access sparse blob in '%s'"),
				      options->sparse_oid_name);
				return 0;
			}

			CALLOC_ARRAY(info->pl, 1);
			info->pl->use_cone_patterns = 1;

			if (add_patterns_from_blob_to_list(&sparse_oid, "", 0,
							   info->pl) < 0) {
				clear_pattern_list(info->pl);
				FREE_AND_NULL(info->pl);
				error(_("unable to parse sparse filter data in '%s'"),
				      oid_to_hex(&sparse_oid));
				return 0;
			}

			if (!info->pl->use_cone_patterns) {
				clear_pattern_list(info->pl);
				FREE_AND_NULL(info->pl);
				warning(_("sparse filter is not cone-mode compatible"));
				return 0;
			}
		}
		return 1;

	case LOFC_COMBINE:
		for (size_t i = 0; i < options->sub_nr; i++) {
			if (!prepare_filters_one(info, &options->sub[i]))
				return 0;
		}
		return 1;

	default:
		error(_("object filter '%s' not supported by the path-walk API"),
		      list_objects_filter_spec(options));
		return 0;
	}
}

static int prepare_filters(struct path_walk_info *info,
			   struct list_objects_filter_options *options)
{
	if (!prepare_filters_one(info, options))
		return 0;
	if (info)
		list_objects_filter_release(options);
	return 1;
}

int path_walk_filter_compatible(struct list_objects_filter_options *options)
{
	return prepare_filters(NULL, options);
}
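
The way `prepare_filters_one()` folds filters into the walk settings, and doubles as a pure compatibility check when `info` is NULL, can be seen in isolation with a reduced model. The following is a sketch only: `toy_filter`, `toy_walk`, and `toy_apply_filter()` are hypothetical stand-ins and cover just the blob and tree cases, not the sparse or object-type filters.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for Git's filter and walk structs. */
enum toy_filter_kind {
	TOY_BLOB_NONE,
	TOY_BLOB_LIMIT,
	TOY_TREE_DEPTH,
	TOY_COMBINE,
	TOY_UNSUPPORTED,
};

struct toy_filter {
	enum toy_filter_kind kind;
	unsigned long value;          /* blob limit or tree depth */
	const struct toy_filter *sub; /* children of a combine filter */
	size_t sub_nr;
};

struct toy_walk {
	bool trees;
	bool blobs;
	unsigned long blob_limit;     /* 0 means "no limit" */
};

/*
 * Fold one filter into 'walk'. With walk == NULL this only checks
 * compatibility. Returns true when the filter is supported.
 */
bool toy_apply_filter(struct toy_walk *walk, const struct toy_filter *f)
{
	switch (f->kind) {
	case TOY_BLOB_NONE:
		if (walk)
			walk->blobs = false;
		return true;
	case TOY_BLOB_LIMIT:
		if (walk) {
			if (!f->value)
				walk->blobs = false;
			else if (!walk->blob_limit ||
				 f->value < walk->blob_limit)
				walk->blob_limit = f->value;
		}
		return true;
	case TOY_TREE_DEPTH:
		if (f->value)
			return false; /* only depth 0 is handled */
		if (walk) {
			walk->trees = false;
			walk->blobs = false;
		}
		return true;
	case TOY_COMBINE:
		for (size_t i = 0; i < f->sub_nr; i++)
			if (!toy_apply_filter(walk, &f->sub[i]))
				return false;
		return true;
	default:
		return false;
	}
}
```

In this model, combining two blob limits keeps the smaller one, which mirrors the `LOFC_BLOB_LIMIT` branch above. Passing NULL for `walk` corresponds to how `path_walk_filter_compatible()` reuses the same traversal without modifying any state.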

/**
* Given the configuration of 'info', walk the commits based on 'info->revs' and
* call 'info->path_fn' on each discovered path.
@@ -512,6 +679,9 @@ int walk_objects_by_path(struct path_walk_info *info)

trace2_region_enter("path-walk", "commit-walk", info->revs->repo);

if (!prepare_filters(info, &info->revs->filter))
return -1;

CALLOC_ARRAY(commit_list, 1);
commit_list->type = OBJ_COMMIT;

@@ -534,9 +704,16 @@ int walk_objects_by_path(struct path_walk_info *info)
/*
* Set these values before preparing the walk to catch
* lightweight tags pointing to non-commits and indexed objects.
*
* Keep tree_objects set whenever blobs are wanted: blobs may
* be reachable through trees that show up as pending objects
* (e.g., via lightweight tags pointing to trees, or annotated
* tags whose peeled target is a tree). Without tree_objects,
* prepare_revision_walk() would discard those pending trees
* and we would never descend into them.
*/
info->revs->blob_objects = info->blobs;
info->revs->tree_objects = info->trees;
info->revs->tree_objects = info->trees || info->blobs;

if (prepare_revision_walk(info->revs))
die(_("failed to setup revision walk"));
21 changes: 21 additions & 0 deletions path-walk.h
@@ -42,6 +42,14 @@ struct path_walk_info {
int blobs;
int tags;

/**
* If non-zero, specifies a maximum blob size. Blobs with a
* size equal to or greater than this limit will be omitted
* from the walk. Blobs smaller than the limit (or blobs
* whose size cannot be determined) are still visited.
*/
unsigned long blob_limit;

/**
* When 'prune_all_uninteresting' is set and a path has all objects
* marked as UNINTERESTING, then the path-walk will not visit those
@@ -64,8 +72,14 @@ struct path_walk_info {
* of the cone. If not in cone mode, then all tree paths will be
* explored but the path_fn will only be called when the path matches
* the sparse-checkout patterns.
*
* When 'pl_sparse_trees' is zero, the sparse patterns only restrict
* blobs and all trees are included in the walk output. This matches
* the behavior of the sparse:oid object filter. When nonzero, trees
* are also pruned by the sparse patterns (as used by backfill).
*/
struct pattern_list *pl;
int pl_sparse_trees;
};
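
The difference that `pl_sparse_trees` makes in cone mode can be reduced to a single predicate. This is a minimal sketch, assuming a boolean `matched` that stands in for the result of the pattern match. `toy_type` and `toy_skip_entry()` are hypothetical names, not part of the path-walk API.

```c
#include <stdbool.h>

enum toy_type { TOY_TREE, TOY_BLOB };

/*
 * Decide whether a cone-mode walk skips an entry.
 * 'matched' is the result of testing the path against the sparse
 * patterns; 'sparse_trees' plays the role of the pl_sparse_trees field.
 */
bool toy_skip_entry(enum toy_type type, bool matched, bool sparse_trees)
{
	if (matched)
		return false;
	/* Unmatched blobs are always skipped; trees only when pruning trees. */
	return type == TOY_BLOB || sparse_trees;
}
```

With `sparse_trees` false, an unmatched tree is still walked, which corresponds to the `sparse:<oid>` filter behavior described in the comment. With it true, the tree is pruned as well, as `git backfill --sparse` requests.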

#define PATH_WALK_INFO_INIT { \
@@ -85,3 +99,10 @@ void path_walk_info_clear(struct path_walk_info *info);
* Returns nonzero on an error.
*/
int walk_objects_by_path(struct path_walk_info *info);

struct list_objects_filter_options;
/**
* Given a set of options for filtering objects, return 1 if the options
* are compatible with the path-walk API and 0 otherwise.
*/
int path_walk_filter_compatible(struct list_objects_filter_options *options);
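
The `blob_limit` semantics documented in this header (omit blobs at or above the limit, keep smaller blobs and blobs of unknown size) can be sketched as a standalone filter over an array. This is an illustration under assumptions: `struct toy_blob` and `toy_filter_blobs()` are hypothetical, and a negative size is used here to model "size cannot be determined", which is not how Git's object database reports it.

```c
#include <stddef.h>

/* Hypothetical blob record; size < 0 models "size could not be determined". */
struct toy_blob {
	const char *name;
	long size;
};

/*
 * Copy into 'out' the blobs that survive a size limit and return how many
 * were kept. A limit of 0 means "no limit". 'out' must have room for 'nr'
 * entries.
 */
size_t toy_filter_blobs(const struct toy_blob *in, size_t nr,
			unsigned long limit, struct toy_blob *out)
{
	size_t kept = 0;

	for (size_t i = 0; i < nr; i++) {
		int unknown = in[i].size < 0;

		/* Keep unknown-size blobs and blobs strictly below the limit. */
		if (!limit || unknown || (unsigned long)in[i].size < limit)
			out[kept++] = in[i];
	}
	return kept;
}
```

Note that the comparison is strict: in this sketch a blob whose size equals the limit is dropped, matching the "equal to or greater than" wording in the field's comment.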