Fetch git submodule during update of git subproject#1017
Conversation
Walkthrough

Adds tracking of VCS dependencies (git submodules) through the VCS backends, project/subproject layers, and metadata; updates git checkout to detect and initialize submodules; propagates dependency data into Metadata and report output; adjusts logging and the removal utilities to support the new flows.

Changes
Sequence Diagram

```mermaid
sequenceDiagram
    participant User as User
    participant GitSub as GitSubProject
    participant GitLocal as GitLocalRepo
    participant Meta as Metadata
    participant Reporter as StdoutReporter
    User->>GitSub: fetch(version)
    GitSub->>GitLocal: checkout_version(version)
    GitLocal->>GitLocal: git checkout & git submodule update --init --recursive
    GitLocal->>GitLocal: collect Submodule list
    GitLocal-->>GitSub: (sha, submodules[])
    GitSub->>GitSub: build VcsDependency[] from submodules
    GitSub-->>Meta: fetched(version, ..., dependencies=VcsDependency[].to_dependency())
    Meta->>Meta: store dependencies in _dependencies
    User->>Reporter: generate report(project)
    Reporter->>Meta: metadata.dependencies
    Reporter-->>User: formatted output including dependencies
```
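The data handoff in the diagram can be sketched in plain Python. The shapes of `Submodule` and `VcsDependency` below are assumptions for illustration only; they mirror the names in the diagram, not the actual dfetch classes:

```python
from dataclasses import dataclass


@dataclass
class Submodule:
    """A git submodule as reported by the local repo (assumed shape)."""
    name: str
    path: str
    url: str
    sha: str


@dataclass
class VcsDependency:
    """A VCS dependency recorded in metadata (assumed shape)."""
    name: str
    remote_url: str
    revision: str

    def to_dependency(self) -> dict:
        # Plain-dict form as it might be stored in metadata.
        return {"name": self.name, "url": self.remote_url, "revision": self.revision}


def fetched_dependencies(submodules: list[Submodule]) -> list[dict]:
    """Build the dependency list the metadata layer would store."""
    return [VcsDependency(s.name, s.url, s.sha).to_dependency() for s in submodules]


subs = [Submodule("zlib", "ext/zlib", "https://example.com/zlib.git", "abc123")]
print(fetched_dependencies(subs))
```

The key point of the flow is that the VCS layer returns plain submodule records and the subproject layer converts them before handing them to `Metadata.fetched`.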
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
Force-pushed: 126d91c to dda77ef, dda77ef to 5b52e24, 5b52e24 to 5148a28
Note: Unit test generation is a public access feature. Expect some limitations and changes as we gather feedback and continue to improve it. Generating unit tests... This may take up to 20 minutes.

✅ Created PR with unit tests: #1025
Force-pushed: 5148a28 to ad901d7, ad901d7 to 3badd46, 3badd46 to 70f0adc, 70f0adc to 78711d1, 78711d1 to b81f411, b81f411 to 07b6553
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
dfetch/vcs/git.py (1)
305-316: ⚠️ Potential issue | 🟠 Major

Apply the `src:` remap before persisting dependency paths.

Line 305 snapshots submodules before lines 313-314 rewrite the checkout. If `src: vendor/lib` contains `vendor/lib/ext/zlib`, the exported tree ends up with `ext/zlib` but the dependency metadata still stores `vendor/lib/ext/zlib`; submodules outside `src` are also reported even though they were removed.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `dfetch/vcs/git.py` around lines 305-316: the code captures submodules and the current SHA before applying the src remap, causing dependency paths to persist pre-remap locations. Move the src rewrite so it runs before snapshotting submodules and reading HEAD. Specifically, call self.move_src_folder_up(remote, src) (with the same parameters) before invoking self.submodules() and before the run_on_cmdline(... "git", "rev-parse", "HEAD") call, so the exported tree and recorded submodule paths reflect the remapped src layout.
🧹 Nitpick comments (1)
dfetch/util/util.py (1)
45-51: Broken symlinks won't be removed.

`os.path.exists()` returns `False` for broken symlinks (since it follows the link to check target existence). If the intent is to remove the path regardless of whether it's a valid symlink, consider using `os.path.lexists()` instead, which returns `True` for broken symlinks. If broken symlinks should be silently ignored, the current implementation is correct.

♻️ Optional fix if broken symlinks should be removed

```diff
 def safe_rm(path: str | Path) -> None:
     """Delete an file or directory safely."""
-    if os.path.exists(path):
+    if os.path.lexists(path):
         if os.path.isdir(path):
             safe_rmtree(str(path))
         else:
             os.remove(path)
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `dfetch/util/util.py` around lines 45-51: the safe_rm function uses os.path.exists, which returns False for broken symlinks; change the existence check to os.path.lexists(path) so broken symlinks are detected and removed. Keep the existing branch logic (os.path.isdir -> safe_rmtree, else -> os.remove) but gate it on os.path.lexists instead of os.path.exists so broken symlinks fall through to the os.remove path and are deleted.
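The `exists`/`lexists` difference the comment describes is easy to reproduce. A minimal, self-contained demonstration (POSIX-only, since it creates a symlink):

```python
import os
import tempfile

tmp = tempfile.mkdtemp()
target = os.path.join(tmp, "target")
link = os.path.join(tmp, "link")

open(target, "w").close()
os.symlink(target, link)
os.remove(target)  # the link now dangles

# exists() follows the link and reports the missing target
print(os.path.exists(link))   # False
print(os.path.lexists(link))  # True: the link entry itself is still there

# A removal helper gated on lexists() deletes the dangling link
if os.path.lexists(link):
    os.remove(link)
print(os.path.lexists(link))  # False
```

With an `os.path.exists` gate, the dangling link would have been skipped and left behind, which is exactly the behavior the nitpick flags.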
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@CHANGELOG.rst`:
- Line 5: The changelog entry "* Fetch git submodules in git subproject at
pinned revision (`#1015`)" references the wrong issue number; verify whether the
correct reference is `#1017` (per PR metadata) or `#1013` (per commit summary) and
update that parenthetical to the confirmed issue number so the CHANGELOG.rst
entry matches the PR and commit message.
In `@dfetch/util/util.py`:
- Line 46: Fix the minor typo in the docstring that currently reads "Delete an
file or directory safely." by changing it to "Delete a file or directory
safely."; update the docstring text in dfetch/util/util.py where that string
appears so the grammar is correct.
In `@dfetch/vcs/git.py`:
- Around line 299-305: In submodules() fix nested-submodule resolution: stop
using sm_path and the single cached urls map; instead parse each toplevel's
.gitmodules to build a fresh urls mapping per toplevel, use the $displaypath
value (not $sm_path) when emitting/looking up submodule entries and when
constructing full paths, and ensure url lookups use the corresponding
toplevel-specific map so names/URLs come from the same .gitmodules file.
In `@features/fetch-git-repo-with-submodule.feature`:
- Line 70: The expected "last fetch" value in
features/fetch-git-repo-with-submodule.feature is hardcoded to "26/02/2026,
20:28:24" and must be aligned with the suite's fixed clock; update the "last
fetch" expectation (the line containing the literal last fetch timestamp) to use
the shared fixed timestamp "02/07/2021, 20:25:56" (the same value asserted in
features/list-projects.feature) so the scenario uses the common clock fixture.
In `@features/steps/generic_steps.py`:
- Around line 333-335: The step function step_impl for the "{name} exists" step
currently delegates to check_file_exists which uses os.path.isfile() and only
accepts regular files; update the behavior to check for any filesystem entry by
either (a) changing check_file_exists to use os.path.exists() instead of
os.path.isfile(), or (b) create/rename a helper like check_path_exists that uses
os.path.exists() and call that from step_impl so the step truly asserts path
existence (reference: step_impl and check_file_exists).
---
Outside diff comments:
In `@dfetch/vcs/git.py`:
- Around line 305-316: The code captures submodules and current SHA before
applying the src remap, causing dependency paths to persist pre-remap locations;
move the src rewrite so it runs before snapshotting submodules and reading HEAD.
Specifically, call self.move_src_folder_up(remote, src) (with the same
parameters) before invoking self.submodules() and before the run_on_cmdline(...
"git", "rev-parse", "HEAD") call so the exported tree and recorded submodule
paths reflect the remapped src layout.
---
Nitpick comments:
In `@dfetch/util/util.py`:
- Around line 45-51: The safe_rm function uses os.path.exists which returns
False for broken symlinks; change the existence check to os.path.lexists(path)
so broken symlinks are detected and removed. In safe_rm (and when calling
safe_rmtree/remove), keep the existing branch logic (os.path.isdir ->
safe_rmtree, else -> os.remove) but gate it on os.path.lexists instead of
os.path.exists so broken symlinks fall through to the os.remove path and are
deleted.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: c047622b-f1a4-4b2e-9810-993a1556fbe1
📒 Files selected for processing (16)

- CHANGELOG.rst
- dfetch/log.py
- dfetch/project/gitsubproject.py
- dfetch/project/metadata.py
- dfetch/project/subproject.py
- dfetch/project/svnsubproject.py
- dfetch/reporting/stdout_reporter.py
- dfetch/util/util.py
- dfetch/vcs/git.py
- example/dfetch.yaml
- features/fetch-git-repo-with-submodule.feature
- features/list-projects.feature
- features/steps/generic_steps.py
- features/steps/git_steps.py
- tests/test_report.py
- tests/test_subproject.py
Force-pushed: 07b6553 to d996bda
Actionable comments posted: 3
🧹 Nitpick comments (1)
dfetch/project/subproject.py (1)

151-160: Confusing variable naming: `dependency` is a list but named singular, and shadows itself in the comprehension.

The variable `dependency` holds a `list[VcsDependency]` but uses a singular name. The list comprehension `for dependency in dependency` then shadows this variable, which works but is confusing and error-prone. Consider renaming to `vcs_deps` or `vcs_dependencies` for clarity, matching the naming pattern used in `GitSubProject._fetch_impl`.

♻️ Suggested fix for clarity

```diff
-        actually_fetched, dependency = self._fetch_impl(to_fetch)
+        actually_fetched, vcs_deps = self._fetch_impl(to_fetch)
         self._log_project(f"Fetched {actually_fetched}")
         applied_patches = self._apply_patches(patch_count)
         self.__metadata.fetched(
             actually_fetched,
             hash_=hash_directory(self.local_path, skiplist=[self.__metadata.FILENAME]),
             patch_=applied_patches,
-            dependencies=[dependency.to_dependency() for dependency in dependency],
+            dependencies=[dep.to_dependency() for dep in vcs_deps],
         )
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `dfetch/project/subproject.py` around lines 151-160: the variable named `dependency` is actually a list and is shadowed in the comprehension; rename it to a plural name (e.g., `vcs_dependencies` or `vcs_deps`) where it's returned from _fetch_impl and where it's consumed before calling __metadata.fetched; update the call site that currently reads `dependency` to `vcs_dependencies` and change the list comprehension to `[d.to_dependency() for d in vcs_dependencies]`, ensuring references in the surrounding block (including the unpack from self._fetch_impl(to_fetch) and the dependencies argument in __metadata.fetched) use the new plural name consistently (follow the naming pattern used in GitSubProject._fetch_impl).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@dfetch/project/gitsubproject.py`:
- Around line 72-76: The checkout currently preserves ".gitmodules" via the
must_keeps argument in local_repo.checkout_version and then removes only the
top-level ".git", leaving each submodule's .git entries orphaned; modify the
post-checkout cleanup (around the local_repo.checkout_version call and the later
cleanup block that deletes the superproject .git) to recursively find and remove
any nested ".git" files or directories under the checked-out working tree
(including submodule paths) rather than only deleting the top-level .git,
ensuring you still keep ".gitmodules" long enough to initialize submodules
before removing all nested .git metadata.
In `@dfetch/vcs/git.py`:
- Around line 300-306: The submodule enumeration misses nested submodules and
the relative_to call can raise for non-ancestor src; update the call that
gathers submodules to request recursive enumeration (e.g., make
self.submodules() accept/forward a recursive=True flag or call the git command
with --recursive so nested/transitive submodules returned) and change the path
handling in checkout_version()/where Path(submodule.path).relative_to(Path(src))
is used to avoid ValueError by normalizing src to a directory (if src points to
a file, use src.parent) or by computing a safe relative path via
os.path.relpath/Path(...).relative_to inside a try/except fallback—apply these
changes to the functions self.submodules() and checkout_version() so nested
submodules are included and non-ancestor src inputs do not crash.
- Around line 314-317: The code that sets submodule.path using
Path(submodule.path).relative_to(Path(src)) fails for non-directory or glob
`src` values; update the logic in the loop that mutates submodule.path (in
dfetch/vcs/git.py) to mirror move_src_folder_up() behavior: if src is not a
directory (os.path.isdir(src) is False) replace src with its parent/dirname
before creating the Path for relative_to, and fall back to
os.path.relpath(submodule.path, src) or skip relative calculation if relative_to
still raises ValueError so submodules don't crash the checkout.
---
Nitpick comments:
In `@dfetch/project/subproject.py`:
- Around line 151-160: The variable named `dependency` is actually a list and is
shadowed in the comprehension; rename it to a plural name (e.g.,
`vcs_dependencies` or `vcs_deps`) where it's returned from _fetch_impl and where
it's consumed before calling __metadata.fetched; update the call site that
currently reads `dependency` to `vcs_dependencies` and change the list
comprehension to `[d.to_dependency() for d in vcs_dependencies]`, ensuring
references in the surrounding block (including the unpack from
self._fetch_impl(to_fetch) and the dependencies argument in __metadata.fetched)
use the new plural name consistently (follow the naming pattern used in
GitSubProject._fetch_impl).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: faf1d394-d889-40b1-9ec4-222dacf51e84
📒 Files selected for processing (15)

- CHANGELOG.rst
- dfetch/log.py
- dfetch/project/gitsubproject.py
- dfetch/project/metadata.py
- dfetch/project/subproject.py
- dfetch/project/svnsubproject.py
- dfetch/reporting/stdout_reporter.py
- dfetch/util/util.py
- dfetch/vcs/git.py
- features/fetch-git-repo-with-submodule.feature
- features/list-projects.feature
- features/steps/generic_steps.py
- features/steps/git_steps.py
- tests/test_report.py
- tests/test_subproject.py
✅ Files skipped from review due to trivial changes (1)
- CHANGELOG.rst
🚧 Files skipped from review as they are similar to previous changes (5)
- features/fetch-git-repo-with-submodule.feature
- features/steps/git_steps.py
- features/steps/generic_steps.py
- dfetch/util/util.py
- tests/test_subproject.py
```diff
 fetched_sha, submodules = local_repo.checkout_version(
     remote=self.remote,
     version=rev_or_branch_or_tag,
     src=self.source,
-    must_keeps=license_globs,
+    must_keeps=license_globs + [".gitmodules"],
```
🧩 Analysis chain

🏁 Script executed:

```shell
#!/bin/bash
set -euo pipefail
tmp="$(mktemp -d)"
trap 'rm -rf "$tmp"' EXIT
git init "$tmp/child" >/dev/null
git -C "$tmp/child" config user.email review@example.com
git -C "$tmp/child" config user.name review
touch "$tmp/child/README"
git -C "$tmp/child" add README
git -C "$tmp/child" commit -m init >/dev/null
git init "$tmp/parent" >/dev/null
git -C "$tmp/parent" config user.email review@example.com
git -C "$tmp/parent" config user.name review
touch "$tmp/parent/.keep"
git -C "$tmp/parent" add .keep
git -C "$tmp/parent" commit -m init >/dev/null
git -C "$tmp/parent" -c protocol.file.allow=always submodule add "$tmp/child" deps/child >/dev/null
git -C "$tmp/parent" commit -am add-submodule >/dev/null
test -f "$tmp/parent/.gitmodules" && echo ".gitmodules is a tracked checkout file"
test -e "$tmp/parent/deps/child/.git" && echo "submodule worktree has its own .git entry"
rm -rf "$tmp/parent/.git"
test -e "$tmp/parent/deps/child/.git" && echo "nested .git entry still exists after deleting only the top-level .git"
```

🏁 Scripts executed (consolidated; command outputs elided):

```shell
cat -n dfetch/project/gitsubproject.py | sed -n '70,105p'
fd -t f "gitsubproject.py" --exec wc -l {} \;
rg -A 5 -B 5 "safe_rmtree.*METADATA_DIR" dfetch/project/gitsubproject.py
rg "METADATA_DIR|GIT_MODULES_FILE" dfetch/ -t py
rg "def safe_rmtree|def safe_rm" dfetch/ -A 3 -t py
head -30 dfetch/project/gitsubproject.py
```
Remove all nested .git directories after checkout, not just the superproject metadata.
The code keeps .gitmodules during checkout to initialize submodules, then deletes it along with only the top-level .git. This leaves each submodule's .git entry orphaned and pointing to now-deleted superproject metadata.
🛠️ Suggested fix

```diff
     safe_rmtree(os.path.join(self.local_path, local_repo.METADATA_DIR))
-    safe_rm(os.path.join(self.local_path, local_repo.GIT_MODULES_FILE))
+    for nested_git_metadata in pathlib.Path(self.local_path).rglob(
+        local_repo.METADATA_DIR
+    ):
+        safe_rm(nested_git_metadata)
```

Also applies to: 97-98
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@dfetch/project/gitsubproject.py` around lines 72 - 76, The checkout currently
preserves ".gitmodules" via the must_keeps argument in
local_repo.checkout_version and then removes only the top-level ".git", leaving
each submodule's .git entries orphaned; modify the post-checkout cleanup (around
the local_repo.checkout_version call and the later cleanup block that deletes
the superproject .git) to recursively find and remove any nested ".git" files or
directories under the checked-out working tree (including submodule paths)
rather than only deleting the top-level .git, ensuring you still keep
".gitmodules" long enough to initialize submodules before removing all nested
.git metadata.
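A standalone sketch of the recursive cleanup this comment asks for. The helper name `remove_nested_git_metadata` and the `".git"` default are assumptions for illustration; dfetch's actual constant is `METADATA_DIR`:

```python
import os
import shutil
import tempfile
from pathlib import Path


def remove_nested_git_metadata(root: str, metadata_dir: str = ".git") -> None:
    """Hypothetical helper: delete every nested .git entry under root.

    Submodule .git entries are often plain files (pointing at the
    superproject's .git/modules directory), so both files and
    directories must be handled.
    """
    for entry in list(Path(root).rglob(metadata_dir)):
        if not entry.exists() and not entry.is_symlink():
            continue  # already removed as part of a parent directory
        if entry.is_dir():
            shutil.rmtree(entry)
        else:
            entry.unlink()


# Fake checkout: a superproject .git dir plus a submodule .git file.
tmp = tempfile.mkdtemp()
os.makedirs(os.path.join(tmp, ".git"))
os.makedirs(os.path.join(tmp, "deps", "child"))
with open(os.path.join(tmp, "deps", "child", ".git"), "w") as f:
    f.write("gitdir: ../../.git/modules/deps/child\n")

remove_nested_git_metadata(tmp)
print(list(Path(tmp).rglob(".git")))  # []
```

Deleting only the top-level `.git` would leave `deps/child/.git` dangling, which is exactly the orphaned-metadata scenario the review reproduces above.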
```python
run_on_cmdline(
    logger,
    ["git", "submodule", "update", "--init", "--recursive"],
    env=_extend_env_for_non_interactive_mode(),
)

submodules = self.submodules()
```
🧩 Analysis chain

🌐 Web query: Does `git submodule foreach` recurse into nested submodules by default, or is `--recursive` required?

💡 Result: `git submodule foreach` does not recurse into nested submodules by default; it runs the command only in the submodules of the current superproject. To run the command in nested submodules too, you must pass `--recursive`. [1], [2]

🏁 Scripts executed (consolidated; command outputs elided):

```shell
# Check the implementation of the submodules() method in dfetch/vcs/git.py
rg -n "def submodules" dfetch/vcs/git.py -A 15

# Check lines 314-317 for the relative_to() usage
sed -n '310,320p' dfetch/vcs/git.py | cat -n
```

🏁 Script executed:

```python
from pathlib import Path

# Test 1: src is a parent directory (should work)
try:
    result = Path("src/submodule1").relative_to(Path("src"))
    print(f"✓ Test 1 (parent dir): {result}")
except ValueError as e:
    print(f"✗ Test 1 (parent dir): ValueError - {e}")

# Test 2: src is not a parent directory
try:
    result = Path("src/submodule1").relative_to(Path("other"))
    print(f"✓ Test 2 (non-parent dir): {result}")
except ValueError as e:
    print(f"✗ Test 2 (non-parent dir): ValueError - {e}")

# Test 3: src is a file (not a directory)
try:
    result = Path("src/submodule1").relative_to(Path("src/file.txt"))
    print(f"✓ Test 3 (file path): {result}")
except ValueError as e:
    print(f"✗ Test 3 (file path): ValueError - {e}")

# Test 4: src is the full path (exact match)
try:
    result = Path("src/submodule1").relative_to(Path("src/submodule1"))
    print(f"✓ Test 4 (exact match): {result}")
except ValueError as e:
    print(f"✗ Test 4 (exact match): ValueError - {e}")
```
Nested submodules are fetched here but never enumerated or returned.
Line 302 updates all submodules recursively with --recursive, but line 306 calls self.submodules(), which uses git submodule foreach --quiet without --recursive. This means nested (transitive) submodules get checked out but are omitted from the returned dependency list, causing them to disappear from metadata and dependency reports.
Additionally, lines 314–317 call Path(submodule.path).relative_to(Path(src)) without validating that src is actually a parent directory. The checkout_version() method documents src as an optional path to a subdirectory or file, but relative_to() raises ValueError if src is not a proper ancestor directory (e.g., if src points to a file or a sibling path). This will crash at runtime for such inputs.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@dfetch/vcs/git.py` around lines 300 - 306, The submodule enumeration misses
nested submodules and the relative_to call can raise for non-ancestor src;
update the call that gathers submodules to request recursive enumeration (e.g.,
make self.submodules() accept/forward a recursive=True flag or call the git
command with --recursive so nested/transitive submodules returned) and change
the path handling in checkout_version()/where
Path(submodule.path).relative_to(Path(src)) is used to avoid ValueError by
normalizing src to a directory (if src points to a file, use src.parent) or by
computing a safe relative path via os.path.relpath/Path(...).relative_to inside
a try/except fallback—apply these changes to the functions self.submodules() and
checkout_version() so nested submodules are included and non-ancestor src inputs
do not crash.
```python
if src:
    for submodule in submodules:
        submodule.path = str(Path(submodule.path).relative_to(Path(src)))
```
🧩 Analysis chain

🏁 Script executed:

```shell
#!/bin/bash
python - <<'PY'
from pathlib import Path
cases = [
    ("pkg/submodule", "pkg"),
    ("pkg/submodule", "pkg/*"),
    ("pkg/submodule", "pkg/file.txt"),
]
for submodule, src in cases:
    try:
        rel = Path(submodule).relative_to(Path(src))
        print(f"{src!r}: OK -> {rel}")
    except Exception as exc:
        print(f"{src!r}: {type(exc).__name__}: {exc}")
PY
```
🏁 Scripts executed (consolidated; command outputs elided):

```shell
# Examine the file structure and find the method containing lines 314-317
git ls-files dfetch/vcs/git.py
head -330 dfetch/vcs/git.py | tail -50
head -350 dfetch/vcs/git.py | tail -100

# Search for usages of checkout_version to see what src values are passed
rg "checkout_version" --type py -B 2 -A 5
rg "checkout_version" tests/ -B 5 -A 10

# Look for test files and test cases that exercise the src parameter
find . -name "*test*.py" -o -name "test_*.py" | head -20
rg "src=" tests/test_git_vcs.py -B 3 -A 3
rg "source\s*=" tests/test_git_vcs.py -B 2 -A 2
cat tests/test_git_vcs.py | head -300
rg "\"src\":|'src':" tests/ -B 3 -A 3

# Look at GitSubproject and the source attribute
rg "class GitSubproject" -A 30 dfetch/project/gitsubproject.py
rg "self.source\s*=" dfetch/project/gitsubproject.py -B 2 -A 2
cat dfetch/project/gitsubproject.py | head -200
rg "source" dfetch/project/gitsubproject.py | head -20

# Find the SubProject base class and ProjectEntry to see what source is
rg "class SubProject" dfetch/project/subproject.py -A 50 | head -80
rg "def source|self\.source\s*=" dfetch/project/subproject.py -B 2 -A 2
rg "source" dfetch/manifest/project.py | head -15
rg "class ProjectEntry" dfetch/manifest/project.py -A 100 | head -150
rg "source:" dfetch/manifest/project.py -B 5 -A 5

# Search docs and examples for actual src values
find . -name "*.md" -o -name "*.rst" | xargs grep -l "src" 2>/dev/null | head -10
find . -name "*.yaml" -o -name "*.yml" | xargs grep "src:" 2>/dev/null | head -20
cat doc/manifest.rst | grep -A 20 "src"
cat doc/manifest.rst | grep -A 30 -B 5 "Source"
cat example/dfetch.yaml | head -100

# Verify the actual behavior with the glob patterns from the repo
python3 << 'EOF'
from pathlib import Path

# Real examples from the repo
cases = [
    ("src/mymodule", "*.sh"),
    ("src/mymodule", "*.puml"),
    ("src/mymodule", "sphinxcontrib/asciinema"),
    ("src/mymodule", "src/*"),
    ("src/mymodule", "src/*.txt"),
    ("src/mymodule", "src"),
]
for submodule, src in cases:
    try:
        rel = Path(submodule).relative_to(Path(src))
        print(f"'{src}': OK -> {rel}")
    except Exception as exc:
        print(f"'{src}': {type(exc).__name__}: {exc}")
EOF

# Read move_src_folder_up, submodules(), checkout_version() and the
# sparse-checkout handling for full context
sed -n '330,360p' dfetch/vcs/git.py
rg "def submodules" dfetch/vcs/git.py -A 30
sed -n '300,330p' dfetch/vcs/git.py
sed -n '345,370p' dfetch/vcs/git.py
sed -n '270,290p' dfetch/vcs/git.py

# Check whether any tests combine src with submodules
rg "src.*=" tests/test_subproject.py -B 3 -A 3
rg "submodule" tests/test_git_vcs.py -B 2 -A 2

# Inspect the Submodule dataclass
cat dfetch/vcs/git.py | grep -A 20 "class Submodule"
```
Path.relative_to() crashes with glob or file src values when submodules are present.
The method signature and documentation indicate that `src` accepts file and glob patterns (e.g., `src="pkg/*"`, `src="pkg/file.txt"`), and the repo's own examples use these patterns. However, calling `Path(submodule.path).relative_to(Path(src))` on lines 314-317 will raise `ValueError` for any non-directory `src`, causing the entire checkout to fail if the repository contains submodules.
The move_src_folder_up() method explicitly handles this case with if not os.path.isdir(src): src = os.path.dirname(src), confirming that file paths are intentionally supported. The relative path calculation needs similar handling.
🛠️ Suggested fix

```diff
         if src:
-            for submodule in submodules:
-                submodule.path = str(Path(submodule.path).relative_to(Path(src)))
+            src_base = src if os.path.isdir(src) else os.path.dirname(src)
+            filtered_submodules = []
+            for submodule in submodules:
+                rel_path = os.path.relpath(submodule.path, src_base or ".")
+                if rel_path.startswith(".."):
+                    continue
+                submodule.path = rel_path
+                filtered_submodules.append(submodule)
+            submodules = filtered_submodules
```
+ submodules = filtered_submodules📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if src: | |
| for submodule in submodules: | |
| submodule.path = str(Path(submodule.path).relative_to(Path(src))) | |
| if src: | |
| src_base = src if os.path.isdir(src) else os.path.dirname(src) | |
| filtered_submodules = [] | |
| for submodule in submodules: | |
| rel_path = os.path.relpath(submodule.path, src_base or ".") | |
| if rel_path.startswith(".."): | |
| continue | |
| submodule.path = rel_path | |
| filtered_submodules.append(submodule) | |
| submodules = filtered_submodules |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@dfetch/vcs/git.py` around lines 314 - 317, The code that sets submodule.path
using Path(submodule.path).relative_to(Path(src)) fails for non-directory or
glob `src` values; update the logic in the loop that mutates submodule.path (in
dfetch/vcs/git.py) to mirror move_src_folder_up() behavior: if src is not a
directory (os.path.isdir(src) is False) replace src with its parent/dirname
before creating the Path for relative_to, and fall back to
os.path.relpath(submodule.path, src) or skip relative calculation if relative_to
still raises ValueError so submodules don't crash the checkout.
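The filtering logic suggested above can be exercised on its own. The helper name `filter_and_rebase` and the temp-directory layout are illustrative; only the `isdir`/`dirname` normalization and the `".."` check mirror the suggested fix:

```python
import os
import tempfile


def filter_and_rebase(paths: list[str], src: str) -> list[str]:
    """Mirror the suggested fix: normalize a non-directory src to its
    parent, rebase each submodule path, and drop ones outside src."""
    base = src if os.path.isdir(src) else os.path.dirname(src)
    kept = []
    for p in paths:
        rel = os.path.relpath(p, base or ".")
        if rel.startswith(".."):
            continue  # submodule lies outside src: excluded from the checkout
        kept.append(rel)
    return kept


tmp = tempfile.mkdtemp()
pkg = os.path.join(tmp, "pkg")
os.makedirs(pkg)

subs = [os.path.join(tmp, "pkg", "child"), os.path.join(tmp, "other", "child")]

# A directory src keeps only submodules under it, rebased
print(filter_and_rebase(subs, pkg))                            # ['child']
# A file-like src falls back to its parent directory
print(filter_and_rebase(subs, os.path.join(pkg, "file.txt")))  # ['child']
```

Unlike the original `Path.relative_to` call, neither case raises, and the out-of-tree submodule is silently dropped rather than reported with a bogus path.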
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Tests