
Fetch git submodule during update of git subproject#1017

Open
spoorcc wants to merge 2 commits into main from spoorcc/issue1013

Conversation

@spoorcc
Contributor

@spoorcc spoorcc commented Feb 23, 2026

Summary by CodeRabbit

  • New Features

    • Automatic detection, initialization and fetching of git submodules; nested VCS dependencies are tracked and shown as project dependencies.
    • Reports now include dependency details (path, URL, branch, tag, revision, source type).
  • Bug Fixes

    • Avoid printing an empty info line when metadata is missing.
    • Safer file removal: no error for non-existent paths.
  • Documentation

    • Changelog updated and example manifest extended with a new project entry.
  • Tests

    • New feature tests and step definitions covering submodule fetch and report scenarios.

@spoorcc spoorcc added the enhancement New feature or request label Feb 23, 2026
@coderabbitai

coderabbitai bot commented Feb 23, 2026

Walkthrough

Adds tracking of VCS dependencies (git submodules) through the VCS backends, project/subproject layers, and metadata; updates git checkout to detect and initialize submodules; propagates dependency data into Metadata and report output; and adjusts logging and the file-removal utility to support the new flows.

Changes

Cohort / File(s) Summary
Dependency & Metadata
dfetch/project/metadata.py, dfetch/project/subproject.py
Introduce Dependency TypedDict and VcsDependency type; extend Metadata to accept/store/expose dependencies via fetched(..., dependencies=...) and include them in dumps; update SubProject API to return dependency lists and convert VcsDependency → Dependency.
Git VCS & Submodule Detection
dfetch/vcs/git.py, dfetch/project/gitsubproject.py
Add GIT_MODULES_FILE, change checkout_version() to return (sha, submodules[]), initialize/update submodules, and have GitSubProject collect/return VcsDependency entries; remove .gitmodules file during cleanup.
VCS Implementations Alignment
dfetch/project/svnsubproject.py
Align _fetch_impl() signature to return tuple[Version, list[VcsDependency]] (returns empty dependency list for SVN).
Reporting / Output
dfetch/reporting/stdout_reporter.py, features/list-projects.feature
Reformat stdout report layout and add a dependencies section showing path, url, branch, tag, revision, and source-type for each dependency; adjust indentation and ordering of metadata fields.
CLI Logging & Util
dfetch/log.py, dfetch/util/util.py
Guard printing of empty info lines in logger; make safe removal check use os.path.lexists to avoid removing non-existent paths.
Features & Tests
features/fetch-git-repo-with-submodule.feature, features/steps/git_steps.py, features/steps/generic_steps.py, tests/test_report.py, tests/test_subproject.py
Add feature for fetching repos with submodules; adapt git test steps to support named remotes and submodule commits; add existence step; update tests to match new _fetch_impl return shape and reporter calls.
Examples & Changelog
example/dfetch.yaml, CHANGELOG.rst
Add TF-PSA-Crypto example project entry and a new Release 0.13.0 changelog section.
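Based on the summary above, the new types can be sketched roughly as follows. This is a minimal model, not the actual dfetch definitions; the field names are assumptions taken from the report fields mentioned in the Reporting cohort (path, url, branch, tag, revision, source type):

```python
from dataclasses import dataclass, asdict
from typing import TypedDict


class Dependency(TypedDict):
    """Serialized dependency entry as stored in metadata (assumed shape)."""

    path: str
    url: str
    branch: str
    tag: str
    revision: str
    source_type: str


@dataclass
class VcsDependency:
    """In-memory dependency produced by a VCS backend (assumed shape)."""

    path: str
    url: str
    branch: str = ""
    tag: str = ""
    revision: str = ""
    source_type: str = "git"

    def to_dependency(self) -> Dependency:
        # Convert the VCS-level object into the TypedDict stored in metadata,
        # mirroring the VcsDependency -> Dependency conversion described above.
        return Dependency(**asdict(self))


dep = VcsDependency(path="ext/zlib", url="https://example.com/zlib.git", revision="abc123")
print(dep.to_dependency()["path"])  # prints "ext/zlib"
```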

Sequence Diagram

sequenceDiagram
    participant User as User
    participant GitSub as GitSubProject
    participant GitLocal as GitLocalRepo
    participant Meta as Metadata
    participant Reporter as StdoutReporter

    User->>GitSub: fetch(version)
    GitSub->>GitLocal: checkout_version(version)
    GitLocal->>GitLocal: git checkout & git submodule update --init --recursive
    GitLocal->>GitLocal: collect Submodule list
    GitLocal-->>GitSub: (sha, submodules[])
    GitSub->>GitSub: build VcsDependency[] from submodules
    GitSub-->>Meta: fetched(version, ..., dependencies=VcsDependency[].to_dependency())
    Meta->>Meta: store dependencies in _dependencies
    User->>Reporter: generate report(project)
    Reporter->>Meta: metadata.dependencies
    Reporter-->>User: formatted output including dependencies

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • ben-edna
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name: Docstring Coverage
Status: ⚠️ Warning
Explanation: Docstring coverage is 78.38%, which is below the required threshold of 80.00%.
Resolution: Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name: Description Check
Status: ✅ Passed
Explanation: Check skipped - CodeRabbit’s high-level summary is enabled.

Check name: Title check
Status: ✅ Passed
Explanation: The title accurately describes the main change: fetching git submodules during git subproject updates, which is the core objective across multiple modified files.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch spoorcc/issue1013

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai

coderabbitai bot commented Feb 26, 2026

Note

Unit test generation is a public access feature. Expect some limitations and changes as we gather feedback and continue to improve it.


Generating unit tests... This may take up to 20 minutes.

@coderabbitai

coderabbitai bot commented Feb 26, 2026

✅ Created PR with unit tests: #1025

@spoorcc spoorcc force-pushed the spoorcc/issue1013 branch from 78711d1 to b81f411 Compare March 9, 2026 20:03
@spoorcc spoorcc force-pushed the spoorcc/issue1013 branch from b81f411 to 07b6553 Compare March 9, 2026 20:09
@spoorcc spoorcc marked this pull request as ready for review March 9, 2026 20:09

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
dfetch/vcs/git.py (1)

305-316: ⚠️ Potential issue | 🟠 Major

Apply the src: remap before persisting dependency paths.

Line 305 snapshots submodules before Lines 313-314 rewrite the checkout. If src: vendor/lib contains vendor/lib/ext/zlib, the exported tree ends up with ext/zlib but the dependency metadata still stores vendor/lib/ext/zlib; submodules outside src are also reported even though they were removed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dfetch/vcs/git.py` around lines 305 - 316, The code captures submodules and
current SHA before applying the src remap, causing dependency paths to persist
pre-remap locations; move the src rewrite so it runs before snapshotting
submodules and reading HEAD. Specifically, call self.move_src_folder_up(remote,
src) (with the same parameters) before invoking self.submodules() and before the
run_on_cmdline(... "git", "rev-parse", "HEAD") call so the exported tree and
recorded submodule paths reflect the remapped src layout.
🧹 Nitpick comments (1)
dfetch/util/util.py (1)

45-51: Broken symlinks won't be removed.

os.path.exists() returns False for broken symlinks, since it follows the link to check whether the target exists. If the intent is to remove the path regardless of whether it's a valid symlink, consider using os.path.lexists() instead, which returns True for broken symlinks.

If broken symlinks should be silently ignored, the current implementation is correct.

♻️ Optional fix if broken symlinks should be removed
 def safe_rm(path: str | Path) -> None:
     """Delete an file or directory safely."""
-    if os.path.exists(path):
+    if os.path.lexists(path):
         if os.path.isdir(path):
             safe_rmtree(str(path))
         else:
             os.remove(path)
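The os.path.lexists behavior behind this suggestion can be shown in isolation. This is a generic demonstration for POSIX systems (symlink creation may require extra privileges on Windows), not the project's safe_rm implementation:

```python
import os
import tempfile

# A broken symlink still has a directory entry, but os.path.exists follows
# the link and reports False, so an exists()-gated removal would skip it.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "target")
    link = os.path.join(tmp, "link")
    open(target, "w").close()
    os.symlink(target, link)
    os.remove(target)  # the symlink is now dangling

    print(os.path.exists(link))   # False: follows the link, target is gone
    print(os.path.lexists(link))  # True: the link entry itself still exists

    if os.path.lexists(link):
        os.remove(link)  # an lexists()-gated removal cleans it up
    print(os.path.lexists(link))  # False
```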
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dfetch/util/util.py` around lines 45 - 51, The safe_rm function uses
os.path.exists which returns False for broken symlinks; change the existence
check to os.path.lexists(path) so broken symlinks are detected and removed. In
safe_rm (and when calling safe_rmtree/remove), keep the existing branch logic
(os.path.isdir -> safe_rmtree, else -> os.remove) but gate it on os.path.lexists
instead of os.path.exists so broken symlinks fall through to the os.remove path
and are deleted.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@CHANGELOG.rst`:
- Line 5: The changelog entry "* Fetch git submodules in git subproject at
pinned revision (`#1015`)" references the wrong issue number; verify whether the
correct reference is `#1017` (per PR metadata) or `#1013` (per commit summary) and
update that parenthetical to the confirmed issue number so the CHANGELOG.rst
entry matches the PR and commit message.

In `@dfetch/util/util.py`:
- Line 46: Fix the minor typo in the docstring that currently reads "Delete an
file or directory safely." by changing it to "Delete a file or directory
safely."; update the docstring text in dfetch/util/util.py where that string
appears so the grammar is correct.

In `@dfetch/vcs/git.py`:
- Around line 299-305: In submodules() fix nested-submodule resolution: stop
using sm_path and the single cached urls map; instead parse each toplevel's
.gitmodules to build a fresh urls mapping per toplevel, use the $displaypath
value (not $sm_path) when emitting/looking up submodule entries and when
constructing full paths, and ensure url lookups use the corresponding
toplevel-specific map so names/URLs come from the same .gitmodules file.

In `@features/fetch-git-repo-with-submodule.feature`:
- Line 70: The expected "last fetch" value in
features/fetch-git-repo-with-submodule.feature is hardcoded to "26/02/2026,
20:28:24" and must be aligned with the suite's fixed clock; update the "last
fetch" expectation (the line containing the literal last fetch timestamp) to use
the shared fixed timestamp "02/07/2021, 20:25:56" (the same value asserted in
features/list-projects.feature) so the scenario uses the common clock fixture.

In `@features/steps/generic_steps.py`:
- Around line 333-335: The step function step_impl for the "{name} exists" step
currently delegates to check_file_exists which uses os.path.isfile() and only
accepts regular files; update the behavior to check for any filesystem entry by
either (a) changing check_file_exists to use os.path.exists() instead of
os.path.isfile(), or (b) create/rename a helper like check_path_exists that uses
os.path.exists() and call that from step_impl so the step truly asserts path
existence (reference: step_impl and check_file_exists).

---

Outside diff comments:
In `@dfetch/vcs/git.py`:
- Around line 305-316: The code captures submodules and current SHA before
applying the src remap, causing dependency paths to persist pre-remap locations;
move the src rewrite so it runs before snapshotting submodules and reading HEAD.
Specifically, call self.move_src_folder_up(remote, src) (with the same
parameters) before invoking self.submodules() and before the run_on_cmdline(...
"git", "rev-parse", "HEAD") call so the exported tree and recorded submodule
paths reflect the remapped src layout.

---

Nitpick comments:
In `@dfetch/util/util.py`:
- Around line 45-51: The safe_rm function uses os.path.exists which returns
False for broken symlinks; change the existence check to os.path.lexists(path)
so broken symlinks are detected and removed. In safe_rm (and when calling
safe_rmtree/remove), keep the existing branch logic (os.path.isdir ->
safe_rmtree, else -> os.remove) but gate it on os.path.lexists instead of
os.path.exists so broken symlinks fall through to the os.remove path and are
deleted.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c047622b-f1a4-4b2e-9810-993a1556fbe1

📥 Commits

Reviewing files that changed from the base of the PR and between 44290f0 and 07b6553.

📒 Files selected for processing (16)
  • CHANGELOG.rst
  • dfetch/log.py
  • dfetch/project/gitsubproject.py
  • dfetch/project/metadata.py
  • dfetch/project/subproject.py
  • dfetch/project/svnsubproject.py
  • dfetch/reporting/stdout_reporter.py
  • dfetch/util/util.py
  • dfetch/vcs/git.py
  • example/dfetch.yaml
  • features/fetch-git-repo-with-submodule.feature
  • features/list-projects.feature
  • features/steps/generic_steps.py
  • features/steps/git_steps.py
  • tests/test_report.py
  • tests/test_subproject.py


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (1)
dfetch/project/subproject.py (1)

151-160: Confusing variable naming: dependency is a list but named singular, and shadows itself in comprehension.

The variable dependency holds a list[VcsDependency] but uses a singular name. The list comprehension for dependency in dependency then shadows this variable, which works but is confusing and error-prone.

Consider renaming to vcs_deps or vcs_dependencies for clarity, matching the naming pattern used in GitSubProject._fetch_impl.

♻️ Suggested fix for clarity
-            actually_fetched, dependency = self._fetch_impl(to_fetch)
+            actually_fetched, vcs_deps = self._fetch_impl(to_fetch)
         self._log_project(f"Fetched {actually_fetched}")

         applied_patches = self._apply_patches(patch_count)

         self.__metadata.fetched(
             actually_fetched,
             hash_=hash_directory(self.local_path, skiplist=[self.__metadata.FILENAME]),
             patch_=applied_patches,
-            dependencies=[dependency.to_dependency() for dependency in dependency],
+            dependencies=[dep.to_dependency() for dep in vcs_deps],
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dfetch/project/subproject.py` around lines 151 - 160, The variable named
`dependency` is actually a list and is shadowed in the comprehension; rename it
to a plural name (e.g., `vcs_dependencies` or `vcs_deps`) where it's returned
from _fetch_impl and where it's consumed before calling __metadata.fetched;
update the call site that currently reads `dependency` to `vcs_dependencies` and
change the list comprehension to `[d.to_dependency() for d in
vcs_dependencies]`, ensuring references in the surrounding block (including the
unpack from self._fetch_impl(to_fetch) and the dependencies argument in
__metadata.fetched) use the new plural name consistently (follow the naming
pattern used in GitSubProject._fetch_impl).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@dfetch/project/gitsubproject.py`:
- Around line 72-76: The checkout currently preserves ".gitmodules" via the
must_keeps argument in local_repo.checkout_version and then removes only the
top-level ".git", leaving each submodule's .git entries orphaned; modify the
post-checkout cleanup (around the local_repo.checkout_version call and the later
cleanup block that deletes the superproject .git) to recursively find and remove
any nested ".git" files or directories under the checked-out working tree
(including submodule paths) rather than only deleting the top-level .git,
ensuring you still keep ".gitmodules" long enough to initialize submodules
before removing all nested .git metadata.

In `@dfetch/vcs/git.py`:
- Around line 300-306: The submodule enumeration misses nested submodules and
the relative_to call can raise for non-ancestor src; update the call that
gathers submodules to request recursive enumeration (e.g., make
self.submodules() accept/forward a recursive=True flag or call the git command
with --recursive so nested/transitive submodules returned) and change the path
handling in checkout_version()/where Path(submodule.path).relative_to(Path(src))
is used to avoid ValueError by normalizing src to a directory (if src points to
a file, use src.parent) or by computing a safe relative path via
os.path.relpath/Path(...).relative_to inside a try/except fallback—apply these
changes to the functions self.submodules() and checkout_version() so nested
submodules are included and non-ancestor src inputs do not crash.
- Around line 314-317: The code that sets submodule.path using
Path(submodule.path).relative_to(Path(src)) fails for non-directory or glob
`src` values; update the logic in the loop that mutates submodule.path (in
dfetch/vcs/git.py) to mirror move_src_folder_up() behavior: if src is not a
directory (os.path.isdir(src) is False) replace src with its parent/dirname
before creating the Path for relative_to, and fall back to
os.path.relpath(submodule.path, src) or skip relative calculation if relative_to
still raises ValueError so submodules don't crash the checkout.

---

Nitpick comments:
In `@dfetch/project/subproject.py`:
- Around line 151-160: The variable named `dependency` is actually a list and is
shadowed in the comprehension; rename it to a plural name (e.g.,
`vcs_dependencies` or `vcs_deps`) where it's returned from _fetch_impl and where
it's consumed before calling __metadata.fetched; update the call site that
currently reads `dependency` to `vcs_dependencies` and change the list
comprehension to `[d.to_dependency() for d in vcs_dependencies]`, ensuring
references in the surrounding block (including the unpack from
self._fetch_impl(to_fetch) and the dependencies argument in __metadata.fetched)
use the new plural name consistently (follow the naming pattern used in
GitSubProject._fetch_impl).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: faf1d394-d889-40b1-9ec4-222dacf51e84

📥 Commits

Reviewing files that changed from the base of the PR and between 07b6553 and d996bda.

📒 Files selected for processing (15)
  • CHANGELOG.rst
  • dfetch/log.py
  • dfetch/project/gitsubproject.py
  • dfetch/project/metadata.py
  • dfetch/project/subproject.py
  • dfetch/project/svnsubproject.py
  • dfetch/reporting/stdout_reporter.py
  • dfetch/util/util.py
  • dfetch/vcs/git.py
  • features/fetch-git-repo-with-submodule.feature
  • features/list-projects.feature
  • features/steps/generic_steps.py
  • features/steps/git_steps.py
  • tests/test_report.py
  • tests/test_subproject.py
✅ Files skipped from review due to trivial changes (1)
  • CHANGELOG.rst
🚧 Files skipped from review as they are similar to previous changes (5)
  • features/fetch-git-repo-with-submodule.feature
  • features/steps/git_steps.py
  • features/steps/generic_steps.py
  • dfetch/util/util.py
  • tests/test_subproject.py

Comment on lines +72 to +76
 fetched_sha, submodules = local_repo.checkout_version(
     remote=self.remote,
     version=rev_or_branch_or_tag,
     src=self.source,
-    must_keeps=license_globs,
+    must_keeps=license_globs + [".gitmodules"],

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

tmp="$(mktemp -d)"
trap 'rm -rf "$tmp"' EXIT

git init "$tmp/child" >/dev/null
git -C "$tmp/child" config user.email review@example.com
git -C "$tmp/child" config user.name review
touch "$tmp/child/README"
git -C "$tmp/child" add README
git -C "$tmp/child" commit -m init >/dev/null

git init "$tmp/parent" >/dev/null
git -C "$tmp/parent" config user.email review@example.com
git -C "$tmp/parent" config user.name review
touch "$tmp/parent/.keep"
git -C "$tmp/parent" add .keep
git -C "$tmp/parent" commit -m init >/dev/null

git -C "$tmp/parent" -c protocol.file.allow=always submodule add "$tmp/child" deps/child >/dev/null
git -C "$tmp/parent" commit -am add-submodule >/dev/null

test -f "$tmp/parent/.gitmodules" && echo ".gitmodules is a tracked checkout file"
test -e "$tmp/parent/deps/child/.git" && echo "submodule worktree has its own .git entry"

rm -rf "$tmp/parent/.git"
test -e "$tmp/parent/deps/child/.git" && echo "nested .git entry still exists after deleting only the top-level .git"

Repository: dfetch-org/dfetch

Length of output: 1446


🏁 Script executed:

cat -n dfetch/project/gitsubproject.py | sed -n '70,105p'

Repository: dfetch-org/dfetch

Length of output: 1732


🏁 Script executed:

fd -t f "gitsubproject.py" --exec wc -l {} \;

Repository: dfetch-org/dfetch

Length of output: 98


🏁 Script executed:

rg -A 5 -B 5 "safe_rmtree.*METADATA_DIR" dfetch/project/gitsubproject.py

Repository: dfetch-org/dfetch

Length of output: 484


🏁 Script executed:

rg "METADATA_DIR|GIT_MODULES_FILE" dfetch/ -t py

Repository: dfetch-org/dfetch

Length of output: 476


🏁 Script executed:

rg "def safe_rmtree|def safe_rm" dfetch/ -A 3 -t py

Repository: dfetch-org/dfetch

Length of output: 547


🏁 Script executed:

head -30 dfetch/project/gitsubproject.py

Repository: dfetch-org/dfetch

Length of output: 857


Remove all nested .git directories after checkout, not just the superproject metadata.

The code keeps .gitmodules during checkout to initialize submodules, then deletes it along with only the top-level .git. This leaves each submodule's .git entry orphaned and pointing to now-deleted superproject metadata.

🛠️ Suggested fix
         safe_rmtree(os.path.join(self.local_path, local_repo.METADATA_DIR))
-        safe_rm(os.path.join(self.local_path, local_repo.GIT_MODULES_FILE))
+        for nested_git_metadata in pathlib.Path(self.local_path).rglob(
+            local_repo.METADATA_DIR
+        ):
+            safe_rm(nested_git_metadata)

Also applies to: 97-98

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dfetch/project/gitsubproject.py` around lines 72 - 76, The checkout currently
preserves ".gitmodules" via the must_keeps argument in
local_repo.checkout_version and then removes only the top-level ".git", leaving
each submodule's .git entries orphaned; modify the post-checkout cleanup (around
the local_repo.checkout_version call and the later cleanup block that deletes
the superproject .git) to recursively find and remove any nested ".git" files or
directories under the checked-out working tree (including submodule paths)
rather than only deleting the top-level .git, ensuring you still keep
".gitmodules" long enough to initialize submodules before removing all nested
.git metadata.

Comment on lines +300 to +306
run_on_cmdline(
    logger,
    ["git", "submodule", "update", "--init", "--recursive"],
    env=_extend_env_for_non_interactive_mode(),
)

submodules = self.submodules()

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

Does git submodule foreach recurse into nested submodules by default, or is --recursive required?

💡 Result:

git submodule foreach does not recurse into nested submodules by default; it runs the command only in the submodules of the current superproject. To run the command in nested submodules too, you must pass --recursive. [1], [2]


🏁 Script executed:

# Check the implementation of submodules() method in dfetch/vcs/git.py
rg -n "def submodules" dfetch/vcs/git.py -A 15

Repository: dfetch-org/dfetch

Length of output: 615


🏁 Script executed:

# Check lines 314-317 for the relative_to() usage
sed -n '310,320p' dfetch/vcs/git.py | cat -n

Repository: dfetch-org/dfetch

Length of output: 464


🏁 Script executed:

python <<'EOF'
from pathlib import Path

# Test 1: src is a parent directory (should work)
try:
    result = Path("src/submodule1").relative_to(Path("src"))
    print(f"✓ Test 1 (parent dir): {result}")
except ValueError as e:
    print(f"✗ Test 1 (parent dir): ValueError - {e}")

# Test 2: src is not a parent directory
try:
    result = Path("src/submodule1").relative_to(Path("other"))
    print(f"✓ Test 2 (non-parent dir): {result}")
except ValueError as e:
    print(f"✗ Test 2 (non-parent dir): ValueError - {e}")

# Test 3: src is a file (not a directory)
try:
    result = Path("src/submodule1").relative_to(Path("src/file.txt"))
    print(f"✓ Test 3 (file path): {result}")
except ValueError as e:
    print(f"✗ Test 3 (file path): ValueError - {e}")

# Test 4: src is the full path (exact match)
try:
    result = Path("src/submodule1").relative_to(Path("src/submodule1"))
    print(f"✓ Test 4 (exact match): {result}")
except ValueError as e:
    print(f"✗ Test 4 (exact match): ValueError - {e}")
EOF

Repository: dfetch-org/dfetch

Length of output: 404


Nested submodules are fetched here but never enumerated or returned.

Line 302 updates all submodules recursively with --recursive, but line 306 calls self.submodules(), which uses git submodule foreach --quiet without --recursive. This means nested (transitive) submodules get checked out but are omitted from the returned dependency list, causing them to disappear from metadata and dependency reports.

Additionally, lines 314–317 call Path(submodule.path).relative_to(Path(src)) without validating that src is actually a parent directory. The checkout_version() method documents src as an optional path to a subdirectory or file, but relative_to() raises ValueError if src is not a proper ancestor directory (e.g., if src points to a file or a sibling path). This will crash at runtime for such inputs.
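One defensive way to implement the suggested normalization is sketched below with a hypothetical helper name; the real fix would check os.path.isdir against the working tree rather than guessing from the string, and this is not the project's code:

```python
from pathlib import Path


def remap_submodule_path(submodule_path: str, src: str) -> str:
    """Hypothetical helper: rewrite a submodule path relative to src without
    raising for file-like, glob, or non-ancestor src values."""
    base = Path(src)
    # Heuristic stand-in for os.path.isdir: treat glob patterns and paths
    # with an extension as non-directories and fall back to their parent.
    if "*" in src or base.suffix:
        base = base.parent
    try:
        return str(Path(submodule_path).relative_to(base))
    except ValueError:
        # src is not an ancestor of the submodule path; leave the path
        # unchanged instead of crashing the checkout.
        return submodule_path


print(remap_submodule_path("pkg/submodule", "pkg"))           # submodule
print(remap_submodule_path("pkg/submodule", "pkg/file.txt"))  # submodule
print(remap_submodule_path("pkg/submodule", "other"))         # pkg/submodule
```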

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dfetch/vcs/git.py` around lines 300 - 306, The submodule enumeration misses
nested submodules and the relative_to call can raise for non-ancestor src;
update the call that gathers submodules to request recursive enumeration (e.g.,
make self.submodules() accept/forward a recursive=True flag or call the git
command with --recursive so nested/transitive submodules returned) and change
the path handling in checkout_version()/where
Path(submodule.path).relative_to(Path(src)) is used to avoid ValueError by
normalizing src to a directory (if src points to a file, use src.parent) or by
computing a safe relative path via os.path.relpath/Path(...).relative_to inside
a try/except fallback—apply these changes to the functions self.submodules() and
checkout_version() so nested submodules are included and non-ancestor src inputs
do not crash.

Comment on lines 314 to +317
if src:
    for submodule in submodules:
        submodule.path = str(Path(submodule.path).relative_to(Path(src)))


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
python - <<'PY'
from pathlib import Path

cases = [
    ("pkg/submodule", "pkg"),
    ("pkg/submodule", "pkg/*"),
    ("pkg/submodule", "pkg/file.txt"),
]

for submodule, src in cases:
    try:
        rel = Path(submodule).relative_to(Path(src))
        print(f"{src!r}: OK -> {rel}")
    except Exception as exc:
        print(f"{src!r}: {type(exc).__name__}: {exc}")
PY

Repository: dfetch-org/dfetch

Length of output: 339


🏁 Script executed:

# First, let me examine the file structure and find the method containing lines 314-317
git ls-files dfetch/vcs/git.py

Repository: dfetch-org/dfetch

Length of output: 78


🏁 Script executed:

# Read the relevant section of the file with context
head -330 dfetch/vcs/git.py | tail -50

Repository: dfetch-org/dfetch

Length of output: 1837


🏁 Script executed:

# Get more context to understand the method and its documentation
head -350 dfetch/vcs/git.py | tail -100

Repository: dfetch-org/dfetch

Length of output: 3772


🏁 Script executed:

# Search for usages of checkout_version to see what src values are passed
rg "checkout_version" --type py -B 2 -A 5

Repository: dfetch-org/dfetch

Length of output: 959


🏁 Script executed:

# Look for test cases that might show how src is used
fd "test.*\.py$|.*_test\.py$" -type f | head -20

Repository: dfetch-org/dfetch

Length of output: 231


🏁 Script executed:

# Search for test files properly
find . -name "*test*.py" -o -name "test_*.py" | head -20

Repository: dfetch-org/dfetch

Length of output: 455


🏁 Script executed:

# Look at the GitSubproject class to understand what source can be
rg "class GitSubproject" -A 30 dfetch/project/gitsubproject.py

Repository: dfetch-org/dfetch

Length of output: 43


🏁 Script executed:

# Search for source property/attribute definitions
rg "self.source\s*=" dfetch/project/gitsubproject.py -B 2 -A 2

Repository: dfetch-org/dfetch

Length of output: 43


🏁 Script executed:

# Read the gitsubproject.py file to understand the source attribute
cat dfetch/project/gitsubproject.py | head -200

Repository: dfetch-org/dfetch

Length of output: 4851


🏁 Script executed:

# Search for where source is set in gitsubproject
rg "source" dfetch/project/gitsubproject.py | head -20

Repository: dfetch-org/dfetch

Length of output: 138


🏁 Script executed:

# Look at test cases that test src parameter
rg "src=" tests/test_git_vcs.py -B 3 -A 3

Repository: dfetch-org/dfetch

Length of output: 43


🏁 Script executed:

# Find the SubProject base class to see what source is
rg "class SubProject" dfetch/project/subproject.py -A 50 | head -80

Repository: dfetch-org/dfetch

Length of output: 1768


🏁 Script executed:

# Search for source property definition
rg "def source|self\.source\s*=" dfetch/project/subproject.py -B 2 -A 2

Repository: dfetch-org/dfetch

Length of output: 197


🏁 Script executed:

# Check the ProjectEntry to see what source field is
rg "source" dfetch/manifest/project.py | head -15

Repository: dfetch-org/dfetch

Length of output: 174


🏁 Script executed:

# Look at the ProjectEntry class definition
rg "class ProjectEntry" dfetch/manifest/project.py -A 100 | head -150

Repository: dfetch-org/dfetch

Length of output: 3534


🏁 Script executed:

# Look for the source field definition in ProjectEntry
rg "source:" dfetch/manifest/project.py -B 5 -A 5

Repository: dfetch-org/dfetch

Length of output: 43


🏁 Script executed:

# Check test cases to see what source values are actually used
rg "source\s*=" tests/test_git_vcs.py -B 2 -A 2

Repository: dfetch-org/dfetch

Length of output: 43


🏁 Script executed:

# Look at test files for git vcs to see how src is used
cat tests/test_git_vcs.py | head -300

Repository: dfetch-org/dfetch

Length of output: 5634


🏁 Script executed:

# Search for any test that uses src or source parameter
rg "\"src\":|'src':" tests/ -B 3 -A 3

Repository: dfetch-org/dfetch

Length of output: 746


🏁 Script executed:

# Look for test cases that test the checkout_version function
rg "checkout_version" tests/ -B 5 -A 10

Repository: dfetch-org/dfetch

Length of output: 43


🏁 Script executed:

# Search for manifest examples or documentation about src
find . -name "*.md" -o -name "*.rst" | xargs grep -l "src" 2>/dev/null | head -10

Repository: dfetch-org/dfetch

Length of output: 138


🏁 Script executed:

# Look for any actual usage examples with src values
rg "src.*=" . -g "*.md" -g "*.rst" -g "*.yaml" -g "*.yml" 2>/dev/null | head -20

Repository: dfetch-org/dfetch

Length of output: 43


🏁 Script executed:

# Check if there are integration tests or examples with src parameter
find . -name "*.yaml" -o -name "*.yml" | xargs grep "src:" 2>/dev/null | head -20

Repository: dfetch-org/dfetch

Length of output: 282


🏁 Script executed:

```shell
# Read the manifest documentation to understand src parameter
cat doc/manifest.rst | grep -A 20 "src"
```

Repository: dfetch-org/dfetch

Length of output: 463


🏁 Script executed:

```shell
# Look at the actual example to understand src patterns
cat example/dfetch.yaml | head -100
```

Repository: dfetch-org/dfetch

Length of output: 1755


🏁 Script executed:

```shell
# Let's verify the actual behavior with the glob patterns from the repo
python3 << 'EOF'
from pathlib import Path

# Real examples from the repo
cases = [
    ("src/mymodule", "*.sh"),
    ("src/mymodule", "*.puml"),
    ("src/mymodule", "sphinxcontrib/asciinema"),
    ("src/mymodule", "src/*"),
    ("src/mymodule", "src/*.txt"),
    ("src/mymodule", "src"),
]

for submodule, src in cases:
    try:
        rel = Path(submodule).relative_to(Path(src))
        print(f"'{src}': OK -> {rel}")
    except Exception as exc:
        print(f"'{src}': {type(exc).__name__}: {exc}")
EOF
```

Repository: dfetch-org/dfetch

Length of output: 726


🏁 Script executed:

```shell
# Read move_src_folder_up to understand the full context
sed -n '330,360p' dfetch/vcs/git.py
```

Repository: dfetch-org/dfetch

Length of output: 1045


🏁 Script executed:

```shell
# Check if submodules are even involved when src is a glob/file
# Look at what submodules() returns
rg "def submodules" dfetch/vcs/git.py -A 30
```

Repository: dfetch-org/dfetch

Length of output: 1163


🏁 Script executed:

```shell
# Understand the complete flow - when does relative_to get called?
sed -n '300,330p' dfetch/vcs/git.py
```

Repository: dfetch-org/dfetch

Length of output: 1043


🏁 Script executed:

```shell
# Let's trace through what happens with a glob src value in sparse checkout
# Check the sparse-checkout file format
cat doc/manifest.rst | grep -A 30 -B 5 "Source"
```

Repository: dfetch-org/dfetch

Length of output: 515


🏁 Script executed:

```shell
# Let's look at the _determine_ignore_paths to see how it handles globs
sed -n '345,370p' dfetch/vcs/git.py
```

Repository: dfetch-org/dfetch

Length of output: 934


🏁 Script executed:

```shell
# The key question: does sparse checkout with glob patterns work with submodules?
# Let's look at what sparse_checkout_file contains
sed -n '270,290p' dfetch/vcs/git.py
```

Repository: dfetch-org/dfetch

Length of output: 1108


🏁 Script executed:

```shell
# Let's check if there's any filtering of submodules based on sparse-checkout
# Look for any tests that combine src with submodules
rg "src.*=" tests/test_subproject.py -B 3 -A 3
```

Repository: dfetch-org/dfetch

Length of output: 43


🏁 Script executed:

```shell
# Look at the complete picture - does sparse-checkout affect submodule paths?
# Check if there are any tests that test src with submodules
rg "submodule" tests/test_git_vcs.py -B 2 -A 2
```

Repository: dfetch-org/dfetch

Length of output: 43


🏁 Script executed:

```shell
# Let me verify the actual behavior by looking at a real scenario
# When src="src/*", what would submodule.path contain?
cat dfetch/vcs/git.py | grep -A 20 "class Submodule"
```

Repository: dfetch-org/dfetch

Length of output: 606


**`Path.relative_to()` crashes with glob or file `src` values when submodules are present.**

The method signature and documentation indicate that `src` accepts file and glob patterns (e.g., `src="pkg/*"`, `src="pkg/file.txt"`), and the repo's own examples use these patterns. However, calling `Path(submodule.path).relative_to(Path(src))` on lines 314-317 raises `ValueError` for any non-directory `src`, causing the entire checkout to fail whenever the repository contains submodules.

The `move_src_folder_up()` method explicitly handles this case with `if not os.path.isdir(src): src = os.path.dirname(src)`, confirming that file paths are intentionally supported. The relative-path calculation needs similar handling.
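The failure mode is easy to reproduce in isolation. A minimal standalone sketch (not dfetch code, paths are illustrative) contrasting `Path.relative_to` with `os.path.relpath` for a glob-style `src`:

```python
import os
from pathlib import Path

# A submodule recorded at this repo-relative path...
submodule_path = "src/mymodule"

# ...rebases fine when src is an ancestor directory:
print(Path(submodule_path).relative_to("src"))  # mymodule

# ...but a glob (or file) src is not an ancestor, so relative_to raises:
try:
    Path(submodule_path).relative_to("src/*")
except ValueError as exc:
    print(f"ValueError: {exc}")

# os.path.relpath never raises; out-of-tree results simply start with "..":
print(os.path.relpath(submodule_path, "src"))    # mymodule
print(os.path.relpath(submodule_path, "src/*"))  # ../mymodule
```

This is why a checkout with submodules works for directory `src` values but aborts as soon as the manifest uses a glob or single-file `src`.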

🛠️ Suggested fix
```diff
         if src:
-                for submodule in submodules:
-                    submodule.path = str(Path(submodule.path).relative_to(Path(src)))
+                src_base = src if os.path.isdir(src) else os.path.dirname(src)
+                filtered_submodules = []
+                for submodule in submodules:
+                    rel_path = os.path.relpath(submodule.path, src_base or ".")
+                    if rel_path.startswith(".."):
+                        continue
+                    submodule.path = rel_path
+                    filtered_submodules.append(submodule)
+                submodules = filtered_submodules
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
```diff
         if src:
-                for submodule in submodules:
-                    submodule.path = str(Path(submodule.path).relative_to(Path(src)))
+                src_base = src if os.path.isdir(src) else os.path.dirname(src)
+                filtered_submodules = []
+                for submodule in submodules:
+                    rel_path = os.path.relpath(submodule.path, src_base or ".")
+                    if rel_path.startswith(".."):
+                        continue
+                    submodule.path = rel_path
+                    filtered_submodules.append(submodule)
+                submodules = filtered_submodules
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `dfetch/vcs/git.py` around lines 314-317, the code that sets `submodule.path` using `Path(submodule.path).relative_to(Path(src))` fails for non-directory or glob `src` values. Update the loop that mutates `submodule.path` to mirror the `move_src_folder_up()` behavior: if `src` is not a directory (`os.path.isdir(src)` is False), replace `src` with its parent via `os.path.dirname` before creating the `Path` for `relative_to`, and fall back to `os.path.relpath(submodule.path, src)` or skip the relative calculation if `relative_to` still raises `ValueError`, so submodules don't crash the checkout.


Labels

enhancement New feature or request
