fix: prevent git task workers from diverging during sync fan-out (#9349) #9360
polmichel wants to merge 15 commits into
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
get_commit_value only raises ValueError (branch absent) or InvalidGitRepositoryError (clone missing/corrupt). Catching the GitError base class also swallowed GitCommandError, HookExecutionError, and other unexpected git failures, masking them behind a silent fall-back to pull(). Narrow both pin-resolution sites to the exceptions actually expected. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After merging into the destination branch, resolve the resulting commit and broadcast it in RefreshGitFetch so fan-out workers check out the merge commit instead of pulling the destination branch to whatever upstream HEAD is at that moment, keeping the pool converged if upstream advances during fan-out. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resolve the new branch's commit and broadcast it in RefreshGitFetch so fan-out workers check out that exact SHA rather than pulling the branch to whatever upstream HEAD is at fetch time. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
    await repo.sync_from_remote()

    # Notify other workers they need to clone the repository
    notification = messages.RefreshGitFetch(
1 issue found across 8 files
Confidence score: 3/5
- There is a concrete regression risk in `backend/infrahub/git/tasks.py`: `pull_read_only` broadcasts `commit=model.commit` rather than the commit actually synced, which can mislead downstream workers.
- The most severe impact is user-facing sync divergence: when `model.commit` is `None`, fan-out workers may fall back to `pull()` and reintroduce drift instead of converging on the intended revision.
- Given the issue’s relatively high severity/confidence (7/10, 8/10), this is a meaningful but likely targeted fix rather than a broad systemic failure, so merge risk is moderate.
- Pay close attention to `backend/infrahub/git/tasks.py`: ensure the broadcasted commit matches the synced commit to prevent fallback pulls and divergence.
Shadow auto-approve: would not auto-approve because issues were found.
Add the generated row for the new commit field and tighten its description to a single line so the message-bus events reference matches generation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A ref-only pull (commit unset) resolved a concrete SHA during sync but broadcast the unset commit, leaving fan-out workers to re-resolve the ref independently and diverge. Resolve the ref once and use that SHA for both the sync and the broadcast. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resolve the ref to a concrete SHA once and use it for both the initial sync and the RefreshGitFetch broadcast, so fan-out workers cloning the new read-only repository converge on that commit instead of re-resolving the ref. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
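The "resolve once, reuse everywhere" idea from the two commits above can be sketched as follows. This is an illustration under stated assumptions: `FakeRemote`, `sync_and_broadcast`, and the trimmed-down `RefreshGitFetch` dataclass are hypothetical stand-ins, not the real Infrahub classes or message schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RefreshGitFetch:
    """Stand-in for the broadcast message; only the fields needed here."""
    repository_name: str
    commit: Optional[str] = None


@dataclass
class FakeRemote:
    """Hypothetical upstream whose ref can move between two lookups."""
    heads: dict = field(default_factory=dict)

    def resolve(self, ref: str) -> str:
        return self.heads[ref]


def sync_and_broadcast(remote, repository_name, ref, commit=None):
    """Resolve the ref at most once, then reuse that SHA for sync and broadcast."""
    pinned = commit if commit is not None else remote.resolve(ref)
    synced_commit = pinned  # the orchestrator syncs to exactly this SHA
    message = RefreshGitFetch(repository_name=repository_name, commit=pinned)
    return synced_commit, message


remote = FakeRemote(heads={"main": "aaa111"})
synced, msg = sync_and_broadcast(remote, "demo-repo", ref="main")
remote.heads["main"] = "bbb222"  # upstream advances during fan-out
```

Because the message carries the SHA rather than the ref, a worker that receives `msg` after upstream has moved still sees the commit the orchestrator synced.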
@@ -145,6 +157,7 @@ async def add_git_repository_read_only(model: GitRepositoryAddReadOnly) -> None:
    repository_kind=InfrahubKind.REPOSITORY,
Is it a problem that this is tagged as REPOSITORY rather than as a read-only repository kind?
1 issue found across 3 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="docs/docs/reference/message-bus-events.mdx">
<violation number="1" location="docs/docs/reference/message-bus-events.mdx:82">
P2: The Type column for the `commit` field is set to "N/A", but a commit SHA is a string value. This is inconsistent with the other fields in the same table (all typed as "string").</violation>
</file>
Shadow auto-approve: would not auto-approve because issues were found.
| **repository_kind** | The type of repository | string | None |
| **infrahub_branch_name** | Infrahub branch on which to sync the remote repository | string | None |
| **infrahub_branch_id** | Id of the Infrahub branch on which to sync the remote repository | string | None |
| **commit** | Commit SHA to check out, pinned by the sync orchestrator instead of pulling the latest upstream HEAD | N/A | None |
P2: The Type column for the commit field is set to "N/A", but a commit SHA is a string value. This is inconsistent with the other fields in the same table (all typed as "string").
-| **commit** | Commit SHA to check out, pinned by the sync orchestrator instead of pulling the latest upstream HEAD | N/A | None |
+| **commit** | Commit SHA to check out, pinned by the sync orchestrator instead of pulling the latest upstream HEAD | string | None |
This documentation is auto-generated. This is most likely because the field has a default value of None, but I am not totally sure.
Thanks for the context—understood that this table is auto-generated.
The add and read-only-add flows now resolve a commit and broadcast it, so the mocked repos must return a SHA from get_commit_value (and expose ref for the read-only flow), and the read-only sync assertion expects the pinned commit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
0 issues found across 1 file (changes from recent commits).
Shadow auto-approve: would require human review. This PR modifies core git synchronization logic across multiple workers, introducing commit pinning and a new reset_to_commit method to prevent divergence; such changes affect a critical path for repository data integrity and require human review due to potential impact on fan-out behavior...
0 issues found across 1 file (changes from recent commits).
Shadow auto-approve: would require human review. This PR modifies core git synchronization logic to prevent worker divergence, introducing a new reset_to_commit method, altering pull and message broadcast behavior, and changing concurrency handling under locks; while well-tested, the blast radius and risk to data integrity across all...
Why
In a multi-worker task pool, the git sync flow fans work out across workers. When a new upstream commit landed between the orchestrator resolving HEAD and the other workers running their own git pull, workers ended up on different commits for the same repo.
Goal: make every fan-out worker converge on the exact commit the orchestrator pinned, regardless of what lands upstream during fan-out.
This PR also carries a second, independent git fix (see below).
Closes #9349
What changed
Behavioral
- RefreshGitFetch. Receiving workers check out that exact SHA instead of fast-forwarding to whatever upstream HEAD currently is.
- infrahub_branch). The branch is now bound before the commit is written.
How to review
A commit-by-commit review is recommended, since there are:
The subsequent commits are individual fixes for similar issues that use the new pinned commit, or documentation, lint, and formatting changes.
Extra scrutiny welcome on the pin source per flow.
How to test
Known follow-ups (intentionally out of scope)
repository_kind=REPOSITORY for a read-only repo. Worth confirming independently.
Checklist
9349.fixed.md, +pull-new-branch-unbound-commit.fixed.md)
Summary by cubic
Prevents git task workers from diverging during sync fan-out by pinning the orchestrator’s commit and hard-resetting workers to that exact SHA under a repository lock. Also fixes a branch-creation pull crash and ensures commit values are recorded. Addresses IFC-2642.
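The worker-side behavior summarized above (hard-reset to the pinned SHA under a repository lock, with a pull as fall-back) can be sketched roughly as below. `FakeWorkerRepo`, `handle_refresh`, and `fan_out` are hypothetical names, an `asyncio.Lock` stands in for the repository lock, and a plain dict stands in for upstream; only the `reset_to_commit` and `pull` method names come from the PR text, and their real signatures are not shown here.

```python
import asyncio


class FakeWorkerRepo:
    """Hypothetical worker-side clone backed by a shared fake upstream."""

    def __init__(self, upstream):
        self.upstream = upstream    # dict with "head" and "known" commits
        self.current = None
        self.lock = asyncio.Lock()  # stands in for the repository lock

    async def reset_to_commit(self, commit):
        # Fails when the pinned commit cannot be resolved after fetching.
        if commit not in self.upstream["known"]:
            raise ValueError(f"cannot resolve {commit}")
        self.current = commit

    async def pull(self):
        # Moves to whatever upstream HEAD is right now.
        self.current = self.upstream["head"]


async def handle_refresh(repo, commit=None):
    """Check out the pinned commit under the lock; pull only as a fall-back."""
    async with repo.lock:
        if commit is not None:
            try:
                await repo.reset_to_commit(commit)
                return
            except ValueError:
                pass  # pin could not be resolved, fall through to pull()
        await repo.pull()


async def fan_out():
    upstream = {"head": "aaa111", "known": {"aaa111", "bbb222"}}
    workers = [FakeWorkerRepo(upstream) for _ in range(3)]
    pinned = upstream["head"]    # the orchestrator pins this SHA
    await handle_refresh(workers[0], pinned)
    upstream["head"] = "bbb222"  # upstream advances mid fan-out
    await asyncio.gather(*(handle_refresh(w, pinned) for w in workers[1:]))
    return [w.current for w in workers]


commits = asyncio.run(fan_out())
```

In this sketch all three workers end on the pinned commit even though upstream moved partway through; calling `handle_refresh` without a commit would instead follow upstream HEAD, which is the divergence the PR is meant to avoid.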
- commit to RefreshGitFetch; workers now fetch and hard-reset via reset_to_commit under the repo lock, falling back to pull if the pin is missing or can’t be resolved.
- ValueError and InvalidGitRepositoryError; added a fan-out convergence test and a commit-update test; updated RPC mocks and docs for the new commit field.
Written for commit 23e08ed. Summary will update on new commits.