feat(sync): detect signature drift in rsync and revert/drop#235
Open
mosheabr wants to merge 6 commits into
Open
feat(sync): detect signature drift in rsync and revert/drop#235mosheabr wants to merge 6 commits into
mosheabr wants to merge 6 commits into
Conversation
Adds a new "Detect signature drift in rsync" step to sync-skills.yml that runs before the existing compliance check. Detects when the rsync brought in file changes under skills/<dir>/ without a matching skill.oms.sig refresh — landing those would break crypto verification on the next "Verify Signed Skills" sweep. Recovery rules: - Skill already on main: revert the catalog dir to its pre-rsync (signed) state. The signed copy stays live; the unsigned update is held until the source team runs /nvskills-ci on a follow-up PR and the next sync picks up a fresh sig. - Brand-new skill (no prior version on main): drop outright. Both cases write to /tmp/dropped-skills.txt with a "sig drift" reason. The "Track dropped skills" tracker-issue step is extended to partition entries by reason (missing-artifact vs sig-drift) and surface recovery steps for each. Motivation: hand-cherry-picking clean skills out of sync PRs has been a recurring task this week (PRs #226 -> #231, #232 -> #233, #234) because the source teams' content changes outpace their sig refresh. Automating the drop puts the recovery burden back on the source team where it belongs, and stops drift from propagating to Skills.sh / Codex / Hermes / OpenClaw via the catalog. Test cases the reviewer should think about: 1. Skill unchanged in rsync → not touched (no diff). 2. Skill content + sig both changed → no drift (both moved together). 3. Skill content changed, sig unchanged → drift detected. If on main, revert to HEAD. If new, drop outright. 4. New skill missing sig entirely → caught by existing compliance step (not this one). 5. Concurrent drift across multiple skills in one product → all detected; product still marked changed for PR title. Out of scope: full cryptographic verification of (sig, manifest) pairs — that's what the "Verify Signed Skills" workflow (PR #78) is meant to do periodically. This step catches the common authoring mistake (forgot to /nvskills-ci) at sync time, which is what we've been seeing all week. Signed-off-by: Moshe Abramovitch <moshea@nvidia.com>
P1 fixes:
- Untracked files were invisible to `git diff --name-only HEAD`,
which made the brand-new-skill drift branch unreachable and let
existing-skill drift slip through when the change was an added
file (not a modified one). Now unions diff + ls-files --others
to capture the full rsync delta.
- Compliance step truncated /tmp/dropped-skills.txt, wiping drift
records the new step had written. Switched to `touch` so both
steps append to the same file.
P2 fix:
- `git clean -fd` doesn't remove ignored files, so the revert path
could leave rsync-created ignored orphans behind. Use `git clean
-fdx --` to match the "restore pre-rsync state" contract.
P3 fixes:
- `grep -qx "- $product"` parsed the leading "-" as a flag (silent
failure -> always treated as "not found"). Use `grep -Fqx --`.
Fix applied to both the new drift step and the pre-existing
compliance step which had the same bug.
- Tracker-issue partition was sig-drift vs not-sig-drift, which
lumped orphan drops into "Missing artifacts." Now uses explicit
reason prefixes ("missing artifacts:", "orphan:", "sig drift:")
with a 4th "Other" catch-all so nothing is silently mis-bucketed.
- Title generation refactored to build from non-empty buckets so
it stays accurate as categories come and go.
Refactor:
- Tracker body now uses an `emit_section` shell function for the
three (now four) categories, dropping duplicated code.
Signed-off-by: Moshe Abramovitch <moshea@nvidia.com>
P2 fix: - Drift detection now drives the scan from "skill dirs the rsync touched" (git diff + ls-files --others, top-level skill-dir granularity) rather than "skill dirs containing SKILL.md after rsync." The latter missed cases where rsync deleted or renamed SKILL.md without refreshing skill.oms.sig — the existing signed copy on main would have been silently overwritten with an unsigned state. P3 fix: - Sync PR body now partitions /tmp/dropped-skills.txt by the same reason prefixes the tracker issue uses (missing artifacts:, sig drift:, orphan:, plus a catch-all). Drift entries no longer show up as "missing: sig drift: reverted..." nor get mis-labeled in the PR body. Signed-off-by: Moshe Abramovitch <moshea@nvidia.com>
P1 fix:
- Drift scan no longer awk-extracts "skills/$1/$2" from git output —
that collapsed multi-skill products to one entry, masking drift on
sibling skills and misclassifying new skills under existing
products as "existing on main." Now the rsync loop records the
exact destination dir (skills/<catalog_dir>) into
/tmp/rsynced-skill-dirs.txt as it goes, and the drift step reads
that list directly. Nesting-safe (handles both flat and nested
catalog layouts) and no path-parsing.
P2 fix:
- Loop now reads via `while IFS= read -r dir; do … done < file`
instead of `for dir in $touched_dirs`, so paths with spaces or
other word-splitting characters survive intact.
P3 fix:
- Title join used `IFS=', '; "${parts[*]}"`, but bash array
expansion only uses the first IFS char as separator — output was
"1 missing artifacts,2 sig drift" (no space). Now uses
printf "%s, " + trailing-strip, which gives the intended
", "-separated string.
Signed-off-by: Moshe Abramovitch <moshea@nvidia.com>
P3 fix from Codex 4th review: - sig_changed regex `(^|/)skill\.oms\.sig$` matched ANY path ending in `/skill.oms.sig`, including nested cases like `$dir/docs/skill.oms.sig`. Compliance only counts the root sig at `$dir/skill.oms.sig`, so drift detection must too. Otherwise a team adding a stray nested sig file would mask the stale root. - Replaced regex with exact-match `grep -Fx -- "$sig_path"` against the canonical root path. Signed-off-by: Moshe Abramovitch <moshea@nvidia.com>
Adds auto-assignment of the missing-compliance tracker issue to the catalog PIC (default: mosheabr) so any drift change reliably generates a GitHub notification — independent of repo-watch settings. Without this, the rolling issue update was silent unless the maintainer happened to be subscribed. Configurable via a CATALOG_TRACKER_ASSIGNEES secret holding a comma-separated list of GitHub handles, with a hardcoded fallback so the workflow stays self-contained. `--add-assignee` is idempotent and re-asserts assignment on every run, so a manual unassign doesn't permanently mute notifications. Signed-off-by: Moshe Abramovitch <moshea@nvidia.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds a "Detect signature drift in rsync" step to
sync-skills.ymlthat catches the case where the source-repo team edits skill content without refreshingskill.oms.sig. Recovery is automatic and safe:mainand across downstream channels (Skills.sh, Codex, Claude Code, Hermes, OpenClaw). The unsigned update is held until the source team re-signs.main): drop outright — never land in a drift state.Both write to
/tmp/dropped-skills.txtwith asig drift:reason prefix. The sync PR body and the rollingmissing-compliancetracker issue now partition drops into Missing artifacts / Signature drift / Orphan / Other sections so each failure mode has its own recovery path.Why now
We've hand-cherry-picked three sync PRs this week (#226→#231, #232→#233, #234→pending) because source-repo teams edit skill content without re-running
/nvskills-ci. The unsigned content keeps reaching the bot sync PR; landing it would propagate to Skills.sh + Codex + Hermes + OpenClaw within ~15 minutes and break crypto verification on the next sweep. Automating the drop turns this into a self-service flow: the source team gets pinged, the catalog never goes through a broken-crypto state.How drift is detected
For each catalog dir the rsync actually targeted (recorded in
/tmp/rsynced-skill-dirs.txtduring the rsync loop, so we never path-parse):git diff --name-only HEAD ∪ git ls-files --others --exclude-standard— captures both modifications/deletions and untracked additions.$dir/skill.oms.sigis in that set (exact match viagrep -Fx --, not regex — nested sig files like$dir/docs/skill.oms.sigdon't mask a stale root).git ls-tree HEAD -- "$dir"non-empty) → revert viagit checkout HEAD -- "$dir/"+git clean -fdx -- "$dir/". New skill →rm -rf.Repo-owner visibility
The rolling
missing-compliancetracker issue is now auto-assigned (mosheabrby default; configurable viaCATALOG_TRACKER_ASSIGNEESsecret as a comma-separated list). Every drift change generates a GitHub notification regardless of repo-watch state. Idempotent — re-asserts on each run so manual unassign can't permanently mute notifications.Test cases the design has been validated against
docs/skill.oms.sig) → still drift (root path is what matters).SKILL.mdwithout sig refresh → still caught (driver iterates exact rsync targets, not post-rsyncfind SKILL.md).Bonus fix in this PR
The pre-existing
grep -qx "- $product"at the "mark changed components" line parsed the leading-as a grep option, so the de-dup check silently always fell through. Fixed in both the new drift step and the existing compliance step (grep -Fqx --). Happy to split into a separate fix-bug PR if reviewers prefer.Follow-ups (intentionally out of scope here)
git log -1 --pretty='%ae' -- <changed-files>on the source clone we already have during rsync). Anchored to commit identity, not a static contacts directory — survives team rotations. Tracked as backlog item CI Pipeline to verify skill counts match source repos #3.model_signing verifyis the job of the "Verify Signed Skills" workflow in PR ci: add Verify Signed Skills workflow #78 — the two are complementary (this catches authoring mistakes at sync time; ci: add Verify Signed Skills workflow #78 catches cryptographic mismatches periodically).Review history
4 rounds of Codex review on a private draft branch (
mosheabr/skills-drafts#1) prior to raising upstream. All findings addressed before this PR opened.Stats
.github/workflows/sync-skills.yml