Add github-repository-inventory skill 🤖🤖🤖 #1995
Open
Kinosaur wants to merge 2 commits into
Read-only, privacy-first Agent Skill that builds an evidence-based inventory of a GitHub account's footprint — owned, collaborator/organization (access), and contribution-evidenced (authored/reviewed PRs, authored issues, commits) repos — modelling each relationship with evidence and a confidence level, never conflating access with contribution. Fills an account-level gap not covered by existing single-repo skills. Standard-library Python under scripts/; gh owns auth; read-only. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
🔍 Skill Validator Results
✅ All checks passed
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Introduces a new github-repository-inventory skill that scans an authenticated GitHub account (via gh) to produce a deterministic, privacy-first inventory of owned, contributed-to, and optionally accessible/private repositories, plus derived reports and schema validation.
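The deterministic output mentioned above could be achieved by sorting records and keys before serializing. The following is a minimal sketch under assumptions: `render_catalog`, the `repositories` key, and the `full_name` field are hypothetical names chosen for illustration and are not taken from the PR's code.

```python
import json


def render_catalog(repos: list[dict], schema_version: str = "0.1.0") -> str:
    """Serialize a catalog so that the same input always yields the same text."""
    # Sort records by a stable key (hypothetical field name) so that
    # discovery order does not affect the output.
    ordered = sorted(repos, key=lambda r: r["full_name"].lower())
    catalog = {"schema_version": schema_version, "repositories": ordered}
    # sort_keys fixes key order; ensure_ascii=False keeps non-ASCII text readable.
    return json.dumps(catalog, indent=2, sort_keys=True, ensure_ascii=False) + "\n"
```

Because the ordering is fixed in one place, two scans that discover the same repositories in a different order produce identical files, which keeps diffs of `catalog.json` meaningful.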
Changes:
- Add an end-to-end inventory CLI (`scan`/`discover`/`inspect`/`render`/`report`/`validate`) that discovers repos via REST/GraphQL/Search, inspects the README and top-level contents, and writes `catalog.json`, `PROJECTS.md`, and `warnings.json`.
- Add pure technology detection and project-type classification based on root manifests and dependency parsing.
- Add documentation/reference materials and register the skill in `docs/README.skills.md`.
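To illustrate what "pure" detection from root manifests might look like, here is a hedged sketch. The mapping table, the function name `detect_technologies`, and the record fields are assumptions made for this example; the PR's actual `detect_technologies.py` is not shown in this conversation and may differ.

```python
import json

# Hypothetical manifest-to-technology table, for illustration only.
MANIFEST_TECH = {
    "package.json": "Node.js",
    "pyproject.toml": "Python",
    "requirements.txt": "Python",
    "Cargo.toml": "Rust",
    "go.mod": "Go",
}


def detect_technologies(root_files: dict[str, str]) -> list[dict]:
    """Map top-level file names and contents to evidence-backed technologies.

    Pure function: it takes file contents as input and touches neither the
    network nor the filesystem, so it can be tested against fixtures.
    """
    found: dict[str, dict] = {}
    for name in sorted(root_files):
        tech = MANIFEST_TECH.get(name)
        if tech:
            record = found.setdefault(tech, {"technology": tech, "evidence": []})
            record["evidence"].append(name)

    # Dependency parsing, shown for package.json only.
    pkg = root_files.get("package.json")
    if pkg:
        try:
            data = json.loads(pkg)
        except json.JSONDecodeError:
            data = None
        deps = data.get("dependencies") if isinstance(data, dict) else None
        if isinstance(deps, dict) and "react" in deps:
            found["React"] = {
                "technology": "React",
                "evidence": ["package.json: dependencies.react"],
            }
    return [found[key] for key in sorted(found)]
```

Each result carries the file that justified it, so a reader can check a claim such as "React" against the manifest rather than trusting the label.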
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| skills/github-repository-inventory/scripts/inventory.py | Implements the CLI workflow, discovery/inspection pipeline, deterministic rendering, caching, and schema validation. |
| skills/github-repository-inventory/scripts/detect_technologies.py | Adds pure manifest detection + dependency parsing to emit evidence-backed technologies and classify project types. |
| skills/github-repository-inventory/references/repository-schema.md | Documents the repository record/catalog schema for humans. |
| skills/github-repository-inventory/references/relationship-model.md | Documents relationship types/evidence/confidence model used in outputs. |
| skills/github-repository-inventory/references/privacy-rules.md | Documents privacy guarantees and safe handling rules for private data. |
| skills/github-repository-inventory/references/limitations.md | Documents known coverage/inspection limitations surfaced in warnings. |
| skills/github-repository-inventory/assets/projects-template.md | Provides a PROJECTS.md template reference for the skill. |
| skills/github-repository-inventory/SKILL.md | Defines the skill metadata, safety defaults, and usage workflow/commands. |
| docs/README.skills.md | Registers the new skill in the skills index. |
Comment on lines +555 to +566

    warnings.append(
        {
            "code": "coverage-scope",
            "repo": "",
            "message": (
                "Covers PUBLIC repositories the account owns plus those with PUBLIC "
                "contribution evidence (authored/reviewed pull requests, authored issues). "
                "Collaborator, organization, and private repositories are not included by "
                "default. This is not 'everything you ever contributed to'."
            ),
        }
    )
Comment on lines +7 to +8

    - **Public repositories only.** Private repositories are excluded unless the user passes
      `--include-private`. (Private support is not in Phase 1.)
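The opt-in behavior described in this rule could be wired up roughly as follows. This is a sketch, not the PR's code: the `prog` name and the helper `visibility_filter` are hypothetical, while `--include-private` comes from the quoted text and `visibility` is a query parameter of the GitHub REST endpoint `/user/repos`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Create a parser where private repositories are excluded unless requested."""
    parser = argparse.ArgumentParser(prog="inventory")  # hypothetical program name
    parser.add_argument(
        "--include-private",
        action="store_true",  # defaults to False, so public-only is the default
        help="opt in to private repositories (excluded by default)",
    )
    return parser


def visibility_filter(args: argparse.Namespace) -> str:
    """Translate the flag into a `visibility` query value for /user/repos."""
    return "all" if args.include_private else "public"
```

Using `store_true` means a user has to type the flag explicitly; omitting it can never widen the scan.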
Comment on lines +14 to +17

    ## When private repos are included (later phase)

    - Write output only to a local cache directory, and ensure that directory is gitignored
      (`github-inventory/`, `.inventory-cache/`).

    | Field | Notes |
    |---|---|
    | `schema_version` | e.g. `"0.1.0"`. |
Comment on lines +95 to +97

    DISCOVERY_ENDPOINT = (
        "/user/repos?affiliation=owner&visibility=public&per_page=100"
    )
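For context, an endpoint constant like this would typically be passed to `gh api`. The sketch below shows one possible read-only, paginated call; the function names are hypothetical and the PR's own invocation may differ. The parser accepts either a single merged JSON array or several arrays printed back to back, since the exact shape of paginated output is treated here as an assumption.

```python
import json
import subprocess

DISCOVERY_ENDPOINT = "/user/repos?affiliation=owner&visibility=public&per_page=100"


def build_discovery_command(endpoint: str = DISCOVERY_ENDPOINT) -> list[str]:
    """Build the argv for a read-only, paginated `gh api` call."""
    # --paginate fetches all pages; --method GET makes the read-only intent explicit.
    return ["gh", "api", "--method", "GET", "--paginate", endpoint]


def parse_pages(stdout: str) -> list[dict]:
    """Parse one or more JSON arrays printed one after another."""
    decoder = json.JSONDecoder()
    repos: list[dict] = []
    pos = 0
    while pos < len(stdout):
        if stdout[pos].isspace():
            pos += 1  # raw_decode does not skip leading whitespace itself
            continue
        page, pos = decoder.raw_decode(stdout, pos)
        repos.extend(page)
    return repos


def discover_owned_public() -> list[dict]:
    """Run the call; requires an authenticated `gh` on PATH."""
    result = subprocess.run(
        build_discovery_command(), capture_output=True, text=True, check=True
    )
    return parse_pages(result.stdout)
```

Keeping command construction and parsing separate from the subprocess call lets both be tested against recorded output without touching the live API.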
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A read-only, privacy-first Agent Skill that answers one question the platform does not make easy and that models answer unreliably: what has this GitHub account actually contributed to, across repos it does not own, and what is the evidence? It models the user's relationship to each repo (owner / collaborator / org-member / PR author / PR reviewer / issue author / commit-contributor) with the evidence and a confidence level, and never conflates repository access with verified contribution.
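The distinction between access and contribution can be made concrete with a small data model. This is an illustrative sketch only: the relationship names follow the list above, but the class names, the string confidence levels, and `has_verified_contribution` are hypothetical and not taken from the skill's schema.

```python
from dataclasses import dataclass
from enum import Enum


class Relationship(Enum):
    OWNER = "owner"
    COLLABORATOR = "collaborator"
    ORG_MEMBER = "org-member"
    PR_AUTHOR = "pr-author"
    PR_REVIEWER = "pr-reviewer"
    ISSUE_AUTHOR = "issue-author"
    COMMIT_CONTRIBUTOR = "commit-contributor"


# Relationships that describe something the account did, as opposed to
# ownership or access, which only describe what it is allowed to do.
CONTRIBUTION = {
    Relationship.PR_AUTHOR,
    Relationship.PR_REVIEWER,
    Relationship.ISSUE_AUTHOR,
    Relationship.COMMIT_CONTRIBUTOR,
}


@dataclass(frozen=True)
class RelationshipRecord:
    relationship: Relationship
    confidence: str  # hypothetical levels: "high", "medium", "low"
    evidence: tuple[str, ...] = ()


def has_verified_contribution(records: list[RelationshipRecord]) -> bool:
    """True only if a contribution-type relationship carries evidence."""
    return any(r.relationship in CONTRIBUTION and r.evidence for r in records)
```

With this shape, a collaborator record never counts as a contribution on its own, and a contribution claim without evidence is not treated as verified.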
Fills a gap in the collection
Existing skills operate on a single checked-out repo (e.g. `acquire-codebase-knowledge`, `repo-story-time`); none work at the account level across owned and external repositories with a relationship-evidence model. This is a distinct capability, not a duplicate.
Why this is meaningful uplift (not something frontier models already do well)
- […] `gh`; the transform/render core is pure and unit-tested against recorded fixtures (no live API in tests).
- […] `warnings.json` rather than hidden.
What it produces
`catalog.json` (canonical), `PROJECTS.md` (human-readable, localizable: English/Burmese), `warnings.json`, and optional derived reports (technology summary, portfolio candidates, missing READMEs, contribution summary).
Best for
Reconstructing a verifiable contribution history (job applications, reviews, proving cross-org/contract work) and giving an agent a truthful tool instead of letting it guess. A focused, occasional-use utility — and honest about that.
Requirements
Python 3.11+ and an authenticated GitHub CLI (`gh`) 2.90.0+. Bundles a small standard-library Python program under `scripts/` (no third-party packages; `gh` owns auth). Read-only; never modifies repositories.
Upstream source & full test suite (80+ tests): https://github.com/Kinosaur/github-repository-inventory
🤖 Authored with Claude Code.