feat: add git sparse checkout mode for batch scanning by SecKatie · Pull Request #1 · north-echo/fluxgate

SecKatie · 2026-03-30T12:20:16Z

Summary

Adds --clone flag to batch and discover commands to scan repos via local git sparse checkout instead of the GitHub API, avoiding rate limits at scale
Implements sliding star-count windows in FetchTopRepos to paginate beyond GitHub's 1,000-result search limit
Caches repo lists in SQLite so --resume with --top N skips re-fetching
Hardens SQLite for concurrent goroutine writes (single-conn serialization + busy_timeout)
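
The sliding-window idea can be sketched as follows. This is a minimal illustration, not the PR's `FetchTopRepos`: `Repo`, `starQuery`, `fetchTop`, and the injected `search` callback are hypothetical names, and `search` is assumed to return results sorted by stars in descending order, capped at whatever the backend allows per query (1,000 for GitHub search).

```go
package main

import "strconv"

// Repo is a hypothetical, minimal search result.
type Repo struct {
	FullName string
	Stars    int
}

// starQuery builds the GitHub search qualifier for one window.
// maxStars < 0 means the window has no upper bound.
func starQuery(minStars, maxStars int) string {
	if maxStars < 0 {
		return "stars:>=" + strconv.Itoa(minStars)
	}
	return "stars:" + strconv.Itoa(minStars) + ".." + strconv.Itoa(maxStars)
}

// fetchTop collects up to n repos by repeatedly lowering the upper star
// bound to the lowest count seen in the previous window. The search
// callback is an assumption of this sketch; a real one would send
// starQuery(minStars, maxStars) to the search API.
func fetchTop(n, minStars int, search func(minStars, maxStars int) []Repo) []Repo {
	seen := make(map[string]bool)
	var out []Repo
	maxStars := -1
	for len(out) < n {
		page := search(minStars, maxStars)
		if len(page) == 0 {
			break
		}
		lowest, added := page[0].Stars, 0
		for _, r := range page {
			if r.Stars < lowest {
				lowest = r.Stars
			}
			if seen[r.FullName] {
				continue // windows overlap at the boundary star count
			}
			seen[r.FullName] = true
			out = append(out, r)
			added++
			if len(out) == n {
				return out
			}
		}
		next := lowest
		if added == 0 {
			// The whole window was duplicates, so step past this star count.
			// Repos sharing it beyond the per-query cap are missed.
			next = lowest - 1
		}
		if next < minStars {
			break
		}
		maxStars = next
	}
	return out
}
```

Keeping the boundary star count inside the next window and deduplicating by name avoids skipping repos that tie with the last result of the previous window.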

Test plan

go test -short ./... passes
Verified with fluxgate batch --top 500 --clone --resume (389/500 scanned, 382 with findings)
Test --resume re-run skips already-scanned repos
Test --keep flag preserves cloned directories
Test discover --clone path

🤖 Generated with Claude Code

…riage

Four new detection capabilities based on Red Hat triage analysis:

- Gap 1 (actor guards): detect github.actor == 'bot[bot]' gates (→ info) and human actor restrictions (→ downgrade by 1)
- Gap 2 (action-based permission gates): recognize actions-cool/check-user-permission and similar third-party permission-checking actions as maintainer checks
- Gap 3 (cross-job needs: gating): follow needs: chains to detect environment approval gates on upstream authorize jobs (→ downgrade by 1)
- Gap 4 (path isolation): detect fork code checked out to a subdirectory with no direct execution, downgrade confidence to pattern-only

Also adds fork guard to Fluxgate's own CI workflow (fixes FG-006 self-finding).

Validated against 11 triage findings: 5 false criticals corrected automatically, 2 false positives eliminated, 2 clean criticals unchanged.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
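
The cross-job `needs:` check in Gap 3 amounts to a graph walk over upstream jobs. The sketch below shows one way to do it; the `Job` type and `hasUpstreamEnvironmentGate` are hypothetical and not taken from Fluxgate. It also assumes that declaring an environment implies an approval gate, which the workflow YAML alone cannot confirm.

```go
package main

// Job is a hypothetical, reduced view of a workflow job.
type Job struct {
	Needs       []string // job names listed under needs:
	Environment string   // non-empty when the job declares an environment
}

// hasUpstreamEnvironmentGate reports whether any job reachable from start
// through needs: declares an environment. The start job itself is not
// counted, only its upstream jobs.
func hasUpstreamEnvironmentGate(jobs map[string]Job, start string) bool {
	visited := map[string]bool{start: true}
	queue := append([]string(nil), jobs[start].Needs...)
	for len(queue) > 0 {
		name := queue[0]
		queue = queue[1:]
		if visited[name] {
			continue // guards against cycles and repeated work
		}
		visited[name] = true
		job, ok := jobs[name]
		if !ok {
			continue // needs: refers to a job that is not defined
		}
		if job.Environment != "" {
			return true
		}
		queue = append(queue, job.Needs...)
	}
	return false
}
```

A caller could lower a finding's severity by one level when this returns true for the job that checks out fork code.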

…ulk scanning

New GitHub Actions rules:

- FG-008: OIDC misconfiguration (id-token:write on fork-accessible triggers)
- FG-009: Self-hosted runner exposure on PR/PRT workflows
- FG-010: Cache poisoning via actions/cache on external triggers

Cross-platform CI/CD support:

- Platform-agnostic Pipeline interface (internal/cicd)
- GitLab CI parser with 4 rules (GL-001 MR secrets, GL-002 script injection, GL-003 unsafe includes, GL-009 self-hosted MR runner)

Bulk scanning infrastructure:

- BigQuery ingest command for large-scale workflow analysis
- Gato-X import command for converting discovery output to repo lists
- Analysis SQL queries for scan campaign reporting

Fixes:

- Route all warning output to stderr (fixes JSON output corruption)
- Fix runs-on group+labels parser to prefer labels over group name
- Version bump to v0.5.0

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add cross-platform Azure DevOps Pipelines support:

- AZ-001: PR builds with secret/variable group exposure to forks
- AZ-002: Script injection via predefined variables (Build.SourceBranchName, etc.)
- AZ-003: Unpinned template extends and repository resources
- AZ-009: Self-hosted agent pools on PR-triggered pipelines

Parser handles all Azure Pipelines YAML structures: stages, jobs, single-job (root steps), deployment jobs, pool inheritance, resources, and extends templates. Environment protection reduces AZ-009 severity from high to medium.

Wired into ScanDirectory for automatic detection of azure-pipelines.yml alongside GitHub Actions and GitLab CI. 8 tests, 5 fixtures.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
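
An AZ-002 style check could look roughly like the sketch below, which scans a script for Azure Pipelines macro references such as `$(Build.SourceBranchName)`. The function `injectableMacros` and the variable list are hypothetical. Only `Build.SourceBranchName` comes from the commit message; the other entries and the case-insensitive comparison are assumptions made for illustration.

```go
package main

import (
	"regexp"
	"strings"
)

// untrustedVars lists predefined variables a PR author may influence.
// Only Build.SourceBranchName is named in the commit message; the other
// entries are assumptions of this sketch.
var untrustedVars = map[string]bool{
	"build.sourcebranchname":          true,
	"build.sourcebranch":              true,
	"build.sourceversionmessage":      true,
	"system.pullrequest.sourcebranch": true,
}

// macroRe matches macro syntax of the form $(Name.With.Dots).
var macroRe = regexp.MustCompile(`\$\(([A-Za-z_][A-Za-z0-9_.]*)\)`)

// injectableMacros returns the untrusted variables referenced in script,
// in order of appearance.
func injectableMacros(script string) []string {
	var hits []string
	for _, m := range macroRe.FindAllStringSubmatch(script, -1) {
		if untrustedVars[strings.ToLower(m[1])] {
			hits = append(hits, m[1])
		}
	}
	return hits
}
```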

…g nodes

YAML parses unquoted script lines containing ": " (e.g. `echo "MR title: $CI_MERGE_REQUEST_TITLE"`) as mapping nodes instead of scalars. This caused GL-002 script injection detection to miss these lines entirely.

Fix: extractScriptSteps now reconstructs the original command string from mapping node key-value pairs when sequence items are parsed as mappings.

Found during v0.5.0 cross-platform test corpus validation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
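
The reconstruction step can be illustrated without a YAML library. In this sketch `scriptNode` is a hypothetical, simplified stand-in for a parsed sequence item, and `commandText` is not the real `extractScriptSteps`. It assumes the parser split the line at the first ": ", leaving the text before it as the key and the rest as the value.

```go
package main

// scriptNode is a hypothetical, simplified view of one script: item.
type scriptNode struct {
	IsMapping bool
	Scalar    string // the command, when the item parsed as a scalar
	Key       string // text before ": ", when the item parsed as a mapping
	Value     string // text after ": ", when the item parsed as a mapping
}

// commandText returns the shell command a script item represents,
// rejoining key and value when YAML split the line into a mapping.
func commandText(n scriptNode) string {
	if !n.IsMapping {
		return n.Scalar
	}
	return n.Key + ": " + n.Value
}
```

With the command restored to a single string, the same injection patterns can run over scalar and mapping items alike.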

The path isolation mitigation incorrectly classified workflows as safe when fork code was referenced via shell variable aliases (e.g., PR="$GITHUB_WORKSPACE/pr" followed by python script.py "$PR").

The referencesForkPath function now:

- Matches $GITHUB_WORKSPACE/<path> references directly
- Detects shell variable assignments aliasing the checkout path
- Tracks alias variables and checks their usage in non-data commands

This fixes a confidence tiering regression on tinygrad/szdiff.yml, which was classified as pattern-only instead of confirmed. The pip install and python execution of fork code via $PR variable is now correctly detected.

Found during v0.5.0 ground truth validation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
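
The three steps listed in this commit can be sketched as a line-by-line scan. This is not the actual `referencesForkPath`. The name `executesForkCode`, the regular expressions, and the `dataOnly` command list are hypothetical, and the sketch only handles simple one-line `NAME=value` assignments.

```go
package main

import (
	"regexp"
	"strings"
)

// assignRe matches a whole-line shell assignment such as PR="$GITHUB_WORKSPACE/pr".
var assignRe = regexp.MustCompile(`^(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)="?([^"\s]+)"?$`)

// dataOnly is an assumed list of commands that read files without running them.
var dataOnly = map[string]bool{"ls": true, "cat": true, "wc": true, "diff": true}

// usesVar reports whether line expands $name or ${name}.
func usesVar(line, name string) bool {
	q := regexp.QuoteMeta(name)
	return regexp.MustCompile(`\$(?:` + q + `\b|\{` + q + `\})`).MatchString(line)
}

// executesForkCode reports whether script passes the fork checkout
// directory, directly or through an alias variable, to a non-data command.
func executesForkCode(script, checkoutDir string) bool {
	direct := []string{"$GITHUB_WORKSPACE/" + checkoutDir, "${GITHUB_WORKSPACE}/" + checkoutDir}
	aliases := map[string]bool{}
	for _, line := range strings.Split(script, "\n") {
		line = strings.TrimSpace(line)
		if line == "" {
			continue
		}
		touches := false
		for _, d := range direct {
			if strings.Contains(line, d) {
				touches = true
			}
		}
		for name := range aliases {
			if usesVar(line, name) {
				touches = true
			}
		}
		if m := assignRe.FindStringSubmatch(line); m != nil {
			if touches {
				aliases[m[1]] = true // the variable now aliases the fork path
			}
			continue
		}
		if touches && !dataOnly[strings.Fields(line)[0]] {
			return true
		}
	}
	return false
}
```

For the example in the commit message, `PR` is recorded as an alias on the first line, and the later `python script.py "$PR"` line is then flagged.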

…ack correlation

FG-011: New rule detects bot actor guard TOCTOU bypass risk on pull_request_target and workflow_run triggers. Bot actor guards (dependabot[bot], renovate[bot]) no longer suppress FG-001 findings to info; they are capped at high to reflect TOCTOU bypassability.

FG-002 extended: workflow_dispatch inputs and workflow_call inputs now detected as injectable expressions (github.event.inputs.*, inputs.*).

FG-001+FG-002 correlation: post-scan pass merges co-occurring pwn request and script injection findings into a single enhanced finding referencing the Ultralytics attack pattern.

Triage prompts added with BoostSecurity attack taxonomy (pipeline parasitism, transitive action compromise, bot TOCTOU, Shai-Hulud, Ultralytics chain).

21 rules across 3 platforms, 69 tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
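
The FG-002 extension can be pictured as a pattern match over `run:` bodies. The sketch below is a simplified assumption rather than Fluxgate's rule: `injectableInputs` and its regular expression are hypothetical, and it only recognizes bare `${{ github.event.inputs.NAME }}` and `${{ inputs.NAME }}` references, not inputs nested inside larger expressions.

```go
package main

import "regexp"

// inputExprRe matches ${{ github.event.inputs.NAME }} and ${{ inputs.NAME }}.
var inputExprRe = regexp.MustCompile(`\$\{\{\s*((?:github\.event\.inputs|inputs)\.[A-Za-z0-9_-]+)\s*\}\}`)

// injectableInputs returns the input references that are interpolated
// directly into a run: script, in order of appearance.
func injectableInputs(run string) []string {
	var refs []string
	for _, m := range inputExprRe.FindAllStringSubmatch(run, -1) {
		refs = append(refs, m[1])
	}
	return refs
}
```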

…repo

Add SECURITY-BOUNDARIES.md defining the public/private boundary for this project. Add CLAUDE.md with CC instructions referencing it.

Remove prompts/ from git tracking: triage agent prompts encode assessment methodology and must not be public (rule 1). Update .gitignore to exclude prompts/, queries/, scans/, findings/, reports/, and .sql files.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Remove reference to committed security boundaries file from CLAUDE.md. Add SECURITY-BOUNDARIES.md to .gitignore to prevent future accidental commits. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Adds --clone flag to batch and discover commands, scanning repos via local git sparse checkout instead of the GitHub API. This avoids API rate limits when scanning large numbers of repos.

Key changes:

- internal/git: sparse clone package with concurrent clone-and-scan
- Sliding star-count windows in FetchTopRepos to paginate beyond GitHub's 1,000-result search limit
- Repo list caching in SQLite for --resume with --top N
- SQLite hardening: single-conn serialization + busy_timeout for concurrent goroutine writes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
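
For readers unfamiliar with sparse checkout, the sketch below shows one plausible git command sequence for fetching only the workflow directory of a repo. It is not the code in internal/git. `sparseCloneCommands` and `sparseClone` are hypothetical names, and the exact flags the PR uses are not stated, so the combination here (shallow, blobless, sparse clone followed by `git sparse-checkout set`, which needs Git 2.25 or later) is an assumption.

```go
package main

import (
	"fmt"
	"os/exec"
)

// sparseCloneCommands returns the git argument lists for a shallow,
// blobless clone of repoURL into dest that checks out only paths.
func sparseCloneCommands(repoURL, dest string, paths []string) [][]string {
	set := append([]string{"-C", dest, "sparse-checkout", "set"}, paths...)
	return [][]string{
		{"clone", "--depth", "1", "--filter=blob:none", "--sparse", repoURL, dest},
		set,
	}
}

// sparseClone runs the commands in order and stops at the first failure.
func sparseClone(repoURL, dest string, paths []string) error {
	for _, args := range sparseCloneCommands(repoURL, dest, paths) {
		out, err := exec.Command("git", args...).CombinedOutput()
		if err != nil {
			return fmt.Errorf("git %v: %w: %s", args, err, out)
		}
	}
	return nil
}
```

A batch scanner could call `sparseClone(url, tmpDir, []string{".github/workflows"})` from a bounded pool of goroutines and delete `tmpDir` after scanning unless the directories are meant to be kept.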

SecKatie · 2026-03-30T12:24:38Z

Test plan results

All items verified:

| Test | Result |
| --- | --- |
| `go test ./...` | Passed: all packages pass |
| `batch --top 500 --clone --resume` | Passed: 389/500 scanned, 382 with findings |
| `--resume` re-run skips already-scanned repos | Passed: "All repos already scanned." |
| `--keep` flag preserves cloned directories | Passed: 5 repos kept in temp dir with workflow files intact |
| `discover --clone` path | Passed: 97 repos discovered via code search and scanned via clone |

north-echo · 2026-04-07T02:38:38Z

Hey @SecKatie — could you rebase this onto the current main? We just rewrote history to scrub a file that shouldn't have been committed, so the branch needs to be rebased before we can merge. Thanks!

north-echo and others added 10 commits March 22, 2026 09:45

Update security contact email address

66ad2e4

chore: add SECURITY-BOUNDARIES.md to .gitignore, update CLAUDE.md

7836f67


north-echo force-pushed the main branch from ea1d382 to cb79337 on April 7, 2026 02:24

SecKatie wants to merge 10 commits into north-echo:main from SecKatie:feat/clone-based-batch-scan
