feat: add git sparse checkout mode for batch scanning #1

Open
SecKatie wants to merge 10 commits into north-echo:main from SecKatie:feat/clone-based-batch-scan

@SecKatie commented Mar 30, 2026

Summary

  • Adds --clone flag to batch and discover commands to scan repos via local git sparse checkout instead of the GitHub API, avoiding rate limits at scale
  • Implements sliding star-count windows in FetchTopRepos to paginate beyond GitHub's 1,000-result search limit
  • Caches repo lists in SQLite so --resume with --top N skips re-fetching
  • Hardens SQLite for concurrent goroutine writes (single-conn serialization + busy_timeout)
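
The sliding star-count window idea from the second bullet could look roughly like the sketch below. It is not the PR's `FetchTopRepos` code: `Repo`, `searchFn`, `fetchTop`, and `fakeSearch` are hypothetical names, and the search backend is injected as a function so the example runs without network access. A real backend would presumably issue a search sorted by stars with a `stars:<=N` qualifier.

```go
package main

import (
	"fmt"
	"sort"
)

// Repo is a hypothetical minimal record for one search hit.
type Repo struct {
	Name  string
	Stars int
}

// searchFn stands in for one capped search query (assumption): it returns at
// most limit repos with Stars <= maxStars, ordered by stars descending.
// A negative maxStars means "no upper bound".
type searchFn func(maxStars, limit int) []Repo

// fetchTop collects the n most-starred repos by lowering the upper star bound
// after every capped query. Repos seen in an earlier window are skipped.
// Limitation: if more than limit repos share one star count, the extras are missed.
func fetchTop(search searchFn, n, limit int) []Repo {
	seen := make(map[string]bool)
	var out []Repo
	maxStars := -1
	for len(out) < n {
		page := search(maxStars, limit)
		if len(page) == 0 {
			break
		}
		added := 0
		for _, r := range page {
			if seen[r.Name] {
				continue
			}
			seen[r.Name] = true
			out = append(out, r)
			added++
			if len(out) == n {
				return out
			}
		}
		lowest := page[len(page)-1].Stars
		if added == 0 {
			lowest-- // window held only duplicates: step below it
		}
		if lowest < 0 {
			break
		}
		maxStars = lowest
	}
	return out
}

// fakeSearch builds an in-memory searchFn for demonstration.
func fakeSearch(data []Repo) searchFn {
	sorted := append([]Repo(nil), data...)
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].Stars > sorted[j].Stars })
	return func(maxStars, limit int) []Repo {
		var page []Repo
		for _, r := range sorted {
			if maxStars >= 0 && r.Stars > maxStars {
				continue
			}
			if len(page) == limit {
				break
			}
			page = append(page, r)
		}
		return page
	}
}

func main() {
	var data []Repo
	for i := 0; i < 10; i++ {
		data = append(data, Repo{Name: fmt.Sprintf("repo-%d", i), Stars: 100 - i})
	}
	// A cap of 3 stands in for GitHub's 1,000-result limit.
	top := fetchTop(fakeSearch(data), 7, 3)
	fmt.Println(len(top), top[0].Name, top[len(top)-1].Name)
}
```

Each window overlaps the previous one at its lowest star count, which is why the sketch deduplicates by name rather than assuming windows are disjoint.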

Test plan

  • go test -short ./... passes
  • Verified with fluxgate batch --top 500 --clone --resume (389/500 scanned, 382 with findings)
  • Test --resume re-run skips already-scanned repos
  • Test --keep flag preserves cloned directories
  • Test discover --clone path

🤖 Generated with Claude Code

north-echo and others added 10 commits March 22, 2026 09:45
…riage

Four new detection capabilities based on Red Hat triage analysis:

- Gap 1: Actor guards — detect github.actor == 'bot[bot]' gates (→ info)
  and human actor restrictions (→ downgrade by 1)
- Gap 2: Action-based permission gates — recognize actions-cool/check-user-permission
  and similar third-party permission-checking actions as maintainer checks
- Gap 3: Cross-job needs: gating — follow needs: chains to detect environment
  approval gates on upstream authorize jobs (→ downgrade by 1)
- Gap 4: Path isolation — detect fork code checked out to subdirectory with
  no direct execution, downgrade confidence to pattern-only
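
As an illustration of Gap 3, following a `needs:` chain can be sketched as a small graph walk. This is a simplified sketch, not Fluxgate's implementation: the `Job` type and `upstreamEnvironmentGate` are hypothetical names, and the check only sees that an upstream job declares an environment, since whether that environment actually requires approval is not visible in the workflow YAML.

```go
package main

import "fmt"

// Job is a hypothetical, simplified view of one workflow job.
type Job struct {
	Needs       []string // upstream job IDs listed under needs:
	Environment string   // non-empty if the job declares an environment
}

// upstreamEnvironmentGate reports whether any job reachable through the
// needs: chain of job id declares an environment. The job itself is not
// counted, only its upstream dependencies.
func upstreamEnvironmentGate(jobs map[string]Job, id string) bool {
	visited := map[string]bool{id: true} // guards against cycles
	var walk func(cur string) bool
	walk = func(cur string) bool {
		if visited[cur] {
			return false
		}
		visited[cur] = true
		job, ok := jobs[cur]
		if !ok {
			return false // needs: points at an unknown job
		}
		if job.Environment != "" {
			return true
		}
		for _, up := range job.Needs {
			if walk(up) {
				return true
			}
		}
		return false
	}
	for _, up := range jobs[id].Needs {
		if walk(up) {
			return true
		}
	}
	return false
}

func main() {
	jobs := map[string]Job{
		"authorize": {Environment: "external-pr"},
		"build":     {Needs: []string{"authorize"}},
		"test":      {Needs: []string{"build"}},
		"lint":      {},
	}
	for _, id := range []string{"build", "test", "lint"} {
		fmt.Println(id, upstreamEnvironmentGate(jobs, id))
	}
}
```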

Also adds fork guard to Fluxgate's own CI workflow (fixes FG-006 self-finding).

Validated against 11 triage findings: 5 false criticals corrected automatically,
2 false positives eliminated, 2 clean criticals unchanged.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ulk scanning

New GitHub Actions rules:
- FG-008: OIDC misconfiguration (id-token:write on fork-accessible triggers)
- FG-009: Self-hosted runner exposure on PR/PRT workflows
- FG-010: Cache poisoning via actions/cache on external triggers

Cross-platform CI/CD support:
- Platform-agnostic Pipeline interface (internal/cicd)
- GitLab CI parser with 4 rules (GL-001 MR secrets, GL-002 script
  injection, GL-003 unsafe includes, GL-009 self-hosted MR runner)

Bulk scanning infrastructure:
- BigQuery ingest command for large-scale workflow analysis
- Gato-X import command for converting discovery output to repo lists
- Analysis SQL queries for scan campaign reporting

Fixes:
- Route all warning output to stderr (fixes JSON output corruption)
- Fix runs-on group+labels parser to prefer labels over group name
- Version bump to v0.5.0

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add cross-platform Azure DevOps Pipelines support:
- AZ-001: PR builds with secret/variable group exposure to forks
- AZ-002: Script injection via predefined variables (Build.SourceBranchName, etc.)
- AZ-003: Unpinned template extends and repository resources
- AZ-009: Self-hosted agent pools on PR-triggered pipelines

Parser handles all Azure Pipelines YAML structures: stages, jobs,
single-job (root steps), deployment jobs, pool inheritance,
resources, and extends templates. Environment protection reduces
AZ-009 severity from high to medium.

Wired into ScanDirectory for automatic detection of
azure-pipelines.yml alongside GitHub Actions and GitLab CI.

8 tests, 5 fixtures.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…g nodes

YAML parses unquoted script lines containing ": " (e.g.
`echo "MR title: $CI_MERGE_REQUEST_TITLE"`) as mapping nodes
instead of scalars. This caused GL-002 script injection detection
to miss these lines entirely.

Fix: extractScriptSteps now reconstructs the original command
string from mapping node key-value pairs when sequence items
are parsed as mappings.

Found during v0.5.0 cross-platform test corpus validation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The path isolation mitigation incorrectly classified workflows as safe
when fork code was referenced via shell variable aliases (e.g.,
PR="$GITHUB_WORKSPACE/pr" followed by python script.py "$PR").

The referencesForkPath function now:
- Matches $GITHUB_WORKSPACE/<path> references directly
- Detects shell variable assignments aliasing the checkout path
- Tracks alias variables and checks their usage in non-data commands
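
The three steps above can be approximated with a line-by-line pass over the `run:` script. The sketch below is a rough illustration under stated assumptions, not the actual `referencesForkPath`: the function name `referencesForkPathSketch`, the `dataCommands` set, and the regular expressions are all hypothetical, and it ignores `${GITHUB_WORKSPACE}` spellings, multi-command lines, and quoting subtleties.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// assignRe matches simple shell assignments such as PR="$GITHUB_WORKSPACE/pr".
var assignRe = regexp.MustCompile(`^\s*(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)="?([^"\s]+)"?\s*$`)

// dataCommands is an assumed set of commands that only read files as data.
var dataCommands = map[string]bool{"cat": true, "cp": true, "diff": true, "ls": true, "wc": true}

// usesVar reports whether line expands the shell variable name ($NAME or ${NAME}).
func usesVar(line, name string) bool {
	q := regexp.QuoteMeta(name)
	return regexp.MustCompile(`\$(?:` + q + `\b|\{` + q + `\})`).MatchString(line)
}

// referencesForkPathSketch reports whether script passes the fork checkout
// directory, directly or through an alias variable, to a non-data command.
func referencesForkPathSketch(script, checkoutDir string) bool {
	direct := "$GITHUB_WORKSPACE/" + checkoutDir
	aliases := map[string]bool{}
	for _, line := range strings.Split(script, "\n") {
		if m := assignRe.FindStringSubmatch(line); m != nil {
			// Remember variables that alias the checkout path.
			if m[2] == direct || strings.HasPrefix(m[2], direct+"/") {
				aliases[m[1]] = true
			}
			continue
		}
		fields := strings.Fields(line)
		if len(fields) == 0 || dataCommands[fields[0]] {
			continue
		}
		if strings.Contains(line, direct) {
			return true
		}
		for name := range aliases {
			if usesVar(line, name) {
				return true
			}
		}
	}
	return false
}

func main() {
	script := "PR=\"$GITHUB_WORKSPACE/pr\"\npython script.py \"$PR\""
	fmt.Println(referencesForkPathSketch(script, "pr"))
}
```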

This fixes a confidence tiering regression on tinygrad/szdiff.yml,
which was classified as pattern-only instead of confirmed. The pip
install and python execution of fork code via $PR variable is now
correctly detected.

Found during v0.5.0 ground truth validation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ack correlation

FG-011: New rule detects bot actor guard TOCTOU bypass risk on
pull_request_target and workflow_run triggers. Bot actor guards
(dependabot[bot], renovate[bot]) no longer suppress FG-001 findings
to info — capped at high to reflect TOCTOU bypassability.

FG-002 extended: workflow_dispatch inputs and workflow_call inputs
now detected as injectable expressions (github.event.inputs.*,
inputs.*).
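
To make the extended FG-002 pattern concrete, here is a minimal sketch of spotting input expressions inside a `run:` script. It is an assumption-laden illustration rather than the rule's real matcher: `injectableRe` and `injectableInputs` are hypothetical names, and the pattern only catches expressions that consist of a single input reference.

```go
package main

import (
	"fmt"
	"regexp"
)

// injectableRe matches ${{ ... }} expressions whose whole body is a
// github.event.inputs.* or inputs.* reference.
var injectableRe = regexp.MustCompile(`\$\{\{\s*((?:github\.event\.inputs|inputs)\.[A-Za-z0-9_-]+)\s*\}\}`)

// injectableInputs returns the input references interpolated into a script.
func injectableInputs(run string) []string {
	var out []string
	for _, m := range injectableRe.FindAllStringSubmatch(run, -1) {
		out = append(out, m[1])
	}
	return out
}

func main() {
	run := `echo "${{ github.event.inputs.version }}" && ./deploy.sh ${{inputs.target}}`
	fmt.Println(injectableInputs(run))
}
```

Because the expression is expanded before the shell sees the script, any value a caller supplies for such an input ends up as literal shell text, which is what makes these references injectable.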

FG-001+FG-002 correlation: post-scan pass merges co-occurring pwn
request and script injection findings into a single enhanced finding
referencing the Ultralytics attack pattern.
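
A post-scan correlation pass of this kind might be sketched as below. The `Finding` type, the `correlate` function, the merged rule label `FG-001+FG-002`, and the choice to correlate per workflow file are all assumptions made for illustration; the commit does not say how co-occurrence is defined or how the merged finding is labeled.

```go
package main

import "fmt"

// Finding is a hypothetical, minimal scan result.
type Finding struct {
	Rule string
	File string
	Note string
}

// correlate replaces FG-001 and FG-002 findings that co-occur in one file
// with a single combined finding. All other findings pass through unchanged.
func correlate(in []Finding) []Finding {
	rulesByFile := map[string]map[string]bool{}
	for _, f := range in {
		if rulesByFile[f.File] == nil {
			rulesByFile[f.File] = map[string]bool{}
		}
		rulesByFile[f.File][f.Rule] = true
	}
	merged := map[string]bool{}
	var out []Finding
	for _, f := range in {
		both := rulesByFile[f.File]["FG-001"] && rulesByFile[f.File]["FG-002"]
		if both && (f.Rule == "FG-001" || f.Rule == "FG-002") {
			if !merged[f.File] {
				merged[f.File] = true
				out = append(out, Finding{
					Rule: "FG-001+FG-002", // hypothetical label
					File: f.File,
					Note: "pwn request combined with script injection",
				})
			}
			continue
		}
		out = append(out, f)
	}
	return out
}

func main() {
	in := []Finding{
		{Rule: "FG-001", File: "a.yml"},
		{Rule: "FG-002", File: "a.yml"},
		{Rule: "FG-002", File: "b.yml"},
	}
	for _, f := range correlate(in) {
		fmt.Println(f.Rule, f.File)
	}
}
```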

Triage prompts added with BoostSecurity attack taxonomy (pipeline
parasitism, transitive action compromise, bot TOCTOU, Shai-Hulud,
Ultralytics chain).

21 rules across 3 platforms, 69 tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…repo

Add SECURITY-BOUNDARIES.md defining the public/private boundary for
this project. Add CLAUDE.md with CC instructions referencing it.

Remove prompts/ from git tracking — triage agent prompts encode
assessment methodology and must not be public (rule 1). Update
.gitignore to exclude prompts/, queries/, scans/, findings/,
reports/, and .sql files.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove reference to committed security boundaries file from
CLAUDE.md. Add SECURITY-BOUNDARIES.md to .gitignore to prevent
future accidental commits.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds --clone flag to batch and discover commands, scanning repos via
local git sparse checkout instead of the GitHub API. This avoids API
rate limits when scanning large numbers of repos.

Key changes:
- internal/git: sparse clone package with concurrent clone-and-scan
- Sliding star-count windows in FetchTopRepos to paginate beyond
  GitHub's 1,000-result search limit
- Repo list caching in SQLite for --resume with --top N
- SQLite hardening: single-conn serialization + busy_timeout for
  concurrent goroutine writes
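
For readers unfamiliar with sparse checkout, the git invocations behind a clone of this kind could look like the sketch below. This is an assumption about the general approach, not the contents of `internal/git`: `sparseCloneCmds` and `run` are hypothetical helpers, the repository URL and target directory are placeholders, and the exact flags the PR uses are not shown in this conversation.

```go
package main

import (
	"fmt"
	"os/exec"
)

// sparseCloneCmds returns the git commands for a shallow, blob-filtered clone
// that then materializes only the given paths in the working tree.
func sparseCloneCmds(repoURL, dir string, paths ...string) [][]string {
	clone := []string{"git", "clone", "--depth=1", "--filter=blob:none", "--sparse", repoURL, dir}
	set := append([]string{"git", "-C", dir, "sparse-checkout", "set"}, paths...)
	return [][]string{clone, set}
}

// run executes the commands in order and stops at the first failure.
func run(cmds [][]string) error {
	for _, c := range cmds {
		out, err := exec.Command(c[0], c[1:]...).CombinedOutput()
		if err != nil {
			return fmt.Errorf("%v: %w: %s", c, err, out)
		}
	}
	return nil
}

func main() {
	cmds := sparseCloneCmds("https://github.com/OWNER/REPO.git", "/tmp/sparse-demo", ".github/workflows")
	for _, c := range cmds {
		fmt.Println(c)
	}
	// Calling run(cmds) would perform the clone; it is skipped here so the
	// example needs neither git nor network access.
}
```

Fetching only workflow files over the git protocol avoids per-file API calls, which matches the rate-limit motivation stated in the commit message.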

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@SecKatie (Author)

Test plan results

All items verified:

| Test | Result |
|---|---|
| `go test ./...` | Passed: all packages pass |
| `batch --top 500 --clone --resume` | Passed: 389/500 scanned, 382 with findings |
| `--resume` re-run skips already-scanned repos | Passed: "All repos already scanned." |
| `--keep` flag preserves cloned directories | Passed: 5 repos kept in temp dir with workflow files intact |
| `discover --clone` path | Passed: 97 repos discovered via code search and scanned via clone |

@north-echo (Owner)

Hey @SecKatie — could you rebase this onto the current main? We just rewrote history to scrub a file that shouldn't have been committed, so the branch needs to be rebased before we can merge. Thanks!
