feat: add --exclude flag to skip paths during scanning #2458
another-rex merged 32 commits into google:main
Add --exclude/-e CLI flag that accepts glob patterns to exclude files and directories from vulnerability scanning. This addresses user requests to easily skip test files and documentation during scans.
Changes:
- Add ExcludePatterns field to ScannerActions struct
- Add --exclude flag to scan source command
- Wire ExcludePatterns to scalibr's SkipDirGlob option
- Document the new flag in scan-source.md
Fixes google#2324
This is definitely something we want, though it might need a bit more design before we include it. It is really hard for us to justify removing a flag once it's added.
I want a way for people to specify regex patterns without having to add another separate flag. Maybe something like treating the value as a regex if it is wrapped in `/<regex>/`, like in JavaScript?
…at/exclude-paths-flag
- Replace regexp import with cachedregexp to satisfy depguard rules
- Add Regexp type alias and Compile function to cachedregexp package
- Remove duplicate package doc comment (godoclint)
- Add blank line before return (nlreturn)
- Add t.Parallel() calls to test functions (paralleltest)
Appreciate your work on this!
Could you please:
- move the flag to be `--experimental-` (this is mainly because right now scalibr only supports skipping directories, not files, which will hopefully change, but in the meantime this means we can make breaking changes if we go another direction instead)
- update the docs to reflect that this only applies to directories, not files
- add a couple of cmd tests
- update the syntax to use `:`, matching our `--lockfile` flag, which'll also let us use `DirsToSkip` too
  - if the string is prefixed with `:` or nothing, it should be applied as a directory (`DirsToSkip`)
  - if the string is prefixed with `g:`, it should be applied as a glob (`SkipDirGlob`)
  - if the string is prefixed with `r:`, it should be applied as a regex (`SkipDirRegex`)
to clarify, also if it is not prefixed, use `DirsToSkip`, the prefixing with …
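The prefix convention requested above can be sketched roughly as follows. This is a minimal illustration, not the code that was merged: `excludeSets` and `groupExcludePatterns` are hypothetical names, the struct fields only mirror the scalibr options named in the review, globs are kept as raw strings, and the standard `regexp` package is used here even though the project itself routes regex compilation through its `cachedregexp` package.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// excludeSets is a hypothetical container. Its fields only mirror the three
// scalibr options named in the review (DirsToSkip, SkipDirGlob, SkipDirRegex).
type excludeSets struct {
	Dirs    []string         // exact directory paths
	Globs   []string         // glob patterns, kept as raw strings in this sketch
	Regexes []*regexp.Regexp // compiled regular expressions
}

// groupExcludePatterns sorts raw flag values into the three buckets.
// No prefix or a leading ":" means an exact directory, "g:" means a glob,
// and "r:" means a regular expression.
func groupExcludePatterns(args []string) (excludeSets, error) {
	var sets excludeSets
	for _, arg := range args {
		switch {
		case strings.HasPrefix(arg, "g:"):
			sets.Globs = append(sets.Globs, strings.TrimPrefix(arg, "g:"))
		case strings.HasPrefix(arg, "r:"):
			re, err := regexp.Compile(strings.TrimPrefix(arg, "r:"))
			if err != nil {
				return excludeSets{}, fmt.Errorf("invalid exclude regex %q: %w", arg, err)
			}
			sets.Regexes = append(sets.Regexes, re)
		default:
			// A leading ":" explicitly marks the rest as a directory path.
			sets.Dirs = append(sets.Dirs, strings.TrimPrefix(arg, ":"))
		}
	}
	return sets, nil
}
```

Under these assumptions, the values `vendor`, `:docs`, `g:**/testdata` and `r:^build-\d+$` would land in the directory, directory, glob and regex buckets respectively. Windows drive letters are deliberately ignored in this sketch.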
…it glob, regex, and exact match patterns.
…or exact, glob, and regex patterns, and update `go.mod` to directly import `gobwas/glob`.
…-dir arguments and patterns.
@another-rex sir, I have made the changes you suggested.
…o exclude directories during scanning.
… to experimental actions.
… integration test.
I initially removed the type alias as you suggested, but CI failed with a depguard error: … Should I look into adding an exception to the depguard config for this file, or is the type alias acceptable given this constraint?
right yeah of course - I think for now let's go with adding the exception, as that mirrors what we're doing elsewhere for this situation, though I think I'll revisit that after this as it'll probably be better to just re-export stuff from … (I had forgotten we pretty much don't use regexp outside of …
…r's `regexp` package usage check.
G-Rath left a comment
looking good, we just also do want to change the cmd flag to use exclude too
Co-authored-by: Gareth Jones <3151613+G-Rath@users.noreply.github.com>
…e` and update its usage and documentation.
… multiple flags.
…parseExcludePatterns` test.
/gemini review
Code Review
This pull request introduces a new experimental flag --experimental-exclude to allow excluding paths from scanning. The implementation is well-structured, with new logic for parsing exclusion patterns, comprehensive tests, and clear documentation. I've found one issue with how absolute paths are handled, which could lead to silent failures. My detailed feedback is in the review comments. Also, note that the PR description and title mention --exclude, but the implementation uses --experimental-exclude, which is correctly reflected in the documentation.
Hmm, the "requested interactions not found" errors in the CI are a bit odd, because it looks like you have added the new interactions to the cassettes. @G-Rath Can you take a look at why that could be happening?
I think they might just need to be regenerated, as they're still using the old … The first thing I'd do is just revert all changes to …
Codecov Report
❌ Patch coverage is …
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2458      +/-   ##
==========================================
+ Coverage   67.72%   67.76%   +0.03%
==========================================
  Files         172      173       +1
  Lines       13343    13414      +71
==========================================
+ Hits         9037     9090      +53
- Misses       3592     3604      +12
- Partials      714      720       +6
Apologies for the delay, tests are mostly passing now, though there does seem to be a legitimate issue on Windows. Is it easy for you to test on Windows to debug this? Otherwise we can look into it when we have time available.
@another-rex Sorry, but I can only test on macOS and Linux (Ubuntu).
@G-Rath Can you take a look at the Windows error and see if it can be resolved easily?
// Unknown prefixes are returned as-is so the caller can provide appropriate error messages.
func parseExcludeArg(arg string) (string, string) {
	// Handle Windows absolute paths (e.g., C:\path)
	if runtime.GOOS == "windows" && filepath.IsAbs(arg) {
@another-rex this is why Windows is unhappy, because `r:/` and `g:/` are absolute paths on Windows
Solutions that come to mind are:
1. we assume a lowercase `r` and `g` are our prefixes
   - Windows drive letters are case-insensitive and usually represented in uppercase
2. we have a double colon rule (i.e. have both `r::/` and `r:/` be valid)
   - yuck, and yet more of a bandaid
3. we make it a requirement to prefix with `:` if you're doing absolute paths, at least on Windows
   - this'd be inconsistent with our logic elsewhere, though we could change that too..
4. we change the symbol
   - `:` feels like the most intuitive symbol to me though for us to be using 😞
5. we explicitly check for `r:` and `g:`, and say Windows users have to prefix with `:` if they want to pass in those drive letters in lowercase
6. we do nothing (at least for now), and tell people their regexps need to start with something e.g. `.*`
Those last two seem like the most straightforward for the time being (given this is experimental and all)
I think option 5 is probably the best way to go here, and on Windows we just force uppercase to be used for drive names if people want drive `r:` or drive `g:` to be interpreted as drives.
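One way to read that resolution is that the explicit `g:` and `r:` checks run before any absolute-path check. The sketch below assumes that ordering: `classifyExclude` and `isDrivePath` are hypothetical helpers rather than the functions in this PR, the platform is passed in as a parameter instead of being read from `runtime.GOOS` so the behaviour is the same everywhere, and unknown prefixes are simply treated as directories rather than being reported to the caller.

```go
package main

import "strings"

// isDrivePath reports whether s starts with a Windows-style drive such as "C:".
func isDrivePath(s string) bool {
	if len(s) < 2 || s[1] != ':' {
		return false
	}
	c := s[0]
	return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
}

// classifyExclude returns a kind ("dir", "glob" or "regex") and the pattern.
// onWindows is a parameter so the sketch does not depend on the host platform.
func classifyExclude(arg string, onWindows bool) (kind, pattern string) {
	// Lowercase "g:" and "r:" always win, even when they look like drive letters.
	if strings.HasPrefix(arg, "g:") {
		return "glob", strings.TrimPrefix(arg, "g:")
	}
	if strings.HasPrefix(arg, "r:") {
		return "regex", strings.TrimPrefix(arg, "r:")
	}
	// Any other drive-like prefix, including uppercase "G:" and "R:", is a path.
	if onWindows && isDrivePath(arg) {
		return "dir", arg
	}
	// A leading ":" is the explicit way to pass a lowercase r: or g: drive path.
	return "dir", strings.TrimPrefix(arg, ":")
}
```

With this ordering, `r:/tmp/.*` is a regex on every platform, `R:\cache` is a directory on Windows, and `:r:\cache` is how a lowercase `r:` drive path would be passed.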
Overview
Issue: #2324
This PR adds a `--exclude` CLI flag that allows users to specify glob patterns to exclude files and directories from vulnerability scanning.
Fixes #2324
Details
Problem
Users scanning large repositories often want to exclude test files, documentation, and other non-production code from vulnerability scans. While `.gitignore` is respected by default and osv-scanner.toml can be used, there was no quick CLI option to exclude paths.
Solution
Added `--exclude`/`-e` flag that accepts glob patterns: