SARIF reporter #10759
Conversation
Pierre-Sassoulas
left a comment
Thank you, this is great.
I'm not a SARIF expert so I have some code placement comments, and I hope someone more knowledgeable in SARIF will have an opinion about the actual content.
Codecov Report ❌ Patch coverage is …

```
@@ Coverage Diff @@
##              main   #10759   +/- ##
=======================================
  Coverage    95.98%   95.99%
=======================================
  Files          176      177     +1
  Lines        19560    19657    +97
=======================================
+ Hits         18775    18870    +95
- Misses         785      787     +2
```
Maybe @nvuillam is still active since they're the original requestor.
Force-pushed from d2110f6 to 940ce59
As long as the sarif validator validates the SARIF output, it's good :)
Force-pushed from e64b059 to ebf822c
🤖 According to the primer, this change has no effect on the checked open source code. 🎉 This comment was generated for commit ebf822c
Not entirely sure how to fix the coverage issue, as it's mostly for the conversion of Windows paths to relative URIs.
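For context, the kind of conversion being discussed can be sketched with the stdlib alone (the function name and signature here are illustrative, not the PR's actual code):

```python
from pathlib import PureWindowsPath
from urllib.parse import quote


def to_relative_uri(path: str, root: str) -> str:
    """Turn a Windows path into a forward-slash, percent-encoded
    relative URI reference suitable for artifactLocation.uri.

    Note: this is a sketch; it assumes `path` is actually under `root`
    and raises ValueError otherwise.
    """
    rel = PureWindowsPath(path).relative_to(PureWindowsPath(root))
    # as_posix() replaces backslashes with slashes; quote() escapes
    # spaces and other reserved characters (slashes stay unescaped).
    return quote(rel.as_posix())


print(to_relative_uri(r"C:\project\sub dir\mod.py", r"C:\project"))
# sub%20dir/mod.py
```

Using `PureWindowsPath` rather than `Path` keeps the conversion testable on non-Windows CI as well.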
Pierre-Sassoulas
left a comment
Implementation LGTM. It seems we don't need to cover everything as long as we add SARIF validation in the test suite?
Not sure it's that useful: the type specs should be the useful (to pylint) subset of the JSON schema (modulo a rule or two which are painful to type), while the "official" validator, with its additional suggestions and service-specific requirements, doesn't seem to be available as anything but a webservice of sorts.
pylint maintainers will probably have to deal with bug fixes on this, so we need something to make sure this does not regress somehow. We do have Windows-specific CI jobs though, so we can cover the Windows-specific paths.
Force-pushed from 73f84df to b6aaa55
The config uses the `pre-commit/pre-commit-hooks` repository, which since v5.0 has multiple hooks using the `pre-commit` stage. That stage was only added in pre-commit [3.2.0][], causing a run failure when run under a pre-commit pinned to 2.2. From a very surface look, it seems like https://pre-commit.ci/ insists on keeping everything on latest, so pinning pre-commit in tox seems like it can only cause problems (by drifting behind the actual CI).

Of note: it looks like the `minimum_pre_commit_version` stricture on hooks does not do anything: all the `pre-commit-hooks` hooks are filtered on version except [`destroyed-symlinks`][], but I get an error on `check-added-large-files`:

```
==> At Hook(id='check-added-large-files')
==> At key: stages
==> At index 0
=====> Expected one of commit, commit-msg, manual, merge-commit, post-checkout, post-commit, post-merge, post-rewrite, prepare-commit-msg, push but got: 'pre-commit'
```

[3.2.0]: https://github.com/pre-commit/pre-commit/blob/main/CHANGELOG.md#320---2023-03-17
[`destroyed-symlinks`]: pre-commit/pre-commit-hooks@c7d1e85
Force-pushed from 406280f to 164d694
SARIF is a unified file format used to exchange information between static analysis tools (like pylint) and various types of formatters, meta-runners, broadcasters / alert systems, ...

This implementation is ad-hoc, and non-validating.

Spec v Github
-------------

Turns out Github both doesn't implement all of SARIF (which makes sense) and requires a bunch of properties which the spec considers optional. The [official SARIF validator][] (linked to by both oasis and github) was used to validate the output of the reporter, ensuring that all the github requirements it flags are fulfilled, and fixing *some* of the validator's pet issues.

As of now the following issues are left unaddressed:

- azure requires `run.automationDetails`; looking at the spec I don't think it makes sense for the reporter to inject that, it's more up to the CI
- the validator wants a `run.versionControlProvenance`, same as above
- the validator wants rule names in PascalCase, lol
- the validator wants templated result messages, but without pylint providing the args as part of the `Message` that's a bit of a chore
- the validator wants `region` to include a snippet (the flagged content)
- the validator wants `physicalLocation` to have a `contextRegion` (most likely with a snippet)

On URIs
-------

The reporter makes use of URIs for artifacts (~files). Per ["guidance on the use of artifactLocation objects"][3.4.7], `uri` *should* capture the deterministic part of the artifact location and `uriBaseId` *should* capture the non-deterministic part. However as far as I can tell pylint has no requirement (and no clean way to require) consistent resolution roots: `path` is just relative to the cwd, and there is no requirement to have project-level files to use pylint. This makes the use of relative URIs dodgy, but absolute URIs are pretty much always broken for the purpose of *interchange*, so they're not really any better.
As a side-note, Github [asserts][relative-uri-guidance]:

> While this [nb: `originalUriBaseIds`] is not required by GitHub for
> the code scanning results to be displayed correctly, it is required
> to produce a valid SARIF output when using relative URI references.

However per [3.4.4][] this is incorrect: the `uriBaseId` can be resolved through end-user configuration, `originalUriBaseIds`, external information (e.g. envvars), or heuristics.

It would be nice to document the "relative root" via `originalUriBaseIds` (which may be omitted for that purpose per [3.14.14][]), but per the above, claiming a consistent project root is dodgy. We *could* resolve known project files (e.g. pyproject.toml, tox.ini, etc...) in order to find a consistent root (project root, repo root, ...) and set / use that for relative URIs, but that's a lot of additional complexity which I'm not sure is warranted, at least for a first version.

Fixes pylint-dev#5493

[3.4.4]: https://docs.oasis-open.org/sarif/sarif/v2.1.0/csprd01/sarif-v2.1.0-csprd01.html#_Toc10540869
[3.4.7]: https://docs.oasis-open.org/sarif/sarif/v2.1.0/csprd01/sarif-v2.1.0-csprd01.html#_Toc10540872
[3.14.14]: https://docs.oasis-open.org/sarif/sarif/v2.1.0/csprd01/sarif-v2.1.0-csprd01.html#_Toc10540936
[relative-uri-guidance]: https://docs.github.com/en/code-security/code-scanning/integrating-with-code-scanning/sarif-support-for-code-scanning#relative-uri-guidance-for-sarif-producers
[official SARIF validator]: https://sarifweb.azurewebsites.net/
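To make the `uri` / `uriBaseId` split concrete, here is a hand-written sketch of a minimal SARIF run that documents its relative root via `originalUriBaseIds` (this is illustrative, not the reporter's actual output; `SRCROOT` is an arbitrary symbolic base-id name and the paths are made up):

```python
import json

# Minimal SARIF run splitting the deterministic part of a location
# (the relative `uri`) from the non-deterministic part (the root,
# named symbolically via `uriBaseId` and resolved, optionally, by
# `originalUriBaseIds`).
run = {
    "tool": {"driver": {"name": "pylint"}},
    "originalUriBaseIds": {
        # Hypothetical root; a consumer could also resolve SRCROOT
        # from configuration or environment instead.
        "SRCROOT": {"uri": "file:///home/user/project/"}
    },
    "results": [
        {
            "message": {"text": "Unused variable 'x'"},
            "locations": [
                {
                    "physicalLocation": {
                        "artifactLocation": {
                            "uri": "pkg/mod.py",
                            "uriBaseId": "SRCROOT",
                        },
                        "region": {"startLine": 3},
                    }
                }
            ],
        }
    ],
}

print(json.dumps(run, indent=2))
```

If `originalUriBaseIds` is omitted, the run stays valid and the consumer is left to resolve `SRCROOT` by other means, which is exactly the flexibility §3.4.4 describes.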
```
    "jsonschema~=4.25",
    "py~=1.11.0",
    "pytest>=8.4,<10",
    "pytest-benchmark~=5.1",
    "pytest-timeout~=2.4",
    "requests",
    "rpds-py<0.28; platform_python_implementation=='PyPy' and python_version<'3.11'",
```
Is there any way we could get by without having to install jsonschema and in particular rpds-py? Even with it just being a test dependency, it will make maintenance, especially testing new Python versions before release, more difficult.
> in particular rpds-py?

That's a hard dependency of jsonschema.
> Is there any way we could get by without having to install jsonschema

Not doing schema validation, which was the original proposal, but on discord Pierre was wary of drift / maintenance burden without an upstream source of truth.
In theory types could be generated and used for typechecking (or runtime validation), with the generated code checked / validated against the upstream schema once in a while, but from a quick browse of https://json-schema.org/tools there does not seem to be any maintained generator:
- OpenAPI is archived
- Statham is unmaintained, does not support recent python, and does not support features required by the sarif schema
- thydux has completely disappeared from the internet
- yacg is not a complete e2e project
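Absent a maintained generator, the "typed subset" idea could also be hand-written: `TypedDict`s covering only the fields the reporter actually emits, re-checked against the upstream schema occasionally. A rough, illustrative sketch (field selection is an assumption, not the PR's actual types):

```python
from typing import TypedDict


# Hand-maintained subset of the SARIF 2.1.0 schema: only the pieces
# the reporter emits, so drift surfaces as a type error rather than
# a runtime validation failure. Purely illustrative.
class ArtifactLocation(TypedDict, total=False):
    uri: str
    uriBaseId: str


class Region(TypedDict, total=False):
    startLine: int
    startColumn: int
    endLine: int
    endColumn: int


class PhysicalLocation(TypedDict, total=False):
    artifactLocation: ArtifactLocation
    region: Region


loc: PhysicalLocation = {
    "artifactLocation": {"uri": "pkg/mod.py", "uriBaseId": "SRCROOT"},
    "region": {"startLine": 3, "startColumn": 1},
}

print(loc["region"]["startLine"])
# 3
```

The trade-off is exactly the one raised above: without an upstream source of truth, keeping such hand-written types honest is a manual chore.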
```python
schema = pytestconfig.cache.get(CACHE_KEY, None)
if not schema:
    try:
        res = requests.get(SCHEMA_URL, timeout=5)
    except requests.exceptions.RequestException as e:
        raise pytest.skip(
            f"Unable to retrieve schema v{SCHEMA_VERSION}: {e}"
        ) from e

    if res.status_code != 200:
        raise pytest.skip(
            f"Unable to retrieve schema v{SCHEMA_VERSION}: {res.reason}"
        )
    schema = res.text
    pytestconfig.cache.set(CACHE_KEY, schema)
```
How often does the schema change? Is it necessary to download it every time, or could we just vendor the current version? Any external download will inevitably slow down test runs.
Wouldn't it be enough to just check the output against a known good version? That would also help get rid of the jsonschema dependency.
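The "known good version" approach would amount to a snapshot test: normalize away the fields that legitimately vary between runs, then compare against a committed golden file. A minimal sketch, assuming (hypothetically) that only the tool version and the invocations vary:

```python
import json


def normalize(report: dict) -> dict:
    """Strip fields that vary between runs (tool version, invocation
    records) so the remainder can be compared to a golden snapshot.
    The exact field list is an assumption for illustration."""
    out = json.loads(json.dumps(report))  # cheap deep copy
    for run in out.get("runs", []):
        run.get("tool", {}).get("driver", {}).pop("version", None)
        run.pop("invocations", None)
    return out


# Committed "known good" snapshot (illustrative).
golden = {"version": "2.1.0", "runs": [{"tool": {"driver": {"name": "pylint"}}}]}

# Freshly generated report, with run-specific noise.
fresh = {
    "version": "2.1.0",
    "runs": [
        {
            "tool": {"driver": {"name": "pylint", "version": "9.9.9"}},
            "invocations": [{"executionSuccessful": True}],
        }
    ],
}

assert normalize(fresh) == golden
print("snapshot matches")
```

This avoids jsonschema entirely, at the cost of the golden file asserting one exact output rather than schema conformance in general.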
> How often does the schema change? Is it necessary to download it every time or could we just vendor the current version. Any external download will inevitably slow down test runs.
Rarely to never, I would assume: the original specification was published in 2020, and the revisions in Errata 01 in 2023 led to a new schema. So vendoring is not an issue as long as you're OK with a fairly large blob (the file is 115KB).
> That would also help get rid of the jsonschema dependency.

It would not.
What if we create a repository specifically for the sarif reporter? This way we do not slow down the pipeline in main pylint, and we can add the reporter as a dependency anyway.
> What if we create a repository specifically for the sarif reporter? This way we do not slow down the pipeline in main pylint and we can add the reporter as a dependency anyway.
I think this would make sense. It's more complex than the current builtin ones while also being more niche than, say, the GithubReporter. Users who need it could just install it like any other pylint plugin.
I'm not sure it even needs to be added as a dependency, a reference in the docs and help output might be enough.