Pull Request Overview
This PR adds HTML sanitization capabilities to prevent XSS attacks and other security issues when handling user-provided content. It introduces the bluemonday library to sanitize HTML tags in GitHub issue and pull request titles and bodies.
- Integrates bluemonday library for HTML sanitization with a configurable policy
- Updates sanitization logic to filter both invisible characters and HTML tags
- Adds comprehensive test coverage for the new HTML filtering functionality
Reviewed Changes
Copilot reviewed 14 out of 15 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| pkg/sanitize/sanitize.go | Adds FilterHTMLTags and Sanitize functions with bluemonday policy configuration |
| pkg/sanitize/sanitize_test.go | Adds comprehensive test cases for HTML tag filtering |
| pkg/github/issues.go | Updates to use Sanitize instead of FilterInvisibleCharacters |
| pkg/github/pullrequests.go | Updates to use Sanitize instead of FilterInvisibleCharacters |
| go.mod | Adds bluemonday and its dependencies |
| go.sum | Updates checksums for new dependencies |
| third-party-licenses.*.md | Adds license entries for new dependencies |
| third-party/*/LICENSE* | Adds license files for new third-party dependencies |
| script/licenses | Adds GOFLAGS=-mod=mod to go-licenses commands |
| script/licenses-check | Adds GOFLAGS=-mod=mod and fixes echo to printf |
kerobbi
left a comment
Tiny comment, but other than that lgtm!
    }

    func FilterHTMLTags(input string) string {
    	if input == "" {
Maybe we could also check if the string has any HTML in the first place in this early return?
Interesting idea, although an early return that has to parse the content might not be an optimisation. Hard to tell without getting into the weeds.
What I was mainly thinking about is just adding a simple `!strings.Contains(input, "<")` check, nothing fancy or overly complex. It wouldn't even handle all edge cases (e.g., an input containing a single < for whatever reason), but it would be a quick scan to avoid running the full sanitiser on plain input. But what you said is definitely a good point!
Bluemonday does HTML input tokenization and I don't want to reinvent the wheel here. :)
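For reference, the early-return check suggested in this thread could look roughly like the sketch below. It is not code from this PR: `filterHTMLTagsWithFastPath` and `stripTags` are hypothetical names, and `stripTags` is only a toy stand-in for the real sanitiser so the example runs with the standard library alone.

```go
package main

import (
	"fmt"
	"strings"
)

// stripTags is a toy stand-in for the full sanitiser (hypothetical, not bluemonday).
// It drops everything between '<' and the next '>'.
func stripTags(s string) string {
	var b strings.Builder
	inTag := false
	for _, r := range s {
		switch {
		case r == '<':
			inTag = true
		case r == '>' && inTag:
			inTag = false
		case !inTag:
			b.WriteRune(r)
		}
	}
	return b.String()
}

// filterHTMLTagsWithFastPath is a hypothetical variant of FilterHTMLTags.
// It skips the full sanitiser when the input is empty or contains no '<',
// since a tag cannot start without that character.
func filterHTMLTagsWithFastPath(input string, full func(string) string) string {
	if input == "" || !strings.Contains(input, "<") {
		return input
	}
	return full(input)
}

func main() {
	for _, in := range []string{"", "plain text", "<b>bold</b> title"} {
		fmt.Printf("%q -> %q\n", in, filterHTMLTagsWithFastPath(in, stripTags))
	}
}
```

One trade-off to keep in mind: the fast path returns tag-free input byte for byte, so if the full sanitiser also rewrites plain text (for example by escaping characters such as `&`), the two paths would no longer produce identical output for the same input.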
This PR:
Misc updates:
`script/licenses` and `script/licenses-check` were updated to use `GOFLAGS=-mod=mod`, which ignores the vendor directory. The report template uses the `LicenseURL` field, but it was being populated with links to this repo's vendored paths because the tool runs in vendor mode by default when a vendor/ directory exists. With `GOFLAGS=-mod=mod` added, the vendor directory is ignored and the upstream module URL is used as expected.