Linkcheck limits URL length to `MAX_URL_LENGTH`: longer URLs are skipped, and a warning containing the full URL is logged. This can cause problems:

1. Someone decided it was a good idea to embed a multi-megabyte image directly in the content as a base64-encoded data URL, and the project using django-linkcheck did not think to prevent that. Now the log contains megabytes of data URL every time linkcheck scans this content. Inspecting the logs becomes nearly impossible: one has to scroll past huge blocks of garbage data, and if another data URL is logged just after it, the important log line between the two is easily missed.
2. Whether the URL is a data URL or just a conventional URL that happens to exceed `MAX_URL_LENGTH`, one may decide to investigate where in the content it is used and whether it should be changed. Unfortunately, the usual approach of looking at `Link` objects to find the content object containing the URL does not work, because the URL was rejected for being too long.
I propose a solution to each of these:

1. If a URL exceeds `MAX_URL_LENGTH` and also starts with `data:`, truncate it to 64 characters before logging it. (Expectation: the data payload is not useful for identifying the URL.)
2. In the log message for a URL that exceeds `MAX_URL_LENGTH`, also log the instance it came from, to make it easier to do something about it.
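A minimal sketch of both proposals, using only the standard `logging` module. It is not django-linkcheck's actual code: `shorten_for_log`, `warn_url_too_long`, the logger name `linkcheck`, and the value 255 for `MAX_URL_LENGTH` are hypothetical placeholders, and the instance is assumed to be a Django model instance (or any object with a `pk` attribute).

```python
import logging

logger = logging.getLogger("linkcheck")  # hypothetical logger name

# Hypothetical value for illustration; the real limit is defined by django-linkcheck.
MAX_URL_LENGTH = 255
# Characters of an over-long data: URL kept in the log (proposal 1).
DATA_URL_LOG_PREFIX = 64


def shorten_for_log(url: str, max_length: int = MAX_URL_LENGTH) -> str:
    """Return a log-friendly version of the URL.

    Over-long data: URLs are cut to their first 64 characters, because the
    encoded payload does not help identify the URL. All other URLs,
    including over-long conventional ones, are returned unchanged.
    """
    if len(url) > max_length and url.startswith("data:"):
        return url[:DATA_URL_LOG_PREFIX]
    return url


def warn_url_too_long(url: str, instance: object, max_length: int = MAX_URL_LENGTH) -> bool:
    """Log a warning and return True if the URL is too long and should be skipped.

    The warning names the model class and primary key of the instance the
    URL was found in (proposal 2), so the content can be located later.
    """
    if len(url) <= max_length:
        return False
    logger.warning(
        "Skipping URL of length %d (limit %d) found in %s pk=%s: %s",
        len(url),
        max_length,
        type(instance).__name__,
        getattr(instance, "pk", None),
        shorten_for_log(url, max_length),
    )
    return True
```

Logging the class name and `pk` rather than `str(instance)` keeps the message short and avoids depending on each model's `__str__`; the original length is still included so it is clear the URL was truncated.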