Expand relative paths in href and src attributes via replace-match in sanitize-html function. #46
c1-g wants to merge 5 commits into alphapapa:master
Works by searching for the values of the href and src attributes and replacing each value with an expanded URL.
This is in case somebody explicitly passes a URL to ‘org-web-tools--url-as-readable-org’.
After some more testing, the expanding function raised an error on <a href="#Preorder_R\R"> in https://en.wikipedia.org/wiki/Binary_relation. Maybe quoting the replacement text will resolve the issue.
The function had trouble expanding <a href="#Preorder_R\R"> because “\” is treated as special in replace-match. With the LITERAL argument set to t, this is no longer a problem.
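To illustrate the LITERAL fix, here is a minimal sketch that is not the PR's actual code: the helper name `my/replace-attribute-value` and the regexp are assumptions made for this example. It replaces the first double-quoted href or src value in a string, passing t as the LITERAL argument of `replace-match` so that a backslash in the new value is inserted verbatim.

```elisp
(defun my/replace-attribute-value (html new-value)
  "Return HTML with the first href or src value replaced by NEW-VALUE.
Hypothetical helper for illustration.  NEW-VALUE is inserted
verbatim, backslashes included."
  (with-temp-buffer
    (insert html)
    (goto-char (point-min))
    ;; Group 1 is the text between the double quotes.
    (when (re-search-forward "\\(?:href\\|src\\)=\"\\([^\"]*\\)\"" nil t)
      (replace-match new-value
                     t      ; FIXEDCASE: do not adjust letter case
                     t      ; LITERAL: do not interpret backslashes
                     nil    ; STRING: nil means replace in the buffer
                     1))    ; SUBEXP: replace only group 1
    (buffer-string)))

;; The value that triggered the error contains a backslash.
(my/replace-attribute-value "<a href=\"#x\">" "#Preorder_R\\R")
```

With LITERAL left as nil, the same call would signal an error, because `replace-match` only accepts a backslash in the replacement text when it is followed by `&`, a digit, `?`, or another backslash.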
alphapapa
left a comment
Hi,
Apologies for overlooking this PR for so long.
Looking at #41 again, and based on what I've learned since then, I think the best way to solve this issue would be to parse the HTML to a DOM object with libxml-parse-html-region, then walk the DOM using the dom library and modify any anchors' HREFs accordingly. Then the DOM can be serialized back to HTML using shr-dom-print. That should be more robust and reliable than using regexp matches on the HTML.
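A rough sketch of the DOM-based approach described above, assuming an Emacs built with libxml2 support. The function name `my/expand-relative-urls-in-html` is hypothetical and not part of org-web-tools. It parses with `libxml-parse-html-region` (the Emacs function that turns HTML into a DOM), walks the tree with `dom-search`, and serializes with `shr-dom-print`.

```elisp
(require 'dom)
(require 'shr)
(require 'url-expand)

(defun my/expand-relative-urls-in-html (html base-url)
  "Return HTML with href and src values expanded against BASE-URL.
Hypothetical helper for illustration; requires libxml2 support."
  (let ((dom (with-temp-buffer
               (insert html)
               (libxml-parse-html-region (point-min) (point-max)))))
    (dolist (attribute '(href src))
      ;; Collect every node that carries ATTRIBUTE, then rewrite it in place.
      (dolist (node (dom-search dom (lambda (n) (dom-attr n attribute))))
        (dom-set-attribute node attribute
                           (url-expand-file-name (dom-attr node attribute)
                                                 base-url))))
    ;; `shr-dom-print' inserts the serialized DOM into the current buffer.
    (with-temp-buffer
      (shr-dom-print dom)
      (buffer-string))))

(my/expand-relative-urls-in-html "<p><a href=\"/wiki/Foo\">Foo</a></p>"
                                 "https://example.com/wiki/Bar")
```

The output is not byte-for-byte the input with new links: the parser wraps fragments in html and body elements, and the serializer uses its own spacing. Already-absolute URLs are returned unchanged by `url-expand-file-name`.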
Also, I think the --sanitize-html function is not the place to make this change; its purpose is to "sanitize" the HTML, i.e., to make it clean and safe, and adjusting links is a different purpose.
So, would you like to adjust this PR accordingly? If you're not interested anymore, that's fine, too. Just let me know.
Thanks.
Addressing #41 and maybe #45 too?
All the work is done in org-web-tools--sanitize-html. Added a new variable called org-web-tools-expand-relative-path. If it's nil, relative paths won't be expanded.
This works by searching the temporary HTML buffer created by org-web-tools--sanitize-html for all href and src attributes, passing the value of each attribute to url-expand-file-name, with the URL argument or the URL in the user's kill ring as its base, and replacing the value of the attribute with the result.
I tested this on a handful of Wikipedia articles and it works just fine.
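The search-and-replace loop described above might look roughly like the following. This is a sketch under assumptions, not the code in this PR: `my/expand-relative-paths` stands in for org-web-tools-expand-relative-path, `my/expand-attribute-urls` is a hypothetical name, and the regexp only handles double-quoted attribute values.

```elisp
(require 'url-expand)

(defvar my/expand-relative-paths t
  "Hypothetical stand-in for `org-web-tools-expand-relative-path'.
When nil, attribute values are left untouched.")

(defun my/expand-attribute-urls (html base-url)
  "Return HTML with href and src values expanded against BASE-URL."
  (if (not my/expand-relative-paths)
      html
    (with-temp-buffer
      (insert html)
      (goto-char (point-min))
      (while (re-search-forward "\\(?:href\\|src\\)=\"\\([^\"]*\\)\"" nil t)
        ;; `url-expand-file-name' runs its own string matches, so protect
        ;; the match data that `replace-match' still needs afterwards.
        (let ((expanded (save-match-data
                          (url-expand-file-name (match-string 1) base-url))))
          ;; LITERAL is t so backslashes in EXPANDED are inserted verbatim.
          (replace-match expanded t t nil 1)))
      (buffer-string))))

(my/expand-attribute-urls "<a href=\"/wiki/Foo\">Foo</a>"
                          "https://example.com/wiki/Bar")
```

Here the base URL is passed explicitly; taking it from the kill ring, as the description mentions, would be a separate step before the call.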