Fix codegen build (toml 0.9 broke things), improve v_htmlescape behaviour (don’t encode slash)#164
chris-morgan wants to merge 6 commits into zzau13:master
Dependency version specifier "0" is always wrong. In this case, toml 0.8 could parse an entire TOML file as Value, but toml 0.9 fixed that obvious wrongness, insisting you use Table. Instead of changing the version to 0.8, I updated it to 0.9.
I reckon this improves things quite a bit. Still messy, but I find it noticeably easier to follow. (I still dislike rustfmt; it makes this worse.)
Also ran cargo fmt, since that minimises the {src,tests}/lib.rs diffs.
(If that’s to be done, then I ask what the point of prettyplease is.)
This was always a mistake; nothing has *ever* required it. This was one of the worse problems with OWASP’s XSS prevention cheat sheet, a thoroughly bad document that was bad when it was written around 2010, and became worse as edits were made to it, though some edits in 2020–2023 finally improved it a little. Details *were* in <OWASP/CheatSheetSeries#515>, but that issue has been deleted, and the Wayback Machine didn’t have it. Sigh. I don’t like OWASP because of things like this. This should be considered a breaking change, because some people will have tests depending on the wonky behaviour.
Easier to read this way.
• HTML 3.2 (January 1997) lacked &quot; and &apos;.
• HTML 4 (December 1997) had &quot; but lacked &apos;.
• XML 1.0 (February 1998) had both &quot; and &apos;.
• HTML 5 (January 2008) added &apos;.
• IE 8 (March 2009) was the last browser that lacked &apos;.
By that time everyone else had been doing this HTML 5 thing for a while,
and Microsoft followed suit in IE 9.
Frankly, I don’t like apostrophe being encoded;
I would declare double-quoted attribute values the One True Form,
rejecting single-quoted attribute values,
just like unquoted attribute values are rejected by libraries like this.
But that would be a bit too drastic a change to make at this stage.
Another alternative is to use &#39;, which is shorter.
This should again be considered a breaking change.
And a slightly more serious one than stopping escaping slash,
because it *will* actually break IE≤8.
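For illustration, the behaviour these commit messages describe can be sketched in plain Rust: escape the five characters that matter in text and quoted attribute values, and leave the slash alone. This is only a sketch under assumptions, not the v_htmlescape implementation or its API. The names `escape_html` and `APOS` are hypothetical, and the choice of `&apos;` for the apostrophe is inferred from the discussion above.

```rust
// Hypothetical sketch; not the v_htmlescape API or its generated code.

/// Entity used for the apostrophe. Assumed to be `&apos;` here;
/// `&#39;` is the shorter alternative that IE 8 and older also understand.
const APOS: &str = "&apos;";

/// Escapes `&`, `<`, `>`, `"` and `'`. The slash is deliberately passed through.
fn escape_html(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for ch in input.chars() {
        match ch {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str(APOS),
            // Everything else, including '/', is copied unchanged.
            _ => out.push(ch),
        }
    }
    out
}

fn main() {
    let escaped = escape_html("<a href='/x'>Tom & \"Jerry\"</a>");
    // prints: &lt;a href=&apos;/x&apos;&gt;Tom &amp; &quot;Jerry&quot;&lt;/a&gt;
    println!("{escaped}");
}
```

Changing `APOS` to `"&#39;"` gives the shorter form mentioned above. Either way the output differs from what the crate produced before, which is why tests that assert on exact escaped strings would need updating.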
Thanks for the PR @chris-morgan, and sorry for the late reply. After thinking this over I'm going to go in a different direction, so I'll be closing this one. Here's the plan and reasoning:
Two crates instead of changing the existing behaviour in place
Rather than mutating
Notes on the entity choices in this PR:
Codegen rework first
Before adding the second crate I want to land some codegen improvements that have been pending:
The TL;DR — closing in favour of: (1) codegen rework (OR op + backtracking + minimax selection), then (2) two coexisting HTML-escape crates (legacy 6-char + modern 5-char OWASP), both keeping |
Read the commit messages for more detail.
Why the v_escape_codegen@0.1.9 → 0.1.8? Dunno, haven’t delved.