Releases: EmilStenstrom/justhtml
Release v1.17.0
Security
- (Severity: Moderate) Harden custom foreign-namespace policies against active HTML integration points in SVG and MathML. Previously, preserved integration points such as <foreignObject>, <annotation-xml encoding="text/html">, SVG <title>/<desc>, and MathML text integration points could keep or host active HTML descendants such as <script> when the sanitized output was rendered.
- (Severity: Moderate) Harden constructor-time and transform-driven sanitization against preserved <style> rawtext bypasses. Previously, JustHTML(..., sanitize=True) and explicit public Sanitize(...) transforms could preserve resource-loading CSS such as @import or background-image:url(...) in allowlisted <style> blocks from HTML string input, even though sanitize() and sanitize_dom() correctly stripped the same content.
- (Severity: Low) Harden the low-level terminal Sanitize(...) transform execution path against mutation XSS in custom foreign-namespace policies. Previously, a direct terminal sanitize pass in the transform runtime could sanitize MathML/SVG content into output that looked inert in memory but became active HTML, such as <img onerror>, after a later HTML reparse.
- (Severity: Low) Harden HTML comment serialization against additional breakout payloads from programmatic Comment(...) nodes. Previously, comment data beginning with invalid states such as > or -> could serialize into an empty HTML comment followed by live markup like injected <img onerror>.
- (Severity: Moderate) Harden custom foreign-namespace policies against SVG filter="url(...)" fetches. Previously, preserved filter presentation attributes could contain external url(...) references that bypassed URL sanitization and triggered browser fetches.
- (Severity: Moderate) Harden sanitize() and sanitize_dom() against mutation XSS in custom foreign-namespace policies. Previously, crafted MathML/SVG parser-differential payloads could sanitize into output that looked inert in memory but became active HTML, such as <img onerror>, after a later HTML reparse.
- (Severity: Low) Harden HTML serialization against rawtext breakout injection from programmatic script and style nodes. Previously, text such as </style><img ...> or </script><img ...> could serialize into active markup through to_html() and downstream to_markdown(html_passthrough=True).
- (Severity: Low) Harden compiled sanitize-pipeline caching against cache mutation. Previously, once a policy’s compiled sanitizer had been warmed, mutating the cached transform list in place could weaken later sanitize(), sanitize_dom(), and JustHTML(..., sanitize=True) calls, including on the exported default policies.
- (Severity: Low) Harden the programmatic DOM APIs against cycle creation. Previously, creating parent/child cycles with append_child(), insert_before(), or replace_child() could make operations such as to_html() and sanitize_dom() loop indefinitely on attacker-controlled node graphs.
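The cycle-creation fix in the last entry can be illustrated with a minimal sketch. The Node class below is hypothetical and is not JustHTML's actual implementation; it only shows one common way to reject a parent/child cycle, by walking the ancestor chain before attaching a child.

```python
from __future__ import annotations


class Node:
    """Hypothetical minimal tree node, not JustHTML's real Node class."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.parent: Node | None = None
        self.children: list[Node] = []

    def append_child(self, child: Node) -> None:
        # Walk from this node up to the root. If `child` is this node or one
        # of its ancestors, attaching it would create a parent/child cycle.
        ancestor: Node | None = self
        while ancestor is not None:
            if ancestor is child:
                raise ValueError("append_child() would create a cycle")
            ancestor = ancestor.parent
        # Detach from any previous parent so the child has a single parent.
        if child.parent is not None:
            child.parent.children.remove(child)
        child.parent = self
        self.children.append(child)


root = Node("div")
inner = Node("span")
root.append_child(inner)

try:
    inner.append_child(root)  # root is an ancestor of inner
    cycle_rejected = False
except ValueError:
    cycle_rejected = True
```

With a check like this, traversals such as serialization can assume the graph is a tree and are guaranteed to terminate.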
Release v1.16.0
Security
- (Severity: Low) Harden sanitization policy reuse against nested-state mutation. Previously, mutating nested policy state such as allowed_attributes or url_policy.allow_rules could leave stale compiled sanitizers active in sanitize(), sanitize_dom(), and JustHTML(..., sanitize=True), and mutating exported defaults such as DEFAULT_POLICY.url_policy.allow_rules[("a", "href")].allowed_schemes could weaken later default sanitization process-wide.
- (Severity: Moderate) Harden sanitize_dom() and sanitize() for programmatic DOM trees with mixed-case dangerous tag names. Previously, nodes such as ScRiPt or Style could miss the drop_content_tags policy in the in-memory sanitization path and incorrectly preserve their children.
- (Severity: Low) Normalize SanitizationPolicy.drop_content_tags to lowercase. Previously, custom policies using values such as {"SCRIPT"} could silently fail to drop dangerous subtrees in the in-memory sanitization APIs.
- (Severity: Low) Harden doctype serialization against programmatic doctype-name injection. Previously, a crafted doctype(...) or manual !doctype node name such as html><img ...> could serialize into active markup before the document body.
- (Severity: Moderate) Harden custom foreign-namespace policies against SVG animation-based URL mutation. Previously, preserved SVG animation elements such as <set> or <animate> could mutate already-sanitized attributes like image[href] after sanitization and trigger remote requests that bypassed the configured URL rules.
- (Severity: Moderate) Harden custom foreign-namespace policies against SVG url(...) presentation-attribute fetches. Previously, preserved attributes such as fill, clip-path, mask, marker-start, and cursor could contain external url(...) references that bypassed URL sanitization and triggered browser fetches.
- (Severity: Moderate) Harden rawtext sanitization against mixed-case programmatic style and script tag names. Previously, custom policies that preserved mixed-case nodes such as StYlE could bypass the rawtext hardening pass and keep active stylesheet content such as remote @import rules.
- (Severity: Moderate) Harden sanitization against programmatic DOM namespace confusion for svg and math subtrees. Previously, nodes constructed with namespace="html" but serialized as <svg>...</svg> could bypass foreign-content checks in sanitize() and sanitize_dom(), allowing active SVG features such as url(...) presentation attributes or animation-based attribute mutation to survive.
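Several entries above come down to comparing tag names case-insensitively. As a rough sketch of the idea (the helper names are hypothetical and not part of JustHTML's API), both the configured set and the node's tag name can be lowercased before the membership test:

```python
def normalize_tags(tags):
    """Return the configured tag names as a lowercase frozenset."""
    return frozenset(tag.lower() for tag in tags)


def should_drop_content(tag_name, drop_content_tags):
    """Hypothetical check: compare both sides in lowercase."""
    return tag_name.lower() in normalize_tags(drop_content_tags)


# A mixed-case node name and an uppercase policy value still match.
mixed_case_hit = should_drop_content("ScRiPt", {"SCRIPT"})
unrelated_miss = should_drop_content("p", {"SCRIPT", "Style"})
```

Normalizing once, when the policy is built, avoids repeating the work for every node; the sketch normalizes on each call only to stay short.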
Release v1.15.0
Security
- (Severity: Low) Harden HTML comment serialization against comment-breakout injection. Previously, programmatic Comment(...) nodes or transform-produced comment data containing sequences like --> could serialize into active HTML such as injected <img onerror>.
- (Severity: Low) Harden HTML serialization and the builder against unsafe programmatic element and attribute names. Previously, direct Node(...) usage, transform-produced attrs, or builder.element(...) calls could emit attacker-controlled markup such as injected <img onerror> by including syntax-breaking characters in a tag or attribute name.
- (Severity: Moderate) Harden JustHTML.clean_url_value(...) and clean_url_in_js_string(...) against HTML character reference smuggling such as javascript:..., which could bypass URL scheme validation and become an active javascript: URL after HTML attribute parsing.
- (Severity: Low) Harden URL sanitization against browser backslash normalization. Previously, “relative” URLs such as \\evil.example/x or /\\evil.example/x could survive sanitization and be interpreted by browsers as remote network requests, bypassing relative-only URL rules such as the default img[src] policy.
- (Severity: Low) Harden URL sanitization and clean_url_value(...) against malformed bracketed hosts when allowed_hosts is enabled. Previously, inputs such as https://[evil.example]/x could raise ValueError from Python’s URL parser and crash sanitization instead of being rejected.
- (Severity: Low) Harden to_markdown(html_passthrough=True) for sanitized <textarea> content. Previously, attacker-controlled </textarea> sequences could survive sanitization as text, then break out during Markdown HTML passthrough and turn into active HTML when the Markdown output was reparsed or rendered.
- (Severity: Low) Harden a[ping] sanitization. Previously, ping was treated as a single URL even though browsers interpret it as a space-separated list of URLs, so a custom policy could allow a trusted first endpoint while unintentionally preserving additional attacker-controlled ping URLs.
- (Severity: Low) Harden preserved <style> blocks in custom policies. Previously, JustHTML only neutralized HTML parser breakouts inside allowed <style> elements; resource-loading CSS such as @import, url(...), image-set(...), and legacy binding/filter constructs could still survive unchanged.
- (Severity: Low) Harden preserved <meta http-equiv="refresh"> tags in custom policies. Previously, the content attribute was treated as inert text even though browsers interpret it as a client-side redirect instruction, so refresh targets could survive without any URL policy.
- (Severity: Low) Harden link[imagesrcset] sanitization in custom policies. Previously, imagesrcset was not treated as URL-bearing at all, so <link rel="preload" as="image"> could preserve attacker-controlled remote image candidates without any URL validation.
- (Severity: Low) Harden attributionsrc sanitization in custom policies. Previously, attributionsrc was not treated as URL-bearing at all, so elements such as <img> could preserve attacker-controlled attribution-reporting endpoints and trigger extra browser requests without any URL validation.
- (Severity: Low) Harden security-related attribute transforms against mixed-case attribute names in custom pipelines. Previously, transforms such as DropAttrs(...), DropUrlAttrs(...), AllowStyleAttrs(...), and MergeAttrs(...) could miss or mis-handle OnClick, SrcDoc, Href, Style, Rel, and similar mixed-case variants unless an earlier step had already normalized names to lowercase.
- (Severity: Low) Harden preserved <base href> tags in custom policies. Previously, a kept <base href="..."> could rewrite how later relative URLs resolved in the browser, bypassing per-attribute relative-only URL rules such as img[src].
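The backslash-normalization entry above can be made concrete with a small sketch. Browsers treat a backslash like a forward slash when parsing http(s) URLs, so a value that starts with two slashes after normalization is protocol-relative and names a remote host. The function name looks_relative is hypothetical, and this is only one possible check, not JustHTML's actual URL policy code.

```python
from urllib.parse import urlsplit


def looks_relative(url: str) -> bool:
    """Return True only if `url` stays relative after backslash normalization."""
    # Mirror the browser behaviour: "\" is handled like "/" in http(s) URLs.
    normalized = url.strip().replace("\\", "/")
    # "//host/path" is protocol-relative and therefore a remote request.
    if normalized.startswith("//"):
        return False
    try:
        parts = urlsplit(normalized)
    except ValueError:
        # Treat anything the parser rejects as unsafe instead of crashing.
        return False
    return parts.scheme == "" and parts.netloc == ""


plain_relative = looks_relative("images/logo.png")
double_backslash = looks_relative(r"\\evil.example/x")
slash_backslash = looks_relative(r"/\evil.example/x")
absolute_url = looks_relative("https://evil.example/x")
```

Catching ValueError also covers the malformed-host case mentioned above on Python versions where the parser raises for such input.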
Release v1.14.0
Security
- (Severity: Moderate) Harden constructor-time sanitization against mutation XSS in custom policies that preserve foreign namespaces such as MathML or SVG. Previously, crafted markup could sanitize into output that looked safe but became active HTML when reparsed by a browser or downstream parser.
Release v1.13.0
Security
- (Severity: High): Harden fenced code generation in to_markdown() by choosing backtick delimiters longer than any run inside <pre> content, preventing attacker-controlled backticks from breaking out of code blocks and exposing raw HTML to downstream Markdown renderers.
- (Severity: Low): Treat text that starts at the beginning of a rendered Markdown line as text, not block syntax, by escaping line-leading headings, blockquotes, list markers, thematic breaks, setext underlines, and fenced-code delimiters from untrusted HTML content.
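The fence-length rule in the first entry can be sketched as follows. In CommonMark, a closing fence must be at least as long as the opening fence, so picking a delimiter one backtick longer than the longest run in the content means no run inside the content can close the block. The function names are hypothetical; this is an illustration of the technique, not the code used by to_markdown().

```python
import re


def fence_for(code: str) -> str:
    """Pick a backtick fence longer than any backtick run inside `code`."""
    longest = max((len(run) for run in re.findall(r"`+", code)), default=0)
    # A fence needs at least three backticks to be valid.
    return "`" * max(3, longest + 1)


def fenced_block(code: str) -> str:
    """Wrap `code` in a fence that its own content cannot close."""
    fence = fence_for(code)
    return f"{fence}\n{code}\n{fence}"


plain_fence_length = len(fence_for("print('hi')"))
hostile_fence_length = len(fence_for("x\n`````\n<img onerror=...>"))
```

Here a payload containing a run of five backticks is wrapped in a six-backtick fence, so the run stays inside the code block.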
Release v1.12.0
Security
- (Severity: High) Markdown output now HTML-escapes text-node content before applying Markdown escaping, preventing attacker-controlled text such as <script> from turning into raw HTML when to_markdown() output is rendered.
- (Severity: Moderate) Sanitization now hardens script and style raw-text content by neutralizing embedded closing-tag sequences and dropping non-text children, preventing sanitized DOM trees from serializing into breakout HTML.
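The ordering described in the first entry (HTML-escape first, then Markdown-escape) might look like the sketch below. The function name and the set of Markdown characters escaped are assumptions chosen for illustration; the library's real escaping rules are likely broader.

```python
import html
import re

# Assumed, deliberately small set of Markdown-significant characters.
_MD_SPECIAL = re.compile(r"([\\`*_\[\]])")


def text_to_markdown(text: str) -> str:
    """Escape a text node for Markdown output (hypothetical helper)."""
    # Step 1: HTML-escape so "<", ">" and "&" cannot form raw HTML later.
    escaped = html.escape(text, quote=False)
    # Step 2: backslash-escape characters that Markdown would interpret.
    return _MD_SPECIAL.sub(r"\\\1", escaped)


script_text = text_to_markdown("<script>alert(1)</script>")
emphasis_text = text_to_markdown("a*b")
```

Doing the HTML step first matters because most Markdown renderers pass raw HTML through unchanged, so Markdown escaping alone would leave the tag live.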
Release v1.11.0
Added
- Sanitization: Add SanitizationPolicy.strip_invisible_unicode to strip invisible Unicode used for obfuscation from text and attribute values before other sanitizer checks run.
Changed
- Sanitization: strip_invisible_unicode is enabled by default and covers variation selectors, zero-width/bidi controls, and private-use characters.
Security
- (Severity: Low) Harden sanitization against invisible-Unicode obfuscation in text, attributes, and URL-like values such as disguised javascript: schemes.
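As a rough illustration of what stripping invisible Unicode can involve, the sketch below removes format characters (Unicode category Cf, which includes zero-width and bidi controls), private-use characters (category Co), and the two main variation-selector ranges. The function name and the exact character coverage are assumptions; the set handled by strip_invisible_unicode may differ.

```python
import unicodedata


def _is_variation_selector(code_point: int) -> bool:
    # Variation Selectors and the Variation Selectors Supplement block.
    return 0xFE00 <= code_point <= 0xFE0F or 0xE0100 <= code_point <= 0xE01EF


def strip_invisible(text: str) -> str:
    """Drop invisible characters before any further checks (hypothetical)."""
    kept = []
    for char in text:
        category = unicodedata.category(char)
        # Cf: format characters (zero-width, bidi controls); Co: private use.
        if category in ("Cf", "Co") or _is_variation_selector(ord(char)):
            continue
        kept.append(char)
    return "".join(kept)


# A zero-width space hides the scheme from a naive prefix check.
disguised = "java\u200bscript:alert(1)"
cleaned = strip_invisible(disguised)
```

Running the strip before scheme validation means the check sees the same string a browser would effectively act on.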
Release v1.10.0
Security
- (Severity: Low) Harden JustHTML against denial-of-service from attacker-controlled deeply nested HTML. Parsing post-processing, deep cloning, pretty HTML serialization, and Markdown rendering now use iterative traversal instead of recursion, preventing RecursionError crashes on pathological nesting.
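The switch from recursion to iterative traversal can be sketched with an explicit stack. The Node class and count_nodes function are hypothetical stand-ins, not JustHTML's internals; the point is only that the traversal depth is bounded by heap memory rather than by Python's recursion limit.

```python
class Node:
    """Hypothetical minimal tree node for the traversal example."""

    def __init__(self, name):
        self.name = name
        self.children = []


def count_nodes(root):
    """Visit every node with an explicit stack instead of recursion."""
    stack = [root]
    count = 0
    while stack:
        node = stack.pop()
        count += 1
        # Push children; the stack lives on the heap, not the call stack.
        stack.extend(node.children)
    return count


# Build a chain far deeper than the default recursion limit (about 1000).
root = Node("div")
current = root
for _ in range(50_000):
    child = Node("div")
    current.children.append(child)
    current = child

deep_count = count_nodes(root)
```

A recursive version of count_nodes would raise RecursionError on this input under default interpreter settings, while the loop above completes normally.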
Release v1.9.1
Fixed
- Serialization: Preserve literal text inside script and style elements during HTML serialization so round-trips do not turn raw text content like > or & into entity text.
Release v1.9.0
Added
- Builder: Add justhtml.builder with explicit element(), text(), comment(), and doctype() factories for programmatic HTML construction.
- Parser: Allow JustHTML(...) to accept built nodes directly and normalize them through the existing HTML5 parser.
- Docs: Add a dedicated Building HTML guide and expand the API/README documentation around programmatic HTML generation.
Changed
- Sanitization: Preserve doctypes by default in document mode.
- Sanitization: Add <caption> to the default allowed tag set.
- Typing: Normalize SanitizationPolicy.allowed_tags to frozenset[str], improving type safety when composing policies.
Fixed
- Builder & Serialization: Preserve arbitrary doctype names and identifiers across build/serialize/parse round-trips.
- Builder: Reject unsupported namespaces up front; builder namespaces are limited to HTML, SVG, and MathML.