Releases: EmilStenstrom/justhtml

Release v1.17.0

19 Apr 20:05

Security

  • (Severity: Moderate) Harden custom foreign-namespace policies against active HTML integration points in SVG and MathML. Previously, preserved integration points such as <foreignObject>, <annotation-xml encoding="text/html">, SVG <title>/<desc>, and MathML text integration points could keep or host active HTML descendants such as <script> when the sanitized output was rendered.
  • (Severity: Moderate) Harden constructor-time and transform-driven sanitization against preserved <style> rawtext bypasses. Previously, JustHTML(..., sanitize=True) and explicit public Sanitize(...) transforms could preserve resource-loading CSS such as @import or background-image:url(...) in allowlisted <style> blocks from HTML string input, even though sanitize() and sanitize_dom() correctly stripped the same content.
  • (Severity: Low) Harden the low-level terminal Sanitize(...) transform execution path against mutation XSS in custom foreign-namespace policies. Previously, a direct terminal sanitize pass in the transform runtime could sanitize MathML/SVG content into output that looked inert in memory but became active HTML, such as <img onerror>, after a later HTML reparse.
  • (Severity: Low) Harden HTML comment serialization against additional breakout payloads from programmatic Comment(...) nodes. Previously, comment data beginning with > or -> closed the comment immediately on reparse, so it serialized into an empty HTML comment followed by live markup such as an injected <img onerror>.
  • (Severity: Moderate) Harden custom foreign-namespace policies against SVG filter="url(...)" fetches. Previously, preserved filter presentation attributes could contain external url(...) references that bypassed URL sanitization and triggered browser fetches.
  • (Severity: Moderate) Harden sanitize() and sanitize_dom() against mutation XSS in custom foreign-namespace policies. Previously, crafted MathML/SVG parser-differential payloads could sanitize into output that looked inert in memory but became active HTML, such as <img onerror>, after a later HTML reparse.
  • (Severity: Low) Harden HTML serialization against rawtext breakout injection from programmatic script and style nodes. Previously, text such as </style><img ...> or </script><img ...> could serialize into active markup through to_html() and downstream to_markdown(html_passthrough=True).
  • (Severity: Low) Harden compiled sanitize-pipeline caching against cache mutation. Previously, once a policy’s compiled sanitizer had been warmed, mutating the cached transform list in place could weaken later sanitize(), sanitize_dom(), and JustHTML(..., sanitize=True) calls, including on the exported default policies.
  • (Severity: Low) Harden the programmatic DOM APIs against cycle creation. Previously, creating parent/child cycles with append_child(), insert_before(), or replace_child() could make operations such as to_html() and sanitize_dom() loop indefinitely on attacker-controlled node graphs.
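The comment-breakout fix lines up with the HTML Living Standard's syntax rule for comments: the text must not start with ">" or "->", must not contain "<!--", "-->", or "--!>", and must not end with "<!-". A minimal sketch of a serializer-side check under that rule follows. `is_safe_comment_data` is a hypothetical helper, not part of JustHTML's API, and rejecting unsafe data is only one possible policy; JustHTML may escape or rewrite such data instead.

```python
def is_safe_comment_data(data: str) -> bool:
    """Return True if `data` can sit between <!-- and --> without breaking out.

    Hypothetical helper following the HTML Living Standard's comment syntax rules.
    """
    if data.startswith(">") or data.startswith("->"):
        # "<!-->" and "<!--->" are parsed as an empty comment that closes immediately.
        return False
    if any(seq in data for seq in ("<!--", "-->", "--!>")):
        # "-->" and "--!>" end the comment early; nested "<!--" is also disallowed.
        return False
    if data.endswith("<!-"):
        # Ending with "<!-" would join with the closing "-->" to form a nested "<!--".
        return False
    return True
```

With this check, data such as `><img src=x onerror=alert(1)>` is rejected instead of serializing as `<!--><img src=x onerror=alert(1)>-->`, which a browser parses as an empty comment followed by a live `<img>`.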

Release v1.16.0

12 Apr 20:13

Security

  • (Severity: Low) Harden sanitization policy reuse against nested-state mutation. Previously, mutating nested policy state such as allowed_attributes or url_policy.allow_rules could leave stale compiled sanitizers active in sanitize(), sanitize_dom(), and JustHTML(..., sanitize=True), and mutating exported defaults such as DEFAULT_POLICY.url_policy.allow_rules[("a", "href")].allowed_schemes could weaken later default sanitization process-wide.
  • (Severity: Moderate) Harden sanitize_dom() and sanitize() for programmatic DOM trees with mixed-case dangerous tag names. Previously, nodes such as ScRiPt or Style could miss the drop_content_tags policy in the in-memory sanitization path and incorrectly preserve their children.
  • (Severity: Low) Normalize SanitizationPolicy.drop_content_tags to lowercase. Previously, custom policies using values such as {"SCRIPT"} could silently fail to drop dangerous subtrees in the in-memory sanitization APIs.
  • (Severity: Low) Harden doctype serialization against programmatic doctype-name injection. Previously, a crafted doctype(...) or manual !doctype node name such as html><img ...> could serialize into active markup before the document body.
  • (Severity: Moderate) Harden custom foreign-namespace policies against SVG animation-based URL mutation. Previously, preserved SVG animation elements such as <set> or <animate> could mutate already-sanitized attributes like image[href] after sanitization and trigger remote requests that bypassed the configured URL rules.
  • (Severity: Moderate) Harden custom foreign-namespace policies against SVG url(...) presentation-attribute fetches. Previously, preserved attributes such as fill, clip-path, mask, marker-start, and cursor could contain external url(...) references that bypassed URL sanitization and triggered browser fetches.
  • (Severity: Moderate) Harden rawtext sanitization against mixed-case programmatic style and script tag names. Previously, custom policies that preserved mixed-case nodes such as StYlE could bypass the rawtext hardening pass and keep active stylesheet content such as remote @import rules.
  • (Severity: Moderate) Harden sanitization against programmatic DOM namespace confusion for svg and math subtrees. Previously, nodes constructed with namespace="html" but serialized as <svg>...</svg> could bypass foreign-content checks in sanitize() and sanitize_dom(), allowing active SVG features such as url(...) presentation attributes or animation-based attribute mutation to survive.
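One way to picture the url(...) presentation-attribute hardening is a check that accepts only same-document fragment references such as `url(#grad)` and rejects anything else that looks like a URL. The sketch below assumes fragment references are the only acceptable form; the helper name and attribute set are hypothetical (the set mirrors the attributes named in these notes plus a few siblings), and it fails closed on CSS escapes rather than decoding them. It illustrates the idea and is not JustHTML's implementation.

```python
import re

# Hypothetical set of SVG presentation attributes that can carry url(...) references.
URL_BEARING_PRESENTATION_ATTRS = frozenset({
    "fill", "stroke", "clip-path", "mask", "filter",
    "marker-start", "marker-mid", "marker-end", "cursor",
})

# Matches url(#id), url('#id') or url("#id"): a same-document fragment reference.
_LOCAL_URL = re.compile(r"""url\(\s*(['"]?)#[A-Za-z0-9_\-.:]+\1\s*\)""", re.IGNORECASE)


def presentation_value_is_safe(attr: str, value: str) -> bool:
    """Return False if `value` could make the browser fetch an external resource."""
    if attr.lower() not in URL_BEARING_PRESENTATION_ATTRS:
        return True
    if "\\" in value:
        # CSS escapes could spell "url" indirectly; fail closed instead of decoding.
        return False
    # Remove same-document references; any url-like text left over is external.
    remainder = _LOCAL_URL.sub("", value)
    return "url" not in remainder.lower()
```

Stripping the allowed local references and then searching for any leftover "url" keeps the check fail-closed: an unusual spelling that the regex does not recognize is rejected rather than passed through.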

Release v1.15.0

09 Apr 05:46

Security

  • (Severity: Low) Harden HTML comment serialization against comment-breakout injection. Previously, programmatic Comment(...) nodes or transform-produced comment data containing sequences like --> could serialize into active HTML such as injected <img onerror>.
  • (Severity: Low) Harden HTML serialization and the builder against unsafe programmatic element and attribute names. Previously, direct Node(...) usage, transform-produced attrs, or builder.element(...) calls could emit attacker-controlled markup such as injected <img onerror> by including syntax-breaking characters in a tag or attribute name.
  • (Severity: Moderate) Harden JustHTML.clean_url_value(...) and clean_url_in_js_string(...) against HTML character reference smuggling such as javascript&#58..., which could bypass URL scheme validation and become an active javascript: URL after HTML attribute parsing.
  • (Severity: Low) Harden URL sanitization against browser backslash normalization. Previously, “relative” URLs such as \\evil.example/x or /\\evil.example/x could survive sanitization and be interpreted by browsers as remote network requests, bypassing relative-only URL rules such as the default img[src] policy.
  • (Severity: Low) Harden URL sanitization and clean_url_value(...) against malformed bracketed hosts when allowed_hosts is enabled. Previously, inputs such as https://[evil.example]/x could raise ValueError from Python’s URL parser and crash sanitization instead of being rejected.
  • (Severity: Low) Harden to_markdown(html_passthrough=True) for sanitized <textarea> content. Previously, attacker-controlled </textarea> sequences could survive sanitization as text, then break out during Markdown HTML passthrough and turn into active HTML when the Markdown output was reparsed or rendered.
  • (Severity: Low) Harden a[ping] sanitization. Previously, ping was treated as a single URL even though browsers interpret it as a space-separated list of URLs, so a custom policy could allow a trusted first endpoint while unintentionally preserving additional attacker-controlled ping URLs.
  • (Severity: Low) Harden preserved <style> blocks in custom policies. Previously, JustHTML only neutralized HTML parser breakouts inside allowed <style> elements; resource-loading CSS such as @import, url(...), image-set(...), and legacy binding/filter constructs could still survive unchanged.
  • (Severity: Low) Harden preserved <meta http-equiv="refresh"> tags in custom policies. Previously, the content attribute was treated as inert text even though browsers interpret it as a client-side redirect instruction, so refresh targets could survive without any URL policy.
  • (Severity: Low) Harden link[imagesrcset] sanitization in custom policies. Previously, imagesrcset was not treated as URL-bearing at all, so <link rel="preload" as="image"> could preserve attacker-controlled remote image candidates without any URL validation.
  • (Severity: Low) Harden attributionsrc sanitization in custom policies. Previously, attributionsrc was not treated as URL-bearing at all, so elements such as <img> could preserve attacker-controlled attribution-reporting endpoints and trigger extra browser requests without any URL validation.
  • (Severity: Low) Harden security-related attribute transforms against mixed-case attribute names in custom pipelines. Previously, transforms such as DropAttrs(...), DropUrlAttrs(...), AllowStyleAttrs(...), and MergeAttrs(...) could miss or mis-handle OnClick, SrcDoc, Href, Style, Rel, and similar mixed-case variants unless an earlier step had already normalized names to lowercase.
  • (Severity: Low) Harden preserved <base href> tags in custom policies. Previously, a kept <base href="..."> could rewrite how later relative URLs resolved in the browser, bypassing per-attribute relative-only URL rules such as img[src].
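Several of the URL fixes in this release come down to normalizing a value the way a browser will before checking it: decoding character references once (so `javascript&#58...` becomes `javascript:`), removing the tab, newline, and control characters the URL parser ignores, and treating backslashes as slashes. A minimal sketch under those assumptions follows. `classify_url` and its scheme allowlist are hypothetical and made up for illustration; JustHTML's actual URL policy is configured per tag and attribute and may differ.

```python
import html
import re

# Illustrative allowlist only; real policies are configured per tag and attribute.
ALLOWED_SCHEMES = frozenset({"http", "https", "mailto"})

# The WHATWG URL parser removes every ASCII tab/newline and trims leading and
# trailing C0 controls and spaces (U+0000 to U+0020).
_TAB_OR_NEWLINE = re.compile(r"[\t\n\r]")
_C0_OR_SPACE = "".join(chr(cp) for cp in range(0x21))
_SCHEME = re.compile(r"([A-Za-z][A-Za-z0-9+.\-]*):")


def classify_url(raw_attr_value: str) -> str:
    """Classify a raw attribute value as 'relative', 'remote', 'allowed', or 'blocked'."""
    # 1. Decode character references once, as HTML attribute parsing does:
    #    "javascript&#58alert(1)" becomes "javascript:alert(1)".
    value = html.unescape(raw_attr_value)
    # 2. Drop characters the URL parser ignores.
    value = _TAB_OR_NEWLINE.sub("", value).strip(_C0_OR_SPACE)
    # 3. Treat backslashes as slashes, as browsers do for http(s)-style URLs.
    value = value.replace("\\", "/")
    if value.startswith("//"):
        # Protocol-relative after normalization, so it reaches another host.
        return "remote"
    match = _SCHEME.match(value)
    if match:
        return "allowed" if match.group(1).lower() in ALLOWED_SCHEMES else "blocked"
    return "relative"
```

A relative-only rule such as the default img[src] policy would accept only the "relative" result, so inputs like `\\evil.example/x` are refused because they classify as "remote" once backslashes are normalized.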

Release v1.14.0

05 Apr 10:44

Security

  • (Severity: Moderate) Harden constructor-time sanitization against mutation XSS in custom policies that preserve foreign namespaces such as MathML or SVG. Previously, crafted markup could sanitize into output that looked safe but became active HTML when reparsed by a browser or downstream parser.

Release v1.13.0

21 Mar 20:46

Security

  • (Severity: High) Harden fenced code generation in to_markdown() by choosing backtick delimiters longer than any run inside <pre> content, preventing attacker-controlled backticks from breaking out of code blocks and exposing raw HTML to downstream Markdown renderers.
  • (Severity: Low) Treat text that starts at the beginning of a rendered Markdown line as text, not block syntax, by escaping line-leading headings, blockquotes, list markers, thematic breaks, setext underlines, and fenced-code delimiters from untrusted HTML content.
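The fence-length rule works because CommonMark only closes a backtick code fence with a run of at least as many backticks as the opening fence. A minimal sketch of choosing such a fence follows; `fence_code_block` is a hypothetical helper, and it covers only the fence selection, not the line-leading escaping described in the second bullet.

```python
import re


def fence_code_block(code: str) -> str:
    """Wrap untrusted <pre> text in a backtick fence it cannot close early."""
    # A fence one backtick longer than the longest run inside the content
    # (and never shorter than the minimum of three) cannot be matched by it.
    longest_run = max((len(run) for run in re.findall(r"`+", code)), default=0)
    fence = "`" * max(3, longest_run + 1)
    body = code if code.endswith("\n") else code + "\n"
    return f"{fence}\n{body}{fence}\n"
```

For content containing a line of four backticks, the sketch emits a five-backtick fence, so the inner run stays literal text inside the code block.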

Release v1.12.0

17 Mar 21:58

Security

  • (Severity: High) Markdown output now HTML-escapes text-node content before applying Markdown escaping, preventing attacker-controlled text such as <script> from turning into raw HTML when to_markdown() output is rendered.
  • (Severity: Moderate) Sanitization now hardens script and style raw-text content by neutralizing embedded closing-tag sequences and dropping non-text children, preventing sanitized DOM trees from serializing into breakout HTML.
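The Markdown fix escapes HTML first so that text such as `<script>` reaches the renderer as character references rather than raw HTML, and only then applies Markdown escaping. A minimal sketch of that ordering follows; `markdown_text` is a hypothetical helper and the escaped punctuation set is an illustrative subset (CommonMark allows a backslash escape before any ASCII punctuation character).

```python
import html

# Illustrative subset of Markdown-significant punctuation.
_MD_SPECIAL = frozenset("\\`*_[]")


def markdown_text(text: str) -> str:
    """Render an untrusted text node as inert inline Markdown text."""
    # HTML-escape first: "<script>" becomes "&lt;script&gt;", which renders as literal text.
    escaped = html.escape(text, quote=False)
    # Then backslash-escape characters Markdown would otherwise treat as syntax.
    return "".join("\\" + ch if ch in _MD_SPECIAL else ch for ch in escaped)
```

Because `&`, `<`, and `>` are not in the Markdown set, the two escaping passes do not interfere with each other in this sketch.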

Release v1.11.0

15 Mar 22:04

Added

  • Sanitization: Add SanitizationPolicy.strip_invisible_unicode to strip invisible Unicode used for obfuscation from text and attribute values before other sanitizer checks run.

Changed

  • Sanitization: strip_invisible_unicode is enabled by default and covers variation selectors, zero-width/bidi controls, and private-use characters.

Security

  • (Severity: Low) Harden sanitization against invisible-Unicode obfuscation in text, attributes, and URL-like values such as disguised javascript: schemes.
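The categories listed for strip_invisible_unicode map fairly directly onto Unicode properties: zero-width and bidi controls are format characters (category Cf), private-use characters are category Co, and variation selectors live in known code point ranges. A minimal sketch under that assumption follows; `strip_invisible` is a hypothetical helper and may strip more or less than JustHTML does. For example, removing all of Cf also drops the soft hyphen and the zero-width joiner used in emoji sequences.

```python
import unicodedata


def strip_invisible(text: str) -> str:
    """Drop format controls, private-use characters, and variation selectors."""
    kept = []
    for ch in text:
        cp = ord(ch)
        if unicodedata.category(ch) in ("Cf", "Co"):
            # Cf covers zero-width and bidi controls; Co covers private use.
            continue
        if 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF:
            # Variation selectors are category Mn, so they are matched by range.
            continue
        kept.append(ch)
    return "".join(kept)
```

Running this before any scheme check turns a disguised value such as `java\u200bscript:` back into `javascript:`, where an ordinary scheme allowlist can reject it.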

Release v1.10.0

15 Mar 14:59

Security

  • (Severity: Low) Harden JustHTML against denial-of-service from attacker-controlled deeply nested HTML. Parsing post-processing, deep cloning, pretty HTML serialization, and Markdown rendering now use iterative traversal instead of recursion, preventing RecursionError crashes on pathological nesting.
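Replacing recursion with iteration usually means keeping an explicit stack of pending work. The sketch below serializes a tree that way, pushing each element twice: once to emit its start tag and once to emit its end tag after its children. The `Node` class and `serialize` function are hypothetical stand-ins, not JustHTML's types.

```python
from __future__ import annotations

import html
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False, repr=False)  # Generated __eq__/__repr__ would themselves recurse.
class Node:
    """Hypothetical minimal tree node; tag=None marks a text node."""
    tag: Optional[str]
    text: str = ""
    children: list[Node] = field(default_factory=list)


def serialize(root: Node) -> str:
    out: list[str] = []
    # Each entry is (node, closing); closing=True means "emit the end tag now".
    stack: list[tuple[Node, bool]] = [(root, False)]
    while stack:
        node, closing = stack.pop()
        if node.tag is None:
            out.append(html.escape(node.text, quote=False))
        elif closing:
            out.append(f"</{node.tag}>")
        else:
            out.append(f"<{node.tag}>")
            stack.append((node, True))
            # Push children in reverse so they pop off in document order.
            for child in reversed(node.children):
                stack.append((child, False))
    return "".join(out)
```

A recursive version of the same function would hit Python's default recursion limit of 1000 on a chain of a few thousand nested elements; this version's depth is bounded only by memory.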

Release v1.9.1

10 Mar 20:09

Fixed

  • Serialization: Preserve literal text inside script and style elements during HTML serialization so round-trips do not turn raw text content like > or & into entity text.
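Script and style contents are raw text to the HTML parser: character references inside them are never decoded, so escaping `>` or `&` on output changes the content after a round-trip. The flip side is that literal text must never contain the element's own end tag. A minimal sketch of both rules follows; `serialize_text` is a hypothetical helper, and raising an error is just one way to handle unsafe raw text (neutralizing the sequence is another).

```python
import html

RAWTEXT_PARENTS = frozenset({"script", "style"})


def serialize_text(parent_tag: str, text: str) -> str:
    """Serialize a text node's data given the tag name of its parent element."""
    tag = parent_tag.lower()
    if tag in RAWTEXT_PARENTS:
        # Raw text is emitted literally; the parser would not decode "&gt;" here.
        # Conservatively refuse any case-insensitive "</script" or "</style",
        # which could end the element early and expose the rest as live markup.
        if f"</{tag}" in text.lower():
            raise ValueError(f"text would close the <{tag}> element early")
        return text
    # Ordinary text is escaped so it cannot be read back as markup.
    return html.escape(text, quote=False)
```

Under these rules, `a > b & c` inside a style element serializes unchanged, while the same text in a paragraph becomes `a &gt; b &amp; c`.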

Release v1.9.0

08 Mar 22:46

Added

  • Builder: Add justhtml.builder with explicit element(), text(), comment(), and doctype() factories for programmatic HTML construction.
  • Parser: Allow JustHTML(...) to accept built nodes directly and normalize them through the existing HTML5 parser.
  • Docs: Add a dedicated Building HTML guide and expand the API/README documentation around programmatic HTML generation.

Changed

  • Sanitization: Preserve doctypes by default in document mode.
  • Sanitization: Add <caption> to the default allowed tag set.
  • Typing: Normalize SanitizationPolicy.allowed_tags to frozenset[str], improving type safety when composing policies.

Fixed

  • Builder & Serialization: Preserve arbitrary doctype names and identifiers across build/serialize/parse round-trips.
  • Builder: Reject unsupported namespaces up front; builder namespaces are limited to HTML, SVG, and MathML.