Skip to content

Conversation

@samuelhuang
Copy link
Contributor

@samuelhuang samuelhuang commented Nov 10, 2025

This CL fixes an issue where images nested within <div> elements
inside a <figure> tag were being incorrectly removed by Readability's
cleaning process. Specifically, a structure like
<figure><div><div><img></div></div></figure> would first be
transformed to <figure><div><p><img></p></div></figure>, and then
the outer <div> (and its contents) would be erroneously identified
as extraneous and removed.

The fix introduces a targeted exception within _cleanConditionally().
It prevents the removal of <div> elements that meet the following
criteria:

  • The element is a <div>.
  • The <div> is an ancestor of a <figure> element.
  • The <div> contains a single <img> element (potentially nested).

Also add test case allrecipes-1 taken from:
https://www.allrecipes.com/hot-honey-brussels-sprouts-recipe-11832010

@samuelhuang
Copy link
Contributor Author

Context for this change: For recipe website:

https://www.allrecipes.com/hot-honey-brussels-sprouts-recipe-11832010

Readability omits image under "Directions" section.

@samuelhuang
Copy link
Contributor Author

The "1 failing check" was from npm audit, which disappears when I run npm audit fix. But running this updates package-lock.json. Should I bundle the change to this CL?

@gijsk
Copy link
Contributor

gijsk commented Nov 17, 2025

Apologies for the slow response, I was at a dedicated workweek last week. In addition to reviewing this PR more closely, I'll try to find some time this week or next to get some other folks to sign up to help with reviews here.

The "1 failing check" was from npm audit, which disappears when I run npm audit fix. But running this updates package-lock.json. Should I bundle the change to this CL?

Ideally submit a separate CL with just that change?

Context for this change: For recipe website:

https://www.allrecipes.com/hot-honey-brussels-sprouts-recipe-11832010

Readability omits image under "Directions" section.

Would you mind adding this or a similar case as a testcase so we don't regress this in future?

}

// Handle <img> buried inside nested <div> layers in <figure>.
if (tag === "div" && this._hasAncestorTag(node, "figure") && this._isSingleImage(node)) {
Copy link
Contributor

@gijsk gijsk Nov 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works but is a little inefficient if the divs are all nested, because the getElementsByTagName call that is being used for the divs here will find each of the ancestor divs and re-walk it to find the same image (and the ancestor chain to find the same figure).

Would it be feasible to de-nest singly nested divs with images? Along the lines of _prepareBrs, do a prepareImages or something like that. Presumably the nesting of all the divs isn't actually useful for anything if we're removing content otherwise - though I suppose there will be a question of what to do with ids/classnames in terms of scoring...

Also, I'm curious how this works out when there are figure tags with these deeply nested images that also have other content in them (so they fail the _isSingleImage check). From a naive inspection of the page you linked (haven't tried to debug readability right now), it would seem that the page in question does have a noscript tag in the figure as well...

... in fact, why doesn't the logic in this helper already deal with all of this? Why does it fall down for this website?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The <noscript> tags in our example embeds <img> as escaped text content, not HTML tags. That seems to be why _isSingleImage() fails the check.

@gijsk
Copy link
Contributor

gijsk commented Nov 18, 2025

The "1 failing check" was from npm audit, which disappears when I run npm audit fix. But running this updates package-lock.json. Should I bundle the change to this CL?

Ideally submit a separate CL with just that change?

Perhaps this was #990 ? I just merged that, hopefully just rebasing will address this.

@gijsk
Copy link
Contributor

gijsk commented Nov 18, 2025

Egh, sorry for the spam, but actually looking at the logfile for the failing PR I think the npm audit thing is more visible due to the coloured formatting, but the actual problem is probably:

[warn] Readability.js
[warn] Code style issues found in the above file. Run Prettier with --write to fix.

? Which would kind of make sense, in that I didn't think the automation would block on npm audit issues at all...

@samuelhuang
Copy link
Contributor Author

samuelhuang commented Nov 25, 2025

Okay I had more time to dig into _unwrapNoscriptImages() further, and see why it didn't help. First let's list relevant code:

  _unwrapNoscriptImages(doc) {

   ...

    // Next find noscript and try to extract its image
    var noscripts = Array.from(doc.getElementsByTagName("noscript"));
    this._forEachNode(noscripts, function (noscript) {
      // Parse content of noscript and make sure it only contains image
      if (!this._isSingleImage(noscript)) {
        return;
      }
      var tmp = doc.createElement("div");
      // We're running in the document context, and using unmodified
      // document contents, so doing this should be safe.
      // (Also we heavily discourage people from allowing script to
      // run at all in this document...)
      // eslint-disable-next-line no-unsanitized/property
      tmp.innerHTML = noscript.innerHTML;

Background info: When JS is available, <noscript> tag contents become text that don't get shown.

  1. _unwrapNoscriptImages() was added here, and had:

         // In jsdom content of noscript is treated as string, so here we parse it.
          var tmp = doc.createElement("div");
          tmp.innerHTML = noscript.innerHTML;
    
          // Make sure noscript only has one children, and it's <img> element
          var children = tmp.children;
          if (children.length != 1 || children[0].tagName !== "IMG") {
            return;
          }

    Problem: When a <noscript> element is cloned, then its innerHTML gets an extra escape. I saw this on Chrome and Firefox. Test code:

    <!DOCTYPE html>
    <html>
    <head>
      <meta charset="utf-8">
      <title>&lt;noscript&gt; innerHTML Discrepancy</title>
      <script>
    document.addEventListener('DOMContentLoaded', () => {
      const ns = document.querySelector('noscript');
      const cns = document.cloneNode(true).querySelector('noscript');
      console.log(ns.textContent);
      console.log(ns.innerHTML);
      console.log(cns.textContent);
      console.log(cns.innerHTML);
    });
      </script>
    </head>
    <body>
    <noscript>
      <div>Hello</div>
    </noscript>
    </body>
    </html>

    the console output is

      <div>Hello</div>
      <div>Hello</div>
      <div>Hello</div>
      &lt;div&gt;Hello&lt;/div&gt;
    

    Likely solution: Pass noscript.textContent instead of noscript.innerHTML

  2. Turns out the code above don't get called at all: This 2024 CL moved the early-exit code earlier:

       if (!this._isSingleImage(noscript)) {
         return;
       }
       var tmp = doc.createElement("div");
       // We're running in the document context, and using unmodified
       // document contents, so doing this should be safe.
       // (Also we heavily discourage people from allowing script to
       // run at all in this document...)
       // eslint-disable-next-line no-unsanitized/property
       tmp.innerHTML = noscript.innerHTML;
    

    But this just passes text to _isSingleImage(), which (in versions then and now) would assumes an Element is given; so this fails, and the unwrap code is skipped.

    Likely solution: Revert the change.

  3. Even if the above problems don't exist, in our test page , both <img> and the succeeding <noscript> are both nested under 2 layers of <div>s. Therefore unwrap wouldn't help us anyway!

So we should keep the approach of this CL?

This CL fixes an issue where images nested within `<div>` elements
inside a `<figure>` tag were being incorrectly removed by Readability's
cleaning process. Specifically, a structure like
`<figure><div><div><img></div></div></figure>` would first be
transformed to `<figure><div><p><img></p></div></figure>`, and then
the outer `<div>` (and its contents) would be erroneously identified
as extraneous and removed.

The fix introduces a targeted exception within _cleanConditionally().
It prevents the removal of `<div>` elements that meet the following
criteria:
* The element is a `<div>`.
* The `<div>` is an ancestor of a `<figure>` element.
* The `<div>` contains a single `<img>` element (potentially nested).

Also add test case allrecipes-1 taken from:
https://www.allrecipes.com/hot-honey-brussels-sprouts-recipe-11832010
@samuelhuang samuelhuang changed the title Preserve images within figure elements. Preserve images within figure elements Nov 28, 2025
@samuelhuang
Copy link
Contributor Author

Added new test case, PTAL. Thanks!

@gijsk
Copy link
Contributor

gijsk commented Dec 3, 2025

Okay I had more time to dig into _unwrapNoscriptImages() further, and see why it didn't help.

Thanks for doing this digging, @samuelhuang !

2. Turns out the code above don't get called at all: This [2024 CL](https://github.com/mozilla/readability/commit/0727c6e6446c9ac7e78705353cbd1334f77b1c84) moved the early-exit code earlier:
   ```
      if (!this._isSingleImage(noscript)) {
        return;
      }
      var tmp = doc.createElement("div");
      // We're running in the document context, and using unmodified
      // document contents, so doing this should be safe.
      // (Also we heavily discourage people from allowing script to
      // run at all in this document...)
      // eslint-disable-next-line no-unsanitized/property
      tmp.innerHTML = noscript.innerHTML;
   ```
     
   But this just passes text to `_isSingleImage()`, which (in versions then and now) would assumes an Element is given; so this fails, and the unwrap code is skipped.
   Likely solution: Revert the change.

To be clear, this is still passing an element, but the element only has text content rather than a DOM tree representing that "HTML" (text content), is what you mean, right?

I was confused why this would not have caused test failures if it was so different from before (without the code moving). From some searching, it seems we're not encountering issues in automation because jsdom parses with scripts disabled, in which case markup inside noscript is parsed into the DOM as per spec, but web browsers don't (unless scripting is disabled in the browser, I guess), which leads to the escaped content that you mentioned earlier. This is an HTML spec feature. We don't want to enable script loading in the unit tests in jsdom. So in that case, the automated tests would work, and the library is working for consumers using jsdom (without scripts) but not working in web browsers. That's unfortunate. :-(

Perhaps we should update JSDOMParser.js in this same repo to behave like browsers do for <noscript> just to improve test coverage?

3. Even if the above problems don't exist, in our [test page](https://www.allrecipes.com/hot-honey-brussels-sprouts-recipe-11832010) , both `<img>` and the succeeding `<noscript>` are both nested under 2 layers of `<div>`s. Therefore unwrap wouldn't help us anyway!

I think this is the part I would prefer to fix, if possible? I tried to do this in #993 . PTAL and let me know what you think? I wanted to link to a comparison of the expected.html output but I can't figure out how to get github to do this... for convenience, this is a simple git formatted diff between the "post patch" state of your PR and mine:

diff --git a/test/test-pages/allrecipes-1/expected.html b/test/test-pages/allrecipes-1/expected.html
index 2190e23d7d..2939e04280 100644
--- a/test/test-pages/allrecipes-1/expected.html
+++ b/test/test-pages/allrecipes-1/expected.html
@@ -12,9 +12,7 @@
                         <div data-tooltip="Nicole Russell is a prolific contributor to Allrecipes and an avid member of the Allrecipes Allstars." tabindex="0" data-inline-tooltip="true">
                             <p><a href="https://www.allrecipes.com/nicole-russell-7113592" rel="nocaes" data-trigger-link="true" tabindex="-1">Nicole Russell</a></p>
                             <div id="mntl-author-tooltip_1-0">
-                                <div>
-                                    <p><img src="https://www.allrecipes.com/thmb/uYVcXCSJ-dbRVP6aENNapCfLdlY=/75x75/filters:no_upscale():max_bytes(150000):strip_icc()/NicoleRussellSoupLovingNicole-9607577801804bd09ea93bbdb7fcae6d.jpg" width="75" height="75" srcset="https://www.allrecipes.com/thmb/Trlj084IC3wBi4LKJPIKHEidK_Y=/40x0/filters:no_upscale():max_bytes(150000):strip_icc()/NicoleRussellSoupLovingNicole-9607577801804bd09ea93bbdb7fcae6d.jpg 40w, https://www.allrecipes.com/thmb/PjK7K3x5635tX12yHK_IGTJCha8=/58x0/filters:no_upscale():max_bytes(150000):strip_icc()/NicoleRussellSoupLovingNicole-9607577801804bd09ea93bbdb7fcae6d.jpg 58w, https://www.allrecipes.com/thmb/r4I1SWydIlNIwza6llxnx6EUpZ4=/76x0/filters:no_upscale():max_bytes(150000):strip_icc()/NicoleRussellSoupLovingNicole-9607577801804bd09ea93bbdb7fcae6d.jpg 76w, https://www.allrecipes.com/thmb/pp0xDFVleEUku7amHGoSEAIxVdw=/94x0/filters:no_upscale():max_bytes(150000):strip_icc()/NicoleRussellSoupLovingNicole-9607577801804bd09ea93bbdb7fcae6d.jpg 94w, https://www.allrecipes.com/thmb/XIiaJQ1BUWIDHcE9XW8jQyRKIBQ=/112x0/filters:no_upscale():max_bytes(150000):strip_icc()/NicoleRussellSoupLovingNicole-9607577801804bd09ea93bbdb7fcae6d.jpg 112w" sizes="80px" alt="Nicole Russell" data-src="https://www.allrecipes.com/thmb/uYVcXCSJ-dbRVP6aENNapCfLdlY=/75x75/filters:no_upscale():max_bytes(150000):strip_icc()/NicoleRussellSoupLovingNicole-9607577801804bd09ea93bbdb7fcae6d.jpg" data-srcset="https://www.allrecipes.com/thmb/Trlj084IC3wBi4LKJPIKHEidK_Y=/40x0/filters:no_upscale():max_bytes(150000):strip_icc()/NicoleRussellSoupLovingNicole-9607577801804bd09ea93bbdb7fcae6d.jpg 40w, https://www.allrecipes.com/thmb/PjK7K3x5635tX12yHK_IGTJCha8=/58x0/filters:no_upscale():max_bytes(150000):strip_icc()/NicoleRussellSoupLovingNicole-9607577801804bd09ea93bbdb7fcae6d.jpg 58w, https://www.allrecipes.com/thmb/r4I1SWydIlNIwza6llxnx6EUpZ4=/76x0/filters:no_upscale():max_bytes(150000):strip_icc()/NicoleRussellSoupLovingNicole-9607577801804bd09ea93bbdb7fcae6d.jpg 76w, https://www.allrecipes.com/thmb/pp0xDFVleEUku7amHGoSEAIxVdw=/94x0/filters:no_upscale():max_bytes(150000):strip_icc()/NicoleRussellSoupLovingNicole-9607577801804bd09ea93bbdb7fcae6d.jpg 94w, https://www.allrecipes.com/thmb/XIiaJQ1BUWIDHcE9XW8jQyRKIBQ=/112x0/filters:no_upscale():max_bytes(150000):strip_icc()/NicoleRussellSoupLovingNicole-9607577801804bd09ea93bbdb7fcae6d.jpg 112w" /></p>
-                                </div>
+                                <p><img src="https://www.allrecipes.com/thmb/uYVcXCSJ-dbRVP6aENNapCfLdlY=/75x75/filters:no_upscale():max_bytes(150000):strip_icc()/NicoleRussellSoupLovingNicole-9607577801804bd09ea93bbdb7fcae6d.jpg" width="75" height="75" srcset="https://www.allrecipes.com/thmb/Trlj084IC3wBi4LKJPIKHEidK_Y=/40x0/filters:no_upscale():max_bytes(150000):strip_icc()/NicoleRussellSoupLovingNicole-9607577801804bd09ea93bbdb7fcae6d.jpg 40w, https://www.allrecipes.com/thmb/PjK7K3x5635tX12yHK_IGTJCha8=/58x0/filters:no_upscale():max_bytes(150000):strip_icc()/NicoleRussellSoupLovingNicole-9607577801804bd09ea93bbdb7fcae6d.jpg 58w, https://www.allrecipes.com/thmb/r4I1SWydIlNIwza6llxnx6EUpZ4=/76x0/filters:no_upscale():max_bytes(150000):strip_icc()/NicoleRussellSoupLovingNicole-9607577801804bd09ea93bbdb7fcae6d.jpg 76w, https://www.allrecipes.com/thmb/pp0xDFVleEUku7amHGoSEAIxVdw=/94x0/filters:no_upscale():max_bytes(150000):strip_icc()/NicoleRussellSoupLovingNicole-9607577801804bd09ea93bbdb7fcae6d.jpg 94w, https://www.allrecipes.com/thmb/XIiaJQ1BUWIDHcE9XW8jQyRKIBQ=/112x0/filters:no_upscale():max_bytes(150000):strip_icc()/NicoleRussellSoupLovingNicole-9607577801804bd09ea93bbdb7fcae6d.jpg 112w" sizes="80px" alt="Nicole Russell" data-src="https://www.allrecipes.com/thmb/uYVcXCSJ-dbRVP6aENNapCfLdlY=/75x75/filters:no_upscale():max_bytes(150000):strip_icc()/NicoleRussellSoupLovingNicole-9607577801804bd09ea93bbdb7fcae6d.jpg" data-srcset="https://www.allrecipes.com/thmb/Trlj084IC3wBi4LKJPIKHEidK_Y=/40x0/filters:no_upscale():max_bytes(150000):strip_icc()/NicoleRussellSoupLovingNicole-9607577801804bd09ea93bbdb7fcae6d.jpg 40w, https://www.allrecipes.com/thmb/PjK7K3x5635tX12yHK_IGTJCha8=/58x0/filters:no_upscale():max_bytes(150000):strip_icc()/NicoleRussellSoupLovingNicole-9607577801804bd09ea93bbdb7fcae6d.jpg 58w, https://www.allrecipes.com/thmb/r4I1SWydIlNIwza6llxnx6EUpZ4=/76x0/filters:no_upscale():max_bytes(150000):strip_icc()/NicoleRussellSoupLovingNicole-9607577801804bd09ea93bbdb7fcae6d.jpg 76w, https://www.allrecipes.com/thmb/pp0xDFVleEUku7amHGoSEAIxVdw=/94x0/filters:no_upscale():max_bytes(150000):strip_icc()/NicoleRussellSoupLovingNicole-9607577801804bd09ea93bbdb7fcae6d.jpg 94w, https://www.allrecipes.com/thmb/XIiaJQ1BUWIDHcE9XW8jQyRKIBQ=/112x0/filters:no_upscale():max_bytes(150000):strip_icc()/NicoleRussellSoupLovingNicole-9607577801804bd09ea93bbdb7fcae6d.jpg 112w" /></p>
                                 <p> Nicole Russell is a prolific contributor to Allrecipes and an avid member of the Allrecipes Allstars. </p>
                             </div>
                         </div>
@@ -24,9 +22,7 @@
             </div>
             <div>
                 <figure id="figure-article_1-0" data-click-tracked="false">
-                    <div>
-                        <p><img src="https://www.allrecipes.com/thmb/RDigrCswgpVoo9dPZxanUuRIudI=/1500x0/filters:no_upscale():max_bytes(150000):strip_icc()/11832010-hot-honey-brussels-sprouts-recipe-VAT-Beauty-4x3-5ee1ce8dde46479da6759f4cf0d8181c.jpg" width="4000" height="3000" srcset="https://www.allrecipes.com/thmb/LNvP40zVI2ZbS5O3gRnmmadoMEM=/750x0/filters:no_upscale():max_bytes(150000):strip_icc()/11832010-hot-honey-brussels-sprouts-recipe-VAT-Beauty-4x3-5ee1ce8dde46479da6759f4cf0d8181c.jpg 750w" sizes="750px" alt="Dish of roasted brussels sprouts with crumbled cheese and garnish" /></p>
-                    </div>
+                    <img src="https://www.allrecipes.com/thmb/RDigrCswgpVoo9dPZxanUuRIudI=/1500x0/filters:no_upscale():max_bytes(150000):strip_icc()/11832010-hot-honey-brussels-sprouts-recipe-VAT-Beauty-4x3-5ee1ce8dde46479da6759f4cf0d8181c.jpg" width="4000" height="3000" srcset="https://www.allrecipes.com/thmb/LNvP40zVI2ZbS5O3gRnmmadoMEM=/750x0/filters:no_upscale():max_bytes(150000):strip_icc()/11832010-hot-honey-brussels-sprouts-recipe-VAT-Beauty-4x3-5ee1ce8dde46479da6759f4cf0d8181c.jpg 750w" sizes="750px" alt="Dish of roasted brussels sprouts with crumbled cheese and garnish" />
                 </figure>
                 <div id="article-content_1-0">
                     <div id="mm-recipes-details_1-0">
@@ -106,9 +102,7 @@
                                 <li id="mntl-sc-block_2-0">
                                     <p id="mntl-sc-block_3-0"> Gather all ingredients. Preheat the oven to 400 degrees F (200 degrees C). Line a large baking sheet with aluminum foil. </p>
                                     <figure id="mntl-sc-block_4-0">
-                                        <div>
-                                            <p><img src="https://www.allrecipes.com/thmb/aqdax3V1v0dM1iYGEdk80SuiUW4=/1500x0/filters:no_upscale():max_bytes(150000):strip_icc()/11832010-hot-honey-brussels-sprouts-recipe-VAT-step-01-ced062a22ba24062afd44585679f2b11.jpg" width="4000" height="3000" srcset="https://www.allrecipes.com/thmb/YgfY5c5ZN8GjWNeoWQQ5ALoNWhM=/750x0/filters:no_upscale():max_bytes(150000):strip_icc()/11832010-hot-honey-brussels-sprouts-recipe-VAT-step-01-ced062a22ba24062afd44585679f2b11.jpg 750w" sizes="750px" alt="Ingredients for a recipe including Brussels sprouts oil honey cheese and seasonings arranged in bowls on a marble surface" data-src="https://www.allrecipes.com/thmb/aqdax3V1v0dM1iYGEdk80SuiUW4=/1500x0/filters:no_upscale():max_bytes(150000):strip_icc()/11832010-hot-honey-brussels-sprouts-recipe-VAT-step-01-ced062a22ba24062afd44585679f2b11.jpg" data-srcset="https://www.allrecipes.com/thmb/YgfY5c5ZN8GjWNeoWQQ5ALoNWhM=/750x0/filters:no_upscale():max_bytes(150000):strip_icc()/11832010-hot-honey-brussels-sprouts-recipe-VAT-step-01-ced062a22ba24062afd44585679f2b11.jpg 750w" data-hi-res-src="https://www.allrecipes.com/thmb/aqdax3V1v0dM1iYGEdk80SuiUW4=/1500x0/filters:no_upscale():max_bytes(150000):strip_icc()/11832010-hot-honey-brussels-sprouts-recipe-VAT-step-01-ced062a22ba24062afd44585679f2b11.jpg" /></p>
-                                        </div>
+                                        <img src="https://www.allrecipes.com/thmb/aqdax3V1v0dM1iYGEdk80SuiUW4=/1500x0/filters:no_upscale():max_bytes(150000):strip_icc()/11832010-hot-honey-brussels-sprouts-recipe-VAT-step-01-ced062a22ba24062afd44585679f2b11.jpg" width="4000" height="3000" srcset="https://www.allrecipes.com/thmb/YgfY5c5ZN8GjWNeoWQQ5ALoNWhM=/750x0/filters:no_upscale():max_bytes(150000):strip_icc()/11832010-hot-honey-brussels-sprouts-recipe-VAT-step-01-ced062a22ba24062afd44585679f2b11.jpg 750w" sizes="750px" alt="Ingredients for a recipe including Brussels sprouts oil honey cheese and seasonings arranged in bowls on a marble surface" data-src="https://www.allrecipes.com/thmb/aqdax3V1v0dM1iYGEdk80SuiUW4=/1500x0/filters:no_upscale():max_bytes(150000):strip_icc()/11832010-hot-honey-brussels-sprouts-recipe-VAT-step-01-ced062a22ba24062afd44585679f2b11.jpg" data-srcset="https://www.allrecipes.com/thmb/YgfY5c5ZN8GjWNeoWQQ5ALoNWhM=/750x0/filters:no_upscale():max_bytes(150000):strip_icc()/11832010-hot-honey-brussels-sprouts-recipe-VAT-step-01-ced062a22ba24062afd44585679f2b11.jpg 750w" data-hi-res-src="https://www.allrecipes.com/thmb/aqdax3V1v0dM1iYGEdk80SuiUW4=/1500x0/filters:no_upscale():max_bytes(150000):strip_icc()/11832010-hot-honey-brussels-sprouts-recipe-VAT-step-01-ced062a22ba24062afd44585679f2b11.jpg" />
                                         <figcaption id="mntl-figure-caption_1-0">
                                             <p>
                                                 <span>Allrecipes / Julia Hartbeck</span>
@@ -119,9 +113,7 @@
                                 <li id="mntl-sc-block_7-0">
                                     <p id="mntl-sc-block_8-0"> Place Brussels sprouts in a bowl. Add oil, hot honey, salt, and pepper. Stir to combine and transfer to the baking sheet. </p>
                                     <figure id="mntl-sc-block_9-0">
-                                        <div>
-                                            <p><img src="https://www.allrecipes.com/thmb/eUPioUqRphcDxRjTsAyjPzwxZJc=/1500x0/filters:no_upscale():max_bytes(150000):strip_icc()/11832010-hot-honey-brussels-sprouts-recipe-VAT-step-02-9032bfa0cf5f484fb0effe17210fb06a.jpg" width="4000" height="3000" srcset="https://www.allrecipes.com/thmb/MABSZFZSQiMH9-AAdON6zeW12Xc=/750x0/filters:no_upscale():max_bytes(150000):strip_icc()/11832010-hot-honey-brussels-sprouts-recipe-VAT-step-02-9032bfa0cf5f484fb0effe17210fb06a.jpg 750w" sizes="750px" alt="A baking sheet with halved Brussels sprouts arranged on foil prepared for cooking" data-src="https://www.allrecipes.com/thmb/eUPioUqRphcDxRjTsAyjPzwxZJc=/1500x0/filters:no_upscale():max_bytes(150000):strip_icc()/11832010-hot-honey-brussels-sprouts-recipe-VAT-step-02-9032bfa0cf5f484fb0effe17210fb06a.jpg" data-srcset="https://www.allrecipes.com/thmb/MABSZFZSQiMH9-AAdON6zeW12Xc=/750x0/filters:no_upscale():max_bytes(150000):strip_icc()/11832010-hot-honey-brussels-sprouts-recipe-VAT-step-02-9032bfa0cf5f484fb0effe17210fb06a.jpg 750w" data-hi-res-src="https://www.allrecipes.com/thmb/eUPioUqRphcDxRjTsAyjPzwxZJc=/1500x0/filters:no_upscale():max_bytes(150000):strip_icc()/11832010-hot-honey-brussels-sprouts-recipe-VAT-step-02-9032bfa0cf5f484fb0effe17210fb06a.jpg" /></p>
-                                        </div>
+                                        <img src="https://www.allrecipes.com/thmb/eUPioUqRphcDxRjTsAyjPzwxZJc=/1500x0/filters:no_upscale():max_bytes(150000):strip_icc()/11832010-hot-honey-brussels-sprouts-recipe-VAT-step-02-9032bfa0cf5f484fb0effe17210fb06a.jpg" width="4000" height="3000" srcset="https://www.allrecipes.com/thmb/MABSZFZSQiMH9-AAdON6zeW12Xc=/750x0/filters:no_upscale():max_bytes(150000):strip_icc()/11832010-hot-honey-brussels-sprouts-recipe-VAT-step-02-9032bfa0cf5f484fb0effe17210fb06a.jpg 750w" sizes="750px" alt="A baking sheet with halved Brussels sprouts arranged on foil prepared for cooking" data-src="https://www.allrecipes.com/thmb/eUPioUqRphcDxRjTsAyjPzwxZJc=/1500x0/filters:no_upscale():max_bytes(150000):strip_icc()/11832010-hot-honey-brussels-sprouts-recipe-VAT-step-02-9032bfa0cf5f484fb0effe17210fb06a.jpg" data-srcset="https://www.allrecipes.com/thmb/MABSZFZSQiMH9-AAdON6zeW12Xc=/750x0/filters:no_upscale():max_bytes(150000):strip_icc()/11832010-hot-honey-brussels-sprouts-recipe-VAT-step-02-9032bfa0cf5f484fb0effe17210fb06a.jpg 750w" data-hi-res-src="https://www.allrecipes.com/thmb/eUPioUqRphcDxRjTsAyjPzwxZJc=/1500x0/filters:no_upscale():max_bytes(150000):strip_icc()/11832010-hot-honey-brussels-sprouts-recipe-VAT-step-02-9032bfa0cf5f484fb0effe17210fb06a.jpg" />
                                         <figcaption id="mntl-figure-caption_2-0">
                                             <p>
                                                 <span>Allrecipes / Julia Hartbeck</span>
@@ -132,9 +124,7 @@
                                 <li id="mntl-sc-block_12-0">
                                     <p id="mntl-sc-block_13-0"> Roast in the preheated oven for 20 minutes. </p>
                                     <figure id="mntl-sc-block_14-0">
-                                        <div>
-                                            <p><img src="https://www.allrecipes.com/thmb/UB3Jz-pfoXewTAcjSJz4RBjrigY=/1500x0/filters:no_upscale():max_bytes(150000):strip_icc()/11832010-hot-honey-brussels-sprouts-recipe-VAT-step-03-8b8c035f2a964810b5e253bb4a95ce8d.jpg" width="4000" height="3000" srcset="https://www.allrecipes.com/thmb/2eqyp1WAanqlJ_UvmyL7dRxlfoM=/750x0/filters:no_upscale():max_bytes(150000):strip_icc()/11832010-hot-honey-brussels-sprouts-recipe-VAT-step-03-8b8c035f2a964810b5e253bb4a95ce8d.jpg 750w" sizes="750px" alt="Roasted Brussels sprouts on a foillined baking sheet" data-src="https://www.allrecipes.com/thmb/UB3Jz-pfoXewTAcjSJz4RBjrigY=/1500x0/filters:no_upscale():max_bytes(150000):strip_icc()/11832010-hot-honey-brussels-sprouts-recipe-VAT-step-03-8b8c035f2a964810b5e253bb4a95ce8d.jpg" data-srcset="https://www.allrecipes.com/thmb/2eqyp1WAanqlJ_UvmyL7dRxlfoM=/750x0/filters:no_upscale():max_bytes(150000):strip_icc()/11832010-hot-honey-brussels-sprouts-recipe-VAT-step-03-8b8c035f2a964810b5e253bb4a95ce8d.jpg 750w" data-hi-res-src="https://www.allrecipes.com/thmb/UB3Jz-pfoXewTAcjSJz4RBjrigY=/1500x0/filters:no_upscale():max_bytes(150000):strip_icc()/11832010-hot-honey-brussels-sprouts-recipe-VAT-step-03-8b8c035f2a964810b5e253bb4a95ce8d.jpg" /></p>
-                                        </div>
+                                        <img src="https://www.allrecipes.com/thmb/UB3Jz-pfoXewTAcjSJz4RBjrigY=/1500x0/filters:no_upscale():max_bytes(150000):strip_icc()/11832010-hot-honey-brussels-sprouts-recipe-VAT-step-03-8b8c035f2a964810b5e253bb4a95ce8d.jpg" width="4000" height="3000" srcset="https://www.allrecipes.com/thmb/2eqyp1WAanqlJ_UvmyL7dRxlfoM=/750x0/filters:no_upscale():max_bytes(150000):strip_icc()/11832010-hot-honey-brussels-sprouts-recipe-VAT-step-03-8b8c035f2a964810b5e253bb4a95ce8d.jpg 750w" sizes="750px" alt="Roasted Brussels sprouts on a foillined baking sheet" data-src="https://www.allrecipes.com/thmb/UB3Jz-pfoXewTAcjSJz4RBjrigY=/1500x0/filters:no_upscale():max_bytes(150000):strip_icc()/11832010-hot-honey-brussels-sprouts-recipe-VAT-step-03-8b8c035f2a964810b5e253bb4a95ce8d.jpg" data-srcset="https://www.allrecipes.com/thmb/2eqyp1WAanqlJ_UvmyL7dRxlfoM=/750x0/filters:no_upscale():max_bytes(150000):strip_icc()/11832010-hot-honey-brussels-sprouts-recipe-VAT-step-03-8b8c035f2a964810b5e253bb4a95ce8d.jpg 750w" data-hi-res-src="https://www.allrecipes.com/thmb/UB3Jz-pfoXewTAcjSJz4RBjrigY=/1500x0/filters:no_upscale():max_bytes(150000):strip_icc()/11832010-hot-honey-brussels-sprouts-recipe-VAT-step-03-8b8c035f2a964810b5e253bb4a95ce8d.jpg" />
                                         <figcaption id="mntl-figure-caption_3-0">
                                             <p>
                                                 <span>Allrecipes / Julia Hartbeck</span>
@@ -145,9 +135,7 @@
                                 <li id="mntl-sc-block_17-0">
                                     <p id="mntl-sc-block_18-0"> Transfer sprouts to a bowl. Top with feta and scallions. Toss to combine. Serve immediately. </p>
                                     <figure id="mntl-sc-block_19-0">
-                                        <div>
-                                            <p><img src="https://www.allrecipes.com/thmb/TKyvsRHU4FOU6HJ50w5StS_jOqk=/1500x0/filters:no_upscale():max_bytes(150000):strip_icc()/11832010-hot-honey-brussels-sprouts-recipe-VAT-step-04-59530f25a17644ab8d6d5ebd94c6d064.jpg" width="4000" height="3000" srcset="https://www.allrecipes.com/thmb/lUcNGpjFIbAT1nndRb9huLaamgo=/750x0/filters:no_upscale():max_bytes(150000):strip_icc()/11832010-hot-honey-brussels-sprouts-recipe-VAT-step-04-59530f25a17644ab8d6d5ebd94c6d064.jpg 750w" sizes="750px" alt="Bowl of roasted Brussels sprouts garnished with toppings and a spoon on the side" data-src="https://www.allrecipes.com/thmb/TKyvsRHU4FOU6HJ50w5StS_jOqk=/1500x0/filters:no_upscale():max_bytes(150000):strip_icc()/11832010-hot-honey-brussels-sprouts-recipe-VAT-step-04-59530f25a17644ab8d6d5ebd94c6d064.jpg" data-srcset="https://www.allrecipes.com/thmb/lUcNGpjFIbAT1nndRb9huLaamgo=/750x0/filters:no_upscale():max_bytes(150000):strip_icc()/11832010-hot-honey-brussels-sprouts-recipe-VAT-step-04-59530f25a17644ab8d6d5ebd94c6d064.jpg 750w" data-hi-res-src="https://www.allrecipes.com/thmb/TKyvsRHU4FOU6HJ50w5StS_jOqk=/1500x0/filters:no_upscale():max_bytes(150000):strip_icc()/11832010-hot-honey-brussels-sprouts-recipe-VAT-step-04-59530f25a17644ab8d6d5ebd94c6d064.jpg" /></p>
-                                        </div>
+                                        <img src="https://www.allrecipes.com/thmb/TKyvsRHU4FOU6HJ50w5StS_jOqk=/1500x0/filters:no_upscale():max_bytes(150000):strip_icc()/11832010-hot-honey-brussels-sprouts-recipe-VAT-step-04-59530f25a17644ab8d6d5ebd94c6d064.jpg" width="4000" height="3000" srcset="https://www.allrecipes.com/thmb/lUcNGpjFIbAT1nndRb9huLaamgo=/750x0/filters:no_upscale():max_bytes(150000):strip_icc()/11832010-hot-honey-brussels-sprouts-recipe-VAT-step-04-59530f25a17644ab8d6d5ebd94c6d064.jpg 750w" sizes="750px" alt="Bowl of roasted Brussels sprouts garnished with toppings and a spoon on the side" data-src="https://www.allrecipes.com/thmb/TKyvsRHU4FOU6HJ50w5StS_jOqk=/1500x0/filters:no_upscale():max_bytes(150000):strip_icc()/11832010-hot-honey-brussels-sprouts-recipe-VAT-step-04-59530f25a17644ab8d6d5ebd94c6d064.jpg" data-srcset="https://www.allrecipes.com/thmb/lUcNGpjFIbAT1nndRb9huLaamgo=/750x0/filters:no_upscale():max_bytes(150000):strip_icc()/11832010-hot-honey-brussels-sprouts-recipe-VAT-step-04-59530f25a17644ab8d6d5ebd94c6d064.jpg 750w" data-hi-res-src="https://www.allrecipes.com/thmb/TKyvsRHU4FOU6HJ50w5StS_jOqk=/1500x0/filters:no_upscale():max_bytes(150000):strip_icc()/11832010-hot-honey-brussels-sprouts-recipe-VAT-step-04-59530f25a17644ab8d6d5ebd94c6d064.jpg" />
                                         <figcaption id="mntl-figure-caption_4-0">
                                             <p>
                                                 <span>Allrecipes / Julia Hartbeck</span>

@samuelhuang
Copy link
Contributor Author

To be clear, this is still passing an element, but the element only has text content rather than a DOM tree representing that "HTML" (text content), is what you mean, right?

Yes.

I was confused why this would not have caused test failures if it was so different from before (without the code moving). From some searching, it seems we're not encountering issues in automation because jsdom parses with scripts disabled, in which case markup inside noscript is parsed into the DOM as per spec, but web browsers don't (unless scripting is disabled in the browser, I guess), which leads to the escaped content that you mentioned earlier. This is an HTML spec feature. We don't want to enable script loading in the unit tests in jsdom. So in that case, the automated tests would work, and the library is working for consumers using jsdom (without scripts) but not working in web browsers. That's unfortunate. :-(

Perhaps we should update JSDOMParser.js in this same repo to behave like browsers do for <noscript> just to improve test coverage?

Perhaps we can just read from textContent instead of innerHTML (as proposed for problem (1)) and avoid this problem? Also TIL <script> tags fortunately will NOT execute when added via innerHTML!

3. Even if the above problems don't exist, in our [test page](https://www.allrecipes.com/hot-honey-brussels-sprouts-recipe-11832010) , both `<img>` and the succeeding `<noscript>` are both nested under 2 layers of `<div>`s. Therefore unwrap wouldn't help us anyway!

I think this is the part I would prefer to fix, if possible? I tried to do this in #993 . PTAL and let me know what you think? I wanted to link to a comparison of the expected.html output but I can't figure out how to get github to do this... for convenience, this is a simple git formatted diff between the "post patch" state of your PR and mine:

Sounds good; I'm happy to abandon my pull request and take your approach!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants