24 changes: 24 additions & 0 deletions packages/docusaurus-utils/src/__tests__/markdownUtils.test.ts
@@ -184,6 +184,30 @@ describe('createExcerpt', () => {
`),
).toBe('Lorem ipsum dolor sit amet, consectetur adipiscing elit.');
});

it('creates excerpt with XML tag inside inline code', () => {
expect(
createExcerpt(dedent`
# Markdown Regular Title

This paragraph includes a link to the \`<metadata>\` documentation.
`),
).toBe(
'This paragraph includes a link to the &lt;metadata&gt; documentation.',
);
});

it('creates excerpt with XML tag inside inline code with hyperlink', () => {
expect(
createExcerpt(dedent`
# Markdown Regular Title

This paragraph includes a link to the [\`<metadata>\`](https://developer.mozilla.org/en-US/docs/Web/SVG/Element/metadata) documentation.
`),
).toBe(
'This paragraph includes a link to the &lt;metadata&gt; documentation.',
);
});
});

describe('parseMarkdownContentTitle', () => {
6 changes: 4 additions & 2 deletions packages/docusaurus-utils/src/markdownUtils.ts
@@ -128,6 +128,10 @@ export function createExcerpt(fileString: string): string | undefined {
}

const cleanedLine = fileLine
// Remove inline code.
Collaborator

This code comment is wrong: we no longer remove inline code, we unwrap it.

.replace(/`(?<text>.+?)`/g, (_match, p1) => {
return p1.replaceAll('<', '&lt;').replaceAll('>', '&gt;');
Collaborator

You are not supposed to escape the string here: escaping is done automatically later in the process, if needed (not the case for metadata).

Otherwise, you end up with this:

Image

And crawlers see this:

Image
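
A minimal sketch of the failure mode described above, assuming a later step HTML-escapes the excerpt again. `escapeHtmlSketch` is a hypothetical stand-in for that step, not the actual Docusaurus code:

```typescript
// Hypothetical stand-in for whatever escapes the excerpt later in the pipeline.
function escapeHtmlSketch(value: string): string {
  return value
    .split('&').join('&amp;')
    .split('<').join('&lt;')
    .split('>').join('&gt;');
}

// Excerpt that was already escaped inside createExcerpt (what this PR does).
const preEscaped = 'the &lt;metadata&gt; documentation';
// Excerpt left unescaped, as suggested in the review.
const unescaped = 'the <metadata> documentation';

// Escaping a pre-escaped string also escapes the ampersands, so the
// entities themselves become visible text: '&amp;lt;metadata&amp;gt;'.
const doubleEscaped = escapeHtmlSketch(preEscaped);

// Escaping the raw string once yields the intended '&lt;metadata&gt;'.
const escapedOnce = escapeHtmlSketch(unescaped);

console.log(doubleEscaped);
console.log(escapedOnce);
```

Under that assumption, the excerpt should leave `createExcerpt` unescaped so that each consumer can escape it once, or not at all.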

})
// Remove HTML tags.
.replace(/<[^>]*>/g, '')
// Remove Title headers
@@ -144,8 +148,6 @@ export function createExcerpt(fileString: string): string | undefined {
.replace(/\[\^.+?\](?:: .*$)?/g, '')
// Remove inline links.
.replace(/\[(?<alt>.*?)\][[(].*?[\])]/g, '$1')
// Remove inline code.
.replace(/`(?<text>.+?)`/g, '$1')
Collaborator

I understand why you moved this to the top: to escape tags before HTML tags get removed.

Unfortunately, escaping doesn't work. One solution that could work is to use marker tags for the `<` and `>` found in inline code: after removing the HTML tags, you could replace these markers with their original `<` and `>` values.
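
The marker idea could be sketched roughly as follows. This is only an illustration: the `LT_MARKER` and `GT_MARKER` constants and the `unwrapInlineCodeSketch` function are hypothetical names, the private-use code points are an assumption (any placeholder that cannot occur in the input would do), and only the inline-code and HTML-tag steps of `createExcerpt` are reproduced:

```typescript
// Hypothetical placeholders from the Unicode private-use area,
// assumed never to appear in real Markdown input.
const LT_MARKER = '\uE000';
const GT_MARKER = '\uE001';

function unwrapInlineCodeSketch(line: string): string {
  return (
    line
      // Unwrap inline code, hiding its angle brackets behind markers.
      .replace(/`(.+?)`/g, (_match: string, text: string) =>
        text.split('<').join(LT_MARKER).split('>').join(GT_MARKER),
      )
      // Remove HTML tags (same regex as in the diff above).
      .replace(/<[^>]*>/g, '')
      // Restore the original, unescaped angle brackets.
      .split(LT_MARKER)
      .join('<')
      .split(GT_MARKER)
      .join('>')
  );
}

console.log(
  unwrapInlineCodeSketch('See the `<metadata>` docs and <b>bold</b> text.'),
);
// prints "See the <metadata> docs and bold text."
```

With this approach the excerpt keeps the raw `<metadata>` text, so the tests added in this PR would need to expect `<metadata>` rather than `&lt;metadata&gt;`.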

// Remove blockquotes.
.replace(/^\s{0,3}>\s?/g, '')
// Remove admonition definition.