Add targeted Pandoc Lua filters to eliminate HTML rendering artifacts #5

Merged
pahjbo merged 3 commits into main from copilot/build-html-anomaly-catalog on Apr 30, 2026

Copilot AI commented Apr 28, 2026

Residual LaTeX control sequences, PDF image embeds, empty inline containers, generic "ref" citation link text, and repeated boilerplate sections produce user-visible defects in rendered HTML. This PR adds four narrowly-scoped Lua filters and fixes one existing filter to address these specific artifacts.

Changes

relink-ivoa-citations.lua — fix \citep link text

The \citep branch was hardcoding "ref" as link text. Now uses bibmap[bibkey] (the document shortname) — same as the \citet branch.

-- before
local citetext = "ref"
-- after
local citetext = bibmap[bibkey]
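
A minimal sketch of the lookup, assuming bibmap is a plain Lua table mapping a bibkey to its document shortname. The link_text helper, the example key, and the fallback to the raw key when no shortname is known are assumptions made for illustration; the PR itself only shows the direct bibmap[bibkey] lookup.

```lua
-- Hypothetical helper: choose the visible link text for a citation.
-- Assumption: when the key is missing from bibmap, fall back to the
-- raw bibkey instead of the generic "ref" label.
local function link_text(bibmap, bibkey)
  local shortname = bibmap[bibkey]
  if shortname == nil or shortname == "" then
    return bibkey
  end
  return shortname
end

-- Illustrative data only; real keys come from the bibliography.
local example_map = { ["example:key"] = "ExampleDoc" }
print(link_text(example_map, "example:key"))
print(link_text(example_map, "missing:key"))
```

With a fallback of this kind, a citation whose key is missing still gets a distinguishable label rather than a nil link text.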

sanitize-raw-inline.lua — new filter

Converts or drops RawInline("latex", ...) remnants that survive pandoc's +raw_tex pass:

  • Drop: layout-only commands (\noindent, \par, \hfill, \clearpage, etc.), \index{…}, \phantom{…}, \hspace{…}, \vspace{…}
  • Expand: \TeX → TeX, \LaTeX → LaTeX, \textbackslash → \, \ldots, etc.
  • Convert: \url{…} and \href{url}{text} → RST anonymous hyperlinks
  • Unknown macros are left unchanged
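
The drop / expand / convert rules above could be sketched roughly as follows. This is not the filter's actual code: the DROP and EXPAND tables list only the macros named in this description, and classify and rst_link are hypothetical helper names. The pandoc wiring at the end only runs when the script is loaded as a Lua filter.

```lua
-- Macros removed outright (layout-only or invisible in HTML).
local DROP = {
  noindent = true, par = true, hfill = true, clearpage = true,
  index = true, phantom = true, hspace = true, vspace = true,
}
-- Macros replaced by plain text.
local EXPAND = { TeX = "TeX", LaTeX = "LaTeX", textbackslash = "\\" }

-- Build an RST anonymous hyperlink: `text <target>`__
local function rst_link(text, target)
  return "`" .. text .. " <" .. target .. ">`__"
end

-- Decide what to do with one raw LaTeX snippet.
-- Returns an action ("drop", "expand", "link", "keep") and a payload.
local function classify(tex)
  local name = tex:match("^\\(%a+)")
  if not name then return "keep", tex end
  if DROP[name] then return "drop" end
  if EXPAND[name] then return "expand", EXPAND[name] end
  if name == "url" then
    local url = tex:match("^\\url%s*{(.-)}$")
    if url then return "link", { text = url, target = url } end
  elseif name == "href" then
    local url, text = tex:match("^\\href%s*{(.-)}%s*{(.-)}$")
    if url then return "link", { text = text, target = url } end
  end
  return "keep", tex -- unknown macros are left unchanged
end

-- Filter wiring; `pandoc` is only defined when run by pandoc.
if pandoc then
  function RawInline(el)
    if el.format ~= "latex" then return nil end
    local action, payload = classify(el.text)
    if action == "drop" then
      return {} -- an empty list deletes the element
    elseif action == "expand" then
      return pandoc.Str(payload)
    elseif action == "link" then
      return pandoc.RawInline("rst", rst_link(payload.text, payload.target))
    end
    return nil -- nil leaves the element untouched
  end
end
```

Keeping the classification in a pure function separate from the pandoc wiring makes the rules easy to check in a plain Lua interpreter.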

fix-figure-media-links.lua — new filter

Replaces Image elements with a .pdf source with an RST anonymous hyperlink, since browsers cannot render PDFs inline. Alt-text (or bare filename as fallback) becomes the link label; backticks and > are escaped for valid RST.
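
A rough sketch of that replacement, under stated assumptions: is_pdf, basename, escape_label, and pdf_link are hypothetical helper names, and backslash-escaping is assumed as the escape mechanism since the description does not say how the characters are escaped.

```lua
-- True when the image source ends in .pdf (case-insensitive).
local function is_pdf(src)
  return src:lower():match("%.pdf$") ~= nil
end

-- Last path component, used as the fallback link label.
local function basename(path)
  return path:match("([^/]+)$") or path
end

-- Assumption: backticks and ">" are escaped with a backslash so the
-- label cannot terminate the RST hyperlink early.
local function escape_label(text)
  return (text:gsub("[`>]", "\\%0"))
end

-- Build the RST anonymous hyperlink that replaces the image.
local function pdf_link(src, alt)
  local label = (alt ~= nil and alt ~= "") and alt or basename(src)
  return "`" .. escape_label(label) .. " <" .. src .. ">`__"
end

-- Filter wiring; `pandoc` is only defined when run by pandoc.
if pandoc then
  function Image(el)
    if is_pdf(el.src) then
      local alt = pandoc.utils.stringify(el.caption)
      return pandoc.RawInline("rst", pdf_link(el.src, alt))
    end
    return nil -- non-PDF images are left as they are
  end
end
```

For example, pdf_link("figs/arch.pdf", "") falls back to the bare filename and yields `arch.pdf <figs/arch.pdf>`__.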

drop-empty-inline-shells.lua — new filter

Removes Span, Emph, Strong, Strikeout, Superscript, Subscript, and SmallCaps containers with no visible content (empty or whitespace-only), eliminating orphan punctuation and stray whitespace.
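
One way such a filter might look, as a sketch rather than the merged code: is_blank and drop_if_empty are hypothetical names, and "no visible content" is approximated here by the stringified text being empty or ASCII whitespace.

```lua
-- True when the text is empty or consists only of whitespace.
local function is_blank(text)
  return text:match("^%s*$") ~= nil
end

-- Filter wiring; `pandoc` is only defined when run by pandoc.
if pandoc then
  local function drop_if_empty(el)
    if is_blank(pandoc.utils.stringify(el)) then
      return {} -- an empty list deletes the element
    end
    return nil -- keep containers that have visible text
  end

  -- Apply the same check to every inline container type listed above.
  Span, Emph, Strong = drop_if_empty, drop_if_empty, drop_if_empty
  Strikeout, Superscript = drop_if_empty, drop_if_empty
  Subscript, SmallCaps = drop_if_empty, drop_if_empty
end
```

Two caveats with this approximation: a container holding only an image with empty alt text also stringifies to nothing, and a Span that exists only to carry an id would be removed too, so a production filter may need extra checks for those cases.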

drop-conformance-section.lua — new filter

Removes the "Conformance-related definitions" boilerplate section from every document. This section is standard ivoatex scaffolding (RFC 2119 keyword definitions) that is redundant when all standards are rendered together on the site. The header is matched by a case-insensitive search for "conformance" + "definition" in the heading text; all blocks belonging to that section are dropped up to the next sibling or ancestor header. Runs before number-sections.lua so the raw header text is matched before numbering prefixes are added.
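
The matching and dropping logic described above can be sketched like this. It is an illustration under assumptions, not the merged filter: is_conformance_heading and drop_section are hypothetical names, and only top-level blocks are scanned, so a header nested inside a Div would not be seen.

```lua
-- Case-insensitive check for "conformance" + "definition".
local function is_conformance_heading(text)
  local t = text:lower()
  return t:find("conformance", 1, true) ~= nil
    and t:find("definition", 1, true) ~= nil
end

-- Return the blocks with the matching section removed.
-- blocks: list of block-like values with fields t and level.
-- heading_text: function(block) -> plain heading string.
local function drop_section(blocks, heading_text)
  local kept = {}
  local skip_level = nil -- level of the section being dropped
  for _, blk in ipairs(blocks) do
    if blk.t == "Header" then
      -- A sibling or ancestor header ends the dropped section.
      if skip_level and blk.level <= skip_level then
        skip_level = nil
      end
      if not skip_level and is_conformance_heading(heading_text(blk)) then
        skip_level = blk.level
      end
    end
    if not skip_level then
      kept[#kept + 1] = blk
    end
  end
  return kept
end

-- Filter wiring; `pandoc` is only defined when run by pandoc.
if pandoc then
  function Pandoc(doc)
    doc.blocks = drop_section(doc.blocks, pandoc.utils.stringify)
    return doc
  end
end
```

Because the heading text is compared before any numbering is applied, a heading such as "Conformance-related Definitions" matches regardless of capitalization, while deeper subsections inside it are dropped along with it.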

Makefile

The four new filters are inserted after fix_internal_refs.lua and before number-sections.lua in the pandoc filter chain.

pahjbo commented Apr 28, 2026

@copilot could you also remove the "Conformance-related definitions" section from each output document

Copilot AI commented Apr 28, 2026

> @copilot could you also remove the "Conformance-related definitions" section from each output document

Done in commit 11b92dc. Added pandocCustomization/drop-conformance-section.lua — a new filter that removes the "Conformance-related definitions" section from every document. It matches the section header by a case-insensitive check for "conformance" + "definition" in the heading text, then drops all blocks belonging to that section (up to the next sibling/ancestor header). The filter runs before number-sections.lua so the header text is not yet prefixed with a number. It's been wired into the Makefile filter chain.

Warning

Firewall rules blocked me from connecting to one or more addresses.

I tried to connect to the following addresses, but was blocked by firewall rules:

  • github.com (HTTP Only)
    • Triggering command: /usr/bin/ssh /usr/bin/ssh -o SendEnv=GIT_PROTOCOL git@github.com git-upload-pack 'pahjbo/ivoatex.git' (packet block)

Copilot AI requested a review from pahjbo April 28, 2026 16:01
pahjbo marked this pull request as ready for review April 30, 2026 12:27
pahjbo left a comment

This does some useful tidying. It is not "complete" and might do a few things that we want to reverse in the future; however, it does create some "filter themes" that can be added to in future.

pahjbo merged commit 529e6dd into main Apr 30, 2026
1 check passed
pahjbo deleted the copilot/build-html-anomaly-catalog branch April 30, 2026 15:14