Add targeted Pandoc Lua filters to eliminate HTML rendering artifacts #5
Agent-Logs-Url: https://github.com/ivoa/IvoaDocViewSite/sessions/e5ca5916-94f8-4d50-8a18-3df364c75125
Co-authored-by: pahjbo <273267+pahjbo@users.noreply.github.com>
@copilot could you also remove the "Conformance-related definitions" section from each output document?
Agent-Logs-Url: https://github.com/ivoa/IvoaDocViewSite/sessions/c9f94af1-154f-4960-9fc4-e2f923d3d55e
Co-authored-by: pahjbo <273267+pahjbo@users.noreply.github.com>
Done in commit
pahjbo left a comment
This does some useful tidying. It is not "complete" and may do a few things that we might want to reverse in the future; however, it does create some "filter themes" that can be extended later.
Residual LaTeX control sequences, PDF image embeds, empty inline containers, generic "ref" citation link text, and repeated boilerplate sections produce user-visible defects in the rendered HTML. This PR adds four narrowly scoped Lua filters and fixes one existing filter to address these specific artifacts.

## Changes
### `relink-ivoa-citations.lua`: fix `\citep` link text

The `\citep` branch was hardcoding `"ref"` as the link text. It now uses `bibmap[bibkey]` (the document shortname), the same as the `\citet` branch.

### `sanitize-raw-inline.lua`: new filter

Converts or drops `RawInline("latex", ...)` remnants that survive pandoc's `+raw_tex` pass:

- Dropped: `\noindent`, `\par`, `\hfill`, `\clearpage`, etc., as well as `\index{…}`, `\phantom{…}`, `\hspace{…}` and `\vspace{…}`
- Converted to text: `\TeX` → TeX, `\LaTeX` → LaTeX, `\textbackslash` → `\`, `\ldots` → …, etc.
- Converted to links: `\url{…}` and `\href{url}{text}` → RST anonymous hyperlinks

### `fix-figure-media-links.lua`: new filter

Replaces `Image` elements that have a `.pdf` source with an RST anonymous hyperlink, since browsers generally cannot display a PDF as an inline image. The alt text (or the bare filename as a fallback) becomes the link label; backticks and `>` are escaped so the result is valid RST.

### `drop-empty-inline-shells.lua`: new filter

Removes `Span`, `Emph`, `Strong`, `Strikeout`, `Superscript`, `Subscript`, and `SmallCaps` containers that have no visible content (empty or whitespace-only), eliminating orphan punctuation and stray whitespace.

### `drop-conformance-section.lua`: new filter

Removes the "Conformance-related definitions" boilerplate section from every document. This section is standard ivoatex scaffolding (RFC 2119 keyword definitions) that is redundant when all standards are rendered together on the site. The header is matched by a case-insensitive search for both "conformance" and "definition" in the heading text; all blocks belonging to that section are dropped, up to the next sibling or ancestor header. The filter runs before `number-sections.lua`, so the raw header text is matched before numbering prefixes are added.

### `Makefile`

The four new filters are inserted after `fix_internal_refs.lua` and before `number-sections.lua` in the pandoc filter chain.
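To make the `sanitize-raw-inline.lua` behaviour described above concrete, here is a hypothetical sketch, not the PR's actual code. It assumes pandoc's "global function" filter form (pandoc calls a global named `RawInline` for each raw inline), uses example command lists rather than the filter's real ones, and emits links in standard RST anonymous-hyperlink syntax (`` `label <url>`__ ``). Escaping backslashes, backticks and angle brackets in the label is an assumption of this sketch.

```lua
-- Hypothetical sketch only: the command lists are examples, not the PR's code.

-- Argument-less commands that are removed outright.
local DROP = { noindent = true, par = true, hfill = true, clearpage = true }

-- Commands removed together with their single braced argument.
local DROP_WITH_ARG = { index = true, phantom = true, hspace = true, vspace = true }

-- Commands replaced by plain text (an optional empty "{}" is allowed).
local TEXT = { TeX = "TeX", LaTeX = "LaTeX", textbackslash = "\\", ldots = "…" }

-- Backslash-escape characters that would break an RST link label (assumed set).
local function rst_escape(s)
  return (s:gsub("([`<>\\])", "\\%1"))
end

-- Build an RST anonymous hyperlink: `label <url>`__
local function rst_link(label, url)
  return "`" .. rst_escape(label) .. " <" .. url .. ">`__"
end

-- Classify one raw LaTeX snippet. Returns "drop", "text" plus the text,
-- "rst" plus RST markup, or nil when the snippet is not recognised.
local function sanitize(tex)
  local name, rest = tex:match("^\\(%a+)%s*(.-)%s*$")
  if not name then return nil end
  if DROP[name] and rest == "" then return "drop" end
  if DROP_WITH_ARG[name] and rest:match("^%b{}$") then return "drop" end
  if TEXT[name] and (rest == "" or rest == "{}") then return "text", TEXT[name] end
  if name == "url" then
    local url = rest:match("^{(.-)}$")
    if url then return "rst", rst_link(url, url) end
  elseif name == "href" then
    local url, label = rest:match("^{(.-)}{(.*)}$")
    if url then return "rst", rst_link(label, url) end
  end
  return nil
end

-- Pandoc calls this global function for every RawInline element.
function RawInline(el)
  if el.format ~= "latex" and el.format ~= "tex" then return nil end
  local kind, value = sanitize(el.text)
  if kind == "drop" then
    return {}                              -- an empty list deletes the element
  elseif kind == "text" then
    return pandoc.Str(value)
  elseif kind == "rst" then
    return pandoc.RawInline("rst", value)  -- passed through verbatim by the RST writer
  end
  return nil                               -- unrecognised: leave untouched
end
```

Unrecognised commands (for example `\emph{x}` arriving as raw LaTeX) are left alone rather than dropped, which keeps the filter narrowly scoped. A filter like this would be run with something like `pandoc -f latex+raw_tex -t rst --lua-filter=sanitize-raw-inline.lua input.tex`.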
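The `fix-figure-media-links.lua` rule described above (PDF image → RST anonymous hyperlink, alt text or filename as label) can be sketched as follows. This is hypothetical: it assumes the alt text is read with `pandoc.utils.stringify(img.caption)`, takes the filename fallback to be the last `/`-separated path component, and escapes only backticks and `>` as the PR description states.

```lua
-- Hypothetical sketch of a fix-figure-media-links.lua-style filter.

-- Escape backticks and ">" in the label, as the PR description states.
local function escape_label(s)
  return (s:gsub("([`>])", "\\%1"))
end

-- Last path component, used when the image has no alt text.
local function basename(path)
  return path:match("([^/]+)$") or path
end

-- Return an RST anonymous hyperlink for a .pdf source, or nil otherwise.
local function pdf_link(src, alt)
  if not src:lower():match("%.pdf$") then return nil end
  local label = alt
  if label == nil or label:match("^%s*$") then label = basename(src) end
  return "`" .. escape_label(label) .. " <" .. src .. ">`__"
end

-- Pandoc calls this global function for every Image element.
function Image(img)
  local link = pdf_link(img.src, pandoc.utils.stringify(img.caption))
  if link then return pandoc.RawInline("rst", link) end
  return nil  -- non-PDF images are left as they are
end
```

Matching the extension case-insensitively means `figure.PDF` is handled the same way as `figure.pdf`.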
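A minimal sketch of the `drop-empty-inline-shells.lua` check follows, assuming that "no visible content" means only `Space`/`SoftBreak` elements, whitespace-only `Str` elements, or nested shells that are themselves blank. The container list comes from the PR description; treating `LineBreak` and every other inline type as visible is an assumption of this sketch, not confirmed behaviour of the real filter.

```lua
-- Hypothetical sketch; container list taken from the PR description.
local SHELLS = {
  Span = true, Emph = true, Strong = true, Strikeout = true,
  Superscript = true, Subscript = true, SmallCaps = true,
}

-- Inline types that render only as whitespace (assumed set).
local WHITESPACE = { Space = true, SoftBreak = true }

-- True when a list of inlines contains nothing visible.
local function is_blank(inlines)
  for _, el in ipairs(inlines) do
    if WHITESPACE[el.t] then
      -- whitespace: keep scanning
    elseif el.t == "Str" then
      if not el.text:match("^%s*$") then return false end
    elseif SHELLS[el.t] then
      if not is_blank(el.content) then return false end  -- nested shell
    else
      return false  -- Code, Math, Image, Link, LineBreak, ... count as visible
    end
  end
  return true
end

-- Delete a shell element whose content is blank; otherwise keep it.
local function drop_if_blank(el)
  if is_blank(el.content) then return {} end
  return nil
end

-- Register the same handler as a global filter function for each shell type.
for name in pairs(SHELLS) do
  _G[name] = drop_if_blank
end
```

One design point worth weighing: an empty `Span` that carries an identifier can act as a link anchor (for example, one produced from a `\label`), so a more conservative variant might keep spans whose `identifier` is non-empty.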
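The section-dropping rule in `drop-conformance-section.lua` (skip from the matching header up to the next header of the same or a higher level) can be sketched as below. This is hypothetical: it assumes a document-level `Pandoc(doc)` filter function, and the matching logic takes a `text_of` parameter (`pandoc.utils.stringify` inside pandoc) purely so that it can be exercised without pandoc.

```lua
-- Hypothetical sketch of a drop-conformance-section.lua-style filter.

-- Case-insensitive check for "conformance" and "definition" in a heading.
local function is_conformance_heading(text)
  local t = text:lower()
  return t:find("conformance", 1, true) ~= nil
     and t:find("definition", 1, true) ~= nil
end

-- Return the blocks with any matching section removed. `text_of` turns a
-- Header block into plain text.
local function drop_section(blocks, text_of)
  local kept, skip_level = {}, nil
  for _, blk in ipairs(blocks) do
    -- A sibling (same level) or ancestor (lower level) header ends the section.
    if blk.t == "Header" and skip_level and blk.level <= skip_level then
      skip_level = nil
    end
    -- Start skipping at a matching header.
    if skip_level == nil and blk.t == "Header" and is_conformance_heading(text_of(blk)) then
      skip_level = blk.level
    end
    if skip_level == nil then kept[#kept + 1] = blk end
  end
  return kept
end

-- Pandoc calls this global function once with the whole document.
function Pandoc(doc)
  doc.blocks = pandoc.Blocks(drop_section(doc.blocks, pandoc.utils.stringify))
  return doc
end
```

Because subsections (deeper headers) do not end the skip, any sub-headings inside the conformance section are removed along with it.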