Feature/unicode support #76
Open
m-messer wants to merge 9 commits into
Problem: The pdf-generator had no automated tests and no support for non-Latin scripts. There was no way to verify that the markdown→PDF pipeline worked correctly, and documents containing Korean, Chinese, or Japanese characters failed to render. CJK (Chinese, Japanese, Korean) scripts require fonts with tens of thousands of glyphs and a dedicated TeX package. Standard Latin fonts like lmodern lack these glyphs, so without explicit CJK support the characters are either silently dropped or cause a compilation error. In addition, the API had no way to pass language configuration to Pandoc, so CJK rendering could not be enabled even when suitable fonts were installed.
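Because missing glyphs can be dropped without any error, a content-verification test has to compare the CJK characters in the source against the text recovered from the PDF. The sketch below is a hypothetical helper, not code from this PR: `is_cjk`, `missing_cjk_chars`, and `CJK_RANGES` are illustrative names, the range list covers only the main Hangul, kana, and CJK ideograph blocks, and it assumes the PDF text has already been extracted by some other tool.

```python
from typing import Set

# Hypothetical helper: a subset of the Unicode blocks used by CJK scripts.
CJK_RANGES = (
    (0x1100, 0x11FF),  # Hangul Jamo
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
    (0x3400, 0x4DBF),  # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xAC00, 0xD7AF),  # Hangul Syllables
)


def is_cjk(ch: str) -> bool:
    """Return True if the single character ch falls in one of CJK_RANGES."""
    code = ord(ch)
    return any(lo <= code <= hi for lo, hi in CJK_RANGES)


def missing_cjk_chars(source: str, extracted: str) -> Set[str]:
    """CJK characters present in the markdown source but absent from the extracted PDF text."""
    wanted = {ch for ch in source if is_cjk(ch)}
    return wanted - set(extracted)


if __name__ == "__main__":
    source = "# 제목\n\nHello, 世界 and こんにちは"
    # Simulate a PDF whose font silently dropped every CJK glyph.
    print(sorted(missing_cjk_chars(source, "Hello, and")))
```

A test would then assert that the returned set is empty for a correctly rendered document.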
Solution: Added a four-layer test suite covering the full pipeline, from pure function logic through to real PDF compilation and content verification. Added a general `variables` API field that forwards arbitrary Pandoc template variables (enabling `lang`, `CJKmainfont`, `mainfont`, etc.), switched the default document font to Noto Sans (satisfying the existing sans-serif accessibility requirement while providing broad Unicode coverage), and unconditionally loaded xeCJK in the template so CJK characters render without any caller configuration. Updated the Docker image to install the required TeX packages and fonts.
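One plausible way to forward a `variables` mapping is to turn each entry into a Pandoc `-V KEY=VALUE` argument. The following is a minimal sketch under assumptions, not the PR's implementation: `build_pandoc_args` is a hypothetical name, the engine is pinned to XeLaTeX because xeCJK requires it, and the font name in the usage example is illustrative.

```python
from typing import Dict, List, Optional


def build_pandoc_args(
    src: str, out: str, variables: Optional[Dict[str, str]] = None
) -> List[str]:
    """Build a pandoc argv list, forwarding each entry of `variables` as -V KEY=VALUE."""
    # Assumption: xeCJK only works under XeLaTeX, so the engine is pinned here.
    args = ["pandoc", src, "-o", out, "--pdf-engine=xelatex"]
    for key, value in (variables or {}).items():
        # Each variable becomes its own "-V" pair. Passing the list to
        # subprocess.run (no shell) keeps values with spaces intact.
        args += ["-V", f"{key}={value}"]
    return args


if __name__ == "__main__":
    argv = build_pandoc_args(
        "input.md",
        "output.pdf",
        {"lang": "ko", "CJKmainfont": "Noto Sans CJK KR"},  # illustrative values
    )
    print(argv)
    # subprocess.run(argv, check=True)  # requires pandoc and a TeX install
```

Building an argument list rather than a shell string avoids quoting problems for font names that contain spaces.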
An example output including Korean text can be found here: korean_test_pdf-1.pdf
Changes: