feat: Add Unicode font detection and enhanced error handling #1563

Open
otreci4sgelt0nas wants to merge 3 commits into py-pdf:master from otreci4sgelt0nas:feature/unicode-font-detection

Conversation

@otreci4sgelt0nas

feat: Add Unicode font detection and enhanced error handling

  • Add UnicodeFontManager class for automatic font detection and recommendations
  • Fix Unicode encoding issues by providing automatic font selection
  • Enable PDF generation with Cyrillic, Arabic, Chinese, and other non-Latin scripts
  • Enhance FPDFUnicodeEncodingException with helpful font suggestions and exact usage instructions
  • Add comprehensive Unicode script detection (Cyrillic, Arabic, Chinese, etc.)
  • Provide system-specific font path detection (macOS, Linux, Windows)
  • Add convenience functions for quick font recommendations
  • Include comprehensive tests and tutorial examples
  • Transform cryptic encoding errors into actionable solutions

This fixes common Unicode encoding issues, especially with Cyrillic characters, by providing automatic font detection and helpful error messages that guide users to appropriate Unicode fonts. Users can now successfully generate PDFs with text in any Unicode script.

- Add UnicodeFontManager class for automatic font detection and recommendations
- Enhance FPDFUnicodeEncodingException with helpful font suggestions
- Add comprehensive Unicode script detection (Cyrillic, Arabic, Chinese, etc.)
- Provide system-specific font path detection (macOS, Linux, Windows)
- Add convenience functions for quick font recommendations
- Include comprehensive tests and tutorial examples
- Improve error messages with specific font recommendations and usage instructions

This addresses common Unicode encoding issues, especially with Cyrillic characters,
by providing automatic font detection and helpful error messages that guide users
to appropriate Unicode fonts.
@andersonhc
Collaborator

Hi @otreci4sgelt0nas — thanks for this PR!

At first glance the code looks really solid, and turning cryptic encoding errors into actionable messages sounds really good. The tutorials are helpful too.

A couple of quick notes/questions:

  • It would be great to add a short section to docs/Text.md pointing to the new tutorials and explaining the basic “why/when/how” the user will see those messages.

  • Re: suggestions: are all recommended fonts open/free and clearly licensed? Font licensing can be tricky.

I have limited time this weekend, but I can take a deeper look early next week (unless @Lucas-C beats me to it). Thanks again for the thoughtful contribution!

@Lucas-C
Member

Lucas-C commented Sep 8, 2025

I fully agree with @andersonhc's feedback.
I would also suggest documenting this in the Markdown files in the docs/ folder, and the licensing question is very relevant 👍

Moreover, the GitHub Actions CI pipeline is failing due to the black code formatter.
You just need to run: black fpdf/ test/fonts/test_unicode_font_utils.py tutorial/unicode_font_detection.py

Collaborator

@andersonhc andersonhc left a comment

Thank you very much for your contribution.

Please:

  • Add a CHANGELOG entry
  • Format the code with black and check with pylint to pass the lint steps
  • Fix the 4 tests that fail due to incorrect error assertions.

@Lucas-C
Member

Lucas-C commented Oct 27, 2025

Hi @otreci4sgelt0nas 🙂 👋
We are close to the end of the month, so I was curious whether you would be willing to finish this PR as part of Hacktoberfest.

- Fix failing tests in test_unicode_font_utils.py
- Format code with black
- Add CHANGELOG entry for Unicode font detection feature

Changes:
- Fixed detect_script_in_text() to return None for Latin-only text
- Fixed detect_script_in_text() to prioritize non-Latin scripts in mixed text
- Applied black formatting to errors.py and unicode_font_utils.py
- Added comprehensive CHANGELOG entry documenting the new feature
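
The two detect_script_in_text() fixes listed above can be illustrated with a minimal sketch. This is not the PR's implementation: the prefix table, the script labels, and the use of unicodedata character names are assumptions made for illustration only. Latin characters are simply never counted, so Latin-only text yields None and any non-Latin script wins in mixed text.

```python
import unicodedata
from collections import Counter
from typing import Optional

# Hypothetical table: first word of a Unicode character name -> script label.
# "LATIN" is deliberately absent so Latin letters are ignored.
_SCRIPT_PREFIXES = {
    "CYRILLIC": "cyrillic",
    "ARABIC": "arabic",
    "CJK": "chinese",
    "HEBREW": "hebrew",
    "GREEK": "greek",
}


def detect_script_in_text(text: str) -> Optional[str]:
    """Return the most frequent non-Latin script in text, or None if there is none."""
    counts: Counter = Counter()
    for char in text:
        if not char.isalpha():
            continue  # skip digits, punctuation and whitespace
        # unicodedata.name() returns the default "" for unnamed code points
        prefix = unicodedata.name(char, "").split(" ", 1)[0]
        script = _SCRIPT_PREFIXES.get(prefix)
        if script is not None:
            counts[script] += 1
    if not counts:
        return None  # Latin-only (or empty) text
    return counts.most_common(1)[0][0]
```

Under these assumptions, detect_script_in_text("Hello Привет") reports Cyrillic even though the text also contains Latin letters.
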
@otreci4sgelt0nas
Author

Hi @andersonhc and @Lucas-C ! Sorry for the long delay, but I've finally circled back to finish this. 🙂

Collaborator

@andersonhc andersonhc left a comment

Please run all the formatting (black), linting (pylint), and typing (mypy and pyright) tools this project requires, or the lint job will fail.
Please review the failing tests.

Comment thread fpdf/errors.py
    return f"{base_message}\n\n{self.suggestion}"
else:
    return f"{base_message} Please consider using a Unicode font."

Collaborator

You changed the messages but didn't update the tests.
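
One way to keep tests aligned with the changed messages is to assert on both branches of the message builder. The sketch below uses a stand-in class, EncodingErrorSketch, which is hypothetical and is not fpdf2's FPDFUnicodeEncodingException; its constructor arguments and base message are assumptions chosen only to mirror the two return statements quoted in this thread.

```python
from typing import Optional


class EncodingErrorSketch(Exception):
    """Stand-in for illustration only; not the PR's or fpdf2's actual class."""

    def __init__(self, character: str, font_name: str, suggestion: Optional[str] = None):
        super().__init__()
        self.character = character
        self.font_name = font_name
        self.suggestion = suggestion

    def __str__(self) -> str:
        base_message = (
            f'Character "{self.character}" is not supported '
            f'by the font "{self.font_name}".'
        )
        if self.suggestion:
            # Suggestion branch: the hint goes in its own paragraph
            return f"{base_message}\n\n{self.suggestion}"
        # Fallback branch: the generic hint stays on the same line
        return f"{base_message} Please consider using a Unicode font."


def check_messages() -> None:
    """Assert on both message variants so either change breaks a test."""
    plain = EncodingErrorSketch("Ж", "helvetica")
    assert str(plain).endswith(" Please consider using a Unicode font.")

    hinted = EncodingErrorSketch("Ж", "helvetica", suggestion="Try a font that covers Cyrillic.")
    assert str(hinted).endswith("\n\nTry a font that covers Cyrillic.")
    assert "Please consider using a Unicode font." not in str(hinted)
```
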

self.system = platform.system().lower()
self.font_paths = self._get_system_font_paths()
self.available_fonts = self._scan_available_fonts()

Collaborator

You are importing UnicodeFontManager in fpdf.py and running this system font scan in __init__(), so the scan runs every time fpdf2 is loaded. That is a considerable performance hit.

Please move the scan out of __init__() and perform it only when needed. You can set a flag in __init__(), such as _available_fonts_loaded = False, then run the scan and set the flag the first time the fonts are actually needed.
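
A minimal sketch of the deferred scan described above, assuming a simplified class. LazyFontScanner, its font_dirs parameter, and the scan_count counter are hypothetical names used for illustration; only the _available_fonts_loaded flag and the method name _scan_available_fonts come from this thread.

```python
import platform
from pathlib import Path
from typing import Dict, List


class LazyFontScanner:
    """Illustrative stand-in for a font manager that scans on first use."""

    def __init__(self, font_dirs: List[Path]):
        self.system = platform.system().lower()
        self.font_dirs = font_dirs
        self._available_fonts: Dict[str, Path] = {}
        self._available_fonts_loaded = False  # nothing is scanned yet
        self.scan_count = 0  # only here to make the laziness observable

    @property
    def available_fonts(self) -> Dict[str, Path]:
        # Run the expensive scan on first access, then reuse the result
        if not self._available_fonts_loaded:
            self._available_fonts = self._scan_available_fonts()
            self._available_fonts_loaded = True
        return self._available_fonts

    def _scan_available_fonts(self) -> Dict[str, Path]:
        self.scan_count += 1
        found: Dict[str, Path] = {}
        for directory in self.font_dirs:
            if not directory.is_dir():
                continue  # ignore font directories missing on this system
            for path in directory.rglob("*"):
                if path.suffix.lower() in {".ttf", ".otf"}:
                    found[path.stem] = path
        return found
```

With this shape, constructing the object (or importing the module that constructs it) costs nothing; the filesystem is touched only when available_fonts is first read, for example while building an error suggestion.
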

3 participants