ENH: Font: Initialise a Font from an embedded font file #3704

Open
PJBrs wants to merge 6 commits into py-pdf:main from PJBrs:font-from-fontfile

@PJBrs PJBrs commented Mar 27, 2026

This PR adds the capability to initialise a Font instance from an embedded font file. It adds fonttools as an optional dependency to parse the font file.
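
For readers unfamiliar with optional dependencies, the guard below is a minimal sketch of how such an import is commonly handled. It is not the code in this PR: `HAVE_FONTTOOLS` and `open_embedded_font` are hypothetical names, and the only fontTools call assumed is `fontTools.ttLib.TTFont`, which accepts a file-like object.

```python
import importlib.util

# Assumption: the optional dependency is importable as "fontTools"
# (the package published on PyPI as "fonttools").
HAVE_FONTTOOLS = importlib.util.find_spec("fontTools") is not None


def open_embedded_font(font_bytes: bytes):
    """Hypothetical helper: parse raw font-file bytes if fontTools is installed."""
    if not HAVE_FONTTOOLS:
        # Fail with a clear message instead of a bare ModuleNotFoundError.
        raise ImportError(
            "Parsing embedded font files requires the optional 'fonttools' package."
        )
    # Import lazily so that the rest of the module works without fontTools.
    from io import BytesIO

    from fontTools.ttLib import TTFont

    return TTFont(BytesIO(font_bytes))
```

Importing lazily inside the function keeps everything else usable when the optional package is absent.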

I tested whether the resulting Font instances can be used for text extraction, and they mostly can, barring a couple of exceptions. Then again, this is ultimately intended for creating new appearance streams, not for text extraction.

I also added a fontsampler file, created with pypdf, that contains selected fonts from the existing test files. Together, these embedded fonts exercise every if condition in this PR that deals with font flags. The file also includes one font that apparently has no cmap; this PR raises a KeyError in that case.

This PR is a small part of #3652 and it includes all work from #3602. I created it to make review more manageable. It should be ready as is!

ashariyar and others added 6 commits March 27, 2026 21:46
This patch adds a character_map when initialising from an
embedded font.

This patch uses getGlyphOrder when trying to create a
character_map from an embedded font file. getGlyphOrder is sure
to include all glyphs in a font, not just the ones that are
mapped by a unicode code point. For now, this does not make a
lot of difference, but in the future it might make it easier to
collect all widths in the font.
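
To illustrate the difference between iterating over the glyph order and iterating over the cmap, here is a small sketch on plain data. It is an assumption about the general idea, not the PR's implementation: `build_character_map` is a hypothetical helper, and its inputs only mimic the shapes returned by fontTools' `TTFont.getGlyphOrder()` (a list of glyph names) and `TTFont.getBestCmap()` (a dict from code point to glyph name, or `None` when no Unicode subtable is found).

```python
def build_character_map(glyph_order, best_cmap):
    """Map every glyph name to a Unicode string, or "" if it is unmapped.

    glyph_order: list of glyph names (shape of TTFont.getGlyphOrder()).
    best_cmap:   dict of code point -> glyph name (shape of
                 TTFont.getBestCmap()), or None if there is no Unicode cmap.
    """
    name_to_char = {}
    for code_point, glyph_name in (best_cmap or {}).items():
        # Keep the first code point seen for each glyph name.
        name_to_char.setdefault(glyph_name, chr(code_point))
    # Iterating over the glyph order keeps glyphs that have no code point.
    return {name: name_to_char.get(name, "") for name in glyph_order}


# Made-up sample data: two mapped glyphs and two unmapped ones.
glyph_order = [".notdef", "A", "B", "f_i"]
best_cmap = {0x41: "A", 0x42: "B"}
character_map = build_character_map(glyph_order, best_cmap)
```

In this sample, `.notdef` and the ligature `f_i` still appear in the result even though no code point maps to them, which is what would make collecting all widths easier later.
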
This patch tries to detect font flags more comprehensively. Furthermore,
it adds some checks to deal with missing tables in truetype fonts. It
is a bit of a question what to do when the cmap itself is missing. In
this version, we just continue, but perhaps we should raise a warning
or even an error, because, in practice, it would mean that the font
that results isn't usable.
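
As a rough sketch of what detecting font flags while tolerating missing tables can look like, the example below combines a few values into the bit field of a PDF font descriptor's /Flags entry. The bit positions follow the PDF specification; everything else is an assumption for illustration: `DescriptorFlags` and `guess_flags` are hypothetical names, and the inputs stand in for values such as `isFixedPitch` and `italicAngle` from a TrueType `post` table, with `None` meaning the table was absent.

```python
from enum import IntFlag


class DescriptorFlags(IntFlag):
    """Hypothetical enum; bit values follow the PDF font descriptor /Flags entry."""

    FIXED_PITCH = 1 << 0
    SERIF = 1 << 1
    SYMBOLIC = 1 << 2
    SCRIPT = 1 << 3
    NONSYMBOLIC = 1 << 5
    ITALIC = 1 << 6
    ALL_CAP = 1 << 16
    SMALL_CAP = 1 << 17
    FORCE_BOLD = 1 << 18


def guess_flags(is_fixed_pitch=None, italic_angle=None, symbolic=False):
    """Hypothetical helper: None means the source table was missing."""
    # Symbolic and Nonsymbolic are mutually exclusive, so pick exactly one.
    flags = DescriptorFlags.SYMBOLIC if symbolic else DescriptorFlags.NONSYMBOLIC
    if is_fixed_pitch:  # None or 0 leaves the flag unset
        flags |= DescriptorFlags.FIXED_PITCH
    if italic_angle:  # None or 0.0 leaves the flag unset
        flags |= DescriptorFlags.ITALIC
    return flags


# A slanted monospaced font versus a font whose table is missing.
mono_italic = guess_flags(is_fixed_pitch=1, italic_angle=-12.0)
no_post_table = guess_flags()
```

Treating a missing value as "flag not set" is only one possible policy; whether to warn or raise instead is the open question the commit message describes.
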
This patch adds a test and a file with some sample font resources that
all have specific font flags and/or specific missing tables, to test
all the if conditions in _font.py. The font resources were added using
pypdf itself, and lifted from pdf files used as part of the current
test suite.

codecov bot commented Mar 27, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.45%. Comparing base (6121a6b) to head (ffda170).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3704      +/-   ##
==========================================
+ Coverage   97.43%   97.45%   +0.02%     
==========================================
  Files          55       55              
  Lines       10016    10117     +101     
  Branches     1841     1855      +14     
==========================================
+ Hits         9759     9860     +101     
  Misses        149      149              
  Partials      108      108              
