Closed
Description
Describe the bug (mandatory)
The output of get_text() is missing seemingly random spaces between words that sit on the same line in the PDF.
To Reproduce (mandatory)
import fitz  # PyMuPDF
doc = fitz.open("file.pdf")
for page in doc:
    # Extract each page as a "dict" and print every block it contains
    for block in page.get_text("dict", flags=31)["blocks"]:
        print(block)
Expected behavior (optional)
The extracted text contains all the spaces that are present in the PDF.
For example, the PDF line "The quick brown fox jumps over the lazy dog"
is output as "Thequick brown fox jumps overthe lazy dog" (spaces between words on the same PDF line are dropped).
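The reproduction passes flags=31 to get_text(). As a triage aid, here is a minimal sketch that decodes that integer into the names of PyMuPDF's text-extraction flags. The bit values are hardcoded, assuming they match the documented fitz.TEXT_* constants, so the snippet runs without PyMuPDF installed; decode_flags is a hypothetical helper, not part of the PyMuPDF API.

```python
# Bit values of PyMuPDF's text-extraction flags. ASSUMPTION: these mirror the
# documented fitz.TEXT_* constants; hardcoded so this runs without PyMuPDF.
TEXT_FLAGS = {
    "TEXT_PRESERVE_LIGATURES": 1,
    "TEXT_PRESERVE_WHITESPACE": 2,
    "TEXT_PRESERVE_IMAGES": 4,
    "TEXT_INHIBIT_SPACES": 8,
    "TEXT_DEHYPHENATE": 16,
    "TEXT_PRESERVE_SPANS": 32,
    "TEXT_MEDIABOX_CLIP": 64,
    "TEXT_CID_FOR_UNKNOWN_UNICODE": 128,
}


def decode_flags(flags: int) -> list[str]:
    """Hypothetical helper: names of the flags whose bits are set in `flags`."""
    return [name for name, bit in TEXT_FLAGS.items() if flags & bit]


print(decode_flags(31))
# ['TEXT_PRESERVE_LIGATURES', 'TEXT_PRESERVE_WHITESPACE',
#  'TEXT_PRESERVE_IMAGES', 'TEXT_INHIBIT_SPACES', 'TEXT_DEHYPHENATE']
```

Re-running the reproduction with individual bits cleared (for example flags=31 & ~8, which drops TEXT_INHIBIT_SPACES) is one way to check whether a particular flag affects the missing spaces; this sketch by itself does not establish that any flag is the cause.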
Screenshots (optional)
N/A
Your configuration (mandatory)
- Operating system, potentially version and bitness: Linux 6.3.3-arch1-1 x86_64
- Python version, bitness: Python 3.10.11 (main, May 25 2023, 13:44:59) [GCC 13.1.1 20230429], x86_64
- PyMuPDF version, installation method (wheel or generated from source): 1.22.3, installed from wheel (using pip 23.1.2, setuptools 67.8.0 and wheel 0.40.0)
Additional context (optional)
I have reviewed the bug reports #456 and #364 and tested with mutool as recommended there. Using mutool 1.22.0 (the MuPDF version used by PyMuPDF 1.22.3), the output for the PDF (from mutool draw -o test.html file.pdf 1) contains all of the spaces.
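To compare PyMuPDF's output against mutool's line by line, here is a minimal sketch under stated assumptions: page_lines flattens a get_text("dict") result into one string per line by joining its span texts (skipping image blocks, which have type 1 and no "lines"), and merged_words lists extracted tokens that equal two to four consecutive words of a reference text glued together, such as "Thequick". Both helper names are hypothetical, and the reference text is assumed to be plain text obtained separately (for example from mutool output); neither is part of PyMuPDF.

```python
def page_lines(page_dict: dict) -> list[str]:
    """Hypothetical helper: one string per text line of a get_text("dict") result."""
    lines = []
    for block in page_dict["blocks"]:
        # Only text blocks (type 0) carry "lines"; image blocks are skipped.
        if block.get("type") != 0:
            continue
        for line in block["lines"]:
            lines.append("".join(span["text"] for span in line["spans"]))
    return lines


def merged_words(extracted: str, reference: str, max_words: int = 4) -> list[str]:
    """Hypothetical helper: tokens of `extracted` that equal 2..max_words
    consecutive tokens of `reference` concatenated without spaces."""
    ref = reference.split()
    glued = set()
    for i in range(len(ref)):
        joined = ref[i]
        # Build concatenations of ref[i] with the next 1..max_words-1 tokens.
        for j in range(i + 1, min(i + max_words, len(ref))):
            joined += ref[j]
            glued.add(joined)
    return [tok for tok in extracted.split() if tok in glued]


# Hand-built stand-in for page.get_text("dict", flags=31) output.
sample = {
    "blocks": [
        {"type": 0, "lines": [{"spans": [{"text": "Thequick "}, {"text": "brown fox"}]}]},
        {"type": 1, "image": b""},
    ]
}
print(page_lines(sample))  # ['Thequick brown fox']
print(merged_words("Thequick brown fox jumps overthe lazy dog",
                   "The quick brown fox jumps over the lazy dog"))
# ['Thequick', 'overthe']
```

With PyMuPDF installed, page_lines(page.get_text("dict", flags=31)) would take the place of the hand-built sample; a plain-text reference is easier to compare than HTML, and the token-based check is only a heuristic, since words that are legitimately written together may also match.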
I am unsure whether this is a duplicate of #2400, as I don't have enough information to determine whether it is the same issue (an empty gap between those spaces); apologies if it is.