Closed
Description of the bug
File: Simple PDF 2.0 file.pdf (taken from the PDF Association's GitHub page with example PDFs)
Since version 1.24.0, I see an unexpected newline in the extracted text. Here is the text object from the PDF above:
6 0 obj
<< /Length 166 >>
stream
% A text block that shows "Hello World"
% No color is set, so this defaults to black in DeviceGray colorspace
BT
/F1 24 Tf
100 100 Td
(Hello World) Tj
ET
endstream
endobj
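To make the point concrete, here is a rough sketch that pulls the string operands of `Tj` out of the content stream quoted above. The helper name `shown_strings` is made up for illustration and is not part of PyMuPDF. The regular expression is a deliberate simplification: it ignores escaped or nested parentheses, hex strings, and `TJ` arrays, so it is only suitable for a trivial stream like this one.

```python
import re

# Content stream copied from object 6 of the PDF quoted in this report.
CONTENT_STREAM = """\
% A text block that shows "Hello World"
% No color is set, so this defaults to black in DeviceGray colorspace
BT
/F1 24 Tf
100 100 Td
(Hello World) Tj
ET
"""


def shown_strings(stream: str) -> list[str]:
    """Return the literal strings passed to Tj (simplified, hypothetical helper)."""
    # Drop whole-line comments so text inside them cannot match.
    body = "\n".join(
        line for line in stream.splitlines() if not line.lstrip().startswith("%")
    )
    # Match "(...) Tj" where the string has no parentheses or backslashes.
    return re.findall(r"\(([^()\\]*)\)\s*Tj", body)


print(shown_strings(CONTENT_STREAM))  # ['Hello World']
```

The stream contains a single `Tj` with a single string, so there is no text-positioning operator between "Hello" and "World" that would suggest a line break.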
How to reproduce the bug
To reproduce
import fitz as pymupdf
doc = pymupdf.open('Simple PDF 2.0 file.pdf')  # see section above
Version 1.23.26:
>>> doc.load_page(0).get_text('text')
'Hello World\n'
Version 1.24.0:
>>> doc.load_page(0).get_text('text')
'Hello \nWorld\n'
Expected behaviour
I would say that the additional newline between "Hello" and "World" should not be there.
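A possible regression check for this, as a minimal sketch: the helper names `is_split_across_lines` and `check_first_page` are hypothetical and not part of PyMuPDF. The only PyMuPDF calls used are the ones from the reproduction above (`open`, `load_page`, `get_text('text')`). The comparison collapses whitespace at line boundaries, which is an assumption that suits this one-line example but may be too lenient for other documents.

```python
def is_split_across_lines(extracted: str, expected_line: str) -> bool:
    """True if expected_line is found only after joining the extracted lines."""
    lines = [ln.strip() for ln in extracted.splitlines() if ln.strip()]
    if expected_line in lines:
        # The text came out on one line, as expected.
        return False
    # Found only once the lines are glued back together: it was broken up.
    return expected_line in " ".join(lines)


def check_first_page(path: str, expected_line: str) -> bool:
    """Run the check on page 0 of a PDF (requires PyMuPDF to be installed)."""
    import fitz  # imported lazily so the helper above works without PyMuPDF

    doc = fitz.open(path)
    text = doc.load_page(0).get_text("text")
    return is_split_across_lines(text, expected_line)


# The two outputs reported above:
print(is_split_across_lines("Hello World\n", "Hello World"))    # False (1.23.26)
print(is_split_across_lines("Hello \nWorld\n", "Hello World"))  # True  (1.24.0)
```

With the file from this report, `check_first_page('Simple PDF 2.0 file.pdf', 'Hello World')` should return `False` once the behaviour matches 1.23.26 again.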
PyMuPDF version
1.24.1
Operating system
Linux
Python version
3.10
