Description
Please provide all mandatory information!
I am using a PDF file and trying to extract the highlighted text.
Describe the bug (mandatory)
My code:
import fitz  # PyMuPDF

def main():
    doc = fitz.open("ACMSurvey.pdf")
    # Total number of pages in the PDF
    print(len(doc))
    highlights = []
    for page in doc:
        # Iterate over every annotation on the page
        for annot in page.annots():
            # Extract the text inside the annotation's rectangle
            highlight_text = page.get_textbox(annot.rect)
            print(highlight_text)
            highlights.append(highlight_text)
    return highlights
To Reproduce (mandatory)
Running the above code on a PDF file to extract the highlighted text raises the RuntimeError shown in the traceback below.
Explain the steps to reproduce the behavior, For example, include a minimal code snippet, example files, etc.
  File "\pythonextractHighLightFromPdf\main.py", line 11, in main
    for annot in page.annots():
  File "\pythonextractHighLightFromPdf\venv\lib\site-packages\fitz\fitz.py", line 6698, in annots
    annot = self.load_annot(xref)
  File "\pythonextractHighLightFromPdf\venv\lib\site-packages\fitz\fitz.py", line 6147, in load_annot
    val = self._load_annot(name, xref)
  File "\pythonextractHighLightFromPdf\venv\lib\site-packages\fitz\fitz.py", line 6048, in _load_annot
    return _fitz.Page__load_annot(self, name, xref)
RuntimeError: xref 732 is not an annot of this page
For problems when building or installing PyMuPDF, give the full output of the build/install command so that, for example, all pip/compiler/linker errors/warnings can be seen.
Expected behavior (optional)
Describe what you expected to happen (if not obvious).
Screenshots (optional)
If applicable, add screenshots to help explain your problem.
Your configuration (mandatory)
- Operating system, potentially version and bitness - Windows 10
- Python version, bitness - Python 3.9
- PyMuPDF version, installation method (wheel or generated from source) - PyMuPDF 1.21.0, installed using pip
For example, the output of print(sys.version, "\n", sys.platform, "\n", fitz.__doc__) would be sufficient (for the first two bullets).
Additional context (optional)
Add any other context about the problem here.
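As a possible stopgap while the root cause is unclear, the loop from the report could be made tolerant of the RuntimeError, so that one unreadable annotation does not abort the whole run. The following is only a sketch under assumptions: `collect_highlights` and `failed_pages` are hypothetical names that are not part of PyMuPDF, and it assumes that catching the error and moving on to the next page is acceptable. It also keeps only annotations whose type name is "Highlight", since `page.annots()` yields every annotation type. This does not fix whatever makes xref 732 invalid for its page; it only records which pages were affected.

```python
def collect_highlights(pages):
    """Collect text under highlight annotations, tolerating load errors.

    Hypothetical helper (not a PyMuPDF API). `pages` is any iterable of
    page objects offering annots() and get_textbox(), e.g. a fitz Document.
    Returns (texts, failed_pages), where failed_pages lists the indices of
    pages whose annotation iteration raised RuntimeError.
    """
    texts = []
    failed_pages = []
    for index, page in enumerate(pages):
        try:
            for annot in page.annots():
                # annot.type is a sequence like (8, 'Highlight')
                if annot.type[1] != "Highlight":
                    continue
                texts.append(page.get_textbox(annot.rect))
        except RuntimeError:
            # The annots() generator cannot be resumed after an error, so
            # the remaining annotations on this page are skipped.
            failed_pages.append(index)
    return texts, failed_pages


def main(path="ACMSurvey.pdf"):
    # Imported here so the helper above has no hard dependency on PyMuPDF.
    import fitz

    doc = fitz.open(path)
    texts, failed_pages = collect_highlights(doc)
    for text in texts:
        print(text)
    if failed_pages:
        print("Pages with unreadable annotations:", failed_pages)
```

Calling `main()` with the file from the report would print whatever highlights can be read, plus the zero-based indices of the pages where annotation loading failed, which may help narrow down where the bad reference lives.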