
RuntimeError: xref 732 is not an annot of this page #2063

@Abh4git



I am using a PDF file and trying to extract the highlighted text.

Describe the bug (mandatory)

My code:

import fitz  # PyMuPDF

def main():
    doc = fitz.open("ACMSurvey.pdf")
    # Total number of pages in the PDF
    print(len(doc))

    # Iterate over all pages and collect the text under each annotation
    highlights = []
    for page in doc:
        for annot in page.annots():
            highlight_text = page.get_textbox(annot.rect)
            print(highlight_text)
            highlights.append(highlight_text)
    # print(highlights)
    return highlights


if __name__ == "__main__":
    main()
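While the root cause is being investigated, one possible workaround is to load the annotations one xref at a time and skip any that PyMuPDF refuses to load, instead of letting `page.annots()` abort the whole loop. This is a minimal sketch, not a confirmed fix. It assumes that `Page.annot_xrefs()` returns items starting with `(xref, type, ...)` and that `Page.load_annot(xref)` raises `RuntimeError` on failure, as in PyMuPDF 1.21.0 (the traceback in this report shows `annots()` calling `load_annot(xref)` internally). The helper name `collect_annot_text` is hypothetical.

```python
def collect_annot_text(doc, annot_type):
    """Return (texts, skipped) for all annotations of `annot_type` in `doc`.

    Assumptions: `doc` is an open PyMuPDF document (iterating yields pages) and
    `annot_type` is an int such as fitz.PDF_ANNOT_HIGHLIGHT. Annotations that
    fail to load are recorded in `skipped` as (page_number, xref, message)
    instead of aborting the whole loop.
    """
    texts, skipped = [], []
    for page in doc:
        # Assumption: each annot_xrefs() item starts with (xref, type, ...).
        # Filtering on the type first avoids loading links, widgets or popups.
        for item in page.annot_xrefs():
            xref, kind = item[0], item[1]
            if kind != annot_type:
                continue
            try:
                annot = page.load_annot(xref)
            except RuntimeError as exc:
                # e.g. "xref 732 is not an annot of this page"
                skipped.append((page.number, xref, str(exc)))
                continue
            if annot is not None:
                # Same text extraction as in the original loop
                texts.append(page.get_textbox(annot.rect))
    return texts, skipped
```

With the file from this report it would be called as `texts, skipped = collect_annot_text(fitz.open("ACMSurvey.pdf"), fitz.PDF_ANNOT_HIGHLIGHT)`. Skipping only hides the problem for the affected object: any text under xref 732 would still be missing from `texts`.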

To Reproduce (mandatory)

I run the code above on my PDF file to extract the highlighted text, and it fails with the following traceback:


  File "\pythonextractHighLightFromPdf\main.py", line 11, in main
    for annot in page.annots():
  File "\pythonextractHighLightFromPdf\venv\lib\site-packages\fitz\fitz.py", line 6698, in annots
    annot = self.load_annot(xref)
  File "\pythonextractHighLightFromPdf\venv\lib\site-packages\fitz\fitz.py", line 6147, in load_annot
    val = self._load_annot(name, xref)
  File "\pythonextractHighLightFromPdf\venv\lib\site-packages\fitz\fitz.py", line 6048, in _load_annot
    return _fitz.Page__load_annot(self, name, xref)
RuntimeError: xref 732 is not an annot of this page
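To narrow down why PyMuPDF rejects this object, it may help to look at what xref 732 actually contains (for instance, whether it is an annotation dictionary with a `/Subtype`) and which pages list it among their annotations. The sketch below is diagnostic only and assumes the PyMuPDF 1.21.0 methods `Document.xref_get_key()`, `Document.xref_object()` and `Page.annot_xrefs()`; the helper names `describe_xref` and `pages_listing_xref` are hypothetical.

```python
def describe_xref(doc, xref, keys=("Type", "Subtype", "P", "Rect")):
    """Return selected keys of PDF object `xref` plus its full source text."""
    info = {}
    for key in keys:
        # Assumption: xref_get_key() returns a (type, value) pair,
        # with type "null" when the key is absent.
        info[key] = doc.xref_get_key(xref, key)
    # Full PDF source of the object, e.g. "<< /Type /Annot ... >>"
    info["source"] = doc.xref_object(xref)
    return info


def pages_listing_xref(doc, xref):
    """Return the numbers of all pages whose annotation list contains `xref`."""
    hits = []
    for page in doc:
        # Assumption: each annot_xrefs() item starts with an annotation xref.
        if any(item[0] == xref for item in page.annot_xrefs()):
            hits.append(page.number)
    return hits
```

For the file in this report, `doc = fitz.open("ACMSurvey.pdf")` followed by `print(describe_xref(doc, 732)["source"])` and `print(pages_listing_xref(doc, 732))` would show the raw object and the page(s) referencing it. The printed object source can also be shared when the PDF itself cannot be.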


Your configuration (mandatory)

  • Operating system: Windows 10
  • Python version: Python 3.9
  • PyMuPDF version and installation method: PyMuPDF 1.21.0, installed using pip

