Does pymupdf support TOC creation from a pdf document? #2012

@tikitong

First of all, thank you very much for this great work. I particularly appreciate your layout-preserving text extraction method.
My question is: does PyMuPDF support creating a TOC for a PDF document?

Calling the get_toc() method like this:

import fitz

pdf_filename = 'my.pdf'
with fitz.open(pdf_filename) as doc:
    print(doc.get_toc())

seems to give results only if the TOC is already present at the beginning of the document.

Take this PDF, for example: https://github.com/pymupdf/PyMuPDF-Utilities/blob/master/text-extraction/Dart.pdf. Is there a method in PyMuPDF to generate the PDF outline? In this case it would contain only the numbered titles of the paragraphs.

The same question applies to PDFs with more complex formatting, such as https://blog.xpgreat.com/file/lstm.pdf, which has numbered parts and sub-parts.
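To make the request concrete, here is a rough sketch of the kind of outline generation I have in mind. It is not an existing PyMuPDF feature as far as I know, only a heuristic: it assumes headings sit on their own line, start with a dotted section number, and continue with a capitalized title. The names `HEADING_RE`, `headings_to_toc` and `add_outline` are made up for this illustration. The only PyMuPDF calls used are `page.get_text()`, `doc.set_toc()` and `doc.save()`.

```python
import re

# Assumption: a heading looks like "1 Introduction", "2. Methods" or "2.3.1 Details".
HEADING_RE = re.compile(r"^(\d+(?:\.\d+)*)\.?\s+([A-Z].{0,80})$")


def headings_to_toc(pages):
    """Build [level, title, page] entries from text lines.

    pages: list of lists of strings, one inner list per page (page 1 first).
    """
    toc = []
    previous_level = 0
    for page_number, lines in enumerate(pages, start=1):
        for line in lines:
            match = HEADING_RE.match(line.strip())
            if not match:
                continue
            number, title = match.groups()
            level = number.count(".") + 1
            # set_toc() expects the first entry at level 1 and no level
            # jumping by more than one, so clamp the level accordingly.
            level = min(level, previous_level + 1)
            toc.append([level, f"{number} {title.strip()}", page_number])
            previous_level = level
    return toc


def add_outline(src_path, dst_path):
    """Hypothetical helper: detect numbered headings and save a copy with an outline."""
    import fitz  # PyMuPDF, imported lazily so headings_to_toc works without it

    with fitz.open(src_path) as doc:
        pages = [page.get_text().splitlines() for page in doc]
        toc = headings_to_toc(pages)
        if toc:
            doc.set_toc(toc)
        doc.save(dst_path)
    return toc
```

A call such as `add_outline("my.pdf", "my-with-toc.pdf")` would then write a copy whose outline `get_toc()` can read back. A plain regex like this will produce false positives (for example a line starting with a year), so checking font size or weight would probably be needed for documents with more complex formatting.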

I also tested MuPDF with the command mutool show my.pdf outline, but in my case it returns nothing, whether or not the PDF file contains a TOC.

Your configuration (mandatory)

In my case, I installed on macOS arm64 with an M2 chip (not Intel).
I created a conda osx-64 environment, inspired by this amazing solution, and it works for me.

conda create -n pymupdf
conda activate pymupdf
conda config --env --set subdir osx-64
conda install python=3.9 
python -m pip install --upgrade pymupdf

print(sys.version, "\n", sys.platform, "\n", fitz.__doc__) gives me:

3.9.13 (main, Oct 13 2022, 16:12:30) 
[Clang 12.0.0 ] 
 darwin 
 
PyMuPDF 1.20.2: Python bindings for the MuPDF 1.20.3 library.
Version date: 2022-08-13 00:00:01.
Built for Python 3.9 on darwin (64-bit).

Please feel free to modify the README.md to let macOS users with an Apple chip know that it also works by following these steps. I'm sure it will be useful for some :).
