
When I compress a PDF, I get a bigger PDF #3644

@Syntamin

Description of the bug

I want to compress a PDF. I extracted the images from the PDF file and compressed them with pngquant, which reduced their size by more than 70%. However, after I used the replace_image function to swap in the compressed images, the new PDF was bigger than the original. I want to know why this happens, and whether I am using the save function incorrectly.

How to reproduce the bug

```python
import os

import pymupdf
import pngquant  # provides compress_png(), as used below


def compress_pdf(input_path, output_path=""):
    doc = pymupdf.open(input_path)

    doc_name_with_extension = os.path.basename(input_path)
    doc_name = os.path.splitext(doc_name_with_extension)[0]

    # Process at most the first 10 pages (avoids an IndexError on shorter documents).
    for page_index in range(min(10, doc.page_count)):
        page = doc[page_index]
        image_list = page.get_images()

        for image_index, img in enumerate(image_list, start=1):
            xref = img[0]
            pix = pymupdf.Pixmap(doc, xref)

            # More than 3 colour components (e.g. CMYK): convert to RGB before saving as PNG.
            if pix.n - pix.alpha > 3:
                pix = pymupdf.Pixmap(pymupdf.csRGB, pix)

            origin_png_path = "./origin/%s_page_%s-image_%s.png" % (
                doc_name,
                page_index,
                image_index,
            )

            pix.save(origin_png_path)  # save the extracted image
            pix = None

            pngquant.compress_png(origin_png_path)

            # compress_png() is expected to write its result with pngquant's "-fs8.png" suffix.
            compressed_png_path = "./origin/%s_page_%s-image_%s-fs8.png" % (
                doc_name,
                page_index,
                image_index,
            )

            # print(os.path.getsize(compressed_png_path), compressed_png_path)

            # Option 1: replace from a file
            # page.replace_image(xref, filename=compressed_png_path)

            # Option 2: replace from in-memory bytes
            with open(compressed_png_path, "rb") as compressed_png:
                compressed_png_bytes = compressed_png.read()
                print(len(compressed_png_bytes), "111")
                page.replace_image(xref, stream=compressed_png_bytes)

    doc.save(output_path, garbage=3, clean=True)
    doc.close()
```
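One thing worth checking is what kind of PNG pngquant actually wrote. pngquant normally produces 8-bit indexed-colour (palette) PNGs, and a small `.png` file on disk does not guarantee a small image stream inside the PDF: a PDF cannot embed a `.png` file as-is, so the image data is converted when it is inserted. Whether PyMuPDF keeps the palette at that point or expands it to full RGB before recompressing is an assumption to verify, not something established in this issue. The sketch below uses only the standard library to read a PNG's IHDR header; `png_header` and `COLOR_TYPES` are hypothetical helper names, not part of PyMuPDF or pngquant.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# Colour-type codes defined for the IHDR chunk in the PNG specification.
COLOR_TYPES = {
    0: "greyscale",
    2: "truecolour (RGB)",
    3: "indexed-colour (palette)",
    4: "greyscale + alpha",
    6: "truecolour + alpha (RGBA)",
}


def png_header(data: bytes) -> dict:
    """Return width, height, bit depth and colour type from a PNG's IHDR chunk.

    Only the header is parsed; chunk CRCs are not verified.
    """
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    # IHDR must be the first chunk: 4-byte length, 4-byte type, then a 13-byte body.
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("malformed PNG: first chunk is not IHDR")
    # Body starts with width (4), height (4), bit depth (1), colour type (1).
    width, height, bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    return {
        "width": width,
        "height": height,
        "bit_depth": bit_depth,
        "color_type": COLOR_TYPES.get(color_type, f"unknown ({color_type})"),
    }
```

For example, reading one of the `-fs8.png` files into bytes and passing them to `png_header` shows whether it is palette-based. Comparing that file's size with the size of the image stream in the saved PDF (for the same `xref`) would then show whether the growth happens at insertion time.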

PyMuPDF version

1.24.6

Operating system

MacOS

Python version

3.9

Metadata

Assignees

No one assigned

Labels

No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests