When I extract images that have no background color, the background comes out black. #2428

@w8741906

Please provide all mandatory information!

Describe the bug (mandatory)

When I extract images that have no background color, the background of the extracted image comes out black.

To Reproduce (mandatory)

Explain the steps to reproduce the behavior. For example, include a minimal code snippet, example files, etc.

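Here is my code.

import io
import os

import fitz  # PyMuPDF
from PIL import Image


# get_closest_title() and add_title_big() are helpers defined elsewhere (not shown).
def extract_images(pdf_path, output_folder, output_folder_with_title, classified_titles):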
    doc = fitz.open(pdf_path)
    doc.colorspace = fitz.csRGB

    for page_number in range(len(doc)):
        page = doc[page_number]
        # only test 1 page
        # if page_number != 53:
        #     continue
        img_num = 0
        # get xref and rect objects
        images = page.get_images(full=True)
        for img_info in images:
            xref = img_info[0]
            img_rect = page.get_image_rects(xref)
            img_title = get_closest_title(page_number, img_rect, classified_titles)
            base_image = doc.extract_image(xref)
            image_bytes = base_image["image"]
            image = Image.open(io.BytesIO(image_bytes))
            img_num += 1
            image_name = "page" + str(page_number) + "_" + str(img_num) + img_title
            image_path = os.path.join(output_folder, f"{image_name}.png")
            image_with_title_path = os.path.join(output_folder_with_title, f"{image_name}.png")

            image.save(image_path)

            image_with_title = add_title_big(image, "page" + str(page_number) + "_" + str(img_num))
            image_with_title.save(image_with_title_path)

    return

Expected behavior (optional)

Extract an image with a white background.
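
A possible way to get that result, offered as a sketch rather than a confirmed fix for this PDF: in a PDF, an image's transparency is usually stored in a separate soft-mask object, and `doc.extract_image(xref)` returns only the base image without that mask, so transparent areas can come out black. The second item of each `page.get_images()` entry is the xref of the soft mask (0 if there is none). The helper names `flatten_on_white` and `extract_with_mask` below are hypothetical, and the code assumes the mask can be read as a grayscale image.

```python
import io

from PIL import Image


def flatten_on_white(image):
    """Composite a PIL image over an opaque white canvas and return it as RGB."""
    rgba = image.convert("RGBA")
    white = Image.new("RGBA", rgba.size, (255, 255, 255, 255))
    return Image.alpha_composite(white, rgba).convert("RGB")


def extract_with_mask(doc, img_info):
    """Return one page.get_images() entry as a PIL image.

    If the entry has a soft mask, it is applied as an alpha channel.
    """
    import fitz  # PyMuPDF; imported here so flatten_on_white() works without it

    xref, smask = img_info[0], img_info[1]
    pix = fitz.Pixmap(doc.extract_image(xref)["image"])
    if pix.n - pix.alpha > 3:
        # PNG cannot store CMYK, so convert to RGB first.
        pix = fitz.Pixmap(fitz.csRGB, pix)
    if smask > 0:
        mask = fitz.Pixmap(doc.extract_image(smask)["image"])
        # Copy of pix with the mask added as its alpha channel.
        pix = fitz.Pixmap(pix, mask)
    return Image.open(io.BytesIO(pix.tobytes("png")))
```

In the loop above, `image = flatten_on_white(extract_with_mask(doc, img_info))` could then stand in for the `Image.open(io.BytesIO(image_bytes))` line.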

Screenshots (optional)

This is what I extracted:
[image: 22222]

This is what I see when I open it with Adobe:
[image: 1111]

Your configuration (mandatory)

  • MacBook
  • Python 3.7
  • PyMuPDF version 1.22.1

Additional context (optional)

Metadata

Labels: not a bug (not a bug / user error / unable to reproduce)
