Discard cropped / hidden content ("sanitize") or extract cropped images instead of raw images #1309

@quanvinh

My PDF has tons of cropped images and AFAIK PyMuPDF only allows me to extract raw (uncropped) ones.

I was wondering if any of the following is possible?

a. Discard the hidden / cropped parts of all images (similar to "Redact" -> "Sanitize" in Acrobat, without rasterizing) prior to extraction.
b. Obtain the crop box of each image so I can crop the extracted raw images using another library.
c. (Preferably) Ignore the cropped-out data during extraction, i.e. extract just the cropped images instead of the raw ones.
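
To illustrate what option (b) would enable, here is a minimal sketch of the cropping arithmetic in plain Python. It assumes that two rectangles are somehow available for each image: the rectangle where the full raw image is placed on the page, and the rectangle that stays visible after cropping. Whether PyMuPDF exposes the second one is exactly the open question here. The function name `visible_pixel_box` and its parameters are hypothetical, and the sketch assumes an axis-aligned image (no rotation or skew) with page coordinates that grow downward from the top-left corner.

```python
def visible_pixel_box(placement, clip, width, height):
    """Map the visible part of an image to pixel coordinates of the raw image.

    placement: (x0, y0, x1, y1) page rectangle covered by the full raw image (assumed input)
    clip:      (x0, y0, x1, y1) page rectangle that remains visible (assumed input)
    width, height: size of the raw image in pixels
    Returns (left, top, right, bottom) in pixels, or None if nothing is visible.
    """
    px0, py0, px1, py1 = placement
    cx0, cy0, cx1, cy1 = clip

    # Visible area on the page = intersection of placement and clip rectangles.
    ix0, iy0 = max(px0, cx0), max(py0, cy0)
    ix1, iy1 = min(px1, cx1), min(py1, cy1)
    if ix0 >= ix1 or iy0 >= iy1:
        return None  # the image is completely hidden

    # Scale factors from page units to raw-image pixels.
    sx = width / (px1 - px0)
    sy = height / (py1 - py0)

    return (
        round((ix0 - px0) * sx),
        round((iy0 - py0) * sy),
        round((ix1 - px0) * sx),
        round((iy1 - py0) * sy),
    )
```

The returned tuple uses the (left, upper, right, lower) order that Pillow's `Image.crop` expects, so the extracted raw image could be cropped with it directly.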
