Hi,
I apologize in advance for the length of this question.
My question is about the PDF_REDACT_IMAGE_PIXELS value of the images parameter.
When I use apply_redactions() with the default value, I get a result like this:

I read in the documentation about a bug related to transparency when images=PDF_REDACT_IMAGE_PIXELS, so I started debugging the images on the page.
Calling page.get_images() returned a list of 4 items:
[
(59, 0, 1, 1, 8, 'DeviceGray', '', 'FXX1', ''),
(60, 0, 1200, 1518, 8, 'DeviceRGB', '', 'FXX2', 'JPXDecode'),
(61, 264, 1, 1, 8, 'DeviceGray', '', 'FXX3', ''),
(62, 264, 800, 1012, 8, 'DeviceRGB', '', 'FXX4', 'JPXDecode')
]
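To make the list above easier to read, here is a minimal sketch that labels each tuple position. It assumes the item layout documented for page.get_images() without full=True (xref, smask, width, height, bpc, colorspace, alt. colorspace, name, filter). The name ImageEntry is only an illustrative choice, and the data is copied from the output above, so the snippet runs without PyMuPDF.

```python
from collections import namedtuple

# Field names follow the documented item layout of page.get_images();
# "ImageEntry" itself is a hypothetical helper name, not part of PyMuPDF.
ImageEntry = namedtuple(
    "ImageEntry",
    "xref smask width height bpc colorspace alt_colorspace name filter",
)

# The four items returned for the page in question.
raw = [
    (59, 0, 1, 1, 8, "DeviceGray", "", "FXX1", ""),
    (60, 0, 1200, 1518, 8, "DeviceRGB", "", "FXX2", "JPXDecode"),
    (61, 264, 1, 1, 8, "DeviceGray", "", "FXX3", ""),
    (62, 264, 800, 1012, 8, "DeviceRGB", "", "FXX4", "JPXDecode"),
]

entries = [ImageEntry(*item) for item in raw]

for e in entries:
    # An smask value of 0 means the image has no soft mask.
    mask_note = f"soft mask xref {e.smask}" if e.smask else "no soft mask"
    print(f"xref {e.xref}: {e.width}x{e.height} {e.colorspace}, {mask_note}")

# Images larger than 1x1 pixels, and a map of image xref -> soft mask xref.
real_images = [e.xref for e in entries if e.width > 1 and e.height > 1]
masked = {e.xref: e.smask for e in entries if e.smask}
```

Read this way, the 1x1 image 61 references the same mask object 264 as image 62.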
So basically I have two images with real dimensions, #60 and #62; the other two are only 1x1 pixels.
As can be seen, #62 also references a soft mask (smask), which is object #264.
As an experiment, I extracted and saved objects #60 and #62 individually.
This is the code I used:
data = page.get_text("dict")
new_doc.insert_page(0)
img1 = doc.extract_image(60)["image"]
img2 = doc.extract_image(62)["image"]
new_doc[0].insert_image(Rect(*data["blocks"][1]["bbox"]), stream=img2)
#new_doc[0].insert_image(Rect(*data["blocks"][0]["bbox"]), stream=img1)
blocks[0] is object #60.
Writing #60 resulted in a slightly less colorful version of the original document:

I apologize for not being able to share the whole thing.
Then I saved blocks[1], which is object #62, and got the same bad, blurry version that I shared at the top.
So even without apply_redactions() I get a similarly bad result.
But then, #62 has an smask object.
So I extracted the smask object #264 and tried to create a combined pixmap like this:
pix = fitz.Pixmap(doc.extract_image(62)["image"])
mask = fitz.Pixmap(doc.extract_image(264)["image"])
combo = fitz.Pixmap(pix, mask)
just as shown here: https://pymupdf.readthedocs.io/en/latest/faq.html
and I got an error saying that color and mask should have the same size.
I checked, and the mask is indeed much bigger than the pix.
I think I'm doing it right, but since I'm a relatively new PyMuPDF user, maybe not.
Could you please help me understand whether I'm doing something wrong?
How can the smask image be bigger than the image it is applied to?
Also, could you please explain the nature of the bug related to this value: PDF_REDACT_IMAGE_PIXELS?
As an experiment, I used images=0 to fix the output PDF after redaction, but since the images can then still be fully recovered, that is not a good option for me.
Thank you!
