images= PDF_REDACT_IMAGE_PIXELS arg of apply_redactions() #1818

@tigrankh

Hi,

I apologize in advance, since my question is a little long.

The question is about the PDF_REDACT_IMAGE_PIXELS value of the images parameter.
When I use apply_redactions() with the default value, I get a result like this:
image

I read in the documentation about a bug related to transparency when images=PDF_REDACT_IMAGE_PIXELS, so I went on to debug the images on the page.
Calling page.get_images() returned a list of 4 items:

[
    (59, 0, 1, 1, 8, 'DeviceGray', '', 'FXX1', ''), 
    (60, 0, 1200, 1518, 8, 'DeviceRGB', '', 'FXX2', 'JPXDecode'), 
    (61, 264, 1, 1, 8, 'DeviceGray', '', 'FXX3', ''), 
    (62, 264, 800, 1012, 8, 'DeviceRGB', '', 'FXX4', 'JPXDecode')
]

So basically I have 2 images with real dimensions: #60 and #62 (the other two entries are only 1 x 1).
As can be seen, #62 also references a soft mask (smask), which is object #264.

For the sake of experiment, I started extracting and saving objects #60 and #62 individually.
This is the code I used:
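
For reference, here is a minimal sketch of how the tuples above can be read. It assumes the field order documented for page.get_images() with the default full=False: (xref, smask, width, height, bpc, colorspace, alt. colorspace, name, filter). The ImageEntry type and its field names are illustrative labels only, not part of PyMuPDF.

```python
from collections import namedtuple

# Illustrative labels for the 9 fields of one page.get_images() entry
# (assumed order: xref, smask, width, height, bpc, colorspace,
# alt. colorspace, name, filter).
ImageEntry = namedtuple(
    "ImageEntry",
    "xref smask width height bpc colorspace alt_colorspace name filter_name",
)

# The list exactly as returned for the page in question.
raw = [
    (59, 0, 1, 1, 8, "DeviceGray", "", "FXX1", ""),
    (60, 0, 1200, 1518, 8, "DeviceRGB", "", "FXX2", "JPXDecode"),
    (61, 264, 1, 1, 8, "DeviceGray", "", "FXX3", ""),
    (62, 264, 800, 1012, 8, "DeviceRGB", "", "FXX4", "JPXDecode"),
]

entries = [ImageEntry(*item) for item in raw]

# Entries whose smask field is non-zero reference a soft-mask object.
masked = [e.xref for e in entries if e.smask]

# Entries larger than 1 x 1 pixel, i.e. the images with real dimensions.
real = [e.xref for e in entries if e.width > 1 and e.height > 1]

for e in entries:
    mask_note = f"smask xref {e.smask}" if e.smask else "no smask"
    print(f"xref {e.xref}: {e.width} x {e.height}, {e.colorspace}, {mask_note}")
```

Read this way, #60 and #62 are the only entries larger than 1 x 1, and both #61 and #62 point to the same smask object, #264.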

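import fitz  # PyMuPDF

# doc, page and new_doc already exist: source PDF, its page, and an empty output PDF
data = page.get_text("dict")
new_doc.insert_page(0)
img1 = doc.extract_image(60)["image"]  # raw image bytes of xref 60
img2 = doc.extract_image(62)["image"]  # raw image bytes of xref 62
new_doc[0].insert_image(fitz.Rect(*data["blocks"][1]["bbox"]), stream=img2)
#new_doc[0].insert_image(fitz.Rect(*data["blocks"][0]["bbox"]), stream=img1)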

data["blocks"][0] is object #60.

Writing #60 resulted in a slightly less colorful version of the original document:
image

This is an example of the original:
image

I apologize for not being able to share the whole thing.

Then I saved data["blocks"][1], which is object #62, and got the same bad/blurry version that I shared at the top.

It looks like even without apply_redactions() I get a similarly bad result.

But hey, #62 has an smask object.
So I went ahead and extracted smask object #264 and tried to create a combined pixmap like this:

pix = fitz.Pixmap(doc.extract_image(62)["image"])    # base image
mask = fitz.Pixmap(doc.extract_image(264)["image"])  # its smask

combo = fitz.Pixmap(pix, mask)  # combine image and mask

just like it's shown here: https://pymupdf.readthedocs.io/en/latest/faq.html

and I got an error saying that color and mask should have the same size.

I checked, and indeed the mask is much bigger than pix.

I think I'm doing it right, but since I'm a relatively new PyMuPDF user, maybe not.

Could you please help me understand if I'm doing something wrong?
How can the smask image be bigger than the image it's applied to?
Also, could you please explain the nature of the bug related to this value: PDF_REDACT_IMAGE_PIXELS?

As an experiment, I used images=0 to fix the output PDF issue after redaction, but since the images can then still be fully recovered, that is not a good option for me.

Thank you!
