images=PDF_REDACT_IMAGE_PIXELS arg of apply_redactions() #1818

Hi,

Apologies in advance, since my question is a little long.

The question is about the value `PDF_REDACT_IMAGE_PIXELS` of the `images` parameter.
When I use `apply_redactions()` with the default value, I get a result like this:
![image](https://user-images.githubusercontent.com/9495111/179543022-2c6b152a-31ad-4bc1-86ca-08f984d787d0.png)

I read in the documentation about a bug related to transparency when `images=PDF_REDACT_IMAGE_PIXELS`, so I went on to inspect the images on the page.
`page.get_images()` returned a list of 4 items:
```
[
    (59, 0, 1, 1, 8, 'DeviceGray', '', 'FXX1', ''), 
    (60, 0, 1200, 1518, 8, 'DeviceRGB', '', 'FXX2', 'JPXDecode'), 
    (61, 264, 1, 1, 8, 'DeviceGray', '', 'FXX3', ''), 
    (62, 264, 800, 1012, 8, 'DeviceRGB', '', 'FXX4', 'JPXDecode')
]
```

So basically I have 2 images with real dimensions: # 60 (1200x1518) and # 62 (800x1012). The other two are 1x1 gray images.
As can be seen, # 62 also references a soft mask (smask), which is object # 264.
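To make the reading of these tuples explicit, here is a minimal sketch. It assumes the field order documented for `page.get_images()` (xref, smask, width, height, bpc, colorspace, alt. colorspace, name, filter). The name `ImageEntry` is made up for illustration and is not part of PyMuPDF.

```python
from collections import namedtuple

# Assumed field order of the items returned by page.get_images().
ImageEntry = namedtuple(
    "ImageEntry",
    "xref smask width height bpc colorspace alt_colorspace name filter",
)

# The list returned for this page, copied from the output above.
raw = [
    (59, 0, 1, 1, 8, "DeviceGray", "", "FXX1", ""),
    (60, 0, 1200, 1518, 8, "DeviceRGB", "", "FXX2", "JPXDecode"),
    (61, 264, 1, 1, 8, "DeviceGray", "", "FXX3", ""),
    (62, 264, 800, 1012, 8, "DeviceRGB", "", "FXX4", "JPXDecode"),
]

entries = [ImageEntry(*item) for item in raw]

# Print one readable line per image entry.
for e in entries:
    mask_note = f"soft mask xref {e.smask}" if e.smask else "no soft mask"
    print(f"xref {e.xref}: {e.width}x{e.height} {e.colorspace}, {mask_note}")

# Entries that reference a soft mask, and entries larger than 1x1.
masked = [e.xref for e in entries if e.smask]
real = [e.xref for e in entries if e.width > 1 and e.height > 1]
```

Read this way, the 1x1 gray object 61 references mask 264 as well, not only object 62.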

For the sake of experiment I started extracting and saving objects # 60 and # 62 individually.
This is the code I used:

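```
# doc is the source document, page the page in question, new_doc a new empty PDF
from fitz import Rect

data = page.get_text("dict")  # blocks include the image blocks with their bbox
new_doc.insert_page(0)
img1 = doc.extract_image(60)["image"]  # raw image bytes of object 60
img2 = doc.extract_image(62)["image"]  # raw image bytes of object 62
# write object 62; swap in the commented line to write object 60 instead
new_doc[0].insert_image(Rect(*data["blocks"][1]["bbox"]), stream=img2)
#new_doc[0].insert_image(Rect(*data["blocks"][0]["bbox"]), stream=img1)
```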

`blocks[0]` is object # 60.

Writing # 60 resulted in a slightly less colorful version of the original doc:
![image](https://user-images.githubusercontent.com/9495111/179546020-c6303ceb-48ab-4ecd-9428-c1457de95dd4.png)

This is an example of the original:
![image](https://user-images.githubusercontent.com/9495111/179546121-d55b69d4-c0d3-4775-850c-6c858c97bc85.png)

I apologize for not being able to share the whole thing.

Then I saved `blocks[1]`, which is object # 62, and got the same bad/blurry version that I shared at the top.

So even without `apply_redactions()` I get a similarly bad result.

But # 62 has an smask object.
So I went ahead and extracted smask object # 264 and tried to create a combined pixmap like this:
```
pix = fitz.Pixmap(doc.extract_image(62)["image"])  # base image
mask = fitz.Pixmap(doc.extract_image(264)["image"])  # its soft mask

combo = fitz.Pixmap(pix, mask)  # combine image and mask into one pixmap
```

just like it's shown here: https://pymupdf.readthedocs.io/en/latest/faq.html

and I got an error saying that color and mask should have the same size.

I checked, and indeed the mask is much bigger than the pix.
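The size comparison can be sketched roughly as follows. The helper names `same_size` and `combine_with_mask` are made up for this post. The only PyMuPDF calls used are `doc.extract_image()` and the `fitz.Pixmap(pix, mask)` constructor from the FAQ. The sketch only reports the mismatch; it does not try to fix it.

```python
def same_size(pix, mask):
    """Return True if both pixmaps have identical width and height."""
    return (pix.width, pix.height) == (mask.width, mask.height)


def combine_with_mask(doc, xref, smask):
    """Combine an image and its soft mask, or report the size mismatch."""
    import fitz  # PyMuPDF; imported here so same_size() works without it

    pix = fitz.Pixmap(doc.extract_image(xref)["image"])
    mask = fitz.Pixmap(doc.extract_image(smask)["image"])
    if not same_size(pix, mask):
        # Show both sizes instead of the generic "same size" error.
        raise ValueError(
            f"image {xref} is {pix.width}x{pix.height}, "
            f"mask {smask} is {mask.width}x{mask.height}"
        )
    return fitz.Pixmap(pix, mask)
```

For my file, `combine_with_mask(doc, 62, 264)` would raise, since the mask is larger than the 800x1012 image.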

I think I'm doing it right, but since I'm relatively new to PyMuPDF, maybe not.

Could you please help me understand if I'm doing something wrong?
How can the smask image be bigger than the image it's applied to?
Also, could you please explain the nature of the bug related to this value: `PDF_REDACT_IMAGE_PIXELS`?

As an experiment I used `images=0` to avoid the output PDF issue after redaction, but since the images can then still be fully recovered, that is not a good option for me.
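For context, a minimal sketch of how the image mode is passed. The integer values are my understanding of the `fitz.PDF_REDACT_IMAGE_*` constants and are repeated here only so the sketch does not need the import. `redact` is a made-up helper name, and `page` is assumed to be a PyMuPDF page.

```python
# Assumed values of the fitz.PDF_REDACT_IMAGE_* constants.
PDF_REDACT_IMAGE_NONE = 0    # leave images untouched (what images=0 does)
PDF_REDACT_IMAGE_REMOVE = 1  # remove any image overlapping a redaction
PDF_REDACT_IMAGE_PIXELS = 2  # blank out the overlapping pixels (default)


def redact(page, rect, images=PDF_REDACT_IMAGE_PIXELS):
    """Mark rect for redaction on page and apply it with the given image mode."""
    page.add_redact_annot(rect)
    return page.apply_redactions(images=images)
```

With `images=PDF_REDACT_IMAGE_NONE` the text under the rectangle is removed but the image data stays in the file, which is why it is not an option for me.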

Thank you!
