invisible text layer

I'm trying to convert a non-searchable PDF into a searchable PDF by superimposing an invisible text layer. The words and their bounding boxes come from Textract, and I'm writing the layer with PyMuPDF.

        
    import fitz  # PyMuPDF

    # `doc` (an open fitz.Document) and `response` (the Textract result)
    # are assumed to be defined earlier.

    # Parse the blocks received from the Textract response
    blocks = response['Blocks']

    for block in blocks:
        if block['BlockType'] == 'WORD':
            page_number = block['Page'] - 1  # pages in PyMuPDF start from 0
            pdf_page = doc.load_page(page_number)

            # Textract bounding boxes are ratios of the page size; scale them to points
            bbox = block['Geometry']['BoundingBox']
            bbox_mupdf = fitz.Rect(
                bbox['Left'] * pdf_page.rect.width,
                bbox['Top'] * pdf_page.rect.height,
                (bbox['Left'] + bbox['Width']) * pdf_page.rect.width,
                (bbox['Top'] + bbox['Height']) * pdf_page.rect.height
            )

            pdf_page.insert_textbox(
                rect=bbox_mupdf,
                buffer=block['Text'],
                render_mode=3,  # PDF text render mode 3 = invisible text
                overlay=True  # Overlay the text on top of existing content
            )


I want the invisible bboxes to align exactly with the original bboxes extracted by Textract, but with this code the resulting bboxes are off: they come out smaller than the originals.
Is insert_textbox the right way to do this?
I don't want to specify a font and font size, because then the bboxes wouldn't align perfectly.
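
One possible direction, sketched here only as an illustration: instead of leaving the font size at its default, derive it per word from the Textract box so that the inserted word is as wide as the box. The helper names `bbox_to_points` and `fit_fontsize` are hypothetical (not part of PyMuPDF or Textract), and the sketch assumes that the rendered width of a string scales linearly with the font size.

```python
from typing import Dict, Tuple


def bbox_to_points(bbox: Dict[str, float], page_width: float,
                   page_height: float) -> Tuple[float, float, float, float]:
    """Convert a Textract BoundingBox (ratios of the page size) to
    (x0, y0, x1, y1) in page units, with the origin at the top-left."""
    x0 = bbox["Left"] * page_width
    y0 = bbox["Top"] * page_height
    x1 = (bbox["Left"] + bbox["Width"]) * page_width
    y1 = (bbox["Top"] + bbox["Height"]) * page_height
    return x0, y0, x1, y1


def fit_fontsize(width_at_size_1: float, box_width: float) -> float:
    """Return the font size at which a word spans the full box width.

    `width_at_size_1` is the width of the word when set at font size 1.
    Assumption: text width grows linearly with the font size.
    """
    if width_at_size_1 <= 0:
        raise ValueError("width_at_size_1 must be positive")
    return box_width / width_at_size_1


# Example: a word that is 5 units wide at size 1, in a 60-unit-wide box
x0, y0, x1, y1 = bbox_to_points(
    {"Left": 0.25, "Top": 0.5, "Width": 0.1, "Height": 0.02}, 600, 800
)
size = fit_fontsize(5.0, x1 - x0)
```

With PyMuPDF, `width_at_size_1` could probably be obtained from `fitz.get_text_length(text, fontname="helv", fontsize=1)`, and the word then placed with `insert_text` at the lower-left corner of the box using the computed `fontsize` and `render_mode=3`. That matches the width only; the height of the resulting box still depends on the font's metrics, so this may not give a perfect fit.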

invisible text layer #2463