Feature request: Extract both texts and tables on the same page

**Is your feature request related to a problem? Please describe.**
I'm testing text and table extraction with PyMuPDF. Some pages in a PDF contain both text and a table.
Example (tables start on page 3):
https://www.aetnamedicare.com/documents/individual/2024/summaryofbenefits/Y0001_H5521_127_PQ05_SB24_M.pdf

PyMuPDF works great for extracting tables with `Page.find_tables()`, and it correctly identifies rows and columns. However, I haven't found a good way to extract both the text outside the tables and the tables themselves on the same page.

Ideally, I would expect a function like `get_text_and_tables()` that returns a list of elements, each either a text block or a table, in natural reading order. Based on the type of each element, I could then decide what to do with the text or the table.

The closest thing I can think of for now is the following, but it's probably going to be error-prone:
- Call `Page.get_text()` to extract all the text (which will include the text inside any tables on the page).
- Call `Page.find_tables()` to extract the tables.
- Figure out the first and last cell of each table, and delete the corresponding text from the `Page.get_text()` output. Then try to combine the remaining text and the tables.
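
For reference, here is a minimal sketch of a variant of that workaround. Instead of matching the first and last cell text, it drops every text block whose center falls inside a table's bounding box, then sorts what is left together with the tables from top to bottom. The names `merge_text_and_tables` and `get_text_and_tables` are hypothetical helpers, not part of PyMuPDF. It assumes `Page.get_text("blocks")` returns `(x0, y0, x1, y1, text, block_no, block_type)` tuples and that each table found by `Page.find_tables()` exposes `.bbox` and `.extract()`.

```python
def _center_inside(block_bbox, table_bbox):
    """Return True if the center of block_bbox lies inside table_bbox."""
    x0, y0, x1, y1 = block_bbox
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    tx0, ty0, tx1, ty1 = table_bbox
    return tx0 <= cx <= tx1 and ty0 <= cy <= ty1


def merge_text_and_tables(blocks, tables):
    """Combine text blocks and tables into one top-to-bottom list.

    blocks: tuples shaped like the output of page.get_text("blocks").
    tables: list of (bbox, rows) pairs, where rows is a list of lists.
    Returns a list of ("text", str) or ("table", rows) pairs.
    """
    table_boxes = [bbox for bbox, _ in tables]
    # Each entry is (y0, x0, kind, content) so it can be sorted by position.
    elements = [(bbox[1], bbox[0], "table", rows) for bbox, rows in tables]
    for x0, y0, x1, y1, text, _block_no, block_type in blocks:
        if block_type != 0:  # skip image blocks
            continue
        if any(_center_inside((x0, y0, x1, y1), tb) for tb in table_boxes):
            continue  # this text belongs to a table, so it is already covered
        elements.append((y0, x0, "text", text.strip()))
    # Assumption: sorting by top edge, then left edge, approximates reading order.
    elements.sort(key=lambda e: (e[0], e[1]))
    return [(kind, content) for _, _, kind, content in elements]


def get_text_and_tables(page):
    """Hypothetical wrapper: `page` is a PyMuPDF Page object."""
    tables = [(tuple(t.bbox), t.extract()) for t in page.find_tables().tables]
    return merge_text_and_tables(page.get_text("blocks"), tables)
```

The top-to-bottom sort is only a rough stand-in for natural reading order and will likely break on multi-column layouts, which is part of why built-in support would be preferable.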


Feature request: Extract both texts and tables on the same page #3093
