Fix: Add extend_edges function to fix table extraction with one strat text and the other non-text #4878
monchin wants to merge 1 commit into pymupdf:main from
We already support parameters
Thank you for your reply!
I proposed this PR because, IMHO, users want to use the library as easily and as correctly as possible.
I don't understand most of your response.
I think I know what you mean. But please let me have an example PDF page and how you extract the edge information, and I'm quite confident that I will be able to demonstrate how to do that using virtual vectors.
text-lines-tables.pdf
Here is an example taken from the real PDF where I ran into this.

Hi, I referred to pymupdf when writing a library to extract PDF tables, and found a table-extraction bug when one strategy is "text" and the other is not; please see monchin/tablers#8 for more details.
I have fixed it in my library, and since the bug also occurs in pymupdf, I'd like to fix it here as well.
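
To make the failure mode concrete, here is a minimal, hypothetical sketch in plain Python. It is not the code from this PR or from pymupdf: `Edge`, `intersections` and `extend_edges_sketch` are names invented for illustration, and the assumption is that text-derived edges only span the text they were inferred from, so they can stop short of the drawn lines used by the other strategy and miss the outer intersections.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Edge:
    """Axis-aligned edge; hypothetical type, not a pymupdf class."""
    x0: float
    y0: float
    x1: float
    y1: float


def intersections(verticals, horizontals):
    """Return the set of (x, y) points where a vertical edge crosses a horizontal one."""
    points = set()
    for v in verticals:
        for h in horizontals:
            if h.x0 <= v.x0 <= h.x1 and v.y0 <= h.y0 <= v.y1:
                points.add((v.x0, h.y0))
    return points


def extend_edges_sketch(text_edges, other_edges):
    """Stretch text-derived edges to cover the full extent of the other edges.

    Assumption for illustration only: extending to the bounding extent of the
    other strategy's edges is enough to restore the missing intersections.
    """
    if not text_edges or not other_edges:
        return list(text_edges)
    lo_x = min(e.x0 for e in other_edges)
    hi_x = max(e.x1 for e in other_edges)
    lo_y = min(e.y0 for e in other_edges)
    hi_y = max(e.y1 for e in other_edges)
    extended = []
    for e in text_edges:
        if e.y0 == e.y1:  # horizontal edge: widen along x
            extended.append(replace(e, x0=min(e.x0, lo_x), x1=max(e.x1, hi_x)))
        else:  # vertical edge: lengthen along y
            extended.append(replace(e, y0=min(e.y0, lo_y), y1=max(e.y1, hi_y)))
    return extended


# Vertical edges from drawn lines, horizontal edges inferred from text.
verticals = [Edge(x, 0, x, 60) for x in (0, 50, 100)]
horizontals = [Edge(5, y, 95, y) for y in (0, 20, 40, 60)]

before = intersections(verticals, horizontals)
after = intersections(verticals, extend_edges_sketch(horizontals, verticals))
print(len(before), len(after))  # 4 12
```

In this toy layout only the middle vertical line crosses the text-derived edges, so the outer table border is lost until the edges are extended.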