
@inacionery

Summary

This PR fixes a "ValueError: min() arg is an empty sequence" crash that occurs when processing PDFs containing tables with empty or invalid cell sequences.

Problem

When to_markdown() processes a table that has no valid cells, accessing the table's t.bbox property makes PyMuPDF's table module call min() on an empty sequence. That call raises ValueError, and the conversion crashes.
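A minimal pure-Python sketch of the failure mode. It assumes, for illustration only, that each cell is an (x0, y0, x1, y1) tuple and that invalid cells (modeled here as None) are dropped before the bounding box is computed. table_bbox is a hypothetical helper, not PyMuPDF's actual code; only the min(map(itemgetter(0), c)) pattern comes from the error trace in this PR.

```python
from operator import itemgetter


def table_bbox(cells):
    """Hypothetical bbox computation over (x0, y0, x1, y1) cell tuples.

    None entries stand in for invalid cells and are filtered out first.
    """
    c = [cell for cell in cells if cell is not None]
    # If c is empty, the first min() call raises ValueError.
    return (
        min(map(itemgetter(0), c)),
        min(map(itemgetter(1), c)),
        max(map(itemgetter(2), c)),
        max(map(itemgetter(3), c)),
    )


# A table with two valid cells yields their union.
print(table_bbox([(0, 0, 10, 5), (10, 0, 20, 5)]))  # (0, 0, 20, 5)

# A table whose cells are all invalid leaves an empty sequence for min().
try:
    table_bbox([None, None])
except ValueError as exc:
    print("ValueError:", exc)
```

The exact wording of the ValueError message can differ between Python versions; the exception type is what matters here.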

Error trace:

File "pymupdf4llm/helpers/pymupdf_rag.py", line 1057, in get_page_output
    omitted_table_rects.append(pymupdf.Rect(t.bbox))
                                            ^^^^^^
File "pymupdf/table.py", line 1506, in bbox
    min(map(itemgetter(0), c)),
    ^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: min() arg is an empty sequence
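One possible call-site guard, sketched under assumptions. StubTable and collect_table_rects are hypothetical names. The stub only mimics a bbox property that raises ValueError when no cells remain. This is not necessarily how this PR implements the fix. The real call site wraps the result as pymupdf.Rect(t.bbox); the sketch keeps plain tuples so it runs without PyMuPDF installed.

```python
from operator import itemgetter


class StubTable:
    """Hypothetical stand-in for a PyMuPDF table; only `bbox` is modeled."""

    def __init__(self, cells):
        # Assumed cell shape: (x0, y0, x1, y1) tuples, None for invalid cells.
        self.cells = cells

    @property
    def bbox(self):
        c = [cell for cell in self.cells if cell is not None]
        # Raises ValueError when c is empty, like the traceback above.
        return (
            min(map(itemgetter(0), c)),
            min(map(itemgetter(1), c)),
            max(map(itemgetter(2), c)),
            max(map(itemgetter(3), c)),
        )


def collect_table_rects(tables):
    """Collect table bboxes, skipping tables whose bbox cannot be computed."""
    rects = []
    for t in tables:
        try:
            bbox = t.bbox
        except ValueError:
            # Empty or invalid cell sequence: skip this table instead of crashing.
            continue
        rects.append(bbox)  # real code would append pymupdf.Rect(bbox)
    return rects


tables = [StubTable([(0, 0, 10, 5)]), StubTable([None])]
print(collect_table_rects(tables))  # [(0, 0, 10, 5)]
```

Catching ValueError at the call site keeps the change local to get_page_output, but it also swallows any unrelated ValueError raised inside bbox. Checking for an empty cell list before touching bbox would be stricter, but it depends on PyMuPDF's internal table structure.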
