extract_table ignores ordering defined while loading the document #153

@paulopaixaoamaral

Bug Report

extract_table re-orders the table rows by the y axis (top to bottom), which works for most cases.

The problem arises when the table header sits lower on the page than some of the table's other elements, for example when the page is split into two columns:
[Image: extract_table_bug]

In the above case, even if element_ordering is properly set in load to adjust to the page split, extract_table would return:

[["C", "D"], ["E", "F"], ["HEADER 1", "HEADER 2"], ["A", "B"]]

Should we make extract_table respect the order in which the document elements are defined? Or should we add options such as rows_sort and columns_sort to the function?
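
To make the two behaviours concrete, here is a minimal, self-contained sketch. It does not use the library's real classes or the actual extract_table code: `Cell`, `group_rows`, and the `by_document_order` flag are hypothetical names invented for illustration, and the coordinates are made up to mimic a two-column page where the header sits in the lower left.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Cell:
    """Hypothetical stand-in for a PDF element (not the library's class)."""
    text: str
    x0: float   # left edge
    y1: float   # top edge; in PDF coordinates a larger y is higher on the page
    index: int  # position in the element ordering chosen at load time


def group_rows(cells: List[Cell], by_document_order: bool) -> List[List[str]]:
    """Group cells into rows by their top edge, then order the rows."""
    rows: Dict[float, List[Cell]] = {}
    for cell in cells:
        rows.setdefault(cell.y1, []).append(cell)

    if by_document_order:
        # Assumed alternative: a row is placed by its earliest element
        # in the document's own ordering.
        def row_key(row: List[Cell]) -> float:
            return min(c.index for c in row)
    else:
        # Behaviour described in this report: top to bottom by y.
        def row_key(row: List[Cell]) -> float:
            return -row[0].y1

    ordered = sorted(rows.values(), key=row_key)
    # Within a row, cells go left to right.
    return [[c.text for c in sorted(row, key=lambda c: c.x0)] for row in ordered]


# Left page column (read first) holds the header and the first row, low on
# the page; right page column holds the remaining rows, higher up.
CELLS = [
    Cell("HEADER 1", 50, 300, 0), Cell("HEADER 2", 150, 300, 1),
    Cell("A", 50, 250, 2), Cell("B", 150, 250, 3),
    Cell("C", 350, 700, 4), Cell("D", 450, 700, 5),
    Cell("E", 350, 650, 6), Cell("F", 450, 650, 7),
]

print(group_rows(CELLS, by_document_order=False))
print(group_rows(CELLS, by_document_order=True))
```

With y-based sorting the header lands third, as in the output above; ordering rows by element index puts it first. A rows_sort-style option could select between the two keys, although how that would fit into the real function is left open here.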
