[MS] Extend table support for wide tables #1552
base: main
…d add comprehensive test cases
…itdown into u/vilesyk/wide_tables
Thanks for the PR, @lesyk. Could you please clarify:
- How were the adaptive constants (0.70 percentile, [25,50] clamp, 10 cols/inch threshold) chosen? Were other values tested?
- Were the existing PDF tests run before and after this change to confirm no regressions?
- Why was the version number bumped?
Can you please also update your description to include the commands needed to test your changes, and indicate that you have manually verified all changes, especially if any AI was used to write the code?
We have internal testing datasets that cover a variety of different files. After a new dataset was added, we found that the old parsing process did not handle it, hence these changes.
I see no regressions on our internal datasets, nor in the tests I added previously.
I think I misunderstood versioning for beta channels. I will change to
I am following the repo's setup:
This pull request enhances the handling and extraction of complex tables from PDF files in the `markitdown` package. It increases the flexibility of the PDF table extraction logic to support documents with a larger number of columns, updates the package version, and adds comprehensive tests for new PDF scenarios. Additionally, it improves repository configuration for handling binary files.
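
For readers following the review question about the adaptive constants (0.70 percentile, [25, 50] clamp, 10 cols/inch threshold), here is a minimal sketch of how such constants could combine. It is not taken from this PR's diff: the function names `adaptive_tolerance` and `is_wide_table`, the choice of "gaps" as the measured quantity, and the reading of "cols/inch" as table columns per inch of page width are all assumptions made for illustration.

```python
import math


def adaptive_tolerance(gaps, percentile=0.70, lo=25.0, hi=50.0):
    """Hypothetical helper: nearest-rank percentile of `gaps`, clamped to [lo, hi].

    `gaps` stands in for whatever per-page measurement the real code uses
    (an assumption; the PR text does not say).
    """
    if not gaps:
        return lo  # nothing measured: fall back to the lower bound
    ordered = sorted(gaps)
    # Nearest-rank percentile: smallest value with at least `percentile` of data at or below it.
    rank = max(1, math.ceil(percentile * len(ordered)))
    value = ordered[rank - 1]
    # Clamp so outliers cannot push the tolerance outside [lo, hi].
    return min(hi, max(lo, value))


def is_wide_table(num_columns, page_width_pt, cols_per_inch_threshold=10.0):
    """Hypothetical check: column density per inch of page width (72 pt = 1 inch)."""
    density = num_columns / (page_width_pt / 72.0)
    return density > cols_per_inch_threshold
```

The clamp is what makes a percentile-based value safe to use on unusual pages: very sparse or very dense layouts still yield a tolerance inside a known range, which is presumably why the reviewer asks how the bounds were chosen.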