[MS] Extend table support for wide tables #1552

lesyk · 2026-02-10T11:23:24Z

This pull request enhances the handling and extraction of complex tables from PDF files in the markitdown package. It increases the flexibility of the PDF table extraction logic to support documents with a larger number of columns, updates the package version, and adds comprehensive tests for new PDF scenarios. Additionally, it improves repository configuration for handling binary files.

gagb

Thanks for the PR, @lesyk. Could you please clarify:

How were the adaptive constants (0.70 percentile, [25,50] clamp, 10 cols/inch threshold) chosen? Were other values tested?
Were the existing PDF tests run before and after this change to confirm no regressions?
Why was the version number bumped?

Can you please also update your description to include commands to run to test your changes and also indicate that you have manually verified all changes, especially if any AI was used to write the code.

lesyk · 2026-02-12T08:44:11Z

How were the adaptive constants (0.70 percentile, [25,50] clamp, 10 cols/inch threshold) chosen? Were other values tested?

We have internal testing datasets that contain a variety of different files. After a new dataset was added, we found that the old parsing process did not work for it, hence these changes.
As for the values, these were the most stable in my testing on our datasets.
In previous PRs I added the same kind of synthetic samples, and I add more of them with each PR.
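
For readers unfamiliar with this kind of heuristic, the constants under discussion fit a common pattern: take a percentile of some measured distances, clamp it to a fixed range, and switch behavior when the layout is dense. The sketch below illustrates that pattern only. It is not the code from this PR: the function names, what the gaps measure, the units, and the reading of "10 cols/inch" as columns per inch of page width are all assumptions made for illustration.

```python
def percentile(values, q):
    """Linearly interpolated percentile of a non-empty list, with q in [0, 1]."""
    ordered = sorted(values)
    if len(ordered) == 1:
        return float(ordered[0])
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def adaptive_tolerance(gaps, q=0.70, lower=25.0, upper=50.0):
    """Hypothetical helper: q-th percentile of the gaps, clamped to [lower, upper].

    The defaults mirror the constants quoted in the review (0.70, [25, 50]);
    what they apply to in the real code is an assumption here.
    """
    if not gaps:
        return lower
    return max(lower, min(upper, percentile(gaps, q)))


def is_dense(num_columns, page_width_pt, threshold=10.0):
    """Hypothetical helper: True when columns per inch exceed the threshold.

    Assumes the page width is given in PDF points (72 points = 1 inch).
    """
    return num_columns / (page_width_pt / 72.0) > threshold


# Example: gaps of 10..50 give a 70th percentile of about 38, inside the clamp.
print(adaptive_tolerance([10, 20, 30, 40, 50]))
```

In a scheme like this, the percentile keeps a few unusually large or small gaps from dominating the result, and the clamp bounds the tolerance even when the measured gaps are unrepresentative.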

Were the existing PDF tests run before and after this change to confirm no regressions?

I see no regressions on our internal datasets or in the tests I added previously.

Why was the version number bumped?

I think I misunderstood versioning for beta channels. I will change it to 0.1.5b2. My mistake.

Can you please also update your description to include commands to run to test your changes and also indicate that you have manually verified all changes, especially if any AI was used to write the code.

I am following the repo's setup: run pytest or hatch from the root.

lesyk added 2 commits February 10, 2026 12:20

feat: enhance PDF table extraction to support complex forms and add n…

76a254a

…ew test cases

chore: update version to 0.1.6b1

be94561

lesyk changed the title ~~Extend table support for wide tables~~ [MS] Extend table support for wide tables Feb 10, 2026

lesyk and others added 4 commits February 10, 2026 12:24

Merge branch 'main' into u/vilesyk/wide_tables

f8ff685

feat: enhance PDF table extraction with adaptive column clustering an…

a50f2bb

…d add comprehensive test cases

Merge branch 'u/vilesyk/wide_tables' of https://github.com/lesyk/mark…

57e4b71

…itdown into u/vilesyk/wide_tables

fix: correct formatting and improve assertions in PDF table tests

bd20acd

lesyk marked this pull request as ready for review February 10, 2026 11:56

gagb reviewed Feb 11, 2026

packages/markitdown/src/markitdown/__about__.py (outdated, resolved)

gagb reviewed Feb 11, 2026

chore: revert version to 0.1.5b2 in __about__.py

51869c5
