v0.2.0 — SpreadsheetBench benchmark + retrievability #7
arnav2 announced in Announcements
0.2.0 — 2026-05-11
Headline: ks-xlsx-parser ties Docling at recall@1 and wins recall@3/@5 on
SpreadsheetBench (912 instances, 5,458 xlsx). 99.945% parse success.
36.9% citation-grade geometric recall (Docling 0% structurally).
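For readers unfamiliar with the recall@k figures in the headline, here is a minimal sketch of how such a metric is commonly computed. It assumes the usual definition (the fraction of relevant chunks found in the top k results, averaged over instances); the benchmark's actual definition lives in `scripts/eval_retrieval.py` and may differ. The function names and chunk ids below are hypothetical and not taken from the repository.

```python
from typing import List, Sequence, Set, Tuple


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of relevant chunk ids that appear in the top-k ranked results."""
    if not relevant:
        raise ValueError("relevant set must be non-empty")
    top_k = set(ranked[:k])
    return len(top_k & relevant) / len(relevant)


def mean_recall_at_k(runs: List[Tuple[Sequence[str], Set[str]]], k: int) -> float:
    """Average recall@k over all (ranked results, relevant ids) pairs."""
    return sum(recall_at_k(ranked, relevant, k) for ranked, relevant in runs) / len(runs)


# Hypothetical data: three queries, each with a ranked list of chunk ids
# and the set of chunk ids that actually contain the answer.
runs = [
    (["c3", "c1", "c7"], {"c1"}),  # answer found at rank 2
    (["c2", "c9", "c4"], {"c2"}),  # answer found at rank 1
    (["c5", "c6", "c8"], {"c0"}),  # answer never retrieved
]

for k in (1, 3, 5):
    print(f"recall@{k} = {mean_recall_at_k(runs, k):.3f}")
```

Under this definition a parser can tie at recall@1 and still win at recall@3/@5 when its relevant chunks tend to appear slightly lower in the ranking but are retrieved more often overall.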
Added
- `make bench`: head-to-head benchmark vs Docling on SpreadsheetBench
- `tests/benchmarks/adapters/docling_adapter.py`: Docling adapter
- `scripts/eval_retrieval.py`: retrieval recall@k with embedded chunks
- `scripts/summarize_retrieval.py`: re-aggregate partial runs
- `scripts/download_corpora.sh`: fetches SpreadsheetBench v0.1
- `tests/benchmarks/reports/COMPARISON.md`: full methodology + capability matrix
- `docs/launch/RELEASE_NOTES_vX.Y.Z.md` automatically
Fixed
- Numeric cells: raw value (1272), not display-formatted ("1,272.00")
- `[=]` formula marker no longer triggers spurious sci-notation truncation
- `YYYY-MM-DD` (drop midnight `00:00:00`)
- `GradientFill` cells no longer crash the sheet parser
Changed
- `_detect_style_boundaries` in segmenter: fill-color banding is not a semantic boundary (was splitting coherent tables into 5 header-less fragments)
- `tests/benchmarks/_schema.py`: `formulas` nullable on `status=ok` (Docling/Marker don't model formulas)
Removed
- `scripts/compare_docling.py`: superseded by the unified benchmark framework
Performance
Breaking
`render_text` on numeric cells now contains the raw value, not Excel's display-formatted string. If you keyed off display formatting, use the cell DTO's `display_value` field instead.
Install:
pip install -U ks-xlsx-parser==0.2.0
Full notes: docs/launch/RELEASE_NOTES_v0.2.0.md
Benchmark report: tests/benchmarks/reports/COMPARISON.md
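For code affected by the breaking change to `render_text` described above, a migration might look roughly like the sketch below. The `CellView` class is a hypothetical stand-in used only to keep the example self-contained; it is not the library's cell DTO. The notes confirm only the `render_text` and `display_value` names. That both are strings, and that `display_value` may be missing, are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CellView:
    """Hypothetical stand-in for the parser's cell DTO (not the real class)."""
    render_text: str              # raw value as text in 0.2.0, e.g. "1272"
    display_value: Optional[str]  # display-formatted string, e.g. "1,272.00" (assumed optional)


def formatted_text(cell: CellView) -> str:
    """Return the display-formatted string, falling back to the raw text."""
    if cell.display_value is not None:
        return cell.display_value
    return cell.render_text


# Before 0.2.0, code could read render_text to get "1,272.00".
# From 0.2.0 on, read display_value explicitly when formatting matters.
cell = CellView(render_text="1272", display_value="1,272.00")
print(formatted_text(cell))
```

Callers that only need the numeric value can keep reading `render_text` and no longer need to strip thousands separators.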
This discussion was created from the release v0.2.0 — SpreadsheetBench benchmark + retrievability.