
Update benchmarks with fresh 200-doc evaluation #2

Merged
jdrhyne merged 3 commits into main from update-benchmarks-2026-04-02 on Apr 2, 2026

Conversation


@jdrhyne jdrhyne commented Apr 2, 2026

Summary

Updated benchmark data in README.md and docs/benchmarks.md with fresh results from a reproducible evaluation run on 200 PDF documents with ground-truth annotations.

Changes

  • Re-ran all non-ML parsers using the opendataloader-bench harness
  • Added pypdf and liteparse to comparison (7 parsers total)
  • Removed opendataloader-hybrid (uses docling internally, redundant)
  • Added reproducibility instructions linking to the open benchmark harness

Results (200 docs, ground truth)

| Solution | Overall | NID | TEDS | MHS | Speed (s/doc) |
|---|---|---|---|---|---|
| docling | 0.88 | 0.90 | 0.89 | 0.82 | 0.618 |
| Nutrient | 0.88 | 0.92 | 0.66 | 0.81 | 0.007 |
| opendataloader | 0.83 | 0.90 | 0.49 | 0.74 | 0.014 |
| pymupdf4llm | 0.83 | 0.88 | 0.48 | 0.78 | 0.252 |
| markitdown | 0.59 | 0.84 | 0.27 | 0.00 | 0.106 |
| pypdf | 0.58 | 0.87 | 0.00 | 0.00 | 0.019 |
| liteparse | 0.57 | 0.86 | 0.00 | 0.00 | 0.233 |

Nutrient matches docling on overall accuracy (0.88) while running roughly 90x faster per document (0.007 s/doc vs 0.618 s/doc).
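For reviewers who want to sanity-check the headline speedup, a minimal sketch using only the per-document timings from the table above (the helper name is ours, not part of the harness):

```python
def speed_advantage(baseline_s_per_doc: float, candidate_s_per_doc: float) -> float:
    """How many times faster the candidate parser is per document."""
    return baseline_s_per_doc / candidate_s_per_doc

# Per-doc timings from the results table.
docling_s = 0.618
nutrient_s = 0.007

factor = speed_advantage(docling_s, nutrient_s)
print(f"{factor:.0f}x")  # ≈88x, rounded to "90x" in the summary
```

The exact ratio depends on rounding of the 0.007 s/doc figure, which is why the prose hedges with "roughly 90x".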

Test plan

  • Benchmark data matches opendataloader-bench evaluation.json outputs
  • Speed claims match summary.json timing data
  • Reproducibility link works
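The first two checklist items amount to comparing each published table row against the harness output within rounding tolerance. A sketch of that comparison (the `results` layout here is a hypothetical example, not the actual `evaluation.json` schema):

```python
# Hypothetical layout: parser name -> metric name -> score.
# The real opendataloader-bench output schema may differ.
def check_table_row(results: dict, parser: str, expected: dict, tol: float = 0.005) -> bool:
    """Confirm each published metric matches the harness output within rounding."""
    return all(abs(results[parser][m] - v) <= tol for m, v in expected.items())

results = {"nutrient": {"overall": 0.880, "nid": 0.924, "teds": 0.662, "mhs": 0.811}}
published_row = {"overall": 0.88, "nid": 0.92, "teds": 0.66, "mhs": 0.81}
print(check_table_row(results, "nutrient", published_row))  # True
```

A tolerance of 0.005 accepts exactly the values that round to the published two-decimal figures.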

jdrhyne added 3 commits April 2, 2026 12:05

Re-ran all non-ML parsers on the opendataloader-bench 200-doc corpus
with ground truth annotations. Added pypdf and liteparse to comparison.
Removed opendataloader-hybrid (uses docling internally, redundant).

Key results:
  Nutrient  0.880 overall, 0.924 NID, 0.662 TEDS, 0.811 MHS, 0.007 s/doc
  docling   0.882 overall, 0.898 NID, 0.887 TEDS, 0.824 MHS, 0.618 s/doc

Nutrient matches docling on overall accuracy (0.88 vs 0.88) while being
90x faster. Nutrient leads on reading order (0.92 vs 0.90).

Added reproducibility link to opendataloader-bench harness.
Source: PSPDFKit-labs/opendataloader-bench#1

Keep corpus size (200 docs), date (2026-04-02), metric descriptions,
and all result data. Remove external repo references.

Restored visual benchmark snapshots as PNG charts showing:
- Extraction accuracy (overall scores)
- Reading order (NID)
- Table structure (TEDS)
- Heading level (MHS)
- Extraction speed per page
- Speed advantage vs Nutrient
@jdrhyne jdrhyne merged commit 2e278f9 into main Apr 2, 2026
2 checks passed
@jdrhyne jdrhyne deleted the update-benchmarks-2026-04-02 branch April 2, 2026 16:19