Update benchmarks with fresh 200-doc evaluation by jdrhyne · Pull Request #2 · PSPDFKit/pdf-to-markdown

jdrhyne · 2026-04-02T16:06:12Z

Summary

Updated benchmark data in README.md and docs/benchmarks.md with fresh results from a reproducible evaluation run on 200 PDF documents with ground-truth annotations.

Changes

- Re-ran all non-ML parsers using the opendataloader-bench harness
- Added pypdf and liteparse to comparison (7 parsers total)
- Removed opendataloader-hybrid (uses docling internally, redundant)
- Added reproducibility instructions linking to the open benchmark harness

Results (200 docs, ground truth)

Solution	Overall	NID	TEDS	MHS	Speed (s/doc)
docling	0.88	0.90	0.89	0.82	0.618
Nutrient	0.88	0.92	0.66	0.81	0.007
opendataloader	0.83	0.90	0.49	0.74	0.014
pymupdf4llm	0.83	0.88	0.48	0.78	0.252
markitdown	0.59	0.84	0.27	0.00	0.106
pypdf	0.58	0.87	0.00	0.00	0.019
liteparse	0.57	0.86	0.00	0.00	0.233

Nutrient matches docling on overall accuracy (0.88) while being roughly 90x faster (0.007 vs 0.618 s/doc).
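The speed comparison can be recomputed directly from the Speed (s/doc) column. The following is a minimal sketch, not part of the PR: the helper name `slowdown_vs` is hypothetical, and because the published timings are rounded to three decimals the resulting ratios are approximate.

```python
# Speed (s/doc) values copied from the results table above.
SECONDS_PER_DOC = {
    "docling": 0.618,
    "Nutrient": 0.007,
    "opendataloader": 0.014,
    "pymupdf4llm": 0.252,
    "markitdown": 0.106,
    "pypdf": 0.019,
    "liteparse": 0.233,
}


def slowdown_vs(baseline: str, times: dict[str, float]) -> dict[str, float]:
    """Return, for each parser, its time per document divided by the baseline's."""
    base = times[baseline]
    return {name: t / base for name, t in times.items()}


ratios = slowdown_vs("Nutrient", SECONDS_PER_DOC)

# Print parsers from fastest to slowest relative to Nutrient.
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
    print(f"{name:15s} {ratio:6.1f}x")
```

With these rounded inputs, docling comes out at about 88 times Nutrient's time per document, which is consistent with the rounded 90x figure.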

Test plan

- Benchmark data matches opendataloader-bench evaluation.json outputs
- Speed claims match summary.json timing data
- Reproducibility link works
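One part of this check can be done without the harness outputs: confirming that the two-decimal table values agree with the three-decimal figures quoted in the first commit message. The sketch below is an illustration under that assumption only; the `REPORTED` structure and `mismatches` helper are hypothetical, and it does not read evaluation.json or summary.json, whose schemas are not shown in this PR.

```python
# Each entry pairs (three-decimal value from the commit message,
# two-decimal value from the results table).
REPORTED = {
    "Nutrient": {
        "Overall": (0.880, 0.88),
        "NID": (0.924, 0.92),
        "TEDS": (0.662, 0.66),
        "MHS": (0.811, 0.81),
    },
    "docling": {
        "Overall": (0.882, 0.88),
        "NID": (0.898, 0.90),
        "TEDS": (0.887, 0.89),
        "MHS": (0.824, 0.82),
    },
}


def mismatches(reported):
    """List every (parser, metric) whose precise value does not round to the table value."""
    bad = []
    for parser, metrics in reported.items():
        for metric, (precise, rounded) in metrics.items():
            # Compare formatted strings to avoid float equality pitfalls.
            if f"{precise:.2f}" != f"{rounded:.2f}":
                bad.append((parser, metric, precise, rounded))
    return bad


problems = mismatches(REPORTED)
print(problems or "all table values match the commit message")
```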

Re-ran all non-ML parsers on the opendataloader-bench 200-doc corpus with ground truth annotations. Added pypdf and liteparse to comparison. Removed opendataloader-hybrid (uses docling internally, redundant).

Key results:
Nutrient: 0.880 overall, 0.924 NID, 0.662 TEDS, 0.811 MHS, 0.007 s/doc
docling: 0.882 overall, 0.898 NID, 0.887 TEDS, 0.824 MHS, 0.618 s/doc

Nutrient matches docling on overall accuracy (0.88 vs 0.88) while being 90x faster. Nutrient leads on reading order (0.92 vs 0.90). Added reproducibility link to opendataloader-bench harness.

Source: PSPDFKit-labs/opendataloader-bench#1

jdrhyne added 3 commits April 2, 2026 12:05

Remove opendataloader-bench repo link, keep benchmark metadata

75fe4db

Keep corpus size (200 docs), date (2026-04-02), metric descriptions, and all result data. Remove external repo references.

Add benchmark charts back to README

95dae6c

Restored visual benchmark snapshots as PNG charts showing:
- Extraction accuracy (overall scores)
- Reading order (NID)
- Table structure (TEDS)
- Heading level (MHS)
- Extraction speed per page
- Speed advantage vs Nutrient

jdrhyne merged commit 2e278f9 into main Apr 2, 2026
2 checks passed

jdrhyne deleted the update-benchmarks-2026-04-02 branch April 2, 2026 16:19
