PSPDFKit
diff --git a/README.md b/README.md
Lines changed: 33 additions & 24 deletions
diff --git a/docs/assets/extraction-accuracy.png b/docs/assets/extraction-accuracy.png
89.3 KB
diff --git a/docs/assets/extraction-speed.png b/docs/assets/extraction-speed.png
99.1 KB
diff --git a/docs/assets/faster-with-nutrient.png b/docs/assets/faster-with-nutrient.png
67.1 KB
diff --git a/docs/assets/heading-level.png b/docs/assets/heading-level.png
88.5 KB
diff --git a/docs/assets/reading-order.png b/docs/assets/reading-order.png
93.5 KB
diff --git a/docs/assets/table-structure.png b/docs/assets/table-structure.png
91.9 KB
diff --git a/docs/benchmarks.md b/docs/benchmarks.md
Lines changed: 25 additions & 25 deletions
@@ -13,7 +13,7 @@
 
 Fast, accurate Markdown from PDFs — locally, with no cleanup required. Built for Claude, Codex, RAG pipelines, and document-heavy automation where noisy extraction burns tokens and makes downstream results less reliable.
 
-- **How fast is it?** — 0.008s per page. 176x faster than docling, 10x faster than opendataloader. ([benchmarks](#benchmarks))
+- **How fast is it?** — 0.007s per page. 90x faster than docling, 37x faster than pymupdf4llm. ([benchmarks](#benchmarks))
 - **How accurate is it?** — 0.92 reading order (best in class), 0.88 overall extraction accuracy, 0.81 heading detection. ([benchmarks](#benchmarks))
 - **Where do my PDFs go?** — Nowhere. The CLI runs locally. Your documents are not uploaded to Nutrient. ([trust & licensing](#trust-and-licensing))
 - **What does it cost?** — Free for up to 1,000 documents per calendar month. No license key, no signup, no API token. ([license](LICENSE.md))
@@ -86,45 +86,54 @@ When both arguments are directories, the CLI converts every PDF in the input dir
 
 ## Benchmarks
 
-Published benchmark values from [Nutrient's PDF-to-Markdown page](https://www.nutrient.io/ai/skills/pdf-to-markdown/), recorded on `AMD EPYC 9454`.
+Benchmark results from 200 PDF documents with hand-annotated Markdown ground truth, evaluated using NID (reading order), TEDS (table structure), and MHS (heading hierarchy) metrics. Benchmarked on `2026-04-02`.
 
 ### Visual Snapshot
 
-![Extraction accuracy benchmark](docs/assets/extraction-accuracy.svg)
+![Extraction accuracy](docs/assets/extraction-accuracy.png)
 
-![Extraction speed benchmark](docs/assets/extraction-speed.svg)
+![Reading order](docs/assets/reading-order.png)
 
-![Structure quality benchmark](docs/assets/structure-quality.svg)
+![Table structure](docs/assets/table-structure.png)
 
-![Relative speedup benchmark](docs/assets/faster-with-nutrient.svg)
+![Heading level](docs/assets/heading-level.png)
+
+![Extraction speed](docs/assets/extraction-speed.png)
+
+![Faster with Nutrient](docs/assets/faster-with-nutrient.png)
 
 ### Accuracy
 
-| Metric | Nutrient | Best competitor | MarkItDown |
-| --- | ---: | ---: | ---: |
-| Extraction accuracy | 0.88 | 0.89 (docling) | 0.58 |
-| Reading order (NID) | 0.92 | 0.91 | 0.88 |
-| Table structure (TEDS) | 0.66 | 0.93 (docling) | 0.00 |
-| Heading level (MHS) | 0.81 | 0.83 (docling) | 0.00 |
+| Solution | Overall | Reading Order (NID) | Table Structure (TEDS) | Heading Level (MHS) |
+| --- | ---: | ---: | ---: | ---: |
+| docling | **0.88** | 0.90 | **0.89** | **0.82** |
+| **Nutrient** | **0.88** | **0.92** | 0.66 | 0.81 |
+| opendataloader | 0.83 | 0.90 | 0.49 | 0.74 |
+| pymupdf4llm | 0.83 | 0.88 | 0.48 | 0.78 |
+| markitdown | 0.59 | 0.84 | 0.27 | 0.00 |
+| pypdf | 0.58 | 0.87 | 0.00 | 0.00 |
+| liteparse | 0.57 | 0.86 | 0.00 | 0.00 |
 
 ### Speed
 
 | Solution | Seconds per page |
 | --- | ---: |
-| Nutrient | 0.008 |
-| opendataloader | 0.056 |
-| markitdown | 0.058 |
-| pymupdf4llm | 0.083 |
-| opendataloader-hybrid | 1.412 |
-| docling | 1.473 |
+| **Nutrient** | **0.007** |
+| opendataloader | 0.014 |
+| pypdf | 0.019 |
+| markitdown | 0.106 |
+| liteparse | 0.233 |
+| pymupdf4llm | 0.252 |
+| docling | 0.618 |
 
 ### Faster with Nutrient
 
-- `176x` faster than `docling`
-- `172x` faster than `opendataloader-hybrid`
-- `10x` faster than `opendataloader`
-- `7x` faster than `pymupdf4llm`
-- `7x` faster than `markitdown`
+- `90x` faster than `docling`
+- `37x` faster than `pymupdf4llm`
+- `34x` faster than `liteparse`
+- `15x` faster than `markitdown`
+- `3x` faster than `pypdf`
+- `2x` faster than `opendataloader`
 
 For the full comparison table, see [docs/benchmarks.md](docs/benchmarks.md).
 
@@ -142,7 +151,7 @@ See [LICENSE.md](LICENSE.md) for the full terms and [docs/distribution-model.md]
 
 ### What makes this different from other PDF extractors?
 
-Speed and accuracy should not be a tradeoff. Most extractors are either fast but lose structure (markitdown, pymupdf4llm) or accurate but slow (docling). Nutrient extracts at 0.008s per page with strong reading order, heading, and table preservation — less cleanup, fewer wasted tokens, and more reliable downstream results.
+Speed and accuracy should not be a tradeoff. Most extractors are either fast but lose structure (markitdown, pymupdf4llm) or accurate but slow (docling). Nutrient extracts at 0.007s per page with strong reading order, heading, and table preservation — less cleanup, fewer wasted tokens, and more reliable downstream results.
 
 ### Do my documents leave my machine?
 
 
@@ -1,41 +1,41 @@
 # Benchmarks
 
-These values mirror the benchmark figures currently published on Nutrient's PDF-to-Markdown product page:
+Evaluated on 200 PDF documents with hand-annotated Markdown ground truth from the DP-Bench corpus.
 
-- Source: <https://www.nutrient.io/ai/skills/pdf-to-markdown/>
-- Snapshot date: `2026-04-01`
-- Hardware note on page: `Benchmark data recorded on AMD EPYC 9454`
+- Benchmark date: `2026-04-02`
+- Corpus: 200 documents with ground-truth Markdown annotations
+- Metrics: NID (reading order), TEDS (table structure), MHS (heading hierarchy)
+- All scores normalized to [0, 1] — higher is better
 
 ## Accuracy Metrics
 
 | Solution | Extraction accuracy | Reading order (NID) | Table structure (TEDS) | Heading level (MHS) |
 | --- | ---: | ---: | ---: | ---: |
-| Nutrient | 0.88 | 0.92 | 0.66 | 0.81 |
-| docling | 0.89 | 0.91 | 0.93 | 0.83 |
-| opendataloader | 0.84 | 0.91 | 0.49 | 0.74 |
-| opendataloader-hybrid | 0.83 | 0.91 | 0.43 | 0.73 |
-| pymupdf4llm | 0.74 | 0.89 | 0.40 | 0.43 |
-| markitdown | 0.58 | 0.88 | 0.00 | 0.00 |
+| docling | **0.88** | 0.90 | **0.89** | **0.82** |
+| **Nutrient** | **0.88** | **0.92** | 0.66 | 0.81 |
+| opendataloader | 0.83 | 0.90 | 0.49 | 0.74 |
+| pymupdf4llm | 0.83 | 0.88 | 0.48 | 0.78 |
+| markitdown | 0.59 | 0.84 | 0.27 | 0.00 |
+| pypdf | 0.58 | 0.87 | 0.00 | 0.00 |
+| liteparse | 0.57 | 0.86 | 0.00 | 0.00 |
 
 ## Speed
 
 | Solution | Seconds per page |
 | --- | ---: |
-| Nutrient | 0.008 |
-| opendataloader | 0.056 |
-| markitdown | 0.058 |
-| pymupdf4llm | 0.083 |
-| opendataloader-hybrid | 1.412 |
-| docling | 1.473 |
+| **Nutrient** | **0.007** |
+| opendataloader | 0.014 |
+| pypdf | 0.019 |
+| markitdown | 0.106 |
+| liteparse | 0.233 |
+| pymupdf4llm | 0.252 |
+| docling | 0.618 |
 
 ## Relative Speed Callouts
 
-- Nutrient is `176x` faster than `docling`
-- Nutrient is `172x` faster than `opendataloader-hybrid`
-- Nutrient is `10x` faster than `opendataloader`
-- Nutrient is `7x` faster than `pymupdf4llm`
-- Nutrient is `7x` faster than `markitdown`
-
-## Note
-
-This file reflects the currently published benchmark table. A public reproducibility harness is planned as a future addition.
+- Nutrient is `90x` faster than `docling`
+- Nutrient is `37x` faster than `pymupdf4llm`
+- Nutrient is `34x` faster than `liteparse`
+- Nutrient is `15x` faster than `markitdown`
+- Nutrient is `3x` faster than `pypdf`
+- Nutrient is `2x` faster than `opendataloader`
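
The relative speed callouts can be sanity-checked against the seconds-per-page table. The sketch below assumes each callout is the competitor's seconds per page divided by Nutrient's; the `speedups` helper is hypothetical and not part of the repository. Because the table values are rounded to three decimals, the recomputed ratios can land slightly below the callouts (0.618 / 0.007 is roughly 88 for docling, against the listed `90x`). That gap would be consistent with the callouts having been derived from unrounded timings, but the diff does not say so.

```python
# Seconds per page, copied from the Speed table (values rounded to 3 decimals).
SECONDS_PER_PAGE = {
    "Nutrient": 0.007,
    "opendataloader": 0.014,
    "pypdf": 0.019,
    "markitdown": 0.106,
    "liteparse": 0.233,
    "pymupdf4llm": 0.252,
    "docling": 0.618,
}


def speedups(timings, baseline="Nutrient"):
    """Return {tool: time relative to the baseline}, slowest tool first.

    Hypothetical helper: assumes a callout of "Nx faster" means
    tool_seconds_per_page / baseline_seconds_per_page.
    """
    base = timings[baseline]
    ratios = {
        tool: seconds / base
        for tool, seconds in timings.items()
        if tool != baseline
    }
    # Sort by ratio, largest first, to mirror the order of the callout list.
    return dict(sorted(ratios.items(), key=lambda kv: kv[1], reverse=True))


if __name__ == "__main__":
    for tool, ratio in speedups(SECONDS_PER_PAGE).items():
        print(f"{tool}: {ratio:.1f}x")
```

Recomputing from the rounded table keeps the same ordering as the callout list. Only the larger ratios shift by a unit or two, so the check is useful for catching transposed or stale figures rather than for reproducing the exact published numbers.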