Packrift Packaging Optimization Benchmark Corpus

Public Packrift-owned packaging optimization benchmark corpus for SKU-specific DIM, fit, cost, routing, and warehouse planning references.

The live corpus is published at:

https://packrift.github.io/packaging-optimization-benchmark-corpus/

The corpus uses the Merchant Center top-1,000 exact-spec JSONL feed as its source selection. It joins the local product spec graph only to recover SKU, title, handle, product URL, price snapshot, and inventory snapshot. Current commerce facts, checkout, inventory, freight, and approval decisions stay on Packrift.com.
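The join described above can be sketched roughly as follows. This is an illustration only, not the generator's actual code: the field names (`offerId`, `sku`, `title`, `handle`, `url`, `lengthIn`), the sample data, and the `joinFeedToGraph` helper are all assumptions made for the example.

```javascript
// Hypothetical sketch: join feed rows to a product graph by offerId.
// Field names and sample data are illustrative, not Packrift's actual schema.
function joinFeedToGraph(feedRows, productGraph) {
  const byOfferId = new Map(productGraph.map((p) => [p.offerId, p]));
  const joined = [];
  const unmatched = [];
  for (const row of feedRows) {
    const product = byOfferId.get(row.offerId);
    if (!product) {
      unmatched.push(row.offerId); // reported, never filled in by guessing
      continue;
    }
    // The graph only contributes identity fields; the feed row keeps its spec.
    joined.push({
      ...row,
      sku: product.sku,
      title: product.title,
      handle: product.handle,
      url: product.url,
    });
  }
  return { joined, unmatched };
}

const sampleFeed = [
  { offerId: "A1", lengthIn: 12 },
  { offerId: "B2", lengthIn: 8 },
];
const sampleGraph = [
  {
    offerId: "A1",
    sku: "BOX-12",
    title: "Sample box",
    handle: "sample-box",
    url: "https://example.com/products/sample-box",
  },
];
const joinResult = joinFeedToGraph(sampleFeed, sampleGraph);
console.log(joinResult.joined.length, joinResult.unmatched);
```

Collecting unmatched IDs separately, rather than silently dropping them, makes it easy to stop generation when any feed row lacks a product graph match.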

Generate

node generate-optimization-benchmark-corpus.mjs

Then run the quality audit:

node audit-corpus.mjs

Default public URL target:

https://packrift.github.io/packaging-optimization-benchmark-corpus

Override with:

BASE_URL=https://packrift.github.io/your-repo-name node generate-optimization-benchmark-corpus.mjs
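As a rough sketch of how a Node script could honor this override, the snippet below reads `BASE_URL` from the environment and falls back to the default target. The helper names `resolveBaseUrl` and `pageUrl` are hypothetical, and the trailing-slash handling is an assumption; the real generator may resolve URLs differently.

```javascript
// Hypothetical sketch: resolve the public base URL from the environment.
const DEFAULT_BASE_URL =
  "https://packrift.github.io/packaging-optimization-benchmark-corpus";

function resolveBaseUrl(env = process.env) {
  // Treat an unset or blank BASE_URL as "use the default".
  const raw = (env.BASE_URL ?? "").trim() || DEFAULT_BASE_URL;
  return raw.replace(/\/+$/, ""); // drop trailing slashes so joins stay clean
}

function pageUrl(path, env = process.env) {
  return `${resolveBaseUrl(env)}/${path.replace(/^\/+/, "")}`;
}

console.log(pageUrl("sitemap.xml", { BASE_URL: "https://packrift.github.io/your-repo-name/" }));
// prints "https://packrift.github.io/your-repo-name/sitemap.xml"
```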

Corpus Shape

Source records: 1,000 exact-spec Packrift feed rows
Page types per SKU: 24
SKU benchmark pages: 24,000
Supporting index/hub/methodology pages: home, SKU index, page-type index, pSEO workflow, cartonization benchmark technical note, cartonization solver fixtures, quality policy, dataset metadata, 6 family hubs, and 24 page-type hubs
Total sitemap URLs after local generation: 24,039
HTML files after local generation: 24,040 including 404.html
GitHub Pages output folder: docs/
Data evidence files: quality-ledger.csv, manifest.json, seo-quality-audit.json, datapackage.json, croissant.json, schema-dataset.jsonld, datacite.json, ro-crate-metadata.json, and kaggle-dataset-metadata-draft.json
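
The counts above are related by simple arithmetic, which the short check below reproduces. It assumes that 404.html is the only generated HTML file left out of the sitemap, which is what the two totals imply.

```javascript
// Consistency check on the published corpus counts.
const sourceRecords = 1000;
const pageTypesPerSku = 24;
const totalSitemapUrls = 24039;

const skuPages = sourceRecords * pageTypesPerSku;      // 24000
const nonSkuSitemapUrls = totalSitemapUrls - skuPages; // 39
const htmlFiles = totalSitemapUrls + 1;                // 24040, adding 404.html

console.log({ skuPages, nonSkuSitemapUrls, htmlFiles });
```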

Dataset Files

quality-ledger.csv - SKU-level source ledger with offer IDs, families, source Packrift product URLs, quality scores, and missing-field flags.
manifest.json - generation manifest with source-row counts, family counts, page-type counts, sitemap counts, and quality guardrails.
seo-quality-audit.json - static audit report covering title/description duplication, canonical/sitemap agreement, structured data, breadcrumbs, and Packrift product-link coverage.
datapackage.json, croissant.json, schema-dataset.jsonld, datacite.json, ro-crate-metadata.json, and kaggle-dataset-metadata-draft.json - machine-readable dataset metadata for search/discovery and later archive-platform submission.
docs/dataset-metadata.html - public metadata index page linking the machine-readable files.
docs/cartonization-benchmark-note.html - technical benchmark note defining source fields, tasks, metrics, baselines, and limitations for cartonization/bin-packing use cases.
docs/cartonization-solver-fixtures.html and examples/cartonization-fixtures/ - solver-ready CSV, JSON, and TXT fixture pack for bin-packing parser tests and runnable examples.
docs/ - generated HTML corpus and sitemap files served by GitHub Pages.
examples/ortools-carton-selection/ and docs/ortools-carton-selection-example.html - small Google OR-Tools CP-SAT carton-selection example using static Packrift dimension samples.

Page Types

The 24 page types are operationally distinct: DIM-weight benchmark, cube utilization, length-plus-girth, carton-fit boundary, void-fill screen, parcel/freight router, pallet storage prompt, warehouse bin slotting, pick-path label card, receiving inspection, source-spec audit, substitute approval, damage risk, material compatibility, pack-count normalization, unit economics, reorder trigger, bulk quote prep, marketplace prep, returns repack, AI retrieval, buyer comparison, QA exception, and implementation handoff.
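
For orientation, the first three page types rest on standard shipping arithmetic. The sketch below uses the common textbook forms of those formulas and is not taken from the generator. The DIM divisor of 139 (cubic inches per pound) and the round-up rule are assumptions; carriers and contracts use different values. The function names are hypothetical.

```javascript
// Dimensions in inches, weights in pounds. Divisor 139 is an assumed example.
function dimWeightLb({ length, width, height }, divisor = 139) {
  return Math.ceil((length * width * height) / divisor);
}

// Length plus girth: longest side + 2 * (sum of the two shorter sides).
function lengthPlusGirthIn({ length, width, height }) {
  const [longest, a, b] = [length, width, height].sort((x, y) => y - x);
  return longest + 2 * (a + b);
}

// Cube utilization: item volume as a fraction of carton volume.
function cubeUtilization(item, carton) {
  const vol = (d) => d.length * d.width * d.height;
  return vol(item) / vol(carton);
}

const exampleCarton = { length: 12, width: 10, height: 8 };
const exampleItem = { length: 10, width: 8, height: 6 };

console.log(dimWeightLb(exampleCarton));                  // 7
console.log(lengthPlusGirthIn(exampleCarton));            // 48
console.log(cubeUtilization(exampleItem, exampleCarton)); // 0.5
```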

OR-Tools Example

The repository includes a small Google OR-Tools CP-SAT example at examples/ortools-carton-selection/. It selects the smallest feasible carton from a static Packrift sample set using orientation and relaxed volume screens. The public explainer page is docs/ortools-carton-selection-example.html.
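
To show what the two screens mean without requiring OR-Tools, here is a plain brute-force sketch in JavaScript. It is not the repository's CP-SAT model, and the carton and item dimensions are invented sample values. Both screens are necessary conditions only: passing them does not prove that several items can actually be packed together.

```javascript
// Brute-force stand-in for the screens; NOT the repository's CP-SAT model.
const volume = ([l, w, h]) => l * w * h;
const sortedDesc = (dims) => [...dims].sort((a, b) => b - a);

// Orientation screen: comparing sorted sides is equivalent to trying all
// six axis-aligned rotations of a single box.
function fitsInSomeOrientation(item, carton) {
  const i = sortedDesc(item);
  const c = sortedDesc(carton);
  return i.every((side, k) => side <= c[k]);
}

// Relaxed volume screen: total item volume must not exceed carton volume.
function selectSmallestCarton(items, cartons) {
  const totalItemVolume = items.reduce((sum, it) => sum + volume(it), 0);
  const feasible = cartons.filter(
    (c) =>
      totalItemVolume <= volume(c.dims) &&
      items.every((it) => fitsInSomeOrientation(it, c.dims))
  );
  feasible.sort((a, b) => volume(a.dims) - volume(b.dims));
  return feasible[0] ?? null;
}

// Invented sample data, in inches.
const sampleCartons = [
  { id: "S", dims: [8, 6, 4] },
  { id: "M", dims: [12, 10, 8] },
  { id: "L", dims: [18, 14, 12] },
];
const sampleItems = [[10, 5, 4], [6, 6, 3]];

const chosenCarton = selectSmallestCarton(sampleItems, sampleCartons);
console.log(chosenCarton ? chosenCarton.id : "no feasible carton"); // M
```

A constraint solver such as CP-SAT becomes useful once the selection has to respect more than these screens, for example exact item placement or per-carton cost terms.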

Quality Safeguards

Requires a product graph match for every feed offerId, so generated pages can link to real Packrift product URLs.
Requires each row to pass a source-quality gate before pages are emitted.
States missing dimensions or unsupported calculations explicitly instead of guessing.
Keeps current price, inventory, freight, checkout, fit approval, and substitute approval on Packrift.com.
Uses page-type-specific calculations and checklists rather than keyword-swapped duplicate pages.
Publishes a Packrift-specific pSEO workflow page so quality rules are visible, not only internal.
Splits XML sitemaps by static, family, and page-type sections with <lastmod> values for monitoring.
Adds JSON-LD for Dataset, TechArticle, Product-as-about, WebSite, Organization, and BreadcrumbList where the visible page content supports it.
Runs audit-corpus.mjs to block missing titles, missing descriptions, canonical/sitemap mismatches, bad structured data, missing H1s, missing breadcrumb schema, and missing Packrift product links.
Treats the corpus as Packrift-owned, URL-scale reference content; it does not count as third-party backlinks, referring domains, editorial endorsements, or directory listings.
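
As an illustration of the kind of static checks the audit step describes, the sketch below inspects a single HTML string. It is not the contents of audit-corpus.mjs. The `auditPage` function, its issue codes, and the regex-based parsing (which assumes a fixed attribute order) are assumptions made to keep the example short; a real audit would more likely use an HTML parser.

```javascript
// Hypothetical sketch of per-page static checks; regex parsing is a shortcut.
function auditPage(html, sitemapUrls) {
  const issues = [];

  const title = /<title>([^<]*)<\/title>/i.exec(html)?.[1]?.trim();
  if (!title) issues.push("missing-title");

  const description = /<meta\s+name="description"\s+content="([^"]*)"/i
    .exec(html)?.[1]
    ?.trim();
  if (!description) issues.push("missing-description");

  if (!/<h1[\s>]/i.test(html)) issues.push("missing-h1");

  const canonical = /<link\s+rel="canonical"\s+href="([^"]*)"/i.exec(html)?.[1];
  if (!canonical) issues.push("missing-canonical");
  else if (!sitemapUrls.has(canonical)) issues.push("canonical-not-in-sitemap");

  return issues;
}

const sampleSitemap = new Set(["https://example.com/page-a.html"]);
const goodPage = `<html><head><title>Page A</title>
<meta name="description" content="Sample description">
<link rel="canonical" href="https://example.com/page-a.html">
</head><body><h1>Page A</h1></body></html>`;
const badPage = "<html><head><title></title></head><body></body></html>";

console.log(auditPage(goodPage, sampleSitemap)); // []
console.log(auditPage(badPage, sampleSitemap));
```

Returning a list of issue codes instead of throwing on the first failure lets a caller report every problem on a page before blocking the build.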

Release / Citation

Use the GitHub release archive for versioned citation and third-party dataset submissions. This corpus does not claim independent editorial endorsement; it is an owned public resource and benchmark dataset published by Packrift.

Suggested citation:

Packrift. Packrift Packaging Optimization Benchmark Corpus. GitHub repository and dataset archive. https://github.com/Packrift/packaging-optimization-benchmark-corpus
