Public Packrift-owned packaging optimization benchmark corpus for SKU-specific DIM, fit, cost, routing, and warehouse planning references.
The live corpus is published at:
https://packrift.github.io/packaging-optimization-benchmark-corpus/
The corpus uses the Merchant Center top-1,000 exact-spec JSONL feed as its source selection and joins the local product spec graph only to recover SKU, title, handle, product URL, price snapshot, and inventory snapshot. Current commerce facts, checkout, inventory, freight, and approval decisions stay on Packrift.com.
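The join described above can be sketched as follows. This is a minimal illustration, not the generator's actual code: every field name except offerId, the sample records, and the example.com URLs are assumptions made for the sketch.

```javascript
// Hypothetical sample data. Only offerId is named in this README;
// all other field names and values are placeholders.
const feedRows = [
  { offerId: "offer-001", lengthIn: 12, widthIn: 9, heightIn: 4 },
  { offerId: "offer-002", lengthIn: 18, widthIn: 12, heightIn: 6 },
  { offerId: "offer-404", lengthIn: 6, widthIn: 6, heightIn: 6 },
];

const productGraph = [
  { offerId: "offer-001", sku: "SAMPLE-1", title: "Sample box A", handle: "sample-box-a", productUrl: "https://example.com/products/sample-box-a" },
  { offerId: "offer-002", sku: "SAMPLE-2", title: "Sample box B", handle: "sample-box-b", productUrl: "https://example.com/products/sample-box-b" },
];

// The feed decides which rows exist; the graph only contributes
// identity and link fields for rows that already appear in the feed.
function joinFeedToGraph(feed, graph) {
  const byOfferId = new Map(graph.map((node) => [node.offerId, node]));
  const matched = [];
  const unmatched = [];
  for (const row of feed) {
    const node = byOfferId.get(row.offerId);
    if (!node) {
      unmatched.push(row.offerId); // no graph match, so no page can link to a product
      continue;
    }
    matched.push({
      ...row,
      sku: node.sku,
      title: node.title,
      handle: node.handle,
      productUrl: node.productUrl,
    });
  }
  return { matched, unmatched };
}

const joined = joinFeedToGraph(feedRows, productGraph);
console.log(joined.matched.length, joined.unmatched);
```

Keeping the feed as the driving side means a graph entry without a feed row never produces a page, while a feed row without a graph match is reported rather than silently dropped.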
Generate the corpus:
node generate-optimization-benchmark-corpus.mjs
Then run the quality audit:
node audit-corpus.mjs
Default public URL target:
https://packrift.github.io/packaging-optimization-benchmark-corpus
Override with:
BASE_URL=https://packrift.github.io/your-repo-name node generate-optimization-benchmark-corpus.mjs
- Source records: 1,000 exact-spec Packrift feed rows
- Page types per SKU: 24
- SKU benchmark pages: 24,000
- Supporting index/hub/methodology pages: home, SKU index, page-type index, pSEO workflow, cartonization benchmark technical note, cartonization solver fixtures, quality policy, dataset metadata, 6 family hubs, and 24 page-type hubs
- Total sitemap URLs after local generation: 24,039
- HTML files after local generation: 24,040 including 404.html
- GitHub Pages output folder: docs/
- Data evidence files: quality-ledger.csv, manifest.json, seo-quality-audit.json, datapackage.json, croissant.json, schema-dataset.jsonld, datacite.json, ro-crate-metadata.json, and kaggle-dataset-metadata-draft.json
Repository files:
- quality-ledger.csv - SKU-level source ledger with offer IDs, families, source Packrift product URLs, quality scores, and missing-field flags.
- manifest.json - generation manifest with source-row counts, family counts, page-type counts, sitemap counts, and quality guardrails.
- seo-quality-audit.json - static audit report covering title/description duplication, canonical/sitemap agreement, structured data, breadcrumbs, and Packrift product-link coverage.
- datapackage.json, croissant.json, schema-dataset.jsonld, datacite.json, ro-crate-metadata.json, and kaggle-dataset-metadata-draft.json - machine-readable dataset metadata for search/discovery and later archive-platform submission.
- docs/dataset-metadata.html - public metadata index page linking the machine-readable files.
- docs/cartonization-benchmark-note.html - technical benchmark note defining source fields, tasks, metrics, baselines, and limitations for cartonization/bin-packing use cases.
- docs/cartonization-solver-fixtures.html and examples/cartonization-fixtures/ - solver-ready CSV, JSON, and TXT fixture pack for bin-packing parser tests and runnable examples.
- docs/ - generated HTML corpus and sitemap files served by GitHub Pages.
- examples/ortools-carton-selection/ and docs/ortools-carton-selection-example.html - small Google OR-Tools CP-SAT carton-selection example using static Packrift dimension samples.
The 24 page types are operationally distinct: DIM-weight benchmark, cube utilization, length-plus-girth, carton-fit boundary, void-fill screen, parcel/freight router, pallet storage prompt, warehouse bin slotting, pick-path label card, receiving inspection, source-spec audit, substitute approval, damage risk, material compatibility, pack-count normalization, unit economics, reorder trigger, bulk quote prep, marketplace prep, returns repack, AI retrieval, buyer comparison, QA exception, and implementation handoff.
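Three of the page types above rest on standard parcel formulas, sketched here for illustration. The function names are hypothetical, and the default divisor of 139 cubic inches per pound is an assumption (a commonly used parcel divisor); the corpus pages may use different divisors or rounding rules.

```javascript
// DIM weight: package volume divided by a carrier divisor.
// ASSUMPTION: 139 in^3/lb default; actual carrier rules vary.
function dimWeightLb(lengthIn, widthIn, heightIn, divisor = 139) {
  return (lengthIn * widthIn * heightIn) / divisor;
}

// Length plus girth: longest side plus the perimeter around the other two sides.
function lengthPlusGirthIn(a, b, c) {
  const [longest, mid, shortest] = [a, b, c].sort((x, y) => y - x);
  return longest + 2 * (mid + shortest);
}

// Cube utilization: item volume as a fraction of carton volume.
function cubeUtilization(itemDims, cartonDims) {
  const volume = ([l, w, h]) => l * w * h;
  return volume(itemDims) / volume(cartonDims);
}

console.log(dimWeightLb(18, 12, 6));
console.log(lengthPlusGirthIn(6, 18, 12)); // 18 + 2 * (12 + 6) = 54
console.log(cubeUtilization([10, 8, 4], [12, 9, 4]));
```

Sorting the sides inside lengthPlusGirthIn makes the result independent of the order in which dimensions are recorded in the source row.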
The repository includes a small Google OR-Tools CP-SAT example at examples/ortools-carton-selection/. It selects the smallest feasible carton from a static Packrift sample set using orientation and relaxed volume screens. The public explainer page is docs/ortools-carton-selection-example.html.
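The two screens mentioned above can be illustrated without a solver. The sketch below is a plain JavaScript enumeration for a single axis-aligned item, not the repository's CP-SAT model; the carton list and function names are hypothetical.

```javascript
// Hypothetical carton samples, inner dimensions in inches.
const cartons = [
  { id: "A", dims: [8, 6, 4] },
  { id: "B", dims: [12, 9, 4] },
  { id: "C", dims: [18, 12, 6] },
];

const volume = ([l, w, h]) => l * w * h;

// Relaxed volume screen: necessary but not sufficient for a fit.
function passesVolumeScreen(item, carton) {
  return volume(item) <= volume(carton.dims);
}

// Orientation screen: a single box fits in some axis-aligned rotation
// exactly when its sorted sides are pairwise <= the carton's sorted sides.
function fitsInSomeOrientation(item, carton) {
  const a = [...item].sort((x, y) => y - x);
  const b = [...carton.dims].sort((x, y) => y - x);
  return a.every((side, i) => side <= b[i]);
}

// Pick the lowest-volume carton that passes both screens, or null if none does.
function smallestFeasibleCarton(item, cartonList) {
  const feasible = cartonList.filter(
    (c) => passesVolumeScreen(item, c) && fitsInSomeOrientation(item, c)
  );
  if (feasible.length === 0) return null;
  return feasible.reduce((best, c) =>
    volume(c.dims) < volume(best.dims) ? c : best
  );
}

console.log(smallestFeasibleCarton([4, 11, 8], cartons).id);  // "B"
console.log(smallestFeasibleCarton([10, 10, 4], cartons).id); // "C"
console.log(smallestFeasibleCarton([20, 2, 2], cartons));     // null
```

The second call shows why the volume screen alone is only a relaxation: a 10 x 10 x 4 item has less volume than carton B, yet no rotation fits B's 9-inch side, so the orientation screen pushes the selection to carton C.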
- Requires a product graph match for every feed offerId, so generated pages can link to real Packrift product URLs.
- Requires each row to pass a source-quality gate before pages are emitted.
- States missing dimensions or unsupported calculations explicitly instead of guessing.
- Keeps current price, inventory, freight, checkout, fit approval, and substitute approval on Packrift.com.
- Uses page-type-specific calculations and checklists rather than keyword-swapped duplicate pages.
- Publishes a Packrift-specific pSEO workflow page so quality rules are visible, not only internal.
- Splits XML sitemaps by static, family, and page-type sections with <lastmod> values for monitoring.
- Adds JSON-LD for Dataset, TechArticle, Product-as-about, WebSite, Organization, and BreadcrumbList where the visible page content supports it.
- Runs audit-corpus.mjs to block missing titles, missing descriptions, canonical/sitemap mismatches, bad structured data, missing H1s, missing breadcrumb schema, and missing Packrift product links.
- Counts as Packrift-owned URL-scale reference content, not third-party backlinks, referring domains, editorial endorsements, or directory listings.
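A few of the audit checks listed above could look roughly like the sketch below. It is illustrative only and is not the contents of audit-corpus.mjs: the function name, the issue labels, the regex-based parsing, the fixed attribute order, and the packrift.com link pattern are all assumptions.

```javascript
// Returns a list of issue labels for one generated HTML page.
// ASSUMPTION: simple regex checks with a fixed attribute order;
// a real audit would more likely use an HTML parser.
function auditPage(html, canonicalFromSitemap) {
  const issues = [];

  const title = html.match(/<title>([^<]*)<\/title>/i);
  if (!title || title[1].trim() === "") issues.push("missing-title");

  const desc = html.match(/<meta\s+name="description"\s+content="([^"]*)"/i);
  if (!desc || desc[1].trim() === "") issues.push("missing-description");

  if (!/<h1[\s>]/i.test(html)) issues.push("missing-h1");

  // The canonical URL in the page must match the URL listed in the sitemap.
  const canonical = html.match(/<link\s+rel="canonical"\s+href="([^"]*)"/i);
  if (!canonical || canonical[1] !== canonicalFromSitemap) {
    issues.push("canonical-sitemap-mismatch");
  }

  // ASSUMPTION: product links point at packrift.com.
  if (!/href="https:\/\/(www\.)?packrift\.com\//i.test(html)) {
    issues.push("missing-packrift-product-link");
  }
  return issues;
}

const goodPage = `<html><head><title>DIM-weight benchmark</title>
<meta name="description" content="Sample description">
<link rel="canonical" href="https://example.com/sku/dim-weight.html"></head>
<body><h1>DIM-weight benchmark</h1>
<a href="https://packrift.com/products/sample">Product</a></body></html>`;

console.log(auditPage(goodPage, "https://example.com/sku/dim-weight.html")); // []
console.log(auditPage("<html><head><title></title></head><body></body></html>", "https://example.com/x.html"));
```

Returning labels instead of throwing on the first failure lets a caller aggregate issues across all pages before deciding whether to block a publish.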
Use the GitHub release archive for versioned citation and third-party dataset submissions. This corpus does not claim independent editorial endorsement; it is an owned public resource and benchmark dataset published by Packrift.
Suggested citation:
Packrift. Packrift Packaging Optimization Benchmark Corpus. GitHub repository and dataset archive. https://github.com/Packrift/packaging-optimization-benchmark-corpus