
Packrift/packaging-optimization-benchmark-corpus


Packrift Packaging Optimization Benchmark Corpus

Public, Packrift-owned packaging optimization benchmark corpus for SKU-specific DIM-weight, fit, cost, routing, and warehouse planning references.

The live corpus is published at:

https://packrift.github.io/packaging-optimization-benchmark-corpus/

The generator takes its source selection from the Merchant Center top-1,000 exact-spec JSONL feed. It joins the local product spec graph only to recover each row's SKU, title, handle, product URL, price snapshot, and inventory snapshot. Current commerce facts, checkout, inventory, freight, and approval decisions remain on Packrift.com.
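A minimal sketch of that join, assuming the feed is newline-delimited JSON and both sides carry an offerId key (the Quality Safeguards section requires a product graph match for every feed offerId). The field names sku, title, handle, url, price, and inventory are hypothetical stand-ins, not the generator's real schema. The point is that only identity and snapshot fields are copied from the graph, and a missing match stops generation instead of producing a page without a product link.

```javascript
// Hypothetical sketch: field names (offerId, sku, title, handle, url, price,
// inventory) are assumptions, not the generator's actual schema.

// Parse a JSONL string into an array of row objects, skipping blank lines.
function parseJsonl(text) {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line));
}

// Join each feed row to the product graph by offerId. Spec fields stay as they
// are in the feed row; only identity and snapshot fields come from the graph.
function joinFeedToGraph(feedRows, graphProducts) {
  const byOfferId = new Map(graphProducts.map((p) => [p.offerId, p]));
  return feedRows.map((row) => {
    const product = byOfferId.get(row.offerId);
    if (!product) {
      // Fail loudly rather than emit a page with no real product URL.
      throw new Error(`No product graph match for offerId ${row.offerId}`);
    }
    return {
      ...row,
      sku: product.sku,
      title: product.title,
      handle: product.handle,
      productUrl: product.url,
      priceSnapshot: product.price,
      inventorySnapshot: product.inventory,
    };
  });
}

// Placeholder data for illustration only.
const feed = parseJsonl('{"offerId":"A1","lengthIn":12}\n\n{"offerId":"B2","lengthIn":8}\n');
const graph = [
  { offerId: "A1", sku: "SKU-A1", title: "Box A", handle: "box-a", url: "https://example.com/a", price: 1.5, inventory: 10 },
  { offerId: "B2", sku: "SKU-B2", title: "Box B", handle: "box-b", url: "https://example.com/b", price: 0.9, inventory: 0 },
];
const joined = joinFeedToGraph(feed, graph);
```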

Generate

node generate-optimization-benchmark-corpus.mjs

Then run the quality audit:

node audit-corpus.mjs

Default public URL target:

https://packrift.github.io/packaging-optimization-benchmark-corpus

To publish under a different repository, override it with the BASE_URL environment variable:

BASE_URL=https://packrift.github.io/your-repo-name node generate-optimization-benchmark-corpus.mjs
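A sketch of how a generator might resolve that target, assuming BASE_URL is read from the environment and falls back to the default above. resolveBaseUrl and canonicalUrl are hypothetical helper names. Trailing slashes are stripped so joined page paths never contain a double slash.

```javascript
// Hypothetical helpers; only the default URL comes from the README.
const DEFAULT_BASE_URL = "https://packrift.github.io/packaging-optimization-benchmark-corpus";

// Prefer a non-empty BASE_URL from the environment, else the default,
// then strip trailing slashes.
function resolveBaseUrl(env = process.env) {
  const raw = env.BASE_URL && env.BASE_URL.trim() !== "" ? env.BASE_URL : DEFAULT_BASE_URL;
  return raw.trim().replace(/\/+$/, "");
}

// Build an absolute canonical URL for a page path relative to docs/.
function canonicalUrl(baseUrl, pagePath) {
  return `${baseUrl}/${pagePath.replace(/^\/+/, "")}`;
}
```

For example, canonicalUrl(resolveBaseUrl({}), "dataset-metadata.html") yields https://packrift.github.io/packaging-optimization-benchmark-corpus/dataset-metadata.html.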

Corpus Shape

  • Source records: 1,000 exact-spec Packrift feed rows
  • Page types per SKU: 24
  • SKU benchmark pages: 24,000
  • Supporting index/hub/methodology pages: home, SKU index, page-type index, pSEO (programmatic SEO) workflow, cartonization benchmark technical note, cartonization solver fixtures, OR-Tools carton-selection example, quality policy, dataset metadata, 6 family hubs, and 24 page-type hubs
  • Total sitemap URLs after local generation: 24,039
  • HTML files after local generation: 24,040 including 404.html
  • GitHub Pages output folder: docs/
  • Data evidence files: quality-ledger.csv, manifest.json, seo-quality-audit.json, datapackage.json, croissant.json, schema-dataset.jsonld, datacite.json, ro-crate-metadata.json, and kaggle-dataset-metadata-draft.json
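The page counts above reconcile with simple arithmetic. This sketch uses only numbers stated in the list; it assumes 404.html is the one HTML file not listed in the sitemap, which is what the 24,039 vs. 24,040 totals imply.

```javascript
// Inputs are the counts stated in the Corpus Shape list.
const sourceRecords = 1000;
const pageTypesPerSku = 24;
const familyHubs = 6;
const pageTypeHubs = 24;
const sitemapUrls = 24039;

const skuPages = sourceRecords * pageTypesPerSku; // 24000: one page per SKU per page type
const supportingUrls = sitemapUrls - skuPages; // 39: every sitemap URL that is not a SKU page
const nonHubSupportingPages = supportingUrls - familyHubs - pageTypeHubs; // 9
const htmlFiles = sitemapUrls + 1; // 24040: sitemap pages plus 404.html
```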

Dataset Files

  • quality-ledger.csv - SKU-level source ledger with offer IDs, families, source Packrift product URLs, quality scores, and missing-field flags.
  • manifest.json - generation manifest with source-row counts, family counts, page-type counts, sitemap counts, and quality guardrails.
  • seo-quality-audit.json - static audit report covering title/description duplication, canonical/sitemap agreement, structured data, breadcrumbs, and Packrift product-link coverage.
  • datapackage.json, croissant.json, schema-dataset.jsonld, datacite.json, ro-crate-metadata.json, and kaggle-dataset-metadata-draft.json - machine-readable dataset metadata for search/discovery and later archive-platform submission.
  • docs/dataset-metadata.html - public metadata index page linking the machine-readable files.
  • docs/cartonization-benchmark-note.html - technical benchmark note defining source fields, tasks, metrics, baselines, and limitations for cartonization/bin-packing use cases.
  • docs/cartonization-solver-fixtures.html and examples/cartonization-fixtures/ - solver-ready CSV, JSON, and TXT fixture pack for bin-packing parser tests and runnable examples.
  • docs/ - generated HTML corpus and sitemap files served by GitHub Pages.
  • examples/ortools-carton-selection/ and docs/ortools-carton-selection-example.html - small Google OR-Tools CP-SAT carton-selection example using static Packrift dimension samples.
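The sitemap files in docs/ use the sitemaps.org XML format. A minimal sketch of writing one section file and an index that points at the sections; the example URLs, dates, and section file names are placeholders, not the generator's actual output.

```javascript
// Escape XML special characters so URLs are safe inside <loc>.
function xmlEscape(s) {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// One sitemap section. entries: [{ loc, lastmod }], lastmod as a W3C date.
function buildUrlset(entries) {
  const urls = entries
    .map((e) => `  <url><loc>${xmlEscape(e.loc)}</loc><lastmod>${e.lastmod}</lastmod></url>`)
    .join("\n");
  return `<?xml version="1.0" encoding="UTF-8"?>\n<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n${urls}\n</urlset>\n`;
}

// Sitemap index listing the section files. sections: [{ loc, lastmod }].
function buildSitemapIndex(sections) {
  const items = sections
    .map((s) => `  <sitemap><loc>${xmlEscape(s.loc)}</loc><lastmod>${s.lastmod}</lastmod></sitemap>`)
    .join("\n");
  return `<?xml version="1.0" encoding="UTF-8"?>\n<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n${items}\n</sitemapindex>\n`;
}
```

Per-section lastmod values let a monitor see which slice (static, family, or page-type) changed without diffing the whole corpus.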

Page Types

The 24 page types are operationally distinct: DIM-weight benchmark, cube utilization, length-plus-girth, carton-fit boundary, void-fill screen, parcel/freight router, pallet storage prompt, warehouse bin slotting, pick-path label card, receiving inspection, source-spec audit, substitute approval, damage risk, material compatibility, pack-count normalization, unit economics, reorder trigger, bulk quote prep, marketplace prep, returns repack, AI retrieval, buyer comparison, QA exception, and implementation handoff.
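Several page types rest on standard parcel formulas. This sketch shows three of them (DIM weight, length-plus-girth, cube utilization). The 139 cubic inches per pound DIM divisor is an assumption: it is a common domestic parcel divisor, but carriers, services, and contracts differ, and carriers apply their own dimension and weight rounding, which is omitted here.

```javascript
// Dimensional weight in pounds from outer dimensions in inches.
// divisor = 139 is an assumed default; pass your carrier's value.
function dimWeightLb([l, w, h], divisor = 139) {
  return (l * w * h) / divisor;
}

// Billable weight: the larger of actual and DIM weight (no carrier rounding).
function billableWeightLb(dimsIn, actualLb, divisor = 139) {
  return Math.max(actualLb, dimWeightLb(dimsIn, divisor));
}

// Length plus girth: longest side plus twice the sum of the other two sides.
function lengthPlusGirth(dimsIn) {
  const [longest, a, b] = [...dimsIn].sort((x, y) => y - x);
  return longest + 2 * (a + b);
}

// Fraction of a carton's internal volume used by `count` identical items.
function cubeUtilization(itemDims, cartonDims, count = 1) {
  const vol = ([l, w, h]) => l * w * h;
  return (vol(itemDims) * count) / vol(cartonDims);
}
```

For a 12 x 10 x 8 in carton, dimWeightLb([12, 10, 8]) is 960 / 139 ≈ 6.91 lb and lengthPlusGirth([12, 10, 8]) is 48 in.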

OR-Tools Example

The repository includes a small Google OR-Tools CP-SAT example at examples/ortools-carton-selection/. It selects the smallest feasible carton from a static Packrift sample set using orientation and relaxed volume screens. The public explainer page is docs/ortools-carton-selection-example.html.
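The repository's example uses CP-SAT. The sketch below swaps in plain enumeration over a few cartons to show what the two screens check; the carton names, dimensions, and maxFill parameter are hypothetical. Both screens are relaxations: passing them does not prove several items can actually be packed together, so the result is a screen, not a packing.

```javascript
// Plain enumeration, not the repository's CP-SAT model. Sample values are hypothetical.

// All six axis-aligned orientations of a box.
function orientations([a, b, c]) {
  return [[a, b, c], [a, c, b], [b, a, c], [b, c, a], [c, a, b], [c, b, a]];
}

// Orientation screen: the item fits inside the carton in at least one orientation.
function fitsSomeOrientation(item, carton) {
  return orientations(item).some(([x, y, z]) => x <= carton[0] && y <= carton[1] && z <= carton[2]);
}

const volume = ([l, w, h]) => l * w * h;

// Smallest-volume carton passing both the orientation screen (every item)
// and the relaxed volume screen (total item volume <= maxFill * carton volume).
function smallestFeasibleCarton(items, cartons, maxFill = 1) {
  const itemVolume = items.reduce((sum, it) => sum + volume(it), 0);
  const feasible = cartons.filter(
    (c) => items.every((it) => fitsSomeOrientation(it, c.dims)) && itemVolume <= maxFill * volume(c.dims),
  );
  feasible.sort((a, b) => volume(a.dims) - volume(b.dims));
  return feasible.length > 0 ? feasible[0] : null;
}

const cartons = [
  { name: "small", dims: [8, 6, 4] },
  { name: "medium", dims: [12, 10, 8] },
  { name: "large", dims: [18, 14, 12] },
];
```

With these sample cartons, smallestFeasibleCarton([[9, 5, 3]], cartons) returns the medium carton, because the 9 in side exceeds every side of the small carton.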

Quality Safeguards

  • Requires a product graph match for every feed offerId, so generated pages can link to real Packrift product URLs.
  • Requires each row to pass a source-quality gate before pages are emitted.
  • States missing dimensions or unsupported calculations explicitly instead of guessing.
  • Keeps current price, inventory, freight, checkout, fit approval, and substitute approval on Packrift.com.
  • Uses page-type-specific calculations and checklists rather than keyword-swapped duplicate pages.
  • Publishes a Packrift-specific pSEO workflow page so quality rules are visible, not only internal.
  • Splits XML sitemaps by static, family, and page-type sections with <lastmod> values for monitoring.
  • Adds JSON-LD for Dataset, TechArticle, Product (as the page's about entity), WebSite, Organization, and BreadcrumbList where the visible page content supports it.
  • Runs audit-corpus.mjs to block missing titles, missing descriptions, canonical/sitemap mismatches, bad structured data, missing H1s, missing breadcrumb schema, and missing Packrift product links.
  • Presents the corpus as Packrift-owned, URL-scale reference content, not as third-party backlinks, referring domains, editorial endorsements, or directory listings.
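A rough sketch of a few of these page checks over a single HTML string. It is an illustration, not audit-corpus.mjs: it uses regular expressions where a real audit should parse the HTML, and the packrift.com link pattern and issue codes are assumptions.

```javascript
// Returns a list of issue codes for one page; sitemapUrls is a Set of URLs.
function auditPage(html, sitemapUrls) {
  const issues = [];
  const title = html.match(/<title>([^<]*)<\/title>/i);
  if (!title || title[1].trim() === "") issues.push("missing-title");
  if (!/<meta\s+name="description"\s+content="[^"]+"/i.test(html)) issues.push("missing-description");
  if (!/<h1[\s>]/i.test(html)) issues.push("missing-h1");

  // A missing canonical is reported the same way as one absent from the sitemap.
  const canonical = html.match(/<link\s+rel="canonical"\s+href="([^"]+)"/i);
  if (!canonical || !sitemapUrls.has(canonical[1])) issues.push("canonical-not-in-sitemap");

  // Every JSON-LD block must parse; at least one node must be a BreadcrumbList.
  let hasBreadcrumb = false;
  for (const [, body] of html.matchAll(/<script type="application\/ld\+json">([\s\S]*?)<\/script>/gi)) {
    try {
      const data = JSON.parse(body);
      const nodes = Array.isArray(data) ? data : data["@graph"] ?? [data];
      if (nodes.some((n) => n && n["@type"] === "BreadcrumbList")) hasBreadcrumb = true;
    } catch {
      issues.push("bad-structured-data");
    }
  }
  if (!hasBreadcrumb) issues.push("missing-breadcrumb-schema");

  // Assumed link pattern for a Packrift product link.
  if (!/href="https:\/\/(www\.)?packrift\.com\//i.test(html)) issues.push("missing-packrift-link");
  return issues;
}
```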

Release / Citation

Use the GitHub release archive for versioned citation and third-party dataset submissions. This corpus does not claim independent editorial endorsement; it is an owned public resource and benchmark dataset published by Packrift.

Suggested citation:

Packrift. Packrift Packaging Optimization Benchmark Corpus. GitHub repository and dataset archive. https://github.com/Packrift/packaging-optimization-benchmark-corpus
