Beyond simple receipts and order-confirmation messages, several public datasets contain “order-like” records (a customer, the offers purchased, an order status, and so on). In particular, open procurement logs and point-of-sale datasets can serve this role. For example, many U.S. cities and agencies publish purchase-order data: Baton Rouge, LA offers a “Purchase Orders and Contracts” dataset (JSON/CSV) listing all POs with header and line-item details【23†L67-L75】. Washington, DC’s Office of Contracting and Procurement provides “Purchase Orders from PASS” (orders over $2,500) in CSV format【26†L91-L100】. San Francisco publishes a “Vendor Payments (PO Summary)” CSV that summarizes vendor payments by purchase order【42†L44-L53】. These open-government datasets effectively function as order records: who bought what from whom, for how much, and with what status.
-
Government procurement data. Public procurement portals often list purchase orders and contracts. For instance, Baton Rouge’s open-data portal describes each PO’s header and line items【23†L67-L75】. DC’s PASS dataset contains every purchase order exceeding $2,500【26†L91-L100】. San Francisco’s Controller’s Office publishes a “Vendor Payments” report showing payments to city vendors grouped by PO【42†L44-L53】. These datasets can be downloaded as JSON or CSV and contain fields such as order number, date, vendor, and line-item details.
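Procurement exports of this kind are often flat, with one row per line item and the PO header repeated on every row. The following is a minimal sketch of regrouping such rows into one record per purchase order. The column names (`po_number`, `po_date`, `vendor`, and so on) and the sample rows are hypothetical; each portal uses its own schema, so the mapping would need adjusting per dataset.

```python
import csv
import io

# Hypothetical sample data: real portals use their own column names.
SAMPLE_CSV = """po_number,po_date,vendor,item_description,quantity,unit_price
PO-1001,2023-01-15,Acme Supply,Printer paper,10,4.50
PO-1001,2023-01-15,Acme Supply,Toner cartridge,2,60.00
PO-1002,2023-02-03,City Motors,Brake pads,4,35.25
"""


def group_purchase_orders(csv_text):
    """Collapse one-row-per-line-item CSV text into one record per PO."""
    orders = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Create the PO header the first time a PO number is seen.
        po = orders.setdefault(row["po_number"], {
            "order_number": row["po_number"],
            "order_date": row["po_date"],
            "vendor": row["vendor"],
            "line_items": [],
            "total": 0.0,
        })
        quantity = int(row["quantity"])
        unit_price = float(row["unit_price"])
        po["line_items"].append({
            "description": row["item_description"],
            "quantity": quantity,
            "unit_price": unit_price,
        })
        po["total"] += quantity * unit_price
    # Dicts preserve insertion order, so POs come back in first-seen order.
    return list(orders.values())


purchase_orders = group_purchase_orders(SAMPLE_CSV)
```

Floats keep the sketch short; a production pipeline handling money would more likely use `decimal.Decimal`.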
-
E-commerce/retail transactions. Large e-commerce logs and retail sales datasets also qualify. For example, the UCI “Online Retail” dataset contains ~540K transaction records from a UK online gift store (Dec 2010–Dec 2011)【31†L17-L25】【31†L71-L79】. Each record is one invoice line with an invoice number, customer ID, item, quantity, unit price and date, which together describe a confirmed sale. (It is widely used as a retail-order dataset in data-mining literature【31†L17-L25】【31†L71-L79】.) Likewise, various Kaggle and academic datasets simulate or release order logs (e.g. Kaggle’s “E-commerce Order & Supply Chain” dataset), typically as CSVs of orders, items, customers and payments. These capture the same structure as schema:Order: who bought what, when, and for how much.
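To make the correspondence with schema:Order concrete, the sketch below turns the rows of one invoice into a JSON-LD style dictionary. The column names follow the UCI Online Retail file (`InvoiceNo`, `StockCode`, `Description`, `Quantity`, `InvoiceDate`, `UnitPrice`, `CustomerID`), but the sample rows are made up, the dates are assumed to be already converted to ISO 8601, and the choice of `Person` for the customer and `GBP` for the currency are assumptions. The dataset's documentation marks cancellations with an invoice number starting with “C”; mapping that to `OrderCancelled` is this sketch's interpretation.

```python
import json

# Made-up rows using the UCI Online Retail column names.
ROWS = [
    {"InvoiceNo": "100001", "StockCode": "A100", "Description": "CERAMIC MUG",
     "Quantity": 6, "InvoiceDate": "2011-03-01T10:15:00", "UnitPrice": 2.50,
     "CustomerID": "C-0042"},
    {"InvoiceNo": "100001", "StockCode": "B200", "Description": "TEA TOWEL",
     "Quantity": 2, "InvoiceDate": "2011-03-01T10:15:00", "UnitPrice": 3.75,
     "CustomerID": "C-0042"},
]


def invoice_to_order(rows):
    """Build one schema.org Order (as a dict) from the rows of one invoice."""
    first = rows[0]
    order = {
        "@context": "https://schema.org",
        "@type": "Order",
        "orderNumber": first["InvoiceNo"],
        "orderDate": first["InvoiceDate"],
        # Assumption: customers may equally be organizations (wholesalers).
        "customer": {"@type": "Person", "identifier": first["CustomerID"]},
        "orderedItem": [],
        "acceptedOffer": [],
    }
    for row in rows:
        product = {"@type": "Product", "sku": row["StockCode"],
                   "name": row["Description"]}
        # Quantity lives on OrderItem, unit price on the accepted Offer.
        order["orderedItem"].append({"@type": "OrderItem",
                                     "orderQuantity": row["Quantity"],
                                     "orderedItem": product})
        order["acceptedOffer"].append({"@type": "Offer",
                                       "price": row["UnitPrice"],
                                       "priceCurrency": "GBP",  # assumption
                                       "itemOffered": product})
    if str(first["InvoiceNo"]).upper().startswith("C"):
        order["orderStatus"] = "https://schema.org/OrderCancelled"
    return order


order = invoice_to_order(ROWS)
order_json = json.dumps(order, indent=2)
```

In practice the rows would first be grouped by `InvoiceNo` so that each invoice yields exactly one Order.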
-
Receipts and invoice corpora. Annotated image datasets of receipts/invoices are directly relevant. Zenodo hosts a dataset of 813 invoice/receipt images (Portuguese language) with transcripts of key fields (seller/buyer, date, total, tax, etc.)【57†L30-L37】. Another Zenodo release provides 200 hand-photographed restaurant receipts with bounding-box annotations【58†L28-L36】. There is also the well-known ICDAR SROIE dataset (Scanned Receipts OCR & IE) – 1,000 real-world receipt images with OCR text and extracted fields like company, date and total【59†L358-L366】. (Hugging Face’s scanned_receipts dataset is a wrapper for this.) These corpora go beyond plain text: they provide images and labels, but crucially they are structured around completed transactions.
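The extracted fields in such corpora can be lifted into the same order model. Below is a rough sketch that assumes a SROIE-style key-information record with `company`, `date`, `address` and `total` keys. The sample values are invented, the `DD/MM/YYYY` date format is an assumption (real receipts vary), the currency must be supplied by the caller, and representing the receipt total through `partOfInvoice` and `totalPaymentDue` is a loose mapping chosen for illustration.

```python
import re
from datetime import datetime

# Invented example of a SROIE-style key-information record.
FIELDS = {
    "company": "EXAMPLE MART SDN BHD",
    "date": "25/12/2018",
    "address": "1 JALAN CONTOH, 50000 KUALA LUMPUR",
    "total": "RM 9.00",
}


def parse_total(text):
    """Extract the first decimal number from a total such as 'RM 9.00'."""
    match = re.search(r"\d+(?:\.\d+)?", (text or "").replace(",", ""))
    return float(match.group()) if match else None


def receipt_to_order(fields, currency):
    """Map receipt key fields to a minimal schema.org Order dict."""
    try:
        # Assumption: day/month/year; unparseable dates become None.
        order_date = datetime.strptime(fields["date"], "%d/%m/%Y").date().isoformat()
    except (KeyError, ValueError):
        order_date = None
    return {
        "@context": "https://schema.org",
        "@type": "Order",
        "seller": {"@type": "Organization",
                   "name": fields.get("company"),
                   "address": fields.get("address")},
        "orderDate": order_date,
        "partOfInvoice": {
            "@type": "Invoice",
            "totalPaymentDue": {"@type": "MonetaryAmount",
                                "value": parse_total(fields.get("total")),
                                "currency": currency},
        },
    }


receipt_order = receipt_to_order(FIELDS, currency="MYR")
```

Line items are absent because this style of annotation records only receipt-level fields.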
-
Shipping and logistics logs. Pure parcel-tracking notices map to schema.org’s ParcelDelivery type rather than Order, but some datasets combine order and shipping information. For example, Zenodo hosts a synthetic container-shipping event log that models customer orders through loading and departure events【34†L47-L55】. It simulates container orders, transport documents, and departures, which makes it essentially an end-to-end order/shipment log. (Real public datasets of shipping confirmations are rare, but some supply-chain datasets, e.g. Kaggle’s “Logistic Events”, include order and shipment details together.)
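An event log of this kind can be reduced to a current status per order by keeping the most recent event for each order ID. The sketch below illustrates the idea; the field names (`order_id`, `activity`, `timestamp`) and the activity labels are hypothetical and would need to be matched to the actual log's columns.

```python
from datetime import datetime

# Hypothetical events; activity names are illustrative only.
EVENTS = [
    {"order_id": "O1", "activity": "Create order", "timestamp": "2024-01-02T09:00:00"},
    {"order_id": "O2", "activity": "Create order", "timestamp": "2024-01-02T11:30:00"},
    {"order_id": "O1", "activity": "Depart", "timestamp": "2024-01-05T06:45:00"},
    {"order_id": "O1", "activity": "Load container", "timestamp": "2024-01-04T14:20:00"},
]


def latest_status(events):
    """Return {order_id: activity of the most recent event for that order}."""
    latest = {}
    for event in events:
        ts = datetime.fromisoformat(event["timestamp"])
        current = latest.get(event["order_id"])
        # Keep the event with the newest timestamp; input order does not matter.
        if current is None or ts >= current[0]:
            latest[event["order_id"]] = (ts, event["activity"])
    return {order_id: activity for order_id, (_, activity) in latest.items()}


status_by_order = latest_status(EVENTS)
```

The resulting activity label could then be translated into a schema.org `orderStatus` value through a lookup table specific to the dataset.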
-
Service and digital purchase records. Transactions for services or digital goods (software, subscriptions, tickets) also fit as Orders. Candidate sources include app-store receipts and purchase APIs, though these are rarely published openly. Some academic corpora do capture service transactions indirectly: for instance, question–answer datasets around e-commerce customer service (e.g. Hugging Face’s e_commerce_customer_service) often mention order and shipment emails【12†L193-L202】. In general, any dataset of confirmed purchases, including purchases of services, qualifies.
In summary, useful “order” datasets include procurement and payment records from open-data portals, point-of-sale transaction logs, and annotated receipt/invoice image collections. These sources match the schema.org/Order model (a confirmed sale with status). Key examples cited above include the UCI Online Retail data【31†L17-L25】【31†L71-L79】, city procurement CSV/JSON files【23†L67-L75】【26†L91-L100】【42†L44-L53】, and annotated receipt/image datasets【57†L30-L37】【58†L28-L36】【59†L358-L366】.
Sources: Public open-data catalogs and ML repositories. Government data portals publish PO and payment datasets【23†L67-L75】【26†L91-L100】【42†L44-L53】. Research datasets of receipts/invoices are on Zenodo and academic benchmarks【57†L30-L37】【58†L28-L36】【59†L358-L366】. The UCI repository details the Online Retail order dataset【31†L17-L25】【31†L71-L79】, among other e-commerce data archives. (Kaggle and Hugging Face also host similar datasets for orders, receipts, and logistics, though direct citations above come from open catalogs and publications.)