An experimental Rust library for reading and writing ᴡᴀᴄᴢ files.
With cargo installed, run the following command in your project directory:
cargo add wacksy
This library provides two main ᴀᴘɪ functions.
from_files() takes a slice containing one or more ᴡᴀʀᴄ files and returns a structured representation of a ᴡᴀᴄᴢ object.
as_zip_archive() takes a ᴡᴀᴄᴢ object and zips it up to a byte array using rawzip.
use std::{error::Error, fs, path::Path};

use wacksy::WACZ; // assumes WACZ is exported at the crate root

fn main() -> Result<(), Box<dyn Error>> {
    let warc_file_path = Path::new("example.warc.gz"); // set path to your ᴡᴀʀᴄ file
    let warc_file_path2 = Path::new("example2.warc.gz");
    let wacz_object = WACZ::from_files(&[warc_file_path, warc_file_path2])?; // index the ᴡᴀʀᴄ files and create a ᴡᴀᴄᴢ object
    let zipped_wacz: Vec<u8> = wacz_object.as_zip_archive()?; // zip up the ᴡᴀᴄᴢ
    fs::write("example.wacz", zipped_wacz)?; // write out to file
    Ok(())
}

For backwards compatibility, WACZ::from_file() is also available and takes a single ᴡᴀʀᴄ file.
See the documentation for more details.
According to Ed Summers, a ᴡᴀᴄᴢ file is "really just a ᴢɪᴘ file that contains ᴡᴀʀᴄ data and metadata at predictable file locations."[1]
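Because a ᴡᴀᴄᴢ file is a ᴢɪᴘ file underneath, a quick sanity check on a byte array is to look for a ᴢɪᴘ signature at the start. The following is a minimal sketch using only the standard library; looks_like_zip is a hypothetical helper and not part of this library. It only inspects the first four bytes, so it does not validate the archive.

```rust
/// Hypothetical helper (not part of wacksy): returns true when `bytes`
/// begins with one of the two signatures a ZIP file can start with.
fn looks_like_zip(bytes: &[u8]) -> bool {
    // Signature of a local file header, found at the start of any
    // ZIP archive that contains at least one entry.
    const LOCAL_FILE_HEADER: &[u8] = b"PK\x03\x04";
    // Signature of the end-of-central-directory record, which is the
    // first thing in an archive with no entries.
    const EMPTY_ARCHIVE: &[u8] = b"PK\x05\x06";

    bytes.starts_with(LOCAL_FILE_HEADER) || bytes.starts_with(EMPTY_ARCHIVE)
}
```

A check like this could be applied to a byte array before writing it out to a file, or to the first bytes of a file someone hands you with a .wacz extension.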
The example in the spec outlines what a ᴡᴀᴄᴢ file should contain:
archive
└── data.warc.gz
datapackage.json
datapackage-digest.json
indexes
└── index.cdx.gz
pages
└── pages.jsonl
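The layout above can be turned into a simple completeness check over a list of entry paths. This is a rough sketch, not how this library works internally: EXPECTED_ENTRIES and missing_entries are hypothetical names, and the sketch assumes every path in the example is required with exactly these file names, which is stricter than the spec may be.

```rust
/// Entry paths copied from the example layout. Treating all of them as
/// required, with these exact names, is an assumption of this sketch.
const EXPECTED_ENTRIES: [&str; 5] = [
    "archive/data.warc.gz",
    "datapackage.json",
    "datapackage-digest.json",
    "indexes/index.cdx.gz",
    "pages/pages.jsonl",
];

/// Hypothetical helper: returns the expected entries that do not appear
/// in `present`, in the order they are listed in EXPECTED_ENTRIES.
fn missing_entries(present: &[&str]) -> Vec<&'static str> {
    EXPECTED_ENTRIES
        .iter()
        .copied()
        .filter(|expected| !present.contains(expected))
        .collect()
}
```

Passing the entry names read from an archive to missing_entries returns an empty list when everything in the example layout is present.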
MIT © Bodleian Libraries and contributors
Footnotes
[1] For more discussion of the concept, see the talk "Web Archives in Digital Repositories" by Ilya Kreymer and Ed Summers at Code4Lib 2022.