Use modern compression algorithm #164

@jinnatar

In short: please consider switching to xz archives, which use LZMA-based compression. In my tests this reduces total backup size by about 30%.

Currently, dumps are stored as zip archives. The old and venerable zip format primarily uses the deflate compression algorithm, which dates from the early 1990s. In theory a zip may optionally use lzma, but most tooling does not support it, including Info-ZIP (which most Linux distros ship), which was last updated in 2008. Info-ZIP does support bzip2, but while that is better than deflate, it is only slightly better.

The solution I'm proposing is to switch to a better container with better algorithm support. In my quick tests xz is the winner, as it pairs LZMA compression (LZMA2, specifically) with a robust and modern container format. For comparison I've also included the legacy lzma container format below (see footnote 1). There are potentially further gains to be had by raising the compression preset from the default 6 to somewhere in 7 to 9, but that increases memory requirements. A 7 might be a good compromise.
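To make the proposal concrete, here is a minimal sketch of streaming a dump into an xz container at preset 7. It uses Python's standard `lzma` module purely for illustration. The function name `compress_to_xz` and the `.xz` suffix convention are assumptions of this sketch, not anything pgbackweb currently does.

```python
import lzma
import shutil
from pathlib import Path


def compress_to_xz(src: Path, preset: int = 7) -> Path:
    """Stream-compress src into '<src>.xz' and return the new path.

    Hypothetical helper for illustration only. Preset 7 is the
    compromise suggested above; higher presets need more memory.
    """
    dst = src.with_name(src.name + ".xz")
    with open(src, "rb") as fin, lzma.open(
        dst, "wb", format=lzma.FORMAT_XZ, preset=preset
    ) as fout:
        # Copy in chunks so the whole dump never has to fit in memory.
        shutil.copyfileobj(fin, fout)
    return dst
```

Streaming keeps the memory footprint bounded by the compressor's own working set rather than by the size of the dump.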

Sample files, sorted by size. dump.sql is the uncompressed dump and the long-named .zip is the archive as stored by pgbackweb. The rest are different ways of compressing the same dump:

3.0M dump.sql
319k dump.sql.gz
317k dump-20251226-020000-6cbade9f-0ef5-45cf-9778-9b6aa4dc7d0a.zip
312k dump.sql.zip
307k dump.sql.bz2
277k dump.sql.zst
218k dump.sql.xz
218k dump.sql.lzma
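A comparison along these lines can be roughly reproduced with a short script. The sketch below is an assumption-laden approximation: it uses only the codecs in Python's standard library (so zstd and zip-with-bzip2 are left out), the function name `compressed_sizes` is hypothetical, and the synthetic sample data will not match the ratios of a real dump.

```python
import bz2
import gzip
import lzma


def compressed_sizes(data: bytes) -> dict[str, int]:
    """Return the compressed size in bytes for each stdlib codec."""
    return {
        "gz": len(gzip.compress(data, compresslevel=9)),
        "bz2": len(bz2.compress(data, compresslevel=9)),
        # Modern .xz container at the default preset 6.
        "xz": len(lzma.compress(data, format=lzma.FORMAT_XZ, preset=6)),
        # Legacy .lzma container, same preset.
        "lzma": len(lzma.compress(data, format=lzma.FORMAT_ALONE, preset=6)),
    }


if __name__ == "__main__":
    # Synthetic stand-in for a SQL dump; replace with real dump bytes.
    sample = b"INSERT INTO users (id, name) VALUES (1, 'example');\n" * 50_000
    for name, size in compressed_sizes(sample).items():
        print(f"{name:>5}: {size} bytes")
```

To test against a real dump, read it with `Path("dump.sql").read_bytes()` and pass the result to `compressed_sizes`.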

File types of the same files, as reported by file:

dump-20251226-020000-6cbade9f-0ef5-45cf-9778-9b6aa4dc7d0a.zip: Zip archive data, at least v2.0 to extract, compression method=deflate
dump.sql: ASCII text
dump.sql.bz2: bzip2 compressed data, block size = 900k
dump.sql.gz: gzip compressed data, was "dump.sql", last modified: Sun Dec 30 22:00:00 1979, from Unix, original size modulo 2^32 3004661
dump.sql.lzma: LZMA compressed data, streamed
dump.sql.xz: XZ compressed data, checksum CRC64
dump.sql.zip: Zip archive data, at least v4.6 to extract, compression method=bzip2
dump.sql.zst: Zstandard compressed data (v0.8+), Dictionary ID: None

Footnotes

  1. While legacy lzma is technically the smallest, that is purely because its header is a couple of bytes smaller than that of the modern xz format. xz is superior in all other ways.