
stdlib: std::compression::zip and std::compression::deflate#2930

Merged
lerno merged 28 commits into c3lang:master from ManuLinares:std_zip
Feb 20, 2026

Conversation

@ManuLinares
Member

ManuLinares commented Feb 12, 2026

  • C3 implementation of DEFLATE (RFC 1951) and ZIP archive handling.
  • Support for reading and writing archives using the STORE and DEFLATE methods.
  • Support for both fixed and dynamic Huffman blocks in decompression.
  • Compression using greedy LZ77 matching.
  • Stream-based entry reading and writing.
  • ZIP64 support for large files/archives.
  • Unit tests and benchmarks.
$ gcc -O3 -c dependencies/miniz/miniz.c -o build/miniz.o
$ build/c3c -O3 compile-run test/compression/deflate_benchmark.c3 build/miniz.o


DEFLATE BENCHMARK: Comparing C3 std::compression::deflate with miniz (in-process)

Test Case                  | C3 Rat. | Miz Rat. | C3 MB/s | Miz MB/s | Winner
---------------------------+---------+---------+---------+---------+-----------
Redundant (100MB 'A')      |   0.10% |   0.10% |  1342.2 |   911.9 | C3 (1.5x)
Compiler Source (Bulk)     |  22.06% |  22.15% |   121.3 |   132.4 | Miniz (1.1x)
Stdlib Source (Bulk)       |  24.87% |  24.61% |   118.1 |   122.5 | Miniz (1.0x)
Log Files (Simulated)      |  11.10% |  11.13% |   343.5 |   301.2 | C3 (1.1x)
Web Content (Simulated)    |   1.51% |   1.51% |   751.7 |   779.9 | Miniz (1.0x)
CSV Data (Simulated)       |  15.05% |  14.70% |   244.2 |   254.3 | Miniz (1.0x)
Binary Data (Structured)   |  28.77% |  28.85% |   102.2 |    96.5 | C3 (1.1x)
Random Noise (Scaled)      |  65.82% |  64.57% |    79.8 |    65.8 | C3 (1.2x)
Tiny File (asd.c3)         | 140.00% | 116.67% |     3.2 |    12.9 | Miniz (4.1x)
Natural Text (Scaled)      |   0.29% |   0.29% |  1358.9 |   879.4 | C3 (1.5x)

OVERALL SUMMARY
  Average Throughput C3:       446.5 MB/s
  Average Throughput Miniz:    355.7 MB/s
  C3 is 1.3x faster on average!
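For orientation, the API benchmarked above can be exercised with a short round trip. This sketch uses only calls that appear in the tests quoted later in this thread (`deflate::compress`, `deflate::decompress_stream`, `ByteReader`/`ByteWriter`); treat it as illustrative rather than canonical documentation:

```c3
import std::compression::deflate;
import std::io;

fn void roundtrip_example() => @pool()
{
	String text = "compress me, compress me, compress me";

	// One-shot compression into an allocator-owned buffer.
	char[] compressed = deflate::compress(mem, text[..])!!;
	defer free(compressed.ptr);

	// Streaming decompression through generic reader/writer streams.
	ByteReader reader;
	reader.init(compressed);
	ByteWriter writer;
	writer.tinit();
	deflate::decompress_stream(&reader, &writer)!!;

	assert(writer.str_view() == text);
}
```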

Member

@Book-reader left a comment


It's great to see zip support being added to the stdlib! I don't have much knowledge of the zip spec, so I can't comment on the implementation, but I do have one small style nitpick.

Member

@Book-reader left a comment


Should we add a test case for writing a file with size above 0xFFFFFFFE?

Member

@Book-reader left a comment


Figured out the other problem: with this change and #2930 (comment), writing files of 0xFFFFFFFF bytes and larger works.

@ManuLinares
Member Author

ManuLinares commented Feb 13, 2026

> Should we add a test case for writing a file with size above 0xFFFFFFFE?

I added a specific test case for this scenario. If you would like to extend the testing further, that would be appreciated.

@konimarti
Contributor

A few more observations:

  • When you write the zip headers to a file with `self.file.write(((char*)&cdh)[:ZipCDH.sizeof])!;`, is that safe on big-endian systems? The zip format expects every header in little-endian. You already take care of the endianness when writing the extra bytes.
  • The `ArchiveStreamAdapter` seems to act like the `io::LimitReader` from the stdlib; maybe that one could be used instead?
  • Code attribution: this is probably personal style, but I'd suggest giving credit where appropriate. The package-merge algorithm is most likely from https://create.stephan-brumme.com/length-limited-prefix-codes/ since it seems to be related to my C3 port. That was actually the trickiest part in the dynamic Huffman deflate implementation because it's not covered in the RFC.

@ManuLinares
Member Author

> A few more observations:
>
> * when you write the zip headers to a file with `self.file.write(((char*)&cdh)[:ZipCDH.sizeof])!;` is that safe on big-endian systems? The zip format expects every header in little-endian. You already take care of the endianness when writing the extra bytes.
>
> * the `ArchiveStreamAdapter` seems to act like the io::LimitReader from the stdlib; maybe this one could be used instead?
>
> * code attribution: this is probably personal style but I'd suggest to give credit where appropriate. the package-merge algorithm is most likely from https://create.stephan-brumme.com/length-limited-prefix-codes/ since it seems to be related to my C3 port. That was actually the trickiest part in the dynamic Huffman deflate implementation because it's not covered in the RFC.

Hi @konimarti, excellent feedback!

  • Endianness: I've updated the ZIP header structs (ZipCDH, ZipLFH, etc.) to use the little-endian types from std::core::bitorder.
  • The adapter here is a bit more specialized than the stdlib version. It has to seek to the correct offset before every read because multiple readers share the same file handle.
  • I've added the proper attribution to the pkg_merge function, giving credit to your C3 implementation and Stephan Brumme's original work, thanks.

I also fixed a small bug in the STORE read method.
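For background on why the raw struct write was host-order dependent: plain integer fields are stored in host byte order, so `file.write(((char*)&cdh)[:ZipCDH.sizeof])` would emit big-endian bytes on a big-endian host, while fixed-order field types make the same raw write portable. A minimal sketch; the field names are illustrative and the `UIntLE`/`UShortLE` type names are assumptions about what `std::core::bitorder` provides, not copied from the merged code:

```c3
import std::core::bitorder;

// Illustrative sketch only, not the merged ZipEOCD layout.
struct EocdSketch @packed
{
	UIntLE   signature;    // bytes 50 4b 05 06 on disk regardless of host order
	UShortLE disk_number;  // each field is stored little-endian by its type
	UIntLE   cd_offset;
}
```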

…ate`

- C3 implementation of DEFLATE (RFC 1951) and ZIP archive handling.
- Support for reading and writing archives using STORE and DEFLATE
methods.
- Decompression supports both fixed and dynamic Huffman blocks.
- Compression using greedy LZ77 matching.
- Zero dependencies on libc.
- Stream-based entry reading and writing.
- Full unit test coverage.

NOTE: This is an initial implementation. Future improvements could be:

- Optimization of the LZ77 matching (lazy matching).
- Support for dynamic Huffman blocks in compression.
- ZIP64 support for large files/archives.
- Support for encryption and additional compression methods.
deflate:
- replace linear search with hash-based match finding.
- implement support for dynamic Huffman blocks using the Package-Merge
algorithm.
- add streaming decompression.
- add buffered StreamBitReader.

zip:
- add ZIP64 support.
- add CP437 and UTF-8 filename encoding detection.
- add DOS date/time conversion and timestamp preservation.
- add ZipEntryReader for streaming entry reads.
- implement ZipArchive.extract and ZipArchive.recover helpers.

other:
- Add `set_modified_time` to std::io.
- Add benchmarks and a few more unit tests.
fix method not passed to open_writer
- detect encrypted zip
- `ZipArchive.open_writer` default to DEFLATE
Update ZipLFH, ZipCDH, ZipEOCD, Zip64EOCD, and Zip64Locator structs to
use little-endian bitstruct types from std::core::bitorder
@ManuLinares
Member Author

@lerno I've pushed the final optimizations and fixes. All tests and benchmarks are passing. Last change would be:

Quoting @Book-reader, "The style in the stdlib is usually to have the allocator as the first parameter instead of using a default parameter": should I make that change?

@Book-reader
Member

Would it be possible to compress the data written to a ZipEntryWriter as it's streamed in? Currently, compressing a large file consumes memory equal to the size of the uncompressed data as it's being written, which can be a lot.
This might require making an InStream that does the same as deflate::compress, though, so that could be a future thing after this PR unless it's very easy to implement. I don't want this PR to be held back just for that.

@ManuLinares
Member Author

> Would it be possible to compress the data written to a ZipEntryWriter as it's streamed in? Currently, compressing a large file consumes memory equal to the size of the uncompressed data as it's being written, which can be a lot. This might require making an InStream that does the same as deflate::compress, though, so that could be a future thing after this PR unless it's very easy to implement. I don't want this PR to be held back just for that.

Yes, it's definitely possible, but it would require a streaming deflate compressor which the std::compression::deflate module currently lacks.

fn void? compress_stream(Allocator allocator = mem, InStream input, OutStream output)

This is indeed a more complex implementation than the current one and would be a good future improvement. I had some issues implementing this last week but will definitely give it another go when I have the time.
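To make the shape of that future API concrete, here is how the proposed entry point might be driven. `compress_stream` does not exist yet; its signature is the hypothetical one given above, and the file handling follows the usual `std::io` pattern rather than code from this PR:

```c3
import std::io;
import std::compression::deflate;

// Sketch only: deflate::compress_stream is the hypothetical streaming
// compressor discussed above, not an existing stdlib function.
fn void? compress_file(String in_path, String out_path)
{
	File input = file::open(in_path, "rb")!;
	defer (void)input.close();
	File output = file::open(out_path, "wb")!;
	defer (void)output.close();

	// Would compress block by block, carrying the LZ77 window across reads,
	// so memory use stays bounded instead of holding the whole input at once.
	deflate::compress_stream(mem, &input, &output)!;
}
```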

Book-reader and others added 3 commits February 16, 2026 14:50
* style changes

* update tests

* style changes in `deflate.c3`
Member

@Book-reader left a comment


Great! Most of the feedback I have now is incredibly minor code style things I missed before.
I do think the allocator should be changed to be the first parameter in deflate::compress etc. The reason the stdlib style changed in 0.7 was to make it clear which functions cause allocations that need to be freed, because those must be explicitly given an allocator.

@konimarti
Contributor

There's an issue in the deflate::decompress_stream function: it reads beyond the end-of-block marker of the final deflate block. The inflate decompressor should only read up to the end of the deflate stream and no further. This did not show up in the current tests because the zip format provides the lengths of both the compressed and uncompressed data. However, that is not the case in many other formats or applications of deflate.

The following test highlights the issue and fails with an io::EOF error in the last read_byte call:

fn void test_deflate_embedded_stream() => @pool()
{
	String base = "This is a streaming test for DEFLATE. ";

	char[] compressed = deflate::compress(mem, base[..])!!;
	defer free(compressed.ptr);

	usz append_len = compressed.len + 1;
	char[] append = mem::malloc(append_len)[:append_len];
	defer free(append.ptr);

	append[:compressed.len] = compressed[..];
	append[^1] = 'c';

	ByteReader reader;
	reader.init(append);

	ByteWriter writer;
	writer.tinit();

	deflate::decompress_stream(&reader, &writer)!!;
	assert(writer.str_view() == base);

	assert(reader.read_byte()!! == 'c');
}

@lerno
Collaborator

lerno commented Feb 20, 2026

Maybe something I introduced?

@konimarti
Contributor

> Maybe something I introduced?

I dropped the last two commits and re-tested; it still fails, so it seems unrelated to your changes. I think it's related to the refill logic of the bitreader.

@lerno
Collaborator

lerno commented Feb 20, 2026

Yeah, I looked through it and this happens due to the internal buffering.

…ailable.

- `instream.seek` is replaced by `set_cursor` and `cursor`.
- `instream.available`, `cursor`, etc. are long/ulong rather than isz/usz to be correct on 32-bit.
@lerno lerno merged commit eae7d0c into c3lang:master Feb 20, 2026
22 checks passed
@ManuLinares ManuLinares deleted the std_zip branch March 7, 2026 16:32