Commit 759398f added support for reading gzip files with ISA-L to the C++ code. This has not been ported to Rust yet.
We currently use the flate2 crate for decompression, with its zlib-rs backend, which is supposed to be the fastest one.
Decompressing gzip-compressed FASTQ files accounts for only a small fraction of total runtime, so in many cases we don’t actually gain that much, but it becomes highly relevant when running strobealign on a machine with many cores. Before we had ISA-L in the C++ code, strobealign could only saturate about 32 CPU cores. When more cores are available, the worker threads sit idle part of the time because they have to wait for the (single) decompression thread to provide them with data.
I measured this on the Rust version and can see the same thing: When increasing the number of threads, CPU usage scales accordingly up to about 32 threads and then stays flat.
When I tried to switch our code to the isal-rs crate, it initially looked straightforward: essentially just a `cargo add isal-rs` and switching from `flate2::read::MultiGzDecoder` to `isal::read::GzipDecoder`. However, the code does not actually compile because `GzipDecoder` does not implement the `Send` trait.
This is a Rust-specific thing: the `Send` trait marks types that are safe to move from one thread to another. This matters in strobealign because the `GzipDecoder` is created in the main thread so that we can estimate the read length, and is then moved into a new thread that reads the rest of the file.
Here are a couple of ways to solve this:
- Explicitly mark `isal::read::GzipDecoder` as `Send`. This requires the `unsafe` keyword and is very, very likely not the proper way to solve this. We should do it only if we understand the implications much better.
- Try Cloudflare’s fork of zlib instead, which is also supposed to be a lot faster than regular zlib.
- Restructure the code so that we don’t need to send the `GzipDecoder` to a different thread. That is, read the initial 500 reads for estimating the read length in the reader thread as well.
I have tried the first option just to have something that I could perhaps use in benchmarks to see whether it actually makes a difference in core usage, but ran into some compilation issues.
I haven’t tried Cloudflare’s zlib, but from what I have heard, it is not as fast as ISA-L.
Restructuring the code is probably the cleanest way forward.
See also milesgranger/isal-rs#33