feat: Adds SSE4.2 optimized for x86-64 #258
Open
hellobertrand wants to merge 4 commits into
Introduces hand-written SSE4.2 SIMD implementations for core compression and decompression routines. This provides a performance uplift for x86-64 CPUs that support SSE4.2 (and its implied instruction sets like SSSE3/SSE4.1) but lack AVX2 or AVX512, broadening SIMD acceleration to more hardware.

Includes optimized paths for:
- LZ77 match finding (16-byte comparisons).
- Numerical block encoding/decoding (delta-of-delta, zigzag, prefix sum).
- Dynamic programming cost updates for optimal parsing.
- Byte run and literal match searching.
- Overlapping memory copy utilities.

The build system, CPU feature detection, and function dispatch logic have been updated to support the new SSE4.2 variant, prioritizing it over generic scalar code when AVX2/AVX512 are not available.
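To illustrate what "16-byte comparisons" in match finding can look like, here is a minimal sketch, not the code from this PR. The helper name `match_length` and its signature are hypothetical. It uses only SSE2 intrinsics (which any SSE4.2-capable CPU also supports) plus the GCC/Clang builtin `__builtin_ctz`, and falls back to a byte loop when SSE2 is not available at compile time.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics, baseline on x86-64 */
#endif

/* Hypothetical helper (not the library's API): returns how many leading
 * bytes of a and b are equal, examining at most `limit` bytes. */
static size_t match_length(const uint8_t *a, const uint8_t *b, size_t limit)
{
    size_t n = 0;
#if defined(__SSE2__)
    /* Compare 16 bytes per iteration while a full block remains. */
    while (n + 16 <= limit) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + n));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + n));
        /* Bit i of mask is set when byte i is equal in both blocks. */
        unsigned mask = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(va, vb));
        if (mask != 0xFFFFu) {
            /* The lowest clear bit marks the first mismatching byte. */
            return n + (size_t)__builtin_ctz(~mask);
        }
        n += 16;
    }
#endif
    /* Scalar tail, and the whole loop on targets without SSE2. */
    while (n < limit && a[n] == b[n]) {
        n++;
    }
    return n;
}
```

The scalar tail keeps the function correct for lengths that are not a multiple of 16, which is also why the same source compiles unchanged on non-x86 targets.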
Introduces a new multiarch workflow job configured to run under `Nehalem` CPU emulation. This ensures that the recently added SSE4.2 optimized kernels are properly tested on systems that support SSE4.2 but do not have AVX2 or AVX512, verifying correct feature dispatch and execution.
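The dispatch behavior this job is meant to exercise can be sketched roughly as follows. This is an assumption-laden illustration, not the library's actual dispatch code: the `kernel_tier` enum, `select_tier`, and `detect_tier` are hypothetical names, and the choice of `avx512f` as the AVX512 feature to probe is a guess. The probing itself uses the GCC/Clang builtins `__builtin_cpu_init` and `__builtin_cpu_supports`.

```c
/* Hypothetical kernel tiers; the names are illustrative only. */
typedef enum {
    KERNEL_SCALAR,
    KERNEL_SSE42,
    KERNEL_AVX2,
    KERNEL_AVX512
} kernel_tier;

/* Pure selection logic: prefer the widest available tier, so SSE4.2 is
 * chosen only when AVX2 and AVX512 are both unavailable. */
static kernel_tier select_tier(int has_sse42, int has_avx2, int has_avx512)
{
    if (has_avx512) return KERNEL_AVX512;
    if (has_avx2)   return KERNEL_AVX2;
    if (has_sse42)  return KERNEL_SSE42;
    return KERNEL_SCALAR;
}

/* Runtime probe; reports the scalar tier on non-x86 targets or on
 * compilers without the GCC/Clang CPU-feature builtins. */
static kernel_tier detect_tier(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return select_tier(__builtin_cpu_supports("sse4.2"),
                       __builtin_cpu_supports("avx2"),
                       __builtin_cpu_supports("avx512f")); /* assumed subset */
#else
    return KERNEL_SCALAR;
#endif
}
```

Under this sketch, an emulated Nehalem CPU reports SSE4.2 without AVX2 or AVX512, so `select_tier(1, 0, 0)` yields `KERNEL_SSE42`. Keeping the selection logic separate from the probe makes that case testable without the hardware.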
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Ensures compatibility with the zxc library's current development state, enabling accurate benchmarking for features such as the recently added SSE4.2 kernels.
Integrates the build process for SSE4.2 optimized compression, decompression, and Huffman kernels into the Rust `zxc-sys` wrapper. This enables the Rust binding to leverage the recently added SSE4.2 SIMD implementations for x86-64, providing wider hardware acceleration.
Introduces a new set of hand-written SIMD optimized kernels leveraging SSE4.2 instruction set extensions for x86-64 processors.
These optimizations target CPUs that support SSE4.2 but lack AVX2 or AVX512, providing a performance uplift over the scalar fallback. The changes accelerate core compression and decompression routines, including:
The update also includes: