feat: Adds SSE4.2 optimized for x86-64 #258

Open
hellobertrand wants to merge 4 commits into main from feat/simd-sse42
@hellobertrand
Owner

Introduces a new set of hand-written, SIMD-optimized kernels that use the SSE4.2 instruction set extension on x86-64 processors.

These optimizations target CPUs that support SSE4.2 but lack AVX2 or AVX512, providing a performance uplift over the scalar fallback. The changes accelerate core compression and decompression routines, including:

  • LZ77 match finding
  • Numerical block delta encoding and decoding
  • Dynamic programming table updates
  • Literal and run length detection
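For reviewers unfamiliar with the technique, the 16-byte comparison behind match finding can be sketched as follows. This is a minimal illustration, not the code in this PR: `match_length` is a hypothetical name, the sketch uses only SSE2 intrinsics (available on every SSE4.2 CPU), it assumes GCC or Clang for `__builtin_ctz`, and it falls back to a byte loop when SSE2 is not available at compile time.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: number of leading bytes on which a and b agree,
 * capped at max_len. Both buffers must hold at least max_len bytes. */
static size_t match_length(const uint8_t *a, const uint8_t *b, size_t max_len)
{
    size_t n = 0;
#if defined(__SSE2__)
    /* Compare 16 bytes per iteration. */
    while (n + 16 <= max_len) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + n));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + n));
        /* One bit per byte lane; a set bit means the two bytes are equal. */
        unsigned mask = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(va, vb));
        if (mask != 0xFFFFu) {
            /* The lowest clear bit marks the first mismatching byte. */
            return n + (size_t)__builtin_ctz(~mask);
        }
        n += 16;
    }
#endif
    /* Scalar tail, and the whole loop when SSE2 is unavailable. */
    while (n < max_len && a[n] == b[n]) {
        n++;
    }
    return n;
}
```

The movemask step turns the byte-wise comparison into a 16-bit integer, so locating the first mismatch costs a single bit scan instead of a second loop.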

The update also includes:

  • Integration of SSE4.2 as a distinct build variant within CMake and Meson build systems.
  • Enhanced runtime CPU feature detection and dispatch logic to dynamically select the most advanced available SIMD implementation (AVX512 > AVX2 > SSE4.2 > scalar).
  • A dedicated CI job for SSE4.2-specific builds to ensure proper testing and validation.
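The dispatch priority described above could look roughly like the sketch below. All names (`kernel_level`, `pick_kernel`, `detect_kernel_level`) are hypothetical and are not taken from this PR; the sketch assumes GCC or Clang, whose `__builtin_cpu_supports` builtin is used for detection, and it treats `avx512f` alone as sufficient for the AVX512 tier, which the real kernels may not.

```c
#include <stdio.h>

/* Hypothetical tiers, ordered from least to most capable. */
typedef enum {
    KERNEL_SCALAR = 0,
    KERNEL_SSE42,
    KERNEL_AVX2,
    KERNEL_AVX512
} kernel_level;

/* Pure priority rule: AVX512 > AVX2 > SSE4.2 > scalar. */
static kernel_level pick_kernel(int has_avx512, int has_avx2, int has_sse42)
{
    if (has_avx512) return KERNEL_AVX512;
    if (has_avx2)   return KERNEL_AVX2;
    if (has_sse42)  return KERNEL_SSE42;
    return KERNEL_SCALAR;
}

/* Query the running CPU; non-x86 targets always get the scalar tier. */
static kernel_level detect_kernel_level(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && (defined(__GNUC__) || defined(__clang__))
    __builtin_cpu_init();
    return pick_kernel(__builtin_cpu_supports("avx512f") != 0,
                       __builtin_cpu_supports("avx2") != 0,
                       __builtin_cpu_supports("sse4.2") != 0);
#else
    return KERNEL_SCALAR;
#endif
}

static const char *kernel_name(kernel_level level)
{
    switch (level) {
    case KERNEL_AVX512: return "avx512";
    case KERNEL_AVX2:   return "avx2";
    case KERNEL_SSE42:  return "sse4.2";
    default:            return "scalar";
    }
}
```

Calling `kernel_name(detect_kernel_level())` once at startup and caching the result (for example in a table of function pointers) keeps the feature check out of the hot path.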

Introduces hand-written SSE4.2 SIMD implementations for core compression and
decompression routines. This provides a performance uplift for x86-64 CPUs
that support SSE4.2 (and its implied instruction sets like SSSE3/SSE4.1) but
lack AVX2 or AVX512, broadening SIMD acceleration to more hardware.

Includes optimized paths for:
- LZ77 match finding (16-byte comparisons).
- Numerical block encoding/decoding (delta-of-delta, zigzag, prefix sum).
- Dynamic programming cost updates for optimal parsing.
- Byte run and literal match searching.
- Overlapping memory copy utilities.
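As an illustration of the numerical path, the sketch below decodes zigzag-encoded deltas with an in-register prefix sum over four 32-bit lanes. It is a simplified, assumed example rather than the PR's implementation: it handles a single delta level (delta-of-delta would apply the prefix sum twice), the names `zigzag_encode`, `zigzag_decode`, `delta_encode`, and `delta_decode` are hypothetical, and only SSE2 intrinsics are used, with a scalar fallback.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Zigzag maps small signed deltas (stored as two's complement in a
 * uint32_t) to small unsigned codes: 0, -1, 1, -2 become 0, 1, 2, 3. */
static uint32_t zigzag_encode(uint32_t d) { return (d << 1) ^ (0u - (d >> 31)); }
static uint32_t zigzag_decode(uint32_t z) { return (z >> 1) ^ (0u - (z & 1u)); }

/* Scalar reference encoder: zz[i] = zigzag(vals[i] - vals[i-1]). */
static void delta_encode(const uint32_t *vals, uint32_t *zz, size_t n, uint32_t base)
{
    uint32_t prev = base;
    for (size_t i = 0; i < n; i++) {
        zz[i] = zigzag_encode(vals[i] - prev);
        prev = vals[i];
    }
}

/* Decoder: out[i] = base + sum of the first i+1 decoded deltas. */
static void delta_decode(const uint32_t *zz, uint32_t *out, size_t n, uint32_t base)
{
    size_t i = 0;
    uint32_t prev = base;
#if defined(__SSE2__)
    __m128i carry = _mm_set1_epi32((int)base);
    const __m128i one = _mm_set1_epi32(1);
    for (; i + 4 <= n; i += 4) {
        __m128i z = _mm_loadu_si128((const __m128i *)(zz + i));
        /* Zigzag decode in all four lanes: (z >> 1) ^ -(z & 1). */
        __m128i d = _mm_xor_si128(
            _mm_srli_epi32(z, 1),
            _mm_sub_epi32(_mm_setzero_si128(), _mm_and_si128(z, one)));
        /* Inclusive prefix sum: shift by one lane, then by two lanes. */
        d = _mm_add_epi32(d, _mm_slli_si128(d, 4));
        d = _mm_add_epi32(d, _mm_slli_si128(d, 8));
        /* Add the running total carried over from the previous group. */
        d = _mm_add_epi32(d, carry);
        _mm_storeu_si128((__m128i *)(out + i), d);
        /* Broadcast the last lane as the carry for the next group. */
        carry = _mm_shuffle_epi32(d, _MM_SHUFFLE(3, 3, 3, 3));
    }
    if (i > 0) {
        prev = out[i - 1];
    }
#endif
    /* Scalar tail, and the whole loop when SSE2 is unavailable. */
    for (; i < n; i++) {
        prev += zigzag_decode(zz[i]);
        out[i] = prev;
    }
}
```

All arithmetic is done on unsigned 32-bit values, so negative deltas wrap around consistently in both the vector and scalar paths.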

The build system, CPU feature detection, and function dispatch logic have
been updated to support the new SSE4.2 variant, prioritizing it over generic
scalar code when AVX2/AVX512 are not available.

Introduces a new multiarch workflow job that runs under an emulated `Nehalem` CPU. This ensures that the recently added SSE4.2 optimized kernels are tested on a target that supports SSE4.2 but has neither AVX2 nor AVX512, verifying correct feature dispatch and execution.
@codecov

codecov Bot commented May 30, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


Ensures compatibility with the zxc library's current development state, enabling accurate benchmarking for features such as the recently added SSE4.2 kernels.
Integrates the build process for SSE4.2 optimized compression, decompression, and Huffman kernels into the Rust `zxc-sys` wrapper. This enables the Rust binding to leverage the recently added SSE4.2 SIMD implementations for x86-64, providing wider hardware acceleration.
@hellobertrand hellobertrand changed the title feat: Adds SSE4.2 optimized for x86-64 feat: Adds SSE4.2 optimized for x86_64 May 30, 2026
@hellobertrand hellobertrand changed the title feat: Adds SSE4.2 optimized for x86_64 feat: Adds SSE4.2 optimized for x86-64 May 30, 2026