feat: Adds SSE4.2 optimized for x86-64 by hellobertrand · Pull Request #258 · hellobertrand/zxc

hellobertrand · 2026-05-30T19:36:28Z

Introduces a new set of hand-written, SIMD-optimized kernels that use the SSE4.2 instruction set extension on x86-64 processors.

These optimizations target CPUs that support SSE4.2 but lack AVX2 or AVX512, providing a performance uplift over the scalar fallback. The changes accelerate core compression and decompression routines, including:

- LZ77 match finding
- Numerical block delta encoding and decoding
- Dynamic programming table updates
- Literal and run length detection
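To make the numerical block path more concrete, here is a minimal sketch of a delta plus zigzag encoder and a prefix-sum decoder. All function names are hypothetical and the actual zxc block layout is not shown in this PR text, so treat this as an illustration of the general technique only. The decoder uses SSE2 shifts and adds (present on every SSE4.2-capable CPU) when built for x86-64 with GCC or Clang, and falls back to a scalar loop elsewhere.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__x86_64__) && defined(__GNUC__)
#include <emmintrin.h> /* SSE2, baseline on x86-64 */
#define SKETCH_HAVE_SSE2 1
#endif

/* Zigzag maps small signed deltas to small unsigned codes: 0,-1,1,-2 -> 0,1,2,3. */
static uint32_t zigzag_enc(int32_t v)
{
    return ((uint32_t)v << 1) ^ (0u - (uint32_t)(v < 0));
}

static int32_t zigzag_dec(uint32_t u)
{
    return (int32_t)((u >> 1) ^ (0u - (u & 1u)));
}

/* Hypothetical encoder: store the zigzag code of each value minus its predecessor. */
static void delta_zigzag_encode(const uint32_t *in, uint32_t *enc, size_t n)
{
    uint32_t prev = 0;
    for (size_t i = 0; i < n; i++) {
        enc[i] = zigzag_enc((int32_t)(in[i] - prev));
        prev = in[i];
    }
}

/* Hypothetical decoder: undo zigzag, then rebuild values with a running prefix sum. */
static void delta_zigzag_decode(const uint32_t *enc, uint32_t *out, size_t n)
{
    size_t i = 0;
#ifdef SKETCH_HAVE_SSE2
    const __m128i one = _mm_set1_epi32(1);
    __m128i carry = _mm_setzero_si128(); /* last decoded value in every lane */
    for (; i + 4 <= n; i += 4) {
        __m128i u = _mm_loadu_si128((const __m128i *)(enc + i));
        /* Zigzag decode four lanes: (u >> 1) ^ (0 - (u & 1)). */
        __m128i sign = _mm_sub_epi32(_mm_setzero_si128(), _mm_and_si128(u, one));
        __m128i d = _mm_xor_si128(_mm_srli_epi32(u, 1), sign);
        /* In-register prefix sum: two shift-and-add steps cover four lanes. */
        d = _mm_add_epi32(d, _mm_slli_si128(d, 4));
        d = _mm_add_epi32(d, _mm_slli_si128(d, 8));
        d = _mm_add_epi32(d, carry);
        _mm_storeu_si128((__m128i *)(out + i), d);
        /* Broadcast the last lane so the next group continues from it. */
        carry = _mm_shuffle_epi32(d, _MM_SHUFFLE(3, 3, 3, 3));
    }
#endif
    uint32_t prev = (i > 0) ? out[i - 1] : 0;
    for (; i < n; i++) {
        prev += (uint32_t)zigzag_dec(enc[i]);
        out[i] = prev;
    }
}
```

The shift-and-add pattern is what lets a prefix sum, which looks inherently sequential, run four lanes at a time; only the carry between groups stays serial.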

The update also includes:

- Integration of SSE4.2 as a distinct build variant within CMake and Meson build systems.
- Enhanced runtime CPU feature detection and dispatch logic to dynamically select the most advanced available SIMD implementation (AVX512 > AVX2 > SSE4.2 > scalar).
- A dedicated CI job for SSE4.2 specific builds to ensure proper testing and validation.
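The dispatch priority described above can be sketched roughly as follows. The type and function names (`simd_tier`, `pick_tier`, `detect_tier`, `tier_name`) are hypothetical and are not taken from the zxc sources; the probe relies on the GCC/Clang builtins `__builtin_cpu_init` and `__builtin_cpu_supports`, and a production AVX512 check would likely test more feature bits than `avx512f` alone.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tier names; the PR's real identifiers are not shown. */
typedef enum {
    TIER_SCALAR = 0,
    TIER_SSE42,
    TIER_AVX2,
    TIER_AVX512
} simd_tier;

/* Pure priority rule from the PR text: AVX512 > AVX2 > SSE4.2 > scalar. */
static simd_tier pick_tier(bool has_sse42, bool has_avx2, bool has_avx512)
{
    if (has_avx512) return TIER_AVX512;
    if (has_avx2)   return TIER_AVX2;
    if (has_sse42)  return TIER_SSE42;
    return TIER_SCALAR;
}

/* Probe the running CPU; non-x86 targets always report the scalar tier. */
static simd_tier detect_tier(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    __builtin_cpu_init();
    return pick_tier(__builtin_cpu_supports("sse4.2") != 0,
                     __builtin_cpu_supports("avx2") != 0,
                     __builtin_cpu_supports("avx512f") != 0);
#else
    return TIER_SCALAR;
#endif
}

/* Human-readable label, useful when logging which variant was selected. */
static const char *tier_name(simd_tier t)
{
    switch (t) {
    case TIER_AVX512: return "avx512";
    case TIER_AVX2:   return "avx2";
    case TIER_SSE42:  return "sse4.2";
    default:          return "scalar";
    }
}
```

Keeping the priority rule in a pure function separate from the CPU probe makes the ordering testable on any machine, including ones that lack the features being ranked.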

Introduces hand-written SSE4.2 SIMD implementations for core compression and decompression routines. This provides a performance uplift for x86-64 CPUs that support SSE4.2 (and its implied instruction sets like SSSE3/SSE4.1) but lack AVX2 or AVX512, broadening SIMD acceleration to more hardware.

Includes optimized paths for:

- LZ77 match finding (16-byte comparisons).
- Numerical block encoding/decoding (delta-of-delta, zigzag, prefix sum).
- Dynamic programming cost updates for optimal parsing.
- Byte run and literal match searching.
- Overlapping memory copy utilities.

The build system, CPU feature detection, and function dispatch logic have been updated to support the new SSE4.2 variant, prioritizing it over generic scalar code when AVX2/AVX512 are not available.
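As an illustration of what a 16-byte comparison in match finding can look like, the sketch below extends a match 16 bytes at a time and locates the first mismatching byte with a bit scan. The function name `match_length` is hypothetical and this is not the PR's kernel. It uses only SSE2 compare and movemask instructions, which any SSE4.2 CPU provides, and assumes GCC or Clang for `__builtin_ctz`; other targets take the scalar loop.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__x86_64__) && defined(__GNUC__)
#include <emmintrin.h> /* SSE2, baseline on x86-64 */
#define SKETCH_HAVE_SSE2 1
#endif

/* Hypothetical helper: count how many leading bytes of a and b are equal,
 * looking at no more than max_len bytes of either buffer. */
static size_t match_length(const uint8_t *a, const uint8_t *b, size_t max_len)
{
    size_t n = 0;
#ifdef SKETCH_HAVE_SSE2
    while (n + 16 <= max_len) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + n));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + n));
        /* One bit per byte: 1 where the two bytes are equal. */
        unsigned eq = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(va, vb));
        if (eq != 0xFFFFu) {
            /* The lowest zero bit marks the first mismatching byte. */
            unsigned diff = ~eq & 0xFFFFu;
            return n + (size_t)__builtin_ctz(diff);
        }
        n += 16;
    }
#endif
    /* Scalar tail, and the complete path on non-x86 targets. */
    while (n < max_len && a[n] == b[n]) {
        n++;
    }
    return n;
}
```

The vector loop never reads past `max_len` because it only runs while a full 16-byte block remains; the scalar tail handles the rest.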

Introduces a new multiarch workflow job configured to run under an emulated `Nehalem` CPU. This ensures that the recently added SSE4.2 optimized kernels are tested on a target that supports SSE4.2 but has neither AVX2 nor AVX512, verifying correct feature dispatch and execution.

codecov · 2026-05-30T19:37:39Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

Integrates the build process for SSE4.2 optimized compression, decompression, and Huffman kernels into the Rust `zxc-sys` wrapper. This enables the Rust binding to leverage the recently added SSE4.2 SIMD implementations for x86-64, providing wider hardware acceleration.

hellobertrand added 2 commits May 30, 2026 21:35

hellobertrand added 2 commits May 30, 2026 22:03

ci: Use zxc-specific LZbench fork for benchmarks

2c3f927

Ensures compatibility with the zxc library's current development state, enabling accurate benchmarking for features such as the recently added SSE4.2 kernels.

hellobertrand changed the title ~~feat: Adds SSE4.2 optimized for x86-64~~ feat: Adds SSE4.2 optimized for x86_64 May 30, 2026

hellobertrand changed the title ~~feat: Adds SSE4.2 optimized for x86_64~~ feat: Adds SSE4.2 optimized for x86-64 May 30, 2026

hellobertrand wants to merge 4 commits into main from feat/simd-sse42
