Feature Request: Add Hardware Acceleration for AES and CRC32 in RAR Decompression #176

**Is your feature request related to a problem? Please describe.**

First off, thank you for creating and maintaining XADMaster. It's an incredibly comprehensive and well-architected library for archive decompression.

While profiling RAR5 decompression, especially of encrypted archives, I noticed that the AES and CRC32 computations are implemented purely in software. These implementations are correct and robust, but they do not use the dedicated instructions available on most modern CPUs. The gap matters most when decompressing large encrypted RAR files or verifying checksums on very fast storage, where the CPU rather than I/O becomes the bottleneck.

**Describe the solution you'd like**

I would like to propose the addition of hardware acceleration for AES and CRC32 calculations within the RAR decompression modules (`XADRARAESHandle` and `XADCRCHandle`/`CRC.m` respectively).

This would involve:
1.  **Runtime CPU feature detection** to check for the presence of the required instruction sets.
2.  Implementing **alternative code paths** that use CPU intrinsics for acceleration when available.
3.  Falling back to the existing pure software implementation on older hardware that lacks these features.

This approach would maintain broad compatibility while unlocking significant performance gains on modern systems.
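
The three steps above can be sketched roughly as follows. This is a minimal illustration, not XADMaster code: `cpu_features`, `detect_cpu_features`, `select_crc32`, and `crc32_software` are hypothetical names, the bitwise CRC only stands in for the existing software path, and the x86 branch assumes GCC or Clang (`<cpuid.h>`). The `hw.optional.arm.FEAT_*` sysctl keys are assumed to be present, which should hold on recent macOS/iOS releases; on older systems the lookup fails and the flag simply stays 0.

```c
#include <stddef.h>
#include <stdint.h>

#if (defined(__x86_64__) || defined(__i386__)) && (defined(__GNUC__) || defined(__clang__))
#include <cpuid.h>            /* GCC/Clang wrapper around the CPUID instruction */
#define DETECT_X86 1
#elif defined(__APPLE__) && defined(__aarch64__)
#include <sys/types.h>
#include <sys/sysctl.h>
#define DETECT_APPLE_ARM 1
#elif defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>
#ifndef HWCAP_AES
#define HWCAP_AES   (1 << 3)  /* fallback if the libc header does not define it */
#endif
#ifndef HWCAP_CRC32
#define HWCAP_CRC32 (1 << 7)
#endif
#define DETECT_LINUX_ARM 1
#endif

typedef struct {
    int aes;        /* AES-NI (x86) or ARMv8 AES instructions */
    int crc_accel;  /* PCLMULQDQ (x86) or ARMv8 CRC32 instructions */
} cpu_features;

/* Step 1: runtime detection. Unknown platforms report no features. */
cpu_features detect_cpu_features(void) {
    cpu_features f = {0, 0};
#if defined(DETECT_X86)
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        f.aes = (int)((ecx >> 25) & 1u);       /* CPUID.1:ECX bit 25 = AES-NI */
        f.crc_accel = (int)((ecx >> 1) & 1u);  /* CPUID.1:ECX bit 1 = PCLMULQDQ */
    }
#elif defined(DETECT_APPLE_ARM)
    int v = 0;
    size_t n = sizeof v;
    if (sysctlbyname("hw.optional.arm.FEAT_AES", &v, &n, NULL, 0) == 0) f.aes = (v != 0);
    v = 0;
    n = sizeof v;
    if (sysctlbyname("hw.optional.arm.FEAT_CRC32", &v, &n, NULL, 0) == 0) f.crc_accel = (v != 0);
#elif defined(DETECT_LINUX_ARM)
    unsigned long caps = getauxval(AT_HWCAP);
    f.aes = (caps & HWCAP_AES) != 0;
    f.crc_accel = (caps & HWCAP_CRC32) != 0;
#endif
    return f;
}

typedef uint32_t (*crc32_fn)(uint32_t crc, const uint8_t *buf, size_t len);

/* Step 3: portable fallback (bitwise CRC-32, reflected polynomial 0xEDB88320, zlib-style API). */
uint32_t crc32_software(uint32_t crc, const uint8_t *buf, size_t len) {
    crc = ~crc;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Step 2: choose the accelerated path only if one exists and the CPU supports it. */
crc32_fn select_crc32(crc32_fn accelerated) {
    cpu_features f = detect_cpu_features();
    return (accelerated != NULL && f.crc_accel) ? accelerated : crc32_software;
}
```

Selecting the function pointer once (for example when a handle is created) keeps the detection cost out of the per-buffer hot path.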

**Technical Details & Implementation Suggestions:**

**1. AES Acceleration (AES-NI on x86 and ARM Cryptography Extensions)**

*   **x86/x86-64 Architecture (Intel/AMD):**
    *   **Instruction Set:** AES-NI (Advanced Encryption Standard New Instructions).
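    *   **Implementation:** Use the AES-NI intrinsics from `<wmmintrin.h>`. Functions like `_mm_aesdec_si128` and `_mm_aesdeclast_si128` can replace the software-based decryption loops in `XADRARAESHandle.m`. Note that these instructions implement the Equivalent Inverse Cipher, so the middle round keys must first be passed through `_mm_aesimc_si128`. CPU support can be detected at runtime via `__cpuid` (leaf 1, ECX bit 25).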

*   **ARMv8-A Architecture (Apple Silicon, modern iOS devices, etc.):**
    *   **Instruction Set:** ARMv8 Cryptography Extensions (ARMCE).
    *   **Implementation:** Use the intrinsics from `<arm_neon.h>`. `vaesdq_u8` (AddRoundKey, inverse ShiftRows, and inverse SubBytes for one round) and `vaesimcq_u8` (inverse MixColumns) together provide hardware-accelerated decryption. CPU feature detection can be done via `sysctlbyname` on macOS/iOS (for example the `hw.optional.arm.FEAT_AES` key) or `getauxval(AT_HWCAP)` with `HWCAP_AES` on Linux.
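
To make the round structure on both architectures concrete, here is a rough sketch of single-block AES-128 decryption with pre-expanded round keys. It is not taken from XADMaster: all `hw_aes_*` names are hypothetical, the key schedule and CBC chaining are left out, and round keys are passed as flat 176-byte arrays (11 keys of 16 bytes). The encrypt helper exists only so the decrypt path can be round-trip checked. The x86 branch assumes GCC or Clang; the ARM branch assumes the compiler is targeting the AES extension (for example `-march=armv8-a+crypto`) and skips the runtime check for brevity.

```c
#include <stdint.h>

#if (defined(__x86_64__) || defined(__i386__)) && (defined(__GNUC__) || defined(__clang__))
#include <cpuid.h>
#include <wmmintrin.h>   /* AES-NI intrinsics; also pulls in the SSE2 header */
#define HW_AES_X86 1
#define AES_TARGET __attribute__((target("aes,sse2")))
#define LOADK(p, i) _mm_loadu_si128((const __m128i *)((p) + 16 * (i)))
#define STOREK(p, i, v) _mm_storeu_si128((__m128i *)((p) + 16 * (i)), (v))
#elif defined(__aarch64__) && (defined(__ARM_FEATURE_AES) || defined(__ARM_FEATURE_CRYPTO))
#include <arm_neon.h>
#define HW_AES_ARM 1
#define AES_TARGET
#define LOADK(p, i) vld1q_u8((p) + 16 * (i))
#define STOREK(p, i, v) vst1q_u8((p) + 16 * (i), (v))
#else
#define AES_TARGET
#endif

/* Returns 1 if the functions below may be called on this machine. */
int hw_aes_available(void) {
#if defined(HW_AES_X86)
    unsigned int a, b, c, d;
    return __get_cpuid(1, &a, &b, &c, &d) && ((c >> 25) & 1u);
#elif defined(HW_AES_ARM)
    return 1;   /* compile-time decision only in this sketch */
#else
    return 0;
#endif
}

/* Derive Equivalent Inverse Cipher keys: reverse order, InvMixColumns on keys 1..9. */
AES_TARGET void hw_aes_make_decrypt_keys(const uint8_t *rk, uint8_t *dk) {
#if defined(HW_AES_X86)
    STOREK(dk, 0, LOADK(rk, 10));
    for (int i = 1; i < 10; i++) STOREK(dk, i, _mm_aesimc_si128(LOADK(rk, 10 - i)));
    STOREK(dk, 10, LOADK(rk, 0));
#elif defined(HW_AES_ARM)
    STOREK(dk, 0, LOADK(rk, 10));
    for (int i = 1; i < 10; i++) STOREK(dk, i, vaesimcq_u8(LOADK(rk, 10 - i)));
    STOREK(dk, 10, LOADK(rk, 0));
#else
    (void)rk; (void)dk;
#endif
}

AES_TARGET void hw_aes_decrypt_block(const uint8_t *dk, const uint8_t *in, uint8_t *out) {
#if defined(HW_AES_X86)
    __m128i x = _mm_xor_si128(LOADK(in, 0), LOADK(dk, 0));
    for (int i = 1; i < 10; i++) x = _mm_aesdec_si128(x, LOADK(dk, i));
    STOREK(out, 0, _mm_aesdeclast_si128(x, LOADK(dk, 10)));
#elif defined(HW_AES_ARM)
    uint8x16_t x = LOADK(in, 0);
    /* vaesdq_u8 XORs the key first, so the loop is shifted by one key vs. x86. */
    for (int i = 0; i < 9; i++) x = vaesimcq_u8(vaesdq_u8(x, LOADK(dk, i)));
    x = vaesdq_u8(x, LOADK(dk, 9));
    STOREK(out, 0, veorq_u8(x, LOADK(dk, 10)));
#else
    (void)dk; (void)in; (void)out;
#endif
}

/* Forward direction, included only to round-trip check the decrypt path. */
AES_TARGET void hw_aes_encrypt_block(const uint8_t *rk, const uint8_t *in, uint8_t *out) {
#if defined(HW_AES_X86)
    __m128i x = _mm_xor_si128(LOADK(in, 0), LOADK(rk, 0));
    for (int i = 1; i < 10; i++) x = _mm_aesenc_si128(x, LOADK(rk, i));
    STOREK(out, 0, _mm_aesenclast_si128(x, LOADK(rk, 10)));
#elif defined(HW_AES_ARM)
    uint8x16_t x = LOADK(in, 0);
    for (int i = 0; i < 9; i++) x = vaesmcq_u8(vaeseq_u8(x, LOADK(rk, i)));
    x = vaeseq_u8(x, LOADK(rk, 9));
    STOREK(out, 0, veorq_u8(x, LOADK(rk, 10)));
#else
    (void)rk; (void)in; (void)out;
#endif
}
```

Callers would check `hw_aes_available()` once and otherwise stay on the existing software path. The main point of the sketch is the difference in where AddRoundKey happens: x86 `aesdec` XORs the round key last, while ARM `AESD` XORs it first, which is why the two loops index the same key array differently.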

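**2. CRC32 Acceleration (PCLMULQDQ on x86 and ARM CRC32 Instructions)**

XADMaster already uses a highly optimized Slicing-by-16 software implementation (`XADCalculateCRCFast`), which is great. Hardware acceleration could push this even further.

*   **x86/x86-64 Architecture (Intel/AMD):**
    *   **Instruction Set:** PCLMULQDQ (carry-less multiplication). The SSE4.2 `crc32` instruction is not suitable here: it is hard-wired to the CRC-32C (Castagnoli) polynomial, while RAR uses the IEEE 802.3 polynomial (reflected form `0xEDB88320`), so the `_mm_crc32_u8`, `_mm_crc32_u32`, and `_mm_crc32_u64` intrinsics from `<nmmintrin.h>` would produce different checksums.
    *   **Implementation:** Use the `_mm_clmulepi64_si128` intrinsic from `<wmmintrin.h>` to fold the input 128 bits at a time and reduce the remainder at the end, as described in Intel's white paper on fast CRC computation using PCLMULQDQ. This can outperform even the best software-based table methods when the bottleneck is the CPU rather than I/O.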

*   **ARMv8-A Architecture (Apple Silicon, etc.):**
    *   **Instruction Set:** ARMv8 CRC32 extensions.
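    *   **Implementation:** Use the ACLE intrinsics from `<arm_acle.h>`, such as `__crc32d` (Clang builtin `__builtin_arm_crc32d`), to compute CRC32 on 64-bit chunks of data at a time. These use the IEEE 802.3 polynomial that RAR needs; the `__crc32c*` variants compute CRC-32C instead. This is typically much faster than table-based software methods on ARM hardware.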
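
A small bitwise reference makes the polynomial question concrete, and also gives any accelerated path something to be checked against. The sketch below is illustrative only: `crc32_bitwise` and `crc32_ieee` are hypothetical names, both use a zlib-style API (pass 0 for the first call), and the hardware branch is chosen at compile time via `__ARM_FEATURE_CRC32` rather than at runtime. It assumes a little-endian target for the 64-bit loads.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__ARM_FEATURE_CRC32)
#include <arm_acle.h>   /* __crc32b / __crc32d (IEEE), __crc32cb / __crc32cd (Castagnoli) */
#endif

#define POLY_CRC32_IEEE 0xEDB88320u  /* reflected 0x04C11DB7: RAR, ZIP, zlib */
#define POLY_CRC32C     0x82F63B78u  /* reflected 0x1EDC6F41: SSE4.2 crc32 instruction */

/* Bit-at-a-time reference for any reflected 32-bit polynomial. */
uint32_t crc32_bitwise(uint32_t poly, uint32_t crc, const uint8_t *buf, size_t len) {
    crc = ~crc;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (poly & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* IEEE CRC-32: ARMv8 CRC32 instructions when compiled in, bitwise reference otherwise. */
uint32_t crc32_ieee(uint32_t crc, const uint8_t *buf, size_t len) {
#if defined(__ARM_FEATURE_CRC32)
    crc = ~crc;
    for (; len >= 8; buf += 8, len -= 8) {
        uint64_t chunk;
        memcpy(&chunk, buf, 8);        /* avoids unaligned-access assumptions */
        crc = __crc32d(crc, chunk);
    }
    while (len--) crc = __crc32b(crc, *buf++);
    return ~crc;
#else
    return crc32_bitwise(POLY_CRC32_IEEE, crc, buf, len);
#endif
}
```

For the standard check string `"123456789"`, the IEEE polynomial yields `0xCBF43926` and the Castagnoli polynomial yields `0xE3069283`, so an accelerated path built on the wrong instruction would fail every archive checksum.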

**Describe alternatives you've considered**

The current software implementation is a perfectly valid alternative for ensuring maximum portability. However, given that hardware support for these instructions is now nearly ubiquitous across all major platforms (macOS, iOS, Windows, Linux on both x86 and ARM), adding accelerated paths seems like a natural evolution that would greatly benefit users without sacrificing compatibility.

**Additional context**

The performance improvement would be most noticeable in the following scenarios:
*   Extracting large files from password-protected RAR archives.
*   Testing the integrity of large archives (`lsar -t`; in `unar`, `-t` is the copy-time option).
*   Operating on battery-powered devices like MacBooks and iPhones, where hardware-accelerated crypto is not only faster but also significantly more power-efficient.

Thank you for considering this feature request. I believe it would be a valuable enhancement to an already excellent library.
