Feature Request: Add Hardware Acceleration for AES and CRC32 in RAR Decompression #176

@daiaji

Is your feature request related to a problem? Please describe.

First off, thank you for creating and maintaining XADMaster. It's an incredibly comprehensive and well-architected library for archive decompression.

While profiling the RAR5 decompression performance, especially with encrypted archives, I noticed that the AES and CRC32 computations are currently implemented purely in software. While these implementations are correct and robust, they don't leverage the specialized hardware instructions available in most modern CPUs. This can lead to a significant performance gap when decompressing large, encrypted RAR files or when verifying checksums on very fast storage systems.

Describe the solution you'd like

I would like to propose the addition of hardware acceleration for AES and CRC32 calculations within the RAR decompression modules (XADRARAESHandle and XADCRCHandle/CRC.m respectively).

This would involve:

  1. Runtime CPU feature detection to check for the presence of the required instruction sets.
  2. Implementing alternative code paths that use CPU intrinsics for acceleration when available.
  3. Falling back to the existing pure software implementation on older hardware that lacks these features.

This approach would maintain broad compatibility while unlocking significant performance gains on modern systems.
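
As a rough illustration of the three steps above, a minimal C sketch could look like this. The names cpu_features, detect_cpu_features and choose_aes_backend are hypothetical and not part of XADMaster. The sketch assumes GCC or Clang (for <cpuid.h>), and the macOS sysctl key names are the ones I believe recent macOS versions expose; a key that is missing simply reads as "not supported".

```c
#include <assert.h>
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>              /* GCC/Clang wrapper around the CPUID instruction */
#elif defined(__aarch64__) && defined(__APPLE__)
#include <sys/types.h>
#include <sys/sysctl.h>
/* Reads a boolean sysctl; an unknown key is treated as "feature absent". */
static int sysctl_flag(const char *name)
{
    int value = 0;
    size_t size = sizeof value;
    return sysctlbyname(name, &value, &size, NULL, 0) == 0 && value != 0;
}
#elif defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h>
#include <asm/hwcap.h>
#endif

/* Hypothetical feature summary used only by this sketch. */
typedef struct {
    int aes;    /* AES round instructions (AES-NI or ARMv8 AES) */
    int crc32;  /* CRC-32 instructions (ARMv8 only in this sketch) */
    int clmul;  /* carry-less multiply (PCLMULQDQ or PMULL) */
} cpu_features;

/* Step 1: ask the CPU or the OS which instruction sets are present. */
static cpu_features detect_cpu_features(void)
{
    cpu_features f = {0, 0, 0};
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        f.aes   = (int)((ecx >> 25) & 1u);  /* CPUID.01H:ECX bit 25 = AES-NI */
        f.clmul = (int)((ecx >> 1) & 1u);   /* CPUID.01H:ECX bit 1 = PCLMULQDQ */
    }
#elif defined(__aarch64__) && defined(__APPLE__)
    f.aes   = sysctl_flag("hw.optional.arm.FEAT_AES");    /* assumed key name */
    f.crc32 = sysctl_flag("hw.optional.armv8_crc32");     /* assumed key name */
    f.clmul = sysctl_flag("hw.optional.arm.FEAT_PMULL");  /* assumed key name */
#elif defined(__aarch64__) && defined(__linux__)
    unsigned long caps = getauxval(AT_HWCAP);
    f.aes   = (caps & HWCAP_AES) != 0;
    f.crc32 = (caps & HWCAP_CRC32) != 0;
    f.clmul = (caps & HWCAP_PMULL) != 0;
#endif
    return f;  /* unknown platforms report nothing, so they stay on software */
}

typedef enum { BACKEND_SOFTWARE = 0, BACKEND_HARDWARE = 1 } backend;

/* Steps 2 and 3: take the accelerated AES path only when it was compiled in
   AND the CPU reports the feature; every other case keeps the software code. */
static backend choose_aes_backend(const cpu_features *f, int accelerated_compiled_in)
{
    assert(f != NULL);
    return (accelerated_compiled_in && f->aes) ? BACKEND_HARDWARE : BACKEND_SOFTWARE;
}
```

The detection result would typically be computed once (for example when the first handle is created) and cached, so the per-block cost is a single branch or function-pointer call.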

Technical Details & Implementation Suggestions:

1. AES Acceleration (AES-NI on x86 and ARM Cryptography Extensions)

  • x86/x86-64 Architecture (Intel/AMD):

    • Instruction Set: AES-NI (Advanced Encryption Standard New Instructions).
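    • Implementation: Use intrinsics from <wmmintrin.h> (AES-NI and PCLMULQDQ). Functions like _mm_aesdec_si128 and _mm_aesdeclast_si128 can replace the software-based decryption loops in XADRARAESHandle.m. Note that these instructions expect the decryption round keys in "equivalent inverse cipher" form, so the intermediate round keys need to be passed through _mm_aesimc_si128 first. CPU support can be detected at runtime via __cpuid (leaf 1, ECX bit 25).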
  • ARMv8-A Architecture (Apple Silicon, modern iOS devices, etc.):

    • Instruction Set: ARMv8 Cryptography Extensions (ARMCE).
    • Implementation: Use the crypto intrinsics from <arm_neon.h>. Instructions like vaesdq_u8 (one AES decryption round without the mix-columns step: AddRoundKey, InvShiftRows, InvSubBytes) and vaesimcq_u8 (AES inverse mix columns) can provide hardware-accelerated decryption. CPU feature detection can be done via sysctlbyname on macOS/iOS or getauxval(AT_HWCAP) on Linux.
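
To make the x86 path more concrete, here is a minimal sketch of single-block decryption with AES-NI. It assumes GCC or Clang on x86 (function-level target attributes, <cpuid.h>), and it assumes the encryption round keys are already available from the existing software key schedule. All function names are hypothetical. The self-test fills the round keys with an arbitrary byte pattern rather than a real key schedule, since the round trip holds for any set of round keys.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#include <immintrin.h>

static int cpu_has_aesni(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) return 0;
    return (int)((ecx >> 25) & 1u);  /* CPUID.01H:ECX bit 25 = AES-NI */
}

/* AESDEC implements the "equivalent inverse cipher": the encryption round
   keys are used in reverse order, and every key except the first and last
   must be run through InvMixColumns (_mm_aesimc_si128) beforehand. */
__attribute__((target("aes,sse2")))
static void aesni_make_dec_keys(const __m128i *enc, __m128i *dec, int rounds)
{
    assert(rounds == 10 || rounds == 12 || rounds == 14);
    dec[0] = enc[rounds];
    for (int i = 1; i < rounds; i++)
        dec[i] = _mm_aesimc_si128(enc[rounds - i]);
    dec[rounds] = enc[0];
}

__attribute__((target("aes,sse2")))
static __m128i aesni_encrypt_block(__m128i block, const __m128i *enc, int rounds)
{
    block = _mm_xor_si128(block, enc[0]);
    for (int i = 1; i < rounds; i++)
        block = _mm_aesenc_si128(block, enc[i]);
    return _mm_aesenclast_si128(block, enc[rounds]);
}

__attribute__((target("aes,sse2")))
static __m128i aesni_decrypt_block(__m128i block, const __m128i *dec, int rounds)
{
    block = _mm_xor_si128(block, dec[0]);
    for (int i = 1; i < rounds; i++)
        block = _mm_aesdec_si128(block, dec[i]);
    return _mm_aesdeclast_si128(block, dec[rounds]);
}

/* Returns 1 if decrypt(encrypt(x)) == x, 0 on mismatch, -1 without AES-NI. */
__attribute__((target("aes,sse2")))
static int aesni_roundtrip_selftest(void)
{
    enum { ROUNDS = 14 };  /* 14 rounds corresponds to AES-256 */
    uint8_t raw[(ROUNDS + 1) * 16], plain[16], back[16];
    __m128i enc[ROUNDS + 1], dec[ROUNDS + 1];

    if (!cpu_has_aesni()) return -1;
    /* Arbitrary test pattern, NOT a real AES key schedule. */
    for (size_t i = 0; i < sizeof raw; i++) raw[i] = (uint8_t)(i * 37u + 11u);
    for (int i = 0; i < 16; i++) plain[i] = (uint8_t)(0xA0 + i);
    for (int i = 0; i <= ROUNDS; i++)
        enc[i] = _mm_loadu_si128((const __m128i *)(raw + 16 * i));

    aesni_make_dec_keys(enc, dec, ROUNDS);
    __m128i c = aesni_encrypt_block(_mm_loadu_si128((const __m128i *)plain), enc, ROUNDS);
    _mm_storeu_si128((__m128i *)back, aesni_decrypt_block(c, dec, ROUNDS));
    return memcmp(plain, back, 16) == 0;
}
#else
/* Non-x86 builds: no AES-NI path in this sketch. */
static int aesni_roundtrip_selftest(void) { return -1; }
#endif
```

Because the target attribute is applied per function, the file can be built without -maes and the accelerated functions are only called after the CPUID check succeeds. Chaining (XORing each decrypted block with the previous ciphertext block, if the handle uses CBC) would stay outside these functions, exactly as in the software path.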

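2. CRC32 Acceleration (PCLMULQDQ on x86 and ARM CRC32 Instructions)

XADMaster already uses a highly optimized Slicing-by-16 software implementation (XADCalculateCRCFast), which is great. Hardware acceleration could push this even further.

  • x86/x86-64 Architecture (Intel/AMD):

    • Instruction Set: PCLMULQDQ (carry-less multiplication). The SSE4.2 crc32 instruction is not suitable here: it is hard-wired to the CRC-32C (Castagnoli) polynomial, whereas RAR uses the standard CRC-32 polynomial (0xEDB88320 in reflected form), so _mm_crc32_u8, _mm_crc32_u32, and _mm_crc32_u64 would produce different checksums.
    • Implementation: Use the _mm_clmulepi64_si128 intrinsic from <wmmintrin.h> to fold the input 128 bits at a time, followed by a final reduction to 32 bits. This can often outperform even the best software-based table methods, especially when the bottleneck is the CPU.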
  • ARMv8-A Architecture (Apple Silicon, etc.):

    • Instruction Set: ARMv8 CRC32 extensions.
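    • Implementation: Use the ACLE intrinsics from <arm_acle.h>, such as __crc32d (or Clang's __builtin_arm_crc32d), to compute CRC32 on 64-bit chunks of data at a time. These use the same polynomial as RAR's CRC32 (unlike the __crc32cd variants, which compute CRC-32C), and are typically much faster than table-based software methods on ARM hardware.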
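
A small sketch may help show which polynomial matters here. The bitwise reference routine below is parameterised by polynomial, so it can reproduce both the standard CRC-32 and the CRC-32C variant that the SSE4.2 crc32 instruction computes; the two give different results for the same input. The ARM path is only compiled when the compiler targets AArch64 with the CRC extension enabled (a compile-time guard standing in for runtime detection) and assumes a little-endian target. Function and macro names are hypothetical, not XADMaster's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define POLY_CRC32  0xEDB88320u  /* reflected IEEE 802.3 polynomial (ZIP, zlib) */
#define POLY_CRC32C 0x82F63B78u  /* reflected Castagnoli polynomial (SSE4.2 crc32) */

/* Bitwise reference CRC for a reflected polynomial; slow but easy to verify. */
static uint32_t crc_bitwise(uint32_t poly, uint32_t crc, const uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    crc = ~crc;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (poly & (0u - (crc & 1u)));
    }
    return ~crc;
}

#if defined(__aarch64__) && defined(__ARM_FEATURE_CRC32)
#include <arm_acle.h>
#define HAVE_CRC32_HW 1
/* ARMv8 CRC32 instructions: 8 bytes per step, then a byte-wise tail. */
static uint32_t crc32_armv8(uint32_t crc, const uint8_t *buf, size_t len)
{
    crc = ~crc;
    for (; len >= 8; buf += 8, len -= 8) {
        uint64_t chunk;
        memcpy(&chunk, buf, sizeof chunk);  /* unaligned-safe load */
        crc = __crc32d(crc, chunk);
    }
    for (; len > 0; buf++, len--)
        crc = __crc32b(crc, *buf);
    return ~crc;
}
#else
#define HAVE_CRC32_HW 0
#endif

/* Uses the hardware routine when it was compiled in, else the reference code. */
static uint32_t crc32_best(uint32_t crc, const uint8_t *buf, size_t len)
{
#if HAVE_CRC32_HW
    return crc32_armv8(crc, buf, len);
#else
    return crc_bitwise(POLY_CRC32, crc, buf, len);
#endif
}
```

For the usual check string "123456789", the standard CRC-32 is 0xCBF43926 while CRC-32C is 0xE3069283. Any accelerated path would need to reproduce the first value, which makes this a convenient regression test when adding a new backend.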

Describe alternatives you've considered

The current software implementation is a perfectly valid alternative for ensuring maximum portability. However, given that hardware support for these instructions is now nearly ubiquitous across all major platforms (macOS, iOS, Windows, Linux on both x86 and ARM), adding accelerated paths seems like a natural evolution that would greatly benefit users without sacrificing compatibility.

Additional context

The performance improvement would be most noticeable in the following scenarios:

  • Extracting large files from password-protected RAR archives.
  • Testing the integrity of large archives (e.g. the -t/-test option of lsar).
  • Operating on battery-powered devices like MacBooks and iPhones, where hardware-accelerated crypto is not only faster but also significantly more power-efficient.

Thank you for considering this feature request. I believe it would be a valuable enhancement to an already excellent library.
