Cross-platform CUDA PTX/CUBIN extractor and decompressor. It extracts and decompresses CUBIN and PTX sections from a single binary (or fatbin) or from a folder of binaries (and fatbins).
The simplest way to build is with Conan 2:
conan install . --output-folder=build -s build_type=Release --build=missing
cmake -S . -B build/release -DCMAKE_TOOLCHAIN_FILE=build/.conan/conan_toolchain.cmake -DCMAKE_BUILD_TYPE=Release
cmake --build build/release --config Release
- spdlog/1.15.3: logging
- llvm-core/19.1.7: read the binary and find sections
- zstd/1.5.6: decompress ZSTD
- lz4/1.10.0: decompress LZ4
cmake -S . -B build/release -DCMAKE_BUILD_TYPE=Release
cmake --build build/release --config Release
Two targets are built: the cudex library and a simple cudex-cli tool.
./build/release/cli/cudex-cli ./PATH/TO/BINARY/OR/FOLDER ./output/folder/
cudex-cli writes every PTX and CUBIN file it finds to the specified output folder.
Fatbin Section at offset: 16. Kind: 1. Header: 72. ContentSize: (Total: 1488, Compressed: 1481, Decompressed: 3533), Version: 8.0, Arch: 80, ptxOptionsOffset: 64, OS: Linux, Compression Method: Lz4, BuildType: Release, Identifier: (len: 0), Opts: (len: 0)
Fatbin Section at offset: 1576. Kind: 2. Header: 64. ContentSize: (Total: 11816, Compressed: 0, Decompressed: 0), Version: 1.7, Arch: 80, ptxOptionsOffset: 0, OS: Linux, Compression Method: None, BuildType: Release, Identifier: (len: 0), Opts: (len: 0)
Arch: sm_80, Version: [8.0], Os: Linux, Compression: Lz4, Size: 3533
Save PTX
Write success: ./output/ptx_sm80_v8.0_Linux.bin
Arch: sm_80, Version: [1.7], Os: Linux, Compression: None, Size: 11816
Save CUBIN
Write success: ./output/cubin_sm80_v1.7_Linux.bin
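The output file names in the log above appear to follow the pattern kind, architecture, version, OS. The sketch below reproduces that pattern for illustration only; `output_file_name` and its parameters are hypothetical names, not part of the cudex API, and the real tool may build names differently.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: rebuilds names such as "ptx_sm80_v8.0_Linux.bin"
// from the fields printed in the log. Not taken from the cudex sources.
std::string output_file_name(const std::string& kind,  // "ptx" or "cubin"
                             unsigned arch,            // e.g. 80 for sm_80
                             unsigned major,           // version major, e.g. 8
                             unsigned minor,           // version minor, e.g. 0
                             const std::string& os) {  // e.g. "Linux"
    assert(!kind.empty());
    return kind + "_sm" + std::to_string(arch) +
           "_v" + std::to_string(major) + "." + std::to_string(minor) +
           "_" + os + ".bin";
}
```

Called with `("ptx", 80, 8, 0, "Linux")` and `("cubin", 80, 1, 7, "Linux")`, this yields the two file names shown in the log.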
This section describes the underlying algorithm.
- Locate CUDA-specific sections by magic values or section names.
- Read the fatbin header and determine its version, header size, total size, build options, etc.
- Parse each fatbin section header.
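The first two steps can be sketched as a byte scan followed by a fixed-size header read. The magic value `0xBA55ED50` and the 16-byte field layout below are assumptions based on commonly reported fatbin layouts, not something this README states; the 16-byte size is at least consistent with the first section in the sample log starting at offset 16. `find_fatbin`, `read_header`, and `FatbinHeader` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// ASSUMPTION: commonly reported fatbin container magic and layout.
constexpr std::uint32_t kFatbinMagic = 0xBA55ED50u;
constexpr std::size_t kNotFound = static_cast<std::size_t>(-1);

struct FatbinHeader {
    std::uint32_t magic;        // expected to equal kFatbinMagic
    std::uint16_t version;      // container version
    std::uint16_t header_size;  // size of this header in bytes
    std::uint64_t total_size;   // size of everything after the header
};
static_assert(sizeof(FatbinHeader) == 16, "unexpected struct padding");

// Returns the offset of the first magic at or after `from`, or kNotFound.
// Values are read in host byte order, so a little-endian host is assumed.
std::size_t find_fatbin(const std::vector<std::uint8_t>& data,
                        std::size_t from = 0) {
    for (std::size_t i = from; i + sizeof(FatbinHeader) <= data.size(); ++i) {
        std::uint32_t word;
        std::memcpy(&word, data.data() + i, sizeof(word));
        if (word == kFatbinMagic) return i;
    }
    return kNotFound;
}

// Copies the header out of the buffer; memcpy avoids unaligned access.
FatbinHeader read_header(const std::vector<std::uint8_t>& data,
                         std::size_t offset) {
    assert(offset + sizeof(FatbinHeader) <= data.size());
    FatbinHeader h;
    std::memcpy(&h, data.data() + offset, sizeof(h));
    return h;
}
```

A section-name lookup through an object-file reader is the more robust route when the input is a regular executable; the raw scan is mainly useful for standalone fatbin files.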
Zstandard (ZSTD) and LZ4 are used to compress CUBIN and PTX entries, depending on nvcc compression settings (e.g., size, speed, none).
- Determine the compression type from the fatbin section header: None, LZ4, or ZSTD.
- Decompress the underlying section (CUBIN or PTX).
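- The offset to the next section is computed using the following formula:
next_offset = current_offset + current_data_size.
In the sample output above, current_data_size corresponds to the section header plus its content: 16 + 72 + 1488 = 1576.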
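As a worked check of the offset formula, the sketch below plugs in the numbers from the sample log. It assumes the data size covers the section header plus its content, which is the reading that reproduces the logged offsets; `SectionInfo` and `next_section_offset` are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical minimal view of one parsed section header.
struct SectionInfo {
    std::uint64_t offset;       // where the section header starts ("offset" in the log)
    std::uint32_t header_size;  // "Header" in the log
    std::uint64_t total_size;   // "ContentSize: Total" in the log
};

// ASSUMPTION: a section occupies header_size + total_size bytes,
// so the next section header starts immediately after it.
std::uint64_t next_section_offset(const SectionInfo& s) {
    // A zero-length section would stall a loop that walks the sections.
    assert(s.header_size + s.total_size > 0);
    return s.offset + s.header_size + s.total_size;
}
```

For the PTX section in the log (offset 16, header 72, total 1488) this gives 1576, which is where the CUBIN section is reported.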
For PTX entries:
- arch: the GPU family the code targets (e.g. sm_80)
- codeVersion: the PTX language version (e.g. 8.0)
For CUBIN entries:
- arch: the GPU family the code runs on (e.g. sm_80)
- codeVersion: the cubin/SASS format version (e.g. 1.7)
- Tested on macOS and Windows 11.
- Tested with CUDA 11.4 through 13.0.
- Test data was collected in Debug and Release builds using different compiler flags and compression options (for both nvcc and CMake).
- Some fields, such as OS, compression method, and build type, are inferred by comparing multiple binaries and may still contain inaccuracies.
This library is licensed under the MIT License (see LICENSE).