fatlipp/CUdex

CUdex

Cross-platform CUDA PTX/CUBIN extractor and decompressor. It extracts and decompresses CUBIN and PTX sections from a single binary (or fatbin) or from a folder of binaries (and fatbins).

Build

Conan

The simplest way to build is with Conan 2:

  • conan install . --output-folder=build -s build_type=Release --build=missing
  • cmake -S . -B build/release -DCMAKE_TOOLCHAIN_FILE=build/.conan/conan_toolchain.cmake -DCMAKE_BUILD_TYPE=Release
  • cmake --build build/release --config Release

Manual

Dependencies:

  • spdlog/1.15.3 - logging
  • llvm-core/19.1.7 - read binary and find sections
  • zstd/1.5.6 - decompress ZSTD
  • lz4/1.10.0 - decompress LZ4

Build

  • cmake -S . -B build/release -DCMAKE_BUILD_TYPE=Release
  • cmake --build build/release --config Release

Targets

Two targets are built: the cudexlib library and a simple cudex-cli tool.

Usage

  • ./build/release/cli/cudex-cli ./PATH/TO/BINARY/OR/FOLDER ./output/folder/

Output

cudex-cli writes every PTX and CUBIN file it finds to the specified output folder.

Example trace

Fatbin Section at offset: 16. Kind: 1. Header: 72. ContentSize: (Total: 1488, Compressed: 1481, Decompressed: 3533), Version: 8.0, Arch: 80, ptxOptionsOffset: 64, OS: Linux, Compression Method: Lz4, BuildType: Release, Identifier:  (len: 0), Opts:  (len: 0)
Fatbin Section at offset: 1576. Kind: 2. Header: 64. ContentSize: (Total: 11816, Compressed: 0, Decompressed: 0), Version: 1.7, Arch: 80, ptxOptionsOffset: 0, OS: Linux, Compression Method: None, BuildType: Release, Identifier:  (len: 0), Opts:  (len: 0)

Arch: sm_80, Version: [8.0], Os: Linux, Compression: Lz4, Size: 3533
Save PTX
Write success: ./output/ptx_sm80_v8.0_Linux.bin

Arch: sm_80, Version: [1.7], Os: Linux, Compression: None, Size: 11816
Save CUBIN
Write success: ./output/cubin_sm80_v1.7_Linux.bin

DOCS

Algorithm (TBD)

This section describes the underlying algorithm.

Extraction

  • Locate CUDA-specific sections by magic values or section names.
  • Read the fatbin header and determine its version, header size, total size, build options, etc.
  • Parse each fatbin section header.
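The first two extraction steps can be sketched as a scan for the fatbin magic value followed by a fixed-size header read. This is a minimal illustration, not the CUdex implementation: the magic value `0xBA55ED50` and the 16-byte header layout (magic, version, header size, payload size) are assumptions taken from public reverse-engineering notes, and the names `FatbinHeader` and `find_fatbin_headers` are hypothetical.

```python
import struct
from typing import Iterator, NamedTuple

# Assumption: magic and layout come from public reverse-engineering notes,
# not from the CUdex sources.
FATBIN_MAGIC = 0xBA55ED50
HEADER_FORMAT = "<IHHQ"  # magic, version, header_size, payload_size
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 16 bytes, little-endian, no padding


class FatbinHeader(NamedTuple):
    offset: int        # position of the header inside the buffer
    version: int
    header_size: int
    payload_size: int  # number of bytes that follow the header


def find_fatbin_headers(data: bytes) -> Iterator[FatbinHeader]:
    """Yield a header record for every occurrence of the magic value."""
    needle = struct.pack("<I", FATBIN_MAGIC)
    pos = data.find(needle)
    while pos != -1:
        # Skip a match that sits too close to the end to hold a full header.
        if pos + HEADER_SIZE <= len(data):
            _, version, header_size, payload_size = struct.unpack_from(
                HEADER_FORMAT, data, pos
            )
            yield FatbinHeader(pos, version, header_size, payload_size)
        pos = data.find(needle, pos + 1)
```

A raw byte scan can produce false positives inside unrelated data, which is one reason to prefer looking sections up by name when the container format (ELF, PE, Mach-O) is known.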

Decompression

Methods

Zstandard (ZSTD) and LZ4 are used to compress CUBIN and PTX entries, depending on nvcc compression settings (e.g., size, speed, none).

Decompression steps

  • Determine the compression type from the fatbin section header: None, LZ4, or ZSTD.
  • Decompress the underlying section (CUBIN or PTX).
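  • Compute the offset of the next section as next_offset = current_offset + current_data_size. Judging by the example trace, current_data_size appears to cover the section header as well as its content: 16 + 72 + 1488 = 1576.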
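The steps above can be sketched as a dispatch on the compression method plus a running offset. The sketch below is an illustration under stated assumptions: the `Section` record and its field names are hypothetical, the method strings mirror the example trace, and the offset rule (header size plus total content size) is inferred from the trace values rather than taken from the CUdex sources. The actual LZ4 and ZSTD codecs are passed in as callables so the example stays free of third-party imports.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Section:
    kind: str         # "PTX" or "CUBIN"
    header_size: int  # "Header" in the trace
    total_size: int   # "ContentSize Total" in the trace
    method: str       # "None", "Lz4" or "Zstd"


# A codec receives the compressed payload and the expected output size.
Decompressor = Callable[[bytes, int], bytes]


def section_offsets(first_offset: int, sections: List[Section]) -> List[int]:
    """Return the start offset of each section, walking header + content."""
    offsets = []
    offset = first_offset
    for section in sections:
        offsets.append(offset)
        offset += section.header_size + section.total_size
    return offsets


def decompress(method: str, payload: bytes, decompressed_size: int,
               codecs: Dict[str, Decompressor]) -> bytes:
    """Return the payload unchanged for "None", else call the matching codec."""
    if method == "None":
        return payload
    try:
        codec = codecs[method]
    except KeyError:
        raise ValueError(f"unsupported compression method: {method}") from None
    return codec(payload, decompressed_size)
```

With the two sections from the example trace, `section_offsets(16, [Section("PTX", 72, 1488, "Lz4"), Section("CUBIN", 64, 11816, "None")])` yields the offsets 16 and 1576 reported there. Which framing the compressed payload uses is not stated in this document, so wiring in real LZ4 and ZSTD bindings is left to the caller.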

Version info

PTX

  • arch - which GPU family it targets (e.g. sm_80)
  • codeVersion - PTX language version (e.g. 8.0)

CUBIN

  • arch - which GPU family it runs on (e.g. sm_80)
  • codeVersion - cubin/SASS format version (e.g. 1.7)
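These two fields, together with the OS, also appear in the output file names shown in the example trace (ptx_sm80_v8.0_Linux.bin, cubin_sm80_v1.7_Linux.bin). A minimal sketch of that naming pattern follows; the function name `output_name` and its parameters are hypothetical, and the pattern is inferred from the two trace lines only.

```python
def output_name(kind: str, arch: int, code_version: str, os_name: str) -> str:
    """Build a file name in the style seen in the example trace."""
    return f"{kind.lower()}_sm{arch}_v{code_version}_{os_name}.bin"


# Reproduces the two names from the trace.
ptx_name = output_name("PTX", 80, "8.0", "Linux")
cubin_name = output_name("CUBIN", 80, "1.7", "Linux")
```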

Limitations and known issues

  • Tested on macOS and Windows 11.
  • Tested with CUDA 11.4 through 13.0.
  • Test data was collected in Debug and Release builds using different compiler flags and compression options (for both nvcc and CMake).
  • Some fields - such as OS, compression method, and build type - are inferred by comparing multiple binaries and may still contain inaccuracies.

License

This library is licensed under the MIT License (see LICENSE).
