Keycarver

Scans raw disk images for Bitcoin private keys by testing every 32-byte sequence against a pre-built index of known blockchain addresses.

Full writeup: dojo7.com/2025/01/08/keycarver

Building

cargo build --release

For GPU acceleration (requires CUDA toolkit and an NVIDIA GPU):

cargo build --release --features cuda

Workflow

There are three steps: build an address index from your Bitcoin node's block files, optionally query it as a sanity check, then scan drive images against it.

1. Build the address index

keycarver index-build --block-dir <path/to/blocks> --index-dir <path/to/index>

Scans all blk*.dat files in block-dir, extracts P2PKH and P2WPKH addresses, and builds a minimal perfect hash index for O(1) lookup. Takes a while on a full node; only needs to be done once. The --factor parameter (default 1.7) controls the MPHF construction trade-off between build time and index size.
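Both address types commit to a 20-byte public key hash that sits at a fixed position in the standard output scripts, so extracting it is template matching. Below is a minimal std-only sketch. `pkh_from_script` is a hypothetical helper, not Keycarver's code, and it leaves out parsing the blk*.dat framing and the transactions themselves.

```rust
/// Hypothetical helper: return the 20-byte public key hash if `script`
/// (a scriptPubKey) matches one of the two standard templates.
///
/// P2PKH:  OP_DUP OP_HASH160 <push 20> <pkh> OP_EQUALVERIFY OP_CHECKSIG
///         0x76   0xa9       0x14      ..20  0x88           0xac   (25 bytes)
/// P2WPKH: OP_0 <push 20> <pkh>
///         0x00 0x14      ..20                                     (22 bytes)
fn pkh_from_script(script: &[u8]) -> Option<[u8; 20]> {
    let body = match script {
        [0x76, 0xa9, 0x14, rest @ .., 0x88, 0xac] if rest.len() == 20 => rest,
        [0x00, 0x14, rest @ ..] if rest.len() == 20 => rest,
        _ => return None,
    };
    let mut pkh = [0u8; 20];
    pkh.copy_from_slice(body);
    Some(pkh)
}
```

A P2WPKH output hashes the same compressed public key as its P2PKH counterpart, so one PKH entry covers both address forms. This is why each result line reports both a p2pkh and a p2wpkh address for a single pkh.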

2. Query the index (optional sanity check)

keycarver index-query --address <address> --index-dir <path/to/index>

3. Scan a drive image

CPU:

keycarver scan-raw \
  --file <image.bin> \
  --checkpoint-file <image.bin.chk> \
  --index-dir <path/to/index> \
  --cache-size 16777216

GPU (requires --features cuda build):

keycarver scan-raw \
  --file <image.bin> \
  --checkpoint-file <image.bin.chk> \
  --index-dir <path/to/index> \
  --gpu \
  --gpu-chunk-size 4194304

Tests every byte offset in the file as a candidate 32-byte private key. Checks each valid key against the index. Saves progress to --checkpoint-file every second so interrupted scans can be resumed.
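The candidate-generation step needs only the standard library. Slide a 32-byte window over the buffer and keep each value k with 1 ≤ k < n, where n is the secp256k1 group order. The names below are illustrative and are not taken from Keycarver's source. EC multiplication and the index lookup are omitted.

```rust
/// secp256k1 group order n, big-endian.
const SECP256K1_N: [u8; 32] = [
    0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
    0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFE,
    0xBA, 0xAE, 0xDC, 0xE6, 0xAF, 0x48, 0xA0, 0x3B,
    0xBF, 0xD2, 0x5E, 0x8C, 0xD0, 0x36, 0x41, 0x41,
];

/// A 32-byte big-endian scalar k is a usable private key iff 1 <= k < n.
/// Rust compares arrays lexicographically, which equals numeric order for big-endian bytes.
fn is_valid_key(k: &[u8; 32]) -> bool {
    *k != [0u8; 32] && *k < SECP256K1_N
}

/// Every byte offset whose 32-byte window is a valid key, paired with that key.
/// An input of L bytes yields at most L - 31 candidates.
fn candidate_keys<'a>(data: &'a [u8]) -> impl Iterator<Item = (usize, [u8; 32])> + 'a {
    data.windows(32).enumerate().filter_map(|(offset, w)| {
        let mut key = [0u8; 32];
        key.copy_from_slice(w);
        if is_valid_key(&key) { Some((offset, key)) } else { None }
    })
}
```

Zero-filled regions produce all-zero windows, which fail the range check, so they never reach the EC step.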

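CPU options: --cache-size sets the number of entries in the deduplication cache. Each entry holds one 32-byte window and costs roughly 64 bytes of RAM including overhead, so the default of 16M entries (16,777,216, the value in the example above) uses about 1GB.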

GPU options: --gpu-chunk-size sets the batch size in bytes (default 1MB; 4–16MB recommended). Checkpoint files are compatible between CPU and GPU runs — you can switch modes and resume.
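To resume a scan, the tool only has to persist the next offset it will scan. The sketch below shows that save/resume pattern under one assumption: the checkpoint holds nothing but that offset, written as decimal text. The README does not describe the real format, which may also record hits. `Checkpoint` and its methods are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;
use std::time::{Duration, Instant};

/// Hypothetical checkpoint holding only the next byte offset to scan.
struct Checkpoint {
    path: PathBuf,
    last_save: Option<Instant>,
}

impl Checkpoint {
    fn new(path: impl Into<PathBuf>) -> Self {
        Checkpoint { path: path.into(), last_save: None }
    }

    /// Offset to resume from; 0 if no checkpoint exists yet.
    fn load(&self) -> io::Result<u64> {
        match fs::read_to_string(&self.path) {
            Ok(s) => s
                .trim()
                .parse()
                .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e)),
            Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(0),
            Err(e) => Err(e),
        }
    }

    /// Persist `offset` at most once per second; returns whether it wrote.
    /// Writing a temp file and renaming it over the checkpoint means an
    /// interrupted write does not leave a truncated file (rename replaces atomically on POSIX).
    fn maybe_save(&mut self, offset: u64) -> io::Result<bool> {
        if let Some(t) = self.last_save {
            if t.elapsed() < Duration::from_secs(1) {
                return Ok(false);
            }
        }
        let tmp = self.path.with_extension("tmp");
        fs::write(&tmp, offset.to_string())?;
        fs::rename(&tmp, &self.path)?;
        self.last_save = Some(Instant::now());
        Ok(true)
    }
}
```

A byte offset means the same thing to both backends, so a checkpoint of this shape does not depend on which mode wrote it.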

Output lines look like:

priv: <hex>, pkh: <hex>, p2pkh: <1addr>, p2wpkh: <bc1addr>, offset: <byte offset>
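To post-process results outside balance_check.py, you can parse the line format shown above directly. This is a minimal sketch. The `Hit` struct and `parse_hit` are hypothetical names, and the parser assumes fields are separated by `, ` exactly as printed.

```rust
use std::collections::HashMap;

/// One scanner hit; field names mirror the printed labels.
#[derive(Debug, PartialEq)]
struct Hit {
    priv_hex: String,
    pkh_hex: String,
    p2pkh: String,
    p2wpkh: String,
    offset: u64,
}

/// Parse "priv: <hex>, pkh: <hex>, p2pkh: <1addr>, p2wpkh: <bc1addr>, offset: <byte offset>".
/// Returns None if any field is missing or the offset is not a number.
fn parse_hit(line: &str) -> Option<Hit> {
    let mut fields: HashMap<&str, &str> = HashMap::new();
    for part in line.trim().split(", ") {
        let (key, value) = part.split_once(": ")?;
        fields.insert(key, value);
    }
    let get = |k: &str| fields.get(k).map(|v| v.to_string());
    Some(Hit {
        priv_hex: get("priv")?,
        pkh_hex: get("pkh")?,
        p2pkh: get("p2pkh")?,
        p2wpkh: get("p2wpkh")?,
        offset: fields.get("offset")?.parse().ok()?,
    })
}
```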

Checking recovered keys

Once you have results, balance_check.py checks each recovered key's addresses against the blockchain:

uv sync
uv run balance_check.py checkpoints/ --output results.csv

Checks both P2PKH and P2WPKH address forms for each key, deduplicates keys appearing across multiple checkpoint files, and writes a full CSV. Hits are printed immediately as they're found.

How it works

The scanner slides a 32-byte window across the image, advancing one byte at a time. Each window is:

Validated against the secp256k1 curve order
Converted to a compressed public key via scalar multiplication
Hashed: RIPEMD160(SHA256(pubkey)) to get the public key hash (PKH)
Looked up in the index
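For reference, the per-window steps above can be written out as follows. Here $n$ is the secp256k1 group order, $G$ its generator, $b_i$ the bits of the candidate $k$, and $x_P, y_P$ the coordinates of $P$. The second line is the standard binary decomposition behind fixed-base scalar multiplication with a table of doublings $2^iG$. It is background only and does not claim to describe either code path exactly.

```latex
\begin{aligned}
&\text{valid key:}  && 1 \le k < n \\
&\text{public key:} && P = kG = \sum_{i=0}^{255} b_i\,(2^i G), \qquad k = \sum_{i=0}^{255} b_i\, 2^i \\
&\text{encoding:}   && \mathrm{ser}_c(P) = \big(\texttt{0x02} \text{ if } y_P \text{ even, else } \texttt{0x03}\big) \,\|\, x_P \\
&\text{hash:}       && \mathrm{PKH} = \mathrm{RIPEMD160}\big(\mathrm{SHA256}(\mathrm{ser}_c(P))\big)
\end{aligned}
```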

The index is built using boomphf, a minimal perfect hash function (MPHF) over all known PKHs. At query time, the MPHF maps a PKH to a position in a memory-mapped flat file that stores the actual PKH bytes for that position. An MPHF can return a valid-looking position even for a PKH that was never in the set, so a match requires the stored value to equal the query. This rules out false positives.
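The lookup-then-verify scheme can be illustrated with a toy minimal perfect hash from the same family as boomphf. That family is BBHash: a cascade of bit arrays, each about gamma times the number of keys still unplaced, with gamma playing the role of `--factor`. This is a std-only sketch, not boomphf's API or Keycarver's code. `ToyMphf` and `Index` are hypothetical names, and a `Vec` stands in for the memory-mapped PKH file.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

type Pkh = [u8; 20];

/// Level-seeded hash. DefaultHasher::new() uses fixed keys, so results are
/// deterministic within one build (fine for a sketch, not for an on-disk format).
fn level_hash(key: &Pkh, level: u64) -> u64 {
    let mut h = DefaultHasher::new();
    level.hash(&mut h);
    key.hash(&mut h);
    h.finish()
}

/// Toy BBHash-style MPHF. A key that lands alone in a slot is placed at that
/// level; keys that collide retry at the next level.
struct ToyMphf {
    levels: Vec<Vec<bool>>,
    offsets: Vec<usize>, // keys placed in all earlier levels
}

impl ToyMphf {
    /// `keys` must be distinct; duplicates would collide at every level.
    fn new(gamma: f64, keys: &[Pkh]) -> Self {
        let mut levels: Vec<Vec<bool>> = Vec::new();
        let mut offsets: Vec<usize> = Vec::new();
        let mut remaining = keys.to_vec();
        let mut placed = 0;
        while !remaining.is_empty() {
            let level = levels.len() as u64;
            let size = ((remaining.len() as f64 * gamma).ceil() as usize).max(1);
            let mut hits = vec![0u8; size];
            for k in &remaining {
                let i = (level_hash(k, level) % size as u64) as usize;
                hits[i] = hits[i].saturating_add(1);
            }
            let bits: Vec<bool> = hits.iter().map(|&c| c == 1).collect();
            remaining.retain(|k| !bits[(level_hash(k, level) % size as u64) as usize]);
            offsets.push(placed);
            placed += bits.iter().filter(|&&b| b).count();
            levels.push(bits);
        }
        ToyMphf { levels, offsets }
    }

    /// Slot in 0..n for a member; for a non-member, an arbitrary slot or None.
    fn slot(&self, key: &Pkh) -> Option<usize> {
        for (level, bits) in self.levels.iter().enumerate() {
            let i = (level_hash(key, level as u64) % bits.len() as u64) as usize;
            if bits[i] {
                // Rank = set bits before i. Real implementations keep a rank index instead of scanning.
                let rank = bits[..i].iter().filter(|&&b| b).count();
                return Some(self.offsets[level] + rank);
            }
        }
        None
    }
}

/// MPHF plus a flat table of PKHs in slot order.
struct Index {
    mphf: ToyMphf,
    table: Vec<Pkh>,
}

impl Index {
    fn build(gamma: f64, keys: &[Pkh]) -> Self {
        let mphf = ToyMphf::new(gamma, keys);
        let mut table = vec![[0u8; 20]; keys.len()];
        for k in keys {
            table[mphf.slot(k).expect("members always get a slot")] = *k;
        }
        Index { mphf, table }
    }

    /// The MPHF cannot tell members from non-members, so compare the stored bytes.
    fn contains(&self, key: &Pkh) -> bool {
        self.mphf.slot(key).map_or(false, |s| self.table[s] == *key)
    }
}
```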

The CPU path filters repeated byte sequences with a deduplication cache before the EC multiplication step.
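The deduplication idea can be sketched as a bounded set of recently seen windows. The FIFO eviction policy, the `DedupCache` name, and the return convention are all assumptions made for illustration. The README does not say how Keycarver's cache evicts entries.

```rust
use std::collections::{HashSet, VecDeque};

/// Hypothetical bounded dedup cache: remembers up to `capacity` recent
/// 32-byte windows and evicts the oldest first (FIFO is an assumption).
struct DedupCache {
    capacity: usize,
    seen: HashSet<[u8; 32]>,
    order: VecDeque<[u8; 32]>,
}

impl DedupCache {
    fn new(capacity: usize) -> Self {
        DedupCache {
            capacity,
            seen: HashSet::with_capacity(capacity),
            order: VecDeque::with_capacity(capacity),
        }
    }

    /// True if `window` is new (do the EC multiply); false if it was seen
    /// recently and can be skipped.
    fn insert(&mut self, window: [u8; 32]) -> bool {
        if !self.seen.insert(window) {
            return false;
        }
        self.order.push_back(window);
        if self.order.len() > self.capacity {
            if let Some(oldest) = self.order.pop_front() {
                self.seen.remove(&oldest);
            }
        }
        true
    }
}
```

Repeated non-zero content, such as duplicated sectors or fill patterns, then costs one EC multiplication per distinct window while that window stays in the cache.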

The GPU path runs the full SK→PKH pipeline (secp256k1 scalar multiply → SHA256 → RIPEMD160) in CUDA, one thread per byte offset. A precomputed table of 256 points (G, 2G, 4G, …, 2²⁵⁵·G) is generated in Rust and uploaded to the GPU once at startup. A double-buffer pipeline overlaps GPU computation with CPU-side index lookups (parallelised with rayon): the GPU computes the next batch while the CPU processes the previous one.
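The double-buffer pattern can be sketched with standard-library threads and channels. Two buffers circulate between a producer and a consumer, so one buffer is being filled while the other is being read. The producer stands in for the GPU kernel plus the D→H copy; the consumer stands in for the CPU-side lookups. This is an illustration only: it uses no CUDA, and `run_pipeline` and its closures are hypothetical.

```rust
use std::sync::mpsc::channel;
use std::thread;

/// Hypothetical double-buffer pipeline over stand-in workloads.
/// `produce` fills a buffer for batch N; `consume` processes a filled buffer.
fn run_pipeline<P, C>(batches: u64, batch_len: usize, produce: P, mut consume: C) -> u64
where
    P: Fn(u64, &mut Vec<u64>) + Send + 'static,
    C: FnMut(&[u64]) -> u64,
{
    // "full" carries filled buffers to the consumer; "free" hands empty ones back.
    let (full_tx, full_rx) = channel::<Vec<u64>>();
    let (free_tx, free_rx) = channel::<Vec<u64>>();
    // Exactly two buffers exist, so the producer can run at most one batch ahead.
    for _ in 0..2 {
        free_tx.send(Vec::with_capacity(batch_len)).unwrap();
    }
    let producer = thread::spawn(move || {
        for batch in 0..batches {
            // Blocks while both buffers are in use.
            let mut buf = match free_rx.recv() {
                Ok(b) => b,
                Err(_) => return,
            };
            buf.clear();
            produce(batch, &mut buf); // stand-in for kernel launch + D->H copy
            if full_tx.send(buf).is_err() {
                return;
            }
        }
        // full_tx is dropped here, which ends the consumer loop below.
    });
    let mut total = 0;
    for buf in full_rx {
        total += consume(&buf[..]); // stand-in for the (rayon-parallel) index lookups
        let _ = free_tx.send(buf); // return the buffer; ignore if the producer has exited
    }
    producer.join().expect("producer thread panicked");
    total
}
```

With only two buffers in flight, a slow consumer throttles the producer instead of letting results pile up in memory.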

Performance

Mode	Rate	Notes
CPU	~330k keys/sec	5975WX, 64 threads
GPU	~166M keys/sec	RTX 3090, PCIe 4.0 ×16

GPU throughput is limited by the CPU-side MPHF index lookup, not the CUDA kernel: the GPU finishes each batch well before the CPU consumes the results. The kernel itself has low SM occupancy, since ~192 registers/thread leaves room for only about 8 to 10 of the 48 resident warps an RTX 3090 SM supports. Even so, the bottleneck is the random-access memory latency of the MPHF lookups across 64 rayon threads.
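The occupancy figure can be checked on the back of an envelope. The sketch below assumes the published limits for compute capability 8.6: 65,536 32-bit registers and at most 48 resident warps per SM, with registers allocated per warp in units of 256. The block sizes passed in are assumptions, because the README does not give the kernel's launch configuration. Shared-memory and blocks-per-SM limits are ignored.

```rust
// Assumed compute capability 8.6 limits (RTX 3090).
const REGS_PER_SM: u32 = 65_536;
const MAX_WARPS_PER_SM: u32 = 48;
const REG_ALLOC_UNIT: u32 = 256; // registers granted per warp, rounded up to this

/// Register-limited residency for one SM: (resident blocks, resident warps).
fn register_limited_occupancy(regs_per_thread: u32, threads_per_block: u32) -> (u32, u32) {
    let warps_per_block = (threads_per_block + 31) / 32;
    let regs_per_warp =
        (regs_per_thread * 32 + REG_ALLOC_UNIT - 1) / REG_ALLOC_UNIT * REG_ALLOC_UNIT;
    let blocks_by_regs = REGS_PER_SM / (regs_per_warp * warps_per_block);
    let blocks_by_warps = MAX_WARPS_PER_SM / warps_per_block;
    let blocks = blocks_by_regs.min(blocks_by_warps);
    (blocks, blocks * warps_per_block)
}
```

At 192 registers per thread, a 256-thread block fits once per SM and a 128-thread block fits twice. Either way only 8 of 48 warp slots (about 17%) are occupied.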

The D→H transfer uses regular pinned host memory (cuMemHostAlloc with flags=0). Write-combining pinned memory (CU_MEMHOSTALLOC_WRITECOMBINED) can speed up PCIe transfers, but host reads from it are uncached. Because the CPU reads every result back for the index lookups, using it reduced throughput ~300×.

Index startup: ~1 second for a full-blockchain index (~17GB index, ~370MB MPHF)
Memory: MPHF loaded into RAM (~370MB), index file memory-mapped

Limitations

Only finds keys stored as a contiguous 32-byte big-endian sequence. Keys in wallet file formats (Bitcoin Core, Electrum, etc.) won't be found this way; use btcrecover instead.
No support for HD wallet derivation (BIP-32). Experimental support is on a feature branch.
GPU build requires CUDA 12.x toolkit and a compute capability 8.6+ GPU. Update the cuda-12090 feature in Cargo.toml and -arch=sm_86 in build.rs to match a different CUDA version or GPU architecture.
No support from this maintainer.

Contributions

Fork and enjoy.
If this helps you uncover a massive treasure trove, I'll happily accept a few LBMA Good Delivery gold bars by way of thanks.
