Unmanaged-Bytes/bit-crafts

bit-crafts

A C11 monorepo of small system libraries and CLI tools for fast file inspection on Linux. Built around io_uring for I/O, hardware-accelerated hashing (SHA-NI / AVX2), and a lock-free worker pool.

A GTK4 + libadwaita desktop frontend, BitCrafts Vigil, ships the file-integrity workflow to non-CLI users — packaged for Snap and Flatpak.


About this project

I had not written production C in nearly 20 years. The implementation in this repository was produced almost entirely with Claude Code; I owned the architecture, the API shape, and the design decisions, but the line-by-line code is AI-generated.

This started as personal R&D — a way to revisit modern Linux systems programming (io_uring, lock-free workers, hardware-accelerated hashing) with an AI pair-programmer. It runs on my own machine every day, but it has not been audited for production use and it almost certainly contains bugs, rough edges, or choices a seasoned C engineer would push back on.

Released under GPL-3.0-or-later (tools) and LGPL-3.0-or-later (libraries) — feedback, issues, and PRs are welcome.


Tools

bchash — fast file hashing

Recursive directory hashing compatible with sha256sum output format. Supports SHA-1/256/512, MD5, BLAKE2b, xxh64, xxh3-128.

bchash hash --type=sha256 /path/to/dir > manifest.sha256
bchash check manifest.sha256                 # re-verify the tree
bchash diff old.sha256 new.sha256            # show what changed
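
For readers who want to post-process manifests themselves, here is a minimal Python sketch of comparing two sha256sum-style files. It assumes the usual coreutils line layout (hex digest, one space, a mode character that is either a space or `*`, then the path) and ignores escaped filenames. The function names `parse_manifest` and `diff_manifests` are hypothetical, and this is not a description of how `bchash diff` is implemented.

```python
from typing import Dict, List


def parse_manifest(text: str) -> Dict[str, str]:
    """Map path -> lowercase hex digest from sha256sum-style lines."""
    entries: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        digest, sep, rest = line.partition(" ")
        # After the digest and one space comes a mode character
        # (' ' for text, '*' for binary), then the path itself.
        if not sep or len(rest) < 2 or rest[0] not in " *":
            raise ValueError(f"malformed manifest line: {line!r}")
        entries[rest[1:]] = digest.lower()
    return entries


def diff_manifests(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, List[str]]:
    """Classify paths as added, removed, or changed between two manifests."""
    return {
        "added": sorted(p for p in new if p not in old),
        "removed": sorted(p for p in old if p not in new),
        "changed": sorted(p for p in new if p in old and old[p] != new[p]),
    }
```

Keying on the path keeps the comparison linear in the number of entries; a rename shows up as one removal plus one addition.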

bcduplicate — duplicate file finder

Two-stage funnel (size group → full hash) with parallel walk. Outputs jdupes-compatible groups, JSON, or actionable scripts (delete / hardlink / shell).

bcduplicate scan /path/to/dir                # JSON report
bcduplicate prune --action=hardlink /path/to/dir   # collapse via hardlinks
bcduplicate prune --action=script  /path/to/dir > prune.sh
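
The two-stage funnel can be illustrated with a short, single-threaded Python sketch: files are first grouped by size, and only groups with more than one member are hashed in full. SHA-256 from `hashlib` is a stand-in digest, `find_duplicates` is a hypothetical name, and the parallel walk that bcduplicate performs is not reproduced here.

```python
import hashlib
import os
from collections import defaultdict
from typing import Dict, List


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large files do not fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root: str) -> List[List[str]]:
    """Return sorted groups of paths whose contents hash identically."""
    # Stage 1: group regular files by size (cheap, metadata only).
    by_size: Dict[int, List[str]] = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            by_size[os.path.getsize(path)].append(path)

    # Stage 2: hash only the files that share a size with another file.
    groups: List[List[str]] = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        by_digest: Dict[str, List[str]] = defaultdict(list)
        for path in paths:
            by_digest[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_digest.values() if len(g) > 1)
    return sorted(groups)
```

The size pass is what makes the funnel cheap: a file with a unique size can never have a duplicate, so its contents are never read.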

bcintegrity — directory tree manifests

JSONL manifests (schema_version: 1) capturing path, kind, digest, size, mode, ownership, mtime, inode, link count. Roundtrips via verify; diffs two manifests for change detection.

bcintegrity manifest --output=tree.jsonl /path/to/dir
bcintegrity verify   /path/to/dir tree.jsonl
bcintegrity diff     old.jsonl new.jsonl
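
As a rough picture of what one manifest line carries, the Python sketch below builds a record from `os.lstat` and serializes it as a single JSON line. The key names (`kind`, `uid`, `gid`, `nlink`, and so on), the `kind` values, the placement of `schema_version` in every record, and the use of SHA-256 are all assumptions made for illustration; the actual bcintegrity schema may differ.

```python
import hashlib
import json
import os
import stat
from typing import Any, Dict


def manifest_record(path: str) -> Dict[str, Any]:
    """Collect metadata for one path without following symlinks."""
    st = os.lstat(path)
    if stat.S_ISREG(st.st_mode):
        kind = "file"
    elif stat.S_ISDIR(st.st_mode):
        kind = "dir"
    elif stat.S_ISLNK(st.st_mode):
        kind = "symlink"
    else:
        kind = "other"

    digest = None
    if kind == "file":
        # Reads the whole file at once; fine for a sketch, not for huge files.
        with open(path, "rb") as f:
            digest = hashlib.sha256(f.read()).hexdigest()

    return {
        "schema_version": 1,  # assumed to appear per record
        "path": path,
        "kind": kind,
        "digest": digest,
        "size": st.st_size,
        "mode": stat.S_IMODE(st.st_mode),
        "uid": st.st_uid,
        "gid": st.st_gid,
        "mtime_ns": st.st_mtime_ns,
        "inode": st.st_ino,
        "nlink": st.st_nlink,
    }


def to_jsonl_line(record: Dict[str, Any]) -> str:
    """Serialize one record as a single compact JSON line."""
    return json.dumps(record, sort_keys=True, separators=(",", ":"))
```

One record per line means a verifier can stream the manifest and compare each entry against a fresh `lstat` without loading the whole tree into memory.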

All three tools accept --threads=mono|compute|io and --memory-budget=N, emit a JSON self-description via --describe (for completion / introspection), and follow the standard exit codes documented in --help.


Desktop frontend

bitcrafts-vigil — GTK4 + libadwaita GUI

A native GNOME desktop wrapper around bchash. Snapshot a folder, verify it later, browse per-folder history, compare any two snapshots side-by-side, export findings as a self-contained ZIP. Hash-only for v0.1.0 (algorithms: blake3 default, xxh3, sha256, plus the full bc-hash family in an Advanced expander). EN + FR locales shipped.

Vigil never links against the C libraries — it spawns bchash via Gio.Subprocess with explicit argv and parses the JSONL output. That isolation lets the GUI survive a CLI crash, and the same backend is reusable by any other frontend.
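
The process-isolation pattern can be sketched with Python's standard `subprocess` module, used here as a stand-in for Gio.Subprocess. The helper name `run_jsonl` is hypothetical, and the caller supplies the full argv, so no bchash flags are assumed. A child that crashes or exits non-zero surfaces as an ordinary exception in the parent instead of taking the parent down.

```python
import json
import subprocess
from typing import Any, Dict, List, Sequence


def run_jsonl(argv: Sequence[str], timeout: float = 60.0) -> List[Dict[str, Any]]:
    """Run a child process with an explicit argv and parse its JSONL stdout."""
    proc = subprocess.run(
        list(argv),           # explicit argv, never a shell string
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,
    )
    if proc.returncode != 0:
        # On POSIX a negative return code means the child died from a signal.
        raise RuntimeError(
            f"{argv[0]} exited with {proc.returncode}: {proc.stderr.strip()}"
        )
    return [json.loads(line) for line in proc.stdout.splitlines() if line.strip()]
```

Passing argv as a list avoids shell interpretation of paths that contain spaces or metacharacters, which matters for a tool that is pointed at arbitrary user folders.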

See applications/vigil/README.md for screenshots and architecture. Snap Store and Flathub listings will follow the v0.1.0 tag; until then, build from source per the CONTRIBUTING guide.


Build from source

Requires Debian/Ubuntu 24.04+ (CI baseline). Install dependencies:

scripts/install-deps.sh build      # build / test / sanitize
scripts/install-deps.sh bench      # adds comparator binaries (jdupes, fclones, ...)
scripts/install-deps.sh perf       # adds perf, sysstat, hyperfine
scripts/install-deps.sh check      # report what is present / missing

liburing >= 2.6 is required. Ubuntu 24.04 ships 2.5, which lacks io_uring_prep_read_multishot, so CI builds liburing 2.7 from source. If your distro ships a version older than 2.6, build liburing from https://github.com/axboe/liburing before running scripts/bx build.

Build, test, install:

scripts/bx build release
scripts/bx test  debug             # runs the full test suite (~80 cases)
sudo meson install -C build/release

Variants exposed by scripts/bx: debug, release, coverage, asan, tsan, ubsan, bench. The matrix command iterates over a default set (tsan asan debug release).


Repository layout

.
├── meson.build              # top-level project (license : 'GPL-3.0-or-later')
├── meson_options.txt        # tests, benchmarks, fuzzing, arch, io_uring, blake3
├── subprojects/             # bc-* libraries (each ships its own LICENSE)
│   ├── bc-core              # CPU primitives (hash, SIMD memory, math)
│   ├── bc-allocators        # pool / arena / slab / context allocators
│   ├── bc-containers        # vector / map / set / ring / tree / bitset
│   ├── bc-concurrency       # threads, dispatch, lock-free queue, slots
│   ├── bc-io                # streams, filesystem helpers, mmap, io_uring
│   └── bc-runtime           # lifecycle, logging, config, metrics, CLI
├── tools/
│   ├── bc-hash              # bchash binary
│   ├── bc-duplicate         # bcduplicate binary
│   └── bc-integrity         # bcintegrity binary
├── applications/
│   └── vigil/               # GTK4 + libadwaita desktop frontend (Python)
│                            #   Snap + Flatpak packaging in packaging/
└── scripts/
    ├── bx                   # build / quality / profile front-end
    ├── bench.sh             # mono | compute | io | profile | correctness | perf-mode | datasets
    ├── install-deps.sh      # apt packages by mode (build|bench|perf|all)
    └── install-hooks.sh     # pre-commit (clang-format)

Benchmarks

Published numbers, methodology, datasets and reproduction recipe live in benchmarks.md. Raw JSON logs are committed under benchmarks/<YYYY-MM-DD>/. Headline (2026-05-16, AMD Ryzen 7 5700G, Linux 6.12.86 kernel-source corpus): bchash SHA-256 is 2.88× faster than sha256sum -P16 (uutils 0.8.0), bcduplicate is 9.06× faster than jdupes, bcintegrity is 18.22× faster than mtree.


License

The CLI tools (bcduplicate, bchash, bcintegrity) are licensed under the GNU General Public License v3.0 or later — see LICENSE.

The reusable libraries under subprojects/bc-*/ are licensed under the GNU Lesser General Public License v3.0 or later. Each directory ships its own LICENSE so libraries and tools can be redistributed independently.

This combination keeps the libraries usable from proprietary code (via dynamic linking, with modifications to the libraries themselves staying open) while ensuring the end-user tools remain copyleft. Contributors must sign off every commit under the Developer Certificate of Origin so the project's license chain stays clean.