
build: enable hardware-accelerated BLAKE3 in cas_core #2

Merged
Zhihaoi merged 5 commits into main from worktree-blake3-hwaccel on May 23, 2026

Zhihaoi commented on May 23, 2026

Summary

  • Vendors BLAKE3 1.8.4 SIMD .c files (SSE2 / SSE4.1 / AVX2 / AVX512 / NEON) into include/blake3/ and wires them into cas_core with per-file -msse* / -mavx* (and MSVC /arch: equivalents) scoped via set_source_files_properties — the rest of cas_core keeps its baseline ISA. blake3_dispatch.c now picks the best path at runtime via CPUID / getauxval.
  • Adds AGENTVFS_BLAKE3_SIMD CMake option (default ON). When OFF, the BLAKE3_NO_AVX512 / BLAKE3_NO_AVX2 / BLAKE3_NO_SSE41 / BLAKE3_NO_SSE2 / BLAKE3_USE_NEON=0 definitions are restored and the test target is omitted, which gives a clean rollback.
  • New cas_test_blake3_simd verifies (a) BLAKE3 of the empty input equals the canonical spec value and (b) the per-arch SIMD entry point (blake3_hash_many_avx2 on x86_64, blake3_hash_many_neon on aarch64 / ARM64EC) is linked into cas_core. Wired into the linux / macos / windows CI jobs.
  • README: notes the MSVC v141 (VS 2017 15.3) floor for /arch:AVX512 codegen in the Windows build-from-source block.

Plan: docs/superpowers/plans/2026-05-23-blake3-hwaccel.md
Spec: docs/superpowers/specs/2026-05-23-blake3-hwaccel-design.md
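
For readers unfamiliar with runtime dispatch, the idea in the first summary bullet can be sketched as follows. This is a minimal illustration, not the code in blake3_dispatch.c: `best_simd_tier` and `is_known_tier` are hypothetical names, the x86 branch relies on the GCC/Clang builtin `__builtin_cpu_supports` instead of hand-written CPUID calls, and the AArch64 branch simply assumes NEON is always present rather than probing for it.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of BLAKE3 or cas_core): reports which SIMD
// tier a runtime dispatcher could pick on the machine running this code.
std::string best_simd_tier() {
#if defined(__aarch64__) || defined(_M_ARM64) || defined(_M_ARM64EC)
    // Assumption for this sketch: NEON is treated as always available on
    // AArch64, so no runtime probe is performed.
    return "neon";
#elif (defined(__x86_64__) || defined(__i386__)) && \
      (defined(__GNUC__) || defined(__clang__))
    __builtin_cpu_init();  // populate the compiler's CPU feature cache
    // Check the widest vectors first and fall through to narrower ones.
    if (__builtin_cpu_supports("avx512f") && __builtin_cpu_supports("avx512vl"))
        return "avx512";
    if (__builtin_cpu_supports("avx2"))   return "avx2";
    if (__builtin_cpu_supports("sse4.1")) return "sse41";
    if (__builtin_cpu_supports("sse2"))   return "sse2";
    return "portable";
#else
    // Other compilers or architectures: this sketch does not probe at all.
    return "portable";
#endif
}

// Returns true if `tier` is one of the names best_simd_tier() can produce.
bool is_known_tier(const std::string& tier) {
    static const char* const kTiers[] = {"portable", "sse2",   "sse41",
                                         "avx2",     "avx512", "neon"};
    for (const char* t : kTiers) {
        if (tier == t) return true;
    }
    return false;
}
```

Because only the SIMD translation units are compiled with wider ISA flags, a check of this kind has to run before any of their functions is called; the rest of the library stays safe to execute on baseline hardware.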

Test plan

  • Linux x86_64: `AGENTVFS_BLAKE3_SIMD=ON` — full unit-test suite (9 binaries) passes, no warnings on the vendored SIMD `.c` files under `-Wall -Wextra -Wpedantic`
  • Linux x86_64: `AGENTVFS_BLAKE3_SIMD=OFF` rollback — builds clean, `cas_test_blake3_simd` correctly omitted
  • macOS arm64 CI leg (exercises the NEON path and the `_M_ARM64EC` carve-out by analogy)
  • Windows x86_64 MSVC CI leg (exercises `/arch:SSE2`, `/arch:AVX`, `/arch:AVX2`, and `/arch:AVX512`)
  • `windows-daemon` job intentionally untouched (SIMD test belongs to `windows:` only)
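
The known-answer half of `cas_test_blake3_simd` could look roughly like the sketch below. The PR does not show the test source, so this is an assumption-laden illustration: `to_hex` and `matches_empty_kat` are hypothetical helpers, and the hashing itself is left out to keep the block self-contained. In a real test the 32-byte digest would presumably come from the BLAKE3 C API (`blake3_hasher_init`, `blake3_hasher_finalize`). The hex constant is the published BLAKE3 hash of the empty input.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <string>

// Default-length (32-byte) BLAKE3 hash of the empty input, from the
// official BLAKE3 test vectors.
constexpr char kEmptyInputHex[] =
    "af1349b9f5f9a1a6a0404dea36dcc9499bcb25c9adc112b7cc9a93cae41f3262";

// Hypothetical helper: lowercase hex encoding of a 32-byte digest.
std::string to_hex(const std::array<std::uint8_t, 32>& digest) {
    static const char kDigits[] = "0123456789abcdef";
    std::string out;
    out.reserve(64);
    for (std::uint8_t byte : digest) {
        out.push_back(kDigits[byte >> 4]);    // high nibble
        out.push_back(kDigits[byte & 0x0F]);  // low nibble
    }
    return out;
}

// Hypothetical helper: true if `digest` equals the empty-input vector.
bool matches_empty_kat(const std::array<std::uint8_t, 32>& digest) {
    return to_hex(digest) == kEmptyInputHex;
}
```

Comparing hex strings rather than raw bytes keeps a failing assertion readable in CI logs, at the cost of one small allocation per check.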

🤖 Generated with Claude Code

Zhihaoi and others added 5 commits May 23, 2026 08:23
These five files are copied verbatim from the upstream BLAKE3 1.8.4
distribution.  They are not yet compiled or wired into CMake — that
will happen in a subsequent commit.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds AGENTVFS_BLAKE3_SIMD CMake option (default ON), wires the
vendored SIMD C sources into cas_core with per-file -msse/-mavx
flags scoped to those translation units only. Drops the
BLAKE3_NO_AVX512 / BLAKE3_NO_AVX2 / BLAKE3_NO_SSE41 / BLAKE3_NO_SSE2
defines and the forced BLAKE3_USE_NEON=0. Adds cas_test_blake3_simd
which verifies (a) BLAKE3 of the empty input matches the canonical
spec vector and (b) the per-arch SIMD entry point is linked into
cas_core. Gates the test target on AGENTVFS_BLAKE3_SIMD=ON so the
rollback build is clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
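
The effect of the `BLAKE3_NO_*` and `BLAKE3_USE_NEON` definitions mentioned in the commit above can be pictured with a small preprocessor sketch. `compiled_in_paths` is a hypothetical function written for illustration; the real gating lives in the upstream BLAKE3 sources, which this only loosely mirrors.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical illustration: lists the code paths that would be compiled in
// for the current target, given the BLAKE3_* macros visible at build time.
std::vector<std::string> compiled_in_paths() {
    std::vector<std::string> paths{"portable"};  // always available
#if (defined(__x86_64__) || defined(_M_X64)) && !defined(_M_ARM64EC)
    // Each x86 path is compiled unless its BLAKE3_NO_* macro is defined,
    // which is what AGENTVFS_BLAKE3_SIMD=OFF is described as restoring.
#if !defined(BLAKE3_NO_SSE2)
    paths.push_back("sse2");
#endif
#if !defined(BLAKE3_NO_SSE41)
    paths.push_back("sse41");
#endif
#if !defined(BLAKE3_NO_AVX2)
    paths.push_back("avx2");
#endif
#if !defined(BLAKE3_NO_AVX512)
    paths.push_back("avx512");
#endif
#endif
#if defined(BLAKE3_USE_NEON) && BLAKE3_USE_NEON == 1
    // Assumption for this sketch: NEON is listed only when explicitly enabled.
    paths.push_back("neon");
#endif
    return paths;
}
```

With all four `BLAKE3_NO_*` macros defined and `BLAKE3_USE_NEON=0`, the list collapses to the portable path alone, which matches the rollback configuration.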
Two review fixes against the BLAKE3 SIMD enablement:

- Add `_M_ARM64EC` to the test's aarch64 branch in
  tests/cas/test_blake3_simd.cpp. blake3_impl.h treats
  ARM64EC as aarch64 and dispatches NEON at runtime, so the
  test should verify the NEON entry point on that target too.

- Expand the MSVC comment in CMakeLists.txt to note that
  /arch:SSE2 is x86-only and is silently ignored on x64 (where
  SSE2 is the implicit baseline). Documents that the flag is
  harmless rather than load-bearing.

Also fills in the canonical empty-input hash hex in the test
source's KAT comment (was a stray "hash =" left over from an
earlier edit).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
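
The `_M_ARM64EC` fix in the commit above comes down to preprocessor ordering, sketched below. `expected_simd_symbol` is a hypothetical helper, not the test's actual code. One detail worth noting: MSVC defines `_M_X64` for ARM64EC targets as well, so the AArch64 branch has to be checked first in this sketch.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: name of the SIMD entry point a link check would look
// for on the current target, or nullptr when no check applies.
const char* expected_simd_symbol() {
#if defined(__aarch64__) || defined(_M_ARM64) || defined(_M_ARM64EC)
    // Tested before the x86_64 branch because ARM64EC builds also see _M_X64.
    return "blake3_hash_many_neon";
#elif defined(__x86_64__) || defined(_M_X64)
    return "blake3_hash_many_avx2";
#else
    return nullptr;  // no per-arch SIMD entry point expected
#endif
}
```

A real link check would reference the function itself (for example by taking its address) so that a missing object file fails at link time; the string form here only shows which symbol each target selects.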
Zhihaoi merged commit ddedadb into main on May 23, 2026
5 checks passed
Zhihaoi deleted the worktree-blake3-hwaccel branch on May 23, 2026 09:11