feat: add Windows ARM64 (MSVC) build support#352
Conversation
|
hi, @Aanerud, it would be beneficial to introduce a CI workflow for Windows ARM64 to guarantee ongoing integration support. |
Extends the existing Windows job in 05-windows-build.yml to also cover `windows-11-arm` so the MSVC ARM64 build path is exercised on every PR, per request from @feihongxu0824 on alibaba#352. Changes: * Add a third row to the matrix with `platform: windows-11-arm`, `msvc_arch: arm64`, and `python_version: '3.11'` (Python 3.10 has no official Windows-on-ARM installer; 3.11 is the first). * Parameterize the existing `ilammy/msvc-dev-cmd@v1` step on `matrix.msvc_arch` instead of hard-coded `x64`, and the `actions/setup-python@v6` step on `matrix.python_version`. No changes to the x64 rows (still Python 3.10 + MSVC x64) and no changes to the build/test steps themselves — same `pip install -v .`, same C++ unittest run, same pytest, same examples. `fail-fast: false` was already set so an ARM64 regression will not hide x64 regressions and vice versa.
|
Thanks for the review @feihongxu0824 — CLA signed, and just pushed Kept it minimal to match the existing style:
Happy to split the ARM64 row into its own reusable workflow file (e.g. |
Extends the existing Windows job in 05-windows-build.yml to also cover `windows-11-arm` so the MSVC ARM64 build path is exercised on every PR, per request from @feihongxu0824 on alibaba#352. Changes: * Add a third row to the matrix with `platform: windows-11-arm`, `msvc_arch: arm64`, and `python_version: '3.11'` (Python 3.10 has no official Windows-on-ARM installer; 3.11 is the first). * Parameterize the existing `ilammy/msvc-dev-cmd@v1` step on `matrix.msvc_arch` instead of hard-coded `x64`, and the `actions/setup-python@v6` step on `matrix.python_version`. No changes to the x64 rows (still Python 3.10 + MSVC x64) and no changes to the build/test steps themselves — same `pip install -v .`, same C++ unittest run, same pytest, same examples. `fail-fast: false` was already set so an ARM64 regression will not hide x64 regressions and vice versa.
3f586d1 to
3350494
Compare
|
The
Both asserts are Root cause is test sensitivity, not an index bug. The dataset vectors are constructed with small inter-vector deltas ( Signal pattern that makes me confident in this diagnosis:
Just pushed If you'd rather I do the MSVC-ARM64 NEON work in this PR instead of skipping + follow-up, I'm happy to — let me know. |
| COMPILE_FLAGS "${MATH_MARCH_FLAG_NEON}" | ||
| ) | ||
| endforeach() | ||
| endif() |
There was a problem hiding this comment.
Don't we need to consider -march optimizations for MSVC on ARM64 here?
| // and on x64 MSVC with SSE/AVX2). Re-enable once a MSVC-ARM64 NEON kernel | ||
| // (using <arm_neon.h> gated on _M_ARM64) lands. | ||
| GTEST_SKIP() << "Skipped on MSVC ARM64: scalar math precision (see " | ||
| "thirdparty/arrow/arrow.windows-arm64.patch PR)"; |
There was a problem hiding this comment.
If we add an else() branch after the if(NOT MSVC) block in src/ailego/CMakeLists.txt to compile the *_neon.cc files for MSVC ARM64, and update the source code guards from __ARM_NEON to __ARM_NEON || _M_ARM64, the NEON kernels will be included. This may resolve the precision issues, allowing us to re-enable these two tests?
|
Apologies for the delay in reviewing this PR. Thank you for your contribution and for providing such a clear description. I’ve submitted some suggestions for your reference. Also, I noticed that the CLA hasn’t been signed yet. |
Enables `pip install .` to produce a native ARM64 Python wheel when building on Windows ARM64 with MSVC + Visual Studio Build Tools 2022. Four small changes cover the gaps: * src/ailego/CMakeLists.txt — the arm/arm64 branch of the AUTO_DETECT_ARCH block had `if(MSVC) return() endif()`, which bailed out of the file before `cc_library(zvec_ailego ...)` was reached. Downstream (`src/binding/c`, `src/core/*`, `src/db/*`, tests, etc.) then failed to configure with "No target zvec_ailego". The NEON source glob + `-march=armv8-a` flag setup is still only useful for GCC/Clang, so wrap just that block in `if(NOT MSVC)` and let the target definition proceed on MSVC ARM64. * src/ailego/internal/cpu_features.cc — the MSVC branch unconditionally used `__cpuidex`, which is x86/x64-only; the GCC branch used `__get_cpuid`, gated only by `!defined(__ARM_ARCH)`. On MSVC ARM64 neither macro applies and the build failed with "__cpuidex: identifier not found". Scope the two branches to x86/x64 explicitly so MSVC ARM64 falls through to the empty-stub constructor that already exists for non-x86 targets. * thirdparty/arrow/CMakeLists.txt — Arrow 21.0 ships xsimd 13.0, which does not implement `xsimd::make_sized_batch_t` for MSVC ARM64. Pass `-DARROW_SIMD_LEVEL=NONE -DARROW_RUNTIME_SIMD_LEVEL=NONE` only when `CMAKE_SYSTEM_PROCESSOR` is ARM64, so x64 MSVC keeps its SSE4.2 path unchanged. * thirdparty/arrow/arrow.windows-arm64.patch (new) — Arrow's vendored PCG header (`arrow/vendored/pcg/pcg_uint128.hpp`) uses an x86-only endianness check and calls `_umul128`, which is not available on MSVC ARM64 (the equivalent intrinsic is `__umulh`). Add `_M_ARM64`, `_M_ARM`, `__aarch64__`, and `__arm__` to the little-endian case, branch to `__umulh` on ARM, and guard `#pragma intrinsic(_umul128)` so it is not referenced on ARM. Applied via the existing `apply_patch_once` mechanism, scoped to MSVC+ARM64. Tested on Windows 11 ARM64 (Surface class hardware) with MSVC 14.44 and VS Build Tools 2022 component `Microsoft.VisualStudio.Component.VC.Tools.ARM64` installed. `pip install .` produces a working `zvec` wheel and the Python extension imports cleanly (`zvec: OK (0.3.2.dev2)`).
Extends the existing Windows job in 05-windows-build.yml to also cover `windows-11-arm` so the MSVC ARM64 build path is exercised on every PR, per request from @feihongxu0824 on alibaba#352. Changes: * Add a third row to the matrix with `platform: windows-11-arm`, `msvc_arch: arm64`, and `python_version: '3.11'` (Python 3.10 has no official Windows-on-ARM installer; 3.11 is the first). * Parameterize the existing `ilammy/msvc-dev-cmd@v1` step on `matrix.msvc_arch` instead of hard-coded `x64`, and the `actions/setup-python@v6` step on `matrix.python_version`. No changes to the x64 rows (still Python 3.10 + MSVC x64) and no changes to the build/test steps themselves — same `pip install -v .`, same C++ unittest run, same pytest, same examples. `fail-fast: false` was already set so an ARM64 regression will not hide x64 regressions and vice versa.
3d13553 to
aa76ffa
Compare
Addresses @feihongxu0824's review comment on alibaba#352. Previously the *_neon.cc files compiled into empty translation units on MSVC ARM64 because their guards (`__ARM_NEON`, `__aarch64__`) are GCC/Clang-only macros, leaving zvec to use the scalar fallback. That fallback hits ~1 ULP precision drift versus the NEON path, which surfaced as two HnswStreamerTest cosine failures and a `NormMatrix.Norm1_General` failure on the `windows-11-arm` CI runner. Changes: * Expand `defined(__ARM_NEON)` -> `(defined(__ARM_NEON) || defined(_M_ARM64))` at the 51 source sites that gate ARM NEON kernels (math, math_batch, utility, normalizer, platform headers, dispatch tables, version probe). * Expand `defined(__aarch64__)` -> `(defined(__aarch64__) || defined(_M_ARM64))` at the 26 sites that distinguish AArch64 from ARMv7 NEON — MSVC ARM64 is AArch64 but does not predefine `__aarch64__`. As a side effect the ARMv7-only polyfills (`vaddvq_f32`/`vaddvq_s32` shims in `distance_matrix_accum_fp32.i`, `distance_matrix_fp32.i`) are correctly skipped under MSVC ARM64, where those intrinsics are built in. * `src/include/zvec/ailego/internal/platform.h`: include `<arm_neon.h>` on the MSVC branch when `_M_ARM64` is defined (it was previously gated behind `!_MSC_VER`, so MSVC ARM64 saw no NEON types). * `src/ailego/CMakeLists.txt`: keep the existing `if(NOT MSVC)` wrapper around the GCC-only `-march=armv8-a` flag, and add an explicit `else()` branch with a comment explaining MSVC ARM64 does not need `-march` (NEON is the ARMv8 baseline on MSVC and the kernels are now picked up via the ALL_SRCS glob with the macro guards above). This supersedes the earlier `test(hnsw): skip two cosine self-match tests on MSVC ARM64` commit (which has been dropped from the branch). The NEON math kernels now use the same precision path as Linux/macOS ARM64, so those tests should pass natively on `windows-11-arm`.
|
Rebased onto current
For commit 3, instead of leaving the kernels gated on the GCC/Clang-only macros, I expanded the source guards as you suggested:
One subtlety: I also dropped the old The CLA-bot pending status was because my earlier commits were authored as Waiting on the |
Addresses @feihongxu0824's review comment on alibaba#352. Previously the *_neon.cc files compiled into empty translation units on MSVC ARM64 because their guards (`__ARM_NEON`, `__aarch64__`) are GCC/Clang-only macros, leaving zvec to use the scalar fallback. That fallback hits ~1 ULP precision drift versus the NEON path, which surfaced as two HnswStreamerTest cosine failures and a `NormMatrix.Norm1_General` failure on the `windows-11-arm` CI runner. Changes: * Expand `defined(__ARM_NEON)` -> `(defined(__ARM_NEON) || defined(_M_ARM64))` at the 51 source sites that gate ARM NEON kernels (math, math_batch, utility, normalizer, platform headers, dispatch tables, version probe). * Expand `defined(__aarch64__)` -> `(defined(__aarch64__) || defined(_M_ARM64))` at the 26 sites that distinguish AArch64 from ARMv7 NEON — MSVC ARM64 is AArch64 but does not predefine `__aarch64__`. As a side effect the ARMv7-only polyfills (`vaddvq_f32`/`vaddvq_s32` shims in `distance_matrix_accum_fp32.i`, `distance_matrix_fp32.i`) are correctly skipped under MSVC ARM64, where those intrinsics are built in. * `src/include/zvec/ailego/internal/platform.h`: include `<arm_neon.h>` on the MSVC branch when `_M_ARM64` is defined (it was previously gated behind `!_MSC_VER`, so MSVC ARM64 saw no NEON types). * `src/ailego/CMakeLists.txt`: keep the existing `if(NOT MSVC)` wrapper around the GCC-only `-march=armv8-a` flag, and add an explicit `else()` branch with a comment explaining MSVC ARM64 does not need `-march` (NEON is the ARMv8 baseline on MSVC and the kernels are now picked up via the ALL_SRCS glob with the macro guards above). This supersedes the earlier `test(hnsw): skip two cosine self-match tests on MSVC ARM64` commit (which has been dropped from the branch). The NEON math kernels now use the same precision path as Linux/macOS ARM64, so those tests should pass natively on `windows-11-arm`.
aa76ffa to
0dc8e61
Compare
Addresses @feihongxu0824's review comment on alibaba#352. Previously the *_neon.cc files compiled into empty translation units on MSVC ARM64 because their guards (`__ARM_NEON`, `__aarch64__`) are GCC/Clang-only macros, leaving zvec to use the scalar fallback. That fallback hits ~1 ULP precision drift versus the NEON path, which surfaced as two HnswStreamerTest cosine failures and a `NormMatrix.Norm1_General` failure on the `windows-11-arm` CI runner. Changes: * Expand `defined(__ARM_NEON)` -> `(defined(__ARM_NEON) || defined(_M_ARM64))` at the 51 source sites that gate ARM NEON kernels (math, math_batch, utility, normalizer, platform headers, dispatch tables, version probe). * Expand `defined(__aarch64__)` -> `(defined(__aarch64__) || defined(_M_ARM64))` at the 26 sites that distinguish AArch64 from ARMv7 NEON — MSVC ARM64 is AArch64 but does not predefine `__aarch64__`. As a side effect the ARMv7-only polyfills (`vaddvq_f32`/`vaddvq_s32` shims in `distance_matrix_accum_fp32.i`, `distance_matrix_fp32.i`) are correctly skipped under MSVC ARM64, where those intrinsics are built in. * `src/include/zvec/ailego/internal/platform.h`: include `<arm_neon.h>` on the MSVC branch when `_M_ARM64` is defined (it was previously gated behind `!_MSC_VER`, so MSVC ARM64 saw no NEON types). * `src/ailego/CMakeLists.txt`: keep the existing `if(NOT MSVC)` wrapper around the GCC-only `-march=armv8-a` flag, and add an explicit `else()` branch with a comment explaining MSVC ARM64 does not need `-march` (NEON is the ARMv8 baseline on MSVC and the kernels are now picked up via the ALL_SRCS glob with the macro guards above). This supersedes the earlier `test(hnsw): skip two cosine self-match tests on MSVC ARM64` commit (which has been dropped from the branch). The NEON math kernels now use the same precision path as Linux/macOS ARM64, so those tests should pass natively on `windows-11-arm`.
0dc8e61 to
afb6654
Compare
Summary
Enables
pip install .to produce a native ARM64 Python wheel on Windows ARM64 with MSVC + Visual Studio Build Tools 2022. Four small, targeted changes — three in zvec, one new patch file applied to the bundled Arrow 21.0 tree.The Python SDK build docs currently list Windows as x86_64-only with MSVC 2022+. This PR adds the ARM64 arm of that — everything else (Linux ARM64, macOS ARM64) is already working.
Changes
src/ailego/CMakeLists.txtarm/arm64branch ofAUTO_DETECT_ARCHhadif(MSVC) return() endif(), which bailed beforecc_library(zvec_ailego ...)was reached — downstream then failed withNo target "zvec_ailego". The-march=armv8-a/ NEON glob setup is still GCC/Clang-only, so wrap just that inif(NOT MSVC)and let the target definition proceed.src/ailego/internal/cpu_features.cc__cpuidex(x86/x64 intrinsic); GCC branch was gated only by!__ARM_ARCH. MSVC ARM64 matched neither and failed with__cpuidex: identifier not found. Scope both branches to x86/x64 explicitly so MSVC ARM64 falls through to the existing empty-stub ctor.thirdparty/arrow/CMakeLists.txtxsimd::make_sized_batch_tfor MSVC ARM64. Pass-DARROW_SIMD_LEVEL=NONE -DARROW_RUNTIME_SIMD_LEVEL=NONEonly whenCMAKE_SYSTEM_PROCESSORis ARM64 — x64 MSVC keeps its default SSE4.2 path untouched. Also wires in the new patch below.thirdparty/arrow/arrow.windows-arm64.patch(new)arrow/vendored/pcg/pcg_uint128.hpp) uses an x86-only endianness check and calls_umul128, which is not available on MSVC ARM64 — the equivalent intrinsic is__umulh. Add_M_ARM64/_M_ARM/__aarch64__/__arm__to the little-endian case, branch to__umulhon ARM, and guard#pragma intrinsic(_umul128)so it is not referenced on ARM. Applied via the existingapply_patch_oncemechanism, scoped to MSVC+ARM64.Non-goals
CMAKE_SYSTEM_PROCESSOR MATCHES "^(ARM64|arm64|aarch64)$"._M_ARM64using MSVC's<arm_neon.h>, but that's additive and out of scope here.IsProcessorFeaturePresentor compile-time__ARM_FEATURE_*macros.Test plan
Microsoft.VisualStudio.Component.VC.Tools.ARM64) —pip install .succeeds and produces a working wheelpython -c "import zvec; print(zvec)"imports cleanly on the resulting venv (zvec: OK (0.3.2.dev2))git apply --check thirdparty/arrow/arrow.windows-arm64.patchclean against the bundled Arrow 21.0 treeMSVC)Error trail for reviewers
For context on what each change fixes (in build order):
CMake Error at src/binding/c/CMakeLists.txt:109 (target_link_options): No target "zvec_ailego"→ fixed by ailego CMakeLists change.error C3861: '__cpuidex': identifier not found→ fixed by cpu_features.cc change.error C2039: 'make_sized_batch_t': is not a member of 'xsimd'(~20 errors inarrow_compute_core_staticandarrow_util_static) → fixed byARROW_SIMD_LEVEL=NONE.error C1189: #error: Unable to determine target endiannessinarrow/vendored/pcg/pcg_uint128.hpp→ fixed by endianness hunk of the new arrow patch.error C3861: '_umul128': identifier not foundin the same file → fixed by the__umulhhunk of the new arrow patch.After those five errors are resolved, the rest of the build (protobuf, rocksdb, arrow, zvec_core, zvec_db, zvec_ailego, zvec_turbo, roaring, Python binding) completes cleanly on MSVC ARM64.