Hi,
I wonder whether the kernels for the aarch64 architecture use NEON instructions. The assembly code in https://github.com/google/ruy/blob/master/ruy/kernel_arm64.cc doesn't contain NEON instructions such as VADD or VMUL. How is vectorization performed for 64-bit ARM architectures?