Fast homogeneous and rotation matrix multiplications #1845

SamFlt · 2025-12-02T12:48:08Z

This PR introduces two methods that perform batched 3D point transformations:

vpHomogeneousMatrix::project: projects N 3D points (represented as an Nx3 vpMatrix) from one frame to another
vpRotationMatrix::rotateVectors: rotates N vectors (represented as an Nx3 matrix)

These methods rely on explicit matrix multiplications and SIMD operations for speed. This PR adds tests that measure the speedup against two naive versions: a loop over vpColVector pa = T * pb, and vpMatrix::mult2Matrices (which multiplies a 4x4 matrix by a 4xN matrix).
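To make the "explicit matrix multiplication" idea concrete, here is a minimal scalar sketch. It is not the code from this PR: `RigidTransform` and `transformPoints` are hypothetical names, and plain arrays stand in for the ViSP matrix classes. It only shows how an Nx3 batch can be transformed without building a 4-element homogeneous vector per point.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a homogeneous transform (not a ViSP class):
// a row-major 3x3 rotation R and a translation t.
struct RigidTransform
{
  double R[9];
  double t[3];
};

// Transform N points stored contiguously as a row-major Nx3 array.
// Each output point is R * p + t. The homogeneous coordinate (1) and the
// last row of the 4x4 matrix (0 0 0 1) are never materialized.
std::vector<double> transformPoints(const RigidTransform &T, const std::vector<double> &points)
{
  const std::size_t n = points.size() / 3;
  std::vector<double> out(n * 3);
  const double *in = points.data();
  double *o = out.data();
  for (std::size_t i = 0; i < n; ++i, in += 3, o += 3) {
    const double x = in[0], y = in[1], z = in[2];
    o[0] = T.R[0] * x + T.R[1] * y + T.R[2] * z + T.t[0];
    o[1] = T.R[3] * x + T.R[4] * y + T.R[5] * z + T.t[1];
    o[2] = T.R[6] * x + T.R[7] * y + T.R[8] * z + T.t[2];
  }
  return out;
}
```

Compared with a per-point `pa = T * pb` loop, a loop of this shape avoids one temporary vector per point and reads the input linearly, which is one plausible reason a non-SIMD version can already be faster.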

Interestingly, the versions of project and rotateVectors without SIMD already provide a 2x to 3x speedup, while with SIMD I obtain 5x to 6x speedups on my machine.

Some notes:

There is a lot of redundancy in the intrinsics includes at the start of the .cpp files; maybe all the relevant code could be put in a vpSIMDIntrinsicsUtils.h file?
It seems the best way to use every relevant CPU feature set is the -march=native flag. As it stands, turning on AVX with -mavx does not turn on FMA. -march=native also enables all the relevant SSE feature sets, which could help simplify the compile flags.
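One way to check which feature sets a given set of flags actually enables is to look at the macros the compiler predefines. The sketch below assumes GCC or Clang, which define `__SSE3__`, `__AVX__`, `__AVX2__` and `__FMA__` when the matching feature is active for the translation unit; `enabledSimdFeatures` is a hypothetical helper name, not part of ViSP.

```cpp
#include <string>

// Report which x86 SIMD feature macros the compiler defined for this
// translation unit. With GCC or Clang, building with -mavx alone is expected
// to list "avx" but not "fma"; adding -mfma (or using -march=native on a CPU
// that supports FMA) adds it.
std::string enabledSimdFeatures()
{
  std::string s;
#if defined(__SSE3__)
  s += "sse3 ";
#endif
#if defined(__AVX__)
  s += "avx ";
#endif
#if defined(__AVX2__)
  s += "avx2 ";
#endif
#if defined(__FMA__)
  s += "fma ";
#endif
  return s.empty() ? std::string("none") : s;
}
```

Printing the result of `enabledSimdFeatures()` from a small test program built with the project's flags shows whether FMA code paths can be compiled in.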

codecov · 2025-12-02T16:46:38Z

Codecov Report

❌ Patch coverage is 57.60369% with 92 lines in your changes missing coverage. Please review.
✅ Project coverage is 33.35%. Comparing base (799ad43) to head (671b883).
⚠️ Report is 6 commits behind head on master.

Files with missing lines	Patch %	Lines
...tracker/rbt/src/features/vpRBDenseDepthTracker.cpp	0.00%	78 Missing ⚠️
modules/tracker/rbt/src/core/private/vpSIMDUtils.h	0.00%	6 Missing ⚠️
...re/src/math/transformation/vpHomogeneousMatrix.cpp	93.93%	2 Missing and 2 partials ⚠️
.../core/src/math/transformation/vpRotationMatrix.cpp	93.75%	2 Missing and 2 partials ⚠️

❗ There is a different number of reports uploaded between BASE (799ad43) and HEAD (671b883): HEAD has 1 upload less than BASE.

Uploads	BASE (799ad43)	HEAD (671b883)
	3	2

Additional details and impacted files

@@             Coverage Diff             @@
##           master    #1845       +/-   ##
===========================================
- Coverage   47.84%   33.35%   -14.50%     
===========================================
  Files         532      466       -66     
  Lines       68944    66167     -2777     
  Branches    32201    28740     -3461     
===========================================
- Hits        32986    22068    -10918     
- Misses      31908    33587     +1679     
- Partials     4050    10512     +6462


modules\core\src\math\transformation\vpRotationMatrix.cpp(128,21): error: always_inline function '_mm_hadd_pd' requires target feature 'sse3', but would be inlined into function 'rotateVectors' that is compiled without support for 'sse3' 128 | __m128d r01 = _mm_hadd_pd(mul0, mul1); | ^
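The error above appears when an SSE3 intrinsic is used in a function that is compiled without SSE3 enabled. A common way to avoid it, sketched here under the assumption that the compiler's `__SSE3__` macro is used as the guard (the PR's actual fix is not shown on this page), is to compile the intrinsic path only when the feature is available and fall back to scalar code otherwise. `dot2x2` is a hypothetical helper.

```cpp
#if defined(__SSE3__)
#include <pmmintrin.h> // SSE3 intrinsics, including _mm_hadd_pd
#endif

// Compute two dot products of 2-element vectors: out = {a.b, c.d}.
void dot2x2(const double a[2], const double b[2], const double c[2], const double d[2], double out[2])
{
#if defined(__SSE3__)
  const __m128d mul0 = _mm_mul_pd(_mm_loadu_pd(a), _mm_loadu_pd(b));
  const __m128d mul1 = _mm_mul_pd(_mm_loadu_pd(c), _mm_loadu_pd(d));
  // _mm_hadd_pd(x, y) yields {x0 + x1, y0 + y1}.
  _mm_storeu_pd(out, _mm_hadd_pd(mul0, mul1));
#else
  // Scalar fallback, used whenever SSE3 is not enabled at compile time.
  out[0] = a[0] * b[0] + a[1] * b[1];
  out[1] = c[0] * d[0] + c[1] * d[1];
#endif
}
```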

s-trinh · 2025-12-07T16:55:16Z

modules/core/src/math/matrix/private/vpSIMDUtils.h

+#if defined(VISP_HAVE_AVX2)
+using Register = __m512d;
+
+inline constexpr int numLanes = 8;
+inline const Register add(const Register a, const Register b)
+{
+  return _mm512_add_pd(a, b);
+}


Two comments.

AVX2 does not mean 512-bit registers:

AVX / AVX2 improvements: https://fr.wikipedia.org/wiki/Advanced_Vector_Extensions#Advanced_Vector_Extensions_2

see AVX-512 and all its different flavors

you can use cat /proc/cpuinfo to see which AVX-512 variants your CPU has

I would definitely not expose SIMD code to the user:

people who know this subject will not use this code, and there are already better libraries for it

with SIMD code in a .h file built with -march=native, running ViSP on another computer risks a SIGILL crash on an older CPU

I think what is usually done is runtime dispatching:

the .so can contain code for all the more advanced instruction sets

for example, for a convolution function, one variant each for 128-bit, 256-bit and 512-bit registers (the library size will increase a lot)

and at runtime you check whether the CPU supports these instruction sets and dispatch to the most advanced one
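A minimal sketch of this kind of runtime dispatch is given below. It assumes GCC or Clang on x86, where `__builtin_cpu_supports` and the `target` function attribute are available; on other platforms it falls back to the scalar variant. The function names are hypothetical and this is not how ViSP is organized.

```cpp
#include <cstddef>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define SKETCH_X86_DISPATCH 1
#endif

// Baseline variant, safe on every CPU the library is built for.
static void addScalar(const double *a, const double *b, double *out, std::size_t n)
{
  for (std::size_t i = 0; i < n; ++i) {
    out[i] = a[i] + b[i];
  }
}

#if defined(SKETCH_X86_DISPATCH)
// Same loop, but the compiler may emit AVX2 instructions for this function
// only, without requiring -mavx2 for the whole translation unit.
__attribute__((target("avx2")))
static void addAvx2(const double *a, const double *b, double *out, std::size_t n)
{
  for (std::size_t i = 0; i < n; ++i) {
    out[i] = a[i] + b[i];
  }
}
#endif

using AddFn = void (*)(const double *, const double *, double *, std::size_t);

// Query the CPU once and return the most advanced variant it can run.
static AddFn selectAdd()
{
#if defined(SKETCH_X86_DISPATCH)
  __builtin_cpu_init();
  if (__builtin_cpu_supports("avx2")) {
    return addAvx2;
  }
#endif
  return addScalar;
}

// Public entry point: callers never see which variant was chosen.
void addArrays(const double *a, const double *b, double *out, std::size_t n)
{
  static const AddFn impl = selectAdd();
  impl(a, b, out, n);
}
```

Because the AVX2 variant is only reached after the CPU check, the same binary can run on an older CPU without a SIGILL, at the cost of carrying several variants of each function.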


SamFlt added 4 commits November 26, 2025 18:48

SIMD homogeneous matrix multiplication

711fe77

Implement first version of SIMD for rotation matrix

196660c

AVX implem for rotation matmul

729110a

Fix intrinsics usage, performance improvement

e230a72

SamFlt and others added 18 commits December 3, 2025 19:14

Simd version of the rbt dense depth, AVX matmul version for 3xN input…

1ff0b57

… matrices

Export SIMD intrinsics utils in a separate header

690c718

Fix test, improve vpRBDenseDepth

4c88d39

Remove debug prints

77adf9d

Move initVVS to cpp file, resize matrix there

1f3c7c8

Add ENABLE_NATIVE_ARCH option for gcc

cc729b6

Remove reference to MBT tukey estimator, disable prints

a05c329

Fix SSE3 flag check

d847bfb

Merge branch 'master' into fast_homogeneous_proj

c58271a

Update copyright headers

c4bc02a

Fix warning unused variable

49e8c83

modules/core/src/math/transformation/vpHomogeneousMatrix.cpp:60:13: warning: unused variable 'outputData' [-Wunused-variable] 60 | double *outputData = output.data; | ^~~~~~~~~~

Remove useless empty lines

ae5966b

Fix warning variable set but not used

3f3bac2

modules/tracker/rbt/src/features/vpRBDenseDepthTracker.cpp:200:10: warning: variable 't1' set but not used [-Wunused-but-set-variable] 200 | double t1 = vpTime::measureTimeMs(); | ^

Remove vpSIMD namespace from doxygen doc

adc26ab

Fix bug when input vector is not transposed and AVX or SSE2 not avail…

c6ff19a

…able Document vpRotationMatrix::rotateVectors() method

Fix bug when input vector is not transposed and AVX or SSE3 not avail…

51e89a3

…able Document vpHomogeneousMatrix::project() method

Cleanup tests to help debugging

09fc8cd

s-trinh reviewed Dec 7, 2025


fspindle added 4 commits December 12, 2025 08:20

Merge branch 'master' into fast_homogeneous_proj

06b044c

Make vpSIMDUtils.h private to not expose SIMD code to the user

7502eac

For the moment this file is duplicated in core and rbt modules. We should find a solution to avoid code duplication.

Remove to make code more explicit

1f60797

Make test independent from SIMD instruction set

671b883
