Skip to content

Conversation

@SamFlt
Copy link
Contributor

@SamFlt SamFlt commented Dec 2, 2025

This PR introduces two methods that perform batched 3d point transformations:

  • vpHomogeneousMatrix::project: Project N 3d points (Represented as a Nx3 vpMatrix) from a frame to another
  • vpRotationMatrix::rotateVectors: Rotate Vectors (Represented as a Nx3)

These methods rely on explicit mat muls and SIMD operations to be faster. This PR adds tests to measure the speed up against two naive versions: the Naive vpColVector pa = T * pb loop and vpMatrix::mult2Matrices version (which multiplies a 4x4 matrix with an 4xN matrix.

Interestingly, the version without SIMD in project or rotateVectors, already provides a x2/x3 speedup, while on my machine, I can obtain x5,x6 speedups.

Some notes:

  • There is a lot of redundancy in the intrinsics inclusion at the start of the .cpp files, maybe all the relevant stuff could be put in a vpSIMDIntrinsicsUtils.h file ?
  • It seems like the best way to use all the relevant CPU feature set is to use the march=native flag. As it is, turning on AVX with -mavx does not turn on FMA. This also enables all SSE relevant feature sets, which could help simplify the compileflags.

@codecov
Copy link

codecov bot commented Dec 2, 2025

Codecov Report

❌ Patch coverage is 57.60369% with 92 lines in your changes missing coverage. Please review.
✅ Project coverage is 33.35%. Comparing base (799ad43) to head (671b883).
⚠️ Report is 6 commits behind head on master.

Files with missing lines Patch % Lines
...tracker/rbt/src/features/vpRBDenseDepthTracker.cpp 0.00% 78 Missing ⚠️
modules/tracker/rbt/src/core/private/vpSIMDUtils.h 0.00% 6 Missing ⚠️
...re/src/math/transformation/vpHomogeneousMatrix.cpp 93.93% 2 Missing and 2 partials ⚠️
.../core/src/math/transformation/vpRotationMatrix.cpp 93.75% 2 Missing and 2 partials ⚠️

❗ There is a different number of reports uploaded between BASE (799ad43) and HEAD (671b883). Click for more details.

HEAD has 1 upload less than BASE
Flag BASE (799ad43) HEAD (671b883)
3 2
Additional details and impacted files
@@             Coverage Diff             @@
##           master    #1845       +/-   ##
===========================================
- Coverage   47.84%   33.35%   -14.50%     
===========================================
  Files         532      466       -66     
  Lines       68944    66167     -2777     
  Branches    32201    28740     -3461     
===========================================
- Hits        32986    22068    -10918     
- Misses      31908    33587     +1679     
- Partials     4050    10512     +6462     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

SamFlt and others added 18 commits December 3, 2025 19:14
modules/core/src/math/transformation/vpHomogeneousMatrix.cpp:60:13: warning: unused variable 'outputData' [-Wunused-variable]
   60 |     double *outputData = output.data;
      |             ^~~~~~~~~~
modules/tracker/rbt/src/features/vpRBDenseDepthTracker.cpp:200:10: warning: variable 't1' set but not used [-Wunused-but-set-variable]
  200 |   double t1 = vpTime::measureTimeMs();
      |          ^
…able

Document vpRotationMatrix::rotateVectors() method
…able

Document vpHomogeneousMatrix::project() method
modules\core\src\math\transformation\vpRotationMatrix.cpp(128,21): error: always_inline function '_mm_hadd_pd' requires target feature 'sse3', but would be inlined into function 'rotateVectors' that is compiled without support for 'sse3'
  128 |       __m128d r01 = _mm_hadd_pd(mul0, mul1);
      |                     ^
Comment on lines +97 to +104
#if defined(VISP_HAVE_AVX2)
using Register = __m512d;

inline constexpr int numLanes = 8;
inline const Register add(const Register a, const Register b)
{
return _mm512_add_pd(a, b);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two comments.

AVX2 != 512-bits register:

I would definitely not expose SIMD code to the user:

  • people knowing this subject will not use this code and there are already better libs doing that
  • with SIMD code in .h, with march=native and running ViSP on another computer there are some chances to have SIGILL crash on old CPU
  • I think what is done is runtime dispatching:
    • the .so can have all the more advanced instructions set
    • for example for a convolution function, a variant for 128-bits, 256-bits and 512-bits (the lib size will increase a lot)
    • and at runtime you check if the CPU has this kind of instructions set and dispatch to the more advanced one

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants