- Despite the high-precision calculations it performs, this library is fast enough for practical use.
- In both Managed and Unity environments, it was even faster than the standard library implementation in some cases.
- However, in the Unity environment, the `double` version tends to be slightly slower.
- Managed - vs. `System.Math(F)`
- Unity - vs. `Mathf`, `math`, `[BurstCompile]`-ed `math`, and a native library with P/Invoke
The vertical axis represents the mean time (smaller is faster), and the horizontal axis represents the function.
MathF is a native method with no accuracy guarantee, while CoreF (this library) is managed and has accuracy guarantees, putting it at a performance disadvantage.
Despite this, it appears to be performing well. Some methods are even faster than MathF.
Compound seems a little slower, but this may be due to the range of the input random numbers: whether the fast or slow path is taken depends on the input.
In comparison with Math, some methods are faster as well.
Although a few methods such as AtanPi are slightly slower, overall the speed can be called practical.
For the comparison of managed code, I used BenchmarkDotNet, which is considered to be the most reliable.
BenchmarkDotNet v0.15.8, Windows 11 (10.0.26100.7623/24H2/2024Update/HudsonValley)
12th Gen Intel Core i7-12700F 2.10GHz, 1 CPU, 20 logical and 12 physical cores
.NET SDK 10.0.102
[Host] : .NET 10.0.2 (10.0.2, 10.0.225.61305), X64 RyuJIT x86-64-v3
.NET 10.0 : .NET 10.0.2 (10.0.2, 10.0.225.61305), X64 RyuJIT x86-64-v3
.NET 8.0 : .NET 8.0.23 (8.0.23, 8.0.2325.60607), X64 RyuJIT x86-64-v3
Affinity=00001111111111111111
This benchmark compares:
- Math - `System.Math(F)`; the standard library
- Core - This library
For methods that are not directly supported in Math(F), I performed measurements by mapping those that can be mapped naively (e.g., Compound(x, y) == Pow(x + 1, y)).
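The naive mapping mentioned above could be sketched as follows (`CompoundViaPow` is a hypothetical helper name used only for illustration, not this library's API):

```csharp
using System;

static class CompoundBaseline
{
    // Naive baseline used only for benchmark comparison:
    // Compound(x, y) == Pow(x + 1, y), i.e. (1 + x)^y.
    public static double CompoundViaPow(double x, double y)
        => Math.Pow(x + 1.0, y);
}

// Example: a 5% rate compounded over 10 periods,
// CompoundBaseline.CompoundViaPow(0.05, 10.0) == Math.Pow(1.05, 10.0)
```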
Two benchmark patterns were prepared.
The first, MicroBenchmark, simply calls the function once.
```csharp
// MicroBenchmark
[Benchmark]
public double CoreAcos()
{
    // X is a random number generated outside the benchmark
    return StrictMath.Acos(X);
}
```

The second, MacroBenchmark, returns the sum of an array passed through the function.
This is because a single function call may be too short a measurement.
```csharp
// MacroBenchmark
[Benchmark]
public double CoreAcos()
{
    double sum = 0.0;
    // X is a double[5000] filled with random numbers
    foreach (var x in X)
    {
        sum += StrictMath.Acos(x);
    }
    return sum;
}
```

When comparing .NET Standard 2.1 and .NET 10, we can see that Core(F) (this library) shows a significant speed difference between the two.
This is thought to be mainly due to the speed difference of FusedMultiplyAdd.
This method is an important primitive that other methods use frequently, but its API is not available in .NET Standard 2.1.
Therefore, a software fallback is implemented, but it is much slower than the intrinsic.
This doesn't seem to have much of an effect on float, but it is more noticeable in the implementation of double.
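Why FusedMultiplyAdd matters can be seen in a small illustration (not taken from this library): a naive `x * y + z` rounds twice and can lose the low-order term that the single-rounding FMA preserves.

```csharp
using System;

static class FmaDemo
{
    // (1 + e)(1 - e) - 1 == -e^2 exactly, with e = 2^-30.
    public static double Naive()
    {
        double e = Math.Pow(2, -30);
        double x = 1.0 + e, y = 1.0 - e;
        // x * y rounds to 1.0 (the 2^-60 term is below half an ulp),
        // so the subtraction returns 0.0.
        return x * y - 1.0;
    }

    public static double Fused()
    {
        double e = Math.Pow(2, -30);
        double x = 1.0 + e, y = 1.0 - e;
        // Single rounding keeps the exact result, -2^-60.
        return Math.FusedMultiplyAdd(x, y, -1.0);
    }
}
```

On hardware without an FMA instruction (or without the API, as on .NET Standard 2.1), preserving this behavior requires a much slower software emulation.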
- CoreMathSharp.Benchmarks.StrictMathBenchmark-report-github.md
- CoreMathSharp.Benchmarks.StrictMathMacroBenchmark-report-github.md
- CoreMathSharp.Benchmarks.StrictMathBenchmark-report.csv
- CoreMathSharp.Benchmarks.StrictMathMacroBenchmark-report.csv
Compound and Pow are slightly slower, but otherwise Core (this library) does not appear to be slower than the other methods.
Unlike float, the double version tends to be slower than the alternatives, possibly due to the time required for high-precision calculations.
For comparison in Unity, the Performance Testing Package for Unity Test Framework was used.
(This may seem like a rather simple way to measure benchmarks, but BenchmarkDotNet does not currently work with Unity, so there is little alternative.)
Unity 6000.5.0a5 was used for the measurements.
The measurements were taken for both PlayMode (Mono) and Player (IL2CPP).
This benchmark compares:
- Unity - `UnityEngine.Mathf`; Unity's standard library
- Math - `Unity.Mathematics.math` - Burst is not used. As far as I can tell from reading the source code, `System.Math` is used internally.
- Low - `math` with `[BurstCompile(FloatMode = FloatMode.Fast, FloatPrecision = FloatPrecision.Low)]` - Relaxes the accuracy requirement for some math functions to 350.0 ulps, while allowing risky optimizations such as reordering floating-point operations.
- Medium - `math` with `[BurstCompile(FloatMode = FloatMode.Strict, FloatPrecision = FloatPrecision.Medium)]` - Relaxes the accuracy requirement for some math functions to 3.5 ulps. This is the default setting.
- High - `math` with `[BurstCompile(FloatMode = FloatMode.Deterministic, FloatPrecision = FloatPrecision.High)]` - Some math functions have an error of 1.0 ulp. Consistent behavior can be expected across all platforms. Note that Flush to Zero is enabled.
- Core - This library
- PInvoke - The CORE-MATH project's original C library, compiled into a DLL (Native Plugin) and called with P/Invoke.
More information about [BurstCompile] can be found here.
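For the PInvoke variant, a binding might look like the following sketch. The library name `coremath` is an assumption for illustration; the exported symbol `cr_acos` follows the CORE-MATH project's naming, but the actual setup may differ.

```csharp
using System;
using System.Runtime.InteropServices;

static class CoreMathNative
{
    // CORE-MATH's correctly rounded acos, called through P/Invoke.
    // Library and entry-point names here are assumptions for illustration.
    [DllImport("coremath", EntryPoint = "cr_acos",
               CallingConvention = CallingConvention.Cdecl)]
    public static extern double Acos(double x);
}

// Usage (requires the native plugin to be present for the target platform):
// double r = CoreMathNative.Acos(0.5);
```

Note that such a binding only works when the native DLL has been built for, and deployed with, each target platform.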
As with the managed code comparison, I prepared two benchmark patterns.
```csharp
// UnityBenchmark
[Test, Performance]
public void CoreAcosF()
{
    Measure.Method(() =>
    {
        // X is a random number generated outside the benchmark.
        // Store the result in the external field Result so the call
        // is not eliminated as dead code.
        Result = StrictMath.Acos(X);
    })
    .WarmupCount(10)
    .MeasurementCount(100)
    .IterationsPerMeasurement(5000)
    .SampleGroup("Acos")
    .Run();
}
```

```csharp
// UnityMacroBenchmark
[Test, Performance]
public void CoreAcos()
{
    Measure.Method(() =>
    {
        double sum = 0.0;
        // X is a double[5000] filled with random numbers
        foreach (var x in X)
        {
            sum += StrictMath.Acos(x);
        }
        Result = sum;
    })
    .WarmupCount(10)
    .MeasurementCount(100)
    .IterationsPerMeasurement(16)
    .SampleGroup("Acos")
    .Run();
}
```

In the Unity environment, the slowness of double calculations was evident.
This is because, as mentioned above, FusedMultiplyAdd is implemented in software, and also because error-free calculations in high-precision doubles are expensive.
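To illustrate what "error-free calculation" costs, here is Knuth's TwoSum, a standard error-free transformation (a general technique, not necessarily this library's exact code): it recovers the exact rounding error of an addition, but at the price of several extra floating-point operations per step.

```csharp
static class ErrorFree
{
    // Knuth's TwoSum: s = fl(a + b) and e such that a + b == s + e exactly.
    // Six floating-point operations instead of one addition, which is why
    // high-precision double paths are expensive.
    public static (double Sum, double Err) TwoSum(double a, double b)
    {
        double s = a + b;
        double bb = s - a;
        double e = (a - (s - bb)) + (b - bb);
        return (s, e);
    }
}

// Example: adding 2^-60 to 1.0 rounds to 1.0, and TwoSum
// recovers the lost 2^-60 in the error term.
```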
.NET Standard 2.1, which the Unity implementation must target, does not support intrinsics (Vector128, LeadingZeroCount, etc.), which results in a significant slowdown.
In terms of source code size, some functions require roughly 10 times the number of lines.
This is also thought to be a factor in the difference between float and double.
Therefore, if you don't need double precision, it is better to use the float version of the function.
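For example, without `System.Numerics.BitOperations`, a leading-zero count has to be written by hand. A common software fallback (a sketch of the general technique, not this library's code) looks like this; on .NET Core 3.0+ the same operation compiles to a single LZCNT instruction:

```csharp
static class Bits
{
    // Software leading-zero count for 64-bit values, using a
    // branch-based binary search over the high bits.
    public static int LeadingZeroCount(ulong value)
    {
        if (value == 0) return 64;
        int n = 0;
        if ((value >> 32) == 0) { n += 32; value <<= 32; }
        if ((value >> 48) == 0) { n += 16; value <<= 16; }
        if ((value >> 56) == 0) { n += 8;  value <<= 8; }
        if ((value >> 60) == 0) { n += 4;  value <<= 4; }
        if ((value >> 62) == 0) { n += 2;  value <<= 2; }
        if ((value >> 63) == 0) { n += 1; }
        return n;
    }
}
```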
We found that the trends were generally the same for microbenchmarks and macrobenchmarks.
Therefore, it can be said that there are no major problems with the benchmarking methodology.
First, please note that the vertical axis is 1/10 of the previous chart's. The impact of IL2CPP on speed appears to be very large.
The overall trend doesn't seem to have changed much, but High appears to be slightly slower in comparison.
Compound and Pow were relatively slower, but the other functions were roughly comparable in speed.
For inverse trigonometric functions and Cbrt, the results showed that despite being managed code, they were faster than Unity's implementation.
Core is often faster than `[BurstCompile]`-ed math.
In particular, it has a speed advantage in most cases compared to High, which appears to pay a large overhead to obtain deterministic results.
This also shows that simply applying `[BurstCompile]` does not necessarily make a program faster.
This is likely because these functions make little use of SIMD operations, where Burst's true strength lies.
From what I measured, the only visible difference between Low and Medium was Pow; everything else was almost the same.
Considering the risk to accuracy, the benefits of using Low are likely smaller than the disadvantages.
Furthermore, if you utilize Burst Intrinsics in Unity, you can use fma and clz, for example, which is likely to make things even faster.
However, since such an implementation is specific to Burst, the implementation cost is very high, and, as described later, a native library is likely to be faster anyway, so this is hard to justify.
In most cases, P/Invoke is slightly faster.
This is especially noticeable in the slower implementations Compound and Pow.
It is believed that the speedup achieved by native compilation has a much greater impact than the overhead of P/Invoke.
However, it should be noted that native libraries require the effort of being built for each environment.
It is certainly fast, but there are many points to be aware of. Managed code is easy and convenient.