[FEA] JIT LTO Pairwise Distances#2099

Open: tarang-jain wants to merge 30 commits into rapidsai:main from tarang-jain:jit-lto-pw


tarang-jain commented May 15, 2026

Refactor the sm60 dispatch to use new fragments for:

  1. distance_op
  2. distance epilog

[Note 05/19]:
The fused path calls PairwiseDistances directly, which in turn substitutes fragments for compute_distance, and this leads to symbol lookup errors. We therefore need to keep the non-JIT path around for the fused reductions (discussed with @divyegala).

libcuvs.so size (CUDA 13.2): 255.92 MB -> 238.41 MB
libcuvs.so size (CUDA 12.9): 487.15 MB -> 448.81 MB
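
To put the size numbers above in relative terms, here is a minimal sketch that computes the absolute and percentage reduction for each CUDA version. The helper name `size_reduction` is hypothetical and only used for illustration; the inputs are the sizes quoted above.

```python
def size_reduction(before_mb: float, after_mb: float) -> tuple[float, float]:
    """Return (MB saved, percent saved) for a before/after binary size."""
    saved_mb = before_mb - after_mb
    return saved_mb, 100.0 * saved_mb / before_mb


# libcuvs.so sizes quoted in this PR description (MB).
sizes = {
    "CUDA 13.2": (255.92, 238.41),
    "CUDA 12.9": (487.15, 448.81),
}

for cuda, (before, after) in sizes.items():
    saved, pct = size_reduction(before, after)
    print(f"{cuda}: {saved:.2f} MB smaller ({pct:.2f}%)")
```

This works out to a reduction of roughly 6.8% for CUDA 13.2 and 7.9% for CUDA 12.9.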

Benchmarks to check for regressions:
Hardware: H100
cold_before_ms: benchmark on main, without warmup runs
cold_after_ms: benchmark on this PR, without warmup runs
warm_before_ms: benchmark on main after warmup runs (median over 20 runs)
warm_after_ms: benchmark on this PR after warmup runs (median over 20 runs)
cold_x, warm_x: ratio before/after, so values above 1.00x mean the PR is faster
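
The cold/warm methodology described above can be sketched roughly as follows. This is not the script used for this PR: `benchmark`, `time_ms`, and `speedup` are hypothetical names, the warmup count of 3 is an assumption (only the 20 timed runs come from the description), and `fn` is assumed to block until the GPU work has finished (for example by synchronizing the stream inside it).

```python
import statistics
import time


def time_ms(fn) -> float:
    """Wall-clock time of a single call to fn, in milliseconds."""
    start = time.perf_counter()
    fn()  # assumed to return only after the GPU work is complete
    return (time.perf_counter() - start) * 1e3


def benchmark(fn, warmup: int = 3, runs: int = 20) -> tuple[float, float]:
    """Return (cold_ms, warm_ms) for fn.

    cold_ms is the very first call, with no warmup.
    warm_ms is the median over `runs` calls made after `warmup` extra calls.
    """
    cold_ms = time_ms(fn)
    for _ in range(warmup):  # warmup count is an assumption
        fn()
    warm_ms = statistics.median(time_ms(fn) for _ in range(runs))
    return cold_ms, warm_ms


def speedup(before_ms: float, after_ms: float) -> float:
    """Ratio before/after; values above 1.0 mean the 'after' build is faster."""
    return before_ms / after_ms
```

Running `benchmark` once against a build of main and once against a build of this PR, then passing the two results to `speedup`, would give ratios in the same sense as the cold_x and warm_x columns.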

metric dtype m n k layout cold_before_ms cold_after_ms cold_x warm_before_ms warm_after_ms warm_x
canberra float32 1024 1024 3 C 0.755 497.942 0.00x 0.128 0.128 1.00x
canberra float32 1024 1024 8 C 0.124 0.127 0.97x 0.115 0.115 1.00x
canberra float32 1024 1024 16 C 0.101 0.101 1.00x 0.095 0.095 1.00x
canberra float32 1024 1024 64 C 0.094 0.090 1.04x 0.088 0.088 1.00x
canberra float32 1024 1024 256 C 0.290 0.289 1.00x 0.287 0.287 1.00x
canberra float32 4096 4096 3 C 2.708 2.728 0.99x 2.694 2.724 0.99x
canberra float32 4096 4096 8 C 2.391 2.274 1.05x 2.378 2.365 1.01x
canberra float32 4096 4096 16 C 1.857 1.806 1.03x 1.868 1.876 1.00x
canberra float32 4096 4096 64 C 1.517 1.515 1.00x 1.517 1.518 1.00x
canberra float32 4096 4096 256 C 5.968 5.961 1.00x 5.972 5.965 1.00x
canberra float32 8192 8192 3 C 8.422 8.501 0.99x 8.411 8.497 0.99x
canberra float32 8192 8192 8 C 7.399 7.460 0.99x 7.385 7.454 0.99x
canberra float32 8192 8192 16 C 5.753 5.795 0.99x 5.747 5.792 0.99x
canberra float32 8192 8192 64 C 4.966 4.958 1.00x 4.962 4.965 1.00x
canberra float32 8192 8192 256 C 19.695 19.662 1.00x 19.694 19.673 1.00x
canberra float64 1024 1024 3 C 0.607 235.230 0.00x 0.103 0.104 0.99x
canberra float64 1024 1024 8 C 0.095 0.097 0.97x 0.084 0.085 0.99x
canberra float64 1024 1024 16 C 0.059 0.059 1.00x 0.055 0.054 1.01x
canberra float64 1024 1024 64 C 0.151 0.152 0.99x 0.150 0.150 1.00x
canberra float64 1024 1024 256 C 0.535 0.539 0.99x 0.534 0.536 1.00x
canberra float64 4096 4096 3 C 2.058 2.078 0.99x 2.058 2.083 0.99x
canberra float64 4096 4096 8 C 1.594 1.611 0.99x 1.592 1.612 0.99x
canberra float64 4096 4096 16 C 0.832 0.833 1.00x 0.775 0.778 1.00x
canberra float64 4096 4096 64 C 2.987 3.006 0.99x 2.986 3.003 0.99x
canberra float64 4096 4096 256 C 11.829 11.897 0.99x 11.830 11.902 0.99x
canberra float64 8192 8192 3 C 6.200 6.343 0.98x 6.192 6.334 0.98x
canberra float64 8192 8192 8 C 4.785 4.875 0.98x 4.779 4.871 0.98x
canberra float64 8192 8192 16 C 2.514 2.519 1.00x 2.510 2.518 1.00x
canberra float64 8192 8192 64 C 9.821 9.881 0.99x 9.820 9.879 0.99x
canberra float64 8192 8192 256 C 39.072 39.308 0.99x 39.086 39.319 0.99x
chebyshev float32 1024 1024 3 C 0.958 500.151 0.00x 0.026 0.026 1.02x
chebyshev float32 1024 1024 8 C 0.090 358.533 0.00x 0.025 0.025 1.00x
chebyshev float32 1024 1024 16 C 0.029 0.043 0.68x 0.025 0.025 1.02x
chebyshev float32 1024 1024 64 C 0.032 0.035 0.91x 0.029 0.029 1.02x
chebyshev float32 1024 1024 256 C 0.054 0.055 0.99x 0.051 0.051 1.01x
chebyshev float32 4096 4096 3 C 0.121 0.123 0.99x 0.119 0.117 1.02x
chebyshev float32 4096 4096 8 C 0.103 0.102 1.01x 0.103 0.104 0.99x
chebyshev float32 4096 4096 16 C 0.099 0.107 0.93x 0.104 0.105 0.99x
chebyshev float32 4096 4096 64 C 0.176 0.184 0.96x 0.179 0.179 1.00x
chebyshev float32 4096 4096 256 C 0.594 0.601 0.99x 0.603 0.605 1.00x
chebyshev float32 8192 8192 3 C 0.345 0.336 1.03x 0.343 0.338 1.02x
chebyshev float32 8192 8192 8 C 0.289 0.298 0.97x 0.291 0.292 1.00x
chebyshev float32 8192 8192 16 C 0.289 0.294 0.98x 0.292 0.292 1.00x
chebyshev float32 8192 8192 64 C 0.540 0.537 1.01x 0.543 0.542 1.00x
chebyshev float32 8192 8192 256 C 1.933 1.945 0.99x 1.943 1.954 0.99x
chebyshev float64 1024 1024 3 C 1.082 377.472 0.00x 0.030 0.030 1.03x
chebyshev float64 1024 1024 8 C 0.128 344.873 0.00x 0.029 0.029 1.00x
chebyshev float64 1024 1024 16 C 0.035 0.041 0.85x 0.029 0.029 1.02x
chebyshev float64 1024 1024 64 C 0.052 0.057 0.92x 0.050 0.050 0.99x
chebyshev float64 1024 1024 256 C 0.131 0.138 0.95x 0.131 0.133 0.98x
chebyshev float64 4096 4096 3 C 0.196 0.179 1.10x 0.193 0.172 1.12x
chebyshev float64 4096 4096 8 C 0.173 0.172 1.01x 0.170 0.167 1.02x
chebyshev float64 4096 4096 16 C 0.171 0.166 1.03x 0.170 0.168 1.01x
chebyshev float64 4096 4096 64 C 0.562 0.579 0.97x 0.564 0.577 0.98x
chebyshev float64 4096 4096 256 C 2.136 2.178 0.98x 2.143 2.185 0.98x
chebyshev float64 8192 8192 3 C 0.613 0.528 1.16x 0.591 0.516 1.15x
chebyshev float64 8192 8192 8 C 0.528 0.506 1.04x 0.518 0.501 1.03x
chebyshev float64 8192 8192 16 C 0.523 0.502 1.04x 0.520 0.504 1.03x
chebyshev float64 8192 8192 64 C 1.870 1.948 0.96x 1.891 1.948 0.97x
chebyshev float64 8192 8192 256 C 7.042 7.233 0.97x 7.068 7.225 0.98x
cityblock float32 1024 1024 3 C 0.037 0.042 0.87x 0.026 0.025 1.01x
cityblock float32 1024 1024 8 C 0.029 0.030 0.95x 0.025 0.025 1.00x
cityblock float32 1024 1024 16 C 0.027 0.027 0.97x 0.025 0.025 1.01x
cityblock float32 1024 1024 64 C 0.030 0.035 0.85x 0.029 0.029 1.02x
cityblock float32 1024 1024 256 C 0.050 0.051 0.97x 0.051 0.050 1.01x
cityblock float32 4096 4096 3 C 0.122 0.120 1.02x 0.119 0.118 1.01x
cityblock float32 4096 4096 8 C 0.101 0.099 1.02x 0.104 0.103 1.01x
cityblock float32 4096 4096 16 C 0.099 0.099 1.00x 0.104 0.103 1.01x
cityblock float32 4096 4096 64 C 0.176 0.173 1.02x 0.177 0.176 1.01x
cityblock float32 4096 4096 256 C 0.583 0.578 1.01x 0.588 0.586 1.00x
cityblock float32 8192 8192 3 C 0.347 0.339 1.02x 0.343 0.342 1.00x
cityblock float32 8192 8192 8 C 0.293 0.288 1.02x 0.293 0.290 1.01x
cityblock float32 8192 8192 16 C 0.292 0.288 1.01x 0.295 0.292 1.01x
cityblock float32 8192 8192 64 C 0.530 0.526 1.01x 0.536 0.533 1.01x
cityblock float32 8192 8192 256 C 1.886 1.879 1.00x 1.896 1.889 1.00x
cityblock float64 1024 1024 3 C 0.033 0.032 1.02x 0.026 0.025 1.02x
cityblock float64 1024 1024 8 C 0.030 0.028 1.10x 0.026 0.025 1.02x
cityblock float64 1024 1024 16 C 0.028 0.025 1.10x 0.026 0.025 1.03x
cityblock float64 1024 1024 64 C 0.037 0.034 1.08x 0.035 0.035 1.01x
cityblock float64 1024 1024 256 C 0.072 0.073 0.98x 0.072 0.072 1.01x
cityblock float64 4096 4096 3 C 0.105 0.106 0.99x 0.102 0.104 0.98x
cityblock float64 4096 4096 8 C 0.098 0.096 1.03x 0.098 0.098 1.00x
cityblock float64 4096 4096 16 C 0.098 0.101 0.97x 0.100 0.100 1.01x
cityblock float64 4096 4096 64 C 0.275 0.279 0.99x 0.278 0.277 1.00x
cityblock float64 4096 4096 256 C 0.983 0.987 1.00x 0.989 0.990 1.00x
cityblock float64 8192 8192 3 C 0.286 0.293 0.98x 0.286 0.293 0.98x
cityblock float64 8192 8192 8 C 0.270 0.273 0.99x 0.273 0.274 1.00x
cityblock float64 8192 8192 16 C 0.278 0.283 0.98x 0.280 0.280 1.00x
cityblock float64 8192 8192 64 C 0.861 0.862 1.00x 0.859 0.862 1.00x
cityblock float64 8192 8192 256 C 3.200 3.200 1.00x 3.211 3.208 1.00x
correlation float32 1024 1024 3 C 1.306 481.815 0.00x 0.043 0.044 1.00x
correlation float32 1024 1024 8 C 0.152 326.839 0.00x 0.043 0.043 1.00x
correlation float32 1024 1024 16 C 0.084 0.099 0.84x 0.043 0.043 0.99x
correlation float32 1024 1024 64 C 0.085 0.097 0.88x 0.046 0.046 1.00x
correlation float32 1024 1024 256 C 0.100 0.105 0.95x 0.061 0.061 1.00x
correlation float32 4096 4096 3 C 0.182 0.180 1.01x 0.166 0.167 1.00x
correlation float32 4096 4096 8 C 0.171 0.171 1.00x 0.166 0.166 1.00x
correlation float32 4096 4096 16 C 0.169 0.167 1.01x 0.165 0.166 1.00x
correlation float32 4096 4096 64 C 0.230 0.233 0.99x 0.228 0.228 1.00x
correlation float32 4096 4096 256 C 0.485 0.489 0.99x 0.489 0.490 1.00x
correlation float32 8192 8192 3 C 0.477 0.481 0.99x 0.471 0.476 0.99x
correlation float32 8192 8192 8 C 0.461 0.459 1.00x 0.462 0.460 1.00x
correlation float32 8192 8192 16 C 0.465 0.462 1.01x 0.463 0.461 1.01x
correlation float32 8192 8192 64 C 0.587 0.667 0.88x 0.581 0.660 0.88x
correlation float32 8192 8192 256 C 1.513 1.514 1.00x 1.525 1.524 1.00x
correlation float64 1024 1024 3 C 0.887 335.799 0.00x 0.045 0.046 0.98x
correlation float64 1024 1024 8 C 0.152 290.216 0.00x 0.046 0.047 0.99x
correlation float64 1024 1024 16 C 0.082 0.097 0.85x 0.046 0.047 0.98x
correlation float64 1024 1024 64 C 0.111 0.097 1.14x 0.054 0.055 0.99x
correlation float64 1024 1024 256 C 0.117 0.121 0.96x 0.084 0.085 0.99x
correlation float64 4096 4096 3 C 0.217 0.207 1.05x 0.194 0.197 0.99x
correlation float64 4096 4096 8 C 0.217 0.198 1.09x 0.212 0.209 1.02x
correlation float64 4096 4096 16 C 0.219 0.209 1.05x 0.214 0.211 1.01x
correlation float64 4096 4096 64 C 0.336 0.338 0.99x 0.335 0.336 1.00x
correlation float64 4096 4096 256 C 0.895 0.909 0.98x 0.898 0.914 0.98x
correlation float64 8192 8192 3 C 0.626 0.632 0.99x 0.618 0.628 0.98x
correlation float64 8192 8192 8 C 0.625 0.614 1.02x 0.620 0.609 1.02x
correlation float64 8192 8192 16 C 0.621 0.612 1.02x 0.621 0.612 1.01x
correlation float64 8192 8192 64 C 1.000 1.007 0.99x 1.002 1.007 0.99x
correlation float64 8192 8192 256 C 2.851 2.913 0.98x 2.869 2.927 0.98x
cosine float32 1024 1024 3 C 3.044 1.421 2.14x 0.052 0.049 1.07x
cosine float32 1024 1024 8 C 0.269 0.229 1.17x 0.051 0.048 1.08x
cosine float32 1024 1024 16 C 0.091 0.091 1.00x 0.051 0.048 1.08x
cosine float32 1024 1024 64 C 0.094 0.090 1.03x 0.059 0.055 1.07x
cosine float32 1024 1024 256 C 0.173 0.116 1.49x 0.087 0.082 1.07x
cosine float32 4096 4096 3 C 0.144 0.134 1.08x 0.115 0.111 1.03x
cosine float32 4096 4096 8 C 0.143 0.129 1.11x 0.116 0.112 1.04x
cosine float32 4096 4096 16 C 0.126 0.120 1.05x 0.114 0.112 1.02x
cosine float32 4096 4096 64 C 0.171 0.163 1.04x 0.158 0.158 1.00x
cosine float32 4096 4096 256 C 0.324 0.326 0.99x 0.332 0.329 1.01x
cosine float32 8192 8192 3 C 0.365 0.351 1.04x 0.338 0.335 1.01x
cosine float32 8192 8192 8 C 0.358 0.351 1.02x 0.336 0.333 1.01x
cosine float32 8192 8192 16 C 0.347 0.341 1.02x 0.331 0.330 1.00x
cosine float32 8192 8192 64 C 0.497 0.501 0.99x 0.498 0.496 1.00x
cosine float32 8192 8192 256 C 1.151 1.151 1.00x 1.117 1.159 0.96x
cosine float64 1024 1024 3 C 1.278 0.613 2.08x 0.044 0.041 1.06x
cosine float64 1024 1024 8 C 0.116 0.091 1.28x 0.044 0.041 1.08x
cosine float64 1024 1024 16 C 0.477 0.620 0.77x 0.045 0.042 1.08x
cosine float64 1024 1024 64 C 0.095 0.091 1.05x 0.048 0.045 1.07x
cosine float64 1024 1024 256 C 0.108 0.102 1.06x 0.064 0.062 1.04x
cosine float64 4096 4096 3 C 0.127 0.123 1.03x 0.112 0.111 1.01x
cosine float64 4096 4096 8 C 0.124 0.120 1.04x 0.113 0.111 1.01x
cosine float64 4096 4096 16 C 0.123 0.117 1.06x 0.113 0.112 1.01x
cosine float64 4096 4096 64 C 0.190 0.184 1.03x 0.176 0.179 0.98x
cosine float64 4096 4096 256 C 0.419 0.426 0.98x 0.412 0.406 1.02x
cosine float64 8192 8192 3 C 0.328 0.319 1.03x 0.316 0.311 1.02x
cosine float64 8192 8192 8 C 0.331 0.316 1.05x 0.317 0.312 1.02x
cosine float64 8192 8192 16 C 0.325 0.316 1.03x 0.318 0.314 1.01x
cosine float64 8192 8192 64 C 0.571 0.562 1.02x 0.569 0.564 1.01x
cosine float64 8192 8192 256 C 1.462 1.444 1.01x 1.534 1.453 1.06x
euclidean float32 1024 1024 3 C 6.477 5.780 1.12x 0.053 0.052 1.01x
euclidean float32 1024 1024 8 C 0.395 0.372 1.06x 0.051 0.051 1.01x
euclidean float32 1024 1024 16 C 0.087 0.090 0.97x 0.051 0.050 1.01x
euclidean float32 1024 1024 64 C 0.090 0.094 0.96x 0.058 0.057 1.01x
euclidean float32 1024 1024 256 C 0.115 0.115 1.00x 0.084 0.083 1.01x
euclidean float32 4096 4096 3 C 0.173 0.167 1.04x 0.125 0.125 1.00x
euclidean float32 4096 4096 8 C 0.156 0.151 1.03x 0.125 0.123 1.01x
euclidean float32 4096 4096 16 C 0.134 0.130 1.03x 0.124 0.123 1.01x
euclidean float32 4096 4096 64 C 0.175 0.174 1.01x 0.167 0.167 1.00x
euclidean float32 4096 4096 256 C 0.360 0.363 0.99x 0.331 0.331 1.00x
euclidean float32 8192 8192 3 C 0.424 0.425 1.00x 0.381 0.381 1.00x
euclidean float32 8192 8192 8 C 0.401 0.404 0.99x 0.375 0.369 1.01x
euclidean float32 8192 8192 16 C 0.383 0.379 1.01x 0.373 0.368 1.01x
euclidean float32 8192 8192 64 C 0.543 0.542 1.00x 0.540 0.535 1.01x
euclidean float32 8192 8192 256 C 1.174 1.155 1.02x 1.176 1.164 1.01x
euclidean float64 1024 1024 3 C 1.135 0.783 1.45x 0.042 0.045 0.94x
euclidean float64 1024 1024 8 C 0.090 0.083 1.09x 0.042 0.045 0.94x
euclidean float64 1024 1024 16 C 0.078 0.083 0.93x 0.042 0.044 0.97x
euclidean float64 1024 1024 64 C 0.073 0.077 0.95x 0.046 0.045 1.01x
euclidean float64 1024 1024 256 C 0.087 0.088 0.98x 0.061 0.062 0.99x
euclidean float64 4096 4096 3 C 0.120 0.123 0.98x 0.113 0.113 0.99x
euclidean float64 4096 4096 8 C 0.119 0.124 0.96x 0.112 0.114 0.98x
euclidean float64 4096 4096 16 C 0.119 0.119 1.00x 0.113 0.114 0.99x
euclidean float64 4096 4096 64 C 0.183 0.187 0.98x 0.175 0.180 0.97x
euclidean float64 4096 4096 256 C 0.416 0.431 0.97x 0.416 0.433 0.96x
euclidean float64 8192 8192 3 C 0.390 0.391 1.00x 0.332 0.348 0.95x
euclidean float64 8192 8192 8 C 0.352 0.362 0.97x 0.333 0.350 0.95x
euclidean float64 8192 8192 16 C 0.346 0.359 0.97x 0.335 0.352 0.95x
euclidean float64 8192 8192 64 C 0.585 0.619 0.94x 0.578 0.611 0.95x
euclidean float64 8192 8192 256 C 1.506 1.601 0.94x 1.607 1.610 1.00x
hamming float32 1024 1024 3 C 1.063 445.932 0.00x 0.026 0.026 1.00x
hamming float32 1024 1024 8 C 0.095 311.794 0.00x 0.025 0.025 0.99x
hamming float32 1024 1024 16 C 0.032 0.036 0.89x 0.025 0.025 1.01x
hamming float32 1024 1024 64 C 0.032 0.033 0.95x 0.029 0.029 1.01x
hamming float32 1024 1024 256 C 0.056 0.054 1.03x 0.051 0.051 1.01x
hamming float32 4096 4096 3 C 0.121 0.122 0.98x 0.118 0.119 1.00x
hamming float32 4096 4096 8 C 0.100 0.102 0.98x 0.102 0.103 1.00x
hamming float32 4096 4096 16 C 0.104 0.101 1.03x 0.103 0.103 1.00x
hamming float32 4096 4096 64 C 0.173 0.173 1.00x 0.177 0.177 1.00x
hamming float32 4096 4096 256 C 0.596 0.600 0.99x 0.605 0.602 1.01x
hamming float32 8192 8192 3 C 0.344 0.343 1.00x 0.342 0.345 0.99x
hamming float32 8192 8192 8 C 0.288 0.289 1.00x 0.289 0.291 0.99x
hamming float32 8192 8192 16 C 0.286 0.287 1.00x 0.289 0.292 0.99x
hamming float32 8192 8192 64 C 0.531 0.530 1.00x 0.535 0.536 1.00x
hamming float32 8192 8192 256 C 1.945 1.934 1.01x 1.954 1.943 1.01x
hamming float64 1024 1024 3 C 0.680 322.386 0.00x 0.026 0.030 0.89x
hamming float64 1024 1024 8 C 0.097 268.425 0.00x 0.027 0.027 1.01x
hamming float64 1024 1024 16 C 0.031 0.039 0.81x 0.027 0.026 1.02x
hamming float64 1024 1024 64 C 0.039 0.042 0.93x 0.038 0.038 1.01x
hamming float64 1024 1024 256 C 0.089 0.082 1.08x 0.082 0.081 1.00x
hamming float64 4096 4096 3 C 0.110 0.164 0.67x 0.110 0.163 0.67x
hamming float64 4096 4096 8 C 0.114 0.113 1.01x 0.115 0.114 1.01x
hamming float64 4096 4096 16 C 0.117 0.114 1.03x 0.116 0.116 1.01x
hamming float64 4096 4096 64 C 0.325 0.325 1.00x 0.327 0.328 0.99x
hamming float64 4096 4096 256 C 1.160 1.163 1.00x 1.165 1.170 1.00x
hamming float64 8192 8192 3 C 0.315 0.479 0.66x 0.311 0.481 0.65x
hamming float64 8192 8192 8 C 0.325 0.319 1.02x 0.323 0.318 1.01x
hamming float64 8192 8192 16 C 0.327 0.321 1.02x 0.326 0.322 1.01x
hamming float64 8192 8192 64 C 1.023 1.020 1.00x 1.021 1.024 1.00x
hamming float64 8192 8192 256 C 3.792 3.800 1.00x 3.800 3.812 1.00x
hellinger float32 1024 1024 3 C 1.054 350.515 0.00x 0.037 0.036 1.03x
hellinger float32 1024 1024 8 C 0.096 249.521 0.00x 0.035 0.035 1.02x
hellinger float32 1024 1024 16 C 0.042 0.047 0.91x 0.036 0.034 1.03x
hellinger float32 1024 1024 64 C 0.046 0.043 1.08x 0.038 0.037 1.01x
hellinger float32 1024 1024 256 C 0.055 0.057 0.96x 0.053 0.053 1.00x
hellinger float32 4096 4096 3 C 0.167 0.171 0.97x 0.165 0.165 1.00x
hellinger float32 4096 4096 8 C 0.152 0.156 0.98x 0.153 0.153 1.00x
hellinger float32 4096 4096 16 C 0.150 0.154 0.98x 0.153 0.153 1.00x
hellinger float32 4096 4096 64 C 0.190 0.184 1.03x 0.189 0.187 1.01x
hellinger float32 4096 4096 256 C 0.474 0.471 1.01x 0.478 0.477 1.00x
hellinger float32 8192 8192 3 C 0.473 0.471 1.00x 0.475 0.474 1.00x
hellinger float32 8192 8192 8 C 0.437 0.436 1.00x 0.440 0.439 1.00x
hellinger float32 8192 8192 16 C 0.450 0.440 1.02x 0.440 0.439 1.00x
hellinger float32 8192 8192 64 C 0.543 0.541 1.00x 0.545 0.545 1.00x
hellinger float32 8192 8192 256 C 1.505 1.506 1.00x 1.509 1.512 1.00x
hellinger float64 1024 1024 3 C 0.670 232.625 0.00x 0.037 0.037 1.00x
hellinger float64 1024 1024 8 C 0.092 208.067 0.00x 0.037 0.037 1.00x
hellinger float64 1024 1024 16 C 0.042 0.048 0.87x 0.037 0.037 1.00x
hellinger float64 1024 1024 64 C 0.047 0.050 0.94x 0.045 0.045 1.00x
hellinger float64 1024 1024 256 C 0.078 0.080 0.98x 0.076 0.077 0.99x
hellinger float64 4096 4096 3 C 0.177 0.182 0.97x 0.175 0.177 0.99x
hellinger float64 4096 4096 8 C 0.170 0.171 1.00x 0.169 0.169 1.00x
hellinger float64 4096 4096 16 C 0.187 0.170 1.10x 0.186 0.171 1.09x
hellinger float64 4096 4096 64 C 0.300 0.306 0.98x 0.303 0.308 0.98x
hellinger float64 4096 4096 256 C 0.867 0.885 0.98x 0.869 0.886 0.98x
hellinger float64 8192 8192 3 C 0.514 0.515 1.00x 0.511 0.515 0.99x
hellinger float64 8192 8192 8 C 0.490 0.495 0.99x 0.491 0.491 1.00x
hellinger float64 8192 8192 16 C 0.498 0.497 1.00x 0.546 0.498 1.10x
hellinger float64 8192 8192 64 C 0.920 0.935 0.98x 0.923 0.937 0.98x
hellinger float64 8192 8192 256 C 2.786 2.841 0.98x 2.786 2.843 0.98x
inner_product float32 1024 1024 3 C 69.165 64.390 1.07x 0.033 0.032 1.05x
inner_product float32 1024 1024 8 C 0.063 0.067 0.94x 0.033 0.031 1.06x
inner_product float32 1024 1024 16 C 0.147 0.149 0.99x 0.032 0.033 0.98x
inner_product float32 1024 1024 64 C 0.063 0.063 1.00x 0.035 0.037 0.96x
inner_product float32 1024 1024 256 C 0.072 0.069 1.04x 0.049 0.049 1.01x
inner_product float32 4096 4096 3 C 0.237 0.233 1.02x 0.064 0.063 1.01x
inner_product float32 4096 4096 8 C 0.145 0.150 0.97x 0.062 0.063 1.00x
inner_product float32 4096 4096 16 C 0.092 0.091 1.01x 0.070 0.070 0.99x
inner_product float32 4096 4096 64 C 0.134 0.138 0.97x 0.117 0.117 1.00x
inner_product float32 4096 4096 256 C 0.434 0.436 1.00x 0.268 0.267 1.00x
inner_product float32 8192 8192 3 C 0.179 0.177 1.01x 0.151 0.150 1.01x
inner_product float32 8192 8192 8 C 0.174 0.175 0.99x 0.155 0.156 1.00x
inner_product float32 8192 8192 16 C 0.201 0.206 0.98x 0.187 0.189 0.99x
inner_product float32 8192 8192 64 C 0.379 0.384 0.99x 0.361 0.362 1.00x
inner_product float32 8192 8192 256 C 0.993 0.995 1.00x 0.975 0.975 1.00x
inner_product float64 1024 1024 3 C 30.439 30.640 0.99x 0.034 0.034 1.00x
inner_product float64 1024 1024 8 C 0.121 0.118 1.03x 0.035 0.035 1.00x
inner_product float64 1024 1024 16 C 0.141 0.141 1.00x 0.033 0.033 1.01x
inner_product float64 1024 1024 64 C 0.060 0.060 1.00x 0.035 0.035 1.00x
inner_product float64 1024 1024 256 C 0.118 0.119 0.99x 0.043 0.043 1.00x
inner_product float64 4096 4096 3 C 0.116 0.118 0.98x 0.096 0.096 1.00x
inner_product float64 4096 4096 8 C 0.330 0.334 0.99x 0.093 0.092 1.01x
inner_product float64 4096 4096 16 C 0.118 0.112 1.05x 0.093 0.092 1.01x
inner_product float64 4096 4096 64 C 0.134 0.132 1.01x 0.104 0.104 1.00x
inner_product float64 4096 4096 256 C 0.251 0.253 0.99x 0.232 0.232 1.00x
inner_product float64 8192 8192 3 C 0.368 0.371 0.99x 0.299 0.297 1.00x
inner_product float64 8192 8192 8 C 0.293 0.294 1.00x 0.274 0.274 1.00x
inner_product float64 8192 8192 16 C 0.346 0.343 1.01x 0.329 0.329 1.00x
inner_product float64 8192 8192 64 C 0.337 0.331 1.02x 0.311 0.309 1.01x
inner_product float64 8192 8192 256 C 0.813 0.805 1.01x 0.809 0.804 1.01x
jensenshannon float32 1024 1024 3 C 0.808 574.979 0.00x 0.107 0.107 1.00x
jensenshannon float32 1024 1024 8 C 0.117 0.119 0.99x 0.107 0.106 1.01x
jensenshannon float32 1024 1024 16 C 0.112 0.115 0.97x 0.107 0.106 1.01x
jensenshannon float32 1024 1024 64 C 0.193 0.194 1.00x 0.190 0.189 1.00x
jensenshannon float32 1024 1024 256 C 0.692 0.690 1.00x 0.688 0.689 1.00x
jensenshannon float32 4096 4096 3 C 1.684 1.674 1.01x 1.678 1.670 1.00x
jensenshannon float32 4096 4096 8 C 1.682 1.673 1.01x 1.681 1.671 1.01x
jensenshannon float32 4096 4096 16 C 1.682 1.667 1.01x 1.681 1.671 1.01x
jensenshannon float32 4096 4096 64 C 3.291 3.278 1.00x 3.287 3.275 1.00x
jensenshannon float32 4096 4096 256 C 12.922 12.895 1.00x 12.918 12.900 1.00x
jensenshannon float32 8192 8192 3 C 5.505 5.474 1.01x 5.499 5.470 1.01x
jensenshannon float32 8192 8192 8 C 5.505 5.478 1.00x 5.498 5.470 1.01x
jensenshannon float32 8192 8192 16 C 5.501 5.472 1.01x 5.500 5.470 1.01x
jensenshannon float32 8192 8192 64 C 10.832 10.795 1.00x 10.822 10.792 1.00x
jensenshannon float32 8192 8192 256 C 42.699 42.612 1.00x 42.704 42.619 1.00x
jensenshannon float64 1024 1024 3 C 1.769 1831.767 0.00x 0.268 0.269 1.00x
jensenshannon float64 1024 1024 8 C 0.278 0.284 0.98x 0.269 0.269 1.00x
jensenshannon float64 1024 1024 16 C 0.278 0.275 1.01x 0.268 0.270 0.99x
jensenshannon float64 1024 1024 64 C 1.111 1.046 1.06x 1.061 1.053 1.01x
jensenshannon float64 1024 1024 256 C 4.196 4.152 1.01x 4.215 4.211 1.00x
jensenshannon float64 4096 4096 3 C 5.294 5.264 1.01x 5.262 5.268 1.00x
jensenshannon float64 4096 4096 8 C 5.276 5.257 1.00x 5.273 5.280 1.00x
jensenshannon float64 4096 4096 16 C 5.274 5.287 1.00x 5.280 5.280 1.00x
jensenshannon float64 4096 4096 64 C 23.512 22.845 1.03x 23.241 22.919 1.01x
jensenshannon float64 4096 4096 256 C 90.298 97.378 0.93x 93.772 96.524 0.97x
jensenshannon float64 8192 8192 3 C 18.319 18.465 0.99x 18.110 18.085 1.00x
jensenshannon float64 8192 8192 8 C 18.023 17.990 1.00x 18.108 18.174 1.00x
jensenshannon float64 8192 8192 16 C 18.328 18.183 1.01x 18.174 18.182 1.00x
jensenshannon float64 8192 8192 64 C 82.497 81.423 1.01x 81.730 81.544 1.00x
jensenshannon float64 8192 8192 256 C 341.630 337.802 1.01x 339.172 337.489 1.00x
kl_divergence float32 1024 1024 3 C 1.203 1139.463 0.00x 0.095 0.094 1.01x
kl_divergence float32 1024 1024 8 C 0.105 0.109 0.97x 0.095 0.094 1.02x
kl_divergence float32 1024 1024 16 C 0.106 0.103 1.03x 0.097 0.094 1.03x
kl_divergence float32 1024 1024 64 C 0.170 0.167 1.02x 0.161 0.159 1.01x
kl_divergence float32 1024 1024 256 C 0.558 0.565 0.99x 0.552 0.562 0.98x
kl_divergence float32 4096 4096 3 C 1.704 1.676 1.02x 1.708 1.679 1.02x
kl_divergence float32 4096 4096 8 C 1.722 1.684 1.02x 1.709 1.680 1.02x
kl_divergence float32 4096 4096 16 C 1.720 1.687 1.02x 1.710 1.680 1.02x
kl_divergence float32 4096 4096 64 C 3.478 3.484 1.00x 3.442 3.436 1.00x
kl_divergence float32 4096 4096 256 C 13.266 13.956 0.95x 13.377 13.596 0.98x
kl_divergence float32 8192 8192 3 C 5.677 5.581 1.02x 5.674 5.581 1.02x
kl_divergence float32 8192 8192 8 C 5.679 5.569 1.02x 5.663 5.570 1.02x
kl_divergence float32 8192 8192 16 C 5.649 5.587 1.01x 5.667 5.587 1.01x
kl_divergence float32 8192 8192 64 C 12.789 12.595 1.02x 11.655 12.541 0.93x
kl_divergence float32 8192 8192 256 C 47.794 48.375 0.99x 48.048 48.350 0.99x
kl_divergence float64 1024 1024 3 C 2.496 2875.166 0.00x 0.130 0.133 0.98x
kl_divergence float64 1024 1024 8 C 0.141 0.146 0.97x 0.130 0.132 0.98x
kl_divergence float64 1024 1024 16 C 0.137 0.138 0.99x 0.131 0.132 0.99x
kl_divergence float64 1024 1024 64 C 0.454 0.462 0.98x 0.450 0.445 1.01x
kl_divergence float64 1024 1024 256 C 1.682 1.716 0.98x 1.667 1.705 0.98x
kl_divergence float64 4096 4096 3 C 2.520 2.319 1.09x 2.513 2.430 1.03x
kl_divergence float64 4096 4096 8 C 2.478 2.303 1.08x 2.527 2.390 1.06x
kl_divergence float64 4096 4096 16 C 2.552 2.335 1.09x 2.541 2.443 1.04x
kl_divergence float64 4096 4096 64 C 11.060 12.640 0.88x 10.988 12.469 0.88x
kl_divergence float64 4096 4096 256 C 41.734 42.930 0.97x 41.354 44.327 0.93x
kl_divergence float64 8192 8192 3 C 8.397 8.152 1.03x 8.326 8.172 1.02x
kl_divergence float64 8192 8192 8 C 8.365 8.180 1.02x 8.339 8.178 1.02x
kl_divergence float64 8192 8192 16 C 8.381 8.207 1.02x 8.363 8.196 1.02x
kl_divergence float64 8192 8192 64 C 38.257 43.692 0.88x 37.888 43.262 0.88x
kl_divergence float64 8192 8192 256 C 141.810 161.265 0.88x 142.403 157.154 0.91x
l1 float32 1024 1024 3 C 1.101 479.678 0.00x 0.026 0.026 1.00x
l1 float32 1024 1024 8 C 0.097 309.765 0.00x 0.025 0.025 0.99x
l1 float32 1024 1024 16 C 0.031 0.041 0.76x 0.025 0.025 1.01x
l1 float32 1024 1024 64 C 0.032 0.035 0.91x 0.029 0.029 1.00x
l1 float32 1024 1024 256 C 0.052 0.053 0.99x 0.050 0.050 1.00x
l1 float32 4096 4096 3 C 0.122 0.124 0.99x 0.119 0.118 1.00x
l1 float32 4096 4096 8 C 0.109 0.102 1.07x 0.104 0.102 1.01x
l1 float32 4096 4096 16 C 0.104 0.102 1.02x 0.104 0.103 1.01x
l1 float32 4096 4096 64 C 0.173 0.175 0.99x 0.177 0.176 1.00x
l1 float32 4096 4096 256 C 0.582 0.579 1.00x 0.588 0.587 1.00x
l1 float32 8192 8192 3 C 0.342 0.348 0.98x 0.343 0.343 1.00x
l1 float32 8192 8192 8 C 0.293 0.287 1.02x 0.292 0.290 1.01x
l1 float32 8192 8192 16 C 0.291 0.288 1.01x 0.295 0.292 1.01x
l1 float32 8192 8192 64 C 0.537 0.527 1.02x 0.537 0.534 1.01x
l1 float32 8192 8192 256 C 1.886 1.880 1.00x 1.896 1.889 1.00x
l1 float64 1024 1024 3 C 0.616 247.160 0.00x 0.026 0.026 0.98x
l1 float64 1024 1024 8 C 0.080 203.891 0.00x 0.025 0.026 0.99x
l1 float64 1024 1024 16 C 0.029 0.038 0.75x 0.026 0.025 1.02x
l1 float64 1024 1024 64 C 0.036 0.043 0.85x 0.035 0.035 1.01x
l1 float64 1024 1024 256 C 0.075 0.077 0.98x 0.072 0.072 1.01x
l1 float64 4096 4096 3 C 0.106 0.110 0.96x 0.102 0.104 0.98x
l1 float64 4096 4096 8 C 0.097 0.104 0.94x 0.098 0.098 1.00x
l1 float64 4096 4096 16 C 0.100 0.102 0.98x 0.100 0.100 0.99x
l1 float64 4096 4096 64 C 0.276 0.282 0.98x 0.277 0.278 1.00x
l1 float64 4096 4096 256 C 0.984 0.980 1.00x 0.987 0.987 1.00x
l1 float64 8192 8192 3 C 0.286 0.299 0.96x 0.286 0.294 0.97x
l1 float64 8192 8192 8 C 0.272 0.273 1.00x 0.272 0.273 1.00x
l1 float64 8192 8192 16 C 0.281 0.280 1.00x 0.280 0.279 1.00x
l1 float64 8192 8192 64 C 0.860 0.859 1.00x 0.860 0.862 1.00x
l1 float64 8192 8192 256 C 3.206 3.204 1.00x 3.216 3.215 1.00x
l2 float32 1024 1024 3 C 0.115 0.128 0.90x 0.051 0.050 1.01x
l2 float32 1024 1024 8 C 0.087 0.087 1.00x 0.050 0.049 1.01x
l2 float32 1024 1024 16 C 0.063 0.062 1.02x 0.050 0.050 1.00x
l2 float32 1024 1024 64 C 0.070 0.074 0.94x 0.057 0.056 1.01x
l2 float32 1024 1024 256 C 0.093 0.096 0.96x 0.083 0.083 1.00x
l2 float32 4096 4096 3 C 0.143 0.142 1.01x 0.124 0.124 1.00x
l2 float32 4096 4096 8 C 0.144 0.143 1.00x 0.124 0.123 1.01x
l2 float32 4096 4096 16 C 0.128 0.127 1.01x 0.123 0.123 1.00x
l2 float32 4096 4096 64 C 0.172 0.170 1.01x 0.166 0.167 1.00x
l2 float32 4096 4096 256 C 0.328 0.327 1.00x 0.332 0.332 1.00x
l2 float32 8192 8192 3 C 0.401 0.400 1.00x 0.380 0.380 1.00x
l2 float32 8192 8192 8 C 0.395 0.394 1.00x 0.375 0.374 1.00x
l2 float32 8192 8192 16 C 0.379 0.374 1.01x 0.372 0.372 1.00x
l2 float32 8192 8192 64 C 0.543 0.543 1.00x 0.540 0.540 1.00x
l2 float32 8192 8192 256 C 1.171 1.170 1.00x 1.178 1.178 1.00x
l2 float64 1024 1024 3 C 0.068 0.064 1.05x 0.042 0.041 1.01x
l2 float64 1024 1024 8 C 0.055 0.059 0.93x 0.042 0.042 1.01x
l2 float64 1024 1024 16 C 0.053 0.052 1.02x 0.042 0.042 1.01x
l2 float64 1024 1024 64 C 0.060 0.056 1.07x 0.046 0.046 1.01x
l2 float64 1024 1024 256 C 0.069 0.068 1.01x 0.063 0.062 1.00x
l2 float64 4096 4096 3 C 0.122 0.121 1.01x 0.116 0.116 1.00x
l2 float64 4096 4096 8 C 0.124 0.125 0.99x 0.116 0.116 1.00x
l2 float64 4096 4096 16 C 0.123 0.123 1.00x 0.117 0.117 1.00x
l2 float64 4096 4096 64 C 0.185 0.186 0.99x 0.183 0.177 1.03x
l2 float64 4096 4096 256 C 0.433 0.421 1.03x 0.440 0.420 1.05x
l2 float64 8192 8192 3 C 0.354 0.339 1.04x 0.349 0.334 1.05x
l2 float64 8192 8192 8 C 0.353 0.343 1.03x 0.351 0.334 1.05x
l2 float64 8192 8192 16 C 0.361 0.345 1.05x 0.353 0.336 1.05x
l2 float64 8192 8192 64 C 0.612 0.583 1.05x 0.611 0.582 1.05x
l2 float64 8192 8192 256 C 1.599 1.516 1.05x 1.608 1.525 1.05x
lp float32 1024 1024 3 C 0.187 0.185 1.01x 0.158 0.158 1.00x
lp float32 1024 1024 8 C 0.161 0.163 0.99x 0.158 0.158 1.00x
lp float32 1024 1024 16 C 0.160 0.159 1.00x 0.158 0.157 1.00x
lp float32 1024 1024 64 C 0.290 0.293 0.99x 0.293 0.293 1.00x
lp float32 1024 1024 256 C 1.076 1.077 1.00x 1.085 1.084 1.00x
lp float32 4096 4096 3 C 2.550 2.536 1.01x 2.536 2.542 1.00x
lp float32 4096 4096 8 C 2.530 2.552 0.99x 2.547 2.534 1.00x
lp float32 4096 4096 16 C 2.526 2.560 0.99x 2.543 2.542 1.00x
lp float32 4096 4096 64 C 5.119 5.158 0.99x 5.139 5.129 1.00x
lp float32 4096 4096 256 C 19.594 19.528 1.00x 19.616 19.345 1.01x
lp float32 8192 8192 3 C 8.510 8.511 1.00x 8.506 8.499 1.00x
lp float32 8192 8192 8 C 8.520 8.509 1.00x 8.513 8.513 1.00x
lp float32 8192 8192 16 C 8.523 8.541 1.00x 8.526 8.525 1.00x
lp float32 8192 8192 64 C 17.323 17.321 1.00x 17.353 17.294 1.00x
lp float32 8192 8192 256 C 65.735 65.360 1.01x 65.728 65.426 1.00x
lp float64 1024 1024 3 C 0.259 0.255 1.02x 0.230 0.229 1.00x
lp float64 1024 1024 8 C 0.230 0.235 0.98x 0.227 0.227 1.00x
lp float64 1024 1024 16 C 0.224 0.227 0.99x 0.222 0.221 1.00x
lp float64 1024 1024 64 C 0.790 0.784 1.01x 0.784 0.779 1.01x
lp float64 1024 1024 256 C 3.037 3.025 1.00x 3.033 3.012 1.01x
lp float64 4096 4096 3 C 4.423 4.474 0.99x 4.421 4.470 0.99x
lp float64 4096 4096 8 C 4.353 4.389 0.99x 4.354 4.387 0.99x
lp float64 4096 4096 16 C 4.247 4.256 1.00x 4.245 4.256 1.00x
lp float64 4096 4096 64 C 16.176 16.232 1.00x 16.178 16.228 1.00x
lp float64 4096 4096 256 C 63.825 64.085 1.00x 63.828 64.080 1.00x
lp float64 8192 8192 3 C 14.582 14.742 0.99x 14.571 14.727 0.99x
lp float64 8192 8192 8 C 14.357 14.466 0.99x 14.348 14.459 0.99x
lp float64 8192 8192 16 C 13.991 14.031 1.00x 13.988 14.027 1.00x
lp float64 8192 8192 64 C 53.493 53.653 1.00x 53.488 53.650 1.00x
lp float64 8192 8192 256 C 211.214 211.908 1.00x 211.114 211.925 1.00x
minkowski float32 1024 1024 3 C 1.026 834.363 0.00x 0.159 0.158 1.00x
minkowski float32 1024 1024 8 C 0.168 0.173 0.97x 0.158 0.158 1.00x
minkowski float32 1024 1024 16 C 0.168 0.165 1.02x 0.158 0.158 1.00x
minkowski float32 1024 1024 64 C 0.293 0.296 0.99x 0.294 0.294 1.00x
minkowski float32 1024 1024 256 C 1.079 1.102 0.98x 1.087 1.080 1.01x
minkowski float32 4096 4096 3 C 2.498 2.550 0.98x 2.533 2.539 1.00x
minkowski float32 4096 4096 8 C 2.504 2.534 0.99x 2.537 2.538 1.00x
minkowski float32 4096 4096 16 C 2.529 2.554 0.99x 2.544 2.547 1.00x
minkowski float32 4096 4096 64 C 5.131 5.176 0.99x 5.156 5.131 1.00x
minkowski float32 4096 4096 256 C 19.575 19.474 1.01x 19.585 19.500 1.00x
minkowski float32 8192 8192 3 C 8.510 8.488 1.00x 8.508 8.499 1.00x
minkowski float32 8192 8192 8 C 8.541 8.524 1.00x 8.511 8.503 1.00x
minkowski float32 8192 8192 16 C 8.529 8.542 1.00x 8.531 8.525 1.00x
minkowski float32 8192 8192 64 C 17.315 17.397 1.00x 17.367 17.302 1.00x
minkowski float32 8192 8192 256 C 65.760 65.372 1.01x 65.735 65.399 1.01x
minkowski float64 1024 1024 3 C 1.052 847.813 0.00x 0.230 0.229 1.00x
minkowski float64 1024 1024 8 C 0.237 0.239 0.99x 0.227 0.225 1.01x
minkowski float64 1024 1024 16 C 0.226 0.226 1.00x 0.222 0.220 1.01x
minkowski float64 1024 1024 64 C 0.786 0.779 1.01x 0.784 0.777 1.01x
minkowski float64 1024 1024 256 C 3.035 3.009 1.01x 3.033 3.008 1.01x
minkowski float64 4096 4096 3 C 4.423 4.471 0.99x 4.421 4.468 0.99x
minkowski float64 4096 4096 8 C 4.362 4.392 0.99x 4.353 4.387 0.99x
minkowski float64 4096 4096 16 C 4.249 4.262 1.00x 4.245 4.256 1.00x
minkowski float64 4096 4096 64 C 16.175 16.234 1.00x 16.177 16.231 1.00x
minkowski float64 4096 4096 256 C 63.824 64.077 1.00x 63.830 64.077 1.00x
minkowski float64 8192 8192 3 C 14.578 14.736 0.99x 14.569 14.730 0.99x
minkowski float64 8192 8192 8 C 14.358 14.466 0.99x 14.348 14.460 0.99x
minkowski float64 8192 8192 16 C 13.996 14.031 1.00x 13.988 14.028 1.00x
minkowski float64 8192 8192 64 C 53.524 53.650 1.00x 53.484 53.656 1.00x
minkowski float64 8192 8192 256 C 211.073 211.905 1.00x 211.102 211.917 1.00x
sqeuclidean float32 1024 1024 3 C 0.117 0.097 1.22x 0.051 0.046 1.10x
sqeuclidean float32 1024 1024 8 C 0.094 0.085 1.11x 0.049 0.046 1.06x
sqeuclidean float32 1024 1024 16 C 0.063 0.055 1.14x 0.047 0.046 1.02x
sqeuclidean float32 1024 1024 64 C 0.068 0.064 1.07x 0.054 0.052 1.03x
sqeuclidean float32 1024 1024 256 C 0.093 0.086 1.08x 0.080 0.077 1.03x
sqeuclidean float32 4096 4096 3 C 0.129 0.126 1.02x 0.107 0.104 1.03x
sqeuclidean float32 4096 4096 8 C 0.131 0.127 1.03x 0.108 0.105 1.03x
sqeuclidean float32 4096 4096 16 C 0.114 0.120 0.95x 0.108 0.105 1.03x
sqeuclidean float32 4096 4096 64 C 0.160 0.155 1.03x 0.155 0.152 1.02x
sqeuclidean float32 4096 4096 256 C 0.319 0.312 1.02x 0.318 0.315 1.01x
sqeuclidean float32 8192 8192 3 C 0.342 0.331 1.03x 0.312 0.309 1.01x
sqeuclidean float32 8192 8192 8 C 0.333 0.326 1.02x 0.306 0.306 1.00x
sqeuclidean float32 8192 8192 16 C 0.315 0.308 1.02x 0.305 0.304 1.00x
sqeuclidean float32 8192 8192 64 C 0.484 0.478 1.01x 0.479 0.479 1.00x
sqeuclidean float32 8192 8192 256 C 1.101 1.096 1.00x 1.065 1.104 0.97x
sqeuclidean float64 1024 1024 3 C 0.078 0.072 1.08x 0.043 0.041 1.05x
sqeuclidean float64 1024 1024 8 C 0.066 0.052 1.27x 0.043 0.041 1.05x
sqeuclidean float64 1024 1024 16 C 0.055 0.047 1.17x 0.043 0.041 1.04x
sqeuclidean float64 1024 1024 64 C 0.062 0.051 1.20x 0.046 0.045 1.03x
sqeuclidean float64 1024 1024 256 C 0.076 0.066 1.15x 0.062 0.061 1.01x
sqeuclidean float64 4096 4096 3 C 0.112 0.115 0.97x 0.104 0.105 0.99x
sqeuclidean float64 4096 4096 8 C 0.116 0.110 1.05x 0.104 0.106 0.98x
sqeuclidean float64 4096 4096 16 C 0.110 0.109 1.01x 0.105 0.106 0.98x
sqeuclidean float64 4096 4096 64 C 0.168 0.173 0.97x 0.166 0.171 0.97x
sqeuclidean float64 4096 4096 256 C 0.396 0.409 0.97x 0.398 0.415 0.96x
sqeuclidean float64 8192 8192 3 C 0.295 0.303 0.97x 0.288 0.298 0.97x
sqeuclidean float64 8192 8192 8 C 0.297 0.303 0.98x 0.291 0.299 0.97x
sqeuclidean float64 8192 8192 16 C 0.301 0.307 0.98x 0.294 0.301 0.97x
sqeuclidean float64 8192 8192 64 C 0.530 0.551 0.96x 0.531 0.552 0.96x
sqeuclidean float64 8192 8192 256 C 1.433 1.501 0.95x 1.443 1.503 0.96x
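
The cold_x and warm_x columns appear to be the main timing divided by the PR timing, so values above 1.00x favor the PR and values below 1.00x indicate a slowdown. A minimal C++ sketch of that arithmetic follows; the helper names `speedup` and `format_ratio` are hypothetical and not taken from the benchmark script.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: baseline (main) time divided by the PR time.
// A result above 1.0 means the PR is faster; below 1.0 means it is slower.
double speedup(double before_ms, double after_ms)
{
  assert(after_ms > 0.0);  // a zero or negative timing would be a measurement error
  return before_ms / after_ms;
}

// Format a ratio with two decimals and an "x" suffix, as in the table.
std::string format_ratio(double ratio)
{
  char buf[32];
  std::snprintf(buf, sizeof(buf), "%.2fx", ratio);
  return std::string(buf);
}
```

For the minkowski float32 row with m = n = 1024 and k = 256, `format_ratio(speedup(1.079, 1.102))` gives `0.98x`, which matches the table. Because the printed timings are themselves rounded, a recomputed ratio can differ from the printed one in the last digit.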

@copy-pr-bot

copy-pr-bot Bot commented May 15, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@tarang-jain tarang-jain self-assigned this May 15, 2026
@tarang-jain tarang-jain added the non-breaking Introduces a non-breaking change label May 15, 2026
@aamijar aamijar moved this to In Progress in Unstructured Data Processing May 15, 2026
@aamijar aamijar added the improvement Improves an existing functionality label May 15, 2026
Comment thread cpp/src/distance/detail/pairwise_matrix/jit_lto_kernels/registration_tags.hpp Outdated
Comment thread cpp/src/distance/detail/pairwise_matrix/dispatch_matrix.json
Comment thread cpp/src/distance/detail/pairwise_distance_base.cuh Outdated
Comment thread cpp/include/cuvs/detail/jit_lto/common_fragments.hpp
Comment thread cpp/src/distance/detail/pairwise_matrix/dispatch-inl.cuh Outdated
Comment thread cpp/include/cuvs/detail/jit_lto/common_fragments.hpp Outdated
Comment thread cpp/src/distance/detail/pairwise_matrix/dispatch_matrix.json
Comment thread cpp/src/distance/detail/pairwise_matrix/dispatch_rbf_inst.cu.in
typename IdxT>
__global__ void pairwise_matrix_arch_probe_kernel()
{
}
Contributor Author

Added an empty kernel here because the arch check needs a pointer to a non-JIT kernel.

@tarang-jain tarang-jain marked this pull request as ready for review May 20, 2026 02:46
@tarang-jain tarang-jain requested review from a team as code owners May 20, 2026 02:46
@coderabbitai

coderabbitai Bot commented May 20, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 7310baad-a954-4a1a-91e6-1140bd78fd96

📥 Commits

Reviewing files that changed from the base of the PR and between e3e6d27 and 4d4f178.

📒 Files selected for processing (2)
  • cpp/src/distance/detail/pairwise_matrix/jit_lto_kernels/compute_distance_epilog_matrix.json
  • cpp/src/distance/detail/pairwise_matrix/jit_lto_kernels/compute_distance_epilog_rbf_matrix.json
✅ Files skipped from review due to trivial changes (1)
  • cpp/src/distance/detail/pairwise_matrix/jit_lto_kernels/compute_distance_epilog_rbf_matrix.json
🚧 Files skipped from review as they are similar to previous changes (1)
  • cpp/src/distance/detail/pairwise_matrix/jit_lto_kernels/compute_distance_epilog_matrix.json

📝 Walkthrough

Summary by CodeRabbit

  • Refactor
    • Migrated distance computation from legacy architecture-specific kernels to a JIT-based compilation strategy, improving code maintainability and flexibility.
    • Removed support for older GPU architectures.
    • Updated distance operation implementations to use shared operator utilities consistently.

Walkthrough

Moves pairwise-matrix distance computation into a JIT-LTO pipeline. The change adds fragment tags, device shims, and compute/epilog kernels (including RBF); a planner and kernel templates; type-to-tag dispatch; a runtime arch probe; a toggle for the JIT path in the pairwise base; include fixes; and CMake kernel generation.
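
The fragment tags and type-to-tag dispatch mentioned above can be pictured with a small host-only C++17 sketch. Every name here (`tag_f32`, `compute_distance_fragment_tag`, `get_tag`, `fragment_for`) is hypothetical; the PR's actual tags live in `common_fragments.hpp` and `pairwise_matrix_fragments.hpp` and may be shaped differently.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical empty tag structs, one per supported scalar or index type.
struct tag_f32 {};
struct tag_f64 {};
struct tag_idx_i32 {};
struct tag_idx_i64 {};

// Hypothetical fragment tag: bundles the per-axis tags that together
// identify one precompiled fragment.
template <typename DataTag, typename IdxTag, int Veclen>
struct compute_distance_fragment_tag {
  static constexpr int veclen = Veclen;
};

// Compile-time map from a concrete C++ type to its tag.
template <typename T>
constexpr auto get_tag()
{
  if constexpr (std::is_same_v<T, float>) {
    return tag_f32{};
  } else if constexpr (std::is_same_v<T, double>) {
    return tag_f64{};
  } else if constexpr (std::is_same_v<T, std::int32_t>) {
    return tag_idx_i32{};
  } else if constexpr (std::is_same_v<T, std::int64_t>) {
    return tag_idx_i64{};
  } else {
    // Dependent condition, so this only fires for unsupported types.
    static_assert(!sizeof(T), "type has no tag in this sketch");
  }
}

template <typename T>
using tag_t = decltype(get_tag<T>());

// Translate concrete template arguments into the fragment tag they select.
template <typename DataT, typename IdxT, int Veclen>
using fragment_for = compute_distance_fragment_tag<tag_t<DataT>, tag_t<IdxT>, Veclen>;

static_assert(std::is_empty_v<tag_f32>);
static_assert(std::is_same_v<tag_t<double>, tag_f64>);
static_assert(std::is_same_v<fragment_for<float, std::int64_t, 4>,
                             compute_distance_fragment_tag<tag_f32, tag_idx_i64, 4>>);
```

Keeping the tags empty means they carry no runtime state; they exist only to give each combination of types a distinct, nameable identity at compile time.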

Changes

JIT-LTO Pairwise Distance Migration

Layer / File(s) Summary
Fragment tag system and JIT contracts
cpp/include/cuvs/detail/jit_lto/common_fragments.hpp, cpp/include/cuvs/detail/jit_lto/pairwise_matrix/pairwise_matrix_fragments.hpp
Adds empty tag structs and fragment-tag templates to parameterize JIT-LTO pairwise-matrix fragments (distance, data, acc/out, index, fin-op, layout, Veclen).
Device declarations and compute shims
cpp/src/distance/detail/pairwise_matrix/jit_lto_kernels/device_functions.cuh, cpp/src/distance/detail/pairwise_matrix/jit_lto_kernels/compute_distance_kernel.cu.in, cpp/src/distance/detail/pairwise_matrix/jit_lto_kernels/compute_distance_epilog_kernel.cu.in
Declares extern __device__ JIT entrypoints and provides device-side compute_distance and compute_distance_epilog specializations that forward to distance op core/epilog implementations.
JIT kernel configuration matrices
cpp/src/distance/detail/pairwise_matrix/jit_lto_kernels/*.json
Adds JSON matrices enumerating supported distance ops, data/acc/out types, index types, layout policies and veclen variants for compute and epilog generation (including RBF variants).
Pairwise kernel templates & planner
cpp/src/distance/detail/pairwise_matrix/jit_lto_kernels/pairwise_matrix_planner.hpp, cpp/src/distance/detail/pairwise_matrix/jit_lto_kernels/pairwise_matrix_kernel.cu.in, .../pairwise_matrix_rbf_kernel.cu.in
Adds kernel .cu.in templates (standard and RBF) and PairwiseMatrixPlanner that registers pairwise, compute-distance, and compute-distance-epilog fragments with a static JIT cache and fixed entrypoint.
JIT dispatch type mapping and launcher
cpp/src/distance/detail/pairwise_matrix/jit_lto_kernels/pairwise_matrix_jit.cuh
Implements constexpr type-to-tag mapping for scalars, indices, layouts, fin-ops and distance-op traits, and pairwise_matrix_jit_dispatch which configures planner, derives policy/veclen, and launches the planned kernel.
Runtime dispatch selection and arch probe
cpp/src/distance/detail/pairwise_matrix/dispatch-inl.cuh, cpp/src/distance/detail/pairwise_matrix/dispatch_matrix.json, dispatch_rbf_inst.cu.in, dispatch_rbf_matrix.json
Adds an arch-probe kernel to determine runtime virtual arch without forcing JIT, routes CUTLASS/non-CUTLASS cases to either SM80 dispatch or JIT, and updates dispatch_matrix arch_includes and RBF instantiation types.
Pairwise base JIT integration
cpp/src/distance/detail/pairwise_distance_base.cuh
Conditionally includes JIT device shims under CUVS_DISTANCE_PAIRWISE_USE_JIT and switches inner core and epilog calls to compute_distance / compute_distance_epilog when enabled.
Distance operator include updates
cpp/src/distance/detail/distance_ops/correlation.cuh, hellinger.cuh, l1.cuh, l2_unexp.cuh, l_inf.cuh, cpp/src/distance/detail/sparse/l2_distance.cuh
Adds explicit #include <raft/core/operators.hpp> to provide operator helpers (raft::sqrt, raft::abs, raft::max) and tidies header prolog formatting.
Build system kernel generation
cpp/CMakeLists.txt
Adds distance_ns and pairwise_matrix_jit_dir and invokes generate_jit_lto_kernels to produce compute-distance and compute-distance-epilog kernel fragments (standard and RBF) across layouts and veclen variants, appending outputs to jit_lto_files.
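
The "Pairwise base JIT integration" row describes a compile-time switch between inlined distance-op calls and separately linked entry points. A host-only C++ sketch of that pattern follows. Only the macro name `CUVS_DISTANCE_PAIRWISE_USE_JIT` comes from the PR; the `l1_op` struct, the `accumulate` function, and the signature of `compute_distance` are assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>

#ifdef CUVS_DISTANCE_PAIRWISE_USE_JIT
// JIT path: only a declaration is visible here. In this sketch the definition
// is assumed to come from a separately compiled fragment linked in later.
double compute_distance(double acc, double x, double y);
#endif

// Hypothetical distance op used by the non-JIT path (L1 accumulation).
struct l1_op {
  double core(double acc, double x, double y) const { return acc + std::fabs(x - y); }
};

// Accumulate a distance over k features, choosing the call site at compile time.
template <typename OpT>
double accumulate(const double* x, const double* y, int k, OpT op)
{
  assert(k >= 0);
  double acc = 0.0;
#ifdef CUVS_DISTANCE_PAIRWISE_USE_JIT
  (void)op;  // the op is supplied by the linked fragment, not by the template
#endif
  for (int i = 0; i < k; ++i) {
#ifdef CUVS_DISTANCE_PAIRWISE_USE_JIT
    acc = compute_distance(acc, x[i], y[i]);
#else
    acc = op.core(acc, x[i], y[i]);
#endif
  }
  return acc;
}
```

With the macro undefined, `accumulate` behaves like an ordinary templated loop. With it defined, the loop body no longer depends on `OpT`, which is what lets one compiled kernel body be paired with many distance fragments.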

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested labels

C++, Build

Suggested reviewers

  • divyegala
  • KyleFromNVIDIA
  • dantegd
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main feature: JIT LTO Pairwise Distances, which accurately reflects the core of the changeset.
Description check ✅ Passed The description is directly related to the changeset, explaining the refactoring goals and providing performance metrics and benchmark data relevant to the changes.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

Comment thread cpp/src/distance/detail/pairwise_distance_base.cuh
Member

@divyegala divyegala left a comment

Two minor questions, great PR!

Labels

improvement Improves an existing functionality non-breaking Introduces a non-breaking change

Projects

Status: In Progress

4 participants