[FEA] JIT LTO Pairwise Distances by tarang-jain · Pull Request #2099 · rapidsai/cuvs

tarang-jain · 2026-05-15T23:02:22Z

Refactor the sm60 dispatch to use new fragments for:

- distance_op
- distance epilog

[Note 05/19]:
The fused path calls PairwiseDistances directly, which in turn substitutes fragments for compute_distance. This leads to symbol lookup errors, so the non-JIT path has to stay in place for the fused reductions (discussed with @divyegala).

libcuvs.so size (CUDA 13.2): 255.92 MB -> 238.41 MB
libcuvs.so size (CUDA 12.9): 487.15 MB -> 448.81 MB
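
For a quick sense of scale, the size figures above can be turned into absolute and relative savings. This is only a small arithmetic sketch; the helper name `reduction` is made up for illustration and is not part of cuVS.

```python
def reduction(before_mb, after_mb):
    """Return (MB saved, percent saved) for a before -> after binary size."""
    saved = before_mb - after_mb
    return saved, 100.0 * saved / before_mb


# Sizes copied from the PR description.
for label, before, after in [
    ("CUDA 13.2", 255.92, 238.41),
    ("CUDA 12.9", 487.15, 448.81),
]:
    saved, pct = reduction(before, after)
    print(f"{label}: {saved:.2f} MB smaller ({pct:.1f}%)")
```

That works out to roughly 17.5 MB (about 6.8%) for CUDA 13.2 and 38.3 MB (about 7.9%) for CUDA 12.9.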

Benchmarks to check for regressions:
Hardware: H100
cold_before_ms: benchmark on main without warmup runs
cold_after_ms: benchmark on this PR without warmup runs
warm_before_ms: benchmark on main after warmup runs; median over 20 runs
warm_after_ms: benchmark on this PR after warmup runs; median over 20 runs
cold_x, warm_x: before/after ratio, so values above 1.00x mean the PR is faster
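
The PR does not include the benchmark harness, so the following is only a minimal sketch of how cold and warm timings of this kind could be collected. The names `time_ms`, `benchmark`, `speedup`, the `sync` callback, and the `warmup` count are all assumptions for illustration. For GPU work, `sync` would need to block until the device is idle, and whether the actual harness did this is not stated.

```python
import statistics
import time


def time_ms(fn, sync=lambda: None):
    """Time one call of fn in milliseconds; sync is a hypothetical device barrier."""
    sync()
    start = time.perf_counter()
    fn()
    sync()
    return (time.perf_counter() - start) * 1e3


def benchmark(fn, sync=lambda: None, warmup=3, runs=20):
    """Return (cold_ms, warm_ms). The warmup count is an assumed value."""
    cold_ms = time_ms(fn, sync)           # very first call, no warmup
    for _ in range(warmup):               # discard a few calls before measuring
        fn()
    samples = [time_ms(fn, sync) for _ in range(runs)]
    return cold_ms, statistics.median(samples)


def speedup(before_ms, after_ms):
    """Before/after ratio; above 1.0 means the 'after' build is faster."""
    return before_ms / after_ms


if __name__ == "__main__":
    cold, warm = benchmark(lambda: sum(range(100_000)))
    print(f"cold={cold:.3f} ms warm={warm:.3f} ms")
```

The ratio columns in the table are consistent with `speedup`: for example, 3.044 ms before and 1.421 ms after gives about 2.14.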

metric	dtype	m	n	k	L (layout)	cold_before_ms	cold_after_ms	cold_x	warm_before_ms	warm_after_ms	warm_x
canberra	float32	1024	1024	3	C	0.755	497.942	0.00x	0.128	0.128	1.00x
canberra	float32	1024	1024	8	C	0.124	0.127	0.97x	0.115	0.115	1.00x
canberra	float32	1024	1024	16	C	0.101	0.101	1.00x	0.095	0.095	1.00x
canberra	float32	1024	1024	64	C	0.094	0.090	1.04x	0.088	0.088	1.00x
canberra	float32	1024	1024	256	C	0.290	0.289	1.00x	0.287	0.287	1.00x
canberra	float32	4096	4096	3	C	2.708	2.728	0.99x	2.694	2.724	0.99x
canberra	float32	4096	4096	8	C	2.391	2.274	1.05x	2.378	2.365	1.01x
canberra	float32	4096	4096	16	C	1.857	1.806	1.03x	1.868	1.876	1.00x
canberra	float32	4096	4096	64	C	1.517	1.515	1.00x	1.517	1.518	1.00x
canberra	float32	4096	4096	256	C	5.968	5.961	1.00x	5.972	5.965	1.00x
canberra	float32	8192	8192	3	C	8.422	8.501	0.99x	8.411	8.497	0.99x
canberra	float32	8192	8192	8	C	7.399	7.460	0.99x	7.385	7.454	0.99x
canberra	float32	8192	8192	16	C	5.753	5.795	0.99x	5.747	5.792	0.99x
canberra	float32	8192	8192	64	C	4.966	4.958	1.00x	4.962	4.965	1.00x
canberra	float32	8192	8192	256	C	19.695	19.662	1.00x	19.694	19.673	1.00x
canberra	float64	1024	1024	3	C	0.607	235.230	0.00x	0.103	0.104	0.99x
canberra	float64	1024	1024	8	C	0.095	0.097	0.97x	0.084	0.085	0.99x
canberra	float64	1024	1024	16	C	0.059	0.059	1.00x	0.055	0.054	1.01x
canberra	float64	1024	1024	64	C	0.151	0.152	0.99x	0.150	0.150	1.00x
canberra	float64	1024	1024	256	C	0.535	0.539	0.99x	0.534	0.536	1.00x
canberra	float64	4096	4096	3	C	2.058	2.078	0.99x	2.058	2.083	0.99x
canberra	float64	4096	4096	8	C	1.594	1.611	0.99x	1.592	1.612	0.99x
canberra	float64	4096	4096	16	C	0.832	0.833	1.00x	0.775	0.778	1.00x
canberra	float64	4096	4096	64	C	2.987	3.006	0.99x	2.986	3.003	0.99x
canberra	float64	4096	4096	256	C	11.829	11.897	0.99x	11.830	11.902	0.99x
canberra	float64	8192	8192	3	C	6.200	6.343	0.98x	6.192	6.334	0.98x
canberra	float64	8192	8192	8	C	4.785	4.875	0.98x	4.779	4.871	0.98x
canberra	float64	8192	8192	16	C	2.514	2.519	1.00x	2.510	2.518	1.00x
canberra	float64	8192	8192	64	C	9.821	9.881	0.99x	9.820	9.879	0.99x
canberra	float64	8192	8192	256	C	39.072	39.308	0.99x	39.086	39.319	0.99x
chebyshev	float32	1024	1024	3	C	0.958	500.151	0.00x	0.026	0.026	1.02x
chebyshev	float32	1024	1024	8	C	0.090	358.533	0.00x	0.025	0.025	1.00x
chebyshev	float32	1024	1024	16	C	0.029	0.043	0.68x	0.025	0.025	1.02x
chebyshev	float32	1024	1024	64	C	0.032	0.035	0.91x	0.029	0.029	1.02x
chebyshev	float32	1024	1024	256	C	0.054	0.055	0.99x	0.051	0.051	1.01x
chebyshev	float32	4096	4096	3	C	0.121	0.123	0.99x	0.119	0.117	1.02x
chebyshev	float32	4096	4096	8	C	0.103	0.102	1.01x	0.103	0.104	0.99x
chebyshev	float32	4096	4096	16	C	0.099	0.107	0.93x	0.104	0.105	0.99x
chebyshev	float32	4096	4096	64	C	0.176	0.184	0.96x	0.179	0.179	1.00x
chebyshev	float32	4096	4096	256	C	0.594	0.601	0.99x	0.603	0.605	1.00x
chebyshev	float32	8192	8192	3	C	0.345	0.336	1.03x	0.343	0.338	1.02x
chebyshev	float32	8192	8192	8	C	0.289	0.298	0.97x	0.291	0.292	1.00x
chebyshev	float32	8192	8192	16	C	0.289	0.294	0.98x	0.292	0.292	1.00x
chebyshev	float32	8192	8192	64	C	0.540	0.537	1.01x	0.543	0.542	1.00x
chebyshev	float32	8192	8192	256	C	1.933	1.945	0.99x	1.943	1.954	0.99x
chebyshev	float64	1024	1024	3	C	1.082	377.472	0.00x	0.030	0.030	1.03x
chebyshev	float64	1024	1024	8	C	0.128	344.873	0.00x	0.029	0.029	1.00x
chebyshev	float64	1024	1024	16	C	0.035	0.041	0.85x	0.029	0.029	1.02x
chebyshev	float64	1024	1024	64	C	0.052	0.057	0.92x	0.050	0.050	0.99x
chebyshev	float64	1024	1024	256	C	0.131	0.138	0.95x	0.131	0.133	0.98x
chebyshev	float64	4096	4096	3	C	0.196	0.179	1.10x	0.193	0.172	1.12x
chebyshev	float64	4096	4096	8	C	0.173	0.172	1.01x	0.170	0.167	1.02x
chebyshev	float64	4096	4096	16	C	0.171	0.166	1.03x	0.170	0.168	1.01x
chebyshev	float64	4096	4096	64	C	0.562	0.579	0.97x	0.564	0.577	0.98x
chebyshev	float64	4096	4096	256	C	2.136	2.178	0.98x	2.143	2.185	0.98x
chebyshev	float64	8192	8192	3	C	0.613	0.528	1.16x	0.591	0.516	1.15x
chebyshev	float64	8192	8192	8	C	0.528	0.506	1.04x	0.518	0.501	1.03x
chebyshev	float64	8192	8192	16	C	0.523	0.502	1.04x	0.520	0.504	1.03x
chebyshev	float64	8192	8192	64	C	1.870	1.948	0.96x	1.891	1.948	0.97x
chebyshev	float64	8192	8192	256	C	7.042	7.233	0.97x	7.068	7.225	0.98x
cityblock	float32	1024	1024	3	C	0.037	0.042	0.87x	0.026	0.025	1.01x
cityblock	float32	1024	1024	8	C	0.029	0.030	0.95x	0.025	0.025	1.00x
cityblock	float32	1024	1024	16	C	0.027	0.027	0.97x	0.025	0.025	1.01x
cityblock	float32	1024	1024	64	C	0.030	0.035	0.85x	0.029	0.029	1.02x
cityblock	float32	1024	1024	256	C	0.050	0.051	0.97x	0.051	0.050	1.01x
cityblock	float32	4096	4096	3	C	0.122	0.120	1.02x	0.119	0.118	1.01x
cityblock	float32	4096	4096	8	C	0.101	0.099	1.02x	0.104	0.103	1.01x
cityblock	float32	4096	4096	16	C	0.099	0.099	1.00x	0.104	0.103	1.01x
cityblock	float32	4096	4096	64	C	0.176	0.173	1.02x	0.177	0.176	1.01x
cityblock	float32	4096	4096	256	C	0.583	0.578	1.01x	0.588	0.586	1.00x
cityblock	float32	8192	8192	3	C	0.347	0.339	1.02x	0.343	0.342	1.00x
cityblock	float32	8192	8192	8	C	0.293	0.288	1.02x	0.293	0.290	1.01x
cityblock	float32	8192	8192	16	C	0.292	0.288	1.01x	0.295	0.292	1.01x
cityblock	float32	8192	8192	64	C	0.530	0.526	1.01x	0.536	0.533	1.01x
cityblock	float32	8192	8192	256	C	1.886	1.879	1.00x	1.896	1.889	1.00x
cityblock	float64	1024	1024	3	C	0.033	0.032	1.02x	0.026	0.025	1.02x
cityblock	float64	1024	1024	8	C	0.030	0.028	1.10x	0.026	0.025	1.02x
cityblock	float64	1024	1024	16	C	0.028	0.025	1.10x	0.026	0.025	1.03x
cityblock	float64	1024	1024	64	C	0.037	0.034	1.08x	0.035	0.035	1.01x
cityblock	float64	1024	1024	256	C	0.072	0.073	0.98x	0.072	0.072	1.01x
cityblock	float64	4096	4096	3	C	0.105	0.106	0.99x	0.102	0.104	0.98x
cityblock	float64	4096	4096	8	C	0.098	0.096	1.03x	0.098	0.098	1.00x
cityblock	float64	4096	4096	16	C	0.098	0.101	0.97x	0.100	0.100	1.01x
cityblock	float64	4096	4096	64	C	0.275	0.279	0.99x	0.278	0.277	1.00x
cityblock	float64	4096	4096	256	C	0.983	0.987	1.00x	0.989	0.990	1.00x
cityblock	float64	8192	8192	3	C	0.286	0.293	0.98x	0.286	0.293	0.98x
cityblock	float64	8192	8192	8	C	0.270	0.273	0.99x	0.273	0.274	1.00x
cityblock	float64	8192	8192	16	C	0.278	0.283	0.98x	0.280	0.280	1.00x
cityblock	float64	8192	8192	64	C	0.861	0.862	1.00x	0.859	0.862	1.00x
cityblock	float64	8192	8192	256	C	3.200	3.200	1.00x	3.211	3.208	1.00x
correlation	float32	1024	1024	3	C	1.306	481.815	0.00x	0.043	0.044	1.00x
correlation	float32	1024	1024	8	C	0.152	326.839	0.00x	0.043	0.043	1.00x
correlation	float32	1024	1024	16	C	0.084	0.099	0.84x	0.043	0.043	0.99x
correlation	float32	1024	1024	64	C	0.085	0.097	0.88x	0.046	0.046	1.00x
correlation	float32	1024	1024	256	C	0.100	0.105	0.95x	0.061	0.061	1.00x
correlation	float32	4096	4096	3	C	0.182	0.180	1.01x	0.166	0.167	1.00x
correlation	float32	4096	4096	8	C	0.171	0.171	1.00x	0.166	0.166	1.00x
correlation	float32	4096	4096	16	C	0.169	0.167	1.01x	0.165	0.166	1.00x
correlation	float32	4096	4096	64	C	0.230	0.233	0.99x	0.228	0.228	1.00x
correlation	float32	4096	4096	256	C	0.485	0.489	0.99x	0.489	0.490	1.00x
correlation	float32	8192	8192	3	C	0.477	0.481	0.99x	0.471	0.476	0.99x
correlation	float32	8192	8192	8	C	0.461	0.459	1.00x	0.462	0.460	1.00x
correlation	float32	8192	8192	16	C	0.465	0.462	1.01x	0.463	0.461	1.01x
correlation	float32	8192	8192	64	C	0.587	0.667	0.88x	0.581	0.660	0.88x
correlation	float32	8192	8192	256	C	1.513	1.514	1.00x	1.525	1.524	1.00x
correlation	float64	1024	1024	3	C	0.887	335.799	0.00x	0.045	0.046	0.98x
correlation	float64	1024	1024	8	C	0.152	290.216	0.00x	0.046	0.047	0.99x
correlation	float64	1024	1024	16	C	0.082	0.097	0.85x	0.046	0.047	0.98x
correlation	float64	1024	1024	64	C	0.111	0.097	1.14x	0.054	0.055	0.99x
correlation	float64	1024	1024	256	C	0.117	0.121	0.96x	0.084	0.085	0.99x
correlation	float64	4096	4096	3	C	0.217	0.207	1.05x	0.194	0.197	0.99x
correlation	float64	4096	4096	8	C	0.217	0.198	1.09x	0.212	0.209	1.02x
correlation	float64	4096	4096	16	C	0.219	0.209	1.05x	0.214	0.211	1.01x
correlation	float64	4096	4096	64	C	0.336	0.338	0.99x	0.335	0.336	1.00x
correlation	float64	4096	4096	256	C	0.895	0.909	0.98x	0.898	0.914	0.98x
correlation	float64	8192	8192	3	C	0.626	0.632	0.99x	0.618	0.628	0.98x
correlation	float64	8192	8192	8	C	0.625	0.614	1.02x	0.620	0.609	1.02x
correlation	float64	8192	8192	16	C	0.621	0.612	1.02x	0.621	0.612	1.01x
correlation	float64	8192	8192	64	C	1.000	1.007	0.99x	1.002	1.007	0.99x
correlation	float64	8192	8192	256	C	2.851	2.913	0.98x	2.869	2.927	0.98x
cosine	float32	1024	1024	3	C	3.044	1.421	2.14x	0.052	0.049	1.07x
cosine	float32	1024	1024	8	C	0.269	0.229	1.17x	0.051	0.048	1.08x
cosine	float32	1024	1024	16	C	0.091	0.091	1.00x	0.051	0.048	1.08x
cosine	float32	1024	1024	64	C	0.094	0.090	1.03x	0.059	0.055	1.07x
cosine	float32	1024	1024	256	C	0.173	0.116	1.49x	0.087	0.082	1.07x
cosine	float32	4096	4096	3	C	0.144	0.134	1.08x	0.115	0.111	1.03x
cosine	float32	4096	4096	8	C	0.143	0.129	1.11x	0.116	0.112	1.04x
cosine	float32	4096	4096	16	C	0.126	0.120	1.05x	0.114	0.112	1.02x
cosine	float32	4096	4096	64	C	0.171	0.163	1.04x	0.158	0.158	1.00x
cosine	float32	4096	4096	256	C	0.324	0.326	0.99x	0.332	0.329	1.01x
cosine	float32	8192	8192	3	C	0.365	0.351	1.04x	0.338	0.335	1.01x
cosine	float32	8192	8192	8	C	0.358	0.351	1.02x	0.336	0.333	1.01x
cosine	float32	8192	8192	16	C	0.347	0.341	1.02x	0.331	0.330	1.00x
cosine	float32	8192	8192	64	C	0.497	0.501	0.99x	0.498	0.496	1.00x
cosine	float32	8192	8192	256	C	1.151	1.151	1.00x	1.117	1.159	0.96x
cosine	float64	1024	1024	3	C	1.278	0.613	2.08x	0.044	0.041	1.06x
cosine	float64	1024	1024	8	C	0.116	0.091	1.28x	0.044	0.041	1.08x
cosine	float64	1024	1024	16	C	0.477	0.620	0.77x	0.045	0.042	1.08x
cosine	float64	1024	1024	64	C	0.095	0.091	1.05x	0.048	0.045	1.07x
cosine	float64	1024	1024	256	C	0.108	0.102	1.06x	0.064	0.062	1.04x
cosine	float64	4096	4096	3	C	0.127	0.123	1.03x	0.112	0.111	1.01x
cosine	float64	4096	4096	8	C	0.124	0.120	1.04x	0.113	0.111	1.01x
cosine	float64	4096	4096	16	C	0.123	0.117	1.06x	0.113	0.112	1.01x
cosine	float64	4096	4096	64	C	0.190	0.184	1.03x	0.176	0.179	0.98x
cosine	float64	4096	4096	256	C	0.419	0.426	0.98x	0.412	0.406	1.02x
cosine	float64	8192	8192	3	C	0.328	0.319	1.03x	0.316	0.311	1.02x
cosine	float64	8192	8192	8	C	0.331	0.316	1.05x	0.317	0.312	1.02x
cosine	float64	8192	8192	16	C	0.325	0.316	1.03x	0.318	0.314	1.01x
cosine	float64	8192	8192	64	C	0.571	0.562	1.02x	0.569	0.564	1.01x
cosine	float64	8192	8192	256	C	1.462	1.444	1.01x	1.534	1.453	1.06x
euclidean	float32	1024	1024	3	C	6.477	5.780	1.12x	0.053	0.052	1.01x
euclidean	float32	1024	1024	8	C	0.395	0.372	1.06x	0.051	0.051	1.01x
euclidean	float32	1024	1024	16	C	0.087	0.090	0.97x	0.051	0.050	1.01x
euclidean	float32	1024	1024	64	C	0.090	0.094	0.96x	0.058	0.057	1.01x
euclidean	float32	1024	1024	256	C	0.115	0.115	1.00x	0.084	0.083	1.01x
euclidean	float32	4096	4096	3	C	0.173	0.167	1.04x	0.125	0.125	1.00x
euclidean	float32	4096	4096	8	C	0.156	0.151	1.03x	0.125	0.123	1.01x
euclidean	float32	4096	4096	16	C	0.134	0.130	1.03x	0.124	0.123	1.01x
euclidean	float32	4096	4096	64	C	0.175	0.174	1.01x	0.167	0.167	1.00x
euclidean	float32	4096	4096	256	C	0.360	0.363	0.99x	0.331	0.331	1.00x
euclidean	float32	8192	8192	3	C	0.424	0.425	1.00x	0.381	0.381	1.00x
euclidean	float32	8192	8192	8	C	0.401	0.404	0.99x	0.375	0.369	1.01x
euclidean	float32	8192	8192	16	C	0.383	0.379	1.01x	0.373	0.368	1.01x
euclidean	float32	8192	8192	64	C	0.543	0.542	1.00x	0.540	0.535	1.01x
euclidean	float32	8192	8192	256	C	1.174	1.155	1.02x	1.176	1.164	1.01x
euclidean	float64	1024	1024	3	C	1.135	0.783	1.45x	0.042	0.045	0.94x
euclidean	float64	1024	1024	8	C	0.090	0.083	1.09x	0.042	0.045	0.94x
euclidean	float64	1024	1024	16	C	0.078	0.083	0.93x	0.042	0.044	0.97x
euclidean	float64	1024	1024	64	C	0.073	0.077	0.95x	0.046	0.045	1.01x
euclidean	float64	1024	1024	256	C	0.087	0.088	0.98x	0.061	0.062	0.99x
euclidean	float64	4096	4096	3	C	0.120	0.123	0.98x	0.113	0.113	0.99x
euclidean	float64	4096	4096	8	C	0.119	0.124	0.96x	0.112	0.114	0.98x
euclidean	float64	4096	4096	16	C	0.119	0.119	1.00x	0.113	0.114	0.99x
euclidean	float64	4096	4096	64	C	0.183	0.187	0.98x	0.175	0.180	0.97x
euclidean	float64	4096	4096	256	C	0.416	0.431	0.97x	0.416	0.433	0.96x
euclidean	float64	8192	8192	3	C	0.390	0.391	1.00x	0.332	0.348	0.95x
euclidean	float64	8192	8192	8	C	0.352	0.362	0.97x	0.333	0.350	0.95x
euclidean	float64	8192	8192	16	C	0.346	0.359	0.97x	0.335	0.352	0.95x
euclidean	float64	8192	8192	64	C	0.585	0.619	0.94x	0.578	0.611	0.95x
euclidean	float64	8192	8192	256	C	1.506	1.601	0.94x	1.607	1.610	1.00x
hamming	float32	1024	1024	3	C	1.063	445.932	0.00x	0.026	0.026	1.00x
hamming	float32	1024	1024	8	C	0.095	311.794	0.00x	0.025	0.025	0.99x
hamming	float32	1024	1024	16	C	0.032	0.036	0.89x	0.025	0.025	1.01x
hamming	float32	1024	1024	64	C	0.032	0.033	0.95x	0.029	0.029	1.01x
hamming	float32	1024	1024	256	C	0.056	0.054	1.03x	0.051	0.051	1.01x
hamming	float32	4096	4096	3	C	0.121	0.122	0.98x	0.118	0.119	1.00x
hamming	float32	4096	4096	8	C	0.100	0.102	0.98x	0.102	0.103	1.00x
hamming	float32	4096	4096	16	C	0.104	0.101	1.03x	0.103	0.103	1.00x
hamming	float32	4096	4096	64	C	0.173	0.173	1.00x	0.177	0.177	1.00x
hamming	float32	4096	4096	256	C	0.596	0.600	0.99x	0.605	0.602	1.01x
hamming	float32	8192	8192	3	C	0.344	0.343	1.00x	0.342	0.345	0.99x
hamming	float32	8192	8192	8	C	0.288	0.289	1.00x	0.289	0.291	0.99x
hamming	float32	8192	8192	16	C	0.286	0.287	1.00x	0.289	0.292	0.99x
hamming	float32	8192	8192	64	C	0.531	0.530	1.00x	0.535	0.536	1.00x
hamming	float32	8192	8192	256	C	1.945	1.934	1.01x	1.954	1.943	1.01x
hamming	float64	1024	1024	3	C	0.680	322.386	0.00x	0.026	0.030	0.89x
hamming	float64	1024	1024	8	C	0.097	268.425	0.00x	0.027	0.027	1.01x
hamming	float64	1024	1024	16	C	0.031	0.039	0.81x	0.027	0.026	1.02x
hamming	float64	1024	1024	64	C	0.039	0.042	0.93x	0.038	0.038	1.01x
hamming	float64	1024	1024	256	C	0.089	0.082	1.08x	0.082	0.081	1.00x
hamming	float64	4096	4096	3	C	0.110	0.164	0.67x	0.110	0.163	0.67x
hamming	float64	4096	4096	8	C	0.114	0.113	1.01x	0.115	0.114	1.01x
hamming	float64	4096	4096	16	C	0.117	0.114	1.03x	0.116	0.116	1.01x
hamming	float64	4096	4096	64	C	0.325	0.325	1.00x	0.327	0.328	0.99x
hamming	float64	4096	4096	256	C	1.160	1.163	1.00x	1.165	1.170	1.00x
hamming	float64	8192	8192	3	C	0.315	0.479	0.66x	0.311	0.481	0.65x
hamming	float64	8192	8192	8	C	0.325	0.319	1.02x	0.323	0.318	1.01x
hamming	float64	8192	8192	16	C	0.327	0.321	1.02x	0.326	0.322	1.01x
hamming	float64	8192	8192	64	C	1.023	1.020	1.00x	1.021	1.024	1.00x
hamming	float64	8192	8192	256	C	3.792	3.800	1.00x	3.800	3.812	1.00x
hellinger	float32	1024	1024	3	C	1.054	350.515	0.00x	0.037	0.036	1.03x
hellinger	float32	1024	1024	8	C	0.096	249.521	0.00x	0.035	0.035	1.02x
hellinger	float32	1024	1024	16	C	0.042	0.047	0.91x	0.036	0.034	1.03x
hellinger	float32	1024	1024	64	C	0.046	0.043	1.08x	0.038	0.037	1.01x
hellinger	float32	1024	1024	256	C	0.055	0.057	0.96x	0.053	0.053	1.00x
hellinger	float32	4096	4096	3	C	0.167	0.171	0.97x	0.165	0.165	1.00x
hellinger	float32	4096	4096	8	C	0.152	0.156	0.98x	0.153	0.153	1.00x
hellinger	float32	4096	4096	16	C	0.150	0.154	0.98x	0.153	0.153	1.00x
hellinger	float32	4096	4096	64	C	0.190	0.184	1.03x	0.189	0.187	1.01x
hellinger	float32	4096	4096	256	C	0.474	0.471	1.01x	0.478	0.477	1.00x
hellinger	float32	8192	8192	3	C	0.473	0.471	1.00x	0.475	0.474	1.00x
hellinger	float32	8192	8192	8	C	0.437	0.436	1.00x	0.440	0.439	1.00x
hellinger	float32	8192	8192	16	C	0.450	0.440	1.02x	0.440	0.439	1.00x
hellinger	float32	8192	8192	64	C	0.543	0.541	1.00x	0.545	0.545	1.00x
hellinger	float32	8192	8192	256	C	1.505	1.506	1.00x	1.509	1.512	1.00x
hellinger	float64	1024	1024	3	C	0.670	232.625	0.00x	0.037	0.037	1.00x
hellinger	float64	1024	1024	8	C	0.092	208.067	0.00x	0.037	0.037	1.00x
hellinger	float64	1024	1024	16	C	0.042	0.048	0.87x	0.037	0.037	1.00x
hellinger	float64	1024	1024	64	C	0.047	0.050	0.94x	0.045	0.045	1.00x
hellinger	float64	1024	1024	256	C	0.078	0.080	0.98x	0.076	0.077	0.99x
hellinger	float64	4096	4096	3	C	0.177	0.182	0.97x	0.175	0.177	0.99x
hellinger	float64	4096	4096	8	C	0.170	0.171	1.00x	0.169	0.169	1.00x
hellinger	float64	4096	4096	16	C	0.187	0.170	1.10x	0.186	0.171	1.09x
hellinger	float64	4096	4096	64	C	0.300	0.306	0.98x	0.303	0.308	0.98x
hellinger	float64	4096	4096	256	C	0.867	0.885	0.98x	0.869	0.886	0.98x
hellinger	float64	8192	8192	3	C	0.514	0.515	1.00x	0.511	0.515	0.99x
hellinger	float64	8192	8192	8	C	0.490	0.495	0.99x	0.491	0.491	1.00x
hellinger	float64	8192	8192	16	C	0.498	0.497	1.00x	0.546	0.498	1.10x
hellinger	float64	8192	8192	64	C	0.920	0.935	0.98x	0.923	0.937	0.98x
hellinger	float64	8192	8192	256	C	2.786	2.841	0.98x	2.786	2.843	0.98x
inner_product	float32	1024	1024	3	C	69.165	64.390	1.07x	0.033	0.032	1.05x
inner_product	float32	1024	1024	8	C	0.063	0.067	0.94x	0.033	0.031	1.06x
inner_product	float32	1024	1024	16	C	0.147	0.149	0.99x	0.032	0.033	0.98x
inner_product	float32	1024	1024	64	C	0.063	0.063	1.00x	0.035	0.037	0.96x
inner_product	float32	1024	1024	256	C	0.072	0.069	1.04x	0.049	0.049	1.01x
inner_product	float32	4096	4096	3	C	0.237	0.233	1.02x	0.064	0.063	1.01x
inner_product	float32	4096	4096	8	C	0.145	0.150	0.97x	0.062	0.063	1.00x
inner_product	float32	4096	4096	16	C	0.092	0.091	1.01x	0.070	0.070	0.99x
inner_product	float32	4096	4096	64	C	0.134	0.138	0.97x	0.117	0.117	1.00x
inner_product	float32	4096	4096	256	C	0.434	0.436	1.00x	0.268	0.267	1.00x
inner_product	float32	8192	8192	3	C	0.179	0.177	1.01x	0.151	0.150	1.01x
inner_product	float32	8192	8192	8	C	0.174	0.175	0.99x	0.155	0.156	1.00x
inner_product	float32	8192	8192	16	C	0.201	0.206	0.98x	0.187	0.189	0.99x
inner_product	float32	8192	8192	64	C	0.379	0.384	0.99x	0.361	0.362	1.00x
inner_product	float32	8192	8192	256	C	0.993	0.995	1.00x	0.975	0.975	1.00x
inner_product	float64	1024	1024	3	C	30.439	30.640	0.99x	0.034	0.034	1.00x
inner_product	float64	1024	1024	8	C	0.121	0.118	1.03x	0.035	0.035	1.00x
inner_product	float64	1024	1024	16	C	0.141	0.141	1.00x	0.033	0.033	1.01x
inner_product	float64	1024	1024	64	C	0.060	0.060	1.00x	0.035	0.035	1.00x
inner_product	float64	1024	1024	256	C	0.118	0.119	0.99x	0.043	0.043	1.00x
inner_product	float64	4096	4096	3	C	0.116	0.118	0.98x	0.096	0.096	1.00x
inner_product	float64	4096	4096	8	C	0.330	0.334	0.99x	0.093	0.092	1.01x
inner_product	float64	4096	4096	16	C	0.118	0.112	1.05x	0.093	0.092	1.01x
inner_product	float64	4096	4096	64	C	0.134	0.132	1.01x	0.104	0.104	1.00x
inner_product	float64	4096	4096	256	C	0.251	0.253	0.99x	0.232	0.232	1.00x
inner_product	float64	8192	8192	3	C	0.368	0.371	0.99x	0.299	0.297	1.00x
inner_product	float64	8192	8192	8	C	0.293	0.294	1.00x	0.274	0.274	1.00x
inner_product	float64	8192	8192	16	C	0.346	0.343	1.01x	0.329	0.329	1.00x
inner_product	float64	8192	8192	64	C	0.337	0.331	1.02x	0.311	0.309	1.01x
inner_product	float64	8192	8192	256	C	0.813	0.805	1.01x	0.809	0.804	1.01x
jensenshannon	float32	1024	1024	3	C	0.808	574.979	0.00x	0.107	0.107	1.00x
jensenshannon	float32	1024	1024	8	C	0.117	0.119	0.99x	0.107	0.106	1.01x
jensenshannon	float32	1024	1024	16	C	0.112	0.115	0.97x	0.107	0.106	1.01x
jensenshannon	float32	1024	1024	64	C	0.193	0.194	1.00x	0.190	0.189	1.00x
jensenshannon	float32	1024	1024	256	C	0.692	0.690	1.00x	0.688	0.689	1.00x
jensenshannon	float32	4096	4096	3	C	1.684	1.674	1.01x	1.678	1.670	1.00x
jensenshannon	float32	4096	4096	8	C	1.682	1.673	1.01x	1.681	1.671	1.01x
jensenshannon	float32	4096	4096	16	C	1.682	1.667	1.01x	1.681	1.671	1.01x
jensenshannon	float32	4096	4096	64	C	3.291	3.278	1.00x	3.287	3.275	1.00x
jensenshannon	float32	4096	4096	256	C	12.922	12.895	1.00x	12.918	12.900	1.00x
jensenshannon	float32	8192	8192	3	C	5.505	5.474	1.01x	5.499	5.470	1.01x
jensenshannon	float32	8192	8192	8	C	5.505	5.478	1.00x	5.498	5.470	1.01x
jensenshannon	float32	8192	8192	16	C	5.501	5.472	1.01x	5.500	5.470	1.01x
jensenshannon	float32	8192	8192	64	C	10.832	10.795	1.00x	10.822	10.792	1.00x
jensenshannon	float32	8192	8192	256	C	42.699	42.612	1.00x	42.704	42.619	1.00x
jensenshannon	float64	1024	1024	3	C	1.769	1831.767	0.00x	0.268	0.269	1.00x
jensenshannon	float64	1024	1024	8	C	0.278	0.284	0.98x	0.269	0.269	1.00x
jensenshannon	float64	1024	1024	16	C	0.278	0.275	1.01x	0.268	0.270	0.99x
jensenshannon	float64	1024	1024	64	C	1.111	1.046	1.06x	1.061	1.053	1.01x
jensenshannon	float64	1024	1024	256	C	4.196	4.152	1.01x	4.215	4.211	1.00x
jensenshannon	float64	4096	4096	3	C	5.294	5.264	1.01x	5.262	5.268	1.00x
jensenshannon	float64	4096	4096	8	C	5.276	5.257	1.00x	5.273	5.280	1.00x
jensenshannon	float64	4096	4096	16	C	5.274	5.287	1.00x	5.280	5.280	1.00x
jensenshannon	float64	4096	4096	64	C	23.512	22.845	1.03x	23.241	22.919	1.01x
jensenshannon	float64	4096	4096	256	C	90.298	97.378	0.93x	93.772	96.524	0.97x
jensenshannon	float64	8192	8192	3	C	18.319	18.465	0.99x	18.110	18.085	1.00x
jensenshannon	float64	8192	8192	8	C	18.023	17.990	1.00x	18.108	18.174	1.00x
jensenshannon	float64	8192	8192	16	C	18.328	18.183	1.01x	18.174	18.182	1.00x
jensenshannon	float64	8192	8192	64	C	82.497	81.423	1.01x	81.730	81.544	1.00x
jensenshannon	float64	8192	8192	256	C	341.630	337.802	1.01x	339.172	337.489	1.00x
kl_divergence	float32	1024	1024	3	C	1.203	1139.463	0.00x	0.095	0.094	1.01x
kl_divergence	float32	1024	1024	8	C	0.105	0.109	0.97x	0.095	0.094	1.02x
kl_divergence	float32	1024	1024	16	C	0.106	0.103	1.03x	0.097	0.094	1.03x
kl_divergence	float32	1024	1024	64	C	0.170	0.167	1.02x	0.161	0.159	1.01x
kl_divergence	float32	1024	1024	256	C	0.558	0.565	0.99x	0.552	0.562	0.98x
kl_divergence	float32	4096	4096	3	C	1.704	1.676	1.02x	1.708	1.679	1.02x
kl_divergence	float32	4096	4096	8	C	1.722	1.684	1.02x	1.709	1.680	1.02x
kl_divergence	float32	4096	4096	16	C	1.720	1.687	1.02x	1.710	1.680	1.02x
kl_divergence	float32	4096	4096	64	C	3.478	3.484	1.00x	3.442	3.436	1.00x
kl_divergence	float32	4096	4096	256	C	13.266	13.956	0.95x	13.377	13.596	0.98x
kl_divergence	float32	8192	8192	3	C	5.677	5.581	1.02x	5.674	5.581	1.02x
kl_divergence	float32	8192	8192	8	C	5.679	5.569	1.02x	5.663	5.570	1.02x
kl_divergence	float32	8192	8192	16	C	5.649	5.587	1.01x	5.667	5.587	1.01x
kl_divergence	float32	8192	8192	64	C	12.789	12.595	1.02x	11.655	12.541	0.93x
kl_divergence	float32	8192	8192	256	C	47.794	48.375	0.99x	48.048	48.350	0.99x
kl_divergence	float64	1024	1024	3	C	2.496	2875.166	0.00x	0.130	0.133	0.98x
kl_divergence	float64	1024	1024	8	C	0.141	0.146	0.97x	0.130	0.132	0.98x
kl_divergence	float64	1024	1024	16	C	0.137	0.138	0.99x	0.131	0.132	0.99x
kl_divergence	float64	1024	1024	64	C	0.454	0.462	0.98x	0.450	0.445	1.01x
kl_divergence	float64	1024	1024	256	C	1.682	1.716	0.98x	1.667	1.705	0.98x
kl_divergence	float64	4096	4096	3	C	2.520	2.319	1.09x	2.513	2.430	1.03x
kl_divergence	float64	4096	4096	8	C	2.478	2.303	1.08x	2.527	2.390	1.06x
kl_divergence	float64	4096	4096	16	C	2.552	2.335	1.09x	2.541	2.443	1.04x
kl_divergence	float64	4096	4096	64	C	11.060	12.640	0.88x	10.988	12.469	0.88x
kl_divergence	float64	4096	4096	256	C	41.734	42.930	0.97x	41.354	44.327	0.93x
kl_divergence	float64	8192	8192	3	C	8.397	8.152	1.03x	8.326	8.172	1.02x
kl_divergence	float64	8192	8192	8	C	8.365	8.180	1.02x	8.339	8.178	1.02x
kl_divergence	float64	8192	8192	16	C	8.381	8.207	1.02x	8.363	8.196	1.02x
kl_divergence	float64	8192	8192	64	C	38.257	43.692	0.88x	37.888	43.262	0.88x
kl_divergence	float64	8192	8192	256	C	141.810	161.265	0.88x	142.403	157.154	0.91x
l1	float32	1024	1024	3	C	1.101	479.678	0.00x	0.026	0.026	1.00x
l1	float32	1024	1024	8	C	0.097	309.765	0.00x	0.025	0.025	0.99x
l1	float32	1024	1024	16	C	0.031	0.041	0.76x	0.025	0.025	1.01x
l1	float32	1024	1024	64	C	0.032	0.035	0.91x	0.029	0.029	1.00x
l1	float32	1024	1024	256	C	0.052	0.053	0.99x	0.050	0.050	1.00x
l1	float32	4096	4096	3	C	0.122	0.124	0.99x	0.119	0.118	1.00x
l1	float32	4096	4096	8	C	0.109	0.102	1.07x	0.104	0.102	1.01x
l1	float32	4096	4096	16	C	0.104	0.102	1.02x	0.104	0.103	1.01x
l1	float32	4096	4096	64	C	0.173	0.175	0.99x	0.177	0.176	1.00x
l1	float32	4096	4096	256	C	0.582	0.579	1.00x	0.588	0.587	1.00x
l1	float32	8192	8192	3	C	0.342	0.348	0.98x	0.343	0.343	1.00x
l1	float32	8192	8192	8	C	0.293	0.287	1.02x	0.292	0.290	1.01x
l1	float32	8192	8192	16	C	0.291	0.288	1.01x	0.295	0.292	1.01x
l1	float32	8192	8192	64	C	0.537	0.527	1.02x	0.537	0.534	1.01x
l1	float32	8192	8192	256	C	1.886	1.880	1.00x	1.896	1.889	1.00x
l1	float64	1024	1024	3	C	0.616	247.160	0.00x	0.026	0.026	0.98x
l1	float64	1024	1024	8	C	0.080	203.891	0.00x	0.025	0.026	0.99x
l1	float64	1024	1024	16	C	0.029	0.038	0.75x	0.026	0.025	1.02x
l1	float64	1024	1024	64	C	0.036	0.043	0.85x	0.035	0.035	1.01x
l1	float64	1024	1024	256	C	0.075	0.077	0.98x	0.072	0.072	1.01x
l1	float64	4096	4096	3	C	0.106	0.110	0.96x	0.102	0.104	0.98x
l1	float64	4096	4096	8	C	0.097	0.104	0.94x	0.098	0.098	1.00x
l1	float64	4096	4096	16	C	0.100	0.102	0.98x	0.100	0.100	0.99x
l1	float64	4096	4096	64	C	0.276	0.282	0.98x	0.277	0.278	1.00x
l1	float64	4096	4096	256	C	0.984	0.980	1.00x	0.987	0.987	1.00x
l1	float64	8192	8192	3	C	0.286	0.299	0.96x	0.286	0.294	0.97x
l1	float64	8192	8192	8	C	0.272	0.273	1.00x	0.272	0.273	1.00x
l1	float64	8192	8192	16	C	0.281	0.280	1.00x	0.280	0.279	1.00x
l1	float64	8192	8192	64	C	0.860	0.859	1.00x	0.860	0.862	1.00x
l1	float64	8192	8192	256	C	3.206	3.204	1.00x	3.216	3.215	1.00x
l2	float32	1024	1024	3	C	0.115	0.128	0.90x	0.051	0.050	1.01x
l2	float32	1024	1024	8	C	0.087	0.087	1.00x	0.050	0.049	1.01x
l2	float32	1024	1024	16	C	0.063	0.062	1.02x	0.050	0.050	1.00x
l2	float32	1024	1024	64	C	0.070	0.074	0.94x	0.057	0.056	1.01x
l2	float32	1024	1024	256	C	0.093	0.096	0.96x	0.083	0.083	1.00x
l2	float32	4096	4096	3	C	0.143	0.142	1.01x	0.124	0.124	1.00x
l2	float32	4096	4096	8	C	0.144	0.143	1.00x	0.124	0.123	1.01x
l2	float32	4096	4096	16	C	0.128	0.127	1.01x	0.123	0.123	1.00x
l2	float32	4096	4096	64	C	0.172	0.170	1.01x	0.166	0.167	1.00x
l2	float32	4096	4096	256	C	0.328	0.327	1.00x	0.332	0.332	1.00x
l2	float32	8192	8192	3	C	0.401	0.400	1.00x	0.380	0.380	1.00x
l2	float32	8192	8192	8	C	0.395	0.394	1.00x	0.375	0.374	1.00x
l2	float32	8192	8192	16	C	0.379	0.374	1.01x	0.372	0.372	1.00x
l2	float32	8192	8192	64	C	0.543	0.543	1.00x	0.540	0.540	1.00x
l2	float32	8192	8192	256	C	1.171	1.170	1.00x	1.178	1.178	1.00x
l2	float64	1024	1024	3	C	0.068	0.064	1.05x	0.042	0.041	1.01x
l2	float64	1024	1024	8	C	0.055	0.059	0.93x	0.042	0.042	1.01x
l2	float64	1024	1024	16	C	0.053	0.052	1.02x	0.042	0.042	1.01x
l2	float64	1024	1024	64	C	0.060	0.056	1.07x	0.046	0.046	1.01x
l2	float64	1024	1024	256	C	0.069	0.068	1.01x	0.063	0.062	1.00x
l2	float64	4096	4096	3	C	0.122	0.121	1.01x	0.116	0.116	1.00x
l2	float64	4096	4096	8	C	0.124	0.125	0.99x	0.116	0.116	1.00x
l2	float64	4096	4096	16	C	0.123	0.123	1.00x	0.117	0.117	1.00x
l2	float64	4096	4096	64	C	0.185	0.186	0.99x	0.183	0.177	1.03x
l2	float64	4096	4096	256	C	0.433	0.421	1.03x	0.440	0.420	1.05x
l2	float64	8192	8192	3	C	0.354	0.339	1.04x	0.349	0.334	1.05x
l2	float64	8192	8192	8	C	0.353	0.343	1.03x	0.351	0.334	1.05x
l2	float64	8192	8192	16	C	0.361	0.345	1.05x	0.353	0.336	1.05x
l2	float64	8192	8192	64	C	0.612	0.583	1.05x	0.611	0.582	1.05x
l2	float64	8192	8192	256	C	1.599	1.516	1.05x	1.608	1.525	1.05x
lp	float32	1024	1024	3	C	0.187	0.185	1.01x	0.158	0.158	1.00x
lp	float32	1024	1024	8	C	0.161	0.163	0.99x	0.158	0.158	1.00x
lp	float32	1024	1024	16	C	0.160	0.159	1.00x	0.158	0.157	1.00x
lp	float32	1024	1024	64	C	0.290	0.293	0.99x	0.293	0.293	1.00x
lp	float32	1024	1024	256	C	1.076	1.077	1.00x	1.085	1.084	1.00x
lp	float32	4096	4096	3	C	2.550	2.536	1.01x	2.536	2.542	1.00x
lp	float32	4096	4096	8	C	2.530	2.552	0.99x	2.547	2.534	1.00x
lp	float32	4096	4096	16	C	2.526	2.560	0.99x	2.543	2.542	1.00x
lp	float32	4096	4096	64	C	5.119	5.158	0.99x	5.139	5.129	1.00x
lp	float32	4096	4096	256	C	19.594	19.528	1.00x	19.616	19.345	1.01x
lp	float32	8192	8192	3	C	8.510	8.511	1.00x	8.506	8.499	1.00x
lp	float32	8192	8192	8	C	8.520	8.509	1.00x	8.513	8.513	1.00x
lp	float32	8192	8192	16	C	8.523	8.541	1.00x	8.526	8.525	1.00x
lp	float32	8192	8192	64	C	17.323	17.321	1.00x	17.353	17.294	1.00x
lp	float32	8192	8192	256	C	65.735	65.360	1.01x	65.728	65.426	1.00x
lp	float64	1024	1024	3	C	0.259	0.255	1.02x	0.230	0.229	1.00x
lp	float64	1024	1024	8	C	0.230	0.235	0.98x	0.227	0.227	1.00x
lp	float64	1024	1024	16	C	0.224	0.227	0.99x	0.222	0.221	1.00x
lp	float64	1024	1024	64	C	0.790	0.784	1.01x	0.784	0.779	1.01x
lp	float64	1024	1024	256	C	3.037	3.025	1.00x	3.033	3.012	1.01x
lp	float64	4096	4096	3	C	4.423	4.474	0.99x	4.421	4.470	0.99x
lp	float64	4096	4096	8	C	4.353	4.389	0.99x	4.354	4.387	0.99x
lp	float64	4096	4096	16	C	4.247	4.256	1.00x	4.245	4.256	1.00x
lp	float64	4096	4096	64	C	16.176	16.232	1.00x	16.178	16.228	1.00x
lp	float64	4096	4096	256	C	63.825	64.085	1.00x	63.828	64.080	1.00x
lp	float64	8192	8192	3	C	14.582	14.742	0.99x	14.571	14.727	0.99x
lp	float64	8192	8192	8	C	14.357	14.466	0.99x	14.348	14.459	0.99x
lp	float64	8192	8192	16	C	13.991	14.031	1.00x	13.988	14.027	1.00x
lp	float64	8192	8192	64	C	53.493	53.653	1.00x	53.488	53.650	1.00x
lp	float64	8192	8192	256	C	211.214	211.908	1.00x	211.114	211.925	1.00x
minkowski	float32	1024	1024	3	C	1.026	834.363	0.00x	0.159	0.158	1.00x
minkowski	float32	1024	1024	8	C	0.168	0.173	0.97x	0.158	0.158	1.00x
minkowski	float32	1024	1024	16	C	0.168	0.165	1.02x	0.158	0.158	1.00x
minkowski	float32	1024	1024	64	C	0.293	0.296	0.99x	0.294	0.294	1.00x
minkowski	float32	1024	1024	256	C	1.079	1.102	0.98x	1.087	1.080	1.01x
minkowski	float32	4096	4096	3	C	2.498	2.550	0.98x	2.533	2.539	1.00x
minkowski	float32	4096	4096	8	C	2.504	2.534	0.99x	2.537	2.538	1.00x
minkowski	float32	4096	4096	16	C	2.529	2.554	0.99x	2.544	2.547	1.00x
minkowski	float32	4096	4096	64	C	5.131	5.176	0.99x	5.156	5.131	1.00x
minkowski	float32	4096	4096	256	C	19.575	19.474	1.01x	19.585	19.500	1.00x
minkowski	float32	8192	8192	3	C	8.510	8.488	1.00x	8.508	8.499	1.00x
minkowski	float32	8192	8192	8	C	8.541	8.524	1.00x	8.511	8.503	1.00x
minkowski	float32	8192	8192	16	C	8.529	8.542	1.00x	8.531	8.525	1.00x
minkowski	float32	8192	8192	64	C	17.315	17.397	1.00x	17.367	17.302	1.00x
minkowski	float32	8192	8192	256	C	65.760	65.372	1.01x	65.735	65.399	1.01x
minkowski	float64	1024	1024	3	C	1.052	847.813	0.00x	0.230	0.229	1.00x
minkowski	float64	1024	1024	8	C	0.237	0.239	0.99x	0.227	0.225	1.01x
minkowski	float64	1024	1024	16	C	0.226	0.226	1.00x	0.222	0.220	1.01x
minkowski	float64	1024	1024	64	C	0.786	0.779	1.01x	0.784	0.777	1.01x
minkowski	float64	1024	1024	256	C	3.035	3.009	1.01x	3.033	3.008	1.01x
minkowski	float64	4096	4096	3	C	4.423	4.471	0.99x	4.421	4.468	0.99x
minkowski	float64	4096	4096	8	C	4.362	4.392	0.99x	4.353	4.387	0.99x
minkowski	float64	4096	4096	16	C	4.249	4.262	1.00x	4.245	4.256	1.00x
minkowski	float64	4096	4096	64	C	16.175	16.234	1.00x	16.177	16.231	1.00x
minkowski	float64	4096	4096	256	C	63.824	64.077	1.00x	63.830	64.077	1.00x
minkowski	float64	8192	8192	3	C	14.578	14.736	0.99x	14.569	14.730	0.99x
minkowski	float64	8192	8192	8	C	14.358	14.466	0.99x	14.348	14.460	0.99x
minkowski	float64	8192	8192	16	C	13.996	14.031	1.00x	13.988	14.028	1.00x
minkowski	float64	8192	8192	64	C	53.524	53.650	1.00x	53.484	53.656	1.00x
minkowski	float64	8192	8192	256	C	211.073	211.905	1.00x	211.102	211.917	1.00x
sqeuclidean	float32	1024	1024	3	C	0.117	0.097	1.22x	0.051	0.046	1.10x
sqeuclidean	float32	1024	1024	8	C	0.094	0.085	1.11x	0.049	0.046	1.06x
sqeuclidean	float32	1024	1024	16	C	0.063	0.055	1.14x	0.047	0.046	1.02x
sqeuclidean	float32	1024	1024	64	C	0.068	0.064	1.07x	0.054	0.052	1.03x
sqeuclidean	float32	1024	1024	256	C	0.093	0.086	1.08x	0.080	0.077	1.03x
sqeuclidean	float32	4096	4096	3	C	0.129	0.126	1.02x	0.107	0.104	1.03x
sqeuclidean	float32	4096	4096	8	C	0.131	0.127	1.03x	0.108	0.105	1.03x
sqeuclidean	float32	4096	4096	16	C	0.114	0.120	0.95x	0.108	0.105	1.03x
sqeuclidean	float32	4096	4096	64	C	0.160	0.155	1.03x	0.155	0.152	1.02x
sqeuclidean	float32	4096	4096	256	C	0.319	0.312	1.02x	0.318	0.315	1.01x
sqeuclidean	float32	8192	8192	3	C	0.342	0.331	1.03x	0.312	0.309	1.01x
sqeuclidean	float32	8192	8192	8	C	0.333	0.326	1.02x	0.306	0.306	1.00x
sqeuclidean	float32	8192	8192	16	C	0.315	0.308	1.02x	0.305	0.304	1.00x
sqeuclidean	float32	8192	8192	64	C	0.484	0.478	1.01x	0.479	0.479	1.00x
sqeuclidean	float32	8192	8192	256	C	1.101	1.096	1.00x	1.065	1.104	0.97x
sqeuclidean	float64	1024	1024	3	C	0.078	0.072	1.08x	0.043	0.041	1.05x
sqeuclidean	float64	1024	1024	8	C	0.066	0.052	1.27x	0.043	0.041	1.05x
sqeuclidean	float64	1024	1024	16	C	0.055	0.047	1.17x	0.043	0.041	1.04x
sqeuclidean	float64	1024	1024	64	C	0.062	0.051	1.20x	0.046	0.045	1.03x
sqeuclidean	float64	1024	1024	256	C	0.076	0.066	1.15x	0.062	0.061	1.01x
sqeuclidean	float64	4096	4096	3	C	0.112	0.115	0.97x	0.104	0.105	0.99x
sqeuclidean	float64	4096	4096	8	C	0.116	0.110	1.05x	0.104	0.106	0.98x
sqeuclidean	float64	4096	4096	16	C	0.110	0.109	1.01x	0.105	0.106	0.98x
sqeuclidean	float64	4096	4096	64	C	0.168	0.173	0.97x	0.166	0.171	0.97x
sqeuclidean	float64	4096	4096	256	C	0.396	0.409	0.97x	0.398	0.415	0.96x
sqeuclidean	float64	8192	8192	3	C	0.295	0.303	0.97x	0.288	0.298	0.97x
sqeuclidean	float64	8192	8192	8	C	0.297	0.303	0.98x	0.291	0.299	0.97x
sqeuclidean	float64	8192	8192	16	C	0.301	0.307	0.98x	0.294	0.301	0.97x
sqeuclidean	float64	8192	8192	64	C	0.530	0.551	0.96x	0.531	0.552	0.96x
sqeuclidean	float64	8192	8192	256	C	1.433	1.501	0.95x	1.443	1.503	0.96x

copy-pr-bot · 2026-05-15T23:02:26Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

tarang-jain · 2026-05-19T22:56:27Z

+          typename IdxT>
+__global__ void pairwise_matrix_arch_probe_kernel()
+{
+}


Added an empty kernel here because the arch check needs a pointer to a non-JIT kernel.

coderabbitai · 2026-05-20T02:51:42Z

No actionable comments were generated in the recent review. 🎉

📥 Commits

Reviewing files that changed from the base of the PR and between e3e6d27 and 4d4f178.

📒 Files selected for processing (2)

cpp/src/distance/detail/pairwise_matrix/jit_lto_kernels/compute_distance_epilog_matrix.json
cpp/src/distance/detail/pairwise_matrix/jit_lto_kernels/compute_distance_epilog_rbf_matrix.json

✅ Files skipped from review due to trivial changes (1)

cpp/src/distance/detail/pairwise_matrix/jit_lto_kernels/compute_distance_epilog_rbf_matrix.json

🚧 Files skipped from review as they are similar to previous changes (1)

cpp/src/distance/detail/pairwise_matrix/jit_lto_kernels/compute_distance_epilog_matrix.json

📝 Walkthrough

Summary by CodeRabbit

Refactor
- Migrated distance computation from legacy architecture-specific kernels to a JIT-based compilation strategy, improving code maintainability and flexibility.
- Removed support for older GPU architectures.
- Updated distance operation implementations to use shared operator utilities consistently.

Walkthrough

Moves pairwise-matrix distance computation into a JIT-LTO pipeline: new fragment tags, device shims and compute/epilog kernels (including RBF), planner and kernel templates, type-to-tag dispatch, runtime arch probe, base integration toggle, include fixes, and CMake kernel generation.

Changes

JIT-LTO Pairwise Distance Migration

Layer / File(s)	Summary
Fragment tag system and JIT contracts `cpp/include/cuvs/detail/jit_lto/common_fragments.hpp`, `cpp/include/cuvs/detail/jit_lto/pairwise_matrix/pairwise_matrix_fragments.hpp`	Adds empty tag structs and fragment-tag templates to parameterize JIT-LTO pairwise-matrix fragments (distance, data, acc/out, index, fin-op, layout, Veclen).
Device declarations and compute shims `cpp/src/distance/detail/pairwise_matrix/jit_lto_kernels/device_functions.cuh`, `cpp/src/distance/detail/pairwise_matrix/jit_lto_kernels/compute_distance_kernel.cu.in`, `cpp/src/distance/detail/pairwise_matrix/jit_lto_kernels/compute_distance_epilog_kernel.cu.in`	Declares `extern __device__` JIT entrypoints and provides device-side `compute_distance` and `compute_distance_epilog` specializations that forward to distance op core/epilog implementations.
JIT kernel configuration matrices `cpp/src/distance/detail/pairwise_matrix/jit_lto_kernels/*.json`	Adds JSON matrices enumerating supported distance ops, data/acc/out types, index types, layout policies and veclen variants for compute and epilog generation (including RBF variants).
Pairwise kernel templates & planner `cpp/src/distance/detail/pairwise_matrix/jit_lto_kernels/pairwise_matrix_planner.hpp`, `cpp/src/distance/detail/pairwise_matrix/jit_lto_kernels/pairwise_matrix_kernel.cu.in`, `.../pairwise_matrix_rbf_kernel.cu.in`	Adds kernel `.cu.in` templates (standard and RBF) and `PairwiseMatrixPlanner` that registers pairwise, compute-distance, and compute-distance-epilog fragments with a static JIT cache and fixed entrypoint.
JIT dispatch type mapping and launcher `cpp/src/distance/detail/pairwise_matrix/jit_lto_kernels/pairwise_matrix_jit.cuh`	Implements constexpr type-to-tag mapping for scalars, indices, layouts, fin-ops and distance-op traits, and `pairwise_matrix_jit_dispatch` which configures planner, derives policy/veclen, and launches the planned kernel.
Runtime dispatch selection and arch probe `cpp/src/distance/detail/pairwise_matrix/dispatch-inl.cuh`, `cpp/src/distance/detail/pairwise_matrix/dispatch_matrix.json`, `dispatch_rbf_inst.cu.in`, `dispatch_rbf_matrix.json`	Adds an arch-probe kernel to determine runtime virtual arch without forcing JIT, routes CUTLASS/non-CUTLASS cases to either SM80 dispatch or JIT, and updates dispatch_matrix arch_includes and RBF instantiation types.
Pairwise base JIT integration `cpp/src/distance/detail/pairwise_distance_base.cuh`	Conditionally includes JIT device shims under `CUVS_DISTANCE_PAIRWISE_USE_JIT` and switches inner `core` and epilog calls to `compute_distance` / `compute_distance_epilog` when enabled.
Distance operator include updates `cpp/src/distance/detail/distance_ops/correlation.cuh`, `hellinger.cuh`, `l1.cuh`, `l2_unexp.cuh`, `l_inf.cuh`, `cpp/src/distance/detail/sparse/l2_distance.cuh`	Adds explicit `#include <raft/core/operators.hpp>` to provide operator helpers (`raft::sqrt`, `raft::abs`, `raft::max`) and tidies header prolog formatting.
Build system kernel generation `cpp/CMakeLists.txt`	Adds `distance_ns` and `pairwise_matrix_jit_dir` and invokes `generate_jit_lto_kernels` to produce compute-distance and compute-distance-epilog kernel fragments (standard and RBF) across layouts and veclen variants, appending outputs to `jit_lto_files`.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested labels

C++, Build

Suggested reviewers

divyegala
KyleFromNVIDIA
dantegd

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly summarizes the main feature: JIT LTO Pairwise Distances, which accurately reflects the core of the changeset.
Description check	✅ Passed	The description is directly related to the changeset, explaining the refactoring goals and providing performance metrics and benchmark data relevant to the changes.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

divyegala

Two minor questions, great PR!

fragments, tags, instantiations, json and cmake

5e5eecf

github-project-automation Bot added this to Unstructured Data Processing May 15, 2026

tarang-jain self-assigned this May 15, 2026

tarang-jain added the non-breaking Introduces a non-breaking change label May 15, 2026

Merge branch 'main' into jit-lto-pw

060aee2

aamijar moved this to In Progress in Unstructured Data Processing May 15, 2026

aamijar added the improvement Improves an existing functionality label May 15, 2026

KyleFromNVIDIA requested changes May 18, 2026

KyleFromNVIDIA reviewed May 18, 2026

Comment thread cpp/src/distance/detail/pairwise_distance_base.cuh Outdated

tarang-jain and others added 6 commits May 18, 2026 14:31

Update cpp/src/distance/detail/pairwise_matrix/jit_lto_kernels/compute_distance_epilog_kernel.cu.in

cf6ccb0

Co-authored-by: Kyle Edwards <kyedwards@nvidia.com>

Merge branch 'main' into jit-lto-pw

9c0b716

update arch_include

f9ff9e8

Merge branch 'jit-lto-pw' of github.com:tarang-jain/cuvs into jit-lto-pw

3faba38

correct tags

7dce6e3

update template

e1f6a5c

tarang-jain commented May 19, 2026

Comment thread cpp/include/cuvs/detail/jit_lto/common_fragments.hpp

tarang-jain commented May 19, 2026

Comment thread cpp/src/distance/detail/pairwise_matrix/dispatch-inl.cuh Outdated

tarang-jain added 2 commits May 18, 2026 18:50

reapply cmake comment

9e80919

rm jit boolean

8fcb1bb

KyleFromNVIDIA requested changes May 19, 2026

tarang-jain and others added 9 commits May 19, 2026 11:43

address pr reviews

4ae7507

update index type in json

e465980

Merge branch 'main' into jit-lto-pw

150e705

do not switch off clang

7df2f95

Merge branch 'jit-lto-pw' of github.com:tarang-jain/cuvs into jit-lto-pw

ec5ae32

style check and header includes

15bb587

style check and header includes

a2e77d7

compilation errors

5bb0293

fix header includes

5da125a

restore old version of dispatch-inl

d5e942d

tarang-jain commented May 19, 2026

style and docs

60d3344

tarang-jain force-pushed the jit-lto-pw branch from 3cc4595 to 60d3344 May 20, 2026 02:37

add the ifdef

680b55d

tarang-jain marked this pull request as ready for review May 20, 2026 02:46

tarang-jain requested review from a team as code owners May 20, 2026 02:46

Merge branch 'main' into jit-lto-pw

86100d4

KyleFromNVIDIA reviewed May 21, 2026

Comment thread cpp/src/distance/detail/pairwise_distance_base.cuh

KyleFromNVIDIA approved these changes May 21, 2026

tarang-jain and others added 5 commits June 4, 2026 11:34

Merge branch 'main' into jit-lto-pw

9957163

Merge branch 'main' of https://github.com/rapidsai/cuvs into jit-lto-pw

3ea9e91

Merge branch 'jit-lto-pw' of github.com:tarang-jain/cuvs into jit-lto-pw

e3e6d27

Merge branch 'main' into jit-lto-pw

9964674

Merge branch 'main' into jit-lto-pw

846d48f

divyegala reviewed Jun 5, 2026

Comment thread cpp/src/distance/detail/pairwise_matrix/jit_lto_kernels/compute_distance_epilog_matrix.json Outdated

Comment thread cpp/src/distance/detail/pairwise_matrix/jit_lto_kernels/compute_distance_epilog_rbf_matrix.json Outdated

tarang-jain added 2 commits June 5, 2026 20:08

simplify json

884caf2

Merge branch 'jit-lto-pw' of github.com:tarang-jain/cuvs into jit-lto-pw

4d4f178

divyegala approved these changes Jun 6, 2026
