Introduce UDF Architecture by divyegala · Pull Request #1804 · rapidsai/cuvs

divyegala · 2026-02-15T01:21:58Z

This PR introduces an architecture for supporting user-defined functions (UDFs) in cuVS, using JIT LTO to achieve it. The initial example passes a metric UDF to the IVF Flat search kernels.

In testing, an L2 metric supplied as a UDF matches the performance of the native L2 metric.

cpp/include/cuvs/detail/jit_lto/NVRTCLTOFragmentCompiler.hpp

mythrocks · 2026-03-04T19:09:58Z

cpp/include/cuvs/detail/jit_lto/FragmentDatabase.hpp

                            std::size_t size);
+
+void registerNVRTCFragment(std::string const& key,
+                           std::unique_ptr<char[]>&& program,


I haven't gotten far in this change yet, but I wonder why program is not a vector<char> instead of a unique_ptr<char[]>.
Edit: Or a std::string, really.

I don't think there's a particular reason. We need to use the C type char[] so it's just clearer IMO.

cpp/src/detail/jit_lto/NVRTCLTOFragmentCompiler.cu

jinsolp · 2026-03-05T02:47:36Z

cpp/include/cuvs/neighbors/ivf_flat.hpp

+  if constexpr (std::is_same_v<T, uint8_t> && V > 1) {
+    auto diff = __vabsdiffu4(x.raw(), y.raw());
+  } else if constexpr (std::is_same_v<T, int8_t> && V > 1) {
+    auto diff = __vabsdiffs4(x.raw(), y.raw());


Shouldn't we be returning the diff here?

Good catch again!

jinsolp · 2026-03-05T02:53:56Z

cpp/include/cuvs/neighbors/ivf_flat.hpp

+  if constexpr (std::is_same_v<T, uint8_t> && V > 1) {
+    auto diff = __vabsdiffu4(x.raw(), y.raw());
+    return __dp4a(diff, diff, AccT{0});
+  } else if constexpr (std::is_same_v<T, int8_t> && V > 1) {
+    auto diff = __vabsdiffs4(x.raw(), y.raw());
+    return __dp4a(diff, diff, static_cast<uint32_t>(0));


I don't think this should lead to correctness issues but why is the first one using AccT as the accumulator and the second one uint32_t? Can't we do return __dp4a(diff, diff, AccT{0}); for both?

Yes! Nice catch.

Oh we can't, because __vabsdiffs4 returns a uint32_t.

oh okay no worries
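
For readers unfamiliar with the intrinsics in this thread, here is a host-side emulation in plain C++ of what the two branches appear to compute. It is only a sketch: the `emu_*`, `pack4` and `squared_l2_*` names are hypothetical stand-ins that exist neither in CUDA nor in cuVS, and it assumes the intrinsics operate per byte on four packed 8-bit lanes, with the unsigned form of `__dp4a` adding the four byte products to its third argument.

```cpp
#include <cstdint>
#include <cstdlib>

// Pack four bytes into one 32-bit word, lane 0 in the low byte (hypothetical helper).
inline std::uint32_t pack4(std::uint8_t b0, std::uint8_t b1, std::uint8_t b2, std::uint8_t b3)
{
  return static_cast<std::uint32_t>(b0) | (static_cast<std::uint32_t>(b1) << 8) |
         (static_cast<std::uint32_t>(b2) << 16) | (static_cast<std::uint32_t>(b3) << 24);
}

// Emulates per-byte |a - b| with lanes read as unsigned 8-bit values.
inline std::uint32_t emu_vabsdiffu4(std::uint32_t a, std::uint32_t b)
{
  std::uint32_t out = 0;
  for (int lane = 0; lane < 4; ++lane) {
    int x = static_cast<int>((a >> (8 * lane)) & 0xFFu);
    int y = static_cast<int>((b >> (8 * lane)) & 0xFFu);
    out |= static_cast<std::uint32_t>(std::abs(x - y)) << (8 * lane);
  }
  return out;
}

// Emulates per-byte |a - b| with lanes read as signed 8-bit values.
// The result lanes are unsigned: |-128 - 127| = 255 still fits in one byte.
inline std::uint32_t emu_vabsdiffs4(std::uint32_t a, std::uint32_t b)
{
  std::uint32_t out = 0;
  for (int lane = 0; lane < 4; ++lane) {
    int x = static_cast<std::int8_t>((a >> (8 * lane)) & 0xFFu);
    int y = static_cast<std::int8_t>((b >> (8 * lane)) & 0xFFu);
    out |= static_cast<std::uint32_t>(std::abs(x - y)) << (8 * lane);
  }
  return out;
}

// Emulates the unsigned four-way byte dot product with accumulation.
inline std::uint32_t emu_dp4a_u(std::uint32_t a, std::uint32_t b, std::uint32_t c)
{
  for (int lane = 0; lane < 4; ++lane) {
    c += ((a >> (8 * lane)) & 0xFFu) * ((b >> (8 * lane)) & 0xFFu);
  }
  return c;
}

// Squared L2 distance over four packed lanes: sum of diff * diff.
inline std::uint32_t squared_l2_u8x4(std::uint32_t x, std::uint32_t y)
{
  std::uint32_t diff = emu_vabsdiffu4(x, y);
  return emu_dp4a_u(diff, diff, 0u);
}

inline std::uint32_t squared_l2_s8x4(std::uint32_t x, std::uint32_t y)
{
  std::uint32_t diff = emu_vabsdiffs4(x, y);
  return emu_dp4a_u(diff, diff, 0u);
}
```

Under these assumptions the signed branch also needs an unsigned accumulator: a lane difference of 255 is only read correctly if the byte is treated as unsigned in the dot product.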


achirkin

Hi @divyegala, thanks for the extensive work! From my side, a few small nitpicks, but also two major things to consider:

First, I think the enabling-nvrtc part can be split into a separate PR. Can it?

Second, the JIT code generation. I feel like having two versions of the code, one as a type-checked C++ header and the other as a collection of std::string values, will be hard to maintain in the long run. Which other code parsing/generation approaches have you considered?

How about generating/parsing the AST programmatically? We could use the heavy-but-feature-complete clang set of tools, or something more lightweight, such as cppast, at compile time to parse the header subject to JIT. We could use custom C++ function attributes to label and find specifically the functions to be saved as strings for compilation. Or perhaps even their dependencies?
As a low-effort workaround, maybe just annotate the functions and make a small python parser to extract the pieces of code?

Also, could you please describe (maybe in the PR description) why we need to go all the way from source code for the UDF functions compiled via nvrtc, rather than letting the user pre-compile them into LTO IR at build time and using that at runtime?

achirkin · 2026-03-09T08:56:32Z

cpp/include/cuvs/neighbors/ivf_flat.hpp

+  point() = default;
+  __device__ __host__ explicit point(storage_type d) : data_(d) {}
+
+  __device__ __forceinline__ storage_type raw() const { return data_; }


Please use raft macros for these where appropriate:

Suggested change

-  __device__ __forceinline__ storage_type raw() const { return data_; }
+  RAFT_DEVICE_INLINE_FUNCTION storage_type raw() const { return data_; }

We cannot use those here. We are not allowed to include any headers in the UDF strings.

Please write this in bold everywhere where it's relevant. To an unprepared viewer like me, there's no indication if this is the case and why. Another developer will see this with no commentary whatsoever and write the same style outside of the nvrtc/udf context.
This is becoming especially a problem in light of AI coding agents who learn the style from the existing project code and replicate it not-so-thoughtfully.

achirkin · 2026-03-09T08:59:32Z

conda/recipes/libcuvs/recipe.yaml

      - if: cuda_major == "13"
        then:
          - libnvjitlink-dev
+          - cuda-nvrtc-dev


I see the nvrtc introduction is shared between this PR and #1807 (all related changes in conda, cmake, and C++ headers). It looks to me like an important change set on its own. Could you please move it into a separate PR? This would reduce the diff of both PRs and make the commit history in the main branch more granular (much more readable git blame).

achirkin · 2026-03-09T09:21:37Z

cpp/include/cuvs/neighbors/ivf_flat.hpp

+ * @tparam Veclen Vector length (1, 2, 4, 8, 16)
+ */
+template <typename T, typename AccT, int Veclen>
+struct point {


You've probably considered this already, but could you give a small explanation (perhaps right in the docstring for future reference): why can't we use TxN_t or IOType from raft/util/vectorized.cuh here? Either directly or as a wrapped-in carrier type? (adding missing accessors to raft I think is a reasonable idea too)

These are pre-existing template types used by interleaved_scan_kernel to express its internal distance computation functions.

I would say it is orthogonal to this PR to update these types. Any decision to refactor interleaved_scan_kernel should come separately from the JIT/LTO feature.

Ah sorry I didn't look at ivf-flat code for a while and missed when this structure was introduced

achirkin · 2026-03-09T10:48:28Z

cpp/include/cuvs/neighbors/ivf_flat.hpp

+};
+
+// ============================================================
+// Helper Operations - Deduce Veclen from point type!


What does this relate to?

It is so that the user does not have to know the value of Veclen to write their UDF, it is just a convenience class.

Could you please write just that in the file? When I stumbled upon this comment, it looked like an instruction to me or some sort of todo note :)
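
To illustrate the convenience being described, here is a minimal host-only sketch of deducing the vector length from the point type, so that a function written against the point never has to name it. Everything here is hypothetical: `point_sketch`, `veclen_of` and `squared_l2` are simplified stand-ins, not the types or helpers in ivf_flat.hpp.

```cpp
// Simplified stand-in for the PR's point<T, AccT, Veclen>: just Veclen values of type T.
template <typename T, typename AccT, int Veclen>
struct point_sketch {
  T values[Veclen];
};

// Primary template left undefined: only point_sketch specializations are valid.
template <typename P>
struct veclen_of;

// Partial specialization recovers Veclen from the point type itself.
template <typename T, typename AccT, int V>
struct veclen_of<point_sketch<T, AccT, V>> {
  static constexpr int value = V;
};

// A user-style function: V is deduced from the arguments, never spelled out by the caller.
template <typename T, typename AccT, int V>
AccT squared_l2(const point_sketch<T, AccT, V>& x, const point_sketch<T, AccT, V>& y)
{
  AccT acc{0};
  for (int i = 0; i < V; ++i) {
    AccT d = static_cast<AccT>(x.values[i]) - static_cast<AccT>(y.values[i]);
    acc += d * d;
  }
  return acc;
}
```

A caller would write `squared_l2(a, b)` for any `Veclen`; `veclen_of<decltype(a)>::value` is available when the length is needed explicitly.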

achirkin · 2026-03-09T10:52:20Z

cpp/include/cuvs/neighbors/ivf_flat.hpp

+}
+
+// ============================================================================
+// String versions for JIT compilation


This looks like it's going to be hard to maintain. What if we put all definitions subject to JIT in a separate header and then write a small utility module that would read that header as text at compile time and save it to strings?

That's a little hard and would need us to plug it in with CMake, so that we can embed that generated string back into this header. That said, I do think that it will declutter this header nicely. Would you be okay with a follow-up?

cpp/src/neighbors/ivf_flat/ivf_flat_interleaved_scan.cuh

cpp/src/detail/jit_lto/NVRTCLTOFragmentCompiler.cu

achirkin · 2026-03-09T11:31:18Z

cpp/src/neighbors/ivf_flat/ivf_flat_interleaved_scan_jit.cuh

+  if (params.metric_udf.has_value()) {
+    std::string metric_udf = params.metric_udf.value();
+    // Add explicit template instantiation with actual types
+    metric_udf += "\ntemplate void cuvs::neighbors::ivf_flat::detail::compute_dist<";
+    metric_udf += std::to_string(Veclen);
+    metric_udf += ", ";
+    metric_udf += type_name<T>();
+    metric_udf += ", ";
+    metric_udf += type_name<AccT>();
+    metric_udf += ">(";
+    metric_udf += type_name<AccT>();
+    metric_udf += "&, ";
+    metric_udf += type_name<AccT>();
+    metric_udf += ", ";
+    metric_udf += type_name<AccT>();
+    metric_udf += ");\n";


Since we're C++20 now, I think using std::format is preferable for being more concise:

metric_udf += fmt::format(
  "\ntemplate void cuvs::neighbors::ivf_flat::detail::compute_dist"
  "<{}, {}, {}>({}&, {}, {});\n",
  Veclen, type_name<T>(), type_name<AccT>(), type_name<AccT>(), type_name<AccT>(), type_name<AccT>());

I don't think we have fmt available?
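
If neither fmt nor `<format>` is available, one possible middle ground is a small helper built on `std::ostringstream`, which needs only the standard library. The sketch below is an assumption about how that could look, not code from the PR: `make_compute_dist_instantiation` is a hypothetical name, and the type names are passed in as plain strings where the PR calls `type_name<T>()`.

```cpp
#include <sstream>
#include <string>

// Builds the explicit-instantiation line that the diff above appends to the UDF source.
// t_name / acc_name stand in for the results of type_name<T>() and type_name<AccT>().
inline std::string make_compute_dist_instantiation(int veclen,
                                                   const std::string& t_name,
                                                   const std::string& acc_name)
{
  std::ostringstream out;
  out << "\ntemplate void cuvs::neighbors::ivf_flat::detail::compute_dist<" << veclen << ", "
      << t_name << ", " << acc_name << ">(" << acc_name << "&, " << acc_name << ", " << acc_name
      << ");\n";
  return out.str();
}
```

The call site would then reduce to a single line such as `metric_udf += make_compute_dist_instantiation(Veclen, type_name<T>(), type_name<AccT>());`.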

divyegala · 2026-03-09T23:41:56Z

Thanks for your review @achirkin!

First, I think the enabling-nvrtc part can be split into a separate PR. Can it?

I would prefer not to for a few reasons:

No way to test the enabling-nvrtc part. This PR helps establish an e2e test through our algorithms.
It has been tough to find timely reviews for my JIT LTO/UDF work, so I would like to maintain momentum on this PR while it has reviewers.

Second, the JIT code generation

I will attempt to answer all your questions related to the input at the same time.

At the present moment, we only have two options:

1. Pass a string to nvrtc -> generate LTO-IR -> link with nvjitlink
2. User passes LTO-IR -> link with nvjitlink (we can enable this feature in a follow-up PR; I consider this a feature that would require some savviness from the user themselves)

Even though it appears that we are supporting two methods (header or string), we are actually only supporting method 1. The header is simply a convenience around avoiding using raw strings and letting our users figure out at compile time if their UDF has the right interface to work in our kernels.

Creating any other method to parse the UDF would involve a significant investment from our end. That said, we have other teams already working on a better UX - please check out this nvbug.

Keeping efforts of other teams in mind, I consider the current macro-style type checker to be a fair middle ground and low effort bar for us to help users enforce some compile time compatibility checks while we await further developments.
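
As a rough illustration of what a macro-style type checker can mean in general, the sketch below uses one macro to emit the UDF body twice: once as real C++ so that the host compiler type-checks it, and once as a string that could later be handed to a runtime compiler. This is an assumption about the general technique, not the macro in this PR; `SKETCH_METRIC_UDF` and the generated names are hypothetical, and only the `(AccT&, AccT, AccT)` parameter shape is taken from the instantiation shown in the diff.

```cpp
#include <string>

// Hypothetical macro, not the PR's API. BODY must not contain top-level commas.
#define SKETCH_METRIC_UDF(NAME, BODY)                        \
  /* Compiled copy: type-checked by the host compiler. */    \
  template <typename AccT>                                   \
  inline void NAME(AccT& acc, AccT x, AccT y)                \
  {                                                          \
    BODY                                                     \
  }                                                          \
  /* String copy: the same tokens, stringified with #. */    \
  inline std::string NAME##_source()                         \
  {                                                          \
    return "template <typename AccT> void " #NAME            \
           "(AccT& acc, AccT x, AccT y) { " #BODY " }";      \
  }

// One definition yields both squared_diff(...) and squared_diff_source().
SKETCH_METRIC_UDF(squared_diff, acc += (x - y) * (x - y);)
```

A mistake in the body, such as a misspelled parameter name, then fails at build time instead of surfacing as a runtime compilation error.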

achirkin

No way to test the enabling-nvrtc part. This PR helps establish an e2e test through our algorithms.

Is it really not possible to make a simple nvrtc test with a dummy kernel in our gtests? This is concerning.

It has been tough to find timely reviews for my JIT LTO/UDF work, so I would like to maintain momentum on this PR while it has reviewers.

I'm sorry about that and I'm sorry I waited so long to join the review process. Yet I think it's no secret the size of a PR is the main contributor to deterring timely reviews.

On the topic of nvrtc strings. Thank you for the link, the RFE doc is a delight to read :) It outlines two common existing approaches at their extremes: (1) writing in plain strings vs (2) integrating a custom code generator in cmake. You chose (1) and it has its advantages over (2). In my comments I'm suggesting a middle ground as a slight improvement: why not write an extremely simple parser to just copy the relevant parts of the header that you wrote as a means of compile-time validation? We don't strictly need to integrate it into the build system. We already do this in multiple places to generate template instances. We could, for example, "define" our own attribute and annotate the functions and structs to be passed to nvrtc like this:

[[cuvs::nvrtc_code_start("END_TOKEN")]]
template <typename T, typename AccT, int V>
__device__ __forceinline__ AccT product(point<T, AccT, V> x, point<T, AccT, V> y)
{
  return dot_product(x, y);
}
// END_TOKEN

And then extracting all strings would amount to a regex on [[cuvs::nvrtc_code_start\(".*"\)]] plus a simple parse till the declared token. With a proper parser like clang or cppast we could just filter and extract all annotated AST (with raft macros already applied), but even a simple python script goes a long way in making this easier to maintain.
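
A minimal sketch of that extraction step, written here in C++ with `std::regex` rather than as a Python script so that it stays in the language of the surrounding code. It assumes the annotation format proposed above (an opening attribute naming an end token, closed by a `// TOKEN` comment); `extract_nvrtc_snippets` is a hypothetical name, and the function does no real C++ parsing.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Returns the text between each [[cuvs::nvrtc_code_start("TOKEN")]] annotation
// and the matching "// TOKEN" comment. Unterminated annotations are skipped.
inline std::vector<std::string> extract_nvrtc_snippets(const std::string& header_text)
{
  // Capture group 1 is the end token declared inside the annotation.
  static const std::regex start_re(R"re(\[\[cuvs::nvrtc_code_start\("([^"]*)"\)\]\])re");

  std::vector<std::string> snippets;
  for (auto it = std::sregex_iterator(header_text.begin(), header_text.end(), start_re);
       it != std::sregex_iterator();
       ++it) {
    const std::string end_marker = "// " + (*it)[1].str();
    const std::size_t body_begin = static_cast<std::size_t>(it->position(0) + it->length(0));
    const std::size_t body_end   = header_text.find(end_marker, body_begin);
    if (body_end == std::string::npos) { continue; }
    snippets.push_back(header_text.substr(body_begin, body_end - body_begin));
  }
  return snippets;
}
```

The returned strings are the annotated definitions verbatim, so the type-checked header remains the single source of truth for what reaches nvrtc.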

cpp/include/cuvs/neighbors/ivf_flat.hpp

achirkin · 2026-03-12T06:48:18Z

Don't block the other great JIT-LTO/nvrtc work dependent on this PR over the design issues.

divyegala · 2026-03-12T21:24:50Z

@achirkin

Is it really not possible to make a simple nvrtc test with a dummy kernel in our gtests? This is concerning.

Hmm, it should be, actually. We would have to embed a test kernel in libcuvs.so though, and I don't feel great about adding testing components to the main library.

On the topic of nvrtc strings.

I like your proposed solution, and it definitely reads cleaner and more maintainable. It will take some time to figure out the right design to land on, which is why this feature is in the experimental namespace :). Can you please create an issue describing your idea? We can iterate on the design there and update in 26.06.

divyegala added 30 commits October 2, 2025 18:33

jit lto interleaved scan

a024f61

fix dependencies.yaml

45da4aa

generate files at build time, use tags to avoid compilation of types

a7c8621

passing tests

eb2d74b

update gitignore

d2318e8

separate out distance function from main kernel

5e6afcd

fix deps

6eee4da

add filters as jit device functions, rework caching logic

1de8f28

lto post lambda, cleanup files, generate cmake in build dir

84c6020

don't read hardcoded kernels, use generator properly

22680c8

random cmake changes carried over from 25.10

37f1163

cmake format

0ae5383

remove dep on kernel list

fe56aec

attempt to solve overlinking problem

40c8fd6

reorder if-else in compiler check

e87a8c7

Merge branch 'branch-25.12' into jit-lto-ivf-flat-interleaved

179d733

use cudart apis

32a67bd

merge

c27612e

attempt to link cudart

a4b48b1

revert cudart link, try all arch build of jit lto fatbin sources

d5d692e

cmake format

1c6dd94

missing shared mem setting

30f5ab6

separate cuda 12 and 13 compilation

9674969

merge upstream

24fc47d

remove bench

db9a487

c include directory

aa9294f

style check

2eb77fe

merge upstream

6c685fa

guard cuda calls and use shared_ptr

3e35b99

add AlgorithmPlanner to main target

d0ff62c

divyegala linked an issue Mar 4, 2026 that may be closed by this pull request

Using UDF Architecture in cuVS via JIT LTO #1870

Open

bdice and others added 3 commits March 4, 2026 08:28

Add cudart to cuda-toolkit extras

3f09b32

Merge remote-tracking branch 'origin/main' into ivf-flat-search-udf

77114a4

fix tests

9069280

divyegala requested review from jinsolp and tarang-jain March 4, 2026 17:40

mythrocks reviewed Mar 4, 2026

cpp/include/cuvs/detail/jit_lto/NVRTCLTOFragmentCompiler.hpp (outdated, resolved)

mythrocks reviewed Mar 4, 2026

cpp/src/detail/jit_lto/NVRTCLTOFragmentCompiler.cu (outdated, resolved)

divyegala removed a link to an issue Mar 4, 2026

Using UDF Architecture in cuVS via JIT LTO #1870

Open

divyegala linked an issue Mar 4, 2026 that may be closed by this pull request

Introduce UDF Architecture and apply to interleaved_scan_kernel metric functions #1871

Open

jinsolp reviewed Mar 5, 2026

divyegala and others added 6 commits March 5, 2026 15:41

Apply suggestions from code review

81ed0a2

Co-authored-by: MithunR <mythrocks@gmail.com>

address reviews

342b5cc

Merge branch 'main' into ivf-flat-search-udf

492f293

fix

705dcf9

Merge branch 'ivf-flat-search-udf' of github.com:divyegala/cuvs into …

ac10f8d

…ivf-flat-search-udf

Merge branch 'main' into ivf-flat-search-udf

4d61d4d

achirkin previously requested changes Mar 9, 2026

divyegala added 2 commits March 11, 2026 17:17

address review

cb3e23d

Merge branch 'main' of github.com:rapidsai/cuvs into ivf-flat-search-udf

217d42d

achirkin reviewed Mar 12, 2026

achirkin requested review from achirkin and removed request for achirkin March 12, 2026 07:54

divyegala changed the base branch from main to release/26.04 March 12, 2026 18:51

divyegala added 2 commits March 12, 2026 21:41

add more docs

bf7de27

Merge remote-tracking branch 'origin/release/26.04' into ivf-flat-sea…

1407f60

…rch-udf

8 participants
