feat: add kernelgen backend by ymwangg · Pull Request #56 · aws-neuron/nkipy

ymwangg · 2026-04-27T19:48:28Z

Summary

Add a kernelgen backend that generates NKI kernels via MLIR as an alternative to the existing HLO backend, with MLIR encapsulated behind a builder API
Unify the IR interface across HLO and kernelgen backends with a common ComputationIR abstraction, including uniform input/output name resolution and compilation delegation
Add op implementations for the kernelgen backend (_kernelgen_impls.py, _register_kernelgen.py) with support for inplace updates (dynamic_update_slice) and custom ops via nki_op.nki_custom_op
Add comprehensive unit tests (test_kernelgen_backend.py), op tests (test_kernelgen_ops.py), and numerical tests (test_kernelgen_numerical.py)
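To make the "common ComputationIR abstraction" in the summary concrete, here is a minimal sketch of what such an interface could look like. Only the name ComputationIR comes from this PR. The method names (input_names, output_names, compile), the two toy subclasses, and the describe helper are assumptions for illustration, not the actual nkipy API.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class ComputationIR(ABC):
    """Hypothetical common interface; method names are illustrative assumptions."""

    @abstractmethod
    def input_names(self) -> List[str]:
        """User-facing input names, resolved the same way for every backend."""

    @abstractmethod
    def output_names(self) -> List[str]:
        """User-facing output names."""

    @abstractmethod
    def compile(self, output_dir: str) -> str:
        """Delegate compilation to the backend and return an artifact path."""


class ToyHloIR(ComputationIR):
    """Stand-in for an HLO-backed IR (not the real class)."""

    def __init__(self, inputs: List[str], outputs: List[str]) -> None:
        self._inputs, self._outputs = list(inputs), list(outputs)

    def input_names(self) -> List[str]:
        return list(self._inputs)

    def output_names(self) -> List[str]:
        return list(self._outputs)

    def compile(self, output_dir: str) -> str:
        return f"{output_dir}/hlo.neff"


class ToyKernelgenIR(ToyHloIR):
    """Stand-in for a kernelgen-backed IR; only compilation differs here."""

    def compile(self, output_dir: str) -> str:
        return f"{output_dir}/kernelgen.neff"


def describe(ir: ComputationIR, output_dir: str) -> Dict[str, object]:
    """Caller code that depends only on the shared interface."""
    return {
        "inputs": ir.input_names(),
        "outputs": ir.output_names(),
        "artifact": ir.compile(output_dir),
    }
```

Code written against describe does not need to know which backend produced the IR, which is the property the summary item describes.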

Test plan

uv run pytest tests/unit/test_kernelgen_backend.py -v -n auto
uv run pytest tests/test_kernelgen_ops.py -v -n auto
uv run pytest tests/test_kernelgen_numerical.py -v -n auto
uv run pytest tests/ -n auto (full suite)

Add a complete kernelgen backend that generates NKI kernels via MLIR, as an alternative to the existing HLO backend. This includes:
- Kernelgen backend with MLIR encapsulated behind a builder API
- Op implementations for kernelgen (_kernelgen_impls.py)
- Inplace update (dynamic_update_slice) support
- Unified IR interface for HLO and kernelgen backends
- Custom op interface via nki_op.nki_custom_op
- Compilation delegation to nkipy_kernelgen.compile
- Comprehensive unit and numerical tests

vgene

This part looks good to me in general. Two minor issues.

vgene · 2026-05-07T11:48:15Z

+            if out_idx >= self._user_return_len
+        }
+
+    def resolve_input_arrays(self, original_inputs):


Let's discuss this and consolidate IO tensor handling between the two backends, especially with aliasing.

I've consolidated resolve_input_arrays and get_alias_input_name into a single prepare_io_mapping function and reused as much logic as possible. For alias handling, the main difference is that the HLO backend needs to create a new variable with a .must_alias_input suffix, while kernelgen can reuse the same buffer without creating an alias variable in the graph. Another difference is that the kernelgen-generated NEFF uses automatic variable names like "input_0", "input_1", which requires extra work to map them back to the original variables.
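The two naming schemes described in this comment can be illustrated with a small sketch. The .must_alias_input suffix and the "input_N" pattern are taken from the comment. The helper names and the exact rules for when a suffix is applied are assumptions, not the code in this PR.

```python
from typing import List, Set

MUST_ALIAS_SUFFIX = ".must_alias_input"  # suffix quoted in the comment above


def hlo_graph_names(param_names: List[str], aliased: Set[str]) -> List[str]:
    """Assumed HLO-style naming: an aliased parameter gets a suffixed variable."""
    return [
        name + MUST_ALIAS_SUFFIX if name in aliased else name
        for name in param_names
    ]


def kernelgen_neff_names(param_names: List[str]) -> List[str]:
    """Assumed kernelgen-style naming: inputs are auto-named by position."""
    return [f"input_{i}" for i in range(len(param_names))]


def user_name_from_hlo(graph_name: str) -> str:
    """Recover the user-facing name by stripping the alias suffix, if present."""
    if graph_name.endswith(MUST_ALIAS_SUFFIX):
        return graph_name[: -len(MUST_ALIAS_SUFFIX)]
    return graph_name


def user_name_from_kernelgen(neff_name: str, param_names: List[str]) -> str:
    """Recover the user-facing name from a positional 'input_N' name."""
    index = int(neff_name.rsplit("_", 1)[1])
    return param_names[index]
```

In this sketch the HLO side can recover the user name from the graph name alone, while the kernelgen side needs the original parameter list, which is the "extra work" the comment refers to.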

…ram_names list

The redundant Dict[str, str] mapping NEFF input names to parameter names was fragile: it could drift out of sync with _input_specs. Replace with a List[str] positionally aligned with _input_specs, making the invariant structurally impossible to violate.
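A rough sketch of the positional-list idea follows. The attribute names _input_specs and _original_param_names appear in this PR's commit messages. The InputTable class, the shape-tuple spec type, and the add_input method are hypothetical, chosen only to show why a list appended in lockstep cannot fall out of step the way a separately maintained dict can.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class InputTable:
    """Hypothetical container; the real class layout is not shown in the PR page."""

    _input_specs: List[Tuple[int, ...]] = field(default_factory=list)
    _original_param_names: List[str] = field(default_factory=list)

    def add_input(self, name: str, shape: Tuple[int, ...]) -> None:
        # Both lists grow together, so index i always refers to the same input.
        self._input_specs.append(shape)
        self._original_param_names.append(name)

    def neff_to_param(self) -> Dict[str, str]:
        # Derived on demand from position instead of being stored separately.
        return {
            f"input_{i}": name
            for i, name in enumerate(self._original_param_names)
        }
```

Because the NEFF-name mapping is computed from list position, there is no second data structure to keep in sync when inputs are added.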

Separate the op registration system so that op definition files are backend-agnostic and each backend registers its implementations lazily through its own registration module.

Key changes:
- Add composed_impl to Op class for backend-agnostic fallback dispatch
- Extract all HLO implementations into _hlo_impls.py with lazy registration via _register_hlo.py (same pattern as kernelgen)
- Op definition files now only declare Op instances, CPU impls, and composed_impl for ops built from other dispatched ops
- Remove redundant per-backend registration of composed ops from _register_kernelgen.py (they fall through to composed_impl)

Adding a new backend now only requires registering primitive ops; all composed ops (floor_divide, tan, mean, cumsum, etc.) automatically work via the composed_impl fallback.

…ied prepare_io_mapping

Add TensorPlaceholder.original_name so each backend encodes the user-facing parameter name at construction time (HLO strips the .must_alias_input suffix; KernelGen uses its _original_param_names list). This enables a single, backend-agnostic prepare_io_mapping free function that replaces two per-backend protocol methods, fixes a silent bug in KernelGen's fallback path, and moves input-count validation into one place.
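One way to picture the design in this commit message is the sketch below. TensorPlaceholder, original_name, and prepare_io_mapping are names from the commit message. The field layout, the function signature, and the error types are assumptions, so treat this as an illustration rather than the merged implementation.

```python
from dataclasses import dataclass
from typing import Any, Dict, Sequence


@dataclass(frozen=True)
class TensorPlaceholder:
    name: str           # name used inside the compiled artifact (assumed field)
    original_name: str  # user-facing parameter name, set by the backend


def prepare_io_mapping(
    placeholders: Sequence[TensorPlaceholder],
    param_names: Sequence[str],
    original_inputs: Sequence[Any],
) -> Dict[str, Any]:
    """Map artifact input names to user arrays without backend-specific logic."""
    # Input-count validation lives in one place for every backend.
    if len(param_names) != len(original_inputs):
        raise ValueError(
            f"expected {len(param_names)} inputs, got {len(original_inputs)}"
        )
    by_param = dict(zip(param_names, original_inputs))
    mapping: Dict[str, Any] = {}
    for ph in placeholders:
        if ph.original_name not in by_param:
            raise KeyError(f"no input for parameter {ph.original_name!r}")
        mapping[ph.name] = by_param[ph.original_name]
    return mapping
```

Each backend only differs in how it builds its placeholders, for example TensorPlaceholder("cache.must_alias_input", "cache") for HLO versus TensorPlaceholder("input_1", "cache") for KernelGen; the mapping function itself is shared.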

BIR emission assigns "in_tensor_N" names during compilation regardless of caller-provided names — the old comment incorrectly attributed this to the "C++ pipeline from unnamed MLIR block arguments". Update to reflect the actual mechanism.

ymwangg · 2026-05-14T07:42:50Z

@vgene Updated the PR as we discussed. Please take a look when you have a moment and let me know if you have any comments.

Rename all references from kernelgen/KernelGen to nkigen/NkiGen to align with the new nkigen package name (formerly nkipy_kernelgen).

ymwangg requested a review from a team April 27, 2026 19:48

ymwangg force-pushed the feat/kernelgen branch from 67588b8 to ed1dc64 April 27, 2026 19:49

aws-zhehongb approved these changes May 1, 2026

vgene reviewed May 7, 2026

ymwangg added 4 commits May 11, 2026 15:46

refactor: rename kernelgen backend to nkigen

fd2bca1

ymwangg wants to merge 6 commits into main from feat/kernelgen

3 participants
