
feat: add kernelgen backend #56

Open
ymwangg wants to merge 6 commits into main from feat/kernelgen


ymwangg commented Apr 27, 2026

Summary

  • Add a kernelgen backend that generates NKI kernels via MLIR as an alternative to the existing HLO backend, with MLIR encapsulated behind a builder API
  • Unify the IR interface across HLO and kernelgen backends with a common ComputationIR abstraction, including uniform input/output name resolution and compilation delegation
  • Add op implementations for the kernelgen backend (_kernelgen_impls.py, _register_kernelgen.py) with support for in-place updates (dynamic_update_slice) and custom ops via nki_op.nki_custom_op
  • Add comprehensive unit tests (test_kernelgen_backend.py), op tests (test_kernelgen_ops.py), and numerical tests (test_kernelgen_numerical.py)

Test plan

  • uv run pytest tests/unit/test_kernelgen_backend.py -v -n auto
  • uv run pytest tests/test_kernelgen_ops.py -v -n auto
  • uv run pytest tests/test_kernelgen_numerical.py -v -n auto
  • uv run pytest tests/ -n auto (full suite)

ymwangg requested a review from a team on April 27, 2026, 19:48
Add a complete kernelgen backend that generates NKI kernels via MLIR,
as an alternative to the existing HLO backend. This includes:

- Kernelgen backend with MLIR encapsulated behind a builder API
- Op implementations for kernelgen (_kernelgen_impls.py)
- Inplace update (dynamic_update_slice) support
- Unified IR interface for HLO and kernelgen backends
- Custom op interface via nki_op.nki_custom_op
- Compilation delegation to nkipy_kernelgen.compile
- Comprehensive unit and numerical tests

vgene left a comment


This part looks good to me in general. Two minor issues.

if out_idx >= self._user_return_len
}

def resolve_input_arrays(self, original_inputs):

Let's discuss this and consolidate I/O tensor handling between the two backends, especially with aliasing.

Contributor Author


I've consolidated resolve_input_arrays and get_alias_input_name into a single prepare_io_mapping function and reused as much logic as possible. For alias handling, the main difference is that the HLO backend needs to create a new variable with a .must_alias_input suffix, while kernelgen can reuse the same buffer without creating an alias variable in the graph. Another difference is that the kernelgen-generated NEFF uses automatic variable names such as "input_0" and "input_1", so it requires extra work to map those names back to the user's parameters.
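
To make the two naming schemes in the reply above concrete, here is a minimal sketch of how each backend might recover a user-facing parameter name. The helper names `hlo_user_name` and `kernelgen_user_name` are hypothetical and do not come from the PR; only the `.must_alias_input` suffix and the `"input_N"` pattern are taken from the comment.

```python
from typing import List

# Suffix mentioned in the review reply; everything else here is illustrative.
ALIAS_SUFFIX = ".must_alias_input"


def hlo_user_name(graph_name: str) -> str:
    """Hypothetical helper: strip the alias suffix, if present."""
    return graph_name.removesuffix(ALIAS_SUFFIX)  # Python 3.9+


def kernelgen_user_name(neff_name: str, param_names: List[str]) -> str:
    """Hypothetical helper: map an auto-generated 'input_N' name back to
    the N-th user parameter name."""
    prefix = "input_"
    if not neff_name.startswith(prefix):
        raise ValueError(f"unexpected NEFF input name: {neff_name!r}")
    index = int(neff_name[len(prefix):])
    return param_names[index]
```

In this sketch the HLO side is a pure string operation, while the kernelgen side needs a positional lookup, which is the "extra work" the reply refers to.
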

Comment thread nkipy/src/nkipy/core/ops/_register_kernelgen.py Outdated
ymwangg added 4 commits May 11, 2026 15:46
…ram_names list

The redundant Dict[str, str] mapping NEFF input names to parameter names
was fragile — it could drift out of sync with _input_specs. Replace with
a List[str] positionally aligned with _input_specs, making the invariant
structurally impossible to violate.
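
As a rough illustration of the change described in this commit message, the sketch below keeps parameter names in a list that grows in lockstep with the input specs, so a name is looked up by position rather than through a separate dictionary. The class `InputTable` and its methods are hypothetical; only the attribute names `_input_specs` and `_original_param_names` appear in the PR text, and their real types are an assumption here.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class InputTable:
    """Hypothetical container; not the PR's actual class."""

    # Assumed shape of a spec: (NEFF input name, tensor shape).
    _input_specs: List[Tuple[str, tuple]] = field(default_factory=list)
    # Positionally aligned with _input_specs: entry i names spec i.
    _original_param_names: List[str] = field(default_factory=list)

    def add_input(self, neff_name: str, shape: tuple, param_name: str) -> None:
        # Both lists are appended in the same call, so they cannot
        # diverge in length through this entry point.
        self._input_specs.append((neff_name, shape))
        self._original_param_names.append(param_name)

    def param_name_for(self, index: int) -> str:
        return self._original_param_names[index]
```

Because there is a single append point, the alignment between the two lists is maintained by construction rather than by keeping dictionary keys in sync by hand.
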
Separate the op registration system so that op definition files are
backend-agnostic and each backend registers its implementations lazily
through its own registration module.

Key changes:
- Add composed_impl to Op class for backend-agnostic fallback dispatch
- Extract all HLO implementations into _hlo_impls.py with lazy
  registration via _register_hlo.py (same pattern as kernelgen)
- Op definition files now only declare Op instances, CPU impls, and
  composed_impl for ops built from other dispatched ops
- Remove redundant per-backend registration of composed ops from
  _register_kernelgen.py (they fall through to composed_impl)

Adding a new backend now only requires registering primitive ops;
all composed ops (floor_divide, tan, mean, cumsum, etc.) automatically
work via the composed_impl fallback.
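
The fallback described above can be sketched roughly as follows. This is a toy model under stated assumptions, not the PR's `Op` class: the `register` method, the call signature that takes a backend name, and the `"toy"` backend are all hypothetical. Only the idea of a `composed_impl` fallback and lazy per-backend registration comes from the commit message.

```python
import math
from typing import Callable, Dict, Optional


class Op:
    """Toy op with per-backend impls and a backend-agnostic fallback."""

    def __init__(self, name: str, composed_impl: Optional[Callable] = None):
        self.name = name
        self.composed_impl = composed_impl
        self._impls: Dict[str, Callable] = {}

    def register(self, backend: str, impl: Callable) -> None:
        self._impls[backend] = impl

    def __call__(self, backend: str, *args):
        impl = self._impls.get(backend)
        if impl is not None:
            return impl(*args)
        if self.composed_impl is not None:
            # Fall through: the composition dispatches other ops itself.
            return self.composed_impl(backend, *args)
        raise NotImplementedError(
            f"{self.name} has no implementation for backend {backend!r}"
        )


# Op definitions: primitives, plus one op composed from them.
sin = Op("sin")
cos = Op("cos")
divide = Op("divide")
tan = Op("tan", composed_impl=lambda b, x: divide(b, sin(b, x), cos(b, x)))


def register_toy_backend() -> None:
    """A backend registers only primitives, when it is first needed."""
    sin.register("toy", math.sin)
    cos.register("toy", math.cos)
    divide.register("toy", lambda a, b: a / b)


register_toy_backend()
```

Here `tan` is never registered for the toy backend, yet `tan("toy", x)` works because the call falls through to `composed_impl`, which dispatches the registered primitives.
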
…ied prepare_io_mapping

Add TensorPlaceholder.original_name so each backend encodes the
user-facing parameter name at construction time (HLO strips the
.must_alias_input suffix; KernelGen uses its _original_param_names
list). This enables a single, backend-agnostic prepare_io_mapping
free function that replaces two per-backend protocol methods, fixes
a silent bug in KernelGen's fallback path, and moves input-count
validation into one place.
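
A minimal sketch of the idea in this commit message, assuming a simplified `TensorPlaceholder` with only two fields and a `prepare_io_mapping` that takes a name-keyed dictionary of user inputs. The real signatures are not shown in the PR conversation, so the fields, argument types, and error types below are assumptions.

```python
from dataclasses import dataclass
from typing import Any, Dict, Sequence


@dataclass(frozen=True)
class TensorPlaceholder:
    name: str           # backend-internal tensor name
    original_name: str  # user-facing parameter name, set by the backend


def prepare_io_mapping(
    placeholders: Sequence[TensorPlaceholder],
    user_inputs: Dict[str, Any],
) -> Dict[str, Any]:
    """Map backend tensor names to user arrays (assumed signature)."""
    # Input-count validation lives in one place for every backend.
    if len(placeholders) != len(user_inputs):
        raise ValueError(
            f"expected {len(placeholders)} inputs, got {len(user_inputs)}"
        )
    mapping: Dict[str, Any] = {}
    for placeholder in placeholders:
        if placeholder.original_name not in user_inputs:
            raise KeyError(f"missing input {placeholder.original_name!r}")
        mapping[placeholder.name] = user_inputs[placeholder.original_name]
    return mapping
```

Because each placeholder already carries its `original_name`, the function never needs to know which backend produced it: an HLO-style placeholder named `x.must_alias_input` and an auto-named one both resolve through the same lookup.
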
BIR emission assigns "in_tensor_N" names during compilation regardless
of caller-provided names — the old comment incorrectly attributed this
to the "C++ pipeline from unnamed MLIR block arguments". Update to
reflect the actual mechanism.

ymwangg commented May 14, 2026

@vgene Updated the PR as we discussed. Please take a look when you have a moment and let me know if you have any comments.

Rename all references from kernelgen/KernelGen to nkigen/NkiGen to align
with the new nkigen package name (formerly nkipy_kernelgen).