Releases: huggingface/kernels

v0.13.0

10 Apr 14:31

New features

kernels 0.13.0 is a feature-packed release that includes, among other things, an improved CLI for building kernels (kernel-builder), PyTorch 2.11 support, and a tech preview of TVM FFI support.

kernel-builder CLI overhaul

The build2cmake command has been renamed to kernel-builder. This new tool can be used to develop, build, and upload kernels without directly using Nix.

These are the main subcommands for the new kernel-builder CLI:

  • kernel-builder init: scaffold a new kernel, including tests and benchmarks.
  • kernel-builder build: build a kernel.
  • kernel-builder build-and-copy: build a kernel and copy artifacts to the build directory.
  • kernel-builder build-and-upload: build a kernel and upload it to the Hub.
  • kernel-builder create-pyproject: create Python project files, such as pyproject.toml, to develop kernels in IDEs and editors.
  • kernel-builder devshell / kernel-builder testshell: drop into a development or test shell for a kernel.
  • kernel-builder upload: upload a built kernel to the Hugging Face Hub.
  • kernel-builder list-variants: list all supported build variants for a kernel.

The build, devshell, and testshell subcommands accept a --variant flag to select a specific build variant. All subcommands accept a directory argument instead of requiring a specific working directory.

An installation script is also provided to help new users get a working kernel-builder environment set up quickly, including Nix, the binary cache, and the required trusted-user configuration. Go to the following page for information on how to get started:

https://huggingface.co/docs/kernels/main/en/builder/writing-kernels#quick-install

PyTorch 2.11 support

kernel-builder now supports PyTorch 2.11. PyTorch 2.9 support has been removed in accordance with our policy of supporting the two latest PyTorch versions.

TVM FFI kernels (tech preview)

kernels 0.13 adds support for TVM FFI kernels. TVM FFI aims to be a single ABI for multiple frameworks, such as Torch, JAX, NumPy, and CuPy. TVM FFI support is a tech preview. For instance, we might still make changes to the build.toml options for TVM FFI, change the kernel source layout, or change the provided helper functions.

The kernels examples directory provides ReLU and CUTLASS example kernels that use TVM FFI.

Card filling

kernel-builder now supports card filling. If the kernel source repository contains a CARD.md template, building a kernel will fill the template with details about the kernel. When a kernel is uploaded (with kernel-builder upload or kernel-builder build-and-upload), the card will be uploaded as the README.md of the Hub repository. The default card template can be generated with kernel-builder init.

kernels skills

We added a new CLI command for installing agent-compatible skills. Use kernels skills add to install skills for AI coding assistants like Claude, Codex, and OpenCode. For now, only the cuda-kernels skill is supported. Skill files are downloaded from the huggingface/kernels directory in this repository. ROCm kernel skills are on the way.

Local kernel overrides

Kernels can now be overridden locally without changing any get_kernel call sites. Set the LOCAL_KERNELS environment variable to a colon-separated list of org/repo=local_path pairs:

LOCAL_KERNELS=kernels-community/activation=/path/to/local/activation

This is useful for testing kernel changes locally before uploading them to the Hub.
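As an illustration of the override format only (this is not the library's actual parser), the colon-separated org/repo=local_path pairs can be interpreted like this:

```python
def parse_local_kernels(value: str) -> dict[str, str]:
    """Parse a LOCAL_KERNELS-style string of org/repo=local_path pairs.

    Illustrative re-implementation of the documented format; the kernels
    library's internal parsing may differ.
    """
    overrides = {}
    for pair in value.split(":"):
        if not pair:
            continue
        # Split on the first '=' so paths may contain '=' themselves.
        repo_id, _, path = pair.partition("=")
        overrides[repo_id] = path
    return overrides

overrides = parse_local_kernels(
    "kernels-community/activation=/path/to/local/activation"
)
# {'kernels-community/activation': '/path/to/local/activation'}
```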

More reliable uploads of kernels with a very large number of files

Large kernel uploads are now automatically split across multiple commits to stay within Hub limits, rather than failing or requiring manual intervention for kernels with many files.
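Conceptually, the splitting works like batching a flat file list into fixed-size groups, with one commit per group. This is a hedged sketch of the idea only: the batch size and function name are hypothetical, and the real implementation must also respect other Hub payload limits.

```python
def chunk_files(files: list[str], max_files_per_commit: int) -> list[list[str]]:
    """Split a flat file list into batches, one batch per Hub commit.

    Illustrative only; not the kernels library's actual upload code.
    """
    return [
        files[i : i + max_files_per_commit]
        for i in range(0, len(files), max_files_per_commit)
    ]

# Seven artifacts with at most three files per commit -> three commits.
batches = chunk_files([f"build/file_{i}.so" for i in range(7)],
                      max_files_per_commit=3)
```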

What's Changed

  • Use lowercase for ninja install in the Windows builder by @danieldk in #237
  • update Dockerfile override with monorepo by @drbh in #239
  • Ensure that metadata.json is correctly added to the output of Windows builds by @danieldk in #242
  • Set version to 0.12.2.dev0 by @danieldk in #238
  • update relative paths and readme cleanups by @drbh in #240
  • Move all kernel component handling to CMake functions by @danieldk in #243
  • Fix torchVersions argument of genKernelFlakeOutputs by @danieldk in #246
  • build2cmake: always generate kernel components for all backends by @danieldk in #245
  • Factor out render_binding and render_extensions by @danieldk in #248
  • Use single setup.py and move writing to common module by @danieldk in #250
  • Move writing of CMake utility fails and ops wrapper to common by @danieldk in #249
  • Improve benchmark command by @drbh in #244
  • Factor out render_deps function by @danieldk in #251
  • Fix build set issues by @danieldk in #252
  • CI: relax timeouts for Hub-based tests by @danieldk in #254
  • Combine CMake preambles for all backends into a single preamble by @danieldk in #253
  • Remove the last backend-specific writer functions by @danieldk in #255
  • Ignore flake locks in examples by @danieldk in #257
  • Fix XPU build by @danieldk in #256
  • Fix the XPU compilation issue by @YangKai0616 in #258
  • Remove previous team members from authors by @julien-c in #261
  • feat: include benchmark dir in bundle by @drbh in #260
  • Remove backend-specific generation and also use CMake variant generation in Nix by @danieldk in #259
  • cmake: merge loops for handing Python and data extensions by @danieldk in #266
  • add init command that pulls template repo by @drbh in #247
  • Add the backend to the ops name by @danieldk in #267
  • Support local kernels in benchmark by @drbh in #265
  • Make CLI-related modules submodules of cli by @danieldk in #269
  • Add support for overriding kernels locally by @danieldk in #271
  • Fix versions torch dependency by @danieldk in #272
  • CMake: merge two condition blocks by @danieldk in #273
  • Upgrade GitHub Actions to latest versions by @salmanmkc in #232
  • Cleanup huggingface hub integration by @drbh in #274
  • Rename cutlass-sycl to sycl-tla by @YangKai0616 in #277
  • get_kernel: support specifying the backend by @danieldk in #268
  • feat: move template into project by @drbh in #275
  • build2cmake: add support for family suffix in CUDA capabilities by @danieldk in #280
  • Benchmark graphics by @drbh in #270
  • [FEATURE] add kernels skills add to the cli by @burtenshaw in #278
  • add cachix to flake and update buildSet by @drbh in #282
  • gen-flake-outputs: add ci-test package by @danieldk in #281
  • add utilities to generate template repo cards by @sayakpaul in #210
  • include repo_id in the card usage. by @sayakpaul in #284
  • Fix aarch64-linux and add it to CI by @danieldk in #286
  • chore: fix minor markdown backtick mistake by @HyperBlaze456 in #289
  • feat: enforce strict kernel name by @drbh in #290
  • pass revision to to cmake template by @drbh in #291
  • builder: support no-arch builds without Nix by @danieldk in #288
  • fix: adjust the template publish workflow by @drbh in #295
  • fix: update template and init to use new repo and format by @drbh in #296
  • fix: adjust token for upload to hub by @drbh in #297
  • update init command to respect naming convention by @drbh in https://github.com/huggingfac...

v0.12.3

20 Mar 10:21

What's Changed

Full Changelog: v0.12.2...v0.12.3

v0.12.2

04 Mar 10:03

New features

This release adds experimental Neuron + NKI support to kernels. build2cmake support is currently only available on the main branch.

Full Changelog: v0.12.1...v0.12.2

v0.12.1

26 Jan 16:16

What's Changed

  • Set version to 0.12.1.dev0 by @danieldk in #233
  • kernels: remove the version warning until we have the Hub UX by @danieldk in #234

Full Changelog: v0.12.0...v0.12.1

v0.12.0

24 Jan 12:23

New features

Merge of kernels and kernel-builder repositories

kernel-builder has been merged into the kernels repository. This makes it easier for us to coordinate changes that affect both the kernels Python library and the builder. To switch to the new repo when building kernels, replace the following line in flake.nix

kernel-builder.url = "github:huggingface/kernel-builder";

with

kernel-builder.url = "github:huggingface/kernels";

As a result of the merge, the documentation of kernel-builder is now also available at: https://huggingface.co/docs/kernels/

Support for kernel versions

Before kernels 0.12, kernels could be pulled from a repository without specifying a version, so downstream code would typically pull from main. As a result, incompatible changes to the main branch could break downstream users of a kernel. To avoid this, we introduce kernel versions: each kernel gets a major version, and when a kernel is uploaded to the Hub, it is uploaded to the corresponding version branch. The kernel author bumps the kernel version when there are incompatible changes, so kernels can evolve their APIs without breaking existing users. Versioning can be enabled for a kernel by specifying the version in build.toml:

[general]
version = 1

This will add the kernel version to the kernel's metadata and the kernel upload command will upload builds to the v1 version branch.

Kernel users can pull from a version branch using the version argument. For example:

activation = get_kernel("kernels-community/activation", version=1)

For more information, refer to the guide to adopting kernel versions. Getting kernels without a version is deprecated in kernels 0.12 and will become an error in 0.14 (except for local kernels).
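The branch-naming convention described above ("version = 1" builds land on the v1 branch) can be sketched as a simple mapping; the helper name here is illustrative, not part of the library:

```python
def version_branch(version: int) -> str:
    """Map a kernel's major version to its Hub version branch name.

    Illustrates the convention from the release notes ('version = 1'
    builds are uploaded to the 'v1' branch); not the library's internals.
    """
    if version < 1:
        raise ValueError("kernel versions start at 1")
    return f"v{version}"

# version = 1 in build.toml -> uploads target the 'v1' branch.
branch = version_branch(1)
```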

PyTorch 2.10 support

Support for PyTorch 2.10 has been added to the builder. Support for Torch 2.8 has been removed in accordance with our policy to support the two latest Torch versions.

Kernel benchmarks

kernels 0.12 adds the experimental kernels benchmark subcommand, which runs benchmarks for a given kernel, if available. The kernels benchmark command will be extended and documented in upcoming releases.

What's Changed

New Contributors

  • @onel made their first contribution in #214

Full Changelog: v0.11.7...v0.12.0

v0.11.7

08 Jan 15:42

What's Changed

Full Changelog: v0.11.6...v0.11.7

v0.11.6

08 Jan 09:18

What's Changed

New Contributors

Full Changelog: v0.11.5...v0.11.6

v0.11.5

17 Dec 15:03

What's Changed

Full Changelog: v0.11.4...v0.11.5

v0.11.4

16 Dec 14:33

This release extends support for curated Python dependencies and synchronizes support with upcoming kernel-builder changes.

What's Changed

Full Changelog: v0.11.3...v0.11.4

v0.11.3

05 Dec 15:09

New features

Use kernel functions to extend layers

Up until now, it was only possible to extend existing layers with kernel layers from the Hub. Starting with this release, it's also possible to extend them with kernel functions from the Hub. For instance, a silu-and-mul layer

@use_kernel_forward_from_hub("SiluAndMul")
class SiluAndMul(nn.Module):
    def forward(self, input: torch.Tensor) -> torch.Tensor:
        d = input.shape[-1] // 2
        return F.silu(input[..., :d]) * input[..., d:]

can now be extended with a silu_and_mul function from the Hub:

with use_kernel_mapping({
    "SiluAndMul": {
        "cuda": FuncRepository(
            repo_id="kernels-community/activation",
            func_name="silu_and_mul",
        ),
    }
}):
    kernelize(...)

We have added the FuncRepository, LocalFuncRepository, and LockedFuncRepository classes to load functions from regular, local, and locked repositories.

Making functions extensible

The counterpart to the previous enhancement is that functions can now also be made extensible using the new use_kernel_func_from_hub decorator:

@use_kernel_func_from_hub("silu_and_mul")
def silu_and_mul(x: torch.Tensor) -> torch.Tensor:
    d = x.shape[-1] // 2
    return F.silu(x[..., :d]) * x[..., d:]

This will implicitly replace the function with a Torch nn.Module. Since Torch modules implement __call__, the result can still be called as a function:

out = silu_and_mul(x)

However, when the function is stored as part of a model/layer, it will also be kernelized:

class FeedForward(nn.Module):
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        # Note: silu_and_mul is a Torch module.
        self.silu_and_mul = silu_and_mul

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.silu_and_mul(self.linear(x))

Similar to layers, the function can be kernelized using both a Hub layer and a Hub function.

What's Changed

Full Changelog: v0.11.2...v0.11.3