diff --git a/docs/src/tutorials/kernels.md b/docs/src/tutorials/kernels.md
new file mode 100644
index 0000000000..2a82948e0a
--- /dev/null
+++ b/docs/src/tutorials/kernels.md
@@ -0,0 +1,89 @@
+# Kernels
+
+Suppose your codebase contains custom GPU kernels, typically defined with [KernelAbstractions.jl](https://github.com/JuliaGPU/KernelAbstractions.jl).
+This tutorial shows how to compile and differentiate such kernels with Reactant.
+
+## Example
+
+The following kernel squares each element of an array, and the `square` function launches it on the backend of its input:
+
+```@example kernels
+using KernelAbstractions
+
+@kernel function square_kernel!(y, @Const(x))
+    i = @index(Global)
+    @inbounds y[i] = x[i] * x[i]
+end
+
+function square(x)
+    y = similar(x)
+    # instantiate the kernel for the backend of the input array
+    backend = KernelAbstractions.get_backend(x)
+    kernel! = square_kernel!(backend)
+    # launch with one work item per array element
+    kernel!(y, x; ndrange=length(x))
+    return y
+end
+```
+
+```jldoctest kernels
+x = float.(1:5)
+y = square(x)
+
+# output
+
+5-element Vector{Float64}:
+ 1.0
+ 4.0
+ 9.0
+ 16.0
+ 25.0
+```
+
+## Kernel compilation
+
+To compile such kernels with Reactant, you need to pass the option `raise=true` to the `@compile` or `@jit` macro.
+Furthermore, the [CUDA.jl](https://github.com/JuliaGPU/CUDA.jl) package needs to be loaded, even on non-NVIDIA hardware.
+
+```jldoctest kernels
+import CUDA
+using Reactant
+
+xr = ConcreteRArray(x)
+yr = @jit raise=true square(xr)
+
+# output
+
+5-element ConcretePJRTArray{Float64,1}:
+ 1.0
+ 4.0
+ 9.0
+ 16.0
+ 25.0
+```
+
+## Differentiated kernel
+
+In addition, if you want to compute derivatives of your kernel with [Enzyme.jl](https://github.com/EnzymeAD/Enzyme.jl), the option `raise_first=true` becomes necessary.
+
+```jldoctest kernels
+import Enzyme
+
+sumsquare(x) = sum(square(x))
+gr = @jit raise=true raise_first=true Enzyme.gradient(Enzyme.Reverse, sumsquare, xr)
+
+# output
+
+(ConcretePJRTArray{Float64, 1, 1}([2.0, 4.0, 6.0, 8.0, 10.0]),)
+```
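+
+Note that `Enzyme.gradient` returns a tuple with one gradient per differentiated argument, which is why the result above is a one-element tuple.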
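+
+The examples above use `@jit`, which compiles and runs the function in one step. To compile once and call the result repeatedly, you can use `@compile` instead. Below is a minimal sketch reusing `square` and `xr` from above; `compiled_square` and `yr2` are just illustrative names.
+
+```julia
+# Compile ahead of time (raising is still required for the kernel).
+compiled_square = @compile raise=true square(xr)
+# The returned object is callable and can be reused on arrays of the same shape.
+yr2 = compiled_square(xr)
+```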