tailcall lets you write deeply recursive functions without blowing the stack on stable Rust.
It provides explicit, stack-safe tail calls using a lightweight trampoline runtime, with a macro that keeps usage ergonomic.
The runtime crate is `no_std`, so it can be used on targets without the standard library.
If the proposed `become` keyword is ever stabilized, it will likely be the preferred solution for proper tail calls.
```toml
[dependencies]
tailcall = "~2"
```

```rust
use tailcall::tailcall;

#[tailcall]
fn gcd(a: u64, b: u64) -> u64 {
    if b == 0 {
        a
    } else {
        tailcall::call! { gcd(b, a % b) }
    }
}

assert_eq!(gcd(12, 18), 6);
```

That's the core API:
- mark the function with `#[tailcall]`
- wrap recursive tail calls with `tailcall::call!`
This runs in constant stack space, even for very large inputs.
Rust does not guarantee tail call optimization. Deep recursion can overflow the stack.
`tailcall` avoids this by turning recursive steps into deferred computations and executing them in a loop:
- each recursive step returns a deferred computation (`Thunk`)
- the runtime repeatedly executes those steps in a loop
- no additional stack frames are created
For simple direct self-recursion, `#[tailcall]` lowers the function or method to an inline loop. When that is not possible, such as with mutual recursion or other more complex control flow, the macro falls back to the more general `Thunk` runtime.
This turns recursion into iteration under the hood.
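That loop-under-the-hood idea can be sketched with a minimal hand-rolled trampoline. This is illustrative only: `Step` and `trampoline` are hypothetical names, not the crate's API, and this sketch boxes every step, which `tailcall` itself avoids:

```rust
// Minimal hand-rolled trampoline (illustrative; not the crate's internals).
// Each step either yields a final value or a boxed, deferred continuation.
enum Step<T> {
    Done(T),
    Next(Box<dyn FnOnce() -> Step<T>>),
}

// Execute steps in a loop: stack depth stays constant no matter how
// many "recursive" steps are taken.
fn trampoline<T>(mut step: Step<T>) -> T {
    loop {
        match step {
            Step::Done(value) => return value,
            Step::Next(resume) => step = resume(),
        }
    }
}

// A deeply "recursive" countdown: each call returns immediately with a
// deferred next step instead of calling itself on the stack.
fn count_down(n: u64) -> Step<u64> {
    if n == 0 {
        Step::Done(0)
    } else {
        Step::Next(Box::new(move || count_down(n - 1)))
    }
}
```

Running `trampoline(count_down(10_000_000))` finishes in constant stack space, where a plain recursive version would overflow.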
A common alternative for stack-safe recursion in Rust is to box each step. That can offer a similar interface, but it introduces allocation and indirection on every recursive step. `tailcall` keeps each step inline instead, so the main cost is an extra indirect call per `Thunk::bounce` step.

In some cases, that cost disappears entirely. If a simple free function or inherent method only tail-calls itself directly, `#[tailcall]` can lower it to an inline loop.
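As a sketch of what that lowering amounts to, here is a hand-written loop equivalent of the tail-recursive `gcd` from the quick-start example (for illustration; not the macro's literal expansion):

```rust
// Hand-written loop equivalent of a tail-recursive gcd. This is the shape
// an inline-loop lowering produces: parameters become mutable locals, and
// each tail call becomes an assignment plus another loop iteration.
fn gcd(mut a: u64, mut b: u64) -> u64 {
    while b != 0 {
        let r = a % b;
        a = b;
        b = r;
    }
    a
}
```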
On a simple benchmark (relative to a handwritten loop):
- handwritten loop: 1.0×
- `#[tailcall]` (inline loop): ~1.0×
- `#[tailcall]` (`Thunk` runtime): ~3.2× slower
- boxed runtime (toy comparison): ~14× slower
This is just a local measurement, but the general shape is the important part:
- direct self-recursion can optimize down to loop-like performance
- the `Thunk` runtime is slower, but it supports more complex cases such as mutual recursion, borrowed-state builders, and recursive control flow that doesn't collapse into a single loop
- heap-allocating approaches are slower again (and therefore never used by `tailcall`)
Most users only need the macro.
For simple direct self-recursion, the macro can compile free functions and inherent methods to an inline loop. Mutual recursion and other more complex cases continue to use the hidden `Thunk` builder automatically.
```rust
#[tailcall]
fn f(...) -> T {
    if done {
        result
    } else {
        tailcall::call! { f(next_args...) }
    }
}
```

Only calls wrapped in `tailcall::call!` are stack-safe.
```rust
use tailcall::tailcall;

#[tailcall]
fn is_even(x: u128) -> bool {
    if x == 0 {
        true
    } else {
        tailcall::call! { is_odd(x - 1) }
    }
}

#[tailcall]
fn is_odd(x: u128) -> bool {
    if x == 0 {
        false
    } else {
        tailcall::call! { is_even(x - 1) }
    }
}
```

```rust
use tailcall::tailcall;

struct Parity;

impl Parity {
    #[tailcall]
    fn is_even(&self, x: u32) -> bool {
        if x == 0 {
            true
        } else {
            tailcall::call! { self.is_odd(x - 1) }
        }
    }

    #[tailcall]
    fn is_odd(&self, x: u32) -> bool {
        if x == 0 {
            false
        } else {
            tailcall::call! { self.is_even(x - 1) }
        }
    }
}
```

Only `tailcall::call!` sites participate in the tailcall transform:
```rust
use tailcall::tailcall;

#[tailcall]
fn sum(n: u64) -> u64 {
    match n {
        0 => 0,
        _ if n % 2 == 0 => {
            let partial = sum(n - 1); // normal recursion
            n + partial
        }
        _ => tailcall::call! { sum(n - 1) },
    }
}
```

```rust
use tailcall::tailcall;

fn factorial(n: u64) -> u64 {
    #[tailcall]
    fn inner(acc: u64, n: u64) -> u64 {
        if n == 0 {
            acc
        } else {
            tailcall::call! { inner(acc * n, n - 1) }
        }
    }
    inner(1, n)
}
```

For the more general cases, the macro is a thin layer over `runtime::Thunk`.
A `runtime::Thunk<T>` is a fixed-size deferred value, so it can live on the stack. It may contain the value directly or a type-erased closure that will eventually produce the value.

On 64-bit targets, the default runtime keeps `Thunk` at 32 bytes. That uses a small inline slot for deferred closures, which means manual `Thunk` values and macro-generated helpers can only capture a limited amount of data before construction panics. Optional features can trade a larger `Thunk` for a larger inline capture budget.
Available `tailcall` feature tiers:

- default: `Thunk` is 32 bytes, with about 16 bytes of inline capture budget
- `at-least-inline-captures-32`: `Thunk` is 48 bytes, with about 32 bytes of inline capture budget
- `at-least-inline-captures-48`: `Thunk` is 64 bytes, with about 48 bytes of inline capture budget
Pending `Thunk` values still preserve normal destructor-on-drop behavior for captured values.
You build a chain of steps, then execute it with `.call()`.
- `Thunk::value(x)` — final result
- `Thunk::new(f)` — deferred computation returning a value
- `Thunk::bounce(f)` — deferred computation returning another `Thunk` (this is what enables stack safety)
```rust
use tailcall::runtime::Thunk;

fn count_down(n: u64) -> u64 {
    build(n).call()
}

fn build(n: u64) -> Thunk<'static, u64> {
    Thunk::bounce(move || {
        if n == 0 {
            Thunk::value(0)
        } else {
            build(n - 1)
        }
    })
}
```

`Thunk::bounce` ensures each step returns control to the runtime loop instead of growing the call stack.
- `#[tailcall]` currently only supports simple identifier arguments. Patterns in function parameters are not rewritten by the macro.
- The `?` operator is not supported inside `#[tailcall]` functions on stable Rust. Use `match` or explicit early returns instead.
- Methods in ordinary `impl` blocks are supported. Trait methods are not supported.
- `#[tailcall]` does not support `async fn` or `const fn`.
- Each deferred closure is stored in a fixed-size inline slot. By default that budget is about 16 bytes on 64-bit targets; optional features can increase it by making `Thunk` itself larger.
The workspace is split by responsibility:
- `tailcall`: the published runtime crate and its runtime-internal unit tests.
- `tailcall-proc-macro`: the proc-macro crate directory in this repo. The published package name remains `tailcall-impl`.
- `std-integration`: downstream `std` integration tests and benchmarks that exercise the public API.
- `no-std-integration`: downstream `#![no_std]` integration tests for the public API.
This repo pins its toolchain in `rust-toolchain.toml`, so plain `cargo` commands run against the expected Rust version for local development and CI.
For normal development, the main local checks are:
```shell
cargo test
cargo test --doc
cargo +nightly miri test --all
cargo fmt --all -- --check
cargo clippy --all
cargo doc --no-deps
cargo bench -p std-integration --no-run
```

`tailcall` and the proc-macro package `tailcall-impl` are released together.
- Update the versions in `tailcall/Cargo.toml` and `tailcall-proc-macro/Cargo.toml`.
- Run the release checks:

  ```shell
  cargo test
  cargo test --doc
  cargo doc --no-deps
  ```

- Commit the release version bump.
- Publish the proc-macro package first:

  ```shell
  cargo publish -p tailcall-impl
  ```

- Publish `tailcall` after the proc-macro crate is available:

  ```shell
  cargo publish -p tailcall
  ```

- Tag the release from `main` and push the commit and tag:

  ```shell
  git tag vX.Y.Z
  git push origin main
  git push origin vX.Y.Z
  ```

Tailcall is distributed under the terms of both the MIT license and the Apache License (Version 2.0).
See LICENSE-APACHE, LICENSE-MIT, and COPYRIGHT for details.