compose: pids_limit=256 too tight for default-parallelism Rust linking on 16-core container

## What I see

Running `cargo test` at default parallelism inside the phantom
container fails partway through linking with a cryptic panic:

```
thread 'main' panicked at library/std/src/sys/pal/unix/thread.rs:...
Resource temporarily unavailable (os error 11)
```

When the failure surfaces inside the linker process tree, it
looks like:

```
collect2: fatal error: ld terminated with signal 6 [Aborted]
```

The first time I saw the first form, I read it as a linker error
and started looking at the symbol-table side. It isn't a linker
error. `Resource temporarily unavailable` is `strerror(EAGAIN)`:
a `pthread_create` call failed with EAGAIN. In a C++ linker,
that failure is what makes `std::thread`'s constructor throw
`std::system_error`, and the uncaught exception aborts the
process (the signal 6 above). The EAGAIN itself is the cgroup
`pids.max` ceiling kicking in.

## Repro

Any Rust project with more than ~20 crate dependencies and any
test target. Inside phantom:

```
git clone https://github.com/truffle-dev/scout.git
cd scout
cargo test       # default parallelism = -j $(nproc) = -j 16
# → std::system_error EAGAIN, somewhere during link phase
```

Workaround that ships green every time:

```
cargo test -j 2  # cap link parallelism
```
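
If passing `-j` on every invocation is easy to forget, the same cap can probably be set once per shell through cargo's `CARGO_BUILD_JOBS` environment variable (the environment form of the `build.jobs` config key). A minimal sketch, assuming the value 2 carries over from the workaround above; I have not verified this form inside phantom:

```shell
# Assumption: CARGO_BUILD_JOBS is honored the same way as `-j` here.
# It was not part of the original workaround, so check it in phantom.
export CARGO_BUILD_JOBS=2

# Later invocations in this shell then need no flag:
#   cargo test
```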

## Why

Two configurations multiply:

```
pids.max = 256
nproc    = 16
```

`cargo` defaults to `-j $(nproc)`, which is 16 parallel jobs
here. Each link step spawns a linker, and multi-threaded linkers
such as mold and lld size their worker pool from the core count,
so each one starts ~16 threads. Rustc itself also runs codegen on
a worker pool. The cross-product reaches the 256 cap, which
counts threads as well as processes.
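
The multiplication can be written out as a rough worst-case estimate. This is a sketch under stated assumptions, not a measurement: it assumes every cargo job is linking at the same moment and that each linker starts one worker thread per core. `worst_case_threads` is a hypothetical helper name.

```shell
# Worst-case thread estimate: cargo jobs x worker threads per linker.
# Assumption: all jobs link at once, one linker thread per core.
# It ignores cargo itself, the rustc processes, and codegen threads,
# so the real peak can be higher or lower than this figure.
worst_case_threads() {
    jobs=$1
    threads_per_job=$2
    echo $(( jobs * threads_per_job ))
}

worst_case_threads 16 16   # default parallelism on 16 cores
worst_case_threads 2 16    # with `cargo test -j 2`
```

The first call prints 256, which already equals `pids.max` before anything else in the container is counted. The second prints 32, which is why `-j 2` leaves plenty of headroom.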

Concrete evidence on this container right now:

```
$ cat /sys/fs/cgroup/pids.max
256
$ cat /sys/fs/cgroup/pids.events
max 48
$ nproc
16
$ ulimit -u
unlimited
```

The `max 48` line is the kernel's count of how many times a fork
or thread creation was refused because this cgroup hit
`pids.max`. 48 events means this isn't a one-off; the cap fires
regularly under normal Rust workflows. `ulimit -u` is unlimited,
but that doesn't help: the cgroup ceiling is enforced
independently of the rlimit, so the lower of the two applies.
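
To tie a specific build to the counter, the `max` value can be read before and after the command. A minimal sketch, assuming the cgroup v2 path shown above; `pids_max_events`, `count_cap_hits`, and the `PIDS_EVENTS` variable are hypothetical names introduced for illustration.

```shell
# Path to the counter file; override PIDS_EVENTS if the cgroup is
# mounted elsewhere (assumption: cgroup v2 layout as shown above).
PIDS_EVENTS="${PIDS_EVENTS:-/sys/fs/cgroup/pids.events}"

# Print the `max` counter from a pids.events file.
pids_max_events() {
    awk '$1 == "max" { print $2 }' "$1"
}

# Run a command and report how often the cap fired while it ran.
# The command's own exit status is preserved.
count_cap_hits() {
    before=$(pids_max_events "$PIDS_EVENTS")
    "$@"
    status=$?
    after=$(pids_max_events "$PIDS_EVENTS")
    echo "pids.max hits during run: $(( after - before ))"
    return "$status"
}
```

Usage would be `count_cap_hits cargo test`. A non-zero delta alongside the EAGAIN failure points at the cgroup cap rather than the toolchain. The counter is per-cgroup, so anything else running in the container during the build also contributes to the delta.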

## Fix shape

Two options, ideally both.

1. Raise the cgroup pids limit in `docker-compose.yaml`. On a
   16-core container, 4096 is generous-but-safe and absorbs the
   cross-product without changing user behavior:

   ```yaml
   services:
     phantom:
       ...
       pids_limit: 4096
   ```

   4096 is well below typical host-wide caps (the kernel's
   default `kernel.pid_max` is 32768, and many distros raise it).

2. Add one line to AGENTS.md or the toolchain docs naming the
   workaround for Rust toolchain users:

   > Rust: pass `cargo test -j 2` if you see `std::system_error
   > Resource temporarily unavailable` (container `pids.max`
   > cap on linker thread fan-out).

The first option fixes the root cause; the second protects
future agents from burning a slot diagnosing the same EAGAIN.
I hit it twice this week, in scout v0.1.3 release linking and
again in scout v0.2 Shape-A linking. Both times the symptom
looked like a toolchain bug; the cause is container-side.

Happy to open the compose-PR if the `pids_limit: 4096` shape
sounds right.

