Free-threaded CPython can starve a thread reattaching during repeated stop-the-world GC

# Bug report

While hardening free-threaded Python support in `numba-cuda`, we found a workload that could wedge when a stress worker repeatedly called `gc.collect()` in a tight loop while other threads were doing CUDA dispatch work.  Removing only the tight manual-GC worker made the workload pass.

I reduced this to a CPython reproducer that uses only the standard library.  It reproduces most reliably on a debug free-threaded CPython `main` build, where the extra free-threaded GC validation work makes the stop-the-world window long enough to hit the stall.

```python
import faulthandler
import gc
import importlib
import os
import sys
import threading
import time

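# Watchdog: dump every thread's traceback and exit if the script is still
# running after 25 seconds.
faulthandler.dump_traceback_later(25, exit=True)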

modules = (
    "abc", "argparse", "collections", "contextlib",
    "decimal", "enum", "functools", "heapq",
    "importlib", "inspect", "itertools", "json",
    "math", "operator", "random", "re",
    "statistics", "threading", "types", "weakref",
)
for name in modules:
    importlib.import_module(name)

print("python", sys.version, flush=True)
print("abiflags", getattr(sys, "abiflags", ""), flush=True)
print("gil", getattr(sys, "_is_gil_enabled", lambda: None)(), flush=True)
print("debug", hasattr(sys, "gettotalrefcount"), flush=True)
print("pid", os.getpid(), flush=True)

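stop = threading.Event()
count = 0  # number of completed gc.collect() calls

def collect():
    # Tight manual-GC loop: each call requests a stop-the-world pause.
    global count
    while not stop.is_set():
        gc.collect()
        count += 1

thread = threading.Thread(target=collect, name="gc-collector")
started = time.monotonic()
thread.start()

# The main thread detaches while sleeping and must reattach afterwards.
time.sleep(5.0)
stop.set()
thread.join(10.0)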

elapsed = time.monotonic() - started
print(f"alive={thread.is_alive()} count={count} elapsed={elapsed:.6f}",
      flush=True)

raise SystemExit(1 if thread.is_alive() else 0)
```
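
Since the stall is timing dependent, it can help to run the script several times and count failures.  Here is a minimal driver sketch, assuming the script above is saved under the hypothetical name `repro.py`.  `run_repeatedly` and its parameters are illustrative names and not part of my original test setup.

```python
import subprocess
import sys


def run_repeatedly(cmd, runs=10, timeout=60.0):
    """Run `cmd` `runs` times and return the list of exit codes.

    A run that exceeds `timeout` seconds is killed and recorded as None.
    """
    results = []
    for _ in range(runs):
        try:
            proc = subprocess.run(cmd, timeout=timeout)
            results.append(proc.returncode)
        except subprocess.TimeoutExpired:
            results.append(None)
    return results


def count_failures(codes):
    """Count runs that did not exit with status 0."""
    return sum(1 for code in codes if code != 0)
```

For example, `count_failures(run_repeatedly([sys.executable, "repro.py"]))` would report how many of ten runs exited non-zero.  The reproducer exits with status 1 both when the watchdog fires and when the collector thread is still alive after the join timeout.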

Build used for the reproducer:

```bash
./configure --with-pydebug --disable-gil --without-ensurepip
make -j
```

Observed on CPython `main` debug/free-threaded:

```text
Python 3.16.0a0 free-threading build
abiflags td
GIL disabled
debug True
```

I reproduced this on a B200 Linux system with 224 CPUs at the host level.  The test process ran under a Slurm allocation that reported 28 CPUs available to the process.  I have not yet reproduced it on a smaller local workstation.
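
For anyone comparing environments, the two CPU counts can be read from Python roughly as follows.  This is a sketch, with `cpu_counts` as an illustrative helper name.  It assumes the allocation limit is applied through the CPU affinity mask, which depends on how the scheduler is configured.

```python
import os


def cpu_counts():
    """Return (cpus_on_system, cpus_usable_by_this_process)."""
    system = os.cpu_count()
    if hasattr(os, "sched_getaffinity"):
        # Available on Linux: size of this process's CPU affinity mask.
        usable = len(os.sched_getaffinity(0))
    else:
        # Fallback for platforms without affinity support.
        usable = system
    return system, usable


system, usable = cpu_counts()
print(f"system={system} usable={usable}")
```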

The script regularly times out via the faulthandler watchdog.  The Python-level traceback shows the GC worker in `gc.collect()` and the main thread apparently still at `time.sleep()`.  A native stack shows the more precise state:

```text
gc-collector:
  validate_refcounts() / validate_gc_objects()
  gc_visit_heaps()
  deduce_unreachable_heap()
  gc_collect_internal()
  _PyGC_Collect()
  gc.collect()

main thread:
  _PyParkingLot_Park()
  tstate_wait_attach()
  _PyThreadState_Attach()
  PyEval_RestoreThread()
  pysleep()
  time.sleep()
```

At the time of the stall, GDB showed the interpreter stop-the-world state as:

```text
requested = true
world_stopped = true
thread_countdown = 0
requester = GC worker thread state
main thread state = _Py_THREAD_SUSPENDED
```

The suspected issue is a lost race after `start_the_world()`.  That function moves suspended threads back to detached and unparks them.  A thread running a tight `gc.collect()` loop can then immediately request another stop-the-world pause and re-suspend a just-unparked thread before that thread gets to attach.  The result is starvation of a thread trying to return from a detached operation such as `time.sleep()`.
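
To make the suspected interleaving concrete, here is a toy model of it.  It is not CPython's implementation: the `State` names only loosely mirror the thread states mentioned above, and `sleeper_wins_race` is a hypothetical stand-in for whatever the OS scheduler decides after each unpark.

```python
from enum import Enum


class State(Enum):
    ATTACHED = "attached"
    DETACHED = "detached"
    SUSPENDED = "suspended"


def rounds_until_attach(max_rounds, sleeper_wins_race):
    """Return the round in which the sleeper attaches, or None if starved.

    sleeper_wins_race(round_no) returns True when the unparked sleeper
    runs before the collector's next stop-the-world request.
    """
    sleeper = State.SUSPENDED  # parked while trying to reattach after sleep
    for round_no in range(1, max_rounds + 1):
        # The collector finishes a collection and restarts the world:
        # suspended threads go back to detached and are unparked.
        sleeper = State.DETACHED
        if sleeper_wins_race(round_no):
            sleeper = State.ATTACHED  # detached -> attached succeeds
            return round_no
        # Otherwise the next gc.collect() stops the world first and
        # re-suspends the still-detached sleeper, which parks again.
        sleeper = State.SUSPENDED
    return None
```

In this model the sleeper makes progress only if it wins at least one race.  Nothing bounds how many rounds it can lose, which matches the observed state of a main thread parked in `tstate_wait_attach()` while the collector keeps running.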

The minimized reproducer did not trigger the stall in my test matrix on:

- a non-debug free-threaded CPython `main` build
- conda-forge Python 3.14.6t

However, the original larger stress workload was first encountered while testing Python 3.14t free-threaded package support, and the reduced CPython `main` debug build gives a clean way to expose the progress bug.

I have a local CPython patch that tracks thread states waiting to attach after a stop-the-world suspension and prevents a subsequent stop-the-world requester from immediately re-suspending those attach waiters.  With that patch:

- the minimized reproducer passes
- the broader pure-Python GC stress matrix passes
- `test_free_threading.test_gc` passes under debug and non-debug free-threaded builds
- `test_free_threading.test_gc test_threading -v` passes under the debug free-threaded build
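
The idea behind the patch can be illustrated with a toy model.  This is not the patch itself and does not use CPython's data structures: `ToyWorld`, `parked`, `attach_waiters`, and `protect_waiters` are hypothetical names, and a real runtime would still have to bring a shielded thread to a safe point once it has attached, which the toy omits.

```python
class ToyWorld:
    """Toy stop-the-world bookkeeping for threads named by strings."""

    def __init__(self, thread_names, protect_waiters):
        self.state = {name: "detached" for name in thread_names}
        self.parked = set()          # tried to attach while suspended
        self.attach_waiters = set()  # unparked, but not yet attached
        self.protect_waiters = protect_waiters

    def stop_the_world(self):
        for name in self.state:
            shielded = self.protect_waiters and name in self.attach_waiters
            if self.state[name] == "detached" and not shielded:
                self.state[name] = "suspended"

    def start_the_world(self):
        for name in self.state:
            if self.state[name] == "suspended":
                self.state[name] = "detached"
        # Threads that were parked are now waiting to attach.
        self.attach_waiters |= self.parked
        self.parked.clear()

    def try_attach(self, name):
        if self.state[name] == "suspended":
            self.parked.add(name)
            return False
        self.state[name] = "attached"
        self.attach_waiters.discard(name)
        return True


def back_to_back_stops(protect_waiters):
    """Return True if "main" attaches despite losing the race once."""
    world = ToyWorld(["main"], protect_waiters)
    world.stop_the_world()    # collector stops the world; main is sleeping
    world.try_attach("main")  # sleep ends; main is suspended, so it parks
    world.start_the_world()   # main is unparked as an attach waiter
    world.stop_the_world()    # collector's next request wins the race
    return world.try_attach("main")
```

With `protect_waiters=False` the second stop request re-suspends `main` and the attach fails again.  With `protect_waiters=True` the waiter is left alone and attaches on its next attempt.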

I will prepare a PR with the fix and regression test once this issue exists.

CPython versions tested on:

- 3.14
- 3.16
- CPython main branch

Operating systems tested on:

- Linux

