Thread pool limits meaning is incommensurable across libraries and APIs #208

## `threadpoolctl` limiting API has inconsistent behavior

### 1. Scope of setting limits is inconsistent

`threadpoolctl`'s API for setting limits affects different libraries in different ways:

| Library                 | Where the limit is set |
|-------------------------|------------------------|
| OpenMP (libgomp/libomp) | Current thread only    |
| OpenBLAS (pthreads)     | Process-wide           |
| OpenBLAS (Windows)      | Process-wide           |
| OpenBLAS (OpenMP)       | Intended to be process-wide? Currently broken: [OpenMathLib/OpenBLAS#5808](https://github.com/OpenMathLib/OpenBLAS/pull/5808) |
| MKL (Intel threading)   | Process-wide (both scopes are available) |
| BLIS                    | Process-wide (but [both scopes are available](https://github.com/flame/blis/blob/master/docs/Multithreading.md)) |
| FlexiBLAS               | Depends on the backend! |

See https://github.com/joblib/threadpoolctl/issues/207
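To make the difference in scope concrete, here is a pure-Python toy model (no BLAS involved; every name in it is hypothetical). A "current thread only" setting behaves like an attribute of a `threading.local()` object, while a "process-wide" setting behaves like a module-level global. Setting a limit in the main thread is then invisible to a new worker thread in the first case and visible in the second.

```python
import threading

_process_wide = {"limit": None}    # one value shared by every thread
_thread_local = threading.local()  # a separate value per thread


def set_limit(limit, scope):
    """Hypothetical setter modelling the two scopes in the table above."""
    if scope == "process":
        _process_wide["limit"] = limit
    elif scope == "thread":
        _thread_local.limit = limit
    else:
        raise ValueError(f"unknown scope: {scope!r}")


def get_limit(scope):
    """Return the limit visible from the calling thread (None if unset)."""
    if scope == "process":
        return _process_wide["limit"]
    return getattr(_thread_local, "limit", None)


def limit_seen_by_worker(scope):
    """Set a limit of 4 in the calling thread, then read it from a new thread."""
    set_limit(4, scope)
    seen = []
    worker = threading.Thread(target=lambda: seen.append(get_limit(scope)))
    worker.start()
    worker.join()
    return seen[0]


print(limit_seen_by_worker("process"))  # 4
print(limit_seen_by_worker("thread"))   # None
```

Under the "current thread only" scope, each worker thread therefore has to set its own limit.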

### 2. Semantics of limits are inconsistent

There are also different semantics for limits. Given a limit of `L`, this limit can be:

* **Per-thread:** Each thread can have up to `L` sub-worker threads.
* **Process-wide:** The process will have a global thread pool with `L` workers.

| Library                 | Meaning of the limit |
|-------------------------|----------------------|
| OpenMP (libgomp/libomp) | Per-thread           |
| OpenBLAS (pthreads)     | Process-wide         |
| OpenBLAS (Windows)      | Process-wide         |
| OpenBLAS (OpenMP)       | Per-thread           |
| MKL (Intel threading)   | Per-thread           |
| BLIS                    | Per-thread, probably |
| FlexiBLAS               | Depends on the backend! |
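A rough way to see what the two semantics imply: with `P` Python threads each calling into the library and a limit of `L`, per-thread limits allow about `P × L` threads in total, while a process-wide pool stays at about `P + L`. The function below is an idealized back-of-the-envelope model, not a description of any library's internals. It assumes the calling thread counts as one of the `L` workers (so each pool adds `L - 1` helper threads) and ignores any other helper threads a library may create; `baseline` stands for unrelated threads such as the main thread.

```python
def peak_thread_count(python_threads, limit, semantics, baseline=0):
    """Idealized peak OS thread count when `python_threads` threads each call
    a library whose thread limit is `limit`.

    Assumes the calling thread is one of the `limit` workers, so every pool
    adds `limit - 1` helper threads.
    """
    if limit < 1 or python_threads < 0:
        raise ValueError("limit must be >= 1 and python_threads >= 0")
    helpers_per_pool = limit - 1
    if semantics == "per-thread":
        # Every calling thread gets its own pool of helpers.
        helpers = python_threads * helpers_per_pool
    elif semantics == "process-wide":
        # All calling threads share a single pool of helpers.
        helpers = helpers_per_pool if python_threads else 0
    else:
        raise ValueError(f"unknown semantics: {semantics!r}")
    return baseline + python_threads + helpers
```

With `baseline=2` (the main thread plus a monitoring thread), 10 Python threads and a limit of 10, the model gives 102 for per-thread and 21 for process-wide semantics. The per-thread figure matches the MKL run in the demonstration below; the OpenBLAS run reports a couple more threads than this idealized model predicts. With a single calling thread, both semantics give the same answer.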

#### Demonstration

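Consider the following script, `nested-blas.py`. It starts a Python thread pool with 10 threads and runs a BLAS operation in each thread, while a background thread records the peak number of OS threads in the process. The limit is passed as the first command-line argument: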

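```python
# nested-blas.py: run BLAS calls from 10 Python threads and record the peak
# number of OS threads in the process.
# Usage: python nested-blas.py <limit>
import sys
import threading
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import psutil
import threadpoolctl

LIMIT = int(sys.argv[1])
A = np.ones((10_000_000,))

num_threads = 0
stop = False


def count_threads():
    # Busy-poll the process's OS thread count and keep the maximum seen.
    global num_threads
    process = psutil.Process()
    while not stop:
        num_threads = max(num_threads, process.num_threads())


def blasop(_ignore):
    # Set the limit again inside the worker thread: for libraries that set
    # limits for the current thread only (e.g. OpenMP), the main thread's
    # setting does not apply here.
    threadpoolctl.threadpool_limits(limits=LIMIT, user_api="blas")
    for _ in range(100):
        A.dot(A)  # 1-D dot product, dispatched to BLAS


# Set the limit in the main thread too.
threadpoolctl.threadpool_limits(limits=LIMIT, user_api="blas")
counter = threading.Thread(target=count_threads)
counter.start()
with ThreadPoolExecutor(10) as pool:
    list(pool.map(blasop, range(10)))
stop = True
counter.join()
print(threadpoolctl.threadpool_info())
print("Max threads:", num_threads)
```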

With OpenBLAS pthreads:

```shell-session
$ python nested-blas.py 10
[{'user_api': 'blas', 'internal_api': 'openblas', 'num_threads': 20, 'prefix': 'libscipy_openblas',
'version': '0.3.31.188.0', 'threading_layer': 'pthreads', 'architecture': 'Haswell'}]
Max threads: 23
```

With MKL (Conda-Forge):

```shell-session
$ python nested-blas.py 10
[{'user_api': 'blas', 'internal_api': 'mkl', 'num_threads': 12, 'prefix': 'libmkl_rt',  
  'version': '2026.0-Product', 'threading_layer': 'intel'},
 {'user_api': 'openmp', 'internal_api': 'openmp', 'num_threads': 12,
   'prefix': 'libomp', 'version': None}]
Max threads: 102
```

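With the same limit of 10, OpenBLAS (pthreads) peaks at 23 threads while MKL peaks at 102. That's a very different outcome for the same configuration! The MKL number is consistent with per-thread limits: roughly 10 Python threads, each with its own pool of 10.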

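Note that the script sets the limit both in the main thread and inside each worker thread. An earlier version of this script set it only in the main thread; setting it in each thread is necessary because of the inconsistent scope and semantics described above.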

## From Python process pools to Python thread pools

If the limit is set in a process that runs Python code only from its main thread, all of these variations have the same effect. Because Python parallelism has mostly relied on process pools, where each worker process runs its tasks on its main thread, the current API has in practice behaved consistently.

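As Python thread pools become more common, that is no longer true: the same call can lead to very different outcomes depending on which library is loaded, so the current API is no longer a good fit.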

## Proposal (big picture)

Notice there are three possible API variants for a library:

* Set the size of a process-wide pool.
* Set the size of per-thread pools, for all threads.
* Set the size of a per-thread pool, for the current thread only.

(The fourth quadrant, thread-local setting for a process-wide pool, makes no sense semantically.)

I do not think the current API is viable going forward: it is too inconsistent. On the other hand, it is already in use. I therefore propose:

1. Deprecate the current limits API, while keeping its current semantics.
2. Add a new API where the user explicitly states which of the 3 variants above they are requesting.
   That is, it should be clear to the user exactly what the outcome will be.
3. Ideally, leave room for other variations.
4. Add an API for querying which API variants are available.
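Purely to illustrate the shape of this proposal (not a concrete API design, which the next steps below leave open), here is a minimal sketch in which the caller must name the variant explicitly and can query which variants a library supports. `LimitScope`, `UnsupportedScope`, `Library` and `examplelib` are all hypothetical names, and the wrapper only records the requested limits instead of calling into a real library.

```python
import enum


class LimitScope(enum.Enum):
    """The three variants listed above; names are hypothetical."""

    PROCESS_POOL = "size of the process-wide pool"
    ALL_THREADS = "size of per-thread pools, for all threads"
    CURRENT_THREAD = "size of the per-thread pool, current thread only"


class UnsupportedScope(Exception):
    """Raised when a library cannot honour the requested variant."""


class Library:
    """Hypothetical wrapper around one native threading library."""

    def __init__(self, name, supported):
        self.name = name
        self._supported = frozenset(supported)
        # scope -> requested limit; a real wrapper would call the library here.
        self.limits = {}

    def supported_scopes(self):
        # Proposal item 4: callers can ask which variants are available.
        return self._supported

    def set_limit(self, limit, scope):
        # Proposal item 2: the caller names the variant explicitly, and an
        # unsupported variant is an error rather than a silent reinterpretation.
        if scope not in self._supported:
            raise UnsupportedScope(f"{self.name} does not support {scope.name}")
        self.limits[scope] = limit


# Made-up example library that only offers a process-wide pool.
lib = Library("examplelib", {LimitScope.PROCESS_POOL})
lib.set_limit(4, LimitScope.PROCESS_POOL)
try:
    lib.set_limit(4, LimitScope.CURRENT_THREAD)
except UnsupportedScope as exc:
    print(exc)  # examplelib does not support CURRENT_THREAD
```

Using an enum for the variant also leaves room for further variations (item 3): a new member can be added without changing the meaning of the existing ones.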

## Next steps

If this big-picture approach is acceptable, the next step would be to collect use cases (preventing oversubscription in thread pools, in process pools, etc.) and then to design a suitable API.

