Thread pool limits meaning is incommensurable across libraries and APIs #208

@itamarst

threadpoolctl limiting API has inconsistent behavior

1. Scope of setting limits is inconsistent

threadpoolctl's API for setting limits affects different libraries in different ways:

Library                    Where the limit is set
OpenMP (libgomp/libomp)    Current thread only
OpenBLAS (pthreads)        Process-wide
OpenBLAS (Windows)         Process-wide
OpenBLAS (OpenMP)          Intended to be process-wide? But currently broken: OpenMathLib/OpenBLAS#5808
MKL (Intel threading)      Process-wide (but both available)
BLIS                       Process-wide (but both available)
FlexiBLAS                  Depends on the backend!

See #207
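
To make the scope distinction concrete, here is a minimal stdlib-only sketch. It does not call threadpoolctl or any BLAS library; `process_wide_limit` and `thread_local_limit` are hypothetical stand-ins for the two places a library might store its setting.

```python
import threading

# Hypothetical stand-ins for a library's internal setting storage.
process_wide_limit = {"value": None}    # one value shared by every thread
thread_local_limit = threading.local()  # one independent value per thread


def set_limits(limit):
    """Set both kinds of limit from the calling thread."""
    process_wide_limit["value"] = limit
    thread_local_limit.value = limit


def read_limits():
    """Return (process-wide, thread-local) limits as seen by the caller."""
    return (
        process_wide_limit["value"],
        getattr(thread_local_limit, "value", None),
    )


def read_limits_in_new_thread():
    """Read both limits from a freshly started thread."""
    seen = []
    worker = threading.Thread(target=lambda: seen.append(read_limits()))
    worker.start()
    worker.join()
    return seen[0]


set_limits(4)
print("main thread:  ", read_limits())                # (4, 4)
print("worker thread:", read_limits_in_new_thread())  # (4, None)
```

The worker sees the process-wide value but not the thread-local one, which is why a single call from the main thread is not enough for libraries in the "current thread only" row.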

2. Semantics of limits are inconsistent

There are also different semantics for limits. Given a limit of L, this limit can be:

  • Per-thread: Each thread can have up to L sub-worker threads.
  • Process-wide: The process will have a global thread pool with L workers.

Library                    Limit is
OpenMP (libgomp/libomp)    Per-thread
OpenBLAS (pthreads)        Process-wide
OpenBLAS (Windows)         Process-wide
OpenBLAS (OpenMP)          Per-thread
MKL (Intel threading)      Per-thread
BLIS                       Per-thread, probably
FlexiBLAS                  Depends on the backend!
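
The practical difference between the two semantics shows up once several caller threads use the library at the same time. The function below is a rough back-of-the-envelope model, not a measurement: `estimated_peak_threads` is a hypothetical helper, and it assumes that under per-thread semantics each caller counts as one of its own L workers, while under process-wide semantics the callers share one pool of L workers. Whether the calling thread counts toward L differs between libraries, so treat the result as an order of magnitude only.

```python
def estimated_peak_threads(caller_threads, limit, semantics):
    """Rough upper bound on caller threads plus library worker threads.

    Assumptions (not taken from any library's documentation):
    - "per-thread": each caller plus (limit - 1) extra workers of its own.
    - "process-wide": all callers share a single pool of `limit` workers.
    """
    if caller_threads < 1 or limit < 1:
        raise ValueError("caller_threads and limit must be at least 1")
    if semantics == "per-thread":
        return caller_threads * limit
    if semantics == "process-wide":
        return caller_threads + limit
    raise ValueError(f"unknown semantics: {semantics!r}")


for semantics in ("per-thread", "process-wide"):
    print(semantics, estimated_peak_threads(10, 10, semantics))
```

With 10 caller threads and L = 10, the model gives 100 threads for per-thread semantics and 20 for process-wide semantics. With a single caller thread the two differ by only one thread.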

Demonstration

Consider an example where I start a Python thread pool with 10 threads, and then run a BLAS operation in each thread:

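import psutil
from concurrent.futures import ThreadPoolExecutor
import numpy as np
import threading
import threadpoolctl
import sys

A = np.ones((10_000_000,))

# Highest thread count observed for this process so far.
num_threads = 0
stop = False

def count_threads():
    # Poll the OS-level thread count until the main thread sets `stop`.
    global num_threads
    process = psutil.Process()
    while not stop:
        num_threads = max(num_threads, process.num_threads())

def blasop(_ignore):
    # Set the limit again inside each worker thread: some libraries
    # apply it to the calling thread only.
    threadpoolctl.threadpool_limits(
        limits=int(sys.argv[1]), user_api="blas"
    )
    for i in range(100):
        A.dot(A)

# Set the limit from the main thread as well. Called as a plain function
# (not as a context manager), so the limit stays in effect.
threadpoolctl.threadpool_limits(
    limits=int(sys.argv[1]), user_api="blas"
)
threading.Thread(target=count_threads).start()
# Run the BLAS operation concurrently in 10 Python threads.
POOL = ThreadPoolExecutor(10)
list(POOL.map(blasop, range(10)))
stop = True
print(threadpoolctl.threadpool_info())
print("Max threads:", num_threads)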

With OpenBLAS pthreads:

$ python nested-blas.py 10
[{'user_api': 'blas', 'internal_api': 'openblas', 'num_threads': 20, 'prefix': 'libscipy_openblas',
'version': '0.3.31.188.0', 'threading_layer': 'pthreads', 'architecture': 'Haswell'}]
Max threads: 23

With MKL (Conda-Forge):

$ python nested-blas.py 10
[{'user_api': 'blas', 'internal_api': 'mkl', 'num_threads': 12, 'prefix': 'libmkl_rt',  
  'version': '2026.0-Product', 'threading_layer': 'intel'},
 {'user_api': 'openmp', 'internal_api': 'openmp', 'num_threads': 12,
   'prefix': 'libomp', 'version': None}]
Max threads: 102

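That's a very different outcome for the same configuration: with the same limit of 10, the process peaks at 23 threads with OpenBLAS pthreads and at 102 threads with MKL.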

Also, in an earlier version I forgot to set the thread pool limit in each thread, which is necessary because some libraries apply the limit to the current thread only.

From Python process pools to Python thread pools

If the limit is set in a process that only runs Python code from its main thread, all the variations have the same effect. Because process pools have been the more common choice in Python, the current API has so far behaved consistently in practice.

As Python thread pools become more common, the inconsistencies become visible, and the current state of the API is no longer a good one.

Proposal (big picture)

Notice there are three possible API variants for a library:

  • Set the size of a process-wide pool.
  • Set the size of a thread-specific pool, across all threads.
  • Set the size of a thread-specific pool, for the current thread only.

(The fourth quadrant, thread-local setting for a process-wide pool, makes no sense semantically.)
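
The three variants can be read as a 2×2 grid (kind of pool × scope of the setting) with one cell removed. A small sketch of that enumeration follows; the names `Pool`, `Setting`, and `is_meaningful` are hypothetical and exist only for illustration.

```python
from enum import Enum
from itertools import product


class Pool(Enum):
    PROCESS_WIDE = "process-wide pool"
    PER_THREAD = "thread-specific pool"


class Setting(Enum):
    ALL_THREADS = "set across all threads"
    CURRENT_THREAD = "set for the current thread only"


def is_meaningful(pool, setting):
    """A thread-local setting for a single process-wide pool makes no sense."""
    return not (pool is Pool.PROCESS_WIDE and setting is Setting.CURRENT_THREAD)


# Enumerate all four quadrants and keep the three that are meaningful.
VARIANTS = [
    (pool, setting)
    for pool, setting in product(Pool, Setting)
    if is_meaningful(pool, setting)
]

for pool, setting in VARIANTS:
    print(f"{pool.value}, {setting.value}")
```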

I do not think the current API is viable going forward: it is too inconsistent. On the other hand, it is in use. I therefore propose:

  1. Deprecating the current limits API, but leaving it with the current semantics.
  2. Adding a new API where the user explicitly states which of the 3 variants above they are requesting.
    That is, it should be clear to the user exactly what the outcomes will be.
  3. Ideally, leaving room for other variations.
  4. Adding an API for querying which API variants are available.
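
To show what points 2 and 4 could look like together, here is a purely hypothetical sketch; none of these names exist in threadpoolctl, and the real design is still to be worked out. The capability table is an assumption that loosely follows the two tables in this issue for three of the libraries, and `request_limit` only validates the request instead of touching any real library.

```python
from enum import Enum


class Variant(Enum):
    PROCESS_WIDE_POOL = "process-wide pool"
    THREAD_POOL_ALL_THREADS = "thread-specific pool, all threads"
    THREAD_POOL_CURRENT_THREAD = "thread-specific pool, current thread only"


# Hypothetical capability table (an assumption, for illustration only).
SUPPORTED = {
    "openmp": {Variant.THREAD_POOL_CURRENT_THREAD},
    "openblas-pthreads": {Variant.PROCESS_WIDE_POOL},
    "mkl": {
        Variant.THREAD_POOL_ALL_THREADS,
        Variant.THREAD_POOL_CURRENT_THREAD,
    },
}


class UnsupportedVariant(ValueError):
    """Raised when a library cannot provide the requested variant."""


def available_variants(library):
    """Query which variants a library supports (empty set if unknown)."""
    return frozenset(SUPPORTED.get(library, ()))


def request_limit(library, limit, variant):
    """Validate an explicit request and describe what would be applied."""
    if variant not in available_variants(library):
        raise UnsupportedVariant(f"{library} does not support {variant.value}")
    return f"{library}: limit {limit} on {variant.value}"


print(request_limit("mkl", 4, Variant.THREAD_POOL_CURRENT_THREAD))
```

The point of the sketch is that an unsupported combination fails loudly instead of silently applying different semantics.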

Next steps

If this big-picture approach is acceptable, the next step would be to come up with use cases (preventing oversaturation in thread pools, process pools, etc.) and then design a suitable API.
