
Commit 3690bec

Document OWN_GIL subinterpreter parallelism
1 parent 84a8bb8 commit 3690bec

2 files changed: +224 −3 lines changed

CHANGELOG.md

Lines changed: 14 additions & 0 deletions
@@ -4,6 +4,15 @@
### Added

- **OWN_GIL Subinterpreter Thread Pool** - True parallelism with Python 3.12+ subinterpreters
  - Each subinterpreter runs in its own thread with its own GIL (`PyInterpreterConfig_OWN_GIL`)
  - Thread pool manages N subinterpreters for parallel Python execution
  - `py:context(N)` returns the Nth context PID for explicit context selection
  - `py_context_router` provides scheduler-affinity routing for automatic distribution
  - Cast operations are 25-30% faster than in worker mode
  - Full isolation between subinterpreters (separate namespaces, modules, state)
  - New C files: `py_subinterp_pool.c`, `py_subinterp_pool.h`
- **`erlang.reactor` module** - FD-based protocol handling for building custom servers
  - `reactor.Protocol` - Base class for implementing protocols
  - `reactor.serve(sock, protocol_factory)` - Serve connections using a protocol
@@ -112,6 +121,11 @@
- **Subinterpreter cleanup and thread worker re-registration** - Fixed cleanup
  issues when subinterpreters are destroyed and recreated
- **ProcessError exception class identity in subinterpreters** - Fixed exception
  class mismatch when raising `erlang.ProcessError` in subinterpreter contexts.
  The exception class is now looked up from the current interpreter's `erlang`
  module at runtime instead of using a global variable.
- **Thread worker handlers not re-registering after app restart** - Workers now
  properly re-register when the application restarts

docs/scalability.md

Lines changed: 210 additions & 3 deletions
@@ -30,14 +30,150 @@
### Sub-interpreter Mode (Python 3.12+)

Uses Python's sub-interpreter feature with a per-interpreter GIL (`PyInterpreterConfig_OWN_GIL`). Each sub-interpreter runs in its own dedicated thread with its own GIL, enabling true parallel execution across interpreters.

**Architecture:**
- Thread pool manages N subinterpreters (default: the number of schedulers)
- Each subinterpreter has its own thread, GIL, and Python state
- Requests are routed to subinterpreters via `py_context_router`
- Cast operations are 25-30% faster than in worker mode
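The affinity idea behind `py_context_router` can be pictured as a simple index mapping. A minimal Python stand-in (the function name and the modulo policy are illustrative assumptions, not the library's actual routing code):

```python
# Hypothetical sketch of scheduler-affinity routing: each scheduler ID
# maps deterministically onto one of N contexts, so repeated calls from
# the same scheduler always land on the same subinterpreter.

def route(scheduler_id: int, num_contexts: int) -> int:
    """Return a 1-based context index for the given scheduler."""
    if num_contexts < 1:
        raise ValueError("need at least one context")
    return (scheduler_id - 1) % num_contexts + 1

# With 4 contexts, schedulers 1-4 get contexts 1-4 and
# scheduler 5 wraps back around to context 1.
assert route(1, 4) == 1
assert route(4, 4) == 4
assert route(5, 4) == 1
```

Any deterministic scheduler-to-context mapping gives the same benefit: calls originating from one scheduler never race each other for a context.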
**Note:** Each sub-interpreter has isolated state. Use the [Shared State](#shared-state) API to share data between workers.

**Explicit Context Selection:**

```erlang
%% Get a specific context by index (1-based)
Ctx = py:context(1),
{ok, Result} = py:call(Ctx, math, sqrt, [16]).

%% Or use automatic scheduler-affinity routing
{ok, Result} = py:call(math, sqrt, [16]).
```
### Multi-Executor Mode (Python < 3.12)

Runs N executor threads that share the GIL. Requests are distributed round-robin across executors. Good for I/O-bound workloads where Python releases the GIL during I/O operations.

## Choosing the Right Mode

### Mode Comparison

| Aspect | Free-Threaded | Subinterpreter | Multi-Executor |
|--------|---------------|----------------|----------------|
| **Parallelism** | True N-way | True N-way | GIL contention |
| **State Isolation** | Shared | Isolated | Shared |
| **Memory Overhead** | Low | Higher (per-interp) | Low |
| **Module Compatibility** | Limited | Most modules | All modules |
| **Python Version** | 3.13+ (nogil) | 3.12+ | Any |
### When to Use Each Mode

**Use Free-Threaded (Python 3.13t) when:**
- You need maximum parallelism with shared state
- Your libraries are compatible with the GIL-free build
- You're running CPU-bound workloads
- Memory efficiency is important

**Use Subinterpreters (Python 3.12+) when:**
- You need parallelism with state isolation
- You want crash isolation between contexts
- You're running untrusted or unstable code
- You need predictable per-request state

**Use Multi-Executor (Python < 3.12) when:**
- You're running on older Python versions
- Your workload is I/O-bound (GIL released during I/O)
- You need compatibility with all Python modules
- Shared state between workers is required

### Pros and Cons

**Subinterpreter Mode Pros:**
- True parallelism without GIL contention
- Complete isolation (crashes don't affect other contexts)
- Each context has a clean namespace (no state bleed)
- 25-30% faster cast operations than worker mode

**Subinterpreter Mode Cons:**
- Higher memory usage (each interpreter loads modules separately)
- Some C extensions don't support subinterpreters
- No shared state between contexts (use the Shared State API)
- asyncio event-loop integration requires the main interpreter

**Free-Threaded Mode Pros:**
- True parallelism with shared state
- Lower memory overhead than subinterpreters
- Simplest mental model (like regular threading)

**Free-Threaded Mode Cons:**
- Requires Python 3.13+ built with `--disable-gil`
- Many C extensions are not yet compatible
- Shared state requires careful synchronization
- Still experimental
## Subinterpreter Architecture

### Design Overview

```
┌─────────────────────────────────────────────────────────────────┐
│                        Erlang VM (BEAM)                         │
├─────────────────────────────────────────────────────────────────┤
│                        py_context_router                        │
│   ┌─────────────────────────────────────────────────────────┐   │
│   │  Scheduler 1 ──► Context 1 (pid)                        │   │
│   │  Scheduler 2 ──► Context 2 (pid)                        │   │
│   │  Scheduler N ──► Context N (pid)                        │   │
│   └─────────────────────────────────────────────────────────┘   │
│          │              │              │                        │
│          ▼              ▼              ▼                        │
│     ┌──────────┐   ┌──────────┐   ┌──────────┐                  │
│     │ Context  │   │ Context  │   │ Context  │                  │
│     │ Process  │   │ Process  │   │ Process  │                  │
│     │ (gen_srv)│   │ (gen_srv)│   │ (gen_srv)│                  │
│     └────┬─────┘   └────┬─────┘   └────┬─────┘                  │
└──────────┼──────────────┼──────────────┼────────────────────────┘
           │              │              │
           ▼              ▼              ▼
┌─────────────────────────────────────────────────────────────────┐
│                   Subinterpreter Thread Pool                    │
├─────────────────────────────────────────────────────────────────┤
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐           │
│  │   Thread 1   │  │   Thread 2   │  │   Thread N   │           │
│  │ ┌──────────┐ │  │ ┌──────────┐ │  │ ┌──────────┐ │           │
│  │ │  Interp  │ │  │ │  Interp  │ │  │ │  Interp  │ │           │
│  │ │ (GIL 1)  │ │  │ │ (GIL 2)  │ │  │ │ (GIL N)  │ │           │
│  │ └──────────┘ │  │ └──────────┘ │  │ └──────────┘ │           │
│  └──────────────┘  └──────────────┘  └──────────────┘           │
│                                                                 │
│  Each thread owns its interpreter's GIL (OWN_GIL)               │
│  No GIL contention between threads                              │
└─────────────────────────────────────────────────────────────────┘
```
### Key Components

**py_context_router**: Routes requests to context processes based on scheduler affinity or explicit binding.

**py_context_process**: A gen_server that owns a Python context reference and handles call/eval/exec operations.

**Subinterpreter Thread Pool (C)**: Manages N threads, each with its own Python subinterpreter created with `Py_NewInterpreterFromConfig()` and the `PyInterpreterConfig_OWN_GIL` GIL setting.
### Request Flow

1. An Erlang process calls `py:call(Module, Func, Args)`
2. `py_context_router` selects a context based on the scheduler ID
3. The request is sent to the `py_context_process` gen_server
4. The gen_server calls a NIF, which executes on the subinterpreter's thread
5. The result is returned through the gen_server to the caller
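The flow above can be simulated with plain Python threads standing in for BEAM processes; a hedged sketch (all names are illustrative, and queues play the role of gen_server mailboxes, so per-context requests are serialized just like in step 4):

```python
# Illustrative simulation of the request flow, not the library's code:
# a router picks a context from the caller's scheduler ID, each context
# has one worker thread draining its own queue (the gen_server
# serialization), and replies come back on a per-request queue.
import math
import queue
import threading

NUM_CONTEXTS = 2
inboxes = [queue.Queue() for _ in range(NUM_CONTEXTS)]

def worker(inbox: queue.Queue) -> None:
    # Stands in for the subinterpreter thread: handles one request
    # at a time, so calls on one context never interleave.
    while True:
        item = inbox.get()
        if item is None:          # shutdown sentinel
            break
        func, args, reply = item
        reply.put(func(*args))

threads = [threading.Thread(target=worker, args=(q,)) for q in inboxes]
for t in threads:
    t.start()

def call(scheduler_id: int, func, args):
    inbox = inboxes[scheduler_id % NUM_CONTEXTS]  # step 2: route
    reply: queue.Queue = queue.Queue()
    inbox.put((func, args, reply))                # steps 3-4: dispatch
    return reply.get()                            # step 5: reply

result = call(1, math.sqrt, (16,))

for q in inboxes:
    q.put(None)
for t in threads:
    t.join()
```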
### Thread Safety

- Each subinterpreter has its own GIL (no cross-interpreter contention)
- NIF calls are serialized per-context via gen_server
- Erlang message passing provides synchronization
- C code uses atomics for cross-thread state (`thread_running` flag)
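The last point can be sketched with `threading.Event` standing in for the C atomic: the worker loop re-checks a shared flag on every pass and exits cleanly when the controller clears it. A stand-in only; the real pool does this in C with atomic operations:

```python
# Sketch of a thread_running-style shutdown handshake. The Event plays
# the role of the atomic flag: the worker polls it each iteration, the
# controlling thread clears it to request a clean stop.
import threading

running = threading.Event()
iterations = 0

def worker_loop() -> None:
    global iterations
    while running.is_set():       # atomic-style check on every pass
        iterations += 1           # stand-in for "serve one request"

running.set()
t = threading.Thread(target=worker_loop)
t.start()

while iterations == 0:            # let the loop run at least once
    pass
running.clear()                   # request shutdown
t.join()                          # worker exits on its next check
```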
## Rate Limiting

All Python calls pass through an ETS-based counting semaphore that prevents overload:
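Conceptually this is a counting semaphore with a non-blocking acquire: when every slot is in use, a call is rejected immediately instead of queueing. A minimal Python sketch, where the class name, capacity, and error shape are purely illustrative:

```python
# Hypothetical model of an overload-shedding call limiter: a bounded
# semaphore guards N in-flight slots; a full pool rejects the call
# right away rather than blocking.
import threading

class CallLimiter:
    def __init__(self, max_in_flight: int) -> None:
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def run(self, func, *args):
        if not self._slots.acquire(blocking=False):
            return ("error", "overload")      # shed load immediately
        try:
            return ("ok", func(*args))
        finally:
            self._slots.release()             # free the slot

limiter = CallLimiter(max_in_flight=2)
assert limiter.run(pow, 2, 10) == ("ok", 1024)
```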
@@ -106,12 +242,83 @@ This allows your application to implement backpressure or shed load gracefully.
## Parallel Execution with Sub-interpreters

For CPU-bound workloads on Python 3.12+, erlang_python provides true parallelism via OWN_GIL subinterpreters.

### Check Support

```erlang
%% Check if subinterpreters are supported (Python 3.12+)
true = py:subinterp_supported().

%% Check current execution mode
subinterp = py:execution_mode().
```
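A similar capability check can be made from the Python side. A hedged sketch: the module names are CPython internals that vary by version (`_xxsubinterpreters` in 3.12, `_interpreters` in 3.13+), and this probes only importability, not full per-interpreter-GIL support:

```python
# Illustrative probe, roughly what a supported-check has to establish:
# CPython 3.12+ plus an importable interpreters module.
import sys

def subinterp_supported() -> bool:
    """True if this CPython build exposes a subinterpreters module."""
    if sys.implementation.name != "cpython" or sys.version_info < (3, 12):
        return False
    for name in ("_interpreters", "_xxsubinterpreters"):  # 3.13+ / 3.12
        try:
            __import__(name)
            return True
        except ImportError:
            continue
    return False

supported = subinterp_supported()
```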
### Using the Context Router

The context router automatically distributes calls across subinterpreters:

```erlang
%% Start contexts (usually done by application startup)
{ok, _} = py:start_contexts().

%% Calls are automatically routed to subinterpreters
{ok, 4.0} = py:call(math, sqrt, [16]).
{ok, 6} = py:eval(<<"2 + 4">>).
ok = py:exec(<<"x = 42">>).
```
### Explicit Context Selection

For fine-grained control, use explicit context selection:

```erlang
%% Get a specific context by index (1-based)
Ctx = py:context(1),

%% All operations on this context share state
ok = py:exec(Ctx, <<"my_var = 'hello'">>),
{ok, <<"hello">>} = py:eval(Ctx, <<"my_var">>),
{ok, 4.0} = py:call(Ctx, math, sqrt, [16]).

%% A different context has isolated state
Ctx2 = py:context(2),
{error, _} = py:eval(Ctx2, <<"my_var">>). %% Not defined in Ctx2
```
### Context Router API

```erlang
%% Start router with default number of contexts (scheduler count)
{ok, Contexts} = py_context_router:start().

%% Start with custom number of contexts
{ok, Contexts} = py_context_router:start(#{contexts => 8}).

%% Get context for current scheduler (automatic affinity)
Ctx = py_context_router:get_context().

%% Get specific context by index
Ctx = py_context_router:get_context(1).

%% Bind current process to a specific context
ok = py_context_router:bind_context(Ctx).

%% Unbind (return to scheduler-based routing)
ok = py_context_router:unbind_context().

%% Get number of active contexts
N = py_context_router:num_contexts().

%% Stop all contexts
ok = py_context_router:stop().
```
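The bind/unbind semantics can be sketched with a thread-local override, threads standing in for Erlang processes (names and structure are illustrative, not the library's implementation): routing normally follows scheduler affinity, but a caller can pin itself to one context, and unbinding restores the default:

```python
# Illustrative model of bind_context/unbind_context: an explicit
# per-thread binding takes precedence over scheduler-affinity routing.
import threading

class Router:
    def __init__(self, num_contexts: int) -> None:
        self.num_contexts = num_contexts
        self._local = threading.local()   # per-"process" binding

    def bind_context(self, index: int) -> None:
        self._local.bound = index

    def unbind_context(self) -> None:
        self._local.bound = None

    def get_context(self, scheduler_id: int) -> int:
        bound = getattr(self._local, "bound", None)
        if bound is not None:
            return bound                  # explicit binding wins
        return (scheduler_id - 1) % self.num_contexts + 1

router = Router(num_contexts=4)
assert router.get_context(scheduler_id=6) == 2   # affinity routing
router.bind_context(3)
assert router.get_context(scheduler_id=6) == 3   # pinned to context 3
router.unbind_context()
assert router.get_context(scheduler_id=6) == 2   # back to affinity
```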
### Parallel Execution

Execute multiple calls in parallel across subinterpreters:

```erlang
%% Execute multiple calls in parallel
{ok, Results} = py:parallel([
    {math, sqrt, [16]},
```

0 commit comments
