Commit 3483f41

Add dual pool support with registration-based routing (#21)
- Two pools: default (CPU-bound) and io (I/O-bound, 10 contexts)
- py:register_pool(io, Module) routes module calls to io pool
- py:register_pool(io, {Module, Func}) routes specific callable
- Automatic routing via persistent_term registry
- New docs/pools.md and test/py_pool_SUITE.erl
1 parent cddb694 commit 3483f41

9 files changed: +1054 -117 lines changed

CHANGELOG.md

Lines changed: 28 additions & 0 deletions
@@ -4,6 +4,34 @@
 
 ### Added
 
+- **Dual Pool Support** - Separate pools for CPU-bound and I/O-bound operations
+  - `default` pool - For quick CPU-bound operations, sized to number of schedulers
+  - `io` pool - For I/O-bound operations, larger pool (default: 10) for concurrency
+  - `py:call(io, Module, Func, Args)` - Execute on the io pool
+  - `py:call(io, Module, Func, Args, Kwargs)` - Execute with kwargs on io pool
+  - Registration-based routing (no call site changes needed):
+    - `py:register_pool(io, requests)` - Route all `requests.*` calls to io pool
+    - `py:register_pool(io, {aiohttp, get})` - Route specific function to io pool
+    - `py:unregister_pool(Module)` - Remove module registration
+    - `py:unregister_pool({Module, Func})` - Remove function registration
+  - Automatic routing: `py:call(requests, get, [Url])` goes to io pool when registered
+  - `py_context_router:start_pool/2,3` - Start named pools programmatically
+  - `py_context_router:stop_pool/1` - Stop a named pool
+  - `py_context_router:pool_started/1` - Check if a pool is running
+  - `py_context_router:get_context(Pool)` - Get context from a named pool
+  - `py_context_router:num_contexts(Pool)` - Get pool size
+  - `py_context_router:contexts(Pool)` - Get all contexts in a pool
+  - `py_context_router:lookup_pool(Module, Func)` - Query pool routing
+  - Configuration via application env:
+    ```erlang
+    {erlang_python, [
+        {io_pool_size, 10},       % Size of io pool (default: 10)
+        {io_pool_mode, worker}    % Mode for io pool (default: auto)
+    ]}.
+    ```
+  - Backward compatible: existing code using `py:call/3,4,5` works unchanged
+  - New test suite: `test/py_pool_SUITE.erl`
+
 - **Channel API** - Bidirectional message passing between Erlang and Python
   - `py_channel:new/0,1` - Create channels with optional backpressure (`max_size`)
   - `py_channel:send/2` - Send Erlang terms to Python (returns `busy` on backpressure)
docs/getting-started.md

Lines changed: 21 additions & 0 deletions
@@ -395,8 +395,29 @@ end.
 
 See [Security](security.md) for details on blocked operations and recommended alternatives.
 
+## Dual Pool Support
+
+erlang_python provides two pools to separate CPU-bound and I/O-bound operations:
+
+```erlang
+%% Register entire modules to io pool
+py:register_pool(io, requests).
+py:register_pool(io, psycopg2).
+
+%% Or register specific callables
+py:register_pool(io, {db, query}).  %% Only db.query goes to io pool
+
+%% Calls automatically route to the right pool
+{ok, 4.0} = py:call(math, sqrt, [16]).        %% -> default pool (fast)
+{ok, Resp} = py:call(requests, get, [Url]).   %% -> io pool (module registered)
+{ok, Rows} = py:call(db, query, [Sql]).       %% -> io pool (callable registered)
+```
+
+This prevents slow HTTP requests from blocking quick math operations. See [Dual Pool Support](pools.md) for configuration and advanced usage.
+
 ## Next Steps
 
+- See [Dual Pool Support](pools.md) for separating CPU and I/O operations
 - See [Type Conversion](type-conversion.md) for detailed type mapping
 - See [Context Affinity](context-affinity.md) for preserving Python state
 - See [Streaming](streaming.md) for working with generators

docs/pools.md

Lines changed: 288 additions & 0 deletions
@@ -0,0 +1,288 @@
# Dual Pool Support

This guide covers erlang_python's dual pool architecture for separating CPU-bound and I/O-bound Python operations.

## Overview

erlang_python provides two separate pools of Python contexts:

| Pool | Purpose | Default Size | Use Case |
|------|---------|--------------|----------|
| `default` | Quick CPU-bound operations | Number of schedulers | Math, string processing, data transformation |
| `io` | Slow I/O-bound operations | 10 | HTTP requests, database queries, file I/O |

This separation prevents slow I/O operations from blocking quick CPU operations.

## Architecture

```
┌──────────────────────────────────────────────────────────────────┐
│                          py:call/3,4,5                           │
│                                │                                 │
│                                ▼                                 │
│                       ┌─────────────────┐                        │
│                       │  lookup_pool()  │                        │
│                       │ (registration)  │                        │
│                       └────────┬────────┘                        │
│                                │                                 │
│                 ┌──────────────┴──────────────┐                  │
│                 ▼                             ▼                  │
│         ┌────────────────┐            ┌────────────────┐         │
│         │  default pool  │            │    io pool     │         │
│         │  (N contexts)  │            │ (10 contexts)  │         │
│         └────────────────┘            └────────────────┘         │
│                 │                             │                  │
│        ┌────────┴────────┐           ┌────────┴────────┐         │
│        ▼        ▼        ▼           ▼        ▼        ▼         │
│       Ctx1     Ctx2     CtxN        Ctx1     Ctx2    Ctx10       │
│      (math)   (json)   (...)       (http)    (db)    (...)       │
└──────────────────────────────────────────────────────────────────┘
```

## Basic Usage

### Explicit Pool Selection

Specify the pool directly in the call:

```erlang
%% Use default pool (quick operations)
{ok, 4.0} = py:call(default, math, sqrt, [16]).

%% Use io pool (slow operations)
{ok, Response} = py:call(io, requests, get, [Url]).

%% With keyword arguments
{ok, Data} = py:call(io, requests, get, [Url], #{timeout => 30}).
```

### Registration-Based Routing

Register a module, or a single function in a module, so that calls to it are routed to the chosen pool automatically:

```erlang
%% Register entire module to io pool (all functions in module)
ok = py:register_pool(io, requests).
ok = py:register_pool(io, aiohttp).
ok = py:register_pool(io, psycopg2).

%% Register specific module.function to io pool
ok = py:register_pool(io, {urllib, urlopen}).  %% Only urllib.urlopen
ok = py:register_pool(io, {httpx, 'get'}).     %% Only httpx.get
ok = py:register_pool(io, {db, query}).        %% Only db.query

%% Now calls route automatically - no code changes needed
{ok, 4.0} = py:call(math, sqrt, [16]).         %% -> default pool
{ok, Resp} = py:call(requests, get, [Url]).    %% -> io pool (module registered)
{ok, Rows} = py:call(db, query, [Sql]).        %% -> io pool (function registered)
{ok, Data} = py:call(db, connect, [Dsn]).      %% -> default pool (only db.query registered)
```

### Unregistering

```erlang
%% Remove module registration
ok = py:unregister_pool(requests).

%% Remove function registration
ok = py:unregister_pool({urllib, urlopen}).
```

## Registration Priority

Function-specific registrations take priority over module-wide registrations:

```erlang
%% Register all json functions to io pool
ok = py:register_pool(io, json).

%% But keep json.dumps on default pool (it's fast)
ok = py:register_pool(default, {json, dumps}).

%% Results:
io = py_context_router:lookup_pool(json, loads).       %% Module registration
default = py_context_router:lookup_pool(json, dumps).  %% Function override
```

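The lookup order described above (function entry first, then module-wide entry, then `default`) can be sketched on top of `persistent_term`, which the commit message names as the registry backend. This is only an illustration of the idea, not the library's implementation: the module name `pool_registry_sketch` and the key shape `{?MODULE, Module, Func | '_'}` are assumptions made for this example.

```erlang
-module(pool_registry_sketch).
-export([register/2, unregister/1, lookup/2]).

%% Assumed key shape (hypothetical): {?MODULE, Module, Func | '_'},
%% where '_' means "every callable in Module".

-spec register(atom(), atom() | {atom(), atom()}) -> ok.
register(Pool, {Module, Func}) ->
    persistent_term:put({?MODULE, Module, Func}, Pool);
register(Pool, Module) when is_atom(Module) ->
    persistent_term:put({?MODULE, Module, '_'}, Pool).

-spec unregister(atom() | {atom(), atom()}) -> ok.
unregister({Module, Func}) ->
    %% erase/1 returns a boolean; it is ignored so unregistering is idempotent
    _ = persistent_term:erase({?MODULE, Module, Func}),
    ok;
unregister(Module) when is_atom(Module) ->
    _ = persistent_term:erase({?MODULE, Module, '_'}),
    ok.

-spec lookup(atom(), atom()) -> atom().
lookup(Module, Func) ->
    %% 1. function-specific entry, 2. module-wide entry, 3. fall back to default
    case persistent_term:get({?MODULE, Module, Func}, undefined) of
        undefined -> persistent_term:get({?MODULE, Module, '_'}, default);
        Pool -> Pool
    end.
```

Reads from `persistent_term` are cheap, while writes are comparatively expensive, which suits a registry that is filled at startup and consulted on every call.
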
## API Reference

### High-Level API (py module)

```erlang
%% Register entire module to pool (all callables in the module)
-spec register_pool(Pool, Module) -> ok when
    Pool :: default | io | atom(),
    Module :: atom().

%% Register specific callable (module.function) to pool
-spec register_pool(Pool, {Module, Callable}) -> ok when
    Pool :: default | io | atom(),
    Module :: atom(),
    Callable :: atom().

%% Unregister module or specific callable
-spec unregister_pool(Module | {Module, Callable}) -> ok.

%% Call on specific pool
-spec call(Pool, Module, Func, Args) -> {ok, Result} | {error, Reason}.
-spec call(Pool, Module, Func, Args, Kwargs) -> {ok, Result} | {error, Reason}.
```

### Low-Level API (py_context_router module)

```erlang
%% Pool management
-spec start_pool(Pool, Size) -> {ok, [pid()]} | {error, term()}.
-spec start_pool(Pool, Size, Mode) -> {ok, [pid()]} | {error, term()}.
-spec stop_pool(Pool) -> ok.
-spec pool_started(Pool) -> boolean().

%% Context access
-spec get_context(Pool) -> pid().
-spec num_contexts(Pool) -> non_neg_integer().
-spec contexts(Pool) -> [pid()].

%% Registration
-spec register_pool(Pool, Module) -> ok.
-spec register_pool(Pool, Module, Func) -> ok.
-spec unregister_pool(Module) -> ok.
-spec unregister_pool(Module, Func) -> ok.
-spec lookup_pool(Module, Func) -> Pool.
-spec list_pool_registrations() -> [{{Module, Func | '_'}, Pool}].
```

## Configuration

Configure pool sizes via application environment:

```erlang
%% sys.config
[
    {erlang_python, [
        %% Default pool size (default: erlang:system_info(schedulers))
        {default_pool_size, 8},

        %% IO pool size (default: 10)
        {io_pool_size, 20},

        %% IO pool mode: auto | subinterp | worker (default: auto)
        {io_pool_mode, worker}
    ]}
].
```

174+
### Runtime Configuration
175+
176+
```erlang
177+
%% Start additional custom pool
178+
{ok, _} = py_context_router:start_pool(gpu, 2, worker).
179+
180+
%% Register GPU operations
181+
ok = py:register_pool(gpu, torch).
182+
ok = py:register_pool(gpu, tensorflow).
183+
```
184+
## Use Cases

### Web Application with Database

```erlang
%% At application startup
init_pools() ->
    %% Register I/O-heavy modules
    py:register_pool(io, requests),
    py:register_pool(io, httpx),
    py:register_pool(io, psycopg2),
    py:register_pool(io, redis),
    ok.

%% In request handler - no pool awareness needed
handle_request(UserId) ->
    %% Fast: uses default pool
    {ok, _Hash} = py:call(hashlib, sha256, [UserId]),

    %% Slow: automatically uses io pool
    %% (illustrative only: psycopg2 has no module-level fetchone; real code
    %% would call a wrapper module that owns the connection and cursor)
    {ok, User} = py:call(psycopg2, fetchone, [<<"SELECT * FROM users WHERE id = %s">>, [UserId]]),

    %% Fast: uses default pool
    {ok, Json} = py:call(json, dumps, [User]),
    {ok, Json}.
```

### ML Pipeline with I/O

```erlang
%% Register I/O operations
py:register_pool(io, boto3),     %% S3 access
py:register_pool(io, requests).  %% API calls

%% ML operations stay on default pool (CPU-intensive)
%% I/O operations go to io pool

process_batch(Items) ->
    %% Parallel fetch from S3 (io pool)
    Futures = [py:cast(boto3, download_file, [Key]) || Key <- Items],
    Files = [py:await(F) || F <- Futures],

    %% Process with ML model (default pool - doesn't block I/O)
    [{ok, _} = py:call(model, predict, [File]) || File <- Files].
```

## Performance Considerations

### Pool Sizing

| Workload | default Pool | io Pool |
|----------|--------------|---------|
| CPU-heavy | Schedulers | Small (5-10) |
| I/O-heavy | Small (2-4) | Large (20-50) |
| Mixed | Schedulers | 10-20 |

### When to Use Registration

**Use registration when:**
- You have clear I/O-bound modules (HTTP clients, database drivers)
- You want transparent routing without changing call sites
- Multiple call sites use the same module

**Use explicit pool selection when:**
- A function's pool depends on arguments
- You need fine-grained control per-call
- Testing or debugging specific pools

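For the case where the pool depends on an argument, one possible shape is a small pure helper that picks the pool, whose result is then passed to the documented `py:call/4`. Everything named here is hypothetical: the module `pool_choice_sketch`, the `{remote, _}` / `{inline, _}` source descriptors, and the Python names `loader` and `parse` exist only for this sketch.

```erlang
-module(pool_choice_sketch).
-export([pool_for/1, load/1]).

%% Hypothetical source descriptors: {remote, Url} | {inline, Binary}.
%% Remote sources wait on the network, so they go to the io pool;
%% inline data is parsed in memory and stays on the default pool.
-spec pool_for({remote | inline, term()}) -> io | default.
pool_for({remote, _Url}) -> io;
pool_for({inline, _Data}) -> default.

%% `loader` and `parse` are placeholder Python names for this example.
-spec load({remote | inline, term()}) -> {ok, term()} | {error, term()}.
load({_Kind, Payload} = Source) ->
    py:call(pool_for(Source), loader, parse, [Payload]).
```

Keeping the decision in `pool_for/1` makes the routing rule testable without a running Python context, and leaves the registry untouched for callers that rely on registration.
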
## Monitoring

```erlang
%% Check pool status
true = py_context_router:pool_started(default),
true = py_context_router:pool_started(io).

%% Check pool sizes
DefaultSize = py_context_router:num_contexts(default),
IoSize = py_context_router:num_contexts(io).

%% List all registrations
Registrations = py_context_router:list_pool_registrations().
%% => [{{requests, '_'}, io}, {{json, dumps}, default}, ...]

%% Check which pool a call would use
io = py_context_router:lookup_pool(requests, get).
default = py_context_router:lookup_pool(math, sqrt).
```

## Backward Compatibility

Existing code using `py:call/3,4,5` without pool registration continues to work unchanged, using the default pool:

```erlang
%% These all use the default pool (backward compatible)
{ok, 4.0} = py:call(math, sqrt, [16]).
{ok, Data} = py:call(json, dumps, [#{a => 1}]).
{ok, 6} = py:eval(<<"2 + 4">>).
```

## See Also

- [Scalability](scalability.md) - Execution modes and parallel execution
- [Getting Started](getting-started.md) - Basic usage
- [Asyncio](asyncio.md) - Async I/O with event loops

src/erlang_python_app.erl

Lines changed: 3 additions & 2 deletions
@@ -23,7 +23,8 @@ start(_StartType, _StartArgs) ->
     erlang_python_sup:start_link().
 
 stop(_State) ->
-    %% Stop contexts before application shutdown to ensure proper cleanup
+    %% Stop pools before application shutdown to ensure proper cleanup
     %% of subinterpreters before NIF resources are garbage collected
-    catch py:stop_contexts(),
+    catch py_context_router:stop_pool(io),
+    catch py_context_router:stop_pool(default),
     ok.
