Releases: TavariAgent/Py-TokenGate
[FIXES] - Race Conditions & Positional Alignment
This release fixes race conditions that appeared in testing. Hashes no longer miss with low probability.
[UPDATE] - The Optimization Pass
TokenGate — Optimization Pass Release Notes
Summary
This pass targets the token submission hot path — the sequence of operations
between a caller invoking a decorated function and the token landing in a
worker mailbox. No routing contracts, no execution semantics, and no public
API signatures were changed. This pass also adds a runtime guard for tokens inserted
into the coordinator. All improvements are opt-in or transparent.
Benchmark Comparison
| Metric | Before | After | Delta |
|---|---|---|---|
| Total wall time (131k tokens) | 83.506s | 42.337s | −49% |
| Active execution time | ~83s | 4.673s | −94% |
| Avg latency / token | 0.351ms | 0.086ms | −75% |
| Peak concurrency ratio | 3.15× | 49.22× | +15.6× |
| Peak overlap ratio | 46.09× | 47.39× | maintained |
| Peak single-wave throughput | ~3,368 tok/s | ~64,754 tok/s | ~19× |
| Sustained throughput | ~1,569 tok/s | ~28,046 tok/s | ~18× |
The concurrency jump from 3× to 49× indicates the submission path was the
real ceiling. Workers were idle-waiting on routing overhead. Once that
overhead was removed the thread pool expressed its actual capacity.
Throughput Reporting — How to Read the Numbers
The benchmark reports four throughput figures. Each measures something
distinct and they should not be compared directly without understanding
what each one counts.
Important: All mean calculations are derived from accumulated token
totals and elapsed time totals across waves — they are not arithmetic
averages of independent per-wave unit rates. Reading them as per-wave
averages will make them appear inconsistent with wall time. They are not.
| Metric | Formula | What it measures |
|---|---|---|
| Sustained | total_tokens / Σ wave_elapsed | Volume-weighted reality. Dominated by the largest waves which carry the most tokens. The honest number for full-workload throughput. |
| Peak | max(tokens_i / elapsed_i) | Best single-wave rate. Reflects the sweet-spot batch size where parallelism is fullest and scheduling overhead is smallest. |
| Token-weighted mean | Σ(rate_i × tokens_i) / total_tokens | Each token votes equally on the average rate. Sits between sustained and peak. Useful for understanding where the system spends most of its token-budget. |
| Arithmetic mean | Σ rate_i / N | Each wave votes equally. Small fast waves inflate this significantly. Included for completeness but the least representative of real workload behaviour. |
Why sustained and wall time appear inconsistent:
Wall time includes 15 × 0.05s = 0.75s of deliberate inter-wave sleep plus
event loop scheduling gaps and print overhead. Active time is the raw sum of
asyncio.gather spans only. The tok/s figure in each wave row and in the
summary is always based on active time. Wall time is reported separately so
the two are never conflated.
Example from the final benchmark run:
Waves 13–15: 114,688 tokens / 4.266s = ~26,900 tok/s ← 87% of all volume
Waves 1–12: 16,380 tokens / 0.407s = ~40,200 tok/s ← 13% of all volume
Combined: 131,068 tokens / 4.673s = ~28,046 tok/s ← sustained (correct)
The sustained rate is pulled toward the large-wave rate because large waves
dominate the token count. This is the correct and expected result.
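The four formulas in the table above can be checked with a short script. This is a minimal sketch, not the benchmark's own reporting code: the function name `throughput_summary` is hypothetical, and the two waves are made-up illustrative values rather than benchmark data.

```python
def throughput_summary(waves):
    """Compute the four throughput figures from (tokens, elapsed_seconds) pairs."""
    total_tokens = sum(tokens for tokens, _ in waves)
    total_elapsed = sum(elapsed for _, elapsed in waves)
    rates = [tokens / elapsed for tokens, elapsed in waves]
    return {
        # Accumulated totals, not an average of per-wave rates.
        "sustained": total_tokens / total_elapsed,
        # Best single-wave rate.
        "peak": max(rates),
        # Each token votes equally.
        "token_weighted_mean": sum(
            rate * tokens for rate, (tokens, _) in zip(rates, waves)
        ) / total_tokens,
        # Each wave votes equally.
        "arithmetic_mean": sum(rates) / len(rates),
    }


# Illustrative data only: one small fast wave and one large slower wave.
waves = [(100, 0.002), (10_000, 0.5)]
summary = throughput_summary(waves)
for name, value in summary.items():
    print(f"{name:>20}: {value:,.0f} tok/s")
```

With one small fast wave and one large slower wave, the sustained figure stays close to the large wave's rate while the arithmetic mean is pulled up by the small wave, which is the effect described above.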
Changes
1. unhashable_checker.py — O(1) Type Dispatch
Problem: make_hashable walked a 23-layer isinstance chain on every
un-hashable value, including common exact types like dict, list, and
np.ndarray.
Change: Added _DISPATCH: dict populated once at module load by
_build_dispatch(). At call time, a single _DISPATCH.get(type(obj))
lookup short-circuits to the correct handler for registered exact types.
Subclass misses fall through to the existing isinstance chain — no
coverage regression.
Registered at load time: dict, list, set, bytearray, memoryview,
slice, array.array, deque, OrderedDict, defaultdict, Counter,
ChainMap, and optionally np.ndarray, pd.DataFrame, pd.Series,
pd.MultiIndex, pd.Index, pd.Categorical, torch.Tensor, cp.ndarray,
PIL.Image.
Fast path fix: The hash() fast path now catches (TypeError, RuntimeError) instead of TypeError only. Non-scalar torch.Tensor raises
RuntimeError from hash() and previously slipped through to the tensor
handler at layer 6. This closes that gap consistently with is_hashable.
2. unhashable_checker.py — HashPolicy Enum
Problem: All tokens paid the full make_hashable pipeline cost at
submission time regardless of whether their operation used sticky routing
or conductor domain anchoring.
Change: Added HashPolicy enum with four levels:
| Value | Behaviour |
|---|---|
| NONE | No arg hashing. route_args is always (). Free routing only. |
| FAST | Builtins and stdlib only (layers 1–3). Unknown types get identity routing (type_name, id). |
| STANDARD | Full make_hashable pipeline. Default, unchanged behavior. |
| FULL | Same as STANDARD. Reserved for explicit subclass-fallthrough intent. |
Set per-operation via decorator tag:
@task_token_guard(
    operation_type="conductor_lead",
    tags={"weight": "medium",
          "hash_policy": HashPolicy.FAST,
          "digest_policy": DigestPolicy.FAST,
          "external_calls": ["conductor_child"]},
)
def my_function(...): ...
Default remains STANDARD. No existing code changes behavior without opt-in.
Also added fast_make_hashable() — the reduced pipeline used by
HashPolicy.FAST. Covers layers 1–3, falls back to (type_name, id) for
unrecognised types.
3. unhashable_checker.py — DigestPolicy Enum
Problem: Conductor seed generation was hardwired to SHA-256 full 64-char
hex on every lead token submission regardless of volume or lifetime
requirements.
Change: Added DigestPolicy enum with four levels:
| Value | Algorithm | Output | Collision space |
|---|---|---|---|
| FULL | SHA-256 | 64 chars | 256-bit. Default, unchanged. |
| SHORT | SHA-256 truncated | 16 chars | 64-bit. Safe at any realistic volume. |
| FAST | BLAKE2s (8-byte) | 16 chars | 64-bit. Lower compute cost than SHA-256. |
| MINIMAL | SHA-256 truncated | 8 chars | 32-bit. Low volume only. |
Collision semantics for MINIMAL: A collision merges two logical domain
chains into a shared mailbox cluster. No data corruption occurs; the
least-loaded mechanism compensates. At low-to-mid token volume with
short-lived leads the effect is benign: collisions reduce domain variance
rather than causing failures. Under saturation or sustained call chains,
however, heavy tasks may be forced off their primary core, which shifts load
distribution and reduces performance at the affinity boundary.
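To put "low volume only" in rough numbers, the standard birthday-bound approximation estimates the chance that at least two live seeds collide. This is a sketch that assumes seeds are uniformly distributed over the collision space; the function name `collision_probability` is hypothetical and not part of TokenGate.

```python
import math


def collision_probability(n_seeds: int, bits: int) -> float:
    """Birthday-bound approximation of P(at least one collision)."""
    space = 2.0 ** bits
    # p ≈ 1 - exp(-n(n-1) / (2 * space)); expm1 keeps precision for tiny values.
    return -math.expm1(-n_seeds * (n_seeds - 1) / (2.0 * space))


for bits, label in ((32, "MINIMAL"), (64, "SHORT/FAST")):
    for n in (1_000, 10_000, 100_000):
        print(f"{label:>10} ({bits}-bit), {n:>7,} seeds: {collision_probability(n, bits):.3e}")
```

Under that assumption, a 32-bit space reaches roughly a 1% collision chance at about 10,000 simultaneous seeds and well over half at 100,000, while a 64-bit space stays negligible at those volumes.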
Set per-operation via decorator tag:
@task_token_guard(
    operation_type="my_op",
    tags={"digest_policy": DigestPolicy.FAST}
)
def my_function(...): ...
4. hash_conductor.py — Policy-Aware generate_seed and charge
Change: generate_seed now accepts a DigestPolicy parameter and
branches to the appropriate algorithm. charge reads the digest_policy
tag from the token's metadata, resolves string values to the enum with a
fallback-to-FULL warning, and passes the resolved policy to generate_seed.
The tg_print dispatch line in charge now includes digest={policy.value}
for observability during mixed-policy runs.
snapshot() seed[:12] truncation is safe for all four policies — Python's
slice never raises on out-of-bounds, so an 8-char MINIMAL seed returns
itself.
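A hedged sketch of how a policy-aware `generate_seed` could branch, using only `hashlib` from the standard library. The enum string values, the function signature, and the payload layout are assumptions for illustration; the algorithms and output lengths follow the DigestPolicy table above.

```python
import hashlib
from enum import Enum


class DigestPolicy(Enum):
    # String values are assumed; the notes only say the enum has a .value.
    FULL = "full"
    SHORT = "short"
    FAST = "fast"
    MINIMAL = "minimal"


def generate_seed(token_id, external_calls, policy=DigestPolicy.FULL):
    # Assumed payload layout: token ID plus a frozen form of the call list.
    payload = f"{token_id}:{tuple(external_calls)!r}".encode("utf-8")
    if policy is DigestPolicy.FAST:
        # BLAKE2s with an 8-byte digest yields 16 hex characters.
        return hashlib.blake2s(payload, digest_size=8).hexdigest()
    digest = hashlib.sha256(payload).hexdigest()  # 64 hex characters
    if policy is DigestPolicy.SHORT:
        return digest[:16]
    if policy is DigestPolicy.MINIMAL:
        return digest[:8]
    return digest


seeds = {p: generate_seed("token-1", ["child_op"], p) for p in DigestPolicy}
for policy, seed in seeds.items():
    # seed[:12] never raises, even for the 8-character MINIMAL seed.
    print(f"{policy.name:<8} len={len(seed):>2} snapshot={seed[:12]}")
```

The final loop mirrors the `seed[:12]` truncation used by `snapshot()`: slicing past the end of a short string simply returns the whole string.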
5. sticky_token.py — freeze() Empty-Args Guard
Problem: _make_key called freeze(args) on every mark() and
unmark(), including the majority path where args = (). freeze(()) is
always () — the recursive call was redundant on every conductor
on_complete → unmark(seed, ()) call.
Change:
return op_name, freeze(args) if args else ()
This removes the freeze() call from the empty-args path, which is now the
dominant path through the conductor release cycle.
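The guard can be illustrated in isolation. In this sketch `freeze` is a hypothetical stand-in for the real helper (only its name appears in the notes), and the parentheses around the conditional are added purely for readability; Python already parses the one-liner that way.

```python
def freeze(value):
    """Hypothetical stand-in: recursively turn containers into hashable values."""
    if isinstance(value, (list, tuple)):
        return tuple(freeze(v) for v in value)
    if isinstance(value, dict):
        items = sorted(value.items(), key=lambda kv: repr(kv[0]))
        return tuple((k, freeze(v)) for k, v in items)
    if isinstance(value, set):
        return frozenset(freeze(v) for v in value)
    return value


def _make_key(op_name, args):
    # The conditional expression binds tighter than the comma, so this is
    # (op_name, <frozen args>) and freeze() is skipped when args is empty.
    return op_name, (freeze(args) if args else ())


empty_key = _make_key("conductor_seed", ())
full_key = _make_key("my_op", ([1, 2], {"k": [3]}))
print(empty_key, full_key)
```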
6. core_pinned_staggered_queue.py — Loop and Boolean Simplification
| Location | Change |
|----------------------------------------|------------------------------------------------------------------------------------------------------...
[MINOR-UPDATE] - Type Association Resolution & Cleanup
This update fixes some ambiguous types I had defined and adds some fields I had forgotten.
[MINOR-UPDATE] - Resolved Un-hashables & Fall Through Prevention
This is a must-have update for the clarity and security of hashable and sticky operations. Anyone using these releases should update.
[MINOR-UPDATE] - Cleanup & The Crew (Dunders)
This is just a nice-to-have release. It cleans up some misleading pathing and adds more type-friendly resolution via dunder ("double underscore") methods.
[UPDATE] - The Production Safety Update
TokenGate — Release Notes
This update marks the first major set of production-ready features: improved
data locality, safety from cache storms, and proper cleanup protocols.
Take note of the "Layer by Layer" section, which covers every stage in the token's layered contracts.
Extra information was added to BETA.md! Take a look there for bonus setup information.
Hash Conductor & Sticky Token Registry
This release introduces two systems that work together to anchor token execution
to stable core domains for the full lifetime of a call chain. The result is
deterministic routing, zero cross-domain data races, and measurably better
behaviour under saturated load conditions.
What Changed
StickyTokenRegistry
Tokens carrying the same (operation_type, args) key are now pinned to the
core that first receives them. Any later token arriving with the same key is
redirected to that core automatically. The pin releases when the token
completes, freeing the next submission to route normally.
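The pin-and-redirect behaviour can be sketched as a small registry. This is an illustration under assumptions, not the real StickyTokenRegistry: the class name, the method signatures, and the holder count (so a pin survives until every token sharing the key has completed) are all hypothetical.

```python
import threading


class StickyRegistrySketch:
    def __init__(self):
        self._lock = threading.Lock()
        self._pins = {}  # key -> [pinned_core, active_holders]

    def mark(self, key, candidate_core):
        """Return the core this key is pinned to, pinning the candidate if new."""
        with self._lock:
            entry = self._pins.get(key)
            if entry is None:
                self._pins[key] = [candidate_core, 1]
                return candidate_core
            entry[1] += 1
            return entry[0]  # redirect to the core that received the key first

    def unmark(self, key):
        """Release one holder; drop the pin when no holders remain."""
        with self._lock:
            entry = self._pins.get(key)
            if entry is None:
                return
            entry[1] -= 1
            if entry[1] <= 0:
                del self._pins[key]


registry = StickyRegistrySketch()
key = ("my_op", (42,))
first = registry.mark(key, candidate_core=2)   # pins core 2
second = registry.mark(key, candidate_core=5)  # redirected to core 2
registry.unmark(key)
registry.unmark(key)
third = registry.mark(key, candidate_core=5)   # pin released, routes normally
print(first, second, third)
```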
A sticky_anchor tag can be added to any decorator to give the sticky key an
explicit name, independent of operation type.
@task_token_guard(
    operation_type="my_op",
    tags={"weight": "medium", "sticky_anchor": "op_token"},
)
def my_operation(n: int) -> int:
    ...
HashConductor
Lead tokens — those decorated with external_calls — generate a SHA-256 seed
from their token ID and call list. That seed is pinned to a core domain. Any
token spawned during the lead's execution inherits the seed and is routed to the
same core automatically. The domain releases when the lead and all of its
children have completed.
@task_token_guard(
    operation_type="lead_op",
    tags={"weight": "medium", "external_calls": ["child_op"]},
)
def lead_operation(n: int) -> list:
    return [child_op(n + i) for i in range(4)]
No changes are required at call sites. Domain anchoring is fully automatic once
external_calls is declared.
State Machine Cleanup
conductor.on_complete() is now wired into transition_state() directly. Every
terminal state — COMPLETED, FAILED, KILLED, TIMEOUT — decrements the
pending count. Killed tokens are reported as completed for observability clarity.
Domains cannot leak memory regardless of how a token ends.
How Routing Works — Layer by Layer
Understanding the full path a token takes from call to completion.
Layer 1 — Core Pinning Workers in Their Domains:
Workers are fixed to a single core domain at startup and never move. HEAVY
tokens belong to Core 1. MEDIUM tokens belong to Core 2 and above. LIGHT
tokens belong to Core 3 and above. Workers sit in their domain and reach for
the nearest valid token. The queue is what moves — it forms itself into the
correct shape around the workers, routing tokens into position so each worker
always reaches the right one.
Layer 2 — Token Creation & Metadata:
task_token_guard intercepts the decorated call before execution. A TaskToken
is created carrying the function, arguments, operation type, and routing tags.
Complexity scoring runs once and is cached on the wrapper. If an active conductor
seed is present in the current executor thread, it is stamped onto the token's
metadata here, before any event loop crossing occurs.
Layer 3 — Weight Classification & Position Assignment:
The token is classified as HEAVY, MEDIUM, or LIGHT from its tags or
operation type name. A staggered global position is calculated from the per-core
position counter, respecting the active worker pattern for that core. The
candidate core and local worker index are derived from this position.
Layer 4 — Sticky Token Enforcement:
For standard tokens, sticky_registry.mark() is called with the resolved
sticky_anchor or operation type as the key. If a marker already exists for
this key, the token is redirected to the pinned core. If not, the candidate core
is pinned and returned. This prevents concurrent tokens with matching keys from
splitting across core domains.
Layer 5 — Seed Generation:
For lead tokens carrying external_calls, a SHA-256 digest is computed from the
token ID concatenated with a frozen representation of the call list. The full
64-character hex digest is used. Collision probability at any realistic token
volume is negligible. The seed is stored on the token's metadata tags.
Layer 6 — Core Resolution:
_put_routing_block makes the final routing decision. Lead tokens with
external_calls go to conductor.charge(). Tokens carrying a
conductor_seed tag go to conductor.register_child(). All other tokens
follow the normal sticky registry path. The returned core_id is written to
the token's sticky_core tag and used for all subsequent mailbox placements.
Layer 7 — Charge Lead:
conductor.charge() generates the seed, pins it to the candidate core via the
sticky registry, and opens the pending count at 1 for the lead itself. The seed
and core mapping are stored in the conductor's internal registry for the lifetime
of the chain.
Layer 8 — Pre-Register Children:
When the lead function runs in the executor thread and spawns child tokens,
task_token_guard stamps the active seed onto each child at creation time and
calls conductor.pre_register() to increment the pending count immediately.
This increment happens before the child crosses to the event loop, ensuring the
domain stays alive even if the lead completes before the children reach put().
Layer 9 — Register Children:
When the child arrives at put(), register_child() reads the seed from the
token tag and returns the pinned core_id. Position assignment still runs —
but now it runs with a core_id that is already established. assign_position_for_token
uses that known core_id to derive local_i, selecting the correct local worker
within the domain. The staggered counter is not bypassed; it is given the right
context to work from. This keeps domain membership valid as worker counts change
live.
Layer 10 — Domain Grouping:
With the target core_id confirmed, the least-loaded active local worker within
that core is selected. The token is placed into that worker's mailbox. All tokens
under the same seed land in the same core's mailbox domain, keeping related work
physically co-located.
Layer 11 — Execution:
The worker loop dequeues the token and calls _execute_token_wrapped. Before the
function runs, conductor.activate() sets the conductor seed into a thread-local
on the executor thread. If the function spawns further child tokens, they
automatically inherit the seed through the same creation-time stamping mechanism
in task_token_guard.
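The thread-local hand-off in Layer 11 can be sketched with `threading.local`. The helper names below (`activate`, `deactivate`, `current_seed`, `create_child_token`, `run_lead`) and the dictionary used as a token are assumptions for illustration; only the idea of setting the seed on the executor thread and stamping it at creation time comes from the description above.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_active = threading.local()  # one independent slot per executor thread


def activate(seed):
    _active.seed = seed


def deactivate():
    _active.seed = None


def current_seed():
    return getattr(_active, "seed", None)


def create_child_token(name):
    # Stamped at creation time, while still on the lead's executor thread.
    return {"name": name, "conductor_seed": current_seed()}


def run_lead(seed):
    activate(seed)
    try:
        return [create_child_token(f"child-{i}") for i in range(3)]
    finally:
        deactivate()  # never leak a seed into the next task on this thread


with ThreadPoolExecutor(max_workers=2) as pool:
    future_a = pool.submit(run_lead, "seed-a")
    future_b = pool.submit(run_lead, "seed-b")
    children_a = future_a.result()
    children_b = future_b.result()

print(children_a[0], children_b[0], current_seed())
```

Because the slot is per thread, two leads running on different workers never see each other's seed, and the submitting thread sees none at all.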
Layer 12 — Finalization: Cleanup & Metadata:
When the token transitions to any terminal state (COMPLETED, FAILED,
KILLED, TIMEOUT), transition_state() calls conductor.on_complete().
The pending count decrements. When it reaches zero — meaning the lead and every
child have finished — the seed is removed from the conductor registry and the
sticky pin is released. Execution timing, core assignment, and complexity score
are written to the execution record.
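The pending-count lifecycle from Layers 7, 8, and 12 can be condensed into a small sketch. The method names `charge`, `pre_register`, `on_complete`, `snapshot`, and `transition_state` come from the description above, but their signatures, the `TokenState` enum, and the internal dictionary layout are assumptions.

```python
import threading
from enum import Enum, auto


class TokenState(Enum):
    RUNNING = auto()
    COMPLETED = auto()
    FAILED = auto()
    KILLED = auto()
    TIMEOUT = auto()


TERMINAL = {TokenState.COMPLETED, TokenState.FAILED, TokenState.KILLED, TokenState.TIMEOUT}


class ConductorSketch:
    def __init__(self):
        self._lock = threading.Lock()
        self._domains = {}  # seed -> {"core": core_id, "pending": count}

    def charge(self, seed, core_id):
        with self._lock:
            self._domains[seed] = {"core": core_id, "pending": 1}  # the lead itself
        return core_id

    def pre_register(self, seed):
        with self._lock:
            self._domains[seed]["pending"] += 1  # counted before the child is queued

    def on_complete(self, seed):
        with self._lock:
            domain = self._domains.get(seed)
            if domain is None:
                return
            domain["pending"] -= 1
            if domain["pending"] <= 0:
                del self._domains[seed]  # lead and all children are done

    def snapshot(self):
        with self._lock:
            return {seed: dict(d) for seed, d in self._domains.items()}


def transition_state(conductor, seed, new_state):
    # Every terminal state releases one pending slot, however the token ended.
    if new_state in TERMINAL:
        conductor.on_complete(seed)


conductor = ConductorSketch()
conductor.charge("seed-1", core_id=2)
conductor.pre_register("seed-1")
conductor.pre_register("seed-1")
transition_state(conductor, "seed-1", TokenState.COMPLETED)  # lead finishes first
after_lead = conductor.snapshot()
transition_state(conductor, "seed-1", TokenState.FAILED)
transition_state(conductor, "seed-1", TokenState.KILLED)
after_all = conductor.snapshot()
print(after_lead, after_all)
```

The domain outlives the lead because both children were counted before the lead completed, and it is removed only once the failed and killed children have also reached a terminal state.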
Test Coverage Added
Cache Storm Test (demo/cache_storm.py)
Submits anchor tokens to pin keys to cores, then fires concurrent bursts of
tokens with identical (op_type, args) keys targeting different cores. Verifies
the sticky registry redirects all of them to the pinned core with zero misses.
Hash Conductor Test (demo/hash_conductor_test.py)
Submits lead tokens that spawn children during execution. Verifies all tokens
in each chain land on the same core, carry matching seeds, and that the conductor
snapshot is empty after resolution. Also checks seed uniqueness across
independent concurrent leads.
Benchmark
Endurance run across 15 doubling waves. 131,068 tokens total.
Wave Tokens OK Fail Time Tok/s Lat(ms) Conc Overlap
1 4 4 0 0.003s 1386.2 0.721 1.00× 1.44×
2 8 8 0 0.003s 2391.2 0.418 1.72× 2.48×
3 16 16 0 0.006s 2744.8 0.364 1.98× 4.82×
4 32 32 0 0.011s 2812.7 0.356 2.03× 11.32×
5 64 64 0 0.022s 2880.0 0.347 2.08× 22.01×
6 128 128 0 0.044s 2907.6 0.344 2.10× 29.78×
7 256 256 0 0.090s 2846.8 0.351 2.05× 37.98×
8 512 512 0 0.182s 2811.5 0.356 2.03× 41.81×
9 1024 1024 0 0.364s 2813.9 0.355 2.03× 44.18×
10 2048 2048 0 0.775s 2644.3 0.378 1.91× 44.86×
11 4096 4096 0 1.454s 2816.3 0.355 2.03× 38.34×
12 8192 8192 0 2.905s 2819.9 0.355 2.03× 32.64×
13 16384 16384 0 5.925s 2765.0 0.362 1.99× 27.92×
14 32768 32768 0 12.102s 2707.7 0.369 1.95× 24.96×
15 65536 65536 0 23.494s 2789.5 0.358 2.01× 24.21×
TOTAL 131,068 131,068 0 89.091s
Avg latency : 0.386 ms/token
Peak overlap: 44.86×
Peak overlap is lower than before because task routing is more deterministic. Together with
better task performance under saturated conditions, this indicates the update was a success.
Zero failures. Latency holds within 0.04ms from wave 3 through wave 15.
Peak overlap of 44.86× achieved at wave 10 with flat latency — the system
reached saturation and held position rather than degrad...
[FIXES] - Actually Clean Prints & Better Alignment
This set of fixes removes unused code and some conflicting deprecated flags, and aligns TokenGate's token registration access methods (protected member access is no longer needed).
[UPDATE] v0.2.2.0-beta — __await__ & The Maximum Concurrency Test
What this gives you
Before this, waiting on a token meant blocking the calling thread:
result = token.get(timeout=30.0)
Now tokens participate in the async model directly:
result = await token
# or an entire batch at once
results = await asyncio.gather(*tokens)
The decorated function stays synchronous; __await__ only changes how the caller waits for the result, not how the work executes.
Under the hood it bridges TaskToken's internal concurrent.futures.Future to whatever asyncio event loop is currently running via asyncio.wrap_future().
One line of real logic.
The practical benefit is that your orchestrator can now be fully async end to end. Submit a batch, await the gather, report, repeat. No blocking calls, no thread management, no futures handling. TokenGate handles all of that.
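The bridge described above can be sketched with the standard library alone. `TokenSketch` is a hypothetical stand-in for TaskToken, and a plain `ThreadPoolExecutor` replaces TokenGate's workers; the only part taken from the notes is delegating `__await__` to `asyncio.wrap_future()`.

```python
import asyncio
import concurrent.futures


class TokenSketch:
    """Hypothetical stand-in wrapping a concurrent.futures.Future."""

    def __init__(self, future):
        self._future = future

    def get(self, timeout=None):
        # Synchronous path: blocks the calling thread.
        return self._future.result(timeout=timeout)

    def __await__(self):
        # Async path: bind the thread-side future to the running event loop.
        return asyncio.wrap_future(self._future).__await__()


def square(n):
    return n * n  # the work itself stays synchronous


async def main():
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
        tokens = [TokenSketch(pool.submit(square, i)) for i in range(5)]
        return await asyncio.gather(*tokens)


results = asyncio.run(main())
print(results)
```

`asyncio.gather` accepts any awaitable, so objects that define `__await__` this way can be gathered alongside ordinary coroutines and tasks.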
What to be aware of
__await__ must be called from within a running event loop. If you call await token from a synchronous context you'll get a runtime error.
The existing token.get() is still there for synchronous use cases and hasn't changed. Because asyncio.wrap_future() binds to the running loop at await time, don't pass tokens between event loops.
Create them and await them on the same loop. In normal usage this never comes up — if you're writing await token inside an async def you're already doing the right thing.
The concurrency test
To validate __await__ properly this update was tested using max_concurrency_test.py.
Four synchronous operations decorated with @task_token_guard:
@task_token_guard(operation_type='cpu_crunch', tags={'weight': 'light'})
def cpu_crunch(n: int) -> int: ...
@task_token_guard(operation_type='string_ops', tags={'weight': 'light'})
def string_transform(seed: int) -> str: ...
@task_token_guard(operation_type='data_transform', tags={'weight': 'medium'})
def data_sort(size: int) -> List[int]: ...
@task_token_guard(operation_type='hash_compute', tags={'weight': 'heavy'})
def hash_chain(seed: str, iterations: int) -> str: ...
The orchestrator is async and touches no internal controls. Submit, await, report:
tokens = submit_batch(target)
results = await asyncio.gather(*tokens, return_exceptions=True)
I ran this across scaling waves from 4 tokens up to 65,536.
Four metrics are tracked per wave: tok/s, average latency per token, concurrency ratio against the wave-1 baseline, and an overlap ratio, which is the one that tells the real story.
Overlap ratio = Σ(individual task execution times) / wave elapsed time.
If tasks were fully sequential this would be 1.0. Values above 1× mean real parallel execution is happening. The higher the number the more concurrent work was compressed into the wave window.
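As a worked example of the formula, the sketch below computes the ratio for four equal tasks in two extreme cases. The function name `overlap_ratio` and the durations are illustrative assumptions, not values from the benchmark.

```python
def overlap_ratio(task_durations, wave_elapsed):
    """Sum of individual task execution times divided by wave elapsed time."""
    return sum(task_durations) / wave_elapsed


durations = [0.25, 0.25, 0.25, 0.25]  # four tasks of 0.25 s each

# Run back to back, the wave takes as long as all tasks combined.
sequential = overlap_ratio(durations, wave_elapsed=1.0)

# Run fully in parallel, the wave takes as long as one task.
parallel = overlap_ratio(durations, wave_elapsed=0.25)

print(sequential, parallel)
```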
The results were clean and somewhat of a surprise:
RESULTS SUMMARY
Wave Tokens OK Fail Time Tok/s Lat(ms) Conc Overlap ΣTask(ms)
-------------------------------------------------------------------------------------
1 4 4 0 0.002s 1718.7 0.582ms 1.00× 2.39× 5.56ms
2 8 8 0 0.002s 3781.8 0.264ms 2.20× 3.79× 8.01ms
3 16 16 0 0.004s 4067.5 0.246ms 2.37× 5.42× 21.31ms
4 32 32 0 0.007s 4392.7 0.228ms 2.56× 11.90× 86.71ms
5 64 64 0 0.015s 4268.0 0.234ms 2.48× 23.01× 345.10ms
6 128 128 0 0.029s 4475.1 0.223ms 2.60× 38.72× 1107.58ms
7 256 256 0 0.059s 4331.8 0.231ms 2.52× 52.89× 3125.85ms
8 512 512 0 0.113s 4532.8 0.221ms 2.64× 60.30× 6810.70ms
9 1024 1024 0 0.246s 4165.0 0.240ms 2.42× 61.87× 15211.02ms
10 2048 2048 0 0.505s 4052.4 0.247ms 2.36× 33.82× 17091.60ms
11 4096 4096 0 0.980s 4181.1 0.239ms 2.43× 33.15× 32476.06ms
12 8192 8192 0 2.163s 3786.8 0.264ms 2.20× 30.14× 65197.90ms
13 16384 16384 0 4.327s 3786.7 0.264ms 2.20× 23.91× 103450.88ms
14 32768 32768 0 8.754s 3743.3 0.267ms 2.18× 18.72× 163876.79ms
15 65536 65536 0 17.314s 3785.2 0.264ms 2.20× 17.16× 297095.85ms
-------------------------------------------------------------------------------------
TOTAL 131068 131068 0 86.845s
Avg latency across waves : 0.268 ms/token
Peak concurrency ratio : 2.64×
Peak overlap ratio : 61.87×
Overlap ratio = Σ(individual task times) / wave elapsed time
Values above 1× indicate true parallel execution.
Values approaching N = N tasks running simultaneously.
Waves 1 through 9 show sub-quadratic scaling.
The overlap ratio grows faster than token count because workers cycle through short tasks and immediately pull the next one, compounding the total concurrent activity. Peak overlap hits 62× at 1024 tokens.
Wave 10 is the transition. Workers saturate, rapid cycling slows, and the ratio settles near hardware worker count. From there it holds.
The system found its floor and stayed there.
(Advanced note: conc is the effective per-token concurrency, measured against the wave-1 baseline per-task latency. It indicates how quickly the system works through operations once N tokens have been distributed. It is useful because it reflects a batch's ability to produce concurrency from a marked start (where latency measurement begins) and how many tasks completed between that point and the end of the operation. Keep in mind that it uses ABS and is NOT equalized, so the result is technically a crude ratio that reflects actual concurrency only loosely.)
A new layer
__await__ isn't just a convenience. It's what makes the orchestrator pattern possible at all. Without it, the test would need blocking calls inside an async function, which either blocks the event loop or requires running the collect logic in a thread executor. Neither is clean.
With await, TokenGate tokens behave like first-class async objects. You can await one, gather many, use them inside any async framework, or compose them with other awaitables.
The internal machinery — admission, routing, worker pinning, token lifecycle — stays completely unchanged.
The only thing that changed is how you wait for the result.
To use it, your setup stays exactly the same as before:
async def main():
    coordinator = OperationsCoordinator()
    coordinator.start()
    try:
        tokens = [my_task(i) for i in range(64)]
        results = await asyncio.gather(*tokens)
    finally:
        coordinator.stop()

asyncio.run(main())
The decorated functions are still synchronous. The coordinator still starts the same way. The only difference is await instead of .get().
Closing
This started as a one-method addition and turned into the most thorough characterization of TokenGate's concurrency behavior yet.
The sub-quadratic scaling result, the transition point at around 1024 tokens, the stable floor holding all the way to 65,536:
none of that was the goal. It was just what the numbers showed.
The maximum concurrency test is included in demo/ if you want to run it on your own hardware. Your sweet spot will be in a different place than mine based on your hardware capabilities. That's expected.
[UPDATE] - Printability & Static Methods
This updated release includes better prints and updates to some definitions for @staticmethod improving clarity.
It is recommended to update to this version for the best overall experience using TokenGate at this point.
Refer to tg_print.py for the new class interface used to provide prints.
TokenGate Beta!
This is the first official beta test release of TokenGate. Please refer to BETA.md for detailed startup and usage instructions!