Releases: TavariAgent/Py-TokenGate

[FIXES] - Race Conditions & Positional Alignment

19 May 02:48

This release fixes race conditions that appeared in testing. Hashes no longer miss, which previously happened with low probability.

[UPDATE] - The Optimization Pass

18 May 18:54

Pre-release

TokenGate — Optimization Pass Release Notes

Summary

This pass targets the token submission hot path — the sequence of operations
between a caller invoking a decorated function and the token landing in a
worker mailbox. No routing contracts, no execution semantics, and no public
API signatures were changed. Also added a runtime guard for tokens inserted
into the coordinator. All improvements are opt-in or transparent.


Benchmark Comparison

| Metric | Before | After | Delta |
|-------------------------------|--------------|---------------|------------|
| Total wall time (131k tokens) | 83.506s | 42.337s | −49% |
| Active execution time | ~83s | 4.673s | −94% |
| Avg latency / token | 0.351ms | 0.086ms | −75% |
| Peak concurrency ratio | 3.15× | 49.22× | +15.6× |
| Peak overlap ratio | 46.09× | 47.39× | maintained |
| Peak single-wave throughput | ~3,368 tok/s | ~64,754 tok/s | ~19× |
| Sustained throughput | ~1,569 tok/s | ~28,046 tok/s | ~18× |

The concurrency jump from 3× to 49× indicates the submission path was the
real ceiling. Workers were idle-waiting on routing overhead. Once that
overhead was removed the thread pool expressed its actual capacity.


Throughput Reporting — How to Read the Numbers

The benchmark reports four throughput figures. Each measures something
distinct and they should not be compared directly without understanding
what each one counts.

Important: All mean calculations are derived from accumulated token
totals and elapsed time totals across waves — they are not arithmetic
averages of independent per-wave unit rates. Reading them as per-wave
averages will make them appear inconsistent with wall time. They are not.

| Metric | Formula | What it measures |
|---------------------|-------------------------------------|------------------|
| Sustained | total_tokens / Σ wave_elapsed | Volume-weighted reality. Dominated by the largest waves, which carry the most tokens. The honest number for full-workload throughput. |
| Peak | max(tokens_i / elapsed_i) | Best single-wave rate. Reflects the sweet-spot batch size where parallelism is fullest and scheduling overhead is smallest. |
| Token-weighted mean | Σ(rate_i × tokens_i) / total_tokens | Each token votes equally on the average rate. Sits between sustained and peak. Useful for understanding where the system spends most of its token budget. |
| Arithmetic mean | Σ rate_i / N | Each wave votes equally. Small fast waves inflate this significantly. Included for completeness but the least representative of real workload behaviour. |

Why sustained and wall time appear inconsistent:

Wall time includes 15 × 0.05s = 0.75s of deliberate inter-wave sleep plus
event loop scheduling gaps and print overhead. Active time is the raw sum of
asyncio.gather spans only. The tok/s figure in each wave row and in the
summary is always based on active time. Wall time is reported separately so
the two are never conflated.

Example from the final benchmark run:

Waves 13–15:  114,688 tokens / 4.266s  =  ~26,900 tok/s  ← 87% of all volume
Waves  1–12:   16,380 tokens / 0.407s  =  ~40,200 tok/s  ← 13% of all volume
Combined:      131,068 tokens / 4.673s  =  ~28,046 tok/s  ← sustained (correct)

The sustained rate is pulled toward the large-wave rate because large waves
dominate the token count. This is the correct and expected result.
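
The four formulas in the table can be sketched in a few lines. This is an illustrative helper, not the benchmark's own reporting code: `throughput_summary` and the `(tokens, elapsed)` tuple layout are assumptions, and the two aggregate groups from the example above are treated as if they were two waves.

```python
from typing import Dict, List, Tuple

Wave = Tuple[int, float]  # (tokens processed, active elapsed seconds)


def throughput_summary(waves: List[Wave]) -> Dict[str, float]:
    """Compute the four throughput figures from per-wave (tokens, elapsed) pairs."""
    total_tokens = sum(tokens for tokens, _ in waves)
    total_elapsed = sum(elapsed for _, elapsed in waves)
    rates = [tokens / elapsed for tokens, elapsed in waves]
    weighted = sum(rate * tokens for rate, (tokens, _) in zip(rates, waves))
    return {
        "sustained": total_tokens / total_elapsed,       # totals divided by totals
        "peak": max(rates),                              # best single wave
        "token_weighted_mean": weighted / total_tokens,  # each token votes once
        "arithmetic_mean": sum(rates) / len(rates),      # each wave votes once
    }


# The two aggregate groups from the example above, treated as two "waves".
groups = [(114_688, 4.266), (16_380, 0.407)]
summary = throughput_summary(groups)
for name, value in summary.items():
    print(f"{name:>20}: {value:,.0f} tok/s")
```

With these inputs the sustained figure lands near 28,000 tok/s, while the arithmetic mean of the two group rates comes out noticeably higher because the small, fast group gets an equal vote.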


Changes

1. unhashable_checker.py — O(1) Type Dispatch

Problem: make_hashable walked a 23-layer isinstance chain on every
un-hashable value, including common exact types like dict, list, and
np.ndarray.

Change: Added _DISPATCH: dict populated once at module load by
_build_dispatch(). At call time, a single _DISPATCH.get(type(obj))
lookup short-circuits to the correct handler for registered exact types.
Subclass misses fall through to the existing isinstance chain — no
coverage regression.

Registered at load time: dict, list, set, bytearray, memoryview,
slice, array.array, deque, OrderedDict, defaultdict, Counter,
ChainMap, and optionally np.ndarray, pd.DataFrame, pd.Series,
pd.MultiIndex, pd.Index, pd.Categorical, torch.Tensor, cp.ndarray,
PIL.Image.

Fast path fix: The hash() fast path now catches (TypeError, RuntimeError) instead of TypeError only. Non-scalar torch.Tensor raises
RuntimeError from hash() and previously slipped through to the tensor
handler at layer 6. This closes that gap consistently with is_hashable.
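
A minimal sketch of the dispatch pattern described above, assuming a much smaller handler set than the real module. The names `_DISPATCH` and `_build_dispatch` come from these notes; `make_hashable_sketch` and the `_freeze_*` handlers are hypothetical stand-ins, and the fallback loop only approximates the real isinstance chain.

```python
from collections import OrderedDict, deque
from typing import Any, Callable, Dict


def _freeze_sequence(obj: Any) -> tuple:
    return tuple(make_hashable_sketch(item) for item in obj)


def _freeze_set(obj: Any) -> frozenset:
    return frozenset(make_hashable_sketch(item) for item in obj)


def _freeze_mapping(obj: Any) -> frozenset:
    # Dict keys are already hashable; only the values need converting.
    return frozenset((key, make_hashable_sketch(value)) for key, value in obj.items())


def _build_dispatch() -> Dict[type, Callable[[Any], Any]]:
    """Built once at import time: exact type -> handler."""
    return {
        dict: _freeze_mapping,
        OrderedDict: _freeze_mapping,
        list: _freeze_sequence,
        tuple: _freeze_sequence,
        deque: _freeze_sequence,
        set: _freeze_set,
    }


_DISPATCH = _build_dispatch()


def make_hashable_sketch(obj: Any) -> Any:
    # Fast path: already hashable. RuntimeError is caught as well because some
    # objects (the notes mention non-scalar torch.Tensor) raise it from hash().
    try:
        hash(obj)
        return obj
    except (TypeError, RuntimeError):
        pass
    handler = _DISPATCH.get(type(obj))  # exact-type hit: one dict lookup
    if handler is not None:
        return handler(obj)
    for base, fallback in _DISPATCH.items():  # subclass miss: isinstance chain
        if isinstance(obj, base):
            return fallback(obj)
    return (type(obj).__name__, id(obj))  # last resort: identity key


class MyList(list):
    """A subclass that misses the exact-type lookup."""


print(make_hashable_sketch({"a": [1, 2]}))
print(make_hashable_sketch(MyList([1, [2]])))
```

The design point is that the common exact types pay for one dictionary lookup, and only subclasses walk the slower chain.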


2. unhashable_checker.py — HashPolicy Enum

Problem: All tokens paid the full make_hashable pipeline cost at
submission time regardless of whether their operation used sticky routing
or conductor domain anchoring.

Change: Added HashPolicy enum with four levels:

| Value | Behaviour |
|----------|-----------|
| NONE | No arg hashing. route_args is always (). Free routing only. |
| FAST | Builtins and stdlib only (layers 1–3). Unknown types get identity routing (type_name, id). |
| STANDARD | Full make_hashable pipeline. Default, unchanged behavior. |
| FULL | Same as STANDARD. Reserved for explicit subclass-fallthrough intent. |

Set per-operation via decorator tag:

@task_token_guard(
    operation_type="conductor_lead",
    tags={"weight": "medium",
          "hash_policy": HashPolicy.FAST,
          "digest_policy": DigestPolicy.FAST,
          "external_calls": ["conductor_child"]},
)
def my_function(*args, **kwargs): ...

Default remains STANDARD. No existing code changes behavior without opt-in.

Also added fast_make_hashable() — the reduced pipeline used by
HashPolicy.FAST. Covers layers 1–3, falls back to (type_name, id) for
unrecognised types.
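
How the four policy levels might select a pipeline can be sketched as follows. The enum member names come from the table above; the string values, `fast_make_hashable_sketch`, and `route_args_for` are assumptions made for illustration, and the full pipeline is passed in as a parameter rather than reproduced.

```python
from enum import Enum
from typing import Any, Callable, Tuple


class HashPolicy(Enum):
    NONE = "none"          # string values are an assumption
    FAST = "fast"
    STANDARD = "standard"
    FULL = "full"


def fast_make_hashable_sketch(obj: Any) -> Any:
    """Reduced pipeline: builtin containers only, identity key for the rest."""
    try:
        hash(obj)
        return obj
    except TypeError:
        pass
    if isinstance(obj, (list, tuple)):
        return tuple(fast_make_hashable_sketch(item) for item in obj)
    if isinstance(obj, (set, frozenset)):
        return frozenset(fast_make_hashable_sketch(item) for item in obj)
    if isinstance(obj, dict):
        return frozenset((k, fast_make_hashable_sketch(v)) for k, v in obj.items())
    return (type(obj).__name__, id(obj))  # unknown type: identity routing


def route_args_for(
    policy: HashPolicy,
    args: Tuple[Any, ...],
    full_pipeline: Callable[[Any], Any] = fast_make_hashable_sketch,
) -> tuple:
    """Pick the hashing cost per policy. full_pipeline stands in for make_hashable."""
    if policy is HashPolicy.NONE:
        return ()  # no arg hashing at all
    if policy is HashPolicy.FAST:
        return tuple(fast_make_hashable_sketch(arg) for arg in args)
    return tuple(full_pipeline(arg) for arg in args)  # STANDARD and FULL


class Opaque:
    __hash__ = None  # deliberately unhashable and unknown to the fast pipeline


print(route_args_for(HashPolicy.NONE, ([1, 2],)))
print(route_args_for(HashPolicy.FAST, ([1, 2], {"a": 1})))
```

An identity key such as `("Opaque", id(obj))` is only stable for the lifetime of that object, which is why it suits routing but not long-lived caching.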


3. unhashable_checker.py — DigestPolicy Enum

Problem: Conductor seed generation was hardwired to SHA-256 full 64-char
hex on every lead token submission regardless of volume or lifetime
requirements.

Change: Added DigestPolicy enum with four levels:

| Value | Algorithm | Output | Collision space |
|---------|-------------------|----------|-----------------|
| FULL | SHA-256 | 64 chars | 256-bit. Default, unchanged. |
| SHORT | SHA-256 truncated | 16 chars | 64-bit. Safe at any realistic volume. |
| FAST | BLAKE2s (8-byte) | 16 chars | 64-bit. Lower compute cost than SHA-256. |
| MINIMAL | SHA-256 truncated | 8 chars | 32-bit. Low volume only. |

Collision semantics for MINIMAL: A collision merges two logical domain
chains into a shared mailbox cluster. No data corruption occurs, because the
least-loaded mechanism compensates. At low-to-mid token volume with
short-lived leads the effect is benign: collisions reduce domain variance
rather than causing failures. Under saturation or sustained call chains,
however, heavy tasks may fall back from their primary core, which shifts load
distribution and reduces performance at the affinity boundary.
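
To put rough numbers on "low volume only", the standard birthday-bound approximation can be evaluated for each collision space. This is a back-of-the-envelope sketch, not part of TokenGate: `collision_probability` is a hypothetical helper, and it assumes seeds behave like uniformly random values and that `live_seeds` counts seeds alive at the same time.

```python
import math


def collision_probability(live_seeds: int, bits: int) -> float:
    """Birthday bound: p ~= 1 - exp(-n(n-1) / (2 * 2**bits))."""
    pairs = live_seeds * (live_seeds - 1) / 2
    # -expm1(-x) computes 1 - exp(-x) without losing precision for tiny x.
    return -math.expm1(-pairs / 2 ** bits)


for label, bits in [("MINIMAL", 32), ("SHORT / FAST", 64), ("FULL", 256)]:
    for live in (1_000, 131_068):
        p = collision_probability(live, bits)
        print(f"{label:>12}  {live:>7} live seeds  p = {p:.3e}")
```

Under these assumptions a 32-bit space stays comfortable at around a thousand simultaneously live seeds, but a collision becomes more likely than not if something on the order of the full 131,068-token benchmark were live as lead seeds at once. The 64-bit policies remain negligible at both volumes.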

Set per-operation via decorator tag:

@task_token_guard(
    operation_type="my_op",
    tags={"digest_policy": DigestPolicy.FAST}
)
def my_function(*args, **kwargs): ...

4. hash_conductor.py — Policy-Aware generate_seed and charge

Change: generate_seed now accepts a DigestPolicy parameter and
branches to the appropriate algorithm. charge reads the digest_policy
tag from the token's metadata, resolves string values to the enum with a
fallback-to-FULL warning, and passes the resolved policy to generate_seed.

The tg_print dispatch line in charge now includes digest={policy.value}
for observability during mixed-policy runs.

snapshot() seed[:12] truncation is safe for all four policies — Python's
slice never raises on out-of-bounds, so an 8-char MINIMAL seed returns
itself.
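
A sketch of what policy-aware seed generation could look like, using only `hashlib` calls from the standard library. The enum string values, the payload layout, and the names `generate_seed_sketch` and `resolve_policy` are assumptions; only the algorithms and output lengths are taken from the DigestPolicy table.

```python
import hashlib
import warnings
from enum import Enum
from typing import Sequence, Union


class DigestPolicy(Enum):
    FULL = "full"        # string values are an assumption
    SHORT = "short"
    FAST = "fast"
    MINIMAL = "minimal"


def generate_seed_sketch(
    token_id: str,
    external_calls: Sequence[str],
    policy: DigestPolicy = DigestPolicy.FULL,
) -> str:
    # Assumed payload layout: token ID plus a frozen view of the call list.
    payload = f"{token_id}:{tuple(external_calls)!r}".encode("utf-8")
    if policy is DigestPolicy.FAST:
        # An 8-byte BLAKE2s digest is 16 hex characters.
        return hashlib.blake2s(payload, digest_size=8).hexdigest()
    digest = hashlib.sha256(payload).hexdigest()  # 64 hex characters
    if policy is DigestPolicy.SHORT:
        return digest[:16]
    if policy is DigestPolicy.MINIMAL:
        return digest[:8]
    return digest


def resolve_policy(raw: Union[str, DigestPolicy]) -> DigestPolicy:
    """Accept an enum member or its string value; warn and fall back to FULL."""
    if isinstance(raw, DigestPolicy):
        return raw
    try:
        return DigestPolicy(str(raw).lower())
    except ValueError:
        warnings.warn(f"Unknown digest_policy {raw!r}; falling back to FULL")
        return DigestPolicy.FULL


for policy in DigestPolicy:
    seed = generate_seed_sketch("token-1", ["conductor_child"], policy)
    # Slicing past the end is safe: an 8-char seed sliced with [:12] is unchanged.
    print(f"{policy.value:>8}  len={len(seed):>2}  seed[:12]={seed[:12]}")
```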


5. sticky_token.py — freeze() Empty-Args Guard

Problem: _make_key called freeze(args) on every mark() and
unmark(), including the majority path where args = (). freeze(()) is
always (), so the recursive call was redundant on every conductor
on_complete → unmark(seed, ()) call.

Change:

return op_name, freeze(args) if args else ()

Zero overhead on the empty-args path, which is now the dominant path through
the conductor release cycle.


6. core_pinned_staggered_queue.py — Loop and Boolean Simplification

| Location | Change |
|----------------------------------------|------------------------------------------------------------------------------------------------------...

[MINOR-UPDATE] - Type Association Resolution & Cleanup

12 May 21:46

This update fixes some types I made that were ambiguous and some fields I forgot.

[MINOR-UPDATE] - Resolved Un-hashables & Fall Through Prevention

12 May 00:20

This is a must-have update for the clarity and security of hashable and sticky operations. Anyone using these releases should update.

[MINOR-UPDATE] - Cleanup & The Crew (Dunders)

10 May 01:42

This is a nice-to-have release. It cleans up some misleading pathing and adds more type-friendly resolution via dunder ("double underscore") methods.

[UPDATE] - The Production Safety Update

09 May 15:23

Pre-release

TokenGate — Release Notes

This update delivers the first major "production ready" features: improved
data locality, safety from cache storms, and proper cleanup protocols.

Take note of the "Layer by layer" section: it covers every stage of the token's layered contracts.

Extra information was added to BETA.md! Take a look there for bonus setup information.

Hash Conductor & Sticky Token Registry

This release introduces two systems that work together to anchor token execution
to stable core domains for the full lifetime of a call chain. The result is
deterministic routing, zero cross-domain data races, and measurably better
behaviour under saturated load conditions.


What Changed

StickyTokenRegistry

Tokens carrying the same (operation_type, args) key are now pinned to the
core that first receives them. Any later token arriving with the same key is
redirected to that core automatically. The pin releases when the token
completes, freeing the next submission to route normally.

A sticky_anchor tag can be added to any decorator to give the sticky key an
explicit name, independent of operation type.

@task_token_guard(
    operation_type="my_op",
    tags={"weight": "medium", "sticky_anchor": "op_token"},
)
def my_operation(n: int) -> int:
    ...
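
The pin-and-redirect behaviour can be sketched with a small lock-protected map. This is an illustration under assumptions, not the actual StickyTokenRegistry: `mark` and `unmark` are named in these notes, but their signatures here are guesses, and counting holders so the pin survives until the last matching token completes is a choice made for the sketch.

```python
import threading
from typing import Dict, Hashable, Tuple


class StickyRegistrySketch:
    """Pins a key to the first core that receives it until all holders finish."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._pins: Dict[Hashable, Tuple[int, int]] = {}  # key -> (core_id, holders)

    def mark(self, key: Hashable, candidate_core: int) -> int:
        """Return the pinned core, pinning candidate_core if the key is new."""
        with self._lock:
            core_id, holders = self._pins.get(key, (candidate_core, 0))
            self._pins[key] = (core_id, holders + 1)
            return core_id

    def unmark(self, key: Hashable) -> None:
        """Release one holder; drop the pin when the last holder completes."""
        with self._lock:
            if key not in self._pins:
                return
            core_id, holders = self._pins[key]
            if holders <= 1:
                del self._pins[key]
            else:
                self._pins[key] = (core_id, holders - 1)


registry = StickyRegistrySketch()
key = ("my_op", (7,))
first = registry.mark(key, candidate_core=2)   # pins the key to core 2
second = registry.mark(key, candidate_core=3)  # redirected to core 2
print(first, second)
```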

HashConductor

Lead tokens — those decorated with external_calls — generate a SHA-256 seed
from their token ID and call list. That seed is pinned to a core domain. Any
token spawned during the lead's execution inherits the seed and is routed to the
same core automatically. The domain releases when the lead and all of its
children have completed.

@task_token_guard(
    operation_type="lead_op",
    tags={"weight": "medium", "external_calls": ["child_op"]},
)
def lead_operation(n: int) -> list:
    return [child_op(n + i) for i in range(4)]

No changes are required at call sites. Domain anchoring is fully automatic once
external_calls is declared.

State Machine Cleanup

conductor.on_complete() is now wired into transition_state() directly. Every
terminal state — COMPLETED, FAILED, KILLED, TIMEOUT — decrements the
pending count. Killed tokens are reported as completed for observability clarity.
Domains cannot leak memory regardless of how a token ends.


How Routing Works — Layer by Layer

Understanding the full path a token takes from call to completion.

Layer 1 — Core Pinning Workers in Their Domains:
Workers are fixed to a single core domain at startup and never move. HEAVY
tokens belong to Core 1. MEDIUM tokens belong to Core 2 and above. LIGHT
tokens belong to Core 3 and above. Workers sit in their domain and reach for
the nearest valid token. The queue is what moves — it forms itself into the
correct shape around the workers, routing tokens into position so each worker
always reaches the right one.

Layer 2 — Token Creation & Metadata:
task_token_guard intercepts the decorated call before execution. A TaskToken
is created carrying the function, arguments, operation type, and routing tags.
Complexity scoring runs once and is cached on the wrapper. If an active conductor
seed is present in the current executor thread, it is stamped onto the token's
metadata here, before any event loop crossing occurs.

Layer 3 — Weight Classification & Position Assignment:
The token is classified as HEAVY, MEDIUM, or LIGHT from its tags or
operation type name. A staggered global position is calculated from the per-core
position counter, respecting the active worker pattern for that core. The
candidate core and local worker index are derived from this position.

Layer 4 — Sticky Token Enforcement:
For standard tokens, sticky_registry.mark() is called with the resolved
sticky_anchor or operation type as the key. If a marker already exists for
this key, the token is redirected to the pinned core. If not, the candidate core
is pinned and returned. This prevents concurrent tokens with matching keys from
splitting across core domains.

Layer 5 — Seed Generation:
For lead tokens carrying external_calls, a SHA-256 digest is computed from the
token ID concatenated with a frozen representation of the call list. The full
64-character hex digest is used. Collision probability at any realistic token
volume is negligible. The seed is stored on the token's metadata tags.

Layer 6 — Core Resolution:
_put_routing_block makes the final routing decision. Lead tokens with
external_calls go to conductor.charge(). Tokens carrying a
conductor_seed tag go to conductor.register_child(). All other tokens
follow the normal sticky registry path. The returned core_id is written to
the token's sticky_core tag and used for all subsequent mailbox placements.

Layer 7 — Charge Lead:
conductor.charge() generates the seed, pins it to the candidate core via the
sticky registry, and opens the pending count at 1 for the lead itself. The seed
and core mapping are stored in the conductor's internal registry for the lifetime
of the chain.

Layer 8 — Pre-Register Children:
When the lead function runs in the executor thread and spawns child tokens,
task_token_guard stamps the active seed onto each child at creation time and
calls conductor.pre_register() to increment the pending count immediately.
This increment happens before the child crosses to the event loop, ensuring the
domain stays alive even if the lead completes before the children reach put().

Layer 9 — Register Children:
When the child arrives at put(), register_child() reads the seed from the
token tag and returns the pinned core_id. Position assignment still runs —
but now it runs with a core_id that is already established. assign_position_for_token
uses that known core_id to derive local_i, selecting the correct local worker
within the domain. The staggered counter is not bypassed; it is given the right
context to work from. This keeps domain membership valid as worker counts change
live.

Layer 10 — Domain Grouping:
With the target core_id confirmed, the least-loaded active local worker within
that core is selected. The token is placed into that worker's mailbox. All tokens
under the same seed land in the same core's mailbox domain, keeping related work
physically co-located.

Layer 11 — Execution:
The worker loop dequeues the token and calls _execute_token_wrapped. Before the
function runs, conductor.activate() sets the conductor seed into a thread-local
on the executor thread. If the function spawns further child tokens, they
automatically inherit the seed through the same creation-time stamping mechanism
in task_token_guard.
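
Seed inheritance through a thread-local can be illustrated in isolation. Everything below is a hypothetical sketch: `activate` mirrors the conductor.activate() step described in Layer 11, while `deactivate`, `current_seed`, `create_child_tags`, and `run_lead` are illustrative names, and a plain `ThreadPoolExecutor` stands in for TokenGate's workers.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional

_active = threading.local()  # one independent slot per executor thread


def activate(seed: str) -> None:
    _active.seed = seed


def deactivate() -> None:
    _active.seed = None


def current_seed() -> Optional[str]:
    return getattr(_active, "seed", None)


def create_child_tags() -> Dict[str, str]:
    """Stamp the active seed at creation time, on the creating thread."""
    tags: Dict[str, str] = {}
    seed = current_seed()
    if seed is not None:
        tags["conductor_seed"] = seed
    return tags


def run_lead(seed: str) -> List[Dict[str, str]]:
    activate(seed)
    try:
        # Children created while the lead runs inherit its seed.
        return [create_child_tags() for _ in range(2)]
    finally:
        deactivate()


with ThreadPoolExecutor(max_workers=2) as pool:
    children = pool.submit(run_lead, "seed-a").result()

print(children)             # both children carry the lead's seed
print(create_child_tags())  # the main thread has no active seed
```

Reading the seed at creation time, before the token leaves the executor thread, is what lets the stamp survive the later hop to the event loop.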

Layer 12 — Finalization: Cleanup & Metadata:
When the token transitions to any terminal state (COMPLETED, FAILED,
KILLED, TIMEOUT), transition_state() calls conductor.on_complete().
The pending count decrements. When it reaches zero — meaning the lead and every
child have finished — the seed is removed from the conductor registry and the
sticky pin is released. Execution timing, core assignment, and complexity score
are written to the execution record.
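
The pending-count lifecycle from Layers 7, 8, 9, and 12 can be sketched as a small reference counter. The method names follow these notes, but the signatures, the internal dictionary layout, and the boolean return from `on_complete` are assumptions made for illustration.

```python
import threading
from typing import Dict


class ConductorSketch:
    """Tracks seed -> core plus a pending count that keeps the domain alive."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._domains: Dict[str, Dict[str, int]] = {}

    def charge(self, seed: str, core_id: int) -> int:
        """Open the domain with a pending count of 1 for the lead itself."""
        with self._lock:
            self._domains[seed] = {"core_id": core_id, "pending": 1}
        return core_id

    def pre_register(self, seed: str) -> None:
        """Count a child at creation time, before it reaches put()."""
        with self._lock:
            self._domains[seed]["pending"] += 1

    def register_child(self, seed: str) -> int:
        """Return the pinned core for a child arriving at put()."""
        with self._lock:
            return self._domains[seed]["core_id"]

    def on_complete(self, seed: str) -> bool:
        """Decrement on any terminal state; True when the domain is released."""
        with self._lock:
            domain = self._domains.get(seed)
            if domain is None:
                return False
            domain["pending"] -= 1
            if domain["pending"] == 0:
                del self._domains[seed]
                return True
            return False

    def snapshot(self) -> Dict[str, Dict[str, int]]:
        with self._lock:
            return {seed: dict(domain) for seed, domain in self._domains.items()}


conductor = ConductorSketch()
conductor.charge("seed-a", core_id=2)
conductor.pre_register("seed-a")           # child counted before it is queued
lead_released = conductor.on_complete("seed-a")  # lead ends first: domain stays
child_core = conductor.register_child("seed-a")  # child still finds its core
child_released = conductor.on_complete("seed-a")
print(lead_released, child_core, child_released, conductor.snapshot())
```

The early increment in `pre_register` is what covers the race described in Layer 8: the lead can finish before its children reach `put()` without the domain being released underneath them.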


Test Coverage Added

Cache Storm Test (demo/cache_storm.py)
Submits anchor tokens to pin keys to cores, then fires concurrent bursts of
tokens with identical (op_type, args) keys targeting different cores. Verifies
the sticky registry redirects all of them to the pinned core with zero misses.

Hash Conductor Test (demo/hash_conductor_test.py)
Submits lead tokens that spawn children during execution. Verifies all tokens
in each chain land on the same core, carry matching seeds, and that the conductor
snapshot is empty after resolution. Also checks seed uniqueness across
independent concurrent leads.


Benchmark

Endurance run across 15 doubling waves. 131,068 tokens total.

Wave   Tokens    OK      Fail    Time      Tok/s     Lat(ms)   Conc    Overlap
1      4         4       0       0.003s    1386.2    0.721     1.00×   1.44×
2      8         8       0       0.003s    2391.2    0.418     1.72×   2.48×
3      16        16      0       0.006s    2744.8    0.364     1.98×   4.82×
4      32        32      0       0.011s    2812.7    0.356     2.03×   11.32×
5      64        64      0       0.022s    2880.0    0.347     2.08×   22.01×
6      128       128     0       0.044s    2907.6    0.344     2.10×   29.78×
7      256       256     0       0.090s    2846.8    0.351     2.05×   37.98×
8      512       512     0       0.182s    2811.5    0.356     2.03×   41.81×
9      1024      1024    0       0.364s    2813.9    0.355     2.03×   44.18×
10     2048      2048    0       0.775s    2644.3    0.378     1.91×   44.86×
11     4096      4096    0       1.454s    2816.3    0.355     2.03×   38.34×
12     8192      8192    0       2.905s    2819.9    0.355     2.03×   32.64×
13     16384     16384   0       5.925s    2765.0    0.362     1.99×   27.92×
14     32768     32768   0       12.102s   2707.7    0.369     1.95×   24.96×
15     65536     65536   0       23.494s   2789.5    0.358     2.01×   24.21×

TOTAL  131,068   131,068  0      89.091s
Avg latency : 0.386 ms/token
Peak overlap: 44.86×

Peak overlap is lower than before because task routing is more deterministic.
Combined with better task performance under saturated conditions, this
indicates the update was a success.

Zero failures. Latency holds within 0.04ms from wave 3 through wave 15.
Peak overlap of 44.86× achieved at wave 10 with flat latency — the system
reached saturation and held position rather than degrad...

[FIXES] - Actually Clean Prints & Better Alignment

01 May 16:19

This set of fixes removes unused code and some conflicting deprecated flags, and aligns TokenGate's token registration access methods (protected member access is no longer needed).

[UPDATE] v0.2.2.0-beta — __await__ & The Maximum Concurrency Test

26 Apr 19:02

What this gives you

Before this, waiting on a token meant blocking the calling thread:

result = token.get(timeout=30.0)

Now tokens participate in the async model directly:

result = await token

# or an entire batch at once
results = await asyncio.gather(*tokens)

The decorated function stays synchronous — __await__ only changes how the caller waits for the result, not how the work executes.

Under the hood it bridges TaskToken's internal concurrent.futures.Future to whatever asyncio event loop is currently running via asyncio.wrap_future().

One line of real logic.
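
A minimal sketch of that bridge, assuming a token that simply wraps a `concurrent.futures.Future`. `AwaitableTokenSketch` and `square` are hypothetical names, and a plain `ThreadPoolExecutor` stands in for TokenGate's workers; `asyncio.wrap_future()` is the standard-library call named above.

```python
import asyncio
from concurrent.futures import Future, ThreadPoolExecutor
from typing import List, Optional


class AwaitableTokenSketch:
    """Wraps a thread-pool future so callers can block on it or await it."""

    def __init__(self, future: Future) -> None:
        self._future = future

    def get(self, timeout: Optional[float] = None):
        # Synchronous path: blocks the calling thread.
        return self._future.result(timeout=timeout)

    def __await__(self):
        # Async path: bind the thread-pool future to the running event loop.
        return asyncio.wrap_future(self._future).__await__()


def square(n: int) -> int:
    return n * n  # stays an ordinary synchronous function


async def main() -> List[int]:
    with ThreadPoolExecutor(max_workers=4) as pool:
        tokens = [AwaitableTokenSketch(pool.submit(square, n)) for n in range(4)]
        return await asyncio.gather(*tokens)


results = asyncio.run(main())
print(results)
```

Because `wrap_future()` is only called when the token is awaited, the token attaches to whichever loop is running at that moment.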

The practical benefit is that your orchestrator can now be fully async end to end. Submit a batch, await the gather, report, repeat. No blocking calls, no thread management, no futures handling. TokenGate handles all of that.

What to be aware of

__await__ must be called from within a running event loop. If you call await token from a synchronous context you'll get a runtime error.

The existing token.get() is still there for synchronous use cases and hasn't changed. Because asyncio.wrap_future() binds to the running loop at await time, don't pass tokens between event loops.

Create them and await them on the same loop. In normal usage this never comes up — if you're writing await token inside an async def you're already doing the right thing.

The concurrency test

To validate __await__ properly this update was tested using max_concurrency_test.py.

Four synchronous operations decorated with @task_token_guard:

@task_token_guard(operation_type='cpu_crunch', tags={'weight': 'light'})
def cpu_crunch(n: int) -> int: ...

@task_token_guard(operation_type='string_ops', tags={'weight': 'light'})
def string_transform(seed: int) -> str: ...

@task_token_guard(operation_type='data_transform', tags={'weight': 'medium'})
def data_sort(size: int) -> List[int]: ...

@task_token_guard(operation_type='hash_compute', tags={'weight': 'heavy'})
def hash_chain(seed: str, iterations: int) -> str: ...

The orchestrator is async and touches no internal controls. Submit, await, report:

tokens  = submit_batch(target)
results = await asyncio.gather(*tokens, return_exceptions=True)

I ran this across scaling waves from 4 tokens up to 65,536.

Four metrics are tracked per wave: tok/s, average latency per token, concurrency ratio against the wave-1 baseline, and an overlap ratio, which is the one that tells the real story.

Overlap ratio = Σ(individual task execution times) / wave elapsed time.

If tasks were fully sequential this would be 1.0. Values above 1× mean real parallel execution is happening. The higher the number the more concurrent work was compressed into the wave window.
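
The overlap ratio can be measured with nothing more than per-task timers and one wave timer. The sketch below is an assumption-laden illustration, not the code in max_concurrency_test.py: `timed` and `measure_overlap` are hypothetical helpers, and `time.sleep` stands in for real work.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def timed(fn: Callable[[float], None], arg: float) -> float:
    """Run one task and return its individual execution time in seconds."""
    start = time.perf_counter()
    fn(arg)
    return time.perf_counter() - start


def measure_overlap(fn: Callable[[float], None], inputs: Iterable[float], workers: int) -> float:
    """Overlap ratio = sum of individual task times / wave elapsed time."""
    wave_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        task_seconds = list(pool.map(lambda arg: timed(fn, arg), inputs))
    wave_elapsed = time.perf_counter() - wave_start
    return sum(task_seconds) / wave_elapsed


sleeps = [0.05] * 4
sequential = measure_overlap(time.sleep, sleeps, workers=1)
parallel = measure_overlap(time.sleep, sleeps, workers=4)
print(f"1 worker : {sequential:.2f}x")
print(f"4 workers: {parallel:.2f}x")
```

With one worker the ratio stays at or just under 1×, because the wave lasts at least as long as the tasks added together. With four workers running four equal tasks it approaches 4×, minus pool start-up overhead.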

The results were clean and somewhat of a surprise:

  RESULTS SUMMARY

  Wave   Tokens   OK    Fail      Time    Tok/s   Lat(ms)    Conc   Overlap   ΣTask(ms)
  -------------------------------------------------------------------------------------
  1      4        4     0       0.002s   1718.7    0.582ms   1.00×     2.39×       5.56ms
  2      8        8     0       0.002s   3781.8    0.264ms   2.20×     3.79×       8.01ms
  3      16       16    0       0.004s   4067.5    0.246ms   2.37×     5.42×      21.31ms
  4      32       32    0       0.007s   4392.7    0.228ms   2.56×    11.90×      86.71ms
  5      64       64    0       0.015s   4268.0    0.234ms   2.48×    23.01×     345.10ms
  6      128      128   0       0.029s   4475.1    0.223ms   2.60×    38.72×    1107.58ms
  7      256      256   0       0.059s   4331.8    0.231ms   2.52×    52.89×    3125.85ms
  8      512      512   0       0.113s   4532.8    0.221ms   2.64×    60.30×    6810.70ms
  9      1024     1024  0       0.246s   4165.0    0.240ms   2.42×    61.87×   15211.02ms
  10     2048     2048  0       0.505s   4052.4    0.247ms   2.36×    33.82×   17091.60ms
  11     4096     4096  0       0.980s   4181.1    0.239ms   2.43×    33.15×   32476.06ms
  12     8192     8192  0       2.163s   3786.8    0.264ms   2.20×    30.14×   65197.90ms
  13     16384    16384 0       4.327s   3786.7    0.264ms   2.20×    23.91×  103450.88ms
  14     32768    32768 0       8.754s   3743.3    0.267ms   2.18×    18.72×  163876.79ms
  15     65536    65536 0      17.314s   3785.2    0.264ms   2.20×    17.16×  297095.85ms
  -------------------------------------------------------------------------------------
  TOTAL  131068   131068 0      86.845s

  Avg latency across waves  : 0.268 ms/token
  Peak concurrency ratio    : 2.64×
  Peak overlap ratio        : 61.87×

  Overlap ratio = Σ(individual task times) / wave elapsed time
  Values above 1× indicate true parallel execution.
  Values approaching N = N tasks running simultaneously.

Waves 1 through 9 show sub-quadratic scaling.

The overlap ratio climbs with every doubling of token count because workers cycle through short tasks and immediately pull the next one, compounding the total concurrent activity. Peak overlap hits 62× at 1024 tokens.

Wave 10 is the transition. Workers saturate, rapid cycling slows, and the ratio settles near hardware worker count. From there it holds.

The system found its floor and stayed there.

(Advanced note: Conc is the effective per-token concurrency. It divides the wave-1 baseline per-token latency by the current wave's per-token latency, so it answers "how much faster are operations being worked through once N tokens have been distributed?" It is useful because it reflects the batch size's ability to produce concurrency over the measured span, from the marked start (where latency timing begins) to the end of the wave. Keep in mind that it uses absolute values and is not normalized, so it is a crude ratio that reflects actual concurrency only approximately.)

A new layer

__await__ isn't just a convenience. It's what makes the orchestrator pattern possible at all. Without it the test would require blocking calls inside an async function, which either blocks the event loop or requires running the collection logic in a thread executor. Neither is clean.

With await, TokenGate tokens behave like first-class async objects. You can await one, gather many, use them inside any async framework, or compose them with other awaitables.

The internal machinery — admission, routing, worker pinning, token lifecycle — stays completely unchanged.

The only thing that changed is how you wait for the result.

To use it, your setup stays exactly the same as before:

async def main():
    coordinator = OperationsCoordinator()
    coordinator.start()
    try:
        tokens = [my_task(i) for i in range(64)]
        results = await asyncio.gather(*tokens)
    finally:
        coordinator.stop()

asyncio.run(main())

The decorated functions are still synchronous. The coordinator still starts the same way. The only difference is await instead of .get().

Closing

This started as a one-method addition and turned into the most thorough characterization of TokenGate's concurrency behavior yet.

The sub-quadratic scaling result, the transition point at around 1024 tokens, the stable floor holding all the way to 65,536 —
none of that was the goal, it was just what the numbers showed.

The maximum concurrency test is included in demo/ if you want to run it on your own hardware. Your sweet spot will be in a different place than mine based on your hardware capabilities. That's expected.

[UPDATE] - Printability & Static Methods

25 Apr 20:57

This updated release includes better prints and updates to some definitions for @staticmethod improving clarity.

It is recommended to update to this version for the best overall experience using TokenGate at this point.

Refer to tg_print.py for the new class interface used to provide prints.

TokenGate Beta!

22 Apr 20:10

Pre-release

This is the first official beta test release of TokenGate. Please refer to BETA.md for detailed startup and usage instructions.