Releases: fastapi-extensions/fastapi-tenancy
fastapi-tenancy v0.4.0
[0.4.0] — 2026-04-02
Concurrency hardening, PostgreSQL schema isolation correctness under multi-transaction
sessions, serializable metadata merges with automatic retry, L1 cache lifecycle
management, context-variable restoration safety, and 46 new regression tests.
All 5 failures found by running the full live-database test suite against v0.3.0
are fixed in this release.
Security
CRITICAL — Cross-tenant schema bleed under concurrent load (isolation/schema.py)
Three successive implementations of the _schema_session search-path mechanism
were analysed and the root defects fixed:
- v0.3.0 (engine-level begin listener): event.listen(sync_engine, "begin", _on_begin)
attached a single listener to the global engine, shared by every
concurrent request. Under load, Request A's and Request B's listeners both fired on
every transaction begin, causing each session to silently receive the other tenant's
search_path. Additionally, the event.listen call preceded the try block, so
if AsyncSession() construction raised (e.g. pool exhausted), the listener remained
permanently on the engine, corrupting all subsequent connections.
- Pool checkout/checkin approach (interim): The asyncpg dialect wraps the raw
DBAPI connection in AdaptedConnection, which does not implement the SQLAlchemy
event interface. Attaching a begin listener to it raised
InvalidRequestError: No such event 'begin' on every PostgreSQL connection checkout.
- Connection.begin on conn.sync_connection (second interim): Correct for
single-transaction sessions, but broken for multi-transaction ones. With
autobegin=False, SQLAlchemy releases the physical connection back to the pool on
every commit(). The next transaction uses a new Connection object, making
the listener on the old object useless. The test
test_search_path_reapplied_after_commit confirmed data was invisible after commit.
Final fix — Session.after_begin: The Session.after_begin(session, transaction, connection) event fires for every transaction the session begins, including those
that start after commit() releases and re-acquires the connection. It receives the
current Connection as an argument, so SET LOCAL search_path is always issued on
the correct physical connection. The listener is scoped to the session.sync_session
object — invisible to other sessions — and is removed in finally.
CRITICAL — RLS GUC listener not removed after session close (isolation/rls.py)
The @event.listens_for(sync_conn, "begin") decorator inside get_session() is a
call-site decoration that never removes the listener. When the physical connection
was returned to the pool and reused by a future request for a different tenant, the
stale listener fired at the start of the new tenant's first transaction, silently
setting app.current_tenant to the previous tenant's ID — a silent cross-tenant data
read breach.
Fix: The listener function is defined as a named local variable, registered with
event.listen(), and removed with event.remove() in a finally block that wraps
the entire session lifetime — including the AsyncSession() constructor call.
sync_conn is initialised to None outside the block so the guard is safe even if
the session never opens.
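The register-then-remove shape described in this fix can be sketched without SQLAlchemy. FakeConnection, POOL, and the fail_before_open flag below are hypothetical stand-ins invented for illustration; they model a pooled connection with a listener list and are not the library's code or the SQLAlchemy event API. Only the try/finally structure and the None guard are the point.

```python
from contextlib import contextmanager


class FakeConnection:
    """Hypothetical stand-in for a pooled connection that supports listeners."""

    def __init__(self):
        self.listeners = []
        self.guc = None  # plays the role of app.current_tenant

    def begin(self):
        # Fire every registered listener at transaction start.
        for fn in list(self.listeners):
            fn(self)


POOL = [FakeConnection()]  # one connection, reused by every request


@contextmanager
def get_session(tenant_id, fail_before_open=False):
    conn = None  # initialised outside try, so the finally guard is always safe

    def _set_tenant(c):  # named function: the same object can be removed later
        c.guc = tenant_id

    try:
        if fail_before_open:
            raise RuntimeError("session could not be opened")
        conn = POOL[0]
        conn.listeners.append(_set_tenant)
        conn.begin()
        yield conn
    finally:
        # Runs on success, on error inside the block, and on early failure.
        if conn is not None:
            conn.listeners.remove(_set_tenant)
```

Because the listener is removed before the connection goes back to the pool, the next tenant that reuses it starts with an empty listener list.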
Added
TenancyConfig — L1 cache fields now first-class (core/config.py)
- l1_cache_max_size: int (default 1000, range 10–100 000): maximum entries in
the in-process LRU cache. Configurable via the TENANCY_L1_CACHE_MAX_SIZE env var.
- l1_cache_ttl_seconds: int (default 60, min 1): TTL for in-process cache
entries. Configurable via the TENANCY_L1_CACHE_TTL_SECONDS env var. Previously
TenancyManager read these via getattr(config, "l1_cache_...", fallback); they
did not exist on TenancyConfig and could not be set by users.
TenancyManager — periodic L1 cache purge task (manager.py)
- _run_cache_purge_loop(): background asyncio.Task that calls
TenantCache.purge_expired() every max(1, l1_cache_ttl_seconds // 2) seconds.
Previously purge_expired() existed but was never called automatically; in
low-traffic deployments expired entries accumulated indefinitely.
- initialize() creates the task (idempotent: a second call while the task is
running is a no-op). The task is named "fastapi-tenancy:l1-cache-purge" for
observability in async debuggers.
- close() cancels and awaits the task before disposing the store and isolation
provider, preventing use-after-free on the cache reference.
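The task lifecycle described above (idempotent start, named task, cancel and await on close) might look roughly like the following. TenantCacheSketch and ManagerSketch are hypothetical classes written for this illustration, and the interval is passed in directly as a float so the example runs quickly; the library computes it from l1_cache_ttl_seconds as stated above.

```python
import asyncio
import time


class TenantCacheSketch:
    """Hypothetical TTL cache; only the purge_expired() name comes from the notes."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._entries = {}  # key -> (value, expires_at)

    def set(self, key, value):
        self._entries[key] = (value, time.monotonic() + self.ttl)

    def purge_expired(self):
        now = time.monotonic()
        expired = [k for k, (_, exp) in self._entries.items() if exp <= now]
        for k in expired:
            del self._entries[k]
        return len(expired)

    def __len__(self):
        return len(self._entries)


class ManagerSketch:
    def __init__(self, cache, interval):
        self.cache = cache
        self.interval = interval  # seconds between purge passes (assumed parameter)
        self._purge_task = None

    async def _run_cache_purge_loop(self):
        while True:
            await asyncio.sleep(self.interval)
            self.cache.purge_expired()

    async def initialize(self):
        # Idempotent: do nothing if a purge task is already running.
        if self._purge_task is None or self._purge_task.done():
            self._purge_task = asyncio.create_task(
                self._run_cache_purge_loop(),
                name="fastapi-tenancy:l1-cache-purge",
            )

    async def close(self):
        task, self._purge_task = self._purge_task, None
        if task is not None:
            task.cancel()
            try:
                await task  # wait until the loop has actually stopped
            except asyncio.CancelledError:
                pass
```

Awaiting the cancelled task before tearing anything else down ensures the loop can no longer touch the cache once close() returns.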
TenantContext.reset_all() (core/context.py)
- New static method reset_all(tenant_token, metadata_token) that calls
_tenant_ctx.reset(tenant_token) and _metadata_ctx.reset(metadata_token) atomically.
Counterpart to the updated clear(), enabling safe nested-scope context management
at any call depth.
Fixed
FIX-1 — DatabaseIsolationProvider._creation_locks grows without bound
(isolation/database.py)
- Replaced dict[str, asyncio.Lock] with
weakref.WeakValueDictionary[str, asyncio.Lock]. Entries are garbage-collected
automatically when no coroutine holds
a live reference, bounding the dict to the number of actively contested tenants
at any moment. The local variable tenant_lock inside _get_engine keeps the lock
strongly referenced for the critical section, preventing premature GC between the
WeakValueDictionary lookup and acquiring the lock. All manual pop cleanup calls
were removed.
FIX-2 — Metadata merge loses updates under PostgreSQL concurrent writes
(storage/database.py)
- _update_metadata_pg(): the SERIALIZABLE transaction correctly aborts one of N
concurrent writers with SerializationError (pgcode 40001). The previous
implementation propagated this error as TenancyError, making update_metadata
non-functional under any realistic write concurrency.
- Fix: Retry loop, up to 5 attempts with 5 ms base exponential back-off. The
competing transaction has already committed by the time the error is received, so
retries succeed immediately in practice. Detection is class-name-based to avoid
a hard asyncpg import.
- The three-transaction corruption-recovery pattern (optimistic attempt → reset →
re-merge) is replaced by a single SERIALIZABLE transaction with an inline
CASE … WHEN tenant_metadata IS NULL OR … THEN '{}'::jsonb ELSE … END guard
that handles NULL, empty-string, and non-JSON values server-side with no round-trip.
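The retry policy in the fix (5 attempts, 5 ms base, exponential back-off, class-name detection) can be sketched as below. The SerializationError class here is a local stand-in that merely shares a name with the driver's exception, and retry_serializable and is_serialization_failure are hypothetical helper names, not functions from the library.

```python
import asyncio


class SerializationError(Exception):
    """Local stand-in that shares its name with the driver's exception class."""


def is_serialization_failure(exc):
    # Compare class names across the MRO so the driver never has to be imported.
    return any(cls.__name__ == "SerializationError" for cls in type(exc).__mro__)


async def retry_serializable(op, attempts=5, base_delay=0.005):
    """Run op(), retrying serialization failures with exponential back-off."""
    for attempt in range(attempts):
        try:
            return await op()
        except Exception as exc:
            last_attempt = attempt == attempts - 1
            if last_attempt or not is_serialization_failure(exc):
                raise
            # Delays grow as 5 ms, 10 ms, 20 ms, 40 ms with the defaults.
            await asyncio.sleep(base_delay * (2 ** attempt))
```

Errors that are not serialization failures propagate on the first attempt, so unrelated bugs are not masked by the retry loop.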
FIX-3 — TenantContext.clear() and clear_metadata() discard tokens
(core/context.py)
- Both methods previously called set(None) and discarded the returned Token,
making it impossible to restore the previous state. In nested scopes (a test
fixture inside a tenant_scope, background tasks, or middleware wrapping an
outer tenant scope) the outer tenant was permanently erased.
- Fix: clear() now returns (tenant_token, metadata_token). clear_metadata()
returns its Token. Existing callers that ignore return values are unaffected.
FIX-4 — TenancyManager reads L1 cache config via fragile getattr fallback
(manager.py)
- TenancyManager.__init__ previously used getattr(config, "l1_cache_max_size", 1000)
and getattr(config, "l1_cache_ttl_seconds", 60). These fields did not
exist on TenancyConfig. Users could not configure them via environment variables
or programmatic construction, and any typo in the field name silently fell through
to the hardcoded default.
- Fix: Both fields added to TenancyConfig (see Added above). TenancyManager
now reads config.l1_cache_max_size and config.l1_cache_ttl_seconds directly.
FIX-5 — MSSQL destroy_tenant dynamic SQL uses raw string concatenation
(isolation/schema.py)
- The T-SQL block that drops all tables in a schema before dropping the schema itself
previously used 'DROP TABLE [' + :schema + '].[' + TABLE_NAME + '];', which is raw string
concatenation for TABLE_NAME with no quoting.
- Fix: Replaced with QUOTENAME(TABLE_SCHEMA) + N'.' + QUOTENAME(TABLE_NAME).
Both identifiers are now bracket-quoted by SQL Server's built-in QUOTENAME()
function. :schema remains a bound parameter for the WHERE TABLE_SCHEMA = :schema
predicate. AND TABLE_TYPE = N'BASE TABLE' guard added to exclude views.
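To see why concatenation was unsafe, the following models in Python what bracket quoting does: a closing bracket inside the identifier is doubled, so it can no longer terminate the quoted name. The quotename helper is a small emulation written for this example (the real work happens server-side in T-SQL), and the table name is a contrived input.

```python
def quotename(identifier):
    """Emulate default bracket quoting: wrap in [] and double any ] inside."""
    return "[" + identifier.replace("]", "]]") + "]"


def naive_drop(schema, table):
    # The old shape: brackets added by hand, nothing escaped.
    return "DROP TABLE [" + schema + "].[" + table + "];"


def quoted_drop(schema, table):
    # The new shape: both identifiers pass through the quoting function.
    return "DROP TABLE " + quotename(schema) + "." + quotename(table) + ";"


hostile = "t]; DROP TABLE x; --"
print(naive_drop("s", hostile))   # the ] closes the name early
print(quoted_drop("s", hostile))  # the ] is doubled and stays inside the name
```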
FIX-6 — Rate-limit Lua sorted-set member collision (manager.py)
- ZADD key now now used the float timestamp as both score and member. Two requests
arriving within the same microsecond produce an identical float value; the second
ZADD overwrote the first entry rather than adding a new one, under-counting the
window and allowing an extra request past the limit.
- Fix: Each call to check_rate_limit() generates
member = f"{now}:{uuid.uuid4().hex}". Score remains now for time-based
eviction; the UUID suffix guarantees per-request uniqueness. The Lua script
receives member as ARGV[5].
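The collision is easy to reproduce with a dictionary standing in for the sorted set, since sorted-set members are unique and re-adding one only updates its score. The zadd function below is a toy model for this illustration, not a Redis client call, and the timestamp is an arbitrary example value.

```python
import uuid


def zadd(sorted_set, score, member):
    """Toy model of ZADD: members are unique, so a repeat overwrites."""
    sorted_set[member] = score


now = 1712045000.123456  # two requests that observe the same timestamp

# Old behaviour: the timestamp doubles as the member, so the entries collide.
old_window = {}
zadd(old_window, now, str(now))
zadd(old_window, now, str(now))

# New behaviour: same score, but a UUID suffix makes each member distinct.
new_window = {}
for _ in range(2):
    member = f"{now}:{uuid.uuid4().hex}"
    zadd(new_window, now, member)

print(len(old_window), len(new_window))
```

The old window counts one request where two arrived; the new window counts both while still sorting by the shared score.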
Changed
- TenancyConfig.cache_ttl: description updated to clarify this is the
Redis write-through cache TTL (SETEX expiry), distinct from
l1_cache_ttl_seconds (in-process LRU TTL). Both fields were previously conflated.
- TenancyManager.initialize(): now starts the L1 purge background task and
logs its interval. Docstring updated to document all four startup steps.
- TenancyManager.close(): cancels and awaits the purge task before disposing
the isolation provider and store.
- TenantContext.clear(): return type changed from None to
tuple[Token[Tenant | None], Token[dict[str, Any] | None]]. Fully backward
compatible: existing callers that ignore the return value continue to work.
- TenantContext.clear_metadata(): return type chang...
fastapi-tenancy v0.3.0
[0.3.0] — 2026-03-20
Security hardening, field-level encryption, L1 cache wired into every request,
MSSQL schema isolation fix, anti-enumeration resolver, and a complete CI rewrite
with MSSQL in the integration tier.
Added
Field-level encryption (utils/encryption.py)
- TenancyEncryption: Fernet/HKDF-SHA256 implementation. Encrypts database_url
and any metadata key prefixed _enc_ at rest. Ciphertext is prefixed enc:: for
rolling-migration compatibility (plain values pass through unchanged on read).
- Key material is derived via HKDF-SHA256: callers supply any 32+ char passphrase;
the library derives a proper 32-byte Fernet key internally, preventing weak-key attacks.
- TenancyManager encrypts on register_tenant() write and decrypts transparently
via decrypt_tenant().
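As a rough, standard-library-only illustration of the two ideas above, the sketch below derives a 32-byte key with HKDF-SHA256 (RFC 5869) and applies the enc:: pass-through rule on read. The salt and info values are placeholders chosen for this example; the notes do not state which values the library uses, and the function names here are hypothetical.

```python
import base64
import hashlib
import hmac


def hkdf_sha256(ikm, salt, info, length=32):
    """HKDF with SHA-256: extract a pseudorandom key, then expand it."""
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_fernet_key(passphrase):
    # Salt and info are assumptions made for this sketch only.
    raw = hkdf_sha256(passphrase.encode(), salt=b"\x00" * 32, info=b"example-context")
    # A Fernet key is 32 raw bytes in URL-safe base64 form.
    return base64.urlsafe_b64encode(raw)


def read_value(stored, decrypt):
    """Rolling-migration read: only values carrying the enc:: prefix are decrypted."""
    if stored.startswith("enc::"):
        return decrypt(stored[len("enc::"):])
    return stored
```

Because unprefixed values pass through untouched, rows written before encryption was enabled remain readable while new writes are encrypted.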
L1 cache wired into every request (manager.py)
- _CachingStoreProxy: transparent proxy that wraps any TenantStore with the
in-process TenantCache. Intercepts get_by_identifier() (the hot path on every
request) and serves from L1 on warm hits. Automatically invalidates on create,
update, set_status, and delete. Previously TenantCache existed but was
never connected to the request path.
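The read-through and invalidate-on-write behaviour described above might be sketched like this. InMemoryStore and CachingStoreProxy are hypothetical classes for illustration: the cache is a plain dict with no TTL or size limit, and only update is shown among the invalidating writes.

```python
import asyncio


class InMemoryStore:
    """Hypothetical backing store that counts how often it is actually read."""

    def __init__(self):
        self.rows = {}
        self.reads = 0

    async def get_by_identifier(self, identifier):
        self.reads += 1
        return self.rows.get(identifier)

    async def update(self, identifier, tenant):
        self.rows[identifier] = tenant


class CachingStoreProxy:
    def __init__(self, store):
        self._store = store
        self._l1 = {}  # simplified L1 cache: no TTL, no eviction

    async def get_by_identifier(self, identifier):
        if identifier in self._l1:
            return self._l1[identifier]  # warm hit, store untouched
        tenant = await self._store.get_by_identifier(identifier)
        if tenant is not None:
            self._l1[identifier] = tenant
        return tenant

    async def update(self, identifier, tenant):
        await self._store.update(identifier, tenant)
        self._l1.pop(identifier, None)  # invalidate so the next read is fresh

    def __getattr__(self, name):
        # Anything not intercepted is forwarded to the wrapped store.
        return getattr(self._store, name)
```

Invalidating rather than updating the cached entry on write keeps the proxy simple: the next read repopulates it from the store.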
Observability
- TenancyManager.get_metrics(): runtime snapshot of L1 cache hit rate / size and
engine cache size (DATABASE isolation). Designed for wiring to a /metrics endpoint.
CI / workflows
- MSSQL added to the integration job tier using the custom image from
compose/mssql/ (built via docker build + docker run steps since GitHub
Actions services: does not support build: context). This is the same image
used by make test-all locally, ensuring parity between local and CI runs.
- ci.yml: path-filtered job graph. Docs-only PRs skip all test/lint jobs.
Integration enforces --cov-fail-under=85 which is achievable with MSSQL
included. E2E (PostgreSQL × 2 versions, MySQL) runs only on main push or
PRs labelled run-e2e.
- docs.yml: separate workflow for MkDocs build + GitHub Pages deploy,
triggered only when docs paths change.
- release.yml: version/CHANGELOG validation before build, pre-release detection
(→ TestPyPI), PyPI Trusted Publishing (OIDC, no stored API token).
- codeql.yml: triggered only on src/** changes plus weekly schedule.
- dependency-review.yml: blocks PRs introducing CVEs ≥ moderate or GPL/AGPL deps.
- ci-pass gate job: single required status for branch protection.
Fixed
FIX-1 — _creation_locks leak on engine creation failure (isolation/database.py)
- DatabaseIsolationProvider._get_engine(): wrapped create_async_engine() in
try/except. On failure the per-tenant lock is removed from _creation_locks
before re-raising. Previously the lock leaked permanently, blocking all retries
for that tenant until process restart.
FIX-2 — MSSQL schema isolation (isolation/schema.py)
- _mssql_schema_session(): replaced ALTER USER CURRENT_USER WITH DEFAULT_SCHEMA
(permanently forbidden for the dbo principal, error 15150) with SQLAlchemy's
schema_translate_map execution option. Every unqualified ORM table reference is
rewritten to [schema].[table] at SQL-generation time with no DDL required.
- _initialize_mssql_schema(): uses MetaData(schema=schema) + create_all to
generate CREATE TABLE [schema].[table] without touching any database user.
- get_session() now has an explicit elif self.dialect == DbDialect.MSSQL branch.
FIX-3 — Tenant enumeration via header resolver (resolution/header.py)
- All failure modes (missing header, invalid identifier format, unknown tenant)
raise TenantResolutionError with the same generic reason "Tenant not found".
- Unknown tenant now produces a 400 response (not 404) to prevent status-code-based
enumeration of valid tenant identifiers.
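The uniform-failure idea can be sketched as a resolver in which every rejection path raises the same exception with the same message and status. The header name, the identifier pattern, and the KNOWN set are assumptions made for this example; only the exception name, the message, and the 400 status come from the notes.

```python
import re


class TenantResolutionError(Exception):
    status_code = 400  # same status for every failure mode

    def __init__(self):
        super().__init__("Tenant not found")


_IDENT = re.compile(r"^[a-z0-9][a-z0-9-]{1,62}$")  # assumed identifier format
KNOWN = {"acme"}  # stands in for the tenant store


def resolve(headers):
    identifier = headers.get("X-Tenant-ID")  # header name is an assumption
    if identifier is None or not _IDENT.match(identifier) or identifier not in KNOWN:
        # Missing, malformed, and unknown all look identical to the caller.
        raise TenantResolutionError()
    return identifier
```

Since the response carries no hint about which check failed, a client probing identifiers cannot distinguish a well-formed but unknown tenant from a malformed one.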
FIX-4 — search() ILIKE not portable to MSSQL (storage/database.py)
- SQLAlchemyTenantStore.search() branches on self._dialect: MSSQL uses .like()
(case-insensitive by default with CI collation); all other dialects keep .ilike().
FIX-5 — _prefix_session docstring gap (isolation/schema.py)
- Added .. warning:: block documenting that session.info["table_prefix"] persists
on the session object across transactions but is not automatically set on new
AsyncSession instances created manually within the same request.
FIX-6 — SET LOCAL search_path implicit-transaction collision (isolation/schema.py)
- _schema_session(): replaced session.connection() (which starts an implicit
transaction, causing InvalidRequestError: A transaction is already begun when
callers open async with session.begin()) with an engine-level begin event
listener on engine.sync_engine. The listener fires before any transaction starts
and is removed in finally to prevent cross-session leakage.
FIX-7 — Flaky test_multiple_requests_each_get_fresh_session (tests/test_dependencies.py)
- Replaced id(session) comparison (unreliable: CPython reuses memory addresses for
non-overlapping objects) with sessions_seen[0] is not sessions_seen[1] while
keeping both references alive simultaneously.
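The difference between the two comparisons is shown below with a placeholder Session class and a make_session factory, both hypothetical. The fragile variant is included only for contrast; whether its two ids happen to match depends on the interpreter and allocator, so no result is claimed for it.

```python
class Session:
    pass


def make_session():
    return Session()


# Reliable: both objects are alive at the same time, so they must be distinct.
sessions_seen = [make_session(), make_session()]
distinct = sessions_seen[0] is not sessions_seen[1]

# Fragile: each object may be freed before the next one is created, so the
# interpreter is free to hand out the same address (and therefore id) again.
ids_seen = [id(make_session()), id(make_session())]
```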
Changed
- AuditLogWriter promoted to runtime Protocol: moved out of TYPE_CHECKING,
decorated with @runtime_checkable. isinstance(writer, AuditLogWriter) now works
correctly at runtime.
- enable_metrics wired: TenancyManager.get_metrics() exposes runtime metrics.
Previously the config field was declared but nothing consumed it.
- README: complete rewrite with live CI and Codecov badges, feature table, all
four isolation strategies with code examples, encryption and L1 cache usage,
observability section, DB compatibility matrix.
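For the AuditLogWriter item in the list above, a runtime-checkable protocol can be sketched as follows. The write method and its signature are hypothetical, since the notes do not list the protocol's members; StdoutWriter and NotAWriter exist only for the demonstration.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class AuditLogWriter(Protocol):
    # Method name and signature are assumptions made for this sketch.
    async def write(self, event: dict[str, Any]) -> None: ...


class StdoutWriter:
    async def write(self, event):
        print(event)


class NotAWriter:
    pass


print(isinstance(StdoutWriter(), AuditLogWriter))
print(isinstance(NotAWriter(), AuditLogWriter))
```

Note that a runtime isinstance check against a protocol only verifies that the named members exist; it does not validate their signatures.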
v0.2.0
Release 0.2.0 — first public release