Reduce global lock contention in Factory resolution #365
Split resolve() into a fast path (plan cache + scope cache, no global lock) and a slow path (first resolve only, minimal lock hold time). Add per-key creation locks for singleton/cached scope, move graph scope and DEBUG state to TLS, and make @LazyInjected thread-safe.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
@hmlongco What are your thoughts? This is quite a big change, but we noticed lock contention caused by Factory resolutions in our app, and this brings it down significantly with no changes on the consumer side.
- Replace CFAbsoluteTimeGetCurrent (Darwin-only, wall clock) with a cross-platform currentTimestamp() helper using CLOCK_MONOTONIC
- Add makePthreadKey(destructor:) to handle the UnsafeMutableRawPointer optionality difference between Darwin and Linux
- Monotonic clock is correct for TTL: immune to NTP and clock adjustments
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Looking at it. Not sure of ramifications yet, especially in regard to cross-container evaluations and other race conditions. Just how many cross-threaded resolutions are you doing on launch, anyway? On a side note, the property wrapper mentions "until Factory supports macros". Are you aware of: https://github.com/hmlongco/Factory/tree/macros#factory-macros
|
Attempted the following since it breaks fewer contracts, but performance with multiple locks is even worse. The globalRecursiveLock is extremely fast, and the only way I could get performance to be worse in the stress test is to add delay loops to the initializers (duplicating a ton of heavy work done on init), which shouldn't be occurring in the first place.
Thanks for your time. I'll check out your branch and give it a try. The main problem on our end is how we use Factory. Some of our registrations can be pretty expensive — file I/O, waiting on the keychain and so on. Currently, it's all contended by the global lock. I understand the design philosophy though and that initializers should be slim. I'm torn between fixing the contention in Factory itself or slimming down our initializers. Unfortunately, keeping |
|
Updates locks branch |
|
Your feature/reduce-global-lock-contention
locks
Review
I noticed an issue we need to address, though. With Result: two "singletons" are created. Thread A receives #1 but is not cached. #2 is cached for everyone else. If the singleton has side effects on init (or keeps state), there is a bug.
Claude also found two other issues which look legit.
Graph scope depth is process-global (cross-thread interference)
Location:
internal func enter() {
    lock.withLock { depth += 1 }
}
internal func leave() {
    lock.withLock {
        depth -= 1
        if depth == 0 { cache.reset() }
    }
}
public private(set) var depth: Int = 0 // ← single Int, shared by all threads
internal var cache = Cache() // ← single Cache, shared by all threads
Every Failure scenario:
|
|
Updated locks branch to fix race on cache miss
|
Locks is now in develop as 3.1
Very cool! I have more time next week to look into this and test it in a big app. The decorator change you made is not quite clear to me. It's unrelated to the locking, but does the (default) decorator no longer run when a decorator is set explicitly? Oh, and about the swift-atomics dependency: I'm not sure it's worth adding just for the one file. A regular lock would work, too. However, most apps will include swift-atomics anyway, as it's very widely used nowadays. 🤷
|
If you set a default scope on a container that scope is used unless a given factory overrides it. If you set a default decorator on a container that decorator is used unless a given factory overrides it.


Summary
This PR eliminates the global recursive lock as a bottleneck during factory resolution. The current implementation holds globalRecursiveLock for the entire duration of resolve(), including while factory closures execute. This serializes all resolution across all threads, even when factories are independent and cached values are available.
The new design splits resolution into a fast path (cache hits, no global lock) and a slow path (first resolve only, minimal lock duration), achieving significant throughput improvement on concurrent workloads with realistic factory costs.
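As a rough illustration of the fast/slow split, the sketch below reads a cache under a short lock and falls back to a per-key creation lock with a second check only on a miss. It is not Factory's implementation: OnceCache, CallCounter, and factoryCallCount are hypothetical names, and NSLock stands in for the PR's own lock types.

```swift
import Foundation
import Dispatch

// Hypothetical cache for illustration only (not Factory's Scope.Cache).
final class OnceCache: @unchecked Sendable {
    private let mapLock = NSLock()                    // short-lived lock around the dictionaries
    private var values: [String: Any] = [:]
    private var creationLocks: [String: NSLock] = [:]

    private func cached(_ key: String) -> Any? {
        mapLock.lock(); defer { mapLock.unlock() }
        return values[key]
    }

    private func creationLock(for key: String) -> NSLock {
        mapLock.lock(); defer { mapLock.unlock() }
        if let existing = creationLocks[key] { return existing }
        let created = NSLock()
        creationLocks[key] = created
        return created
    }

    func resolve<T>(_ key: String, factory: () -> T) -> T {
        // Fast path: a cache hit takes only the short map lock.
        if let hit = cached(key) as? T { return hit }

        // Slow path: serialize creation for this key only.
        // The lock is non-recursive, so a factory that resolves its own key would deadlock.
        let keyLock = creationLock(for: key)
        keyLock.lock(); defer { keyLock.unlock() }

        // Second check: another thread may have created the value while we waited.
        if let hit = cached(key) as? T { return hit }

        let instance = factory()                      // runs without the map lock held
        mapLock.lock()
        values[key] = instance
        creationLocks[key] = nil                      // prune the creation lock after use
        mapLock.unlock()
        return instance
    }
}

// Thread-safe counter used to observe how often the factory runs.
final class CallCounter: @unchecked Sendable {
    private let lock = NSLock()
    private var count = 0
    func increment() { lock.lock(); count += 1; lock.unlock() }
    var value: Int { lock.lock(); defer { lock.unlock() }; return count }
}

// Resolves the same key from eight concurrent workers and reports the factory call count.
func factoryCallCount() -> Int {
    let cache = OnceCache()
    let calls = CallCounter()
    DispatchQueue.concurrentPerform(iterations: 8) { _ in
        _ = cache.resolve("service") { () -> String in
            calls.increment()
            Thread.sleep(forTimeInterval: 0.01)       // simulate an expensive initializer
            return "instance"
        }
    }
    return calls.value
}
```

Workers resolving other keys are never blocked by the sleeping factory in this sketch, because the only shared lock is held just long enough to read or write a dictionary entry.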
Motivation
In apps with 100+ services resolving at launch across multiple threads (e.g., app startup on modern multi-core devices), the global lock creates a convoy effect: threads queue up waiting for unrelated factories to complete. Every microsecond a factory closure spends reading a config, opening a database, or decoding JSON blocks every other thread in the process from resolving anything.
Design
Resolution Plan Cache
On first resolve, a lightweight ResolutionPlan struct is built and cached per-key on the ContainerManager. Subsequent resolves read the plan under a CrossPlatformLock (nanoseconds) and go directly to the scope cache, bypassing the global lock, options dictionary lookups, and registration checks entirely.
Plans are invalidated on any write operation: .register {}, .reset(), .decorator(), .push()/.pop(), scope changes, and context modifications. A monotonic generation counter prevents stale deferred plan stores from racing with concurrent mutations.
Lock Hierarchy
No inversion possible — fast path never touches the global lock, slow path acquires in consistent order.
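The generation guard described under Resolution Plan Cache can be sketched as follows. This is an assumption-laden illustration, not the PR's code: PlanCache, its method names, the scopeName field, and staleStoreIsRejected are all hypothetical, and NSLock stands in for CrossPlatformLock.

```swift
import Foundation

// Hypothetical stand-in for the PR's ResolutionPlan; the field is illustrative only.
struct ResolutionPlan {
    let scopeName: String
}

// Hypothetical plan cache: one small lock guards both the plans and the generation counter.
final class PlanCache: @unchecked Sendable {
    private let lock = NSLock()
    private var plans: [String: ResolutionPlan] = [:]
    private var generation: UInt64 = 0

    // Fast path: a short critical section, no global lock involved.
    func plan(for key: String) -> ResolutionPlan? {
        lock.lock(); defer { lock.unlock() }
        return plans[key]
    }

    // Slow path, step 1: remember the generation before building a plan.
    func currentGeneration() -> UInt64 {
        lock.lock(); defer { lock.unlock() }
        return generation
    }

    // Slow path, step 2: store the plan only if no write happened in the meantime.
    @discardableResult
    func store(_ plan: ResolutionPlan, for key: String, ifGeneration expected: UInt64) -> Bool {
        lock.lock(); defer { lock.unlock() }
        guard generation == expected else { return false }   // stale plan, drop it
        plans[key] = plan
        return true
    }

    // Called from every write path (register, reset, decorator, push/pop, ...).
    func invalidateAll() {
        lock.lock(); defer { lock.unlock() }
        generation &+= 1
        plans.removeAll()
    }
}

// Simulates a registration landing between "build plan" and "store plan".
func staleStoreIsRejected() -> Bool {
    let cache = PlanCache()
    let observed = cache.currentGeneration()
    cache.invalidateAll()
    let stored = cache.store(ResolutionPlan(scopeName: "singleton"),
                             for: "MyService", ifGeneration: observed)
    return !stored && cache.plan(for: "MyService") == nil
}
```

A resolver that sees a rejected store simply takes the slow path again on its next call, so a stale plan can never outlive the mutation that invalidated it.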
Per-Key Creation Locks
Singleton and cached scopes use a per-key MutexLock (sleeping, non-recursive) with double-checked locking to guarantee factory-called-once semantics without holding the global lock during factory execution. Creation locks are auto-pruned after use.
Thread-Local State
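One way to picture per-thread graph scope state is the sketch below, which keeps depth and cache in Foundation's Thread.current.threadDictionary so that enter() and leave() on one thread cannot reset another thread's graph. The names GraphState, GraphScopeTLS, IntBox, and workerDepthAfterEnterAndLeave, as well as the dictionary key, are assumptions made for this example; the PR's commit notes mention a pthread key helper, so its actual storage mechanism likely differs.

```swift
import Foundation
import Dispatch

// Hypothetical per-thread graph state (illustration only).
final class GraphState {
    var depth = 0
    var cache: [String: Any] = [:]
}

enum GraphScopeTLS {
    // Assumed key name; any unique string works.
    private static let key = "sketch.graphScope.state"

    // Returns the calling thread's state, creating it on first use.
    static var current: GraphState {
        let dictionary = Thread.current.threadDictionary
        if let state = dictionary[key] as? GraphState { return state }
        let state = GraphState()
        dictionary[key] = state
        return state
    }

    static func enter() { current.depth += 1 }

    static func leave() {
        let state = current
        state.depth -= 1
        if state.depth == 0 { state.cache.removeAll() }   // resets only this thread's cache
    }
}

// Small box for handing a result back from the worker thread.
final class IntBox: @unchecked Sendable { var value = -1 }

// Runs enter()/leave() on a separate thread and returns that thread's final depth.
func workerDepthAfterEnterAndLeave() -> Int {
    let box = IntBox()
    let done = DispatchSemaphore(value: 0)
    Thread.detachNewThread {
        GraphScopeTLS.enter()
        GraphScopeTLS.leave()
        box.value = GraphScopeTLS.current.depth
        done.signal()
    }
    done.wait()
    return box.value
}
```

No lock is needed around depth or cache here, since each thread only ever touches its own GraphState.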
@MainActorLazyInjected
New lock-free property wrapper for @MainActor-isolated types. Omits all locking since main actor isolation already guarantees single-threaded access: zero overhead for the common case of view models and UI services.
This is a pragmatic compromise. Ideally, the @LazyInjected macro could inspect the enclosing declaration's actor isolation at compile time and expand to the correct code automatically, with no wrapper choice, no lock allocation, and no per-access lock overhead. However, Swift macros depend on SwiftSyntax, which significantly increases build times and still has reliability issues with prebuilt binary support. Until the macro ecosystem matures, a separate property wrapper is the simplest correct solution.
New Lock Backends (prepared, not yet active)
The codebase includes conditional implementations for modern lock primitives behind compiler flags (SynchronizationLock, AllocatedLock). These are not enabled in this PR: SPM package traits cannot conditionally raise the deployment target, so enabling them would either break consumers on older OS versions or require a hard minimum bump. They're ready to activate when Factory's deployment target is raised in a future release.
- SynchronizationLock: uses Swift 6 Mutex (zero-cost inline lock, requires iOS 18+ / macOS 15+)
- AllocatedLock: uses OSAllocatedUnfairLock (single allocation, requires iOS 16+ / macOS 13+)
- os_unfair_lock via UnsafeMutablePointer (all OS versions)
Profiling (Instruments)
Measured with os_signpost intervals around resolve() and global lock acquisition in a real app resolving ~1,000 factories during launch.
Baseline (main)
This PR
Key Improvements
What's Changed
- resolve(), ResolutionPlan struct, generation-guarded deferred plan store for circular dependency safety
- Scope.Cache is now internally synchronized (CrossPlatformLock + per-key MutexLock for exclusive creation with auto-prune), Graph scope uses TLS
- ContainerManager with generation counter, invalidation calls on reset/push/pop/decorator/defaultScope
- ThreadLocalDebugState replacing global mutable vars
- CrossPlatformLock (renamed from SpinLock) with conditional backends (prepared, not active), new MutexLock type
- @MainActorLazyInjected property wrapper (lock-free)
- .dynamic from FactoryTesting
Breaking Changes
None. Public API is unchanged. Internal behavior is semantically equivalent with the documented eventual-consistency caveat on concurrent registration (which was already racy by design in the original — the global lock only prevented crashes, not deterministic ordering).
Risks
- resolve() in flight during a .register {} may return the old value. This was already true before (the global lock prevented crashes but not deterministic ordering of concurrent read/write). Now it's documented explicitly.
- invalidatePlan/invalidateAllPlans), the fast path could serve stale data. Mitigated by invalidating on all known write paths and by the generation counter on deferred stores.
Testing
Declaration
The program was tested solely for our own use cases, which might differ from yours.
Link to provider information
https://github.com/mercedes-benz