Skip to content

[WIP] Refactor typed object factory to use async coroutines#30

Draft
Copilot wants to merge 1 commit intoclaude/async-typed-object-factory-CZYhzfrom
copilot/sub-pr-29
Draft

[WIP] Refactor typed object factory to use async coroutines#30
Copilot wants to merge 1 commit intoclaude/async-typed-object-factory-CZYhzfrom
copilot/sub-pr-29

Conversation

Copy link

Copilot AI commented Feb 20, 2026

Thanks for the feedback on #29. I've created this new PR, which merges into #29, to address your comment. I will work on the changes and keep this PR's description up to date as I make progress.

Original PR: #29
Triggering comment (#29 (comment)):

@copilot review this PR.

The goal is to undertake the broader refactoring to get from blocking to async using coroutines.

Here's a summary of the issue (taken from the codebase):

   /*
    * THREADING MODEL - READ BEFORE MODIFYING
    * More detail in ORB-1077
    *
    * Background
    * ----------
    * TypedObjectFactory uses per-field `lazy` delegates (LazyThreadSafetyMode.SYNCHRONIZED)
    * to handle cross-field dependencies during expression evaluation (e.g. `total = quantity * price`).
    * This was intentional: ConcurrentHashMap.computeIfAbsent is not re-entrant, so lazy was chosen
    * to allow safe recursive field access within a single factory instance.
    *
    * The problem
    * -----------
    * Several methods (handleTypeNotFound, evaluateLambdaExpression, queryForParentType) bridge
    * async operations back to synchronous code via runBlocking { }. When these are invoked from
    * a DefaultDispatcher worker thread, the worker blocks waiting for a coroutine to complete.
    * That coroutine also needs a DefaultDispatcher thread to run. Under concurrent load (multiple
    * factory instances building simultaneously), all DefaultDispatcher workers become blocked,
    * the coroutines they're waiting on can never be scheduled, and the system stalls indefinitely.
    * This was confirmed by a production thread dump showing all DefaultDispatcher workers parked
    * in runBlocking { } inside handleTypeNotFound, each holding a lazy lock on a different
    * factory instance.
    *
    * Short-term fix
    * --------------
    * All runBlocking { } call sites in this class use a dedicated blockingBridgeDispatcher
    * (unbounded cached thread pool) rather than inheriting the calling thread's dispatcher.
    * This moves the blocking onto threads outside the DefaultDispatcher pool, freeing coroutine
    * workers to complete the async operations being waited on.
    *
    * Why this is still a sticking plaster
    * -------------------------------------
    * The calling thread is still blocked — we've just moved the blockage off the bounded pool.
    * Under sustained high concurrency the bridge pool will grow unboundedly (one thread per
    * concurrent blocked call). This is acceptable in practice because it is bounded by upstream
    * request concurrency, but it is not a principled fix.
    *
    * The correct long-term fix
    * -------------------------
    * Make the entire call chain properly async: replace the lazy + runBlocking pattern with
    * suspend functions (or Deferred), propagating suspension all the way up through getValue(),
    * buildField(), getOrBuild(), and buildAsync(). This eliminates thread blocking entirely —
    * coroutines suspend and yield their workers back to the pool while waiting. This requires
    * changes to the classes that call into TypedObjectFactory and is a non-trivial refactor,
    * hence the bridge approach in the interim.
    */
    ```
    
This PR is attempting the "correct long-term fix" mentioned there.

There are a few changes relating to UI in the PR which just needs rebasing, we can do that later. But, pay close attention to the migration from blockgin to non-blocking, and look for correctness, or any issues that may be lurking here. This is a major refactor of a core part of our system.


💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants