feat: add configurable token cache for customer and lob flows#133
feat: add configurable token cache for customer and lob flows#133vitalykumov wants to merge 8 commits into
Conversation
prashantrakheja
left a comment
There was a problem hiding this comment.
changes look good, some minor feedback, in particular i would like to remove usage of app_tid from token caching because its an optional parameter and likely to be removed going forward..
| Returns: | ||
| System-scoped access token. | ||
| """ | ||
| cached = cache.get_system_token(app_tid) |
There was a problem hiding this comment.
i am not sure if app_tid would be needed for fetching the tokens, is there any other way to do this?
There was a problem hiding this comment.
switched to use client_id
| logger.debug("Loaded %d tool(s) from %s", len(server_tools), dep.ord_id) | ||
| except Exception: | ||
| except Exception as exc: | ||
| unwrapped = _unwrap_exception_group(exc) |
There was a problem hiding this comment.
do u feel the exception block is unusually big? can we put this stuff inside a method?
There was a problem hiding this comment.
Updated logic here - request system_token when needed. Added helper closure to refetch token
| ) | ||
| continue | ||
| except Exception: | ||
| logger.exception( |
There was a problem hiding this comment.
i feel we should not swallow the exception here, if there is a failure we should fail-fast, what's ur take?
There was a problem hiding this comment.
I'll extract logic to a method, good catch.
Regarding exception - it basically follows what was implemented before - log with logger.exception and continue
a003c9b to
9994991
Compare
9994991 to
44aeb27
Compare
|
@vitalykumov, could you take a look at the conflicts? |
I'll try but seems like it would require to change quite a lot... Quick look: |
|
closing in favor of #142 |
Description
Add in-process token cache for both customer and LoB agent flows. Previously every
list_mcp_tools()/call_mcp_tool()call fetched fresh IAS token via mTLS - unnecessary latency in agentic loops.Changes:
_token_cache.py(new):_TokenCache- TTL + LRU eviction for system tokens (key:client_id) and user tokens (key:sha256(user_jwt+"|"+client_id)[:16]). Expiry is fromexpires_in,id_tokenexp claim, or fallback TTL._customer.py:get_system_token_mtls/exchange_user_tokento consult/populate cache. 401 response from MCP server → invalidate stale token + retry once_lob.py:get_system_auth/get_user_authnow check cache before calling BTP Destination Service.get_mcp_tools_lobreuses system auth across all fragments (single fetch per discovery call).call_mcp_tool_lobcaches user token per(fragment, tenant)scope. Both functions retry once on 401 with cache invalidation - same pattern as customer flowagw_client.py:AgentGatewayClientpasses_TokenCacheto both customer and LoB functionsconfig.py: 4 newClientConfigfields added -token_expiry_buffer_seconds(60 s),max_system_token_cache_size(10 s),max_user_token_cache_size(10 s),fallback_token_ttl_seconds(300 s)Type of Change
Please check the relevant option:
How to Test
Describe how reviewers can test your changes:
Checklist
Before submitting your PR, please review and check the following:
Breaking Changes
None. Cache internal to
AgentGatewayClient- existingcreate_client()calls get caching automatically.ClientConfignew fields all have defaults.Additional Notes
Thread safety: GIL makes individual
OrderedDictops atomic, but check-then-set is not. Concurrent coroutines on same key may both miss and both fetch - redundant requests, not corruption. Acceptable for agentic loop use case.401 retry: Both
get_mcp_tools_customerandcall_mcp_tool_customerinvalidate + retry once on 401, handling server-side revocation before token expiry.