Skip to content

feat(providers): hosted-key support for LLM providers (flag-gated, no rate limiting)#5127

Open
TheodoreSpeaks wants to merge 8 commits into
stagingfrom
fix/use-hosted-key-agent
Open

feat(providers): hosted-key support for LLM providers (flag-gated, no rate limiting)#5127
TheodoreSpeaks wants to merge 8 commits into
stagingfrom
fix/use-hosted-key-agent

Conversation

@TheodoreSpeaks

Copy link
Copy Markdown
Collaborator

Summary

  • Route hosted LLM/agent provider calls through the shared hosted-key framework (BYOK-first → platform key pool), gated by the hosted-key-llm flag (default off)
  • Add rate-limiter mode: 'none' so LLMs acquire platform keys with no queue/rate limiting
  • Emit hosted-key metrics (used/cost/failed) for streaming + non-streaming LLM calls, mirroring tools
  • Centralize streaming cost in createStreamingExecution (shared settleStreamingLlmCost), applying the cost multiplier the per-provider streaming paths previously omitted; gemini uses the same helper
  • Per-provider hosting config (envKeyPrefix + byokProviderId); pricing stays per-model via calculateCost
  • Share classifyHostedKeyFailure across tools/providers (fixes auth-error misclassification)

Type of Change

  • New feature

Testing

Tested manually; bun run lint clean, tsc --noEmit 0 errors, check:api-validation:strict passed, unit tests passing (hosted-cost, byok, rate-limiter, streaming-execution, tools)

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

@vercel

vercel Bot commented Jun 18, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
docs Skipped Skipped Jun 18, 2026 5:06pm

Request Review

@cursor

cursor Bot commented Jun 18, 2026

Copy link
Copy Markdown

PR Summary

High Risk
Changes API key selection, billing gates, and streaming cost settlement across many providers; misconfiguration or flag rollout could affect charges, metrics, or which keys are used.

Overview
Adds a hosted-key-llm feature flag (env HOSTED_KEY_LLM, default off) that routes hosted LLM API key resolution through the same framework as tools: BYOK → user key → platform key pool, with mode: 'none' so LLMs skip FIFO queue and per-workspace rate limits.

Key resolution in getApiKeyWithBYOK uses per-provider hosting config (envKeyPrefix, byokProviderId) and returns hostedKeyEnvVar when a pool key is acquired; legacy getRotatingApiKey paths remain when the flag is off or no keys are configured.

Billing and metrics: non-streaming responses bill when a platform hosted key was used (not only legacy hosted-model lists), apply the cost multiplier, and call emitHostedKeyUsage. Streaming uses settleStreamingLlmCost on drain (including Gemini’s bespoke streams) and recordHostedStreamFailure for mid-stream errors. Client-supplied hostedKey is stripped in executeProviderRequest; the server sets it only after acquiring a pool key.

Shared helpers in hosted-cost.ts (calculateHostedCost, classifyHostedKeyFailure, emitHostedKeyUsage) replace duplicated tool logic and improve auth vs rate-limit error classification. The rate limiter probes _1.._N env keys when _COUNT is unset and refactors key selection into selectKey.

Reviewed by Cursor Bugbot for commit 4d27d1b. Bugbot is set up for automated code reviews on this repo. Configure here.

Comment thread apps/sim/lib/api-key/byok.ts
@greptile-apps

greptile-apps Bot commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR routes LLM provider calls through the shared hosted-key framework (BYOK-first → platform key pool via acquireKey with mode: 'none'), gated by the hosted-key-llm feature flag. It centralises streaming cost settlement into settleStreamingLlmCost (applying the cost multiplier the per-provider drain callbacks previously omitted), emits recordUsed/recordCostCharged/recordFailed hosted-key metrics for both streaming and non-streaming LLM calls, and consolidates classifyHostedKeyFailure and calculateHostedCost into a shared hosted-cost module used by both tools and providers.

  • Rate limiter: adds mode: 'none' (NoRateLimit) to HostedKeyRateLimitConfig and a selectKey helper so acquireKey bypasses the FIFO queue and token buckets entirely, plus upgrades resolveEnvKeys to probe _1.._N without requiring _COUNT.
  • Key resolution: getApiKeyWithBYOK gains a flag-gated top-of-function block that checks BYOK first, then acquires from the platform pool, then falls through to the unchanged legacy path.
  • Streaming cost seam: createStreamingExecution's finalizeTiming hook now calls settleStreamingLlmCost when hostedKey is set; Gemini's bespoke drain callbacks call it directly; both paths use output.tokens already written by the provider before finalizeTiming is invoked.

Confidence Score: 4/5

The non-streaming and simple-streaming paths are correctly wired end-to-end; tool-path streaming for most providers still lacks the finalizeTiming call that would settle the hosted-key cost metric and apply the cost multiplier on drain.

The flag-gated key resolution, rate-limiter mode:'none', and centralized cost settlement in settleStreamingLlmCost are all sound. The one structural gap — tool-path streaming providers omitting finalizeTiming in their createStream callbacks — was flagged in a prior review cycle and remains unaddressed.

apps/sim/providers/streaming-execution.ts and the tool-path createStream callbacks in openai/core.ts, baseten/index.ts, fireworks/index.ts, together/index.ts, mistral/index.ts, and ollama/core.ts

Important Files Changed

Filename Overview
apps/sim/lib/api-key/byok.ts Adds the flag-gated hosted-key path to getApiKeyWithBYOK: tries BYOK first, then acquires a platform key via the rate limiter (mode: 'none'), and falls through to legacy on failure. Logic is sound.
apps/sim/providers/streaming-execution.ts Adds settleStreamingLlmCost and wires it into createStreamingExecution's finalizeTiming hook. Correct for the no-tool streaming path; tool-path providers omitting finalizeTiming will never trigger it.
apps/sim/providers/index.ts Adds hosted-key resolution, threads hostedKey to the request, records recordUsed on streaming execution, recordFailed for pre-stream errors, and emitHostedKeyUsage on non-streaming responses. Mid-stream errors not captured in recordFailed.
apps/sim/providers/gemini/core.ts Replaces inline calculateCost with settleStreamingLlmCost in drain callbacks. Token arithmetic preserved; behavior-preserving on the non-hosted path.
apps/sim/lib/api-key/hosted-cost.ts New module centralising calculateHostedCost, classifyHostedKeyFailure, and emitHostedKeyUsage. Tests cover the key branches.
apps/sim/lib/core/rate-limiter/hosted-key/hosted-key-rate-limiter.ts Adds mode:'none' support via a new selectKey helper. Also upgrades resolveEnvKeys to probe _1.._N without requiring _COUNT.
apps/sim/providers/models.ts Adds ProviderHostingConfig and per-provider hosting entries for 8 providers. All byokProviderIds are valid BYOKProviderId values.
apps/sim/tools/index.ts Migrates calculateToolCost and classifyHostedKeyFailure to shared hosted-cost module. Straightforward refactor with no behavior change.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant E as executeProviderRequest
    participant B as getApiKeyWithBYOK
    participant RL as HostedKeyRateLimiter
    participant P as Provider.executeRequest
    participant M as hostedKeyMetrics

    E->>B: provider, model, workspaceId, apiKey, userId
    B->>B: isHosted and workspaceId and flag on?
    alt BYOK workspace key exists
        B-->>E: apiKey isBYOK true
    else platform key available
        B->>RL: acquireKey mode none
        RL-->>B: success key envVarName
        B-->>E: apiKey isBYOK false hostedKeyEnvVar
    else fallback
        B-->>E: legacy key no hostedKeyEnvVar
    end
    E->>P: executeRequest sanitizedRequest
    alt throws
        E->>M: recordFailed
        E-->>E: re-throw
    else StreamingExecution
        E->>M: recordUsed
        Note over P,M: On stream drain
        P->>M: recordCostCharged via settleStreamingLlmCost
    else ProviderResponse
        E->>M: emitHostedKeyUsage
    end
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
    participant E as executeProviderRequest
    participant B as getApiKeyWithBYOK
    participant RL as HostedKeyRateLimiter
    participant P as Provider.executeRequest
    participant M as hostedKeyMetrics

    E->>B: provider, model, workspaceId, apiKey, userId
    B->>B: isHosted and workspaceId and flag on?
    alt BYOK workspace key exists
        B-->>E: apiKey isBYOK true
    else platform key available
        B->>RL: acquireKey mode none
        RL-->>B: success key envVarName
        B-->>E: apiKey isBYOK false hostedKeyEnvVar
    else fallback
        B-->>E: legacy key no hostedKeyEnvVar
    end
    E->>P: executeRequest sanitizedRequest
    alt throws
        E->>M: recordFailed
        E-->>E: re-throw
    else StreamingExecution
        E->>M: recordUsed
        Note over P,M: On stream drain
        P->>M: recordCostCharged via settleStreamingLlmCost
    else ProviderResponse
        E->>M: emitHostedKeyUsage
    end
Loading

Reviews (3): Last reviewed commit: "Merge remote-tracking branch 'origin/sta..." | Re-trigger Greptile

Comment thread apps/sim/providers/index.ts Outdated
Comment on lines +231 to +234
} else if (hostedKeyEnvVar) {
// Hosted key used: record usage now; cost is settled on stream drain
// inside createStreamingExecution (the single streaming cost seam).
hostedKeyMetrics.recordUsed({

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Inaccurate comment — Gemini does not use createStreamingExecution

The comment says "cost is settled on stream drain inside createStreamingExecution (the single streaming cost seam)", but Gemini uses a bespoke createStreamingResult helper and calls settleStreamingLlmCost directly inside its own stream callback. createStreamingExecution is not involved in the Gemini path at all. The comment should note that Gemini settles cost in its own drain callback (or reference both seams).

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

@TheodoreSpeaks

Copy link
Copy Markdown
Collaborator Author

@greptile review

…-agent

# Conflicts:
#	apps/sim/lib/api-key/byok.ts
#	apps/sim/lib/core/config/feature-flags.ts
@TheodoreSpeaks

Copy link
Copy Markdown
Collaborator Author

@greptile review

@TheodoreSpeaks

Copy link
Copy Markdown
Collaborator Author

Addressed review:

  • Bugbot (user key bypassed by pool): getApiKeyWithBYOK now returns a user-provided request.apiKey (not billed) before acquiring a platform pool key, matching tool hosted-key precedence (5f9046e57).
  • Greptile (inaccurate comment): corrected the streaming-cost comment to note gemini settles via its bespoke finalizer, not createStreamingExecution.

Also fixed CI: completed provider test mocks (@/providers/utils isCachedInput; @/lib/core/config/env getEnv/isTruthy/isFalsy) that the new streaming wiring exposed. Full suite green locally (8868 tests).

Comment thread apps/sim/providers/openai/core.ts
tool: response.model,
key: hostedKeyEnvVar as string,
costTotal: response.cost.total,
})

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hosted metrics omit tool costs

Medium Severity

For non-streaming hosted-key responses, emitHostedKeyUsage runs right after token calculateCost, but sumToolCosts is applied to response.cost.total only later. CloudWatch hosted-key cost therefore excludes tool charges even when the returned response total includes them.

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 5f9046e. Configure here.

@TheodoreSpeaks

Copy link
Copy Markdown
Collaborator Author

Two more Bugbot findings reviewed:

  • High — tool streaming skips hosted settlement: valid and fixed (8d6b768c9). The post-tool streaming paths (createStream: ({ output }) => …) never call finalizeTiming(), so hooking settlement there missed them. Moved settlement to fire on actual stream drain in createStreamingExecution (wraps the returned stream; single point, covers simple + post-tool paths for every provider). Added a regression test that a stream which never calls finalizeTiming still settles cost + emits recordCostCharged on drain.

  • Medium — hosted metrics omit tool costs: by design, not a bug. hostedKeyMetrics.recordCostCharged for an LLM call is the LLM token cost (label tool: <model>). Tool costs are emitted separately by each hosted tool's own path (label tool: <tool.id>); folding sumToolCosts into the LLM metric would double-count hosted tools and mis-attribute cost to the model dimension. Streaming and non-streaming are consistent here. response.cost.total (LLM+tools) remains the billing/trace total — that's a separate channel from this CloudWatch observability metric.

Comment thread apps/sim/providers/index.ts Outdated
Comment thread apps/sim/providers/streaming-execution.ts
@TheodoreSpeaks

Copy link
Copy Markdown
Collaborator Author

Addressed the two streaming-metric symmetry findings (ed64de979):

  • The stream-drain wrapper now records a hosted-key failure on mid-stream error (via classifyHostedKeyFailure), so a failed hosted stream emits recordFailed to match the recordUsed fired at dispatch. A client cancel runs neither (it's an abort, not a key failure), and cost is intentionally not settled on failure (no charge for a failed/partial response). Added a regression test for the error path.
  • Net streaming semantics: recordUsed = attempt, recordCostCharged = successful completion, recordFailed = mid-stream error — so completed = Used − Failed (no silent over-count).

Full suite green (8870 tests), tsc 0, lint 16/16.

Comment thread apps/sim/providers/gemini/core.ts
@TheodoreSpeaks

Copy link
Copy Markdown
Collaborator Author

Addressed the gemini failure-metric finding (c96a2f80c) at the root rather than per-provider:

Moved hosted-stream failure recording to the provider-agnostic chokepoint — executeProviderRequest now wraps the returned stream with recordHostedStreamFailure, so a mid-stream error emits recordFailed for every provider, including gemini and any other bespoke (non-createStreamingExecution) stream. Cost-on-success stays settled per-provider; createStreamingExecution now only handles the cost-drain leg. Added/updated tests for the wrapper's success and error paths.

This generalizes the previous per-provider failure handling, so it won't recur for other stream shapes.

Comment thread apps/sim/providers/index.ts
@TheodoreSpeaks

Copy link
Copy Markdown
Collaborator Author

Fixed the High (4d27d1bad): hostedKey is server-only, so sanitizeRequest now strips any client-supplied value up front. executeProviderRequest is the sole authority that sets it — only when it actually acquires a platform pool key — so a client-supplied hostedKey on a BYOK/user-key request can no longer reach streaming settlement (no bogus cost multiplier or hosted-key metrics). Added a regression test asserting the provider never receives a client-supplied hostedKey.

@TheodoreSpeaks

Copy link
Copy Markdown
Collaborator Author

bugbot run

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Bugbot reviewed your changes and found no new issues!

1 issue from previous review remains unresolved.

Fix All in Cursor

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit 4d27d1b. Configure here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant