
feat: Anthropic CoT controls#417

Open
odilitime wants to merge 85 commits into dev from odi-anthro

Conversation

@odilitime
Contributor

@odilitime odilitime commented Mar 27, 2026

  • Align check-types-split with packages/db, packages/lib, and UI components; exclude Storybook stories from the UI chunk.
  • Add check-types:tests to CI; add root scripts for UI, agent-server, and gateway typechecks.
  • packages/ui: path aliases for @/lib, @/db, @/app, @/packages/ui; styled-jsx attrs for <style jsx>.
  • packages/lib: JWKS uses jose KeyLike; runtime-factory imports plugin-sql default with typed adapter factory.
  • gateway-discord: explicit discord.js handler types, connectBot characterId, sendTyping guard, listener map cast.
  • services/agent-server: requireInternalAuth accepts Elysia string/number status.
  • Tests: mocks, RequestInit/fetch preconnect, UUID casts, toMatchObject, financial addCredits, Playwright APIRequestContext, and related fixes.
  • New anthropic-thinking provider helper and unit test.

Note: services/operator (Pepr) is not part of the main workspace typecheck.
Made-with: Cursor


Note

Medium Risk
Medium risk: changes LLM providerOptions and credit reservation estimates across multiple production API routes, which can impact billing and model behavior; also modifies rate limiting presets via a new env multiplier.

Overview
Introduces per-agent Anthropic extended thinking controls by adding packages/lib/providers/anthropic-thinking.ts to resolve thinking budgets from user_characters.settings.anthropicThinkingBudgetTokens with deploy-level defaults (ANTHROPIC_COT_BUDGET) and an optional global cap (ANTHROPIC_COT_BUDGET_MAX), plus helpers to deep-merge providerOptions without clobbering existing gateway/google settings.
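
The resolution order described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the function name `resolveThinkingBudgetSketch`, the input shape, and the choice to treat a non-positive value as "thinking off" are all assumptions.

```typescript
// Hypothetical sketch: names and exact precedence are assumptions, not the PR's code.
interface ThinkingBudgetInputs {
  agentBudget?: number; // from user_characters.settings.anthropicThinkingBudgetTokens
  envDefault?: number; // from ANTHROPIC_COT_BUDGET
  envMax?: number; // from ANTHROPIC_COT_BUDGET_MAX
}

// Returns the effective budget in tokens, or undefined when thinking stays off.
function resolveThinkingBudgetSketch(inputs: ThinkingBudgetInputs): number | undefined {
  const { agentBudget, envDefault, envMax } = inputs;
  // Prefer the per-agent setting, then fall back to the deploy-level default.
  const requested = agentBudget ?? envDefault;
  if (requested === undefined || requested <= 0) return undefined;
  // Apply the optional global ceiling.
  return envMax !== undefined ? Math.min(requested, envMax) : requested;
}
```

With these assumptions, an agent asking for 8000 tokens under a 4000-token cap would receive 4000, and an agent with no setting would inherit the deploy default.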

Wires these options into MCP/A2A agent chat (including credit reservation estimates that account for thinking tokens) and applies env-based thinking merges to several other text routes, while explicitly disabling CoT for background/promo/SEO and image-generation flows to preserve temperature control. Adds env validation for the new variables, updates rate limiting to use production-level limits in dev with a RATE_LIMIT_MULTIPLIER, expands CI/typecheck coverage, and hardens/adjusts unit+integration tests (including Milady pricing mock isolation) with new docs and changelog entries.

Written by Cursor Bugbot for commit 3b0b079. This will update automatically on new commits.

Summary by CodeRabbit

  • New Features

    • Per-agent Anthropic “extended thinking” token budgets for MCP/A2A chats; model calls now honor per-agent and env-derived budgets with optional max cap
    • Global env toggles added to set defaults and optional ceilings
  • Documentation

    • New docs and roadmap entries explaining Anthropic extended-thinking configuration and unit-testing guidance
  • Bug Fixes

    • Rate-limit presets unified to consistent numeric values across environments
  • Chores

    • CI/typecheck coverage extended to include test project types

@vercel

vercel bot commented Mar 27, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
eliza-cloud-v2 Error Error Mar 27, 2026 5:40pm

@coderabbitai

coderabbitai bot commented Mar 27, 2026

Walkthrough

This PR adds Anthropic “extended thinking” (CoT) budget support with per-agent settings and env fallbacks/caps, provider-options merge helpers, widespread integration of those helpers into AI request calls, TypeScript typing and test infrastructure improvements, Milady test rework, a rate-limit preset change, and documentation updates.

Changes

| Cohort | File(s) | Summary |
| --- | --- | --- |
| Anthropic Extended Thinking Core | packages/lib/providers/anthropic-thinking.ts, packages/lib/providers/cloud-provider-options.ts, packages/lib/providers/index.ts, packages/lib/config/env-validator.ts | New Anthropic CoT module: env + per-character parsing, resolution with optional max clamp, provider-options deep-merge helpers, new type CloudMergedProviderOptions, and env validation entries for ANTHROPIC_COT_BUDGET / ANTHROPIC_COT_BUDGET_MAX. |
| Route & Service Integrations | app/api/agents/[id]/a2a/route.ts, app/api/agents/[id]/mcp/route.ts, app/api/v1/chat/completions/route.ts, app/api/v1/chat/route.ts, app/api/v1/messages/route.ts, app/api/mcp/tools/generation.ts, app/api/v1/generate-image/route.ts, app/api/v1/responses/route.ts, packages/lib/services/... (several files) | Spreads merged provider options (Anthropic CoT / Google modalities / gateway prefs) into streamText/generateText calls; MCP/A2A chat handlers now parse character.settings for per-agent budgets and pass computed options into LLM invocations. |
| Image Model Modalities & Google Merge | app/api/mcp/tools/generation.ts, app/api/v1/generate-image/route.ts, packages/lib/services/app-promotion-assets.ts, packages/lib/services/app-promotion.ts, packages/lib/services/seo.ts, packages/lib/services/twitter-automation/app-automation.ts, packages/lib/api/a2a/skills.ts | Replaced inline Google responseModalities with helper that merges Google image/text modalities with Anthropic CoT options; image-generation calls now pick model constants and merged provider options. |
| Provider Options Typing | packages/lib/providers/types.ts, packages/lib/providers/cloud-provider-options.ts, packages/lib/providers/index.ts | Introduces CloudMergedProviderOptions and updates request typings to use the shared provider-options shape; re-exports Anthropic helpers. |
| Env / Config / Scripts | .env.example, packages/scripts/check-types-split.ts, package.json, services/agent-server/package.json, tsconfig.test.json, packages/ui/tsconfig.json | Documented Anthropic env vars in .env.example; added/check-script and typecheck entries; adjusted typecheck splitting and tsconfig path/exclude updates; added agent-server typecheck script. |
| Milady Test Infrastructure & Helpers | packages/tests/helpers/mock-milady-pricing-for-route-tests.ts, packages/tests/helpers/index.ts, packages/tests/unit/*milady*.test.ts, packages/tests/unit/z-milady-billing-route.test.ts | New shared Milady mock helper that spreads real MILADY_PRICING and overrides MINIMUM_DEPOSIT; moved billing test to z-* and refactored DB mocks into registerMiladyBillingMocks() for stable ordering. |
| Unit Tests: Anthropic & Merge Helpers | packages/tests/unit/anthropic-thinking.test.ts | Comprehensive tests covering env parsing, per-character parsing, resolve/clamp logic, anthropicThinkingProviderOptions, and provider-options merge behaviors across Anthropic/Google/gateway cases. |
| Type Safety & Test Improvements | many packages/tests/... files (type-casting fetch mocks, typed fixtures, signature tightening) | Widespread test typing hardening: explicit casts via unknown, typed fixtures for OAuth/Twitter/Google, adjusted helper signatures, assert relaxations where appropriate, and preservation of preconnect via Object.assign wrappers. |
| Rate Limiting | packages/lib/middleware/rate-limit.ts | Removed dev-vs-prod isDevelopment conditionals: RateLimitPresets now use fixed numeric maxRequests values across environments (backing store still differs by REDIS_RATE_LIMITING). |
| Auth & JWKS Typing | packages/lib/auth/jwks.ts, services/agent-server/src/routes.ts | Replaced JoseCryptoKey alias with KeyLike in JWKS API; relaxed requireInternalAuth status typing to `number |
| E2E & Integration Test Adjustments | packages/tests/e2e/setup-server.ts, integration tests under packages/tests/integration/* | Improved server log piping guard, reworked fetchWithServer to preserve preconnect, removed source from some credits calls, and relaxed/updated integration assertions and fixtures. |
| Fixtures & Minor App Changes | packages/tests/fixtures/mcp-test-character.ts, packages/lib/eliza/runtime-factory.ts, packages/services/gateway-discord/src/gateway-manager.ts | Updated test fixture IDs; wrapped DB adapter import with typed local wrapper; strengthened Discord gateway typings and safe sendTyping checks. |
| Docs & Changelog | README.md, CHANGELOG.md, docs/anthropic-cot-budget.md, docs/unit-testing-milady-mocks.md, docs/ROADMAP.md, packages/content/changelog.mdx | Adds Anthropic CoT documentation and internal HOWTOs, README engineering docs links, changelog entries, and roadmap notes. |
| Misc Tests: assertions, typings, refactors | many packages/tests/unit/* files | Multiple tests updated for safer assertions (toMatchObject, guards), stricter typing, fixture helpers, and minor control-flow tweaks; many changes are test-only and type-focused. |
| UI JSX typings | packages/ui/src/styled-jsx.d.ts | Adds React module augmentation for StyleHTMLAttributes<T> with optional jsx and global boolean props. |

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client
    participant Route as MCP/A2A Route
    participant Resolver as CharacterResolver
    participant Anthropic as AnthropicThinking
    participant LLM as LLMProvider

    Client->>Route: POST /api/agents/{id}/mcp (chat)
    Route->>Resolver: resolve character
    Resolver-->>Route: character (includes settings)
    Route->>Anthropic: parseThinkingBudgetFromCharacterSettings(settings)
    Anthropic-->>Route: agentThinkingBudget?
    Route->>Anthropic: mergeAnthropicCotProviderOptions(model, env, budget)
    Anthropic-->>Route: providerOptions fragment
    Route->>LLM: streamText({ model, prompt, ...providerOptions })
    LLM-->>Route: stream response
    Route-->>Client: stream completion
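
The per-agent parsing step in the diagram might look something like the sketch below. It is an illustration under assumptions: the helper name `parseBudgetFromSettingsSketch` is hypothetical, and treating any value that is not a non-negative safe integer as "not set" is a guess at the validation rules, which this page does not show.

```typescript
// Hypothetical sketch of reading a per-agent budget from untyped settings JSON.
const BUDGET_KEY_SKETCH = "anthropicThinkingBudgetTokens";

function parseBudgetFromSettingsSketch(settings: unknown): number | undefined {
  if (typeof settings !== "object" || settings === null) return undefined;
  const raw = (settings as Record<string, unknown>)[BUDGET_KEY_SKETCH];
  // Accept only non-negative safe integers; anything else is treated as "not set".
  if (typeof raw !== "number" || !Number.isSafeInteger(raw) || raw < 0) return undefined;
  return raw;
}
```

Because the value comes from owner-controlled character settings rather than request parameters, a sketch like this only needs to defend against malformed stored JSON, not against hostile callers.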

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🧠 Extended thinking budgets bloom,
Per-agent settings fill the room,
Provider options merge with grace,
TypeScript types find their place,
Milady mocks now stable stand— ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 60.47% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The PR title 'feat: Anthropic CoT controls' directly reflects the main feature added: per-agent Anthropic extended thinking (chain-of-thought) token budget controls and related infrastructure. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |


@claude

claude bot commented Mar 27, 2026


Code Review

This PR adds Anthropic extended thinking (ANTHROPIC_COT_BUDGET) support across all AI routes, fixes TypeScript type errors in tests/packages, adds a check-types:tests CI step, and hardens the Discord gateway with explicit handler types. The implementation is generally solid with good test coverage for the new helper.

Issues:

  1. Silent temperature stripping when thinking is enabled (medium)

app-promotion-assets.ts, app-promotion.ts, twitter-automation/app-automation.ts, and seo.ts pass explicit temperature alongside mergeProviderOptions. When ANTHROPIC_COT_BUDGET is set, @ai-sdk/anthropic silently strips temperature/topP/topK (noted in the v1/chat completions comment). Callers relying on temperature 0.7/0.8 will silently get default sampling - an invisible behavior change. Consider logging a warning at startup when both are configured.

  2. mergeProviderOptions silently overwrites unknown provider keys (low-medium)

The function deep-merges only gateway, anthropic, and google. Any other provider key present in both a and b (e.g. openai, azure) is overwritten by the shallow spread without merging. Worth a comment flagging this limitation.

  3. Rate limit change may break local development (medium)

Removing the isDevelopment override drops STANDARD from 10,000 to 60 req/min in local dev. Developers running hot-reload tests or UI workflows will now hit production limits backed only by in-memory storage. Consider keeping separate dev limits or documenting the change prominently.

  4. runtime-factory.ts default-import cast hides the type (low)

The default import is immediately cast to a manually-written signature. If the upstream @elizaos/plugin-sql signature changes, this cast will silently pass wrong types. Worth a comment explaining the workaround and a TODO to remove it when upstream fixes the type export.

  5. Definite assignment in test teardown (low)

In x402-topup.test.ts, afterAll accesses referralsServiceForTest etc. with definite-assignment assertions and no null guards. If beforeAll throws before all three are assigned, afterAll will also throw and may swallow the original failure. The previous null-guard pattern was safer.
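
The overwrite behaviour flagged for mergeProviderOptions can be illustrated with a small sketch. Everything here is an assumption made for illustration: the function name, the `ProviderOptionsSketch` type, and the placeholder keys `exampleFlag`, `fromA`, and `fromB` are not taken from the PR. Only the three merged provider keys and `gateway.order` come from the review comments.

```typescript
type ProviderOptionsSketch = Record<string, Record<string, unknown>>;

// Provider keys that get a one-level merge; everything else is overwritten by `b`.
const MERGED_KEYS_SKETCH = ["gateway", "anthropic", "google"] as const;

function mergeProviderOptionsSketch(
  a: ProviderOptionsSketch,
  b: ProviderOptionsSketch,
): ProviderOptionsSketch {
  // Shallow spread first: for any provider key present in both, b wins wholesale.
  const out: ProviderOptionsSketch = { ...a, ...b };
  for (const key of MERGED_KEYS_SKETCH) {
    const fromA = a[key];
    const fromB = b[key];
    // Known providers are merged one level deep instead of being replaced.
    if (fromA && fromB) out[key] = { ...fromA, ...fromB };
  }
  return out;
}

const mergedExampleSketch = mergeProviderOptionsSketch(
  { gateway: { order: ["anthropic"] }, openai: { fromA: 1 } },
  { gateway: { exampleFlag: true }, openai: { fromB: 2 } },
);
// gateway keeps both `order` and `exampleFlag`; openai keeps only `fromB`.
```

Adding a new provider to the merged list is a one-line change in a design like this, which is why a comment near the list is a cheap safeguard.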

Positives:

  • parseAnthropicCotBudgetFromEnv is thorough: strict digit regex, safe-integer check, and solid edge-case test coverage.
  • Extracting model strings to named constants before passing to mergeProviderOptions improves readability.
  • Discord sendTyping guard is correct - DMChannel/PartialGroupDMChannel lack that method.
  • Excluding Storybook stories from check-types-split is a good cleanup.
  • Switching to toMatchObject in server-wallets tests is the correct fix for partial-shape assertions.
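
The sendTyping guard mentioned above relies on TypeScript's `in` narrowing. The sketch below uses stand-in channel interfaces rather than discord.js types so that it stays self-contained; the interface and function names are hypothetical.

```typescript
// Stand-in channel shapes; the real code uses discord.js channel types.
interface TypingChannelSketch {
  id: string;
  sendTyping(): Promise<void>;
}
interface PlainChannelSketch {
  id: string;
}
type AnyChannelSketch = TypingChannelSketch | PlainChannelSketch;

async function sendTypingIfSupported(channel: AnyChannelSketch): Promise<boolean> {
  // The `in` check narrows the union to the member that declares sendTyping.
  if ("sendTyping" in channel) {
    await channel.sendTyping();
    return true;
  }
  return false;
}
```

The guard avoids both a cast and a runtime error on channel types that do not expose the method.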

Rename milady billing route test to z-milady-billing-route for script ordering,
use queue-backed db mocks re-registered in beforeEach, and add
mockMiladyPricingMinimumDepositForRouteTests so route tests do not replace
MILADY_PRICING with only MINIMUM_DEPOSIT (which broke billing thresholds in-process).
Update package.json bulk/special unit paths accordingly.

Made-with: Cursor
Extend anthropic-thinking handling, add cloud-provider-options, update provider
types and index exports, and thread safe model params through v1 chat,
completions, messages, responses, generate-image, MCP/A2A, and internal services.
Expand anthropic-thinking unit tests.

Made-with: Cursor
Add engineering docs (unit-testing-milady-mocks, anthropic-cot-budget),
root CHANGELOG.md, README links and accurate test script table, ROADMAP
and site changelog entries. Expand JSDoc and .env.example with rationale
for ANTHROPIC_COT_BUDGET and partial MILADY_PRICING mock pitfalls.

Made-with: Cursor
@odilitime odilitime marked this pull request as ready for review March 27, 2026 08:06
Copilot AI review requested due to automatic review settings March 27, 2026 08:06
@odilitime odilitime changed the title from "fix(types): test project, split check-types, UI package, gateways" to "feat: Anthropic CoT controls" Mar 27, 2026
@claude

claude bot commented Mar 27, 2026

Code Review - PR 417

This PR is a broad type-fix/quality pass with one new functional feature (Anthropic extended thinking via ANTHROPIC_COT_BUDGET). The changes are generally clean. A few things worth discussing:

ISSUES

  1. Rate limiting: dev/prod unification is a breaking DX change (medium)

File: packages/lib/middleware/rate-limit.ts

Removing the isDevelopment branch means local dev now hits the same strict limits as production (60/min STANDARD, 10/min STRICT, 5 per 5-min CRITICAL) backed only by in-memory storage. Developers iterating locally will now hit 429s immediately. The old 10000 req/window dev ceiling existed precisely to avoid this.

The log message change says "same numeric limits as production" explicitly. Was this intentional? If so, it should be called out in the PR description. If not, the isDevelopment guard should be restored.

  2. Definite assignment (!) removes null-safety in teardown (low)

File: packages/tests/integration/x402-topup.test.ts

Using ! on the three service vars silences TS but means if beforeAll throws partway through, afterAll will crash accessing properties on uninitialized vars. The original null-guards were load-bearing. Consider keeping them or initializing with no-op stubs.

  3. mergeProviderOptions shallow merge - worth a comment (low)

File: packages/lib/providers/anthropic-thinking.ts

The merge is one level deep for anthropic/google/gateway. For the thinking key specifically: if base has { thinking: { type: "disabled" } } and COT helper adds { thinking: { type: "enabled", budgetTokens: N } }, the second replaces the first entirely. This is the desired behavior, but a short comment would prevent future contributors from trying to fix it.

  4. Google responseModalities leaks into non-Google paths (nit)

File: app/api/v1/generate-image/route.ts

The else branch in isOpenAIModel ? mergeAnthropicCot : mergeGoogleImage covers everything that is not OpenAI. Future Anthropic image models would fall into the Google path and pick up responseModalities unnecessarily. Low risk now, worth noting when new providers are added.

  5. Runtime factory type cast is fragile (nit)

File: packages/lib/eliza/runtime-factory.ts

Casting the default import manually bypasses TS module resolution. A comment explaining the upstream typing gap would help future audits.
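
One way to avoid the catch-all else branch noted for generate-image is an explicit provider switch. The sketch below is hypothetical: it assumes gateway-style "provider/model" ids, and the function name and return values are illustrative rather than taken from the route.

```typescript
type ImageOptionsKindSketch = "openai" | "google" | "other";

// Assumes gateway-style "provider/model" ids; the prefixes here are illustrative.
function imageOptionsKindSketch(modelId: string): ImageOptionsKindSketch {
  const provider = modelId.split("/")[0];
  switch (provider) {
    case "openai":
      return "openai";
    case "google":
      return "google"; // only this branch would add responseModalities
    default:
      return "other"; // e.g. a future Anthropic image model
  }
}
```

With an explicit default, a newly added provider gets no Google-specific options until someone opts it in.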

POSITIVES

  • sendTyping guard: "sendTyping" in channel is the right type-narrowing approach.
  • UUID fixture: changing "test-agent-simple-001" to a valid UUID format prevents silent failures.
  • APIRequestContext in Playwright fixture: removing the ReturnType gymnastics is cleaner.
  • preconnect binding on fetchWithServer: correctly preserving fetch.preconnect.
  • Unit tests for anthropic-thinking.ts: good boundary coverage (0, invalid, whitespace, non-anthropic model, merge collision).
  • Storybook exclusion from check-types-split: correct - stories use Storybook types not in the main tsconfig.
  • isChatInputCommand() guard: proper discord.js type guard instead of duck-typing is a correctness improvement.
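
The preconnect point above concerns wrappers that must still satisfy a fetch type carrying a static `preconnect` member. A minimal sketch, assuming a simplified fetch-like signature (the `FetchLikeSketch` type and helper name are hypothetical):

```typescript
// Minimal stand-in for a fetch-like function that carries a static `preconnect` member.
type FetchLikeSketch = ((url: string) => Promise<string>) & {
  preconnect: (url: string) => void;
};

function wrapFetchPreservingPreconnect(
  inner: FetchLikeSketch,
  before: (url: string) => void,
): FetchLikeSketch {
  const wrapped = (url: string): Promise<string> => {
    before(url); // e.g. ensure the test server is up, or sign the request
    return inner(url);
  };
  // Object.assign copies the static member so the wrapper still satisfies the type.
  return Object.assign(wrapped, { preconnect: inner.preconnect.bind(inner) });
}
```

Without the `Object.assign` step the wrapper is a plain function and no longer type-checks where the full fetch type is expected.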

Overall this is a solid type cleanup pass. The rate-limit change is the most impactful behavioral difference and should be confirmed as intentional before merge.

@claude

claude bot commented Mar 27, 2026

PR Review: fix(types): test project, split check-types, UI package, gateways

Overall this is well-structured work with clear documentation of design decisions. A few items worth addressing before merge.

Bugs / Correctness

x402-topup.test.ts: afterAll null safety regression - The refactor replaced null-guarded service patches with definite assignment (!) but the afterAll cleanup now runs unconditionally. If beforeAll throws before the service variables are assigned, afterAll will throw on property access, masking the root failure and potentially breaking test teardown for other suites. Restore optional chaining or null guards in afterAll even if the variable declarations no longer need the union type.
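
The guarded-teardown pattern recommended here can be sketched without a test runner. The names below (`PatchedServiceSketch`, `setupSketch`, `teardownSketch`) are hypothetical stand-ins for the beforeAll/afterAll hooks and the patched services in the real test.

```typescript
// Hypothetical service shape; the real test patches three service modules.
interface PatchedServiceSketch {
  restore(): void;
}

let restoreCallsSketch = 0;
let referralsPatchSketch: PatchedServiceSketch | undefined;
let creditsPatchSketch: PatchedServiceSketch | undefined;

// Stand-in for beforeAll: may throw after assigning only some of the patches.
function setupSketch(failEarly: boolean): void {
  referralsPatchSketch = { restore: () => { restoreCallsSketch += 1; } };
  if (failEarly) throw new Error("setup failed before creditsPatchSketch was assigned");
  creditsPatchSketch = { restore: () => { restoreCallsSketch += 1; } };
}

// Stand-in for afterAll.
function teardownSketch(): void {
  // Optional chaining keeps teardown safe when setup stopped partway through.
  referralsPatchSketch?.restore();
  creditsPatchSketch?.restore();
}
```

If setup fails early, teardown still restores whatever was patched and the original setup error remains the one reported.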

Behaviour Change Worth Calling Out

Rate limit presets: dev-mode unlimited -> production limits for local development - packages/lib/middleware/rate-limit.ts removes the isDevelopment pattern and always uses production-level limits (STANDARD: 60/min, STRICT: 10/min, etc.). The only difference now is in-memory vs Redis backing. This will hit local developers on standard routes once they exceed 60 req/min. If intentional, worth a note in dev setup docs.
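
For local development, the PR overview mentions a RATE_LIMIT_MULTIPLIER environment variable. A rough sketch of how a multiplier could scale the fixed presets follows. The numeric limits come from the review comments; the preset shape, the helper name, and the fallback rules for invalid multipliers are assumptions.

```typescript
// Numeric limits taken from the review comments; preset shape and helper are assumptions.
interface RateLimitPresetSketch {
  windowMs: number;
  maxRequests: number;
}

const BASE_PRESETS_SKETCH: Record<"STANDARD" | "STRICT" | "CRITICAL", RateLimitPresetSketch> = {
  STANDARD: { windowMs: 60_000, maxRequests: 60 },
  STRICT: { windowMs: 60_000, maxRequests: 10 },
  CRITICAL: { windowMs: 5 * 60_000, maxRequests: 5 },
};

function scaledPresetSketch(
  presetName: keyof typeof BASE_PRESETS_SKETCH,
  rawMultiplier: string | undefined, // e.g. the value of RATE_LIMIT_MULTIPLIER
): RateLimitPresetSketch {
  const parsed = Number(rawMultiplier);
  // Fall back to 1 for unset, non-numeric, or non-positive values.
  const multiplier = Number.isFinite(parsed) && parsed > 0 ? parsed : 1;
  const base = BASE_PRESETS_SKETCH[presetName];
  return { ...base, maxRequests: Math.max(1, Math.floor(base.maxRequests * multiplier)) };
}
```

Under these assumptions a multiplier of 10 would raise STANDARD to 600 requests per minute while leaving production untouched when the variable is unset.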

Minor Issues

  1. mergeProviderOptions silently drops unknown provider keys - The merge explicitly deep-merges gateway, anthropic, and google, but the initial spread means any other provider key present in both inputs will be shallow-overwritten by the second. Worth a comment so the next person adding a provider (mistral, cohere) knows to add it to the explicit merge block.

  2. runtime-factory.ts: default import workaround is not grouped - The import declaration and the const re-cast are separated by an unrelated import. Grouping them or adding a comment would help future readers understand why the named binding does not work here.

  3. ANTHROPIC_COT_BUDGET parse throws at request time, not startup - parseAnthropicCotBudgetFromEnv throws for invalid values and is called per-request, so a misconfigured env var produces 500s rather than a startup failure. Consider a module-level one-time parse or document that env-validator must run at startup.
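
A strict parser of the kind described for ANTHROPIC_COT_BUDGET might look like the sketch below. The function name is hypothetical, and details such as trimming whitespace and treating an empty value as unset are assumptions, since the real parser in packages/lib/providers/anthropic-thinking.ts is not shown on this page.

```typescript
// Hypothetical sketch; the real parser lives in packages/lib/providers/anthropic-thinking.ts.
function parseBudgetEnvSketch(raw: string | undefined): number | undefined {
  if (raw === undefined || raw.trim() === "") return undefined; // unset means "off"
  const trimmed = raw.trim();
  // Strict digits only: rejects "1e3", "-5", "12.5", and "10k".
  if (!/^\d+$/.test(trimmed)) {
    throw new Error(`Invalid thinking budget: "${raw}"`);
  }
  const value = Number(trimmed);
  // Guard against digit strings too large to represent exactly.
  if (!Number.isSafeInteger(value)) {
    throw new Error(`Thinking budget out of range: "${raw}"`);
  }
  return value;
}
```

Calling a parser like this once during startup validation, rather than only inside request handlers, would turn a misconfigured value into a boot failure instead of per-request 500s.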

What is Good

  • mockMiladyPricingMinimumDepositForRouteTests cleanly solves the Bun mock pollution problem; docs/unit-testing-milady-mocks.md is a valuable future reference.
  • Deep-merge helpers correctly preserve existing gateway.order and google.responseModalities when adding anthropic.thinking - the test suite covers the key combinations well.
  • Discord gateway: sendTyping guard and isChatInputCommand() check are correct defensive fixes.
  • Playwright APIRequestContext type fix removes the as-any cast cleanly.
  • fetch.preconnect binding on test fetch wrappers is a correct TypeScript conformance fix.
  • check-types:tests CI step closes a real coverage gap.
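
The Milady pricing mock fix comes down to spreading the real constants before overriding one field. In the sketch below the constant object and its fields other than MINIMUM_DEPOSIT are invented stand-ins, since the real MILADY_PRICING shape is not shown here.

```typescript
// Stand-in constants; the real MILADY_PRICING has its own fields and values.
const REAL_PRICING_SKETCH = {
  MINIMUM_DEPOSIT: 5,
  HOURLY_RATE: 0.02,
  LOW_BALANCE_THRESHOLD: 1,
} as const;

// Spread the real object first so every other constant survives the override.
function pricingWithMinimumDepositSketch(minimumDeposit: number) {
  return { ...REAL_PRICING_SKETCH, MINIMUM_DEPOSIT: minimumDeposit };
}
```

A mock that returned only `{ MINIMUM_DEPOSIT }` would leave the other fields undefined for any later test sharing the same process, which matches the pitfall described in the commit message.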

Summary: The null-safety regression in x402-topup.test.ts afterAll is the only real bug. The rate-limit behaviour change is worth explicit acknowledgement. Everything else is minor or informational.


@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Spread order lets options override deploy-wide thinking config
    • Swapped the spread order so mergeAnthropicCotProviderOptions comes after options, ensuring deploy-wide ANTHROPIC_COT_BUDGET cannot be overridden by caller-supplied options.

Push these changes by commenting:

@cursor push 99924fb2eb
Preview (99924fb2eb)
diff --git a/packages/lib/api/a2a/skills.ts b/packages/lib/api/a2a/skills.ts
--- a/packages/lib/api/a2a/skills.ts
+++ b/packages/lib/api/a2a/skills.ts
@@ -91,8 +91,8 @@
         role: m.role as "user" | "assistant" | "system",
         content: m.content,
       })),
+      ...options,
       ...mergeAnthropicCotProviderOptions(model),
-      ...options,
     });
 
     let fullText = "";


odilitime and others added 2 commits March 27, 2026 08:13
Read user_characters.settings.anthropicThinkingBudgetTokens for agent chat;
add ANTHROPIC_COT_BUDGET_MAX cap and resolution helpers. Wire MCP and A2A
routes; extend tests, env-validator, and .env.example. Update CHANGELOG,
site changelog, README, ROADMAP, anthropic-cot-budget doc, and route/module
comments with WHYs (owner-controlled settings vs untrusted request params).

Made-with: Cursor
…ridden

Move mergeAnthropicCotProviderOptions spread after options spread in
executeSkillChatCompletion to ensure the deploy-wide ANTHROPIC_COT_BUDGET
configuration takes precedence over caller-supplied options.

This aligns with all other call sites in the codebase which consistently
place mergeAnthropicCotProviderOptions last to enforce operator-controlled
thinking budgets.
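
The reasoning behind the spread order can be shown with a small sketch: in an object literal, later spreads win on conflicting keys. The option shape and function name below are hypothetical, and the sketch replaces the whole `providerOptions` key rather than deep-merging it.

```typescript
interface CallOptionsSketch {
  temperature?: number;
  providerOptions?: Record<string, unknown>;
}

function buildCallSketch(
  callerOptions: CallOptionsSketch,
  deployOptions: CallOptionsSketch,
): CallOptionsSketch {
  // Later spreads win on key conflicts, so the deploy-level fragment goes last.
  return { ...callerOptions, ...deployOptions };
}
```

Keys the deploy-level fragment does not set, such as `temperature` here, still pass through from the caller.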
@odilitime
Contributor Author

@cursor push 99924fb

Applied via @cursor push command

Copilot AI left a comment


Pull request overview

Aligns repo typechecking and test typings across the workspace, introduces deploy-scoped Anthropic extended-thinking configuration helpers, and tightens/clarifies several gateway + test mocks/casts to reduce order-dependent failures.

Changes:

  • Split/expand typecheck coverage (tests project typecheck in CI, new per-package typecheck scripts, updated tsconfigs/paths).
  • Add Anthropic COT budget env + providerOptions merge helpers and thread them through gateway-backed routes/services.
  • Stabilize/retarget numerous unit/integration tests (mocks, env handling, casting, and fixture shapes), including Milady pricing mock helper + renamed cron test.

Reviewed changes

Copilot reviewed 100 out of 101 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
tsconfig.test.json Adds UI package path aliases and refines include/exclude for test typecheck.
services/agent-server/src/routes.ts Broadens set.status typing for Elysia compatibility.
services/agent-server/package.json Adds typecheck script.
packages/ui/tsconfig.json Adds cross-workspace path aliases and excludes Storybook stories from UI typecheck.
packages/ui/src/styled-jsx.d.ts Augments React style attributes for styled-jsx.
packages/tests/unit/z-milady-billing-route.test.ts Reworks Milady billing cron test mocking to be order-stable across the suite.
packages/tests/unit/x402/facilitator-service.test.ts Adjusts casts for stricter typing.
packages/tests/unit/wallet-auth.test.ts Fixes mock typing/resolved values.
packages/tests/unit/waifu-bridge.test.ts Uses a typed mutable env view for NODE_ENV writes.
packages/tests/unit/v1-milaidy-provision-route.test.ts Uses shared Milady pricing mock helper.
packages/tests/unit/token-agent-linkage.test.ts Refactors validation test into scenario loop with strict checks.
packages/tests/unit/security-validations.test.ts Avoids {} prototype typing issues in pollution assertions.
packages/tests/unit/referrals-service.test.ts Uses toMatchObject to reduce brittle equality.
packages/tests/unit/provisioning-jobs.test.ts Fixes Vitest mock restore typing and fetch assignment typing.
packages/tests/unit/provisioning-jobs-followups.test.ts Uses toMatchObject for partial comparisons.
packages/tests/unit/privy-sync.test.ts Uses toMatchObject for hydrated user comparisons.
packages/tests/unit/pr385-round5-fixes.test.ts Aligns fake service signature with optional id param.
packages/tests/unit/performance-optimizations.test.ts Adds typed helpers to avoid repeated unsafe access to provider snapshots.
packages/tests/unit/oauth/token-cache.test.ts Rewrites date normalization branch for clearer typing.
packages/tests/unit/oauth/provider-registry.test.ts Adjusts optional chaining/non-null assertions in provider routes.
packages/tests/unit/oauth/oauth-service.test.ts Tightens typings for provider routes/platform loops/status handling.
packages/tests/unit/oauth/errors.test.ts Stabilizes enum value typing for expectations.
packages/tests/unit/milaidy-sandbox-bridge-security.test.ts Fixes fetch typing and backup fixture shape updates.
packages/tests/unit/milaidy-pairing-token-route.test.ts Stabilizes env setup/teardown for base domain; removes module mock.
packages/tests/unit/milaidy-agent-routes-followups.test.ts Uses shared Milady pricing mock helper.
packages/tests/unit/milady-sandbox-service.test.ts Introduces typed sandbox provider helper and updates sandbox fixture fields.
packages/tests/unit/milady-create-routes.test.ts Uses shared Milady pricing mock helper.
packages/tests/unit/mcp-twitter-tools.test.ts Adds typed OAuth fixtures + handler result typing; normalizes token result shape.
packages/tests/unit/mcp-proxy-affiliate-pricing.test.ts Fixes fetch typing cast.
packages/tests/unit/mcp-lib.test.ts Tightens auth context typing and assertions.
packages/tests/unit/mcp-hubspot-tools.test.ts Adds typed OAuth fixtures/token helper and fixes fetch typing casts.
packages/tests/unit/mcp-google-tools.test.ts Adds typed OAuth fixtures/handler result typing; fixes fetch typing casts.
packages/tests/unit/internal-jwt-auth.test.ts Uses typed env writes for NODE_ENV.
packages/tests/unit/field-encryption.test.ts Adds typed rows and mock signatures to match repository interfaces.
packages/tests/unit/evm-rpc-proxy-route.test.ts Fixes fetch typing casts.
packages/tests/unit/engagement-metrics/compute-metrics-cron.test.ts Aligns mock signatures with implementation params.
packages/tests/unit/engagement-metrics/admin-metrics-api.test.ts Aligns mock signature with implementation params.
packages/tests/unit/eliza-app/whatsapp-auth.test.ts Casts string literals for stricter typing compatibility.
packages/tests/unit/eliza-app/telegram-ux-helpers.test.ts Fixes fetch typing cast.
packages/tests/unit/docker-ssh-cloud-deploy.test.ts Adjusts logger redact import strategy.
packages/tests/unit/database-url.test.ts Uses typed env record for safe env mutation + asserts.
packages/tests/unit/compat-envelope.test.ts Updates Milady sandbox fixture with new billing fields.
packages/tests/unit/api/route-test-helpers.ts Improves routeParams typing and fixes File BlobPart typing.
packages/tests/unit/api/openapi-catalog.test.ts Refactors test structure and simplifies OPTIONS invocation.
packages/tests/unit/anthropic-thinking.test.ts Adds unit tests for new Anthropic thinking env + merge helpers.
packages/tests/unit/admin-service-pricing-route.test.ts Tightens auth mock typing and request construction.
packages/tests/runtime/performance.test.ts Fixes UUID typing casts for agentId.
packages/tests/runtime/mcp-assistant-trending.test.ts Updates debug rendering mode and failure handling structure.
packages/tests/runtime/integration/runtime-factory/oauth-cache-invalidation.test.ts Removes any for MCP settings lookup with typed cast.
packages/tests/runtime/integration/performance/runtime-creation.test.ts Fixes UUID typing casts for agentId.
packages/tests/runtime/integration/message-handler/mcp-tools.test.ts Updates debug rendering mode and failure handling structure.
packages/tests/playwright/fixtures/auth.fixture.ts Uses APIRequestContext instead of inferred request type and removes any call.
packages/tests/load-env.ts Uses typed env write for NODE_ENV.
packages/tests/integration/x402-topup.test.ts Refactors service patch typing and restores methods after suite.
packages/tests/integration/webhooks-e2e.test.ts Wraps fetch to auto-sign webhooks and preserves fetch.preconnect.
packages/tests/integration/unified-oauth-api.test.ts Tightens it wrapper signature.
packages/tests/integration/services/users.service.test.ts Adds fixture guard and tightens nullable handling.
packages/tests/integration/services/users-join-regression.test.ts Adds runtime guard for relation shape.
packages/tests/integration/services/organizations.service.test.ts Simplifies membership assertion.
packages/tests/integration/server-wallets.test.ts Updates mocked returning rows and loosens assertion to toMatchObject.
packages/tests/integration/mcp-registry.test.ts Tightens test wrapper signature.
packages/tests/integration/financial/credits-budget-flow.test.ts Removes source field from addCredits call in test.
packages/tests/integration/financial/concurrent-operations.test.ts Removes source field from addCredits call in test.
packages/tests/helpers/mock-milady-pricing-for-route-tests.ts Adds helper to safely override Milady MINIMUM_DEPOSIT without clobbering constants.
packages/tests/helpers/index.ts Re-exports the new Milady pricing mock helper.
packages/tests/fixtures/mcp-test-character.ts Updates simpleTestCharacter id to UUID-like value.
packages/tests/e2e/setup-server.ts Wraps fetch to ensure server availability and preserves fetch.preconnect.
packages/services/gateway-discord/src/gateway-manager.ts Tightens discord.js handler typing, adds characterId routing, guards sendTyping, and listener map casts.
packages/scripts/check-types-split.ts Updates typecheck chunk selection and excludes stories from temp configs.
packages/lib/services/twitter-automation/app-automation.ts Threads new Anthropic COT merge helper into gateway generateText calls.
packages/lib/services/seo.ts Threads new Anthropic COT merge helper into SEO Claude calls.
packages/lib/services/app-promotion.ts Threads new Anthropic COT merge helper into promo copy generation.
packages/lib/services/app-promotion-assets.ts Uses merge helpers for google image modalities + Anthropic COT for copy generation.
packages/lib/services/app-builder-ai-sdk.ts Threads new Anthropic COT merge helper into streamText calls.
packages/lib/providers/types.ts Broadens providerOptions typing to shared JSON-serializable merged options.
packages/lib/providers/index.ts Exports new anthropic-thinking helpers.
packages/lib/providers/cloud-provider-options.ts Adds shared merged providerOptions type alias.
packages/lib/providers/anthropic-thinking.ts Adds env parsing + merge helpers for Anthropic extended thinking.
packages/lib/middleware/rate-limit.ts Makes preset numeric limits consistent across dev/prod; clarifies logging/docs.
packages/lib/eliza/runtime-factory.ts Changes plugin-sql adapter import to default + casted factory signature.
packages/lib/config/env-validator.ts Adds validation rule for ANTHROPIC_COT_BUDGET.
packages/lib/auth/jwks.ts Updates jose KeyLike typings for cached keys.
packages/lib/api/a2a/skills.ts Threads merge helpers into skill chat + image generation paths.
packages/content/changelog.mdx Adds Mar 27, 2026 entry for Anthropic COT + Milady testing fixes.
package.json Adds per-package typecheck scripts and updates unit test split scripts for renamed file.
docs/unit-testing-milady-mocks.md Documents Bun mock.module pitfalls and Milady pricing mock strategy.
docs/anthropic-cot-budget.md Documents ANTHROPIC_COT_BUDGET behavior and providerOptions merge rationale.
docs/ROADMAP.md Records completed items for COT budget + Milady test stabilization.
app/api/v1/responses/route.ts Uses gateway+Anthropic merge helper for providerOptions injection.
app/api/v1/messages/route.ts Threads merge helper into non-stream and stream AI SDK calls.
app/api/v1/generate-image/route.ts Uses merge helpers for google image modalities and optional Anthropic COT.
app/api/v1/chat/route.ts Threads merge helper into chat streaming call.
app/api/v1/chat/completions/route.ts Threads merge helper into streaming and non-streaming calls.
app/api/v1/admin/service-pricing/tests/route.integration.test.ts Standardizes NextRequest construction via URL base.
app/api/mcp/tools/generation.ts Threads merge helpers into MCP generation (text + image) tools.
app/api/agents/[id]/mcp/route.ts Threads merge helper into tool-call streaming.
app/api/agents/[id]/a2a/route.ts Threads merge helper into A2A streaming.
README.md Adds links to new “WHY” engineering docs and clarifies bun run test staging.
CHANGELOG.md Adds Keep-a-Changelog style engineering changelog entries for these changes.
.github/workflows/tests.yml Adds check-types:tests to CI.
.env.example Documents optional ANTHROPIC_COT_BUDGET env var.

Comment on lines +18 to +21
const createDatabaseAdapter = createDatabaseAdapterDefault as (
  config: { postgresUrl: string },
  agentId: UUID,
) => IDatabaseAdapter;
Copilot AI Mar 27, 2026

@elizaos/plugin-sql/node is being imported as a default export and then cast to a callable adapter factory. This bypasses type safety and can hide a runtime mismatch if the default export isn’t actually callable (or if its signature changes). Consider updating the local declare module "@elizaos/plugin-sql/node" typings to correctly model the default export (e.g., a callable function with any attached properties like schema) and then import it with the accurate type, rather than casting unknown here.

Suggested change
- const createDatabaseAdapter = createDatabaseAdapterDefault as (
-   config: { postgresUrl: string },
-   agentId: UUID,
- ) => IDatabaseAdapter;
+ type CreateDatabaseAdapterFn = (
+   config: { postgresUrl: string },
+   agentId: UUID,
+ ) => IDatabaseAdapter;
+ function ensureCreateDatabaseAdapter(fn: unknown): CreateDatabaseAdapterFn {
+   if (typeof fn !== "function") {
+     throw new TypeError(
+       'Default export from "@elizaos/plugin-sql/node" is not a callable database adapter factory',
+     );
+   }
+   return fn as CreateDatabaseAdapterFn;
+ }
+ const createDatabaseAdapter = ensureCreateDatabaseAdapter(
+   createDatabaseAdapterDefault,
+ );

Contributor Author

Dismissed: Comment targets line 21 but identifier schema not found in file — code may have been removed or renamed

Contributor Author

Fixed in 6455e14.

Comment on lines +78 to +95
ANTHROPIC_COT_BUDGET: {
  required: false,
  description:
    "Default Anthropic extended-thinking token budget when a character omits settings.anthropicThinkingBudgetTokens. Unset or 0 disables unless the character sets a positive budget",
  validate: (value: string) => {
    const trimmed = value.trim();
    if (trimmed === "") {
      return false;
    }
    if (!/^\d+$/.test(trimmed)) {
      return false;
    }
    const n = Number.parseInt(trimmed, 10);
    return n >= 0 && n <= Number.MAX_SAFE_INTEGER;
  },
  errorMessage:
    "Must be a non-negative integer string (0 = off; when enabling, Anthropic often expects at least ~1024)",
},
Copilot AI Mar 27, 2026

ANTHROPIC_COT_BUDGET parsing in anthropic-thinking.ts throws on invalid non-empty values, but the env validator treats invalid optional values as warnings (startup continues). That means a mis-set budget can turn into request-time exceptions across any route that spreads mergeAnthropicCotProviderOptions(...). Either make invalid values for this variable a hard validation error (fail fast at startup), or change the parsing/merge path to degrade safely (e.g., log once and treat as "off").
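
The "degrade safely" option could look roughly like the sketch below. It is only an illustration under stated assumptions: `parseCotBudgetSafe`, the `Env` type, and the `warn` callback are hypothetical names, not the PR's actual API, and the env object is passed in rather than read from `process.env` so the function stays easy to test.

```typescript
// Hypothetical helper: names and shape are illustrative, not the PR's code.
type Env = Record<string, string | undefined>;

// Module-level flag so an invalid value is reported only once per process.
let warnedInvalidBudget = false;

/**
 * Returns a positive thinking budget, or null when thinking should be off.
 * Invalid non-empty values are reported once and treated as "off"
 * instead of throwing at request time.
 */
function parseCotBudgetSafe(
  env: Env,
  warn: (message: string) => void = () => {},
): number | null {
  const raw = env.ANTHROPIC_COT_BUDGET?.trim() ?? "";
  if (raw === "") return null; // unset or blank: disabled

  if (!/^\d+$/.test(raw)) {
    if (!warnedInvalidBudget) {
      warnedInvalidBudget = true;
      warn(`Ignoring invalid ANTHROPIC_COT_BUDGET value: ${JSON.stringify(raw)}`);
    }
    return null;
  }

  const n = Number.parseInt(raw, 10);
  // 0 is valid input but also means "disabled".
  return Number.isSafeInteger(n) && n > 0 ? n : null;
}
```

The trade-off is that a typo in the variable silently turns thinking off, so the fail-fast startup check remains the stricter alternative.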

Contributor Author

Fixed in fe10e6b.

Contributor Author

Fixed in 18bfb11.

Comment on lines +11 to +16
  // Importing via the file-system path with a cache-buster query param
  // guarantees we always get the real implementation regardless of mocks.
- import { redact } from "../../lib/utils/logger.ts?_real";
+ import { redact } from "../../lib/utils/logger";
Copilot AI Mar 27, 2026

The comment above says this import uses a cache-buster query param to avoid mock.module("@/lib/utils/logger") poisoning, but the import no longer includes any cache-buster. If other unit files mock the logger module earlier in the same Bun process, this test may again receive the mocked module and lose the real redact export. Either restore the cache-buster (or another unmocked import strategy) or update the comment and ensure redact is always available in the mocked logger module(s).

Contributor Author

Fixed in fe10e6b.

Contributor Author

Dismissed: Line 16 still includes the cache-buster query param ?v=docker-ssh-test as required

@odilitime
Contributor Author

@coderabbitai review

@coderabbitai

coderabbitai bot commented Mar 27, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Iteration 1

prr-fix:prrc_kwdop6btdm6yyjtn
Iteration 1

prr-fix:prrc_kwdop6btdm6yyjt7
Changes:
- rate-limit.ts: PR Review: fix(types): test project, split check-types, UI package, gateways ...
@odilitime
Contributor Author

@coderabbitai review

@coderabbitai

coderabbitai bot commented Mar 27, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@claude

claude bot commented Mar 27, 2026

Code Review — PR #417: Anthropic CoT Controls

Overall the approach is sound: centralizing thinking-budget resolution in a single helper, keeping budgets off request bodies, and wiring env defaults through a type-safe merge layer. A few issues worth addressing before merge.


🐛 Potential Bugs

1. Temperature conflict in seo.ts and app-promotion.ts

Anthropic's API rejects requests with temperature ≠ 1 when extended thinking is enabled. The PR comment in chat/completions/route.ts says "@ai-sdk/anthropic strips temp/topP/topK when thinking is on", but seo.ts and app-promotion.ts use generateText (not streamText), and both services use hardcoded non-1 temperatures:

// seo.ts
const { text } = await generateText({
  model: gateway.languageModel(modelId),
  temperature: 0.3,           // ← will cause API error when ANTHROPIC_COT_BUDGET is set
  ...mergeAnthropicCotProviderOptions(modelId),
  ...
// app-promotion.ts
await generateText({
  model: gateway.languageModel(promoModel),
  temperature: 0.7,           // ← same risk
  ...mergeAnthropicCotProviderOptions(promoModel),

If the AI SDK Gateway path does not strip temperature for generateText the way the comment implies, this will surface as a 400 from the Anthropic API when ANTHROPIC_COT_BUDGET is non-zero. Recommend either (a) using temperature: 1 at these call sites when thinking is active, or (b) adding a comment that names the SDK version guaranteeing the stripping.
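
One way to avoid depending on the SDK's stripping behavior is to drop custom sampling settings at the call site whenever thinking is active. The sketch below is an assumption-laden illustration: `samplingForCall` and `SamplingSettings` are hypothetical names, and the resolved thinking budget is assumed to be available as `number | null`.

```typescript
// Hypothetical helper; not part of the PR or of the AI SDK.
type SamplingSettings = { temperature?: number; topP?: number; topK?: number };

/**
 * Returns the sampling settings that are safe to send for a call.
 * When a positive thinking budget is active, custom sampling values are
 * omitted so the provider default applies.
 */
function samplingForCall(
  requested: SamplingSettings,
  thinkingBudgetTokens: number | null,
): SamplingSettings {
  const thinkingOn = thinkingBudgetTokens !== null && thinkingBudgetTokens > 0;
  return thinkingOn ? {} : { ...requested };
}

// Usage sketch: spread the result into the generateText arguments.
const seoSampling = samplingForCall({ temperature: 0.3 }, null); // thinking off
const promoSampling = samplingForCall({ temperature: 0.7 }, 4096); // thinking on
```

With thinking off, the original temperature passes through unchanged; with thinking on, the call carries no sampling overrides at all.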


2. mergeGoogleImageModalitiesWithAnthropicCot now applied to ALL non-OpenAI image models

In generate-image/route.ts:

const cotOpts = isOpenAIModel
  ? mergeAnthropicCotProviderOptions(imageModel)    // returns {} for OpenAI — no-op
  : mergeGoogleImageModalitiesWithAnthropicCot(imageModel);  // applies google.responseModalities to ANY non-OpenAI model

Before this PR, google.responseModalities was only set in the two Google-specific streamConfig paths. After this change it's applied to every non-OpenAI model at the call site. If the image route ever serves an Anthropic or other non-Google image model, it will receive an unexpected google provider option. The AI SDK likely ignores it, but it changes observable behavior. The isOpenAIModel ? ... : ... branch would be more precise as isGoogleModel ? mergeGoogleImageModalitiesWithAnthropicCot : mergeAnthropicCotProviderOptions.


⚠️ Breaking Change for Developers

3. Rate-limit preset change removes dev relaxation

// Before
maxRequests: isDevelopment ? 10000 : 60,

// After
maxRequests: 60,

Local developers and CI integration tests that exercise HTTP API routes will now hit production-level rate limits (60 req/min STANDARD). Unit tests that spin up a real server or use fetch against local routes can fail intermittently. This is documented in the preset comments, but it's a behavioral change that will surprise contributors who run bun run test:integration locally. Worth calling out explicitly in the PR description or flagging in CONTRIBUTING.md.


🔍 Code Quality

4. mergeProviderOptions only deep-merges 3 known provider keys

if (a?.gateway && b?.gateway) { out.gateway = { ...a.gateway, ...b.gateway }; }
if (a?.anthropic && b?.anthropic) { out.anthropic = { ...a.anthropic, ...b.anthropic }; }
if (a?.google && b?.google) { out.google = { ...a.google, ...b.google }; }

Any other provider-specific keys (openai, vertex, bedrock, etc.) fall through to the { ...a, ...b } shallow spread, where b's top-level key silently overwrites a's. This is fine for current usage but worth a comment noting the limitation so future callers don't get surprised when adding new provider options.
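
If the limitation ever becomes a problem, a one-level merge over every provider key is a small change. The following is a sketch, not the PR's implementation: `mergeProviderOptionsGeneric` and the `ProviderOptions` alias are hypothetical, and it assumes each provider's options are a flat object.

```typescript
// Hypothetical generic variant; assumes one flat options object per provider.
type ProviderOptions = Record<string, Record<string, unknown>>;

/**
 * Merges two providerOptions objects one level deep for every provider key,
 * so no provider block in `a` is replaced wholesale by the same key in `b`.
 * Within a provider block, values from `b` win on conflict.
 */
function mergeProviderOptionsGeneric(
  a: ProviderOptions = {},
  b: ProviderOptions = {},
): ProviderOptions {
  const out: ProviderOptions = { ...a };
  for (const [provider, options] of Object.entries(b)) {
    out[provider] = { ...(out[provider] ?? {}), ...options };
  }
  return out;
}
```

Nested values inside a provider block (for example an array of gateway routing preferences) are still replaced rather than merged, which matches the spread-based behavior quoted above.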

5. parsePositiveIntStrict accepts and returns 0

The function is named "strict positive" but returns 0 (since /^\d+$/ matches "0" and parseInt("0") = 0). Callers then do if (n <= 0) return null after the call. The naming and intent are slightly mismatched — a comment clarifying that "positive" here means "non-negative" would reduce confusion.

6. parseThinkingBudgetFromCharacterSettings silently truncates floats

const n = Math.trunc(raw);  // 1.9 → 1, 0.5 → 0

A stored value of 0.5 becomes 0 (thinking off), which may surprise an agent owner who intended a positive budget. Since the docs and interface say "integer ≥ 0", consider rejecting non-integers explicitly (!Number.isInteger(raw)) rather than silently truncating, or documenting that truncation occurs.
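
A stricter parse that rejects non-integers instead of truncating might look like this. It is a sketch under assumptions: `parseThinkingBudgetSetting` is a hypothetical stand-in for parseThinkingBudgetFromCharacterSettings, and returning `undefined` is assumed to mean "fall back to the env default".

```typescript
/**
 * Hypothetical stricter variant: only a non-negative integer is accepted.
 * Anything else (floats, strings, NaN, negatives, missing settings) yields
 * undefined, so the caller falls back to the deploy-level default.
 */
function parseThinkingBudgetSetting(
  settings: Record<string, unknown> | null | undefined,
): number | undefined {
  const raw = settings?.anthropicThinkingBudgetTokens;
  if (typeof raw !== "number" || !Number.isInteger(raw) || raw < 0) {
    return undefined;
  }
  return raw; // 0 stays 0: an explicit per-agent "off"
}
```

Under this variant a stored 0.5 no longer disables thinking by accident; it is ignored and the env default applies.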


✅ Strengths

  • The single resolution path in resolveAnthropicThinkingBudgetTokens is clean and easy to audit.
  • ANTHROPIC_COT_BUDGET_MAX as an operator ceiling over per-agent settings is a good safety valve.
  • failOnInvalid: true in env-validator.ts ensures bad config is caught at startup rather than at request time.
  • Test coverage for parse functions, merge helpers, and edge cases (0 disables, max clamps) is thorough.
  • Deep-merging providerOptions rather than shallow-spreading is the correct approach.
  • The CHANGELOG.md and docs/anthropic-cot-budget.md docs are well-written and explain the design rationale clearly.

process.env,
agentThinkingBudget ?? undefined,
);
const baseCost = await estimateRequestCost(model, fullMessages, effectiveThinkingBudget);

A2A route cost estimation inconsistent with MCP route

Medium Severity

The A2A route passes effectiveThinkingBudget (type number | null) directly to estimateRequestCost as maxOutputTokens. When the thinking budget is e.g. 5000, estimateRequestCost uses 5000 as the total output token estimate, replacing the default 500 base output tokens. The MCP route correctly adds the thinking budget on top of base output tokens (500 + effectiveThinkingBudget). This means the A2A route underestimates credit reservation when thinking is enabled. Additionally, when resolveAnthropicThinkingBudgetTokens returns null, it's passed as null to a parameter typed number | undefined.

Contributor Author

Fixed in 67138cc.

// Note: estimateRequestCost adds thinking tokens to output estimate when budget is provided
// Calculate estimated costs, including potential thinking budget
// Use resolveAnthropicThinkingBudgetTokens to get effective budget (same as MCP route)
// This ensures credit reservation accounts for thinking tokens

Duplicate comment block in A2A route handler

Low Severity

Lines 297-301 contain a duplicated comment — // Calculate estimated costs, including potential thinking budget appears twice in succession (once at line 297 with a // Note: continuation, and again at line 299 with different continuation text). This looks like an accidental merge artifact from editing the same block twice.

Contributor Author

Fixed in 67138cc.

Iteration 7

prr-fix:prrc_kwdop6btdm6y8iwm
prr-fix:prrc_kwdop6btdm6y8iwq
@claude

claude bot commented Mar 27, 2026

PR Review: feat: Anthropic CoT controls

Overall this is well-structured work — the security rationale (budget lives on the character record, not request bodies) is sound, the deep-merge helpers are properly designed, and env validation with failOnInvalid is a good pattern. A few issues worth addressing before merge.


Bugs / Correctness

1. Duplicate comment block in app/api/agents/[id]/a2a/route.ts

There are two nearly-identical comment blocks left by accident around the cost estimation section:

// Calculate estimated costs, including potential thinking budget
// Note: estimateRequestCost adds thinking tokens to output estimate when budget is provided
// Calculate estimated costs, including potential thinking budget   ← duplicate
// Use resolveAnthropicThinkingBudgetTokens to get effective budget (same as MCP route)

One should be removed.

2. Inconsistent output-token estimation between MCP and A2A routes

  • MCP route (mcp/route.ts): manually adds effectiveThinkingBudget to a hardcoded baseOutputTokens = 500
  • A2A route (a2a/route.ts): passes effectiveThinkingBudget as a 3rd argument to estimateRequestCost

These do different things. If estimateRequestCost already accounts for the thinking budget, the MCP route's manual addition may double-count in some future refactor. The two routes should use the same approach.

3. model.includes("claude") is a weak check in mcp/route.ts

const estimatedOutputTokens = model.includes("claude") && effectiveThinkingBudget > 0

Every other check in this file uses getProviderFromModel(model) === "anthropic" or resolveAnthropicThinkingBudgetTokens (which checks provider internally). The .includes("claude") check could match a hypothetical non-Anthropic model with "claude" in the name. Use getProviderFromModel(model) === "anthropic" here for consistency.

4. getRateLimitMultiplier uses parseInt but validator allows decimals

env-validator.ts accepts RATE_LIMIT_MULTIPLIER values matching /^\d+(\.\d+)?$/ (decimals OK). But getRateLimitMultiplier() calls Number.parseInt, silently truncating decimal values like "1.5". Either align the validator to integers-only (/^\d+$/) or switch to Number.parseFloat in getRateLimitMultiplier.

5. No minimum budget enforcement in resolveAnthropicThinkingBudgetTokens

Anthropic requires budget_tokens ≥ 1024 when thinking is enabled. A character setting of, say, anthropicThinkingBudgetTokens: 50 would pass through resolveAnthropicThinkingBudgetTokens (it's > 0) and hit the API, which would reject it. Consider enforcing a minimum (e.g., 1024) on the resolved value, or at least documenting the Anthropic minimum in the settings key JSDoc.
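
A floor could be applied at the end of the resolution step, roughly as sketched here. The function name, parameter shapes, and the choice to disable thinking when the operator cap is below the floor are all assumptions for illustration; the real resolveAnthropicThinkingBudgetTokens signature may differ.

```typescript
// 1024 is the minimum cited above; the constant name is hypothetical.
const ANTHROPIC_MIN_THINKING_BUDGET = 1024;

/**
 * Hypothetical resolution: agent setting > env default > disabled,
 * then the operator cap, then the provider minimum.
 * Returns null when thinking should be off.
 */
function resolveBudgetWithFloor(
  agentBudget: number | undefined,
  envDefault: number | null,
  maxCap: number | null,
): number | null {
  const requested = agentBudget ?? envDefault ?? 0;
  if (requested <= 0) return null; // explicit 0 or nothing configured

  const capped = maxCap !== null ? Math.min(requested, maxCap) : requested;
  if (capped >= ANTHROPIC_MIN_THINKING_BUDGET) return capped;

  // Below the floor: raise to the minimum only if the cap allows it,
  // otherwise disable rather than send a value the API would reject.
  if (maxCap !== null && maxCap < ANTHROPIC_MIN_THINKING_BUDGET) return null;
  return ANTHROPIC_MIN_THINKING_BUDGET;
}
```

Silently raising a small value changes cost, so rejecting it at settings-save time is an equally reasonable alternative.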


Design / Quality

6. Root-level anthropic-thinking.ts and anthropic-thinking.test.ts are dead code

The routes import from @/lib/providers/anthropic-thinking, not from the repo root. The root-level files define a completely different API (CharacterThinkingSettings.anthropicThinking.enabled boolean vs the anthropicThinkingBudgetTokens integer in the packages/lib version) and have divergent constants (DEFAULT=10000, MIN=1000 vs no explicit minimums in packages/lib). These look like an early prototype that was superseded. They should be deleted — leaving them creates confusion about which implementation is authoritative.

7. RateLimitPresets lost as const

The change removes as const from RateLimitPresets, making the presets mutable objects. Any call site that spreads or destructures these can now accidentally mutate them. This was probably unintentional — add as const back, or use Object.freeze.
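
The two options differ in what they protect: `as const` is a compile-time guarantee only, while `Object.freeze` is enforced at runtime but is shallow. A sketch that combines both follows; `deepFreeze` and the `windowMs` field name are assumptions, and only the 60 requests per minute STANDARD figure comes from the diff.

```typescript
/**
 * Recursively freezes an object so nested presets cannot be mutated at
 * runtime. Object.freeze alone only protects the top level.
 */
function deepFreeze<T extends object>(value: T): Readonly<T> {
  for (const child of Object.values(value)) {
    if (typeof child === "object" && child !== null) {
      deepFreeze(child);
    }
  }
  return Object.freeze(value);
}

// Illustrative preset table; the field names are assumed, not copied from the PR.
const RateLimitPresets = deepFreeze({
  STANDARD: { windowMs: 60_000, maxRequests: 60 },
} as const);
```

Spreading a preset into a new object still works as before, since the copy is a fresh, unfrozen object.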

8. agentThinkingBudget ?? undefined in a2a/route.ts is a no-op

parseThinkingBudgetFromCharacterSettings returns number | undefined (never null), so agentThinkingBudget ?? undefined always equals agentThinkingBudget. This can be simplified to just agentThinkingBudget.


Missing Test Coverage

9. packages/lib/providers/anthropic-thinking.ts has no unit tests

The root anthropic-thinking.test.ts covers the orphaned root module. The production module at packages/lib/providers/anthropic-thinking.ts — which includes resolveAnthropicThinkingBudgetTokens, mergeProviderOptions, mergeAnthropicCotProviderOptions, mergeGoogleImageModalitiesWithAnthropicCot, and parseThinkingBudgetFromCharacterSettings — has no tests. These are the functions actually used by the routes and have edge cases (env cap wins, per-agent 0 disables despite env, maxCap < agentBudget clamps, model provider filter, etc.) that would benefit from test coverage.


Minor

  • The mcp-test-character.ts fixture addition in packages/tests/fixtures/ — does this fixture now need to include a settings field to test the settings: Record<string, unknown> added to the character type? Verify the fixture is still sufficient for the MCP route tests.
  • parsePositiveIntStrict duplicates the validation logic already in env-validator.ts. Not blocking, but worth noting if the validation rules ever diverge.

Iteration 1

prr-fix:ic-4143706884-3
prr-fix:ic-4144015244-5
Iteration 1

prr-fix:ic-4143990242-4
@claude

claude bot commented Mar 27, 2026

Code Review — PR #417: Anthropic CoT Controls

Overall the architecture is sound: character settings > env default > disabled, with proper security rationale (untrusted clients cannot set thinking budgets). The merge helpers are a good abstraction. A few issues worth addressing before merge:


Bugs / Correctness

1. Root-level anthropic-thinking.ts duplicates — different model list than production

anthropic-thinking.ts and anthropic-thinking.test.ts live at the repo root but the production implementation is in packages/lib/providers/anthropic-thinking.ts. The root-level version:

  • Is missing claude-3-7-sonnet, claude-sonnet-4, claude-opus-4 from EXTENDED_THINKING_MODEL_PATTERNS
  • Has a completely different API (validateBudgetTokens, getThinkingConfig, buildThinkingParam vs the production resolveAnthropicThinkingBudgetTokens)

The root-level test (anthropic-thinking.test.ts) imports from ./anthropic-thinking (root), so it's testing the wrong implementation. If these are dev artifacts they should be removed; if they're meant to be the canonical implementation they should replace or be reconciled with the packages/lib/providers/ version.

2. getRateLimitMultiplier uses parseInt but env validator accepts floats

In packages/lib/middleware/rate-limit.ts:

const parsed = Number.parseInt(multiplier, 10);

But env-validator.ts accepts floats for RATE_LIMIT_MULTIPLIER:

if (!/^\d+(\.\d+)?$/.test(trimmed)) {

Setting RATE_LIMIT_MULTIPLIER=1.5 passes validation but silently truncates to 1. Use Number.parseFloat in getRateLimitMultiplier, or restrict the validator to integers only.
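
If decimals are meant to be supported, the parser and the validator can share the same pattern. This is a sketch only: `parseRateLimitMultiplier` and `scaleLimit` are hypothetical names, the env object is passed in explicitly, and falling back to 1 on invalid input is an assumption.

```typescript
/**
 * Hypothetical float-aware parser that uses the same pattern as the
 * validator quoted above. Invalid, missing, or non-positive values
 * fall back to a multiplier of 1.
 */
function parseRateLimitMultiplier(env: Record<string, string | undefined>): number {
  const raw = env.RATE_LIMIT_MULTIPLIER?.trim() ?? "";
  if (!/^\d+(\.\d+)?$/.test(raw)) return 1;
  const parsed = Number.parseFloat(raw);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : 1;
}

/** Applies the multiplier to a preset limit, never going below 1 request. */
function scaleLimit(maxRequests: number, multiplier: number): number {
  return Math.max(1, Math.floor(maxRequests * multiplier));
}
```

With this shape, RATE_LIMIT_MULTIPLIER=1.5 turns the 60 requests per minute STANDARD preset into 90 instead of silently leaving it at 60.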

3. settings added as required field in character type narrowing

Both route handlers add:

settings: Record<string, unknown>;

to their local character type. If any existing user_characters DB row has a NULL settings column, the declared type is wrong at runtime and parseThinkingBudgetFromCharacterSettings receives null. The function itself handles null gracefully, but the type annotation doesn't reflect that, so other code that trusts the annotation may not. It should be:

settings: Record<string, unknown> | null;

4. Thinking budget cost estimate inconsistency between MCP and A2A routes

MCP route:

const effectiveThinkingBudget = resolveAnthropicThinkingBudgetTokens(...) ?? 0;
const estimatedOutputTokens = model.includes("claude") && effectiveThinkingBudget > 0
  ? baseOutputTokens + effectiveThinkingBudget
  : baseOutputTokens;

A2A route:

const maxOutputTokens = effectiveThinkingBudget != null ? 500 + effectiveThinkingBudget : undefined;
const baseCost = await estimateRequestCost(model, fullMessages, maxOutputTokens);

The logic is different: MCP checks model.includes("claude") as a guard; A2A just checks != null. The resolveAnthropicThinkingBudgetTokens function already handles the provider/model check internally and returns null for non-Anthropic models, so the extra model.includes("claude") guard in MCP is redundant but harmless. Unifying these would reduce drift.
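
Both routes could call one small helper so the estimate cannot drift. The sketch below assumes the 500-token base used in both snippets; `estimateOutputTokens` and `BASE_OUTPUT_TOKENS` are hypothetical names.

```typescript
// Hypothetical shared helper; 500 mirrors the base used in both routes above.
const BASE_OUTPUT_TOKENS = 500;

/**
 * Output-token estimate for credit reservation: the base answer size plus
 * the full thinking budget when one is active. A null or undefined budget
 * (thinking off or non-Anthropic model) adds nothing.
 */
function estimateOutputTokens(
  effectiveThinkingBudget: number | null | undefined,
): number {
  const thinking = Math.max(0, effectiveThinkingBudget ?? 0);
  return BASE_OUTPUT_TOKENS + thinking;
}
```

Unlike the A2A snippet, this always returns a number rather than undefined, so estimateRequestCost would receive an explicit 500 when thinking is off instead of falling back to its own default.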


Design / Clarity

5. ANTHROPIC_COT_BUDGET=0 passes validation but is silently treated as disabled

The validator accepts 0 (n >= 0), while parseAnthropicCotBudgetFromEnv returns null when n <= 0. Both the env var description and the validator's errorMessage say "0 or unset = disabled", so the behavior is documented consistently, but it could be clearer to document "use empty string or omit to disable" rather than accepting 0. Not a bug, just potentially surprising to operators.

6. mergeProviderOptions note on non-deep-merged providers

The JSDoc notes that non-gateway/anthropic/google keys get clobbered. The current call sites don't pass conflicting providers, but this is a latent correctness risk as more providers are added. Consider either: (a) making the merge generic over all shared keys, or (b) adding a runtime assertion/warning if unknown keys collide.

7. a2a/skills.ts — no character context for CoT

The A2A skills route applies mergeAnthropicCotProviderOptions(model) (env-only, no character budget). This is documented as intentional (env-only for routes without a resolved character), but unlike api/agents/[id]/a2a it doesn't pass character settings even when a character is resolvable from the A2A context. Worth a comment explaining why skills don't get per-character budgets.


Minor

8. parsePositiveIntStrict throws on whitespace-only values — but callers trim before passing. The whitespace check in the function is unreachable for ANTHROPIC_COT_BUDGET since parseAnthropicCotBudgetFromEnv returns early for empty strings. Not harmful but slightly misleading.

9. Test file naming: packages/tests/unit/milady-billing-route.test.ts was renamed to z-milady-billing-route.test.ts (for ordering). This is a workaround for a Bun test runner issue. A comment explaining this naming convention (and ideally a note in the test file itself) would help future contributors avoid renaming it back.


What's Good

  • Security model is correct: budget on character record, not in request body. Clear rationale in docs.
  • The resolveAnthropicThinkingBudgetTokens resolution hierarchy (character > env default > disabled) is clean.
  • Explicit 0 passed to disable CoT in background services (SEO, promotion, Twitter) is the right approach where temperature matters.
  • failOnInvalid flag on env validator for startup-time failures is a useful addition.
  • Rate limit unification (same numbers prod/dev, multiplier knob) is cleaner than the old isDevelopment ? 10000 : 60 pattern.
  • RATE_LIMIT_DISABLED guarded to non-production only.

The main blocking items are #1 (root-level duplicate with wrong model list), #2 (parseInt/float mismatch), and #3 (nullable settings type).

@claude

claude bot commented Mar 27, 2026

Eliza Cloud V2

Stack

  • Runtime: Bun
  • Framework: Next.js 15 (App Router)
  • Database: PostgreSQL + Drizzle ORM
  • Deployment: Vercel Serverless
  • UI: React + Tailwind CSS

Commands

bun install          # Install dependencies
bun run dev          # Start dev server
bun run build        # Production build
bun run db:migrate   # Apply database migrations
bun run db:generate  # Generate migration from schema
bun run db:studio    # Open Drizzle Studio

Database Migrations

Never use db:push - it's removed. All schema changes go through migrations.

Schema Change Workflow

  1. Edit schema in db/schemas/
  2. bun run db:generate
  3. Review SQL in db/migrations/
  4. bun run db:migrate
  5. Commit both schema + migration

Custom Migrations

npx drizzle-kit generate --custom --name=descriptive_name

Rules

  • No CREATE INDEX CONCURRENTLY (runs in transaction)
  • Use IF NOT EXISTS / IF EXISTS for creating tables
  • Never edit applied migrations
  • NEVER use omnibus migrations that recreate the full schema or existing objects - they will fail in production by locking active tables. Instead:
    1. Create small targeted migrations that ONLY add your new schema objects
    2. Use separate migrations for data backfills
    3. Put cleanup/drops in their own migration
    4. Group related objects together but limit migrations to <100 lines
  • See docs/database-migrations.md for details

Type Checking

bun run check-types has many pre-existing errors across the codebase (db/, lib/services/, app/). Don't try to fix them all — only verify your changed files have no new errors. Filter output:

bun run check-types 2>&1 | grep -E "(your-file\.ts|your-other-file\.ts)"

If the grep returns empty, your changes are clean. bun run build also fails on unrelated env vars (ELIZA_APP_DISCORD_BOT_TOKEN). Use check-types filtered to your files instead.

Project Structure

app/           # Next.js App Router pages
lib/           # Business logic, services
db/
  schemas/     # Drizzle schema definitions
  migrations/  # SQL migration files
  repositories/# Data access layer
components/    # React components
scripts/       # CLI utilities

@claude

claude bot commented Mar 27, 2026

Eliza Cloud

A comprehensive AI agent development platform built with Next.js 15, featuring multi-model AI generation (text, image, video), full elizaOS runtime integration, enterprise authentication, credit-based billing, and production-ready cloud infrastructure.

🎯 Overview

Eliza Cloud V2 is a full-stack AI-as-a-Service platform that combines:

  • Multi-Modal AI Generation: Text chat, image creation, and video generation
  • elizaOS Integration: Full-featured autonomous agent runtime with memory, rooms, and plugins
  • SaaS Platform: User management, API keys, credit-based billing, usage tracking
  • Container Deployment: Deploy elizaOS projects via elizaos deploy CLI to AWS ECS
  • Enterprise Features: Privy authentication with multi-provider support, Stripe billing, ECR image storage, health monitoring

✨ Key Features

🤖 AI Generation Studio

  • Text & Chat:

    • Multi-model support (GPT-4, Claude, Gemini, etc.) via AI SDK Gateway
    • Real-time streaming responses
    • Anthropic-compatible /api/v1/messages endpoint for Claude Code and Anthropic SDK clients
    • Conversation persistence with full history
    • Model selection and configuration
  • Image Creation:

    • Google Gemini 2.5 Flash multimodal generation
    • High-quality images (1024x1024)
    • Automatic Vercel Blob storage
    • Base64 preview + downloadable files
  • Video Generation:

    • Multiple Fal.ai models: Veo3, Kling v2.1, MiniMax Hailuo
    • Long-form video support (up to 5 minutes)
    • Automatic Vercel Blob upload
    • Fallback handling with error recovery

🧠 elizaOS Runtime Integration

  • Full Agent Runtime:

    • AgentRuntime from @elizaos/core with PostgreSQL database
    • Memory system with vector embeddings (384-3072 dimensions)
    • Rooms, participants, relationships, and entities
    • Plugin system with custom providers and actions
  • Character Creator:

    • AI-assisted character definition builder
    • Progressive JSON generation with live preview
    • Import/export elizaOS-compatible character files
    • Support for all character fields (bio, style, plugins, knowledge, etc.)
  • Agent Chat Interface:

    • Chat with deployed elizaOS agents via rooms
    • Message persistence and history
    • Real-time WebSocket updates (future)
    • Multi-agent conversations

💳 SaaS Platform Features

  • Billing System:

    • Add funds via Stripe integration
    • Automatic deduction for AI operations
    • Usage tracking per organization/user
    • Credit packs with volume pricing
  • API Key Management:

    • Generate API keys for programmatic access
    • Key rotation and regeneration
    • Rate limiting per key
    • Usage statistics and audit logs
  • Container Deployments:

    • Deploy elizaOS projects via elizaos deploy CLI
    • Docker-based deployments to AWS ECS (Elastic Container Service)
    • ECR (Elastic Container Registry) for Docker image storage
    • EC2-based ECS (t4g.small ARM instances, 1 per user)
    • Health checks and monitoring via ECS

📊 Management & Analytics

  • Dashboard:

    • Usage overview with charts (Recharts)
    • Provider health monitoring
    • Credit activity timeline
    • Model usage breakdown
  • Gallery:

    • View all generated images and videos
    • Filter by type (image/video)
    • Download or delete media
    • Storage usage statistics
  • Analytics:

    • Usage records by model, provider, type
    • Cost breakdown and trends
    • Error tracking and success rates

🔐 Security & Infrastructure

  • Enterprise Auth:

    • Privy authentication with email, wallet, and social logins
    • Organization and user management
    • Webhook-based user synchronization
    • Role-based access (admin, member)
  • Billing Integration:

    • Stripe Checkout for credit purchases
    • Webhook processing with idempotency
    • Tax ID collection for businesses
    • Invoice generation
  • Type Safety:

    • Full TypeScript coverage
    • Zod validation for API requests
    • Drizzle ORM with type-safe queries

🏗 Architecture

Directory Structure

cloud/
├── app/                      # Next.js App Router
│   ├── api/                  # API routes
│   │   ├── v1/              # Versioned API
│   │   │   ├── chat/        # AI text generation
│   │   │   ├── generate-image/  # Image generation
│   │   │   ├── generate-video/  # Video generation
│   │   │   ├── gallery/     # Media gallery
│   │   │   ├── containers/  # Container management (AWS ECS/ECR)
│   │   │   ├── api-keys/    # API key CRUD
│   │   │   ├── character-assistant/  # Character creator AI
│   │   │   ├── user/        # User info
│   │   │   └── models/      # Available AI models
│   │   ├── eliza/           # elizaOS agent API
│   │   │   └── rooms/       # Agent rooms and messages
│   │   ├── stripe/          # Stripe webhooks and checkout
│   │   └── fal/             # Fal.ai proxy
│   ├── dashboard/           # Protected dashboard pages
│   │   ├── text/            # Text chat interface
│   │   ├── image/           # Image generation studio
│   │   ├── video/           # Video generation studio
│   │   ├── gallery/         # Generated media gallery
│   │   ├── containers/      # Container management UI
│   │   ├── api-keys/        # API key management
│   │   ├── billing/         # Credits and billing
│   │   ├── analytics/       # Usage analytics
│   │   ├── account/         # Account settings
│   │   ├── character-creator/  # Character builder
│   │   ├── eliza/           # elizaOS agent chat
│   │   └── storage/         # Storage management
│   ├── actions/             # Server actions
│   │   ├── auth.ts          # Auth actions
│   │   ├── gallery.ts       # Gallery actions
│   │   ├── characters.ts    # Character CRUD
│   │   ├── conversations.ts # Conversation management
│   │   └── users.ts         # User actions
│   ├── layout.tsx           # Root layout with analytics
│   ├── page.tsx             # Landing page
│   └── globals.css          # Global styles (Tailwind)
├── components/              # React components
│   ├── chat/                # Chat interfaces
│   ├── image/               # Image generation UI
│   ├── video/               # Video generation UI
│   ├── gallery/             # Gallery grid and display
│   ├── containers/          # Container tables
│   ├── api-keys/            # API key management UI
│   ├── billing/             # Credit packs and billing
│   ├── character-creator/   # Character builder UI
│   ├── dashboard/           # Dashboard metrics and cards
│   ├── layout/              # Header, sidebar, navigation
│   ├── theme/               # Theme provider and toggle
│   ├── ui/                  # Reusable UI components (45+ components)
│   └── ai-elements/         # AI-specific UI components
├── db/                      # Database layer
│   ├── sass/                # SaaS platform schema
│   │   └── schema.ts        # Organizations, users, API keys, credits, etc.
│   ├── eliza/               # elizaOS runtime schema
│   │   └── schema.ts        # Agents, memories, rooms, embeddings, etc.
│   ├── drizzle.ts           # Database client
│   └── migrations/          # Migration SQL files
├── lib/                     # Shared utilities
│   ├── queries/             # Database queries (12 files)
│   │   ├── api-keys.ts      # API key operations
│   │   ├── credits.ts       # Credit transactions
│   │   ├── containers.ts    # Container CRUD
│   │   ├── container-quota.ts  # Quota enforcement
│   │   ├── generations.ts   # Media generation records
│   │   ├── usage.ts         # Usage tracking
│   │   └── ...
│   ├── services/            # Business logic services
│   │   ├── ecr.ts           # AWS ECR integration
│   │   ├── ecs.ts           # AWS ECS deployment
│   │   ├── health-monitor.ts  # Provider health checks
│   │   └── containers.ts    # Container management
│   ├── eliza/               # elizaOS integration
│   │   ├── agent-runtime.ts # AgentRuntime wrapper
│   │   ├── agent.ts         # Agent management
│   │   └── plugin-assistant/  # Custom elizaOS plugin
│   ├── config/              # Configuration
│   │   ├── env-validator.ts # Environment validation
│   │   ├── env-consolidation.ts  # Config helpers
│   │   └── startup.ts       # Startup checks
│   ├── errors/              # Custom error classes
│   ├── middleware/          # Middleware utilities
│   ├── auth.ts              # Auth helpers
│   ├── blob.ts              # Vercel Blob utilities
│   ├── stripe.ts            # Stripe client
│   ├── pricing.ts           # Cost calculations
│   ├── rate-limiter.ts      # Rate limiting
│   ├── utils.ts             # General utilities
│   └── types.ts             # Shared TypeScript types
├── .env.example            # Environment template
├── docs/                    # Detailed documentation
│   ├── API_REFERENCE.md    # Complete API reference
│   ├── anthropic-cot-budget.md   # ANTHROPIC_COT_BUDGET + provider merge WHYs
│   ├── unit-testing-milady-mocks.md  # Bun mock.module + Milady pricing test WHYs
│   ├── ROADMAP.md          # Product direction and done items
│   ├── DEPLOYMENT.md       # Deployment guide
│   ├── DEPLOYMENT_TROUBLESHOOTING.md  # Troubleshooting
│   ├── STRIPE_SETUP.md     # Stripe integration
│   ├── ENV_VARIABLES.md    # Environment configuration
│   └── ...
├── scripts/                 # Utility scripts
│   ├── seed-credit-packs.ts
│   └── ...
├── middleware.ts            # Next.js middleware (auth)
├── drizzle.config.ts        # Drizzle Kit config
└── package.json             # Dependencies

Request Flow

graph TD
    A[Client Request] --> B[Next.js Middleware]
    B --> C{Auth Required?}
    C -->|Yes| D[Privy Auth]
    C -->|No| E[Route Handler]
    D -->|Authenticated| E
    D -->|Unauthenticated| F[Redirect to Login]
    E --> G{Request Type}
    G -->|AI Chat| H[AI SDK Gateway]
    G -->|Image/Video| I[Gemini/Fal.ai]
    G -->|Data| J[Drizzle ORM]
    G -->|Container| K[AWS ECS/ECR]
    G -->|elizaOS| L[AgentRuntime]
    H --> M[Response]
    I --> M
    J --> N[PostgreSQL]
    K --> M
    L --> N
    N --> M
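The request flow can also be read as a lookup from request type to backend. The sketch below is only an illustration of the diagram: RequestKind, BACKENDS, and describeRoute are hypothetical names, and the real dispatch happens in the route handlers under app/api/.

```typescript
// Hypothetical names; this mirrors the diagram, not the actual route handlers.
type RequestKind = "ai-chat" | "media" | "data" | "container" | "elizaos";

// Backend each request kind is dispatched to, following the diagram.
const BACKENDS: Record<RequestKind, string> = {
  "ai-chat": "AI SDK Gateway",
  media: "Gemini/Fal.ai",
  data: "Drizzle ORM",
  container: "AWS ECS/ECR",
  elizaos: "AgentRuntime",
};

// Only the Data and elizaOS paths pass through PostgreSQL in the diagram.
const TOUCHES_POSTGRES: ReadonlySet<RequestKind> = new Set<RequestKind>(["data", "elizaos"]);

function describeRoute(kind: RequestKind, authRequired: boolean, authenticated: boolean): string {
  // Unauthenticated requests to protected routes never reach a handler.
  if (authRequired && !authenticated) return "Redirect to Login";
  const hop = BACKENDS[kind];
  return TOUCHES_POSTGRES.has(kind) ? `${hop} -> PostgreSQL -> Response` : `${hop} -> Response`;
}

console.log(describeRoute("elizaos", true, true)); // AgentRuntime -> PostgreSQL -> Response
```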

Database Architecture

The platform uses a single database with integrated schemas:

  1. Platform Schemas (db/schemas/*.ts): Platform infrastructure
    • Organizations (db/schemas/organizations.ts), users (db/schemas/users.ts), authentication
    • API keys (db/schemas/api-keys.ts), usage tracking (db/schemas/usage-records.ts)
    • Credit system (db/schemas/credit-transactions.ts, db/schemas/credit-packs.ts), billing, Stripe integration
    • Containers (db/schemas/containers.ts), ECS/ECR deployments
    • Generations (db/schemas/generations.ts - image/video records)
    • Conversations (db/schemas/conversations.ts - platform-level chat)
    • elizaOS Tables (integrated via @elizaos/plugin-sql schema):
      • Agents (character definitions)
      • Memories with vector embeddings
      • Rooms and participants
      • Entities and relationships
      • Components and tasks
      • Message servers and channels

🛠 Tech Stack

Core Framework

  • Next.js 15.5.4: React framework with App Router, Turbopack, and Server Actions
  • React 19.2.0: Latest UI library with server components
  • TypeScript 5: Full type safety

Database & ORM

  • PostgreSQL: Single database with all tables
    • Platform tables: SaaS tables (users, credits, containers, etc.)
    • elizaOS tables: Agent runtime tables (agents, memories, rooms, etc.)
  • Drizzle ORM 0.44.6: Type-safe SQL ORM
  • Drizzle Kit 0.31.5: Migrations and schema management
  • pgvector: Vector similarity search for embeddings

Authentication & Billing

  • Privy Auth: Web3-native authentication with multi-provider support (email, wallet, social logins)
    • @privy-io/react-auth for frontend
    • @privy-io/server-auth for backend token verification
  • Stripe 19.1.0: Payment processing and credit purchases
  • @stripe/stripe-js 8.0.0: Client-side Stripe integration

AI & Machine Learning

  • AI SDK 5.0.60: Vercel AI SDK for streaming
  • @ai-sdk/gateway 1.0.33: Multi-provider AI routing
  • @ai-sdk/openai 2.0.43: OpenAI provider
  • @ai-sdk/react 2.0.60: React hooks for AI
  • @fal-ai/client 1.6.2: Fal.ai video generation
  • @elizaos/core 1.6.1: elizaOS agent runtime
  • @elizaos/plugin-openai 1.5.15: OpenAI plugin for elizaOS
  • @elizaos/plugin-sql 1.6.1: SQL database plugin for elizaOS

Storage & Infrastructure

  • Vercel Blob 2.0.0: Media storage (images/videos)
  • @aws-sdk/client-ecr 3.x: AWS Elastic Container Registry
  • @aws-sdk/client-ecs 3.x: AWS Elastic Container Service deployment

Styling & UI

  • Tailwind CSS 4.1.14: Utility-first CSS framework
  • Radix UI: 20+ accessible, unstyled UI primitives
  • Lucide React 0.545.0: Icon library (1000+ icons)
  • class-variance-authority 0.7.1: Component variants
  • next-themes 0.4.6: Dark/light mode support
  • motion 12.23.22: Animation library
  • Sonner 2.0.7: Toast notifications
  • Recharts 2.15.4: Charts for analytics

Development Tools

  • ESLint 9.37.0: Code linting
  • Prettier 3.6.2: Code formatting
  • tsx 4.19.2: TypeScript execution
  • Zod 4.1.11: Schema validation

📦 Prerequisites

Required Software

  • Node.js: v20 or higher
  • npm: v10 or higher
  • Git: For version control

Required Services

  1. Neon Database (neon.tech)

    • Create a new project
    • Copy the connection string
  2. Privy (privy.io)

    • Create an application
    • Configure webhook endpoint: http://localhost:3000/api/privy/webhook
    • Enable desired login methods (email, wallet, social)
    • Note your Client ID and API Key
  3. OpenAI or AI Gateway (at least one)

    • OpenAI API key for direct access, OR
    • AI Gateway API key for multi-provider access

Optional Services

  1. Vercel Blob (vercel.com)

    • Required for Gallery feature
    • Create a Blob store and copy token
  2. Fal.ai (fal.ai)

    • Required for video generation
    • Create account and get API key
  3. AWS (aws.amazon.com)

    • Required for container deployments
    • AWS credentials (Access Key ID, Secret Access Key)
    • ECS/ECR configuration, VPC, subnets, security groups
  4. Stripe (stripe.com)

    • Required for billing/credits
    • Secret key and webhook secret

🚀 Quick Start

1. Clone and Install

cd cloud
bun install

2. Environment Setup

Copy the example environment file:

cp .env.example .env.local

Edit .env.local with your credentials (see .env.example for all options).

For local caching, the repo expects the Docker Redis container on localhost:6379:

REDIS_URL=redis://localhost:6379
KV_URL=redis://localhost:6379

Leave KV_REST_API_URL and KV_REST_API_TOKEN blank locally unless you are intentionally using Upstash.

Minimum required variables:

# Database
DATABASE_URL=postgresql://user:password@host:5432/database?sslmode=require

# Privy Authentication
NEXT_PUBLIC_PRIVY_APP_ID=your_privy_app_id_here
PRIVY_APP_SECRET=your_privy_app_secret_here
PRIVY_WEBHOOK_SECRET=replace_with_strong_random_secret

# AI (at least one)
OPENAI_API_KEY=sk-your_openai_key
# OR
AI_GATEWAY_API_KEY=your_gateway_key
# Optional for Groq-native models (groq/compound, groq/compound-mini)
GROQ_API_KEY=your_groq_key
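
The "at least one" rule for AI keys is easy to get wrong in a hand-written check. The following is a minimal sketch of such a check, assuming a plain key/value map as input; checkMinimumEnv is a hypothetical function and is not the repo's lib/config/env-validator.ts.

```typescript
type Env = Record<string, string | undefined>;

// Variables the minimum list above marks as always required.
const ALWAYS_REQUIRED = [
  "DATABASE_URL",
  "NEXT_PUBLIC_PRIVY_APP_ID",
  "PRIVY_APP_SECRET",
  "PRIVY_WEBHOOK_SECRET",
];

// At least one of these AI credentials must be set.
const AI_KEYS = ["OPENAI_API_KEY", "AI_GATEWAY_API_KEY"];

function isSet(env: Env, key: string): boolean {
  const value = env[key];
  return value !== undefined && value.trim() !== "";
}

// Returns human-readable problems; an empty array means the minimum is satisfied.
function checkMinimumEnv(env: Env): string[] {
  const problems = ALWAYS_REQUIRED.filter((key) => !isSet(env, key)).map(
    (key) => `Missing ${key}`,
  );
  if (!AI_KEYS.some((key) => isSet(env, key))) {
    problems.push(`Set at least one of: ${AI_KEYS.join(", ")}`);
  }
  return problems;
}
```

GROQ_API_KEY is left out on purpose, since the list marks it as optional.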

Eliza App variables (for Telegram, iMessage, Discord, and WhatsApp integrations):

# JWT secret for Eliza App user sessions (required)
ELIZA_APP_JWT_SECRET=          # Generate: openssl rand -hex 32

# Telegram (optional)
ELIZA_APP_TELEGRAM_BOT_TOKEN=  # From @BotFather

# iMessage / Blooio (optional)
ELIZA_APP_BLOOIO_API_KEY=      # From Blooio dashboard

# Discord (optional)
ELIZA_APP_DISCORD_BOT_TOKEN=         # Developer Portal → Bot
ELIZA_APP_DISCORD_APPLICATION_ID=    # Developer Portal → General Information (also the OAuth2 Client ID)
ELIZA_APP_DISCORD_CLIENT_SECRET=     # Developer Portal → OAuth2 → Client Secret

# WhatsApp Business Cloud API (optional — for the public Eliza App bot)
ELIZA_APP_WHATSAPP_ACCESS_TOKEN=     # Meta Business Settings → System Users → Generate Token
ELIZA_APP_WHATSAPP_PHONE_NUMBER_ID=  # Meta App Dashboard → WhatsApp → API Setup
ELIZA_APP_WHATSAPP_APP_SECRET=       # Meta App Dashboard → Settings → Basic → App Secret
ELIZA_APP_WHATSAPP_VERIFY_TOKEN=     # Generate: openssl rand -hex 32
ELIZA_APP_WHATSAPP_PHONE_NUMBER=     # Display phone number in E.164 format (e.g. +14245074963)

Organization-level WhatsApp (Dashboard > Settings > Connections):

Per-organization WhatsApp credentials are stored in the database via the dashboard UI. Each organization connects their own WhatsApp Business account by entering their Access Token, Phone Number ID, and App Secret in the connections settings. The webhook URL and verify token are auto-generated and displayed after connecting.

See .env.example for the full list of Eliza App environment variables.

Generate secure secrets:

# Generate PRIVY_WEBHOOK_SECRET (min 32 chars)
openssl rand -base64 32

# Generate CRON_SECRET / ELIZA_APP_JWT_SECRET
openssl rand -hex 32

3. Database Setup

If you want the full local infrastructure stack, start Postgres and Redis first:

bun run db:local:start

Run migrations to create all tables:

npm run db:migrate

4. Seed Credit Packs (Optional)

If using Stripe billing:

npm run seed:credit-packs

This creates credit pack products in Stripe.

5. Start Development Server

npm run dev

Visit http://localhost:3000.

6. First Login

  1. Click "Sign In" → Privy will create your user
  2. You'll be redirected to the dashboard
  3. Your organization starts with 10,000 credits

💻 Development

Available Scripts

# Development
npm run dev              # Start dev server with Turbopack
npm run build            # Production build with Turbopack
npm start                # Start production server

# Database
npm run db:generate      # Generate migrations from schema changes
npm run db:migrate       # Run pending migrations
npm run db:studio        # Open Drizzle Studio

# Code Quality
npm run lint             # Run ESLint
npm run lint:fix         # Auto-fix ESLint issues
npm run format           # Format with Prettier
npm run format:check     # Check formatting
npm run check-types      # TypeScript type checking

# Utilities
npm run seed:credit-packs   # Seed Stripe credit packs
npm run bootstrapper:build  # Build container bootstrapper

Testing (unit vs integration vs runtime)

Tests are split by kind; use the right script for what you want to run:

Script | Directory | What it runs | Needs
bun run test:unit | tests/unit/ | Unit tests (mocked deps, fast) | Env preload only; some skip without DATABASE_URL
bun run test:integration | tests/integration/ | API/DB/E2E integration tests | DATABASE_URL (+ migrations); some need a running server
bun run test:runtime | tests/runtime/ | Runtime/factory and perf tests | DATABASE_URL (+ migrations), heavier
bun run test | test:repo-unit:bulk + special | Two staged unit batches (see package.json for included/excluded files) | Env preload only (same family as test:unit)
bun run test:playwright | tests/playwright/ | Playwright E2E (optional) | @playwright/test installed

Env is loaded from .env, .env.local, and .env.test via preload.

Engineering docs (WHYs)

  • docs/unit-testing-milady-mocks.md — Why partial MILADY_PRICING mocks break other Milady modules under Bun, and how the billing cron tests isolate mock.module("@/db/client") contention.
  • docs/anthropic-cot-budget.md — Per-agent settings.anthropicThinkingBudgetTokens (MCP/A2A), env default (ANTHROPIC_COT_BUDGET) and cap (ANTHROPIC_COT_BUDGET_MAX), and why thinking budgets are not request parameters.
  • CHANGELOG.md — Engineering changelog (Keep a Changelog style).
  • docs/ROADMAP.md — Product direction and rationale; “Done” links to the above where relevant.

Development Workflow

  1. Start dev server: npm run dev
  2. Make changes: Edit files in app/, components/, lib/
  3. Instant feedback: Turbopack provides sub-second HMR
  4. Test features: Navigate to /dashboard routes
  5. Check types: npm run check-types
  6. Database changes: Edit db/schemas/*.ts → npm run db:generate → npm run db:migrate

Project Structure Guidelines

  • app/: Routes, API handlers, server actions
  • components/: Reusable React components
  • lib/: Business logic, database queries, services
  • db/: Database schemas and migrations
  • Server Components: Default for all components
  • Client Components: Only when needed ('use client')

🔐 Production Security

⚠️ IMPORTANT: Before deploying to production, you MUST complete the security configuration for Privy authentication.

Security Features Implemented

Content Security Policy (CSP): Comprehensive CSP configured in next.config.ts that:

  • Protects against XSS attacks
  • Allows Privy authentication iframe
  • Allows WalletConnect and wallet integrations
  • Prevents clickjacking with frame-ancestors restrictions
  • Includes all required domains for Privy, Solana, and third-party services

Security Headers: Multiple layers of protection:

  • X-Frame-Options: DENY (clickjacking protection)
  • X-Content-Type-Options: nosniff (MIME sniffing protection)
  • Referrer-Policy (referrer information control)
  • X-XSS-Protection (browser XSS protection)
  • Permissions-Policy (browser feature restrictions)
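
A header check along these lines can be sketched as a pure function over a response's headers. This is an assumption-laden illustration, not the repo's test:security-headers script: findHeaderProblems is a hypothetical name, and exact values are only checked where this README states them.

```typescript
type HeaderMap = Record<string, string>;

// Headers listed above; values are only asserted where the README gives them.
const EXPECTED_HEADERS: Array<{ name: string; value?: string }> = [
  { name: "content-security-policy" },
  { name: "x-frame-options", value: "DENY" },
  { name: "x-content-type-options", value: "nosniff" },
  { name: "referrer-policy" },
  { name: "x-xss-protection" },
  { name: "permissions-policy" },
];

// Returns the names of headers that are missing or carry an unexpected value.
function findHeaderProblems(headers: HeaderMap): string[] {
  // Header names are case-insensitive, so normalize before comparing.
  const lower: HeaderMap = {};
  for (const [key, val] of Object.entries(headers)) lower[key.toLowerCase()] = val;

  return EXPECTED_HEADERS.filter(({ name, value }) => {
    const actual = lower[name];
    if (actual === undefined) return true;
    return value !== undefined && actual.toLowerCase() !== value.toLowerCase();
  }).map(({ name }) => name);
}
```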

Pre-Production Checklist

Before deploying to production, complete these critical steps:

1. Configure Privy Dashboard

Visit https://dashboard.privy.io and configure:

  • ✅ Add production domain to Allowed Domains
  • ✅ Remove all test/development domains
  • ✅ Enable HttpOnly cookies for enhanced security
  • ✅ Complete domain ownership verification
  • ✅ Configure MFA settings (disable SMS, enable authenticator apps)
  • ✅ Review OAuth providers and session duration

2. Set Environment Variables

NEXT_PUBLIC_PRIVY_APP_ID=your_production_app_id
NEXT_PUBLIC_PRIVY_CLIENT_ID=your_production_client_id
PRIVY_APP_SECRET=your_production_app_secret
PRIVY_WEBHOOK_SECRET=strong_random_secret_here

3. Test Security Configuration

# Build and run in production mode
npm run build
npm run start

# Test security headers (in another terminal)
npm run test:security-headers

# Test all authentication flows
# - Login with all methods
# - Wallet connections
# - Transactions
# - Check browser console for CSP violations

Documentation

Complete security documentation is available:

Testing Your Security Setup

Run the automated security headers test:

npm run test:security-headers

This will verify:

  • All security headers are present
  • CSP includes required Privy domains
  • Frame embedding protection is active
  • All security directives are configured

Common Security Issues

Issue: "Domain not allowed"

  • Solution: Add your domain to Privy Dashboard > Configuration > App settings > Allowed domains

Issue: Privy iframe not loading

  • Solution: Check browser console for CSP violations. Verify frame-src includes https://auth.privy.io

Issue: CSP violations

  • Solution: Review browser console, determine if legitimate, update next.config.ts if needed

See docs/CSP_TESTING_GUIDE.md for detailed troubleshooting.


🔧 Platform Features

1. AI Text Generation

Location: /dashboard/chat and /app/api/v1/chat/route.ts

Features:

  • Multi-model support (GPT-4, Claude, Gemini, etc.)
  • Real-time streaming responses with useChat hook
  • Conversation persistence with full history
  • Model selection dropdown
  • Token usage and cost tracking

Usage:

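import { useChat } from "@ai-sdk/react";
import { DefaultChatTransport } from "ai";

// AI SDK 5: the endpoint and extra body fields are configured on the transport.
const { messages, sendMessage, status } = useChat({
  transport: new DefaultChatTransport({
    api: "/api/v1/chat",
    body: { model: "gpt-4o" },
  }),
});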

Cost: Token-based pricing from lib/pricing.ts

Anthropic Messages API (Claude Code): For tools that expect the Anthropic Messages API (e.g. Claude Code), use POST /api/v1/messages with the same request/response shape. Set ANTHROPIC_BASE_URL=https://cloud.milady.ai/api/v1 and ANTHROPIC_API_KEY to your Cloud API key so usage goes through Cloud credits instead of a direct Anthropic key. See API docs → Anthropic Messages. Why: single API key and billing for both OpenAI-style and Anthropic-style clients.

Public cloud agents (MCP / A2A) — Anthropic extended thinking: For POST /api/agents/{id}/mcp (chat tool) and POST /api/agents/{id}/a2a (chat), extended thinking uses the character’s settings.anthropicThinkingBudgetTokens when the model is Anthropic (0 = off; omitted = fall back to ANTHROPIC_COT_BUDGET). Optional ANTHROPIC_COT_BUDGET_MAX clamps any effective budget. Why: the agent owner controls cost/quality per agent; MCP/A2A clients cannot pass a thinking budget in the request (untrusted input). See docs/anthropic-cot-budget.md.
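
The precedence described above (per-agent setting, then env default, then optional cap) can be sketched as follows. This is a minimal illustration and not the helper shipped in the PR: resolveThinkingBudget and ThinkingInputs are hypothetical names, and two behaviors are assumptions, namely that a missing env default means thinking is off and that a non-positive cap is ignored.

```typescript
// Sketch only: names and edge-case handling are assumptions, not the repo's implementation.
interface ThinkingInputs {
  agentBudget?: number; // settings.anthropicThinkingBudgetTokens (undefined = omitted)
  envDefault?: number; // ANTHROPIC_COT_BUDGET
  envMax?: number; // ANTHROPIC_COT_BUDGET_MAX
}

// Returns the effective thinking budget in tokens; 0 means extended thinking is off.
function resolveThinkingBudget(isAnthropicModel: boolean, inputs: ThinkingInputs): number {
  if (!isAnthropicModel) return 0;

  // The per-agent setting wins when present (including an explicit 0);
  // otherwise fall back to the deploy-level default.
  const requested = inputs.agentBudget ?? inputs.envDefault ?? 0;
  if (!Number.isFinite(requested) || requested <= 0) return 0;

  // The optional cap clamps whichever budget is in effect.
  const capped =
    inputs.envMax !== undefined && inputs.envMax > 0
      ? Math.min(requested, inputs.envMax)
      : requested;
  return Math.floor(capped);
}

console.log(resolveThinkingBudget(true, { agentBudget: 10000, envMax: 8000 })); // 8000
```

Note that the request itself never appears among the inputs, which matches the rationale that MCP/A2A clients cannot supply a budget.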

2. AI Image Generation

Location: /dashboard/image and /app/api/v1/generate-image/route.ts

Features:

  • Google Gemini 2.5 Flash multimodal generation
  • High-quality 1024x1024 images
  • Automatic Vercel Blob upload
  • Base64 preview for instant display
  • Download functionality

API:

POST /api/v1/generate-image
Content-Type: application/json
Authorization: Bearer eliza_your_api_key

{
  "prompt": "A serene landscape with mountains and lake at sunset"
}

Cost: $0.01 per image

3. AI Video Generation

Location: /dashboard/video and /app/api/v1/generate-video/route.ts

Features:

  • Multiple Fal.ai models:
    • fal-ai/veo3 (Google Veo 3)
    • fal-ai/veo3/fast (faster version)
    • fal-ai/kling-video/v2.1/pro/text-to-video (Kling Pro)
    • fal-ai/minimax/hailuo-02/pro/text-to-video (MiniMax)
  • Automatic Vercel Blob upload
  • Progress tracking with queue updates
  • Fallback video on errors

API:

POST /api/v1/generate-video
Content-Type: application/json
Authorization: Bearer eliza_your_api_key

{
  "prompt": "A cinematic shot of a spaceship flying through stars",
  "model": "fal-ai/veo3"
}

Cost: $0.05 per video ($0.025 for fallback)
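
As a worked example of the flat media prices quoted in this README ($0.01 per image, $0.05 per video, $0.025 for a fallback video), the sketch below totals a batch of generations. mediaCostUsd is a hypothetical helper; the actual pricing logic lives in lib/pricing.ts and may differ.

```typescript
// Prices from this README, stored in thousandths of a dollar to avoid float drift.
const PRICE_MILLS = { image: 10, video: 50, videoFallback: 25 } as const;

type MediaKind = keyof typeof PRICE_MILLS;

// Total cost in dollars for a batch of generations.
function mediaCostUsd(counts: Partial<Record<MediaKind, number>>): number {
  let mills = 0;
  for (const kind of Object.keys(PRICE_MILLS) as MediaKind[]) {
    mills += (counts[kind] ?? 0) * PRICE_MILLS[kind];
  }
  return mills / 1000;
}

// 3 images + 2 videos + 1 fallback video = 30 + 100 + 25 mills
console.log(mediaCostUsd({ image: 3, video: 2, videoFallback: 1 })); // 0.155
```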

4. Gallery & Media Storage

Location: /dashboard/gallery

Features:

  • View all generated images and videos
  • Filter by type (image, video, all)
  • Grid layout with thumbnails
  • Full-size preview with details
  • Download media files
  • Delete from both DB and Vercel Blob
  • Storage usage statistics

Vercel Blob Benefits:

  • Global CDN delivery (19 edge regions)
  • Public access with unguessable URLs
  • Automatic caching
  • No upload fees (only downloads charged)
  • Hierarchical folder structure

Setup:

# 1. Create Blob store in Vercel Dashboard
# 2. Copy BLOB_READ_WRITE_TOKEN to .env.local
BLOB_READ_WRITE_TOKEN=vercel_blob_rw_your_token

5. Container Deployments

Location: /dashboard/containers and /app/api/v1/containers/route.ts

Features:

  • Deploy elizaOS projects via elizaos deploy CLI
  • Multi-project support: Deploy multiple different projects per user
  • Multi-architecture support: Auto-detects platform and deploys to matching AWS instance type
    • ARM64: t4g.small (AWS Graviton2, $15.76/month) - Recommended for cost savings
    • x86_64: t3.small (Intel/AMD, $18.68/month) - Universal compatibility
  • Smart update detection: Automatically detects and updates existing deployments
  • Docker-based deployments to AWS ECS (Elastic Container Service)
  • ECR (Elastic Container Registry) for Docker image storage with project-specific repositories
  • CloudFormation stack per project: elizaos-{userId}-{projectName}
  • Optimized health checks (15s interval, 5min grace period)
  • Health monitoring via CloudWatch and ECS
  • Quota enforcement (prevents race conditions)
  • Environment variable injection
  • Credit-based billing with automatic deduction
  • Container management CLI: elizaos containers list|delete|logs
  • Async deployment: API returns immediately while the CLI polls and displays progress

How It Works:

  1. User gets API key from /dashboard/api-keys
  2. User runs elizaos deploy --project-name my-project --api-key eliza_xxxxx from project directory
  3. CLI auto-detects if project already deployed (checks project_name)
  4. CLI requests ECR credentials from the cloud API
  5. CLI builds Docker image locally using project's Dockerfile (or generates one)
  6. CLI pushes Docker image to project-specific ECR repository
  7. CLI creates/updates container deployment via cloud API:
    • Fresh deployment: Creates new CloudFormation stack
    • Update deployment: Updates existing CloudFormation stack (zero-downtime)
  8. Cloud provisions/updates dedicated EC2 instance with ECS
  9. Container accessible via AWS Load Balancer URL
  10. Credits automatically deducted based on container resources (CPU/memory)

Multi-Project Example:

# Deploy first project
cd ~/chatbot
elizaos deploy --project-name chatbot --api-key eliza_xxx
# URL: https://fc51b251-chatbot.containers.elizacloud.ai

# Deploy second project (same user, different project)
cd ~/assistant
elizaos deploy --project-name assistant --api-key eliza_xxx
# URL: https://fc51b251-assistant.containers.elizacloud.ai

# Update chatbot
cd ~/chatbot
# ... make changes ...
elizaos deploy --project-name chatbot  # Auto-detected as update
# URL unchanged: https://fc51b251-chatbot.containers.elizacloud.ai

Human-Readable URLs:

  • Format: https://{userId-prefix}-{project-name}.containers.elizacloud.ai
  • Example: https://fc51b251-chatbot.containers.elizacloud.ai
  • Uses first segment of UUID + project name for easy recognition
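
The URL format can be reproduced with a few lines of string handling. This is a sketch under one assumption, that "first segment of UUID" means the text before the first hyphen; containerUrl is a hypothetical name and the full user ID below is made up.

```typescript
// Assumption: "first segment of UUID" means the text before the first hyphen.
function containerUrl(userId: string, projectName: string): string {
  const prefix = userId.split("-")[0];
  return `https://${prefix}-${projectName}.containers.elizacloud.ai`;
}

// The user ID is invented; only its first segment matches the README example.
console.log(containerUrl("fc51b251-0000-4000-8000-000000000000", "chatbot"));
// https://fc51b251-chatbot.containers.elizacloud.ai
```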

Instance Specs (Auto-Selected):

ARM64 (t4g.small - Recommended):

  • 2 vCPUs (ARM Graviton2)
  • 2 GiB RAM (2048 MB)
  • $15.76/month ($12.26 instance + $3.50 storage/monitoring)
  • Default container allocation: 1.75 vCPU (1792 units), 1.75 GiB RAM (1792 MB, 87.5% of instance)

x86_64 (t3.small - Universal):

  • 2 vCPUs (Intel/AMD)
  • 2 GiB RAM (2048 MB)
  • $18.68/month ($15.18 instance + $3.50 storage/monitoring)
  • Default container allocation: 1.75 vCPU (1792 units), 1.75 GiB RAM (1792 MB, 87.5% of instance)

Platform is automatically detected from your system. ARM64 provides better cost efficiency ($2.92/month savings) while x86_64 ensures universal compatibility.
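
The figures above can be checked with a small calculation. The sketch below copies the prices from this section and keeps them in cents so the arithmetic stays exact; INSTANCES and monthlyCents are hypothetical names, not identifiers from the CLI.

```typescript
// Figures copied from the specs above; prices are in cents to keep arithmetic exact.
interface InstanceSpec {
  instanceType: string;
  instanceCents: number; // monthly instance price
  overheadCents: number; // monthly storage/monitoring
}

type Arch = "arm64" | "x86_64";

const INSTANCES: Record<Arch, InstanceSpec> = {
  arm64: { instanceType: "t4g.small", instanceCents: 1226, overheadCents: 350 },
  x86_64: { instanceType: "t3.small", instanceCents: 1518, overheadCents: 350 },
};

function monthlyCents(arch: Arch): number {
  const spec = INSTANCES[arch];
  return spec.instanceCents + spec.overheadCents;
}

// Default container share: 87.5% of the instance's 2048 MB.
const containerMemoryMb = 2048 * 0.875; // 1792

// ARM64 savings per month: 1868 - 1576 = 292 cents, i.e. $2.92.
const savingsCents = monthlyCents("x86_64") - monthlyCents("arm64");

console.log(containerMemoryMb, savingsCents); // 1792 292
```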

Container Management:

# List all containers (with project names)
elizaos containers list --api-key eliza_xxx

# View logs (auto-detects from current directory)
cd ~/chatbot
elizaos containers logs  # Finds chatbot project automatically

# Delete container (auto-detects from current directory)
cd ~/chatbot
elizaos containers delete  # Finds and deletes chatbot project

Deployment Architecture:

┌──────────────┐
│   CLI Tool   │
│  (elizaos)   │
└──────┬───────┘
       │ 1. Request ECR credentials
       ▼
┌──────────────┐
│  Cloud API   │
│   (Next.js)  │
└──────┬───────┘
       │ 2. Return ECR auth token + repository
       ▼
┌──────────────┐
│  Docker CLI  │
│ (local build)│
└──────┬───────┘
       │ 3. Push image to ECR
       ▼
┌──────────────┐     4. Deploy container     ┌──────────────┐
│     ECR      │ ─────────────────────────▶ │  EC2 + ECS   │
│  (Registry)  │                              │  (Runtime)   │
└──────────────┘                              └──────┬───────┘
                                                      │
                                                      ▼
                                              ┌──────────────┐
                                              │ Load Balancer│
                                              │   (Public)   │
                                              └──────────────┘

Docker Image Requirements:

  • Must expose a port (default: 3000)
  • Must include a /health endpoint for ECS health checks
  • Dockerfile can be auto-generated if not present
  • Environment variables passed from cloud API
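
A /health route that satisfies these requirements can be very small. The sketch below is framework-agnostic and rests on assumptions: the README only requires that /health exists, so the 200 status with a JSON body is an illustrative choice, and handleRequest is a hypothetical function you would wire into whatever HTTP server the image runs.

```typescript
// Framework-agnostic sketch; the response body shape is an assumption.
interface SimpleResponse {
  statusCode: number;
  body: string;
}

// Default exposed port from the requirements above.
const DEFAULT_PORT = 3000;

function handleRequest(method: string, path: string): SimpleResponse {
  // ECS health checks only need a successful response on /health.
  if (method === "GET" && path === "/health") {
    return { statusCode: 200, body: JSON.stringify({ status: "ok" }) };
  }
  return { statusCode: 404, body: JSON.stringify({ error: "not found" }) };
}

console.log(DEFAULT_PORT, handleRequest("GET", "/health").statusCode); // 3000 200
```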

API:

POST /api/v1/containers
Content-Type: application/json
Authorization: Bearer eliza_your_api_key

{
  "name": "my-agent",
  "port": 3000,
  "max_instances": 1,
  "environment_vars": {
    "NODE_ENV": "production"
  },
  "ecr_image_uri": "123456789012.dkr.ecr.us-east-1.amazonaws.com/elizaos/my-project:latest"
}
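
For clients that build this payload in code, a typed shape plus a pre-flight check can catch mistakes before the request is sent. The field names below come from the example above; the validation rules are assumptions for illustration, and validateContainerRequest is a hypothetical function rather than the server's own validation.

```typescript
// Field names come from the request example; the validation rules are assumptions.
interface CreateContainerRequest {
  name: string;
  port: number;
  max_instances: number;
  environment_vars: Record<string, string>;
  ecr_image_uri: string;
}

// Returns a list of problems; an empty array means the payload looks well-formed.
function validateContainerRequest(req: CreateContainerRequest): string[] {
  const errors: string[] = [];
  if (req.name.trim() === "") errors.push("name must not be empty");
  if (!Number.isInteger(req.port) || req.port < 1 || req.port > 65535) {
    errors.push("port must be an integer between 1 and 65535");
  }
  if (!Number.isInteger(req.max_instances) || req.max_instances < 1) {
    errors.push("max_instances must be a positive integer");
  }
  // ECR image URIs have the form <account>.dkr.ecr.<region>.amazonaws.com/<repo>[:tag].
  if (!/^\d{12}\.dkr\.ecr\.[a-z0-9-]+\.amazonaws\.com\/.+/.test(req.ecr_image_uri)) {
    errors.push("ecr_image_uri does not look like an ECR image URI");
  }
  return errors;
}
```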

Requirements:

  • AWS account with ECS/ECR/EC2 access and CloudFormation permissions
  • elizaOS Cloud account with API key
  • VPC with public subnets configured
  • IAM roles for ECS task execution
  • Environment variables set (see .env.example and docs/ENV_VARIABLES.md)

6. elizaOS Agent Integration

Location: /dashboard/chat and lib/eliza/

Features:

  • Full AgentRuntime from @elizaos/core
  • PostgreSQL-backed memory system
  • Vector embeddings (384-3072 dimensions)
  • Rooms for conversations
  • Participants and relationships
  • Custom plugins and providers

Database Schema:

  • agents: Character definitions
  • memories: Conversation history
  • embeddings: Vector similarity search
  • rooms: Conversation contexts
  • entities: Users and participants
  • relationships: Entity connections

API:

# Create room
POST /api/eliza/rooms
{
  "agentId": "uuid",
  "name": "Chat Room"
}

# Send message
POST /api/eliza/rooms/{roomId}/messages
{
  "content": "Hello, agent!",
  "authorId": "user-uuid"
}

7. Character Creator

Location: /dashboard/character-creator and /app/api/v1/character-assistant/route.ts

Features:

  • AI-assisted character building using GPT-4o-mini
  • Progressive JSON generation
  • Live preview of character definition
  • Import/export elizaOS-compatible JSON
  • Support for all character fields:
    • name, username, bio, system prompt
    • messageExamples, postExamples
    • topics, adjectives, style
    • plugins, knowledge, settings

Workflow:

  1. User describes character in natural language
  2. AI generates JSON incrementally
  3. User sees live preview
  4. AI suggests improvements
  5. Export as elizaOS character file

Example:

{
  "name": "Alex",
  "bio": ["A friendly AI assistant", "Specializes in technical support"],
  "adjectives": ["helpful", "knowledgeable", "patient"],
  "system": "You are a helpful technical support agent...",
  "style": {
    "chat": ["Be concise", "Use bullet points"],
    "post": ["Be professional", "Include examples"]
  },
  "plugins": ["@elizaos/plugin-sql", "@elizaos/plugin-openai"]
}

8. Developer API & Programmatic Access

Location: Documented management endpoints that explicitly note API key support

API key authentication is available for the specific endpoints documented in this README (for example: /api/v1/chat, /api/v1/chat/completions, /api/v1/messages, /api/v1/generate-image, /api/v1/generate-video, /api/v1/containers, /api/v1/voice/*, /api/v1/billing/*, /api/v1/models, /api/v1/gallery). Not every /api/v1/ or /api/my-agents/ route supports API keys today, so rely on the documented list. On the supported endpoints, API keys enable:

  • Programmatic Agent Management: Create, update, delete, and clone agents via API
  • Voice Integration: Text-to-speech, speech-to-text, and voice cloning for voice-enabled applications
  • Billing Automation: Monitor balance, configure auto-top-up, and manage credits programmatically
  • AI Agent Autonomy: Enable AI agents to manage their own resources and budgets

Session-based auth only (no API key support yet): /api/v1/api-keys, /api/v1/apps/[id]/deploy, /api/v1/dashboard, /api/my-agents/characters/[id]/track-interaction.

Why API Keys for Management Endpoints?

Traditional SaaS platforms only expose limited APIs. We've enabled API key authentication across these management endpoints because:

  1. Developer Experience: Developers can build integrations without browser-based auth flows
  2. Agent Autonomy: AI agents need to manage their own resources (credits, other agents, voices) autonomously
  3. Automation: CI/CD pipelines, scripts, and external systems can interact with the platform programmatically
  4. No Vendor Lock-in: Generic endpoint paths (/api/v1/voice/ instead of provider-specific paths) allow switching providers without breaking integrations

Generic Voice API: Voice endpoints use provider-agnostic paths (/api/v1/voice/tts instead of /api/elevenlabs/tts) so your code doesn't need to change if the underlying provider changes. Legacy paths are preserved for backwards compatibility.

Billing Management: Agents and developers can configure auto-top-up settings programmatically, ensuring autonomous agents never stop working due to insufficient credits.


9. API Key Management

Location: /dashboard/api-keys and /app/api/v1/api-keys/route.ts

Features:

  • Generate API keys for programmatic access
  • Key rotation and regeneration
  • Rate limiting per key (default 1000 req/day)
  • Usage tracking and statistics
  • expires_at support for time-limited keys

Key Format: eliza_<random_32_chars>

API:

# Create API key
POST /api/v1/api-keys
{
  "name": "Production API Key",
  "description": "Main production key",
  "rate_limit": 10000
}

# Regenerate key
POST /api/v1/api-keys/{id}/regenerate

# Delete key
DELETE /api/v1/api-keys/{id}

Using API Keys:

curl https://your-app.com/api/v1/chat \
  -H "Authorization: Bearer eliza_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello"}]}'

9. Credit System & Billing

Location: /dashboard/billing and lib/queries/credits.ts

Features:

  • Credit-based pricing model
  • Stripe integration for purchases
  • Credit packs with volume discounts
  • Automatic deduction on usage
  • Transaction history
  • Organization-level balance

Pricing:

  • Text Chat: Token-based (varies by model)
  • Image Generation: $0.01 per image
  • Video Generation: $0.05 per video
  • Container Running: $0.67/day (~$20/month) - billed daily
  • Container Deployment: $0.50 one-time per deployment
  • Voice Clone (Instant): 50 credits
  • Voice Clone (Professional): $2.00

Stripe Integration:

  • Credit pack products defined in Stripe
  • Checkout session for purchases
  • Webhook processing for fulfillment
  • Idempotency for duplicate webhooks

Credit Packs (example):

[
  { name: "Starter", credits: 10000, price: 9.99 },
  { name: "Pro", credits: 50000, price: 39.99 },
  { name: "Enterprise", credits: 200000, price: 129.99 },
];

Setup:

See docs/STRIPE_SETUP.md for detailed Stripe configuration.

Referrals & Affiliates

  • Referrals: Signup-based. When a user signs up with a referral code, we record the link; when they buy credits (Stripe or x402), we redistribute 100% of that purchase in a 50/40/10 split (ElizaCloud / app owner / creator). Signup and qualified bonuses ($1 + $0.50 + $0.50) are minted as marketing spend, not carved from revenue. Why: One predictable split model; no risk of over-paying (splits always sum to 100%).
  • Affiliates: Link-based. Users can be linked to an affiliate code; on auto top-up and MCP usage we add a markup (default 20%) to what the customer pays and pay that to the affiliate. Why: Affiliate cost is passed to the customer, so we never over-allocate.
  • No double-apply: Referral splits apply only to Stripe checkout and x402; affiliate markup only to auto top-up and MCP. No single transaction pays both.
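The revenue math above can be sketched as a small worked example. This is an illustration only: the function names are hypothetical, amounts are in integer cents, and sending the rounding remainder to the platform share is an assumption, not something docs/referrals.md is known to specify.

```typescript
// Illustrative only. Percentages (50/40/10 and the 20% default markup) come
// from the text above; names and rounding behaviour are assumptions.
interface ReferralSplit {
  elizaCloud: number;
  appOwner: number;
  creator: number;
}

function splitReferralPurchase(purchaseCents: number): ReferralSplit {
  const appOwner = Math.floor((purchaseCents * 40) / 100);
  const creator = Math.floor((purchaseCents * 10) / 100);
  // Assumption: the remainder goes to the platform, so the parts always sum to 100%.
  const elizaCloud = purchaseCents - appOwner - creator;
  return { elizaCloud, appOwner, creator };
}

interface AffiliateCharge {
  markupCents: number;
  customerPaysCents: number;
}

function applyAffiliateMarkup(baseCents: number, markupPercent = 20): AffiliateCharge {
  // The markup is added on top of the base price and paid out to the affiliate.
  const markupCents = Math.round((baseCents * markupPercent) / 100);
  return { markupCents, customerPaysCents: baseCents + markupCents };
}
```

For a $10.00 referral purchase this yields $5.00 / $4.00 / $1.00; for a $10.00 affiliate-linked top-up the customer pays $12.00 and the affiliate receives $2.00.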

See docs/referrals.md for flow, API, and revenue math; docs/affiliate-referral-comparison.md for comparison with the other cloud repo.

Signup codes

  • What: One-time bonus credits per organization, e.g. launch50 → $50. Codes are defined in the SIGNUP_CODES_JSON env var (JSON object); if unset, defaults to {} (no codes). Why env var: So each environment (staging, prod) can have its own codes without committing them; no config file in the repo.
  • Where: Redeem via POST /api/signup-code/redeem (session auth only) or during Discord/Telegram signup by passing signup_code in the auth body. Why one per org: Prevents abuse (one shared code = one bonus per org) and keeps "welcome bonus" semantics.
  • Distinct from referrals: Referrals split revenue on purchases (50/40/10). Signup codes are flat campaign bonuses; an org can use both. See docs/signup-codes.md for API, security (rate limit CRITICAL, no-cache, two-layer one-per-org), and WHYs.

Wallet API (SIWE + wallet header auth)

  • SIWE (EIP-4361): GET /api/auth/siwe/nonce → sign message → POST /api/auth/siwe/verify → receive API key. New wallets get an account and initial free credits. Why: Agents and headless clients need a way to sign in and get an API key without a browser.
  • Wallet header signature: Send X-Wallet-Address, X-Timestamp, X-Wallet-Signature on each request to authenticate without storing a key. First valid request for an unknown wallet creates the account. Why: Some clients prefer not to store an API key; the wallet proves ownership per request.
  • x402 topup: Can credit the signer when wallet sig headers are present, or use body.walletAddress. All wallet signup (SIWE, wallet-header, topup) uses the same findOrCreateUserByWalletAddress path (slug, credits, race handling). See docs/wallet-api.md and Authentication / Wallet API for full reference and WHYs.

10. Analytics & Monitoring

Location: /dashboard/analytics and lib/queries/usage.ts

Features:

  • Usage records per request (tokens, cost, model)
  • Provider health monitoring
  • Model usage breakdown (Recharts)
  • Credit activity timeline
  • Error rate tracking
  • Response time monitoring

Metrics Tracked:

  • Input/output tokens
  • Cost per request
  • Duration (ms)
  • Success/failure status
  • IP address and user agent
  • Model and provider used

Provider Health:

  • Automatic health checks for AI providers
  • Status: healthy, degraded, unhealthy
  • Response time percentiles
  • Error rate calculation

11. MCP (Model Context Protocol) API

Location: /app/api/mcp/route.ts

Features:

  • Standards-based MCP server implementation
  • Platform tools exposed via MCP protocol
  • Bearer token authentication using API keys
  • Compatible with any MCP client (Claude Desktop, MCP Inspector, etc.)
  • Automatic billing for tool usage

Available MCP Tools:

  1. check_credits: View organization balance and recent transactions
  2. get_recent_usage: View recent API usage statistics
  3. generate_text: Generate text using AI models
  4. generate_image: Generate images using Google Gemini 2.5
  5. save_memory: Save to long-term memory
  6. retrieve_memories: Search and retrieve memories
  7. chat_with_agent: Chat with deployed elizaOS agents
  8. list_agents: List all available agents
  9. list_containers: List deployed containers

Using MCP Inspector:

npm run mcp:inspector

Using with Claude Desktop:

Add to your Claude Desktop config:

{
  "mcpServers": {
    "eliza-cloud": {
      "url": "http://localhost:3000/api/mcp",
      "transport": {
        "type": "streamableHttp"
      },
      "headers": {
        "Authorization": "Bearer eliza_your_api_key_here"
      }
    }
  }
}

🗄 Database Architecture

SaaS Schema (db/schemas/*.ts)

Core Tables:

  • organizations: Multi-tenant organization data

    • credit_balance, stripe_customer_id
    • allowed_models, allowed_providers
    • webhook_url for notifications
  • users: User accounts linked to organizations

    • privy_user_id for authentication
    • role: admin, member
    • is_active for deactivation
  • api_keys: API authentication

    • key_hash for secure storage
    • rate_limit, usage_count
    • permissions array
  • credit_transactions: Credit ledger

    • amount (positive or negative)
    • type: purchase, deduction, refund, adjustment
    • stripe_payment_intent_id for reconciliation
  • credit_packs: Balance top-up packages

    • stripe_price_id, stripe_product_id
    • sort_order for display
  • usage_records: Per-request usage tracking

    • input_tokens, output_tokens
    • input_cost, output_cost
    • model, provider, type
    • is_successful, error_message
  • generations: Image/video generation records

    • type: image, video
    • status: pending, completed, failed
    • storage_url (Vercel Blob)
    • dimensions, file_size, mime_type
  • containers: AWS ECS container deployments

    • ecr_repository_uri, ecr_image_tag (Docker image in ECR)
    • ecs_cluster_arn, ecs_service_arn, ecs_task_definition_arn (ECS resources)
    • load_balancer_url (ALB URL for accessing the container)
    • status: pending, building, deploying, running, failed, stopped
    • environment_vars, desired_count, cpu, memory, port
    • Unique constraint on (organization_id, name)
  • alb_priorities: Application Load Balancer priority management

    • Ensures each container gets a unique priority for ALB routing rules
    • Prevents priority conflicts when multiple containers share an ALB
  • conversations: Platform-level chat history

    • title, model, settings
    • message_count, total_cost
  • conversation_messages: Messages in conversations

    • role: user, assistant, system
    • sequence_number for ordering
    • tokens, cost, processing_time
  • user_characters: User-created elizaOS characters

    • character_data (full JSON)
    • is_template, is_public
    • Stored separately from agents
  • model_pricing: Dynamic pricing per model

    • input_cost_per_1k, output_cost_per_1k
    • effective_from, effective_until
    • is_active for versioning
  • provider_health: AI provider status

    • status: healthy, degraded, unhealthy
    • response_time, error_rate
    • last_checked timestamp
  • jobs: Background job queue

    • type, status: pending, in_progress, completed, failed
    • attempts, max_attempts
    • webhook_url for callbacks

elizaOS Schema (Integrated)

Integrated into the main database via @elizaos/plugin-sql schema. These tables are managed by elizaOS core:

Agent Runtime Tables:

  • agents: Character definitions

    • name, username, bio, system
    • messageExamples, postExamples
    • topics, adjectives, style
    • plugins, knowledge, settings
  • memories: Conversation history

    • type (message, document, fragment)
    • content (JSONB)
    • unique flag for deduplication
    • metadata with document references
  • embeddings: Vector similarity search

    • Multiple dimension columns:
      • dim384, dim512, dim768 (small-large)
      • dim1024, dim1536, dim3072 (XL-XXXL)
    • memory_id foreign key
  • rooms: Conversation contexts

    • source (discord, telegram, web, etc.)
    • type (DM, group, channel)
    • world_id optional reference
    • channel_id for platform mapping
  • participants: Room membership

    • entity_id, room_id, agent_id
    • room_state for custom data
  • entities: Users and participants

    • names array for aliases
    • metadata JSONB
  • relationships: Entity connections

    • source_entity_id, target_entity_id
    • agent_id scope
    • tags array
    • Unique constraint prevents duplicates
  • components: ECS-style data

    • entity_id, room_id, world_id
    • type, data JSONB
  • worlds: High-level grouping

    • agent_id, name, server_id
  • tasks: Scheduled agent tasks

    • name, description, tags
    • metadata JSONB
  • cache: Key-value cache

    • key, agent_id composite primary key
    • expires_at for TTL
  • logs: Audit trail

    • entity_id, room_id, type
    • body JSONB
  • message_servers: Central messaging (future)

    • source_type, source_id
    • For multi-platform agents
  • channels: Message channels

    • message_server_id
    • type (text, voice, DM, etc.)
  • central_messages: Cross-platform messages

    • channel_id, author_id
    • in_reply_to_root_message_id for threads

Database Migrations

Generate migration:

bun run db:generate

This creates SQL migration files in db/migrations/.

Apply migration:

bun run db:migrate

Race Condition Prevention

The platform implements atomic operations to prevent quota bypass:

Example: Container quota enforcement

await db.transaction(async (tx) => {
  // 1. Lock the organization row so concurrent requests serialize here
  await tx
    .select()
    .from(organizations)
    .where(eq(organizations.id, orgId))
    .for("update");

  // 2. Fetch the organization's containers while holding the lock
  const existing = await tx
    .select()
    .from(containers)
    .where(eq(containers.organization_id, orgId));

  // 3. Check quota (select() returns rows, so compare the row count)
  if (existing.length >= maxAllowed) throw new QuotaExceededError();

  // 4. Create container
  return await tx.insert(containers).values(data);
});

See lib/queries/container-quota.ts for full implementation.

🔌 API Reference

Authentication

Documented management endpoints support multiple authentication methods:

  1. Session Cookie (Privy): Automatic for logged-in users
  2. API Key Header: Authorization: Bearer eliza_your_key or X-API-Key: eliza_your_key
  3. SIWE: Get nonce from GET /api/auth/siwe/nonce, sign EIP-4361 message, POST /api/auth/siwe/verify to receive an API key
  4. Wallet header: X-Wallet-Address, X-Timestamp, X-Wallet-Signature (per-request signature; first request can create account)

Base URL

  • Development: http://localhost:3000
  • Production: https://your-domain.com

Endpoints

AI Generation

# Text Chat
POST /api/v1/chat
{
  "messages": [{"role": "user", "content": "Hello"}],
  "model": "gpt-4o"
}

# Image Generation
POST /api/v1/generate-image
{
  "prompt": "A beautiful sunset over mountains"
}

# Video Generation
POST /api/v1/generate-video
{
  "prompt": "Cinematic shot of spaceship",
  "model": "fal-ai/veo3"
}

# Available Models
GET /api/v1/models

Gallery

# List Media
GET /api/v1/gallery?type=image&limit=50&offset=0

# Response:
{
  "items": [...],
  "count": 10,
  "hasMore": false
}

Containers

# List Containers
GET /api/v1/containers

# Create Container
POST /api/v1/containers
{
  "name": "my-agent",
  "port": 3000,
  "ecr_image_uri": "123456789012.dkr.ecr.us-east-1.amazonaws.com/elizaos/my-project:v1.0.0",
  "environment_vars": {...},
  "cpu": 256,
  "memory": 512,
  "desired_count": 1
}

# Get Container
GET /api/v1/containers/{id}

# Delete Container
DELETE /api/v1/containers/{id}

# Check Quota
GET /api/v1/containers/quota

ECR Credentials

# Get ECR credentials for pushing Docker images
POST /api/v1/containers/credentials
{
  "projectId": "my-project",
  "version": "1.0.0"
}

# Response includes:
# - ecrRepositoryUri: Where to push the image
# - authToken: Docker login credentials
# - ecrImageUri: Full image URI to use in deployment

API Keys

# Create Key
POST /api/v1/api-keys
{
  "name": "Production",
  "rate_limit": 10000
}

# List Keys
GET /api/v1/api-keys

# Regenerate Key
POST /api/v1/api-keys/{id}/regenerate

# Delete Key
DELETE /api/v1/api-keys/{id}

User Info

# Get Current User
GET /api/v1/user

# Response:
{
  "id": "uuid",
  "email": "user@example.com",
  "name": "John Doe",
  "organization": {...},
  "credit_balance": 5000
}

elizaOS Agents

# Create Room
POST /api/eliza/rooms
{
  "agentId": "uuid",
  "name": "Chat"
}

# Get Room Messages
GET /api/eliza/rooms/{roomId}/messages

# Send Message
POST /api/eliza/rooms/{roomId}/messages
{
  "content": "Hello!",
  "authorId": "user-uuid"
}

Rate Limiting

  • Default: 1000 requests/day per API key
  • Container Deployments: 5 per 5 minutes
  • Billing Endpoints: 100 per hour

Rate limits return:

{
  "error": "Rate limit exceeded",
  "retryAfter": 3600
}
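A client can use the response above to decide how long to back off. The sketch below is hypothetical: the helper names are invented, and it assumes retryAfter is expressed in seconds, which the README does not state explicitly.

```typescript
// Hypothetical client-side helper. Assumes `retryAfter` is in seconds.
interface RateLimitBody {
  error: string;
  retryAfter: number;
}

function isRateLimitBody(value: unknown): value is RateLimitBody {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as Record<string, unknown>;
  return typeof candidate["error"] === "string" && typeof candidate["retryAfter"] === "number";
}

// Returns the wait time in milliseconds, or null when the response is not a
// rate-limit response in the documented shape.
function retryDelayMs(httpStatus: number, body: unknown, maxDelayMs = 3600000): number | null {
  if (httpStatus !== 429 || !isRateLimitBody(body)) return null;
  const requestedMs = Math.max(body.retryAfter, 0) * 1000;
  return Math.min(requestedMs, maxDelayMs);
}
```

Capping the delay keeps a misbehaving or misread value from stalling a client indefinitely; the one-hour default cap is arbitrary.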

Error Responses

{
  "success": false,
  "error": "Error message",
  "details": {...}
}

HTTP Status Codes:

  • 400: Bad Request (validation error)
  • 401: Unauthorized (missing/invalid auth)
  • 403: Forbidden (insufficient permissions)
  • 404: Not Found
  • 429: Too Many Requests (rate limited)
  • 500: Internal Server Error
  • 503: Service Unavailable (feature not configured)

🚢 Deployment

Deploying to Vercel (Recommended)

1. Push to GitHub:

git add .
git commit -m "Initial commit"
git push origin main

2. Import to Vercel:

  • Go to vercel.com/new
  • Import your repository
  • Vercel auto-detects Next.js

3. Configure Environment Variables:

Add all variables from .env.local in Vercel dashboard:

  • DATABASE_URL - Single database for platform and elizaOS tables
  • NEXT_PUBLIC_PRIVY_APP_ID, PRIVY_APP_SECRET, PRIVY_WEBHOOK_SECRET
  • OPENAI_API_KEY or AI_GATEWAY_API_KEY
  • BLOB_READ_WRITE_TOKEN (optional, for media gallery)
  • FAL_KEY (optional, for video generation)
  • STRIPE_SECRET_KEY, STRIPE_WEBHOOK_SECRET (optional, for payments)
  • AWS_REGION, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY (optional, for container deployments)
  • ECS_CLUSTER_NAME, AWS_VPC_ID, AWS_SUBNET_IDS, AWS_SECURITY_GROUP_IDS (for containers)
  • CRON_SECRET (required for production)

Optional moderation config:

  • CONTENT_MODERATION_ENABLED=false - disables async content moderation checks (the app will still stream responses).

4. Update Privy Redirect URI:

  • Add production callback URL to Privy dashboard
  • Configure allowed origins: https://your-app.vercel.app

5. Deploy:

  • Click "Deploy"
  • Vercel automatically builds and deploys
  • Database migrations run on build

6. Configure Stripe Webhook:

  • Add webhook endpoint in Stripe dashboard
  • URL: https://your-app.vercel.app/api/stripe/webhook
  • Select events: checkout.session.completed, payment_intent.succeeded

Database Migrations in Production

Vercel runs migrations automatically via build script. For manual migration:

# Connect to production database
DATABASE_URL=postgres://prod-url npm run db:migrate

Monitoring

  • Vercel Analytics: Built-in (automatically enabled)
  • Logs: View in Vercel dashboard
  • Error Tracking: Console logs captured
  • Provider Health: Check /dashboard/analytics

🐛 Troubleshooting

Common Issues

1. Database Connection Errors

Error: Connection refused or SSL required

Solutions:

  • Verify DATABASE_URL includes ?sslmode=require
  • Check Neon dashboard for correct connection string
  • Ensure database is not paused (serverless auto-pause)

2. Authentication Issues

Error: Authentication errors or login failures

Solutions:

  • Verify NEXT_PUBLIC_PRIVY_APP_ID and PRIVY_APP_SECRET are correct
  • Check allowed origins in Privy dashboard match your domain (e.g., https://your-app.vercel.app)
  • Clear browser cookies and localStorage, then try again
  • Ensure Privy webhook is configured: https://your-domain.com/api/privy/webhook
  • Check webhook secret matches PRIVY_WEBHOOK_SECRET in your environment

3. Environment Variables Not Loading

Error: undefined values in runtime

Solutions:

  • Restart dev server after changing .env.local
  • Ensure file is named exactly .env.local (not .env)
  • Public variables must start with NEXT_PUBLIC_
  • In production, verify all variables set in Vercel dashboard

4. Container Deployment Fails

Error: "Container deployment failed" or "Deployment timeout"

Solutions:

  • Check AWS credentials are correct: AWS_REGION, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY
  • Verify ECS configuration:
    • ECS_CLUSTER_NAME - cluster must exist or CloudFormation will create it
    • AWS_VPC_ID - must be a valid VPC ID
    • AWS_SUBNET_IDS - comma-separated subnet IDs in different AZs
    • AWS_SECURITY_GROUP_IDS - security group must allow HTTP/HTTPS ingress
    • ECS_EXECUTION_ROLE_ARN - IAM role for ECS task execution
  • Test AWS credentials: aws sts get-caller-identity
  • Check quota: GET /api/v1/containers/quota
  • View logs in AWS CloudWatch or ECS console
  • Ensure shared infrastructure is deployed: cd scripts/cloudformation && ./deploy-shared.sh

See docs/DEPLOYMENT_TROUBLESHOOTING.md for detailed troubleshooting.

5. Docker Image Push Fails

Error: "Failed to push image to ECR" or "Authentication failed"

Solutions:

  • Verify Docker is running: docker info
  • Check AWS ECR credentials are valid
  • Ensure image was built successfully: docker images
  • Verify network connectivity to ECR
  • Try re-authenticating: Request new credentials from /api/v1/containers/credentials

6. Image/Video Generation Fails

Error: "No image/video was generated" or timeout

Solutions:

  • Image: Verify Google Gemini access in AI Gateway or OpenAI API key
  • Video: Check FAL_KEY is set correctly
  • Try simpler prompts first
  • Check rate limits in provider dashboard
  • View error in /dashboard/analytics

7. Credits Not Deducting

Error: Usage not tracking or credits not deducted

Solutions:

  • Check credit_transactions table for records
  • Verify organization credit_balance column
  • Check for database transaction errors in logs
  • Ensure calculateCost() is being called

8. Stripe Webhook Not Working

Error: Credits not added after purchase

Solutions:

  • Verify STRIPE_WEBHOOK_SECRET matches Stripe dashboard
  • Check webhook endpoint URL is correct
  • View webhook events in Stripe dashboard → Developers → Webhooks
  • Test locally with Stripe CLI:
    stripe listen --forward-to localhost:3000/api/stripe/webhook

Getting Help

🚀 AWS ECS Container Deployment

Deploy elizaOS agents to AWS ECS (Elastic Container Service) using Docker containers. Each user gets a dedicated EC2 instance (t4g.small ARM, Graviton2) managed via CloudFormation.

Quick Start

# 1. Get your API key from the dashboard
# Visit https://your-domain.com/dashboard/api-keys

# 2. Set your API key
export ELIZAOS_API_KEY="your-api-key-here"

# 3. Ensure Docker is running locally
docker --version
docker info

# 4. Deploy your elizaOS project
cd your-elizaos-project
elizaos deploy

How It Works

  1. CLI requests ECR credentials from the cloud API
  2. CLI builds Docker image locally
  3. CLI pushes image to AWS ECR (Elastic Container Registry)
  4. CLI creates container deployment via cloud API
  5. Cloud deploys to dedicated EC2 instance (t4g.small ARM) with ECS
  6. Agent runs on AWS with health checks and monitoring

AWS Infrastructure Setup (Platform Maintainers)

1. Deploy Shared Infrastructure

The platform uses CloudFormation to provision per-user infrastructure. First, deploy shared resources:

cd scripts/cloudformation
./deploy-shared.sh

This creates:

  • VPC with public subnets
  • Application Load Balancer (ALB) for routing
  • IAM roles for ECS tasks
  • Security groups

2. Configure Environment Variables

# AWS Credentials
AWS_REGION=us-east-1
AWS_ACCESS_KEY_ID=your_aws_access_key
AWS_SECRET_ACCESS_KEY=your_aws_secret

# Network Configuration (from CloudFormation outputs)
AWS_VPC_ID=vpc-xxxxx
AWS_SUBNET_IDS=subnet-xxxxx,subnet-yyyyy
AWS_SECURITY_GROUP_IDS=sg-xxxxx

# ECS Configuration
ECS_CLUSTER_NAME=elizaos-production
ECS_EXECUTION_ROLE_ARN=arn:aws:iam::ACCOUNT:role/ecsTaskExecutionRole
ECS_TASK_ROLE_ARN=arn:aws:iam::ACCOUNT:role/ecsTaskRole

# Optional: Shared ALB (recommended for cost savings)
ECS_SHARED_ALB_ARN=arn:aws:elasticloadbalancing:...
ECS_SHARED_LISTENER_ARN=arn:aws:elasticloadbalancing:...

# Environment (for stack naming)
ENVIRONMENT=production

3. Start the Platform

npm run dev  # Development
npm run build && npm start  # Production

Users can now deploy via: elizaos deploy

For Users: Deployment Options

# Basic deployment
elizaos deploy

# With custom name and resources
elizaos deploy \
  --name my-agent \
  --port 8080 \
  --desired-count 2 \
  --cpu 512 \
  --memory 1024

# With environment variables
elizaos deploy \
  --env "OPENAI_API_KEY=sk-..." \
  --env "DATABASE_URL=postgresql://..."

# Using existing Docker image
elizaos deploy \
  --skip-build \
  --image-uri 123456789.dkr.ecr.us-east-1.amazonaws.com/my-project:v1.0.0

Verification

# Check container status via API
curl https://elizacloud.ai/api/v1/containers \
  -H "Authorization: Bearer $ELIZAOS_API_KEY"

# View in dashboard
# https://elizacloud.ai/dashboard/containers

Cost & Billing

Container deployments are billed daily:

  • Deployment: $0.50 one-time per deployment
  • Running Costs: $0.67/day per container (~$20/month)
    • Billed automatically at midnight UTC
    • 48-hour warning email sent when credits are low
    • Container shut down after 48 hours if no credits added

Infrastructure (managed by elizaOS Cloud):

  • t4g.small (1.75 vCPU + 1.75 GB RAM) default instance type
  • ECR image storage included
  • Load balancing included
  • Auto-scaling available for additional instances

Daily Billing Behavior:

  1. CRON runs daily at midnight UTC
  2. Charges $0.67 per running container
  3. If insufficient credits: 48-hour shutdown warning email
  4. If still insufficient after 48 hours: container stopped
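The daily billing steps above can be sketched as a small state transition. This is not the platform's cron implementation: the type and function names are hypothetical, and the balance is modelled in cents purely for illustration.

```typescript
// Sketch of the daily billing behaviour described above. Names are
// hypothetical; $0.67/day and the 48-hour grace period come from the text.
const DAILY_CHARGE_CENTS = 67;
const GRACE_PERIOD_MS = 48 * 60 * 60 * 1000;

type BillingOutcome = "charged" | "warned" | "grace" | "stopped";

interface BillingState {
  balanceCents: number;
  warnedAtMs: number | null; // when the shutdown warning was first sent
}

interface BillingResult {
  outcome: BillingOutcome;
  next: BillingState;
}

function runDailyBilling(state: BillingState, nowMs: number): BillingResult {
  if (state.balanceCents >= DAILY_CHARGE_CENTS) {
    // Enough credits: charge and clear any pending warning.
    return {
      outcome: "charged",
      next: { balanceCents: state.balanceCents - DAILY_CHARGE_CENTS, warnedAtMs: null },
    };
  }
  if (state.warnedAtMs === null) {
    // First failed charge: send the 48-hour warning.
    return { outcome: "warned", next: { ...state, warnedAtMs: nowMs } };
  }
  if (nowMs - state.warnedAtMs >= GRACE_PERIOD_MS) {
    return { outcome: "stopped", next: state };
  }
  // Still inside the grace window: keep the container running.
  return { outcome: "grace", next: state };
}
```

With a daily cron, a container that runs out of credits is warned on the first failed run, stays up on the next run, and is stopped on the run 48 hours after the warning unless credits are added in between.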


📚 Additional Resources

Core Framework

Database & ORM

AI & Machine Learning

Authentication & Billing

Storage & Infrastructure

UI & Styling

Development Tools

📄 License

See the LICENSE file in the repository root.


Built with ❤️ for the elizaOS ecosystem

* The calling agent's character is not available here — skills are invoked via
* the A2A protocol which only provides user/org context, not agent personality.
*/
</search>

Cursor search/replace artifact left in production code

High Severity

A stray </search> tag from a Cursor search/replace operation was accidentally committed on line 12 of packages/lib/api/a2a/skills.ts. This is not valid TypeScript and will cause a syntax error, breaking the A2A skills module at parse time.

Contributor Author

Fixed in 67138cc.

supportsExtendedThinking,
type ThinkingConfig,
type CharacterThinkingSettings,
} from "./anthropic-thinking";

Root test imports from nonexistent module path

Medium Severity

The root-level anthropic-thinking.test.ts imports validateBudgetTokens, getThinkingConfig, buildThinkingParam, and CharacterThinkingSettings from ./anthropic-thinking, but no such file exists at the repository root. The actual implementation lives at packages/lib/providers/anthropic-thinking.ts and exports completely different functions. This test will fail at import time.

Contributor Author

Dismissed: Comment targets line 9 but identifier validateBudgetTokens not found in file — code may have been removed or renamed

const multiplier = process.env.RATE_LIMIT_MULTIPLIER;
if (!multiplier) return 1;
const parsed = Number.parseInt(multiplier, 10);
return Number.isNaN(parsed) || parsed < 1 ? 1 : parsed;

Validator accepts floats but consumer uses parseInt

Low Severity

getRateLimitMultiplier parses RATE_LIMIT_MULTIPLIER with Number.parseInt, silently truncating decimal values (e.g., "1.5" → 1). However, the env validator in env-validator.ts explicitly accepts floats via parseFloat and regex /^\d+(\.\d+)?$/. An operator who sets a fractional multiplier would pass validation but get unexpected truncation.
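One possible way to align the consumer with the validator is to parse with parseFloat and reuse the same pattern. The sketch below is hypothetical (the function names and the flooring of the scaled limit are assumptions) and takes the raw value as a parameter rather than reading process.env, so it is not a drop-in patch.

```typescript
// Hypothetical alignment of the consumer with the validator's regex.
const MULTIPLIER_PATTERN = /^\d+(\.\d+)?$/;

function parseRateLimitMultiplier(raw: string | undefined): number {
  if (!raw || !MULTIPLIER_PATTERN.test(raw)) return 1;
  const parsed = Number.parseFloat(raw);
  // Values below 1 fall back to 1, matching the existing "never tighten" behaviour.
  return Number.isFinite(parsed) && parsed >= 1 ? parsed : 1;
}

// Assumption: a fractional multiplier should still yield an integer limit.
function scaledLimit(baseLimit: number, multiplier: number): number {
  return Math.floor(baseLimit * multiplier);
}
```

The alternative is to tighten the validator to integers only; either choice removes the mismatch, as long as both sides agree.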

Contributor Author

Could not auto-fix (wrong file or repeated failures); manual review recommended.

@claude

claude bot commented Mar 27, 2026

Code Review

Overall: The core design is solid — thinking budgets are server-side controlled, the resolution hierarchy (character > env default, capped by env max) is clean, and the security rationale (untrusted MCP/A2A callers must not raise token budgets) is sound. The new anthropic-thinking.ts module is well-structured and the unit test coverage is thorough. A few issues worth addressing before merge:
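For readers skimming the review, the resolution hierarchy described above can be sketched as follows. This is not the code in packages/lib/providers/anthropic-thinking.ts: resolveBudgetSketch and its input shape are hypothetical, and treating zero or invalid values as "unset" is an assumption.

```typescript
// Hypothetical sketch of "character > env default, capped by env max".
interface BudgetInputs {
  characterBudget?: number | null | undefined; // user_characters.settings.anthropicThinkingBudgetTokens
  envDefault?: string | undefined;             // ANTHROPIC_COT_BUDGET
  envMax?: string | undefined;                 // ANTHROPIC_COT_BUDGET_MAX
  modelSupportsThinking: boolean;
}

function isPositiveInteger(value: unknown): value is number {
  return typeof value === "number" && Number.isInteger(value) && value > 0;
}

function parsePositiveInt(raw: string | undefined): number | null {
  if (raw === undefined || !/^\d+$/.test(raw.trim())) return null;
  const parsed = Number.parseInt(raw.trim(), 10);
  return parsed > 0 ? parsed : null;
}

function resolveBudgetSketch(inputs: BudgetInputs): number | null {
  // Models without extended thinking never get a budget.
  if (!inputs.modelSupportsThinking) return null;
  const fromCharacter = isPositiveInteger(inputs.characterBudget) ? inputs.characterBudget : null;
  const requested = fromCharacter ?? parsePositiveInt(inputs.envDefault);
  if (requested === null) return null;
  // The deploy-level cap always wins, even over a character setting.
  const cap = parsePositiveInt(inputs.envMax);
  return cap === null ? requested : Math.min(requested, cap);
}
```

Note that no caller-supplied value appears among the inputs, which is the security property the review highlights.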


Bug: Root-level anthropic-thinking.test.ts imports a missing file

anthropic-thinking.test.ts at the repo root imports from ./anthropic-thinking:

import {
  validateBudgetTokens,
  getThinkingConfig,
  buildThinkingParam,
  supportsExtendedThinking,
} from "./anthropic-thinking";

There is no anthropic-thinking.ts at the root level in this PR's diff. The functions validateBudgetTokens, getThinkingConfig, and buildThinkingParam are also not exported from packages/lib/providers/anthropic-thinking.ts. This test file will fail at import time. It looks like it was written against a different (earlier?) iteration of the API. Either remove it or wire it to the correct module.


Bug: model.includes("claude") in mcp/route.ts is inconsistent with the actual logic

In app/api/agents/[id]/mcp/route.ts:

const estimatedOutputTokens = model.includes("claude") && effectiveThinkingBudget > 0
  ? baseOutputTokens + effectiveThinkingBudget
  : baseOutputTokens;

resolveAnthropicThinkingBudgetTokens already returns null for non-Anthropic and non-thinking-capable models, so effectiveThinkingBudget will already be 0 for non-Claude models. The model.includes("claude") guard is therefore redundant — but it's also subtly wrong because it would match any hypothetical provider-prefixed model containing "claude" in the name that is not Anthropic. The a2a route handles this correctly using effectiveThinkingBudget != null instead. Suggest aligning both routes:

const estimatedOutputTokens = effectiveThinkingBudget > 0
  ? baseOutputTokens + effectiveThinkingBudget
  : baseOutputTokens;

Breaking change for local development: rate limits now match production by default

The PR changes RateLimitPresets from isDevelopment ? 10000 : N to flat production-level limits, with RATE_LIMIT_MULTIPLIER as the escape hatch. This will break any existing local dev or CI setup that relies on the previously relaxed limits. The new env var is also not in .env.example, making it hard to discover.

Suggest adding to .env.example:

# Set to a large number (e.g. 100) to relax rate limits during local development
# RATE_LIMIT_MULTIPLIER=1

Potential over-reservation of credits for thinking budget

In both MCP and A2A routes, the credit reservation adds the full effectiveThinkingBudget to the output token estimate:

const maxOutputTokens = effectiveThinkingBudget != null ? 500 + effectiveThinkingBudget : undefined;

effectiveThinkingBudget is the maximum budget (tokens available for thinking), not expected usage. For a 16K budget, this reserves credits for 16,500 output tokens even if the model only uses 2K thinking tokens. This could cause legitimate requests to fail with "insufficient credits" for users with smaller balances. Consider reserving a fraction of the thinking budget (e.g. 50%) or documenting this as an intentional conservative reservation.
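A minimal sketch of the fractional reservation idea, assuming the 500-token visible-output allowance seen in the routes. The helper name estimateReservedOutputTokens and the thinkingReserveFraction parameter are hypothetical and do not exist in the PR:

```typescript
// Hypothetical helper: the name and the fraction parameter are illustrative assumptions.
const BASE_OUTPUT_TOKENS = 500; // visible-output allowance used by the routes under review

function estimateReservedOutputTokens(
  effectiveThinkingBudget: number | null,
  thinkingReserveFraction = 1, // 1 reserves the full budget, which is the current behavior
): number {
  if (effectiveThinkingBudget == null || effectiveThinkingBudget <= 0) {
    return BASE_OUTPUT_TOKENS;
  }
  // Clamp to [0, 1] so a bad value can never reserve more than the budget ceiling.
  const fraction = Math.min(1, Math.max(0, thinkingReserveFraction));
  return BASE_OUTPUT_TOKENS + Math.ceil(effectiveThinkingBudget * fraction);
}
```

With a 16,000-token budget this reserves 16,500 tokens at the default fraction and 8,500 at 0.5. A lower fraction only makes sense if final billing reconciles against actual usage.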


Minor: supportsExtendedThinking dot-notation gap

The test in packages/tests/unit/anthropic-thinking.test.ts tests "anthropic/claude-sonnet-4" style IDs which all match. But the root test (noted above) expects "claude-3.5-sonnet" (dot notation) to return true. The pattern /claude-3-5-sonnet/ requires hyphens (3-5) and won't match the dot variant (3.5). If any callers or model registries use dot notation, thinking will silently be disabled. Worth adding /claude-3\.5-sonnet/ to the pattern list or confirming the dot variant is never used.
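A small sketch of a pattern list that accepts both separators. The function name supportsExtendedThinkingSketch and the two patterns are illustrative only; the real list in packages/lib/providers/anthropic-thinking.ts may differ:

```typescript
// Illustrative pattern list, not the one shipped in the PR.
const THINKING_MODEL_PATTERNS: RegExp[] = [
  /claude-3[.-]5-sonnet/, // accepts both "3-5" and "3.5"
  /claude-sonnet-4/,
];

function supportsExtendedThinkingSketch(modelId: string): boolean {
  const id = modelId.toLowerCase();
  // Unanchored patterns also match provider-prefixed IDs such as "anthropic/...".
  return THINKING_MODEL_PATTERNS.some((pattern) => pattern.test(id));
}
```

Inside a character class the dot is literal, so [.-] matches exactly a period or a hyphen and nothing else.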


Nit: RATE_LIMIT_DISABLED bypass also applies in non-production environments such as staging

if (process.env.RATE_LIMIT_DISABLED === "true" && process.env.NODE_ENV !== "production") {
  return;
}

The NODE_ENV !== "production" guard is good, but RATE_LIMIT_DISABLED silently suppresses the Redis configuration check in non-prod. If someone accidentally sets this in a staging environment that has NODE_ENV=staging, they'd skip the warning about missing Redis. Consider limiting to NODE_ENV === "development" || NODE_ENV === "test" explicitly.
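One way to express the explicit allowlist, as a sketch. The helper isRateLimitBypassAllowed and the BYPASS_ENVIRONMENTS set are hypothetical names, not part of the PR:

```typescript
// Hypothetical helper: only environments named here may bypass rate limiting.
const BYPASS_ENVIRONMENTS = new Set(["development", "test"]);

function isRateLimitBypassAllowed(env: Record<string, string | undefined>): boolean {
  // An unset NODE_ENV is treated as not allowed, so the bypass is opt-in on both variables.
  return env.RATE_LIMIT_DISABLED === "true" && BYPASS_ENVIRONMENTS.has(env.NODE_ENV ?? "");
}
```

An allowlist fails closed: a new environment name such as staging or preview keeps rate limiting and the Redis warning unless someone adds it deliberately.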


What's good

  • Security design: budgets live in character settings, not caller-supplied params — well-documented and enforced consistently
  • mergeProviderOptions deep-merge prevents Google responseModalities from being dropped when Anthropic CoT is added
  • Explicit 0 disables CoT for background services (app-builder, SEO, promotion) preserving temperature control — and the inline comments explain why
  • New env vars are documented in .env.example with clear explanations
  • Unit tests for the resolution logic are comprehensive (env override, agent override, max cap, 0-disable)


claude bot commented Mar 27, 2026

test connection


claude bot commented Mar 27, 2026

PR Review: feat: Anthropic CoT controls

Good overall direction — gating thinking budgets on server-side character settings (not untrusted request bodies) is the right security model. The per-agent override + env default + env cap hierarchy is clean. A few issues to address before merge:


Bug: Root-level test file imports non-existent module

anthropic-thinking.test.ts at the repo root imports from "./anthropic-thinking":

import {
  validateBudgetTokens,
  getThinkingConfig,
  buildThinkingParam,
  supportsExtendedThinking,
} from "./anthropic-thinking";

No anthropic-thinking.ts exists at root, and validateBudgetTokens, getThinkingConfig, buildThinkingParam are not exported from packages/lib/providers/anthropic-thinking.ts. This test will fail at import — it looks like a leftover from an earlier API design. Either delete it or rewrite it to import the actual provider module. The real unit tests in packages/tests/unit/anthropic-thinking.test.ts cover the production code correctly.


Bug: Fragile model detection in MCP credit reservation

In app/api/agents/[id]/mcp/route.ts:

const estimatedOutputTokens = model.includes("claude") && effectiveThinkingBudget > 0
  ? baseOutputTokens + effectiveThinkingBudget
  : baseOutputTokens;

supportsExtendedThinking() is already imported and handles this precisely. model.includes("claude") would match custom gateway aliases or any model with "claude" in the name that may not actually support extended thinking. These two checks can diverge.

// Prefer:
const estimatedOutputTokens = supportsExtendedThinking(model) && effectiveThinkingBudget > 0
  ? baseOutputTokens + effectiveThinkingBudget
  : baseOutputTokens;

Credit underestimation on generic v1 routes

MCP and A2A routes correctly add the thinking budget to estimatedOutputTokens before credit reservation. But /api/v1/chat, /api/v1/chat/completions, /api/v1/messages, and /api/v1/responses all spread mergeAnthropicCotProviderOptions(model) without adjusting their credit estimates. If ANTHROPIC_COT_BUDGET is set, these routes will silently under-reserve, which is especially risky for /api/v1/messages and /api/v1/responses (direct Anthropic API shape, typically larger prompts).

Consider either: (a) adjusting the estimate the same way MCP/A2A do, or (b) documenting why the asymmetry is intentional.


Silent temperature stripping on shared endpoints

When ANTHROPIC_COT_BUDGET is set and an Anthropic model is used, @ai-sdk/anthropic silently strips temperature/topP/topK. The /api/v1/chat/completions and /api/v1/responses routes accept user-supplied temperature that will be silently dropped. OpenAI-compatible clients relying on temperature control will get unexpected behavior with no signal. A warning log or API-level note would help.
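A sketch of how a route could detect the dropped parameters before calling the model, so it can log or surface them. The helper droppedSamplingParams and the SamplingParams shape are assumptions for illustration; nothing here calls @ai-sdk/anthropic:

```typescript
// Hypothetical request shape: only the sampling fields relevant to this check.
interface SamplingParams {
  temperature?: number;
  topP?: number;
  topK?: number;
}

// Returns the caller-supplied sampling parameters that will be ignored
// because extended thinking is active for this request.
function droppedSamplingParams(
  params: SamplingParams,
  effectiveThinkingBudget: number | null,
): string[] {
  if (effectiveThinkingBudget == null || effectiveThinkingBudget <= 0) return [];
  return (["temperature", "topP", "topK"] as const).filter(
    (key) => params[key] !== undefined,
  );
}
```

A route could log a warning when the returned list is non-empty, or echo it in a response header so OpenAI-compatible clients get some signal.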


Minor: Redundant null-coalescing in A2A route

In app/api/agents/[id]/a2a/route.ts, agentThinkingBudget is already number | undefined so ?? undefined is a no-op. Not a bug but noisy — the MCP route passes the value directly.


Positive callouts

  • Blocking thinking budget from MCP/A2A request bodies is the correct trust model — only the agent owner can set it via character settings.
  • Background services (SEO, Twitter, promotion, App Builder) all explicitly pass 0 to disable CoT — good discipline to preserve temperature control.
  • RATE_LIMIT_DISABLED is properly guarded with NODE_ENV !== "production" — no prod exposure.
  • mergeProviderOptions deep-merge avoids clobbering existing gateway/google provider options.
  • Test coverage in packages/tests/unit/anthropic-thinking.test.ts is solid: env parsing edge cases, model detection, precedence, and cap behavior all covered.
  • Env validation fail-fast for invalid ANTHROPIC_COT_BUDGET* values is a good addition.

Iteration 1

prr-fix:ic-4143706884-3
prr-fix:ic-4144015244-5
Iteration 1

prr-fix:ic-4143990242-4
Iteration 1

prr-fix:prrc_kwdop6btdm6y8tp_

claude bot commented Mar 27, 2026

Code Review: feat: Anthropic CoT controls

Overall this is a well-structured feature. The separation of concerns (character settings → env default → env cap) is sound, and the rationale for not allowing CoT budget in request bodies is clearly documented. A few issues need attention before merge.

Critical

1. Root-level test file imports a nonexistent module

anthropic-thinking.test.ts at the repo root imports from './anthropic-thinking', but there is no such file at the root. The actual implementation is at packages/lib/providers/anthropic-thinking.ts, which exports none of the names the test uses (validateBudgetTokens, getThinkingConfig, buildThinkingParam, ThinkingConfig, CharacterThinkingSettings). This test file will fail at import time and never runs.

Additionally, even if the import path were fixed, supportsExtendedThinking("claude-3.5-sonnet") is expected to return true but the regex /claude-3-5-sonnet/ (dashes) does not match the dot-notation string claude-3.5-sonnet. The pattern list should cover both forms, e.g. /claude-3[.-]5-sonnet/.

Either move the test to packages/lib/providers/ with the correct API surface, or delete it.

Medium

2. getRateLimitMultiplier silently truncates float values

The env-validator for RATE_LIMIT_MULTIPLIER accepts floats (regex \d+(\.\d+)?), but the function calls Number.parseInt:

const parsed = Number.parseInt(multiplier, 10);  // 1.5 → 1

A value of 1.5 passes startup validation but silently becomes 1 at runtime. Either use parseFloat in the function, or restrict the validator to integers only.

3. RateLimitPresets is no longer as const

Removing as const makes the exported object and its nested values mutable. Because rateLimitMultiplier is a runtime value, as const can no longer yield literal types here, but Object.freeze can be applied to each preset to restore runtime immutability and prevent accidental mutation across requests.
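A sketch of frozen presets built from a runtime multiplier. The buildPresets factory, the RateLimitPreset shape, and the 60-second window are assumptions; only the STANDARD (60) and STRICT (10) base limits come from this thread:

```typescript
// Assumed preset shape: the real fields in rate-limit.ts may differ.
interface RateLimitPreset {
  readonly windowMs: number;
  readonly maxRequests: number;
}

function buildPresets(multiplier: number) {
  // Freeze each preset so nested values cannot be mutated at runtime.
  const preset = (maxRequests: number): Readonly<RateLimitPreset> =>
    Object.freeze({ windowMs: 60_000, maxRequests: maxRequests * multiplier });

  // Freeze the container too, since Object.freeze is shallow.
  return Object.freeze({
    STANDARD: preset(60),
    STRICT: preset(10),
  });
}
```

The readonly modifiers catch mutation at compile time, and Object.freeze catches it at runtime for callers that bypass the types.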

Minor / Observations

4. No unit tests for the actual exported API

The key functions in packages/lib/providers/anthropic-thinking.ts (resolveAnthropicThinkingBudgetTokens, parseAnthropicCotBudgetFromEnv, parseAnthropicCotBudgetMaxFromEnv, mergeAnthropicCotProviderOptions) have no tests in this PR. The root test file was apparently intended to cover this but targets the wrong module. Consider adding tests alongside the implementation or in packages/tests/unit/.

5. Credit over-reservation for thinking budget is undocumented

Both MCP and A2A routes compute estimatedOutputTokens = 500 + effectiveThinkingBudget. The budget is a ceiling, not expected usage — actual thinking consumption is usually lower. This is safe, but it can cause users with low balances to hit reservation failures for agents with large budgets. A comment clarifying this is an intentional worst-case estimate would help future maintainers.

6. mergeProviderOptions only deep-merges 3 provider keys

If a future call site passes both base and extra with overlapping openai or mistral keys, the top-level spread silently drops the base values. Current usage is safe, but the function name implies a full deep-merge. Consider renaming it (e.g. mergeAnthropicGatewayGoogleOptions) or adding a warning near the signature.
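For comparison, a sketch of a merge that treats every provider key the same way instead of special-casing three. The name mergeAllProviderOptions and the local JSON types are hypothetical and are not the PR's mergeProviderOptions:

```typescript
type JSONValue = string | number | boolean | null | JSONValue[] | { [key: string]: JSONValue };
type JSONObject = { [key: string]: JSONValue };
type ProviderOptions = Record<string, JSONObject>;

// Merges one level deep for every provider key, without mutating either input.
// Keys inside a provider's options are still overwritten by `extra` on conflict.
function mergeAllProviderOptions(
  base: ProviderOptions = {},
  extra: ProviderOptions = {},
): ProviderOptions {
  const merged: ProviderOptions = { ...base };
  for (const [provider, options] of Object.entries(extra)) {
    merged[provider] = { ...(merged[provider] ?? {}), ...options };
  }
  return merged;
}
```

This removes the hidden dependency on the provider name, at the cost of merging options for providers where a full replace might have been intended.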

What looks good

  • Resolution precedence (character → env default → env cap) in resolveAnthropicThinkingBudgetTokens is correct and clearly documented.
  • All background services (SEO, promotion, tweet, app builder) explicitly pass 0 to disable CoT with clear comments about the temperature-interaction risk — good defensive practice.
  • parsePositiveIntStrict + fail-fast env validation prevents runtime surprises from misconfigured env vars.
  • RATE_LIMIT_DISABLED bypass is appropriately guarded behind NODE_ENV !== 'production'.
  • Inline WHY comments throughout are well-written and follow the project convention.

Explains reasoning for dismissed issues inline in code

claude bot commented Mar 27, 2026

Code Review

Overall this is a well-designed feature with good security thinking — keeping the thinking budget server-controlled (character settings + env) rather than client-controlled is the right call. The layered env default + per-agent override + hard cap model is solid. A few issues worth addressing before merge:


Bugs

1. anthropic-thinking.test.ts at repo root tests a different API than the production file

The test imports validateBudgetTokens, getThinkingConfig, buildThinkingParam, ThinkingConfig, and CharacterThinkingSettings from ./anthropic-thinking:

import {
  validateBudgetTokens,
  getThinkingConfig,
  buildThinkingParam,
  supportsExtendedThinking,
  type ThinkingConfig,
  type CharacterThinkingSettings,
} from "./anthropic-thinking";

None of these names exist in packages/lib/providers/anthropic-thinking.ts. That module exports parseAnthropicCotBudgetFromEnv, resolveAnthropicThinkingBudgetTokens, mergeAnthropicCotProviderOptions, etc. Either:

  • There's a stale/draft anthropic-thinking.ts at the repo root being tested (not visible in this diff), meaning the tests don't cover production code at all.
  • Or the test file will fail to import, making the CI step meaningless.

The unit tests for the production module should be colocated under packages/lib/providers/ and test the actual exported functions (e.g. resolveAnthropicThinkingBudgetTokens, parseThinkingBudgetFromCharacterSettings, mergeAnthropicCotProviderOptions).


2. as const removed from RateLimitPresets widens TypeScript types

-} as const;
+};

This changes RateLimitPresets.STRICT.maxRequests from a literal type (e.g. 10) to number. Any code that was typed against the narrow literal types will now have wider types. More importantly, RateLimitPresets is no longer readonly — callers could accidentally mutate the presets. Consider keeping as const or switching to Object.freeze().


Performance / Correctness

3. Credit reservation gates the thinking budget differently in the MCP and A2A routes

MCP route (mcp/route.ts):

const effectiveThinkingBudget = resolveAnthropicThinkingBudgetTokens(...) ?? 0;
const estimatedOutputTokens = model.includes("claude") && effectiveThinkingBudget > 0
  ? baseOutputTokens + effectiveThinkingBudget
  : baseOutputTokens;

A2A route (a2a/route.ts):

const maxOutputTokens = effectiveThinkingBudget != null ? 500 + effectiveThinkingBudget : undefined;
const baseCost = await estimateRequestCost(model, fullMessages, maxOutputTokens);

The MCP route uses model.includes("claude") as an additional guard, but resolveAnthropicThinkingBudgetTokens has already validated that the model is Anthropic and supports extended thinking. If the result is non-null and positive, the model check is redundant. It works today because provider-prefixed IDs such as anthropic/claude-sonnet-4 still contain "claude", but that is an implicit coupling that could break silently if ID conventions change. The A2A route's != null guard is cleaner. Suggest aligning both routes.


Breaking Change for Developers

4. Rate limits now default to production values in dev — requires action to restore old behavior

The change removes the isDevelopment branch that gave dev environments 10,000 req/window. The new default is production-level limits with an opt-in RATE_LIMIT_MULTIPLIER env var. This is a breaking change for anyone running locally or in CI integration tests without setting RATE_LIMIT_MULTIPLIER or RATE_LIMIT_DISABLED=true.

The PR description notes this ("Rate-limit behavior in development also changes to production-level defaults") but the .env.example doesn't add RATE_LIMIT_MULTIPLIER with a recommended dev value. Worth adding:

# Increase rate limits for local development (e.g. RATE_LIMIT_MULTIPLIER=100)
# RATE_LIMIT_MULTIPLIER=100

5. Minor: getRateLimitMultiplier uses parseInt but env-validator allows floats

In rate-limit.ts:

const parsed = Number.parseInt(multiplier, 10);

In env-validator.ts:

if (!/^\d+(\.\d+)?$/.test(trimmed)) return false;
const n = Number.parseFloat(trimmed);
return n > 0 && Number.isFinite(n);

The validator accepts "1.5" as valid but the consumer truncates it to 1. Either validate as integer-only or use parseFloat at consumption.
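A sketch of the parseFloat option, so the consumer accepts the same format as the validator. The names parseRateLimitMultiplier and scaleLimit are hypothetical; the regex and the floor of 1 mirror the snippets quoted above:

```typescript
// Accepts the same format as the validator regex and keeps the existing floor of 1.
function parseRateLimitMultiplier(raw: string | undefined): number {
  if (!raw) return 1;
  const trimmed = raw.trim();
  if (!/^\d+(\.\d+)?$/.test(trimmed)) return 1;
  const parsed = Number.parseFloat(trimmed);
  return Number.isFinite(parsed) && parsed >= 1 ? parsed : 1;
}

// A fractional multiplier can produce fractional limits, so round down
// and never let a limit fall below one request.
function scaleLimit(baseLimit: number, multiplier: number): number {
  return Math.max(1, Math.floor(baseLimit * multiplier));
}
```

If fractional multipliers are not wanted, the simpler fix is to tighten the validator regex to integers and leave parseInt in place.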


What's Good

  • Security model is correct: thinking budget lives on the character record and in env, never in client request bodies. The mergeAnthropicCotProviderOptions(model, process.env, 0) pattern for background services (SEO, tweet gen, app-builder, promo assets) that need temperature control is well-reasoned.
  • failOnInvalid: true in env-validator gives fail-fast startup semantics for misconfigured ANTHROPIC_COT_BUDGET / ANTHROPIC_COT_BUDGET_MAX — good.
  • mergeProviderOptions deep-merges gateway, anthropic, and google sub-keys correctly; the JSDoc caveat about other providers being clobbered is appropriate.
  • Discord sendTyping guard ("sendTyping" in channel) correctly handles DM vs guild channel type variance.
  • ANTHROPIC_COT_BUDGET_MAX capping logic is clean and the precedence (character > env default, max caps both) is documented.


@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 2 potential issues.


if (!multiplier) return 1;
const parsed = Number.parseInt(multiplier, 10);
return Number.isNaN(parsed) || parsed < 1 ? 1 : parsed;
}

Rate limit multiplier applies in production without guard

Medium Severity

RATE_LIMIT_MULTIPLIER is read and applied unconditionally in all environments, including production. Previously, production always used strict numeric limits (60, 10, 200, 5, 10, 100) while only development got relaxed limits (10000). Now, if RATE_LIMIT_MULTIPLIER is accidentally set in a production deployment (e.g., left over from staging), all rate limits are multiplied — RATE_LIMIT_MULTIPLIER=100 would allow 6000 requests/minute on STANDARD endpoints. getRateLimitMultiplier has no NODE_ENV === "production" guard to force the multiplier to 1.
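A minimal sketch of the missing guard, assuming the multiplier should be ignored entirely in production. The function name resolveRateLimitMultiplier is hypothetical; the parsing is kept identical to the snippet under review:

```typescript
// Forces the multiplier to 1 in production, whatever the env var says.
function resolveRateLimitMultiplier(env: Record<string, string | undefined>): number {
  if (env.NODE_ENV === "production") return 1;
  const parsed = Number.parseInt(env.RATE_LIMIT_MULTIPLIER ?? "", 10);
  return Number.isNaN(parsed) || parsed < 1 ? 1 : parsed;
}
```

With this guard, a leftover RATE_LIMIT_MULTIPLIER=100 keeps STANDARD at 60 requests per minute in production instead of 6000. If operators do need to raise production limits, a separate and explicitly named variable would make that intent visible.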


const n = Number.parseFloat(trimmed);
return n > 0 && Number.isFinite(n);
},
errorMessage: "Must be a positive number (e.g., 1, 10, 100)",

Rate limit multiplier validator accepts floats, parser truncates

Low Severity

The RATE_LIMIT_MULTIPLIER env validator accepts float strings (regex ^\d+(\.\d+)?$ and Number.parseFloat), but getRateLimitMultiplier uses Number.parseInt(multiplier, 10) which silently truncates decimals. Setting RATE_LIMIT_MULTIPLIER=1.5 passes validation but is treated as 1 at runtime. The validator and the consumer disagree on the accepted format.



claude bot commented Mar 27, 2026

Code Review: feat: Anthropic CoT controls

Overall the design is solid — locking the thinking budget to character settings rather than request bodies is the right security call, and the deep-merge helpers prevent accidental provider-key clobbering. A few issues need fixing before merge.


Critical — will break at runtime/test time

1. anthropic-thinking.test.ts imports symbols that do not exist

The test file at the repo root imports validateBudgetTokens, getThinkingConfig, buildThinkingParam, ThinkingConfig, and CharacterThinkingSettings from "./anthropic-thinking". None of these are exported from packages/lib/providers/anthropic-thinking.ts. Additionally, the relative import "./anthropic-thinking" from the project root resolves to a file that does not exist in this PR, and unit tests belong in packages/tests/unit/, not the repo root.

This test file appears to have been generated against a stale or different API surface and will fail to compile. Either update it to test the actual exported symbols with correct import paths, or move it to the proper test directory with matching function names.

2. supportsExtendedThinking("claude-3.5-sonnet") will return false, not true

The pattern /claude-3-5-sonnet/ requires a literal hyphen between 3 and 5. The string "claude-3.5-sonnet" has a period there, so the regex does not match. The test asserts that this returns true, so the assertion is broken. Fix: use /claude-3[.-]5-sonnet/ or add a separate pattern for the dot-notation variant used in some Anthropic model IDs.


Behavioral Change — developer experience regression

3. Rate-limit preset change silently breaks local dev

Removing the isDevelopment branch means local dev now defaults to production-level limits (60 req/min for STANDARD). The prior behavior was 10,000. Developers who do not add RATE_LIMIT_MULTIPLIER to .env.local will hit 429s during normal usage and integration tests. RATE_LIMIT_MULTIPLIER should be added as a commented example to .env.example.


Logic / Correctness

4. Dead code in A2A route: agentThinkingBudget ?? undefined

parseThinkingBudgetFromCharacterSettings already returns number | undefined. The ?? undefined is a no-op, inconsistent with the MCP route which passes agentThinkingBudget directly.

5. Inconsistent model-check heuristic for credit estimation

mcp/route.ts uses model.includes("claude") to gate adding thinking tokens to the output estimate; a2a/route.ts uses effectiveThinkingBudget != null. Since resolveAnthropicThinkingBudgetTokens already returns null for non-Anthropic models, model.includes("claude") is redundant. It is a substring match, so it still accepts provider-prefixed IDs like "anthropic/claude-sonnet-4", but it introduces a second, looser definition of a Claude model that can drift from the resolver. Prefer effectiveThinkingBudget > 0 consistently across both routes.

6. ANTHROPIC_COT_BUDGET_MAX=0 semantics are ambiguous

The env-validator accepts 0 and the error message says "0 = no cap", but an operator might expect 0 to mean "block all thinking." parseAnthropicCotBudgetMaxFromEnv also returns null for 0 (= no cap), which is functionally the same but semantically surprising. Consider rejecting 0 in the validator and requiring operators to unset the variable instead.
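A sketch of a stricter check that rejects 0 outright, so "no cap" can only be expressed by leaving the variable unset. The function isValidCotBudgetMax is a hypothetical stand-in; the real env-validator entry may be shaped differently:

```typescript
// Accepts only positive integers; "0" is rejected rather than read as "no cap".
function isValidCotBudgetMax(raw: string): boolean {
  const trimmed = raw.trim();
  if (!/^\d+$/.test(trimmed)) return false;
  return Number.parseInt(trimmed, 10) > 0;
}
```

Rejecting 0 at startup turns an ambiguous setting into a configuration error, which fits the fail-fast validation already used for the other CoT variables.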


Minor / Nits

7. mergeProviderOptions silently clobbers unknown provider keys

Only gateway, anthropic, and google are deep-merged. Any other provider key present in both operands is overwritten by the top-level spread. The comment documents this, but CloudMergedProviderOptions is Record<string, JSONObject> so there is no type-level protection as more providers are added.

8. Credit reservation underestimates for thinking-heavy responses

500 + effectiveThinkingBudget covers worst-case thinking tokens plus only 500 tokens of visible text, so long answers can exceed the reservation. This is acknowledged in comments, and billUsage reconciles on finish. Worth noting in operator docs that reservations are estimates that can fall below actual usage.


What is well done

  • Thinking budget locked to character settings, not request bodies — correct threat model.
  • resolveAnthropicThinkingBudgetTokens as a single resolution path with env fallback and max-cap clamping is clean.
  • Explicit 0 passed to background services (SEO, promo, app-builder, twitter) to preserve temperature control, with clear comments explaining why.
  • failOnInvalid: true on new env vars catches misconfigurations at startup rather than silently at request time.
  • mergeGoogleImageModalitiesWithAnthropicCot correctly preserves both google and anthropic provider keys via deep merge.



3 participants