From fd01bd4fe91bacf4f25fd7e0d6ebbb505e76abd5 Mon Sep 17 00:00:00 2001
From: Paul Yuk
Date: Tue, 16 Jun 2026 14:59:55 -0700
Subject: [PATCH] infra: bump default modelCapacity 50 -> 150 (avoid day-one 429s)

The default GlobalStandard capacity of 50 (= 50K TPM) is too low for a
multi-step agent template. A single daily-briefing run pulls a chunk of
inbox, reasons over it, and renders -- easily 10-20K tokens. Two runs
within a minute trip the per-deployment rate limit and surface as:

    HTTP 500: 'rate_limit_exceeded' ... 'Too Many Requests'

...which is a terrible first-deploy UX (people assume their config or
code is broken).

150 (= 150K TPM):
  - Comfortably handles repeated single-user agent runs.
  - Stays well within the default per-region/per-model GlobalStandard
    quota for new subscriptions (typically 1000 units = 1M TPM/region
    for the gpt-5 family).
  - Leaves room in a sub to host several template deploys side-by-side
    before quota-bumping is needed.

The param remains tweakable for users who are quota-constrained or who
need more headroom for multi-user workloads. To check your headroom:

    az cognitiveservices usage list --location \
      --query "[?contains(name.value, 'gpt-5.4-mini')]" -o table

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---
 infra/main.bicep | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/infra/main.bicep b/infra/main.bicep
index 19752f8..8c72e24 100644
--- a/infra/main.bicep
+++ b/infra/main.bicep
@@ -95,8 +95,8 @@ param modelVersion string = '2026-03-17'
 @description('Model deployment SKU name.')
 param modelSkuName string = 'GlobalStandard'
 
-@description('Model deployment capacity.')
-param modelCapacity int = 50
+@description('Model deployment capacity (1000s of tokens per minute). The default of 150 is high enough that a single multi-step agent run (e.g. daily-briefing) will not hit 429s on the first try, and is well within the per-region/per-model quota most new subscriptions have. Lower it if you are quota-constrained; raise it for heavier multi-user workloads.')
+param modelCapacity int = 150
 
 @description('Name for the model deployment in Azure AI Services.')
 param modelDeploymentName string = 'gpt-5.4-mini'