infra: bump default modelCapacity 50 → 150 (avoid day-one 429s) #18
Closed
paulyuk wants to merge 2 commits into
The default GlobalStandard capacity of 50 (= 50K TPM) is too low for a
multi-step agent template. A single daily-briefing run pulls a chunk of
inbox, reasons over it, and renders -- easily 10-20K tokens. Two runs
within a minute trip the per-deployment rate limit and surface as:

    HTTP 500: 'rate_limit_exceeded' ... 'Too Many Requests'

...which is a terrible first-deploy UX (people assume their config or
code is broken).

150 (= 150K TPM):
- Comfortably handles repeated single-user agent runs.
- Stays well within the default per-region/per-model GlobalStandard
  quota for new subscriptions (typically 1000 units = 1M TPM/region
  for the gpt-5 family).
- Leaves room in a sub to host several template deploys side-by-side
  before quota-bumping is needed.

The param remains tweakable for users who are quota-constrained or who
need more headroom for multi-user workloads. To check your headroom:

    az cognitiveservices usage list --location <region> \
      --query "[?contains(name.value, 'gpt-5.4-mini')]" -o table

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
manvkaur (Contributor) reviewed on Jun 16, 2026:
@copilot uv run ruff format --check and create a PR to merge in this branch

manvkaur approved these changes on Jun 16, 2026.

paulyuk (Member, Author) commented:
Superseded by #19, which included the same modelCapacity 50→150 bump (with the longer description). Closing as redundant after rebase showed an empty diff.
Problem
Default modelCapacity = 50 (= 50K TPM) is too low for a multi-step agent template. A single daily-briefing run pulls a chunk of inbox, reasons, and renders: easily 10–20K tokens. Two runs within a minute trip the per-deployment rate limit and surface as HTTP 500: 'rate_limit_exceeded' ... 'Too Many Requests'.
This is a terrible first-deploy UX: operators assume their config, code, or model is broken.
Repro
1. azd up of this template.
2. uv run python chat.py → option 2 (daily-briefing).
Fix
Bump default modelCapacity from 50 → 150 (= 150K TPM).
Why 150 is the right default
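The commit message gives the figures behind this choice: one capacity unit is 1K TPM, a daily-briefing run costs roughly 10-20K tokens, and a new subscription typically has about 1000 units of quota per region. A rough sketch of that arithmetic follows; the function names are illustrative and not part of the template, and the runs-per-minute figure is only a naive ceiling (the PR reports that two runs within a minute already tripped the limit at 50, so real headroom is smaller than this bound suggests).

```python
TPM_PER_UNIT = 1_000  # from the PR: capacity 50 = 50K TPM, 150 = 150K TPM


def tpm(capacity_units: int) -> int:
    """Tokens-per-minute budget for a given capacity setting."""
    return capacity_units * TPM_PER_UNIT


def max_runs_per_minute(capacity_units: int, tokens_per_run: int) -> int:
    """Naive ceiling: whole runs that fit in one minute's token budget."""
    return tpm(capacity_units) // tokens_per_run


def deploys_per_quota(quota_units: int, capacity_units: int) -> int:
    """How many template deployments fit side by side in a regional quota."""
    return quota_units // capacity_units


if __name__ == "__main__":
    TOKENS_PER_RUN = 20_000  # upper end of the 10-20K estimate in the PR
    QUOTA_UNITS = 1_000      # "typically 1000 units" per the commit message
    for capacity in (50, 150):
        print(
            f"capacity={capacity}: {tpm(capacity)} TPM, "
            f"<= {max_runs_per_minute(capacity, TOKENS_PER_RUN)} runs/min, "
            f"{deploys_per_quota(QUOTA_UNITS, capacity)} deploys per quota"
        )
```

Under these assumptions, the ceiling moves from 2 to 7 heavy runs per minute, while a 1000-unit quota still fits 6 deployments at the new default.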
Self-check command
If limit - currentValue < 150, lower the param at deploy time:
In-place fix for users hit by this on the old default
No need to redeploy from scratch; bump the existing deployment:
(deployment create on an existing name is an in-place update: it completes in seconds, with no app downtime.)
Verified
Tested on a live deploy that was 429-ing: the bump took 5s, and daily-briefing succeeded on the next call.
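The self-check rule from the description (lower the param when limit - currentValue < 150) can be sketched as a small helper over the JSON form of the `az cognitiveservices usage list` output shown in the commit message. This is a minimal sketch, not the template's code: it assumes each usage entry carries `name.value`, `currentValue`, and `limit` (the fields the PR itself refers to), and the sample data and quota name below are made up for illustration.

```python
import json

DESIRED_CAPACITY = 150  # the new default proposed in this PR


def remaining_by_quota(usage_entries: list[dict], model_substring: str) -> dict[str, int]:
    """Map each matching quota name to its headroom (limit - currentValue)."""
    return {
        entry["name"]["value"]: int(entry["limit"] - entry["currentValue"])
        for entry in usage_entries
        if model_substring in entry["name"]["value"]
    }


def pick_capacity(remaining: int, desired: int = DESIRED_CAPACITY) -> int:
    """Keep the default if it fits; otherwise fall back to what is left."""
    return min(desired, max(remaining, 0))


# Hypothetical sample, shaped like the usage list output with `-o json`.
SAMPLE = json.loads("""
[
  {"name": {"value": "OpenAI.GlobalStandard.gpt-5.4-mini"},
   "currentValue": 900.0, "limit": 1000.0},
  {"name": {"value": "OpenAI.GlobalStandard.some-other-model"},
   "currentValue": 0.0, "limit": 500.0}
]
""")

if __name__ == "__main__":
    for name, remaining in remaining_by_quota(SAMPLE, "gpt-5.4-mini").items():
        print(f"{name}: {remaining} units free -> deploy with {pick_capacity(remaining)}")
```

With the sample above, only 100 units remain for the matching quota, so the sketch suggests deploying with 100 rather than the default 150.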