infra: bump default modelCapacity 50 → 150 (avoid day-one 429s)#18

Closed
paulyuk wants to merge 2 commits into main from paulyuk/model-capacity-default-150

paulyuk commented Jun 16, 2026

Member

Problem

Default modelCapacity = 50 (= 50K TPM) is too low for a multi-step agent template. A single daily-briefing run pulls a chunk of the inbox, reasons over it, and renders the result, easily consuming 10–20K tokens. Two runs within a minute trip the per-deployment rate limit, which surfaces as:

Error calling daily_briefing /chat: HTTP 500
{"error": "...Error code: 429 - {'error': {'code': 'rate_limit_exceeded', 'message': 'Model deployment rate limit exceeded. Too Many Requests...'}}"}

This is a terrible first-deploy UX — operators assume their config, code, or model is broken.
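
As a rough illustration of why this is confusing, the client sees an HTTP 500 whose body merely wraps the upstream 429. A minimal sketch of how a caller could recognize that case is below. The helper name `looks_like_rate_limit` is hypothetical and not part of this template, and the body shape is assumed from the error shown above.

```python
import json


def looks_like_rate_limit(status: int, body: str) -> bool:
    """Return True if an HTTP error response is, or wraps, a 429 rate-limit error.

    Hypothetical helper: assumes the wrapped error text appears under an
    "error" key, as in the sample response quoted in this PR description.
    """
    if status == 429:
        return True
    try:
        message = str(json.loads(body).get("error", ""))
    except (json.JSONDecodeError, AttributeError):
        # Body is not a JSON object; fall back to scanning the raw text.
        message = body
    return "rate_limit_exceeded" in message or "Error code: 429" in message


# Body modeled on the error above (elided parts left elided).
sample_body = json.dumps(
    {"error": "...Error code: 429 - {'error': {'code': 'rate_limit_exceeded'}}"}
)
print(looks_like_rate_limit(500, sample_body))
print(looks_like_rate_limit(500, '{"error": "boom"}'))
```

A check like this only improves the error message shown to the operator; it does not change the capacity problem itself.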

Repro

  1. Fresh azd up of this template.
  2. uv run python chat.py → option 2 (daily-briefing).
  3. Run option 2 again within ~30s. → 429.

Fix

Bump default modelCapacity from 50 → 150 (= 150K TPM).

Why 150 is the right default

  • Comfortably handles repeated single-user agent runs: a typical chain is 4–6 LLM calls, and at 10–20K tokens per run, 150K TPM covers roughly 7–15 runs per minute (~10× headroom).
  • Within most subs' regional quota. New subs typically get 1000 units (1M TPM) per model per region for the gpt-5 family on GlobalStandard. 150 is 15% of that → plenty of room.
  • Leaves room for multiple template deploys in the same sub before quota-bumping is needed.
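
The headroom figures above can be checked with a back-of-the-envelope calculation. This is a simplified sketch that assumes 1 capacity unit = 1K TPM (as stated above) and the 10–20K tokens-per-run estimate from the Problem section; the service's real token accounting and throttling windows may differ. The function name `runs_per_minute` is illustrative only.

```python
def runs_per_minute(capacity_units: int, tokens_per_run: int) -> float:
    """Estimate how many agent runs fit into one minute of token budget.

    Assumes 1 capacity unit = 1,000 tokens per minute.
    """
    tokens_per_minute = capacity_units * 1_000
    return tokens_per_minute / tokens_per_run


for capacity in (50, 150):
    low = runs_per_minute(capacity, 20_000)   # heavy run: 20K tokens
    high = runs_per_minute(capacity, 10_000)  # light run: 10K tokens
    print(f"capacity={capacity}: {low:.1f} to {high:.1f} runs/min")

# Share of an assumed 1000-unit regional quota used by one deploy at 150.
print(f"quota share: {150 / 1000:.0%}")
```

Under these assumptions the old default leaves room for only 2.5 to 5 runs per minute, while 150 allows 7.5 to 15.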

Self-check command

az cognitiveservices usage list --location <region> \
  --query "[?contains(name.value, 'gpt-5.4-mini')]" -o table

If limit - currentValue < 150, lower the param at deploy time:

azd env set MODEL_CAPACITY 100   # or whatever fits
azd up
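
The `limit - currentValue` check could also be scripted against the JSON form of the usage command (`-o json` instead of `-o table`). The sketch below assumes each entry carries `name.value`, `limit`, and `currentValue`, matching the fields referenced above. The function names and the sample quota name are hypothetical, and the sample numbers are made up for illustration.

```python
import json


def headroom_by_quota(usage_json: str, model_fragment: str) -> dict[str, int]:
    """Map each matching quota name to its remaining units (limit - currentValue)."""
    entries = json.loads(usage_json)
    return {
        e["name"]["value"]: int(e["limit"] - e["currentValue"])
        for e in entries
        if model_fragment in e["name"]["value"]
    }


def fit_capacity(headroom: int, desired: int = 150) -> int:
    """Return the desired capacity, lowered to the available headroom if needed."""
    return max(0, min(desired, headroom))


# Made-up sample shaped like the usage listing; the quota names are illustrative.
sample = json.dumps([
    {"name": {"value": "OpenAI.GlobalStandard.gpt-5.4-mini"},
     "currentValue": 900, "limit": 1000},
    {"name": {"value": "OpenAI.GlobalStandard.other-model"},
     "currentValue": 0, "limit": 1000},
])

for quota, free in headroom_by_quota(sample, "gpt-5.4-mini").items():
    print(f"{quota}: {free} units free, use MODEL_CAPACITY={fit_capacity(free)}")
```

With 100 units free, the sketch suggests 100, which is the value used in the `azd env set` example above.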

In-place fix for users hit by this on the old default

No need to redeploy from scratch — bump the existing deployment:

az cognitiveservices account deployment create \
  -g <resource-group> -n <ai-services-account> \
  --deployment-name gpt-5.4-mini --model-name gpt-5.4-mini \
  --model-version 2026-03-17 --model-format OpenAI \
  --sku-name GlobalStandard --sku-capacity 150

(deployment create on an existing name is an in-place update — completes in seconds, no app downtime.)

Verified

Tested on a live deploy that was 429-ing — bump took 5s, daily-briefing succeeded on next call.

The default GlobalStandard capacity of 50 (= 50K TPM) is too low for a
multi-step agent template. A single daily-briefing run pulls a chunk of
inbox, reasons over it, and renders -- easily 10-20K tokens. Two runs
within a minute trip the per-deployment rate limit and surface as:

  HTTP 500: 'rate_limit_exceeded' ... 'Too Many Requests'

...which is a terrible first-deploy UX (people assume their config or
code is broken).

150 (= 150K TPM):
  - Comfortably handles repeated single-user agent runs.
  - Stays well within the default per-region/per-model GlobalStandard
    quota for new subscriptions (typically 1000 units = 1M TPM/region
    for the gpt-5 family).
  - Leaves room in a sub to host several template deploys side-by-side
    before quota-bumping is needed.

The param remains tweakable for users who are quota-constrained or who
need more headroom for multi-user workloads. To check your headroom:

  az cognitiveservices usage list --location <region> \
    --query "[?contains(name.value, 'gpt-5.4-mini')]" -o table

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Comment thread infra/main.bicep
@manvkaur

@copilot uv run ruff format --check and create a PR to merge in this branch

Copilot AI commented Jun 16, 2026

Contributor

@manvkaur I've opened a new pull request, #19, to work on those changes. Once the pull request is ready, I'll request review from you.

paulyuk commented Jun 16, 2026

Member Author

Superseded by #19, which included the same modelCapacity 50→150 bump (with the longer description). Closing as redundant after rebase showed an empty diff.

paulyuk closed this Jun 16, 2026
paulyuk deleted the paulyuk/model-capacity-default-150 branch June 16, 2026 23:43
3 participants