infra: bump default modelCapacity 50 → 150 (avoid day-one 429s) by paulyuk · Pull Request #18 · Azure-Samples/m365-inbox-serverless-agent-python

paulyuk · 2026-06-16T22:00:17Z

Problem

Default modelCapacity = 50 (= 50K TPM) is too low for a multi-step agent template. A single daily-briefing run pulls a chunk of inbox, reasons, and renders — easily 10–20K tokens. Two runs within a minute trip the per-deployment rate limit and surface as:

Error calling daily_briefing /chat: HTTP 500
{"error": "...Error code: 429 - {'error': {'code': 'rate_limit_exceeded', 'message': 'Model deployment rate limit exceeded. Too Many Requests...'}}"}
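The 429 from the model deployment reaches the caller wrapped in an HTTP 500 from /chat, which is why it looks like an app or config bug. As a sketch (not part of this PR or of chat.py), a client could unwrap it and print a quota hint instead. `is_wrapped_rate_limit` is a hypothetical helper. Matching on the two marker strings is an assumption based only on the error text above.

```python
import json

# Marker strings copied from the error text above. Relying on them is an
# assumption about how the backend stringifies the upstream OpenAI error.
RATE_LIMIT_MARKERS = ("rate_limit_exceeded", "Error code: 429")


def is_wrapped_rate_limit(status: int, body: str) -> bool:
    """Return True if a /chat response is (or wraps) a model rate-limit error."""
    if status == 429:
        return True
    if status != 500:
        return False
    try:
        # Expected shape: {"error": "...Error code: 429 - {...}"}
        detail = json.loads(body).get("error", "")
    except (json.JSONDecodeError, AttributeError):
        # Body is not a JSON object: scan the raw text instead.
        detail = body
    return any(marker in str(detail) for marker in RATE_LIMIT_MARKERS)
```

When this returns True, a caller could suggest raising MODEL_CAPACITY instead of surfacing a bare HTTP 500.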

This is a terrible first-deploy UX — operators assume their config, code, or model is broken.

Repro

Run a fresh azd up of this template.
uv run python chat.py → option 2 (daily-briefing).
Run option 2 again within ~30s → 429.

Fix

Bump the default modelCapacity from 50 to 150 (= 150K TPM).

Why 150 is the right default

Comfortably handles repeated single-user agent runs: a typical chain is 4–6 LLM calls totalling 10–20K tokens, so 150K TPM fits roughly 7–15 runs per minute (~10× headroom at 15K tokens per run).
Within most subs' regional quota. New subs typically get 1000 units (1M TPM) per model per region for the gpt-5 family on GlobalStandard. 150 is 15% of that → plenty of room.
Leaves room for several template deploys in the same sub before quota-bumping is needed (six deploys at 150 units fit in a 1000-unit quota).
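The sizing argument is back-of-envelope arithmetic. This sketch reproduces it using only figures from this PR: 1 capacity unit = 1K TPM, 10–20K tokens per daily-briefing run, and a typical 1000-unit regional quota. The function names are illustrative.

```python
TOKENS_PER_UNIT = 1_000  # 1 capacity unit = 1K TPM, per this PR


def capacity_to_tpm(units: int) -> int:
    """Tokens per minute granted by a deployment of the given capacity."""
    return units * TOKENS_PER_UNIT


def runs_per_minute(units: int, tokens_per_run: int) -> float:
    """Average number of agent runs that fit in one minute of TPM budget."""
    return capacity_to_tpm(units) / tokens_per_run


def quota_share(units: int, quota_units: int = 1_000) -> float:
    """Fraction of a regional quota (default: the typical 1000 units) used."""
    return units / quota_units


def deploys_per_quota(units: int, quota_units: int = 1_000) -> int:
    """How many template deploys of this capacity fit side by side in the quota."""
    return quota_units // units


for units in (50, 150):
    heavy = runs_per_minute(units, 20_000)  # 20K-token run
    light = runs_per_minute(units, 10_000)  # 10K-token run
    print(
        f"capacity {units}: {capacity_to_tpm(units):,} TPM, "
        f"{heavy:.1f}-{light:.1f} runs/min, "
        f"{quota_share(units):.0%} of quota, "
        f"{deploys_per_quota(units)} deploys fit"
    )
```

At capacity 150 this prints 7.5-15.0 runs/min, 15% of quota and 6 deploys; at 50 it prints 2.5-5.0 runs/min. These are per-minute averages that ignore bursts, so treat them as a ceiling on what a deployment can absorb, not a guarantee.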

Self-check command

az cognitiveservices usage list --location <region> \
  --query "[?contains(name.value, 'gpt-5.4-mini')]" -o table

If limit - currentValue is below 150, lower modelCapacity at deploy time:

azd env set MODEL_CAPACITY 100   # any value <= limit - currentValue
azd up
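The headroom check can also be scripted. This sketch runs the same az command with -o json instead of -o table and applies the limit - currentValue rule. It assumes each JSON entry carries name.value, currentValue and limit, which are the fields the query and the rule above rely on. `fetch_usages` and `pick_capacity` are hypothetical helpers and are not part of the template. More than one entry may match the model name, for example one per SKU. Use the entry for the SKU you deploy, which is GlobalStandard here.

```python
import json
import subprocess

DEFAULT_CAPACITY = 150  # the new template default from this PR


def fetch_usages(region: str) -> list[dict]:
    """Run the self-check command above, but with -o json instead of -o table."""
    result = subprocess.run(
        ["az", "cognitiveservices", "usage", "list", "--location", region, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


def pick_capacity(usages: list[dict], model: str, wanted: int = DEFAULT_CAPACITY) -> dict[str, int]:
    """Map each usage entry whose name.value mentions `model` to a capacity that fits.

    Uses `wanted` unless limit - currentValue is smaller, in which case it uses
    the remaining headroom. A result of 0 means the quota is already exhausted.
    """
    picks: dict[str, int] = {}
    for entry in usages:
        name = entry["name"]["value"]
        if model not in name:
            continue
        headroom = int(entry["limit"] - entry["currentValue"])
        picks[name] = min(wanted, max(headroom, 0))
    return picks


# Example (needs a logged-in az CLI; <region> is a placeholder):
#   for name, cap in pick_capacity(fetch_usages("<region>"), "gpt-5.4-mini").items():
#       print(name, cap)
```

Feed the printed value to azd env set MODEL_CAPACITY before azd up.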

In-place fix for users hit by this on the old default

No need to redeploy from scratch — bump the existing deployment:

az cognitiveservices account deployment create \
  -g <resource-group> -n <ai-services-account> \
  --deployment-name gpt-5.4-mini --model-name gpt-5.4-mini \
  --model-version 2026-03-17 --model-format OpenAI \
  --sku-name GlobalStandard --sku-capacity 150

(Running deployment create against an existing deployment name updates it in place: it completes in seconds with no app downtime.)
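To script the in-place fix, for example across several resource groups, the same command can be assembled and run from Python. This sketch uses only the flags shown in the command above. `capacity_bump_cmd` and `bump_capacity` are hypothetical helpers. Their defaults mirror that command, including the deployment being named after the model.

```python
import shlex
import subprocess


def capacity_bump_cmd(
    resource_group: str,
    account: str,
    capacity: int = 150,
    model: str = "gpt-5.4-mini",
    model_version: str = "2026-03-17",
) -> list[str]:
    """Build the `az ... deployment create` call that updates an existing
    deployment's capacity in place (same flags as the command above)."""
    if capacity <= 0:
        raise ValueError("capacity must be a positive number of units")
    return [
        "az", "cognitiveservices", "account", "deployment", "create",
        "-g", resource_group, "-n", account,
        # This template names the deployment after the model.
        "--deployment-name", model, "--model-name", model,
        "--model-version", model_version, "--model-format", "OpenAI",
        "--sku-name", "GlobalStandard", "--sku-capacity", str(capacity),
    ]


def bump_capacity(resource_group: str, account: str, capacity: int = 150) -> None:
    """Echo the exact command, then run it; raises if az exits non-zero."""
    cmd = capacity_bump_cmd(resource_group, account, capacity)
    print("running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

Keeping the command builder separate from the runner makes it easy to review (or dry-run) the exact az invocation before applying it.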

Verified

Tested on a live deploy that was 429-ing — bump took 5s, daily-briefing succeeded on next call.

The default GlobalStandard capacity of 50 (= 50K TPM) is too low for a multi-step agent template. A single daily-briefing run pulls a chunk of inbox, reasons over it, and renders -- easily 10-20K tokens. Two runs within a minute trip the per-deployment rate limit and surface as:

  HTTP 500: 'rate_limit_exceeded' ... 'Too Many Requests' ...

which is a terrible first-deploy UX (people assume their config or code is broken).

150 (= 150K TPM):
- Comfortably handles repeated single-user agent runs.
- Stays well within the default per-region/per-model GlobalStandard quota for new subscriptions (typically 1000 units = 1M TPM/region for the gpt-5 family).
- Leaves room in a sub to host several template deploys side-by-side before quota-bumping is needed.

The param remains tweakable for users who are quota-constrained or who need more headroom for multi-user workloads.

To check your headroom:

  az cognitiveservices usage list --location <region> \
    --query "[?contains(name.value, 'gpt-5.4-mini')]" -o table

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

manvkaur · 2026-06-16T22:23:12Z

@copilot uv run ruff format --check and create a PR to merge in this branch

Copilot · 2026-06-16T22:23:23Z

@manvkaur I've opened a new pull request, #19, to work on those changes. Once the pull request is ready, I'll request review from you.

paulyuk · 2026-06-16T23:43:47Z

Superseded by #19, which included the same modelCapacity 50→150 bump (with the longer description). Closing as redundant after rebase showed an empty diff.

manvkaur reviewed Jun 16, 2026


Comment thread infra/main.bicep

Copilot AI mentioned this pull request Jun 16, 2026

format match_rule.py and chat.py #19

Merged

manvkaur approved these changes Jun 16, 2026


Merge branch 'main' into paulyuk/model-capacity-default-150

5be1553

paulyuk closed this Jun 16, 2026

paulyuk deleted the paulyuk/model-capacity-default-150 branch June 16, 2026 23:43

paulyuk mentioned this pull request Jun 17, 2026

Migrate quickstart from Functions Core Tools v4 to Functions CLI v5 (func5) #20

Open

paulyuk wants to merge 2 commits into main from paulyuk/model-capacity-default-150
