
Bound Claude call to a deadline under the API Gateway 30s limit #19

Merged
mattgodbolt merged 3 commits into main from fix/explain-503-timeout on Jun 15, 2026

@mattgodbolt-molty

Contributor

(I'm Molty, an AI assistant acting on behalf of @mattgodbolt)

Problem

Users report intermittent 503s from the /explain endpoint. The Lambda itself is completely healthy (Throttles=0, Errors=0) and the app only ever logs 200/307. The 503s come from API Gateway: its HTTP API integration timeout is a hard 30s ceiling that cannot be raised, but the Lambda timeout was set to 60s.

Requests whose generation runs past 30s are severed by the gateway with an opaque {"message":"Service Unavailable"} (33 bytes) while the Lambda keeps running to completion — billing tokens we never deliver.

Quantified impact (CloudWatch, 2026-06-15):

  • Today: 33/649 requests 503'd (5.1%)
  • Yesterday: 27/304 requests 503'd (8.9%)
  • Lambda duration max: 37.8s — well past the 30s ceiling

Fix

Bound each Claude call to a wall-clock budget (default 27s, configurable via ANTHROPIC_TIMEOUT_SECONDS) that sits comfortably below the gateway ceiling. On timeout or a transient upstream failure, return a structured, retryable error response (status="error", ClaudeExplainTimeout metric) instead of letting the gateway emit a 503.

Changes:

  • config.py: new anthropic_timeout_seconds (27.0) and anthropic_max_retries (2) settings
  • explain.py: wrap client.messages.create in asyncio.timeout(deadline_seconds); catch TimeoutError/APITimeoutError/APIConnectionError and transient HTTP errors (408, 429, and any 5xx, including 529); return a user-friendly error response
  • main.py: pass timeout/max_retries to AsyncAnthropic client and thread deadline_seconds through to process_request
  • test_explain.py: new test covering the deadline path

🤖 Generated by LLM (Claude, via OpenClaw)

Users report intermittent 503s from the explain endpoint. The Lambda
itself is healthy (Throttles=0, Errors=0) and the app only ever logs
200/307. The 503s come from API Gateway: its HTTP API integration
timeout is a hard 30s ceiling, but the Lambda timeout was raised to 60s.
Requests whose generation runs past 30s (218 over the last 4 days, up to
~40s) are severed by the gateway with an opaque 503 while the Lambda
keeps running to completion — billing tokens we never deliver.

Bound each Claude call to a wall-clock budget (default 27s, configurable
via ANTHROPIC_TIMEOUT_SECONDS) that sits comfortably below the gateway
ceiling, and set the SDK per-attempt timeout and retry budget to match.
On timeout or a transient upstream failure, return a structured,
retryable error response (status="error", ClaudeExplainTimeout metric)
instead of letting the gateway emit a 503. Genuine client errors (e.g. a
400) still propagate as real failures.
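
The split between transient failures and genuine client errors might look like the check below. The function name `is_transient_status` is hypothetical; the status set (408, 429, and 5xx, which covers 529) is taken from the PR description.

```python
def is_transient_status(status_code: int) -> bool:
    """Return True for HTTP statuses treated as retryable upstream failures."""
    # 408 (request timeout) and 429 (rate limited) are retryable 4xx codes;
    # every 5xx, including 529, is treated as an upstream-side failure.
    return status_code in (408, 429) or 500 <= status_code <= 599
```

A 400 returns False here, so it would propagate as a real failure rather than being converted into a retryable response.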

🤖 Generated by LLM (Claude, via OpenClaw)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@mattgodbolt

Member

Cc @dkm I had this branch hanging around...will be investigating other ways to speed up. Well, Claude will

Copilot AI left a comment


Pull request overview

This pull request mitigates intermittent API Gateway–originated 503s on the /explain endpoint by ensuring Claude generation work is bounded to a wall-clock budget below the gateway’s fixed 30s integration timeout, returning a structured error response instead of letting the gateway sever the connection.

Changes:

  • Added configurable Anthropic timeout and retry settings (anthropic_timeout_seconds, anthropic_max_retries) with a default 27s budget.
  • Wrapped client.messages.create(...) with an asyncio.timeout(...) wall-clock deadline and converted timeout/transient upstream failures into a structured status="error" response plus metrics.
  • Updated FastAPI app startup to configure the Anthropic client timeout/retries and threaded the deadline through request processing; added a test covering the deadline path.
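
A rough back-of-the-envelope check shows why a wall-clock deadline is useful on top of the client's own timeout and retry settings. It assumes the SDK timeout applies to each attempt separately, so retries multiply the worst case. The function names are hypothetical and backoff between attempts is left as a parameter rather than guessed.

```python
GATEWAY_LIMIT_SECONDS = 30.0  # API Gateway integration ceiling, per the PR description


def worst_case_sdk_seconds(per_attempt_timeout: float, max_retries: int,
                           backoff_total: float = 0.0) -> float:
    """Upper bound on time spent in the SDK if every attempt times out."""
    attempts = max_retries + 1  # the first try plus each retry
    return attempts * per_attempt_timeout + backoff_total


def needs_outer_deadline(per_attempt_timeout: float, max_retries: int) -> bool:
    """True when client settings alone could run past the gateway ceiling."""
    return worst_case_sdk_seconds(per_attempt_timeout, max_retries) >= GATEWAY_LIMIT_SECONDS
```

With the defaults from this PR (27.0s and 2 retries), `worst_case_sdk_seconds(27.0, 2)` gives 81.0 seconds, well over the ceiling, which is why the overall asyncio.timeout budget still matters.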

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| app/config.py | Introduces Anthropic timeout/retry settings to keep requests under API Gateway limits. |
| app/explain.py | Enforces a wall-clock deadline around Claude calls and returns structured retryable errors on transient failures. |
| app/main.py | Configures AsyncAnthropic with timeout/retries and passes the deadline into process_request. |
| app/test_explain.py | Adds an async test ensuring overrunning the deadline returns a structured error response. |


Comment thread app/explain.py Outdated
Comment on lines +92 to +96
message_text = (
    "Claude Explain could not generate an explanation in time. "
    "This usually means the model was busy or the input was very large; "
    "please try again in a moment."
)
Copilot review noted the original single message ('could not generate an
explanation in time') was inaccurate for non-timeout failures like connection
errors or transient 5xx responses. Now returns a timeout-specific message for
TimeoutError/APITimeoutError and a generic 'temporarily unavailable' message
for everything else.

🤖 Generated by LLM (Claude, via OpenClaw)

Copilot AI left a comment


Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

Comment thread app/explain.py
Comment on lines +92 to +106
if isinstance(error, (TimeoutError, APITimeoutError)):
    message_text = (
        "Claude Explain took too long to respond — the input may be very large "
        "or the model is under heavy load. Please try again in a moment."
    )
else:
    message_text = "Claude Explain is temporarily unavailable. Please try again in a moment."
LOGGER.warning("Anthropic call failed (%s): %s", type(error).__name__, error)
metrics_provider.set_property("language", body.language)
metrics_provider.set_property("compiler", body.compiler)
metrics_provider.set_property("instructionSet", body.instructionSet or "unknown")
metrics_provider.set_property("cached", "false")
metrics_provider.put_metric("ClaudeExplainRequest", 1)
metrics_provider.put_metric("ClaudeExplainTimeout", 1)
return ExplainResponse(
Copilot review noted ClaudeExplainTimeout was being emitted for all
transient failures (connection errors, 5xx, etc.), making it impossible
to distinguish real timeout rate from upstream error rate in dashboards.

Now emits ClaudeExplainTimeout only for TimeoutError/APITimeoutError,
and ClaudeExplainTransientError for everything else.
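
The metric split could be sketched like this. The two exception classes are local stand-ins so the example runs without the SDK, and `failure_metric` is a hypothetical helper name. The stand-ins mirror the subclass relationship that, to the best of my knowledge, the Anthropic Python SDK uses (APITimeoutError deriving from APIConnectionError), which is why the timeout check has to come first.

```python
class APIConnectionError(Exception):
    """Stand-in for anthropic.APIConnectionError."""


class APITimeoutError(APIConnectionError):
    """Stand-in for anthropic.APITimeoutError (assumed to subclass the connection error)."""


def failure_metric(error: Exception) -> str:
    """Pick the metric name to emit for a failed Claude call."""
    # Check timeouts first: a timeout is also a connection error by inheritance,
    # so testing for the broader class first would misclassify it.
    if isinstance(error, (TimeoutError, APITimeoutError)):
        return "ClaudeExplainTimeout"
    return "ClaudeExplainTransientError"
```

Keeping the two names separate lets a dashboard chart deadline overruns independently of upstream connection or 5xx failures.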

🤖 Generated by LLM (Claude, via OpenClaw)

Copilot AI left a comment


Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.

@mattgodbolt mattgodbolt merged commit 0d9ffa8 into main Jun 15, 2026
3 checks passed
@mattgodbolt mattgodbolt deleted the fix/explain-503-timeout branch June 15, 2026 16:55