Bound Claude call to a deadline under the API Gateway 30s limit #19
Merged
Users report intermittent 503s from the explain endpoint. The Lambda itself is healthy (Throttles=0, Errors=0) and the app only ever logs 200/307. The 503s come from API Gateway: its HTTP API integration timeout is a hard 30s ceiling, but the Lambda timeout was raised to 60s. Requests whose generation runs past 30s (218 over the last 4 days, up to ~40s) are severed by the gateway with an opaque 503 while the Lambda keeps running to completion, billing tokens we never deliver.

Bound each Claude call to a wall-clock budget (default 27s, configurable via `ANTHROPIC_TIMEOUT_SECONDS`) that sits comfortably below the gateway ceiling, and set the SDK per-attempt timeout and retry budget to match. On timeout or a transient upstream failure, return a structured, retryable error response (`status="error"`, `ClaudeExplainTimeout` metric) instead of letting the gateway emit a 503. Genuine client errors (e.g. a 400) still propagate as real failures.

🤖 Generated by LLM (Claude, via OpenClaw)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Cc @dkm: I had this branch hanging around. I'll be investigating other ways to speed things up. Well, Claude will.
Pull request overview
This pull request mitigates intermittent API Gateway–originated 503s on the /explain endpoint by ensuring Claude generation work is bounded to a wall-clock budget below the gateway’s fixed 30s integration timeout, returning a structured error response instead of letting the gateway sever the connection.
Changes:
- Added configurable Anthropic timeout and retry settings (`anthropic_timeout_seconds`, `anthropic_max_retries`) with a default 27s budget.
- Wrapped `client.messages.create(...)` with an `asyncio.timeout(...)` wall-clock deadline and converted timeout/transient upstream failures into a structured `status="error"` response plus metrics.
- Updated FastAPI app startup to configure the Anthropic client timeout/retries and threaded the deadline through request processing; added a test covering the deadline path.
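The settings half of this change could look roughly like the following stdlib-only sketch. The real `app/config.py` may well use a settings library instead; `AnthropicSettings` is a hypothetical name, the `ANTHROPIC_MAX_RETRIES` variable name is an assumption (only `ANTHROPIC_TIMEOUT_SECONDS` is named in the PR), and the range check against the 30s ceiling is an illustrative addition, not something the PR is known to do.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# API Gateway HTTP API integration timeout: a fixed 30s ceiling.
GATEWAY_CEILING_SECONDS = 30.0


@dataclass(frozen=True)
class AnthropicSettings:
    """Hypothetical holder mirroring anthropic_timeout_seconds / anthropic_max_retries."""

    timeout_seconds: float = 27.0
    max_retries: int = 2

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "AnthropicSettings":
        settings = cls(
            timeout_seconds=float(env.get("ANTHROPIC_TIMEOUT_SECONDS", cls.timeout_seconds)),
            # Variable name assumed; the PR only names ANTHROPIC_TIMEOUT_SECONDS.
            max_retries=int(env.get("ANTHROPIC_MAX_RETRIES", cls.max_retries)),
        )
        # Refuse a budget that could not finish before the gateway cuts the connection.
        if not 0 < settings.timeout_seconds < GATEWAY_CEILING_SECONDS:
            raise ValueError(
                f"ANTHROPIC_TIMEOUT_SECONDS must be in (0, {GATEWAY_CEILING_SECONDS}), "
                f"got {settings.timeout_seconds}"
            )
        return settings


# At startup the values would feed the SDK client, e.g.
#   AsyncAnthropic(timeout=settings.timeout_seconds, max_retries=settings.max_retries)
```

Failing fast at startup turns a misconfigured budget (say, 45s) into a deploy-time error rather than a return of the gateway 503s.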
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| app/config.py | Introduces Anthropic timeout/retry settings to keep requests under API Gateway limits. |
| app/explain.py | Enforces a wall-clock deadline around Claude calls and returns structured retryable errors on transient failures. |
| app/main.py | Configures AsyncAnthropic with timeout/retries and passes the deadline into process_request. |
| app/test_explain.py | Adds an async test ensuring overrunning the deadline returns a structured error response. |
Comment on lines +92 to +96
```python
message_text = (
    "Claude Explain could not generate an explanation in time. "
    "This usually means the model was busy or the input was very large; "
    "please try again in a moment."
)
```
Copilot review noted the original single message ('could not generate an
explanation in time') was inaccurate for non-timeout failures like connection
errors or transient 5xx responses. Now returns a timeout-specific message for
TimeoutError/APITimeoutError and a generic 'temporarily unavailable' message
for everything else.
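The revised branching can be sketched as follows. `APIConnectionError` and `APITimeoutError` are local stand-ins for the Anthropic SDK exceptions so the snippet runs without the SDK installed; they mirror the SDK's hierarchy, in which `APITimeoutError` subclasses `APIConnectionError`. `user_message` is a hypothetical helper and the message strings are abbreviated.

```python
class APIConnectionError(Exception):
    """Stand-in for anthropic.APIConnectionError."""


class APITimeoutError(APIConnectionError):
    """Stand-in for anthropic.APITimeoutError (a subclass of APIConnectionError in the SDK)."""


TIMEOUT_MESSAGE = "Claude Explain took too long to respond. Please try again in a moment."
UNAVAILABLE_MESSAGE = "Claude Explain is temporarily unavailable. Please try again in a moment."


def user_message(error: BaseException) -> str:
    """Pick the user-facing text for a failed Claude call."""
    # TimeoutError comes from asyncio.timeout; APITimeoutError from the SDK's own timeout.
    if isinstance(error, (TimeoutError, APITimeoutError)):
        return TIMEOUT_MESSAGE
    # Connection errors, transient 5xx responses, etc.
    return UNAVAILABLE_MESSAGE
```

Testing the timeout types first matters: because `APITimeoutError` is also an `APIConnectionError`, a connection-error branch placed ahead of it would silently relabel SDK timeouts as "unavailable".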
🤖 Generated by LLM (Claude, via OpenClaw)
Comment on lines +92 to +106
```python
if isinstance(error, (TimeoutError, APITimeoutError)):
    message_text = (
        "Claude Explain took too long to respond — the input may be very large "
        "or the model is under heavy load. Please try again in a moment."
    )
else:
    message_text = "Claude Explain is temporarily unavailable. Please try again in a moment."
LOGGER.warning("Anthropic call failed (%s): %s", type(error).__name__, error)
metrics_provider.set_property("language", body.language)
metrics_provider.set_property("compiler", body.compiler)
metrics_provider.set_property("instructionSet", body.instructionSet or "unknown")
metrics_provider.set_property("cached", "false")
metrics_provider.put_metric("ClaudeExplainRequest", 1)
metrics_provider.put_metric("ClaudeExplainTimeout", 1)
return ExplainResponse(
```
Copilot review noted ClaudeExplainTimeout was being emitted for all transient failures (connection errors, 5xx, etc.), making it impossible to distinguish real timeout rate from upstream error rate in dashboards. Now emits ClaudeExplainTimeout only for TimeoutError/APITimeoutError, and ClaudeExplainTransientError for everything else. 🤖 Generated by LLM (Claude, via OpenClaw)
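A sketch of the split metric emission, reusing local stand-ins for the SDK exceptions. `failure_metric_names` and `record_failure` are hypothetical helpers, and `RecordingMetrics` is a hypothetical stand-in for `metrics_provider` that only counts `put_metric(name, value)` calls, the call shape visible in the diff excerpt.

```python
from collections import Counter


class APIConnectionError(Exception):
    """Stand-in for anthropic.APIConnectionError."""


class APITimeoutError(APIConnectionError):
    """Stand-in for anthropic.APITimeoutError."""


def failure_metric_names(error: BaseException) -> list[str]:
    """Every failure counts as a request; then exactly one failure-kind metric."""
    names = ["ClaudeExplainRequest"]
    if isinstance(error, (TimeoutError, APITimeoutError)):
        names.append("ClaudeExplainTimeout")
    else:
        names.append("ClaudeExplainTransientError")
    return names


class RecordingMetrics:
    """Hypothetical metrics_provider stand-in that just tallies put_metric calls."""

    def __init__(self) -> None:
        self.counts: Counter[str] = Counter()

    def put_metric(self, name: str, value: int) -> None:
        self.counts[name] += value


def record_failure(metrics: RecordingMetrics, error: BaseException) -> None:
    for name in failure_metric_names(error):
        metrics.put_metric(name, 1)
```

With both counters emitted alongside `ClaudeExplainRequest`, the timeout rate and the upstream error rate can each be charted as a fraction of requests instead of being blended into one number.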
(I'm Molty, an AI assistant acting on behalf of @mattgodbolt)
Problem
Users report intermittent 503s from the `/explain` endpoint. The Lambda itself is completely healthy (Throttles=0, Errors=0) and the app only ever logs 200/307. The 503s come from API Gateway: its HTTP API integration timeout is a hard 30s ceiling that cannot be raised, but the Lambda timeout was set to 60s.

Requests whose generation runs past 30s are severed by the gateway with an opaque `{"message":"Service Unavailable"}` (33 bytes) while the Lambda keeps running to completion, billing tokens we never deliver.

Quantified impact (CloudWatch, 2026-06-15):
Fix
Bound each Claude call to a wall-clock budget (default 27s, configurable via `ANTHROPIC_TIMEOUT_SECONDS`) that sits comfortably below the gateway ceiling. On timeout or a transient upstream failure, return a structured, retryable error response (`status="error"`, `ClaudeExplainTimeout` metric) instead of letting the gateway emit a 503.

Changes:
- `config.py`: new `anthropic_timeout_seconds` (27.0) and `anthropic_max_retries` (2) settings
- `explain.py`: wrap `client.messages.create` in `asyncio.timeout(deadline_seconds)`, catch `TimeoutError`/`APITimeoutError`/`APIConnectionError` and transient HTTP errors (408/429/5xx/529), return a user-friendly error response
- `main.py`: pass `timeout`/`max_retries` to the `AsyncAnthropic` client and thread `deadline_seconds` through to `process_request`
- `test_explain.py`: new test covering the deadline path

🤖 Generated by LLM (Claude, via OpenClaw)
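The error triage in the `explain.py` change can be sketched as follows. `APIStatusError` is a local stand-in for the SDK exception of the same name (which carries a `status_code`), and `is_transient_status`, `to_structured_error` and the returned dicts are hypothetical, not the PR's `ExplainResponse`. 529 (Anthropic's "overloaded" status) already falls inside the 5xx range; anything not listed, such as a 400, is re-raised.

```python
# Transient HTTP statuses per the PR: 408, 429, and any 5xx (which covers 529).
RETRYABLE_4XX = frozenset({408, 429})


class APIStatusError(Exception):
    """Stand-in for anthropic.APIStatusError, which exposes .status_code."""

    def __init__(self, status_code: int) -> None:
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


def is_transient_status(status_code: int) -> bool:
    return status_code in RETRYABLE_4XX or 500 <= status_code <= 599


def to_structured_error(error: BaseException) -> dict[str, str]:
    """Convert timeouts and transient upstream failures; re-raise everything else."""
    if isinstance(error, TimeoutError):
        return {"status": "error", "kind": "timeout"}
    if isinstance(error, APIStatusError) and is_transient_status(error.status_code):
        return {"status": "error", "kind": "transient"}
    raise error  # e.g. a 400: a genuine client error should fail loudly
```

In a handler this would sit inside `except Exception as error: return to_structured_error(error)`, so only the retryable cases become `status="error"` responses.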