Commit 9af0f30

Merge pull request #211 from AutoForgeAI/fix/rate-limit-event-crash
fix: handle rate_limit_event crash in chat sessions
2 parents d65fa0c + 49442f0 commit 9af0f30

15 files changed, 424 additions and 168 deletions


.claude/commands/review-pr.md

Lines changed: 4 additions & 4 deletions
@@ -55,10 +55,10 @@ Pull request(s): $ARGUMENTS
    - Reviewing large, unfocused PRs is impractical and error-prone; the review cannot provide adequate assurance for such changes
 
 6. **Vision Alignment Check**
-   - Read the project's README.md and CLAUDE.md to understand the application's core purpose
-   - Assess whether this PR aligns with the application's intended functionality
-   - If the changes deviate significantly from the core vision or add functionality that doesn't serve the application's purpose, note this in the review
-   - This is not a blocker, but should be flagged for the reviewer's consideration
+   - **VISION.md protection**: First, check whether the PR diff modifies `VISION.md` in any way (edits, deletions, renames). If it does, **stop the review immediately** — verdict is **DON'T MERGE**. VISION.md is immutable and no PR is permitted to alter it. Explain this to the user and skip all remaining steps.
+   - Read the project's `VISION.md`, `README.md`, and `CLAUDE.md` to understand the application's core purpose and mandatory architectural constraints
+   - Assess whether this PR aligns with the vision defined in `VISION.md`
+   - **Vision deviation is a merge blocker.** If the PR introduces functionality, integrations, or architectural changes that conflict with `VISION.md`, the verdict must be **DON'T MERGE**. This is not negotiable — the vision document takes precedence over any PR rationale.
 
 7. **Safety Assessment**
    - Provide a review on whether the PR is safe to merge as-is

.claude/launch.json

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+{
+  "version": "0.0.1",
+  "configurations": [
+    {
+      "name": "backend",
+      "runtimeExecutable": "python",
+      "runtimeArgs": ["-m", "uvicorn", "server.main:app", "--host", "127.0.0.1", "--port", "8888", "--reload"],
+      "port": 8888
+    },
+    {
+      "name": "frontend",
+      "runtimeExecutable": "cmd",
+      "runtimeArgs": ["/c", "cd ui && npx vite"],
+      "port": 5173
+    }
+  ],
+  "autoVerify": true
+}

VISION.md

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+# VISION
+
+This document defines the mandatory project vision for AutoForge. All contributions must align with these principles. PRs that deviate from this vision will be rejected. This file itself is immutable via PR — any PR that modifies VISION.md will be rejected outright.
+
+## Claude Agent SDK Exclusivity
+
+AutoForge is a wrapper around the **Claude Agent SDK**. This is a foundational architectural decision, not a preference.
+
+**What this means:**
+
+- AutoForge only supports providers, models, and integrations that work through the Claude Agent SDK.
+- We will not integrate with, accommodate, or add support for other AI SDKs, CLIs, or coding agent platforms (e.g., Codex, OpenCode, Aider, Continue, Cursor agents, or similar tools).
+
+**Why:**
+
+Each platform has its own approach to MCP tools, skills, context management, and feature integration. Attempting to support multiple agent frameworks creates an unsustainable maintenance burden and dilutes the quality of the core experience. By committing to the Claude Agent SDK exclusively, we can build deep, reliable integration rather than shallow compatibility across many targets.
+
+**In practice:**
+
+- PRs adding support for non-Claude agent frameworks will be rejected.
+- PRs introducing abstractions designed to make AutoForge "agent-agnostic" will be rejected.
+- Alternative API providers (e.g., Vertex AI, AWS Bedrock) are acceptable only when accessed through the Claude Agent SDK's own configuration.

package.json

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 {
   "name": "autoforge-ai",
-  "version": "0.1.13",
+  "version": "0.1.14",
   "description": "Autonomous coding agent with web UI - build complete apps with AI",
   "license": "AGPL-3.0",
   "bin": {

requirements-prod.txt

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 # Production runtime dependencies only
 # For development, use requirements.txt (includes ruff, mypy, pytest)
-claude-agent-sdk>=0.1.0,<0.2.0
+claude-agent-sdk>=0.1.39,<0.2.0
 python-dotenv>=1.0.0
 sqlalchemy>=2.0.0
 fastapi>=0.115.0

requirements.txt

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-claude-agent-sdk>=0.1.0,<0.2.0
+claude-agent-sdk>=0.1.39,<0.2.0
 python-dotenv>=1.0.0
 sqlalchemy>=2.0.0
 fastapi>=0.115.0

server/services/assistant_chat_session.py

Lines changed: 64 additions & 31 deletions
@@ -7,6 +7,7 @@
 but cannot modify any files.
 """
 
+import asyncio
 import json
 import logging
 import os
@@ -25,7 +26,12 @@
     create_conversation,
     get_messages,
 )
-from .chat_constants import ROOT_DIR
+from .chat_constants import (
+    MAX_CHAT_RATE_LIMIT_RETRIES,
+    ROOT_DIR,
+    calculate_rate_limit_backoff,
+    check_rate_limit_error,
+)
 
 # Load environment variables from .env file if present
 load_dotenv()
@@ -393,39 +399,66 @@ async def _query_claude(self, message: str) -> AsyncGenerator[dict, None]:
 
         full_response = ""
 
-        # Stream the response
-        async for msg in self.client.receive_response():
-            msg_type = type(msg).__name__
-
-            if msg_type == "AssistantMessage" and hasattr(msg, "content"):
-                for block in msg.content:
-                    block_type = type(block).__name__
-
-                    if block_type == "TextBlock" and hasattr(block, "text"):
-                        text = block.text
-                        if text:
-                            full_response += text
-                            yield {"type": "text", "content": text}
-
-                    elif block_type == "ToolUseBlock" and hasattr(block, "name"):
-                        tool_name = block.name
-                        tool_input = getattr(block, "input", {})
+        # Stream the response (with rate-limit retry)
+        for _attempt in range(MAX_CHAT_RATE_LIMIT_RETRIES + 1):
+            try:
+                async for msg in self.client.receive_response():
+                    msg_type = type(msg).__name__
+
+                    if msg_type == "AssistantMessage" and hasattr(msg, "content"):
+                        for block in msg.content:
+                            block_type = type(block).__name__
+
+                            if block_type == "TextBlock" and hasattr(block, "text"):
+                                text = block.text
+                                if text:
+                                    full_response += text
+                                    yield {"type": "text", "content": text}
+
+                            elif block_type == "ToolUseBlock" and hasattr(block, "name"):
+                                tool_name = block.name
+                                tool_input = getattr(block, "input", {})
+
+                                # Intercept ask_user tool calls -> yield as question message
+                                if tool_name == "mcp__features__ask_user":
+                                    questions = tool_input.get("questions", [])
+                                    if questions:
+                                        yield {
+                                            "type": "question",
+                                            "questions": questions,
+                                        }
+                                    continue
 
-                        # Intercept ask_user tool calls -> yield as question message
-                        if tool_name == "mcp__features__ask_user":
-                            questions = tool_input.get("questions", [])
-                            if questions:
                                 yield {
-                                    "type": "question",
-                                    "questions": questions,
+                                    "type": "tool_call",
+                                    "tool": tool_name,
+                                    "input": tool_input,
                                 }
-                            continue
-
-                        yield {
-                            "type": "tool_call",
-                            "tool": tool_name,
-                            "input": tool_input,
-                        }
+                # Completed successfully — break out of retry loop
+                break
+            except Exception as exc:
+                is_rate_limit, retry_secs = check_rate_limit_error(exc)
+                if is_rate_limit and _attempt < MAX_CHAT_RATE_LIMIT_RETRIES:
+                    delay = retry_secs if retry_secs else calculate_rate_limit_backoff(_attempt)
+                    logger.warning(f"Rate limited (attempt {_attempt + 1}/{MAX_CHAT_RATE_LIMIT_RETRIES}), retrying in {delay}s")
+                    yield {
+                        "type": "rate_limited",
+                        "retry_in": delay,
+                        "attempt": _attempt + 1,
+                        "max_attempts": MAX_CHAT_RATE_LIMIT_RETRIES,
+                    }
+                    await asyncio.sleep(delay)
+                    await self.client.query(message)
+                    continue
+                if is_rate_limit:
+                    logger.error("Rate limit retries exhausted for assistant chat")
+                    yield {"type": "error", "content": "Rate limited. Please try again later."}
+                    return
+                # Non-rate-limit MessageParseError: log and break (don't crash)
+                if type(exc).__name__ == "MessageParseError":
+                    logger.warning(f"Ignoring unrecognized message from Claude CLI: {exc}")
+                    break
+                raise
 
         # Store the complete response in the database
         if full_response and self.conversation_id:
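
Note on the hunk above: the change wraps the streaming loop in a bounded retry that re-sends the query after a delay. The following is a minimal, self-contained sketch of that pattern, not the project's code. `FakeClient`, `RateLimited`, `backoff`, and `stream_with_retry` are hypothetical names introduced for illustration, and the exponential formula in `backoff` is an assumption, since `calculate_rate_limit_backoff` is not shown in this commit.

```python
import asyncio
from typing import AsyncGenerator

MAX_RETRIES = 3  # mirrors MAX_CHAT_RATE_LIMIT_RETRIES from the diff


class RateLimited(Exception):
    """Hypothetical stand-in for an SDK error that signals a rate limit."""


class FakeClient:
    """Hypothetical client whose first `failures` receive attempts raise."""

    def __init__(self, failures: int = 1) -> None:
        self.failures = failures
        self.queries = 0

    async def query(self, message: str) -> None:
        self.queries += 1

    async def receive_response(self) -> AsyncGenerator[str, None]:
        if self.failures > 0:
            self.failures -= 1
            raise RateLimited("rate limit exceeded")
        yield "hello"


def backoff(attempt: int, base: float = 0.01) -> float:
    """Assumed exponential backoff; the real helper may use other values."""
    return base * (2 ** attempt)


async def stream_with_retry(client: FakeClient, message: str) -> AsyncGenerator[dict, None]:
    await client.query(message)
    for attempt in range(MAX_RETRIES + 1):
        try:
            async for text in client.receive_response():
                yield {"type": "text", "content": text}
            break  # stream finished without error
        except RateLimited:
            if attempt >= MAX_RETRIES:
                yield {"type": "error", "content": "Rate limited. Please try again later."}
                return
            delay = backoff(attempt)
            yield {
                "type": "rate_limited",
                "retry_in": delay,
                "attempt": attempt + 1,
                "max_attempts": MAX_RETRIES,
            }
            await asyncio.sleep(delay)
            await client.query(message)  # re-send before reading again


async def collect(client: FakeClient) -> list:
    return [event async for event in stream_with_retry(client, "hi")]
```

Running `asyncio.run(collect(FakeClient(failures=1)))` produces one `rate_limited` event followed by the text event. One thing to keep in mind with this pattern: text yielded before a mid-stream failure is not rolled back, so a retry can emit the same content twice.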

server/services/chat_constants.py

Lines changed: 40 additions & 0 deletions
@@ -9,6 +9,7 @@
 imports (``from .chat_constants import API_ENV_VARS``) continue to work.
 """
 
+import logging
 import sys
 from pathlib import Path
 from typing import AsyncGenerator
@@ -32,6 +33,45 @@
 # imports continue to work unchanged.
 # -------------------------------------------------------------------
 from env_constants import API_ENV_VARS  # noqa: E402, F401
+from rate_limit_utils import calculate_rate_limit_backoff, is_rate_limit_error, parse_retry_after  # noqa: E402, F401
+
+logger = logging.getLogger(__name__)
+
+# -------------------------------------------------------------------
+# Rate-limit handling for chat sessions
+# -------------------------------------------------------------------
+MAX_CHAT_RATE_LIMIT_RETRIES = 3
+
+
+def check_rate_limit_error(exc: Exception) -> tuple[bool, int | None]:
+    """Inspect an exception and determine if it represents a rate-limit.
+
+    Returns ``(is_rate_limit, retry_seconds)``. ``retry_seconds`` is the
+    parsed Retry-After value when available, otherwise ``None`` (caller
+    should use exponential backoff).
+
+    Handles:
+    - ``MessageParseError`` whose raw *data* dict has
+      ``type == "rate_limit_event"`` (Claude CLI sends this).
+    - Any exception whose string representation matches known rate-limit
+      patterns (via ``rate_limit_utils.is_rate_limit_error``).
+    """
+    exc_str = str(exc)
+
+    # Check for MessageParseError with a rate_limit_event payload
+    cls_name = type(exc).__name__
+    if cls_name == "MessageParseError":
+        raw_data = getattr(exc, "data", None)
+        if isinstance(raw_data, dict) and raw_data.get("type") == "rate_limit_event":
+            retry = parse_retry_after(str(raw_data)) if raw_data else None
+            return True, retry
+
+    # Fallback: match error text against known rate-limit patterns
+    if is_rate_limit_error(exc_str):
+        retry = parse_retry_after(exc_str)
+        return True, retry
+
+    return False, None
 
 
 async def make_multimodal_message(content_blocks: list[dict]) -> AsyncGenerator[dict, None]:
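
Note on `check_rate_limit_error`: it classifies an exception in two steps, first by payload type and then by message text. The sketch below reproduces that flow so it can be run in isolation. The `MessageParseError` class here is a hypothetical stand-in for the SDK exception, and the regular expressions inside `parse_retry_after` and `is_rate_limit_error` are assumptions; the real helpers live in `rate_limit_utils`, which this commit does not show.

```python
import re
from typing import Optional


class MessageParseError(Exception):
    """Hypothetical stand-in for the SDK exception; only `.data` matters here."""

    def __init__(self, message: str, data: Optional[dict] = None) -> None:
        super().__init__(message)
        self.data = data


def parse_retry_after(text: str) -> Optional[int]:
    """Assumed behaviour: extract the number following 'retry after' / 'retry_after'."""
    match = re.search(r"retry[\s_-]*after\D{0,5}(\d+)", text, re.IGNORECASE)
    return int(match.group(1)) if match else None


def is_rate_limit_error(text: str) -> bool:
    """Assumed patterns; the project's actual list may differ."""
    return bool(re.search(r"rate.?limit|too many requests|\b429\b", text, re.IGNORECASE))


def check_rate_limit_error(exc: Exception) -> tuple:
    # Step 1: a parse error carrying a rate_limit_event payload
    if type(exc).__name__ == "MessageParseError":
        data = getattr(exc, "data", None)
        if isinstance(data, dict) and data.get("type") == "rate_limit_event":
            return True, parse_retry_after(str(data))
    # Step 2: fall back to matching the error text
    text = str(exc)
    if is_rate_limit_error(text):
        return True, parse_retry_after(text)
    return False, None
```

Matching on the class name rather than importing the exception type keeps the helper usable with SDK versions that may not export `MessageParseError`; that reading of the design is an inference from the diff, not something the commit states.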

server/services/expand_chat_session.py

Lines changed: 67 additions & 18 deletions
@@ -22,7 +22,13 @@
 from dotenv import load_dotenv
 
 from ..schemas import ImageAttachment
-from .chat_constants import ROOT_DIR, make_multimodal_message
+from .chat_constants import (
+    MAX_CHAT_RATE_LIMIT_RETRIES,
+    ROOT_DIR,
+    calculate_rate_limit_backoff,
+    check_rate_limit_error,
+    make_multimodal_message,
+)
 
 # Load environment variables from .env file if present
 load_dotenv()
@@ -298,24 +304,67 @@ async def _query_claude(
         else:
             await self.client.query(message)
 
-        # Stream the response
-        async for msg in self.client.receive_response():
-            msg_type = type(msg).__name__
-
-            if msg_type == "AssistantMessage" and hasattr(msg, "content"):
-                for block in msg.content:
-                    block_type = type(block).__name__
-
-                    if block_type == "TextBlock" and hasattr(block, "text"):
-                        text = block.text
-                        if text:
-                            yield {"type": "text", "content": text}
-
-                            self.messages.append({
-                                "role": "assistant",
-                                "content": text,
-                                "timestamp": datetime.now().isoformat()
+        # Stream the response (with rate-limit retry)
+        for _attempt in range(MAX_CHAT_RATE_LIMIT_RETRIES + 1):
+            try:
+                async for msg in self.client.receive_response():
+                    msg_type = type(msg).__name__
+
+                    if msg_type == "AssistantMessage" and hasattr(msg, "content"):
+                        for block in msg.content:
+                            block_type = type(block).__name__
+
+                            if block_type == "TextBlock" and hasattr(block, "text"):
+                                text = block.text
+                                if text:
+                                    yield {"type": "text", "content": text}
+
+                                    self.messages.append({
+                                        "role": "assistant",
+                                        "content": text,
+                                        "timestamp": datetime.now().isoformat()
+                                    })
+                # Completed successfully — break out of retry loop
+                break
+            except Exception as exc:
+                is_rate_limit, retry_secs = check_rate_limit_error(exc)
+                if is_rate_limit and _attempt < MAX_CHAT_RATE_LIMIT_RETRIES:
+                    delay = retry_secs if retry_secs else calculate_rate_limit_backoff(_attempt)
+                    logger.warning(f"Rate limited (attempt {_attempt + 1}/{MAX_CHAT_RATE_LIMIT_RETRIES}), retrying in {delay}s")
+                    yield {
+                        "type": "rate_limited",
+                        "retry_in": delay,
+                        "attempt": _attempt + 1,
+                        "max_attempts": MAX_CHAT_RATE_LIMIT_RETRIES,
+                    }
+                    await asyncio.sleep(delay)
+                    # Re-send the query before retrying receive_response
+                    if attachments and len(attachments) > 0:
+                        content_blocks_retry: list[dict[str, Any]] = []
+                        if message:
+                            content_blocks_retry.append({"type": "text", "text": message})
+                        for att in attachments:
+                            content_blocks_retry.append({
+                                "type": "image",
+                                "source": {
+                                    "type": "base64",
+                                    "media_type": att.mimeType,
+                                    "data": att.base64Data,
+                                }
                             })
+                        await self.client.query(make_multimodal_message(content_blocks_retry))
+                    else:
+                        await self.client.query(message)
+                    continue
+                if is_rate_limit:
+                    logger.error("Rate limit retries exhausted for expand chat")
+                    yield {"type": "error", "content": "Rate limited. Please try again later."}
+                    return
+                # Non-rate-limit MessageParseError: log and break (don't crash)
+                if type(exc).__name__ == "MessageParseError":
+                    logger.warning(f"Ignoring unrecognized message from Claude CLI: {exc}")
+                    break
+                raise
 
     def get_features_created(self) -> int:
         """Get the total number of features created in this session."""
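
Note on the retry branch above: it rebuilds the multimodal payload inline before re-sending the query. One possible way to keep the first send and the retry in sync is a small shared helper, sketched below. `build_content_blocks` and the `Attachment` dataclass are hypothetical names, not part of this commit; `Attachment` only models the two fields (`mimeType`, `base64Data`) that the diff reads from `ImageAttachment`.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Attachment:
    """Hypothetical stand-in for ImageAttachment (fields taken from the diff)."""

    mimeType: str
    base64Data: str


def build_content_blocks(message: str, attachments: list) -> list:
    """Build the text + image block list in the shape the retry branch uses."""
    blocks: list[dict[str, Any]] = []
    if message:
        blocks.append({"type": "text", "text": message})
    for att in attachments:
        blocks.append({
            "type": "image",
            "source": {
                "type": "base64",
                "media_type": att.mimeType,
                "data": att.base64Data,
            },
        })
    return blocks


blocks = build_content_blocks("describe this", [Attachment("image/png", "AAAA")])
```

With a helper of this shape, both the initial call and the retry could pass the same list to `make_multimodal_message`, which may reduce the risk of the two code paths drifting apart.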
