
Add exponential backoff retry logic for AI service failures#125

Open
Copilot wants to merge 5 commits into main from copilot/fix-ai-service-error-handling

Conversation

Copilot AI commented Jan 9, 2026

AI service calls (Gemini, Hugging Face) fail silently on transient errors (rate limits, network timeouts), returning empty or fallback responses without retry attempts.

Changes

Core retry infrastructure

  • backend/retry_utils.py: Exponential backoff decorator with configurable retry parameters (default: 3 retries, 1s→2s→4s delays)
  • Structured logging for retry attempts and final failures
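
A minimal sketch of what such a decorator could look like, for readers who want the idea without opening the diff. It is not the contents of backend/retry_utils.py: the helper `backoff_delays` is hypothetical, and the sketch assumes that `max_retries` counts retries after the first attempt and that every exception is treated as retryable.

```python
import asyncio
import functools
import logging

logger = logging.getLogger(__name__)


def backoff_delays(max_retries=3, base_delay=1.0, max_delay=10.0):
    """Hypothetical helper: wait before each retry, doubled each time and capped."""
    return [min(base_delay * 2 ** attempt, max_delay) for attempt in range(max_retries)]


def exponential_backoff_retry(max_retries=3, base_delay=1.0, max_delay=10.0):
    """Retry an async callable; assumes max_retries excludes the first attempt."""
    delays = backoff_delays(max_retries, base_delay, max_delay)

    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return await func(*args, **kwargs)
                except Exception as exc:
                    if attempt == max_retries:
                        # Retries exhausted: log and re-raise so the caller
                        # can decide which fallback response to return.
                        logger.error("%s failed after %d retries: %s",
                                     func.__name__, max_retries, exc)
                        raise
                    logger.warning("%s attempt %d failed (%s); retrying in %.2fs",
                                   func.__name__, attempt + 1, exc, delays[attempt])
                    await asyncio.sleep(delays[attempt])
        return wrapper
    return decorator
```

Under these assumptions, `backoff_delays()` evaluates to `[1.0, 2.0, 4.0]`, which matches the 1s→2s→4s schedule listed above; `max_delay` only takes effect once the doubled delay would exceed it.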

AI service integration

  • Gemini services (ai_service.py, gemini_summary.py): Wrapped API calls with retry decorator
    • Action plan generation
    • Chat assistant
    • MLA summary generation
  • Hugging Face service (hf_service.py): Wrapped all detection endpoints with retry decorator
    • Vandalism, infrastructure, flooding detection

Implementation pattern

# Retried up to 3 times with exponential backoff; each delay is capped at 10 s.
@exponential_backoff_retry(max_retries=3, base_delay=1.0, max_delay=10.0)
async def _generate_action_plan_with_retry(issue_description: str, category: str) -> dict:
    # Note: the construction of `prompt` is omitted from this excerpt.
    model = genai.GenerativeModel('gemini-1.5-flash')
    response = await model.generate_content_async(prompt)
    return json.loads(response.text.strip())

async def generate_action_plan(issue_description: str, category: str, image_path: Optional[str] = None) -> dict:
    try:
        return await _generate_action_plan_with_retry(issue_description, category)
    except Exception as e:
        # All attempts failed: log the error and return a well-formed fallback instead of raising.
        logger.error(f"All retries exhausted: {e}", exc_info=True)
        return _get_fallback_action_plan(issue_description, category)

Testing

  • 10 unit tests covering retry timing, exhaustion, fallback patterns
  • Integration tests validating end-to-end behavior

Behavior

  • Transient failures trigger automatic retry with exponential backoff
  • After retry exhaustion, returns well-formed fallback responses (never empty/error states)
  • No changes to function signatures or response formats
Original prompt

This section details the original issue you should resolve

<issue_title>Incomplete Error Handling for AI Service Failures</issue_title>
<issue_description>If AI APIs fail (e.g., rate limits, network issues), the application continues without fallback logic, potentially returning empty or incorrect responses.
Severity: High
Recommendation: Implement retry logic with exponential backoff and default responses for AI-dependent features.</issue_description>

<agent_instructions>Monitor the assignee's code quality; the overall project flow shouldn't break.</agent_instructions>

Comments on the Issue (you are @copilot in this section)



Copilot AI self-assigned this Jan 9, 2026
@RohanExploit RohanExploit added the ECWoC26 Elite Coders winter of Code label Jan 9, 2026
Co-authored-by: RohanExploit <178623867+RohanExploit@users.noreply.github.com>

cloudflare-workers-and-pages bot commented Jan 9, 2026

Deploying with Cloudflare Workers

Status | Name | Latest Commit | Updated (UTC)
❌ Deployment failed | fixingbharat | bd49c76 | Jan 09 2026, 04:53 PM

Copilot AI and others added 3 commits January 9, 2026 16:50
Co-authored-by: RohanExploit <178623867+RohanExploit@users.noreply.github.com>
Co-authored-by: RohanExploit <178623867+RohanExploit@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Implement retry logic for AI service failures" to "Add exponential backoff retry logic for AI service failures" Jan 9, 2026
Copilot AI requested a review from RohanExploit January 9, 2026 16:56
@RohanExploit RohanExploit marked this pull request as ready for review January 14, 2026 09:53
Copilot AI review requested due to automatic review settings January 14, 2026 09:53
Copilot AI left a comment

Pull request overview

This pull request adds exponential backoff retry logic to AI service calls to handle transient failures like rate limits and network timeouts. The implementation introduces a new retry utility module with decorators that wrap existing AI service calls from Gemini and Hugging Face APIs. Upon retry exhaustion, services return well-formed fallback responses instead of failing silently.

Changes:

  • Created core retry infrastructure with exponential backoff decorator supporting configurable retry parameters
  • Integrated retry logic into all Gemini AI services (action plan generation, chat assistant, MLA summary generation)
  • Integrated retry logic into all Hugging Face detection endpoints (vandalism, infrastructure, flooding)
  • Added comprehensive unit and integration tests covering retry timing, exhaustion, and fallback patterns

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

File | Description
backend/retry_utils.py | New module providing async and sync exponential backoff retry decorators with structured logging
backend/ai_service.py | Wrapped Gemini action plan and chat functions with retry decorators, added fallback helpers
backend/gemini_summary.py | Wrapped MLA summary generation with retry decorator, added fallback response
backend/hf_service.py | Wrapped Hugging Face API requests with retry decorator, improved error handling and logging
tests/test_retry_logic.py | Unit tests for retry decorator covering success, failure, timing, and exception handling scenarios
tests/manual_integration_test.py | Manual integration tests demonstrating retry behavior with simulated API failures
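
The summary above mentions a sync decorator alongside the async one. A rough sketch of how a synchronous variant might restrict retries to selected exception types follows. The name `exponential_backoff_retry_sync` and the `exceptions` parameter are assumptions made for illustration and may not match the names in backend/retry_utils.py.

```python
import functools
import logging
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


def exponential_backoff_retry_sync(
    max_retries: int = 3,
    base_delay: float = 1.0,
    max_delay: float = 10.0,
    exceptions: Tuple[Type[BaseException], ...] = (Exception,),  # hypothetical parameter
) -> Callable[[Callable[..., T]], Callable[..., T]]:
    """Retry a blocking callable, but only for the listed exception types."""
    def decorator(func: Callable[..., T]) -> Callable[..., T]:
        @functools.wraps(func)
        def wrapper(*args, **kwargs) -> T:
            for attempt in range(max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions as exc:
                    if attempt == max_retries:
                        logger.error("%s failed after %d retries: %s",
                                     func.__name__, max_retries, exc)
                        raise
                    delay = min(base_delay * 2 ** attempt, max_delay)
                    logger.warning("%s attempt %d failed (%s); retrying in %.2fs",
                                   func.__name__, attempt + 1, exc, delay)
                    time.sleep(delay)
            raise AssertionError("unreachable")  # the loop always returns or raises
        return wrapper
    return decorator
```

Exceptions outside `exceptions` propagate immediately, so a malformed request is not retried the way a rate limit or timeout would be.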


import asyncio
import functools
import logging
from typing import TypeVar, Callable, Optional, Tuple, Type

Copilot AI Jan 14, 2026

Import of 'Optional' is not used.

Suggested change
- from typing import TypeVar, Callable, Optional, Tuple, Type
+ from typing import TypeVar, Callable, Tuple, Type

with retry logic and exponential backoff.
"""
import pytest
import asyncio

Copilot AI Jan 14, 2026

Import of 'asyncio' is not used.

Suggested change
- import asyncio

import time
import sys
import os
from unittest.mock import AsyncMock, MagicMock, patch

Copilot AI Jan 14, 2026

Import of 'AsyncMock' is not used.
Import of 'MagicMock' is not used.
Import of 'patch' is not used.

@RohanExploit RohanExploit requested a review from Copilot January 14, 2026 10:20
Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.



Comment on lines +6 to +8
from retry_utils import exponential_backoff_retry
import logging
import base64

Copilot AI Jan 14, 2026

The base64 import at module level is unnecessary since it's only used within the _make_request_with_retry function. Moving it there (line 34) would be more appropriate, or keep it at module level and remove the duplicate import location.

from async_lru import alru_cache
from retry_utils import exponential_backoff_retry
import logging
import json

Copilot AI Jan 14, 2026

The json import at module level is redundant since it was already used in the original code within the try block. Consider keeping it at module level throughout for consistency.

Suggested change
- import json

Comment on lines +95 to +97
assert 0.08 < delay1 < 0.15
assert 0.18 < delay2 < 0.25
assert 0.38 < delay3 < 0.45

Copilot AI Jan 14, 2026

The timing tolerances (±0.02-0.05s) may be too strict for CI/CD environments with variable load. Consider widening these ranges or using a more flexible approach to verify exponential growth pattern rather than absolute timing values.
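
One way to check the growth pattern without wall-clock timing is to replace `asyncio.sleep` with a mock and inspect the delays it was asked for. The sketch below is illustrative only: the decorator is a simplified stand-in with assumed behavior, not the code in backend/retry_utils.py, and `recorded_delays` is a hypothetical helper. The approach relies on the decorator calling `asyncio.sleep` through the module attribute so that the patch takes effect.

```python
import asyncio
import functools
import math
from unittest.mock import AsyncMock, patch


def exponential_backoff_retry(max_retries=3, base_delay=1.0, max_delay=10.0):
    """Simplified stand-in for the decorator under test (assumed behavior)."""
    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return await func(*args, **kwargs)
                except Exception:
                    if attempt == max_retries:
                        raise
                    await asyncio.sleep(min(base_delay * 2 ** attempt, max_delay))
        return wrapper
    return decorator


def recorded_delays(max_retries, base_delay, max_delay):
    """Run an always-failing call and return the delays passed to asyncio.sleep."""
    @exponential_backoff_retry(max_retries=max_retries, base_delay=base_delay,
                               max_delay=max_delay)
    async def always_fails():
        raise ConnectionError("simulated transient failure")

    # The mocked sleep returns immediately, so no real time passes.
    with patch("asyncio.sleep", new_callable=AsyncMock) as mock_sleep:
        try:
            asyncio.run(always_fails())
        except ConnectionError:
            pass
    return [call.args[0] for call in mock_sleep.await_args_list]


def test_delays_grow_exponentially():
    delays = recorded_delays(max_retries=3, base_delay=0.1, max_delay=10.0)
    assert len(delays) == 3
    # Each delay should be double the previous one, regardless of CI load.
    for previous, current in zip(delays, delays[1:]):
        assert math.isclose(current, previous * 2)
```

Because no real sleeping happens, the test runs instantly and its result does not depend on how busy the CI runner is.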

@RohanExploit RohanExploit removed the ECWoC26 Elite Coders winter of Code label Jan 21, 2026
Development

Successfully merging this pull request may close these issues.

Incomplete Error Handling for AI Service Failures

2 participants