Implement retry mechanism for failed analysis jobs #33

@coderabbitai

Problem

The database schema includes retry fields (retry_count, max_retries) but the worker doesn't implement retry logic:

  • Worker immediately marks jobs as FAILED without checking retry count
  • No exponential backoff between retries
  • BullMQ built-in retry not configured
  • TIMEOUT state in schema is unused

Action Items

  1. Worker-level retries:

    • Check retry_count < max_retries before marking FAILED
    • Increment retry_count and re-enqueue job with backoff
    • Only mark FAILED when max retries exceeded
  2. BullMQ configuration:

    • Configure job attempts in queue options
    • Add backoff strategy (exponential)
    • Enforce a job timeout and record the TIMEOUT state when a job exceeds it
  3. Error classification:

    • Distinguish transient (network) vs permanent (invalid data) errors
    • Only retry transient errors
    • Fail fast for permanent errors
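
A minimal sketch of how items 1 and 3 above could fit together, written as plain functions with no BullMQ or Prisma dependency. The names isTransient, backoffDelay and decideOnFailure, the list of error codes, and the shape of the returned object are all hypothetical; only retry_count and max_retries come from the schema described in this issue.

```javascript
// Hypothetical set of error codes treated as transient (network-level failures).
// Adjust this to the errors the analysis worker actually encounters.
const TRANSIENT_CODES = new Set(['ECONNRESET', 'ETIMEDOUT', 'ECONNREFUSED', 'EAI_AGAIN']);

// Returns true when the failure looks temporary and is worth retrying.
function isTransient(err) {
  return TRANSIENT_CODES.has(err.code);
}

// Exponential backoff: baseMs * 2^retryCount, capped at maxMs.
function backoffDelay(retryCount, baseMs = 2000, maxMs = 60000) {
  return Math.min(baseMs * 2 ** retryCount, maxMs);
}

// Decides what to do with a failed job row, given its retry_count and max_retries.
// The returned `action` is an illustrative label, not a schema state.
function decideOnFailure(job, err) {
  if (!isTransient(err)) {
    // Permanent error (e.g. invalid data): fail fast without retrying.
    return { action: 'fail', reason: 'permanent' };
  }
  if (job.retry_count >= job.max_retries) {
    // Transient error, but the retry budget is spent.
    return { action: 'fail', reason: 'retries_exhausted' };
  }
  return {
    action: 'retry',
    retryCount: job.retry_count + 1,
    delayMs: backoffDelay(job.retry_count),
  };
}
```

The worker's catch block would then mark the job FAILED only when action is 'fail', and otherwise persist the incremented retry_count and re-enqueue with delayMs. If BullMQ's own attempts are used as well, throwing its UnrecoverableError for permanent errors is one way to stop BullMQ from retrying them.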

Example Implementation

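import { Worker } from 'bullmq'; // use require('bullmq') if the backend is CommonJS

const analysisWorker = new Worker(
  'analysis',
  async (job) => { /* ... */ },
  {
    connection: createBullMQConnection(),
    settings: {
      // Only applied to jobs enqueued with backoff: { type: 'custom' }.
      // Doubles the delay on each attempt, capped at 60 seconds.
      backoffStrategy: (attemptsMade) => Math.min(1000 * 2 ** attemptsMade, 60000),
    },
  }
);

// In job options when enqueuing:
await analysisQueue.add('analyze', data, {
  attempts: 3, // total attempts, including the first run
  // Selects the worker's backoffStrategy above. Use
  // { type: 'exponential', delay: 2000 } for the built-in, uncapped strategy instead.
  backoff: { type: 'custom' },
  // Note: `timeout` is a Bull option that BullMQ does not appear to support,
  // so the 5-minute limit (300000 ms) has to be enforced inside the processor.
});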
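
Because the timeout likely has to be enforced by the processor itself, here is one possible sketch of a timeout wrapper that lets the failure handler tell a timeout apart from other errors. JobTimeoutError, withTimeout and statusForError are hypothetical names; TIMEOUT and FAILED are the schema states mentioned in this issue.

```javascript
// Hypothetical error type that marks a failure as a timeout.
class JobTimeoutError extends Error {
  constructor(ms) {
    super(`Job exceeded ${ms} ms`);
    this.name = 'JobTimeoutError';
  }
}

// Rejects with JobTimeoutError if `work` has not settled within `ms` milliseconds.
// The timer is cleared once either side settles so it cannot keep the process alive.
function withTimeout(work, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new JobTimeoutError(ms)), ms);
  });
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}

// Maps an error to the job status that would be stored in the database.
function statusForError(err) {
  return err instanceof JobTimeoutError ? 'TIMEOUT' : 'FAILED';
}
```

Inside the processor this could be used as `await withTimeout(runAnalysis(job), 300000)`, where runAnalysis is a placeholder for the existing job logic. Note that Promise.race does not cancel the underlying work; it only stops waiting for it, so long-running steps would still need their own cancellation (for example an AbortController) to actually stop.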

Files

  • backend/worker/analysis.worker.js
  • backend/controllers/webhook/handleWebhook.js
  • backend/prisma/schema.prisma

Related

Requested by: @yb175
