
@claudiazi commented Aug 8, 2025

Summary

Improve error logging for failed deferred AWS Batch jobs by displaying the actual CloudWatch logs instead of the generic "Trigger failure" message.

Problem

When deferred AWS Batch jobs fail, users only see a generic error message such as "Trigger failure", without the CloudWatch logs that show the real failure reason. This makes failed batch jobs hard to debug, because users cannot see the container logs that contain the actual error details.

Solution

Enhanced the AWSBatchOperator to:

  1. Retrieve job_id from XCom: When a deferred task resumes after trigger failure, retrieve the job_id from the existing batch_job_details XCom that's automatically created by the BatchOperator.

  2. Fetch CloudWatch logs: When TaskDeferralError occurs (trigger failure), fetch the actual CloudWatch logs using the retrieved job_id.

  3. Enhanced error messages: Include the CloudWatch logs and a direct link in the error message, showing users the real failure reason instead of just "Trigger failure".
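The three steps above can be sketched in plain Python. This is a hedged illustration, not the merged code. It assumes the `batch_job_details` XCom value is a dict carrying a `job_id` key. Inside the operator, that value would presumably be read with something like `ti.xcom_pull(task_ids=self.task_id, key="batch_job_details")`. `TriggerFailureError`, `job_id_from_batch_job_details`, `handle_trigger_failure` and the `fetch_logs` callable are hypothetical names standing in for `AirflowException` and the real helpers.

```python
from __future__ import annotations

from collections.abc import Callable, Mapping
from typing import Any, Optional


class TriggerFailureError(Exception):
    """Hypothetical stand-in for the AirflowException the operator raises."""


def job_id_from_batch_job_details(xcom_value: Optional[Mapping[str, Any]]) -> Optional[str]:
    # Step 1 (assumption): the XCom is a dict such as {"job_id": "...", "region_name": "..."}.
    if not isinstance(xcom_value, Mapping):
        return None
    return xcom_value.get("job_id") or None


def handle_trigger_failure(
    xcom_value: Optional[Mapping[str, Any]],
    error: str,
    fetch_logs: Callable[[str], str],
) -> None:
    job_id = job_id_from_batch_job_details(xcom_value)
    if job_id is None:
        # No job id recorded: nothing to enrich, so keep the generic message.
        raise TriggerFailureError(error)
    # Step 2: fetch the container logs for this job (hypothetical callable).
    logs = fetch_logs(job_id)
    # Step 3: surface the job id and logs instead of the bare trigger error.
    raise TriggerFailureError(f"Batch job {job_id} failed: {error}\n{logs}")
```

Passing the log fetcher in as a callable keeps the failure path testable without AWS credentials. When no job id can be found, the original trigger error is re-raised unchanged.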

Implementation Details

  • resume_execution(): Handles trigger failures by retrieving job_id from batch_job_details XCom and fetching CloudWatch logs
  • execute_complete(): Ensures logs are fetched for successful deferred tasks before status checking
  • _fetch_and_log_cloudwatch(): Helper method that fetches CloudWatch logs and returns them for error messages
  • _format_extra_info(): Formats enhanced error messages with logs and CloudWatch links
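The log-fetching and formatting helpers could look roughly like the sketch below. It assumes boto3-style clients are passed in: Batch `describe_jobs` supplies the log stream name, and CloudWatch Logs `get_log_events` supplies the events. It also assumes the default `/aws/batch/job` log group and the legacy `logEventViewer` console URL format shown in the example output. All function names here are hypothetical, and the stream name is inserted without URL-escaping.

```python
from __future__ import annotations

from typing import Any, List, Optional

# Assumed default log group for AWS Batch jobs, matching the link in the example output.
DEFAULT_LOG_GROUP = "/aws/batch/job"


def log_stream_for_job(batch_client: Any, job_id: str) -> Optional[str]:
    """Look up a job's log stream with a boto3-style Batch ``describe_jobs`` call."""
    jobs = batch_client.describe_jobs(jobs=[job_id]).get("jobs", [])
    if not jobs:
        return None
    return jobs[0].get("container", {}).get("logStreamName")


def fetch_last_log_lines(
    logs_client: Any, log_stream: str, log_group: str = DEFAULT_LOG_GROUP, limit: int = 20
) -> List[str]:
    """Return messages from the newest page of events (startFromHead=False)."""
    response = logs_client.get_log_events(
        logGroupName=log_group,
        logStreamName=log_stream,
        limit=limit,
        startFromHead=False,
    )
    return [event["message"] for event in response.get("events", [])]


def cloudwatch_console_url(region: str, log_stream: str, log_group: str = DEFAULT_LOG_GROUP) -> str:
    """Build a link in the legacy logEventViewer form shown in the example output."""
    return (
        f"https://{region}.console.aws.amazon.com/cloudwatch/home"
        f"?region={region}#logEventViewer:group={log_group};stream={log_stream}"
    )


def format_extra_info(region: str, log_stream: str, lines: List[str]) -> str:
    """Combine the console link and the last log lines into one message block."""
    parts = [f"CloudWatch Logs: {cloudwatch_console_url(region, log_stream)}"]
    if lines:
        parts.append("Last log lines:")
        parts.extend(lines)
    return "\n".join(parts)
```

Keeping the link and the log tail together means the error still points somewhere useful even when the log fetch returns no events.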

Before vs After

Before:

  airflow.exceptions.TaskDeferralError: Trigger failure

After:

  airflow.exceptions.AirflowException: Batch job a5a7573d-b709-4fb0-a6ee-36215f601267 failed: Trigger failure
  CloudWatch Logs: https://eu-west-1.console.aws.amazon.com/cloudwatch/home?region=eu-west-1#logEventViewer:group=/aws/batch/job;stream=...
  Last log lines:
  [2025-08-08 14:54:08,994] soda_scanner - ERROR - soda scan failed. Exit status code: 2

Test Plan

@claudiazi requested a review from a team as a code owner August 8, 2025 15:10
@claudiazi merged commit 881c1c0 into master Aug 8, 2025
1 check passed