Flare 1823 : legacy graceful shutdown handling #519

Open
swadkumar wants to merge 7 commits into master from flare-1823-2

@swadkumar
Contributor

Clever Coding Standards Agreement

JIRA

Link to JIRA

Overview

The flare was caused by a regression in the ALB/Fargate integration that produced a flurry of 502s across multiple Clever services.
This was narrowed down to a change in Fargate behavior after Aug 27.

This PR adds a timeout to WAG's shutdown handling so that we can flip to the legacy spot termination behavior if required.

Testing

A load test is in progress to confirm that the increased timeout helps.

Rollout

This will ship as a wag pre-release.

Rollback

(specific steps? risks?)

@swadkumar swadkumar requested a review from a team as a code owner September 4, 2025 20:55
@swadkumar swadkumar requested review from andruwm and jakegut and removed request for a team September 4, 2025 20:55
Contributor

@andruwm andruwm left a comment

Left a comment explaining

@swadkumar swadkumar requested a review from andruwm September 4, 2025 21:41
2 participants