Skip to content

chore: tolerate transient empty status in mq test poll loops instead of failing#2591

Closed
kristina-solovyova wants to merge 1 commit into
06-11-chore_add_slack_notification_with_execution_linkfrom
06-11-chore_tolerate_transient_empty_status_in_mq_test_poll_loops_instead_of_failing
Closed

chore: tolerate transient empty status in mq test poll loops instead of failing#2591
kristina-solovyova wants to merge 1 commit into
06-11-chore_add_slack_notification_with_execution_linkfrom
06-11-chore_tolerate_transient_empty_status_in_mq_test_poll_loops_instead_of_failing

Conversation

@kristina-solovyova

Copy link
Copy Markdown
Collaborator

No description provided.

Copy link
Copy Markdown
Collaborator Author

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more


How to use the Graphite Merge Queue

Add the label main-merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has required the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@github-actions

github-actions Bot commented Jun 11, 2026

Copy link
Copy Markdown

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

@claude

claude Bot commented Jun 11, 2026

Copy link
Copy Markdown

Claude finished @kristina-solovyova's task in 47s —— View job


Code Review

  • Read PR diff against base branch
  • Analyze poll-loop logic for correctness
  • Post review

Reviewed the change to the two merge-queue poll loops (upgrade_extended_test and clients_only_test). The change is well-scoped, clearly commented, and the logic is correct. 👍

What it does well:

  • jq -r '.status // empty' correctly avoids the "null" string trap — a missing/null .status now yields "" rather than the literal null, so the -z check works as intended.
  • The consecutive_empty counter resets only on running/pending (the else branch), so a single blip mid-run won't accumulate toward the failure threshold. Correct.
  • Layered with the existing curl --retry 3 --retry-connrefused -sf, transient HTTP errors are already retried within a poll; this adds a second layer of tolerance across polls. Good defense in depth.
  • Real failure states (anything that isn't succeeded/running/pending/empty) still fail fast via the elif, and succeeded still breaks. Behavior for the non-empty cases is unchanged.

Minor observations (non-blocking):

  1. // empty conflates two distinct cases. It collapses (a) the request failing entirely and (b) a valid 200 body that simply lacks .status. If the API ever returns a well-formed body without a status for a genuinely bad execution, this loop would now retry it 5× (~5 min) before giving up rather than failing fast. Almost certainly fine given the service contract, just worth being aware of.

  2. Code duplication. The ~30-line poll block is now identical in both jobs (L262–294 and L399–431). If this pattern grows or needs another tweak, it'd be worth extracting into a small composite action or a shared script (scripts/poll-execution.sh) to keep the two copies from drifting. Not necessary for this PR.

  3. Empty polls consume the MAX_POLLS budget. An empty poll still counts toward the 60-poll timeout window. That's a reasonable trade-off (avoids unbounded loops), just noting that a flaky service eats into the effective test-wait budget.

Note: This file lives under .github/workflows/, which I can't modify due to GitHub App permissions — but this is a review-only request, so no changes are needed from me. The change looks good to merge as-is.
· 06-11-chore_tolerate_transient_empty_status_in_mq_test_poll_loops_instead_of_failing

@graphite-app

graphite-app Bot commented Jun 11, 2026

Copy link
Copy Markdown

Graphite Automations

"Add anton/matt/sergey/kristina as reviwers on operator PRs" took an action on this PR • (06/11/26)

2 reviewers were added to this PR based on Anton Bykov's automation.

@kristina-solovyova kristina-solovyova force-pushed the 06-11-chore_tolerate_transient_empty_status_in_mq_test_poll_loops_instead_of_failing branch from 74ef9a1 to 04171e7 Compare June 11, 2026 10:06
@claude

claude Bot commented Jun 11, 2026

Copy link
Copy Markdown

Claude finished @kristina-solovyova's task in 29s —— View job


Code Review — no changes since last review

This synchronize event didn't alter the PR's contents. The diff against the base branch (06-11-chore_add_slack_notification_with_execution_link) is still the single commit 04171e7e, touching only .github/workflows/dagger_ci_on_merge_queue_plan.yaml (+58/-8) — identical to what I reviewed earlier in this PR.

That review still stands: the poll-loop change is correct and well-scoped (✅ already approved by @assafgi). Nothing new to add — exiting early to avoid a duplicate review.
· 06-11-chore_tolerate_transient_empty_status_in_mq_test_poll_loops_instead_of_failing

@graphite-app

graphite-app Bot commented Jun 11, 2026

Copy link
Copy Markdown

Merge activity

  • Jun 11, 10:08 AM UTC: kristina-solovyova added this pull request to the Graphite merge queue.
  • Jun 11, 10:09 AM UTC: CI is running for this pull request on a draft pull request (#2592) due to your merge queue CI optimization settings.
  • Jun 11, 11:01 AM UTC: Merged by the Graphite merge queue via draft PR: #2592.

@graphite-app graphite-app Bot closed this Jun 11, 2026
@graphite-app graphite-app Bot deleted the 06-11-chore_tolerate_transient_empty_status_in_mq_test_poll_loops_instead_of_failing branch June 11, 2026 11:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants