Conversation
…/prefapp/daggerverse into fix/413-fix-error-on-pr-creation
Pull request overview
This PR addresses intermittent errors when creating GitHub PRs via the gh Dagger module by adding retry/delay behavior, and updates the consuming module to a newer pinned version of that dependency.
Changes:
- Update `update-claims-features` to pin a newer `gh` daggerverse dependency revision.
- Add retry loops around `gh pr create` and the PR-number lookup in `Gh.CreatePR`.
- Add a fixed delay before calling `CreatePR` in `CommitAndCreatePR`.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| update-claims-features/dagger.json | Bumps the pinned gh module revision to pick up the fix. |
| gh/main.go | Adds retry/sleep logic intended to eliminate false-negative PR creation errors. |
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.
```go
for i := range MaxRetries {
	_, err = ctr.Sync(ctx)

	if err == nil {
		break
	}

	if i == MaxRetries-1 {
		return "", errors.New(
			extractErrorMessage(err),
		)
	}

	fmt.Printf("Error creating PR, retrying... (%d/%d)\n", i+1, MaxRetries-1)

	timer := time.NewTimer(WaitTimeBetweenRetries)
	select {
	case <-timer.C:
	case <-ctx.Done():
		timer.Stop()
		return "", ctx.Err()
	}
```
The retry logic blindly retries all errors without distinguishing between transient failures (which should be retried) and permanent failures (which should fail immediately). For example, authentication errors, invalid repository errors, or permission errors should not be retried. Consider checking the error type or message to determine if the error is retryable before waiting and retrying.
```diff
 	fmt.Println("\nCreating PR...")
+	for i := range MaxRetries {
+		_, err = ctr.Sync(ctx)
+
+		if err == nil {
+			break
+		}
+
+		if i == MaxRetries-1 {
+			return "", errors.New(
+				extractErrorMessage(err),
+			)
+		}
+
+		fmt.Printf("Error creating PR, retrying... (%d/%d)\n", i+1, MaxRetries-1)
+
+		timer := time.NewTimer(WaitTimeBetweenRetries)
+		select {
+		case <-timer.C:
+		case <-ctx.Done():
+			timer.Stop()
+			return "", ctx.Err()
+		}
+	}
+
-	prId, err := ctr.
-		WithExec([]string{
-			"gh", "pr", "list",
-			"--head", branch,
-			"--json", "number",
-			"--jq", ".[0].number",
-		}).
-		Stdout(ctx)
+	var prId string
+	fmt.Println("Getting PR ID...")
+	for i := range MaxRetries {
+		prId, err = ctr.
+			WithExec([]string{
+				"gh", "pr", "list",
+				"--head", branch,
+				"--json", "number",
+				"--jq", ".[0].number",
+			}).
+			Stdout(ctx)
+
-	if err != nil {
-		return "", errors.New(
-			extractErrorMessage(err),
-		)
-	}
+		// Check if the prId is a valid integer (which means we got the PR ID successfully). If not, we retry.
+		_, convErr := strconv.Atoi(strings.TrimSpace(prId))
+
+		if err == nil && convErr == nil {
+			break
+		}
+
+		if i == MaxRetries-1 {
+			return "", errors.New(
+				extractErrorMessage(err),
+			)
+		}
+
+		fmt.Printf("Error getting PR ID, retrying... (%d/%d)\n", i+1, MaxRetries-1)
```
Using fmt.Println and fmt.Printf for logging in a library module is not ideal as it bypasses the caller's control over logging output and format. These print statements will always output to stdout, which may not be desired in all contexts. Consider either removing these debug statements for production code, or if logging is necessary, use a structured logging approach that can be configured by the caller (e.g., passing a logger interface).
```go
timer := time.NewTimer(WaitTimeBetweenRetries)
select {
case <-timer.C:
case <-ctx.Done():
	timer.Stop()
	return "", ctx.Err()
}
```
There was a problem hiding this comment.
The retry logic uses a fixed 2-second delay between retries. For handling timing issues with the GitHub API, an exponential backoff strategy would be more appropriate and efficient. This would reduce unnecessary load on the GitHub API and likely succeed faster in cases where GitHub needs more time to process the request. Consider implementing exponential backoff (e.g., 2s, 4s, 8s, 16s, 32s) instead of a fixed delay.
```diff
+	for i := range MaxRetries {
+		_, err = ctr.Sync(ctx)
+
+		if err == nil {
+			break
+		}
+
+		if i == MaxRetries-1 {
+			return "", errors.New(
+				extractErrorMessage(err),
+			)
+		}
+
+		fmt.Printf("Error creating PR, retrying... (%d/%d)\n", i+1, MaxRetries-1)
+
+		timer := time.NewTimer(WaitTimeBetweenRetries)
+		select {
+		case <-timer.C:
+		case <-ctx.Done():
+			timer.Stop()
+			return "", ctx.Err()
+		}
+	}
+
-	prId, err := ctr.
-		WithExec([]string{
-			"gh", "pr", "list",
-			"--head", branch,
-			"--json", "number",
-			"--jq", ".[0].number",
-		}).
-		Stdout(ctx)
+	var prId string
+	fmt.Println("Getting PR ID...")
+	for i := range MaxRetries {
+		prId, err = ctr.
+			WithExec([]string{
+				"gh", "pr", "list",
+				"--head", branch,
+				"--json", "number",
+				"--jq", ".[0].number",
+			}).
+			Stdout(ctx)
+
-	if err != nil {
-		return "", errors.New(
-			extractErrorMessage(err),
-		)
-	}
+		// Check if the prId is a valid integer (which means we got the PR ID successfully). If not, we retry.
+		_, convErr := strconv.Atoi(strings.TrimSpace(prId))
+
+		if err == nil && convErr == nil {
+			break
+		}
+
+		if i == MaxRetries-1 {
+			return "", errors.New(
+				extractErrorMessage(err),
+			)
+		}
+
+		fmt.Printf("Error getting PR ID, retrying... (%d/%d)\n", i+1, MaxRetries-1)
+
+		timer := time.NewTimer(WaitTimeBetweenRetries)
+		select {
+		case <-timer.C:
+		case <-ctx.Done():
+			timer.Stop()
+			return "", ctx.Err()
+		}
+	}
```
The retry logic is duplicated between the PR creation and PR ID retrieval sections. This code duplication makes maintenance harder and increases the risk of inconsistencies. Consider extracting the retry logic into a reusable helper function that accepts a function to execute and returns its result, similar to: retryWithBackoff(ctx context.Context, operation func() error, maxRetries int, waitTime time.Duration) error. This would also make it easier to improve the retry strategy (e.g., adding exponential backoff) in one place.
```go
timer := time.NewTimer(WaitTimeBetweenRetries)
select {
case <-timer.C:
case <-ctx.Done():
	timer.Stop()
	return "", ctx.Err()
}
```
The timer should be stopped after the select statement completes to free resources, regardless of which case is triggered. Currently, the timer is only stopped when the context is cancelled. If the timer fires normally (case <-timer.C), the timer resources are not explicitly cleaned up. While Go's garbage collector will eventually handle this, it's best practice to explicitly stop timers that are no longer needed. Add timer.Stop() after the select statement or use defer timer.Stop() right after creating the timer.
```go
timer := time.NewTimer(WaitTimeBetweenRetries)
select {
case <-timer.C:
case <-ctx.Done():
	timer.Stop()
	return "", ctx.Err()
}
```
The timer should be stopped after the select statement completes to free resources, regardless of which case is triggered. Currently, the timer is only stopped when the context is cancelled. If the timer fires normally (case <-timer.C), the timer resources are not explicitly cleaned up. While Go's garbage collector will eventually handle this, it's best practice to explicitly stop timers that are no longer needed. Add timer.Stop() after the select statement or use defer timer.Stop() right after creating the timer.
```go
		break
	}

	if i == MaxRetries-1 {
```
When extractErrorMessage is called at line 401, if the error that caused the retry loop to fail was actually convErr (the strconv.Atoi error) and not err, then the wrong error message will be extracted. The code should capture and handle both error conditions. If convErr is not nil but err is nil, the error message should reflect the conversion failure rather than calling extractErrorMessage on a potentially nil err.
Suggested change:

```diff
 	if i == MaxRetries-1 {
+		if err == nil && convErr != nil {
+			return "", convErr
+		}
```
Closes #413