Skip to content

Checking correct error field for retry#335

Merged
knolleary merged 8 commits intomainfrom
retry-fixed-again
Mar 26, 2026
Merged

Checking correct error field for retry#335
knolleary merged 8 commits intomainfrom
retry-fixed-again

Conversation

@hardillb
Copy link
Copy Markdown
Contributor

@hardillb hardillb commented Mar 25, 2026

Description

Ensure err.code is checked not err.response.statusCode

It also ensures that the Persistent Storage PVC is only created if it doesn't exist (e.g. don't try and recreate for a suspended instance).

If the PVC is created then it will also add retry logic to the AWS EFS calls if using EFS AccessPoints to store multiple instance storage directories on single EFS (Only really used for FFC)

Related Issue(s)

Checklist

  • I have read the contribution guidelines
  • Suitable unit/system level tests have been added and they pass
  • Documentation has been updated
    • Upgrade instructions
    • Configuration details
    • Concepts
  • Changes flowforge.yml?
    • Issue/PR raised on FlowFuse/helm to update ConfigMap Template
    • Issue/PR raised on FlowFuse/CloudProject to update values for Staging/Production
  • Link to Changelog Entry PR, or note why one is not needed.

Labels

  • Includes a DB migration? -> add the area:migration label

@hardillb hardillb requested a review from knolleary March 25, 2026 10:14
@hardillb hardillb self-assigned this Mar 25, 2026
Also Fix AWS EFS AP counter to use 10,000 limit
Also add retry logic to the aws-efc AWS API calls.
Copy link
Copy Markdown
Member

@knolleary knolleary left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm nervous about the aws-efs logic that now has to query up to 9999 (rather than up to 999) endpoints just to find the one with the most space. Can we use a different heuristic that finds one with 'enough' space so we don't have to poll the whole list every time an instance is created? Or do some caching to remember the top ten with most space and only rescan if necessary?

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the Kubernetes driver and AWS EFS integration to improve retry behavior and avoid unnecessary PVC recreation, aligning error handling with err.code rather than err.response.statusCode.

Changes:

  • Add async-retry dependency and use it to retry AWS EFS Describe* calls on throttling.
  • Update PVC creation flow to only create a PVC when it does not already exist.
  • Adjust Kubernetes API retry logic to check err.code for 429 responses.

Reviewed changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 3 comments.

File Description
package.json Adds async-retry dependency to support new retry behavior.
package-lock.json Locks async-retry and its transitive dependency retry.
lib/aws-efs.js Wraps EFS DescribeFileSystems / DescribeAccessPoints in retry logic for throttling cases.
kubernetes.js Skips PVC creation if it already exists; changes retry condition to use err.code.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@knolleary knolleary self-requested a review March 25, 2026 14:01
@hardillb
Copy link
Copy Markdown
Contributor Author

Tested locally and with now wrapping readNamespacedPersistentVolumeClaim we see the error get logged

{"level":"ERROR","time":"2026-03-25T14:27:05.375Z","msg":"[k8s] API call to readNamespacedPersistentVolumeClaim failed. attempt=1/11 statusCode=404  Error: HTTP-Code: 404\nMessage: Unknown API Status Code!\nBody: \"{\\\"kind\\\":\\\"Status\\\",\\\"apiVersion\\\":\\\"v1\\\",\\\"metadata\\\":{},\\\"status\\\":\\\"Failure\\\",\\\"message\\\":\\\"persistentvolumeclaims \\\\\\\"bbee7aba-fd25-471e-bdd1-e73780cc69f8-pvc\\\\\\\" not found\\\",\\\"reason\\\":\\\"NotFound\\\",\\\"details\\\":{\\\"name\\\":\\\"bbee7aba-fd25-471e-bdd1-e73780cc69f8-pvc\\\",\\\"kind\\\":\\\"persistentvolumeclaims\\\"},\\\"code\\\":404}\\n\"\nHeaders: {\"audit-id\":\"c17b9ae3-19a9-4564-9a86-0219ea60816c\",\"cache-control\":\"no-cache, private\",\"connection\":\"close\",\"content-length\":\"284\",\"content-type\":\"application/json\",\"date\":\"Wed, 25 Mar 2026 14:27:05 GMT\",\"x-kubernetes-pf-flowschema-uid\":\"3813c7af-8235-4207-aaf2-88e56242b97a\",\"x-kubernetes-pf-prioritylevel-uid\":\"f12f08f7-9332-46c3-bdc4-ef5fa4ce151b\"}"}

And it correctly created the PVC

@knolleary knolleary merged commit f09cd84 into main Mar 26, 2026
5 checks passed
@knolleary knolleary deleted the retry-fixed-again branch March 26, 2026 16:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants