Currently, a single failed request to the dashboard causes the whole job to fail.
This is not ideal. We should retry a few times with exponential back-off before giving up.
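
A minimal sketch of what such a retry wrapper could look like. The name `send_with_retries`, the default of 4 attempts, the 1-second base delay, and the choice of which exceptions count as transient are all assumptions for illustration, not existing code:

```python
import time


def send_with_retries(send, max_attempts=4, base_delay=1.0,
                      retry_on=(ConnectionError, TimeoutError),
                      sleep=time.sleep):
    """Call send() until it succeeds or max_attempts is exhausted.

    Hypothetical helper: `send` is any zero-argument callable that
    performs the dashboard request and raises on failure.
    """
    for attempt in range(max_attempts):
        try:
            return send()
        except retry_on:
            # Out of attempts: let the last error propagate so the job fails.
            if attempt == max_attempts - 1:
                raise
            # Exponential back-off: base_delay, 2*base_delay, 4*base_delay, ...
            sleep(base_delay * 2 ** attempt)
```

With these defaults a request is attempted up to four times, waiting 1, 2 and 4 seconds between attempts, and the job only fails if all four attempts fail.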
Completion requests can be handled even better: the manager should keep a list of the completion requests that still need to be sent and retry them over longer intervals. This ensures that completed work is not lost.
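
One possible shape for that list, as a rough sketch only. The class name `PendingCompletions`, the 60-second base delay, and the one-hour cap are assumptions; the list here lives in memory, so it would not survive a manager restart unless it were also persisted:

```python
import time
from collections import deque


class PendingCompletions:
    """Hypothetical in-memory queue of completion requests awaiting delivery."""

    def __init__(self, send, base_delay=60.0, max_delay=3600.0,
                 clock=time.monotonic):
        self._send = send            # callable(payload); raises on failure
        self._base_delay = base_delay
        self._max_delay = max_delay
        self._clock = clock
        self._pending = deque()      # entries: [payload, failures, next_try]

    def add(self, payload):
        # New requests are due immediately.
        self._pending.append([payload, 0, self._clock()])

    def flush(self):
        """Send every request that is due; return how many are still pending."""
        now = self._clock()
        still_pending = deque()
        for payload, failures, next_try in self._pending:
            if next_try > now:
                # Not due yet: keep it untouched.
                still_pending.append([payload, failures, next_try])
                continue
            try:
                self._send(payload)
            except (ConnectionError, TimeoutError):
                # Keep the request and push its next attempt further out,
                # doubling the wait each time up to max_delay.
                failures += 1
                delay = min(self._base_delay * 2 ** (failures - 1),
                            self._max_delay)
                still_pending.append([payload, failures, now + delay])
        self._pending = still_pending
        return len(self._pending)
```

The manager would call `add()` when a job completes and `flush()` periodically from its main loop, so a dashboard outage delays the completion report instead of discarding it.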