
Workers V2 #32

@cpkleynhans

The current worker infrastructure has a number of issues.

Every job requires a fresh build of moolloy, which has led to a number of failed jobs whenever the moolloy build fails; a failed build typically results in all of our workers dying.
Building moolloy also requires cloning the entire moolloy repo for every job, which is very costly since the alloy repo is at least 100 MB to clone.
Initial attempts at using a seed repo to reduce the download time have been unsuccessful.

To solve these issues we will split the worker infrastructure into two steps.
The first step will be a build step, in which a build worker clones the repo, checks out the appropriate commit, builds moolloy, and uploads the resulting jar file to S3.
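A minimal Python sketch of the build worker's core (steps 4 to 9 of the workflow), assuming SHA-256 as the jar hash and a builds/<commit>/moolloy.jar key layout. The names s3_key_for, build_commands, run_build, jar_relpath and the upload callable are hypothetical, not part of the existing workers.

```python
import hashlib
import subprocess
from pathlib import Path
from typing import Callable, Optional


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a large jar is never read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def s3_key_for(commit: str) -> str:
    """Hypothetical key layout: one jar per commit, so the key derives from the SHA."""
    return f"builds/{commit}/moolloy.jar"


def build_commands(repo_url: str, commit: str,
                   workdir: Path) -> list[tuple[list[str], Optional[Path]]]:
    """Steps 4-7 as (argv, cwd) pairs; clone runs first, the rest inside the checkout."""
    return [
        (["git", "clone", repo_url, str(workdir)], None),
        (["git", "checkout", commit], workdir),
        (["git", "submodule", "init"], workdir),
        (["git", "submodule", "update"], workdir),
        (["ant", "deps", "configure", "dist"], workdir),
    ]


def run_build(repo_url: str, commit: str, workdir: Path, jar_relpath: str,
              upload: Callable[[Path, str], None]) -> dict:
    """Steps 4-9; jar_relpath and upload are hypothetical stand-ins."""
    for argv, cwd in build_commands(repo_url, commit, workdir):
        # check=True raises CalledProcessError, so a failed build stops here
        subprocess.run(argv, cwd=cwd, check=True)
    jar = workdir / jar_relpath
    key = s3_key_for(commit)
    upload(jar, key)
    # This dict is what step 9 would report to the dashboard
    return {"commit": commit, "s3_key": key, "sha256": sha256_file(jar)}
```

In a real worker, upload could wrap boto3's S3 client, for example lambda path, key: s3.upload_file(str(path), bucket, key). Keeping the commands as data makes them easy to log alongside a failed build directory.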

Each commit will only be built once.
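One way the dashboard could enforce "built once" is to key builds by commit SHA and refuse to queue a second build for a commit it has already seen. The in-memory class below is a hypothetical stand-in for whatever the dashboard actually stores (in a database this would more naturally be a unique constraint on the commit column). The issue does not say how failed builds are retried, so this sketch simply never re-queues a commit.

```python
from typing import Optional


class BuildRegistry:
    """Hypothetical in-memory record of builds, keyed by commit SHA."""

    def __init__(self) -> None:
        self._queued: set[str] = set()
        # commit -> (S3 key, jar hash), filled in when a build worker reports success
        self._artifacts: dict[str, tuple[str, str]] = {}

    def request_build(self, commit: str) -> bool:
        """Return True if a build job should be queued, False if one already exists."""
        if commit in self._queued:
            return False
        self._queued.add(commit)
        return True

    def record_success(self, commit: str, s3_key: str, sha256: str) -> None:
        self._artifacts[commit] = (s3_key, sha256)

    def artifact(self, commit: str) -> Optional[tuple[str, str]]:
        """What a run job needs: the jar's S3 key and hash, or None if not built yet."""
        return self._artifacts.get(commit)
```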

The second step will be the run step, which is exactly the same as the current runner except that it no longer needs to build moolloy. Instead, it downloads the previously uploaded jar file from S3 and runs it.
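Since the run worker no longer builds, it has to trust the jar it downloads; checking it against the hash reported by the build worker (steps 15 and 16) guards against a truncated or wrong download. A minimal sketch, assuming SHA-256 and a hypothetical download callable; fetch_verified_jar and JarHashMismatch are illustrative names.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable


class JarHashMismatch(Exception):
    """The downloaded jar does not match the hash reported by the build worker."""


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_verified_jar(s3_key: str, expected_sha256: str, workdir: Path,
                       download: Callable[[str, Path], object]) -> Path:
    """Download the jar into workdir and refuse to run it if the hash differs."""
    jar = workdir / "moolloy.jar"
    download(s3_key, jar)
    actual = sha256_file(jar)
    if actual != expected_sha256.lower():
        # Leave the file in place so the job directory can be inspected
        raise JarHashMismatch(f"{s3_key}: expected {expected_sha256}, got {actual}")
    return jar


if __name__ == "__main__":
    # Demo with a fake downloader that writes known bytes instead of calling S3
    payload = b"fake jar bytes"
    with tempfile.TemporaryDirectory() as tmp:
        jar = fetch_verified_jar("builds/abc/moolloy.jar",
                                 hashlib.sha256(payload).hexdigest(), Path(tmp),
                                 lambda key, dest: dest.write_bytes(payload))
        print("verified", jar.name)
```

With boto3, download could wrap S3.Client.download_file(Bucket, Key, Filename), passing the destination path as a string.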

The full workflow will be as follows:

  1. Commit hook is triggered from GitHub to the dashboard (or a manual build is scheduled)
  2. Dashboard queues a build job to the build queue
  3. Build worker receives job from the build queue
  4. Build worker clones the moolloy repo to a temporary directory
  5. Build worker checks out the specified commit
  6. Build worker runs git submodule init && git submodule update
  7. Build worker runs ant deps configure dist to build moolloy
  8. Build worker uploads jar file to S3
  9. Build worker reports success to dashboard along with S3 key and hash of jar file
  10. Build worker deletes temporary directory (if everything has completed successfully, otherwise directory will remain for debugging purposes)
  11. Build worker resumes polling build queue
  12. Dashboard queues a run job to the run queue (as a result of build completion if CI is enabled for the model, or as a result of manual user action)
  13. Run worker receives the job from the run queue
  14. Run worker creates temporary directory
  15. Run worker downloads jar file from S3
  16. Run worker verifies file hash
  17. Run worker downloads the model from S3
  18. Run worker extracts the model
  19. Run worker executes moolloy
  20. Run worker compares results to the model results
  21. Run worker tarballs the directory and uploads it to S3
  22. Run worker reports results to dashboard
  23. Run worker deletes temporary directory
  24. Run worker resumes polling the run queue
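Both workers create a temporary directory per job and delete it when done. Step 10 keeps the build directory when something fails, while step 23 does not say whether the run worker does the same, so the sketch below makes that a hypothetical keep_on_failure flag. It also includes a helper for the run worker's tarball in step 21; gzip compression and the names job_workdir and tarball_dir are assumptions.

```python
import contextlib
import shutil
import tarfile
import tempfile
from pathlib import Path
from typing import Iterator


@contextlib.contextmanager
def job_workdir(prefix: str, keep_on_failure: bool = True) -> Iterator[Path]:
    """Create a temp job directory; delete it on success, keep it on failure if asked."""
    path = Path(tempfile.mkdtemp(prefix=prefix))
    try:
        yield path
    except BaseException:
        if not keep_on_failure:
            shutil.rmtree(path, ignore_errors=True)
        raise  # the caller still sees the failure and can report it
    else:
        shutil.rmtree(path, ignore_errors=True)


def tarball_dir(src: Path, dest: Path) -> Path:
    """Archive the whole job directory; dest must live outside src."""
    with tarfile.open(dest, "w:gz") as tar:
        tar.add(src, arcname=src.name)
    return dest
```

The tarball has to be written (and uploaded) inside the with block, before the directory is removed on exit.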
