Workers V2

The current worker infrastructure has a number of issues.

Every job requires a new build of moolloy. This has led to a number of failed jobs where the moolloy build itself failed, which typically results in all of our workers dying.
Building moolloy has also required us to clone the entire moolloy repo for every job, which is very costly since the alloy repo is at least 100 MB to clone.
Initial attempts at using a seed repo to reduce the download time have been unsuccessful.

To solve these issues, we will split the worker infrastructure into two steps.
The first step will be a build step, where a build worker will clone the repo, check out the appropriate commit, build moolloy, and then upload the resulting jar file to S3.

Each commit will only be built once.
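One way to enforce this is to derive the S3 key from the commit SHA and skip the build when that key already exists. The sketch below is only an illustration: the key layout `builds/<sha>/moolloy.jar` and the names `jar_key` and `needs_build` are assumptions, and the set of existing keys stands in for whatever S3 listing or dashboard lookup is actually used.

```python
from typing import Set


def jar_key(commit_sha: str) -> str:
    """Return the S3 key for a commit's jar (hypothetical key layout)."""
    return f"builds/{commit_sha}/moolloy.jar"


def needs_build(commit_sha: str, existing_keys: Set[str]) -> bool:
    """A commit needs a build only if its jar has not been uploaded yet."""
    return jar_key(commit_sha) not in existing_keys


# Usage: the second request for the same commit is skipped.
uploaded = set()
if needs_build("abc123", uploaded):
    uploaded.add(jar_key("abc123"))  # stands in for the real build + upload
print(needs_build("abc123", uploaded))
```

Because the key is a pure function of the commit, the dashboard and the run workers can compute it without any extra bookkeeping.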

The second step will be the run step. It is exactly the same as the current runner, except that it no longer needs to build moolloy; instead it downloads the previously uploaded jar file from S3 and runs it.

The full workflow will be as follows:
1. Commit hook is triggered from GitHub to the dashboard (or a manual build is scheduled)
2. Dashboard queues a build job to the build queue
3. Build worker receives job from the build queue
4. Build worker clones the moolloy repo to a temporary directory
5. Build worker checks out the specified commit
6. Build worker runs `git submodule init && git submodule update`
7. Build worker runs `ant deps configure dist` to build moolloy
8. Build worker uploads jar file to S3
9. Build worker reports success to dashboard along with S3 key and hash of jar file
10. Build worker deletes temporary directory (only if everything has completed successfully; otherwise the directory will remain for debugging purposes)
11. Build worker resumes polling build queue
12. Dashboard queues run job to the run queue (as result of build completion if CI is enabled for the model, or as a result of manual user action)
13. Run worker receives the job from the run queue
14. Run worker creates temporary directory
15. Run worker downloads jar file from S3
16. Run worker verifies file hash
17. Run worker downloads the model from S3
18. Run worker extracts the model
19. Run worker executes moolloy
20. Run worker compares results to the model results
21. Run worker tarballs the directory and uploads it to S3
22. Run worker reports results to dashboard
23. Run worker deletes temporary directory
24. Run worker resumes polling run queue
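The build half of the workflow (steps 4 to 7) could be sketched as follows. This is an illustration under assumptions, not the actual worker: `build_steps` and `run_build` are hypothetical names, the repository URL is supplied by the caller, and queue polling, the S3 upload, and dashboard reporting are omitted.

```python
import subprocess
import tempfile
from typing import List


def build_steps(repo_url: str, commit: str) -> List[List[str]]:
    """Commands for steps 4-7, all run inside the temporary directory."""
    return [
        ["git", "clone", repo_url, "."],       # step 4: clone into the temp dir
        ["git", "checkout", commit],           # step 5: check out the commit
        ["git", "submodule", "init"],          # step 6
        ["git", "submodule", "update"],        # step 6
        ["ant", "deps", "configure", "dist"],  # step 7: build moolloy
    ]


def run_build(repo_url: str, commit: str) -> str:
    """Run the build and return the temporary directory holding the result.

    check=True raises CalledProcessError on the first failing command, so a
    broken build leaves the directory in place for debugging (step 10).
    """
    workdir = tempfile.mkdtemp(prefix="moolloy-build-")
    for argv in build_steps(repo_url, commit):
        subprocess.run(argv, cwd=workdir, check=True)
    return workdir
```

The caller would upload the jar from the returned directory and remove it with `shutil.rmtree` only after the upload and the report to the dashboard have succeeded. A failed build then surfaces as an exception in a single build worker instead of taking down the run workers.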

