
Microservices Single-Job Execution RFC

Igor Bruev edited this page Jun 27, 2024 · 2 revisions

Problem Statement:

Microservices must execute only one job at a time.

Microservices must not contain job-management logic or job persistence (especially in-memory persistence).

Reasons:

  1. It keeps microservices free of business logic and persistence logic.
  2. It can be hard to tell which job is running in a particular microservice. A single job may run multiple processes and multiple binaries (for example, REINVENT4 runs several AutoDock Vina instances), and it is hard to determine programmatically whether a given process belongs to job1 or job2.
  3. Microservices that run multiple jobs need some level of synchronization between FastAPI workers, which brings back the problem described in reason 1.
  4. Microservices that run multiple jobs need persistent storage, which must be deployed outside the container to follow twelve-factor app practices.
  5. Common cloud development practice assumes a one app, one process, one container relation. Any auxiliary services or processes must be deployed to a separate container or sidecar.
  6. Integrating a parallel computing framework such as Parsl would be harder to design and implement, because it would add another level of container-job management and job IDs could no longer be mapped directly to container IDs.
  7. Containers that run multiple jobs are harder to mock for testing purposes.

Suggested Container API:

The container process keeps the current JobId (a GUID) in memory; it stays assigned until it is cleared with POST /api/v1/main/state/clear. Only one JobId and state can be assigned to a container at a time. A freshly started container has no JobId and no state. When the container's FastAPI process shuts down, the JobId and state are lost. The container still needs a JobId so that the caller can confirm it is running, or reading the result state of, the same job it originally started.
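A minimal sketch of the container-side state described above, assuming a single FastAPI worker process per container. `SingleJobState`, `StateSnapshot` and `JobConflictError` are hypothetical names, and refusing to clear a running job is an assumption the RFC does not state. The methods are tied to the endpoints only through comments; no web framework is used here.

```python
import threading
import uuid
from dataclasses import dataclass
from typing import Literal, Optional

Status = Literal["running", "idle"]


class JobConflictError(RuntimeError):
    """Raised when a call would break the one-job-per-container rule."""


@dataclass(frozen=True)
class StateSnapshot:
    job_id: Optional[uuid.UUID]
    status: Status


class SingleJobState:
    """Hypothetical holder for the container's single JobId and status.

    Everything lives in process memory only: nothing is written to disk,
    so the JobId and status are lost when the process exits.
    """

    def __init__(self) -> None:
        # Guards concurrent requests handled by threads of ONE process;
        # it does not synchronize separate FastAPI worker processes.
        self._lock = threading.Lock()
        self._job_id: Optional[uuid.UUID] = None  # fresh container: no JobId
        self._status: Status = "idle"

    def snapshot(self) -> StateSnapshot:
        """What GET /api/v1/main/state would report."""
        with self._lock:
            return StateSnapshot(self._job_id, self._status)

    def start(self, job_id: uuid.UUID) -> None:
        """Backs POST /api/v1/main/start/{jobId}."""
        with self._lock:
            # The previous JobId must be cleared before a new job can start.
            if self._job_id is not None:
                raise JobConflictError(f"job {self._job_id} is still assigned")
            self._job_id = job_id
            self._status = "running"

    def mark_idle(self) -> None:
        """Job finished or was stopped; the JobId is kept until cleared."""
        with self._lock:
            self._status = "idle"

    def clear(self) -> None:
        """Backs POST /api/v1/main/state/clear."""
        with self._lock:
            # Assumption: clearing a running job is refused; stop it first.
            if self._status == "running":
                raise JobConflictError("cannot clear while a job is running")
            self._job_id = None
```

Because each worker process would hold its own copy of this object, the sketch only holds up when the service runs a single worker process, which matches the one app, one process, one container relation in reason 5.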

POST /api/v1/main/state/clear

Clears the container state.

GET /api/v1/main/state

Response body:

```json
{
  "jobId": "Optional[UUID]",
  "status": {
    "oneOf": ["running", "idle"]
  }
}
```

POST /api/v1/main/start/{jobId}

Payload:

Any

Response body:

Any
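On the orchestrator side, a start request can be wrapped in a check of GET /api/v1/main/state before and after the call. The sketch assumes a hypothetical `Transport` callable that performs one HTTP request and returns the decoded JSON body, so it stays independent of any particular HTTP client. `start_job` and `ContainerBusyError` are likewise illustrative names, and generating the JobId with `uuid.uuid4()` on the caller side is an assumption based on the JobId appearing in the start path.

```python
import uuid
from typing import Any, Callable, Optional

# Hypothetical transport: (method, path, JSON body or None) -> decoded JSON reply.
# In production it could wrap any HTTP client; in tests it can be a fake.
Transport = Callable[[str, str, Optional[Any]], Any]


class ContainerBusyError(RuntimeError):
    """The container still holds a JobId that has not been cleared."""


def start_job(call: Transport, payload: Any) -> uuid.UUID:
    """Start exactly one job on a container; return the JobId chosen by the caller."""
    state = call("GET", "/api/v1/main/state", None)
    # Only a container without a JobId (fresh or cleared) may accept a job.
    if state.get("jobId") is not None:
        raise ContainerBusyError(f"container still holds job {state['jobId']}")
    job_id = uuid.uuid4()  # the caller, not the container, picks the JobId
    call("POST", f"/api/v1/main/start/{job_id}", payload)
    # Read the state back to confirm the container recorded this JobId.
    state = call("GET", "/api/v1/main/state", None)
    if state.get("jobId") != str(job_id):
        raise RuntimeError(
            f"expected job {job_id}, container reports {state.get('jobId')}"
        )
    return job_id
```

Keeping the transport as a plain callable also makes the single-job contract easy to fake in tests, in line with reason 7.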

POST /api/v1/main/stop

Any other methods
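Putting the endpoints together, an orchestrator could poll the state until the job is idle, checking on every poll that the container still reports the JobId it started, and then clear the container for the next job. This is a sketch under assumptions: `wait_and_release`, `JobMismatchError` and the `Transport` callable are hypothetical, results are assumed to be retrieved through the service-specific methods (not shown), and calling POST /api/v1/main/stop on timeout is an illustrative policy, not part of the RFC.

```python
import time
import uuid
from typing import Any, Callable, Optional

# Hypothetical transport: (method, path, JSON body or None) -> decoded JSON reply.
Transport = Callable[[str, str, Optional[Any]], Any]


class JobMismatchError(RuntimeError):
    """The container reports a different (or no) JobId than the one started."""


def wait_and_release(
    call: Transport,
    job_id: uuid.UUID,
    poll_interval: float = 5.0,
    timeout: float = 3600.0,
) -> None:
    """Poll GET /api/v1/main/state until the job is idle, then clear the container."""
    deadline = time.monotonic() + timeout
    while True:
        state = call("GET", "/api/v1/main/state", None)
        # A restarted container (JobId lost) or a foreign job must never be
        # mistaken for the job this orchestrator started.
        if state.get("jobId") != str(job_id):
            raise JobMismatchError(f"expected {job_id}, got {state.get('jobId')}")
        if state["status"] == "idle":
            break
        if time.monotonic() >= deadline:
            call("POST", "/api/v1/main/stop", None)  # assumed timeout policy
            raise TimeoutError(f"job {job_id} did not finish within {timeout} s")
        time.sleep(poll_interval)
    # Service-specific result retrieval would happen here (not shown).
    call("POST", "/api/v1/main/state/clear", None)
```

A JobId mismatch, including a missing JobId after the container's FastAPI process restarted, is treated as an error rather than retried, since the RFC considers the JobId and state lost in that case.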
