To make sure our Hugging Face DLCs are well tested, we need to create "integration" tests that run different kinds of training using the container. Those tests should run automatically or on-demand. We can use GitHub Actions as the CI for running the tests and Python + Docker to implement the integration tests.
Until #3 is implemented, we can use existing containers from, e.g., `transformers` to run the tests. For the test scripts, I think we can reuse the existing `examples/` from `transformers`, `peft`, or `trl`. We could structure the `tests/` folder into:
- `local/` (run on a local machine with a GPU)
- `vertex/` (run on Vertex AI)
- `gke/` (run on GKE)
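As a sketch, the CI side could be a GitHub Actions workflow that builds the container and runs the `local/` suite on a GPU machine, triggered both nightly and on-demand. The file name, image tag, and runner labels below are assumptions, not decisions:

```yaml
# .github/workflows/integration-tests.yml (hypothetical name)
name: integration-tests
on:
  schedule:
    - cron: "0 3 * * *"       # nightly run
  workflow_dispatch: {}        # manual, on-demand trigger
jobs:
  local:
    runs-on: [self-hosted, gpu]   # assumes a self-hosted GPU runner
    steps:
      - uses: actions/checkout@v4
      - name: Build container
        run: docker build -t huggingface-training:ci .
      - name: Run local integration tests
        run: pytest tests/local/ -v
```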
Example for a test:
0. Build the container
1. Start the container on a GPU instance
2. Run a short training job inside the container (a few steps)
3. Validate the results
4. Stop the container

Then repeat steps 1-4 with the other tests.
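A minimal sketch of one such `local/` test, driving the container via the Docker CLI. The image tag, example script path, and log format here are assumptions (the loss-line pattern matches what the `transformers` `Trainer` typically prints, but should be checked against the real output):

```python
# Hypothetical local integration test: run a few training steps in the
# container and validate that the trainer reported a loss.
import re
import subprocess

IMAGE = "huggingface-training:ci"  # placeholder tag


def docker_run_cmd(image: str, script: str, max_steps: int = 10) -> list[str]:
    """Build the `docker run` invocation for a short smoke-test training."""
    return [
        "docker", "run", "--rm", "--gpus", "all", image,
        "python", script,
        "--max_steps", str(max_steps),
        "--output_dir", "/tmp/out",
    ]


def validate_logs(logs: str) -> bool:
    """A run passes if the trainer logged at least one loss value."""
    return bool(re.search(r"'loss':\s*[0-9.]+", logs))


def run_test(script: str) -> bool:
    """Steps 1-4: start container, train a few steps, validate, stop (--rm)."""
    proc = subprocess.run(
        docker_run_cmd(IMAGE, script), capture_output=True, text=True
    )
    return proc.returncode == 0 and validate_logs(proc.stdout + proc.stderr)


if __name__ == "__main__":
    # e.g. reuse a transformers example as the training script
    assert run_test("examples/pytorch/text-classification/run_glue.py")
```

The same `run_test` shape could then be parameterized over the different example scripts (step "repeat 1-4").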
In addition to "local" tests running on GPU instances, we should also run validation tests for GKE and Vertex AI.