CI Pipeline which builds & tests the container #4

@philschmid

To make sure our Hugging Face DLCs are well tested, we need to create "integration" tests that run different kinds of training using the container. Those tests should run automatically or on demand. We can use GitHub Actions as CI for running the tests, and Python + Docker to implement the integration tests.

Until #3 is implemented, we can use existing containers (e.g. from transformers) to run the tests. For the test scripts, I think we can reuse the existing "examples/" from transformers, peft, or trl. We could structure the tests/ folder into:

  • local/ (run on a local GPU machine)
  • vertex/ (run on Vertex AI)
  • gke/ (run on GKE)

Example for a test:

  0. build the container
  1. start a container on a GPU
  2. run a training job in the container (a few steps)
  3. validate the results
  4. stop the container
    -> repeat 1-4 with other tests.
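
A rough sketch of how steps 0-4 could look in Python, shelling out to the Docker CLI. This is only an illustration, not a settled design: the helper names (`build_image`, `run_test_case`, `validate_results`), the `/workspace` mount, the `output/` directory, and the `all_results.json` file with a `train_loss` key are all assumptions. It also assumes the image has no entrypoint that would prevent `sleep infinity` from keeping the container alive.

```python
import json
import math
import subprocess
import uuid
from pathlib import Path


def build_image(tag, context, run=subprocess.run):
    """Step 0: build the container image from a Dockerfile directory."""
    run(["docker", "build", "-t", tag, str(context)], check=True)


def validate_results(output_dir):
    """Step 3: check that training wrote a finite loss.

    Assumption: the training script writes `all_results.json` containing a
    `train_loss` entry into its output directory.
    """
    metrics = json.loads((Path(output_dir) / "all_results.json").read_text())
    assert math.isfinite(metrics["train_loss"]), "train_loss is not finite"
    return metrics


def run_test_case(image, train_cmd, workdir, run=subprocess.run):
    """Steps 1-4 for a single test; returns the validated metrics."""
    name = f"dlc-test-{uuid.uuid4().hex[:8]}"
    # 1. start a long-lived container with GPU access and a shared workdir
    run(["docker", "run", "-d", "--gpus", "all", "--name", name,
         "-v", f"{Path(workdir).resolve()}:/workspace", "-w", "/workspace",
         image, "sleep", "infinity"], check=True)
    try:
        # 2. run a short training inside the running container
        run(["docker", "exec", name, *train_cmd], check=True)
        # 3. validate what the training wrote to the shared workdir
        return validate_results(Path(workdir) / "output")
    finally:
        # 4. stop and remove the container even if a step failed
        run(["docker", "rm", "-f", name], check=False)
```

The `run` parameter defaults to `subprocess.run` but can be swapped for a fake, so the orchestration logic itself can be unit-tested on a machine without Docker or a GPU. Repeating steps 1-4 for other tests is then a loop over `(image, train_cmd)` pairs that calls `run_test_case` for each.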

In addition to "local" tests running on GPU instances, we should also run validation tests for GKE and Vertex AI.

  • We need to implement strong CI tests that cover several scenarios, including training smaller models like BERT and bigger models like Llama. They should test and validate:
    • PEFT
    • Distributed training
    • Flash Attention support
  • Tests that run directly on Vertex AI or GKE, using the Vertex AI SDK
