Add CUDA image variants and smoke tests #33
| GitGuardian id | GitGuardian status | Secret | Commit | Filename | |
|---|---|---|---|---|---|
| 32875495 | Triggered | Generic Password | a45066a | scripts/test_uid_create_container.py | View secret |
🛠 Guidelines to remediate hardcoded secrets
- Understand the implications of revoking this secret by investigating where it is used in your code.
- Replace the secret and store it safely, following best practices for secret storage.
- Revoke and rotate this secret.
- If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.
To avoid such incidents in the future, consider:
- following best practices for managing and storing secrets, including API keys and other credentials
- installing secret detection as a pre-commit hook to catch secrets before they leave your machine and to ease remediation.
Pull request overview
This PR re-architects the DECS Docker image build to support multiple CUDA/TensorFlow variants, replacing the single-image flow built on tensorflow/tensorflow:2.18.0-gpu with a parameterized Dockerfile driven by image-variants.json. It adds Miniforge/JupyterLab/noVNC runtime support, refactors the entrypoint with driver/CUDA compatibility checks and an opt-in VNC stack, updates the GitHub Actions workflow to fan out builds via a matrix, and introduces Python helpers plus Ansible playbooks for building and smoke-testing the variants on remote GPU hosts.
Changes:
- Parameterized Dockerfile + `image-variants.json` + matrix-based GitHub Actions workflow for CUDA 11.8/12.2/12.5/12.8 builds.
- New `entrypoint.sh` with image runtime info, optional `STRICT_CUDA_COMPAT` driver check, and TigerVNC/noVNC startup.
- New `scripts/` (build/test/uid helpers) and `tests/ansible/` playbooks (build + smoke), with documentation rewritten in `README.md`.
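
The optional `STRICT_CUDA_COMPAT` driver check listed above could look roughly like the following. This is a minimal sketch, not the PR's `entrypoint.sh`: the function names `version_ge` and `check_driver`, the warn-by-default behavior, and the values accepted for strict mode are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical sketch; names and behavior are assumptions, not the PR's code.

# version_ge A B: succeed when dotted version A >= B, comparing
# numerically field by field. Missing fields count as 0.
version_ge() {
    a="$1"
    b="$2"
    while [ -n "$a" ] || [ -n "$b" ]; do
        a_head="${a%%.*}"
        b_head="${b%%.*}"
        [ -n "$a_head" ] || a_head=0
        [ -n "$b_head" ] || b_head=0
        if [ "$a_head" -gt "$b_head" ]; then return 0; fi
        if [ "$a_head" -lt "$b_head" ]; then return 1; fi
        # Drop the field just compared; stop when no dot remains.
        case "$a" in *.*) a="${a#*.}" ;; *) a="" ;; esac
        case "$b" in *.*) b="${b#*.}" ;; *) b="" ;; esac
    done
    return 0
}

# check_driver DRIVER MIN_DRIVER: warn when the driver is too old and
# fail only when STRICT_CUDA_COMPAT is set to a truthy value (assumed policy).
check_driver() {
    if version_ge "$1" "$2"; then
        return 0
    fi
    echo "WARNING: driver $1 is older than required minimum $2" >&2
    case "${STRICT_CUDA_COMPAT:-0}" in
        1|true|TRUE|yes|YES|on|ON) return 1 ;;
    esac
    return 0
}
```

In an entrypoint, the first argument would presumably come from `nvidia-smi --query-gpu=driver_version --format=csv,noheader` and the second from the variant's minimum driver version in `image-variants.json`; how the PR actually wires these together is not shown here.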
Reviewed changes
Copilot reviewed 12 out of 13 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| .dockerignore | Excludes git, scripts, tests, and README from the build context. |
| .github/workflows/docker-publish.yml | Adds a prepare job that emits a build matrix and updates build-and-push to consume per-variant build args. |
| .gitignore | Ignores Python bytecode artifacts. |
| Dockerfile | Switches to nvidia/cuda base, parameterizes CUDA/TF/Python via build args, installs Miniforge/JupyterLab/VNC stack. |
| README.md | Documents the variant matrix, build/test commands, VNC env vars, and admin notes. |
| entrypoint.sh | Adds image runtime banner, driver compatibility gate, VNC/noVNC startup, and uses $JUPYTER_BIN/$CONDA_DIR. |
| image-variants.json | Declares the four supported CUDA/TF variants with aliases and minimum driver versions. |
| scripts/build_variants.py | CLI to build (and optionally push) variants from the manifest. |
| scripts/test_image_variants.py | Local CPU/GPU smoke runner per variant. |
| scripts/test_uid_create_container.py | Drives ~/uid/script_test/create_container.sh for each variant. |
| scripts/variant_matrix.py | Loads the manifest, validates uniqueness, and emits a GitHub Actions matrix. |
| tests/ansible/decs_image_build.yml | Uploads build context tarball and runs docker build on remote hosts. |
| tests/ansible/decs_image_smoke.yml | Runs the image with create_container.sh-style env, validates GPU/TF/Jupyter/noVNC. |
Comments suppressed due to low confidence (1)
entrypoint.sh:12
`is_truthy` accepts `true`/`TRUE`/`1`/`yes`/`YES`/`on`/`ON` but not the common Python-style `True`/`False`. Since environment variables in this repo flow through Ansible and Python helpers (e.g. the smoke playbook normalizes via `ternary('true', 'false')`, but ad-hoc `docker run -e ENABLE_VNC=True` from operators is plausible), consider also accepting mixed-case `True`/`Yes`/`On` (e.g., normalize with `${1,,}` before matching). Otherwise `ENABLE_VNC=True` will be silently treated as false.
is_truthy() {
case "${1:-}" in
true|TRUE|1|yes|YES|on|ON) return 0 ;;
*) return 1 ;;
esac
}
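
A case-insensitive version along the lines the comment suggests might look like the sketch below. It lowercases with `tr` instead of `${1,,}`, which is an assumption made so the function also runs under a POSIX `sh`; in a bash 4+ entrypoint, `${1,,}` would do the same job. The temporary variable name `value` is illustrative.

```shell
# Sketch of a case-insensitive is_truthy; not the PR's final code.
is_truthy() {
    # Lowercase the argument so True, YES, On, etc. all match below.
    value=$(printf '%s' "${1:-}" | tr '[:upper:]' '[:lower:]')
    case "$value" in
        true|1|yes|on) return 0 ;;
        *) return 1 ;;
    esac
}
```

With this change `ENABLE_VNC=True` and `ENABLE_VNC=Yes` would be treated as enabled, while empty, unset, and unrecognized values such as `False` still fall through to the false branch.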
if: github.event_name == 'workflow_dispatch' || github.event.pull_request.merged == true
runs-on: ubuntu-latest
outputs:
  tag_name: ${{ steps.generate_tag.outputs.TAG_NAME }}

USER_PW="${USER_PW:-ailab2260}"

RUN apt-get update \
  && apt-get install -y --no-install-recommends tigervnc-tools \
  && rm -rf /var/lib/apt/lists/*

--runtime=nvidia
--cap-add=SYS_ADMIN
--ipc=host
--mount type=bind,source={{ test_home_root | quote }},target=/home/

ansible.builtin.shell: >
  timeout 90 bash -lc
  'until docker exec {{ test_container_name | quote }} test -f /home/{{ test_username }}/decs_jupyter_lab/jupyter_token.txt;
  do sleep 3; done'

tags = build_tags(repository, variant, date_tag)
cmd = build_command(variant, repository, date_tag, args.no_cache)
Summary
Validation