test(systest): upload deps to local cluster cache without redirect server#10487
test(systest): upload deps to local cluster cache without redirect server#10487basvandijk wants to merge 2 commits into
Conversation
…rver upload_systest_dep.sh no longer queries the redirect server (artifacts.idx.dfinity.network). Instead it checks the local cluster's bazel-remote cache directly and uploads there if the dependency is missing. The local cluster name (needed to build the artifacts.<cluster>.dfinity.network download URL) is auto-detected from the in-cluster Kubernetes API server certificate SAN (api.<cluster>.dfinity.network), which works on both CI runners and devenvs. It can be overridden via the SYSTEST_UPLOAD_CLUSTER environment variable.
There was a problem hiding this comment.
Pull request overview
This PR updates rs/tests/upload_systest_dep.sh to eliminate reliance on the cross-cluster redirect server by checking/uploading system-test dependencies directly against the local cluster’s bazel-remote cache, while preserving the output contract (stdout emits the artifacts URL).
Changes:
- Replace redirect-server lookup with a direct
HEADexistence check against in-cluster bazel-remote (dep_in_cache). - Add local cluster auto-detection via Kubernetes API server certificate SANs, with
SYSTEST_UPLOAD_CLUSTERoverride (resolve_cluster). - Keep upload target the same (local bazel-remote) and re-check local cache after upload until the blob is served.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Validate SYSTEST_UPLOAD_CLUSTER (and the auto-detected name) against a strict cluster-name pattern before it is interpolated into the artifacts.<cluster>.dfinity.network URL. - Add an explicit openssl presence check with a clear error message. - Bound the API-server probe with 'timeout 15' so DNS/network stalls fail fast. - Use 'grep -m1' instead of 'grep | head' to avoid SIGPIPE under pipefail.
|
|
||
| # Determines the name of the local cluster (e.g. "zh1-idx1"), used to build the | ||
| # download URL served at artifacts.<cluster>.dfinity.network. | ||
| resolve_cluster() { |
There was a problem hiding this comment.
since this script is called from run_systest.sh, and since run_systest.sh also tries to infer the DC, and since the two results should be aligned (AFAIU), how about taking the cluster as an input and let the caller specify the DC/cluster? or is there a reason why in one case we look it up from the Farm metadata, and once from the k8s records?
What
rs/tests/upload_systest_dep.shno longer queries the redirect server (https://artifacts.idx.dfinity.network). Instead it checks the local cluster's bazel-remote cache directly and uploads the dependency there if it is missing.Why
To reduce flakiness and operational complexity 557d727 forced testnet allocation to the DC the github runner was hosted in. We would like to have similar behaviour for manually run system-tests from devenvs. To make this easier to achieve we should always serve images from the cluster we're running the test from. That is what this commit implements.
Users should then manually pass
--set-required-host-features=dc=$DCwhere$DCis their local DC to force testnet allocation to that DC. We could also consider automating this but it will be more tricky for cloud-engines and performance tests since they need to be allocated to DCs where folks might not have a devenv.How
dep_in_cache()does aHEADagainsthttp://server.bazel-remote.svc.cluster.local:8080/cas/<sha>(200= present,404= absent, anything else is a hard error).resolve_cluster()auto-detects the local cluster (e.g.zh1-idx1) from the in-cluster Kubernetes API server certificate SAN (api.<cluster>.dfinity.network). This works on both CI runners and devenvs and yields the exact cluster name. It can be overridden with theSYSTEST_UPLOAD_CLUSTERenvironment variable.https://artifacts.<cluster>.dfinity.network/cas/<sha>on stdout, preserving the contract withrun_systest.sh.The
NODE_NAME-based derivation was intentionally avoided: it only encodes the DC (e.g.dm1), not the full cluster (dm1-idx1vsdm1-idx2), and it does not exist in devenvs.Testing
Validated in a
zh1-idx1devenv:bash -nandshellcheckclean.artifacts.zh1-idx1.dfinity.network/cas/<sha>URL actually serves the uploaded blob.SYSTEST_UPLOAD_CLUSTER=dm1-idx9correctly overrides auto-detection.bazel test //rs/tests/idx:basic_health_testpasses.Notes
openssland reachability ofkubernetes.default.svc:443from the build container (both available on CI runners and devenvs).SYSTEST_UPLOAD_CLUSTERis the escape hatch otherwise.