Add Jenkins pipeline for airgap infrastructure deployment and testing #498
floatingman wants to merge 42 commits into rancher:main from
Conversation
Pull request overview
This PR adds a new Jenkins pipeline to provision airgapped RKE2/Rancher infrastructure, inject an admin token, run Go-based validation tests, and publish results (including to Qase), plus a dedicated Docker image for these Go test runs.
Changes:
- Introduces `Jenkinsfile.airgap.go-tests` to clone the tests and qa-infra repos, build an infra tools image with Go, deploy airgapped infra via Tofu/Ansible, inject an admin token, run Go tests with `gotestsum`, and optionally report to Qase.
- Adds `Dockerfile.airgap-go-tests` to build an Alpine-based infra tools image that includes OpenTofu, Ansible, AWS CLI, Go, and `gotestsum` for use in the pipeline.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| validation/pipeline/Jenkinsfile.airgap.go-tests | Defines the end-to-end Jenkins pipeline for airgapped infra setup, admin token generation, Go test execution, artifact publishing, and optional Qase reporting. |
| validation/pipeline/Dockerfile.airgap-go-tests | Builds the infra tools Docker image with Go toolchain and gotestsum used by the new pipeline stages. |
```groovy
def testsBranch = env.GO_REPO_BRANCH ?: 'main'
def testsRepo = env.GO_REPO_URL ?: 'https://github.com/rancher/tests'
def qaInfraBranch = env.QA_INFRA_REPO_BRANCH ?: 'main'
def qaInfraRepo = env.QA_INFRA_REPO_URL ?: 'https://github.com/rancher/qa-infra-automation'
```
This pipeline introduces GO_REPO_BRANCH/GO_REPO_URL for the tests repo, which diverges from the established RANCHER_TEST_REPO_BRANCH/RANCHER_TEST_REPO_URL naming used by other airgap pipelines (for example validation/pipeline/Jenkinsfile.setup.airgap.rke2:13-14 and validation/pipeline/Jenkinsfile.destroy.airgap.rke2:11-12). Reusing the existing env var names (or at least falling back to them) would keep job configuration consistent and avoid confusion when wiring Jenkins jobs to different airgap pipelines.
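For example, a fallback could keep both names working (a sketch, using the variable names cited above):

```groovy
// Prefer the established names from the other airgap pipelines,
// fall back to the new ones, then to the defaults
def testsBranch = env.RANCHER_TEST_REPO_BRANCH ?: env.GO_REPO_BRANCH ?: 'main'
def testsRepo = env.RANCHER_TEST_REPO_URL ?: env.GO_REPO_URL ?: 'https://github.com/rancher/tests'
```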
This is a good point.
```groovy
        if ((env.DESTROY_ON_FAILURE ?: 'true').toBoolean() && workspaceName) {
            echo 'DESTROY_ON_FAILURE is enabled. Cleaning up infrastructure...'
            try {
                stage('Cleanup on Failure') {
                    tofu.selectWorkspace(dir: tofuModulePath, name: workspaceName)
                    tofu.destroy(dir: tofuModulePath, varFile: 'terraform.tfvars', autoApprove: true)
                    tofu.deleteWorkspace(dir: tofuModulePath, name: workspaceName)
                }
            } catch (cleanupErr) {
                echo "Cleanup failed: ${cleanupErr.message}"
            }
        }
        throw err
    } finally {
        if (destroyAfterTests && workspaceName) {
            echo 'Destroying infrastructure after tests (configured)'
            try {
                stage('Destroy After Tests') {
                    tofu.selectWorkspace(dir: tofuModulePath, name: workspaceName)
                    tofu.destroy(dir: tofuModulePath, varFile: 'terraform.tfvars', autoApprove: true)
                    tofu.deleteWorkspace(dir: tofuModulePath, name: workspaceName)
```
On failure the catch block already performs a best‑effort destroy/deleteWorkspace when DESTROY_ON_FAILURE is enabled, and then the finally block can run the same destroy logic again if DESTROY_AFTER_TESTS is also true. This double attempt to clean up the same workspace can create noisy errors in logs and makes it harder to reason about which flag controls teardown; consider centralizing the destroy logic in one place (or tracking whether cleanup has already succeeded) to keep the control flow clearer.
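One way to centralize it (a sketch reusing the tofu helpers from the diff; the `destroyInfra` closure and `infrastructureCleaned` flag are illustrative):

```groovy
def infrastructureCleaned = false

// Single destroy path shared by the failure and post-test branches;
// the flag prevents a second attempt against the same workspace.
def destroyInfra = {
    if (!infrastructureCleaned && workspaceName) {
        tofu.selectWorkspace(dir: tofuModulePath, name: workspaceName)
        tofu.destroy(dir: tofuModulePath, varFile: 'terraform.tfvars', autoApprove: true)
        tofu.deleteWorkspace(dir: tofuModulePath, name: workspaceName)
        infrastructureCleaned = true
    }
}
```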
This is a good point.
```dockerfile
# Install gotestsum for JUnit reporting
ENV GOBIN=/usr/local/bin
RUN go install gotest.tools/gotestsum@latest
```
Installing gotestsum with go install gotest.tools/gotestsum@latest pulls executable code from a mutable, third-party module reference at build time, which introduces a supply-chain risk. If the gotest.tools/gotestsum module or its distribution channel is ever compromised or a malicious version is published, the resulting binary will run inside this image (and thus in Jenkins jobs) with access to AWS credentials and other secrets used by the pipeline. Pin this dependency to a specific, vetted version (for example, a fixed tag or commit) and, if possible, enforce integrity verification via checksums or vendored binaries to prevent untrusted code from being pulled implicitly.
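A minimal sketch of a pinned install (the tag mirrors the `GOTESTUM_VERSION=1.13.0` ARG that appears later in this PR; the ARG name and placement here are illustrative):

```dockerfile
# Pin gotestsum to a vetted release rather than a mutable @latest reference.
# `go install module@version` is verified against the Go checksum database by default.
ARG GOTESTSUM_VERSION=1.13.0
ENV GOBIN=/usr/local/bin
RUN go install "gotest.tools/gotestsum@v${GOTESTSUM_VERSION}"
```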
I'll look into this.
…and improve logging
…mline installation process
…nment and enhance test command arguments
… test command and improve test result handling
…ization with qaConfig and dockerPlatform, and update test command execution to use dynamic container name and infraToolsImage
…ialization and directly use config for dockerPlatform and infraToolsImage
…mage before building to ensure a clean build
… token generation script
…password variable
…tic output; remove build tag directive from inject-admin-token.go
…ken injection script
…use inventory file for admin token generation
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
294090e to 0457967
…ion control and cleanup logic
```dockerfile
ARG GO_VERSION=1.25.5
ARG GOTESTUM_VERSION=1.13.0

FROM --platform=linux/amd64 alpine:3.22
```
we typically try to use suse based images
I think this still needs to be addressed
In fact, I think instead of using this file we probably either:
1- Use Dockerfile.infra and make it FROM registry.suse.com/bci/golang:1.25
2- Use Dockerfile.e2e
```groovy
stage('Verify Infra Tools Tooling') {
    echo 'Verifying gotestsum availability inside infra tools image'
    sh """
        docker run --rm --platform ${dockerPlatform} \
            ${infraToolsImage} \
            sh -c 'set -e; echo \"PATH=$PATH\"; which gotestsum || true; ls -al /root/go/bin || true; ls -al /usr/local/bin/gotestsum || true'
    """
}
```
this seems unnecessary, what use case would it not be available (unless the build failed outright)?
```groovy
stage('Configure SSH Key') {
    infrastructure.writeSshKey(
        keyContent: env.AWS_SSH_PEM_KEY,
        keyName: env.AWS_SSH_PEM_KEY_NAME,
        dir: '.ssh'
    )
}

stage('Configure Tofu Variables') {
    echo 'Writing Terraform configuration'

    def terraformConfig = infrastructure.parseAndSubstituteVars(
        content: env.TERRAFORM_CONFIG,
        envVars: [
            'AWS_ACCESS_KEY_ID': env.AWS_ACCESS_KEY_ID,
            'AWS_SECRET_ACCESS_KEY': env.AWS_SECRET_ACCESS_KEY,
            'HOSTNAME_PREFIX': env.HOSTNAME_PREFIX,
            'AWS_SSH_PEM_KEY_NAME': env.AWS_SSH_PEM_KEY_NAME
        ]
    )

    infrastructure.writeConfig(
        path: "${tofuModulePath}/terraform.tfvars",
        content: terraformConfig
    )
}

stage('Initialize Tofu Backend') {
    tofu.initBackend(
        dir: tofuModulePath,
        bucket: env.S3_BUCKET_NAME,
        key: env.S3_KEY_PREFIX,
        region: env.S3_BUCKET_REGION,
        backendInitScript: tofuBackendInitScript
    )
}

stage('Create Workspace') {
    workspaceName = infrastructure.generateWorkspaceName(
        prefix: 'jenkins_airgap_ansible_workspace',
        suffix: env.HOSTNAME_PREFIX,
        includeTimestamp: false
    )
```
I get wanting more resolute logs, but now there are too many stages to be viewed on a standard screen (when looking at the jenkins job). Since each of these take less than 10 seconds, can you consolidate into 1 stage?
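For example, the four short stages above could fold into one (a sketch; the stage name is illustrative and the bodies are unchanged):

```groovy
stage('Prepare Tofu Environment') {
    // SSH key, tfvars, backend init, and workspace creation in one stage
    infrastructure.writeSshKey(
        keyContent: env.AWS_SSH_PEM_KEY,
        keyName: env.AWS_SSH_PEM_KEY_NAME,
        dir: '.ssh'
    )
    infrastructure.writeConfig(
        path: "${tofuModulePath}/terraform.tfvars",
        content: infrastructure.parseAndSubstituteVars(
            content: env.TERRAFORM_CONFIG,
            envVars: [
                'AWS_ACCESS_KEY_ID': env.AWS_ACCESS_KEY_ID,
                'AWS_SECRET_ACCESS_KEY': env.AWS_SECRET_ACCESS_KEY,
                'HOSTNAME_PREFIX': env.HOSTNAME_PREFIX,
                'AWS_SSH_PEM_KEY_NAME': env.AWS_SSH_PEM_KEY_NAME
            ]
        )
    )
    tofu.initBackend(
        dir: tofuModulePath,
        bucket: env.S3_BUCKET_NAME,
        key: env.S3_KEY_PREFIX,
        region: env.S3_BUCKET_REGION,
        backendInitScript: tofuBackendInitScript
    )
    workspaceName = infrastructure.generateWorkspaceName(
        prefix: 'jenkins_airgap_ansible_workspace',
        suffix: env.HOSTNAME_PREFIX,
        includeTimestamp: false
    )
}
```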
```groovy
stage('Setup SSH Keys on Nodes') {
    retry(3) {
        ansible.runPlaybook(
            dir: ansiblePath,
            inventory: 'inventory/inventory.yml',
            playbook: 'playbooks/setup/setup-ssh-keys.yml'
        )
    }
}

stage('Deploy RKE2 Cluster') {
    retry(3) {
        ansible.runPlaybook(
            dir: ansiblePath,
            inventory: 'inventory/inventory.yml',
            playbook: 'playbooks/deploy/rke2-tarball-playbook.yml'
        )
    }
}
```
if you consolidate these 2 with the previous stage, I think it would fit on 1 screen again.
```groovy
    }
}

stage('Deploy Rancher (Optional)') {
```
iirc, optional stages break the view in jenkins if the stage is actually non-existent when its not specified. In this case, it might be OK since there's the def + echo?
```groovy
}

stage('Deploy RKE2 Cluster') {
    retry(3) {
```
clarifying question; for error handling, if it fails on the 3rd attempt, does it stop the jenkins job too (assuming so but just double checking)
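For reference: Jenkins' `retry` step re-throws the final error once attempts are exhausted, so an unhandled third failure does abort the build. A sketch of the semantics:

```groovy
try {
    retry(3) {
        ansible.runPlaybook(
            dir: ansiblePath,
            inventory: 'inventory/inventory.yml',
            playbook: 'playbooks/deploy/rke2-tarball-playbook.yml'
        )
    }
} catch (err) {
    // Reached only after all three attempts fail; rethrowing fails the job
    echo "RKE2 deploy failed after retries: ${err.message}"
    throw err
}
```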
```groovy
// Run the Ansible playbook to generate and inject the admin token
// The playbook uses the rancher_token role which:
// - Reads external_lb_hostname from inventory (or uses explicit rancher_url)
// - Authenticates with Rancher API
// - Creates an API token with configurable TTL and description
// - Updates cattle-config.yaml with the generated token
```
since this is in the qa-infra-automation docs already and basically no one looks at jenkinsfiles, I think you should remove these comments.
```groovy
    resultsJSON: 'gotestsum.json'
])

if (testArgs && testArgs[-1]?.endsWith(';')) {
```
why is this necessary?
```groovy
                stage('Cleanup on Failure') {
                    tofu.selectWorkspace(dir: tofuModulePath, name: workspaceName)
                    tofu.destroy(dir: tofuModulePath, varFile: 'terraform.tfvars', autoApprove: true)
                    tofu.deleteWorkspace(dir: tofuModulePath, name: workspaceName)
                    infrastructureCleaned = true
                }
            } catch (cleanupErr) {
                echo "Cleanup failed: ${cleanupErr.message}"
            }
        }
        throw err
    } finally {
        if (destroyAfterTests && workspaceName && !infrastructureCleaned) {
            echo 'Destroying infrastructure after tests (configured)'
            try {
                stage('Destroy After Tests') {
```
having conditional stages like this with different names will cause the job history to not show up properly in jenkins. It looks like 'cleanup on failure' and 'destroy after tests' are doing the same thing? in which case, if you use the same stage name in both cases you can avoid this problem.
If they do different things, can you update to persist each stage, even if there's nothing to do on a particular run for it? see the rancher comment for a potential solution.
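A sketch of the single, consistently named stage (the `failureCleanupRequested` flag is illustrative; `infrastructureCleaned` matches the flag added in this PR):

```groovy
// One stage name for both teardown paths keeps Jenkins stage history aligned;
// the stage always runs, echoing when there is nothing to do
stage('Destroy Infrastructure') {
    def shouldDestroy = failureCleanupRequested || destroyAfterTests
    if (shouldDestroy && workspaceName && !infrastructureCleaned) {
        tofu.selectWorkspace(dir: tofuModulePath, name: workspaceName)
        tofu.destroy(dir: tofuModulePath, varFile: 'terraform.tfvars', autoApprove: true)
        tofu.deleteWorkspace(dir: tofuModulePath, name: workspaceName)
        infrastructureCleaned = true
    } else {
        echo 'No infrastructure teardown required for this run'
    }
}
```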
Fixed to use the same name. The two stages were doing different things. One destroys the infrastructure on failure; the other destroys it at the end of the build in case you want to investigate something manually.
… and remove unused ARG
…ng ansible-playbook" This reverts commit 95efe45.
```dockerfile
ARG GO_VERSION=1.25.5
ARG GOTESTUM_VERSION=1.13.0

FROM --platform=linux/amd64 alpine:3.22
```
I think this still needs to be addressed
```dockerfile
ARG GO_VERSION=1.25.5
ARG GOTESTUM_VERSION=1.13.0

FROM --platform=linux/amd64 alpine:3.22
```
In fact, I think instead of using this file we probably either:
1- Use Dockerfile.infra and make it FROM registry.suse.com/bci/golang:1.25
2- Use Dockerfile.e2e
```groovy
property.useWithProperties(['AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY', 'AWS_SSH_PEM_KEY', 'AWS_SSH_PEM_KEY_NAME']) {
    try {
        stage('Checkout') {
            deleteDir()
```
Surely this is me being dumb with Jenkins, but why is this here?
```groovy
stage('Configure Tofu Variables') {
    echo 'Writing Terraform configuration'
```
do we need both of these?
```groovy
def s3Path = "env:/${workspaceName}/terraform.tfvars"
def tfvarsPath = "${tofuModulePath}/terraform.tfvars"
def workspace = pwd()
sh """
```
Again possibly a dumb question, but why sh """?
```groovy
def deployRancher = env.ANSIBLE_VARIABLES?.contains('deploy_rancher: true')
if (deployRancher) {
    echo 'Deploying Rancher...'
    retry(3) {
        ansible.runPlaybook(
            dir: ansiblePath,
            inventory: 'inventory/inventory.yml',
            playbook: 'playbooks/deploy/rancher-helm-deploy-playbook.yml'
        )
    }
} else {
    echo 'Skipping Rancher deployment (not enabled in ANSIBLE_VARIABLES)'
}
```
I will use this opportunity to comment on something I've been thinking about regarding the ansible stuff: we have a bunch of variables like deploy_rancher that basically make the playbook that uses them a no-op when not set. So instead of setting deploy_rancher: false, why not just not run the playbook at all?
Again, no implications for this PR; I've just been thinking about this for a while.
| sh """ | ||
| docker run --rm --platform ${dockerPlatform} \ | ||
| --name generate-token \ | ||
| -v ${workspace}:/workspace \ | ||
| -w /workspace/${ansiblePath} \ | ||
| ${infraToolsImage} \ | ||
| ansible-playbook -i inventory/inventory.yml /workspace/qa-infra-automation/ansible/rancher/token/generate-admin-token.yml \ | ||
| -e rancher_token_password=${adminPassword} \ | ||
| -e rancher_cattle_config_file=${cattleConfigPath} \ | ||
| -e rancher_token_ttl=${tokenTtl} \ | ||
| -e rancher_token_description=${tokenDescription} \ | ||
| -e rancher_token_output_format=json \ | ||
| -e rancher_token_output_file=/workspace/rancher-token.json | ||
| """ |
Is it not possible to use runPlaybook here?
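If the shared-library step can forward extra vars, the raw docker run could collapse to something like the following (hypothetical: it assumes `runPlaybook` accepts an `extraVars` map that becomes `-e` definitions, which this PR does not show):

```groovy
// Hypothetical sketch; extraVars support in runPlaybook is an assumption
ansible.runPlaybook(
    dir: ansiblePath,
    inventory: 'inventory/inventory.yml',
    playbook: '/workspace/qa-infra-automation/ansible/rancher/token/generate-admin-token.yml',
    extraVars: [
        rancher_token_password: adminPassword,
        rancher_cattle_config_file: cattleConfigPath,
        rancher_token_ttl: tokenTtl,
        rancher_token_description: tokenDescription,
        rancher_token_output_format: 'json',
        rancher_token_output_file: '/workspace/rancher-token.json'
    ]
)
```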
```groovy
    ]
}
// If Ansible variables defined a bootstrap password, default commonly used value
lines += ["RANCHER_BOOTSTRAP_PASSWORD=${env.RANCHER_BOOTSTRAP_PASSWORD ?: 'rancherrocks'}"]
```
Doesn't ansible already use this if nothing is set?
…dencies, streamline installation steps, and enhance error handling
Implement a Jenkins pipeline to facilitate airgap infrastructure deployment and testing. This includes enhancements to the Dockerfile for Go testing, improvements in the Jenkinsfile for better logging and test handling, and the addition of stages for admin token injection and verification. The pipeline also integrates a retry mechanism for Ansible playbook executions and incorporates Qase reporting for test results.
rancher/qa-tasks#2125