This repository delivers a production-grade automation engine for deploying and managing a fully declarative cloud platform. Using Infrastructure as Code (IaC) and a multi-layered GitOps workflow, it establishes a self-managing environment in which OpenStack serves as the virtualized substrate and a Talos-based Kubernetes cluster handles continuous operations.
The design targets repeatable production bring-up and lifecycle management, achieving a closed-loop system that spans from the virtualization host up to application delivery.
This system is built upon two distinct, yet interconnected, declarative layers:
The first layer establishes the core virtualization and foundational cloud components:
- Provisioning (Terraform/Libvirt): Provisions all VMs (OpenStack controllers, computes, storage) and the initial Talos Kubernetes nodes.
- Deployment (Ansible/OSA): Deploys the full OpenStack cloud on its dedicated VMs and bootstraps the Talos Management Cluster on the Kubernetes nodes.
The second layer is the Talos cluster, which acts as the single, declarative management plane for all tenant resources:
- Orchestration (FluxCD/CAPI): The cluster hosts FluxCD and Cluster API.
- Control Flow: Flux continuously monitors Git and reconciles the Cluster API manifests it finds there; CAPI then uses the OpenStack APIs to provision, manage, and reconcile all tenant workloads (VMs, networking, storage) on demand. This workflow enforces a GitOps-driven lifecycle model in which every component, from the base VMs to the deployed applications, is controlled from Git.
┌──────────────┐
│ Bare-metal   │  scripts/init.sh installs libvirt/qemu/terraform/ansible
└──────┬───────┘
       │ Terraform + Ansible
┌──────▼─────────────────────────────────────────────┐
│ terraform/openstack-libvirt                        │
│ - storage pool + NAT network                       │
│ - controller/compute/storage VMs                   │
│ - cloud-init + Ansible inventory                   │
└──────┬─────────────────────────────────────────────┘
       │ kicks off ansible/roles/openstack/site.yaml
┌──────▼─────────────────────────────────────────────┐
│ OpenStack-Ansible deployment (controller[0])       │
│ - OSA stable/epoxy                                 │
│ - OVN networking, LVM Cinder backend               │
│ - Full OpenStack control plane                     │
└──────┬─────────────────────────────────────────────┘
       │
┌──────▼─────────────────────────────────────────────┐
│ terraform/talos-cluster-bootstrap                  │
│ - Talos control-plane + worker VMs                 │
│ - ansible/roles/k8s-talos bootstraps the cluster   │
│ - Flux + Cluster API installed on Talos            │
└──────┬─────────────────────────────────────────────┘
       │ Flux reconciles Cluster API manifests
┌──────▼─────────────────────────────────────────────┐
│ OpenStack (Nova/Cinder/Neutron)                    │
│ - CAPI uses OpenStack APIs to create tenant VMs    │
│ - GitOps drives application delivery               │
└────────────────────────────────────────────────────┘
scripts/
├─ init.sh                     # Host prerequisite installer
├─ bootstrap-openstack.sh      # Terraform/Ansible wrapper for OpenStack stack
└─ bootstrap-talos.sh          # Terraform/Ansible wrapper for Talos stack
terraform/
├─ openstack-libvirt/          # Libvirt module for OpenStack VMs
└─ talos-cluster-bootstrap/    # Talos VM module
ansible/
└─ roles/
   ├─ openstack/               # OSA orchestration (site + roles)
   └─ k8s-talos/               # Talos bootstrap and management addons (Flux, CAPI)
- Ubuntu/Debian host with hardware virtualization (KVM) and at least:
  - 128 GiB RAM, 48 vCPUs, >2 TB fast storage (default QCOW images are large)
- Internet access for Ubuntu cloud images, OpenStack packages, Talos artifacts
- Ability to run commands with sudo/root privileges
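The requirements above can be checked before running anything else. The following is a minimal preflight sketch, not part of the repository: the function names are hypothetical, the RAM threshold is set slightly below 128 GiB on the assumption that `MemTotal` in `/proc/meminfo` excludes kernel-reserved memory, and the storage and network requirements are not checked.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Thresholds derived from the prerequisites list. MIN_RAM_GIB leaves a margin
# (an assumption) because MemTotal reports a little less than installed RAM.
MIN_RAM_GIB=120
MIN_VCPUS=48

# at_least ACTUAL REQUIRED: succeeds when ACTUAL >= REQUIRED (integers).
at_least() { [ "$1" -ge "$2" ]; }

# Total RAM in whole GiB, read from /proc/meminfo (Linux only).
ram_gib() { awk '/^MemTotal:/ { print int($2 / 1024 / 1024) }' /proc/meminfo; }

# Number of /proc/cpuinfo lines carrying the vmx (Intel) or svm (AMD) flag.
virt_flag_count() { grep -Ewc 'vmx|svm' /proc/cpuinfo || true; }

# Print every unmet requirement; return non-zero if any check failed.
preflight() {
  local failed=0
  at_least "$(ram_gib)" "$MIN_RAM_GIB" || { echo "RAM below ${MIN_RAM_GIB} GiB"; failed=1; }
  at_least "$(nproc)" "$MIN_VCPUS" || { echo "fewer than ${MIN_VCPUS} CPUs"; failed=1; }
  at_least "$(virt_flag_count)" 1 || { echo "no vmx/svm CPU flag found"; failed=1; }
  [ -e /dev/kvm ] || { echo "/dev/kvm is missing"; failed=1; }
  return "$failed"
}
```

Sourcing this file and calling `preflight` lists any shortfalls and returns a non-zero status if at least one check fails. Note that /dev/kvm may only appear after the KVM packages are installed, so that particular check is most meaningful once the host preparation script has run.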
Run the host preparation script once:
sudo ./scripts/init.sh
It updates apt, installs libvirt/qemu/OVMF/dnsmasq and supporting tooling (talosctl, kubectl, helm, Terraform, Ansible, jq), and configures /etc/libvirt/qemu.conf plus system services.
Order matters: bootstrap OpenStack first so Talos has an infrastructure target to manage. Once OpenStack is healthy, bootstrap the Talos management cluster.
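One way to make "OpenStack is healthy" concrete is to gate the Talos bootstrap on the compute service listing. The sketch below is an illustration only: the function names are hypothetical, and it assumes the listing is produced on the controller with `openstack compute service list -f value -c Binary -c Host -c State`, which prints one `<Binary> <Host> <State>` line per service.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Count services whose third column (State) is anything other than "up".
# Reads "<Binary> <Host> <State>" lines on stdin.
count_down_services() {
  awk '$3 != "up" { n++ } END { print n + 0 }'
}

# Succeeds only if the listing on stdin is non-empty and no service is down.
openstack_is_healthy() {
  local listing
  listing="$(cat)"
  [ -n "$listing" ] && [ "$(printf '%s\n' "$listing" | count_down_services)" -eq 0 ]
}
```

With the listing captured from the controller node (for example over SSH, after sourcing the openrc file), piping it into `openstack_is_healthy` and running `./scripts/bootstrap-talos.sh` only on success keeps the two stages in order. An empty listing is treated as unhealthy so that a failed query does not pass the gate by accident.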
Use the production bootstrap helper rather than invoking Terraform manually:
./scripts/bootstrap-openstack.sh \
--action apply \
--var-file terraform.tfvars
Key flags:
- --action <plan|apply|destroy>: defaults to apply
- --var-file: alternate tfvars (relative or absolute)
- --workspace: optional Terraform workspace
- --parallelism: cap Terraform concurrency
- --upgrade: run terraform init -upgrade
Outputs from the script include:
- ansible_inventory_file: inventory path for manual OSA reruns
- ssh_private_key_path: SSH key used for the OpenStack VMs
- {controller,compute,storage}_nodes: network metadata for each VM
Validation snippet (controller node):
ssh -i terraform/openstack-libvirt/openstack_private_key.pem ubuntu@<controller_ip>
sudo -i
cd /opt/openstack-ansible
source playbooks/openrc
openstack compute service list
lxc-ls -f
Once OpenStack is online, bring up the Talos management plane:
./scripts/bootstrap-talos.sh \
--action apply \
--var-file terraform.tfvars
The script mirrors the OpenStack helper (same flags) and invokes Terraform for terraform/talos-cluster-bootstrap. Terraform provisions the Talos control-plane and worker VMs and generates the required inventories; ansible/roles/k8s-talos then:
- Installs Talos on every VM and bootstraps the control plane
- Joins workers and exposes kubeconfig via talosctl
- Installs Flux plus Cluster API configured for the OpenStack cloud so Git reconciliation drives tenant infrastructure
The helper prints summarized outputs for master/worker IPs and the generated Talos inventory (terraform/talos-cluster-bootstrap/ansible_inventory.yaml), making it easy to rerun Ansible or talosctl commands.
- Scaling OpenStack: edit terraform/openstack-libvirt/terraform.tfvars (*_count, CPU/RAM/disk). Terraform hashes the Ansible content and tfvars, so reapplying reconciles changes automatically.
- Network adjustments: change network_cidr inside the Terraform modules and update the container/tunnel/storage CIDRs in ansible/roles/openstack/site.yaml.
- OSA release: set osa_branch within the OpenStack role to move between stable series.
- Cinder backing disk: override cinder_lvm_device if the extra disk differs from /dev/vdb.
- Talos releases / Flux config: tune variables under terraform/talos-cluster-bootstrap or defaults in ansible/roles/k8s-talos to select Talos versions, Flux Git sources, and Cluster API settings.
- Manual reruns: run ansible-playbook -i terraform/openstack-libvirt/ansible_inventory.yaml ansible/roles/openstack/site.yaml for OpenStack, or reuse the generated Talos inventory with the k8s role.
- Cleanup: run ./scripts/bootstrap-<stack>.sh --action destroy with the same tfvars/workspace settings to tear down each layer.
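For the cleanup step, a small wrapper can tear down both layers in one pass. The sketch below is hypothetical: `run`, `teardown_all`, and the `DRY_RUN` variable are illustrative names, and destroying Talos before OpenStack (the reverse of the bootstrap order) is an assumption rather than a documented requirement.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Destroy the stacks in reverse bootstrap order (an assumption): Talos first,
# then the OpenStack VMs. The tfvars path defaults to terraform.tfvars.
teardown_all() {
  local var_file="${1:-terraform.tfvars}"
  local stack
  for stack in talos openstack; do
    run "./scripts/bootstrap-${stack}.sh" --action destroy --var-file "$var_file"
  done
}
```

Running `DRY_RUN=1 teardown_all` prints the two destroy commands without executing them, which is a cheap way to confirm the tfvars path before anything is removed. If a non-default workspace was used at apply time, the same `--workspace` value would need to be added to both calls.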
Expect roughly 30 to 60 minutes for the initial OpenStack deployment. The Talos bootstrap typically completes within minutes once the VMs are available. Re-running the bootstrap scripts is idempotent: Terraform reconciles infrastructure, and the downstream Ansible roles and Flux sources ensure that software state converges.
- Generate kubeconfig from Talos (talosctl kubeconfig ...) and verify Flux reconciliation status (kubectl -n flux-system get kustomizations,sources).
- Author Cluster API manifests (Cluster, OpenStackCluster, MachineDeployment, OpenStackMachineTemplate) in your Flux source repository so Flux continuously deploys tenant clusters onto OpenStack.
- Layer additional GitOps workloads or platform services on top of the Talos management cluster; Flux will fan out the changes through Cluster API into the OpenStack-backed infrastructure.
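The Flux verification step can also be scripted. The sketch below assumes a working kubeconfig for the Talos management cluster and that Flux lives in the flux-system namespace, as in the command above; the two function names are hypothetical. It uses a kubectl JSONPath query to print each Kustomization with the status of its Ready condition, then filters for the ones that have not reconciled.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print "<name> <Ready status>" for every Flux Kustomization in flux-system.
list_kustomization_readiness() {
  kubectl -n flux-system get kustomizations \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}'
}

# Read "<name> <status>" lines on stdin and print the names whose status is
# not "True" (this includes objects that have no Ready condition yet).
not_ready() {
  awk '$2 != "True" { print $1 }'
}
```

Running `list_kustomization_readiness | not_ready` prints nothing when every Kustomization has reconciled; any names it does print are the ones worth inspecting further with kubectl.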
This engine gives you complete, declarative control, from the libvirt host through the OpenStack substrate to Kubernetes workloads governed by Flux, and is ready for production-oriented automation, testing, and iterative delivery.