This guide explains how to extend the virt-platform-autopilot by adding new managed resources (assets).
Adding new assets to the autopilot requires no code changes - everything is template-driven:
- ✅ No code required: Create YAML templates, update metadata catalog
- ✅ Template-driven: Use Go templates for dynamic rendering
- ✅ Soft dependencies: Gracefully handle missing CRDs
- ✅ Declarative: Define conditions for when assets should be applied
Follow these steps to add a new asset:
Create a YAML file in the appropriate subdirectory under assets/active/:
# Choose the right category
assets/active/
├── hco/ # HyperConverged resource (only one, order: 0)
├── machine-config/ # MachineConfig resources
├── kubelet/ # KubeletConfig resources
├── node-health/ # NodeHealthCheck resources
├── descheduler/ # Descheduler resources
└── operators/ # Third-party operator CRs
# Create your asset
vi assets/active/machine-config/05-my-feature.yaml

Edit assets/active/metadata.yaml and add your asset:
assets:
# ... existing assets ...
- name: my-new-feature
path: active/machine-config/05-my-feature.yaml
phase: 1
install: always # or opt-in
component: MachineConfig
reconcile_order: 1
conditions: [] # or add conditions (see below)

Test your asset offline before deploying:
# Render all assets including your new one
virt-platform-autopilot render --hco-file=test-hco.yaml --output=yaml
# Or render just your asset
virt-platform-autopilot render --hco-file=test-hco.yaml --output=yaml | grep -A50 "my-new-feature"

If your asset should only be applied in specific scenarios, add conditions:
conditions:
# Annotation-based activation
- type: annotation
key: platform.kubevirt.io/enable-my-feature
value: "true"
# Hardware detection
- type: hardware-detection
detector: gpuPresent
# Feature gate
- type: feature-gate
value: MyFeature

If your asset introduces new resource types, regenerate RBAC:
make generate-rbac

This scans all assets and generates the necessary ClusterRole permissions.
For resources that don't need dynamic values:
File: assets/active/node-health/standard-remediation.yaml
apiVersion: remediation.medik8s.io/v1alpha1
kind: NodeHealthCheck
metadata:
name: virt-node-health-check
namespace: openshift-operators
spec:
minHealthy: 51%
remediationTemplate:
apiVersion: self-node-remediation.medik8s.io/v1alpha1
kind: SelfNodeRemediationTemplate
name: self-node-remediation-automatic-strategy-template
namespace: openshift-operators
selector:
matchExpressions:
- key: node-role.kubernetes.io/worker
operator: Exists
unhealthyConditions:
- duration: 5m
status: "False"
type: Ready
- duration: 5m
status: Unknown
type: Ready

No templating needed - this is applied as-is.
Use .HCO.Object to access HyperConverged resource fields:
File: assets/active/descheduler/eviction-limits.yaml.tpl
apiVersion: operator.openshift.io/v1
kind: KubeDescheduler
metadata:
name: cluster
namespace: openshift-kube-descheduler-operator
spec:
managementState: Managed
mode: Automatic
deschedulingIntervalSeconds: 60
evictionLimits:
# Read from HCO spec with default fallback
{{- $migTotal := dig "spec" "liveMigrationConfig" "parallelMigrationsPerCluster" 5 .HCO.Object }}
total: {{ $migTotal }}
{{- $migNode := dig "spec" "liveMigrationConfig" "parallelOutboundMigrationsPerNode" 2 .HCO.Object }}
node: {{ $migNode }}

Template functions available:
- `dig`: Safely access nested fields with defaults
- `.HCO.Object`: Access the HyperConverged resource
- `.HCO.Namespace`: HCO namespace
- `.HCO.Name`: HCO name
Skip rendering entirely if conditions aren't met:
{{- if crdHasEnum "kubedeschedulers.operator.openshift.io" "spec.profiles" "KubeVirtRelieveAndMigrate" }}
apiVersion: operator.openshift.io/v1
kind: KubeDescheduler
metadata:
name: cluster
spec:
profiles:
- KubeVirtRelieveAndMigrate
{{- else }}
# Fallback configuration or skip entirely
{{- end }}

Use .Topology to adapt resources to the cluster shape:
apiVersion: hco.kubevirt.io/v1beta1
kind: HyperConverged
metadata:
name: kubevirt-hyperconverged
namespace: {{ dig "metadata" "namespace" "openshift-cnv" .HCO.Object }}
spec:
{{- if .Topology.IsHCP }}
# Hosted Control Plane: no local control-plane overhead, be more generous
resourceRequirements:
vmiCPUAllocationRatio: 4
{{- else if .Topology.IsCompact }}
# Compact 3-node cluster: control-plane competes with workloads
resourceRequirements:
vmiCPUAllocationRatio: 8
{{- else }}
resourceRequirements:
vmiCPUAllocationRatio: 10
{{- end }}

Cloud provider awareness:
{{- if .Topology.IsAWS }}
# AWS-specific: use instance store for ephemeral scratch
storageClassName: gp3-csi
{{- else if .Topology.IsAzure }}
storageClassName: managed-premium
{{- else if .Topology.IsBareMetal }}
storageClassName: local-block
{{- end }}

For less-common providers not covered by a dedicated boolean, use the raw string:
{{- if eq .Topology.CloudProvider "IBMCloud" }}
storageClassName: ibmc-block-gold
{{- end }}

Combine multiple topology and hardware checks for complex logic:
{{- $crdExists := crdExists "gpus.nvidia.com/v1" }}
{{- $annotationSet := hasAnnotation .HCO.Object "platform.kubevirt.io/enable-gpu" "true" }}
{{- if and $crdExists $annotationSet }}
apiVersion: nvidia.com/v1
kind: ClusterPolicy
metadata:
name: gpu-operator-config
spec:
# GPU configuration
{{- end }}

Handle missing CRDs gracefully to avoid failures:
{{- if crdExists "kubedeschedulers.operator.openshift.io" }}
apiVersion: operator.openshift.io/v1
kind: KubeDescheduler
# ... resource spec
{{- end }}

If the CRD is missing:
- Asset is skipped during rendering
- No error is raised
- Reconciliation continues with other assets
- Asset is automatically applied when CRD becomes available
CRD present but operator namespace absent: a related scenario arises when the CRD exists as a leftover (e.g. after uninstalling an operator without cleaning up its CRDs) but the operator's namespace is gone. In this case the asset renders successfully, but the apply fails because the target namespace does not exist. The autopilot detects this and soft-skips the asset without raising an error, behaving exactly as in the missing-CRD case. The asset is retried on the next periodic reconciliation once the operator is reinstalled.
{{- if objectExists "PrometheusRule" "openshift-monitoring" "my-rules" }}
# Configuration that depends on the PrometheusRule
{{- end }}

Branch on schema capabilities with crdHasEnum:

{{- if crdHasEnum "kubedeschedulers.operator.openshift.io" "spec.profiles" "KubeVirtRelieveAndMigrate" }}
profiles:
- KubeVirtRelieveAndMigrate
{{- else if crdHasEnum "kubedeschedulers.operator.openshift.io" "spec.profiles" "LongLifecycle" }}
profiles:
- LongLifecycle
{{- else }}
# Fallback for older API versions
{{- end }}

The assets/active/metadata.yaml catalog defines all managed assets.
- name: my-asset # Unique identifier
path: active/category/my-asset.yaml # Template file path
phase: 1 # Rollout phase (1=GA, 2=TP, 3=Experimental)
install: always # always | opt-in
component: MachineConfig # Logical grouping
reconcile_order: 10 # Processing order (lower = earlier)
conditions: [] # Activation conditions (optional)

name: Unique identifier for the asset. Used in logs, metrics, debug endpoints.
path: Relative path from assets/ directory to template file.
- Use `.yaml` for static resources
- Use `.yaml.tpl` for Go templates
phase: Rollout phase indicating maturity:
- 0: Critical foundation (HCO only)
- 1: Generally Available (production-ready)
- 2: Tech Preview (experimental, supported)
- 3: Experimental (unsupported, for testing)
install: When this asset should be applied:
- always: Applied to all clusters automatically
- opt-in: Requires conditions to be met (annotation, hardware, feature gate)
component: Logical grouping for organization. Examples:
`HyperConverged`, `MachineConfig`, `KubeletConfig`, `NodeHealthCheck`, `KubeDescheduler`, `ForkliftController`, `MetalLB`
reconcile_order: Processing order (lower numbers first).
- 0: HCO only (must be first - serves as RenderContext source)
- 1-9: Critical baseline (MachineConfig, Kubelet, NodeHealthCheck)
- 10-19: Scheduling and placement (Descheduler)
- 20+: Optional operators and advanced features
conditions: Array of conditions that must ALL be true for asset to be applied.
Asset is applied only if HCO has specific annotation:
conditions:
- type: annotation
key: platform.kubevirt.io/enable-my-feature
value: "true"

Users enable with:
kubectl annotate -n openshift-cnv hyperconverged kubevirt-hyperconverged \
platform.kubevirt.io/enable-my-feature=true

Asset is applied if hardware is detected:
conditions:
- type: hardware-detection
detector: pciDevicesPresent # or numaNodesPresent, gpuPresent, etc.

Available detectors:
- pciDevicesPresent: PCI passthrough-capable devices detected
- numaNodesPresent: Multi-NUMA topology detected
- gpuPresent: GPU devices detected
- sriovCapable: SR-IOV network interfaces detected
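The same signals are exposed inside templates through the .Hardware render context (see the render context reference below), so a template body can guard on hardware at render time as well. A minimal sketch, assuming the `.Hardware.GPUPresent` field documented later; the ConfigMap name and data keys are hypothetical:

```yaml
{{- if .Hardware.GPUPresent }}
apiVersion: v1
kind: ConfigMap
metadata:
  name: gpu-tuning            # hypothetical example resource
  namespace: {{ .HCO.Namespace }}
data:
  gpu-workloads: "enabled"
{{- end }}
```

When no GPU is detected, the template renders to empty output and the asset is skipped.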
Asset is applied if feature gate is enabled:
conditions:
- type: feature-gate
value: CPUManager

Feature gates are typically set in HCO spec or platform configuration.
All conditions must be true:
conditions:
- type: annotation
key: platform.kubevirt.io/openshift
value: "true"
- type: hardware-detection
detector: gpuPresent

This asset is applied only on OpenShift clusters with GPUs.
Test template syntax and rendering without a cluster:
# Create test HCO YAML
cat > test-hco.yaml <<EOF
apiVersion: hco.kubevirt.io/v1beta1
kind: HyperConverged
metadata:
name: kubevirt-hyperconverged
namespace: openshift-cnv
annotations:
platform.kubevirt.io/enable-my-feature: "true"
spec:
liveMigrationConfig:
parallelMigrationsPerCluster: 10
parallelOutboundMigrationsPerNode: 2
EOF
# Render all assets
virt-platform-autopilot render --hco-file=test-hco.yaml --output=yaml
# Check for errors
virt-platform-autopilot render --hco-file=test-hco.yaml --output=status

Test rendering with live cluster context:
# Port-forward to debug endpoint
kubectl port-forward -n openshift-cnv deployment/virt-platform-autopilot 8081:8081
# Render all assets
curl http://localhost:8081/debug/render
# Render specific asset
curl http://localhost:8081/debug/render/my-asset
# Check exclusions (filtered assets)
curl http://localhost:8081/debug/exclusions

See Debug Endpoints Documentation for details.
Add integration test coverage:
// pkg/controller/controller_test.go
func TestMyAssetRendering(t *testing.T) {
// Setup test environment
ctx := context.Background()
// Create test HCO
hco := &hcov1beta1.HyperConverged{
ObjectMeta: metav1.ObjectMeta{
Name: "test-hco",
Namespace: "openshift-cnv",
Annotations: map[string]string{
"platform.kubevirt.io/enable-my-feature": "true",
},
},
}
// Test rendering
// ... test logic ...
}

Run tests:
make test-integration

Test with full controller in Kind cluster:
# Setup local cluster
make kind-setup
# Deploy autopilot
make deploy-local
# Check logs
make logs-local
# Verify asset was applied
kubectl get <resource-type> <resource-name>
# Make changes and redeploy
make redeploy-local

The following helper functions are available in templates:
- `crdExists "apiVersion"` - Check if CRD is installed
- `objectExists "Kind" "Namespace" "Name"` - Check if object exists
- `crdHasEnum "crdName" "fieldPath" "enumValue"` - Check if CRD schema has enum value
- `prometheusRuleHasRecordingRule "namespace" "name" "recordName"` - Check PrometheusRule
- `dig "key1" "key2" ... default object` - Safely access nested fields with default
- `.HCO.Object` - HyperConverged resource
- `.HCO.Namespace` - HCO namespace
- `.HCO.Name` - HCO name
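For intuition, dig behaves like a nil-safe nested map lookup that falls back to a default when any key along the path is missing. The stand-alone Go sketch below illustrates equivalent logic; it is not the autopilot's implementation, and note the argument order differs from the template function (which takes the keys first, then the default, then the object):

```go
package main

import "fmt"

// dig walks nested map[string]interface{} values by key and returns
// def when any key along the path is missing or a non-map is reached.
func dig(obj map[string]interface{}, def interface{}, keys ...string) interface{} {
	var cur interface{} = obj
	for _, k := range keys {
		m, ok := cur.(map[string]interface{})
		if !ok {
			return def
		}
		cur, ok = m[k]
		if !ok {
			return def
		}
	}
	return cur
}

func main() {
	// A minimal stand-in for an unstructured HCO object.
	hco := map[string]interface{}{
		"spec": map[string]interface{}{
			"liveMigrationConfig": map[string]interface{}{
				"parallelMigrationsPerCluster": 10,
			},
		},
	}
	// Present field: returns the stored value.
	fmt.Println(dig(hco, 5, "spec", "liveMigrationConfig", "parallelMigrationsPerCluster")) // 10
	// Missing field: returns the default.
	fmt.Println(dig(hco, 2, "spec", "liveMigrationConfig", "parallelOutboundMigrationsPerNode")) // 2
}
```

This mirrors how the eviction-limits template above reads HCO fields with a fallback.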
| Field | Type | Description |
|---|---|---|
| `.Hardware.PCIDevicesPresent` | `bool` | PCI passthrough-capable devices detected |
| `.Hardware.NUMANodesPresent` | `bool` | Multi-NUMA topology detected |
| `.Hardware.VFIOCapable` | `bool` | IOMMU/VFIO capable nodes detected |
| `.Hardware.USBDevicesPresent` | `bool` | USB devices detected |
| `.Hardware.GPUPresent` | `bool` | GPU devices detected |
Populated from node role labels and the OpenShift Infrastructure CR
(config.openshift.io/v1). All fields default to safe zero-values on
non-OpenShift or vanilla Kubernetes clusters.
| Field | Type | Description |
|---|---|---|
| `.Topology.IsHCP` | `bool` | Hosted Control Plane (HyperShift): `status.controlPlaneTopology == "External"` |
| `.Topology.IsCompact` | `bool` | All visible master nodes also carry the worker role (3-node clusters) |
| `.Topology.ControlPlaneTopology` | `string` | Raw Infrastructure CR value: `"HighlyAvailable"`, `"SingleReplica"`, `"External"` |
| `.Topology.CloudProvider` | `string` | Raw platform type: `"AWS"`, `"Azure"`, `"GCP"`, `"BareMetal"`, `"VSphere"`, `"OpenStack"`, `"IBMCloud"`, `"Nutanix"`, `"PowerVS"`, `"None"`, … |
| `.Topology.IsAWS` | `bool` | Running on AWS |
| `.Topology.IsAzure` | `bool` | Running on Azure |
| `.Topology.IsGCP` | `bool` | Running on GCP |
| `.Topology.IsBareMetal` | `bool` | Running on bare metal |
| `.Topology.IsVSphere` | `bool` | Running on vSphere |
| `.Topology.IsOpenStack` | `bool` | Running on OpenStack |
| `.Topology.MasterCount` | `int` | Nodes with the master/control-plane role label |
| `.Topology.WorkerCount` | `int` | Dedicated worker nodes (0 on compact clusters) |
| `.Topology.TotalNodeCount` | `int` | Total visible node count |
- `hasAnnotation object "key" "value"` - Check if annotation exists with value
All standard Go template functions are available:
- Control flow: `if`, `else`, `end`, `range`, `with`
- Logic: `and`, `or`, `not`
- String functions: `trim`, `trimPrefix`, `trimSuffix`, `lower`, `upper`
The autopilot needs RBAC permissions for all resource types it manages.
After adding new resource types, regenerate RBAC:
make generate-rbac

This tool:
- Scans all templates in `assets/active/`
- Extracts unique `apiVersion` and `kind` combinations
- Generates a ClusterRole with the required permissions
- Updates `config/rbac/role.yaml`
If automatic generation doesn't cover your use case, manually edit:
File: config/rbac/role.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: virt-platform-autopilot-role
rules:
- apiGroups: ["my-api-group.io"]
resources: ["myresources"]
verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]

# Good
- name: kubevirt-swap-optimization
# Bad
- name: asset-1

Use reconcile_order ranges consistently:

- 0: HCO only
- 1-9: Infrastructure (MachineConfig, Kubelet, NodeHealthCheck)
- 10-19: Scheduling and placement
- 20+: Optional features
{{- $timeout := dig "spec" "workloadUpdateStrategy" "timeout" "5m" .HCO.Object }}
timeout: {{ $timeout }}

Guard optional CRDs so rendering is skipped cleanly when they are absent:

{{- if crdExists "my-crd.example.com" }}
# Only render if CRD exists
{{- end }}

Features that aren't universally applicable should be install: opt-in:
- name: gpu-operator
install: opt-in
conditions:
- type: annotation
key: platform.kubevirt.io/enable-gpu
value: "true"
- type: hardware-detection
detector: gpuPresent

Always test with the render command before deploying:
virt-platform-autopilot render --hco-file=test-hco.yaml --output=status

Add comments for complex template logic:
{{- /* Select descheduler profile based on CRD version */ -}}
{{- if crdHasEnum "kubedeschedulers.operator.openshift.io" "spec.profiles" "KubeVirtRelieveAndMigrate" }}
# Preferred profile for KubeVirt workloads
profiles:
- KubeVirtRelieveAndMigrate
{{- else }}
# Fallback for older API versions
profiles:
- LongLifecycle
{{- end }}

Adapt to different API versions:
{{- $apiVersion := "v1" }}
{{- if crdHasEnum "myresource.example.com" "spec.mode" "advanced" }}
{{- $apiVersion = "v2" }}
{{- end }}
apiVersion: myresource.example.com/{{ $apiVersion }}
kind: MyResource

Skip rendering entirely on unsupported topologies:
{{- if .Topology.IsHCP }}
{{- /* HCP clusters have no local control-plane — skip MachineConfig */ -}}
{{- else }}
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
name: 99-my-tuning
spec:
config:
...
{{- end }}

Or guard a compact-only tuning:
{{- if and .Topology.IsCompact .Topology.IsBareMetal }}
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
name: 99-compact-baremetal-tuning
...
{{- end }}

Different settings for different environments:
{{- $replicas := 1 }}
{{- if hasAnnotation .HCO.Object "platform.kubevirt.io/environment" "production" }}
{{- $replicas = 3 }}
{{- end }}
spec:
replicas: {{ $replicas }}

Include entire sections conditionally:
apiVersion: v1
kind: ConfigMap
metadata:
name: my-config
data:
{{- if hasAnnotation .HCO.Object "platform.kubevirt.io/enable-debug" "true" }}
debug: "true"
log-level: "debug"
{{- end }}
required-setting: "value"

Error: `template: asset:5: unexpected "}"...`
Fix: Check Go template syntax. Common issues:
- Missing `{{-` or `}}`
- Unmatched `if`/`end` blocks
- Invalid function calls
Debug:
virt-platform-autopilot render --hco-file=test-hco.yaml --output=status

Possible causes:
- Condition not met (check annotations, hardware detection)
- CRD not installed (check `crdExists` guards)
- Asset filtered by the `disabled-resources` annotation
Debug:
# Check exclusions
kubectl port-forward -n openshift-cnv deployment/virt-platform-autopilot 8081:8081
curl http://localhost:8081/debug/exclusions
# Check if asset renders
curl http://localhost:8081/debug/render/my-asset

Error: `forbidden: User "system:serviceaccount:openshift-cnv:virt-platform-autopilot" cannot create resource...`
Fix: Regenerate RBAC:
make generate-rbac
make deploy

Cause: Template produces different output on each render (timestamps, random values)
Fix: Make templates idempotent - same input should produce same output:
# Bad (changes every render)
timestamp: {{ now }}
# Good (stable value)
{{- $timestamp := dig "metadata" "creationTimestamp" "2024-01-01T00:00:00Z" .HCO.Object }}
createdAt: {{ $timestamp }}

File: assets/active/hco/golden-config.yaml.tpl
Production-ready HCO configuration with opinionated defaults. Must have reconcile_order: 0.
File: assets/active/node-health/standard-remediation.yaml
Simple static YAML - no templating needed.
File: assets/active/descheduler/recommended.yaml.tpl
Complex conditional rendering based on CRD version, reads HCO for eviction limits.
File: assets/active/machine-config/02-pci-passthrough.yaml.tpl
Requires annotation AND hardware detection to be applied.
After adding your asset:
- Test thoroughly using render command and debug endpoints
- Add integration tests in `pkg/controller/`
- Update documentation if the feature affects users
- Submit PR with your changes
- ARCHITECTURE.md - Technical implementation details
- Debug Endpoints - Debugging and inspection tools
- Lifecycle Management - Tombstoning and resource exclusions
- Local Development - Setting up dev environment