
Adding Assets Guide

This guide explains how to extend the virt-platform-autopilot by adding new managed resources (assets).

Philosophy

Adding new assets to the autopilot requires no code changes; everything is driven by templates and metadata:

  • No code required: Create YAML templates, update metadata catalog
  • Template-driven: Use Go templates for dynamic rendering
  • Soft dependencies: Gracefully handle missing CRDs
  • Declarative: Define conditions for when assets should be applied

Quick Start

Follow these steps to add a new asset:

1. Create Template File

Create a YAML file in the appropriate subdirectory under assets/active/:

# Choose the right category
assets/active/
├── hco/              # HyperConverged resource (only one, order: 0)
├── machine-config/   # MachineConfig resources
├── kubelet/          # KubeletConfig resources
├── node-health/      # NodeHealthCheck resources
├── descheduler/      # Descheduler resources
└── operators/        # Third-party operator CRs

# Create your asset
vi assets/active/machine-config/05-my-feature.yaml

2. Add Entry to Metadata Catalog

Edit assets/active/metadata.yaml and add your asset:

assets:
  # ... existing assets ...

  - name: my-new-feature
    path: active/machine-config/05-my-feature.yaml
    phase: 1
    install: always  # or opt-in
    component: MachineConfig
    reconcile_order: 1
    conditions: []  # or add conditions (see below)

3. Test with Render Command

Test your asset offline before deploying:

# Render all assets including your new one
virt-platform-autopilot render --hco-file=test-hco.yaml --output=yaml

# Or render just your asset
virt-platform-autopilot render --hco-file=test-hco.yaml --output=yaml | grep -A50 "my-new-feature"

4. (Optional) Add Conditions

If your asset should only be applied in specific scenarios, add conditions:

conditions:
  # Annotation-based activation
  - type: annotation
    key: platform.kubevirt.io/enable-my-feature
    value: "true"

  # Hardware detection
  - type: hardware-detection
    detector: gpuPresent

  # Feature gate
  - type: feature-gate
    value: MyFeature

5. Update RBAC

If your asset introduces new resource types, regenerate RBAC:

make generate-rbac

This scans all assets and generates the necessary ClusterRole permissions.
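
For instance, if the only new resource type you introduced were the NodeHealthCheck shown in Example 1 below, the generated ClusterRole would pick up a rule along these lines (illustrative output; the exact shape depends on the generator):

  - apiGroups: ["remediation.medik8s.io"]
    resources: ["nodehealthchecks"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]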

Template Examples

Example 1: Simple Static YAML

For resources that don't need dynamic values:

File: assets/active/node-health/standard-remediation.yaml

apiVersion: remediation.medik8s.io/v1alpha1
kind: NodeHealthCheck
metadata:
  name: virt-node-health-check
  namespace: openshift-operators
spec:
  minHealthy: 51%
  remediationTemplate:
    apiVersion: self-node-remediation.medik8s.io/v1alpha1
    kind: SelfNodeRemediationTemplate
    name: self-node-remediation-automatic-strategy-template
    namespace: openshift-operators
  selector:
    matchExpressions:
      - key: node-role.kubernetes.io/worker
        operator: Exists
  unhealthyConditions:
    - duration: 5m
      status: "False"
      type: Ready
    - duration: 5m
      status: Unknown
      type: Ready

No templating needed - this is applied as-is.
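
For completeness, the matching catalog entry for a static asset like this could look as follows (name and reconcile_order are illustrative):

- name: node-health-standard-remediation
  path: active/node-health/standard-remediation.yaml
  phase: 1
  install: always
  component: NodeHealthCheck
  reconcile_order: 5
  conditions: []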

Example 2: Templated with HCO Context

Use .HCO.Object to access HyperConverged resource fields:

File: assets/active/descheduler/eviction-limits.yaml.tpl

apiVersion: operator.openshift.io/v1
kind: KubeDescheduler
metadata:
  name: cluster
  namespace: openshift-kube-descheduler-operator
spec:
  managementState: Managed
  mode: Automatic
  deschedulingIntervalSeconds: 60
  evictionLimits:
    # Read from HCO spec with default fallback
    {{- $migTotal := dig "spec" "liveMigrationConfig" "parallelMigrationsPerCluster" 5 .HCO.Object }}
    total: {{ $migTotal }}
    {{- $migNode := dig "spec" "liveMigrationConfig" "parallelOutboundMigrationsPerNode" 2 .HCO.Object }}
    node: {{ $migNode }}

Template functions available:

  • dig: Safely access nested fields with defaults
  • .HCO.Object: Access HyperConverged resource
  • .HCO.Namespace: HCO namespace
  • .HCO.Name: HCO name
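
To see the fallback behaviour of dig: when the HCO spec omits a field entirely, the supplied default is rendered instead of failing (bandwidthPerMigration is used here purely as an illustration):

# HCO has no spec.liveMigrationConfig.bandwidthPerMigration set
bandwidth: {{ dig "spec" "liveMigrationConfig" "bandwidthPerMigration" "64Mi" .HCO.Object }}
# renders: bandwidth: 64Mi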

Example 3: Conditional Rendering

Skip rendering entirely if conditions aren't met:

{{- if crdHasEnum "kubedeschedulers.operator.openshift.io" "spec.profiles" "KubeVirtRelieveAndMigrate" }}
apiVersion: operator.openshift.io/v1
kind: KubeDescheduler
metadata:
  name: cluster
spec:
  profiles:
    - KubeVirtRelieveAndMigrate
{{- else }}
# Fallback configuration or skip entirely
{{- end }}

Example 4: Topology-Aware Configuration

Use .Topology to adapt resources to the cluster shape:

apiVersion: hco.kubevirt.io/v1beta1
kind: HyperConverged
metadata:
  name: kubevirt-hyperconverged
  namespace: {{ dig "metadata" "namespace" "openshift-cnv" .HCO.Object }}
spec:
  {{- if .Topology.IsHCP }}
  # Hosted Control Plane: no local control-plane overhead, be more generous
  resourceRequirements:
    vmiCPUAllocationRatio: 4
  {{- else if .Topology.IsCompact }}
  # Compact 3-node cluster: control-plane competes with workloads
  resourceRequirements:
    vmiCPUAllocationRatio: 8
  {{- else }}
  resourceRequirements:
    vmiCPUAllocationRatio: 10
  {{- end }}

Cloud provider awareness:

{{- if .Topology.IsAWS }}
  # AWS-specific: use instance store for ephemeral scratch
  storageClassName: gp3-csi
{{- else if .Topology.IsAzure }}
  storageClassName: managed-premium
{{- else if .Topology.IsBareMetal }}
  storageClassName: local-block
{{- end }}

For less-common providers not covered by a dedicated boolean, use the raw string:

{{- if eq .Topology.CloudProvider "IBMCloud" }}
  storageClassName: ibmc-block-gold
{{- end }}

Example 5: Multiple Checks Combined

Combine multiple checks (CRD presence, annotations, hardware, topology) for more complex logic:

{{- $crdExists := crdExists "clusterpolicies.nvidia.com" }}
{{- $annotationSet := hasAnnotation .HCO.Object "platform.kubevirt.io/enable-gpu" "true" }}
{{- if and $crdExists $annotationSet }}
apiVersion: nvidia.com/v1
kind: ClusterPolicy
metadata:
  name: gpu-operator-config
spec:
  # GPU configuration
{{- end }}

Soft Dependencies

Handle missing CRDs gracefully to avoid failures:

Check CRD Existence

{{- if crdExists "kubedeschedulers.operator.openshift.io" }}
apiVersion: operator.openshift.io/v1
kind: KubeDescheduler
# ... resource spec
{{- end }}

If the CRD is missing:

  • Asset is skipped during rendering
  • No error is raised
  • Reconciliation continues with other assets
  • Asset is automatically applied when CRD becomes available

CRD Present but Operator Namespace Absent

A related scenario arises when the CRD exists as a leftover (e.g. after uninstalling an operator without cleaning up its CRDs) but the operator's namespace is gone. In this case the asset renders successfully, but the apply fails because the target namespace does not exist. The autopilot detects this and soft-skips the asset without raising an error, identical in behaviour to the missing-CRD case. The asset is retried on the next periodic reconciliation once the operator is reinstalled.

Check Object Existence

{{- if objectExists "PrometheusRule" "openshift-monitoring" "my-rules" }}
# Configuration that depends on the PrometheusRule
{{- end }}

Check Enum Values (API Version Compatibility)

{{- if crdHasEnum "kubedeschedulers.operator.openshift.io" "spec.profiles" "KubeVirtRelieveAndMigrate" }}
  profiles:
    - KubeVirtRelieveAndMigrate
{{- else if crdHasEnum "kubedeschedulers.operator.openshift.io" "spec.profiles" "LongLifecycle" }}
  profiles:
    - LongLifecycle
{{- else }}
  # Fallback for older API versions
{{- end }}

Metadata Catalog Reference

The assets/active/metadata.yaml catalog defines all managed assets.

Metadata Fields

- name: my-asset                           # Unique identifier
  path: active/category/my-asset.yaml      # Template file path
  phase: 1                                 # Rollout phase (0=HCO, 1=GA, 2=TP, 3=Experimental)
  install: always                          # always | opt-in
  component: MachineConfig                 # Logical grouping
  reconcile_order: 10                      # Processing order (lower = earlier)
  conditions: []                           # Activation conditions (optional)

Field Descriptions

name: Unique identifier for the asset. Used in logs, metrics, debug endpoints.

path: Relative path from assets/ directory to template file.

  • Use .yaml for static resources
  • Use .yaml.tpl for Go templates

phase: Rollout phase indicating maturity:

  • 0: Critical foundation (HCO only)
  • 1: Generally Available (production-ready)
  • 2: Tech Preview (experimental, supported)
  • 3: Experimental (unsupported, for testing)

install: When this asset should be applied:

  • always: Applied to all clusters automatically
  • opt-in: Requires conditions to be met (annotation, hardware, feature gate)

component: Logical grouping for organization. Examples:

  • HyperConverged
  • MachineConfig
  • KubeletConfig
  • NodeHealthCheck
  • KubeDescheduler
  • ForkliftController
  • MetalLB

reconcile_order: Processing order (lower numbers first).

  • 0: HCO only (must be first - serves as RenderContext source)
  • 1-9: Critical baseline (MachineConfig, Kubelet, NodeHealthCheck)
  • 10-19: Scheduling and placement (Descheduler)
  • 20+: Optional operators and advanced features

conditions: Array of conditions that must ALL be true for asset to be applied.
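
Putting the fields together, a complete entry for a hypothetical opt-in GPU asset might read:

- name: gpu-cluster-policy
  path: active/operators/gpu-cluster-policy.yaml.tpl
  phase: 2
  install: opt-in
  component: ClusterPolicy
  reconcile_order: 20
  conditions:
    - type: annotation
      key: platform.kubevirt.io/enable-gpu
      value: "true"
    - type: hardware-detection
      detector: gpuPresent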

Condition Types

Annotation Condition

Asset is applied only if HCO has specific annotation:

conditions:
  - type: annotation
    key: platform.kubevirt.io/enable-my-feature
    value: "true"

Users enable with:

kubectl annotate -n openshift-cnv hyperconverged kubevirt-hyperconverged \
  platform.kubevirt.io/enable-my-feature=true

Hardware Detection Condition

Asset is applied if hardware is detected:

conditions:
  - type: hardware-detection
    detector: pciDevicesPresent  # or numaNodesPresent, gpuPresent, etc.

Available detectors:

  • pciDevicesPresent: PCI passthrough-capable devices detected
  • numaNodesPresent: Multi-NUMA topology detected
  • gpuPresent: GPU devices detected
  • sriovCapable: SR-IOV network interfaces detected

Feature Gate Condition

Asset is applied if feature gate is enabled:

conditions:
  - type: feature-gate
    value: CPUManager

Feature gates are typically set in HCO spec or platform configuration.
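
As a point of reference, in upstream KubeVirt the CPUManager gate is enabled on the KubeVirt CR as shown below; how the autopilot discovers enabled gates is implementation-defined:

apiVersion: kubevirt.io/v1
kind: KubeVirt
metadata:
  name: kubevirt
  namespace: openshift-cnv
spec:
  configuration:
    developerConfiguration:
      featureGates:
        - CPUManager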

Multiple Conditions (AND Logic)

All conditions must be true:

conditions:
  - type: annotation
    key: platform.kubevirt.io/openshift
    value: "true"
  - type: hardware-detection
    detector: gpuPresent

This asset is applied only on OpenShift clusters with GPUs.

Testing Your Asset

1. Offline Rendering

Test template syntax and rendering without a cluster:

# Create test HCO YAML
cat > test-hco.yaml <<EOF
apiVersion: hco.kubevirt.io/v1beta1
kind: HyperConverged
metadata:
  name: kubevirt-hyperconverged
  namespace: openshift-cnv
  annotations:
    platform.kubevirt.io/enable-my-feature: "true"
spec:
  liveMigrationConfig:
    parallelMigrationsPerCluster: 10
    parallelOutboundMigrationsPerNode: 2
EOF

# Render all assets
virt-platform-autopilot render --hco-file=test-hco.yaml --output=yaml

# Check for errors
virt-platform-autopilot render --hco-file=test-hco.yaml --output=status

2. Debug Endpoints

Test rendering with live cluster context:

# Port-forward to debug endpoint
kubectl port-forward -n openshift-cnv deployment/virt-platform-autopilot 8081:8081

# Render all assets
curl http://localhost:8081/debug/render

# Render specific asset
curl http://localhost:8081/debug/render/my-asset

# Check exclusions (filtered assets)
curl http://localhost:8081/debug/exclusions

See Debug Endpoints Documentation for details.

3. Integration Tests

Add integration test coverage:

// pkg/controller/controller_test.go
import (
    "context"
    "testing"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    hcov1beta1 "github.com/kubevirt/hyperconverged-cluster-operator/api/v1beta1" // adjust to the HCO API package used in this repo
)

func TestMyAssetRendering(t *testing.T) {
    // Setup test environment
    ctx := context.Background()

    // Create test HCO
    hco := &hcov1beta1.HyperConverged{
        ObjectMeta: metav1.ObjectMeta{
            Name:      "test-hco",
            Namespace: "openshift-cnv",
            Annotations: map[string]string{
                "platform.kubevirt.io/enable-my-feature": "true",
            },
        },
    }

    // Test rendering
    // ... test logic using ctx and hco ...
    _, _ = ctx, hco // placeholder until real assertions use them
}

Run tests:

make test-integration

4. Local Deployment Testing

Test with full controller in Kind cluster:

# Setup local cluster
make kind-setup

# Deploy autopilot
make deploy-local

# Check logs
make logs-local

# Verify asset was applied
kubectl get <resource-type> <resource-name>

# Make changes and redeploy
make redeploy-local

Template Helper Functions

The following helper functions are available in templates:

Resource Lookups

  • crdExists "crdName" - Check if the named CRD (e.g. "kubedeschedulers.operator.openshift.io") is installed
  • objectExists "Kind" "Namespace" "Name" - Check if object exists
  • crdHasEnum "crdName" "fieldPath" "enumValue" - Check if CRD schema has enum value
  • prometheusRuleHasRecordingRule "namespace" "name" "recordName" - Check PrometheusRule

Data Access

  • dig "key1" "key2" ... default object - Safely access nested fields with default
  • .HCO.Object - HyperConverged resource
  • .HCO.Namespace - HCO namespace
  • .HCO.Name - HCO name

.Hardware — cluster hardware detection

  • .Hardware.PCIDevicesPresent (bool) - PCI passthrough-capable devices detected
  • .Hardware.NUMANodesPresent (bool) - Multi-NUMA topology detected
  • .Hardware.VFIOCapable (bool) - IOMMU/VFIO-capable nodes detected
  • .Hardware.USBDevicesPresent (bool) - USB devices detected
  • .Hardware.GPUPresent (bool) - GPU devices detected

.Topology — cluster topology detection

Populated from node role labels and the OpenShift Infrastructure CR (config.openshift.io/v1). All fields default to safe zero-values on non-OpenShift or vanilla Kubernetes clusters.

  • .Topology.IsHCP (bool) - Hosted Control Plane (HyperShift); true when status.controlPlaneTopology == "External"
  • .Topology.IsCompact (bool) - All visible master nodes also carry the worker role (3-node clusters)
  • .Topology.ControlPlaneTopology (string) - Raw Infrastructure CR value: "HighlyAvailable", "SingleReplica", "External"
  • .Topology.CloudProvider (string) - Raw platform type: "AWS", "Azure", "GCP", "BareMetal", "VSphere", "OpenStack", "IBMCloud", "Nutanix", "PowerVS", "None", …
  • .Topology.IsAWS (bool) - Running on AWS
  • .Topology.IsAzure (bool) - Running on Azure
  • .Topology.IsGCP (bool) - Running on GCP
  • .Topology.IsBareMetal (bool) - Running on bare metal
  • .Topology.IsVSphere (bool) - Running on vSphere
  • .Topology.IsOpenStack (bool) - Running on OpenStack
  • .Topology.MasterCount (int) - Nodes carrying the master/control-plane role label
  • .Topology.WorkerCount (int) - Dedicated worker nodes (0 on compact clusters)
  • .Topology.TotalNodeCount (int) - Total visible node count
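
The count fields are handy for sizing decisions; a small sketch (thresholds illustrative):

{{- /* Run a single replica on small clusters */ -}}
{{- if lt .Topology.WorkerCount 3 }}
replicas: 1
{{- else }}
replicas: 2
{{- end }}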

Annotations

  • hasAnnotation object "key" "value" - Check if annotation exists with value

Standard Go Template Functions

All standard Go template actions and functions are available, along with common string helpers:

  • if, else, end
  • range
  • with
  • and, or, not
  • String functions: trim, trimPrefix, trimSuffix, lower, upper
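
A quick illustration of the string helpers (values arbitrary):

{{- $component := "KubeDescheduler" }}
name: {{ lower (trimPrefix "Kube" $component) }}
# renders: name: descheduler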

RBAC Generation

The autopilot needs RBAC permissions for all resource types it manages.

Automatic Generation

After adding new resource types, regenerate RBAC:

make generate-rbac

This tool:

  1. Scans all templates in assets/active/
  2. Extracts unique apiVersion and kind combinations
  3. Generates ClusterRole with required permissions
  4. Updates config/rbac/role.yaml

Manual RBAC (if needed)

If automatic generation doesn't cover your use case, manually edit:

File: config/rbac/role.yaml

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: virt-platform-autopilot-role
rules:
  - apiGroups: ["my-api-group.io"]
    resources: ["myresources"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]

Best Practices

1. Use Meaningful Names

# Good
- name: kubevirt-swap-optimization

# Bad
- name: asset-1

2. Set Appropriate Reconcile Order

  • 0: HCO only
  • 1-9: Infrastructure (MachineConfig, Kubelet, NodeHealthCheck)
  • 10-19: Scheduling and placement
  • 20+: Optional features

3. Provide Sensible Defaults

{{- $timeout := dig "spec" "workloadUpdateStrategy" "timeout" "5m" .HCO.Object }}
timeout: {{ $timeout }}

4. Handle Missing CRDs Gracefully

{{- if crdExists "my-crd.example.com" }}
# Only render if CRD exists
{{- end }}

5. Use Conditions for Opt-In Features

Features that aren't universally applicable should be install: opt-in:

- name: gpu-operator
  install: opt-in
  conditions:
    - type: annotation
      key: platform.kubevirt.io/enable-gpu
      value: "true"
    - type: hardware-detection
      detector: gpuPresent

6. Test Offline First

Always test with the render command before deploying:

virt-platform-autopilot render --hco-file=test-hco.yaml --output=status

7. Document Complex Templates

Add comments for complex template logic:

{{- /* Select descheduler profile based on CRD version */ -}}
{{- if crdHasEnum "kubedeschedulers.operator.openshift.io" "spec.profiles" "KubeVirtRelieveAndMigrate" }}
  # Preferred profile for KubeVirt workloads
  profiles:
    - KubeVirtRelieveAndMigrate
{{- else }}
  # Fallback for older API versions
  profiles:
    - LongLifecycle
{{- end }}

Common Patterns

Pattern 1: Version-Aware Configuration

Adapt to different API versions:

{{- $apiVersion := "v1" }}
{{- if crdHasEnum "myresource.example.com" "spec.mode" "advanced" }}
  {{- $apiVersion = "v2" }}
{{- end }}
apiVersion: myresource.example.com/{{ $apiVersion }}
kind: MyResource

Pattern 2: Topology-Gated Assets

Skip rendering entirely on unsupported topologies:

{{- if .Topology.IsHCP }}
{{- /* HCP clusters have no local control-plane — skip MachineConfig */ -}}
{{- else }}
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-my-tuning
spec:
  config:
    ...
{{- end }}

Or guard a compact-only tuning:

{{- if and .Topology.IsCompact .Topology.IsBareMetal }}
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-compact-baremetal-tuning
...
{{- end }}

Pattern 3: Environment-Specific Settings

Different settings for different environments:

{{- $replicas := 1 }}
{{- if hasAnnotation .HCO.Object "platform.kubevirt.io/environment" "production" }}
  {{- $replicas = 3 }}
{{- end }}
spec:
  replicas: {{ $replicas }}

Pattern 4: Conditional Subsections

Include entire sections conditionally:

apiVersion: v1
kind: ConfigMap
metadata:
  name: my-config
data:
  {{- if hasAnnotation .HCO.Object "platform.kubevirt.io/enable-debug" "true" }}
  debug: "true"
  log-level: "debug"
  {{- end }}
  required-setting: "value"

Troubleshooting

Template Syntax Error

Error: template: asset:5: unexpected "}"...

Fix: Check Go template syntax. Common issues:

  • Missing {{- or }}
  • Unmatched if/end blocks
  • Invalid function calls

Debug:

virt-platform-autopilot render --hco-file=test-hco.yaml --output=status

Asset Not Applied

Possible causes:

  1. Condition not met (check annotations, hardware detection)
  2. CRD not installed (check crdExists guards)
  3. Asset filtered by disabled-resources annotation

Debug:

# Check exclusions
kubectl port-forward -n openshift-cnv deployment/virt-platform-autopilot 8081:8081
curl http://localhost:8081/debug/exclusions

# Check if asset renders
curl http://localhost:8081/debug/render/my-asset

RBAC Permission Denied

Error: forbidden: User "system:serviceaccount:openshift-cnv:virt-platform-autopilot" cannot create resource...

Fix: Regenerate RBAC:

make generate-rbac
make deploy

Thrashing (Constant Reconciliation)

Cause: Template produces different output on each render (timestamps, random values)

Fix: Make templates idempotent - same input should produce same output:

# Bad (changes every render)
timestamp: {{ now }}

# Good (stable value)
{{- $timestamp := dig "metadata" "creationTimestamp" "2024-01-01T00:00:00Z" .HCO.Object }}
createdAt: {{ $timestamp }}

Examples from Existing Assets

HCO Golden Config

File: assets/active/hco/golden-config.yaml.tpl

Production-ready HCO configuration with opinionated defaults. Must have reconcile_order: 0.

NodeHealthCheck

File: assets/active/node-health/standard-remediation.yaml

Simple static YAML - no templating needed.

Descheduler (Conditional)

File: assets/active/descheduler/recommended.yaml.tpl

Complex conditional rendering based on CRD version, reads HCO for eviction limits.

PCI Passthrough (Opt-In)

File: assets/active/machine-config/02-pci-passthrough.yaml.tpl

Requires annotation AND hardware detection to be applied.
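
Conceptually, its catalog entry combines both condition types (annotation key hypothetical):

conditions:
  - type: annotation
    key: platform.kubevirt.io/enable-pci-passthrough
    value: "true"
  - type: hardware-detection
    detector: pciDevicesPresent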

Next Steps

After adding your asset:

  1. Test thoroughly using render command and debug endpoints
  2. Add integration tests in pkg/controller/
  3. Update documentation if the feature affects users
  4. Submit PR with your changes

Related Documentation