The AIC Operator enables the Qualcomm Cloud AI 100 series of accelerators on OpenShift clusters by automating the installation and configuration of their Linux device drivers and setting up the associated Device Plugin.
- Prerequisites
- Installation
- Running Inference Workloads
- Uninstallation
- Building from Source
- Contributing
- License
- OpenShift v4.18+
- Kernel Module Management (KMM) Operator v2.0+ by Red Hat
- Node Feature Discovery (NFD) Operator v4.16+ by Red Hat
- A Qualcomm AI 100 Ultra accelerator attached to the host node
ocCLI configured and logged in with cluster-admin privileges- Access to a container registry reachable from the cluster
The Kernel Module Management (KMM) Operator handles out-of-tree driver builds and loading on cluster nodes.
- In the OpenShift web console, navigate to Ecosystem → Software Catalog.
- Search for Kernel Module Management and install the version provided by Red Hat (not the Hub or Community variant).
- Use the default KMM namespace
openshift-kmm.
The Node Feature Discovery (NFD) Operator labels nodes with hardware capabilities so the AIC Operator can target the correct nodes.
- In the OpenShift web console, navigate to Ecosystem → Software Catalog.
- Search for Node Feature Discovery and install the version provided by Red Hat (not the Community variant).
- Use the default NFD namespace
openshift-nfd. - After installation, navigate to the NFD operator page and create the default NodeFeatureDiscovery CR by clicking Create instance.
Log in to the cluster from a terminal. To get the login command, open the web console, click the kube:admin dropdown in the top-right corner, and select Copy login command.
oc get nodesExpected output:
NAME STATUS ROLES AGE VERSION
<node-name> Ready control-plane,master,worker 209d v1.29.10+67d3387
oc create namespace aic-operator-systemThe sourceImage, devicePluginImage, and socResetImage are hosted on the public
ghcr.io/quic registry and do not require credentials.
The driversImage is a kernel-specific image that you build and push to your own
registry using the Dockerfile provided in this repository. Create a pull secret so
the cluster can pull it:
oc create secret docker-registry <pull-secret-name> \
--docker-server=<your-registry-server> \
--docker-username=<username> \
--docker-password=<password> \
-n aic-operator-systemReference this secret by name in the
AICCR underspec.imageRepoSecret.name.
The KMM worker needs to know where firmware files are stored on the host. Apply the following ConfigMap to set the firmware host path, then restart the KMM controller pods:
Warning: The
oc applycommand below replaces the entirekmm-operator-manager-configConfigMap. If your cluster has custom KMM settings beyond the firmware path, merge them into thecontroller_config.yamlblock before applying.
cat <<EOF | oc apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
name: kmm-operator-manager-config
namespace: openshift-kmm
data:
controller_config.yaml: |
worker:
firmwareHostPath: /var/lib/firmware/updates # Ensure firmware files are placed in this directory
EOFNote: The path
/var/lib/firmware/updatesis the expected firmware location for this configuration. Verify that firmware files on your cluster nodes are placed in this directory. If your cluster uses a different firmware path, update this value accordingly before applying.
Verify the ConfigMap was updated:
oc get configmap kmm-operator-manager-config -n openshift-kmm -o=json
# Verify the output contains: "firmwareHostPath: /var/lib/firmware/updates"Restart the KMM controller to apply the change:
oc delete pod -n openshift-kmm -l app.kubernetes.io/component=kmmVerify the KMM controller pods are running after the restart:
oc get pods -n openshift-kmm -l app.kubernetes.io/component=kmm
# All pods should show STATUS: RunningFor full KMM configuration options see the KMM documentation.
Option A — via the web console:
- Navigate to Ecosystem → Software Catalog.
- Search for AIC Operator and install the version provided by Qualcomm Technologies, Inc.
- Set the Installed Namespace to
aic-operator-system.
Option B — via CLI using Kustomize:
# Latest (uses the image tag committed on the main branch)
oc apply -k https://github.com/quic/aic-operator/config/default
# Pin to a specific release — the kustomization.yaml at that tag already has the
# correct image version, so no substitution is needed
oc apply -k https://github.com/quic/aic-operator/config/default?ref=v0.3.0The image tag is managed in
config/manager/kustomization.yamlin the repository. Each release tag sets the correct operator image version — use?ref=<tag>to select a release (see GitHub releases). This command creates theaic-operator-systemnamespace, registers the CRD, and deploys the controller in one step. When using this option:
- Skip the
oc create namespacecommand in step 3.- Create the pull secret after running the command above (the namespace must exist first). Use the same
oc create secret docker-registrycommand from step 3.
A successful installation (either option) starts a pod named
aic-operator-controller-manager-* (with 2 containers) in the
aic-operator-system namespace:
oc get pods -n aic-operator-systemCreate an AIC custom resource to load the drivers and start the Device Plugin on nodes
that carry a Cloud AI 100 accelerator.
Save the following as aic-sample.yaml, replacing all <placeholder> values for your
environment. The ${KERNEL_VERSION} token in driversImage is substituted
automatically by KMM at build time — leave it as-is.
apiVersion: aic.quicinc.com/v1
kind: AIC
metadata:
name: aic-sample
namespace: aic-operator-system
labels:
app.kubernetes.io/name: aic
app.kubernetes.io/instance: aic-sample
app.kubernetes.io/part-of: aic-operator
app.kubernetes.io/managed-by: kustomize
app.kubernetes.io/created-by: aic-operator
spec:
sourceImage: "ghcr.io/quic/cloud_ai_qaic_kmd_src"
driversImage: "<your-registry-server>/<username>/cloud_ai_qaic_kmd_ko:0.4.0_1.21.4.0_${KERNEL_VERSION}"
driversVersion: "0.4.0_1.21.4.0"
devicePluginImage: "ghcr.io/quic/cloud_ai_k8s_device_plugin"
devPluginVersion: "1.21.4.0"
socResetImage: "ghcr.io/quic/cloud_ai_socreset"
socResetVersion: "0.4.0_1.21.4.0"
imageRepoSecret:
name: <pull-secret-name> # name of the secret created in step 3Apply it:
oc create -f aic-sample.yamlVerify the AIC instance is running and the device plugin pod has started:
oc get aic -n aic-operator-system
oc get pods -n aic-operator-systemOnce the AIC instance is running, nodes with Cloud AI 100 hardware expose the
qualcomm.com/qaic resource. The following example pod requests 4 AIC devices and
keeps a shell open for interactive use:
apiVersion: v1
kind: Pod
metadata:
name: qaic-inf-deployment
namespace: aic-operator-system
labels:
app: qaic-inf
spec:
restartPolicy: Always
priority: 0
containers:
- name: qaic-inf
image: ghcr.io/quic/cloud_ai_inference_ubuntu24:1.20.6.0
imagePullPolicy: IfNotPresent
command: ["/bin/bash", "-ce", "tail -f /dev/null"]
resources:
limits:
qualcomm.com/qaic: "4" # number of AIC devices to allocate
requests:
qualcomm.com/qaic: "4"
securityContext:
runAsGroup: 995 # GID of the 'qaic' group inside the container imageSave the manifest above as qaic-inf-pod.yaml, then apply it:
oc create -f qaic-inf-pod.yamlOpen a shell inside the pod and run the sample inference script:
oc exec -it qaic-inf-deployment -n aic-operator-system -- /bin/bash
# Inside the pod:
source vllm_env/bin/activate
python /opt/qti-aic/integrations/vllm/examples/offline_inference/qaic.pyDelete the AIC custom resource first (this stops the device plugin and unloads the drivers):
oc delete -f aic-sample.yaml
# or
oc delete AIC aic-sample -n aic-operator-systemThen uninstall the operator through the OpenShift web console: Ecosystem → Installed Operators → AIC Operator → Actions → Uninstall Operator.
- Go v1.23.0+
- Docker v17.03+
make docker-build docker-push \
IMG=<some-registry>/cloud_ai_openshift_operator:tag \
VERSION=<version>make bundle-build bundle-push \
IMG=<some-registry>/cloud_ai_openshift_operator:tag \
BUNDLE_IMG=<some-registry>/cloud_ai_openshift_operator_bundle:tag \
VERSION=<version>Note: The images must be published to a registry accessible from your cluster. Ensure you have the appropriate push and pull permissions before running these commands.
make install
make deploy IMG=<some-registry>/cloud_ai_openshift_operator:tagRun make help for a full list of available targets.
If you deployed the operator manually from source, use the Makefile targets instead:
oc delete -f aic-sample.yaml
make uninstall
make undeploySee the CONTRIBUTING guide for details on how to contribute to this project. Additional background on the operator framework can be found in the Kubebuilder Documentation.
The AIC Operator is licensed under the terms of the LICENSE file.