| 1 | +@startuml dpunetwork_cr_creation |
| 2 | + |
| 3 | +actor user |
| 4 | +box "Kubernetes Control Plane" |
| 5 | +participant k8s_api |
| 6 | +participant dpu_network_controller |
| 7 | +participant configmap as "ConfigMap\ndpu-device-plugin-config" |
| 8 | +end box |
| 9 | + |
| 10 | +box "Host Node" |
| 11 | +participant kubelet_host as "kubelet (Host)" |
| 12 | +participant dpu_daemon_host as "dpu-daemon (Host)\n(Device Plugin Manager + Device Plugin)" |
| 13 | +participant vsp_host as "vsp (Host)" |
| 14 | +end box |
| 15 | + |
| 16 | +box "DPU Node" |
| 17 | +participant kubelet_dpu as "kubelet (DPU)" |
| 18 | +participant dpu_daemon_dpu as "dpu-daemon (DPU)\n(Device Plugin Manager + Device Plugin)" |
| 19 | +participant vsp_dpu as "vsp (DPU)" |
| 20 | +end box |
| 21 | + |
| 22 | +autonumber |
| 23 | + |
| 24 | +== DpuNetwork CR Creation (Multiple Networks) == |
| 25 | + |
| 26 | +user -> k8s_api: Create DpuNetwork CR 1 |
| 27 | +activate k8s_api |
| 28 | +note right: **DpuNetwork 1: "dpu-network-1"**\n\napiVersion: networking.example.com/v1\nkind: DpuNetwork\nmetadata:\n name: dpu-network-1\nspec:\n nodeSelector:\n matchLabels:\n node-role: dpu-node\n dpuSelector:\n matchExpressions:\n - key: dpu-type\n operator: In\n values: ["IPU Adapter E2100"]\n - key: vfId\n operator: In\n values: ["0-3", "5-7"]\n IsDisruptive: true |
| 29 | + |
| 30 | +k8s_api -> dpu_network_controller: Reconcile Event |
| 31 | +activate dpu_network_controller |
| 32 | + |
| 33 | +== DpuNetwork Controller Reconciliation == |
| 34 | + |
| 35 | +dpu_network_controller -> k8s_api: List Nodes (match nodeSelector) |
| 36 | +activate k8s_api |
| 37 | +k8s_api -> dpu_network_controller: Matching Nodes |
| 38 | +deactivate k8s_api |
| 39 | + |
| 40 | +dpu_network_controller -> k8s_api: List Dpu CRs |
| 41 | +activate k8s_api |
| 42 | +k8s_api -> dpu_network_controller: All Dpu CRs |
| 43 | +note right: Dpu CR contains:\n netdevs:\n - name: "ens2f0v0" vfId: 0\n - name: "ens2f0v1" vfId: 1\n - name: "ens2f0v2" vfId: 2\n - name: "ens2f0v3" vfId: 3\n - name: "ens2f0v4" vfId: 4\n - name: "ens2f0v5" vfId: 5\n - name: "ens2f0v6" vfId: 6\n - name: "ens2f0v7" vfId: 7 |
| 44 | +deactivate k8s_api |
| 45 | + |
| 46 | +dpu_network_controller -> dpu_network_controller: Evaluate dpuSelector\n(match dpu-type and vfId) |
| 47 | + |
| 48 | +dpu_network_controller -> dpu_network_controller: Parse vfId ranges\n("0-3" -> [0,1,2,3]\n"5-7" -> [5,6,7]) |
| 49 | + |
| 50 | +dpu_network_controller -> dpu_network_controller: Filter VFs from Dpu CRs\n(match selected VFs: 0,1,2,3,5,6,7) |
| 51 | + |
| 52 | +dpu_network_controller -> dpu_network_controller: Generate ResourceName\n"openshift.io/dpunetwork-<dpuNetworkCR Name>" |
| 53 | + |
| 54 | +== ConfigMap-Based Device Plugin Registration == |
| 55 | + |
| 56 | +dpu_network_controller -> dpu_network_controller: Aggregate all DpuNetwork CRs\nfor this node |
| 57 | + |
| 58 | +dpu_network_controller -> dpu_network_controller: Generate ConfigMap data\n(config.json with resource definitions) |
| 59 | + |
| 60 | +dpu_network_controller -> k8s_api: Create/Update ConfigMap\ndpu-device-plugin-config |
| 61 | +activate k8s_api |
| 62 | +note right: **ConfigMap Approach (Single Source of Truth)**\n\n**One ConfigMap describes resources for both Host and DPU nodes.**\n**Each entry carries a nodeSelector so local daemons only advertise their slice.**\n\napiVersion: v1\nkind: ConfigMap\nmetadata:\n name: dpu-device-plugin-config\n namespace: dpu-operator-system\ndata:\n config.json: |\n {\n "resources": [\n {\n "resourceName": "openshift.io/dpunetwork-dpu-network-1",\n "dpuNetworkName": "dpu-network-1",\n "nodeSelector": {"matchLabels": {"node-role": "host"}},\n "vfRanges": ["0-3", "5-7"]\n },\n {\n "resourceName": "openshift.io/dpunetwork-dpu-network-1",\n "dpuNetworkName": "dpu-network-1",\n "nodeSelector": {"matchLabels": {"node-role": "dpu"}},\n "vfRanges": ["0-3", "5-7"],\n "rpmRanges": ["0-0"],\n "vethRanges": ["0-1"]\n }\n // Additional resources per DpuNetwork CR\n ]\n } |
| 63 | +k8s_api -> configmap: ConfigMap Created/Updated |
| 64 | +activate configmap |
| 65 | +k8s_api -> dpu_network_controller: ConfigMap Updated |
| 66 | +deactivate k8s_api |
| 67 | + |
| 68 | +== Host dpu-daemon Watches ConfigMap == |
| 69 | + |
| 70 | +configmap -> dpu_daemon_host: ConfigMap Change Event\n(watch notification) |
| 71 | +activate dpu_daemon_host |
| 72 | + |
| 73 | +dpu_daemon_host -> k8s_api: Get ConfigMap\ndpu-device-plugin-config |
| 74 | +activate k8s_api |
| 75 | +k8s_api -> dpu_daemon_host: ConfigMap with config.json |
| 76 | +deactivate k8s_api |
| 77 | + |
| 78 | +dpu_daemon_host -> dpu_daemon_host: Parse config.json\nFilter entries where node-role = host |
| 79 | + |
| 80 | +note over dpu_daemon_host: **Per-Node Architecture Decision:**\n**Single device plugin instance per node**\n- Host instance only advertises host-scoped resources\n- Reads shared ConfigMap, filters via nodeSelector\n- Updates in-place on ConfigMap changes |
| 81 | + |
| 82 | +alt Host Device Plugin Not Running |
| 83 | + dpu_daemon_host -> dpu_daemon_host: Start Device Plugin Instance\n(read host resources) |
| 84 | +else Host Device Plugin Already Running |
| 85 | + dpu_daemon_host -> dpu_daemon_host: Reload Config\n(apply new host resource set) |
| 86 | +end |
| 87 | + |
| 88 | +dpu_daemon_host -> vsp_host: GetDevices() |
| 89 | +activate vsp_host |
| 90 | +vsp_host -> vsp_host: Return host-visible devices\n(VF repr set shared with DPU) |
| 91 | +vsp_host -> dpu_daemon_host: Host device inventory |
| 92 | +deactivate vsp_host |
| 93 | + |
| 94 | +dpu_daemon_host -> dpu_daemon_host: Build device list\nApply vfRanges [0-3,5-7] |
| 95 | +note right: Host Resource\n"openshift.io/dpunetwork-dpu-network-1"\nDevices: VFs 0,1,2,3,5,6,7 (no RPM/veth) |
| 96 | + |
| 97 | +dpu_daemon_host -> dpu_daemon_host: ListAndWatch()\n(advertise host resource only) |
| 98 | + |
| 99 | +dpu_daemon_host -> kubelet_host: Register Device Plugin\nResource "openshift.io/dpunetwork-dpu-network-1" |
| 100 | +activate kubelet_host |
| 101 | +kubelet_host -> dpu_daemon_host: Registration Accepted |
| 102 | +kubelet_host -> kubelet_host: Add node capacity\n"openshift.io/dpunetwork-dpu-network-1": 7 (host) |
| 103 | +deactivate kubelet_host |
| 104 | + |
| 105 | +deactivate dpu_daemon_host |
| 106 | + |
| 107 | +== DPU dpu-daemon Watches ConfigMap == |
| 108 | + |
| 109 | +configmap -> dpu_daemon_dpu: ConfigMap Change Event\n(watch notification) |
| 110 | +activate dpu_daemon_dpu |
| 111 | + |
| 112 | +dpu_daemon_dpu -> k8s_api: Get ConfigMap\ndpu-device-plugin-config |
| 113 | +activate k8s_api |
| 114 | +k8s_api -> dpu_daemon_dpu: ConfigMap with config.json |
| 115 | +deactivate k8s_api |
| 116 | + |
| 117 | +dpu_daemon_dpu -> dpu_daemon_dpu: Parse config.json\nFilter entries where node-role = dpu |
| 118 | + |
| 119 | +note over dpu_daemon_dpu: **Per-Node Architecture Decision:**\n**Single DPU-side device plugin instance**\n- Reads same ConfigMap, filters for node-role=dpu\n- Advertises VF + RPM + veth resources\n- No restart required on updates |
| 120 | + |
| 121 | +alt DPU Device Plugin Not Running |
| 122 | + dpu_daemon_dpu -> dpu_daemon_dpu: Start Device Plugin Instance\n(read DPU resources) |
| 123 | +else DPU Device Plugin Already Running |
| 124 | + dpu_daemon_dpu -> dpu_daemon_dpu: Reload Config\n(apply new DPU resource set) |
| 125 | +end |
| 126 | + |
| 127 | +dpu_daemon_dpu -> vsp_dpu: GetDevices() |
| 128 | +activate vsp_dpu |
| 129 | +vsp_dpu -> vsp_dpu: Return devices by type\n(VF repr, RPM, veth) |
| 130 | +vsp_dpu -> dpu_daemon_dpu: DPU device inventory |
| 131 | +deactivate vsp_dpu |
| 132 | + |
| 133 | +dpu_daemon_dpu -> dpu_daemon_dpu: Build device lists\n- VF repr filtered by vfRanges\n- RPM list via rpmRanges\n- veth list via vethRanges |
| 134 | +note right: DPU Resources Advertised\n1. "openshift.io/dpunetwork-dpu-network-1" (VF x7)\n2. "openshift.io/rpm-disruptive" (rpmRange 0-0)\n3. "openshift.io/veth-nondisruptive" (vethRange 0-1) |
| 135 | + |
| 136 | +dpu_daemon_dpu -> dpu_daemon_dpu: ListAndWatch()\n(advertise three resources) |
| 137 | + |
| 138 | +dpu_daemon_dpu -> kubelet_dpu: Register Device Plugin\nAll DPU resources |
| 139 | +activate kubelet_dpu |
| 140 | +kubelet_dpu -> dpu_daemon_dpu: Registration Accepted |
| 141 | +kubelet_dpu -> kubelet_dpu: Add node capacity\nVF=7, RPM=1, veth=2 |
| 142 | +deactivate kubelet_dpu |
| 143 | + |
| 144 | +deactivate dpu_daemon_dpu |
| 145 | +deactivate configmap |
| 146 | + |
| 147 | +== BridgeID and NAD Generation (1 NAD per DpuNetwork CR) == |
| 148 | + |
| 149 | +dpu_network_controller -> dpu_network_controller: Create BridgeID |
| 150 | + |
| 151 | +dpu_network_controller -> dpu_network_controller: Create single NAD\nfor all VFs in network\n(shared config: IsDisruptive, IPAM) |
| 152 | + |
| 153 | +dpu_network_controller -> k8s_api: Create NetworkAttachmentDefinition |
| 154 | +activate k8s_api |
| 155 | +note right: **NAD 1 for DpuNetwork 1**\n\nmetadata:\n name: dpunetwork-1-nad\n namespace: default\n annotations:\n dpu.config.openshift.io/dpu-network: dpu-network-1\n k8s.v1.cni.cncf.io/resourceName: openshift.io/dpunetwork-dpu-network-1\nspec:\n config: {\n "type": "dpu-cni",\n "cniVersion": "0.4.0",\n "name": "dpu-cni",\n "BridgeID": "<created-bridgeID>",\n "IsDisruptive": "true",\n "ipam": {...}\n }\n\n**VFs (0,1,2,3,5,6,7) use this NAD**\n**Multiple pods can use this NAD**\n**Each pod gets allocated a VF from the pool** |
| 156 | +k8s_api -> dpu_network_controller: NAD Created |
| 157 | +deactivate k8s_api |
| 158 | + |
| 159 | +note over dpu_network_controller: **About NRI (Network Resources Injector):**\nNRI webhook is installed once (via DpuOperatorConfig) and is not re-registered per DpuNetwork.\nDpuNetwork creation only needs to create NAD(s) and (optionally) publish a mapping (e.g., in DpuNetwork.status)\nso NRI can translate `dpu.config.openshift.io/dpu-network: <name>` into\n`k8s.v1.cni.cncf.io/networks: <nad list>` during Pod CREATE. |
| 160 | + |
| 161 | +dpu_network_controller -> k8s_api: Update DpuNetwork 1 Status |
| 162 | +activate k8s_api |
| 163 | +note right: status:\n conditions:\n - type: Ready\n status: "True"\n message: NAD and Device Plugin created\n resourceName: "openshift.io/dpunetwork-dpu-network-1"\n selectedVFs: [0,1,2,3,5,6,7]\n excludedVFs: [4] |
| 164 | +k8s_api -> dpu_network_controller: Status Updated |
| 165 | +deactivate k8s_api |
| 166 | + |
| 167 | +deactivate dpu_network_controller |
| 168 | +deactivate k8s_api |
| 169 | + |
| 170 | +note over k8s_api: **Architecture Summary:**\n**Single ConfigMap, per-node device plugin instances**\n**- Host dpu-daemon filters node-role=host resources**\n**- DPU dpu-daemon filters node-role=dpu resources (VF+RPM+veth)**\n**- Each node runs exactly one device plugin instance**\n**- Entries share resourceName when devices overlap**\n**- NAD per DpuNetwork CR stays unchanged**\n\n**When new DpuNetwork CR created:**\n**- Controller updates ConfigMap with host + DPU entries**\n**- Both daemons detect change and reload in-place**\n**- No new pods/daemons required, only ListAndWatch updates** |
| 171 | + |
| 172 | +note right of user: **See:**\n- pod_creation_regular.puml for pod creation flow\n- pod_creation_nf_disruptive.puml for NF pod flow\n- dpunetwork_cr_update.puml for update flow\n- dpunetwork_cr_deletion.puml for deletion flow |
| 173 | + |
| 174 | +@enduml |
| 175 | + |
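
The "Parse vfId ranges" and "Filter VFs from Dpu CRs" steps in the diagram can be illustrated with a short sketch. This is not the controller's actual code: the function names `parse_vf_ranges` and `filter_netdevs` are hypothetical, and the netdev entries are modelled as plain dictionaries that mirror the `name`/`vfId` pairs shown in the Dpu CR note.

```python
from typing import Iterable


def parse_vf_ranges(ranges: Iterable[str]) -> list[int]:
    """Expand strings such as "0-3" or "5" into a sorted list of unique VF IDs.

    Hypothetical helper; the range syntax is assumed from the diagram note.
    """
    ids: set[int] = set()
    for item in ranges:
        start_text, sep, end_text = item.strip().partition("-")
        start = int(start_text)
        end = int(end_text) if sep else start  # a bare "5" means the single VF 5
        if end < start:
            raise ValueError(f"invalid VF range: {item!r}")
        ids.update(range(start, end + 1))
    return sorted(ids)


def filter_netdevs(netdevs: list[dict], selected: Iterable[int]) -> list[dict]:
    """Keep only the netdevs whose vfId is in the selected set."""
    wanted = set(selected)
    return [dev for dev in netdevs if dev["vfId"] in wanted]


# Netdevs as listed in the Dpu CR note of the diagram (ens2f0v0 .. ens2f0v7).
netdevs = [{"name": f"ens2f0v{i}", "vfId": i} for i in range(8)]

selected = parse_vf_ranges(["0-3", "5-7"])
excluded = [dev["vfId"] for dev in netdevs if dev["vfId"] not in set(selected)]

print(selected)                                # [0, 1, 2, 3, 5, 6, 7]
print(excluded)                                # [4]
print(len(filter_netdevs(netdevs, selected)))  # 7
```

With the ranges from the DpuNetwork note, the result matches the `selectedVFs` and `excludedVFs` values in the status update, and the count of 7 matches the node capacity that kubelet records for `openshift.io/dpunetwork-dpu-network-1`.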