@@ -186,52 +186,6 @@ path using ``file`` as the url scheme e.g:
 
 See :ref:`NVIDIA Role Configuration`.
 
-.. _NVIDIA OS Configuration:
-
-OS Configuration
-----------------
-
-Host OS configuration is done by using roles in the `stackhpc.linux <https://github.com/stackhpc/ansible-collection-linux>`_ ansible collection.
-
-Create a new playbook or update an existing on to apply the roles:
-
-.. code-block:: yaml
-   :caption: $KAYOBE_CONFIG_PATH/ansible/host-configure.yml
-
-   ---
-   - hosts: iommu
-     tags:
-       - iommu
-     tasks:
-       - import_role:
-           name: stackhpc.linux.iommu
-     handlers:
-       - name: reboot
-         set_fact:
-           kayobe_needs_reboot: true
-
-   - hosts: vgpu
-     tags:
-       - vgpu
-     tasks:
-       - import_role:
-           name: stackhpc.linux.vgpu
-     handlers:
-       - name: reboot
-         set_fact:
-           kayobe_needs_reboot: true
-
-   - name: Reboot when required
-     hosts: iommu:vgpu
-     tags:
-       - reboot
-     tasks:
-       - name: Reboot
-         reboot:
-           reboot_timeout: 3600
-         become: true
-         when: kayobe_needs_reboot | default(false) | bool
-
 Ansible Inventory Configuration
 -------------------------------
 
@@ -276,7 +230,39 @@ hosts can automatically be mapped to these groups by configuring
 Role Configuration
 ------------------
 
-Configure the VGPU devices:
+Look up the supported VGPU devices (here we use an H100 as an example).
+``0000:06:00.0`` is the PCI address of the GPU itself. You can find this with
+``lspci | grep NVIDIA``.
+
+.. code-block:: bash
+
+   # Find the supported mdev types
+   ls /sys/class/mdev_bus/0000\:06\:00.0/mdev_supported_types/
+   nvidia-1130 nvidia-1131 nvidia-1132 nvidia-1133 nvidia-1134 nvidia-1135 nvidia-1136 nvidia-1137 nvidia-1138 nvidia-1139 nvidia-1140 nvidia-1141 nvidia-1142 nvidia-1143 nvidia-1144
+
+   # Find the names of these types.
+   cat /sys/class/mdev_bus/0000\:06\:00.0/mdev_supported_types/*/name
+   NVIDIA H100XM-1-10CME
+   NVIDIA H100XM-1-10C
+   NVIDIA H100XM-1-20C
+   NVIDIA H100XM-2-20C
+   NVIDIA H100XM-3-40C
+   NVIDIA H100XM-4-40C
+   NVIDIA H100XM-7-80C
+   NVIDIA H100XM-4C
+   NVIDIA H100XM-5C
+   NVIDIA H100XM-8C
+   NVIDIA H100XM-10C
+   NVIDIA H100XM-16C
+   NVIDIA H100XM-20C
+   NVIDIA H100XM-40C
+   NVIDIA H100XM-80C
+
+See
+`the NVIDIA VGPU user guide <https://docs.nvidia.com/vgpu/19.0/grid-vgpu-user-guide/index.html>`__
+for details on device types.
+
+Configure the VGPU devices (here we use an A100 as a different example).
 
 .. code-block:: yaml
    :caption: $KAYOBE_CONFIG_PATH/inventory/group_vars/compute_vgpu/vgpu