Commit 0d43bb2

More gpu docs improvements (#2033)
1 parent 240b3ed commit 0d43bb2

1 file changed: +33 -47 lines changed
doc/source/operations/gpu-in-openstack.rst

Lines changed: 33 additions & 47 deletions
@@ -186,52 +186,6 @@ path using ``file`` as the url scheme e.g:
 
 See :ref:`NVIDIA Role Configuration`.
 
-.. _NVIDIA OS Configuration:
-
-OS Configuration
-----------------
-
-Host OS configuration is done by using roles in the `stackhpc.linux <https://github.com/stackhpc/ansible-collection-linux>`_ ansible collection.
-
-Create a new playbook or update an existing on to apply the roles:
-
-.. code-block:: yaml
-   :caption: $KAYOBE_CONFIG_PATH/ansible/host-configure.yml
-
-   ---
-   - hosts: iommu
-     tags:
-       - iommu
-     tasks:
-       - import_role:
-           name: stackhpc.linux.iommu
-     handlers:
-       - name: reboot
-         set_fact:
-           kayobe_needs_reboot: true
-
-   - hosts: vgpu
-     tags:
-       - vgpu
-     tasks:
-       - import_role:
-           name: stackhpc.linux.vgpu
-     handlers:
-       - name: reboot
-         set_fact:
-           kayobe_needs_reboot: true
-
-   - name: Reboot when required
-     hosts: iommu:vgpu
-     tags:
-       - reboot
-     tasks:
-       - name: Reboot
-         reboot:
-           reboot_timeout: 3600
-         become: true
-         when: kayobe_needs_reboot | default(false) | bool
-
 Ansible Inventory Configuration
 -------------------------------
 
@@ -276,7 +230,39 @@ hosts can automatically be mapped to these groups by configuring
 Role Configuration
 ------------------
 
-Configure the VGPU devices:
+Look up the supported VGPU devices (here we use an H100 as an example).
+``0000:06:00.0`` is the PCI address of the GPU itself. You can find this with
+``lspci | grep NVIDIA``.
+
+.. code-block:: bash
+
+   # Find the supported mdev types
+   ls /sys/class/mdev_bus/0000\:06\:00.0/mdev_supported_types/
+   nvidia-1130 nvidia-1131 nvidia-1132 nvidia-1133 nvidia-1134 nvidia-1135 nvidia-1136 nvidia-1137 nvidia-1138 nvidia-1139 nvidia-1140 nvidia-1141 nvidia-1142 nvidia-1143 nvidia-1144
+
+   # Find the names of these types.
+   cat /sys/class/mdev_bus/0000\:06\:00.0/mdev_supported_types/*/name
+   NVIDIA H100XM-1-10CME
+   NVIDIA H100XM-1-10C
+   NVIDIA H100XM-1-20C
+   NVIDIA H100XM-2-20C
+   NVIDIA H100XM-3-40C
+   NVIDIA H100XM-4-40C
+   NVIDIA H100XM-7-80C
+   NVIDIA H100XM-4C
+   NVIDIA H100XM-5C
+   NVIDIA H100XM-8C
+   NVIDIA H100XM-10C
+   NVIDIA H100XM-16C
+   NVIDIA H100XM-20C
+   NVIDIA H100XM-40C
+   NVIDIA H100XM-80C
+
+See
+`the NVIDIA VGPU user guide <https://docs.nvidia.com/vgpu/19.0/grid-vgpu-user-guide/index.html>`__`
+for details on device types.
+
+Configure the VGPU devices (here we use an A100 as a different example).
 
 .. code-block:: yaml
    :caption: $KAYOBE_CONFIG_PATH/inventory/group_vars/compute_vgpu/vgpu
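
The lookup added in this change lists the mdev type IDs and the type names with two separate commands, so the reader has to match them up by position. As a possible convenience, the sketch below prints each type ID next to its name. The function name ``list_mdev_types`` is hypothetical and not part of the commit. It assumes only the sysfs layout shown in the diff, where each directory under ``mdev_supported_types`` contains a ``name`` file.

```shell
#!/bin/sh
# Hypothetical helper (not part of the commit): print "<type id><TAB><name>"
# for every mdev type directory that contains a readable `name` file.
# Usage: list_mdev_types <path to an mdev_supported_types directory>
list_mdev_types() {
    types_dir="$1"
    for type_path in "$types_dir"/*/; do
        # Skip when the glob matched nothing or the entry has no name file.
        [ -r "${type_path}name" ] || continue
        type_id=$(basename "$type_path")
        printf '%s\t%s\n' "$type_id" "$(cat "${type_path}name")"
    done
}

# Example call, using the PCI address from the diff (adjust for your host):
# list_mdev_types /sys/class/mdev_bus/0000:06:00.0/mdev_supported_types
```

Taking the directory as an argument keeps the PCI address out of the function, so the same helper can be pointed at any GPU found with ``lspci | grep NVIDIA``.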
