Description
NVIDIA Open GPU Kernel Modules Version
580.95.05
Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.
- I confirm that this does not happen with the proprietary driver package.
Operating System and Version
Description: Ubuntu 22.04.5 LTS
Kernel Release
Linux gpu-svr 6.8.0-85-generic #85~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Fri Sep 19 16:18:59 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.
- I am running on a stable kernel release.
Hardware: GPU
NVIDIA GeForce RTX 5090
Describe the bug
System: Supermicro AS-4125GS-TNRT2
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash amd_iommu=on iommu=pt"
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.95.05              Driver Version: 580.95.05      CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 5090        On  |   00000000:65:00.0 Off |                  N/A |
|  0%   33C    P8             13W /  575W |       1MiB /  32607MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 5090        On  |   00000000:84:00.0 Off |                  N/A |
|  0%   33C    P8             14W /  575W |       1MiB /  32607MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA GeForce RTX 5090        On  |   00000000:E4:00.0 Off |                  N/A |
|  0%   35C    P8              6W /  575W |       1MiB /  32607MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
uname -r
6.8.0-85-generic
BIOS:
CSM Disabled
IOMMU Disabled
ACS Disabled
Re-Size BAR Enabled
lspci -vvv |grep ACSCtl
ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
.
.
.
ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
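For anyone reading the flags: in lspci output a trailing `-` means the control bit is off and `+` means it is on, so every ACSCtl line above shows ACS fully disabled. A minimal Python sketch for checking this across many lines follows; the helper name `enabled_acs_flags` is made up for illustration and is not part of any tool.

```python
def enabled_acs_flags(acsctl_line: str) -> list[str]:
    """Return the ACS control flags that lspci marks as enabled ('+')."""
    # Everything after "ACSCtl:" is a space-separated list like "SrcValid- TransBlk+".
    _, _, flags = acsctl_line.partition("ACSCtl:")
    return [tok[:-1] for tok in flags.split() if tok.endswith("+")]


# Line copied from the lspci output above.
line = ("ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- "
        "UpstreamFwd- EgressCtrl- DirectTrans-")
print(enabled_acs_flags(line))  # prints []
```

An empty list for every bridge would match the "ACS Disabled" BIOS setting.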
dmesg:
[ 2.022048] kernel: pci 0000:65:00.0: [10de:2b85] type 00 class 0x030000 PCIe Legacy Endpoint
[ 2.023376] kernel: pci 0000:65:00.0: BAR 0 [mem 0xd4000000-0xd7ffffff]
[ 2.023389] kernel: pci 0000:65:00.0: BAR 1 [mem 0x303b0000000-0x303bfffffff 64bit pref]
[ 2.023402] kernel: pci 0000:65:00.0: BAR 3 [mem 0x303c2000000-0x303c3ffffff 64bit pref]
[ 2.023409] kernel: pci 0000:65:00.0: BAR 5 [io 0x4000-0x407f]
[ 2.023415] kernel: pci 0000:65:00.0: ROM [mem 0xd8000000-0xd807ffff pref]
[ 2.024054] kernel: pci 0000:65:00.0: Enabling HDA controller
[ 2.024798] kernel: pci 0000:65:00.0: PME# supported from D0 D3hot
[ 2.025267] kernel: pci 0000:65:00.0: VF BAR 0 [mem 0x303c4000000-0x303c403ffff 64bit pref]
[ 2.025267] kernel: pci 0000:65:00.0: VF BAR 0 [mem 0x303c4000000-0x303c403ffff 64bit pref]: contains BAR 0 for 1 VFs
[ 2.025277] kernel: pci 0000:65:00.0: VF BAR 2 [mem 0x303a0000000-0x303afffffff 64bit pref]
[ 2.025277] kernel: pci 0000:65:00.0: VF BAR 2 [mem 0x303a0000000-0x303afffffff 64bit pref]: contains BAR 2 for 1 VFs
[ 2.025287] kernel: pci 0000:65:00.0: VF BAR 4 [mem 0x303c0000000-0x303c1ffffff 64bit pref]
[ 2.025287] kernel: pci 0000:65:00.0: VF BAR 4 [mem 0x303c0000000-0x303c1ffffff 64bit pref]: contains BAR 4 for 1 VFs
[ 2.183898] kernel: pci 0000:65:00.0: vgaarb: bridge control possible
[ 2.183898] kernel: pci 0000:65:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[ 2.208377] kernel: pci 0000:65:00.1: D0 power state depends on 0000:65:00.0
[ 12.860169] kernel: nouveau 0000:65:00.0: unknown chipset (1b2000a1)
[ 12.937109] kernel: nvidia 0000:65:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=none:owns=none
[ 13.085425] kernel: [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:65:00.0 on minor 2
[ 14.332406] kernel: NVOC: __nvoc_objDelete: Child class KernelVideoEngine not freed from parent class OBJGPU.NVRM: GPU 0000:65:00.0: RmInitAdapter failed! (0x24:0x72:1207)
[ 14.333020] kernel: NVRM: GPU 0000:65:00.0: rm_init_adapter failed, device minor number 0
[ 14.495841] kernel: NVRM: GPU 0000:65:00.0: RmInitAdapter failed! (0x62:0x40:2015)
[ 14.496466] kernel: NVRM: GPU 0000:65:00.0: rm_init_adapter failed, device minor number 0
[ 2.078028] kernel: pci 0000:84:00.0: [10de:2b85] type 00 class 0x030000 PCIe Legacy Endpoint
[ 2.078962] kernel: pci 0000:84:00.0: BAR 0 [mem 0xb8000000-0xbbffffff]
[ 2.078975] kernel: pci 0000:84:00.0: BAR 1 [mem 0xe05c0000000-0xe05cfffffff 64bit pref]
[ 2.078988] kernel: pci 0000:84:00.0: BAR 3 [mem 0xe05d2000000-0xe05d3ffffff 64bit pref]
[ 2.078995] kernel: pci 0000:84:00.0: BAR 5 [io 0x5000-0x507f]
[ 2.079002] kernel: pci 0000:84:00.0: ROM [mem 0xbc000000-0xbc07ffff pref]
[ 2.081207] kernel: pci 0000:84:00.0: Enabling HDA controller
[ 2.081649] kernel: pci 0000:84:00.0: PME# supported from D0 D3hot
[ 2.082204] kernel: pci 0000:84:00.0: VF BAR 0 [mem 0xe05d4000000-0xe05d403ffff 64bit pref]
[ 2.082205] kernel: pci 0000:84:00.0: VF BAR 0 [mem 0xe05d4000000-0xe05d403ffff 64bit pref]: contains BAR 0 for 1 VFs
[ 2.082218] kernel: pci 0000:84:00.0: VF BAR 2 [mem 0xe05b0000000-0xe05bfffffff 64bit pref]
[ 2.082218] kernel: pci 0000:84:00.0: VF BAR 2 [mem 0xe05b0000000-0xe05bfffffff 64bit pref]: contains BAR 2 for 1 VFs
[ 2.082231] kernel: pci 0000:84:00.0: VF BAR 4 [mem 0xe05d0000000-0xe05d1ffffff 64bit pref]
[ 2.082232] kernel: pci 0000:84:00.0: VF BAR 4 [mem 0xe05d0000000-0xe05d1ffffff 64bit pref]: contains BAR 4 for 1 VFs
[ 2.183919] kernel: pci 0000:84:00.0: vgaarb: bridge control possible
[ 2.183919] kernel: pci 0000:84:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[ 2.208982] kernel: pci 0000:84:00.1: D0 power state depends on 0000:84:00.0
[ 12.860498] kernel: nouveau 0000:84:00.0: unknown chipset (1b2000a1)
[ 13.041762] kernel: nvidia 0000:84:00.0: BAR 1 [mem 0xe05c0000000-0xe05cfffffff 64bit pref]: releasing
[ 13.041768] kernel: nvidia 0000:84:00.0: BAR 3 [mem 0xe05d2000000-0xe05d3ffffff 64bit pref]: releasing
[ 13.042021] kernel: nvidia 0000:84:00.0: BAR 1 [mem size 0x800000000 64bit pref]: can't assign; no space
[ 13.042023] kernel: nvidia 0000:84:00.0: BAR 1 [mem size 0x800000000 64bit pref]: failed to assign
[ 13.042025] kernel: nvidia 0000:84:00.0: BAR 3 [mem 0xe05c0000000-0xe05c1ffffff 64bit pref]: assigned
[ 13.042201] kernel: nvidia 0000:84:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=none:owns=none
[ 13.085603] kernel: [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:84:00.0 on minor 3
[ 15.330638] kernel: NVOC: __nvoc_objDelete: Child class KernelVideoEngine not freed from parent class OBJGPU.NVRM: GPU 0000:84:00.0: RmInitAdapter failed! (0x24:0x72:1207)
[ 15.331263] kernel: NVRM: GPU 0000:84:00.0: rm_init_adapter failed, device minor number 1
[ 15.501999] kernel: NVRM: GPU 0000:84:00.0: RmInitAdapter failed! (0x62:0x40:2015)
[ 15.502613] kernel: NVRM: GPU 0000:84:00.0: rm_init_adapter failed, device minor number 1
[ 2.132194] kernel: pci 0000:e4:00.0: [10de:2b85] type 00 class 0x030000 PCIe Legacy Endpoint
[ 2.133137] kernel: pci 0000:e4:00.0: BAR 0 [mem 0xc0000000-0xc3ffffff]
[ 2.133150] kernel: pci 0000:e4:00.0: BAR 1 [mem 0xb0530000000-0xb053fffffff 64bit pref]
[ 2.133163] kernel: pci 0000:e4:00.0: BAR 3 [mem 0xb0542000000-0xb0543ffffff 64bit pref]
[ 2.133170] kernel: pci 0000:e4:00.0: BAR 5 [io 0x7000-0x707f]
[ 2.133176] kernel: pci 0000:e4:00.0: ROM [mem 0xc4000000-0xc407ffff pref]
[ 2.135906] kernel: pci 0000:e4:00.0: Enabling HDA controller
[ 2.136743] kernel: pci 0000:e4:00.0: PME# supported from D0 D3hot
[ 2.137295] kernel: pci 0000:e4:00.0: VF BAR 0 [mem 0xb0544000000-0xb054403ffff 64bit pref]
[ 2.137295] kernel: pci 0000:e4:00.0: VF BAR 0 [mem 0xb0544000000-0xb054403ffff 64bit pref]: contains BAR 0 for 1 VFs
[ 2.137308] kernel: pci 0000:e4:00.0: VF BAR 2 [mem 0xb0520000000-0xb052fffffff 64bit pref]
[ 2.137309] kernel: pci 0000:e4:00.0: VF BAR 2 [mem 0xb0520000000-0xb052fffffff 64bit pref]: contains BAR 2 for 1 VFs
[ 2.137322] kernel: pci 0000:e4:00.0: VF BAR 4 [mem 0xb0540000000-0xb0541ffffff 64bit pref]
[ 2.137322] kernel: pci 0000:e4:00.0: VF BAR 4 [mem 0xb0540000000-0xb0541ffffff 64bit pref]: contains BAR 4 for 1 VFs
[ 2.183939] kernel: pci 0000:e4:00.0: vgaarb: bridge control possible
[ 2.183939] kernel: pci 0000:e4:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[ 2.209675] kernel: pci 0000:e4:00.1: D0 power state depends on 0000:e4:00.0
[ 12.862345] kernel: nouveau 0000:e4:00.0: unknown chipset (1b2000a1)
[ 13.055685] kernel: nvidia 0000:e4:00.0: BAR 1 [mem 0xb0530000000-0xb053fffffff 64bit pref]: releasing
[ 13.055687] kernel: nvidia 0000:e4:00.0: BAR 3 [mem 0xb0542000000-0xb0543ffffff 64bit pref]: releasing
[ 13.055958] kernel: nvidia 0000:e4:00.0: BAR 1 [mem size 0x800000000 64bit pref]: can't assign; no space
[ 13.055959] kernel: nvidia 0000:e4:00.0: BAR 1 [mem size 0x800000000 64bit pref]: failed to assign
[ 13.055960] kernel: nvidia 0000:e4:00.0: BAR 3 [mem 0xb0530000000-0xb0531ffffff 64bit pref]: assigned
[ 13.056132] kernel: nvidia 0000:e4:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=none:owns=none
[ 13.085701] kernel: [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:e4:00.0 on minor 4
[ 16.394910] kernel: NVOC: __nvoc_objDelete: Child class KernelVideoEngine not freed from parent class OBJGPU.NVRM: GPU 0000:e4:00.0: RmInitAdapter failed! (0x24:0x72:1207)
[ 16.395590] kernel: NVRM: GPU 0000:e4:00.0: rm_init_adapter failed, device minor number 2
[ 16.561983] kernel: NVRM: GPU 0000:e4:00.0: RmInitAdapter failed! (0x62:0x40:2015)
[ 16.562622] kernel: NVRM: GPU 0000:e4:00.0: rm_init_adapter failed, device minor number 2
[ 14.495505] kernel: NVRM: _kgspBootGspRm: unexpected WPR2 already up, cannot proceed with booting GSP
[ 14.495532] kernel: NVRM: RmInitAdapter: Cannot initialize GSP firmware RM
[ 15.501657] kernel: NVRM: _kgspBootGspRm: unexpected WPR2 already up, cannot proceed with booting GSP
[ 15.501676] kernel: NVRM: RmInitAdapter: Cannot initialize GSP firmware RM
[ 16.561510] kernel: NVRM: _kgspBootGspRm: unexpected WPR2 already up, cannot proceed with booting GSP
[ 16.561534] kernel: NVRM: RmInitAdapter: Cannot initialize GSP firmware RM
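To put numbers on the BAR lines above: the BAR 1 window assigned at boot is 256 MiB, while the size in the "can't assign; no space" lines (0x800000000) is 32 GiB. The small Python sketch below does that arithmetic from the dmesg text. The function name `bar_bytes` and the regular expressions are my own, written only against the line shapes shown in this report.

```python
import re

# Matches "[mem 0xSTART-0xEND ..." as printed for an assigned BAR.
BAR_RANGE = re.compile(r"\[mem 0x([0-9a-f]+)-0x([0-9a-f]+)")
# Matches "[mem size 0xSIZE ..." as printed when assignment fails.
BAR_SIZE = re.compile(r"\[mem size 0x([0-9a-f]+)")


def bar_bytes(dmesg_line: str) -> int:
    """Return the BAR size in bytes from a dmesg BAR line."""
    m = BAR_RANGE.search(dmesg_line)
    if m:
        start, end = (int(g, 16) for g in m.groups())
        return end - start + 1  # the range is inclusive
    m = BAR_SIZE.search(dmesg_line)
    if m:
        return int(m.group(1), 16)
    raise ValueError("no BAR range or size in line")


# Both lines are copied from the dmesg output above (GPU 0000:84:00.0).
boot = "pci 0000:84:00.0: BAR 1 [mem 0xe05c0000000-0xe05cfffffff 64bit pref]"
resize = ("nvidia 0000:84:00.0: BAR 1 [mem size 0x800000000 64bit pref]: "
          "can't assign; no space")

MiB = 1 << 20
print(bar_bytes(boot) // MiB)    # 256
print(bar_bytes(resize) // MiB)  # 32768
```

So, as far as I can read the log, the 32 GiB request cannot be placed, and BAR 3 is then reassigned to a different address than it had at boot.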
With Resizable BAR disabled
Re-Size BAR Disabled
[ 2.059343] kernel: pci 0000:65:00.0: [10de:2b85] type 00 class 0x030000 PCIe Legacy Endpoint
[ 2.059697] kernel: pci 0000:65:00.0: BAR 0 [mem 0xd4000000-0xd7ffffff]
[ 2.059710] kernel: pci 0000:65:00.0: BAR 1 [mem 0x303b0000000-0x303bfffffff 64bit pref]
[ 2.059723] kernel: pci 0000:65:00.0: BAR 3 [mem 0x303c2000000-0x303c3ffffff 64bit pref]
[ 2.059730] kernel: pci 0000:65:00.0: BAR 5 [io 0x4000-0x407f]
[ 2.059736] kernel: pci 0000:65:00.0: ROM [mem 0xd8000000-0xd807ffff pref]
[ 2.060606] kernel: pci 0000:65:00.0: Enabling HDA controller
[ 2.061307] kernel: pci 0000:65:00.0: PME# supported from D0 D3hot
[ 2.061776] kernel: pci 0000:65:00.0: VF BAR 0 [mem 0x303c4000000-0x303c403ffff 64bit pref]
[ 2.061776] kernel: pci 0000:65:00.0: VF BAR 0 [mem 0x303c4000000-0x303c403ffff 64bit pref]: contains BAR 0 for 1 VFs
[ 2.061786] kernel: pci 0000:65:00.0: VF BAR 2 [mem 0x303a0000000-0x303afffffff 64bit pref]
[ 2.061786] kernel: pci 0000:65:00.0: VF BAR 2 [mem 0x303a0000000-0x303afffffff 64bit pref]: contains BAR 2 for 1 VFs
[ 2.061796] kernel: pci 0000:65:00.0: VF BAR 4 [mem 0x303c0000000-0x303c1ffffff 64bit pref]
[ 2.061796] kernel: pci 0000:65:00.0: VF BAR 4 [mem 0x303c0000000-0x303c1ffffff 64bit pref]: contains BAR 4 for 1 VFs
[ 2.213551] kernel: pci 0000:65:00.0: vgaarb: bridge control possible
[ 2.213551] kernel: pci 0000:65:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[ 2.240224] kernel: pci 0000:65:00.1: D0 power state depends on 0000:65:00.0
[ 2.242527] kernel: pci 0000:65:00.0: Adding to iommu group 38
[ 13.079625] kernel: nouveau 0000:65:00.0: unknown chipset (1b2000a1)
[ 13.197569] kernel: nvidia 0000:65:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=none:owns=none
[ 13.318375] kernel: [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:65:00.0 on minor 2
[ 2.112645] kernel: pci 0000:84:00.0: [10de:2b85] type 00 class 0x030000 PCIe Legacy Endpoint
[ 2.113324] kernel: pci 0000:84:00.0: BAR 0 [mem 0xb8000000-0xbbffffff]
[ 2.113337] kernel: pci 0000:84:00.0: BAR 1 [mem 0xe05c0000000-0xe05cfffffff 64bit pref]
[ 2.113350] kernel: pci 0000:84:00.0: BAR 3 [mem 0xe05d2000000-0xe05d3ffffff 64bit pref]
[ 2.113357] kernel: pci 0000:84:00.0: BAR 5 [io 0x5000-0x507f]
[ 2.113364] kernel: pci 0000:84:00.0: ROM [mem 0xbc000000-0xbc07ffff pref]
[ 2.115362] kernel: pci 0000:84:00.0: Enabling HDA controller
[ 2.116198] kernel: pci 0000:84:00.0: PME# supported from D0 D3hot
[ 2.116746] kernel: pci 0000:84:00.0: VF BAR 0 [mem 0xe05d4000000-0xe05d403ffff 64bit pref]
[ 2.116746] kernel: pci 0000:84:00.0: VF BAR 0 [mem 0xe05d4000000-0xe05d403ffff 64bit pref]: contains BAR 0 for 1 VFs
[ 2.116759] kernel: pci 0000:84:00.0: VF BAR 2 [mem 0xe05b0000000-0xe05bfffffff 64bit pref]
[ 2.116760] kernel: pci 0000:84:00.0: VF BAR 2 [mem 0xe05b0000000-0xe05bfffffff 64bit pref]: contains BAR 2 for 1 VFs
[ 2.116773] kernel: pci 0000:84:00.0: VF BAR 4 [mem 0xe05d0000000-0xe05d1ffffff 64bit pref]
[ 2.116773] kernel: pci 0000:84:00.0: VF BAR 4 [mem 0xe05d0000000-0xe05d1ffffff 64bit pref]: contains BAR 4 for 1 VFs
[ 2.213551] kernel: pci 0000:84:00.0: vgaarb: bridge control possible
[ 2.213551] kernel: pci 0000:84:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[ 2.240551] kernel: pci 0000:84:00.1: D0 power state depends on 0000:84:00.0
[ 2.246017] kernel: pci 0000:84:00.0: Adding to iommu group 123
[ 13.079763] kernel: nouveau 0000:84:00.0: unknown chipset (1b2000a1)
[ 13.279837] kernel: nvidia 0000:84:00.0: BAR 1 [mem 0xe05c0000000-0xe05cfffffff 64bit pref]: releasing
[ 13.279842] kernel: nvidia 0000:84:00.0: BAR 3 [mem 0xe05d2000000-0xe05d3ffffff 64bit pref]: releasing
[ 13.280071] kernel: nvidia 0000:84:00.0: BAR 1 [mem 0xe05c0000000-0xe05cfffffff 64bit pref]: assigned
[ 13.280131] kernel: nvidia 0000:84:00.0: BAR 3 [mem 0xe05d2000000-0xe05d3ffffff 64bit pref]: assigned
[ 13.280221] kernel: nvidia 0000:84:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=none:owns=none
[ 13.318424] kernel: [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:84:00.0 on minor 3
[ 2.164752] kernel: pci 0000:e4:00.0: [10de:2b85] type 00 class 0x030000 PCIe Legacy Endpoint
[ 2.165160] kernel: pci 0000:e4:00.0: BAR 0 [mem 0xc0000000-0xc3ffffff]
[ 2.165173] kernel: pci 0000:e4:00.0: BAR 1 [mem 0xb0530000000-0xb053fffffff 64bit pref]
[ 2.165186] kernel: pci 0000:e4:00.0: BAR 3 [mem 0xb0542000000-0xb0543ffffff 64bit pref]
[ 2.165193] kernel: pci 0000:e4:00.0: BAR 5 [io 0x7000-0x707f]
[ 2.165199] kernel: pci 0000:e4:00.0: ROM [mem 0xc4000000-0xc407ffff pref]
[ 2.167199] kernel: pci 0000:e4:00.0: Enabling HDA controller
[ 2.168266] kernel: pci 0000:e4:00.0: PME# supported from D0 D3hot
[ 2.168817] kernel: pci 0000:e4:00.0: VF BAR 0 [mem 0xb0544000000-0xb054403ffff 64bit pref]
[ 2.168817] kernel: pci 0000:e4:00.0: VF BAR 0 [mem 0xb0544000000-0xb054403ffff 64bit pref]: contains BAR 0 for 1 VFs
[ 2.168830] kernel: pci 0000:e4:00.0: VF BAR 2 [mem 0xb0520000000-0xb052fffffff 64bit pref]
[ 2.168831] kernel: pci 0000:e4:00.0: VF BAR 2 [mem 0xb0520000000-0xb052fffffff 64bit pref]: contains BAR 2 for 1 VFs
[ 2.168844] kernel: pci 0000:e4:00.0: VF BAR 4 [mem 0xb0540000000-0xb0541ffffff 64bit pref]
[ 2.168845] kernel: pci 0000:e4:00.0: VF BAR 4 [mem 0xb0540000000-0xb0541ffffff 64bit pref]: contains BAR 4 for 1 VFs
[ 2.213551] kernel: pci 0000:e4:00.0: vgaarb: bridge control possible
[ 2.213551] kernel: pci 0000:e4:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[ 2.241256] kernel: pci 0000:e4:00.1: D0 power state depends on 0000:e4:00.0
[ 2.245408] kernel: pci 0000:e4:00.0: Adding to iommu group 105
[ 13.081390] kernel: nouveau 0000:e4:00.0: unknown chipset (1b2000a1)
[ 13.292682] kernel: nvidia 0000:e4:00.0: BAR 1 [mem 0xb0530000000-0xb053fffffff 64bit pref]: releasing
[ 13.292685] kernel: nvidia 0000:e4:00.0: BAR 3 [mem 0xb0542000000-0xb0543ffffff 64bit pref]: releasing
[ 13.292895] kernel: nvidia 0000:e4:00.0: BAR 1 [mem 0xb0530000000-0xb053fffffff 64bit pref]: assigned
[ 13.292948] kernel: nvidia 0000:e4:00.0: BAR 3 [mem 0xb0542000000-0xb0543ffffff 64bit pref]: assigned
[ 13.293034] kernel: nvidia 0000:e4:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=none:owns=none
[ 13.318472] kernel: [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:e4:00.0 on minor 4
nvidia-smi topo -p2p p
GPU0 GPU1 GPU2
GPU0 X OK OK
GPU1 OK X OK
GPU2 OK OK X
Legend:
X = Self
OK = Status Ok
CNS = Chipset not supported
GNS = GPU not supported
TNS = Topology not supported
NS = Not supported
U = Unknown
nvidia-smi -q |grep -A 3 "BAR1"
BAR1 Memory Usage
Total : 256 MiB
Used : 2 MiB
Free : 254 MiB
BAR1 Memory Usage
Total : 256 MiB
Used : 2 MiB
Free : 254 MiB
--
BAR1 Memory Usage
Total : 256 MiB
Used : 2 MiB
Free : 254 MiB
export CUDA_VISIBLE_DEVICES=1,2
cd ~/cuda-samples/build/Samples/0_Introduction/simpleP2P
numactl --cpunodebind=1 --membind=1 ./simpleP2P
[./simpleP2P] - Starting...
Checking for multiple GPUs...
CUDA-capable device count: 2
Checking GPU(s) for support of peer to peer memory access...
Peer access from NVIDIA GeForce RTX 5090 (GPU0) -> NVIDIA GeForce RTX 5090 (GPU1) : Yes
Peer access from NVIDIA GeForce RTX 5090 (GPU1) -> NVIDIA GeForce RTX 5090 (GPU0) : Yes
Enabling peer access between GPU0 and GPU1...
Allocating buffers (64MB on GPU0, GPU1 and CPU Host)...
Creating event handles...
cudaMemcpyPeer / cudaMemcpy between GPU0 and GPU1: 1.13GB/s
Preparing host buffer and memcpy to GPU0...
Run kernel on GPU1, taking source data from GPU0 and writing to GPU1...
Run kernel on GPU0, taking source data from GPU1 and writing to GPU0...
Copy data back to host from GPU0 and verify results...
Verification error @ element 0: val = nan, ref = 0.000000
Verification error @ element 1: val = nan, ref = 4.000000
Verification error @ element 2: val = nan, ref = 8.000000
Verification error @ element 3: val = nan, ref = 12.000000
Verification error @ element 4: val = nan, ref = 16.000000
Verification error @ element 5: val = nan, ref = 20.000000
Verification error @ element 6: val = nan, ref = 24.000000
Verification error @ element 7: val = nan, ref = 28.000000
Verification error @ element 8: val = nan, ref = 32.000000
Verification error @ element 9: val = nan, ref = 36.000000
Verification error @ element 10: val = nan, ref = 40.000000
Verification error @ element 11: val = nan, ref = 44.000000
Disabling peer access...
Shutting down...
Test failed!
cd ~/cuda-samples/build/Samples/5_Domain_Specific/p2pBandwidthLatencyTest
numactl --cpunodebind=1 --membind=1 ./p2pBandwidthLatencyTest
[P2P (Peer-to-Peer) GPU Bandwidth Latency Test]
Device: 0, NVIDIA GeForce RTX 5090, pciBusID: 84, pciDeviceID: 0, pciDomainID:0
Device: 1, NVIDIA GeForce RTX 5090, pciBusID: e4, pciDeviceID: 0, pciDomainID:0
Device=0 CAN Access Peer Device=1
Device=1 CAN Access Peer Device=0
***NOTE: In case a device doesn't have P2P access to other one, it falls back to normal memcopy procedure.
So you can see lesser Bandwidth (GB/s) and unstable Latency (us) in those cases.
P2P Connectivity Matrix
D\D 0 1
0 1 1
1 1 1
Unidirectional P2P=Disabled Bandwidth Matrix (GB/s)
D\D 0 1
0 1528.91 42.47
1 42.54 1553.23
Unidirectional P2P=Enabled Bandwidth (P2P Writes) Matrix (GB/s)
D\D 0 1
0 1519.99 12.19
1 2.01 1547.03
Bidirectional P2P=Disabled Bandwidth Matrix (GB/s)
D\D 0 1
0 1527.32 56.67
1 56.62 1539.36
Bidirectional P2P=Enabled Bandwidth Matrix (GB/s)
D\D 0 1
0 1530.32 4.03
1 4.03 1540.12
P2P=Disabled Latency Matrix (us)
GPU 0 1
0 2.07 14.44
1 14.37 2.07
CPU 0 1
0 2.05 5.48
1 5.37 2.09
P2P=Enabled Latency (P2P Writes) Matrix (us)
GPU 0 1
0 2.07 0.36
1 0.37 2.07
CPU 0 1
0 2.02 1.38
1 1.47 2.11
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
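To make the regression in the unidirectional numbers easier to see, here is a short Python sketch that divides the P2P=Disabled bandwidth by the P2P=Enabled bandwidth for each off-diagonal cell. The values are copied from the two unidirectional matrices above; the function name `slowdown` is only illustrative.

```python
# Unidirectional bandwidth matrices (GB/s) copied from the output above.
disabled = [[1528.91, 42.47], [42.54, 1553.23]]
enabled = [[1519.99, 12.19], [2.01, 1547.03]]


def slowdown(disabled, enabled):
    """Map (row, col) -> disabled/enabled bandwidth ratio for off-diagonal cells."""
    n = len(disabled)
    return {
        (r, c): disabled[r][c] / enabled[r][c]
        for r in range(n)
        for c in range(n)
        if r != c
    }


for (row, col), factor in slowdown(disabled, enabled).items():
    print(f"[{row}][{col}]: {factor:.1f}x slower with P2P enabled")
```

This prints roughly 3.5x for cell [0][1] and 21.2x for cell [1][0], so with P2P enabled the transfer is much slower than the fallback copy in both directions.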
To Reproduce
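Enable Re-Size BAR in the BIOS and boot. All three GPUs then fail to initialize (RmInitAdapter failed, see the first dmesg output above).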
Bug Incidence
Always
nvidia-bug-report.log.gz
Just a heads-up: this report is from a run with Resizable BAR off. I can generate a follow-up report with it on if needed, once I'm back at my computer.
More Info
No response