Description
NVIDIA Open GPU Kernel Modules Version
a9284ecf7ab29e599e96de82168484728627eb7e06727467053719b785401e0a /root/xconn/wade/open-gpu-kernel-modules/kernel-open/nvidia.ko
Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.
- I confirm that this does not happen with the proprietary driver package.
Operating System and Version
Description: Ubuntu 22.04.5 LTS
Kernel Release
Linux h3 6.8.0-78-generic #78~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Aug 13 14:32:06 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.
- I am running on a stable kernel release.
Hardware: GPU
2x NVIDIA A10 (GA102GL). After running "nvidia-smi -L", the terminal hangs.
Describe the bug
I have one AMD Turin server with two NVIDIA A10 cards installed.
The server recognizes both A10 cards.
lspci:
+-[0000:c0]-+-00.0  Advanced Micro Devices, Inc. [AMD] Turin Root Complex
|           +-00.3  Advanced Micro Devices, Inc. [AMD] Turin RCEC
|           +-01.0  Advanced Micro Devices, Inc. [AMD] Turin PCIe Dummy Host Bridge
|           +-01.1-[c1-c4]----00.0-[c2-c4]--+-07.0-[c3]----00.0  NVIDIA Corporation GA102GL [A10]
|           |                               \-0a.0-[c4]----00.0  NVIDIA Corporation GA102GL [A10]
|           \-02.0  Advanced Micro Devices, Inc. [AMD] Turin PCIe Dummy Host Bridge
nvidia-smi shows the following error message:
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
dmesg shows the following error messages:
[ 271.395530] nvidia-nvlink: Nvlink Core is being initialized, major device number 510
[ 271.395537] NVRM: GPU 0000:c3:00.0 is already bound to nouveau.
[ 271.397851] NVRM: GPU 0000:c4:00.0 is already bound to nouveau.
[ 271.397920] NVRM: The NVIDIA probe routine was not called for 2 device(s).
[ 271.397921] NVRM: This can occur when another driver was loaded and
NVRM: obtained ownership of the NVIDIA device(s).
[ 271.397922] NVRM: Try unloading the conflicting kernel module (and/or
NVRM: reconfigure your kernel without the conflicting
NVRM: driver(s)), then try loading the NVIDIA kernel module
NVRM: again.
[ 271.397922] NVRM: No NVIDIA devices probed.
[ 271.398524] nvidia-nvlink: Unregistered Nvlink Core, major device number 510
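To confirm what the NVRM messages report, the driver bound to each GPU can be checked like this (a quick diagnostic sketch, not part of the original report; the bus addresses are taken from the lspci output above):

lspci -nnk -s c3:00.0              # "Kernel driver in use:" is expected to show nouveau here
lspci -nnk -s c4:00.0
lsmod | grep -E 'nouveau|nvidia'   # nouveau loaded, nvidia modules absent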
To Reproduce
Steps to reproduce:
- Power on the AMD Turin server
- lspci -vt // confirm the server recognizes the two NVIDIA A10 cards
- nvidia-smi
- dmesg | tail -n 50
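Following the NVRM suggestion in the dmesg output above, a possible workaround sketch (untested on this machine; the blacklist file name is illustrative):

sudo modprobe -r nouveau                                                                # unload the conflicting driver first
echo "blacklist nouveau" | sudo tee /etc/modprobe.d/blacklist-nouveau.conf              # keep nouveau from claiming the GPUs
echo "options nouveau modeset=0" | sudo tee -a /etc/modprobe.d/blacklist-nouveau.conf
sudo update-initramfs -u                                                                # keep nouveau out across reboots
sudo modprobe nvidia                                                                    # load the open kernel module again
nvidia-smi                                                                              # the GPUs should now be visible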
Bug Incidence
Always
nvidia-bug-report.log.gz
More Info
I ran the same test on an AMD Turin server with two RTX 5090 cards, and it worked as shown below.
- nvidia-smi
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 590.44.01 Driver Version: 590.44.01 CUDA Version: 13.1 |
+-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 5090 Off | 00000000:21:00.0 Off | N/A |
| 0% 28C P8 12W / 600W | 2MiB / 32607MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA GeForce RTX 5090 Off | 00000000:C1:00.0 Off | N/A |
| 0% 27C P8 14W / 600W | 2MiB / 32607MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
- p2pBandwidthLatencyTest
[P2P (Peer-to-Peer) GPU Bandwidth Latency Test]
Device: 0, NVIDIA GeForce RTX 5090, pciBusID: c3, pciDeviceID: 0, pciDomainID:0
Device: 1, NVIDIA GeForce RTX 5090, pciBusID: c4, pciDeviceID: 0, pciDomainID:0
Device=0 CAN Access Peer Device=1
Device=1 CAN Access Peer Device=0
***NOTE: In case a device doesn't have P2P access to other one, it falls back to normal memcopy procedure.
So you can see lesser Bandwidth (GB/s) and unstable Latency (us) in those cases.
P2P Connectivity Matrix
D\D 0 1
0 1 1
1 1 1
Unidirectional P2P=Disabled Bandwidth Matrix (GB/s)
D\D 0 1
0 1522.95 11.47
1 11.46 1556.27
Unidirectional P2P=Enabled Bandwidth (P2P Writes) Matrix (GB/s)
D\D 0 1
0 1525.93 57.19
1 57.19 1547.03
Bidirectional P2P=Disabled Bandwidth Matrix (GB/s)
D\D 0 1
0 1527.32 11.57
1 11.46 1540.12
Bidirectional P2P=Enabled Bandwidth Matrix (GB/s)
D\D 0 1
0 1528.07 112.29
1 112.34 1538.58
P2P=Disabled Latency Matrix (us)
GPU 0 1
0 2.09 15.43
1 15.42 2.08
CPU 0 1
0 2.27 6.21
1 6.21 2.24
P2P=Enabled Latency (P2P Writes) Matrix (us)
GPU 0 1
0 2.07 0.37
1 0.45 2.07
CPU 0 1
0 2.28 1.58
1 1.59 2.23