PCI/NPEM: Set LED_HW_PLUGGABLE for hotplug-capable ports by nvax-r · Pull Request #2 · JiandiAnNVIDIA/NV-Kernels

nvax-r · 2026-03-30T08:44:15Z

Description

NPEM registers LED classdevs on PCI endpoint that may be behind hotplug-capable ports. During hot-removal, led_classdev_unregister() calls led_set_brightness(LED_OFF) which PCI config on a disconnected device, returning -ENODEV:

leds 0003:01:00.0:enclosure:ok: Setting an LED's brightness failed (-19)

The LED core already suppresses this for devices with LED_HW_PLUGGABLE set, but NPEM never sets it. Add the flag since NPEM LEDs are on hot-pluggable hardware by nature.

Related links

NV-bug link: https://nvbugspro.nvidia.com/bug/5739122
Reproducing steps: https://nvbugspro.nvidia.com/bug/5739122?commentNumber=92

BugLink: https://bugs.launchpad.net/bugs/2139315 Add cpu part and model macro definitions for NVIDIA Olympus core. Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com> Signed-off-by: Will Deacon <will@kernel.org> (cherry picked from commit e185c8a) Signed-off-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Jacob Martin <jacob.martin@canonical.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>

BugLink: https://bugs.launchpad.net/bugs/2139315 Add the part number and MIDR for NVIDIA Olympus. Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com> Reviewed-by: Leo Yan <leo.yan@arm.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> (cherry picked from commit d5e4c71) Signed-off-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Jacob Martin <jacob.martin@canonical.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>

BugLink: https://bugs.launchpad.net/bugs/2139315 Add NVIDIA Olympus MIDR to neoverse_spe range list. Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com> Reviewed-by: Leo Yan <leo.yan@arm.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> (backported from commit d852b83) [mochs: Minor context cleanup due to absence of "perf arm_spe: Add CPU variants supporting common data source packet"] Signed-off-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Jacob Martin <jacob.martin@canonical.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>

BugLink: https://bugs.launchpad.net/bugs/2139315 The documentation in nvidia-pmu.rst contains PMUs specific to NVIDIA Tegra241 SoC. Rename the file for this specific SoC to have better distinction with other NVIDIA SoC. Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com> (backported from https://lore.kernel.org/all/20260126181155.2776097-1-bwicaksono@nvidia.com/) Signed-off-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Jacob Martin <jacob.martin@canonical.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>

BugLink: https://bugs.launchpad.net/bugs/2139315 Adds Unified Coherent Fabric PMU support in Tegra410 SOC. Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com> (backported from https://lore.kernel.org/all/20260126181155.2776097-1-bwicaksono@nvidia.com/) Signed-off-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Jacob Martin <jacob.martin@canonical.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>

BugLink: https://bugs.launchpad.net/bugs/2139315 Add interface to get ACPI device associated with the PMU. This ACPI device may contain additional properties not covered by the standard properties. Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com> (backported from https://lore.kernel.org/all/20260126181155.2776097-1-bwicaksono@nvidia.com/) Signed-off-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Jacob Martin <jacob.martin@canonical.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>

BugLink: https://bugs.launchpad.net/bugs/2139315 Adds PCIE PMU support in Tegra410 SOC. Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com> (backported from https://lore.kernel.org/all/20260126181155.2776097-1-bwicaksono@nvidia.com/) Signed-off-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Jacob Martin <jacob.martin@canonical.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>

BugLink: https://bugs.launchpad.net/bugs/2139315 Adds PCIE-TGT PMU support in Tegra410 SOC. Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com> (backported from https://lore.kernel.org/all/20260126181155.2776097-1-bwicaksono@nvidia.com/) Signed-off-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Jacob Martin <jacob.martin@canonical.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>

BugLink: https://bugs.launchpad.net/bugs/2139315 Adds CPU Memory (CMEM) Latency PMU support in Tegra410 SOC. Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com> (backported from https://lore.kernel.org/all/20260126181155.2776097-1-bwicaksono@nvidia.com/) Signed-off-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Jacob Martin <jacob.martin@canonical.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>

BugLink: https://bugs.launchpad.net/bugs/2139315 Adds NVIDIA C2C PMU support in Tegra410 SOC. Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com> (backported from https://lore.kernel.org/all/20260126181155.2776097-1-bwicaksono@nvidia.com/) Signed-off-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Jacob Martin <jacob.martin@canonical.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>

BugLink: https://bugs.launchpad.net/bugs/2139315 Enable driver for NVIDIA TEGRA410 CMEM Latency and C2C PMU device. Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com> (backported from https://lore.kernel.org/all/20260126181155.2776097-1-bwicaksono@nvidia.com/) Signed-off-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Jacob Martin <jacob.martin@canonical.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>

… events BugLink: https://bugs.launchpad.net/bugs/2139315 Add JSON files for NVIDIA Tegra410 Olympus core PMU events. Also updated the common-and-microarch.json. Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com> (backported from https://lore.kernel.org/all/20260127225909.3296202-1-bwicaksono@nvidia.com/) Signed-off-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Jacob Martin <jacob.martin@canonical.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>

…EGRA410_CMEM_LATENCY_PMU BugLink: https://bugs.launchpad.net/bugs/2139315 Set the following kconfigs to enable these PMUs on T410: CONFIG_NVIDIA_TEGRA410_C2C_PMU=m CONFIG_NVIDIA_TEGRA410_CMEM_LATENCY_PMU=m Signed-off-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Jacob Martin <jacob.martin@canonical.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>

BugLink: https://bugs.launchpad.net/bugs/2138131 Properly pass the variadic arguments so it can be called with or without them depending on the format. Signed-off-by: Lucas De Marchi <ldemarchi@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Jacob Martin <jacob.martin@canonical.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>

BugLink: https://bugs.launchpad.net/bugs/2140343 A vDEVICE has been a hard requirement for attaching a nested domain to the device. This makes sense when installing a guest STE, since a vSID must be present and given to the kernel during the vDEVICE allocation. But, when CR0.SMMUEN is disabled, VM doesn't really need a vSID to program the vSMMU behavior as GBPA will take effect, in which case the vSTE in the nested domain could have carried the bypass or abort configuration in GBPA register. Thus, having such a hard requirement doesn't work well for GBPA. Skip vmaster allocation in arm_smmu_attach_prepare_vmaster() for an abort or bypass vSTE. Note that device on this attachment won't report vevents. Update the uAPI doc accordingly. Link: https://patch.msgid.link/r/20251103172755.2026145-1-nicolinc@nvidia.com Tested-by: Shameer Kolothum <skolothumtho@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Tested-by: Shuai Xue <xueshuai@linux.alibaba.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> (backported from commit 81c45c6) Signed-off-by: Nathan Chen <nathanc@nvidia.com> Acked-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Jacob Martin <jacob.martin@canonical.com> Acked-by: Noah Wager <noah.wager@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>

BugLink: https://bugs.launchpad.net/bugs/2140343 The function hugetlb_reserve_pages() returns the number of pages added to the reservation map on success and a negative error code on failure (e.g. -EINVAL, -ENOMEM). However, in some error paths, it may return -1 directly. For example, a failure at: if (hugetlb_acct_memory(h, gbl_reserve) < 0) goto out_put_pages; results in returning -1 (since add = -1), which may be misinterpreted in userspace as -EPERM. Fix this by explicitly capturing and propagating the return values from helper functions, and using -EINVAL for all other failure cases. Link: https://lkml.kernel.org/r/20251125171350.86441-1-skolothumtho@nvidia.com Fixes: 986f5f2 ("mm/hugetlb: make hugetlb_reserve_pages() return nr of entries updated") Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com> Reviewed-by: Joshua Hahn <joshua.hahnjy@gmail.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Oscar Salvador <osalvador@suse.de> Cc: Matthew R. Ochs <mochs@nvidia.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nicolin Chen <nicolinc@nvidia.com> Cc: Vivek Kasireddy <vivek.kasireddy@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (backported from commit 9ee5d17) Signed-off-by: Nathan Chen <nathanc@nvidia.com> Acked-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Jacob Martin <jacob.martin@canonical.com> Acked-by: Noah Wager <noah.wager@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>

BugLink: https://bugs.launchpad.net/bugs/2140343 The Enable bits in CMDQV/VINTF/VCMDQ_CONFIG registers do not actually reset the HW registers. So, the driver explicitly clears all the registers when a VINTF or VCMDQ is being initialized calling its hw_deinit() function. However, a userspace VCMDQ is not properly reset, unlike an in-kernel VCMDQ getting reset in tegra241_vcmdq_hw_init(). Meanwhile, tegra241_vintf_hw_init() calling tegra241_vintf_hw_deinit() will not deinit any VCMDQ, since there is no userspace VCMDQ mapped to the VINTF at that stage. Then, this may result in dirty VCMDQ registers, which can fail the VM. Like tegra241_vcmdq_hw_init(), reset a VCMDQ in tegra241_vcmdq_hw_init() to fix this bug. This is required by a host kernel. Fixes: 6717f26ab1e7 ("iommu/tegra241-cmdqv: Add user-space use support") Cc: stable@vger.kernel.org Reported-by: Bao Nguyen <ncqb@google.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> (backported from commit 80f1a2c) Signed-off-by: Nathan Chen <nathanc@nvidia.com> Acked-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Jacob Martin <jacob.martin@canonical.com> Acked-by: Noah Wager <noah.wager@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>

… transfer BugLink: https://bugs.launchpad.net/bugs/2139640 When the ISR thread wakes up late and finds that the timeout handler has already processed the transfer (curr_xfer is NULL), return IRQ_HANDLED instead of IRQ_NONE. Use a similar approach to tegra_qspi_handle_timeout() by reading QSPI_TRANS_STATUS and checking the QSPI_RDY bit to determine if the hardware actually completed the transfer. If QSPI_RDY is set, the interrupt was legitimate and triggered by real hardware activity. The fact that the timeout path handled it first doesn't make it spurious. Returning IRQ_NONE incorrectly suggests the interrupt wasn't for this device, which can cause issues with shared interrupt lines and interrupt accounting. Fixes: b4e002d ("spi: tegra210-quad: Fix timeout handling") Signed-off-by: Breno Leitao <leitao@debian.org> Signed-off-by: Usama Arif <usamaarif642@gmail.com> Tested-by: Jon Hunter <jonathanh@nvidia.com> Acked-by: Jon Hunter <jonathanh@nvidia.com> Acked-by: Thierry Reding <treding@nvidia.com> Link: https://patch.msgid.link/20260126-tegra_xfer-v2-1-6d2115e4f387@debian.org Signed-off-by: Mark Brown <broonie@kernel.org> (cherry picked from commit aabd8ea linux-next) Signed-off-by: Carol L Soto <csoto@nvidia.com> Acked-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Acked-by: Noah Wager <noah.wager@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>

BugLink: https://bugs.launchpad.net/bugs/2139640 Move the assignment of the transfer pointer from curr_xfer inside the spinlock critical section in both handle_cpu_based_xfer() and handle_dma_based_xfer(). Previously, curr_xfer was read before acquiring the lock, creating a window where the timeout path could clear curr_xfer between reading it and using it. By moving the read inside the lock, the handlers are guaranteed to see a consistent value that cannot be modified by the timeout path. Fixes: 921fc18 ("spi: tegra210-quad: Add support for Tegra210 QSPI controller") Signed-off-by: Breno Leitao <leitao@debian.org> Acked-by: Thierry Reding <treding@nvidia.com> Tested-by: Jon Hunter <jonathanh@nvidia.com> Acked-by: Jon Hunter <jonathanh@nvidia.com> Link: https://patch.msgid.link/20260126-tegra_xfer-v2-2-6d2115e4f387@debian.org Signed-off-by: Mark Brown <broonie@kernel.org> (cherry picked from commit ef13ba3 linux-next) Signed-off-by: Carol L Soto <csoto@nvidia.com> Acked-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Acked-by: Noah Wager <noah.wager@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>

…transfer_one BugLink: https://bugs.launchpad.net/bugs/2139640 When the timeout handler processes a completed transfer and signals completion, the transfer thread can immediately set up the next transfer and assign curr_xfer to point to it. If a delayed ISR from the previous transfer then runs, it checks if (!tqspi->curr_xfer) (currently without the lock also -- to be fixed soon) to detect stale interrupts, but this check passes because curr_xfer now points to the new transfer. The ISR then incorrectly processes the new transfer's context. Protect the curr_xfer assignment with the spinlock to ensure the ISR either sees NULL (and bails out) or sees the new value only after the assignment is complete. Fixes: 921fc18 ("spi: tegra210-quad: Add support for Tegra210 QSPI controller") Signed-off-by: Breno Leitao <leitao@debian.org> Tested-by: Jon Hunter <jonathanh@nvidia.com> Acked-by: Jon Hunter <jonathanh@nvidia.com> Acked-by: Thierry Reding <treding@nvidia.com> Link: https://patch.msgid.link/20260126-tegra_xfer-v2-3-6d2115e4f387@debian.org Signed-off-by: Mark Brown <broonie@kernel.org> (cherry picked from commit f5a4d7f linux-next) Signed-off-by: Carol L Soto <csoto@nvidia.com> Acked-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Acked-by: Noah Wager <noah.wager@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>

BugLink: https://bugs.launchpad.net/bugs/2139640 The curr_xfer field is read by the IRQ handler without holding the lock to check if a transfer is in progress. When clearing curr_xfer in the combined sequence transfer loop, protect it with the spinlock to prevent a race with the interrupt handler. Protect the curr_xfer clearing at the exit path of tegra_qspi_combined_seq_xfer() with the spinlock to prevent a race with the interrupt handler that reads this field. Without this protection, the IRQ handler could read a partially updated curr_xfer value, leading to NULL pointer dereference or use-after-free. Fixes: b4e002d ("spi: tegra210-quad: Fix timeout handling") Signed-off-by: Breno Leitao <leitao@debian.org> Tested-by: Jon Hunter <jonathanh@nvidia.com> Acked-by: Jon Hunter <jonathanh@nvidia.com> Acked-by: Thierry Reding <treding@nvidia.com> Link: https://patch.msgid.link/20260126-tegra_xfer-v2-4-6d2115e4f387@debian.org Signed-off-by: Mark Brown <broonie@kernel.org> (cherry picked from commit bf4528a linux-next) Signed-off-by: Carol L Soto <csoto@nvidia.com> Acked-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Acked-by: Noah Wager <noah.wager@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>

…ined_seq_xfer BugLink: https://bugs.launchpad.net/bugs/2139640 Protect the curr_xfer clearing in tegra_qspi_non_combined_seq_xfer() with the spinlock to prevent a race with the interrupt handler that reads this field to check if a transfer is in progress. Fixes: b4e002d ("spi: tegra210-quad: Fix timeout handling") Signed-off-by: Breno Leitao <leitao@debian.org> Tested-by: Jon Hunter <jonathanh@nvidia.com> Acked-by: Jon Hunter <jonathanh@nvidia.com> Acked-by: Thierry Reding <treding@nvidia.com> Link: https://patch.msgid.link/20260126-tegra_xfer-v2-5-6d2115e4f387@debian.org Signed-off-by: Mark Brown <broonie@kernel.org> (cherry picked from commit 6d7723e linux-next) Signed-off-by: Carol L Soto <csoto@nvidia.com> Acked-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Acked-by: Noah Wager <noah.wager@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>

BugLink: https://bugs.launchpad.net/bugs/2139640 Now that all other accesses to curr_xfer are done under the lock, protect the curr_xfer NULL check in tegra_qspi_isr_thread() with the spinlock. Without this protection, the following race can occur: CPU0 (ISR thread) CPU1 (timeout path) ---------------- ------------------- if (!tqspi->curr_xfer) // sees non-NULL spin_lock() tqspi->curr_xfer = NULL spin_unlock() handle_*_xfer() spin_lock() t = tqspi->curr_xfer // NULL! ... t->len ... // NULL dereference! With this patch, all curr_xfer accesses are now properly synchronized. Although all accesses to curr_xfer are done under the lock, in tegra_qspi_isr_thread() it checks for NULL, releases the lock and reacquires it later in handle_cpu_based_xfer()/handle_dma_based_xfer(). There is a potential for an update in between, which could cause a NULL pointer dereference. To handle this, add a NULL check inside the handlers after acquiring the lock. This ensures that if the timeout path has already cleared curr_xfer, the handler will safely return without dereferencing the NULL pointer. Fixes: b4e002d ("spi: tegra210-quad: Fix timeout handling") Signed-off-by: Breno Leitao <leitao@debian.org> Tested-by: Jon Hunter <jonathanh@nvidia.com> Acked-by: Jon Hunter <jonathanh@nvidia.com> Acked-by: Thierry Reding <treding@nvidia.com> Link: https://patch.msgid.link/20260126-tegra_xfer-v2-6-6d2115e4f387@debian.org Signed-off-by: Mark Brown <broonie@kernel.org> (cherry picked from commit edf9088 linux-next) Signed-off-by: Carol L Soto <csoto@nvidia.com> Acked-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Acked-by: Noah Wager <noah.wager@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>

BugLink: https://bugs.launchpad.net/bugs/2139648 Currently cpu-clock event always returns 0 count, e.g., perf stat -e cpu-clock -- sleep 1 Performance counter stats for 'sleep 1': 0 cpu-clock # 0.000 CPUs utilized 1.002308394 seconds time elapsed The root cause is the commit 'bc4394e5e79c ("perf: Fix the throttle error of some clock events")' adds PERF_EF_UPDATE flag check before calling cpu_clock_event_update() to update the count, however the PERF_EF_UPDATE flag is never set when the cpu-clock event is stopped in counting mode (pmu->dev() -> cpu_clock_event_del() -> cpu_clock_event_stop()). This leads to the cpu-clock event count is never updated. To fix this issue, force to set PERF_EF_UPDATE flag for cpu-clock event just like what task-clock does. Fixes: bc4394e ("perf: Fix the throttle error of some clock events") Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://patch.msgid.link/20251112080526.3971392-1-dapeng1.mi@linux.intel.com (cherry picked from commit f1f9651) Signed-off-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Nirmoy Das <nirmoyd@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Acked-by: Noah Wager <noah.wager@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>

BugLink: https://bugs.launchpad.net/bugs/2093957 Signed-off-by: Jeremy Szu <jszu@nvidia.com> Acked-by: Nirmoy Das <nirmoyd@nvidia.com> Acked-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Acked-by: Noah Wager <noah.wager@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>

BugLink: https://bugs.launchpad.net/bugs/2138892 Remove this declaration which is now used within the file after merging upstream "vfio/nvgrace-gpu: register device memory for poison handling". Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com> Acked-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>

BugLink: https://bugs.launchpad.net/bugs/2136828 Add PCI_VENDOR_ID_ASPEED to the shared pci_ids.h header and remove the duplicate local definition from ehci-pci.c. This prepares for adding a PCI quirk for ASPEED devices. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com> (backported from https://lore.kernel.org/linux-iommu/20251217154529.377586-1-nirmoyd@nvidia.com/) Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Noah Wager <noah.wager@canonical.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>

BugLink: https://bugs.launchpad.net/bugs/2136828 ASPEED BMC controllers have VGA and USB functions behind a PCIe-to-PCI bridge that causes them to share the same stream ID: [e0]---00.0-[e1-e2]----00.0-[e2]--+-00.0 ASPEED Graphics Family \-02.0 ASPEED USB Controller Both devices get stream ID 0x5e200 due to bridge aliasing, causing the USB controller to be rejected with 'Aliasing StreamID unsupported'. Per ASPEED, the AST1150 doesn't use a real PCI bus and always forwards the original requester ID from downstream devices rather than replacing it with any alias. Add a new PCI_DEV_FLAGS_PCI_BRIDGE_NO_ALIASES flag and apply it to the AST1150. Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com> (backported from https://lore.kernel.org/linux-iommu/20251217154529.377586-2-nirmoyd@nvidia.com/) [nirmoy: set PCI_DEV_FLAGS_PCI_BRIDGE_NO_ALIASES to (1 << 15) instead of (1 << 14)] Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Noah Wager <noah.wager@canonical.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>

…ivation (LFA) BugLink: https://bugs.launchpad.net/bugs/2138342 The Arm Live Firmware Activation (LFA) is a specification [1] to describe activating firmware components without a reboot. Those components (like TF-A's BL31, EDK-II, TF-RMM, secure paylods) would be updated the usual way: via fwupd, FF-A or other secure storage methods, or via some IMPDEF Out-Of-Bound method. The user can then activate this new firmware, at system runtime, without requiring a reboot. The specification covers the SMCCC interface to list and query available components and eventually trigger the activation. Add a new directory under /sys/firmware to present firmware components capable of live activation. Each of them is a directory under lfa/, and is identified via its GUID. The activation will be triggered by echoing "1" into the "activate" file: ========================================== /sys/firmware/lfa # ls -l . 6c* .: total 0 drwxr-xr-x 2 0 0 0 Jan 19 11:33 47d4086d-4cfe-9846-9b95-2950cbbd5a00 drwxr-xr-x 2 0 0 0 Jan 19 11:33 6c0762a6-12f2-4b56-92cb-ba8f633606d9 drwxr-xr-x 2 0 0 0 Jan 19 11:33 d6d0eea7-fcea-d54b-9782-9934f234b6e4 6c0762a6-12f2-4b56-92cb-ba8f633606d9: total 0 --w------- 1 0 0 4096 Jan 19 11:33 activate -r--r--r-- 1 0 0 4096 Jan 19 11:33 activation_capable -r--r--r-- 1 0 0 4096 Jan 19 11:33 activation_pending --w------- 1 0 0 4096 Jan 19 11:33 cancel -r--r--r-- 1 0 0 4096 Jan 19 11:33 cpu_rendezvous -r--r--r-- 1 0 0 4096 Jan 19 11:33 current_version -rw-r--r-- 1 0 0 4096 Jan 19 11:33 force_cpu_rendezvous -r--r--r-- 1 0 0 4096 Jan 19 11:33 may_reset_cpu -r--r--r-- 1 0 0 4096 Jan 19 11:33 name -r--r--r-- 1 0 0 4096 Jan 19 11:33 pending_version /sys/firmware/lfa/6c0762a6-12f2-4b56-92cb-ba8f633606d9 # grep . * grep: activate: Permission denied activation_capable:1 activation_pending:1 grep: cancel: Permission denied cpu_rendezvous:1 current_version:0.0 force_cpu_rendezvous:1 may_reset_cpu:0 name:TF-RMM pending_version:0.0 /sys/firmware/lfa/6c0762a6-12f2-4b56-92cb-ba8f633606d9 # echo 1 > activate [ 2825.797871] Arm LFA: firmware activation succeeded. /sys/firmware/lfa/6c0762a6-12f2-4b56-92cb-ba8f633606d9 # ========================================== [1] https://developer.arm.com/documentation/den0147/latest/ Signed-off-by: Salman Nabi <salman.nabi@arm.com> Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com> (backported from https://lore.kernel.org/all/20260119122729.287522-2-salman.nabi@arm.com/) [nirmoyd: Added image_name fallback to fw_uuid in update_fw_image_node()] Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Jacob Martin <jacob.martin@canonical.com> Acked-by: Noah Wager <noah.wager@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>

BugLink: https://bugs.launchpad.net/bugs/2138342 Enhance PRIME/ACTIVATION functions to touch watchdog and implement timeout mechanism. This update ensures that any potential hangs are detected promptly and that the LFA process is allocated sufficient execution time before the watchdog timer expires. These changes improve overall system reliability by reducing the risk of undetected process stalls and unexpected watchdog resets. Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com> Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Jacob Martin <jacob.martin@canonical.com> Acked-by: Noah Wager <noah.wager@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>

Region creation involves finding available DPA (device-physical-address) capacity to map into HPA (host-physical-address) space. In order to support CXL Type2 devices, define an API, cxl_request_dpa(), that tries to allocate the DPA memory the driver requires to operate.The memory requested should not be bigger than the max available HPA obtained previously with cxl_get_hpa_freespace(). Based on https://lore.kernel.org/linux-cxl/168592158743.1948938.7622563891193802610.stgit@dwillia2-xfh.jf.intel.com/ Signed-off-by: Alejandro Lucero <alucerop@amd.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com> (backported from https://lore.kernel.org/linux-cxl/20260201155438.2664640-1-alejandro.lucero-palau@amd.com/) Signed-off-by: Jiandi An <jan@nvidia.com>

Use cxl api for getting DPA (Device Physical Address) to use through an endpoint decoder. Signed-off-by: Alejandro Lucero <alucerop@amd.com> Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com> Acked-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> (backported from https://lore.kernel.org/linux-cxl/20260201155438.2664640-1-alejandro.lucero-palau@amd.com/) Signed-off-by: Jiandi An <jan@nvidia.com>

Current code is expecting Type3 or CXL_DECODER_HOSTONLYMEM devices only. Support for Type2 implies region type needs to be based on the endpoint type HDM-D[B] instead. Signed-off-by: Alejandro Lucero <alucerop@amd.com> Reviewed-by: Zhi Wang <zhiw@nvidia.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Davidlohr Bueso <daves@stgolabs.net> Reviewed-by: Gregory Price <gourry@gourry.net> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> (backported from https://lore.kernel.org/linux-cxl/20260201155438.2664640-1-alejandro.lucero-palau@amd.com/) Signed-off-by: Jiandi An <jan@nvidia.com>

Region creation based on Type3 devices is triggered from user space allowing memory combination through interleaving. In preparation for kernel driven region creation, that is Type2 drivers triggering region creation backed with its advertised CXL memory, factor out a common helper from the user-sysfs region setup for interleave ways. Signed-off-by: Alejandro Lucero <alucerop@amd.com> Reviewed-by: Zhi Wang <zhiw@nvidia.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> (backported from https://lore.kernel.org/linux-cxl/20260201155438.2664640-1-alejandro.lucero-palau@amd.com/) Signed-off-by: Jiandi An <jan@nvidia.com>

Region creation based on Type3 devices is triggered from user space allowing memory combination through interleaving. In preparation for kernel driven region creation, that is Type2 drivers triggering region creation backed with its advertised CXL memory, factor out a common helper from the user-sysfs region setup forinterleave granularity. Signed-off-by: Alejandro Lucero <alucerop@amd.com> Reviewed-by: Zhi Wang <zhiw@nvidia.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> (backported from https://lore.kernel.org/linux-cxl/20260201155438.2664640-1-alejandro.lucero-palau@amd.com/) Signed-off-by: Jiandi An <jan@nvidia.com>

Creating a CXL region requires userspace intervention through the cxl sysfs files. Type2 support should allow accelerator drivers to create such cxl region from kernel code. Adding that functionality and integrating it with current support for memory expanders. Based on https://lore.kernel.org/linux-cxl/168592159835.1948938.1647215579839222774.stgit@dwillia2-xfh.jf.intel.com/ Signed-off-by: Alejandro Lucero <alucerop@amd.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> (backported from https://lore.kernel.org/linux-cxl/20260201155438.2664640-1-alejandro.lucero-palau@amd.com/) [jan: Resolve minor conflict due to code lines shift] Signed-off-by: Jiandi An <jan@nvidia.com>

By definition a type2 cxl device will use the host managed memory for specific functionality, therefore it should not be available to other uses. Signed-off-by: Alejandro Lucero <alucerop@amd.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Davidlohr Bueso <daves@stgolabs.net> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com> (backported from https://lore.kernel.org/linux-cxl/20260201155438.2664640-1-alejandro.lucero-palau@amd.com/) Signed-off-by: Jiandi An <jan@nvidia.com>

Use cxl api for creating a region using the endpoint decoder related to a DPA range. Signed-off-by: Alejandro Lucero <alucerop@amd.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> (backported from https://lore.kernel.org/linux-cxl/20260201155438.2664640-1-alejandro.lucero-palau@amd.com/) Signed-off-by: Jiandi An <jan@nvidia.com>

A PIO buffer is a region of device memory to which the driver can write a packet for TX, with the device handling the transmit doorbell without requiring a DMA for getting the packet data, which helps reducing latency in certain exchanges. With CXL mem protocol this latency can be lowered further. With a device supporting CXL and successfully initialised, use the cxl region to map the memory range and use this mapping for PIO buffers. Add the disabling of those CXL-based PIO buffers if the callback for potential cxl endpoint removal by the CXL code happens. Signed-off-by: Alejandro Lucero <alucerop@amd.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> (backported from https://lore.kernel.org/linux-cxl/20260201155438.2664640-1-alejandro.lucero-palau@amd.com/) Signed-off-by: Jiandi An <jan@nvidia.com>

…smaller granularities for lower levels The CXL specification supports multi-level interleaving "as long as all the levels use different, but consecutive, HPA bits to select the target and no Interleave Set has more than 8 devices" (from 3.2). Currently the kernel expects that a decoder's "interleave granularity is a multiple of @parent_port granularity". That is, the granularity of a lower level is bigger than those of the parent and uses the outer HPA bits as selector. It works e.g. for the following 8-way config: * cross-link (cross-hostbridge config in CFMWS): * 4-way * 256 granularity * Selector: HPA[8:9] * sub-link (CXL Host bridge config of the HDM): * 2-way * 1024 granularity * Selector: HPA[10] Now, if the outer HPA bits are used for the cross-hostbridge, an 8-way config could look like this: * cross-link (cross-hostbridge config in CFMWS): * 4-way * 512 granularity * Selector: HPA[9:10] * sub-link (CXL Host bridge config of the HDM): * 2-way * 256 granularity * Selector: HPA[8] The enumeration of decoders for this configuration fails then with following error: cxl region0: pci0000:00:port1 cxl_port_setup_targets expected iw: 2 ig: 1024 [mem 0x10000000000-0x1ffffffffff flags 0x200] cxl region0: pci0000:00:port1 cxl_port_setup_targets got iw: 2 ig: 256 state: enabled 0x10000000000:0x1ffffffffff cxl_port endpoint12: failed to attach decoder12.0 to region0: -6 Note that this happens only if firmware is setting up the decoders (CXL_REGION_F_AUTO). For userspace region assembly the granularities are chosen to increase from root down to the lower levels. That is, outer HPA bits are always used for lower interleaving levels. Rework the implementation to also support multi-level interleaving with smaller granularities for lower levels. Determine the interleave set of autodetected decoders. Check that it is a subset of the root interleave. The HPA selector bits are extracted for all decoders of the set and checked that there is no overlap and bits are consecutive. All decoders can be programmed now to use any bit range within the region's target selector. Signed-off-by: Robert Richter <rrichter@amd.com> (backported from https://lore.kernel.org/all/20251028094754.72816-1-rrichter@amd.com/) [jan: Resolved minor conflicts] Signed-off-by: Jiandi An <jan@nvidia.com>

…er definitions PCI: Add CXL DVSEC control, lock, and range register definitions Add register offset and field definitions for CXL DVSEC registers needed by CXL state save/restore across resets: - CTRL2 (offset 0x10) and LOCK (offset 0x14) registers - CONFIG_LOCK bit in the LOCK register - RWL (read-write-when-locked) field masks for CTRL and range base registers. Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com> (backported from https://lore.kernel.org/linux-cxl/20260306080026.116789-1-smadhavan@nvidia.com/) Signed-off-by: Jiandi An <jan@nvidia.com>

… to include/cxl/cxl.h Move CXL HDM decoder register defines, register map structs (cxl_reg_map, cxl_component_reg_map, cxl_device_reg_map, cxl_pmu_reg_map, cxl_register_map), cxl_hdm_decoder_count(), enum cxl_regloc_type, and cxl_find_regblock()/cxl_setup_regs() declarations from internal CXL headers to include/cxl/pci.h. This makes them accessible to code outside the CXL subsystem, in particular the PCI core CXL state save/restore support added in a subsequent patch. No functional change. Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com> (backported from https://lore.kernel.org/linux-cxl/20260306080026.116789-1-smadhavan@nvidia.com/) [jan: Resolve conflicts by moving certain definitions to include/cxl/cxl.h instead of to include/cxl/pci.h to align with its dependency of Alejandro's series] Signed-off-by: Jiandi An <jan@nvidia.com>

…state Add pci_add_virtual_ext_cap_save_buffer() to allocate save buffers using virtual cap IDs (above PCI_EXT_CAP_ID_MAX) that don't require a real capability in config space. The existing pci_add_ext_cap_save_buffer() cannot be used for CXL DVSEC state because it calls pci_find_saved_ext_cap() which searches for a matching capability in PCI config space. The CXL state saved here is a synthetic snapshot (DVSEC+HDM) and should not be tied to a real extended-cap instance. A virtual extended-cap save buffer API (cap IDs above PCI_EXT_CAP_ID_MAX) allows PCI to track this state without a backing config space capability. Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com> (backported from https://lore.kernel.org/linux-cxl/20260306080026.116789-1-smadhavan@nvidia.com/) Signed-off-by: Jiandi An <jan@nvidia.com>

Save and restore CXL DVSEC control registers (CTRL, CTRL2), range base registers, and lock state across PCI resets. When the DVSEC CONFIG_LOCK bit is set, certain DVSEC fields become read-only and hardware may have updated them. Blindly restoring saved values would be silently ignored or conflict with hardware state. Instead, a read-merge-write approach is used: current hardware values are read for the RWL (read-write-when-locked) fields and merged with saved state, so only writable bits are restored while locked bits retain their hardware values. Hooked into pci_save_state()/pci_restore_state() so all PCI reset paths automatically preserve CXL DVSEC configuration. Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com> (backported from https://lore.kernel.org/linux-cxl/20260306080026.116789-1-smadhavan@nvidia.com/) [jan: Resolve minor conflict in drivers/pci/Makefile due to code line shifts ] Signed-off-by: Jiandi An <jan@nvidia.com>

Save and restore CXL HDM decoder registers (global control, per-decoder base/size/target-list, and commit state) across PCI resets. On restore, decoders that were committed are reprogrammed and recommitted with a 10ms timeout. Locked decoders that are already committed are skipped, since their state is protected by hardware and reprogramming them would fail. The Register Locator DVSEC is parsed directly via PCI config space reads rather than calling cxl_find_regblock()/cxl_setup_regs(), since this code lives in the PCI core and must not depend on CXL module symbols. MSE is temporarily enabled during save/restore to allow MMIO access to the HDM decoder register block. Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com> (backported from https://lore.kernel.org/linux-cxl/20260306080026.116789-1-smadhavan@nvidia.com/) [jan: Include <cxl/cxl.h> in drivers/pci/cxl.c due to conflict resolution in "4acbc27592b8 NVIDIA: VR: SAUCE: cxl: Move HDM decoder and register map definitions to include/cxl/cxl.h"] Signed-off-by: Jiandi An <jan@nvidia.com>

…efinitions Add CXL DVSEC register definitions needed for CXL device reset per CXL r3.2 section 8.1.3.1: - Capability bits: RST_CAPABLE, CACHE_CAPABLE, CACHE_WBI_CAPABLE, RST_TIMEOUT, RST_MEM_CLR_CAPABLE - Control2 register: DISABLE_CACHING, INIT_CACHE_WBI, INIT_CXL_RST, RST_MEM_CLR_EN - Status2 register: CACHE_INV, RST_DONE, RST_ERR - Non-CXL Function Map DVSEC register offset Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com> (backported from https://lore.kernel.org/linux-cxl/20260306092322.148765-1-smadhavan@nvidia.com/) [jan: Resolve conflicts where PCI_DVSEC_CXL_CACHE_CAPABLE is already added by "72bd823fb4f1 NVIDIA: VR: SAUCE: PCI: Allow ATS to be always on for CXL.cache capable devices"] Signed-off-by: Jiandi An <jan@nvidia.com>

…_restore() Export pci_dev_save_and_disable() and pci_dev_restore() so that subsystems performing non-standard reset sequences (e.g. CXL) can reuse the PCI core standard pre/post reset lifecycle: driver reset_prepare/reset_done callbacks, PCI config space save/restore, and device disable/re-enable. Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com> (backported from https://lore.kernel.org/linux-cxl/20260306092322.148765-1-smadhavan@nvidia.com/) Signed-off-by: Jiandi An <jan@nvidia.com>

Add infrastructure for quiescing the CXL data path before reset: - Memory offlining: check if CXL-backed memory is online and offline it via offline_and_remove_memory() before reset, per CXL spec requirement to quiesce all CXL.mem transactions before issuing CXL Reset. - CPU cache flush: invalidate cache lines before reset as a safety measure after memory offline. Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com> (backported from https://lore.kernel.org/linux-cxl/20260306092322.148765-1-smadhavan@nvidia.com/) Signed-off-by: Jiandi An <jan@nvidia.com>

…XL reset Add sibling PCI function save/disable/restore coordination for CXL reset. Before reset, all CXL.cachemem sibling functions are locked, saved, and disabled; after reset they are restored. The Non-CXL Function Map DVSEC and per-function DVSEC capability register are consulted to skip non-CXL and CXL.io-only functions. A global mutex serializes concurrent resets to prevent deadlocks between sibling functions. Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com> (backported from https://lore.kernel.org/linux-cxl/20260306092322.148765-1-smadhavan@nvidia.com/) Signed-off-by: Jiandi An <jan@nvidia.com>

…ration cxl_dev_reset() implements the hardware reset sequence: optionally enable memory clear, initiate reset via CTRL2, wait for completion, and re-enable caching. cxl_do_reset() orchestrates the full reset flow: 1. CXL pre-reset: mem offlining and cache flush (when memdev present) 2. PCI save/disable: pci_dev_save_and_disable() automatically saves CXL DVSEC and HDM decoder state via PCI core hooks 3. Sibling coordination: save/disable CXL.cachemem sibling functions 4. Execute CXL DVSEC reset 5. Sibling restore: always runs to re-enable sibling functions 6. PCI restore: pci_dev_restore() automatically restores CXL state The CXL-specific DVSEC and HDM save/restore is handled by the PCI core's CXL save/restore infrastructure (drivers/pci/cxl.c). Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com> (backported from https://lore.kernel.org/linux-cxl/20260306092322.148765-1-smadhavan@nvidia.com/) Signed-off-by: Jiandi An <jan@nvidia.com>

Add a "cxl_reset" sysfs attribute to PCI devices that support CXL Reset (CXL r3.2 section 8.1.3.1). The attribute is visible only on devices with both CXL.cache and CXL.mem capabilities and the CXL Reset Capable bit set in the DVSEC. Writing "1" to the attribute triggers the full CXL reset flow via cxl_do_reset(). The interface is decoupled from memdev creation: when a CXL memdev exists, memory offlining and cache flush are performed; otherwise reset proceeds without the memory management. The sysfs attribute is managed entirely by the CXL module using sysfs_create_group() / sysfs_remove_group() rather than the PCI core's static attribute groups. This avoids cross-module symbol dependencies between the PCI core (always built-in) and CXL_BUS (potentially modular). At module init, existing PCI devices are scanned and a PCI bus notifier handles hot-plug/unplug. kernfs_drain() makes sure that any in-flight store() completes before sysfs_remove_group() returns, preventing use-after-free during module unload. Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com> (backported from https://lore.kernel.org/linux-cxl/20260306092322.148765-1-smadhavan@nvidia.com/) Signed-off-by: Jiandi An <jan@nvidia.com>

…tribute Document the cxl_reset sysfs attribute added to PCI devices that support CXL Reset. Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com> (backported from https://lore.kernel.org/linux-cxl/20260306092322.148765-1-smadhavan@nvidia.com/) Signed-off-by: Jiandi An <jan@nvidia.com>

…and RAS support Add Ubuntu kernel config annotations for CXL-related configs introduced or changed by the following cherry-picked patch series: - drivers/cxl changes between v6.17.9 and upstream 7.0 (which includes a portion of Terry Bowman's v14 CXL RAS series merged via for-7.0/cxl-aer-prep) - Alejandro Lucero's v23 CXL Type-2 device support series - Smita Koralahalli's v6 patch 3/9 (cxl/region: Skip decoder reset on detach for autodiscovered regions) CONFIG_CXL_BUS: Enable CXL bus support built-in; required for CXL Type-2 device and RAS support CONFIG_CXL_PCI: Enable CXL PCI management built-in; auto-selects CXL_MEM; required for CXL Type-2 device support CONFIG_CXL_MEM: Auto-selected by CXL_PCI; required for CXL memory expansion and Type-2 device support CONFIG_CXL_PORT: Required for CXL port enumeration; defaults to CXL_BUS value CONFIG_FWCTL: Selected by CXL_BUS when CXL_FEATURES is enabled; required for CXL feature mailbox access CONFIG_CXL_RAS: New def_bool replacing PCIEAER_CXL (Terry Bowman v14); auto-enabled with ACPI_APEI_GHES+PCIEAER+ CXL_BUS for CXL RAS error handling CONFIG_SFC_CXL: Solarflare SFC9100-family CXL Type-2 device support; not needed for NVIDIA platforms (n) CONFIG_ACPI_APEI_EINJ: Required prerequisite for CONFIG_ACPI_APEI_EINJ_CXL CONFIG_ACPI_APEI_EINJ_CXL: CXL protocol error injection support via APEI EINJ CONFIG_PCIEAER_CXL: Remove it from debian.master policy. This config was removed from Kconfig by upstream commit d18f1b7 (PCI/AER: Replace PCIEAER_CXL symbol with CXL_RAS) which is included in this port. CONFIG_ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION: Override debian.master amd64-only policy to include arm64. Commit 4d873c5 added 'select ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION' to arch/arm64/Kconfig, making this y on arm64 as well. CONFIG_GENERIC_CPU_CACHE_MAINTENANCE: New bool config defined by c460697 in lib/Kconfig. Selected by arm64 via 4d873c5; not selected by x86. Set arm64: y, amd64: -. CONFIG_CACHEMAINT_FOR_HOTPLUG: New optional menuconfig defined by 2ec3b54 in drivers/cache/Kconfig. Depends on GENERIC_CPU_CACHE_MAINTENANCE so becomes visible on arm64. Defaults to n; HiSilicon HHA driver not needed for NVIDIA platforms. Set arm64: n, amd64: -. Signed-off-by: Jiandi An <jan@nvidia.com>

…memory access Override debian.master policy (m->y) for DEV_DAX, DEV_DAX_CXL, and DEV_DAX_KMEM to ensure CXL memory regions are accessible as both raw DAX devices and hotplugged System-RAM nodes. debian.master sets these to 'm' (modules). For NVIDIA platforms with CXL Type-2 devices, built-in (y) is required to ensure CXL memory regions provisioned early in boot are immediately accessible without relying on module loading order. CONFIG_DEV_DAX: Override m->y; prerequisite for DEV_DAX_CXL and DEV_DAX_KMEM to be built-in; depends on TRANSPARENT_HUGEPAGE (already y in debian.master) CONFIG_DEV_DAX_CXL: Override m->y; creates /dev/daxX.Y devices for CXL RAM regions not in the default system memory map (Soft Reserved or dynamically provisioned regions); depends on CXL_BUS+CXL_REGION+DEV_DAX (all y) CONFIG_DEV_DAX_KMEM: Override m->y; onlines CXL DAX devices as System-RAM NUMA nodes via memory hotplug, making CXL memory available for normal kernel and userspace allocation Signed-off-by: Jiandi An <jan@nvidia.com>

…/restore Add Ubuntu kernel config annotation for CONFIG_PCI_CXL introduced by the CXL DVSEC and HDM state save/restore series (Srirangan Madhavan). CONFIG_PCI_CXL: Hidden bool in drivers/pci/Kconfig; auto-enabled when CXL_BUS=y. Gates compilation of drivers/pci/cxl.o which saves and restores CXL DVSEC control/range registers and HDM decoder state across PCI resets and link transitions. Signed-off-by: Jiandi An <jan@nvidia.com>

NPEM registers LED classdevs on PCI endpoint that may be behind hotplug-capable ports. During hot-removal, led_classdev_unregister() calls led_set_brightness(LED_OFF) which PCI config on a disconnected device, returning -ENODEV: ``` leds 0003:01:00.0:enclosure:ok: Setting an LED's brightness failed (-19) ``` The LED core already suppresses this for devices with LED_HW_PLUGGABLE set, but NPEM never sets it. Add the flag since NPEM LEDs are on hot-pluggable hardware by nature. Fixes: 4e89354 ("PCI/NPEM: Add Native PCIe Enclosure Management support") Signed-off-by: Richard Cheng <icheng@nvidia.com> --- NV-bug link: https://nvbugspro.nvidia.com/bug/5739122 Testing machine: vr-loaner-36 with Montage Technology c002 (CXL 2.x, 128GB, rev 03) Reproducing steps: https://nvbugspro.nvidia.com/bug/5739122?commentNumber=92 Best regards, Richard Cheng.

BugLink: https://bugs.launchpad.net/bugs/2139960 [ Upstream commit 163e5f2 ] When using perf record with the `--overwrite` option, a segmentation fault occurs if an event fails to open. For example: perf record -e cycles-ct -F 1000 -a --overwrite Error: cycles-ct:H: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat' perf: Segmentation fault #0 0x6466b6 in dump_stack debug.c:366 #1 0x646729 in sighandler_dump_stack debug.c:378 #2 0x453fd1 in sigsegv_handler builtin-record.c:722 NVIDIA#3 0x7f8454e65090 in __restore_rt libc-2.32.so[54090] NVIDIA#4 0x6c5671 in __perf_event__synthesize_id_index synthetic-events.c:1862 NVIDIA#5 0x6c5ac0 in perf_event__synthesize_id_index synthetic-events.c:1943 NVIDIA#6 0x458090 in record__synthesize builtin-record.c:2075 NVIDIA#7 0x45a85a in __cmd_record builtin-record.c:2888 NVIDIA#8 0x45deb6 in cmd_record builtin-record.c:4374 NVIDIA#9 0x4e5e33 in run_builtin perf.c:349 NVIDIA#10 0x4e60bf in handle_internal_command perf.c:401 NVIDIA#11 0x4e6215 in run_argv perf.c:448 NVIDIA#12 0x4e653a in main perf.c:555 NVIDIA#13 0x7f8454e4fa72 in __libc_start_main libc-2.32.so[3ea72] NVIDIA#14 0x43a3ee in _start ??:0 The --overwrite option implies --tail-synthesize, which collects non-sample events reflecting the system status when recording finishes. However, when evsel opening fails (e.g., unsupported event 'cycles-ct'), session->evlist is not initialized and remains NULL. The code unconditionally calls record__synthesize() in the error path, which iterates through the NULL evlist pointer and causes a segfault. To fix it, move the record__synthesize() call inside the error check block, so it's only called when there was no error during recording, ensuring that evlist is properly initialized. Fixes: 4ea648a ("perf record: Add --tail-synthesize option") Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Manuel Diewald <manuel.diewald@canonical.com> Signed-off-by: Edoardo Canepa <edoardo.canepa@canonical.com>

shankerd04 and others added 30 commits February 13, 2026 16:49

alucerop and others added 26 commits March 24, 2026 11:58

nvax-r mentioned this pull request Mar 30, 2026

[linux-nvidia-6.17-next] Backport TLB flush optimizations and fixes for ARM64 NVIDIA/NV-Kernels#350

Closed

JiandiAnNVIDIA force-pushed the cxl_2026-03-04 branch from acf188b to 2d99890 Compare April 15, 2026 10:01

JiandiAnNVIDIA force-pushed the cxl_2026-03-04 branch from 2d99890 to 5c70002 Compare April 15, 2026 23:05

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PCI/NPEM: Set LED_HW_PLUGGABLE for hotplug-capable ports#2

PCI/NPEM: Set LED_HW_PLUGGABLE for hotplug-capable ports#2
nvax-r wants to merge 643 commits into
JiandiAnNVIDIA:cxl_2026-03-04from
nvax-r:cxl_fix_2026_03-30

nvax-r commented Mar 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

20 participants

Conversation

nvax-r commented Mar 30, 2026

Description

Related links

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

20 participants