PCI/NPEM: Set LED_HW_PLUGGABLE for hotplug-capable ports #2

Open
nvax-r wants to merge 643 commits into JiandiAnNVIDIA:cxl_2026-03-04 from nvax-r:cxl_fix_2026_03-30

Conversation

nvax-r commented Mar 30, 2026

Description

NPEM registers LED classdevs on PCI endpoints that may sit behind hotplug-capable ports. During hot-removal, led_classdev_unregister() calls led_set_brightness(LED_OFF), which accesses PCI config space on a disconnected device and returns -ENODEV:

leds 0003:01:00.0:enclosure:ok: Setting an LED's brightness failed (-19)

The LED core already suppresses this for devices with LED_HW_PLUGGABLE set, but NPEM never sets it. Add the flag since NPEM LEDs are on hot-pluggable hardware by nature.
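
The suppression described above can be modeled in userspace as a rough sketch. It assumes the LED core stays quiet only when the failure is -ENODEV and the classdev is both unregistering and marked pluggable. The bit values and the `should_warn()` helper are hypothetical stand-ins for illustration, not the kernel's definitions.

```c
#include <errno.h>
#include <stdbool.h>

/* Illustrative bit values only; the real flags live in include/linux/leds.h. */
#define LED_UNREGISTERING (1u << 0)
#define LED_HW_PLUGGABLE  (1u << 1)

/* Hypothetical helper: decide whether a failed brightness write should be logged. */
bool should_warn(int ret, unsigned int flags)
{
	if (ret >= 0)
		return false;
	/* The hardware may already be gone during hot-removal, so stay quiet. */
	if (ret == -ENODEV &&
	    (flags & LED_UNREGISTERING) &&
	    (flags & LED_HW_PLUGGABLE))
		return false;
	return true;
}
```

In this model, an NPEM LED without LED_HW_PLUGGABLE still triggers the message on -ENODEV during unregistration, which matches the log line quoted above.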

Related links

NV-bug link: https://nvbugspro.nvidia.com/bug/5739122
Reproducing steps: https://nvbugspro.nvidia.com/bug/5739122?commentNumber=92

shankerd04 and others added 30 commits February 13, 2026 16:49
BugLink: https://bugs.launchpad.net/bugs/2139315

Add cpu part and model macro definitions for NVIDIA Olympus core.

Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit e185c8a)
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Acked-by: Abdur Rahman <abdur.rahman@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139315

Add the part number and MIDR for NVIDIA Olympus.

Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
Reviewed-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
(cherry picked from commit d5e4c71)
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Acked-by: Abdur Rahman <abdur.rahman@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139315

Add NVIDIA Olympus MIDR to neoverse_spe range list.

Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
Reviewed-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
(backported from commit d852b83)
[mochs: Minor context cleanup due to absence of "perf arm_spe: Add CPU variants supporting common data source packet"]
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Acked-by: Abdur Rahman <abdur.rahman@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139315

The documentation in nvidia-pmu.rst contains PMUs specific
to NVIDIA Tegra241 SoC. Rename the file for this specific
SoC to have better distinction with other NVIDIA SoC.

Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
(backported from https://lore.kernel.org/all/20260126181155.2776097-1-bwicaksono@nvidia.com/)
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Acked-by: Abdur Rahman <abdur.rahman@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139315

Adds Unified Coherent Fabric PMU support in Tegra410 SOC.

Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
(backported from https://lore.kernel.org/all/20260126181155.2776097-1-bwicaksono@nvidia.com/)
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Acked-by: Abdur Rahman <abdur.rahman@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139315

Add interface to get ACPI device associated with the
PMU. This ACPI device may contain additional properties
not covered by the standard properties.

Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
(backported from https://lore.kernel.org/all/20260126181155.2776097-1-bwicaksono@nvidia.com/)
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Acked-by: Abdur Rahman <abdur.rahman@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139315

Adds PCIE PMU support in Tegra410 SOC.

Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
(backported from https://lore.kernel.org/all/20260126181155.2776097-1-bwicaksono@nvidia.com/)
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Acked-by: Abdur Rahman <abdur.rahman@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139315

Adds PCIE-TGT PMU support in Tegra410 SOC.

Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
(backported from https://lore.kernel.org/all/20260126181155.2776097-1-bwicaksono@nvidia.com/)
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Acked-by: Abdur Rahman <abdur.rahman@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139315

Adds CPU Memory (CMEM) Latency PMU support in Tegra410 SOC.

Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
(backported from https://lore.kernel.org/all/20260126181155.2776097-1-bwicaksono@nvidia.com/)
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Acked-by: Abdur Rahman <abdur.rahman@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139315

Adds NVIDIA C2C PMU support in Tegra410 SOC.

Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
(backported from https://lore.kernel.org/all/20260126181155.2776097-1-bwicaksono@nvidia.com/)
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Acked-by: Abdur Rahman <abdur.rahman@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139315

Enable driver for NVIDIA TEGRA410 CMEM Latency and C2C PMU device.

Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
(backported from https://lore.kernel.org/all/20260126181155.2776097-1-bwicaksono@nvidia.com/)
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Acked-by: Abdur Rahman <abdur.rahman@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
… events

BugLink: https://bugs.launchpad.net/bugs/2139315

Add JSON files for NVIDIA Tegra410 Olympus core PMU events.
Also updated the common-and-microarch.json.

Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
(backported from https://lore.kernel.org/all/20260127225909.3296202-1-bwicaksono@nvidia.com/)
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Acked-by: Abdur Rahman <abdur.rahman@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
…EGRA410_CMEM_LATENCY_PMU

BugLink: https://bugs.launchpad.net/bugs/2139315

Set the following kconfigs to enable these PMUs on T410:
    CONFIG_NVIDIA_TEGRA410_C2C_PMU=m
    CONFIG_NVIDIA_TEGRA410_CMEM_LATENCY_PMU=m

Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Acked-by: Abdur Rahman <abdur.rahman@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2138131

Properly pass the variadic arguments so it can be called with or without
them depending on the format.

Signed-off-by: Lucas De Marchi <ldemarchi@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Acked-by: Abdur Rahman <abdur.rahman@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2140343

A vDEVICE has been a hard requirement for attaching a nested domain to the
device. This makes sense when installing a guest STE, since a vSID must be
present and given to the kernel during the vDEVICE allocation.

But, when CR0.SMMUEN is disabled, VM doesn't really need a vSID to program
the vSMMU behavior as GBPA will take effect, in which case the vSTE in the
nested domain could have carried the bypass or abort configuration in GBPA
register. Thus, having such a hard requirement doesn't work well for GBPA.

Skip vmaster allocation in arm_smmu_attach_prepare_vmaster() for an abort
or bypass vSTE. Note that device on this attachment won't report vevents.
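
A rough model of the relaxed requirement, not the driver code: the enum names, the `attach_prepare_vmaster_model()` helper, and the -ENOENT return for a missing vDEVICE are assumptions made for this sketch. The Config encodings follow the SMMUv3 STE.Config field (0b000 abort, 0b100 bypass, 0b101 stage-1 translate).

```c
#include <errno.h>
#include <stdbool.h>

/* STE.Config encodings used by this sketch. */
enum vste_cfg {
	VSTE_CFG_ABORT    = 0,
	VSTE_CFG_BYPASS   = 4,
	VSTE_CFG_S1_TRANS = 5,
};

/*
 * Hypothetical model of the attach-time decision: an abort or bypass vSTE
 * needs no vSID, so no vmaster is allocated and no vDEVICE is required.
 */
int attach_prepare_vmaster_model(enum vste_cfg cfg, bool have_vdevice,
				 bool *vmaster_allocated)
{
	*vmaster_allocated = false;

	if (cfg == VSTE_CFG_ABORT || cfg == VSTE_CFG_BYPASS)
		return 0;	/* attach succeeds, but no vevents are reported */

	if (!have_vdevice)
		return -ENOENT;	/* a translating vSTE still needs a vSID */

	*vmaster_allocated = true;
	return 0;
}
```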

Update the uAPI doc accordingly.

Link: https://patch.msgid.link/r/20251103172755.2026145-1-nicolinc@nvidia.com
Tested-by: Shameer Kolothum <skolothumtho@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Tested-by: Shuai Xue <xueshuai@linux.alibaba.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
(backported from commit 81c45c6)
Signed-off-by: Nathan Chen <nathanc@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2140343

The function hugetlb_reserve_pages() returns the number of pages added
to the reservation map on success and a negative error code on failure
(e.g. -EINVAL, -ENOMEM). However, in some error paths, it may return -1
directly.

For example, a failure at:

    if (hugetlb_acct_memory(h, gbl_reserve) < 0)
        goto out_put_pages;

results in returning -1 (since add = -1), which may be misinterpreted
in userspace as -EPERM.

Fix this by explicitly capturing and propagating the return values from
helper functions, and using -EINVAL for all other failure cases.
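
To illustrate why a bare -1 is misleading, here is a small userspace sketch of the two patterns. The function names are hypothetical, and the stub helper simply fails with -ENOMEM on request. Since EPERM is 1, a raw -1 is indistinguishable from -EPERM once it reaches the caller.

```c
#include <errno.h>

/* Hypothetical stand-in for a helper that fails with a real errno value. */
int acct_memory_stub(int fail)
{
	return fail ? -ENOMEM : 0;
}

/* Old pattern: the helper's error is discarded and a bare -1 escapes. */
long reserve_old(int fail)
{
	long add = -1;

	if (acct_memory_stub(fail) < 0)
		goto out;
	add = 4;	/* pretend four pages were added to the reservation map */
out:
	return add;
}

/* Fixed pattern: capture the helper's return value and propagate it. */
long reserve_fixed(int fail)
{
	int err = acct_memory_stub(fail);

	if (err < 0)
		return err;
	return 4;
}
```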

Link: https://lkml.kernel.org/r/20251125171350.86441-1-skolothumtho@nvidia.com
Fixes: 986f5f2 ("mm/hugetlb: make hugetlb_reserve_pages() return nr of entries updated")
Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
Reviewed-by: Joshua Hahn <joshua.hahnjy@gmail.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Oscar Salvador <osalvador@suse.de>
Cc: Matthew R. Ochs <mochs@nvidia.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicolin Chen <nicolinc@nvidia.com>
Cc: Vivek Kasireddy <vivek.kasireddy@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(backported from commit 9ee5d17)
Signed-off-by: Nathan Chen <nathanc@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2140343

The Enable bits in CMDQV/VINTF/VCMDQ_CONFIG registers do not actually reset
the HW registers. So, the driver explicitly clears all the registers when a
VINTF or VCMDQ is being initialized calling its hw_deinit() function.

However, a userspace VCMDQ is not properly reset, unlike an in-kernel VCMDQ
getting reset in tegra241_vcmdq_hw_init().

Meanwhile, tegra241_vintf_hw_init() calling tegra241_vintf_hw_deinit() will
not deinit any VCMDQ, since there is no userspace VCMDQ mapped to the VINTF
at that stage.

Then, this may result in dirty VCMDQ registers, which can fail the VM.

Like tegra241_vcmdq_hw_init(), reset a VCMDQ in tegra241_vcmdq_hw_init_user() to
fix this bug. This is required by a host kernel.

Fixes: 6717f26ab1e7 ("iommu/tegra241-cmdqv: Add user-space use support")
Cc: stable@vger.kernel.org
Reported-by: Bao Nguyen <ncqb@google.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
(backported from commit 80f1a2c)
Signed-off-by: Nathan Chen <nathanc@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
… transfer

BugLink: https://bugs.launchpad.net/bugs/2139640

When the ISR thread wakes up late and finds that the timeout handler
has already processed the transfer (curr_xfer is NULL), return
IRQ_HANDLED instead of IRQ_NONE.

Use a similar approach to tegra_qspi_handle_timeout() by reading
QSPI_TRANS_STATUS and checking the QSPI_RDY bit to determine if the
hardware actually completed the transfer. If QSPI_RDY is set, the
interrupt was legitimate and triggered by real hardware activity.
The fact that the timeout path handled it first doesn't make it
spurious. Returning IRQ_NONE incorrectly suggests the interrupt
wasn't for this device, which can cause issues with shared interrupt
lines and interrupt accounting.
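
The decision can be sketched as a small pure function. This is a simplified model rather than the driver code: `late_isr_result()` is a hypothetical helper, and the QSPI_RDY bit position is an assumption that should be checked against the driver.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors the kernel's irqreturn_t values for illustration. */
enum irqreturn { IRQ_NONE = 0, IRQ_HANDLED = 1 };

/* Assumed bit position for this sketch only. */
#define QSPI_RDY (1u << 30)

/*
 * Hypothetical helper: choose the return value when the ISR thread runs.
 * trans_status stands for the value read from QSPI_TRANS_STATUS.
 */
enum irqreturn late_isr_result(const void *curr_xfer, uint32_t trans_status)
{
	/* A transfer is still pending; the real handler would process it here. */
	if (curr_xfer)
		return IRQ_HANDLED;

	/*
	 * The timeout path got there first. The interrupt was still genuine
	 * if the hardware reports that the transfer completed.
	 */
	return (trans_status & QSPI_RDY) ? IRQ_HANDLED : IRQ_NONE;
}
```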

Fixes: b4e002d ("spi: tegra210-quad: Fix timeout handling")
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Usama Arif <usamaarif642@gmail.com>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Acked-by: Jon Hunter <jonathanh@nvidia.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Link: https://patch.msgid.link/20260126-tegra_xfer-v2-1-6d2115e4f387@debian.org
Signed-off-by: Mark Brown <broonie@kernel.org>
(cherry picked from commit aabd8ea linux-next)
Signed-off-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Abdur Rahman <abdur.rahman@canonical.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139640

Move the assignment of the transfer pointer from curr_xfer inside the
spinlock critical section in both handle_cpu_based_xfer() and
handle_dma_based_xfer().

Previously, curr_xfer was read before acquiring the lock, creating a
window where the timeout path could clear curr_xfer between reading it
and using it. By moving the read inside the lock, the handlers are
guaranteed to see a consistent value that cannot be modified by the
timeout path.

Fixes: 921fc18 ("spi: tegra210-quad: Add support for Tegra210 QSPI controller")
Signed-off-by: Breno Leitao <leitao@debian.org>
Acked-by: Thierry Reding <treding@nvidia.com>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Acked-by: Jon Hunter <jonathanh@nvidia.com>
Link: https://patch.msgid.link/20260126-tegra_xfer-v2-2-6d2115e4f387@debian.org
Signed-off-by: Mark Brown <broonie@kernel.org>
(cherry picked from commit ef13ba3 linux-next)
Signed-off-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Abdur Rahman <abdur.rahman@canonical.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
…transfer_one

BugLink: https://bugs.launchpad.net/bugs/2139640

When the timeout handler processes a completed transfer and signals
completion, the transfer thread can immediately set up the next transfer
and assign curr_xfer to point to it.

If a delayed ISR from the previous transfer then runs, it checks if
(!tqspi->curr_xfer) (currently without the lock also -- to be fixed
soon) to detect stale interrupts, but this check passes because
curr_xfer now points to the new transfer. The ISR then incorrectly
processes the new transfer's context.

Protect the curr_xfer assignment with the spinlock to ensure the ISR
either sees NULL (and bails out) or sees the new value only after the
assignment is complete.

Fixes: 921fc18 ("spi: tegra210-quad: Add support for Tegra210 QSPI controller")
Signed-off-by: Breno Leitao <leitao@debian.org>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Acked-by: Jon Hunter <jonathanh@nvidia.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Link: https://patch.msgid.link/20260126-tegra_xfer-v2-3-6d2115e4f387@debian.org
Signed-off-by: Mark Brown <broonie@kernel.org>
(cherry picked from commit f5a4d7f linux-next)
Signed-off-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Abdur Rahman <abdur.rahman@canonical.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139640

The curr_xfer field is read by the IRQ handler without holding the lock
to check if a transfer is in progress. When clearing curr_xfer in the
combined sequence transfer loop, protect it with the spinlock to prevent
a race with the interrupt handler.

Protect the curr_xfer clearing at the exit path of
tegra_qspi_combined_seq_xfer() with the spinlock to prevent a race
with the interrupt handler that reads this field.

Without this protection, the IRQ handler could read a partially updated
curr_xfer value, leading to NULL pointer dereference or use-after-free.

Fixes: b4e002d ("spi: tegra210-quad: Fix timeout handling")
Signed-off-by: Breno Leitao <leitao@debian.org>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Acked-by: Jon Hunter <jonathanh@nvidia.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Link: https://patch.msgid.link/20260126-tegra_xfer-v2-4-6d2115e4f387@debian.org
Signed-off-by: Mark Brown <broonie@kernel.org>
(cherry picked from commit bf4528a linux-next)
Signed-off-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Abdur Rahman <abdur.rahman@canonical.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
…ined_seq_xfer

BugLink: https://bugs.launchpad.net/bugs/2139640

Protect the curr_xfer clearing in tegra_qspi_non_combined_seq_xfer()
with the spinlock to prevent a race with the interrupt handler that
reads this field to check if a transfer is in progress.

Fixes: b4e002d ("spi: tegra210-quad: Fix timeout handling")
Signed-off-by: Breno Leitao <leitao@debian.org>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Acked-by: Jon Hunter <jonathanh@nvidia.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Link: https://patch.msgid.link/20260126-tegra_xfer-v2-5-6d2115e4f387@debian.org
Signed-off-by: Mark Brown <broonie@kernel.org>
(cherry picked from commit 6d7723e linux-next)
Signed-off-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Abdur Rahman <abdur.rahman@canonical.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139640

Now that all other accesses to curr_xfer are done under the lock,
protect the curr_xfer NULL check in tegra_qspi_isr_thread() with the
spinlock. Without this protection, the following race can occur:

  CPU0 (ISR thread)              CPU1 (timeout path)
  ----------------               -------------------
  if (!tqspi->curr_xfer)
    // sees non-NULL
                                 spin_lock()
                                 tqspi->curr_xfer = NULL
                                 spin_unlock()
  handle_*_xfer()
    spin_lock()
    t = tqspi->curr_xfer  // NULL!
    ... t->len ...        // NULL dereference!

With this patch, all curr_xfer accesses are now properly synchronized.

Although all accesses to curr_xfer are done under the lock, in
tegra_qspi_isr_thread() it checks for NULL, releases the lock and
reacquires it later in handle_cpu_based_xfer()/handle_dma_based_xfer().
There is a potential for an update in between, which could cause a NULL
pointer dereference.

To handle this, add a NULL check inside the handlers after acquiring
the lock. This ensures that if the timeout path has already cleared
curr_xfer, the handler will safely return without dereferencing the
NULL pointer.
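
A userspace sketch of the pattern, with a pthread mutex standing in for the driver's spinlock. The struct and function names are hypothetical. The point is that the handler reads curr_xfer only after taking the lock and returns early if the timeout path has already cleared it.

```c
#include <pthread.h>
#include <stddef.h>

struct xfer {
	size_t len;
};

struct qspi_model {
	pthread_mutex_t lock;
	struct xfer *curr_xfer;
};

/* Timeout path: clear curr_xfer under the lock. */
void timeout_path(struct qspi_model *q)
{
	pthread_mutex_lock(&q->lock);
	q->curr_xfer = NULL;
	pthread_mutex_unlock(&q->lock);
}

/*
 * Handler: read curr_xfer inside the critical section and re-check it.
 * Returns the transfer length, or -1 if there is nothing left to handle.
 */
long handle_xfer(struct qspi_model *q)
{
	struct xfer *t;
	long len = -1;

	pthread_mutex_lock(&q->lock);
	t = q->curr_xfer;
	if (t)
		len = (long)t->len;	/* safe: t cannot be cleared while locked */
	pthread_mutex_unlock(&q->lock);

	return len;
}
```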

Fixes: b4e002d ("spi: tegra210-quad: Fix timeout handling")
Signed-off-by: Breno Leitao <leitao@debian.org>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Acked-by: Jon Hunter <jonathanh@nvidia.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Link: https://patch.msgid.link/20260126-tegra_xfer-v2-6-6d2115e4f387@debian.org
Signed-off-by: Mark Brown <broonie@kernel.org>
(cherry picked from commit edf9088 linux-next)
Signed-off-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Abdur Rahman <abdur.rahman@canonical.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139648

Currently cpu-clock event always returns 0 count, e.g.,

perf stat -e cpu-clock -- sleep 1

 Performance counter stats for 'sleep 1':
                 0      cpu-clock                        #    0.000 CPUs utilized
       1.002308394 seconds time elapsed

The root cause is that commit bc4394e5e79c ("perf: Fix the throttle
error of some clock events") added a PERF_EF_UPDATE flag check before
calling cpu_clock_event_update() to update the count. However, the
PERF_EF_UPDATE flag is never set when the cpu-clock event is stopped in
counting mode (pmu->del() -> cpu_clock_event_del() ->
cpu_clock_event_stop()). As a result, the cpu-clock event count is
never updated.

To fix this issue, force to set PERF_EF_UPDATE flag for cpu-clock event
just like what task-clock does.
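
The effect of the flag can be shown with a simplified userspace model. The struct and function names are hypothetical, and the PERF_EF_UPDATE value is an assumption for this sketch. The buggy delete path forwards flags that lack PERF_EF_UPDATE, so the accumulated time is never folded into the count.

```c
#include <stdint.h>

/* Assumed value for this sketch; see include/linux/perf_event.h. */
#define PERF_EF_UPDATE 0x04

struct clock_event_model {
	uint64_t count;		/* value reported to userspace */
	uint64_t pending;	/* time accumulated since the last update */
};

/* Stop path: the count is only updated when PERF_EF_UPDATE is set. */
void clock_event_stop(struct clock_event_model *e, int flags)
{
	if (flags & PERF_EF_UPDATE) {
		e->count += e->pending;
		e->pending = 0;
	}
}

/* Buggy delete path: forwards the caller's flags unchanged. */
void clock_event_del_buggy(struct clock_event_model *e, int flags)
{
	clock_event_stop(e, flags);
}

/* Fixed delete path: always request an update when stopping. */
void clock_event_del_fixed(struct clock_event_model *e, int flags)
{
	(void)flags;
	clock_event_stop(e, PERF_EF_UPDATE);
}
```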

Fixes: bc4394e ("perf: Fix the throttle error of some clock events")
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://patch.msgid.link/20251112080526.3971392-1-dapeng1.mi@linux.intel.com
(cherry picked from commit f1f9651)
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Nirmoy Das <nirmoyd@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Abdur Rahman <abdur.rahman@canonical.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2093957

Signed-off-by: Jeremy Szu <jszu@nvidia.com>
Acked-by: Nirmoy Das <nirmoyd@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Abdur Rahman <abdur.rahman@canonical.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2138892

Remove this declaration which is now used within the file
after merging upstream "vfio/nvgrace-gpu: register device memory for
poison handling".

Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2136828

Add PCI_VENDOR_ID_ASPEED to the shared pci_ids.h header and remove the
duplicate local definition from ehci-pci.c.

This prepares for adding a PCI quirk for ASPEED devices.

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com>
(backported from https://lore.kernel.org/linux-iommu/20251217154529.377586-1-nirmoyd@nvidia.com/)
Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Acked-by: Abdur Rahman <abdur.rahman@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2136828

ASPEED BMC controllers have VGA and USB functions behind a PCIe-to-PCI
bridge that causes them to share the same stream ID:

  [e0]---00.0-[e1-e2]----00.0-[e2]--+-00.0  ASPEED Graphics Family
                                    \-02.0  ASPEED USB Controller

Both devices get stream ID 0x5e200 due to bridge aliasing, causing the
USB controller to be rejected with 'Aliasing StreamID unsupported'.

Per ASPEED, the AST1150 doesn't use a real PCI bus and always forwards
the original requester ID from downstream devices rather than replacing
it with any alias.

Add a new PCI_DEV_FLAGS_PCI_BRIDGE_NO_ALIASES flag and apply it to the
AST1150.
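
A minimal sketch of the aliasing behavior, assuming a conventional PCIe-to-PCI bridge replaces the requester ID with (secondary bus, devfn 0). Both helpers are hypothetical and only model the ID arithmetic, not the kernel's alias walk.

```c
#include <stdbool.h>
#include <stdint.h>

/* Requester ID layout: bus[15:8], device[7:3], function[2:0]. */
uint16_t rid(uint8_t bus, uint8_t dev, uint8_t fn)
{
	return (uint16_t)((bus << 8) | ((dev & 0x1f) << 3) | (fn & 0x7));
}

/*
 * Hypothetical helper: the requester ID seen upstream of a PCIe-to-PCI
 * bridge. With the no-aliases quirk the original ID passes through.
 */
uint16_t dma_id_behind_bridge(uint16_t dev_rid, uint8_t secondary_bus,
			      bool bridge_no_aliases)
{
	if (bridge_no_aliases)
		return dev_rid;
	return rid(secondary_bus, 0, 0);
}
```

Without the quirk, e2:00.0 and e2:02.0 collapse to the same ID in this model. With it, they stay distinct.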

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com>
(backported from https://lore.kernel.org/linux-iommu/20251217154529.377586-2-nirmoyd@nvidia.com/)
[nirmoy: set PCI_DEV_FLAGS_PCI_BRIDGE_NO_ALIASES to (1 << 15) instead of (1 << 14)]
Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Acked-by: Abdur Rahman <abdur.rahman@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
…ivation (LFA)

BugLink: https://bugs.launchpad.net/bugs/2138342

The Arm Live Firmware Activation (LFA) is a specification [1] to describe
activating firmware components without a reboot. Those components
(like TF-A's BL31, EDK-II, TF-RMM, secure payloads) would be updated the
usual way: via fwupd, FF-A or other secure storage methods, or via some
IMPDEF out-of-band method. The user can then activate this new firmware,
at system runtime, without requiring a reboot.
The specification covers the SMCCC interface to list and query available
components and eventually trigger the activation.

Add a new directory under /sys/firmware to present firmware components
capable of live activation. Each of them is a directory under lfa/,
and is identified via its GUID. The activation will be triggered by echoing
"1" into the "activate" file:
==========================================
/sys/firmware/lfa # ls -l . 6c*
.:
total 0
drwxr-xr-x    2 0 0         0 Jan 19 11:33 47d4086d-4cfe-9846-9b95-2950cbbd5a00
drwxr-xr-x    2 0 0         0 Jan 19 11:33 6c0762a6-12f2-4b56-92cb-ba8f633606d9
drwxr-xr-x    2 0 0         0 Jan 19 11:33 d6d0eea7-fcea-d54b-9782-9934f234b6e4

6c0762a6-12f2-4b56-92cb-ba8f633606d9:
total 0
--w-------    1 0        0             4096 Jan 19 11:33 activate
-r--r--r--    1 0        0             4096 Jan 19 11:33 activation_capable
-r--r--r--    1 0        0             4096 Jan 19 11:33 activation_pending
--w-------    1 0        0             4096 Jan 19 11:33 cancel
-r--r--r--    1 0        0             4096 Jan 19 11:33 cpu_rendezvous
-r--r--r--    1 0        0             4096 Jan 19 11:33 current_version
-rw-r--r--    1 0        0             4096 Jan 19 11:33 force_cpu_rendezvous
-r--r--r--    1 0        0             4096 Jan 19 11:33 may_reset_cpu
-r--r--r--    1 0        0             4096 Jan 19 11:33 name
-r--r--r--    1 0        0             4096 Jan 19 11:33 pending_version
/sys/firmware/lfa/6c0762a6-12f2-4b56-92cb-ba8f633606d9 # grep . *
grep: activate: Permission denied
activation_capable:1
activation_pending:1
grep: cancel: Permission denied
cpu_rendezvous:1
current_version:0.0
force_cpu_rendezvous:1
may_reset_cpu:0
name:TF-RMM
pending_version:0.0
/sys/firmware/lfa/6c0762a6-12f2-4b56-92cb-ba8f633606d9 # echo 1 > activate
[ 2825.797871] Arm LFA: firmware activation succeeded.
/sys/firmware/lfa/6c0762a6-12f2-4b56-92cb-ba8f633606d9 #
==========================================
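
The same activation can be driven from a program instead of a shell. The sketch below assumes the sysfs layout shown in the listing above. Both function names are hypothetical, and error handling is reduced to a 0 / -1 result.

```c
#include <stdio.h>

/* Build "<root>/<guid>/activate"; returns 0 on success, -1 if buf is too small. */
int lfa_activate_path(char *buf, size_t len, const char *root, const char *guid)
{
	int n = snprintf(buf, len, "%s/%s/activate", root, guid);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write "1" to the component's activate file; returns 0 on success, -1 on failure. */
int lfa_activate(const char *root, const char *guid)
{
	char path[256];
	FILE *f;
	int ok;

	if (lfa_activate_path(path, sizeof(path), root, guid) < 0)
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;

	ok = fputs("1", f) >= 0;
	/* fclose() flushes the buffered write, so its result matters too. */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

A caller would pass "/sys/firmware/lfa" as the root and one of the GUID directory names from the listing. The activate file is write-only for root, so the call fails for unprivileged users.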

[1] https://developer.arm.com/documentation/den0147/latest/

Signed-off-by: Salman Nabi <salman.nabi@arm.com>
Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
(backported from https://lore.kernel.org/all/20260119122729.287522-2-salman.nabi@arm.com/)
[nirmoyd: Added image_name fallback to fw_uuid in update_fw_image_node()]
Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2138342

Enhance PRIME/ACTIVATION functions to touch watchdog and implement
timeout mechanism. This update ensures that any potential hangs are
detected promptly and that the LFA process is allocated sufficient
execution time before the watchdog timer expires. These changes improve
overall system reliability by reducing the risk of undetected process
stalls and unexpected watchdog resets.

Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
alucerop and others added 26 commits March 24, 2026 11:58
Region creation involves finding available DPA (device-physical-address)
capacity to map into HPA (host-physical-address) space.

In order to support CXL Type2 devices, define an API, cxl_request_dpa(),
that tries to allocate the DPA memory the driver requires to operate. The
memory requested should not be bigger than the max available HPA obtained
previously with cxl_get_hpa_freespace().

Based on https://lore.kernel.org/linux-cxl/168592158743.1948938.7622563891193802610.stgit@dwillia2-xfh.jf.intel.com/

Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
(backported from https://lore.kernel.org/linux-cxl/20260201155438.2664640-1-alejandro.lucero-palau@amd.com/)
Signed-off-by: Jiandi An <jan@nvidia.com>
Use the CXL API to obtain the DPA (Device Physical Address) to use through an
endpoint decoder.

Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
(backported from https://lore.kernel.org/linux-cxl/20260201155438.2664640-1-alejandro.lucero-palau@amd.com/)
Signed-off-by: Jiandi An <jan@nvidia.com>
Current code is expecting Type3 or CXL_DECODER_HOSTONLYMEM devices only.
Support for Type2 implies region type needs to be based on the endpoint
type HDM-D[B] instead.

Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Zhi Wang <zhiw@nvidia.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Davidlohr Bueso <daves@stgolabs.net>
Reviewed-by: Gregory Price <gourry@gourry.net>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
(backported from https://lore.kernel.org/linux-cxl/20260201155438.2664640-1-alejandro.lucero-palau@amd.com/)
Signed-off-by: Jiandi An <jan@nvidia.com>
Region creation based on Type3 devices is triggered from user space
allowing memory combination through interleaving.

In preparation for kernel driven region creation, that is Type2 drivers
triggering region creation backed with its advertised CXL memory, factor
out a common helper from the user-sysfs region setup for interleave ways.

Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Zhi Wang <zhiw@nvidia.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
(backported from https://lore.kernel.org/linux-cxl/20260201155438.2664640-1-alejandro.lucero-palau@amd.com/)
Signed-off-by: Jiandi An <jan@nvidia.com>
Region creation based on Type3 devices is triggered from user space
allowing memory combination through interleaving.

In preparation for kernel driven region creation, that is Type2 drivers
triggering region creation backed with its advertised CXL memory, factor
out a common helper from the user-sysfs region setup for interleave
granularity.

Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Zhi Wang <zhiw@nvidia.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
(backported from https://lore.kernel.org/linux-cxl/20260201155438.2664640-1-alejandro.lucero-palau@amd.com/)
Signed-off-by: Jiandi An <jan@nvidia.com>
Creating a CXL region requires userspace intervention through the cxl
sysfs files. Type2 support should allow accelerator drivers to create
such a cxl region from kernel code.

Add that functionality and integrate it with the current support for
memory expanders.

Based on https://lore.kernel.org/linux-cxl/168592159835.1948938.1647215579839222774.stgit@dwillia2-xfh.jf.intel.com/

Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
(backported from https://lore.kernel.org/linux-cxl/20260201155438.2664640-1-alejandro.lucero-palau@amd.com/)
[jan: Resolve minor conflict due to code lines shift]
Signed-off-by: Jiandi An <jan@nvidia.com>
By definition a type2 cxl device will use the host managed memory for
specific functionality, therefore it should not be available to other
uses.

Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Davidlohr Bueso <daves@stgolabs.net>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
(backported from https://lore.kernel.org/linux-cxl/20260201155438.2664640-1-alejandro.lucero-palau@amd.com/)
Signed-off-by: Jiandi An <jan@nvidia.com>
Use cxl api for creating a region using the endpoint decoder related to
a DPA range.

Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
(backported from https://lore.kernel.org/linux-cxl/20260201155438.2664640-1-alejandro.lucero-palau@amd.com/)
Signed-off-by: Jiandi An <jan@nvidia.com>
A PIO buffer is a region of device memory to which the driver can write a
packet for TX, with the device handling the transmit doorbell without
requiring a DMA for getting the packet data, which helps reduce latency
in certain exchanges. With CXL mem protocol this latency can be lowered
further.

With a device supporting CXL and successfully initialised, use the cxl
region to map the memory range and use this mapping for PIO buffers.

Disable those CXL-based PIO buffers when the CXL code invokes the
callback for potential cxl endpoint removal.

Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
(backported from https://lore.kernel.org/linux-cxl/20260201155438.2664640-1-alejandro.lucero-palau@amd.com/)
Signed-off-by: Jiandi An <jan@nvidia.com>
…smaller granularities for lower levels

The CXL specification supports multi-level interleaving "as long as
all the levels use different, but consecutive, HPA bits to select the
target and no Interleave Set has more than 8 devices" (from 3.2).

Currently the kernel expects that a decoder's "interleave granularity
is a multiple of @parent_port granularity". That is, the granularity
of a lower level is bigger than that of the parent and uses the outer
HPA bits as selector. It works e.g. for the following 8-way config:

 * cross-link (cross-hostbridge config in CFMWS):
   * 4-way
   * 256 granularity
   * Selector: HPA[8:9]
 * sub-link (CXL Host bridge config of the HDM):
   * 2-way
   * 1024 granularity
   * Selector: HPA[10]

Now, if the outer HPA bits are used for the cross-hostbridge, an 8-way
config could look like this:

 * cross-link (cross-hostbridge config in CFMWS):
   * 4-way
   * 512 granularity
   * Selector: HPA[9:10]
 * sub-link (CXL Host bridge config of the HDM):
   * 2-way
   * 256 granularity
   * Selector: HPA[8]

The enumeration of decoders for this configuration then fails with the
following error:

 cxl region0: pci0000:00:port1 cxl_port_setup_targets expected iw: 2 ig: 1024 [mem 0x10000000000-0x1ffffffffff flags 0x200]
 cxl region0: pci0000:00:port1 cxl_port_setup_targets got iw: 2 ig: 256 state: enabled 0x10000000000:0x1ffffffffff
 cxl_port endpoint12: failed to attach decoder12.0 to region0: -6

Note that this happens only if firmware is setting up the decoders
(CXL_REGION_F_AUTO). For userspace region assembly the granularities
are chosen to increase from root down to the lower levels. That is,
outer HPA bits are always used for lower interleaving levels.

Rework the implementation to also support multi-level interleaving
with smaller granularities for lower levels. Determine the interleave
set of autodetected decoders. Check that it is a subset of the root
interleave.

The HPA selector bits are extracted for all decoders of the set and
checked to ensure that there is no overlap and the bits are consecutive. All
decoders can be programmed now to use any bit range within the
region's target selector.
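
The selector check described above can be sketched in userspace as follows. This is a simplified model, not the kernel implementation: it assumes power-of-two ways and granularities (3/6/12-way interleaving is not modelled), and all function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* HPA selector mask of one interleave level (power-of-two ways only). */
static uint64_t hpa_selector_mask(unsigned int ways, unsigned int granularity)
{
	return (uint64_t)(ways - 1) * granularity;
}

/* True if the set bits of mask form a single contiguous run. */
static bool mask_is_contiguous(uint64_t mask)
{
	if (!mask)
		return false;
	while (!(mask & 1))
		mask >>= 1;			/* drop trailing zero bits */
	return (mask & (mask + 1)) == 0;	/* remainder must be 2^n - 1 */
}

/* Two levels fit together if their selectors are disjoint and consecutive. */
static bool levels_compatible(uint64_t a, uint64_t b)
{
	return (a & b) == 0 && mask_is_contiguous(a | b);
}
```

With the two configurations from the text, 4-way/256 with 2-way/1024 yields HPA[8:9] plus HPA[10], and 4-way/512 with 2-way/256 yields HPA[9:10] plus HPA[8]. Both combine into HPA[8:10], so both pass this check.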

Signed-off-by: Robert Richter <rrichter@amd.com>
(backported from https://lore.kernel.org/all/20251028094754.72816-1-rrichter@amd.com/)
[jan: Resolved minor conflicts]
Signed-off-by: Jiandi An <jan@nvidia.com>
…er definitions

PCI: Add CXL DVSEC control, lock, and range register definitions

Add register offset and field definitions for CXL DVSEC registers needed
by CXL state save/restore across resets:

  - CTRL2 (offset 0x10) and LOCK (offset 0x14) registers
  - CONFIG_LOCK bit in the LOCK register
  - RWL (read-write-when-locked) field masks for CTRL and range base
    registers.

Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
(backported from https://lore.kernel.org/linux-cxl/20260306080026.116789-1-smadhavan@nvidia.com/)
Signed-off-by: Jiandi An <jan@nvidia.com>
… to include/cxl/cxl.h

Move CXL HDM decoder register defines, register map structs
(cxl_reg_map, cxl_component_reg_map, cxl_device_reg_map,
cxl_pmu_reg_map, cxl_register_map), cxl_hdm_decoder_count(),
enum cxl_regloc_type, and cxl_find_regblock()/cxl_setup_regs()
declarations from internal CXL headers to include/cxl/pci.h.

This makes them accessible to code outside the CXL subsystem, in
particular the PCI core CXL state save/restore support added in a
subsequent patch.

No functional change.

Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
(backported from https://lore.kernel.org/linux-cxl/20260306080026.116789-1-smadhavan@nvidia.com/)
[jan: Resolve conflicts by moving certain definitions to include/cxl/cxl.h instead of to include/cxl/pci.h to align with its dependency of Alejandro's series]
Signed-off-by: Jiandi An <jan@nvidia.com>
…state

Add pci_add_virtual_ext_cap_save_buffer() to allocate save buffers
using virtual cap IDs (above PCI_EXT_CAP_ID_MAX) that don't require
a real capability in config space.

The existing pci_add_ext_cap_save_buffer() cannot be used for
CXL DVSEC state because it calls pci_find_saved_ext_cap()
which searches for a matching capability in PCI config space.
The CXL state saved here is a synthetic snapshot (DVSEC+HDM)
and should not be tied to a real extended-cap instance. A
virtual extended-cap save buffer API (cap IDs above
PCI_EXT_CAP_ID_MAX) allows PCI to track this state without
a backing config space capability.

Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
(backported from https://lore.kernel.org/linux-cxl/20260306080026.116789-1-smadhavan@nvidia.com/)
Signed-off-by: Jiandi An <jan@nvidia.com>
Save and restore CXL DVSEC control registers (CTRL, CTRL2), range
base registers, and lock state across PCI resets.

When the DVSEC CONFIG_LOCK bit is set, certain DVSEC fields
become read-only and hardware may have updated them. Blindly
restoring saved values would be silently ignored or conflict
with hardware state. Instead, a read-merge-write approach is
used: current hardware values are read for the RWL
(read-write-when-locked) fields and merged with saved state,
so only writable bits are restored while locked bits retain
their hardware values.
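
A minimal sketch of the read-merge-write step for a single 16-bit register follows. The function name is hypothetical and the RWL mask is passed in by the caller; the actual field masks are the ones added by the register definition patch and are not reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Compute the value to write back on restore. When CONFIG_LOCK is set,
 * bits covered by rwl_mask keep their current hardware value and only
 * the remaining bits are taken from the saved state.
 */
static uint16_t dvsec_merge_for_restore(uint16_t saved, uint16_t current,
					uint16_t rwl_mask, bool config_locked)
{
	if (!config_locked)
		return saved;
	return (uint16_t)((saved & ~rwl_mask) | (current & rwl_mask));
}
```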

Hooked into pci_save_state()/pci_restore_state() so all PCI reset
paths automatically preserve CXL DVSEC configuration.

Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
(backported from https://lore.kernel.org/linux-cxl/20260306080026.116789-1-smadhavan@nvidia.com/)
[jan: Resolve minor conflict in drivers/pci/Makefile due to code line shifts]
Signed-off-by: Jiandi An <jan@nvidia.com>
Save and restore CXL HDM decoder registers (global control,
per-decoder base/size/target-list, and commit state) across PCI
resets. On restore, decoders that were committed are reprogrammed
and recommitted with a 10ms timeout. Locked decoders that are
already committed are skipped, since their state is protected by
hardware and reprogramming them would fail.
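
The per-decoder restore decision above can be sketched as below. This is a userspace model with hypothetical names; how decoders that were not committed at save time are handled is not stated in this message, so the model assumes they are simply not recommitted.

```c
#include <stdbool.h>

enum hdm_restore_action {
	HDM_LEAVE_UNCOMMITTED,		/* assumption: nothing to recommit */
	HDM_SKIP_LOCKED,		/* locked and already committed */
	HDM_REPROGRAM_AND_COMMIT,	/* rewrite registers, then commit */
};

struct hdm_decoder_model {
	bool saved_committed;	/* decoder was committed when state was saved */
	bool hw_locked;		/* lock bit currently set in hardware */
	bool hw_committed;	/* committed bit currently set in hardware */
};

static enum hdm_restore_action
hdm_restore_action_model(const struct hdm_decoder_model *d)
{
	if (d->hw_locked && d->hw_committed)
		return HDM_SKIP_LOCKED;
	if (d->saved_committed)
		return HDM_REPROGRAM_AND_COMMIT;
	return HDM_LEAVE_UNCOMMITTED;
}
```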

The Register Locator DVSEC is parsed directly via PCI config space
reads rather than calling cxl_find_regblock()/cxl_setup_regs(),
since this code lives in the PCI core and must not depend on CXL
module symbols.

MSE is temporarily enabled during save/restore to allow MMIO
access to the HDM decoder register block.

Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
(backported from https://lore.kernel.org/linux-cxl/20260306080026.116789-1-smadhavan@nvidia.com/)
[jan: Include <cxl/cxl.h> in drivers/pci/cxl.c due to conflict resolution in "4acbc27592b8 NVIDIA: VR: SAUCE: cxl: Move HDM decoder and register map definitions to include/cxl/cxl.h"]
Signed-off-by: Jiandi An <jan@nvidia.com>
…efinitions

Add CXL DVSEC register definitions needed for CXL device reset per
CXL r3.2 section 8.1.3.1:
- Capability bits: RST_CAPABLE, CACHE_CAPABLE, CACHE_WBI_CAPABLE,
  RST_TIMEOUT, RST_MEM_CLR_CAPABLE
- Control2 register: DISABLE_CACHING, INIT_CACHE_WBI, INIT_CXL_RST,
  RST_MEM_CLR_EN
- Status2 register: CACHE_INV, RST_DONE, RST_ERR
- Non-CXL Function Map DVSEC register offset

Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
(backported from https://lore.kernel.org/linux-cxl/20260306092322.148765-1-smadhavan@nvidia.com/)
[jan: Resolve conflicts where PCI_DVSEC_CXL_CACHE_CAPABLE is already added by "72bd823fb4f1 NVIDIA: VR: SAUCE: PCI: Allow ATS to be always on for CXL.cache capable devices"]
Signed-off-by: Jiandi An <jan@nvidia.com>
…_restore()

Export pci_dev_save_and_disable() and pci_dev_restore() so that
subsystems performing non-standard reset sequences (e.g. CXL)
can reuse the PCI core standard pre/post reset lifecycle:
driver reset_prepare/reset_done callbacks, PCI config space
save/restore, and device disable/re-enable.

Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
(backported from https://lore.kernel.org/linux-cxl/20260306092322.148765-1-smadhavan@nvidia.com/)
Signed-off-by: Jiandi An <jan@nvidia.com>
Add infrastructure for quiescing the CXL data path before reset:

- Memory offlining: check if CXL-backed memory is online and offline
  it via offline_and_remove_memory() before reset, per CXL
  spec requirement to quiesce all CXL.mem transactions before issuing
  CXL Reset.
- CPU cache flush: invalidate cache lines before reset
  as a safety measure after memory offline.

Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
(backported from https://lore.kernel.org/linux-cxl/20260306092322.148765-1-smadhavan@nvidia.com/)
Signed-off-by: Jiandi An <jan@nvidia.com>
…XL reset

Add sibling PCI function save/disable/restore coordination for CXL
reset. Before reset, all CXL.cachemem sibling functions are locked,
saved, and disabled; after reset they are restored. The Non-CXL Function
Map DVSEC and per-function DVSEC capability register are consulted to
skip non-CXL and CXL.io-only functions. A global mutex serializes
concurrent resets to prevent deadlocks between sibling functions.
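
A rough sketch of the sibling filter follows. Everything here is a model under stated assumptions: the map is treated as one bit per function number across eight 32-bit registers, the capability bit positions are placeholders, and "CXL.cachemem" is interpreted as having either cache or mem capability. The names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder capability bits; positions are assumptions. */
#define MODEL_CAP_CACHE	(1u << 0)
#define MODEL_CAP_MEM	(1u << 2)

/* Hypothetical view of the Non-CXL Function Map: one bit per function. */
struct non_cxl_map_model {
	uint32_t regs[8];	/* 8 x 32 bits covers function numbers 0..255 */
};

/* True if a sibling function must be saved/disabled around the reset. */
static bool sibling_needs_save_model(const struct non_cxl_map_model *map,
				     unsigned int fn, uint16_t dvsec_cap)
{
	if (fn > 255)
		return false;
	if (map->regs[fn / 32] & (1u << (fn % 32)))
		return false;	/* listed as non-CXL: skip */
	if (!(dvsec_cap & (MODEL_CAP_CACHE | MODEL_CAP_MEM)))
		return false;	/* CXL.io only: skip */
	return true;
}
```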

Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
(backported from https://lore.kernel.org/linux-cxl/20260306092322.148765-1-smadhavan@nvidia.com/)
Signed-off-by: Jiandi An <jan@nvidia.com>
…ration

cxl_dev_reset() implements the hardware reset sequence:
optionally enable memory clear, initiate reset via
CTRL2, wait for completion, and re-enable caching.

cxl_do_reset() orchestrates the full reset flow:
  1. CXL pre-reset: mem offlining and cache flush (when memdev present)
  2. PCI save/disable: pci_dev_save_and_disable() automatically saves
     CXL DVSEC and HDM decoder state via PCI core hooks
  3. Sibling coordination: save/disable CXL.cachemem sibling functions
  4. Execute CXL DVSEC reset
  5. Sibling restore: always runs to re-enable sibling functions
  6. PCI restore: pci_dev_restore() automatically restores CXL state

The CXL-specific DVSEC and HDM save/restore is handled
by the PCI core's CXL save/restore infrastructure (drivers/pci/cxl.c).
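
The ordering of the six steps can be sketched as a small trace model. It only records step order and propagates the reset result; the names are hypothetical, and running the PCI restore step after a failed reset is an assumption of this sketch (the text only states that sibling restore always runs).

```c
#include <stddef.h>

enum reset_step {
	STEP_PRE_RESET,
	STEP_PCI_SAVE_DISABLE,
	STEP_SIBLING_SAVE,
	STEP_DVSEC_RESET,
	STEP_SIBLING_RESTORE,
	STEP_PCI_RESTORE,
};

struct reset_trace {
	enum reset_step steps[6];
	size_t count;
};

static void trace_step(struct reset_trace *t, enum reset_step s)
{
	if (t->count < sizeof(t->steps) / sizeof(t->steps[0]))
		t->steps[t->count++] = s;
}

/* dvsec_reset_rc stands in for the outcome of the hardware reset. */
static int do_reset_model(struct reset_trace *t, int dvsec_reset_rc)
{
	t->count = 0;
	trace_step(t, STEP_PRE_RESET);
	trace_step(t, STEP_PCI_SAVE_DISABLE);
	trace_step(t, STEP_SIBLING_SAVE);
	trace_step(t, STEP_DVSEC_RESET);
	/* Sibling restore runs whether or not the reset succeeded. */
	trace_step(t, STEP_SIBLING_RESTORE);
	trace_step(t, STEP_PCI_RESTORE);
	return dvsec_reset_rc;
}
```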

Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
(backported from https://lore.kernel.org/linux-cxl/20260306092322.148765-1-smadhavan@nvidia.com/)
Signed-off-by: Jiandi An <jan@nvidia.com>
Add a "cxl_reset" sysfs attribute to PCI devices that support CXL
Reset (CXL r3.2 section 8.1.3.1). The attribute is visible only on
devices with both CXL.cache and CXL.mem capabilities and the CXL
Reset Capable bit set in the DVSEC.
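
The visibility rule can be sketched as a predicate over the DVSEC capability register. The bit positions and names below are placeholders for illustration; the real definitions are the ones added by the register definition patch in this series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit positions; treat them as assumptions. */
#define MODEL_CACHE_CAPABLE	(1u << 0)
#define MODEL_MEM_CAPABLE	(1u << 2)
#define MODEL_RST_CAPABLE	(1u << 7)

/* The attribute is shown only when all three capability bits are set. */
static bool cxl_reset_attr_visible_model(uint16_t dvsec_cap)
{
	const uint16_t required = MODEL_CACHE_CAPABLE | MODEL_MEM_CAPABLE |
				  MODEL_RST_CAPABLE;

	return (dvsec_cap & required) == required;
}
```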

Writing "1" to the attribute triggers the full CXL reset flow via
cxl_do_reset(). The interface is decoupled from memdev creation:
when a CXL memdev exists, memory offlining and cache flush are
performed; otherwise reset proceeds without the memory management.

The sysfs attribute is managed entirely by the CXL module using
sysfs_create_group() / sysfs_remove_group() rather than the PCI
core's static attribute groups. This avoids cross-module symbol
dependencies between the PCI core (always built-in) and CXL_BUS
(potentially modular).

At module init, existing PCI devices are scanned and a PCI bus
notifier handles hot-plug/unplug. kernfs_drain() makes sure that
any in-flight store() completes before sysfs_remove_group() returns,
preventing use-after-free during module unload.

Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
(backported from https://lore.kernel.org/linux-cxl/20260306092322.148765-1-smadhavan@nvidia.com/)
Signed-off-by: Jiandi An <jan@nvidia.com>
…tribute

Document the cxl_reset sysfs attribute added to PCI devices that
support CXL Reset.

Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
(backported from https://lore.kernel.org/linux-cxl/20260306092322.148765-1-smadhavan@nvidia.com/)
Signed-off-by: Jiandi An <jan@nvidia.com>
…and RAS support

Add Ubuntu kernel config annotations for CXL-related configs introduced
or changed by the following cherry-picked patch series:
  - drivers/cxl changes between v6.17.9 and upstream 7.0 (which includes
    a portion of Terry Bowman's v14 CXL RAS series merged via
    for-7.0/cxl-aer-prep)
  - Alejandro Lucero's v23 CXL Type-2 device support series
  - Smita Koralahalli's v6 patch 3/9 (cxl/region: Skip decoder reset on
    detach for autodiscovered regions)

CONFIG_CXL_BUS:           Enable CXL bus support built-in; required for
                          CXL Type-2 device and RAS support
CONFIG_CXL_PCI:           Enable CXL PCI management built-in; auto-selects
                          CXL_MEM; required for CXL Type-2 device support
CONFIG_CXL_MEM:           Auto-selected by CXL_PCI; required for CXL
                          memory expansion and Type-2 device support
CONFIG_CXL_PORT:          Required for CXL port enumeration; defaults to
                          CXL_BUS value
CONFIG_FWCTL:             Selected by CXL_BUS when CXL_FEATURES is enabled;
                          required for CXL feature mailbox access
CONFIG_CXL_RAS:           New def_bool replacing PCIEAER_CXL (Terry Bowman
                          v14); auto-enabled with ACPI_APEI_GHES+PCIEAER+
                          CXL_BUS for CXL RAS error handling
CONFIG_SFC_CXL:           Solarflare SFC9100-family CXL Type-2 device
                          support; not needed for NVIDIA platforms (n)
CONFIG_ACPI_APEI_EINJ:    Required prerequisite for CONFIG_ACPI_APEI_EINJ_CXL
CONFIG_ACPI_APEI_EINJ_CXL: CXL protocol error injection support via APEI EINJ

CONFIG_PCIEAER_CXL: Remove it from debian.master policy. This config
  was removed from Kconfig by upstream commit d18f1b7
  (PCI/AER: Replace PCIEAER_CXL symbol with CXL_RAS) which is included
  in this port.

CONFIG_ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION: Override debian.master
  amd64-only policy to include arm64. Commit 4d873c5 added
  'select ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION' to arch/arm64/Kconfig,
  making this y on arm64 as well.

CONFIG_GENERIC_CPU_CACHE_MAINTENANCE: New bool config defined by
  c460697 in lib/Kconfig. Selected by arm64 via 4d873c5;
  not selected by x86. Set arm64: y, amd64: -.

CONFIG_CACHEMAINT_FOR_HOTPLUG: New optional menuconfig defined by
  2ec3b54 in drivers/cache/Kconfig. Depends on
  GENERIC_CPU_CACHE_MAINTENANCE so becomes visible on arm64. Defaults
  to n; HiSilicon HHA driver not needed for NVIDIA platforms.
  Set arm64: n, amd64: -.

Signed-off-by: Jiandi An <jan@nvidia.com>
…memory access

Override debian.master policy (m->y) for DEV_DAX, DEV_DAX_CXL, and
DEV_DAX_KMEM to ensure CXL memory regions are accessible as both raw
DAX devices and hotplugged System-RAM nodes.

debian.master sets these to 'm' (modules). For NVIDIA platforms with
CXL Type-2 devices, built-in (y) is required to ensure CXL memory
regions provisioned early in boot are immediately accessible without
relying on module loading order.

CONFIG_DEV_DAX:     Override m->y; prerequisite for DEV_DAX_CXL and
                    DEV_DAX_KMEM to be built-in; depends on
                    TRANSPARENT_HUGEPAGE (already y in debian.master)

CONFIG_DEV_DAX_CXL: Override m->y; creates /dev/daxX.Y devices for CXL
                    RAM regions not in the default system memory map
                    (Soft Reserved or dynamically provisioned regions);
                    depends on CXL_BUS+CXL_REGION+DEV_DAX (all y)

CONFIG_DEV_DAX_KMEM: Override m->y; onlines CXL DAX devices as System-RAM
                    NUMA nodes via memory hotplug, making CXL memory
                    available for normal kernel and userspace allocation

Signed-off-by: Jiandi An <jan@nvidia.com>
…/restore

Add Ubuntu kernel config annotation for CONFIG_PCI_CXL introduced by
the CXL DVSEC and HDM state save/restore series (Srirangan Madhavan).

CONFIG_PCI_CXL:  Hidden bool in drivers/pci/Kconfig; auto-enabled when
                 CXL_BUS=y. Gates compilation of drivers/pci/cxl.o which
                 saves and restores CXL DVSEC control/range registers and
                 HDM decoder state across PCI resets and link transitions.

Signed-off-by: Jiandi An <jan@nvidia.com>
NPEM registers LED classdevs on PCI endpoints that may be behind
hotplug-capable ports. During hot-removal, led_classdev_unregister()
calls led_set_brightness(LED_OFF), which attempts a PCI config access on
a disconnected device and returns -ENODEV:

```
leds 0003:01:00.0:enclosure:ok: Setting an LED's brightness failed (-19)
```

The LED core already suppresses this for devices with LED_HW_PLUGGABLE
set, but NPEM never sets it. Add the flag since NPEM LEDs are on
hot-pluggable hardware by nature.
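
A minimal userspace sketch of the suppression logic follows, assuming the LED core skips the warning only when the error is -ENODEV and both LED_UNREGISTERING and LED_HW_PLUGGABLE are set. The function name and the flag bit values are illustrative and are not the kernel's definitions.

```c
#include <errno.h>
#include <stdbool.h>

/* Illustrative flag values, not copied from the kernel headers. */
#define MODEL_LED_UNREGISTERING	(1u << 1)
#define MODEL_LED_HW_PLUGGABLE	(1u << 19)

/* True if "Setting an LED's brightness failed" should be logged. */
static bool led_should_warn_model(int ret, unsigned int flags)
{
	if (ret >= 0)
		return false;
	/* Hardware may have been unplugged: stay quiet during unregister. */
	if (ret == -ENODEV &&
	    (flags & MODEL_LED_UNREGISTERING) &&
	    (flags & MODEL_LED_HW_PLUGGABLE))
		return false;
	return true;
}
```

Without the pluggable flag, the -ENODEV returned during unregistration still produces the message shown above; with it, the same failure is silent while other errors are still reported.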

Fixes: 4e89354 ("PCI/NPEM: Add Native PCIe Enclosure Management support")
Signed-off-by: Richard Cheng <icheng@nvidia.com>
---
NV-bug link: https://nvbugspro.nvidia.com/bug/5739122
Testing machine: vr-loaner-36 with Montage Technology c002 (CXL 2.x, 128GB, rev 03)
Reproducing steps: https://nvbugspro.nvidia.com/bug/5739122?commentNumber=92

Best regards,
Richard Cheng.
JiandiAnNVIDIA pushed a commit that referenced this pull request Apr 15, 2026
BugLink: https://bugs.launchpad.net/bugs/2139960

[ Upstream commit 163e5f2 ]

When using perf record with the `--overwrite` option, a segmentation fault
occurs if an event fails to open. For example:

  perf record -e cycles-ct -F 1000 -a --overwrite
  Error:
  cycles-ct:H: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat'
  perf: Segmentation fault
      #0 0x6466b6 in dump_stack debug.c:366
      #1 0x646729 in sighandler_dump_stack debug.c:378
      #2 0x453fd1 in sigsegv_handler builtin-record.c:722
      #3 0x7f8454e65090 in __restore_rt libc-2.32.so[54090]
      #4 0x6c5671 in __perf_event__synthesize_id_index synthetic-events.c:1862
      #5 0x6c5ac0 in perf_event__synthesize_id_index synthetic-events.c:1943
      #6 0x458090 in record__synthesize builtin-record.c:2075
      #7 0x45a85a in __cmd_record builtin-record.c:2888
      #8 0x45deb6 in cmd_record builtin-record.c:4374
      #9 0x4e5e33 in run_builtin perf.c:349
      #10 0x4e60bf in handle_internal_command perf.c:401
      #11 0x4e6215 in run_argv perf.c:448
      #12 0x4e653a in main perf.c:555
      #13 0x7f8454e4fa72 in __libc_start_main libc-2.32.so[3ea72]
      #14 0x43a3ee in _start ??:0

The --overwrite option implies --tail-synthesize, which collects non-sample
events reflecting the system status when recording finishes. However, when
evsel opening fails (e.g., unsupported event 'cycles-ct'), session->evlist
is not initialized and remains NULL. The code unconditionally calls
record__synthesize() in the error path, which iterates through the NULL
evlist pointer and causes a segfault.

To fix it, move the record__synthesize() call inside the error check block, so
it's only called when there was no error during recording, ensuring that evlist
is properly initialized.
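
The shape of the fix can be sketched with a small model. The struct and function names are hypothetical stand-ins, not perf's real types; the point is only that the synthesize step, which dereferences the evlist, is reached solely on the no-error path.

```c
#include <stddef.h>

struct evlist_model {
	int nr_entries;
};

struct session_model {
	struct evlist_model *evlist;	/* stays NULL if event open failed */
};

/* Stand-in for the synthesize step: it dereferences the evlist. */
static int synthesize_model(const struct session_model *s)
{
	return s->evlist->nr_entries;
}

/* Tail synthesis runs only when recording finished without error. */
static int finish_record_model(const struct session_model *s, int status,
			       int *synthesized)
{
	*synthesized = 0;
	if (status == 0)
		*synthesized = synthesize_model(s);
	return status;
}
```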

Fixes: 4ea648a ("perf record: Add --tail-synthesize option")
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Manuel Diewald <manuel.diewald@canonical.com>
Signed-off-by: Edoardo Canepa <edoardo.canepa@canonical.com>