Skip to content

Support core TLB/IOTLB fixes from upstream#1

Open
nvax-r wants to merge 709 commits into
JiandiAnNVIDIA:cxl_2026-03-04from
nvax-r:cxl_2026-03-18-richard
Open

Support core TLB/IOTLB fixes from upstream#1
nvax-r wants to merge 709 commits into
JiandiAnNVIDIA:cxl_2026-03-04from
nvax-r:cxl_2026-03-18-richard

Conversation

@nvax-r
Copy link
Copy Markdown

@nvax-r nvax-r commented Mar 20, 2026

Support core TLB/IOTLB fixes from upstream

Description

This work adds TLB (Translation Lookaside Buffer) and IOTLB-adjacent fixes
and features to the nvidia-6.17 lineage on branch cxl_2026-03-18-richard, by
cherry-picking upstream commits from the Linux 6.17 → 6.19 window (and
closely related prerequisites). It is orthogonal to the branch’s primary
CXL bring-up: these patches target CPU MMU shootdown, IOMMU IOTLB,
GPU address-space TLB invalidation, KVM MMU, and observability.

Included areas:

  • Intel drm/xe — GT TLB invalidation pipeline — Dependency jobs, ordered
    workqueues, GT-local TLB invalidation jobs, CT locking, VF post-migration TLB
    reset, PPGTT vs GGTT address choice for inval, bind/fence ordering, kerneldoc.
    (Michal Wajdeczko, Matthew Brost, Stuart Summers, Matthew Auld, Shuicheng Lin.)
  • AMD drm/amdgpu — GFX12 MES INV_TLBS — MES API header, hardware TLB
    invalidation on gfx12, VM invalidation engine reservation for uni_mes, TLB
    fences on page-table updates, SI regression handling, flush_gpu_tlb_pasid()
    validation fixes. (Shaoyun Liu, Prike Liang, Michael Chen, Alex Deucher,
    Colin Ian King, Timur Kristóf.)
  • arm64 + generic mm — huge PMD / write fault / TLBI — Elide unnecessary
    TLB flushes on some PTE protection transitions, spurious-fault repair for huge
    PMDs, avoid broadcast TLBI when a page is reused on write fault. (Dev Jain,
    Huang Ying.)
  • x86/mm + mm — CPA, tracepoints, userfaultfd — Move tlb_flush trace
    event registration to x86; cpa_flush()flush_kernel_range(); batch
    TLB flushes on UFFD MOVE for present pages. (Steven Rostedt, Yu-cheng Yu,
    Lokesh Gidra.)
  • x86/mm/tlb trace + mm_types — Export TLB_REMOTE_WRONG_CPU for tracing;
    remove NR_TLB_FLUSH_REASONS from generic mm_types. (Tal Zussman.)
  • mm/hugetlb + rmapmmu_gather / IPI storm on PMD unshare — Reduce
    excessive IPI broadcasts when unsharing huge PMDs; follow-up hugetlb_pmd_shared
    / comment / hugetlb_reserve_pages() fixes. (David Hildenbrand, Shameer
    Kolothum.)
  • s390/mm — TLB-related cleanup — Remove cpu_has_idte(), CSP→CSPG,
    unused flush_tlb(). (Heiko Carstens.)
  • MIPS — host TLB — Avoid TLB shutdown on uniquification; kmalloc for
    tlb_vpn array. (Maciej W. Rozycki, Thomas Bogendoerfer.)
  • Standalone fixes — Intel VT-d qi_desc_iotlb (Aashish Sharma); i915 TLB
    inval seqcount (Andi Shyti); KVM TDP MMU NX huge pages read lock (Vipin
    Sharma); LoongArch CSR_*TLB* entry addresses (Huacai Chen); powerpc 8xx
    DataStoreTLBMiss cleanup (Christophe Leroy); x86 switch_mm_irqs_off() SMP
    ordering (Ingo Molnar).

Key features (TLB impact):

  • GPU device TLB / GTT / PPGTT invalidation is ordered, teardown-safe,
    and migration-aware (Xe, AMDGPU).
  • Fewer unnecessary CPU TLB shootdowns and broadcast TLBI on arm64
    and hugetlb unshare paths.
  • IOTLB descriptor hint correctness on Intel VT-d.
  • SMP ordering fixes around context switch vs TLB flush on x86.
  • Traceability of remote TLB flush behavior (TLB_REMOTE_WRONG_CPU).

Source

Patch breakdown (60 commits in scoped inventory — see appendix for SHAs):

# Category Count Source
1 drm/xe — GT TLB invalidation, VF/fence/kerneldoc 25 Upstream linux (v6.17..v6.19; Intel drm/xe)
2 drm/amdgpu — MES INV_TLBS, uni_mes, PT TLB fences, validation 11 Upstream linux (AMD)
3 arm64/mm + mm — huge PMD / write fault / TLBI 3 Upstream linux
4 x86/mm + mm — CPA, tlb_flush trace event, userfaultfd 3 Upstream linux
5 x86/mm/tlb trace + mm_types 2 Upstream linux
6 mm/hugetlb + rmapmmu_gather / IPI + follow-ups 5 Upstream linux
7 s390/mm 3 Upstream linux
8 MIPS 2 Upstream linux
9 Standalone (iommu vt-d, i915, kvm, LoongArch, powerpc, x86) 6 Upstream linux
TOTAL 60

Notes on scope:

  • Counts exclude non-TLB drm/xe commits on the same branch (workarounds,
    GuC, unrelated resets).
  • 62 commits match a broad git log v6.17..branch grep for TLB keywords;
    the table above uses the deduplicated inventory (60) tied to the series
    below.

Patch list by series (branch SHAs, apply order top-to-bottom within each series)

Series A — drm/xe: GT TLB invalidation pipeline & bind/fence ordering

Order Upstream SHA Subject (abbrev.)
1 538b27a09af9 Make GGTT TLB invalidation failure message GT oriented
2 69f187d446c9 Add generic dependency jobs / scheduler
3 ada51219489f Create ordered workqueue for GT TLB invalidation jobs
4 535c445eb94c Add dependency scheduler for GT TLB invalidations to bind queues
5 dba89840a920 Add GT TLB invalidation jobs
6 b8d5779eee38 Use GT TLB invalidation jobs in PT layer
7 51330ba66caa Remove unused GT TLB invalidation trace points
8 ce5059bf851b Move explicit CT lock in TLB invalidation sequence
9 76186a253a4b Cancel pending TLB inval workers on teardown
10 c697ddcf27bd s/tlb_invalidation/tlb_inval
11 594bb930fc7d Add xe_tlb_inval structure
12 6d1e452e0948 Add xe_gt_tlb_invalidation_done_handler
13 15366239e213 Decouple TLB invalidations from GT
14 9aff63cf3791 Prep TLB invalidation fence before sending
15 8443e8c448cf Add helpers to send TLB invalidations
16 db16f9d90c1d Split TLB invalidation code in frontend and backend
17 81a45cb7ea31 drm/xe/migrate: make MI_TLB_INVALIDATE conditional
18 489d890a3913 VF: Abort H2G sends during VF post-migration recovery
19 24687730cdc7 VF: Reset TLB invalidations during VF post migration recovery
20 673167d9f083 Use PPGTT addresses for TLB invalidation to avoid GGTT fixups
21 b2d7ec41f2a3 Attach last fence to TLB invalidation job queues
22 cb99e12ba8cb Decouple bind queue last fence from TLB invalidations
23 ebb0880d4973 Skip TLB invalidation waits in page fault binds
24 904b2e5063af Fix kerneldoc for xe_gt_tlb_inval_init_early
25 51cedb93da11 Fix kerneldoc for xe_tlb_inval_job_alloc_dep

Series B — drm/amdgpu: MES INV_TLBS, VM invalidation engine, PT TLB fences

Order Upstream SHA Subject (abbrev.)
1 e86a411b4293 Update MES v12 API header (INV_TLBS)
2 87e65052616c Use INV_TLBS API for TLB invalidation on gfx12
3 4320fd9e0d81 Fix uint32_t field check (supporting fix)
4 85442bac8466 Fix MES version that supports inv_tlbs
5 8dbac5cf8bd5 Same (duplicate-style fix in history—verify you only need one)
6 873373739b9b Reserve VM invalidation engine for uni_mes
7 f3854e04b708 Attach TLB fence to PTs update
8 820b3d376e8a Don't attach TLB fence for SI
9 f4db9913e4d3 Validate flush_gpu_tlb_pasid()
10 9163fe4d790f Revert "don't attach TLB fence for SI"
11 e3a6eff92bbd Fix validating flush_gpu_tlb_pasid()

Series C — arm64 + generic mm: huge PMD / write fault / TLBI

Order Upstream SHA Subject (abbrev.)
1 c320dbb7c80d arm64/mm: Elide TLB flush in certain PTE protection transitions
2 79301c7d605a mm: Spurious fault fixing for huge PMD
3 cb1fa2e99955 arm64, tlbflush: Don't TLBI broadcast if page reused in write fault

Series D — x86/mm + mm: CPA flush, tracepoint home, userfaultfd batching

Order Upstream SHA Subject (abbrev.)
1 658fa653b4d1 Move tlb_flush trace event creation back to x86
2 86e6815b316e cpa_flush()flush_kernel_range() directly
3 50944692052b userfaultfd: opportunistic TLB-flush batching (MOVE)

Series E — x86/mm/tlb trace + mm_types

Order Upstream SHA Subject (abbrev.)
1 8b62e64e6d30 Export TLB_REMOTE_WRONG_CPU in <trace/events/tlb.h>
2 0c01ea92f545 Remove tlb_flush_reason::NR_TLB_FLUSH_REASONS from <linux/mm_types.h>

Series F — mm/hugetlb + rmap: mmu_gather / IPI on PMD unshare

Order Upstream SHA Subject (abbrev.)
1 8ce720d5bd91 Fix excessive IPI broadcasts when unsharing PMD tables (mmu_gather)
2 ca1a47cd3f5f Fix hugetlb_pmd_shared()
3 3937027caecb Fix comments related to huge_pmd_unshare()
4 a8682d500f69 mm/rmap: fix comments related to huge_pmd_unshare()
5 9ee5d1766c8b Fix incorrect error return from hugetlb_reserve_pages()

Series G — s390/mm

Order Upstream SHA Subject (abbrev.)
1 220d8e10d69a Remove cpu_has_idte()
2 68807a894f0c Replace CSP with CSPG
3 02310adcc621 Remove unused flush_tlb()

Series H — MIPS host TLB

Order Upstream SHA Subject (abbrev.)
1 9f048fa48740 Prevent TLB shutdown on initial uniquification
2 841ecc979b18 kmalloc tlb_vpn array to avoid stack overflow

Standalone patches

# Upstream SHA Author (upstream) Subject (abbrev.)
I 6b38a108eeb3 Aashish Sharma iommu/vt-d: Fix unused invalidation hint in qi_desc_iotlb
II 3bcf7894a93e Andi Shyti drm/i915/gt: Use standard API for seqcount read in TLB invalidation
III a57750909580 Vipin Sharma KVM: TDP MMU NX huge pages (MMU read lock)
IV 4e67526840fc Huacai Chen LoongArch: CSR_MERRENTRY / CSR_TLBRENTRY physical addresses
V d9e46de4bf5c Christophe Leroy powerpc/8xx: DataStoreTLBMiss handler cleanup
VI 83b0177a6c48 Ingo Molnar x86/mm: switch_mm_irqs_off() SMP ordering

Appendix B — Quick reference: series authors

Series Primary upstream authors
A drm/xe Michal Wajdeczko, Matthew Brost, Stuart Summers, Matthew Auld, Shuicheng Lin
B drm/amdgpu Shaoyun Liu, Prike Liang, Michael Chen, Alex Deucher, Colin Ian King, Timur Kristóf
C arm64 + mm Dev Jain, Huang Ying
D x86/mm + uffd Steven Rostedt, Yu-cheng Yu, Lokesh Gidra
E trace + mm_types Tal Zussman
F hugetlb / rmap David Hildenbrand, Shameer Kolothum
G s390/mm Heiko Carstens
H MIPS Maciej W. Rozycki, Thomas Bogendoerfer
Standalone Aashish Sharma, Andi Shyti, Vipin Sharma, Huacai Chen, Christophe Leroy, Ingo Molnar

stevenprice-arm and others added 30 commits February 18, 2026 12:11
BugLink: https://bugs.launchpad.net/bugs/2139249

Physical device assignment is not yet supported by the RMM, so it
doesn't make much sense to allow device mappings within the realm.
Prevent them when the guest is a realm.

Signed-off-by: Steven Price <steven.price@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
(backported from https://lore.kernel.org/20250820145606.180644-31-steven.price@arm.com/)
[ianm: context adjustment in arch/arm64/kvm/mmu.c user_mem_abort() around
"KVM: arm64: Handle DABT caused by LS64* instructions on unsupported memory"]
Signed-off-by: Ian May <ianm@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Acked-by: Abdur Rahman <abdur.rahman@canonical.com>
Signed-off-by: Ian May <ianm@nvidia.com>
…sical IRQ

BugLink: https://bugs.launchpad.net/bugs/2139249

Arm CCA assigns the physical PMU device to the guest running in realm
world, however the IRQs are routed via the host. To enter a realm guest
while a PMU IRQ is pending it is necessary to block the physical IRQ to
prevent an immediate exit. Provide a mechanism in the PMU driver for KVM
to control the physical IRQ.

Signed-off-by: Steven Price <steven.price@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
(cherry picked from https://lore.kernel.org/20250820145606.180644-32-steven.price@arm.com/)
Signed-off-by: Ian May <ianm@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Acked-by: Abdur Rahman <abdur.rahman@canonical.com>
Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249

Use the PMU registers from the RmiRecExit structure to identify when an
overflow interrupt is due and inject it into the guest. Also hook up the
configuration option for enabling the PMU within the guest.

When entering a realm guest with a PMU interrupt pending, it is
necessary to disable the physical interrupt. Otherwise when the RMM
restores the PMU state the physical interrupt will trigger causing an
immediate exit back to the host. The guest is expected to acknowledge
the interrupt causing a host exit (to update the GIC state) which gives
the opportunity to re-enable the physical interrupt before the next PMU
event.

Number of PMU counters is configured by the VMM by writing to PMCR.N.

Signed-off-by: Steven Price <steven.price@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
(cherry picked from https://lore.kernel.org/20250820145606.180644-33-steven.price@arm.com/)
Signed-off-by: Ian May <ianm@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Acked-by: Abdur Rahman <abdur.rahman@canonical.com>
Signed-off-by: Ian May <ianm@nvidia.com>
…ests

BugLink: https://bugs.launchpad.net/bugs/2139249

For protected memory read only isn't supported by the RMM. While it may
be possible to support read only for unprotected memory, this isn't
supported at the present time.

Note that this does mean that ROM (or flash) data cannot be emulated
correctly by the VMM as the stage 2 mappings are either always
read/write or are trapped as MMIO (so don't support operations where the
syndrome information doesn't allow emulation, e.g. load/store pair).

This restriction can be lifted in the future by allowing the unprotected
stage 2 mappings to be made read only.

Signed-off-by: Steven Price <steven.price@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
(cherry picked from https://lore.kernel.org/20250820145606.180644-34-steven.price@arm.com/)
Signed-off-by: Ian May <ianm@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Acked-by: Abdur Rahman <abdur.rahman@canonical.com>
Signed-off-by: Ian May <ianm@nvidia.com>
…tchpoints to userspace

BugLink: https://bugs.launchpad.net/bugs/2139249

The RMM describes the maximum number of BPs/WPs available to the guest
in the Feature Register 0. Propagate those numbers into ID_AA64DFR0_EL1,
which is visible to userspace. A VMM needs this information in order to
set up realm parameters.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Steven Price <steven.price@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
(cherry picked from https://lore.kernel.org/20250820145606.180644-35-steven.price@arm.com/)
Signed-off-by: Ian May <ianm@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Acked-by: Abdur Rahman <abdur.rahman@canonical.com>
Signed-off-by: Ian May <ianm@nvidia.com>
…ONE_REG

BugLink: https://bugs.launchpad.net/bugs/2139249

Allow userspace to configure the number of breakpoints and watchpoints
of a Realm VM through KVM_SET_ONE_REG ID_AA64DFR0_EL1.

The KVM sys_reg handler checks the user value against the maximum value
given by RMM (arm64_check_features() gets it from the
read_sanitised_id_aa64dfr0_el1() reset handler).

Userspace discovers that it can write these fields by issuing a
KVM_ARM_GET_REG_WRITABLE_MASKS ioctl.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Steven Price <steven.price@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
(cherry picked from https://lore.kernel.org/20250820145606.180644-36-steven.price@arm.com/)
Signed-off-by: Ian May <ianm@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Acked-by: Abdur Rahman <abdur.rahman@canonical.com>
Signed-off-by: Ian May <ianm@nvidia.com>
…supported by RMM

BugLink: https://bugs.launchpad.net/bugs/2139249

Provide an accurate number of available PMU counters to userspace when
setting up a Realm.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Steven Price <steven.price@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
(cherry picked from https://lore.kernel.org/20250820145606.180644-37-steven.price@arm.com/)
Signed-off-by: Ian May <ianm@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Acked-by: Abdur Rahman <abdur.rahman@canonical.com>
Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249

RMM provides the maximum vector length it supports for a guest in its
feature register. Make it visible to the rest of KVM and to userspace
via KVM_REG_ARM64_SVE_VLS.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Steven Price <steven.price@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
(cherry picked from https://lore.kernel.org/20250820145606.180644-38-steven.price@arm.com/)
Signed-off-by: Ian May <ianm@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Acked-by: Abdur Rahman <abdur.rahman@canonical.com>
Signed-off-by: Ian May <ianm@nvidia.com>
…Realm

BugLink: https://bugs.launchpad.net/bugs/2139249

Obtain the max vector length configured by userspace on the vCPUs, and
write it into the Realm parameters. By default the vCPU is configured
with the max vector length reported by RMM, and userspace can reduce it
with a write to KVM_REG_ARM64_SVE_VLS.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Steven Price <steven.price@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
(cherry picked from https://lore.kernel.org/20250820145606.180644-39-steven.price@arm.com/)
Signed-off-by: Ian May <ianm@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Acked-by: Abdur Rahman <abdur.rahman@canonical.com>
Signed-off-by: Ian May <ianm@nvidia.com>
…RME RECs

BugLink: https://bugs.launchpad.net/bugs/2139249

KVM_GET_REG_LIST should not be called before SVE is finalized. The ioctl
handler currently returns -EPERM in this case. But because it uses
kvm_arm_vcpu_is_finalized(), it now also rejects the call for
unfinalized REC even though finalizing the REC can only be done late,
after Realm descriptor creation.

Move the check to copy_sve_reg_indices(). One adverse side effect of
this change is that a KVM_GET_REG_LIST call that only probes for the
array size will now succeed even if SVE is not finalized, but that seems
harmless since the following KVM_GET_REG_LIST with the full array will
fail.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Steven Price <steven.price@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
(cherry picked from https://lore.kernel.org/20250820145606.180644-40-steven.price@arm.com/)
Signed-off-by: Ian May <ianm@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Acked-by: Abdur Rahman <abdur.rahman@canonical.com>
Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249

Userspace can set a few registers with KVM_SET_ONE_REG (9 GP registers
at runtime, and 3 system registers during initialization). Update the
register list returned by KVM_GET_REG_LIST.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Steven Price <steven.price@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
(cherry picked from https://lore.kernel.org/20250820145606.180644-41-steven.price@arm.com/)
Signed-off-by: Ian May <ianm@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Acked-by: Abdur Rahman <abdur.rahman@canonical.com>
Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249

Select KVM_GENERIC_PRIVATE_MEM and provide the necessary support
functions.

Signed-off-by: Steven Price <steven.price@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
(cherry picked from https://lore.kernel.org/20250820145606.180644-42-steven.price@arm.com/)
Signed-off-by: Ian May <ianm@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Acked-by: Abdur Rahman <abdur.rahman@canonical.com>
Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249

Increment KVM_VCPU_MAX_FEATURES to expose the new capability to user
space.

Signed-off-by: Steven Price <steven.price@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
(cherry picked from https://lore.kernel.org/20250820145606.180644-43-steven.price@arm.com/)
Signed-off-by: Ian May <ianm@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Acked-by: Abdur Rahman <abdur.rahman@canonical.com>
Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249

Add the ioctl to activate a realm and set the static branch to enable
access to the realm functionality if the RMM is detected.

Signed-off-by: Steven Price <steven.price@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
(cherry picked from https://lore.kernel.org/20250820145606.180644-44-steven.price@arm.com/)
Signed-off-by: Ian May <ianm@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Acked-by: Abdur Rahman <abdur.rahman@canonical.com>
Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249

Add Memory Encryption Context ID (MECID) support for Realms to provide
better isolation between them when RMM supports MEC.

- Bitmap-based private MECID allocation per Realm
- Reference-counted shared MECID for backward compatibility
- Userspace config via KVM_CAP_ARM_RME_CONFIG_REALM ioctl
- MEC capability query interface (no arm.c changes needed)
- Graceful fallback: MECID 0 when RMM lacks MEC support
- Unconfigured realms default to shared MECID

State managed via struct mecid_state with clear locking semantics.
Policy enum: MEC_POLICY_{UNCONFIGURED,PRIVATE,SHARED}.

Signed-off-by: Raghu Krishnamurthy <raghupathyk@nvidia.com>
Signed-off-by: Ian May <ianm@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Acked-by: Abdur Rahman <abdur.rahman@canonical.com>
Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249

Add validation to ensure that the number of auxiliary granules returned
by RMM via rmi_rec_aux_count() does not exceed the maximum allowed value
of REC_PARAMS_AUX_GRANULES (16).

This prevents potential buffer overflow in the aux_pages array which is
statically defined with REC_PARAMS_AUX_GRANULES elements in struct
realm_rec.

If the RMM returns a value greater than 16, the realm creation is aborted
with proper cleanup to maintain system integrity.

Signed-off-by: Raghu Krishnamurthy <raghupathyk@nvidia.com>
[ianm: fix missing error return in realm_create_rd() num_aux check]
Signed-off-by: Ian May <ianm@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Acked-by: Abdur Rahman <abdur.rahman@canonical.com>
Signed-off-by: Ian May <ianm@nvidia.com>
…meter

BugLink: https://bugs.launchpad.net/bugs/2139249

Add a module parameter to expose the KVM_CAP_ARM_RME capability number
via sysfs. This allows userspace (QEMU) to discover the correct
capability number at runtime rather than relying on hardcoded values
that may become stale when capability numbers shift due to other
patches being merged.

The value is exposed at:
  /sys/module/kvm/parameters/kvm_cap_arm_rme

Signed-off-by: Ian May <ianm@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Acked-by: Abdur Rahman <abdur.rahman@canonical.com>
Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249

For ioremap(), so far we only checked if it was a device (RIPAS_DEV) to choose
an encrypted vs decrypted mapping. However, we may have firmware reserved memory
regions exposed to the OS (e.g., EFI Coco Secret Securityfs, ACPI CCEL).
We need to make sure that anything that is RIPAS_RAM (i.e., Guest
protected memory with RMM guarantees) are also mapped as encrypted.

Rephrasing the above, anything that is not RIPAS_EMPTY is guaranteed to be
protected by the RMM. Thus we choose encrypted mapping for anything that is not
RIPAS_EMPTY. While at it, rename the helper function

  __arm64_is_protected_mmio => arm64_rsi_is_protected

to clearly indicate that this not an arm64 generic helper, but something to do
with Realms.

Cc: Sami Mujawar <sami.mujawar@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@kernel.org>
Cc: Steven Price <steven.price@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Tested-by: Sami Mujawar <sami.mujawar@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit fa84e53)
Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
Signed-off-by: Ian May <ianm@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Acked-by: Abdur Rahman <abdur.rahman@canonical.com>
Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249

Enable EFI COCO secrets support. Provide the ioremap_encrypted() support required
by the driver.

Cc: Sami Mujawar <sami.mujawar@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@kernel.org>
Cc: Steven Price <steven.price@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Tested-by: Sami Mujawar <sami.mujawar@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit 9e8a3df)
Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
Signed-off-by: Ian May <ianm@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Acked-by: Abdur Rahman <abdur.rahman@canonical.com>
Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249

Signed-off-by: Ian May <ianm@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Acked-by: Abdur Rahman <abdur.rahman@canonical.com>
Signed-off-by: Ian May <ianm@nvidia.com>
…xplicitly

The virtual nvidia-kernel-source and nvidia-dkms-kernel dependencies
would sometimes pull the 470 driver, which is incompatible with the
nvidia-fs build.

Stick to the latest LTS. This could be made to use the virtual packages
again once the 470 driver transitionals are released.

Ignore: yes
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
Ignore: yes
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/2141777
Properties: no-test-build
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
…ernel-versions (main/d2026.02.09)

BugLink: https://bugs.launchpad.net/bugs/1786013
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/2093957

The patch "UBUNTU: [Packaging] Enable coresight in Perf if arm64"
enables perf to be built with CORESIGHT=1 on arm64. This requires
libopencsd.

Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/2141780

When the r8127 module is unloaded, __netif_napi_del_locked() can trigger
a WARN because NAPI is removed while still enabled. unregister_netdev()
calls ndo_stop, which disables NAPI; deleting NAPI before that runs
violates the netdev/NAPI teardown order.

Move rtl8127_del_napi() to after unregister_netdev() so NAPI is disabled
in ndo_stop before it is removed.

Aligns with the upstream r8169 fix in commit 12b1bc7
("r8169: improve rtl_remove_one").

Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2142160

When initializing EGM (Extended GPU Memory) regions, the current
implementation performs a single memset operation over the entire
memory region. For very large regions, this can result in long-running
uninterruptible operations that may cause system responsiveness issues
or trigger watchdog timeouts.

Split the memset operation into 1GB chunks.

Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
…by country code of 11d changed"

BugLink: https://bugs.launchpad.net/bugs/2142694

This reverts commit 7dfd80e.

The changes are now merged in mainline kernel so reverting
these sauce changes. The mainline changes will be cherry
picked in subsequent commits.

Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
…port"

BugLink: https://bugs.launchpad.net/bugs/2142694

This reverts commit fa8adb8.

The changes are now merged in mainline kernel so reverting
these sauce changes. The mainline changes will be cherry
picked in subsequent commits.

Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
JiandiAnNVIDIA pushed a commit that referenced this pull request Apr 15, 2026
BugLink: https://bugs.launchpad.net/bugs/2137723

commit 63b5aa0 upstream.

When emulating an nvme device on qemu with both logical_block_size and
physical_block_size set to 8 KiB, but without format, a kernel panic
was triggered during the early boot stage while attempting to mount a
vfat filesystem.

[95553.682035] EXT4-fs (nvme0n1): unable to set blocksize
[95553.684326] EXT4-fs (nvme0n1): unable to set blocksize
[95553.686501] EXT4-fs (nvme0n1): unable to set blocksize
[95553.696448] ISOFS: unsupported/invalid hardware sector size 8192
[95553.697117] ------------[ cut here ]------------
[95553.697567] kernel BUG at fs/buffer.c:1582!
[95553.697984] Oops: invalid opcode: 0000 [#1] SMP NOPTI
[95553.698602] CPU: 0 UID: 0 PID: 7212 Comm: mount Kdump: loaded Not tainted 6.18.0-rc2+ NVIDIA#38 PREEMPT(voluntary)
[95553.699511] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[95553.700534] RIP: 0010:folio_alloc_buffers+0x1bb/0x1c0
[95553.701018] Code: 48 8b 15 e8 93 18 02 65 48 89 35 e0 93 18 02 48 83 c4 10 5b 41 5c 41 5d 41 5e 41 5f 5d 31 d2 31 c9 31 f6 31 ff c3 cc cc cc cc <0f> 0b 90 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f
[95553.702648] RSP: 0018:ffffd1b0c676f990 EFLAGS: 00010246
[95553.703132] RAX: ffff8cfc4176d820 RBX: 0000000000508c48 RCX: 0000000000000001
[95553.703805] RDX: 0000000000002000 RSI: 0000000000000000 RDI: 0000000000000000
[95553.704481] RBP: ffffd1b0c676f9c8 R08: 0000000000000000 R09: 0000000000000000
[95553.705148] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[95553.705816] R13: 0000000000002000 R14: fffff8bc8257e800 R15: 0000000000000000
[95553.706483] FS:  000072ee77315840(0000) GS:ffff8cfdd2c8d000(0000) knlGS:0000000000000000
[95553.707248] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[95553.707782] CR2: 00007d8f2a9e5a20 CR3: 0000000039d0c006 CR4: 0000000000772ef0
[95553.708439] PKRU: 55555554
[95553.708734] Call Trace:
[95553.709015]  <TASK>
[95553.709266]  __getblk_slow+0xd2/0x230
[95553.709641]  ? find_get_block_common+0x8b/0x530
[95553.710084]  bdev_getblk+0x77/0xa0
[95553.710449]  __bread_gfp+0x22/0x140
[95553.710810]  fat_fill_super+0x23a/0xfc0
[95553.711216]  ? __pfx_setup+0x10/0x10
[95553.711580]  ? __pfx_vfat_fill_super+0x10/0x10
[95553.712014]  vfat_fill_super+0x15/0x30
[95553.712401]  get_tree_bdev_flags+0x141/0x1e0
[95553.712817]  get_tree_bdev+0x10/0x20
[95553.713177]  vfat_get_tree+0x15/0x20
[95553.713550]  vfs_get_tree+0x2a/0x100
[95553.713910]  vfs_cmd_create+0x62/0xf0
[95553.714273]  __do_sys_fsconfig+0x4e7/0x660
[95553.714669]  __x64_sys_fsconfig+0x20/0x40
[95553.715062]  x64_sys_call+0x21ee/0x26a0
[95553.715453]  do_syscall_64+0x80/0x670
[95553.715816]  ? __fs_parse+0x65/0x1e0
[95553.716172]  ? fat_parse_param+0x103/0x4b0
[95553.716587]  ? vfs_parse_fs_param_source+0x21/0xa0
[95553.717034]  ? __do_sys_fsconfig+0x3d9/0x660
[95553.717548]  ? __x64_sys_fsconfig+0x20/0x40
[95553.717957]  ? x64_sys_call+0x21ee/0x26a0
[95553.718360]  ? do_syscall_64+0xb8/0x670
[95553.718734]  ? __x64_sys_fsconfig+0x20/0x40
[95553.719141]  ? x64_sys_call+0x21ee/0x26a0
[95553.719545]  ? do_syscall_64+0xb8/0x670
[95553.719922]  ? x64_sys_call+0x1405/0x26a0
[95553.720317]  ? do_syscall_64+0xb8/0x670
[95553.720702]  ? __x64_sys_close+0x3e/0x90
[95553.721080]  ? x64_sys_call+0x1b5e/0x26a0
[95553.721478]  ? do_syscall_64+0xb8/0x670
[95553.721841]  ? irqentry_exit+0x43/0x50
[95553.722211]  ? exc_page_fault+0x90/0x1b0
[95553.722681]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[95553.723166] RIP: 0033:0x72ee774f3afe
[95553.723562] Code: 73 01 c3 48 8b 0d 0a 33 0f 00 f7 d8 64 89 01 48 83 c8 ff c3 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 49 89 ca b8 af 01 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d da 32 0f 00 f7 d8 64 89 01 48
[95553.725188] RSP: 002b:00007ffe97148978 EFLAGS: 00000246 ORIG_RAX: 00000000000001af
[95553.725892] RAX: ffffffffffffffda RBX: 00005dcfe53d0080 RCX: 000072ee774f3afe
[95553.726526] RDX: 0000000000000000 RSI: 0000000000000006 RDI: 0000000000000003
[95553.727176] RBP: 00007ffe97148ac0 R08: 0000000000000000 R09: 000072ee775e7ac0
[95553.727818] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[95553.728459] R13: 00005dcfe53d04b0 R14: 000072ee77670b00 R15: 00005dcfe53d1a28
[95553.729086]  </TASK>

The panic occurs as follows:
1. logical_block_size is 8KiB, causing {struct super_block *sb}->s_blocksize
is initialized to 0.
vfat_fill_super
 - fat_fill_super
  - sb_min_blocksize
   - sb_set_blocksize //return 0 when size is 8KiB.
2. __bread_gfp is called with size == 0, causing folio_alloc_buffers() to
compute an offset equal to folio_size(folio), which triggers a BUG_ON.
fat_fill_super
 - sb_bread
  - __bread_gfp  // size == {struct super_block *sb}->s_blocksize == 0
   - bdev_getblk
    - __getblk_slow
     - grow_buffers
      - grow_dev_folio
       - folio_alloc_buffers  // size == 0
        - folio_set_bh //offset == folio_size(folio) and panic

To fix this issue, add proper return value checks for
sb_min_blocksize().

Cc: stable@vger.kernel.org # v6.15
Fixes: a64e5a5 ("bdev: add back PAGE_SIZE block size validation for sb_set_blocksize()")
Reviewed-by: Matthew Wilcox <willy@infradead.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com>
Link: https://patch.msgid.link/20251104125009.2111925-2-yangyongpeng.storage@gmail.com
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
CVE-2025-40265
Signed-off-by: Bethany <bethany.jamison@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Edoardo Canepa <edoardo.canepa@canonical.com>
JiandiAnNVIDIA pushed a commit that referenced this pull request Apr 15, 2026
BugLink: https://bugs.launchpad.net/bugs/2137723

commit ec33b59 upstream.

The kernel test has reported:

  BUG: unable to handle page fault for address: fffba000
  #PF: supervisor write access in kernel mode
  #PF: error_code(0x0002) - not-present page
  *pde = 03171067 *pte = 00000000
  Oops: Oops: 0002 [#1]
  CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Tainted: G                T   6.18.0-rc2-00031-gec7f31b2a2d3 #1 NONE  a1d066dfe789f54bc7645c7989957d2bdee593ca
  Tainted: [T]=RANDSTRUCT
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
  EIP: memset (arch/x86/include/asm/string_32.h:168 arch/x86/lib/memcpy_32.c:17)
  Code: a5 8b 4d f4 83 e1 03 74 02 f3 a4 83 c4 04 5e 5f 5d 2e e9 73 41 01 00 90 90 90 3e 8d 74 26 00 55 89 e5 57 56 89 c6 89 d0 89 f7 <f3> aa 89 f0 5e 5f 5d 2e e9 53 41 01 00 cc cc cc 55 89 e5 53 57 56
  EAX: 0000006b EBX: 00000015 ECX: 001fefff EDX: 0000006b
  ESI: fffb9000 EDI: fffba000 EBP: c611fbf0 ESP: c611fbe8
  DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 EFLAGS: 00010287
  CR0: 80050033 CR2: fffba000 CR3: 0316e000 CR4: 00040690
  Call Trace:
   poison_element (mm/mempool.c:83 mm/mempool.c:102)
   mempool_init_node (mm/mempool.c:142 mm/mempool.c:226)
   mempool_init_noprof (mm/mempool.c:250 (discriminator 1))
   ? mempool_alloc_pages (mm/mempool.c:640)
   bio_integrity_initfn (block/bio-integrity.c:483 (discriminator 8))
   ? mempool_alloc_pages (mm/mempool.c:640)
   do_one_initcall (init/main.c:1283)

Christoph found out this is due to the poisoning code not dealing
properly with CONFIG_HIGHMEM because only the first page is mapped but
then the whole potentially high-order page is accessed.

We could give up on HIGHMEM here, but it's straightforward to fix this
with a loop that's mapping, poisoning or checking and unmapping
individual pages.

Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202511111411.9ebfa1ba-lkp@intel.com
Analyzed-by: Christoph Hellwig <hch@lst.de>
Fixes: bdfedb7 ("mm, mempool: poison elements backed by slab allocator")
Cc: stable@vger.kernel.org
Tested-by: kernel test robot <oliver.sang@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://patch.msgid.link/20251113-mempool-poison-v1-1-233b3ef984c3@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
CVE-2025-68231
Signed-off-by: Bethany <bethany.jamison@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Edoardo Canepa <edoardo.canepa@canonical.com>
JiandiAnNVIDIA pushed a commit that referenced this pull request Apr 15, 2026
BugLink: https://bugs.launchpad.net/bugs/2137723

commit 0a2c549 upstream.

nvme_fc_delete_assocation() waits for pending I/O to complete before
returning, and an error can cause ->ioerr_work to be queued after
cancel_work_sync() had been called.  Move the call to cancel_work_sync() to
be after nvme_fc_delete_association() to ensure ->ioerr_work is not running
when the nvme_fc_ctrl object is freed.  Otherwise the following can occur:

[ 1135.911754] list_del corruption, ff2d24c8093f31f8->next is NULL
[ 1135.917705] ------------[ cut here ]------------
[ 1135.922336] kernel BUG at lib/list_debug.c:52!
[ 1135.926784] Oops: invalid opcode: 0000 [#1] SMP NOPTI
[ 1135.931851] CPU: 48 UID: 0 PID: 726 Comm: kworker/u449:23 Kdump: loaded Not tainted 6.12.0 #1 PREEMPT(voluntary)
[ 1135.943490] Hardware name: Dell Inc. PowerEdge R660/0HGTK9, BIOS 2.5.4 01/16/2025
[ 1135.950969] Workqueue:  0x0 (nvme-wq)
[ 1135.954673] RIP: 0010:__list_del_entry_valid_or_report.cold+0xf/0x6f
[ 1135.961041] Code: c7 c7 98 68 72 94 e8 26 45 fe ff 0f 0b 48 c7 c7 70 68 72 94 e8 18 45 fe ff 0f 0b 48 89 fe 48 c7 c7 80 69 72 94 e8 07 45 fe ff <0f> 0b 48 89 d1 48 c7 c7 a0 6a 72 94 48 89 c2 e8 f3 44 fe ff 0f 0b
[ 1135.979788] RSP: 0018:ff579b19482d3e50 EFLAGS: 00010046
[ 1135.985015] RAX: 0000000000000033 RBX: ff2d24c8093f31f0 RCX: 0000000000000000
[ 1135.992148] RDX: 0000000000000000 RSI: ff2d24d6bfa1d0c0 RDI: ff2d24d6bfa1d0c0
[ 1135.999278] RBP: ff2d24c8093f31f8 R08: 0000000000000000 R09: ffffffff951e2b08
[ 1136.006413] R10: ffffffff95122ac8 R11: 0000000000000003 R12: ff2d24c78697c100
[ 1136.013546] R13: fffffffffffffff8 R14: 0000000000000000 R15: ff2d24c78697c0c0
[ 1136.020677] FS:  0000000000000000(0000) GS:ff2d24d6bfa00000(0000) knlGS:0000000000000000
[ 1136.028765] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1136.034510] CR2: 00007fd207f90b80 CR3: 000000163ea22003 CR4: 0000000000f73ef0
[ 1136.041641] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1136.048776] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[ 1136.055910] PKRU: 55555554
[ 1136.058623] Call Trace:
[ 1136.061074]  <TASK>
[ 1136.063179]  ? show_trace_log_lvl+0x1b0/0x2f0
[ 1136.067540]  ? show_trace_log_lvl+0x1b0/0x2f0
[ 1136.071898]  ? move_linked_works+0x4a/0xa0
[ 1136.075998]  ? __list_del_entry_valid_or_report.cold+0xf/0x6f
[ 1136.081744]  ? __die_body.cold+0x8/0x12
[ 1136.085584]  ? die+0x2e/0x50
[ 1136.088469]  ? do_trap+0xca/0x110
[ 1136.091789]  ? do_error_trap+0x65/0x80
[ 1136.095543]  ? __list_del_entry_valid_or_report.cold+0xf/0x6f
[ 1136.101289]  ? exc_invalid_op+0x50/0x70
[ 1136.105127]  ? __list_del_entry_valid_or_report.cold+0xf/0x6f
[ 1136.110874]  ? asm_exc_invalid_op+0x1a/0x20
[ 1136.115059]  ? __list_del_entry_valid_or_report.cold+0xf/0x6f
[ 1136.120806]  move_linked_works+0x4a/0xa0
[ 1136.124733]  worker_thread+0x216/0x3a0
[ 1136.128485]  ? __pfx_worker_thread+0x10/0x10
[ 1136.132758]  kthread+0xfa/0x240
[ 1136.135904]  ? __pfx_kthread+0x10/0x10
[ 1136.139657]  ret_from_fork+0x31/0x50
[ 1136.143236]  ? __pfx_kthread+0x10/0x10
[ 1136.146988]  ret_from_fork_asm+0x1a/0x30
[ 1136.150915]  </TASK>

Fixes: 19fce04 ("nvme-fc: avoid calling _nvme_fc_abort_outstanding_ios from interrupt context")
Cc: stable@vger.kernel.org
Tested-by: Marco Patalano <mpatalan@redhat.com>
Reviewed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
CVE-2025-40261
Signed-off-by: Bethany <bethany.jamison@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Edoardo Canepa <edoardo.canepa@canonical.com>
JiandiAnNVIDIA pushed a commit that referenced this pull request Apr 15, 2026
BugLink: https://bugs.launchpad.net/bugs/2137723

commit e696518 upstream.

If the allocation of tl_hba->sh fails in tcm_loop_driver_probe() and we
attempt to dereference it in tcm_loop_tpg_address_show() we will get a
segfault, see below for an example. So, check tl_hba->sh before
dereferencing it.

  Unable to allocate struct scsi_host
  BUG: kernel NULL pointer dereference, address: 0000000000000194
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 0 P4D 0
  Oops: 0000 [#1] PREEMPT SMP NOPTI
  CPU: 1 PID: 8356 Comm: tokio-runtime-w Not tainted 6.6.104.2-4.azl3 #1
  Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 09/28/2024
  RIP: 0010:tcm_loop_tpg_address_show+0x2e/0x50 [tcm_loop]
...
  Call Trace:
   <TASK>
   configfs_read_iter+0x12d/0x1d0 [configfs]
   vfs_read+0x1b5/0x300
   ksys_read+0x6f/0xf0
...

Cc: stable@vger.kernel.org
Fixes: 2628b35 ("tcm_loop: Show address of tpg in configfs")
Signed-off-by: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Allen Pais <apais@linux.microsoft.com>
Link: https://patch.msgid.link/1762370746-6304-1-git-send-email-hamzamahfooz@linux.microsoft.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
CVE-2025-68229
Signed-off-by: Bethany <bethany.jamison@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Edoardo Canepa <edoardo.canepa@canonical.com>
JiandiAnNVIDIA pushed a commit that referenced this pull request Apr 15, 2026
BugLink: https://bugs.launchpad.net/bugs/2137723

[ Upstream commit dfe28c4 ]

The validation of the set(nsh(...)) action is completely wrong.
It runs through the nsh_key_put_from_nlattr() function that is the
same function that validates NSH keys for the flow match and the
push_nsh() action.  However, the set(nsh(...)) has a very different
memory layout.  Nested attributes in there are doubled in size in
case of the masked set().  That makes proper validation impossible.

There is also confusion in the code between the 'masked' flag, that
says that the nested attributes are doubled in size containing both
the value and the mask, and the 'is_mask' that says that the value
we're parsing is the mask.  This is causing kernel crash on trying to
write into mask part of the match with SW_FLOW_KEY_PUT() during
validation, while validate_nsh() doesn't allocate any memory for it:

  BUG: kernel NULL pointer dereference, address: 0000000000000018
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 1c2383067 P4D 1c2383067 PUD 20b703067 PMD 0
  Oops: Oops: 0000 [#1] SMP NOPTI
  CPU: 8 UID: 0 Kdump: loaded Not tainted 6.17.0-rc4+ NVIDIA#107 PREEMPT(voluntary)
  RIP: 0010:nsh_key_put_from_nlattr+0x19d/0x610 [openvswitch]
  Call Trace:
   <TASK>
   validate_nsh+0x60/0x90 [openvswitch]
   validate_set.constprop.0+0x270/0x3c0 [openvswitch]
   __ovs_nla_copy_actions+0x477/0x860 [openvswitch]
   ovs_nla_copy_actions+0x8d/0x100 [openvswitch]
   ovs_packet_cmd_execute+0x1cc/0x310 [openvswitch]
   genl_family_rcv_msg_doit+0xdb/0x130
   genl_family_rcv_msg+0x14b/0x220
   genl_rcv_msg+0x47/0xa0
   netlink_rcv_skb+0x53/0x100
   genl_rcv+0x24/0x40
   netlink_unicast+0x280/0x3b0
   netlink_sendmsg+0x1f7/0x430
   ____sys_sendmsg+0x36b/0x3a0
   ___sys_sendmsg+0x87/0xd0
   __sys_sendmsg+0x6d/0xd0
   do_syscall_64+0x7b/0x2c0
   entry_SYSCALL_64_after_hwframe+0x76/0x7e

The third issue with this process is that while trying to convert
the non-masked set into masked one, validate_set() copies and doubles
the size of the OVS_KEY_ATTR_NSH as if it didn't have any nested
attributes.  It should be copying each nested attribute and doubling
them in size independently.  And the process must be properly reversed
during the conversion back from masked to a non-masked variant during
the flow dump.

In the end, the only two outcomes of trying to use this action are
either validation failure or a kernel crash.  And if somehow someone
manages to install a flow with such an action, it will most definitely
not do what it is supposed to, since all the keys and the masks are
mixed up.

Fixing all the issues is a complex task as it requires re-writing
most of the validation code.

Given that and the fact that this functionality never worked since
introduction, let's just remove it altogether.  It's better to
re-introduce it later with a proper implementation instead of trying
to fix it in stable releases.

Fixes: b2d0f5d ("openvswitch: enable NSH support")
Reported-by: Junvy Yang <zhuque@tencent.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Aaron Conole <aconole@redhat.com>
Link: https://patch.msgid.link/20251112112246.95064-1-i.maximets@ovn.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
CVE-2025-40254
Signed-off-by: Bethany <bethany.jamison@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Edoardo Canepa <edoardo.canepa@canonical.com>
JiandiAnNVIDIA pushed a commit that referenced this pull request Apr 15, 2026
BugLink: https://bugs.launchpad.net/bugs/2137723

[ Upstream commit f94c1a1 ]

The function devl_rate_nodes_destroy is documented to "Unset parent for
all rate objects". However, it was only calling the driver-specific
`rate_leaf_parent_set` or `rate_node_parent_set` ops and decrementing
the parent's refcount, without actually setting the
`devlink_rate->parent` pointer to NULL.

This leaves a dangling pointer in the `devlink_rate` struct, which cause
refcount error in netdevsim[1] and mlx5[2]. In addition, this is
inconsistent with the behavior of `devlink_nl_rate_parent_node_set`,
where the parent pointer is correctly cleared.

This patch fixes the issue by explicitly setting `devlink_rate->parent`
to NULL after notifying the driver, thus fulfilling the function's
documented behavior for all rate objects.

[1]
repro steps:
echo 1 > /sys/bus/netdevsim/new_device
devlink dev eswitch set netdevsim/netdevsim1 mode switchdev
echo 1 > /sys/bus/netdevsim/devices/netdevsim1/sriov_numvfs
devlink port function rate add netdevsim/netdevsim1/test_node
devlink port function rate set netdevsim/netdevsim1/128 parent test_node
echo 1 > /sys/bus/netdevsim/del_device

dmesg:
refcount_t: decrement hit 0; leaking memory.
WARNING: CPU: 8 PID: 1530 at lib/refcount.c:31 refcount_warn_saturate+0x42/0xe0
CPU: 8 UID: 0 PID: 1530 Comm: bash Not tainted 6.18.0-rc4+ #1 NONE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
RIP: 0010:refcount_warn_saturate+0x42/0xe0
Call Trace:
 <TASK>
 devl_rate_leaf_destroy+0x8d/0x90
 __nsim_dev_port_del+0x6c/0x70 [netdevsim]
 nsim_dev_reload_destroy+0x11c/0x140 [netdevsim]
 nsim_drv_remove+0x2b/0xb0 [netdevsim]
 device_release_driver_internal+0x194/0x1f0
 bus_remove_device+0xc6/0x130
 device_del+0x159/0x3c0
 device_unregister+0x1a/0x60
 del_device_store+0x111/0x170 [netdevsim]
 kernfs_fop_write_iter+0x12e/0x1e0
 vfs_write+0x215/0x3d0
 ksys_write+0x5f/0xd0
 do_syscall_64+0x55/0x10f0
 entry_SYSCALL_64_after_hwframe+0x4b/0x53

[2]
devlink dev eswitch set pci/0000:08:00.0 mode switchdev
devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 1000
devlink port function rate add pci/0000:08:00.0/group1
devlink port function rate set pci/0000:08:00.0/32768 parent group1
modprobe -r mlx5_ib mlx5_fwctl mlx5_core

dmesg:
refcount_t: decrement hit 0; leaking memory.
WARNING: CPU: 7 PID: 16151 at lib/refcount.c:31 refcount_warn_saturate+0x42/0xe0
CPU: 7 UID: 0 PID: 16151 Comm: bash Not tainted 6.17.0-rc7_for_upstream_min_debug_2025_10_02_12_44 #1 NONE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
RIP: 0010:refcount_warn_saturate+0x42/0xe0
Call Trace:
 <TASK>
 devl_rate_leaf_destroy+0x8d/0x90
 mlx5_esw_offloads_devlink_port_unregister+0x33/0x60 [mlx5_core]
 mlx5_esw_offloads_unload_rep+0x3f/0x50 [mlx5_core]
 mlx5_eswitch_unload_sf_vport+0x40/0x90 [mlx5_core]
 mlx5_sf_esw_event+0xc4/0x120 [mlx5_core]
 notifier_call_chain+0x33/0xa0
 blocking_notifier_call_chain+0x3b/0x50
 mlx5_eswitch_disable_locked+0x50/0x110 [mlx5_core]
 mlx5_eswitch_disable+0x63/0x90 [mlx5_core]
 mlx5_unload+0x1d/0x170 [mlx5_core]
 mlx5_uninit_one+0xa2/0x130 [mlx5_core]
 remove_one+0x78/0xd0 [mlx5_core]
 pci_device_remove+0x39/0xa0
 device_release_driver_internal+0x194/0x1f0
 unbind_store+0x99/0xa0
 kernfs_fop_write_iter+0x12e/0x1e0
 vfs_write+0x215/0x3d0
 ksys_write+0x5f/0xd0
 do_syscall_64+0x53/0x1f0
 entry_SYSCALL_64_after_hwframe+0x4b/0x53

Fixes: d755598 ("devlink: Allow setting parent node of rate objects")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1763381149-1234377-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
CVE-2025-40251
Signed-off-by: Bethany <bethany.jamison@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Edoardo Canepa <edoardo.canepa@canonical.com>
JiandiAnNVIDIA pushed a commit that referenced this pull request Apr 15, 2026
BugLink: https://bugs.launchpad.net/bugs/2137723

[ Upstream commit d47515a ]

The mlx5_irq_alloc() function can inadvertently free the entire rmap
and end up in a crash[1] when the other threads tries to access this,
when request_irq() fails due to exhausted IRQ vectors. This commit
modifies the cleanup to remove only the specific IRQ mapping that was
just added.

This prevents removal of other valid mappings and ensures precise
cleanup of the failed IRQ allocation's associated glue object.

Note: This error is observed when both fwctl and rds configs are enabled.

[1]
mlx5_core 0000:05:00.0: Successfully registered panic handler for port 1
mlx5_core 0000:05:00.0: mlx5_irq_alloc:293:(pid 66740): Failed to
request irq. err = -28
infiniband mlx5_0: mlx5_ib_test_wc:290:(pid 66740): Error -28 while
trying to test write-combining support
mlx5_core 0000:05:00.0: Successfully unregistered panic handler for port 1
mlx5_core 0000:06:00.0: Successfully registered panic handler for port 1
mlx5_core 0000:06:00.0: mlx5_irq_alloc:293:(pid 66740): Failed to
request irq. err = -28
infiniband mlx5_0: mlx5_ib_test_wc:290:(pid 66740): Error -28 while
trying to test write-combining support
mlx5_core 0000:06:00.0: Successfully unregistered panic handler for port 1
mlx5_core 0000:03:00.0: mlx5_irq_alloc:293:(pid 28895): Failed to
request irq. err = -28
mlx5_core 0000:05:00.0: mlx5_irq_alloc:293:(pid 28895): Failed to
request irq. err = -28
general protection fault, probably for non-canonical address
0xe277a58fde16f291: 0000 [#1] SMP NOPTI

RIP: 0010:free_irq_cpu_rmap+0x23/0x7d
Call Trace:
   <TASK>
   ? show_trace_log_lvl+0x1d6/0x2f9
   ? show_trace_log_lvl+0x1d6/0x2f9
   ? mlx5_irq_alloc.cold+0x5d/0xf3 [mlx5_core]
   ? __die_body.cold+0x8/0xa
   ? die_addr+0x39/0x53
   ? exc_general_protection+0x1c4/0x3e9
   ? dev_vprintk_emit+0x5f/0x90
   ? asm_exc_general_protection+0x22/0x27
   ? free_irq_cpu_rmap+0x23/0x7d
   mlx5_irq_alloc.cold+0x5d/0xf3 [mlx5_core]
   irq_pool_request_vector+0x7d/0x90 [mlx5_core]
   mlx5_irq_request+0x2e/0xe0 [mlx5_core]
   mlx5_irq_request_vector+0xad/0xf7 [mlx5_core]
   comp_irq_request_pci+0x64/0xf0 [mlx5_core]
   create_comp_eq+0x71/0x385 [mlx5_core]
   ? mlx5e_open_xdpsq+0x11c/0x230 [mlx5_core]
   mlx5_comp_eqn_get+0x72/0x90 [mlx5_core]
   ? xas_load+0x8/0x91
   mlx5_comp_irqn_get+0x40/0x90 [mlx5_core]
   mlx5e_open_channel+0x7d/0x3c7 [mlx5_core]
   mlx5e_open_channels+0xad/0x250 [mlx5_core]
   mlx5e_open_locked+0x3e/0x110 [mlx5_core]
   mlx5e_open+0x23/0x70 [mlx5_core]
   __dev_open+0xf1/0x1a5
   __dev_change_flags+0x1e1/0x249
   dev_change_flags+0x21/0x5c
   do_setlink+0x28b/0xcc4
   ? __nla_parse+0x22/0x3d
   ? inet6_validate_link_af+0x6b/0x108
   ? cpumask_next+0x1f/0x35
   ? __snmp6_fill_stats64.constprop.0+0x66/0x107
   ? __nla_validate_parse+0x48/0x1e6
   __rtnl_newlink+0x5ff/0xa57
   ? kmem_cache_alloc_trace+0x164/0x2ce
   rtnl_newlink+0x44/0x6e
   rtnetlink_rcv_msg+0x2bb/0x362
   ? __netlink_sendskb+0x4c/0x6c
   ? netlink_unicast+0x28f/0x2ce
   ? rtnl_calcit.isra.0+0x150/0x146
   netlink_rcv_skb+0x5f/0x112
   netlink_unicast+0x213/0x2ce
   netlink_sendmsg+0x24f/0x4d9
   __sock_sendmsg+0x65/0x6a
   ____sys_sendmsg+0x28f/0x2c9
   ? import_iovec+0x17/0x2b
   ___sys_sendmsg+0x97/0xe0
   __sys_sendmsg+0x81/0xd8
   do_syscall_64+0x35/0x87
   entry_SYSCALL_64_after_hwframe+0x6e/0x0
RIP: 0033:0x7fc328603727
Code: c3 66 90 41 54 41 89 d4 55 48 89 f5 53 89 fb 48 83 ec 10 e8 0b ed
ff ff 44 89 e2 48 89 ee 89 df 41 89 c0 b8 2e 00 00 00 0f 05 <48> 3d 00
f0 ff ff 77 35 44 89 c7 48 89 44 24 08 e8 44 ed ff ff 48
RSP: 002b:00007ffe8eb3f1a0 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 000000000000000d RCX: 00007fc328603727
RDX: 0000000000000000 RSI: 00007ffe8eb3f1f0 RDI: 000000000000000d
RBP: 00007ffe8eb3f1f0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000
R13: 0000000000000000 R14: 00007ffe8eb3f3c8 R15: 00007ffe8eb3f3bc
   </TASK>
---[ end trace f43ce73c3c2b13a2 ]---
RIP: 0010:free_irq_cpu_rmap+0x23/0x7d
Code: 0f 1f 80 00 00 00 00 48 85 ff 74 6b 55 48 89 fd 53 66 83 7f 06 00
74 24 31 db 48 8b 55 08 0f b7 c3 48 8b 04 c2 48 85 c0 74 09 <8b> 38 31
f6 e8 c4 0a b8 ff 83 c3 01 66 3b 5d 06 72 de b8 ff ff ff
RSP: 0018:ff384881640eaca0 EFLAGS: 00010282
RAX: e277a58fde16f291 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ff2335e2e20b3600 RSI: 0000000000000000 RDI: ff2335e2e20b3400
RBP: ff2335e2e20b3400 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 00000000ffffffe4 R12: ff384881640ead88
R13: ff2335c3760751e0 R14: ff2335e2e1672200 R15: ff2335c3760751f8
FS:  00007fc32ac22480(0000) GS:ff2335e2d6e00000(0000)
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f651ab54000 CR3: 00000029f1206003 CR4: 0000000000771ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Kernel panic - not syncing: Fatal exception
Kernel Offset: 0x1dc00000 from 0xffffffff81000000 (relocation range:
0xffffffff80000000-0xffffffffbfffffff)
kvm-guest: disable async PF for cpu 0

Fixes: 3354822 ("net/mlx5: Use dynamic msix vectors allocation")
Signed-off-by: Mohith Kumar Thummaluru<mohith.k.kumar.thummaluru@oracle.com>
Tested-by: Mohith Kumar Thummaluru<mohith.k.kumar.thummaluru@oracle.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shay Drori <shayd@nvidia.com>
Signed-off-by: Pradyumn Rahar <pradyumn.rahar@oracle.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1763381768-1234998-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
CVE-2025-40250
Signed-off-by: Bethany <bethany.jamison@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Edoardo Canepa <edoardo.canepa@canonical.com>
JiandiAnNVIDIA pushed a commit that referenced this pull request Apr 15, 2026
BugLink: https://bugs.launchpad.net/bugs/2137723

[ Upstream commit 830d68f ]

The following splat was reported:

    Unable to handle kernel NULL pointer dereference at virtual address 0000000000000010
    Mem abort info:
      ESR = 0x0000000096000004
      EC = 0x25: DABT (current EL), IL = 32 bits
      SET = 0, FnV = 0
      EA = 0, S1PTW = 0
      FSC = 0x04: level 0 translation fault
    Data abort info:
      ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
      CM = 0, WnR = 0, TnD = 0, TagAccess = 0
      GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
    user pgtable: 4k pages, 48-bit VAs, pgdp=00000008d0fd8000
    [0000000000000010] pgd=0000000000000000, p4d=0000000000000000
    Internal error: Oops: 0000000096000004 [#1]  SMP
    CPU: 5 UID: 1000 PID: 149076 Comm: Xwayland Tainted: G S                  6.16.0-rc2-00809-g0b6974bb4134-dirty NVIDIA#367 PREEMPT
    Tainted: [S]=CPU_OUT_OF_SPEC
    Hardware name: Qualcomm Technologies, Inc. SM8650 HDK (DT)
    pstate: 83400005 (Nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
    pc : build_detached_freelist+0x28/0x224
    lr : kmem_cache_free_bulk.part.0+0x38/0x244
    sp : ffff000a508c7a20
    x29: ffff000a508c7a20 x28: ffff000a508c7d50 x27: ffffc4e49d16f350
    x26: 0000000000000058 x25: 00000000fffffffc x24: 0000000000000000
    x23: ffff00098c4e1450 x22: 00000000fffffffc x21: 0000000000000000
    x20: ffff000a508c7af8 x19: 0000000000000002 x18: 00000000000003e8
    x17: ffff000809523850 x16: ffff000809523820 x15: 0000000000401640
    x14: ffff000809371140 x13: 0000000000000130 x12: ffff0008b5711e30
    x11: 00000000001058fa x10: 0000000000000a80 x9 : ffff000a508c7940
    x8 : ffff000809371ba0 x7 : 781fffe033087fff x6 : 0000000000000000
    x5 : ffff0008003cd000 x4 : 781fffe033083fff x3 : ffff000a508c7af8
    x2 : fffffdffc0000000 x1 : 0001000000000000 x0 : ffff0008001a6a00
    Call trace:
     build_detached_freelist+0x28/0x224 (P)
     kmem_cache_free_bulk.part.0+0x38/0x244
     kmem_cache_free_bulk+0x10/0x1c
     msm_iommu_pagetable_prealloc_cleanup+0x3c/0xd0
     msm_vma_job_free+0x30/0x240
     msm_ioctl_vm_bind+0x1d0/0x9a0
     drm_ioctl_kernel+0x84/0x104
     drm_ioctl+0x358/0x4d4
     __arm64_sys_ioctl+0x8c/0xe0
     invoke_syscall+0x44/0x100
     el0_svc_common.constprop.0+0x3c/0xe0
     do_el0_svc+0x18/0x20
     el0_svc+0x30/0x100
     el0t_64_sync_handler+0x104/0x130
     el0t_64_sync+0x170/0x174
    Code: aa0203f5 b26287e2 f2dfbfe2 aa0303f4 (f8737ab6)
    ---[ end trace 0000000000000000 ]---

Since msm_vma_job_free() is called directly from the ioctl, this looks
like an error path cleanup issue.  Which I think results from
prealloc_cleanup() called without a preceding successful
prealloc_allocate() call.  So handle that case better.

Reported-by: Connor Abbott <cwabbott0@gmail.com>
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/678677/
Message-ID: <20251006153542.419998-1-robin.clark@oss.qualcomm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
CVE-2025-40247
Signed-off-by: Bethany <bethany.jamison@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Edoardo Canepa <edoardo.canepa@canonical.com>
JiandiAnNVIDIA pushed a commit that referenced this pull request Apr 15, 2026
… NULL on error

BugLink: https://bugs.launchpad.net/bugs/2137723

[ Upstream commit 90a8830 ]

Make knav_dma_open_channel consistently return NULL on error instead
of ERR_PTR. Currently the header include/linux/soc/ti/knav_dma.h
returns NULL when the driver is disabled, but the driver
implementation does not even return NULL or ERR_PTR on failure,
causing inconsistency in the users. This results in a crash in
netcp_free_navigator_resources as followed (trimmed):

Unhandled fault: alignment exception (0x221) at 0xfffffff2
[fffffff2] *pgd=80000800207003, *pmd=82ffda003, *pte=00000000
Internal error: : 221 [#1] SMP ARM
Modules linked in:
CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.17.0-rc7 #1 NONE
Hardware name: Keystone
PC is at knav_dma_close_channel+0x30/0x19c
LR is at netcp_free_navigator_resources+0x2c/0x28c

[... TRIM...]

Call trace:
 knav_dma_close_channel from netcp_free_navigator_resources+0x2c/0x28c
 netcp_free_navigator_resources from netcp_ndo_open+0x430/0x46c
 netcp_ndo_open from __dev_open+0x114/0x29c
 __dev_open from __dev_change_flags+0x190/0x208
 __dev_change_flags from netif_change_flags+0x1c/0x58
 netif_change_flags from dev_change_flags+0x38/0xa0
 dev_change_flags from ip_auto_config+0x2c4/0x11f0
 ip_auto_config from do_one_initcall+0x58/0x200
 do_one_initcall from kernel_init_freeable+0x1cc/0x238
 kernel_init_freeable from kernel_init+0x1c/0x12c
 kernel_init from ret_from_fork+0x14/0x38
[... TRIM...]

Standardize the error handling by making the function return NULL on
all error conditions. The API is used in just the netcp_core.c so the
impact is limited.

Note, this change, in effect reverts commit 5b6cb43 ("net:
ethernet: ti: netcp_core: return error while dma channel open issue"),
but provides a less error prone implementation.

Suggested-by: Simon Horman <horms@kernel.org>
Suggested-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Nishanth Menon <nm@ti.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20251103162811.3730055-1-nm@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
CVE-2025-68220
Signed-off-by: Bethany <bethany.jamison@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Edoardo Canepa <edoardo.canepa@canonical.com>
JiandiAnNVIDIA pushed a commit that referenced this pull request Apr 15, 2026
BugLink: https://bugs.launchpad.net/bugs/2138824

[ Upstream commit 0ebc27a ]

Since commit 30f241f ("xsk: Fix immature cq descriptor
production"), the descriptor number is stored in skb control block and
xsk_cq_submit_addr_locked() relies on it to put the umem addrs onto
pool's completion queue.

skb control block shouldn't be used for this purpose as after transmit
xsk doesn't have control over it and other subsystems could use it. This
leads to the following kernel panic due to a NULL pointer dereference.

 BUG: kernel NULL pointer dereference, address: 0000000000000000
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: Oops: 0000 [#1] SMP NOPTI
 CPU: 2 UID: 1 PID: 927 Comm: p4xsk.bin Not tainted 6.16.12+deb14-cloud-amd64 #1 PREEMPT(lazy)  Debian 6.16.12-1
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-debian-1.17.0-1 04/01/2014
 RIP: 0010:xsk_destruct_skb+0xd0/0x180
 [...]
 Call Trace:
  <IRQ>
  ? napi_complete_done+0x7a/0x1a0
  ip_rcv_core+0x1bb/0x340
  ip_rcv+0x30/0x1f0
  __netif_receive_skb_one_core+0x85/0xa0
  process_backlog+0x87/0x130
  __napi_poll+0x28/0x180
  net_rx_action+0x339/0x420
  handle_softirqs+0xdc/0x320
  ? handle_edge_irq+0x90/0x1e0
  do_softirq.part.0+0x3b/0x60
  </IRQ>
  <TASK>
  __local_bh_enable_ip+0x60/0x70
  __dev_direct_xmit+0x14e/0x1f0
  __xsk_generic_xmit+0x482/0xb70
  ? __remove_hrtimer+0x41/0xa0
  ? __xsk_generic_xmit+0x51/0xb70
  ? _raw_spin_unlock_irqrestore+0xe/0x40
  xsk_sendmsg+0xda/0x1c0
  __sys_sendto+0x1ee/0x200
  __x64_sys_sendto+0x24/0x30
  do_syscall_64+0x84/0x2f0
  ? __pfx_pollwake+0x10/0x10
  ? __rseq_handle_notify_resume+0xad/0x4c0
  ? restore_fpregs_from_fpstate+0x3c/0x90
  ? switch_fpu_return+0x5b/0xe0
  ? do_syscall_64+0x204/0x2f0
  ? do_syscall_64+0x204/0x2f0
  ? do_syscall_64+0x204/0x2f0
  entry_SYSCALL_64_after_hwframe+0x76/0x7e
  </TASK>
 [...]
 Kernel panic - not syncing: Fatal exception in interrupt
 Kernel Offset: 0x1c000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)

Instead use the skb destructor_arg pointer along with pointer tagging.
As pointers are always aligned to 8B, use the bottom bit to indicate
whether this a single address or an allocated struct containing several
addresses.

Fixes: 30f241f ("xsk: Fix immature cq descriptor production")
Closes: https://lore.kernel.org/netdev/0435b904-f44f-48f8-afb0-68868474bf1c@nop.hu/
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://patch.msgid.link/20251124171409.3845-1-fmancera@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
CVE-2025-40290
Signed-off-by: Bethany <bethany.jamison@canonical.com>
Signed-off-by: Edoardo Canepa <edoardo.canepa@canonical.com>
JiandiAnNVIDIA pushed a commit that referenced this pull request Apr 15, 2026
BugLink: https://bugs.launchpad.net/bugs/2138824

commit 43962db upstream.

The crash in process_v2_sparse_read() for fscrypt-encrypted directories
has been reported. Issue takes place for Ceph msgr2 protocol in secure
mode. It can be reproduced by the steps:

sudo mount -t ceph :/ /mnt/cephfs/ -o name=admin,fs=cephfs,ms_mode=secure

(1) mkdir /mnt/cephfs/fscrypt-test-3
(2) cp area_decrypted.tar /mnt/cephfs/fscrypt-test-3
(3) fscrypt encrypt --source=raw_key --key=./my.key /mnt/cephfs/fscrypt-test-3
(4) fscrypt lock /mnt/cephfs/fscrypt-test-3
(5) fscrypt unlock --key=my.key /mnt/cephfs/fscrypt-test-3
(6) cat /mnt/cephfs/fscrypt-test-3/area_decrypted.tar
(7) Issue has been triggered

[  408.072247] ------------[ cut here ]------------
[  408.072251] WARNING: CPU: 1 PID: 392 at net/ceph/messenger_v2.c:865
ceph_con_v2_try_read+0x4b39/0x72f0
[  408.072267] Modules linked in: intel_rapl_msr intel_rapl_common
intel_uncore_frequency_common intel_pmc_core pmt_telemetry pmt_discovery
pmt_class intel_pmc_ssram_telemetry intel_vsec kvm_intel joydev kvm irqbypass
polyval_clmulni ghash_clmulni_intel aesni_intel rapl input_leds psmouse
serio_raw i2c_piix4 vga16fb bochs vgastate i2c_smbus floppy mac_hid qemu_fw_cfg
pata_acpi sch_fq_codel rbd msr parport_pc ppdev lp parport efi_pstore
[  408.072304] CPU: 1 UID: 0 PID: 392 Comm: kworker/1:3 Not tainted 6.17.0-rc7+
[  408.072307] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.17.0-5.fc42 04/01/2014
[  408.072310] Workqueue: ceph-msgr ceph_con_workfn
[  408.072314] RIP: 0010:ceph_con_v2_try_read+0x4b39/0x72f0
[  408.072317] Code: c7 c1 20 f0 d4 ae 50 31 d2 48 c7 c6 60 27 d5 ae 48 c7 c7 f8
8e 6f b0 68 60 38 d5 ae e8 00 47 61 fe 48 83 c4 18 e9 ac fc ff ff <0f> 0b e9 06
fe ff ff 4c 8b 9d 98 fd ff ff 0f 84 64 e7 ff ff 89 85
[  408.072319] RSP: 0018:ffff88811c3e7a30 EFLAGS: 00010246
[  408.072322] RAX: ffffed1024874c6f RBX: ffffea00042c2b40 RCX: 0000000000000f38
[  408.072324] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[  408.072325] RBP: ffff88811c3e7ca8 R08: 0000000000000000 R09: 00000000000000c8
[  408.072326] R10: 00000000000000c8 R11: 0000000000000000 R12: 00000000000000c8
[  408.072327] R13: dffffc0000000000 R14: ffff8881243a6030 R15: 0000000000003000
[  408.072329] FS:  0000000000000000(0000) GS:ffff88823eadf000(0000)
knlGS:0000000000000000
[  408.072331] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  408.072332] CR2: 000000c0003c6000 CR3: 000000010c106005 CR4: 0000000000772ef0
[  408.072336] PKRU: 55555554
[  408.072337] Call Trace:
[  408.072338]  <TASK>
[  408.072340]  ? sched_clock_noinstr+0x9/0x10
[  408.072344]  ? __pfx_ceph_con_v2_try_read+0x10/0x10
[  408.072347]  ? _raw_spin_unlock+0xe/0x40
[  408.072349]  ? finish_task_switch.isra.0+0x15d/0x830
[  408.072353]  ? __kasan_check_write+0x14/0x30
[  408.072357]  ? mutex_lock+0x84/0xe0
[  408.072359]  ? __pfx_mutex_lock+0x10/0x10
[  408.072361]  ceph_con_workfn+0x27e/0x10e0
[  408.072364]  ? metric_delayed_work+0x311/0x2c50
[  408.072367]  process_one_work+0x611/0xe20
[  408.072371]  ? __kasan_check_write+0x14/0x30
[  408.072373]  worker_thread+0x7e3/0x1580
[  408.072375]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[  408.072378]  ? __pfx_worker_thread+0x10/0x10
[  408.072381]  kthread+0x381/0x7a0
[  408.072383]  ? __pfx__raw_spin_lock_irq+0x10/0x10
[  408.072385]  ? __pfx_kthread+0x10/0x10
[  408.072387]  ? __kasan_check_write+0x14/0x30
[  408.072389]  ? recalc_sigpending+0x160/0x220
[  408.072392]  ? _raw_spin_unlock_irq+0xe/0x50
[  408.072394]  ? calculate_sigpending+0x78/0xb0
[  408.072395]  ? __pfx_kthread+0x10/0x10
[  408.072397]  ret_from_fork+0x2b6/0x380
[  408.072400]  ? __pfx_kthread+0x10/0x10
[  408.072402]  ret_from_fork_asm+0x1a/0x30
[  408.072406]  </TASK>
[  408.072407] ---[ end trace 0000000000000000 ]---
[  408.072418] Oops: general protection fault, probably for non-canonical
address 0xdffffc0000000000: 0000 [#1] SMP KASAN NOPTI
[  408.072984] KASAN: null-ptr-deref in range [0x0000000000000000-
0x0000000000000007]
[  408.073350] CPU: 1 UID: 0 PID: 392 Comm: kworker/1:3 Tainted: G        W
6.17.0-rc7+ #1 PREEMPT(voluntary)
[  408.073886] Tainted: [W]=WARN
[  408.074042] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.17.0-5.fc42 04/01/2014
[  408.074468] Workqueue: ceph-msgr ceph_con_workfn
[  408.074694] RIP: 0010:ceph_msg_data_advance+0x79/0x1a80
[  408.074976] Code: fc ff df 49 8d 77 08 48 c1 ee 03 80 3c 16 00 0f 85 07 11 00
00 48 ba 00 00 00 00 00 fc ff df 49 8b 5f 08 48 89 de 48 c1 ee 03 <0f> b6 14 16
84 d2 74 09 80 fa 03 0f 8e 0f 0e 00 00 8b 13 83 fa 03
[  408.075884] RSP: 0018:ffff88811c3e7990 EFLAGS: 00010246
[  408.076305] RAX: ffff8881243a6388 RBX: 0000000000000000 RCX: 0000000000000000
[  408.076909] RDX: dffffc0000000000 RSI: 0000000000000000 RDI: ffff8881243a6378
[  408.077466] RBP: ffff88811c3e7a20 R08: 0000000000000000 R09: 00000000000000c8
[  408.078034] R10: ffff8881243a6388 R11: 0000000000000000 R12: ffffed1024874c71
[  408.078575] R13: dffffc0000000000 R14: ffff8881243a6030 R15: ffff8881243a6378
[  408.079159] FS:  0000000000000000(0000) GS:ffff88823eadf000(0000)
knlGS:0000000000000000
[  408.079736] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  408.080039] CR2: 000000c0003c6000 CR3: 000000010c106005 CR4: 0000000000772ef0
[  408.080376] PKRU: 55555554
[  408.080513] Call Trace:
[  408.080630]  <TASK>
[  408.080729]  ceph_con_v2_try_read+0x49b9/0x72f0
[  408.081115]  ? __pfx_ceph_con_v2_try_read+0x10/0x10
[  408.081348]  ? _raw_spin_unlock+0xe/0x40
[  408.081538]  ? finish_task_switch.isra.0+0x15d/0x830
[  408.081768]  ? __kasan_check_write+0x14/0x30
[  408.081986]  ? mutex_lock+0x84/0xe0
[  408.082160]  ? __pfx_mutex_lock+0x10/0x10
[  408.082343]  ceph_con_workfn+0x27e/0x10e0
[  408.082529]  ? metric_delayed_work+0x311/0x2c50
[  408.082737]  process_one_work+0x611/0xe20
[  408.082948]  ? __kasan_check_write+0x14/0x30
[  408.083156]  worker_thread+0x7e3/0x1580
[  408.083331]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[  408.083557]  ? __pfx_worker_thread+0x10/0x10
[  408.083751]  kthread+0x381/0x7a0
[  408.083922]  ? __pfx__raw_spin_lock_irq+0x10/0x10
[  408.084139]  ? __pfx_kthread+0x10/0x10
[  408.084310]  ? __kasan_check_write+0x14/0x30
[  408.084510]  ? recalc_sigpending+0x160/0x220
[  408.084708]  ? _raw_spin_unlock_irq+0xe/0x50
[  408.084917]  ? calculate_sigpending+0x78/0xb0
[  408.085138]  ? __pfx_kthread+0x10/0x10
[  408.085335]  ret_from_fork+0x2b6/0x380
[  408.085525]  ? __pfx_kthread+0x10/0x10
[  408.085720]  ret_from_fork_asm+0x1a/0x30
[  408.085922]  </TASK>
[  408.086036] Modules linked in: intel_rapl_msr intel_rapl_common
intel_uncore_frequency_common intel_pmc_core pmt_telemetry pmt_discovery
pmt_class intel_pmc_ssram_telemetry intel_vsec kvm_intel joydev kvm irqbypass
polyval_clmulni ghash_clmulni_intel aesni_intel rapl input_leds psmouse
serio_raw i2c_piix4 vga16fb bochs vgastate i2c_smbus floppy mac_hid qemu_fw_cfg
pata_acpi sch_fq_codel rbd msr parport_pc ppdev lp parport efi_pstore
[  408.087778] ---[ end trace 0000000000000000 ]---
[  408.088007] RIP: 0010:ceph_msg_data_advance+0x79/0x1a80
[  408.088260] Code: fc ff df 49 8d 77 08 48 c1 ee 03 80 3c 16 00 0f 85 07 11 00
00 48 ba 00 00 00 00 00 fc ff df 49 8b 5f 08 48 89 de 48 c1 ee 03 <0f> b6 14 16
84 d2 74 09 80 fa 03 0f 8e 0f 0e 00 00 8b 13 83 fa 03
[  408.089118] RSP: 0018:ffff88811c3e7990 EFLAGS: 00010246
[  408.089357] RAX: ffff8881243a6388 RBX: 0000000000000000 RCX: 0000000000000000
[  408.089678] RDX: dffffc0000000000 RSI: 0000000000000000 RDI: ffff8881243a6378
[  408.090020] RBP: ffff88811c3e7a20 R08: 0000000000000000 R09: 00000000000000c8
[  408.090360] R10: ffff8881243a6388 R11: 0000000000000000 R12: ffffed1024874c71
[  408.090687] R13: dffffc0000000000 R14: ffff8881243a6030 R15: ffff8881243a6378
[  408.091035] FS:  0000000000000000(0000) GS:ffff88823eadf000(0000)
knlGS:0000000000000000
[  408.091452] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  408.092015] CR2: 000000c0003c6000 CR3: 000000010c106005 CR4: 0000000000772ef0
[  408.092530] PKRU: 55555554
[  417.112915]
==================================================================
[  417.113491] BUG: KASAN: slab-use-after-free in
__mutex_lock.constprop.0+0x1522/0x1610
[  417.114014] Read of size 4 at addr ffff888124870034 by task kworker/2:0/4951

[  417.114587] CPU: 2 UID: 0 PID: 4951 Comm: kworker/2:0 Tainted: G      D W
6.17.0-rc7+ #1 PREEMPT(voluntary)
[  417.114592] Tainted: [D]=DIE, [W]=WARN
[  417.114593] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.17.0-5.fc42 04/01/2014
[  417.114596] Workqueue: events handle_timeout
[  417.114601] Call Trace:
[  417.114602]  <TASK>
[  417.114604]  dump_stack_lvl+0x5c/0x90
[  417.114610]  print_report+0x171/0x4dc
[  417.114613]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[  417.114617]  ? kasan_complete_mode_report_info+0x80/0x220
[  417.114621]  kasan_report+0xbd/0x100
[  417.114625]  ? __mutex_lock.constprop.0+0x1522/0x1610
[  417.114628]  ? __mutex_lock.constprop.0+0x1522/0x1610
[  417.114630]  __asan_report_load4_noabort+0x14/0x30
[  417.114633]  __mutex_lock.constprop.0+0x1522/0x1610
[  417.114635]  ? queue_con_delay+0x8d/0x200
[  417.114638]  ? __pfx___mutex_lock.constprop.0+0x10/0x10
[  417.114641]  ? __send_subscribe+0x529/0xb20
[  417.114644]  __mutex_lock_slowpath+0x13/0x20
[  417.114646]  mutex_lock+0xd4/0xe0
[  417.114649]  ? __pfx_mutex_lock+0x10/0x10
[  417.114652]  ? ceph_monc_renew_subs+0x2a/0x40
[  417.114654]  ceph_con_keepalive+0x22/0x110
[  417.114656]  handle_timeout+0x6b3/0x11d0
[  417.114659]  ? _raw_spin_unlock_irq+0xe/0x50
[  417.114662]  ? __pfx_handle_timeout+0x10/0x10
[  417.114664]  ? queue_delayed_work_on+0x8e/0xa0
[  417.114669]  process_one_work+0x611/0xe20
[  417.114672]  ? __kasan_check_write+0x14/0x30
[  417.114676]  worker_thread+0x7e3/0x1580
[  417.114678]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[  417.114682]  ? __pfx_sched_setscheduler_nocheck+0x10/0x10
[  417.114687]  ? __pfx_worker_thread+0x10/0x10
[  417.114689]  kthread+0x381/0x7a0
[  417.114692]  ? __pfx__raw_spin_lock_irq+0x10/0x10
[  417.114694]  ? __pfx_kthread+0x10/0x10
[  417.114697]  ? __kasan_check_write+0x14/0x30
[  417.114699]  ? recalc_sigpending+0x160/0x220
[  417.114703]  ? _raw_spin_unlock_irq+0xe/0x50
[  417.114705]  ? calculate_sigpending+0x78/0xb0
[  417.114707]  ? __pfx_kthread+0x10/0x10
[  417.114710]  ret_from_fork+0x2b6/0x380
[  417.114713]  ? __pfx_kthread+0x10/0x10
[  417.114715]  ret_from_fork_asm+0x1a/0x30
[  417.114720]  </TASK>

[  417.125171] Allocated by task 2:
[  417.125333]  kasan_save_stack+0x26/0x60
[  417.125522]  kasan_save_track+0x14/0x40
[  417.125742]  kasan_save_alloc_info+0x39/0x60
[  417.125945]  __kasan_slab_alloc+0x8b/0xb0
[  417.126133]  kmem_cache_alloc_node_noprof+0x13b/0x460
[  417.126381]  copy_process+0x320/0x6250
[  417.126595]  kernel_clone+0xb7/0x840
[  417.126792]  kernel_thread+0xd6/0x120
[  417.126995]  kthreadd+0x85c/0xbe0
[  417.127176]  ret_from_fork+0x2b6/0x380
[  417.127378]  ret_from_fork_asm+0x1a/0x30

[  417.127692] Freed by task 0:
[  417.127851]  kasan_save_stack+0x26/0x60
[  417.128057]  kasan_save_track+0x14/0x40
[  417.128267]  kasan_save_free_info+0x3b/0x60
[  417.128491]  __kasan_slab_free+0x6c/0xa0
[  417.128708]  kmem_cache_free+0x182/0x550
[  417.128906]  free_task+0xeb/0x140
[  417.129070]  __put_task_struct+0x1d2/0x4f0
[  417.129259]  __put_task_struct_rcu_cb+0x15/0x20
[  417.129480]  rcu_do_batch+0x3d3/0xe70
[  417.129681]  rcu_core+0x549/0xb30
[  417.129839]  rcu_core_si+0xe/0x20
[  417.130005]  handle_softirqs+0x160/0x570
[  417.130190]  __irq_exit_rcu+0x189/0x1e0
[  417.130369]  irq_exit_rcu+0xe/0x20
[  417.130531]  sysvec_apic_timer_interrupt+0x9f/0xd0
[  417.130768]  asm_sysvec_apic_timer_interrupt+0x1b/0x20

[  417.131082] Last potentially related work creation:
[  417.131305]  kasan_save_stack+0x26/0x60
[  417.131484]  kasan_record_aux_stack+0xae/0xd0
[  417.131695]  __call_rcu_common+0xcd/0x14b0
[  417.131909]  call_rcu+0x31/0x50
[  417.132071]  delayed_put_task_struct+0x128/0x190
[  417.132295]  rcu_do_batch+0x3d3/0xe70
[  417.132478]  rcu_core+0x549/0xb30
[  417.132658]  rcu_core_si+0xe/0x20
[  417.132808]  handle_softirqs+0x160/0x570
[  417.132993]  __irq_exit_rcu+0x189/0x1e0
[  417.133181]  irq_exit_rcu+0xe/0x20
[  417.133353]  sysvec_apic_timer_interrupt+0x9f/0xd0
[  417.133584]  asm_sysvec_apic_timer_interrupt+0x1b/0x20

[  417.133921] Second to last potentially related work creation:
[  417.134183]  kasan_save_stack+0x26/0x60
[  417.134362]  kasan_record_aux_stack+0xae/0xd0
[  417.134566]  __call_rcu_common+0xcd/0x14b0
[  417.134782]  call_rcu+0x31/0x50
[  417.134929]  put_task_struct_rcu_user+0x58/0xb0
[  417.135143]  finish_task_switch.isra.0+0x5d3/0x830
[  417.135366]  __schedule+0xd30/0x5100
[  417.135534]  schedule_idle+0x5a/0x90
[  417.135712]  do_idle+0x25f/0x410
[  417.135871]  cpu_startup_entry+0x53/0x70
[  417.136053]  start_secondary+0x216/0x2c0
[  417.136233]  common_startup_64+0x13e/0x141

[  417.136894] The buggy address belongs to the object at ffff888124870000
                which belongs to the cache task_struct of size 10504
[  417.138122] The buggy address is located 52 bytes inside of
                freed 10504-byte region [ffff888124870000, ffff888124872908)

[  417.139465] The buggy address belongs to the physical page:
[  417.140016] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0
pfn:0x124870
[  417.140789] head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0
pincount:0
[  417.141519] memcg:ffff88811aa20e01
[  417.141874] anon flags:
0x17ffffc0000040(head|node=0|zone=2|lastcpupid=0x1fffff)
[  417.142600] page_type: f5(slab)
[  417.142922] raw: 0017ffffc0000040 ffff88810094f040 0000000000000000
dead000000000001
[  417.143554] raw: 0000000000000000 0000000000030003 00000000f5000000
ffff88811aa20e01
[  417.143954] head: 0017ffffc0000040 ffff88810094f040 0000000000000000
dead000000000001
[  417.144329] head: 0000000000000000 0000000000030003 00000000f5000000
ffff88811aa20e01
[  417.144710] head: 0017ffffc0000003 ffffea0004921c01 00000000ffffffff
00000000ffffffff
[  417.145106] head: ffffffffffffffff 0000000000000000 00000000ffffffff
0000000000000008
[  417.145485] page dumped because: kasan: bad access detected

[  417.145859] Memory state around the buggy address:
[  417.146094]  ffff88812486ff00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
fc
[  417.146439]  ffff88812486ff80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
fc
[  417.146791] >ffff888124870000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb
fb
[  417.147145]                                      ^
[  417.147387]  ffff888124870080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
fb
[  417.147751]  ffff888124870100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
fb
[  417.148123]
==================================================================

First of all, we have warning in get_bvec_at() because
cursor->total_resid contains zero value. And, finally,
we have crash in ceph_msg_data_advance() because
cursor->data is NULL. It means that get_bvec_at()
receives not initialized ceph_msg_data_cursor structure
because data is NULL and total_resid contains zero.

Moreover, we don't have likewise issue for the case of
Ceph msgr1 protocol because ceph_msg_data_cursor_init()
has been called before reading sparse data.

This patch adds calling of ceph_msg_data_cursor_init()
in the beginning of process_v2_sparse_read() with
the goal to guarantee that logic of reading sparse data
works correctly for the case of Ceph msgr2 protocol.

Cc: stable@vger.kernel.org
Link: https://tracker.ceph.com/issues/73152
Signed-off-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
CVE-2025-68297
Signed-off-by: Bethany <bethany.jamison@canonical.com>
Signed-off-by: Edoardo Canepa <edoardo.canepa@canonical.com>
JiandiAnNVIDIA pushed a commit that referenced this pull request Apr 15, 2026
…ptcp_do_fastclose().

BugLink: https://bugs.launchpad.net/bugs/2138824

commit f07f4ea upstream.

syzbot reported divide-by-zero in __tcp_select_window() by
MPTCP socket. [0]

We had a similar issue for the bare TCP and fixed in commit
499350a ("tcp: initialize rcv_mss to TCP_MIN_MSS instead
of 0").

Let's apply the same fix to mptcp_do_fastclose().

[0]:
Oops: divide error: 0000 [#1] SMP KASAN PTI
CPU: 0 UID: 0 PID: 6068 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025
RIP: 0010:__tcp_select_window+0x824/0x1320 net/ipv4/tcp_output.c:3336
Code: ff ff ff 44 89 f1 d3 e0 89 c1 f7 d1 41 01 cc 41 21 c4 e9 a9 00 00 00 e8 ca 49 01 f8 e9 9c 00 00 00 e8 c0 49 01 f8 44 89 e0 99 <f7> 7c 24 1c 41 29 d4 48 bb 00 00 00 00 00 fc ff df e9 80 00 00 00
RSP: 0018:ffffc90003017640 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88807b469e40
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffffc90003017730 R08: ffff888033268143 R09: 1ffff1100664d028
R10: dffffc0000000000 R11: ffffed100664d029 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS:  000055557faa0500(0000) GS:ffff888126135000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f64a1912ff8 CR3: 0000000072122000 CR4: 00000000003526f0
Call Trace:
 <TASK>
 tcp_select_window net/ipv4/tcp_output.c:281 [inline]
 __tcp_transmit_skb+0xbc7/0x3aa0 net/ipv4/tcp_output.c:1568
 tcp_transmit_skb net/ipv4/tcp_output.c:1649 [inline]
 tcp_send_active_reset+0x2d1/0x5b0 net/ipv4/tcp_output.c:3836
 mptcp_do_fastclose+0x27e/0x380 net/mptcp/protocol.c:2793
 mptcp_disconnect+0x238/0x710 net/mptcp/protocol.c:3253
 mptcp_sendmsg_fastopen+0x2f8/0x580 net/mptcp/protocol.c:1776
 mptcp_sendmsg+0x1774/0x1980 net/mptcp/protocol.c:1855
 sock_sendmsg_nosec net/socket.c:727 [inline]
 __sock_sendmsg+0xe5/0x270 net/socket.c:742
 __sys_sendto+0x3bd/0x520 net/socket.c:2244
 __do_sys_sendto net/socket.c:2251 [inline]
 __se_sys_sendto net/socket.c:2247 [inline]
 __x64_sys_sendto+0xde/0x100 net/socket.c:2247
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xfa/0xfa0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f66e998f749
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffff9acedb8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00007f66e9be5fa0 RCX: 00007f66e998f749
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003
RBP: 00007ffff9acee10 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
R13: 00007f66e9be5fa0 R14: 00007f66e9be5fa0 R15: 0000000000000006
 </TASK>

Fixes: ae15506 ("mptcp: fix duplicate reset on fastclose")
Reported-by: syzbot+3a92d359bc2ec6255a33@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/69260882.a70a0220.d98e3.00b4.GAE@google.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20251125195331.309558-1-kuniyu@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Bethany <bethany.jamison@canonical.com>
Signed-off-by: Edoardo Canepa <edoardo.canepa@canonical.com>
JiandiAnNVIDIA pushed a commit that referenced this pull request Apr 15, 2026
BugLink: https://bugs.launchpad.net/bugs/2138824

commit eb9ac77 upstream.

A synchronous external abort occurs on the Renesas RZ/G3S SoC if unbind is
executed after the configuration sequence described above:

modprobe usb_f_ecm
modprobe libcomposite
modprobe configfs
cd /sys/kernel/config/usb_gadget
mkdir -p g1
cd g1
echo "0x1d6b" > idVendor
echo "0x0104" > idProduct
mkdir -p strings/0x409
echo "0123456789" > strings/0x409/serialnumber
echo "Renesas." > strings/0x409/manufacturer
echo "Ethernet Gadget" > strings/0x409/product
mkdir -p functions/ecm.usb0
mkdir -p configs/c.1
mkdir -p configs/c.1/strings/0x409
echo "ECM" > configs/c.1/strings/0x409/configuration

if [ ! -L configs/c.1/ecm.usb0 ]; then
        ln -s functions/ecm.usb0 configs/c.1
fi

echo 11e20000.usb > UDC
echo 11e20000.usb > /sys/bus/platform/drivers/renesas_usbhs/unbind

The displayed trace is as follows:

 Internal error: synchronous external abort: 0000000096000010 [#1] SMP
 CPU: 0 UID: 0 PID: 188 Comm: sh Tainted: G M 6.17.0-rc7-next-20250922-00010-g41050493b2bd NVIDIA#55 PREEMPT
 Tainted: [M]=MACHINE_CHECK
 Hardware name: Renesas SMARC EVK version 2 based on r9a08g045s33 (DT)
 pstate: 604000c5 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
 pc : usbhs_sys_function_pullup+0x10/0x40 [renesas_usbhs]
 lr : usbhsg_update_pullup+0x3c/0x68 [renesas_usbhs]
 sp : ffff8000838b3920
 x29: ffff8000838b3920 x28: ffff00000d585780 x27: 0000000000000000
 x26: 0000000000000000 x25: 0000000000000000 x24: ffff00000c3e3810
 x23: ffff00000d5e5c80 x22: ffff00000d5e5d40 x21: 0000000000000000
 x20: 0000000000000000 x19: ffff00000d5e5c80 x18: 0000000000000020
 x17: 2e30303230316531 x16: 312d7968703a7968 x15: 3d454d414e5f4344
 x14: 000000000000002c x13: 0000000000000000 x12: 0000000000000000
 x11: ffff00000f358f38 x10: ffff00000f358db0 x9 : ffff00000b41f418
 x8 : 0101010101010101 x7 : 7f7f7f7f7f7f7f7f x6 : fefefeff6364626d
 x5 : 8080808000000000 x4 : 000000004b5ccb9d x3 : 0000000000000000
 x2 : 0000000000000000 x1 : ffff800083790000 x0 : ffff00000d5e5c80
 Call trace:
 usbhs_sys_function_pullup+0x10/0x40 [renesas_usbhs] (P)
 usbhsg_pullup+0x4c/0x7c [renesas_usbhs]
 usb_gadget_disconnect_locked+0x48/0xd4
 gadget_unbind_driver+0x44/0x114
 device_remove+0x4c/0x80
 device_release_driver_internal+0x1c8/0x224
 device_release_driver+0x18/0x24
 bus_remove_device+0xcc/0x10c
 device_del+0x14c/0x404
 usb_del_gadget+0x88/0xc0
 usb_del_gadget_udc+0x18/0x30
 usbhs_mod_gadget_remove+0x24/0x44 [renesas_usbhs]
 usbhs_mod_remove+0x20/0x30 [renesas_usbhs]
 usbhs_remove+0x98/0xdc [renesas_usbhs]
 platform_remove+0x20/0x30
 device_remove+0x4c/0x80
 device_release_driver_internal+0x1c8/0x224
 device_driver_detach+0x18/0x24
 unbind_store+0xb4/0xb8
 drv_attr_store+0x24/0x38
 sysfs_kf_write+0x7c/0x94
 kernfs_fop_write_iter+0x128/0x1b8
 vfs_write+0x2ac/0x350
 ksys_write+0x68/0xfc
 __arm64_sys_write+0x1c/0x28
 invoke_syscall+0x48/0x110
 el0_svc_common.constprop.0+0xc0/0xe0
 do_el0_svc+0x1c/0x28
 el0_svc+0x34/0xf0
 el0t_64_sync_handler+0xa0/0xe4
 el0t_64_sync+0x198/0x19c
 Code: 7100003f 1a9f07e1 531c6c22 f9400001 (79400021)
 ---[ end trace 0000000000000000 ]---
 note: sh[188] exited with irqs disabled
 note: sh[188] exited with preempt_count 1

The issue occurs because usbhs_sys_function_pullup(), which accesses the IP
registers, is executed after the USBHS clocks have been disabled. The
problem is reproducible on the Renesas RZ/G3S SoC starting with the
addition of module stop in the clock enable/disable APIs. With module stop
functionality enabled, a bus error is expected if a master accesses a
module whose clock has been stopped and module stop activated.

Disable the IP clocks at the end of remove.

Cc: stable <stable@kernel.org>
Fixes: f1407d5 ("usb: renesas_usbhs: Add Renesas USBHS common code")
Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
Link: https://patch.msgid.link/20251027140741.557198-1-claudiu.beznea.uj@bp.renesas.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
CVE-2025-68327
Signed-off-by: Bethany <bethany.jamison@canonical.com>
Signed-off-by: Edoardo Canepa <edoardo.canepa@canonical.com>
JiandiAnNVIDIA pushed a commit that referenced this pull request Apr 15, 2026
BugLink: https://bugs.launchpad.net/bugs/2138824

commit 3ce62c1 upstream.

[WHAT]
IGT kms_cursor_legacy's long-nonblocking-modeset-vs-cursor-atomic
fails with NULL pointer dereference. This can be reproduced with
both an eDP panel and a DP monitors connected.

 BUG: kernel NULL pointer dereference, address: 0000000000000000
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: Oops: 0000 [#1] SMP NOPTI
 CPU: 13 UID: 0 PID: 2960 Comm: kms_cursor_lega Not tainted
6.16.0-99-custom NVIDIA#8 PREEMPT(voluntary)
 Hardware name: AMD ........
 RIP: 0010:dc_stream_get_scanoutpos+0x34/0x130 [amdgpu]
 Code: 57 4d 89 c7 41 56 49 89 ce 41 55 49 89 d5 41 54 49
 89 fc 53 48 83 ec 18 48 8b 87 a0 64 00 00 48 89 75 d0 48 c7 c6 e0 41 30
 c2 <48> 8b 38 48 8b 9f 68 06 00 00 e8 8d d7 fd ff 31 c0 48 81 c3 e0 02
 RSP: 0018:ffffd0f3c2bd7608 EFLAGS: 00010292
 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffd0f3c2bd7668
 RDX: ffffd0f3c2bd7664 RSI: ffffffffc23041e0 RDI: ffff8b32494b8000
 RBP: ffffd0f3c2bd7648 R08: ffffd0f3c2bd766c R09: ffffd0f3c2bd7760
 R10: ffffd0f3c2bd7820 R11: 0000000000000000 R12: ffff8b32494b8000
 R13: ffffd0f3c2bd7664 R14: ffffd0f3c2bd7668 R15: ffffd0f3c2bd766c
 FS:  000071f631b68700(0000) GS:ffff8b399f114000(0000)
knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 00000001b8105000 CR4: 0000000000f50ef0
 PKRU: 55555554
 Call Trace:
 <TASK>
 dm_crtc_get_scanoutpos+0xd7/0x180 [amdgpu]
 amdgpu_display_get_crtc_scanoutpos+0x86/0x1c0 [amdgpu]
 ? __pfx_amdgpu_crtc_get_scanout_position+0x10/0x10[amdgpu]
 amdgpu_crtc_get_scanout_position+0x27/0x50 [amdgpu]
 drm_crtc_vblank_helper_get_vblank_timestamp_internal+0xf7/0x400
 drm_crtc_vblank_helper_get_vblank_timestamp+0x1c/0x30
 drm_crtc_get_last_vbltimestamp+0x55/0x90
 drm_crtc_next_vblank_start+0x45/0xa0
 drm_atomic_helper_wait_for_fences+0x81/0x1f0
 ...

Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 621e55f)
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
CVE-2025-68286
Signed-off-by: Bethany <bethany.jamison@canonical.com>
Signed-off-by: Edoardo Canepa <edoardo.canepa@canonical.com>
JiandiAnNVIDIA pushed a commit that referenced this pull request Apr 15, 2026
…em corrupted

BugLink: https://bugs.launchpad.net/bugs/2139373

commit 986835b upstream.

There's issue when file system corrupted:
------------[ cut here ]------------
kernel BUG at fs/jbd2/transaction.c:1289!
Oops: invalid opcode: 0000 [#1] SMP KASAN PTI
CPU: 5 UID: 0 PID: 2031 Comm: mkdir Not tainted 6.18.0-rc1-next
RIP: 0010:jbd2_journal_get_create_access+0x3b6/0x4d0
RSP: 0018:ffff888117aafa30 EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffff88811a86b000 RCX: ffffffff89a63534
RDX: 1ffff110200ec602 RSI: 0000000000000004 RDI: ffff888100763010
RBP: ffff888100763000 R08: 0000000000000001 R09: ffff888100763028
R10: 0000000000000003 R11: 0000000000000000 R12: 0000000000000000
R13: ffff88812c432000 R14: ffff88812c608000 R15: ffff888120bfc000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f91d6970c99 CR3: 00000001159c4000 CR4: 00000000000006f0
Call Trace:
 <TASK>
 __ext4_journal_get_create_access+0x42/0x170
 ext4_getblk+0x319/0x6f0
 ext4_bread+0x11/0x100
 ext4_append+0x1e6/0x4a0
 ext4_init_new_dir+0x145/0x1d0
 ext4_mkdir+0x326/0x920
 vfs_mkdir+0x45c/0x740
 do_mkdirat+0x234/0x2f0
 __x64_sys_mkdir+0xd6/0x120
 do_syscall_64+0x5f/0xfa0
 entry_SYSCALL_64_after_hwframe+0x76/0x7e

The above issue occurs with us in errors=continue mode when accompanied by
storage failures. There have been many inconsistencies in the file system
data.
In the case of file system data inconsistency, for example, if the block
bitmap of a referenced block is not set, it can lead to the situation where
a block being committed is allocated and used again. As a result, the
following condition will not be satisfied then trigger BUG_ON. Of course,
it is entirely possible to construct a problematic image that can trigger
this BUG_ON through specific operations. In fact, I have constructed such
an image and easily reproduced this issue.
Therefore, J_ASSERT() holds true only under ideal conditions, but it may
not necessarily be satisfied in exceptional scenarios. Using J_ASSERT()
directly in abnormal situations would cause the system to crash, which is
clearly not what we want. So here we directly trigger a JBD abort instead
of immediately invoking BUG_ON.

Fixes: 470decc ("[PATCH] jbd2: initial copy of files from jbd")
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Message-ID: <20251025072657.307851-1-yebin@huaweicloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Noah Wager <noah.wager@canonical.com>
Signed-off-by: Edoardo Canepa <edoardo.canepa@canonical.com>
JiandiAnNVIDIA pushed a commit that referenced this pull request Apr 15, 2026
BugLink: https://bugs.launchpad.net/bugs/2139373

commit 0cd8fee upstream.

Fix a race between inline data destruction and block mapping.

The function ext4_destroy_inline_data_nolock() changes the inode data
layout by clearing EXT4_INODE_INLINE_DATA and setting EXT4_INODE_EXTENTS.
At the same time, another thread may execute ext4_map_blocks(), which
tests EXT4_INODE_EXTENTS to decide whether to call ext4_ext_map_blocks()
or ext4_ind_map_blocks().

Without i_data_sem protection, ext4_ind_map_blocks() may receive inode
with EXT4_INODE_EXTENTS flag and triggering assert.

kernel BUG at fs/ext4/indirect.c:546!
EXT4-fs (loop2): unmounting filesystem.
invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
RIP: 0010:ext4_ind_map_blocks.cold+0x2b/0x5a fs/ext4/indirect.c:546

Call Trace:
 <TASK>
 ext4_map_blocks+0xb9b/0x16f0 fs/ext4/inode.c:681
 _ext4_get_block+0x242/0x590 fs/ext4/inode.c:822
 ext4_block_write_begin+0x48b/0x12c0 fs/ext4/inode.c:1124
 ext4_write_begin+0x598/0xef0 fs/ext4/inode.c:1255
 ext4_da_write_begin+0x21e/0x9c0 fs/ext4/inode.c:3000
 generic_perform_write+0x259/0x5d0 mm/filemap.c:3846
 ext4_buffered_write_iter+0x15b/0x470 fs/ext4/file.c:285
 ext4_file_write_iter+0x8e0/0x17f0 fs/ext4/file.c:679
 call_write_iter include/linux/fs.h:2271 [inline]
 do_iter_readv_writev+0x212/0x3c0 fs/read_write.c:735
 do_iter_write+0x186/0x710 fs/read_write.c:861
 vfs_iter_write+0x70/0xa0 fs/read_write.c:902
 iter_file_splice_write+0x73b/0xc90 fs/splice.c:685
 do_splice_from fs/splice.c:763 [inline]
 direct_splice_actor+0x10f/0x170 fs/splice.c:950
 splice_direct_to_actor+0x33a/0xa10 fs/splice.c:896
 do_splice_direct+0x1a9/0x280 fs/splice.c:1002
 do_sendfile+0xb13/0x12c0 fs/read_write.c:1255
 __do_sys_sendfile64 fs/read_write.c:1323 [inline]
 __se_sys_sendfile64 fs/read_write.c:1309 [inline]
 __x64_sys_sendfile64+0x1cf/0x210 fs/read_write.c:1309
 do_syscall_x64 arch/x86/entry/common.c:51 [inline]
 do_syscall_64+0x35/0x80 arch/x86/entry/common.c:81
 entry_SYSCALL_64_after_hwframe+0x6e/0xd8

Fixes: c755e25 ("ext4: fix deadlock between inline_data and ext4_expand_extra_isize_ea()")
Cc: stable@vger.kernel.org # v4.11+
Signed-off-by: Alexey Nepomnyashih <sdl@nppct.ru>
Message-ID: <20251104093326.697381-1-sdl@nppct.ru>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Noah Wager <noah.wager@canonical.com>
Signed-off-by: Edoardo Canepa <edoardo.canepa@canonical.com>
JiandiAnNVIDIA pushed a commit that referenced this pull request Apr 15, 2026
BugLink: https://bugs.launchpad.net/bugs/2139373

commit a51f025 upstream.

Syzbot identified an issue [1] in pcl818_ai_cancel(), which stems from
the fact that in case of early device detach via pcl818_detach(),
subdevice dev->read_subdev may not have initialized its pointer to
&struct comedi_async as intended. Thus, any such dereferencing of
&s->async->cmd will lead to general protection fault and kernel crash.

Mitigate this problem by removing a call to pcl818_ai_cancel() from
pcl818_detach() altogether. This way, if the subdevice setups its
support for async commands, everything async-related will be
handled via subdevice's own ->cancel() function in
comedi_device_detach_locked() even before pcl818_detach(). If no
support for asynchronous commands is provided, there is no need
to cancel anything either.

[1] Syzbot crash:
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f]
CPU: 1 UID: 0 PID: 6050 Comm: syz.0.18 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/18/2025
RIP: 0010:pcl818_ai_cancel+0x69/0x3f0 drivers/comedi/drivers/pcl818.c:762
...
Call Trace:
 <TASK>
 pcl818_detach+0x66/0xd0 drivers/comedi/drivers/pcl818.c:1115
 comedi_device_detach_locked+0x178/0x750 drivers/comedi/drivers.c:207
 do_devconfig_ioctl drivers/comedi/comedi_fops.c:848 [inline]
 comedi_unlocked_ioctl+0xcde/0x1020 drivers/comedi/comedi_fops.c:2178
 vfs_ioctl fs/ioctl.c:51 [inline]
 __do_sys_ioctl fs/ioctl.c:597 [inline]
...

Reported-by: syzbot+fce5d9d5bd067d6fbe9b@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=fce5d9d5bd067d6fbe9b
Fixes: 00aba6e ("staging: comedi: pcl818: remove 'neverending_ai' from private data")
Cc: stable <stable@kernel.org>
Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru>
Reviewed-by: Ian Abbott <abbotti@mev.co.uk>
Link: https://patch.msgid.link/20251023141457.398685-1-n.zhandarovich@fintech.ru
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Noah Wager <noah.wager@canonical.com>
Signed-off-by: Edoardo Canepa <edoardo.canepa@canonical.com>
JiandiAnNVIDIA pushed a commit that referenced this pull request Apr 15, 2026
BugLink: https://bugs.launchpad.net/bugs/2139960

[ Upstream commit 78b4d64 ]

A timer that expires a vgem fence automatically in 10 seconds is now
released with timer_delete_sync() from fence->ops.release() called on last
dma_fence_put().  In some scenarios, it can run in IRQ context, which is
not safe unless TIMER_IRQSAFE is used.  One potentially risky scenario was
demonstrated in Intel DRM CI trybot, BAT run on machine bat-adlp-6, while
working on new IGT subtests syncobj_timeline@stress-* as user space
replacements of some problematic test cases of a dma-fence-chain selftest
[1].

[117.004338] ================================
[117.004340] WARNING: inconsistent lock state
[117.004342] 6.17.0-rc7-CI_DRM_17270-g7644974e648c+ #1 Tainted: G S   U
[117.004346] --------------------------------
[117.004347] inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
[117.004349] swapper/0/0 [HC1[1]:SC1[1]:HE0:SE0] takes:
[117.004352] ffff888138f86aa8 ((&fence->timer)){?.-.}-{0:0}, at: __timer_delete_sync+0x4b/0x190
[117.004361] {HARDIRQ-ON-W} state was registered at:
[117.004363]   lock_acquire+0xc4/0x2e0
[117.004366]   call_timer_fn+0x80/0x2a0
[117.004368]   __run_timers+0x231/0x310
[117.004370]   run_timer_softirq+0x76/0xe0
[117.004372]   handle_softirqs+0xd4/0x4d0
[117.004375]   __irq_exit_rcu+0x13f/0x160
[117.004377]   irq_exit_rcu+0xe/0x20
[117.004379]   sysvec_apic_timer_interrupt+0xa0/0xc0
[117.004382]   asm_sysvec_apic_timer_interrupt+0x1b/0x20
[117.004385]   cpuidle_enter_state+0x12b/0x8a0
[117.004388]   cpuidle_enter+0x2e/0x50
[117.004393]   call_cpuidle+0x22/0x60
[117.004395]   do_idle+0x1fd/0x260
[117.004398]   cpu_startup_entry+0x29/0x30
[117.004401]   start_secondary+0x12d/0x160
[117.004404]   common_startup_64+0x13e/0x141
[117.004407] irq event stamp: 2282669
[117.004409] hardirqs last  enabled at (2282668): [<ffffffff8289db71>] _raw_spin_unlock_irqrestore+0x51/0x80
[117.004414] hardirqs last disabled at (2282669): [<ffffffff82882021>] sysvec_irq_work+0x11/0xc0
[117.004419] softirqs last  enabled at (2254702): [<ffffffff8289fd00>] __do_softirq+0x10/0x18
[117.004423] softirqs last disabled at (2254725): [<ffffffff813d4ddf>] __irq_exit_rcu+0x13f/0x160
[117.004426]
other info that might help us debug this:
[117.004429]  Possible unsafe locking scenario:
[117.004432]        CPU0
[117.004433]        ----
[117.004434]   lock((&fence->timer));
[117.004436]   <Interrupt>
[117.004438]     lock((&fence->timer));
[117.004440]
 *** DEADLOCK ***
[117.004443] 1 lock held by swapper/0/0:
[117.004445]  #0: ffffc90000003d50 ((&fence->timer)){?.-.}-{0:0}, at: call_timer_fn+0x7a/0x2a0
[117.004450]
stack backtrace:
[117.004453] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Tainted: G S   U              6.17.0-rc7-CI_DRM_17270-g7644974e648c+ #1 PREEMPT(voluntary)
[117.004455] Tainted: [S]=CPU_OUT_OF_SPEC, [U]=USER
[117.004455] Hardware name: Intel Corporation Alder Lake Client Platform/AlderLake-P DDR4 RVP, BIOS RPLPFWI1.R00.4035.A00.2301200723 01/20/2023
[117.004456] Call Trace:
[117.004456]  <IRQ>
[117.004457]  dump_stack_lvl+0x91/0xf0
[117.004460]  dump_stack+0x10/0x20
[117.004461]  print_usage_bug.part.0+0x260/0x360
[117.004463]  mark_lock+0x76e/0x9c0
[117.004465]  ? register_lock_class+0x48/0x4a0
[117.004467]  __lock_acquire+0xbc3/0x2860
[117.004469]  lock_acquire+0xc4/0x2e0
[117.004470]  ? __timer_delete_sync+0x4b/0x190
[117.004472]  ? __timer_delete_sync+0x4b/0x190
[117.004473]  __timer_delete_sync+0x68/0x190
[117.004474]  ? __timer_delete_sync+0x4b/0x190
[117.004475]  timer_delete_sync+0x10/0x20
[117.004476]  vgem_fence_release+0x19/0x30 [vgem]
[117.004478]  dma_fence_release+0xc1/0x3b0
[117.004480]  ? dma_fence_release+0xa1/0x3b0
[117.004481]  dma_fence_chain_release+0xe7/0x130
[117.004483]  dma_fence_release+0xc1/0x3b0
[117.004484]  ? _raw_spin_unlock_irqrestore+0x27/0x80
[117.004485]  dma_fence_chain_irq_work+0x59/0x80
[117.004487]  irq_work_single+0x75/0xa0
[117.004490]  irq_work_run_list+0x33/0x60
[117.004491]  irq_work_run+0x18/0x40
[117.004493]  __sysvec_irq_work+0x35/0x170
[117.004494]  sysvec_irq_work+0x47/0xc0
[117.004496]  asm_sysvec_irq_work+0x1b/0x20
[117.004497] RIP: 0010:_raw_spin_unlock_irqrestore+0x57/0x80
[117.004499] Code: 00 75 1c 65 ff 0d d9 34 68 01 74 20 5b 41 5c 5d 31 c0 31 d2 31 c9 31 f6 31 ff c3 cc cc cc cc e8 7f 9d d3 fe fb 0f 1f 44 00 00 <eb> d7 0f 1f 44 00 00 5b 41 5c 5d 31 c0 31 d2 31 c9 31 f6 31 ff c3
[117.004499] RSP: 0018:ffffc90000003cf0 EFLAGS: 00000246
[117.004500] RAX: 0000000000000000 RBX: ffff888155e94c40 RCX: 0000000000000000
[117.004501] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[117.004502] RBP: ffffc90000003d00 R08: 0000000000000000 R09: 0000000000000000
[117.004502] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000246
[117.004502] R13: 0000000000000001 R14: 0000000000000246 R15: ffff888155e94c80
[117.004506]  dma_fence_signal+0x49/0xb0
[117.004507]  ? __pfx_vgem_fence_timeout+0x10/0x10 [vgem]
[117.004508]  vgem_fence_timeout+0x12/0x20 [vgem]
[117.004509]  call_timer_fn+0xa1/0x2a0
[117.004512]  ? __pfx_vgem_fence_timeout+0x10/0x10 [vgem]
[117.004513]  __run_timers+0x231/0x310
[117.004514]  ? tmigr_handle_remote+0x2ac/0x560
[117.004517]  timer_expire_remote+0x46/0x70
[117.004518]  tmigr_handle_remote+0x433/0x560
[117.004520]  ? __run_timers+0x239/0x310
[117.004521]  ? run_timer_softirq+0x21/0xe0
[117.004522]  ? lock_release+0xce/0x2a0
[117.004524]  run_timer_softirq+0xcf/0xe0
[117.004525]  handle_softirqs+0xd4/0x4d0
[117.004526]  __irq_exit_rcu+0x13f/0x160
[117.004527]  irq_exit_rcu+0xe/0x20
[117.004528]  sysvec_apic_timer_interrupt+0xa0/0xc0
[117.004529]  </IRQ>
[117.004529]  <TASK>
[117.004529]  asm_sysvec_apic_timer_interrupt+0x1b/0x20
[117.004530] RIP: 0010:cpuidle_enter_state+0x12b/0x8a0
[117.004532] Code: 48 0f a3 05 97 ce 0e 01 0f 82 2e 03 00 00 31 ff e8 8a 41 bd fe 80 7d d0 00 0f 85 11 03 00 00 e8 8b 06 d5 fe fb 0f 1f 44 00 00 <45> 85 f6 0f 88 67 02 00 00 4d 63 ee 49 83 fd 0a 0f 83 34 06 00 00
[117.004532] RSP: 0018:ffffffff83403d88 EFLAGS: 00000246
[117.004533] RAX: 0000000000000000 RBX: ffff88888f046440 RCX: 0000000000000000
[117.004533] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[117.004534] RBP: ffffffff83403dd8 R08: 0000000000000000 R09: 0000000000000000
[117.004534] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff837cbe80
[117.004534] R13: 0000000000000004 R14: 0000000000000004 R15: 0000001ad1df466b
[117.004537]  ? cpuidle_enter_state+0x125/0x8a0
[117.004538]  ? sched_clock_noinstr+0x9/0x10
[117.004540]  cpuidle_enter+0x2e/0x50
[117.004542]  call_cpuidle+0x22/0x60
[117.004542]  do_idle+0x1fd/0x260
[117.004544]  cpu_startup_entry+0x29/0x30
[117.004546]  rest_init+0x104/0x200
[117.004548]  start_kernel+0x93d/0xbd0
[117.004550]  ? load_ucode_intel_bsp+0x2a/0x90
[117.004551]  ? sme_unmap_bootdata+0x14/0x80
[117.004554]  x86_64_start_reservations+0x18/0x30
[117.004555]  x86_64_start_kernel+0xfd/0x150
[117.004556]  ? soft_restart_cpu+0x14/0x14
[117.004558]  common_startup_64+0x13e/0x141
[117.004560]  </TASK>
[117.004565] ------------[ cut here ]------------
[117.004692] WARNING: CPU: 0 PID: 0 at kernel/time/timer.c:1610 __timer_delete_sync+0x126/0x190
[117.004697] Modules linked in: vgem snd_hda_codec_intelhdmi snd_hda_codec_hdmi i915 prime_numbers ttm drm_buddy drm_display_helper cec rc_core i2c_algo_bit hid_sensor_custom hid_sensor_hub hid_generic intel_ishtp_hid hid intel_uncore_frequency intel_uncore_frequency_common x86_pkg_temp_thermal intel_powerclamp cmdlinepart ee1004 r8153_ecm spi_nor coretemp cdc_ether mei_pxp mei_hdcp usbnet mtd intel_rapl_msr wmi_bmof kvm_intel snd_hda_intel snd_intel_dspcfg processor_thermal_device_pci kvm snd_hda_codec processor_thermal_device irqbypass processor_thermal_wt_hint polyval_clmulni platform_temperature_control snd_hda_core ghash_clmulni_intel processor_thermal_rfim spi_pxa2xx_platform snd_hwdep aesni_intel processor_thermal_rapl dw_dmac snd_pcm dw_dmac_core intel_rapl_common r8152 rapl mii intel_cstate spi_pxa2xx_core i2c_i801 processor_thermal_wt_req snd_timer i2c_mux mei_me intel_ish_ipc processor_thermal_power_floor e1000e snd i2c_smbus spi_intel_pci processor_thermal_mbox mei soundcore intel_ishtp thunderbolt idma64
[117.004733]  spi_intel int340x_thermal_zone igen6_edac binfmt_misc intel_skl_int3472_tps68470 intel_pmc_core tps68470_regulator video clk_tps68470 pmt_telemetry pmt_discovery nls_iso8859_1 pmt_class intel_pmc_ssram_telemetry intel_skl_int3472_discrete int3400_thermal intel_hid intel_skl_int3472_common acpi_thermal_rel intel_vsec wmi pinctrl_tigerlake acpi_tad sparse_keymap acpi_pad dm_multipath msr nvme_fabrics fuse efi_pstore nfnetlink autofs4
[117.004782] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Tainted: G S   U              6.17.0-rc7-CI_DRM_17270-g7644974e648c+ #1 PREEMPT(voluntary)
[117.004787] Tainted: [S]=CPU_OUT_OF_SPEC, [U]=USER
[117.004789] Hardware name: Intel Corporation Alder Lake Client Platform/AlderLake-P DDR4 RVP, BIOS RPLPFWI1.R00.4035.A00.2301200723 01/20/2023
[117.004793] RIP: 0010:__timer_delete_sync+0x126/0x190
[117.004795] Code: 31 c0 45 31 c9 c3 cc cc cc cc 48 8b 75 d0 45 84 f6 74 63 49 c7 45 18 00 00 00 00 48 89 c7 e8 51 46 39 01 f3 90 e9 66 ff ff ff <0f> 0b e9 5f ff ff ff e8 ee e4 0c 00 49 8d 5d 28 45 31 c9 31 c9 4c
[117.004801] RSP: 0018:ffffc90000003a40 EFLAGS: 00010046
[117.004804] RAX: ffffffff815093fb RBX: ffff888138f86aa8 RCX: 0000000000000000
[117.004807] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[117.004809] RBP: ffffc90000003a70 R08: 0000000000000000 R09: 0000000000000000
[117.004812] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff815093fb
[117.004814] R13: ffff888138f86a80 R14: 0000000000000000 R15: 0000000000000000
[117.004817] FS:  0000000000000000(0000) GS:ffff88890b0f7000(0000) knlGS:0000000000000000
[117.004820] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[117.004823] CR2: 00005db8131eb7f0 CR3: 0000000003448000 CR4: 0000000000f52ef0
[117.004826] PKRU: 55555554
[117.004827] Call Trace:
[117.004829]  <IRQ>
[117.004831]  timer_delete_sync+0x10/0x20
[117.004833]  vgem_fence_release+0x19/0x30 [vgem]
[117.004836]  dma_fence_release+0xc1/0x3b0
[117.004838]  ? dma_fence_release+0xa1/0x3b0
[117.004841]  dma_fence_chain_release+0xe7/0x130
[117.004844]  dma_fence_release+0xc1/0x3b0
[117.004847]  ? _raw_spin_unlock_irqrestore+0x27/0x80
[117.004850]  dma_fence_chain_irq_work+0x59/0x80
[117.004853]  irq_work_single+0x75/0xa0
[117.004857]  irq_work_run_list+0x33/0x60
[117.004860]  irq_work_run+0x18/0x40
[117.004863]  __sysvec_irq_work+0x35/0x170
[117.004865]  sysvec_irq_work+0x47/0xc0
[117.004868]  asm_sysvec_irq_work+0x1b/0x20
[117.004871] RIP: 0010:_raw_spin_unlock_irqrestore+0x57/0x80
[117.004874] Code: 00 75 1c 65 ff 0d d9 34 68 01 74 20 5b 41 5c 5d 31 c0 31 d2 31 c9 31 f6 31 ff c3 cc cc cc cc e8 7f 9d d3 fe fb 0f 1f 44 00 00 <eb> d7 0f 1f 44 00 00 5b 41 5c 5d 31 c0 31 d2 31 c9 31 f6 31 ff c3
[117.004879] RSP: 0018:ffffc90000003cf0 EFLAGS: 00000246
[117.004882] RAX: 0000000000000000 RBX: ffff888155e94c40 RCX: 0000000000000000
[117.004884] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[117.004887] RBP: ffffc90000003d00 R08: 0000000000000000 R09: 0000000000000000
[117.004890] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000246
[117.004892] R13: 0000000000000001 R14: 0000000000000246 R15: ffff888155e94c80
[117.004897]  dma_fence_signal+0x49/0xb0
[117.004899]  ? __pfx_vgem_fence_timeout+0x10/0x10 [vgem]
[117.004902]  vgem_fence_timeout+0x12/0x20 [vgem]
[117.004904]  call_timer_fn+0xa1/0x2a0
[117.004908]  ? __pfx_vgem_fence_timeout+0x10/0x10 [vgem]
[117.004910]  __run_timers+0x231/0x310
[117.004913]  ? tmigr_handle_remote+0x2ac/0x560
[117.004917]  timer_expire_remote+0x46/0x70
[117.004919]  tmigr_handle_remote+0x433/0x560
[117.004923]  ? __run_timers+0x239/0x310
[117.004925]  ? run_timer_softirq+0x21/0xe0
[117.004928]  ? lock_release+0xce/0x2a0
[117.004931]  run_timer_softirq+0xcf/0xe0
[117.004933]  handle_softirqs+0xd4/0x4d0
[117.004936]  __irq_exit_rcu+0x13f/0x160
[117.004938]  irq_exit_rcu+0xe/0x20
[117.004940]  sysvec_apic_timer_interrupt+0xa0/0xc0
[117.004943]  </IRQ>
[117.004944]  <TASK>
[117.004946]  asm_sysvec_apic_timer_interrupt+0x1b/0x20
[117.004949] RIP: 0010:cpuidle_enter_state+0x12b/0x8a0
[117.004953] Code: 48 0f a3 05 97 ce 0e 01 0f 82 2e 03 00 00 31 ff e8 8a 41 bd fe 80 7d d0 00 0f 85 11 03 00 00 e8 8b 06 d5 fe fb 0f 1f 44 00 00 <45> 85 f6 0f 88 67 02 00 00 4d 63 ee 49 83 fd 0a 0f 83 34 06 00 00
[117.004961] RSP: 0018:ffffffff83403d88 EFLAGS: 00000246
[117.004963] RAX: 0000000000000000 RBX: ffff88888f046440 RCX: 0000000000000000
[117.004966] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[117.004968] RBP: ffffffff83403dd8 R08: 0000000000000000 R09: 0000000000000000
[117.004971] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff837cbe80
[117.004974] R13: 0000000000000004 R14: 0000000000000004 R15: 0000001ad1df466b
[117.004978]  ? cpuidle_enter_state+0x125/0x8a0
[117.004981]  ? sched_clock_noinstr+0x9/0x10
[117.004985]  cpuidle_enter+0x2e/0x50
[117.004989]  call_cpuidle+0x22/0x60
[117.004991]  do_idle+0x1fd/0x260
[117.005001]  cpu_startup_entry+0x29/0x30
[117.005004]  rest_init+0x104/0x200
[117.005008]  start_kernel+0x93d/0xbd0
[117.005011]  ? load_ucode_intel_bsp+0x2a/0x90
[117.005014]  ? sme_unmap_bootdata+0x14/0x80
[117.005017]  x86_64_start_reservations+0x18/0x30
[117.005020]  x86_64_start_kernel+0xfd/0x150
[117.005023]  ? soft_restart_cpu+0x14/0x14
[117.005026]  common_startup_64+0x13e/0x141
[117.005030]  </TASK>
[117.005032] irq event stamp: 2282669
[117.005034] hardirqs last  enabled at (2282668): [<ffffffff8289db71>] _raw_spin_unlock_irqrestore+0x51/0x80
[117.005038] hardirqs last disabled at (2282669): [<ffffffff82882021>] sysvec_irq_work+0x11/0xc0
[117.005043] softirqs last  enabled at (2254702): [<ffffffff8289fd00>] __do_softirq+0x10/0x18
[117.005047] softirqs last disabled at (2254725): [<ffffffff813d4ddf>] __irq_exit_rcu+0x13f/0x160
[117.005051] ---[ end trace 0000000000000000 ]---

Make the timer IRQ safe.

[1] https://patchwork.freedesktop.org/series/154987/#rev2

Fixes: 4077798 ("drm/vgem: Attach sw fences to exported vGEM dma-buf (ioctl)")
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://lore.kernel.org/r/20250926152628.2165080-2-janusz.krzysztofik@linux.intel.com
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
Signed-off-by: Sasha Levin <sashal@kernel.org>
CVE-2025-68757
Signed-off-by: Manuel Diewald <manuel.diewald@canonical.com>
Signed-off-by: Edoardo Canepa <edoardo.canepa@canonical.com>
JiandiAnNVIDIA pushed a commit that referenced this pull request Apr 15, 2026
BugLink: https://bugs.launchpad.net/bugs/2139960

[ Upstream commit 163e5f2 ]

When using perf record with the `--overwrite` option, a segmentation fault
occurs if an event fails to open. For example:

  perf record -e cycles-ct -F 1000 -a --overwrite
  Error:
  cycles-ct:H: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat'
  perf: Segmentation fault
      #0 0x6466b6 in dump_stack debug.c:366
      #1 0x646729 in sighandler_dump_stack debug.c:378
      #2 0x453fd1 in sigsegv_handler builtin-record.c:722
      NVIDIA#3 0x7f8454e65090 in __restore_rt libc-2.32.so[54090]
      NVIDIA#4 0x6c5671 in __perf_event__synthesize_id_index synthetic-events.c:1862
      NVIDIA#5 0x6c5ac0 in perf_event__synthesize_id_index synthetic-events.c:1943
      NVIDIA#6 0x458090 in record__synthesize builtin-record.c:2075
      NVIDIA#7 0x45a85a in __cmd_record builtin-record.c:2888
      NVIDIA#8 0x45deb6 in cmd_record builtin-record.c:4374
      NVIDIA#9 0x4e5e33 in run_builtin perf.c:349
      NVIDIA#10 0x4e60bf in handle_internal_command perf.c:401
      NVIDIA#11 0x4e6215 in run_argv perf.c:448
      NVIDIA#12 0x4e653a in main perf.c:555
      NVIDIA#13 0x7f8454e4fa72 in __libc_start_main libc-2.32.so[3ea72]
      NVIDIA#14 0x43a3ee in _start ??:0

The --overwrite option implies --tail-synthesize, which collects non-sample
events reflecting the system status when recording finishes. However, when
evsel opening fails (e.g., unsupported event 'cycles-ct'), session->evlist
is not initialized and remains NULL. The code unconditionally calls
record__synthesize() in the error path, which iterates through the NULL
evlist pointer and causes a segfault.

To fix it, move the record__synthesize() call inside the error check block, so
it's only called when there was no error during recording, ensuring that evlist
is properly initialized.

Fixes: 4ea648a ("perf record: Add --tail-synthesize option")
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Manuel Diewald <manuel.diewald@canonical.com>
Signed-off-by: Edoardo Canepa <edoardo.canepa@canonical.com>
JiandiAnNVIDIA pushed a commit that referenced this pull request Apr 15, 2026
BugLink: https://bugs.launchpad.net/bugs/2139960

[ Upstream commit 23b2d2f ]

When booting with KASAN enabled the following splat is
encountered during probe of the k1 clock driver:

UBSAN: array-index-out-of-bounds in drivers/clk/spacemit/ccu-k1.c:1044:16
index 0 is out of range for type 'clk_hw *[*]'
CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.18.0-rc5+ #1 PREEMPT(lazy)
Hardware name: Unknown Unknown Product/Unknown Product, BIOS 2022.10spacemit 10/01/2022
Call Trace:
[<ffffffff8002b628>] dump_backtrace+0x28/0x38
[<ffffffff800027d2>] show_stack+0x3a/0x50
[<ffffffff800220c2>] dump_stack_lvl+0x5a/0x80
[<ffffffff80022100>] dump_stack+0x18/0x20
[<ffffffff800164b8>] ubsan_epilogue+0x10/0x48
[<ffffffff8099034e>] __ubsan_handle_out_of_bounds+0xa6/0xa8
[<ffffffff80acbfa6>] k1_ccu_probe+0x37e/0x420
[<ffffffff80b79e6e>] platform_probe+0x56/0x98
[<ffffffff80b76a7e>] really_probe+0x9e/0x350
[<ffffffff80b76db0>] __driver_probe_device+0x80/0x138
[<ffffffff80b76f52>] driver_probe_device+0x3a/0xd0
[<ffffffff80b771c4>] __driver_attach+0xac/0x1b8
[<ffffffff80b742fc>] bus_for_each_dev+0x6c/0xc8
[<ffffffff80b76296>] driver_attach+0x26/0x38
[<ffffffff80b759ae>] bus_add_driver+0x13e/0x268
[<ffffffff80b7836a>] driver_register+0x52/0x100
[<ffffffff80b79a78>] __platform_driver_register+0x28/0x38
[<ffffffff814585da>] k1_ccu_driver_init+0x22/0x38
[<ffffffff80023a8a>] do_one_initcall+0x62/0x2a0
[<ffffffff81401c60>] do_initcalls+0x170/0x1a8
[<ffffffff81401e7a>] kernel_init_freeable+0x16a/0x1e0
[<ffffffff811f7534>] kernel_init+0x2c/0x180
[<ffffffff80025f56>] ret_from_fork_kernel+0x16/0x1d8
[<ffffffff81205336>] ret_from_fork_kernel_asm+0x16/0x18
---[ end trace ]---

This is bogus and is simply a result of KASAN consulting the
`.num` member of the struct for bounds information (as it should
due to `__counted_by`) and finding 0 set by kzalloc() because it
has not been initialized before the loop that fills in the array.
The easy fix is to just move the line that sets `num` to before
the loop that fills the array so that KASAN has the information
it needs to accurately conclude that the access is valid.

Fixes: 1b72c59 ("clk: spacemit: Add clock support for SpacemiT K1 SoC")
Tested-by: Yanko Kaneti <yaneti@declera.com>
Signed-off-by: Charles Mirabile <cmirabil@redhat.com>
Reviewed-by: Alex Elder <elder@riscstar.com>
Reviewed-by: Troy Mitchell <troy.mitchell@linux.spacemit.com>
Reviewed-by: Yixun Lan <dlan@gentoo.org>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Manuel Diewald <manuel.diewald@canonical.com>
Signed-off-by: Edoardo Canepa <edoardo.canepa@canonical.com>
JiandiAnNVIDIA pushed a commit that referenced this pull request Apr 15, 2026
BugLink: https://bugs.launchpad.net/bugs/2139960

[ Upstream commit 385aab8 ]

MT7996 driver can use both wed and wed_hif2 devices to offload traffic
from/to the wireless NIC. In the current codebase we assume to always
use the primary wed device in wed callbacks resulting in the following
crash if the hw runs wed_hif2 (e.g. 6GHz link).

[  297.455876] Unable to handle kernel read from unreadable memory at virtual address 000000000000080a
[  297.464928] Mem abort info:
[  297.467722]   ESR = 0x0000000096000005
[  297.471461]   EC = 0x25: DABT (current EL), IL = 32 bits
[  297.476766]   SET = 0, FnV = 0
[  297.479809]   EA = 0, S1PTW = 0
[  297.482940]   FSC = 0x05: level 1 translation fault
[  297.487809] Data abort info:
[  297.490679]   ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
[  297.496156]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[  297.501196]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[  297.506500] user pgtable: 4k pages, 39-bit VAs, pgdp=0000000107480000
[  297.512927] [000000000000080a] pgd=08000001097fb003, p4d=08000001097fb003, pud=08000001097fb003, pmd=0000000000000000
[  297.523532] Internal error: Oops: 0000000096000005 [#1] SMP
[  297.715393] CPU: 2 UID: 0 PID: 45 Comm: kworker/u16:2 Tainted: G           O       6.12.50 #0
[  297.723908] Tainted: [O]=OOT_MODULE
[  297.727384] Hardware name: Banana Pi BPI-R4 (2x SFP+) (DT)
[  297.732857] Workqueue: nf_ft_offload_del nf_flow_rule_route_ipv6 [nf_flow_table]
[  297.740254] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  297.747205] pc : mt76_wed_offload_disable+0x64/0xa0 [mt76]
[  297.752688] lr : mtk_wed_flow_remove+0x58/0x80
[  297.757126] sp : ffffffc080fe3ae0
[  297.760430] x29: ffffffc080fe3ae0 x28: ffffffc080fe3be0 x27: 00000000deadbef7
[  297.767557] x26: ffffff80c5ebca00 x25: 0000000000000001 x24: ffffff80c85f4c00
[  297.774683] x23: ffffff80c1875b78 x22: ffffffc080d42cd0 x21: ffffffc080660018
[  297.781809] x20: ffffff80c6a076d0 x19: ffffff80c6a043c8 x18: 0000000000000000
[  297.788935] x17: 0000000000000000 x16: 0000000000000001 x15: 0000000000000000
[  297.796060] x14: 0000000000000019 x13: ffffff80c0ad8ec0 x12: 00000000fa83b2da
[  297.803185] x11: ffffff80c02700c0 x10: ffffff80c0ad8ec0 x9 : ffffff81fef96200
[  297.810311] x8 : ffffff80c02700c0 x7 : ffffff80c02700d0 x6 : 0000000000000002
[  297.817435] x5 : 0000000000000400 x4 : 0000000000000000 x3 : 0000000000000000
[  297.824561] x2 : 0000000000000001 x1 : 0000000000000800 x0 : ffffff80c6a063c8
[  297.831686] Call trace:
[  297.834123]  mt76_wed_offload_disable+0x64/0xa0 [mt76]
[  297.839254]  mtk_wed_flow_remove+0x58/0x80
[  297.843342]  mtk_flow_offload_cmd+0x434/0x574
[  297.847689]  mtk_wed_setup_tc_block_cb+0x30/0x40
[  297.852295]  nf_flow_offload_ipv6_hook+0x7f4/0x964 [nf_flow_table]
[  297.858466]  nf_flow_rule_route_ipv6+0x438/0x4a4 [nf_flow_table]
[  297.864463]  process_one_work+0x174/0x300
[  297.868465]  worker_thread+0x278/0x430
[  297.872204]  kthread+0xd8/0xdc
[  297.875251]  ret_from_fork+0x10/0x20
[  297.878820] Code: 928b5ae0 8b000273 91400a60 f943fa61 (79401421)
[  297.884901] ---[ end trace 0000000000000000 ]---

Fix the issue detecting the proper wed reference to use running wed
callabacks.

Fixes: 83eafc9 ("wifi: mt76: mt7996: add wed tx support")
Tested-by: Daniel Pawlik <pawlik.dan@gmail.com>
Tested-by: Matteo Croce <teknoraver@meta.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20251008-wed-fixes-v1-1-8f7678583385@kernel.org
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Sasha Levin <sashal@kernel.org>
CVE-2025-68360
Signed-off-by: Manuel Diewald <manuel.diewald@canonical.com>
Signed-off-by: Edoardo Canepa <edoardo.canepa@canonical.com>
JiandiAnNVIDIA pushed a commit that referenced this pull request Apr 15, 2026
BugLink: https://bugs.launchpad.net/bugs/2139960

[ Upstream commit ccb61a3 ]

kbd_led_set() can sleep, and so may not be used as the brightness_set()
callback.

Otherwise using this led with a trigger leads to system hangs
accompanied by:
BUG: scheduling while atomic: acpi_fakekeyd/2588/0x00000003
CPU: 4 UID: 0 PID: 2588 Comm: acpi_fakekeyd Not tainted 6.17.9+deb14-amd64 #1 PREEMPT(lazy)  Debian 6.17.9-1
Hardware name: ASUSTeK COMPUTER INC. ASUS EXPERTBOOK B9403CVAR/B9403CVAR, BIOS B9403CVAR.311 12/24/2024
Call Trace:
 <TASK>
 [...]
 schedule_timeout+0xbd/0x100
 __down_common+0x175/0x290
 down_timeout+0x67/0x70
 acpi_os_wait_semaphore+0x57/0x90
 [...]
 asus_wmi_evaluate_method3+0x87/0x190 [asus_wmi]
 led_trigger_event+0x3f/0x60
 [...]

Fixes: 9fe44fc ("platform/x86: asus-wmi: Simplify the keyboard brightness updating process")
Signed-off-by: Anton Khirnov <anton@khirnov.net>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Denis Benato <benato.denis96@gmail.com>
Link: https://patch.msgid.link/20251129101307.18085-3-anton@khirnov.net
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Manuel Diewald <manuel.diewald@canonical.com>
Signed-off-by: Edoardo Canepa <edoardo.canepa@canonical.com>
JiandiAnNVIDIA pushed a commit that referenced this pull request Apr 15, 2026
BugLink: https://bugs.launchpad.net/bugs/2139960

[ Upstream commit d84e47e ]

Since commit a735831 ("drm/nouveau: vendor in drm_encoder_slave API")
nouveau appears to be broken for all dispnv04 GPUs (before NV50). Depending
on the kernel version, either having no display output and hanging in
kernel for a long time, or even oopsing in the cleanup path like:

Hardware name: PowerMac11,2 PPC970MP 0x440101 PowerMac
...
nouveau 0000:0a:00.0: drm: 0x14C5: Parsing digital output script table
BUG: Unable to handle kernel data access on read at 0x00041520
Faulting instruction address: 0xc0003d0001be0844
Oops: Kernel access of bad area, sig: 11 [#1]
BE PAGE_SIZE=4K MMU=Hash  SMP NR_CPUS=8 NUMA PowerMac
Modules linked in: windfarm_cpufreq_clamp windfarm_smu_sensors windfarm_smu_controls windfarm_pm112 snd_aoa_codec_onyx snd_aoa_fabric_layout snd_aoa windfarm_pid jo
 apple_mfi_fastcharge rndis_host cdc_ether usbnet mii snd_aoa_i2sbus snd_aoa_soundbus snd_pcm snd_timer snd soundcore rack_meter windfarm_smu_sat windfarm_max6690_s
m75_sensor windfarm_core gpu_sched drm_gpuvm drm_exec drm_client_lib drm_ttm_helper ttm drm_display_helper drm_kms_helper drm drm_panel_orientation_quirks syscopyar
_sys_fops i2c_algo_bit backlight uio_pdrv_genirq uio uninorth_agp agpgart zram dm_mod dax ipv6 nfsv4 dns_resolver nfs lockd grace sunrpc offb cfbfillrect cfbimgblt
ont input_leds sr_mod cdrom sd_mod uas ata_generic hid_apple hid_generic usbhid hid usb_storage pata_macio sata_svw libata firewire_ohci scsi_mod firewire_core ohci
ehci_pci ehci_hcd tg3 ohci_hcd libphy usbcore usb_common nls_base
 led_class
CPU: 0 UID: 0 PID: 245 Comm: (udev-worker) Not tainted 6.14.0-09584-g7d06015d936c NVIDIA#7 PREEMPTLAZY
Hardware name: PowerMac11,2 PPC970MP 0x440101 PowerMac
NIP:  c0003d0001be0844 LR: c0003d0001be0830 CTR: 0000000000000000
REGS: c0000000053f70e0 TRAP: 0300   Not tainted  (6.14.0-09584-g7d06015d936c)
MSR:  9000000000009032 <SF,HV,EE,ME,IR,DR,RI>  CR: 24222220  XER: 00000000
DAR: 0000000000041520 DSISR: 40000000 IRQMASK: 0 \x0aGPR00: c0003d0001be0830 c0000000053f7380 c0003d0000911900 c000000007bc6800 \x0aGPR04: 0000000000000000 0000000000000000 c000000007bc6e70 0000000000000001 \x0aGPR08: 01f3040000000000 0000000000041520 0000000000000000 c0003d0000813958 \x0aGPR12: c000000000071a48 c000000000e28000 0000000000000020 0000000000000000 \x0aGPR16: 0000000000000000 0000000000f52630 0000000000000000 0000000000000000 \x0aGPR20: 0000000000000000 0000000000000000 0000000000000001 c0003d0000928528 \x0aGPR24: c0003d0000928598 0000000000000000 c000000007025480 c000000007025480 \x0aGPR28: c0000000010b4000 0000000000000000 c000000007bc1800 c000000007bc6800
NIP [c0003d0001be0844] nv_crtc_destroy+0x44/0xd4 [nouveau]
LR [c0003d0001be0830] nv_crtc_destroy+0x30/0xd4 [nouveau]
Call Trace:
[c0000000053f7380] [c0003d0001be0830] nv_crtc_destroy+0x30/0xd4 [nouveau] (unreliable)
[c0000000053f73c0] [c0003d00007f7bf4] drm_mode_config_cleanup+0x27c/0x30c [drm]
[c0000000053f7490] [c0003d0001bdea50] nouveau_display_create+0x1cc/0x550 [nouveau]
[c0000000053f7500] [c0003d0001bcc29c] nouveau_drm_device_init+0x1c8/0x844 [nouveau]
[c0000000053f75e0] [c0003d0001bcc9ec] nouveau_drm_probe+0xd4/0x1e0 [nouveau]
[c0000000053f7670] [c000000000557d24] local_pci_probe+0x50/0xa8
[c0000000053f76f0] [c000000000557fa8] pci_device_probe+0x22c/0x240
[c0000000053f7760] [c0000000005fff3c] really_probe+0x188/0x31c
[c0000000053f77e0] [c000000000600204] __driver_probe_device+0x134/0x13c
[c0000000053f7860] [c0000000006002c0] driver_probe_device+0x3c/0xb4
[c0000000053f78a0] [c000000000600534] __driver_attach+0x118/0x128
[c0000000053f78e0] [c0000000005fe038] bus_for_each_dev+0xa8/0xf4
[c0000000053f7950] [c0000000005ff460] driver_attach+0x2c/0x40
[c0000000053f7970] [c0000000005fea68] bus_add_driver+0x130/0x278
[c0000000053f7a00] [c00000000060117c] driver_register+0x9c/0x1a0
[c0000000053f7a80] [c00000000055623c] __pci_register_driver+0x5c/0x70
[c0000000053f7aa0] [c0003d0001c058a0] nouveau_drm_init+0x254/0x278 [nouveau]
[c0000000053f7b10] [c00000000000e9bc] do_one_initcall+0x84/0x268
[c0000000053f7bf0] [c0000000001a0ba0] do_init_module+0x70/0x2d8
[c0000000053f7c70] [c0000000001a42bc] init_module_from_file+0xb4/0x108
[c0000000053f7d50] [c0000000001a4504] sys_finit_module+0x1ac/0x478
[c0000000053f7e10] [c000000000023230] system_call_exception+0x1a4/0x20c
[c0000000053f7e50] [c00000000000c554] system_call_common+0xf4/0x258
 --- interrupt: c00 at 0xfd5f988
NIP:  000000000fd5f988 LR: 000000000ff9b148 CTR: 0000000000000000
REGS: c0000000053f7e80 TRAP: 0c00   Not tainted  (6.14.0-09584-g7d06015d936c)
MSR:  100000000000d032 <HV,EE,PR,ME,IR,DR,RI>  CR: 28222244  XER: 00000000
IRQMASK: 0 \x0aGPR00: 0000000000000161 00000000ffcdc2d0 00000000405db160 0000000000000020 \x0aGPR04: 000000000ffa2c9c 0000000000000000 000000000000001f 0000000000000045 \x0aGPR08: 0000000011a13770 0000000000000000 0000000000000000 0000000000000000 \x0aGPR12: 0000000000000000 0000000010249d8c 0000000000000020 0000000000000000 \x0aGPR16: 0000000000000000 0000000000f52630 0000000000000000 0000000000000000 \x0aGPR20: 0000000000000000 0000000000000000 0000000000000000 0000000011a11a70 \x0aGPR24: 0000000011a13580 0000000011a11950 0000000011a11a70 0000000000020000 \x0aGPR28: 000000000ffa2c9c 0000000000000000 000000000ffafc40 0000000011a11a70
NIP [000000000fd5f988] 0xfd5f988
LR [000000000ff9b148] 0xff9b148
 --- interrupt: c00
Code: f821ffc1 418200ac e93f0000 e9290038 e9291468 eba90000 48026c0d e8410018 e93f06aa 3d290001 392982a4 79291f24 <7fdd482a> 2c3e0000 41820030 7fc3f378
 ---[ end trace 0000000000000000 ]---

This is caused by the i2c encoder modules vendored into nouveau/ now
depending on the equally vendored nouveau_i2c_encoder_destroy
function. Trying to auto-load this modules hangs on nouveau
initialization until timeout, and nouveau continues without i2c video
encoders.

Fix by avoiding nouveau dependency by __always_inlining that helper
functions into those i2c video encoder modules.

Fixes: a735831 ("drm/nouveau: vendor in drm_encoder_slave API")
Signed-off-by: René Rebe <rene@exactco.de>
Reviewed-by: Lyude Paul <lyude@redhat.com>
[Lyude: fixed commit reference in description]
Signed-off-by: Lyude Paul <lyude@redhat.com>
Link: https://patch.msgid.link/20251202.164952.2216481867721531616.rene@exactco.de
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Manuel Diewald <manuel.diewald@canonical.com>
Signed-off-by: Edoardo Canepa <edoardo.canepa@canonical.com>
JiandiAnNVIDIA pushed a commit that referenced this pull request Apr 15, 2026
…g connector cleanup

BugLink: https://bugs.launchpad.net/bugs/2127764

During UCSI initialization and operation, there is a race condition where
delayed work items can be scheduled but attempt to queue work after the
workqueue has been destroyed. This occurs in multiple code paths.

The race occurs when:
1. ucsi_partner_task() or ucsi_poll_worker() schedule delayed work
2. Connector cleanup paths call destroy_workqueue()
3. Previously scheduled delayed work timers fire after destruction
4. This triggers warnings and crashes in __queue_work()

The issue is timing-sensitive and typically manifests when:
- Port registration fails due to PPM timing issues
- System shutdown/cleanup occurs with pending delayed work
- Module removal races with active delayed work

[  170.605181] ucsi_acpi USBC000:00: con2: failed to register alt modes
[  181.868900] ------------[ cut here ]------------
[  181.868905] workqueue: cannot queue ucsi_poll_worker [typec_ucsi] on wq USBC000:00-con1
[  181.868918] WARNING: CPU: 1 PID: 0 at kernel/workqueue.c:2255 __queue_work+0x420/0x5a0
...
[  181.869062] CPU: 1 UID: 0 PID: 0 Comm: swapper/1 Not tainted 6.17.0-rc7+ #1 PREEMPT(voluntary)
[  181.869065] Hardware name: Dell Inc. , BIOS xx.xx.xx xx/xx/2025
[  181.869067] RIP: 0010:__queue_work+0x420/0x5a0
[  181.869070] Code: 00 00 41 83 e4 01 0f 85 57 fd ff ff 49 8b 77 18 48 8d 93 c0 00 00 00 48 c7 c7 00 8c bc 92 c6 05 27 47 68 02 01 e8 50 24 fd f
f <0f> 0b e9 32 fd ff ff 0f 0b e9 1d fd ff ff 0f 0b e9 0f fd ff ff 0f
[  181.869072] RSP: 0018:ffffd53c000acdf8 EFLAGS: 00010046
[  181.869075] RAX: 0000000000000000 RBX: ffff8ecd0727f200 RCX: 0000000000000000
[  181.869076] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[  181.869077] RBP: ffffd53c000ace38 R08: 0000000000000000 R09: 0000000000000000
[  181.869078] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  181.869079] R13: ffffffff913995e0 R14: ffff8ecc824387a0 R15: ffff8ecc82438780
[  181.869081] FS:  0000000000000000(0000) GS:ffff8eec0b92f000(0000) knlGS:0000000000000000
[  181.869083] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  181.869084] CR2: 000005593e67a008 CR3: 0000001f41840002 CR4: 0000000000f72ef0
[  181.869086] PKRU: 55555554
[  181.869087] Call Trace:
[  181.869089]  <IRQ>
[  181.869093]  ? sched_clock+0x10/0x30
[  181.869098]  ? __pfx_delayed_work_timer_fn+0x10/0x10
[  181.869100]  delayed_work_timer_fn+0x19/0x30
[  181.869102]  call_timer_fn+0x2c/0x150
[  181.869106]  ? __pfx_delayed_work_timer_fn+0x10/0x10
[  181.869108]  __run_timers+0x1c6/0x2d0
[  181.869111]  run_timer_softirq+0x8a/0x100
[  181.869114]  handle_softirqs+0xe4/0x340
[  181.869118]  __irq_exit_rcu+0x10e/0x130
[  181.869121]  irq_exit_rcu+0xe/0x20
[  181.869124]  sysvec_apic_timer_interrupt+0xa0/0xc0
[  181.869130]  </IRQ>
[  181.869131]  <TASK>
[  181.869132]  asm_sysvec_apic_timer_interrupt+0x1b/0x20                                                                                        [  181.869135] RIP: 0010:cpuidle_enter_state+0xda/0x710
[  181.869137] Code: 8f f7 fe e8 78 f0 ff ff 8b 53 04 49 89 c7 0f 1f 44 00 00 31 ff e8 86 bf f5 fe 80 7d d0 00 0f 85 22 02 00 00 fb 0f 1f 44 00 0
0 <45> 85 f6 0f 88 f2 01 00 00 4d 63 ee 49 83 fd 0a 0f 83 d8 04 00 00
[  181.869139] RSP: 0018:ffffd53c0022be18 EFLAGS: 00000246
[  181.869140] RAX: 0000000000000000 RBX: ffff8eeb9f8bf880 RCX: 0000000000000000
[  181.869142] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000000
[  181.869143] RBP: ffffd53c0022be68 R08: 0000000000000000 R09: 0000000000000000
[  181.869144] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff93914780
[  181.869145] R13: 0000000000000002 R14: 0000000000000002 R15: 0000002a583b0b41
[  181.869148]  ? cpuidle_enter_state+0xca/0x710
[  181.869151]  cpuidle_enter+0x2e/0x50
[  181.869156]  call_cpuidle+0x22/0x60
[  181.869160]  do_idle+0x1dc/0x240
[  181.869163]  cpu_startup_entry+0x29/0x30
[  181.869164]  start_secondary+0x128/0x160
[  181.869167]  common_startup_64+0x13e/0x141
[  181.869171]  </TASK>
[  181.869172] ---[ end trace 0000000000000000 ]---
[  226.924460] workqueue USBC000:00-con1: drain_workqueue() isn't complete after 10 tries
[  329.470977] ucsi_acpi USBC000:00: error -ETIMEDOUT: PPM init failed

Fix this by:
1. Creating ucsi_destroy_connector_wq() helper function that safely
   cancels all pending delayed work before destroying workqueues
2. Applying the safe cleanup to all three workqueue destruction paths:
   - ucsi_register_port() error path
   - ucsi_init() error path
   - ucsi_unregister() cleanup path

This prevents both the initial queueing on destroyed workqueues and
retry attempts from running workers, eliminating the timer races.

Fixes: b9aa02c ("usb: typec: ucsi: Add polling mechanism for partner tasks like alt mode checking")
Cc: stable@vger.kernel.org
Signed-off-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>
(cherry picked from commit https://lore.kernel.org/lkml/20251218071925.3459787-1-acelan.kao@canonical.com/)
Signed-off-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>
Acked-by: Aaron Ma <aaron.ma@canonical.com>
Acked-by: Bethany Jamison <bethany.jamison@canonical.com>
Acked-by: Aaron Ma <aaron.ma@canonical.com>
Acked-by: Bethany Jamison <bethany.jamison@canonical.com>
Acked-by: Aaron Ma <aaron.ma@canonical.com>
Acked-by: Bethany Jamison <bethany.jamison@canonical.com>
Signed-off-by: Edoardo Canepa <edoardo.canepa@canonical.com>
JiandiAnNVIDIA pushed a commit that referenced this pull request Apr 15, 2026
The match_char() macro evaluates its character parameter multiple
times when traversing differential encoding chains. When invoked
with *str++, the string pointer advances on each iteration of the
inner do-while loop, causing the DFA to check different characters
at each iteration and therefore skip input characters.
This results in out-of-bounds reads when the pointer advances past
the input buffer boundary.

[   94.984676] ==================================================================
[   94.985301] BUG: KASAN: slab-out-of-bounds in aa_dfa_match+0x5ae/0x760
[   94.985655] Read of size 1 at addr ffff888100342000 by task file/976

[   94.986319] CPU: 7 UID: 1000 PID: 976 Comm: file Not tainted 6.19.0-rc7-next-20260127 #1 PREEMPT(lazy)
[   94.986322] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[   94.986329] Call Trace:
[   94.986341]  <TASK>
[   94.986347]  dump_stack_lvl+0x5e/0x80
[   94.986374]  print_report+0xc8/0x270
[   94.986384]  ? aa_dfa_match+0x5ae/0x760
[   94.986388]  kasan_report+0x118/0x150
[   94.986401]  ? aa_dfa_match+0x5ae/0x760
[   94.986405]  aa_dfa_match+0x5ae/0x760
[   94.986408]  __aa_path_perm+0x131/0x400
[   94.986418]  aa_path_perm+0x219/0x2f0
[   94.986424]  apparmor_file_open+0x345/0x570
[   94.986431]  security_file_open+0x5c/0x140
[   94.986442]  do_dentry_open+0x2f6/0x1120
[   94.986450]  vfs_open+0x38/0x2b0
[   94.986453]  ? may_open+0x1e2/0x2b0
[   94.986466]  path_openat+0x231b/0x2b30
[   94.986469]  ? __x64_sys_openat+0xf8/0x130
[   94.986477]  do_file_open+0x19d/0x360
[   94.986487]  do_sys_openat2+0x98/0x100
[   94.986491]  __x64_sys_openat+0xf8/0x130
[   94.986499]  do_syscall_64+0x8e/0x660
[   94.986515]  ? count_memcg_events+0x15f/0x3c0
[   94.986526]  ? srso_alias_return_thunk+0x5/0xfbef5
[   94.986540]  ? handle_mm_fault+0x1639/0x1ef0
[   94.986551]  ? vma_start_read+0xf0/0x320
[   94.986558]  ? srso_alias_return_thunk+0x5/0xfbef5
[   94.986561]  ? srso_alias_return_thunk+0x5/0xfbef5
[   94.986563]  ? fpregs_assert_state_consistent+0x50/0xe0
[   94.986572]  ? srso_alias_return_thunk+0x5/0xfbef5
[   94.986574]  ? arch_exit_to_user_mode_prepare+0x9/0xb0
[   94.986587]  ? srso_alias_return_thunk+0x5/0xfbef5
[   94.986588]  ? irqentry_exit+0x3c/0x590
[   94.986595]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   94.986597] RIP: 0033:0x7fda4a79c3ea

Fix by extracting the character value before invoking match_char,
ensuring single evaluation per outer loop.

Fixes: 074c1cd ("apparmor: dfa move character match into a macro")
Reported-by: Qualys Security Advisory <qsa@qualys.com>
Tested-by: Salvatore Bonaccorso <carnil@debian.org>
Reviewed-by: Georgia Garcia <georgia.garcia@canonical.com>
Reviewed-by: Cengiz Can <cengiz.can@canonical.com>
Signed-off-by: Massimiliano Pellizzer <massimiliano.pellizzer@canonical.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Cengiz Can <cengiz.can@canonical.com>
Signed-off-by: Mehmet Basaran <mehmet.basaran@canonical.com>
JiandiAnNVIDIA pushed a commit that referenced this pull request Apr 15, 2026
The verify_dfa() function only checks DEFAULT_TABLE bounds when the state
is not differentially encoded.

When the verification loop traverses the differential encoding chain,
it reads k = DEFAULT_TABLE[j] and uses k as an array index without
validation. A malformed DFA with DEFAULT_TABLE[j] >= state_count,
therefore, causes both out-of-bounds reads and writes.

[   57.179855] ==================================================================
[   57.180549] BUG: KASAN: slab-out-of-bounds in verify_dfa+0x59a/0x660
[   57.180904] Read of size 4 at addr ffff888100eadec4 by task su/993

[   57.181554] CPU: 1 UID: 0 PID: 993 Comm: su Not tainted 6.19.0-rc7-next-20260127 #1 PREEMPT(lazy)
[   57.181558] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[   57.181563] Call Trace:
[   57.181572]  <TASK>
[   57.181577]  dump_stack_lvl+0x5e/0x80
[   57.181596]  print_report+0xc8/0x270
[   57.181605]  ? verify_dfa+0x59a/0x660
[   57.181608]  kasan_report+0x118/0x150
[   57.181620]  ? verify_dfa+0x59a/0x660
[   57.181623]  verify_dfa+0x59a/0x660
[   57.181627]  aa_dfa_unpack+0x1610/0x1740
[   57.181629]  ? __kmalloc_cache_noprof+0x1d0/0x470
[   57.181640]  unpack_pdb+0x86d/0x46b0
[   57.181647]  ? srso_alias_return_thunk+0x5/0xfbef5
[   57.181653]  ? srso_alias_return_thunk+0x5/0xfbef5
[   57.181656]  ? aa_unpack_nameX+0x1a8/0x300
[   57.181659]  aa_unpack+0x20b0/0x4c30
[   57.181662]  ? srso_alias_return_thunk+0x5/0xfbef5
[   57.181664]  ? stack_depot_save_flags+0x33/0x700
[   57.181681]  ? kasan_save_track+0x4f/0x80
[   57.181683]  ? kasan_save_track+0x3e/0x80
[   57.181686]  ? __kasan_kmalloc+0x93/0xb0
[   57.181688]  ? __kvmalloc_node_noprof+0x44a/0x780
[   57.181693]  ? aa_simple_write_to_buffer+0x54/0x130
[   57.181697]  ? policy_update+0x154/0x330
[   57.181704]  aa_replace_profiles+0x15a/0x1dd0
[   57.181707]  ? srso_alias_return_thunk+0x5/0xfbef5
[   57.181710]  ? __kvmalloc_node_noprof+0x44a/0x780
[   57.181712]  ? aa_loaddata_alloc+0x77/0x140
[   57.181715]  ? srso_alias_return_thunk+0x5/0xfbef5
[   57.181717]  ? _copy_from_user+0x2a/0x70
[   57.181730]  policy_update+0x17a/0x330
[   57.181733]  profile_replace+0x153/0x1a0
[   57.181735]  ? rw_verify_area+0x93/0x2d0
[   57.181740]  vfs_write+0x235/0xab0
[   57.181745]  ksys_write+0xb0/0x170
[   57.181748]  do_syscall_64+0x8e/0x660
[   57.181762]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   57.181765] RIP: 0033:0x7f6192792eb2

Remove the MATCH_FLAG_DIFF_ENCODE condition to validate all DEFAULT_TABLE
entries unconditionally.

Fixes: 031dcc8 ("apparmor: dfa add support for state differential encoding")
Reported-by: Qualys Security Advisory <qsa@qualys.com>
Tested-by: Salvatore Bonaccorso <carnil@debian.org>
Reviewed-by: Georgia Garcia <georgia.garcia@canonical.com>
Reviewed-by: Cengiz Can <cengiz.can@canonical.com>
Signed-off-by: Massimiliano Pellizzer <massimiliano.pellizzer@canonical.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Cengiz Can <cengiz.can@canonical.com>
Signed-off-by: Mehmet Basaran <mehmet.basaran@canonical.com>
JiandiAnNVIDIA pushed a commit that referenced this pull request Apr 15, 2026
BugLink: https://bugs.launchpad.net/bugs/2119656

Some environments may provide a "nvidia,egm-retired-pages-data-base” but
fail to populate it with a base address, leaving it NULL. Mapping this
invalid value results in a synchronous exception when the region is first
touched. Detect a NULL value, generate a warning to draw attention to the
firmware bug, and return without mapping.

INFO:    th500_ras_intr_handler: External Abort reason=1 syndrome=0x92000410 flags=0x1
[   82.104493] Internal error: synchronous external abort: 0000000096000410 [#1] SMP
[   82.114898] Modules linked in: nvgrace_gpu_vfio_pci(E) nvgrace_egm(E)
[   82.257218] CPU: 0 PID: 10 Comm: kworker/0:1 Tainted: G           OE      6.8.12+ NVIDIA#5
[   82.265135] Hardware name: NVIDIA GH200 P5042, BIOS 24103110 20241031
[   82.271720] Workqueue: events work_for_cpu_fn
[   82.276180] pstate: 03400009 (nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
[   82.283298] pc : register_egm_node+0x2cc/0x440 [nvgrace_egm]
[   82.289087] lr : register_egm_node+0x2c4/0x440 [nvgrace_egm]
[   82.294872] sp : ffff8000802ebc30
[   82.298254] x29: ffff8000802ebc60 x28: 00000000000000ff x27: 0000000000000000
[   82.305550] x26: ffff000087a320c8 x25: ffff0000a5700000 x24: ffff000087a32000
[   82.312846] x23: ffffa77cd758e368 x22: 0000000000000000 x21: ffffa77cd758c640
[   82.320141] x20: ffffa77cd758e170 x19: ffff800081e7d000 x18: ffff800080293038
[   82.327437] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[   82.334732] x14: 0000000000000000 x13: 65203a65646f6e5f x12: 0000000000000000
[   82.342027] x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
[   82.349322] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
[   82.356618] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
[   82.363913] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff800081e7d000
[   82.371210] Call trace:
[   82.373705]  register_egm_node+0x2cc/0x440 [nvgrace_egm]
[   82.379135]  nvgrace_gpu_probe+0x2ac/0x528 [nvgrace_gpu_vfio_pci]
[   82.385366]  local_pci_probe+0x4c/0xe0
[   82.389198]  work_for_cpu_fn+0x28/0x58
[   82.393026]  process_one_work+0x168/0x3f0
[   82.397123]  worker_thread+0x360/0x480
[   82.400952]  kthread+0x11c/0x128
[   82.404248]  ret_from_fork+0x10/0x20
[   82.407906] Code: d2820001 940002b3 aa0003f3 b4fffac0 (f9400017)
[   82.414134] ---[ end trace 0000000000000000 ]---

Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Kai-Heng Feng <kaihengf@nvidia.com>
Acked-by: Carol L. Soto <csoto@nvidia.com>
Acked-by: Koba Ko <kobak@nvidia.com>
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
(cherry picked from commit 7ba2930 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.8-next)
Signed-off-by: Koba Ko <kobak@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Carol L. Soto <csoto@nvidia.com>
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
(cherry picked from commit 349fb1c https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next)
Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matt Ochs <mochs@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Signed-off--by: Brad Figg <bfigg@nvidia.com>

(cherry picked from commit 6e9c94a noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>
JiandiAnNVIDIA pushed a commit that referenced this pull request Apr 15, 2026
CXL testing environment can trigger following trace
 Oops: general protection fault, probably for non-canonical address 0xdffffc0000000092: 0000 [#1] SMP KASAN NOPTI
 KASAN: null-ptr-deref in range [0x0000000000000490-0x0000000000000497]
 RIP: 0010:cxl_dpa_to_region+0x105/0x1f0 [cxl_core]
 Call Trace:
  <TASK>
  cxl_event_trace_record+0xd1/0xa70 [cxl_core]
  __cxl_event_trace_record+0x12f/0x1e0 [cxl_core]
  cxl_mem_get_records_log+0x261/0x500 [cxl_core]
  cxl_mem_get_event_records+0x7c/0xc0 [cxl_core]
  cxl_mock_mem_probe+0xd38/0x1c60 [cxl_mock_mem]
  platform_probe+0x9d/0x130
  really_probe+0x1c8/0x960
  __driver_probe_device+0x187/0x3e0
  driver_probe_device+0x45/0x120
  __device_attach_driver+0x15d/0x280

When CXL subsystem adds a cxl port to a hierarchy, there is a small
window where the new port becomes visible before it is bound to a
driver. This happens because device_add() adds a device to bus device
list before bus_probe_device() binds it to a driver.
So if two cxl memdevs are trying to add a dport to a same port via
devm_cxl_enumerate_ports(), the second cxl memdev may observe the port
and attempt to add a dport, but fails because the port has not yet been
attached to cxl port driver. That causes the memdev->endpoint can not be
updated.

The sequence is like:

CPU 0					CPU 1
devm_cxl_enumerate_ports()
  # port not found, add it
  add_port_attach_ep()
    # hold the parent port lock
    # to add the new port
    devm_cxl_create_port()
      device_add()
	# Add dev to bus devs list
	bus_add_device()
					devm_cxl_enumerate_ports()
					  # found the port
					  find_cxl_port_by_uport()
					  # hold port lock to add a dport
					  device_lock(the port)
					  find_or_add_dport()
					    cxl_port_add_dport()
					      return -ENXIO because port->dev.driver is NULL
					  device_unlock(the port)
	bus_probe_device()
	  # hold the port lock
	  # for attaching
	  device_lock(the port)
	    attaching the new port
	  device_unlock(the port)

To fix this race, require that dport addition holds the host lock
of the target port(the host of CXL root and all cxl host bridge ports is
the platform firmware device, the host of all other ports is their
parent port). The CXL subsystem already requires holding the host lock
while attaching a new port. Therefore, successfully acquiring the host
lock guarantees that port attaching has completed.

Fixes: 4f06d81 ("cxl: Defer dport allocation for switch ports")
Signed-off-by: Li Ming <ming.li@zohomail.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Tested-by: Alison Schofield <alison.schofield@intel.com>
Link: https://patch.msgid.link/20260210-fix-port-enumeration-failure-v3-2-06acce0b9ead@zohomail.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
(cherry picked from commit 0066688)
Signed-off-by: Jiandi An <jan@nvidia.com>
JiandiAnNVIDIA pushed a commit that referenced this pull request Apr 15, 2026
CXL testing environment can trigger following trace
 Oops: general protection fault, probably for non-canonical address 0xdffffc0000000092: 0000 [#1] SMP KASAN NOPTI
 KASAN: null-ptr-deref in range [0x0000000000000490-0x0000000000000497]
 RIP: 0010:cxl_dpa_to_region+0x105/0x1f0 [cxl_core]
 Call Trace:
  <TASK>
  cxl_event_trace_record+0xd1/0xa70 [cxl_core]
  __cxl_event_trace_record+0x12f/0x1e0 [cxl_core]
  cxl_mem_get_records_log+0x261/0x500 [cxl_core]
  cxl_mem_get_event_records+0x7c/0xc0 [cxl_core]
  cxl_mock_mem_probe+0xd38/0x1c60 [cxl_mock_mem]
  platform_probe+0x9d/0x130
  really_probe+0x1c8/0x960
  __driver_probe_device+0x187/0x3e0
  driver_probe_device+0x45/0x120
  __device_attach_driver+0x15d/0x280

When CXL subsystem adds a cxl port to a hierarchy, there is a small
window where the new port becomes visible before it is bound to a
driver. This happens because device_add() adds a device to bus device
list before bus_probe_device() binds it to a driver.
So if two cxl memdevs are trying to add a dport to a same port via
devm_cxl_enumerate_ports(), the second cxl memdev may observe the port
and attempt to add a dport, but fails because the port has not yet been
attached to cxl port driver. That causes the memdev->endpoint can not be
updated.

The sequence is like:

CPU 0					CPU 1
devm_cxl_enumerate_ports()
  # port not found, add it
  add_port_attach_ep()
    # hold the parent port lock
    # to add the new port
    devm_cxl_create_port()
      device_add()
	# Add dev to bus devs list
	bus_add_device()
					devm_cxl_enumerate_ports()
					  # found the port
					  find_cxl_port_by_uport()
					  # hold port lock to add a dport
					  device_lock(the port)
					  find_or_add_dport()
					    cxl_port_add_dport()
					      return -ENXIO because port->dev.driver is NULL
					  device_unlock(the port)
	bus_probe_device()
	  # hold the port lock
	  # for attaching
	  device_lock(the port)
	    attaching the new port
	  device_unlock(the port)

To fix this race, require that dport addition holds the host lock
of the target port(the host of CXL root and all cxl host bridge ports is
the platform firmware device, the host of all other ports is their
parent port). The CXL subsystem already requires holding the host lock
while attaching a new port. Therefore, successfully acquiring the host
lock guarantees that port attaching has completed.

Fixes: 4f06d81 ("cxl: Defer dport allocation for switch ports")
Signed-off-by: Li Ming <ming.li@zohomail.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Tested-by: Alison Schofield <alison.schofield@intel.com>
Link: https://patch.msgid.link/20260210-fix-port-enumeration-failure-v3-2-06acce0b9ead@zohomail.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
(cherry picked from commit 0066688)
Signed-off-by: Jiandi An <jan@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.