Merge tag 'v6.12.74-rt16' into nilrt/master/6.12#269
Merged
chaitu236 merged 590 commits intoni:nilrt/master/6.12from Mar 13, 2026
Merged
Merge tag 'v6.12.74-rt16' into nilrt/master/6.12#269chaitu236 merged 590 commits intoni:nilrt/master/6.12from
chaitu236 merged 590 commits intoni:nilrt/master/6.12from
Conversation
[ Upstream commit 56c430c7f06d838fe3b2077dbbc4cc0bf992312b ] Currently, dma_alloc_from_pool() unconditionally warns and dumps a stack trace when an allocation fails, with the message "Failed to get suitable pool". This conflates two distinct failure modes: 1. Configuration error: No atomic pool is available for the requested DMA mask (a fundamental system setup issue) 2. Resource Exhaustion: A suitable pool exists but is currently full (a recoverable runtime state) This lack of distinction prevents drivers from using __GFP_NOWARN to suppress error messages during temporary pressure spikes, such as when awaiting synchronous reclaim of descriptors. Refactor the error handling to distinguish these cases: - If no suitable pool is found, keep the unconditional WARN regarding the missing pool. - If a pool was found but is exhausted, respect __GFP_NOWARN and update the warning message to explicitly state "DMA pool exhausted". Fixes: 9420139 ("dma-pool: fix coherent pool allocations for IOMMU mappings") Signed-off-by: Sai Sree Kartheek Adivi <s-adivi@ti.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20260128133554.3056582-1-s-adivi@ti.com Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2614069c5912e9d6f1f57c262face1b368fb8c93 ]
Place the notes that resulted from going through the dl_server code in a
comment.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Stable-dep-of: 115135422562 ("sched/deadline: Fix 'stuck' dl_server")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 115135422562e2f791e98a6f55ec57b2da3b3a95 ] Andrea reported the dl_server getting stuck for him. He tracked it down to a state where dl_server_start() saw dl_defer_running==1, but the dl_server's job is no longer valid at the time of dl_server_start(). In the state diagram this corresponds to [4] D->A (or dl_server_stop() due to no more runnable tasks) followed by [1], which in case of a lapsed deadline must then be A->B. Now our A has dl_defer_running==1, while B demands dl_defer_running==0, therefore it must get cleared when the CBS wakeup rules demand a replenish. Fixes: a110a81 ("sched/deadline: Deferrable dl server") Reported-by: Andrea Righi arighi@nvidia.com Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Juri Lelli <juri.lelli@redhat.com> Tested-by: Andrea Righi arighi@nvidia.com Link: https://lkml.kernel.org/r/20260123161645.2181752-1-arighi@nvidia.com Link: https://patch.msgid.link/20260130124100.GC1079264@noisy.programming.kicks-ass.net Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 28f24068387169722b508bba6b5257cb68b86e74 upstream. The GPIO controller is configured as non-sleeping but it uses generic pinctrl helpers which use a mutex for synchronization. This can cause the following lockdep splat with shared GPIOs enabled on boards which have multiple devices using the same GPIO: BUG: sleeping function called from invalid context at kernel/locking/mutex.c:591 in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 142, name: kworker/u25:3 preempt_count: 1, expected: 0 RCU nest depth: 0, expected: 0 INFO: lockdep is turned off. irq event stamp: 46379 hardirqs last enabled at (46379): [<ffff8000813acb24>] _raw_spin_unlock_irqrestore+0x74/0x78 hardirqs last disabled at (46378): [<ffff8000813abf38>] _raw_spin_lock_irqsave+0x84/0x88 softirqs last enabled at (46330): [<ffff8000800c71b4>] handle_softirqs+0x4c4/0x4dc softirqs last disabled at (46295): [<ffff800080010674>] __do_softirq+0x14/0x20 CPU: 1 UID: 0 PID: 142 Comm: kworker/u25:3 Tainted: G C 6.19.0-rc4-next-20260105+ #11963 PREEMPT Tainted: [C]=CRAP Hardware name: Khadas VIM3 (DT) Workqueue: events_unbound deferred_probe_work_func Call trace: show_stack+0x18/0x24 (C) dump_stack_lvl+0x90/0xd0 dump_stack+0x18/0x24 __might_resched+0x144/0x248 __might_sleep+0x48/0x98 __mutex_lock+0x5c/0x894 mutex_lock_nested+0x24/0x30 pinctrl_get_device_gpio_range+0x44/0x128 pinctrl_gpio_set_config+0x40/0xdc gpiochip_generic_config+0x28/0x3c gpio_do_set_config+0xa8/0x194 gpiod_set_config+0x34/0xfc gpio_shared_proxy_set_config+0x6c/0xfc [gpio_shared_proxy] gpio_do_set_config+0xa8/0x194 gpiod_set_transitory+0x4c/0xf0 gpiod_configure_flags+0xa4/0x480 gpiod_find_and_request+0x1a0/0x574 gpiod_get_index+0x58/0x84 devm_gpiod_get_index+0x20/0xb4 devm_gpiod_get+0x18/0x24 mmc_pwrseq_emmc_probe+0x40/0xb8 platform_probe+0x5c/0xac really_probe+0xbc/0x298 __driver_probe_device+0x78/0x12c driver_probe_device+0xdc/0x164 __device_attach_driver+0xb8/0x138 bus_for_each_drv+0x80/0xdc __device_attach+0xa8/0x1b0 Fixes: 6ac7309 ("pinctrl: add driver for Amlogic Meson SoCs") Cc: stable@vger.kernel.org Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Closes: https://lore.kernel.org/all/00107523-7737-4b92-a785-14ce4e93b8cb@samsung.com/ Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com> Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Signed-off-by: Linus Walleij <linusw@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0ea05c4f7527a98f5946f96c829733788934311d upstream. The COMPAT_UTS_MACHINE for riscv was incorrectly defined as "riscv". Change it to "riscv32" to reflect the correct 32-bit compat name. Fixes: 06d0e37 ("riscv: compat: Add basic compat data type implementation") Cc: stable@vger.kernel.org Signed-off-by: Han Gao <gaohan@iscas.ac.cn> Reviewed-by: Guo Ren (Alibaba Damo Academy) <guoren@kernel.org> Link: https://patch.msgid.link/20260127190711.2264664-1-gaohan@iscas.ac.cn Signed-off-by: Paul Walmsley <pjw@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 45f6aed8a835ee2bdd0a5d5ee626a91fe285014f upstream. The peek_next method's doc comment incorrectly stated it accesses the "previous" node when it actually accesses the next node. Fix the documentation to accurately reflect the method's behavior. Fixes: 98c14e4 ("rust: rbtree: add cursor") Reviewed-by: Alice Ryhl <aliceryhl@google.com> Signed-off-by: Hang Shu <m18080292938@163.com> Reported-by: Miguel Ojeda <ojeda@kernel.org> Closes: Rust-for-Linux/linux#1205 Cc: stable@vger.kernel.org Reviewed-by: Gary Guo <gary@garyguo.net> Link: https://patch.msgid.link/20251107093921.3379954-1-m18080292938@163.com [ Reworded slightly. - Miguel ] Signed-off-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit af20ae33e7dd949f2e770198e74ac8f058cb299d upstream.
`rustfmt` is configured via the `.rustfmt.toml` file in the source tree,
and we apply `rustfmt` to the macro expanded sources generated by the
`.rsi` target.
However, under an `O=` pointing to an external folder (i.e. not just
a subdir), `rustfmt` will not find the file when checking the parent
folders. Since the edition is configured in this file, this can lead to
errors when it encounters newer syntax, e.g.
error: expected one of `!`, `.`, `::`, `;`, `?`, `where`, `{`, or an operator, found `"rust_minimal"`
--> samples/rust/rust_minimal.rsi:29:49
|
28 | impl ::kernel::ModuleMetadata for RustMinimal {
| - while parsing this item list starting here
29 | const NAME: &'static ::kernel::str::CStr = c"rust_minimal";
| ^^^^^^^^^^^^^^ expected one of 8 possible tokens
30 | }
| - the item list ends here
|
= note: you may be trying to write a c-string literal
= note: c-string literals require Rust 2021 or later
= help: pass `--edition 2024` to `rustc`
= note: for more on editions, read https://doc.rust-lang.org/edition-guide
A workaround is to use `RUSTFMT=n`, which is documented in the `Makefile`
help for cases where macro expanded source may happen to break `rustfmt`
for other reasons, but this is not one of those cases.
One solution would be to pass `--edition`, but we want `rustfmt` to
use the entire configuration, even if currently we essentially use the
default configuration.
Thus explicitly give the path to the config file to `rustfmt` instead.
Reported-by: Alice Ryhl <aliceryhl@google.com>
Fixes: 2f7ab12 ("Kbuild: add Rust support")
Cc: stable@vger.kernel.org
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Gary Guo <gary@garyguo.net>
Link: https://patch.msgid.link/20260115183832.46595-1-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9210f5ff6318163835d9e42ee68006be4da0f531 upstream. imx-card currently sets the slot width to the physical sample width for I2S links. This breaks controllers that use fixed-width slots (e.g. 32-bit FIFO words), causing the unused bits in the slot to contain undefined data when playing 16-bit streams. Do not override the slot width in the machine driver and let the CPU DAI select an appropriate default instead. This matches the behavior of simple-audio-card and avoids embedding controller-specific policy in the machine driver. On an i.MX8MP-based board using SAI as the I2S master with 32-bit slots, playing 16-bit audio resulted in spurious frequencies and an incorrect SAI data waveform, as the slot width was forced to 16 bits. After this change, audio artifacts are eliminated and the 16-bit samples correctly occupy the first half of the 32-bit slot, with the remaining bits padded with zeroes. Cc: stable@vger.kernel.org Fixes: aa73670 ("ASoC: imx-card: Add imx-card machine driver") Signed-off-by: Fabio Estevam <festevam@gmail.com> Acked-by: Shengjiu Wang <shengjiu.wang@gmail.com> Link: https://patch.msgid.link/20260118205030.1532696-1-festevam@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4747bafaa50115d9667ece446b1d2d4aba83dc7f upstream. If nonemb_cmd->va fails to be allocated, free the allocation previously made by alloc_mcc_wrb(). Fixes: 50a4b82 ("scsi: be2iscsi: Fix to make boot discovery non-blocking") Cc: stable@vger.kernel.org Signed-off-by: Haoxiang Li <lihaoxiang@isrc.iscas.ac.cn> Link: https://patch.msgid.link/20251213083643.301240-1-lihaoxiang@isrc.iscas.ac.cn Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9502b7df5a3c7e174f74f20324ac1fe781fc5c2d upstream. Add a DMI quirk for the Acer TravelMate P216-41-TCO fixing the issue where the internal microphone was not detected. Link: https://bugzilla.kernel.org/show_bug.cgi?id=220983 Cc: stable@vger.kernel.org Signed-off-by: Zhang Heng <zhangheng@kylinos.cn> Link: https://patch.msgid.link/20260126014952.3674450-1-zhangheng@kylinos.cn Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d02f20a4de0c498fbba2b0e3c1496e72c630a91e upstream. In the existing implementation irq_shutdown does not mask the interrupts in hardware. This can cause spurious interrupts from the IO expander. Add masking to irq_shutdown to prevent spurious interrupts. Cc: stable@vger.kernel.org Signed-off-by: Martin Larsson <martin.larsson@actia.se> Reviewed-by: Linus Walleij <linusw@kernel.org> Link: https://lore.kernel.org/r/20260121125631.2758346-1-martin.larsson@actia.se Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 56bd3c0f749f45793d1eae1d0ddde4255c749bf6 upstream. Earlier in the function, the ha->flt buffer is allocated with size sizeof(struct qla_flt_header) + FLT_REGIONS_SIZE but freed in the error path with size SFP_DEV_SIZE. Fixes: 84318a9 ("scsi: qla2xxx: edif: Add send, receive, and accept for auth_els") Cc: stable@vger.kernel.org Signed-off-by: Thomas Fourier <fourier.thomas@gmail.com> Link: https://patch.msgid.link/20260112134326.55466-2-fourier.thomas@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4b22ec1685ce1fc0d862dcda3225d852fb107995 upstream. efivar_entry_get() always returns success even if the underlying __efivar_entry_get() fails, masking errors. This may result in uninitialized heap memory being copied to userspace in the efivarfs_file_read() path. Fix it by returning the error from __efivar_entry_get(). Fixes: 2d82e62 ("efi: vars: Move efivar caching layer into efivarfs") Cc: <stable@vger.kernel.org> # v6.1+ Signed-off-by: Kohei Enju <kohei@enjuk.jp> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0fcee2cfc4b2e16e62ff8e0cc2cd8dd24efad65e upstream. There is a race condition in nvmet_bio_done() that can cause a NULL pointer dereference in blk_cgroup_bio_start(): 1. nvmet_bio_done() is called when a bio completes 2. nvmet_req_complete() is called, which invokes req->ops->queue_response(req) 3. The queue_response callback can re-queue and re-submit the same request 4. The re-submission reuses the same inline_bio from nvmet_req 5. Meanwhile, nvmet_req_bio_put() (called after nvmet_req_complete) invokes bio_uninit() for inline_bio, which sets bio->bi_blkg to NULL 6. The re-submitted bio enters submit_bio_noacct_nocheck() 7. blk_cgroup_bio_start() dereferences bio->bi_blkg, causing a crash: BUG: kernel NULL pointer dereference, address: 0000000000000028 #PF: supervisor read access in kernel mode RIP: 0010:blk_cgroup_bio_start+0x10/0xd0 Call Trace: submit_bio_noacct_nocheck+0x44/0x250 nvmet_bdev_execute_rw+0x254/0x370 [nvmet] process_one_work+0x193/0x3c0 worker_thread+0x281/0x3a0 Fix this by reordering nvmet_bio_done() to call nvmet_req_bio_put() BEFORE nvmet_req_complete(). This ensures the bio is cleaned up before the request can be re-submitted, preventing the race condition. Fixes: 190f4c2 ("nvmet: fix memory leak of bio integrity") Cc: Dmitry Bogdanov <d.bogdanov@yadro.com> Cc: stable@vger.kernel.org Cc: Guangwu Zhang <guazhang@redhat.com> Link: http://www.mail-archive.com/debian-kernel@lists.debian.org/msg146238.html Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7ca497be00163610afb663867db24ac408752f13 upstream.
Marking the whole controller as sleeping due to the pinctrl calls in the
.direction_{input,output} callbacks has the unfortunate side effect that
legitimate invocations of .get and .set, which cannot themselves sleep,
in atomic context now spew WARN()s from gpiolib.
However, as Heiko points out, the driver doing this is a bit silly to
begin with, as the pinctrl .gpio_set_direction hook doesn't even care
about the direction, the hook is only used to claim the mux. And sure
enough, the .gpio_request_enable hook exists to serve this very purpose,
so switch to that and remove the problematic business entirely.
Cc: stable@vger.kernel.org
Fixes: 20cf2aed89ac ("gpio: rockchip: mark the GPIO controller as sleeping")
Suggested-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Link: https://lore.kernel.org/r/bddc0469f25843ca5ae0cf578ab3671435ae98a7.1769429546.git.robin.murphy@arm.com
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9b47d4eea3f7c1f620e95bda1d6221660bde7d7b upstream.
A KASAN warning can be triggered when vrealloc() changes the requested
size to a value that is not aligned to KASAN_GRANULE_SIZE.
------------[ cut here ]------------
WARNING: CPU: 2 PID: 1 at mm/kasan/shadow.c:174 kasan_unpoison+0x40/0x48
...
pc : kasan_unpoison+0x40/0x48
lr : __kasan_unpoison_vmalloc+0x40/0x68
Call trace:
kasan_unpoison+0x40/0x48 (P)
vrealloc_node_align_noprof+0x200/0x320
bpf_patch_insn_data+0x90/0x2f0
convert_ctx_accesses+0x8c0/0x1158
bpf_check+0x1488/0x1900
bpf_prog_load+0xd20/0x1258
__sys_bpf+0x96c/0xdf0
__arm64_sys_bpf+0x50/0xa0
invoke_syscall+0x90/0x160
Introduce a dedicated kasan_vrealloc() helper that centralizes KASAN
handling for vmalloc reallocations. The helper accounts for KASAN granule
alignment when growing or shrinking an allocation and ensures that partial
granules are handled correctly.
Use this helper from vrealloc_node_align_noprof() to fix poisoning logic.
[ryabinin.a.a@gmail.com: move kasan_enabled() check, fix build]
Link: https://lkml.kernel.org/r/20260119144509.32767-1-ryabinin.a.a@gmail.com
Link: https://lkml.kernel.org/r/20260113191516.31015-1-ryabinin.a.a@gmail.com
Fixes: d699440 ("mm: fix vrealloc()'s KASAN poisoning logic")
Signed-off-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Reported-by: Maciej Żenczykowski <maze@google.com>
Reported-by: <joonki.min@samsung-slsi.corp-partner.google.com>
Closes: https://lkml.kernel.org/r/CANP3RGeuRW53vukDy7WDO3FiVgu34-xVJYkfpm08oLO3odYFrA@mail.gmail.com
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Tested-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dccf46179ddd6c04c14be8ed584dc54665f53f0e upstream. Some subflow socket errors need to be reported to the MPTCP socket: the initial subflow connect (MP_CAPABLE), and the ones from the fallback sockets. The others are not propagated. The issue is that sock_error() was used to retrieve the error, which was also resetting the sk_err field. Because of that, when notifying the userspace about subflow close events later on from the MPTCP worker, the ssk->sk_err field was always 0. Now, the error (sk_err) is only reset when propagating it to the msk. Fixes: 15cc104 ("mptcp: deliver ssk errors to msk") Cc: stable@vger.kernel.org Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260127-net-mptcp-dup-nl-events-v1-3-7f71e1bc4feb@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8467458dfa61b37e259e3485a5d3e415d08193c1 upstream. This validates the previous commit: subflow closed events are re-sent with less info when the initial subflow is disconnected after an error and each time a subflow is closed after that. In this new test, the userspace PM is involved because that's how it was discovered, but it is not specific to it. The initial subflow is terminated with a RESET, and that will cause the subflow disconnect. Then, a new subflow is initiated, but also got rejected, which cause a second subflow closed event, but not a third one. While at it, in case of failure to get the expected amount of events, the events are printed. The 'Fixes' tag here below is the same as the one from the previous commit: this patch here is not fixing anything wrong in the selftests, but it validates the previous fix for an issue introduced by this commit ID. Fixes: d82809b ("mptcp: avoid duplicated SUB_CLOSED events") Cc: stable@vger.kernel.org Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260127-net-mptcp-dup-nl-events-v1-2-7f71e1bc4feb@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2ef9e3a3845d0a20b62b01f5b731debd0364688d upstream. This validates the previous commit: subflow closed events should contain an error field when a subflow got closed with an error, e.g. reset or timeout. For this test, the chk_evt_nr helper has been extended to check attributes in the matched events. In this test, the 2 subflow closed events should have an error. The 'Fixes' tag here below is the same as the one from the previous commit: this patch here is not fixing anything wrong in the selftests, but it validates the previous fix for an issue introduced by this commit ID. Fixes: 15cc104 ("mptcp: deliver ssk errors to msk") Cc: stable@vger.kernel.org Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260127-net-mptcp-dup-nl-events-v1-4-7f71e1bc4feb@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c5d5ecf21fdd9ce91e6116feb3aa83cee73352cc upstream.
When running this mptcp_join.sh selftest on older kernel versions not
supporting local endpoints tracking, this test fails because 3 MP_JOIN
ACKs have been received, while only 2 were expected.
It is not clear why only 2 MP_JOIN ACKs were expected on old kernel
versions, while 3 MP_JOIN SYN and SYN+ACK were expected. When testing on
the v5.15.197 kernel, 3 MP_JOIN ACKs are seen, which is also what is
expected in the selftests included in this kernel version, see commit
f4480eaad489 ("selftests: mptcp: add missing join check").
Switch the expected MP_JOIN ACKs to 3. While at it, move this
chk_join_nr helper out of the special condition for older kernel
versions as it is now the same as with more recent ones. Also, invert
the condition to be more logical: what's expected on newer kernel
versions having such helper first.
Fixes: d4c81bb ("selftests: mptcp: join: support local endpoint being tracked or not")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260127-net-mptcp-dup-nl-events-v1-5-7f71e1bc4feb@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dd9e2f5b38f1fdd49b1ab6d3a85f81c14369eacc upstream.
Bernd has reported a lockdep splat from flexible proportions code that is
essentially complaining about the following race:
<timer fires>
run_timer_softirq - we are in softirq context
call_timer_fn
writeout_period
fprop_new_period
write_seqcount_begin(&p->sequence);
<hardirq is raised>
...
blk_mq_end_request()
blk_update_request()
ext4_end_bio()
folio_end_writeback()
__wb_writeout_add()
__fprop_add_percpu_max()
if (unlikely(max_frac < FPROP_FRAC_BASE)) {
fprop_fraction_percpu()
seq = read_seqcount_begin(&p->sequence);
- sees odd sequence so loops indefinitely
Note that a deadlock like this is only possible if the bdi has configured
maximum fraction of writeout throughput which is very rare in general but
frequent for example for FUSE bdis. To fix this problem we have to make
sure write section of the sequence counter is irqsafe.
Link: https://lkml.kernel.org/r/20260121112729.24463-2-jack@suse.cz
Fixes: a91befd ("lib/flex_proportions.c: remove local_irq_ops in fprop_new_period()")
Signed-off-by: Jan Kara <jack@suse.cz>
Reported-by: Bernd Schubert <bernd@bsbernd.com>
Link: https://lore.kernel.org/all/9b845a47-9aee-43dd-99bc-1a82bea00442@bsbernd.com/
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Joanne Koong <joannelkoong@gmail.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a148a2040191b12b45b82cb29c281cb3036baf90 upstream. When a newly poisoned subpage ends up in an already poisoned hugetlb folio, 'num_poisoned_pages' is incremented, but the per node ->mf_stats is not. Fix the inconsistency by designating action_result() to update them both. While at it, define __get_huge_page_for_hwpoison() return values in terms of symbol names for better readibility. Also rename folio_set_hugetlb_hwpoison() to hugetlb_update_hwpoison() since the function does more than the conventional bit setting and the fact three possible return values are expected. Link: https://lkml.kernel.org/r/20260120232234.3462258-1-jane.chu@oracle.com Fixes: 18f41fa ("mm: memory-failure: bump memory failure stats to pglist_data") Signed-off-by: Jane Chu <jane.chu@oracle.com> Acked-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Chris Mason <clm@meta.com> Cc: David Hildenbrand <david@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Jiaqi Yan <jiaqiyan@google.com> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Suren Baghdasaryan <surenb@google.com> Cc: William Roche <william.roche@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
…l page pfn commit 057a6f2632c956483e2b2628477f0fcd1cd8a844 upstream. When a hugetlb folio is being poisoned again, try_memory_failure_hugetlb() passed head pfn to kill_accessing_process(), that is not right. The precise pfn of the poisoned page should be used in order to determine the precise vaddr as the SIGBUS payload. This issue has already been taken care of in the normal path, that is, hwpoison_user_mappings(), see [1][2]. Further more, for [3] to work correctly in the hugetlb repoisoning case, it's essential to inform VM the precise poisoned page, not the head page. [1] https://lkml.kernel.org/r/20231218135837.3310403-1-willy@infradead.org [2] https://lkml.kernel.org/r/20250224211445.2663312-1-jane.chu@oracle.com [3] https://lore.kernel.org/lkml/20251116013223.1557158-1-jiaqiyan@google.com/ Link: https://lkml.kernel.org/r/20260120232234.3462258-2-jane.chu@oracle.com Signed-off-by: Jane Chu <jane.chu@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Acked-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Chris Mason <clm@meta.com> Cc: David Hildenbrand <david@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Jiaqi Yan <jiaqiyan@google.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Suren Baghdasaryan <surenb@google.com> Cc: William Roche <william.roche@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8a1968bd997f45a9b11aefeabdd1232e1b6c7184 upstream. The helper for shmem swap freeing is not handling the order of swap entries correctly. It uses xa_cmpxchg_irq to erase the swap entry, but it gets the entry order before that using xa_get_order without lock protection, and it may get an outdated order value if the entry is split or changed in other ways after the xa_get_order and before the xa_cmpxchg_irq. And besides, the order could grow and be larger than expected, and cause truncation to erase data beyond the end border. For example, if the target entry and following entries are swapped in or freed, then a large folio was added in place and swapped out, using the same entry, the xa_cmpxchg_irq will still succeed, it's very unlikely to happen though. To fix that, open code the Xarray cmpxchg and put the order retrieval and value checking in the same critical section. Also, ensure the order won't exceed the end border, skip it if the entry goes across the border. Skipping large swap entries crosses the end border is safe here. Shmem truncate iterates the range twice, in the first iteration, find_lock_entries already filtered such entries, and shmem will swapin the entries that cross the end border and partially truncate the folio (split the folio or at least zero part of it). So in the second loop here, if we see a swap entry that crosses the end order, it must at least have its content erased already. I observed random swapoff hangs and kernel panics when stress testing ZSWAP with shmem. After applying this patch, all problems are gone. Link: https://lkml.kernel.org/r/20260120-shmem-swap-fix-v3-1-3d33ebfbc057@tencent.com Fixes: 809bc86 ("mm: shmem: support large folio swap out") Signed-off-by: Kairui Song <kasong@tencent.com> Reviewed-by: Nhat Pham <nphamcs@gmail.com> Acked-by: Chris Li <chrisl@kernel.org> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Baoquan He <bhe@redhat.com> Cc: Barry Song <baohua@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Kemeng Shi <shikemeng@huaweicloud.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 426ca15c7f6cb6562a081341ca88893a50c59fa2 upstream. This patch enhances GSO segment handling by properly checking the SKB_GSO_DODGY flag for frag_list GSO packets, addressing low throughput issues observed when a station accesses IPv4 servers via hotspots with an IPv6-only upstream interface. Specifically, it fixes a bug in GSO segmentation when forwarding GRO packets containing a frag_list. The function skb_segment_list cannot correctly process GRO skbs that have been converted by XLAT, since XLAT only translates the header of the head skb. Consequently, skbs in the frag_list may remain untranslated, resulting in protocol inconsistencies and reduced throughput. To address this, the patch explicitly sets the SKB_GSO_DODGY flag for GSO packets in XLAT's IPv4/IPv6 protocol translation helpers (bpf_skb_proto_4_to_6 and bpf_skb_proto_6_to_4). This marks GSO packets as potentially modified after protocol translation. As a result, GSO segmentation will avoid using skb_segment_list and instead falls back to skb_segment for packets with the SKB_GSO_DODGY flag. This ensures that only safe and fully translated frag_list packets are processed by skb_segment_list, resolving protocol inconsistencies and improving throughput when forwarding GRO packets converted by XLAT. Signed-off-by: Jibin Zhang <jibin.zhang@mediatek.com> Fixes: 9fd1ff5 ("udp: Support UDP fraglist GRO/GSO.") Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260126152114.1211-1-jibin.zhang@mediatek.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1b83ef9f7ad4635c913b80ef5e718f95f48e85af upstream. With nixpkgs's rustc, rust-src component is not bundled with the compiler by default and is instead provided from a separate store path, so this assumption does not hold. The assertion assumes these paths are in the same location which causes `make LLVM=1 rust-analyzer` to fail on NixOS. Link: https://rust-for-linux.zulipchat.com/#narrow/stream/x/topic/x/near/565284250 Signed-off-by: Onur Özkan <work@onurozkan.dev> Reviewed-by: Gary Guo <gary@garyguo.net> Fixes: fe99216 ("rust: Support latest version of `rust-analyzer`") Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20251224135343.32476-1-work@onurozkan.dev [ Reworded title. - Miguel ] Signed-off-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ac3c50b9a24e9ebeb585679078d6c47922034bb6 upstream. Use `core_edition` for all sysroot crates rather than just core as all were updated to edition 2024 in Rust 1.87. Fixes: f4daa80 ("rust: compile libcore with edition 2024 for 1.87+") Signed-off-by: Tamir Duberstein <tamird@kernel.org> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260116-rust-analyzer-sysroot-v2-1-094aedc33208@kernel.org [ Added `>`s to make the quote a single block. - Miguel ] Signed-off-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5157c328edb35bac05ce77da473c3209d20e0bbb upstream. Add a dependency edge from `compiler_builtins` to `core` to `scripts/generate_rust_analyzer.py` to match `rust/Makefile`. This has been incorrect since commit 8c4555c ("scripts: add `generate_rust_analyzer.py`") Signed-off-by: Tamir Duberstein <tamird@kernel.org> Reviewed-by: Jesung Yang <y.j3ms.n@gmail.com> Acked-by: Benno Lossin <lossin@kernel.org> Fixes: 8c4555c ("scripts: add `generate_rust_analyzer.py`") Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20250723-rust-analyzer-pin-init-v1-1-3c6956173c78@kernel.org Signed-off-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dedb897f11c5d7e32c0e0a0eff7cec23a8047167 upstream.
The hw clock gating register sequence consists of register value pairs
that are written to the GPU during initialisation.
The a690 hwcg sequence has two GMU registers in it that used to amount
to random writes in the GPU mapping, but since commit 188db3d7fe66
("drm/msm/a6xx: Rebase GMU register offsets") they trigger a fault as
the updated offsets now lie outside the mapping. This in turn breaks
boot of machines like the Lenovo ThinkPad X13s.
Note that the updates of these GMU registers is already taken care of
properly since commit 40c297e ("drm/msm/a6xx: Set GMU CGC
properties on a6xx too"), but for some reason these two entries were
left in the table.
Fixes: 5e7665b ("drm/msm/adreno: Add Adreno A690 support")
Cc: stable@vger.kernel.org # 6.5
Cc: Bjorn Andersson <andersson@kernel.org>
Cc: Konrad Dybcio <konradybcio@kernel.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Reviewed-by: Akhil P Oommen <akhilpo@oss.qualcomm.com>
Fixes: 188db3d7fe66 ("drm/msm/a6xx: Rebase GMU register offsets")
Patchwork: https://patchwork.freedesktop.org/patch/695778/
Message-ID: <20251221164552.19990-1-johan@kernel.org>
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
(cherry picked from commit dcbd2f8280eea2c965453ed8c3c69d6f121e950b)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e535c23513c63f02f67e3e09e0787907029efeaf upstream. Make sure to drop the reference taken to the DDC device during probe on probe failure (e.g. probe deferral) and on driver unbind. Fixes: fcbc51e ("staging: drm/imx: Add support for Television Encoder (TVEv2)") Cc: stable@vger.kernel.org # 3.10 Cc: Philipp Zabel <p.zabel@pengutronix.de> Reviewed-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Johan Hovold <johan@kernel.org> Link: https://patch.msgid.link/20251030163456.15807-1-johan@kernel.org Signed-off-by: Maxime Ripard <mripard@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d25b32aa829a3ed5570138e541a71fb7805faec3 ] Commit 27fc10d ("drm/amd/display: Fix the delta clamping for shaper LUT") fixed banding when using plane shaper LUT in DCN10 CM helper. The problem is also present in DCN30 CM helper, fix banding by extending the same bug delta clamping fix to CM3. Signed-off-by: Melissa Wen <mwen@igalia.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 0274a54897f356f9c78767c4a2a5863f7dde90c6) Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 84962445cd8a83dc5bed4c8ad5bbb2c1cdb249a0 ] There is nothing wrong if in_shaper_func type is DISTRIBUTED POINTS. Remove the assert placed for a TODO to avoid misinterpretations. Signed-off-by: Melissa Wen <mwen@igalia.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 1714dcc4c2c53e41190896eba263ed6328bcf415) Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f514248727606b9087bc38a284ff686e0093abf1 ] fsl_xcvr_activate_ctl() has lockdep_assert_held(&card->snd_card->controls_rwsem), but fsl_xcvr_mode_put() calls it without acquiring this lock. Other callers of fsl_xcvr_activate_ctl() in fsl_xcvr_startup() and fsl_xcvr_shutdown() properly acquire the lock with down_read()/up_read(). Add the missing down_read()/up_read() calls around fsl_xcvr_activate_ctl() in fsl_xcvr_mode_put() to fix the lockdep assertion and prevent potential race conditions when multiple userspace threads access the control. Signed-off-by: Ziyi Guo <n7l8m4@u.northwestern.edu> Link: https://patch.msgid.link/20260202174112.2018402-1-n7l8m4@u.northwestern.edu Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c62e0658d458d8f100445445c3ddb106f3824a45 ] Since commit 9880702 ("ACPI: property: Support using strings in reference properties") it is possible to use strings instead of local references. This work fine with single GPIO but not with arrays as acpi_gpio_package_count() didn't handle this case. Update it to handle strings like local references to cover this case as well. Signed-off-by: Alban Bedel <alban.bedel@lht.dlh.de> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Link: https://patch.msgid.link/20260129145944.3372777-1-alban.bedel@lht.dlh.de Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 14967a9 upstream. commit 59d9094 ("mm: hugetlb: independent PMD page table shared count") introduced ->pt_share_count dedicated to hugetlb PMD share count tracking, but omitted fixing copy_hugetlb_page_range(), leaving the function relying on page_count() for tracking that no longer works. When lazy page table copy for hugetlb is disabled, that is, revert commit bcd51a3 ("hugetlb: lazy page table copies in fork()") fork()'ing with hugetlb PMD sharing quickly lockup - [ 239.446559] watchdog: BUG: soft lockup - CPU#75 stuck for 27s! [ 239.446611] RIP: 0010:native_queued_spin_lock_slowpath+0x7e/0x2e0 [ 239.446631] Call Trace: [ 239.446633] <TASK> [ 239.446636] _raw_spin_lock+0x3f/0x60 [ 239.446639] copy_hugetlb_page_range+0x258/0xb50 [ 239.446645] copy_page_range+0x22b/0x2c0 [ 239.446651] dup_mmap+0x3e2/0x770 [ 239.446654] dup_mm.constprop.0+0x5e/0x230 [ 239.446657] copy_process+0xd17/0x1760 [ 239.446660] kernel_clone+0xc0/0x3e0 [ 239.446661] __do_sys_clone+0x65/0xa0 [ 239.446664] do_syscall_64+0x82/0x930 [ 239.446668] ? count_memcg_events+0xd2/0x190 [ 239.446671] ? syscall_trace_enter+0x14e/0x1f0 [ 239.446676] ? syscall_exit_work+0x118/0x150 [ 239.446677] ? arch_exit_to_user_mode_prepare.constprop.0+0x9/0xb0 [ 239.446681] ? clear_bhb_loop+0x30/0x80 [ 239.446684] ? clear_bhb_loop+0x30/0x80 [ 239.446686] entry_SYSCALL_64_after_hwframe+0x76/0x7e There are two options to resolve the potential latent issue: 1. warn against PMD sharing in copy_hugetlb_page_range(), 2. fix it. This patch opts for the second option. While at it, simplify the comment, the details are not actually relevant anymore. Link: https://lkml.kernel.org/r/20250916004520.1604530-1-jane.chu@oracle.com Fixes: 59d9094 ("mm: hugetlb: independent PMD page table shared count") Signed-off-by: Jane Chu <jane.chu@oracle.com> Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Acked-by: Oscar Salvador <osalvador@suse.de> Acked-by: David Hildenbrand <david@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Liu Shixin <liushixin2@huawei.com> Cc: Muchun Song <muchun.song@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ca1a47cd3f5f4c46ca188b1c9a27af87d1ab2216 upstream. Patch series "mm/hugetlb: fixes for PMD table sharing (incl. using mmu_gather)", v3. One functional fix, one performance regression fix, and two related comment fixes. I cleaned up my prototype I recently shared [1] for the performance fix, deferring most of the cleanups I had in the prototype to a later point. While doing that I identified the other things. The goal of this patch set is to be backported to stable trees "fairly" easily. At least patch ni#1 and ni#4. Patch ni#1 fixes hugetlb_pmd_shared() not detecting any sharing Patch ni#2 + ni#3 are simple comment fixes that patch ni#4 interacts with. Patch ni#4 is a fix for the reported performance regression due to excessive IPI broadcasts during fork()+exit(). The last patch is all about TLB flushes, IPIs and mmu_gather. Read: complicated There are plenty of cleanups in the future to be had + one reasonable optimization on x86. But that's all out of scope for this series. Runtime tested, with a focus on fixing the performance regression using the original reproducer [2] on x86. This patch (of 4): We switched from (wrongly) using the page count to an independent shared count. Now, shared page tables have a refcount of 1 (excluding speculative references) and instead use ptdesc->pt_share_count to identify sharing. We didn't convert hugetlb_pmd_shared(), so right now, we would never detect a shared PMD table as such, because sharing/unsharing no longer touches the refcount of a PMD table. Page migration, like mbind() or migrate_pages() would allow for migrating folios mapped into such shared PMD tables, even though the folios are not exclusive. In smaps we would account them as "private" although they are "shared", and we would be wrongly setting the PM_MMAP_EXCLUSIVE in the pagemap interface. Fix it by properly using ptdesc_pmd_is_shared() in hugetlb_pmd_shared(). Link: https://lkml.kernel.org/r/20251223214037.580860-1-david@kernel.org Link: https://lkml.kernel.org/r/20251223214037.580860-2-david@kernel.org Link: https://lore.kernel.org/all/8cab934d-4a56-44aa-b641-bfd7e23bd673@kernel.org/ [1] Link: https://lore.kernel.org/all/8cab934d-4a56-44aa-b641-bfd7e23bd673@kernel.org/ [2] Fixes: 59d9094 ("mm: hugetlb: independent PMD page table shared count") Signed-off-by: David Hildenbrand (Red Hat) <david@kernel.org> Reviewed-by: Rik van Riel <riel@surriel.com> Reviewed-by: Lance Yang <lance.yang@linux.dev> Tested-by: Lance Yang <lance.yang@linux.dev> Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Tested-by: Laurence Oberman <loberman@redhat.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Oscar Salvador <osalvador@suse.de> Cc: Liu Shixin <liushixin2@huawei.com> Cc: "Uschakow, Stanislav" <suschako@amazon.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3937027caecb4f8251e82dd857ba1d749bb5a428 upstream. Ever since we stopped using the page count to detect shared PMD page tables, these comments are outdated. The only reason we have to flush the TLB early is because once we drop the i_mmap_rwsem, the previously shared page table could get freed (to then get reallocated and used for other purpose). So we really have to flush the TLB before that could happen. So let's simplify the comments a bit. The "If we unshared PMDs, the TLB flush was not recorded in mmu_gather." part introduced as in commit a4a118f ("hugetlbfs: flush TLBs correctly after huge_pmd_unshare") was confusing: sure it is recorded in the mmu_gather, otherwise tlb_flush_mmu_tlbonly() wouldn't do anything. So let's drop that comment while at it as well. We'll centralize these comments in a single helper as we rework the code next. Link: https://lkml.kernel.org/r/20251223214037.580860-3-david@kernel.org Fixes: 59d9094 ("mm: hugetlb: independent PMD page table shared count") Signed-off-by: David Hildenbrand (Red Hat) <david@kernel.org> Reviewed-by: Rik van Riel <riel@surriel.com> Tested-by: Laurence Oberman <loberman@redhat.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Cc: Liu Shixin <liushixin2@huawei.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: "Uschakow, Stanislav" <suschako@amazon.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
…ing mmu_gather commit 8ce720d5bd91e9dc16db3604aa4b1bf76770a9a1 upstream. As reported, ever since commit 1013af4 ("mm/hugetlb: fix huge_pmd_unshare() vs GUP-fast race") we can end up in some situations where we perform so many IPI broadcasts when unsharing hugetlb PMD page tables that it severely regresses some workloads. In particular, when we fork()+exit(), or when we munmap() a large area backed by many shared PMD tables, we perform one IPI broadcast per unshared PMD table. There are two optimizations to be had: (1) When we process (unshare) multiple such PMD tables, such as during exit(), it is sufficient to send a single IPI broadcast (as long as we respect locking rules) instead of one per PMD table. Locking prevents that any of these PMD tables could get reused before we drop the lock. (2) When we are not the last sharer (> 2 users including us), there is no need to send the IPI broadcast. The shared PMD tables cannot become exclusive (fully unshared) before an IPI will be broadcasted by the last sharer. Concurrent GUP-fast could walk into a PMD table just before we unshared it. It could then succeed in grabbing a page from the shared page table even after munmap() etc succeeded (and supressed an IPI). But there is not difference compared to GUP-fast just sleeping for a while after grabbing the page and re-enabling IRQs. Most importantly, GUP-fast will never walk into page tables that are no-longer shared, because the last sharer will issue an IPI broadcast. (if ever required, checking whether the PUD changed in GUP-fast after grabbing the page like we do in the PTE case could handle this) So let's rework PMD sharing TLB flushing + IPI sync to use the mmu_gather infrastructure so we can implement these optimizations and demystify the code at least a bit. Extend the mmu_gather infrastructure to be able to deal with our special hugetlb PMD table sharing implementation. To make initialization of the mmu_gather easier when working on a single VMA (in particular, when dealing with hugetlb), provide tlb_gather_mmu_vma(). We'll consolidate the handling for (full) unsharing of PMD tables in tlb_unshare_pmd_ptdesc() and tlb_flush_unshared_tables(), and track in "struct mmu_gather" whether we had (full) unsharing of PMD tables. Because locking is very special (concurrent unsharing+reuse must be prevented), we disallow deferring flushing to tlb_finish_mmu() and instead require an explicit earlier call to tlb_flush_unshared_tables(). From hugetlb code, we call huge_pmd_unshare_flush() where we make sure that the expected lock protecting us from concurrent unsharing+reuse is still held. Check with a VM_WARN_ON_ONCE() in tlb_finish_mmu() that tlb_flush_unshared_tables() was properly called earlier. Document it all properly. Notes about tlb_remove_table_sync_one() interaction with unsharing: There are two fairly tricky things: (1) tlb_remove_table_sync_one() is a NOP on architectures without CONFIG_MMU_GATHER_RCU_TABLE_FREE. Here, the assumption is that the previous TLB flush would send an IPI to all relevant CPUs. Careful: some architectures like x86 only send IPIs to all relevant CPUs when tlb->freed_tables is set. The relevant architectures should be selecting MMU_GATHER_RCU_TABLE_FREE, but x86 might not do that in stable kernels and it might have been problematic before this patch. Also, the arch flushing behavior (independent of IPIs) is different when tlb->freed_tables is set. Do we have to enlighten them to also take care of tlb->unshared_tables? So far we didn't care, so hopefully we are fine. Of course, we could be setting tlb->freed_tables as well, but that might then unnecessarily flush too much, because the semantics of tlb->freed_tables are a bit fuzzy. This patch changes nothing in this regard. (2) tlb_remove_table_sync_one() is not a NOP on architectures with CONFIG_MMU_GATHER_RCU_TABLE_FREE that actually don't need a sync. Take x86 as an example: in the common case (!pv, !X86_FEATURE_INVLPGB) we still issue IPIs during TLB flushes and don't actually need the second tlb_remove_table_sync_one(). This optimized can be implemented on top of this, by checking e.g., in tlb_remove_table_sync_one() whether we really need IPIs. But as described in (1), it really must honor tlb->freed_tables then to send IPIs to all relevant CPUs. Notes on TLB flushing changes: (1) Flushing for non-shared PMD tables We're converting from flush_hugetlb_tlb_range() to tlb_remove_huge_tlb_entry(). Given that we properly initialize the MMU gather in tlb_gather_mmu_vma() to be hugetlb aware, similar to __unmap_hugepage_range(), that should be fine. (2) Flushing for shared PMD tables We're converting from various things (flush_hugetlb_tlb_range(), tlb_flush_pmd_range(), flush_tlb_range()) to tlb_flush_pmd_range(). tlb_flush_pmd_range() achieves the same that tlb_remove_huge_tlb_entry() would achieve in these scenarios. Note that tlb_remove_huge_tlb_entry() also calls __tlb_remove_tlb_entry(), however that is only implemented on powerpc, which does not support PMD table sharing. Similar to (1), tlb_gather_mmu_vma() should make sure that TLB flushing keeps on working as expected. Further, note that the ptdesc_pmd_pts_dec() in huge_pmd_share() is not a concern, as we are holding the i_mmap_lock the whole time, preventing concurrent unsharing. That ptdesc_pmd_pts_dec() usage will be removed separately as a cleanup later. There are plenty more cleanups to be had, but they have to wait until this is fixed. [david@kernel.org: fix kerneldoc] Link: https://lkml.kernel.org/r/f223dd74-331c-412d-93fc-69e360a5006c@kernel.org Link: https://lkml.kernel.org/r/20251223214037.580860-5-david@kernel.org Fixes: 1013af4 ("mm/hugetlb: fix huge_pmd_unshare() vs GUP-fast race") Signed-off-by: David Hildenbrand (Red Hat) <david@kernel.org> Reported-by: "Uschakow, Stanislav" <suschako@amazon.de> Closes: https://lore.kernel.org/all/4d3878531c76479d9f8ca9789dc6485d@amazon.de/ Tested-by: Laurence Oberman <loberman@redhat.com> Acked-by: Harry Yoo <harry.yoo@oracle.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Liu Shixin <liushixin2@huawei.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Rik van Riel <riel@surriel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David Hildenbrand (Arm) <david@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5ec5ac4ca27e4daa234540ac32f9fc5219377d53 upstream.
kasan_init_generic() indicates that kasan is fully initialized, so it
should be put at end of kasan_init().
Otherwise bringing up the primary CPU failed when CONFIG_KASAN is set
on PTW-enabled systems, here are the call chains:
kernel_entry()
start_kernel()
setup_arch()
kasan_init()
kasan_init_generic()
The reason is PTW-enabled systems have speculative accesses which means
memory accesses to the shadow memory after kasan_init() may be executed
by hardware before. However, accessing shadow memory is safe only after
kasan fully initialized because kasan_init() uses a temporary PGD table
until we have populated all levels of shadow page tables and writen the
PGD register. Moving kasan_init_generic() later can defer the occasion
of kasan_enabled(), so as to avoid speculative accesses on shadow pages.
After moving kasan_init_generic() to the end, kasan_init() can no longer
call kasan_mem_to_shadow() for shadow address conversion because it will
always return kasan_early_shadow_page. On the other hand, we should keep
the current logic of kasan_mem_to_shadow() for both the early and final
stage because there may be instrumentation before kasan_init().
To solve this, we factor out a new mem_to_shadow() function from current
kasan_mem_to_shadow() for the shadow address conversion in kasan_init().
Cc: stable@vger.kernel.org
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
[ Huacai: To backport from upstream to 6.6 & 6.12, kasan_enabled() is
replaced with kasan_arch_is_ready() and kasan_init_generic()
is replaced with "kasan_early_stage = false". ]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This reverts commit 933466f which is commit db9ae3b upstream. We have had three independent production user reports in combination with Cilium utilizing WireGuard as encryption underneath that k8s Pod E/W traffic to certain peer nodes fully stalled. The situation appears as follows: - Occurs very rarely but at random times under heavy networking load. - Once the issue triggers the decryption side stops working completely for that WireGuard peer, other peers keep working fine. The stall happens also for newly initiated connections towards that particular WireGuard peer. - Only the decryption side is affected, never the encryption side. - Once it triggers, it never recovers and remains in this state, the CPU/mem on that node looks normal, no leak, busy loop or crash. - bpftrace on the affected system shows that wg_prev_queue_enqueue fails, thus the MAX_QUEUED_PACKETS (1024 skbs!) for the peer's rx_queue is reached. - Also, bpftrace shows that wg_packet_rx_poll for that peer is never called again after reaching this state for that peer. For other peers wg_packet_rx_poll does get called normally. - Commit db9ae3b ("wireguard: device: enable threaded NAPI") switched WireGuard to threaded NAPI by default. The default has not been changed for triggering the issue, neither did CPU hotplugging occur (i.e. 5bd8de2 ("wireguard: queueing: always return valid online CPU in wg_cpumask_choose_online()")). - The issue has been observed with stable kernels of v5.15 as well as v6.1. It was reported to us that v5.10 stable is working fine, and no report on v6.6 stable either (somewhat related discussion in [0] though). - In the WireGuard driver the only material difference between v5.10 stable and v5.15 stable is the switch to threaded NAPI by default. [0] https://lore.kernel.org/netdev/CA+wXwBTT74RErDGAnj98PqS=wvdh8eM1pi4q6tTdExtjnokKqA@mail.gmail.com/ Breakdown of the problem: 1) skbs arriving for decryption are enqueued to the peer->rx_queue in wg_packet_consume_data via wg_queue_enqueue_per_device_and_peer. 2) The latter only moves the skb into the MPSC peer queue if it does not surpass MAX_QUEUED_PACKETS (1024) which is kept track in an atomic counter via wg_prev_queue_enqueue. 3) In case enqueueing was successful, the skb is also queued up in the device queue, round-robin picks a next online CPU, and schedules the decryption worker. 4) The wg_packet_decrypt_worker, once scheduled, picks these up from the queue, decrypts the packets and once done calls into wg_queue_enqueue_per_peer_rx. 5) The latter updates the state to PACKET_STATE_CRYPTED on success and calls napi_schedule on the per peer->napi instance. 6) NAPI then polls via wg_packet_rx_poll. wg_prev_queue_peek checks on the peer->rx_queue. It will wg_prev_queue_dequeue if the queue->peeked skb was not cached yet, or just return the latter otherwise. (wg_prev_queue_drop_peeked later clears the cache.) 7) From an ordering perspective, the peer->rx_queue has skbs in order while the device queue with the per-CPU worker threads from a global ordering PoV can finish the decryption and signal the skb PACKET_STATE_CRYPTED out of order. 8) A situation can be observed that the first packet coming in will be stuck waiting for the decryption worker to be scheduled for a longer time when the system is under pressure. 9) While this is the case, the other CPUs in the meantime finish decryption and call into napi_schedule. 10) Now in wg_packet_rx_poll it picks up the first in-order skb from the peer->rx_queue and sees that its state is still PACKET_STATE_UNCRYPTED. The NAPI poll routine then exits early with work_done = 0 and calls napi_complete_done, signalling it "finished" processing. 11) The assumption in wg_packet_decrypt_worker is that when the decryption finished the subsequent napi_schedule will always lead to a later invocation of wg_packet_rx_poll to pick up the finished packet. 12) However, it appears that a later napi_schedule does /not/ schedule a later poll and thus no wg_packet_rx_poll. 13) If this situation happens exactly for the corner case where the decryption worker of the first packet is stuck and waiting to be scheduled, and the network load for WireGuard is very high then the queue can build up to MAX_QUEUED_PACKETS. 14) If this situation occurs, then no new decryption worker will be scheduled and also no new napi_schedule to make forward progress. 15) This means the peer->rx_queue stops processing packets completely and they are indefinitely stuck waiting for a new NAPI poll on that peer which never happens. New packets for that peer are then dropped due to full queue, as it has been observed on the production machines. Technically, the backport of commit db9ae3b ("wireguard: device: enable threaded NAPI") to stable should not have happened since it is more of an optimization rather than a pure fix and addresses a NAPI situation with utilizing many WireGuard tunnel devices in parallel. Revert it from stable given the backport triggers a regression for mentioned kernels. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit b1bcaed1e39a ("cpuset: Treat cpusets in attaching as populated")
was backported to the long‑term support (LTS) branches. However, because
commit d5cf4d34a333 ("cgroup/cpuset: Don't track # of local child
partitions") was not backported, a corresponding adaptation to the
backported code is still required.
To ensure correct behavior, replace cgroup_is_populated with
cpuset_is_populated in the partition_is_populated function.
Cc: stable@vger.kernel.org # 6.1+
Fixes: b1bcaed1e39a ("cpuset: Treat cpusets in attaching as populated")
Cc: Waiman Long <longman@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Chen Ridong <chenridong@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0209e21e3c372fa2da04c39214bec0b64e4eb5f4 upstream. A userspace program can trigger the RIVA NV3 arbitration code by calling the FBIOPUT_VSCREENINFO ioctl on /dev/fb*. When doing so, the driver recomputes FIFO arbitration parameters in nv3_arb(), using state->mclk_khz (derived from the PRAMDAC MCLK PLL) as a divisor without validating it first. In a normal setup, state->mclk_khz is provided by the real hardware and is non-zero. However, an attacker can construct a malicious or misconfigured device (e.g. a crafted/emulated PCI device) that exposes a bogus PLL configuration, causing state->mclk_khz to become zero. Once nv3_get_param() calls nv3_arb(), the division by state->mclk_khz in the gns calculation causes a divide error and crashes the kernel. Fix this by checking whether state->mclk_khz is zero and bailing out before doing the division. The following log reveals it: rivafb: setting virtual Y resolution to 2184 divide error: 0000 [ni#1] PREEMPT SMP KASAN PTI CPU: 0 PID: 2187 Comm: syz-executor.0 Not tainted 5.18.0-rc1+ ni#1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 RIP: 0010:nv3_arb drivers/video/fbdev/riva/riva_hw.c:439 [inline] RIP: 0010:nv3_get_param+0x3ab/0x13b0 drivers/video/fbdev/riva/riva_hw.c:546 Call Trace: nv3CalcArbitration.constprop.0+0x255/0x460 drivers/video/fbdev/riva/riva_hw.c:603 nv3UpdateArbitrationSettings drivers/video/fbdev/riva/riva_hw.c:637 [inline] CalcStateExt+0x447/0x1b90 drivers/video/fbdev/riva/riva_hw.c:1246 riva_load_video_mode+0x8a9/0xea0 drivers/video/fbdev/riva/fbdev.c:779 rivafb_set_par+0xc0/0x5f0 drivers/video/fbdev/riva/fbdev.c:1196 fb_set_var+0x604/0xeb0 drivers/video/fbdev/core/fbmem.c:1033 do_fb_ioctl+0x234/0x670 drivers/video/fbdev/core/fbmem.c:1109 fb_ioctl+0xdd/0x130 drivers/video/fbdev/core/fbmem.c:1188 __x64_sys_ioctl+0x122/0x190 fs/ioctl.c:856 Fixes: 1da177e ("Linux-2.6.12-rc2") Cc: stable@vger.kernel.org Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com> Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 120adae7b42faa641179270c067864544a50ab69 upstream. The UFX_IOCTL_REPORT_DAMAGE ioctl does not properly copy data from userspace to kernelspace, and instead directly references the memory, which can cause problems if invalid data is passed from userspace. Fix this all up by correctly copying the memory before accessing it within the kernel. Reported-by: Tianchu Chen <flynnnchen@tencent.com> Cc: stable <stable@kernel.org> Cc: Steve Glendinning <steve.glendinning@shawell.net> Cc: Helge Deller <deller@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 761dac9073cd67d4705a94cd1af674945a117f4c upstream. It missed the stat count in f2fs_gc_range. Cc: stable@kernel.org Fixes: 9bf1dcb ("f2fs: fix to account gc stats correctly") Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0eda086de85e140f53c6123a4c00662f4e614ee4 upstream. Sysfs entry name is gc_pin_file_thresh instead of gc_pin_file_threshold, fix it. Cc: stable@kernel.org Fixes: c521a6a ("f2fs: fix to limit gc_pin_file_threshold") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 98ea0039dbfdd00e5cc1b9a8afa40434476c0955 upstream.
Some f2fs sysfs attributes suffer from out-of-bounds memory access and
incorrect handling of integer values whose size is not 4 bytes.
For example:
vm:~# echo 65537 > /sys/fs/f2fs/vde/carve_out
vm:~# cat /sys/fs/f2fs/vde/carve_out
65537
vm:~# echo 4294967297 > /sys/fs/f2fs/vde/atgc_age_threshold
vm:~# cat /sys/fs/f2fs/vde/atgc_age_threshold
1
carve_out maps to {struct f2fs_sb_info}->carve_out, which is a 8-bit
integer. However, the sysfs interface allows setting it to a value
larger than 255, resulting in an out-of-range update.
atgc_age_threshold maps to {struct atgc_management}->age_threshold,
which is a 64-bit integer, but its sysfs interface cannot correctly set
values larger than UINT_MAX.
The root causes are:
1. __sbi_store() treats all default values as unsigned int, which
prevents updating integers larger than 4 bytes and causes out-of-bounds
writes for integers smaller than 4 bytes.
2. f2fs_sbi_show() also assumes all default values are unsigned int,
leading to out-of-bounds reads and incorrect access to integers larger
than 4 bytes.
This patch introduces {struct f2fs_attr}->size to record the actual size
of the integer associated with each sysfs attribute. With this
information, sysfs read and write operations can correctly access and
update values according to their real data size, avoiding memory
corruption and truncation.
Fixes: b59d0ba ("f2fs: add sysfs support for controlling the gc_thread")
Cc: stable@kernel.org
Signed-off-by: Jinbao Liu <liujinbao1@xiaomi.com>
Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5c145c03188bc9ba1c29e0bc4d527a5978fc47f9 upstream. Xiaolong Guo reported a f2fs bug in bugzilla [1] [1] https://bugzilla.kernel.org/show_bug.cgi?id=220951 Quoted: "When using stress-ng's swap stress test on F2FS filesystem with kernel 6.6+, the system experiences data corruption leading to either: 1 dm-verity corruption errors and device reboot 2 F2FS node corruption errors and boot hangs The issue occurs specifically when: 1 Using F2FS filesystem (ext4 is unaffected) 2 Swapfile size is less than F2FS section size (2MB) 3 Swapfile has fragmented physical layout (multiple non-contiguous extents) 4 Kernel version is 6.6+ (6.1 is unaffected) The root cause is in check_swap_activate() function in fs/f2fs/data.c. When the first extent of a small swapfile (< 2MB) is not aligned to section boundaries, the function incorrectly treats it as the last extent, failing to map subsequent extents. This results in incorrect swap_extent creation where only the first extent is mapped, causing subsequent swap writes to overwrite wrong physical locations (other files' data). Steps to Reproduce 1 Setup a device with F2FS-formatted userdata partition 2 Compile stress-ng from https://github.com/ColinIanKing/stress-ng 3 Run swap stress test: (Android devices) adb shell "cd /data/stressng; ./stress-ng-64 --metrics-brief --timeout 60 --swap 0" Log: 1 Ftrace shows in kernel 6.6, only first extent is mapped during second f2fs_map_blocks call in check_swap_activate(): stress-ng-swap-8990: f2fs_map_blocks: ino=11002, file offset=0, start blkaddr=0x43143, len=0x1 (Only 4KB mapped, not the full swapfile) 2 in kernel 6.1, both extents are correctly mapped: stress-ng-swap-5966: f2fs_map_blocks: ino=28011, file offset=0, start blkaddr=0x13cd4, len=0x1 stress-ng-swap-5966: f2fs_map_blocks: ino=28011, file offset=1, start blkaddr=0x60c84b, len=0xff The problematic code is in check_swap_activate(): if ((pblock - SM_I(sbi)->main_blkaddr) % blks_per_sec || nr_pblocks % blks_per_sec || !f2fs_valid_pinned_area(sbi, pblock)) { bool last_extent = false; not_aligned++; nr_pblocks = roundup(nr_pblocks, blks_per_sec); if (cur_lblock + nr_pblocks > sis->max) nr_pblocks -= blks_per_sec; /* this extent is last one */ if (!nr_pblocks) { nr_pblocks = last_lblock - cur_lblock; last_extent = true; } ret = f2fs_migrate_blocks(inode, cur_lblock, nr_pblocks); if (ret) { if (ret == -ENOENT) ret = -EINVAL; goto out; } if (!last_extent) goto retry; } When the first extent is unaligned and roundup(nr_pblocks, blks_per_sec) exceeds sis->max, we subtract blks_per_sec resulting in nr_pblocks = 0. The code then incorrectly assumes this is the last extent, sets nr_pblocks = last_lblock - cur_lblock (entire swapfile), and performs migration. After migration, it doesn't retry mapping, so subsequent extents are never processed. " In order to fix this issue, we need to lookup block mapping info after we migrate all blocks in the tail of swapfile. Cc: stable@kernel.org Fixes: 9703d69 ("f2fs: support file pinning for zoned devices") Cc: Daeho Jeong <daehojeong@google.com> Reported-and-tested-by: Xiaolong Guo <guoxiaolong2008@gmail.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220951 Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ed1ac3c977dd6b119405fa36dd41f7151bd5b4de upstream. Commit 0b4eeee ("iommu/arm-smmu-qcom: Register the TBU driver in qcom_smmu_impl_init") intended to also probe the TBU driver when CONFIG_ARM_SMMU_QCOM_DEBUG is disabled, but also moved the corresponding platform_driver_register() call into qcom_smmu_impl_init() which is called from arm_smmu_device_probe(). However, it neither makes sense to register drivers from probe() callbacks of other drivers, nor does the driver core allow registering drivers with a device lock already being held. The latter was revealed by commit dc23806a7c47 ("driver core: enforce device_lock for driver_match_device()") leading to a deadlock condition described in [1]. Additionally, it was noted by Robin that the current approach is potentially racy with async probe [2]. Hence, fix this by registering the qcom_smmu_tbu_driver from module_init(). Unfortunately, due to the vendoring of the driver, this requires an indirection through arm-smmu-impl.c. Reported-by: Mark Brown <broonie@kernel.org> Closes: https://lore.kernel.org/lkml/7ae38e31-ef31-43ad-9106-7c76ea0e8596@sirena.org.uk/ Link: https://lore.kernel.org/lkml/DFU7CEPUSG9A.1KKGVW4HIPMSH@kernel.org/ [1] Link: https://lore.kernel.org/lkml/0c0d3707-9ea5-44f9-88a1-a65c62e3df8d@arm.com/ [2] Fixes: dc23806a7c47 ("driver core: enforce device_lock for driver_match_device()") Fixes: 0b4eeee ("iommu/arm-smmu-qcom: Register the TBU driver in qcom_smmu_impl_init") Acked-by: Robin Murphy <robin.murphy@arm.com> Tested-by: Bjorn Andersson <andersson@kernel.org> Reviewed-by: Bjorn Andersson <andersson@kernel.org> Acked-by: Konrad Dybcio <konradybcio@kernel.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com> #LX2160ARDB Tested-by: Wang Jiayue <akaieurus@gmail.com> Reviewed-by: Wang Jiayue <akaieurus@gmail.com> Tested-by: Mark Brown <broonie@kernel.org> Acked-by: Joerg Roedel <joerg.roedel@amd.com> Link: https://patch.msgid.link/20260121141215.29658-1-dakr@kernel.org Signed-off-by: Danilo Krummrich <dakr@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ce2739e482bce8d2c014d76c4531c877f382aa54 ]
As syzbot reported an use-after-free issue in f2fs_write_end_io().
It is caused by below race condition:
loop device umount
- worker_thread
- loop_process_work
- do_req_filebacked
- lo_rw_aio
- lo_rw_aio_complete
- blk_mq_end_request
- blk_update_request
- f2fs_write_end_io
- dec_page_count
- folio_end_writeback
- kill_f2fs_super
- kill_block_super
- f2fs_put_super
: free(sbi)
: get_pages(, F2FS_WB_CP_DATA)
accessed sbi which is freed
In kill_f2fs_super(), we will drop all page caches of f2fs inodes before
call free(sbi), it guarantee that all folios should end its writeback, so
it should be safe to access sbi before last folio_end_writeback().
Let's relocate ckpt thread wakeup flow before folio_end_writeback() to
resolve this issue.
Cc: stable@kernel.org
Fixes: e234088 ("f2fs: avoid wait if IO end up when do_checkpoint for better performance")
Reported-by: syzbot+b4444e3c972a7a124187@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=b4444e3c972a7a124187
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
[ folio => page ]
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
…nt atomic commit and checkpoint writes
[ Upstream commit 7633a7387eb4d0259d6bea945e1d3469cd135bbc ]
During SPO tests, when mounting F2FS, an -EINVAL error was returned from
f2fs_recover_inode_page. The issue occurred under the following scenario
Thread A Thread B
f2fs_ioc_commit_atomic_write
- f2fs_do_sync_file // atomic = true
- f2fs_fsync_node_pages
: last_folio = inode folio
: schedule before folio_lock(last_folio) f2fs_write_checkpoint
- block_operations// writeback last_folio
- schedule before f2fs_flush_nat_entries
: set_fsync_mark(last_folio, 1)
: set_dentry_mark(last_folio, 1)
: folio_mark_dirty(last_folio)
- __write_node_folio(last_folio)
: f2fs_down_read(&sbi->node_write)//block
- f2fs_flush_nat_entries
: {struct nat_entry}->flag |= BIT(IS_CHECKPOINTED)
- unblock_operations
: f2fs_up_write(&sbi->node_write)
f2fs_write_checkpoint//return
: f2fs_do_write_node_page()
f2fs_ioc_commit_atomic_write//return
SPO
Thread A calls f2fs_need_dentry_mark(sbi, ino), and the last_folio has
already been written once. However, the {struct nat_entry}->flag did not
have the IS_CHECKPOINTED set, causing set_dentry_mark(last_folio, 1) and
write last_folio again after Thread B finishes f2fs_write_checkpoint.
After SPO and reboot, it was detected that {struct node_info}->blk_addr
was not NULL_ADDR because Thread B successfully write the checkpoint.
This issue only occurs in atomic write scenarios. For regular file
fsync operations, the folio must be dirty. If
block_operations->f2fs_sync_node_pages successfully submit the folio
write, this path will not be executed. Otherwise, the
f2fs_write_checkpoint will need to wait for the folio write submission
to complete, as sbi->nr_pages[F2FS_DIRTY_NODES] > 0. Therefore, the
situation where f2fs_need_dentry_mark checks that the {struct
nat_entry}->flag /wo the IS_CHECKPOINTED flag, but the folio write has
already been submitted, will not occur.
Therefore, for atomic file fsync, sbi->node_write should be acquired
through __write_node_folio to ensure that the IS_CHECKPOINTED flag
correctly indicates that the checkpoint write has been completed.
Fixes: 608514d ("f2fs: set fsync mark only for the last dnode")
Cc: stable@kernel.org
Signed-off-by: Sheng Yong <shengyong1@xiaomi.com>
Signed-off-by: Jinbao Liu <liujinbao1@xiaomi.com>
Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
[ folio => page ]
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 509f403f3ccec14188036212118651bf23599396 upstream. Add the following compositions: 0x10a1: RNDIS + tty (AT/NMEA) + tty (AT) + tty (diag) T: Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 9 Spd=480 MxCh= 0 D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1 P: Vendor=1bc7 ProdID=10a1 Rev=05.15 S: Manufacturer=Telit Cinterion S: Product=FN920 S: SerialNumber=d128dba9 C: #Ifs= 5 Cfg#= 1 Atr=e0 MxPwr=500mA I: If#= 0 Alt= 0 #EPs= 1 Cls=ef(misc ) Sub=04 Prot=01 Driver=rndis_host E: Ad=82(I) Atr=03(Int.) MxPS= 8 Ivl=32ms I: If#= 1 Alt= 0 #EPs= 2 Cls=0a(data ) Sub=00 Prot=00 Driver=rndis_host E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms I: If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=60 Driver=option E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=84(I) Atr=03(Int.) MxPS= 10 Ivl=32ms I: If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option E: Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=86(I) Atr=03(Int.) MxPS= 10 Ivl=32ms I: If#= 4 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option E: Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=87(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms 0x10a6: RNDIS + tty (AT/NMEA) + tty (AT) + tty (diag) T: Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 10 Spd=480 MxCh= 0 D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1 P: Vendor=1bc7 ProdID=10a6 Rev=05.15 S: Manufacturer=Telit Cinterion S: Product=FN920 S: SerialNumber=d128dba9 C: #Ifs= 5 Cfg#= 1 Atr=e0 MxPwr=500mA I: If#= 0 Alt= 0 #EPs= 1 Cls=ef(misc ) Sub=04 Prot=01 Driver=rndis_host E: Ad=82(I) Atr=03(Int.) MxPS= 8 Ivl=32ms I: If#= 1 Alt= 0 #EPs= 2 Cls=0a(data ) Sub=00 Prot=00 Driver=rndis_host E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms I: If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=84(I) Atr=03(Int.) MxPS= 10 Ivl=32ms I: If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option E: Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=86(I) Atr=03(Int.) MxPS= 10 Ivl=32ms I: If#= 4 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option E: Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=87(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms 0x10ab: RNDIS + tty (AT) + tty (diag) + DPL (Data Packet Logging) + adb T: Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 11 Spd=480 MxCh= 0 D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1 P: Vendor=1bc7 ProdID=10ab Rev=05.15 S: Manufacturer=Telit Cinterion S: Product=FN920 S: SerialNumber=d128dba9 C: #Ifs= 6 Cfg#= 1 Atr=e0 MxPwr=500mA I: If#= 0 Alt= 0 #EPs= 1 Cls=ef(misc ) Sub=04 Prot=01 Driver=rndis_host E: Ad=82(I) Atr=03(Int.) MxPS= 8 Ivl=32ms I: If#= 1 Alt= 0 #EPs= 2 Cls=0a(data ) Sub=00 Prot=00 Driver=rndis_host E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms I: If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=84(I) Atr=03(Int.) MxPS= 10 Ivl=32ms I: If#= 3 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option E: Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms I: If#= 4 Alt= 0 #EPs= 1 Cls=ff(vend.) Sub=ff Prot=80 Driver=(none) E: Ad=86(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms I: If#= 5 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=42 Prot=01 Driver=(none) E: Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=87(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms Cc: stable@vger.kernel.org Signed-off-by: Fabio Porcedda <fabio.porcedda@gmail.com> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20260217200005.998240758@linuxfoundation.org Tested-by: Florian Fainelli <florian.fainelli@broadcom.com> Tested-by: Peter Schneider <pschneider1968@googlemail.com> Tested-by: Jon Hunter <jonathanh@nvidia.com> Tested-by: Salvatore Bonaccorso <carnil@debian.org> Tested-by: Brett A C Sheffield <bacs@librecast.net> Tested-by: Pavel Machek (CIP) <pavel@nabladev.com> Tested-by: Francesco Dolcini <francesco.dolcini@toradex.com> Tested-by: Mark Brown <broonie@kernel.org> Tested-by: Ron Economos <re@w6rz.net> Tested-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Tested-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Daniel Wagner <wagi@monom.org>
v6.12.74-rt16 Signed-off-by: Shubhanshu Mani Tripathi <shubhanshu.mani.tripathi@emerson.com>
chaitu236
approved these changes
Mar 13, 2026
|
@gratian merging to get into tonight's build. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Justification
This is a merge of the latest upstream stable-rt tag v6.12.74-rt16.
There were no merge conflicts.
AB#3599959
Testing