
Conversation

@CV-Bowen (Contributor) commented Jan 4, 2026

Summary

This PR series addresses critical issues in the rpmsg/openamp subsystem related to mutex handling and service lifecycle management.

Changes

1. openamp/libmetal: Change mutex to recursive mutex to prevent deadlock

Changed the libmetal mutex implementation from a regular mutex to a recursive mutex (rmutex) to prevent crashes when the same lock is acquired twice in nested code paths.

Problem: During the destroy process, we hold rdev->lock while iterating rdev->endpoints to destroy all endpoints. However, rpmsg_destroy_ept() may send a name service message, which attempts to acquire rdev->lock again, leading to a deadlock and crash.

Solution: Use recursive mutex in libmetal to allow the same thread to acquire the lock multiple times safely.

This change has already been submitted upstream to the OpenAMP/libmetal community: OpenAMP/libmetal#352
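
A minimal POSIX sketch of the failure mode and the fix (illustrative only, not the libmetal code; destroy_all_endpoints() and destroy_ept() are hypothetical stand-ins for the rpmsg destroy path). With a plain mutex the nested lock would deadlock; with PTHREAD_MUTEX_RECURSIVE, the pthread analogue of NuttX's rmutex_t, the same thread can re-acquire the lock safely:

```c
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t g_lock;

/* Hypothetical stand-in for rpmsg_destroy_ept(): it also needs the lock,
 * e.g. to queue a name-service "destroy" message.
 */

static void destroy_ept(int ept_id)
{
  pthread_mutex_lock(&g_lock);      /* second acquisition by the same thread */
  printf("destroy endpoint %d (nested lock held)\n", ept_id);
  pthread_mutex_unlock(&g_lock);
}

/* Hypothetical stand-in for the device destroy path: hold the lock while
 * iterating the endpoint list and destroy each endpoint in turn.
 */

static void destroy_all_endpoints(void)
{
  pthread_mutex_lock(&g_lock);      /* first acquisition */
  for (int i = 0; i < 3; i++)
    {
      destroy_ept(i);
    }

  pthread_mutex_unlock(&g_lock);
}

int main(void)
{
  pthread_mutexattr_t attr;

  pthread_mutexattr_init(&attr);
  pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
  pthread_mutex_init(&g_lock, &attr);

  destroy_all_endpoints();          /* completes; a non-recursive mutex would hang here */
  return 0;
}
```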

2. drivers/rpmsg: Remove held status check after rmutex change

Since metal_mutex is now backed by rmutex_t, the redundant held-status checks in the rpmsg drivers (rpmsg_port.c, rpmsg_virtio_lite.c, rptun.c) were removed. The recursive mutex handles re-acquisition by the same thread naturally.
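
A rough before/after fragment, assuming the drivers used a metal_mutex_is_acquired() guard pattern; ept_cb_old()/ept_cb_new() are hypothetical names and the actual driver code differs:

```c
#include <stdbool.h>

#include <metal/mutex.h>
#include <openamp/rpmsg.h>

/* Before: guard against self-deadlock by checking whether the lock is
 * already held before taking it (hypothetical shape of the removed check).
 */

static void ept_cb_old(struct rpmsg_device *rdev)
{
  bool acquired = false;

  if (!metal_mutex_is_acquired(&rdev->lock))
    {
      metal_mutex_acquire(&rdev->lock);
      acquired = true;
    }

  /* ... walk rdev->endpoints ... */

  if (acquired)
    {
      metal_mutex_release(&rdev->lock);
    }
}

/* After: metal_mutex is backed by rmutex_t, so the same thread can simply
 * re-acquire the lock and the guard is unnecessary.
 */

static void ept_cb_new(struct rpmsg_device *rdev)
{
  metal_mutex_acquire(&rdev->lock);

  /* ... walk rdev->endpoints ... */

  metal_mutex_release(&rdev->lock);
}
```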

3. rpmsg: Call unbound_cb for server/client services during destruction

Problem: Before this patch, rpmsg services were not properly destroyed after calling rpmsg_device_destroy(). This caused errors when calling rpmsg_device_created() during a second connection attempt with the peer.

Solution: Properly invoke unbound_cb callback for services used as server/client during the destruction process to ensure complete cleanup.
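
A self-contained sketch of the intended behavior; the structures and names below (struct service, device_destroy(), unbound_cb) are hypothetical and only illustrate that every bound server/client service gets its unbound hook on destroy, so a later rpmsg_device_created() starts from a clean state:

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical service record: one per server/client service bound to the
 * remote device.  The real NuttX structures and names differ.
 */

struct service
{
  const char *name;
  void (*unbound_cb)(struct service *svc);   /* cleanup hook */
  struct service *next;
};

static void service_unbound(struct service *svc)
{
  printf("%s: release endpoint, reset state\n", svc->name);
}

/* Sketch of the fix: on device destroy, walk every bound service and invoke
 * its unbound callback so the next connection attempt begins with no stale
 * endpoints left behind.
 */

static void device_destroy(struct service *head)
{
  for (struct service *svc = head; svc != NULL; svc = svc->next)
    {
      if (svc->unbound_cb != NULL)
        {
          svc->unbound_cb(svc);
        }
    }
}

int main(void)
{
  struct service syslog = { "rpmsg-syslog", service_unbound, NULL };
  struct service ping   = { "rpmsg-ping",   service_unbound, &syslog };

  device_destroy(&ping);
  return 0;
}
```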

Impact

Rpmsg and Rptun

Testing

cmake -B cmake_out/v8a_server -DBOARD_CONFIG=qemu-armv8a:rpserver_ivshmem -GNinja
cmake --build cmake_out/v8a_server

cmake -B cmake_out/v8a_proxy -DBOARD_CONFIG=qemu-armv8a:rpproxy_ivshmem -GNinja
cmake --build cmake_out/v8a_proxy

qemu-system-aarch64 -cpu cortex-a53 -nographic \
  -machine virt,virtualization=on,gic-version=3 \
  -chardev stdio,id=con,mux=on -serial chardev:con \
  -object memory-backend-file,discard-data=on,id=shmmem-shmem0,mem-path=/dev/shm/my_shmem0,size=4194304,share=yes \
  -device ivshmem-plain,id=shmem0,memdev=shmmem-shmem0,addr=0xb \
  -mon chardev=con,mode=readline -kernel ./nuttx/cmake_out/v8a_server/nuttx \
  -gdb tcp::7775

qemu-system-aarch64 -cpu cortex-a53 -nographic \
  -machine virt,virtualization=on,gic-version=3 \
  -chardev stdio,id=con,mux=on -serial chardev:con \
  -object memory-backend-file,discard-data=on,id=shmmem-shmem0,mem-path=/dev/shm/my_shmem0,size=4194304,share=yes \
  -device ivshmem-plain,id=shmem0,memdev=shmmem-shmem0,addr=0xb \
  -mon chardev=con,mode=readline -kernel ./nuttx/cmake_out/v8a_proxy/nuttx \
  -gdb tcp::7776

server log:

NuttShell (NSH) NuttX-12.10.0
server> 
server> 
server> rptun ping all 1 1 1 1
[    0.000000] [ 5] [ EMERG] [server] ping times: 1
[    0.000000] [ 5] [ EMERG] [server] buffer_len: 2032, send_len: 17
[    0.000000] [ 5] [ EMERG] [server] avg: 0 s, 18603232 ns
[    0.000000] [ 5] [ EMERG] [server] min: 0 s, 18603232 ns
[    0.000000] [ 5] [ EMERG] [server] max: 0 s, 18603232 ns
[    0.000000] [ 5] [ EMERG] [server] rate: 0.007310 Mbits/sec
server> 
server> 
server> ps
  PID GROUP PRI POLICY   TYPE    NPX STATE    EVENT     SIGMASK            STACK    USED FILLED COMMAND
    0     0   0 FIFO     Kthread   - Ready              0000000000000000 0008160 0001792  21.9%  Idle_Task
    1     0 192 FIFO     Kthread   - Waiting  Semaphore 0000000000000000 0008096 0001344  16.6%  hpwork 0x4046dc08 0x4046dc88
    2     0 100 FIFO     Kthread   - Waiting  Semaphore 0000000000000000 0008096 0001344  16.6%  lpwork 0x4046dcb8 0x4046dd38
    4     0 224 FIFO     Kthread   - Waiting  Semaphore 0000000000000000 0008096 0001776  21.9%  rptun proxy 0x40488550
    5     5 100 FIFO     Task      - Running            0000000000000000 0008128 0004240  52.1%  nsh_main
server> 
server> rptun dump all
[    0.000000] [ 5] [ EMERG] [server] Remote: proxy headrx 8
[    0.000000] [ 5] [ EMERG] [server] Dump rpmsg info between cpu (master: yes)server <==> proxy:
[    0.000000] [ 5] [ EMERG] [server] rpmsg vq RX:
[    0.000000] [ 5] [ EMERG] [server] rpmsg vq TX:
[    0.000000] [ 5] [ EMERG] [server]   rpmsg ept list:
[    0.000000] [ 5] [ EMERG] [server]     ept NS
[    0.000000] [ 5] [ EMERG] [server]     ept rpmsg-sensor
[    0.000000] [ 5] [ EMERG] [server]     ept rpmsg-ping
[    0.000000] [ 5] [ EMERG] [server]     ept rpmsg-syslog
[    0.000000] [ 5] [ EMERG] [server]   rpmsg buffer list:
[    0.000000] [ 5] [ EMERG] [server]     RX buffer, total 8, pending 0
[    0.000000] [ 5] [ EMERG] [server]     TX buffer, total 8, pending 0
server> uname -a
NuttX server 12.10.0 0cb3aa6c84 Jan  4 2026 15:10:56 arm64 qemu-armv8a
server> 

@github-actions github-actions bot added Area: Drivers Drivers issues Size: M The size of the change in this PR is medium labels Jan 4, 2026
CV-Bowen and others added 3 commits January 4, 2026 15:28
Avoid the crash when the lock is acquired twice. One case is the
destroy process: we hold rdev->lock while iterating rdev->endpoints to
destroy all the endpoints, but rpmsg_destroy_ept() may send a name
service message and needs to acquire rdev->lock again, which leads to
a crash.

Signed-off-by: Bowen Wang <wangbowen6@xiaomi.com>
metal_mutex has been changed to rmutex_t, so there is no need to check
the held status again.

Signed-off-by: Bowen Wang <wangbowen6@xiaomi.com>
…stroyed

Before this patch, the rpmsg services will not be destroyed after
calling rpmsg_device_destroy(), which will cause errors after calling
rpmsg_device_created() at the second connection with the peer.

Signed-off-by: liaoao <liaoao@xiaomi.com>
Signed-off-by: Bowen Wang <wangbowen6@xiaomi.com>
@CV-Bowen CV-Bowen changed the title Should hold lock when access the rdev->endpoints Rpmsg: rpmsg services should be properly destroyed after calling rpmsg_device_destroy() Jan 4, 2026
@xiaoxiang781216 xiaoxiang781216 merged commit d019a63 into apache:master Jan 4, 2026
40 checks passed
