Rpmsg: rpmsg services should be properly destroyed after calling rpmsg_device_destroy() #17761
Summary
This PR series addresses critical issues in the rpmsg/openamp subsystem related to mutex handling and service lifecycle management.
Changes
1. openamp/libmetal: Change mutex to recursive mutex to prevent deadlock
Changed libmetal mutex implementation from regular mutex to recursive mutex (rmutex) to prevent crashes when locks are acquired twice in nested scenarios.
Problem: During the destroy process, we hold rdev->lock while iterating rdev->endpoints to destroy all endpoints. However, rpmsg_destroy_ept() may send a name service message, which attempts to acquire rdev->lock again, leading to a deadlock and crash.
Solution: Use recursive mutex in libmetal to allow the same thread to acquire the lock multiple times safely.
This change has already been pushed to the OpenAMP/libmetal community: OpenAMP/libmetal#352
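The re-entrant locking pattern described above can be sketched with plain POSIX threads. This is only an illustration under assumptions: `demo_dev`, `demo_send_ns()` and `demo_destroy_all()` are hypothetical names, and the sketch uses `PTHREAD_MUTEX_RECURSIVE` rather than the rmutex type that the actual patch uses.

```c
#define _XOPEN_SOURCE 700
#include <pthread.h>

/* Hypothetical stand-in for a device that owns a lock and some endpoints. */
struct demo_dev {
    pthread_mutex_t lock;
    int endpoints;      /* endpoints still to be destroyed */
    int ns_msgs_sent;   /* name service messages sent so far */
};

/* Initialize the device with a recursive lock. Returns 0 on success. */
static int demo_dev_init(struct demo_dev *dev, int endpoints)
{
    pthread_mutexattr_t attr;
    int ret;

    ret = pthread_mutexattr_init(&attr);
    if (ret != 0) {
        return ret;
    }

    ret = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    if (ret == 0) {
        ret = pthread_mutex_init(&dev->lock, &attr);
    }

    pthread_mutexattr_destroy(&attr);
    dev->endpoints = endpoints;
    dev->ns_msgs_sent = 0;
    return ret;
}

/* Models sending a name service message: takes the device lock itself. */
static void demo_send_ns(struct demo_dev *dev)
{
    pthread_mutex_lock(&dev->lock);   /* nested acquisition by the owner */
    dev->ns_msgs_sent++;
    pthread_mutex_unlock(&dev->lock);
}

/* Models the destroy path: holds the lock while destroying each endpoint. */
static void demo_destroy_all(struct demo_dev *dev)
{
    pthread_mutex_lock(&dev->lock);
    while (dev->endpoints > 0) {
        demo_send_ns(dev);            /* would block forever on a plain mutex */
        dev->endpoints--;
    }
    pthread_mutex_unlock(&dev->lock);
}
```

With a default (non-recursive) mutex, the nested `pthread_mutex_lock()` in `demo_send_ns()` would either deadlock or have undefined behavior; with the recursive type, the owning thread simply increments the lock count and must unlock the same number of times.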
2. drivers/rpmsg: Remove held status check after rmutex change
Since metal_mutex has been changed to rmutex_t, the held status checks in the rpmsg drivers (rpmsg_port.c, rpmsg_virtio_lite.c, rptun.c) are redundant and have been removed. The recursive mutex handles re-acquisition by the owning thread on its own.
3. rpmsg: Call unbound_cb for server/client services during destruction
Problem: Before this patch, rpmsg services were not properly destroyed after calling rpmsg_device_destroy(). This caused errors when calling rpmsg_device_created() during a second connection attempt with the peer.
Solution: Properly invoke unbound_cb callback for services used as server/client during the destruction process to ensure complete cleanup.
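The lifecycle issue can be modeled with a small, self-contained sketch. Everything here is hypothetical (the `demo_service` structure, `demo_device_created()`, `demo_device_destroy()` and the `bound` flag are illustration-only names, not the NuttX rpmsg API); it only shows why skipping the unbound callback on destroy makes a second connection attempt fail.

```c
#include <stdbool.h>
#include <stddef.h>

struct demo_service;
typedef void (*demo_unbound_cb_t)(struct demo_service *svc);

/* Hypothetical service record: one entry per server/client service. */
struct demo_service {
    const char *name;
    bool bound;                    /* true while attached to a device */
    demo_unbound_cb_t unbound_cb;  /* cleanup hook run on device destroy */
};

/* Example callback: releases the service's per-connection state. */
static void demo_mark_unbound(struct demo_service *svc)
{
    svc->bound = false;
}

/* Models device creation: fails if any service still holds stale state. */
static int demo_device_created(struct demo_service *svcs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (svcs[i].bound) {
            return -1;             /* leftover binding from a prior session */
        }
    }

    for (size_t i = 0; i < n; i++) {
        svcs[i].bound = true;
    }

    return 0;
}

/* Models device destruction: runs unbound_cb for every bound service.
 * Returns the number of services that were notified. */
static size_t demo_device_destroy(struct demo_service *svcs, size_t n)
{
    size_t notified = 0;

    for (size_t i = 0; i < n; i++) {
        if (svcs[i].bound && svcs[i].unbound_cb != NULL) {
            svcs[i].unbound_cb(&svcs[i]);
            notified++;
        }
    }

    return notified;
}
```

In this model, calling `demo_device_created()` twice in a row fails the second time, while a `demo_device_destroy()` in between lets the second creation succeed, which mirrors the reconnection scenario described above.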
Impact
Rpmsg and Rptun
Testing
cmake -B cmake_out/v8a_server -DBOARD_CONFIG=qemu-armv8a:rpserver_ivshmem -GNinja
cmake --build cmake_out/v8a_server
cmake -B cmake_out/v8a_proxy -DBOARD_CONFIG=qemu-armv8a:rpproxy_ivshmem -GNinja
cmake --build cmake_out/v8a_proxy
qemu-system-aarch64 -cpu cortex-a53 -nographic \
-machine virt,virtualization=on,gic-version=3 \
-chardev stdio,id=con,mux=on -serial chardev:con \
-object memory-backend-file,discard-data=on,id=shmmem-shmem0,mem-path=/dev/shm/my_shmem0,size=4194304,share=yes \
-device ivshmem-plain,id=shmem0,memdev=shmmem-shmem0,addr=0xb \
-mon chardev=con,mode=readline -kernel ./nuttx/cmake_out/v8a_server/nuttx \
-gdb tcp::7775
qemu-system-aarch64 -cpu cortex-a53 -nographic \
-machine virt,virtualization=on,gic-version=3 \
-chardev stdio,id=con,mux=on -serial chardev:con \
-object memory-backend-file,discard-data=on,id=shmmem-shmem0,mem-path=/dev/shm/my_shmem0,size=4194304,share=yes \
-device ivshmem-plain,id=shmem0,memdev=shmmem-shmem0,addr=0xb \
-mon chardev=con,mode=readline -kernel ./nuttx/cmake_out/v8a_proxy/nuttx \
-gdb tcp::7776
server log: