rtppeer: fix UAF when disconnect() triggers self-removal via status_change_event by ibiltari · Pull Request #150 · davidmoreno/rtpmidid

ibiltari · 2026-05-12T20:09:22Z

First, thank you for rtpmidid. It's been a small-but-load-bearing piece of a live-performance system we maintain, and it's been reliable across a lot of show hardware.

Where this came from

CUEMS is our distributed cue-playback stack for live performance — audio, video, DMX, and timecode coordinated across a cluster of nodes. rtpmidid is what carries our MTC stream over the network. When the cluster is rolling, rtpmidid is on the critical path for every timecoded frame.

We hit this bug on Debian 12 (bookworm); previous releases of rtpmidid were working for us on Debian 11. After migrating our system to Debian 12, we ran into this issue. We were using rtpmidid release 24 and it hit there; we then updated to release 26 and it was still there. With the patch we are able to use rtpmidid on Debian 12 again with no problems, and it has been running consistently for about a week now under intensive testing (since rtpmidid is a key component in our system, nice work!).

Summary

Heap-use-after-free in rtppeer_t::reset() triggered from inside
rtppeer_t::disconnect(). send_goodbye() synchronously fires
status_change_event, whose rtpserverpeer_t slot calls
rtpserver_t::remove_peer(), which erases the owning rtpserverpeer_t
from the server's peers vector — dropping the last
shared_ptr<rtppeer_t> and freeing *this while disconnect() is
still on the stack. Control returns and disconnect() calls
this->reset() (lib/rtppeer.cpp:867)
on the freed object.

Impact

Under steady-state operation on a 3-host LAN cluster (Debian 12,
glibc 2.36), this fires ~317 times per boot on the server-side
rtpmidid, with a ~15-20 s natural retrigger interval driven by avahi
peer churn. The production symptom is the glibc abort message
malloc(): unaligned tcache chunk detected. The libfmt-vformat crash
signature previously reported on v24.12 has the same root cause: libfmt
just happens to be the next allocator user of the freed slab.

Reproducer (any one of)

- systemctl restart rtpmidid on a peer host with [connect_to] config.
- Any other rtpmidid on the LAN starting/stopping (avahi
  BROWSER_REMOVE triggers the same cleanup path).
- Steady-state with 2-3 instances visible: fires on its own every
  ~15 s, as auto-reconnect after each crash triggers the next cascade.

ASAN+UBSAN build catches it on the first peer-disconnect of the run.

ASAN (key frames)

WRITE of size 4 at heap-use-after-free
  #0 rtpmidid::rtppeer_t::reset()       lib/rtppeer.cpp:55
  #1 rtpmidid::rtppeer_t::disconnect()  lib/rtppeer.cpp:867
freed by thread T0 here:
  #9  ~rtpserverpeer_t                  lib/rtpserverpeer.cpp:85
  #14 rtpserver_t::remove_peer(int)     lib/rtpserver.cpp:202
  #15 rtpserverpeer_t::status_change(...) lib/rtpserverpeer.cpp:127
  #21 signal_t<...>::operator()         include/rtpmidid/signal.hpp:105
  #22 rtppeer_t::send_goodbye(...)      lib/rtppeer.cpp:724
  #23 rtpmidid::rtppeer_t::disconnect() lib/rtppeer.cpp:865
previously allocated by thread T0 here:
  #7  std::make_shared<rtpmidid::rtppeer_t, ...>
  #8  rtpserverpeer_t::rtpserverpeer_t  lib/rtpserverpeer.cpp:32

(Full report can be attached on request.)

Fix

Make rtppeer_t inherit std::enable_shared_from_this<rtppeer_t>
and, in disconnect(), take a local shared_ptr via
weak_from_this().lock() at entry. The local keeps *this alive
across the synchronous signal storm so the trailing reset() runs on
valid memory.
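
To make the ownership chain concrete, here is a minimal self-contained sketch of the keep-alive pattern (C++17). It is not the rtpmidid code: peer_t, server_t, on_goodbye and final_status are hypothetical stand-ins for rtppeer_t, rtpserver_t, status_change_event and test instrumentation. The destructor records the status it sees, so the demo can check that reset() ran before the object was destroyed.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for rtppeer_t (not the real class).
struct peer_t : std::enable_shared_from_this<peer_t> {
  std::function<void()> on_goodbye; // stand-in for status_change_event
  int status = 1;
  int *final_status = nullptr; // receives `status` when the peer is destroyed

  ~peer_t() {
    if (final_status)
      *final_status = status;
  }

  void reset() { status = 0; } // writes to *this, like rtppeer_t::reset()

  void disconnect() {
    // Keep-alive: holds a reference if a shared_ptr owns *this, else empty.
    auto self = weak_from_this().lock();
    if (on_goodbye)
      on_goodbye(); // the owner may drop its reference in here
    reset();        // still valid: `self` keeps *this alive until return
  }
};

// Hypothetical stand-in for rtpserver_t and its peers vector.
struct server_t {
  std::vector<std::shared_ptr<peer_t>> peers;

  peer_t *add_peer() {
    auto peer = std::make_shared<peer_t>();
    peer_t *raw = peer.get();
    // The slot removes the peer from the server, as remove_peer() does.
    peer->on_goodbye = [this, raw] { remove_peer(raw); };
    peers.push_back(std::move(peer));
    return raw; // the vector stays the only owner
  }

  void remove_peer(peer_t *raw) {
    peers.erase(std::remove_if(peers.begin(), peers.end(),
                               [raw](const std::shared_ptr<peer_t> &p) {
                                 return p.get() == raw;
                               }),
                peers.end());
  }
};

// Returns the status the peer had at the moment it was destroyed:
// 0 means reset() ran first, i.e. on live memory.
int status_at_destruction_after_disconnect() {
  server_t server;
  int final_status = -1;
  peer_t *peer = server.add_peer();
  peer->final_status = &final_status;
  peer->disconnect(); // peer is destroyed as disconnect() returns
  assert(server.peers.empty());
  return final_status;
}
```

Calling status_at_destruction_after_disconnect() from main() returns 0 in this sketch. Without the `self` line, the erase inside the slot would destroy the peer while disconnect() is still running, which is the shape of the reported UAF.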

weak_from_this().lock() rather than shared_from_this() because
two paths construct rtppeer_t without a managing shared_ptr —
rtpclient_t::peer is a value member (rtpclient.hpp:45), and the
unit tests stack-allocate rtpmidid::rtppeer_t peer("test") in
several test cases. shared_from_this() would throw bad_weak_ptr
on both. With lock() it just returns nullptr in those cases —
which is safe, because no external owner exists to drop the last
reference during the signal storm anyway.
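
The difference between the two calls can be checked in isolation. The sketch below (C++17) uses a hypothetical node_t in place of rtppeer_t and only exercises standard-library behaviour: on an object with no managing shared_ptr, shared_from_this() throws std::bad_weak_ptr while weak_from_this().lock() returns an empty pointer.

```cpp
#include <cassert>
#include <memory>

// Hypothetical type standing in for rtppeer_t.
struct node_t : std::enable_shared_from_this<node_t> {
  // True if a shared_ptr currently manages *this.
  bool is_shared_owned() { return weak_from_this().lock() != nullptr; }

  // True if shared_from_this() throws for this object.
  bool shared_from_this_throws() {
    try {
      (void)shared_from_this();
      return false;
    } catch (const std::bad_weak_ptr &) {
      return true;
    }
  }
};

// Stack-allocated (or value-member) object: no managing shared_ptr.
bool stack_case_ok() {
  node_t n;
  return !n.is_shared_owned() && n.shared_from_this_throws();
}

// Object created through make_shared: both calls succeed.
bool shared_case_ok() {
  auto n = std::make_shared<node_t>();
  return n->is_shared_owned() && !n->shared_from_this_throws();
}

void demo() {
  assert(stack_case_ok());
  assert(shared_case_ok());
}
```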

Deferring the peers.erase() in remove_peer() was also considered
but rejected: ~rtpserver_t() calls send_goodbye() directly on
each peer, which would enqueue a deferred lambda capturing
this (the dying server) and UAF on the next poller tick.
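
A rough illustration of why the deferred variant was rejected, under stated assumptions: deferred_queue_t and call_later() are hypothetical stand-ins for "run on the next poller tick", not the rtpmidid poller API, and server_t is a toy. The sketch only counts the tasks left behind after the server is destroyed; it deliberately never runs them, because each one captured a `this` that no longer exists.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical deferred-call queue ("next poller tick").
struct deferred_queue_t {
  std::vector<std::function<void()>> pending;
  void call_later(std::function<void()> fn) { pending.push_back(std::move(fn)); }
};

// Toy server whose peer removal is deferred instead of immediate.
struct server_t {
  deferred_queue_t &queue;
  std::vector<int> peers{1, 2, 3};

  explicit server_t(deferred_queue_t &q) : queue(q) {}

  // Deferred variant of remove_peer(): the erase happens later, via `this`.
  void remove_peer_deferred(int id) {
    queue.call_later([this, id] {
      peers.erase(std::remove(peers.begin(), peers.end(), id), peers.end());
    });
  }

  ~server_t() {
    // Like a destructor that says goodbye to every peer, which in turn
    // triggers removal; here each removal is only enqueued.
    for (int id : peers)
      remove_peer_deferred(id);
  }
};

// Number of queued tasks that outlive the server they captured.
std::size_t dangling_tasks_after_server_destruction() {
  deferred_queue_t queue;
  {
    server_t server(queue);
  } // server destroyed here; its tasks are still queued
  // Running any pending task now would touch a dead server_t,
  // so they are counted, not invoked.
  return queue.pending.size();
}

void demo() { assert(dangling_tasks_after_server_destruction() == 3); }
```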

Verified

ASAN+UBSAN build with the patch applied — ctest green; production
crash trigger no longer fires.

Thanks!!

…nnect()

rtppeer_t::disconnect() called send_goodbye() and then reset() on *this. send_goodbye() synchronously fires status_change_event, whose rtpserverpeer slot calls rtpserver_t::remove_peer(). remove_peer() does peers.erase() on the owning vector, destructing the rtpserverpeer_t and dropping the last shared_ptr<rtppeer_t>, which frees *this. Control then returned up the stack to reset(), reading and writing freed memory.

ASAN confirmed (heap-use-after-free, WRITE of size 4 at rtppeer.cpp:55, free chain disconnect -> send_goodbye -> status_change_event -> rtpserverpeer_t::status_change -> rtpserver_t::remove_peer -> vector::erase -> ~rtpserverpeer_t -> ~shared_ptr<rtppeer_t> -> freed).

Fix: take a local shared_ptr at entry via weak_from_this().lock(). When the rtppeer is shared-owned (the server path that originally crashed), the local keeps it alive across the signal storm. When it isn't (rtpclient_t embeds rtppeer_t by value at rtpclient.hpp:45; tests stack-allocate it in tests/test_rtppeer.cpp), no external owner can drop the last reference mid-call, so the UAF is structurally impossible, and weak_from_this().lock() yielding nullptr is harmless.

In production this crashed ~317 times per boot under steady cluster activity (~15-20 s interval). It also explains the libfmt-vformat crash signature previously reported on v24.12: same root-cause UAF, the libfmt allocator just happened to be the next user of the freed slab.

ibiltari wants to merge 1 commit into davidmoreno:master from stagesoft:upstream-pr/uaf-fix
