Delete TURN allocation on socket close#101

Open
sgfn wants to merge 9 commits into master from fix/437-alloc-mismatch

@sgfn sgfn commented May 5, 2026

Use ExTURN.Client.close/1 to send Refresh(lifetime=0) and delete the present allocation.

Resolves #100

ref: elixir-webrtc/ex_turn#10

joaothallis and others added 9 commits March 19, 2026 00:20
Before closing a socket, walk every relay candidate and gathering
transaction bound to it, invoke ExTURN.Client.close/1, and send the
returned Refresh(lifetime=0) datagram on the original socket. Without
this, a TURN server (notably Cloudflare Realtime TURN) keeps the
5-tuple allocated until the allocation lifetime expires; a later
Allocate from the same source port is then rejected with 437
Allocation Mismatch (RFC 5766 §6.2, RFC 8656 §7.2), gathering
completes with no typ relay candidate, and ICE fails.
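
For illustration, here is a minimal sketch of what a Refresh request with LIFETIME=0 looks like on the wire. It is not how ExTURN builds the message: the module and function names are hypothetical, and the sketch omits the long-term credential attributes (USERNAME, REALM, NONCE, MESSAGE-INTEGRITY) that a real TURN server requires, so it only shows the STUN header and the LIFETIME attribute.

```elixir
defmodule RefreshSketch do
  # Hypothetical helper, not part of ExTURN.
  # Builds an unauthenticated TURN Refresh request whose LIFETIME is 0,
  # which asks the server to delete the allocation.
  def refresh_zero(transaction_id)
      when is_binary(transaction_id) and byte_size(transaction_id) == 12 do
    # LIFETIME attribute: type 0x000D, length 4, value 0 seconds.
    lifetime_attr = <<0x000D::16, 4::16, 0::32>>

    # STUN header: message type 0x0004 (Refresh request), attribute length,
    # magic cookie 0x2112A442, then the 96-bit transaction ID.
    <<0x0004::16, byte_size(lifetime_attr)::16, 0x2112A442::32,
      transaction_id::binary, lifetime_attr::binary>>
  end
end
```

The resulting binary is 28 bytes long: a 20-byte STUN header followed by the 8-byte LIFETIME attribute.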

Make the teardown run on abrupt parent death too. PeerConnection in
ex_webrtc 0.16 does not trap exits; if its DTLSTransport child crashes
(e.g. unifex_parse_arg when DTLS never negotiated), the linked cascade
kills ICE before PeerConnection can call ice_transport.close. Trap
exits in ICEAgent's init and propagate non-:normal EXITs as {:stop,
reason, state} so terminate/2 always runs the close path. :normal
EXITs (from short-lived children like gatherer worker processes) stay
noreply.
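
The trap-exit behaviour described in this commit message can be sketched with a standalone GenServer. This is an assumption-laden illustration, not the ICEAgent code: the module name, the `{:link, pid}` call, and the `{:closed, reason}` notification are all hypothetical stand-ins for the real close path.

```elixir
defmodule TrapExitSketch do
  use GenServer

  # Started without a link so that an abnormal stop does not take the caller down.
  def start(notify_pid), do: GenServer.start(__MODULE__, notify_pid)

  @impl true
  def init(notify_pid) do
    # Convert exit signals from linked processes into {:EXIT, pid, reason} messages.
    Process.flag(:trap_exit, true)
    {:ok, %{notify: notify_pid}}
  end

  @impl true
  def handle_call({:link, pid}, _from, state) do
    Process.link(pid)
    {:reply, :ok, state}
  end

  @impl true
  # Short-lived helpers exiting normally must not stop the server.
  def handle_info({:EXIT, _pid, :normal}, state), do: {:noreply, state}
  # Any other exit reason stops the server, which makes terminate/2 run.
  def handle_info({:EXIT, _pid, reason}, state), do: {:stop, reason, state}

  @impl true
  def terminate(reason, state) do
    # Stand-in for the close path (for example, releasing TURN allocations).
    send(state.notify, {:closed, reason})
    :ok
  end
end
```

Linking a process that later exits with a non-`:normal` reason drives the `{:stop, reason, state}` clause, and `terminate/2` then reports `{:closed, reason}` to the notified process.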

Transport.Mock in test support keeps closed sockets in the ETS table
with state: :closed so tests can assert what the agent sent on the
close path; setup_socket / open_ephemeral transparently reuse the slot
on re-open.
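
As a rough sketch of the idea (not the actual Transport.Mock from test support), a mock can keep a closed socket's entry in ETS so that tests can still inspect what was sent before the close. The module name, function names, and the use of the port number as the key are assumptions made for this example.

```elixir
defmodule MockSocketTable do
  # Hypothetical, simplified stand-in for a test transport mock.
  def new, do: :ets.new(:mock_sockets, [:set, :public])

  # Opening a port creates a fresh entry; if a :closed entry exists for the
  # same port, it is overwritten, so the slot is reused transparently.
  def open(table, port) do
    :ets.insert(table, {port, %{state: :open, sent: []}})
    {:ok, port}
  end

  # Record outgoing data, but only while the socket is open.
  def send_data(table, port, data) do
    case :ets.lookup(table, port) do
      [{^port, %{state: :open} = sock}] ->
        :ets.insert(table, {port, %{sock | sent: [data | sock.sent]}})
        :ok

      _ ->
        {:error, :closed}
    end
  end

  # Closing marks the entry as :closed instead of deleting it.
  def close(table, port) do
    case :ets.lookup(table, port) do
      [{^port, sock}] ->
        :ets.insert(table, {port, %{sock | state: :closed}})
        :ok

      [] ->
        :ok
    end
  end

  # Everything sent on the socket, oldest first; still readable after close.
  def sent(table, port) do
    case :ets.lookup(table, port) do
      [{^port, sock}] -> Enum.reverse(sock.sent)
      [] -> []
    end
  end
end
```

Keeping the entry around trades a little bookkeeping for the ability to assert on the datagrams the agent emitted during teardown.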

Depends on the matching ExTURN.Client.close/1 addition; pinned to that
commit via git dep until an ex_turn release ships.

Verified end-to-end against Cloudflare Realtime TURN via
ex_turn_cloudflare_repro: 20/20 iterations emit typ relay with zero
437s on narrow port-range cycling (was 0/20 without the fix, 437 on
iteration 1).

The handle_info({:EXIT, _, reason}, state) clause had no test coverage:
gen_server intercepts EXITs from the parent and runs terminate/2
directly, so the existing parent-death test never reached the clause.
Add a test that links a non-parent process to the agent and lets it exit
abnormally, which is the only path that actually drives the
{:stop, reason, state} return.

Drop the case fallback in terminate/2; init/1 always returns a state map
with :ice_agent, so the _ -> :ok branch was unreachable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

ExTURN.Client.close/1 emits no logs and the transport's send/4 returns
{:error, _} silently, so a failed Refresh leaves no trace, which is
exactly the failure mode that triggers 437 Allocation Mismatch on the
next port reuse. Surface the error at warning level instead of
swallowing it.

Add a Transport.Mock.fail_send/2 hook so the test can drive a real
allocation to :allocated, force the close-path send to return :enotconn,
and assert on the captured log.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@sgfn sgfn requested a review from Karolk99 May 5, 2026 15:49

sgfn commented May 5, 2026

@joaothallis

Rationale behind my changes to your branch:

  1. Don't trap exits: We don't expect ExWebRTC to crash, so preparing for it seems like overkill. A fix for the crash in ExWebRTC.DTLSTransport.close/1 is coming.
  2. Don't call ExTURN.Client.close/1 on clients in gathering_transactions: these clients are not :allocated yet; they "move" to local_cands on :allocation_created. The additional cleanup step would only cover the corner case where we have received the :allocation_created message but have yet to process it during teardown, which is not worth covering IMHO.
  3. Logger.warning -> Logger.debug: There's virtually nothing the user can do when that final send fails, and we'd rather keep the verbosity level low.

Comments are welcome :)

Comment on lines +2489 to +2496
  defp release_turn_allocation(ice_agent, socket, client) do
    with {:send, turn_addr, data, _client} <- ExTURN.Client.close(client) do
      case ice_agent.transport_module.send(socket, turn_addr, data) do
        :ok -> :ok
        {:error, reason} -> Logger.debug("Couldn't send deallocate request, reason: #{reason}")
      end
    end
  end

NITPICK
It looks nice (if ExTURN.Client.close/1 returns {:ok, state}, the with expression falls through and returns that value), but it would be better to always return the same value.

Suggested change
-  defp release_turn_allocation(ice_agent, socket, client) do
-    with {:send, turn_addr, data, _client} <- ExTURN.Client.close(client) do
-      case ice_agent.transport_module.send(socket, turn_addr, data) do
-        :ok -> :ok
-        {:error, reason} -> Logger.debug("Couldn't send deallocate request, reason: #{reason}")
-      end
-    end
-  end
+  defp release_turn_allocation(ice_agent, socket, client) do
+    with {:send, turn_addr, data, _client} <- ExTURN.Client.close(client),
+         :ok <- ice_agent.transport_module.send(socket, turn_addr, data) do
+      :ok
+    else
+      {:ok, _state} -> :ok
+      {:error, reason} -> Logger.debug("Couldn't send deallocate request, reason: #{reason}")
+    end
+  end
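
To make the "always return the same value" point concrete, here is a self-contained sketch of the suggested `with`/`else` shape. It does not call ExTURN or a real transport: `close_result` stands in for whatever ExTURN.Client.close/1 returned and `send_fun` for the transport's send function, and both names (as well as the module name) are hypothetical. The sketch also uses `inspect/1` on the reason, which is an addition of this example so that non-atom reasons can be logged safely.

```elixir
defmodule ReleaseSketch do
  require Logger

  # close_result: assumed to be {:send, addr, data, client} or {:ok, client}.
  # send_fun: 2-arity function standing in for the transport's send.
  def release(close_result, send_fun) do
    with {:send, turn_addr, data, _client} <- close_result,
         :ok <- send_fun.(turn_addr, data) do
      :ok
    else
      # Nothing to send: there was no allocation to delete.
      {:ok, _client} ->
        :ok

      # The send failed; log it and still return :ok so callers see one shape.
      {:error, reason} ->
        Logger.debug("Couldn't send deallocate request, reason: #{inspect(reason)}")
        :ok
    end
  end
end
```

All three paths (sent successfully, nothing to send, send failed) return `:ok`, so the caller never has to distinguish between a passed-through `{:ok, state}` and the result of the send.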

    with {:send, turn_addr, data, _client} <- ExTURN.Client.close(client) do
      case ice_agent.transport_module.send(socket, turn_addr, data) do
        :ok -> :ok
        {:error, reason} -> Logger.debug("Couldn't send deallocate request, reason: #{reason}")

If that can cause problems for the user, maybe info would be better.

@Karolk99 Karolk99 left a comment

We should look more closely at whether we need to close TURN allocations in cases when handle_terminate won't be called (when PeerConnection is closed with a different reason than :normal)

Development

Successfully merging this pull request may close these issues.

ICEAgent.close/1 does not release TURN allocations: Cloudflare TURN returns 437 on socket reuse
