Hotfix/subscription race by FieldSwan · Pull Request #1194 · RobotWebTools/rosbridge_suite

FieldSwan · 2026-03-18T15:14:47Z

Public API Changes
None

Background
rosbridge_server would sometimes partially crash (on the ROS2 side) when multiple clients connected/disconnected or reconnected on bootup:

[INFO] [1773839146.399416586] [rosbridge_websocket]: [Client 706ee77c-025c-4b4c-aed6-cf68aab19bd7] Subscribed to /foo/dashboard/current_path
Exception in thread Thread-1 (spin):
Traceback (most recent call last):
  File "/usr/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.12/threading.py", line 1010, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/ros/kilted/lib/python3.12/site-packages/rclpy/executors.py", line 374, in spin
    self.spin_once()
  File "/opt/ros/kilted/lib/python3.12/site-packages/rclpy/executors.py", line 968, in spin_once
    self._spin_once_impl(timeout_sec)
  File "/opt/ros/kilted/lib/python3.12/site-packages/rclpy/executors.py", line 951, in _spin_once_impl
    handler, entity, node = self.wait_for_ready_callbacks(
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/ros/kilted/lib/python3.12/site-packages/rclpy/executors.py", line 921, in wait_for_ready_callbacks
    return next(self._cb_iter)
           ^^^^^^^^^^^^^^^^^^^
[INFO] [1773839146.413645247] [rosbridge_websocket]: [Client 706ee77c-025c-4b4c-aed6-cf68aab19bd7] Subscribed to /foo/vehicle_status
  File "/opt/ros/kilted/lib/python3.12/site-packages/rclpy/executors.py", line 820, in _wait_for_ready_callbacks
    waitable.add_to_wait_set(wait_set)
  File "/opt/ros/kilted/lib/python3.12/site-packages/rclpy/event_handler.py", line 176, in add_to_wait_set
    with self.__event:
rclpy._rclpy_pybind11.InvalidHandle: cannot use Destroyable because destruction was requested

Description

This PR fixes the issue by scheduling subscription destruction as a task on the executor thread instead of destroying subscriptions inside their own callback.
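To illustrate the idea of deferring destruction to the executor thread, here is a minimal stdlib-only sketch. It is not the code from this PR: `ToyExecutor`, `ToySubscription`, and `unsubscribe` are hypothetical names, and the toy executor only stands in for the real one. In rclpy the equivalent would presumably be built on `Executor.create_task` and `Node.destroy_subscription`, but that is an assumption about the approach, not a description of the diff.

```python
import queue
import threading


class ToyExecutor:
    """Hypothetical stand-in for an executor: runs queued tasks on one thread."""

    def __init__(self):
        self._tasks = queue.Queue()
        self._thread = threading.Thread(target=self._spin, daemon=True)
        self._thread.start()

    def _spin(self):
        while True:
            task = self._tasks.get()
            if task is None:  # sentinel pushed by shutdown()
                return
            task()

    def create_task(self, fn, *args):
        # Queue the call so it runs later on the executor thread.
        self._tasks.put(lambda: fn(*args))

    def shutdown(self):
        # Tasks queued before the sentinel still run, then the thread exits.
        self._tasks.put(None)
        self._thread.join()

    @property
    def thread_id(self):
        return self._thread.ident


class ToySubscription:
    """Hypothetical subscription that records which thread destroyed it."""

    def __init__(self):
        self.destroyed_on = None

    def destroy(self):
        self.destroyed_on = threading.get_ident()


def unsubscribe(executor, subscription):
    # Called from a callback or client thread: schedule, do not destroy here.
    executor.create_task(subscription.destroy)


executor = ToyExecutor()
sub = ToySubscription()
unsubscribe(executor, sub)
executor.shutdown()
```

Because destruction only ever happens on the thread that also builds the wait set, the executor should never find an entity that was torn down underneath it, which is the condition the `InvalidHandle` traceback above reports.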
In my stress testing (3 clients under high CPU and bandwidth load, then reloading each client), the fix left subscriptions racing to send to the websocket after it was already down (better than crashing!). This delayed the client side for a short while (up to ~30 seconds) while the server tried to clear the queue as messages from some subscriptions were still coming in. Once the queue was cleared, the server accepted new websocket connections and the bridge was functional again. To reduce this delay, we now stop sending messages to the websocket as soon as we know the client has disconnected, which seems to cut the worst-case reconnect time to ~5 seconds.
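The "stop sending once the client is gone" part can be sketched as a simple gate in front of the socket. This is an illustrative sketch only: `ClientConnection`, `on_close`, `send`, and the `dropped` counter are hypothetical names, and `send_fn` is assumed to be whatever function actually writes to the websocket.

```python
import threading


class ClientConnection:
    """Hypothetical wrapper around a websocket with a 'closed' gate."""

    def __init__(self, send_fn):
        self._send_fn = send_fn  # assumed: callable that writes to the websocket
        self._closed = threading.Event()
        self.dropped = 0

    def on_close(self):
        # Flip the gate as soon as the disconnect is known.
        self._closed.set()

    def send(self, message):
        if self._closed.is_set():
            # Discard instead of queueing work for a dead socket.
            self.dropped += 1
            return False
        self._send_fn(message)
        return True


sent = []
conn = ClientConnection(sent.append)
conn.send("msg 1")
conn.on_close()
conn.send("msg 2")  # from a subscription that has not been torn down yet
```

Messages from subscriptions that are still alive are dropped cheaply at the gate rather than piling up in the outgoing queue, which is one plausible reason the reconnect delay shrinks.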

Testing
I have not written a unit test for this, but the reproduction steps seem to be:

1. Have multiple clients waiting to connect to the rosbridge_server at the same time.
2. Due to system CPU stress, some clients time out and attempt to reconnect.
3. rosbridge_server partially crashes: new websocket connections can still be made, but no responses are ever sent back.

FieldSwan added 2 commits March 19, 2026 00:43

Fix subscription creation and destruction race conditions

3899e7d

Send fewer messages to closed websocket.

1f462ec

FieldSwan wants to merge 2 commits into RobotWebTools:ros2 from field-ai:hotfix/subscription-race