Fix race condition in websocket test #1185
You sure you want to remove the
Tested this code; it doesn't fix the underlying issue mentioned in #1144. That happens because the rclpy (SingleThreaded)Executor may crash, which then still throws … @YannickdeHoop and I are working on an actual fix that uses the executor directly, so you may never have race conditions for your results ever again: https://github.com/eurogroep/rosbridge_suite/tree/fix/use-executor-callback-queue @bjsowa LMK what you think of this.
A test demonstrating where the underlying issue is would be helpful.
It did help me identify a race condition, but specifically the race condition located in … The test does not capture the original deadlock condition I found; …
@ikwilnaarhuisman, did you try PR #1183? That is the fix I suggested could resolve issue #1144.
@Mergifyio backport kilted
✅ Backports have been created
* Integration test instrumentation fix
* Revert unnecessary startup fix and remove corresponding always passing test

(cherry picked from commit 85ff1cd)
@Mergifyio backport jazzy
✅ Backports have been created
Cherry-pick of 85ff1cd has failed. To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally
* Integration test instrumentation fix
* Revert unnecessary startup fix and remove corresponding always passing test

(cherry picked from commit 85ff1cd)

# Conflicts:
# rosbridge_server/CMakeLists.txt
# rosbridge_server/test/websocket/startup_race.test.py
* Integration test instrumentation fix
* Revert unnecessary startup fix and remove corresponding always passing test

(cherry picked from commit 85ff1cd)

# Conflicts:
# rosbridge_server/CMakeLists.txt
# rosbridge_server/test/websocket/startup_race.test.py

* Fix conflicts

---------

Co-authored-by: FieldSwan <michael.swan@fieldai.com>
Co-authored-by: Błażej Sowa <bsowa123@gmail.com>
Public API Changes
None
Description
This fixes an issue with the websocket testing instrumentation, which had an inherent race condition that sometimes caused my test to fail incorrectly (a false negative). That led me to believe that other solutions were helping with other issues (they were not).
Changes:
Eventually this should fail, especially with stress running at the same time (load your CPU as much as possible). The fix was to send messages from the reactorThread and to add retries when getting the ROS port parameter from the ROS node.
3. Removes the startup_race.test.py test, which, it turns out, was mostly exercising the websocket test instrumentation rather than any particularly useful code paths.
4. Reverts the previous "fix" I made to the server instantiation order. Upon further examination, I realized that the order does not matter, since websocket requests are not processed in rosbridge_websocket.py until we call await stop_event.wait(), which already happens after the ros2 executor and node are started. I can drop this revert if we want to keep the instantiation order I introduced earlier (either way should work fine).
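The instrumentation fix described above (sending messages from the reactorThread and retrying the ROS port parameter lookup) can be illustrated with the standard library alone. This is a minimal sketch, not the code from this PR: it uses asyncio's `loop.call_soon_threadsafe` as a stand-in for Twisted's `reactor.callFromThread`, and `LoopThread`, `get_with_retries`, `fake_send` and `run_send_demo` are hypothetical names. The idea is that every send is handed off to the one thread that owns the event loop, and that a value which may not exist yet (such as the bound port) is polled a bounded number of times instead of read once.

```python
import asyncio
import threading
import time
from typing import Callable, List, Optional, Tuple, TypeVar

T = TypeVar("T")


def get_with_retries(fetch: Callable[[], Optional[T]], attempts: int = 10,
                     delay_s: float = 0.1) -> T:
    """Hypothetical helper: poll fetch() until it returns something other than None."""
    last_error: Optional[Exception] = None
    for _ in range(attempts):
        try:
            value = fetch()
            if value is not None:
                return value
        except Exception as exc:  # e.g. the node or parameter is not ready yet
            last_error = exc
        time.sleep(delay_s)
    raise TimeoutError(f"no value after {attempts} attempts") from last_error


class LoopThread:
    """Owns one event loop running in a background thread; all sends execute there."""

    def __init__(self) -> None:
        self.loop = asyncio.new_event_loop()
        self.thread = threading.Thread(target=self.loop.run_forever, daemon=True)

    def start(self) -> None:
        self.thread.start()

    def call_from_thread(self, fn: Callable[..., object], *args: object) -> None:
        # Thread-safe hand-off onto the loop thread (analogous to reactor.callFromThread).
        self.loop.call_soon_threadsafe(fn, *args)

    def stop(self) -> None:
        self.loop.call_soon_threadsafe(self.loop.stop)
        self.thread.join()
        self.loop.close()


def run_send_demo(n_threads: int = 4,
                  per_thread: int = 25) -> Tuple[List[Tuple[int, int]], Optional[int]]:
    """Send from several threads; return (thread id, payload) per send and the loop's thread id."""
    loop_thread = LoopThread()
    loop_thread.start()
    handled: List[Tuple[int, int]] = []

    def fake_send(payload: int) -> None:
        # Stand-in for a websocket send call; only ever runs on the loop thread.
        handled.append((threading.get_ident(), payload))

    def worker() -> None:
        for i in range(per_thread):
            loop_thread.call_from_thread(fake_send, i)

    workers = [threading.Thread(target=worker) for _ in range(n_threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()

    # Callbacks run in FIFO order, so this fires only after every queued send has run.
    done = threading.Event()
    loop_thread.call_from_thread(done.set)
    done.wait(timeout=5.0)

    loop_ident = loop_thread.thread.ident
    loop_thread.stop()
    return handled, loop_ident
```

In this sketch, no two sends ever touch the connection concurrently because they all run on the loop thread, and the bounded retry turns a start-up race into either a value or an explicit `TimeoutError` rather than an intermittent wrong read.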
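Item 4's argument relies on cooperative scheduling: a request handler that is scheduled while the server is being built cannot run until the main coroutine first yields. The sketch below assumes, as `await stop_event.wait()` suggests, that rosbridge_websocket.py runs on an asyncio-style event loop; `handle_request` and `run_startup_demo` are hypothetical, and starting the node and executor is reduced to log entries. It does not model an executor spinning in a separate thread.

```python
import asyncio
from typing import List


async def main(log: List[str]) -> None:
    stop_event = asyncio.Event()

    async def handle_request() -> None:
        # Hypothetical stand-in for a websocket request handler.
        log.append("request handled")
        stop_event.set()

    # The "server" is created first: the handler is scheduled now but cannot run,
    # because nothing below yields control to the event loop.
    request_task = asyncio.get_running_loop().create_task(handle_request())

    # Synchronous start-up that follows (stand-ins for starting the node and executor).
    log.append("node started")
    log.append("executor started")

    # First suspension point: only here does the scheduled handler get to run.
    await stop_event.wait()
    await request_task


def run_startup_demo() -> List[str]:
    log: List[str] = []
    asyncio.run(main(log))
    return log


print(run_startup_demo())  # ['node started', 'executor started', 'request handled']
```

Under that assumption, swapping the order of the two synchronous start-up steps and the server creation changes nothing observable, because no handler can execute before the first `await`.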