Commit d72a0fe
Always take th->interrupt_lock in ubf_clear
Patch 0837263 fixed a race condition on ubfs, but it's only valid if right after
a call to `ubf_clear`, we assume the ubf function cannot be in the middle of running.
This patch removes an optimization in `ubf_clear` that violates that assumption. In short,
`ubf_clear` needs to take `th->interrupt_lock` unconditionally both to avoid deadlocks and to be
able to reason about when ubfs can be run.
This should fix CI errors like https://ci.rvm.jp/results/trunk-jemalloc@ruby-sp2-noble-docker/6242153.
The error was in test_timeout.rb, which had a deadlock during VM shutdown.
```ruby
r = Ractor.new do
begin
Timeout.timeout(0.1) { sleep }
rescue Timeout::Error
:ok
end
end.value
assert_equal :ok, r
```
The deadlock happened during `rb_ractor_terminate_interrupt_main_thread` with 2 ractors:
1) r1 t1: UBF called with t2->interrupt_lock (ubf = ubf_waiting)
2) r2 t2: ubf cleared from previous thread_sched_wait_events_call (but no lock taken, because of optimization)
3) r2 t2: thread_sched_wait_events: acquire thread_sched_lock(t2) (caller calling native_sleep() in loop)
4) r2 t2: ubf_set: try to acquire t2->interrupt_lock [block]
5) r1 t1: try to acquire thread_sched_lock(t2) [block, deadlock]
t2 needs to block on t2->interrupt_lock in step 2 until the ubf has completed. Only then can it register a new
ubf in the next `native_sleep` iteration.1 parent 44b99d6 commit d72a0fe
1 file changed
Lines changed: 5 additions & 7 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1055 | 1055 | | |
1056 | 1056 | | |
1057 | 1057 | | |
1058 | | - | |
1059 | | - | |
1060 | | - | |
1061 | | - | |
1062 | | - | |
1063 | | - | |
1064 | | - | |
| 1058 | + | |
| 1059 | + | |
| 1060 | + | |
| 1061 | + | |
1065 | 1062 | | |
| 1063 | + | |
1066 | 1064 | | |
1067 | 1065 | | |
1068 | 1066 | | |
| |||
0 commit comments