Fix deadlock when an error occurs in the frame_generator #45
rskew wants to merge 22 commits into geekscape:master from
Conversation
179fedc to 28b0a23
@rskew, when I was looking into this, I suspected that the try/catch around the frame_generator needed a finally to release the lock. I noticed your PR looks to release it in destroy_stream(). Did you consider releasing the lock in the _create_frames_generator function?
28b0a23 to e5e722f
@jonochang The lock is released here: aiko_services/src/aiko_services/main/pipeline.py, lines 1579 to 1581 (at f2e42a1). The bug was in the frame generator.
@jonochang Although note that in our testing, using https://github.com/silverpond/aiko_services, we saw slightly different behaviour due to also using the commits from #42. However, the bug is the same.
Example graph:
__________
/ \ \
A B ---- C --->
\___/______/
has the following syntax in a pipeline definition:
"graph": [
"(A B (A.a_out_1: b_in_1 A.a_out_2: b_in_2) C (A.a_out_1: c_in_1 B.b_out_1: c_in_2 A.a_out_2: c_in_3))"
],
Note that output names must be fully-qualified, e.g. "B.b_out_1" instead
of "b_out_1". This is due to the graph traversal not yet handling edges
defined between B and C in the example graph, only between A and B, and
between A and C.
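The role of fully-qualified names can be sketched as a lookup problem (an illustration only; resolve_inputs and its arguments are hypothetical, not the Aiko API):

```python
# In the fan-in mapping for C, each key names the producing element
# explicitly, so inputs coming from A and from B can be told apart even
# though the traversal only walked the A->B and A->C edges.
mapping_c = {
    "A.a_out_1": "c_in_1",
    "B.b_out_1": "c_in_2",
    "A.a_out_2": "c_in_3",
}

def resolve_inputs(mapping, outputs_by_element):
    # outputs_by_element: e.g. {"A": {"a_out_1": ...}, "B": {...}}
    inputs = {}
    for qualified_out, in_name in mapping.items():
        element, out_name = qualified_out.split(".", 1)
        inputs[in_name] = outputs_by_element[element][out_name]
    return inputs

print(resolve_inputs(mapping_c, {
    "A": {"a_out_1": 1, "a_out_2": 2},
    "B": {"b_out_1": 3},
}))
# {'c_in_1': 1, 'c_in_2': 3, 'c_in_3': 2}
```

An unqualified key like "b_out_1" would leave the producing element ambiguous when two upstream elements feed the same node.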
PipelineImpl posts to the listening response_queue and/or response_topic when a stream is destroyed. A stream creator might pass a queue or topic response when calling create_stream(), so that it can be notified as frames are processed. If a stream exits due to an error in process_frame, these listeners are notified; previously, however, if the stream exited without error, the listeners were not notified.
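The notification flow can be sketched as follows (a minimal illustration, not the Aiko API; notify_listeners and its parameters are hypothetical):

```python
import queue

def notify_listeners(stream_info, response_queue=None, response_topic=None,
                     publish=None):
    # Hypothetical sketch: on stream destruction, post to every listener
    # registered at create_stream() time, regardless of whether the stream
    # ended in an error or completed normally.
    if response_queue is not None:
        response_queue.put(stream_info)           # queue listener
    if response_topic is not None and publish is not None:
        publish(response_topic, stream_info)      # topic (MQTT-style) listener

# A stream creator waits on the queue to learn that the stream finished.
q = queue.Queue()
notify_listeners({"stream_id": 1, "state": "STOP"}, response_queue=q)
print(q.get())  # {'stream_id': 1, 'state': 'STOP'}
```

Posting on both the error and the normal path is what lets the creator block on a single q.get() without needing a timeout.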
destroy_stream() in an error condition

Currently, when an error is raised, _process_stream_event() calls destroy_stream() directly so that the stream is immediately terminated and cleaned up. However, _process_stream_event() releases stream.lock before calling destroy_stream(), allowing another thread to update stream.state before destroy_stream() can stop and clean up the stream. This means stream.state cannot be used to signal that an error condition has occurred.
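The window described here can be made concrete with a small sketch (events force the unlucky interleaving; this illustrates the race only and is not the actual pipeline.py code):

```python
import threading

class Stream:
    def __init__(self):
        self.lock = threading.Lock()
        self.state = "RUN"

stream = Stream()
window_open = threading.Event()
state_updated = threading.Event()
observed = []

def process_stream_event(stream):
    # Simplified: on error, set the state, then release the lock
    # *before* destroy_stream() runs.
    with stream.lock:
        stream.state = "ERROR"
    window_open.set()      # lock released: the race window is open
    state_updated.wait()   # deterministically force the bad interleaving
    destroy_stream(stream)

def destroy_stream(stream):
    observed.append(stream.state)  # the error signal is already gone

def other_thread(stream):
    window_open.wait()
    with stream.lock:
        stream.state = "RUN"       # another thread updates the state
    state_updated.set()

t1 = threading.Thread(target=process_stream_event, args=(stream,))
t2 = threading.Thread(target=other_thread, args=(stream,))
t1.start(); t2.start(); t1.join(); t2.join()
print(observed)  # ['RUN'] -- destroy_stream never saw "ERROR"
```

Because the state write and the clean-up are not covered by one critical section, destroy_stream() can observe a state that no longer reflects the error.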
…es' start_stream() method to False, making the use of create_frames() the default
… pipeline elements
geekscape left a comment
Thanks for your fix.
I've made this change (just the fix) and pushed to master.
Since that change broke two of the unit tests, I've made a further change, which has also been pushed to master.
I have not yet included the unit test associated with the PR#45 fix, because it sounded like there is still a problem with that test to be resolved.
raise RuntimeError("Simulated frame generator exception - this should cause unreleased lock!")

def process_frame(self, stream, **kwargs) -> Tuple[aiko.StreamEvent, dict]:
    self.logger.warning(f"Processing frame {stream.frame_id}")
Thanks @geekscape
7d54176 to 239f6cb
When an exception is thrown in the frame generator, the stream lock was not released.
This PR adds a test showing the problem, and the fix.
The test should pass, but reverting the changes to pipeline.py will cause the test to fail.