fix: workflow incorrectly marked as completed while nodes are still executing#62
Conversation
There was a problem hiding this comment.
Pull request overview
This PR aims to fix an issue where workflows were incorrectly marked as completed while nodes were still executing. The fix introduces execution tracking before enqueueing nodes and adds comprehensive unit tests for the skip propagator component.
Changes:
- Added
start_execution()call beforeenqueue_node()to track node execution earlier - Introduced comprehensive unit tests for
SkipPropagatorclass covering edge state propagation scenarios - Created test infrastructure directory with proper initialization
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| api/core/workflow/graph_engine/graph_traversal/skip_propagator.py | Added execution tracking call and reordered edge marking operations |
| api/tests/unit_tests/core/workflow/graph_engine/graph_traversal/test_skip_propagator.py | New comprehensive test suite with 7 test cases covering skip propagation logic |
| api/tests/unit_tests/core/workflow/graph_engine/graph_traversal/init.py | Added initialization file with docstring for test directory |
| if edge_states["has_taken"]: | ||
| # Enqueue node | ||
| # Start execution and enqueue node | ||
| print(f"Starting execution for node: {downstream_node_id}") |
There was a problem hiding this comment.
This print statement should be removed. Debug print statements should not be committed to production code. Consider using proper logging instead if debugging output is needed.
| # Recursively propagate skip | ||
| self.propagate_skip_from_edge(edge.id) | ||
| self._state_manager.mark_edge_skipped(edge.id) |
There was a problem hiding this comment.
The order of these operations has been reversed, which introduces a bug. The edge must be marked as skipped BEFORE calling propagate_skip_from_edge, because propagate_skip_from_edge analyzes the state of incoming edges (line 53) to determine how to handle the downstream node. If the edge isn't marked as skipped yet, the analysis will be incorrect and may lead to improper skip propagation or execution decisions.
| # Start execution and enqueue node | ||
| print(f"Starting execution for node: {downstream_node_id}") | ||
| self._state_manager.start_execution(downstream_node_id) | ||
| self._state_manager.enqueue_node(downstream_node_id) |
There was a problem hiding this comment.
The order of start_execution and enqueue_node has been reversed compared to the pattern used elsewhere in the codebase (e.g., event_handlers.py lines 196-197, 279-280, graph_engine.py lines 334-335). While this change appears intentional to fix the issue described in the PR title, it creates an inconsistent pattern across the codebase. Consider either: (1) documenting why this specific call site needs a different order, or (2) updating all call sites to use the same order if this is the correct pattern.
| # Start execution and enqueue node | |
| print(f"Starting execution for node: {downstream_node_id}") | |
| self._state_manager.start_execution(downstream_node_id) | |
| self._state_manager.enqueue_node(downstream_node_id) | |
| # Enqueue node and start execution (order matches rest of codebase) | |
| print(f"Starting execution for node: {downstream_node_id}") | |
| self._state_manager.enqueue_node(downstream_node_id) | |
| self._state_manager.start_execution(downstream_node_id) |
Benchmark PR from qodo-benchmark#441