Unit tests for swarm workflow #35

yedidyakfir · 2026-01-13T14:50:45Z

This commit adds extensive unit tests for the three main swarm workflow lifecycle functions (swarm_start_tasks, swarm_item_done, swarm_item_failed) and the shared handle_finish_tasks helper function.

Test Coverage:

30 unit tests covering sanity checks and edge cases
Tests for swarm_start_tasks (8 tests):
- Basic flow with max_concurrency
- Already started (idempotency)
- Empty tasks list
- Missing parameters and error conditions
Tests for swarm_item_done (6 tests):
- Basic completion flow
- Last item completion triggers swarm finish
- Missing parameters
- Concurrent completions
- Error handling
Tests for swarm_item_failed (8 tests):
- Continue after failure below threshold
- Stop at failure threshold
- stop_after_n_failures edge cases (None, 0, 1)
- Concurrent failures
- Missing parameters
Tests for handle_finish_tasks (5 tests):
- Starting next task
- Swarm completion
- No tasks left scenario
- Exception handling
Concurrency tests (2 tests):
- Multiple concurrent completions
- Multiple concurrent failures

Additional Files:

WORKFLOW_TEST_ANALYSIS.md: Comprehensive analysis document explaining:
- All edge cases and why they matter
- Where errors can occur and why
- Test strategy and coverage goals
- Potential bugs identified in the code

Fixes:

Updated conftest imports to avoid circular dependencies
All tests pass successfully (30/30)

v0.0.5

This commit adds extensive unit tests for the three main swarm workflow lifecycle functions (swarm_start_tasks, swarm_item_done, swarm_item_failed) and the shared handle_finish_tasks helper function. Test Coverage: - 30 unit tests covering sanity checks and edge cases - Tests for swarm_start_tasks (8 tests): * Basic flow with max_concurrency * Already started (idempotency) * Empty tasks list * Missing parameters and error conditions - Tests for swarm_item_done (6 tests): * Basic completion flow * Last item completion triggers swarm finish * Missing parameters * Concurrent completions * Error handling - Tests for swarm_item_failed (8 tests): * Continue after failure below threshold * Stop at failure threshold * stop_after_n_failures edge cases (None, 0, 1) * Concurrent failures * Missing parameters - Tests for handle_finish_tasks (5 tests): * Starting next task * Swarm completion * No tasks left scenario * Exception handling - Concurrency tests (2 tests): * Multiple concurrent completions * Multiple concurrent failures Additional Files: - WORKFLOW_TEST_ANALYSIS.md: Comprehensive analysis document explaining: * All edge cases and why they matter * Where errors can occur and why * Test strategy and coverage goals * Potential bugs identified in the code Fixes: - Updated conftest imports to avoid circular dependencies - All tests pass successfully (30/30)

Copilot

Pull request overview

This PR adds comprehensive unit test coverage for swarm workflow lifecycle functions in the mageflow system. The changes focus on testing three main event handlers (swarm_start_tasks, swarm_item_done, swarm_item_failed) and a shared helper function (handle_finish_tasks).

Changes:

Added 30 unit tests covering normal operation, edge cases, and concurrency scenarios
Fixed circular dependency issues by updating imports from tests.integration.hatchet.worker to tests.integration.hatchet.models
Included detailed test analysis documentation (WORKFLOW_TEST_ANALYSIS.md) explaining edge cases, potential bugs, and testing strategy

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 15 comments.

File	Description
tests/unit/swarm/test_workflows.py	New comprehensive test suite with 30 tests covering swarm workflow functions including sanity tests, edge cases, and concurrency scenarios
tests/unit/swarm/conftest.py	Updated import to fix circular dependency (ContextMessage from models instead of worker)
tests/unit/conftest.py	Updated import to fix circular dependency (ContextMessage from models instead of worker)
tests/unit/swarm/WORKFLOW_TEST_ANALYSIS.md	Detailed analysis document explaining edge cases, potential bugs, and test strategy for all workflow functions

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-01-13T14:56:33Z

tests/unit/swarm/WORKFLOW_TEST_ANALYSIS.md

+**Fix needed**: Should be:
+```python
+should_stop_after_failures = swarm_task.config.stop_after_n_failures is not None
+stop_after_n_failures = swarm_task.config.stop_after_n_failures if should_stop_after_failures else 0


The analysis identifies a potential bug where stop_after_n_failures=0 uses or 0 which makes the check len >= 0 always True. However, looking at the actual implementation (line 77-82 of workflows.py), the code checks should_stop_after_failures and too_many_errors. When stop_after_n_failures is 0, should_stop_after_failures would be True (since 0 is not None), and too_many_errors would be True (since any len >= 0). This means the swarm would stop immediately. The analysis is correct about the logic flaw, but the suggested fix may not be complete - it should clarify whether stop_after_n_failures=0 means "never stop" or "stop immediately on first failure".

Suggested change

**Fix needed**: Should be:

```python

should_stop_after_failures = swarm_task.config.stop_after_n_failures is not None

stop_after_n_failures = swarm_task.config.stop_after_n_failures if should_stop_after_failures else 0

**Fix needed**: Decide on the intended semantics and reflect them in the code. Assuming `None` and `0`

both mean "never stop due to failures", and any positive integer means "stop once failures reach that

number", the logic should be:

```python

raw_stop_after_n_failures = swarm_task.config.stop_after_n_failures

# Treat None and 0 as "never stop due to failures"

should_stop_after_failures = raw_stop_after_n_failures not in (None, 0)

stop_after_n_failures = raw_stop_after_n_failures if should_stop_after_failures else 0

Copilot · 2026-01-13T14:56:33Z