docs/index.md (3 additions, 0 deletions)
@@ -20,6 +20,9 @@
</div>

!!! warning "Breaking Changes in v5.0.0"
    Starting with version 5.0.0, **Pydantic support will become optional**. The default implementations of `Request`, `Response`, `DomainEvent`, and `NotificationEvent` will be migrated to dataclass-based implementations.

docs/saga/recovery.md (61 additions, 13 deletions)
@@ -10,23 +10,25 @@
</div>

Recovery ensures eventual consistency by resuming interrupted sagas from persistent storage, guaranteeing that all sagas eventually reach a terminal state (COMPLETED or FAILED). Recovery **attempts** are tracked per saga, so you can limit retries and exclude persistently failing sagas.

## Overview

Sagas can be interrupted by server crashes, network timeouts, or system overload. Recovery addresses this by:

1. Persisting saga state after each step
2. Periodically scanning for incomplete sagas (via `get_sagas_for_recovery`)
3. Resuming execution from the last completed step
4. Completing compensation if the saga was in the compensating state
5. Tracking **recovery attempts**: when recovery fails, the storage increments `recovery_attempts` automatically, so sagas can be retried or excluded once the limit is reached
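
Steps 1 and 3 can be illustrated with a toy, in-memory model. The names `SagaState`, `StateStore`, and `run_saga` are hypothetical and not part of the library; a real saga storage persists to a database and also handles compensation, which this sketch leaves out.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Callable


@dataclass
class SagaState:  # hypothetical, simplified saga record
    saga_id: str
    completed_steps: int = 0  # number of finished steps = index of the next step to run
    status: str = "RUNNING"


class StateStore:
    """Toy stand-in for persistent storage; stores copies so callers cannot mutate saved rows."""

    def __init__(self) -> None:
        self._rows: dict[str, SagaState] = {}

    def save(self, state: SagaState) -> None:
        self._rows[state.saga_id] = replace(state)

    def load(self, saga_id: str) -> SagaState | None:
        row = self._rows.get(saga_id)
        return replace(row) if row is not None else None


def run_saga(store: StateStore, saga_id: str, steps: list[Callable[[], None]]) -> SagaState:
    """Run a saga, or resume it from the last persisted step after an interruption."""
    state = store.load(saga_id) or SagaState(saga_id)  # resume if progress was saved
    for index in range(state.completed_steps, len(steps)):
        steps[index]()  # may raise, e.g. a simulated crash
        state.completed_steps = index + 1
        store.save(state)  # persist after each step
    state.status = "COMPLETED"
    store.save(state)
    return state
```

Because progress is saved after every step, a rerun after a crash skips the steps that already finished instead of executing them twice.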

## Eventual Consistency

The saga pattern ensures eventual consistency through:

- **Persistent State** — Saved after each step
- **Recovery Mechanism** — Interrupted sagas can be resumed
- **Recovery Attempts** — Each saga has a `recovery_attempts` counter that is incremented automatically when recovery fails, so you can limit retries and exclude sagas that reach `max_recovery_attempts`
- **Compensation Guarantee** — Failed sagas are always compensated
- **Terminal States** — All sagas eventually reach COMPLETED or FAILED
@@ -93,6 +95,33 @@ except RuntimeError:
**Status:** `COMPLETED`
**Recovery:** No action needed

## Recovery Attempts

Each saga in storage has a `recovery_attempts` counter. It is used to:

- **Limit retries** — Sagas that fail recovery repeatedly can be excluded from future recovery runs
- **Avoid infinite loops** — Persistently failing sagas (e.g. due to bad data) stop being picked once `recovery_attempts` reaches `max_recovery_attempts`

**Automatic increment:** When `recover_saga()` fails (an exception is raised while resuming the saga), the storage's `increment_recovery_attempts(saga_id, new_status=SagaStatus.FAILED)` is called automatically. Callers do **not** need to call `increment_recovery_attempts` themselves.
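
The automatic increment can be pictured with this minimal sketch. `ToyRecoveryStorage` and `recover_saga_sketch` are hypothetical stand-ins, and `SagaStatus` is redefined locally so the snippet runs on its own. Whether the real `recover_saga()` re-raises after incrementing is not stated here; this sketch re-raises as an assumption.

```python
import asyncio
from enum import Enum


class SagaStatus(Enum):  # local stand-in for the library's SagaStatus
    RUNNING = "RUNNING"
    COMPENSATING = "COMPENSATING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"


class ToyRecoveryStorage:
    """Hypothetical storage holding only what this example needs."""

    def __init__(self) -> None:
        self.status: dict = {}
        self.recovery_attempts: dict = {}

    async def increment_recovery_attempts(self, saga_id, new_status=None):
        # Bump the counter and optionally move the saga to a new status.
        self.recovery_attempts[saga_id] = self.recovery_attempts.get(saga_id, 0) + 1
        if new_status is not None:
            self.status[saga_id] = new_status


async def recover_saga_sketch(storage, saga_id, resume):
    """Mimics the described behavior: a failed resume records the attempt itself."""
    try:
        await resume(saga_id)
    except Exception:
        await storage.increment_recovery_attempts(saga_id, new_status=SagaStatus.FAILED)
        raise  # assumption: the failure is surfaced to the caller
```

The caller only decides what to do with the exception (log it, skip the saga); the bookkeeping has already happened.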

**Getting sagas for recovery:** Use `storage.get_sagas_for_recovery()` instead of a custom query:

```python
ids = await storage.get_sagas_for_recovery(
    limit=50,
    max_recovery_attempts=5,  # Only sagas with recovery_attempts < 5
    stale_after_seconds=120,  # Only sagas not updated in the last 2 minutes (avoids picking active sagas)
)
```

| Parameter | Description |
|-----------|-------------|
| `limit` | Maximum number of saga IDs to return |
| `max_recovery_attempts` | Only include sagas whose `recovery_attempts` is strictly less than this value (default: 5) |
| `stale_after_seconds` | If set, only include sagas whose `updated_at` is older than (now − this value). Use it to avoid picking sagas that are still executing. `None` means no filter |

Returns saga IDs in status RUNNING, COMPENSATING, or FAILED, ordered by `updated_at` ascending (oldest first).
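
The selection rules in the table can be restated as plain Python over in-memory rows, which may help when writing or testing a custom storage. `SagaRow` and `select_for_recovery` are hypothetical; a database-backed storage would express the same conditions as a query.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Statuses that recovery picks up, per the description above.
RECOVERABLE_STATUSES = frozenset({"RUNNING", "COMPENSATING", "FAILED"})


@dataclass
class SagaRow:  # hypothetical row shape
    saga_id: str
    status: str
    recovery_attempts: int
    updated_at: datetime


def select_for_recovery(
    rows: list[SagaRow],
    *,
    limit: int,
    max_recovery_attempts: int = 5,
    stale_after_seconds: float | None = None,
    now: datetime | None = None,
) -> list[str]:
    """Return IDs of recoverable sagas, oldest updated_at first, at most `limit` of them."""
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = None if stale_after_seconds is None else now - timedelta(seconds=stale_after_seconds)
    candidates = [
        row
        for row in rows
        if row.status in RECOVERABLE_STATUSES
        and row.recovery_attempts < max_recovery_attempts  # strictly less than
        and (cutoff is None or row.updated_at < cutoff)  # skip recently updated sagas
    ]
    candidates.sort(key=lambda row: row.updated_at)
    return [row.saga_id for row in candidates[:limit]]
```

Because the comparison is strict, a saga whose `recovery_attempts` equals `max_recovery_attempts` is no longer returned.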

## Strict Backward Recovery

Once a saga enters `COMPENSATING` or `FAILED` status, forward execution is **permanently disabled**. Only compensation can proceed.
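
The rule above can be sketched as a small guard, assuming a locally defined `SagaStatus` stand-in; `recovery_action` and `ensure_forward_allowed` are hypothetical helpers, not library functions.

```python
from enum import Enum


class SagaStatus(Enum):  # local stand-in for the library's SagaStatus
    RUNNING = "RUNNING"
    COMPENSATING = "COMPENSATING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"


# Once a saga is in one of these statuses, only compensation may run.
BACKWARD_ONLY = frozenset({SagaStatus.COMPENSATING, SagaStatus.FAILED})


def recovery_action(status: SagaStatus) -> str:
    """Decide what recovery may do for a saga in the given status."""
    if status is SagaStatus.COMPLETED:
        return "none"
    if status in BACKWARD_ONLY:
        return "compensate"  # finish any remaining compensation, never resume forward
    return "resume"  # RUNNING: continue forward from the last completed step


def ensure_forward_allowed(status: SagaStatus) -> None:
    """Refuse to run a forward step once compensation has begun."""
    if status in BACKWARD_ONLY:
        raise RuntimeError(f"forward execution is disabled in status {status.value}")
```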
@@ -103,20 +132,33 @@ This prevents "zombie states" where compensation actions conflict with new execu
### Background Recovery Job

Use `storage.get_sagas_for_recovery()` to get the IDs of sagas that need recovery. When recovery fails, `recover_saga()` calls `increment_recovery_attempts` internally, so no extra code is needed.
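
A periodic job built on `get_sagas_for_recovery()` might look like the sketch below. `recovery_job` and its parameters are hypothetical, and `recover` stands in for however your application invokes `recover_saga()` for a single saga ID, since that signature is not shown here.

```python
from __future__ import annotations

import asyncio
import logging

logger = logging.getLogger(__name__)


async def recovery_job(
    storage,
    recover,
    *,
    interval_seconds: float = 60.0,
    batch_size: int = 50,
    max_recovery_attempts: int = 5,
    stale_after_seconds: int = 120,
    max_runs: int | None = None,  # None = run forever
) -> None:
    """Periodically fetch stale, retryable sagas and try to recover each one."""
    runs = 0
    while max_runs is None or runs < max_runs:
        saga_ids = await storage.get_sagas_for_recovery(
            limit=batch_size,
            max_recovery_attempts=max_recovery_attempts,
            stale_after_seconds=stale_after_seconds,
        )
        for saga_id in saga_ids:
            try:
                await recover(saga_id)
            except Exception:
                # recover_saga() increments recovery_attempts itself on failure;
                # just log and continue so one bad saga does not block the batch.
                logger.exception("Recovery failed for saga %s", saga_id)
        runs += 1
        if max_runs is None or runs < max_runs:
            await asyncio.sleep(interval_seconds)
```

Keeping `stale_after_seconds` comfortably above the longest expected step duration reduces the chance of the job picking up a saga that another worker is still executing.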

- **get_sagas_for_recovery** — Returns the IDs of sagas that need recovery (RUNNING, COMPENSATING, FAILED) with `recovery_attempts` < `max_recovery_attempts`, optionally filtered by staleness. Used by recovery jobs.
- **increment_recovery_attempts** — Called automatically by `recover_saga()` on recovery failure; increments `recovery_attempts` and optionally updates the status (e.g. to FAILED).

## Memory Storage

In-memory implementation for testing and development.
@@ -78,6 +83,7 @@ Database-backed implementation for production. It uses a session factory to mana