
sagas may need more ways to fail, especially when interacting with external services #66

@smklein


(FYI to @leftwo and @jmpesp, whom I chatted with about this scenario)

Suppose we have the following saga DAG:

  1. Action: Record "new resource" in a database, with state "creating". Undo: Delete "new resource" from database.
  2. Action: Make a request to an external service to provision the resource. Undo: Make a request to an external service to delete said resource.
  3. Action: Record "new resource" in the database as "created". (No undo action)

In this example saga graph, we can happily move "forward" and "backward" through saga states, but we can end up in an awkward state if saga execution fails while communicating with the external service.

Suppose we do the following:

  • Action for (1) (record the new resource in the DB)
  • Action for (2) (we send the request to the external service, but do not yet write the result to the saga log)
  • Crash
  • Action for (2) again, after recovery (no result was logged, so the request is re-sent to the external service, but this time, suppose the external service is not responding)

In this scenario, if we simply perform the undo action for node 1, there is a chance we're leaking resources. Concretely, let's suppose the "external service request" is to provision storage from Crucible, or to provision an instance on a sled. If we simply delete our database record, the resource remains consumed, but Nexus no longer has any record of it. In this particular case, it may be more correct to record that the provisioned resources exist, but are in a "failed" state.
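To make the leak concrete, here is a minimal sketch that replays the sequence above against in-memory stand-ins. Everything in it is hypothetical: `Db`, `ExternalService`, and `leaked_after_crash_and_unwind` are illustrative names, not part of the saga framework, Nexus, or Crucible, and the crash is modeled simply by not recording the first result.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for the Nexus database: resource id -> state.
#[derive(Default)]
struct Db {
    records: HashMap<String, &'static str>,
}

/// Hypothetical stand-in for an external service (e.g. storage or a sled).
#[derive(Default)]
struct ExternalService {
    provisioned: HashSet<String>,
    responding: bool,
}

impl ExternalService {
    /// Provisions a resource, or fails if the service is not responding.
    fn provision(&mut self, id: &str) -> Result<(), String> {
        if !self.responding {
            return Err(format!("no response while provisioning {}", id));
        }
        self.provisioned.insert(id.to_string());
        Ok(())
    }
}

/// Replays the failure sequence and returns the ids that exist in the
/// external service but have no database record.
fn leaked_after_crash_and_unwind() -> Vec<String> {
    let mut db = Db::default();
    let mut svc = ExternalService { responding: true, ..Default::default() };
    let id = "new-resource";

    // Action for (1): record the resource as "creating".
    db.records.insert(id.to_string(), "creating");

    // Action for (2): the request succeeds, but we "crash" before the
    // result is written to the saga log, so nothing remembers it.
    svc.provision(id).expect("first attempt reaches the service");

    // After recovery, (2) is re-run, and now the service is unresponsive.
    svc.responding = false;
    let retry = svc.provision(id);

    // The action for (2) failed, so (as described above) only the undo
    // action for (1) runs: the database record is deleted.
    if retry.is_err() {
        db.records.remove(id);
    }

    svc.provisioned
        .iter()
        .filter(|r| !db.records.contains_key(*r))
        .cloned()
        .collect()
}
```

Calling `leaked_after_crash_and_unwind()` returns the one resource that the external service still holds while the database has forgotten it.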

Proposal: I think we need a way of identifying a "different" error pathway for actions, allowing us to distinguish between "clean errors" and "fatal errors". Just as we record the results of "successful actions", it seems equally useful to record the results of "unsuccessful actions", so we know how to treat the database records associated with resources (either deleting them or marking their state as dirty).
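One possible shape for this, as a rough sketch only: the failure recorded for an action carries a kind, and the unwind path picks the database cleanup based on it. The names `ActionError`, `RecordState`, `Cleanup`, `cleanup_for`, and `unwind` are assumptions made for illustration and are not an existing API; a plain `HashMap` stands in for the database.

```rust
use std::collections::HashMap;

/// Hypothetical recorded result of an unsuccessful action.
#[derive(Debug, Clone, PartialEq)]
enum ActionError {
    /// The action is known to have had no external effect.
    Clean(String),
    /// The outcome is unknown or partial; external resources may exist.
    Fatal(String),
}

/// Hypothetical states for the resource's database record.
#[derive(Debug, Clone, PartialEq)]
enum RecordState {
    Creating,
    Failed,
}

/// What the unwind path should do with the database record.
#[derive(Debug, PartialEq)]
enum Cleanup {
    DeleteRecord,
    MarkFailed,
}

/// Chooses the cleanup from the kind of failure that was recorded.
fn cleanup_for(err: &ActionError) -> Cleanup {
    match err {
        ActionError::Clean(_) => Cleanup::DeleteRecord,
        ActionError::Fatal(_) => Cleanup::MarkFailed,
    }
}

/// Applies the chosen cleanup to an in-memory stand-in for the database.
fn unwind(db: &mut HashMap<String, RecordState>, id: &str, err: &ActionError) {
    match cleanup_for(err) {
        Cleanup::DeleteRecord => {
            db.remove(id);
        }
        Cleanup::MarkFailed => {
            // Keep the record so the possibly-leaked resource stays visible.
            db.insert(id.to_string(), RecordState::Failed);
        }
    }
}
```

With this split, a record that starts as `RecordState::Creating` and hits an unresponsive external service (a `Fatal` error) would end up as `RecordState::Failed` rather than being deleted, while a `Clean` error would still remove it.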
