Stale SubjectSet during linking of large number of subjects #325

@lcjohnso

We're still running into issues with stale subject sets during subject linking (as part of subject upload).

Connected to #290 but opening a new issue here to document a specific technical issue and solution.

Problem

A SubjectSet.add() call returns a 409 status and raises the following exception:
PanoptesAPIException: Attempted to touch a stale object: SubjectSet

Note: retries only apply to status code >=500; 409 breaks out of retry loop, and json_request raises the exception.
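To make the note above concrete, here is a minimal, self-contained sketch of a retry loop that only retries on status codes >= 500. It is an illustration of the described behaviour, not the client's actual json_request code; `call_with_retries` and `send` are hypothetical names.

```python
def call_with_retries(send, max_retries=3):
    """Call send() until it returns a non-5xx status or retries run out.

    `send` is a hypothetical callable returning (status_code, body).
    """
    for _attempt in range(max_retries + 1):
        status, body = send()
        if status < 500:
            # Anything below 500 (including 409) leaves the retry loop at once.
            break
    if status >= 400:
        # A 409 therefore surfaces as an exception after a single attempt.
        raise RuntimeError(f"{status}: {body}")
    return body
```

Under this shape, a 409 is never retried, which matches what we see: one stale response is enough to abort the whole linking run.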

Recent instance: while trying to link 37k subjects for London HogWatch (see Freshdesk ticket), the stale object exception was raised after 23k were uploaded.

It is not clear what action on the SubjectSet triggers the stale object error. Originally, I thought this was caused by subject set completeness checks and updates, but that does not explain every case (the exception has also occurred when the subject set was unlinked from a workflow). Testing with a subject set name change (one of the only other editable fields on the SubjectSet resource) while linking was in progress did not result in a stale object. Perhaps related to the

Previous Action

#298 added a SubjectSet.reload() just before the add(), which helps. But if the input list is large (think: 10k subjects handed over 100 at a time by batchable()), the add action can run long enough for something to happen to the subject set in the meantime.

Potential Proposed Solution

Would we consider adding a SubjectSet.reload() as part of each batched add()? Specifically, we could implement this in a few different ways:

  1. (my pref) make the SubjectSet.add() function batchable instead of batching the super LinkCollection.add(), thereby forcing the reload for SubjectSet each time.
  2. (probably bad idea) add a reload to super LinkCollection class (see here: https://github.com/zooniverse/panoptes-python-client/blob/master/panoptes_client/panoptes.py#L1044), either for all parent classes or adding logic to only refresh for SubjectSet parent.
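Option 1 could look roughly like the sketch below: reload before every batch rather than once before the whole list. This is self-contained and does not touch the real client; `FakeSubjectSet`, `chunked`, and `add_in_batches` are hypothetical stand-ins, and the batch size of 100 simply mirrors the number mentioned above.

```python
from itertools import islice


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


class FakeSubjectSet:
    """Stand-in for SubjectSet: records calls instead of hitting the API."""

    def __init__(self):
        self.reloads = 0
        self.linked = []

    def reload(self):
        self.reloads += 1

    def add(self, subjects):
        self.linked.extend(subjects)


def add_in_batches(subject_set, subjects, batch_size=100):
    """Reload the subject set before each batch so no batch uses stale state."""
    for batch in chunked(subjects, batch_size):
        subject_set.reload()
        subject_set.add(batch)
```

The cost is one extra request per batch, so 10k subjects at 100 per batch means 100 reloads instead of one.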

I also wonder if we should explicitly catch the 409 for SubjectSet.add() and retry. I'm bullish about essentially ignoring and overriding stale object errors for subject sets, because adding subjects should not override any existing state that we would be concerned about (unlike edits of other resources like Workflows or Projects).
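A rough sketch of the catch-and-retry idea, again with stand-ins rather than the real client. It assumes the stale condition can be recognised from the exception message quoted above; whether the real exception exposes the status code is not something this sketch relies on. `FakeAPIException`, `FlakySet`, and `add_with_stale_retry` are hypothetical names.

```python
import time


class FakeAPIException(Exception):
    """Stand-in for the client's PanoptesAPIException."""


class FlakySet:
    """Stand-in subject set whose add() fails as stale a fixed number of times."""

    def __init__(self, failures):
        self.failures = failures
        self.reloads = 0
        self.linked = []

    def reload(self):
        self.reloads += 1

    def add(self, batch):
        if self.failures > 0:
            self.failures -= 1
            raise FakeAPIException("Attempted to touch a stale object: SubjectSet")
        self.linked.extend(batch)


def add_with_stale_retry(subject_set, batch, max_attempts=3, delay=0.0):
    """Reload and retry add() when the failure looks like a stale object error."""
    for attempt in range(1, max_attempts + 1):
        subject_set.reload()
        try:
            return subject_set.add(batch)
        except FakeAPIException as err:
            # Re-raise anything that is not a stale error, or the final failure.
            if "stale object" not in str(err) or attempt == max_attempts:
                raise
            time.sleep(delay)
```

Capping the attempts keeps a persistently stale subject set from looping forever while still absorbing the occasional conflict.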
