Skip to content

Revoke prior DNA task on re-upload (fix stuck-at-50%)#101

Merged
vanajmoorthy merged 1 commit into
mainfrom
feat/concurrent-upload-revoke
May 5, 2026
Merged

Revoke prior DNA task on re-upload (fix stuck-at-50%)#101
vanajmoorthy merged 1 commit into
mainfrom
feat/concurrent-upload-revoke

Conversation

@vanajmoorthy
Copy link
Copy Markdown
Owner

Summary

Fixes the "stuck at 50%" bug from PR #98's handoff doc (§2): uploading a CSV while a previous DNA task was still running caused the new task's progress bar to stall at ~50% until the old task finished.

Root cause: Both tasks ran concurrently and contended on Postgres row locks during `get_or_create` on shared Book/Author rows. The new task blocked on the old task's open transaction, so the dashboard's polling saw frozen progress.

Fix: Before dispatching the new DNA task, `upload_view` revokes the prior pending task (`AsyncResult.revoke(terminate=True, signal="SIGTERM")`) — best-effort, with a try/except so a failed revoke can't block the new upload. This matches user intent ("I clearly meant to replace that upload"), which the handoff doc preferred over a reject-while-pending guard.

Why no orphan-row cleanup is needed:

  • `calculate_full_dna` already deletes `UserBook` rows not in the current upload (`core/services/dna_analyser.py:687`).
  • Orphan `Book` and `Author` rows are shared resources — nothing depends on them being scoped to a single user's upload.
  • The existing upload-nonce mechanism makes any in-flight `enrich_book_task` from the revoked upload exit early.

Side fix: Switched `pending_dna_task_id` write to `UserProfile.objects.filter(...).update(...)`. The previous code accessed `request.user.userprofile` after `.delay()`, which in eager-mode tests fetched fresh after the task wrote DNA. My new revoke logic accesses `userprofile` before dispatch, caching a stale instance whose later `.save()` clobbered the DNA. `.update()` sidesteps the cached instance entirely.

Test plan

  • `test_upload_revokes_prior_pending_task` — prior task ready=False → revoke called with terminate=True, SIGTERM
  • `test_upload_does_not_revoke_completed_task` — prior task ready=True → revoke not called
  • `test_upload_succeeds_when_revoke_raises` — revoke raises → upload still succeeds, new task ID written
  • `test_reupload_clears_stale_userbooks` — re-upload with different books removes the old UserBook rows
  • Full suite: `docker-compose -f docker-compose.local.yml exec web poetry run python manage.py test` → 343 passed
  • Manual: upload a large CSV, immediately re-upload before the first finishes; confirm the second progress bar advances normally instead of stalling

When a user uploaded a CSV while a previous DNA task was still running, both
tasks contended on the same Book/Author row locks in Postgres. The dashboard
polled the new task, which appeared frozen (~50%) until the prior task
released its locks.

Now upload_view revokes the prior task (best-effort, SIGTERM) before
dispatching the new one. The existing nonce mechanism already protects
enrichment subtasks, and calculate_full_dna already deletes UserBook rows
not present in the new CSV — so there are no stale rows from the revoked
task to worry about.

Also switches the pending_dna_task_id write to .update() so a cached
UserProfile instance can't clobber DNA written inline by an eager task in
tests.
@vanajmoorthy vanajmoorthy merged commit fe8f480 into main May 5, 2026
1 check passed
@vanajmoorthy vanajmoorthy deleted the feat/concurrent-upload-revoke branch May 5, 2026 12:07
vanajmoorthy added a commit that referenced this pull request May 14, 2026
…-50% (#105)

PR #101 revoked the parent generate_reading_dna_task on re-upload but
left two paths uncovered:

1. check_author_mainstream_status_task had no nonce protection. A
   StoryGraph upload spawns one of these per new author, and they kept
   running after the parent task was revoked — flooding the worker
   queue with stale work and starving the new DNA task. Now accepts
   user_id + upload_nonce and exits early on mismatch.

2. enrich_book_task only checked the nonce at task start. A task that
   passed the start check could still sit briefly on Book.objects.get
   before its API call, during which a new upload could supersede it.
   Added a second re-check after the DB fetch.

Both dispatch sites in dna_analyser.py now pass user_id + upload_nonce.

Tests cover the new author-task nonce path (skip / run / backwards-
compat) and the new mid-task re-check on enrich_book_task.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant