Skip to content

Extend upload-nonce to author + late enrich tasks (fix stuck-at-50% on re-upload)#105

Merged
vanajmoorthy merged 1 commit into
mainfrom
fix/subtask-nonce-coverage
May 14, 2026
Merged

Extend upload-nonce to author + late enrich tasks (fix stuck-at-50% on re-upload)#105
vanajmoorthy merged 1 commit into
mainfrom
fix/subtask-nonce-coverage

Conversation

@vanajmoorthy
Copy link
Copy Markdown
Owner

Summary

PR #101 revoked the parent generate_reading_dna_task on re-upload, but the StoryGraph→Goodreads repro still hits the stuck-at-50% loading state. Diagnosed two uncovered paths:

  1. check_author_mainstream_status_task had no nonce protection at all. StoryGraph uploads spawn one per new author at the two dispatch sites (core/services/dna_analyser.py:497, :652). Even after the parent DNA task is revoked, these keep running, flood the worker queue, and starve the new DNA task — visible as the loading bar freezing at the "Syncing books" stage (≈50%).
  2. enrich_book_task only checked the nonce at task start. A task that passed the start check could still sit briefly on Book.objects.get before its ~5s of API calls. If a new upload landed in that window, the in-flight task would still spend the API time and write stale data.

Changes

  • check_author_mainstream_status_task now accepts optional user_id + upload_nonce; exits early on nonce mismatch using the same safe_cache_get("upload_nonce_{user_id}") pattern as enrich_book_task.
  • enrich_book_task keeps its start-of-task check and adds a second re-check after Book.objects.get, before invoking enrich_book_from_apis.
  • Both dispatch sites in core/services/dna_analyser.py now pass user_id=upload_user_id, upload_nonce=upload_nonce.
  • Three new tests for the author-task path (skip / run / backwards-compat with no nonce kwargs) and one for the mid-task re-check on enrich_book_task.

Out of scope (follow-up)

  • The deeper "in-flight enrich task finishes its book.save() even when superseded mid-API-call" window. Closing this requires splitting enrich_book_from_apis into fetch + persist phases so the task can re-check between them. Worth doing if we see stale writes in prod, but the API window is short (~5s per book, single-worker) compared to the queue-depth issue this PR addresses.
  • A higher-priority queue for generate_reading_dna_task so it can't be starved by subtask backlog. Deployment-affecting change; left for a separate decision.

Test plan

  • core.tests.test_integration.CheckAuthorMainstreamStatusTaskTests — skip / run / no-nonce.
  • core.tests.test_integration.BookEnrichmentIntegrationTests.test_enrich_book_task_skipped_at_second_check_if_nonce_changed_during_db_fetch — validates the new mid-task re-check by mutating the nonce inside a Book.objects.get side-effect.
  • Full suite (351 tests) passes.
  • Manual: on prod, repro the StoryGraph→Goodreads flow and verify the loading bar progresses past 50%.

…-50%

PR #101 revoked the parent generate_reading_dna_task on re-upload but
left two paths uncovered:

1. check_author_mainstream_status_task had no nonce protection. A
   StoryGraph upload spawns one of these per new author, and they kept
   running after the parent task was revoked — flooding the worker
   queue with stale work and starving the new DNA task. Now accepts
   user_id + upload_nonce and exits early on mismatch.

2. enrich_book_task only checked the nonce at task start. A task that
   passed the start check could still sit briefly on Book.objects.get
   before its API call, during which a new upload could supersede it.
   Added a second re-check after the DB fetch.

Both dispatch sites in dna_analyser.py now pass user_id + upload_nonce.

Tests cover the new author-task nonce path (skip / run / backwards-
compat) and the new mid-task re-check on enrich_book_task.
@vanajmoorthy vanajmoorthy merged commit b60e37e into main May 14, 2026
1 check passed
@vanajmoorthy vanajmoorthy deleted the fix/subtask-nonce-coverage branch May 14, 2026 12:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant