feat(batch): activate max_parallel + cooperative cancellation (PRP-34) #290

@w7-mgfcode

Tracks the implementation of PRP-34 (Batch Parallel Execution) on top of the PRP-33 batch-runner MVP (PR #281).

Summary

Activate the three forward-compatible columns that PRP-33 shipped on `batch_job` (`max_parallel`, `running_items`, `cancelled_items`) by routing `BatchService.submit` through a new `app/features/batch/runner.py`: a single `asyncio.Semaphore(effective_parallel)` inside an `asyncio.TaskGroup`, with cooperative cancellation signalled through a per-batch `asyncio.Event`.

Add `DELETE /batch/{batch_id}` with a bounded drain (default 30 s, configurable via `BATCH_CANCEL_DRAIN_TIMEOUT_SECONDS`), plus a max-parallel slider and a cancel button on `frontend/src/pages/visualize/batch.tsx`.

Why now

  • The MVP runs items serially. The next operator who fans out 50–500 store/product pairs is likely to reach for `asyncio.gather` (precedent: `app/features/demo/pipeline.py:419`) and exhaust both the SQLAlchemy connection pool (`pool_size=5`, `max_overflow=10`, i.e. at most 15 connections) and host RAM. PRP-34 makes unbounded fan-out unreachable from any code path.
  • Operators currently have no way to stop a misconfigured batch. `DELETE` plus cooperative cancellation is the missing control surface.
  • The schema is already in place (`app/features/batch/models.py:136-139`), so no new Alembic migration is needed.

Source docs

  • INITIAL: `PRPs/INITIAL/INITIAL-batch-parallel-execution.md` (refreshed in PR #283, "docs(docs): refresh batch-parallel-execution INITIAL post-PRP-33 (#280)"; PRP-ready)
  • PRP: `PRPs/PRP-34-batch-parallel-execution.md` (994 lines, confidence 8/10)
  • AI doc: `PRPs/ai_docs/asyncio-taskgroup-cancellation.md`, covering asyncio semantics verified against Python 3.12.13, including the corrected cancel mechanism (the INITIAL pseudocode's `tg.cancel_scope.cancel()` does not exist on the stdlib `asyncio.TaskGroup`).

The PRP and the AI doc land on the feature branch alongside the implementation, not directly on `dev`.

Acceptance Criteria

See the PRP's "Success Criteria" section. Highlights:

  • `grep -rn "asyncio.gather" app/features/batch/` returns no production-code match.
  • `app/features/batch/runner.py` exists and is the only path that schedules `batch_job_item` execution.
  • `Settings.batch_global_max_parallel = 4`, `Settings.batch_cancel_drain_timeout_seconds = 30`; both listed in `.env.example`.
  • `DELETE /batch/{batch_id}` returns 200 (cancelled), 404 (unknown), 409 (terminal), 504 (drain timeout) — all RFC 7807.
  • 8 unit, 3 integration, and 2 chaos tests pass; the semaphore-cap regression test is the load-bearing spec.
  • `frontend/src/pages/visualize/batch.tsx` renders a shadcn `Slider` (added via MCP) and an `AlertDialog` cancel button.
  • All five validation gates green: ruff, mypy + pyright strict, pytest (unit + integration), frontend tsc + lint + test.
  • `uv run alembic check` reports no drift (no new migration).
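The two new settings could be loaded and validated roughly as below. This is a stdlib stand-in rather than the project's `Settings` class. The `BATCH_GLOBAL_MAX_PARALLEL` variable name (derived from the field name), the `from_env` helper, and the positivity checks are assumptions; only `BATCH_CANCEL_DRAIN_TIMEOUT_SECONDS` is named in this issue.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class BatchSettings:
    """Hypothetical stand-in for the two Settings fields this PRP adds."""

    batch_global_max_parallel: int = 4
    batch_cancel_drain_timeout_seconds: float = 30.0

    def __post_init__(self) -> None:
        # Semaphore(0) would block every item forever; a non-positive drain is meaningless.
        if self.batch_global_max_parallel < 1:
            raise ValueError("batch_global_max_parallel must be >= 1")
        if self.batch_cancel_drain_timeout_seconds <= 0:
            raise ValueError("batch_cancel_drain_timeout_seconds must be > 0")

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "BatchSettings":
        """Read overrides from environment variables, falling back to the defaults."""
        return cls(
            batch_global_max_parallel=int(
                env.get("BATCH_GLOBAL_MAX_PARALLEL", cls.batch_global_max_parallel)
            ),
            batch_cancel_drain_timeout_seconds=float(
                env.get(
                    "BATCH_CANCEL_DRAIN_TIMEOUT_SECONDS",
                    cls.batch_cancel_drain_timeout_seconds,
                )
            ),
        )
```

Under the same naming assumption, the matching `.env.example` entries would be `BATCH_GLOBAL_MAX_PARALLEL=4` and `BATCH_CANCEL_DRAIN_TIMEOUT_SECONDS=30`.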

Out of scope

Per the INITIAL's "Non-goals": retry of failed items, a priority queue, champion selection, a process-wide cross-batch semaphore (deferred per the INITIAL's own recommendation), WebSocket streaming of progress, and multi-host scale-out. Each needs its own INITIAL/PRP.

Dependencies

Branch

`feat/batch-parallel-execution` off `dev` (per `.claude/rules/branch-naming.md`).

Housekeeping

The `scope:batch` label does not exist in this repo yet (PRP-33's issue used `enhancement`). Suggested follow-up: `gh label create scope:batch --description 'Touches the batch area (commit-format scope)' --color bfd4f2`, which matches the existing `scope:*` label palette.

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request), feat (New feature)
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
