feat: add per-job stop capability to serverless worker #510

Open
KAJdev wants to merge 2 commits into main from zeke/sls-41-add-stop-job-capability-to-runpod-python-sdk

KAJdev (Contributor) commented Jun 5, 2026

A serverless worker that handles more than one job concurrently has had no way to stop an individual request once it started. The only available lever was killing the entire worker, which also terminates the other healthy in-progress jobs on that worker. This is the root cause of cancelled requests continuing to run and incur charges when a worker is handling several jobs at once.

This change gives the worker a notion of stopping a single request. The worker now tracks each in-progress job by id and can cancel just that job's task, leaving its siblings untouched. The stop signal arrives via a new job-stop long-polling channel, similar to the job-take long-polling endpoint.
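For reviewers who want a mental model of the tracking described above, here is a minimal sketch of cancelling one asyncio task by job id while its sibling keeps running. `JobRegistry`, `fake_job`, and `demo` are hypothetical names for illustration only; they are not the classes or functions in this PR, and the long-polling channel is left out entirely.

```python
import asyncio


class JobRegistry:
    """Hypothetical per-job task registry (illustrative, not the SDK's code)."""

    def __init__(self):
        self._tasks = {}  # job id -> asyncio.Task

    def start(self, job_id, coro):
        # Wrap the job in its own task so it can be cancelled independently.
        task = asyncio.create_task(coro)
        self._tasks[job_id] = task
        # Drop the entry once the task finishes, is cancelled, or fails.
        task.add_done_callback(lambda _t, jid=job_id: self._tasks.pop(jid, None))
        return task

    def stop(self, job_id):
        # Cancel only the named job; return False if it is unknown or done.
        task = self._tasks.get(job_id)
        if task is None or task.done():
            return False
        task.cancel()
        return True


async def fake_job(job_id, seconds):
    # Stand-in for a handler invocation.
    await asyncio.sleep(seconds)
    return f"{job_id} done"


async def demo():
    registry = JobRegistry()
    a = registry.start("job-a", fake_job("job-a", 0.05))
    b = registry.start("job-b", fake_job("job-b", 0.05))
    await asyncio.sleep(0)  # let both tasks begin running
    stopped = registry.stop("job-a")
    # return_exceptions=True reports the cancellation as a value, not a raise.
    results = await asyncio.gather(a, b, return_exceptions=True)
    return stopped, results
```

Running `asyncio.run(demo())` in this sketch cancels `job-a` while `job-b` completes normally, which is the sibling-isolation property the description claims.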

Handlers need no changes; async handlers holding resources can clean up by catching asyncio.CancelledError.
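As an illustration of the cleanup pattern mentioned above, here is a rough sketch of an async handler that releases a resource when its task is cancelled. The job shape (`id`, `input`), the `resource` dict, and `cleanup_log` are assumptions made for the example, not something this PR defines.

```python
import asyncio

cleanup_log = []  # records cleanup steps so the example can be inspected


async def handler(job):
    # Stand-in for a real resource such as a file handle or GPU allocation.
    resource = {"job_id": job["id"], "open": True}
    try:
        await asyncio.sleep(job["input"]["seconds"])
        return {"output": "finished"}
    except asyncio.CancelledError:
        # Cancellation-specific cleanup goes here.
        cleanup_log.append(f"cancelled {job['id']}")
        raise  # re-raise so the task is actually marked as cancelled
    finally:
        # Runs on success, failure, and cancellation alike.
        resource["open"] = False
        cleanup_log.append(f"released {job['id']}")


async def demo():
    job = {"id": "job-a", "input": {"seconds": 10}}
    task = asyncio.create_task(handler(job))
    await asyncio.sleep(0)  # let the handler reach its first await
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return task.cancelled()
```

Re-raising `asyncio.CancelledError` after cleanup matters: a handler that swallows it would look like a normal completion to whatever is awaiting the task.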

Relies on https://github.com/runpod/ai-api/pull/881

Closes SLS-41.

promptless (Bot) commented Jun 5, 2026

Promptless prepared a documentation update related to this change.

Triggered by PR #510

Added documentation for the per-job stop capability to the main docs site. The update explains that workers handling multiple jobs concurrently can now stop individual jobs without affecting siblings, and includes guidance on catching asyncio.CancelledError for resource cleanup in async handlers.

Review: Document per-job stop capability for concurrent workers

KAJdev requested review from deanq and jhcipar on June 6, 2026 00:11
