Create 'heavy' queue for large conversion tasks #302

Merged
don-vip merged 2 commits into master from worker-routing
Feb 4, 2026

Conversation

Amdrel commented Feb 3, 2026

Description

Resolves #284

This patch creates a new queue, called 'heavy', for processing memory-intensive tasks. Tasks that require more than 50% of a worker's memory (>8GB) are sent to instances that process a single task at a time, which prevents OOM errors from interrupting encoding tasks and bringing down workers.

Implementation

  • A task is classified as heavy if the video exceeds 3840 pixels in either dimension, or if its bitrate is greater than 20Mbps.
    • Currently there is no special handling for videos whose content can be copied into another container using vcodec=copy. I believe those videos can technically be processed in the default queue without issue, but I would have to test that. For now, anything fitting the criteria is sent to the heavy queue regardless of whether it can be copied.
  • This detection is implemented during the 'extracturl' phase of task creation, as that's when yt-dlp is invoked and the metadata needed for the decision is available. The JS frontend then schedules the task on the queue returned by the API call.
  • Manually uploaded files are an exception, since yt-dlp isn't run against them. For those, ffprobe is used to collect the metadata when the task is created.
  • If yt-dlp is unable to determine the dimensions and bitrate of a video because the source site lacks an extractor, the task is sent to the heavy queue by default (as is the case with our one NASA video source).
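The heuristic above can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the names `choose_queue` and `metadata_from_ffprobe` are hypothetical, the bitrate is assumed to be in bits per second with 20Mbps taken as 20,000,000, and the sketch treats any missing value as "unknown" and routes it to the heavy queue. The ffprobe helper assumes JSON produced by `ffprobe -v error -print_format json -show_format -show_streams <file>`.

```python
from typing import Optional, Tuple

HEAVY_QUEUE = "heavy"
DEFAULT_QUEUE = "celery"
MAX_DIMENSION_PX = 3840        # "exceeds 3840 pixels in either dimension"
MAX_BITRATE_BPS = 20_000_000   # assumption: 20Mbps expressed in bits per second


def choose_queue(width: Optional[int], height: Optional[int],
                 bitrate_bps: Optional[float]) -> str:
    """Pick a queue name from video metadata (hypothetical helper)."""
    # Assumption: if any value is unknown, err on the side of the heavy queue.
    if width is None or height is None or bitrate_bps is None:
        return HEAVY_QUEUE
    if width > MAX_DIMENSION_PX or height > MAX_DIMENSION_PX:
        return HEAVY_QUEUE
    if bitrate_bps > MAX_BITRATE_BPS:
        return HEAVY_QUEUE
    return DEFAULT_QUEUE


def metadata_from_ffprobe(probe: dict) -> Tuple[Optional[int], Optional[int], Optional[float]]:
    """Extract (width, height, bitrate) from parsed ffprobe JSON output."""
    streams = probe.get("streams", [])
    # Use the first video stream; audio-only files yield no dimensions.
    video = next((s for s in streams if s.get("codec_type") == "video"), {})
    raw_bitrate = probe.get("format", {}).get("bit_rate")
    try:
        # ffprobe reports bit_rate as a string in its JSON output.
        bitrate = float(raw_bitrate) if raw_bitrate is not None else None
    except (TypeError, ValueError):
        bitrate = None
    return video.get("width"), video.get("height"), bitrate
```

For an uploaded file, the two helpers would be chained: parse the ffprobe output, then pass the resulting tuple to `choose_queue`.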

Changes

  • Send videos that meet either heavy criterion (4k resolution or 20Mbps bitrate) to the heavy queue.
  • Videos under both thresholds are sent to the default queue, which is named 'celery'.
  • Updated puppet manifest to designate encoding05 and encoding06 as the heavy queue workers. We can discuss and adjust this as needed.
  • Restarted tasks are always placed on the heavy queue as a failsafe to increase the odds of the task processing successfully in case the current heuristic fails to detect the task as heavy.
  • The worker statistics collection script now measures the heavy queue as well.
  • encoding05 and encoding06 are currently configured to process tasks from both queues so that those workers don't sit idle while there are lighter tasks that are waiting to be processed.
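The routing rules in this list can be summarized in a short sketch. The function names and the `HEAVY_WORKERS` set are hypothetical and only mirror the behavior described above; in the patch itself the worker assignment lives in the puppet manifest, not in Python.

```python
from typing import List

HEAVY_QUEUE = "heavy"
DEFAULT_QUEUE = "celery"
# Hosts designated as heavy workers in the description above.
HEAVY_WORKERS = {"encoding05", "encoding06"}


def queue_for_submission(detected_queue: str, is_restart: bool) -> str:
    """Restarted tasks always go to the heavy queue as a failsafe."""
    return HEAVY_QUEUE if is_restart else detected_queue


def queues_for_worker(hostname: str) -> List[str]:
    """Queues a worker consumes from, keyed on its short hostname."""
    short_name = hostname.split(".")[0]  # tolerate fully qualified names
    if short_name in HEAVY_WORKERS:
        # Heavy workers also take lighter tasks so they don't sit idle.
        return [HEAVY_QUEUE, DEFAULT_QUEUE]
    return [DEFAULT_QUEUE]
```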

Deployment

The deployment process using the script remains the same, because the 'heavy' workers are set up in the puppet manifest, which checks the worker's hostname and adjusts the celery params as necessary. To verify that the deployment was successful, check that the command-line arguments of the workers have the correct values set. The workers must first have had the opportunity to finish their tasks and restart.
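One way to script that check is to parse a worker's command line and compare it against what you expect for the host. The sketch below is an assumption-laden illustration: `parse_worker_args` is a hypothetical helper, it only recognizes Celery's `-Q`/`--queues` and `-c`/`--concurrency` options, and the command line in the usage note is made up rather than taken from the manifest.

```python
import shlex
from typing import Dict, List, Optional, Union


def parse_worker_args(cmdline: str) -> Dict[str, Union[List[str], int, None]]:
    """Extract the queue list and concurrency from a celery worker command line."""
    tokens = shlex.split(cmdline)
    queues: Optional[List[str]] = None
    concurrency: Optional[int] = None
    i = 0
    while i < len(tokens):
        # Handle both "--queues=a,b" and "--queues a,b" forms.
        name, sep, value = tokens[i].partition("=")
        if name in ("-Q", "--queues", "-c", "--concurrency"):
            if not sep and i + 1 < len(tokens):
                i += 1
                value = tokens[i]
            if value:
                if name in ("-Q", "--queues"):
                    queues = value.split(",")
                else:
                    concurrency = int(value)
        i += 1
    return {"queues": queues, "concurrency": concurrency}
```

For example, `parse_worker_args("celery -A app worker -Q heavy,celery --concurrency=1")` yields the queue list and concurrency for one worker process, which can then be compared with the values expected for that hostname.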

@Amdrel Amdrel requested a review from don-vip February 3, 2026 18:18

Amdrel commented Feb 3, 2026

I'm taking a look and reviewing #293 right now. I think that PR should be merged before this one. I can resolve any conflicts that arise in this one.


don-vip commented Feb 3, 2026

> I'm taking a look and reviewing #293 right now. I think that PR should be merged before this one. I can resolve any conflicts that arise in this one.

Agreed :)

Changes

I love this PR so much! All the ideas you have implemented here should significantly improve v2c reliability 👍


Amdrel commented Feb 4, 2026

It looks like the JS build validator doesn't verify the edge case where the file was removed prior to committing 🫠

@don-vip don-vip merged commit a502d82 into master Feb 4, 2026
5 checks passed
@don-vip don-vip deleted the worker-routing branch February 4, 2026 19:11
Development

Successfully merging this pull request may close these issues.

[Bug]: ExitCode 137/139 when uploading .mov videos

2 participants