@Dan-Flores Dan-Flores commented Oct 29, 2025

This PR enables encoding frames on a GPU device in VideoEncoder.

Previously, DeviceInterface and its subclasses only contained methods related to decoding. For now, we are going to break that pattern and add two encoding-specific methods: convertTensorToAVFrame and setupHardwareFrameContext.

  • This is necessary to isolate CUDA headers to CUDA-specific files.
  • I will propose options to decide where these functions belong long term.

The constructor for VideoEncoder is unchanged.
Inspired by other torch operators, we use the device that the frames tensor is on (thanks for the suggestion, @NicolasHug!):

frames = torch.randint(0, 256, size=(5, 3, 64, 64), dtype=torch.uint8).to("cuda:0")
encoder = VideoEncoder(frames, "output.mp4")  # will encode on GPU

The code path in Encoder.cpp is unchanged on CPU.
On GPU, a member variable deviceInterface_ will be initialized and used to call GPU encoding functions.

  • In convertTensorToAVFrame, there are several TODOs to extend it. Right now, it only handles limited-range color, and always encodes frames in the pixel format AV_PIX_FMT_NV12 (which uses the same 4:2:0 chroma subsampling as yuv420p, with the U and V planes interleaved).
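As a side note on the NV12/yuv420p relationship mentioned above: both formats use 4:2:0 chroma subsampling and hold the same number of bytes per frame; NV12 simply stores U and V interleaved in a single plane instead of two separate planes. A quick sketch of the layout arithmetic (plain Python, function names are mine):

```python
def nv12_buffer_size(width: int, height: int) -> int:
    # NV12: a full-resolution Y plane followed by one interleaved UV plane
    # at half resolution in both dimensions (4:2:0 subsampling).
    y_plane = width * height
    uv_plane = (width // 2) * (height // 2) * 2  # U and V bytes interleaved
    return y_plane + uv_plane

def yuv420p_buffer_size(width: int, height: int) -> int:
    # yuv420p: same subsampling, but U and V live in two separate planes.
    return width * height + 2 * ((width // 2) * (height // 2))
```

For any even width and height the two sizes are identical, which is why the formats are "essentially" the same apart from chroma-plane layout.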

@meta-cla bot added the CLA Signed label (managed by the Meta Open Source bot) Oct 29, 2025
@Dan-Flores changed the title from "video encoder python file" to "video encoder CUDA support" Oct 29, 2025
@Dan-Flores force-pushed the encode_gpu branch 2 times, most recently from f1678b7 to 997158f November 3, 2025 20:56
@Dan-Flores marked this pull request as ready for review November 3, 2025 21:07
@Dan-Flores changed the title from "video encoder CUDA support" to "Enable CUDA device for video encoder" Nov 3, 2025
@Dan-Flores marked this pull request as draft November 3, 2025 21:35
@Dan-Flores changed the title from "Enable CUDA device for video encoder" to "[wip] Enable CUDA device for video encoder" Nov 6, 2025
@Dan-Flores changed the title from "[wip] Enable CUDA device for video encoder" to "Enable CUDA device for video encoder" Nov 26, 2025

void VideoEncoder::initializeEncoder(
const VideoStreamOptions& videoStreamOptions) {
if (videoStreamOptions.device.is_cuda()) {
Contributor

I was hoping we wouldn't need to support a device parameter anywhere. Any reason we can't just rely on the input frames' device?

Contributor Author

As per our offline discussion, I've updated this PR to not have an explicit device param, and instead use whichever device the frames Tensor is on.
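The dispatch described here can be sketched as follows (a hypothetical Python rendering of the C++ logic; the names and return values are illustrative only):

```python
def select_encoding_path(frames_device: str) -> str:
    # Mirror the PR's approach: no explicit device parameter; the encoding
    # path is derived from the device the frames tensor lives on.
    if frames_device.startswith("cuda"):
        return "gpu"  # initialize deviceInterface_ and call GPU encoding functions
    return "cpu"      # unchanged CPU code path in Encoder.cpp
```

This keeps the public API surface small: the user signals GPU encoding simply by moving the frames tensor to a CUDA device.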

avFrame->height = static_cast<int>(tensor.size(1));
avFrame->pts = frameIndex;

int ret = av_hwframe_get_buffer(
Contributor

Add a note here that we're letting FFmpeg allocate the CUDA memory. I think we should explore allocating the memory with pytorch instead, so that we can automatically rely on pytorch's CUDA memory allocator, which should be more efficient. There could be a TODO to investigate how to do that (this is related to my comment about setupEncodingContext above).

Contributor Author

Would an example of this be allocateEmptyHWCTensor?

Contributor

Correct, in allocateEmptyHWCTensor we pre-allocate the output tensors with pytorch, and thus with pytorch's CUDA memory allocator. But with the decoder, it's more of a necessity than an optimization: we want to output tensors, so of course we have to use pytorch to allocate those. We wouldn't want or be able to do that via FFmpeg.

In our case here, with the encoder, relying on pytorch for the allocation isn't a necessity: FFmpeg can do that. It's more of an optimization.

@Dan-Flores Dan-Flores marked this pull request as ready for review December 4, 2025 21:36
("mov", "h264_nvenc"),
("mp4", "hevc_nvenc"),
("avi", "h264_nvenc"),
# ("mkv", "av1_nvenc"), # av1_nvenc is not supported on CI
Contributor

That's OK for now but add a TODO that we should re-enable it when not in CI. We should be able to have an in_CI() helper just like we do for in_fbcode(). Torchvision has: https://github.com/pytorch/vision/blob/6b56de1cc83386025f2bad87abf608077d1853f7/test/common_utils.py#L28
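The suggested helper could mirror the torchvision one linked above. A minimal sketch, assuming the standard `CI=true` environment variable that GitHub Actions (and most CI systems) set for workflow runs:

```python
import os

def in_ci() -> bool:
    # GitHub Actions sets CI=true for every workflow run; most other CI
    # systems set the same variable, so this check covers them too.
    return os.environ.get("CI") == "true"
```

The test could then re-enable av1_nvenc with `@pytest.mark.skipif(in_ci(), ...)` instead of commenting the parametrization out.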

encoder_output = file_like.getvalue()
else:
raise ValueError(f"Unknown method: {method}")

Contributor

After we have encoded the frame, let's assert with ffprobe that the pixel format is NV12 (and any other relevant properties)

Contributor Author

I added the assert below and learned that FFmpeg does not distinguish nv12 from yuv420p in video metadata. I hope to gain a better understanding of this once we enable alternate pixel formats.
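The ffprobe assertion could be structured along these lines (a sketch; the helper names are mine, and per the comment above, ffprobe will report yuv420p even for streams encoded from NV12 input, since the stream metadata does not distinguish the two):

```python
import json
import subprocess

FFPROBE_CMD = [
    "ffprobe", "-v", "error", "-select_streams", "v:0",
    "-show_entries", "stream=pix_fmt", "-of", "json",
]

def parse_pix_fmt(ffprobe_json: bytes) -> str:
    # Extract pix_fmt for the first video stream from ffprobe's JSON output.
    return json.loads(ffprobe_json)["streams"][0]["pix_fmt"]

def probe_pix_fmt(path: str) -> str:
    # Run ffprobe on the encoded file and return its reported pixel format.
    out = subprocess.run(FFPROBE_CMD + [path], check=True, capture_output=True).stdout
    return parse_pix_fmt(out)
```

The test would then assert `probe_pix_fmt(output_path) == "yuv420p"` (not "nv12", given the metadata behavior noted above).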

@pytest.mark.parametrize(
"format_codec",
[
("mov", "h264_nvenc"),
Contributor

IIUC, we get an error when not explicitly passing the codec. That's OK for this PR, but not a great UX. On CPU we don't require the user to pass the codec, so ideally it should work the same on GPU. Let's add a TODO to enable that, I think that should be one of the first follow-ups.

I acknowledge it might be related to the *find_codec* function that I suggested we remove earlier, in a previous comment. I also understand that the FFmpeg CLI requires the codec, but there should be no reason we have to :)

Contributor Author

I agree on the worse UX there, I'll add a TODO.

# Encode with FFmpeg CLI using nvenc codecs
format, codec = format_codec
device = "cuda"
pixel_format = "nv12"
Contributor

We only support nv12 for now. We should error if the user tries to pass anything else.

Also note that the tests are passing when I remove pixel_format from both the FFmpeg CLI and our API calls - that's good, and it suggests we should remove it from this test as well. As long as we assert with ffprobe that the output is nv12, we're good (see other suggestion below)

Contributor Author

This is a little tricky. FFmpeg CLI defaults to pix_fmt=gbrp, which does no chroma subsampling, but somehow the frame assertions pass against our VideoEncoder, which defaults to pix_fmt=yuv420p.
For now, I'll keep specifying pix_fmt, and investigate and address this once we enable other pixel formats.

Comment on lines 1358 to 1365
# TODO-VideoEncoder: Ensure CI does not skip this test, as we know NVENC is available.
try:
subprocess.run(ffmpeg_cmd, check=True, capture_output=True)
except subprocess.CalledProcessError as e:
if b"No NVENC capable devices found" in e.stderr:
pytest.skip("NVENC not available on this system")
else:
raise
Contributor

On the CI TODO above: I think the easiest way to address it is to remove the whole try/except block and to never skip this test. In the GitHub CI, it should always pass. In fbcode, it's consistently skipped anyway. I now think we should do that here in this PR rather than leaving it out as a TODO (sorry for suggesting a TODO earlier :) )

Comment on lines 142 to 147
virtual std::optional<UniqueAVFrame> convertTensorToAVFrame(
[[maybe_unused]] const torch::Tensor& tensor,
[[maybe_unused]] int frameIndex,
[[maybe_unused]] AVCodecContext* codecContext) {
return std::nullopt;
}
Contributor

This is all good, a few suggestions:

  • let's explicitly reflect in the name that it's about encoding and about CUDA. Something like convertCUDATensorToCUDAAVFrameForEncoding. The name is ugly but that's kind of on purpose. We should add a TODO here to re-consider the use of encoding via the device interface
  • Let's add a comment above indicating that these are here because we need the interface "plumbing" for not including CUDA-specific dependencies in CPU builds (related to TODO mentioned above)
  • Let's not return an optional. I don't think we need to?
  • Let's call TORCH_CHECK(false) here in this implementation. No one should ever call that except the CUDA interface.
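In Python terms, the suggested base-class behavior could be sketched as follows (a hypothetical analogue only; the real code is C++ and would use `TORCH_CHECK(false, ...)` in the base implementation):

```python
class DeviceInterface:
    def convert_tensor_to_avframe_for_encoding(self, tensor, frame_index, codec_context):
        # The base implementation must never be reached: only the CUDA
        # interface overrides it, mirroring the TORCH_CHECK(false) suggestion.
        raise RuntimeError("encoding via DeviceInterface is only implemented for CUDA")


class CudaDeviceInterface(DeviceInterface):
    def convert_tensor_to_avframe_for_encoding(self, tensor, frame_index, codec_context):
        # Stand-in for the real tensor-to-AVFrame conversion on the GPU.
        return ("cuda_avframe", frame_index)
```

Failing loudly in the base class (rather than returning an optional) makes any accidental CPU-path call an immediate, obvious error instead of a silently-handled `nullopt`.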

Contributor Author

Thanks for these suggestions, I've implemented them in my latest commit.

Comment on lines 150 to 151
virtual void setupHardwareFrameContext(
[[maybe_unused]] AVCodecContext* codecContext) {}
Contributor

Similar comments:

  • rename to something like setupHardwareFrameContextForEncoding
  • use TORCH_CHECK(false).
