Enable CUDA device for video encoder #1008
Conversation
src/torchcodec/_core/Encoder.cpp (Outdated)

    void VideoEncoder::initializeEncoder(
        const VideoStreamOptions& videoStreamOptions) {
      if (videoStreamOptions.device.is_cuda()) {
I was hoping we wouldn't need to support a device parameter anywhere. Any reason we can't just rely on the input frames' device?
As per our offline discussion, I've updated this PR to not have an explicit device param, and instead use whichever device the frames Tensor is on.
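For reference, a rough sketch of what that dispatch could look like. The member and helper names below (frames_, deviceInterface_, createDeviceInterface, avCodecContext_) are assumptions based on this PR's description, not the exact code:

    // Sketch only: infer the encoding device from the frames tensor rather
    // than from an explicit device parameter.
    void VideoEncoder::initializeEncoder() {
      const torch::Device device = frames_.device();
      if (device.is_cuda()) {
        // GPU path: create the device interface and let it set up the
        // NVENC encoding context.
        deviceInterface_ = createDeviceInterface(device);  // assumed helper
        deviceInterface_->setupHardwareFrameContext(avCodecContext_.get());
      }
      // The CPU path is unchanged.
    }

This keeps the constructor signature unchanged while still selecting NVENC when the input frames already live on a CUDA device.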
src/torchcodec/_core/GpuEncoder.cpp (Outdated)

      avFrame->height = static_cast<int>(tensor.size(1));
      avFrame->pts = frameIndex;

      int ret = av_hwframe_get_buffer(
Add a note here that we're letting FFmpeg allocate the CUDA memory. I think we should explore allocating the memory with pytorch instead, so that we can automatically rely on pytorch's CUDA memory allocator, which should be more efficient. There could be a TODO to investigate how to do that (this is related to my comment about setupEncodingContext above).
Would an example of this be allocateEmptyHWCTensor?
Correct, in allocateEmptyHWCTensor we pre-allocate the output tensors with pytorch, and thus with pytorch's CUDA memory allocator. But with the decoder, it's more of a necessity than an optimization: we want to output tensors, so of course we have to use pytorch to allocate those. We wouldn't want or be able to do that via FFmpeg.
In our case here, with the encoder, relying on pytorch for the allocation isn't a necessity: FFmpeg can do that. It's more of an optimization.
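To make that note concrete, here is a minimal sketch of the FFmpeg-side allocation being discussed; allocateHardwareFrame is a hypothetical wrapper, not a function in this PR:

    extern "C" {
    #include <libavcodec/avcodec.h>
    #include <libavutil/frame.h>
    #include <libavutil/hwcontext.h>
    }

    // Sketch only: FFmpeg, not pytorch, allocates the CUDA memory backing the
    // hardware frame. Width, height, and pixel format come from the
    // AVHWFramesContext already attached to the codec context.
    AVFrame* allocateHardwareFrame(AVCodecContext* codecContext) {
      AVFrame* hwFrame = av_frame_alloc();
      if (hwFrame == nullptr) {
        return nullptr;
      }
      // TODO: investigate allocating this memory through pytorch's CUDA
      // caching allocator instead, which should be more efficient.
      if (av_hwframe_get_buffer(codecContext->hw_frames_ctx, hwFrame, 0) < 0) {
        av_frame_free(&hwFrame);
        return nullptr;
      }
      return hwFrame;
    }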
| ("mov", "h264_nvenc"), | ||
| ("mp4", "hevc_nvenc"), | ||
| ("avi", "h264_nvenc"), | ||
| # ("mkv", "av1_nvenc"), # av1_nvenc is not supported on CI |
That's OK for now but add a TODO that we should re-enable it when not in CI. We should be able to have an in_CI() helper just like we do for in_fbcode(). Torchvision has: https://github.com/pytorch/vision/blob/6b56de1cc83386025f2bad87abf608077d1853f7/test/common_utils.py#L28
        encoder_output = file_like.getvalue()
    else:
        raise ValueError(f"Unknown method: {method}")
After we have encoded the frame, let's assert with ffprobe that the pixel format is NV12 (and any other relevant properties)
I added the assert below and learned that ffmpeg does not distinguish nv12 from yuv420p in video metadata. I hope to gain a better understanding of this once we enable alternate pixel formats.
    @pytest.mark.parametrize(
        "format_codec",
        [
            ("mov", "h264_nvenc"),
IIUC, we get an error when not explicitly passing the codec. That's OK for this PR, but not a great UX. On CPU we don't require the user to pass the codec, so ideally it should work the same way on GPU. Let's add a TODO to enable that; I think it should be one of the first follow-ups.
I acknowledge it might be related to the find_codec function that I suggested we remove in an earlier comment. I also understand that the FFmpeg CLI requires the codec, but there should be no reason we have to :)
I agree on the worse UX there, I'll add a TODO.
        # Encode with FFmpeg CLI using nvenc codecs
        format, codec = format_codec
        device = "cuda"
        pixel_format = "nv12"
We only support nv12 for now. We should error if the user tries to pass anything else.
Also note that the tests are passing when I remove pixel_format from both the FFmpeg CLI and our API calls - that's good, and it suggests we should remove it from this test as well. As long as we assert with ffprobe that the output is nv12, we're good (see other suggestion below)
This is a little tricky. The FFmpeg CLI defaults to pix_fmt=gbrp, which does no chroma subsampling, yet the frame assertions still pass against our VideoEncoder, which defaults to pix_fmt=yuv420p.
For now, I'll keep specifying pix_fmt, and will investigate and address this once we enable other pixel formats.
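On the "error if anything else is passed" suggestion, a sketch of the early check, assuming the requested pixel format is exposed as an optional string on the stream options (the field name below is hypothetical):

    // Sketch only: CUDA encoding currently only produces NV12, so reject
    // any other explicitly requested pixel format up front.
    TORCH_CHECK(
        !videoStreamOptions.pixelFormat.has_value() ||
            videoStreamOptions.pixelFormat.value() == "nv12",
        "CUDA video encoding currently only supports the nv12 pixel format, got: ",
        videoStreamOptions.pixelFormat.value());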
test/test_encoders.py (Outdated)

        # TODO-VideoEncoder: Ensure CI does not skip this test, as we know NVENC is available.
        try:
            subprocess.run(ffmpeg_cmd, check=True, capture_output=True)
        except subprocess.CalledProcessError as e:
            if b"No NVENC capable devices found" in e.stderr:
                pytest.skip("NVENC not available on this system")
            else:
                raise
On the CI TODO above: I think the easiest way to address it is to remove the whole try/except block and never skip this test. In the GitHub CI it should always pass, and in fbcode it's consistently skipped anyway. I now think we should do that here in this PR rather than leaving it as a TODO (sorry for suggesting a TODO earlier :) )
  virtual std::optional<UniqueAVFrame> convertTensorToAVFrame(
      [[maybe_unused]] const torch::Tensor& tensor,
      [[maybe_unused]] int frameIndex,
      [[maybe_unused]] AVCodecContext* codecContext) {
    return std::nullopt;
  }
This is all good, a few suggestions (see also the sketch after this exchange):
- Let's explicitly reflect in the name that it's about encoding and about CUDA, something like convertCUDATensorToCUDAAVFrameForEncoding. The name is ugly, but that's kind of on purpose. We should add a TODO here to re-consider doing encoding via the device interface.
- Let's add a comment above indicating that these are here because we need the interface "plumbing" to avoid including CUDA-specific dependencies in CPU builds (related to the TODO mentioned above).
- Let's not return an optional. I don't think we need to?
- Let's call TORCH_CHECK(false) in this implementation. No one should ever call it except the CUDA interface.
Thanks for these suggestions, I've implemented them in my latest commit.
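A minimal sketch of what the base-class declaration could look like after these suggestions; the exact name and signature in the PR may differ:

    // Sketch only: default DeviceInterface implementation. This virtual exists
    // purely as interface "plumbing" so CPU builds don't pull in CUDA-specific
    // dependencies; only the CUDA interface should ever be called for encoding.
    // TODO: re-consider doing encoding through the device interface at all.
    virtual UniqueAVFrame convertCUDATensorToCUDAAVFrameForEncoding(
        [[maybe_unused]] const torch::Tensor& tensor,
        [[maybe_unused]] int frameIndex,
        [[maybe_unused]] AVCodecContext* codecContext) {
      TORCH_CHECK(false, "Encoding is only supported by the CUDA device interface.");
    }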
  virtual void setupHardwareFrameContext(
      [[maybe_unused]] AVCodecContext* codecContext) {}
Similar comments:
- Rename to something like setupHardwareFrameContextForEncoding.
- Use TORCH_CHECK(false).
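For context, the standard FFmpeg pattern for attaching a CUDA hardware frames context to an encoder looks roughly like the sketch below; the PR's actual CUDA-interface implementation may differ (NV12 only, as discussed above):

    extern "C" {
    #include <libavcodec/avcodec.h>
    #include <libavutil/hwcontext.h>
    #include <libavutil/pixfmt.h>
    }
    #include <torch/types.h>

    // Sketch only: create a CUDA hardware device context, describe the frame
    // pool, and hand it to the encoder via codecContext->hw_frames_ctx.
    void setupHardwareFrameContextForEncoding(AVCodecContext* codecContext) {
      AVBufferRef* hwDeviceCtx = nullptr;
      TORCH_CHECK(
          av_hwdevice_ctx_create(
              &hwDeviceCtx, AV_HWDEVICE_TYPE_CUDA, nullptr, nullptr, 0) >= 0,
          "Failed to create CUDA hardware device context.");

      AVBufferRef* hwFramesRef = av_hwframe_ctx_alloc(hwDeviceCtx);
      TORCH_CHECK(hwFramesRef != nullptr, "Failed to allocate hardware frames context.");

      auto* framesCtx = reinterpret_cast<AVHWFramesContext*>(hwFramesRef->data);
      framesCtx->format = AV_PIX_FMT_CUDA;     // frames live in CUDA memory
      framesCtx->sw_format = AV_PIX_FMT_NV12;  // underlying pixel format, NV12 only for now
      framesCtx->width = codecContext->width;
      framesCtx->height = codecContext->height;
      TORCH_CHECK(
          av_hwframe_ctx_init(hwFramesRef) >= 0,
          "Failed to initialize hardware frames context.");

      // The encoder allocates its CUDA frames from this context
      // (see av_hwframe_get_buffer in the earlier sketch).
      codecContext->hw_frames_ctx = av_buffer_ref(hwFramesRef);
      av_buffer_unref(&hwFramesRef);
      av_buffer_unref(&hwDeviceCtx);
    }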
This PR enables encoding frames on a GPU device in VideoEncoder.
Previously, DeviceInterface and its subclasses only contained methods related to decoding. For now, we are going to break that rule and add two encoding-specific methods: convertTensorToAVFrame and setupHardwareFrameContext. The constructor for VideoEncoder is unchanged.
Inspired by other torch operators, we determine the encoding device from the device the frames tensor is on (thanks for the suggestion @NicolasHug!):
- The code path in Encoder.cpp is unchanged on CPU.
- On GPU, a member variable deviceInterface_ is initialized and used to call the GPU encoding functions.
In convertTensorToAVFrame, there are several TODOs to extend it. Right now, it only handles limited-range color and always encodes frames in the pixel format AV_PIX_FMT_NV12 (essentially yuv420p).
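For reference, the per-frame GPU path described above could look roughly like this; frames_, deviceInterface_, avCodecContext_, and encodeFrame are assumptions, not the PR's exact names:

    // Sketch only: convert each CUDA frame tensor to a CUDA AVFrame via the
    // device interface, then feed it to the NVENC encoder.
    for (int64_t frameIndex = 0; frameIndex < frames_.size(0); ++frameIndex) {
      UniqueAVFrame avFrame =
          deviceInterface_->convertCUDATensorToCUDAAVFrameForEncoding(
              frames_[frameIndex], static_cast<int>(frameIndex),
              avCodecContext_.get());
      encodeFrame(avCodecContext_.get(), avFrame);  // assumed helper
    }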