Enable CUDA device for video encoder #1008
Conversation
src/torchcodec/_core/Encoder.cpp (Outdated)

    void VideoEncoder::initializeEncoder(
        const VideoStreamOptions& videoStreamOptions) {
      if (videoStreamOptions.device.is_cuda()) {
I was hoping we wouldn't need to support a device parameter anywhere. Any reason we can't just rely on the input frames' device?
As per our offline discussion, I've updated this PR to not have an explicit device param, and instead use whichever device the frames Tensor is on.
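For reference, a rough sketch of what that dispatch could look like. The member and helper names below (frames_, deviceInterface_, createDeviceInterface, avCodecContext_) are assumptions based on this PR's description, not the exact code:

    // Sketch only: infer the encoding device from the frames tensor rather
    // than from an explicit device parameter.
    void VideoEncoder::initializeEncoder() {
      const torch::Device device = frames_.device();
      if (device.is_cuda()) {
        // GPU path: create the device interface and let it set up the
        // NVENC encoding context.
        deviceInterface_ = createDeviceInterface(device);  // assumed helper
        deviceInterface_->setupHardwareFrameContext(avCodecContext_.get());
      }
      // The CPU path is unchanged.
    }

This keeps the constructor signature unchanged while still selecting NVENC when the input frames already live on a CUDA device.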
src/torchcodec/_core/GpuEncoder.cpp (Outdated)

      avFrame->height = static_cast<int>(tensor.size(1));
      avFrame->pts = frameIndex;

      int ret = av_hwframe_get_buffer(
Add a note here that we're letting FFmpeg allocate the CUDA memory. I think we should explore allocating the memory with pytorch instead, so that we can automatically rely on pytorch's CUDA memory allocator, which should be more efficient. There could be a TODO to investigate how to do that (this is related to my comment about setupEncodingContext above).
Would an example of this be allocateEmptyHWCTensor?
Correct, in allocateEmptyHWCTensor we pre-allocate the output tensors with pytorch, and thus with pytorch's CUDA memory allocator. But with the decoder, it's more of a necessity than an optimization: we want to output tensors, so of course we have to use pytorch to allocate those. We wouldn't want or be able to do that via FFmpeg.
In our case here, with the encoder, relying on pytorch for the allocation isn't a necessity: FFmpeg can do that. It's more of an optimization.
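To make that note concrete, here is a minimal sketch of the FFmpeg-side allocation being discussed; allocateHardwareFrame is a hypothetical wrapper, not a function in this PR:

    extern "C" {
    #include <libavcodec/avcodec.h>
    #include <libavutil/frame.h>
    #include <libavutil/hwcontext.h>
    }

    // Sketch only: FFmpeg, not pytorch, allocates the CUDA memory backing the
    // hardware frame. Width, height, and pixel format come from the
    // AVHWFramesContext already attached to the codec context.
    AVFrame* allocateHardwareFrame(AVCodecContext* codecContext) {
      AVFrame* hwFrame = av_frame_alloc();
      if (hwFrame == nullptr) {
        return nullptr;
      }
      // TODO: investigate allocating this memory through pytorch's CUDA
      // caching allocator instead, which should be more efficient.
      if (av_hwframe_get_buffer(codecContext->hw_frames_ctx, hwFrame, 0) < 0) {
        av_frame_free(&hwFrame);
        return nullptr;
      }
      return hwFrame;
    }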
| ("mov", "h264_nvenc"), | ||
| ("mp4", "hevc_nvenc"), | ||
| ("avi", "h264_nvenc"), | ||
| # ("mkv", "av1_nvenc"), # av1_nvenc is not supported on CI |
That's OK for now but add a TODO that we should re-enable it when not in CI. We should be able to have an in_CI() helper just like we do for in_fbcode(). Torchvision has: https://github.com/pytorch/vision/blob/6b56de1cc83386025f2bad87abf608077d1853f7/test/common_utils.py#L28
        encoder_output = file_like.getvalue()
    else:
        raise ValueError(f"Unknown method: {method}")
After we have encoded the frame, let's assert with ffprobe that the pixel format is NV12 (and any other relevant properties)
I added the assert below and learned that ffmpeg does not distinguish nv12 from yuv420p in video metadata. I hope to gain a better understanding of this once we enable alternate pixel formats.
    @pytest.mark.parametrize(
        "format_codec",
        [
            ("mov", "h264_nvenc"),
IIUC, we get an error when not explicitly passing the codec. That's OK for this PR, but not a great UX. On CPU we don't require the user to pass the codec, so ideally it should work the same way on GPU. Let's add a TODO to enable that; I think it should be one of the first follow-ups.
I acknowledge it might be related to the find_codec function that I suggested we remove in an earlier comment. I also understand that the FFmpeg CLI requires the codec, but there should be no reason we have to :)
I agree on the worse UX there, I'll add a TODO.
        # Encode with FFmpeg CLI using nvenc codecs
        format, codec = format_codec
        device = "cuda"
        pixel_format = "nv12"
We only support nv12 for now. We should error if the user tries to pass anything else.
Also note that the tests are passing when I remove pixel_format from both the FFmpeg CLI and our API calls - that's good, and it suggests we should remove it from this test as well. As long as we assert with ffprobe that the output is nv12, we're good (see other suggestion below)
This is a little tricky. The FFmpeg CLI defaults to pix_fmt=gbrp, which does no chroma subsampling, yet the frame assertions still pass against our VideoEncoder, which defaults to pix_fmt=yuv420p.
For now, I'll keep specifying pix_fmt, and will investigate and address this once we enable other pixel formats.
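On the "error if anything else is passed" suggestion, a sketch of the early check, assuming the requested pixel format is exposed as an optional string on the stream options (the field name below is hypothetical):

    // Sketch only: CUDA encoding currently only produces NV12, so reject
    // any other explicitly requested pixel format up front.
    TORCH_CHECK(
        !videoStreamOptions.pixelFormat.has_value() ||
            videoStreamOptions.pixelFormat.value() == "nv12",
        "CUDA video encoding currently only supports the nv12 pixel format, got: ",
        videoStreamOptions.pixelFormat.value());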
test/test_encoders.py (Outdated)

        # TODO-VideoEncoder: Ensure CI does not skip this test, as we know NVENC is available.
        try:
            subprocess.run(ffmpeg_cmd, check=True, capture_output=True)
        except subprocess.CalledProcessError as e:
            if b"No NVENC capable devices found" in e.stderr:
                pytest.skip("NVENC not available on this system")
            else:
                raise
On the CI TODO above: I think the easiest way to address it is to remove the whole try/except block and never skip this test. In the GitHub CI it should always pass, and in fbcode it's consistently skipped anyway. I now think we should do that here in this PR rather than leaving it as a TODO (sorry for suggesting a TODO earlier :) )
  virtual std::optional<UniqueAVFrame> convertTensorToAVFrame(
      [[maybe_unused]] const torch::Tensor& tensor,
      [[maybe_unused]] int frameIndex,
      [[maybe_unused]] AVCodecContext* codecContext) {
    return std::nullopt;
  }
This is all good, a few suggestions (see also the sketch after this exchange):
- Let's explicitly reflect in the name that it's about encoding and about CUDA, something like convertCUDATensorToCUDAAVFrameForEncoding. The name is ugly, but that's kind of on purpose. We should add a TODO here to re-consider doing encoding via the device interface.
- Let's add a comment above indicating that these are here because we need the interface "plumbing" to avoid including CUDA-specific dependencies in CPU builds (related to the TODO mentioned above).
- Let's not return an optional. I don't think we need to?
- Let's call TORCH_CHECK(false) in this implementation. No one should ever call it except the CUDA interface.
Thanks for these suggestions, I've implemented them in my latest commit.
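A minimal sketch of what the base-class declaration could look like after these suggestions; the exact name and signature in the PR may differ:

    // Sketch only: default DeviceInterface implementation. This virtual exists
    // purely as interface "plumbing" so CPU builds don't pull in CUDA-specific
    // dependencies; only the CUDA interface should ever be called for encoding.
    // TODO: re-consider doing encoding through the device interface at all.
    virtual UniqueAVFrame convertCUDATensorToCUDAAVFrameForEncoding(
        [[maybe_unused]] const torch::Tensor& tensor,
        [[maybe_unused]] int frameIndex,
        [[maybe_unused]] AVCodecContext* codecContext) {
      TORCH_CHECK(false, "Encoding is only supported by the CUDA device interface.");
    }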
  virtual void setupHardwareFrameContext(
      [[maybe_unused]] AVCodecContext* codecContext) {}
Similar comments:
- Rename to something like setupHardwareFrameContextForEncoding.
- Use TORCH_CHECK(false).
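For context, the standard FFmpeg pattern for attaching a CUDA hardware frames context to an encoder looks roughly like the sketch below; the PR's actual CUDA-interface implementation may differ (NV12 only, as discussed above):

    extern "C" {
    #include <libavcodec/avcodec.h>
    #include <libavutil/hwcontext.h>
    #include <libavutil/pixfmt.h>
    }
    #include <torch/types.h>

    // Sketch only: create a CUDA hardware device context, describe the frame
    // pool, and hand it to the encoder via codecContext->hw_frames_ctx.
    void setupHardwareFrameContextForEncoding(AVCodecContext* codecContext) {
      AVBufferRef* hwDeviceCtx = nullptr;
      TORCH_CHECK(
          av_hwdevice_ctx_create(
              &hwDeviceCtx, AV_HWDEVICE_TYPE_CUDA, nullptr, nullptr, 0) >= 0,
          "Failed to create CUDA hardware device context.");

      AVBufferRef* hwFramesRef = av_hwframe_ctx_alloc(hwDeviceCtx);
      TORCH_CHECK(hwFramesRef != nullptr, "Failed to allocate hardware frames context.");

      auto* framesCtx = reinterpret_cast<AVHWFramesContext*>(hwFramesRef->data);
      framesCtx->format = AV_PIX_FMT_CUDA;     // frames live in CUDA memory
      framesCtx->sw_format = AV_PIX_FMT_NV12;  // underlying pixel format, NV12 only for now
      framesCtx->width = codecContext->width;
      framesCtx->height = codecContext->height;
      TORCH_CHECK(
          av_hwframe_ctx_init(hwFramesRef) >= 0,
          "Failed to initialize hardware frames context.");

      // The encoder allocates its CUDA frames from this context
      // (see av_hwframe_get_buffer in the earlier sketch).
      codecContext->hw_frames_ctx = av_buffer_ref(hwFramesRef);
      av_buffer_unref(&hwFramesRef);
      av_buffer_unref(&hwDeviceCtx);
    }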
This PR enables encoding frames on a GPU device in VideoEncoder.
Previously, DeviceInterface and its subclasses only contained methods related to decoding. For now, we are going to break that rule and add two encoding-specific methods: convertTensorToAVFrame and setupHardwareFrameContext. The constructor for VideoEncoder is unchanged.
Inspired by other torch operators, we determine the encoding device from the device the frames tensor is on (thanks for the suggestion @NicolasHug!):
- The code path in Encoder.cpp is unchanged on CPU.
- On GPU, a member variable deviceInterface_ is initialized and used to call the GPU encoding functions.
In convertTensorToAVFrame, there are several TODOs to extend it. Right now, it only handles limited-range color and always encodes frames in the pixel format AV_PIX_FMT_NV12 (essentially yuv420p).
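For reference, the per-frame GPU path described above could look roughly like this; frames_, deviceInterface_, avCodecContext_, and encodeFrame are assumptions, not the PR's exact names:

    // Sketch only: convert each CUDA frame tensor to a CUDA AVFrame via the
    // device interface, then feed it to the NVENC encoder.
    for (int64_t frameIndex = 0; frameIndex < frames_.size(0); ++frameIndex) {
      UniqueAVFrame avFrame =
          deviceInterface_->convertCUDATensorToCUDAAVFrameForEncoding(
              frames_[frameIndex], static_cast<int>(frameIndex),
              avCodecContext_.get());
      encodeFrame(avCodecContext_.get(), avFrame);  // assumed helper
    }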