Add DepthAnything pipeline #303
base: main
Conversation
```python
],
"depth-anything": [
    HuggingfaceRepoArtifact(
        repo_id="depth-anything/Video-Depth-Anything-Small",
```
Just noting that it's ok to start with the small checkpoint for now, but since there are also base and large checkpoints, I think it makes sense to take a separate pass (can be done outside of this PR) to make the pipeline's model configurable, since IIUC the pipeline should be usable with any of the checkpoints. At that point, we'd want to only download the artifacts for the model selected for the pipeline.
Totally. I just didn't want to introduce the concept of model weight selection in this PR, because we don't have it yet in Scope. But I totally agree that we should add it and then offer the base and large models for Video Depth Anything. The change will be trivial; it's mostly about finding a good UX for it.
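For reference, a minimal sketch of what that direction could look like (assumptions: the `model_size` option and `artifacts_for` helper are hypothetical, and the Base/Large repo ids are only inferred from the Small one's naming):

```python
# Hypothetical sketch only: the dict keys, option name, and the Base/Large
# repo ids are assumptions, not part of this PR.
# HuggingfaceRepoArtifact is the artifact type used in the diff above.
DEPTH_ANYTHING_ARTIFACTS = {
    "small": HuggingfaceRepoArtifact(
        repo_id="depth-anything/Video-Depth-Anything-Small",
    ),
    "base": HuggingfaceRepoArtifact(
        repo_id="depth-anything/Video-Depth-Anything-Base",  # assumed repo id
    ),
    "large": HuggingfaceRepoArtifact(
        repo_id="depth-anything/Video-Depth-Anything-Large",  # assumed repo id
    ),
}

def artifacts_for(model_size: str) -> list:
    # Only download the artifacts for the checkpoint selected for the pipeline.
    return [DEPTH_ANYTHING_ARTIFACTS[model_size]]
```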
```python
# Convert input to numpy uint8 array
# Frames from frame_processor are always (1, H, W, C), so we squeeze the T dimension
frames = []
for frame in video:
```
It makes sense to handle the case where there are many frames, but just want to clarify: for now, in practice, we'd always expect the return value of prepare() to contain a single frame in `video` on each call, yeah?
Video Depth Anything works in 2 modes:
- default => batch (32 frames in, 32 frames out; you can pass fewer frames and use padding)
- streaming => experimental (1 frame in, 1 frame out)

Batch is better in terms of resource consumption; streaming is better for latency. My thinking was that we want to optimize for latency, so I used "streaming". At the same time, the pipeline interface accepts multiple frames to stay consistent with the other pipelines. But yeah, prepare() will always return 1 frame. To make it clearer, I updated test.py to pass frames one by one.
Now, we could consider supporting both batch and streaming and giving the user control. That's trivial to add, but I wonder if we even want to expose batch mode.
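If we did expose it, a minimal sketch of that control could look like this (the `mode` parameter and the `infer_streaming`/`infer_batch` method names are hypothetical, not this PR's API):

```python
# Hypothetical sketch; identifiers are illustrative, not the PR's API.
def run_depth(frames, model, mode: str = "streaming"):
    if mode == "streaming":
        # 1 frame in, 1 frame out: best latency, higher per-frame cost.
        return [model.infer_streaming(frame) for frame in frames]
    if mode == "batch":
        # 32 frames in, 32 frames out; shorter inputs would be padded.
        return model.infer_batch(frames)
    raise ValueError(f"unknown mode: {mode!r}")
```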
Add the `depth-anything` pipeline from https://github.com/DepthAnything/Video-Depth-Anything with the intention to use it as the first (built-in) preprocessor.

Comments
- Hid `depth-anything` from the pipelines in the Scope app; AFAIU we don't want to use it as a pipeline, it will only be used as a preprocessor.
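As a rough illustration of the preprocessor idea (a sketch only; `model.infer` and the normalization step are assumptions, not code from this PR), the pipeline conceptually maps each RGB frame to a depth map that downstream pipelines consume as conditioning:

```python
import numpy as np

def depth_preprocess(frame: np.ndarray, model) -> np.ndarray:
    """Hypothetical sketch: (H, W, 3) uint8 RGB frame -> (H, W) float32 depth."""
    depth = model.infer(frame)  # assumed model API, not from this PR
    # Normalize to [0, 1] so downstream consumers get a stable value range.
    return (depth - depth.min()) / (depth.max() - depth.min() + 1e-8)
```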