Conversation

@leszko (Collaborator) commented Jan 5, 2026

Add depth-anything pipelines from https://github.com/DepthAnything/Video-Depth-Anything with the intention of using it as a first (built-in) preprocessor.

Comments

  1. I used the experimental streaming mode because it provides the best latency.
  2. I vendored the whole code to keep the convention of the other pipelines and to avoid additional dependencies; alternatively, we could use the original project as a dependency.
  3. The second commit of this PR removes depth-anything from the pipelines in the Scope app; AFAIU we don't want to expose it as a pipeline, it will be used only as a preprocessor.
  4. Added usage of the non-metric small model weights; I think this should be the fastest (and therefore the best) option for a preprocessor (see the sketch after this list).
  5. I haven't used any blocks or components; I think using those would make more sense once we have similar pipelines with shared logic.
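
Regarding item 4, here is a minimal sketch of how a non-metric (relative) depth map could be normalized into an 8-bit image for preprocessor use. This is an illustration using only numpy, not the vendored code; the function name and shapes are made up.

```python
import numpy as np

def depth_to_uint8(depth: np.ndarray) -> np.ndarray:
    # Relative (non-metric) depth has no fixed scale, so normalize each map
    # to [0, 1] before converting it to an 8-bit image for downstream use.
    d_min, d_max = float(depth.min()), float(depth.max())
    scaled = (depth - d_min) / max(d_max - d_min, 1e-6)
    return (scaled * 255.0).astype(np.uint8)

# Stand-in for a single depth map produced by the small checkpoint.
depth = np.random.rand(480, 640).astype(np.float32)
print(depth_to_uint8(depth).shape, depth_to_uint8(depth).dtype)  # (480, 640) uint8
```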

@leszko leszko force-pushed the rafal/add-depth-anything branch 5 times, most recently from 34a222e to 05d05c5 Compare January 5, 2026 12:59
@leszko leszko marked this pull request as ready for review January 5, 2026 13:10
@leszko leszko requested a review from yondonfu January 5, 2026 13:10
leszko added 2 commits January 7, 2026 08:25
Signed-off-by: Rafal Leszko <rafal@livepeer.org>
Signed-off-by: Rafal Leszko <rafal@livepeer.org>
@leszko leszko force-pushed the rafal/add-depth-anything branch from ec6404d to 287bcef Compare January 7, 2026 08:25
],
"depth-anything": [
    HuggingfaceRepoArtifact(
        repo_id="depth-anything/Video-Depth-Anything-Small",
@yondonfu (Contributor) commented Jan 8, 2026

Just noting that it's ok to start with the small checkpoint for now, but since there are also base and large checkpoints, I think it makes sense to take a separate pass (can be done outside of this PR) to make the model for the pipeline configurable, since IIUC the pipeline should be usable with any of the checkpoints. At that point, we'd want to download only the artifacts for the model selected for the pipeline.

@leszko (Collaborator, Author) replied:

Totally. I just didn't want to introduce the concept of model weight selection in this PR, because we don't have it yet in Scope. But I totally agree that we should add it and then offer the base and large Video Depth Anything models as well. The change will be trivial; it's just about finding a good UX for it.
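
For illustration, a rough sketch of what checkpoint selection could look like. The HuggingfaceRepoArtifact below is a simplified stand-in for Scope's own class, and the Base/Large repo ids are assumed to follow the same naming pattern as the Small one used in this PR.

```python
from dataclasses import dataclass

@dataclass
class HuggingfaceRepoArtifact:
    # Simplified stand-in for the project's artifact class; only repo_id is modeled.
    repo_id: str

# Hypothetical mapping from a user-facing model size to a checkpoint repo.
DEPTH_ANYTHING_CHECKPOINTS = {
    "small": "depth-anything/Video-Depth-Anything-Small",
    "base": "depth-anything/Video-Depth-Anything-Base",
    "large": "depth-anything/Video-Depth-Anything-Large",
}

def depth_anything_artifacts(model_size: str = "small") -> list[HuggingfaceRepoArtifact]:
    # Register only the artifact for the selected checkpoint so that the other
    # checkpoints are never downloaded.
    return [HuggingfaceRepoArtifact(repo_id=DEPTH_ANYTHING_CHECKPOINTS[model_size])]

print(depth_anything_artifacts("base"))
```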

# Convert input to numpy uint8 array
# Frames from frame_processor are always (1, H, W, C), so we squeeze the T dimension
frames = []
for frame in video:
@yondonfu (Contributor) commented:

It makes sense to handle the case where there are many frames, but just want to clarify: for now, in practice, we'd always expect the return value of prepare() to contain a single frame in `video` on each call, yeah?

@leszko (Collaborator, Author) replied:

Video Depth Anything works in 2 modes:

  1. default => batch (32 frames in, 32 frames out; you can pass fewer and use padding)
  2. streaming => experimental (1 frame in, 1 frame out)

Batch is better in terms of resource consumption; streaming is better for latency. My thinking was that we want to optimize for latency, so I used "streaming". At the same time, the pipeline interface accepts multiple frames to stay consistent with the other pipelines. But yeah, prepare() will always return 1 frame. To make it clearer, I updated test.py to pass frames one by one.

Now, we could consider supporting both batch and streaming and giving control to the user. That's trivial to add, but I wonder if we even want to expose the batch mode.
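
As a concrete illustration of the single-frame case (numpy only; the random array stands in for a real frame from frame_processor, and this is not the vendored code):

```python
import numpy as np

# In streaming mode, `video` holds exactly one frame per call,
# shaped (1, H, W, C) as uint8.
video = [np.random.randint(0, 256, size=(1, 480, 640, 3), dtype=np.uint8)]

frames = []
for frame in video:
    # Squeeze the leading T dimension so each entry becomes (H, W, C).
    frames.append(np.asarray(frame, dtype=np.uint8).squeeze(0))

print(len(frames), frames[0].shape)  # 1 (480, 640, 3)
```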

@leszko leszko requested a review from yondonfu January 9, 2026 08:05
@leszko leszko force-pushed the rafal/add-depth-anything branch 4 times, most recently from 3aecfc0 to 0daa61d Compare January 9, 2026 08:33
Signed-off-by: Rafal Leszko <rafal@livepeer.org>
@leszko leszko force-pushed the rafal/add-depth-anything branch from 0daa61d to bbc4a06 Compare January 9, 2026 08:35