Skip to content

docs: Course section — Video Processing and Frame Extraction #17

@rdwj

Description

@rdwj

Summary

Teach users how to add video understanding capabilities to their agents using frame extraction and vision models. This section covers the batch-oriented architecture and sets clear expectations about maturity and limitations.

Course Section Outline

  • Video processing architecture — FFmpeg frame extraction piped to a vision model
  • When video processing is appropriate (batch analysis) vs. not ready (real-time streaming)
  • Configuring the VideoPreprocessor service and its extraction parameters
  • Frame extraction strategies — uniform sampling, keyframe detection, scene change
  • Integrating video content with the file upload endpoint and storage in MinIO
  • Maturity expectations and current limitations — processing latency, model accuracy on frames
  • Cost implications — each frame is a vision model call

Lab Exercise

Upload a short video clip (under 30 seconds) through the file upload endpoint. Observe the frame extraction process. Ask the agent to describe what happens in the video and verify it synthesizes information across multiple extracted frames.

Companion Issues

Companion issues filed on fips-agents/agent-template, fips-agents/gateway-template, fips-agents/ui-template, and fips-agents/fips-agents-cli.

Size

S-M

Metadata

Metadata

Assignees

No one assigned

    Labels

    documentationImprovements or additions to documentation

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions