-+= Update 26 Feb 2026 | AI Language wins top prize in Daydream Scope Plugin contest! =+-
Real-time AI plugins that close the loop between seeing and generating. The system watches a video stream, reasons about what it sees in real time, and continuously steers the AI image generation based on that understanding.
A vision language model (VLM) produces semantic descriptions: the mood of a crowd, the species of an animal, the weather in a landscape, the emotional tone of a scene. Those descriptions can optionally feed into a second preprocessor backed by a large language model (LLM), which rewrites them as rich diffusion prompts, shaping what the AI generates, frame by frame, in real time.
4.mp4
Example: Point the camera at a cat. Ask the VLM "what are the natural predators of what you see in three words?". It answers "eagles, foxes, coyotes". That response becomes the live diffusion prompt. The AI no longer renders a cat; it renders whatever is hunting it, morphing dynamically as the VLM's answers evolve with each new inference.
The generation doesn't follow a fixed script. It follows the scene. Prompt state changes smoothly via temporal interpolation rather than cutting abruptly between semantic states. Multiple plugins can run in parallel, chained, or driven from external tools (OSC, UDP) for live performance and installation contexts.
Example: By drawing live into the feed (e.g., using local Spout streaming), the VLM can drive a visual auto-complete. While initial strokes are ambiguous, the VLM's inferences converge as detail accumulates, providing increasingly accurate interpretations. Those are fed directly into the live video generation, serving both as a live autocomplete and as a means of creating animated drawings.
streamdiffusion2_compressed.mp4
- Test server A: RTX PRO 4050
- Test server B: RTX PRO 6000
- Test server C: RTX PRO 6000, latest
- Pipeline ID: streamdiffusionv2
- Preprocessor: vlm-ollama-pre
- Ollama URL: http://157.157.221.29:23058
- Model: llava:7b
- Postprocessor: vlm-ollama-post
The plugins slot into Daydream Scope's preprocessor / postprocessor pipeline architecture. A typical split chain:
```
Camera → [VLM Pre] ──────────────────────→ [AI Model] → [VLM Post] → Output
             │ UDP multicast 239.255.42.99      ↑
             └──→ [UDP Prompt] ─── prompts ─────┘
```
- scope-vlm-ollama queries an Ollama vision model on each frame at a configurable interval. Runs as a preprocessor (queries the raw feed, injects the VLM response as a diffusion prompt and broadcasts it via UDP), a postprocessor (receives the UDP text and overlays it on the AI output), or as a combined main pipeline.
- scope-llm-ollama sends text to an Ollama LLM and injects the rewritten response as a diffusion prompt. Use it to transform a short observation into an elaborate scene description, style directive, or creative prompt.
- scope-udp-prompt / scope-osc-prompt receive prompts from any external source via UDP multicast or OSC and inject them into the pipeline, bridging Python scripts, TouchDesigner, Ableton Live, Max/MSP, or any custom controller.
Semantic responses are broadcast over UDP multicast so any number of downstream plugins receive them simultaneously: fan-out with no additional routing. The port number acts as a channel: any plugin listening on the same port gets every message.
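For reference, listening on that channel needs only the standard library. A minimal sketch of a receiver (the group and port match the defaults used elsewhere in this document; `make_receiver` is an illustrative name, not part of the plugin API):

```python
import socket
import struct

MULTICAST_GROUP = "239.255.42.99"
PORT = 9400

def make_receiver(group: str = MULTICAST_GROUP, port: int = PORT) -> socket.socket:
    """Join the multicast group; every receiver on this port sees every message."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # IP_ADD_MEMBERSHIP takes the group address plus the local interface (any)
    mreq = struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton("0.0.0.0"))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    sock.settimeout(5.0)
    return sock

receiver = make_receiver()
# text, _ = receiver.recvfrom(4096)  # blocks until something broadcasts on this port
```

Any number of such receivers can run on the same port; multicast delivers each message to all of them, which is what makes the fan-out free.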
Prompt transitions use temporal interpolation (slerp or linear) to blend smoothly between semantic states over a configurable number of frames, rather than snapping abruptly when the VLM's description changes.
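The blending itself happens inside Scope's transition API; to illustrate the idea in isolation, a minimal numpy sketch of slerp between two hypothetical prompt embeddings (shapes and names here are illustrative, not Scope's internals):

```python
import numpy as np

def slerp(a: np.ndarray, b: np.ndarray, t: float) -> np.ndarray:
    """Spherical interpolation from a (t=0) to b (t=1)."""
    a_n = a / np.linalg.norm(a)
    b_n = b / np.linalg.norm(b)
    dot = np.clip(np.dot(a_n, b_n), -1.0, 1.0)
    theta = np.arccos(dot)
    if theta < 1e-6:  # nearly parallel: fall back to linear interpolation
        return (1 - t) * a + t * b
    return (np.sin((1 - t) * theta) * a + np.sin(t * theta) * b) / np.sin(theta)

# Blend from the old prompt embedding to the new one over a fixed number of frames
old, new = np.random.randn(768), np.random.randn(768)
steps = 10
blended = [slerp(old, new, k / (steps - 1)) for k in range(steps)]
```

Each frame then uses the next element of `blended`, so the semantic state glides rather than snaps when the VLM's description changes.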
Built on Ollama for local or remote VLM/LLM inference. Shared libraries handle all transport, frame conversion, text rendering, and prompt injection (scope-bus, scope-language), so each plugin stays focused on its single role in the chain.
3.mp4
2.mp4
1.mp4
5.mp4
moondream.mp4
Queries an Ollama vision model on live video. Available as three variants:
| Pipeline | Role | Description |
|---|---|---|
| VLM Ollama | Main | Query VLM + overlay response + inject prompt |
| VLM Ollama (Pre) | Preprocessor | Query VLM + inject prompt + broadcast via UDP |
| VLM Ollama (Post) | Postprocessor | Receive UDP text + overlay on AI output |
Typical chain: [VLM Pre] → [AI Model] → [VLM Post]
The Pre queries the raw camera feed; the Post overlays the description on the AI-processed output.
Key settings:
- `ollama_url` / `ollama_model` – load-time connection config
- `vlm_prompt` – question sent to the VLM with each frame
- `send_interval` – seconds between VLM queries (VLM is slow; 3–10 s typical)
- `inject_prompt` / `prompt_weight` – whether to use the VLM response as a diffusion prompt
- `transition_steps` – frames to blend from current to new prompt (0 = instant)
- `udp_port` – channel for Pre→Post communication (Pre/Post only)
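Under the hood, these settings drive requests against Ollama's HTTP API. A standalone sketch of a one-off VLM query via `/api/generate` (a documented Ollama endpoint that accepts base64 images for vision models; the helper names are illustrative, not part of the plugin):

```python
import base64
import json
from urllib import request

def build_vlm_payload(image_bytes: bytes, question: str,
                      model: str = "llava:7b") -> dict:
    """Request body for Ollama's /api/generate with an attached image."""
    return {
        "model": model,
        "prompt": question,
        "images": [base64.b64encode(image_bytes).decode()],
        "stream": False,
    }

def query_vlm(ollama_url: str, payload: dict) -> str:
    req = request.Request(
        f"{ollama_url}/api/generate",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example (requires a running Ollama server with a vision model pulled):
# payload = build_vlm_payload(open("frame.jpg", "rb").read(),
#                             "What are the natural predators of what you see, in three words?")
# print(query_vlm("http://localhost:11434", payload))
```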
Sends text to an Ollama LLM and injects the response as a diffusion prompt.
Role: Preprocessor
Use case: Transform a simple input phrase into an elaborate scene description, style directive, or creative prompt. Works well chained before any image generation model.
Key settings:
- `system_prompt` – LLM personality / rewriting instruction
- `input_prompt` – the text fed to the LLM each interval
- `send_interval` – query frequency
- `inject_prompt` – send the LLM response downstream as a diffusion prompt
- `udp_enabled` / `udp_port` – optionally broadcast the LLM response to other plugins
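The text-only case uses the same Ollama endpoint: `/api/generate` accepts a `system` field, which plays the role of `system_prompt` above. A stdlib-only sketch (helper names are mine, not the plugin's):

```python
import json
from urllib import request

def build_rewrite_payload(observation: str, model: str = "llama3.2:3b") -> dict:
    """Request body asking the LLM to expand a short observation."""
    return {
        "model": model,
        "system": ("You expand short observations into rich diffusion prompts. "
                   "Answer with one vivid sentence only."),
        "prompt": observation,
        "stream": False,
    }

def rewrite(ollama_url: str, observation: str) -> str:
    req = request.Request(
        f"{ollama_url}/api/generate",
        data=json.dumps(build_rewrite_payload(observation)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example (requires a running Ollama server):
# rewrite("http://localhost:11434", "eagles, foxes, coyotes")
```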
Receives text via UDP and injects it as a diffusion prompt.
Role: Preprocessor
Use case: Bridge any external application into Scope's prompt chain. Send prompts from a Python script, a custom controller, or any other tool that can send UDP packets.
Key settings:
- `udp_port` – channel to listen on (load-time)
- `prompt_weight` – weight of the injected prompt
- `transition_steps` – frames to blend from current to new prompt (0 = instant)
- `overlay_enabled` – show received text on video (yellow, top-left) for monitoring
Sending from Python:
```python
import socket

MULTICAST_GROUP = "239.255.42.99"
PORT = 9400

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
sock.sendto("a moonlit forest, painterly".encode(), (MULTICAST_GROUP, PORT))
```

Receives OSC /prompt messages and injects the text as a diffusion prompt.
Role: Preprocessor
Use case: Integrate Scope with TouchDesigner, Ableton Live, Max/MSP, or any other tool that sends OSC. Send a string to /prompt on the configured port and it becomes the active diffusion prompt.
Key settings:
- `osc_port` – UDP port to listen for OSC messages (load-time, default 9000)
- `prompt_weight` – weight of the injected prompt
- `transition_steps` – frames to blend from current to new prompt (0 = instant)
- `overlay_enabled` – show received text on video (yellow, top-left) for monitoring
Sending from TouchDesigner / Python:
```python
from pythonosc.udp_client import SimpleUDPClient

client = SimpleUDPClient("127.0.0.1", 9000)
client.send_message("/prompt", "a misty forest at dawn, painterly")
```

Debug postprocessor that overlays all pipeline kwargs on the video and prints them to stdout. Shows video shape, prompts, UDP messages, and any extra kwargs flowing through the chain.
Role: Postprocessor
Use case: Drop this at the end of any chain to inspect exactly what's flowing between stages.
Dependencies must be installed before the plugins that use them. Via the Scope UI (installs into the correct venv):
1. `scope-bus`
2. `scope-language`
3. `scope-vlm-ollama`, `scope-llm-ollama`, `scope-udp-prompt`, `scope-osc-prompt`
4. `scope-test-text-log`
After installing scope-bus and scope-language, they appear in the Scope UI pipeline list as passthrough pipelines, confirming installation and allowing uninstall via the UI.
Tested configuration for running Daydream Scope on RunPod:
| Setting | Value |
|---|---|
| GPU | RTX PRO 6000 (1Γ) |
| vCPU / Memory | 16 vCPU / 188 GB |
| Container disk | 20 GB |
| Network volume | 80 GB (daydream_scope, mounted at /workspace) |
| On-demand price | ~$1.69/hr compute + $0.003/hr storage |
| Base template | daydream-scope (aca8mw9ivw) |
The network volume at /workspace persists across pod restarts; use it for model weights and checkpoints.
Ollama VLM/LLM queries are slow (1β5s each) and run in background threads, so Ollama doesn't need to share the GPU with diffusion inference. Running Ollama on a separate, cheaper pod frees the main GPU for full-speed diffusion:
[Scope pod: RTX PRO 6000] [Ollama pod: CPU-only or cheapest GPU]
StreamDiffusion / other models ollama serve
scope-vlm-ollama (pre/post) ──→ http://<ollama-pod-ip>:11434
scope-llm-ollama
In each VLM/LLM plugin, set ollama_url (load-time) to the Ollama pod's public IP:
http://213.x.x.x:11434
RunPod exposes port 11434 via the pod's public IP when you add it under Expose TCP Ports in the pod settings.
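Before loading the plugins, it can help to confirm the pod is actually reachable. Ollama's `/api/tags` endpoint lists the pulled models, so a successful response doubles as a check that the model download finished. A stdlib-only sketch (the helper name is illustrative):

```python
import json
from urllib import request

def ollama_reachable(url: str) -> bool:
    """True if an Ollama server answers on /api/tags (its model-listing endpoint)."""
    try:
        with request.urlopen(f"{url}/api/tags", timeout=5) as resp:
            return "models" in json.loads(resp.read())
    except (OSError, ValueError):  # connection refused, timeout, or non-JSON reply
        return False

# Example against the Ollama pod's exposed port:
# ollama_reachable("http://<ollama-pod-ip>:11434")
```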
For the Ollama-only pod, use the cheapest CPU pod (or any GPU pod). Paste this as the Container Start Command when creating the pod or template:
```bash
curl -fsSL https://raw.githubusercontent.com/olwal/scope-ai-language/main/scripts/setup-ollama-pod.sh | sh
```

This installs Ollama, pulls `qwen3-vl:2b`, and starts the server bound to 0.0.0.0:11434. `OLLAMA_HOST=0.0.0.0` is required so Ollama is reachable via RunPod's TCP port forwarding; without it, Ollama only listens on 127.0.0.1.
To use a different model, set the OLLAMA_MODEL environment variable on the pod before running the script, or add it inline:
```bash
OLLAMA_MODEL=llava:7b curl -fsSL https://raw.githubusercontent.com/olwal/scope-ai-language/main/scripts/setup-ollama-pod.sh | sh
```

The model is downloaded on first boot; subsequent restarts skip the pull if the model is cached on a network volume.
Recommended model: `qwen3-vl:2b` (fast, small, capable vision model). For higher quality at the cost of speed: `llava:7b` or `llava:13b`.
To save this as a reusable template in the RunPod console:
- Go to Manage → Templates → New Template
- Set Container Image to any base image with CUDA or a plain Ubuntu image (e.g. `runpod/base:0.4.0-cuda11.8.0`)
- Under Container Start Command, paste the Ollama install script above
- Under Expose TCP Ports, add `11434` (Ollama API)
- Set Container Disk to 5–10 GB (Ollama binary + small model overhead if no volume)
- Optionally attach a Network Volume at `/root/.ollama` to cache pulled models across restarts
- Save as a private template; it will appear in your pod creation flow
For the network volume approach, change the pull line to check first:
```bash
ollama pull qwen3-vl:2b 2>/dev/null || true
```

So re-pulling an already-cached model is a no-op.
- `scope-bus` – shared transport + rendering library
- `scope-language` – Ollama VLM/LLM clients (depends on scope-bus)
- `scope-vlm-ollama` – vision language model pipeline (depends on scope-language)
- `scope-llm-ollama` – text language model pipeline (depends on scope-language)
- `scope-udp-prompt` – receives UDP text and injects it as a prompt (depends on scope-bus)
- `scope-osc-prompt` – receives OSC /prompt and injects it as a prompt (depends on scope-bus)
- `scope-test-text-log` – debug overlay postprocessor
Scope supports three pipeline roles, declared in each plugin's `schema.py`:

| Role | `usage =` | Runs | Typical use |
|---|---|---|---|
| Main | (omit) | In the AI model slot | Full processing pipelines |
| Preprocessor | `[UsageType.PREPROCESSOR]` | Before the AI model | Prompt injection, signal routing |
| Postprocessor | `[UsageType.POSTPROCESSOR]` | After the AI model | Overlays, logging, routing |
Plugins communicate at runtime using UDP multicast on 239.255.42.99. The port number acts as a channel: sender and receiver must use the same port. Multiple receivers on the same port all receive every message (fan-out).
```
[VLM Pre] ──UDP:9400──→ [VLM Post]   (overlay on AI output)
              ├──→ [UDP Prompt]      (forward VLM text as prompt)
              └──→ [Text Log]        (debug display)
```
Transport, rendering, and frame utilities. All other plugins depend on this.
```python
from scope_bus import (
    UDPSender,                  # send text/dict via UDP multicast
    UDPReceiver,                # receive text/dict via UDP multicast
    render_text_overlay,        # draw text onto (T, H, W, C) tensors
    apply_overlay_from_kwargs,  # render_text_overlay reading from pipeline kwargs dict
    normalize_input,            # list[Tensor] → (T, H, W, C) float32 [0, 1]
    tensor_to_pil,              # (H, W, C) tensor → PIL Image
    PromptInjector,             # dedup-inject prompts into the output dict
    OverlayMixin,               # Pydantic mixin: overlay appearance fields for schemas
    FontFamily,                 # Enum: arial | courier | times | helvetica
    TextPosition,               # Enum: top-left | top-center | bottom-left | bottom-center
)
```

`UDPSender` – multicast sender with debounced port changes. Accepts strings or dicts (serialised as JSON):
```python
sender = UDPSender(port=9400)
sender.send("a sunset over mountains")             # plain text
sender.send({"prompt": "...", "response": "..."})  # JSON dict
sender.update_port(9401)  # debounced 3 s: call every frame, applies after stable
```

`UDPReceiver` – multicast receiver, non-blocking poll. Auto-parses JSON:
```python
receiver = UDPReceiver(port=9400)
msg = receiver.poll()  # str, dict (if JSON), or None
```

`render_text_overlay` – composites text onto video frames:
```python
frames = render_text_overlay(
    frames,
    text="VLM response here",
    font_family="arial",         # arial | courier | times | helvetica
    font_size=24,
    font_color=(1.0, 1.0, 1.0),  # RGB [0, 1]
    opacity=1.0,
    position="bottom-left",      # top-left | top-center | bottom-left | bottom-center
    word_wrap=True,
    bg_opacity=0.5,
)
```

`PromptInjector` – injects prompts only when the text changes. Supports instant or smooth transitions:
```python
injector = PromptInjector()

# Instant change (default)
injector.inject_if_new(output, text="a cat on a couch", weight=100.0)
# output["prompts"] is set only when text differs from the last call

# Smooth temporal blend (uses Scope's transition API)
injector.inject_if_new(output, text="a stormy sea", weight=100.0,
                       transition_steps=10, interpolation_method="slerp")
# output["transition"] is set with target_prompts + num_steps
```

`normalize_input` – converts Scope's raw video list to a usable tensor:
```python
frames = normalize_input(video, device)
# video: list of (1, H, W, C) uint8 tensors from Scope
# returns: (T, H, W, C) float32 on device, values in [0, 1]
```

Async Ollama clients for vision and text models.
```python
from scope_language import OllamaVLM, OllamaLLM
```

`OllamaVLM` – sends video frames to a vision model in a background thread:
```python
vlm = OllamaVLM(url="http://localhost:11434", model="llava:7b")

# In __call__ (runs every frame):
if vlm.should_send(interval=3.0):  # time-throttled
    vlm.query_async(
        frames[0],                                # single (H, W, C) tensor
        prompt="Describe what you see",
        callback=lambda text: sender.send(text),  # optional
    )
description = vlm.get_last_response()  # returns the last completed response
```

`OllamaLLM` – text-to-text, same async pattern:
```python
llm = OllamaLLM(url="http://localhost:11434", model="llama3.2:3b")

if llm.should_send(interval=5.0):
    llm.query_async(
        prompt="a foggy forest",
        system="Rewrite as a cinematic scene description in one sentence.",
    )
response = llm.get_last_response()
```

| Concept | Documentation |
|---|---|
| Pipeline interface (`Pipeline`, `Requirements`) | scope/src/scope/core/pipelines/interface.py |
| Schema base class (`BasePipelineConfig`, `ui_field_config`) | scope/src/scope/core/pipelines/base_schema.py |
| Plugin registration (`hookimpl`, `register_pipelines`) | scope/src/scope/core/plugins/hookspecs.py |
| Preprocessor → main pipeline parameter forwarding | scope/src/scope/server/pipeline_processor.py |
| Prompt format (`{"text": str, "weight": float}`) | Consumed by the main diffusion pipeline |