Generative AI plugins for language-driven, real-time video inference and generation. Ollama VLM/LLM pipelines and UDP prompt routing, built on shared libraries for communication and AI services (scope-bus, scope-language).

olwal/scope-ai-language

-+= Update 26 Feb 2026 | AI Language wins top prize in Daydream Scope Plugin contest! =+-


AI Language Plugins for Daydream Scope

Real-time AI plugins that close the loop between seeing and generating. The system watches a video stream, reasons about what it sees in real time, and continuously steers the AI image generation based on that understanding.

A vision language model (VLM) produces semantic descriptions: the mood of a crowd, the species of an animal, the weather in a landscape, the emotional tone of a scene. Those descriptions can optionally feed into a second preprocessor with a large language model (LLM) that rewrites them as rich diffusion prompts, shaping what the AI generates, frame by frame, in real time.

(Demo video: 4.mp4)

Advanced semantic reasoning about content

Example: Point the camera at a cat. Ask the VLM "what are the natural predators of what you see in three words?". It answers "eagles, foxes, coyotes". That response becomes the live diffusion prompt. The AI no longer renders a cat; it renders whatever is hunting it, morphing dynamically as the VLM's answers evolve with each new inference.

The generation doesn't follow a fixed script. It follows the scene. Prompt state changes smoothly via temporal interpolation rather than cutting abruptly between semantic states. Multiple plugins can run in parallel, chained, or driven from external tools (OSC, UDP) for live performance and installation contexts.


Live completion to steer streaming video generation

Example: By drawing live into the feed (e.g., via local Spout streaming), the VLM can drive a visual auto-complete. While the initial strokes are ambiguous, the VLM's inferences converge as more detail is added, providing increasingly accurate interpretations. These feed directly into the live video generation, serving both as a live autocomplete and as a way to create animated drawings.

(Demo video: streamdiffusion2_compressed.mp4)

Try it out


How it works

The plugins slot into Daydream Scope's preprocessor / postprocessor pipeline architecture. A typical split chain:

Camera → [VLM Pre] ──────────────────────► [AI Model] → [VLM Post] → Output
               │ UDP multicast 239.255.42.99       ▲
               └──► [UDP Prompt] ─── prompts ──────┘
  • scope-vlm-ollama queries an Ollama vision model on each frame at a configurable interval. Runs as a preprocessor (queries the raw feed, injects the VLM response as a diffusion prompt and broadcasts it via UDP), a postprocessor (receives the UDP text and overlays it on the AI output), or as a combined main pipeline.
  • scope-llm-ollama sends text to an Ollama LLM and injects the rewritten response as a diffusion prompt. Use it to transform a short observation into an elaborate scene description, style directive, or creative prompt.
  • scope-udp-prompt / scope-osc-prompt receive prompts from any external source via UDP multicast or OSC and inject them into the pipeline — bridging Python scripts, TouchDesigner, Ableton Live, Max/MSP, or any custom controller.

Semantic responses are broadcast over UDP multicast so any number of downstream plugins receive them simultaneously — fan-out with no additional routing. The port number acts as a channel: any plugin listening on the same port gets every message.

Prompt transitions use temporal interpolation (slerp or linear) to blend smoothly between semantic states over a configurable number of frames, rather than snapping abruptly when the VLM's description changes.
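
To make the interpolation concrete, here is a minimal sketch of linear vs. spherical blending between two prompt embeddings. This is a NumPy stand-in for illustration only; the actual plugins hand the target prompt to Scope's transition API rather than blending vectors themselves.

```python
import numpy as np

def lerp(a, b, t):
    """Linear interpolation between two embedding vectors."""
    return (1.0 - t) * a + t * b

def slerp(a, b, t, eps=1e-8):
    """Spherical interpolation: follows the arc between a and b,
    preserving magnitude better than lerp for normalized embeddings."""
    a_n = a / (np.linalg.norm(a) + eps)
    b_n = b / (np.linalg.norm(b) + eps)
    dot = np.clip(np.dot(a_n, b_n), -1.0, 1.0)
    theta = np.arccos(dot)
    if theta < eps:  # nearly parallel vectors: slerp degenerates to lerp
        return lerp(a, b, t)
    return (np.sin((1 - t) * theta) * a + np.sin(t * theta) * b) / np.sin(theta)

# Blend over 10 frames, as with transition_steps=10
old, new = np.array([1.0, 0.0]), np.array([0.0, 1.0])
steps = [slerp(old, new, i / 9) for i in range(10)]
```

For normalized embeddings, slerp keeps each intermediate step on the unit sphere, which is why it tends to produce steadier in-between frames than a straight linear blend.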

Built on Ollama for local or remote VLM/LLM inference. Shared libraries handle all transport, frame conversion, text rendering, and prompt injection (scope-bus, scope-language), so each plugin stays focused on its single role in the chain.

(More demo videos: 3.mp4, 2.mp4, 1.mp4, 5.mp4, moondream.mp4)

Plugins

scope-vlm-ollama

Queries an Ollama vision model on live video. Available as three variants:

Pipeline           Role           Description
VLM Ollama         Main           Query VLM + overlay response + inject prompt
VLM Ollama (Pre)   Preprocessor   Query VLM + inject prompt + broadcast via UDP
VLM Ollama (Post)  Postprocessor  Receive UDP text + overlay on AI output

Typical chain: [VLM Pre] → [AI Model] → [VLM Post]

The Pre queries the raw camera feed; the Post overlays the description on the AI-processed output.

Key settings:

  • ollama_url / ollama_model — load-time connection config
  • vlm_prompt — question sent to the VLM with each frame
  • send_interval — seconds between VLM queries (VLM is slow; 3–10s typical)
  • inject_prompt / prompt_weight — whether to use the VLM response as a diffusion prompt
  • transition_steps — frames to blend from current to new prompt (0 = instant)
  • udp_port — channel for Pre→Post communication (Pre/Post only)

scope-llm-ollama

Sends text to an Ollama LLM and injects the response as a diffusion prompt.

Role: Preprocessor

Use case: Transform a simple input phrase into an elaborate scene description, style directive, or creative prompt. Works well chained before any image generation model.

Key settings:

  • system_prompt — LLM personality / rewriting instruction
  • input_prompt — the text fed to the LLM each interval
  • send_interval — query frequency
  • inject_prompt — send the LLM response downstream as a diffusion prompt
  • udp_enabled / udp_port — optionally broadcast the LLM response to other plugins
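
For reference, these settings map naturally onto a request to Ollama's /api/generate endpoint. Below is a minimal sketch of the equivalent raw call using only the standard library; it assumes an Ollama server reachable at ollama_url, and the plugin itself goes through scope-language rather than issuing requests like this directly.

```python
import json
import urllib.request

def build_generate_request(ollama_url, model, system_prompt, input_prompt):
    """Build a non-streaming request for Ollama's /api/generate endpoint."""
    payload = {
        "model": model,
        "system": system_prompt,   # maps to the plugin's system_prompt setting
        "prompt": input_prompt,    # maps to input_prompt
        "stream": False,           # single JSON response instead of chunks
    }
    return urllib.request.Request(
        f"{ollama_url}/api/generate",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

req = build_generate_request(
    "http://localhost:11434", "llama3.2:3b",
    "Rewrite as a cinematic scene description in one sentence.",
    "a foggy forest",
)
# resp = json.load(urllib.request.urlopen(req))["response"]  # needs a running server
```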

scope-udp-prompt

Receives text via UDP and injects it as a diffusion prompt.

Role: Preprocessor

Use case: Bridge any external application into Scope's prompt chain. Send prompts from a Python script, a custom controller, or any other tool that can send UDP packets.

Key settings:

  • udp_port — channel to listen on (load-time)
  • prompt_weight — weight of the injected prompt
  • transition_steps — frames to blend from current to new prompt (0 = instant)
  • overlay_enabled — show received text on video (yellow, top-left) for monitoring

Sending from Python:

import socket

MULTICAST_GROUP = "239.255.42.99"
PORT = 9400

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)  # TTL 1: keep packets on the local network
sock.sendto("a moonlit forest, painterly".encode(), (MULTICAST_GROUP, PORT))

scope-osc-prompt

Receives OSC /prompt messages and injects the text as a diffusion prompt.

Role: Preprocessor

Use case: Integrate Scope with TouchDesigner, Ableton Live, Max/MSP, or any other tool that sends OSC. Send a string to /prompt on the configured port and it becomes the active diffusion prompt.

Key settings:

  • osc_port — UDP port to listen on for OSC messages (load-time, default 9000)
  • prompt_weight — weight of the injected prompt
  • transition_steps — frames to blend from current to new prompt (0 = instant)
  • overlay_enabled — show received text on video (yellow, top-left) for monitoring

Sending from TouchDesigner / Python:

from pythonosc.udp_client import SimpleUDPClient

client = SimpleUDPClient("127.0.0.1", 9000)
client.send_message("/prompt", "a misty forest at dawn, painterly")

scope-test-text-log

Debug postprocessor that overlays all pipeline kwargs on the video and prints them to stdout. Shows video shape, prompts, UDP messages, and any extra kwargs flowing through the chain.

Role: Postprocessor

Use case: Drop this at the end of any chain to inspect exactly what's flowing between stages.


Installation

Dependencies must be installed before the plugins that use them. Via the Scope UI (installs into the correct venv):

  1. scope-bus
  2. scope-language
  3. scope-vlm-ollama, scope-llm-ollama, scope-udp-prompt, scope-osc-prompt
  4. scope-test-text-log

After installing scope-bus and scope-language, they appear in the Scope UI pipeline list as passthrough pipelines — confirming installation and allowing uninstall via the UI.


RunPod Deployment

Daydream Scope Pod

Tested configuration for running Daydream Scope on RunPod:

Setting           Value
GPU               RTX PRO 6000 (1×)
vCPU / Memory     16 vCPU / 188 GB
Container disk    20 GB
Network volume    80 GB (daydream_scope, mounted at /workspace)
On-demand price   ~$1.69/hr compute + $0.003/hr storage
Base template     daydream-scope (aca8mw9ivw)

The network volume at /workspace persists across pod restarts — use it for model weights and checkpoints.

Split-Instance Architecture (Recommended)

Ollama VLM/LLM queries are slow (1–5s each) and run in background threads, so Ollama doesn't need to share the GPU with diffusion inference. Running Ollama on a separate, cheaper pod frees the main GPU for full-speed diffusion:

[Scope pod: RTX PRO 6000]          [Ollama pod: CPU-only or cheapest GPU]
  StreamDiffusion / other models       ollama serve
  scope-vlm-ollama (pre/post)   ──►   http://<ollama-pod-ip>:11434
  scope-llm-ollama

In each VLM/LLM plugin, set ollama_url (load-time) to the Ollama pod's public IP:

http://213.x.x.x:11434

RunPod exposes port 11434 via the pod's public IP when you add it under Expose TCP Ports in the pod settings.

Ollama Pod Setup

For the Ollama-only pod, use the cheapest CPU pod (or any GPU pod). Paste this as the Container Start Command when creating the pod or template:

curl -fsSL https://raw.githubusercontent.com/olwal/scope-ai-language/main/scripts/setup-ollama-pod.sh | sh

This installs Ollama, pulls qwen3-vl:2b, and starts the server bound to 0.0.0.0:11434. OLLAMA_HOST=0.0.0.0 is required so Ollama is reachable via RunPod's TCP port forwarding — without it, Ollama only listens on 127.0.0.1.

To use a different model, set the OLLAMA_MODEL environment variable on the pod before running the script, or add it inline:

OLLAMA_MODEL=llava:7b curl -fsSL https://raw.githubusercontent.com/olwal/scope-ai-language/main/scripts/setup-ollama-pod.sh | sh

The model is downloaded on first boot — subsequent restarts skip the pull if the model is cached on a network volume.

Recommended model: qwen3-vl:2b — fast, small, capable vision model. For higher quality at the cost of speed: llava:7b or llava:13b.

Creating a RunPod Template

To save this as a reusable template in the RunPod console:

  1. Go to Manage β†’ Templates β†’ New Template
  2. Set Container Image to any base image with CUDA or a plain Ubuntu image (e.g. runpod/base:0.4.0-cuda11.8.0)
  3. Under Container Start Command, paste the Ollama install script above
  4. Under Expose TCP Ports, add 11434 (Ollama API)
  5. Set Container Disk to 5–10 GB (Ollama binary + small model overhead if no volume)
  6. Optionally attach a Network Volume at /root/.ollama to cache pulled models across restarts
  7. Save as a private template — it will appear in your pod creation flow

For the network volume approach, make the pull line failure-tolerant:

ollama pull qwen3-vl:2b 2>/dev/null || true

This way a pull failure (for example, when offline with the model already cached on the volume) doesn't abort pod startup.


Architecture

scope-bus          ← shared transport + rendering library
scope-language     ← Ollama VLM/LLM clients (depends on scope-bus)

scope-vlm-ollama   ← vision language model pipeline (depends on scope-language)
scope-llm-ollama   ← text language model pipeline (depends on scope-language)
scope-udp-prompt   ← receive UDP text → inject as prompt (depends on scope-bus)
scope-osc-prompt   ← receive OSC /prompt → inject as prompt (depends on scope-bus)

scope-test-text-log  ← debug: overlay postprocessor

Pipeline Types

Scope supports three pipeline roles, declared in each plugin's schema.py:

Role           usage =                    Runs                  Typical use
Main           (omit)                     In the AI model slot  Full processing pipelines
Preprocessor   [UsageType.PREPROCESSOR]   Before the AI model   Prompt injection, signal routing
Postprocessor  [UsageType.POSTPROCESSOR]  After the AI model    Overlays, logging, routing
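
Illustratively, a plugin's schema.py declares its slot through this usage list. The following is a shape-only mock: the real UsageType enum and config base class ship with the scope package (see the Key Daydream Scope Concepts table below), so the class names here are placeholders.

```python
from enum import Enum

# Stand-in for Scope's UsageType enum (the real one lives in the scope package).
class UsageType(Enum):
    PREPROCESSOR = "preprocessor"
    POSTPROCESSOR = "postprocessor"

class MainPipelineConfig:
    """Main pipelines omit `usage` and run in the AI model slot."""
    usage: list = []

class OverlayConfig:
    """A postprocessor declares its slot via the usage list."""
    usage = [UsageType.POSTPROCESSOR]
```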

The UDP Bus

Plugins communicate at runtime using UDP multicast on 239.255.42.99. The port number acts as a channel — sender and receiver must use the same port. Multiple receivers on the same port all receive every message (fan-out).

[VLM Pre]──UDP:9400──►[VLM Post]   (overlay on AI output)
                  └──►[UDP Prompt] (forward VLM text as prompt)
                  └──►[Text Log]   (debug display)
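
The receiving side of the bus can be sketched with nothing but the standard library: bind the channel port, join the multicast group, and poll without blocking. This is illustrative only; the plugins use scope-bus's UDPReceiver instead.

```python
import socket
import struct

MULTICAST_GROUP = "239.255.42.99"
PORT = 9400

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(("", PORT))  # the port is the channel; all listeners on it share messages

# Join the multicast group so the OS delivers packets addressed to it
mreq = struct.pack("4sl", socket.inet_aton(MULTICAST_GROUP), socket.INADDR_ANY)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

sock.settimeout(0.0)  # non-blocking, like the plugins' per-frame poll
try:
    data, addr = sock.recvfrom(65535)
    print(addr, data.decode())
except BlockingIOError:
    pass  # no message arrived this frame
```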

Shared Libraries

scope-bus

Transport, rendering, and frame utilities. All other plugins depend on this.

from scope_bus import (
    UDPSender,                 # send text/dict via UDP multicast
    UDPReceiver,               # receive text/dict via UDP multicast
    render_text_overlay,       # draw text onto (T, H, W, C) tensors
    apply_overlay_from_kwargs, # render_text_overlay reading from pipeline kwargs dict
    normalize_input,           # list[Tensor] → (T, H, W, C) float32 [0,1]
    tensor_to_pil,             # (H, W, C) tensor → PIL Image
    PromptInjector,            # dedup-inject prompts to output dict
    OverlayMixin,              # Pydantic mixin: overlay appearance fields for schemas
    FontFamily,                # Enum: arial | courier | times | helvetica
    TextPosition,              # Enum: top-left | top-center | bottom-left | bottom-center
)

UDPSender — multicast sender with debounced port changes. Accepts strings or dicts (serialised as JSON):

sender = UDPSender(port=9400)
sender.send("a sunset over mountains")          # plain text
sender.send({"prompt": "...", "response": "..."})  # JSON dict
sender.update_port(9401)  # debounced 3s — call every frame, applies after stable

UDPReceiver — multicast receiver, non-blocking poll. Auto-parses JSON:

receiver = UDPReceiver(port=9400)
msg = receiver.poll()  # str, dict (if JSON), or None

render_text_overlay — composites text onto video frames:

frames = render_text_overlay(
    frames,
    text="VLM response here",
    font_family="arial",        # arial | courier | times | helvetica
    font_size=24,
    font_color=(1.0, 1.0, 1.0), # RGB [0,1]
    opacity=1.0,
    position="bottom-left",     # top-left | top-center | bottom-left | bottom-center
    word_wrap=True,
    bg_opacity=0.5,
)

PromptInjector — injects prompts only when text changes. Supports instant or smooth transitions:

injector = PromptInjector()

# Instant change (default)
injector.inject_if_new(output, text="a cat on a couch", weight=100.0)
# output["prompts"] is set only when text differs from last call

# Smooth temporal blend (uses Scope's transition API)
injector.inject_if_new(output, text="a stormy sea", weight=100.0,
                       transition_steps=10, interpolation_method="slerp")
# output["transition"] is set with target_prompts + num_steps
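
The dedup behavior can be illustrated in a few lines of plain Python. This is a mock of the semantics described above, not the scope-bus implementation; the prompt dict shape follows the {"text": str, "weight": float} format consumed by the main pipeline.

```python
class DedupPromptInjector:
    """Mock: only touch the output dict when the prompt text changed."""
    def __init__(self):
        self._last_text = None

    def inject_if_new(self, output, text, weight=100.0, transition_steps=0):
        if text == self._last_text:
            return False  # unchanged text: leave the output dict alone
        self._last_text = text
        if transition_steps > 0:
            # smooth blend: hand the target to the host's transition machinery
            output["transition"] = {
                "target_prompts": [{"text": text, "weight": weight}],
                "num_steps": transition_steps,
            }
        else:
            output["prompts"] = [{"text": text, "weight": weight}]
        return True

inj = DedupPromptInjector()
out = {}
inj.inject_if_new(out, "a cat on a couch")  # sets out["prompts"]
inj.inject_if_new(out, "a cat on a couch")  # no-op: same text as last call
```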

normalize_input — converts Scope's raw video list to a usable tensor:

frames = normalize_input(video, device)
# video: list of (1, H, W, C) uint8 tensors from Scope
# returns: (T, H, W, C) float32 on device, values in [0, 1]
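
As a shape reference only, the same conversion in NumPy (the real helper operates on torch tensors and takes a device argument):

```python
import numpy as np

def normalize_input_np(video):
    """NumPy analogue of the described conversion:
    list of (1, H, W, C) uint8 frames -> (T, H, W, C) float32 in [0, 1]."""
    stacked = np.concatenate(video, axis=0)   # (T, H, W, C) uint8
    return stacked.astype(np.float32) / 255.0

frames = normalize_input_np([np.zeros((1, 4, 4, 3), np.uint8),
                             np.full((1, 4, 4, 3), 255, np.uint8)])
# frames.shape == (2, 4, 4, 3); values in [0.0, 1.0]
```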

scope-language

Async Ollama clients for vision and text models.

from scope_language import OllamaVLM, OllamaLLM

OllamaVLM — sends video frames to a vision model in a background thread:

vlm = OllamaVLM(url="http://localhost:11434", model="llava:7b")

# In __call__ (runs every frame):
if vlm.should_send(interval=3.0):       # time-throttled
    vlm.query_async(
        frames[0],                       # single (H, W, C) tensor
        prompt="Describe what you see",
        callback=lambda text: sender.send(text),  # optional
    )
description = vlm.get_last_response()   # returns last completed response

OllamaLLM — text-to-text, same async pattern:

llm = OllamaLLM(url="http://localhost:11434", model="llama3.2:3b")

if llm.should_send(interval=5.0):
    llm.query_async(
        prompt="a foggy forest",
        system="Rewrite as a cinematic scene description in one sentence.",
    )
response = llm.get_last_response()
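
Both clients follow the same throttle-and-poll pattern, which can be sketched with the standard library alone. This is an illustrative skeleton, not the scope-language code; `backend` stands in for the blocking Ollama call.

```python
import threading
import time

class AsyncThrottledClient:
    """Skeleton of the pattern above: time-throttled sends, work in a
    background thread, last completed response polled without blocking."""
    def __init__(self):
        self._last_sent = 0.0
        self._last_response = None
        self._lock = threading.Lock()

    def should_send(self, interval):
        now = time.monotonic()
        if now - self._last_sent >= interval:
            self._last_sent = now
            return True
        return False

    def query_async(self, prompt, backend, callback=None):
        def worker():
            text = backend(prompt)  # slow model call runs off the frame loop
            with self._lock:
                self._last_response = text
            if callback:
                callback(text)
        threading.Thread(target=worker, daemon=True).start()

    def get_last_response(self):
        with self._lock:
            return self._last_response

client = AsyncThrottledClient()
if client.should_send(interval=0.0):
    client.query_async("a foggy forest", backend=lambda p: p.upper())
time.sleep(0.1)  # give the worker time to finish in this demo
print(client.get_last_response())  # -> "A FOGGY FOREST"
```

The frame loop stays responsive because `query_async` returns immediately and `get_last_response` simply hands back whatever finished last.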

Key Daydream Scope Concepts

Concept                                                  Documentation
Pipeline interface (Pipeline, Requirements)              scope/src/scope/core/pipelines/interface.py
Schema base class (BasePipelineConfig, ui_field_config)  scope/src/scope/core/pipelines/base_schema.py
Plugin registration (hookimpl, register_pipelines)       scope/src/scope/core/plugins/hookspecs.py
Preprocessor → main pipeline parameter forwarding        scope/src/scope/server/pipeline_processor.py
Prompt format ({"text": str, "weight": float})           consumed by the main diffusion pipeline
