ComfyUI Storyboard Layout & BBox Engine

A visual layout engine for ComfyUI. Give it a shot and it decides where things go.

Status: released. v0.1.1 — six nodes implemented and tested (81 tests), verified end-to-end in a real ComfyUI install, and published on the ComfyUI Registry under surrealbydesign.

This pack takes a shot definition and turns it into composition layouts and normalized bounding boxes — all as editable, human-readable JSON — plus a simple preview to inspect placement before you render anything.

It answers "given a shot, where should things go?" — not "what shots should exist?"

This is not a shot/storyboard generator, not a prompt generator, and not an image generator. It is a visual layout engine. It contains no LLM, no cloud calls, no external APIs — everything is deterministic and runs offline. Shot planning, sequencing, and narrative breakdown are explicitly out of scope and belong in a separate shot-list-generation tool; this repo doesn't duplicate that responsibility, by design.


Where it sits in the pipeline

Your shot source  →  shot.json  →  THIS layout engine  →  layout.json
(any source)                                            →  bbox.json
                                                         →  downstream image tools

Shot records can come from a shot-list-generation tool, manual input, another ComfyUI node, or an external JSON file — the engine consumes them all equally and never generates its own.

Design principles

JSON first · human-readable · no black boxes · everything editable · composable nodes · no AI magic · deterministic (same input → same output) · layout, not story · source-agnostic input.

Coordinate system

All bounding boxes are normalized to 0.0 → 1.0. Origin (0, 0) is the top-left of the frame; x increases right, y increases down. A box is { "x", "y", "width", "height" } and a valid box satisfies x + width ≤ 1.0 and y + height ≤ 1.0. Normalization keeps layouts resolution- and aspect-independent — the same plan works at any output size.
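The validity rule above can be checked in a few lines. This is a minimal sketch, not the pack's own validator: the function name `is_valid_box` and the tolerance `EPS` are illustrative, and the extra checks (non-negative origin, positive size) are assumptions that go beyond the two inequalities stated here.

```python
from typing import Mapping

# Tolerance for floating-point round-off (an assumption, not taken from the pack).
EPS = 1e-9


def is_valid_box(box: Mapping[str, float]) -> bool:
    """Return True if a normalized box lies fully inside the unit frame."""
    try:
        x, y = float(box["x"]), float(box["y"])
        w, h = float(box["width"]), float(box["height"])
    except (KeyError, TypeError, ValueError):
        # Missing keys or non-numeric values cannot describe a box.
        return False
    return (
        x >= 0.0 and y >= 0.0          # assumed: origin inside the frame
        and w > 0.0 and h > 0.0        # assumed: boxes have positive size
        and x + w <= 1.0 + EPS         # stated rule: x + width <= 1.0
        and y + h <= 1.0 + EPS         # stated rule: y + height <= 1.0
    )


print(is_valid_box({"x": 0.28, "y": 0.30, "width": 0.28, "height": 0.52}))
print(is_valid_box({"x": 0.90, "y": 0.10, "width": 0.20, "height": 0.20}))
```

The first call describes a box that stays in frame; the second spills past the right edge because 0.90 + 0.20 exceeds 1.0.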

Nodes (v0.1.1)

Five required nodes, plus one optional classical computer-vision node:

| Node | Inputs | Output | Behavior notes |
| --- | --- | --- | --- |
| Shot Record Loader | shot_json (STRING, JSON), or shot_file (path), or shot_type + subjects (manual fields) | shot_json: normalized shot record | Validates and normalizes a shot definition. Trims/de-duplicates subjects, defaults aspect to 16:9, assigns a shot_id if missing. Processes one shot per execution. Never invents subjects or shots. |
| Composition Layout Generator | shot_json; optional composition (combo: auto or one of the 8 named compositions) | layout_json: composition + subject→slot assignments (no boxes yet) | Resolves which composition to use: explicit input → the shot's own composition field → a deterministic, subject-count-aware default (1 subject → centered, 2+ → rule_of_thirds). |
| BBox Generator | layout_json | bbox_json: {subject: {x, y, width, height}} | Resolves each subject's slot assignment into an actual normalized box. Subjects beyond a composition's slot count are placed on a deterministic overflow grid. |
| Storyboard Preview Renderer | bbox_json; optional shot_json (for the title label), width, height (default 768×432) | preview: an IMAGE tensor | Draws a labeled rectangle per subject on a plain frame. A visualization aid, not an image generator. |
| Export Tools | any combination of shot_json / layout_json / bbox_json; filename_prefix; optional title | saved_path: STRING | Bundles whatever is connected into a versioned project JSON file, written to ComfyUI's output directory (or ./output if run standalone). |
| BBox from Reference Image (optional) | image (IMAGE); optional max_subjects, threshold_factor, close_size | bbox_json | Classical computer vision: frequency-tuned saliency + Otsu adaptive thresholding + morphological closing + connected components. No neural network. Only loads if numpy is available (zero new runtime dependencies; see docs/COMPUTATION_STRATEGY.md). See Known limitations below before relying on it. |

Compositions: centered, rule_of_thirds, left_weighted, right_weighted, closeup, medium, wide, over_shoulder. Defined as data in storyboard_layout/data/compositions.json — add your own without touching code.

Example

Input shot (shot.json, conforms to docs/schemas/shot.schema.json):

{ "shot_type": "wide_establishing", "subjects": ["knight", "cathedral"] }
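To make the Shot Record Loader's documented behavior concrete (trim and de-duplicate subjects, default the aspect to 16:9, assign a shot_id if missing), here is a rough sketch of what normalizing such a record could look like. It is not the pack's code. The function name `normalize_shot`, the `aspect` key name, the dropping of blank subject names, and the hash-based id scheme are all assumptions made for illustration.

```python
import hashlib
import json


def normalize_shot(shot: dict) -> dict:
    """Hypothetical normalizer mirroring the Loader's documented behavior."""
    seen = set()
    subjects = []
    for raw in shot.get("subjects", []):
        name = str(raw).strip()
        # Assumption: blank names are dropped; repeats keep first-seen order.
        if name and name not in seen:
            seen.add(name)
            subjects.append(name)

    normalized = dict(shot)
    normalized["subjects"] = subjects
    normalized.setdefault("aspect", "16:9")  # key name "aspect" is assumed

    if not normalized.get("shot_id"):
        # Assumption: derive the id from the content, so the same input
        # always yields the same id (no randomness, no timestamps).
        payload = json.dumps(
            {"shot_type": normalized.get("shot_type"), "subjects": subjects},
            sort_keys=True,
        )
        digest = hashlib.sha1(payload.encode("utf-8")).hexdigest()[:8]
        normalized["shot_id"] = "shot_" + digest
    return normalized


record = normalize_shot(
    {"shot_type": "wide_establishing", "subjects": [" knight ", "cathedral", "knight"]}
)
print(record["subjects"], record["aspect"], record["shot_id"])
```

Deriving the id from content rather than from a counter or a clock is one way to keep the "same input → same output" principle intact; the real node may do this differently.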

Layout output (layout.json — a plan, no geometry yet; conforms to docs/schemas/layout.schema.json):

{
  "composition": "rule_of_thirds",
  "assignments": [
    { "subject": "knight", "slot": 0, "role": "primary" },
    { "subject": "cathedral", "slot": 1, "role": "secondary" }
  ]
}
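The example lands on rule_of_thirds because the shot has two subjects and names no composition of its own. The documented precedence (explicit input, then the shot's composition field, then the subject-count default) can be sketched as below. The function name `resolve_composition` is hypothetical, and two details are assumptions: an unknown composition name raises an error, and a shot with zero subjects falls back to centered.

```python
from typing import Optional

# The eight composition names listed in this README.
NAMED_COMPOSITIONS = {
    "centered", "rule_of_thirds", "left_weighted", "right_weighted",
    "closeup", "medium", "wide", "over_shoulder",
}


def _checked(name: str) -> str:
    # Assumption: unknown names are rejected rather than silently ignored.
    if name not in NAMED_COMPOSITIONS:
        raise ValueError(f"unknown composition: {name!r}")
    return name


def resolve_composition(shot: dict, explicit: Optional[str] = "auto") -> str:
    """Pick a composition: explicit input, then shot field, then default."""
    # 1. An explicit, non-"auto" node input wins.
    if explicit and explicit != "auto":
        return _checked(explicit)
    # 2. Otherwise use the shot's own composition field, if present.
    from_shot = shot.get("composition")
    if from_shot:
        return _checked(from_shot)
    # 3. Otherwise fall back to the subject-count-aware default.
    #    (Zero subjects -> "centered" is an assumption.)
    return "centered" if len(shot.get("subjects", [])) <= 1 else "rule_of_thirds"


shot = {"shot_type": "wide_establishing", "subjects": ["knight", "cathedral"]}
print(resolve_composition(shot))             # rule_of_thirds
print(resolve_composition(shot, "closeup"))  # closeup
```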

BBox output (bbox.json, conforms to docs/schemas/bbox.schema.json):

{
  "cathedral": { "x": 0.64, "y": 0.45, "width": 0.22, "height": 0.38 },
  "knight":    { "x": 0.28, "y": 0.30, "width": 0.28, "height": 0.52 }
}
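Because the boxes are normalized, a downstream tool only has to scale them by the target frame size. The sketch below converts the example output to pixel rectangles at the preview renderer's default 768×432. The helper name `to_pixel_rect` and the choice to round each edge to the nearest pixel are assumptions; other tools may floor or clamp instead.

```python
from typing import Dict, Tuple


def to_pixel_rect(box: Dict[str, float], width: int, height: int) -> Tuple[int, int, int, int]:
    """Scale a normalized box to (left, top, right, bottom) in pixels."""
    left = round(box["x"] * width)
    top = round(box["y"] * height)
    right = round((box["x"] + box["width"]) * width)
    bottom = round((box["y"] + box["height"]) * height)
    return left, top, right, bottom


bbox = {
    "cathedral": {"x": 0.64, "y": 0.45, "width": 0.22, "height": 0.38},
    "knight": {"x": 0.28, "y": 0.30, "width": 0.28, "height": 0.52},
}

for subject, box in sorted(bbox.items()):
    print(subject, to_pixel_rect(box, 768, 432))
# cathedral (492, 194, 660, 359)
# knight (215, 130, 430, 354)
```

Rounding the right and bottom edges from the summed coordinates, rather than rounding width and height separately, keeps adjacent boxes from drifting apart by a pixel.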

More fixtures, including a ready-to-load ComfyUI workflow, live in examples/.

Installation

cd ComfyUI/custom_nodes
git clone https://github.com/SurrealByDesign/ComfyUI-Storyboard-Layout
# restart ComfyUI

Or install via ComfyUI-Manager / the ComfyUI Registry listing.

The pack is dependency-light — Pillow / numpy / torch are already provided by ComfyUI; no additional runtime packages are required for the five core nodes. The optional reference-image node also needs only numpy + Pillow (no new dependency). Dev tooling for running the test suite is in requirements-dev.txt.

Quick start

Shot Record Loader ──► Composition Layout ──► BBox Generator ──► Export Tools
                                │                   │                ▲
                                └──► Storyboard Preview Renderer ─────┘

Ready-to-load workflows in both UI and API format are in examples/example_workflow.json and examples/example_workflow_api.json.

Known limitations (v0.1.1)

  • One shot per execution. Every node processes a single shot; there is no batch mode yet. Multi-shot batching is the planned first feature for the next minor version — see CONTRIBUTING.md for the design notes if you want to pick it up.
  • The reference-image node is a classical-CV baseline, not a trained detector. It works well on clean, high-contrast inputs (a few distinct shapes or regions) but produces coarser, noisier boxes on busy or low-contrast real photos. It also assumes the subject occupies a minority of the frame: a single subject filling most of the frame can be mis-segmented. This is an inherent property of mean/Otsu-based saliency thresholding, not a bug that either threshold method can fix on its own.
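For readers unfamiliar with the Otsu step named above, here is a small pure-Python sketch of Otsu's method on a flat list of 8-bit values. It is a textbook illustration only, not the pack's numpy implementation, and the function name `otsu_threshold` is hypothetical. It picks the threshold that maximizes the between-class variance of the two groups it creates.

```python
from typing import Sequence


def otsu_threshold(values: Sequence[int], levels: int = 256) -> int:
    """Return t maximizing between-class variance; values > t are foreground."""
    hist = [0] * levels
    for v in values:
        hist[v] += 1

    total = len(values)
    sum_all = sum(i * count for i, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0    # number of values at or below the candidate threshold
    sum_bg = 0  # sum of those values
    for t in range(levels):
        w_bg += hist[t]
        if w_bg == 0:
            continue          # no background yet, nothing to compare
        w_fg = total - w_bg
        if w_fg == 0:
            break             # everything is background from here on
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


# A dark frame (value 10) with a small bright region (value 200).
saliency = [10] * 90 + [200] * 10
t = otsu_threshold(saliency)
foreground = [v for v in saliency if v > t]
print(t, len(foreground))  # 10 10
```

With a clean two-level input like this, the split is unambiguous. Real saliency maps are rarely this tidy, which is where the coarser boxes described above come from.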

Running the tests

pip install -r requirements-dev.txt
pytest

81 tests: schema conformance, determinism, round-trips, in-frame invariants, overflow handling, classical-CV correctness, and two regression guards (a golden-file test against the committed example outputs, and a static scan ensuring core/ never imports networking libraries).

Project documents

| Doc | Purpose |
| --- | --- |
| ARCHITECTURE.md | Folder layout, node contracts, data model, engine design |
| TESTING.md | Testing strategy |
| docs/COMPUTATION_STRATEGY.md | Traditional-first build-vs-ML policy (T1–T4 tiers) |
| CONTRIBUTING.md | How to contribute, open work, versioning policy |
| CHANGELOG.md | What changed, release by release |
| docs/schemas/ | JSON Schema definitions |

License

MIT © 2026 SurrealByDesign.
