⚠️ EXPERIMENTAL CODE — DO NOT PULL ⚠️

CaptionForge is under active, unstable development.

This repository is currently a code-safekeeping and development checkpoint for the author. It is not ready for general installation, ComfyUI Registry use, production workflows, or third-party support.

APIs, node names, file layout, model-loading behavior, JSONL schemas, semantic profiles, and output formats may change without warning.

Please do not clone, install, package, fork for use, or submit issues expecting support yet.

CaptionForge

CaptionForge is a model-agnostic captioning framework for ComfyUI, designed to generate cleaner, more consistent, auditable captions for LoRA dataset preparation and image dataset analysis.

Rather than treating any single captioning model as the final authority, CaptionForge is built around the idea that caption quality can be improved through structured comparison, repeated observation, audit trails, claim extraction, statistical agreement, LLM-assisted reasoning, and consensus-based refinement.

The long-term goal is to approximate human-quality descriptive captions by using existing vision and language models inside a more deliberate captioning infrastructure.

Current Status

CaptionForge now includes its first working ComfyUI implementation.

The current codebase provides:

Qwen-family captioning through a full ComfyUI node
JoyCaption/LLaVA-family captioning through a full ComfyUI node
Lite daily-use caption nodes for Qwen and Joy
Shared model cache management
Pass A JSONL caption audit records
Pass B text-only claim extraction
LLM-ready Pass B scaffolding for future local text-model integration
Frontend icon branding for CaptionForge nodes

This is still early development, but the repository now contains working code rather than only a project placeholder.

Vision

Image captioning models are powerful, but individual captions can still contain:

hallucinated visual details
missing important features
inconsistent wording
repetition
awkward grammar
overconfident but unsupported claims
model-specific biases

CaptionForge aims to reduce these problems by treating captions as evidence, not answers.

Multiple engines can describe the same image. Their outputs can then be audited, decomposed into visual claims, compared, normalized, scored, and recombined into cleaner final captions.

Current Nodes

CaptionForge currently registers the following ComfyUI nodes:

JLC Qwen Caption

Full Qwen-family vision-language captioning node.

Supports direct ComfyUI IMAGE input, file/folder captioning, prompt presets, custom prompts, prompt files, TXT sidecars, JSONL audit output, run configuration logging, model download probing, and optional bitsandbytes 8-bit loading.

JLC Joy Caption

Full JoyCaption/LLaVA-family captioning node.

Supports direct ComfyUI IMAGE input, file/folder captioning, CaptionForge prompt presets, JoyCaption-native template controls, TXT sidecars, JSONL audit output, run configuration logging, model download probing, and memory-efficient 8-bit loading.

JLC Qwen Caption (Lite)

Minimal direct-image Qwen captioning node for daily interactive use.

This Lite node exposes only the core controls needed for quick captioning inside ComfyUI.

JLC Joy Caption (Lite)

Minimal direct-image JoyCaption node for daily interactive use.

This Lite node exposes only the core controls needed for quick captioning inside ComfyUI.

JLC CaptionForge Claim Extractor

Text-only Pass B node.

Consumes shared Pass A caption JSONL, groups records by image_key, extracts atomic visual claims, normalizes rough equivalents, preserves source references, flags simple conflicts, and writes one Pass B claim JSONL record per image.

Current Files

The first code seed includes:

__init__.py
captionforge_model_cache.py
captionforge_claim_engine.py
jlc_qwen_caption_engine.py
jlc_joy_caption_engine.py
jlc_qwen_caption_CUI_node.py
jlc_joy_caption_CUI_node.py
jlc_qwen_caption_lite_CUI_node.py
jlc_joy_caption_lite_CUI_node.py
jlc_captionforge_claim_extractor_CUI_node.py
web/jlc_captionforge_icons.js
web/assets/icons/jlc-comfyui-nodes_Logo-Dark-0128.png

Pipeline Direction

CaptionForge is being designed around a multi-pass strategy.

Pass A — Caption Generation

Generate one or more captions from one or more engines.

Current Pass A engines include:

Qwen-family VLM captioning
JoyCaption/LLaVA-family captioning

Pass A records are written as JSONL audit entries containing model metadata, prompt settings, generation parameters, raw captions, cleaned captions, and image keys.

Pass B — Claim Extraction

Convert caption text into atomic visual claims.

Current Pass B status:

deterministic heuristic backend
manual JSON backend for parser testing
LLM-ready prompt and response schema
parser warnings
rejected claim records
optional raw response preservation
future-LLM prompt bundle export

Live LLM extraction is not yet integrated.

Future Passes

Planned future stages may include:

normalizing equivalent claims
comparing claims across engines and repeated runs
detecting contradictions
scoring claims by support and usefulness
validating captions against claim evidence
text-only LLM cleanup/refinement
producing final LoRA-ready captions
preserving all intermediate evidence in auditable records

The final caption should be less like a single model guess and more like a consensus summary of supported visual evidence.

Model-Agnostic Direction

CaptionForge is intended to remain engine-democratic.

The framework should be able to incorporate different kinds of captioning, tagging, or visual reasoning systems, such as:

local VLM captioners
specialized aesthetic or tagging models
general-purpose multimodal models
LLM-based text cleanup passes
validator models
future visual reasoning systems
additional robustness engines as they become useful

No single model is assumed to be canonical.

Qwen and Joy are the first working engines, not the final boundary of the project.

Installation

Clone or copy this repository into your ComfyUI custom_nodes folder:

ComfyUI/custom_nodes/CaptionForge

Restart ComfyUI after installation.

Model files are not included in this repository. CaptionForge nodes use local model folders under ComfyUI’s models/LLM/ directory and may offer download-probe behavior through the node UI.

Model Locations

Current model roots:

ComfyUI/models/LLM/JLC_QwenCaption/

ComfyUI/models/LLM/JLC_JoyCaption/

Qwen and Joy models are Hugging Face model directories, not single checkpoint files.

Large model weights are intentionally not stored in this repository.

Notes

CaptionForge is designed for local workflows and practical dataset preparation.

The current implementation favors:

reproducible settings
auditable intermediate records
explicit JSONL outputs
separation between ComfyUI node wrappers and reusable engines
local model execution
careful VRAM/cache management
incremental development toward consensus-based caption refinement

Development Status

This repository is in early active development.

APIs, node names, file layout, prompt presets, model registries, and output schemas may evolve before a stable public release.

Attribution & License

Concept and implementation by J. L. Córdova, with development assistance from ChatGPT (OpenAI).

Released under the MIT License. See LICENSE for details.

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
engines		engines
nodes		nodes
semantic_profiles		semantic_profiles
web		web
.gitignore		.gitignore
DirectoryStructure.md		DirectoryStructure.md
LICENSE		LICENSE
README.md		README.md
__init__.py		__init__.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

⚠️ EXPERIMENTAL CODE — DO NOT PULL ⚠️

CaptionForge

Current Status

Vision

Current Nodes

JLC Qwen Caption

JLC Joy Caption

JLC Qwen Caption (Lite)

JLC Joy Caption (Lite)

JLC CaptionForge Claim Extractor

Current Files

Pipeline Direction

Pass A — Caption Generation

Pass B — Claim Extraction

Future Passes

Model-Agnostic Direction

Installation

Model Locations

Notes

Development Status

Attribution & License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

⚠️ EXPERIMENTAL CODE — DO NOT PULL ⚠️

CaptionForge

Current Status

Vision

Current Nodes

JLC Qwen Caption

JLC Joy Caption

JLC Qwen Caption (Lite)

JLC Joy Caption (Lite)

JLC CaptionForge Claim Extractor

Current Files

Pipeline Direction

Pass A — Caption Generation

Pass B — Claim Extraction

Future Passes

Model-Agnostic Direction

Installation

Model Locations

Notes

Development Status

Attribution & License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages