"Optimize the flow of your inference"
A production-ready optimization system for ComfyUI that implements non-linear, non-binary optimization strategies (graded, per-operation tuning rather than global on/off switches) for faster generation with lower VRAM usage.
| Node | Purpose | Key Feature |
|---|---|---|
| ConduitCore | Workflow Optimizer | DAG analysis, async scheduling |
| ConduitPool | VRAM Manager | 4-tier memory temperature gradient |
| ConduitGate | Precision Router | Dynamic FP32/FP16/FP8 routing |
| ConduitPath | Speculative Generator | Multi-branch generation with pruning |
| ConduitSeal | Deterministic Cache | Checksum-verified result caching |
| ConduitSense | Type Detector | Auto-detect workflow type |
| ConduitApply | Apply Optimizations | Combine configs and execute |
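For orientation, here is a minimal sketch of how a node like ConduitCore could plug into ComfyUI's standard custom-node contract (`INPUT_TYPES`, `RETURN_TYPES`, `FUNCTION`, `NODE_CLASS_MAPPINGS`); the `CONDUIT_CONFIG` type and the fields inside the config dict are illustrative assumptions, not the actual implementation.

```python
# Hypothetical sketch: a ConduitCore-style node exposed through ComfyUI's
# custom-node interface. Only the interface names are ComfyUI's; the
# CONDUIT_CONFIG type and config contents are assumptions.

class ConduitCore:
    MODES = ["balanced", "speed", "quality", "memory"]

    @classmethod
    def INPUT_TYPES(cls):
        # ComfyUI reads this to build the node's input widgets.
        return {"required": {"mode": (cls.MODES, {"default": "balanced"})}}

    RETURN_TYPES = ("CONDUIT_CONFIG",)  # custom type consumed by ConduitApply
    FUNCTION = "configure"
    CATEGORY = "conduit"

    def configure(self, mode):
        # Emit a plain config dict; downstream nodes merge these.
        return ({"mode": mode},)


NODE_CLASS_MAPPINGS = {"ConduitCore": ConduitCore}
```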
**Quick start:**

1. Add ConduitCore to set the optimization mode (`balanced`/`speed`/`quality`/`memory`)
2. Add ConduitGate to configure precision routing
3. Connect both to ConduitApply, placed before your model
4. Run the workflow
**Speed mode:**
- Aggressive FP8 precision on RTX 40 series
- Async model preloading
- Estimated 2-3x speedup
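A minimal sketch of the async-preloading idea: while the current model is sampling, a background thread stages the next checkpoint so the later swap only pays the remaining transfer cost. `load_checkpoint` is a stand-in for whatever loader the real node wraps.

```python
# Illustrative async model preloading: one worker thread loads the next
# checkpoint in the background while inference continues on the GPU.
from concurrent.futures import ThreadPoolExecutor

def load_checkpoint(path):
    # Placeholder: real code would read weights from disk here.
    return {"path": path, "weights": b"..."}

class AsyncPreloader:
    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._pending = {}

    def prefetch(self, path):
        # Kick off the load without blocking the current inference.
        if path not in self._pending:
            self._pending[path] = self._pool.submit(load_checkpoint, path)

    def get(self, path):
        # Block only if the prefetch has not finished (or was never started).
        future = self._pending.pop(path, None)
        if future is None:
            return load_checkpoint(path)
        return future.result()
```

Typical use: call `prefetch("next.safetensors")` as sampling starts, then `get(...)` at model-swap time.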
**Quality mode:**
- Full FP32 precision for attention
- No early exit from sampling
- Best visual quality
**Memory mode:**
- Aggressive VRAM offloading
- Streaming decode
- Tile processing for large images
**Balanced mode:**
- Adaptive precision based on operation type
- Smart preloading without VRAM pressure
- Good balance of speed and quality
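"Adaptive precision based on operation type" can be pictured as a routing table. The categories and dtype choices below are assumptions about the policy: numerically sensitive ops stay in FP32, bulk matmuls drop to FP16 (or FP8 when the GPU supports it), and everything else defaults to FP16.

```python
# Sketch of per-operation precision routing. The op categories and the
# policy are illustrative assumptions, not ConduitGate's actual rules.

SENSITIVE_OPS = {"softmax", "layernorm", "attention_logits"}
BULK_OPS = {"linear", "conv", "attention_matmul"}

def route_precision(op: str, fp8_ok: bool = False) -> str:
    if op in SENSITIVE_OPS:
        return "fp32"          # keep full precision where error accumulates
    if op in BULK_OPS:
        return "fp8" if fp8_ok else "fp16"
    return "fp16"              # safe default for remaining ops
```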
**Memory temperature tiers:**
- **HOT** (GPU VRAM): active inference
- **WARM** (pinned host memory): next 2-3 models, near-instant transfer
- **COLD** (system RAM): recently used, ~500 ms load
- **ARCHIVE** (disk): rarely used, predictive prefetch
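A toy model of the temperature gradient: each model promoted to HOT pushes older models one tier down. The tier names mirror the list above; the capacities and the oldest-first demotion policy are illustrative assumptions.

```python
# Toy four-tier temperature pool: touching a model promotes it to HOT,
# and overflow cascades down the gradient one tier at a time.
from collections import OrderedDict

TIERS = ["HOT", "WARM", "COLD", "ARCHIVE"]
CAPACITY = {"HOT": 1, "WARM": 2, "COLD": 4}   # ARCHIVE is unbounded

class TemperaturePool:
    def __init__(self):
        self.tiers = {t: OrderedDict() for t in TIERS}

    def touch(self, name):
        # Promote straight to HOT; evictions cascade down the gradient.
        for t in TIERS:
            self.tiers[t].pop(name, None)
        self.tiers["HOT"][name] = True
        for upper, lower in zip(TIERS, TIERS[1:]):
            while len(self.tiers[upper]) > CAPACITY.get(upper, float("inf")):
                demoted, _ = self.tiers[upper].popitem(last=False)  # oldest first
                self.tiers[lower][demoted] = True

    def tier_of(self, name):
        return next((t for t in TIERS if name in self.tiers[t]), None)
```

In the real node, demotion would also move the actual weights (GPU free, `pin_memory`, disk write); here only the bookkeeping is shown.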
Generate N branches, score at checkpoints, prune losers:
```
4 branches @  25% → Score → Keep 2
2 branches @  50% → Score → Keep 1
1 branch   @ 100% → Output
```
Result: best-of-four quality at ~50% of the compute of running all four branches to completion (about 2x the cost of a single run).
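The schedule above can be sketched directly, with a cost tally that shows where the savings come from: 4 × 0.25 + 2 × 0.25 + 1 × 0.50 = 2.0 full-run equivalents, versus 4.0 for running every branch to the end. `score` is a stand-in for whatever quality metric the real node uses.

```python
# Sketch of the checkpoint-and-prune schedule. Branches are abstract here;
# in the real node each would be a partial diffusion trajectory.

def speculative_run(seeds, score, checkpoints=((0.25, 2), (0.50, 1))):
    branches = list(seeds)
    cost, progress = 0.0, 0.0
    for frac, keep in checkpoints:
        cost += len(branches) * (frac - progress)   # advance all live branches
        progress = frac
        branches = sorted(branches, key=score, reverse=True)[:keep]
    cost += len(branches) * (1.0 - progress)        # finish the survivor(s)
    return branches[0], cost

# With 4 seeds this costs 2.0 full-run equivalents, i.e. ~50% of
# running all four branches to completion.
best, cost = speculative_run([3, 1, 4, 2], score=lambda s: s)
```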
**Requirements:**
- ComfyUI (latest)
- PyTorch 2.0+
- CUDA 11.8+ (for FP8 on RTX 40 series)
Automatically detects and uses:
- RTX 40 series FP8 TensorCores (1320 TFLOPS)
- BF16 on Ampere+ GPUs
- Mixed precision for optimal performance
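The detection logic reduces to mapping CUDA compute capability to the fastest usable dtype: Ada Lovelace (RTX 40 series) reports capability 8.9 and has FP8 tensor cores, Ampere (8.x) adds BF16, and older parts fall back to FP16. A sketch, assuming the `(major, minor)` pair comes from `torch.cuda.get_device_capability()` in the real node:

```python
# Map CUDA compute capability to a preferred dtype. The tier boundaries
# (8.9 for FP8, 8.0 for BF16) follow NVIDIA's architecture generations.

def pick_dtype(major: int, minor: int) -> str:
    if (major, minor) >= (8, 9):
        return "fp8"    # Ada (RTX 40) and newer: FP8 tensor cores
    if major >= 8:
        return "bf16"   # Ampere: native BF16 support
    return "fp16"       # older GPUs: plain mixed precision
```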
Apache 2.0 - See LICENSE file
Built with advanced optimization research