Recommended models for prompt expansion:
Best (Most capable):
- Meta-Llama-3.1-8B-Instruct - Great instruction following
- Qwen2.5-7B-Instruct - Excellent at detailed descriptions
- Mistral-7B-Instruct-v0.3 - Good balance of speed and quality
Good (Faster):
- Meta-Llama-3-8B-Instruct - Solid, widely compatible
- Phi-3-medium-4k-instruct - Good for shorter prompts
Avoid:
- Base models (non-instruct versions)
- Very small models (< 7B parameters)
- Chat-tuned models optimized for conversation rather than instruction following
In LM Studio, go to the Local Server tab:
Setting: Context Length / Max Context
Recommended: 8192 or higher
Why: Our prompts can be long (especially the cinematic tier)
Low context (2048) = May cut off instructions ❌
Good context (8192) = Handles all tiers comfortably ✅
High context (16384+) = Overkill but works ✅
Setting: GPU Offload / GPU Layers
Recommended: As many layers as your GPU can handle
Why: Faster generation = better experience
CPU only (0 layers) = Very slow but works
Partial GPU (20-30 layers) = Good balance
Full GPU (all layers) = Fastest ✅
Setting: Server Port
Default: 1234
Our node expects: http://localhost:1234/v1
✅ Keep default unless you have a conflict
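Before wiring up the node, it is worth confirming the server is actually listening at that endpoint. A minimal stdlib-only sketch, assuming the default port; it queries the `/v1/models` route of the OpenAI-compatible API that LM Studio's local server exposes:

```python
# Sketch: verify the local server is reachable at the node's expected endpoint.
import json
import urllib.error
import urllib.request

ENDPOINT = "http://localhost:1234/v1"

def list_models(endpoint: str = ENDPOINT) -> list:
    """Return the model IDs the server reports, or [] if it is unreachable."""
    try:
        with urllib.request.urlopen(f"{endpoint}/models", timeout=5) as resp:
            data = json.load(resp)
        return [m["id"] for m in data.get("data", [])]
    except (urllib.error.URLError, OSError):
        return []  # server not running, or wrong port

if __name__ == "__main__":
    models = list_models()
    if models:
        print("Server is up. Loaded models:", models)
    else:
        print("Server not reachable at", ENDPOINT)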
Temperature is CRITICAL for instruction following!
You control temperature in the node interface (0.1 - 2.0)
| Use Case | Temperature | Why |
|---|---|---|
| Maximum detail | 0.3 - 0.5 | LLM follows instructions precisely |
| Balanced | 0.6 - 0.8 | Good creativity + instruction following |
| Creative/Random | 0.9 - 1.2 | More unexpected choices |
| Wild experiments | 1.3 - 2.0 | Very unpredictable |
For prompt expansion, start with 0.6-0.7
Low (0.3):
- Follows word count requirements better
- More predictable output
- Less creative descriptions
- Better for specific cinematography terms
Medium (0.7):
- Good balance
- Still follows instructions
- More varied language
- Recommended starting point
High (1.2+):
- Very creative
- May ignore word counts
- May not use exact technical terms
- Good for "random" preset
Setting: Repeat Penalty / Repetition Penalty
Recommended: 1.1 to 1.15
Why: Prevents repetitive descriptions
Too low (1.0) = May repeat the same adjectives
Good (1.1) = Varied language ✅
Too high (1.3+) = May avoid important terms
Setting: Top P
Recommended: 0.9 to 0.95
Why: Nucleus sampling; the model draws only from the smallest set of tokens whose probabilities sum to Top P
Lower (0.8) = More focused
Default (0.95) = Good balance ✅
Higher (0.99) = More random
Setting: Top K
Recommended: 40 to 50
Why: Limits sampling to the K most likely tokens
Lower (20) = More predictable
Default (40-50) = Good ✅
Higher (100) = More varied
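Taken together, the sampling settings above travel in a single request body. A sketch of how they would map onto an OpenAI-style chat-completions payload; note that `top_k` and `repeat_penalty` are llama.cpp-style extensions rather than core OpenAI fields (support depends on your LM Studio version), and the model name is illustrative:

```python
# Sketch: the recommended sampling settings as one request body.
def build_request(prompt, temperature=0.7):
    return {
        "model": "llama3.1",                           # illustrative name
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,                    # 0.6-0.7 recommended start
        "top_p": 0.95,                                 # nucleus sampling cutoff
        "top_k": 40,                                   # 40 most likely tokens
        "repeat_penalty": 1.1,                         # discourage repetition
    }
```

For example, `build_request("cat playing piano")` yields a body you could POST to `/v1/chat/completions`.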
Important: Our ComfyUI node sends the system prompt automatically!
❌ Do NOT set a system prompt in LM Studio's "System Prompt" field
Why?
- Our node sends detailed instructions already
- LM Studio's system prompt would conflict
- Could confuse the model
Leave LM Studio's system prompt EMPTY or use the default.
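To illustrate why (this is not the node's actual code): in an OpenAI-style request, the node's instructions already occupy the `system` slot, so a second system prompt configured in LM Studio would compete with them.

```python
# Illustration only, not the node's real implementation: the node's detailed
# instructions already arrive as the "system" message of each request.
def node_style_messages(user_concept):
    return [
        {"role": "system", "content": "<the node's detailed expansion instructions>"},
        {"role": "user", "content": user_concept},
    ]
```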
Meta-Llama-3.1-8B-Instruct:
Context: 8192+
Temperature: 0.6-0.7
Repeat Penalty: 1.1
Works great with all presets ✅
Qwen2.5-7B-Instruct:
Context: 8192+
Temperature: 0.5-0.7
Repeat Penalty: 1.05-1.1
Excellent at descriptive detail ✅
May need a lower temperature for instruction following
Mistral-7B-Instruct-v0.3:
Context: 8192+
Temperature: 0.7-0.8
Repeat Penalty: 1.1
Good balance of speed and quality ✅
Phi-3-medium-4k-instruct:
Context: 4096 (this model tops out at 4k)
Temperature: 0.6
Repeat Penalty: 1.1
Faster, but may struggle with the cinematic tier
Best for basic/enhanced tiers
Problem: Output is too short
Solutions:
- Lower the temperature (try 0.4-0.5)
- Use a better model (Llama 3.1 > Llama 3 > smaller models)
- Increase context length to 8192+
- Try a different model altogether
Test:
Input: "cat playing piano"
Tier: advanced
Temperature: 0.5
Expected: 400-600 words minimum
Problem: Output ignores the input concept
Solutions:
- Lower the temperature to 0.5 or less
- Use an instruct-tuned model (must have "Instruct" in the name)
- Check that you're not setting a conflicting system prompt in LM Studio
- Try a different model
Test:
Input: "robot in city"
Preset: random
Expected: Output mentions "robot" and "city"
If not: Model not following instructions
Problem: Cinematography settings not reflected in the output
Solutions:
- Lower the temperature (0.4-0.6)
- Use Llama 3.1 or Qwen (better instruction following)
- Try the "advanced" or "cinematic" tier
- Check that the model is instruct-tuned
Test:
Input: "spaceship flying"
Shot Size: wide shot
Lighting: edge lighting
Expected: Output must mention "wide shot" and "edge lighting"
Problem: Repetitive output
Solutions:
- Increase the repeat penalty to 1.15-1.2
- Increase the temperature slightly (0.7-0.8)
- Generate multiple variations
Problem: Malformed or oddly formatted output
Solutions:
- Lower the temperature (0.5)
- Use a newer model (Llama 3.1, Qwen 2.5)
- Update the node to the latest version (improved parsing)
- Try a different model
For speed:
✅ Use quantized models (Q4_K_M or Q5_K_M)
✅ Full GPU offload
✅ Lower context (4096-8192)
✅ Batch size: 512
⚠️ Avoid F16/F32 models (slower)
For quality:
✅ Use Q6_K or higher quantization
✅ Higher context (16384)
✅ Lower temperature (0.5-0.6)
✅ Better models (Llama 3.1, Qwen 2.5)
For variety:
✅ Use Q4_K_M quantization (faster)
✅ Temperature 0.7-0.8 (variety)
✅ Generate 3 variations per run
✅ Use wildcards for more variety
In LM Studio:
Model: Meta-Llama-3.1-8B-Instruct-Q5_K_M
Context Length: 8192
GPU Layers: Maximum available
Temperature: 0.7 (set in ComfyUI node)
Repeat Penalty: 1.1
Top P: 0.95
Top K: 40
System Prompt: EMPTY (let the ComfyUI node handle it)

In ComfyUI Node:
llm_backend: lm_studio
model_name: llama3.1
api_endpoint: http://localhost:1234/v1
temperature: 0.7
- Start LM Studio with recommended settings
- In ComfyUI, use this test:
Input: "cat playing piano"
Preset: cinematic
Tier: advanced
Temperature: 0.6
Expected: 400-600 words, mentions cat, piano, professional cinematography terms
- Check output length:
- < 200 words = Model not following instructions
- 200-400 words = Okay, but try lower temperature
- 400+ words = Good! ✅
- Check quality:
- Mentions subject (cat, piano)? ✅
- Uses cinematography terms? ✅
- Flows naturally? ✅
- Detailed descriptions? ✅
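The length and keyword checks above are easy to script. A rough sketch; the thresholds and terms simply mirror the checklist (the flow and cinematography checks still need a human eye):

```python
# Rough sketch of the checklist above: word count plus keyword presence.
def check_expansion(text, required_terms=("cat", "piano"), min_words=400):
    """Return (word_count, ok) for an expanded prompt."""
    words = len(text.split())
    lowered = text.lower()
    has_terms = all(term in lowered for term in required_terms)
    return words, (words >= min_words and has_terms)
```

A 500-word output mentioning both terms passes; a short blurb, or one that drops the subject, fails.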
In LM Studio:
- Click Search icon (🔍)
- Search for "Llama-3.1-8B-Instruct"
- Download Q5_K_M or Q4_K_M version
- Wait for the download to finish
- Load the model in the Local Server tab
Recommended downloads:
Meta-Llama-3.1-8B-Instruct-Q5_K_M.gguf (6GB)
or
Meta-Llama-3-8B-Instruct-Q5_K_M.gguf (6GB)
For faster generation (if GPU limited):
Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf (4.5GB)
Before using the prompt expansion node:
- Good model loaded (Llama 3.1 or Qwen 2.5)
- Context length: 8192+
- GPU layers: Maximum
- Server running on port 1234
- System prompt: EMPTY in LM Studio
- Temperature: 0.6-0.7 (in ComfyUI node)
Then test:
- Generate a prompt
- Check it's 400+ words (for advanced tier)
- Verify it mentions your input concept
- Confirm cinematography terms are used
If things still aren't working:
- Check LM Studio console for errors
- Check ComfyUI console for errors
- Try the absolute simplest test:
- Input: "cat"
- Tier: basic
- Temperature: 0.5
- Should get 150+ words about a cat
- Model issues?
- Try different model
- Llama 3.1 is most reliable
- Qwen 2.5 is best for detail
- Still problems?
- Check the model is an "Instruct" version
- Verify server is running
- Test with curl or another client
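If you don't have curl handy, the same check works from Python with the standard library alone. A sketch, assuming the default port and an illustrative model name (LM Studio answers with whichever model is loaded):

```python
# Sketch: send one tiny chat completion to the local server and return the
# reply text. Raises URLError if the server is not running on that port.
import json
import urllib.request

def quick_test(endpoint="http://localhost:1234/v1"):
    body = json.dumps({
        "model": "llama3.1",  # illustrative; the loaded model responds
        "messages": [{"role": "user", "content": "Reply with the word OK."}],
        "max_tokens": 8,
    }).encode()
    req = urllib.request.Request(
        f"{endpoint}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        reply = json.load(resp)
    return reply["choices"][0]["message"]["content"]
```

With the server up, `print(quick_test())` should show a short reply; an exception instead points at a server or port problem.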
Quick Start: Load Llama 3.1, set context to 8192, leave system prompt empty, use temperature 0.7 in the node. That's it!