Master Thesis Project — An AI-assisted pipeline for generating, editing, animating, and applying effects to procedural 2D shapes in Unity using HLSL shaders, ShaderGraph, and large language models.
- Overview
- Generated Shape Gallery
- Features
- Architecture
- Pipeline Flows
- Chatbot Interface
- Knowledge Base
- Human Review & Knowledge Base Curation
- Experiment Results
- Setup & Installation
- Configuration
- Editor Windows
- Output File Locations
- API Reference
- Project Structure
- Tech Stack
This Unity Editor tool enables non-programmers to create, edit, animate, and apply effects to 2D procedural shapes entirely through natural language — no HLSL coding required.
The system combines a RAG (Retrieval-Augmented Generation) pipeline, multiple LLM providers (Google Gemini, OpenAI GPT-4o, Anthropic Claude), and an interactive chatbot interface to guide users through the full asset creation lifecycle.
All shapes are built on Signed Distance Fields (SDF) — mathematical functions that return the distance from any point to a shape's boundary, enabling smooth, resolution-independent procedural geometry entirely in HLSL with no textures.
SDF concept: f(p) < 0 = inside the shape, f(p) = 0 = on the boundary, f(p) > 0 = outside. Every generated HLSL shader composes multiple SDF primitives to construct the final shape.
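The following minimal C# sketch makes the sign convention concrete. It shows two standard 2D SDF primitives (a circle and an axis-aligned box) and a union. It is a plain C# port of the textbook formulas, written so the signs can be checked outside Unity. It is not code from this project: the generated shaders apply the same idea in HLSL, and the class and method names here are illustrative.

```csharp
using System;

// Illustrative 2D signed distance functions (plain C# port of common HLSL formulas).
public static class Sdf
{
    // Distance from (px, py) to a circle of radius r centred at the origin.
    public static float Circle(float px, float py, float r)
    {
        return MathF.Sqrt(px * px + py * py) - r;
    }

    // Distance from (px, py) to an axis-aligned box with half-extents (bx, by).
    public static float Box(float px, float py, float bx, float by)
    {
        float dx = MathF.Abs(px) - bx;
        float dy = MathF.Abs(py) - by;
        // Outside part: length of the positive components (0 when inside).
        float ox = MathF.Max(dx, 0f);
        float oy = MathF.Max(dy, 0f);
        float outside = MathF.Sqrt(ox * ox + oy * oy);
        // Inside part: negative distance to the nearest edge (0 when outside).
        float inside = MathF.Min(MathF.Max(dx, dy), 0f);
        return outside + inside;
    }

    // Union of two shapes: the nearer surface wins.
    public static float Union(float a, float b) => MathF.Min(a, b);
}
```

Taking the minimum of two distances keeps whichever surface is nearer. This is the simplest way to compose several primitives into one shape.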
User Prompt / Reference Image
↓
Shape Decomposition
↓
Knowledge Base Retrieval (184+ shapes, embedding search)
↓
LLM HLSL Code Generation (Gemini)
↓
ShaderGraph + Material Build
↓
VLM Visual Evaluation (GPT-4o): iterates until score ≥ 7/10 (max 3 iterations)
↓
Preview Quad in Scene + PNG Screenshot
↓
Edit / Animate / Add Effects (follow-up flows)
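The evaluate-and-refine step can be sketched as a bounded loop. The sketch assumes the behaviour described for the text pipeline: a VLM score from 1 to 10, refinement below 7, and at most 3 tries. The delegates stand in for the Gemini generation call and the render-plus-GPT-4o scoring call. RefinementLoop, generate and score are hypothetical names, not the project's API. Feeding the previous attempt and its score back into generation is also an assumption about how refinement prompts are built.

```csharp
using System;

public sealed class LoopResult
{
    public string Hlsl = "";
    public int Score;
    public int Iterations;
    public bool Success;
}

public static class RefinementLoop
{
    // generate(previousHlsl, previousScore): returns new HLSL; previousHlsl is "" on the first try.
    // score(hlsl): stands in for building the material, rendering the preview and asking the VLM (1-10).
    public static LoopResult Run(
        Func<string, int, string> generate,
        Func<string, int> score,
        int threshold = 7,
        int maxIterations = 3)
    {
        var result = new LoopResult();
        string previousHlsl = "";
        int previousScore = 0;

        for (int i = 1; i <= maxIterations; i++)
        {
            string hlsl = generate(previousHlsl, previousScore);
            int s = score(hlsl);
            result.Hlsl = hlsl;
            result.Score = s;
            result.Iterations = i;

            if (s >= threshold)
            {
                result.Success = true; // accepted: stop refining
                break;
            }

            // Rejected: keep this attempt and its score as feedback for the next try.
            previousHlsl = hlsl;
            previousScore = s;
        }
        return result; // after maxIterations the last attempt is kept, with Success = false
    }
}
```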
All shapes below are purely procedural HLSL — no textures, no sprites. Each is generated end-to-end from a natural language prompt, rendered in Unity's Universal Render Pipeline at 512 × 512 px. Colours, proportions, and style parameters are adjustable through exposed ShaderGraph properties without touching any code.
[Figure: generated shape gallery (Fox Face, Procedural Sword, Cartoon Sunflower, Cartoon Mushroom, Ice Cream Cone, Watermelon Slice, Stylized Christmas Tree, Cartoon Rainbow, Procedural Hot Dog, Cute Star, Cartoon Cactus, Cartoon UFO)]
| Feature | Description |
|---|---|
| Text-to-Shape | Describe any 2D shape in natural language; get an HLSL shader + Unity material |
| Image-to-Shape | Upload a reference image; Gemini Vision recreates it as a procedural shader |
| HLSL Import | Upload an existing .hlsl file; the tool wraps it in ShaderGraph with sensible property values |
| Shape Editing | AI classifies whether an edit needs a property tweak or full HLSL rewrite, then applies it |
| Animation | AI generates a C# MonoBehaviour to animate material properties; adds new shader properties when needed |
| Pixelation Effect | Client-side ShaderGraph node injection (Floor/Divide UV quantisation) — no LLM call required |
| Glow Effect | Client-side ShaderGraph modification: all colour properties → HDR, glowIntensity multiplier node injected before Base Color, URP Bloom post-processing volume added to scene — no LLM call required |
| Stacked Effects | Pixelation and Glow can be combined; applying one effect on top of the other preserves both |
| VLM Quality Loop | GPT-4o scores rendered previews 1–10; automatically refines until threshold is met |
| RAG Knowledge Base | 184+ verified shapes with embeddings for semantic retrieval; grows with each accepted generation |
| Chatbot UI | Conversational editor window with state machine, quick replies, material/image pickers |
| Human Review | EditorWindow for scoring, accepting, and curating generated shapes into the knowledge base |
Each generated HLSL shader is automatically wired into a Unity ShaderGraph with all parameters exposed as editable material properties:
| Generated ShaderGraph | Exposed Material Properties |
|---|---|
| Auto-generated ShaderGraph node graph for a procedural shape | Exposed shader properties in the Unity Inspector — adjustable without touching code |
Assets/ShaderGraphGenerator/
├── ShaderGraphJSONGenerator.cs ← HLSL → ShaderGraph JSON (with optional pixelation / glow nodes)
├── ShaderGraphNodeFactory.cs ← Creates typed ShaderGraph node objects
├── ShaderGraphPropertyFactory.cs ← Creates shader property definitions
├── ShaderGraphSlotFactory.cs ← Creates node input/output slot definitions
├── HLSLFunctionInfo.cs ← Parsed HLSL function metadata
├── FunctionParameter.cs ← Single parameter (name, type, direction)
│
├── Editor/
│ ├── Core/
│ │ ├── API/
│ │ │ ├── GeminiApiService.cs ← Gemini text & vision (HLSL generation, classification)
│ │ │ ├── OpenAIApiService.cs ← GPT-4o Vision (VLM scoring, image description)
│ │ │ └── ClaudeApiService.cs ← Claude (alternative structured HLSL generation)
│ │ ├── LLMDataModels.cs ← Serialisable LLM request/response structures
│ │ ├── MaterialPreviewHelper.cs ← Preview quads, screenshots, property application
│ │ ├── ShaderGenerationPipeline.cs ← Core: HLSL → ShaderGraph → Material → Preview
│ │ └── ShaderPromptBuilder.cs ← Prompt construction for shape generation
│ │
│ ├── Chat/
│ │ ├── ChatbotWindow.cs ← IMGUI chat window (bubbles, quick replies, pickers)
│ │ ├── ChatBridge.cs ← HTTP server (port 7723) + all pipeline triggers
│ │ ├── ChatBridgeLocal.cs ← Direct (non-HTTP) entry point for the IMGUI window
│ │ └── ChatSession.cs ← Static state machine (26 states) + message history
│ │
│ ├── KnowledgeBase/
│ │ ├── SemanticShapeSearch.cs ← Cosine-similarity search over embedding vectors
│ │ ├── ShapeEmbeddingService.cs ← Generates embeddings via OpenAI API
│ │ ├── ShapeMetadata.cs ← Data structures: ShapeMetadata, AnimatorEntry, etc.
│ │ ├── HLSLParser.cs ← Parses HLSL to extract function signatures
│ │ └── KnowledgeBaseLLMService.cs ← LLM-powered shape analysis and metadata extraction
│ │
│ └── RAG/
│ ├── Pipelines/
│ │ ├── RAGPipelineManager.cs ← Full text-to-shape pipeline (7 steps + VLM loop)
│ │ ├── ImageToShaderPipelineManager.cs ← Image-to-shader with Gemini Vision
│ │ └── HLSLUpdatePipelineManager.cs ← Edit pipeline: before/after VLM comparison
│ ├── Generation/
│ │ ├── RAGShapeGenerator.cs ← Decompose → retrieve → LLM compose
│ │ ├── ShapeDecompositionService.cs ← Breaks complex shapes into components
│ │ ├── ShaderGraphBuilder.cs ← HLSL + LLM response → ShaderGraph + Material
│ │ └── HLSLCompositionEngine.cs ← Merges multiple HLSL primitives
│ ├── Animation/
│ │ ├── MaterialAnimatorPipelineManager.cs ← C# animation script pipeline (domain-reload safe)
│ │ ├── AnimationScriptGenerator.cs ← LLM C# generation + animation classification
│ │ └── AnimationKnowledgeBase.cs ← Animation KB helpers + embedding search
│ ├── Edit/
│ │ └── EditClassifier.cs ← Classifies edits: property-only vs HLSL update
│ ├── Curation/
│ │ └── KnowledgeBaseUpdater.cs ← Ingests shapes into knowledge base
│ └── Windows/
│ ├── UnifiedGeneratorWindow.cs ← Main RAG generator UI (text + image modes)
│ ├── MaterialAnimatorWindow.cs ← Animation generation UI
│ ├── RAGHumanReviewWindow.cs ← Human scoring and KB curation
│ ├── RAGAutoLearnWindow.cs ← Auto-ingest successful results
│ ├── RAGUpdateWindow.cs ← Shape editing UI
│ └── ImageToShaderWindow.cs ← Image upload + generation controls
User text prompt
→ ShapeDecompositionService (break into visual components)
→ SemanticShapeSearch (find top-2 KB examples per component)
→ RAGShapeGenerator (build augmented prompt with retrieved HLSL)
→ GeminiApiService (generate new HLSL code)
→ ShaderGraphBuilder (HLSL → .shadergraph JSON)
→ MaterialPreviewHelper (create .mat + render 512×512 PNG)
→ OpenAIApiService (VLM score 1–10; if < 7 refine, max 3 tries)
→ Result: .hlsl + .shadergraph + .mat + preview PNG
Stages 1–4 of the RAG pipeline: the user prompt is decomposed into visual components (e.g. "tall cactus body", "flower pot with rim", "small oval spines"), each component is matched against the 184-shape Knowledge Base via embedding search, and the retrieved HLSL examples are passed to Gemini to compose the final shader.
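Per-component retrieval reduces to ranking KB embeddings by cosine similarity and keeping the best two. The sketch below makes two assumptions. Embeddings are plain float arrays, as in shape_metadata.json. An in-memory dictionary stands in for the real knowledge base. SemanticSearchSketch and TopK are illustrative names, not the API of SemanticShapeSearch.cs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SemanticSearchSketch
{
    // Cosine similarity: dot(a, b) / (|a| * |b|); 1 = same direction, 0 = orthogonal.
    public static double Cosine(float[] a, float[] b)
    {
        if (a.Length != b.Length) throw new ArgumentException("Embedding sizes differ.");
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        if (na == 0 || nb == 0) return 0; // a zero vector has no direction
        return dot / (Math.Sqrt(na) * Math.Sqrt(nb));
    }

    // Returns the k KB entries most similar to one component's query embedding.
    public static List<(string Id, double Similarity)> TopK(
        float[] query, IReadOnlyDictionary<string, float[]> kb, int k = 2)
    {
        return kb
            .Select(e => (Id: e.Key, Similarity: Cosine(query, e.Value)))
            .OrderByDescending(x => x.Similarity)
            .Take(k)
            .ToList();
    }
}
```

Because ranking is done on direction rather than magnitude, a brute-force scan like this is adequate for a KB of a few hundred shapes.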
Reference image (PNG/JPG)
→ OpenAIApiService.DescribeImage (GPT-4o: detailed visual description)
→ ShapeDecompositionService (decompose description)
→ SemanticShapeSearch (retrieve KB examples)
→ GeminiApiService.Vision (generate HLSL with both text + original image)
→ (same as text pipeline from ShaderGraphBuilder onward)
User uploads .hlsl file
→ Copy to Assets/ShaderGraphs/Generated/HLSL/
→ ShaderGraphJSONGenerator.GenerateFromHLSL (no pixelation)
→ MaterialPreviewHelper.CreateMaterialForShaderGraph
→ MaterialPreviewHelper.SetRandomMaterialProperties (safe baseline)
→ GeminiApiService (read HLSL → suggest sensible property values)
→ MaterialPreviewHelper.SetDefaultMaterialProperties (override with LLM values)
→ MaterialPreviewHelper.CreatePreviewQuad
→ Result: shadergraph + material with good defaults + preview
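Wrapping an imported file in ShaderGraph requires knowing the entry function's inputs and outputs. A minimal regex-based sketch follows. It assumes the file follows Unity's Custom Function node convention: a void function whose name ends in _float or _half, with outputs marked out. It handles only a single, simple signature. HlslSignatureSketch and ParsedParameter are hypothetical stand-ins for HLSLParser.cs and FunctionParameter.cs.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public sealed class ParsedParameter
{
    public string Name = "";
    public string Type = "";
    public bool IsOutput; // true for "out" / "inout" parameters
}

public static class HlslSignatureSketch
{
    // Matches e.g. "void Star_float(float2 UV, float Radius, out float4 Color)".
    private static readonly Regex Signature = new Regex(
        @"void\s+(?<name>\w+?)_(?:float|half)\s*\((?<params>[^)]*)\)");

    public static (string Name, List<ParsedParameter> Parameters)? Parse(string hlsl)
    {
        Match m = Signature.Match(hlsl);
        if (!m.Success) return null;

        var parameters = new List<ParsedParameter>();
        foreach (string raw in m.Groups["params"].Value.Split(',', StringSplitOptions.RemoveEmptyEntries))
        {
            // Tokens: optional qualifier ("in", "out", "inout"), then type, then name.
            string[] tokens = raw.Trim().Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
            if (tokens.Length == 0) continue;
            bool isOutput = tokens[0] == "out" || tokens[0] == "inout";
            int start = (isOutput || tokens[0] == "in") ? 1 : 0;
            if (tokens.Length - start < 2) continue; // malformed parameter: skip it
            parameters.Add(new ParsedParameter { Type = tokens[start], Name = tokens[start + 1], IsOutput = isOutput });
        }
        return (m.Groups["name"].Value, parameters);
    }
}
```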
User edit request + material
→ EditClassifier.ClassifyEditRequestAsync (Gemini)
├── needs_hlsl_change = false → ApplyMaterialPropertyChanges (SetFloat/SetColor/SetVector)
│ → Preview quad in scene
└── needs_hlsl_change = true → HLSLUpdatePipelineManager
(extract HLSL → Gemini update → before/after VLM verify)
→ Preview quad + result image in chat
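The routing decision can be sketched as parsing the classifier's JSON reply. Only the needs_hlsl_change flag comes from the flow above. Three parts of the sketch are assumptions for illustration: the property_changes field, the EditRoutingSketch names, and the fallback to an HLSL update when the flag is missing. The project itself uses Newtonsoft.Json; this sketch uses System.Text.Json so it runs without Unity.

```csharp
using System.Collections.Generic;
using System.Text.Json;

public enum EditRoute { PropertyOnly, HlslUpdate }

public static class EditRoutingSketch
{
    // Parses a classifier reply such as
    // {"needs_hlsl_change": false, "property_changes": {"_FillColor": "#3366FF", "Radius": 0.4}}
    // "property_changes" is a hypothetical field used only for this illustration.
    public static (EditRoute Route, Dictionary<string, string> Changes) Classify(string classifierJson)
    {
        using JsonDocument doc = JsonDocument.Parse(classifierJson);
        JsonElement root = doc.RootElement;

        // Missing flag: fall back to the HLSL path, since a property tweak cannot add new features.
        bool needsHlsl = !root.TryGetProperty("needs_hlsl_change", out JsonElement flag) || flag.GetBoolean();

        var changes = new Dictionary<string, string>();
        if (!needsHlsl && root.TryGetProperty("property_changes", out JsonElement props))
        {
            // Property-only route: collect name -> raw value for SetFloat/SetColor/SetVector.
            foreach (JsonProperty p in props.EnumerateObject())
                changes[p.Name] = p.Value.ToString();
        }
        return (needsHlsl ? EditRoute.HlslUpdate : EditRoute.PropertyOnly, changes);
    }
}
```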
Editing example — scarf pattern updated via a single natural language instruction:
| Before | After |
|---|---|
| "A cartoon snowman" | "Make the scarf striped" |
User animation request + material
→ AnimationScriptGenerator.ClassifyAnimationRequirementsAsync (Gemini)
├── C# only → AnimationScriptGenerator.GenerateAnimationScriptAsync
│ → Write .cs → Unity recompiles → [domain reload] → attach to preview quad
└── HLSL needed → HLSLUpdatePipelineManager (add missing properties)
→ AnimationScriptGenerator.GenerateAnimationScriptAsync (on updated material)
→ Write .cs → domain reload → attach to quad
Source material
→ HLSLUpdatePipelineManager.ExtractHlslPathFromMaterial
→ detect if source already has Glow (preserves it if so)
→ ShaderGraphJSONGenerator.GenerateFromHLSL (..., usePixelation: true, useGlow: <preserved>)
(injects: UV × PixelCount → Floor → ÷ PixelCount nodes)
→ MaterialPreviewHelper.CreateMaterialForShaderGraph
→ CopyMatchingMaterialProperties (original → effect)
→ SetFloat("PixelCount", 64)
→ CreatePreviewQuad → named "Pixelation Effect — {name}" in scene
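The injected node chain is plain UV quantisation. The C# below reproduces the same arithmetic outside ShaderGraph; PixelationSketch is an illustrative name. Every UV inside one 1/PixelCount cell snaps to the cell's lower-left corner. As a result, the SDF is sampled once per block and the shape renders as a PixelCount × PixelCount grid.

```csharp
using System;

public static class PixelationSketch
{
    // Same math as the injected nodes: UV × PixelCount → Floor → ÷ PixelCount.
    public static (float U, float V) Quantize(float u, float v, float pixelCount)
    {
        return (MathF.Floor(u * pixelCount) / pixelCount,
                MathF.Floor(v * pixelCount) / pixelCount);
    }
}
```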
Source material
→ HLSLUpdatePipelineManager.ExtractHlslPathFromMaterial
→ detect if source already has Pixelation (preserves it if so)
→ ShaderGraphJSONGenerator.GenerateFromHLSL (..., useGlow: true, usePixelation: <preserved>)
(converts all colour properties to HDR colorMode=1)
(injects: FinalColor → Multiply(A) × glowIntensity(B) → BaseColor)
→ MaterialPreviewHelper.CreateMaterialForShaderGraph
→ CopyMatchingMaterialProperties (original colors + floats → effect)
→ SetFloat("glowIntensity", 2)
→ EnsureBloomVolume() — finds or creates a global URP Volume with Bloom, intensity=2
→ CreatePreviewQuad → named "Glow Effect — {name}" in scene
Stacking rules:
- Applying Glow to a Pixelated material → output named {base}_Glow_Pixelated
- Applying Pixelation to a Glowed material → output named {base}_Glow_Pixelated
- Both effects are always regenerated from the original HLSL source, so stacking is lossless
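One way to get order-independent naming is to derive the suffix from the set of active effects rather than from the order in which they were applied. EffectNamingSketch is a hypothetical helper. Its suffixes match the _Glow, _Pixelated and _Glow_Pixelated file names used for effect ShaderGraphs.

```csharp
// Hypothetical helper: names depend only on which effects are active, not their order.
public static class EffectNamingSketch
{
    public static string EffectAssetName(string baseName, bool glow, bool pixelation)
    {
        string name = baseName;
        if (glow) name += "_Glow";
        if (pixelation) name += "_Pixelated";
        return name;
    }

    // Adding an effect to an existing effect material: union the flags,
    // then rebuild everything from the original HLSL.
    public static string Stack(string baseName, bool hadGlow, bool hadPixelation, bool addGlow, bool addPixelation)
        => EffectAssetName(baseName, hadGlow || addGlow, hadPixelation || addPixelation);
}
```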
Visual effects example — same Cute Star shape with each post-process applied, including the stacked Glow + Pixelated variant:
| Original | Pixelation Effect | Glow Effect | Glow + Pixelated |
|---|---|---|---|
| Original shader | Pixelation (PixelCount = 64) | Glow (glowIntensity = 2, URP Bloom) | Both effects stacked (_Glow_Pixelated) |
All effects are zero-cost (no LLM call) and take under 2 seconds to apply.
The chatbot (ChatbotWindow.cs + ChatBridge.cs) is driven by a 26-state machine:
MainMenu
├── new_shape → NewShape_InputMode → [Text | Image] → Generating → Reviewing → PostGen
│ ├── edit → Edit_Describe → Edit_Running → PostGen
│ ├── animate → Animate_Describe → Animate_Running → PostGen
│ └── effect → Effect_Pick → Animate_Running → PostGen
├── edit → Edit_Attach → Edit_Describe → Edit_Running → PostGen
├── animate → Animate_Attach → Animate_Describe → Animate_Running → PostGen
├── effect → Effect_Attach → Effect_Pick → Animate_Running → PostGen
├── hlsl → HLSL_Attach → HLSL_Running → HLSL_Done
├── explain → Explain → MainMenu
└── contact → Contact_Name → Contact_Intent → Contact_Email → Contact_Message → MainMenu
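A table-driven transition function is one compact way to implement a state machine like this. The sketch covers only the Edit and Contact branches. The state names come from the diagram. The input labels ("edit", "material_attached", "text", "done", "contact") are assumptions, as is the rule that unknown input leaves the state unchanged; neither describes ChatSession.cs.

```csharp
using System.Collections.Generic;

// Subset of the chatbot states from the diagram (illustrative only).
public enum ChatState
{
    MainMenu, Edit_Attach, Edit_Describe, Edit_Running, PostGen,
    Contact_Name, Contact_Intent, Contact_Email, Contact_Message
}

public static class ChatFsmSketch
{
    // (current state, input label) -> next state. Input labels are hypothetical.
    private static readonly Dictionary<(ChatState, string), ChatState> Transitions =
        new Dictionary<(ChatState, string), ChatState>
        {
            [(ChatState.MainMenu, "edit")] = ChatState.Edit_Attach,
            [(ChatState.Edit_Attach, "material_attached")] = ChatState.Edit_Describe,
            [(ChatState.Edit_Describe, "text")] = ChatState.Edit_Running,
            [(ChatState.Edit_Running, "done")] = ChatState.PostGen,
            [(ChatState.MainMenu, "contact")] = ChatState.Contact_Name,
            [(ChatState.Contact_Name, "text")] = ChatState.Contact_Intent,
            [(ChatState.Contact_Intent, "text")] = ChatState.Contact_Email,
            [(ChatState.Contact_Email, "text")] = ChatState.Contact_Message,
            [(ChatState.Contact_Message, "text")] = ChatState.MainMenu,
        };

    // Unknown (state, input) pairs keep the current state.
    public static ChatState Next(ChatState current, string input) =>
        Transitions.TryGetValue((current, input), out ChatState next) ? next : current;
}
```

Keeping transitions in data rather than in nested conditionals makes it easy to list, test, or serialise the state config, which is what the /history endpoint returns alongside the current state.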
The chatbot also runs as an HTTP server on port 7723 with the following endpoints:
| Endpoint | Method | Purpose |
|---|---|---|
| / or /chat | GET | Serves the web-based chat UI (chat.html) |
| /send | POST | Main message handler; triggers pipeline or state transition |
| /image | POST | Accepts base64-encoded image uploads |
| /status | GET | Returns current status message and last preview path |
| /abort | GET | Cancels the current in-progress pipeline |
| /history | GET | Returns full chat history, current state, and state config |
Users interact through typed messages and quick-reply buttons that appear contextually. A typical session:
- The window opens at MainMenu, with quick replies: New Shape, Edit Shape, Animate, Add Effect, Import HLSL.
- New Shape → Text: type a description (e.g. "a cartoon cactus in a flower pot"). The pipeline runs automatically and a preview image appears inside the chat bubble.
- The bot responds with the generated preview and post-generation quick replies: Edit this, Animate it, Add Effect, Done.
- Edit this: type the change in plain language ("make the pot blue and add a face to the cactus"). The classifier decides whether a property tweak or a full HLSL rewrite is needed and applies the change.
- Animate it: describe the desired motion ("make it slowly bounce up and down"). The system generates a C# MonoBehaviour, writes it to disk, waits for Unity's domain reload, then attaches it to the preview quad automatically.
- Add Effect → Glow or Pixelation: applied instantly with no LLM call; result shown in chat.
- At any point the user can type freely outside quick replies; the state machine interprets the message in context.
The Image-to-Shape flow works the same way but starts with an image upload button; GPT-4o describes the image and the pipeline proceeds from there.
The chatbot also runs as an HTTP server on port 7723, so it can be driven from a browser at http://localhost:7723 — useful for demo setups where the Unity Editor runs headless or on a different machine. The web interface (chat.html) mirrors the IMGUI window exactly.
Chatbot UI — example sessions:
| Main menu | HLSL import flow | Pixelation effect |
|---|---|---|
| Welcome screen with quick-reply buttons | Importing an HLSL file → material generated in chat | Applying pixelation to a Cartoon Hamburger |
Location: Assets/ShaderGraphGenerator/KnowledgeBase/shape_metadata.json
Size: 184 verified shapes (as of thesis submission)
Schema:
```json
{
  "totalShapes": 184,
  "shapes": [{
    "id": "_259cbb14",
    "fileName": "RoundedRectangle",
    "filePath": "Assets/ShaderGraphs/SuccessfulResults/RoundedRectangle.hlsl",
    "originalPrompt": "a rounded rectangle with adjustable corner radius...",
    "visualDescription": "Smooth rectangular shape with rounded corners...",
    "category": 1,
    "complexity": 1,
    "tags": ["rectangle", "rounded", "geometric"],
    "parameters": [{"name": "Width", "type": "float", "defaultValue": "0.6"}],
    "embedding": [0.016, -0.012, ...],
    "verificationScore": 9,
    "animators": [{
      "fileName": "RoundedRectanglePulse",
      "scriptPath": "Assets/ShaderGraphs/Animations/RoundedRectanglePulse.cs",
      "propertiesUsed": ["_FillColor"],
      "animationSummary": "Pulses fill color between two hues",
      "embedding": [...]
    }]
  }]
}
```

Category enum: Uncategorized=0, GeometricPrimitives=1, OrganicShapes=2, SymbolsAndIcons=3, CompositeShapes=4
Complexity enum: Unknown=0, Primitive=1, Intermediate=2, Complex=3
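For tooling outside Unity, such as inspecting the KB from a script, the schema above maps onto a few plain C# classes. This sketch uses System.Text.Json with case-insensitive property matching. The class names are hypothetical: the project's own types live in ShapeMetadata.cs and are read with Newtonsoft.Json. Fields such as originalPrompt and animators are simply ignored here.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public enum ShapeCategory { Uncategorized = 0, GeometricPrimitives = 1, OrganicShapes = 2, SymbolsAndIcons = 3, CompositeShapes = 4 }
public enum ShapeComplexity { Unknown = 0, Primitive = 1, Intermediate = 2, Complex = 3 }

public class KbParameter
{
    public string Name { get; set; } = "";
    public string Type { get; set; } = "";
    public string DefaultValue { get; set; } = "";
}

public class KbShape
{
    public string Id { get; set; } = "";
    public string FileName { get; set; } = "";
    public string FilePath { get; set; } = "";
    public ShapeCategory Category { get; set; }       // stored as a number in JSON
    public ShapeComplexity Complexity { get; set; }   // stored as a number in JSON
    public List<string> Tags { get; set; } = new List<string>();
    public List<KbParameter> Parameters { get; set; } = new List<KbParameter>();
    public float[] Embedding { get; set; } = Array.Empty<float>();
    public int VerificationScore { get; set; }
}

public class KbFile
{
    public int TotalShapes { get; set; }
    public List<KbShape> Shapes { get; set; } = new List<KbShape>();
}

public static class KnowledgeBaseLoaderSketch
{
    private static readonly JsonSerializerOptions Options = new JsonSerializerOptions
    {
        PropertyNameCaseInsensitive = true // "fileName" in JSON -> FileName here
    };

    // Unknown fields (originalPrompt, animators, ...) are ignored by default.
    public static KbFile Load(string json) =>
        JsonSerializer.Deserialize<KbFile>(json, Options) ?? new KbFile();
}
```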
Generated shapes that pass the automated VLM quality loop (score ≥ 7) can optionally be reviewed by a human before being added to the knowledge base. This two-step curation process ensures the KB stays high-quality over time.
Generation run completes (VLM score ≥ 7)
↓
Human Review Window (RAGHumanReviewWindow.cs)
— lists pending shapes with their preview PNG, prompt, and VLM score
— reviewer assigns a score 1–10 and writes optional notes
↓
Accept (score ≥ 8) → KnowledgeBaseUpdater.IngestShape()
— appends shape to shape_metadata.json
— generates embedding via OpenAI text-embedding-3-small
— tags + categorises the new entry
↓
Reject → shape discarded; files remain in RAG_Generated/ for reference
For bulk ingestion of successful results (e.g. after an experiment run), the Auto Learn window (RAGAutoLearnWindow.cs) scans RAG_Generated/ for shapes with a logged VLM score ≥ a configurable threshold and ingests them automatically without requiring per-shape human review. This is used to grow the KB rapidly from a batch of high-quality generations.
| Method | Speed | Quality control |
|---|---|---|
| Human Review | Manual, per shape | Highest — reviewer inspects the preview image |
| Auto Learn | Automated batch | Medium — relies on VLM score threshold |
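The two ingestion paths differ only in which score gates a shape. The minimal sketch below assumes the thresholds stated above: VLM ≥ 7 to reach review, reviewer score ≥ 8 to accept, and a caller-supplied threshold for Auto Learn. PendingShape and CurationSketch are hypothetical names, not types from KnowledgeBaseUpdater.cs.

```csharp
using System.Collections.Generic;
using System.Linq;

public sealed class PendingShape
{
    public string Name = "";
    public int VlmScore;
    public int? HumanScore; // null until a reviewer scores it
}

public static class CurationSketch
{
    public const int VlmPassThreshold = 7;
    public const int HumanAcceptThreshold = 8;

    // Human Review path: passed the VLM loop and the reviewer scored it at least 8.
    public static bool AcceptByHuman(PendingShape s) =>
        s.VlmScore >= VlmPassThreshold && s.HumanScore is int h && h >= HumanAcceptThreshold;

    // Auto Learn path: ingest every shape whose logged VLM score meets the configured threshold.
    public static List<string> AutoLearn(IEnumerable<PendingShape> shapes, int threshold) =>
        shapes.Where(s => s.VlmScore >= threshold).Select(s => s.Name).ToList();
}
```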
The KB ships with 184 verified shapes. Each accepted shape immediately improves future RAG retrievals for semantically similar prompts, creating a compounding quality improvement over time.
This section reports the results of a controlled evaluation study examining how the RAG pipeline, shape complexity, and LLM model choice affect HLSL shader generation quality, reliability, and cost. All results are from Phase 2 of the experiment, run inside the Unity Editor tool using the automated Phase2ExperimentRunnerWindow.
| Parameter | Value |
|---|---|
| Shape groups | Simple_InRAG, Simple_NotInRAG, Complex_InRAG, Complex_NotInRAG |
| Shapes per group | 5 |
| LLM models tested | Gemini 3 Pro Preview, Gemini 3.1 Pro, GPT-5.4, Claude Sonnet 4.6, Kimi K2.6 |
| Pipelines | RAG (retrieval-augmented) and NoRAG (direct generation) |
| Runs | 5 shapes × 4 groups × 5 models × 2 pipelines = 200 runs |
| VLM evaluator | GPT-4o Vision — scores rendered previews 1–10; threshold ≥ 7 = success |
| Max iterations per run | 3 |
| Human scores | Collected for a subset of runs (Gemini 3 Pro Preview and Claude Sonnet) |
Each run generates one HLSL shader + ShaderGraph material, renders a 512 × 512 PNG preview, and lets the VLM judge up to three times before recording the final result.
Shape sets used in the experiment:
| Group | Shapes |
|---|---|
| Simple_InRAG | Teardrop, ScallopedDisc, UShapedArc, LightningBolt, Trapezoid |
| Simple_NotInRAG | Parallelogram, ChevronArrow, Lemniscate, Hexagram, Squircle |
| Complex_InRAG | CartoonMushroom, CartoonFireFlame, CartoonGearCog, CartoonSpeechBubble, CartoonGiftBox |
| Complex_NotInRAG | CartoonRamenBowl, CartoonCactus, CartoonJellyfish, CartoonPaperAirplane, CartoonSaturnPlanet |
Pipeline-level summary:
| Pipeline | n | Success rate | Avg VLM score | Avg iterations | Avg time (s) | Compile rate | Avg cost / shape |
|---|---|---|---|---|---|---|---|
| RAG | 143 | 70.6% | 7.22 | 1.82 | 607 | 91.6% | $0.063 |
| NoRAG | 157 | 65.6% | 6.48 | 2.06 | 476 | 84.1% | $0.061 |
[Figure: RAG vs NoRAG visual output comparison on the same prompts: Cartoon Cactus, Cartoon Mushroom, Cartoon UFO]
Key findings for RQ1:
- RAG improves VLM score by +0.74 points on average (6.48 → 7.22), a meaningful gain given the 1–10 scale.
- RAG also raises the success rate by ~5 percentage points (65.6% → 70.6%) and the compile rate by ~7.5 points.
- NoRAG is not unreliable — with capable models it still achieves 80%+ success on simple shapes. The gap is mainly in output quality (VLM score), not just pass/fail.
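- RAG averages more time per shape (607 s vs 476 s). The whole gap comes from simple shapes (462 s vs 212 s), where longer prompts and the retrieval step add time; on complex shapes RAG is actually faster (825 s vs 915 s), so the extra time is not caused by extra refinement iterations.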
- Costs are nearly identical ($0.063 vs $0.061/shape) — RAG's retrieval overhead is offset by fewer failed iterations.
RAG converges in fewer iterations (1.82 vs 2.06), meaning the LLM reaches an acceptable result sooner when given retrieved HLSL examples as context.
Pipeline × Complexity breakdown:
| Pipeline | Complexity | n | Success rate | Avg VLM score | Avg iterations | Avg time (s) | Compile rate | Avg cost / shape |
|---|---|---|---|---|---|---|---|---|
| NoRAG | Simple | 98 | 82.7% | 7.68 | 1.73 | 212 | 90.8% | $0.032 |
| NoRAG | Complex | 59 | 37.3% | 4.49 | 2.61 | 915 | 72.9% | $0.108 |
| RAG | Simple | 86 | 84.9% | 8.24 | 1.56 | 462 | 100.0% | $0.024 |
| RAG | Complex | 57 | 49.1% | 5.67 | 2.21 | 825 | 78.9% | $0.122 |
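The pipeline-level summary rows are consistent with n-weighted means of these complexity rows. For example, RAG success = (86 × 84.9% + 57 × 49.1%) / 143 ≈ 70.6%. A small helper makes this check repeatable; AggregationSketch is an illustrative name.

```csharp
using System.Linq;

public static class AggregationSketch
{
    // n-weighted mean of per-subgroup values: sum(n_i * value_i) / sum(n_i).
    public static double WeightedMean((int N, double Value)[] groups)
    {
        double total = groups.Sum(g => (double)g.N);
        return groups.Sum(g => g.N * g.Value) / total;
    }
}
```

Weighting by n matters here because the RAG and NoRAG arms contain slightly different mixes of simple and complex runs.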
Key findings for RQ2:
- Simple shapes + NoRAG is the fastest configuration (212 s/shape) and achieves 82.7% success — sufficient for basic use cases and faster than all RAG variants.
- RAG improves simple shapes in VLM quality (7.68 → 8.24) and achieves a 100% compile rate, though it takes longer (462 s vs 212 s) due to retrieval overhead. The cost is actually lower ($0.024 vs $0.032) because first-pass compile success reduces refinement iterations.
- Complex shapes are the main challenge: NoRAG complex shapes succeed only 37.3% of the time. RAG raises this to 49.1% and is faster for complex shapes (825 s vs 915 s) because retrieved HLSL context helps the LLM avoid costly compile–fix loops.
- In-KB shapes (whose geometry is represented in the knowledge base) score consistently higher under RAG, as retrieved examples closely match the target. Not-in-KB shapes still benefit from RAG, but the gain is smaller — the retrieved examples provide structural guidance even when the shape is novel.
Higher retrieval similarity (cosine similarity between the query embedding and the nearest KB example) correlates with higher VLM scores, indicating that retrieval quality is a meaningful predictor of generation quality.
Complex shapes are decomposed into more components during the RAG retrieval step. Runs where all components had high-similarity matches tended to succeed on the first iteration.
Per-model summary (across all pipelines and complexities):
| Model | n | Success rate | Avg VLM score | Avg iterations | Avg time (s) | Compile rate | Avg cost / shape | Avg total tokens |
|---|---|---|---|---|---|---|---|---|
| Gemini 3.1 Pro | 75 | 93.3% | 8.53 | 1.59 | 359 | 98.7% | $0.036 | 9,602 |
| Gemini 3 Pro Preview | 40 | 87.5% | 7.58 | 1.73 | 607 | 97.5% | $0.038 | 12,778 |
| Claude Sonnet 4.6 | 79 | 72.2% | 7.13 | 1.80 | 129 | 72.2% | $0.072 | 10,395 |
| GPT-5.4 | 48 | 58.3% | 6.73 | 2.12 | 307 | 100.0% | $0.149 | 16,365 |
| Kimi K2.6 | 58 | 24.1% | 3.81 | 2.62 | 1,473 | 77.6% | $0.025 | 90,288 |
Key findings for RQ3:
- Gemini 3.1 Pro is the most reliable model overall (93.3% success, 98.7% compile rate, highest VLM at 8.53). It uses fewer tokens than the Preview variant and is slightly cheaper at $0.036/shape, making it the best overall choice.
- Gemini 3 Pro Preview achieves 87.5% success with a VLM score of 7.58 at $0.038/shape — nearly identical cost to 3.1 Pro but with lower reliability. Both Gemini models are clear leaders in quality.
- Claude Sonnet 4.6 is by far the fastest model at just 129 s/shape (about 2.4–11× faster than the alternatives). Its success rate (72.2%) lags behind the Gemini models, but it achieves a 100% compile rate on simple shapes. It is the right choice when response speed is the primary constraint.
- GPT-5.4 achieves the highest compile rate (100%) but has the lowest success rate among competitive models (58.3%) and is by far the most expensive at $0.149/shape, roughly 4× the cost of either Gemini model.
- Kimi K2.6 succeeds on only about one quarter of all runs (24.1% success) and is the slowest model (1,473 s/shape) while consuming massive token counts (~90 K tokens/shape). It is not suitable for this task.
VLM scores (GPT-4o Vision) and human evaluator scores show strong positive correlation (r > 0.7), validating the automated evaluation loop as a reliable proxy for human judgement.
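The text does not say which correlation coefficient was computed. Assuming r is the usual Pearson coefficient over paired (VLM score, human score) values, it can be computed as below. CorrelationSketch is an illustrative name.

```csharp
using System;

public static class CorrelationSketch
{
    // Pearson correlation coefficient between paired samples x and y.
    public static double Pearson(double[] x, double[] y)
    {
        if (x.Length != y.Length || x.Length < 2)
            throw new ArgumentException("Need at least two paired samples.");

        double mx = 0, my = 0;
        for (int i = 0; i < x.Length; i++) { mx += x[i]; my += y[i]; }
        mx /= x.Length;
        my /= y.Length;

        // Covariance and variances (the 1/n factors cancel in the ratio).
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < x.Length; i++)
        {
            double dx = x[i] - mx, dy = y[i] - my;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.Sqrt(sxx * syy);
    }
}
```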
There is no simple relationship between generation time and output quality — Claude Sonnet produces competitive results in a fraction of the time, while Kimi spends the most time yet achieves the lowest scores.
Cost summary:
| Model | Avg cost / shape | 100-shape cost estimate |
|---|---|---|
| Kimi K2.6 | $0.025 | ~$2.50 |
| Gemini 3.1 Pro | $0.036 | ~$3.60 |
| Gemini 3 Pro Preview | $0.038 | ~$3.80 |
| Claude Sonnet 4.6 | $0.072 | ~$7.20 |
| GPT-5.4 | $0.149 | ~$14.90 |
Pipeline cost:
| Configuration | Avg cost / shape |
|---|---|
| RAG + Simple | $0.024 |
| NoRAG + Simple | $0.032 |
| NoRAG + Complex | $0.108 |
| RAG + Complex | $0.122 |
Key findings for RQ4:
- RAG Simple is cheaper than NoRAG Simple ($0.024 vs $0.032/shape) — the 100% first-pass compile rate under RAG eliminates expensive refinement loops, more than offsetting the retrieval overhead.
- Complex shapes cost significantly more ($0.108–$0.122/shape) due to longer HLSL generation prompts, more VLM refinement cycles, and larger model outputs.
- GPT-5.4 is the most expensive model at $0.149/shape — primarily because it charges higher per-token rates while achieving only 58.3% success.
- Kimi K2.6 appears cheap on paper ($0.025/shape) but consumes ~90 K tokens per shape. Its low price per token masks its extreme token usage, and its 24.1% success rate (roughly three failed runs for every successful one) means the effective cost per successful shape is far higher.
- Best cost–quality tradeoff: Gemini 3.1 Pro at $0.036/shape with 93.3% success and VLM 8.53.
- Best cost–quality–speed tradeoff overall: Gemini 3.1 Pro for quality-focused work; Claude Sonnet 4.6 for time-constrained workflows.
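A rough way to compare models on cost per usable result is to divide the average cost per run by the success rate. This assumes failed runs cost about as much as successful ones and are simply discarded. With the per-model averages above, it gives roughly:

- $0.039 for Gemini 3.1 Pro
- $0.043 for Gemini 3 Pro Preview
- $0.100 for Claude Sonnet 4.6
- $0.104 for Kimi K2.6
- $0.256 for GPT-5.4

CostPerSuccessSketch is an illustrative name.

```csharp
public static class CostPerSuccessSketch
{
    // Expected spend per accepted shape: avgCostPerRun / successRate.
    // Assumes every run costs the average and failed runs are discarded.
    public static double CostPerSuccess(double avgCostPerRun, double successRate)
        => successRate > 0 ? avgCostPerRun / successRate : double.PositiveInfinity;
}
```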
The experiment confirms that the RAG pipeline meaningfully improves both reliability and output quality over direct (NoRAG) generation, with gains in VLM score (+0.74 points average), success rate (+5 pp), and compile rate (+7.5 pp). The pipeline is not the only variable that matters: model choice introduces a larger performance spread than the RAG/NoRAG decision for capable models like the two Geminis.
Summary of key takeaways:
- RAG is worth it: it raises VLM scores, success rates, and compile rates across all shape groups. For simple shapes RAG is actually cheaper ($0.024 vs $0.032) thanks to higher first-pass compile rates.
- NoRAG is not broken: with a capable model, NoRAG on simple shapes achieves ~83% success and is the fastest configuration (212 s/shape). It is the right choice when speed is the priority.
- For complex shapes, RAG also saves time: fewer refinement iterations under RAG mean it completes complex shapes faster (825 s vs 915 s) despite longer prompts.
- Gemini 3.1 Pro is the clear leader: 93.3% success, VLM 8.53, 98.7% compile rate, and slightly cheaper than 3 Pro Preview at $0.036/shape. It performs consistently better across all shape groups.
- Claude Sonnet is the speed champion at 129 s/shape: about 2.4× faster than the next-fastest model (GPT-5.4 at 307 s/shape), with competitive VLM quality (7.13). Recommended for iterative or real-time-feedback workflows.
- GPT-5.4 is not cost-effective for this task: $0.149/shape with only 58.3% success gives it the highest cost per successful shape of all five models (about $0.26, versus about $0.10 for Kimi K2.6).
- Kimi K2.6 is unsuitable for this task: its 24.1% success rate, 1,473 s/shape latency, and ~90 K token usage make it impractical.
- In-KB shapes benefit more from RAG because retrieval similarity is higher, but even Not-in-KB shapes improve: the retrieved examples provide structural guidance even for novel geometry.
- VLM scoring is a valid proxy for human evaluation (r > 0.7), making the automated quality loop a reliable substitute for manual review at scale.
- Unity 6000.0.41f1 (Unity 6) or later
- Newtonsoft.Json package (via Package Manager: com.unity.nuget.newtonsoft-json)
- API keys for Google Gemini and OpenAI (both required); an Anthropic Claude key is optional
- Clone this repository into your Unity project's Assets/ folder (or open it directly as a Unity project).
- Install Newtonsoft.Json via the Unity Package Manager: Window → Package Manager → Add package by name → com.unity.nuget.newtonsoft-json
- Create the Config asset:
  - Right-click in the Project window and choose Create → ShaderGraphGenerator → Config
  - Name it ShaderGraphGeneratorConfig (or any name; it is found by type)
- Add API keys to the Config asset:
  - openAIKey: required for VLM scoring and image description
  - geminiKey: required for HLSL generation and most LLM calls
  - claudeKey: optional, alternative generation backend
- Open the chatbot: Tools → ShaderGraph Generator → Chat UI
- Open the main generator (standalone mode): Tools → ShaderGraph Generator → 2.9 Unified RAG Generator
All API keys are stored in a ShaderGraphGeneratorConfig ScriptableObject asset:
```csharp
using UnityEngine;

[CreateAssetMenu(menuName = "ShaderGraphGenerator/Config")]
public class ShaderGraphGeneratorConfig : ScriptableObject
{
    public string openAIKey; // GPT-4o Vision: VLM scoring, image descriptions
    public string geminiKey; // Gemini: HLSL generation, classification, property suggestion
    public string claudeKey; // Claude: alternative structured generation (optional)
}
```

The asset can live anywhere in the project; all pipelines find it by type via AssetDatabase.FindAssets("t:ShaderGraphGeneratorConfig").
| Window | Menu Path | Purpose |
|---|---|---|
| Chat UI | Tools / ShaderGraph Generator / Chat UI | Main conversational interface |
| Unified RAG Generator | Tools / ShaderGraph Generator / 2.9 Unified RAG Generator | Direct text/image generation with VLM loop |
| Material Animator | Tools / ShaderGraph Generator / Animator | Generate animation scripts for materials |
| Human Review | Tools / ShaderGraph Generator / Human Review | Score and curate generated shapes |
| Auto Learn | Tools / ShaderGraph Generator / Auto Learn | Ingest successful results into knowledge base |
| RAG Update | Tools / ShaderGraph Generator / Update | Edit an existing shape's HLSL |
| Image to Shader | Tools / ShaderGraph Generator / Image to Shader | Standalone image-to-material pipeline |
| Embedding Generator | Tools / ShaderGraph Generator / Embeddings | Generate/update KB embeddings |
| Asset Type | Path |
|---|---|
| Generated HLSL | Assets/ShaderGraphs/RAG_Generated/{name}.hlsl |
| Updated HLSL | Assets/ShaderGraphs/RAG_Updates/{name}.hlsl |
| Imported HLSL | Assets/ShaderGraphs/Generated/HLSL/{name}.hlsl |
| ShaderGraph | Assets/ShaderGraphs/RAG_Generated/{name}.shadergraph |
| Effect ShaderGraph (Pixelation) | Assets/ShaderGraphs/Effects/{name}_Pixelated.shadergraph |
| Effect ShaderGraph (Glow) | Assets/ShaderGraphs/Effects/{name}_Glow.shadergraph |
| Effect ShaderGraph (Both) | Assets/ShaderGraphs/Effects/{name}_Glow_Pixelated.shadergraph |
| Material | Assets/ShaderGraphs/RAG_Generated/{name}.mat |
| Preview PNG | Assets/ShaderGraphs/Previews/{name}_{iter}.png |
| Animation Script | Assets/ShaderGraphs/Animations/{ClassName}.cs |
| Knowledge Base | Assets/ShaderGraphGenerator/KnowledgeBase/shape_metadata.json |
| Contact Messages | {ProjectRoot}/contact_messages.json |
Google Gemini (GeminiApiService.cs):
- Model: gemini-3-pro-preview
- Endpoint: https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent
- Used for: HLSL code generation, animation script generation, animation/edit classification, property value suggestion
- Vision variant: accepts an inlineData base64 image alongside the text prompt

OpenAI (OpenAIApiService.cs):
- Model: gpt-4o
- Endpoint: https://api.openai.com/v1/chat/completions
- Used for: VLM visual evaluation (1–10 scoring), reference image description, before/after edit comparison
- Also: embedding generation for semantic search (text-embedding-3-small)

Anthropic Claude (ClaudeApiService.cs):
- Model: claude-sonnet-4-5-20250929
- Endpoint: https://api.anthropic.com/v1/messages
- Used for: alternative HLSL generation with structured JSON schema enforcement
- Optional: the system works without a Claude key
Pricing basis: costs use April 2026 list prices:
- gemini-3-pro-preview (treated as Gemini Pro tier): $1.25 / 1M input tokens, $5.00 / 1M output tokens
- gpt-4o: $2.50 / 1M input tokens, $10.00 / 1M output tokens
- text-embedding-3-small: $0.02 / 1M tokens
- Image tokens (GPT-4o): 512 × 512 px ≈ 255 tokens, 256 × 256 px ≈ 85 tokens

Verify current prices at platform.openai.com/pricing and ai.google.dev/pricing.
| Call | Model | Typical Tokens In | Typical Tokens Out | Min Cost | Max Cost |
|---|---|---|---|---|---|
| Gemini: decompose request | Gemini Pro | 1,550–2,600 | 300–600 | $0.003 | $0.006 |
| Gemini: compose HLSL (text) | Gemini Pro | 2,100–5,100 | 800–3,000 | $0.007 | $0.021 |
| Gemini: compose HLSL (+ image) | Gemini Pro | 2,350–5,350 | 800–3,000 | $0.007 | $0.022 |
| Gemini: classify edit / anim | Gemini Pro | 600–1,200 | 100–200 | $0.001 | $0.003 |
| Gemini: update HLSL | Gemini Pro | 2,700–5,000 | 600–2,000 | $0.007 | $0.017 |
| Gemini: generate anim script | Gemini Pro | 1,400–3,000 | 500–1,200 | $0.004 | $0.010 |
| Gemini: suggest prop values | Gemini Pro | 600–1,800 | 100–200 | $0.001 | $0.003 |
| GPT-4o: describe image | GPT-4o | 755–1,000 + 255 img | 500–1,000 | $0.007 | $0.013 |
| GPT-4o: VLM eval (1 image) | GPT-4o | 1,500–2,500 + 255 img | 100 | $0.005 | $0.008 |
| GPT-4o: before/after eval (2 images) | GPT-4o | 300 + 2×85 img | 100 | $0.002 | $0.003 |
| Embedding query (per component) | text-embedding-3-small | 50–200 | — | <$0.001 | <$0.001 |
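Every row in the table uses the same arithmetic. Input and output token counts are each multiplied by their per-million price from the pricing basis, and GPT-4o image inputs count as extra input tokens. Below is a minimal sketch with the prices hard-coded from above. `call_cost` and the `"gemini-pro"` key are hypothetical labels.

```python
# USD per 1M tokens (input, output), copied from the pricing basis above.
PRICES = {
    "gemini-pro": (1.25, 5.00),
    "gpt-4o": (2.50, 10.00),
    "text-embedding-3-small": (0.02, 0.00),
}


def call_cost(model: str, tokens_in: int, tokens_out: int = 0,
              images: int = 0, image_tokens: int = 255) -> float:
    """Cost in USD of one call; each image adds `image_tokens` input tokens (255 for 512 x 512)."""
    price_in, price_out = PRICES[model]
    total_in = tokens_in + images * image_tokens
    return (total_in * price_in + tokens_out * price_out) / 1_000_000
```

For example, `call_cost("gpt-4o", 1500, 100, images=1)` gives about $0.0054, which the VLM eval row rounds to a $0.005 lower bound.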
Decompose prompt size note: the decomposition prompt includes a summary of all 184 knowledge-base (KB) shapes (names + tags, up to 20 per category), which adds ≈ 800–1,000 tokens to every decomposition call.
HLSL composition prompt note: Each retrieved KB example includes the full HLSL source file (≈ 300–800 tokens per file). With 2 examples retrieved, this adds ≈ 600–1,600 tokens per composition call.
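The decomposition note implies a compact catalogue of the knowledge base. Below is a minimal sketch of building such a summary, capped at 20 shapes per category. The record keys `name`, `category` and `tags` are assumptions about `shape_metadata.json`. The 4-characters-per-token estimate is a rough heuristic, not the model's tokenizer.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def build_kb_summary(shapes: Iterable[Dict], per_category: int = 20) -> str:
    """One line per category: 'category: name (tag/tag), ...', at most `per_category` names."""
    by_category: Dict[str, List[Dict]] = defaultdict(list)
    for shape in shapes:
        by_category[shape["category"]].append(shape)
    lines = []
    for category in sorted(by_category):
        entries = by_category[category][:per_category]  # cap the list for prompt size
        listed = ", ".join(f"{s['name']} ({'/'.join(s['tags'])})" for s in entries)
        lines.append(f"{category}: {listed}")
    return "\n".join(lines)


def rough_token_count(text: str) -> int:
    """Crude heuristic: about 4 characters per token for English text."""
    return max(1, len(text) // 4)
```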
| Step | Calls per iteration | Cost per iteration |
|---|---|---|
| Gemini: decompose request | ×1 | $0.003–$0.006 |
| Embedding searches (1–3 components) | ×1–3 | < $0.001 |
| Gemini: compose HLSL | ×1 | $0.007–$0.021 |
| GPT-4o: VLM eval | ×1 | $0.005–$0.008 |
| Total per iteration | | $0.015–$0.035 |
| Scenario | Iterations | Min cost | Max cost |
|---|---|---|---|
| Best case (simple shape, passes first VLM) | 1 | $0.015 | $0.035 |
| Typical (medium complexity, 1 refinement) | 2 | $0.030 | $0.070 |
| Worst case (complex shape, 2 refinements) | 3 | $0.045 | $0.105 |
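Every refinement repeats the same calls, so each scenario row is the per-iteration total multiplied by the iteration count. Below is a small sketch of that arithmetic using the step ranges above. Embedding searches cost under $0.001 and are ignored, as in the table. The function name is hypothetical.

```python
from typing import List, Tuple

Range = Tuple[float, float]

# (min, max) USD per step for one text-path iteration: decompose, compose HLSL, VLM eval.
TEXT_STEPS: List[Range] = [(0.003, 0.006), (0.007, 0.021), (0.005, 0.008)]


def text_generation_cost(iterations: int) -> Range:
    """Scale the per-iteration total by the number of VLM iterations."""
    per_iter_min = sum(lo for lo, _ in TEXT_STEPS)
    per_iter_max = sum(hi for _, hi in TEXT_STEPS)
    return iterations * per_iter_min, iterations * per_iter_max
```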
| Step | Calls | Cost |
|---|---|---|
| GPT-4o: describe reference image | ×1 | $0.007–$0.013 |
| Gemini: decompose description | ×1 per iter | $0.003–$0.006 |
| Embedding searches | ×1–3 per iter | < $0.001 |
| Gemini Vision: compose HLSL | ×1 per iter | $0.007–$0.022 |
| GPT-4o: VLM eval | ×1 per iter | $0.005–$0.008 |
| Scenario | Iterations | Min cost | Max cost |
|---|---|---|---|
| Best case (image is simple, passes first VLM) | 1 | $0.022 | $0.049 |
| Typical (1 refinement) | 2 | $0.037 | $0.084 |
| Worst case (2 refinements) | 3 | $0.052 | $0.119 |
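The image path adds a one-time GPT-4o description call on top of a per-iteration loop that uses the Gemini Vision composition step. Below is the same kind of sketch, again ignoring embedding searches. The function name is hypothetical.

```python
from typing import List, Tuple

Range = Tuple[float, float]

DESCRIBE_IMAGE: Range = (0.007, 0.013)  # one-time GPT-4o description of the reference image
# Per iteration: decompose, Gemini Vision compose, VLM eval.
IMAGE_STEPS: List[Range] = [(0.003, 0.006), (0.007, 0.022), (0.005, 0.008)]


def image_generation_cost(iterations: int) -> Range:
    """One description call plus `iterations` rounds of the per-iteration steps."""
    lo = DESCRIBE_IMAGE[0] + iterations * sum(s[0] for s in IMAGE_STEPS)
    hi = DESCRIBE_IMAGE[1] + iterations * sum(s[1] for s in IMAGE_STEPS)
    return lo, hi
```

This reproduces the best-case row exactly. For 2 and 3 iterations it gives upper bounds of $0.085 and $0.121, within a fraction of a cent of the table, which appears to round the per-step maxima slightly differently.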
| Step | Cost |
|---|---|
| Gemini: suggest property values | $0.001–$0.003 |
| Total | $0.001–$0.003 |
This path has no VLM evaluation — result is immediate.
Sub-path A — Property change only (no shader rewrite):
| Step | Cost |
|---|---|
| Gemini: classify edit request | $0.001–$0.003 |
| Total | $0.001–$0.003 |
Sub-path B — HLSL rewrite needed:
| Step | Calls | Cost |
|---|---|---|
| Gemini: classify edit | ×1 | $0.001–$0.003 |
| Gemini: update HLSL | ×1–2 | $0.007–$0.017 per iter |
| GPT-4o: before/after VLM eval | ×1–2 | $0.002–$0.003 per iter |
| Scenario | Iterations | Min cost | Max cost |
|---|---|---|---|
| Best case (property change only) | — | $0.001 | $0.003 |
| HLSL update, passes first try | 1 | $0.010 | $0.023 |
| HLSL update, 2 VLM iterations | 2 | $0.019 | $0.043 |
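The edit scenarios combine one classification call with a per-iteration pair of calls: an HLSL update and a before/after evaluation. The sketch below assumes exactly one before/after evaluation per update iteration. Zero iterations corresponds to Sub-path A. The function name is hypothetical.

```python
from typing import Tuple

Range = Tuple[float, float]

CLASSIFY: Range = (0.001, 0.003)       # always called once
UPDATE_HLSL: Range = (0.007, 0.017)    # per HLSL update iteration
BEFORE_AFTER: Range = (0.002, 0.003)   # assumed: one before/after eval per update iteration


def edit_cost(hlsl_iterations: int) -> Range:
    """hlsl_iterations = 0 is Sub-path A (property change only)."""
    lo = CLASSIFY[0] + hlsl_iterations * (UPDATE_HLSL[0] + BEFORE_AFTER[0])
    hi = CLASSIFY[1] + hlsl_iterations * (UPDATE_HLSL[1] + BEFORE_AFTER[1])
    return lo, hi
```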
Sub-path A — C# script only (existing properties sufficient):
| Step | Cost |
|---|---|
| Gemini: classify animation | $0.001–$0.003 |
| Embedding search (animation KB) | < $0.001 |
| Gemini: generate C# animation script | $0.004–$0.010 |
| Total | $0.005–$0.013 |
Sub-path B — HLSL update required first (new shader properties needed):
| Step | Calls | Cost |
|---|---|---|
| Gemini: classify animation | ×1 | $0.001–$0.003 |
| Gemini: update HLSL | ×1–2 | $0.007–$0.017 per iter |
| GPT-4o: before/after VLM eval | ×1–2 | $0.002–$0.003 per iter |
| Embedding search (animation KB) | ×1 | < $0.001 |
| Gemini: generate C# animation script | ×1 | $0.004–$0.010 |
| Scenario | Min cost | Max cost |
|---|---|---|
| C# only (no HLSL needed) | $0.005 | $0.013 |
| HLSL update needed, 1 VLM iteration | $0.014 | $0.033 |
| HLSL update needed, 2 VLM iterations | $0.023 | $0.053 |
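The animation scenarios follow the same pattern, with the C# script generation call added once at the end. The sketch below makes the same assumption of one before/after evaluation per HLSL update iteration and ignores the animation KB embedding search, which costs under $0.001. The function name is hypothetical.

```python
from typing import Tuple

Range = Tuple[float, float]

CLASSIFY: Range = (0.001, 0.003)       # classify animation request, once
SCRIPT: Range = (0.004, 0.010)         # generate the C# animation script, once
UPDATE_HLSL: Range = (0.007, 0.017)    # per HLSL update iteration (Sub-path B only)
BEFORE_AFTER: Range = (0.002, 0.003)   # assumed: one before/after eval per update iteration


def animation_cost(hlsl_iterations: int = 0) -> Range:
    """hlsl_iterations = 0 is Sub-path A (C# script only)."""
    lo = CLASSIFY[0] + SCRIPT[0] + hlsl_iterations * (UPDATE_HLSL[0] + BEFORE_AFTER[0])
    hi = CLASSIFY[1] + SCRIPT[1] + hlsl_iterations * (UPDATE_HLSL[1] + BEFORE_AFTER[1])
    return lo, hi
```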
No API calls. ShaderGraph nodes are injected deterministically (UV quantisation via Floor/Divide).
| Step | Cost |
|---|---|
| Total | $0.000 |
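UV quantisation snaps every UV inside a grid cell to the same value. The SDF is therefore effectively evaluated once per cell, and the shape renders as blocks. Below is a minimal sketch of the Floor/Divide arithmetic. It assumes the graph multiplies UV by a cell count before the Floor and divides by the same count afterwards. `cells` is a hypothetical parameter name.

```python
import math
from typing import Tuple


def pixelate_uv(u: float, v: float, cells: int) -> Tuple[float, float]:
    """Snap a UV coordinate to the lower-left corner of its cell in a cells x cells grid."""
    return math.floor(u * cells) / cells, math.floor(v * cells) / cells
```

Snapping to the cell's lower-left corner is the simplest variant. Adding 0.5 to the floored value before dividing would sample cell centres instead.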
No API calls. ShaderGraph is regenerated with HDR colour properties and a glowIntensity multiply node. A URP Bloom post-processing volume is added to the scene automatically.
| Step | Cost |
|---|---|
| Total | $0.000 |
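The HDR switch matters because the glow multiply pushes colour values above 1.0, and Bloom reacts to pixels brighter than its threshold. Below is a rough numeric sketch of that interaction. The threshold passed in is a placeholder, not the value the tool configures. Real URP Bloom filtering is also smoother than this hard cut-off.

```python
from typing import Tuple

RGB = Tuple[float, ...]


def apply_glow(rgb: RGB, glow_intensity: float = 2.0) -> RGB:
    """Multiply an HDR base colour by glowIntensity before it reaches Base Color."""
    return tuple(c * glow_intensity for c in rgb)


def exceeds_bloom_threshold(rgb: RGB, threshold: float) -> bool:
    """Hard cut-off approximation: the brightest channel must exceed the threshold."""
    return max(rgb) > threshold
```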
| Session type | Paths used | Estimated total |
|---|---|---|
| Light — one simple shape, one property edit, one animation (C# only) | Text gen (1 iter) + Edit A + Anim A | ~$0.021–$0.051 |
| Standard — image gen with 1 refinement, HLSL edit, animation (C# only) | Image gen (2 iter) + Edit B (1 iter) + Anim A | ~$0.052–$0.120 |
| Heavy — complex shape (3 iters), HLSL edit (2 iters), HLSL-needed animation (2 iters) | Text gen (3 iter) + Edit B (2 iter) + Anim B (2 iter) | ~$0.087–$0.201 |
| Exploration — 5 text generations + 2 edits + 2 animations | 5×Text(avg 2 iter) + 2×Edit(mix) + 2×Anim(mix) | ~$0.200–$0.550 |
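Each session estimate is the sum of the ranges of its paths. The sketch below computes the Light session, with ranges copied from the tables above. `add_ranges` is a hypothetical helper.

```python
from typing import Tuple

Range = Tuple[float, float]


def add_ranges(*ranges: Range) -> Range:
    """Sum the min and max bounds of several cost ranges."""
    return sum(r[0] for r in ranges), sum(r[1] for r in ranges)


TEXT_GEN_1_ITER: Range = (0.015, 0.035)
EDIT_A: Range = (0.001, 0.003)
ANIM_A: Range = (0.005, 0.013)

light_session = add_ranges(TEXT_GEN_1_ITER, EDIT_A, ANIM_A)
print(f"Light session: ${light_session[0]:.3f}-${light_session[1]:.3f}")
```

The other rows can be computed the same way by passing the matching path ranges to `add_ranges`.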
| Activity | Cost |
|---|---|
| Generate embeddings for all 184 KB shapes | ~$0.001 (one-time) |
| Each new shape added to KB (embedding) | ~$0.000004 per shape |
Knowledge base embedding is essentially free — the entire 184-shape KB costs less than $0.001 to fully re-embed.
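The knowledge-base figures can be reproduced by assuming roughly 200 tokens of text per shape entry. This length is an assumption; the real entry length is not stated here.

```python
EMBED_PRICE_PER_M = 0.02  # USD per 1M tokens, text-embedding-3-small


def embedding_cost(num_shapes: int, tokens_per_shape: int = 200) -> float:
    """USD cost of embedding `num_shapes` entries; 200 tokens each is an assumed average."""
    return num_shapes * tokens_per_shape * EMBED_PRICE_PER_M / 1_000_000
```

Under that assumption, re-embedding all 184 shapes costs about $0.0007 and each new shape about $0.000004, consistent with the table.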
- Reduce VLM iterations: lower `maxVlmIterations` from 3 to 1 in `RAGPipelineManager` and `HLSLUpdatePipelineManager` if speed and cost matter more than quality.
- Use text prompts over images: the image path costs $0.007–$0.013 more per session because of the GPT-4o image description call.
- Simple edits are cheap: if your shape already exposes the right properties, an edit is a single Gemini classification call (~$0.002).
- Effects are free: neither the Pixelation nor the Glow effect makes any LLM call.
- Animation without HLSL changes: Sub-path A (C# only) costs ~$0.005–$0.013, versus ~$0.014–$0.053 for the HLSL-update path (1–2 VLM iterations).
Assets/
├── ShaderGraphGenerator/ ← All tool source code
│ ├── Editor/ ← Unity editor-only scripts
│ │ ├── Chat/ ← Chatbot state machine + HTTP bridge
│ │ ├── Core/ ← APIs, material helpers, prompt builders
│ │ ├── KnowledgeBase/ ← Embedding search, KB management
│ │ └── RAG/ ← All generation pipelines + windows
│ │ ├── Animation/ ← Animation pipeline + C# script generation
│ │ ├── Curation/ ← KB ingestion and management
│ │ ├── Edit/ ← Edit classification
│ │ ├── Generation/ ← RAG composition engine
│ │ ├── Pipelines/ ← Top-level pipeline orchestrators
│ │ └── Windows/ ← All EditorWindow UIs
│ └── KnowledgeBase/ ← shape_metadata.json + embeddings
│
├── ShaderGraphs/
│ ├── RAG_Generated/ ← Generated HLSL, shadergraphs, materials
│ ├── RAG_Updates/ ← Edited/updated versions
│ ├── Effects/ ← Effect variants (pixelation, etc.)
│ ├── Generated/ ← HLSL imports + their shadergraphs
│ ├── Animations/ ← Generated C# animation MonoBehaviours
│ ├── Previews/ ← PNG screenshots of generated shapes
│ └── SuccessfulResults/ ← Manually curated HLSL library
│
└── ShaderGraphGeneratorConfig.asset ← API keys (do not commit)
| Component | Technology |
|---|---|
| Runtime | Unity 6 (6000.0.41f1), C# 9 |
| Editor UI | Unity IMGUI (EditorWindow, GUILayout) |
| HTTP Server | System.Net.HttpListener on localhost:7723 |
| Async | async/await + Task, CancellationToken |
| JSON | Newtonsoft.Json with custom JsonConverter |
| LLM — Code Gen | Google Gemini (gemini-3-pro-preview) |
| LLM — Vision | OpenAI GPT-4o (gpt-4o) |
| LLM — Structured | Anthropic Claude (claude-sonnet-4-5) |
| Embeddings | OpenAI text-embedding-3-small |
| Shader Format | Unity ShaderGraph JSON (custom node wiring) |
| Shader Language | HLSL (Custom Function nodes in ShaderGraph) |
| Persistence | EditorPrefs (domain reload safety), JSON files |
- API keys: never commit `ShaderGraphGeneratorConfig.asset` to a public repository; add it to `.gitignore`.
- Domain reloads: the animation pipeline survives Unity's script-compilation domain reload by serialising its state to `EditorPrefs` before triggering `AssetDatabase.Refresh()`.
- VLM threshold: a shape is accepted when its score is > 7 out of 10. Shapes scoring 7 or lower are automatically refined with the VLM's feedback, up to the `maxVlmIterations` limit (default 3).
- Knowledge base growth: every shape accepted through Human Review (score ≥ 8) can be added to `shape_metadata.json` via the Auto Learn window, improving future RAG retrievals.
- Pixelation: implemented as a deterministic ShaderGraph modification (UV quantisation via Floor/Divide nodes), not an LLM call. Takes ~2 seconds.
- Glow: implemented as a deterministic ShaderGraph modification. All colour properties are switched to HDR mode (`colorMode=1`), a `glowIntensity` multiply node (default 2) is inserted before Base Color, and a global URP Bloom post-processing volume (intensity = 2) is added to the scene. No LLM call. Requires URP with post-processing enabled on the camera.
- Effect stacking: Pixelation and Glow can be applied on top of each other in any order. The system detects the existing effect and regenerates the ShaderGraph with both flags active, outputting a combined `_Glow_Pixelated` variant.
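The threshold rule above describes a generate, evaluate and refine loop. Below is a minimal sketch with the HLSL generator and the VLM evaluator passed in as callables. The callables, the returned tuple and the choice to keep the last attempt are assumptions. A score of 8 or more only marks a candidate for Human Review; it does not add the shape to the knowledge base automatically.

```python
from typing import Callable, Optional, Tuple

ACCEPT_ABOVE = 7   # a shape is accepted when its score is > 7 out of 10
KB_MIN_SCORE = 8   # Human Review candidates for the knowledge base need score >= 8


def refine_until_accepted(
    generate: Callable[[Optional[str]], str],
    evaluate: Callable[[str], Tuple[int, str]],
    max_vlm_iterations: int = 3,
) -> Tuple[str, int, int, bool]:
    """Return (shader, score, iterations_used, kb_candidate) for the last attempt."""
    feedback: Optional[str] = None
    shader, score, iteration = "", 0, 0
    for iteration in range(1, max_vlm_iterations + 1):
        shader = generate(feedback)          # first call receives no feedback
        score, feedback = evaluate(shader)   # VLM score plus textual critique
        if score > ACCEPT_ABOVE:
            break                            # accepted: stop refining
    return shader, score, iteration, score >= KB_MIN_SCORE
```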
Master Thesis — Niloufar Moradijam — 2026