niloufarmj/MasterThesis-GenerativeProceduralShapes

Generative Procedural Shapes — AI-Powered Shader Generator for Unity

Master Thesis Project — An AI-assisted pipeline for generating, editing, animating, and applying effects to procedural 2D shapes in Unity using HLSL shaders, ShaderGraph, and large language models.


Overview

This Unity Editor tool enables non-programmers to create, edit, animate, and apply effects to 2D procedural shapes entirely through natural language — no HLSL coding required.

The system combines a RAG (Retrieval-Augmented Generation) pipeline, multiple LLM providers (Google Gemini, OpenAI GPT-4o, Anthropic Claude), and an interactive chatbot interface to guide users through the full asset creation lifecycle.

All shapes are built on Signed Distance Fields (SDF) — mathematical functions that return the distance from any point to a shape's boundary, enabling smooth, resolution-independent procedural geometry entirely in HLSL with no textures.

SDF Concept

SDF concept: for a point p, the sign of f(p) says where p lies. f(p) < 0 means p is inside the shape, f(p) = 0 means p is on the boundary, and f(p) > 0 means p is outside. Every generated HLSL shader composes multiple SDF primitives to construct the final shape.
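
As a minimal illustration of this sign convention, the sketch below evaluates a circle SDF and combines two circles with a union (the pointwise minimum of the two distances). The generated shaders are HLSL; Python is used here only so the arithmetic is easy to check, and the function names are hypothetical rather than taken from the repository.

```python
import math

def sd_circle(px: float, py: float, radius: float) -> float:
    """Signed distance from point (px, py) to a circle centred at the origin."""
    return math.hypot(px, py) - radius

def sd_union(d1: float, d2: float) -> float:
    """Union of two shapes: the smaller signed distance wins."""
    return min(d1, d2)

def two_circles(px: float, py: float) -> float:
    """A large circle at the origin joined with a small circle centred at (1, 0)."""
    big = sd_circle(px, py, 0.5)
    small = sd_circle(px - 1.0, py, 0.25)  # translate by shifting the sample point
    return sd_union(big, small)

inside = sd_circle(0.1, 0.0, 0.5)    # negative: the point is inside
on_edge = sd_circle(0.5, 0.0, 0.5)   # zero: the point is on the boundary
outside = sd_circle(1.0, 0.0, 0.5)   # positive: the point is outside
```

Composite shapes follow the same pattern: each primitive is evaluated at a transformed sample point, and the results are merged with operators such as the union above.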

User Prompt / Reference Image
        ↓
  Shape Decomposition
        ↓
  Knowledge Base Retrieval (184+ shapes, embedding search)
        ↓
  LLM HLSL Code Generation (Gemini)
        ↓
  ShaderGraph + Material Build
        ↓
  VLM Visual Evaluation (GPT-4o): iterates until score ≥ 7/10, max 3 attempts
        ↓
  Preview Quad in Scene + PNG Screenshot
        ↓
  Edit / Animate / Add Effects (follow-up flows)

Generated Shape Gallery

All shapes below are purely procedural HLSL — no textures, no sprites. Each is generated end-to-end from a natural language prompt, rendered in Unity's Universal Render Pipeline at 512 × 512 px. Colours, proportions, and style parameters are adjustable through exposed ShaderGraph properties without touching any code.

  • Fox Face, Procedural Sword, Cartoon Sunflower
  • Cartoon Mushroom, Ice Cream Cone, Watermelon Slice
  • Stylized Christmas Tree, Cartoon Rainbow, Procedural Hot Dog
  • Cute Star, Cartoon Cactus, Cartoon UFO

Features

| Feature | Description |
| --- | --- |
| Text-to-Shape | Describe any 2D shape in natural language; get an HLSL shader + Unity material |
| Image-to-Shape | Upload a reference image; Gemini Vision recreates it as a procedural shader |
| HLSL Import | Upload an existing .hlsl file; the tool wraps it in ShaderGraph with sensible property values |
| Shape Editing | AI classifies whether an edit needs a property tweak or full HLSL rewrite, then applies it |
| Animation | AI generates a C# MonoBehaviour to animate material properties; adds new shader properties when needed |
| Pixelation Effect | Client-side ShaderGraph node injection (Floor/Divide UV quantisation); no LLM call required |
| Glow Effect | Client-side ShaderGraph modification: all colour properties become HDR, a glowIntensity multiplier node is injected before Base Color, and a URP Bloom post-processing volume is added to the scene; no LLM call required |
| Stacked Effects | Pixelation and Glow can be combined; applying one effect on top of the other preserves both |
| VLM Quality Loop | GPT-4o scores rendered previews 1–10; automatically refines until threshold is met |
| RAG Knowledge Base | 184+ verified shapes with embeddings for semantic retrieval; grows with each accepted generation |
| Chatbot UI | Conversational editor window with state machine, quick replies, material/image pickers |
| Human Review | EditorWindow for scoring, accepting, and curating generated shapes into the knowledge base |

Architecture

System Architecture Diagram

Each generated HLSL shader is automatically wired into a Unity ShaderGraph with all parameters exposed as editable material properties:

| Generated ShaderGraph | Exposed Material Properties |
| --- | --- |
| [Image: ShaderGraph node graph] | [Image: Sword material inspector] |
| Auto-generated ShaderGraph node graph for a procedural shape | Exposed shader properties in the Unity Inspector, adjustable without touching code |

Source layout:

Assets/ShaderGraphGenerator/
├── ShaderGraphJSONGenerator.cs      ← HLSL → ShaderGraph JSON (with optional pixelation / glow nodes)
├── ShaderGraphNodeFactory.cs        ← Creates typed ShaderGraph node objects
├── ShaderGraphPropertyFactory.cs    ← Creates shader property definitions
├── ShaderGraphSlotFactory.cs        ← Creates node input/output slot definitions
├── HLSLFunctionInfo.cs              ← Parsed HLSL function metadata
├── FunctionParameter.cs             ← Single parameter (name, type, direction)
│
├── Editor/
│   ├── Core/
│   │   ├── API/
│   │   │   ├── GeminiApiService.cs      ← Gemini text & vision (HLSL generation, classification)
│   │   │   ├── OpenAIApiService.cs      ← GPT-4o Vision (VLM scoring, image description)
│   │   │   └── ClaudeApiService.cs      ← Claude (alternative structured HLSL generation)
│   │   ├── LLMDataModels.cs             ← Serialisable LLM request/response structures
│   │   ├── MaterialPreviewHelper.cs     ← Preview quads, screenshots, property application
│   │   ├── ShaderGenerationPipeline.cs  ← Core: HLSL → ShaderGraph → Material → Preview
│   │   └── ShaderPromptBuilder.cs       ← Prompt construction for shape generation
│   │
│   ├── Chat/
│   │   ├── ChatbotWindow.cs             ← IMGUI chat window (bubbles, quick replies, pickers)
│   │   ├── ChatBridge.cs                ← HTTP server (port 7723) + all pipeline triggers
│   │   ├── ChatBridgeLocal.cs           ← Direct (non-HTTP) entry point for the IMGUI window
│   │   └── ChatSession.cs               ← Static state machine (26 states) + message history
│   │
│   ├── KnowledgeBase/
│   │   ├── SemanticShapeSearch.cs       ← Cosine-similarity search over embedding vectors
│   │   ├── ShapeEmbeddingService.cs     ← Generates embeddings via OpenAI API
│   │   ├── ShapeMetadata.cs             ← Data structures: ShapeMetadata, AnimatorEntry, etc.
│   │   ├── HLSLParser.cs                ← Parses HLSL to extract function signatures
│   │   └── KnowledgeBaseLLMService.cs   ← LLM-powered shape analysis and metadata extraction
│   │
│   └── RAG/
│       ├── Pipelines/
│       │   ├── RAGPipelineManager.cs           ← Full text-to-shape pipeline (7 steps + VLM loop)
│       │   ├── ImageToShaderPipelineManager.cs ← Image-to-shader with Gemini Vision
│       │   └── HLSLUpdatePipelineManager.cs    ← Edit pipeline: before/after VLM comparison
│       ├── Generation/
│       │   ├── RAGShapeGenerator.cs            ← Decompose → retrieve → LLM compose
│       │   ├── ShapeDecompositionService.cs    ← Breaks complex shapes into components
│       │   ├── ShaderGraphBuilder.cs           ← HLSL + LLM response → ShaderGraph + Material
│       │   └── HLSLCompositionEngine.cs        ← Merges multiple HLSL primitives
│       ├── Animation/
│       │   ├── MaterialAnimatorPipelineManager.cs ← C# animation script pipeline (domain-reload safe)
│       │   ├── AnimationScriptGenerator.cs        ← LLM C# generation + animation classification
│       │   └── AnimationKnowledgeBase.cs          ← Animation KB helpers + embedding search
│       ├── Edit/
│       │   └── EditClassifier.cs               ← Classifies edits: property-only vs HLSL update
│       ├── Curation/
│       │   └── KnowledgeBaseUpdater.cs         ← Ingests shapes into knowledge base
│       └── Windows/
│           ├── UnifiedGeneratorWindow.cs       ← Main RAG generator UI (text + image modes)
│           ├── MaterialAnimatorWindow.cs       ← Animation generation UI
│           ├── RAGHumanReviewWindow.cs         ← Human scoring and KB curation
│           ├── RAGAutoLearnWindow.cs           ← Auto-ingest successful results
│           ├── RAGUpdateWindow.cs              ← Shape editing UI
│           └── ImageToShaderWindow.cs          ← Image upload + generation controls

Pipeline Flows

1. Text-to-Shape (RAG Pipeline)

User text prompt
→ ShapeDecompositionService      (break into visual components)
→ SemanticShapeSearch            (find top-2 KB examples per component)
→ RAGShapeGenerator              (build augmented prompt with retrieved HLSL)
→ GeminiApiService               (generate new HLSL code)
→ ShaderGraphBuilder             (HLSL → .shadergraph JSON)
→ MaterialPreviewHelper          (create .mat + render 512×512 PNG)
→ OpenAIApiService               (VLM score 1–10; if < 7 refine, max 3 tries)
→ Result: .hlsl + .shadergraph + .mat + preview PNG
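
The scoring step at the end of this flow can be sketched as a bounded refinement loop. The version below is a hedged illustration, not the repository's code: `generate` and `score` are hypothetical stand-ins for the Gemini and GPT-4o calls, and returning the best attempt when no attempt passes is an assumption.

```python
from typing import Callable, Tuple

SCORE_THRESHOLD = 7   # pass mark stated in the pipeline description
MAX_ITERATIONS = 3    # maximum attempts per run

def refine_until_accepted(
    generate: Callable[[str, str], str],
    score: Callable[[str], Tuple[int, str]],
    prompt: str,
) -> Tuple[str, int, int]:
    """Return (hlsl, score, iterations_used) for the accepted or best attempt."""
    feedback = ""
    best_hlsl, best_score = "", 0
    for iteration in range(1, MAX_ITERATIONS + 1):
        hlsl = generate(prompt, feedback)
        value, feedback = score(hlsl)       # VLM score plus textual critique
        if value > best_score:
            best_hlsl, best_score = hlsl, value
        if value >= SCORE_THRESHOLD:
            return hlsl, value, iteration   # accepted: stop early
    return best_hlsl, best_score, MAX_ITERATIONS  # assumption: keep the best try

# Stand-in callables so the sketch runs without any API access.
_fake_scores = iter([4, 6, 8])

def fake_generate(prompt: str, feedback: str) -> str:
    return f"// HLSL for {prompt} ({feedback or 'first try'})"

def fake_score(hlsl: str) -> Tuple[int, str]:
    return next(_fake_scores), "tighten proportions"

result = refine_until_accepted(fake_generate, fake_score, "cartoon cactus")
```

With the stand-in scores 4, 6, 8, the loop accepts the third attempt, which mirrors the "max 3 tries" limit above.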

Decomposition and Retrieval

Stages 1–4 of the RAG pipeline: the user prompt is decomposed into visual components (e.g. "tall cactus body", "flower pot with rim", "small oval spines"), each component is matched against the 184-shape Knowledge Base via embedding search, and the retrieved HLSL examples are passed to Gemini to compose the final shader.
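
The retrieval step can be illustrated with a small cosine-similarity search. This is a sketch under stated assumptions: the three-dimensional vectors are toy values (real embeddings are much longer), and `top_k` is a hypothetical helper, not the SemanticShapeSearch API.

```python
import math
from typing import Dict, List, Sequence, Tuple

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query: Sequence[float],
          knowledge_base: Dict[str, Sequence[float]],
          k: int = 2) -> List[Tuple[str, float]]:
    """Return the k entries most similar to the query, best first."""
    scored = [(name, cosine_similarity(query, vec))
              for name, vec in knowledge_base.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]

# Toy embeddings for illustration only.
toy_kb = {
    "RoundedRectangle": [1.0, 0.0, 0.0],
    "Teardrop": [0.0, 1.0, 0.0],
    "Trapezoid": [0.8, 0.6, 0.0],
}
matches = top_k([1.0, 0.1, 0.0], toy_kb)
```

Here `k = 2` matches the "top-2 KB examples per component" setting in the flow above; each decomposed component would be embedded and queried separately.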

2. Image-to-Shape

Reference image (PNG/JPG)
→ OpenAIApiService.DescribeImage (GPT-4o: detailed visual description)
→ ShapeDecompositionService      (decompose description)
→ SemanticShapeSearch            (retrieve KB examples)
→ GeminiApiService.Vision        (generate HLSL with both text + original image)
→ (same as text pipeline from ShaderGraphBuilder onward)

3. HLSL Import (Chatbot)

User uploads .hlsl file
→ Copy to Assets/ShaderGraphs/Generated/HLSL/
→ ShaderGraphJSONGenerator.GenerateFromHLSL (no pixelation)
→ MaterialPreviewHelper.CreateMaterialForShaderGraph
→ MaterialPreviewHelper.SetRandomMaterialProperties  (safe baseline)
→ GeminiApiService (read HLSL → suggest sensible property values)
→ MaterialPreviewHelper.SetDefaultMaterialProperties (override with LLM values)
→ MaterialPreviewHelper.CreatePreviewQuad
→ Result: shadergraph + material with good defaults + preview

4. Shape Editing

User edit request + material
→ EditClassifier.ClassifyEditRequestAsync (Gemini)
   ├── needs_hlsl_change = false → ApplyMaterialPropertyChanges (SetFloat/SetColor/SetVector)
   │                             → Preview quad in scene
   └── needs_hlsl_change = true  → HLSLUpdatePipelineManager
                                    (extract HLSL → Gemini update → before/after VLM verify)
                                  → Preview quad + result image in chat

Editing example — scarf pattern updated via a single natural language instruction:

| Before | After |
| --- | --- |
| [Image: Snowman before edit] | [Image: Snowman after edit] |
| "A cartoon snowman" | "Make the scarf striped" |

5. Animation

User animation request + material
→ AnimationScriptGenerator.ClassifyAnimationRequirementsAsync (Gemini)
   ├── C# only → AnimationScriptGenerator.GenerateAnimationScriptAsync
   │            → Write .cs → Unity recompiles → [domain reload] → attach to preview quad
   └── HLSL needed → HLSLUpdatePipelineManager (add missing properties)
                   → AnimationScriptGenerator.GenerateAnimationScriptAsync (on updated material)
                   → Write .cs → domain reload → attach to quad

6. Pixelation Effect (no LLM)

Source material
→ HLSLUpdatePipelineManager.ExtractHlslPathFromMaterial
→ detect if source already has Glow (preserves it if so)
→ ShaderGraphJSONGenerator.GenerateFromHLSL (..., usePixelation: true, useGlow: <preserved>)
   (injects: UV × PixelCount → Floor → ÷ PixelCount nodes)
→ MaterialPreviewHelper.CreateMaterialForShaderGraph
→ CopyMatchingMaterialProperties (original → effect)
→ SetFloat("PixelCount", 64)
→ CreatePreviewQuad → named "Pixelation Effect — {name}" in scene
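
The injected node chain (multiply by PixelCount, floor, divide by PixelCount) is plain UV quantisation. The sketch below reproduces the arithmetic in Python for clarity; the function name is hypothetical, and the real effect runs as ShaderGraph nodes on the GPU.

```python
import math
from typing import Tuple

def pixelate_uv(u: float, v: float, pixel_count: float = 64.0) -> Tuple[float, float]:
    """Snap a UV coordinate to the corner of its cell: floor(uv * n) / n."""
    return (math.floor(u * pixel_count) / pixel_count,
            math.floor(v * pixel_count) / pixel_count)

# Two nearby samples fall into the same 1/64 cell, so they map to the same UV
# and the shape function is evaluated at the same point for both.
a = pixelate_uv(0.500, 0.260)
b = pixelate_uv(0.510, 0.262)
```

Because every sample inside a cell reads the shape at one shared coordinate, the output appears as uniform blocks; lowering PixelCount makes the blocks larger.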

7. Glow Effect (no LLM)

Source material
→ HLSLUpdatePipelineManager.ExtractHlslPathFromMaterial
→ detect if source already has Pixelation (preserves it if so)
→ ShaderGraphJSONGenerator.GenerateFromHLSL (..., useGlow: true, usePixelation: <preserved>)
   (converts all colour properties to HDR colorMode=1)
   (injects: FinalColor → Multiply(A) × glowIntensity(B) → BaseColor)
→ MaterialPreviewHelper.CreateMaterialForShaderGraph
→ CopyMatchingMaterialProperties (original colors + floats → effect)
→ SetFloat("glowIntensity", 2)
→ EnsureBloomVolume() — finds or creates a global URP Volume with Bloom, intensity=2
→ CreatePreviewQuad → named "Glow Effect — {name}" in scene

Stacking rules:

  • Applying Glow to a Pixelated material → output named {base}_Glow_Pixelated
  • Applying Pixelation to a Glowed material → output named {base}_Glow_Pixelated
  • Both effects are always re-generated from the original HLSL source, so stacking is lossless
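
The stacking rules amount to an order-independent naming scheme. The following sketch shows one way to express it; the helper names and the set-based bookkeeping are assumptions made for illustration, not the tool's implementation.

```python
from typing import Set

def effect_asset_name(base: str, glow: bool, pixelated: bool) -> str:
    """Build the output name; the suffix order is fixed as _Glow then _Pixelated."""
    suffix = ""
    if glow:
        suffix += "_Glow"
    if pixelated:
        suffix += "_Pixelated"
    return base + suffix

def apply_effect(base: str, current_effects: Set[str], new_effect: str) -> str:
    """Add one effect to whatever is already active and return the resulting name."""
    effects = set(current_effects) | {new_effect}
    return effect_asset_name(base, "glow" in effects, "pixelation" in effects)

glow_on_pixelated = apply_effect("CuteStar", {"pixelation"}, "glow")
pixelation_on_glow = apply_effect("CuteStar", {"glow"}, "pixelation")
```

Both calls yield the same name, which is consistent with regenerating the graph from the original HLSL with both flags set rather than layering one output on another.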

Visual effects example — same Cute Star shape with each post-process applied, including the stacked Glow + Pixelated variant:

| Original | Pixelation Effect | Glow Effect | Glow + Pixelated |
| --- | --- | --- | --- |
| [Image: CuteStar original] | [Image: CuteStar pixelated] | [Image: CuteStar glow] | [Image: CuteStar glow and pixelated] |
| Original shader | Pixelation (PixelCount = 64) | Glow (glowIntensity = 2, URP Bloom) | Both effects stacked (_Glow_Pixelated) |

Neither effect makes an LLM call, so both incur no API cost and take under 2 seconds to apply.


Chatbot Interface

The chatbot (ChatbotWindow.cs + ChatBridge.cs) is driven by a 26-state machine:

Chatbot State Machine

MainMenu
├── new_shape → NewShape_InputMode → [Text | Image] → Generating → Reviewing → PostGen
│                                                                              ├── edit   → Edit_Describe → Edit_Running → PostGen
│                                                                              ├── animate→ Animate_Describe → Animate_Running → PostGen
│                                                                              └── effect → Effect_Pick → Animate_Running → PostGen
├── edit    → Edit_Attach   → Edit_Describe   → Edit_Running   → PostGen
├── animate → Animate_Attach → Animate_Describe → Animate_Running → PostGen
├── effect  → Effect_Attach → Effect_Pick → Animate_Running → PostGen
├── hlsl    → HLSL_Attach   → HLSL_Running   → HLSL_Done
├── explain → Explain → MainMenu
└── contact → Contact_Name → Contact_Intent → Contact_Email → Contact_Message → MainMenu
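
A state machine of this kind can be represented as a transition table. The sketch below covers only the edit branch of the diagram; state names come from the diagram, while the generic "next" event and the rule that unknown events leave the state unchanged are assumptions.

```python
from typing import Dict, Iterable, Tuple

# (current state, event) -> next state, for a small subset of the 26 states.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("MainMenu", "edit"): "Edit_Attach",
    ("MainMenu", "animate"): "Animate_Attach",
    ("MainMenu", "effect"): "Effect_Attach",
    ("MainMenu", "hlsl"): "HLSL_Attach",
    ("Edit_Attach", "next"): "Edit_Describe",    # "next" is a hypothetical event
    ("Edit_Describe", "next"): "Edit_Running",
    ("Edit_Running", "next"): "PostGen",
    ("PostGen", "edit"): "Edit_Describe",
    ("PostGen", "animate"): "Animate_Describe",
    ("PostGen", "effect"): "Effect_Pick",
}

def step(state: str, event: str) -> str:
    """Follow one transition; unknown events keep the current state (assumption)."""
    return TRANSITIONS.get((state, event), state)

def run(state: str, events: Iterable[str]) -> str:
    """Apply a sequence of events starting from the given state."""
    for event in events:
        state = step(state, event)
    return state

final_state = run("MainMenu", ["edit", "next", "next", "next"])
```

Note how PostGen re-enters the edit branch at Edit_Describe rather than Edit_Attach, since a material is already selected after generation.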

The chatbot also runs as an HTTP server on port 7723 with the following endpoints:

| Endpoint | Method | Purpose |
| --- | --- | --- |
| / or /chat | GET | Serves the web-based chat UI (chat.html) |
| /send | POST | Main message handler; triggers pipeline or state transition |
| /image | POST | Accepts base64-encoded image uploads |
| /status | GET | Returns current status message and last preview path |
| /abort | GET | Cancels the current in-progress pipeline |
| /history | GET | Returns full chat history, current state, and state config |
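
For scripted access, the GET endpoints can be called with the Python standard library. This is only a sketch: the response format is not documented here, so the helper returns the raw body as text, and the request bodies for /send and /image are not specified in this README, so they are left out.

```python
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:7723"  # port taken from the endpoint description

def build_request(path: str, method: str = "GET",
                  body: Optional[bytes] = None) -> urllib.request.Request:
    """Create a request object for one of the chatbot endpoints."""
    return urllib.request.Request(BASE_URL + path, data=body, method=method)

def fetch_text(request: urllib.request.Request, timeout: float = 5.0) -> str:
    """Send the request and return the raw response body as text.

    Requires the Unity Editor (and its HTTP server) to be running.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")

status_request = build_request("/status")
abort_request = build_request("/abort")
# Example (needs the server): print(fetch_text(status_request))
```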

User Experience

Users interact through typed messages and quick-reply buttons that appear contextually. A typical session:

  1. The window opens at MainMenu — quick replies: New Shape, Edit Shape, Animate, Add Effect, Import HLSL.
  2. New Shape → Text: type a description (e.g. "a cartoon cactus in a flower pot"). The pipeline runs automatically and a preview image appears inside the chat bubble.
  3. The bot responds with the generated preview and post-generation quick replies: Edit this, Animate it, Add Effect, Done.
  4. Edit this: type the change in plain language ("make the pot blue and add a face to the cactus"). The classifier decides whether a property tweak or a full HLSL rewrite is needed and applies the change.
  5. Animate it: describe the desired motion ("make it slowly bounce up and down"). The system generates a C# MonoBehaviour, writes it to disk, waits for Unity's domain reload, then attaches it to the preview quad automatically.
  6. Add Effect → Glow or Pixelation: applied instantly with no LLM call; result shown in chat.
  7. At any point the user can type freely outside quick replies — the state machine interprets the message in context.

The Image-to-Shape flow works the same way but starts with an image upload button; GPT-4o describes the image and the pipeline proceeds from there.

The chatbot also runs as an HTTP server on port 7723, so it can be driven from a browser at http://localhost:7723 — useful for demo setups where the Unity Editor runs headless or on a different machine. The web interface (chat.html) mirrors the IMGUI window exactly.

Chatbot UI — example sessions:

| Main menu | HLSL import flow | Pixelation effect |
| --- | --- | --- |
| [Image: Chat main menu] | [Image: HLSL import result] | [Image: Pixelation via chat] |
| Welcome screen with quick-reply buttons | Importing an HLSL file → material generated in chat | Applying pixelation to a Cartoon Hamburger |

Knowledge Base

Location: Assets/ShaderGraphGenerator/KnowledgeBase/shape_metadata.json

Size: 184 verified shapes (as of thesis submission)

Schema:

{
  "totalShapes": 184,
  "shapes": [{
    "id": "_259cbb14",
    "fileName": "RoundedRectangle",
    "filePath": "Assets/ShaderGraphs/SuccessfulResults/RoundedRectangle.hlsl",
    "originalPrompt": "a rounded rectangle with adjustable corner radius...",
    "visualDescription": "Smooth rectangular shape with rounded corners...",
    "category": 1,
    "complexity": 1,
    "tags": ["rectangle", "rounded", "geometric"],
    "parameters": [{"name": "Width", "type": "float", "defaultValue": "0.6"}],
    "embedding": [0.016, -0.012, ...],
    "verificationScore": 9,
    "animators": [{
      "fileName": "RoundedRectanglePulse",
      "scriptPath": "Assets/ShaderGraphs/Animations/RoundedRectanglePulse.cs",
      "propertiesUsed": ["_FillColor"],
      "animationSummary": "Pulses fill color between two hues",
      "embedding": [...]
    }]
  }]
}

Category enum: Uncategorized=0, GeometricPrimitives=1, OrganicShapes=2, SymbolsAndIcons=3, CompositeShapes=4

Complexity enum: Unknown=0, Primitive=1, Intermediate=2, Complex=3
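
As a reading aid for the schema, the sketch below parses a trimmed-down entry and maps the integer category and complexity fields to the enum names listed above. The sample JSON contains only a subset of the fields, and `summarise` is a hypothetical helper.

```python
import json
from enum import IntEnum
from typing import List, Tuple

class Category(IntEnum):
    Uncategorized = 0
    GeometricPrimitives = 1
    OrganicShapes = 2
    SymbolsAndIcons = 3
    CompositeShapes = 4

class Complexity(IntEnum):
    Unknown = 0
    Primitive = 1
    Intermediate = 2
    Complex = 3

# Trimmed sample: only the fields this sketch reads.
SAMPLE = """
{"totalShapes": 1,
 "shapes": [{"id": "_259cbb14", "fileName": "RoundedRectangle",
             "category": 1, "complexity": 1,
             "tags": ["rectangle", "rounded", "geometric"],
             "verificationScore": 9}]}
"""

def summarise(raw: str) -> List[Tuple[str, str, str, int]]:
    """Return (fileName, category name, complexity name, score) per shape."""
    data = json.loads(raw)
    return [(shape["fileName"],
             Category(shape["category"]).name,
             Complexity(shape["complexity"]).name,
             shape["verificationScore"])
            for shape in data["shapes"]]

summary = summarise(SAMPLE)
```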


Human Review & Knowledge Base Curation

Generated shapes that pass the automated VLM quality loop (score ≥ 7) can optionally be reviewed by a human before being added to the knowledge base. This two-step curation process ensures the KB stays high-quality over time.

Review Flow

Generation run completes (VLM score ≥ 7)
        ↓
  Human Review Window (RAGHumanReviewWindow.cs)
  — lists pending shapes with their preview PNG, prompt, and VLM score
  — reviewer assigns a score 1–10 and writes optional notes
        ↓
  Accept (score ≥ 8)  →  KnowledgeBaseUpdater.IngestShape()
                         — appends shape to shape_metadata.json
                         — generates embedding via OpenAI text-embedding-3-small
                         — tags + categorises the new entry
        ↓
  Reject  →  shape discarded; files remain in RAG_Generated/ for reference

Auto Learn

For bulk ingestion of successful results (e.g. after an experiment run), the Auto Learn window (RAGAutoLearnWindow.cs) scans RAG_Generated/ for shapes with a logged VLM score ≥ a configurable threshold and ingests them automatically without requiring per-shape human review. This is used to grow the KB rapidly from a batch of high-quality generations.

Knowledge Base Growth

| Method | Speed | Quality control |
| --- | --- | --- |
| Human Review | Manual, per shape | Highest: reviewer inspects the preview image |
| Auto Learn | Automated batch | Medium: relies on VLM score threshold |

The KB ships with 184 verified shapes. Each accepted shape immediately improves future RAG retrievals for semantically similar prompts, creating a compounding quality improvement over time.


Experiment Results

This section reports the results of a controlled evaluation study examining how the RAG pipeline, shape complexity, and LLM model choice affect HLSL shader generation quality, reliability, and cost. All results are from Phase 2 of the experiment, run inside the Unity Editor tool using the automated Phase2ExperimentRunnerWindow.


Experiment Design

| Parameter | Value |
| --- | --- |
| Shape groups | Simple_InRAG, Simple_NotInRAG, Complex_InRAG, Complex_NotInRAG |
| Shapes per group | 5 |
| LLM models tested | Gemini 3 Pro Preview, Gemini 3.1 Pro, GPT-5.4, Claude Sonnet 4.6, Kimi K2.6 |
| Pipelines | RAG (retrieval-augmented) and NoRAG (direct generation) |
| Runs | 5 shapes × 4 groups × 5 models × 2 pipelines = 200 runs |
| VLM evaluator | GPT-4o Vision; scores rendered previews 1–10, with a score ≥ 7 counted as success |
| Max iterations per run | 3 |
| Human scores | Collected for a subset of runs (Gemini 3 Pro Preview and Claude Sonnet) |

Each run generates one HLSL shader + ShaderGraph material, renders a 512 × 512 PNG preview, and lets the VLM judge up to three times before recording the final result.
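
The planned run count follows from the full factorial design. A small sketch makes the enumeration explicit; the tuple layout is an assumption chosen for illustration, and shape indices stand in for the named shapes in each group.

```python
from itertools import product

GROUPS = ["Simple_InRAG", "Simple_NotInRAG", "Complex_InRAG", "Complex_NotInRAG"]
MODELS = ["Gemini 3 Pro Preview", "Gemini 3.1 Pro", "GPT-5.4",
          "Claude Sonnet 4.6", "Kimi K2.6"]
PIPELINES = ["RAG", "NoRAG"]
SHAPES_PER_GROUP = 5

# One planned run per (group, shape index, model, pipeline) combination.
planned_runs = list(product(GROUPS, range(SHAPES_PER_GROUP), MODELS, PIPELINES))
runs_per_model = sum(1 for run in planned_runs if run[2] == "GPT-5.4")
```

Under this design each model is scheduled for 40 runs (20 shapes × 2 pipelines), for 200 runs in total.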

Shape sets used in the experiment:

| Group | Shapes |
| --- | --- |
| Simple_InRAG | Teardrop, ScallopedDisc, UShapedArc, LightningBolt, Trapezoid |
| Simple_NotInRAG | Parallelogram, ChevronArrow, Lemniscate, Hexagram, Squircle |
| Complex_InRAG | CartoonMushroom, CartoonFireFlame, CartoonGearCog, CartoonSpeechBubble, CartoonGiftBox |
| Complex_NotInRAG | CartoonRamenBowl, CartoonCactus, CartoonJellyfish, CartoonPaperAirplane, CartoonSaturnPlanet |

RQ1 — Does RAG improve generation quality?

Pipeline success rate comparison

Radar — pipeline comparison

Pipeline-level summary:

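| Pipeline | n | Success rate | Avg VLM score | Avg iterations | Avg time (s) | Compile rate | Avg cost / shape |
| --- | --- | --- | --- | --- | --- | --- | --- |
| RAG | 143 | 70.6% | 7.22 | 1.82 | 607 | 91.6% | $0.063 |
| NoRAG | 157 | 65.6% | 6.48 | 2.06 | 476 | 84.1% | $0.061 |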

VLM score distribution

Score histogram

RAG vs NoRAG — visual output comparison on the same prompts:

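| Shape | NoRAG | RAG |
| --- | --- | --- |
| Cartoon Cactus | [Image: CartoonCactus NoRAG] | [Image: CartoonCactus RAG] |
| Cartoon Mushroom | [Image: CartoonMushroom NoRAG] | [Image: CartoonMushroom RAG] |
| Cartoon UFO | [Image: CartoonUFO NoRAG] | [Image: CartoonUFO RAG] |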

Key findings for RQ1:

  • RAG improves VLM score by +0.74 points on average (6.48 → 7.22), a meaningful gain given the 1–10 scale.
  • RAG also raises the success rate by ~5 percentage points (65.6% → 70.6%) and the compile rate by ~7.5 points.
  • NoRAG is not unreliable — with capable models it still achieves 80%+ success on simple shapes. The gap is mainly in output quality (VLM score), not just pass/fail.
  • RAG averages more time per shape (607 s vs 476 s); the difference is driven by longer prompts and the retrieval step rather than extra refinement iterations.
  • Costs are nearly identical ($0.063 vs $0.061/shape) — RAG's retrieval overhead is offset by fewer failed iterations.

Iteration convergence

RAG converges in fewer iterations (1.82 vs 2.06), meaning the LLM reaches an acceptable result sooner when given retrieved HLSL examples as context.


RQ2 — How does shape complexity interact with the pipeline?

In-KB vs not-in-KB comparison

All experiments summary

Pipeline × Complexity breakdown:

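| Pipeline | Complexity | n | Success rate | Avg VLM score | Avg iterations | Avg time (s) | Compile rate | Avg cost / shape |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| NoRAG | Simple | 98 | 82.7% | 7.68 | 1.73 | 212 | 90.8% | $0.032 |
| NoRAG | Complex | 59 | 37.3% | 4.49 | 2.61 | 915 | 72.9% | $0.108 |
| RAG | Simple | 86 | 84.9% | 8.24 | 1.56 | 462 | 100.0% | $0.024 |
| RAG | Complex | 57 | 49.1% | 5.67 | 2.21 | 825 | 78.9% | $0.122 |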

Average iterations

Time violin plot

Key findings for RQ2:

  • Simple shapes + NoRAG is the fastest configuration (212 s/shape) and achieves 82.7% success — sufficient for basic use cases and faster than all RAG variants.
  • RAG improves simple shapes in VLM quality (7.68 → 8.24) and achieves a 100% compile rate, though it takes longer (462 s vs 212 s) due to retrieval overhead. The cost is actually lower ($0.024 vs $0.032) because first-pass compile success reduces refinement iterations.
  • Complex shapes are the main challenge: NoRAG complex shapes succeed only 37.3% of the time. RAG raises this to 49.1% and is faster for complex shapes (825 s vs 915 s) because retrieved HLSL context helps the LLM avoid costly compile–fix loops.
  • In-KB shapes (whose geometry is represented in the knowledge base) score consistently higher under RAG, as retrieved examples closely match the target. Not-in-KB shapes still benefit from RAG, but the gain is smaller — the retrieved examples provide structural guidance even when the shape is novel.
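
The pipeline-level figures reported under RQ1 are the run-weighted averages of this breakdown. The sketch below recomputes the success rate and VLM score from the per-complexity rows; `pooled` is a hypothetical helper, and the numbers are copied from the breakdown table above.

```python
from typing import Dict, Tuple

# (pipeline, complexity) -> (n, success rate %, average VLM score)
BREAKDOWN: Dict[Tuple[str, str], Tuple[int, float, float]] = {
    ("NoRAG", "Simple"): (98, 82.7, 7.68),
    ("NoRAG", "Complex"): (59, 37.3, 4.49),
    ("RAG", "Simple"): (86, 84.9, 8.24),
    ("RAG", "Complex"): (57, 49.1, 5.67),
}

def pooled(pipeline: str) -> Tuple[int, float, float]:
    """Weight each complexity row by its run count and average."""
    rows = [row for (name, _), row in BREAKDOWN.items() if name == pipeline]
    total = sum(n for n, _, _ in rows)
    success = sum(n * rate for n, rate, _ in rows) / total
    vlm = sum(n * score for n, _, score in rows) / total
    return total, round(success, 1), round(vlm, 2)

rag_summary = pooled("RAG")
norag_summary = pooled("NoRAG")
```

The pooled values come out at 70.6% and 7.22 for RAG and 65.6% and 6.48 for NoRAG, matching the pipeline-level summary. Since the simple/complex mix differs slightly between pipelines, the per-complexity rows are the fairer basis for comparison.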

RAG similarity vs score

Higher retrieval similarity (cosine similarity between the query and its nearest KB example) correlates with higher VLM scores, confirming that retrieval quality is a meaningful predictor of generation quality.

RAG components vs success

Complex shapes are decomposed into more components during the RAG retrieval step. Runs where all components had high-similarity matches tended to succeed on the first iteration.


RQ3 — Which LLM performs best?

[Figures: Model success rates; Model VLM scores; Model heatmap]

Per-model summary (across all pipelines and complexities):

| Model | n | Success rate | Avg VLM score | Avg iterations | Avg time (s) | Compile rate | Avg cost / shape | Avg total tokens |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Gemini 3.1 Pro | 75 | 93.3% | 8.53 | 1.59 | 359 | 98.7% | $0.036 | 9,602 |
| Gemini 3 Pro Preview | 40 | 87.5% | 7.58 | 1.73 | 607 | 97.5% | $0.038 | 12,778 |
| Claude Sonnet 4.6 | 79 | 72.2% | 7.13 | 1.80 | 129 | 72.2% | $0.072 | 10,395 |
| GPT-5.4 | 48 | 58.3% | 6.73 | 2.12 | 307 | 100.0% | $0.149 | 16,365 |
| Kimi K2.6 | 58 | 24.1% | 3.81 | 2.62 | 1,473 | 77.6% | $0.025 | 90,288 |

[Figures: Model compile rates; Model iterations; Model time per shape; Compile stability by model; Score stability by model; Recovery rate by model]

Key findings for RQ3:

  • Gemini 3.1 Pro is the most reliable model overall (93.3% success, 98.7% compile rate, highest VLM at 8.53). It uses fewer tokens than the Preview variant and is slightly cheaper at $0.036/shape, making it the best overall choice.
  • Gemini 3 Pro Preview achieves 87.5% success with a VLM score of 7.58 at $0.038/shape — nearly identical cost to 3.1 Pro but with lower reliability. Both Gemini models are clear leaders in quality.
  • Claude Sonnet 4.6 is by far the fastest model at just 129 s/shape (roughly 2.4–11× faster than the alternatives, which range from 307 s to 1,473 s). Its success rate (72.2%) lags behind the Gemini models but it achieves 100% compile rate on simple shapes. It is the right choice when response speed is the primary constraint.
  • GPT-5.4 achieves the highest compile rate (100%) but has the lowest success rate of any model except Kimi K2.6 (58.3%) and is by far the most expensive at $0.149/shape, roughly 4× the cost of the Gemini models.
  • Kimi K2.6 succeeds on only about one quarter of all runs (24.1% success) and is the slowest model (1,473 s/shape) while consuming massive token counts (~90 K tokens/shape). It is not suitable for this task.

VLM vs human score correlation

VLM scores (GPT-4o Vision) and human evaluator scores show strong positive correlation (r > 0.7), validating the automated evaluation loop as a reliable proxy for human judgement.
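
For reference, the correlation coefficient r used here is Pearson's r, which can be computed as follows. The score lists in this sketch are made up for illustration and are not data from the experiment.

```python
import math
from typing import Sequence

def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

# Made-up scores, NOT experiment data: five shapes rated by the VLM and a human.
vlm_scores = [8, 6, 9, 4, 7]
human_scores = [7, 6, 9, 3, 8]
r = pearson_r(vlm_scores, human_scores)
```

A value near 1 means the two raters rank shapes similarly; it does not by itself show that they agree on the pass/fail threshold.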

Score vs time scatter

There is no simple relationship between generation time and output quality — Claude Sonnet produces competitive results in a fraction of the time, while Kimi spends the most time yet achieves the lowest scores.


RQ4 — What does each run cost?

Model cost and token breakdown

Cost summary:

| Model | Avg cost / shape | 100-shape cost estimate |
| --- | --- | --- |
| Kimi K2.6 | $0.025 | ~$2.50 |
| Gemini 3.1 Pro | $0.036 | ~$3.60 |
| Gemini 3 Pro Preview | $0.038 | ~$3.80 |
| Claude Sonnet 4.6 | $0.072 | ~$7.20 |
| GPT-5.4 | $0.149 | ~$14.90 |

Pipeline cost:

| Configuration | Avg cost / shape |
| --- | --- |
| RAG + Simple | $0.024 |
| NoRAG + Simple | $0.032 |
| NoRAG + Complex | $0.108 |
| RAG + Complex | $0.122 |

Key findings for RQ4:

  • RAG Simple is cheaper than NoRAG Simple ($0.024 vs $0.032/shape) — the 100% first-pass compile rate under RAG eliminates expensive refinement loops, more than offsetting the retrieval overhead.
  • Complex shapes cost significantly more ($0.108–$0.122/shape) due to longer HLSL generation prompts, more VLM refinement cycles, and larger model outputs.
  • GPT-5.4 is the most expensive model at $0.149/shape — primarily because it charges higher per-token rates while achieving only 58.3% success.
  • Kimi K2.6 appears cheap on paper ($0.025/shape) but consumes ~90 K tokens per shape. Its low price per token masks its extreme token usage, and because only 24.1% of runs succeed (a 75.9% failure rate), the effective cost per successful shape is far higher.
  • Best cost–quality tradeoff: Gemini 3.1 Pro at $0.036/shape with 93.3% success and VLM 8.53.
  • Best cost–quality–speed tradeoff overall: Gemini 3.1 Pro for quality-focused work; Claude Sonnet 4.6 for time-constrained workflows.
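
One way to make "effective cost" concrete is to divide the average cost per shape by the success rate, giving an expected cost per successful shape. The sketch below uses the per-model figures reported above; it assumes failed runs cost the same as successful ones on average and are simply retried, which is a simplification.

```python
from typing import Dict, Tuple

# model -> (average cost per shape in USD, success rate as a fraction)
MODEL_STATS: Dict[str, Tuple[float, float]] = {
    "Gemini 3.1 Pro": (0.036, 0.933),
    "Gemini 3 Pro Preview": (0.038, 0.875),
    "Claude Sonnet 4.6": (0.072, 0.722),
    "GPT-5.4": (0.149, 0.583),
    "Kimi K2.6": (0.025, 0.241),
}

def cost_per_success(cost: float, success_rate: float) -> float:
    """Expected spend to obtain one successful shape."""
    return cost / success_rate

effective = {name: round(cost_per_success(cost, rate), 3)
             for name, (cost, rate) in MODEL_STATS.items()}
ranking = sorted(effective, key=effective.get)  # cheapest first
```

Under this model, Kimi K2.6 rises from $0.025 to about $0.104 per successful shape, and GPT-5.4 reaches about $0.256, while Gemini 3.1 Pro stays near $0.039.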

Conclusions

The experiment confirms that the RAG pipeline meaningfully improves both reliability and output quality over direct (NoRAG) generation, with gains in VLM score (+0.74 points average), success rate (+5 pp), and compile rate (+7.5 pp). The pipeline is not the only variable that matters: model choice produces a much larger performance spread (24.1% to 93.3% success across models) than the RAG/NoRAG decision (65.6% vs 70.6%).

Summary of key takeaways:

  1. RAG is worth it — It raises VLM scores, success rates, and compile rates across all shape groups. For simple shapes RAG is actually cheaper ($0.024 vs $0.032) thanks to higher first-pass compile rates.

  2. NoRAG is not broken — With a capable model, NoRAG on simple shapes achieves ~83% success and is the fastest configuration (212 s/shape). It is the right choice when speed is the priority.

  3. For complex shapes, RAG also saves time — Fewer refinement iterations under RAG mean RAG actually completes complex shapes faster (825 s vs 915 s) despite longer prompts.

  4. Gemini 3.1 Pro is the clear leader — 93.3% success, VLM 8.53, 98.7% compile rate, and slightly cheaper than 3 Pro Preview at $0.036/shape. It performs consistently better across all shape groups.

  5. Claude Sonnet is the speed champion at 129 s/shape, about 2.4× faster than the next-fastest model (GPT-5.4 at 307 s), with competitive VLM quality (7.13). Recommended for iterative or real-time-feedback workflows.

  6. GPT-5.4 is not cost-effective for this task: at $0.149/shape with only 58.3% success, it has the worst cost per successful shape of any model tested.

  7. Kimi K2.6 is unsuitable for this task. Its 24.1% success rate, 1,473 s/shape latency, and ~90 K token usage make it impractical.

  8. In-KB shapes benefit more from RAG because retrieval similarity is higher, but even Not-in-KB shapes improve — the retrieved examples provide structural guidance even for novel geometry.

  9. VLM scoring is a valid proxy for human evaluation (r > 0.7), making the automated quality loop a reliable substitute for manual review at scale.


Setup & Installation

Requirements

  • Unity 6000.0.41f1 (Unity 6) or later
  • Newtonsoft.Json package (via Package Manager: com.unity.nuget.newtonsoft-json)
  • API keys for at least one of: Google Gemini, OpenAI, Anthropic Claude

Steps

  1. Clone this repository into your Unity project's Assets/ folder (or open as a Unity project directly).

  2. Install Newtonsoft.Json via Unity Package Manager:

    Window → Package Manager → Add package by name → com.unity.nuget.newtonsoft-json
    
  3. Create the Config asset:

    • Right-click in the Project window
    • Create → ShaderGraphGenerator → Config
    • Name it ShaderGraphGeneratorConfig (any name works, since the asset is found by type)

  4. Add API keys to the Config asset:

    • openAIKey: required for VLM scoring and image description
    • geminiKey: required for HLSL generation and most LLM calls
    • claudeKey: optional, alternative generation backend

  5. Open the chatbot:

    Tools → ShaderGraph Generator → Chat UI
    
  6. Open the main generator (standalone mode):

    Tools → ShaderGraph Generator → 2.9 Unified RAG Generator
    

Configuration

All API keys are stored in a ShaderGraphGeneratorConfig ScriptableObject asset:

[CreateAssetMenu(menuName = "ShaderGraphGenerator/Config")]
public class ShaderGraphGeneratorConfig : ScriptableObject
{
    public string openAIKey;   // GPT-4o Vision — VLM scoring, image descriptions
    public string geminiKey;   // Gemini — HLSL generation, classification, property suggestion
    public string claudeKey;   // Claude — alternative structured generation (optional)
}

The asset can live anywhere in the project; all pipelines find it via AssetDatabase.FindAssets("t:ShaderGraphGeneratorConfig").


Editor Windows

| Window | Menu Path | Purpose |
| --- | --- | --- |
| Chat UI | Tools / ShaderGraph Generator / Chat UI | Main conversational interface |
| Unified RAG Generator | Tools / ShaderGraph Generator / 2.9 Unified RAG Generator | Direct text/image generation with VLM loop |
| Material Animator | Tools / ShaderGraph Generator / Animator | Generate animation scripts for materials |
| Human Review | Tools / ShaderGraph Generator / Human Review | Score and curate generated shapes |
| Auto Learn | Tools / ShaderGraph Generator / Auto Learn | Ingest successful results into knowledge base |
| RAG Update | Tools / ShaderGraph Generator / Update | Edit an existing shape's HLSL |
| Image to Shader | Tools / ShaderGraph Generator / Image to Shader | Standalone image-to-material pipeline |
| Embedding Generator | Tools / ShaderGraph Generator / Embeddings | Generate/update KB embeddings |

Output File Locations

| Asset Type | Path |
|---|---|
| Generated HLSL | Assets/ShaderGraphs/RAG_Generated/{name}.hlsl |
| Updated HLSL | Assets/ShaderGraphs/RAG_Updates/{name}.hlsl |
| Imported HLSL | Assets/ShaderGraphs/Generated/HLSL/{name}.hlsl |
| ShaderGraph | Assets/ShaderGraphs/RAG_Generated/{name}.shadergraph |
| Effect ShaderGraph (Pixelation) | Assets/ShaderGraphs/Effects/{name}_Pixelated.shadergraph |
| Effect ShaderGraph (Glow) | Assets/ShaderGraphs/Effects/{name}_Glow.shadergraph |
| Effect ShaderGraph (Both) | Assets/ShaderGraphs/Effects/{name}_Glow_Pixelated.shadergraph |
| Material | Assets/ShaderGraphs/RAG_Generated/{name}.mat |
| Preview PNG | Assets/ShaderGraphs/Previews/{name}_{iter}.png |
| Animation Script | Assets/ShaderGraphs/Animations/{ClassName}.cs |
| Knowledge Base | Assets/ShaderGraphGenerator/KnowledgeBase/shape_metadata.json |
| Contact Messages | {ProjectRoot}/contact_messages.json |
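The naming scheme above is regular enough to capture in a few lines. The sketch below is illustrative only: the class `OutputPaths` and its methods are hypothetical names, and the tool may assemble its paths differently.

```csharp
using System;

// Hypothetical path helper mirroring the documented output locations.
public static class OutputPaths
{
    const string Root = "Assets/ShaderGraphs";

    public static string GeneratedHlsl(string name) => $"{Root}/RAG_Generated/{name}.hlsl";

    public static string Preview(string name, int iter) => $"{Root}/Previews/{name}_{iter}.png";

    // Effect variants append _Glow and/or _Pixelated; with both active the suffix is _Glow_Pixelated.
    public static string EffectGraph(string name, bool glow, bool pixelated)
    {
        if (!glow && !pixelated)
            throw new ArgumentException("At least one effect must be active.");

        string suffix = (glow ? "_Glow" : "") + (pixelated ? "_Pixelated" : "");
        return $"{Root}/Effects/{name}{suffix}.shadergraph";
    }
}
```

For example, `OutputPaths.EffectGraph("FoxFace", glow: true, pixelated: true)` yields `Assets/ShaderGraphs/Effects/FoxFace_Glow_Pixelated.shadergraph`.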

API Reference

Google Gemini

  • Model: gemini-3-pro-preview
  • Endpoint: https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent
  • Used for: HLSL code generation, animation script generation, animation/edit classification, property value suggestion
  • Vision variant: Accepts inlineData base64 image alongside text prompt
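A request to this endpoint could be assembled roughly as shown below. This is a sketch under assumptions rather than the project's actual request code: `GeminiRequestSketch` and its methods are hypothetical, the image is assumed to be a PNG, and the camelCase field names (`inlineData`, `mimeType`) follow the wording used in this README. Sending the request and attaching the API key are left out.

```csharp
using System;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

// Hypothetical builder for a generateContent request body.
public static class GeminiRequestSketch
{
    public static string Endpoint(string model) =>
        $"https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent";

    // Builds the JSON body; pass pngBytes = null for a text-only request.
    public static string BuildBody(string prompt, byte[] pngBytes = null)
    {
        var parts = new JArray { new JObject { ["text"] = prompt } };

        if (pngBytes != null)
        {
            // Vision variant: the image travels as base64 next to the text part.
            parts.Add(new JObject
            {
                ["inlineData"] = new JObject
                {
                    ["mimeType"] = "image/png",
                    ["data"] = Convert.ToBase64String(pngBytes)
                }
            });
        }

        var body = new JObject
        {
            ["contents"] = new JArray { new JObject { ["parts"] = parts } }
        };
        return body.ToString(Formatting.None);
    }
}
```

Keeping body construction separate from the HTTP call makes the text and vision variants easy to compare, since they differ only in the extra part.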

OpenAI

  • Model: gpt-4o
  • Endpoint: https://api.openai.com/v1/chat/completions
  • Used for: VLM visual evaluation (1–10 scoring), reference image description, before/after edit comparison
  • Also: Embedding generation for semantic search (text-embedding-3-small)
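Semantic search over embeddings usually means ranking stored vectors by their similarity to a query vector. The sketch below assumes cosine similarity and an in-memory dictionary of shape names to vectors; both are assumptions for illustration, and `EmbeddingSearchSketch` is a hypothetical name. The README does not state which metric or storage layout the tool uses.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical ranking helper; cosine similarity is an assumed metric.
public static class EmbeddingSearchSketch
{
    public static double CosineSimilarity(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same length.");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += (double)a[i] * b[i];
            normA += (double)a[i] * a[i];
            normB += (double)b[i] * b[i];
        }

        // A zero vector has no direction; treat it as dissimilar to everything.
        if (normA == 0 || normB == 0) return 0;
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }

    // Returns the names of the k knowledge-base entries most similar to the query.
    public static List<string> TopK(float[] query, IReadOnlyDictionary<string, float[]> kb, int k) =>
        kb.OrderByDescending(entry => CosineSimilarity(query, entry.Value))
          .Take(k)
          .Select(entry => entry.Key)
          .ToList();
}
```

A linear scan like this is fine for a knowledge base of a few hundred shapes; a much larger one would want an index.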

Anthropic Claude

  • Model: claude-sonnet-4-5-20250929
  • Endpoint: https://api.anthropic.com/v1/messages
  • Used for: Alternative HLSL generation with structured JSON schema enforcement
  • Optional — the system works without a Claude key

API Cost Analysis

Pricing basis: costs use April 2026 list prices.

  • gemini-3-pro-preview (treated as Gemini Pro tier): $1.25 / 1M input tokens, $5.00 / 1M output tokens
  • gpt-4o: $2.50 / 1M input tokens, $10.00 / 1M output tokens
  • text-embedding-3-small: $0.02 / 1M tokens
  • Image tokens (GPT-4o): 512 × 512 px ≈ 255 tokens, 256 × 256 px ≈ 85 tokens

Verify current prices at platform.openai.com/pricing and ai.google.dev/pricing.
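Every per-call figure in this section follows from one formula: input tokens times the input price plus output tokens times the output price, divided by one million. The sketch below encodes the list prices quoted above; `CostEstimator` is a hypothetical helper and is not part of the tool.

```csharp
// Hypothetical cost helper using the list prices quoted in this README (USD per 1M tokens).
public static class CostEstimator
{
    public const decimal GeminiIn = 1.25m, GeminiOut = 5.00m;
    public const decimal Gpt4oIn = 2.50m, Gpt4oOut = 10.00m;

    // Cost of a single API call in USD. Image tokens count as input tokens.
    public static decimal CallCost(int tokensIn, int tokensOut, decimal inPrice, decimal outPrice) =>
        (tokensIn * inPrice + tokensOut * outPrice) / 1_000_000m;
}
```

For example, a Gemini call with 1,550 input and 300 output tokens costs `CallCost(1550, 300, GeminiIn, GeminiOut)` = $0.0034375, and a GPT-4o call with 1,500 text tokens, one 512 × 512 image (255 tokens), and 100 output tokens costs `CallCost(1755, 100, Gpt4oIn, Gpt4oOut)` = $0.0053875.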


Building Blocks — Cost per API Call

| Call | Model | Typical Tokens In | Typical Tokens Out | Min Cost | Max Cost |
|---|---|---|---|---|---|
| Gemini: decompose request | Gemini Pro | 1,550–2,600 | 300–600 | $0.003 | $0.006 |
| Gemini: compose HLSL (text) | Gemini Pro | 2,100–5,100 | 800–3,000 | $0.007 | $0.021 |
| Gemini: compose HLSL (+ image) | Gemini Pro | 2,350–5,350 | 800–3,000 | $0.007 | $0.022 |
| Gemini: classify edit / anim | Gemini Pro | 600–1,200 | 100–200 | $0.001 | $0.003 |
| Gemini: update HLSL | Gemini Pro | 2,700–5,000 | 600–2,000 | $0.007 | $0.017 |
| Gemini: generate anim script | Gemini Pro | 1,400–3,000 | 500–1,200 | $0.004 | $0.010 |
| Gemini: suggest prop values | Gemini Pro | 600–1,800 | 100–200 | $0.001 | $0.003 |
| GPT-4o: describe image | GPT-4o | 755–1,000 + 255 img | 500–1,000 | $0.007 | $0.013 |
| GPT-4o: VLM eval (1 image) | GPT-4o | 1,500–2,500 + 255 img | 100 | $0.005 | $0.008 |
| GPT-4o: before/after eval (2 images) | GPT-4o | 300 + 2×85 img | 100 | $0.002 | $0.003 |
| Embedding query (per component) | text-embedding-3-small | 50–200 | n/a | <$0.001 | <$0.001 |

Decompose prompt size note: The decomposition prompt includes a summary of all 184 KB shapes (names + tags, up to 20 per category) which adds ≈ 800–1,000 tokens to every decomposition call.

HLSL composition prompt note: Each retrieved KB example includes the full HLSL source file (≈ 300–800 tokens per file). With 2 examples retrieved, this adds ≈ 600–1,600 tokens per composition call.


Per-Path Cost Breakdown

Path 1 — Generate Shape from Text

| Step | Calls per iteration | Cost per iteration |
|---|---|---|
| Gemini: decompose request | ×1 | $0.003–$0.006 |
| Embedding searches (1–3 components) | ×1–3 | < $0.001 |
| Gemini: compose HLSL | ×1 | $0.007–$0.021 |
| GPT-4o: VLM eval | ×1 | $0.005–$0.008 |
| Total per iteration | | $0.015–$0.035 |

| Scenario | Iterations | Min cost | Max cost |
|---|---|---|---|
| Best case (simple shape, passes first VLM) | 1 | $0.015 | $0.035 |
| Typical (medium complexity, 1–2 refinements) | 2 | $0.030 | $0.070 |
| Worst case (complex shape, 3 refinements) | 3 | $0.045 | $0.105 |
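The scenario rows are the per-iteration total multiplied by the iteration count. A minimal sketch of that arithmetic follows, using a hypothetical `ScenarioCost` helper that also accepts a one-off cost for paths that have a step outside the loop.

```csharp
// Hypothetical helper: total = one-off cost + per-iteration cost × iterations (all in USD).
public static class ScenarioCost
{
    public static decimal Total(decimal oneTime, decimal perIteration, int iterations) =>
        oneTime + perIteration * iterations;
}
```

With the Path 1 figures ($0.015 to $0.035 per iteration, no one-off cost), three iterations give `Total(0m, 0.015m, 3)` = $0.045 and `Total(0m, 0.035m, 3)` = $0.105.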

Path 2 — Generate Shape from Image

| Step | Calls | Cost |
|---|---|---|
| GPT-4o: describe reference image | ×1 | $0.007–$0.013 |
| Gemini: decompose description | ×1 per iter | $0.003–$0.006 |
| Embedding searches | ×1–3 per iter | < $0.001 |
| Gemini Vision: compose HLSL | ×1 per iter | $0.007–$0.022 |
| GPT-4o: VLM eval | ×1 per iter | $0.005–$0.008 |

| Scenario | Iterations | Min cost | Max cost |
|---|---|---|---|
| Best case (image is simple, passes first VLM) | 1 | $0.022 | $0.049 |
| Typical (1–2 refinements) | 2 | $0.037 | $0.085 |
| Worst case (3 refinements) | 3 | $0.052 | $0.121 |

Path 3 — HLSL Import (Upload .hlsl file)

| Step | Cost |
|---|---|
| Gemini: suggest property values | $0.001–$0.003 |
| Total | $0.001–$0.003 |

This path has no VLM evaluation — result is immediate.


Path 4 — Edit Shape

Sub-path A — Property change only (no shader rewrite):

| Step | Cost |
|---|---|
| Gemini: classify edit request | $0.001–$0.003 |
| Total | $0.001–$0.003 |

Sub-path B — HLSL rewrite needed:

| Step | Calls | Cost |
|---|---|---|
| Gemini: classify edit | ×1 | $0.001–$0.003 |
| Gemini: update HLSL | ×1–2 | $0.007–$0.017 per iter |
| GPT-4o: before/after VLM eval | ×1–2 | $0.002–$0.003 per iter |

| Scenario | Iterations | Min cost | Max cost |
|---|---|---|---|
| Best case (property change only) | n/a | $0.001 | $0.003 |
| HLSL update, passes first try | 1 | $0.010 | $0.023 |
| HLSL update, 2 VLM iterations | 2 | $0.019 | $0.043 |

Path 5 — Animate Shape

Sub-path A — C# script only (existing properties sufficient):

| Step | Cost |
|---|---|
| Gemini: classify animation | $0.001–$0.003 |
| Embedding search (animation KB) | < $0.001 |
| Gemini: generate C# animation script | $0.004–$0.010 |
| Total | $0.005–$0.013 |

Sub-path B — HLSL update required first (new shader properties needed):

| Step | Calls | Cost |
|---|---|---|
| Gemini: classify animation | ×1 | $0.001–$0.003 |
| Gemini: update HLSL | ×1–2 | $0.007–$0.017 per iter |
| GPT-4o: before/after VLM eval | ×1–2 | $0.002–$0.003 per iter |
| Embedding search (animation KB) | ×1 | < $0.001 |
| Gemini: generate C# animation script | ×1 | $0.004–$0.010 |

| Scenario | Min cost | Max cost |
|---|---|---|
| C# only (no HLSL needed) | $0.005 | $0.013 |
| HLSL update needed, 1 VLM iteration | $0.014 | $0.033 |
| HLSL update needed, 2 VLM iterations | $0.023 | $0.053 |

Path 6 — Pixelation Effect

No API calls. ShaderGraph nodes are injected deterministically (UV quantisation via Floor/Divide).

Total $0.000
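UV quantisation snaps every coordinate to the corner of a grid cell, so all pixels inside a cell sample the same point of the shape. The C# sketch below mirrors that math on the CPU for illustration. The name `PixelationSketch`, the `cells` parameter, and the multiply-floor-divide ordering are assumptions; the tool's actual node wiring may differ.

```csharp
using System;

// Hypothetical CPU-side mirror of the shader's UV quantisation.
public static class PixelationSketch
{
    // Snaps a UV coordinate to the lower-left corner of its cell on a grid of `cells` cells per axis.
    public static (float u, float v) Quantize(float u, float v, float cells)
    {
        if (cells <= 0)
            throw new ArgumentOutOfRangeException(nameof(cells), "Cell count must be positive.");

        return (MathF.Floor(u * cells) / cells, MathF.Floor(v * cells) / cells);
    }
}
```

With a 4 × 4 grid, `Quantize(0.37f, 0.81f, 4f)` returns `(0.25, 0.75)`: both coordinates fall back to the start of their cell. Lower cell counts give chunkier pixels.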

Path 7 — Glow Effect

No API calls. ShaderGraph is regenerated with HDR colour properties and a glowIntensity multiply node. A URP Bloom post-processing volume is added to the scene automatically.

Total $0.000
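Adding a global Bloom volume from script might look like the sketch below. It assumes the URP package is installed and uses URP's `Volume`, `VolumeProfile`, and `Bloom` types; the class `GlowVolumeSketch` and the object name are hypothetical, and the tool's own implementation may create or reuse the volume differently.

```csharp
using UnityEngine;
using UnityEngine.Rendering;
using UnityEngine.Rendering.Universal;

// Hypothetical helper that creates a scene-wide Bloom volume.
public static class GlowVolumeSketch
{
    public static Volume CreateGlobalBloom(float intensity = 2f)
    {
        var go = new GameObject("Global Bloom Volume (sketch)");
        var volume = go.AddComponent<Volume>();
        volume.isGlobal = true; // affects every camera regardless of position

        // Build an in-memory profile containing a single Bloom override.
        var profile = ScriptableObject.CreateInstance<VolumeProfile>();
        Bloom bloom = profile.Add<Bloom>();
        bloom.intensity.Override(intensity); // sets the value and enables the override

        volume.sharedProfile = profile;
        return volume;
    }
}
```

Bloom only becomes visible if post-processing is enabled on the camera and the material outputs HDR colour values above 1, which is what the glowIntensity multiplier provides.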

Typical Session Estimates

| Session type | Paths used | Estimated total |
|---|---|---|
| Light: one simple shape, one property edit, one animation (C# only) | Text gen (1 iter) + Edit A + Anim A | ~$0.021–$0.051 |
| Standard: image gen with 1 refinement, HLSL edit, animation (C# only) | Image gen (2 iter) + Edit B (1 iter) + Anim A | ~$0.052–$0.121 |
| Heavy: complex shape (3 iters), HLSL edit (2 iters), HLSL-needed animation (2 iters) | Text gen (3 iter) + Edit B (2 iter) + Anim B (2 iter) | ~$0.087–$0.201 |
| Exploration: 5 text generations + 2 edits + 2 animations | 5×Text (avg 2 iter) + 2×Edit (mix) + 2×Anim (mix) | ~$0.200–$0.550 |

One-Time Costs

| Activity | Cost |
|---|---|
| Generate embeddings for all 184 KB shapes | ~$0.001 (one-time) |
| Each new shape added to KB (embedding) | ~$0.000004 per shape |

Knowledge base embedding is essentially free — the entire 184-shape KB costs less than $0.001 to fully re-embed.


Cost Optimisation Tips

  • Reduce VLM iterations: Lower maxVlmIterations from 3 to 1 in RAGPipelineManager and HLSLUpdatePipelineManager if speed/cost matters more than quality.
  • Use text prompts over images: The image path costs $0.007–$0.013 more per session due to the GPT-4o image description call.
  • Simple edits are cheap: If your shape already has the right properties exposed, editing is just one Gemini classification call (~$0.002).
  • Effects are free: Both Pixelation and Glow effects use no LLM calls at all.
  • Animation without HLSL changes: Sub-path A (C# only) costs ~$0.005–$0.013 vs ~$0.014–$0.053 for the HLSL-update path (Sub-path B).
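The effect of maxVlmIterations on cost is easiest to see in the shape of the refinement loop: each pass is one generation call plus one evaluation call, and the loop stops as soon as the score clears the threshold. The sketch below is a simplified illustration, not the code in RAGPipelineManager. `RefinementLoopSketch` and the delegate signatures are hypothetical, the score is assumed to be an integer, and the cap is assumed to count total passes.

```csharp
using System;

// Hypothetical generate-evaluate loop with an iteration cap and a score threshold.
public static class RefinementLoopSketch
{
    public static (string hlsl, int score, int iterations) Run(
        Func<string, string> generate,                       // previous feedback (null on first pass) → HLSL
        Func<string, (int score, string feedback)> evaluate, // HLSL → VLM score and feedback
        int maxVlmIterations = 3,
        int threshold = 7)
    {
        if (maxVlmIterations < 1)
            throw new ArgumentOutOfRangeException(nameof(maxVlmIterations));

        string hlsl = null, feedback = null;
        int score = 0, iterations = 0;

        while (iterations < maxVlmIterations)
        {
            hlsl = generate(feedback);            // one LLM generation call
            (score, feedback) = evaluate(hlsl);   // one VLM evaluation call
            iterations++;
            if (score > threshold) break;         // accepted: score must exceed the threshold
        }
        return (hlsl, score, iterations);
    }
}
```

Setting `maxVlmIterations` to 1 caps the cost at a single pass, at the price of accepting whatever the first attempt produces.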

Project Structure

Assets/
├── ShaderGraphGenerator/           ← All tool source code
│   ├── Editor/                     ← Unity editor-only scripts
│   │   ├── Chat/                   ← Chatbot state machine + HTTP bridge
│   │   ├── Core/                   ← APIs, material helpers, prompt builders
│   │   ├── KnowledgeBase/          ← Embedding search, KB management
│   │   └── RAG/                    ← All generation pipelines + windows
│   │       ├── Animation/          ← Animation pipeline + C# script generation
│   │       ├── Curation/           ← KB ingestion and management
│   │       ├── Edit/               ← Edit classification
│   │       ├── Generation/         ← RAG composition engine
│   │       ├── Pipelines/          ← Top-level pipeline orchestrators
│   │       └── Windows/            ← All EditorWindow UIs
│   └── KnowledgeBase/              ← shape_metadata.json + embeddings
│
├── ShaderGraphs/
│   ├── RAG_Generated/              ← Generated HLSL, shadergraphs, materials
│   ├── RAG_Updates/                ← Edited/updated versions
│   ├── Effects/                    ← Effect variants (pixelation, etc.)
│   ├── Generated/                  ← HLSL imports + their shadergraphs
│   ├── Animations/                 ← Generated C# animation MonoBehaviours
│   ├── Previews/                   ← PNG screenshots of generated shapes
│   └── SuccessfulResults/          ← Manually curated HLSL library
│
└── ShaderGraphGeneratorConfig.asset ← API keys (do not commit)

Tech Stack

| Component | Technology |
|---|---|
| Runtime | Unity 6 (6000.0.41f1), C# 9 |
| Editor UI | Unity IMGUI (EditorWindow, GUILayout) |
| HTTP Server | System.Net.HttpListener on localhost:7723 |
| Async | async/await + Task, CancellationToken |
| JSON | Newtonsoft.Json with custom JsonConverter |
| LLM (Code Gen) | Google Gemini (gemini-3-pro-preview) |
| LLM (Vision) | OpenAI GPT-4o (gpt-4o) |
| LLM (Structured) | Anthropic Claude (claude-sonnet-4-5) |
| Embeddings | OpenAI text-embedding-3-small |
| Shader Format | Unity ShaderGraph JSON (custom node wiring) |
| Shader Language | HLSL (Custom Function nodes in ShaderGraph) |
| Persistence | EditorPrefs (domain reload safety), JSON files |

Notes

  • API keys — Never commit ShaderGraphGeneratorConfig.asset to a public repository. Add it to .gitignore.
  • Domain reloads — The animation pipeline survives Unity's script compilation domain reload by serialising state to EditorPrefs before triggering AssetDatabase.Refresh().
  • VLM threshold — The acceptance threshold is score > 7 out of 10. Shapes below this are automatically refined with feedback up to 3 times.
  • Knowledge base growth — Every shape accepted through Human Review (score ≥ 8) can be added to shape_metadata.json via the Auto Learn window, improving future RAG retrievals.
  • Pixelation — Implemented as a deterministic ShaderGraph modification (UV quantisation via Floor/Divide nodes), not an LLM call. Takes ~2 seconds.
  • Glow — Implemented as a deterministic ShaderGraph modification. All colour properties are switched to HDR mode (colorMode=1), a glowIntensity (default 2) multiply node is inserted before Base Color, and a global URP Bloom post-processing volume (intensity=2) is added to the scene. No LLM call. Requires URP with post-processing enabled on the camera.
  • Effect stacking — Pixelation and Glow can be applied on top of each other in any order. The system detects the existing effect and regenerates the ShaderGraph with both flags active, outputting a combined _Glow_Pixelated variant.
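The domain-reload note above describes a common editor pattern: write the pending state to EditorPrefs, trigger the refresh, and pick the state up again once scripts have recompiled. A minimal sketch of that pattern follows. The class `PendingAnimationState`, the preference key, and the `Resume` method are hypothetical and do not reflect the repository's actual names.

```csharp
using UnityEditor;
using UnityEngine;

// Hypothetical state holder; [InitializeOnLoad] re-runs the static constructor after every domain reload.
[InitializeOnLoad]
public static class PendingAnimationState
{
    const string Key = "ShaderGraphGenerator.PendingAnimation"; // assumed key name

    static PendingAnimationState()
    {
        if (!EditorPrefs.HasKey(Key)) return;

        string stateJson = EditorPrefs.GetString(Key);
        EditorPrefs.DeleteKey(Key); // consume the state so it is resumed only once

        // Defer until the editor has finished loading before touching assets.
        EditorApplication.delayCall += () => Resume(stateJson);
    }

    // Call this right before generating a new script that will force a recompile.
    public static void SaveAndRefresh(string stateJson)
    {
        EditorPrefs.SetString(Key, stateJson);
        AssetDatabase.Refresh(); // importing the new .cs file triggers compilation and a domain reload
    }

    static void Resume(string stateJson)
    {
        Debug.Log($"Resuming animation pipeline with state: {stateJson}");
    }
}
```

Static fields are reset by a domain reload, which is why the state has to go through EditorPrefs (or a file) rather than staying in memory.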

Master Thesis — Niloufar Moradijam — 2026
