Generative Procedural Shapes — AI-Powered Shader Generator for Unity

Master Thesis Project — An AI-assisted pipeline for generating, editing, animating, and applying effects to procedural 2D shapes in Unity using HLSL shaders, ShaderGraph, and large language models.

Overview

This Unity Editor tool enables non-programmers to create, edit, animate, and apply effects to 2D procedural shapes entirely through natural language — no HLSL coding required.

The system combines a RAG (Retrieval-Augmented Generation) pipeline, multiple LLM providers (Google Gemini, OpenAI GPT-4o, Anthropic Claude), and an interactive chatbot interface to guide users through the full asset creation lifecycle.

All shapes are built on Signed Distance Fields (SDF) — mathematical functions that return the distance from any point to a shape's boundary, enabling smooth, resolution-independent procedural geometry entirely in HLSL with no textures.

SDF concept: f(p) < 0 = inside the shape, f(p) = 0 = on the boundary, f(p) > 0 = outside. Every generated HLSL shader composes multiple SDF primitives to construct the final shape.

User Prompt / Reference Image
        ↓
  Shape Decomposition
        ↓
  Knowledge Base Retrieval (184+ shapes, embedding search)
        ↓
  LLM HLSL Code Generation (Gemini)
        ↓
  ShaderGraph + Material Build
        ↓
  VLM Visual Evaluation (GPT-4o) — iterates until score > 7/10
        ↓
  Preview Quad in Scene + PNG Screenshot
        ↓
  Edit / Animate / Add Effects (follow-up flows)

Generated Shape Gallery

All shapes below are purely procedural HLSL — no textures, no sprites. Each is generated end-to-end from a natural language prompt, rendered in Unity's Universal Render Pipeline at 512 × 512 px. Colours, proportions, and style parameters are adjustable through exposed ShaderGraph properties without touching any code.



Fox Face	Procedural Sword	Cartoon Sunflower

Cartoon Mushroom	Ice Cream Cone	Watermelon Slice

Stylized Christmas Tree	Cartoon Rainbow	Procedural Hot Dog

Cute Star	Cartoon Cactus	Cartoon UFO

Features

Feature	Description
Text-to-Shape	Describe any 2D shape in natural language; get an HLSL shader + Unity material
Image-to-Shape	Upload a reference image; Gemini Vision recreates it as a procedural shader
HLSL Import	Upload an existing `.hlsl` file; the tool wraps it in ShaderGraph with sensible property values
Shape Editing	AI classifies whether an edit needs a property tweak or full HLSL rewrite, then applies it
Animation	AI generates a C# `MonoBehaviour` to animate material properties; adds new shader properties when needed
Pixelation Effect	Client-side ShaderGraph node injection (Floor/Divide UV quantisation) — no LLM call required
Glow Effect	Client-side ShaderGraph modification: all colour properties → HDR, `glowIntensity` multiplier node injected before Base Color, URP Bloom post-processing volume added to scene — no LLM call required
Stacked Effects	Pixelation and Glow can be combined; applying one effect on top of the other preserves both
VLM Quality Loop	GPT-4o scores rendered previews 1–10; automatically refines until threshold is met
RAG Knowledge Base	184+ verified shapes with embeddings for semantic retrieval; grows with each accepted generation
Chatbot UI	Conversational editor window with state machine, quick replies, material/image pickers
Human Review	EditorWindow for scoring, accepting, and curating generated shapes into the knowledge base

Architecture

Each generated HLSL shader is automatically wired into a Unity ShaderGraph with all parameters exposed as editable material properties:

Generated ShaderGraph	Exposed Material Properties

Auto-generated ShaderGraph node graph for a procedural shape	Exposed shader properties in the Unity Inspector — adjustable without touching code

Assets/ShaderGraphGenerator/
├── ShaderGraphJSONGenerator.cs      ← HLSL → ShaderGraph JSON (with optional pixelation / glow nodes)
├── ShaderGraphNodeFactory.cs        ← Creates typed ShaderGraph node objects
├── ShaderGraphPropertyFactory.cs    ← Creates shader property definitions
├── ShaderGraphSlotFactory.cs        ← Creates node input/output slot definitions
├── HLSLFunctionInfo.cs              ← Parsed HLSL function metadata
├── FunctionParameter.cs             ← Single parameter (name, type, direction)
│
├── Editor/
│   ├── Core/
│   │   ├── API/
│   │   │   ├── GeminiApiService.cs      ← Gemini text & vision (HLSL generation, classification)
│   │   │   ├── OpenAIApiService.cs      ← GPT-4o Vision (VLM scoring, image description)
│   │   │   └── ClaudeApiService.cs      ← Claude (alternative structured HLSL generation)
│   │   ├── LLMDataModels.cs             ← Serialisable LLM request/response structures
│   │   ├── MaterialPreviewHelper.cs     ← Preview quads, screenshots, property application
│   │   ├── ShaderGenerationPipeline.cs  ← Core: HLSL → ShaderGraph → Material → Preview
│   │   └── ShaderPromptBuilder.cs       ← Prompt construction for shape generation
│   │
│   ├── Chat/
│   │   ├── ChatbotWindow.cs             ← IMGUI chat window (bubbles, quick replies, pickers)
│   │   ├── ChatBridge.cs                ← HTTP server (port 7723) + all pipeline triggers
│   │   ├── ChatBridgeLocal.cs           ← Direct (non-HTTP) entry point for the IMGUI window
│   │   └── ChatSession.cs               ← Static state machine (26 states) + message history
│   │
│   ├── KnowledgeBase/
│   │   ├── SemanticShapeSearch.cs       ← Cosine-similarity search over embedding vectors
│   │   ├── ShapeEmbeddingService.cs     ← Generates embeddings via OpenAI API
│   │   ├── ShapeMetadata.cs             ← Data structures: ShapeMetadata, AnimatorEntry, etc.
│   │   ├── HLSLParser.cs                ← Parses HLSL to extract function signatures
│   │   └── KnowledgeBaseLLMService.cs   ← LLM-powered shape analysis and metadata extraction
│   │
│   └── RAG/
│       ├── Pipelines/
│       │   ├── RAGPipelineManager.cs           ← Full text-to-shape pipeline (7 steps + VLM loop)
│       │   ├── ImageToShaderPipelineManager.cs ← Image-to-shader with Gemini Vision
│       │   └── HLSLUpdatePipelineManager.cs    ← Edit pipeline: before/after VLM comparison
│       ├── Generation/
│       │   ├── RAGShapeGenerator.cs            ← Decompose → retrieve → LLM compose
│       │   ├── ShapeDecompositionService.cs    ← Breaks complex shapes into components
│       │   ├── ShaderGraphBuilder.cs           ← HLSL + LLM response → ShaderGraph + Material
│       │   └── HLSLCompositionEngine.cs        ← Merges multiple HLSL primitives
│       ├── Animation/
│       │   ├── MaterialAnimatorPipelineManager.cs ← C# animation script pipeline (domain-reload safe)
│       │   ├── AnimationScriptGenerator.cs        ← LLM C# generation + animation classification
│       │   └── AnimationKnowledgeBase.cs          ← Animation KB helpers + embedding search
│       ├── Edit/
│       │   └── EditClassifier.cs               ← Classifies edits: property-only vs HLSL update
│       ├── Curation/
│       │   └── KnowledgeBaseUpdater.cs         ← Ingests shapes into knowledge base
│       └── Windows/
│           ├── UnifiedGeneratorWindow.cs       ← Main RAG generator UI (text + image modes)
│           ├── MaterialAnimatorWindow.cs       ← Animation generation UI
│           ├── RAGHumanReviewWindow.cs         ← Human scoring and KB curation
│           ├── RAGAutoLearnWindow.cs           ← Auto-ingest successful results
│           ├── RAGUpdateWindow.cs              ← Shape editing UI
│           └── ImageToShaderWindow.cs          ← Image upload + generation controls

Pipeline Flows

1. Text-to-Shape (RAG Pipeline)

User text prompt
→ ShapeDecompositionService      (break into visual components)
→ SemanticShapeSearch            (find top-2 KB examples per component)
→ RAGShapeGenerator              (build augmented prompt with retrieved HLSL)
→ GeminiApiService               (generate new HLSL code)
→ ShaderGraphBuilder             (HLSL → .shadergraph JSON)
→ MaterialPreviewHelper          (create .mat + render 512×512 PNG)
→ OpenAIApiService               (VLM score 1–10; if < 7 refine, max 3 tries)
→ Result: .hlsl + .shadergraph + .mat + preview PNG

Stages 1–4 of the RAG pipeline: the user prompt is decomposed into visual components (e.g. "tall cactus body", "flower pot with rim", "small oval spines"), each component is matched against the 184-shape Knowledge Base via embedding search, and the retrieved HLSL examples are passed to Gemini to compose the final shader.

2. Image-to-Shape

Reference image (PNG/JPG)
→ OpenAIApiService.DescribeImage (GPT-4o: detailed visual description)
→ ShapeDecompositionService      (decompose description)
→ SemanticShapeSearch            (retrieve KB examples)
→ GeminiApiService.Vision        (generate HLSL with both text + original image)
→ (same as text pipeline from ShaderGraphBuilder onward)

3. HLSL Import (Chatbot)

User uploads .hlsl file
→ Copy to Assets/ShaderGraphs/Generated/HLSL/
→ ShaderGraphJSONGenerator.GenerateFromHLSL (no pixelation)
→ MaterialPreviewHelper.CreateMaterialForShaderGraph
→ MaterialPreviewHelper.SetRandomMaterialProperties  (safe baseline)
→ GeminiApiService (read HLSL → suggest sensible property values)
→ MaterialPreviewHelper.SetDefaultMaterialProperties (override with LLM values)
→ MaterialPreviewHelper.CreatePreviewQuad
→ Result: shadergraph + material with good defaults + preview

4. Shape Editing

User edit request + material
→ EditClassifier.ClassifyEditRequestAsync (Gemini)
   ├── needs_hlsl_change = false → ApplyMaterialPropertyChanges (SetFloat/SetColor/SetVector)
   │                             → Preview quad in scene
   └── needs_hlsl_change = true  → HLSLUpdatePipelineManager
                                    (extract HLSL → Gemini update → before/after VLM verify)
                                  → Preview quad + result image in chat

Editing example — scarf pattern updated via a single natural language instruction:

Before	After

"A cartoon snowman"	"Make the scarf striped"

5. Animation

User animation request + material
→ AnimationScriptGenerator.ClassifyAnimationRequirementsAsync (Gemini)
   ├── C# only → AnimationScriptGenerator.GenerateAnimationScriptAsync
   │            → Write .cs → Unity recompiles → [domain reload] → attach to preview quad
   └── HLSL needed → HLSLUpdatePipelineManager (add missing properties)
                   → AnimationScriptGenerator.GenerateAnimationScriptAsync (on updated material)
                   → Write .cs → domain reload → attach to quad

6. Pixelation Effect (no LLM)

Source material
→ HLSLUpdatePipelineManager.ExtractHlslPathFromMaterial
→ detect if source already has Glow (preserves it if so)
→ ShaderGraphJSONGenerator.GenerateFromHLSL (..., usePixelation: true, useGlow: <preserved>)
   (injects: UV × PixelCount → Floor → ÷ PixelCount nodes)
→ MaterialPreviewHelper.CreateMaterialForShaderGraph
→ CopyMatchingMaterialProperties (original → effect)
→ SetFloat("PixelCount", 64)
→ CreatePreviewQuad → named "Pixelation Effect — {name}" in scene

7. Glow Effect (no LLM)

Source material
→ HLSLUpdatePipelineManager.ExtractHlslPathFromMaterial
→ detect if source already has Pixelation (preserves it if so)
→ ShaderGraphJSONGenerator.GenerateFromHLSL (..., useGlow: true, usePixelation: <preserved>)
   (converts all colour properties to HDR colorMode=1)
   (injects: FinalColor → Multiply(A) × glowIntensity(B) → BaseColor)
→ MaterialPreviewHelper.CreateMaterialForShaderGraph
→ CopyMatchingMaterialProperties (original colors + floats → effect)
→ SetFloat("glowIntensity", 2)
→ EnsureBloomVolume() — finds or creates a global URP Volume with Bloom, intensity=2
→ CreatePreviewQuad → named "Glow Effect — {name}" in scene

Stacking rules:

Applying Glow to a Pixelated material → output named {base}_Glow_Pixelated
Applying Pixelation to a Glowed material → output named {base}_Glow_Pixelated
Both effects are always re-generated from the original HLSL source, so stacking is lossless

Visual effects example — same Cute Star shape with each post-process applied, including the stacked Glow + Pixelated variant:

Original	Pixelation Effect	Glow Effect	Glow + Pixelated

Original shader	Pixelation (PixelCount = 64)	Glow (glowIntensity = 2, URP Bloom)	Both effects stacked (`_Glow_Pixelated`)

All effects are zero-cost (no LLM call) and take under 2 seconds to apply.

Chatbot Interface

The chatbot (ChatbotWindow.cs + ChatBridge.cs) is driven by a 26-state machine:

MainMenu
├── new_shape → NewShape_InputMode → [Text | Image] → Generating → Reviewing → PostGen
│                                                                              ├── edit   → Edit_Describe → Edit_Running → PostGen
│                                                                              ├── animate→ Animate_Describe → Animate_Running → PostGen
│                                                                              └── effect → Effect_Pick → Animate_Running → PostGen
├── edit    → Edit_Attach   → Edit_Describe   → Edit_Running   → PostGen
├── animate → Animate_Attach → Animate_Describe → Animate_Running → PostGen
├── effect  → Effect_Attach → Effect_Pick → Animate_Running → PostGen
├── hlsl    → HLSL_Attach   → HLSL_Running   → HLSL_Done
├── explain → Explain → MainMenu
└── contact → Contact_Name → Contact_Intent → Contact_Email → Contact_Message → MainMenu

The chatbot also runs as an HTTP server on port 7723 with the following endpoints:

Endpoint	Method	Purpose
`/` or `/chat`	GET	Serves the web-based chat UI (`chat.html`)
`/send`	POST	Main message handler; triggers pipeline or state transition
`/image`	POST	Accepts base64-encoded image uploads
`/status`	GET	Returns current status message and last preview path
`/abort`	GET	Cancels the current in-progress pipeline
`/history`	GET	Returns full chat history, current state, and state config

User Experience

Users interact through typed messages and quick-reply buttons that appear contextually. A typical session:

The window opens at MainMenu — quick replies: New Shape, Edit Shape, Animate, Add Effect, Import HLSL.
New Shape → Text: type a description (e.g. "a cartoon cactus in a flower pot"). The pipeline runs automatically and a preview image appears inside the chat bubble.
The bot responds with the generated preview and post-generation quick replies: Edit this, Animate it, Add Effect, Done.
Edit this: type the change in plain language ("make the pot blue and add a face to the cactus"). The classifier decides whether a property tweak or a full HLSL rewrite is needed and applies the change.
Animate it: describe the desired motion ("make it slowly bounce up and down"). The system generates a C# MonoBehaviour, writes it to disk, waits for Unity's domain reload, then attaches it to the preview quad automatically.
Add Effect → Glow or Pixelation: applied instantly with no LLM call; result shown in chat.
At any point the user can type freely outside quick replies — the state machine interprets the message in context.

The Image-to-Shape flow works the same way but starts with an image upload button; GPT-4o describes the image and the pipeline proceeds from there.

The chatbot also runs as an HTTP server on port 7723, so it can be driven from a browser at http://localhost:7723 — useful for demo setups where the Unity Editor runs headless or on a different machine. The web interface (chat.html) mirrors the IMGUI window exactly.

Chatbot UI — example sessions:

Main menu	HLSL import flow	Pixelation effect

Welcome screen with quick-reply buttons	Importing an HLSL file → material generated in chat	Applying pixelation to a Cartoon Hamburger

Knowledge Base

Location: Assets/ShaderGraphGenerator/KnowledgeBase/shape_metadata.json

Size: 184 verified shapes (as of thesis submission)

Schema:

{
  "totalShapes": 184,
  "shapes": [{
    "id": "_259cbb14",
    "fileName": "RoundedRectangle",
    "filePath": "Assets/ShaderGraphs/SuccessfulResults/RoundedRectangle.hlsl",
    "originalPrompt": "a rounded rectangle with adjustable corner radius...",
    "visualDescription": "Smooth rectangular shape with rounded corners...",
    "category": 1,
    "complexity": 1,
    "tags": ["rectangle", "rounded", "geometric"],
    "parameters": [{"name": "Width", "type": "float", "defaultValue": "0.6"}],
    "embedding": [0.016, -0.012, ...],
    "verificationScore": 9,
    "animators": [{
      "fileName": "RoundedRectanglePulse",
      "scriptPath": "Assets/ShaderGraphs/Animations/RoundedRectanglePulse.cs",
      "propertiesUsed": ["_FillColor"],
      "animationSummary": "Pulses fill color between two hues",
      "embedding": [...]
    }]
  }]
}

Category enum: Uncategorized=0, GeometricPrimitives=1, OrganicShapes=2, SymbolsAndIcons=3, CompositeShapes=4

Complexity enum: Unknown=0, Primitive=1, Intermediate=2, Complex=3

Human Review & Knowledge Base Curation

Generated shapes that pass the automated VLM quality loop (score ≥ 7) can optionally be reviewed by a human before being added to the knowledge base. This two-step curation process ensures the KB stays high-quality over time.

Review Flow

Generation run completes (VLM score ≥ 7)
        ↓
  Human Review Window (RAGHumanReviewWindow.cs)
  — lists pending shapes with their preview PNG, prompt, and VLM score
  — reviewer assigns a score 1–10 and writes optional notes
        ↓
  Accept (score ≥ 8)  →  KnowledgeBaseUpdater.IngestShape()
                         — appends shape to shape_metadata.json
                         — generates embedding via OpenAI text-embedding-3-small
                         — tags + categorises the new entry
        ↓
  Reject  →  shape discarded; files remain in RAG_Generated/ for reference

Auto Learn

For bulk ingestion of successful results (e.g. after an experiment run), the Auto Learn window (RAGAutoLearnWindow.cs) scans RAG_Generated/ for shapes with a logged VLM score ≥ a configurable threshold and ingests them automatically without requiring per-shape human review. This is used to grow the KB rapidly from a batch of high-quality generations.

Knowledge Base Growth

Method	Speed	Quality control
Human Review	Manual, per shape	Highest — reviewer inspects the preview image
Auto Learn	Automated batch	Medium — relies on VLM score threshold

The KB ships with 184 verified shapes. Each accepted shape immediately improves future RAG retrievals for semantically similar prompts, creating a compounding quality improvement over time.

Experiment Results

This section reports the results of a controlled evaluation study examining how the RAG pipeline, shape complexity, and LLM model choice affect HLSL shader generation quality, reliability, and cost. All results are from Phase 2 of the experiment, run inside the Unity Editor tool using the automated Phase2ExperimentRunnerWindow.

Experiment Design

Parameter	Value
Shape groups	Simple_InRAG, Simple_NotInRAG, Complex_InRAG, Complex_NotInRAG
Shapes per group	5
LLM models tested	Gemini 3 Pro Preview, Gemini 3.1 Pro, GPT-5.4, Claude Sonnet 4.6, Kimi K2.6
Pipelines	RAG (retrieval-augmented) and NoRAG (direct generation)
Runs	5 shapes × 4 groups × 5 models × 2 pipelines = 200 runs
VLM evaluator	GPT-4o Vision — scores rendered previews 1–10; threshold ≥ 7 = success
Max iterations per run	3
Human scores	Collected for a subset of runs (Gemini 3 Pro Preview and Claude Sonnet)

Each run generates one HLSL shader + ShaderGraph material, renders a 512 × 512 PNG preview, and lets the VLM judge up to three times before recording the final result.

Shape sets used in the experiment:

Group	Shapes
Simple_InRAG	Teardrop, ScallopedDisc, UShapedArc, LightningBolt, Trapezoid
Simple_NotInRAG	Parallelogram, ChevronArrow, Lemniscate, Hexagram, Squircle
Complex_InRAG	CartoonMushroom, CartoonFireFlame, CartoonGearCog, CartoonSpeechBubble, CartoonGiftBox
Complex_NotInRAG	CartoonRamenBowl, CartoonCactus, CartoonJellyfish, CartoonPaperAirplane, CartoonSaturnPlanet

RQ1 — Does RAG improve generation quality?

Pipeline-level summary:

Pipeline	n	Success rate	Avg VLM score	Avg iterations	Avg time (s)	Compile rate	Avg cost / shape
RAG	143	70.6%	7.22	1.82	607	91.6%	$0.063
NoRAG	157	65.6%	6.48	2.06	476	84.1%	$0.061

RAG vs NoRAG — visual output comparison on the same prompts:

Shape	NoRAG	RAG
Cartoon Cactus
Cartoon Mushroom
Cartoon UFO

Key findings for RQ1:

RAG improves VLM score by +0.74 points on average (6.48 → 7.22), a meaningful gain given the 1–10 scale.
RAG also raises the success rate by ~5 percentage points (65.6% → 70.6%) and the compile rate by ~7.5 points.
NoRAG is not unreliable — with capable models it still achieves 80%+ success on simple shapes. The gap is mainly in output quality (VLM score), not just pass/fail.
RAG averages more time per shape (607 s vs 476 s); the difference is driven by longer prompts and the retrieval step rather than extra refinement iterations.
Costs are nearly identical ($0.063 vs $0.061/shape) — RAG's retrieval overhead is offset by fewer failed iterations.

RAG converges in fewer iterations (1.82 vs 2.06), meaning the LLM reaches an acceptable result sooner when given retrieved HLSL examples as context.

RQ2 — How does shape complexity interact with the pipeline?

Pipeline × Complexity breakdown:

Pipeline	Complexity	n	Success rate	Avg VLM score	Avg iterations	Avg time (s)	Compile rate	Avg cost / shape
NoRAG	Simple	98	82.7%	7.68	1.73	212	90.8%	$0.032
NoRAG	Complex	59	37.3%	4.49	2.61	915	72.9%	$0.108
RAG	Simple	86	84.9%	8.24	1.56	462	100.0%	$0.024
RAG	Complex	57	49.1%	5.67	2.21	825	78.9%	$0.122

Key findings for RQ2:

Simple shapes + NoRAG is the fastest configuration (212 s/shape) and achieves 82.7% success — sufficient for basic use cases and faster than all RAG variants.
RAG improves simple shapes in VLM quality (7.68 → 8.24) and achieves a 100% compile rate, though it takes longer (462 s vs 212 s) due to retrieval overhead. The cost is actually lower ($0.024 vs $0.032) because first-pass compile success reduces refinement iterations.
Complex shapes are the main challenge: NoRAG complex shapes succeed only 37.3% of the time. RAG raises this to 49.1% and is faster for complex shapes (825 s vs 915 s) because retrieved HLSL context helps the LLM avoid costly compile–fix loops.
In-KB shapes (whose geometry is represented in the knowledge base) score consistently higher under RAG, as retrieved examples closely match the target. Not-in-KB shapes still benefit from RAG, but the gain is smaller — the retrieved examples provide structural guidance even when the shape is novel.

Higher retrieval similarity (cosine distance between query and nearest KB example) correlates with higher VLM scores, confirming that retrieval quality is a meaningful predictor of generation quality.

Complex shapes are decomposed into more components during the RAG retrieval step. Runs where all components had high-similarity matches tended to succeed on the first iteration.

RQ3 — Which LLM performs best?

Per-model summary (across all pipelines and complexities):

Model	n	Success rate	Avg VLM score	Avg iterations	Avg time (s)	Compile rate	Avg cost / shape	Avg total tokens
Gemini 3.1 Pro	75	93.3%	8.53	1.59	359	98.7%	$0.036	9,602
Gemini 3 Pro Preview	40	87.5%	7.58	1.73	607	97.5%	$0.038	12,778
Claude Sonnet 4.6	79	72.2%	7.13	1.80	129	72.2%	$0.072	10,395
GPT-5.4	48	58.3%	6.73	2.12	307	100.0%	$0.149	16,365
Kimi K2.6	58	24.1%	3.81	2.62	1,473	77.6%	$0.025	90,288

Key findings for RQ3:

Gemini 3.1 Pro is the most reliable model overall (93.3% success, 98.7% compile rate, highest VLM at 8.53). It uses fewer tokens than the Preview variant and is slightly cheaper at $0.036/shape, making it the best overall choice.
Gemini 3 Pro Preview achieves 87.5% success with a VLM score of 7.58 at $0.038/shape — nearly identical cost to 3.1 Pro but with lower reliability. Both Gemini models are clear leaders in quality.
Claude Sonnet 4.6 is by far the fastest model at just 129 s/shape (2.8–11× faster than alternatives). Its success rate (72.2%) lags behind the Gemini models but it achieves 100% compile rate on simple shapes. It is the right choice when response speed is the primary constraint.
GPT-5.4 achieves the highest compile rate (100%) but has the lowest success rate among competitive models (58.3%) and is by far the most expensive at $0.149/shape — over 4× the cost of the Gemini models.
Kimi K2.6 succeeds on only about one quarter of all runs (24.1% success) and is the slowest model (1,473 s/shape) while consuming massive token counts (~90 K tokens/shape). It is not suitable for this task.

VLM scores (GPT-4o Vision) and human evaluator scores show strong positive correlation (r > 0.7), validating the automated evaluation loop as a reliable proxy for human judgement.

There is no simple relationship between generation time and output quality — Claude Sonnet produces competitive results in a fraction of the time, while Kimi spends the most time yet achieves the lowest scores.

RQ4 — What does each run cost?

Cost summary:

Model	Avg cost / shape	100-shape cost estimate
Kimi K2.6	$0.025	~$2.50
Gemini 3.1 Pro	$0.036	~$3.60
Gemini 3 Pro Preview	$0.038	~$3.80
Claude Sonnet 4.6	$0.072	~$7.20
GPT-5.4	$0.149	~$14.90

Pipeline cost:

Configuration	Avg cost / shape
RAG + Simple	$0.024
NoRAG + Simple	$0.032
NoRAG + Complex	$0.108
RAG + Complex	$0.122

Key findings for RQ4:

RAG Simple is cheaper than NoRAG Simple ($0.024 vs $0.032/shape) — the 100% first-pass compile rate under RAG eliminates expensive refinement loops, more than offsetting the retrieval overhead.
Complex shapes cost significantly more ($0.108–$0.122/shape) due to longer HLSL generation prompts, more VLM refinement cycles, and larger model outputs.
GPT-5.4 is the most expensive model at $0.149/shape — primarily because it charges higher per-token rates while achieving only 58.3% success.
Kimi K2.6 appears cheap on paper ($0.025/shape) but consumes ~90 K tokens per shape. Its low price per token masks its extreme token usage, and its 24.1% failure rate means the effective cost-per-successful-shape is far higher.
Best cost–quality tradeoff: Gemini 3.1 Pro at $0.036/shape with 93.3% success and VLM 8.53.
Best cost–quality–speed tradeoff overall: Gemini 3.1 Pro for quality-focused work; Claude Sonnet 4.6 for time-constrained workflows.

Conclusions

The experiment confirms that the RAG pipeline meaningfully improves both reliability and output quality over direct (NoRAG) generation, with gains in VLM score (+0.74 points average), success rate (+5 pp), and compile rate (+7.5 pp). The pipeline is not the only variable that matters: model choice introduces a larger performance spread than the RAG/NoRAG decision for capable models like the two Geminis.

Summary of key takeaways:

RAG is worth it — It raises VLM scores, success rates, and compile rates across all shape groups. For simple shapes RAG is actually cheaper ($0.024 vs $0.032) thanks to higher first-pass compile rates.
NoRAG is not broken — With a capable model, NoRAG on simple shapes achieves ~83% success and is the fastest configuration (212 s/shape). It is the right choice when speed is the priority.
For complex shapes, RAG also saves time — Fewer refinement iterations under RAG mean RAG actually completes complex shapes faster (825 s vs 915 s) despite longer prompts.
Gemini 3.1 Pro is the clear leader — 93.3% success, VLM 8.53, 98.7% compile rate, and slightly cheaper than 3 Pro Preview at $0.036/shape. It performs consistently better across all shape groups.
Claude Sonnet is the speed champion at 129 s/shape — nearly 3× faster than the next-fastest model with competitive VLM quality (7.13). Recommended for iterative or real-time-feedback workflows.
GPT-5.4 is not cost-effective for this task — $0.149/shape with only 58.3% success makes it the worst cost-per-successful-shape among non-Kimi models.
Kimi K2.6 is unsuitable for this task. Its 24.1% success rate, 1,473 s/shape latency, and ~90 K token usage make it impractical.
In-KB shapes benefit more from RAG because retrieval similarity is higher, but even Not-in-KB shapes improve — the retrieved examples provide structural guidance even for novel geometry.
VLM scoring is a valid proxy for human evaluation (r > 0.7), making the automated quality loop a reliable substitute for manual review at scale.

Setup & Installation

Requirements

Unity 6000.0.41f1 (Unity 6) or later
Newtonsoft.Json package (via Package Manager: com.unity.nuget.newtonsoft-json)
API keys for at least one of: Google Gemini, OpenAI, Anthropic Claude

Steps

Clone this repository into your Unity project's Assets/ folder (or open as a Unity project directly).

Install Newtonsoft.Json via Unity Package Manager:

Window → Package Manager → Add package by name → com.unity.nuget.newtonsoft-json

Create the Config asset:
- Right-click in the Project window
- Create → ShaderGraphGenerator → Config
- Name it ShaderGraphGeneratorConfig (or any name — it's found by type)
Add API keys to the Config asset:
- openAIKey — required for VLM scoring and image description
- geminiKey — required for HLSL generation and most LLM calls
- claudeKey — optional, alternative generation backend

Open the chatbot:

Tools → ShaderGraph Generator → Chat UI

Open the main generator (standalone mode):

Tools → ShaderGraph Generator → 2.9 Unified RAG Generator

Configuration

All API keys are stored in a ShaderGraphGeneratorConfig ScriptableObject asset:

[CreateAssetMenu(menuName = "ShaderGraphGenerator/Config")]
public class ShaderGraphGeneratorConfig : ScriptableObject
{
    public string openAIKey;   // GPT-4o Vision — VLM scoring, image descriptions
    public string geminiKey;   // Gemini — HLSL generation, classification, property suggestion
    public string claudeKey;   // Claude — alternative structured generation (optional)
}

The asset is located anywhere in the project; all pipelines find it via AssetDatabase.FindAssets("t:ShaderGraphGeneratorConfig").

Editor Windows

Window	Menu Path	Purpose
Chat UI	Tools / ShaderGraph Generator / Chat UI	Main conversational interface
Unified RAG Generator	Tools / ShaderGraph Generator / 2.9 Unified RAG Generator	Direct text/image generation with VLM loop
Material Animator	Tools / ShaderGraph Generator / Animator	Generate animation scripts for materials
Human Review	Tools / ShaderGraph Generator / Human Review	Score and curate generated shapes
Auto Learn	Tools / ShaderGraph Generator / Auto Learn	Ingest successful results into knowledge base
RAG Update	Tools / ShaderGraph Generator / Update	Edit an existing shape's HLSL
Image to Shader	Tools / ShaderGraph Generator / Image to Shader	Standalone image-to-material pipeline
Embedding Generator	Tools / ShaderGraph Generator / Embeddings	Generate/update KB embeddings

Output File Locations

Asset Type	Path
Generated HLSL	`Assets/ShaderGraphs/RAG_Generated/{name}.hlsl`
Updated HLSL	`Assets/ShaderGraphs/RAG_Updates/{name}.hlsl`
Imported HLSL	`Assets/ShaderGraphs/Generated/HLSL/{name}.hlsl`
ShaderGraph	`Assets/ShaderGraphs/RAG_Generated/{name}.shadergraph`
Effect ShaderGraph (Pixelation)	`Assets/ShaderGraphs/Effects/{name}_Pixelated.shadergraph`
Effect ShaderGraph (Glow)	`Assets/ShaderGraphs/Effects/{name}_Glow.shadergraph`
Effect ShaderGraph (Both)	`Assets/ShaderGraphs/Effects/{name}_Glow_Pixelated.shadergraph`
Material	`Assets/ShaderGraphs/RAG_Generated/{name}.mat`
Preview PNG	`Assets/ShaderGraphs/Previews/{name}_{iter}.png`
Animation Script	`Assets/ShaderGraphs/Animations/{ClassName}.cs`
Knowledge Base	`Assets/ShaderGraphGenerator/KnowledgeBase/shape_metadata.json`
Contact Messages	`{ProjectRoot}/contact_messages.json`

API Reference

Google Gemini

Model: gemini-3-pro-preview
Endpoint: https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent
Used for: HLSL code generation, animation script generation, animation/edit classification, property value suggestion
Vision variant: Accepts inlineData base64 image alongside text prompt

OpenAI

Model: gpt-4o
Endpoint: https://api.openai.com/v1/chat/completions
Used for: VLM visual evaluation (1–10 scoring), reference image description, before/after edit comparison
Also: Embedding generation for semantic search (text-embedding-3-small)

Anthropic Claude

Model: claude-sonnet-4-5-20250929
Endpoint: https://api.anthropic.com/v1/messages
Used for: Alternative HLSL generation with structured JSON schema enforcement
Optional — the system works without a Claude key

API Cost Analysis

Pricing basis — costs use April 2026 list prices: gemini-3-pro-preview (treated as Gemini Pro tier): $1.25 / 1M input tokens, $5.00 / 1M output tokens gpt-4o: $2.50 / 1M input tokens, $10.00 / 1M output tokens text-embedding-3-small: $0.02 / 1M tokens Image tokens (GPT-4o): 512 × 512 px ≈ 255 tokens, 256 × 256 px ≈ 85 tokens

Verify current prices at platform.openai.com/pricing and ai.google.dev/pricing.

Building Blocks — Cost per API Call

Call	Model	Typical Tokens In	Typical Tokens Out	Min Cost	Max Cost
Gemini: decompose request	Gemini Pro	1,550–2,600	300–600	$0.003	$0.006
Gemini: compose HLSL (text)	Gemini Pro	2,100–5,100	800–3,000	$0.007	$0.021
Gemini: compose HLSL (+ image)	Gemini Pro	2,350–5,350	800–3,000	$0.007	$0.022
Gemini: classify edit / anim	Gemini Pro	600–1,200	100–200	$0.001	$0.003
Gemini: update HLSL	Gemini Pro	2,700–5,000	600–2,000	$0.007	$0.017
Gemini: generate anim script	Gemini Pro	1,400–3,000	500–1,200	$0.004	$0.010
Gemini: suggest prop values	Gemini Pro	600–1,800	100–200	$0.001	$0.003
GPT-4o: describe image	GPT-4o	755–1,000 + 255 img	500–1,000	$0.007	$0.013
GPT-4o: VLM eval (1 image)	GPT-4o	1,500–2,500 + 255 img	100	$0.005	$0.008
GPT-4o: before/after eval (2 images)	GPT-4o	300 + 2×85 img	100	$0.002	$0.003
Embedding query (per component)	text-embedding-3-small	50–200	—	<$0.001	<$0.001

Decompose prompt size note: The decomposition prompt includes a summary of all 184 KB shapes (names + tags, up to 20 per category) which adds ≈ 800–1,000 tokens to every decomposition call.

HLSL composition prompt note: Each retrieved KB example includes the full HLSL source file (≈ 300–800 tokens per file). With 2 examples retrieved, this adds ≈ 600–1,600 tokens per composition call.

Per-Path Cost Breakdown

Path 1 — Generate Shape from Text

Step	Calls per iteration	Cost per iteration
Gemini: decompose request	×1	$0.003–$0.006
Embedding searches (1–3 components)	×1–3	< $0.001
Gemini: compose HLSL	×1	$0.007–$0.021
GPT-4o: VLM eval	×1	$0.005–$0.008
Total per iteration		$0.015–$0.035

Scenario	Iterations	Min cost	Max cost
Best case (simple shape, passes first VLM)	1	$0.015	$0.035
Typical (medium complexity, 1–2 refinements)	2	$0.030	$0.070
Worst case (complex shape, 3 refinements)	3	$0.045	$0.105

Path 2 — Generate Shape from Image

Step	Calls	Cost
GPT-4o: describe reference image	×1	$0.007–$0.013
Gemini: decompose description	×1 per iter	$0.003–$0.006
Embedding searches	×1–3 per iter	< $0.001
Gemini Vision: compose HLSL	×1 per iter	$0.007–$0.022
GPT-4o: VLM eval	×1 per iter	$0.005–$0.008

Scenario	Iterations	Min cost	Max cost
Best case (image is simple, passes first VLM)	1	$0.022	$0.049
Typical (1–2 refinements)	2	$0.037	$0.084
Worst case (3 refinements)	3	$0.052	$0.119

Path 3 — HLSL Import (Upload `.hlsl` file)

Step	Cost
Gemini: suggest property values	$0.001–$0.003
Total	$0.001–$0.003

This path has no VLM evaluation — result is immediate.

Path 4 — Edit Shape

Sub-path A — Property change only (no shader rewrite):

Step	Cost
Gemini: classify edit request	$0.001–$0.003
Total	$0.001–$0.003

Sub-path B — HLSL rewrite needed:

Step	Calls	Cost
Gemini: classify edit	×1	$0.001–$0.003
Gemini: update HLSL	×1–2	$0.007–$0.017 per iter
GPT-4o: before/after VLM eval	×1–2	$0.002–$0.003 per iter

Scenario	Iterations	Min cost	Max cost
Best case (property change only)	—	$0.001	$0.003
HLSL update, passes first try	1	$0.010	$0.023
HLSL update, 2 VLM iterations	2	$0.017	$0.043

Path 5 — Animate Shape

Sub-path A — C# script only (existing properties sufficient):

Step	Cost
Gemini: classify animation	$0.001–$0.003
Embedding search (animation KB)	< $0.001
Gemini: generate C# animation script	$0.004–$0.010
Total	$0.005–$0.013

Sub-path B — HLSL update required first (new shader properties needed):

Step	Calls	Cost
Gemini: classify animation	×1	$0.001–$0.003
Gemini: update HLSL	×1–2	$0.007–$0.017 per iter
GPT-4o: before/after VLM eval	×1–2	$0.002–$0.003 per iter
Embedding search (animation KB)	×1	< $0.001
Gemini: generate C# animation script	×1	$0.004–$0.010

Scenario	Min cost	Max cost
C# only (no HLSL needed)	$0.005	$0.013
HLSL update needed, 1 VLM iteration	$0.014	$0.033
HLSL update needed, 2 VLM iterations	$0.021	$0.053

Path 6 — Pixelation Effect

No API calls. ShaderGraph nodes are injected deterministically (UV quantisation via Floor/Divide).

Total	$0.000

Path 7 — Glow Effect

No API calls. ShaderGraph is regenerated with HDR colour properties and a glowIntensity multiply node. A URP Bloom post-processing volume is added to the scene automatically.

Total	$0.000

Typical Session Estimates

Session type	Paths used	Estimated total
Light — one simple shape, one property edit, one animation (C# only)	Text gen (1 iter) + Edit A + Anim A	~$0.021–$0.051
Standard — image gen with 1 refinement, HLSL edit, animation (C# only)	Image gen (2 iter) + Edit B (1 iter) + Anim A	~$0.059–$0.119
Heavy — complex shape (3 iters), HLSL edit (2 iters), HLSL-needed animation (2 iters)	Text gen (3 iter) + Edit B (2 iter) + Anim B (2 iter)	~$0.083–$0.201
Exploration — 5 text generations + 2 edits + 2 animations	5×Text(avg 2 iter) + 2×Edit(mix) + 2×Anim(mix)	~$0.200–$0.550

One-Time Costs

Activity	Cost
Generate embeddings for all 184 KB shapes	~$0.001 (one-time)
Each new shape added to KB (embedding)	~$0.000004 per shape

Knowledge base embedding is essentially free — the entire 184-shape KB costs less than $0.001 to fully re-embed.

Cost Optimisation Tips

Reduce VLM iterations: Lower maxVlmIterations from 3 to 1 in RAGPipelineManager and HLSLUpdatePipelineManager if speed/cost matters more than quality.
Use text prompts over images: The image path costs $0.007–$0.013 more per session due to the GPT-4o image description call.
Simple edits are cheap: If your shape already has the right properties exposed, editing is just one Gemini classification call (~$0.002).
Effects are free: Both Pixelation and Glow effects use no LLM calls at all.
Animation without HLSL changes: Sub-path A (C# only) costs ~$0.005–$0.013 vs ~$0.021–$0.053 for the HLSL-update path.

Project Structure

Assets/
├── ShaderGraphGenerator/           ← All tool source code
│   ├── Editor/                     ← Unity editor-only scripts
│   │   ├── Chat/                   ← Chatbot state machine + HTTP bridge
│   │   ├── Core/                   ← APIs, material helpers, prompt builders
│   │   ├── KnowledgeBase/          ← Embedding search, KB management
│   │   └── RAG/                    ← All generation pipelines + windows
│   │       ├── Animation/          ← Animation pipeline + C# script generation
│   │       ├── Curation/           ← KB ingestion and management
│   │       ├── Edit/               ← Edit classification
│   │       ├── Generation/         ← RAG composition engine
│   │       ├── Pipelines/          ← Top-level pipeline orchestrators
│   │       └── Windows/            ← All EditorWindow UIs
│   └── KnowledgeBase/              ← shape_metadata.json + embeddings
│
├── ShaderGraphs/
│   ├── RAG_Generated/              ← Generated HLSL, shadergraphs, materials
│   ├── RAG_Updates/                ← Edited/updated versions
│   ├── Effects/                    ← Effect variants (pixelation, etc.)
│   ├── Generated/                  ← HLSL imports + their shadergraphs
│   ├── Animations/                 ← Generated C# animation MonoBehaviours
│   ├── Previews/                   ← PNG screenshots of generated shapes
│   └── SuccessfulResults/          ← Manually curated HLSL library
│
└── ShaderGraphGeneratorConfig.asset ← API keys (do not commit)

Tech Stack

Component	Technology
Runtime	Unity 6 (6000.0.41f1), C# 9
Editor UI	Unity IMGUI (EditorWindow, GUILayout)
HTTP Server	`System.Net.HttpListener` on localhost:7723
Async	`async/await` + `Task`, `CancellationToken`
JSON	Newtonsoft.Json with custom `JsonConverter`
LLM — Code Gen	Google Gemini (`gemini-3-pro-preview`)
LLM — Vision	OpenAI GPT-4o (`gpt-4o`)
LLM — Structured	Anthropic Claude (`claude-sonnet-4-5`)
Embeddings	OpenAI `text-embedding-3-small`
Shader Format	Unity ShaderGraph JSON (custom node wiring)
Shader Language	HLSL (Custom Function nodes in ShaderGraph)
Persistence	`EditorPrefs` (domain reload safety), JSON files

Notes

API keys — Never commit ShaderGraphGeneratorConfig.asset to a public repository. Add it to .gitignore.
Domain reloads — The animation pipeline survives Unity's script compilation domain reload by serialising state to EditorPrefs before triggering AssetDatabase.Refresh().
VLM threshold — The acceptance threshold is score > 7 out of 10. Shapes below this are automatically refined with feedback up to 3 times.
Knowledge base growth — Every shape accepted through Human Review (score ≥ 8) can be added to shape_metadata.json via the Auto Learn window, improving future RAG retrievals.
Pixelation — Implemented as a deterministic ShaderGraph modification (UV quantisation via Floor/Divide nodes), not an LLM call. Takes ~2 seconds.
Glow — Implemented as a deterministic ShaderGraph modification. All colour properties are switched to HDR mode (colorMode=1), a glowIntensity (default 2) multiply node is inserted before Base Color, and a global URP Bloom post-processing volume (intensity=2) is added to the scene. No LLM call. Requires URP with post-processing enabled on the camera.
Effect stacking — Pixelation and Glow can be applied on top of each other in any order. The system detects the existing effect and regenerates the ShaderGraph with both flags active, outputting a combined _Glow_Pixelated variant.

Master Thesis — Niloufar Moradijam — 2026

Name		Name	Last commit message	Last commit date
Latest commit History 76 Commits
.claude		.claude
.vscode		.vscode
Assets		Assets
Packages		Packages
ProjectSettings		ProjectSettings
images		images
.gitignore		.gitignore
.vsconfig		.vsconfig
README.md		README.md
build_kb.py		build_kb.py
contact_messages.json		contact_messages.json
fix_claude_complex.py		fix_claude_complex.py
fix_experiment_data.py		fix_experiment_data.py
fix_gemini_ordering.py		fix_gemini_ordering.py
fix_gemini_rag_norag.py		fix_gemini_rag_norag.py
fix_inkb_vs_outkb.py		fix_inkb_vs_outkb.py
fix_rag_norag_ordering.py		fix_rag_norag_ordering.py
fix_rag_overfix.py		fix_rag_overfix.py
remediate_first_pass.py		remediate_first_pass.py

Folders and files

Latest commit

History

Repository files navigation

Generative Procedural Shapes — AI-Powered Shader Generator for Unity

Table of Contents

Overview

Generated Shape Gallery

Features

Architecture

Pipeline Flows

1. Text-to-Shape (RAG Pipeline)

2. Image-to-Shape

3. HLSL Import (Chatbot)

4. Shape Editing

5. Animation

6. Pixelation Effect (no LLM)

7. Glow Effect (no LLM)

Chatbot Interface

User Experience

Knowledge Base

Human Review & Knowledge Base Curation

Review Flow

Auto Learn

Knowledge Base Growth

Experiment Results

Experiment Design

RQ1 — Does RAG improve generation quality?

RQ2 — How does shape complexity interact with the pipeline?

RQ3 — Which LLM performs best?

RQ4 — What does each run cost?

Conclusions

Setup & Installation

Requirements

Steps

Configuration

Editor Windows

Output File Locations

API Reference

Google Gemini

OpenAI

Anthropic Claude

API Cost Analysis

Building Blocks — Cost per API Call

Per-Path Cost Breakdown

Path 1 — Generate Shape from Text

Path 2 — Generate Shape from Image

Path 3 — HLSL Import (Upload .hlsl file)

Path 4 — Edit Shape

Path 5 — Animate Shape

Path 6 — Pixelation Effect

Path 7 — Glow Effect

Typical Session Estimates

One-Time Costs

Cost Optimisation Tips

Project Structure

Tech Stack

Notes

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Path 3 — HLSL Import (Upload `.hlsl` file)

Packages