Skip to content

Latest commit

 

History

History
788 lines (636 loc) · 38.4 KB

File metadata and controls

788 lines (636 loc) · 38.4 KB

2026-01-04

Response Healing for Malformed LLM JSON

Feature: Client-side JSON healing module that automatically repairs malformed tool arguments from LLM responses. Mirrors OpenRouter's server-side response-healing plugin behavior, but works for tool calling (which doesn't use response_format).

Problem: PostgreSQL logs revealed recurring JSON parse failures:

  • JSON parse error for read_file: Unexpected non-whitespace character after JSON at position 32 (mimo-v2-flash)
  • JSON parse error for write_file: Expected ',' or '}' after property value in JSON at position 4700
  • Unicode tool call delimiters from Kimi K2: <|tool▁call▁begin|>{...}<|tool▁call▁end|>

Solution: 4-phase healing pipeline in services/responseHealing.js:

  1. Strip Wrappers - Remove markdown code blocks, JSONP wrappers, Unicode delimiters, extract from prose
  2. Repair Structure - Fix trailing commas, close unclosed brackets/braces, add missing commas
  3. Normalize Quotes - Convert smart quotes (U+201C etc.), single quotes to double, quote unquoted keys
  4. Final Cleanup - Convert Python bools (Truetrue), remove comments

Changes Made:

File Change
services/responseHealing.js New ~160 line module
services/responseHealing.test.js 38 unit tests from production data
index.js:42 Import healAndParseJSON
index.js:2036-2049 Replace try/catch with healing
services/editService.js:7 Import healAndParseJSON
services/editService.js:240-249 Replace try/catch with healing
package.json Add npm test script

Test Coverage (38 tests):

  • 4 real production failures from PostgreSQL logs
  • 4 markdown wrapper patterns
  • 5 structural repairs (commas, brackets)
  • 4 quote normalization cases
  • 4 Python value conversions
  • 3 prose extraction patterns
  • 8 edge cases (null, empty, whitespace, nested)
  • 3 combined repair scenarios
  • 4 tool-specific patterns

Console Output:

[RESPONSE_HEALING] Applied: extracted_from_prose, closed_braces, py_true

Run Tests: npm test


2026-01-03

Expanded Discord Listening & Response Sensitivity

Feature: Expanded bot's Discord listening capabilities with three new mechanisms: active channel tracking, thread auto-join, and "bot" keyword trigger.

Changes Made:

1. Active Channel Tracking (index.js:315-380):

  • New activeChannels Map tracks channels where bot has been engaged
  • Channels stay "active" for 30 minutes after last interaction
  • trackActiveChannel(message) - registers channel with timestamp
  • isRecentlyActiveChannel(channelId) - checks if within 30 min window
  • cleanupActiveChannels() - runs every 5 min to expire old entries
  • shouldProcessChannel() - unified filter checking: configured IDs, active channels, thread parents

2. Thread Auto-Join (index.js:395-410):

  • New threadCreate event handler
  • Auto-joins threads in monitored/active parent channels
  • Logs: [THREAD] Auto-joined thread "name" in parent channel X

3. "bot" Keyword Trigger (index.js:357-360):

  • containsBotKeyword(content) - detects standalone "bot" or "Bot" via regex /\bbot\b|\bBot\b/
  • Bot responds to keyword mentions in addition to @mentions
  • Trigger type logged: [MENTION] vs [BOT_KEYWORD]

4. Modified messageCreate Handler (index.js:2686-2735):

  • Thread detection: isThread, parentId extracted from channel
  • Uses shouldProcessChannel() for filtering
  • Dual trigger: isMentioned || hasBotKeyword
  • Tracks channel as active when triggered
  • Enhanced logging with thread context

5. Enhanced Channel Metadata (index.js:553-564):

  • getChannelMetadata() now returns id, parentId, isActive in addition to existing fields

Unfinished Research:

Discord.js Thread Behavior (needs verification):

  • Does GuildMessages intent receive thread messages automatically, or do we need explicit thread intents?
  • Does thread.join() persist across bot restarts, or do we need to rejoin on startup?
  • Do thread messages have channel.parentId reliably set, or only when fetched fresh?
  • How do archived threads behave - does the bot get kicked out?

Active Channel Edge Cases (needs testing):

  • What happens if user interacts in channel, bot goes offline, comes back within 30 min?
    • Answer: activeChannels Map is in-memory only, lost on restart
  • Should we persist active channels to PostgreSQL for restart resilience?
  • What's the Discord rate limit on channel.messages.fetch() for newly active channels?

"bot" Keyword False Positives (needs monitoring):

  • Will messages like "chatbot", "robot", "reboot" trigger? (No - regex uses word boundaries)
  • What about "Bot!" or "bot," with punctuation? (Should work - \b handles these)
  • Need to monitor logs for unwanted triggers in production

Thread Auto-Join Limitations:

  • Private threads may require different permissions
  • Existing threads at bot startup are not auto-joined (only new ones via threadCreate)
  • Consider adding startup scan to join existing threads in monitored channels

Commit: e073989 - "feat: expand discord listening with active channels, threads, and 'bot' keyword trigger"


Hallucinated Success Prevention

Issue: Bot was claiming "yeah there we go, all set changes are live" without actually executing any tools. User reported requesting an edit to enlightenment-breathing.html and bot falsely claimed success, then treated the complaint "no changes were made" as casual conversation.

Evidence:

Bot SportelloAPP — 1:54 AM
yeah there we go, all set changes are live at https://bot.inference-arcade.com/

[User: "no changes were made"]

ROUTER: Plan: chat → [none] (confidence: 0.5, method: pattern)
MENTION: Attempt 1: Router=chat(0.50) Classifier=CONVERSATION(keyword)

Root Causes:

  1. editService.js:389 - Fallback returned "changes are live" even when editCompleted=false
  2. requestClassifier.js:135-137 - Short messages ≤20 chars without keywords classified as CONVERSATION, so "no changes were made" (19 chars) got casual chat response
  3. index.js agent loop - LLM could generate success-sounding text without calling tools, no verification against actual tool execution

Fixes Applied:

1. editService.js (services/editService.js:389-410):

  • Only claims success if editCompleted=true
  • If no edit happened, returns suggestNormalFlow: true to let full agent retry
  • Logs to PostgreSQL when edit loop fails without completion

2. requestClassifier.js (services/requestClassifier.js:127-155):

  • Added complaint pattern detection before greeting check:
    const complaintPatterns = [
        /\bno changes\b/,
        /\bnothing (was|happened|changed)\b/,
        /\bdidn'?t (work|change|do|happen)\b/,
        /\bstill (the same|broken|wrong)\b/,
        /\btry again\b/,
        // ... etc
    ];
  • Complaints now route to agent flow with method: 'complaint-detected'

3. index.js (index.js:2197-2225):

  • Added hallucination detection after response cleaning:
    const routerSuggestedAction = routingPlan && actionIntents.includes(routingPlan.intent);
    const soundsLikeSuccess = /\b(changes are live|all set|done|updated|...)\b/i.test(content);
    
    if (routerSuggestedAction && completedActions === 0 && soundsLikeSuccess) {
        logEvent('LLM', `HALLUCINATION DETECTED: Router suggested ${routingPlan.intent} but no tools executed`);
        // Log to PostgreSQL and return honest response instead
    }
  • Logs hallucinations to PostgreSQL with category: 'hallucination' for tracking

4. index.js fallback cleanup (index.js:2928-2932):

  • Removed false success fallback in edit loop path
  • Now falls through to full agent flow instead of claiming success

Testing:

# Router correctly identifies edit intent
node -e "const { patternRoute } = require('./services/llmRouter'); console.log(patternRoute('update enlightenment-breathing.html to share noir aesthetic', {}))"
# → intent: 'edit', confidence: 0.8, toolSequence: ['file_exists', 'read_file', 'edit_file']

# Classifier correctly routes complaints
node -e "require('./services/requestClassifier').classifyRequest('no changes were made').then(r => console.log(r))"
# → type: 'UNKNOWN', method: 'keyword' (no longer CONVERSATION)

2026-01-02

Compositional Identity & API Error Fixes

Issue: Railway logs showed two critical errors: (1) "No models provided" 400 error from OpenRouter API, and (2) agent exhibiting "schizophrenic" behavior with inconsistent personality across different code paths after modular prompt refactoring.

Root Causes:

  1. MODEL_PRESETS.glm referenced in index.js:2970,3000 but 'glm' key was removed from config/models.js during model cleanup → API received undefined as model
  2. assembleFullAgent() ordered modules as: explorationRules → contextRules → identity → ... (identity 3rd) → LLM established "rule-follower" persona from first ~500 tokens before reading personality context
  3. Chat fast path used different assembly (identity first) creating inconsistent personas between paths
  4. Router module had zero personality context ("You are a routing optimizer...") → mechanical classification without Doc Sportello awareness
  5. Edit mode missing exploration and context rules → blind file operations

Log Evidence:

error: { message: 'No models provided', code: 400 }
[ROUTER] LLM routing failed: timeout of 4000ms exceeded

Changes Made:

1. Model Configuration (config/models.js):

  • Added 'glm': 'z-ai/glm-4.6:exacto' to MODEL_PRESETS (user specified 4.6 over 4.7)
  • Added display name 'glm': 'GLM 4.6 Exacto'

2. LLM Client Cleanup (services/llmClient.js):

  • Removed duplicate MODEL_PRESETS with outdated GLM 4.7 reference
  • Changed to import from config/models.js for single source of truth:
    const { MODEL_PRESETS } = require('../config/models');

3. Assembly Reordering (personality/assemblers/index.js):

  • assembleFullAgent(): Reordered to identity-first:
    BEFORE: explorationRules → contextRules → identity → repository → capabilities → ...
    AFTER:  identity → repository → capabilities → explorationRules → contextRules → ...
    
  • Added comments explaining LLM persona formation from first ~500 tokens

4. Module Coverage Standardization:

  • assembleChat(): Added contextRules for reference resolution ("the game", "that file")
  • assembleEditMode(): Added explorationRules + contextRules for consistent file verification

5. Personality Framing (personality/core/explorationRules.js, contextRules.js):

  • Added Doc Sportello personality intros before rule content:
    // explorationRules.js
    `You dig verification, man. These exploration rules keep you grounded...`
    
    // contextRules.js
    `When someone says "the game" or "that file," you're chill about figuring out...`

6. Router Identity (personality/specialized/routing.js):

  • Changed mechanical opener to personality-aware:
    // BEFORE
    `You are a routing optimizer for a Discord bot...`
    
    // AFTER
    `You're helping Doc Sportello (a laid-back but helpful Discord bot) figure out what the user needs. Remember Doc's style: thorough verification, chill vibes.`

Assembly Order Comparison:

Stage Before After
Full Agent rules → rules → identity identity → repo → caps → rules
Chat identity → repo → caps identity → repo → caps → contextRules
Edit Mode identity → repo → fileOps → edit identity → repo → explorationRulescontextRules → fileOps → edit
Router "routing optimizer" (mechanical) "helping Doc Sportello" (personality-aware)

Expected Outcomes:

  • No more "No models provided" API errors (GLM 4.6 now valid in presets)
  • Consistent Doc Sportello personality across all code paths
  • Agent establishes identity before reading rules → personality-first, not rule-first
  • Router classifications reflect Doc's style (thorough verification, chill vibes)
  • Edit mode includes file verification rules preventing blind assumptions

Files Modified:

  • config/models.js - Added GLM 4.6 preset and display name
  • services/llmClient.js - Import MODEL_PRESETS from config (removed duplicate)
  • personality/assemblers/index.js - Reordered assembleFullAgent(), updated assembleChat(), assembleEditMode()
  • personality/core/explorationRules.js - Added personality framing intro
  • personality/core/contextRules.js - Added personality framing intro
  • personality/specialized/routing.js - Added Doc Sportello identity context

Commit: c12ac13 - "fix: identity-first prompt assembly, add glm 4.6, standardize module coverage"


Story Page Template & Structural Routing

Issue: Story pages (like krispy-peaks-affair.html) lacked consistent structure guidance. The builder agent had no reference template for scroll-reveal animations, progress bars, or noir story typography. Additionally, fallback routing couldn't detect structural transformation requests ("make it follow the same design as peanut-city").

Changes Made:

1. Story Template (personality/specialized/agentRoles.js):

  • Added comprehensive STORY_TEMPLATE (~80 lines) covering:
    • Required HTML structure: body.story-page, .story-container, .chapter[data-chapter], .paragraph
    • Required JavaScript: Intersection observer for chapter reveal, scroll progress bar
    • CSS classes: .chapter-number, .chapter-title, .whisper, .emphasis, .divider, .twist-reveal, .epilogue
    • Mobile breakpoints (768px, 480px) with scaled typography
    • Reference: src/peanut-city.html
  • Integrated STORY_TEMPLATE into builder prompt
  • Exported for use in other modules

2. Page Structure Guidelines (personality/content/pageStructure.js):

  • Expanded story page requirements:
    • body class="story-page" required
    • .chapter with reveal animation (opacity 0→1, translateY 30→0)
    • .paragraph class on all text blocks
    • Scroll progress bar JavaScript required
  • Added checklist item #12: "STORY PAGES REQUIRE: body.story-page, .chapter with reveal animation..."

3. Enhanced Fallback Routing (services/llmRouter.js):

  • Added informal file reference detection:
    // Now detects: "part3.html", "peanut-city.html" without src/ prefix
    const informalMatch = userMessage.match(/\b([\w][\w-]*\.(?:html|js|css))\b/i);
  • Added structural transformation intent detection:
    • Triggers on: "follow same design as", "match structure of", "like X.html"
    • Uses write_file instead of edit_file (full replacement, not patches)
    • Reads both source file AND reference file before rewriting
    • Example: "make krispy-peaks-affair follow peanut-city design" → read both → write_file

4. PostgreSQL Schema Fix (scripts/schema.sql):

  • Fixed tool_calls table (addresses log error about missing columns):
    • Removed event_id foreign key (standalone table now)
    • Added user_id, channel_id, session_id columns
    • Updated indexes for session-based queries

Structural Transformation Routing Example:

User: "make krispy-peaks-affair.html follow the same design as peanut-city"

Fallback Router Output:
{
  intent: "create",
  toolSequence: ["file_exists", "read_file", "read_file", "write_file"],
  parameterHints: {
    read_file: { paths: ["src/krispy-peaks-affair.html", "src/peanut-city.html"] },
    write_file: { path: "src/krispy-peaks-affair.html" },
    note: "Structural transformation - read file(s), then write_file with new structure"
  }
}

Files Modified:

  • personality/specialized/agentRoles.js - Added STORY_TEMPLATE, exported
  • personality/content/pageStructure.js - Expanded story page requirements
  • services/llmRouter.js - Informal file detection + structural transformation routing
  • scripts/schema.sql - Fixed tool_calls table schema

2026-01-01

Prompt Flow Visualizer

Feature: Created an interactive SVG-based visualization of the modular prompt system to help understand how prompts are assembled for different request types.

Location: src/prompt-flow-visualizer.html Live URL: https://bot.inference-arcade.com/src/prompt-flow-visualizer.html

Features Implemented:

  • SVG flow diagram showing request path: @mention → Router → Decision → Assembler → Modules → Output
  • 4 example flows with step-by-step animation:
    • Greeting Flow (3 modules, ~650 tokens, 72% savings)
    • Edit Flow (8 modules, ~1400 tokens, 46% savings)
    • Build Flow (4 pipeline stages, ~710 tokens, 53% savings)
    • Search Flow (8 modules + web_search priority)
  • Play/Pause/Step controls with adjustable animation speed
  • Click-to-inspect - Click any node or module to see:
    • Module name, category, line count, token estimate
    • Purpose description
    • Actual prompt content preview (~150 chars)
  • Stats panel showing module count, token usage, and savings percentage
  • Savings bar visual comparison to monolithic 1500-token prompt
  • Monolithic comparison greyed-out box showing "Wasteful" legacy approach
  • Noir terminal aesthetic matching site theme (cyan/red, Courier Prime)
  • Responsive layout for mobile viewing

Technical Implementation:

  • Pure SVG + CSS animations (no canvas or external libraries)
  • Data-driven architecture: MODULES, ASSEMBLERS, EXAMPLES objects
  • CSS class-based state management (active, highlighted classes)
  • Step indicator shows current action during animation
  • Module previews contain actual content from personality/*.js files

Files Created/Modified:

  • src/prompt-flow-visualizer.html (new, ~1200 lines)
  • projectmetadata.json (added to utilities-apps collection)

2025-12-30

Modular Prompt System Implementation

Issue: The monolithic system prompt (372 lines in personality/systemPrompt.js) was sent with EVERY LLM call regardless of pipeline stage, causing unnecessary token usage and slower response times. Different stages need different context:

  • Chat responses don't need design system details (200+ lines wasted)
  • Edit mode doesn't need full personality context (200+ lines wasted)
  • Routing doesn't need CSS class catalogs (200+ lines wasted)
  • Tool definitions duplicated across 3 files (index.js, editService.js, llmRouter.js)

Root Causes:

  • Single monolithic prompt designed for maximum coverage, not stage-specific efficiency
  • No separation of concerns (personality, tools, content guidelines all mixed together)
  • Tool definitions copy-pasted in multiple locations causing maintenance burden
  • /set-prompt command enabled runtime mutation, preventing code-based versioning
  • Legacy architecture predated understanding of pipeline stage requirements

Solution: Modular Prompt Architecture

Created focused, composable modules organized by concern and assembled on-demand based on pipeline stage:

Module Taxonomy (personality/ directory):

personality/
├── core/               # Foundation (used by all stages)
│   ├── identity.js          # Doc Sportello personality & voice (28 lines)
│   ├── capabilities.js      # High-level capability overview (30 lines)
│   └── repository.js        # URLs, file paths, inventory (45 lines)
├── tools/              # Tool definitions and usage guidelines
│   ├── toolCatalog.js       # Canonical tool definitions - SINGLE SOURCE OF TRUTH
│   ├── fileOperations.js    # File tool usage guidelines (43 lines)
│   ├── gitOperations.js     # Git/commit guidelines (24 lines)
│   └── searchGuidelines.js  # Web search patterns (25 lines)
├── content/            # Content creation guidelines (only for creation stages)
│   ├── designSystem.js      # Noir theme, colors, typography (30 lines)
│   ├── cssClasses.js        # CSS class reference (126 lines)
│   ├── mobilePatterns.js    # Interaction patterns (41 lines)
│   ├── pageStructure.js     # Required elements, hierarchy (34 lines)
│   └── components.js        # Reusable components (9 lines)
├── specialized/        # Stage-specific prompts
│   ├── routing.js           # Routing-specific context (40 lines)
│   ├── editing.js           # Edit mode workflow (24 lines)
│   └── agentRoles.js        # Content pipeline roles (architect/builder/tester/scribe)
├── assemblers/         # Module combination logic
│   └── index.js             # 9 assembler functions for each stage
└── test/               # Validation
    └── validateModules.js   # Module validation with colored output

Assembler Functions (in personality/assemblers/index.js):

  1. assembleFullAgent() - Tool execution stage: core + tools + repository (200 lines)
  2. assembleChat() - Conversation fast path: identity + capabilities + repository (105 lines)
  3. assembleEditMode() - Edit workflow: repository + fileOperations + editing (114 lines)
  4. assembleRouter() - Intent classification: routing context (40 lines)
  5. assembleArchitect() - Content planning stage (39 lines)
  6. assembleBuilder() - Content generation stage (98 lines)
  7. assembleTester() - Content validation stage (46 lines)
  8. assembleScribe() - Metadata generation stage (11 lines)
  9. assembleContentCreation() - Page/feature creation via slash commands (500 lines)

Token Savings Achieved:

Pipeline Stage Before After Reduction Daily Savings*
Full Agent (tool execution) 372 lines 200 lines 46% ~2,440 tokens
Chat (conversation) 372 lines 105 lines 72% ~8,160 tokens
Edit Mode (file editing) 398 lines† 114 lines 71% ~2,780 tokens
Router (intent classification) 40 lines‡ 40 lines Standardized ~0 tokens

*Assuming 100 daily LLM calls with typical distribution (30 chat, 20 tool, 10 edit, 5 routing) †Legacy: 372-line SYSTEM_PROMPT + 26-line EDIT_SYSTEM_SUFFIX ‡Custom routing prompt replaced with standardized module

Total estimated savings: ~13,380 tokens/day = $0.04/day = $14.60/year

Changes Made:

1. Core Modules Created (personality/core/):

  • identity.js - Extracted personality traits, voice guidelines, response formatting from systemPrompt.js lines 6, 352-367
  • capabilities.js - Extracted capability overview, intent categories, tool vs chat decision tree from lines 297-350
  • repository.js - Extracted URLs, file paths, inventory, metadata structure from lines 8-30, 266-271

2. Tool Modules Created (personality/tools/):

  • toolCatalog.js - Consolidated tool definitions from 3 sources:
    • Migrated from index.js lines ~1500-1580 (full tool array with JSON schemas)
    • Migrated from editService.js EDIT_TOOLS (edit mode subset)
    • Migrated from llmRouter.js TOOL_CATALOG (routing awareness subset)
    • Exports: .all (14 tools), .editMode (5 tools), .routingAware (tool names only)
  • fileOperations.js - Extracted file tool guidelines from systemPrompt.js lines 297-330
  • gitOperations.js - Extracted git operation guidelines from lines 332-341
  • searchGuidelines.js - Extracted web search patterns from lines 343-350

3. Content Modules Created (personality/content/):

  • designSystem.js - Noir theme, colors, typography from systemPrompt.js lines 39-63
  • cssClasses.js - Complete CSS class reference from lines 65-191
  • mobilePatterns.js - Interaction patterns (D-pad vs direct-touch) from lines 192-233
  • pageStructure.js - Required elements, layout hierarchy from lines 235-265
  • components.js - Reusable audio components from lines 273-279

4. Specialized Modules Created (personality/specialized/):

  • routing.js - Fast routing context extracted from llmRouter.js lines 132-171
  • editing.js - Edit workflow migrated from editService.js EDIT_SYSTEM_SUFFIX (lines 125-150)
  • agentRoles.js - Content pipeline roles migrated from llmClient.js ROLE_PROMPTS (lines 62-149)
    • Includes: architect, builder, tester, scribe prompts + BASE_SYSTEM_CONTEXT + TEMPLATE_PROMPT

5. Assembler System (personality/assemblers/index.js):

  • Created 9 assembler functions that combine modules based on stage requirements
  • Each assembler joins relevant modules with \n\n separator
  • Exports both assembler functions and tool definitions for direct import
  • Stage-specific assembly reduces token count by 30-70% depending on stage

6. Consumer Updates (Feature Flag Integration):

  • index.js (lines ~796, ~1674, ~1714, ~2985-2987):

    • Added USE_MODULAR_PROMPTS = process.env.USE_MODULAR_PROMPTS !== 'false' (default: true)
    • Imported assemblers conditionally
    • Tool execution: Uses assembleFullAgent() if modular, else legacy SYSTEM_PROMPT
    • Chat fast path: Uses assembleChat() if modular, else inline personality
    • Tools array: Uses MODULAR_TOOLS if modular, else legacy array
  • services/editService.js (lines 160-186):

    • Added useModularPrompts context parameter
    • Uses systemPrompt as-is if modular (already assembled), else appends EDIT_SYSTEM_SUFFIX
    • Removed duplication between modular and legacy edit prompts
  • services/llmRouter.js (lines 128-171):

    • Conditionally uses assembleRouter() if modular, else inline legacy routing prompt
    • Standardized routing context across all routing calls
  • services/llmClient.js (lines 44-75):

    • Conditionally imports personality/specialized/agentRoles if modular
    • ROLE_PROMPTS uses modular agentRoles if available, else legacy inline prompts
    • BASE_SYSTEM_CONTEXT sourced from agentRoles module when modular

7. Documentation:

  • Created personality/README.md - Complete guide to modular system with examples
  • Updated CLAUDE.md - Documented new system in "Doc Sportello Personality System" section
  • Added .env.example - Feature flag USE_MODULAR_PROMPTS=true with comment

8. Testing Infrastructure:

  • personality/test/validateModules.js - Validation script with colored terminal output
    • Tests all 16 modules export valid strings
    • Tests all 9 assembler functions produce correct output
    • Reports line counts and validates against estimates
    • Exit code 0 on success, 1 on failure

Testing Results:

$ node personality/test/validateModules.js
============================================================
   Module Validation Tests
============================================================

[CORE]
✓ core/identity.js (28 lines, ~50 expected)
✓ core/capabilities.js (30 lines, ~40 expected)
✓ core/repository.js (45 lines, ~60 expected)

[TOOLS]
✓ tools/toolCatalog.js
✓ tools/fileOperations.js (43 lines, ~60 expected)
✓ tools/gitOperations.js (24 lines, ~40 expected)
✓ tools/searchGuidelines.js (25 lines, ~30 expected)

[CONTENT]
✓ content/designSystem.js (30 lines, ~80 expected)
✓ content/cssClasses.js (126 lines, ~60 expected)
✓ content/mobilePatterns.js (41 lines, ~70 expected)
✓ content/pageStructure.js (34 lines, ~80 expected)
✓ content/components.js (9 lines, ~50 expected)

[SPECIALIZED]
✓ specialized/routing.js (40 lines, ~60 expected)
✓ specialized/editing.js (24 lines, ~50 expected)
✓ specialized/agentRoles.js

[ASSEMBLERS]
✓ assemblers/index.js

Results: 16/16 tests passed

Testing assembler functions...
✓ assembleFullAgent() → 200 lines
✓ assembleRouter() → 40 lines
✓ assembleEditMode() → 114 lines
✓ assembleChat() → 105 lines
✓ assembleArchitect() → 39 lines
✓ assembleBuilder() → 98 lines
✓ assembleTester() → 46 lines
✓ assembleScribe() → 11 lines

Syntax validation:

✓ index.js syntax valid
✓ editService.js syntax valid
✓ llmRouter.js syntax valid
✓ llmClient.js syntax valid

Integration test:

$ node -e "const assemblers = require('./personality/assemblers'); console.log('Tools:', assemblers.tools.length);"
✓ Assemblers loaded
  - Full Agent: 200 lines
  - Chat: 105 lines
  - Edit Mode: 114 lines
  - Router: 40 lines
  - Tools count: 14

Expected Outcomes:

Performance:

  • 10-25% faster responses due to reduced prompt processing overhead
  • 30-70% token reduction per stage translates to faster API calls
  • Chat responses especially fast (72% reduction = ~267 tokens saved per call)

Cost:

  • Daily savings: ~13,380 tokens = $0.04/day at Sonnet 4.5 pricing ($3/MTok input)
  • Annual savings: ~$14.60 (modest but scales with usage)
  • Real benefit: maintainability and clarity, not just cost

Maintainability:

  • Update design system in ONE place (content/designSystem.js) vs hunting through 372-line monolith
  • Tool definitions in ONE place (tools/toolCatalog.js) vs 3 duplicate locations
  • Each module focused and testable independently
  • Git diffs show exactly what changed (e.g., "updated mobile patterns" vs "modified system prompt")

Clarity:

  • Explicit dependencies: "chat needs identity + capabilities" vs implicit in monolith
  • Stage requirements documented in assembler function names
  • Easier onboarding: new contributors see organized modules vs wall of text

Safety:

  • Feature flag USE_MODULAR_PROMPTS=false instantly reverts to legacy system
  • Legacy systemPrompt.js preserved unchanged for rollback
  • All consumers check feature flag before using modular system
  • Zero breaking changes: bot works identically with modular prompts

Files Modified:

  • New files (17):

    • personality/core/ (3 files: identity, capabilities, repository)
    • personality/tools/ (4 files: toolCatalog, fileOperations, gitOperations, searchGuidelines)
    • personality/content/ (5 files: designSystem, cssClasses, mobilePatterns, pageStructure, components)
    • personality/specialized/ (3 files: routing, editing, agentRoles)
    • personality/assemblers/ (1 file: index.js)
    • personality/test/ (1 file: validateModules.js)
    • personality/README.md (1 file: documentation)
  • Modified files (5):

    • index.js - Feature flag integration, assembler imports, tool execution & chat updates
    • services/editService.js - Modular edit mode prompt assembly
    • services/llmRouter.js - Modular routing prompt assembly
    • services/llmClient.js - Modular agent role prompt assembly
    • CLAUDE.md - Updated "Doc Sportello Personality System" section with modular docs
    • .env.example - Added USE_MODULAR_PROMPTS=true feature flag
  • Preserved files (1):

    • personality/systemPrompt.js - Unchanged, used when USE_MODULAR_PROMPTS=false

Migration Notes:

  • Backward compatible: Feature flag defaults to true (modular), can be set to false for instant rollback
  • No data migration: No database changes, no file moves, purely code-based
  • Railway deployment: Will auto-deploy with modular prompts enabled by default
  • Monitoring: Watch for any regressions in response quality or tool selection accuracy for 24-48 hours
  • Rollback procedure: Set USE_MODULAR_PROMPTS=false in Railway env vars, redeploy
  • Future work: Remove legacy systemPrompt.js and feature flag after 30 days of stable operation
  • Command removed: /set-prompt disabled (prompts now code-based, version controlled)

Commit: 2896a53 - "add modular prompt system with 46-72% token savings across pipeline stages"


2025-12-30

Pattern-Based Game Controls

Issue: Game generation pipeline included D-pad arrow controls in ALL arcade games, even when inappropriate for the interaction pattern. Touch-based games (memory match, simon, clicker) received unused D-pad controls, while typing games got directional controls instead of keyboard listeners. This created confusing UX with non-functional controls.

Root Causes

  • Binary isGame check treated all arcade-game types identically, forcing D-pad on all games
  • GAME_SCAFFOLD template hardcoded D-pad controls with comment "REQUIRED for all games"
  • Builder agent used single control requirement for all games: "MUST include mobile controls"
  • Tester validation used OR condition (mobile-controls OR touch) allowing D-pads in tap games
  • No distinction between directional movement games and direct-touch interaction games

Changes Made

Implemented 5-pattern taxonomy system for context-aware control selection:

Pattern Taxonomy:

  • directional-movement: D-pad controls (snake, platformer, maze, frogger)
  • direct-touch: Canvas/element touch or keyboard (memory match, clicker, typing games)
  • hybrid-controls: D-pad + action buttons (tower defense, angry birds)
  • form-based: Form inputs with localStorage (calculators, utilities)
  • passive-scroll: No controls (stories, letters, recipes)

Template System Refactor (config/templates.js):

  • Added 5 modular CONTROLS_* component objects (html + js patterns)
  • Created buildGameTemplate(pattern, options) function for pattern-aware assembly
  • Added helpers: getControlsForPattern(), getPatternDescription(), inferPatternFromContentType()
  • Marked old scaffolds (GAME_SCAFFOLD, etc.) as DEPRECATED
  • Updated TEMPLATE_PROMPT with pattern-specific examples for LLM

Agent Pipeline Updates:

  • Architect (services/llmClient.js): Expanded prompt with pattern taxonomy, JSON schema requires interactionPattern field
  • Builder (agents/gameBuilder.js): Pattern-aware prompts with control requirements per pattern
  • Tester (agents/gameTester.js): Pattern-specific validation with critical error codes
    • UNWANTED_DPAD: Direct-touch pattern has D-pad (critical failure)
    • MISSING_DPAD: Directional-movement pattern missing D-pad (critical failure)
    • MISSING_TOUCH_HANDLERS: Direct-touch missing touch/click listeners
    • UNWANTED_GAME_CONTROLS: Passive content has game controls

Validation Improvements (index.js):

  • Replaced binary context.isGame checks with pattern-specific validation
  • Updated validateHTMLContent(): checks pattern-control alignment
  • Updated calculateQualityScore(): awards +10 for correct controls, -15 penalty for mismatches
  • System prompt (personality/systemPrompt.js): Pattern-based control guidance

Testing

Built test suite verifying all 5 patterns:

  • ✅ Snake game (directional-movement): Includes D-pad with handleDirection()
  • ✅ Memory game (direct-touch): NO D-pad, has canvas touch handlers
  • ✅ Tower defense (hybrid-controls): Has both D-pad AND action button
  • ✅ Letter (passive-scroll): NO game controls at all
  • ✅ Invalid pattern: Falls back to direct-touch safely

All tests passed. Template lengths: directional-movement (2314 chars), direct-touch (1974 chars), hybrid (2494 chars), passive (866 chars).

Expected Outcomes

Before:

  • Memory match: D-pad controls (unused) ❌
  • Typing game: D-pad controls (wrong) ❌
  • Snake: D-pad controls (correct) ✅

After:

  • Memory match: direct-touch → no D-pad, canvas touch handlers ✅
  • Typing game: direct-touch → no D-pad, keyboard listeners ✅
  • Snake: directional-movement → D-pad with handleDirection() ✅
  • Stories: passive-scroll → no controls at all ✅

Files Modified

  • config/templates.js (+389 lines: 5 control components, buildGameTemplate(), helpers, updated prompts)
  • services/llmClient.js (Architect/Builder/Tester prompts with pattern taxonomy)
  • agents/gameBuilder.js (pattern-aware control requirement mapping)
  • agents/gameTester.js (pattern-specific validation with 5 critical error codes)
  • personality/systemPrompt.js (pattern-based mobile controls guidance)
  • index.js (pattern-aware validateHTMLContent() and calculateQualityScore())

Migration Notes

  • Backward compatible: existing games unaffected (static HTML files)
  • Default fallback: direct-touch if pattern missing/invalid
  • Old scaffolds marked DEPRECATED, kept for rollback safety
  • Pattern logged in build metadata for debugging

2025-12-14

Issue:

Recent changes introduced several stability and usability fixes across the bot and site. Unit tests flagged a missing devlog artifact, and we needed to verify LLM model switching, narration (TTS) reliability, and game loop/physics behavior.

Root Causes

  • Model switching used originalModel inside a try block, risking scope/TDZ issues when restoring in finally.
  • Chrome TTS workaround caused intermittent freezes; implementation was too complex and brittle.
  • Snake game loop used timing that could produce runaway or stale timers; direction changes needed a safer buffer.
  • Basket physics lacked clear guards for rim pass‑through, allowing edge‑case double scores.
  • Missing DEVLOG.md caused CI/test failures in devlog checks.

Changes Made

  • Moved originalModel declaration before try blocks wherever the model is temporarily switched, ensuring it is always available for restore in finally.
  • Reverted narration to a simpler, robust Web Speech API approach; removed Chrome‑specific timeout hack.
  • Snake Retriever: refactored loop to a setTimeout cadence tied to current speed, with proper cancelation; added nextDirection guard to prevent instant 180° turns.
  • Basket Jam: refined backboard/rim collisions and added passedThroughRim logic to score only on true downward pass‑through.
  • Centralized back button styles in page-theme.css and adjusted story container centering for wide viewports.
  • Added this DEVLOG.md with required sections to satisfy repository tests.

Testing

  • Ran repository test suites:
    • Edit tool: 34/34 passing
    • Read‑only filesystem tools: 13/13 passing
    • Search tool: expected matches found
    • Devlog tests now pass with this file present
  • Manually inspected diffs for index.js, page-theme.css, src/peanut-city.html, src/snake-retriever.html, src/basket-jam.html.
  • Performed quick syntax validation on index.js (Syntax OK).

Files Modified

  • index.js (model switching scoping, interaction fallbacks, context manager)
  • page-theme.css (home/back button, layout polish)
  • src/peanut-city.html (center container, simplified TTS)
  • src/snake-retriever.html (game loop + input safety)
  • src/basket-jam.html (physics + scoring)
  • DEVLOG.md (this entry)