This repository uses gemini-3-flash-preview exclusively; any other model versions are prohibited.
- Onboarding is now a host-enforced runtime lifecycle, not a hardcoded React stepper.
- The host owns only:
  - lifecycle state persistence (`.neural/onboarding-state.json`)
  - onboarding policy/safety gates
  - onboarding API endpoints
  - routing first-run users into `onboarding_app`
- The model owns onboarding UI and conversational flow through `emit_screen`.
- Filesystem skills remain canonical. During incomplete onboarding, host prompt policy gives onboarding precedence and expects `onboarding_skill`.
- Tool policy during required onboarding is allowlist-only: `emit_screen`, `onboarding_get_state`, `onboarding_set_workspace_root`, `save_provider_key`, `onboarding_set_model_preferences`, `read`, `write`, `edit`, `onboarding_complete`.
- `AppSkill` records are deprecated migration debt and must not drive runtime behavior selection.
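The allowlist policy above can be sketched as a pure predicate the host consults before dispatching any tool call. This is a minimal sketch under stated assumptions: `isToolAllowed` and `OnboardingState` are hypothetical names, while the tool names are the allowlisted ones from this section.

```typescript
// Hypothetical sketch: host-side tool gating during required onboarding.
type OnboardingState = { complete: boolean };

const ONBOARDING_TOOL_ALLOWLIST = new Set([
  'emit_screen',
  'onboarding_get_state',
  'onboarding_set_workspace_root',
  'save_provider_key',
  'onboarding_set_model_preferences',
  'read',
  'write',
  'edit',
  'onboarding_complete',
]);

// Once onboarding is complete, the allowlist no longer constrains tools.
function isToolAllowed(tool: string, state: OnboardingState): boolean {
  return state.complete || ONBOARDING_TOOL_ALLOWLIST.has(tool);
}
```

Because the check is a pure function of `(tool, state)`, the host can enforce it deterministically at a single dispatch point.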
- Purpose: Neural Computer is an AI-native desktop simulation where every window’s content is authored by `gemini-3-flash-preview`. The system must stay responsive while authoring fresh content per interaction, honoring user-configured style settings, local statefulness, and the orchestrated set of “skills” described below.
- Primary goals: (1) keep the experience deterministic and observable without using RL; (2) surface selected skills and interventions based on interaction telemetry; (3) continuously self-improve the execution reliability of each skill via heuristic scoring, promotion/demotion, and instrumentation.
- Success criteria: each interaction selects a coherent skill set, generates a prompt that respects system rules, and can fall back safely when Gemini responses hit error cases. Self-improvement is measured by rising skill execution success rates, rising cache hit ratios, and falling request latencies.
- Non-goals: there is no PPO/ILM-style training loop, no model fine-tuning, and no policy gradient. Improvement happens through deterministic, statistics-driven coordination, not gradient updates.
- Style: layered + event-driven vertical slices (UI → Interaction Manager → Skill Coordinator → Persistence) with a command/response loop pipelined through a `SelfImprovementCoordinator`.
- Justification: this style isolates UI concerns, keeps Gemini API calls centralized, and gives the self-improvement subsystem full visibility over interaction history, telemetry, and skill selection. Event-driven slices ensure each user action deterministically triggers skill retrieval, scoring, and optional promotion/demotion, allowing precise instrumentation.
- Modules:
  - `react-ui/`: handles icon grid, windows, global prompt, and renders `GeneratedContent`. It emits `InteractionData`.
  - `InteractionManager`: canonicalizes user clicks/prompts, enforces history length, and feeds the self-improvement loop.
  - `SkillRegistry`: holds descriptors, metadata, and state for every capability we can invoke (search summarization, critique, layout drafting, etc.).
  - `SelfImprovementCoordinator`: orchestrates retrieval, scoring, Gemini invocation, evaluation, caching, and promotion/demotion decisions.
  - Persistence layer: IndexedDB/localStorage-backed stores for skill descriptors, interaction telemetry, and metrics.
  - Observability: collects instrumentation, surfaces failures, and exposes dashboards.
- Skill Taxonomy (non-exhaustive):
  - Context Curators – synthesize condensed interaction summaries (`context_summary`, `user_intent_sensor`).
  - Retrieval Helpers – fetch external facts via the `google_search` tool or pre-baked knowledge (`fact_lookup`, `reference_picker`).
  - Critique & Safety Checkers – run deterministic heuristics to detect hallucinations or safety blocks (`safety_guard`, `prompt_sanitizer`).
  - UI Schedulers – orchestrate Gemini’s output formatting for Desktop, Settings, or other apps (`layout_builder`, `settings_form_generator`).
  - Action Executors – handle post-generation actions (e.g., telemetry updates, cache writes, background pre-generation).
- Interfaces: each skill implements a descriptor that anchors retrieval, scoring, and failure handling. Example:
```ts
export type SkillCategory =
  | 'context'
  | 'retrieval'
  | 'critique'
  | 'layout'
  | 'action';

export interface SkillDescriptor {
  id: string;
  name: string;
  category: SkillCategory;
  shortDescription: string;
  requiredHistoryDepth: number;
  maxLatencyMs: number;
  promotionThreshold: number; // [0, 1]
  demotionThreshold: number;  // [0, 1]
  invocationTemplate: (ctx: RetrievalContext) => Promise<SkillCandidate>;
  failureModes: string[];
}
```
- Failure modes:
  - Missing `SkillDescriptor` → fall back to conservative `layout_builder`.
  - Retrieval candidate starvation → degrade to default skill list.
  - Tool rate limit → mark skill as temporarily demoted and emit telemetry.
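These fallbacks can be sketched as small pure helpers; this is an illustrative sketch, with `resolveSkill`, `candidatesOrDefault`, and the `Registry` shape being hypothetical names (only `layout_builder` comes from the document).

```typescript
// Hypothetical sketch: mapping the failure modes above to concrete fallbacks.
interface Registry {
  get(id: string): { id: string } | undefined;
}

// Assumed default list; the document only specifies "default skill list".
const DEFAULT_SKILL_LIST = ['layout_builder', 'context_summary'];

// Missing descriptor -> conservative layout_builder.
function resolveSkill(registry: Registry, id: string): { id: string } {
  return registry.get(id) ?? { id: 'layout_builder' };
}

// Retrieval candidate starvation -> degrade to the default skill list.
function candidatesOrDefault(candidates: string[]): string[] {
  return candidates.length > 0 ? candidates : DEFAULT_SKILL_LIST;
}
```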
- `src/components/` – existing UI atoms.
- `src/services/` – Gemini client plus new `selfImprovement/` folder containing `skillRegistry.ts`, `selfImprovementCoordinator.ts`, and `telemetry.ts`.
- `src/storage/` – adapters for IndexedDB/localStorage (keyed by schema names in section 7).
- `src/types/` – shared TypeScript contracts for descriptors, policies, and instrumentation metrics.
- `scripts/` – optional migration utilities for future schema changes (e.g., skill promotions).
- Rules: new self-improvement code must live under `src/services/selfImprovement/*`; UI code remains in `components/`. Shared contracts go under `types/`.
- User interaction emitted → `InteractionManager` standardizes it, trims history, updates config, and stores the event in the `interaction_history` store.
- **`SelfImprovementCoordinator` is signaled** with `(interactionHistory, styleConfig)`.
- Retrieval policy:

  ```ts
  class SkillRegistry {
    async retrieveCandidates(ctx: RetrievalContext): Promise<SkillCandidate[]> {
      const freshness = Date.now() - ctx.lastSkillInvocation;
      const basePool = this.skillDescriptors.filter(
        sd => sd.requiredHistoryDepth <= ctx.history.length
      );
      return basePool
        .map(sd => ({ descriptor: sd, score: scoreSkill(sd, ctx), freshness }))
        .sort((a, b) => b.score - a.score)
        .slice(0, ctx.maxCandidates);
    }
  }
  ```

  Score factors: match to `appContext` and `historySummary`, a penalty for `maxLatencyMs` breach, and telemetry-driven trust.
- Scoring/evaluation loop: `SelfImprovementCoordinator.runEvaluationLoop` iterates:
  - Pick the top candidate.
  - Assemble the prompt harness with `systemPrompt`, `historySummary`, and context.
  - Invoke `streamAppContent` using `gemini-3-flash-preview`.
  - Evaluate the response via heuristics (token safety checks, `tool` usage, generation length stability).
  - Record the outcome in `skill_execution_events` and `metrics_batch`.
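One iteration of this loop can be sketched as a single async pass. This is a hedged sketch, not the actual implementation: `runEvaluationStep`, `invokeModel`, `evaluateHeuristics`, and `recordOutcome` are hypothetical stand-ins for `streamAppContent`, the heuristic checks, and the persistence writes.

```typescript
// Hypothetical sketch of one iteration of runEvaluationLoop.
interface Candidate { id: string; score: number; }

async function runEvaluationStep(
  candidates: Candidate[],                          // pre-sorted by score
  invokeModel: (id: string) => Promise<string>,     // stand-in for streamAppContent
  evaluateHeuristics: (output: string) => boolean,  // safety/length checks
  recordOutcome: (id: string, ok: boolean) => void, // stand-in for event stores
): Promise<boolean> {
  // 1. Pick the top candidate.
  const top = candidates[0];
  if (!top) return false;
  // 2–3. Assemble the prompt harness and invoke the model.
  const output = await invokeModel(top.id);
  // 4. Evaluate the response via heuristics.
  const ok = evaluateHeuristics(output);
  // 5. Record the outcome.
  recordOutcome(top.id, ok);
  return ok;
}
```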
- Promotion/demotion policy:
  - On consecutive successes (`score >= descriptor.promotionThreshold`) the skill earns `successStreak++`. On failure due to safety blocks or a missing tool, increment `failureStreak`.
  - If `successStreak >= 3` and `trust < 0.95` ⇒ `trust += 0.05`.
  - If `failureStreak >= 2` ⇒ `trust = max(trust - 0.1, 0.3)` and the skill is flagged for `demotionCooldown`.
  - Skill trust influences future `scoreSkill` and is cached as `skillTrustIndex`.
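The streak rules above can be written as a pure update function. A minimal sketch, assuming the `TrustState` shape combines the streak counters with the trust value; recomputing the cooldown flag on every update is a simplification.

```typescript
// Hypothetical sketch of the promotion/demotion trust update.
interface TrustState {
  trust: number;
  successStreak: number;
  failureStreak: number;
  demotionCooldown: boolean;
}

function updateTrust(prev: TrustState, success: boolean): TrustState {
  // Simplification: the cooldown flag is recomputed per update.
  const s: TrustState = { ...prev, demotionCooldown: false };
  if (success) {
    s.successStreak += 1;
    s.failureStreak = 0;
    // successStreak >= 3 and trust < 0.95 => trust += 0.05
    if (s.successStreak >= 3 && s.trust < 0.95) s.trust += 0.05;
  } else {
    s.failureStreak += 1;
    s.successStreak = 0;
    // failureStreak >= 2 => trust = max(trust - 0.1, 0.3), flag cooldown
    if (s.failureStreak >= 2) {
      s.trust = Math.max(s.trust - 0.1, 0.3);
      s.demotionCooldown = true;
    }
  }
  return s;
}
```

A pure `(state, outcome) → state` update keeps the policy deterministic and trivially testable, which matches the no-RL design goal.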
- Boundaries:
  - UI never directly touches storage; it talks to `InteractionManager`.
  - Self-improvement logic owns all Gemini calls and caches (no duplication).
  - Tool calls (e.g., Google search) are managed by `SelfImprovementCoordinator` and respect the same caching/promotion logic.
- Authentication/Secrets: Gemini API key and search keys stay in `.env.local`. Secrets are injected at build time and never checked into source.
- Logging & Observability: `telemetry.ts` exposes `emitEvent(kind, payload)`, which writes to the console and the `metrics_batch` store. Each emission follows the schema `{ timestamp, level, correlationId, data }`.
- Error handling: `SelfImprovementCoordinator` catches critical errors, sets an `llmContent` fallback message, demotes the offending skill, and surfaces error states via `GeneratedContent` props.
- Instrumentation metrics (referenced in section 7):
  - `selfImprovementCycleDuration` (ms)
  - `skillExecutionSuccessRate` (per skill)
  - `skillDemotions` / `skillPromotions` (counts)
  - `cacheHitRate` for the statefulness path
  - `toolLatency` (Google search, etc.)
  - `safetyBlockRate` (finish reasons from Gemini)
  - `userInteractionToResponseLatency`
  - `styleConfigChangeFrequency`
  - `failedStreamRetries`
  - `interactionHistoryOverflow`
- Each metric logs its source and a `failureMode` description when a threshold is breached (e.g., `toolLatency > maxLatencyMs` triggers the `latencyFailure` state).
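A minimal sketch of `emitEvent`, assuming an in-memory array as a stand-in for the `metrics_batch` store; the event schema is the `{ timestamp, level, correlationId, data }` shape given above, and the default `correlationId` is an assumption.

```typescript
// Hypothetical sketch of telemetry.ts: every emission carries the
// { timestamp, level, correlationId, data } schema from this section.
interface TelemetryEvent {
  timestamp: number;
  level: string;
  correlationId: string;
  data: unknown;
}

// Stand-in for the IndexedDB-backed metrics_batch store.
const metricsBatch: TelemetryEvent[] = [];

function emitEvent(kind: string, payload: unknown, correlationId = 'root'): TelemetryEvent {
  const event: TelemetryEvent = {
    timestamp: Date.now(),
    level: kind,
    correlationId,
    data: payload,
  };
  console.log('[telemetry]', event.level, event.correlationId); // console sink
  metricsBatch.push(event); // batched flush happens elsewhere
  return event;
}
```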
- Storages:
  - `skill_descriptors` (IndexedDB table)

    | field | type | notes |
    | --- | --- | --- |
    | `id` | string (PK) | matches descriptor above |
    | `category` | string | from taxonomy |
    | `trust` | number | [0,1], seeds promotion |
    | `lastInvokedAt` | number | epoch |
    | `successStreak` | number | consecutive pass count |
    | `failureStreak` | number | consecutive fail count |
    | `demotionCooldownUntil` | number | timestamp |

  - `interaction_history`

    | field | type | notes |
    | --- | --- | --- |
    | `interactionId` | string | uses `InteractionData.id` |
    | `vectorSummary` | string | trimmed JSON summary |
    | `appContext` | string | used for retrieval policies |
    | `timestamp` | number | for TTL eviction |

  - `skill_execution_events`: `skillId`, `status` (success/fail), `latencyMs`, `toolUsed`, `safetyBlockReason?`
  - `metrics_batch` – batched instrumentation flush every 30s or on window unload.
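The 30-second/unload flush policy for `metrics_batch` can be sketched as a small batcher; `createMetricsBatcher` and its `persist` callback are hypothetical names, and the browser event hooks are shown only as comments so the sketch stays runnable outside a browser.

```typescript
// Hypothetical sketch of the metrics_batch flush policy:
// flush on a 30s interval, or immediately on window unload.
function createMetricsBatcher(
  persist: (batch: unknown[]) => void,
  intervalMs = 30_000,
) {
  let batch: unknown[] = [];

  const flush = () => {
    if (batch.length === 0) return; // nothing pending
    persist(batch);
    batch = [];
  };

  // In the browser this would also hook:
  //   setInterval(flush, intervalMs);
  //   window.addEventListener('beforeunload', flush);
  return {
    add: (event: unknown) => { batch.push(event); },
    flush,
    size: () => batch.length,
  };
}
```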
- Retrieval context contract:
```ts
export interface RetrievalContext {
  history: InteractionData[];
  styleConfig: StyleConfig;
  lastSkillInvocation: number;
  maxCandidates: number;
  appContext?: string;
}
```
- Gemini Integration: all prompt assembly and streaming happens through `services/geminiService.ts`. The coordinator wraps `streamAppContent` with optional `thinkingConfig` adjustments.
- Search Tool: `services/searchService.ts` is called exclusively when a skill candidate has `toolRequest: { name: 'google_search' }`. Tool failures feed into promotion/demotion decisions.
- Vite + React front-end served via `npm run dev` in development, `npm run build` for production.
- `.env.local` holds `API_KEY`, `BING_API_KEY`, `GOOGLE_SEARCH_API_KEY`, and `GOOGLE_SEARCH_CX`. The build pipeline should validate these keys before bootstrapping `SelfImprovementCoordinator`.
- CLI/test runner: `node scripts/validate-self-improvement.js` can simulate interactions and verify metric thresholds.
- Environments share the same storage schema; only `styleConfig` defaults vary (e.g., QA uses `speedMode: 'fast'`).
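The key validation mentioned above can be sketched as a preflight check; `validateEnvKeys` is a hypothetical helper name, while the key names are the ones listed for `.env.local`.

```typescript
// Hypothetical sketch: verify required env keys before bootstrapping
// SelfImprovementCoordinator. Returns the list of missing/blank keys.
const REQUIRED_KEYS = [
  'API_KEY',
  'BING_API_KEY',
  'GOOGLE_SEARCH_API_KEY',
  'GOOGLE_SEARCH_CX',
] as const;

function validateEnvKeys(env: Record<string, string | undefined>): string[] {
  // Treat absent and whitespace-only values as missing.
  return REQUIRED_KEYS.filter((key) => !env[key] || env[key]!.trim() === '');
}
```

In a Vite setup this would run against the injected env object at startup, failing fast with the missing key names instead of surfacing opaque API errors later.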
- No RL, heuristics-only: deterministic evaluation ensures the simulation remains auditable.
- Skill descriptors drive everything: because descriptors normalize invocation templates, scoring, and failure handling, new abilities can plug in by defining metadata plus `invoke` logic.
- IndexedDB first for persistence: this keeps telemetry and trust indices local, quick to read/write, and resilient to page reloads.
- Promotion/demotion via trust index: promoting on `successStreak` avoids oscillating around transient metric blips; demoting on `failureStreak` ensures repeated safety blocks reduce a skill’s chance of being selected.
- Telemetry instrumentation as a first-class signal: every Gemini invocation records metrics so observed degradation immediately affects retrieval scores.
```mermaid
flowchart LR
  User[User Interaction] --> UI[React Window + Icons]
  UI -->|emit interaction| InteractionManager
  InteractionManager -->|signals| Coordinator[SelfImprovementCoordinator]
  Coordinator --> SkillRegistry
  Coordinator --> Gemini[gemini-3-flash-preview]
  Coordinator --> Storage[IndexedDB / localStorage]
  Storage --> Metrics[Instrumentation]
```
```mermaid
flowchart TB
  subgraph "UI Layer"
    A[Window/UI] --> B[GeneratedContent Renderer]
  end
  subgraph "Self-Improvement"
    B --> C[InteractionManager]
    C --> D[SkillRegistry & Retrieval Policy]
    D --> E[Evaluation Loop & Scoring]
    E --> Gemini
    E --> F[Telemetry + Promotion/Demotion]
  end
  subgraph "Persistence"
    F --> G[skill_descriptors]
    F --> H[interaction_history]
    F --> I[metrics_batch]
  end
```
- Never bypass `SelfImprovementCoordinator` to call `streamAppContent`.
- Do not cache Gemini outputs without validating against the trust index.
- Avoid storing secrets in version control or emitting them via telemetry.
- Never execute Gemini calls without `thinkingConfig` normalized to the configured `speedMode`.
- Should skill trust decay automatically when the app is idle for >6 hours?
- How soon should `promotionThreshold` and `demotionThreshold` be tunable via settings without breaking cache assumptions?
- Is there a need for remote sync of `skill_descriptors` for multi-device statefulness?