NeuralForge — Roadmap & Status Tracker

Version History

v1.0 — Core Training Engine (Complete)

v2.0 — Feature Platform (Complete)

Six major features added:

E: LR Scheduler — Cosine annealing with linear warmup
- --warmup N, --lr-min, --lr-schedule cosine
- LR displayed in dashboard, included in step JSON
F: Data Pipeline — Multi-shard, shuffle, train/val split
- --val-data, --val-every N, --shuffle
- Validation loss tracked and charted
C: Live Charts — EMA smoothing, TFLOPS chart, val loss overlay
- EMA toggle (alpha=0.98), chart window picker (All/500/1K/2K)
- TFLOPS over time, validation loss dashed overlay
D: Text Generation — Autoregressive inference with sampling
- neuralforge generate --prompt "..." --temperature 0.8 --top-p 0.9
- Streaming token output via NDJSON
- GenerateView in app with parameter controls
A: LoRA Fine-Tuning — Low-rank adaptation
- Rank 4-64, configurable alpha, target selection (Q/K/V/O)
- Tiny checkpoints (~2MB), merge-on-export support
B: Multi-Model Support — Runtime dimensions
- ModelConfig replaces compile-time #defines
- All MIL generators and CPU ops parameterized

Additional v2.0 work:

Security audit (input validation, bounds checking, NDJSON escaping)
Compile timer UX (orange banner with seconds counter)
Expanded test suite (109 CLI + 119 Swift tests)

v2.1 — Enterprise Foundations (Complete)

P0: Fast Tokenizer ✅

Replace O(n^2) BPE with priority queue (max-heap) algorithm — O(n log n)
Target: tokenize 1MB text in <1 second — achieved: 936ms for 1MB
Maintain compatibility with existing tokenizer.bin format — all 112 tests pass
Speed tests added: 10K (8.7ms), 100K (86ms), 1M (936ms)
CLI tokenize command now works on large files (previously hung on >10KB)
Status: Complete.

P1: Document Ingestion Pipeline ✅

P1: Audit Log ✅

Append-only JSONL log file (~/Library/Logs/NeuralForge/audit.jsonl)
Log: training start/stop, config used, checkpoint saves, exports, generation
Log: who ran what, when, with which data (user, timestamp, model, data paths)
Tamper detection — SHA-256 hash chain (prev_hash → hash per entry)
Hash chain verification function (nf_audit_verify) with tamper location detection
Convenience functions: nf_audit_training_start/stop, checkpoint, export, generate
11 CLI tests + 11 Swift tests covering chain integrity, tamper detection, format validation
Status: Complete.

P2: Real Base Models ✅

Model registry with 5 built-in models (SmolLM 135M/360M/1.7B, TinyLlama 1.1B/1.1B-base)
neuralforge models CLI command (text + JSON output)
neuralforge download CLI command with streaming NDJSON progress
Python converter: HuggingFace safetensors → llama2.c format (convert_hf.py)
GQA → MHA KV head expansion for cross-architecture compatibility
Tokenizer conversion (HF tokenizer.json → tokenizer.bin)
Model card JSON metadata (model_card.json per download)
ModelCardView in app: browse models, download with progress, architecture details
13 CLI tests + 8 Swift tests covering registry, search, JSON emission, download events
Status: Complete.

P2: LLM Assistant Integration ✅

Claude API client (NFIntelligence.swift) — async URLSession, rate limiting (5 req/min), retry
API key management — macOS Keychain storage (save/load/delete), settings UI
Training assistant chat view (AssistantView.swift) — full chat UI with message history
System prompt with live training context (model info, loss curve, config, TFLOPS)
Auto hyperparameter suggestions — analyzes setup, returns structured JSON, one-click apply
Generated text evaluation — fluency/coherence/creativity scoring with grammar analysis
Privacy-first design: only metadata sent, never weights or training data, 100% optional
12 Swift tests covering message types, JSON parsing, context building, rate limiting
Status: Complete.

v3.0 — Collaboration (Complete)

Centralized Audit Dashboard ✅

Team Sync Service ✅

Cloud Sync Backend (Future)

CloudKit or S3 backend for remote sync
Conflict resolution for concurrent training

Compliance Report Generation ✅

Multi-User Audit Aggregation

Web dashboard for audit log aggregation across machines
Multi-user compliance reporting with merged logs

Distributed ANE Compute ✅

v4.0 — Productivity & Insights (Complete)

Settings & Preferences ✅

Dedicated SettingsView.swift with 4-tab layout (General, Training, API Keys, About)
CLI binary path management with browse, auto-detect, and status indicator
Default training hyperparameters (steps, LR, accumulation, checkpoint, grad clip, seed)
Default scheduler settings (warmup, LR schedule, shuffle, LoRA rank)
API key management — Claude API + HuggingFace token via macOS Keychain
Export format defaults (GGUF, llama2c, CoreML)
Auto-save interval and max history entries settings
About tab with version info, CLI status, and feature summary
12 Swift tests covering defaults, ranges, identifiers, options
Status: Complete.

Training Run History & Experiment Tracker ✅

Model Evaluation Benchmarks ✅

v5.0 — Polish & Production Readiness (Complete)

Onboarding Wizard ✅

Menu Bar Integration ✅

MenuBarManager.swift — @MainActor singleton for training status
Real-time tracking: step, total, loss, best loss, TFLOPS, ms/step
Progress percentage and ETA calculation with timer-based elapsed tracking
Dynamic menu bar icon (bolt.fill when training, cpu when idle)
MenuBarView with training metrics grid (loss, TFLOPS, ms/step, elapsed, ETA)
Idle state display with "Open NeuralForge" and "Quit" actions
MenuBarExtra scene with .window style in app entry point
10 Swift tests covering progress, ETA, formatting, icons, loss tracking
Status: Complete.

Quantization Service ✅

QuantizationService.swift — GGUF quantization and CoreML conversion pipeline
8 quantization types: F16, Q8_0, Q5_1, Q5_0, Q4_1, Q4_0, Q3_K_M, Q2_K
Per-type metadata: bits/weight, quality rating (1-10), descriptions
Size estimation: estimateSize(modelParams:quantType:) and formatSize()
QuantizationJob model with status tracking (pending/running/success/failed)
CoreMLConfig with compute unit selection (All/CPU+GPU/CPU) and precision (F16/F32)
QuantizationService @MainActor singleton: quantizeGGUF(), convertCoreML()
ExportView updated with quantization picker, size estimates, export history
18 Swift tests covering types, ordering, size estimation, formatting, jobs
Status: Complete.

Generate → Evaluate Pipeline ✅

Full eval pipeline in AssistantView.requestEvaluation()
Auto-detect tokenizer from model directory (tokenizer.bin, tokenizer.model)
Generate 3 text samples with diverse prompts via CLIRunner.generate()
Evaluate samples via Claude API (NFIntelligence.evaluateGeneratedText)
Display eval report and collected samples in AssistantView
9 Swift tests covering prompts, tokenizer detection, path construction, params
Status: Complete.

Bug Fixes (15) ✅

Fixed 3 HIGH bugs: orphan CLIRunner, swallowed taps, broken selection checkboxes
Fixed 5 MEDIUM bugs: @StateObject → @ObservedObject for singletons, CSV export filtering, tokenizer auto-detection, SyncDashboardView status enum matching
Fixed 4 LOW bugs: deprecated onChange, sync config save, unused env object, eval stub
Status: Complete. All 15 bugs verified fixed, BUILD SUCCEEDED.

App Entry Point ✅

@AppStorage("onboardingComplete") conditional routing
Dynamic defaultSize based on onboarding state
MenuBarExtra scene with MenuBarView and dynamic status icon/title
EnvironmentObject injection for projectManager + cliRunner on all views
8 Swift tests covering routing, window sizing, env objects
Status: Complete.

v5.1: Deferred Platform Features ✅

CloudKit/S3 Remote Sync ✅

CloudSyncProvider protocol (upload, download, list, delete, testConnection)
S3SyncProvider with AWS Signature V4 (HMAC-SHA256), presigned URLs
CloudKitSyncProvider with CKContainer/CKDatabase/CKAsset (iCloud private DB)
CloudSyncConfig (Codable) with S3/CloudKit settings, Keychain credential storage
CloudSyncManager (@MainActor singleton): upload/download/list/sync/testConnection
12 Swift tests covering config, errors, URL construction, credential handling
Status: Complete.

Gradient Aggregation Protocol ✅

GradientMessage wire protocol (Codable): assignWork, gradientReady, aggregated, heartbeat, syncCheckpoint
AggregationConfig: AllReduce, ParameterServer, GossipProtocol strategies
StragglerPolicy: Wait, Skip, Timeout modes
GradientAggregator (@MainActor singleton): coordinator/worker modes, all-reduce averaging, ring-reduce
GradientMetrics (ObservableObject): rounds, throughput, straggler/failure counts, rolling averages
Gradient compression (threshold-based sparsification) and checksum verification
15 Swift tests covering strategies, metrics, compression, ring topology
Status: Complete.

Multi-User Audit Web Dashboard ✅

WebDashboardConfig (Codable): port, bind address, auth token, refresh interval
AuditAggregator: local log scanning, multi-machine sync directory scanning, entry merging
AuditAPIHandler: HTTP request parsing, 6 REST routes (/, /api/entries, /api/stats, /api/verify, /api/machines, /health)
Full HTML dashboard with dark theme, stats cards, filter bar, audit entry table, auto-refresh
AuditWebServer (@MainActor singleton): NWListener-based HTTP server, CORS support, bearer token auth
Thread-safe connection handling with ObjectIdentifier-based tracking
18 Swift tests covering config, URL generation, request parsing, response serialization, auth
Status: Complete.

v5.2: Hardening & Polish ✅

XCUITests (End-to-End UI Automation) ✅

XCUITest target added to Xcode project (NeuralForgeUITests)
22 UI test cases covering onboarding flow, main view, project creation, settings, menus, window sizing
Launch argument support for -onboardingComplete to test both onboarding and main flows
Accessibility validation tests
Status: Complete. UI test target builds and compiles.

Compiler Warning Fixes ✅

Fixed CLIRunner.swift unused [weak self] capture in CoreML export callback
Fixed ComputeClusterService.swift non-exhaustive switch on NWTXTRecord.Entry
Fixed BenchmarkService.swift unused batchCount variable in eval callback
Fixed AppIcon.appiconset — added 3 unassigned children (64x64, 64x64@2x, 1024x1024) to Contents.json
Status: Complete. Zero warnings on clean build.

README / User Docs Update ✅

Comprehensive README rewrite with all v5.x features
Updated test counts (508 total: 152 CLI + 356 Swift)
Added generate command documentation and examples
Added macOS app feature list (16 features)
Updated architecture diagram (39 source files, UITests)
Status: Complete.

v6.0: Platform Extensions ✅

Training Profiles (Save/Load Config Presets) ✅

TrainingProfile model (Codable): name, description, config, tags, lastUsed
5 built-in presets: Quick Test, Standard, Long Run, LoRA Fine-Tune, Conservative
TrainingProfileService (@MainActor singleton): CRUD, search, filter by tag, recent tracking
Profile diff computation — show config differences between two profiles
Apply profile to project — one-click config swap
Create profile from project — extract current config into reusable preset
Import/export profiles as JSON for sharing across machines
Duplicate profiles with auto-naming
15 Swift tests covering presets, serialization, search, diff, apply, recent tracking
Status: Complete.

Drag & Drop Data Ingestion ✅

Webhook Notifications (Slack/Discord/Generic) ✅

MLX Backend (Metal GPU Alternative) ✅

Test Coverage

Component	Tests	Last Verified
CLI (test_cli.m)	152	2026-03-07
Swift (NeuralForgeTests.swift)	416	2026-03-07
Xcode build (43 source files)	SUCCEEDED	2026-03-07
Real training (50 steps)	PASSED	2025-03-07
Real generation (100 tokens)	PASSED	2025-03-07
CLI tokenize (45KB file)	PASSED	2025-03-07

Known Issues

Issue	Severity	Status
~~BPE tokenizer O(n^2), hangs on >10KB~~	~~High~~	Fixed — replaced with O(n log n) heap
Training data may be Git LFS placeholder (15 bytes)	Medium	Workaround: regenerate with Python
~~Tokenizer hangs CLI `tokenize` command on large files~~	~~High~~	Fixed — same fix, 1MB in <1s
First ANE compile takes 20-30s (no visible progress)	Low	Fixed — compile timer added

FilesExpand file tree

ROADMAP.md

Latest commit

History

ROADMAP.md

File metadata and controls

NeuralForge — Roadmap & Status Tracker

Version History

v1.0 — Core Training Engine (Complete)

v2.0 — Feature Platform (Complete)

v2.1 — Enterprise Foundations (Complete)

P0: Fast Tokenizer ✅

P1: Document Ingestion Pipeline ✅

P1: Audit Log ✅

P2: Real Base Models ✅

P2: LLM Assistant Integration ✅

v3.0 — Collaboration (Complete)

Centralized Audit Dashboard ✅

Team Sync Service ✅

Cloud Sync Backend (Future)

Compliance Report Generation ✅

Multi-User Audit Aggregation

Distributed ANE Compute ✅

v4.0 — Productivity & Insights (Complete)

Settings & Preferences ✅

Training Run History & Experiment Tracker ✅

Model Evaluation Benchmarks ✅

v5.0 — Polish & Production Readiness (Complete)

Onboarding Wizard ✅

Menu Bar Integration ✅

Quantization Service ✅

Generate → Evaluate Pipeline ✅

Bug Fixes (15) ✅

App Entry Point ✅

v5.1: Deferred Platform Features ✅

CloudKit/S3 Remote Sync ✅

Gradient Aggregation Protocol ✅

Multi-User Audit Web Dashboard ✅

v5.2: Hardening & Polish ✅

XCUITests (End-to-End UI Automation) ✅

Compiler Warning Fixes ✅

README / User Docs Update ✅

v6.0: Platform Extensions ✅

Training Profiles (Save/Load Config Presets) ✅

Drag & Drop Data Ingestion ✅

Webhook Notifications (Slack/Discord/Generic) ✅

MLX Backend (Metal GPU Alternative) ✅

Test Coverage

Known Issues