NeuralForge — Roadmap & Status Tracker

Version History

v1.0 — Core Training Engine (Complete)

  • ANE forward/backward pass pipeline
  • Adam optimizer with gradient accumulation
  • Checkpoint save/resume (survives exec() restarts)
  • NDJSON app-CLI communication protocol
  • SwiftUI macOS app with live dashboard
  • BPE tokenizer (encode + decode)
  • CLI commands: train, tokenize, export, info, benchmark
  • GGUF export (for llama.cpp)
  • CoreML export
  • Basic test suite (43 CLI + 32 Swift)

v2.0 — Feature Platform (Complete)

Six major features added:

  • E: LR Scheduler — Cosine annealing with linear warmup

    • --warmup N, --lr-min, --lr-schedule cosine
    • LR displayed in dashboard, included in step JSON
  • F: Data Pipeline — Multi-shard, shuffle, train/val split

    • --val-data, --val-every N, --shuffle
    • Validation loss tracked and charted
  • C: Live Charts — EMA smoothing, TFLOPS chart, val loss overlay

    • EMA toggle (alpha=0.98), chart window picker (All/500/1K/2K)
    • TFLOPS over time, validation loss dashed overlay
  • D: Text Generation — Autoregressive inference with sampling

    • neuralforge generate --prompt "..." --temperature 0.8 --top-p 0.9
    • Streaming token output via NDJSON
    • GenerateView in app with parameter controls
  • A: LoRA Fine-Tuning — Low-rank adaptation

    • Rank 4-64, configurable alpha, target selection (Q/K/V/O)
    • Tiny checkpoints (~2MB), merge-on-export support
  • B: Multi-Model Support — Runtime dimensions

    • ModelConfig replaces compile-time #defines
    • All MIL generators and CPU ops parameterized
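
The schedule named under feature E (cosine annealing with linear warmup) can be sketched as follows. This is a minimal illustration, not the project's code: the function name and argument names are hypothetical, and the mapping to --warmup and --lr-min is an assumption.

```python
import math

def lr_at_step(step, total_steps, lr_max, lr_min=0.0, warmup=0):
    """Linear warmup to lr_max, then cosine annealing down to lr_min."""
    if warmup > 0 and step < warmup:
        # Ramp linearly so that the last warmup step reaches lr_max.
        return lr_max * (step + 1) / warmup
    # Fraction of the post-warmup schedule completed, clamped to [0, 1].
    decay_steps = max(1, total_steps - warmup)
    progress = min(1.0, (step - warmup) / decay_steps)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))
```

With warmup=100 and total_steps=1000, the rate climbs to lr_max over the first 100 steps and then decays smoothly to lr_min by the final step.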

Additional v2.0 work:

  • Security audit (input validation, bounds checking, NDJSON escaping)
  • Compile timer UX (orange banner with seconds counter)
  • Expanded test suite (109 CLI + 119 Swift tests)

v2.1 — Enterprise Foundations (Complete)

P0: Fast Tokenizer ✅

  • Replace O(n^2) BPE with priority queue (max-heap) algorithm — O(n log n)
  • Target: tokenize 1MB text in <1 second — achieved: 936ms for 1MB
  • Maintain compatibility with existing tokenizer.bin format — all 112 tests pass
  • Speed tests added: 10K (8.7ms), 100K (86ms), 1M (936ms)
  • CLI tokenize command now works on large files (previously hung on >10KB)
  • Status: Complete.
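
To illustrate how a priority queue avoids rescanning the whole sequence after every merge, here is a minimal sketch of heap-driven BPE merging. It assumes a hypothetical `ranks` table mapping symbol pairs to merge priority (lower merges first, which is equivalent to a max-heap on score); the actual tokenizer.bin layout and scoring are not shown in this document.

```python
import heapq

def bpe_encode(symbols, ranks):
    """Merge adjacent symbols by best rank using a heap and a linked list.

    symbols: list of initial symbols (for example, single characters).
    ranks:   dict mapping (left, right) pairs to priority; lower merges first.
    """
    n = len(symbols)
    sym = list(symbols)                  # sym[i] becomes None once node i is merged away
    prev = [i - 1 for i in range(n)]     # -1 marks "no neighbour"
    nxt = [i + 1 if i + 1 < n else -1 for i in range(n)]
    heap = []

    def push(i):
        j = nxt[i]
        if j != -1 and (sym[i], sym[j]) in ranks:
            heapq.heappush(heap, (ranks[(sym[i], sym[j])], i, sym[i], sym[j]))

    for i in range(n - 1):
        push(i)

    while heap:
        _, i, left, right = heapq.heappop(heap)
        j = nxt[i] if sym[i] is not None else -1
        # Lazy invalidation: skip entries whose nodes changed after they were pushed.
        if j == -1 or sym[i] != left or sym[j] != right:
            continue
        sym[i] = left + right            # merge node j into node i
        sym[j] = None
        nxt[i] = nxt[j]
        if nxt[j] != -1:
            prev[nxt[j]] = i
        if prev[i] != -1:
            push(prev[i])                # the pair to the left of i is new
        push(i)                          # so is the pair starting at i

    return [s for s in sym if s is not None]
```

Each merge touches only its two neighbours and pushes at most two heap entries, so the total work is O(n log n) rather than one full scan per merge.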

P1: Document Ingestion Pipeline ✅

  • neuralforge ingest CLI subcommand with full pipeline
  • PDF text extraction (via PDFKit/Quartz)
  • DOCX text extraction (via macOS textutil)
  • Plain text (.txt, .md, .csv, .json, code files) support
  • Code file support (.py, .js, .c, .m, .h, .swift, .rs, .go, .java, .ts, .tsx, .jsx)
  • Manifest file for shard tracking (JSON with version, timestamps, processed files)
  • Incremental mode (--incremental) — skips unchanged files based on mtime
  • Configurable shard size (--max-shard-mb, default 50MB)
  • App UI: IngestView with source/output folder pickers, shard size picker, incremental toggle
  • CLIRunner.ingest() with streaming per-file progress via NDJSON
  • Audit log integration for ingest events
  • 16 CLI tests + 12 Swift tests covering extraction, scanning, manifests, JSON parsing
  • Status: Complete.
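
The incremental mode described above (skip files whose mtime is unchanged) might look roughly like this. The manifest layout used here, a "files" map from path to recorded mtime, is an assumption for illustration; the real manifest schema is not given in this document.

```python
import os

def select_changed(paths, manifest):
    """Return the paths whose mtime differs from the one recorded in the manifest."""
    recorded = manifest.get("files", {})
    changed = []
    for path in paths:
        mtime = os.stat(path).st_mtime
        entry = recorded.get(path)
        if entry is None or entry.get("mtime") != mtime:
            changed.append(path)
    return changed

def record_processed(paths, manifest):
    """Store the current mtime of each processed path in the manifest."""
    files = manifest.setdefault("files", {})
    for path in paths:
        files[path] = {"mtime": os.stat(path).st_mtime}
    return manifest
```

Comparing mtimes is cheap but can miss edits that preserve the timestamp; a content hash would be stricter at the cost of reading every file.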

P1: Audit Log ✅

  • Append-only JSONL log file (~/Library/Logs/NeuralForge/audit.jsonl)
  • Log: training start/stop, config used, checkpoint saves, exports, generation
  • Log: who ran what, when, with which data (user, timestamp, model, data paths)
  • Tamper detection — SHA-256 hash chain (prev_hash → hash per entry)
  • Hash chain verification function (nf_audit_verify) with tamper location detection
  • Convenience functions: nf_audit_training_start/stop, checkpoint, export, generate
  • 11 CLI tests + 11 Swift tests covering chain integrity, tamper detection, format validation
  • Status: Complete.
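
The prev_hash → hash chain and its tamper-location check can be sketched as below. This is an illustration under assumptions: the field names other than prev_hash and hash, the all-zero genesis value, and the canonical JSON encoding are hypothetical, and the function names are not those of nf_audit_verify.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed prev_hash for the first entry

def entry_hash(entry):
    """SHA-256 over the entry's canonical JSON, excluding its own 'hash' field."""
    body = {k: v for k, v in entry.items() if k != "hash"}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def append_entry(log, event, **fields):
    """Append an entry that commits to the hash of the previous one."""
    prev = log[-1]["hash"] if log else GENESIS
    entry = {"seq": len(log), "event": event, "prev_hash": prev, **fields}
    entry["hash"] = entry_hash(entry)
    log.append(entry)
    return entry

def verify_chain(log):
    """Return None if the chain is intact, else the index of the first bad entry."""
    prev = GENESIS
    for i, entry in enumerate(log):
        if entry.get("prev_hash") != prev or entry.get("hash") != entry_hash(entry):
            return i
        prev = entry["hash"]
    return None
```

Because every entry commits to its predecessor, editing or deleting any line invalidates that entry or the one after it, which is what lets verification report where tampering starts.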

P2: Real Base Models ✅

  • Model registry with 5 built-in models (SmolLM 135M/360M/1.7B, TinyLlama 1.1B/1.1B-base)
  • neuralforge models CLI command (text + JSON output)
  • neuralforge download CLI command with streaming NDJSON progress
  • Python converter: HuggingFace safetensors → llama2.c format (convert_hf.py)
  • GQA → MHA KV head expansion for cross-architecture compatibility
  • Tokenizer conversion (HF tokenizer.json → tokenizer.bin)
  • Model card JSON metadata (model_card.json per download)
  • ModelCardView in app: browse models, download with progress, architecture details
  • 13 CLI tests + 8 Swift tests covering registry, search, JSON emission, download events
  • Status: Complete.
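
The GQA → MHA expansion amounts to repeating each key/value head so that every query head has its own copy. A minimal pure-Python sketch, assuming the K or V projection is stored as a list of rows with shape (n_kv_heads * head_dim, dim); convert_hf.py itself may well use tensor operations instead.

```python
def expand_kv_heads(weight_rows, n_kv_heads, n_heads):
    """Repeat each KV head's rows so the projection ends up with n_heads heads.

    Query head q shares KV head q // (n_heads // n_kv_heads) under GQA, so each
    KV head is copied consecutively to keep that mapping after expansion.
    """
    if n_heads % n_kv_heads != 0:
        raise ValueError("n_heads must be a multiple of n_kv_heads")
    if len(weight_rows) % n_kv_heads != 0:
        raise ValueError("row count must be a multiple of n_kv_heads")
    head_dim = len(weight_rows) // n_kv_heads
    repeats = n_heads // n_kv_heads
    expanded = []
    for h in range(n_kv_heads):
        head = weight_rows[h * head_dim:(h + 1) * head_dim]
        for _ in range(repeats):
            expanded.extend(list(row) for row in head)
    return expanded
```

The expanded model computes the same attention outputs as the original but stores larger K/V projections, which trades memory for compatibility with an MHA-only engine.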

P2: LLM Assistant Integration ✅

  • Claude API client (NFIntelligence.swift) — async URLSession, rate limiting (5 req/min), retry
  • API key management — macOS Keychain storage (save/load/delete), settings UI
  • Training assistant chat view (AssistantView.swift) — full chat UI with message history
  • System prompt with live training context (model info, loss curve, config, TFLOPS)
  • Auto hyperparameter suggestions — analyzes setup, returns structured JSON, one-click apply
  • Generated text evaluation — fluency/coherence/creativity scoring with grammar analysis
  • Privacy-first design: only metadata sent, never weights or training data, 100% optional
  • 12 Swift tests covering message types, JSON parsing, context building, rate limiting
  • Status: Complete.
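
One common way to enforce a limit such as 5 requests per minute is a sliding-window counter. The sketch below is an assumption about the general technique, not a description of NFIntelligence.swift; the class and method names are hypothetical.

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Allow at most max_requests calls within any window_seconds interval."""

    def __init__(self, max_requests=5, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window = window_seconds
        self.clock = clock          # injectable so tests can control time
        self.stamps = deque()       # timestamps of accepted requests

    def try_acquire(self):
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.stamps and now - self.stamps[0] >= self.window:
            self.stamps.popleft()
        if len(self.stamps) < self.max_requests:
            self.stamps.append(now)
            return True
        return False
```

A caller would check try_acquire() before each API request and either queue or reject the request when it returns False.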

v3.0 — Collaboration (Complete)

Centralized Audit Dashboard ✅

  • AuditLogReader.swift — JSONL parser with SHA-256 hash chain verification
  • AuditEntry model with computed properties (eventIcon, eventColor, summary, date)
  • AuditStats aggregation (entry counts, users, training time, best loss)
  • AuditVerification with chain integrity status and tamper location detection
  • AuditDashboardView.swift — full compliance audit log viewer
  • Stats bar (entries, trainings, checkpoints, exports, generations, train time, best loss)
  • Filter bar with event type picker and text search
  • Scrollable entry list with icons, seq numbers, event types, timestamps, truncated hashes
  • Hash chain verification sheet with visual pass/fail indicators
  • Entry detail sheet showing all audit fields + hash chain info
  • CSV export via NSSavePanel
  • 12 Swift tests covering entry parsing, chain format, verification, stats, CSV export
  • Status: Complete.

Team Sync Service ✅

  • SyncService.swift — checkpoint sync engine with configurable shared directory
  • Automatic checkpoint detection and sync across all projects
  • Shared model registry — browse synced checkpoints and models
  • LaunchAgent plist generation with configurable interval (5-120 min)
  • LaunchAgent install/uninstall via launchctl
  • Restore checkpoints from shared directory to any project
  • Sync status tracking (idle, syncing, success, error)
  • Pending sync detection (unsynced checkpoints)
  • SyncDashboardView.swift — full sync UI with setup, status, shared browser
  • Sync history with file sizes and step numbers
  • Project name sanitization for safe directory names
  • Sync tab added to ProjectDetailView
  • 12 Swift tests covering config, codable, paths, plist, sanitization
  • Status: Complete. CloudKit/S3 backend deferred to future release.
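
Two of the items above, project name sanitization and LaunchAgent plist generation with a 5 to 120 minute interval, can be sketched with the standard library. The label, program arguments, and allowed character set are hypothetical; Label, ProgramArguments, StartInterval, and RunAtLoad are standard launchd keys.

```python
import plistlib
import re

def sanitize_project_name(name):
    """Collapse anything outside [A-Za-z0-9_-] to '_' so the name is a safe directory."""
    cleaned = re.sub(r"[^A-Za-z0-9_-]+", "_", name).strip("_")
    return cleaned or "project"

def launch_agent_plist(label, program_args, interval_minutes):
    """Build an XML LaunchAgent plist that runs program_args periodically."""
    minutes = max(5, min(120, interval_minutes))   # clamp to the supported range
    agent = {
        "Label": label,
        "ProgramArguments": list(program_args),
        "StartInterval": minutes * 60,             # launchd expects seconds
        "RunAtLoad": True,
    }
    return plistlib.dumps(agent)
```

For example, launch_agent_plist("com.example.neuralforge.sync", ["/usr/local/bin/neuralforge-sync"], 30) returns bytes that could be written under ~/Library/LaunchAgents; both the label and the binary path here are placeholders.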

Cloud Sync Backend (Future)

  • CloudKit or S3 backend for remote sync
  • Conflict resolution for concurrent training

Compliance Report Generation ✅

  • ComplianceReportGenerator.swift — generates structured reports from audit data
  • Three compliance frameworks: General Audit, HIPAA, SOX
  • HIPAA sections: §164.312(a-e) — audit controls, access control, integrity, authentication, transmission
  • SOX sections: §302 management assessment, §404 internal controls, separation of duties, config changes
  • Date range filtering for report period selection
  • Hash chain integrity verification integrated into all report types
  • Multi-user access tracking with threshold-based warnings
  • ComplianceReportView.swift — full report UI with framework picker, preview, export
  • Report status badges: Compliant (green), Needs Review (orange), Non-Compliant (red)
  • Section severity indicators: INFO, PASS, REVIEW, CRITICAL
  • Text export (plain text with formatted sections)
  • PDF export (via NSPrintOperation)
  • Reports tab added to ProjectDetailView
  • 12 Swift tests covering frameworks, statuses, sections, filtering, thresholds
  • Status: Complete.

Multi-User Audit Aggregation

  • Web dashboard for audit log aggregation across machines
  • Multi-user compliance reporting with merged logs

Distributed ANE Compute ✅

  • ComputeClusterService.swift — Bonjour-based multi-Mac ANE compute cluster
  • DeviceCapabilities model — chip detection, ANE TFLOPS estimation, memory, CPU/GPU cores
  • IOPlatformUUID-based stable device identification
  • Chip family database: M1/M2/M3/M4 (base/Pro/Max/Ultra) with TFLOPS + GPU core estimates
  • NWListener-based service advertisement with TXT record metadata
  • NWBrowser-based automatic device discovery on local network
  • ClusterNode model with status tracking (discovered/available/training/syncing/error/offline)
  • TFLOPS-weighted shard distribution algorithm for data parallelism
  • Cluster metrics aggregation (total TFLOPS, total memory, node count)
  • ComputeClusterView.swift — full cluster dashboard UI
  • Local device info card with chip, memory, ANE TFLOPS, CPU cores
  • Discovered nodes list with status badges and capability display
  • Shard distribution visualization with proportional bars
  • Cluster tab added to ProjectDetailView
  • 12 Swift tests covering service type, status, TFLOPS/memory formatting, GPU/ANE estimates, shard distribution, device model
  • Status: Complete. Gradient aggregation protocol deferred to future release.
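
A TFLOPS-weighted shard split can be done with largest-remainder apportionment, so that every shard is assigned and faster nodes receive proportionally more. This is one plausible approach under stated assumptions, not necessarily the algorithm in ComputeClusterService.swift; the function name is hypothetical.

```python
def distribute_shards(num_shards, node_tflops):
    """Split num_shards across nodes in proportion to their TFLOPS estimates."""
    total = sum(node_tflops.values())
    if total <= 0:
        raise ValueError("total TFLOPS must be positive")
    # Ideal (fractional) share per node, then round down.
    quotas = {node: num_shards * t / total for node, t in node_tflops.items()}
    counts = {node: int(q) for node, q in quotas.items()}
    # Hand the leftover shards to the nodes with the largest fractional parts.
    leftover = num_shards - sum(counts.values())
    by_remainder = sorted(quotas, key=lambda node: quotas[node] - counts[node], reverse=True)
    for node in by_remainder[:leftover]:
        counts[node] += 1
    return counts
```

Rounding each share independently could lose or duplicate shards; the remainder pass guarantees the counts always sum to num_shards.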

v4.0 — Productivity & Insights (Complete)

Settings & Preferences ✅

  • Dedicated SettingsView.swift with 4-tab layout (General, Training, API Keys, About)
  • CLI binary path management with browse, auto-detect, and status indicator
  • Default training hyperparameters (steps, LR, accumulation, checkpoint, grad clip, seed)
  • Default scheduler settings (warmup, LR schedule, shuffle, LoRA rank)
  • API key management — Claude API + HuggingFace token via macOS Keychain
  • Export format defaults (GGUF, llama2c, CoreML)
  • Auto-save interval and max history entries settings
  • About tab with version info, CLI status, and feature summary
  • 12 Swift tests covering defaults, ranges, identifiers, options
  • Status: Complete.

Training Run History & Experiment Tracker ✅

  • TrainingHistoryService.swift — persist completed training runs to JSON
  • TrainingRun model with full metadata: project, model, config snapshot, results, timestamps
  • TrainingRunConfig snapshot preserving all hyperparameters at time of run
  • LossPoint model for serializable loss curve data
  • Loss curve downsampling for efficient storage (500 train + 200 val points max)
  • Auto-save on training completion via recordCompletedRun()
  • Run queries: by project, best run, recent, search (name/notes/model/LoRA)
  • CRUD operations: add, delete, batch delete, update notes, clear
  • TrainingHistoryView.swift — browse runs with sortable table (date, steps, loss, duration, LR, LoRA, TFLOPS)
  • Search bar with text filtering
  • Sort orders: newest, oldest, best loss, longest
  • Run detail sheet with loss chart, config, model info, notes
  • ComparisonSheet — side-by-side run comparison with overlaid loss curves
  • CSV export via NSSavePanel
  • Best run trophy indicator
  • History tab added to ProjectDetailView
  • 12 Swift tests covering model, formatting, codable, downsample, search, export
  • Status: Complete.
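
Loss curve downsampling to a fixed budget (500 train or 200 validation points) can be done by picking evenly spaced indices. The sketch below assumes simple index striding that always keeps the first and last point; the service may instead average buckets or use another scheme.

```python
def downsample(points, max_points):
    """Keep at most max_points evenly spaced points, including first and last."""
    if max_points < 2:
        raise ValueError("max_points must be at least 2")
    n = len(points)
    if n <= max_points:
        return list(points)
    # step > 1 here, so the rounded indices are strictly increasing.
    step = (n - 1) / (max_points - 1)
    return [points[round(i * step)] for i in range(max_points)]
```

Keeping the endpoints preserves the initial and final loss exactly, which are usually the values people compare across runs.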

Model Evaluation Benchmarks ✅

  • BenchmarkService.swift — perplexity evaluation engine with persistent results
  • BenchmarkResult model: perplexity, avg loss, tokens, eval time, checkpoint info
  • Perplexity scoring via CLI evaluatePerplexity command
  • CLIRunner.evaluatePerplexity() — streaming batch evaluation with NDJSON progress
  • Checkpoint-to-checkpoint comparison with trend detection (improving/stable/degrading)
  • BenchmarkStats aggregation (best/worst/avg perplexity, trend analysis)
  • Automated quality regression detection with configurable thresholds
  • RegressionAlert model with warning (>0.5) and critical (>2.0) severity levels
  • BenchmarkView.swift — full evaluation UI with stats bar, perplexity chart, results table
  • Evaluation controls: data path picker, run button, streaming progress
  • BenchmarkDetailSheet with metrics, checkpoint info, metadata
  • CSV export for benchmark results
  • Best result star indicator
  • Benchmarks tab added to ProjectDetailView
  • 12 Swift tests covering perplexity math, formatting, trends, regression, stats, export
  • Status: Complete.
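
Perplexity is the exponential of the average per-token cross-entropy loss, and the regression check compares consecutive results against thresholds. In this sketch the loss is assumed to be in nats and the 0.5 and 2.0 thresholds are assumed to be absolute perplexity increases; the document does not state their units.

```python
import math

def perplexity(avg_loss):
    """Perplexity from mean cross-entropy loss (natural-log base assumed)."""
    return math.exp(avg_loss)

def regression_severity(previous_ppl, current_ppl, warning=0.5, critical=2.0):
    """Classify a perplexity increase as 'critical', 'warning', or None."""
    delta = current_ppl - previous_ppl
    if delta > critical:
        return "critical"
    if delta > warning:
        return "warning"
    return None
```

Lower perplexity is better, so only increases trigger an alert; an improvement returns None.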

v5.0 — Polish & Production Readiness (Complete)

Onboarding Wizard ✅

  • OnboardingView.swift — 4-page first-run wizard (Welcome, Setup, Goal, Ready)
  • CLI binary auto-detection from common paths + manual browse
  • HuggingFace token input with macOS Keychain storage
  • Training goal selection (Experiment / Fine-tune / Production) with per-goal defaults
  • Goal-based default hyperparameters (steps: 1K/5K/10K, LR: 3e-4/2e-4/1e-4)
  • First project creation on completion
  • @AppStorage("onboardingComplete") conditional routing in app entry point
  • Dynamic window sizing (620×520 onboarding, 1200×800 main)
  • NFKeychain extension for arbitrary service/account key pairs
  • 12 Swift tests covering goals, defaults, page nav, path validation
  • Status: Complete.

Menu Bar Integration ✅

  • MenuBarManager.swift — @MainActor singleton for training status
  • Real-time tracking: step, total, loss, best loss, TFLOPS, ms/step
  • Progress percentage and ETA calculation with timer-based elapsed tracking
  • Dynamic menu bar icon (bolt.fill when training, cpu when idle)
  • MenuBarView with training metrics grid (loss, TFLOPS, ms/step, elapsed, ETA)
  • Idle state display with "Open NeuralForge" and "Quit" actions
  • MenuBarExtra scene with .window style in app entry point
  • 10 Swift tests covering progress, ETA, formatting, icons, loss tracking
  • Status: Complete.

Quantization Service ✅

  • QuantizationService.swift — GGUF quantization and CoreML conversion pipeline
  • 8 quantization types: F16, Q8_0, Q5_1, Q5_0, Q4_1, Q4_0, Q3_K_M, Q2_K
  • Per-type metadata: bits/weight, quality rating (1-10), descriptions
  • Size estimation: estimateSize(modelParams:quantType:) and formatSize()
  • QuantizationJob model with status tracking (pending/running/success/failed)
  • CoreMLConfig with compute unit selection (All/CPU+GPU/CPU) and precision (F16/F32)
  • QuantizationService @MainActor singleton: quantizeGGUF(), convertCoreML()
  • ExportView updated with quantization picker, size estimates, export history
  • 18 Swift tests covering types, ordering, size estimation, formatting, jobs
  • Status: Complete.
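
Size estimation reduces to parameters × bits per weight ÷ 8. The sketch below uses the effective bits per weight of GGUF's fixed 32-weight block formats (each block stores quantized values plus 16-bit scale or offset fields). The K-quant types are left out because their effective rate depends on the per-tensor mix, and the table the app actually uses is not shown here.

```python
# Effective bits per weight, derived from bytes per 32-weight block.
BITS_PER_WEIGHT = {
    "F16": 16.0,
    "Q8_0": 8.5,   # 34 bytes per block
    "Q5_1": 6.0,   # 24 bytes per block
    "Q5_0": 5.5,   # 22 bytes per block
    "Q4_1": 5.0,   # 20 bytes per block
    "Q4_0": 4.5,   # 18 bytes per block
}

def estimate_size_bytes(model_params, quant_type):
    """Approximate weight storage in bytes, ignoring metadata and unquantized tensors."""
    bits = BITS_PER_WEIGHT[quant_type]
    return int(model_params * bits / 8)
```

For a 135M-parameter model, estimate_size_bytes(135_000_000, "Q4_0") gives roughly 76 MB of weights; real files are somewhat larger because some tensors typically stay at higher precision.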

Generate → Evaluate Pipeline ✅

  • Full eval pipeline in AssistantView.requestEvaluation()
  • Auto-detect tokenizer from model directory (tokenizer.bin, tokenizer.model)
  • Generate 3 text samples with diverse prompts via CLIRunner.generate()
  • Evaluate samples via Claude API (NFIntelligence.evaluateGeneratedText)
  • Display eval report and collected samples in AssistantView
  • 9 Swift tests covering prompts, tokenizer detection, path construction, params
  • Status: Complete.

Bug Fixes (15) ✅

  • Fixed 3 HIGH bugs: orphan CLIRunner, swallowed taps, broken selection checkboxes
  • Fixed 5 MEDIUM bugs: @StateObject → @ObservedObject for singletons, CSV export filtering, tokenizer auto-detection, SyncDashboardView status enum matching
  • Fixed 4 LOW bugs: deprecated onChange, sync config save, unused env object, eval stub
  • Status: Complete. All 15 bugs verified fixed, BUILD SUCCEEDED.

App Entry Point ✅

  • @AppStorage("onboardingComplete") conditional routing
  • Dynamic defaultSize based on onboarding state
  • MenuBarExtra scene with MenuBarView and dynamic status icon/title
  • EnvironmentObject injection for projectManager + cliRunner on all views
  • 8 Swift tests covering routing, window sizing, env objects
  • Status: Complete.

v5.1: Deferred Platform Features ✅

CloudKit/S3 Remote Sync ✅

  • CloudSyncProvider protocol (upload, download, list, delete, testConnection)
  • S3SyncProvider with AWS Signature V4 (HMAC-SHA256), presigned URLs
  • CloudKitSyncProvider with CKContainer/CKDatabase/CKAsset (iCloud private DB)
  • CloudSyncConfig (Codable) with S3/CloudKit settings, Keychain credential storage
  • CloudSyncManager (@MainActor singleton): upload/download/list/sync/testConnection
  • 12 Swift tests covering config, errors, URL construction, credential handling
  • Status: Complete.
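
AWS Signature V4 derives a per-day, per-region, per-service signing key through a chain of HMAC-SHA256 operations and then signs the request's string-to-sign with it. The sketch below shows only that key derivation and final signature; building the canonical request and string-to-sign is omitted, and the function names are hypothetical.

```python
import hashlib
import hmac

def _hmac_sha256(key, message):
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()

def sigv4_signing_key(secret_key, date_stamp, region, service):
    """Derive the SigV4 signing key. date_stamp is YYYYMMDD, e.g. '20260307'."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")

def sigv4_signature(signing_key, string_to_sign):
    """Hex signature placed in the Authorization header or presigned URL."""
    return hmac.new(signing_key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()
```

Because the key is scoped to a date, region, and service, a leaked signature cannot be replayed against another region or after the credential scope expires.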

Gradient Aggregation Protocol ✅

  • GradientMessage wire protocol (Codable): assignWork, gradientReady, aggregated, heartbeat, syncCheckpoint
  • AggregationConfig: AllReduce, ParameterServer, GossipProtocol strategies
  • StragglerPolicy: Wait, Skip, Timeout modes
  • GradientAggregator (@MainActor singleton): coordinator/worker modes, all-reduce averaging, ring-reduce
  • GradientMetrics (ObservableObject): rounds, throughput, straggler/failure counts, rolling averages
  • Gradient compression (threshold-based sparsification) and checksum verification
  • 15 Swift tests covering strategies, metrics, compression, ring topology
  • Status: Complete.
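
The two numerical pieces named above, all-reduce averaging and threshold-based sparsification, are simple to state in code. This sketch works on plain Python lists and assumes every worker holds a gradient vector of equal length; the names are hypothetical and the wire format (GradientMessage) is not modelled.

```python
def average_gradients(worker_grads):
    """Element-wise mean of one gradient vector per worker."""
    if not worker_grads:
        raise ValueError("need at least one gradient")
    length = len(worker_grads[0])
    if any(len(g) != length for g in worker_grads):
        raise ValueError("gradient lengths differ")
    n = len(worker_grads)
    return [sum(g[i] for g in worker_grads) / n for i in range(length)]

def sparsify(grad, threshold):
    """Keep only entries with |value| >= threshold, as (index, value) pairs."""
    return [(i, v) for i, v in enumerate(grad) if abs(v) >= threshold]

def densify(pairs, length):
    """Rebuild a dense vector from (index, value) pairs, filling gaps with 0.0."""
    dense = [0.0] * length
    for i, v in pairs:
        dense[i] = v
    return dense
```

Sparsification is lossy: entries below the threshold are dropped, so practical systems often accumulate the dropped residual locally and add it to the next round.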

Multi-User Audit Web Dashboard ✅

  • WebDashboardConfig (Codable): port, bind address, auth token, refresh interval
  • AuditAggregator: local log scanning, multi-machine sync directory scanning, entry merging
  • AuditAPIHandler: HTTP request parsing, 6 REST routes (/, /api/entries, /api/stats, /api/verify, /api/machines, /health)
  • Full HTML dashboard with dark theme, stats cards, filter bar, audit entry table, auto-refresh
  • AuditWebServer (@MainActor singleton): NWListener-based HTTP server, CORS support, bearer token auth
  • Thread-safe connection handling with ObjectIdentifier-based tracking
  • 18 Swift tests covering config, URL generation, request parsing, response serialization, auth
  • Status: Complete.

v5.2: Hardening & Polish ✅

XCUITests (End-to-End UI Automation) ✅

  • XCUITest target added to Xcode project (NeuralForgeUITests)
  • 22 UI test cases covering onboarding flow, main view, project creation, settings, menus, window sizing
  • Launch argument support for -onboardingComplete to test both onboarding and main flows
  • Accessibility validation tests
  • Status: Complete. UI test target builds successfully.

Compiler Warning Fixes ✅

  • Fixed CLIRunner.swift unused [weak self] capture in CoreML export callback
  • Fixed ComputeClusterService.swift non-exhaustive switch on NWTXTRecord.Entry
  • Fixed BenchmarkService.swift unused batchCount variable in eval callback
  • Fixed AppIcon.appiconset — added 3 unassigned children (64x64, 64x64@2x, 1024x1024) to Contents.json
  • Status: Complete. Zero warnings on clean build.

README / User Docs Update ✅

  • Comprehensive README rewrite with all v5.x features
  • Updated test counts (508 total: 152 CLI + 356 Swift)
  • Added generate command documentation and examples
  • Added macOS app feature list (16 features)
  • Updated architecture diagram (39 source files, UITests)
  • Status: Complete.

v6.0: Platform Extensions ✅

Training Profiles (Save/Load Config Presets) ✅

  • TrainingProfile model (Codable): name, description, config, tags, lastUsed
  • 5 built-in presets: Quick Test, Standard, Long Run, LoRA Fine-Tune, Conservative
  • TrainingProfileService (@MainActor singleton): CRUD, search, filter by tag, recent tracking
  • Profile diff computation — show config differences between two profiles
  • Apply profile to project — one-click config swap
  • Create profile from project — extract current config into reusable preset
  • Import/export profiles as JSON for sharing across machines
  • Duplicate profiles with auto-naming
  • 15 Swift tests covering presets, serialization, search, diff, apply, recent tracking
  • Status: Complete.

Drag & Drop Data Ingestion ✅

  • DragDropDataService (@MainActor singleton): batch file processing with progress tracking
  • 9 supported file types: txt, md, json, jsonl, csv, pdf, swift, py, html
  • File validation: size limits (100MB), empty file detection, UTF-8 encoding check
  • DroppedFileResult model with success/skipped/error status tracking
  • IngestBatch aggregation: success/error/skip counts, total characters, line counts
  • Staging directory workflow: validate → stage → concatenate → tokenize
  • Token count estimation (chars ÷ 4 for English text)
  • File size formatting (B/KB/MB)
  • Configurable batch limit (1000 files per batch)
  • 15 Swift tests covering extensions, limits, formatting, batching, staging, progress
  • Status: Complete.
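
The validation rules, token estimate, and size formatting listed above can be sketched as follows. The status strings, the choice of 1024-based units, and skipping the UTF-8 check for PDFs are assumptions made for illustration.

```python
SUPPORTED_EXTENSIONS = {"txt", "md", "json", "jsonl", "csv", "pdf", "swift", "py", "html"}
MAX_FILE_BYTES = 100 * 1024 * 1024  # 100MB limit, assumed to be binary megabytes

def validate_file(name, data):
    """Classify a dropped file as 'success', 'skipped: ...', or 'error: ...'."""
    ext = name.rsplit(".", 1)[-1].lower() if "." in name else ""
    if ext not in SUPPORTED_EXTENSIONS:
        return "skipped: unsupported type"
    if len(data) == 0:
        return "skipped: empty file"
    if len(data) > MAX_FILE_BYTES:
        return "error: file too large"
    if ext != "pdf":  # PDFs are binary, so the text encoding check does not apply
        try:
            data.decode("utf-8")
        except UnicodeDecodeError:
            return "error: not UTF-8"
    return "success"

def estimate_tokens(text):
    """Rough token count for English text: about 4 characters per token."""
    return len(text) // 4

def format_size(num_bytes):
    """Format a byte count as B, KB, or MB."""
    if num_bytes < 1024:
        return f"{num_bytes} B"
    if num_bytes < 1024 * 1024:
        return f"{num_bytes / 1024:.1f} KB"
    return f"{num_bytes / (1024 * 1024):.1f} MB"
```

The chars ÷ 4 heuristic is only a planning estimate; code and non-English text usually tokenize less efficiently, so the real count after BPE can be noticeably higher.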

Webhook Notifications (Slack/Discord/Generic) ✅

  • WebhookNotificationService (@MainActor singleton): multi-provider webhook delivery
  • 3 providers: Slack (attachments), Discord (embeds), Generic (JSON)
  • 7 event types: training started/completed/failed, checkpoint, validation improved, loss target, export
  • WebhookConfig (Codable): per-endpoint event filtering, metrics toggle, custom message
  • Provider-specific payload formatting with color-coded status indicators
  • Delivery tracking with success/failure history (max 100 entries)
  • Test webhook functionality for verification
  • Success rate monitoring and per-webhook delivery history
  • Thread-safe nonisolated network calls with URLSession
  • 15 Swift tests covering providers, events, payloads, delivery tracking, serialization
  • Status: Complete.
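
Provider-specific formatting mostly comes down to wrapping the same title, message, and metrics in each service's envelope: Slack attachments take a hex color string, Discord embeds take an integer color. The sketch below is illustrative; the function name, color palette, and generic payload shape are assumptions.

```python
STATUS_COLORS = {"success": 0x2ECC71, "failure": 0xE74C3C, "info": 0x3498DB}

def build_payload(provider, title, message, status="info", metrics=None):
    """Return a JSON-serializable webhook body for the given provider."""
    color = STATUS_COLORS[status]
    metrics = metrics or {}
    if provider == "slack":
        return {"attachments": [{
            "color": f"#{color:06x}",
            "title": title,
            "text": message,
            "fields": [{"title": k, "value": str(v), "short": True}
                       for k, v in metrics.items()],
        }]}
    if provider == "discord":
        return {"embeds": [{
            "title": title,
            "description": message,
            "color": color,
            "fields": [{"name": k, "value": str(v), "inline": True}
                       for k, v in metrics.items()],
        }]}
    if provider == "generic":
        return {"title": title, "message": message, "status": status, "metrics": metrics}
    raise ValueError(f"unknown provider: {provider}")
```

The returned dictionary would be serialized with json.dumps and sent as the body of an HTTP POST to the configured webhook URL.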

MLX Backend (Metal GPU Alternative) ✅

  • MLXBackendService (@MainActor singleton): compute backend selection and management
  • 3 compute backends: ANE (Neural Engine), MLX (Metal GPU), CPU (Accelerate)
  • MLX availability detection via Python subprocess (version check)
  • MLXModelInfo model: param count formatting, memory estimation per quantization
  • Backend compatibility matrix per model format (bin/safetensors/gguf/npz)
  • Performance multiplier estimates (ANE ~10x, MLX ~7x, CPU baseline)
  • CLI argument generation per backend (--backend mlx, --no-ane-extras)
  • MLX training/generate command generation (mlx_lm.lora, mlx_lm.generate)
  • Backend benchmarking with forward/backward pass timing, TFLOPS, memory
  • System capabilities query (CPU count, memory, OS version)
  • 15 Swift tests covering backends, formatting, memory estimation, commands, benchmarks
  • Status: Complete.
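
Availability detection via a Python subprocess can be as simple as asking the target interpreter for the installed package version. The sketch below is an assumption about how such a check might work: it queries package metadata with the standard library rather than importing MLX, and the function name is hypothetical.

```python
import subprocess
import sys

def detect_package_version(package="mlx", python=sys.executable, timeout=10.0):
    """Return the installed version of `package` for the given interpreter, or None."""
    code = "import importlib.metadata as m, sys; print(m.version(sys.argv[1]))"
    try:
        result = subprocess.run(
            [python, "-c", code, package],
            capture_output=True, text=True, timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None  # interpreter missing or unresponsive
    if result.returncode != 0:
        return None  # package not installed in that environment
    return result.stdout.strip() or None
```

A backend selector could then offer MLX only when this returns a version string, falling back to ANE or CPU otherwise.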

Test Coverage

| Component | Tests | Last Verified |
|---|---|---|
| CLI (test_cli.m) | 152 | 2026-03-07 |
| Swift (NeuralForgeTests.swift) | 416 | 2026-03-07 |
| Xcode build (43 source files) | SUCCEEDED | 2026-03-07 |
| Real training (50 steps) | PASSED | 2025-03-07 |
| Real generation (100 tokens) | PASSED | 2025-03-07 |
| CLI tokenize (45KB file) | PASSED | 2025-03-07 |

Known Issues

| Issue | Severity | Status |
|---|---|---|
| BPE tokenizer O(n^2), hangs on >10KB | High | Fixed: replaced with O(n log n) heap |
| Training data may be Git LFS placeholder (15 bytes) | Medium | Workaround: regenerate with Python |
| Tokenizer hangs CLI tokenize command on large files | High | Fixed: same fix, 1MB in <1s |
| First ANE compile takes 20-30s (no visible progress) | Low | Fixed: compile timer added |