Date: 2026-04-16 (updated after tokenizer + presets)
Session: Claude-20260415T120000Z-bitnet-distributed-training
HEAD: adc801a (both origin + azure synced)
Tests: 314 fast-lane passing
Latest additions since initial handoff:
- WordLevelTokenizer (5174-vocab, Contracts) + tokenize-corpus CLI
- TruckMateModelPresets (small ~7M, medium ~56M, large ~121M)
- CoordinatorOptions.ModelPreset → WeightApplicationService dimension override
- Pre-tokenized corpus staged on PAYTON-DESKTOP (1.83M tokens, 10 binary shards)
A complete distributed CPU training system for the BitNet b1.58 ternary SLM, targeting the Truck Mate voice-assistant intent-classification use case. The system spans four .NET projects, a Docker image, and a Windows service deployment on two machines.
src/
BitNetSharp.Distributed.Contracts/ # Wire-format DTOs, codecs, tokenizer
BitNetSharp.Distributed.Coordinator/ # ASP.NET Core host (Duende IS + Blazor)
BitNetSharp.Distributed.Worker/ # Console app with BDN calibration + Serilog
docker/
worker/ # Dockerfile + docker-compose + build.ps1
.claude/
scripts/ # PS remoting deployment scripts for PAYTON-DESKTOP
tests/
BitNetSharp.Tests/ # 307+ xunit cases
- Hosting: ASP.NET Core Web +
UseWindowsService— runs as console or Windows service - Auth: Duende IdentityServer 7.4.7 — worker machine-login (client_credentials), admin OIDC (code+PKCE)
- Persistence: SQLite WAL — 5 stores (WorkQueue, WorkerRegistry, ClientRevocation, Telemetry, LogStore)
- Weights:
FileSystemWeightStore— immutable versioned fp32 blobs with SHA-256 sidecars - Weight apply:
WeightApplicationService— in-memory global fp32 vector, staleness compensation (lr / (1 + staleness * α)), max-staleness rejection, persist-on-every-apply - CQRS: McpServer.Cqrs library (cross-repo ProjectReference to
F:\GitHub\McpServer) —IDispatcher,Result<T>monad, assembly-scanned handlers - MVVM: CommunityToolkit.Mvvm ObservableObject ViewModels, minimal Razor code-behind
- Background services:
StaleSweeperService(stale workers → Gone, timed-out tasks → Pending),TelemetryPruneService(hourly DELETE of old rows) - Codecs:
Int8GradientCodec(per-tensor scale + error-feedback residual),WeightBlobCodec(version + fp32 vector) - Corpus:
TruckMateCorpusGenerator(50K synthetic intent examples),WordLevelTokenizer(5174-vocab word-level, pre-tokenized to binary int32 shards)
Blazor admin pages (all OIDC cookie-gated):
| Page | URL | Features |
|---|---|---|
| Dashboard | /admin/dashboard |
Interactive server-render, 5s auto-refresh, per-worker table with Drain/Gone/Rotate actions, task counts, weight version, telemetry rollup |
| API keys | /admin/api-keys |
List/rotate worker OAuth secrets with immediate JWT revocation |
| Tasks | /admin/tasks |
Queue snapshot + bulk seed form |
| Install | /admin/install |
Per-client bash + PowerShell worker bootstrap scripts |
| Logs | /admin/logs |
Structured log viewer with worker/level/search filtering |
| Login | /Account/Login |
Duende IS interactive login |
REST endpoints:
| Method | Path | Auth | Purpose |
|---|---|---|---|
| POST | /connect/token |
Public | OAuth client_credentials → JWT |
| POST | /register |
JWT | Worker registration + capability report |
| GET | /work |
JWT | Atomic task claim (204 when empty) |
| POST | /heartbeat |
JWT | Worker keep-alive |
| POST | /gradient |
JWT | Task completion + gradient decode/apply |
| POST | /logs |
JWT | Structured log ingestion |
| GET | /weights/{version} |
JWT | Weight blob download with range support |
| GET | /corpus/{shardId} |
JWT | Corpus shard download |
| GET | /health |
Public | Health check |
| GET | /status |
Public | Queue + worker counts JSON |
CLI subcommands:
seed-tasks [count]— inject pending tasks into SQLite queuegenerate-corpus [count]— produce synthetic Truck Mate training examplestokenize-corpus [maxVocab]— train tokenizer + write binary int32 shards
- Calibration: BenchmarkDotNet InProcessNoEmitToolchain on startup — measures int8×ternary matmul throughput, reports tokens/sec
- Task sizing:
CapabilityReport.RecommendedTokensPerTask()scales to 10-minute target per worker - HTTP client:
CoordinatorClientwith JWT token cache, auto-refresh, fire-and-forget retry - Logging: Serilog dual-sink (Console +
CoordinatorLogSinkbatching to POST /logs) - Gradient: D-4b int8 error-feedback encoding with cross-step residual accumulation
- Docker: Multi-stage
mcr.microsoft.com/dotnet/runtime:10.0, non-root uid 10001, HEALTHCHECK via beacon file mtime,docker-compose.ymlwith--scale worker=N
- Coordinator: Windows service
BitNetCoordinatoron PAYTON-DESKTOP (Ryzen 7 2700X, 16 threads, 32GB)http://192.168.1.77:5000(LAN IPv4)- DB:
F:\ProgramData\BitNetCoordinator\coordinator.db - Corpus:
F:\ProgramData\BitNetCoordinator\corpus/(50K text + 10 tokenized binary shards) - Env vars in
HKLM\SYSTEM\CurrentControlSet\Services\BitNetCoordinator\Environment
- Worker: Console process on PAYTON-LEGION2 (Ryzen 9 5900HX, 16 threads, 24GB)
- Calibrates at ~4,750 tok/s
- Full lifecycle proven: JWT → register → heartbeat → work → gradient → task Done
Commit ae8ee29 aligned all three perplexity code paths (BitNetPaperModel, BitNetPaperAudit ×2) to 1e-6 matching TraditionalLocalModel. Impact: WikiText2 audit 19661→16444, C4 66957→16533, RedPajama 19576→9333.
-
Scale BitNetSharp.Core model config
- Current: VocabSize=68, ~4.5M params
- Target: VocabSize=5174, hidden=512, layers=12-16, ~100-150M params
- Files:
BitNetPaperModel.csconfig struct,BitNetPaperModelConfigor similar - Risk: scaling may surface numerical issues in the ternary quantization path
-
Worker corpus loader
- Download tokenized
.binshards from coordinator viaGET /corpus/{shardId} - Parse int32 sequences into batches of (input, target) pairs
- Feed into BitNet forward pass
- Files: new
CorpusDataLoaderin Worker or Core
- Download tokenized
-
Replace D-4b synthetic gradient with real backprop
- Worker's
RunWorkLoopAsynccurrently generates fake gradients - Swap for: load weights → forward on corpus batch → backward → encode gradient
- Files:
Worker/Program.cswork loop + integration withBitNetPaperModel.Train
- Worker's
-
Convergence sanity check
- Seed 50K-example corpus as tasks
- Run 1-3 epochs of distributed training
- Verify loss descends in the dashboard telemetry
- ngrok tunnel setup for external workers
- Admin client_credentials grant for scripted task seeding (current: OIDC-only + CLI)
- Blazor interactive-server upgrade for log viewer (dashboard already upgraded)
- Antiforgery tokens on login + admin POST forms
- CSRF hardening audit
Credentials rotate on every desktop-install-service-only.ps1 run. The latest set is printed by the install script's output. The admin page at /admin/api-keys shows the current worker client secrets after OIDC login.
- SQLite WAL for all coordinator persistence — single-writer topology, zero ops
- Duende IdentityServer for both worker machine-login and admin OIDC — one auth provider
- McpServer.Cqrs cross-repo ProjectReference — MVVM+CQRS enforced, all handlers assembly-scanned
- Int8 + per-tensor scale gradient codec with error-feedback residual — not ternary, because staleness effect dominates quantization term
- Staleness compensation:
effective_lr = base_lr / (1 + staleness * α)with hard reject beyond MaxStalenessSteps - Worker self-calibration via BenchmarkDotNet — coordinator sizes tasks to 10-minute target per worker
- Word-level tokenizer (not BPE) — 5174 vocab is sufficient for the narrow trucking intent domain
- Static SSR Blazor for most pages, Interactive Server for dashboard only — minimizes SignalR overhead
# On PAYTON-LEGION2 (dev box):
cd F:\GitHub\BitNet-b1.58-Sharp
dotnet build BitNet-b1.58-Sharp.slnx -c Release
dotnet test tests/BitNetSharp.Tests -c Release -f net10.0 --filter "Category!=SlowLane"
# Coordinator service on PAYTON-DESKTOP:
pwsh .claude/scripts/desktop-install-service-only.ps1
# Generate + tokenize corpus:
pwsh .claude/scripts/desktop-stage-corpus.ps1
pwsh .claude/scripts/desktop-tokenize-corpus.ps1
# Seed tasks + run worker:
pwsh .claude/scripts/desktop-seed-tasks-cli.ps1 -Count 100
$env:BITNET_COORDINATOR_URL = "http://192.168.1.77:5000/"
$env:BITNET_CLIENT_ID = "<from install output>"
$env:BITNET_CLIENT_SECRET = "<from install output>"
dotnet run --project src/BitNetSharp.Distributed.Worker -c Release -f net10.0Session Claude-20260415T120000Z-bitnet-distributed-training on MCP server at http://PAYTON-LEGION2:7147. ~20 turns logged covering every commit and design decision.