Hybrid-GRU is a high-performance, memory-efficient inference engine built on a Hybrid-GRU + BitNet 1.58b (Ternary) architecture. It is designed for ultra-fast, local sequence modeling with a minimal memory footprint (~72MB RAM).
| Specification | Metric | Status |
|---|---|---|
| Architecture | Hybrid GRU + BitNet 1.58b (Ternary {-1, 0, 1}) | Optimized |
| Model Size (Disk) | 16.68 MB (Bit-Packed Ternary Weights) | Verified |
| Recurrence | Full Hidden-State Connectivity | Implemented |
| Normalization | RMSNorm Stabilization | Hardened |
| Quantization | 1.58-bit (log2(3)) Ternary Logic | Native |
Unlike standard 16-bit or 4-bit models, this engine utilizes Ternary Weight Quantization, replacing floating-point multiplications with simple additions and subtractions.
| Metric | BitNet 1.58b (This Engine) | LLaMA-style Q4_K_M |
|---|---|---|
| VRAM Usage | < 20MB | ~140MB+ |
| Compute Ops | ADD/SUB only | INT4/FP16 MUL-ADD |
| Energy Efficiency | ~10x better | Baseline |
| Throughput | 14,000+ TPS (RTX 3050) | ~2,500 TPS |
The current implementation focuses on architectural stability and memory efficiency:
- Full Recurrence Restoration: Optimized the hidden-state passing bottleneck, ensuring temporal consistency.
- Dynamic Xavier Scaling: Layer-specific signal scaling to prevent weight saturation in ternary logic.
- MaxMatch Tokenization: Encoding pipeline optimized for 0% unknown token rate.
- Bit-Packed Storage: Weights are stored as bit-packed ternary values, reducing disk footprint by >80%.
- Knowledge Distillation: Training scripts provided for distilling knowledge from larger dense models into this ternary core.
- RMSNormalization: Block-level normalization for activation stability during long-sequence generation.
graph TD
Input([Input Sequence]) --> Tokenizer[MaxMatch Subword Encoder]
Tokenizer --> Embedding[Ternary Embedding Weights]
subgraph "Hybrid-GRU Block (BitNet 1.58b)"
Embedding --> GRU[GRU Core + Context Buffer]
GRU --> Norm[RMSNorm Stabilization]
Norm --> Recurrence[Hidden State H_t-1]
Recurrence -.-> GRU
end
Norm --> Head[Ternary Output Head]
Head --> Softmax[Softmax Layer]
Softmax --> Output([Next Token Probs])
/architecture/neural_core: Optimized C++ source code and bitwise kernels./bin: Compiled production library (hybrid_gru.dll).vocab.txt: 50,261-entry vocabulary.PROGRESS_LOG.md: Implementation history and phase-by-phase updates.
import ctypes
# Initialize the Engine
sov = ctypes.CDLL("bin/hybrid_gru.dll")
master = sov.hybrid_gru_init_master()
agent = sov.hybrid_gru_init_agent(b"primary", master, 42)
# Observe and Generate
sov.hybrid_gru_agent_observe(agent, b"The system status is ")
response = sov.hybrid_gru_agent_act(agent, 16, 0.7).decode()
print(f"Output: {response}")Released under the MIT License. Created by Sumith Kumar.