SR-MoA is a parameter-efficient, dynamic neural architecture that introduces autonomous test-time neuroplasticity into frozen Large Language Models.
Unlike traditional transformers with static weights, SR-MoA allows the model to autonomously detect its own reasoning failures, formulate self-reflective correction rules, and execute gradient backpropagation to rewire its own adapter bank and routing pathways directly during inference—all without external human supervision or pre-existing datasets.
To prevent catastrophic forgetting, 99% of the base LLM remains permanently frozen. We inject a lightweight bank of LoRA experts (Adapter Bank) modulated by an MLP-based Router.
[ Input Context ]
│
┌───────────────────────┴───────────────────────┐
▼ ▼
[ Frozen Base LLM ] [ Differentiable Router ]
│ │
│ ┌───────────────────────────────────────────┘ (Softmax weights)
▼ ▼
[ einsum() Dynamic Fusion ]
│ (Combines AdapterBank weights on-the-fly)
▼
[ Target Adapter Output ]
│
▼
[ Final Prediction ]
When a reasoning failure is detected:
- Self-Reflection: The model generates a symbolic natural language correction rule (e.g., "RULE: Always verify intermediate arithmetic products before calculating discounts.").
- Signal Diversification: The rule is compiled into 4 diverse semantic prompt structures to prevent overfitting and local minima.
- Targeted Backpropagation: A unified loss (Causal LM loss + Router Cross-Entropy loss) flows back only into the target adapter weights and the Router. The model literally rewires its routing connections to map similar future query embeddings to the newly optimized adapter.
AdapterBank: Uses multi-adapter batch-mode tensor calculations (torch.einsum) to apply distinct dynamic expert modulations to hidden representations inside the forward pass.Router: Fuses historical semantic state representations (e.g., from an external episodic memory graph) with active contextual states to produce dynamic routing weights over the adapter bank.SRMoATrainer: Handles backpropagation steps exclusively for parameters withrequires_grad=Trueutilizing gradient clipping for test-time optimization stability.
Ensure you have the required dependencies:
pip install torch transformers sentence-transformersInitialize the architecture and execute a test-time learning step on an error correction:
import torch
from transformers import AutoTokenizer
from sr_moa import SRMoAModel
from trainer import SRMoATrainer
# 1. Initialize Model & Tokenizer
model = SRMoAModel(base_model_name="Qwen/Qwen2.5-1.5B-Instruct", num_adapters=16)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")
model.cuda().half()
# Keep learning weights in float32 for gradient updates
model.router = model.router.float()
model.adapter_bank = model.adapter_bank.float()
# 2. Define Context & Rule
memory_state = torch.zeros(1, 128, device="cuda") # Simulated DSM Memory Vector
rule_text = "RULE: If buying 5 or more notebooks, apply 20% discount."
target_adapter_idx = 4
# 3. Optimize Weights in Real-Time (Test-Time Training)
trainer = SRMoATrainer(learning_rate=1e-3)
loss, routing_weights = trainer.compute_gradients(model, tokenizer, rule_text, target_adapter_idx)
print(f"Pre-step routing probabilities: {routing_weights[0].tolist()}")
updated_params = trainer.step(model, loss)
print(f"Successfully rewired {updated_params} parameters.")Evaluated on a subset of the GSM8K mathematical reasoning benchmark:
| Model Configuration | Accuracy | Router Loss (CE) | Convergence Speed |
|---|---|---|---|
| Vanilla Qwen 1.5B Core | 50% | N/A | Instant (Static) |
| SR-MoA (1 Cycle SR-TTT) | 70% | 10.1 | 8 micro-steps |
| SR-MoA (2 Cycles SR-TTT) | 80% | 4.7 | 16 micro-steps |
- Impact: The system achieved a +60.0% relative reasoning improvement (50% → 80% accuracy) purely via test-time self-updates, proving that a lightweight, frozen core can autonomously correct its own cognitive pathways.
sr_moa/
├── README.md # Scientific and engineering documentation
├── requirements.txt # Package dependencies
├── src/
│ ├── sr_moa.py # Core neural network modules (AdapterBank, Router, SRMoAModel)
│ └── trainer.py # SR-TTT gradient training pipeline (SRMoATrainer)
└── tests/
└── sr_ttt.py # Continuous self-improvement benchmark suite
This project is licensed under the MIT License - see the LICENSE file for details.
If you utilize this architecture in your research, please cite:
@software{danik_sr_moa_2026,
author = {Danil (NARE Labs)},
title = {Self-Reflective Mixture of Adapters (SR-MoA): Edge-Centric Test-Time Neuroplasticity},
year = {2026},
publisher = {GitHub},
journal = {GitHub Repository},
howpublished = {\url{https://github.com/narelabs/sr-moa}}
}