SR-MoA: Self-Reflective Mixture of Adapters

SR-MoA is a parameter-efficient, dynamic neural architecture that introduces autonomous test-time neuroplasticity into frozen Large Language Models.

Unlike traditional transformers with static weights, SR-MoA allows the model to autonomously detect its own reasoning failures, formulate self-reflective correction rules, and execute gradient backpropagation to rewire its own adapter bank and routing pathways directly during inference—all without external human supervision or pre-existing datasets.

Key Concepts

1. Differentiable Routing & Adapter Isolation

To prevent catastrophic forgetting, 99% of the base LLM remains permanently frozen. We inject a lightweight bank of LoRA experts (Adapter Bank) modulated by an MLP-based Router.

                                  [ Input Context ]
                                          │
                  ┌───────────────────────┴───────────────────────┐
                  ▼                                               ▼
         [ Frozen Base LLM ]                            [ Differentiable Router ]
                  │                                               │
                  │   ┌───────────────────────────────────────────┘ (Softmax weights)
                  ▼   ▼
             [ einsum() Dynamic Fusion ] 
                  │  (Combines AdapterBank weights on-the-fly)
                  ▼
         [ Target Adapter Output ]
                  │
                  ▼
          [ Final Prediction ]

2. Self-Reflective Test-Time Training (SR-TTT)

When a reasoning failure is detected:

Self-Reflection: The model generates a symbolic natural language correction rule (e.g., "RULE: Always verify intermediate arithmetic products before calculating discounts.").
Signal Diversification: The rule is compiled into 4 diverse semantic prompt structures to prevent overfitting and local minima.
Targeted Backpropagation: A unified loss (Causal LM loss + Router Cross-Entropy loss) flows back only into the target adapter weights and the Router. The model literally rewires its routing connections to map similar future query embeddings to the newly optimized adapter.

Architecture Specification

AdapterBank: Uses multi-adapter batch-mode tensor calculations (torch.einsum) to apply distinct dynamic expert modulations to hidden representations inside the forward pass.
Router: Fuses historical semantic state representations (e.g., from an external episodic memory graph) with active contextual states to produce dynamic routing weights over the adapter bank.
SRMoATrainer: Handles backpropagation steps exclusively for parameters with requires_grad=True utilizing gradient clipping for test-time optimization stability.

Quick Start

Installation

Ensure you have the required dependencies:

pip install torch transformers sentence-transformers

Basic Usage

Initialize the architecture and execute a test-time learning step on an error correction:

import torch
from transformers import AutoTokenizer
from sr_moa import SRMoAModel
from trainer import SRMoATrainer

# 1. Initialize Model & Tokenizer
model = SRMoAModel(base_model_name="Qwen/Qwen2.5-1.5B-Instruct", num_adapters=16)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")
model.cuda().half()

# Keep learning weights in float32 for gradient updates
model.router = model.router.float()
model.adapter_bank = model.adapter_bank.float()

# 2. Define Context & Rule
memory_state = torch.zeros(1, 128, device="cuda") # Simulated DSM Memory Vector
rule_text = "RULE: If buying 5 or more notebooks, apply 20% discount."
target_adapter_idx = 4

# 3. Optimize Weights in Real-Time (Test-Time Training)
trainer = SRMoATrainer(learning_rate=1e-3)
loss, routing_weights = trainer.compute_gradients(model, tokenizer, rule_text, target_adapter_idx)

print(f"Pre-step routing probabilities: {routing_weights[0].tolist()}")
updated_params = trainer.step(model, loss)
print(f"Successfully rewired {updated_params} parameters.")

Benchmark Results

Evaluated on a subset of the GSM8K mathematical reasoning benchmark:

Model Configuration	Accuracy	Router Loss (CE)	Convergence Speed
Vanilla Qwen 1.5B Core	50%	N/A	Instant (Static)
SR-MoA (1 Cycle SR-TTT)	70%	10.1	8 micro-steps
SR-MoA (2 Cycles SR-TTT)	80%	4.7	16 micro-steps

Impact: The system achieved a +60.0% relative reasoning improvement (50% → 80% accuracy) purely via test-time self-updates, proving that a lightweight, frozen core can autonomously correct its own cognitive pathways.

Directory Structure

sr_moa/
├── README.md               # Scientific and engineering documentation
├── requirements.txt        # Package dependencies
├── src/
│   ├── sr_moa.py           # Core neural network modules (AdapterBank, Router, SRMoAModel)
│   └── trainer.py          # SR-TTT gradient training pipeline (SRMoATrainer)
└── tests/
    └── sr_ttt.py           # Continuous self-improvement benchmark suite

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you utilize this architecture in your research, please cite:

@software{danik_sr_moa_2026,
  author = {Danil (NARE Labs)},
  title = {Self-Reflective Mixture of Adapters (SR-MoA): Edge-Centric Test-Time Neuroplasticity},
  year = {2026},
  publisher = {GitHub},
  journal = {GitHub Repository},
  howpublished = {\url{https://github.com/narelabs/sr-moa}}
}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SR-MoA: Self-Reflective Mixture of Adapters

Key Concepts

1. Differentiable Routing & Adapter Isolation

2. Self-Reflective Test-Time Training (SR-TTT)

Architecture Specification

Quick Start

Installation

Basic Usage

Benchmark Results

Directory Structure

License

Citation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
data		data
src		src
tests		tests
.gitattributes		.gitattributes
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

SR-MoA: Self-Reflective Mixture of Adapters

Key Concepts

1. Differentiable Routing & Adapter Isolation

2. Self-Reflective Test-Time Training (SR-TTT)

Architecture Specification

Quick Start

Installation

Basic Usage

Benchmark Results

Directory Structure

License

Citation

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages