augstentatious/TRuCAL

TRuCAL

Truth-Recursive Correction Attention Layer — a compact PyTorch research prototype for inference-time alignment interventions.

TRuCAL explores a simple question: can a model route high-risk hidden states through a bounded self-correction loop before returning an output, without retraining the base model?

The current implementation is intentionally small. It is designed to be read, tested, criticized, and extended.

Status: research prototype. This repository is code-first; claims should be treated as experimental until independently evaluated.


Core mechanism

TRuCAL has three moving parts:

  1. VulnerabilitySpotter: aggregates uncertainty and instability signals into a scalar risk proxy v_t.

  2. TinyCorrectionLayer: runs a bounded THINK → ACT → COHERENCE loop when v_t crosses a trigger threshold.

  3. UnifiedCAL_TRM: exposes a minimal public API for calling the correction layer and optionally returning audit metadata.

The useful idea is the loop: detect an unstable state, route it through a small correction module, measure coherence against the previous state, and stop early once the state stabilizes.
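The loop described above can be sketched in a few lines of PyTorch. This is a minimal illustration under stated assumptions, not the code in cal.py. The class name `ToyCorrectionLoop`, the residual MLP, the use of cosine similarity as the coherence measure, the `coherence_thresh` and `max_steps` parameters, and the `steps` metadata key are all hypothetical. The risk proxy v_t is passed in by the caller instead of being computed here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyCorrectionLoop(nn.Module):
    """Hypothetical stand-in for a bounded correction loop (not TRuCAL's code)."""

    def __init__(self, d_model, trigger_thresh=0.08,
                 coherence_thresh=0.99, max_steps=4):
        super().__init__()
        # Small correction module (assumption: a two-layer MLP).
        self.correct = nn.Sequential(
            nn.Linear(d_model, d_model), nn.Tanh(), nn.Linear(d_model, d_model)
        )
        self.trigger_thresh = trigger_thresh
        self.coherence_thresh = coherence_thresh
        self.max_steps = max_steps

    @torch.no_grad()
    def forward(self, x, risk):
        # x: (batch, seq, d_model); risk: scalar risk proxy v_t from the caller.
        meta = {"correction_triggered": False, "steps": 0, "coherence_score": 1.0}
        if risk < self.trigger_thresh:
            return x, meta  # low risk: pass the state through untouched

        meta["correction_triggered"] = True
        state = x
        for step in range(1, self.max_steps + 1):  # hard bound on iterations
            new_state = state + 0.1 * self.correct(state)  # small residual update
            # Coherence proxy: mean cosine similarity with the previous state.
            coherence = F.cosine_similarity(new_state, state, dim=-1).mean().item()
            state = new_state
            meta["steps"] = step
            meta["coherence_score"] = coherence
            if coherence >= self.coherence_thresh:
                break  # state has stabilized: stop early
        return state, meta
```

With `risk` below the threshold the input is returned unchanged. Otherwise the loop runs at most `max_steps` times and reports how many steps it took, which makes the early-stopping behaviour easy to inspect.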


Repository layout

TRuCAL/
├── cal.py                         # single-file reference implementation
├── components/                    # modular implementation
│   ├── vulnerability_spotter.py
│   ├── correction_template.py
│   ├── tiny_correction_layer.py
│   ├── unified_cal_trm.py
│   ├── scratchpad_layer.py
│   └── cal_trm_hybrid.py
├── examples/truthfulqa_eval.py    # evaluation scaffold
├── tests/                         # local smoke/regression tests
├── requirements.txt
└── LICENSE

Installation

git clone https://github.com/augstentatious/TRuCAL.git
cd TRuCAL
python -m pip install -r requirements.txt

Quick start

import torch
from cal import UnifiedCAL_TRM

model = UnifiedCAL_TRM(d_model=256)
x = torch.randn(1, 32, 256)

out, meta = model(x, return_metadata=True, audit_mode=False)

print(out.shape)                    # torch.Size([1, 32, 256])
print(meta["correction_triggered"]) # True/False
print(meta["coherence_score"])      # scalar coherence proxy

Advanced configuration:

from cal import TinyCorrectionLayer

layer = TinyCorrectionLayer(
    d_model=256,
    trigger_thresh=0.08,
    per_dim_kl=True,
)

Validation

Local checks:

python tests/test_bug_fixes.py
python tests/test_prosody.py
python tests/test_cal.py

These scripts are smoke/regression checks for the prototype. They are not a benchmark suite and should not be presented as production safety evidence.


Research direction

Near-term work that would make TRuCAL meaningfully stronger:

  1. Replace embedding-coordinate prosody proxies with tokenizer-aware features.
  2. Add pinned dependency versions and CI smoke tests.
  3. Publish a clean evaluation harness for TruthfulQA / AdvBench-style runs.
  4. Add a Hugging Face integration path that hooks post-embedding decoder states.
  5. Compare against simpler baselines: threshold-only gating, entropy-only gating, and no-loop ablations.
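
As a rough illustration of the baselines in item 5, an entropy-only gate with a single correction pass could look like the sketch below. The function names `entropy_gate` and `apply_gated`, the normalization by log(vocabulary size), and the default threshold of 0.5 are assumptions made for this example. None of them come from the repository.

```python
import math

import torch
import torch.nn.functional as F


def entropy_gate(logits, thresh=0.5):
    """Flag positions whose normalized predictive entropy exceeds `thresh`.

    logits: (batch, seq, vocab). Returns (mask, normalized_entropy),
    both of shape (batch, seq). Hypothetical baseline, not TRuCAL's gate.
    """
    log_probs = F.log_softmax(logits, dim=-1)
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1)
    # Divide by the maximum possible entropy so the value lies in [0, 1].
    normalized = entropy / math.log(logits.size(-1))
    return normalized > thresh, normalized


def apply_gated(hidden, mask, correction):
    """Apply `correction` once (no loop) at flagged positions only.

    hidden: (batch, seq, d_model); mask: (batch, seq) boolean.
    """
    corrected = correction(hidden)
    return torch.where(mask.unsqueeze(-1), corrected, hidden)
```

Because `apply_gated` makes exactly one pass, the same pair of functions can also serve as a no-loop ablation. A threshold-only variant would replace `entropy_gate` with a comparison on whatever scalar risk signal is being studied.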

License

MIT. See LICENSE for details.
