PyTorch research prototype for inference-time alignment interventions using risk aggregation and bounded correction loops.
rust transformers pytorch alignment ai-safety red-teaming adversarial-robustness mechanistic-interpretability llm-security hallucination-mitigation agentic-systems inference-time-alignment
Updated May 14, 2026 - Jupyter Notebook