Per-input depth routing via attention residuals (dormant) · Attention-residual signal for adaptive depth routing in decoder LMs · narrow cc_news quality edge survives, no deployment win (dormant)
nlp efficiency pytorch transformer attention language-model research-archive llm adaptive-compute depth-routing
Updated May 28, 2026 - Python