
Add Low-Rank Q factorization for ~22% faster training steps

ad482ec
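The commit title names a low-rank factorization of the query (Q) projection but does not show the code. The sketch below illustrates the general technique under stated assumptions. It replaces a dense `d_model x d_q` query matrix with two factors, `A` (`d_model x rank`) and `B` (`rank x d_q`), and computes queries as `(x @ A) @ B`. All function names, dimensions, and the rank are hypothetical and are not taken from the PR. The sketch also does not claim to explain the reported ~22% step-time improvement.

```python
from typing import List

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Plain matrix product: rows of `a` times columns of `b`
    inner = len(b)
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)] for row in a]

def full_q_params(d_model: int, d_q: int) -> int:
    # Dense query projection W_q with shape (d_model, d_q)
    return d_model * d_q

def low_rank_q_params(d_model: int, d_q: int, rank: int) -> int:
    # Hypothetical factorization W_q ~= A @ B, A: (d_model, rank), B: (rank, d_q)
    return d_model * rank + rank * d_q

def low_rank_q(x: Matrix, a: Matrix, b: Matrix) -> Matrix:
    # Queries computed as (x @ A) @ B; the intermediate is only `rank` wide,
    # so the full d_model x d_q matrix is never materialized
    return matmul(matmul(x, a), b)
```

With `d_model = d_q = 512` and `rank = 64` (illustrative values only), the factorized projection holds 65,536 parameters, compared with 262,144 for the dense one. Whenever `rank < d_model * d_q / (d_model + d_q)`, it also needs fewer multiply-adds per token.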
Open

Depth Recurrence via Layer Sharing (3 shared blocks → 1/3 params, matched BPB) #167
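The PR title describes depth recurrence: a small set of unique blocks is reused across depth, so three shared blocks give one third of the parameters. The sketch below shows the idea under assumptions not stated in the PR. It assumes an effective depth of 9 built by looping 3 unique blocks 3 times and compares that with 9 unshared blocks. `Block`, `recurrent_forward`, and `unique_params` are hypothetical names. Each block is a placeholder residual update standing in for attention and MLP, and the loop order (whole stack repeated) is one possible schedule, not necessarily the one used in #167.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Block:
    """Hypothetical stand-in for a transformer block."""
    weight: float
    n_params: int  # parameters this block would contribute

    def __call__(self, x: List[float]) -> List[float]:
        # Placeholder residual update: x + weight * x
        return [v + self.weight * v for v in x]

def recurrent_forward(blocks: List[Block], x: List[float], n_loops: int) -> List[float]:
    # Depth recurrence: run the same stack n_loops times, so the effective
    # depth is len(blocks) * n_loops while the weights are shared
    for _ in range(n_loops):
        for block in blocks:
            x = block(x)
    return x

def unique_params(blocks: List[Block]) -> int:
    # A shared block is counted once, however often it is applied
    return sum(b.n_params for b in blocks)

per_block = 1_000_000  # hypothetical per-block parameter count
shared = [Block(weight=0.1, n_params=per_block) for _ in range(3)]
unshared = [Block(weight=0.1, n_params=per_block) for _ in range(9)]
print(unique_params(shared) / unique_params(unshared))
```

Both configurations apply 9 blocks in sequence, but the shared one stores only 3 sets of weights. Whether quality (here, bits per byte) matches is an empirical question, which the PR title reports as matched.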
