cattermelon1234/relaxed-recursive-transformer

A PyTorch implementation of a relaxed recursive transformer architecture with layer-wise LoRA. The model reduces GPT-2's parameter count by 40% (1.5 billion to 84 million) and is uptrained on 20 billion tokens of OpenWebText2, achieving perplexity on par with GPT-2 and a distilled GPT-2 on the WikiText-103 dataset.
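The description packs in the two ideas that make the architecture work: a recursive transformer reuses one shared block across depth (the main source of the parameter reduction), and the "relaxed" part adds small layer-wise LoRA deltas so each pass through the shared block is not forced to be identical. A minimal PyTorch sketch of that mechanism follows; the class names (`LoRA`, `SharedBlock`, `RelaxedRecursiveLM`) and hyperparameters (`n_loops`, `rank`) are illustrative assumptions, not this repository's actual API, and causal masking plus attention-side LoRA are omitted for brevity.

```python
# A minimal sketch of the relaxed recursive idea, NOT this repo's actual API.
# One transformer block is shared (tied) across every depth; per-depth LoRA
# adapters "relax" the tying so each pass can deviate from the shared weights.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LoRA(nn.Module):
    """Low-rank delta (B @ A) applied on top of a shared linear layer."""
    def __init__(self, linear: nn.Linear, rank: int = 8):
        super().__init__()
        self.A = nn.Parameter(torch.randn(rank, linear.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(linear.out_features, rank))  # zero init: starts as exact tying

    def forward(self, x):
        return x @ self.A.T @ self.B.T  # returns only the low-rank delta

class SharedBlock(nn.Module):
    """A single pre-norm transformer block reused at every depth."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.fc_in = nn.Linear(d_model, 4 * d_model)
        self.fc_out = nn.Linear(4 * d_model, d_model)

    def forward(self, x, lora_in: LoRA, lora_out: LoRA):
        h = self.ln1(x)
        a, _ = self.attn(h, h, h, need_weights=False)
        x = x + a
        h = self.ln2(x)
        inner = F.gelu(self.fc_in(h) + lora_in(h))      # shared weight + depth-specific delta
        return x + self.fc_out(inner) + lora_out(inner)

class RelaxedRecursiveLM(nn.Module):
    """Loops n_loops times through ONE block, swapping in per-depth LoRA."""
    def __init__(self, vocab_size, d_model=256, n_heads=4, n_loops=6, rank=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.block = SharedBlock(d_model, n_heads)  # parameters shared across depth
        self.loras = nn.ModuleList([
            nn.ModuleDict({"fc_in": LoRA(self.block.fc_in, rank),
                           "fc_out": LoRA(self.block.fc_out, rank)})
            for _ in range(n_loops)
        ])
        self.ln_f = nn.LayerNorm(d_model)
        self.head = nn.Linear(d_model, vocab_size, bias=False)

    def forward(self, idx):
        x = self.embed(idx)
        for lora in self.loras:  # same block each pass, different LoRA
            x = self.block(x, lora["fc_in"], lora["fc_out"])
        return self.head(self.ln_f(x))

# Quick shape check with toy sizes:
model = RelaxedRecursiveLM(vocab_size=50257)
logits = model(torch.randint(0, 50257, (1, 16)))  # -> (1, 16, 50257)
```

Because the LoRA `B` matrices start at zero, the relaxed model initially behaves exactly like a fully weight-tied recursive model, and each additional depth costs only `rank * (in_features + out_features)` parameters per adapted linear layer on top of the one shared block.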
