PyTorch implementation of a relaxed recursive transformer architecture. Reduces GPT-2's parameter count by 40% (from 1.5 billion to 84 million), uptrained on 20 billion tokens of OpenWebText2. Achieves perplexity on par with GPT-2 and a distilled GPT-2 on the WikiText-103 dataset.
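The model itself lives in this repository, but as a rough illustration of the core idea, here is a minimal sketch of a relaxed recursive block in PyTorch: one transformer block whose weights are shared across several loop iterations, with small per-iteration LoRA adapters "relaxing" the strict weight tying. All names, hyperparameters (`d_model`, `n_loops`, `rank`), and the choice to place the LoRA adapters on the MLP projections are illustrative assumptions, not this repo's actual API.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LoRALinear(nn.Module):
    """A linear layer with shared (tied) weights plus a small per-copy
    low-rank update -- the 'relaxation' in a relaxed recursive model."""
    def __init__(self, shared: nn.Linear, rank: int = 8):
        super().__init__()
        self.shared = shared  # weights tied across all recursion steps
        self.A = nn.Parameter(torch.empty(shared.in_features, rank))
        self.B = nn.Parameter(torch.zeros(rank, shared.out_features))
        # B starts at zero, so each relaxed copy initially matches the
        # fully tied block exactly.
        nn.init.normal_(self.A, std=0.02)

    def forward(self, x):
        return self.shared(x) + (x @ self.A) @ self.B


class RelaxedRecursiveBlock(nn.Module):
    """One shared transformer block applied `n_loops` times, with a
    distinct LoRA adapter pair per loop. Hyperparameters are placeholders."""
    def __init__(self, d_model=768, n_heads=12, n_loops=3, rank=8):
        super().__init__()
        self.n_loops = n_loops
        self.ln1 = nn.LayerNorm(d_model)
        self.ln2 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Shared MLP weights, reused on every recursion step.
        self.fc_in = nn.Linear(d_model, 4 * d_model)
        self.fc_out = nn.Linear(4 * d_model, d_model)
        # One LoRA-relaxed wrapper per recursion step.
        self.mlp_in = nn.ModuleList(LoRALinear(self.fc_in, rank) for _ in range(n_loops))
        self.mlp_out = nn.ModuleList(LoRALinear(self.fc_out, rank) for _ in range(n_loops))

    def forward(self, x):
        seq_len = x.size(1)
        causal = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device),
            diagonal=1,
        )
        for i in range(self.n_loops):
            h = self.ln1(x)
            attn_out, _ = self.attn(h, h, h, attn_mask=causal)
            x = x + attn_out
            h = self.ln2(x)
            x = x + self.mlp_out[i](F.gelu(self.mlp_in[i](h)))
        return x
```

Because the shared weights dominate the parameter count and the rank-`r` adapters add little, looping a tied block in place of distinct layers is what shrinks the model; uptraining on the 20B-token corpus then recovers quality, with the zero-initialized `B` matrices ensuring the relaxed model starts from the tied one's behavior.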