Wikitext-MoE-40M

This model is a compact Transformer decoder that implements a sparse Mixture of Experts (MoE) architecture. It is trained on a 30% subset of the WikiText-103 dataset, a text corpus of Wikipedia articles, and achieves relatively good results for its size while keeping the parameter count low.

Evaluation was performed on the WikiText-2 dataset (2 million tokens), yielding a perplexity of 38 and a cross-entropy loss of 3.6.
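The two reported figures are related: perplexity is the exponential of the mean per-token cross-entropy, assuming the loss is measured in nats (natural logarithm). The sketch below only illustrates that relationship using the numbers quoted above. The function names are illustrative and are not taken from the repository.

```python
import math


def perplexity_from_cross_entropy(cross_entropy_nats: float) -> float:
    """Perplexity is exp(mean per-token cross-entropy), with the loss in nats."""
    return math.exp(cross_entropy_nats)


def cross_entropy_from_perplexity(perplexity: float) -> float:
    """Inverse mapping: mean cross-entropy in nats for a given perplexity."""
    return math.log(perplexity)


if __name__ == "__main__":
    # Both reported figures are rounded, so they only agree approximately.
    print(round(cross_entropy_from_perplexity(38.0), 2))  # 3.64
    print(round(perplexity_from_cross_entropy(3.6), 1))   # 36.6
```

A perplexity of 38 corresponds to roughly 3.64 nats, which is consistent with the reported cross-entropy of 3.6 after rounding.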

Key Features

Sparse Mixture of Experts (MoE): Implements a routing mechanism with 6 experts and top-k (k=2) selection, so each token is processed by only 2 of the 6 experts, which reduces compute per token.
Rotary Positional Embeddings (RoPE): Uses relative position encoding for improved handling of long-range dependencies.
Rotary Causal Attention: Integrates RoPE directly into the Multi-Head Attention mechanism for enhanced positional awareness.
RMSNorm & Stability: Employs Root Mean Square Layer Normalization with an epsilon of $10^{-6}$ for stable training dynamics.
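The features listed above can be sketched without any framework, for a single token vector. This is a minimal illustration under stated assumptions, not the repository's implementation: the function names are hypothetical, the gate weights are assumed to be a softmax over only the selected experts, and the RoPE variant assumes interleaved pairs with a base of 10000. Only the expert count, k, and epsilon come from the feature list.

```python
import math

NUM_EXPERTS = 6   # from the feature list
TOP_K = 2         # from the feature list
RMS_EPS = 1e-6    # from the feature list


def rms_norm(x, weight=None, eps=RMS_EPS):
    """RMSNorm: divide the vector by its root mean square, then apply a gain."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    if weight is None:
        weight = [1.0] * len(x)  # assumption: gain initialised to ones
    return [v * inv_rms * w for v, w in zip(x, weight)]


def top_k_route(router_logits, k=TOP_K):
    """Select the k highest-scoring experts and softmax over just those logits.

    Assumption: gate weights are renormalised over the selected experts.
    Returns a list of (expert_index, gate_weight) pairs.
    """
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    peak = max(router_logits[i] for i in top)  # subtract max for stability
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def rope_rotate(x, position, base=10000.0):
    """Rotate consecutive pairs of x by position-dependent angles (RoPE).

    Assumptions: even dimension, interleaved pairing (x[0], x[1]),
    (x[2], x[3]), ..., and a frequency base of 10000.
    """
    dim = len(x)
    out = []
    for j in range(0, dim, 2):
        theta = position * base ** (-j / dim)
        c, s = math.cos(theta), math.sin(theta)
        out.extend([x[j] * c - x[j + 1] * s, x[j] * s + x[j + 1] * c])
    return out


if __name__ == "__main__":
    # Made-up router scores for one token across the six experts.
    logits = [0.1, 2.0, -1.0, 0.5, 1.0, -0.3]
    print(top_k_route(logits))      # selects experts 1 and 4
    print(rms_norm([3.0, 4.0]))
    print(rope_rotate([1.0, 0.0, 1.0, 0.0], position=3))
```

Because RoPE rotates queries and keys by angles proportional to their positions, the dot product between a rotated query and a rotated key depends only on the offset between the two positions, which is what makes the encoding relative.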

🛠️ Installation & Setup

To replicate the results of the Wikitext-MoE-40M model, follow these steps to clone the repository and set up the environment.

1. Clone the Repository

Open your terminal and run the following commands to download the code and enter the project directory:

git clone https://github.com/RkCode2025/Wikitext-MoE-40M.git
cd Wikitext-MoE-40M

