Implementation of the DeepSeek V3 architecture from scratch, covering modern transformer components: Multi-Head Latent Attention (MLA) with decoupled rotary positional embeddings, Mixture of Experts (MoE), and Multi-Token Prediction (MTP). It also explains the earlier attention-architecture innovations that preceded MLA: MHA, MQA, and GQA.
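
As a rough illustration of one of these components, below is a minimal sketch of top-k Mixture-of-Experts routing in PyTorch. The names (`MoELayer`, `n_experts`, `top_k`) are illustrative assumptions rather than this repo's actual API, and the gating shown is generic softmax top-k routing, not DeepSeek V3's exact gating scheme (which adds refinements such as shared experts and auxiliary-loss-free load balancing).

```python
# Minimal top-k MoE routing sketch (illustrative; not this repo's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Router scores each token against every expert.
        self.router = nn.Linear(d_model, n_experts, bias=False)
        # Each expert is an independent feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model) -> flatten tokens for routing.
        tokens = x.reshape(-1, x.size(-1))
        scores = self.router(tokens)                       # (n_tokens, n_experts)
        weights, indices = scores.topk(self.top_k, dim=-1) # keep only the top-k experts per token
        weights = F.softmax(weights, dim=-1)               # normalize over the selected experts
        out = torch.zeros_like(tokens)
        # Dispatch each token only to its selected experts, weighted by the gate.
        for e, expert in enumerate(self.experts):
            token_ids, slot = (indices == e).nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue
            out[token_ids] += weights[token_ids, slot].unsqueeze(-1) * expert(tokens[token_ids])
        return out.reshape_as(x)

# Usage: y = MoELayer(d_model=512, d_hidden=2048)(torch.randn(2, 16, 512))
```

The key design point this sketch captures is sparsity: every token activates only `top_k` of the `n_experts` feed-forward networks, so parameter count grows with the number of experts while per-token compute stays roughly constant.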