Makemore MLP Model

This repository contains an implementation of a simple Multi-Layer Perceptron (MLP) language model inspired by Andrej Karpathy's "makemore" series. The model generates text one token at a time. At each step it predicts the next character or word from the tokens that precede it.
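A next-token model like this one usually generates text with an autoregressive sampling loop. The loop computes logits for the last position, samples a token from the softmax distribution, appends it to the sequence, and repeats. The following is a minimal PyTorch sketch, not this repository's code. It assumes a hypothetical `logits_fn` that maps a (B, T) tensor of token ids to (B, T, vocab_size) logits. For a model that returns `(logits, loss)`, that could be `lambda x: model(x)[0]`.

```python
import torch


@torch.no_grad()
def generate(logits_fn, idx, max_new_tokens):
    # idx: (B, T) tensor of starting token ids; returns (B, T + max_new_tokens).
    # logits_fn is a hypothetical callable: (B, T) ids -> (B, T, vocab_size) logits.
    for _ in range(max_new_tokens):
        logits = logits_fn(idx)[:, -1, :]                   # scores for the token after the last position
        probs = torch.softmax(logits, dim=-1)               # convert scores to a probability distribution
        next_tok = torch.multinomial(probs, num_samples=1)  # sample one token id per sequence, shape (B, 1)
        idx = torch.cat([idx, next_tok], dim=1)             # append the sample and feed it back in
    return idx


# Stand-in "model" that returns uniform logits over a 5-token vocabulary.
def uniform_logits(x, vocab_size=5):
    return torch.zeros(x.size(0), x.size(1), vocab_size)


out = generate(uniform_logits, torch.zeros(1, 1, dtype=torch.long), max_new_tokens=8)
print(out.shape)  # torch.Size([1, 9])
```

This sketch does not show how sampled ids are mapped back to characters or words, because that depends on the tokenizer.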

Model Overview

The MLP class is built using PyTorch and consists of:

Token Embedding Layer (nn.Embedding): Converts token indices into dense vector representations.
Multi-Layer Perceptron (nn.Sequential):
- Fully connected layers (nn.Linear)
- Tanh activation function
- Final output layer predicting the next token
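The components above can be assembled roughly as follows. This is a minimal sketch, not necessarily the class in this repository. The constructor arguments `vocab_size`, `block_size` (number of context positions), `n_embd`, and `n_embd2` are hypothetical names. The sketch also assumes the special <BLANK> token is given id `vocab_size`, so the embedding table has one extra row for it.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MLP(nn.Module):
    # Sketch only: argument names are assumptions, not this repository's actual API.
    def __init__(self, vocab_size: int, block_size: int, n_embd: int, n_embd2: int):
        super().__init__()
        self.vocab_size = vocab_size
        self.block_size = block_size
        # Extra embedding row for the <BLANK> token, assumed here to have id vocab_size.
        self.wte = nn.Embedding(vocab_size + 1, n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(block_size * n_embd, n_embd2),  # hidden layer over the concatenated context
            nn.Tanh(),
            nn.Linear(n_embd2, vocab_size),           # output layer: one logit per candidate next token
        )

    def forward(self, idx, targets=None):
        # idx: (B, T) token ids
        embs = []
        for _ in range(self.block_size):
            embs.append(self.wte(idx))   # (B, T, n_embd)
            idx = torch.roll(idx, 1, 1)  # new tensor shifted right by one along T
            idx[:, 0] = self.vocab_size  # vacated first position becomes <BLANK>
        x = torch.cat(embs, dim=-1)      # (B, T, block_size * n_embd)
        logits = self.mlp(x)             # (B, T, vocab_size)

        loss = None
        if targets is not None:
            # Flatten batch and time so cross_entropy sees (N, C) logits and (N,) targets.
            loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
        return logits, loss


torch.manual_seed(0)
model = MLP(vocab_size=27, block_size=3, n_embd=16, n_embd2=64)
idx = torch.randint(0, 27, (2, 5))      # batch of 2 random sequences of length 5
targets = torch.randint(0, 27, (2, 5))  # stand-in targets, only to exercise the loss path
logits, loss = model(idx, targets)
print(logits.shape)  # torch.Size([2, 5, 27])
```

`torch.roll` returns a new tensor, so writing <BLANK> into position 0 does not modify the caller's input.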

How It Works

The input sequence of tokens is embedded into dense vectors.
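To give each position access to earlier tokens, the model repeatedly shifts the input tokens one position to the right and fills the vacated first position with a special <BLANK> token. As a result, positions near the start of the sequence see <BLANK> in place of the earlier tokens they lack.
For each position, the embeddings of the current token and the preceding tokens are concatenated and passed through the MLP.
The model outputs logits (unnormalized scores over the vocabulary) for the next token.
If target labels are provided, as during training, the model also computes the cross-entropy loss between these logits and the targets.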
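The shift-and-<BLANK> step is easiest to see on a tiny example. The sketch below uses an illustrative setup that is not taken from this repository: a 3-token vocabulary (ids 0 to 2), <BLANK> given id 3 (one past the last real token), and a context of 3 positions. For each position, it prints the token ids whose embeddings would be concatenated.

```python
import torch

# Illustrative values only: 3 real tokens (ids 0..2), <BLANK> = vocab_size = 3.
vocab_size = 3
BLANK = vocab_size
block_size = 3

idx = torch.tensor([[0, 1, 2, 1]])  # one sequence of 4 token ids, shape (B=1, T=4)

contexts = []
cur = idx
for _ in range(block_size):
    contexts.append(cur)
    cur = torch.roll(cur, 1, 1)  # shift right by one; the last token wraps around to position 0
    cur[:, 0] = BLANK            # overwrite the wrapped-around token with <BLANK>

# ctx[b, t] holds the tokens at positions t, t-1, t-2 (BLANK before the sequence starts)
ctx = torch.stack(contexts, dim=-1)  # shape (1, 4, block_size)
for t in range(idx.size(1)):
    print(t, ctx[0, t].tolist())
# 0 [0, 3, 3]
# 1 [1, 0, 3]
# 2 [2, 1, 0]
# 3 [1, 2, 1]
```

Each row starts with the current token, followed by its predecessors, and <BLANK> (id 3) fills the slots that fall before the start of the sequence. Assuming the usual setup where the target at position t is the token at position t + 1, each row is the context from which that next token is predicted.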

References

Andrej Karpathy's makemore series

License

This project is open-source and available under the MIT License.
