This repository contains an implementation of a simple Multi-Layer Perceptron (MLP) model inspired by Andrej Karpathy's "makemore" series. The model generates text by predicting the next character or word in a sequence from the preceding tokens.
The MLP class is built using PyTorch and consists of:
- Token Embedding Layer (`nn.Embedding`): converts token indices into dense vector representations.
- Multi-Layer Perceptron (`nn.Sequential`):
  - Fully connected layers (`nn.Linear`)
  - `Tanh` activation function
  - Final output layer predicting the next token
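The architecture above can be sketched as a small PyTorch module. This is a minimal illustration, not the repository's exact code; the sizes (`vocab_size`, `block_size`, `n_embd`, `n_hidden`) are assumed values for demonstration.

```python
import torch
import torch.nn as nn

# Hypothetical sizes for illustration; the repo's actual values may differ.
vocab_size, block_size, n_embd, n_hidden = 27, 3, 10, 64

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        # token embedding table; one extra row reserved for a special token
        self.wte = nn.Embedding(vocab_size + 1, n_embd)
        # MLP over the concatenation of block_size context embeddings
        self.mlp = nn.Sequential(
            nn.Linear(block_size * n_embd, n_hidden),
            nn.Tanh(),
            nn.Linear(n_hidden, vocab_size),  # logits for the next token
        )

    def forward(self, idx):
        # idx: (B, block_size) integer token indices
        x = self.wte(idx).view(idx.size(0), -1)  # concatenate embeddings
        return self.mlp(x)                       # (B, vocab_size)

model = MLP()
logits = model(torch.randint(0, vocab_size, (4, block_size)))
print(logits.shape)  # torch.Size([4, 27])
```

Concatenating the context embeddings before the first linear layer is what lets a plain feed-forward network condition on multiple previous tokens at once.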
- The input sequence of tokens is embedded into dense vectors.
- The model shifts the input tokens and replaces the first token with a special `<BLANK>` token.
- The embeddings of the previous tokens are concatenated and passed through the MLP.
- The model outputs logits (predictions) for the next token.
- During training, the model computes cross-entropy loss if target labels are provided.
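The shift-and-concatenate forward pass and the training loss can be sketched end to end. This is a hedged reconstruction of the mechanism described above, following the style of Karpathy's makemore MLP; the token id chosen for `<BLANK>` and all sizes are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
# Hypothetical sizes for illustration; the repo's values may differ.
vocab_size, block_size, n_embd, n_hidden = 27, 3, 10, 64
BLANK = vocab_size  # assumed id of the special <BLANK> token

emb = nn.Embedding(vocab_size + 1, n_embd)
mlp = nn.Sequential(
    nn.Linear(block_size * n_embd, n_hidden),
    nn.Tanh(),
    nn.Linear(n_hidden, vocab_size),
)

idx = torch.randint(0, vocab_size, (2, 5))      # (B, T) input tokens
targets = torch.randint(0, vocab_size, (2, 5))  # (B, T) next-token labels

# Collect embeddings of the block_size most recent tokens at each position
# by repeatedly shifting the sequence right and padding slot 0 with <BLANK>.
embs = []
for _ in range(block_size):
    embs.append(emb(idx))             # (B, T, n_embd)
    idx = torch.roll(idx, 1, dims=1)  # shift tokens one step right
    idx[:, 0] = BLANK                 # first position has no predecessor
x = torch.cat(embs, dim=-1)           # (B, T, block_size * n_embd)

logits = mlp(x)                       # (B, T, vocab_size)
# Cross-entropy over all positions; computed only when targets are given.
loss = F.cross_entropy(logits.view(-1, vocab_size), targets.view(-1))
print(logits.shape)
```

Flattening `(B, T, vocab_size)` logits to `(B*T, vocab_size)` lets a single `cross_entropy` call score every position in the batch at once.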
This project is open-source and available under the MIT License.