This repository contains a from-scratch implementation of a Qwen-like transformer model in PyTorch. The model incorporates several modern architectural features, including Grouped Query Attention, RMSNorm, and SwiGLU activation functions.
The project includes scripts for training the model on the Tiny Shakespeare dataset and for running inference to generate text.
Note that we do not follow the hyperparameters of the original Qwen architecture, though.
Image credit: the architecture diagram was borrowed from the internet.
The trained model file is available on Hugging Face: https://huggingface.co/heissanjay/qwen-tiny-shakespeare-experiment
- Qwen-like Transformer Architecture: A custom implementation of a decoder-only transformer model.
- Grouped Query Attention (GQA): Efficient attention mechanism for faster inference and reduced memory usage.
- RMSNorm: Root-mean-square layer normalization, a simpler and more stable alternative to LayerNorm.
- SwiGLU Activation: A Gated Linear Unit (GLU) variant with a SiLU (Swish) gate, used in the feed-forward layers.
- Rotary Positional Embeddings (RoPE): Applied to queries and keys to encode token positions (a minimal sketch of these components appears after this list).
- Training and Inference Scripts: Clear and easy-to-use scripts for both training and text generation.
- Hugging Face Integration: Uses the `transformers` library for tokenization and `datasets` for data loading.
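For readers who want to see what these pieces look like, here is a minimal PyTorch sketch of RMSNorm, SwiGLU, RoPE, and GQA. It is illustrative only: the class names, the interleaved-pair RoPE layout, and the use of `F.scaled_dot_product_attention` are assumptions, and the actual code in model.py may differ.

```python
# Illustrative sketch only -- not the exact implementation in model.py.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Normalize by the root mean square of the features (no mean subtraction)."""
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * x * rms

class SwiGLU(nn.Module):
    """Feed-forward block with a SiLU-gated linear unit."""
    def __init__(self, dim, hidden_dim):
        super().__init__()
        self.gate = nn.Linear(dim, hidden_dim, bias=False)
        self.up = nn.Linear(dim, hidden_dim, bias=False)
        self.down = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x):
        return self.down(F.silu(self.gate(x)) * self.up(x))

def apply_rope(x, base=10000.0):
    """Rotate (even, odd) feature pairs of queries/keys by position-dependent angles.
    x: (batch, heads, seq_len, head_dim), head_dim must be even."""
    b, h, t, d = x.shape
    freqs = 1.0 / (base ** (torch.arange(0, d, 2, device=x.device).float() / d))
    angles = torch.arange(t, device=x.device).float()[:, None] * freqs[None, :]
    cos, sin = angles.cos(), angles.sin()          # each (t, d/2)
    x1, x2 = x[..., 0::2], x[..., 1::2]            # even / odd feature pairs
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

class GroupedQueryAttention(nn.Module):
    """n_heads query heads share n_groups key/value heads."""
    def __init__(self, dim, n_heads, n_groups):
        super().__init__()
        assert n_heads % n_groups == 0
        self.n_heads, self.n_groups = n_heads, n_groups
        self.head_dim = dim // n_heads
        self.q_proj = nn.Linear(dim, n_heads * self.head_dim, bias=False)
        self.k_proj = nn.Linear(dim, n_groups * self.head_dim, bias=False)
        self.v_proj = nn.Linear(dim, n_groups * self.head_dim, bias=False)
        self.o_proj = nn.Linear(n_heads * self.head_dim, dim, bias=False)

    def forward(self, x):
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(b, t, self.n_groups, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(b, t, self.n_groups, self.head_dim).transpose(1, 2)
        q, k = apply_rope(q), apply_rope(k)
        # Expand each KV group to cover its share of query heads.
        k = k.repeat_interleave(self.n_heads // self.n_groups, dim=1)
        v = v.repeat_interleave(self.n_heads // self.n_groups, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.o_proj(out.transpose(1, 2).reshape(b, t, -1))
```

With the config.json values below (n_dim=512, n_heads=8, n_groups=4), each of the 4 key/value heads is shared by 2 query heads, which is where GQA saves memory over full multi-head attention.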
- model.py: Defines the core components of the Qwen model, including the attention mechanism, transformer blocks, and the main model class.
- train.py: The script for training the model. It loads the Tiny Shakespeare dataset, initializes the model and optimizer, runs the training loop, and saves the final model checkpoint.
- inference.py: The script for generating text from a trained model checkpoint. It includes a function for streaming the generated text to the console.
- config.json: A configuration file that stores the model hyperparameters, such as dimensions, number of heads, and vocabulary size.
- architecture.png: A diagram illustrating the model's architecture.
- note.ipynb: A Jupyter notebook used for experimentation, saving the tokenizer, and uploading the model to the Hugging Face Hub.
- saved_models/: Saved model checkpoints.
- tokenizer/: Tokenizer files, saved in the Hugging Face format.
- Clone the repository:

  ```bash
  git clone <repository-url>
  cd qwen
  ```
- Install the required dependencies. Make sure you have PyTorch installed, then install the other required packages:

  ```bash
  pip install torch transformers datasets
  ```
To train the model, run the train.py script:
```bash
python train.py
```

The script will train the model on the Tiny Shakespeare dataset and save the checkpoint to saved_models/qwen_model_checkpoint.pt. You can modify the training parameters (epochs, batch size, etc.) directly in the train.py script.
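For orientation, this is roughly the shape of a next-token training loop; it is a sketch, not the exact contents of train.py (the batching helper and the return value are assumptions).

```python
# Illustrative training-loop sketch -- not the exact contents of train.py.
import torch
import torch.nn.functional as F

def train_epoch(model, batches, optimizer, device="cuda"):
    """`batches` yields (x, y) token tensors of shape (batch, seq_len),
    where y is x shifted left by one position (next-token targets)."""
    model.train()
    total, steps = 0.0, 0
    for x, y in batches:
        x, y = x.to(device), y.to(device)
        logits = model(x)                 # (batch, seq_len, vocab_size)
        loss = F.cross_entropy(logits.view(-1, logits.size(-1)), y.view(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        total, steps = total + loss.item(), steps + 1
    return total / max(steps, 1)          # mean loss over the epoch
```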
To generate text with the trained model, run the inference.py script:
```bash
python inference.py
```

This will load the model from the checkpoint and generate text based on the prompt provided in the script. You can change the prompt and generation parameters (e.g., max_new_tokens, temperature) in inference.py.
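Streaming generation with temperature sampling typically looks like the sketch below; the function name `generate_stream` and the details (context cropping, multinomial sampling) are assumptions rather than the exact code in inference.py.

```python
# Illustrative streaming-generation sketch -- inference.py may differ.
import torch
import torch.nn.functional as F

@torch.no_grad()
def generate_stream(model, tokenizer, prompt, max_new_tokens=100,
                    temperature=1.0, seq_len=128, device="cuda"):
    model.eval()
    ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
    for _ in range(max_new_tokens):
        ctx = ids[:, -seq_len:]                    # crop to the model's context window
        logits = model(ctx)[:, -1, :]              # logits at the last position
        probs = F.softmax(logits / temperature, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)
        ids = torch.cat([ids, next_id], dim=1)
        print(tokenizer.decode(next_id[0]), end="", flush=True)  # stream to console
    print()
```

Lower temperatures sharpen the distribution toward the most likely token; higher ones make the Shakespeare pastiche more varied but less coherent.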
The model's hyperparameters are defined in config.json. These parameters are loaded during both training and inference to ensure consistency.
```json
{
  "n_dim": 512,
  "hidden_dim": 768,
  "seq_len": 128,
  "n_heads": 8,
  "depth": 12,
  "n_groups": 4,
  "vocab_size": 50257,
  "device": "cuda"
}
```

The trained model is saved as qwen_model_checkpoint.pt in the saved_models directory. This file contains the model's state dictionary, the optimizer's state, the configuration, the epoch number, and the loss.
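Since the checkpoint bundles several fields, a quick inspection script avoids guessing key names; this sketch only loads and prints what train.py stored.

```python
# Quick inspection of the saved checkpoint and shared configuration.
import json
import torch

with open("config.json") as f:
    config = json.load(f)                 # same hyperparameters used by both scripts

ckpt = torch.load("saved_models/qwen_model_checkpoint.pt", map_location="cpu")
print(sorted(ckpt.keys()))                # expect model/optimizer state, config, epoch, loss
print(config["n_dim"], config["n_heads"], config["n_groups"])
```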
