
Qwen-at-144p Model Implementation

This repository contains a from-scratch implementation of a Qwen-like transformer model in PyTorch. The model incorporates several modern architectural features, including Grouped Query Attention, RMSNorm, and SwiGLU activation functions.

The project includes scripts for training the model on the Tiny Shakespeare dataset and for running inference to generate text.

Model Architecture

Note that this implementation does not follow the hyperparameters described in the original architecture.

Image credit: the architecture diagram was taken from the internet.

Model File

The trained model file is available on Hugging Face: https://huggingface.co/heissanjay/qwen-tiny-shakespeare-experiment
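If you prefer to fetch these files instead of training locally, the Hub repository can be downloaded with huggingface_hub (this assumes the checkpoint and tokenizer were uploaded there as-is):

from huggingface_hub import snapshot_download

# Download every file in the Hub repo to the local cache and return its path.
local_dir = snapshot_download("heissanjay/qwen-tiny-shakespeare-experiment")
print(local_dir)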

Features

  • Qwen-like Transformer Architecture: A custom implementation of a decoder-only transformer model.
  • Grouped Query Attention (GQA): Efficient attention mechanism in which several query heads share each key/value head, for faster inference and reduced memory usage (minimal sketches of this and the other components follow this list).
  • RMSNorm: Used for layer normalization; it rescales by the root mean square of the activations (no mean subtraction), which is simpler and cheaper than LayerNorm while remaining stable in training.
  • SwiGLU Activation: A variant of the Gated Linear Unit (GLU) for improved performance.
  • Rotary Positional Embeddings (RoPE): Applied to queries and keys for better positional encoding.
  • Training and Inference Scripts: Clear and easy-to-use scripts for both training and text generation.
  • Hugging Face Integration: Uses the transformers library for tokenization and datasets for data loading.
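The snippets below are minimal sketches of three of these components, using the dimensions from config.json. Class and parameter names are illustrative assumptions; the actual definitions live in model.py and may differ.

import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    # Root-mean-square normalization: rescale by the RMS of the features,
    # with a learned gain and no mean subtraction or bias.
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * x * rms

class SwiGLU(nn.Module):
    # Gated feed-forward block: SiLU(x W_gate) * (x W_up), projected back down.
    def __init__(self, n_dim=512, hidden_dim=768):
        super().__init__()
        self.gate = nn.Linear(n_dim, hidden_dim, bias=False)
        self.up = nn.Linear(n_dim, hidden_dim, bias=False)
        self.down = nn.Linear(hidden_dim, n_dim, bias=False)

    def forward(self, x):
        return self.down(F.silu(self.gate(x)) * self.up(x))

class GroupedQueryAttention(nn.Module):
    # n_heads query heads share n_groups key/value heads.
    def __init__(self, n_dim=512, n_heads=8, n_groups=4):
        super().__init__()
        self.n_heads, self.n_groups = n_heads, n_groups
        self.head_dim = n_dim // n_heads
        self.q_proj = nn.Linear(n_dim, n_heads * self.head_dim, bias=False)
        self.k_proj = nn.Linear(n_dim, n_groups * self.head_dim, bias=False)
        self.v_proj = nn.Linear(n_dim, n_groups * self.head_dim, bias=False)
        self.o_proj = nn.Linear(n_heads * self.head_dim, n_dim, bias=False)

    def forward(self, x):
        B, T, _ = x.shape
        q = self.q_proj(x).view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(B, T, self.n_groups, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(B, T, self.n_groups, self.head_dim).transpose(1, 2)
        # Repeat each KV head so every group serves n_heads // n_groups query heads.
        # (RoPE would be applied to q and k here, before attention.)
        k = k.repeat_interleave(self.n_heads // self.n_groups, dim=1)
        v = v.repeat_interleave(self.n_heads // self.n_groups, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.o_proj(out.transpose(1, 2).reshape(B, T, -1))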

File Descriptions

  • model.py: Defines the core components of the Qwen model, including the attention mechanism, transformer blocks, and the main model class.
  • train.py: The script for training the model. It loads the Tiny Shakespeare dataset, initializes the model and optimizer, and runs the training loop, saving the final model checkpoint.
  • inference.py: The script for generating text using a trained model checkpoint. It includes a function for streaming the generated text to the console.
  • config.json: A configuration file that stores the hyperparameters for the model, such as dimensions, number of heads, and vocabulary size.
  • architecture.png: A diagram illustrating the model's architecture.
  • note.ipynb: A Jupyter notebook used for experimentation, saving the tokenizer, and uploading the model to the Hugging Face Hub.
  • saved_models/: This directory contains the saved model checkpoints.
  • tokenizer/: This directory contains the tokenizer files, saved in the Hugging Face format.

Setup & Installation

  1. Clone the repository:

    git clone <repository-url>
    cd qwen
  2. Install the required dependencies: PyTorch, Transformers, and Datasets can all be installed with pip.

    pip install torch transformers datasets

Usage

Training

To train the model, run the train.py script:

python train.py

The script will train the model on the Tiny Shakespeare dataset and save the checkpoint to saved_models/qwen_model_checkpoint.pt. You can modify the training parameters (epochs, batch size, etc.) directly in the train.py script.
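The actual loop lives in train.py; the sketch below only shows where those parameters sit and how a checkpoint of the kind described under "Model Checkpoint" might be written. QwenModel, get_batch, and the checkpoint key names are assumptions, not the repository's real identifiers.

import json
import torch
import torch.nn.functional as F

config = json.load(open("config.json"))
model = QwenModel(config).to(config["device"])      # QwenModel: assumed class name from model.py
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

epochs, batch_size = 10, 32                         # edit these directly, as with train.py
for epoch in range(epochs):
    for xb, yb in get_batch(batch_size):            # get_batch: assumed data-loading helper
        logits = model(xb)                          # (batch, seq_len, vocab_size)
        loss = F.cross_entropy(logits.view(-1, config["vocab_size"]), yb.view(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

torch.save({"model_state_dict": model.state_dict(), # key names are illustrative; see train.py
            "optimizer_state_dict": optimizer.state_dict(),
            "config": config,
            "epoch": epoch,
            "loss": loss.item()},
           "saved_models/qwen_model_checkpoint.pt")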

Inference

To generate text with the trained model, run the inference.py script:

python inference.py

This will load the model from the checkpoint and generate text based on the prompt provided in the script. You can change the prompt and generation parameters (e.g., max_new_tokens, temperature) in inference.py.
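The streaming generator itself is defined in inference.py; the snippet below is only an illustration of how temperature sampling and console streaming typically look with the saved tokenizer. Function and variable names are assumptions.

import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("tokenizer/")         # the repo's saved tokenizer directory

@torch.no_grad()
def generate_stream(model, prompt, max_new_tokens=100, temperature=0.8):
    ids = tokenizer(prompt, return_tensors="pt")["input_ids"]
    for _ in range(max_new_tokens):
        logits = model(ids)[:, -1, :] / temperature             # scale the last-step logits
        probs = torch.softmax(logits, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)        # sample rather than take argmax
        ids = torch.cat([ids, next_id], dim=1)
        print(tokenizer.decode(next_id[0]), end="", flush=True)  # stream each token to the console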

Configuration

The model's hyperparameters are defined in config.json. These parameters are loaded during both training and inference to ensure consistency.

{
    "n_dim": 512,
    "hidden_dim": 768,
    "seq_len": 128,
    "n_heads": 8,
    "depth": 12,
    "n_groups": 4,
    "vocab_size": 50257,
    "device": "cuda"
}
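A minimal sketch of how these values might be read and passed to the model; the constructor signature is an assumption, so check model.py and train.py for the real usage:

import json
import torch

with open("config.json") as f:
    config = json.load(f)

# Fall back to CPU if the requested "cuda" device is unavailable.
if config["device"] == "cuda" and not torch.cuda.is_available():
    config["device"] = "cpu"

model = QwenModel(config).to(config["device"])   # QwenModel: assumed to take the config dict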

Model Checkpoint

The trained model is saved as qwen_model_checkpoint.pt in the saved_models directory. This file contains the model's state dictionary, the optimizer's state, the configuration, the epoch number, and the loss.
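A hedged example of restoring the model from that file; the key names mirror the description above but are assumptions, so check train.py for the exact fields it saves:

import torch

checkpoint = torch.load("saved_models/qwen_model_checkpoint.pt", map_location="cpu")
model = QwenModel(checkpoint["config"])               # assumed constructor and key names
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()
print("epoch:", checkpoint["epoch"], "loss:", checkpoint["loss"])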
