
Qwen-at-144p Model Implementation

This repository contains a from-scratch implementation of a Qwen-like transformer model in PyTorch. The model incorporates several modern architectural features, including Grouped Query Attention, RMSNorm, and SwiGLU activation functions.

The project includes scripts for training the model on the Tiny Shakespeare dataset and for running inference to generate text.

Model Architecture

Note that this implementation does not follow the hyperparameters described in the original architecture.

Image credit: the architecture diagram was taken from the internet.

Model File

The trained model file is available on Hugging Face: https://huggingface.co/heissanjay/qwen-tiny-shakespeare-experiment
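If you prefer to fetch these files instead of training locally, the Hub repository can be downloaded with huggingface_hub (this assumes the checkpoint and tokenizer were uploaded there as-is):

from huggingface_hub import snapshot_download

# Download every file in the Hub repo to the local cache and return its path.
local_dir = snapshot_download("heissanjay/qwen-tiny-shakespeare-experiment")
print(local_dir)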

Features

  • Qwen-like Transformer Architecture: A custom implementation of a decoder-only transformer model.
  • Grouped Query Attention (GQA): Efficient attention mechanism in which several query heads share each key/value head, for faster inference and reduced memory usage (minimal sketches of this and the other components follow this list).
  • RMSNorm: Used for layer normalization; it rescales by the root mean square of the activations (no mean subtraction), which is simpler and cheaper than LayerNorm while remaining stable in training.
  • SwiGLU Activation: A variant of the Gated Linear Unit (GLU) for improved performance.
  • Rotary Positional Embeddings (RoPE): Applied to queries and keys for better positional encoding.
  • Training and Inference Scripts: Clear and easy-to-use scripts for both training and text generation.
  • Hugging Face Integration: Uses the transformers library for tokenization and datasets for data loading.
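The snippets below are minimal sketches of three of these components, using the dimensions from config.json. Class and parameter names are illustrative assumptions; the actual definitions live in model.py and may differ.

import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    # Root-mean-square normalization: rescale by the RMS of the features,
    # with a learned gain and no mean subtraction or bias.
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * x * rms

class SwiGLU(nn.Module):
    # Gated feed-forward block: SiLU(x W_gate) * (x W_up), projected back down.
    def __init__(self, n_dim=512, hidden_dim=768):
        super().__init__()
        self.gate = nn.Linear(n_dim, hidden_dim, bias=False)
        self.up = nn.Linear(n_dim, hidden_dim, bias=False)
        self.down = nn.Linear(hidden_dim, n_dim, bias=False)

    def forward(self, x):
        return self.down(F.silu(self.gate(x)) * self.up(x))

class GroupedQueryAttention(nn.Module):
    # n_heads query heads share n_groups key/value heads.
    def __init__(self, n_dim=512, n_heads=8, n_groups=4):
        super().__init__()
        self.n_heads, self.n_groups = n_heads, n_groups
        self.head_dim = n_dim // n_heads
        self.q_proj = nn.Linear(n_dim, n_heads * self.head_dim, bias=False)
        self.k_proj = nn.Linear(n_dim, n_groups * self.head_dim, bias=False)
        self.v_proj = nn.Linear(n_dim, n_groups * self.head_dim, bias=False)
        self.o_proj = nn.Linear(n_heads * self.head_dim, n_dim, bias=False)

    def forward(self, x):
        B, T, _ = x.shape
        q = self.q_proj(x).view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(B, T, self.n_groups, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(B, T, self.n_groups, self.head_dim).transpose(1, 2)
        # Repeat each KV head so every group serves n_heads // n_groups query heads.
        # (RoPE would be applied to q and k here, before attention.)
        k = k.repeat_interleave(self.n_heads // self.n_groups, dim=1)
        v = v.repeat_interleave(self.n_heads // self.n_groups, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.o_proj(out.transpose(1, 2).reshape(B, T, -1))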

File Descriptions

  • model.py: Defines the core components of the Qwen model, including the attention mechanism, transformer blocks, and the main model class.
  • train.py: The script for training the model. It loads the Tiny Shakespeare dataset, initializes the model and optimizer, and runs the training loop, saving the final model checkpoint.
  • inference.py: The script for generating text using a trained model checkpoint. It includes a function for streaming the generated text to the console.
  • config.json: A configuration file that stores the hyperparameters for the model, such as dimensions, number of heads, and vocabulary size.
  • architecture.png: A diagram illustrating the model's architecture.
  • note.ipynb: A Jupyter notebook used for experimentation, saving the tokenizer, and uploading the model to the Hugging Face Hub.
  • saved_models/: This directory contains the saved model checkpoints.
  • tokenizer/: This directory contains the tokenizer files, saved in the Hugging Face format.

Setup & Installation

  1. Clone the repository:

    git clone <repository-url>
    cd qwen
  2. Install the required dependencies: PyTorch, Transformers, and Datasets can all be installed with pip.

    pip install torch transformers datasets

Usage

Training

To train the model, run the train.py script:

python train.py

The script will train the model on the Tiny Shakespeare dataset and save the checkpoint to saved_models/qwen_model_checkpoint.pt. You can modify the training parameters (epochs, batch size, etc.) directly in the train.py script.
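The actual loop lives in train.py; the sketch below only shows where those parameters sit and how a checkpoint of the kind described under "Model Checkpoint" might be written. QwenModel, get_batch, and the checkpoint key names are assumptions, not the repository's real identifiers.

import json
import torch
import torch.nn.functional as F

config = json.load(open("config.json"))
model = QwenModel(config).to(config["device"])      # QwenModel: assumed class name from model.py
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

epochs, batch_size = 10, 32                         # edit these directly, as with train.py
for epoch in range(epochs):
    for xb, yb in get_batch(batch_size):            # get_batch: assumed data-loading helper
        logits = model(xb)                          # (batch, seq_len, vocab_size)
        loss = F.cross_entropy(logits.view(-1, config["vocab_size"]), yb.view(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

torch.save({"model_state_dict": model.state_dict(), # key names are illustrative; see train.py
            "optimizer_state_dict": optimizer.state_dict(),
            "config": config,
            "epoch": epoch,
            "loss": loss.item()},
           "saved_models/qwen_model_checkpoint.pt")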

Inference

To generate text with the trained model, run the inference.py script:

python inference.py

This will load the model from the checkpoint and generate text based on the prompt provided in the script. You can change the prompt and generation parameters (e.g., max_new_tokens, temperature) in inference.py.
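The streaming generator itself is defined in inference.py; the snippet below is only an illustration of how temperature sampling and console streaming typically look with the saved tokenizer. Function and variable names are assumptions.

import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("tokenizer/")         # the repo's saved tokenizer directory

@torch.no_grad()
def generate_stream(model, prompt, max_new_tokens=100, temperature=0.8):
    ids = tokenizer(prompt, return_tensors="pt")["input_ids"]
    for _ in range(max_new_tokens):
        logits = model(ids)[:, -1, :] / temperature             # scale the last-step logits
        probs = torch.softmax(logits, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)        # sample rather than take argmax
        ids = torch.cat([ids, next_id], dim=1)
        print(tokenizer.decode(next_id[0]), end="", flush=True)  # stream each token to the console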

Configuration

The model's hyperparameters are defined in config.json. These parameters are loaded during both training and inference to ensure consistency.

{
    "n_dim": 512,
    "hidden_dim": 768,
    "seq_len": 128,
    "n_heads": 8,
    "depth": 12,
    "n_groups": 4,
    "vocab_size": 50257,
    "device": "cuda"
}
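A minimal sketch of how these values might be read and passed to the model; the constructor signature is an assumption, so check model.py and train.py for the real usage:

import json
import torch

with open("config.json") as f:
    config = json.load(f)

# Fall back to CPU if the requested "cuda" device is unavailable.
if config["device"] == "cuda" and not torch.cuda.is_available():
    config["device"] = "cpu"

model = QwenModel(config).to(config["device"])   # QwenModel: assumed to take the config dict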

Model Checkpoint

The trained model is saved as qwen_model_checkpoint.pt in the saved_models directory. This file contains the model's state dictionary, the optimizer's state, the configuration, the epoch number, and the loss.
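A hedged example of restoring the model from that file; the key names mirror the description above but are assumptions, so check train.py for the exact fields it saves:

import torch

checkpoint = torch.load("saved_models/qwen_model_checkpoint.pt", map_location="cpu")
model = QwenModel(checkpoint["config"])               # assumed constructor and key names
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()
print("epoch:", checkpoint["epoch"], "loss:", checkpoint["loss"])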
