Skip to content

harinath23/build-text-llm-model

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 

Repository files navigation

build-text-llm-model

Story llm model

Table of Contents

  1. Introduction
  2. Features
  3. Installation
  4. Usage
  5. Dataset Preparation
  6. Training the Model
  7. Generating Stories
  8. Contributing
  9. License

Introduction

Provide a brief overview of your project:

  • What does it do?
  • What problem does it solve?
  • Who is it for?

Example:

This project is a story generation model using GPT-2. It generates coherent and engaging stories based on given prompts. The model is fine-tuned on a custom dataset of story texts, allowing it to produce creative and contextually relevant narratives.

Features

List the main features of your project:

  • Text generation using GPT-2
  • Custom dataset preparation
  • Easy-to-use training and generation scripts

Example:

- Generates stories based on user prompts.
- Trained on a custom dataset for better relevance.
- Easy-to-use interface for training and generating text.

Installation

Provide step-by-step instructions on how to install and set up the project.

Example:

1. Clone the repository:
    ```bash
    git clone https://github.com/harinath23/build-text-llm-model
    cd build-text-llm-model
    ```

2. Create and activate a virtual environment:
    ```bash
    python -m venv env
    source env/bin/activate  # On Windows use `env\Scripts\activate`
    ```

3. Install the required dependencies:
    ```bash
    pip install -r requirements.txt
    ```

4. Set up the project (if any additional setup is needed, e.g., downloading pre-trained models):
    ```bash
    # Example command to download pre-trained models
    python main.py
    ```

Usage

Provide instructions on how to use the project. Include examples of commands and expected outputs.

Example:

1. **Tokenizing the Data:**
    ```python
    from yourmodule import tokenizer_func
    
    book_directory = "path/to/your/cleaned_books"
    tokenizer_func(book_directory)
    ```

2. **Training the Model:**
    ```python
    from transformers import Trainer, TrainingArguments
    
    # Define training arguments
    training_args = TrainingArguments(
        output_dir='./results',
        num_train_epochs=3,
        per_device_train_batch_size=2,
        save_steps=10_000,
        save_total_limit=2,
    )

    # Initialize Trainer and train
    trainer = Trainer(
        model=model,
        args=training_args,
        data_collator=data_collator,
        train_dataset=dataset,
    )

    trainer.train()
    ```

3. **Generating Stories:**
    ```python
    prompt = "Once upon a time, in a faraway land,"
    input_ids = tokenizer.encode(prompt, return_tensors='pt')
    output = model.generate(input_ids, max_length=200, num_return_sequences=1)
    generated_story = tokenizer.decode(output[0], skip_special_tokens=True)
    print(generated_story)
    ```

Dataset Preparation

Detail how to prepare the dataset for training.

Example:

1. Collect text data from sources such as Project Gutenberg, online stories, etc.
2. Clean and preprocess the data to remove unwanted text (e.g., headers, footers).
3. Tokenize the text using the provided `read_and_tokenize_book` function.
4. Split the tokenized text into chunks using the `chunk_tokens` function.

Training the Model

Provide detailed instructions on how to train the model.

Example:

1. Ensure your data is tokenized and chunked properly.
2. Use the `TextDataset` class to create a dataset.
3. Define training arguments using `TrainingArguments`.
4. Initialize the `Trainer` and start training using the `trainer.train()` method.

Generating Stories

Explain how to use the trained model to generate stories.

Example:

1. Provide a prompt to the model.
2. Encode the prompt using the tokenizer.
3. Use the `model.generate` method to generate the story.
4. Decode the generated tokens back to text using the tokenizer.

Contributing

Guidelines for contributing to the project.

Example:

1. Fork the repository.
2. Create a new branch (e.g., `feature/your-feature`).
3. Commit your changes.
4. Push to the branch.
5. Open a pull request.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages