build-text-llm-model

Story llm model

Introduction

Provide a brief overview of your project:

What does it do?
What problem does it solve?
Who is it for?

Example:

This project is a story generation model using GPT-2. It generates coherent and engaging stories based on given prompts. The model is fine-tuned on a custom dataset of story texts, allowing it to produce creative and contextually relevant narratives.

Features

List the main features of your project:

Text generation using GPT-2
Custom dataset preparation
Easy-to-use training and generation scripts

Example:

- Generates stories based on user prompts.
- Trained on a custom dataset for better relevance.
- Easy-to-use interface for training and generating text.

Installation

Provide step-by-step instructions on how to install and set up the project.

Example:

1. Clone the repository:
    ```bash
    git clone https://github.com/harinath23/build-text-llm-model
    cd build-text-llm-model
    ```

2. Create and activate a virtual environment:
    ```bash
    python -m venv env
    source env/bin/activate  # On Windows use `env\Scripts\activate`
    ```

3. Install the required dependencies:
    ```bash
    pip install -r requirements.txt
    ```

4. Set up the project (if any additional setup is needed, e.g., downloading pre-trained models):
    ```bash
    # Example command to download pre-trained models
    python main.py
    ```

Usage

Provide instructions on how to use the project. Include examples of commands and expected outputs.

Example:

1. **Tokenizing the Data:**
    ```python
    from yourmodule import tokenizer_func
    
    book_directory = "path/to/your/cleaned_books"
    tokenizer_func(book_directory)
    ```

2. **Training the Model:**
    ```python
    from transformers import Trainer, TrainingArguments
    
    # Define training arguments
    training_args = TrainingArguments(
        output_dir='./results',
        num_train_epochs=3,
        per_device_train_batch_size=2,
        save_steps=10_000,
        save_total_limit=2,
    )

    # Initialize Trainer and train
    trainer = Trainer(
        model=model,
        args=training_args,
        data_collator=data_collator,
        train_dataset=dataset,
    )

    trainer.train()
    ```

3. **Generating Stories:**
    ```python
    prompt = "Once upon a time, in a faraway land,"
    input_ids = tokenizer.encode(prompt, return_tensors='pt')
    output = model.generate(input_ids, max_length=200, num_return_sequences=1)
    generated_story = tokenizer.decode(output[0], skip_special_tokens=True)
    print(generated_story)
    ```

Dataset Preparation

Detail how to prepare the dataset for training.

Example:

1. Collect text data from sources such as Project Gutenberg, online stories, etc.
2. Clean and preprocess the data to remove unwanted text (e.g., headers, footers).
3. Tokenize the text using the provided `read_and_tokenize_book` function.
4. Split the tokenized text into chunks using the `chunk_tokens` function.

Training the Model

Provide detailed instructions on how to train the model.

Example:

1. Ensure your data is tokenized and chunked properly.
2. Use the `TextDataset` class to create a dataset.
3. Define training arguments using `TrainingArguments`.
4. Initialize the `Trainer` and start training using the `trainer.train()` method.

Generating Stories

Explain how to use the trained model to generate stories.

Example:

1. Provide a prompt to the model.
2. Encode the prompt using the tokenizer.
3. Use the `model.generate` method to generate the story.
4. Decode the generated tokens back to text using the tokenizer.

Contributing

Guidelines for contributing to the project.

Example:

1. Fork the repository.
2. Create a new branch (e.g., `feature/your-feature`).
3. Commit your changes.
4. Push to the branch.
5. Open a pull request.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
Docs		Docs
src		src
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

build-text-llm-model

Story llm model

Table of Contents

Introduction

Features

Installation

Usage

Dataset Preparation

Training the Model

Generating Stories

Contributing

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

build-text-llm-model

Story llm model

Table of Contents

Introduction

Features

Installation

Usage

Dataset Preparation

Training the Model

Generating Stories

Contributing

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages