Story-Character-Extractor

This project is a text-processing pipeline that extracts structured character information from stories. It combines embeddings, a vector database, and a large language model (LLM) to return each character's summary, role, and relationships.

Features

Document Processing: Loads and preprocesses .txt files from a specified directory.
Embedding Computation: Generates vector embeddings for text chunks using MistralAI.
Character Information Extraction: Retrieves structured details about characters using vector similarity search and LLM prompts.
Command-Line Interface (CLI): Provides an easy-to-use interface for embedding computation and character queries.
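
The retrieval step behind these features can be illustrated with a minimal, dependency-free sketch. It is not the project's code: the real pipeline uses MistralAI embeddings and Chroma, whereas this toy version swaps in word counts as the "embedding" and plain cosine similarity as the search. The function names (chunk_text, embed, cosine, top_k) and the chunk sizes are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, chunk_size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping windows of words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]
```

In the actual pipeline, the chunks returned by a search like this would be passed to the LLM prompt as context for the character query.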

Technologies Used

Python
LangChain Framework
MistralAI
Chroma Vector Database
Typer (CLI Framework)
Pydantic (Data Validation)

Installation

Clone the Repository

git clone https://github.com/Aawegg/Story-Character-Extractor.git
cd Story-Character-Extractor

Set Up a Virtual Environment

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install Dependencies

pip install -r requirements.txt

Set Up Environment Variables

Create a .env file containing the following variable, or set it directly in your environment:

MISTRAL_API_KEY=your-mistral-api-key
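
As a rough sketch of how the key might be resolved at startup, the snippet below checks the process environment first and falls back to a .env file. How the project actually loads the key is not shown in this README, so the helper names (parse_env_file, get_mistral_api_key) and the lookup order are assumptions; only the MISTRAL_API_KEY variable name comes from the text above.

```python
import os
from pathlib import Path


def parse_env_file(path: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and '#' comments."""
    values = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_mistral_api_key(env_file: str = ".env") -> str:
    """Prefer the process environment, then fall back to the .env file."""
    key = os.environ.get("MISTRAL_API_KEY")
    if not key and Path(env_file).is_file():
        key = parse_env_file(env_file).get("MISTRAL_API_KEY")
    if not key:
        raise RuntimeError("MISTRAL_API_KEY is not set")
    return key
```

Failing early with a clear error avoids a less obvious authentication failure later, when the first embedding request is sent.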

Usage

Compute Embeddings: Generate embeddings for all .txt files in a directory and store them in a vector database.

python main.py compute-embeddings-cli <dataset_path>

Example:

python main.py compute-embeddings-cli ./stories

Retrieve Character Information: Query the system for details about a specific character.

python main.py get-character-info-cli <character_name>

Example:

python main.py get-character-info-cli Alice
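
The shape of the two commands can be mirrored with a small parser. The project itself uses Typer; this sketch deliberately uses the standard library's argparse instead so that it runs without extra dependencies, and it only parses arguments rather than doing any work. The build_parser name and the help strings are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the two subcommands described in Usage."""
    parser = argparse.ArgumentParser(prog="main.py")
    sub = parser.add_subparsers(dest="command", required=True)

    # Subcommand 1: takes the directory that holds the .txt files.
    compute = sub.add_parser(
        "compute-embeddings-cli",
        help="Embed every .txt file in a directory.",
    )
    compute.add_argument("dataset_path")

    # Subcommand 2: takes the name of the character to look up.
    info = sub.add_parser(
        "get-character-info-cli",
        help="Look up structured details for one character.",
    )
    info.add_argument("character_name")
    return parser
```

For instance, build_parser().parse_args(["get-character-info-cli", "Alice"]) yields a namespace whose character_name attribute is "Alice".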

Project Structure

Story-Character-Extractor/
├── document_processing.py   # Handles loading and preprocessing text files.
├── embeddings.py            # Computes and stores embeddings in a vector database.
├── extraction.py            # Extracts structured character information using LLMs.
├── main.py                  # CLI for embedding computation and character queries.
├── requirements.txt         # List of dependencies.
└── README.md                # Project documentation.

Example Workflow

Prepare a Dataset: Place .txt files in a directory, e.g., ./stories.
Compute Embeddings: Run the compute-embeddings-cli command to generate embeddings.
Query Character Information: Use the get-character-info-cli command to retrieve details about characters.
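
The first step of this workflow, reading the dataset, might look roughly like the sketch below. The contents of document_processing.py are not shown in this README, so the load_stories function, the use of the file stem as the story key, and the whitespace cleanup are all assumptions.

```python
from pathlib import Path


def load_stories(directory: str) -> dict[str, str]:
    """Return {file stem: cleaned text} for each .txt file in a directory."""
    stories = {}
    for path in sorted(Path(directory).glob("*.txt")):
        text = path.read_text(encoding="utf-8")
        # Collapse runs of whitespace so later chunking sees clean text.
        stories[path.stem] = " ".join(text.split())
    return stories
```

Files with other extensions in the same directory are ignored, which matches the .txt-only behavior described above.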

Sample Output

Input Command:

python main.py get-character-info-cli "Alice"

Output (JSON):

{
    "name": "Alice",
    "storyTitle": "Adventures in Wonderland",
    "summary": "A curious and adventurous girl who explores a magical world.",
    "relations": {
        "White Rabbit": {
            "relationType": "Friend",
            "summary": "A guide and companion during her journey."
        },
        "Queen of Hearts": {
            "relationType": "Antagonist",
            "summary": "The ruler of Wonderland who opposes Alice."
        }
    },
    "characterType": "Protagonist"
}
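
The structure of this output can be captured in a small typed schema. The project lists Pydantic for data validation, but its actual models are not shown here; the sketch below uses standard-library dataclasses as a dependency-free stand-in, and the class and function names (Relation, CharacterInfo, parse_character_info) are hypothetical. The field names are taken from the sample JSON above.

```python
import json
from dataclasses import dataclass


@dataclass
class Relation:
    relationType: str
    summary: str


@dataclass
class CharacterInfo:
    name: str
    storyTitle: str
    summary: str
    relations: dict[str, Relation]
    characterType: str


def parse_character_info(raw: str) -> CharacterInfo:
    """Parse a JSON string into CharacterInfo; raises on missing or extra keys."""
    data = json.loads(raw)
    # Each relation is keyed by the other character's name.
    relations = {
        other: Relation(**details)
        for other, details in data["relations"].items()
    }
    return CharacterInfo(
        name=data["name"],
        storyTitle=data["storyTitle"],
        summary=data["summary"],
        relations=relations,
        characterType=data["characterType"],
    )
```

Unlike Pydantic, dataclasses do not check field types at runtime, so this stand-in only catches missing or unexpected keys.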

Contributing

Contributions are welcome! If you have suggestions for improvements or new features, feel free to open an issue or submit a pull request.

License

This project is licensed under the MIT License.

Contact

For any inquiries or support, contact Aaweg Bhaladhare at aaweg.22110711@viit.ac.in.
