This project aims to create an easy-to-use script for building a RAG-based personalized chatbot using social interaction data (currently limited to WhatsApp chats).
- Features
- How It Works
- Setup and Usage
- API Usage
- Config
- TODO & Contributions
- Tested on
- Use of AI
- Privacy
- Contact
## Features

- Multi-Service Support: Works with both cloud-based (Gemini) and local (Ollama) language models.
- Automatic Data Processing: Extracts chat logs directly from WhatsApp `.zip` exports.
- Intelligent Conversation Context Analysis: The bot creates diverse training datasets with conversation initiations, contextual responses, direct Q&As, and topic transitions.
- RAG-Powered Pipeline: Implements Retrieval-Augmented Generation (RAG) using FAISS vector search to find relevant conversation examples and enhance AI responses with authentic communication patterns.
- Context-Aware Response Generation: Combines vector similarity search and actual conversation history to generate replies that reflect both the style and context of your chosen persona.
- Configurable LLM Parameters: Easily customize LLM temperature, top-p, context size, and search count via environment variables.
- One-Click Setup & Auto-Configuration: Automatically processes WhatsApp chat exports, creates vector databases, validates sender names, and sets up the entire RAG pipeline with minimal intervention from you!
## How It Works

Initial Setup & Validation:
- Creates `temp/` and `extracted_chats/` directories, each with `Personal/` and `Group/` subfolders, then waits for you to upload chat files.
- Automatically extracts `.zip` archives and consolidates `.txt` files, simplifying persona management.
- Scans for WhatsApp chat `.txt` files and displays persona statistics.
- Validates that the provided sender name exists in the chat files before proceeding.
- Saves the sender name to `temp/sender_name.txt` for future use.
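The directory layout created in this step can be sketched as follows (a temporary directory stands in for the real repo root; the layout itself comes from the step above):

```python
import os
import tempfile

root = tempfile.mkdtemp()  # stand-in for the repo root in this sketch

# temp/ and extracted_chats/, each with Personal/ and Group/ subfolders,
# created idempotently (exist_ok avoids errors on repeat runs of setup).
for base in ("temp", "extracted_chats"):
    for sub in ("Personal", "Group"):
        os.makedirs(os.path.join(root, base, sub), exist_ok=True)

print(sorted(os.listdir(root)))  # → ['extracted_chats', 'temp']
```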
Chat Processing:
- Parses WhatsApp export format using regex (timestamp, sender, message).
- Consolidates consecutive messages from the same sender and detects conversation breaks (gaps of several hours).
- Creates 4 types of training examples: conversation starters, contextual responses, direct Q&A pairs, and topic transitions.
- Exports processed data to `temp/persona_style_v2.csv` with relevant metadata.
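The parsing step above can be sketched roughly like this. The exact pattern in `setup.py` may differ; this regex covers one common "DD/MM/YY, HH:MM - Sender: Message" export format:

```python
import re

# Hypothetical pattern for one common WhatsApp export line format;
# the real script's regex may handle more timestamp variants.
LINE_RE = re.compile(r"^(\d{1,2}/\d{1,2}/\d{2,4}), (\d{1,2}:\d{2}) - ([^:]+): (.+)$")

def parse_line(line):
    """Return (timestamp, sender, message), or None for continuation/system lines."""
    m = LINE_RE.match(line)
    if m is None:
        return None  # part of a multi-line message, or a system notice
    date, time, sender, message = m.groups()
    return (f"{date} {time}", sender, message)

print(parse_line("12/03/24, 18:45 - Alice: see you tomorrow!"))
# → ('12/03/24 18:45', 'Alice', 'see you tomorrow!')
```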
Vector Database Creation:
- Encodes all conversation prompts using the `all-MiniLM-L6-v2` SentenceTransformer model.
- Builds a FAISS vector index for semantic similarity search.
- Saves the index to `temp/style_v2.index` for fast retrieval.
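The retrieval this index enables is plain nearest-neighbour search over embedding vectors. A dependency-light illustration of the idea, where NumPy stands in for FAISS and random unit vectors stand in for the MiniLM embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)
corpus = rng.normal(size=(100, 384)).astype("float32")   # stand-in embeddings
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)  # unit-normalize

# A query slightly perturbed from entry 42 should retrieve entry 42 first.
query = corpus[42] + 0.01 * rng.normal(size=384)
query /= np.linalg.norm(query)

scores = corpus @ query          # cosine similarity on unit vectors
top5 = np.argsort(-scores)[:5]   # indices of the 5 most similar prompts
print(int(top5[0]))  # → 42
```

FAISS performs the same search, but over a persisted index that scales to large corpora.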
Service Selection:
- Prompts you to choose between the `gemini` (cloud) and `ollama` (local) services.
RAG-Powered Response Generation:
- Loads the FAISS index, conversation dataset, and the selected AI model.
- Performs a semantic search to find similar conversation examples for your query.
- Combines retrieved examples with conversation history to create context-rich prompts for the LLM.
- Uses the chosen LLM (Gemini or Ollama) to generate responses that match the communication style for the given persona.
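The prompt assembly described in the steps above can be sketched like this; the function and field layout are illustrative only, not the actual implementation:

```python
def build_prompt(persona, examples, history, message):
    """Combine retrieved style examples and recent history into one LLM prompt.

    `examples` is a list of (prompt, response) pairs from the vector search;
    `history` is a list of recent "Speaker: text" strings.
    """
    example_block = "\n".join(f"Them: {p}\nYou: {r}" for p, r in examples)
    history_block = "\n".join(history)
    return (
        f"You are replying in the style of {persona}.\n\n"
        f"Past exchanges in this style:\n{example_block}\n\n"
        f"Recent conversation:\n{history_block}\n"
        f"User: {message}\nReply:"
    )

prompt = build_prompt(
    "persona1",
    [("how was school", "boring as always lol")],
    ["User: hey!", "Bot: heyy"],
    "what are you up to?",
)
print(prompt.splitlines()[0])  # → You are replying in the style of persona1.
```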
## Setup and Usage

Clone and enter the repo:
```
git clone https://github.com/Animesh-Varma/Mythryl.git
cd Mythryl
```

Use Python 3.9 – 3.11. (Optional but recommended) Set up a virtual environment:

```
python -m venv .venv
```

Activate your virtual environment:

```
source .venv/bin/activate  # on Windows: .venv\Scripts\activate
```

Then install the requirements:
Note: You can customize the installation based on your needs. By default, it installs dependencies for both online and offline inference.

- For online inference only: remove `ollama` from `requirements.txt`.
- For offline inference only: remove `google-generativeai` from `requirements.txt`.

```
pip install -r requirements.txt
```
To get the chatbot ready, run the setup script and follow the on-screen instructions:

```
python setup.py
```

Important: If this is not your first run, place each exported WhatsApp `.txt`/`.zip` file in the auto-created `extracted_chats` folder before running the setup script.
Create a `.env` file in the root directory and add the following variables.

- For Gemini (cloud-based):

  ```
  API_KEY=YOUR_GEMINI_API_KEY_HERE
  ```

- For Ollama (local):

  ```
  OLLAMA_MODEL=NAME_OF_THE_LOCAL_MODEL_TO_USE
  ```

- Optional LLM parameters:

  ```
  LLM_TEMPERATURE=0.7
  LLM_TOP_P=0.9
  LLM_CONTEXT_SIZE=6
  VECTOR_DB_SEARCH_COUNT=5
  ```

- Optional script parameters:

  ```
  DEBUG=False
  ```
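For reference, such variables are typically read with fallback defaults along these lines. The key names come from the list above; the exact defaults and parsing in the scripts are assumptions:

```python
import os

# Key names match the .env entries above; defaults mirror the example values.
temperature = float(os.getenv("LLM_TEMPERATURE", "0.7"))
top_p = float(os.getenv("LLM_TOP_P", "0.9"))
context_size = int(os.getenv("LLM_CONTEXT_SIZE", "6"))
search_count = int(os.getenv("VECTOR_DB_SEARCH_COUNT", "5"))
debug = os.getenv("DEBUG", "False").strip().lower() == "true"

print(temperature, top_p, context_size, search_count, debug)
```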
Once your data files are ready, you can chat with your personalized AI by running:

```
python chat.py
```

Inside the chat session, type `switch` to change persona or `quit` to exit.
## API Usage

Mythryl includes a local API server, allowing you to integrate your personalized chatbot with other applications.
To use the API, first run `setup.py` and follow its instructions, then start the server:

```
python api.py
```

The API will then be available at `http://127.0.0.1:50507`.
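Once the server is running, the endpoints documented below can be called from any HTTP client. For example, a standard-library sketch for the `/chat` endpoint; the field names are taken from the examples in this section, but the helper functions themselves are illustrative:

```python
import json
from urllib import request

API_URL = "http://127.0.0.1:50507"

def build_chat_payload(persona, message, service="gemini", history=None):
    """JSON body matching the documented /chat parameters."""
    return {
        "persona": persona,
        "message": message,
        "service": service,
        "conversation_history": history or [],
    }

def chat(persona, message, **kwargs):
    """POST to the local Mythryl API server (api.py must already be running)."""
    body = json.dumps(build_chat_payload(persona, message, **kwargs)).encode()
    req = request.Request(f"{API_URL}/chat", data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.load(resp)["response"]

print(build_chat_payload("persona1", "hey!"))
```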
### `GET /personas`

Returns a list of all available personas.
Parameters: None
Example Request:
```
curl -X GET "http://127.0.0.1:50507/personas"
```

Example Response:

```json
{
  "personas": [
    "persona1",
    "persona2",
    "persona3"
  ]
}
```

### `POST /verify_persona`

Verifies whether the persona is available. If an exact match is not found, it suggests the closest match.
Parameters:
- `persona` (string, required): The name of the persona to verify.
Example Request:
```
curl -X POST "http://127.0.0.1:50507/verify_persona" -H "Content-Type: application/json" -d '{
  "persona": "persna1"
}'
```

Example Response (Closest Match):

```json
{
  "status": "closest_match",
  "persona": "persona1",
  "confidence": 86
}
```

### `POST /chat`

Handles a chat request with a specific persona.
Parameters:
- `persona` (string, required): The name of the persona to chat with.
- `message` (string, required): The user's current message.
- `service` (string, optional, default: `gemini`): The model service to use (`gemini` or `ollama`).
- `conversation_history` (list[string], optional): A list of strings representing the recent conversation for context.
Example Request:
```
curl -X POST "http://127.0.0.1:50507/chat" -H "Content-Type: application/json" -d '{
  "persona": "persona1",
  "message": "What was the last thing we talked about?",
  "service": "gemini",
  "conversation_history": [
    "User: What are your hobbies?",
    "Bot: I enjoy processing data and learning new things."
  ]
}'
```

Example Response:

```json
{
  "response": "We talked about my hobbies."
}
```

### `POST /add_message`

Adds a new message to the vector database for the specified persona.
Parameters:
- `persona` (string, required): The persona to associate the new example with.
- `prompt` (string, required): The new prompt or context.
- `response` (string, required): The new response.
Example Request:
```
curl -X POST "http://127.0.0.1:50507/add_message" -H "Content-Type: application/json" -d '{
  "persona": "persona1",
  "prompt": "This is a new prompt.",
  "response": "This is a new response."
}'
```

Example Response:

```json
{
  "message": "Message added successfully."
}
```

## Config

Configuration is managed through the `.env` file. You can set your API key, choose your Ollama model, and adjust LLM and script parameters.
If you want to change specific paths, Gemini models, or system prompts, you can do so directly in the scripts.
[I'd suggest using Gemini for generation, as it worked a lot better in my testing; the default Gemini model is 2.5 Flash.]
## TODO & Contributions

Contribute if you can; issues and feature requests are always appreciated! Planned features, tasks, and pending fixes are listed in TODO.md (some items are a bit vague, so feel free to email me if you need clarification). Feel free to open a pull request or issue for new features, bug fixes, or suggestions. Everyone is more than welcome!
## Tested on

- Windows Home Version 24H2
- Arch Linux
- Debian 6.1.135-1 (aarch64, running on a terminal emulator on my Pixel 9)
## Use of AI

The use of AI in development is now inevitable; trying to avoid it is simply impractical. In my humble view, the best approach is to use AI for bulk raw generation and then fine-tune the results manually. That's exactly the philosophy behind this project! The concept and core implementation are entirely my own, with invaluable assistance from AI systems, especially for rapid prototyping and improving code readability. Rest assured, all code was thoroughly tested and carefully reviewed by me before release.
## Privacy

This project uses a hybrid approach to data privacy, combining local processing with cloud-based AI services. Here's how your data is handled at each stage:
- All your chat data stays on your local machine during setup. The `setup.py` script reads your chat export files from the `extracted_chats` directory and processes them locally.
- Generated files (`persona_style_v2.csv`, `style_v2.index`, and `sender_name.txt`) are stored in the `temp` directory on your computer.
- No chat data is sent to any external server or cloud service during this phase. The sentence transformer model for vector embeddings is downloaded and runs entirely on your machine.
Note: The following applies only when using Gemini as the provider; choosing Ollama keeps all processing local.
- When you chat with the bot, certain pieces of information are sent to the Google Gemini API to generate responses. This is the only time your data leaves your local device.
- The data sent to the Gemini API includes:
- The message you type (your query)
- The last 6 messages of the ongoing conversation for context
- A few relevant examples from your own chat history (retrieved from your local `persona_style_v2.csv`) to help the AI match your style
- Your sender name and chosen persona context
- Your API key, stored in the `.env` file, is used for secure authentication with the Gemini API.
## Contact

Feel free to reach out if you have questions, suggestions, or want to collaborate!
Email: animesh_varma@protonmail.com
NOTE: I'm a high school student building this project in any spare time I can find, so contributors and general advice are always more than welcome! Also, this is my first serious project, so please excuse any small mistakes :-)