Discord RAG Bot

A Discord bot that uses Retrieval-Augmented Generation (RAG) to answer questions from your own document collection. The bot processes PDF and text files, builds a searchable vector database, and generates responses with Groq's LLaMA 3 model.

Features

  • Document Processing: Automatically processes PDF and text files from your data directory
  • Vector Search: Uses FAISS for efficient similarity search across document chunks
  • AI-Powered Responses: Leverages Groq's LLaMA 3 model for intelligent question answering
  • Discord Integration: Responds to mentions with formatted embed messages
  • Source Attribution: Shows which documents were used to generate each answer

Prerequisites

  • Python 3.8 or higher
  • Discord Bot Token
  • Groq API Key

Installation

  1. Clone the repository

    git clone <your-repo-url>
    cd discord-rag-bot
  2. Create a virtual environment

    python -m venv .venv
    .venv\Scripts\activate  # On Windows
    # source .venv/bin/activate  # On macOS/Linux
  3. Install dependencies

    pip install -r requirements.txt
  4. Set up environment variables

    Create a .env file in the root directory:

    DISCORD_BOT_TOKEN=your_discord_bot_token_here
    GROQ_API_KEY=your_groq_api_key_here
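At startup, the bot reads these variables from the environment. A minimal loading sketch using python-dotenv (the exact variable handling in bot.py may differ; this falls back to already-exported variables if the package is missing):

```python
import os

try:
    # python-dotenv copies key=value pairs from .env into the process environment
    from dotenv import load_dotenv
    load_dotenv()
except ImportError:
    # If python-dotenv is not installed, rely on variables exported in the shell
    pass

DISCORD_BOT_TOKEN = os.getenv("DISCORD_BOT_TOKEN")
GROQ_API_KEY = os.getenv("GROQ_API_KEY")

# Warn early about anything missing rather than failing deep inside the bot
missing = [name for name in ("DISCORD_BOT_TOKEN", "GROQ_API_KEY") if not os.getenv(name)]
if missing:
    print(f"Warning: missing environment variables: {', '.join(missing)}")
```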

Setup Guide

1. Discord Bot Setup

  1. Go to the Discord Developer Portal
  2. Create a new application and bot
  3. Copy the bot token to your .env file
  4. Invite the bot to your server with the following permissions:
    • Read Messages
    • Send Messages
    • Use Slash Commands
    • Embed Links

2. Groq API Setup

  1. Sign up at Groq Console
  2. Generate an API key
  3. Add it to your .env file

3. Document Preparation

  1. Place your PDF or text files in the data/ directory
  2. The bot will automatically process all .pdf and .txt files
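Document discovery amounts to collecting every `.pdf` and `.txt` file under `data/`. A small helper in the spirit of what the pipeline does (the function name is an illustration, not the actual code in `rag_pipeline.py`):

```python
from pathlib import Path
from typing import List


def find_documents(data_dir: str = "data") -> List[Path]:
    """Return all PDF and text files under data_dir, sorted for stable ordering."""
    root = Path(data_dir)
    if not root.is_dir():
        return []
    # rglob("*") walks subdirectories too; filter by extension, case-insensitively
    return sorted(p for p in root.rglob("*") if p.suffix.lower() in {".pdf", ".txt"})
```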

Usage

1. Create Vector Database

Before running the bot, create the vector database from your documents:

python rag_pipeline.py

This will:

  • Load all documents from the data/ directory
  • Split them into chunks
  • Create embeddings using HuggingFace's BGE model
  • Save the FAISS vector database
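The four steps above correspond roughly to the following sketch built from standard LangChain components. Function and path names here are assumptions based on this README, not the actual contents of `rag_pipeline.py`; imports are kept inside the function so the file can be read without the heavy dependencies installed:

```python
def build_vectorstore(data_dir="data", db_path="vectorstore/db_faiss"):
    """Load documents, chunk them, embed with BGE, and persist a FAISS index."""
    from langchain_community.document_loaders import DirectoryLoader, PyPDFLoader, TextLoader
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from langchain_community.embeddings import HuggingFaceEmbeddings
    from langchain_community.vectorstores import FAISS

    # 1. Load all PDFs and text files from the data directory
    docs = DirectoryLoader(data_dir, glob="**/*.pdf", loader_cls=PyPDFLoader).load()
    docs += DirectoryLoader(data_dir, glob="**/*.txt", loader_cls=TextLoader).load()

    # 2. Split into overlapping chunks (values from the Configuration section below)
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)
    chunks = splitter.split_documents(docs)

    # 3. Embed with the BGE model, then 4. save the FAISS index locally
    embeddings = HuggingFaceEmbeddings(model_name="BAAI/bge-small-en-v1.5")
    db = FAISS.from_documents(chunks, embeddings)
    db.save_local(db_path)
    return db
```

Call `build_vectorstore()` after installing the requirements; the saved index is what `bot.py` loads at startup.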

2. Run the Bot

python bot.py

3. Interact with the Bot

In Discord, mention the bot with your question:

@YourBot What is the refund policy?
@YourBot How do I submit an assignment?

The bot will respond with an answer and show which documents were used as sources.
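Before the question reaches the RAG chain, the handler has to strip the bot's own mention from the raw message. A plausible pure-Python helper for that step (the actual handler in `bot.py` may differ; in raw Discord message content a mention appears as `<@id>` or `<@!id>`):

```python
import re


def extract_question(content: str, bot_id: int) -> str:
    """Remove <@id> / <@!id> mentions of the bot and return the remaining question text."""
    pattern = rf"<@!?{bot_id}>"
    return re.sub(pattern, "", content).strip()
```

For example, `extract_question("<@42> What is the refund policy?", 42)` returns `"What is the refund policy?"`.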

Project Structure

discord-rag-bot/
├── bot.py                 # Main Discord bot implementation
├── rag_pipeline.py        # RAG pipeline and vector store creation
├── requirements.txt       # Python dependencies
├── .env                  # Environment variables (create this)
├── data/                 # Place your documents here
│   └── PUT THE PDFS HERE
└── vectorstore/          # Generated vector database
    └── db_faiss/

Configuration

Model Settings

The bot uses the following default models:

  • Embedding Model: BAAI/bge-small-en-v1.5 (HuggingFace)
  • LLM Model: llama3-8b-8192 (Groq)

You can modify these in rag_pipeline.py:

MODEL_NAME = "BAAI/bge-small-en-v1.5"  # Embedding model
# In create_rag_chain():
model_name="llama3-8b-8192"  # LLM model

Text Chunking

Documents are split with these parameters:

  • Chunk Size: 1000 characters
  • Chunk Overlap: 150 characters

Adjust in rag_pipeline.py:

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000, 
    chunk_overlap=150
)
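To see what these two numbers mean, here is a toy fixed-stride splitter (not the LangChain one, which also tries to break on separators like newlines): each chunk starts `chunk_size - chunk_overlap` characters after the previous one, so with the defaults 150 characters are shared between neighbouring chunks.

```python
def naive_chunks(text, chunk_size=1000, chunk_overlap=150):
    """Fixed-stride splitter illustrating chunk_size/chunk_overlap only."""
    stride = chunk_size - chunk_overlap  # 850 with the defaults above
    return [
        text[i:i + chunk_size]
        for i in range(0, max(len(text) - chunk_overlap, 1), stride)
    ]
```

With `chunk_size=4, chunk_overlap=2`, the string `"abcdefghij"` splits into `["abcd", "cdef", "efgh", "ghij"]`: every chunk repeats the last two characters of its predecessor, so no sentence fragment is lost at a boundary.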

Troubleshooting

Common Issues

  1. "Vector database not found" error

    • Run python rag_pipeline.py first to create the database
  2. "No documents found" error

    • Ensure you have PDF or text files in the data/ directory
  3. Discord bot not responding

    • Check that the bot has proper permissions in your server
    • Verify the bot token in your .env file
  4. Groq API errors

    • Verify your API key is correct
    • Check your Groq API usage limits

Performance Tips

  • For large document collections, consider increasing chunk size
  • The bot loads the RAG chain once at startup for efficiency
  • Vector database is created locally and persists between runs

Dependencies

  • langchain - LangChain framework
  • langchain-groq - Groq integration
  • langchain-community - Community integrations
  • faiss-cpu - Vector similarity search
  • sentence-transformers - Embedding models
  • python-dotenv - Environment variable management
  • discord.py - Discord API wrapper

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Test thoroughly
  5. Submit a pull request

License

This project is open source. Please check the license file for details.

Support

If you encounter issues:

  1. Check the troubleshooting section above
  2. Review the console output for error messages
  3. Ensure all dependencies are properly installed
  4. Verify your API keys and bot permissions

Note: This bot processes documents locally and uses external APIs (Groq) for language model inference. Ensure you comply with your organization's data handling policies when using sensitive documents.
