# Discord RAG Bot

A Discord bot that uses Retrieval-Augmented Generation (RAG) to answer questions based on your document collection. The bot processes PDF and text files, creates a searchable vector database, and provides intelligent responses using Groq's LLaMA model.
## Features

- Document Processing: Automatically processes PDF and text files from your data directory
- Vector Search: Uses FAISS for efficient similarity search across document chunks
- AI-Powered Responses: Leverages Groq's LLaMA 3 model for intelligent question answering
- Discord Integration: Responds to mentions with formatted embed messages
- Source Attribution: Shows which documents were used to generate each answer
## Prerequisites

- Python 3.8 or higher
- Discord Bot Token
- Groq API Key
## Installation

1. **Clone the repository**

   ```bash
   git clone <your-repo-url>
   cd discord-rag-bot
   ```

2. **Create a virtual environment**

   ```bash
   python -m venv .venv
   .venv\Scripts\activate      # On Windows
   # source .venv/bin/activate # On macOS/Linux
   ```

3. **Install dependencies**

   ```bash
   pip install -r requirements.txt
   ```

4. **Set up environment variables**

   Create a `.env` file in the root directory:

   ```
   DISCORD_BOT_TOKEN=your_discord_bot_token_here
   GROQ_API_KEY=your_groq_api_key_here
   ```
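To fail fast when a key is missing, a small startup check like the following can help. This is a sketch, not part of the bot's actual code: `require_env` is a hypothetical helper, and it assumes `python-dotenv` (listed in the dependencies) has already loaded `.env` into the process environment.

```python
import os

def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error if unset."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

# Hypothetical startup usage (after python-dotenv's load_dotenv() has run):
# token = require_env("DISCORD_BOT_TOKEN")
# groq_key = require_env("GROQ_API_KEY")
```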
## Discord Bot Setup

1. Go to the Discord Developer Portal
2. Create a new application and bot
3. Copy the bot token to your `.env` file
4. Invite the bot to your server with the following permissions:
   - Read Messages
   - Send Messages
   - Use Slash Commands
   - Embed Links

## Groq API Setup

1. Sign up at the Groq Console
2. Generate an API key
3. Add it to your `.env` file
## Adding Documents

1. Place your PDF or text files in the `data/` directory
2. The bot will automatically process all `.pdf` and `.txt` files
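The discovery step can be sketched as below. This is a simplified stand-in for what `rag_pipeline.py` does with LangChain's document loaders, shown here with only the standard library:

```python
from pathlib import Path
from typing import List

def find_documents(data_dir: str = "data") -> List[Path]:
    """Collect every .pdf and .txt file under the data directory (recursively)."""
    root = Path(data_dir)
    return sorted(p for p in root.rglob("*") if p.suffix.lower() in {".pdf", ".txt"})
```

Other file types in `data/` are simply ignored, which matches the behavior described above.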
## Creating the Vector Database

Before running the bot, create the vector database from your documents:

```bash
python rag_pipeline.py
```

This will:

- Load all documents from the `data/` directory
- Split them into chunks
- Create embeddings using HuggingFace's BGE model
- Save the FAISS vector database
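Under the hood, "similarity search" means comparing the question's embedding vector against every chunk's vector and keeping the closest matches. The toy brute-force version below illustrates the idea with hand-made 2-dimensional vectors; FAISS does the same comparison at scale, over real BGE embeddings, using optimized index structures:

```python
import math
from typing import Dict, List

def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query: List[float], chunks: Dict[str, List[float]], k: int = 2) -> List[str]:
    """Return the ids of the k chunks whose vectors are most similar to the query."""
    ranked = sorted(chunks, key=lambda cid: cosine_similarity(query, chunks[cid]), reverse=True)
    return ranked[:k]
```

The retrieved chunks are then passed to the LLM as context for answering the question.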
## Running the Bot

```bash
python bot.py
```

In Discord, mention the bot with your question:

```
@YourBot What is the refund policy?
@YourBot How do I submit an assignment?
```

The bot will respond with an answer and show which documents were used as sources.
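Discord delivers mentions in the raw message content as tokens like `<@123456789>` (or `<@!123456789>` for nickname mentions), so the question text has to be separated from the mention before it reaches the RAG chain. A hedged sketch of that step; the actual handler in `bot.py` may do this differently:

```python
import re

# Matches Discord user mentions such as <@123456789> and <@!123456789>
MENTION_PATTERN = re.compile(r"<@!?\d+>")

def extract_question(message_content: str) -> str:
    """Remove bot mentions from a Discord message, leaving just the question text."""
    return MENTION_PATTERN.sub("", message_content).strip()
```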
## Project Structure

```
discord-rag-bot/
├── bot.py              # Main Discord bot implementation
├── rag_pipeline.py     # RAG pipeline and vector store creation
├── requirements.txt    # Python dependencies
├── .env                # Environment variables (create this)
├── data/               # Place your documents here
│   └── PUT THE PDFS HERE
└── vectorstore/        # Generated vector database
    └── db_faiss/
```
## Configuration

### Models

The bot uses the following default models:

- Embedding Model: `BAAI/bge-small-en-v1.5` (HuggingFace)
- LLM Model: `llama3-8b-8192` (Groq)

You can modify these in `rag_pipeline.py`:

```python
MODEL_NAME = "BAAI/bge-small-en-v1.5"  # Embedding model

# In create_rag_chain():
model_name = "llama3-8b-8192"  # LLM model
```

### Chunking

Documents are split with these parameters:
- Chunk Size: 1000 characters
- Chunk Overlap: 150 characters
Adjust in rag_pipeline.py:
text_splitter = RecursiveCharacterTextSplitter(
chunk_size=1000,
chunk_overlap=150
)-
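To see what these two numbers mean, here is a character-level sliding-window equivalent. `RecursiveCharacterTextSplitter` is smarter (it prefers to break on paragraph and sentence boundaries), but the size/overlap arithmetic is the same; `chunk_text` is an illustrative helper, not part of the project:

```python
from typing import List

def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 150) -> List[str]:
    """Split text into windows of chunk_size chars, each sharing chunk_overlap chars with the previous one."""
    step = chunk_size - chunk_overlap  # how far each new window advances
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - chunk_overlap, 1), step)]
```

The overlap means a sentence that straddles a chunk boundary still appears intact in at least one chunk, which improves retrieval quality at the cost of some storage.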
"Vector database not found" error
- Run
python rag_pipeline.pyfirst to create the database
- Run
-
"No documents found" error
- Ensure you have PDF or text files in the
data/directory
- Ensure you have PDF or text files in the
-
Discord bot not responding
- Check that the bot has proper permissions in your server
- Verify the bot token in your
.envfile
-
Groq API errors
- Verify your API key is correct
- Check your Groq API usage limits
## Performance Notes

- For large document collections, consider increasing the chunk size
- The bot loads the RAG chain once at startup for efficiency
- The vector database is created locally and persists between runs
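"Loads the RAG chain once at startup" can be implemented with a module-level cache. A sketch of the pattern, where `build_chain()` is a hypothetical stand-in for the real constructor (which would load the FAISS index and create the Groq client):

```python
from functools import lru_cache

def build_chain():
    """Hypothetical stand-in for the expensive FAISS + Groq chain construction."""
    build_chain.calls = getattr(build_chain, "calls", 0) + 1  # count constructions
    return object()  # placeholder for the real chain object

@lru_cache(maxsize=1)
def get_chain():
    """Build the RAG chain on first call, then reuse the cached instance."""
    return build_chain()
```

Every handler then calls `get_chain()`, paying the construction cost only once per process.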
## Dependencies

- `langchain` - LangChain framework
- `langchain-groq` - Groq integration
- `langchain-community` - Community integrations
- `faiss-cpu` - Vector similarity search
- `sentence-transformers` - Embedding models
- `python-dotenv` - Environment variable management
- `discord.py` - Discord API wrapper
## Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Test thoroughly
5. Submit a pull request
## License

This project is open source. Please check the license file for details.
## Support

If you encounter issues:
- Check the troubleshooting section above
- Review the console output for error messages
- Ensure all dependencies are properly installed
- Verify your API keys and bot permissions
Note: This bot processes documents locally and uses external APIs (Groq) for language model inference. Ensure you comply with your organization's data handling policies when using sensitive documents.