A study assistant that answers questions about your lecture notes using RAG (Retrieval-Augmented Generation). You upload a PDF, it gets embedded into a vector database, and then you can ask questions about it through a chat interface.
- Searches through your documents to find relevant sections
- Uses GPT to generate answers based only on your materials
- Remembers the conversation so you can ask follow-up questions
You'll need:
- Python 3.10 or higher
- An OpenAI API key
- A Pinecone API key (free tier works fine)
Clone the repo and cd into it:
git clone https://github.com/khalilCodeX/Smart-Study-Buddy-.git
cd Smart-Study-Buddy-
Create a virtual environment:
python -m venv venv
source venv/bin/activate
On Windows, use venv\Scripts\activate instead.
Install the dependencies:
pip install langchain langchain-openai langchain-core langchain-community
pip install pinecone pypdf python-dotenv gradio
Create a .env file in the project root with your API keys:
OPEN_AI_KEY=your_openai_api_key_here
PINECONE_KEY=your_pinecone_api_key_here
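Because the variable names must match exactly, a quick check of the .env file can save a confusing error later. The following is a minimal sketch, not part of the project: `missing_env_keys` is a hypothetical helper, and it assumes a plain `NAME=value` layout with one entry per line (no `export` prefixes or quoting rules).

```python
from pathlib import Path

# Variable names taken from this README; the helper itself is hypothetical.
REQUIRED_KEYS = ("OPEN_AI_KEY", "PINECONE_KEY")


def missing_env_keys(env_text: str, required=REQUIRED_KEYS) -> list[str]:
    """Return the required names that have no non-empty value in .env-style text."""
    found = set()
    for raw in env_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not NAME=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if value.strip():
            found.add(name.strip())
    return [key for key in required if key not in found]


if __name__ == "__main__":
    env_file = Path(".env")
    text = env_file.read_text() if env_file.exists() else ""
    missing = missing_env_keys(text)
    print("Missing keys:", ", ".join(missing) if missing else "none")
```

Run it from the project root. A name such as OPENAI_API_KEY would be reported as a missing OPEN_AI_KEY, which is the most likely typo.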
Put your PDF in the project folder and update the path in datastore.py:
self.file_path = "./your-document.pdf"
First time only - you need to embed your document into Pinecone. Open chat.py and uncomment the embed_documents() line, or run this in Python:
from chat import chat
chatbot = chat()
chatbot.embed_documents()
This splits your PDF into chunks and uploads them to Pinecone. It only needs to be done once per document.
Then start the web interface:
python gradio_chat_app.py
Open http://localhost:7860 in your browser and start asking questions.
chat.py - main chat logic
datastore.py - loads and splits the PDF
vectorstore.py - handles Pinecone operations
management.py - initializes the API clients
gradio_chat_app.py - the web interface
system_prompt.md - instructions for the LLM
If you get an "index not found" error, you probably haven't run embed_documents() yet.
If your API keys aren't working, double-check that your .env file uses the names OPEN_AI_KEY and PINECONE_KEY exactly.
If you're getting empty responses, check the Pinecone dashboard to make sure your vectors were actually uploaded.
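If you would rather check the vector count from Python than from the dashboard, something like the sketch below may help. It assumes a recent Pinecone Python SDK, where `Pinecone(...).Index(...)` returns an index object and `describe_index_stats()` reports a `total_vector_count`. Both function names are hypothetical, and the index name is a placeholder because this README does not say which index the project uses.

```python
def uploaded_vector_count(index) -> int:
    """Return the total number of vectors the given index reports."""
    stats = index.describe_index_stats()
    return int(stats.total_vector_count)


def connect_and_count(api_key: str, index_name: str) -> int:
    """Connect to Pinecone and count the vectors in one index."""
    # Imported here so uploaded_vector_count can be used without the SDK installed.
    from pinecone import Pinecone

    client = Pinecone(api_key=api_key)
    return uploaded_vector_count(client.Index(index_name))
```

For example, `connect_and_count(os.environ["PINECONE_KEY"], "your-index-name")` should return a number greater than zero once embed_documents() has run. A result of 0 suggests the upload did not happen.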
You can change how the AI responds by editing system_prompt.md.
To adjust how the PDF gets split into chunks, look at the RecursiveCharacterTextSplitter settings in datastore.py. The default is 1000 characters with 200 character overlap.
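To get a feel for what a 1000 character chunk with a 200 character overlap means, here is a simplified fixed-window sketch. It is not what RecursiveCharacterTextSplitter does internally (that splitter prefers to break on separators such as paragraph and line breaks), and `split_with_overlap` is a hypothetical helper used only for illustration.

```python
def split_with_overlap(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Cut text into fixed-size windows where each window repeats the tail of the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")

    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    sample = "".join(str(i % 10) for i in range(2500))
    pieces = split_with_overlap(sample)
    print([len(piece) for piece in pieces])
```

With the defaults, each window starts 800 characters after the previous one. A larger overlap keeps more shared context between neighbouring chunks at the cost of storing more vectors.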