khalilCodeX/Smart-Study-Buddy

Smart Study Buddy

A study assistant that answers questions about your lecture notes using RAG (Retrieval-Augmented Generation). You provide a PDF, it is split into chunks and embedded into a Pinecone vector database, and you can then ask questions about it through a Gradio chat interface.

What it does

  • Searches through your documents to find relevant sections
  • Uses GPT to generate answers based only on your materials
  • Remembers the conversation so you can ask follow-up questions
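
The retrieval step in the list above can be pictured with a toy example. This is only a sketch, not the project's code: the three-dimensional vectors are hand-made stand-ins for real embeddings, and `cosine_similarity`, `top_k`, and the sample texts are hypothetical names and data chosen for illustration. In the real app, OpenAI produces the embeddings and Pinecone performs the search.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed_chunks, k=2):
    """Return the k chunk texts whose vectors are most similar to the query."""
    ranked = sorted(
        indexed_chunks,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Hypothetical data: (chunk text, made-up embedding vector).
chunks = [
    ("Notes on cell division", [0.9, 0.1, 0.0]),
    ("Notes on the French Revolution", [0.0, 0.2, 0.9]),
    ("More notes on cell division", [0.8, 0.3, 0.1]),
]
question_vec = [1.0, 0.2, 0.0]  # made-up embedding of the user's question

context = top_k(question_vec, chunks, k=2)
# The retrieved chunks become the only context the model is asked to use.
prompt = "Answer using only this context:\n" + "\n".join(context)
```

Only the best-matching chunks are placed in the prompt, which is what lets the answers stay grounded in your materials.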

Before you start

You'll need:

  • Python 3.10 or higher
  • An OpenAI API key
  • A Pinecone API key (free tier works fine)

Setup

Clone the repo and cd into it:

git clone https://github.com/khalilCodeX/Smart-Study-Buddy.git
cd Smart-Study-Buddy

Create a virtual environment:

python -m venv venv
source venv/bin/activate

On Windows, use venv\Scripts\activate instead.

Install the dependencies:

pip install langchain langchain-openai langchain-core langchain-community
pip install pinecone pypdf python-dotenv gradio

Create a .env file in the project root with your API keys:

OPEN_AI_KEY=your_openai_api_key_here
PINECONE_KEY=your_pinecone_api_key_here
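
Because the variable names must match exactly, a quick check of the .env contents can catch a typo before the app runs. The following is a minimal sketch using only the standard library; `missing_keys` and `REQUIRED_KEYS` are hypothetical names, not part of the project, and the parser handles only simple `NAME=value` lines.

```python
REQUIRED_KEYS = ("OPEN_AI_KEY", "PINECONE_KEY")


def missing_keys(env_text, required=REQUIRED_KEYS):
    """Return the required variable names that are absent or empty in env_text."""
    found = {}
    for line in env_text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not NAME=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        found[name.strip()] = value.strip().strip("'\"")
    return [key for key in required if not found.get(key)]


# Sample text with a misspelled Pinecone variable name.
sample = "OPEN_AI_KEY=sk-example\nPINECONE_API_KEY=abc\n"
print(missing_keys(sample))  # prints ['PINECONE_KEY']
```

To check the real file, pass its contents instead of the sample, for example `missing_keys(open(".env").read())`.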

Adding your own PDF

Put your PDF in the project folder and update the path in datastore.py:

self.file_path = "./your-document.pdf"
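
Before embedding, it can help to confirm that the path points at a real PDF. This is a small sketch rather than anything the project ships: `looks_like_pdf` is a hypothetical helper that checks the file exists and begins with the `%PDF-` signature that PDF files start with.

```python
from pathlib import Path


def looks_like_pdf(path):
    """True if path is an existing file that starts with the PDF signature."""
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as f:
        return f.read(5) == b"%PDF-"
```

For example, `looks_like_pdf("./your-document.pdf")` returns False if the file is missing or is not a PDF.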

Running it

This step is needed only the first time: embed your document into Pinecone. Either open chat.py and uncomment the embed_documents() line, or run this in Python:

from chat import chat
chatbot = chat()
chatbot.embed_documents()

This splits your PDF into chunks and uploads them to Pinecone. It only needs to be done once per document.

Then start the web interface:

python gradio_chat_app.py

Open http://localhost:7860 in your browser and start asking questions.

Project structure

chat.py              - main chat logic
datastore.py         - loads and splits the PDF
vectorstore.py       - handles Pinecone operations
management.py        - initializes the API clients
gradio_chat_app.py   - the web interface
system_prompt.md     - instructions for the LLM

Common issues

If you get an "index not found" error, you probably haven't run embed_documents() yet.

If your API keys aren't working, double-check that your .env file uses the names OPEN_AI_KEY and PINECONE_KEY exactly.

If you're getting empty responses, check the Pinecone dashboard to make sure your vectors were actually uploaded.

Customization

You can change how the AI responds by editing system_prompt.md.

To adjust how the PDF gets split into chunks, look at the RecursiveCharacterTextSplitter settings in datastore.py. The default is 1000 characters with 200 character overlap.
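
To see what a 1000-character chunk size with a 200-character overlap means, here is a simplified sliding-window sketch. It is not how RecursiveCharacterTextSplitter works internally (that splitter also tries to break on separators such as paragraphs and sentences); `chunk_text` is a hypothetical function that only illustrates the size and overlap arithmetic.

```python
def chunk_text(text, chunk_size=1000, overlap=200):
    """Split text into fixed-size windows where neighbours share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and smaller than chunk_size")
    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


# A 2500-character stand-in for the text extracted from a PDF.
text = "".join(str(i % 10) for i in range(2500))
chunks = chunk_text(text)
print([len(c) for c in chunks])  # prints [1000, 1000, 900]
```

Each window starts 800 characters after the previous one, so the last 200 characters of one chunk reappear at the start of the next. A larger overlap keeps more context across chunk boundaries at the cost of storing more vectors.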

About

This tool aims to make studying more efficient and targeted.
