As a freshman at UC San Diego, I found the official course catalog hard to use when planning which courses to take. Many of my peers felt the same way.
To explore a solution, I developed a proof-of-concept (POC) chatbot using a Retrieval-Augmented Generation (RAG) system. As the initial dataset, I used a document I knew well: the course catalog from my local community college, El Camino College.
This POC evolved into the Course Assistant in the TritonPlanner app (tritonplanner.com). My team and I scaled the concept by scraping data from UCSD's course catalogs, department webpages, and other websites, then converting all course information into vector embeddings for a RAG-based chatbot. The chatbot helps students quickly find answers to questions about UCSD courses, prerequisites, and professors.
PyPDF2==3.0.1
torch>=2.0.0
ollama>=0.1.7
openai>=1.0.0
- pip install -r requirements.txt
- Install Ollama: https://ollama.ai
- ollama pull mxbai-embed-large
- ollama pull dolphin-llama3
- ollama serve
- Use the provided PDF path (course_catalog.pdf), or enter your own PDF path.
- Run cells in order
RAG_FOR_PDF/
├── requirements.txt
├── pdf_upload.py # PDF processing script
├── rag_chat.py # Chat interface script
├── chunked.txt # Processed document
└── README.md
- PDF Processing: Extracts text and splits into 1000-character chunks
- Embeddings: Uses mxbai-embed-large to create vector representations
- Retrieval: Finds most relevant chunks using cosine similarity
- Generation: Uses dolphin-llama3 with context to answer questions (conversation history is maintained)
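
The chunking and retrieval steps described above can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the code in pdf_upload.py or rag_chat.py, which may differ (for example, by using torch for the similarity computation). The function names chunk_text, cosine_similarity, and top_k_chunks are hypothetical, and the two-dimensional vectors are made-up stand-ins for real mxbai-embed-large embeddings.

```python
import math


def chunk_text(text, chunk_size=1000):
    """Split text into consecutive pieces of at most chunk_size characters."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vec, chunk_vecs, chunks, k=3):
    """Return the k chunks whose vectors are most similar to query_vec."""
    scored = [
        (cosine_similarity(query_vec, vec), chunk)
        for vec, chunk in zip(chunk_vecs, chunks)
    ]
    # Highest similarity first
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]


# Toy data: placeholder chunks and invented 2-D vectors (assumptions for illustration only)
toy_chunks = ["chunk about prerequisites", "chunk about parking", "chunk about schedules"]
toy_vecs = [[1.0, 0.0], [0.0, 1.0], [0.8, 0.6]]
toy_query = [1.0, 0.1]

best = top_k_chunks(toy_query, toy_vecs, toy_chunks, k=2)
pieces = chunk_text("a" * 2500)
```

In a real run, the chunk and query vectors would come from the embedding model, and the selected chunks would be passed to the chat model as context alongside the conversation history.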