Proof of Concept for a RAG-based Course Assistant

As a freshman at UC San Diego, I thought that the official course catalog wasn't the greatest resource when planning courses I wanted to take. This was a shared opinion among my peers.

To explore a solution, I developed a proof-of-concept chatbot using a Retrieval-Augmented Generation (RAG) system. I used a document I knew well, the course catalog from my local community college (El Camino College) as the initial dataset.

This POC evolved into the Course Assistant in the TritonPlanner app (tritonplanner.com). My team and I scaled the concept by webscraping data from UCSD's course catalogs, department webpages, and other websites, transforming all course information into vector embeddings to use in a RAG-based chatbot, helping students quickly find answers to any question about UCSD courses, prerequisites, and professors.

Requirements

Python Packages

PyPDF2==3.0.1
torch>=2.0.0
ollama>=0.1.7
openai>=1.0.0

Install Dependencies

pip install -r requirements.txt
Install Ollama: https://ollama.ai

Setup

ollama pull mxbai-embed-large
ollama pull dolphin-llama3
ollama serve

Usage

Use provided pdf path (course_catalog.pdf), or type in your own pdf path.
Run cells in order

Files Structure

RAG_FOR_PDF/
├── requirements.txt
├── pdf_upload.py          # PDF processing script
├── rag_chat.py            # Chat interface script  
├── chunked.txt            # Processed document 
└── README.md

How It Works

PDF Processing: Extracts text and splits into 1000-character chunks
Embeddings: Uses mxbai-embed-large to create vector representations
Retrieval: Finds most relevant chunks using cosine similarity
Generation: Uses dolphin-llama3 with context to answer questions (conversation history is maintained)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Proof of Concept for a RAG-based Course Assistant

Requirements

Python Packages

Install Dependencies

Setup

Usage

Files Structure

How It Works

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
README.md		README.md
chunked.txt		chunked.txt
course_catalog.pdf		course_catalog.pdf
rag_poc.ipynb		rag_poc.ipynb
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

Proof of Concept for a RAG-based Course Assistant

Requirements

Python Packages

Install Dependencies

Setup

Usage

Files Structure

How It Works

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages