CUNY Course - RAG and LLM Core Concepts

Course materials and code examples for learning Retrieval-Augmented Generation (RAG) and LLM fundamentals.

Setup

Prerequisites

Python 3.9 or higher
Poetry for dependency management
Optional: Azure OpenAI or OpenAI API access for the structured-output example

Installation

Clone this repository:

git clone https://github.com/drdk/CUNY_course.git
cd CUNY_course

Install dependencies using Poetry:

poetry install

(Optional) Set up your environment variables in a .env file for API-based examples:

# For Azure OpenAI
AZURE_OPENAI_ENDPOINT=your_endpoint_here
AZURE_OPENAI_API_KEY=your_key_here
AZURE_OPENAI_DEPLOYMENT=your_deployment_name

# Or for OpenAI
OPENAI_API_KEY=your_key_here
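
As a rough illustration of how a script might decide which of the two configurations above is active, here is a minimal sketch. The helper name pick_provider is hypothetical and is not taken from this repository; the sketch only assumes the variable names listed above and reads them from a plain mapping such as os.environ.

```python
import os

# Variable names as listed in the .env example above.
AZURE_KEYS = (
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_DEPLOYMENT",
)


def pick_provider(env):
    """Hypothetical helper: return 'azure', 'openai', or None.

    Azure is chosen only when all three Azure variables are non-empty;
    otherwise a non-empty OPENAI_API_KEY selects OpenAI.
    """
    if all(env.get(key) for key in AZURE_KEYS):
        return "azure"
    if env.get("OPENAI_API_KEY"):
        return "openai"
    return None


if __name__ == "__main__":
    provider = pick_provider(os.environ)
    print(provider or "no API credentials found; local examples only")
```

Preferring Azure when both sets are present is an arbitrary choice made for this sketch; the course scripts may resolve it differently.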

Running Code Examples

Using Poetry

Run Python scripts directly:

poetry run python CUNY_course/example_code/structured_output.py
poetry run python CUNY_course/example_code/format_errror.py
poetry run python CUNY_course/example_code/rag_pipeline/01_data_prep.py
poetry run python CUNY_course/example_code/rag_pipeline/02_chunking.py
poetry run python CUNY_course/example_code/rag_pipeline/03_embedding.py
poetry run python CUNY_course/example_code/rag_pipeline/04_metadata.py
poetry run python CUNY_course/example_code/rag_pipeline/05_retrieval.py
poetry run python CUNY_course/example_code/rag_pipeline/06_generation.py
poetry run python CUNY_course/example_code/rag_pipeline/pipeline_demo.py

Or activate the virtual environment first:

poetry shell
python CUNY_course/example_code/structured_output.py

Using Jupyter

Start Jupyter:

poetry run jupyter notebook

Open and run:

CUNY_course/example_code/rag_pipeline/rag_pipeline_demo.ipynb

Gradio Web App (Easiest Sharing)

Run the interactive RAG app locally:

poetry run python app.py

Then open http://localhost:7860.

Deploy for a Small Group (Hugging Face Spaces)

Create a new Gradio Space.
Push this repo (or at least app.py, requirements.txt, and CUNY_course/).
In Space Settings -> Variables and secrets, add:
- AZURE_OPENAI_ENDPOINT
- AZURE_OPENAI_API_KEY
- AZURE_OPENAI_DEPLOYMENT
(Recommended) Set Space visibility to Private and invite your users.

The app entrypoint is app.py and uses the same RAG pipeline code as the notebook.

Project Structure

CUNY_course/
├── CUNY_course/
│   ├── example_code/       # Python code examples
│   │   ├── structured_output.py
│   │   ├── format_errror.py
│   │   └── rag_pipeline/   # Local in-memory RAG pipeline demo
│   ├── example_data/       # Sample data files
│   │   ├── 10languages.txt
│   │   ├── usconstitution.txt
│   │   └── ...
│   ├── data_types/         # Pydantic models and schemas
│   │   └── person.py
│   ├── outline.md          # Course outline
│   └── links.md            # Useful links and resources
├── pyproject.toml          # Poetry dependencies
└── README.md               # This file

Key Files

structured_output.py - Demonstrates how to extract structured data from text using LLMs with Pydantic models
format_errror.py - Simple example showing Python data structures
person.py - Pydantic models for representing people and relationships
outline.md - Full course outline with lecture topics
example_data/ - Various text files for experimentation
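
To make the idea behind structured_output.py concrete, the sketch below shows the validation half of structured extraction: turning a JSON string (as an LLM might return it) into a typed record. It is a standard-library stand-in that uses dataclasses instead of Pydantic, and the PersonRecord fields and the parse_person helper are hypothetical, not the contents of person.py.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class PersonRecord:
    """Hypothetical record; the real models live in data_types/person.py."""
    name: str
    age: Optional[int] = None


def parse_person(raw_json):
    """Parse and validate a JSON object describing one person."""
    data = json.loads(raw_json)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    name = data.get("name")
    if not isinstance(name, str) or not name:
        raise ValueError("'name' must be a non-empty string")

    age = data.get("age")
    # bool is a subclass of int in Python, so reject it explicitly.
    if age is not None and (isinstance(age, bool) or not isinstance(age, int)):
        raise ValueError("'age' must be an integer or null")

    return PersonRecord(name=name, age=age)


if __name__ == "__main__":
    print(parse_person('{"name": "Ada Lovelace", "age": 36}'))
```

With Pydantic, the field checks written by hand here are expressed as type annotations on the model, which is why the course example uses it.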

Local RAG Pipeline Demo

Fully local and in-memory (no API keys, no persistence layer)
Uses open-source libraries only (sentence-transformers, faiss-cpu, transformers, torch)
One Python file per pipeline step plus a combined notebook demo
Default corpus file: CUNY_course/example_data/yosemite_guide.md
Default chunking: character-based chunks with overlap (a word-based strategy is also available)
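
Character-based chunking with overlap, as mentioned in the last item above, can be sketched as follows. This is an illustration only, not the code in 02_chunking.py; the function name chunk_text and the default sizes are assumptions chosen for the example.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into character chunks that share `overlap` characters.

    Each chunk starts (chunk_size - overlap) characters after the previous
    one, so neighbouring chunks repeat the boundary text.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk has reached the end of the text, so no
        # trailing chunk is emitted that only repeats the overlap.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for chunk in chunk_text("abcdefghij", chunk_size=4, overlap=2):
        print(chunk)
```

The overlap trades some index size for recall: a sentence that straddles a chunk boundary still appears whole in at least one chunk, provided it is shorter than the overlap.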

Classroom Quick Commands

Use the included Makefile for one-command runs:

make install     # install dependencies
make rg-steps    # run all six RAG step scripts
make rg-demo     # run full end-to-end RAG demo
make notebook    # open the RAG demo notebook
make all         # install + run full end-to-end RAG demo

Development

Code Formatting

poetry run black .

Running Tests

poetry run pytest

Resources

See outline.md for the full course outline and lecture structure
See links.md for useful tools and resources

License

See the LICENSE file for details.
