Course materials and code examples for learning about RAG (Retrieval Augmented Generation) and LLM fundamentals.
- Python 3.9 or higher
- Poetry for dependency management
- Optional: Azure/OpenAI API access for the existing structured-output example
- Clone this repository:
git clone https://github.com/drdk/CUNY_course.git
cd CUNY_course
- Install dependencies using Poetry:
poetry install
- (Optional) Set up your environment variables in a .env file for API-based examples:
# For Azure OpenAI
AZURE_OPENAI_ENDPOINT=your_endpoint_here
AZURE_OPENAI_API_KEY=your_key_here
AZURE_OPENAI_DEPLOYMENT=your_deployment_name
# Or for OpenAI
OPENAI_API_KEY=your_key_here

Run Python scripts directly:
poetry run python CUNY_course/example_code/structured_output.py
poetry run python CUNY_course/example_code/format_errror.py
poetry run python CUNY_course/example_code/rag_pipeline/01_data_prep.py
poetry run python CUNY_course/example_code/rag_pipeline/02_chunking.py
poetry run python CUNY_course/example_code/rag_pipeline/03_embedding.py
poetry run python CUNY_course/example_code/rag_pipeline/04_metadata.py
poetry run python CUNY_course/example_code/rag_pipeline/05_retrieval.py
poetry run python CUNY_course/example_code/rag_pipeline/06_generation.py
poetry run python CUNY_course/example_code/rag_pipeline/pipeline_demo.py

Or activate the virtual environment first:
poetry shell
python CUNY_course/example_code/structured_output.py

Start Jupyter:
poetry run jupyter notebook

Open and run:
CUNY_course/example_code/rag_pipeline/rag_pipeline_demo.ipynb
Run the interactive RAG app locally:
poetry run python app.py

Then open http://localhost:7860.
- Create a new Gradio Space.
- Push this repo (or at least app.py, requirements.txt, and CUNY_course/).
- In Space Settings -> Variables and secrets, add:
  - AZURE_OPENAI_ENDPOINT
  - AZURE_OPENAI_API_KEY
  - AZURE_OPENAI_DEPLOYMENT
- (Recommended) Set Space visibility to Private and invite your users.
The app entrypoint is app.py and uses the same RAG pipeline code as the notebook.
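
As an illustration of how the credentials named above might be checked at startup, here is a minimal sketch. It is not taken from app.py: the function name `detect_provider` and the rule "all three Azure variables, otherwise `OPENAI_API_KEY`" are assumptions made for this example.

```python
import os
from typing import Mapping, Optional

# Variable names as listed in this README.
AZURE_VARS = (
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_DEPLOYMENT",
)


def detect_provider(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return "azure", "openai", or None depending on which variables are set.

    Hypothetical helper: the selection rule is an assumption, not the
    behavior of the course code.
    """
    # Azure needs all three values; a partial configuration does not count.
    if all(env.get(name) for name in AZURE_VARS):
        return "azure"
    if env.get("OPENAI_API_KEY"):
        return "openai"
    # No API credentials found: only the fully local examples can run.
    return None
```

Calling `detect_provider()` with no arguments reads the real process environment; passing a plain dict makes the helper easy to test.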
CUNY_course/
├── CUNY_course/
│ ├── example_code/ # Python code examples
│ │ ├── structured_output.py
│ │ ├── format_errror.py
│ │ └── rag_pipeline/ # Local in-memory RAG pipeline demo
│ ├── example_data/ # Sample data files
│ │ ├── 10languages.txt
│ │ ├── usconstitution.txt
│ │ └── ...
│ ├── data_types/ # Pydantic models and schemas
│ │ └── person.py
│ ├── outline.md # Course outline
│ └── links.md # Useful links and resources
├── pyproject.toml # Poetry dependencies
└── README.md # This file
- structured_output.py - Demonstrates how to extract structured data from text using LLMs with Pydantic models
- format_errror.py - Simple example showing Python data structures
- person.py - Pydantic models for representing people and relationships
- outline.md - Full course outline with lecture topics
- example_data/ - Various text files for experimentation
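
To make the structured-output idea concrete, here is a minimal sketch of validating an LLM's JSON reply against a Pydantic model. The `Person` fields and the `parse_person` helper are hypothetical and chosen for illustration; the actual models are defined in data_types/person.py and may differ. No LLM call is made here, so the input string stands in for a model response.

```python
import json
from typing import Optional

from pydantic import BaseModel, ValidationError


class Person(BaseModel):
    # Hypothetical fields for illustration only.
    name: str
    age: Optional[int] = None
    occupation: Optional[str] = None


def parse_person(llm_output: str) -> Optional[Person]:
    """Validate a JSON string (e.g. an LLM reply) against the Person schema.

    Returns None when the text is not valid JSON, is not a JSON object,
    or does not satisfy the schema.
    """
    try:
        return Person(**json.loads(llm_output))
    except (json.JSONDecodeError, ValidationError, TypeError):
        return None


if __name__ == "__main__":
    print(parse_person('{"name": "Ada Lovelace", "age": 36}'))
    print(parse_person('{"age": 36}'))  # missing required "name"
```

Returning None on failure keeps the sketch short; a real pipeline might instead log the validation error or retry the request.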
- Fully local and in-memory (no API keys, no persistence layer)
- Uses open-source libraries only (sentence-transformers, faiss-cpu, transformers, torch)
- One Python file per pipeline step plus a combined notebook demo
- Default corpus file: CUNY_course/example_data/yosemite_guide.md
- Default chunking: character-based chunks with overlap (word-based strategy still available)
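
The character-based chunking with overlap mentioned above can be sketched as follows. This is an illustrative version, not the code in 02_chunking.py: the function name `chunk_text` and the default sizes (500 characters, 50 overlap) are assumptions.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into character chunks where neighbours share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text,
        # so no chunk is emitted that is fully contained in the previous one.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    print(chunk_text("abcdefghij", chunk_size=4, overlap=2))
    # ['abcd', 'cdef', 'efgh', 'ghij']
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk when it is shorter than the overlap, at the cost of embedding some text twice.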
Use the included Makefile for one-command runs:
make install # install dependencies
make rg-steps # run all six RAG step scripts
make rg-demo # run full end-to-end RAG demo
make notebook # open the RAG demo notebook
make all # install + run full end-to-end RAG demo

Format the code with Black:
poetry run black .

Run the tests with pytest:
poetry run pytest

- See outline.md for the full course outline and lecture structure
- See links.md for useful tools and resources
See the LICENSE file for details.