This repository contains five hands-on projects, one per week, followed by a capstone in week six. The first five projects build capability with large language models, retrieval, tool use, research workflows, and multimodality; in the capstone you design your own system, tool, or startup idea based on what you have learned. The instructions below are generic and apply to all projects. Each project also includes additional instructions specific to that project.
Each week, a new project is added to the repo at a specific release date and time. The weekly release includes the notebook, data, and environment file.
You can run the projects either on Google Colab (no local setup required) or locally (using Conda environments for reproducibility).
- Upload the notebook for the current week to Colab.
- If needed, add your API tokens with `os.environ[...] = "value"`.
- Ensure that any local file paths are adjusted for Colab.
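For example, a token can be set in a Colab cell without hard-coding it into the notebook (the key name below is only an example; use whichever provider key you need):

```python
import os
from getpass import getpass

# Prompt for the key interactively so it is never saved in the notebook file.
os.environ["OPENAI_API_KEY"] = getpass("Paste your API key: ")
```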
Each project comes with an environment.yml file that specifies its dependencies, which keeps environments consistent across machines (a minimal example appears after the setup steps below).
- Install Miniconda or Anaconda.
- Create and activate the environment from the provided YAML file. The environment name is set inside the YAML; you can change it if desired:

conda env create -f environment.yml
conda activate <ENV_NAME>
- Launch Jupyter and open the notebook for the current week:
jupyter notebook
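For reference, a minimal environment.yml has this shape; the name, channels, and packages below are illustrative, not the actual pins shipped with any week's release:

```yaml
# Illustrative sketch only; use the environment.yml from the weekly release.
name: llm-projects
channels:
  - conda-forge
dependencies:
  - python=3.11
  - pip
  - pip:
      - transformers
      - torch
```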
Recommendation: Use Colab for projects 1 and 5, and local development for projects 2, 3, and 4.
The projects are designed so they do not require specific API keys or tokens by default. However, they are flexible: you can swap in different LLMs, providers, and systems, and depending on what you experiment with, you may need API keys or tokens from certain providers.
Possible API keys you might need:
- `OPENAI_API_KEY` for OpenAI models
- `ANTHROPIC_API_KEY` for Claude models
- `GOOGLE_API_KEY` for Gemini models
- `HUGGINGFACEHUB_API_TOKEN` for Hugging Face hosted models and datasets
- `TAVILY_API_KEY` or `SERPAPI_API_KEY` for web search tools
- `PINECONE_API_KEY`, or alternatives if using remote vector stores
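If you do switch providers, a quick sanity check like the following can catch missing keys before a long run (the key list is a placeholder; trim it to the providers you actually use):

```python
import os

# Placeholder list; keep only the keys your chosen providers require.
required = ["OPENAI_API_KEY", "TAVILY_API_KEY"]
missing = [key for key in required if not os.environ.get(key)]
if missing:
    raise RuntimeError(f"Missing API keys: {', '.join(missing)}")
```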
- Projects are designed flexibly: they guide you step by step and provide the workflow, and you implement the sections marked with "your code here".
- There are multiple ways to implement each section. Feel free to deviate from the provided template and experiment with different algorithms, models, and systems.
- No submission is required. In the live deep-dive sessions, we will review each project in detail and show one possible implementation.
- Post questions in the corresponding Q/A space. You are also welcome to share your thoughts, opinions, and interesting findings in the same space.
An introductory project to explore how prompts, tokenization, and decoding settings work in practice, building the foundation for effective use of large language models.
Learning objectives:
- Tokenization of raw text into discrete tokens
- Basics of GPT-2 and Transformer architectures
- Loading pre-trained LLMs with Hugging Face
- Decoding strategies for text generation
- Completion vs. instruction-tuned models
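As a taste of what the notebook covers, here is a minimal sketch of loading GPT-2 with Hugging Face, inspecting the tokenization, and sampling a completion (illustrative only, not the notebook's exact code):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Large language models are"
inputs = tokenizer(prompt, return_tensors="pt")
print(tokenizer.convert_ids_to_tokens(inputs.input_ids[0]))  # the raw tokens

# Sampling-based decoding; try different temperature/top_p values.
output = model.generate(
    **inputs, max_new_tokens=40, do_sample=True, temperature=0.8, top_p=0.9
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```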
A hands-on project to build a retrieval-based chatbot that answers customer questions for an imaginary e-commerce store.
Learning objectives:
- Ingest and chunk unstructured documents
- Create embeddings and index with FAISS
- Retrieve context and design prompts
- Run an open-weight LLM locally with Ollama
- Build a RAG (Retrieval-Augmented Generation) pipeline
- Package the chatbot in a minimal Streamlit UI
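The retrieval core of such a pipeline can be sketched in a few lines; the embedding model below (sentence-transformers) is an assumption for illustration, and the project may use a different one:

```python
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

docs = [
    "Returns are accepted within 30 days of delivery.",
    "Standard shipping takes 3-5 business days.",
]
embedder = SentenceTransformer("all-MiniLM-L6-v2")
vectors = embedder.encode(docs, normalize_embeddings=True)

# Inner product on normalized vectors equals cosine similarity.
index = faiss.IndexFlatIP(vectors.shape[1])
index.add(np.asarray(vectors, dtype="float32"))

question = "How long does shipping take?"
query = embedder.encode([question], normalize_embeddings=True)
scores, ids = index.search(np.asarray(query, dtype="float32"), k=1)

# The retrieved chunk becomes context for the LLM prompt.
prompt = f"Answer using only this context:\n{docs[ids[0][0]]}\n\nQuestion: {question}"
```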
A project to create a simplified Perplexity-style agent that searches the web, reads content, and provides answers.
Learning objectives:
- Understand why tool calling is useful for LLMs
- Implement a loop to parse model calls and execute Python functions
- Use function schemas (docstrings and type hints) to scale across tools
- Apply LangChain for function calling, reasoning, and multi-step planning
- Combine Llama 3 8B Instruct with a web search tool to build an ask-the-web agent
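The heart of such an agent is a small dispatch loop: the model emits a structured tool call, your code executes the matching Python function, and the result is fed back to the model. A toy version, assuming the model outputs JSON calls (the real project would use the model's native function calling or LangChain):

```python
import json

def web_search(query: str) -> str:
    """Search the web and return a short snippet."""
    return f"[stubbed results for: {query}]"  # swap in a real search tool

TOOLS = {"web_search": web_search}

def execute_tool_call(model_output: str) -> str:
    # Assumes the model emits e.g. {"tool": "web_search", "args": {"query": "..."}}
    call = json.loads(model_output)
    return TOOLS[call["tool"]](**call["args"])

print(execute_tool_call('{"tool": "web_search", "args": {"query": "LLM news"}}'))
```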
A project focused on reasoning workflows, where you design a multi-step agent that plans, gathers evidence, and synthesizes findings.
Learning objectives:
- Apply inference-time scaling methods (zero-shot/few-shot CoT, self-consistency, sequential decoding, tree-of-thoughts)
- Gain intuition for training reasoning models with the STaR approach
- Build a deep-research agent that combines reasoning with live web search
- Extend deep research into a multi-agent system
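For example, self-consistency amounts to "sample several chains of thought, then majority-vote the final answers"; the sketch below uses a fake `generate` stub standing in for whatever LLM you call:

```python
import random
from collections import Counter

def generate(prompt: str, temperature: float = 0.8) -> str:
    """Placeholder for an LLM call; returns a fake chain of thought here."""
    return f"Step 1: reason.\nStep 2: compute.\n{random.choice(['42', '42', '41'])}"

def self_consistent_answer(question: str, n_samples: int = 5) -> str:
    prompt = f"{question}\nLet's think step by step."
    finals = []
    for _ in range(n_samples):
        reasoning = generate(prompt)  # high temperature gives diverse chains
        finals.append(reasoning.strip().splitlines()[-1])  # last line as answer
    return Counter(finals).most_common(1)[0][0]  # majority vote

print(self_consistent_answer("What is 6 * 7?"))
```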
A project to build an agent that combines textual question answering with image and video generation capabilities within a unified system.
Learning objectives:
- Generate images from text using Stable Diffusion XL
- Create short clips with a text-to-video model
- Build a multimodal agent that handles questions and media requests
- Develop a simple Gradio UI to interact with the agent
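Image generation with Stable Diffusion XL reduces to a few lines with the diffusers library; the model id below is the public SDXL base checkpoint, and fp16 on a GPU is assumed:

```python
import torch
from diffusers import StableDiffusionXLPipeline

# Assumes a CUDA GPU; drop torch_dtype and .to("cuda") to run (slowly) on CPU.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

image = pipe("a watercolor fox in a snowy forest").images[0]
image.save("fox.png")
```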
Purpose: Design and build your own system based on what you learned in weeks 1 to 5. This can be a product prototype, an internal tool, a research workflow, or the first step toward a startup idea. The hope is that some projects will continue after the cohort, using the connections and community built here.
The following documentation pages cover the core libraries and services used across projects:
- Conda documentation: Manage isolated Python environments and dependencies with Conda
- Pip documentation: Install and manage Python packages with pip
- duckduckgo-search: Python library to query DuckDuckGo search results programmatically
- gradio: Build quick interactive demos and UIs for machine learning models
- Streamlit documentation: Build and deploy simple web apps for data and ML projects
- huggingface_hub: Access and share models, datasets, and spaces on Hugging Face Hub
- langchain: Framework for building applications powered by LLMs with memory, tools, and chains
- numpy: Core library for numerical computing and array operations in Python
- openai: Official API docs for using OpenAI models like GPT and embeddings
- tiktoken: Fast tokenizer library for OpenAI models, used for counting tokens
- torch: PyTorch deep learning framework for training and running models
- transformers: Hugging Face library for using pre-trained LLMs and fine-tuning them
- llama-index: Data framework for connecting external data sources to LLMs
- chromadb: Open-source vector database for storing and retrieving embeddings in RAG systems