📚 RAG From Scratch

Overview

LLMs are trained on a large but fixed corpus of data, limiting their ability to reason about private or recent information. Fine-tuning is one way to mitigate this, but is often not well-suited for factual recall and can be costly.

Retrieval Augmented Generation (RAG) has emerged as a popular and powerful mechanism to expand an LLM's knowledge base, using documents retrieved from an external data source to ground the LLM generation via in-context learning.

This repository contains five notebooks that build up an understanding of RAG from scratch. Together they cover basic indexing, retrieval, and generation, followed by query transformations, routing, advanced indexing, and advanced retrieval.

🎥 Reference Video Playlist

This repository accompanies the excellent "RAG From Scratch" video series by LangChain:

🔗 Watch the Complete Playlist

📖 Table of Contents

1. RAG From Scratch: Parts 1-4 (Overview)

Foundation & Core Concepts

This notebook introduces the fundamental building blocks of RAG systems:

Part 1: Overview & Environment Setup
Part 2: Indexing - Document loading, text splitting, and vector embeddings
Part 3: Retrieval - Semantic search and document retrieval from vector stores
Part 4: Generation - Combining retrieved context with LLM prompts to generate answers

Key Topics: Document indexing, vector embeddings, ChromaDB, basic RAG pipeline, retrieval-augmented generation
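The index, retrieve, generate flow described above can be sketched without any external services. This is a minimal illustration under stated assumptions, not the notebook's code: the bag-of-words `embed` function stands in for a real embedding model, the in-memory list stands in for a vector store such as ChromaDB, and all function names are hypothetical. The final LLM call is left out; the sketch stops at the assembled prompt.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: bag-of-words counts (assumption: stands in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def split_text(text, chunk_size=8):
    """Split text into chunks of at most chunk_size whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def build_index(documents, chunk_size=8):
    """Indexing: split each document and store (chunk, embedding) pairs."""
    return [
        (chunk, embed(chunk))
        for doc in documents
        for chunk in split_text(doc, chunk_size)
    ]


def retrieve(index, question, k=2):
    """Retrieval: return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question, chunks):
    """Generation input: place the retrieved chunks in the prompt as context."""
    context = "\n".join(chunks)
    return f"Answer the question using only this context:\n{context}\n\nQuestion: {question}"


docs = [
    "Chroma is a vector store that keeps embeddings for similarity search.",
    "Text splitters cut long documents into smaller chunks before embedding.",
]
index = build_index(docs, chunk_size=20)
question = "What does a vector store keep?"
top = retrieve(index, question, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

In a real pipeline the prompt would be sent to a chat model, and the embedding and storage steps would be handled by the libraries listed under Prerequisites.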

2. RAG From Scratch: Parts 5-9 (Query Transformations)

Advanced Query Processing Techniques

Explore sophisticated methods for transforming and enhancing user queries:

Part 5: Multi-Query - Generate multiple perspectives of a single query for better retrieval
Part 6: RAG-Fusion - Combine results from multiple query variations with reciprocal rank fusion
Part 7: Decomposition - Break complex questions into simpler sub-questions
Part 8: Step-Back Prompting - Generate broader queries to retrieve higher-level context
Part 9: HyDE (Hypothetical Document Embeddings) - Generate a hypothetical answer document and embed it in place of the raw query to improve retrieval

Key Topics: Query rewriting, multi-query retrieval, query decomposition, RAG-fusion, step-back prompting, HyDE
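Reciprocal rank fusion, used by RAG-Fusion in Part 6, can be illustrated in a few lines. The sketch below assumes each query variation has already produced a ranked list of document IDs; the IDs are made up, and the constant `k=60` is a commonly used default rather than a value taken from this repository. Ranks start at 1 here, while some implementations start at 0.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists: each document scores sum(1 / (k + rank))."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical results for three rewrites of the same user question.
rankings = [
    ["doc_a", "doc_b", "doc_c"],
    ["doc_b", "doc_a", "doc_d"],
    ["doc_b", "doc_c", "doc_a"],
]
fused = reciprocal_rank_fusion(rankings)
for doc_id, score in fused:
    print(doc_id, round(score, 4))
```

Documents that appear near the top of several lists accumulate the highest score, so `doc_b` ranks first even though one list placed it second.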

3. RAG From Scratch: Parts 10-11 (Routing)

Intelligent Query Routing

Learn how to route queries intelligently to different data sources or processing pipelines:

Part 10: Logical Routing - Use function-calling to classify and route queries logically
Part 11: Semantic Routing - Route queries based on semantic similarity to data sources

Key Topics: Query routing, logical routing, semantic routing, function calling, multi-index routing
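Semantic routing can be sketched as picking the route whose description is most similar to the query. The example below is an illustration under assumptions: the word-count `embed` function stands in for a real embedding model, and the route names and descriptions are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: word counts (assumption: stands in for a real model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical routes: each name maps to a short description of what it handles.
ROUTES = {
    "physics": "questions about physics force energy motion gravity",
    "math": "questions about math algebra equations numbers proofs",
}


def route(question, routes=ROUTES):
    """Return the name of the route most similar to the question."""
    q = embed(question)
    return max(routes, key=lambda name: cosine(q, embed(routes[name])))


print(route("How does gravity affect motion?"))
print(route("Solve these algebra equations"))
```

Logical routing differs in that the LLM itself chooses the destination, typically through a structured function-calling output, instead of a similarity comparison.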

4. RAG From Scratch: Parts 12-14 (Indexing)

Advanced Indexing Strategies

Deep dive into sophisticated indexing techniques for better retrieval:

Part 12: Multi-Representation Indexing - Index document summaries while retrieving full documents
Part 13: RAPTOR - Recursively cluster and summarize documents for hierarchical retrieval
Part 14: ColBERT - Late interaction models for token-level similarity matching

Key Topics: Multi-representation indexing, document summarization, hierarchical indexing, RAPTOR, ColBERT, late interaction retrieval
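The late interaction idea behind ColBERT can be shown with a small scoring function. Each query token vector is matched against its best document token vector, and the per-token maxima are summed. The vectors below are made-up two-dimensional values for illustration only; a real model produces learned, higher-dimensional token embeddings.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def maxsim_score(query_vecs, doc_vecs):
    """Late interaction: sum over query tokens of the best match among doc tokens."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Hypothetical token embeddings (one vector per token).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_1 = [[0.9, 0.1], [0.2, 0.8]]
doc_2 = [[0.5, 0.5], [0.6, 0.4]]

score_1 = maxsim_score(query, doc_1)
score_2 = maxsim_score(query, doc_2)
print(score_1 > score_2)
```

Here `doc_1` scores higher because each query token has a close match among its token vectors, which a single pooled document embedding could blur.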

5. RAG From Scratch: Parts 15-18 (Retrieval)

Advanced Retrieval Methods

Master advanced retrieval techniques to improve answer quality:

Part 15: Re-Ranking - Use cross-encoders to re-rank retrieved documents (e.g., Cohere Rerank)
Part 16: Compression - Filter and compress retrieved context to focus on relevant information
Part 17: Contextual Compression - Combine a document compressor with a base retriever
Part 18: Fusion Retrieval - Merge results from multiple retrieval strategies

Key Topics: Re-ranking, cross-encoders, contextual compression, result fusion, Cohere rerank
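Re-ranking is a two-stage pattern: a fast retriever returns candidates, then a more precise scorer reorders them. The sketch below shows only the second stage. The `overlap_score` function is a toy stand-in for a cross-encoder or a hosted reranker, and the function names and candidate texts are hypothetical.

```python
import re


def tokens(text):
    """Lowercased set of alphanumeric terms in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(query, document):
    """Toy relevance score: fraction of query terms that appear in the document."""
    q = tokens(query)
    return len(q & tokens(document)) / len(q) if q else 0.0


def rerank(query, candidates, score_fn=overlap_score, top_n=2):
    """Score each candidate against the query and keep the top_n highest."""
    scored = [(score_fn(query, doc), doc) for doc in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_n]]


# Hypothetical first-stage retrieval results.
candidates = [
    "Vector stores hold embeddings.",
    "Cross-encoders score a query and a document together.",
    "Re-ranking reorders retrieved documents by relevance to the query.",
]
best = rerank("how does re-ranking order retrieved documents", candidates, top_n=1)
print(best)
```

Passing a different `score_fn` is the point of the design: the same loop works whether scores come from term overlap, a local cross-encoder, or an API call.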

🚀 Getting Started

Prerequisites

pip install langchain langchain_community langchain-openai
pip install tiktoken chromadb langchainhub
pip install youtube-transcript-api pytube cohere

Environment Setup

You'll need to set up the following:

LangSmith (optional, for tracing):
- Sign up at smith.langchain.com
- Set environment variables: LANGCHAIN_TRACING_V2, LANGCHAIN_ENDPOINT, LANGCHAIN_API_KEY
API Keys:
- OpenAI API key: OPENAI_API_KEY
- Cohere API key (for Parts 15-18): COHERE_API_KEY
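Before running a notebook, it can help to confirm that the variables listed above are set. This is a small optional check, not part of the notebooks; the helper name `missing_vars` and the required/optional grouping are assumptions made for illustration.

```python
import os

# Grouping is an assumption: only the OpenAI key is treated as required here.
REQUIRED = ["OPENAI_API_KEY"]
OPTIONAL = [
    "COHERE_API_KEY",        # needed for Parts 15-18
    "LANGCHAIN_TRACING_V2",  # LangSmith tracing (optional)
    "LANGCHAIN_ENDPOINT",
    "LANGCHAIN_API_KEY",
]


def missing_vars(names, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    for name in missing_vars(REQUIRED):
        print(f"Missing required variable: {name}")
    for name in missing_vars(OPTIONAL):
        print(f"Optional variable not set: {name}")
```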

📚 Learning Path

Recommended Order:

1. Start with Parts 1-4 to understand RAG fundamentals
2. Move to Parts 5-9 to learn query transformation techniques
3. Study Parts 10-11 for intelligent routing strategies
4. Explore Parts 12-14 for advanced indexing methods
5. Complete with Parts 15-18 to master retrieval optimization

🤝 Contributing

Feel free to open issues or submit pull requests for improvements!

📝 License

This project follows the original repository's licensing.

🙏 Acknowledgments

LangChain team for the excellent video series and educational content
Original repository: langchain-ai/rag-from-scratch
