A Retrieval-Augmented Generation (RAG) system for finding recipes with natural language search. The project uses semantic search to match your queries with relevant recipes, even when those recipes don't contain the exact words you typed.
- Natural Language Search: Find recipes using conversational queries like "healthy vegetarian dinner" or "quick breakfast ideas"
- Semantic Understanding: Leverages modern embedding models to understand the meaning behind your queries
- Text Chunking: Splits recipes into smaller pieces for more accurate semantic search
- Interactive UI: User-friendly Streamlit interface for searching and viewing recipes
- Flexible Data Sources: Use our sample recipe data or upload your own recipe CSV file
- Detailed Recipe Display: View ingredients, directions, nutrition information, and more
- LangChain: Framework for building applications with language models
- Sentence Transformers: For generating embeddings from recipe text
- FAISS: Vector database for efficient similarity search
- Streamlit: For the web-based user interface
- Pandas: For data processing and manipulation
- Clone this repository:
    git clone https://github.com/ShivaniNR/Recipe-Retriever-RAG.git
    cd Recipe-Retriever-RAG
- Create a virtual environment (optional but recommended):
    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
- Install the required dependencies:
    pip install -r requirements.txt
Recipe-Retriever-RAG/
│
├── data/                      # Data directory
│   ├── raw/                   # Raw recipe data (Food.com-recipes.csv)
│   ├── processed/             # Processed recipe data
│   └── vectorstore/           # Vector database files
│
├── src/                       # Source code
│   ├── data_preprocessing.py  # Data loading and preprocessing
│   ├── embeddings.py          # Embedding generation and search
│   └── setup_script.py        # One-time setup script
│
├── app.py                     # Streamlit web application
└── requirements.txt           # Project dependencies
┌─────────────────┐          ┌────────────────────────┐
│                 │          │                        │
│ Raw Recipe CSV  ├─────────►│      DataPipeline      │
│                 │          │                        │
└─────────────────┘          └────────────┬───────────┘
                                          │
                                          │ Processed CSV
                                          ▼
┌─────────────────┐          ┌────────────────────────┐
│                 │          │                        │
│   HuggingFace   │◄─────────┤ SimpleRecipeEmbeddings │
│      Model      │          │                        │
└─────────────────┘          └────────────┬───────────┘
                                          │
                                          │ Vector Embeddings
                                          ▼
┌─────────────────┐          ┌────────────────────────┐         ┌─────────────┐
│                 │          │                        │         │             │
│   User Query    ├─────────►│     Streamlit App      │◄────────┤ FAISS Vector│
│                 │          │                        │         │ Store       │
└─────────────────┘          └────────────┬───────────┘         └─────────────┘
                                          │
                                          ▼
                             ┌────────────────────────┐
                             │                        │
                             │     Search Results     │
                             │                        │
                             └────────────────────────┘
┌─────┐          ┌───────────────┐          ┌─────────────────┐          ┌──────────┐        ┌─────────┐
│Setup│          │ DataPipeline  │          │RecipeEmbeddings │          │  FAISS   │        │Streamlit│
└──┬──┘          └───────┬───────┘          └────────┬────────┘          └────┬─────┘        └────┬────┘
   │                     │                           │                        │                   │
   │ Run setup_script.py │                           │                        │                   │
   ├────────────────────►│                           │                        │                   │
   │                     │                           │                        │                   │
   │                     │ Load & process CSV        │                        │                   │
   │                     │◄──────────────────────────┤                        │                   │
   │                     │                           │                        │                   │
   │                     │ Return processed data     │                        │                   │
   │                     ├──────────────────────────►│                        │                   │
   │                     │                           │                        │                   │
   │                     │                           │ Create embeddings      │                   │
   │                     │                           ├───────────────────────►│                   │
   │                     │                           │                        │                   │
   │                     │                           │ Store vectors          │                   │
   │                     │                           │◄───────────────────────┤                   │
   │                     │                           │                        │                   │
┌──┴──┐          ┌───────┴───────┐          ┌────────┴────────┐          ┌────┴─────┐        ┌────┴────┐
│User │          │               │          │                 │          │          │        │         │
└──┬──┘          └───────────────┘          └─────────────────┘          └──────────┘        └────┬────┘
   │                                                                                              │
   │ Run Streamlit app                                                                            │
   ├─────────────────────────────────────────────────────────────────────────────────────────────►│
   │                                                                                              │
   │ Enter search query                                                                           │
   ├─────────────────────────────────────────────────────────────────────────────────────────────►│
   │                                                                                              │
   │             ┌───────────────┐          ┌─────────────────┐          ┌──────────┐             │
   │             │               │          │RecipeEmbeddings │          │  FAISS   │             │
   │             └───────┬───────┘          └────────┬────────┘          └────┬─────┘             │
   │                     │                           │                        │                   │
   │                     │                           │ Convert query to vector│                   │
   │                     │                           │◄───────────────────────┤                   │
   │                     │                           │                        │                   │
   │                     │                           │ Search similar vectors │                   │
   │                     │                           ├───────────────────────►│                   │
   │                     │                           │                        │                   │
   │                     │                           │ Return similar recipes │                   │
   │                     │                           │◄───────────────────────┤                   │
   │                     │                           │                        │                   │
   │ Display search results                                                                       │
   │◄─────────────────────────────────────────────────────────────────────────────────────────────┤
   │                                                                                              │
┌──┴──┐          ┌───────────────┐          ┌─────────────────┐          ┌──────────┐        ┌────┴────┐
│     │          │               │          │                 │          │          │        │         │
└─────┘          └───────────────┘          └─────────────────┘          └──────────┘        └─────────┘
The DataPipeline class handles the preprocessing of raw recipe data:
- Efficient data loading: Reads CSV data with optimized dtypes and chunking for memory efficiency
- Time parsing: Converts ISO duration formats to minutes
- Ingredient processing: Combines quantities and ingredient parts
- Recipe categorization: Automatically categorizes recipes by difficulty and time
- Searchable text creation: Generates optimized text for semantic search
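To make the time parsing and time-based categorization steps concrete, here is a minimal sketch. It is not the code in data_preprocessing.py: the function names, the supported duration subset (hours, minutes, and seconds only, so strings with a day component are rejected), and the 30/60-minute thresholds are all assumptions chosen for illustration.

```python
import re

# Matches ISO 8601 durations limited to hours/minutes/seconds, e.g. "PT1H30M".
# Durations with days or larger units are intentionally not handled here.
_DURATION = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")


def iso_duration_to_minutes(value):
    """Return whole minutes for strings like 'PT45M', or None if unparseable."""
    if not isinstance(value, str):
        return None
    match = _DURATION.match(value.strip().upper())
    if match is None or not any(match.groups()):
        return None
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 60 + minutes + seconds // 60


def time_category(total_minutes):
    """Bucket a recipe by total time. The thresholds are hypothetical."""
    if total_minutes is None:
        return "unknown"
    if total_minutes <= 30:
        return "quick"
    if total_minutes <= 60:
        return "medium"
    return "long"
```

For example, `iso_duration_to_minutes("PT1H30M")` gives 90, which `time_category` places in the "long" bucket under these assumed thresholds.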
The SimpleRecipeEmbeddings class manages the creation and search of recipe embeddings:
- Model loading: Uses HuggingFace's Sentence Transformers (default: 'all-MiniLM-L6-v2')
- Document preparation: Converts processed recipe data to LangChain Document objects
- Vector creation: Generates embeddings for each recipe
- FAISS integration: Stores embeddings in a FAISS vector database for efficient similarity search
- Search functionality: Provides semantic search capabilities with metadata filtering
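The document preparation and metadata filtering steps can be sketched without any third-party packages. The `RecipeDocument` class below is a stand-in that mirrors the `page_content` and `metadata` fields of a LangChain Document; the row keys (`name`, `ingredients`, `total_minutes`) and both helper functions are hypothetical and may not match the columns or methods used in embeddings.py.

```python
from dataclasses import dataclass, field


@dataclass
class RecipeDocument:
    """Stand-in for a LangChain Document: text to embed plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def to_document(row):
    """Build one document from a processed recipe row (a plain dict here)."""
    text = f"{row['name']}. Ingredients: {', '.join(row['ingredients'])}"
    metadata = {"name": row["name"], "total_minutes": row["total_minutes"]}
    return RecipeDocument(page_content=text, metadata=metadata)


def filter_by_metadata(documents, max_minutes):
    """Keep only documents whose total time fits within max_minutes."""
    return [d for d in documents if d.metadata["total_minutes"] <= max_minutes]


# Hypothetical sample rows, used only to exercise the helpers above.
SAMPLE_ROWS = [
    {"name": "Lentil Soup", "ingredients": ["lentils", "onion"], "total_minutes": 40},
    {"name": "Avocado Toast", "ingredients": ["bread", "avocado"], "total_minutes": 10},
]

docs = [to_document(row) for row in SAMPLE_ROWS]
quick_docs = filter_by_metadata(docs, max_minutes=30)
```

Keeping the filterable fields in `metadata` rather than in the embedded text means a constraint such as a time limit can be applied exactly, while the semantic match is left to the embeddings.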
The web interface provides a user-friendly way to interact with the recipe search system:
- Cached loading: Loads the vectorstore once and reuses it for later searches
- Search interface: Allows natural language queries
- Result display: Shows recipe details including ingredients, instructions, and metadata
- Suggestion buttons: Provides example queries for users to try
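The idea behind cached loading is that an expensive resource is built once and then reused. The sketch below shows that pattern with the standard library's `functools.lru_cache` so it runs without Streamlit; in a Streamlit app the analogous tool is the `st.cache_resource` decorator. Whether app.py uses that decorator, the `load_vectorstore` name, and the placeholder return value are assumptions.

```python
from functools import lru_cache

# Counts real loads so the caching effect is observable.
LOAD_COUNT = 0


@lru_cache(maxsize=1)
def load_vectorstore(path="data/vectorstore"):
    """Pretend to load a vector store from disk (placeholder object)."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    return {"path": path, "index": "placeholder"}


first = load_vectorstore()
second = load_vectorstore()  # served from the cache, no second load
```

Both calls return the same object, and the loader body runs only once.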
The setup script (src/setup_script.py) is run once and:
- Loads and preprocesses the raw recipe data
- Creates embeddings and stores them in a FAISS vector database
- Set up the system:
    python src/setup_script.py
- Run the Streamlit app:
    streamlit run app.py
- Enter your recipe search query in the search box and explore the results!
- Data Processing: Recipes are loaded and cleaned by the DataPipeline class
- Embedding Generation: The SimpleRecipeEmbeddings class uses a Sentence Transformer model to convert recipe text into vector embeddings
- Vector Storage: Embeddings are stored in a FAISS index for efficient similarity search
- Query Processing: When you enter a search query, it's converted to an embedding and compared to the recipe embeddings
- Result Retrieval: The most similar recipes are retrieved and displayed based on semantic similarity
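The query and retrieval steps above boil down to comparing one query vector against every stored recipe vector and keeping the closest matches. The sketch below does this in plain Python with cosine similarity on hand-written three-dimensional toy vectors. It is a stand-in only: in this project the vectors come from the Sentence Transformer model and the lookup is done by FAISS, whose index may use a different distance metric. The recipe names and vector values are made up.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vector, recipe_vectors, k=2):
    """Return the names of the k recipes most similar to the query vector."""
    scored = [
        (cosine_similarity(query_vector, vector), name)
        for name, vector in recipe_vectors.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]


# Toy embeddings; real ones have hundreds of dimensions.
RECIPES = {
    "lentil soup": [0.9, 0.1, 0.0],
    "chocolate cake": [0.0, 0.2, 0.9],
    "veggie stir fry": [0.8, 0.3, 0.1],
}

results = top_k([1.0, 0.2, 0.0], RECIPES, k=2)
```

With these toy values the query lands closest to "lentil soup", then "veggie stir fry", while "chocolate cake" points in a nearly orthogonal direction and is left out.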
- Implement text chunking for handling longer recipes
- Implement dietary restriction filtering
- Add ingredient substitution suggestions
- Add user accounts and saved favorites
This project is open source and available under the MIT License.
- Built with Streamlit
- Embedding models from Sentence Transformers
- Vector search with FAISS
- Recipe data from Food.com dataset