Recipe Retriever RAG 🍳

A Retrieval-Augmented Generation (RAG) system for finding recipes with natural language search. It uses semantic search to match your queries with relevant recipes, even when they don't contain the exact words you typed.

Features

  • Natural Language Search: Find recipes using conversational queries like "healthy vegetarian dinner" or "quick breakfast ideas"
  • Semantic Understanding: Leverages modern embedding models to understand the meaning behind your queries
  • Text Chunking: Splits recipes into smaller pieces for more accurate semantic search
  • Interactive UI: User-friendly Streamlit interface for searching and viewing recipes
  • Flexible Data Sources: Use our sample recipe data or upload your own recipe CSV file
  • Detailed Recipe Display: View ingredients, directions, nutrition information, and more
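
As a rough illustration of the text chunking feature listed above, the sketch below splits recipe text into overlapping word-based chunks. The function name `chunk_text` and the default sizes are hypothetical and are not taken from the project's source.

```python
def chunk_text(text: str, chunk_size: int = 50, overlap: int = 10) -> list:
    """Split text into chunks of at most `chunk_size` words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    chunk boundary still appears intact in one of the two chunks.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks
```

Smaller chunks tend to give more focused matches, while the overlap reduces the chance that a relevant phrase is split across two chunks.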

Technologies Used

  • LangChain: Framework for building applications with language models
  • Sentence Transformers: For generating embeddings from recipe text
  • FAISS: Similarity search library that serves as the vector store
  • Streamlit: For the web-based user interface
  • Pandas: For data processing and manipulation

Installation

  1. Clone this repository:

    git clone https://github.com/ShivaniNR/Recipe-Retriever-RAG.git
    cd Recipe-Retriever-RAG
    
  2. Create a virtual environment (optional but recommended):

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    
  3. Install the required dependencies:

    pip install -r requirements.txt
    

Project Structure

Recipe-Retriever-RAG/
│
├── data/                    # Data directory
│   ├── raw/                 # Raw recipe data (Food.com-recipes.csv)
│   ├── processed/           # Processed recipe data
│   └── vectorstore/         # Vector database files
│
├── src/                     # Source code
│   ├── data_preprocessing.py # Data loading and preprocessing
│   ├── embeddings.py        # Embedding generation and search
│   └── setup_script.py      # One-time setup script
│
├── app.py                   # Streamlit web application
└── requirements.txt         # Project dependencies

Implementation Details

Component Architecture

┌─────────────────┐          ┌───────────────────┐
│                 │          │                   │
│  Raw Recipe CSV ├─────────►│   DataPipeline    │
│                 │          │                   │
└─────────────────┘          └─────────┬─────────┘
                                       │
                                       │ Processed CSV
                                       ▼
┌─────────────────┐          ┌────────────────────────┐
│                 │          │                        │
│  HuggingFace    │◄─────────┤ SimpleRecipeEmbeddings │
│  Model          │          │                        │
│                 │          └──────────┬─────────────┘
└─────────────────┘                     │
                                        │ Vector Embeddings
                                        ▼
┌─────────────────┐          ┌─────────────────────┐         ┌─────────────┐
│                 │          │                     │         │             │
│  User Query     ├─────────►│   Streamlit App     │◄────────┤ FAISS Vector│
│                 │          │                     │         │ Store       │
└─────────────────┘          └─────────────────────┘         │             │
                                       │                     └─────────────┘
                                       │
                                       ▼
                              ┌─────────────────────┐
                              │                     │
                              │   Search Results    │
                              │                     │
                              └─────────────────────┘

Sequence Diagram

┌─────┐          ┌───────────────┐          ┌─────────────────┐          ┌──────────┐          ┌─────────┐
│Setup│          │DataPipeline   │          │RecipeEmbeddings │          │FAISS     │          │Streamlit│
└──┬──┘          └───────┬───────┘          └────────┬────────┘          └────┬─────┘          └────┬────┘
   │                     │                           │                        │                     │
   │ Run setup_script.py │                           │                        │                     │
   ├────────────────────►│                           │                        │                     │
   │                     │                           │                        │                     │
   │                     │ Load & process CSV        │                        │                     │
   │                     │◄──────────────────────────┤                        │                     │
   │                     │                           │                        │                     │
   │                     │ Return processed data     │                        │                     │
   │                     ├──────────────────────────►│                        │                     │
   │                     │                           │                        │                     │
   │                     │                           │ Create embeddings      │                     │
   │                     │                           ├───────────────────────►│                     │
   │                     │                           │                        │                     │
   │                     │                           │ Store vectors          │                     │
   │                     │                           │◄───────────────────────┤                     │
   │                     │                           │                        │                     │
   │                     │                           │                        │                     │
   │                     │                           │                        │                     │
┌──┴──┐          ┌───────┴───────┐          ┌────────┴────────┐          ┌────┴─────┐          ┌────┴────┐
│User │          │               │          │                 │          │          │          │         │
└──┬──┘          └───────────────┘          └─────────────────┘          └──────────┘          └────┬────┘
   │                                                                                                │
   │ Run Streamlit app                                                                              │
   ├─────────────────────────────────────────────────────────────────────────────────────────────►│
   │                                                                                                │
   │ Enter search query                                                                             │
   ├─────────────────────────────────────────────────────────────────────────────────────────────►│
   │                                                                                                │
   │                     ┌───────────────┐          ┌────────┬────────┐          ┌────┬─────┐     │
   │                     │               │          │RecipeEmbeddings │          │FAISS     │     │
   │                     └───────┬───────┘          └────────┬────────┘          └────┬─────┘     │
   │                             │                           │                        │           │
   │                             │                           │ Convert query to vector│           │
   │                             │                           │◄───────────────────────┤           │
   │                             │                           │                        │           │
   │                             │                           │ Search similar vectors │           │
   │                             │                           ├───────────────────────►│           │
   │                             │                           │                        │           │
   │                             │                           │ Return similar recipes │           │
   │                             │                           │◄───────────────────────┤           │
   │                             │                           │                        │           │
   │                             │                           │                        │           │
   │ Display search results                                                                        │
   │◄─────────────────────────────────────────────────────────────────────────────────────────────┤
   │                                                                                                │
┌──┴──┐          ┌───────────────┐          ┌─────────────────┐          ┌──────────┐          ┌────┴────┐
│     │          │               │          │                 │          │          │          │         │
└─────┘          └───────────────┘          └─────────────────┘          └──────────┘          └─────────┘

Core Components

1. Data Preprocessing (src/data_preprocessing.py)

The DataPipeline class handles the preprocessing of raw recipe data:

  • Efficient data loading: Reads the CSV in chunks with explicit dtypes to limit memory use
  • Time parsing: Converts ISO duration formats to minutes
  • Ingredient processing: Combines quantities and ingredient parts
  • Recipe categorization: Automatically categorizes recipes by difficulty and time
  • Searchable text creation: Generates optimized text for semantic search
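
The time parsing and time-based categorization steps above could look roughly like the following. This is a minimal sketch, not the actual `DataPipeline` code: the function names and the category thresholds are assumptions, and only durations of the form `PT#H#M#S` are handled (no day or week components).

```python
import re
from typing import Optional

# ISO 8601 durations limited to hours, minutes, and seconds, e.g. "PT1H30M".
_DURATION = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")


def iso_duration_to_minutes(value: str) -> Optional[int]:
    """Convert a duration such as "PT1H30M" to whole minutes, or None if invalid."""
    match = _DURATION.match(value.strip().upper())
    if match is None or not any(match.groups()):
        return None  # not a duration, or a bare "PT" with no components
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 60 + minutes + seconds // 60


def time_category(minutes: Optional[int]) -> str:
    """Bucket a recipe by total time. The thresholds are illustrative guesses."""
    if minutes is None:
        return "unknown"
    if minutes <= 30:
        return "quick"
    if minutes <= 60:
        return "medium"
    return "long"
```

Returning `None` for unparseable values keeps bad rows visible downstream instead of silently treating them as zero-minute recipes.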

2. Embeddings Generation (src/embeddings.py)

The SimpleRecipeEmbeddings class manages the creation and search of recipe embeddings:

  • Model loading: Uses HuggingFace's Sentence Transformers (default: 'all-MiniLM-L6-v2')
  • Document preparation: Converts processed recipe data to LangChain Document objects
  • Vector creation: Generates embeddings for each recipe
  • FAISS integration: Stores embeddings in a FAISS vector database for efficient similarity search
  • Search functionality: Provides semantic search capabilities with metadata filtering
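
To make the document preparation and metadata filtering steps more concrete, here is a dependency-free sketch. `RecipeDocument` is a stand-in that mirrors the `page_content` and `metadata` attributes of a LangChain Document. The column names (`name`, `ingredients`, `category`, `total_minutes`) and both helper functions are hypothetical, not the project's actual schema or API.

```python
from dataclasses import dataclass, field


@dataclass
class RecipeDocument:
    """Stand-in for a LangChain Document: searchable text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def row_to_document(row: dict) -> RecipeDocument:
    """Build the text to embed and keep structured fields as metadata."""
    # Column names are assumptions made for this example.
    text = f"{row['name']}. Ingredients: {', '.join(row['ingredients'])}."
    metadata = {
        "name": row["name"],
        "category": row["category"],
        "total_minutes": row["total_minutes"],
    }
    return RecipeDocument(page_content=text, metadata=metadata)


def filter_by_metadata(docs: list, **criteria) -> list:
    """Keep documents whose metadata equals every given key/value pair."""
    return [
        doc for doc in docs
        if all(doc.metadata.get(key) == value for key, value in criteria.items())
    ]
```

Keeping fields such as category in metadata rather than only in the embedded text allows exact filtering on them after (or alongside) the similarity search.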

3. Streamlit Application (app.py)

The web interface provides a user-friendly way to interact with the recipe search system:

  • Cached loading: Loads the vector store once and reuses the cached instance
  • Search interface: Allows natural language queries
  • Result display: Shows recipe details including ingredients, instructions, and metadata
  • Suggestion buttons: Provides example queries for users to try
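
The cached loading idea can be sketched without Streamlit by using `functools.lru_cache` as a stand-in. In a Streamlit app the same load-once pattern is typically written with the `st.cache_resource` decorator. The loader below is hypothetical and returns a placeholder dictionary instead of a real FAISS index.

```python
from functools import lru_cache

load_calls = 0  # counts how often the expensive load actually runs


@lru_cache(maxsize=1)
def load_vectorstore(path: str = "data/vectorstore") -> dict:
    """Pretend to load an index from disk; the returned dict is a placeholder."""
    global load_calls
    load_calls += 1
    return {"path": path, "loaded": True}


# Repeated calls return the same cached object, so the load runs only once.
first = load_vectorstore()
second = load_vectorstore()
```

Caching matters here because Streamlit reruns the script on every interaction, and reloading an index on each rerun would make every search slow.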

4. Setup Script (src/setup_script.py)

A one-time setup script that:

  • Loads and preprocesses the raw recipe data
  • Creates embeddings and stores them in a FAISS vector database

Usage

  1. Set up the system:

    python src/setup_script.py
    
  2. Run the Streamlit app:

    streamlit run app.py
    
  3. Enter your recipe search query in the search box and explore the results!

How It Works

  1. Data Processing: Recipes are loaded and cleaned by the DataPipeline class
  2. Embedding Generation: The SimpleRecipeEmbeddings class uses a Sentence Transformer model to convert recipe text into vector embeddings
  3. Vector Storage: Embeddings are stored in a FAISS index for efficient similarity search
  4. Query Processing: When you enter a search query, it's converted to an embedding and compared to the recipe embeddings
  5. Result Retrieval: The most similar recipes are retrieved and displayed based on semantic similarity
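
The query and retrieval steps above come down to embedding the query and ranking recipes by vector similarity. The sketch below shows that ranking with cosine similarity in plain Python. The `embed` function is a toy bag-of-words stand-in, not a Sentence Transformer, so it matches shared words only and is not semantic. The linear scan stands in for the FAISS index. All names are assumptions made for this example.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(query: str, recipes: list, k: int = 2) -> list:
    """Return the k recipes most similar to the query as (score, recipe) pairs."""
    query_vector = embed(query)
    scored = [(cosine(query_vector, embed(recipe)), recipe) for recipe in recipes]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


recipes = [
    "quick vegetarian pasta dinner",
    "chocolate cake dessert",
    "vegetarian breakfast burrito",
]
results = search("vegetarian dinner", recipes, k=2)
```

With a real embedding model, a query such as "meatless supper" could also rank the pasta recipe highly even though no words overlap, which is the advantage of semantic search over keyword matching.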

Future Improvements

  • Improve text chunking to better handle longer recipes
  • Implement dietary restriction filtering
  • Add ingredient substitution suggestions
  • Add user accounts and saved favorites

License

This project is open source and available under the MIT License.
