Set up a Python environment for the project.
Use venv or conda to manage dependencies:
```shell
python -m venv .venv
source .venv/bin/activate   # On macOS/Linux
.venv\Scripts\activate      # On Windows
```

Or with conda:

```shell
conda create --name book-recommender python=3.9
conda activate book-recommender
```

Run the following command to install dependencies:
```shell
pip install kagglehub pandas matplotlib seaborn python-dotenv \
  langchain-community langchain-openai langchain-chroma gradio \
  transformers jupyter ipywidgets
```

| Package | Description |
|---|---|
| kagglehub | Access and download datasets from Kaggle easily. |
| pandas | Data manipulation and analysis library, useful for handling structured data. |
| matplotlib | Visualization library for creating static, animated, and interactive plots. |
| seaborn | Statistical data visualization library built on top of matplotlib. |
| python-dotenv | Load environment variables from a .env file to manage secrets securely. |
| langchain-community | Community-supported extensions for working with LangChain. |
| langchain-openai | OpenAI API integration for LangChain applications. |
| langchain-chroma | ChromaDB integration for vector database storage and retrieval. |
| gradio | Create interactive web interfaces for machine learning models easily. |
| transformers | Hugging Face library for working with pre-trained transformer models. |
| jupyter | Interactive computing environment for writing and running Python code. |
| ipywidgets | Interactive widgets for Jupyter notebooks to enhance user experience. |
To start the Jupyter Notebook, run:

```shell
jupyter notebook
```

After cleaning the data, we use word embeddings and vector search to find similarities and dissimilarities between words. The process involves:
- Creating distance between words that are dissimilar.
- Relying on word embedding models that learn from how words are used in context.
- Word2Vec: learns which words immediately precede and follow a given word.
- Transforming words into embeddings and adding positional embeddings to encode each word's position.
- Feeding these embeddings into a self-attention mechanism to understand word relationships within a sentence.
- Generating self-attention vectors for each word and averaging them over multiple iterations.
- This process of generating and normalizing self-attention vectors is called the encoder block.
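The embedding and self-attention steps above can be sketched in plain NumPy. This is a toy illustration with made-up dimensions and random weights, not a trained model:

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8  # 4 tokens, toy embedding size

# Word embeddings plus positional embeddings (both random stand-ins here).
word_emb = rng.normal(size=(seq_len, d_model))
pos_emb = rng.normal(size=(seq_len, d_model))
x = word_emb + pos_emb

# Scaled dot-product self-attention with random projection matrices.
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
Q, K, V = x @ W_q, x @ W_k, x @ W_v

scores = Q @ K.T / np.sqrt(d_model)             # token-to-token affinities
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row
attended = weights @ V                          # one self-attention vector per token

print(attended.shape)  # (4, 8)
```

Each row of `attended` is a context-aware vector for one token; stacking several such blocks (with normalization) is what the encoder does.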
- Encoder Block: Learns all relationships between words in the source language.
- Sends its output to the Decoder, which relates words in the target language and uses the encoder output to predict the most likely translation.
- Encoder-Only Models (e.g., RoBERTa): Trained to predict a masked word in text.
- Tokenizes the text and adds special `[CLS]` and `[SEP]` tokens to mark the beginning and end.
- Applies word embeddings and self-attention in encoder blocks.
- Learns internal representations of language structure to improve accuracy.
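The special-token step can be illustrated with a toy sketch (real tokenizers such as Hugging Face's handle this automatically; the token list below is invented):

```python
def add_special_tokens(tokens):
    """Wrap a token sequence with [CLS] and [SEP] markers, as BERT-style tokenizers do."""
    return ["[CLS]"] + tokens + ["[SEP]"]

tokens = ["the", "midnight", "library"]
print(add_special_tokens(tokens))
# ['[CLS]', 'the', 'midnight', 'library', '[SEP]']
```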
- Document Embedding: Identifies whether documents are similar or dissimilar based on embeddings.
- We match embeddings to generate book recommendations.
- Currently using a linear search approach.
- Exploring vector indexing databases for grouping similar vectors efficiently.
- Tradeoff exists between speed and accuracy in search optimization.
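The linear search mentioned above can be sketched as a brute-force cosine-similarity scan. The embeddings here are random stand-ins for real model output:

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

rng = np.random.default_rng(1)
book_embeddings = rng.normal(size=(1000, 384))  # one vector per book (made-up data)
query = rng.normal(size=384)

# Linear search: score every book against the query, O(n) in the number of books.
scores = [cosine_sim(query, emb) for emb in book_embeddings]
top5 = np.argsort(scores)[::-1][:5]  # indices of the 5 most similar books
print(top5)
```

A vector index (such as the one ChromaDB provides) replaces this exhaustive scan with an approximate sub-linear lookup, which is where the speed/accuracy tradeoff comes from.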
- LangChain is a Python framework offering various LLM functionalities.
- Used for creating Retrieval-Augmented Generation (RAG) pipelines and chatbots.
- Provides state-of-the-art AI capabilities without locking the project into a single LLM provider.
- Text classification is a branch of NLP that assigns text to categories.
- Zero-shot classification can categorize books into different groups without labeled training data.
- Using Hugging Face’s `transformers` library, we apply zero-shot learning to classify books by genre, topic, or audience.
- This step helps refine recommendations by filtering books based on user preferences.
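With the `transformers` pipeline, zero-shot classification might look like this (the model choice is illustrative, and the first run downloads the model weights):

```python
from transformers import pipeline

# Zero-shot classifier: no labeled book data needed, just candidate label names.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

result = classifier(
    "A detective unravels a string of murders in 1920s London.",
    candidate_labels=["fiction", "non-fiction"],
)
print(result["labels"][0])  # label with the highest score
```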
- To provide users with an additional degree of control, we fine-tune our LLM to classify emotion.
- We start from the pre-trained RoBERTa model and its encoder layers.
- Instead of predicting masked words, we replace the last layer with an emotion classification layer.
- This helps categorize books based on emotional tone, improving recommendations.
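Conceptually, the swapped-in classification layer is just a linear projection plus softmax over the encoder's pooled output. A NumPy sketch with made-up weights (768 is RoBERTa-base's hidden size; the 7 emotion classes are an assumption):

```python
import numpy as np

rng = np.random.default_rng(2)
d_model, num_emotions = 768, 7  # RoBERTa-base hidden size; 7 emotion classes (assumed)

pooled = rng.normal(size=d_model)             # stand-in for the encoder's [CLS] representation
W = rng.normal(size=(d_model, num_emotions))  # classification layer, learned during fine-tuning
b = np.zeros(num_emotions)

logits = pooled @ W + b
probs = np.exp(logits - logits.max())
probs /= probs.sum()                          # softmax: one probability per emotion
print(probs.argmax())                         # index of the predicted emotion
```

During fine-tuning only this small layer (and optionally the encoder) is updated on emotion-labeled text.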

- We build a vector database of book summaries that lets us retrieve the texts most similar to a query.
- Text classification is used to determine if a book is fiction or non-fiction.
- After classification, we analyze the emotional tone of the book.
- We create an interactive Gradio dashboard, an open-source Python package, to visualize and explore recommendations dynamically.
- We wrap the Gradio app in FastAPI (required for deployment on Vercel).
Feel free to fork this repository and submit pull requests. Contributions are welcome!