This project implements a multi-layer review fraud detection system with:
- Textual layer (50% weight)
- Behavioral layer (30% weight)
- Temporal layer (20% weight)
It integrates customer, product, and review datasets, computes a Final Fraud Risk Score (0–100), aggregates scores at the product level, and exposes an interactive Streamlit dashboard.
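The weighting above can be read as a plain weighted sum. The following is a minimal sketch, assuming each layer already produces a score on a 0-100 scale; the function name final_fraud_score and the clamping step are illustrative assumptions, not taken from the project code.

```python
# Layer weights as stated above (textual 50%, behavioral 30%, temporal 20%).
WEIGHTS = {"textual": 0.5, "behavioral": 0.3, "temporal": 0.2}


def final_fraud_score(textual: float, behavioral: float, temporal: float) -> float:
    """Combine per-layer scores (assumed to be on 0-100) into one weighted score."""
    raw = (
        WEIGHTS["textual"] * textual
        + WEIGHTS["behavioral"] * behavioral
        + WEIGHTS["temporal"] * temporal
    )
    # Clamp defensively so the result stays within 0-100 (an assumption,
    # useful if a layer ever returns a value slightly out of range).
    return max(0.0, min(100.0, raw))


if __name__ == "__main__":
    # Example: a strongly suspicious text with fairly ordinary behavior and timing.
    print(final_fraud_score(textual=80, behavioral=40, temporal=20))
```

Because the weights sum to 1, the combined score stays on the same scale as the layer scores.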
The expected structure is:
fraud_detection_project/
│
├── data/
│ ├── customers.csv # or customer_preprocessed_only_200_rows.csv
│ ├── products.csv # or product_preprocessed_only_200_rows.csv
│ └── reviews.xlsx # or textual_temporal_200_rows.csv.xlsx
│
├── models/
│ └── fraud_model.py
│
├── utils/
│ ├── textual_layer.py
│ ├── behavioral_layer.py
│ └── temporal_layer.py
│
├── app.py
├── setup_env.py
├── requirements.txt
└── README.md
Note: If you prefer to keep your original filenames, just copy them into the data folder. The code will automatically try both the generic names (customers.csv, products.csv, reviews.xlsx) and the original filenames (customer_preprocessed_only_200_rows.csv, product_preprocessed_only_200_rows.csv, textual_temporal_200_rows.csv.xlsx).
From the fraud_detection_project folder:
    python -m venv venv
On Windows (PowerShell):
    .\venv\Scripts\Activate
On macOS / Linux:
    source venv/bin/activate
You can either install manually:
    pip install -r requirements.txt
or let the automation script do everything for you (recommended):
    python setup_env.py
The setup_env.py script will:
- Create the venv virtual environment (if it doesn't exist)
- Install all dependencies from requirements.txt inside venv
- Download NLTK data: punkt, stopwords
- Download spaCy model: en_core_web_sm
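The steps performed by setup_env.py can be sketched as a list of commands. This is only an illustration of the sequence described above, not the actual script: the helper names venv_python and build_setup_commands are hypothetical, and the sketch only builds the commands without running anything.

```python
import os
import sys
from pathlib import Path
from typing import List


def venv_python(venv_dir: Path) -> Path:
    """Path of the Python interpreter inside the virtual environment."""
    if os.name == "nt":  # Windows layout
        return venv_dir / "Scripts" / "python.exe"
    return venv_dir / "bin" / "python"  # macOS / Linux layout


def build_setup_commands(venv_dir: Path = Path("venv")) -> List[List[str]]:
    """Return the setup commands in order; nothing is executed here."""
    py = str(venv_python(venv_dir))
    commands: List[List[str]] = []
    if not venv_dir.exists():
        # Create the environment only if it is missing.
        commands.append([sys.executable, "-m", "venv", str(venv_dir)])
    commands += [
        [py, "-m", "pip", "install", "-r", "requirements.txt"],
        [py, "-m", "nltk.downloader", "punkt", "stopwords"],
        [py, "-m", "spacy", "download", "en_core_web_sm"],
    ]
    return commands


if __name__ == "__main__":
    for cmd in build_setup_commands():
        print(" ".join(cmd))
```

To actually run the steps, each command could be passed to subprocess.run(cmd, check=True), so that a failing step stops the setup.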
Place your files in the data folder. The loader will automatically try multiple naming options:
- Customers
  - data/customers.csv
  - data/customer_preprocessed_only_200_rows.csv
- Products
  - data/products.csv
  - data/product_preprocessed_only_200_rows.csv
- Reviews
  - data/reviews.xlsx
  - data/textual_temporal_200_rows.csv.xlsx
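The fallback lookup described above might look roughly like this. It is a sketch under assumptions: the names CANDIDATES and find_data_file are hypothetical, and the real loader may order or handle the candidates differently.

```python
from pathlib import Path
from typing import Optional

# Candidate file names per dataset, generic name first, original name second.
CANDIDATES = {
    "customers": ["customers.csv", "customer_preprocessed_only_200_rows.csv"],
    "products": ["products.csv", "product_preprocessed_only_200_rows.csv"],
    "reviews": ["reviews.xlsx", "textual_temporal_200_rows.csv.xlsx"],
}


def find_data_file(kind: str, data_dir: Path = Path("data")) -> Optional[Path]:
    """Return the first existing candidate for `kind`, or None if none exist."""
    for name in CANDIDATES[kind]:
        path = data_dir / name
        if path.is_file():
            return path
    return None


if __name__ == "__main__":
    for kind in CANDIDATES:
        print(kind, "->", find_data_file(kind))
```

The returned path can then be handed to pandas.read_csv for the CSV files or pandas.read_excel for the review workbook.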
Required logical columns (names can vary slightly; see code for fallbacks/aliases):
- Customer_ID
- Product_ID
- Review_ID (or equivalent)
- Review_Text
- Optional: Text_Fraud_Probability
- Behavioral:
  - Account_Age
  - Review_Frequency
  - Refund_Ratio
  - Verified_Purchase_Ratio
  - Average_Rating_By_User
- Temporal:
  - Reviews_Per_Day
  - Burst_Flag
  - Review_Date
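One way to handle column names that vary slightly is an alias table that maps each logical column to the first matching name in the file. The sketch below is illustrative only: the alias spellings and the function resolve_columns are hypothetical, and the project's actual fallbacks are defined in its code.

```python
from typing import Dict, Iterable, List

# Hypothetical alias table; only two logical columns are shown for brevity.
ALIASES: Dict[str, List[str]] = {
    "Customer_ID": ["Customer_ID", "customer_id", "CustomerID"],
    "Review_Text": ["Review_Text", "review_text", "Text"],
}


def resolve_columns(available: Iterable[str]) -> Dict[str, str]:
    """Map each logical column to the first alias present in `available`."""
    present = set(available)
    resolved: Dict[str, str] = {}
    for logical, options in ALIASES.items():
        match = next((name for name in options if name in present), None)
        if match is None:
            # Fail early with a message that lists what was tried.
            raise KeyError(f"No column found for {logical!r}; tried {options}")
        resolved[logical] = match
    return resolved


if __name__ == "__main__":
    print(resolve_columns(["customer_id", "Text", "Rating"]))
```

Raising an error for a missing required column keeps the failure close to the data-loading step instead of surfacing later in a scoring layer.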
After the environment is ready and data files are in data/:
    streamlit run app.py
The dashboard provides:
- Dropdown selection by Review_ID
- Display of:
  - Review text
  - Textual score
  - Behavioral score
  - Temporal score
  - Final weighted fraud risk score
  - Risk level (Low / Moderate / High)
- Product-level metrics:
  - Average fraud score per product
  - Suspicious review ratio per product
  - Product authenticity score
- Basic charts:
  - Distribution of fraud scores
  - Per-product fraud metrics
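The risk level and the product-level metrics listed above could be derived from the final scores along these lines. Everything specific here is an assumption made for illustration: the cut-offs of 40 and 70, the rule that a review counts as suspicious when it reaches the High band, and the definition of the authenticity score as 100 minus the average fraud score.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# Assumed cut-offs; the project's actual thresholds may differ.
MODERATE_FROM = 40.0
HIGH_FROM = 70.0


def risk_level(score: float) -> str:
    """Bucket a 0-100 fraud score into Low / Moderate / High."""
    if score >= HIGH_FROM:
        return "High"
    if score >= MODERATE_FROM:
        return "Moderate"
    return "Low"


def product_metrics(reviews: List[Tuple[str, float]]) -> Dict[str, Dict[str, float]]:
    """Aggregate (product_id, fraud_score) pairs into per-product metrics."""
    by_product: Dict[str, List[float]] = defaultdict(list)
    for product_id, score in reviews:
        by_product[product_id].append(score)

    metrics: Dict[str, Dict[str, float]] = {}
    for product_id, scores in by_product.items():
        avg = mean(scores)
        metrics[product_id] = {
            "avg_fraud_score": avg,
            # Share of this product's reviews that fall in the High band.
            "suspicious_ratio": sum(s >= HIGH_FROM for s in scores) / len(scores),
            # Assumed definition: complement of the average fraud score.
            "authenticity_score": 100.0 - avg,
        }
    return metrics


if __name__ == "__main__":
    sample = [("P1", 80.0), ("P1", 20.0), ("P2", 10.0)]
    print(risk_level(56.0))
    print(product_metrics(sample))
```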
- Random operations (e.g. train/test splits, model initialization) use a fixed random seed for reproducibility.
- All key hyperparameters and column names are defined in one place in the code and can be adjusted easily.
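The two points above can be combined by keeping the seed next to the other settings in a single configuration object. The sketch below is an assumption about how this might be organized; the class Config, its field names, and the values 42 and 0.2 are hypothetical and not taken from the project code.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """Single place for tunable settings (all values here are illustrative)."""
    seed: int = 42
    test_size: float = 0.2
    review_id_col: str = "Review_ID"
    review_text_col: str = "Review_Text"
    review_date_col: str = "Review_Date"


CONFIG = Config()


def seeded_rng(config: Config = CONFIG) -> random.Random:
    """Return a random generator that is reproducible for a given seed."""
    return random.Random(config.seed)


if __name__ == "__main__":
    # Two generators built from the same seed produce the same shuffle.
    first, second = list(range(10)), list(range(10))
    seeded_rng().shuffle(first)
    seeded_rng().shuffle(second)
    print(first == second)
```

Libraries with their own random state need the same seed passed explicitly, for example through the random_state parameter of scikit-learn's train_test_split.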