Phishing URL Detection System

Machine learning–based phishing detection that classifies URLs as Phishing or Legitimate, with a risk score and explainable features. Built with Python, scikit-learn, and Flask.

Features

- URL features: length, dots, subdomains, HTTPS, suspicious keywords, URL shorteners, entropy
- Domain features: WHOIS age, DNS records, abnormal patterns (optional; batch processing skips these slow lookups)
- Content features (optional): HTML forms, iframes, redirects, urgency language
- Models: Logistic Regression (baseline) and Random Forest (primary), tuned for high recall
- API: Flask web UI and REST API returning a classification, a risk score (0–100), and the top contributing features
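To make the URL-level features concrete, here is a minimal, standard-library-only sketch of how such lexical features can be computed. It is not the project's feature_extraction/url_features.py. The function names, the keyword list, and the shortener list are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter
from urllib.parse import urlparse

# Hypothetical lists for illustration only; a real extractor would use curated ones.
SUSPICIOUS_KEYWORDS = ("login", "verify", "update", "secure", "account", "bank")
SHORTENER_DOMAINS = {"bit.ly", "tinyurl.com", "goo.gl", "t.co", "ow.ly"}


def shannon_entropy(text: str) -> float:
    """Shannon entropy in bits per character (0.0 for an empty string)."""
    if not text:
        return 0.0
    total = len(text)
    probs = (n / total for n in Counter(text).values())
    return sum(-p * math.log2(p) for p in probs)


def url_features(url: str) -> dict:
    """Compute simple lexical features for one URL (hypothetical helper)."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()  # hostname drops any :port suffix
    labels = host.split(".") if host else []
    lowered = url.lower()
    return {
        "url_length": len(url),
        "num_dots": url.count("."),
        # Labels in front of "domain.tld", e.g. a.b.example.com -> 2.
        # Naive: multi-part suffixes such as .co.uk are not handled.
        "num_subdomains": max(len(labels) - 2, 0),
        "uses_https": int(parsed.scheme == "https"),
        "num_suspicious_keywords": sum(k in lowered for k in SUSPICIOUS_KEYWORDS),
        "is_shortener": int(host in SHORTENER_DOMAINS),
        "entropy": shannon_entropy(url),
    }
```

Every feature here is a plain number, so the resulting dicts can be turned into a numeric matrix for scikit-learn. Entropy is included because randomly generated hostnames and paths tend to have a more uniform character distribution than human-chosen ones.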

Project Structure

├── config.py
├── run_training.py
├── requirements.txt
├── ETHICS_AND_USE.md
├── data/
│   ├── raw/                    # CSV dataset (url, label)
│   ├── processed/              # Extracted features
│   ├── download_sample_data.py
│   └── download_uci_phishing.py
├── feature_extraction/
│   ├── url_features.py
│   ├── domain_features.py
│   ├── content_features.py
│   └── extractor.py
├── model_training/
│   ├── pipeline.py
│   └── train.py
├── evaluation/
│   └── metrics.py
├── utils/
│   ├── safe_url.py
│   └── data_loader.py
├── deployment/
│   ├── predictor.py
│   └── app.py
└── models/                     # Saved model artifacts

Setup

cd "Phishing Detection"
python -m venv venv
venv\Scripts\activate          # Windows
source venv/bin/activate       # Linux/macOS
pip install -r requirements.txt

Dataset

Use a CSV with columns url and label (1 = phishing, 0 = legitimate). Place it at data/raw/phishing_dataset.csv.
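Before training, it can help to check that the CSV actually matches this format. The sketch below is a hypothetical standard-library helper (load_labeled_urls is not part of the project and is not utils/data_loader.py); it assumes a UTF-8 file with a header row and rejects any label other than 0 or 1.

```python
import csv
from pathlib import Path


def load_labeled_urls(path):
    """Read a url,label CSV and return a list of (url, label) pairs.

    Raises ValueError if a required column is missing or a label is not 0/1.
    Rows with an empty url are skipped.
    """
    rows = []
    with Path(path).open(newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        missing = {"url", "label"} - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"missing column(s): {sorted(missing)}")
        # Line numbers assume one record per physical line (header is line 1).
        for line_no, row in enumerate(reader, start=2):
            label = (row["label"] or "").strip()
            if label not in ("0", "1"):
                raise ValueError(f"line {line_no}: label must be 0 or 1, got {label!r}")
            url = (row["url"] or "").strip()
            if url:
                rows.append((url, int(label)))
    return rows
```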

To download the UCI PhiUSIIL dataset:

python data/download_uci_phishing.py

This fetches the dataset from the UCI repository and saves it in the url/label format described above. Then run training.

Training

Run training from the project root, with PYTHONPATH pointing at the root so the project's packages can be imported:

Windows (PowerShell):

$env:PYTHONPATH = (Get-Location).Path
python run_training.py

Windows (CMD):

set PYTHONPATH=%CD%
python run_training.py

Linux/macOS:

export PYTHONPATH=.
python run_training.py

Training loads the dataset, extracts features, trains Logistic Regression and Random Forest with cross-validation, and saves the best-performing model to models/.
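The training flow described above can be sketched with scikit-learn as follows. This is an illustration under assumptions, not the project's model_training/train.py: the function name train_and_select, the hyperparameters, the use of class_weight="balanced", and the choice of mean cross-validated recall as the selection criterion are all hypothetical stand-ins for "tuned for high recall".

```python
from pathlib import Path

import joblib
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def train_and_select(X, y, model_dir=None, n_splits=5, random_state=42):
    """Cross-validate two models on recall, then refit the best one on all data.

    Returns (best_name, fitted_model, mean_recall_per_model).
    """
    candidates = {
        # Scaling helps logistic regression converge on raw feature counts.
        # class_weight="balanced" upweights the minority class during training.
        "logistic_regression": make_pipeline(
            StandardScaler(),
            LogisticRegression(max_iter=1000, class_weight="balanced"),
        ),
        "random_forest": RandomForestClassifier(
            n_estimators=100, class_weight="balanced", random_state=random_state
        ),
    }
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    # cross_val_score clones each estimator, so the candidates stay unfitted here.
    scores = {
        name: cross_val_score(model, X, y, cv=cv, scoring="recall").mean()
        for name, model in candidates.items()
    }
    best_name = max(scores, key=scores.get)
    best_model = candidates[best_name].fit(X, y)
    if model_dir is not None:
        out = Path(model_dir)
        out.mkdir(parents=True, exist_ok=True)
        joblib.dump(best_model, out / f"{best_name}.joblib")
    return best_name, best_model, scores
```

Selecting on recall alone keeps the sketch short, but a model that flags everything would score perfect recall, so in practice precision (or an F-beta score that weights recall) is usually tracked alongside it.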

Prediction API & Web UI

Start the Flask app from the project root (with PYTHONPATH set as for training):

python deployment/app.py

Web UI: open http://127.0.0.1:5000/ and enter a URL to get its classification, risk score, and top features.
REST API:
- GET /api/predict?url=https://example.com (URL-encode the url value if it contains ? or &)
- POST /api/predict with JSON body {"url": "https://example.com"}
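A minimal client for the two REST calls, using only the Python standard library, might look like the sketch below. The helper names predict_get and predict_post are hypothetical, and the sketch assumes the server answers with a JSON object and runs at the default address shown above.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_BASE = "http://127.0.0.1:5000"  # default local Flask address from above


def predict_get(url, base=API_BASE, timeout=10):
    """Call GET /api/predict; urlencode keeps any ? or & in the target URL intact."""
    query = urlencode({"url": url})
    with urlopen(f"{base}/api/predict?{query}", timeout=timeout) as resp:
        return json.load(resp)


def predict_post(url, base=API_BASE, timeout=10):
    """Call POST /api/predict with a JSON body."""
    body = json.dumps({"url": url}).encode("utf-8")
    req = Request(
        f"{base}/api/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

With the app running, predict_post("https://example.com") should return a dict whose keys are the response fields listed below.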

Response includes classification, risk_score (0–100), and top_contributing_features.

Programmatic Use

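import sys
from pathlib import Path
# Make the project packages importable; assumes this runs from the project root.
sys.path.insert(0, str(Path(".").resolve()))
from deployment.predictor import predict_dict

result = predict_dict("https://example.com")
# Keys: result["classification"], result["risk_score"], result["top_contributing_features"]
print(result["classification"], result["risk_score"])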
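The README does not say how risk_score and top_contributing_features are derived. One common approach for a linear model such as the Logistic Regression baseline is sketched below: the risk score is the predicted phishing probability scaled to 0–100, and each feature's contribution is its weight times its value. This is a hypothetical illustration (explain_linear and its inputs are not the project's API); the Random Forest would need a different attribution method, for example its global feature_importances_ or a per-prediction tool such as SHAP.

```python
import math


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def explain_linear(weights, bias, features, top_k=3, threshold=0.5):
    """Score one URL with a linear (logistic) model and rank feature contributions.

    weights and features are dicts keyed by feature name (hypothetical inputs).
    """
    # Each feature's additive contribution to the log-odds of "phishing".
    contributions = {name: w * features.get(name, 0.0) for name, w in weights.items()}
    probability = _sigmoid(bias + sum(contributions.values()))
    # Rank by absolute contribution: large negative values matter too.
    top = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)[:top_k]
    return {
        "classification": "Phishing" if probability >= threshold else "Legitimate",
        "risk_score": round(100 * probability),
        "top_contributing_features": [name for name, _ in top],
    }
```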

Ethical Use

This tool is for defensive and educational use only (e.g. SOC workflows, internal security, learning). Do not use it to create or host phishing sites or to target systems without authorization. See ETHICS_AND_USE.md for details.

License

Use at your own risk. Not a replacement for professional security products. Comply with your organization’s policies and applicable laws.
