
GauravP1101/FakeNews-Detection


📰 Fake News Detection

Python Pandas NumPy Scikit-learn XGBoost LightGBM CatBoost Jupyter Matplotlib Seaborn

A machine learning project that trains and evaluates a fake news classification model on tabular text data using Jupyter Notebooks. The workflow follows common practices for reproducible ML development, including data preprocessing, feature engineering, cross-validation, and pipeline-based model training.


📂 Project Structure

FakeNews-Detection/
│
├── data/                          # Dataset (download separately)
├── notebooks/
│   ├── Untitled.ipynb             # Baseline workflow
│   └── Untitled_optimized.ipynb   # Reproducible + CV/Pipeline version
├── requirements.txt               # Project dependencies
└── README.md                      # Documentation


⚙️ Features

  • Data loading & cleaning
  • Feature engineering & encoding
  • Stratified train/validation/test splits
  • Model training with scikit-learn Pipelines (avoiding leakage)
  • Cross-validated evaluation (Accuracy, F1, Confusion Matrix)
  • Optional hyperparameter search (GridSearchCV / RandomizedSearchCV)
  • Reproducibility with fixed seeds
  • Lightweight profiling/timers for faster iterations
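
The split, Pipeline, and cross-validation features listed above can be sketched roughly as follows. This is a minimal illustration, not the notebooks' actual code: the toy corpus, the TF-IDF vectorizer, the LogisticRegression classifier, and the seed value are all assumptions made for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_validate, train_test_split
from sklearn.pipeline import Pipeline

SEED = 42  # arbitrary fixed seed so splits and model fitting are repeatable

# Toy stand-in corpus (hypothetical); the notebooks load the Kaggle CSV instead.
genuine = [f"council publishes audited budget report for district {i}" for i in range(10)]
fake = [f"shocking miracle cure doctors hide from you part {i}" for i in range(10)]
texts = genuine + fake
labels = [0] * len(genuine) + [1] * len(fake)  # 0 = genuine, 1 = fake

# Stratified hold-out split keeps the class ratio equal in train and test.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, stratify=labels, random_state=SEED
)

# The vectorizer lives inside the Pipeline, so each CV fold fits TF-IDF
# on its own training portion only (no leakage from validation text).
model = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression(max_iter=1000, random_state=SEED)),
])

cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=SEED)
cv_scores = cross_validate(model, X_train, y_train, cv=cv, scoring=["accuracy", "f1"])
print("CV accuracy per fold:", cv_scores["test_accuracy"])
print("CV F1 per fold:", cv_scores["test_f1"])

# Final fit on the full training split, then a confusion matrix on the held-out test set.
model.fit(X_train, y_train)
cm = confusion_matrix(y_test, model.predict(X_test))
print(cm)
```

The same `model` object can be passed to `GridSearchCV` or `RandomizedSearchCV` with step-prefixed parameter names such as `clf__C`, and the `clf` step can be swapped for an XGBoost, LightGBM, or CatBoost classifier.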

📊 Dataset

This project uses the Fake News Dataset from Kaggle:
🔗 Fake News Dataset (Kaggle)

Download and place the CSV file(s) inside the data/ folder before running the notebooks.
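
Once the files are in place, loading them might look like the sketch below. The helper name `load_csvs` is hypothetical, and because this README does not state the CSV file names or column names, the code simply reads every `*.csv` it finds in `data/`.

```python
from pathlib import Path

import pandas as pd


def load_csvs(data_dir="data"):
    """Read every CSV in data_dir into a dict of {file name: DataFrame}."""
    paths = sorted(Path(data_dir).glob("*.csv"))
    if not paths:
        # Fail early with a clear message if the dataset was not downloaded yet.
        raise FileNotFoundError(f"No CSV files found in {data_dir!r}")
    return {path.name: pd.read_csv(path) for path in paths}


# Example use from the repository root (file names depend on the download):
# frames = load_csvs("data")
# for name, df in frames.items():
#     print(name, df.shape)
```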

🚀 Getting Started

1. Clone the repo  
  git clone https://github.com/GauravP1101/FakeNews-Detection.git
  cd FakeNews-Detection

2. Create & activate a virtual environment
   
  python -m venv .venv
  # Windows
  .venv\Scripts\activate
  # macOS/Linux
  source .venv/bin/activate

3. Install dependencies
  pip install -r requirements.txt

  If requirements.txt is missing, install the core packages and then generate the file:
  pip install jupyter numpy pandas scikit-learn matplotlib seaborn xgboost lightgbm catboost
  pip freeze > requirements.txt

4. Run notebooks
  jupyter notebook
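
Before opening the notebooks, it can help to confirm that the packages from step 3 are importable in the active environment. The check below is an optional sketch using only the standard library. The helper name `missing_packages` is hypothetical, and the list uses import names, so scikit-learn appears as `sklearn`.

```python
from importlib.util import find_spec

# Import names for the packages installed in step 3.
REQUIRED = ["numpy", "pandas", "sklearn", "matplotlib", "seaborn",
            "xgboost", "lightgbm", "catboost"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```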

About

A robust, end-to-end Machine Learning pipeline for classifying text data as fake or genuine news using advanced ensemble models.
