MohannadIsCoding/transport-train-model

Transport Delay Predictor

An AI-powered bus delay analysis and prediction system built with machine learning models. This project provides both backend model training and a modern, interactive frontend for analyzing and predicting transportation delays.

📋 Overview

The Transport Delay Predictor system includes:

  • Multiple ML Models: XGBoost, Random Forest, Linear Regression, and K-Nearest Neighbors (KNN)
  • Interactive Frontend: Web-based UI with real-time predictions and visualizations
  • Streamlit Dashboard: Advanced analytics and model performance monitoring
  • Data Processing Pipeline: Automated feature engineering and preprocessing

🎯 Features

Core Functionality

  • Delay Prediction: Predict bus delays based on various features (route, weather, passenger count, time, location)
  • Model Comparison: Compare predictions across multiple ML models simultaneously
  • Performance Metrics: View MAE (Mean Absolute Error) and R² scores for each model
  • Data Analysis: Exploratory data analysis with interactive visualizations
  • Historical Data Insights: Analyze patterns and trends in historical delay data

Machine Learning Models

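| Model               | Test MAE (min) | Test R² | Status         |
|---------------------|----------------|---------|----------------|
| XGBoost             | 56.29          | 0.425   | ⭐ Recommended |
| Random Forest       | 56.29          | 0.427   | ✓ Good         |
| Linear Regression   | 62.53          | 0.185   | ✓ Baseline     |
| K-Nearest Neighbors | 67.72          | -0.043  | ⚠️ Reference   |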

🚀 Getting Started

Prerequisites

  • Python 3.8+
  • Node.js (optional, for running HTTP server)
  • Git

Installation

  1. Clone the repository (if not already done):
git clone <repository-url>
cd transport-train-model
  2. Create a virtual environment:
python -m venv .venv
.\.venv\Scripts\activate  # On Windows
source .venv/bin/activate  # On macOS/Linux
  3. Install dependencies:
pip install -r requirements.txt
  4. Download the cleaned dataset (if not included):
    • Place cleaned_transport_dataset.csv in the project root

Running the Application

Option 1: Streamlit Frontend (Advanced)

streamlit run app.py

Access at: http://localhost:8501

Option 2: Static Web Frontend (Recommended for quick start)

# In PowerShell/Terminal
python -m http.server 8000
# Or use Node.js (http-server listens on port 8080 by default, so set the port explicitly)
npm install -g http-server
http-server -p 8000

Access at: http://localhost:8000

📁 Project Structure

├── app.py                              # Streamlit application (advanced analytics)
├── app.js                              # Frontend JavaScript logic
├── index.html                          # Web frontend UI
├── styles.css                          # Frontend styling
├── train_models.py                     # Model training script
├── transport_delay_analysis.ipynb      # Jupyter notebook for analysis
├── cleaned_transport_dataset.csv       # Processed dataset
├── dirty_transport_dataset.csv         # Raw dataset
├── requirements.txt                    # Python dependencies
├── model_evaluation_summary.csv        # Model performance metrics
├── README.md                           # This file
│
├── models/                             # Trained model artifacts
│   ├── linear_regression.pkl
│   ├── random_forest.pkl
│   ├── xgboost.pkl
│   ├── knn.pkl
│   ├── scaler.pkl
│   ├── label_encoder_*.pkl
│   └── metadata.json
│
├── tools/                              # Utility scripts
│   ├── extract_importances.py
│   └── extract_xgb_importances.py
│
└── old/                                # Archived files
    ├── app_old.py
    └── train_models_old.py

🔧 Model Training

Retraining Models

To retrain all models with your data:

python train_models.py

This script will:

  1. Load and preprocess the cleaned dataset
  2. Perform feature engineering
  3. Split data into train/test sets
  4. Train all four ML models
  5. Evaluate model performance
  6. Save trained models and metadata
  7. Generate performance metrics CSV
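
Steps 3 and 5 above can be pictured with a small standard-library sketch. This is not the code in train_models.py: the helper name `split_rows`, the 80/20 split, the seed, and the toy delay values are all assumptions made for illustration, and a constant mean prediction stands in for the four real models.

```python
import random

def split_rows(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is repeatable
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Toy delays in minutes, invented for illustration only.
delays = [5, 12, 0, 45, 30, 8, 60, 15, 22, 3]
train, test = split_rows(delays)

# Stand-in "model": always predict the mean delay seen in training.
baseline = sum(train) / len(train)

# Evaluate on the held-out rows only, never on the training rows.
baseline_mae = sum(abs(actual - baseline) for actual in test) / len(test)
print(f"train={len(train)} test={len(test)} baseline MAE={baseline_mae:.2f} min")
```

A constant baseline like this is a useful yardstick: a trained model that cannot beat it on the test set is not adding value.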

Dataset Features

The models use the following features:

  • Temporal: Hour, Day of Week, Time of Day, Weekend indicator
  • Location: Latitude, Longitude
  • Traffic: Route ID, Passenger Count
  • Weather: Weather Condition, Weather Severity
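
As a rough illustration of the temporal features, the sketch below derives all four from a single timestamp. The function names, the dictionary keys, and the hour cut-offs for the time-of-day buckets are assumptions; the actual encoding used by train_models.py may differ.

```python
from datetime import datetime

def time_of_day(hour: int) -> str:
    """Bucket an hour (0-23) into a coarse label. The cut-offs are assumed."""
    if 5 <= hour < 12:
        return "morning"
    if 12 <= hour < 17:
        return "afternoon"
    if 17 <= hour < 21:
        return "evening"
    return "night"

def temporal_features(ts: datetime) -> dict:
    """Derive the four temporal features from one timestamp."""
    return {
        "hour": ts.hour,
        "day_of_week": ts.weekday(),           # Monday = 0 ... Sunday = 6
        "time_of_day": time_of_day(ts.hour),
        "is_weekend": int(ts.weekday() >= 5),  # Saturday or Sunday
    }

# Saturday 6 December 2025, 08:30
features = temporal_features(datetime(2025, 12, 6, 8, 30))
print(features)
```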

Target Variable

  • Delay (minutes): Actual delay from scheduled time

💻 Frontend Pages

1. Dashboard

  • Key metrics (total records, mean/median/max delay)
  • Delay distribution histogram
  • Delay by route analysis
  • Weather impact analysis
  • Passenger count correlation
  • Dataset preview table

2. Predict Delay

  • Interactive prediction form with all input parameters
  • Model selection dropdown (XGBoost, Random Forest, Linear Regression, KNN)
  • Real-time predictions with status badges
  • Gauge chart visualization
  • All-models comparison table

3. Data Analysis

  • Statistics Tab: Comprehensive dataset statistics
  • Exploratory Tab: Correlation analysis and visualizations
  • Raw Data Tab: Searchable, filterable data table

4. Model Performance

  • MAE comparison chart
  • R² score comparison chart
  • Model rankings and recommendations
  • Detailed performance metrics table
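
One possible way to produce a model ranking from the reported test metrics is sketched below. The column names (`model`, `test_mae`, `test_r2`) are hypothetical, since the layout of model_evaluation_summary.csv is not shown here, and the tie-break rule (lower MAE first, then higher R²) is an assumption rather than the page's actual logic. The numbers are the ones listed in this README.

```python
import csv
import io

# Metrics copied from this README; the column names are hypothetical.
SUMMARY = """model,test_mae,test_r2
XGBoost,56.29,0.425
Random Forest,56.29,0.427
Linear Regression,62.53,0.185
K-Nearest Neighbors,67.72,-0.043
"""

def rank_models(csv_text):
    """Sort rows by ascending test MAE, breaking ties by descending test R²."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return sorted(rows, key=lambda r: (float(r["test_mae"]), -float(r["test_r2"])))

ranked = rank_models(SUMMARY)
for rank, row in enumerate(ranked, start=1):
    print(f'{rank}. {row["model"]}: MAE={row["test_mae"]} R2={row["test_r2"]}')
```

Under this particular rule Random Forest and XGBoost are tied on MAE and separated only by 0.002 in R², so the top two positions should be read as a near tie.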

📊 API Integration (Streamlit)

The Streamlit app (app.py) provides advanced features:

  • Real-time model retraining interface
  • Cross-validation results
  • Feature importance visualization
  • Shapley value explanations
  • Custom prediction scenarios

🎓 Model Explanations

XGBoost (Recommended)

  • Gradient boosting ensemble method
  • Joint-lowest test MAE (56.29, tied with Random Forest) with test R² = 0.425
  • Robust to outliers and non-linear relationships
  • Suitable for production use

Random Forest

  • Ensemble of decision trees
  • Good generalization (R² = 0.427, the highest test R² of the four models)
  • Provides feature importance scores
  • Parallel prediction capability

Linear Regression

  • Baseline statistical model
  • Interpretable coefficients
  • Moderate performance (R² = 0.185)
  • Fast inference

K-Nearest Neighbors (KNN)

  • Instance-based learning
  • Reference model for comparison
  • Lowest performance (R² = -0.043, slightly worse than always predicting the mean delay)
  • Useful for local pattern analysis
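
To make "instance-based learning" concrete, here is a minimal pure-Python sketch of KNN regression: the prediction is simply the average delay of the k closest training rows. It is not the project's knn.pkl model; the two features (hour, passenger count), the toy rows, and the delays are invented for illustration.

```python
import math

def knn_predict(train_X, train_y, query, k=3):
    """Average the targets of the k training points closest to query."""
    by_distance = sorted(
        (math.dist(x, query), y) for x, y in zip(train_X, train_y)
    )
    nearest = by_distance[:k]
    return sum(y for _, y in nearest) / len(nearest)

# Toy rows: (hour, passenger_count) -> delay in minutes. Invented data.
train_X = [(8, 120), (9, 150), (14, 40), (15, 35), (22, 10)]
train_y = [30, 40, 10, 8, 2]

# The two nearest rows are (8, 120) and (9, 150), so the result is (30 + 40) / 2.
prediction = knn_predict(train_X, train_y, query=(8.5, 130), k=2)
print(prediction)
```

Note that the passenger count dominates the distance here because it has a much larger numeric range than the hour. Distance-based models are sensitive to this, which is why features are normally scaled before fitting KNN.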

📈 Performance Metrics

Models are evaluated using:

  • MAE (Mean Absolute Error): Average prediction error in minutes
  • RMSE (Root Mean Squared Error): Penalizes larger errors more heavily
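  • R² Score: Coefficient of determination; 1 is a perfect fit, 0 matches always predicting the mean delay, and negative values (as for KNN) are worse than that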
  • Cross-Validation: k-fold CV for stability assessment
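
For reference, the three error metrics can be written out in a few lines of plain Python. This is a sketch of the standard definitions, not the project's evaluation code, and the sample values are made up. It also shows how R² becomes negative when predictions are worse than a constant mean, which is the situation of the KNN row in the results table.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error, in the same unit as the target (minutes)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error; large misses weigh more than in MAE."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Made-up delays in minutes.
y_true = [10.0, 20.0, 30.0, 40.0]
good = [12.0, 18.0, 33.0, 37.0]   # close to the truth
bad = [40.0, 30.0, 20.0, 10.0]    # trend reversed

print(mae(y_true, good), rmse(y_true, good), r2(y_true, good))
print(r2(y_true, bad))  # negative: worse than predicting the mean
```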

🔐 Data Privacy

  • No personal data is collected or stored
  • Dataset contains aggregated transportation metrics only
  • All model artifacts are saved locally
  • No external API calls for predictions

🐛 Troubleshooting

Models not loading

# Retrain models
python train_models.py
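
Before retraining, it can help to see which artifacts are actually missing. The sketch below checks the models/ directory against the file names listed in the project structure; the helper name `missing_artifacts` is hypothetical, and it assumes every listed file is required at load time, which may not be true of app.py.

```python
from pathlib import Path

# File names taken from the project structure in this README.
EXPECTED = [
    "linear_regression.pkl",
    "random_forest.pkl",
    "xgboost.pkl",
    "knn.pkl",
    "scaler.pkl",
    "metadata.json",
]

def missing_artifacts(models_dir="models"):
    """Return the expected artifact names that are absent from models_dir."""
    root = Path(models_dir)
    missing = [name for name in EXPECTED if not (root / name).is_file()]
    # At least one label encoder file should match the documented pattern.
    if not any(root.glob("label_encoder_*.pkl")):
        missing.append("label_encoder_*.pkl")
    return missing

print(missing_artifacts() or "all expected artifacts found")
```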

Port already in use

# Change port for HTTP server
python -m http.server 9000

# For Streamlit
streamlit run app.py --server.port 8502
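
If it is unclear which port is free, a short standard-library check can find one before starting either server. This is a sketch that is not part of the project; the helper names and the 8000-8099 search range are assumptions.

```python
import socket

def port_in_use(port, host="127.0.0.1"):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)
        return s.connect_ex((host, port)) == 0

def first_free_port(start=8000, end=8100):
    """Return the first port in [start, end) with no listener on it."""
    for port in range(start, end):
        if not port_in_use(port):
            return port
    raise RuntimeError(f"no free port between {start} and {end - 1}")

port = first_free_port()
print(f"python -m http.server {port}")
```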

Missing dependencies

pip install --upgrade -r requirements.txt

Dataset encoding issues

Ensure CSV files use UTF-8 encoding.
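
A quick way to test a file, and to convert it if needed, is sketched below. The helper names are hypothetical, and the source encoding is a guess (cp1252 is common for CSVs exported on Windows); pass the real encoding if you know it, and write to a new file so the original is preserved.

```python
from pathlib import Path

def is_utf8(path):
    """Return True if the whole file decodes cleanly as UTF-8."""
    try:
        Path(path).read_bytes().decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

def convert_to_utf8(src, dst, source_encoding="cp1252"):
    """Re-encode src into UTF-8 at dst. source_encoding is an assumption."""
    text = Path(src).read_bytes().decode(source_encoding)
    # Write bytes directly so line endings are left exactly as they were.
    Path(dst).write_bytes(text.encode("utf-8"))

csv_path = "cleaned_transport_dataset.csv"
if Path(csv_path).is_file() and not is_utf8(csv_path):
    convert_to_utf8(csv_path, "cleaned_transport_dataset_utf8.csv")
```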

📝 Recent Updates

KNN Model Addition

  • Added K-Nearest Neighbors model to the set of compared models
  • Integrated into all UI components (frontend and Streamlit)
  • Added to model comparison visualizations
  • Includes performance metrics evaluation

🚀 Future Enhancements

  • Deep Learning models (LSTM, Neural Networks)
  • Real-time data ingestion
  • Geographic heat maps
  • Mobile app version
  • REST API for external integrations
  • Automated retraining pipeline
  • Model explainability dashboard
  • Anomaly detection for unusual delays

📚 Dependencies

See requirements.txt for full list:

  • pandas: Data manipulation
  • numpy: Numerical computing
  • scikit-learn: ML algorithms & preprocessing
  • xgboost: Gradient boosting
  • plotly: Interactive visualizations
  • streamlit: Web framework
  • jupyter: Notebook environment
  • joblib: Model serialization

👨‍💻 Development

Adding New Models

  1. Train and test the model in the Jupyter notebook
  2. Add model saving to train_models.py
  3. Update app.py to load the new model
  4. Add to app.js frontend model selection
  5. Update performance comparison in index.html
  6. Run tests and validate predictions

Code Style

  • Python: PEP 8 compliance
  • JavaScript: ES6+ standards
  • HTML/CSS: Semantic markup

📄 License

This project is provided as-is for educational and operational purposes.

🤝 Support

For issues or questions:

  1. Check the troubleshooting section
  2. Review model training logs
  3. Verify dataset format
  4. Check browser console for frontend errors

📞 Contact

For more information about this project, please refer to the model documentation and code comments.


Last Updated: December 2025
Models Included: XGBoost, Random Forest, Linear Regression, K-Nearest Neighbors
Dataset: Transport Delay Analysis (500+ records)
