A machine-learning project that predicts whether a flight will be delayed using historical flight and weather data.
This project cleans real flight datasets, engineers features (time, airport, airline, weather), and trains ML models to forecast delays. The goal is to support smarter scheduling and improve travel reliability.
- Data cleaning & preprocessing
- Feature engineering & selection
- ML model training & evaluation
- Delay prediction for new flights
- Pandas
- scikit-learn
- Matplotlib
- Seaborn
- PySpark
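
The workflow steps listed above (cleaning, feature preparation, training, evaluation, and prediction for a new flight) might be sketched with pandas and scikit-learn roughly as follows. This is a minimal sketch, not the project's actual code: the data is synthetic, the column names (`airline`, `origin`, `dep_hour`, `distance`, `dep_delay_min`) are hypothetical, and both the 15-minute delay threshold and the choice of logistic regression are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the flight data; all column names are hypothetical.
rng = np.random.default_rng(0)
n = 200
delay_minutes = rng.normal(10, 30, size=n)
delay_minutes[::25] = np.nan  # simulate rows with a missing delay value

flights = pd.DataFrame({
    "airline": rng.choice(["AA", "DL", "UA"], size=n),
    "origin": rng.choice(["JFK", "ORD", "LAX"], size=n),
    "dep_hour": rng.integers(0, 24, size=n),
    "distance": rng.uniform(100, 2500, size=n),
    "dep_delay_min": delay_minutes,
})

# Data cleaning: drop rows where the delay is unknown.
clean = flights.dropna(subset=["dep_delay_min"]).copy()

# Binary target: assume a flight counts as delayed at 15+ minutes.
clean["delayed"] = (clean["dep_delay_min"] >= 15).astype(int)

X = clean[["airline", "origin", "dep_hour", "distance"]]
y = clean["delayed"]

# One-hot encode categorical columns and scale numeric ones.
preprocess = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["airline", "origin"]),
    ("num", StandardScaler(), ["dep_hour", "distance"]),
])
model = Pipeline([
    ("prep", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Training and evaluation on a held-out split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)

# Delay prediction for a new (made-up) flight.
new_flight = pd.DataFrame(
    [{"airline": "DL", "origin": "JFK", "dep_hour": 18, "distance": 760.0}]
)
delay_probability = model.predict_proba(new_flight)[0, 1]
print(f"Held-out accuracy: {accuracy:.2f}")
print(f"Estimated delay probability: {delay_probability:.2f}")
```

Because the data here is random, the accuracy carries no meaning; the sketch only shows how the steps fit together in one pipeline.
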
Historical U.S. flight data and weather information, provided by Rob Mulla on Kaggle. The
dataset contains flight information, including cancellations and delays by airline, for 2022. The data
was extracted from the Marketing Carrier On-Time Performance table of the “On-Time” database
in the TranStats data library.
🔗 https://www.kaggle.com/datasets/robikscube/flight-delay-dataset-20182022
- Normalization
- PCA
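
As a rough illustration of how normalization and PCA can be combined, the sketch below standardizes a few numeric features and then projects them onto two principal components with scikit-learn. The feature values are synthetic, the feature meanings in the comments are hypothetical, and `StandardScaler` is only one possible form of normalization (min-max scaling would be another); the number of components is likewise an assumption.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic numeric features on very different scales (hypothetical stand-ins).
X = np.column_stack([
    rng.uniform(100, 2500, 300),  # e.g. flight distance
    rng.uniform(0, 24, 300),      # e.g. departure hour
    rng.normal(15, 10, 300),      # e.g. temperature
    rng.uniform(0, 40, 300),      # e.g. wind speed
])

# Scale first so that no single feature dominates the principal components.
reducer = make_pipeline(StandardScaler(), PCA(n_components=2))
X_reduced = reducer.fit_transform(X)

# Share of total variance captured by each retained component.
explained = reducer.named_steps["pca"].explained_variance_ratio_
print(X_reduced.shape)
print(explained)
```

Scaling before PCA matters because PCA is driven by variance: without it, a large-valued column such as distance would dominate the components regardless of how informative it is.
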