Customer Churn Prediction
Benchmarking of Machine Learning and Deep Learning models for customer churn prediction in the telecom industry, with automatic hyperparameter optimization via Optuna.
About the Project
This project compares five distinct approaches for identifying customers at risk of churning (cancelling their service), using a real-world telecom dataset. The full pipeline covers exploratory data analysis, preprocessing, feature engineering, model training, hyperparameter optimization, and comparative evaluation.
Models Evaluated
MLP
Random Forest
XGBoost (gradient boosting)
TabTabular
TabPFN2
Dataset
Source: Telco Customer Churn
Target: Churn (Yes / No)
Pipeline
- DATA LOADING
- PREPROCESSING
    ├── Convert TotalCharges to numeric (coerce errors)
    ├── Encode target variable: Churn → binary (0/1)
    ├── One-hot encoding of categorical features
    ├── Standardization with StandardScaler
    └── Train / Validation / Test split
- EXPLORATORY DATA ANALYSIS
    └── Distribution plots for tenure, MonthlyCharges, TotalCharges
- FEATURE ENGINEERING: Bucketing
    ├── MonthlyCharges → <22 | 22–68 | 68–105 | 105+
    ├── tenure → 0–3 | 4–69 | 70+
    └── TotalCharges → <200 | 200–2000 | 2000+
- CLASS BALANCING
    └── RandomOverSampler
- MODEL TRAINING + HYPERPARAMETER OPTIMIZATION (Optuna)
- EVALUATION
    ├── Metrics: Accuracy, Balanced Accuracy, Precision, Recall, F1, AUC-ROC, KS Statistic
    ├── Confusion Matrix per model
    └── ROC Curve comparison
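The bucketing step in the pipeline can be illustrated with a small, dependency-free sketch. The bucket edges and labels come from the pipeline above. The `BUCKETS` table and the `bucketize` helper are hypothetical names introduced here, and treating each bucket as left-closed (so that 68 lands in "68-105") is an assumption, since the pipeline does not state which side of a boundary is inclusive. The project itself may well use a DataFrame-based approach instead.

```python
from bisect import bisect_right

# Edges and labels follow the pipeline description. Left-closed buckets
# (a value equal to an edge goes to the upper bucket) are an assumption.
BUCKETS = {
    "MonthlyCharges": ([22, 68, 105], ["<22", "22-68", "68-105", "105+"]),
    "tenure": ([4, 70], ["0-3", "4-69", "70+"]),
    "TotalCharges": ([200, 2000], ["<200", "200-2000", "2000+"]),
}


def bucketize(feature: str, value: float) -> str:
    """Return the bucket label for `value` of the given feature."""
    edges, labels = BUCKETS[feature]
    # bisect_right counts the edges that are <= value; that count is
    # exactly the index of the bucket the value belongs to.
    return labels[bisect_right(edges, value)]


if __name__ == "__main__":
    # Hypothetical customer row, used only for illustration.
    row = {"tenure": 2, "MonthlyCharges": 70.35, "TotalCharges": 151.65}
    print({name: bucketize(name, value) for name, value in row.items()})
```

Keeping edges and labels in one table makes it easy to check that every feature has exactly one more label than it has edges.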
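To make the class-balancing step concrete, the sketch below shows the idea behind random oversampling in plain Python: rows of the smaller classes are duplicated at random, with replacement, until every class is as large as the biggest one. This is an illustration of the technique, not the project's code. The pipeline names `RandomOverSampler`, which is presumably the imbalanced-learn class; `random_oversample` is a hypothetical stand-in written for this example. Applying it to the training split only is also an assumption, consistent with the split appearing before balancing in the pipeline.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of the smaller classes until every
    class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())

    # Start from the original data; only extra copies are appended.
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        pool = [row for row, lab in zip(X, y) if lab == label]
        # Sampling with replacement; k is 0 for the majority class.
        extra = rng.choices(pool, k=target - count)
        X_out.extend(extra)
        y_out.extend([label] * len(extra))
    return X_out, y_out


if __name__ == "__main__":
    # Toy training split: 7 non-churners (0) and 3 churners (1).
    X_train = [[i] for i in range(10)]
    y_train = [0] * 7 + [1] * 3
    X_bal, y_bal = random_oversample(X_train, y_train)
    print(Counter(y_bal))
```

Because oversampling only repeats existing rows, validation and test data should stay untouched so that the reported metrics reflect the real class distribution.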
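Two of the listed metrics are less familiar than accuracy or F1, so a short sketch may help. Balanced accuracy is the mean of the recall on each class, and the KS statistic is the largest gap between the empirical score distributions of churners and non-churners. The functions below are hypothetical helpers written for illustration, assuming binary labels (1 = churn) and scores where higher means more likely to churn. They are not necessarily how the project computes these metrics.

```python
from bisect import bisect_right


def balanced_accuracy(y_true, y_pred):
    """Mean of the recall on the positive class and on the negative class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    return (tp / n_pos + tn / n_neg) / 2


def ks_statistic(y_true, y_score):
    """Largest absolute gap between the empirical CDFs of the scores of
    the positive class and of the negative class."""
    pos = sorted(s for t, s in zip(y_true, y_score) if t == 1)
    neg = sorted(s for t, s in zip(y_true, y_score) if t == 0)
    if not pos or not neg:
        raise ValueError("both classes must be present")
    best = 0.0
    # The gap can only change at observed scores, so checking those suffices.
    for threshold in sorted(set(y_score)):
        cdf_pos = bisect_right(pos, threshold) / len(pos)
        cdf_neg = bisect_right(neg, threshold) / len(neg)
        best = max(best, abs(cdf_pos - cdf_neg))
    return best


if __name__ == "__main__":
    # Toy labels and scores, used only for illustration.
    labels = [0, 0, 1, 1]
    scores = [0.1, 0.6, 0.4, 0.9]
    preds = [1 if s >= 0.5 else 0 for s in scores]
    print(balanced_accuracy(labels, preds), ks_statistic(labels, scores))
```

A KS statistic near 1 means the model's scores separate the two classes almost completely, while a value near 0 means the two score distributions are nearly indistinguishable.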
Results