📊 Telecom Customer Churn Prediction & Analysis
🔍 Project Overview
Customer churn is the loss of customers who stop using a company’s services. This project builds an end-to-end churn prediction system with machine learning in Python and Power BI to help businesses identify high-risk customers and reduce churn.
The project demonstrates the complete workflow of a Data Analyst / Junior Data Scientist, from data preprocessing and model building to business-focused visualization.
🎯 Project Objectives
Predict whether a customer is likely to churn
Perform data preprocessing and feature engineering
Handle class imbalance using SMOTE
Train and compare multiple machine learning models
Evaluate models using business-relevant metrics
Visualize churn insights and high-risk customers using Power BI
🗂️ Dataset
Telco Customer Churn Dataset
Each row represents a customer
Target variable: Churn (Yes / No)
Features include:
Customer demographics (gender, senior citizen, dependents)
Services used (internet service, streaming, security)
Account information (tenure, contract type, charges)
🛠️ Tools & Technologies
Python
Pandas & NumPy
Scikit-learn
Matplotlib
Imbalanced-learn (SMOTE)
Power BI
Google Colab
GitHub
🔄 Project Workflow
1️⃣ Data Loading & Understanding
Load dataset
Inspect data structure and target variable
Identify categorical and numerical features
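The loading and inspection steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the rows are hypothetical sample data, and the column names are assumed to follow the public Telco Customer Churn dataset.

```python
import pandas as pd

# Hypothetical sample rows for illustration; the project itself would load the
# full Telco CSV instead, for example with pd.read_csv on the downloaded file.
df = pd.DataFrame({
    "gender": ["Female", "Male", "Male", "Female"],
    "SeniorCitizen": [0, 0, 1, 0],
    "tenure": [1, 34, 2, 45],
    "Contract": ["Month-to-month", "One year", "Month-to-month", "Two year"],
    "MonthlyCharges": [29.85, 56.95, 53.85, 42.30],
    "Churn": ["No", "No", "Yes", "No"],
})

# Inspect the structure and the target variable.
print(df.shape)
print(df.dtypes)
print(df["Churn"].value_counts())

# Separate the features from the target, then group columns by type.
target = "Churn"
features = df.drop(columns=[target])
numerical_cols = features.select_dtypes(include="number").columns.tolist()
categorical_cols = [c for c in features.columns if c not in numerical_cols]

print("Numerical:", numerical_cols)
print("Categorical:", categorical_cols)
```

Note that a flag stored as 0/1, such as SeniorCitizen here, is picked up as numerical even though it is conceptually categorical, so the automatic split is worth reviewing by hand.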
2️⃣ Exploratory Data Analysis (EDA)
Analyze churn distribution
Study churn behavior by contract, tenure, and payment method
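A short sketch of how the churn breakdowns above might be computed with pandas. The eight rows are made-up illustration data, not results from the project, and the tenure bands are an assumed binning.

```python
import pandas as pd

# Hypothetical toy data; values are invented for illustration only.
df = pd.DataFrame({
    "Contract": ["Month-to-month"] * 4 + ["One year"] * 2 + ["Two year"] * 2,
    "tenure": [1, 3, 8, 20, 14, 30, 50, 70],
    "Churn": ["Yes", "Yes", "Yes", "No", "Yes", "No", "No", "No"],
})

# Encode the target as 0/1 so that a mean is a churn rate.
df["churned"] = (df["Churn"] == "Yes").astype(int)

# Overall churn distribution.
overall_rate = df["churned"].mean()

# Churn rate per contract type, highest first.
rate_by_contract = (
    df.groupby("Contract")["churned"].mean().sort_values(ascending=False)
)

# Churn rate per tenure band (band edges in months are an assumption).
df["tenure_band"] = pd.cut(
    df["tenure"],
    bins=[0, 12, 24, 48, 72],
    labels=["0-12", "13-24", "25-48", "49-72"],
    include_lowest=True,
)
rate_by_tenure = df.groupby("tenure_band", observed=True)["churned"].mean()

print(f"Overall churn rate: {overall_rate:.2f}")
print(rate_by_contract)
print(rate_by_tenure)
```

The same groupby pattern would apply to payment method or any other categorical column.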
3️⃣ Data Preprocessing
Handle missing values
Encode categorical variables
Scale numerical features
Split data into train and test sets
Handle class imbalance using SMOTE
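The preprocessing steps above could look roughly like the sketch below, under a few stated assumptions. The data is synthetic, the column names are borrowed from the Telco dataset, and median imputation is one possible way to handle missing values. The project uses imbalanced-learn's SMOTE; to keep this example dependent only on NumPy, pandas, and scikit-learn, the oversampling step is a simplified hand-written stand-in (the hypothetical helper `smote_like_oversample`) that shows the interpolation idea rather than the library's implementation.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical synthetic data standing in for the Telco dataset (~25% churners).
rng = np.random.default_rng(42)
n = 200
df = pd.DataFrame({
    "tenure": rng.integers(1, 73, size=n),
    "MonthlyCharges": rng.uniform(20, 120, size=n).round(2),
    "Contract": rng.choice(["Month-to-month", "One year", "Two year"], size=n),
    "Churn": np.where(rng.random(n) < 0.25, "Yes", "No"),
})
df.loc[[3, 50], "MonthlyCharges"] = np.nan  # simulate a few missing values

X = df.drop(columns=["Churn"])
y = (df["Churn"] == "Yes").astype(int).to_numpy()

# Split first so the test set never influences imputation, scaling, or oversampling.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Impute and scale numerical columns; one-hot encode categorical columns.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
preprocess = ColumnTransformer(
    [
        ("num", numeric, ["tenure", "MonthlyCharges"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["Contract"]),
    ],
    sparse_threshold=0,  # always return a dense array
)
X_train_t = preprocess.fit_transform(X_train)  # fit on training data only
X_test_t = preprocess.transform(X_test)


def smote_like_oversample(X, y, k=5, seed=42):
    """Simplified SMOTE-style oversampling; assumes class 1 is the minority."""
    rng = np.random.default_rng(seed)
    minority = X[y == 1]
    n_new = int((y == 0).sum() - (y == 1).sum())
    if n_new <= 0:
        return X, y
    # Find the k nearest minority neighbours of every minority point.
    dists = np.linalg.norm(minority[:, None, :] - minority[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)
    neighbours = np.argsort(dists, axis=1)[:, :k]
    # Pick a random base point and one of its neighbours, then interpolate.
    base = rng.integers(0, len(minority), size=n_new)
    picked = neighbours[base, rng.integers(0, neighbours.shape[1], size=n_new)]
    gap = rng.random((n_new, 1))
    synthetic = minority[base] + gap * (minority[picked] - minority[base])
    return np.vstack([X, synthetic]), np.concatenate([y, np.ones(n_new, dtype=y.dtype)])


# Oversample the training set only; the test set keeps its natural class balance.
# With imbalanced-learn installed, the corresponding call would be
# SMOTE(random_state=42).fit_resample(X_train_t, y_train).
X_res, y_res = smote_like_oversample(X_train_t, y_train)
print("Before:", np.bincount(y_train), "After:", np.bincount(y_res))
```

Resampling after the split matters: synthetic points built from test rows would leak information into training and inflate the evaluation metrics.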
4️⃣ Model Training
Logistic Regression
Random Forest Classifier
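Training the two models might be sketched as below. The data comes from scikit-learn's `make_classification` as a stand-in for the preprocessed Telco features, and the hyperparameters (`max_iter`, `n_estimators`) are illustrative assumptions rather than the project's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the preprocessed churn features.
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=4,
    weights=[0.75, 0.25], random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Both candidates share the same fit/predict interface, so they can be looped over.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: test accuracy = {model.score(X_test, y_test):.3f}")
```

Accuracy is printed only as a quick sanity check; on imbalanced churn data it can look good even when few churners are caught, which is why the evaluation step relies on other metrics.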
5️⃣ Model Evaluation
Confusion Matrix
Precision, Recall, F1-score
ROC–AUC Score
ROC Curve
Precision–Recall Curve
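The metrics listed above are all available in `sklearn.metrics`. The sketch below computes them for a single model on synthetic data, which is an assumption made to keep the example self-contained; the project evaluates its own trained models on the Telco test set.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    classification_report,
    confusion_matrix,
    precision_recall_curve,
    roc_auc_score,
    roc_curve,
)
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the churn data.
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=4,
    weights=[0.75, 0.25], random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

y_pred = model.predict(X_test)              # hard labels at the default 0.5 threshold
y_prob = model.predict_proba(X_test)[:, 1]  # probability of the positive (churn) class

# Confusion matrix: rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Precision, recall, and F1-score per class.
print(classification_report(y_test, y_pred, target_names=["No churn", "Churn"]))

# ROC-AUC and the curve points use probabilities, not hard labels.
auc = roc_auc_score(y_test, y_prob)
fpr, tpr, _ = roc_curve(y_test, y_prob)
precision, recall, _ = precision_recall_curve(y_test, y_prob)
print(f"ROC-AUC: {auc:.3f}")
```

The `fpr`/`tpr` and `recall`/`precision` arrays can be passed directly to Matplotlib's `plot` to draw the ROC and Precision–Recall curves.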
6️⃣ Model Selection & Saving
Best model selected based on ROC–AUC
Model serialized as a .pkl file
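Selection and saving could be sketched as follows. The data is synthetic, the file name `churn_model.pkl` is hypothetical, and the standard-library `pickle` module is assumed as the serializer since the source only states that a .pkl file is produced.

```python
import os
import pickle
import tempfile

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the churn data.
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=4,
    weights=[0.75, 0.25], random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
}

# Score every candidate by ROC-AUC on the held-out split.
auc_scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    auc_scores[name] = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

# Keep the model with the highest ROC-AUC.
best_name = max(auc_scores, key=auc_scores.get)
best_model = models[best_name]
print(f"Selected: {best_name} (ROC-AUC = {auc_scores[best_name]:.3f})")

# Serialize the winner; a temporary directory keeps the example side-effect free.
model_path = os.path.join(tempfile.mkdtemp(), "churn_model.pkl")
with open(model_path, "wb") as f:
    pickle.dump(best_model, f)

# Reload to confirm the saved file reproduces the same predictions.
with open(model_path, "rb") as f:
    loaded_model = pickle.load(f)
```

A pickled scikit-learn model should be loaded with the same library version that created it, and pickle files from untrusted sources should never be loaded.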
7️⃣ Power BI Dashboard
Visualize churn rate and churn drivers
Identify high-risk customers using churn probability
Interactive analysis using slicers
📊 Power BI Dashboard Features
Customer Churn Rate (KPI Card)
Churn by Contract Type
Churn by Payment Method
Churn by Tenure
High-Risk Customers Table (Churn Probability ≥ 0.7)
Interactive slicers:
Contract
Internet Service
Payment Method
Gender
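The High-Risk Customers table needs a churn probability per customer. One way to prepare that input, sketched here with hypothetical customer IDs and made-up probabilities, is to attach the model's scores to each customer, flag those at or above the 0.7 threshold, and export the result as a CSV that Power BI can import.

```python
import pandas as pd

# Hypothetical scores; in practice these would come from the trained model,
# e.g. model.predict_proba(X)[:, 1] aligned with the customer IDs.
scored = pd.DataFrame({
    "customerID": ["C001", "C002", "C003", "C004", "C005"],
    "churn_probability": [0.91, 0.35, 0.70, 0.12, 0.69],
})

# Flag customers at or above the dashboard's high-risk threshold.
HIGH_RISK_THRESHOLD = 0.7
scored["high_risk"] = scored["churn_probability"] >= HIGH_RISK_THRESHOLD

# The high-risk table, most likely churners first.
high_risk = scored[scored["high_risk"]].sort_values(
    "churn_probability", ascending=False
)
print(high_risk)

# CSV text ready to be written to a file and loaded into Power BI.
csv_text = scored.to_csv(index=False)
print(csv_text)
```

Exporting the raw probability alongside the flag lets the threshold be adjusted later inside Power BI without rerunning the model.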
✅ Conclusion
The project successfully predicts customer churn by learning patterns from historical telecom customer data. Among the trained models, Random Forest outperformed Logistic Regression, achieving a higher ROC–AUC score and better recall for churned customers. The analysis shows that month-to-month contracts, low tenure, and certain payment methods are strong indicators of churn.
By combining machine learning predictions with Power BI visualizations, the project enables businesses to identify high-risk customers early and take proactive retention actions.
💾 Model File Note
Due to GitHub file size limitations, the trained model file (.pkl) is not included in this repository. To recreate it, run the notebook end-to-end.
👤 Author
Aman Niranjan