🎯 Telecom Customer Churn Prediction

Machine learning solution to predict customer churn and enable proactive retention strategies


🚀 Business Problem

Customer retention is a persistent challenge for telecom operators: acquiring a new customer is commonly cited as costing 5-25x more than retaining an existing one. This project develops a machine learning solution that identifies customers at risk of churning, so that Interconnect can offer targeted promotional codes and special plans before they leave.

💡 Why This Matters

  • Enable proactive retention strategies through early churn identification
  • Optimize marketing spend by targeting high-risk customers
  • Reduce revenue loss from customer defection
  • Increase customer lifetime value through data-driven interventions

📊 Dataset Overview

Multi-source dataset from the telecom operator Interconnect, covering 7,043 customers:

| Data Source | Key Features |
|---|---|
| 📋 Contract Information | Begin/end dates, contract types, billing preferences, payment methods |
| 👤 Personal Data | Demographics, senior citizen status, family information |
| 🌐 Internet Services | DSL/Fiber connections, security features, streaming services |
| 📞 Phone Services | Multiple line options and usage patterns |

🎯 Target Variable

  • 26.5% churn rate (1,869 churned customers out of 7,043)
  • Class imbalance (roughly 1 churned customer for every 2.8 retained) that requires dedicated handling during training

🔬 Methodology

🧹 Data Preprocessing & Engineering

  • Multi-source integration: Merged 4 datasets using customer ID joins
  • Date processing: Converted contract dates to identify active vs churned customers
  • Missing value treatment: Filled gaps with business logic (new customers = 0 total charges)
  • Feature engineering: Created duration_days, the contract length in days, as a finer-grained tenure feature
  • Categorical encoding: Mapped text values to numeric (Yes/No → 1/0, contract types → 0/1/2)
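
The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration on toy data, not the notebook's actual code: the column names (`customerID`, `BeginDate`, `EndDate`, `Type`, `PaperlessBilling`, `TotalCharges`), the `"No"` marker for active contracts, and the snapshot date are all assumptions, and only two of the four sources are merged to keep it short.

```python
import pandas as pd

# Toy stand-ins for contract.csv and personal.csv (column names are assumed).
contract = pd.DataFrame({
    "customerID": ["A1", "B2", "C3"],
    "BeginDate": ["2019-01-01", "2020-02-01", "2018-06-01"],
    "EndDate": ["No", "No", "2019-12-01"],   # "No" = still active (assumed)
    "Type": ["Two year", "Month-to-month", "Month-to-month"],
    "PaperlessBilling": ["Yes", "No", "Yes"],
    "TotalCharges": ["1200.5", " ", "1530.0"],  # blank for a brand-new customer
})
personal = pd.DataFrame({
    "customerID": ["A1", "B2", "C3"],
    "SeniorCitizen": [0, 1, 0],
})

SNAPSHOT = pd.Timestamp("2020-02-01")  # assumed data-extraction date

# Multi-source integration: join on the customer ID.
df = contract.merge(personal, on="customerID", how="left")

# Date processing: an end date means the customer churned.
df["churn"] = (df["EndDate"] != "No").astype(int)
df["BeginDate"] = pd.to_datetime(df["BeginDate"])
end = pd.to_datetime(df["EndDate"].where(df["churn"] == 1), errors="coerce")
end = end.fillna(SNAPSHOT)  # active customers are measured up to the snapshot

# Feature engineering: contract length in days.
df["duration_days"] = (end - df["BeginDate"]).dt.days

# Missing values: new customers have no total charges yet, so use 0.
df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce").fillna(0.0)

# Categorical encoding.
df["PaperlessBilling"] = df["PaperlessBilling"].map({"Yes": 1, "No": 0})
df["Type"] = df["Type"].map({"Month-to-month": 0, "One year": 1, "Two year": 2})

print(df[["customerID", "churn", "duration_days", "TotalCharges", "Type"]])
```

The internet and phone sources would be joined the same way; a left join keeps customers who lack one of the services.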

📈 Exploratory Data Analysis

Key business insights discovered:

| Finding | Churn Impact |
|---|---|
| 📝 Electronic check payments | Higher churn rates, suggesting potential payment friction |
| 📅 Month-to-month contracts | Significantly higher churn than annual contracts |
| 💰 Monthly charges | Churned customers average ~$80 vs ~$65 for retained customers |
| 🌐 Fiber optic service | Highest churn ratio, which may point to service quality issues |
| ⏱️ Contract duration | Longer tenure strongly correlates with retention |

🤖 Model Development

Implemented an end-to-end ML pipeline:

# Models Tested
├── Dummy Classifier (Baseline)
├── Logistic Regression
├── Random Forest
├── Gradient Boosting ⭐ (Best Performer)
└── AdaBoost

Advanced Techniques:

  • SMOTE oversampling for class imbalance handling
  • MinMax scaling for feature normalization
  • GridSearchCV for hyperparameter optimization
  • Cross-validation with F1-score optimization
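
A rough sketch of how scaling, grid search, and F1-optimized cross-validation fit together in scikit-learn is shown below. It runs on synthetic data with a similar class balance, and the parameter grid, fold count, and split size are illustrative assumptions rather than the notebook's settings. SMOTE is left out so the sketch depends on scikit-learn alone; with imbalanced-learn it would typically be added as a step in an `imblearn.pipeline.Pipeline` so that oversampling is applied only to the training folds.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data with roughly 26.5% positives (not the real dataset).
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=4,
    weights=[0.735, 0.265], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scaling lives inside the pipeline so it is refit on each training fold.
pipe = Pipeline([
    ("scale", MinMaxScaler()),
    ("model", GradientBoostingClassifier(random_state=0)),
])

# Small illustrative grid; the values are assumptions.
param_grid = {
    "model__n_estimators": [50, 100],
    "model__max_depth": [2, 3],
}

search = GridSearchCV(
    pipe,
    param_grid,
    scoring="f1",  # select hyperparameters by F1 on the churn class
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=0),
)
search.fit(X_train, y_train)

test_f1 = f1_score(y_test, search.predict(X_test))
print(search.best_params_, round(test_f1, 2))
```

Keeping the scaler inside the pipeline avoids leaking statistics from validation folds into training, which would otherwise inflate the cross-validated score.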

📊 Model Evaluation Strategy

F1-score was selected as the primary metric. It balances precision and recall, which matters here because both false positives (unnecessary interventions) and false negatives (missed churners) carry business costs.


🎯 Key Results

🏆 Best Model: Gradient Boosting

  • F1-Score: 0.79 (2.8x better than baseline)
  • Overall Accuracy: 89%
  • Precision: 82% - High confidence in churn predictions
  • Recall: 76% - Captures 3 out of 4 potential churners
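
As a quick consistency check, the reported F1-score can be recomputed from the reported precision and recall, since F1 is their harmonic mean. The helper name below is hypothetical; the input numbers are the ones listed above.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


best_f1 = f1_from_precision_recall(0.82, 0.76)
lift = 0.79 / 0.28  # best model F1 relative to the dummy baseline F1

print(round(best_f1, 2))  # 0.79
print(round(lift, 1))     # 2.8
```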

📊 Model Comparison

| Model | F1-Score | Accuracy | Business Value |
|---|---|---|---|
| Dummy Classifier | 0.28 | 61% | Baseline |
| Logistic Regression | 0.66 | 77% | Good interpretability |
| Random Forest | 0.67 | 79% | Feature importance insights |
| Gradient Boosting | 0.79 | 89% | Optimal performance |
| AdaBoost | 0.74 | 86% | Strong alternative |

๐Ÿ” Key Predictive Factors

  1. Contract Type - Month-to-month customers at highest risk
  2. Payment Method - Electronic check users show elevated churn
  3. Service Tenure - Newer customers more likely to leave
  4. Monthly Charges - Price sensitivity impacts retention
  5. Internet Service Type - Fiber optic users require attention

💼 Business Impact

🎯 Immediate Actionable Insights

  • Target month-to-month customers with loyalty programs
  • Improve electronic check payment experience
  • Focus retention efforts on fiber optic subscribers
  • Implement graduated pricing for high-charge customers
  • Enhance onboarding for new subscribers (first 90 days critical)

📈 Expected ROI

  • 76% recall enables proactive intervention for the majority of at-risk customers
  • 82% precision limits wasted marketing spend on false positives
  • Early intervention can avoid the acquisition cost of replacing customers who would otherwise churn

๐Ÿ› ๏ธ Technical Implementation

Core Technologies

Python | Pandas | NumPy | Scikit-learn
Matplotlib | Seaborn | Jupyter Notebook

Advanced Techniques

SMOTE Oversampling | MinMax Scaling
Grid Search CV | Feature Engineering
Ensemble Methods | Statistical Analysis

๐Ÿ“ Project Structure

๐Ÿ“ฆ Telecom_churn_predictor/
โ”œโ”€โ”€ ๐Ÿ““ Telecom_churn_predictor.ipynb    # Complete analysis & methodology
โ”œโ”€โ”€ ๐Ÿ“Š data/
โ”‚   โ”œโ”€โ”€ contract.csv                    # Contract information
โ”‚   โ”œโ”€โ”€ personal.csv                    # Customer demographics  
โ”‚   โ”œโ”€โ”€ internet.csv                    # Internet service data
โ”‚   โ””โ”€โ”€ phone.csv                       # Phone service data
โ”œโ”€โ”€ ๐Ÿ“‹ README.md                        # Project overview
โ””โ”€โ”€ ๐Ÿ“„ requirements.txt                 # Environment dependencies

🚀 Quick Start

# Clone repository and enter the project folder
git clone https://github.com/bergerache/Telecom_churn_predictor.git
cd Telecom_churn_predictor

# Install dependencies (notebook provides the jupyter notebook command)
pip install pandas numpy matplotlib seaborn scikit-learn imbalanced-learn notebook

# Launch analysis
jupyter notebook Telecom_churn_predictor.ipynb

🔮 Business Applications

| Use Case | Implementation |
|---|---|
| 🎯 Targeted Campaigns | Score customers monthly, prioritize top 20% risk |
| 📞 Retention Calls | Automated alerts for high-risk customer segments |
| 💰 Pricing Strategy | Adjust pricing based on churn probability scores |
| 🎁 Promotional Offers | Customize incentives by risk factors and preferences |

๐Ÿฆ Banking & Fintech Applications

While this model was trained on telecom data, the methodology directly applies to financial services churn scenarios:

| Telecom Scenario | Banking Equivalent |
|---|---|
| Customer cancels service | Account closure / Product attrition |
| Contract type impact | Fixed-term vs flexible savings |
| Payment method friction | Direct debit vs manual payment |
| Tenure analysis | Customer relationship length |
| Monthly charges sensitivity | Fee sensitivity / Price elasticity |
| Service usage patterns | Transaction frequency / Product utilisation |

Direct Applications

  • Deposit Account Churn: Predicting customers likely to close current/savings accounts
  • Credit Card Attrition: Identifying cardholders at risk of cancellation
  • Mortgage Refinancing: Flagging customers likely to switch lenders
  • Investment Platform: Early warning for ISA/pension transfer risk

Why This Transfers

The feature engineering approach (tenure analysis, usage patterns, contract terms) translates directly to banking KPIs:

  • Customer tenure → Account age
  • Monthly charges → Fee revenue per customer
  • Contract type → Product type (fixed/flexible)
  • Payment method → Direct debit adoption
  • Multiple services → Product holding depth

📖 Detailed Analysis

🔗 View Complete Jupyter Notebook

Comprehensive methodology, EDA insights, model development, and business recommendations


Built for data-driven customer retention

Transforming customer behavior analysis into strategic business value