Machine learning solution to predict customer churn and enable proactive retention strategies
Telecom operators face significant challenges with customer retention, as customer acquisition costs are typically 5-25x higher than retention costs. This project develops a machine learning solution to identify customers at risk of churning, enabling Interconnect to offer targeted promotional codes and special plans before customers leave.
- Proactive retention strategies through early churn identification
- Optimize marketing spend by targeting high-risk customers
- Reduce revenue loss from customer defection
- Increase customer lifetime value through data-driven interventions
Multi-source dataset from Interconnect telecom operator including 7,043 customers:
| Data Source | Key Features |
|---|---|
| Contract Information | Begin/end dates, contract types, billing preferences, payment methods |
| Personal Data | Demographics, senior citizen status, family information |
| Internet Services | DSL/Fiber connections, security features, streaming services |
| Phone Services | Multiple line options and usage patterns |
- 26.5% churn rate (1,869 churned customers out of 7,043)
- Class imbalance (about 2.8 retained customers per churner) that requires dedicated handling during training
- Multi-source integration: Merged 4 datasets using customer ID joins
- Date processing: Converted contract dates to identify active vs churned customers
- Missing value treatment: Filled gaps with business logic (new customers = 0 total charges)
- Feature engineering: Created `duration_days` from contract length for enhanced variability
- Categorical encoding: Mapped text values to numeric (Yes/No → 1/0, contract types → 0/1/2)
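The preprocessing steps above could be sketched roughly as follows. This is a minimal illustration on tiny in-memory frames, not the notebook's actual code: the column names (`customerID`, `BeginDate`, `EndDate`, `Type`, `TotalCharges`, `Partner`), the `"No"` marker for active contracts, the blank total for new customers, and the snapshot date are all assumptions.

```python
import pandas as pd

# Tiny illustrative frames standing in for the four CSV files (values are made up).
contract = pd.DataFrame({
    "customerID": ["A1", "B2", "C3"],
    "BeginDate": ["2019-01-01", "2020-02-01", "2018-06-01"],
    "EndDate": ["No", "No", "2019-12-01"],      # assumed: "No" marks an active contract
    "Type": ["Month-to-month", "Two year", "One year"],
    "TotalCharges": ["350.5", " ", "1200.0"],   # assumed: blank for brand-new customers
})
personal = pd.DataFrame({"customerID": ["A1", "B2", "C3"], "Partner": ["Yes", "No", "Yes"]})
internet = pd.DataFrame({"customerID": ["A1", "C3"], "InternetService": ["DSL", "Fiber optic"]})
phone = pd.DataFrame({"customerID": ["A1", "B2"], "MultipleLines": ["No", "Yes"]})

# 1. Multi-source integration: left-join everything onto the contract table.
df = contract
for other in (personal, internet, phone):
    df = df.merge(other, on="customerID", how="left")

# 2. Date processing: a real end date means the customer has churned.
snapshot = pd.Timestamp("2020-02-01")  # assumed data-extraction date
df["churn"] = (df["EndDate"] != "No").astype(int)
begin = pd.to_datetime(df["BeginDate"])
end = pd.to_datetime(df["EndDate"].where(df["EndDate"] != "No")).fillna(snapshot)

# 3. Missing values: new customers have not been billed yet, so total charges = 0.
df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce").fillna(0.0)

# 4. Feature engineering: contract length in days.
df["duration_days"] = (end - begin).dt.days

# 5. Categorical encoding: Yes/No -> 1/0, contract types -> 0/1/2.
df["Partner"] = df["Partner"].map({"Yes": 1, "No": 0})
df["Type"] = df["Type"].map({"Month-to-month": 0, "One year": 1, "Two year": 2})

print(df[["customerID", "churn", "TotalCharges", "duration_days", "Type"]])
```

A left join keeps customers who have no internet or phone service; their service columns come back as missing values and would need their own fill rule.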
Key business insights discovered:
| Finding | Churn Impact |
|---|---|
| Electronic check payments | Higher churn rates - potential payment friction |
| Month-to-month contracts | Significantly higher churn vs annual contracts |
| Monthly charges | Churned customers average ~$80 vs ~$65 for retained |
| Fiber optic service | Highest churn ratio - may indicate service quality issues |
| Contract duration | Longer tenure strongly correlates with retention |
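Comparisons like those in the table above usually come from grouping by a feature and averaging a churn flag. A minimal sketch on made-up rows, assuming a binary `churn` column and `PaymentMethod` / `MonthlyCharges` column names (none of which are confirmed by the source):

```python
import pandas as pd

# Illustrative rows only; these are not the project's data.
df = pd.DataFrame({
    "PaymentMethod": ["Electronic check", "Electronic check", "Mailed check",
                      "Credit card", "Electronic check", "Credit card"],
    "MonthlyCharges": [85.0, 75.0, 60.0, 70.0, 80.0, 65.0],
    "churn": [1, 1, 0, 0, 0, 0],
})

# Churn rate per category: the mean of a 0/1 flag is the share of churners.
churn_by_payment = (
    df.groupby("PaymentMethod")["churn"].mean().sort_values(ascending=False)
)

# Average monthly charge for retained (0) vs churned (1) customers.
avg_charge = df.groupby("churn")["MonthlyCharges"].mean()

print(churn_by_payment)
print(avg_charge)
```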
Implemented comprehensive ML pipeline:
Models tested:
- Dummy Classifier (baseline)
- Logistic Regression
- Random Forest
- Gradient Boosting (best performer)
- AdaBoost

Advanced techniques:
- SMOTE oversampling for class imbalance handling
- MinMax scaling for feature normalization
- GridSearchCV for hyperparameter optimization
- Cross-validation with F1-score optimization
F1-score was selected as the primary metric because it balances precision and recall: both false positives (unnecessary interventions) and false negatives (missed churners) carry business costs.
- F1-Score: 0.79 (2.8x better than baseline)
- Overall Accuracy: 89%
- Precision: 82% - High confidence in churn predictions
- Recall: 76% - Captures 3 out of 4 potential churners
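As a quick consistency check, the reported F1-score and the "2.8x" figure can be reproduced from the precision, recall, and baseline values quoted in this README:

```python
precision, recall = 0.82, 0.76   # reported for the best model
baseline_f1 = 0.28               # reported for the Dummy Classifier

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
lift = round(f1, 2) / baseline_f1

print(round(f1, 2))    # 0.79
print(round(lift, 1))  # 2.8
```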
| Model | F1-Score | Accuracy | Business Value |
|---|---|---|---|
| Dummy Classifier | 0.28 | 61% | Baseline |
| Logistic Regression | 0.66 | 77% | Good interpretability |
| Random Forest | 0.67 | 79% | Feature importance insights |
| Gradient Boosting | 0.79 | 89% | Optimal performance |
| AdaBoost | 0.74 | 86% | Strong alternative |
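A comparison table of this kind can be produced by cross-validating each candidate on the same data with the same metrics. The sketch below uses synthetic data and default hyperparameters, so its numbers will not match the table; the `stratified` strategy for the dummy baseline is an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

# Synthetic stand-in for the churn data.
X, y = make_classification(
    n_samples=500, n_features=10, weights=[0.73, 0.27], random_state=0
)

models = {
    # Assumed strategy: random guesses that follow the class frequencies.
    "Dummy Classifier": DummyClassifier(strategy="stratified", random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
}

results = {}
for name, model in models.items():
    scores = cross_validate(model, X, y, cv=3, scoring=["f1", "accuracy"])
    results[name] = (scores["test_f1"].mean(), scores["test_accuracy"].mean())

for name, (f1, acc) in results.items():
    print(f"{name:<20} F1={f1:.2f}  accuracy={acc:.0%}")
```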
- Contract Type - Month-to-month customers at highest risk
- Payment Method - Electronic check users show elevated churn
- Service Tenure - Newer customers more likely to leave
- Monthly Charges - Price sensitivity impacts retention
- Internet Service Type - Fiber optic users require attention
- Target month-to-month customers with loyalty programs
- Improve electronic check payment experience
- Focus retention efforts on fiber optic subscribers
- Implement graduated pricing for high-charge customers
- Enhance onboarding for new subscribers (first 90 days critical)
- 76% recall rate enables proactive intervention for majority of at-risk customers
- 82% precision minimizes wasted marketing spend on false positives
- Early intervention potential saves acquisition costs for retained customers
Core Technologies
Python | Pandas | NumPy | Scikit-learn
Matplotlib | Seaborn | Jupyter Notebook
Advanced Techniques
SMOTE Oversampling | MinMax Scaling
Grid Search CV | Feature Engineering
Ensemble Methods | Statistical Analysis

    Telecom_churn_predictor/
    ├── Telecom_churn_predictor.ipynb   # Complete analysis & methodology
    ├── data/
    │   ├── contract.csv                # Contract information
    │   ├── personal.csv                # Customer demographics
    │   ├── internet.csv                # Internet service data
    │   └── phone.csv                   # Phone service data
    ├── README.md                       # Project overview
    └── requirements.txt                # Environment dependencies

To run the analysis:

    # Clone repository
    git clone https://github.com/bergerache/Telecom_churn_predictor.git
    cd Telecom_churn_predictor
    # Install dependencies
    pip install pandas numpy matplotlib seaborn scikit-learn imbalanced-learn
    # Launch analysis
    jupyter notebook Telecom_churn_predictor.ipynb

| Use Case | Implementation |
|---|---|
| Targeted Campaigns | Score customers monthly, prioritize top 20% risk |
| Retention Calls | Automated alerts for high-risk customer segments |
| Pricing Strategy | Adjust pricing based on churn probability scores |
| Promotional Offers | Customize incentives by risk factors and preferences |
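The "top 20% risk" rule from the table can be sketched as a ranking on predicted churn probability. The scores below are hypothetical placeholders for model output, and the `customerID` and `churn_probability` column names are assumptions.

```python
import pandas as pd

# Hypothetical churn probabilities, e.g. from model.predict_proba(X)[:, 1].
scores = pd.DataFrame({
    "customerID": [f"C{i}" for i in range(10)],
    "churn_probability": [0.05, 0.91, 0.40, 0.12, 0.77, 0.33, 0.08, 0.64, 0.21, 0.50],
})

top_share = 0.20  # share of the customer base to target each month
n_target = max(1, int(len(scores) * top_share))

# Keep the n_target customers with the highest predicted churn probability.
campaign = scores.nlargest(n_target, "churn_probability")
print(campaign["customerID"].tolist())  # ['C1', 'C4']
```

Ranking by a fixed share keeps the monthly campaign size predictable; a fixed probability threshold would instead let the number of targeted customers vary from month to month.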
While this model was trained on telecom data, the methodology directly applies to financial services churn scenarios:
| Telecom Scenario | Banking Equivalent |
|---|---|
| Customer cancels service | Account closure / Product attrition |
| Contract type impact | Fixed-term vs flexible savings |
| Payment method friction | Direct debit vs manual payment |
| Tenure analysis | Customer relationship length |
| Monthly charges sensitivity | Fee sensitivity / Price elasticity |
| Service usage patterns | Transaction frequency / Product utilisation |
- Deposit Account Churn: Predicting customers likely to close current/savings accounts
- Credit Card Attrition: Identifying cardholders at risk of cancellation
- Mortgage Refinancing: Flagging customers likely to switch lenders
- Investment Platform: Early warning for ISA/pension transfer risk
The feature engineering approach (tenure analysis, usage patterns, contract terms) translates directly to banking KPIs:
- Account age → Customer tenure
- Monthly charges → Fee revenue per customer
- Contract type → Product type (fixed/flexible)
- Payment method → Direct debit adoption
- Multiple services → Product holding depth
View Complete Jupyter Notebook
Comprehensive methodology, EDA insights, model development, and business recommendations
Built for data-driven customer retention
Transforming customer behavior analysis into strategic business value