A full-stack machine learning system designed to predict customer churn in the telecommunications industry.
This project integrates an XGBoost classifier, a FastAPI backend, and a Streamlit frontend to provide real-time predictions, batch scoring, and actionable customer risk analytics.
Customer churn is a significant revenue challenge in the telecom industry.
Early identification of at-risk customers enables targeted retention strategies, directly improving customer lifetime value (CLV).
This system prioritizes Recall (79.36%) to identify as many potential churners as possible, aligning model performance with business objectives.

    Streamlit UI
      • Real-time Predictions
      • Batch Scoring
      • Customer Risk Analytics
            │
            │  HTTP/REST
            ▼
    FastAPI Backend
      • Prediction Endpoint
      • Preprocessing Pipeline
      • Model + Threshold Loading
            │
            │  Load Model
            ▼
    XGBoost Model
      • churn_xgb.pkl
      • Recall: 79.36% (Primary Metric)
      • F1 Score: 0.642

- High-Recall ML Model (79.36%)
- Production-ready FastAPI REST API
- Interactive Streamlit Interface
- Real-Time + Batch Predictions
- Fully Deployed Infrastructure
Deployment:
- Backend: Render
- Frontend: Streamlit Cloud
Source: Telco Customer Churn Dataset (Kaggle)
File: WA_Fn-UseC_-Telco-Customer-Churn.csv
Shape: 7,043 rows × 21 columns (including customerID and the Churn target)
Target: Churn (Yes / No)
Class Imbalance:
- 73.5% Non-Churn
- 26.5% Churn
- Gender
- SeniorCitizen
- Partner
- Dependents
- PhoneService
- MultipleLines
- InternetService
- OnlineSecurity
- DeviceProtection
- TechSupport
- StreamingTV
- StreamingMovies
- Tenure
- Contract
- PaperlessBilling
- PaymentMethod
- MonthlyCharges
- TotalCharges
- Dropped customerID
- Converted TotalCharges to numeric
- Median imputation for missing values
- Removed OnlineBackup due to quality concerns
- Label Encoding (binary variables)
- One-Hot Encoding (multi-class variables)
Final feature count: 22 engineered features
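The cleaning and encoding steps above can be sketched with pandas. This is an illustrative reconstruction, not the project's actual preprocessing code: the function name `preprocess_frame`, the rule for telling binary from multi-class columns, and the alphabetical label order are all assumptions.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def preprocess_frame(raw: pd.DataFrame) -> pd.DataFrame:
    """Apply the listed cleaning and encoding steps (hypothetical helper)."""
    # Drop the identifier and the column removed for quality concerns.
    df = raw.drop(columns=["customerID", "OnlineBackup"], errors="ignore")

    # TotalCharges arrives as text; blanks become NaN, then the median fills them.
    df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")
    df["TotalCharges"] = df["TotalCharges"].fillna(df["TotalCharges"].median())

    categorical = [c for c in df.columns if not is_numeric_dtype(df[c])]
    binary = [c for c in categorical if df[c].nunique() == 2]
    multi = [c for c in categorical if c not in binary]

    # Label-encode two-valued columns (assumed order: alphabetical, first -> 0).
    for col in binary:
        first, second = sorted(df[col].unique())
        df[col] = df[col].map({first: 0, second: 1})

    # One-hot encode columns with more than two categories.
    return pd.get_dummies(df, columns=multi, dtype=int)


# Tiny made-up sample, only to show the shape of the transformation.
sample = pd.DataFrame({
    "customerID": ["a", "b", "c"],
    "gender": ["Female", "Male", "Female"],
    "Contract": ["Month-to-month", "One year", "Two year"],
    "OnlineBackup": ["Yes", "No", "Yes"],
    "TotalCharges": ["29.85", " ", "100.0"],
    "Churn": ["No", "Yes", "No"],
})
features = preprocess_frame(sample)
print(list(features.columns))
```

The blank TotalCharges value in the sample is replaced by the median of the remaining values, and Contract expands into one indicator column per category.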
- 80% Train
- 20% Test
random_state = 42
Hyperparameters:
- n_estimators = 200
- learning_rate = 0.05
- max_depth = 5
- subsample = 0.8
- colsample_bytree = 0.8
- scale_pos_weight = 2.7
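The `scale_pos_weight` value sits close to the ratio of negative to positive examples, which is the usual starting point for that parameter. A small sketch of the arithmetic, with the hyperparameters collected in a dictionary; reusing `random_state = 42` for the model itself is an assumption, since the source lists it only next to the train/test split.

```python
# Class shares taken from the dataset summary.
NON_CHURN_SHARE = 0.735
CHURN_SHARE = 0.265

# Common heuristic for scale_pos_weight: negatives divided by positives.
imbalance_ratio = NON_CHURN_SHARE / CHURN_SHARE

# Hyperparameters as listed in this README.
xgb_params = {
    "n_estimators": 200,
    "learning_rate": 0.05,
    "max_depth": 5,
    "subsample": 0.8,
    "colsample_bytree": 0.8,
    "scale_pos_weight": 2.7,
    "random_state": 42,  # assumption: same seed as the data split
}

print(f"negatives / positives = {imbalance_ratio:.2f}")
```

The dictionary could be passed to the classifier as `xgboost.XGBClassifier(**xgb_params)`.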
| Metric | Score |
|---|---|
| Accuracy | 76.58% |
| F1 Score | 0.642 |
| Recall | 79.36% |
- Highest recall among tested models
- Recall about 24 percentage points higher than Logistic Regression (79.36% vs. 55.50%), which means roughly 43% more churners identified
- Handles class imbalance effectively
- Strong F1 balance
| Metric | Score |
|---|---|
| Accuracy | 81.83% |
| Recall | 55.50% |
| F1 Score | 0.618 |
While interpretable, Logistic Regression underperforms on recall, making it unsuitable for retention-driven objectives.
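The tables report recall and F1 but not precision. Because F1 is the harmonic mean of the two, precision can be backed out from the reported numbers. The sketch below assumes the published figures are exact; they are rounded, so the results are approximate.

```python
def implied_precision(recall: float, f1: float) -> float:
    """Solve F1 = 2PR / (P + R) for P."""
    return f1 * recall / (2 * recall - f1)


xgb_precision = implied_precision(recall=0.7936, f1=0.642)
logreg_precision = implied_precision(recall=0.5550, f1=0.618)

print(f"XGBoost precision             ~ {xgb_precision:.3f}")
print(f"Logistic Regression precision ~ {logreg_precision:.3f}")
```

This gives roughly 0.539 for XGBoost and 0.697 for Logistic Regression: XGBoost's extra recall comes at the cost of more false alarms, a trade the project accepts on purpose.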
- Recall: critical (do not miss churners)
- F1 Score: balanced evaluation
- Accuracy: least important due to class imbalance
A model predicting "No Churn" for everyone achieves 73.5% accuracy, yet provides zero business value.
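The majority-class baseline can be checked with a few lines of arithmetic. The class counts used here (1,869 churners and 5,174 non-churners) are the dataset's totals, consistent with the 26.5% / 73.5% split; the helper name is hypothetical.

```python
def accuracy_and_recall(tp: int, fp: int, tn: int, fn: int) -> tuple:
    """Compute accuracy and recall from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall


CHURNERS, NON_CHURNERS = 1869, 5174

# Predicting "No Churn" for everyone: every churner becomes a false negative.
baseline_accuracy, baseline_recall = accuracy_and_recall(
    tp=0, fp=0, tn=NON_CHURNERS, fn=CHURNERS
)
print(f"accuracy={baseline_accuracy:.1%}, recall={baseline_recall:.1%}")
# accuracy=73.5%, recall=0.0%
```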
The default threshold of 0.5 is often a poor fit for imbalanced datasets, because it tends to under-detect the minority class.
Optimized threshold: 0.01
This maximizes recall and aligns with churn prevention objectives.
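One way a recall-oriented threshold can be chosen is to scan candidate cut-offs on held-out predictions and keep the highest one that still meets a recall target. This is a generic sketch, not necessarily how the value above was obtained; the function names, the target, and the toy data are hypothetical.

```python
def recall_at(y_true, probs, threshold):
    """Recall when every probability >= threshold is flagged as churn."""
    tp = sum(1 for y, p in zip(y_true, probs) if y == 1 and p >= threshold)
    fn = sum(1 for y, p in zip(y_true, probs) if y == 1 and p < threshold)
    return tp / (tp + fn) if (tp + fn) else 0.0


def pick_threshold(y_true, probs, target_recall):
    """Return the highest observed probability whose recall meets the target."""
    for candidate in sorted(set(probs), reverse=True):
        if recall_at(y_true, probs, candidate) >= target_recall:
            return candidate
    return 0.0  # fall back to flagging everyone


# Toy validation data: 1 = churned, 0 = stayed.
y_true = [1, 1, 1, 0, 0, 1]
probs = [0.9, 0.6, 0.3, 0.4, 0.1, 0.05]

chosen = pick_threshold(y_true, probs, target_recall=0.75)
print(chosen)  # 0.3
```

Lowering the threshold always raises or preserves recall, so the scan stops at the first candidate that reaches the target.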
The trained model and the tuned threshold are stored as separate artifacts: models/churn_xgb.pkl and models/threshold.pkl.
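
    from fastapi import FastAPI
    import joblib

    # Assumption: preprocess() is defined in backend/preprocessing.py.
    from preprocessing import preprocess

    app = FastAPI()

    # Load the trained model and the tuned decision threshold once at startup.
    model = joblib.load("models/churn_xgb.pkl")
    threshold = joblib.load("models/threshold.pkl")


    @app.post("/predict")
    def predict(data: dict):
        # Turn the raw JSON payload into the feature vector the model expects.
        features = preprocess(data)
        # Probability of the churn class, cast to float so it serializes to JSON.
        prob = float(model.predict_proba([features])[0][1])
        pred = int(prob >= threshold)
        return {
            "churn_probability": prob,
            "prediction": pred,
            "risk_level": "High" if pred == 1 else "Low",
        }
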
Deployment:
https://churn-prediction-2qrp.onrender.com/
Streamlit Frontend
- Real-time prediction interface
- Batch CSV scoring
- Visual analytics dashboard
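Batch CSV scoring can be pictured as reading each row, scoring it, and appending the result. The sketch below uses only the standard library and a stand-in scoring function; how the frontend actually scores rows is not described in this README, so `score_csv`, `fake_score`, and the 0.5 cut-off are purely illustrative.

```python
import csv
import io
from typing import Callable, Dict, List


def score_csv(csv_text: str,
              score_fn: Callable[[Dict[str, str]], float],
              threshold: float) -> List[dict]:
    """Score every CSV row and attach a probability and a risk level."""
    results = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        prob = score_fn(row)
        results.append({
            **row,
            "churn_probability": prob,
            "risk_level": "High" if prob >= threshold else "Low",
        })
    return results


def fake_score(row: Dict[str, str]) -> float:
    """Stand-in for a real model call: short tenure means higher risk."""
    return 0.8 if float(row["tenure"]) < 12 else 0.1


sample_csv = "customerID,tenure\nA-1,3\nB-2,40\n"
scored = score_csv(sample_csv, fake_score, threshold=0.5)
for record in scored:
    print(record["customerID"], record["risk_level"])
```

In a real deployment `score_fn` would wrap either the loaded model or a request to the prediction endpoint.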
Deployment:
https://churn-frontend-g3ku8j45b7mfsg6s4ztjfy.streamlit.app/
Installation & Setup
Clone Repository

    git clone https://github.com/yourusername/telecom-churn-prediction.git
    cd telecom-churn-prediction

Backend Setup

    cd backend
    pip install -r requirements.txt
    uvicorn app:app --reload

Frontend Setup

    cd churn-frontend
    pip install -r requirements.txt
    streamlit run app.py

API Usage Example

    curl -X POST "https://churn-prediction-2qrp.onrender.com/predict" \
      -H "Content-Type: application/json" \
      -d '{
        "gender": "Female",
        "SeniorCitizen": 0,
        "Partner": "Yes",
        "tenure": 12,
        "MonthlyCharges": 70
      }'

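The same request can be issued from Python with only the standard library. This is a minimal sketch: the helper names are hypothetical, the payload mirrors the abbreviated curl example, and the response is assumed to be the JSON object returned by the `/predict` handler.

```python
import json
import urllib.request

API_URL = "https://churn-prediction-2qrp.onrender.com/predict"


def build_predict_request(customer: dict, url: str = API_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for one customer."""
    body = json.dumps(customer).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(customer: dict, url: str = API_URL, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(build_predict_request(customer, url), timeout=timeout) as resp:
        return json.load(resp)


customer = {
    "gender": "Female",
    "SeniorCitizen": 0,
    "Partner": "Yes",
    "tenure": 12,
    "MonthlyCharges": 70,
}
request = build_predict_request(customer)
print(request.get_method(), request.full_url)
```

Calling `predict(customer)` performs the network call; building the request separately keeps the payload easy to inspect or test offline.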
Business Impact Analysis
Assumptions:
- Average revenue per user: $64/month
- Customer lifetime value: ≈ $1,500
- Retention cost: $75 per targeted customer
- Retention success rate: 40%
XGBoost (79.36% Recall):
- 1,483 churners identified
- Revenue saved: $890,000
- Campaign cost: $111,225
- Net benefit: $778,775
Logistic Regression (55.50% Recall):
- Revenue saved: $623,000
XGBoost prevents roughly $267,000 more revenue loss than Logistic Regression.
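The XGBoost figures can be reproduced from the stated assumptions. The sketch below assumes recall is applied to the 1,869 churners in the full dataset and that revenue is rounded to the nearest $1,000; both are inferred from the published numbers rather than stated in the source, and the function name is hypothetical.

```python
def campaign_outcome(churners: int, recall: float, ltv: int = 1500,
                     cost_per_contact: int = 75, success_rate: float = 0.40):
    """Return (identified, revenue_saved, campaign_cost, net_benefit) in dollars."""
    identified = round(churners * recall)
    # Retained customers each keep their lifetime value; round to the nearest $1,000.
    revenue_saved = round(round(identified * success_rate * ltv), -3)
    campaign_cost = identified * cost_per_contact
    return identified, revenue_saved, campaign_cost, revenue_saved - campaign_cost


TOTAL_CHURNERS = 1869  # assumption: churners in the full 7,043-row dataset

xgb_outcome = campaign_outcome(TOTAL_CHURNERS, recall=0.7936)
print(xgb_outcome)  # (1483, 890000, 111225, 778775)
```

Calling the same function with `recall=0.555` gives a revenue figure within about $1,000 of the Logistic Regression number quoted above. The campaign cost covers only correctly identified churners; customers flagged in error would add to it.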
Repository Structure

    telecom-churn-prediction/
    │
    ├── backend/
    │   ├── app.py
    │   ├── preprocessing.py
    │   ├── models/
    │   │   ├── churn_xgb.pkl
    │   │   └── threshold.pkl
    │   ├── requirements.txt
    │   └── render.yaml
    │
    ├── churn-frontend/
    │   ├── app.py
    │   ├── pages/
    │   ├── utils/
    │   └── requirements.txt
    │
    ├── notebooks/
    │   └── main_analysis.ipynb
    │
    ├── README.md
    └── .gitignore

Future Enhancements
- SHAP-based Explainability
- Automated Feature Engineering Pipeline
- Customer Segmentation
- Real-Time Scoring API Enhancements
- Intervention Tracking & ROI Dashboard
Contact
Email: workwithanshuman9468@gmail.com
Project Status
- Production Ready
- Model Version: 1.0
- Last Updated: Dec 2025