
anshuman9468/Churn-Prediction-Model-


📊 Telecom Customer Churn Prediction

End-to-End Production ML System

A full-stack machine learning system designed to predict customer churn in the telecommunications industry.
This project integrates an XGBoost classifier, a FastAPI backend, and a Streamlit frontend to provide real-time predictions, batch scoring, and actionable customer risk analytics.


🚀 Project Overview

Customer churn is a significant revenue challenge in the telecom industry.
Early identification of at-risk customers enables targeted retention strategies, directly improving customer lifetime value (CLV).

This system prioritizes recall (79.36%) to identify as many potential churners as possible, aligning model performance with business objectives.


🏗️ System Architecture

┌──────────────────────────────────────┐
│             Streamlit UI             │
│  • Real-time Predictions             │
│  • Batch Scoring                     │
│  • Customer Risk Analytics           │
└───────────────────┬──────────────────┘
                    │ HTTP/REST
                    ▼
┌──────────────────────────────────────┐
│           FastAPI Backend            │
│  • Prediction Endpoint               │
│  • Preprocessing Pipeline            │
│  • Model + Threshold Loading         │
└───────────────────┬──────────────────┘
                    │ Load Model
                    ▼
┌──────────────────────────────────────┐
│            XGBoost Model             │
│  • churn_xgb.pkl                     │
│  • Recall: 79.36% (Primary Metric)   │
│  • F1 Score: 0.642                   │
└──────────────────────────────────────┘


✨ Key Features

  • ✔ High-recall ML model (79.36%)
  • ✔ Production-ready FastAPI REST API
  • ✔ Interactive Streamlit interface
  • ✔ Real-time and batch predictions
  • ✔ Fully deployed infrastructure

Deployment:

  • Backend: Render
  • Frontend: Streamlit Cloud

📂 Dataset

Source: Telco Customer Churn Dataset (Kaggle)
File: WA_Fn-UseC_-Telco-Customer-Churn.csv
Shape: 7,043 rows × 21 columns (customerID, 19 input features, and the Churn target)

Target Variable

  • Churn → Yes / No
  • Class Imbalance:
    • 73.5% Non-Churn
    • 26.5% Churn

🧩 Feature Categories

Demographics

  • Gender
  • SeniorCitizen
  • Partner
  • Dependents

Services

  • PhoneService
  • MultipleLines
  • InternetService
  • OnlineSecurity
  • DeviceProtection
  • TechSupport
  • StreamingTV
  • StreamingMovies

Account Information

  • Tenure
  • Contract
  • PaperlessBilling
  • PaymentMethod
  • MonthlyCharges
  • TotalCharges

🔧 Data Preprocessing

Cleaning

  • Dropped customerID
  • Converted TotalCharges to numeric
  • Median imputation for missing values
  • Removed OnlineBackup due to quality concerns

Encoding

  • Label Encoding (binary variables)
  • One-Hot Encoding (multi-class variables)

Final feature count: 22 engineered features
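
The cleaning and encoding steps above can be sketched with pandas as follows. This is a minimal illustration, not the repository's preprocessing.py: the helper name `clean_telco` is hypothetical, it assumes the raw Kaggle column names, and it treats every non-numeric column with exactly two distinct values as "binary" (the author's actual column grouping may differ).

```python
import pandas as pd


def clean_telco(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical sketch of the cleaning and encoding steps listed above."""
    # Drop the identifier and the column excluded for quality reasons
    df = df.drop(columns=["customerID", "OnlineBackup"], errors="ignore")

    # TotalCharges is read as text; blanks become NaN and are median-imputed
    df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")
    df["TotalCharges"] = df["TotalCharges"].fillna(df["TotalCharges"].median())

    # Remaining non-numeric columns are treated as categorical
    categorical = [c for c in df.columns if not pd.api.types.is_numeric_dtype(df[c])]
    binary = [c for c in categorical if df[c].nunique() == 2]
    multi = [c for c in categorical if c not in binary]

    # Label encoding: category codes follow sorted order, so "No" -> 0 and "Yes" -> 1
    for col in binary:
        df[col] = df[col].astype("category").cat.codes

    # One-hot encoding for multi-class columns such as Contract or PaymentMethod
    return pd.get_dummies(df, columns=multi, dtype=int)
```

Deriving the binary/multi-class split from the data keeps the sketch short; in production it is safer to fix the column lists at training time so that a small batch (where a column happens to show only one or two values) is encoded the same way as the training data.
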


🤖 Model Development

Train/Test Split

  • 80% Train
  • 20% Test
  • random_state = 42

🟢 Primary Model: XGBoost Classifier

Hyperparameters

  • n_estimators = 200
  • learning_rate = 0.05
  • max_depth = 5
  • subsample = 0.8
  • colsample_bytree = 0.8
  • scale_pos_weight = 2.7
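
The XGBoost parameter documentation suggests sum(negative instances) / sum(positive instances) as a typical starting value for scale_pos_weight. A quick check against the class split in the Dataset section gives about 2.77, close to the 2.7 used here; whether 2.7 was actually derived this way is an assumption. The helper `imbalance_ratio` is illustrative.

```python
def imbalance_ratio(n_negative: int, n_positive: int) -> float:
    """Negative-to-positive ratio, a common starting point for scale_pos_weight."""
    return n_negative / n_positive


# Class split from the Dataset section: 73.5% non-churn vs 26.5% churn
ratio = imbalance_ratio(735, 265)
print(f"suggested scale_pos_weight ~ {ratio:.2f}")  # suggested scale_pos_weight ~ 2.77
```
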

Model Performance

| Metric | Score |
|---|---|
| Accuracy | 76.58% |
| F1 Score | 0.642 |
| ⭐ Recall | 79.36% |

Why XGBoost?

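  • Highest recall among tested models
  • About 24 percentage points higher recall than Logistic Regression (79.36% vs 55.50%), i.e. roughly 43% more churners identified
  • Handles class imbalance via scale_pos_weight
  • Higher F1 score than Logistic Regression (0.642 vs 0.618)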

🟡 Secondary Model: Logistic Regression

| Metric | Score |
|---|---|
| Accuracy | 81.83% |
| Recall | 55.50% |
| F1 Score | 0.618 |

While interpretable, Logistic Regression underperforms on recall, making it unsuitable for retention-driven objectives.


🎯 Business Metric Prioritization

  1. Recall โ€“ Critical (do not miss churners)
  2. F1 Score โ€“ Balanced evaluation
  3. Accuracy โ€“ Least important due to class imbalance

A model that predicts “No Churn” for every customer achieves 73.5% accuracy yet catches zero churners, so it provides no business value.
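
To make this concrete, the sketch below scores the trivial "always No Churn" baseline from confusion-matrix counts, using the class counts in the Kaggle file (5,174 non-churn and 1,869 churn customers, the 73.5% / 26.5% split above). The helper `accuracy_and_recall` is illustrative.

```python
def accuracy_and_recall(tp: int, fp: int, tn: int, fn: int) -> tuple[float, float]:
    """Return (accuracy, recall) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # Recall = share of actual churners that the model flags
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall


# "Always No Churn": every churner is a false negative, every non-churner a true negative
acc, rec = accuracy_and_recall(tp=0, fp=0, tn=5174, fn=1869)
print(f"accuracy={acc:.1%}, recall={rec:.1%}")  # accuracy=73.5%, recall=0.0%
```
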


⚙️ Threshold Optimization

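The default threshold (0.5) is not necessarily appropriate for an imbalanced dataset.

Optimized threshold: 0.01

Lowering the threshold flags more customers as likely churners, which raises recall at the cost of more false alarms (lower precision). This trade-off matches the churn-prevention objective, where missing a churner costs more than contacting a customer who would have stayed.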
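
One way to choose a threshold for a recall target is to sweep candidate cut-offs on validation predictions and keep the highest one that still reaches the target. The README does not say how 0.01 was selected, so this plain-Python sketch is an assumption; the function name `pick_threshold` and the toy data are hypothetical. It applies the same `prob >= threshold` rule as the FastAPI endpoint below.

```python
def pick_threshold(probs: list[float], labels: list[int], target_recall: float) -> float:
    """Return the highest threshold t such that predicting churn when prob >= t
    reaches at least target_recall on the given validation data."""
    if not 0.0 <= target_recall <= 1.0:
        raise ValueError("target_recall must be between 0 and 1")
    positives = sum(labels)
    if positives == 0:
        raise ValueError("need at least one churner (label 1) to measure recall")
    # Recall can only fall as t rises, so scan candidates from high to low and stop
    # at the first one that meets the target (it flags the fewest non-churners)
    for t in sorted(set(probs), reverse=True):
        tp = sum(1 for p, y in zip(probs, labels) if p >= t and y == 1)
        if tp / positives >= target_recall:
            return t
    return min(probs)  # lowest candidate flags everyone, i.e. recall 1.0


# Toy validation predictions (hypothetical)
probs = [0.9, 0.8, 0.35, 0.3, 0.2, 0.05]
labels = [1, 0, 1, 0, 1, 0]
print(pick_threshold(probs, labels, target_recall=0.6))  # 0.35
```

scikit-learn's `precision_recall_curve` computes the same recall values in one vectorized call and is the more usual choice on real data.
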


📦 Saved Artifacts

models/
├── churn_xgb.pkl
└── threshold.pkl


🌐 FastAPI Backend

Example Endpoint

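from fastapi import FastAPI
import joblib

# Assumption: backend/preprocessing.py provides preprocess(), which converts the
# raw JSON payload into a flat list of feature values in the training column order.
from preprocessing import preprocess

app = FastAPI()

# Load the trained model and the tuned decision threshold once, at startup
model = joblib.load("models/churn_xgb.pkl")
threshold = joblib.load("models/threshold.pkl")

@app.post("/predict")
def predict(data: dict):
    features = preprocess(data)
    # predict_proba returns [[P(no churn), P(churn)]]; keep the churn column and
    # cast the NumPy scalar to float so FastAPI can serialize it to JSON
    prob = float(model.predict_proba([features])[0][1])
    pred = int(prob >= threshold)

    return {
        "churn_probability": prob,
        "prediction": pred,
        "risk_level": "High" if pred == 1 else "Low",
    }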

Deployment:
https://churn-prediction-2qrp.onrender.com/

🖥️ Streamlit Frontend

  • Real-time prediction interface
  • Batch CSV scoring
  • Visual analytics dashboard

Deployment:
https://churn-frontend-g3ku8j45b7mfsg6s4ztjfy.streamlit.app/

🛠️ Installation & Setup
Clone Repository
git clone https://github.com/anshuman9468/Churn-Prediction-Model-.git
cd Churn-Prediction-Model-
Backend Setup
cd backend
pip install -r requirements.txt
uvicorn app:app --reload
Frontend Setup
cd churn-frontend
pip install -r requirements.txt
streamlit run app.py

🧪 API Usage Example
curl -X POST "https://churn-prediction-2qrp.onrender.com/predict" \
-H "Content-Type: application/json" \
-d '{
  "gender": "Female",
  "SeniorCitizen": 0,
  "Partner": "Yes",
  "tenure": 12,
  "MonthlyCharges": 70
}'
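
The same request can be sent from Python using only the standard library. This sketch builds a POST request that mirrors the curl call above (same URL and the same partial payload); `build_predict_request` is a hypothetical helper, and whether the deployed service accepts this abbreviated payload depends on the backend's preprocessing, which is not shown here.

```python
import json
import urllib.request

API_URL = "https://churn-prediction-2qrp.onrender.com/predict"


def build_predict_request(payload: dict, url: str = API_URL) -> urllib.request.Request:
    """Build a JSON POST request for the /predict endpoint (not sent yet)."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


customer = {"gender": "Female", "SeniorCitizen": 0, "Partner": "Yes",
            "tenure": 12, "MonthlyCharges": 70}
req = build_predict_request(customer)

# Uncomment to call the live service (requires network access):
# with urllib.request.urlopen(req, timeout=30) as resp:
#     print(json.load(resp))
```
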

📈 Business Impact Analysis

Assumptions

Avg Revenue/User: $64/month

Lifetime Value ≈ $1,500

Retention Cost: $75

Retention Success Rate: 40%

XGBoost (79.36% Recall)

1,483 churners identified

Revenue saved: $890,000

Campaign cost: $111,225

Net benefit: $778,775

Logistic Regression (55.50% Recall)

Revenue saved: $623,000

XGBoost prevents about $267,000 more revenue loss than Logistic Regression.
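
The arithmetic behind these figures can be reproduced with a short script. This sketch assumes recall is applied to the 1,869 churners in the full Kaggle file (about 26.5% of 7,043), which is consistent with the 1,483 figure above, and that the campaign contacts only the identified churners; `campaign_impact` is a hypothetical helper. It yields $889,800 revenue saved and a $778,575 net benefit for XGBoost, and $622,200 revenue saved for Logistic Regression; the small differences (under $1,000) from the figures above appear to come from rounding.

```python
def campaign_impact(total_churners: int, recall: float, ltv: float = 1500.0,
                    cost_per_contact: float = 75.0, success_rate: float = 0.40) -> dict:
    """Estimate retention-campaign economics for a given model recall."""
    identified = round(total_churners * recall)       # churners the model flags
    revenue_saved = identified * success_rate * ltv   # retained customers x lifetime value
    campaign_cost = identified * cost_per_contact     # one retention offer per flagged churner
    return {
        "identified": identified,
        "revenue_saved": revenue_saved,
        "campaign_cost": campaign_cost,
        "net_benefit": revenue_saved - campaign_cost,
    }


TOTAL_CHURNERS = 1869  # churners in the full dataset (assumption, see lead-in)
xgb = campaign_impact(TOTAL_CHURNERS, recall=0.7936)
logreg = campaign_impact(TOTAL_CHURNERS, recall=0.5550)
print(xgb)
print(f"Extra revenue retained by XGBoost: ${xgb['revenue_saved'] - logreg['revenue_saved']:,.0f}")
```
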

📂 Repository Structure
telecom-churn-prediction/
│
├── backend/
│   ├── app.py
│   ├── preprocessing.py
│   ├── models/
│   │   ├── churn_xgb.pkl
│   │   └── threshold.pkl
│   ├── requirements.txt
│   └── render.yaml
│
├── churn-frontend/
│   ├── app.py
│   ├── pages/
│   ├── utils/
│   └── requirements.txt
│
├── notebooks/
│   └── main_analysis.ipynb
│
├── README.md
└── .gitignore

🔮 Future Enhancements

SHAP-based Explainability

Automated Feature Engineering Pipeline

Customer Segmentation

Real-Time Scoring API Enhancements

Intervention Tracking & ROI Dashboard

📬 Contact

📧 workwithanshuman9468@gmail.com

✅ Project Status

Production Ready
Model Version: 1.0
Last Updated: Dec 2025



About

This is my second capstone project. It strengthened my skills with ML algorithms and taught me the full workflow, from data preprocessing to model training, including how to improve and maintain model accuracy.
