A production-ready ML microservice for telecom customer churn prediction.
Trained on 7,043 real customers. Deployed via FastAPI + Docker. Returns churn probability + risk tier.
| Metric | Detail |
|---|---|
| 📊 Training Data | 7,043 telecom customers (Telco Customer Churn — Kaggle) |
| 🤖 Model | Random Forest (n_estimators=200, class_weight="balanced") |
| 🎯 Features | 19 real churn signals (tenure, contract type, monthly charges, etc.) |
| 📤 Output | Churn prediction + probability score + risk tier (Low / Medium / High) |
| 🐳 Deployment | Dockerized — one command to build and run |
| 📄 API Docs | Auto-generated Swagger UI at /docs |
Most ML projects end at a .ipynb notebook. This project completes the full ML lifecycle:
1. train_churn_model.py → Train & evaluate on real Telco dataset → save .pkl
2. main.py → FastAPI wraps model as a REST microservice
3. Dockerfile → Containerize for cloud deployment
The result is a service any application can call — no Python environment needed on the consumer side.
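A minimal sketch of the training step (step 1 above). The column names come from the Telco dataset, but the tiny in-memory DataFrame here is a hypothetical stand-in for the Kaggle CSV, and the exact preprocessing in train_churn_model.py may differ:

```python
import pickle

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder

# Stand-in rows; the real script loads the Kaggle CSV with 7,043 customers.
df = pd.DataFrame({
    "tenure": [1, 60, 5, 72, 2, 30],
    "Contract": ["Month-to-month", "Two year", "Month-to-month",
                 "Two year", "Month-to-month", "One year"],
    "MonthlyCharges": [70.35, 20.0, 80.5, 25.0, 90.0, 55.0],
    "Churn": ["Yes", "No", "Yes", "No", "Yes", "No"],
})

# One LabelEncoder per categorical column (as the tech-stack table describes)
for col in ["Contract", "Churn"]:
    df[col] = LabelEncoder().fit_transform(df[col])

X, y = df.drop(columns=["Churn"]), df["Churn"]

# class_weight="balanced" compensates for the ~26% churn minority class
model = RandomForestClassifier(
    n_estimators=200, class_weight="balanced", random_state=42
).fit(X, y)

# Persist the model AND the feature column order, so the API can
# reorder incoming payloads to match training.
with open("churn_model.pkl", "wb") as f:
    pickle.dump(model, f)
with open("feature_columns.pkl", "wb") as f:
    pickle.dump(list(X.columns), f)
```

Saving `feature_columns.pkl` alongside the model is what lets main.py enforce column order at inference time.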
```python
class CustomerData(BaseModel):
    tenure: int            # Months with the company (0–72)
    Contract: int          # 0=Month-to-month, 1=One year, 2=Two year
    MonthlyCharges: float  # Monthly bill in USD
    TotalCharges: float    # Cumulative charges to date
    InternetService: int   # 0=DSL, 1=Fiber optic, 2=No
    # ... 14 additional features

@app.post("/predict")
def predict(customer: CustomerData):
    df = pd.DataFrame([customer.model_dump()])
    df = df[feature_columns]  # Enforce training column order
    prediction = int(model.predict(df)[0])
    probability = float(model.predict_proba(df)[0][1])
    return {
        "prediction": "Will Churn" if prediction == 1 else "Will Stay",
        "churn_probability": round(probability, 4),
        "risk_level": "High" if probability >= 0.7 else "Medium" if probability >= 0.4 else "Low",
    }
```

| Layer | Technology |
|---|---|
| API Framework | FastAPI |
| Input Validation | Pydantic v2 BaseModel with field descriptions |
| ML Model | Random Forest (class_weight="balanced" for ~26% churn imbalance) |
| Feature Encoding | LabelEncoder per categorical column |
| Serialization | Pickle (model + feature column order) |
| Containerization | Docker (python:3.10-slim) |
| API Docs | Auto-generated OpenAPI / Swagger UI |
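The "Pydantic v2 BaseModel with field descriptions" row is where malformed requests get stopped before they reach the model. A hedged sketch with a trimmed-down model (the real CustomerData has 19 fields, and its exact constraints may differ):

```python
from pydantic import BaseModel, Field, ValidationError

# Hypothetical 3-field version of CustomerData, for illustration only
class CustomerData(BaseModel):
    tenure: int = Field(ge=0, le=72, description="Months with the company")
    Contract: int = Field(ge=0, le=2, description="0=M2M, 1=One yr, 2=Two yr")
    MonthlyCharges: float = Field(gt=0, description="Monthly bill in USD")

ok = CustomerData(tenure=5, Contract=0, MonthlyCharges=70.35)
print(ok.model_dump())  # {'tenure': 5, 'Contract': 0, 'MonthlyCharges': 70.35}

try:
    CustomerData(tenure=999, Contract=0, MonthlyCharges=70.35)  # out of range
except ValidationError as e:
    print(e.error_count(), "validation error")  # FastAPI turns this into HTTP 422
```

Because FastAPI reads these Field descriptions, they also show up automatically in the Swagger UI at /docs.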
POST /predict

```json
// Request
{
  "gender": 1, "SeniorCitizen": 0, "Partner": 1, "Dependents": 0,
  "tenure": 5, "Contract": 0, "PaperlessBilling": 1,
  "PaymentMethod": 2, "MonthlyCharges": 70.35, "TotalCharges": 351.75,
  ...
}

// Response
{
  "prediction": "Will Churn",
  "churn_probability": 0.7823,
  "risk_level": "High"
}
```

Additional endpoints:

- `GET /` — API info
- `GET /health` — Health check + model version
```bash
# 1. Clone
git clone https://github.com/Rahilshah01/customer-churn-prediction-api.git
cd customer-churn-prediction-api

# 2. Download dataset from Kaggle → place CSV in project root
#    https://www.kaggle.com/datasets/blastchar/telco-customer-churn

# 3. Train and save the model
pip install scikit-learn pandas
python train_churn_model.py

# 4. Build and run with Docker
docker build -t churn-api .
docker run -p 8000:8000 churn-api

# 5. Open Swagger UI
#    http://localhost:8000/docs
```

Without Docker:

```bash
pip install fastapi uvicorn scikit-learn pandas pydantic
uvicorn main:app --reload
```

```
customer-churn-prediction-api/
├── train_churn_model.py   # Data cleaning, training, evaluation, saves .pkl files
├── main.py                # FastAPI app — prediction endpoint
├── churn_model.pkl        # Serialized RandomForestClassifier
├── feature_columns.pkl    # Saved feature order (prevents column mismatch)
├── Dockerfile
├── requirements.txt
└── README.md
```
The model was trained in Customer Churn Analysis, a full EDA of churn drivers across 7,000+ telecom customers. This repo handles the deployment phase.
Built by Rahil Shah · MS Data Science @ Stevens Institute of Technology