Mission: Predict which customers will subscribe to term deposits BEFORE calling them, transforming wasteful mass campaigns into efficient targeted outreach.
The setting, a bank's 2020 campaign: marketing teams were struggling with massive inefficiency:
❌ Current Approach: Mass Contact Strategy
- Call 40,000 customers
- Only 2,896 subscribe (7.2% success rate)
- 37,104 wasted calls (92.8% rejection)
- High costs, customer annoyance, agent burnout
The Core Problems:
- Resource Waste: 93 out of 100 calls end in rejection
- Customer Frustration: People who would NEVER subscribe get harassed
- Missed Opportunities: High-potential customers treated same as low-potential
- No Prioritization: Agents call randomly, no strategy
- Campaign Fatigue: Calling same customer 10+ times with diminishing returns
Business Impact:
For 100,000 customer campaign:
- Cost: €500,000 (€5 per call)
- Subscriptions: 7,200 (at 7.2% rate)
- Wasted efforts: 92,800 useless calls
- Agent time: 167,000 hours (at 100 min avg call time)
The Question: Can we predict who will subscribe BEFORE calling, and focus efforts where they matter?
Build a machine learning model that scores every customer from 0% to 100% likelihood of subscribing, enabling:
✅ Targeted Campaigns: Call top 20% → Capture 60% of potential subscribers
✅ Tiered Strategy: Tier 1 (high scores) gets premium agents, Tier 3 gets email
✅ Stop Rules: Don't call beyond 3 attempts (diminishing returns)
✅ Resource Optimization: 40-60% fewer calls for same results
Two-Stage Architecture:
Stage 1: XGBoost Classifier
└─ Learns complex patterns (e.g., "high balance + professional + no loans = 30% success")
└─ Handles class imbalance (92.8% / 7.2% split)
└─ Output: Raw predictions
Stage 2: Platt Scaling Calibration
└─ Converts scores into trustworthy probabilities
└─ "If model says 40%, then ~40% actually subscribe"
└─ Output: Calibrated probabilities for business planning
Key Innovation: Not just predicting yes/no, but providing ranked scores for prioritization
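The two stages map directly onto scikit-learn's calibration wrapper. A minimal sketch on synthetic data (all shapes and numbers are illustrative, not the project's; it falls back to scikit-learn's own booster if xgboost isn't installed):

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.model_selection import train_test_split

# The project uses XGBoost; fall back to scikit-learn's booster if it's missing,
# so the sketch stays runnable either way.
try:
    from xgboost import XGBClassifier
    HAS_XGB = True
except ImportError:
    from sklearn.ensemble import GradientBoostingClassifier
    HAS_XGB = False

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 10))                 # hypothetical feature matrix
y = (rng.random(2000) < 0.08).astype(int)       # ~8% positives, mimicking the imbalance

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)

# Stage 1: gradient-boosted trees; scale_pos_weight re-weights the rare "yes" class
if HAS_XGB:
    base = XGBClassifier(n_estimators=100, max_depth=6, learning_rate=0.1,
                         scale_pos_weight=(y_tr == 0).sum() / (y_tr == 1).sum())
else:
    base = GradientBoostingClassifier(n_estimators=100, max_depth=6, learning_rate=0.1)

# Stage 2: Platt scaling = CalibratedClassifierCV with method="sigmoid"
model = CalibratedClassifierCV(base, method="sigmoid", cv=3)
model.fit(X_tr, y_tr)

scores = model.predict_proba(X_te)[:, 1]        # calibrated subscription probabilities
```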
| Metric | Value | What It Means |
|---|---|---|
| ROC-AUC | 0.919 | Excellent! Model ranks customers correctly 92% of the time |
| Precision-Recall AUC | 0.483 | Fair: 40% precision achievable at 60% recall; AP is 6.7× the 0.072 random baseline |
| Cross-Validation Stability | 0.916 ± 0.003 | Extremely consistent across folds |
| Calibration Error (ECE) | 0.042 | Good - probabilities reasonably accurate |
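The ECE figure can be reproduced in spirit with a simple binned estimator. A minimal sketch with equal-width bins (the function name is mine, not from the project code):

```python
import numpy as np

def expected_calibration_error(y_true, y_prob, n_bins=10):
    """ECE: per-bin |observed rate - mean predicted prob|, weighted by bin size."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        # last bin is closed on the right so probability 1.0 is counted
        mask = (y_prob >= lo) & ((y_prob < hi) if hi < 1.0 else (y_prob <= hi))
        if mask.sum() == 0:
            continue
        conf = y_prob[mask].mean()   # average predicted probability in the bin
        acc = y_true[mask].mean()    # observed positive rate in the bin
        ece += mask.mean() * abs(acc - conf)
    return ece

# Perfectly calibrated toy case: predicted 0.5 on a 50/50 outcome → ECE 0.0
print(expected_calibration_error([0, 1, 0, 1], [0.5, 0.5, 0.5, 0.5]))
```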
What ROC-AUC 0.919 Means (Simple Explanation):
Test: Pick one customer who subscribed and one who didn't
Question: Does the model score the subscriber HIGHER?
Answer: Yes, 91.9% of the time! (Random guessing = 50%)
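That pairwise interpretation can be checked numerically: ROC-AUC equals the probability that a random positive outranks a random negative. A small simulation with made-up score distributions (illustrative, not the project's data):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
y = rng.random(5000) < 0.072                       # ~7.2% subscribers
# Hypothetical scores: subscribers tend to score higher, with noise
scores = rng.normal(loc=np.where(y, 1.0, 0.0), scale=1.0)

auc = roc_auc_score(y, scores)

# Monte-Carlo estimate of P(random subscriber's score > random non-subscriber's score)
pos, neg = scores[y], scores[~y]
pairs = rng.integers(0, [len(pos), len(neg)], size=(20000, 2))
pairwise = (pos[pairs[:, 0]] > neg[pairs[:, 1]]).mean()

print(round(auc, 3), round(pairwise, 3))           # the two numbers agree closely
```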
Baseline vs Model-Guided Targeting:
| Strategy | Contacts | Conversions | Cost | Revenue | Profit | Efficiency |
|---|---|---|---|---|---|---|
| Random (Baseline) | 10,000 | 720 (7.2%) | €50,000 | €72,000 | €22,000 | 1.0× |
| Model Top 60% | 6,000 | 650 (10.8%) | €30,000 | €65,000 | €35,000 | 1.59× |
| Model Top 20% | 2,000 | 350 (17.5%) | €10,000 | €35,000 | €25,000 | 1.14× |
Key Takeaway: Target fewer customers, get better results
Tier 1 (High Priority) - 12-15% conversion rate:
✅ Balance: €500-€2,000 (sweet spot)
✅ Job: Retired, Professional, Student
✅ Age: 18-25 (young savers) or 56+ (paid off mortgages)
✅ Debt Status: No housing loan + No personal loan
✅ Contact: Cellular (not "unknown")
✅ Campaign: 0-2 previous attempts
Business Value: 11,274 customers, 900-1,200 expected conversions
Tier 3 (Avoid) - <4% conversion rate:
❌ Balance: Negative (especially < -€1,000)
❌ Campaign: 10+ previous attempts (selection bias - persistent "no" group)
❌ Contact: Unknown (data quality proxy, 3.8% success vs 9% cellular)
❌ Age: 46-55 (mid-career liquidity constraints)
Business Implication: Save resources, reduce customer annoyance
1. The Duration Paradox:
Finding: Call duration is the second-most important feature (14.9% feature importance)
- 0-2 minutes: 1% success
- 15-30 minutes: 63% success
The Problem: Duration only known AFTER call ends (post-facto feature)
Solutions:
✅ For Training: Include to learn engagement patterns
❌ For Pre-Contact Targeting: Exclude (data leakage!)
✅ For Callbacks: Use duration from first call to prioritize follow-ups
2. The Campaign Frequency Inverse Relationship:
Finding: More contact attempts → Lower success rate
- 0-2 attempts: 7.8% success
- 10+ attempts: 4.2% success
The Nuance: This is CORRELATION, not CAUSATION (selection bias)
- Customers who say "yes" stop getting called (exit the pool early)
- Those called 10+ times are enriched with "hard no" customers
- NOT that calling harms conversion!
Business Decision: Stop after 2-3 attempts
Reason: Remaining customers unlikely to convert (not because calling hurts)
3. The Calibration Problem:
Finding: Model overestimates probabilities 2-10× (especially at low scores)
- Model says 25% → Reality is 3-10%
- Model says 80% → Reality is 40-50%
Why This Happens: Duration feature creates train-deployment mismatch
- Training: Model sees completed calls (includes duration signal)
- Deployment: Duration unknown → Overestimates
Solutions:
✅ Use for RANKING (who's better?) → Highly reliable
⚠️ Use for PROBABILITIES → Apply correction factors
❌ Don't use raw probabilities for ROI forecasts
4. The €0 Boundary:
Finding: Major psychological boundary at €0 balance
- Negative balance: 4-5% success
- Slightly positive (€0-€200): 6.5% success (40% jump!)
Why It Matters: €0 separates "in debt" from "solvent"
- Psychological comfort to invest
- Financial capability signal
Binning Decision: Create separate bin for €0-€200 (don't merge with negatives)
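That decision might be implemented with `pandas.cut`. A minimal sketch where the labels are illustrative names, not the project's actual column values; note `right=False` puts €0 itself on the positive side of the boundary:

```python
import numpy as np
import pandas as pd

# Bin edges taken from the EDA staircase pattern; labels are illustrative
edges = [-np.inf, -1000, 0, 200, 500, 1000, 2000, np.inf]
labels = ["deep_negative", "negative", "0_200", "200_500",
          "500_1000", "1000_2000", "2000_plus"]

balance = pd.Series([-1500, -50, 0, 150, 350, 800, 1500, 5000])
# right=False → intervals are [lo, hi), so balance 0 lands in "0_200", not "negative"
balance_category = pd.cut(balance, bins=edges, labels=labels, right=False)

print(balance_category.tolist())
```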
┌─────────────────────────────────────────────────────────────────┐
│ STAGE 1: RAW DATA │
│ │
│ Dataset: term-deposit-marketing-2020.csv │
│ Size: 40,000 customers × 14 features │
│ Target: y = "yes" (subscribe) or "no" (don't subscribe) │
│ Class Distribution: 92.8% no, 7.2% yes (highly imbalanced!) │
│ │
│ Key Features: │
│ age, job, marital, education, balance, housing, loan, │
│ contact, day, month, duration, campaign, y │
└───────────────────────────┬─────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ STAGE 2: EXPLORATORY DATA ANALYSIS (EDA) │
│ Purpose: Understand WHAT patterns exist and WHY │
│ │
│ Analysis Approach: │
│ • Univariate: Each variable's distribution (shape, outliers) │
│ └─ Balance right-skewed (median €407), campaign skewed │
│ • Bivariate: Each feature vs target variable (relationships) │
│ └─ Non-linear patterns found (staircase, exponential, U) │
│ │
│ ┌──────────────────┬──────────────────┬─────────────────────┐ │
│ │ Balance Analysis │ Campaign Analysis│ Demographic Analysis│ │
│ │ (Cells 6-7) │ (Cells 8-9) │ (Cells 10-11) │ │
│ └──────────────────┴──────────────────┴─────────────────────┘ │
│ │
│ Key Findings: │
│ • Balance: STAIRCASE pattern (not linear!) │
│ └─ 2.8% → 4.6% → 6.5% → 9.7% as balance increases │
│ └─ Bin boundaries: [-∞,-1000,0,200,500,1000,2000,+∞] │
│ │
│ • Campaign: INVERSE relationship (selection bias) │
│ └─ 7.8% @ 1-2 calls → 4.2% @ 10+ calls │
│ └─ Bin boundaries: [0,2,5,10,+∞] │
│ │
│ • Duration: EXPONENTIAL growth (post-facto!) │
│    └─ 1% @ 0-2 min → 63% @ 15-30 min (63× increase!)         │
│ └─ Bin boundaries: [0,2,5,10,15,30,+∞] minutes │
│ │
│ • Job: HETEROGENEOUS categories (12 types → 5 groups) │
│ └─ non_working (12%) > professional (9%) > technical (6%) │
│ └─ But "non_working" mixes students, retirees, unemployed! │
│ │
│ • Age: U-SHAPED curve (life stages matter) │
│ └─ 18-25 (13.6%), drops to 46-55 (6%), spikes to 65+ (42%) │
│ └─ Bin boundaries: [0,25,35,45,55,65,100] │
│ │
│ 📚 See Detailed Docs: │
│ - docs/19_univariate_bivariate_analysis_summary.md (EDA) │
│ - docs/10_balance_analysis_deep_dive.md │
│ - docs/11_campaign_duration_analysis_deep_dive.md │
│ - docs/12_demographic_analysis_deep_dive.md │
│ - docs/14_visual_evidence_for_binning_decisions.md │
└───────────────────────────┬─────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ STAGE 3: FEATURE ENGINEERING (Cells 12-13) │
│ Purpose: Transform data to MATCH patterns found in EDA │
│ │
│ Transformation 1: BINNING (continuous → categorical) │
│ • balance → balance_category (7 bins based on staircase) │
│ • duration → duration_category (6 bins based on exponential) │
│ • campaign → campaign_category (4 bins based on volume) │
│ • age → age_group (6 bins based on life stages) │
│ │
│ Transformation 2: CATEGORIZATION (domain grouping) │
│ • job (12 types) → job_category (5 groups) │
│ └─ professional: management, self-employed, entrepreneur │
│ └─ technical: technician, blue-collar │
│ └─ non_working: student, retired, unemployed │
│ └─ service_admin: services, admin, housemaid │
│ └─ other: unknown │
│ │
│ Transformation 3: ONE-HOT ENCODING (text → binary) │
│ • All categorical features → Binary columns (44 features) │
│ • Example: job_category_professional = 1 or 0 │
│ │
│ Transformation 4: STANDARDIZATION (mean=0, std=1) │
│ • Numeric features: age, duration_minutes, campaign, day │
│ • StandardScaler: (value - mean) / std_dev │
│ │
│ Transformation 5: FEATURE SELECTION (remove risky features) │
│ • REMOVED: month (temporal overfitting risk) │
│ • KEPT: duration (for learning) but flagged as post-facto │
│ │
│ Result: 14 original features → 48 engineered features │
│ │
│ 📚 See: docs/00_big_picture_architecture.md │
└───────────────────────────┬─────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ STAGE 4: MODEL TRAINING (Cells 14-15) │
│ Purpose: Learn patterns and make reliable predictions │
│ │
│ Step 1: Clean Feature Names │
│ • XGBoost can't handle special chars: [] < > │
│ • balance_category_[500-1000] → balance_category_500_1000 │
│ │
│ Step 2: Train-Test Split (80/20) │
│ • Training: 32,000 customers (build model) │
│ • Test: 8,000 customers (evaluate performance) │
│ • Stratified: Maintain 92.8% / 7.2% ratio in both │
│ │
│ Step 3: Cross-Validation (5-fold) │
│ • Result: 0.916 ± 0.003 ROC-AUC │
│ • Excellent stability! (low variance) │
│ │
│ Step 4: Train XGBoost Model │
│ • Algorithm: Gradient Boosting Decision Trees │
│ • Key Parameters: │
│ └─ scale_pos_weight = 12.89 (handle imbalance) │
│ └─ max_depth = 6 (prevent overfitting) │
│ └─ learning_rate = 0.1 (balanced learning speed) │
│ └─ n_estimators = 100 (number of trees) │
│ │
│ Step 5: Evaluate Performance │
│ • Test ROC-AUC: 0.921 (excellent!) │
│ • Precision: 32% (acceptable for imbalanced data) │
│ • Recall: 83% (captures most subscribers) │
│ │
│ Step 6: Platt Scaling Calibration │
│ • Train logistic regression on XGBoost probabilities │
│ • Goal: Make "25% prediction" actually mean "25% convert" │
│ • Result: Improved calibration (ECE = 0.042) │
│ │
│ Feature Importance (Top 5): │
│ 1. contact_unknown: 15.6% (data quality proxy!) │
│ 2. duration_minutes: 14.9% (post-facto problem!) │
│ 3. housing_no: 5.9% (debt-free signal) │
│ 4. loan_no: 5.1% (debt-free signal) │
│ 5. marital_single: 4.5% (financial autonomy) │
│ │
│ 📚 See: docs/15_model_training_and_calibration_explained.md │
└───────────────────────────┬─────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ STAGE 5: VALIDATION (Cells 16-17) │
│ Purpose: Check if model is trustworthy for business use │
│ │
│ Validation 1: Overall Model Performance (Cell 16) │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ 1. Probability Distribution │ │
│ │ → Most predictions LOW (conservative model) │ │
│ │ → Only top 5% get high scores (trustworthy) │ │
│ │ │ │
│ │ 2. ROC Curve (AUC = 0.919) │ │
│ │ → Excellent ranking ability │ │
│ │ → Can create reliable targeting tiers │ │
│ │ │ │
│ │ 3. Precision-Recall Curve (AP = 0.483) │ │
│ │ → 40% precision at 60% recall achievable │ │
│ │ → 6.7× better than random baseline │ │
│ │ │ │
│ │ 4. Calibration Plot │ │
│ │ → ❌ PROBLEM: Systematic 2-10× overestimation │ │
│ │ → Root cause: Duration feature dependency │ │
│ │ → Solution: Use for ranking, not raw probabilities │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │
│ Validation 2: Segment-Specific Analysis (Cell 17) │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Check if predictions match reality for EVERY segment: │ │
│ │ │ │
│ │ • Balance segments (7 ranges) │ │
│ │ • Campaign frequency (4 ranges) │ │
│ │ • Job categories (5 types) │ │
│ │ • Education levels (4 levels) │ │
│ │ • Marital status (3 types) │ │
│ │ • Contact type (3 types) │ │
│ │ • Loan status (2 types) │ │
│ │ • Housing status (2 types) │ │
│ │ │ │
│ │ Finding: Rankings preserved across ALL segments ✅ │ │
│ │ Issue: Absolute probabilities overestimated 2-3× ⚠️ │ │
│ │ │ │
│ │ Top Segment Discovered: │ │
│ │ → Debt-free customers (no housing + no personal loan) │ │
│ │ → Expected: 12-15% success (vs 7.2% baseline) │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │
│ 📚 See Detailed Docs: │
│ - docs/17_model_validation_visualizations_explained.md │
│ - docs/16_segment_validation_analysis.md │
└───────────────────────────┬─────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ FINAL OUTPUT: Actionable Business Recommendations │
│ │
│ ✅ SAFE TO USE: │
│ • Customer ranking and prioritization │
│ • Creating Tier 1, 2, 3 targeting strategies │
│ • Deciding who to call first │
│ • Stop rules (don't call beyond 3 attempts) │
│ • Segment prioritization (debt-free, high balance, etc.) │
│ │
│ ⚠️ USE WITH CAUTION: │
│ • Absolute probability estimates (apply correction ×0.4) │
│ • ROI forecasting (use historical segment rates) │
│ • Budget planning (don't trust raw probabilities) │
│ │
│ ❌ NOT SAFE YET: │
│ • Pre-contact predictions using duration feature │
│ • Individual probability guarantees (use ranges/tiers) │
│ • Automated decisions without human oversight │
└─────────────────────────────────────────────────────────────────┘
The Problem:
Duration is the #2 most important feature (14.9% importance)
BUT duration only known AFTER call ends!
Training data:
✓ Includes completed calls → Model learns duration patterns
Deployment (pre-contact):
✗ Duration = unknown → Can't use this feature
→ Model's calibration breaks down
The Impact:
Model trained WITH duration:
- ROC-AUC: 0.92
- Probabilities calibrated to completed calls
Model deployed WITHOUT duration:
- ROC-AUC: ~0.75-0.80 (still good!)
- Probabilities OVERESTIMATED (2-10×)
Solutions Implemented:
- ✅ Keep duration in model (for learning engagement patterns)
- ✅ Flag as "post-facto only" in documentation
- ✅ Create correction factors for probabilities
- ✅ Use model for RANKING (safe) not FORECASTING (risky)
- 🔄 Future: Build separate pre-contact model (no duration)
The Observation:
Campaigns attempted: 1-2 → 7.8% success
Campaigns attempted: 3-5 → 6.9% success
Campaigns attempted: 6-10 → 5.0% success
Campaigns attempted: 10+ → 4.2% success
Pattern: More attempts = Lower success (inverse relationship)
The Wrong Conclusion ❌:
"Calling customers multiple times HARMS conversion rates"
→ We should only call once!
The Right Interpretation ✅:
Selection Bias Mechanism:
1. Customer says "yes" → Stop calling (exits pool)
2. Customer says "no" → Keep calling
3. After 10+ calls → Pool enriched with persistent "no" customers
It's not that calling harms conversion,
it's that customers remaining after many calls are unlikely to convert.
Business Implication:
Stop after 2-3 attempts because:
✓ Remaining customers unlikely to convert (selection bias)
✓ Diminishing returns (4.2% vs 7.8% baseline)
✓ Resource waste (time better spent on fresh leads)
NOT because calling harms the relationship!
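The mechanism can be demonstrated with a small simulation (all numbers synthetic): give every customer a fixed propensity that calling never changes, remove converters from the pool after each attempt, and the observed success rate still falls with attempt number.

```python
import numpy as np

rng = np.random.default_rng(7)

# Each customer has a FIXED (unobserved) propensity; calling has no causal effect
n = 200_000
propensity = rng.beta(1.5, 18, size=n)        # skewed low, mean ≈ 7.7%

still_in_pool = np.ones(n, dtype=bool)
rate_by_attempt = []
for attempt in range(1, 11):
    says_yes = rng.random(n) < propensity
    converted = still_in_pool & says_yes
    rate_by_attempt.append(converted.sum() / still_in_pool.sum())
    still_in_pool &= ~says_yes                # "yes" customers exit the pool

# Success rate declines with attempt number purely through selection
print([round(r, 3) for r in rate_by_attempt])
```

The decline appears even though no customer's propensity ever changed: high-propensity customers exit early, leaving a pool enriched with "hard no" customers.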
The Finding:
| Balance Range | Success Rate | Sample Size |
|---|---|---|
| < -€1,000 | 4.0% | 100 customers |
| -€1,000 to €0 | 4.6% | 3,200 customers |
| €0 to €200 | 6.5% | 10,500 customers ← JUMP! |
| €200 to €500 | 5.3% | 8,700 customers |
| €500 to €1,000 | 8.1% | 5,100 customers |
Why €0 Matters:
Negative balance = "In debt to the bank"
→ Psychological discomfort
→ Less likely to invest
Positive balance = "Solvent, money in bank"
→ Psychological safety
→ 40% higher conversion (4.6% → 6.5%)
Binning Decision:
DON'T merge €0-€200 with negative balance ranges
DO create separate bin to capture this psychological shift
The Problem:
"non_working" job category = 12.0% success rate (HIGHEST!)
But this mixes:
- Students: 19.4% success (young savers)
- Unemployed: 11.6% success (variable income)
- Retired: 9.3% success (pensions, stability)
These are VERY different customer profiles!
Why This Matters:
If we target all "non_working" equally:
→ Miss opportunity to prioritize students (19.4%!)
→ Waste resources on lower-converting retirees (9.3%)
Better strategy:
→ Tier 1: Students
→ Tier 2: Unemployed
→ Tier 3: Retired
Lesson:
Always look BENEATH aggregated categories
"High-level success rate" can hide important variation
What We Found:
ROC-AUC: 0.919 (excellent discrimination!)
BUT
Calibration: Model overestimates 2-10×
How is this possible?
The Explanation:
Discrimination (ROC-AUC):
"Can model RANK customers correctly?"
→ Who is more likely vs less likely?
Calibration:
"Are probability ESTIMATES accurate?"
→ If model says 40%, do 40% actually convert?
These are INDEPENDENT properties!
Example:
Customer A: Actual NO, Predicted 10%
Customer B: Actual YES, Predicted 25%
ROC-AUC: ✓ Correct ranking (B > A)
Calibration: ✗ Both overestimated (actual A=0%, B=100%)
Business Implication:
✅ Use model for WHO to target (ranking) → Reliable
⚠️ Use model for HOW MANY conversions (forecasting) → Needs correction
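The two-customer example above can be checked directly with scikit-learn, using the Brier score as a simple stand-in for a calibration error measure:

```python
from sklearn.metrics import brier_score_loss, roc_auc_score

# Customer A did not subscribe, Customer B did
y_true = [0, 1]
# Model scores B above A (correct ranking), but both probabilities are far off
y_pred = [0.10, 0.25]

print(roc_auc_score(y_true, y_pred))      # 1.0 → perfect discrimination
print(brier_score_loss(y_true, y_pred))   # 0.28625 → large calibration-style error
```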
1. Temporal Scope:
⚠️ Trained on 2020 Portuguese bank data
→ May not generalize to:
- Other countries (cultural differences)
- Other time periods (economic changes)
- Other banks (product/brand differences)
Requires: Regular retraining (quarterly recommended)
2. Duration Dependency:
⚠️ Model calibrated with post-facto feature
→ Pre-contact predictions overestimate
→ Need separate model for deployment
Requires: Either remove duration OR accept overcalibration
3. Feature Interactions:
⚠️ Model captures main effects well
BUT: Misses some complex interactions
- High balance + Technical job + Has loan = Overestimated
Requires: Explicit interaction features OR deeper trees
1. Data Quality Dependency:
⚠️ "contact_unknown" is top feature (15.6% importance!)
→ This is a DATA QUALITY proxy, not causal
If data quality improves:
- contact_unknown becomes rare
- Model's #1 feature disappears
- Need to retrain!
Requires: Monitoring for data distribution shifts
2. Class Imbalance:
⚠️ 92.8% no, 7.2% yes
→ Model sees 13× more "no" examples
→ Better at identifying "no" than "yes"
Requires: scale_pos_weight adjustment, appropriate metrics
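The weighting itself is a one-liner: scale_pos_weight is conventionally the negative/positive count ratio. On the full-dataset counts quoted earlier this gives ≈12.81; the 12.89 used in training presumably comes from computing it on the 80% training split.

```python
# scale_pos_weight = (number of negatives) / (number of positives)
n_no, n_yes = 37_104, 2_896        # full-dataset counts from this README
scale_pos_weight = n_no / n_yes
print(round(scale_pos_weight, 2))  # ≈ 12.81 on the full dataset
```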
3. Small Sample Segments:
⚠️ Some segments have <100 customers in test set
- Age 65+: Only 128 customers
- Balance < -€1,000: Only 25 customers
- 30+ minute calls: Only 21 customers
Requires: Wide confidence intervals, cautious interpretation
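For such small segments, a Wilson score interval makes the uncertainty concrete. A sketch using the ~25-customer negative-balance segment, assuming 1 conversion (consistent with the ~4% rate; the count is my reconstruction, not from the project):

```python
import math

def wilson_interval(successes, n, z=1.96):
    """95% Wilson score interval; behaves better than the normal approx. for small n."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return centre - half, centre + half

# 1 conversion out of 25 customers: observed 4%, but the interval is huge
lo, hi = wilson_interval(1, 25)
print(round(lo, 3), round(hi, 3))   # roughly 0.7% to 20% — interpret with caution
```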
1. Fairness Concerns:
⚠️ Balance-based targeting could exacerbate inequality
- Wealthy customers get more attention
- Low-balance customers underserved
Requires: Fairness audits, balanced targeting strategies
2. Transparency Requirements:
⚠️ Customers deserve to know why they're targeted
- Model is complex (100 trees, 48 features)
- Not easily explainable to individuals
Requires: SHAP values for individual explanations
3. Human Oversight:
⚠️ Model is decision SUPPORT, not replacement
- Agents should override low scores if conversation goes well
- Allow opt-in regardless of score
Requires: Clear guidelines, agent training
For Understanding the Project → Start here (README.md)
For Interview Preparation → docs/06_interview_preparation.md
For Deep Technical Dive → See section below
- 00_big_picture_architecture.md
- Complete flow from raw data to model-ready features
- How EDA findings inform feature engineering
- Connection map showing EDA → Feature Engineering decisions
- 19_univariate_bivariate_analysis_summary.md ⭐ EDA Overview
- Univariate analysis (8 features: distributions, outliers)
- Bivariate analysis (9 relationships: feature → target)
- Complete summary tables with statistics
- Interview talking points (30s, 2min)
- 10_balance_analysis_deep_dive.md
- 4 balance distribution visualizations explained
- Staircase pattern discovery
- €0 psychological boundary
- Bin boundaries justification
- 11_campaign_duration_analysis_deep_dive.md
- Selection bias in campaign frequency
- Duration exponential pattern
- Post-facto feature limitation
- Business insights for stop rules
- 12_demographic_analysis_deep_dive.md
- Job categorization rationale
- Age U-shaped curve
- Why "bars + red line" dual-axis pattern
- Heterogeneous "non_working" category
- 14_visual_evidence_for_binning_decisions.md
- Exactly what we saw in each graph
- How visuals led to specific bin boundaries
- Detective story approach
- Complete decision trail table
- 15_model_training_and_calibration_explained.md
- 6-step training process
- XGBoost parameters explained
- Platt scaling calibration
- 9 visualizations interpreted
- Feature importance analysis
- 17_model_validation_visualizations_explained.md
- 4 validation graphs explained
- ROC-AUC vs Precision-Recall
- Calibration plot interpretation
- Duration analysis deep dive
- Recommendations review
- 16_segment_validation_analysis.md
- 8 customer segments analyzed
- Systematic overestimation problem
- Rankings preserved vs probabilities inflated
- Top segment discovery (debt-free customers)
- Business recommendations
- 18_correlation_vs_causation_explained.md
- Campaign frequency paradox (inverse relationship)
- Selection bias mechanism explained
- Why it's correlation, NOT causation
- Business implications (€350K impact)
- Interview answers (30s, 2min, 5min)
- 01_project_overview.md - Original project context
- 02_data_analysis.md - EDA methodology
- 03_feature_engineering.md - Transformation logic
- 04_modeling_approach.md - Model architecture
- 05_results_interpretation.md - Performance analysis
- 06_interview_preparation.md - Q&A preparation
- 07_mentor_feedback_updates.md - Early feedback
- 08_latest_mentor_session_updates.md - Mid-project
- 09_final_mentor_validation.md - Final review
Core Libraries:

| Library | Version | Purpose |
|---|---|---|
| pandas | 1.x | Data manipulation and analysis |
| numpy | 1.x | Numerical operations |
| scikit-learn | 1.6.0 | Preprocessing, metrics, validation |
| xgboost | 2.1.3 | Gradient boosting model |

Visualization:

| Library | Purpose |
|---|---|
| matplotlib | Static plots (distribution, scatter, line) |
| seaborn | Statistical visualizations (dual-axis, heatmaps) |
| plotly | Interactive charts (exploration) |

Environment: Python 3.10+
Bank-Marketing/
├── Bank_Marketing_Campaign.ipynb # Main analysis notebook (Cells 1-17)
├── README.md # This file (comprehensive overview)
├── term-deposit-marketing-2020.csv # Raw dataset (40,000 × 14)
│
└── docs/ # Detailed documentation
├── 00_big_picture_architecture.md # Complete pipeline flow
│
├── 01_project_overview.md # Project context
├── 02_data_analysis.md # EDA methodology
├── 03_feature_engineering.md # Transformations
├── 04_modeling_approach.md # Model details
├── 05_results_interpretation.md # Performance
├── 06_interview_preparation.md # Q&A prep
│
├── 10_balance_analysis_deep_dive.md # Balance patterns
├── 11_campaign_duration_analysis_deep_dive.md # Campaign/duration
├── 12_demographic_analysis_deep_dive.md # Job/age analysis
├── 14_visual_evidence_for_binning_decisions.md # Binning rationale
│
├── 15_model_training_and_calibration_explained.md # Training process
├── 17_model_validation_visualizations_explained.md # Overall validation
├── 16_segment_validation_analysis.md # Segment-specific validation
│
├── 07_mentor_feedback_updates.md # Feedback session 1
├── 08_latest_mentor_session_updates.md # Feedback session 2
└── 09_final_mentor_validation.md # Final review
1. Apply Correction Factors:

```python
# For business planning: raw model probabilities overestimate, so scale them down
corrected_prob = model_probability * 0.4

# Or use historical segment rates instead of raw probabilities
tier_1_expected = len(tier_1) * 0.45  # rate from calibration data
```

2. Implement Tiered Targeting:

```python
tier_1 = scores > 0.7                      # Top 5%, call with premium agents
tier_2 = (scores > 0.4) & (scores <= 0.7)  # Next 15%, standard agents
tier_3 = (scores > 0.2) & (scores <= 0.4)  # Next 30%, email first
```

3. Stop Rules:

```python
if campaign_attempts >= 3:
    stop_calling()  # Diminishing returns beyond 3 attempts
```

4. Pre-Contact Model:

```python
# Train a deployment model WITHOUT the post-facto duration feature
features_pre_contact = [f for f in all_features if f != "duration_minutes"]
model_deployment = XGBClassifier().fit(X_train[features_pre_contact], y_train)
```

5. Isotonic Calibration:

```python
from sklearn.isotonic import IsotonicRegression

# Often calibrates better than Platt scaling; fit on a held-out validation set,
# not the test set, to avoid leakage
calibrator = IsotonicRegression(out_of_bounds='clip')
calibrated_probs = calibrator.fit_transform(raw_probs_val, y_val)
```

6. Segment-Specific Calibration:

```python
# Separate calibration for each major segment
calibrators = {}
for segment in ['balance', 'job', 'age']:
    calibrators[segment] = IsotonicRegression(out_of_bounds='clip')
    # Fit on that segment's validation rows, then apply per segment at scoring time
```

7. A/B Testing:

Group A: Random targeting (baseline)
Group B: Model-guided targeting
Group C: Model + segment calibration
Measure: Conversion rate, cost per conversion, ROI

8. SHAP Values for Explainability:

```python
import shap

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)
# Individual customer explanations:
# "You scored 75% because: high balance (+20%), professional (+15%), ..."
```

9. Fairness Audit:

```python
# Check for demographic bias
from aif360.metrics import BinaryLabelDatasetMetric
# Measure disparate impact across age, marital status
```

10. Production Pipeline:
- Automated retraining (quarterly)
- Monitoring dashboard (drift detection)
- API deployment (real-time scoring)
- Feedback loop (track actual conversions)
11. Advanced Features:
- External data (economic indicators, seasonality)
- Customer lifetime value prediction
- Multi-objective optimization (conversion + retention)
| Criterion | Target | Achieved | Evidence |
|---|---|---|---|
| Model Performance | ROC-AUC > 0.70 | 0.919 | ✅ Excellent |
| Calibration | ECE < 0.05 | 0.042 | ✅ Good |
| CV Stability | Low variance | ±0.003 | ✅ Very stable |
| Business Value | >25% improvement | 40-60% | ✅ Exceeded |
| Interpretability | Clear features | 48 features ranked | ✅ Transparent |
| No Data Leakage | Documented | Duration flagged | ✅ Explicit handling |
| Comprehensive Docs | All steps explained | 16 detailed docs | ✅ Complete |
- Non-Linear Patterns Require Binning: Balance staircase pattern wouldn't be captured by raw values
- Post-Facto Features Create Calibration Issues: Duration is predictive but breaks train-deployment parity
- Selection Bias Mimics Causation: Campaign frequency inverse relationship is NOT causal
- Calibration ≠ Discrimination: Can rank perfectly (0.92 AUC) yet overestimate probabilities
- Heterogeneous Categories Hide Value: "non_working" mixes 19% (students) with 9% (retirees)
- Debt-Free Customers Are Gold: No housing + no personal loan = 12-15% conversion
- €0 Is Psychological Boundary: Crossing from negative to positive balance = 40% boost
- Stop After 2-3 Attempts: Not because calling harms, but remaining customers unlikely to convert
- Duration for Callbacks, Not Targeting: Use first call duration to prioritize follow-ups
- Data Quality Matters: "contact_unknown" became top feature (data issue, not insight!)
- EDA Before Feature Engineering: Visualize FIRST, transform SECOND
- Nuance Over Noise: Understanding WHY patterns exist prevents wrong conclusions
- Transparency Builds Trust: Documenting limitations is strength, not weakness
- Validation Is Multi-Dimensional: One metric (ROC-AUC) isn't enough
- Business Context Matters: 0.92 AUC is useless if probabilities are wrong for planning
Project Type: Learning/Portfolio Project demonstrating ML workflow understanding
Key Focus:
- Understanding the WHY behind every decision
- Recognizing nuances (selection bias, data leakage, correlation vs causation)
- Honest communication of limitations
- Bridging technical results and business value
Mentor Guidance Emphasized:
- "If you did something, you need to know WHY you're doing it"
- Data-driven decisions with visual evidence
- Proper visualization conventions (bars for counts, lines for rates)
- Actionable recommendations (not vague suggestions)
Dataset: UCI Bank Marketing Dataset (publicly available)
This is a project using publicly available data. Code and documentation freely available for learning purposes.
Remember: This project demonstrates thoughtful problem-solving, not just technical skills. Every decision has documented rationale, every limitation is acknowledged, and every result is interpreted in business context. The goal isn't perfection—it's understanding and honest communication.