super30admin/modeldebugging

Model Debugging & Error Analysis — FAANG-Level Hands-On (Customer Churn)

Goal: Learn the workflow FAANG-style teams expect once a model exists: diagnose its errors, find the slices where it fails, run feature ablations, and iterate safely.

Outcome: Students can:

  • build a baseline churn classifier,
  • compute slice metrics and identify failure cohorts,
  • perform error analysis (FP vs FN),
  • run feature ablation and interpret results,
  • tune the decision threshold for a business cost ratio.

How to Start

  1. Fork this repository.
  2. Open debug_student_lab.ipynb in Google Colab.
  3. Complete all TODO sections.
  4. Restart the runtime, then Run All cells to confirm the notebook executes top to bottom.
  5. Push changes and submit a Pull Request.

⚠️ Do NOT edit notebooks directly on GitHub.


Lab Rules (FAANG Style)

  • ✅ Start with a baseline, then debug systematically
  • ✅ Slice analysis must be on a held-out set
  • ✅ Keep preprocessing leakage-safe (Pipeline/ColumnTransformer)
  • ✅ Separate ranking metrics from threshold metrics
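The last rule can be illustrated with a tiny hand-made example. This is a minimal sketch, assuming scikit-learn is installed; the labels and scores are invented for illustration and do not come from the lab data. ROC AUC is computed from the scores alone, while precision and recall change with the threshold applied to those same scores.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score, roc_auc_score

# Invented holdout labels (1 = churned) and model scores.
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
scores = np.array([0.10, 0.40, 0.35, 0.80, 0.20, 0.70, 0.60, 0.55])

# Ranking metric: depends only on how the scores order the rows.
auc = roc_auc_score(y_true, scores)


def metrics_at(threshold):
    """Threshold metrics: computed after scores are turned into 0/1 decisions."""
    y_pred = (scores >= threshold).astype(int)
    return {
        "threshold": threshold,
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
    }


at_050 = metrics_at(0.50)
at_030 = metrics_at(0.30)

print(f"ROC AUC (threshold-free): {auc:.4f}")
print(at_050)
print(at_030)
```

Lowering the threshold from 0.50 to 0.30 raises recall and lowers precision, while the AUC stays the same because the ranking of the rows has not changed.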

Dataset — Customer Churn

Expected path:

  • data/churn/churn.csv

Common schema (Telco churn style):

  • target: Churn (Yes/No) or churn (0/1)

If the file is missing, the notebook uses a small synthetic churn-like dataset so it still runs.
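A loader with this fallback behavior might look like the sketch below. It is not the notebook's actual code: the function names, the synthetic generator, and the `MonthlyCharges` column are assumptions made for illustration. Only the file path and the two target encodings come from this README.

```python
from pathlib import Path

import numpy as np
import pandas as pd

DATA_PATH = Path("data/churn/churn.csv")  # expected path from this README


def make_synthetic_churn(n_rows=500, seed=0):
    """Hypothetical stand-in data; the notebook's own generator may differ."""
    rng = np.random.default_rng(seed)
    tenure = rng.integers(0, 73, size=n_rows)
    monthly = rng.uniform(20.0, 120.0, size=n_rows).round(2)
    contract = rng.choice(["Month-to-month", "One year", "Two year"], size=n_rows)
    # Churn is made more likely for short tenure and month-to-month contracts.
    logit = -1.0 - 0.04 * tenure + 0.01 * monthly + 1.2 * (contract == "Month-to-month")
    prob = 1.0 / (1.0 + np.exp(-logit))
    churned = rng.random(n_rows) < prob
    return pd.DataFrame({
        "tenure": tenure,
        "MonthlyCharges": monthly,  # assumed column name
        "Contract": contract,
        "Churn": np.where(churned, "Yes", "No"),
    })


def load_churn(path=DATA_PATH):
    """Read the CSV if present, otherwise fall back to synthetic rows."""
    df = pd.read_csv(path) if Path(path).exists() else make_synthetic_churn()
    # Normalize the target to a 0/1 column named `churn`.
    if "Churn" in df.columns:
        is_yes = df["Churn"].astype(str).str.strip().str.lower() == "yes"
        df = df.drop(columns=["Churn"]).assign(churn=is_yes.astype(int))
    df["churn"] = df["churn"].astype(int)  # raises KeyError if no target column exists
    return df


df = load_churn()
print(df.shape, df["churn"].mean())
```

Normalizing the target once, at load time, keeps every later section free of Yes/No versus 0/1 special cases.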


Section 1 — Baseline + Holdout

Task 1.1: Build a baseline pipeline

  • numeric + categorical preprocessing
  • LogisticRegression baseline
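One way to wire this baseline is sketched below, assuming scikit-learn and a toy frame standing in for the real CSV. The column names, imputation strategies, and split size are illustrative assumptions, not the lab's reference solution. All preprocessing lives inside the `Pipeline`, so it is fit on the training split only.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data standing in for data/churn/churn.csv.
rng = np.random.default_rng(0)
n = 600
X = pd.DataFrame({
    "tenure": rng.integers(0, 73, size=n),
    "MonthlyCharges": rng.uniform(20, 120, size=n),
    "Contract": rng.choice(["Month-to-month", "One year", "Two year"], size=n),
})
is_monthly = (X["Contract"] == "Month-to-month").to_numpy()
logit = -1.0 - 0.05 * X["tenure"].to_numpy() + 1.5 * is_monthly
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

numeric_cols = ["tenure", "MonthlyCharges"]
categorical_cols = ["Contract"]

# Imputation, scaling, and encoding are fit inside the pipeline,
# so no statistics from the holdout rows leak into training.
preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric_cols),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical_cols),
])
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)
model.fit(X_train, y_train)

holdout_scores = model.predict_proba(X_test)[:, 1]
holdout_auc = roc_auc_score(y_test, holdout_scores)
print(f"holdout ROC AUC: {holdout_auc:.3f}")
```

Because the fitted pipeline is a single object, every later step (slicing, ablation, thresholding) can reuse `model.predict_proba` on the holdout split without repeating any preprocessing by hand.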

Checkpoint Questions:

  • Why is a pipeline required for trustworthy debugging?

Section 2 — Error Breakdown

Task 2.1: FP vs FN analysis

Interview Angle:

  • Which is worse: FP or FN? (Depends on business cost.)
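A simple way to start the FP vs FN breakdown is to label every holdout row with its outcome and then compare the groups. The sketch below uses a small invented table; the column names `y_true` and `score` and the 0.5 threshold are assumptions for illustration.

```python
import pandas as pd

# Invented holdout rows: true label (1 = churned) and model score.
holdout = pd.DataFrame({
    "tenure": [1, 3, 40, 60, 5, 24, 70, 2],
    "y_true": [1, 1, 0, 0, 0, 1, 0, 1],
    "score": [0.9, 0.3, 0.2, 0.1, 0.7, 0.4, 0.6, 0.8],
})

threshold = 0.5  # assumed operating point
holdout["y_pred"] = (holdout["score"] >= threshold).astype(int)

# Map each (true, predicted) pair to a confusion-matrix cell.
labels = {(1, 1): "TP", (0, 1): "FP", (1, 0): "FN", (0, 0): "TN"}
holdout["outcome"] = [
    labels[(int(t), int(p))] for t, p in zip(holdout["y_true"], holdout["y_pred"])
]

outcome_counts = holdout["outcome"].value_counts()
# Compare a feature across outcome groups to see who the model gets wrong.
tenure_by_outcome = holdout.groupby("outcome")["tenure"].mean()

print(outcome_counts)
print(tenure_by_outcome)
```

Reading the actual FN rows (missed churners) next to the FP rows (loyal customers flagged as churners) is usually more informative than the counts alone, since the two error types tend to come from different kinds of customers.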

Section 3 — Slice Analysis

Task 3.1: Slice by cohorts

Examples:

  • Contract
  • InternetService
  • PaymentMethod
  • tenure bucket

FAANG Gotcha:

  • “Overall accuracy looks fine” can hide severe cohort failures.
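The gotcha can be reproduced on a toy holdout table. This is a sketch with invented predictions; the helper `slice_metrics` and the tenure bucket edges are hypothetical choices, not part of the lab notebook.

```python
import pandas as pd

# Invented holdout predictions at some fixed threshold.
holdout = pd.DataFrame({
    "Contract": ["Month-to-month"] * 6 + ["Two year"] * 4,
    "tenure": [1, 2, 5, 8, 10, 3, 30, 50, 60, 70],
    "y_true": [1, 1, 1, 0, 0, 1, 1, 1, 0, 0],
    "y_pred": [1, 1, 1, 1, 0, 1, 0, 0, 0, 0],
})
# Bucket edges are an assumption; pick ones that match the business.
holdout["tenure_bucket"] = pd.cut(
    holdout["tenure"], bins=[0, 12, 48, 72],
    labels=["0-12", "13-48", "49-72"], include_lowest=True,
)


def slice_metrics(frame, by):
    """Per-slice size, accuracy, precision, and recall, worst recall first."""
    rows = []
    for value, group in frame.groupby(by, observed=True):
        tp = int(((group["y_true"] == 1) & (group["y_pred"] == 1)).sum())
        fp = int(((group["y_true"] == 0) & (group["y_pred"] == 1)).sum())
        fn = int(((group["y_true"] == 1) & (group["y_pred"] == 0)).sum())
        rows.append({
            by: value,
            "n": len(group),
            "accuracy": float((group["y_true"] == group["y_pred"]).mean()),
            "precision": tp / (tp + fp) if tp + fp else float("nan"),
            "recall": tp / (tp + fn) if tp + fn else float("nan"),
        })
    return pd.DataFrame(rows).sort_values("recall").reset_index(drop=True)


overall_accuracy = float((holdout["y_true"] == holdout["y_pred"]).mean())
by_contract = slice_metrics(holdout, "Contract")
by_tenure = slice_metrics(holdout, "tenure_bucket")

print(f"overall accuracy: {overall_accuracy:.2f}")
print(by_contract)
print(by_tenure)
```

Here the overall accuracy looks acceptable, yet the model catches none of the churners in the two-year contract slice. Always report `n` next to each slice metric, because a dramatic number on a handful of rows may be noise.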

Section 4 — Feature Ablation

Task 4.1: Drop feature groups and re-evaluate

  • remove all service columns
  • remove price columns
  • remove contract columns
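An ablation loop over these groups could be sketched as follows, assuming scikit-learn. The synthetic data, the `FEATURE_GROUPS` mapping, and the helper `holdout_auc` are hypothetical; in the lab, each group would list the real Telco columns it covers. The toy target is deliberately driven mostly by `Contract` so the effect of dropping it is visible.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data; churn depends mainly on Contract, slightly on tenure.
rng = np.random.default_rng(0)
n = 2000
X = pd.DataFrame({
    "tenure": rng.integers(0, 73, size=n),
    "MonthlyCharges": rng.uniform(20, 120, size=n),
    "InternetService": rng.choice(["DSL", "Fiber optic", "No"], size=n),
    "Contract": rng.choice(["Month-to-month", "One year", "Two year"], size=n),
})
is_monthly = (X["Contract"] == "Month-to-month").to_numpy()
logit = -2.0 + 3.0 * is_monthly - 0.02 * X["tenure"].to_numpy()
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

# Assumed grouping of columns; adjust to the real schema.
FEATURE_GROUPS = {
    "service": ["InternetService"],
    "price": ["MonthlyCharges"],
    "contract": ["Contract"],
}

# One fixed split, so every ablation is scored on the same holdout rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)


def holdout_auc(columns):
    """Refit the whole pipeline on a column subset and score the holdout."""
    num = [c for c in columns if pd.api.types.is_numeric_dtype(X[c])]
    cat = [c for c in columns if c not in num]
    preprocess = ColumnTransformer([
        ("num", StandardScaler(), num),
        ("cat", OneHotEncoder(handle_unknown="ignore"), cat),
    ])
    model = Pipeline([
        ("preprocess", preprocess),
        ("clf", LogisticRegression(max_iter=1000)),
    ])
    model.fit(X_train[columns], y_train)
    return roc_auc_score(y_test, model.predict_proba(X_test[columns])[:, 1])


results = {"full": holdout_auc(list(X.columns))}
for name, group_cols in FEATURE_GROUPS.items():
    kept = [c for c in X.columns if c not in group_cols]
    results[f"drop {name}"] = holdout_auc(kept)

for name, auc in results.items():
    print(f"{name:>14}: AUC={auc:.3f}  delta={auc - results['full']:+.3f}")
```

When interpreting the deltas, keep in mind that a small drop does not prove a group is useless: correlated features in the remaining groups can compensate for the ones removed.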

Section 5 — Thresholding with Cost

Task 5.1: Choose threshold for a cost ratio

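Example: a false negative (a missed churner) costs 5x as much as a false positive (a retention offer sent to a customer who would have stayed).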
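A threshold sweep for this cost ratio can be sketched with NumPy alone. The labels, scores, and threshold grid below are invented for illustration. As a sanity check, if the scores were perfectly calibrated probabilities and correct decisions cost nothing, the cost-minimizing threshold would be C_FP / (C_FP + C_FN) = 1 / 6, or about 0.17; an empirical sweep on real holdout scores may land elsewhere.

```python
import numpy as np

# Invented holdout labels (1 = churned) and model scores.
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
scores = np.array([0.12, 0.41, 0.33, 0.83, 0.22, 0.71, 0.62, 0.57])

COST_FN = 5.0  # missed churner (from the example ratio)
COST_FP = 1.0  # unnecessary retention offer


def total_cost(threshold):
    """Total business cost of the decisions made at this threshold."""
    y_pred = scores >= threshold
    fn = int(((y_true == 1) & ~y_pred).sum())
    fp = int(((y_true == 0) & y_pred).sum())
    return COST_FN * fn + COST_FP * fp


thresholds = np.linspace(0.05, 0.95, 19)  # grid step of 0.05
costs = np.array([total_cost(t) for t in thresholds])

best_index = int(np.argmin(costs))  # first threshold reaching the minimum
best_threshold = float(thresholds[best_index])
best_cost = float(costs[best_index])
cost_at_default = total_cost(0.5)

print(f"best threshold: {best_threshold:.2f} (cost {best_cost:.0f})")
print(f"cost at 0.50:   {cost_at_default:.0f}")
```

Choose the threshold on a validation split and report the resulting cost on the holdout, so the chosen operating point is not tuned on the same rows used to evaluate it.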


Submission Expectations

  • Baseline + holdout metrics
  • At least 3 slice tables with findings
  • Feature ablation summary
  • Threshold choice explained

FAANG Interview Evaluation Rubric

Skills evaluated:

  • Debugging workflow
  • Slice analysis quality
  • Feature reasoning
  • Threshold/cost reasoning
  • Communication clarity

Stretch Problems (Optional)

  • Add calibration + ECE
  • Add SHAP/permutation importance (conceptual or sklearn permutation)
  • Add drift simulation by shifting tenure distribution
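For the calibration stretch problem, a binned ECE for the positive class can be sketched as below. This is one common variant (equal-width bins, comparing the mean predicted churn probability with the observed churn rate in each bin); the function name and the bin count of 10 are assumptions, and other definitions of ECE exist.

```python
import numpy as np


def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Weighted average gap between predicted and observed churn rate per bin."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    # Equal-width bins on [0, 1]; a probability of exactly 1.0 goes in the last bin.
    bin_ids = np.minimum((y_prob * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        in_bin = bin_ids == b
        if not in_bin.any():
            continue  # empty bins contribute nothing
        gap = abs(y_true[in_bin].mean() - y_prob[in_bin].mean())
        ece += in_bin.mean() * gap  # weight by the share of rows in the bin
    return float(ece)


# Invented example: predictions are on the right side but under-confident.
ece = expected_calibration_error([0, 0, 1, 1], [0.15, 0.15, 0.85, 0.85])
print(f"ECE: {ece:.3f}")
```

A low ECE matters for the thresholding section in particular, since cost-based thresholds derived from a formula assume the scores behave like probabilities.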
