Credit-default-prediction

Credit Scoring Model for Loan Application Assessment

This project develops a predictive credit scoring model that helps a financial institution classify unsecured loan applications. The model aims to minimize credit risk, optimize loan approval rates, and identify the key factors that drive loan defaults.

Business Problem

A bank needs to refine its loan approval process to balance market competitiveness with risk management. The core challenge is to accurately distinguish between applicants who are likely to repay a loan ("good" customers) and those who are likely to default ("bad" customers).

The key business objectives are:

Develop a model that accepts the maximum number of good applicants while correctly identifying at least 85% of bad applicants.
Develop a model that accepts at least 70% of good applicants while rejecting the maximum possible number of bad applicants.
Identify the most influential variables that determine a customer's repayment behavior to inform future lending strategies.
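
The first two objectives can be read as decision-threshold choices on a model's predicted default probability. The sketch below is one possible way to pick such thresholds, not the method used in this repository. It assumes a hypothetical array `p_bad` of estimated probabilities that BAD = 1 and a rule that rejects an applicant when `p_bad >= threshold`. The function names are illustrative.

```python
import numpy as np


def rates_at_threshold(y_true, p_bad, threshold):
    """Return (bad_recall, good_acceptance) when rejecting p_bad >= threshold."""
    y_true = np.asarray(y_true)
    p_bad = np.asarray(p_bad)
    reject = p_bad >= threshold
    bad_recall = reject[y_true == 1].mean()          # share of defaulters rejected
    good_acceptance = (~reject)[y_true == 0].mean()  # share of repayers accepted
    return bad_recall, good_acceptance


def threshold_for_bad_recall(y_true, p_bad, min_bad_recall=0.85):
    """Objective 1: largest threshold that still rejects >= min_bad_recall of bad applicants."""
    # Raising the threshold accepts more good applicants but lowers bad recall,
    # so scan from the highest score downwards and stop at the first feasible one.
    for t in np.sort(np.unique(p_bad))[::-1]:
        bad_recall, _ = rates_at_threshold(y_true, p_bad, t)
        if bad_recall >= min_bad_recall:
            return float(t)
    raise ValueError("no threshold reaches the requested bad recall")


def threshold_for_good_acceptance(y_true, p_bad, min_good_acceptance=0.70):
    """Objective 2: smallest threshold that still accepts >= min_good_acceptance of good applicants."""
    # Lowering the threshold rejects more bad applicants but accepts fewer good ones.
    # np.inf is appended so that "reject nobody" is always a feasible fallback.
    for t in np.append(np.sort(np.unique(p_bad)), np.inf):
        _, good_acceptance = rates_at_threshold(y_true, p_bad, t)
        if good_acceptance >= min_good_acceptance:
            return float(t)
    raise ValueError("no threshold reaches the requested good acceptance")
```

Both thresholds should be chosen on validation data and then checked on a held-out set, since a threshold tuned and evaluated on the same records will look better than it is.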

Dataset

The analysis is based on a historical dataset of past bank customers. The dataset (data.csv) contains customer-level information on financial health and loan characteristics. The target variable is BAD, where:

1: The applicant defaulted on the loan.
0: The applicant successfully paid off the loan.

Key Predictor Variables Include:

LOAN: The total amount of the loan requested.
MORTDUE: The amount due on the applicant's existing mortgage.
VALUE: The current value of the applicant's property.
DEBTINC: The applicant's debt-to-income ratio.
YOJ: Years at the applicant's current job.
DEROG: Number of major derogatory reports.
DELINQ: Number of delinquent credit lines.
CLAGE: Age of the oldest credit line in months.
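
As a first look at the data, it can help to tabulate how much of each column is missing and how each numeric predictor correlates with BAD. The following is a minimal sketch under the assumption that the file loads into a pandas DataFrame with the column names listed above. `summarize_quality` is a hypothetical helper, not a function from this repository.

```python
import pandas as pd


def summarize_quality(df, target="BAD"):
    """Per-column missing share and Pearson correlation with the target.

    Non-numeric columns get NaN in the correlation column.
    """
    numeric = df.select_dtypes(include="number")
    summary = pd.DataFrame({
        "missing_share": df.isna().mean(),
        "corr_with_target": numeric.corr()[target],
    })
    # The target's correlation with itself is not informative.
    return summary.drop(index=target)


def default_rate(df, target="BAD"):
    """Share of records with BAD = 1."""
    return df[target].mean()
```

With the dataset in place, a call such as `summarize_quality(pd.read_csv("data.csv"))` would return one row per predictor. A linear correlation is only a rough screen. It can miss non-linear effects such as a default risk that rises sharply only above a certain DEBTINC.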

Analytical Approach

The project follows a structured data analytics workflow to deliver a robust and interpretable solution.

Exploratory Data Analysis (EDA):
- Conducted a thorough investigation of all variables to understand their distributions, identify outliers, and assess data quality.
- Analyzed relationships among the predictors and their correlation with the target variable (BAD), using visualizations to produce a preliminary ranking of feature importance.
Data Pre-processing:
- Developed strategies for handling missing values and incomplete records.
- Transformed categorical variables and created new features where appropriate to prepare the dataset for modeling.
Modeling:
- Built and evaluated at least two distinct classification algorithms to predict loan default.
- Selected appropriate performance measures aligned with the specific business objectives (e.g., recall, precision, ROC-AUC).
- Incorporated feature selection methods to create parsimonious and effective models.
Performance Evaluation and Optimization:
- Tuned model hyperparameters to optimize performance against the predefined business goals.
- Assessed the generalization performance of the final recommended models to ensure they are robust and reliable for future predictions.
Business Recommendations:
- Translated the model's findings into actionable business insights.
- Provided clear recommendations for the bank's lending strategy, highlighting key assumptions and potential limitations of the analytical solution.
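
The pre-processing, modeling, and evaluation steps above can be sketched as a single scikit-learn pipeline. This is an illustration under stated assumptions rather than the code in modeling.py: median and most-frequent imputation, one-hot encoding, a class-weighted logistic regression, and recall on BAD = 1 as the metric are all choices made for this example. The helper names are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer, make_column_selector
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def build_pipeline():
    """Impute and encode the inputs, then fit a classifier (assumed choices)."""
    numeric = make_pipeline(SimpleImputer(strategy="median"), StandardScaler())
    categorical = make_pipeline(
        SimpleImputer(strategy="most_frequent"),
        OneHotEncoder(handle_unknown="ignore"),
    )
    preprocess = ColumnTransformer([
        ("num", numeric, make_column_selector(dtype_include=np.number)),
        ("cat", categorical, make_column_selector(dtype_exclude=np.number)),
    ])
    return Pipeline([
        ("preprocess", preprocess),
        # class_weight="balanced" is an assumption to counter class imbalance.
        ("model", LogisticRegression(class_weight="balanced", max_iter=1000)),
    ])


def cross_validated_bad_recall(df, target="BAD", n_splits=5):
    """Recall on defaulters (BAD = 1) for each stratified cross-validation fold."""
    X = df.drop(columns=[target])
    y = df[target]
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    return cross_val_score(build_pipeline(), X, y, cv=cv, scoring="recall")
```

Keeping imputation and scaling inside the pipeline means they are re-fitted on each training fold, so the cross-validated scores are not inflated by statistics computed on the validation rows.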

Technology Stack

This analysis was conducted using Python and/or SAS. The primary Python libraries used are:

Pandas for data manipulation and analysis.
NumPy for numerical operations.
Matplotlib & Seaborn for data visualization.
Scikit-learn for building and evaluating machine learning models.

Machine Learning Algorithms Deployed

Histogram Gradient Boosting Classifier
Random Forest Classifier
Support Vector Classifier
Logistic Regression
k-Nearest Neighbors Classifier
Neural Network (Multi-Layer Perceptron)
Decision Tree Classifier
