devina-h/Credit-default-prediction

Credit-default-prediction

Credit Scoring Model for Loan Application Assessment

This project develops a predictive credit scoring model that helps a financial institution classify unsecured loan applications. The model aims to minimize credit risk, optimize loan approval rates, and explain the key factors that drive loan defaults.

Business Problem

A bank needs to refine its loan approval process to balance market competitiveness with risk management. The core challenge is to accurately distinguish between applicants who are likely to repay a loan ("good" customers) and those who are likely to default ("bad" customers).

The key business objectives are:

  1. Develop a model that accepts the maximum number of good applicants while correctly identifying at least 85% of bad applicants.
  2. Develop a model that accepts at least 70% of good applicants while rejecting the maximum possible number of bad applicants.
  3. Identify the most influential variables that determine a customer's repayment behavior to inform future lending strategies.
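Objectives 1 and 2 both come down to choosing a decision threshold on a model's predicted probability of default. The following is a minimal sketch of that selection, not the project's actual code: the function names, the rule "reject when the score is at or above the threshold", and the toy scores are all assumptions made for illustration.

```python
def rates_at_threshold(y_true, p_bad, threshold):
    """Return (bad_caught, good_accepted) when applicants with
    p_bad >= threshold are rejected. y_true uses 1 = bad, 0 = good."""
    bad = [p for y, p in zip(y_true, p_bad) if y == 1]
    good = [p for y, p in zip(y_true, p_bad) if y == 0]
    bad_caught = sum(p >= threshold for p in bad) / len(bad)
    good_accepted = sum(p < threshold for p in good) / len(good)
    return bad_caught, good_accepted


def pick_threshold(y_true, p_bad, min_bad_caught=0.0,
                   min_good_accepted=0.0, maximise="good"):
    """Scan candidate thresholds and keep the best one that meets the constraints.

    Objective 1: min_bad_caught=0.85, maximise="good".
    Objective 2: min_good_accepted=0.70, maximise="bad".
    Returns (threshold, bad_caught, good_accepted), or None if nothing qualifies.
    """
    # Every distinct score is a candidate; infinity means "reject nobody".
    candidates = sorted(set(p_bad)) + [float("inf")]
    best = None
    for t in candidates:
        bad_caught, good_accepted = rates_at_threshold(y_true, p_bad, t)
        if bad_caught < min_bad_caught or good_accepted < min_good_accepted:
            continue
        key = ((good_accepted, bad_caught) if maximise == "good"
               else (bad_caught, good_accepted))
        if best is None or key > best[0]:
            best = (key, t, bad_caught, good_accepted)
    return None if best is None else best[1:]


# Toy scores (made up): five good and five bad applicants.
y_toy = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
p_toy = [0.05, 0.10, 0.20, 0.30, 0.60, 0.25, 0.55, 0.70, 0.80, 0.90]

objective_1 = pick_threshold(y_toy, p_toy, min_bad_caught=0.85, maximise="good")
objective_2 = pick_threshold(y_toy, p_toy, min_good_accepted=0.70, maximise="bad")
```

In practice the threshold would be chosen on validation data and then checked on a held-out test set, since a threshold tuned and evaluated on the same records will look better than it is.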

Dataset

The analysis is based on a historical dataset of past bank customers. The dataset (data.csv) contains customer-level information on financial health and loan characteristics. The target variable is BAD, where:

  • 1: The applicant defaulted on the loan.
  • 0: The applicant successfully paid off the loan.

Key Predictor Variables Include:

  • LOAN: The total amount of the loan requested.
  • MORTDUE: The amount due on the applicant's existing mortgage.
  • VALUE: The current value of the applicant's property.
  • DEBTINC: The applicant's debt-to-income ratio.
  • YOJ: Years at the applicant's current job.
  • DEROG: Number of major derogatory reports.
  • DELINQ: Number of delinquent credit lines.
  • CLAGE: Age of the oldest credit line in months.
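As a starting point for working with these columns, the sketch below loads the file and summarises each listed predictor by missing share and by mean within each BAD class. The helper names `load` and `summarize` are hypothetical, and the sketch assumes data.csv has a header row with the column names given above.

```python
import pandas as pd

# Predictor names as listed above; any column absent from the file is skipped.
PREDICTORS = ["LOAN", "MORTDUE", "VALUE", "DEBTINC", "YOJ", "DEROG", "DELINQ", "CLAGE"]


def load(path="data.csv"):
    """Read the dataset; assumes a header row with the column names above."""
    return pd.read_csv(path)


def summarize(df, target="BAD"):
    """Per-predictor missing share and class-wise means (0 = repaid, 1 = defaulted)."""
    cols = [c for c in PREDICTORS if c in df.columns]
    return pd.DataFrame({
        "missing_share": df[cols].isna().mean(),
        "mean_good": df.loc[df[target] == 0, cols].mean(),
        "mean_bad": df.loc[df[target] == 1, cols].mean(),
    })
```

A typical call would be `summarize(load())`, with `df["BAD"].mean()` giving the overall default rate.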

Analytical Approach

The project follows a structured data analytics workflow to deliver a robust and interpretable solution.

  1. Exploratory Data Analysis (EDA):
    • Conducted a thorough investigation of all variables to understand their distributions, identify outliers, and assess data quality.
    • Analyzed relationships between variables and their correlation with the target variable (BAD) using visualizations to rank feature importance.
  2. Data Pre-processing:
    • Developed strategies for handling missing values and incomplete records.
    • Transformed categorical variables and created new features where appropriate to prepare the dataset for modeling.
  3. Modeling:
    • Built and evaluated at least two distinct classification algorithms to predict loan default.
    • Selected appropriate performance measures aligned with the specific business objectives (e.g., recall, precision, ROC-AUC).
    • Incorporated feature selection methods to create parsimonious and effective models.
  4. Performance Evaluation and Optimization:
    • Tuned model hyperparameters to optimize performance against the predefined business goals.
    • Assessed the generalization performance of the final recommended models to ensure they are robust and reliable for future predictions.
  5. Business Recommendations:
    • Translated the model's findings into actionable business insights.
    • Provided clear recommendations for the bank's lending strategy, highlighting key assumptions and potential limitations of the analytical solution.
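The pre-processing and modeling steps above can be sketched as a single scikit-learn pipeline. This is an illustrative outline under stated assumptions, not the project's implementation: median and most-frequent imputation, one-hot encoding, standard scaling, logistic regression as the estimator, and 5-fold stratified ROC-AUC are all choices made here for the example. Feature selection and hyperparameter tuning are left out for brevity.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer, make_column_selector
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def build_pipeline():
    """Impute, encode, and scale inside the pipeline so each CV fold is
    pre-processed using only its own training split (no leakage)."""
    numeric = make_pipeline(SimpleImputer(strategy="median"), StandardScaler())
    categorical = make_pipeline(
        SimpleImputer(strategy="most_frequent"),
        OneHotEncoder(handle_unknown="ignore"),
    )
    preprocess = ColumnTransformer([
        # Columns are picked by dtype because the categorical column names
        # are not listed in this README.
        ("num", numeric, make_column_selector(dtype_include=np.number)),
        ("cat", categorical, make_column_selector(dtype_exclude=np.number)),
    ])
    model = LogisticRegression(max_iter=1000, class_weight="balanced")
    return Pipeline([("preprocess", preprocess), ("model", model)])


def cross_validated_auc(df, target="BAD", folds=5):
    """Return the per-fold ROC-AUC scores of the pipeline on df."""
    X = df.drop(columns=[target])
    y = df[target]
    cv = StratifiedKFold(n_splits=folds, shuffle=True, random_state=0)
    return cross_val_score(build_pipeline(), X, y, cv=cv, scoring="roc_auc")
```

Keeping imputation and scaling inside the `Pipeline` matters for the generalization estimate: fitting them on the full dataset before splitting would let information from the validation folds leak into training.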

Technology Stack

This analysis was conducted using Python and/or SAS. The primary Python libraries used are:

  • Pandas for data manipulation and analysis.
  • NumPy for numerical operations.
  • Matplotlib & Seaborn for data visualization.
  • Scikit-learn for building and evaluating machine learning models.

Machine Learning Algorithms Deployed

  1. Histogram Gradient Boosting Classifier
  2. Random Forest Classifier
  3. Support Vector Classifier
  4. Logistic Regression
  5. k-Nearest Neighbors Classifier
  6. Neural Network (Multi-Layer Perceptron)
  7. Decision Tree Classifier
