The goal of this project is to predict whether a loan acquired by Fannie Mae will go into foreclosure.
Fannie Mae buys loans from lenders to incentivize them to issue more loans.
Fannie Mae publishes data here on the loans it has acquired and how they perform over time.
This project was done following a tutorial from Dataquest.
In 2020, Fannie Mae changed the way it publishes its data. To avoid making too many modifications to the tutorial project, this project uses the old dataset.
An article on this project is available here.
- Clone this repo.
- Go to the folder: `cd loan-prediction`.
- Create a `data` directory: `mkdir data`.
- Go to the `data` directory: `cd data`.
- Get the data:
    - Manually:
        - Download the 2000-2015 dataset here.
        - Extract the files from `2012 Q1` through `2015 Q1`.
        - Remove the `.tar` file.
        - Move the files to the `data` directory.
    - Or run the download script: `python download_data.py`.
- Go back to the `loan-prediction` directory: `cd ..`.
- Install the requirements: `pip install -r requirements.txt`.
- Create a directory for the processed datasets: `mkdir processed`.
- Combine the `Acquisition` and `Performance` datasets: `python assemble.py`.
    - This will create two files in the `processed` directory: `Acquisition.txt` and `Performance.txt`.
- Generate the training data: `python annotate.py`.
    - This will use `Acquisition.txt` and `Performance.txt` to create training data.
    - It will create a file `train.csv` with this data.
- Run cross-validation on the training set and print the associated error metrics: `python predict.py`.
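
The processing steps above, from creating the `processed` directory through running `predict.py`, could be chained in one small script. The following is a minimal sketch and not part of the repository. It assumes the three scripts sit in the repository root and can be run with the current Python interpreter. The names `run_pipeline`, `PIPELINE_SCRIPTS`, and the `runner` parameter are hypothetical and introduced only for illustration.

```python
import subprocess
import sys
from pathlib import Path

# Script names are taken from the steps above. Their location in the
# repository root is an assumption of this sketch.
PIPELINE_SCRIPTS = ["assemble.py", "annotate.py", "predict.py"]


def run_pipeline(repo_dir=".", runner=subprocess.run):
    """Create the processed directory, then run each script in order.

    `runner` defaults to subprocess.run; it is a parameter only so the
    function can be exercised without launching the real scripts.
    """
    repo = Path(repo_dir)
    # Equivalent of `mkdir processed`, but safe to repeat.
    (repo / "processed").mkdir(parents=True, exist_ok=True)

    commands = []
    for script in PIPELINE_SCRIPTS:
        command = [sys.executable, script]
        # check=True raises CalledProcessError if a step exits with a
        # non-zero status, so later steps never run on missing output.
        runner(command, cwd=repo, check=True)
        commands.append(command)
    return commands
```

Calling `run_pipeline()` from the `loan-prediction` directory, once the data has been downloaded and the requirements installed, would run the three steps in order and stop at the first one that fails.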