Using Machine Learning to identify Enron fraudsters

In this project, I build a model that identifies potential fraudsters from financial and e-mail data. The work proceeds in the following steps:

  • data exploration (learning about the data, cleaning and preparing the data)
  • feature selection and engineering (selecting the most significant features and creating new ones)
  • reducing the dimensionality of the data using principal component analysis
  • selecting and tuning a supervised machine learning algorithm
  • validating the algorithm to ensure acceptable performance of the model
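
The steps above could be chained roughly as follows with scikit-learn. This is only a minimal sketch and not the code in poi_id.py: the synthetic data, the choice of `GaussianNB`, the parameter grid, and the F1 scoring are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for the real features (NOT the Enron data).
X, y = make_classification(
    n_samples=150, n_features=20, n_informative=5,
    weights=[0.85, 0.15], random_state=42,
)

# Scale -> keep the k best features -> reduce with PCA -> classify.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(score_func=f_classif)),
    ("pca", PCA(random_state=42)),
    ("clf", GaussianNB()),  # assumed classifier, chosen only for the sketch
])

# Hypothetical grid; every n_components value stays below every k.
param_grid = {
    "select__k": [8, 12],
    "pca__n_components": [2, 4],
}

# Stratified shuffle splits keep the class ratio in each fold,
# which matters when the positive class is rare.
cv = StratifiedShuffleSplit(n_splits=10, test_size=0.3, random_state=42)
search = GridSearchCV(pipeline, param_grid, scoring="f1", cv=cv)
search.fit(X, y)

best_params = search.best_params_
best_f1 = search.best_score_
print(best_params, round(best_f1, 3))
```

Putting selection and PCA inside the pipeline means they are refit on each training split, so the validation score is not inflated by information from the held-out data.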

Results

The results are saved in the Jupyter notebook file in the repository.

Files

The following additional files can be found in the repository:

  • Enron_final.html: results in HTML format.
  • final_project_dataset.pkl: dataset in pickle (.pkl) format.
  • final_project_dataset_modified.pkl, my_classifier.pkl, my_dataset.pkl, my_feature_list.pkl: files created as a result of project implementation.
  • poi_id.py: script with the Python code referred to in the results file, as well as the final classifier.
  • tester.py: script used to test the classifier.
  • tools folder: scripts used for data processing.
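
The .pkl files listed above can be read with Python's `pickle` module. The sketch below writes and reloads a tiny stand-in file so that it runs on its own; the record layout (a dict keyed by person name, each with a `poi` flag) and the names `toy_dataset.pkl`, `PERSON A`, and `PERSON B` are assumptions for illustration, not taken from the repository files.

```python
import os
import pickle
import tempfile

# Hypothetical records; the real dataset's layout may differ.
toy = {
    "PERSON A": {"salary": 100000, "poi": False},
    "PERSON B": {"salary": "NaN", "poi": True},
}

# Write the stand-in file to a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "toy_dataset.pkl")
with open(path, "wb") as f:
    pickle.dump(toy, f)

# Pickle files must be opened in binary mode ("rb").
with open(path, "rb") as f:
    data = pickle.load(f)

# Count the records flagged as positive in the assumed layout.
n_poi = sum(1 for record in data.values() if record["poi"])
print(len(data), n_poi)
```

Only unpickle files from a source you trust, since loading a pickle can execute arbitrary code.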