Using Machine Learning to identify Enron fraudsters

In this project, I build a model that identifies potential fraudsters from financial and e-mail data. The work proceeds in the following steps:

data exploration (learning about the data, cleaning and preparing the data)
feature selection and engineering (selecting the most significant features and creating new ones)
reducing the dimensionality of the data using principal component analysis
selecting and tuning a supervised machine learning algorithm
validating the algorithm to ensure acceptable performance of the model
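
The steps above can be sketched as a single scikit-learn pipeline. This is only an illustration under stated assumptions, not the code from poi_id.py: the data is a synthetic stand-in generated with make_classification, and the scaler, the SelectKBest scorer, the GaussianNB classifier, the parameter grid, and the F1 scoring are all choices made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the project data (assumption): a small,
# imbalanced binary problem, with the minority class playing the
# role of the persons of interest.
X, y = make_classification(
    n_samples=150, n_features=12, n_informative=5,
    weights=[0.85, 0.15], random_state=42,
)

# Feature selection -> PCA -> classifier, chained so that every step
# is refit on the training part of each validation split.
pipeline = Pipeline([
    ("scale", MinMaxScaler()),
    ("select", SelectKBest(f_classif)),
    ("pca", PCA(random_state=42)),
    ("clf", GaussianNB()),
])

# Hypothetical tuning grid: number of features kept and number of
# principal components (always <= the number of selected features).
param_grid = {
    "select__k": [6, 10],
    "pca__n_components": [2, 4],
}

# Repeated stratified splits keep the class ratio in each fold,
# which matters when the positive class is rare.
cv = StratifiedShuffleSplit(n_splits=20, test_size=0.3, random_state=42)
search = GridSearchCV(pipeline, param_grid, scoring="f1", cv=cv)
search.fit(X, y)

best_params = search.best_params_
best_f1 = search.best_score_
print(best_params, round(best_f1, 3))
```

Putting selection and PCA inside the pipeline, rather than applying them once to the full dataset, keeps information from the validation folds out of the fitted transformers.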

Results

The results are saved in the Jupyter notebook file in the repository.

Files

The following additional files can be found in the repository:

Enron_final.html: results in HTML format.
final_project_dataset.pkl: dataset in pickle (.pkl) format.
final_project_dataset_modified.pkl, my_classifier.pkl, my_dataset.pkl, my_feature_list.pkl: files created as a result of project implementation.
poi_id.py: script with the Python code referred to in the results file, as well as the final classifier.
tester.py: script used to test the classifier.
tools folder: scripts used for data processing.
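
The .pkl files can be read and written with Python's standard pickle module. The sketch below shows a round trip on made-up data; the helper names load_pickle and save_pickle, the sample records, and the assumed layout (a dict of per-person feature dicts plus a list of feature names) are hypothetical and may not match the actual files.

```python
import pickle
import tempfile
from pathlib import Path


def save_pickle(obj, path):
    """Serialize obj to path in binary pickle format."""
    with open(path, "wb") as handle:
        pickle.dump(obj, handle)


def load_pickle(path):
    """Load a pickled object from path. Only use on files you trust."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Made-up records standing in for the dataset (assumed layout).
sample_dataset = {
    "PERSON A": {"poi": False, "salary": 100000},
    "PERSON B": {"poi": True, "salary": 250000},
}
sample_features = ["poi", "salary"]

# Round trip through a temporary directory so no repository file is touched.
with tempfile.TemporaryDirectory() as tmp:
    dataset_path = Path(tmp) / "my_dataset.pkl"
    features_path = Path(tmp) / "my_feature_list.pkl"
    save_pickle(sample_dataset, dataset_path)
    save_pickle(sample_features, features_path)
    restored_dataset = load_pickle(dataset_path)
    restored_features = load_pickle(features_path)

print(sorted(restored_dataset), restored_features)
```

Unpickling can execute arbitrary code, so it is safest to load only pickle files from a source you trust.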
