This worksheet illustrates an end-to-end pipeline for solving general machine learning tasks. It covers selected techniques in the following key stages:
- Data preparation
- Understanding the data
- Data preprocessing
- Model training
- Testing model generalizability
- Post-pipeline reflection

For illustration purposes, I have chosen classification as the task and attached a sample dataset (data.csv) for running the code. The intention, however, is to introduce you to these stages and give you starter code that you can extend to your own problem statement. For feedback, feature requests, or to report a problem, please create a new issue in the 'Issues' tab of this repository; see https://help.github.com/articles/creating-an-issue/.
The code is in a Jupyter notebook. Installation instructions for Jupyter can be found at https://jupyter.readthedocs.io/en/latest/install.html
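To give a feel for how the stages listed above fit together, here is a minimal sketch of a classification pipeline. It is not the notebook's code: it assumes scikit-learn and NumPy are installed, uses a synthetic dataset from `make_classification` as a stand-in for data.csv (whose columns are not described here), and picks logistic regression purely for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Data preparation: a synthetic stand-in for data.csv (assumption).
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=5, random_state=0
)

# Understanding the data: check the shape and the class balance.
print("shape:", X.shape)
print("class counts:", np.bincount(y))

# Hold out a test set before any preprocessing is fitted,
# so that the test data cannot leak into training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Data preprocessing + model training: scaling and the classifier
# are chained so the scaler is fitted on training data only.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Cross-validation on the training set gives an estimate of
# performance before the test set is touched.
cv_scores = cross_val_score(model, X_train, y_train, cv=5)
print("mean CV accuracy:", cv_scores.mean())

# Testing model generalizability: fit once, score on held-out data.
model.fit(X_train, y_train)
test_accuracy = accuracy_score(y_test, model.predict(X_test))
print("test accuracy:", test_accuracy)
```

Post-pipeline reflection would then compare the cross-validation and test scores: a large gap between them is a hint that the model is overfitting or that the split is not representative. To try this on your own data, the synthetic `X` and `y` would be replaced with features and labels loaded from your file.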