This repository showcases techniques and best practices for dataset preprocessing, machine learning algorithms, and model training on three datasets. Code, data, and model artifacts are organized as follows:
src/
├── app.py
├── datasets
│ ├── adult.csv
│ ├── enriched_student_academic_performance_dataset.csv
│ ├── preprocessed_SAP_ds.csv
│ └── processed_adult.csv
├── models
│ ├── ds1
│ │ ├── best_model_random_forest.pkl
│ │ ├── label_encoder.pkl
│ │ ├── minmax_scaler.pkl
│ │ ├── standard_scaler.pkl
│ │ └── top_features.pkl
│ └── ds2
│ ├── daves_bouldin_model.pkl
│ ├── label_encoder.pkl
│ └── top_features.pkl
├── processing
│ ├── ds1
│ │ ├── ds1_pre-processing.ipynb
│ │ ├── prediction.txt
│ │ └── sample.py
│ ├── ds2
│ │ ├── ds2_preprocessing.ipynb
│ │ └── ds2_Student_Academic_Performance_Report
│ └── Feature Extraction
│ └── feature_extraction.ipynb
├── requirements.txt
├── templates
│ ├── index.html
│ └── script.js
└── utils.py
ds1 (Adult Income Dataset):
Processing notebooks and scripts for the UCI Adult Income dataset are in src/processing/ds1/ (main notebook: ds1_pre-processing.ipynb).
Outputs: processed datasets are stored in src/datasets/ and trained models in src/models/ds1/.
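The artifacts saved in src/models/ds1/ (label_encoder.pkl, minmax_scaler.pkl, standard_scaler.pkl) suggest that the notebook encodes the income label and scales numeric columns, most likely with scikit-learn's LabelEncoder, MinMaxScaler and StandardScaler. That is an assumption based on the file names only. The dependency-free sketch below just illustrates what each transform computes and how a fitted transform survives a pickle round trip; the Simple* classes are hypothetical stand-ins, not the repository's code.

```python
import pickle
from dataclasses import dataclass, field
from statistics import fmean, pstdev


@dataclass
class SimpleLabelEncoder:
    """Hypothetical stand-in: maps each distinct class label to an integer index."""
    classes_: list = field(default_factory=list)

    def fit(self, labels):
        self.classes_ = sorted(set(labels))  # sorted unique labels
        return self

    def transform(self, labels):
        index = {label: i for i, label in enumerate(self.classes_)}
        return [index[label] for label in labels]


@dataclass
class SimpleMinMaxScaler:
    """Hypothetical stand-in: rescales one numeric column to the range [0, 1]."""
    lo: float = 0.0
    hi: float = 1.0

    def fit(self, values):
        self.lo, self.hi = min(values), max(values)
        return self

    def transform(self, values):
        span = (self.hi - self.lo) or 1.0  # guard against constant columns
        return [(v - self.lo) / span for v in values]


@dataclass
class SimpleStandardScaler:
    """Hypothetical stand-in: shifts one column to mean 0 and population std 1."""
    mean: float = 0.0
    std: float = 1.0

    def fit(self, values):
        self.mean, self.std = fmean(values), pstdev(values)
        return self

    def transform(self, values):
        std = self.std or 1.0  # guard against constant columns
        return [(v - self.mean) / std for v in values]


# Made-up values shaped like the Adult dataset's income label and age column.
incomes = ["<=50K", ">50K", "<=50K", ">50K"]
ages = [25.0, 38.0, 28.0, 44.0]

encoder = SimpleLabelEncoder().fit(incomes)
minmax = SimpleMinMaxScaler().fit(ages)
standard = SimpleStandardScaler().fit(ages)

# Persist the fitted scaler and load it back, as a .pkl artifact would be.
restored = pickle.loads(pickle.dumps(minmax))

print(encoder.transform(incomes))  # [0, 1, 0, 1]
print(restored.transform(ages))
print(standard.transform(ages))
```

Pickling the fitted objects, rather than refitting them, is what lets a later consumer such as app.py apply exactly the same encoding and scaling that was used during training.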
ds2 (Student Academic Performance Dataset):
Processing and analysis of the student academic performance data are in src/processing/ds2/ds2_preprocessing.ipynb.
Outputs: processed datasets are stored in src/datasets/ and models in src/models/ds2/.
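The file name src/models/ds2/daves_bouldin_model.pkl suggests that the ds2 workflow clusters the student data and uses the Davies-Bouldin index to select or evaluate the clustering; this is an assumption from the file name, and the notebook is the authority. As a reference for what that index measures, here is a small pure-Python version (davies_bouldin is a hypothetical helper). scikit-learn ships the same metric as sklearn.metrics.davies_bouldin_score.

```python
import math
from collections import defaultdict


def davies_bouldin(points, labels):
    """Return the Davies-Bouldin index of a clustering (lower is better).

    For each cluster i: s_i is the mean distance of its points to centroid c_i.
    The index averages, over clusters, max_{j != i} (s_i + s_j) / dist(c_i, c_j).
    """
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)
    if len(clusters) < 2:
        raise ValueError("the Davies-Bouldin index needs at least two clusters")

    # Centroid of each cluster: the component-wise mean of its points.
    centroids = {
        label: [sum(coords) / len(members) for coords in zip(*members)]
        for label, members in clusters.items()
    }
    # Scatter s_i: average Euclidean distance from each point to its centroid.
    scatter = {
        label: sum(math.dist(p, centroids[label]) for p in members) / len(members)
        for label, members in clusters.items()
    }

    worst_ratios = []
    for i in clusters:
        # Assumes distinct centroids; coinciding centroids would divide by zero.
        worst_ratios.append(max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in clusters if j != i
        ))
    return sum(worst_ratios) / len(worst_ratios)


# Two tight clusters far apart, then the same clusters moved closer together.
far = [(0, 0), (0, 2), (10, 0), (10, 2)]
near = [(0, 0), (0, 2), (3, 0), (3, 2)]
labels = [0, 0, 1, 1]
print(davies_bouldin(far, labels))   # 0.2
print(davies_bouldin(near, labels))
```

Moving the clusters closer raises the index (from 0.2 to about 0.67 here), which is why a lower Davies-Bouldin score is read as tighter, better-separated clusters.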
Feature Extraction (Wine Dataset):
Demonstrates advanced feature extraction using an autoencoder on the Wine dataset in src/processing/Feature Extraction/feature_extraction.ipynb.
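This README does not describe the notebook's framework or architecture. As a hedged illustration of the idea, the NumPy-only sketch below trains a small autoencoder by full-batch gradient descent: 13 inputs (the Wine dataset has 13 features), a tanh bottleneck of 2 units, and a linear decoder. It runs on synthetic data of the same shape (178 x 13) and then uses the bottleneck activations as the extracted features. The function names, layer sizes, and learning rate are assumptions; to try it on the real data, substitute a standardized sklearn.datasets.load_wine().data.

```python
import numpy as np


def train_autoencoder(X, n_hidden=2, lr=0.1, epochs=2000, seed=0):
    """Fit a one-hidden-layer autoencoder (tanh encoder, linear decoder).

    Uses full-batch gradient descent on the mean squared reconstruction error
    and returns the encoder weights, encoder bias, and per-epoch losses.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(scale=0.1, size=(d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.1, size=(n_hidden, d))
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        # Forward pass: encode into the bottleneck, then reconstruct the input.
        H = np.tanh(X @ W1 + b1)
        X_hat = H @ W2 + b2
        err = X_hat - X
        losses.append(float(np.mean(err ** 2)))

        # Backward pass for the MSE loss (mean over all n * d entries).
        g_out = 2.0 * err / err.size
        gW2 = H.T @ g_out
        gb2 = g_out.sum(axis=0)
        g_hidden = (g_out @ W2.T) * (1.0 - H ** 2)  # tanh'(a) = 1 - tanh(a)^2
        gW1 = X.T @ g_hidden
        gb1 = g_hidden.sum(axis=0)

        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2
    return W1, b1, losses


def encode(X, W1, b1):
    """Extracted features: the bottleneck activations for each sample."""
    return np.tanh(X @ W1 + b1)


# Synthetic stand-in for Wine: 178 samples, 13 correlated features.
rng = np.random.default_rng(42)
latent = rng.normal(size=(178, 2))
mixing = rng.normal(size=(2, 13))
X = latent @ mixing + 0.1 * rng.normal(size=(178, 13))
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each column

W1, b1, losses = train_autoencoder(X)
features = encode(X, W1, b1)
print(features.shape)  # (178, 2)
print(f"reconstruction MSE: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Because the decoder has to rebuild all 13 inputs from only 2 numbers, training pushes the bottleneck to keep the directions that explain most of the variation, which is what makes its activations usable as a compact feature set.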
1. Install requirements (from the repository root):
   pip install -r src/requirements.txt
2. Process data and train models: run the relevant notebook(s) in the src/processing/ folders (ds1, ds2, or Feature Extraction) to generate the processed datasets and trained models.
3. Start the application:
   cd src
   python app.py
   The web UI will then be accessible at http://127.0.0.1:{port}, where {port} is the port defined in app.py.
- The templates/ folder contains the UI files (index.html, script.js) for the web interface.
- Utility functions are defined in src/utils.py.
- All data and model artifacts are stored in the respective datasets/ and models/ subfolders.
- For best results, follow the directory structure and execution order described above.
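Since each dataset's artifacts are .pkl files in their own models/ subfolder, a small loader can gather a folder's artifacts by name. load_artifacts is a hypothetical helper, not something this README says utils.py provides. It assumes the files were written with Python's standard pickle module (if the notebooks used joblib.dump, read them with joblib.load instead), and it should only be pointed at files you trust, because unpickling can execute arbitrary code.

```python
import pickle
from pathlib import Path


def load_artifacts(model_dir):
    """Load every *.pkl file in model_dir into a dict keyed by file stem.

    Assumes plain pickle files; only unpickle files from a trusted source.
    """
    model_dir = Path(model_dir)
    if not model_dir.is_dir():
        raise FileNotFoundError(f"no such model directory: {model_dir}")
    artifacts = {}
    for path in sorted(model_dir.glob("*.pkl")):
        with path.open("rb") as fh:
            artifacts[path.stem] = pickle.load(fh)
    return artifacts
```

For example, if the ds1 files are plain pickles, load_artifacts("src/models/ds1") would return a dict with keys such as "best_model_random_forest", "label_encoder", and "top_features".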
Data sources:
- Adult Income Dataset (UCI)
- Student Academic Performance Dataset
- Wine Dataset (scikit-learn)
Feel free to fork, open issues, or submit pull requests to enhance functionality or add new ML techniques!