A comprehensive collection of Data Science, Machine Learning, and Deep Learning projects developed during an intensive data science course. This repository showcases end-to-end AI pipelines—from exploratory data analysis and feature engineering to model building, evaluation, and neural network deployment.
- Overview
- Tech Stack
- Project Structure
- Machine Learning Projects
- Deep Learning Projects
- Data Analysis & Visualization
- Key Competencies
- Datasets
- Getting Started
- Contact
This portfolio demonstrates my practical expertise as an Applied AI Engineer across the full machine learning lifecycle. Each project tackles a real-world problem with industry-standard tools and techniques, including:
- Regression & Classification — predicting continuous values and categorical outcomes
- Neural Networks — building deep learning models with TensorFlow/Keras
- Exploratory Data Analysis (EDA) — uncovering insights through statistical visualization
- Feature Engineering — handling missing data, outliers, encoding, and scaling
- Model Evaluation — using metrics like MSE, R², accuracy, precision, recall, and confusion matrices
| Category | Libraries & Frameworks |
|---|---|
| Core | Python, NumPy, Pandas |
| Machine Learning | scikit-learn (Linear Regression, Logistic Regression, Decision Trees, Random Forests, SVM, K-Means, etc.) |
| Deep Learning | TensorFlow, Keras, Neural Networks (Sequential, Dense layers) |
| Visualization | Matplotlib, Seaborn |
| Preprocessing | StandardScaler, LabelEncoder, train_test_split |
| Environment | Jupyter Notebooks, Kaggle, Google Colab |
Data_Science/
├── README.md # This file
├── All_Data/ # Raw datasets
│ ├── Automobile.csv
│ ├── Boston.csv
│ ├── car.csv
│ ├── car_data - Copy.csv
│ ├── Diabetes Classification.csv
│ ├── Exam.csv / Exam_Copy.csv / Exam_Score_Prediction.csv
│ ├── Heardisease.csv
│ ├── IRIS.csv / Iris_Flower_Dataset.csv
│ ├── Loan.csv
│ ├── real_car (1).csv
│ ├── Student_Performance.csv
│ └── IMDB_1000.xls
└── models/ # Jupyter notebooks with full implementations
    ├── Automobile_model.ipynb # ANN for MPG prediction
    ├── Boston_Dataset.ipynb # Regression on housing prices
    ├── diebetes.ipynb # Diabetes classification (ML)
    ├── neural_network_Diabetes.ipynb # Diabetes classification (DL)
    ├── heart.ipynb # Heart disease prediction (ML)
    ├── nnn_hearth.ipynb # Heart disease prediction (DL)
    ├── Iris_Flower_dataset.ipynb # Flower classification
    ├── Loan_Dataset_Model.ipynb # Loan approval prediction
    ├── Sklearn_Car_price.ipynb # Car price prediction
    ├── Imd_rating.ipynb # IMDB rating prediction (ANN)
    ├── Exam_prep.ipynb # Exam score prediction
    ├── Exams_dataset.ipynb # Student performance analysis
    ├── Ecoomerce_Constumers.ipynb # E-commerce customer analysis
    ├── Linear_Regression.ipynb # Foundational linear regression
    ├── Tensorflow.ipynb # TensorFlow fundamentals
    ├── Stats.ipynb # Statistical analysis
    ├── Outliers.ipynb # Outlier detection techniques
    ├── Graphs.ipynb # Data visualization techniques
    ├── Loan_Graph.ipynb # Loan data visualization
    └── Matplotlib.ipynb # Matplotlib explorations
File: Automobile_model.ipynb | Dataset: Automobile.csv
- Task: Regression — Predict miles per gallon (MPG) from engine specs
- Techniques: IQR outlier detection, median imputation, label encoding, feature scaling, neural network (Sequential model with Dense layers)
- Features: cylinders, displacement, horsepower, weight, acceleration, model year, origin
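The IQR outlier detection and median imputation listed above can be sketched without any libraries. This is an illustration only, not the notebook's code: the `horsepower` values are made up, and the 1.5 multiplier is the conventional default rather than a value confirmed by the source.

```python
from statistics import median, quantiles


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) fences of the IQR rule."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Hypothetical horsepower column with one gap and one extreme value.
horsepower = [130, 165, 150, None, 140, 198, 95, 600]

filled = impute_median(horsepower)
low, high = iqr_bounds(filled)
outliers = [v for v in filled if v < low or v > high]
print(outliers)
```

Imputing before computing the fences keeps the column length unchanged; the median is preferred over the mean here because the extreme value would otherwise distort the fill value.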
File: Boston_Dataset.ipynb | Dataset: Boston.csv
- Task: Regression — Predict house prices from neighborhood and property features
- Techniques: Exploratory analysis, correlation matrices, linear regression, residual analysis
- Features: crime rate, RM (rooms), NOX, PTRATIO, LSTAT, etc.
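As a rough illustration of linear regression followed by residual analysis, the sketch below fits a one-feature least-squares line by hand. The `rooms` and `price` numbers are invented for the example and are not taken from Boston.csv; the notebook itself may well use scikit-learn with several features.

```python
def fit_line(x, y):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: average rooms per dwelling vs. price.
rooms = [4.0, 5.0, 6.0, 7.0, 8.0]
price = [12.0, 18.0, 22.0, 30.0, 38.0]

slope, intercept = fit_line(rooms, price)

# Residual = observed value minus fitted value.
residuals = [p - (slope * r + intercept) for r, p in zip(rooms, price)]
print(slope, intercept, residuals)
```

With an intercept in the model, least-squares residuals sum to zero, so a residual plot is mainly useful for spotting curvature or changing spread rather than an overall offset.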
Files: diebetes.ipynb (ML) | neural_network_Diabetes.ipynb (DL)
- Task: Binary Classification — Predict diabetes diagnosis
- Techniques (ML): Logistic regression, decision trees, feature encoding
- Techniques (DL): Multi-layer perceptron with TensorFlow/Keras, StandardScaler, train/test split
- Features: Age, BMI, blood pressure, FBS, HbA1c, family history, smoking, diet, exercise
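The train/test split and standardization steps above interact: the scaling statistics are normally computed on the training rows only and then reused on the test rows. The following is a library-free sketch of that ordering, with made-up `bmi` values and hypothetical helper names. It mirrors what scikit-learn's `train_test_split` and `StandardScaler` do conceptually, but it is not their implementation.

```python
import random
from statistics import mean, pstdev


def split_rows(rows, test_size=0.25, seed=42):
    """Shuffle a copy of the rows and return (train, test)."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_size))
    return shuffled[n_test:], shuffled[:n_test]


def fit_scaler(column):
    """Learn mean and population standard deviation from training data."""
    return mean(column), pstdev(column)


def transform(column, mu, sigma):
    """Apply z = (x - mu) / sigma using previously fitted statistics."""
    return [(v - mu) / sigma for v in column]


# Hypothetical BMI column.
bmi = [22.1, 27.5, 31.0, 24.3, 35.2, 29.8, 26.4, 33.1]

train, test = split_rows(bmi)
mu, sigma = fit_scaler(train)           # fit on training rows only
train_scaled = transform(train, mu, sigma)
test_scaled = transform(test, mu, sigma)  # reuse the training statistics
```

Fitting the scaler after the split avoids leaking information about the test rows into the features the model is trained on.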
Files: heart.ipynb (ML) | nnn_hearth.ipynb (DL)
- Task: Binary Classification — Predict presence/absence of heart disease
- Techniques (ML): Classification algorithms, feature selection, model evaluation
- Techniques (DL): Neural network with multiple dense layers, label encoding for categorical features
- Features: Age, gender, chest pain type, BP, cholesterol, max HR, ST depression, thallium scan
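Label encoding, as mentioned for the categorical features above, maps each distinct category to an integer. A minimal sketch follows; the chest pain labels are placeholders rather than the dataset's actual values, and the alphabetical ordering is chosen to match how scikit-learn's `LabelEncoder` orders its classes.

```python
def fit_label_encoder(values):
    """Map each distinct label to an integer, in sorted order."""
    classes = sorted(set(values))
    return {label: idx for idx, label in enumerate(classes)}


# Hypothetical chest pain categories.
chest_pain = ["typical", "asymptomatic", "non-anginal", "typical", "atypical"]

mapping = fit_label_encoder(chest_pain)
encoded = [mapping[v] for v in chest_pain]
print(mapping)
print(encoded)
```

The integers imply an order that the categories may not have, which tree-based models tolerate well but linear models and neural networks can misread; one-hot encoding is the usual alternative in that case.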
File: Iris_Flower_dataset.ipynb | Dataset: IRIS.csv
- Task: Multi-class Classification — Classify iris species
- Techniques: EDA (head, tail, info), scatter plots, classification models
- Features: sepal length/width, petal length/width → species (setosa, versicolor, virginica)
File: Loan_Dataset_Model.ipynb | Dataset: Loan.csv
- Task: Binary Classification — Predict loan approval status
- Techniques: Handling missing values, categorical encoding (Gender, Married, Education, Property Area), feature engineering
- Features: Applicant income, co-applicant income, loan amount, credit history, employment status
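One possible way to combine missing-value handling with categorical encoding is to fill gaps with the most frequent category and then one-hot encode the result. The sketch below assumes that approach for illustration; the notebook may use a different imputation or encoding strategy, and the sample values are invented.

```python
from statistics import mode


def fill_with_mode(values):
    """Replace None entries with the most frequent observed category."""
    observed = [v for v in values if v is not None]
    most_common = mode(observed)
    return [most_common if v is None else v for v in values]


def one_hot(values):
    """Return (categories, rows) where each row has a single 1."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


# Hypothetical Property Area column with one missing entry.
property_area = ["Urban", "Rural", None, "Urban", "Semiurban"]

filled_area = fill_with_mode(property_area)
categories, encoded_rows = one_hot(filled_area)
print(categories)
print(encoded_rows)
```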
File: Sklearn_Car_price.ipynb | Dataset: car.csv
- Task: Regression — Predict car selling price
- Techniques: Kaggle environment, scikit-learn pipeline, LinearRegression, MSE, R² evaluation
- Features: Year, present price, kilometers driven, fuel type, seller type, transmission, owner count
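For reference, the two regression metrics named above can be computed by hand as follows. The `actual` and `predicted` prices are illustrative numbers only, not results from the notebook.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared prediction errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """1 - SS_res / SS_tot: share of variance explained by the model."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical selling prices and model predictions.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 7.0, 8.0]

mse = mean_squared_error(actual, predicted)
r2 = r2_score(actual, predicted)
print(mse, r2)
```

MSE is expressed in squared target units, so it is hard to compare across datasets; R² is unitless, which is why the two are usually reported together.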
Files: Exam_prep.ipynb, Exams_dataset.ipynb | Dataset: Exam_Score_Prediction.csv
- Task: Regression — Predict student exam scores
- Techniques: Feature correlation analysis, linear regression, visualization
- Features: study hours, class attendance, sleep quality, study method, facility rating, exam difficulty
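Feature correlation analysis typically relies on the Pearson coefficient, which pandas and NumPy compute for whole tables at once. A hand-rolled version for two columns is sketched below, using invented study hours and scores.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical columns.
study_hours = [1, 2, 3, 4, 5]
exam_score = [52, 55, 61, 68, 74]

r = pearson(study_hours, exam_score)
print(round(r, 3))
```

A coefficient near +1 or -1 indicates a strong linear relationship; a value near 0 rules out a linear trend but not a non-linear one.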
File: Ecoomerce_Constumers.ipynb
- Task: Regression/Analysis — Predict yearly spending from user behavior
- Features: Avg session length, time on app/website, length of membership, email, address
File: neural_network_Diabetes.ipynb
- Architecture: Sequential model with multiple Dense layers
- Preprocessing: LabelEncoder for categorical features, StandardScaler for normalization
- Framework: TensorFlow 2.x / Keras
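To make the architecture concrete, here is a library-free sketch of what a stack of Dense layers computes in a forward pass. The notebook builds its model with Keras; the weights, layer sizes, and the choice of ReLU for the hidden layer and sigmoid for the output are assumptions made for this illustration.

```python
from math import exp


def relu(vector):
    return [max(0.0, x) for x in vector]


def sigmoid(vector):
    return [1.0 / (1.0 + exp(-x)) for x in vector]


def dense(inputs, weights, biases, activation):
    """One fully connected layer; weights[j] holds the weights of output unit j."""
    pre_activation = [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]
    return activation(pre_activation)


# Hypothetical parameters: 2 inputs -> 2 hidden units -> 1 output.
hidden_w = [[0.5, -0.2], [0.1, 0.4]]
hidden_b = [0.0, -0.1]
out_w = [[1.0, -1.0]]
out_b = [0.0]

features = [1.0, 2.0]  # already scaled inputs
hidden = dense(features, hidden_w, hidden_b, relu)
prob = dense(hidden, out_w, out_b, sigmoid)[0]
print(hidden, prob)
```

The sigmoid output can be read as the predicted probability of the positive class, which is then thresholded (commonly at 0.5) to obtain a label.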
File: nnn_hearth.ipynb
- Architecture: Multi-layer perceptron with dense layers
- Preprocessing: IQR-based outlier handling, label encoding, train/test split
- Evaluation: Accuracy, loss curves, confusion matrix
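The confusion matrix and the metrics derived from it can be illustrated in a few lines. The labels below are made up and do not reflect the notebook's results.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary classification problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


# Hypothetical ground truth and predictions (1 = heart disease present).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)  # of predicted positives, how many were right
recall = tp / (tp + fn)     # of actual positives, how many were found
print(tp, fp, tn, fn, accuracy, precision, recall)
```

In a medical screening setting, recall usually deserves the most attention, since a false negative (a missed case) tends to cost more than a false positive.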
File: Imd_rating.ipynb
- Task: Regression — Predict IMDB ratings from movie features
- Architecture: TensorFlow/Keras Sequential model
- Features: Runtime, genre, director, star, votes, metascore, gross revenue
File: Tensorflow.ipynb
- Content: Core TensorFlow operations, tensor manipulation, building blocks for neural networks
File: Stats.ipynb
- Topics: Descriptive statistics, distributions, hypothesis testing fundamentals
File: Outliers.ipynb
- Techniques: IQR method, Z-score, visualization-based outlier identification
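As a sketch of the Z-score method, the function below flags values whose distance from the mean exceeds a threshold measured in standard deviations. The readings and the threshold of 2.5 are assumptions for the example; 3.0 is the more common default on larger datasets.

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=3.0):
    """Return the values whose absolute z-score exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)
    return [v for v in values if abs((v - mu) / sigma) > threshold]


# Hypothetical readings with one extreme value.
readings = [10, 12, 11, 13, 12, 11, 10, 12, 11, 50]

flagged = zscore_outliers(readings, threshold=2.5)
print(flagged)
```

In a sample this small, the extreme value inflates the standard deviation enough that its own z-score stays just below 3, which is why a lower threshold is used here. The IQR method is less sensitive to this effect because quartiles are barely moved by a single extreme value.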
Files: Matplotlib.ipynb, Graphs.ipynb, Loan_Graph.ipynb
- Charts: Scatter plots, histograms, heatmaps, pair plots, bar charts, line graphs
- Use Cases: Correlation analysis, distribution visualization, model diagnostics
| Skill Area | Details |
|---|---|
| Data Preprocessing | Missing value imputation (median/mean), outlier detection (IQR), duplicate removal, feature dropping |
| Feature Engineering | Label encoding, one-hot encoding, standardization, feature selection |
| Regression | Linear Regression, Multiple Regression, polynomial features, MSE/R² metrics |
| Classification | Logistic Regression, Decision Trees, Random Forests, SVM, KNN |
| Neural Networks | Sequential models, Dense layers, activation functions, optimizers, loss functions |
| Model Evaluation | Train/test split, cross-validation, confusion matrix, accuracy, precision, recall, F1, MSE, MAE, R² |
| Visualization | Matplotlib, Seaborn — scatter plots, histograms, heatmaps, pair plots, box plots |
| EDA | Dataset profiling, correlation analysis, distribution analysis, anomaly detection |
| Tools | Jupyter Notebooks, Kaggle, Google Colab, Anaconda |
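
Cross-validation, listed under Model Evaluation, rotates which part of the data is held out. The following sketch generates the index sets for k contiguous folds; `k_fold_indices` is a hypothetical helper written for this illustration, comparable in spirit to scikit-learn's `KFold` without shuffling.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous folds."""
    # The first n_samples % k folds receive one extra sample.
    fold_sizes = [
        n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)
    ]
    start = 0
    for size in fold_sizes:
        stop = start + size
        val_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if i < start or i >= stop]
        yield train_idx, val_idx
        start = stop


folds = list(k_fold_indices(10, 3))
for train_idx, val_idx in folds:
    print(val_idx)
```

Each sample appears in exactly one validation fold, so averaging the metric over the folds gives a steadier estimate than a single train/test split.
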
| Dataset | Records | Features | Task Type |
|---|---|---|---|
| Automobile | 398 | 9 (mpg, cylinders, displacement, horsepower, weight, etc.) | Regression |
| Boston Housing | 506 | 14 (crim, zn, rm, lstat, price, etc.) | Regression |
| Diabetes | 128+ | 11 (Age, BMI, BP, FBS, HbA1c, etc.) | Classification |
| Heart Disease | 270 | 14 (Age, Chest Pain, BP, Cholesterol, Max HR, etc.) | Classification |
| Iris | 150 | 5 (sepal/petal dimensions, species) | Classification |
| Loan | 614 | 13 (Income, Loan Amount, Credit History, etc.) | Classification |
| Car Price | 301 | 15 (Year, KM Driven, Fuel Type, etc.) | Regression |
| IMDB | 1000 | 10 (Runtime, Genre, Director, Votes, etc.) | Regression |
| Exam Score | 20,000 | 13 (Study Hours, Attendance, Sleep, etc.) | Regression |
| Student Performance | 10,000 | 7 (Hours, Previous Scores, Activities, etc.) | Regression |
| E-Commerce | 500 | 8 (Session Length, App Time, Membership, etc.) | Regression |
Python 3.8+

    pip install pandas numpy matplotlib seaborn scikit-learn tensorflow jupyter

    # Navigate to the models directory
    cd models
    # Launch Jupyter
    jupyter notebook

Then open any .ipynb file and run cells sequentially.
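
Before opening a notebook, it can help to confirm that the libraries are importable in the active environment. The check below is an optional sketch; the `REQUIRED` list uses import names (for example `sklearn` for scikit-learn), and `missing_packages` is a hypothetical helper.

```python
from importlib.util import find_spec

# Import names of the libraries installed above.
REQUIRED = ["pandas", "numpy", "matplotlib", "seaborn", "sklearn", "tensorflow"]


def missing_packages(names):
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
print("Missing packages:", ", ".join(missing) if missing else "none")
```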
- Model deployment with Flask/FastAPI
- Hyperparameter tuning (GridSearchCV, RandomizedSearchCV)
- Ensemble methods (XGBoost, LightGBM)
- NLP projects (text classification, sentiment analysis)
- Computer Vision projects (CNNs, image classification)
- MLOps practices (MLflow, model versioning, CI/CD)
I'm an Applied AI Engineer passionate about building data-driven solutions. This portfolio represents my journey through foundational and advanced data science concepts, with hands-on experience across:
- Supervised & Unsupervised Learning
- Neural Networks & Deep Learning
- Statistical Analysis & Data Visualization
- End-to-end ML pipelines
I'm actively expanding into production ML systems, MLOps, and specialized AI domains (NLP, Computer Vision).
"Data is the new oil, but models are the engines that make it valuable."
Last Updated: April 2026