🧠 Applied AI & Data Science Portfolio

A collection of Data Science, Machine Learning, and Deep Learning projects developed during an intensive data science course. This repository showcases end-to-end AI pipelines, from exploratory data analysis and feature engineering to model building, evaluation, and neural network training.


🎯 Overview

This portfolio demonstrates practical experience across the full machine learning lifecycle. Each project tackles a real-world problem using industry-standard tools and techniques, including:

  • Regression & Classification — predicting continuous values and categorical outcomes
  • Neural Networks — building deep learning models with TensorFlow/Keras
  • Exploratory Data Analysis (EDA) — uncovering insights through statistical visualization
  • Feature Engineering — handling missing data, outliers, encoding, and scaling
  • Model Evaluation — using metrics like MSE, R², accuracy, precision, recall, and confusion matrices

🛠 Tech Stack

| Category | Libraries & Frameworks |
| --- | --- |
| Core | Python, NumPy, Pandas |
| Machine Learning | scikit-learn (Linear Regression, Logistic Regression, Decision Trees, Random Forests, SVM, K-Means, etc.) |
| Deep Learning | TensorFlow, Keras, Neural Networks (Sequential, Dense layers) |
| Visualization | Matplotlib, Seaborn |
| Preprocessing | StandardScaler, LabelEncoder, train_test_split |
| Environment | Jupyter Notebooks, Kaggle, Google Colab |

📁 Project Structure

Data_Science/
├── README.md                 # This file
├── All_Data/                 # Raw datasets
│   ├── Automobile.csv
│   ├── Boston.csv
│   ├── car.csv
│   ├── car_data - Copy.csv
│   ├── Diabetes Classification.csv
│   ├── Exam.csv / Exam_Copy.csv / Exam_Score_Prediction.csv
│   ├── Heardisease.csv
│   ├── IRIS.csv / Iris_Flower_Dataset.csv
│   ├── Loan.csv
│   ├── real_car (1).csv
│   ├── Student_Performance.csv
│   └── IMDB_1000.xls
└── models/                   # Jupyter notebooks with full implementations
    ├── Automobile_model.ipynb        # ANN for MPG prediction
    ├── Boston_Dataset.ipynb          # Regression on housing prices
    ├── diebetes.ipynb                # Diabetes classification (ML)
    ├── neural_network_Diabetes.ipynb # Diabetes classification (DL)
    ├── heart.ipynb                   # Heart disease prediction (ML)
    ├── nnn_hearth.ipynb              # Heart disease prediction (DL)
    ├── Iris_Flower_dataset.ipynb     # Flower classification
    ├── Loan_Dataset_Model.ipynb      # Loan approval prediction
    ├── Sklearn_Car_price.ipynb       # Car price prediction
    ├── Imd_rating.ipynb              # IMDB rating prediction (ANN)
    ├── Exam_prep.ipynb               # Exam score prediction
    ├── Exams_dataset.ipynb           # Student performance analysis
    ├── Ecoomerce_Constumers.ipynb    # E-commerce customer analysis
    ├── Linear_Regression.ipynb       # Foundational linear regression
    ├── Tensorflow.ipynb              # TensorFlow fundamentals
    ├── Stats.ipynb                   # Statistical analysis
    ├── Outliers.ipynb                # Outlier detection techniques
    ├── Graphs.ipynb                  # Data visualization techniques
    ├── Loan_Graph.ipynb              # Loan data visualization
    └── Matplotlib.ipynb              # Matplotlib explorations

🤖 Machine Learning Projects

🔹 Automobile Fuel Efficiency Prediction

File: Automobile_model.ipynb | Dataset: Automobile.csv

  • Task: Regression — Predict miles per gallon (MPG) from engine specs
  • Techniques: IQR outlier detection, median imputation, label encoding, feature scaling, neural network (Sequential model with Dense layers)
  • Features: cylinders, displacement, horsepower, weight, acceleration, model year, origin
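
The median imputation and IQR outlier steps listed above can be sketched roughly as follows. This is a minimal illustration on made-up values, not the notebook's actual code; the toy DataFrame and its column names are assumptions, and the real Automobile.csv may differ.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for a few Automobile.csv rows.
df = pd.DataFrame({
    "horsepower": [130.0, 165.0, np.nan, 150.0, 140.0, 900.0],
    "weight": [3504, 3693, 3436, 3433, 3449, 3500],
})

# Median imputation: fill the missing horsepower with the column median.
df["horsepower"] = df["horsepower"].fillna(df["horsepower"].median())

# IQR rule: keep rows inside [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR].
q1 = df["horsepower"].quantile(0.25)
q3 = df["horsepower"].quantile(0.75)
iqr = q3 - q1
in_range = df["horsepower"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
clean = df[in_range]

print(clean)
```

Imputing with the median rather than the mean keeps the fill value from being pulled toward extreme rows such as the 900 above.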

🔹 Boston Housing Price Prediction

File: Boston_Dataset.ipynb | Dataset: Boston.csv

  • Task: Regression — Predict house prices from neighborhood and property features
  • Techniques: Exploratory analysis, correlation matrices, linear regression, residual analysis
  • Features: crime rate, RM (rooms), NOX, PTRATIO, LSTAT, etc.
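
A linear regression fit followed by a basic residual check might look like the sketch below. It uses synthetic data in place of Boston.csv, so the two features and their coefficients are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic stand-in for two housing features and a price target.
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 5.0 + rng.normal(scale=0.5, size=200)

model = LinearRegression().fit(X, y)

# Residuals are the gap between observed and predicted values.
residuals = y - model.predict(X)

print("coefficients:", model.coef_)
print("intercept:", model.intercept_)
print("mean residual:", residuals.mean())
print("corr(feature 0, target):", np.corrcoef(X[:, 0], y)[0, 1])
```

For an ordinary least squares fit with an intercept, the training residuals average to roughly zero, so the more informative diagnostic is usually a plot of residuals against predictions to look for curvature or changing spread.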

🔹 Diabetes Classification

Files: diebetes.ipynb (ML) | neural_network_Diabetes.ipynb (DL)

  • Task: Binary Classification — Predict diabetes diagnosis
  • Techniques (ML): Logistic regression, decision trees, feature encoding
  • Techniques (DL): Multi-layer perceptron with TensorFlow/Keras, StandardScaler, train/test split
  • Features: Age, BMI, blood pressure, FBS, HbA1c, family history, smoking, diet, exercise
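
As a rough sketch of the classical ML path, the snippet below combines a train/test split, StandardScaler, and logistic regression. The data comes from `make_classification` rather than the diabetes dataset, and the sample and feature counts are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data (not the real diabetes records).
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=4, random_state=42
)

# Hold out 20% of rows, keeping the class balance similar in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# The pipeline fits the scaler on training data only, then the classifier.
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)

acc = clf.score(X_test, y_test)
print(f"test accuracy: {acc:.3f}")
```

Wrapping the scaler and the model in one pipeline avoids leaking test-set statistics into the scaling step.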

🔹 Heart Disease Prediction

Files: heart.ipynb (ML) | nnn_hearth.ipynb (DL)

  • Task: Binary Classification — Predict presence/absence of heart disease
  • Techniques (ML): Classification algorithms, feature selection, model evaluation
  • Techniques (DL): Neural network with multiple dense layers, label encoding for categorical features
  • Features: Age, gender, chest pain type, BP, cholesterol, max HR, ST depression, thallium scan
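
Label encoding of a categorical feature can be illustrated as below. The column name and category strings are hypothetical and may not match the dataset's actual values.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical categorical column; real category names may differ.
df = pd.DataFrame(
    {"chest_pain": ["typical", "atypical", "non-anginal", "typical"]}
)

encoder = LabelEncoder()
# Classes are sorted alphabetically, then mapped to 0..n_classes-1.
df["chest_pain_code"] = encoder.fit_transform(df["chest_pain"])

print(list(encoder.classes_))
print(df)
```

scikit-learn documents `LabelEncoder` as intended for target labels; for input features, `OrdinalEncoder` or `OneHotEncoder` are the usual alternatives, particularly when the integer order carries no meaning.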

🔹 Iris Flower Classification

File: Iris_Flower_dataset.ipynb | Dataset: IRIS.csv

  • Task: Multi-class Classification — Classify iris species
  • Techniques: EDA (head, tail, info), scatter plots, classification models
  • Features: sepal length/width, petal length/width → species (setosa, versicolor, virginica)
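
A compact version of this workflow is sketched below using the copy of the Iris data bundled with scikit-learn instead of IRIS.csv. The k-nearest-neighbors classifier is an assumption; the notebook may use a different model.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()

# 70/30 split, stratified so each species appears in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, stratify=iris.target, random_state=0
)

# Assumed model choice for illustration: k-nearest neighbors with k=5.
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
accuracy = knn.score(X_test, y_test)

print("species:", list(iris.target_names))
print(f"test accuracy: {accuracy:.3f}")
```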

🔹 Loan Approval Prediction

File: Loan_Dataset_Model.ipynb | Dataset: Loan.csv

  • Task: Binary Classification — Predict loan approval status
  • Techniques: Handling missing values, categorical encoding (Gender, Married, Education, Property Area), feature engineering
  • Features: Applicant income, co-applicant income, loan amount, credit history, employment status
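
Missing-value handling and categorical encoding for this kind of data might look like the sketch below. The four rows are invented, and the mode/median fill strategy is an assumption rather than a record of what the notebook does.

```python
import numpy as np
import pandas as pd

# Hypothetical rows using a few of the column names mentioned above.
loans = pd.DataFrame({
    "Gender": ["Male", "Female", np.nan, "Male"],
    "Married": ["Yes", "No", "Yes", "Yes"],
    "LoanAmount": [120.0, np.nan, 66.0, 141.0],
})

# Categorical gap: fill with the most frequent value (mode).
loans["Gender"] = loans["Gender"].fillna(loans["Gender"].mode()[0])

# Numeric gap: fill with the median.
loans["LoanAmount"] = loans["LoanAmount"].fillna(loans["LoanAmount"].median())

# One-hot encode, dropping the first level to avoid redundant columns.
encoded = pd.get_dummies(loans, columns=["Gender", "Married"], drop_first=True)

print(encoded)
```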

🔹 Car Price Prediction

File: Sklearn_Car_price.ipynb | Dataset: car.csv

  • Task: Regression — Predict car selling price
  • Techniques: Kaggle environment, scikit-learn pipeline, LinearRegression, MSE, R² evaluation
  • Features: Year, present price, kilometers driven, fuel type, seller type, transmission, owner count
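
One way to assemble a scikit-learn pipeline for this task is sketched below. The rows and column names are invented for illustration and are not taken from car.csv; the metrics are computed on the training rows only to keep the example short.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy rows; values and column names are assumptions.
cars = pd.DataFrame({
    "year": [2014, 2013, 2017, 2011, 2018, 2015, 2016, 2012],
    "present_price": [5.59, 9.54, 9.85, 4.15, 6.87, 9.83, 8.12, 6.20],
    "fuel_type": ["Petrol", "Diesel", "Petrol", "Petrol",
                  "Diesel", "Diesel", "Petrol", "Diesel"],
    "selling_price": [3.35, 4.75, 7.25, 2.85, 4.60, 9.25, 6.75, 3.10],
})
features = ["year", "present_price", "fuel_type"]

# One-hot encode the categorical column, pass numeric columns through.
preprocess = ColumnTransformer(
    transformers=[("fuel", OneHotEncoder(handle_unknown="ignore"), ["fuel_type"])],
    remainder="passthrough",
)
pipeline = Pipeline([("preprocess", preprocess), ("regressor", LinearRegression())])
pipeline.fit(cars[features], cars["selling_price"])

predictions = pipeline.predict(cars[features])
mse = mean_squared_error(cars["selling_price"], predictions)
r2 = r2_score(cars["selling_price"], predictions)
print(f"MSE: {mse:.3f}  R2: {r2:.3f}")
```

On a real dataset, the same two metrics would normally be reported on a held-out test split rather than on the rows used for fitting.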

🔹 Exam Score Prediction

Files: Exam_prep.ipynb, Exams_dataset.ipynb | Dataset: Exam_Score_Prediction.csv

  • Task: Regression — Predict student exam scores
  • Techniques: Feature correlation analysis, linear regression, visualization
  • Features: study hours, class attendance, sleep quality, study method, facility rating, exam difficulty
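
Feature correlation analysis can be sketched as follows. The data is synthetic, and the column names and the relationship between them are assumptions made only so the example has something to rank.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Synthetic stand-ins for two features and the exam score target.
hours = rng.uniform(0, 10, size=300)
attendance = rng.uniform(50, 100, size=300)
score = 5 * hours + 0.3 * attendance + rng.normal(scale=5, size=300)

df = pd.DataFrame({
    "study_hours": hours,
    "class_attendance": attendance,
    "exam_score": score,
})

# Pearson correlation matrix, then features ranked by correlation with the target.
corr = df.corr()
ranked = corr["exam_score"].drop("exam_score").sort_values(ascending=False)

print(ranked)
```

The full `corr` matrix is also what a Seaborn heatmap would typically be drawn from.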

🔹 E-Commerce Customer Analysis

File: Ecoomerce_Constumers.ipynb

  • Task: Regression/Analysis — Predict yearly spending from user behavior
  • Features: Avg session length, time on app/website, length of membership, email, address

🧠 Deep Learning Projects

🔹 Neural Network for Diabetes Detection

File: neural_network_Diabetes.ipynb

  • Architecture: Sequential model with multiple Dense layers
  • Preprocessing: LabelEncoder for categorical features, StandardScaler for normalization
  • Framework: TensorFlow 2.x / Keras
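
A Sequential model of the kind described above could be sketched like this. The layer sizes, epoch count, and synthetic nine-feature input are all assumptions; the notebook's actual architecture is not stated here.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from tensorflow import keras

rng = np.random.default_rng(0)

# Synthetic data: 200 rows, 9 numeric features, binary target.
X = rng.normal(size=(200, 9)).astype("float32")
y = (X[:, 0] + X[:, 1] > 0).astype("float32")

# Standardize features to zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(X)

# Assumed architecture: two hidden Dense layers and a sigmoid output.
model = keras.Sequential([
    keras.Input(shape=(9,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

history = model.fit(X_scaled, y, epochs=5, batch_size=32, verbose=0)
probs = model.predict(X_scaled, verbose=0)

print("final training loss:", history.history["loss"][-1])
```

The sigmoid output yields a probability per row, which pairs with the binary cross-entropy loss used in `compile`.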

🔹 Neural Network for Heart Disease Detection

File: nnn_hearth.ipynb

  • Architecture: Multi-layer perceptron with dense layers
  • Preprocessing: IQR-based outlier handling, label encoding, train/test split
  • Evaluation: Accuracy, loss curves, confusion matrix
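
The evaluation metrics named above can be computed from predicted probabilities roughly as follows. The labels and probabilities are made up, and the 0.5 decision threshold is an assumption.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
)

# Hypothetical true labels and model output probabilities.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
probs = np.array([0.9, 0.2, 0.4, 0.8, 0.1, 0.7, 0.6, 0.3])

# Turn probabilities into class predictions with a 0.5 threshold.
y_pred = (probs >= 0.5).astype(int)

# Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]].
cm = confusion_matrix(y_true, y_pred)
accuracy = accuracy_score(y_true, y_pred)
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)

print(cm)
print(accuracy, precision, recall)
```

For a medical screening task, recall on the positive class is often watched more closely than overall accuracy, since a missed case is usually the costlier error.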

🔹 ANN for IMDB Movie Rating Prediction

File: Imd_rating.ipynb

  • Task: Regression — Predict IMDB ratings from movie features
  • Architecture: TensorFlow/Keras Sequential model
  • Features: Runtime, genre, director, star, votes, metascore, gross revenue

🔹 TensorFlow Fundamentals

File: Tensorflow.ipynb

  • Content: Core TensorFlow operations, tensor manipulation, building blocks for neural networks
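
A few representative tensor operations are shown below as a sketch. Which operations the notebook actually covers is an assumption; these are simply common TensorFlow 2.x building blocks.

```python
import tensorflow as tf

a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.constant([[5.0, 6.0], [7.0, 8.0]])

# Matrix multiplication, reduction, and reshaping.
product = tf.matmul(a, b)
total = tf.reduce_sum(a)
flat = tf.reshape(a, (4,))

# Automatic differentiation: d(x^2)/dx evaluated at x = 3.
x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x ** 2
grad = tape.gradient(y, x)

print(product.numpy())
print(float(total), flat.shape, float(grad))
```

`GradientTape` is the mechanism Keras relies on internally to compute gradients during training.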

📊 Data Analysis & Visualization

🔹 Statistical Analysis

File: Stats.ipynb

  • Topics: Descriptive statistics, distributions, hypothesis testing fundamentals

🔹 Outlier Detection

File: Outliers.ipynb

  • Techniques: IQR method, Z-score, visualization-based outlier identification
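
The Z-score method can be sketched in a few lines of NumPy. The values and the cutoff of 3 are assumptions for illustration.

```python
import numpy as np

# Hypothetical measurements with one obvious extreme value.
values = np.array([
    10.0, 12.0, 11.0, 13.0, 12.0, 11.0, 10.0, 12.0, 11.0, 12.0,
    10.0, 13.0, 11.0, 12.0, 11.0, 10.0, 12.0, 13.0, 11.0, 95.0,
])

# Z-score: distance from the mean in units of standard deviation.
z_scores = (values - values.mean()) / values.std()

# Flag points more than 3 standard deviations from the mean.
outliers = values[np.abs(z_scores) > 3]

print(outliers)
```

Because the extreme value itself inflates the standard deviation, the Z-score can miss outliers in very small samples; the IQR method is less affected by this.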

🔹 Data Visualization with Matplotlib & Seaborn

Files: Matplotlib.ipynb, Graphs.ipynb, Loan_Graph.ipynb

  • Charts: Scatter plots, histograms, heatmaps, pair plots, bar charts, line graphs
  • Use Cases: Correlation analysis, distribution visualization, model diagnostics

🎓 Key Competencies Demonstrated

| Skill Area | Details |
| --- | --- |
| Data Preprocessing | Missing value imputation (median/mean), outlier detection (IQR), duplicate removal, feature dropping |
| Feature Engineering | Label encoding, one-hot encoding, standardization, feature selection |
| Regression | Linear Regression, Multiple Regression, polynomial features, MSE/R² metrics |
| Classification | Logistic Regression, Decision Trees, Random Forests, SVM, KNN |
| Neural Networks | Sequential models, Dense layers, activation functions, optimizers, loss functions |
| Model Evaluation | Train/test split, cross-validation, confusion matrix, accuracy, precision, recall, F1, MSE, MAE, R² |
| Visualization | Matplotlib, Seaborn: scatter plots, histograms, heatmaps, pair plots, box plots |
| EDA | Dataset profiling, correlation analysis, distribution analysis, anomaly detection |
| Tools | Jupyter Notebooks, Kaggle, Google Colab, Anaconda |

📂 Datasets

| Dataset | Records | Features | Task Type |
| --- | --- | --- | --- |
| Automobile | 398 | 9 (mpg, cylinders, displacement, horsepower, weight, etc.) | Regression |
| Boston Housing | 506 | 14 (crim, zn, rm, lstat, price, etc.) | Regression |
| Diabetes | 128+ | 11 (Age, BMI, BP, FBS, HbA1c, etc.) | Classification |
| Heart Disease | 270 | 14 (Age, Chest Pain, BP, Cholesterol, Max HR, etc.) | Classification |
| Iris | 150 | 5 (sepal/petal dimensions, species) | Classification |
| Loan | 614 | 13 (Income, Loan Amount, Credit History, etc.) | Classification |
| Car Price | 301 | 15 (Year, KM Driven, Fuel Type, etc.) | Regression |
| IMDB | 1000 | 10 (Runtime, Genre, Director, Votes, etc.) | Regression |
| Exam Score | 20,000 | 13 (Study Hours, Attendance, Sleep, etc.) | Regression |
| Student Performance | 10,000 | 7 (Hours, Previous Scores, Activities, etc.) | Regression |
| E-Commerce | 500 | 8 (Session Length, App Time, Membership, etc.) | Regression |

🚀 Getting Started

Prerequisites

Python 3.8+

Install Dependencies

pip install pandas numpy matplotlib seaborn scikit-learn tensorflow jupyter

Run a Notebook

# Navigate to the models directory
cd models

# Launch Jupyter
jupyter notebook

Then open any .ipynb file and run cells sequentially.


📈 What's Next

  • Model deployment with Flask/FastAPI
  • Hyperparameter tuning (GridSearchCV, RandomizedSearchCV)
  • Ensemble methods (XGBoost, LightGBM)
  • NLP projects (text classification, sentiment analysis)
  • Computer Vision projects (CNNs, image classification)
  • MLOps practices (MLflow, model versioning, CI/CD)

👨‍💻 About Me

I'm an Applied AI Engineer passionate about building data-driven solutions. This portfolio represents my journey through foundational and advanced data science concepts, with hands-on experience across:

  • Supervised & Unsupervised Learning
  • Neural Networks & Deep Learning
  • Statistical Analysis & Data Visualization
  • End-to-end ML pipelines

I'm actively expanding into production ML systems, MLOps, and specialized AI domains (NLP, Computer Vision).

"Data is the new oil, but models are the engines that make it valuable."

Last Updated: April 2026