This repository contains the course materials and resources for the Machine Learning Essentials course.
This course is a beginner-friendly introduction to machine learning, intended for readers who are new to the field but have some programming experience. It covers the following topics:
- Feature engineering and data preprocessing
  - Data Imputation
    - Numerical
    - Categorical
  - Data Scaling
    - Standardization
    - Normalization
  - Data Encoding
    - One-hot encoding
    - Label encoding
  - Handling Outliers
    - Removing outliers
    - Replacing outliers
    - Capping outliers
    - Discretization
  - Grouping Operations
    - Categorical columns
    - Numerical columns
  - Feature Split
  - Log Transform
  - Binning
  - Scrubbing
  - Generating Polynomial and Interaction Features
- Supervised learning
  - Tree-based models
    - Decision Trees
    - Ensemble Methods
      - Bagging
        - Random Forests
      - Boosting
        - AdaBoost
        - Gradient Boosting
        - XGBoost
      - Voting
  - Regression
    - Simple Linear Regression
    - Multiple Linear Regression
    - Polynomial Interpolation
    - Ordinary Least Squares Regression
    - Ridge Regression
    - Lasso Regression
  - Classification
    - Logistic Regression
    - K-Nearest Neighbors
    - Support Vector Machines
    - Naive Bayes
    - Artificial Neural Networks
- Unsupervised learning
  - Clustering
    - K-Means
    - Hierarchical Clustering
  - Dimensionality Reduction
    - Principal Component Analysis
    - Random Projection
  - Association Rule Learning
    - Apriori
    - FP-Growth
- Model Selection
  - Resampling Methods
    - Random Split
    - Time Based Split
    - K-Fold Cross Validation
    - Stratified K-Fold Cross Validation
    - Bootstrapping
  - Probabilistic Methods
    - Akaike Information Criterion
    - Bayesian Information Criterion
    - Minimum Description Length
    - Structural Risk Minimization
  - Trade Off Methods
    - Bias-Variance Trade Off
    - Precision-Recall Trade Off
    - Overfitting vs Underfitting
- Model Evaluation
  - Regression Metrics
    - Mean Absolute Error
    - Mean Squared Error
    - Root Mean Squared Error
    - Relative Squared Log Error
    - R2 Score
    - Adjusted R2 Score
  - Classification Metrics
    - Accuracy
    - Precision
    - Recall
    - F1 Score
    - ROC Curve
    - AUC Score
    - Log Loss
    - Confusion Matrix
    - Gain and Lift Charts
    - Kolmogorov-Smirnov Chart
  - Clustering Metrics
    - Dunn Index
    - Silhouette Coefficient
    - Elbow Method
    - Davies-Bouldin Index
    - Fowlkes-Mallows Index
    - Homogeneity, Completeness, and V-Measure
    - Mutual Information
  - Dimensionality Reduction Metrics
    - Reconstruction Error
    - Explained Variance Ratio
  - Association Rule Learning Metrics
    - Support
    - Confidence
    - Lift
    - Leverage
    - Conviction
The following packages are required for this course:
- Python (>= 3.6)
- NumPy (>= 1.19.5)
- Pandas (>= 1.1.5)
- Matplotlib (>= 3.2.2)
- Seaborn (>= 0.11.1)
- Scikit-Learn (>= 0.24.1)
- Pillow (>= 8.1.0)
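
To confirm that an environment meets the minimum versions listed above, a small script along the following lines may help. It is only a sketch and not part of the course materials: the helper names `version_tuple` and `check_requirements` are made up for this example, the import names `sklearn` and `PIL` are assumed for Scikit-Learn and Pillow, and the version comparison is deliberately simple (it looks at the first three numeric components only).

```python
import importlib
import sys

# Minimum versions copied from the list above. Keys are import names,
# which differ from the display names for Scikit-Learn and Pillow.
REQUIRED = {
    "numpy": "1.19.5",
    "pandas": "1.1.5",
    "matplotlib": "3.2.2",
    "seaborn": "0.11.1",
    "sklearn": "0.24.1",
    "PIL": "8.1.0",
}


def version_tuple(version):
    """Turn '1.19.5' into (1, 19, 5), ignoring suffixes such as 'rc1'."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    # Pad short versions such as '1.2' so tuples compare as expected.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def check_requirements(required=REQUIRED):
    """Return a dict mapping import name to 'ok', 'too old', or 'missing'."""
    status = {}
    for name, minimum in required.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            status[name] = "missing"
            continue
        # Modules without a __version__ attribute are treated as too old.
        installed = getattr(module, "__version__", "0")
        ok = version_tuple(installed) >= version_tuple(minimum)
        status[name] = "ok" if ok else "too old"
    return status


if __name__ == "__main__":
    print("python", "ok" if sys.version_info >= (3, 6) else "too old")
    for name, result in check_requirements().items():
        print(name, result)
```

Any package reported as `missing` or `too old` can then be installed or upgraded with the package manager of your choice.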