Machine Learning - Computer programs that improve their performance at some task through experience
Bias Variance Tradeoff
$$
E(y_0-\hat{f}(x_0))^2=Var(\hat{f}(x_0))+[Bias(\hat{f}(x_0))]^2+Var(\epsilon)
$$
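Where
- $y_0$ is the observed response at a test point $x_0$
- $\hat{f}(x_0)$ is the fitted model's prediction at $x_0$, random because it depends on the training set
- $\epsilon$ is the irreducible noise in $y_0$, so $Var(\epsilon)$ is an error floor that no model can remove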
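The decomposition can be checked numerically. The sketch below is only an illustration under made-up assumptions: the true function (`np.sin`), the noise level `sigma`, the test point `x0` and the degree-1 fit are all hypothetical choices, not part of these notes. It refits the model on many fresh training sets and compares the measured test error at `x0` with the sum of the three terms.

```python
import numpy as np

rng = np.random.default_rng(0)
f = np.sin            # assumed true function (hypothetical choice)
sigma = 0.3           # assumed noise std, so Var(eps) = sigma**2
x0 = 1.0              # test point
n_train, n_trials, degree = 30, 2000, 1

preds = np.empty(n_trials)
for t in range(n_trials):
    # Draw a fresh training set and fit a degree-1 polynomial to it.
    x = rng.uniform(0.0, 3.0, n_train)
    y = f(x) + rng.normal(0.0, sigma, n_train)
    coefs = np.polyfit(x, y, degree)
    preds[t] = np.polyval(coefs, x0)

variance = preds.var()                    # Var(f_hat(x0))
bias_sq = (preds.mean() - f(x0)) ** 2     # [Bias(f_hat(x0))]^2
y0 = f(x0) + rng.normal(0.0, sigma, n_trials)   # noisy test responses
mse = np.mean((y0 - preds) ** 2)          # estimate of E(y0 - f_hat(x0))^2

print(mse, variance + bias_sq + sigma ** 2)
```

The two printed numbers should agree up to sampling error. Raising `degree` typically shrinks `bias_sq` and grows `variance`.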

Linear Regression, lol
$$
f(x_i)=w_0+w_1x_i
$$
Mean Square Error
$$
MSE=\frac{1}{n}\sum_{i=1}^n{(y_i - f(x_i))^2}
$$
Least Squares for linear
$$
w_0=\overline{y}-w_1\overline{x},\ w_1=\frac{\overline{xy}-\overline{x}\,\overline{y}}{\overline{x^2}-(\overline{x})^2}
$$
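A minimal sketch of the closed-form fit and the MSE above, using NumPy. The function names `fit_line` and `mse` and the toy data are assumptions made for illustration.

```python
import numpy as np

def fit_line(x, y):
    """Closed-form least squares for f(x) = w0 + w1 * x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # w1 = (mean(xy) - mean(x) * mean(y)) / (mean(x^2) - mean(x)^2)
    w1 = (np.mean(x * y) - x.mean() * y.mean()) / (np.mean(x ** 2) - x.mean() ** 2)
    # w0 = mean(y) - w1 * mean(x)
    w0 = y.mean() - w1 * x.mean()
    return w0, w1

def mse(y, y_pred):
    """Mean of the squared residuals."""
    y = np.asarray(y, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y - y_pred) ** 2)

# Toy data lying exactly on y = 1 + 2x, so the fit should recover w0=1, w1=2.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x
w0, w1 = fit_line(x, y)
error = mse(y, w0 + w1 * x)
```

The denominator is the variance of x, so the formula breaks down when all x values are identical.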
Polynomial Regression
$$
y=w_0+w_1x+w_2x^2+\cdots +w_dx^d
$$
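One way to fit such a polynomial is least squares on the powers of x. The sketch below uses `numpy.polynomial.polynomial.polyfit`, which returns coefficients in the same low-to-high order as $w_0, \dots, w_d$. The noiseless quadratic data is a made-up example.

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Made-up data from y = 1 - 2x + 0.5x^2 (no noise).
x = np.linspace(-1.0, 1.0, 20)
y = 1.0 - 2.0 * x + 0.5 * x ** 2

d = 2                       # polynomial degree
w = P.polyfit(x, y, d)      # coefficients ordered [w0, w1, ..., wd]
y_hat = P.polyval(x, w)     # evaluates w0 + w1*x + ... + wd*x^d
```

Note that the older `np.polyfit` returns the coefficients in the opposite order (highest degree first).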
Logistic Regression (z=polynomial)
$$
p(x)=\frac{1}{1+e^{-z}}
$$
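As a small sketch, the probability can be computed by evaluating the polynomial z and passing it through the sigmoid. The names `sigmoid` and `predict_proba` and the weight list `w` are hypothetical, and the weights are taken as given rather than learned here.

```python
import numpy as np

def sigmoid(z):
    """p = 1 / (1 + e^(-z)), maps any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(x, w):
    """p(x) with z = w[0] + w[1]*x + ... + w[d]*x^d."""
    z = sum(w_k * x ** k for k, w_k in enumerate(w))
    return sigmoid(z)

# With z = 0 the model is maximally uncertain.
p_mid = predict_proba(0.0, [0.0, 1.0])
# Large positive z pushes p toward 1, large negative z toward 0.
p_high = predict_proba(5.0, [0.0, 2.0])
p_low = predict_proba(-5.0, [0.0, 2.0])
```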
Sensitivity = True Positive Rate = Recall = TP / (TP + FN)
Specificity = TN / (TN + FP)
FPR = FP / (TN + FP)
$$
F1=2\times\frac{Prec\times Recall}{Prec+Recall}
$$
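These metrics all follow from the four confusion-matrix counts. A minimal sketch, assuming the standard definition Precision = TP / (TP + FP), which is not spelled out above, and using made-up counts:

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the metrics above from confusion-matrix counts."""
    recall = tp / (tp + fn)          # sensitivity, true positive rate
    specificity = tn / (tn + fp)
    fpr = fp / (tn + fp)             # equals 1 - specificity
    precision = tp / (tp + fp)       # assumed standard definition
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "recall": recall,
        "specificity": specificity,
        "fpr": fpr,
        "precision": precision,
        "f1": f1,
    }

# Made-up counts for illustration.
m = classification_metrics(tp=40, fp=10, tn=45, fn=5)
```

The sketch does not guard against zero denominators, which occur when a class is never present or never predicted.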
KNN Rule of thumb:
Principal Component Analysis
Support Vector Machines
Deep Learning
- Feature learning is better than using hand-crafted features
- Feature learning should aim for learning hierarchical features
- Higher-level features should not be simple linear combinations of low-level features

We want these features to be:
- Informative, discriminative, and invariant to common variations, such as rotation, scaling, and noise
- Generalizable – able to capture and represent the underlying patterns and structure of the data in a way that can be effectively used to make predictions on new, unseen data.
- Hierarchical – hierarchical features are learned by progressively combining and abstracting lower-level features to form more complex and informative higher-level features
Regularization techniques for DNN
Dropout

- Applied only during training; at inference time all units are kept.
- Dropout is a regularization technique used to reduce overfitting by randomly disabling a subset of neurons during training. It is particularly effective in large networks, long training sessions, or when training data is limited.
- The drop probability p is usually 0.5 or lower; commonly recommended values lie between 0.2 and 0.5.
- Dropout is most commonly applied after fully connected (dense) layers and is rarely used after convolutional layers. When used with convolutional layers, it typically involves a low dropout rate (i.e., low probability p of dropping units).
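The behaviour described above can be sketched in NumPy. This uses the "inverted dropout" variant, which rescales the surviving activations by 1 / (1 - p) so their expected value is unchanged. That rescaling is an assumption on my part, since the notes do not say which variant is meant. The function name `dropout` and its parameters are hypothetical.

```python
import numpy as np

def dropout(a, p=0.5, training=True, rng=None):
    """Inverted dropout: zero each unit with probability p, rescale the rest."""
    if not training or p == 0.0:
        return a                      # no-op at inference time
    rng = np.random.default_rng() if rng is None else rng
    keep = rng.random(a.shape) >= p   # True with probability 1 - p
    return a * keep / (1.0 - p)       # rescale so E[output] == input

rng = np.random.default_rng(0)
a = np.ones((1000, 100))              # stand-in for a layer's activations
out_train = dropout(a, p=0.3, rng=rng)
out_eval = dropout(a, p=0.3, training=False)
```

During training roughly 30% of the entries in `out_train` are zero and the rest are scaled up; `out_eval` is the input unchanged.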
TODO
Regularization techniques for DNNs
- Dropout, Batch Normalization, Early Stopping, Data Augmentation
Optimizers
- Adam, RMSProp, SGD with momentum
Pretrained Layers
$$
\text{Gini} = 1 - \sum_{i=1}^{C} p_i^2
$$
Where
- $C$ is the number of classes
- $p_i$ is the proportion of samples belonging to class $i$
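A minimal sketch of the Gini formula for a list of class labels, taking $p_i$ to be the fraction of samples in class $i$. The function name `gini_impurity` and the example labels are made up.

```python
import numpy as np

def gini_impurity(labels):
    """Gini = 1 - sum_i p_i^2 over the classes present in labels."""
    labels = np.asarray(labels)
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()         # class proportions p_i
    return 1.0 - np.sum(p ** 2)

pure = gini_impurity([1, 1, 1])         # one class only
mixed = gini_impurity([0, 0, 1, 1])     # two classes, evenly split
```

A pure set gives 0, and an even split over C classes gives the maximum, 1 - 1/C.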