Layer Types:
- Dense Layer - every neuron is connected to every neuron in the previous layer (fully connected)
- Convolutional Layer - consists of learned "filters"; each neuron connects only to a local region of the input, not fully connected
- Recurrent Neural Networks - feed their own output back in as input at the next time step, so earlier context influences later steps
- Long Short-Term Memory - a recurrent variant that uses a gated 'memory cell' to hold information across long temporal sequences
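As a rough illustration, these layer types map onto the Keras API roughly like this (the unit counts, filter count, and kernel size are placeholder values of mine, not from the notes):

    import tensorflow as tf

    # Dense: every unit connected to every unit in the previous layer
    dense = tf.keras.layers.Dense(64, activation="relu")

    # Convolutional: 32 learned filters, each seeing only a local 3x3 window
    conv = tf.keras.layers.Conv2D(32, kernel_size=3, activation="relu")

    # Recurrent / LSTM: process a sequence step by step, carrying state forward;
    # the LSTM adds a gated memory cell for longer-range dependencies
    rnn = tf.keras.layers.SimpleRNN(16)
    lstm = tf.keras.layers.LSTM(16)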
Activation Functions:
- ReLU: max(0, x) - simple and cheap, but the gradient is 0 for all x < 0
- LeakyReLU: x for x >= 0, a * x for x < 0, a is usually 0.01
- this ensures the gradient is never 0 (unlike plain ReLU)
- tanh - nonlinear, activation between -1 and 1; the range is small, so normalize inputs
- sigmoid - nonlinear, activation between 0 and 1 -> well suited to probabilities
- softmax - probability distribution over n classes, used for multi-class classification
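A minimal NumPy sketch of these activations for reference (the function names are my own):

    import numpy as np

    def relu(x):
        return np.maximum(0, x)

    def leaky_relu(x, a=0.01):
        return np.where(x >= 0, x, a * x)  # small slope a keeps the gradient nonzero

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def softmax(x):
        e = np.exp(x - np.max(x))  # subtract max for numerical stability
        return e / e.sum()

    # tanh is available directly as np.tanh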
Loss Functions:
- Regression
- Mean Squared Error
- Mean Absolute Error - like MSE, but with the absolute error instead of the squared error
- Mean Bias Error - like MAE, but without the absolute value, so positive and negative errors can cancel
- Output layer must have 1 node, typically used with linear activation functions
- Binary Classification
- Binary Cross Entropy
- Hinge (SVM) Loss
- Output layer must have 1 node, typically used with sigmoid activation
- Multi-class Classification
- Categorical (multi-class) Cross Entropy
- Output layer has n nodes, typical activation function is softmax
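The losses above in a short NumPy sketch (the function names and the clipping epsilon are my additions):

    import numpy as np

    def mse(y_true, y_pred):
        return np.mean((y_true - y_pred) ** 2)

    def mae(y_true, y_pred):
        return np.mean(np.abs(y_true - y_pred))

    def mbe(y_true, y_pred):
        return np.mean(y_true - y_pred)  # signed errors can cancel

    def binary_cross_entropy(y_true, y_pred, eps=1e-12):
        p = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
        return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))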
Optimizers:
- Gradient Descent
- Learning rate: too large overshoots the minimum, too small makes convergence very slow
- Adagrad - adapts learning rate to features, works well for sparse data sets
- Adam - ADAptive Moment estimation; folds previous gradients (momentum) into the update, very popular
- Stochastic Gradient Descent (update per sample or mini-batch) vs. Batch Gradient Descent (update on the full dataset)
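A toy sketch of plain gradient descent on a one-parameter linear model (the learning rate and data are made up for illustration):

    import numpy as np

    def gd_step(w, x, y, lr=0.01):
        grad = np.mean(2 * (w * x - y) * x)  # d/dw of mean((w*x - y)^2)
        return w - lr * grad                 # step against the gradient

    x = np.array([1.0, 2.0, 3.0])
    y = 2.0 * x          # true weight is 2
    w = 0.0
    for _ in range(200):
        w = gd_step(w, x, y)
    print(w)             # converges toward 2.0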
Frameworks:
- TensorFlow - most popular, made by Google
- he's making it seem like we're using TensorFlow -_-
- High Bias = Low Accuracy, High Variance = Low Precision
- High Bias shows up as low R^2 on both training and validation
- High Variance shows up as a large gap between training and validation R^2
- General rule: more complex models -> lower bias and higher variance
- Low variance algorithms: Linear Regression, LDA, Logistic Regression
- High variance algorithms: Decision Trees, kNN, SVM
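One way to see bias vs. variance in practice, a scikit-learn sketch (the synthetic dataset and model choice are illustrative):

    from sklearn.datasets import make_regression
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split

    X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

    model = LinearRegression().fit(X_train, y_train)
    r2_train = model.score(X_train, y_train)  # R^2 on training data
    r2_val = model.score(X_val, y_val)        # R^2 on validation data

    # both R^2 low -> high bias (underfitting)
    # large gap    -> high variance (overfitting)
    print(r2_train, r2_val)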

- Resampling: e.g. 5-fold cross-validation - train 5 models on different 80/20 train/validation splits so that every data point gets used for validation once (see sketch below)
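A 5-fold cross-validation sketch with scikit-learn (the estimator and dataset are placeholders):

    from sklearn.datasets import make_regression
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import cross_val_score

    X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

    # cv=5 -> five 80/20 splits; every point lands in the validation fold exactly once
    scores = cross_val_score(LinearRegression(), X, y, cv=5)
    print(scores, scores.mean())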
- working on files 015_NeuralNetworkFromScratch/*
- StandardScaler from sklearn.preprocessing to normalize; fit on the training set only, then reuse those statistics on the test set:

    from sklearn.preprocessing import StandardScaler
    scaler = StandardScaler()
    X_train_scale = scaler.fit_transform(X_train)  # learns mean/std from training data
    X_test_scale = scaler.transform(X_test)        # applies the same mean/std to test data
- gradients are calculated automatically (the framework applies automatic differentiation / backprop, as sketched below)
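For example, TensorFlow's GradientTape records operations and differentiates them automatically (the function and values here are illustrative):

    import tensorflow as tf

    x = tf.Variable(3.0)
    with tf.GradientTape() as tape:
        y = x ** 2                  # forward pass: y = x^2
    print(tape.gradient(y, x))      # dy/dx = 2x -> 6.0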