Linear and Logistic Regression
Linear Regression:
F = \beta_0 + \beta_1 X_1 + \dots
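A minimal sketch of what the linear model computes, with two illustrative features and hand-picked coefficients (all values here are made up for the example):

```python
# Linear model: F = beta0 + beta1*X1 + beta2*X2
# Coefficients are illustrative, not fitted to any data.
beta0, beta1, beta2 = 1.0, 2.0, -0.5

def predict(x1, x2):
    # Weighted sum of the features plus the intercept
    return beta0 + beta1 * x1 + beta2 * x2

print(predict(3.0, 4.0))  # 1.0 + 2.0*3.0 - 0.5*4.0 = 5.0
```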
Reason why linear regression can't be converted to classification: its output is an unbounded continuous value rather than a probability, so there is no natural cutoff that maps it to class membership.
Hence, logistic regression, which uses a sigmoid: y = 1 / (1 + e^-x).
If x is 0, then y is 1/2. If x is very large, then y approaches 1. If x is very negative, then y approaches 0.
As x → +∞, e^-x → 0, so y → 1; as x → -∞, e^-x → ∞, so y → 0.
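The limiting behavior above can be checked directly. This sketch uses a numerically stable form of the sigmoid so that extreme inputs don't overflow `math.exp`:

```python
import math

def sigmoid(x):
    # Stable form: only ever exponentiate a non-positive number,
    # so exp() underflows to 0 instead of overflowing for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

print(sigmoid(0))      # 0.5
print(sigmoid(1000))   # 1.0 (e^-1000 underflows to 0)
print(sigmoid(-1000))  # 0.0
```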
Logistic regression eqn:
g = 1 / (1 + e^-(\beta_0 + \beta_1 X_1 + \dots))
If you had 4 classes you would have 4 such equations, one score per class.
The probabilities of the classes will add to 1, because multinomial logistic regression normalizes the class scores (softmax).
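A sketch of how one score per class becomes probabilities that sum to 1, using a softmax (the class scores below are made-up values for 4 classes):

```python
import math

def softmax(scores):
    # Subtract the max score before exponentiating for numerical stability
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# One linear score per class, e.g. beta0_k + beta1_k*X1 + ... for class k
probs = softmax([2.0, 1.0, 0.5, -1.0])
print(probs)
print(sum(probs))  # 1.0 up to floating-point rounding
```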
The internals of regression:
Cost Function / Gradient Descent: The fit algorithm tries different coefficients and measures the error each produces. If we plotted the error for every choice of coefficients we'd get a parabola (for a convex cost such as squared error), and the fit algorithm searches for the global optimum — the lowest point, i.e. the coefficients that give the least error.
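A toy version of this search: gradient descent on a 1-D convex cost J(b) = (b - 3)^2, whose lowest point is at b = 3. The learning rate and iteration count are illustrative:

```python
def grad(b):
    # Derivative of the cost J(b) = (b - 3)**2
    return 2.0 * (b - 3.0)

b = 0.0    # initial guess for the coefficient
lr = 0.1   # step size (learning rate)
for _ in range(100):
    # Step downhill, against the gradient
    b -= lr * grad(b)

print(round(b, 4))  # 3.0 — the bottom of the parabola
```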
In some other models the cost surface has more than one local optimum; in those cases we tune the step size and add velocity (momentum) so the search can keep moving after reaching the first optimum.
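A sketch of the velocity idea: gradient descent with momentum on the same kind of 1-D cost. The velocity term accumulates past gradients, which is what lets the search coast through shallow local dips (hyperparameters here are illustrative):

```python
def grad(b):
    # Derivative of the cost J(b) = (b - 3)**2
    return 2.0 * (b - 3.0)

b, v = 0.0, 0.0            # coefficient and its velocity
lr, momentum = 0.1, 0.9    # step size and momentum factor
for _ in range(500):
    v = momentum * v - lr * grad(b)  # velocity = decayed history + new step
    b += v

print(round(b, 4))  # still settles at 3.0 on this convex cost
```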
Maximizing the log likelihood is another way to frame the fit; placing priors over the coefficients on top of the likelihood gives the Bayesian version.
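A sketch of the log likelihood being maximized, on a tiny made-up dataset: it sums y·log(p) + (1−y)·log(1−p) over the points, where p is the sigmoid of the linear score. Coefficients that separate the data well score higher:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def log_likelihood(beta0, beta1, xs, ys):
    # Sum of per-point log probabilities of the observed labels
    ll = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(beta0 + beta1 * x)
        ll += y * math.log(p) + (1 - y) * math.log(1 - p)
    return ll

xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]   # made-up, linearly separable labels
# A slope that separates the classes beats a flat (uninformative) one:
print(log_likelihood(0.0, 2.0, xs, ys) > log_likelihood(0.0, 0.0, xs, ys))  # True
```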
The other way to work with this is pure calculus: set the derivative of the error to zero and solve in closed form (for linear regression, the normal equation). It might not be fast on large data, but it has its advantages.
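The calculus route in one variable: setting the derivative of the squared error to zero gives beta1 = cov(x, y)/var(x) and beta0 = mean(y) − beta1·mean(x). The data below is made up so the answer is exact:

```python
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]   # exactly y = 1 + 2x

mx = sum(xs) / len(xs)
my = sum(ys) / len(ys)
# Closed-form least-squares solution for one feature
beta1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
beta0 = my - beta1 * mx
print(beta0, beta1)  # 1.0 2.0
```

No iteration or learning rate is needed here — the trade-off is that the matrix version of this solve gets expensive as the number of features grows.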
Other topics to delve into to learn further: the perceptron, stochastic gradient descent, backpropagation.
What's a greedy way of doing this?