Algorithm: the cost function is (1/2m) times the sum of squared errors, J(θ0, θ1) = (1/2m) Σᵢ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾)², where h_θ(x) = θ0 + θ1·x is the hypothesis.
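A minimal NumPy sketch of this cost (the function name `compute_cost` and the toy data are illustrative, not from the lecture):

```python
import numpy as np

def compute_cost(X, y, theta):
    """Squared-error cost J(theta) = (1/2m) * sum((X @ theta - y)^2)."""
    m = len(y)
    residuals = X @ theta - y          # h_theta(x^(i)) - y^(i) for every example
    return (residuals @ residuals) / (2 * m)

# Toy data: X has a leading column of ones so theta[0] acts as the intercept.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])
print(compute_cost(X, y, np.array([0.0, 1.0])))  # 0.0 -- perfect fit
```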
When alpha is too small, gradient descent converges very slowly. When alpha is too large, each step can overshoot the minimum and end up further from the local minimum, or even diverge.
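A tiny illustration of that alpha tradeoff, using an assumed one-dimensional function f(θ) = θ² whose gradient is 2θ (not an example from the lecture):

```python
# Minimize f(theta) = theta^2 with three learning rates to see the behavior above.
def gradient_descent_1d(alpha, steps=10, theta=1.0):
    for _ in range(steps):
        theta = theta - alpha * 2 * theta   # theta := theta - alpha * f'(theta)
    return theta

print(gradient_descent_1d(alpha=0.01))  # ~0.82: tiny steps, barely any progress
print(gradient_descent_1d(alpha=0.5))   # 0.0: converges immediately
print(gradient_descent_1d(alpha=1.1))   # ~6.19: overshoots and diverges
```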
Plugging the cost function into the gradient descent update and differentiating gives the rule below. For j = 1 the gradient includes the factor x⁽ⁱ⁾; for j = 0 that factor is just 1, so it drops out. The derivative also shows why the cost function has a 2 in its denominator: the 2 produced by differentiating the square cancels it.
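In standard notation (the derivative, then the resulting single-feature updates, applied simultaneously each iteration):

```latex
\frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)
  = \frac{\partial}{\partial \theta_j}\, \frac{1}{2m} \sum_{i=1}^{m}
      \bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)^2
  = \frac{1}{m} \sum_{i=1}^{m} \bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)\, x_j^{(i)}

\theta_0 := \theta_0 - \alpha\, \frac{1}{m} \sum_{i=1}^{m} \bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)
\theta_1 := \theta_1 - \alpha\, \frac{1}{m} \sum_{i=1}^{m} \bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)\, x^{(i)}
```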
Place the hypothesis plot next to the cost function plot; this shows how the linear regression line changes as we change the parameters to minimize the cost function (the sum of squared errors).
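One way to build that side-by-side view, sketched with matplotlib; here θ0 is assumed fixed at 0 so the cost can be drawn as a simple curve over θ1, and the data points are invented:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.2, 1.9, 3.2, 3.9])
m = len(y)

def cost(theta1):
    return np.sum((theta1 * x - y) ** 2) / (2 * m)

fig, (ax_hyp, ax_cost) = plt.subplots(1, 2, figsize=(10, 4))

# Left panel: data plus hypothesis lines for a few candidate slopes.
ax_hyp.scatter(x, y)
for theta1 in (0.5, 1.0, 1.5):
    ax_hyp.plot(x, theta1 * x, label=f"theta1 = {theta1}")
ax_hyp.set(title="Hypothesis h(x) = theta1 * x", xlabel="x", ylabel="y")
ax_hyp.legend()

# Right panel: the cost as a function of theta1, same slopes marked as points.
grid = np.linspace(0.0, 2.0, 100)
ax_cost.plot(grid, [cost(t) for t in grid])
ax_cost.scatter([0.5, 1.0, 1.5], [cost(t) for t in (0.5, 1.0, 1.5)])
ax_cost.set(title="Cost J(theta1)", xlabel="theta1", ylabel="J")

plt.tight_layout()
plt.show()
```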
This algorithm can take many iterations to converge to the optimum.
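A sketch of the batch update loop itself (illustrative names; the vectorized gradient line implements the θ_j update written out above):

```python
import numpy as np

def batch_gradient_descent(X, y, alpha=0.1, iterations=1000):
    """Batch gradient descent: every update sums over ALL m training examples."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iterations):
        gradient = X.T @ (X @ theta - y) / m   # (1/m) * sum of (h - y) * x_j
        theta = theta - alpha * gradient       # simultaneous update of every theta_j
    return theta

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # leading 1s column = intercept
y = np.array([2.0, 4.0, 6.0])
print(batch_gradient_descent(X, y))  # approaches [0, 2], i.e. y = 2x
```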
Yet, what's coming up:
- we can skip the iterations entirely (the normal equations give a closed-form answer)
- we can handle more features
Terms
batch gradient descent: each step uses the entire training set (all m examples) to compute the gradient
normal equations method: solves for the minimizing parameters analytically, in closed form, with no iteration
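A sketch of the normal equations solution, θ = (XᵀX)⁻¹Xᵀy, on the same toy data as the gradient descent example; np.linalg.solve avoids forming the inverse explicitly:

```python
import numpy as np

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # leading 1s column = intercept
y = np.array([2.0, 4.0, 6.0])

# Closed form: solve (X^T X) theta = X^T y in one shot, no learning rate needed.
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)  # [0. 2.] -- the same optimum gradient descent approaches iteratively
```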