
Also, why do linear regression (OLS) models need gradient descent at all? Can't you calculate the parameters directly?

y = X β + ε ...and a few assumptions give you... β* = (X^t X)^-1 X^t y

I might be missing something in the blog post.
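For concreteness, a minimal sketch (assuming numpy and a synthetic dataset) of that direct solve via the normal equations:

    import numpy as np

    rng = np.random.default_rng(0)
    n, p = 1000, 5
    X = rng.normal(size=(n, p))
    y = X @ rng.normal(size=p) + 0.1 * rng.normal(size=n)

    # beta* = (X^t X)^-1 X^t y, computed with solve() rather than an explicit inverse
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)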



Others have pointed out that matrix inversion is O(n^3) and hence too expensive for large problems. It is also worth noting that the condition number of X^t X is k(X)^2, so forming and solving the normal equations this way can be very unstable.
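A small sketch (assuming numpy) of the conditioning point: forming X^t X squares the condition number, so a moderately ill-conditioned X turns into a badly ill-conditioned normal-equations system.

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 5))
    X[:, 4] = X[:, 3] + 1e-6 * rng.normal(size=200)  # nearly collinear column

    print(np.linalg.cond(X))        # on the order of 1e6
    print(np.linalg.cond(X.T @ X))  # roughly the square, on the order of 1e12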


While this is correct, you do not need to compute the inverse to solve this kind of least squares problem (and you do not need the inverse to solve any linear system of equations).
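For example, a sketch (assuming numpy) of solving the least squares problem without ever forming (X^t X)^-1, either via a QR factorization or by letting the library pick a stable method:

    import numpy as np

    rng = np.random.default_rng(2)
    X = rng.normal(size=(500, 4))
    y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + 0.1 * rng.normal(size=500)

    # Via QR: X = QR, then solve the triangular system R beta = Q^t y
    Q, R = np.linalg.qr(X)
    beta_qr = np.linalg.solve(R, Q.T @ y)

    # Or use the SVD-based least squares routine directly
    beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)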


You can, yes, but inverting a matrix is computationally expensive, and for a large X, numerical optimization methods can be much more time/space efficient.
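A rough sketch (assuming numpy; the step-size choice is just for illustration) of what that looks like: plain gradient descent on the least squares loss, where each iteration costs O(np) and no p x p matrix is ever formed or inverted.

    import numpy as np

    rng = np.random.default_rng(3)
    n, p = 10000, 20
    X = rng.normal(size=(n, p))
    y = X @ rng.normal(size=p) + 0.1 * rng.normal(size=n)

    beta = np.zeros(p)
    step = 1.0 / np.linalg.norm(X, ord=2) ** 2   # 1/sigma_max(X)^2 guarantees convergence
    for _ in range(500):
        grad = X.T @ (X @ beta - y)              # gradient of (1/2)||X beta - y||^2, O(np) work
        beta -= step * grad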


From a pedagogical point of view, I think it's a very strange choice to go with gradient descent in this case. It makes linear regression look like something much more complicated than it actually is. People might be misled into thinking they need to hand-code gradient descent every time they run a regression on their 100-point dataset.


I upvoted the replies about computational cost and stability, because they raise an important point. But what I was trying to talk about (as you are) was the presentation of the idea.

Linear models are much older than computers, dating back to Gauss at least, and they do not have anything to do with gradient descent.


Really quickly:

matrix inversion is ~O(n^3) in the size of the matrix; for OLS that means O(np^2) to form X^t X plus O(p^3) to invert or factor it.

gradient descent is ~O(np) per iteration, where p is the number of predictors and n the number of observations (X is an n x p matrix).

for the lasso, the L1 penalty is not differentiable at zero, so there is no gradient step for that term; coordinate descent (with soft-thresholding) is used instead, as in the sketch below.
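A rough sketch (assuming numpy and column-standardized data) of that coordinate descent, with the soft-thresholding update standing in for the gradient step that the L1 term does not have at zero:

    import numpy as np

    def soft_threshold(z, t):
        return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

    def lasso_cd(X, y, lam, n_iter=100):
        n, p = X.shape
        beta = np.zeros(p)
        col_sq = (X ** 2).sum(axis=0)        # precompute ||x_j||^2 for each column
        resid = y - X @ beta
        for _ in range(n_iter):
            for j in range(p):
                resid += X[:, j] * beta[j]   # partial residual with coordinate j removed
                rho = X[:, j] @ resid
                beta[j] = soft_threshold(rho, lam) / col_sq[j]
                resid -= X[:, j] * beta[j]   # restore the full residual
        return beta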



