
Also, why do linear regression (OLS) models need gradient descent at all? Can't you calculate the parameters directly?

y = X β + ε ...and a few assumptions give you... β* = (X^t X)^-1 X^t y

I might be missing something in the blog post.
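For concreteness, a minimal sketch (assuming numpy and a synthetic dataset) of that direct solve via the normal equations:

    import numpy as np

    rng = np.random.default_rng(0)
    n, p = 1000, 5
    X = rng.normal(size=(n, p))
    y = X @ rng.normal(size=p) + 0.1 * rng.normal(size=n)

    # beta* = (X^t X)^-1 X^t y, computed with solve() rather than an explicit inverse
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)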



Others have pointed out that matrix inversion is O(n^3) and hence too expensive for large problems. It is also worth noting that the condition number of X^t X is k(X)^2, so forming and solving the normal equations this way can be very unstable.
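A small sketch (assuming numpy) of the conditioning point: forming X^t X squares the condition number, so a moderately ill-conditioned X turns into a badly ill-conditioned normal-equations system.

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 5))
    X[:, 4] = X[:, 3] + 1e-6 * rng.normal(size=200)  # nearly collinear column

    print(np.linalg.cond(X))        # on the order of 1e6
    print(np.linalg.cond(X.T @ X))  # roughly the square, on the order of 1e12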


While this is correct, you do not need to compute the inverse to solve this kind of least squares problem (and you do not need the inverse to solve any linear system of equations).
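For example, a sketch (assuming numpy) of solving the least squares problem without ever forming (X^t X)^-1, either via a QR factorization or by letting the library pick a stable method:

    import numpy as np

    rng = np.random.default_rng(2)
    X = rng.normal(size=(500, 4))
    y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + 0.1 * rng.normal(size=500)

    # Via QR: X = QR, then solve the triangular system R beta = Q^t y
    Q, R = np.linalg.qr(X)
    beta_qr = np.linalg.solve(R, Q.T @ y)

    # Or use the SVD-based least squares routine directly
    beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)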


You can, yes, but inverting a matrix is computationally expensive, and for a large X, numerical optimization methods can be much more time/space efficient.
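A rough sketch (assuming numpy; the step-size choice is just for illustration) of what that looks like: plain gradient descent on the least squares loss, where each iteration costs O(np) and no p x p matrix is ever formed or inverted.

    import numpy as np

    rng = np.random.default_rng(3)
    n, p = 10000, 20
    X = rng.normal(size=(n, p))
    y = X @ rng.normal(size=p) + 0.1 * rng.normal(size=n)

    beta = np.zeros(p)
    step = 1.0 / np.linalg.norm(X, ord=2) ** 2   # 1/sigma_max(X)^2 guarantees convergence
    for _ in range(500):
        grad = X.T @ (X @ beta - y)              # gradient of (1/2)||X beta - y||^2, O(np) work
        beta -= step * grad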


From a pedagogical point of view, I think it's a very strange choice to go with gradient descent in this case. It makes linear regression look like something much more complicated than it actually is. People might be misled into thinking they need to hand-code gradient descent every time they run a regression on their 100-point dataset.


I upvoted the replies about computational cost and stability, because they raise an important point. But what I was trying to talk about (as you are) was the presentation of the idea.

Linear models are much older than computers, dating back to Gauss at least, and they do not have anything to do with gradient descent.


Really quickly:

matrix inversion is ~O(n^3) in the size of the matrix; for OLS that means O(np^2) to form X^t X plus O(p^3) to invert or factor it.

gradient descent is ~O(np) per iteration, where p is the number of predictors and n the number of observations (X is an n x p matrix).

for the lasso, the L1 penalty is not differentiable at zero, so there is no gradient step for that term; coordinate descent (with soft-thresholding) is used instead, as in the sketch below.
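A rough sketch (assuming numpy and column-standardized data) of that coordinate descent, with the soft-thresholding update standing in for the gradient step that the L1 term does not have at zero:

    import numpy as np

    def soft_threshold(z, t):
        return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

    def lasso_cd(X, y, lam, n_iter=100):
        n, p = X.shape
        beta = np.zeros(p)
        col_sq = (X ** 2).sum(axis=0)        # precompute ||x_j||^2 for each column
        resid = y - X @ beta
        for _ in range(n_iter):
            for j in range(p):
                resid += X[:, j] * beta[j]   # partial residual with coordinate j removed
                rho = X[:, j] @ resid
                beta[j] = soft_threshold(rho, lam) / col_sq[j]
                resid -= X[:, j] * beta[j]   # restore the full residual
        return beta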



