Absolutely. And beyond weight regularization, for any weighting followed by a sigmoid or other squashing function, large weights simply tend to saturate the squashing function. Once that happens there is very little gradient left (it quickly becomes effectively zero), so there is nothing to gain from increasing the weight value past that point.
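
To make the saturation point concrete, here is a rough sketch for a single weight feeding a logistic sigmoid. The names `sigmoid` and `grad_wrt_weight`, the input fixed at x = 1, and the sample weights are all just assumptions for illustration, not anything from a particular library:

```python
import math

def sigmoid(z):
    # Logistic squashing function.
    return 1.0 / (1.0 + math.exp(-z))

def grad_wrt_weight(w, x=1.0):
    # Derivative of sigmoid(w * x) with respect to w:
    #   d/dw sigmoid(w * x) = x * s * (1 - s), where s = sigmoid(w * x)
    s = sigmoid(w * x)
    return x * s * (1.0 - s)

# Illustrative weights only; the input x is held at 1.0 (an assumption).
for w in (0.0, 2.0, 5.0, 10.0, 20.0):
    print(f"w={w:5.1f}  sigmoid={sigmoid(w):.6f}  d/dw={grad_wrt_weight(w):.2e}")
```

The gradient peaks at 0.25 when w = 0 and decays roughly like exp(-w) as the weight grows, so by w = 10 or 20 an optimizer gets almost no signal for pushing the weight further.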

