My guess is that very small and very large values in the weights are already trained away by the regularisation of the cost function, so insignificant changes to a network's weights don't tend to produce significant changes in its output.

You gain more by being able to run gradient descent faster than by having higher-precision floats.
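For example, here's a rough numpy sketch of my own (the layer sizes, weight scale, and random seed are arbitrary): round a small network's weights down to float16 and the output barely moves.

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    # One hidden layer with modest weights, as weight decay tends to produce.
    W1 = rng.normal(scale=0.5, size=(64, 32))
    W2 = rng.normal(scale=0.5, size=(32, 1))
    x = rng.normal(size=(1, 64))

    def forward(W1, W2, x):
        return sigmoid(sigmoid(x @ W1) @ W2)

    full = forward(W1, W2, x)
    # Simulate lower precision by round-tripping the weights through float16.
    half = forward(W1.astype(np.float16).astype(np.float64),
                   W2.astype(np.float16).astype(np.float64), x)

    # The difference is tiny relative to the output scale, despite
    # throwing away most of the mantissa bits.
    print(abs(full - half).max())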



Absolutely. And beyond weight regularization, for any weighted sum followed by a sigmoid or other squashing function, large weights simply tend to saturate the squashing function, so there is very little gradient (quickly effectively zero) to be gained by increasing the weight value past that point.
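To put numbers on that, a quick sketch of my own with the plain logistic sigmoid, whose derivative is sigma(z) * (1 - sigma(z)):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # As the pre-activation z grows, the sigmoid flattens out and its
    # gradient collapses toward zero, so larger weights buy almost nothing.
    for z in [0.0, 2.0, 5.0, 10.0, 20.0]:
        s = sigmoid(z)
        print(f"z = {z:5.1f}  sigmoid = {s:.8f}  gradient = {s * (1 - s):.2e}")

    # At z = 10 the gradient is already ~4.5e-05; at z = 20 it is ~2.1e-09,
    # i.e. effectively zero.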



