And would the 2-memory algorithm be equivalent to gradient descent with momentum?
I used to know what a subgradient was, but I think there must be something more to the ideas in the paper, because I'm struggling to see the analogy between gradient descent where you take steps probabilistically and the algorithm described. Perhaps I need to think about how you could recast the quantile estimation problem as an optimisation problem and then apply what is effectively the machinery developed to train neural nets. Very interesting connection!
Recasting quantile estimation as an optimization problem is trivial: the q-quantile minimizes the “pinball” loss (see first eqn in http://statweb.stanford.edu/~owen/courses/305a/lec18.pdf) with parameter q. What they do in the paper is take subgradient steps with respect to the latest observation (just think of subgradients as gradients, since the loss function is differentiable everywhere except at one point).
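To make that concrete, here's a minimal sketch (mine, not from the paper) of streaming q-quantile estimation by online subgradient descent on the pinball loss. The step size `lr` and the example data stream are just illustrative choices.

```python
# Streaming q-quantile estimation via online subgradient descent on the pinball loss.
import random

def streaming_quantile(stream, q, lr=0.1):
    m = 0.0  # current quantile estimate
    for x in stream:
        # Subgradient of the pinball loss rho_q(x - m) with respect to m:
        #   -q        if x > m
        #   (1 - q)   if x < m
        # (anything in [-q, 1 - q] at the kink x == m)
        if x > m:
            m += lr * q
        elif x < m:
            m -= lr * (1 - q)
    return m

# Example: estimate the median (q = 0.5) of a uniform(0, 10) stream.
data = [random.uniform(0, 10) for _ in range(100_000)]
print(streaming_quantile(data, q=0.5))  # should come out close to 5
```

The asymmetric step sizes are what target the q-quantile: for q = 0.9, say, upward steps are nine times larger than downward ones, so the estimate settles where only 10% of observations land above it.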
The lingo is complex here, because it's general enough to be used for much more complicated cases.
Think of it as a 'hello world' program. The typical 'hello world' program in eg Java teaches you more about the lingo of Java than about solving the problem of putting 'hello world' on the screen.
(Of course, there are still plenty of bad reasons to describe simple things in complex lingo. But the above is one good reason.)
Actually, it looks like something else is going on in the paper besides subgradient steps: there is some extra randomization that can prevent some steps from being taken. So yeah, there is a connection with online subgradient descent, but also more to it :-)
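Roughly (this is my own sketch of the flavour of update being described, not the paper's exact pseudocode): instead of scaling the step by q or (1 - q), you take a fixed-size step but only with that probability, so the expected step is the same subgradient direction while individual steps may be skipped.

```python
# Randomized variant: unit steps taken with probability q (upward) or 1 - q (downward).
# In expectation this matches the subgradient direction on the pinball loss.
import random

def randomized_streaming_quantile(stream, q, step=1.0):
    m = 0.0
    for x in stream:
        if x > m and random.random() < q:
            m += step
        elif x < m and random.random() < (1 - q):
            m -= step
    return m

data = [random.gauss(50, 10) for _ in range(100_000)]
print(randomized_streaming_quantile(data, q=0.9))  # roughly the 90th percentile (~62.8)
```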
Thanks for the loss function reference! I wonder if there’s something waiting to be discovered here about doing gradient descent but only taking steps with some probability. Definitely something to think about, I can’t imagine this idea hasn’t been explored before.
Thanks a lot for the insightful comments, I've definitely come to see that work in a new light after knowing about it for years!