> when you don't have enough training data ahead of time but do have the benefit of lots of user interactions
But this is just a characteristic of "online learning" algorithms, no? I thought RL was a special method that is online-only, but other algorithms that aren't RL can also be made to run online, if my understanding is correct. In that case, the advantage you cite isn't unique to RL at all.
You can even do online learning with plain SGD (stochastic gradient descent).
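For example, here's a toy sketch (made-up data and learning rate, nothing specific to any library) of fitting a linear model one observation at a time with SGD, never storing a training set up front:

```python
# Toy online SGD: update a linear model y ≈ w*x + b one observation
# at a time, as if the data were arriving from user interactions.
# Learning rate and the synthetic stream are illustrative choices.

def sgd_step(w, b, x, y, lr=0.1):
    """One online update on a single (x, y) pair, squared-error loss."""
    err = (w * x + b) - y
    # Gradient of 0.5 * err**2 with respect to w and b
    w -= lr * err * x
    b -= lr * err
    return w, b

# Simulated stream drawn from the target relation y = 2x + 1
w, b = 0.0, 0.0
for _ in range(200):
    for x in (0.0, 0.5, 1.0, 1.5, 2.0):
        w, b = sgd_step(w, b, x, 2 * x + 1)

print(w, b)  # converges toward w=2, b=1
```

No replay buffer, no reward signal, no policy: just per-sample gradient updates, which is exactly the "learn as interactions arrive" property being attributed to RL.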