
> when you don't have enough training data ahead of time but do have the benefit of lots of user interactions

But this is just a characteristic of "online learning" algorithms, no? I thought RL was a special method that is online only, but other algorithms that aren't RL can also be made online, if my understanding is correct. In that case the advantage you cite isn't unique to RL at all.

You can even do online learning with SGD (stochastic gradient descent)
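To make the SGD point concrete, here is a minimal sketch of online learning with no RL involved: a linear model updated one example at a time as data arrives. The toy stream (y = 2x + 1), the learning rate, and the names `online_sgd` and `make_stream` are all made up for illustration.

```python
import random

def online_sgd(stream, lr=0.05):
    """Fit y ~ w*x + b by updating on each example as it arrives."""
    w, b = 0.0, 0.0
    for x, y in stream:
        err = (w * x + b) - y      # prediction error on this one example
        w -= lr * err * x          # gradient step for squared error
        b -= lr * err
    return w, b

def make_stream(n, seed=0):
    """Hypothetical stand-in for a feed of user interactions."""
    rng = random.Random(seed)
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        yield x, 2.0 * x + 1.0     # assumed ground truth: w=2, b=1

w, b = online_sgd(make_stream(5000))
print(w, b)
```

Nothing is stored or replayed here: each example is used once and discarded, and the label for each input is available immediately. That is plain online supervised learning.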



RL is special in the sense that your actions do not produce immediate feedback; feedback is only available after a period of time.

E.g., you eat an apple, open the door, arrive at the office, and only then get food poisoning.
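A rough sketch of that example as a delayed-reward episode, using the standard discounted return as one common way to push the late penalty back onto earlier actions. The reward values and the discount factor 0.9 are assumptions picked for illustration, and `discounted_returns` is a hypothetical helper name.

```python
def discounted_returns(rewards, gamma=0.9):
    """Return G_t = r_t + gamma * G_{t+1} for each step, computed backwards."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

actions = ["eat apple", "open door", "arrive at office"]
# Assumed rewards: nothing observable until the food poisoning at the end.
rewards = [0.0, 0.0, -1.0]

returns = discounted_returns(rewards)
for action, g in zip(actions, returns):
    print(f"{action}: return {g:.2f}")
```

Every action gets some share of the blame, including opening the door, which had nothing to do with it. Working out which earlier action actually caused the outcome is the credit assignment problem, and it does not arise when each input comes with its own immediate label.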



