> when you don't have enough training data ahead of time but do have the benefit... | Hacker News


platz on March 1, 2017 | on: A JavaScript deep learning and reinforcement learn...

> when you don't have enough training data ahead of time but do have the benefit of lots of user interactions

But this is just a characteristic of "online learning" algorithms, no? I thought RL was a special method that is online only, but there are other algorithms that can be made online that aren't RL, if my understanding is correct. Then the advantage you cite isn't unique to RL at all. You can even do online learning with SGD (stochastic gradient descent).
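To make the SGD point concrete, here is a minimal sketch of online learning with plain SGD and no RL involved. It assumes a linear model with squared-error loss; the names `online_sgd` and `make_stream`, the learning rate, and the synthetic target are all made up for illustration and are not from the library under discussion.

```python
import random

def online_sgd(stream, n_features, lr=0.05):
    """Fit a linear model one example at a time (online SGD, squared error)."""
    w = [0.0] * n_features
    b = 0.0
    for x, y in stream:
        # Predict with the current weights, then update immediately
        # from this single example; nothing is stored or replayed.
        pred = sum(wi * xi for wi, xi in zip(w, x)) + b
        err = pred - y
        w = [wi - lr * err * xi for wi, xi in zip(w, x)]
        b -= lr * err
    return w, b

def make_stream(n, seed=0):
    """Hypothetical stand-in for a stream of user interactions."""
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        y = 2.0 * x[0] - 3.0 * x[1] + 1.0  # assumed noiseless target
        yield x, y

w, b = online_sgd(make_stream(5000), n_features=2)
print(w, b)
```

Every example here comes with its own label right away, which is what makes this ordinary supervised online learning rather than RL.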

yzmtf2008 on March 1, 2017

RL is special in the sense that your actions do not produce immediate feedback; feedback is only available after a period of time.

E.g., you ate an apple, opened the door, arrived at the office, and then got food poisoning.
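A rough sketch of how that delayed feedback gets spread back over earlier actions, assuming the common discounted-return formulation. The reward values, the discount factor `gamma`, and the function name `discounted_returns` are my own assumptions for illustration, not something the comment above specifies.

```python
def discounted_returns(rewards, gamma=0.9):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step sees the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

actions = ["ate an apple", "opened the door", "arrived at the office"]
# Assumed rewards: nothing happens until the food poisoning
# shows up after the last action.
rewards = [0.0, 0.0, -1.0]

returns = discounted_returns(rewards)
for action, g in zip(actions, returns):
    print(f"{action}: return {g:.2f}")
```

Note that within a single episode, discounting blames the most recent action the most, so the apple actually gets the smallest share. Working out that the apple was the real cause takes many episodes, which is the credit assignment problem that immediate-feedback online learning never has to deal with.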
