
The alternatives in the tree have to be easy to compare, and nothing is easier to compare than scalars. They could also train a network that compares two positions and decides which one is superior, but that would require much more computation. Another alternative would possibly be to learn the value somehow jointly with the action selection, but that would possibly also be harder both to train and to evaluate.
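
To make the cost argument concrete, here is a rough sketch that counts evaluator calls for both approaches. Everything in it is hypothetical: `pick_by_scalar`, `pick_by_pairwise`, and the toy `toy_value` / `toy_better` functions stand in for real networks. It also assumes the pairwise comparator is used round-robin, which is only one possible scheme (a learned comparator is not guaranteed to be transitive, so a cheaper knockout pass may not be safe).

```python
from itertools import combinations


def pick_by_scalar(positions, value_fn):
    """Score each position once, then compare plain numbers."""
    calls = 0
    best, best_value = None, float("-inf")
    for pos in positions:
        value = value_fn(pos)  # one evaluator call per position
        calls += 1
        if value > best_value:
            best, best_value = pos, value
    return best, calls


def pick_by_pairwise(positions, better_fn):
    """Round-robin: every pair of positions needs its own comparator call."""
    calls = 0
    wins = [0] * len(positions)
    for i, j in combinations(range(len(positions)), 2):
        calls += 1
        if better_fn(positions[i], positions[j]):
            wins[i] += 1
        else:
            wins[j] += 1
    best_index = max(range(len(positions)), key=lambda k: wins[k])
    return positions[best_index], calls


# Toy stand-ins (assumptions): a "position" is just an integer,
# and a larger integer is a better position.
positions = [3, 9, 4, 7, 1]
toy_value = lambda p: float(p)
toy_better = lambda a, b: a > b

scalar_best, scalar_calls = pick_by_scalar(positions, toy_value)
pair_best, pair_calls = pick_by_pairwise(positions, toy_better)
print(scalar_best, scalar_calls)
print(pair_best, pair_calls)
```

With N positions the scalar version makes N calls, while the round-robin version makes N(N-1)/2. A scalar can also be cached and reused when the same node is compared against new siblings later, which a pairwise verdict cannot.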

