
Highly recommended .. even the main contents diagram is a great visual overview of RL in general, as is the 30 minute intro YT video.

I'm expecting to see a lot of hyper growth startups using RL to solve real-world problems in engineering / logistics / medicine.

LLMs currently attract all the hype for good reasons, but I'm surprised VCs don't seem to be looking at RL companies specifically.




RL is definitely really cool but I heavily doubt that we're gonna see 'hyper growth' from RL outside of the context of maybe training reasoning LLMs.

The ~2012-2019 period of AI research had DeepMind (the undisputed leader in money and talent) go all in on RL to solve problems, and while they did lots of interesting and useful work, there was nothing so extraordinary or revolutionary that it massively accelerated the field - no crazy breakthrough.

Their over-focus on RL instead of transformers/LLMs is what allowed OpenAI to surprise everyone and overtake DeepMind.

Yes, RL is a useful tool, but outside the context of training LLMs for reasoning there isn't really any breakthrough that makes it more than an interesting tool for certain situations.


> I'm expecting to see a lot of hyper growth startups using RL to solve real-world problems in engineering / logistics / medicine.

I love when people on HN make market predictions based on how revolutionary they think something is. I guess startup people think they're also VC people.

FYI Sutton's book came out in 1999; none of this is revolutionary anymore, and yet I don't see any "hyper growth". The reason is exactly that while you can train these models to play Super Mario, you cannot use them to solve real-world problems.

https://www.google.com/books/edition/Reinforcement_Learning/...


Sure.. and neural networks came out a very long time ago, but have now arguably reached usefulness in the form of LLMs.

Perhaps that's because it takes a while for the ideas to get polished / weeded out and to diffuse into the engineering zeitgeist .. or it could be that compute / GPUs are now powerful enough to run at the scale needed.

re: "RL cannot be used to solve real world problems" .. well, I would argue that these are useful real-world problems:

  - predict protein folding structure from DNA sequence
  - stabilizing high temperature fusion plasma
  - improving weather forecasting efficiency
  - improve DeepSeek's recent LLM model

I'm currently using RL techniques to find 3D geometry - pipes, beams, walls - in point clouds. It is of practical benefit, as a lot of this work is still done manually, at a ballpark $5Bn/yr.
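
To give a flavour of what that can look like, here is a toy reward for a single candidate primitive - the fraction of points lying near a proposed cylinder surface - which is the kind of signal a propose-and-score agent would be trained to maximise. The function name, parameters and tolerance are purely illustrative, not my actual pipeline:

  import numpy as np

  def cylinder_reward(points, axis_pt, axis_dir, radius, tol=0.01):
      # fraction of points within tol of the candidate cylinder surface
      d = axis_dir / np.linalg.norm(axis_dir)
      v = points - axis_pt                    # offsets from a point on the axis
      radial = v - np.outer(v @ d, d)         # component perpendicular to the axis
      dist = np.abs(np.linalg.norm(radial, axis=1) - radius)
      return float(np.mean(dist < tol))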

But I concede I cannot point to a plethora of small startups using RL for these real-world problems .. yet.

This is a prediction, and I could be wrong in many ways - not least that LLMs digest RL in full, learn to express their logical reasoning, approach AGI, and use RL internally, and so subsume and automate its use.

Are VCs better at predicting the future.. I guess that is their job, and they have money on the line... but I think even they would admit they need a large portfolio to capture the unicorns.

VCs probably get a less detailed tech view than founders, but the large number of pitches they review should give them a noisy but wider overview of the whole bleeding edge of innovation.

I think startup founders are in the same future prediction business .. and arguably have more skin in the game.

Predictions would be pretty useless if they weren't somewhat controversial - a prediction we all agree on doesn't say much. Come back and chastise me if we don't see more RL startups in 12 months' time!


> Come back and chastise me if we don't see more RL startups in 12 months' time!

1999 is 26 years ago but ya sure this is the year they finally take off.

> Perhaps that's because it takes a while for the ideas to get polished / weeded out and to diffuse into the engineering zeitgeist .. or it could be that compute / GPUs are now powerful enough to run at the scale needed.

Or perhaps it could be that you're wrong and they're useless? Nah that couldn't be it.


And 1967 was 58 years ago, which was when the first deep neural network was trained with stochastic gradient descent. Yet DNNs didn't take off until the 2010s, when hardware became powerful enough and data plentiful enough to train and use them practically.


I think you GOT caught here. That's why you don't respond to the Nobel prize winning example of RL.


Are we talking about AlphaFold? It did not use RL, right?



Reinforcement learning is hard to apply to real-world problems, but one cannot deny the success that a company such as OpenAI has had.


> you cannot use them to solve real-world problems

Don't Waymo and other self-driving systems use reinforcement learning? I thought it was used in robotics as well (e.g., bipedal and quadrupedal locomotion).


generally you are right in spirit.

however multi-armed bandit algorithms are highly useful in practice. these are a special case of RL (RL with one state, essentially).

there are even some extensions of applied bandit algorithms to "true RL", e.g. for recommender systems that want to consider history.

this is the place to look for real-world applications of RL.
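
to make the "one state" point concrete, here is a minimal epsilon-greedy bandit sketch in python (the arm click-rates and epsilon are invented for illustration, not from any real system):

  import numpy as np

  rng = np.random.default_rng(0)
  true_ctr = np.array([0.02, 0.05, 0.03])  # hidden click-rate per arm (made up)
  counts = np.zeros(3)                     # pulls per arm
  values = np.zeros(3)                     # running mean reward per arm
  eps = 0.1                                # exploration rate

  for t in range(10_000):
      if rng.random() < eps:
          a = int(rng.integers(3))         # explore a random arm
      else:
          a = int(np.argmax(values))       # exploit the current best estimate
      r = float(rng.random() < true_ctr[a])      # Bernoulli click/no-click reward
      counts[a] += 1
      values[a] += (r - values[a]) / counts[a]   # incremental mean update

  print(values)  # estimates drift toward true_ctr, and arm 1 gets pulled most

in an ads/recommendation setting the "arms" are items or layouts and the reward is a click or conversion, which is where most of the production wins for bandits come from.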

also, off-policy policy-gradient RL uses importance-sampling estimators of the gradient. these sometimes show up in other applications, though not framed as "RL".
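
a rough sketch of that: improve a softmax policy offline from logs collected under a different behaviour policy mu, re-weighting each logged reward by pi(a)/mu(a) before taking a score-function (REINFORCE-style) gradient step. all numbers and names are invented for illustration:

  import numpy as np

  rng = np.random.default_rng(1)
  K = 3
  mu = np.array([0.5, 0.3, 0.2])      # behaviour policy that produced the logs
  true_r = np.array([0.1, 0.6, 0.3])  # expected reward per arm (made up)

  # logged data: actions drawn from mu, Bernoulli rewards
  N = 50_000
  acts = rng.choice(K, size=N, p=mu)
  rews = (rng.random(N) < true_r[acts]).astype(float)

  theta = np.zeros(K)                 # logits of the softmax policy we improve
  for _ in range(200):
      pi = np.exp(theta - theta.max()); pi /= pi.sum()
      w = pi[acts] / mu[acts]         # importance weights pi(a)/mu(a)
      s = w * rews
      # score-function gradient: mean of w * r * (onehot(a) - pi)
      grad = np.zeros(K)
      np.add.at(grad, acts, s)
      grad = (grad - pi * s.sum()) / N
      theta += 1.0 * grad             # one gradient-ascent step

  pi = np.exp(theta - theta.max()); pi /= pi.sum()
  print(np.round(pi, 3))              # probability mass concentrates on arm 1

the same re-weighting trick is what counterfactual / off-policy evaluation of recommender logs does, which is one of the places it shows up without being called RL.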


"FYI Maxwell's paper came out in 1865 and now it's 1896 and Marconi's radio, which he invented a whole year ago, still doesn't pick up anything but buzzes and static. The reason is exactly because while you can manipulate the electromagnetic field with current fluctuations, you cannot use it to solve real world problems."



