Even for predictions, you need the model (what was learned in the training phase). Deep learning models are multiple gigabytes in size, so running them in the browser wouldn't be practical, except for toy stuff.
Sending the input data to the server, doing the computations there, and getting the answers back will be the only practical approach for remotely serious applications for a while yet.
I get why deep learning is nice, but I just don't get the hype around reinforcement learning yet. RL seems great for things like training video game agents, and I've seen videos of this, but I fail to understand where RL can be applied in the real world.
It reminds me a bit of genetic algorithms. GA is the 'last resort' when you truly know nothing about how to model your problem.
I work in supply chain optimization, and reinforcement learning has been an important technique in the field for decades. Supply chain problems are naturally modeled as Markov decision problems (MDPs). As the state space of your MDPs gets bigger and bigger, simulation-based reinforcement learning becomes one of the most versatile techniques for approximating optimal solutions.
I see some sort of reinforcement learning as the most promising technique for overcoming the dramatically named "curse of dimensionality" in the state—the single biggest roadblock to optimizing more complex supply chain models.
In fact, the study of MDPs and their solutions stems from operations research, and I think studying problems in that context gives you a powerful way of understanding how reinforcement learning algorithms work. Basic inventory control problems are very intuitive, and there's a natural progression from exact dynamic programming methods (value iteration and policy iteration) to different reinforcement learning algorithms that really helps build an intuition for how RL works.
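To make that progression concrete, here is a minimal value-iteration sketch on a made-up single-item inventory problem (the costs, demand distribution, and stock bounds are all illustrative assumptions, not from any real model):

    import numpy as np

    # Toy inventory MDP: state = units in stock, action = units to order,
    # random demand each period. All parameters are illustrative.
    max_stock = 10
    actions = range(max_stock + 1)
    demand_probs = {0: 0.2, 1: 0.3, 2: 0.3, 3: 0.2}    # P(demand = d)
    holding_cost, order_cost, stockout_cost = 1.0, 2.0, 10.0
    gamma = 0.95                                        # discount factor

    V = np.zeros(max_stock + 1)
    for _ in range(500):                                # value-iteration sweeps
        V_new = np.zeros_like(V)
        for s in range(max_stock + 1):
            best = -np.inf
            for a in actions:
                stock = min(s + a, max_stock)
                value = -order_cost * a                 # immediate ordering cost
                for d, p in demand_probs.items():
                    next_s = max(stock - d, 0)
                    reward = -holding_cost * next_s - stockout_cost * max(d - stock, 0)
                    value += p * (reward + gamma * V[next_s])
                best = max(best, value)
            V_new[s] = best
        V = V_new

    print(V)   # value of starting in each stock level under the optimal policy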
The problem setting for RL algorithms provides far less supervision than traditional supervised learning.
This is most clearly seen when you look at how you would train a supervised learning system to do an RL agent's job: you would need to provide the correct action at every timestep. So RL algorithms are mostly interesting when you only get periodic reward signals and the reward may depend significantly on actions you took previously, rather than just the action you took last. Learning to grip objects is an interesting use case from robotics.
IMO the main reason it's getting more attention is that a lot of progress is being made, and much of that progress builds on advances in the supervised learning of neural networks.
However, people see some strong parallels between RL and GANs, which promise to greatly improve unsupervised and semi-supervised learning. There has also been work on using RL algorithms (largely REINFORCE) to train non-differentiable parts of neural networks, and more recently on using RL to decide how to train neural nets overall.
So while most people in industry may never need to touch RL, it will be useful in some systems with time-dependent components and is worth learning from a research perspective.
Thanks. I can certainly see how time-series & planning actions are currently a tough fit for existing approaches. (I always chuckle a bit when I see things like having to turn time-series into stationary distributions, effectively attempting to remove the time information from the data completely)
Reinforcement learning is fully general. NNs on their own can only do prediction. You can predict what object is in an image, or what word a person will say next, etc. And that is quite powerful, and there are a lot of useful things you can do with that ability.
But there are also a lot of things you can't do. Really any task that requires performing a series of actions to reach some kind of goal. Which covers most of the things we want AI to do. Like controlling a robot, playing a game, talking to a human, proving a theorem, etc.
Of course with regular ML, you could do mimicry, and predict what actions a human would take at every time step. But then you are severely limited by the quantity and quality of your training data. RL requires no training data and can potentially learn to be much better than humans.
The hype is likely largely due to AlphaGo. It was a big win and used RL ==> RL must be a silver bullet!
But more generally, when you don't have enough training data ahead of time but do have the benefit of lots of user interactions, and can afford to experiment with live users, then you may be in the sweet spot for RL.
> when you don't have enough training data ahead of time but do have the benefit of lots of user interactions
But this is just a characteristic of "online learning" algorithms, no? I thought RL was a special method that is online-only, but there are other algorithms that can be made online that aren't RL, if my understanding is correct. Then the advantage you cite isn't unique to RL at all.
You can even do online learning with SGD (stochastic gradient descent)
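For example, here is a minimal online-SGD sketch: a linear model with squared loss, updated one example at a time as data arrives (the stream is simulated and all the numbers are just for illustration):

    import numpy as np

    # Online learning with SGD: update on each (x, y) pair as it arrives,
    # no batch dataset required. Linear model + squared loss as a stand-in.
    rng = np.random.default_rng(0)
    w = np.zeros(3)
    lr = 0.01

    def example_stream(n=10_000):
        true_w = np.array([1.0, -2.0, 0.5])
        for _ in range(n):
            x = rng.normal(size=3)
            y = x @ true_w + rng.normal(scale=0.1)
            yield x, y

    for x, y in example_stream():
        err = x @ w - y        # prediction error on this single example
        w -= lr * err * x      # gradient step for squared loss
    print(w)                   # should end up close to [1.0, -2.0, 0.5]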
Literally any application driven by reward, which is quite a lot. Robotics, self-driving cars, optimization problems, advertising, ... The list goes on.
Well in that sense, all machine learning problems can be characterized as finding the min or max of some objective. "Reward" then seems like just a change in terminology.
Sure, but I think the key difference is that the optimization problem here has a particularly intractable form. In a supervised image problem, you spit out a classification probability and the loss function is something like cross-entropy, which is smooth and differentiable, so you can do gradient descent over it no problem. For any X->Y problem with a differentiable loss, you train a differentiable or convex model and minimize/maximize that loss. In an RL problem, you might get back only a 0/1 reward, or you might have to make many decisions in a row before any loss arrives at all. How do you maximize/minimize over an entire series of discrete actions with only a global loss?
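For contrast, here is what that "easy" differentiable case looks like in code: a toy softmax-regression step with a cross-entropy loss, where the exact gradient is available for every single example (a sketch, not tied to any particular library):

    import numpy as np

    # The "easy" case: a differentiable classification loss.
    # Softmax regression with cross-entropy; the exact gradient exists for
    # every example, so plain gradient descent applies directly.
    rng = np.random.default_rng(0)
    n_features, n_classes = 4, 3
    W = np.zeros((n_features, n_classes))
    lr = 0.1

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    # one toy labelled example (x, y); in practice you would loop over a dataset
    x = rng.normal(size=n_features)
    y = 1

    probs = softmax(x @ W)                              # predicted class probabilities
    loss = -np.log(probs[y])                            # cross-entropy for this example
    grad = np.outer(x, probs - np.eye(n_classes)[y])    # exact gradient of the loss
    W -= lr * grad                                      # one clean gradient-descent step
    print(loss)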
I get the impression this won't be sufficient, but my first thought for such problems would be to consider Long Short-Term Memory (LSTM) networks, whose defining feature is being able to learn long-term dependencies (remembering information for long periods of time is practically their default behavior).
But I can also appreciate, from what I'm reading here, that RL puts the actions/decisions taken to effect an outcome at the center, in a way that might not be as easy to capture in a supervised setting.
LSTMs still need a differentiable loss, because you have to backpropagate a gradient through a long unrolled series of RNN timesteps. Conceptually, there's not much difference between an RNN which takes 10 inputs one step at a time and a single big feedforward network which takes all 10 inputs at once. If you can't define the loss on the feedforward NN, you can't do it with the RNN either, and so you can't learn the parameters for the LSTM units.
An example here would be a char-RNN. It predicts one character at a time as a probability distribution, and the loss is the log probability assigned to the actual next character. Nice and differentiable, so you can take the char-RNN unrolled over 10 timesteps and at each timestep calculate the gradient to optimize the loss. This also gives you a generative model: sample a character based on the probabilities. Now take the same char-RNN and redefine the loss as 'whether the user pushed upvote or downvote on the entire 10-character string generated'; you have the unrolled RNN which generated the full string, and you backpropagate... what? What is the gradient for each LSTM parameter, telling it how it should be tweaked to slightly increase/decrease the loss?
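One standard answer, offered here as a hedged sketch rather than anything from the parent comment, is the score-function/REINFORCE estimator mentioned upthread: you can't differentiate the upvote itself, but you can weight the gradient of the log-probability of the characters you actually sampled by the reward the whole string received. A toy version with a stateless "policy" over characters (the upvote rule is invented purely to show the mechanics):

    import numpy as np

    # REINFORCE-style sketch for the upvote/downvote example above.
    # We cannot differentiate the reward, but we can still increase expected
    # reward by nudging up the log-probability of action sequences that were
    # followed by an upvote (and down otherwise).
    rng = np.random.default_rng(0)
    n_chars = 26
    logits = np.zeros(n_chars)          # deliberately simple policy: no RNN state
    lr = 0.1

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    for episode in range(1000):
        probs = softmax(logits)
        chars = rng.choice(n_chars, size=10, p=probs)    # sample a 10-char string
        # made-up global reward: strings containing at least one 'a' (index 0)
        # get upvoted (+1), everything else gets downvoted (-1)
        reward = 1.0 if (chars == 0).any() else -1.0
        # score-function gradient: grad E[R] = E[R * grad log p(sampled actions)]
        grad = np.zeros(n_chars)
        for c in chars:
            grad += reward * (np.eye(n_chars)[c] - probs)   # grad of log softmax prob
        logits += lr * grad / len(chars)                    # ascend expected reward

    print(softmax(logits)[0])   # p('a') should have drifted upward from 1/26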
RL is also extremely useful in scenarios where rules cannot easily be made explicit. Think of riding a bike or flying a helicopter or gripping objects with a robot arm. Here we can more easily define the reward function - but the agent has to figure out how to do things (to maximise expected reward).
Interesting work! Although I am not learning that :)
What would be really interesting is a common interface for the browser and Node: the former using this library, and the latter using a Node extension communicating with TensorFlow...
And on the user side, a syntax EXACTLY matching the Keras syntax.
That - would be the bomb.
> Most deep learning models are multiple gigabytes in size, so running them in the browser wouldn't be practical, except for toy stuff.
> Sending the input data to the server, doing the computations there, and getting the answers back will be the only practical approach for remotely serious applications for a while yet.
CNN models with a fair number of parameters (around 1 million) are in the megabytes. Learning is another story, and could be done on the server, but prediction is totally feasible in the browser.
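Rough back-of-the-envelope numbers for that, assuming float32 weights:

    # ~1 million float32 parameters
    params = 1_000_000
    bytes_per_param = 4                      # float32
    print(params * bytes_per_param / 1e6)    # ~4 MB, small enough to ship to a browser
    # float16 or int8 quantization would roughly halve or quarter that again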
Does anyone know enough to lend some context here? I gather this works similarly to ConvNetJS, or solves some of the same problems. Does it go further, or work in a different way, or what?
https://news.ycombinator.com/item?id=13742911